May 8, 2025
Transitioning from VMware to Nutanix: 10 Considerations for Choosing a High Availability Solution in a Nutanix Environment
If you’re planning a move from VMware to Nutanix, making sure your critical applications stay up and running should be at the top of your list. While Nutanix offers benefits like simplified management and better performance, its built-in high availability only covers the virtual machine, not the applications themselves. This paper shares ten key insights to help you plan ahead and avoid downtime during and after your migration. You’ll get practical guidance on choosing the right clustering solutions for both Windows and Linux, handling shared storage in Nutanix, and what to consider if you’re running a mix of operating systems. Whether you’re moving to Nutanix AHV or managing hybrid environments, learn how to simplify your HA strategy, reduce risk, and keep your most important systems protected. Reproduced with permission from SIOS
May 2, 2025
Are my servers disposable? How High Availability software fits in cloud best practices
In this VMblog article, “Are my servers disposable? How High Availability software fits in cloud best practices,” Philip Merry, a software engineer at SIOS Technology, explores how the shift to cloud computing has changed the role and perception of servers in modern IT environments. With the rise of automation and infrastructure-as-code, servers have become increasingly disposable – easily created, destroyed, and replaced – aligning with cloud best practices like those outlined in the AWS Well-Architected Framework. However, Merry emphasizes that while infrastructure can be treated as temporary, the applications running on it remain critical and must be continuously available. To bridge this gap, high availability (HA) software plays a vital role, allowing IT teams to maintain uptime and reliability by decoupling application continuity from the underlying server hardware. This approach empowers organizations to embrace the flexibility of cloud environments without compromising the stability and performance of their essential applications. Author: Beth Winkowski, SIOS Technology Corp. Public Relations. Reproduced with permission from SIOS
April 27, 2025
Data Recovery Strategies for a Disaster-Prone World
Working in a role with its roots in software engineering, system administration, and customer support, one has a unique opportunity to see a variety of configurations and a myriad of issues. Such a position also gives one perspective on users’ various needs, pain points, and concerns in a way that someone in a purely engineering role might not be exposed to. Over almost five years on the support team, I have noticed patterns across the teams with which I have worked, and when called to help on various configurations, I have had a unique opportunity to draw parallels between different use cases and root causes. As a result, there is a foundation I like to ensure is set when it is time to begin collaborating with a new team. Setting this foundation means ensuring administration practices facilitate working optimally with an HA/DR suite, ensuring teams know how to design for high availability, and ensuring they can leverage the utilities beyond the software on their systems to achieve success. This foundation can be crucial to a team meeting or exceeding its operational standards. It seemed appropriate to summarize the common questions and their answers as a resource for those who are new to, but interested in implementing, a high availability solution, or who simply want to change to a new one. Whether you are a student just starting to study system administration or systems engineering, or a veteran software engineer asked to expand the scope of your role to include system architecture planning, the points below can aid your journey to get the most out of a high availability/disaster recovery suite.
Without further ado, the questions below summarize the common talking points I have seen in my role, and will help make your search for understanding key concepts and finding a fitting solution easier.

What is Disaster Recovery and what does it entail?
Disaster recovery, when coupled with high availability, works to optimize the recovery time objective (RTO) – how long a service is inaccessible before being restored – and the recovery point objective (RPO) – how much data you can stand to lose when restoring from a backup.

How does Disaster Recovery differ from traditional approaches to weathering outages?
Traditionally, without a highly available infrastructure, an environment experiencing a disaster may have a lengthy recovery time. Systems need to be restored, issues may need to be resolved, and applications must be started by administrators. Depending on the severity of the issue, it could take hours or more to get back up and running. Teams must work efficiently and communicate tightly to ensure service is restored without mistake, lest they risk additional delay in returning to operation. Additionally, the data lost during this sort of outage could be significant. If backups were not taken recently, or if up-to-date copies of the data are not accessible, teams could be relying on data that has gone “stale” and experience operational setbacks on an organizational scale due to the loss of critical data. To look at things from a customer perspective: how long are you willing to wait for access to an online service when you need it? How accepting are you if an online storefront loses the record of your transactions? When you introduce a highly available infrastructure, a means to mirror storage, and a means to orchestrate the high availability, the factors influencing RTO and RPO are all optimized, and a disaster can be weathered with far more grace.
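The RTO/RPO trade-off described above can be made concrete with a quick calculation. The sketch below uses hypothetical timestamps (the values are illustrative only, not tied to any particular product or incident) to show how the two objectives are measured for a single outage:

```python
from datetime import datetime, timedelta

# Hypothetical timeline for one outage (illustrative values only)
outage_start     = datetime(2025, 3, 25, 9, 0, 0)    # service becomes unavailable
service_restored = datetime(2025, 3, 25, 9, 4, 30)   # standby system takes over
last_replicated  = datetime(2025, 3, 25, 8, 59, 58)  # most recent data safely mirrored

# RTO achieved: how long the service was actually inaccessible
achieved_rto = service_restored - outage_start

# RPO achieved: how much recent data was actually lost
achieved_rpo = outage_start - last_replicated

print(f"RTO achieved: {achieved_rto}")  # minutes with automated failover, not hours
print(f"RPO achieved: {achieved_rpo}")  # near-zero with real-time mirroring
```

With orchestrated failover the first number drops from hours to minutes, and with real-time replication the second approaches zero, which is exactly the optimization described above.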
A highly available infrastructure is redundant, so a standby system is available to take over operation. Further, the orchestrator – the software managing the clustered environment – can start services on a standby system with greater responsiveness, reliability, and efficiency than manual intervention can achieve. As a result, the recovery time objective is reduced: rather than taking hours to recover from a disaster, it can take mere minutes or less. Another facet of highly available infrastructure is the redundancy of data. Disks attached to different systems can be “mirrored” so that they all receive the exact same data in real time. The data available on the aforementioned standby system is therefore an exact copy, effectively maintaining a backup of the data from immediately before the disaster. In turn, when the orchestrator moves operations to the standby system, applications resume with a near-zero recovery point objective, as close to the most current state of operation as possible.

What are the most common mistakes organizations make when designing high availability disaster recovery (HADR) strategies, and how can they avoid them?
One of the most common missteps observed is the lack of a QA/testing environment. The SIOS Customer Experience team has responded to multiple instances where organizations attempt application or operating system patching, upgrades, or just routine maintenance and experience issues due to inadequate planning or some unfortunate incompatibility. The environment then suffers downtime, and a maintenance procedure turns into a recovery procedure. This introduces delays, complications, and the potential for a spiraling issue within a production environment.
By far, the biggest recommendation that can be offered to organizations is to create a one-to-one copy of the production environment that operates in a quality assurance capacity. Every procedure that needs to occur on production should first go through a “dress rehearsal” in the QA environment. This gives organizations the freedom to exercise the planned operations and make improvements without risking the productive capacity of their infrastructure. Practicing operations in a safe, low-stakes environment ensures that teams are ready to operate in production without the risk of encountering an unexpected issue and having to go “off script” under pressure. If a problem happens in the QA environment, support teams can be contacted and the issue investigated while safely insulated from business operations. This greatly improves the odds that solutions are found and implemented in a controlled, planned, and effective manner. This benefit of the QA environment is important for any organization; however, as organizations adopt more complex maintenance strategies, the test environment becomes all the more important. It not only facilitates smoother upgrade procedures but also lets companies mitigate risk when adopting maintenance models that trade added complexity for improved system availability during maintenance activities. In any scenario, testing the maintenance plan in a QA environment, improving the plan based on findings from the “dress rehearsal”, and applying the experience gained from this practice enables organizations to manage production systems while minimizing the risk of encountering issues.
What is the importance of eliminating single points of failure?
Another common obstacle arises from having a “weakest link” in the architecture that does not benefit from the degree of planning other facets of the environment receive. This is best described with an example. The SIOS Customer Experience team once worked with a customer who had designed extensively around keeping SAP applications running and was very well insulated from issues affecting the systems running them. Unfortunately, this customer invested much of the planning effort into protecting the applications and did not afford the same effort to other aspects of the environment. As a result, all of the systems relied on a single internal DNS server that resolved hosts within their private network. Despite all of the effort in protecting SAP, when an issue occurred on the DNS system, the whole environment experienced significant problems once name resolution was no longer available. Effectively, the effort placed into protecting the SAP applications did not help the environment weather the issue, simply because DNS was a “weak link” that every other system relied upon to function properly. When planning environments, it is crucial to step back and look at the bigger picture – pay attention to the weakest links in an architecture. Improving the weakest links uplifts the potential for the entire environment to weather a disaster.

For organizations relying heavily on cloud services, how can they protect against zone- or region-wide disasters?
Protecting against zone- or region-wide disasters can be done simply by distributing resources geographically. For example, one might host the primary application server in the US-East region.
Then, to be protected against an outage affecting the US-East region, standby systems are hosted in a “Disaster Recovery Site” far away from US-East – perhaps the US-West region. While this introduces some additional steps to ensure cross-region communication, the effort is invaluable, as it provides protection against zone- and region-wide disasters. A total outage of the cloud provider’s US-East region can be withstood by bringing applications into service in US-West. Protection against region-specific outages doesn’t need to be complicated, and ensuring a Disaster Recovery site exists to assume operations will improve application availability and data redundancy in production environments.

How do you recommend organizations balance the complexity and cost of implementing robust HA/DR strategies with the need for business agility?
There is a common assumption that HA/DR solutions are complex, expensive, or both. Against this assumption, it is essential to keep a strong perspective on the stakes at hand. Systems are operational for some business purpose, and this translates into the production of revenue. When systems are down due to an outage, the cost is much more than just lost revenue. Without an HA/DR strategy in place, an outage requires employees to actively troubleshoot the issue, adding employee-hours to the cost of downtime – perhaps at hours when employees are not well-rested and prepared to do their best work. There is also a lingering collateral cost: the interruption of regular duties and the delay incurred when employees must task-switch into resolving production issues and then back to their regular work. Even further, there are reputational costs that could cause missed revenue opportunities. For instance, what comes to mind when you think of “CrowdStrike”?
Even if this doesn’t immediately bring to mind the issues and bad press CrowdStrike experienced in July of 2024, at the time of writing (March 25th, 2025), their stock price has only just returned to its level before the July 19th, 2024 incident. Taking these factors into account can vastly change the analysis of the opportunity cost of configuring an HA/DR solution. Commonly, SIOS customers find that implementing an HA/DR solution saves them money in the long run. Additionally, backed by decades of improvement and iteration on SIOS Technology’s HA/DR offerings, configuring such a solution is more approachable and less complex than ever. If factors remain that raise concern over the complexity of introducing an HA/DR solution to a production environment, SIOS Technology has professional services offerings that can train teams, perform installation and configuration activities, or simply validate existing configurations. With these opportunities, bringing high availability into a system architecture is not only less complex than it has ever been, it can also be implemented faster than ever before. Finally, for organizations concerned about complexity due to unique configurations, or trying to reach the absolute maximum utility of an HA/DR solution, our world-class support team is available to help bring any implementation to its full potential.

How do SIOS Technology’s solutions play a role in helping organizations implement the disaster recovery approach that you advocate for?
SIOS Technology’s solutions can meet all of the aspects addressed previously. To recount some of them: modern approaches to disaster recovery are adopted by way of our LifeKeeper and DataKeeper products, which together we call SIOS Protection Suite.
Whether on Linux or Windows, these products provide cluster-wide orchestration of resources to ensure a quick and efficient response to disasters while also ensuring data is replicated and available on standby systems. LifeKeeper monitors applications for faults and communicates between nodes to ensure systems are valid targets for application recovery. DataKeeper replicates data in real time to ensure standby systems can inherit applications in the event of an issue and continue operation on the latest available data. Hand in hand, these products work to minimize both the length of time applications are down and the loss of data in the event of a disaster. These products also integrate fully within your environment. There are mechanisms to provide efficient networking control so clients can always resolve the connection to the application servers. The solutions monitor not only applications or specific components of a system, but also the entire system and environment. Through “quorum” functionality, environments are monitored at a “big picture” level to ensure applications are restored on the correct systems and data is protected. There are protections in place for a myriad of disaster scenarios, so SIOS Protection Suite is able to respond appropriately. SIOS Protection Suite can also work across regions, providing the protection we discussed against zone- or region-level disasters. Applications can be migrated across regions, and data can be replicated across regions as easily as within a single region. Additionally, environments can be multi-tiered: multiple nodes can be hosted in the primary region as either active or standby systems, providing fast responsiveness to system-level issues, while a disaster recovery site in a different region can also be maintained to protect against region-level disasters with the same speed and efficacy.
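The “quorum” idea mentioned above – that a set of nodes should only recover applications when it can see a majority of the cluster – can be sketched generically. This is an illustration of the concept only, not how SIOS LifeKeeper implements it internally:

```python
# Generic majority-quorum check, as used conceptually by clustering software
# to decide which side of a network partition may keep running services.

def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """A partition may host services only if it sees a strict majority of nodes."""
    return reachable_nodes > total_nodes // 2

# Three-node cluster split by a network fault:
# one partition still sees 2 nodes, the other is down to 1.
print(has_quorum(2, 3))  # True  -> this partition may recover applications
print(has_quorum(1, 3))  # False -> this partition stands down, avoiding split-brain
```

The strict-majority rule is why clusters are typically built with an odd number of voters: in an even split (say 2 of 4 nodes reachable), neither side has quorum and no partition will start a second copy of the application.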
Finally, the SIOS Protection Suite product benefits from decades of real-world use. It has been put through its paces in a wide range of scenarios and deployment configurations and has benefited from years of ease-of-use improvements. As a result, this is a solution that is flexible, easily adopted, and fits seamlessly into production environments. The complexity of designing and configuring an HA/DR solution is avoided by adopting SIOS Protection Suite, with its rich development history of countless improvements, coupled with a world-class support team available to help with any questions or concerns that may arise. In addition, there are opportunities to undergo collaborative installation or validation procedures for SIOS Protection Suite offerings, ensuring your environment is ready for whatever the world can throw at it. Finally, for teams that want experienced guidance and to maximize their use of SIOS Protection Suite and its components, SIOS offers training engagements where teams work with our staff to understand the components at play and build the deep understanding needed to implement the solution to its highest potential. Protect your business from downtime and data loss: request a demo or start your free trial to see SIOS in action. Author: Philip Merry, CX – Software Engineer at SIOS Technology Corp. Reproduced with permission from SIOS
April 21, 2025
DataKeeper and Baseball: A Strategic Take on Disaster Recovery
Throughout my career, DataKeeper has become the industry standard within “think tank” and “water cooler” chatter when it comes to Data Protection and Disaster Recovery. How about the great American pastime of baseball and its comparison to DataKeeper? Although the two are seemingly unrelated, as a huge fan of the sport I can see some similarities worth drawing upon.

Building a Winning Game Plan for Data Protection
First and foremost, both baseball and DataKeeper require a well-honed “game plan”. In baseball, teams have practiced and devised a plan to outcompete their opponents in hopes of a victory. Similarly, DataKeeper requires a thoughtful strategy to ensure data protection is leveraged and data can be recovered should something catastrophic occur. Secondly, teamwork remains paramount. Infielders, outfielders, managers, and the batboy each have a specific role to ensure the best chance of victory. With DataKeeper, multiple teams may be involved – Database Administrators, Infrastructure staff, Customer Experience/Support, and Management, just to name a few. All should be thoroughly invested in effectively protecting and recovering data.

Where Baseball and DataKeeper Differ: The Stakes Are Higher in IT
There are some differences that can’t be overlooked. Losing a baseball game – even Game 7 of the World Series, last inning, 2 outs, 3 balls and 2 strikes – can be a “bummer”, but the stakes are much, much higher with DataKeeper: losing data can have serious consequences for a business. And while baseball players require a unique set of athletic skills, DataKeeper is a solution that requires knowledge of Enterprise Systems and related processes. In summary, while baseball and DataKeeper may seem totally different, both require a sound game plan, strong teamwork, and dedication to the craft.
Whether you’re a fan of baseball or an IT professional, it is evident that both require a level of skill and dedication to succeed.

What’s Your Data Protection Game Plan?
Check out the game plans/solutions that are offered at us.sios.com/solutions/ PLAY BALL . . . Reproduced with permission from SIOS
April 15, 2025
Budgeting for SQL Server Downtime Risk
In this TechRadar Pro article, “Budgeting for SQL Server Downtime Risk,” SIOS’ Dave Bermingham emphasizes the importance of aligning business continuity plans with realistic budgets to mitigate interruptions in mission-critical SQL Server deployments. He advises organizations to assess the significance of each SQL Server instance; understand the potential impacts of downtime, including lost revenue, reduced productivity, data corruption, and legal penalties; and allocate appropriate resources, whether on-premises, cloud, or hybrid, to ensure preparedness for disasters. Reproduced from SIOS