disaster recovery Archives - SIOS SANless clusters

Disaster Recovery Planning in an Unpredictable World

April 4, 2026 by Jason Aw

Computer systems and computerized infrastructure have become a load-bearing part of the modern business environment. As such, downtime is not just annoying; it is costly. Though the world is unpredictable, having an emergency plan in place through effective disaster recovery planning can keep an unexpected issue from turning into a prolonged outage. This is the role of a High Availability and Disaster Recovery solution.

Understanding High Availability and Disaster Recovery

High Availability and Disaster Recovery are complementary, mutually supportive efforts. Though these concepts work in tandem, it is important to understand the boundaries between them.

What is High Availability?

High Availability refers to the capacity of a system, application, or other infrastructure component to readily continue operation. This encompasses the ability of an infrastructure component to be restarted, migrated, or otherwise recovered with minimal loss or regression in the operational state.

This is to say, the infrastructure is able to continue serving the designated role with access to up-to-date information. Additionally, highly available infrastructure may accommodate the ability for multiple infrastructure components to act in a primary role to provide availability.

What is Disaster Recovery?

Disaster recovery refers to the capacity of a system, application, or infrastructure component to withstand a catastrophic failure. Often, disaster recovery is concerned with the catastrophic and irrecoverable loss of some infrastructure component.

A simple example of a disaster recovery solution can be seen any time a data backup is taken and stored off-site. Doing this to protect the data against building-wide disasters that would make the original storage media unrecoverable meets the criteria of a disaster recovery solution, though via an implementation that leaves room for improvement.

How High Availability and Disaster Recovery Work Together

When High Availability and Disaster Recovery are combined, each can aid the other’s stated goals. A High Availability solution ensures that systems can resume their operative role in a timely manner, and the infrastructure on which they resume that role is frequently part of the disaster recovery solution.

When planned accordingly, the ability to migrate workloads to a healthy infrastructure can enable a disaster recovery solution to operate quickly and effectively, minimizing downtime. These two elements work hand in hand to produce environments that prioritize resilience and uptime equally.

The Real Cost of Downtime

Every computer system, infrastructure component, or other element of a production environment is susceptible to failure. When failure occurs, the direct costs are easy to measure: lost revenue, reduced productivity, and the expense of remediating the issues from which the downtime originated. These costs alone average $300,000 or more per hour of downtime, a figure cited by 91% of medium to large-sized companies estimating the cost of downtime in a 2024 study by Information Technology Intelligence Consulting (ITIC).
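To make that figure concrete, the direct cost of an outage can be roughly estimated by multiplying its duration by an hourly cost. The sketch below is illustrative only: the function name is hypothetical, and the flat $300,000 hourly rate is simply the survey threshold quoted above used as a default. Actual costs vary widely by organization.

```python
def estimate_downtime_cost(outage_minutes: float, hourly_cost: float = 300_000.0) -> float:
    """Return a rough direct cost for an outage of the given length.

    hourly_cost defaults to the $300,000 survey figure; replace it with
    a number that reflects your own environment.
    """
    if outage_minutes < 0:
        raise ValueError("outage_minutes must be non-negative")
    return outage_minutes * hourly_cost / 60.0


# Example: compare a few outage lengths at the default hourly rate.
for minutes in (5, 30, 120):
    print(f"{minutes:>4} min outage ~ ${estimate_downtime_cost(minutes):,.0f}")
```

Even at this coarse level, the estimate shows why shaving minutes off recovery time can matter as much as preventing the failure in the first place.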

Often not considered, though, is the “soft cost” of downtime. Outages can erode customer confidence, blemish the reputation of an organization, and apply additional pressure to the personnel responsible for the environment. Though downtime does pose a very real and very immediate cost to business, the ripples of such an occurrence may send shockwaves through a business for months or years to come.

Make Resilience a Design Requirement

Infrastructure achieves the greatest availability and the strongest capacity for disaster recovery when it is designed from the outset to be highly available and to support a strong disaster recovery plan.

The first stage of honoring HA/DR as a design requirement entails setting realistic expectations. Often, these expectations can be summarized via the “Recovery Point Objective” (RPO) and “Recovery Time Objective” (RTO).

To briefly describe these metrics:

Recovery Point Objective (RPO) describes the maximum amount of data, usually expressed as a span of time, that an organization can stand to lose when restoring from a backup.
Recovery Time Objective (RTO) describes the target amount of time before an unavailable environment is able to return to operation.

Defining these metrics naturally sidesteps a common issue: treating every system as though it needs the same level of protection. As systems are prioritized by their HA/DR needs, systems that are more tolerant of downtime can make use of simpler implementations. Systems that require extremely low RTO and RPO metrics, in turn, can be allocated more effort to ensure that the solutions in place on these systems are equipped to meet the higher operational standards.
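One way to keep these objectives honest is to compare them against what actually happened in an incident or a recovery test. The following is a minimal sketch, assuming RPO and RTO are both expressed as time spans; the class, function, and field names are hypothetical and not part of any SIOS product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Union


@dataclass(frozen=True)
class RecoveryObjectives:
    """Targets for one system (hypothetical structure for illustration)."""
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time until service is restored


def evaluate_recovery(
    objectives: RecoveryObjectives,
    last_recovery_point: datetime,
    failure_time: datetime,
    restored_time: datetime,
) -> Dict[str, Union[timedelta, bool]]:
    """Compare one incident (or recovery test) against the objectives."""
    # Data written after the last recovery point is lost on restore.
    data_loss_window = failure_time - last_recovery_point
    # Downtime runs from the failure until service is back.
    downtime = restored_time - failure_time
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
    }


# Example: 15-minute RPO and 1-hour RTO; backup at 02:00, failure at 02:10,
# service restored at 03:30 (80 minutes of downtime).
objectives = RecoveryObjectives(rpo=timedelta(minutes=15), rto=timedelta(hours=1))
result = evaluate_recovery(
    objectives,
    last_recovery_point=datetime(2026, 1, 1, 2, 0),
    failure_time=datetime(2026, 1, 1, 2, 10),
    restored_time=datetime(2026, 1, 1, 3, 30),
)
print(result["rpo_met"], result["rto_met"])  # True False
```

In this example the backup cadence satisfies the RPO, but the recovery took longer than the RTO allows, which would point toward a faster recovery mechanism rather than more frequent backups.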

Use Automation to Reduce Risk in Disaster Recovery Planning

When addressing the strategies for High Availability and Disaster Recovery, the topic is often business-critical systems. These systems often require speedy issue resolution performed in a reliable manner so that an issue does not spiral out of control. Though the personnel responsible for these systems are experts in the nuances of the environment, the potential of human error during issue resolution is an avoidable risk factor.

A robust High Availability and Disaster Recovery solution can incorporate automated failure detection along with automated recovery actions. Not only is the response faster when the solution automatically detects the issue and executes a recovery plan in kind, but an automated response also acts methodically and consistently, reducing the opportunity for human error during recovery.
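The general pattern of automated detection followed by automated recovery can be sketched in a few lines. This is a simplified illustration, not how any particular product works: the class name, the threshold of three consecutive failed checks, and the recovery callback are all assumptions, and production clustering software has to handle additional concerns (such as split-brain) that this sketch ignores.

```python
from typing import Callable, List


class FailureDetector:
    """Run a recovery action after several consecutive failed health checks."""

    def __init__(self, recover: Callable[[], None], failure_threshold: int = 3) -> None:
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be at least 1")
        self._recover = recover
        self._threshold = failure_threshold
        self._consecutive_failures = 0
        self.recovery_triggered = False

    def record(self, healthy: bool) -> bool:
        """Record one health-check result; return True if recovery ran on this call."""
        if healthy:
            # A healthy check resets the streak and re-arms the detector.
            self._consecutive_failures = 0
            self.recovery_triggered = False
            return False
        self._consecutive_failures += 1
        if self._consecutive_failures >= self._threshold and not self.recovery_triggered:
            # Trigger recovery once per failure streak, not on every failed check.
            self.recovery_triggered = True
            self._recover()
            return True
        return False


# Example: one healthy check followed by four failed checks.
events: List[str] = []
detector = FailureDetector(recover=lambda: events.append("failover"), failure_threshold=3)
for check_result in [True, False, False, False, False]:
    detector.record(check_result)
print(events)  # ['failover']
```

Requiring several consecutive failures before acting is a common way to avoid failing over on a single missed check, at the cost of slightly slower detection.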

Build Redundancy Beyond Technology

Though it is important to design with HA/DR in mind and ensure that solutions can provide automated responses, there is still a human element to designing, creating, and maintaining critical systems. The key to leveraging personnel in these solutions is to allow teams to work in a low-stress environment that allows for careful and methodical problem-solving approaches. When a person is involved in any work, the outcomes should undergo a validation process to ensure that the solution functions as intended.

Even further than the conditions in which work is done, it is also important to ensure that personnel have access to the knowledge that they need to work effectively. If only one person on a team is capable of a particular maintenance task, then there is potential for a gap in operations should they become unavailable.

Planning for operational continuity extends beyond on-system considerations. Ensuring that teams operate to reduce knowledge silos and can put their outcomes to the test before moving into production can protect systems by avoiding issues entirely.

Disaster Recovery Planning Best Practices for Resilient Systems

While there is no one-size-fits-all approach to implementing High Availability and Disaster Recovery solutions, there are guidelines and best practices that can help build out a disaster recovery planning strategy that suits your organization. The aforementioned points serve as a great foundation. Additionally, improvements can be found via some generally applicable goals:

Finding and eliminating single points of failure
Documenting processes with clear roles and responsibilities
Maintaining an identical QA copy of the production environment to validate procedures
Distributing systems across geographically distinct regions
Frequently reviewing and updating documentation
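Two of these goals, eliminating single points of failure and distributing systems across regions, lend themselves to a simple automated check against an inventory. The sketch below assumes a hypothetical inventory in which each instance is tagged with a role and a region; the names and data are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Instance(NamedTuple):
    role: str    # what the instance does, e.g. "database"
    name: str    # hypothetical host name
    region: str  # hypothetical region label


def find_single_points_of_failure(instances: Iterable[Instance]) -> Dict[str, List[str]]:
    """Return, per role, the redundancy gaps found in the inventory."""
    names: Dict[str, List[str]] = defaultdict(list)
    regions: Dict[str, Set[str]] = defaultdict(set)
    for inst in instances:
        names[inst.role].append(inst.name)
        regions[inst.role].add(inst.region)

    findings: Dict[str, List[str]] = {}
    for role in sorted(names):
        issues = []
        if len(names[role]) < 2:
            issues.append("only one instance")
        if len(regions[role]) < 2:
            issues.append("all instances in one region")
        if issues:
            findings[role] = issues
    return findings


# Example inventory (made-up data).
inventory = [
    Instance("web", "web-1", "east"),
    Instance("web", "web-2", "west"),
    Instance("database", "db-1", "east"),
    Instance("database", "db-2", "east"),
    Instance("queue", "mq-1", "east"),
]
for role, issues in find_single_points_of_failure(inventory).items():
    print(role, issues)
```

Here the web tier passes both checks, the database tier is redundant but confined to one region, and the queue is a single point of failure on both counts.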

Preparing for the Next Disruption with Disaster Recovery Planning

Disruptions are inevitable, and no organization wants to experience an outage from a failure that could have been predicted and avoided. Intentional planning and a layered solution that provides High Availability and Disaster Recovery help ensure that, whether a disruption is predictable or not, an environment is prepared to weather it and continue operating at full capacity, so business can carry on without a hiccup.

Request a demo to see how SIOS high availability and disaster recovery solutions help protect critical systems and keep your business running.

Author: Philip Merry, SIOS Technology Corp.

Reproduced with permission from SIOS

How To Improve Customer Satisfaction in Technical Support

March 18, 2026 by Jason Aw

We have customers all over the world. We speak different languages; we are in different time zones; we are in different countries. But there are many things that we have in common when it comes to technical support. We all want and expect the best support when we have problems and need help. What does wanting and expecting the best support actually mean for an IT team?

6 Customer Expectations for a Technical Support Team

Here’s what our customers tell us they expect from a Technical Support Team.

Listen to the Customer

Customers (just like everyone else) like to be listened to. When talking to a customer, it is important to let the customer describe the problem. As a Support engineer, take notes, listen to what the customer is describing, and ask follow-up questions to gather important information. Do not interrupt the customer while they are talking. To confirm you understand what the customer stated, summarize what the customer told you. Summarize actions and make sure everyone is on the same page. Don’t assume you know the problem before the customer has described it.

Talk to a Real Person

Customers still prefer to talk to a “real” person and not an automated voice, AI, or chatbot. Customers like talking right away to a support agent who knows the product and is not just following a script. Nothing is more frustrating than calling for help with a problem and having to go through multiple automations to try to reach a “real” person. Many times you end up going in circles and arrive right back where you started! Valuable time can be quickly wasted trying to get a “real” person on the phone to help you. Customers calling in for help strongly prefer setting up a video conference to share the problem live with a support team. A picture is worth a thousand words! In our experience, trying to help customers without a visual and without asking “live” questions adds to the time it takes to solve a problem.

Availability 24 x 7

Customers are all around the world and want to contact support at any time of the day or night. We offer around-the-clock support every day of the week. To accommodate this, we have multiple teams around the world that cover 24 hours a day, every day of the week. When customers need us, we are there for them. We have procedures in place to escalate cases when our team members need immediate assistance on critical downtime issues affecting the customer’s business. Our customers rely on our High Availability and Disaster Recovery software to keep their systems available, and our Technical Support team reinforces that goal by being ready to provide assistance whenever we are needed.

Experienced Support Engineers

Customers don’t have time to get on the phone with a person who can’t help them and needs to pass the call to someone else. Customers want to talk to support engineers who can assist with their questions and problems. At SIOS, we make a point to ensure that customers are quickly put in contact with an experienced member of our technical support team so the issue can be addressed as soon as possible. Based upon our Customer Surveys, customers love our technical support team! Our support team has an average of 16 years of total support experience; this expertise allows issues to be addressed quickly and often without having to escalate cases to another group. Customers appreciate it when they are met with experienced personnel who can join a video conference and provide real-time assistance based on years of experience.

Be Transparent

Customers appreciate transparency. They want to know reality. Don’t make promises that you cannot keep. Always ensure that the customer understands what you are going to do to help them solve the problem and when you will be getting back in touch with them. Explain to the customer the steps that need to be taken as you go, and ensure that the customer approves each step before you execute it. Many customers need to get pre-approval prior to implementing changes to their systems. In pursuit of transparency, it is important to give the customer frequent updates that give insight into the support process. Even if your update is, “We are still analyzing the logs”, tell the customer this to keep them updated. Don’t tell them what you think they want to hear; tell them the truth.

Customer Surveys

For every case customers open with technical support, a survey is sent to the customer when the case is closed. This gives the customer an opportunity to provide feedback so our teams can continuously improve our products, documentation, and support. Our support team looks at completed customer surveys at least once a week and responds to customers who have concerns, ideas, and improvement suggestions, letting them know what actions we took on their feedback. Customers often thank us for resolving their issues quickly and for demonstrating our commitment to their success by following through on the notes they leave us after the case is closed.

What Customers Expect from a 24/7 HA/DR Technical Support Team

Customers reaching out for technical support on HA/DR products want to know they are being heard by a real person, not a bot. They expect to talk to experienced agents who actually know how to fix their problems and who stay transparent about what’s happening every step of the way. By offering this human touch with 24/7 availability, we show our customers that we are always there when they need us. Today’s technical support isn’t just about solving a ticket; it’s about building trust, listening, and being reliable and honest whenever customers need assistance.

Looking for a technical support team that understands HA/DR? Schedule time with a SIOS HA expert to see how we deliver high availability, automated recovery, and reliable cluster deployments.

Author: Sandi Hamilton, Director of Product Support Engineering at SIOS

Reproduced with permission from SIOS

Designing High Availability Through Modularity and Abstraction

March 6, 2026 by Jason Aw

Thus far, this series has explored parallels between technical design and rhetoric. The “rhetoric” of a technical solution, the strategy of communicating meaning and purpose, is presented via the design patterns and concepts. The design patterns and concepts exist as a conceptual foundation, upon which the meaning is translated into an applied form when put into practice during implementation.

As previously discussed, the continuity and integrity of this conceptual foundation are paramount to ensuring that solutions are kept up to a standard that is conducive to maintenance, improvement, and long-term reliability. External factors influencing a solution’s design challenge the goal of upholding the conceptual foundations put forth in a solution’s design. These external factors can conflict with the standing principles, and thus, tools, applications, and platforms used in a solution must be chosen mindfully.

In the third and final part of this blog series, modularity and abstraction will be explored as a means to put boundaries in place and ensure that projects with a wide scope can continue to reap the benefits of a well-formed, rhetorically sound design.

High Availability Design Principles: Why Modularity and Abstraction Matter

Before addressing modularization and abstraction as strategies, it is important to understand why these should be implemented. Starting broadly with an analogy, a speaker trying to convince their audience to agree with their plan might first need to outline multiple foundational points. In doing so, each pillar of their argument’s foundation gets put forth and justified.

The speaker first must set up the “A implies B” and “C implies D” basis, upon which they can form the argument “B and D imply E”. This strategy ensures that the reasoning in which “A implies B” does not cross-contaminate and detract from the separate point “C implies D”. This strategy is frequently used because it allows each component of the speaker’s argument to stand independently of others. If the argument “C implies D” is flawed, it can be reconciled while the argument “A implies B” remains sound.

The reason for this structure is the same reason why technical systems are decentralized: a problem in a point of sale system can be remediated without the need to expand the remediation efforts to the databases, APIs, network architecture, and so on. The strategies described above are, of course, the concepts of modularity and abstraction.

Modularity in High Availability Architectures

First, modularity is the practice of creating systems from components that are self-contained. In the rhetorical sense, the arguments “A implies B” and “C implies D” are simply modules of reasoning that get assembled into the argument as a whole.

More technically, modularized components (such as the point of sale system in the previous example) allow issues to be addressed entirely within the module where the issue originates. Each module in the solution acts as a building block, and a problem in a single building block can be resolved without dismantling the entire solution.

Abstraction as a Strategy for Scalable Infrastructure Design

Closely related to modularity is “abstraction”. Abstraction is the practice of ensuring the design of the overall solution is independent of, and agnostic to, the design of the modules that compose it.

Further, abstraction as a design strategy also holds that each module is independent of, and agnostic to, the design of every other module. When a solution is designed to use abstracted elements, these elements can be reused across use cases, so that understanding one element carries over wherever it appears in the project.
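In code, these two ideas often appear together as self-contained modules hidden behind a shared interface. The sketch below is a minimal illustration that borrows the point of sale and database examples from earlier in this post; the interface, class names, and method names are hypothetical.

```python
from typing import Iterable, List, Protocol


class Protected(Protocol):
    """The only thing the overall solution knows about a module."""
    name: str

    def is_healthy(self) -> bool: ...
    def recover(self) -> None: ...


class PointOfSale:
    name = "point-of-sale"

    def __init__(self) -> None:
        self._running = False  # starts in a failed state for the example

    def is_healthy(self) -> bool:
        return self._running

    def recover(self) -> None:
        self._running = True  # internal detail, hidden from the caller


class Database:
    name = "database"

    def __init__(self) -> None:
        self.recover_calls = 0

    def is_healthy(self) -> bool:
        return True

    def recover(self) -> None:
        self.recover_calls += 1


def remediate(modules: Iterable[Protected]) -> List[str]:
    """Recover only the unhealthy modules; healthy ones are left untouched."""
    recovered = []
    for module in modules:
        if not module.is_healthy():
            module.recover()
            recovered.append(module.name)
    return recovered


pos, db = PointOfSale(), Database()
print(remediate([pos, db]))  # ['point-of-sale']
```

The `remediate` function never inspects how a module recovers (abstraction), and fixing the point of sale module does not touch the database module (modularity).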

Designing High Availability That “Stays Out of the Way”

When designs are built of modular components, boundaries are drawn. These boundaries ensure that each module can “stay out of the way” of the other modules. When the components are abstracted, the contents of each module can be understood more easily.

In turn, the boundaries serve as a structure by which the design can be understood, and the abstraction within these boundaries serves as an entry point to understand the foundations of the use case. The structure provided via modularity and abstraction mirrors the role of rhetoric in providing a framework by which purpose is understood.

Managing Complex Network Architectures with Modular HA Solutions

As technical solutions are being developed to address more complex problems, the need for a solid framework in that solution’s design grows as well. Network architecture, often the culmination of many solutions that are complex in their own right, serves as a fantastic example of the increasingly complex problem and growing requirement for solid frameworks in design. Furthermore, network architecture often suffers from continual growth as it has to absorb the sprawling web of systems that contribute to the purpose of a growing business.

Layered on top of this, the solution architecture must then employ solutions for High Availability and/or Disaster Recovery. This creates a hot spot where design conflicts can arise, but such conflicts can be mitigated with the strategies of modularization and abstraction.

Applying Modularity and Abstraction with SIOS High Availability Software

The benefits of High Availability software can be achieved without the baggage of complexity and hacked solutions. SIOS LifeKeeper, as an example of a design-compliant High Availability tool, is created in a way that the principles of its operation can mesh seamlessly with the environment in which it is used.

LifeKeeper is modular, as it does not impose requirements outside of the LifeKeeper-protected systems. LifeKeeper also facilitates the abstraction of infrastructure components to bite-sized elements – systems that work together to ensure availability are grouped into a “cluster”.

Through this abstraction, the rhetoric of the environment remains strong – understanding the makeup of one cluster lays the foundation to understand all clusters. Layers of the design can be understood for their purpose; there is no need for asterisks and special considerations on how implementations differ across the design. As the clusters act independently of other clusters or external solution components, a boundary can be drawn where the design elements of each respective layer are contained, avoiding conflict with other layers of the infrastructure.

Building Long-Term Resilient Infrastructure with SIOS Protection Suite

As with any software or tool, SIOS Protection Suite (SIOS LifeKeeper and/or SIOS DataKeeper) influences the design of the environments in which it is used. Though these patterns are brought in by virtue of having a LifeKeeper and DataKeeper protected environment, the patterns used by SIOS LifeKeeper and SIOS DataKeeper were carefully selected to ensure that they enable abstraction and modularity within the solution as a whole. As a result of the layered abstraction enabled by both LifeKeeper and DataKeeper, these utilities can be integrated with the IT infrastructure in a way that maintains cohesion in the solution’s design.

As a result of the design patterns employed, clusters protected by SIOS Protection Suite (LifeKeeper and/or DataKeeper) compose an abstract and modular element that fits seamlessly into existing designs and solutions. LifeKeeper and DataKeeper do more than simplify the administration of single systems or each respective cluster; LifeKeeper and DataKeeper work with the principles at play in a deployment.

Creating infrastructure becomes simplified and more efficient as the use of SIOS Protection Suite allows for a simple method of understanding the system’s role in the design, while at the same time providing a simple method for implementing High Availability and Disaster Recovery. Administrators may use LifeKeeper and DataKeeper as a tool to improve their ability to understand, operate, and improve upon the solution for years to come.

See how high availability can support your infrastructure’s design—without adding complexity. Request a demo today!

Author: Philip Merry, CX Software Engineer at SIOS

Reproduced with permission from SIOS

Common Customer Misconceptions

February 11, 2026 by Jason Aw

High availability (HA) and disaster recovery (DR) are often misunderstood.

Listen to the full conversation in this podcast as Greg Tucker unravels common customer misconceptions about HA/DR—from overestimating what cloud providers guarantee to misjudging the complexity of solutions like SQL Server Always On Availability Groups. Greg shares real examples of avoidable mistakes, explains how SIOS helps customers rethink their HA/DR assumptions, and offers practical advice for IT leaders looking to make informed, resilient infrastructure decisions.

Reproduced with permission from SIOS

Ensuring IT Resilience and Service Continuity in State and Local Government

January 20, 2026 by Jason Aw

White Paper: Ensuring IT Resilience and Service Continuity in State and Local Government

Local government operations rely on always-available IT systems — from court and tax databases to 911 dispatch and school platforms. When these systems go down, the impact is immediate and far-reaching, affecting public safety, legal processes, payroll, and community trust. Many agencies struggle with aging infrastructure, tight budgets, and small IT teams managing complex hybrid environments.

SIOS helps solve these challenges with high availability (HA) and disaster recovery (DR) solutions that prevent downtime, protect critical data, and keep essential public services running, without requiring major infrastructure changes or deep HA expertise.

Reproduced with permission from SIOS