SIOS SANless clusters - SIOS SANless clusters High-availability Machine Learning monitoring

October 14, 2025

Commonalities between Disaster Recovery (DR) and your spare tire

In our recent blogs, we’ve drawn some interesting parallels between cars and DataKeeper. These posts have explored topics such as:

Transitioning from LifeKeeper to Windows Server Failover Clustering (or vice versa)
Maximizing the efficiency of your ‘GET’ commands in DataKeeper
Comparing your car dashboard to the DataKeeper User Interface (UI)

Let’s keep that theme rolling (pun intended).

Understanding the Role of a Spare Tire (and a DR Node)

Let’s give a brief intro on the function of a spare tire and the function of a DR node in a DataKeeper clustered environment running Windows Server Failover Clustering™.

A spare… will temporarily replace a damaged tire, allowing you to reach a repair shop, home, or other destination, saving you time and avoiding being towed ($$$) or stranded. Though convenient, temporary spares have limits on longevity and speed.

Understanding the Role of a Disaster Recovery Node

A Disaster Recovery node . . . is typically a standby node (spare) that contains applications and data, often located in a different region from its primary location to protect against outages/disasters, man-made or natural.

There are endless pros and cons for both. I’ve named just a few for the sake of brevity.

Drawing Parallels Between Your Spare Tire and a DR Node

Pros (with a spare)	Cons (without a spare)
Reduce being stranded	Delays, stranded overnight
Avoid Roadside Assistance	Roadside service may take hours
Mobile again to go get it fixed permanently	Must wait for a tow or other means to get the repair done, which can be costly

Pros (with DataKeeper)	Cons (without DataKeeper)
Streamline failover without manual intervention	Need to rebuild systems, restore data manually
Reduce risk of data loss	SLAs not met, loss of sales, penalties
Maintaining customer trust	Not meeting customer expectations reduces confidence

In this blog, we can draw a clever analogy between Disaster Recovery (DR) in DataKeeper clustered environments and the humble “doughnut” tire in your car.

Both serve as critical safety nets in moments of crisis, ensuring you can recover quickly and avoid prolonged downtime.

Why a Reliable DR Solution Matters More Than Ever

Just as a spare tire ensures you can keep driving after a flat, a DR node provides critical backup infrastructure to keep your business running smoothly in the face of outages, cyberattacks, or natural disasters.

In today’s fast-paced digital world, downtime can result in lost revenue, damaged reputation, and even legal liabilities—making the need for a reliable DR solution more crucial than ever.

A DR node acts as a safety net, allowing businesses to recover quickly and minimize disruptions to operations. For customers, investing in a DR node is not just about mitigating risk; it’s about ensuring peace of mind, protecting valuable data, and maintaining trust with clients and stakeholders.

Keep Your Business Rolling with DataKeeper

In short, a Disaster Recovery node is the cornerstone of resilience, empowering businesses to stay agile and focused no matter what challenges arise. Whether it’s a spare tire or a Disaster Recovery node, preparedness is the key to staying on track when life throws unexpected challenges your way. Just like you wouldn’t drive without a spare, don’t run your business without a DR plan. Request a demo to see how DataKeeper keeps your operations moving.

Author: Greg Tucker, Senior Product Support Engineer at SIOS Technology

Reproduced with permission from SIOS

October 3, 2025

Unlocking Near-Zero Downtime Patch Management with High Availability Clustering

Patch management is one of the toughest balancing acts in IT. Every month or quarter, OS and application vendors release updates with critical security fixes. These patches need to be tested and applied quickly — but rushing the process risks instability, and delaying it increases vulnerability. For organizations running mission-critical applications, the stakes are even higher.

That’s why IT leaders are increasingly turning to high availability (HA) clustering to streamline patch testing and deployment, while keeping downtime to a minimum.

Why Patch Management Is So Challenging

Testing takes time and resources. QA environments aren’t always available, and teams may feel pressure to shortcut testing just to keep up.
Cyberattacks move fast. Exploits can be weaponized within hours of a patch release. According to the Ponemon Institute, 57% of breaches are attributed to unpatched vulnerabilities.
Downtime is costly. Whether planned or unplanned, downtime averages $5,600 per minute (Gartner). In industries such as healthcare, aviation, and manufacturing, even a brief outage can have significant financial and safety implications.

The challenge is clear: organizations must patch faster, test thoroughly, and minimize disruptions.

How HA Clustering Transforms Patch Management

High availability clustering pairs a primary server node with a secondary node. Advanced clustering software continuously monitors the environment — applications, OS, storage, and networks. If a failure occurs, operations move to the secondary node with minimal downtime.

This same architecture enables a “rolling upgrade” approach for patching:

Patch the secondary node while the primary node continues to run.
Test the update on the secondary node before making the switch.
Fail back if needed — if issues are found, operations instantly continue on the primary node.
Cut over if successful — if tests pass, operations shift to the secondary node, and the primary can be patched next.

The result: organizations can apply updates faster, avoid risky shortcuts, and keep systems available 24/7.
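
The rolling-upgrade steps above can be sketched as a small decision flow. This is an illustration only: `Node`, `rolling_patch`, and the `run_tests` callback are hypothetical names invented for the example, not part of any SIOS or Windows clustering API, and a real cluster would perform the switchover through its own tooling.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node; not a SIOS API object."""
    name: str
    patches: List[str] = field(default_factory=list)


def rolling_patch(primary: Node, secondary: Node, patch: str,
                  run_tests: Callable[[Node], bool]) -> Node:
    """Patch the standby first and return the node that should be active.

    If the tests on the patched standby fail, the original primary stays
    active and is left unpatched so the problem can be investigated.
    """
    # Step 1: patch the secondary while the primary keeps serving.
    secondary.patches.append(patch)

    # Step 2: test the update on the secondary before switching.
    if not run_tests(secondary):
        # Step 3: issues found, so operations stay on the primary.
        return primary

    # Step 4: tests passed; cut over, then patch the former primary.
    primary.patches.append(patch)
    return secondary
```

The point of the ordering is that the node serving production is never the one being changed: the patch only reaches the original primary after the standby has proven it.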

Strengthening Security, Compliance, and IT Resilience with HA Clustering

Modern regulations, such as HIPAA, PCI DSS 4.0, and NIST 800-53, require timely patching. At the same time, high-profile incidents (such as the CrowdStrike update failure) have shown the danger of rushed, untested updates.

By integrating HA clustering into patch management strategies, IT teams can:

Meet compliance requirements without sacrificing uptime.
Reduce risk from patch-related failures.
Strengthen overall IT resilience against cyberthreats.

Near-Zero Downtime Patch Management for Mission-Critical Applications

The old trade-off between speed and stability in patching no longer exists. With high availability clustering, IT teams can patch quickly, test safely, and keep mission-critical applications online, all while reducing downtime to near zero.

If your organization struggles with patch management, HA clustering may be the key to safer updates and stronger resilience.

Ready to eliminate downtime from your patching process? Request a demo of SIOS High Availability Clustering and see how your team can patch faster, stay compliant, and keep critical applications running 24/7.

Author: Ben Roy, Marketing Specialist at SIOS

Reproduced with permission from SIOS

September 26, 2025

How to Safely Combine DataKeeper for Linux with Backup and Replication Tools

Even when other backup or replication software is in use, the purpose of DataKeeper for Linux is to replicate data between servers in a cluster, ensuring all relevant servers have the most up-to-date copy of the data. This is crucial when a server experiences unplanned downtime: with DataKeeper keeping the data in sync, LifeKeeper can keep critical applications highly available and maintain uptime.

When combining DataKeeper with other backup or replication software, it’s essential to confirm compatibility to avoid conflicts. Replication software can interfere with DataKeeper’s resynchronization, sometimes due to the order in which replication processes begin. While aiming for maximum uptime and availability is beneficial, it’s critical to verify that such measures will maintain your cluster in an optimal state.

How to Test DataKeeper for Linux with Backup and Replication Software

It’s important to test the compatibility of the replication software being used alongside DataKeeper to ensure its functionality. Below is a list of items you can check to verify functionality.

1. Test on a QA cluster.

Before running additional backup or replication software on your production cluster, create a QA cluster environment with DataKeeper to run tests on.

A QA cluster lets you test anything new before introducing it into your production cluster, so issues are caught and fixed proactively in QA rather than surfacing in production.

2. Complete basic functionality test.

A couple of basic tests should be completed with DataKeeper as the only replication software installed. This is a sanity check before continuing with any other software.

Base tests should include testing for a successful switchover and failover. Visit the link below for steps to confirm switchover can be successfully performed.

https://docs.us.sios.com/spslinux/9.9.1/en/topic/testing-your-datakeeper-resource-hierarchy

3. Complete basic functionality tests with other software.

Run the same tests mentioned above while the software is backing up/replicating your data, and after the software has completed backing up/replicating your data.

To be able to use the software with DataKeeper, it’s important that all these functionality tests pass.
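
One way to keep track of these checks is to write the test matrix down explicitly. The sketch below is only a bookkeeping aid under assumed names (`TESTS`, `PHASES`, `build_test_matrix`, and `compatible` are made up for this example); it does not run any switchover or failover itself.

```python
from itertools import product

# The two base tests named above.
TESTS = ("switchover", "failover")

# The three conditions under which the base tests are repeated.
PHASES = (
    "DataKeeper only",
    "during backup/replication",
    "after backup/replication completes",
)


def build_test_matrix():
    """Return every (phase, test) pair that needs a recorded result."""
    return [(phase, test) for phase, test in product(PHASES, TESTS)]


def compatible(results):
    """True only if every pair in the matrix has a passing result.

    `results` maps (phase, test) to True or False; a missing entry
    counts as a failure so that skipped tests are not overlooked.
    """
    return all(results.get(case, False) for case in build_test_matrix())
```

Treating a missing result as a failure is a deliberate choice here: the software is only considered usable alongside DataKeeper once all six combinations have actually been run and passed.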

Using GenApp Resources to Manage Backup and Replication Processes with DataKeeper for Linux

If testing yields unsuccessful results, it is possible to create a Generic Application (GenApp) to start and stop the relevant processes during a switchover.

A GenApp can be used in the hierarchy to restore and remove the process used by the replication software to handle the order in which the software runs.
- A hierarchy determines the relationship between resources. Top-level resources depend on bottom-level resources to create a dependency relationship. When a hierarchy is taken out of service, LifeKeeper takes a top-down approach, removing the top-level resources before the bottom-level resources. When a restore is issued, LifeKeeper takes a bottom-up approach to restore the bottom-level resources before restoring the top-level resources.

With this understanding, two GenApps would be created, one as a top-level resource and the other as a bottom-level resource. This configuration ensures that when the hierarchy comes into service, the bottom-level GenApp will stop the process, and the top-level GenApp will start it. When the hierarchy is being removed, the only action would be for the bottom-level resource to stop the process.
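
The ordering described above can be modeled in a few lines. This is a toy model, not LifeKeeper code: the resource names are placeholders, and the assumption that the protected resources sit between the two GenApps is made for illustration only.

```python
# Hierarchy listed from top-level to bottom-level resource.
# "protected-resources" is a placeholder for whatever sits between the
# two GenApps; its own actions are left out of this model.
HIERARCHY = ["genapp-top", "protected-resources", "genapp-bottom"]

# What each resource does to the third-party replication process.
# None means the resource takes no action on that process.
ACTIONS = {
    "genapp-top": {"restore": "start process", "remove": None},
    "protected-resources": {"restore": None, "remove": None},
    "genapp-bottom": {"restore": "stop process", "remove": "stop process"},
}


def plan(operation):
    """Return the ordered (resource, action) steps for an operation."""
    if operation == "restore":
        order = list(reversed(HIERARCHY))  # bottom-up
    elif operation == "remove":
        order = list(HIERARCHY)            # top-down
    else:
        raise ValueError(f"unknown operation: {operation!r}")
    return [(name, ACTIONS[name][operation])
            for name in order
            if ACTIONS[name][operation] is not None]
```

Calling `plan("restore")` yields the stop from the bottom-level GenApp followed by the start from the top-level GenApp, while `plan("remove")` yields only the bottom-level stop, which matches the behavior described in the paragraph above.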

Read more about creating a GenApp in the link below.

https://docs.us.sios.com/spslinux/9.9.1/en/topic/creating-a-generic-application-resource-hierarchy

Ensuring DataKeeper Cluster Compatibility and Preventing Downtime

Ultimately, thorough testing and verification are essential before integrating additional backup or replication software with your Linux DataKeeper cluster. Completing these steps ensures your configuration is properly set up and helps prevent downtime when it is introduced into your production environment.

Ready to see how SIOS can help you simplify high availability and ensure seamless backup and replication with DataKeeper for Linux? Request a demo today.

Author: Alexus Gore, Customer Experience Software Engineer

Reproduced with permission from SIOS

September 20, 2025

Think Before You Script: Best Practices for Gen/App Recovery

SIOS Recovery Kits provide a wealth of best practices for application-aware monitoring and recovery. In general, each SIOS recovery kit provides a step-by-step programmatic approach to restoring the application, database, or service in accordance with High Availability (HA) best practices. The SIOS Recovery Kits provide the intelligence needed to restore operation after a normal system shutdown, after an unexpected system failure or crash, and even in the case where the application, database, or service itself crashes or becomes unavailable. In addition, each recovery kit includes experiential wisdom and improvements from over two decades in the field.

However, if a customer still needs to roll their own script for providing HA, SIOS LifeKeeper for Windows and SIOS LifeKeeper for Linux include an option for script integration via the Generic Application (Gen/App) Recovery Kit.

Best Practices for Writing Gen/App Recovery Scripts

1. Use Modern, Supported Scripting Languages for Gen/App Recovery

A common practice is to reuse old scripts from existing solutions on new systems and architectures. However, it is essential to make sure you are using a modern, supported scripting language.

2. Avoid Hardcoded Values in Gen/App Scripts

Using hardcoded values can cause portability issues, as well as challenges with long-term maintenance. Avoid using hard-coded values that are subject to change in future deployments, for example, directory paths, user names, or similar.
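
As a minimal sketch of this practice, deployment-specific values can be read from the environment with a fallback. The variable names `APP_HOME` and `APP_USER` and the default values are hypothetical, chosen only for this example.

```python
import os
from pathlib import Path


def load_settings(env=None):
    """Read deployment-specific values from the environment.

    APP_HOME and APP_USER are hypothetical variable names chosen for this
    example; the defaults are placeholders, not recommended paths.
    """
    env = os.environ if env is None else env
    return {
        "app_home": Path(env.get("APP_HOME", "/opt/myapp")),
        "app_user": env.get("APP_USER", "myapp"),
    }
```

Passing `env` as a parameter keeps the function easy to test and means a move to a new directory layout only requires changing the environment, not the script.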

3. Practice Code Reuse to Improve Gen/App Script Quality

Duplicate code is a common problem in customer-developed scripts. Duplicate code creates quality, maintenance, and troubleshooting problems. Practice code reuse, such as inheritance, functions, and subroutines.

4. Choose Meaningful Names for Functions and Variables

Descriptive variables are more helpful than single-character variables such as ‘n’ or ‘i’. When looking at code months or years later, will the variable ‘n’ mean as much as iReturnCode?

5. Remove Unused Functions and Variables to Prevent Code Bloat

While meaningful names for functions and variables are important, avoid cluttering the code with unused variables and functions. Declaring variables and not using them can create confusion during future updates and troubleshooting. While the days of 8 MB of memory are long gone, additional variables or functions that provide limited reuse or no additional value are still burdensome and create code bloat.

6. Verify All Input Parameters for Reliable Gen/App Execution

In the rush to get something working, don’t ignore input variable validation. Verify all input to the script and to functions. Don’t assume that if “we got here,” all of our inputs are valid.
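
A short sketch of what that validation might look like follows. The two-argument layout (a resource tag and a TCP port) is an assumption made for the example, not a documented Gen/App calling convention.

```python
def parse_args(argv):
    """Validate script arguments and return (tag, port).

    The two-argument layout (resource tag, TCP port) is an assumption
    made for this example, not a documented Gen/App calling convention.
    """
    if len(argv) != 2:
        raise ValueError(f"expected 2 arguments, got {len(argv)}")
    tag, port_text = argv
    if not tag.strip():
        raise ValueError("resource tag must not be empty")
    try:
        port = int(port_text)
    except ValueError:
        raise ValueError(
            f"port must be an integer, got {port_text!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return tag, port
```

Each check fails with a message that names the offending value, so a bad invocation is reported at the top of the script instead of surfacing later as an unrelated error.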

7. Log Helpful and Actionable Messages

Consider what output needs to be logged for status/progress, error conditions, or troubleshooting. Each message should be thoughtfully considered and appropriately worded to provide helpful feedback to operators and future developers.
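
For illustration, here is one way to produce messages that say what failed, with which return code, and where to look next. The helper names `make_logger` and `log_start_failure` and the message wording are assumptions for this sketch; where the HA solution expects log output to go is not covered here.

```python
import logging


def make_logger(name, stream=None):
    """Build a logger with timestamp, level, and script name in each line."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(stream)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger


def log_start_failure(logger, service, return_code, log_path):
    """Log what failed, the return code, and where to look next."""
    logger.error("start of %s failed (rc=%d); check %s for details",
                 service, return_code, log_path)
```

Compare a message such as "start of myapp failed (rc=3); check /var/log/myapp.log for details" with a bare "error": the first tells an operator what to do next.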

8. Check Return Codes on All Method/Function/API Calls and Take Defensive Action

Commands that are executed within the body of the script or function will have return codes: an explicit pass, fail, or other value. Be sure to check, log, and properly handle both expected and unexpected return codes from methods, functions, and API calls.
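
A hedged sketch of return-code checking around an external command is shown below. `run_checked` is a hypothetical helper, and the codes 127 and 124 for "not found" and "timed out" are borrowed from common shell conventions rather than from any SIOS requirement.

```python
import subprocess


def run_checked(command, timeout_seconds=30):
    """Run a command and return (ok, return_code, combined_output).

    A timeout or a missing executable is reported as a failure instead of
    raising, so the caller always has a result to log and act on.
    """
    try:
        completed = subprocess.run(command, capture_output=True, text=True,
                                   timeout=timeout_seconds)
    except FileNotFoundError as error:
        return False, 127, str(error)
    except subprocess.TimeoutExpired:
        return False, 124, f"timed out after {timeout_seconds}s"
    # stdout and stderr are concatenated so nothing is lost when logging.
    output = (completed.stdout + completed.stderr).strip()
    return completed.returncode == 0, completed.returncode, output
```

The caller can then log the return code and output and decide whether to retry, fail the operation, or continue, rather than assuming the command succeeded.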

9. Use Defensive Programming Techniques

Apply best practices for defensive programming, including least privilege access, input validation, error handling, etc.

10. Test Gen/App Recovery Scripts Beyond the Happy Path

Working code is not enough. Develop a robust validation plan and test the code extensively, especially beyond the happy path when everything is expected to work.

11. Use Version Control for Script Management and Troubleshooting

Use version control and code management tools. Version control is essential for troubleshooting, management, and tracking the inevitable fixes required for your scripts.

12. Catch Errors Early with Code Inspections and Peer Reviews

Use code inspections and peer reviews to increase the resilience and robustness of the code. Code reviews help find problems early and reduce the cost, risk, and burden of late-stage failures and bugs.

13. Verify Permissions Required for Execution in Gen/App Recovery

Having well-organized, modern, reviewed, inspected, tested, and controlled code is an essential part of a well-crafted gen/app script. However, the best-coded script will fail to execute if it does not have the right permissions. Ensure that the script has the correct permissions to execute standalone as well as under the service/user accounts of the HA solution.
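
A pre-flight check along these lines can catch permission problems before a failover does. The function name `check_executable` is made up for this sketch. Note that `os.access` only answers for the user running the check, so it would need to be run as the service account of the HA solution to verify that case.

```python
import os
import stat


def check_executable(script_path):
    """Return a list of problems that would stop the script from running."""
    if not os.path.isfile(script_path):
        return [f"{script_path}: not a regular file"]
    problems = []
    mode = os.stat(script_path).st_mode
    if not mode & stat.S_IXUSR:
        problems.append(f"{script_path}: owner execute bit is not set")
    if not os.access(script_path, os.X_OK):
        problems.append(f"{script_path}: current user cannot execute it")
    return problems
```

An empty list means no problem was found for the current user; anything else is worth fixing before the script is attached to a resource.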

14. Comment Code Clearly to Explain Logic and Business Use Cases

Provide comments that help explain the business logic and use case, describe expected function inputs and returns, and contribute to overall understanding. Well-written code still needs comments, especially if it is not obvious what business logic or requirement is being addressed. An example comment block could look like:
Name:

Purpose:

Preconditions:

Postconditions:

Returns:
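
Filled in for a small helper, that template might look like the following. The function `read_pid_file` and its behavior are hypothetical and exist only to show the comment block in use.

```python
from pathlib import Path


def read_pid_file(pid_file):
    """
    Name:           read_pid_file
    Purpose:        Find the process ID the application recorded at
                    startup, so a health check knows which process to
                    inspect.
    Preconditions:  pid_file is a path; the file may be missing or empty.
    Postconditions: No files are created, modified, or removed.
    Returns:        The PID as a positive int, or None if the file is
                    missing, unreadable, or does not hold a positive
                    integer.
    """
    try:
        text = Path(pid_file).read_text().strip()
    except (OSError, UnicodeDecodeError):
        return None
    if not (text.isascii() and text.isdigit()):
        return None
    pid = int(text)
    return pid if pid > 0 else None
```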

Ready to Simplify Gen/App Recovery with Confidence?

Don’t leave high availability to chance. With SIOS LifeKeeper and the Generic Application (Gen/App) Recovery Kit, you can safeguard critical applications, streamline recovery, and reduce downtime.

Request a demo today to see how SIOS can help you achieve reliable, cost-effective high availability and disaster recovery.

Author: Cassius Rhue, VP, Customer Experience at SIOS

Reproduced with permission from SIOS

September 13, 2025

The Importance of Disaster Recovery Planning for Modern Businesses

In today’s internet-driven world, a moment of downtime can cost businesses thousands and even millions of dollars. Users expect seamless, uninterrupted access to services and applications 24/7. When your system goes down, they notice instantly. What’s worse than a few minutes of inconvenience? A catastrophic failure where you lose access to your entire production environment. While modern cloud platforms have built-in resiliency, assuming you’re immune to disaster can be a costly mistake. Without a clear disaster recovery (DR) plan, what could be a seamless recovery can quickly spiral into chaos. In this post, we’ll explore why disaster recovery planning is crucial and how it can protect your business from both financial and reputational harm.

Disaster Recovery Preparedness vs. the High Cost of Downtime

Yes, setting up a disaster recovery plan takes time, resources, and strategic planning. But the investment pales in comparison to the costs of an unplanned outage. Lost revenue, damaged customer trust, compliance penalties, and operational disruption can devastate a business, especially if recovery takes days instead of minutes. A solid DR plan isn’t a luxury; it’s a fundamental business continuity requirement in the digital age.

How Disaster Recovery Protects Customer Satisfaction

When systems crash, customers don’t just get frustrated; they often leave. With social media amplifying every complaint, a short outage can become a public relations nightmare. But if you have DR mechanisms in place to ensure continuity, customers experience minimal or no impact. They stay satisfied, loyal, and confident in your brand’s reliability, even during unexpected events.

Turn Uptime into a Competitive Advantage Over the Competition

Downtime isn’t just bad for you; it’s an opportunity for your competitors. But when their services fail and yours stay up, it sends a powerful message to the market. Reliable uptime can be a major differentiator, especially in industries where trust and availability are paramount. Disaster recovery planning can turn operational resilience into a competitive edge.

Gain Peace of Mind with Data Redundancy

Knowing that your critical data and infrastructure are backed up, replicated, and easily recoverable is invaluable. Whether it’s a cyberattack, hardware failure, or natural disaster, you can rest easy knowing your operations aren’t at the mercy of a single point of failure. DR planning helps you build redundancy so no single event can wipe out your progress or data.

Minimize Downtime and Accelerate Recovery

When disaster strikes, speed is everything. A well-designed disaster recovery plan includes clear steps, designated responsibilities, and automated recovery mechanisms. This significantly reduces the time it takes to return to full operations. Instead of scrambling to assess the damage and improvise a fix, your team can follow a tested recovery playbook to restore production quickly and efficiently.

Final Thoughts: Making Disaster Recovery a Priority

Disaster recovery isn’t just an IT issue; it’s a business imperative. In a world where downtime equals lost dollars and diminished trust, the ability to recover quickly and keep services running is a critical differentiator. Don’t wait for disaster to strike before realizing the value of preparation. Contact the SIOS team to start the journey to build your recovery strategy now, and position your business to weather any storm.

Author: Connor Toohey, Product Support Engineer at SIOS

Reproduced with permission from SIOS
