SIOS SANless clusters

Designing for High Availability and Disaster Recovery

December 29, 2025 by Jason Aw

Design-driven creation, tools, and conflicting design patterns in IT infrastructure

When design drives creation, the results are communicable. A design-first mentality creates solutions that individuals can be trained on effectively. Using the design principles as a vehicle to communicate purpose leads to solutions that can be readily maintained and improved. Naturally, when solutions are built upon tools, the way each tool is designed to be used must be considered in conjunction with the design of the solution it supports.

The tools chosen impose their design assumptions upon the projects in which they are used. As the previous related blog outlines, a design that is cohesive in concept and purpose is the first step in creating a solution that is understandable. Of course, the tools employed by a project can incorporate patterns that are antithetical to the project’s design.

Conflict between the initial design and the tools employed creates complexity and reduces the efficacy of the solution. As such, tools must be selected so that their use is cohesive with the design of the project. When cohesion between the tool and the design is achieved, complexity is reduced. In the context of High Availability and Disaster Recovery, the effects of cohesion between design and tools are readily apparent.

Designing for High Availability and Disaster Recovery is often assumed to be complex

Designing for High Availability and Disaster Recovery often carries the assumption of complexity. As IT infrastructure design patterns become increasingly prevalent to meet the high standards intrinsic to High Availability and Disaster Recovery, individual infrastructure components each attempt to implement those patterns within their own scope.

As each component works to address the concerns of High Availability and Disaster Recovery within the context of its own role, the environment inherits bloat because those components address the same concerns with divergent design principles.

Infrastructure regularly needs to employ multiple design patterns

Tools grow and can develop competing design principles, yet environments require design that is cohesive. Complexity bleeds into infrastructure as previously unrelated tools begin to interfere with one another. As IT systems grow in terms of purpose and standards of availability, the importance of infrastructure that follows a cohesive design and implements complementary tools grows as well. Technological advancements have provided a myriad of strategies for introducing High Availability and Disaster Recovery, and IT infrastructure has also grown to accommodate design patterns tailored towards other use cases. Just glance at the common cloud design patterns that Microsoft publishes in its documentation. It is easy to see how each pattern is applicable, but it is just as easy to see how patterns can conflict with one another as well. Pattern overlap is difficult to navigate and can make designing IT infrastructure a challenging process. Infrastructure regularly needs to employ multiple design patterns, and in turn, there is a growing need for patterns that “stay out of each other’s way”.

Author: Philip Merry – Software Engineer at SIOS

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: disaster recovery, High Availability

The Importance of Proper Memory Allocation in HA Environments

December 23, 2025 by Jason Aw

Proper memory allocation is a critical yet often overlooked component in any highly available (HA) environment. When a server begins to experience memory allocation issues, the effects can ripple throughout the entire cluster, impacting application performance, slowing down replication, and even causing failover failures. In more severe cases, memory exhaustion can interrupt SIOS tools such as DataKeeper and LifeKeeper, further increasing the risk of unpredictable and unintentional behavior. Understanding the role memory plays in HA environments is key to maintaining stability, performance, and predictable failover behavior.

Below, we will explore why proper memory allocation matters, what symptoms to watch for, and how memory-related issues can impact the reliability of your cluster in LifeKeeper/DataKeeper environments.

Common Symptoms of Memory Allocation Issues

1. Replication Stalls or Unexpected Mirror Hangs/Application Termination

One of the most noticeable effects of low memory is degraded replication performance. Products like DataKeeper depend on consistent access to system memory for buffering write operations. When memory is constrained, queues begin to fill, replication slows, and in some cases, the mirror may hang due to resource exhaustion. This can lead to resync operations that take significantly longer than expected, especially in environments with high write rates. At the same time, non-graceful terminations of the DataKeeper application can leave certain processes unmonitored or unhandled, leading to unexpected behavior when the DataKeeper service is started again.
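
To illustrate how a filling write queue stalls replication, here is a toy Python sketch of a bounded buffer; the queue size and block labels are arbitrary illustrations, not DataKeeper internals.

    import queue

    # A bounded buffer stands in for the replication write queue; its size is
    # an arbitrary example, not a DataKeeper parameter.
    write_buffer = queue.Queue(maxsize=4)

    for block in range(10):
        try:
            # When memory is constrained the buffer cannot drain, so new writes
            # stop being accepted and replication falls behind.
            write_buffer.put_nowait(f"write block {block}")
        except queue.Full:
            print(f"buffer full at block {block}: replication stalls and resync grows")
            break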

2. Slow Application Response or Service Delays

When a system is running low on memory, the operating system may begin paging or swapping active processes. In HA environments running applications such as SQL Server, this can cause slow queries, delayed responses, and high disk activity as memory pages are constantly moved. These delays often cascade into longer failover times, as services take longer to gracefully stop or restart during a failover event.

3. Increased Risk of False Failovers

High availability solutions depend on timely heartbeat communication between nodes. When memory is exhausted, threads responsible for sending or processing heartbeat messages may be delayed. Even small delays can make a healthy node appear unresponsive, leading to unnecessary failovers or, in worst-case scenarios, split-brain events.
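
To make the mechanism concrete, the following minimal Python sketch shows how a fixed heartbeat timeout turns a delayed but otherwise healthy peer into an apparently dead one. It is purely illustrative, not LifeKeeper's implementation, and the 5-second timeout and 7-second delay are hypothetical values.

    import time

    HEARTBEAT_TIMEOUT = 5.0  # hypothetical timeout in seconds

    def peer_is_alive(last_heartbeat: float) -> bool:
        """A peer counts as alive only if its last heartbeat arrived recently."""
        return (time.monotonic() - last_heartbeat) < HEARTBEAT_TIMEOUT

    # If memory pressure delays the peer's heartbeat thread past the timeout,
    # the peer is declared dead even though it is still running, which is the
    # "false failover" scenario described above.
    last_seen = time.monotonic() - 7.0  # heartbeat delayed roughly 7 s by paging
    if not peer_is_alive(last_seen):
        print("peer appears unresponsive; failover logic would engage")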

4. Kernel or System Logs Showing Memory Pressure

Memory starvation often results in specific system messages (Windows or Linux). These may include warnings about low available memory, paging activity spikes, or processes being terminated by the OS to reclaim memory. For systems running replication drivers or HA services, these warnings often precede more significant issues.

5. Unpredictable Performance in Virtual or Cloud Environments

In virtualized environments, memory issues can appear even when a VM reports “available” RAM. Hypervisors like VMware, Hyper-V, or cloud platforms may throttle memory access through techniques such as ballooning or overcommitment. This can silently impact VM performance, causing replication delays, heartbeat issues, and similar problems, without any obvious indication of the root cause.

Tools for Diagnosing Memory Allocation Issues in HA Environments

  • Performance Monitor / Task Manager (Windows)
    Useful for identifying memory pressure, paging activity, and process-level consumption. Look for:

    • High committed memory values
    • Large paging file usage
    • Processes consuming excessive RAM
  • Event Viewer (Windows) or journalctl / dmesg (Linux)
    Memory pressure often leaves clues in system logs. Watch for:

    • “Low Memory” warnings
    • Failed memory allocations
    • Replication driver warnings indicating resource exhaustion
  • top, htop, or free (Linux)
    These tools can reveal memory saturation, swap usage, and services using disproportionate amounts of RAM.
  • Hypervisor Tools (vSphere (VMware) / Hyper-V Manager (Hyper-V) / Cloud Platform Managers)
    These tools identify ballooning, swapping, host-level contention, or overcommitment caused by demand for memory that is not actually available. (A minimal scripted starting point for these checks follows this list.)
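
As a starting point for automating these checks, the sketch below gathers the same basic signals in Python. It assumes the third-party psutil package is installed (an assumption, not a SIOS requirement), and the 90% RAM and 25% swap thresholds are arbitrary examples to tune for your environment.

    # Minimal memory-pressure check; requires the third-party "psutil" package.
    import psutil

    MEM_THRESHOLD = 90.0   # percent of RAM in use (example threshold)
    SWAP_THRESHOLD = 25.0  # percent of swap in use (example threshold)

    mem = psutil.virtual_memory()
    swap = psutil.swap_memory()
    print(f"RAM used:  {mem.percent:.1f}% ({mem.available / 2**30:.1f} GiB available)")
    print(f"Swap used: {swap.percent:.1f}%")

    if mem.percent > MEM_THRESHOLD or swap.percent > SWAP_THRESHOLD:
        print("WARNING: memory pressure detected; check replication and heartbeat health")

    # Top five processes by resident memory, to spot the heaviest consumers.
    procs = [p for p in psutil.process_iter(["name", "memory_info"])
             if p.info["memory_info"] is not None]
    procs.sort(key=lambda p: p.info["memory_info"].rss, reverse=True)
    for p in procs[:5]:
        print(f"{p.info['name']:<25} {p.info['memory_info'].rss / 2**20:,.0f} MiB")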

When Should You Reevaluate Memory Allocation?

You may need to increase or adjust memory allocation when:

  • Replication regularly enters PAUSED states or hangs under load.
  • Paging or swapping becomes a consistent pattern during peak workload.
  • Your application servers (e.g., SQL Server) frequently consume most of the available RAM.
  • The cluster experiences intermittent failovers with no underlying hardware failures.
  • You are operating in a cloud or virtual environment where host contention is possible.
  • You see “Resource Exhaustion” events logged by your system.
  • Critical services terminate unexpectedly.

In HA environments, memory isn’t just for performance; it helps ensure predictable failover behavior and prevents cascading service interruptions.

Why Proper Memory Allocation Is Key to HA Reliability

Memory pressure can negatively affect nearly every layer of an HA environment, from replication drivers to application performance and failover timing. Proper memory allocation helps ensure predictable performance, stable cluster communication, and reliable recovery when a failover occurs. By proactively monitoring and planning memory usage, organizations can avoid unnecessary downtime and maintain the high availability their systems demand. If memory allocation challenges are impacting HA performance or failover behavior, request a SIOS demo to see how we can help strengthen reliability.

Author: Aidan Macklen, Associate Product Support Specialist at SIOS Technology Corp.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: Application availability, High Availability

Top Reasons Businesses Are Adopting Disaster Recovery as a Service (DRaaS) Solutions

December 9, 2025 by Jason Aw

In today’s fast-paced digital landscape, businesses rely heavily on data and technology to keep operations running smoothly. However, with increasing threats from cyberattacks, hardware failures, and natural disasters, the risk of downtime and data loss for companies has never been greater. To better deal with these challenges, many companies are turning to Disaster Recovery as a Service (DRaaS) solutions.

What is Disaster Recovery as a Service (DRaaS)?

You might be wondering, what exactly is DRaaS? Well, it’s sort of just like it sounds. It’s a cloud-based service that allows companies to back up their data and applications to a third-party provider. DRaaS differs from Backup as a Service (BaaS) in one critical respect: DRaaS includes the entire infrastructure needed to quickly fail over to the cloud environment, whereas traditional BaaS only backs up the data itself.

Why Businesses Are Turning to DRaaS Solutions

Now, why are companies adopting this approach to Disaster Recovery? One of the biggest reasons is cost.  Traditional disaster recovery setups often require large investments in secondary data centers or hardware, and the maintenance that goes along with that.  DRaaS eliminates these expenses by offering a subscription-based model utilizing the cloud, which allows companies to pay only for the resources they need.  This makes enterprise-level disaster recovery much more cost-effective, even for small and mid-sized businesses.

Key Benefits of Adopting DRaaS

Another major driver for adopting DRaaS is its scalability and flexibility.  As businesses grow and adapt and their data needs evolve, DRaaS solutions allow them to easily scale resources up or down without major infrastructure changes.  This adaptability ensures that recovery plans can evolve alongside business goals.

DRaaS also provides faster recovery times, minimizing downtime through automated failover and continuous data replication. That means getting back up and running quickly, reducing possible financial losses, and minimizing reputational damage.

Additionally, enhanced security and compliance make DRaaS an attractive choice. Leading DRaaS providers enforce advanced security measures such as strict encryption, monitoring, and updates to protect data and meet industry standards.  With the rise of ransomware and other cyber threats, DRaaS adds a critical layer of protection, allowing data to be recovered even after a cyber attack.

How DRaaS Solutions Support Business Continuity and Resilience

Ultimately, adopting DRaaS is not just about disaster recovery; it’s about building resilience.  Companies that invest in DRaaS demonstrate a proactive commitment to protecting their data, maintaining customer trust, and ensuring uninterrupted operations no matter what challenges arise.

DRaaS and High Availability Solutions from SIOS Technology

At SIOS Technology, we understand that every minute of downtime can have a lasting impact on a business, and our powerful, cloud-ready DR and high availability solutions are designed to achieve seamless protection, rapid recovery, and complete peace of mind.  Whether running critical workloads in the cloud, on-premises, or in hybrid environments, SIOS delivers the reliability and performance needed to keep critical businesses running no matter what happens.

Request a demo today to see how SIOS can strengthen your DRaaS strategy and keep your critical systems protected.

Author: Cassy Hendricks-Sinke, Principal Software Engineer at SIOS

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: disaster recovery

99.99% Uptime: Balancing High Availability and Maintenance

November 30, 2025 by Jason Aw

“99.99% uptime,” often referred to as “four nines,” represents a system’s availability 99.99% of the time, allowing for only about 52 minutes of downtime annually. This metric is a “golden” standard for any size organization seeking to deliver reliable services, ensuring minimal disruptions for users.

Achieving four nines (99.99%) indicates an ongoing commitment to High Availability, which is paramount for industries like E-Commerce, Healthcare, and Finance, where downtime can lead to significant financial losses or a loss of customer confidence.

However, maintaining reliability at this level presents a core challenge: balancing High Availability with “mandatory” system maintenance. Systems require updates, patching, and upgrades to remain secure and continue to operate, but these activities often require downtime.

Organizations must employ strategies like redundancy, failover/switchover, and rolling updates to perform maintenance without compromising uptime. Striking this balance is key to sustaining trust and delivering consistent services in competitive markets.

What Is 99.99% Uptime and Why It Matters

By: Alexus Gore, CX Software Engineer at SIOS Technology

Uptime represents the amount of time a service is available and functional. A service with 99.9% uptime would experience 8.77 hours of downtime per year. If a hospital had 99.95% uptime, that would mean 4.38 hours each year during which patient data could not be accessed, delaying care, which is far from an ideal circumstance.

99.99% uptime is a common baseline for industries like Finance, Healthcare, SaaS, etc., where it’s desirable to have no more than 52.60 minutes of downtime per year. This level is also practical to achieve and is generally the highest level of uptime that is affordable to maintain. Given the impact that downtime can have, 99.99% uptime is the ideal target for keeping downtime to a minimum.
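
The downtime budgets behind these figures come from simple arithmetic: allowed downtime equals (1 minus availability) multiplied by the length of the period. The short Python calculation below, using a 365.25-day year, reproduces the numbers quoted above.

    # Downtime budget = (1 - availability) x minutes in a 365.25-day year.
    MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes

    for availability in (0.999, 0.9995, 0.9999):
        downtime_min = (1 - availability) * MINUTES_PER_YEAR
        print(f"{availability:.2%} uptime -> {downtime_min:7.2f} min/year "
              f"({downtime_min / 60:.2f} h/year)")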

A 99.99% SLA guarantees that the downtime experienced will not exceed that annual allowance. Meeting this agreement builds customer trust by making sure services are readily available, which in turn helps maintain the customer base and ensures business continuity.

The Role of High Availability (HA) in Achieving 99.99% Uptime

By: Bill Darnell, Sr. Product Support Engineer at SIOS Technology

High Availability is a system design approach that ensures applications and services remain accessible, targeting 99.99% uptime. Such systems are built on key components such as redundant hardware, distributed software, and resilient network configurations. The goal is to eliminate single points of failure so operations can continue even if the primary server fails.

SIOS software achieves HA using a cluster (multiple servers) in which each node is able to perform the same functionality. These machines are connected via two or more communication paths. This creates a fault-tolerant environment that maintains service continuity. LifeKeeper monitors system health by constantly checking servers, applications, and services for failures. If one server or node goes down, LifeKeeper automatically transfers operations to a standby server with minimal downtime.

SIOS supports protection for databases (SQL Server, Oracle, SAP HANA), file systems, and custom applications.

The Hidden Cost of Uptime: Why Maintenance Matters

By: Cassy Hendricks-Sinke, CX Principal Software Engineer at SIOS Technology

In the pursuit of maximum uptime, many organizations delay or skip routine maintenance, a decision that can be dangerously short-sighted. Ignoring updates or patching exposes systems to serious security vulnerabilities, decreases performance efficiency, and increases the risk of non-compliance. Each postponed update can make a company more vulnerable to attacks and accrue technical debt that’s harder to manage over time.

Yet, the real challenge lies in balancing uptime with essential maintenance. Businesses often fear downtime, not recognizing that neglecting updates invites even greater disruption in the form of breaches or extensive outages. The key to dealing with this problem lies in proactive planning! Scheduling rolling updates, using redundant strategies, and adopting tools that allow for hot patching or zero-downtime deployments are all ways to combat or minimize any downtime caused by critical maintenance.

True uptime is more than just staying ‘online’; it’s about staying secure, efficient, and compliant as well.  Investing in smart maintenance strategies ensures systems are not only available but also resilient and trustworthy.

Strategies to Balance 99.99% Uptime and Maintenance

By: Philip Merry, CX Software Engineer at SIOS Technology

Often, maintenance of systems requires that downtime be taken so the maintenance activities can be performed without interruption. Obviously, aiming for high uptime requirements stands in conflict with scheduling downtime windows for maintenance. Delaying and batching maintenance might leave systems in a troubled state for long periods of time in service of the uptime requirements, while frequent maintenance windows can start to drastically lower metrics for system availability. These concerns, though in conflict, can be balanced with the use of a High Availability strategy.

SIOS LifeKeeper is a high availability tool that allows redundancy in the systems that can perform a workload. While one system is actively performing the workload and running the business applications, the other system can act as a standby that assumes workloads if a failure were to occur. This “active/standby” model of providing High Availability gives a straightforward avenue to stay on top of maintenance and updates while ensuring continuity of business applications.

Balancing uptime with maintenance in the context of a High Availability tool like LifeKeeper is, in concept and in practice, very simple. Perform maintenance on the system in the standby role first. Once complete, allow the active and standby systems to switch roles. Now, the active system has undergone the required maintenance and is hosting business applications. Once again, the system in the standby role can have maintenance performed. Upon completion, all of the systems have undergone maintenance while the workload has remained accessible during the maintenance window. This strategy of “Highly Available Updates” enabled by LifeKeeper allows systems to stay maintained and available without sacrificing in either regard.
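
As a rough sketch of that procedure, the illustrative Python below walks a two-node active/standby pair through one maintenance pass. The node names, role tracking, and helper functions are placeholders for your own patching scripts and your HA tool's planned switchover; they are not LifeKeeper commands or APIs.

    # Illustrative rolling-maintenance loop for a two-node active/standby cluster.
    roles = {"node-a": "active", "node-b": "standby"}  # hypothetical node names

    def apply_maintenance(node):
        print(f"patching {node} while it is in the standby role")

    def switch_roles():
        for node, role in roles.items():
            roles[node] = "standby" if role == "active" else "active"
        print(f"switchover complete: {roles}")

    for _ in range(len(roles)):         # one pass maintains every node
        standby = next(n for n, r in roles.items() if r == "standby")
        apply_maintenance(standby)      # only the idle node is touched
        switch_roles()                  # workload moves to the freshly patched node
    # The workload stayed hosted throughout, and both nodes are now up to date.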

Tools and Technologies That Support Uptime and Maintenance

By: Connor Toohey, Sr. Product Support Engineer at SIOS Technology

Achieving high availability and zero-downtime deployments requires a strategic mix of technologies for optimal performance. SIOS LifeKeeper and DataKeeper are key solutions, providing robust failover clustering and real-time data replication to ensure application and data availability across cloud, hybrid, and on-prem environments. Kubernetes enables zero-downtime deployments through container orchestration and automated rolling updates. Load balancers such as Azure Load Balancer and AWS Elastic Load Balancing distribute traffic efficiently to reduce the risk of service disruption.

AIOps platforms like Dynatrace or Moogsoft enhance operational stability with AI-powered anomaly detection and automated issue remediation. For server patching, tools such as Rancher, Red Hat Satellite, or WSUS support rolling updates, allowing for maintenance without downtime. Monitoring and logging platforms such as Prometheus, Grafana, Datadog, and Splunk provide real-time visibility into uptime and system performance. Together, these technologies create a resilient infrastructure for uninterrupted, reliable service delivery.

Best Practices for Maintaining 99.99% Uptime

By: Aidan Macklen, Associate Product Support Engineer at SIOS Technology

Achieving 99.99% uptime requires a proactive approach to system management. Rather than reacting to issues after they occur, we should focus on identifying and resolving potential risks before they impact service availability. Proactive maintenance, such as regular log reviews, capacity planning, and hardware inspections, ensures that small issues never escalate into outages.

Before deploying any updates or configuration changes, always test them in a controlled staging environment. This aids in verifying compatibility, stability, and performance under simulated production conditions, reducing the risk of unplanned downtime. Maintaining clear and well-documented incident response and rollback plans is equally critical so that when incidents do occur, we can restore normal operations in an efficient manner.

Highly available systems also benefit from continuous optimization. Regularly audit system performance, failover efficiency, and redundancy configurations to ensure that all components function as intended. Over time, these audits reveal bottlenecks, configuration drift, or underperforming nodes that could compromise uptime.

By prioritizing prevention, disciplined testing, and structured recovery planning, organizations can sustain the 99.99% uptime benchmark and deliver the reliability users expect from modern, highly available environments.

99.99% Uptime Solutions for Continuous Operations

By: Trey Isaac, Sr. Product Support Engineer at SIOS Technology

Every minute of downtime costs your business revenue, damages your reputation, and weakens customer trust. While achieving 99.99% uptime is a crucial benchmark, it’s an ongoing battle against the demands of essential maintenance, patches, and updates. The key isn’t just chasing an uptime number—it’s about building intelligent resilience to ensure your business stays up and running.

This is where SIOS transforms your operations. Our high-availability and disaster recovery solutions are engineered to protect your most critical applications, including SQL Server, Oracle, and SAP. Using automated, application-aware failover and real-time data replication, SIOS ensures your business remains fully operational through untimely failures, unexpected outages, and planned maintenance events alike.

Whether your infrastructure is on-premises, in the cloud, or a hybrid environment, SIOS provides the seamless protection you need. Stop reacting to downtime and start proactively ensuring your business stays operational, customers stay confident, and productivity never stops.

Summary: Achieving and Maintaining 99.99% Uptime

By: Matthew Pollard, Sr. CX Software Engineer, Amateur Kazooist at SIOS Technology

Regardless of what kind of business you do, or what applications you rely on, High Availability is a universal concept for keeping your operations up and running. Aiming for 99.99% uptime is a sure way to increase the reliability of your infrastructure, and in turn enable a high degree of trust from your customers. Achieving this uptime is not without its challenges, though, so the key is doing your research and engaging with a knowledgeable vendor of HA solutions, such as SIOS, to meet your needs. SIOS LifeKeeper allows you to protect your enterprise-level business-critical applications, such as SAP, Oracle, SQL Server, and more, against unplanned outages and downtime, while also minimizing the downtime needed for routine patching or maintenance activities. From simply adding a standby node for recovery purposes to sturdier Disaster Recovery configurations, SIOS solutions give you all of the tools you need.

Don’t wait until you feel the pain of outages or failures to start your search for an HA solution; be proactive! Our experts are eager and waiting to help you build your way to a more secure and robust environment that can stand up to whatever problem comes your way. Your IT teams, business leaders, partners, and customers will all thank you for it. Request a demo today to see how SIOS can help you achieve your uptime goals.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: High Availability, Uptime

Video: EGGER Achieves 99.99% Uptime with SIOS LifeKeeper for Linux

November 26, 2025 by Jason Aw

SIOS Technology Corp., a leader in high availability (HA) and disaster recovery (DR) software, sits down with the EGGER Group—one of the world’s top wood-based materials manufacturers—to discuss how they achieved an impressive 99.99% uptime for their mission-critical systems. Across 22 manufacturing sites in 11 countries, EGGER relies on SIOS LifeKeeper for Linux to keep essential SAP, Oracle, and custom business applications continuously available.

In the video, EGGER’s IT team shares insights on: 

  • Why they chose SIOS for their high-availability strategy
  • How LifeKeeper for Linux simplifies failover and disaster recovery
  • The measurable impact on operational continuity and productivity

What can the SIOS LifeKeeper for Linux clustering solution do for you?

  • Designed for secure environments: supports SELinux in all modes and can operate within AWS IMDS2-enabled instances for enhanced metadata protection.
  • Provides high-availability (HA) and disaster-recovery (DR) protection for critical Linux-based applications, including SAP S/4HANA, Oracle, MaxDB and more.
  • Monitors the entire application stack — server, storage, OS, network, database and application — not just server availability.
  • Includes application-aware Recovery Kits (ARKs) that automate failover, restart or alert when issues are detected.
  • Features a Web Management Console with intuitive setup, progress tracking, and simplified firewall access (only 2 TCP ports) for easier cluster management.
  • Supports major Linux distributions (Red Hat, SUSE, Rocky, Oracle Linux) and a wide range of storage architectures—including SANless setups for cloud and virtual environments.
  • Flexibly deployable in physical, virtual, cloud or hybrid environments: supports P2P, P2V, V2V clustering and SAN-based or local-storage synchronized (SANless) configurations.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: disaster recovery, High Availability
