SIOS SANless clusters


Reframing Early Computer Science Education: The Soft Skills of Solution Design Part 1

November 3, 2025 by Jason Aw


The Relationship of Rhetoric and Technical Design

When I was in college, my first year in the school of computer science placed me in multiple courses directed toward writing code and completing lab exercises. I would classify each of these as a “coding class”.

My first year studying computer science was all about writing code: countless lines written for common beginner projects, and an intense focus on syntax and the core features of C as a programming language. Lab project after lab project, I wrote more code in that year than I had at any time leading up to it. It wasn’t until my second year in the school of computer science that I took my first true course in software engineering. The distinction wasn’t about the course title. I think of that second-year class as my first real computer science course because it introduced a core concept of software engineering: solution design.

Rhetoric: The Hidden Foundation of Software Design

Though I did not realize it at the time, I had already been exposed to the principles of software engineering earlier in my education, through a course on the rhetorical analysis of literature. The class revealed that communication relies on rhetorical patterns: the structures that carry a message toward understanding and purpose. Understanding why a written work is effective requires understanding the rhetorical patterns used in its construction. Works that achieved their goal incorporated rhetorical techniques and structural patterns that were complementary and cohesive with respect to that goal. The unsuccessful works jumped between techniques and used patterns that shared a goal but were incompatible with one another. Consequently, they read as dissonant and ineffectual; sometimes they were simply confusing. Successful rhetoric succeeds because it is intentional. The importance of each of the work’s structural elements was apparent because of the patterns established when the work was planned. In turn, these features resulted in clear, effective literature.

Why Understanding Design Principles Enables Long-Term Success

Technical fields, at a high level, have a common goal of producing reliable and maintainable solutions. Successful projects from an engineer or IT professional parallel effective rhetoric. The production of an effective and nontrivial solution requires that the solution first go through a design phase. Design prioritizes the use of patterns that are cohesive in concept and purpose, which is the foundational step in creating a solution that is understandable.

Maintainability follows understanding; when personnel understand the design and how each component of the implementation relates to the design, they are empowered to perform upkeep that follows the patterns and principles of the design. Implied in all of this is the assumption that design documentation is readily available and kept updated so teams may form an understanding to guide their actions. Then, so long as the respect for the design is upheld, the solution can continually be maintained via the understanding brought forth due to intelligent design.

How Purposeful Design Prevents System Decay

Given an architecture outlined with clear design principles and cohesive design patterns, a solution can see maintainers come and go throughout its life and still fulfill its purpose. Conversely, many engineers tasked with maintaining a legacy solution that lacks documentation or a clear design have found themselves puzzling over, and potentially breaking, that solution. Effective solutions are effective because they were designed intentionally: the patterns present in the design communicate the pursuit of purpose. The design is the vehicle by which one understands how each element’s role achieves the solution’s purpose. When elements of the solution are implemented in conjunction with a design that is cohesive in concept and purpose, the solution can be relied upon throughout its lifetime of maintenance and the iterations of improvement to come.

Author: Philip Merry, CX – Software Engineer at SIOS Technology Corp.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: software design

How to Cut SQL Server HA/DR Costs and Gain Advanced Features

October 21, 2025 by Jason Aw


Microsoft SQL Server is vital for mission-critical applications, making high availability (HA) and disaster recovery (DR) essential. However, Enterprise Edition licensing and SAN-based clusters drive up costs and complexity. This white paper reveals how organizations can cut licensing costs, eliminate single points of failure, and unlock advanced flexibility, all without relying on Enterprise Edition or expensive SANs.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: High Availability and DR, SIOS Datakeeper, SQL Server

Commonalities between Disaster Recovery (DR) and your spare tire

October 14, 2025 by Jason Aw


In our recent blogs, we’ve drawn some interesting parallels between cars and DataKeeper. These posts have explored topics such as:

  • Transitioning from LifeKeeper to Windows Server Failover Clustering (or vice versa)
  • Maximizing the efficiency of your ‘GET’ commands in DataKeeper
  • Comparing your car dashboard to the DataKeeper User Interface (UI)

Let’s keep that theme rolling (pun intended).

Understanding the Role of a Spare Tire (and a DR Node)

Let’s give a brief intro to the function of a spare tire and the function of a DR node in a DataKeeper clustered environment running Windows Server Failover Clustering™.

A spare… will temporarily replace a damaged tire, allowing you to reach a repair shop, home, or other destination, saving you time and avoiding being towed ($$$) or stranded. Though convenient, temporary spares have limits on longevity and speed.

Understanding the Role of a Disaster Recovery Node

A Disaster Recovery node… is typically a standby node (spare) that contains applications and data, often located in a different region from the primary site to protect against outages and disasters, man-made or natural.

There are endless pros and cons for both. I’ve named just a few for the sake of brevity…

Drawing Parallels Between Your Spare Tire and a DR Node

| Pros (with a spare) | Cons (without a spare) |
| --- | --- |
| Reduce being stranded | Delays, stranded overnight |
| Avoid Roadside Assistance | Roadside service may take hours |
| Mobile again to go get it fixed permanently | Must wait for a tow or other means to get the repair done, which can be costly |

| Pros (with DataKeeper) | Cons (without DataKeeper) |
| --- | --- |
| Streamline failover without manual intervention | Need to rebuild systems, restore data manually |
| Reduce risk of data loss | SLAs not met, loss of sales, penalties |
| Maintaining customer trust | Not meeting customer expectations reduces confidence |

In this blog, we can draw a clever analogy between Disaster Recovery (DR) in DataKeeper clustered environments and the humble “doughnut” tire in your car.

Both serve as critical safety nets in moments of crisis, ensuring you can recover quickly and avoid prolonged downtime.

Why a Reliable DR Solution Matters More Than Ever

Just as a spare tire ensures you can keep driving after a flat, a DR node provides critical backup infrastructure to keep your business running smoothly in the face of outages, cyberattacks, or natural disasters.

In today’s fast-paced digital world, downtime can result in lost revenue, damaged reputation, and even legal liabilities—making the need for a reliable DR solution more crucial than ever.

A DR node acts as a safety net, allowing businesses to recover quickly and minimize disruptions to operations. For customers, investing in a DR node is not just about mitigating risk; it’s about ensuring peace of mind, protecting valuable data, and maintaining trust with clients and stakeholders.

Keep Your Business Rolling with DataKeeper

In short, a Disaster Recovery node is the cornerstone of resilience, empowering businesses to stay agile and focused no matter what challenges arise. Whether it’s a spare tire or a Disaster Recovery node, preparedness is the key to staying on track when life throws unexpected challenges your way. Just like you wouldn’t drive without a spare, don’t run your business without a DR plan. Request a demo to see how DataKeeper keeps your operations moving.

Author: Greg Tucker, Senior Product Support Engineer at SIOS Technology

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: DataKeeper, disaster recovery

Unlocking Near-Zero Downtime Patch Management with High Availability Clustering

October 3, 2025 by Jason Aw


Patch management is one of the toughest balancing acts in IT. Every month or quarter, OS and application vendors release updates with critical security fixes. These patches need to be tested and applied quickly — but rushing the process risks instability, and delaying it increases vulnerability. For organizations running mission-critical applications, the stakes are even higher.

That’s why IT leaders are increasingly turning to high availability (HA) clustering to streamline patch testing and deployment, while keeping downtime to a minimum.

Why Patch Management Is So Challenging

  • Testing takes time and resources. QA environments aren’t always available, and teams may feel pressure to shortcut testing just to keep up.
  • Cyberattacks move fast. Zero-day exploits are weaponized within hours of a patch release. According to the Ponemon Institute, 57% of breaches are attributed to unpatched vulnerabilities.
  • Downtime is costly. Whether planned or unplanned, downtime averages $5,600 per minute (Gartner). In industries such as healthcare, aviation, and manufacturing, even a brief outage can have significant financial and safety implications.

The challenge is clear: organizations must patch faster, test thoroughly, and minimize disruptions.

How HA Clustering Transforms Patch Management

High availability clustering pairs a primary server node with a secondary node. Advanced clustering software continuously monitors the environment — applications, OS, storage, and networks. If a failure occurs, operations seamlessly move to the secondary node without downtime.
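As a concrete illustration of that monitoring idea, here is a minimal Python sketch (this is not SIOS code). The systemd unit names and the five-second poll are hypothetical, and the print statement stands in for the fencing and failover actions real clustering software performs across applications, OS, storage, and networks:

```python
import subprocess
import time

CHECK_INTERVAL = 5  # hypothetical seconds between health checks

def service_healthy(unit: str) -> bool:
    """Probe one component: ask systemd whether a unit is active."""
    rc = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return rc.returncode == 0

def monitor(units: list[str]) -> None:
    """Poll each protected unit; on failure, hand off to the standby."""
    while True:
        for unit in units:
            if not service_healthy(unit):
                # Real cluster software would fence this node and promote
                # the secondary; here we only log the decision point.
                print(f"{unit} failed; initiating failover to secondary")
                return
        time.sleep(CHECK_INTERVAL)

if __name__ == "__main__":
    monitor(["postgresql", "nginx"])  # hypothetical protected services
```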

This same architecture enables a “rolling upgrade” approach for patching, as the sketch after this list illustrates:

  1. Patch the secondary node while the primary node continues to run.
  2. Test the update on the secondary node before making the switch.
  3. Fail back if needed — if issues are found, operations instantly continue on the primary node.
  4. Cut over if successful — if tests pass, operations shift to the secondary node, and the primary can be patched next.
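For illustration only, the four steps above might be orchestrated along the following lines. The node names, the yum-over-ssh patching, the validation script path, and the `switch_over` placeholder are all hypothetical; a real deployment would invoke its clustering software's own switchover action:

```python
import subprocess

def patch_node(node: str) -> None:
    """Apply OS updates on a node (illustrative: yum over ssh)."""
    subprocess.run(["ssh", node, "sudo yum -y update"], check=True)

def smoke_tests_pass(node: str) -> bool:
    """Run a hypothetical post-patch validation script on the node."""
    rc = subprocess.run(["ssh", node, "/opt/tests/validate_patch.sh"])
    return rc.returncode == 0

def switch_over(to_node: str) -> None:
    """Placeholder for your clustering software's switchover action."""
    print(f"switching the workload to {to_node}")

def rolling_upgrade(primary: str, secondary: str) -> None:
    # 1. Patch the secondary while the primary keeps serving traffic.
    patch_node(secondary)
    # 2-3. Test on the secondary; if it fails, the primary was never touched.
    if not smoke_tests_pass(secondary):
        print(f"patch failed validation on {secondary}; primary untouched")
        return
    # 4. Cut over, then repeat the process on the former primary.
    switch_over(secondary)
    patch_node(primary)

if __name__ == "__main__":
    rolling_upgrade("node-a", "node-b")  # hypothetical node names
```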

The result: organizations can apply updates faster, avoid risky shortcuts, and keep systems available 24/7.

Strengthening Security, Compliance, and IT Resilience with HA Clustering

Modern regulations, such as HIPAA, PCI DSS 4.0, and NIST 800-53, require timely patching. At the same time, high-profile incidents (such as the CrowdStrike update failure) have shown the danger of rushed, untested updates.

By integrating HA clustering into patch management strategies, IT teams can:

  • Meet compliance requirements without sacrificing uptime.
  • Reduce risk from patch-related failures.
  • Strengthen overall IT resilience against cyberthreats.

Near-Zero Downtime Patch Management for Mission-Critical Applications

The old trade-off between speed and stability in patching no longer exists. With high availability clustering, IT teams can patch quickly, test safely, and keep mission-critical applications online, all while reducing downtime to near zero.

If your organization struggles with patch management, HA clustering may be the key to safer updates and stronger resilience.

Ready to eliminate downtime from your patching process? Request a demo of SIOS High Availability Clustering and see how your team can patch faster, stay compliant, and keep critical applications running 24/7.

Author: Ben Roy, Marketing Specialist at SIOS

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: High Availability, Patch Management

How to Safely Combine DataKeeper for Linux with Backup and Replication Tools

September 26, 2025 by Jason Aw


The purpose of DataKeeper for Linux is to replicate data between servers in a cluster, ensuring all relevant servers have the most up-to-date copy of the data. This is crucial when a server experiences unplanned downtime: with DataKeeper, LifeKeeper can keep critical applications highly available and maintain uptime.

When combining DataKeeper with other backup or replication software, it’s essential to confirm compatibility to avoid conflicts. Replication software can interfere with DataKeeper’s resynchronization, sometimes due to the order in which replication processes begin. While aiming for maximum uptime and availability is beneficial, it’s critical to verify that such measures will maintain your cluster in an optimal state.

How to Test DataKeeper for Linux with Backup and Replication Software

It’s important to test the compatibility of any replication software used alongside DataKeeper to ensure everything functions correctly. Below is a list of items you can check to verify functionality.

1. Test on a QA cluster.

Before using both backup/replication software on your production cluster, create a QA cluster environment with DataKeeper to run tests on.

A QA cluster is beneficial for running tests before introducing anything new into your production cluster. This helps you avoid issues on your production cluster by proactively catching and fixing them on your QA cluster first.

2. Complete basic functionality test.

A couple of basic tests should be completed with DataKeeper as the only replication software installed. This is a sanity check before continuing with any other software.

Base tests should include testing for a successful switchover and failover. Visit the link below for steps to confirm switchover can be successfully performed.

https://docs.us.sios.com/spslinux/9.9.1/en/topic/testing-your-datakeeper-resource-hierarchy

3. Complete basic functionality tests with other software.

Run the same tests mentioned above while the software is backing up/replicating your data, and after the software has completed backing up/replicating your data.

To be able to use the software with DataKeeper, it’s important that all these functionality tests pass.
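As a rough sketch of how step 3 could be automated, the harness below starts a backup, writes a sentinel file into the replicated volume, and checks that the standby serves it after a switchover. The node names, mount point, backup script path, and 30-second settle time are hypothetical placeholders, and the switchover itself is left as a manual step performed per the documentation linked above:

```python
import subprocess
import time

ACTIVE, STANDBY = "node-a", "node-b"  # hypothetical node names
MOUNT = "/datakeeper/mnt"             # hypothetical replicated mount point

def run(node: str, command: str) -> int:
    """Run a shell command on a node over ssh; return its exit code."""
    return subprocess.run(["ssh", node, command]).returncode

def test_switchover_during_backup() -> bool:
    # 1. Start the third-party backup job on the active node (placeholder).
    run(ACTIVE, "/opt/backup/start.sh >/dev/null 2>&1 &")
    # 2. Write a sentinel file into the replicated volume.
    run(ACTIVE, f"date > {MOUNT}/sentinel")
    # 3. Perform the switchover manually, following the linked documentation.
    input("Switch the hierarchy to the standby node, then press Enter...")
    time.sleep(30)  # hypothetical settle time for the hierarchy
    # 4. Pass if the standby now serves the up-to-date replicated data.
    return run(STANDBY, f"test -f {MOUNT}/sentinel") == 0

if __name__ == "__main__":
    print("PASS" if test_switchover_during_backup() else "FAIL")
```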

Using GenApp Resources to Manage Backup and Replication Processes with DataKeeper for Linux

If testing yields unsuccessful results, it is possible to create a Generic Application (GenApp) to start and stop the relevant processes during a switchover.

  • A GenApp can be used in the hierarchy to restore (start) and remove (stop) the process used by the replication software, controlling the order in which the software runs.
    • A hierarchy determines the relationship between resources. Top-level resources depend on bottom-level resources to create a dependency relationship. When a hierarchy is taken out of service, LifeKeeper takes a top-down approach, removing the top-level resources before the bottom-level resources. When a restore is issued, LifeKeeper takes a bottom-up approach to restore the bottom-level resources before restoring the top-level resources.

With this understanding, two GenApps would be created, one as a top-level resource and the other as a bottom-level resource. This configuration ensures that when the hierarchy comes into service, the bottom-level GenApp will stop the process, and the top-level GenApp will start it. When the hierarchy is being removed, the only action would be for the bottom-level resource to stop the process.
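To make that concrete, below is a minimal sketch of the two GenApp script pairs, assuming a hypothetical replication agent managed as a systemd unit called `replagentd`. LifeKeeper runs a GenApp's restore script when bringing the resource in service and its remove script when taking it out of service; the argument handling here is simplified for illustration and does not reflect how LifeKeeper actually invokes the scripts:

```python
#!/usr/bin/env python3
"""Illustrative GenApp scripts for a hypothetical agent 'replagentd'.

Bottom-level GenApp:  restore -> stop agent    remove -> stop agent
Top-level GenApp:     restore -> start agent   remove -> no action

Because LifeKeeper restores hierarchies bottom-up and removes them
top-down, the agent is stopped before the replicated resources come in
service and started again only once the whole hierarchy is up.
"""
import subprocess
import sys

AGENT = "replagentd"  # hypothetical replication/backup agent unit

def set_agent(action: str) -> int:
    """Start or stop the agent via systemd; return the exit code."""
    return subprocess.run(["systemctl", action, AGENT]).returncode

if __name__ == "__main__":
    # Deployed as four small scripts in practice; shown here as one
    # dispatcher switched on (resource level, LifeKeeper action).
    if len(sys.argv) != 3:
        sys.exit("usage: genapp_scripts.py {top|bottom} {restore|remove}")
    level, lk_action = sys.argv[1], sys.argv[2]
    if level == "bottom":
        sys.exit(set_agent("stop"))   # stop on both restore and remove
    if level == "top" and lk_action == "restore":
        sys.exit(set_agent("start"))  # start only once the hierarchy is up
    sys.exit(0)                       # top-level remove: no-op
```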

  • Read more about creating a GenApp in the link below.

https://docs.us.sios.com/spslinux/9.9.1/en/topic/creating-a-generic-application-resource-hierarchy

Ensuring DataKeeper Cluster Compatibility and Preventing Downtime

Ultimately, thorough testing and verification are essential before introducing additional backup or replication software into your Linux DataKeeper cluster. Completing the steps above ensures your configuration is in order and helps prevent downtime when it is introduced into your production environment.

Ready to see how SIOS can help you simplify high availability and ensure seamless backup and replication with DataKeeper for Linux? Request a demo today.

Author: Alexus Gore, Customer Experience Software Engineer

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: backup, replication
