|November 28, 2021||
Four Avoidance Strategies for Improving Cluster Resilience, Performance, and Outcomes
Simple Steps for Deployment in SIOS Protection Suite Cluster Environment
Avoiding something – we’ve all done it before. An old flame we see in the store while walking with our spouse, a salesperson when we aren’t “ready to buy”, and even a boss while we are out on “vacation”. When I was the manager of a development team, I caught a glimpse of a direct report browsing in a store while they were supposed to be out of the office sick. They ducked between clothing racks and scurried down the next aisle and hurried away. We’ve all done it before, and in some cases, for mental health, physical health, or reasons that remain private and personal, we all need some measures of avoidance. Even in HA. So, how do you add avoidance to your High Availability environment, and why?
Four reasons to use an avoidance strategy in High Availability
One reason to use avoidance strategies in HA is to increase application and server performance. Consider the case of three servers running production workloads; let’s call them Server Alpha, Server Beta, and Server Gamma. Servers Alpha and Beta are running critical applications backed by a database, while Server Gamma is running reports and data transformation jobs. In the event of a failure of Server Alpha, a failover to Server Beta would traditionally occur. However, because Server Beta is already running a large workload, the additional application load might overload the server and degrade performance for both applications. So it might be wise to deploy an avoidance strategy to make sure that Server Gamma is chosen as the failover target.
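The Alpha/Beta/Gamma scenario above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any clustering product selects a target: the function name `pick_failover_target`, the load figures, and the 0.7 threshold are all hypothetical.

```python
def pick_failover_target(failed, load_by_server, avoid_above=0.7):
    """Return a surviving server, avoiding heavily loaded ones when possible.

    load_by_server maps server name -> current utilization (0.0 to 1.0).
    The threshold and the load values are illustrative assumptions.
    """
    survivors = {s: load for s, load in load_by_server.items() if s != failed}
    if not survivors:
        return None
    # Prefer servers at or below the threshold; fall back to all survivors.
    preferred = {s: load for s, load in survivors.items() if load <= avoid_above}
    pool = preferred or survivors
    return min(pool, key=pool.get)


loads = {"Alpha": 0.80, "Beta": 0.85, "Gamma": 0.30}
print(pick_failover_target("Alpha", loads))  # Gamma
```

Falling back to the full list of survivors keeps the workload available even when every remaining server is busy, so avoidance stays a preference and never blocks recovery.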
Consider again the scenario of three servers, Alpha, Beta, and Gamma. Servers Alpha and Beta are scaled to handle peak workloads, while Server Gamma is a cost-optimized server. In the event of a failure of Server Alpha and Server Beta, a failover will occur to the cost-optimized server, Gamma. However, this server is not scaled to handle peak workloads, nor the workloads of both Server Alpha and Server Beta at the same time. In this instance, an avoidance strategy can be used to optimize performance by automatically moving one or both of the workloads from Server Gamma as soon as another host is available.
HA optimization is another scenario for deploying avoidance strategies. Like the performance optimization strategy, HA optimization is used to ensure that your environment can survive most failure scenarios and that your applications are optimized to provide the highest level of availability possible at any point in time. HA optimization is important for an application such as SAP with replicated enqueue processes. In any SAP environment, you do not want the ASCS (ABAP SAP Central Services) and ERS (Enqueue Replication Server) instances residing on the same server for extended periods of time because of the risk of lost locks and canceled jobs. To prevent this from occurring, you can use an avoidance strategy that causes the ERS and ASCS instances to always run on opposite cluster nodes. Consider the case of three servers running production workloads; let’s call them Servers Alpha, Beta, and Gamma. Server Alpha is running the ASCS instance, while Server Beta is running the ERS instance. Server Gamma functions as a third node for failovers of both Server Beta (ERS) and Server Alpha (ASCS). If Server Beta crashes, you wouldn’t want the ERS resource to come in service on the same node as the ASCS instance. To prevent this, you can deploy an avoidance strategy that automatically checks first and ensures the two applications are on separate servers, maintaining SAP ASCS/ERS best practices for lock failover.
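As a rough sketch of the ASCS/ERS separation check described above, the following Python function picks a node for ERS that differs from the ASCS node whenever one is available. The function name `place_ers` and its arguments are hypothetical; real SAP recovery kits implement this with their own logic.

```python
def place_ers(ascs_node, online_nodes, current_ers_node=None):
    """Pick a node for the ERS instance, avoiding the ASCS node if possible.

    online_nodes is listed in failover-priority order. All names here are
    illustrative assumptions, not a product API.
    """
    candidates = [n for n in online_nodes if n != ascs_node]
    if current_ers_node in candidates:
        return current_ers_node  # already separated, no move needed
    if candidates:
        return candidates[0]  # first surviving node that is not the ASCS node
    # Only the ASCS node is left: collocation is unavoidable for now.
    return ascs_node if ascs_node in online_nodes else None


# Server Beta (ERS) has crashed; ASCS is still running on Alpha.
print(place_ers("Alpha", ["Alpha", "Gamma"], current_ers_node="Beta"))  # Gamma
```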
Suppose you have two data centers, City Alpha and City Beta, which are about 70 miles apart, with most of your clients centrally located between them. However, due to recent changes in internal organizations, mergers, closures, acquisitions, and governance requirements, your IT team has to add a third data center in City Gamma, which is about 350 miles from Alpha and Beta. Now the resources that were primarily protected in Alpha and Beta are also extended to the Gamma location. Given that most of the users and teams are near the Alpha and Beta locations, and even the most distant users are located in neighboring cities, your team needs to avoid a failover to the Gamma location. Like the other strategies, DR avoidance seeks to optimize performance, inbound and outbound regional data costs, latency, and client access by avoiding the DR node when only one node within either region fails. It also ensures that even if both nodes fail at different times, failover always occurs to the other node in the cluster or data center before moving to DR.
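One way to picture DR avoidance is as an ordering problem: surviving nodes in the primary sites are tried first, and DR-site nodes are tried last. The sketch below assumes a simple list of node records; the node names, the `site` field, and the function `failover_order` are hypothetical.

```python
def failover_order(failed_node, nodes, dr_site="Gamma"):
    """Return surviving node names with DR-site nodes placed last.

    nodes is a list of dicts with 'name', 'site', and 'online' keys
    (an assumed structure for illustration only).
    """
    survivors = [n for n in nodes if n["name"] != failed_node and n["online"]]
    # sorted() is stable, so nodes keep their listed order within each group.
    ordered = sorted(survivors, key=lambda n: n["site"] == dr_site)
    return [n["name"] for n in ordered]


cluster = [
    {"name": "alpha-1", "site": "Alpha", "online": True},
    {"name": "gamma-1", "site": "Gamma", "online": True},
    {"name": "beta-1", "site": "Beta", "online": True},
]
print(failover_order("alpha-1", cluster))  # ['beta-1', 'gamma-1']
```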
So, how do you deploy an avoidance strategy?
Many providers have affinity rules that can be configured, while others use a combination of server priorities or manual steps. In the case of the SIOS Protection Suite for Linux, you can use a number of built-in methods including:
In the event of a failure, resources will fail over to the surviving server where they have the lowest remaining priority number and cascade to any additional servers. Consider three servers (Alpha, Beta, and Gamma): Server Alpha is the primary server for Resource.HR, Server Beta is the primary server for Resource.MFG, and Server Gamma is the backup server for all resources/servers. Using resource prioritization, Resource.HR would have a priority of one (1) on Server Alpha and a priority of two (2) on Server Gamma, while Resource.MFG could have a priority of one (1) on Server Beta and a priority of two (2) on Server Gamma. If customers wanted to optimize the use of the environment, then Resource.HR could have a priority of three (3) on Server Beta and Resource.MFG could have a priority of three (3) on Server Alpha. In the event of a failure of Server Alpha, Resource.HR would fail over to Server Gamma first before trying to come in-service (be restored) on Server Beta.
SIOS Protection Suite for Linux (UI and CLI) allows users to specify a priority for each server and resource combination.
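The priority scheme above can be modeled with a small lookup table, which makes the cascade order easy to verify. This is a plain Python model of the example priorities and not the SIOS UI or CLI; the `PRIORITIES` table and `next_server` function are illustrative names.

```python
# Priority numbers from the example above: 1 is tried first, then 2, then 3.
PRIORITIES = {
    "Resource.HR": {"Alpha": 1, "Gamma": 2, "Beta": 3},
    "Resource.MFG": {"Beta": 1, "Gamma": 2, "Alpha": 3},
}


def next_server(resource, failed_servers, priorities=PRIORITIES):
    """Return the surviving server with the lowest priority number, or None."""
    table = priorities[resource]
    alive = [server for server in table if server not in failed_servers]
    return min(alive, key=table.get) if alive else None


print(next_server("Resource.HR", {"Alpha"}))           # Gamma
print(next_server("Resource.HR", {"Alpha", "Gamma"}))  # Beta
```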
Policy rules can also be used to prevent a resource recovery from occurring on a given server and thereby allowing a resource to avoid a specified server that may be running a more critical or resource-intensive workload. Typical policies include:
The SIOS Protection Suite for Linux CLI allows users to specify policy rules that can disable failover of a specific resource to a specified server, provide temporal policies guarding against failures, disable failover for a specific application type, and define constraint policies and custom policies.
The most granular way to establish a resource avoidance strategy is to deploy specific avoidance scripts within each hierarchy. This method allows the user to configure specific applications (e.g., app1 and app2) to avoid one another whenever possible while allowing other applications to run without restriction. In the case of our three servers, Alpha, Beta, and Gamma, and three resources, app1, app2, and app3, this method would provide the greatest flexibility. In this example, app1 and app2 will seek to avoid collocation when a server fails, but app3 will fail over to the next available node based on priorities without any collocation restrictions.
For additional examples of avoidance strategies and resources, consider the SIOS Protection Suite for Linux documentation. If a customer has two applications, app1 and app2, that they require to run on different nodes whenever possible, the customer can create two avoidance terminal leaf node resources using the SIOS Protection Suite for Linux gen/app resource and the ‘/opt/LifeKeeper/lkadm/bin/avoid_restore’ script.
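The app1/app2/app3 behavior described above can be illustrated with a conceptual model. The sketch below does not reproduce what the `avoid_restore` script actually does or how it is invoked; the `AVOID` table and `choose_node` function are hypothetical and only show the "avoid whenever possible" decision.

```python
# Which applications try to stay apart. app3 has no entry, so it is unrestricted.
AVOID = {"app1": {"app2"}, "app2": {"app1"}}


def choose_node(app, candidates, running):
    """Pick a node for app from candidates (listed in priority order).

    running maps node name -> set of applications currently in service there.
    Nodes hosting an avoided application are skipped when an alternative exists.
    """
    avoided = AVOID.get(app, set())
    for node in candidates:
        if not (running.get(node, set()) & avoided):
            return node
    # Every candidate hosts an avoided app: fall back to plain priority order.
    return candidates[0] if candidates else None


# Alpha (which ran app1) has failed; Beta runs app2 and Gamma runs app3.
running = {"Beta": {"app2"}, "Gamma": {"app3"}}
print(choose_node("app1", ["Beta", "Gamma"], running))  # Gamma
```

The fallback branch reflects the "whenever possible" wording: if Gamma were also down, app1 would still come in service on Beta, sharing a node with app2 while it waits for another node.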
Reproduced from SIOS
|November 23, 2021||
Introduction To Clusters – Part 2
What Types of Clusters Are There and How Do They Work?
An Overview of HA Clusters, and Load Balancing Clusters
Clustering helps improve reliability and performance of software and hardware systems by creating redundancy to compensate for unforeseen system failure. If a system is interrupted due to hardware or software failure or natural disaster, this can have a major impact on business and revenue, wasting crucial time and expense to get things back up and running.
This is where clustering comes in. There are three main types of clustering solutions – HA clusters, load balancing clusters, and HPC clusters. Which type will best increase system availability and performance for your business? Let’s have a look at the three types of clustering solutions in more detail below.
What is HA Clustering?
High Availability clustering, also known as HA clustering, is effective for mission-critical business applications, ERP systems, and databases, such as SQL Server, SAP, and Oracle, that require near-continuous availability.
HA clustering can be divided into two types: active-active configuration and active-standby (also called active-passive) configuration.
Let’s take a look at the difference between these two HA clustering types.
HA Clustering Type 1: Active-Active Configuration
In the active-active configuration, processing is performed on all nodes in the cluster. For example, in the case of two-node clustering, both nodes are active. If one node stops, its processing will be taken over by the other.
However, if each node is operating at close to 100% and one node stops, it will be difficult for another node to take on the additional processing load. Therefore, capacity planning with a margin is important for HA clustering.
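The capacity margin mentioned above can be put into numbers. Assuming all nodes have equal capacity and a failed node's load is spread evenly across the survivors (both are simplifying assumptions), each node must stay at or below (N - f) / N utilization to absorb f node failures in an N-node cluster. The helper name `max_safe_utilization` is hypothetical.

```python
def max_safe_utilization(node_count, failures_tolerated=1):
    """Highest per-node utilization that still absorbs the given failures.

    Assumes equal-capacity nodes and even redistribution of the failed load.
    """
    if node_count < 2 or not 0 <= failures_tolerated < node_count:
        raise ValueError("need at least two nodes and fewer failures than nodes")
    return (node_count - failures_tolerated) / node_count


for n in (2, 3, 4):
    print(f"{n} nodes: keep each node at or below {max_safe_utilization(n):.0%}")
```

For the two-node case in the text, this works out to 50%: if both nodes run at 50%, the survivor reaches 100% after a failure, and anything above that starting point cannot be absorbed.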
HA Clustering Type 2: Active-Standby Configuration
Let’s use our two-node example again. In the active-standby configuration, one node is configured as the active node and the other node is configured as the standby node. The active node and the standby node exchange signals called “heartbeats” to constantly check whether they are operating normally.
If the standby node cannot receive the heartbeat of the active node, the standby node determines that the active node has stopped and will take over the processing of the active node. This mechanism is called “failover”. Conversely, the mechanism that recovers the stopped operating node and transfers the processing back to the recovered active node is called “failback.”
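The heartbeat and failover mechanism can be sketched as a small state machine. This is a simplified model with a single timeout and explicit timestamps; the class name `StandbyMonitor` and the three-second timeout are assumptions, and production clustering software typically uses multiple communication paths and additional safeguards.

```python
class StandbyMonitor:
    """Standby-side view of the active node's heartbeats (illustrative only)."""

    def __init__(self, timeout_seconds=3.0):
        self.timeout = timeout_seconds
        self.last_heartbeat = None  # time of the most recent heartbeat seen
        self.role = "standby"

    def record_heartbeat(self, now):
        self.last_heartbeat = now

    def check(self, now):
        """Promote this node to active if heartbeats have stopped for too long."""
        silent_too_long = (
            self.last_heartbeat is not None
            and now - self.last_heartbeat > self.timeout
        )
        if self.role == "standby" and silent_too_long:
            self.role = "active"  # failover: take over processing
        return self.role


monitor = StandbyMonitor(timeout_seconds=3.0)
monitor.record_heartbeat(now=10.0)
print(monitor.check(now=12.0))  # standby
print(monitor.check(now=14.5))  # active
```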
In an active-standby configuration, recovery is relatively easy because a failure triggers a simple switch from the active node to the standby node. However, keep in mind that the standby node’s resources sit idle while the active node is operating normally.
Two Components of HA Clustering: Application and Storage
For an HA cluster to be effective, two areas need to be addressed: application orchestration and storage protection. Clustering software monitors the health of the application being protected and, if it detects an issue, moves operation of that application over to the standby node. The standby node needs access to the most up-to-date versions of data, preferably identical to the data that the primary node was accessing before the incident. This can be accomplished in two ways: shared storage or shared-nothing storage. In the shared storage model, both cluster nodes access the same storage, typically a SAN. In shared-nothing (aka SANless) configurations, local storage on all nodes is mirrored using replication software.
Clustering software products vary widely in their ability to monitor and detect issues that may cause application failure and in their ability to orchestrate failovers reliably. Many clustering products only detect whether the application server is operational, but do not detect a wide range of software, services, network, and other issues that can cause application failure.
Application Awareness is Essential
Similarly, complex ERP and database applications have multiple component parts that have to be stored on the correct server or instance, started up in the right order, and brought on line in accordance with complex best practices. Choose a clustering software with specialized software called application recovery kits designed specifically to maintain best practices for the application/database-specific requirements.
There are multiple ways to configure an HA Cluster:
Traditional Two Node Clusters with Shared Storage
Two Node SANless Cluster
Clusters can be configured using local LAN and high speed synchronous block-level replication.
Real-time replication can be used to synchronize storage on the primary server with storage on a standby server located in the same data center, in your disaster recovery site, or both. This allows you to build high availability and disaster recovery configurations flexibly, with two nodes or multiple nodes. SIOS block-level replication is highly optimized for performance. You can even use high-speed, locally attached storage such as PCIe flash storage devices on your physical servers to achieve very low cost, high performance, high availability configurations. Both your application and the data on the flash device are protected.
Third Node for Disaster Protection
This configuration uses a SAN-based cluster and adds a third, SANless node in a remote data center or the cloud to achieve full disaster recovery protection. In the event of a disaster, the standby remote physical server is brought into service automatically with no data loss, eliminating the hours needed for restoration from backup media.
What is a Load Balancing Cluster?
A load balancing cluster uses a load balancer to distribute processing across multiple nodes so that they can be used as a single system with improved performance. While it can isolate a failed node to prevent that failure from affecting the entire system, the load balancer itself is a critical single point of failure, so it is not a high availability option: if the load balancer fails, the entire system stops. It is only effective for applications such as web server load balancing.
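A minimal round-robin balancer shows both behaviors described above: requests are spread across nodes, and a node marked as failed is skipped. The class name `RoundRobinBalancer` and the node names are hypothetical, and health detection is left to the caller.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Distribute requests across healthy nodes in turn (illustrative only)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def route(self):
        """Return the next healthy node, or None if every node is down."""
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        return None


lb = RoundRobinBalancer(["web1", "web2", "web3"])
lb.mark_down("web2")
print([lb.route() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```

Note that the balancer object itself is the single point of failure in this picture: if it stops, no request reaches any node, however healthy the nodes are.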
What is HPC Clustering?
You can also use clustering for performance instead of high availability. High-Performance Computing clusters, or HPC clusters, combine the processing power of multiple (sometimes thousands of) nodes to get the CPU performance needed in CPU-intensive environments such as scientific and technological environments requiring large-scale simulations, CAE analysis, and parallel processing.
Are you ready to find the right HA clustering solution for your business?
Learn more about SIOS High Availability clustering here.
Reproduced with permission from SIOS
|November 18, 2021||
Introduction To Clusters – Part 1
What is clustering in the first place?
Clustering is a technology that allows you to connect multiple servers so that they act as a single functional unit.
|November 13, 2021||
Clustering Software for High Availability and Disaster Recovery
What is Clustering Software?
Clustering software lets you configure your servers as a grouping or cluster so that multiple servers can work together to provide availability and prevent data loss. Each server maintains the same information – operating systems, applications, and data. If one server fails, another server immediately picks up the workload. IT professionals rely on clustering to eliminate a single point of failure and minimize the risk of downtime. In fact, 86 percent of all organizations are operating their HA applications with some kind of clustering or high availability mechanism in place.
Types of Cluster Management Software
There are a variety of cluster management software solutions available for Windows and Linux distributions. Examples include:
Except for SIOS, these products support a single operating system or require expensive SAN hardware, constraining flexibility and deployment options. Moreover, Linux open-source HA extensions require a high degree of technical skill, creating complexity and reliability issues that challenge most operators.
SIOS products uniquely protect any Windows- or Linux-based application operating in physical, virtual, cloud or hybrid cloud environments and in any combination of site or disaster recovery scenarios. Applications such as SAP and databases, including Oracle, SQL Server, DB2, SAP HANA and many others, benefit from SIOS software. The “out-of-the-box” simplicity, configuration flexibility, reliability, performance, and cost-effectiveness of SIOS products set them apart from other clustering software.
How SIOS Clustering Software Provides High Availability for Windows and Linux Clusters
If you are running a critical application in a Windows or Linux environment, you may want to consider SIOS Technology Corporation’s high availability software clustering products.
In a Windows environment, SIOS DataKeeper Cluster Edition seamlessly integrates with and extends Windows Server Failover Clustering (WSFC) by providing a performance-optimized, host-based data replication mechanism. While WSFC manages the software cluster, SIOS performs the replication to enable disaster protection and ensure zero data loss in cases where shared storage clusters are impossible or impractical, such as in cloud, virtual, and high-performance storage environments.
In a Linux environment, the SIOS Protection Suite for Linux provides a tightly integrated combination of high availability failover clustering, continuous application monitoring, data replication, and configurable recovery policies, protecting your business-critical applications from downtime and disasters.
Whether you are in a Windows or Linux environment, SIOS products free your IT team from the complexity and challenges of computing infrastructures. They provide the intelligence, automation, flexibility, high availability, and ease-of-use IT managers need to protect business-critical applications from downtime or data loss. With over 80,000 licenses sold, SIOS is used by many of the world’s largest companies.
Here is one case study that discusses how a leading Hospital Information Systems (HIS) provider deployed SIOS DataKeeper Cluster Edition to improve high availability and network bandwidth in their Windows cluster environment.
How One HIS Provider Improved RPO and RTO With SIOS DataKeeper Clustering Software
This leading HIS provider has more than 10,000 U.S.-based health care organizations (HCOs) using a variety of its applications, including patient care management, patient self-service, and revenue management. To support these customers, the organization had more than 20 SQL Server clusters located in two geographically dispersed data centers, as well as a few smaller servers and SQL Server log shipping for disaster recovery (DR).
The organization has a large customer base and vast IT infrastructure and needed a solution that could handle heavy network traffic and eliminate network bandwidth problems when replicating data to its DR site. The organization also needed to improve its Recovery Point Objective (RPO) and Recovery Time Objective (RTO) to reduce the volume of data at risk and get IT operations back up and running faster after a disaster or system failure. RPO is the maximum amount of data loss that can be tolerated when a server fails, or a disaster happens. RTO is the maximum tolerable duration of any outage.
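To make the RPO and RTO definitions concrete, the sketch below checks a single incident against both objectives. It expresses RPO as a time window of unreplicated data, which is a common convention but an assumption here; the function name `meets_objectives` and all timestamps are invented for illustration.

```python
from datetime import datetime, timedelta


def meets_objectives(last_replicated, failed_at, restored_at, rpo, rto):
    """Compare one incident against RPO and RTO targets.

    The data-loss window is the time between the last replicated write and
    the failure; the outage is the time between failure and restoration.
    """
    data_loss_window = failed_at - last_replicated
    outage = restored_at - failed_at
    return {"rpo_met": data_loss_window <= rpo, "rto_met": outage <= rto}


result = meets_objectives(
    last_replicated=datetime(2021, 11, 1, 9, 58),
    failed_at=datetime(2021, 11, 1, 10, 0),
    restored_at=datetime(2021, 11, 1, 10, 20),
    rpo=timedelta(minutes=5),
    rto=timedelta(minutes=15),
)
print(result)  # {'rpo_met': True, 'rto_met': False}
```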
To address these challenges, this organization chose SIOS DataKeeper Cluster Edition, which provides seamless integration with WSFC, making it possible to create SANless clusters.
Once SIOS DataKeeper Cluster Edition passed the organization’s stringent POC testing, the IT team deployed the solution in the company’s production environment. The team deployed SIOS across a three-node cluster comprised of two SAN-based nodes in the organization’s primary, on-premises data center and one SANless node in its remote DR site.
The SIOS solution synchronizes replication across the three nodes in the cluster and eliminates the bandwidth issues at the DR site, improving both RPO and RTO and reducing the cost of bandwidth. Today, the organization uses SIOS DataKeeper Cluster Edition to protect their SQL Server environment across more than 18 cluster nodes.
How SIOS Clustering Software Works
SIOS software is an essential part of your cluster solution, protecting your choice of Windows or Linux environments in any configuration (or combination) of physical, virtual and cloud (public, private, and hybrid) environments without sacrificing performance or availability.
If you need fast, efficient replication to transfer data across low-bandwidth local or wide area networks, SIOS DataKeeper protects business-critical Windows environments, including Microsoft SQL Server, Oracle, SharePoint, Lync, Dynamics, and Hyper-V, from downtime and data loss in a physical, virtual, or cloud environment.
SIOS Protection Suite for Linux supports all major Linux distributions, including Red Hat Enterprise Linux, SUSE Linux Enterprise Server, CentOS, and Oracle Linux and accommodates a wide range of storage architectures.
 SIOS in partnership with ActualTech Research, (2018) The State of Application High Availability Survey Report
Reproduced from SIOS
|November 8, 2021||
Disaster Recovery Fundamentals
Disaster recovery overview