Failover Cluster

December 18, 2021 by Jason Aw

Failover Cluster Software Solutions: What You Need to Know

What is Failover?

The “Tell All” on Failover

Failover is the process by which a standby, redundant system, database, or network assumes operations when the primary system, database, or network fails, or primary operations are abnormally terminated. Hot failover is one of the key design principles incorporated into high availability and disaster recovery systems.
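The takeover described above can be sketched as a toy active/standby pair driven by heartbeats. This is only an illustration under stated assumptions: the `Node` and `FailoverPair` names, the node names, and the three-missed-heartbeats threshold are hypothetical and do not describe how SIOS or WSFC detect failures.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


class FailoverPair:
    """Toy active/standby pair: the standby takes over after missed heartbeats."""

    def __init__(self, primary: Node, standby: Node, max_missed: int = 3):
        self.active = primary
        self.standby = standby
        self.max_missed = max_missed  # assumed threshold, not a product default
        self.missed = 0

    def heartbeat(self) -> str:
        """Process one heartbeat interval and return the active node's name."""
        if self.active.healthy:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed and self.standby.healthy:
                # Failover: the standby assumes operations.
                self.active, self.standby = self.standby, self.active
                self.missed = 0
        return self.active.name


pair = FailoverPair(Node("node-a"), Node("node-b"))
pair.heartbeat()              # node-a is healthy and stays active
pair.active.healthy = False   # simulate a primary failure
history = [pair.heartbeat() for _ in range(3)]
print(history)
```

Requiring several missed heartbeats before switching is a common way to avoid failing over on a single dropped packet; the trade-off is that a higher threshold lengthens the outage.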

The Recovery Time Objective (RTO) is the maximum tolerable duration of any outage. Online transaction processing applications generally have the lowest RTOs, and those that are mission-critical often have an RTO of only a few seconds.

The Recovery Point Objective (RPO) is the maximum amount of data loss that can be tolerated when a failure happens. For HA, RPO is often zero to specify there should be zero data loss under all failure scenarios.

Let’s Understand Failover

First, we need to discuss the difference between cold, warm, and hot standby servers:

A cold server (sometimes referred to as a cold failover) is one that is not connected to the primary server but is available and turned on only when the primary server goes down. With a cold server, it can take considerable time to power up the standby server, which may require an updated configuration and software. This means that Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) are the longest. Cold failover is considered unacceptable for mission-critical applications.
A warm server/failover is one that periodically receives updates from the primary server through data replication and mirroring.
A hot server/failover is one that receives regular updates from the primary server and is immediately available to take over in the event of a failover. Hot failover is the most resource-intensive option, results in zero data loss (zero RPO) and RTOs of no more than a few minutes, and is required to support high availability for mission-critical applications.

SIOS Failover Cluster Software supports hot failovers, delivering a zero RPO and an RTO of milliseconds for high availability. The result: a system failure causes no data loss and is transparent to the user.

SIOS Failover Cluster Software

SIOS offers failover cluster software so you can build a traditional shared storage cluster for high availability or a SIOS SANless cluster that uses local storage. SIOS software uses real-time synchronous (for LAN environments) or asynchronous (for WAN environments) data replication to synchronize storage.

SIOS solutions provide high availability and disaster recovery with a single solution. This approach eliminates the cost and complexity of SAN-based replication. With the ability to replicate to multiple targets, you can configure a multi-node failover cluster with nodes located in multiple locations to protect your systems from disasters.

With SIOS clusters, you can replicate between the configurations of your choice – between SAN and SANless environments and any combination of physical, virtual, cloud, and hybrid configurations. In fact, SIOS clustering solutions are unique in the breadth of operating systems, applications, and infrastructure environments supported, including Windows, Linux, SAP, SQL Server, Oracle, AWS, Azure, and Google cloud platforms.

SIOS Failover Cluster Software uses efficient block-level replication to keep local storage synchronized, enabling the secondary nodes in your cluster to continue to operate after a failover with access to the most recent data.

In a Windows environment, SIOS DataKeeper gives you the flexibility to build a Windows cluster in any combination of physical, virtual, and cloud environments. SIOS DataKeeper Cluster Edition seamlessly integrates with and extends Windows Server Failover Clustering (WSFC) by providing a performance-optimized, host-based data replication mechanism.

In a Linux environment, SIOS LifeKeeper and SIOS DataKeeper provide a tightly integrated combination of high availability failover clustering, continuous application monitoring, data replication, and configurable recovery policies, protecting your business-critical applications from downtime and disasters.

Let’s review one case study that speaks to the benefits of one of SIOS’ solutions – SIOS DataKeeper.

A SIOS Failover Cluster Software Solution in Action

Located in The Netherlands, Van de Lande BR (VDL) manufactures a wide range of PVC and PE compression fittings and valves. Their products are used all over the world in industrial and technical installations. Manufacturing more than 4500 different products, VDL is deeply committed to product improvement and quality, making their brand the choice for builders of systems and installations for more than 50 years.

VDL has a Hyper-V environment, solid state disk (SSD) storage, and its business relies heavily on its ERP solution. With only one data processing system in place, VDL was exposed and needed a disaster recovery (DR) solution to ensure the protection and availability of its ERP, web services, and other mission-critical systems in the event of a disaster.

To deliver immediate failover and DR protection, VDL built a Windows Server Failover Clustering (WSFC) system, with one node replicating data to the other node. If the primary node fails, WSFC transfers all operations to the standby node, giving users continuous access to applications and data.

VDL also chose SIOS DataKeeper Cluster Edition to provide DR for its Hyper-V virtual machines (VMs). SIOS DataKeeper is a software add-on that provides uninterrupted data access. It seamlessly integrates with WSFC to add performance-optimized, host-based, synchronous or asynchronous real-time replication of Hyper-V VMs between physical servers across both LAN and WAN connections. Working with WSFC, SIOS DataKeeper monitors system and application health, maintains client connectivity, and makes it possible to create SANless clusters. Unlike a SAN-based cluster, a SANless cluster eliminates single points of failure and reduces the cost and complexity of deploying clusters.

SIOS DataKeeper also uses WSFC to provide system administrators with a familiar and application-agnostic HA/DR solution, thereby dramatically simplifying implementation and operation.

VDL deployed two SIOS DataKeeper clusters; one two-node cluster works as a file server and iSCSI server while the other supports a SQL Server (ERP) cluster and Dynamics NAV web services. The implementation took less than one day. During the system failover test, the network services team failed over and failed back the system quickly and easily. After a thorough evaluation of the VDL server configuration and the completion of testing, the installation team confirmed that SIOS DataKeeper and a SANless cluster met all their criteria for disaster recovery, performance, and high availability of their ERP system, web services, and other mission-critical applications. The organization no longer risks data loss in the event of a failure.

One Last Thing

In addition to testing SIOS, VDL tested other solutions with unacceptable results. Comments Maurits van de Lande, ICT Manager at VDL, “We have tested our file server with both DFS replication and AlwaysOn technology. Neither delivered an automated disaster recovery solution to match SIOS DataKeeper, which fully addresses our DR requirements.”

———————————————————————————————————————————-

Regardless of your IT environment, your organization can reap the benefits of SIOS DataKeeper Cluster Edition and DataKeeper Standard Edition, both of which provide configuration flexibility, reduce data transfer costs, eliminate single points of failure, reduce complexities, and optimize network performance.

For more information, contact us or request a free trial.

References:

Reproduced from SIOS

Data Replication

December 13, 2021 by Jason Aw

Real-Time Data Replication for High Availability

What is Data Replication?

Data replication is the process by which data residing on one or more physical or virtual servers or cloud instances (the primary instance) is continuously copied to one or more secondary servers or cloud instances (the standby instance). Organizations replicate data to support high availability, backup, and/or disaster recovery. Depending on the location of the secondary instance, data is replicated either synchronously or asynchronously. How the data is replicated impacts Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs).

For example, if you need to recover from a system failure, your standby instance should be on your local area network (LAN). For critical database applications, you can then replicate data synchronously from the primary instance across the LAN to the secondary instance. This makes your standby instance “hot” and in sync with your active instance, so it is ready to take over immediately in the event of a failure. This is referred to as high availability (HA).

In the event of a disaster, you want to be sure that your secondary instance is not co-located with your primary instance. This means you want your secondary instance in a geographic site away from the primary instance or in a cloud instance connected via a WAN. To avoid negatively impacting throughput performance, data replication on a WAN is asynchronous. This means that updates to standby instances will lag updates made to the active instance, resulting in a delay during the recovery process.
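The difference between the two replication modes can be sketched in a few lines. This is a simplified in-memory model, not SIOS DataKeeper's implementation: the `Replica`, `SyncReplicator`, and `AsyncReplicator` classes and the record strings are hypothetical, and a plain queue stands in for the WAN link.

```python
from collections import deque


class Replica:
    """Stands in for a storage volume; it only keeps an ordered log of writes."""

    def __init__(self):
        self.log = []


class SyncReplicator:
    """Acknowledges a write only after the standby has applied it."""

    def __init__(self, primary: Replica, standby: Replica):
        self.primary, self.standby = primary, standby

    def write(self, record: str) -> None:
        self.primary.log.append(record)
        self.standby.log.append(record)  # standby is updated before the ack


class AsyncReplicator:
    """Acknowledges immediately; queued records reach the standby later."""

    def __init__(self, primary: Replica, standby: Replica):
        self.primary, self.standby = primary, standby
        self.queue = deque()

    def write(self, record: str) -> None:
        self.primary.log.append(record)
        self.queue.append(record)  # shipped later, for example over a WAN

    def ship(self, count: int = 1) -> None:
        """Deliver up to `count` queued records to the standby."""
        for _ in range(min(count, len(self.queue))):
            self.standby.log.append(self.queue.popleft())

    def lag(self) -> int:
        """Number of writes the standby has not received yet."""
        return len(self.queue)


lan = SyncReplicator(Replica(), Replica())
wan = AsyncReplicator(Replica(), Replica())
for record in ("order-1", "order-2", "order-3"):
    lan.write(record)
    wan.write(record)
wan.ship(1)
print(lan.standby.log == lan.primary.log, wan.lag())
```

In this model the synchronous standby always matches the primary, while the asynchronous standby trails by whatever is still queued. Those queued writes are what would be at risk if the primary site were lost at that moment.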

Why Replicate Data to the Cloud?

There are five reasons why you want to replicate your data to the cloud.

As we discussed above, cloud replication keeps your data offsite and away from the company’s site. While a major disaster, such as a fire, flood, storm, etc., can devastate your primary instance, your secondary instance is safe in the cloud and can be used to recover the data and applications impacted by the disaster.
Cloud replication is less expensive than replicating data to your own data center. You can eliminate the costs associated with maintaining a secondary data center, including the hardware, maintenance, and support costs.
For smaller businesses, replicating data to the cloud can be more secure, especially if you do not have security expertise on staff. The physical and network security that cloud providers offer is difficult for a small business to match.
Replicating data to the cloud provides on-demand scalability. As your business grows or contracts, you do not need to invest in additional hardware to support your secondary instance or have that hardware sit idle if business slows down. You also have no long-term contracts.
When replicating data to the cloud, you have many geographic choices, including having a cloud instance in the next city, across the country, or in another country as your business dictates.

Why Replicate Data Between Cloud Instances?

While cloud providers take every precaution to ensure 100 percent uptime, individual cloud servers can still fail as a result of physical damage to the hardware or software glitches, the same reasons on-premises hardware fails. For this reason, organizations that run their mission-critical applications in the cloud should replicate their cloud data to support high availability and disaster recovery. You can replicate data between availability zones in a single region, between regions in the cloud, between different cloud platforms, to on-premises systems, or any hybrid combination.

SIOS Real-Time Data Replication for High Availability and Disaster Recovery

SIOS DataKeeper™ uses efficient, block-level data replication to keep your primary and secondary instances synchronized. If a failover happens, the secondary instance(s) continues to operate, providing users with access to the most recent data. With SIOS solutions, RPO is always zero and RTO is dependent on the application but typically 30 seconds to a few minutes.

SIOS products uniquely protect any Windows- or Linux-based application operating in physical, virtual, cloud or hybrid cloud environments and in any combination of site or disaster recovery scenarios, enabling high availability and disaster recovery for applications such as SAP and databases, including Oracle, HANA, MaxDB, SQL Server, DB2, and many others. The “out-of-the-box” simplicity, configuration flexibility, reliability, performance, and cost-effectiveness of SIOS products set them apart from other clustering software.

In a Windows environment, SIOS DataKeeper Cluster Edition seamlessly integrates with and extends Windows Server Failover Clustering (WSFC) by providing a performance-optimized, host-based data replication mechanism. While WSFC manages the software cluster, SIOS performs the data replication to enable disaster protection and ensure zero data loss in cases where shared storage clusters are impossible or impractical, such as in cloud, virtual, and high-performance storage environments.

———————————————————————————————————————————

Here is a real-world example of how one leading manufacturing company uses SIOS to create a high availability solution in the cloud using real-time data replication.

How to Achieve HA in a Cloud Environment with Real-Time Data Replication

Bonfiglioli is a leading Italian design, manufacturing, and distribution company, specializing in industrial automation, mobile machinery, and wind energy products and employing over 3,600 employees in locations around the globe. To run its business, the company relies on various mission-critical applications, including its SAP ERP system. The company’s IT infrastructure includes an on-premises VMware data center and a remote data center for business continuity and disaster protection. Since most of their applications run in a Windows environment, Bonfiglioli used guest-level Windows Server failover clustering in their VMware environment to provide high availability and disaster protection.

The company’s IT team implemented a program to move part of its IT operations into the Microsoft Azure cloud and to leverage Azure as their disaster recovery site. An important requirement of the company’s migration plan was to ensure the cloud architecture could provide better high availability protection than before and ensure Bonfiglioli could continue to meet its strict Service Level Agreements (SLAs).

In its on-premises environment, the company uses VMware clustering, which allows Windows Server Failover Clustering (WSFC) to manage failover to a secondary server in the event of an infrastructure failure. However, it was a challenge to provide this type of protection in the cloud because guest clustering with shared-bus disks is not a viable cloud solution. Creating a cluster in VMware using Raw Device Mapping (RDM) and shared-bus disks is also challenging and creates limitations for backing up the virtual machines.

The Solution

After evaluating several solutions, Bonfiglioli chose SIOS DataKeeper as their cloud high availability and disaster recovery solution upon learning that SIOS DataKeeper is the only certified high availability clustering solution for SAP in a public cloud. In addition, Bonfiglioli’s management consulting partner, BGP, had experience with SIOS DataKeeper and knew that it is easy to install, transparent to the operating system, and a proven, highly effective solution.

With SIOS, the IT team fashioned a cluster environment without RDM. They created a two-node cluster in VMware and added SIOS DataKeeper Cluster Edition to synchronize storage via real-time data replication in each cluster instance. In an on-premises environment, synchronized storage appears to WSFC as a single shared storage disk.

SIOS DataKeeper also provides high availability protection for the company’s SAP instance and eliminates single points of failure. Using SIOS DataKeeper, the IT team replicated an SSD-tiered disk partition in the company’s on-premises data center using real-time data replication. This allows Bonfiglioli to restore their virtual machines to Microsoft Azure in the event of a disaster.

The Results

Daniele Bovina, Systems Architect at Bonfiglioli, comments about the results, “SIOS DataKeeper gave us an easy way to move our business-critical SAP system to the Microsoft Azure cloud while meeting our stringent SLAs for availability, disaster recovery, and performance.”

—————————————————————————————————————————–

For more information about SIOS Clustering Solutions, contact us or request a free trial.

References

Reproduced from SIOS

Achieving IT Resilience with High Availability

December 8, 2021 by Jason Aw

What is IT Resilience?

IT resilience is the ability of an organization to maintain acceptable service levels when there is a disruption of business operations, critical processes, or your IT ecosystem. In this digital age, high availability is critical to your organization’s success. Your customers won’t tolerate a downed website. And you cannot afford a downed ERP, CRM, or other business-critical system either. This is where high availability comes in.

Your organization must “check the boxes” on many different technologies and solutions to ensure IT resiliency – not the least among them is ensuring, at a minimum, that you have backup, disaster recovery, cyber resilience, and high availability solutions in place. For purposes of this article, we will be talking about high availability (HA) as one of the key elements required to ensure IT resiliency.

What is High Availability?

High availability systems ensure that business operations continue, with total transparency to customers and users, when your system, applications, or network go down. HA is a component of a technology system that eliminates single points of failure to ensure continuous operations or uptime for an extended period. Highly available systems incorporate five design principles: automatic failover, automatic detection of application-level failures, no data loss, automatic and quick failover to redundant components, and push-button failover and failback for planned maintenance.

————————————————————————————————————————–

IT Resilience and High Availability – A Non-Example!

This past August, Nissan Group’s data center in Denver crashed because of a power outage. The affected system is known internally as NNANet. Employees use it to order cars and parts, manage product rebate sales, get information on vehicle recalls, file the warranty claims needed to price and start service work, and get financing information. NNANet is described as Nissan’s lifeblood because everything Nissan does goes through it.

The system remained down for four days, impacting operations at many retailers and production systems at two factories. The company, its retailers, and customers were all impacted.

The Impact

Clearly, this is an example where correctly configured, properly located high availability systems would have saved the day or at least minimized the impact of the crash. What began as a high availability incident turned into a disaster for Nissan, as “commerce among consumers, retailers, distribution networks, manufacturing plants and finance companies” was affected for four days.[1] Nissan reset dealer sales goals by 10 percent for the month as a result of the crash. The total financial impact for Nissan and its dealers, retailers, and partners remains to be seen.

IT Resilience – A Real-World Example!

Cayan™ is the leading provider of payment technologies and its Genius Customer Engagement Platform® aggregates and integrates every conceivable transaction technology, payment type, and customer program – both present and future – into a single platform. The Genius platform, as well as other mission-critical applications at Cayan, run on SQL Server.

Cayan customers include some of the world’s largest online retailers, companies with no tolerance for downtime. “Our top priority is ensuring that our customers can complete transactions continuously 24 hours a day, seven days a week,” said Paul Vienneau, Chief Technology Officer, Cayan.

Cayan needed a high availability and disaster recovery system for their SQL Server database. The company considered a traditional shared storage cluster, but a SAN solution was expensive, complicated to manage, and introduced risk associated with a single point of failure.

For these reasons, Cayan IT staff decided to use SIOS SANless clusters. SANless clusters use local storage, so there is minimal performance overhead and application response times are fast. The SIOS software, SIOS DataKeeper, is integrated with Windows Server Failover Clustering (WSFC). SIOS uses efficient, real-time data replication to synchronize local storage in the primary and remote cluster nodes, making them appear to WSFC as a virtual SAN.

The Impact

Since deploying SIOS SANless clusters, Cayan has not experienced any downtime or data loss. Comments Paul Vienneau, CTO, “We are very pleased with the SIOS DataKeeper software. It met or exceeded our expectations. Implementation and ongoing administration were easy, and we have had zero downtime since we implemented our SIOS SANLess clusters.”

There are no customer satisfaction issues to report, no lost revenues, no unproductive employees, no disruption to the business.

—————————————————————————————————–

SIOS: Achieve IT Resilience with High Availability

SIOS DataKeeper™ uses efficient block-level replication to keep local storage synchronized, enabling the secondary nodes in your cluster to continue to operate after a failover with access to the most recent data.

SIOS products uniquely protect any Windows- or Linux-based application operating in physical, virtual, cloud or hybrid cloud environments and in any combination of site or disaster recovery scenarios, enabling high availability and disaster recovery for applications such as SAP S/4HANA and databases, including Oracle, SQL Server, DB2, and many others. The “out-of-the-box” simplicity, configuration flexibility, reliability, performance, and cost effectiveness of SIOS products set them apart from other clustering software.

In a Windows environment, SIOS DataKeeper Cluster Edition seamlessly integrates with and extends Windows Server Failover Clustering (WSFC) by providing a performance-optimized, host-based data replication mechanism. While WSFC manages the software cluster, SIOS performs the replication to enable disaster protection and ensure zero data loss in cases where shared storage clusters are impossible or impractical, such as in cloud, virtual, and high-performance storage environments.

In a Linux environment, SIOS LifeKeeper™ and SIOS DataKeeper for Linux provide a tightly integrated combination of high availability failover clustering, continuous application monitoring, data replication, and configurable recovery policies, protecting your business-critical applications from downtime and disasters.

Whether you are in a Windows or Linux environment, SIOS products free your IT team from the complexity and challenges of creating and managing high availability computing infrastructures. They provide the intelligence, automation, flexibility, high availability, and ease-of-use IT managers need to protect business-critical applications from downtime or data loss.

SIOS = IT Resilience with HA + DR

Backup, high availability, disaster recovery, and cyber resilience are all important elements in achieving IT resilience. With SIOS solutions, you can “check the box” for both high availability and disaster recovery – two solutions in one. With the ability to replicate to multiple targets, you can configure a multi-node failover cluster with nodes located in multiple locations to protect your systems from failures and disasters.

For more information, and to ensure IT resilience for your organization, get a free demo of SIOS today.

References:

Reproduced with permission from SIOS

How to Achieve High Availability with Clusters

December 3, 2021 by Jason Aw

What is High Availability?

High availability (HA) is a component of a technology system that eliminates single points of failure to ensure continuous operations or uptime for an extended period. High availability clusters are groups of servers that support business-critical applications that require minimal downtime and continuous availability.

All organizations use a variety of business-critical databases and applications, such as data warehouses, e-commerce applications, customer relationship management systems (CRM), financial systems, supply chain management, and business intelligence systems. When a system, database, or application fails, these organizations require high availability protection to keep systems up and running and minimize the risk of lost revenue, unproductive employees, and unhappy customers.

Highly available clusters incorporate five design principles:

They automatically fail over to a redundant system to pick up an operation when an active component fails. This eliminates single points of failure.
They can automatically detect application-level failures as they happen, regardless of the causes.
They ensure no data is lost during a system failure.
They automatically and quickly fail over to redundant components to minimize downtime.
They provide the ability to manually fail over and fail back to minimize downtime during planned maintenance.

TechTarget defines HA as “a system or component that is continuously operational for a desirably long length of time. Availability can be measured relative to ‘100% operational’ or ‘never failing.’ A widely-held but difficult-to-achieve standard of availability for a system or product is known as ‘five 9s’ (99.999%) availability.”

But let’s define High Availability in simple terms:

High Availability ensures your systems, databases, and applications operate when and as needed.

The “when” takes into consideration the percentage of time the application must be up and running. The “as needed” takes into consideration the proper operation of the system, database, and/or applications with no data loss.

Depending on the system and/or application, the required level of availability will differ. For example, with mission-critical applications, such as your eCommerce systems, four 9s (99.99%) availability is considered an industry standard. With 99.99% availability, you can expect no more than 52.60 minutes of downtime per year or 8.64 seconds of downtime per day. However, for non-critical applications and systems, such as a single desktop, two 9s (99%) may be sufficient, which equates to 87.66 hours of downtime per year or 14.40 minutes of downtime per day. When measuring acceptable downtime, it is important that you consider:

Unplanned downtime (e.g., hardware or software failures)
The planned downtime needed for routine hardware and software maintenance
Uptime at the database and application level
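The downtime allowed by a given number of 9s follows from simple arithmetic, which the short sketch below reproduces. The function name `downtime_budget` is illustrative, and the 365.25-day year is an assumption chosen because it matches the 52.60-minute figure quoted for 99.99%.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: average year length, including leap years


def downtime_budget(availability_percent: float) -> dict:
    """Return the downtime allowed by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable = (100 - availability_percent) / 100
    per_day = unavailable * SECONDS_PER_DAY   # seconds of downtime per day
    per_year = per_day * DAYS_PER_YEAR        # seconds of downtime per year
    return {
        "seconds_per_day": per_day,
        "minutes_per_year": per_year / 60,
        "hours_per_year": per_year / 3600,
    }


for target in (99.0, 99.9, 99.99, 99.999):
    budget = downtime_budget(target)
    print(
        f"{target}%: {budget['seconds_per_day']:.2f} s/day, "
        f"{budget['hours_per_year']:.2f} h/year"
    )
```

Each additional 9 cuts the budget by a factor of ten: roughly 87.66 hours per year at 99%, 8.77 hours at 99.9%, and 52.6 minutes at 99.99%.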

Your choice for high availability is dependent on many factors, including how critical the applications are to the business, whether customers are impacted, how often the applications run, how many users are affected, how quickly a database or application must failover to the redundant system, and how much data loss is tolerable.

High Availability Metrics: RTO and RPO

The two metrics normally used to assess HA (and Disaster Recovery (DR) as well) are the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO).

RTO is the maximum tolerable duration of any outage. Online transaction processing applications generally have the lowest RTOs, and those that are mission-critical often have an RTO of only a few seconds.
RPO is the maximum amount of data loss that can be tolerated when a failure happens. For HA, RPO is often zero to specify there should be zero data loss under all failure scenarios.
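As a rough illustration of how the two metrics are applied, the sketch below checks a single outage against a pair of objectives. It assumes that data loss is measured as a time window (the gap between the last replicated write and the failure), and the `Objectives` and `assess_outage` names and the sample timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    rto: timedelta  # maximum tolerable outage duration
    rpo: timedelta  # maximum tolerable window of lost data


def assess_outage(failed_at, restored_at, last_replicated_at, objectives):
    """Compare one outage against the RTO and RPO it was supposed to meet."""
    downtime = restored_at - failed_at
    # Writes made after the last replicated point are lost on failover.
    data_loss_window = max(failed_at - last_replicated_at, timedelta(0))
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= objectives.rto,
        "rpo_met": data_loss_window <= objectives.rpo,
    }


objectives = Objectives(rto=timedelta(seconds=30), rpo=timedelta(0))
failed = datetime(2021, 12, 3, 9, 0, 0)

# Synchronous replication: the standby was current at the moment of failure.
sync_report = assess_outage(failed, failed + timedelta(seconds=12), failed, objectives)

# Asynchronous replication: the standby was five seconds behind.
async_report = assess_outage(
    failed, failed + timedelta(seconds=12), failed - timedelta(seconds=5), objectives
)
print(sync_report["rpo_met"], async_report["rpo_met"])
```

With a zero RPO, only the synchronous case passes; the same outage with a five-second replication lag meets the RTO but misses the RPO.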

However, there is a difference between what RTOs and RPOs you can achieve to support high availability versus disaster recovery. With HA, data replication can be synchronous because your redundant components are on your LAN environment. Active and standby databases can be concurrently updated, enabling full, automatic, real-time recoveries that can satisfy the most demanding RTOs and RPOs. As a result, your standby instance is “hot” and in sync with your active instance, so it is ready to immediately take over in the event of a failure.

However, recovering systems, software, and data in the event of a disaster requires redundant components that are connected over a wide-area network (WAN). This is important because you must keep redundant components in a geographic location away from the active instance. But with a WAN, data replication is asynchronous to avoid negatively impacting throughput performance. This means that updates to standby instances will lag updates made to the active instance, resulting in a delay during the recovery process. Since disasters are rare, some delay may be tolerable, depending on (a) how critical it is to your business to achieve the lowest possible RTO and RPO and (b) how much budget you can allocate to achieve the best RTO and RPO.

How SIOS Helps You Achieve High Availability

SIOS offers a single solution to meet both high availability and disaster recovery needs across a wide variety of operating systems, infrastructure environments, and applications, including SAP, SQL Server, Oracle, and other environments running in SAN-based, shared storage configurations or SANless, local data storage configurations.

Windows Environment: When added to a Windows Server Failover Cluster (WSFC) environment, SIOS DataKeeper lets you create a SANless cluster, where shared storage clusters are impossible or impractical, or add replication for disaster protection in your SAN-based Windows clusters. Fast, efficient host-based replication synchronizes local storage on local and remote cluster nodes, creating a SANLess cluster in any combination of physical, virtual, or cloud environments.
Linux Environments: SIOS Protection Suite for Linux is a packaged clustering software solution that uses SIOS LifeKeeper and SIOS DataKeeper to provide a tightly integrated combination of high availability failover clustering, continuous application monitoring, data replication, and configurable recovery policies to protect your business-critical applications and data from downtime and disasters. SIOS Protection Suite lets you build SAN or SANLess clusters using a wide range of storage devices, including direct-attached storage, iSCSI, and Fibre Channel. SIOS Protection Suite for Linux supports all major Linux distributions, including Red Hat Enterprise Linux, SUSE Linux Enterprise Server, CentOS, and Oracle Linux.

With SIOS solutions, RPO is always zero and RTO is dependent on the application but typically 30 seconds to a few minutes. Let’s discuss one customer’s “SIOS in action” case study using HA clusters at Switzerland’s largest retail company.

Migros Achieves Critical Business Continuity of its POS system with SIOS High Availability Solutions

Migros is Switzerland’s largest retail company, its largest supermarket chain, and the largest employer with more than 100,000 employees. It is also one of the forty largest retailers in the world. Partnering with Realstuff Informatik AG, a Switzerland-based IT service provider and reseller of SIOS solutions, Migros was looking to replace its Point of Sale (POS) system with a new platform that was more efficient to operate and could minimize the threat of downtime.

The new POS system provides price and product assortment information in Migros’ 650 stores and the retailer needed a high availability solution to support day-to-day sales. Without an HA system, employees could not price products or weigh goods if there was a system failure, bringing operations to a standstill. After evaluating options, Migros decided it wanted an open-source server environment that offered high availability and continuous data protection, was independent of a virtual environment, and could be internally operated by the company’s IT staff. To address these requirements, the team picked SIOS Protection Suite for Linux for replication to safeguard POS data.

For system design, customer training, and native language support, Realstuff partnered with the SIOS Competence and Support Center for Central and Eastern Europe, based in Dresden, Germany and operated by Computer Concept. It was important to Migros to get 24x7x365 support during the regional office time from the Competence and Support Center.

Realstuff implemented the SIOS Protection Suite high-availability solution to constantly monitor the POS servers and replicate data. At each store location, two servers are used to ensure continuous data protection. If one server fails, the second instance takes over the work instantaneously. In addition, both servers mirror data assets on the monitoring system. Read the full Migros case study here.

Final Thoughts

The regional Competence and Support Center consulted with Realstuff to provide insight and direction on the implementation and launch and conducted a three-day training workshop to train the Migros team. Richard Huber, manager and a member of the executive board at Realstuff, commented post-deployment that the benefits of the SIOS high availability solution were its flexibility, reliability, ease of use, and assurance that data is kept in sync at all times.

Today, Migros has met its HA requirements with SIOS’s easy-to-use solution, which provides continuous monitoring of servers, storage, applications, databases, and network connections to detect points of failure, reduce downtime, maintain client connectivity, and provide uninterrupted data access.

For more information on SIOS solutions and how SIOS can help you achieve HA in a SQL Server environment, you can read “Why Clustering for SQL Server High Availability” here.

Reproduced from SIOS

Four Reasons To Use An Avoidance Strategy In High Availability

November 28, 2021 by Jason Aw

Four Avoidance Strategies for Improving Cluster Resilience, Performance, and Outcomes

Simple Steps for Deployment in SIOS Protection Suite Cluster Environment

Avoiding something – we’ve all done it before. An old flame we see in the store while walking with our spouse, a salesperson when we aren’t “ready to buy”, and even a boss while we are out on “vacation”. When I was the manager of a development team, I caught a glimpse of a direct report browsing in a store while they were supposed to be out of the office sick. They ducked between clothing racks and scurried down the next aisle and hurried away. We’ve all done it before, and in some cases, for mental health, physical health, or reasons that remain private and personal, we all need some measures of avoidance. Even in HA. So, how do you add avoidance to your High Availability environment, and why?

Four reasons to use an avoidance strategy in High Availability

Better Performance (minimizing server overload)

One reason to use avoidance strategies in HA is to increase application and server performance. Consider the case of three servers running production workloads; let’s call them Server Alpha, Server Beta, and Server Gamma. Servers Alpha and Beta are running critical applications backed by a database, while Server Gamma is running reports and data transformation jobs. In the event of a failure of Server Alpha, a failover to Server Beta would traditionally occur. However, because Server Beta is already running a large workload, the additional application load might overload the server and degrade performance for both applications. So it might be wise to deploy an avoidance strategy to make sure that Server Gamma is chosen as the failover target.
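The Alpha/Beta/Gamma scenario above can be sketched in a few lines of Python. This is only an illustrative model, not how SIOS Protection Suite selects a target: the function name `pick_failover_target`, the load percentages, and the capacity limit of 100 are all assumptions made for the example.

```python
# Illustrative model only; load figures and the capacity limit are assumptions.
def pick_failover_target(failed, loads, incoming_load, capacity=100):
    """Return the surviving server best able to absorb the failed workload."""
    candidates = {s: load for s, load in loads.items() if s != failed}
    if not candidates:
        raise ValueError("no surviving servers")
    # Prefer servers that stay within capacity after taking the extra load.
    fits = [s for s, load in candidates.items() if load + incoming_load <= capacity]
    # If nothing fits, fall back to every survivor rather than refusing to fail over.
    pool = fits or list(candidates)
    return min(pool, key=lambda s: candidates[s])


loads = {"Alpha": 60, "Beta": 70, "Gamma": 20}  # percent of capacity in use
print(pick_failover_target("Alpha", loads, incoming_load=loads["Alpha"]))
```

With these figures, Beta would be pushed past capacity, so Gamma is selected. The fallback branch reflects a design choice: an overloaded failover is usually preferable to no failover at all.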

Performance Optimization

Consider again the scenario of three servers, Alpha, Beta, and Gamma. Servers Alpha and Beta are scaled to handle peak workloads, while Server Gamma is a cost-optimized server. In the event of a failure of Server Alpha and Server Beta, a failover will occur to the cost-optimized server, Gamma. However, this server is not scaled to handle peak workloads, nor the workloads of both Server Alpha and Server Beta at the same time. In this instance, an avoidance strategy can be used to optimize performance by automatically moving one or both of the workloads from Server Gamma as soon as another host is available.

HA Optimization

HA Optimization is another scenario for deploying avoidance strategies. Like the performance optimization strategy, HA optimization is used to ensure that your environment can survive most failure scenarios and that your applications provide the highest level of availability possible at any point in time. HA optimization is important for an application such as SAP with replicated enqueue processes. In any SAP environment, you do not want the ASCS (ABAP SAP Central Services) and ERS (Enqueue Replication Server) instances residing on the same server for extended periods of time because of the risk of lost locks and canceled jobs. To prevent this, you can use an avoidance strategy that causes the ERS and ASCS instances to always run on opposite cluster nodes. Consider the case of three servers running production workloads; let’s call them Servers Alpha, Beta, and Gamma. Server Alpha is running the ASCS instance, while Server Beta is running the ERS instance. Server Gamma functions as a third node for failovers of both Server Beta (ERS) and Server Alpha (ASCS). If Beta crashes, you wouldn’t want the ERS resource to land on Alpha, the node already running the ASCS instance; it should fail over to Gamma instead. To ensure this, you can deploy an avoidance strategy that checks placement before recovery and keeps the two instances on separate servers, maintaining SAP ASCS/ERS best practices for lock failover.

DR Avoidance

Suppose you have two data centers, City Alpha and City Beta, which are about 70 miles apart with most of your clients centrally located between them. However, due to recent internal reorganizations, mergers, closures, acquisitions, and governance requirements, your IT team has to add a third data center in City Gamma, which is about 350 miles from Alpha and Beta. Now the resources that were primarily protected in Alpha and Beta are also extended to the Gamma location. Given that most of the users and teams are near the Alpha and Beta locations, and even the most distant users are located in neighboring cities, your team needs to avoid a failover to the Gamma location. Like the other strategies, a DR avoidance strategy seeks to optimize performance, inbound and outbound regional data costs, latency, and client access by avoiding the DR node when only one node within either region fails. It would also ensure that even if both nodes fail at different times, failover always occurs to the other node in the cluster or data center before moving to DR.
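One way to picture DR avoidance is as an ordering rule over standby nodes: any healthy non-DR node outranks the DR node. The sketch below assumes one node per city and uses hypothetical names (`Node`, `failover_order`, `alpha-node`, and so on); it is a conceptual model rather than a product configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    site: str
    is_dr: bool = False  # True marks the distant disaster-recovery location


def failover_order(nodes, current, healthy):
    """Rank healthy standby nodes so that DR nodes are always tried last."""
    standbys = [n for n in nodes if n.name in healthy and n.name != current]
    # False sorts before True, so non-DR nodes come first; name breaks ties.
    return [n.name for n in sorted(standbys, key=lambda n: (n.is_dr, n.name))]


nodes = [
    Node("alpha-node", "City Alpha"),
    Node("beta-node", "City Beta"),
    Node("gamma-node", "City Gamma", is_dr=True),
]
print(failover_order(nodes, "alpha-node", {"alpha-node", "beta-node", "gamma-node"}))
print(failover_order(nodes, "alpha-node", {"alpha-node", "gamma-node"}))
```

In the first call the Beta node is ranked ahead of Gamma; in the second, with Beta already down, Gamma is the only remaining target, so the DR site is used only as a last resort.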

So, how do you deploy an avoidance strategy?

Many providers have affinity rules that can be configured, while others use a combination of server priorities or manual steps. In the case of the SIOS Protection Suite for Linux, you can use a number of built-in methods including:

Resource prioritization

In the event of a failure, resources will fail over to the surviving server where they have the lowest remaining priority value and then cascade to any additional servers. Consider three servers (Alpha, Beta, and Gamma): Server Alpha is the primary server for Resource.HR, Server Beta is the primary server for Resource.MFG, and Server Gamma is the backup server for all resources/servers. Using resource prioritization, Resource.HR would have a priority of one (1) on Server Alpha and a priority of two (2) on Server Gamma, while Resource.MFG would have a priority of one (1) on Server Beta and a priority of two (2) on Server Gamma. If customers wanted to optimize the use of the environment, then Resource.HR could have a priority of three (3) on Server Beta and Resource.MFG could have a priority of three (3) on Server Alpha. In the event of a failure of Server Alpha, Resource.HR would fail over to Server Gamma first before trying to come in-service (be restored) on Server Beta.

SIOS Protection Suite for Linux (UI and CLI) allows users to specify a priority for each server and resource combination.
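The priority scheme described above can be modeled as a small lookup table. The sketch below is a conceptual illustration in Python, not SIOS configuration syntax; the `PRIORITIES` table mirrors the Resource.HR and Resource.MFG example, and `next_server` is a hypothetical helper.

```python
# Priority values per resource and server; a lower number is tried first.
PRIORITIES = {
    "Resource.HR": {"Alpha": 1, "Gamma": 2, "Beta": 3},
    "Resource.MFG": {"Beta": 1, "Gamma": 2, "Alpha": 3},
}


def next_server(resource, failed):
    """Return the healthy server with the lowest priority value, or None."""
    ranked = sorted(PRIORITIES[resource].items(), key=lambda item: item[1])
    for server, _priority in ranked:
        if server not in failed:
            return server
    return None  # every configured server is down


print(next_server("Resource.HR", {"Alpha"}))
print(next_server("Resource.HR", {"Alpha", "Gamma"}))
```

With only Alpha down, Resource.HR lands on Gamma; if Gamma is also unavailable, it cascades to Beta.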

Policy or affinity rules

Policy rules can also be used to prevent a resource recovery from occurring on a given server and thereby allowing a resource to avoid a specified server that may be running a more critical or resource-intensive workload. Typical policies include:

- Constraint policies that will block an application from a specific server by default
- Resource policies that will block an application from a server that does not have sufficient resources
- Temporal policies that define a time period during which resources are allowed on or disallowed from a system
- Custom policies that define preferred servers or possible application ownership abilities within the cluster
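To make the four policy types concrete, the following Python sketch filters and orders candidate servers for an application. Every name and value in it (`SERVERS`, `CONSTRAINTS`, `REQUIRED_MEM_GB`, `BLACKOUTS`, `PREFERRED`, `eligible_servers`) is a hypothetical stand-in chosen for illustration; none of it is SIOS policy syntax.

```python
from datetime import time

# All data below is illustrative and assumed for the example.
SERVERS = {
    "Alpha": {"free_mem_gb": 8},
    "Beta": {"free_mem_gb": 2},
    "Gamma": {"free_mem_gb": 16},
}
CONSTRAINTS = {("app1", "Beta")}  # constraint policy: app1 is blocked from Beta
REQUIRED_MEM_GB = {"app1": 4, "app2": 4}  # resource policy: minimum free memory
BLACKOUTS = {("app2", "Gamma"): (time(8, 0), time(18, 0))}  # temporal policy
PREFERRED = {"app1": ["Gamma", "Alpha"]}  # custom policy: preferred server order


def eligible_servers(app, now):
    """Return servers the policies allow for app at time now, preferred first."""
    allowed = []
    for name, info in SERVERS.items():
        if (app, name) in CONSTRAINTS:
            continue  # blocked outright
        if info["free_mem_gb"] < REQUIRED_MEM_GB.get(app, 0):
            continue  # not enough free memory
        window = BLACKOUTS.get((app, name))
        if window and window[0] <= now < window[1]:
            continue  # inside a disallowed time window
        allowed.append(name)
    order = PREFERRED.get(app, [])
    # Stable sort: preferred servers first, the rest keep their original order.
    return sorted(allowed, key=lambda s: order.index(s) if s in order else len(order))


print(eligible_servers("app1", time(12, 0)))
print(eligible_servers("app2", time(12, 0)))
```

At noon, app1 may use Gamma or Alpha (in that order), while app2 is limited to Alpha because Beta lacks memory and Gamma is inside its blackout window.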

The SIOS Protection Suite for Linux CLI allows users to specify policy rules that can disable failover of a specific resource to a specified server, apply temporal policies to failovers, disable failover for a specific application type, and define constraint and custom policies.

Specific Avoidance Resources

The most granular way to establish a resource avoidance strategy is to deploy specific avoidance scripts within each hierarchy. This method allows the user to configure specific applications (e.g., app1 and app2) to avoid one another whenever possible while allowing other applications to run without restriction. In the case of our three servers, Alpha, Beta, and Gamma, and three resources, app1, app2, and app3, this method would provide the greatest flexibility. In this example, app1 and app2 will seek to avoid collocation when a server fails, but app3 will fail over to the next available node based on priorities without any collocation restrictions.
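The “avoid whenever possible” behavior can be modeled as a soft rule: skip any server already hosting an application to be avoided, but fall back to the normal priority order if no such server exists. The Python sketch below is a conceptual model with hypothetical names (`place`, `running`, `avoid`); it does not represent the logic of the product's avoidance scripts.

```python
def place(app, candidates, running, avoid):
    """Pick a server for app from candidates, which are listed in priority order.

    running maps server -> set of apps currently on it.
    avoid maps app -> set of apps it should not share a server with.
    """
    unwanted = avoid.get(app, set())
    for server in candidates:
        if not (running.get(server, set()) & unwanted):
            return server  # first server with no conflicting app
    # Soft rule: if every candidate conflicts, accept collocation over downtime.
    return candidates[0] if candidates else None


running = {"Beta": {"app2"}, "Gamma": set()}
avoid = {"app1": {"app2"}, "app2": {"app1"}}

print(place("app1", ["Beta", "Gamma"], running, avoid))  # app1 skips Beta
print(place("app3", ["Beta", "Gamma"], running, avoid))  # app3 is unrestricted
print(place("app1", ["Beta"], running, avoid))           # only Beta is left
```

Here app1 lands on Gamma to stay away from app2, app3 simply takes the first server in priority order, and when Beta is the only survivor app1 is collocated with app2 rather than left out of service.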

For additional examples of avoidance strategies and resources, consult the SIOS Protection Suite for Linux documentation. If a customer has two applications, app1 and app2, that must run on different nodes whenever possible, the customer can create two avoidance terminal leaf node resources using the SIOS Protection Suite for Linux gen/app resource and the ‘/opt/LifeKeeper/lkadm/bin/avoid_restore’ script.

Reproduced from SIOS