SIOS SANless clusters


What is “Split Brain” and How to Avoid It

June 23, 2022 by Jason Aw


As we have discussed, in a High Availability cluster environment there is one active node and one or more standby node(s) that will take over service when the active node either fails or stops responding.

This sounds like a reasonable assumption until the network layer between the nodes is considered. What if the network path between the nodes goes down?

Neither node can now communicate with the other, and in this situation the standby server may promote itself to become the active server on the basis that it believes the active node has failed. This results in both nodes becoming ‘active’, as each would see the other as dead. As a result, data integrity and consistency are compromised, as data on both nodes would be changing. This is referred to as “Split Brain”.

To avoid a split brain scenario, a Quorum node (also referred to as a ‘Witness’) should be installed within the cluster. Adding the quorum node (to a cluster consisting of an even number of nodes) creates an odd number of nodes (3, 5, 7, etc.), with nodes voting to decide which should act as the active node within the cluster.
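
To make the voting concrete, here is a minimal sketch of the majority rule in Python. The function and node counts are illustrative only, not SIOS's implementation.

def has_quorum(reachable_peers: int, cluster_size: int) -> bool:
    # A node may act as (or remain) the active node only if it, plus the
    # peers it can still reach, form a strict majority of the cluster.
    return (reachable_peers + 1) > cluster_size / 2

# Three-node cluster (Node A, Node B, quorum node):
print(has_quorum(1, 3))  # True: a node that still sees one peer keeps quorum
print(has_quorum(0, 3))  # False: an isolated node stands down rather than promoting

With an even number of nodes, a clean 50/50 partition leaves neither side with a majority; the added quorum node breaks the tie.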

As an example, suppose the server rack containing Node B has lost LAN connectivity. In this scenario, through the addition of a third node to the cluster environment, the system can still determine which node should be the active node.

Quorum/Witness functionality is included in the SIOS Protection Suite. At installation, Quorum/Witness is selected on all nodes (not only the quorum node) and a communication path is defined between all nodes (including the quorum node).

The quorum node doesn’t host any active services. Its only role is to participate in node communication in order to determine which nodes are active and to provide a ‘tie-break vote’ in case of a communication outage.

SIOS also supports IO Fencing and Storage as quorum devices, and in these configurations an additional quorum node is not required.

Reproduced with permission from SIOS


The single best way to deploy quorum/witness

April 26, 2022 by Jason Aw


During a recent meeting, a customer asked a question about High Availability (HA) and the need for a quorum/witness. Their question was, “What is the best way to deploy quorum/witness?” The answer is simple: there is no single best way to deploy quorum. To understand why, let’s start by defining three key terms: witness resource, quorum resource, and split-brain scenario.

What is split brain?

In a normal cluster environment, the protected application is running on the primary node in the cluster. In the event of an application failure on that primary node, the clustering software moves the application operation to a secondary or remote node, which assumes the role of primary. At any given time, there is only one primary node.

Split brain is a condition that occurs when members of a cluster are unable to communicate with each other, but are in a running and operable state, and subsequently take ownership of common resources simultaneously. In effect, you have two bus drivers fighting for the steering wheel. Split brain, due to its destructive nature, can cause data loss or data corruption and is best avoided through the use of fencing, quorum, witness, or combined quorum/witness functionality for cluster arbitration.

In most cluster managers, quorum is maintained when:

  1. All servers are able to see the same state for all cluster peers and the witness
  2. All servers are able to see the same state for all of the cluster peers, though not the witness
  3. All servers are able to see the witness resource, though not each other, and avoid split-brain scenarios

In most cluster managers, quorum is lost when:

  1. Servers are unable to see all cluster peers and the witness server
  2. Servers are unable to see a majority of cluster peers, even though they can see the witness server
  3. Servers are unable to access or maintain access to the quorum resource to successfully arbitrate quorum membership and resource access
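
Assuming a single witness and simple majority voting, the two lists above collapse into one short arbitration check, sketched below in Python. This is a toy model rather than any vendor's implementation; real cluster managers also track heartbeats, epochs, and fencing state.

def quorum_state(peers_seen: int, total_nodes: int, sees_witness: bool) -> str:
    # peers_seen counts the peers this node can reach (excluding itself);
    # total_nodes counts all voting members (excluding the witness).
    votes = peers_seen + 1  # this node votes for itself
    if votes > total_nodes / 2:
        return "quorum held: peer majority"
    if sees_witness:
        # In practice only one partition wins the witness, e.g. by locking
        # or timestamping a shared resource first.
        return "quorum held: witness tie-break"
    return "quorum lost: stand down and do not take over resources"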

What is a witness resource (or server)?

A witness resource is a server, network endpoint, or device that is used to achieve and maintain quorum when a cluster has an even number of members. A cluster with an odd number of members, using cluster majority, does not need a witness resource, as all members of the cluster serve to arbitrate majority membership.

What is quorum and a quorum resource?

A quorum resource is a resource (device, system, block storage, file storage, file share, etc.) that serves as a means for arbitrating cluster state and membership. In some cluster managers, quorum is a resource within the cluster that aids or is required for any cluster state and cluster membership decisions. In other cluster managers, quorum functions as a tie-breaker to avoid split-brain.

More than One Way to Deploy a Quorum

Given the critical nature of quorum, it is essential that HA architectures deploy quorum/witness resources properly, and fortunately (or unfortunately) there is no single best way to deploy quorum. Several factors may shape the way in which your witness and quorum resources behave. These factors include:

1. Whether your deployment will be on-premises, cloud, or hybrid

Deploying in an on-premises datacenter where additional storage devices, such as Fibre Channel storage, power control devices or connections, or traditional STONITH devices are present gives customers additional options for quorum and witness functionality that may not be available in the cloud. Likewise, cloud and hybrid environments differ in what can be deployed and in which use cases quorum is being deployed to prevent. Additionally, latency requirements and differences may limit what types of devices and resources are available for a quorum/witness configuration.

2. Your recovery objectives

Recovery objectives are also important to consider when designing and architecting your quorum and witness resources. In an example two-node cluster (node A and node B), when node A experiences a loss of connectivity to node B, what is the highest priority for recovery? If the witness/quorum resources are in the same network as node A, this could result in node A remaining online but severed from clients, while node B is unable to assess quorum and take over. Likewise, if the quorum device lives only in the region, data center, or network with node B, a loss could result in a failover of resources to a defunct network or data center, or away from a functional and operational primary node.

3. Redundancy of Available Data Centers (or Regions) Within Your Infrastructure

The redundancy of the data center or region is also an important factor in your HA topology with quorum/witness. If your data center has only two levels of redundancy, you must understand the tradeoff between placing the quorum/witness in the same data center as the primary versus the standby cluster node. If the data center has more than two redundant tiers, such as a third availability zone or access to a second region, that option provides a higher level of redundancy for the cluster.

4. Disaster Recovery Requirements

Understanding your true disaster recovery requirements is also a major factor in your design. If your cluster manager software requires access to the quorum/witness in order to recover from a total data center outage (or region failure) then you’ll need to understand this impact on your design. Many high availability software packages have tools or methods for this scenario, but if your software does not, your design and placement of quorum/witness may need to accommodate this reality.

5. Number Of Members Within the Cluster, and Their Location

An additional quorum/witness server is typically not required when the cluster contains an odd number of nodes. However, using only two nodes in a cluster, or deploying a DR node that is not always available, may change your architecture. As VP of Customer Experience, I have worked with customers who have deployed three-node architectures, but for cost savings they automate periodic shutdown of the third server.

6. Operating System and Cluster Manager

The final factor to mention for quorum/witness is the cluster manager and operating system. Not all HA software and cluster managers are equal when it comes to deploying quorum/witness or arbitrating quorum status. Some clustering software requires shared disks for arbitration; others are more flexible, allowing file shares (NFS, SMB, EFS, Azure Files, and S3). Being aware of what your cluster manager requires, and the quorum modes it supports (simple majority, witness, file share, etc.), will impact not only what you deploy but how you deploy it.
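
As an illustration of the file-share mode mentioned above, here is a deliberately naive Python sketch that treats an exclusive lock on a file in a shared export as the witness tie-breaker. The mount path is hypothetical, and production cluster managers layer heartbeats, epochs, and fencing on top of anything like this.

import fcntl

WITNESS_LOCK = "/mnt/quorum-share/witness.lock"  # hypothetical NFS/SMB/EFS mount

def try_witness_tiebreak():
    # Returns an open handle if this node won the witness, else None. The
    # lock is held for as long as the handle stays open; closing it (or
    # node death) releases the witness for the surviving partition.
    f = open(WITNESS_LOCK, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking exclusive lock
    except OSError:
        f.close()
        return None  # another node holds the witness, or the share is unreachable
    return f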

The single best way to deploy a quorum/witness server is to understand your vendor’s definition of quorum/witness and their available options, know your requirements, factor in the limitations or opportunities presented by your data center (or cloud environment) and architect the solution that provides your critical systems the highest level of protection against split-brains, false failovers, and downtime.

-Cassius Rhue, VP, Customer Experience

Reproduced from SIOS


Understanding and Avoiding Split Brain Scenarios

September 23, 2021 by Jason Aw


Split brain. Most readers of our blogs will have heard the term, in the computing context that is, yet we cannot help but sympathize with those whose first mental image is of the chaos that would result if someone had two brains, both equally in control at the same time.

What is a Failover Cluster Split Brain Scenario?

In a failover cluster split brain scenario, neither node can communicate with the other, and the standby server may promote itself to become an active server because it believes the active node has failed. This results in both nodes becoming ‘active’, as each would see the other as failed. As a result, data integrity and consistency are compromised, as data on both nodes would be changing. This is referred to as split brain.

There are two types of split-brain scenarios which may occur for an SAP HANA resource hierarchy if appropriate steps are not taken to avoid them.

  • HANA Resource Split Brain: The HANA resource is Active (ISP) on multiple cluster nodes. This situation is typically caused by a temporary network outage affecting the communication paths between cluster nodes.
  • SAP HANA System Replication Split Brain: The HANA resource is Active (ISP) on the primary node and Standby (OSU) on the backup node, but the database is running and registered as the primary replication site on both nodes. This situation is typically caused by either a failure to stop the database on the previous primary node during failover, having Autostart enabled for the database, or a database administrator manually running “hdbnsutil -sr_takeover” on the secondary replication site outside of the clustering software environment.
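
For the second scenario, a rough monitoring sketch in Python follows: it asks each node for its system replication state over ssh and flags the case where both report themselves as primary. The host names match the examples further below, while the spsadm user and the parsing of hdbnsutil's output are assumptions; LifeKeeper's quickCheck performs its own, more thorough verification.

import subprocess

NODES = ["hana2-1", "hana2-2"]

def replication_mode(host: str) -> str:
    # Run `hdbnsutil -sr_state` as the <sid>adm user and pull out the
    # reported mode (e.g. "primary" on the primary replication site).
    out = subprocess.run(
        ["ssh", f"spsadm@{host}", "hdbnsutil", "-sr_state"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        if line.strip().startswith("mode:"):
            return line.split(":", 1)[1].strip()
    return "unknown"

if sum(replication_mode(h) == "primary" for h in NODES) > 1:
    print("WARNING: both nodes report primary - possible replication split brain")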

Avoiding Split Brain Issues

Recommendations for avoiding or resolving each type of split-brain scenario in the SIOS Protection Suite clustering environment are given below.

HANA Resource Split Brain Resolution

While in a HANA resource split-brain scenario, a message similar to the following is logged and broadcast to all open consoles every quickCheck interval (default 2 minutes) until the issue is resolved.

EMERG:hana:quickCheck:HANA-SPS_HDB00:136363:WARNING: 
A temporary communication failure has occurred between servers 
hana2-1 and hana2-2. 
Manual intervention is required in order to minimize the risk of 
data loss. 
To resolve this situation, please take one of the following resource 
hierarchies out of service: HANA-SPS_HDB00 on hana2-1 
or HANA-SPS_HDB00 on hana2-2. 
The server that the resource hierarchy is taken out of service on 
will become the secondary SAP HANA System Replication site.

Recommendations for resolution:

  1. Investigate the database on each cluster node to determine which instance contains the most up-to-date or relevant data. This determination must be made by a qualified database administrator who is familiar with the data.
  2. The HANA resource on the node containing the data that needs to be retained will remain Active (ISP) in LifeKeeper, and the HANA resource hierarchy on the node that will be re-registered as the secondary replication site will be taken entirely out of service in LifeKeeper. Right-click on each leaf resource in the HANA resource hierarchy on the node where the hierarchy should be taken out of service and click Out of Service …
  3. Once the SAP HANA resource hierarchy has been successfully taken out of service, LifeKeeper will re-register the Standby node as the secondary replication site during the next quickCheck interval (default 2 minutes). Once replication resumes, any data on the Standby node which is not present on the Active node will be lost. Once the Standby node has been re-registered as the secondary replication site, the SAP HANA hierarchy has returned to a highly available state.

SAP HANA System Replication Split Brain Resolution

While in this split-brain scenario, a message similar to the following is logged and broadcast to all open consoles every quickCheck interval (default 2 minutes) until the issue is resolved.

EMERG:hana:quickCheck:HANA-SPS_HDB00:136364:WARNING: 
SAP HANA database HDB00 is running and registered as 
primary master on both hana2-1 and hana2-2. 
Manual intervention is required in order to 
minimize the risk of data loss. To resolve this situation, 
please stop database instance 
HDB00 on hana2-2 by running the command 'su - spsadm -c 
"sapcontrol -nr 00 -function Stop"' 
on that server. Once stopped, 
it will become the secondary SAP HANA System Replication site.

Recommendations for resolution:

  1. Investigate the database on each cluster node to determine whether important data exists on the Standby node which does not exist on the Active node. If important data has been committed to the database on the Standby node while in the split-brain state, the data will need to be manually copied to the Active node. This determination must be made by a qualified database administrator who is familiar with the data.
  2. Once any missing data has been copied from the database on the Standby node to the Active node, stop the database on the Standby node by running the command given in the LifeKeeper warning message:

    su - <sid>adm -c "sapcontrol -nr <Inst#> -function Stop"

    where <sid> is the lower-case SAP System ID for the HANA installation and <Inst#> is the instance number for the HDB instance (e.g., the instance number for instance HDB00 is 00).

  3. Once the database has been successfully stopped, LifeKeeper will re-register the Standby node as the secondary replication site during the next quickCheck interval (default 2 minutes). Once replication resumes, any data on the Standby node which is not present on the Active node will be lost. Once the Standby node has been re-registered as the secondary replication site, the SAP HANA hierarchy has returned to a highly available state.

Being aware of common split-brain scenarios and taking these steps to mitigate them can save you time and protect data integrity.

Reproduced with permission from SIOS

