Video: SIOS LifeKeeper
Reproduced with permission from SIOS

A leading Hong Kong-based beverage manufacturer produces 61 beverage brands, including the number one soft drink brand in the world, and distributes them to more than 728 million customers throughout Hong Kong, mainland China, Taiwan and the western USA.
The company relies on an SAP ERP (enterprise resource planning) system running in a Red Hat Linux environment to manage a variety of critical business operations. The SAP environment comprises a variety of services, including ABAP (Advanced Business Application Programming) SAP Central Services (ASCS), the Enqueue Replication Server (ERS), Web Dispatcher and the DB2 database. The company used a large Storage Area Network (SAN) for data storage. The core SAP applications handle all business operations across the company’s beverage division. In its on-premises data center, the company provided uptime protection for this system using data replication and backups of the SAN.
The company’s IT department determined that they could achieve true high availability (99.99% uptime), disaster recovery, scalability and cost savings by migrating to the cloud and using failover clustering to protect their critical SAP system. However, they realized that the SAN and other shared storage required for traditional failover clustering are not practical in some clouds and are not available in others.
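To make the 99.99% figure concrete, the short sketch below converts an availability target into a yearly downtime budget. The function name and the 365-day year are assumptions chosen for illustration; they do not come from the case study.

```python
def allowed_downtime_minutes(availability_percent: float, days: float = 365.0) -> float:
    """Return the maximum downtime, in minutes, permitted over `days` days."""
    total_minutes = days * 24 * 60
    # The downtime budget is whatever fraction of the period is not covered
    # by the availability target.
    return total_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

For the 99.99% target cited above, this works out to roughly 52.6 minutes of downtime per year, which is why manual recovery procedures alone are rarely enough to meet it.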
After extensive evaluation, the company chose to move their SAP environment to Amazon EC2. They established four key criteria for evaluating their choices for an HA/DR solution. Their solution needed to:
The company’s cloud account manager recommended that they consider the SIOS Protection Suite, offered through AWS China. The SIOS software is certified by SAP for both NetWeaver and DB2, and is fully tested and supported on Red Hat Enterprise Linux and other Linux distributions. The company tested the SIOS clustering software extensively under a variety of challenging failure scenarios, and also evaluated the throughput performance during periods of peak demand. The IT team’s confidence in SIOS Protection Suite increased as it passed each of their rigorous tests and proved to be remarkably easy to use.
SIOS Protection Suite for Linux enables SANless failover clustering to provide full HA and DR for SAP and its critical services. The SIOS software uniquely includes modules called Application Recovery Kits (ARKs) that provide application-specific functionality that simplifies configuration and ensures failover orchestration maintains application best practices. The SAP and HANA ARKs automate configuration steps, validate configuration inputs, and manage IP failover and boot order to minimize human error. Unlike other clustering software that only validates server operability, the SIOS clustering software verifies that SAP and critical services are running, that databases are mounted and available, that any file shares or exports are available, and that clients are able to connect. To ensure these services are all functioning properly, SIOS software continuously monitors the servers, virtual machines, operating system and all major components of the SAP software. For DR protection, the company located the active and standby cluster nodes in different AWS Availability Zones for geographical separation.
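The difference between checking only that a server is up and checking the whole application stack can be sketched as follows. This is a toy illustration, not the SIOS implementation: the check names and the stub lambdas are hypothetical stand-ins for real probes.

```python
from typing import Callable, Dict, List


def failed_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every named health check and return the names of those that fail."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            # A probe that raises is treated as a failed check.
            ok = False
        if not ok:
            failed.append(name)
    return failed


# Hypothetical probes mirroring the layers described above; each lambda is a
# stub that a real deployment would replace with an actual test.
example_checks = {
    "server_reachable": lambda: True,
    "sap_services_running": lambda: True,
    "database_mounted": lambda: False,  # simulated application-level fault
    "file_shares_available": lambda: True,
    "clients_can_connect": lambda: True,
}

if __name__ == "__main__":
    problems = failed_checks(example_checks)
    print("recovery needed:", bool(problems), "failed:", problems)
```

In this example the server itself is reachable, so a server-only check would report a healthy node, while the layered check flags the simulated database fault.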
SIOS Protection Suite has made it possible for this leading beverage manufacturer to meet the stringent recovery time and recovery point objectives established for its SAP/DB2 environment. To date, the configuration has experienced no perceptible downtime, including during planned maintenance. And these results have been realized with minimal effort, making it possible for the IT staff to focus more on projects that enhance employee productivity or otherwise improve business operations.

This video covers high availability for building maintenance and security, featuring Harry Aujla, technical director at SIOS. Building Management System (BMS) solutions are software-based solutions running on hardware, designed and built with varying degrees of autonomy and intelligence. BMS can either be hosted on-site or off-site at a geographically distant control center.
The BMS sector is on the cusp of another technical evolution as its customers look at how the cloud is changing the operating landscape. The market is now mature enough that many cloud vendors offer secure and redundant connections to their platforms. There’s an implicit trust that BMS-related data is being securely transmitted to and from the cloud. A lot of BMS companies are running in the cloud as well.
It is important to define your SLAs before embarking on a high availability project. If the instance running our BMS solution in the cloud happens to fail for whatever reason, the cloud vendor will take the necessary actions to recover the instance. But what happens if you suffer an application software issue within the cloud instance? You need a way of monitoring application-level failures and orchestrating their recovery. It’s important to consider adding a high availability clustering solution like SIOS that can address application-level high availability needs, which in turn helps maintain application performance.

Split brain. Most readers of our blogs will have heard the term, in the computing context that is, yet we cannot help but sympathize with those whose first mental image is of the chaos that would result if someone had two brains, both equally in control at the same time.
In a failover cluster split-brain scenario, neither node can communicate with the other, and the standby server may promote itself to become an active server because it believes the active node has failed. This results in both nodes becoming ‘active’, because each sees the other as failed. As a result, data integrity and consistency are compromised, since data on both nodes is changing independently. This is referred to as split brain.
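The mechanism can be illustrated with a deliberately simplified two-node model. This is only a sketch of the scenario described above, not how any particular clustering product decides on failover; the `Node` class, its fields and the node names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Node:
    name: str
    active: bool
    peer_reachable: bool = True

    def on_heartbeat_check(self) -> None:
        # A standby that cannot reach its peer assumes the peer has failed
        # and promotes itself. An active node simply stays active.
        if not self.peer_reachable and not self.active:
            self.active = True


def is_split_brain(nodes: Iterable[Node]) -> bool:
    """Split brain: more than one node believes it is the active node."""
    return sum(1 for node in nodes if node.active) > 1


primary = Node("node-a", active=True)
standby = Node("node-b", active=False)

# Simulate a network partition: both nodes are healthy, but neither can
# see the other any more.
for node in (primary, standby):
    node.peer_reachable = False
    node.on_heartbeat_check()

if __name__ == "__main__":
    print("split brain:", is_split_brain([primary, standby]))
```

After the simulated partition both nodes report themselves as active, which is the condition under which writes on the two sides start to diverge.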
There are two types of split-brain scenarios which may occur for an SAP HANA resource hierarchy if appropriate steps are not taken to avoid them.
Recommendations for avoiding or resolving each type of split-brain scenario in the SIOS Protection Suite clustering environment are given below.
While in a split-brain scenario, a message similar to the following is logged and broadcast to all open consoles every quickCheck interval (default 2 minutes) until the issue is resolved.
EMERG:hana:quickCheck:HANA-SPS_HDB00:136363:WARNING: A temporary communication failure has occurred between servers hana2-1 and hana2-2. Manual intervention is required in order to minimize the risk of data loss. To resolve this situation, please take one of the following resource hierarchies out of service: HANA-SPS_HDB00 on hana2-1 or HANA-SPS_HDB00 on hana2-2. The server that the resource hierarchy is taken out of service on will become the secondary SAP HANA System Replication site.
While in this split-brain scenario, a message similar to the following is logged and broadcast to all open consoles every quickCheck interval (default 2 minutes) until the issue is resolved.
EMERG:hana:quickCheck:HANA-SPS_HDB00:136364:WARNING: SAP HANA database HDB00 is running and registered as primary master on both hana2-1 and hana2-2. Manual intervention is required in order to minimize the risk of data loss. To resolve this situation, please stop database instance HDB00 on hana2-2 by running the command 'su - spsadm -c "sapcontrol -nr 00 -function Stop"' on that server. Once stopped, it will become the secondary SAP HANA System Replication site.
su - <sid>adm -c "sapcontrol -nr <Inst#> -function Stop"
where <sid> is the lower-case SAP System ID for the HANA installation and <Inst#> is the instance number for the HDB instance (e.g., the instance number for instance HDB00 is 00).
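As a small convenience, the stop command quoted in the log message above (the one run as spsadm for instance 00) can be assembled from the System ID and instance number. The helper below is a hypothetical sketch: the function name and the input validation rules (alphanumeric SID, two-digit instance number) are assumptions based on the example values in this post, and the function only builds the command string without running it.

```python
import shlex


def build_hana_stop_command(sid: str, instance_number: str) -> str:
    """Build the 'su - <sid>adm -c "sapcontrol ... Stop"' command as a string."""
    # Assumption: the SID is purely alphanumeric, as in the SPS example.
    if not (sid.isascii() and sid.isalnum()):
        raise ValueError(f"unexpected SAP System ID: {sid!r}")
    # Assumption: the instance number is two digits, as in HDB00 -> 00.
    if not (len(instance_number) == 2 and instance_number.isascii() and instance_number.isdigit()):
        raise ValueError(f"unexpected instance number: {instance_number!r}")

    sid_user = f"{sid.lower()}adm"  # the admin user uses the lower-case SID
    inner = f"sapcontrol -nr {instance_number} -function Stop"
    # shlex.quote wraps the inner command so the shell passes it to su as a
    # single argument.
    return f"su - {sid_user} -c {shlex.quote(inner)}"


if __name__ == "__main__":
    print(build_hana_stop_command("SPS", "00"))
```

The result uses single quotes around the inner command rather than the double quotes shown above; both forms hand the same string to su. Review the generated command before running it on the node that should become the secondary site.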
Being aware of common split-brain scenarios and taking these steps to mitigate them can save you time and protect data integrity.