SIOS SANless clusters


How To Choose A Cloud When You Need High Availability

January 8, 2021 by Jason Aw

Understand the cloud market

A number of analyst firms are predicting an ever-increasing number of deployments of applications, databases, and solutions in the cloud. According to Gartner, firms are “moving to the cloud at an increasing rate.”[1] In fact, Gartner and other analysts expect the pace of cloud migration and deployment will continue to accelerate, driven in large part by the pace of innovation in the cloud. In a TechTarget article by Kurt Marko, of MarkoInsights, Marko notes that the pace of innovation that is “being undertaken in the cloud likely can’t be replicated on premises due to the elastic, scalable, and on-demand nature of managed public cloud services.”

We see more and more companies that had been using the cloud only for DevOps applications and databases that were not essential to their business now moving mission-critical applications, ERPs, and databases that require high availability protection to the cloud.

If you are considering a move to the cloud – and it seems likely that you are – there are several keys to understand when you need high availability.

Familiarize yourself with the cloud high availability options

To plan for the proper availability solution for a cloud or hybrid cloud deployment, consider what the pain points are with regards to both availability (99.9% uptime) and high availability (99.99% uptime). You also need to understand the options that are available for high availability with an eye towards your plans to migrate to the cloud. Notable analysts and experts suggest looking for solutions that will not only mitigate and reduce the pain of migrating your workloads, but will also provide a balanced and comprehensive approach to availability throughout the lifespan of your cloud architecture. Note, it is also wise to consider solutions that can provide protection and high availability for portions of your workload that may one day repatriate from the cloud back to your on-premises environment.
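To make those uptime targets concrete, here is a small illustrative calculation (a minimal sketch in Python; the figures assume a 365-day year) that converts an availability percentage into allowable downtime:

```python
# Convert an availability percentage into allowable downtime.
# Purely illustrative arithmetic; figures assume a 365-day year.

MINUTES_PER_YEAR = 365 * 24 * 60

def allowable_downtime(availability_pct: float) -> tuple[float, float]:
    """Return (minutes per year, minutes per month) of permitted downtime."""
    downtime_fraction = 1 - availability_pct / 100
    per_year = MINUTES_PER_YEAR * downtime_fraction
    return per_year, per_year / 12

for pct in (99.9, 99.95, 99.99):
    year, month = allowable_downtime(pct)
    print(f"{pct}% uptime -> ~{year:.0f} min/year (~{month:.0f} min/month) of downtime")

# 99.9%  uptime allows roughly 526 minutes (about 8.8 hours) per year.
# 99.99% uptime allows roughly 53 minutes per year.
```

The jump from 99.9% to 99.99% is the difference between roughly 8.8 hours and under an hour of downtime per year, which is why the distinction matters when you plan.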

Here are ten things to consider when comparing your availability options in the cloud: 

1. The deployment method. Is it possible to deploy the availability solution you are considering using an image, CLI, UI, or another repeatable method such as a CloudFormation template or packaged scripts?

2. The system requirements. Most notably, consider the operating system (OS), disk, CPU, and memory requirements.

3. The deployment environments. Do your availability options support on-premises only, one or more public clouds, or a mixture, including hybrid cloud deployments? Is there a SaaS offering available as well?

4. The breadth and depth of application protection. “Breadth” meaning: what types of applications, databases, front-ends, networking, and infrastructure components can be protected? Is there a flexible framework for adding new applications and variants? “Depth” meaning: is the solution application-aware, able to maintain application-specific best practices throughout the application failover/failback processes?

5. Performance requirements. We often think of RTO and RPO, but what about the other performance needs of your solution? Will your availability solution cause performance issues on failover?

6. Resilience requirements. How large a cluster can the availability solution support? How many faults and failures can it detect and recover from? How will replication be handled while keeping metadata in sync?

7. Supportability and maintenance. Does the availability vendor have experience with a wide range of availability needs and configurations? Do they have longevity, and a support system designed to address issues that may go beyond their solution? Can they help you minimize disruption and planned downtime during your system management and maintenance (patches, upgrades, and general maintenance)?

8. Total cost of ownership. There are entire industries and services dedicated to helping you calculate the total cost of ownership, so we won’t cover that here. Suffice it to say, your calculations will be unique to your organization, cloud provider, applications, and IT team. Consider whether your availability solution vendor can help you identify strategies for saving utilization, licensing, and other costs. Does the solution automate manual tasks and reduce IT labor time?

9. Licensing and pricing model. How do you consume the cost of the software? Is there a subscription fee, subscription model, pay-as-you-go offering, bring-your-own-license (BYOL) option, or a combination of flexible options? How will you enable the product licensing? Is there a license server, licensing service, or encrypted key based on virtual machine deployment details such as address, hostname, or MAC address?

10. The impact on IT staff. How much training will the solution require? How much manual intervention will be needed in the event of an application failure or disaster? Will it require specialized scripting that needs to be maintained? Who will be responsible for ongoing maintenance?

Weigh the benefits and trade-offs

Like every important decision, you need to understand your tradeoffs and choose the best balance to meet your needs. For example, I recently asked a friend to recommend a good walking shoe. I bought a pair he raved about, noting how lightweight they were, how strong and durable the fabric, and how stylish they were. I went for my first long walk-run in them, and I donated my first pair of “one run” shoes immediately thereafter. When I went to ‘Fleet Feet’ to get an expert’s opinion, I ended up with a heavier shoe, with more breathable fabric (also less durable), and an unrivaled level of hideousness. I made a tradeoff between appearance and function that worked for my needs and budget.

Like running shoes, there is no silver bullet solution that will be the right fit for every company, every application, every database, and every possible server and architecture. You are officially free to stop looking for it. Instead, settle into the activity of weighing the trade-offs to determine what is the right fit for your company’s needs. Think about your tradeoffs. For example, if you’re sure you will be a full Microsoft shop, the importance of GCP and AWS support should be a little lower in your evaluation process.
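If it helps to make that weighing explicit, a simple weighted scoring matrix can keep the evaluation honest. The sketch below is purely hypothetical; the criteria, weights, candidate names, and scores are placeholders you would replace with your own priorities drawn from the ten considerations above:

```python
# Hypothetical weighted scoring matrix for comparing availability options.
# Criteria, weights, candidates, and scores are placeholders; adjust to your
# own priorities.

weights = {
    "deployment_method": 2,
    "application_awareness": 3,
    "multi_cloud_support": 1,   # lower if you are committed to a single cloud
    "total_cost_of_ownership": 3,
    "staff_impact": 2,
}

# Scores from 1 (poor) to 5 (excellent) for each candidate solution.
candidates = {
    "Solution A": {"deployment_method": 4, "application_awareness": 5,
                   "multi_cloud_support": 3, "total_cost_of_ownership": 3,
                   "staff_impact": 4},
    "Solution B": {"deployment_method": 5, "application_awareness": 2,
                   "multi_cloud_support": 5, "total_cost_of_ownership": 4,
                   "staff_impact": 3},
}

for name, scores in candidates.items():
    total = sum(weights[c] * scores[c] for c in weights)
    print(f"{name}: weighted score {total}")
```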

Take your IT infrastructure dynamics into account

Think holistically about availability in your entire IT infrastructure – both on premises and in the cloud. The reasons to do so are best explained with another analogy. In 2018, I was the coordinator for an outreach program feeding the homeless and hungry in Columbia, South Carolina. Our group met once a week to serve a meal and a message of hope to over 100 men, women and children. When we considered expanding – adding more days of the week, more hours, or additional services, we had to think well beyond simple scheduling requirements. Knowing that we were providing a critical service to clients who depend on us, we had to consider all the factors that affected our ability to deliver those services consistently for the long-term, such as: cost, ages of our team members, outside obligations, alternative methods to achieve our goals, risk factors, and other dynamics within our parent organization.

When you are choosing your solution, after you’ve understood the market, familiarized yourself with options, and weighed the trade-offs, the last step is to take into account the various other dynamics in your overall environment. Will the solution meet the needs of your business as a whole? Will your critical data be protected from loss? Will your end-user productivity be protected from downtime? What training will be required to move to the cloud, and how will that impact your ability to manage or maintain the solution that you choose? What IT roles will be added, removed, or changed in your cloud journey? Will any responsibilities for application availability move to line-of-business owners? And how will the shifts in responsibilities or team makeup improve or decrease your overall potential for success? Consider whether your team needs to take a step-by-step approach, migrating smaller workloads first.

As VP of Customer Experience, I have seen a wide range of cloud migration planning, some straightforward, others extremely disruptive. In one instance a customer’s move to the cloud was highly contentious because management saw it as an opportunity to eliminate an entire IT department. I’m not suggesting that you play politics, but you should be aware of all of the factors at play in these complex projects.

Migrating to the cloud is supposed to save money, time and resources while affording improvements in availability and resilience. Regardless of which cloud you choose, make sure that you consider these tips and select the corresponding availability solution that gives you the flexibility to deliver the protection you need in the configuration you want.

Learn more about cloud high availability options with SIOS.

– Cassius Rhue, VP of Customer Experience, SIOS

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: Amazon AWS, Amazon EC2, Application availability, Cloud, clusters, High Availability

Six Reasons Your Cloud Migration Has Stalled

December 22, 2020 by Jason Aw

More and more customers are seeking to take advantage of the flexibility, scalability and performance of the cloud. As the number of applications, solutions, customers, and partners making the shift increases, be sure that your migration doesn’t stall.

Avoid the Following Six Reasons Cloud Migrations Stall

1. Incomplete cloud migration project plans

Project planning is widely thought to be a key contributor to project success. Planning plays an essential role in guiding stakeholders, diverse implementation teams, and partners through the project phases. It helps identify desired goals, align resources and teams to those goals, reduce risks, avoid missed deadlines, and ultimately deliver a highly available solution in the cloud. Incomplete plans and incomplete planning are often a big cause of stalled projects: at the eleventh hour a key dependency is identified, or during an unexpected server reboot an application monitoring and HA hole is discovered (see below). Be sure that your cloud migration has a plan, and work the plan.

2. Over-engineering on-premises

“This is how we did it on our on-premises nodes,” was the phrase that started a recent customer conversation. The customer engaged with Edmond Melkomian, Project Manager for SIOS Professional Services, when their attempts to migrate to the cloud stalled. During a discovery session, Edmond was able to uncover a number of over-engineered items related to on-premises versus cloud architecture. For some projects, reproducing what was done on premises can be a recipe for bloat, complexity, and delays. Analyze your architecture and migration plans and ruthlessly eliminate over-engineered components and designs, especially with networking and storage.

3. Under-provisioning

Controlling cost and preventing sprawl are critical aspects of cloud migrations. However, some customers, anxious about per-hour charges and the associated costs for disks and bandwidth, fall into the trap of under-provisioning. In this trap, resources are improperly sized, whether disks with the wrong speed characteristics, compute resources with the wrong CPU or memory footprint, or clusters with the wrong number of nodes. In such under-provisioned cases, issues arise when User Acceptance Testing (UAT) begins and anticipated workloads create a logjam on undersized resources, or a cost-optimized target node is unable to properly handle the resources in a failover scenario. While resizing virtual machines in the cloud is a simple mechanical process (see the sketch below), these sizing issues often create delays while architects and Chief Financial Officers try to understand the impact of re-provisioning resources.
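For illustration, here is a minimal sketch of what resizing an under-provisioned EC2 instance might look like with boto3; the region, instance ID, and target instance type are placeholders, and the instance has to be stopped before its type can be changed:

```python
# Minimal sketch: resize an under-provisioned EC2 instance with boto3.
# The region, instance ID, and target instance type below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
instance_id = "i-0123456789abcdef0"   # hypothetical instance
target_type = "m5.xlarge"             # hypothetical target size

# An instance must be stopped before its type can be changed.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": target_type},
)

ec2.start_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```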

4. Internal IT processes

Every great enterprise company has a set of internal processes, and chances are your team and company are no exception.  IT processes are usually key among the processes that can have a large impact on the success of your cloud migration strategy. In the past, many companies had long requisition and acquisition processes, including bids, sizing guides, order approvals, server prep and configuration, and final deployment.  The cloud process has dramatically altered the way compute, storage, and network resources, among others, are acquired and deployed.  However, if your processes haven’t kept up with the speed of the cloud your migration may hit a snag when plans change.

5. Poor High Availability planning

Another reason that cloud migrations can stall involves high availability planning. High availability requires more than a bundle of tools or enterprise licenses.  HA requires a careful, thorough and thoughtful system design.  When deploying an HA solution your plan will need to consider capacity, redundancy, and the requirements for recovery and correction. With a plan, requirements are properly identified, solutions proposed, risks thought through, and dependencies for deployment and validation managed. Without a plan, the project and deployment are vulnerable to risks, single point of failure issues, poor fit, and missing layers and levels of application protection or recovery strategies.  Often when there has been a lack of HA planning, projects stall while the requirements are sorted out.

6. Incomplete or invalid testing

Ron, a partner migrating his end customer to the cloud, planned to go live over an upcoming three-day weekend. The last decision point for ‘go/no-go’ was a batch of user acceptance testing on the staging servers. The first test failed. In order to make up for lost time due to other migration snags, Ron and team had skipped a number of test cases related to integrating the final collection of security and backup software on the latest OS with supporting patches. The simulated load, the first on the newly minted servers, tripped a series of issues within Ron’s architecture, including a kernel bug, a CPU and memory provisioning issue, and storage layout and capacity issues. The project was delayed for more than four weeks to restore customer confidence, complete proper testing and validation, resize the architecture, and apply software and OS fixes.

The promises of the cloud are enticing, and a well planned cloud migration will position you and your team to take advantage of these benefits. Whether you are beginning or in the middle of a cloud migration, we hope this article helps you be more aware of common pitfalls so you can hopefully avoid them.

– Cassius Rhue, Vice President, Customer Experience

Reproduced from SIOS

Filed Under: Clustering Simplified Tagged With: Amazon AWS, Amazon EC2, Azure, Cloud

5 Signs That It Will Take More Than A Blog Post To Fix Your High Availability

December 8, 2020 by Jason Aw

The signs are there. The warning lights are flashing. In your gut, you can sense it. Maybe you can’t sleep. Your problems with high availability are deep. But maybe you are not quite sure. Here are five signs that it will take more than a blog post to fix your high availability:

1. If you think your cloud SLA is all you need for high availability

Cloud solutions have provided great advancements in increased hardware availability and resilience. However, application high availability requires more than just selecting the right hypervisor or cloud provider. Your strategy for high availability cannot stop with the SLA provided by the cloud or a virtualization provider. As quoted by Wired, “The almost four-day Amazon outage of April 2011 did not breach Amazon’s EC2 SLA, which as a FAQ explains, ‘guarantees 99.95% availability of the service within a Region over a trailing 365 day period.’” In this DZone article, our own David Bermingham breaks down the differences between cloud SLAs and application availability in detail. If you want a highly available infrastructure, it must include monitoring, recovery, and resilience at the data and application layers as well.

2. If you are just using the high availability clustering that came with your open source operating system

If so, then chances are you didn’t select your database based on what was bundled with the OS, so why would you select your HA solution based on that criterion alone? Bundled tools go a long way in providing extra assurance, possibilities, and capabilities. However, despite the ease of access, bundled tools and OS clustering software are not always capable of meeting your SLA, RPO, RTO, and availability requirements. If your enterprise has a combination of operating systems, your team will likely need help navigating different tools and understanding how they integrate together. It’s kind of like choosing the hedge clippers and push reel mower left on the curb to shape “Azalea,” the par-5 13th hole at Augusta. Both will cut grass, but how much time do you have? How are you going to handle the complexity? Which would you trust? Your strategy for high availability requires more than just the convenience of what is bundled with the OS; otherwise, you’d be running MySQL instead of SAP HANA.

3. If you think that enterprise application licensing, such as SQL Enterprise or Oracle Enterprise, is the same thing as enterprise high availability

In addition to increased cost, many enterprise application licenses also increase the ability of the application to recover in some high availability scenarios. However, it is highly unlikely that your entire enterprise is based on a single application. Your high availability is going to require more than just a highly available database solution. You’ll need an enterprise grade application monitoring and recovery solution with a breadth of support for all of your applications and databases. In addition, you’ll need the ability to manage and replicate not just database data, but critical application and configuration data as well. Availability for a single database or a simple application is one thing – but HA for a complex, multipart application and supporting database is very different. More services, more parts that need to be coordinated, more complex architecture to orchestrate, more specific best practices to adhere to before, during and after failover/switchover. More than what your enterprise license paid for.

4. If your downtime is growing and your uptime is shrinking

The pace of life is ever increasing in many fields. When was the last time your team recovered from backup, manually restarted the applications that were deemed critical, or restarted a set of failed virtual machines or nodes? The pace of your outage events cannot continue to outpace sustainability, or your team’s ability to move beyond firefighting to fire prevention and fireproofing. As Carey Nieuwhof puts it, “You can only run so hard so long.” For some of you, you’ve been firefighting for too long, and your outages are becoming more common than your uptime.

5. If your first failover test was on the production server

A recent client remarked that it is simply impossible to test for every possible disaster scenario. As new software is created, deployed, updated, and patched, the challenges in high availability are increasing. But your live, production data is not the place to find out what does not play well together. And while Go-Live and Post-Go-Live will always have their share of surprises, the inability to actually fail over and run on the backup node should not be one of them.

Scouring blogs can provide you with helpful tips and insights to define, redefine, and improve your higher availability. But, if the warning signs are going off that you’ve traded true availability for some semblance of ‘just enough’, then it will take more than a blog post, or scouring every blog post in the availability world for that matter, to fix your HA.

– Cassius Rhue, Vice President, Customer Experience

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: Amazon AWS, Application availability, application monitoring, High Availability, high availability - SAP, SQL Server High Availability

9 Signs You Have an Application Availability Problem

November 27, 2020 by Jason Aw

You’ve heard the saying “recognizing a problem is the first step in solving it.”  But, many small, medium, and surprisingly, even large enterprise businesses aren’t aware that their application availability isn’t what it should be.

Read on for these nine signs that you still have an application availability problem:

1. You spend more time restarting an application than using it

Application crashes may be a fact of life, but if your application is down more often than it is up, that is a problem.

2. You’ve started to snooze through the alert storm in your inbox or control center

You have deployed alerts for application or server downtime, but the alert storm has so overwhelmed your inbox that you have silenced them all.

3. You have one data center for all your critical operations

A single data center for operations may sound convenient, but one well intended but misdirected construction crew has been known to turn single data centers into costly unavailability zones.

4. Your idea of data protection involves backup retrieval and archives

Your data protection strategy is critical. Data replication technology and site to site, region to region replication has become a mainstay, so if your replication or data protection strategy is non-existent or involves a lengthy jog to the vault this could be a big problem.

5. Your recovery procedures always require manual intervention

Manual intervention itself is not a problem. Some events are so difficult and complex that some amount of manual effort could be required.  But, if manual intervention is always the first, second and third order of business after a server or application outage, that is a problem.

6. Your RTO is measured in days not hours or minutes

How are you measuring your recovery time objective (RTO)? Do you measure your RTO in days or hours instead of minutes? True, every business has a tolerance level for its RTO. However, your RTO should not be a function of server rebuilds and gross instabilities in your architecture.

7. You don’t know your RPO because your standby is never reliably in sync

You’ve checked the box on reliable monitoring and recovery of your application, and taken it a step further to provide a cluster-ready standby system. Great job. But, before I let you off the hook, what is your recovery point objective (RPO)? An RPO should be something more accurate than “somewhere between day 0 and last night.”

8. Single points of failure don’t just exist, they are the norm

Where are your single points of failure? Your budget may not allow you to eliminate every single point of failure, but if you can identify a single point of failure in every major category and every critical component of your enterprise, that is a problem.

9. Your last disaster made local, regional, or national news 

If the last major storm, grid failure, or failure event put a blight on your business due to downtime, then higher availability is the next order of business.

Downtime costs your business in terms of customers, productivity, and peace of mind. Unaddressed risks have a definite impact on your business and reputation. If these warning signs are there, you may have an application availability problem. And if you ignore them, you’ll likely have even bigger problems soon thereafter.

— Cassius Rhue, VP, Customer Experience

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: Amazon AWS, Application availability, application monitoring, High Availability, high availability - SAP, SQL Server High Availability

Migrating to the cloud? Here’s how your DevOps priorities should change when you move to Amazon EC2

September 27, 2020 by Jason Aw

A majority of companies migrating to the cloud, or creating “cloud-native” applications, are doing so with Amazon Web Services (AWS). AWS offers a lot of cost and functionality advantages. Companies that have adopted industry-standard developer operations (“DevOps”) best practices for monitoring and managing their on-premise environments often ask how those practices should adapt to their new cloud environments and applications.

How will DevOps priorities change when you move from on-premise applications to Amazon EC2?  Here’s an explanation of the differences between the two and what you should keep in mind.

DevOps priorities in the cloud?  The same. But different.

We often hear customers say that operations will be easier when they move to AWS. We caution them that moving to the cloud (or even AWS) does not mean that they no longer need to monitor and manage their applications.

Companies moving to Amazon AWS can take advantage of lower costs and reduced manpower when it comes to hardware procurement, provisioning, and maintenance. But you need to take into account that when you decide to host applications on Amazon EC2, anything above the operating system layer is your responsibility.

When it comes to backup, availability guarantees, security measures, and the like for your Amazon EC2 environments, the priorities are the same as if they were on-premise applications. Amazon provides some native tools and functionality, but you need to decide if they are the right fit for your requirements.

Security, Backup… What do you need to know when managing Amazon AWS environments?

So what are some of the AWS-specific considerations you need to keep in mind as you move to Amazon EC2?  And what are the right tools for you?  The time you invest upfront in designing your applications and how you will deploy and manage them will pay off.

Your first consideration should be how you will secure your Amazon EC2 applications.  Network design, such as “which ports to open” and “from where to allow access” must be considered in the same way as for your on-premise applications.  These can be configured in AWS using security groups and network ACLs (access control lists).
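As a rough illustration (the group name, VPC ID, and CIDR range below are placeholders, not recommendations), opening only the ports you need from only the networks you trust might look like this with boto3:

```python
# Minimal sketch: create a security group that only allows HTTPS from a
# specific corporate CIDR range. All identifiers below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

sg = ec2.create_security_group(
    GroupName="app-tier-sg",                 # hypothetical name
    Description="HTTPS only, from corporate network",
    VpcId="vpc-0123456789abcdef0",           # hypothetical VPC
)

ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "203.0.113.0/24",   # example documentation range
                      "Description": "corporate network"}],
    }],
)
```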

You can use the AWS Trusted Advisor functionality*, which automatically examines your AWS environment and flags settings that deviate from the recommended values, making it possible to check your company’s AWS environment for security issues. We recommend running AWS Trusted Advisor both at the time of implementation and periodically thereafter.

Another essential aspect of security is the management of authentication and access privileges.  AWS consolidates all of these into AWS Identity and Access Management (AWS IAM).  In addition to controlling who can access which EC2 instances, you can also use AWS IAM to set up access permissions from EC2 instances to other resources (such as DBs), etc.  Once you have migrated to AWS, the first thing you need to do is to set up the accounts and access restrictions properly in AWS IAM.
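For example, granting an EC2 instance access to another resource such as an S3 bucket is usually done with an IAM role exposed through an instance profile rather than with embedded credentials. The sketch below is a simplified, hypothetical example; the role name, profile name, and managed policy are placeholders:

```python
# Minimal sketch: create an IAM role that EC2 instances can assume, then
# expose it as an instance profile. Role name, profile name, and policy ARN
# are placeholders.
import json
import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="app-server-role",                      # hypothetical role
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach a managed policy granting only the access the application needs.
iam.attach_role_policy(
    RoleName="app-server-role",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",  # example policy
)

iam.create_instance_profile(InstanceProfileName="app-server-profile")
iam.add_role_to_instance_profile(
    InstanceProfileName="app-server-profile",
    RoleName="app-server-role",
)
```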

The next consideration is “how will I back up my applications on Amazon EC2?” Amazon EC2 provides the ability to take snapshots of your volumes. In addition, “Amazon Data Lifecycle Manager” makes it easy to schedule periodic snapshots as well as incremental backups. Snapshots are stored on the Amazon S3 storage service and you are charged according to their capacity, so you need to be aware of the amount of data you have and configure policies that keep backups incremental and delete the oldest data.
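A minimal sketch of the on-demand snapshot side (the volume ID and tag values are placeholders; scheduled, incremental snapshots are better defined as an Amazon Data Lifecycle Manager policy than taken ad hoc like this):

```python
# Minimal sketch: take an on-demand EBS snapshot of the volume backing an
# application. The volume ID and tag values are placeholders; scheduled
# snapshots are better handled by Amazon Data Lifecycle Manager.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",          # hypothetical volume
    Description="Pre-patch backup of application data volume",
    TagSpecifications=[{
        "ResourceType": "snapshot",
        "Tags": [{"Key": "retention", "Value": "30-days"}],
    }],
)

ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])
print("Snapshot complete:", snapshot["SnapshotId"])
```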

“Availability” needs to be considered in advance: the key is to operate each system in accordance with its priority level.

The last consideration is availability. With Amazon EC2 applications, as with those that are on-premise, you should consider the level of availability required based on cost and system importance. If you use Amazon’s Multi-AZ deployment functionality, you can specify a redundant configuration that spans different data centers. However, using Multi-AZ costs more than a single-AZ configuration (although not as much as redundant on-premise systems). When designing your applications you need to consider whether Multi-AZ is required and how much you should invest in availability.

If you aren’t investing in failover, then you should at least be monitoring your applications and planning how to recover them when downtime occurs. You can use Amazon CloudWatch to easily monitor general items such as CPU, memory, and disks, and you can also configure the Amazon EC2 auto recovery function to automatically recover instances when an underlying system error occurs.
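One commonly documented way to wire that up is a CloudWatch alarm on the EC2 system status check whose alarm action is the built-in recover automation. The sketch below assumes a hypothetical instance ID and region:

```python
# Minimal sketch: CloudWatch alarm that triggers EC2 auto-recovery when the
# system status check fails. Region and instance ID are placeholders.
import boto3

region = "us-east-1"
instance_id = "i-0123456789abcdef0"   # hypothetical instance

cloudwatch = boto3.client("cloudwatch", region_name=region)

cloudwatch.put_metric_alarm(
    AlarmName=f"auto-recover-{instance_id}",
    Namespace="AWS/EC2",
    MetricName="StatusCheckFailed_System",
    Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=2,
    Threshold=1.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    # Built-in action that asks EC2 to recover the instance on healthy hardware.
    AlarmActions=[f"arn:aws:automate:{region}:ec2:recover"],
)
```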

If your application is mission-critical, then you will want to invest more in its availability. You should consider the many excellent third-party solutions that offer valuable functionality to the AWS community. One choice is SIOS AppKeeper, an easy-to-configure solution that monitors your Amazon EC2 instances and automatically restarts services or reboots instances when they experience failures. Here’s a quick video of how AppKeeper works:

Video: Installing AppKeeper and recovering from AWS EC2 failures Demo

While moving to the cloud for your applications makes a lot of sense, you cannot abandon DevOps best practices.  Amazon AWS provides you with a rich set of functionalities and tools, but you still need to take primary responsibility for the security, backup and availability of your applications.  How you do this depends on your skills and the importance of the applications themselves.

We invite you to join the hundreds of customers who have been taking advantage of AppKeeper to reduce their Amazon EC2 downtime by signing up for a free 14-day trial of the service.

* Note:  To use AWS Trusted Advisor, a contract for business support or higher is required.

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: Amazon AWS, Amazon EC2, AppKeeper, Application availability, application monitoring
