SIOS SANless clusters - High-availability Machine Learning monitoring

January 17, 2023	Understanding the Complexity of High Availability for Business-Critical Applications

Minimizing downtime in systems, databases, and applications is the key to maximizing productivity. Modern organizations rely on business-critical systems, databases, and applications (such as enterprise resource planning (ERP), customer relationship management (CRM), e-commerce, financial systems, and supply chain management) to operate efficiently and deliver superior customer experiences. When a system, database, or application fails, high availability protection restores operation to keep the business up and running.

What Is High Availability?

High availability is an attribute of a system, database, or application that is designed to operate continuously and reliably for extended periods. The goal of high availability is to reduce or eliminate planned and unplanned downtime for critical applications by incorporating redundant components and other technologies to address single points of failure in a system, database, or application.

Simply stated, high availability ensures that your system, database, or application operates when and as expected: “when” refers to the percentage of time the system, database, or application must be up and running, and “as expected” means that the application operates the way users expect and meets their needs in a timely manner.

IDC Model

Service-level agreements (SLAs) for high availability help ensure that key components of the IT infrastructure are operational and available during business hours. IDC has created an SLA model for high availability that defines five levels with the following uptime requirements:

• AL4 (Continuous Availability - System Fault Tolerance): No user interruption and a total maximum of no more than 5 minutes and 15 seconds of planned and unplanned downtime per year (99.999% or “five-nines” availability).
• AL3 (High Availability - Traditional Clustering): Minimal user interruption and a total maximum of no more than 52 minutes and 35 seconds of planned and unplanned downtime per year (99.99% or “four-nines” availability).
• AL2 (Recovery - Data Replication and Backup): Some user interruption and a total maximum of no more than 8 hours, 45 minutes, and 56 seconds of planned and unplanned downtime per year (99.9% or “three-nines” availability).
• AL1 (Reliability - Hot Swappable Components): All service stops and a total of 87 hours, 39 minutes, and 29 seconds of planned and unplanned downtime per year (99% or “two-nines” availability).
• AL0 (Unprotected Servers): All service stops, and no uptime SLAs are defined.

Your high availability requirements depend on the criticality of the overall system, the application, and numerous other factors, including:

• How critical the applications are to the business
• Whether customers notice an impact
• How often the applications run
• How many users are affected by downtime
• How quickly a database or application must fail over to the redundant system to avoid disruption
• How much data loss is tolerable

Five-nines availability is typically reserved for applications that require continuous “stateful” operation. For business-critical applications, four-nines availability is standard. For non-critical systems and applications, you may require only two-nines availability.

When determining acceptable downtime, it’s important to consider:

• Unplanned downtime (that is, hardware or software failures)
• Planned downtime for routine hardware and software maintenance
• Uptime at the application and database level

Various high availability solutions can help businesses achieve their SLA objectives for different systems, databases, and applications. Although continuous availability (AL4) may seem like the most appropriate goal for business-critical deployments, it’s important to find the right balance between cost and availability.
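The downtime allowances quoted for AL1 through AL4 follow directly from the availability percentages. The sketch below reproduces them; it assumes a year of 365.2425 days and truncation to whole seconds, which is inferred from the quoted figures and is not something the IDC model states. The function name is illustrative only.

```python
# Assumption: a "year" is the mean Gregorian year (365.2425 days). This is
# inferred from the figures in the article, not stated by IDC.
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60


def max_downtime_per_year(availability_percent):
    """Return the yearly downtime budget as (hours, minutes, seconds)."""
    unavailable_fraction = (100 - availability_percent) / 100
    # Truncate to whole seconds, as the quoted figures appear to do.
    total_seconds = int(SECONDS_PER_YEAR * unavailable_fraction)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


for level, percent in [("AL4", 99.999), ("AL3", 99.99), ("AL2", 99.9), ("AL1", 99.0)]:
    hours, minutes, seconds = max_downtime_per_year(percent)
    print(f"{level} ({percent}%): {hours} h {minutes} min {seconds} s")
```

Under those assumptions the output matches the list: 5 min 15 s for five nines, 52 min 35 s for four nines, 8 h 45 min 56 s for three nines, and 87 h 39 min 29 s for two nines.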
Continuous availability can also increase the downtime required for planned maintenance, because the system generally has to be taken offline when application or OS updates are applied, whereas high availability typically allows for rolling updates.

High Availability Metrics: RTO vs. RPO

In addition to uptime and availability, Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) are important metrics used to assess high availability (as well as disaster recovery) in a system, database, or application.

RTO is the maximum tolerable duration of any outage. Online transaction processing applications generally have the lowest RTOs, and those that are business-critical often have an RTO of only a few seconds.

RPO is the maximum amount of data loss that can be tolerated when a failure happens. For disaster recovery, a typical RPO for an application and its associated data may be 24 hours. Nightly backups ensure that any changes to data over the past 24 hours can be restored in the event of a disaster. However, for high availability applications and data, the RPO is often zero. That is, there should be no data loss under any failure scenario.

Traditional Clustering

High availability clusters are groups of server nodes (and other components) that support business-critical applications that require minimal downtime. Clustering software lets you configure your servers as a cluster so that multiple servers can work together to provide high availability and prevent data loss. IT organizations rely on high availability clustering to eliminate single points of failure and minimize the risk of downtime and data loss.

A traditional, on-premises high availability cluster is a group of two or more server nodes connected to shared storage (typically, a storage area network, or SAN) that are configured with the same operating system, databases, and applications (see Figure 1).
Figure 1: Traditional server clustering with shared storage

One of the nodes is designated as the primary (or active) node and the others are designated as secondary (or standby) nodes. If the primary node fails, clustering allows operation of a system, database, or application to automatically fail over to one or more secondary nodes and continue operating as normal with minimal disruption. Since the secondary node is connected to the same storage, operation continues with zero data loss. The benefits of this cluster architecture are reduced downtime, elimination of data loss, and protected data integrity.

However, there are many scenarios in which shared storage is not wanted. A failure in the shared storage will take the entire cluster offline, making it a single point of failure (SPoF) risk. SAN storage can also be costly and complex to own and manage. Lastly, using shared storage in the cloud can add significant, unnecessary cost and complexity. Some clouds do not offer a shared storage option at all.

As shown in Figure 2, SANless or “shared nothing” clusters are the best alternative to shared storage. In these configurations, every cluster node has its own local storage. Efficient host-based, block-level replication is used to synchronize storage on the cluster nodes, keeping them identical. In the event of a failover, secondary nodes access an identical copy of the storage used by the primary node. The benefits of this cluster architecture are elimination of a SPoF, elimination of SAN cost and complexity, ease of use and cost savings in the cloud, reduced downtime, and mitigation of data loss.
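To make the idea of block-level replication between local disks concrete, here is a toy model in Python. It is only a sketch of the general technique and not a description of how SIOS software works: the class, its fields, and the dirty-block bitmap used to resynchronize after an interruption are all assumptions chosen for illustration.

```python
class ToyMirror:
    """Toy model of host-based, block-level mirroring between two nodes."""

    def __init__(self, num_blocks):
        self.source = [b""] * num_blocks   # local storage on the active node
        self.target = [b""] * num_blocks   # local storage on the standby node
        self.dirty = [False] * num_blocks  # bitmap of blocks not yet on the target
        self.link_up = True

    def write(self, index, data):
        """Write a block locally and mirror it if the standby is reachable."""
        self.source[index] = data
        if self.link_up:
            self.target[index] = data
        else:
            self.dirty[index] = True  # remember the block for a later resync

    def resync(self):
        """Copy only the blocks flagged in the bitmap, then clear the flags."""
        copied = 0
        for index, is_dirty in enumerate(self.dirty):
            if is_dirty:
                self.target[index] = self.source[index]
                self.dirty[index] = False
                copied += 1
        self.link_up = True
        return copied


mirror = ToyMirror(num_blocks=8)
mirror.write(0, b"orders")      # mirrored immediately
mirror.link_up = False          # standby becomes unreachable
mirror.write(3, b"invoices")    # tracked in the bitmap only
print(mirror.source == mirror.target)  # False while the mirror is out of sync
print(mirror.resync())                 # number of blocks copied during resync
print(mirror.source == mirror.target)  # True once the copies are identical again
```

The point of the bitmap in this sketch is that a resync after an interruption copies only the blocks that changed, rather than the whole volume.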
Figure 2: High availability clustering with SANless or shared-nothing storage

Design Principles

The most advanced high availability clusters incorporate the following design principles:

• They automatically and quickly fail over to a redundant system when an active component fails
• They maintain application-specific best practices during and after the failover
• They provide the ability to manually switch over and switch back to enable efficient testing and “rolling” maintenance with minimal planned downtime
• They can automatically detect failures in network, storage, OS, hardware, or application
• They prevent data loss in the event of a system failure
• They fail over across geographically separated nodes for disaster recovery

High Availability Clustering

A variety of clustering software solutions are available for Windows, Linux distributions, and various hypervisors (virtual machine solutions). One group supports only a single operating system, such as the following:

• Windows Server Failover Clustering (WSFC): Provides high availability and disaster recovery for hosted applications such as Microsoft SQL Server and Microsoft Exchange
• SUSE Linux Enterprise High Availability Extension (HAE): Supports clustering of physical and virtual Linux servers with policy-driven clustering and continuous data replication
• Red Hat Pacemaker (Pacemaker): Creates single-site clusters for performance, high availability, load balancing, and scalability

None of the solutions listed here can protect SAP running on the Oracle Linux operating system, for example. Thus, each solution limits your flexibility and deployment options. More advanced high availability solutions, such as SIOS Protection Suite for Linux, provide application-aware protection in major Linux distributions, including Oracle Linux, Red Hat, and SUSE.

In addition, every application, database, and ERP system has its own requirements for configuration and ongoing management.
To meet these requirements, HAE and Pacemaker typically require a high degree of technical skill and complicated manual scripting, which increases the likelihood of human error and unreliable failover.

Some examples of business-critical applications, databases, and ERP systems commonly protected with failover clustering include SAP S/4HANA, SQL Server, and other applications and databases.

SAP S/4HANA

Several Linux vendors offer open source high availability extensions for SAP in their “Enterprise for SAP” subscriptions. SAP S/4HANA environments comprise multiple services, such as ABAP SAP Central Services (ASCS), the Enqueue Replication Server (ERS), and other SAP components, that need to be maintained in the right locations and started up in the right order. In open source clustering products, such as SUSE HAE and Red Hat Pacemaker, manually configuring and managing clusters in these complex environments can be time-consuming and prone to human errors that increase the risk of catastrophic downtime and data loss. Deep expertise in the specific applications and database is also required to create an application-aware high availability solution. In contrast, SIOS Protection Suite for Linux includes application recovery kits for SAP and HANA that ensure failovers maintain application best practices.

SAP also offers HANA System Replication (HSR), a feature that comes with the HANA software. It provides continuous synchronization of an SAP HANA database to a secondary location in the same data center, at a remote site, or in the cloud. The data is replicated to the secondary site and preloaded into memory. When a failure occurs, the secondary site takes over without a database restart, which helps to reduce the RTO. However, failback to the primary node must be manually triggered. HSR needs to be paired with application-aware clustering software, such as SIOS Protection Suite, that can detect failures and orchestrate failovers when necessary.
SQL Server

Many companies rely on SQL Server as the back-end database for key applications supporting important business functions. Microsoft WSFC is commonly used to support Always On Availability Groups (AG) and SQL Server Failover Cluster Instances (FCI) for SQL Server applications. However, WSFC with AG requires costly SQL Server Enterprise Edition licensing. In addition, with FCI the entire instance is failed over to the standby node, whereas with AG only the databases in the group are protected. Using SIOS DataKeeper with WSFC allows you to provide advanced high availability protection for SQL Server using cost-efficient Standard Edition licensing.

Other Applications and Databases

SIOS software can be used to protect a wide range of business-critical applications, databases, and ERPs, including Oracle, MaxDB, MySQL, PostgreSQL, and DB2. SIOS software enables clustering and disaster recovery.

In our next blog, we’ll take a look at specific industry use cases to help you understand how different businesses achieve high availability for their mission-critical applications.

Reproduced with permission from SIOS
January 13, 2023	Epicure Protects Business Critical SQL Server with Amazon EC2 and SIOS SANLess Clustering Software

SIOS DataKeeper Cluster Edition Software Provides High Availability and Disaster Protection.

Epicure, Canada’s leading direct sales company, sells healthy, easy-to-prepare food products through a network of over 16,000 consultants. The company relies on two websites for its critical business operations. Its public website provides company and product information, recipes, blogs, and enrollment information to customers and to people interested in becoming consultants. Its internal website provides consultants with important information about products and enables them to place all of their orders. “Our websites are vital to our business,” said Russell Born, Senior Network Infrastructure Administrator at Epicure.

The Environment

Both of Epicure’s websites run on a single server using two instances of SQL Server Standard Edition, one for each website. As the company expanded its products and services, the Epicure IT department needed to update the websites and to ensure that both of these business-critical sites would continue to operate in the event of failures or disasters. They decided to move both websites from a third-party hosted facility to the company’s on-premises data center and to use the Amazon Web Services EC2 cloud for disaster recovery. “By bringing the sites in-house, we could ensure that our websites would deliver excellent user experiences for both our customers and consultants as our business continues to grow,” said Born.

The Challenge

As part of this website update process, Epicure IT staff wanted an efficient, cost-effective way to provide high availability and disaster protection for both websites while continuing to run them on two instances of SQL Server Standard Edition.
“We didn’t want the added expense of moving to SQL Server Enterprise Edition if we could provide HA and DR with the more cost-effective Standard Edition,” Born said.

The Solution

Using SIOS DataKeeper Cluster Edition software, Epicure IT staff created a two-node SANLess cluster in an active-passive failover configuration that enables each SQL instance to fail over independently. One cluster node is in the Epicure on-premises data center and the second node is an instance in the AWS EC2 cloud. Epicure IT staff created the SIOS SANLess clusters and configured them using the software’s intuitive graphical user interface.

The Results

The SIOS software provided Epicure with an easy, cost-efficient way to provide HA and DR protection for its business-critical SQL Server applications without the cost and complexity of building out a remote DR site or purchasing costly SAN storage or SQL Server Enterprise Edition licenses. “The SIOS software has allowed us to create a hybrid solution that provides the cost savings of running on-premises and the reliability and flexibility of running in the cloud,” said Born. “Because we know that if there is a website outage, it will fail over automatically, our IT team can now focus their attention on other priorities to strengthen our business.”
January 10, 2023	SIOS DataKeeper Clustering Software Enables Gulliver International to Move Internal IT Systems to Amazon Web Services Safely

SIOS software provides high availability in AWS environments, enabling a leading pre-owned vehicle company to move all IT systems to the cloud.

Gulliver International is a leading pre-owned car company based in Tokyo with 420 locations throughout Japan. Over the next four years, the company plans to expand into a global business with 1600 stores worldwide. To ensure its IT infrastructure can accommodate this rapid growth, the company is migrating all of its internal systems to AWS and promoting a company-wide “cloud-first” policy for all new applications. “Moving our systems to the cloud will give us the flexibility and scalability we need to grow quickly and cost-efficiently, while continuously providing excellent service to our customers,” said Manabu Tsukishima, IT Manager, Gulliver International.

The Challenge

To ensure the success of their cloud-first initiative, Gulliver needed to protect their business-critical applications from downtime in a cloud environment, where traditional failover clusters are not possible. “We would not consider moving our applications to the cloud without an efficient, easy-to-implement high availability solution,” said Tsukishima. Gulliver chose to use SIOS DataKeeper software, which is sold in Japan by SIOS Technology, Inc.

The Solution

SIOS DataKeeper software enables Gulliver to use Windows Server Failover Clustering (WSFC) to build a failover cluster in a cloud environment, where traditional shared-storage clusters are not possible. SIOS software uses efficient, real-time replication to synchronize storage between servers operating as a WSFC cluster in an AWS environment.
Using SIOS software, Gulliver can configure two servers operating as a cluster across separate Amazon Availability Zones. Just as in a traditional physical environment, if there is a failure on the primary server in the AWS cloud within one Availability Zone, WSFC moves the application to the second server located in another Amazon Availability Zone, providing full disaster tolerance and recovery in the cloud.

The Results

“We are extremely pleased with the value that SIOS DataKeeper software brings to our company’s cloud-first initiative,” said Tsukishima. With SIOS DataKeeper software, Gulliver can move to the cloud without adding complexity or disruption to existing operations. “By enabling us to use a clustering configuration in the cloud in the same way we would in a physical environment, SIOS DataKeeper software made it possible for us to migrate to AWS without sacrificing application protection or changing the configuration of our existing system at all.”

About 30 percent of Gulliver’s existing on-premises systems have been migrated to AWS without any changes to the company’s system administration or added complexity. As Gulliver continues to execute its expansion plan, it will soon need to protect even larger volumes of data and a wider range of applications. To meet this need, it will continue to use SIOS DataKeeper software as it migrates systems to the cloud.

As a Standard Consulting Partner of APN (AWS Partner Network), SIOS is committed to continuing to provide high availability systems that operate on AWS.
January 5, 2023	Creating a HA Oracle Database server cluster in AWS

Introduction

As a developer tasked with creating a POC for a business-critical application that requires a highly available (HA) instance of Oracle, I need to set up an Oracle HA cluster in AWS EC2. Where do you start? If you are like most of us, you will spend endless hours googling your next task, reading articles, installation guides, documentation, and questions on Stack Overflow. You will find lots of almost-right answers, but they never quite fit your version or environment. Worse, you go down a rabbit hole and end up wasting days building an environment that will not work.

I am going to structure a series of blogs that focus on setting up HA environments for developing Proofs of Concept using the various SIOS HA solutions, such as DataKeeper, LifeKeeper, and SIOS Protection Suite. If you have an immediate need that I have not yet covered, let me know, and I will move your configuration up in my backlog. Thank you for reading this. I hope it makes your life easier.

The list of tasks below is something you can just run through if you are already familiar with how to accomplish them. After the list is a step-by-step guide for performing each task.

AWS HA Oracle database: SIOS Protection Suite for Linux

1. Launch 2 instances of Oracle on Linux
2. Get Xwindows working
3. Connect to instance and mount additional disk
4. Install AWS cli kit
5. Configure Security/Access
6. Create route entry for the virtual IP
7. Disable Source/Destination Check for ENI’s
8. Edit /etc/hosts
9. Configure the Listener with the VIP hostname
10. Disable SELinux
11. Install SIOS Protection Suite for Linux
12. Start LifeKeeper
13. Connect to second server
14. Build communication paths
15. Create a DataKeeper resource
16. Create Hierarchy with Virtual IP resource
17. Create an Oracle listener resource
18. Create Hierarchy with Oracle Database
19. Create Hierarchy with EC2
20. Change Shutdown Behavior
21. Test Failover

1. Launch 2 instances of Oracle on Linux

In this first blog we are going to set up a HA environment in AWS for an Oracle cluster using SIOS LifeKeeper for Linux. This means getting all the prerequisites out of the way. I will be using the aws-marketplace/Oracle Database 19.8.0 Enterprise Edition on Oracle Linux 8 AMI. AMIs change frequently, and it can be difficult to find the one that fits your needs. This AMI was my third attempt, because installing anything in the cloud, especially something like Oracle, is very difficult due to repository, licensing, enrollment, and security issues. This AMI actually works because Oracle is already installed on the image. Make sure the OS version and the Oracle DB version are supported by SIOS; you can check this in the SIOS documentation.

My setup has:

• Single VPC
• Single Region
• Different Availability Zones for each server
• Additional drive(s) for database storage
• 2 network interfaces for each instance, in different subnets
• 2 Elastic IP addresses, one attached to each server

I am attaching an additional disk to the instance for the database and an additional NIC for redundant communication paths. Make sure the two NICs are on different subnets. This also means you will have to manually create and assign Elastic IP addresses in order to connect to the instances.

Connect to the instance and mount the additional disk. I am using Putty and Xming to connect to my instance. If you are using Xming, make sure to run Xlaunch before trying to make a connection. After launching the instance, you will need to partition the new disk:

• Find the disk; it is easiest to find with ls /dev/disk/by-path.
• Partition the disk with fdisk.
• Create the file system on the new partition with mkfs.xfs.
• Mount the file system with mount.
• Add an entry to fstab so that the disk is mounted automatically.

It is important to note that you do not need to run the install for Oracle. The AMI has done that and created a database for you.
I deleted the database that is pre-configured with this AMI and created a new one on the /data disk using DBCA. I started up the database, created a schema, and added data using SQLPLUS. This all requires that you get Xwindows working.

2. Get Xwindows working

X display using Putty can be set up using Xming for Windows. Install Xming first. Then, in Putty, enable X11 forwarding, enter localhost:0.0 in the X display location, and enter the path to the xming.exe executable in the X authority file for local display.

That takes care of the Windows side, but you still need to fix the Linux side. First edit /etc/ssh/sshd_config and uncomment “X11Forwarding yes”. Finding and adding the correct key to Xauthority is next. You may have to start a new session if you have done any user switching. After logging in as ec2-user, run xauth list, which will give you the hex key you need to add to your Xauthority file. Switch to the oracle user with su - oracle. Then run xauth add $DISPLAY . <hexkey copied from xauth list>. This stores the information in the /home/oracle/.Xauthority file. Exit back to ec2-user.

3. Connect to instance and mount additional disk

I am using Putty and Xming to connect to my instance. If you are using Xming, make sure to run Xlaunch before trying to make a connection. After launching the instance you will need to partition the new disk:

• Find the disk; it is easiest to find with ls /dev/disk/by-path.
• Partition the disk with fdisk.
• Create the file system on the new partition with mkfs.xfs.

At this point we want to rename the /u01 directory to /oracle so that we can mount the new filesystem on /u01, which is where Oracle resides on a server built with this AMI. Create the mount point with mkdir /u01 and mount the volume with mount. Move the files to the new disk with mv /oracle /u01. This will take some time because it is approximately 11GB of data.
Finally, add an entry to fstab so that the disk is mounted automatically. It is important to note that you do not need to run the install for Oracle. The AMI has done that and created a database for you. I started up the database, created a schema, and added data using SQLPLUS.

4. Install AWS cli kit

We need the awscli kit, so while we are root:

• Download the file with curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
• Unzip the file with unzip awscliv2.zip
• Install the application with sudo ./aws/install

Next, set up the access key in AWS: click on your account at the top right of the console, select Security Credentials, click Create access key, and then click Download .csv file. Transfer this file onto your servers and configure AWS with the aws configure command, using the key ID and access key from the csv file. Test that it is working with something like: aws --no-paginate --no-cli-pager ec2 describe-instances

5. Configure Security/Access

First, I added the oracle user to the root and wheel groups, giving it sudo privileges (usermod -aG wheel oracle). This will make life easy by making the oracle account the lkadmin account. I downloaded the sps.img and license files onto both servers. Before installing the software there are a few more prerequisite steps that need to be done. First configure the security group for the servers so that they can communicate by opening up TCP ports 5900-59010. Open TCP ports 81 and 82 as well. Also make sure that the ports are open for the Virtual IP.

6. Create route entry for the virtual IP

The route table needs to be updated in order for the cluster’s Virtual IP to work. In this multi-subnet cluster configuration, the Virtual IP needs to live outside the range of the CIDR allocated to your VPC. Define a new route that directs traffic for the cluster’s Virtual IP (172.30.0.101) to the primary cluster node (Oracle1). From the VPC Dashboard, select Route Tables and click Edit.
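Before adding the route, the requirement that the Virtual IP lie outside the VPC's CIDR range can be checked with a few lines of Python. This is a minimal sketch: the VPC CIDR shown is a hypothetical placeholder (the post does not state it), and the function name is illustrative.

```python
import ipaddress


def vip_outside_vpc(vip, vpc_cidr):
    """Return True when the virtual IP is not inside the VPC's CIDR block."""
    return ipaddress.ip_address(vip) not in ipaddress.ip_network(vpc_cidr)


VIRTUAL_IP = "172.30.0.101"  # the Virtual IP used in this walkthrough
VPC_CIDR = "10.0.0.0/16"     # hypothetical; substitute your VPC's actual range

if vip_outside_vpc(VIRTUAL_IP, VPC_CIDR):
    # A /32 network is a host route that matches only the virtual IP itself.
    host_route = ipaddress.ip_network(VIRTUAL_IP + "/32")
    print(f"OK: add a route for {host_route}")
else:
    print("Choose a virtual IP outside the VPC CIDR")
```

If the check fails, pick a different Virtual IP before creating the route entry.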
Add a route with destination “172.30.0.101/32” and, as its target, the primary Elastic Network Interface (ENI) on the primary server.

7. Disable Source/Destination Check for ENI’s

Under Network Interfaces, select each interface one at a time, then under Actions select Change Source/Dest. Check and uncheck the Enable box. Repeat for all interfaces. (Regarding the AWS CLI test in step 4: as long as you don’t get an authentication error, the CLI is installed and configured correctly.)

8. Edit /etc/hosts

Unless you already have a DNS server set up, you’ll want to create hosts file entries on both servers so that they can properly resolve each other by name.

9. Configure the Listener with the VIP hostname

Edit or create the $ORACLE_HOME/network/admin/listener.ora file so that it points to the oracle-vip.

10. Disable SELinux

Edit the /etc/sysconfig/selinux file, set “SELINUX=disabled”, and reboot the server(s). If at this point the server does not come back up, it is possible that you left the SELINUX setting at permissive and set SELINUXTYPE to disabled, which will brick the instance. To recover, detach the volume from your instance in AWS, attach it to a new or existing working instance, and mount it there with mount -o rw,nouuid {device} {mount directory}. Edit the {mount directory}/etc/sysconfig/selinux file and correct the error. Save the file, unmount the volume, detach it from the working instance, and re-attach it to the original instance.

11. Install SIOS Protection Suite for Linux

Next, as root, I installed SIOS Protection Suite by mounting the image file with mount /home/ec2-user/sps.img /mnt/ -t iso9660 -o loop. Run the setup with /mnt/setup, then:

• Under LifeKeeper Authentication, scroll down to the lkadmin group, hit Enter, and add oracle to the lkadmin group.
• Select OK, then tab to Done and hit Enter.
Next Scroll to Install License Key File and hit enter: From here type in the location and name of your license file: Next I select the Recovery Kit Selection Menu and hit enter: Here I select Networking: Hit the spacebar to select the LifeKeeper Recovery Kit for EC2. Tab to Done and hit enter. Next I selected the Database menu, scroll down and hit spacebar on LifeKeeper Oracle RDBMS Recovery Kit: Tab to Done or hit D and scroll down to Storage and hit enter. Next I hit the spacebar and select DataKeeper for Linux: Tab to Done and hit enter or hit d backing out to the Recovery Kit Selection and then tab to Done or hit D to back out the Main Configuration menu: Make sure LifeKeeper Startup After Install is selected and then finally one last tab to done or hit d and we get the Install confirmation screen: Here hit enter or y and the install will start. 12. Start LifeKeeper Startup the LifeKeeper GUI with /opt/LifeKeeper/bin/lkGUIapp if it fails it is likely because you don’t have the magic-number for the account you logged in the .Xauthority file. I logged in as oracle and then did an sudo -i to get to root. So, if my gui doesn’t load I will copy the /home/oracle/.Xauthority file to /root : Here I login as oracle: 13. Connect to second server And then click on the Cluster Connect button Login as oracle: 14. Build communication paths Click on the Create Comm Path button : If there is a failure, make sure firewall and iptables are disabled. Hit next: Hit next: Pick your first IP address and hit next: Select the remote IP: Hit next: Hit Create: Hit next: Now hit done: Next we need to create the second comm path by repeating step 14 with the secondary addresses. Once two paths have been successfully established the servers should go green. 15. 
Create a DataKeeper resource Click on the Create Resource Hierarchies button: Select Data Replication and hit Next: Hit next (Intelligent means that after a failover you need to manually fail back): Hit next: Select your primary server and hit next: Select Replicate Existing Filesystem and hit next: Select the Existing mount point and hit next: Create a Data Replication Resource Tag and hit next: Select a File System Resource Tag and hit next:[1] For optimal performance the bitmap file should be placed on an ephemeral volume. For testing purposes the bitmap can be placed on the OS disk as shown above. Select the bitmap file location and hit next: Select no for Enable Asynchronous Replication and hit next: Select the Target Server and hit next: Select Switchback Type and hit next: Select Template Priority and hit next: Select Target Priority and hit next: HIt next: Select the Target Disk and hit Next: Hit next: Hit next: Select which network endpoints you want to use for replication and hit next: Select the mount point and hit next: Select resource tag and hit next: Hit Finish: Hit Done: If you click on the /u01 you will see the volume syncing: 16. Create Hierarchy with Virtual IP resource Click on the create resource button: Select IP and hit next: Select Switchback Type and hit next: Select the Primary server and hit next: Enter the Virtual IP address from step 6 and hit next: Enter the subnet mask for the VIP and hit Next: Enter the network interface and hit next Enter the resource tag and hit next: After successful creation hit next: Select the Target Server and hit next: Select switchback type and hit next: Select priority and hit next: Select priority and hit next: Upon completion hit next: Hit next: Select the appropriate netmask and hit next: Select the interface and hit next: Select the resource tag and hit extend: Hit finish upon successful completion: Hit done after verification. 17. 
Create an Oracle Listener resource

Make sure the database and listener are running before you attempt to configure these resources in LifeKeeper.

Click on the Create Resource button. Select Oracle Database Listener and hit Next. Select the primary server and hit Next. Enter the Listener configuration file path and filename and hit Next. Hit Next. Enter the path for the Listener executables and hit Next. Select the protection level and hit Next. Select the recovery level and hit Next. Select the IP address associated with the Listener, if required, and hit Next. Enter the listener tag name and hit Create. Hit Next. Hit Accept Defaults to build the resource on your second server. Click on Finish. Click on Done and expand LSNR and /u01.

18. Create Hierarchy with Oracle Database

Click on the Create Resource Hierarchy button. Select Oracle Database and hit Next. Select the Switchback Type and hit Next. Select the server and hit Next. Select the database name and hit Next (if you get an error about being unable to find the home directory, make sure the database is running). Enter the sysdba username and hit Next. Enter the password for the account and hit Next. Select the Oracle Listener and hit Next. Hit Create. Upon successful creation, select Next. Select Accept Defaults. Select Finish. Hit Done. Expand the trees to see all resources.

19. Create Hierarchy with EC2

Click on the Create Resource Hierarchy button. Select Amazon EC2 and hit Next. Select Intelligent and hit Next. Select your primary server and hit Next. Select the EC2 Resource type (we are using Backend cluster for this example) and hit Next. Select the IP resource and hit Next. Select the EC2 Resource Tag name and hit Create. Upon successful creation of the resource, hit Next. After a few seconds the pre-extend wizard will pop up. Hit Accept Defaults. Once the checks have completed successfully, hit Accept Defaults again. Hit Finish and, after verification, hit Done.

The configuration is complete. Now we can test the failover.

20.
Change Shutdown Behavior

By default, LifeKeeper will not fail over resources if you simply shut down or reboot the server. If you want to move a workload before shutting down the server, you should manually move the resources to the standby server before shutting down the active node. However, you may wish to change the default behavior to facilitate testing. That is controlled by the Shutdown Strategy setting.

Right-click on your primary server and select Properties. Under the General tab, change the Shutdown Strategy to Switchover Resources and then hit Apply. Next, select the secondary server from the server pull-down and verify the setting change. Hit OK.

21. Test Failover

I am running lkGUIapp from the secondary server. If you are on the primary server, exit the LifeKeeper GUI and run it from the secondary server. Expand all the resource hierarchies and open an SSH session to your primary server. I am also running ping -i 5 against the oracle-vip.

Shut down the primary server. In my case the IP stopped responding for less than 25 seconds: I missed four pings (sequence numbers 20-23) at 5-second intervals. Everything is now active on the backup server. Because our primary is still down, we get warnings on the hierarchy.

Once you bring up the primary server, if you left the switchback set to Intelligent, you will have to manually bring the service up on the primary. Make sure that the primary server is InSync before trying to bring it into service. Right-click on the StandBy button for cdb1 and select In Service… Click In Service, then hit Done. It will take a few minutes for the disk to resync.

With everything restored, we now have an HA Oracle database in AWS that is ready for development.

Reproduced with permission from SIOS
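The "less than 25 seconds" figure in the failover test in step 21 follows from simple ping arithmetic. The sketch below is only an illustration of that reasoning, not part of LifeKeeper: the function name outage_bounds is hypothetical, and it assumes the pings are sent at exactly the configured interval and that network latency is negligible.

```python
from typing import Tuple


def outage_bounds(missed_pings: int, interval_s: float) -> Tuple[float, float]:
    """Bound the length of an outage inferred from consecutive missed pings.

    Hypothetical helper for illustration only. Assumes pings are sent at a
    fixed interval and ignores network latency.
    """
    if missed_pings < 1:
        raise ValueError("at least one ping must have been missed")
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    # Shortest case: the service dropped just before the first missed ping
    # and came back just after the last missed ping.
    lower = (missed_pings - 1) * interval_s
    # Longest case: the service dropped just after the last answered ping
    # and came back just before the next answered ping.
    upper = (missed_pings + 1) * interval_s
    return lower, upper


# Four missed pings (20-23) at 5-second intervals, as in the test above.
low, high = outage_bounds(missed_pings=4, interval_s=5)
print(f"outage lasted between {low:g} and {high:g} seconds")
```

A shorter ping interval narrows the window between the two bounds, at the cost of more traffic during the test.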
December 30, 2022	Leading Beverage Manufacturer Protects Critical SAP ERP in AWS EC2 Cloud

SIOS Chosen Based on Certifications and Validations for SAP, Amazon Web Services and Red Hat Linux

A leading Hong Kong-based beverage manufacturer produces 61 beverage brands, including the number one soft drink brand in the world, and distributes them to more than 728 million customers throughout Hong Kong, mainland China, Taiwan and the western USA.

The Environment

The company relies on an SAP ERP (enterprise resource planning) system running in a Red Hat Linux environment to manage a variety of critical business operations. The SAP environment comprises a variety of services including the ABAP (Advanced Business Application Programming), SAP Central Services (ASCS), Evaluated Receipt Settlement, Web Dispatcher and the DB2 database. They used a large Storage Area Network (SAN) for data storage. The core SAP applications handle all business operations across the company's beverage division. In their on-premises data center, the company provided uptime protection for this system using data replication and backups of the SAN.

The Challenge

The company's IT department determined that they could achieve true high availability (99.99% uptime), disaster recovery, scalability and cost savings by migrating to the cloud and using failover clustering to protect their critical SAP system. However, they realized that the SAN and other shared storage required for traditional failover clustering is not practical in some clouds and is not available in others.

The Evaluation

After extensive evaluation, the company chose to move their SAP environment to Amazon EC2. They established four key criteria for evaluating their choices for an HA/DR solution.
Their solution needed to:
• Be certified and validated for use with SAP, AWS and Red Hat
• Provide high availability and enable high performance
• Protect against all likely failure scenarios
• Enable easy ongoing operation and maintenance

The company's cloud account manager recommended that they consider the SIOS Protection Suite, offered through AWS China. The SIOS software is certified by SAP for both NetWeaver and DB2, and SIOS is fully tested and supported on Red Hat Enterprise Linux and other Linux distributions. The company tested the SIOS clustering software extensively under a variety of challenging failure scenarios, and also evaluated the throughput performance during periods of peak demand. The IT team's confidence in SIOS Protection Suite increased as it passed each of their rigorous tests and proved to be remarkably easy to use.

The Solution

SIOS Protection Suite for Linux enables SANless failover clustering to provide full HA and DR for SAP and its critical services. The SIOS software includes modules called Application Recovery Kits (ARKs) that provide application-specific functionality, which simplifies configuration and ensures that failover orchestration maintains application best practices. The SAP and HANA ARKs automate configuration steps, validate configuration inputs, and manage IP failover and boot order to minimize human error. Unlike other clustering software that only validates server operability, the SIOS clustering software verifies that SAP and critical services are running, that databases are mounted and available, that any file shares or exports are available, and that clients are able to connect. To ensure these services are all functioning properly, SIOS software continuously monitors the servers, virtual machines, operating system and all major components of the SAP software. For DR protection, the company located the active and standby cluster nodes in different AWS Availability Zones for geographical separation.
The Results

SIOS Protection Suite has made it possible for this leading beverage manufacturer to meet the stringent recovery time and recovery point objectives established for its SAP/DB2 environment. To date, the configuration has experienced no perceptible downtime, including during planned maintenance. And these results have been realized with minimal effort, making it possible for the IT staff to focus more on projects that enhance employee productivity or otherwise improve business operations.

Reproduced with permission from SIOS
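For a sense of scale, the 99.99% uptime target cited under The Challenge can be converted into a yearly downtime budget. The sketch below is only an illustration: the function name allowed_downtime_minutes is hypothetical, and it assumes a 365-day year with planned and unplanned downtime counted together.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime percentage.

    Hypothetical helper for illustration only.
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return (100.0 - availability_pct) / 100.0 * MINUTES_PER_YEAR


# 99.99% uptime leaves roughly 52.56 minutes of downtime per year.
print(f"{allowed_downtime_minutes(99.99):.2f} minutes per year")
```

The same calculation for 99.999% gives about 5.26 minutes, which agrees with the "five-nines" figure of 5 minutes and 15 seconds quoted for the IDC model.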