High Availability Archives

High Availability & the Cloud: The More You Know

October 25, 2021 by Jason Aw

While researching reasons to migrate to the cloud, you’ve probably learned that the benefits of cloud computing include scalability, reliability, availability, and more. But what, exactly, do those terms mean? Let’s consider high availability (HA), as it is often the ultimate goal of moving to the cloud for many companies.

The idea is to make your products, services, and tools accessible to your customers and employees at any time, from anywhere, on any device with an internet connection. That means ensuring your critical applications stay operational through hardware failures, software issues, human errors, and site-wide disasters at least 99.99% of the time (the "four nines" commonly used as the benchmark for high availability).
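To make the 99.99% figure concrete, the short Python sketch below converts an availability percentage into the maximum downtime it allows per year. It assumes a 365-day year and treats availability as a simple fraction of wall-clock time; real SLAs often measure monthly and may exclude scheduled maintenance. Four nines works out to roughly 52.6 minutes of downtime a year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Maximum minutes of downtime per year allowed by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% -> {downtime_minutes_per_year(target):.1f} minutes/year")
```

Each additional nine cuts the yearly downtime budget by a factor of ten, which is why the last nine is usually the most expensive to achieve.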

While public cloud providers typically guarantee some level of availability in their service level agreements (SLAs), those SLAs cover only the cloud infrastructure. Many causes of application downtime fall outside them. For this reason, you need to protect critical applications with clustering software that detects issues and reliably moves operations to a standby server when necessary. As you plan which solutions to make available in the cloud, and how, make sure your products, services, and cloud infrastructure are scalable, reliable, and available when and where they are needed.

Quick Stats on High Availability in the Cloud in 2021

Now that we’ve defined availability in the cloud context, let’s look at its impact on organizations and businesses. Fair warning: these statistics may shock you, but don’t fret. We’ve also got some solutions to these pressing and costly issues.

As much as 80% of Enterprise IT will move to the cloud by 2025 (Oracle).
The average cost of IT downtime is between $5,600 and $11,600 per minute (Gartner; Comparitech).
Average IT staffing to employee ratio is 1:27 (Ecityworks).
22% of downtime is the result of human error (Cloudscene).
In 2020, 54% of enterprises’ cloud-based applications moved from an on-premises environment to the cloud, while 46% were purpose-built for the cloud (Forbes).
1 in 5 companies don’t have a disaster recovery plan (HBJ).
70% of companies have suffered a public cloud data breach in the past year (HIPAA).
48% of businesses store classified information in the cloud (Panda Security).
96% of businesses experienced an outage in a 3-year period (Comparitech).
45% of companies reported downtime from hardware failure (PhoenixNAP).
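The per-minute cost figures above are averages, but they make it easy to rough out what a single outage might cost. This sketch simply multiplies an outage length by the low and high ends of the cited range; the constants are the cited figures, and the function name is illustrative. Actual costs vary widely by business and workload.

```python
LOW_COST_PER_MINUTE = 5_600    # USD, low end of the cited average
HIGH_COST_PER_MINUTE = 11_600  # USD, high end of the cited average


def outage_cost_range(minutes: float) -> tuple[float, float]:
    """Return (low, high) estimated cost in USD for an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * LOW_COST_PER_MINUTE, minutes * HIGH_COST_PER_MINUTE


if __name__ == "__main__":
    low, high = outage_cost_range(60)  # a one-hour outage
    print(f"1-hour outage: ${low:,.0f} to ${high:,.0f}")
```

By this arithmetic, a single one-hour outage lands somewhere between $336,000 and $696,000.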

What You Can Do – Stay Informed

If you are interested in learning the fundamentals of availability in the cloud or hearing about the latest developments in application and database protection, join us. The SIOS Cloud Availability Symposium is taking place Wednesday, September 22nd (EMEA) and Thursday, September 23rd (US) in a global virtual conference format for IT professionals focusing on the availability needs of the enterprise IT customer. This event will deliver the information you need on application high availability clustering, disaster recovery, and protecting your applications now and into the future.

Cloud Symposium Speakers & Sessions Posted

We have selected speakers presenting a wide range of sessions supporting availability for multiple areas of the data application stack. Check out the sessions posted and check back for additional presentations to be announced!

Reproduced from SIOS

Enhanced High Availability for SAP S/4HANA in Cloud Environments

October 16, 2021 by Jason Aw

SIOS Protection Suite for Linux Now Features Enhanced High Availability for SAP S/4HANA in Cloud Environments

SIOS is pleased to announce the GA release of SIOS Protection Suite for Linux version 9.5.2 clustering software. Our latest release features enhanced automation and application failover orchestration that makes creating and managing high availability (HA) clusters in complex SAP S/4HANA environments easier and more reliable for enterprises.

New features and capabilities in SIOS Protection Suite for Linux 9.5.2 include:

Enhanced, comprehensive support for high availability in Google Cloud Platform with the addition of the SIOS Internal Load Balancer tool, designed to orchestrate efficient IP management during switchovers and failovers.
Enhanced ability to automate interactions with SIOS LifeKeeper using products such as Ansible.
Near-Zero Downtime for Switchover During Planned Maintenance. Supports a HANA “Takeover with Handshake,” eliminating the potentially time-consuming process of shutting down the primary HANA server database for switchover and allowing you to perform planned maintenance without disrupting ongoing service to end users.
Empowering customers by making SAP logging easier to use and take action on.

Reproduced from SIOS

Deployment of a SQL Server Failover Cluster Instance on Huawei Cloud

September 28, 2021 by Jason Aw

*DISCLAIMER: While the following completely covers the high availability portion within the scope of our product, this is a setup “guide” only and should be adapted to your own configuration.

Overview

HUAWEI CLOUD is a leading cloud service provider, not just in China but globally, with many datacenters around the world. It brings together Huawei’s 30-plus years of expertise in ICT infrastructure products and solutions and is committed to providing reliable, secure, and cost-effective cloud services that empower applications, harness the power of data, and help organizations of all sizes grow in today’s intelligent world. HUAWEI CLOUD is also committed to bringing affordable, effective, and reliable cloud and AI services through technological innovation.

DataKeeper Cluster Edition provides replication in a virtual private cloud (VPC) within a single region across availability zones for the Huawei cloud. In this particular SQL Server clustering example, we will launch four instances (one domain controller instance, two SQL Server instances and a quorum/witness instance) into three availability zones.

Huawei Cloud SIOS DataKeeper HA Architecture

DataKeeper Cluster Edition provides support for a data replication node outside of the cluster with all nodes in Huawei cloud. In this particular SQL Server clustering example, four instances are launched (one domain controller instance, two SQL Server instances and a quorum/witness instance) into three availability zones. Then an additional DataKeeper instance is launched in a second region including a VPN instance in both regions. Please see Configuration of Data Replication From a Cluster Node to External DR Site for more information. For additional information on using multiple regions please see Connecting Two VPCs in Different Regions.

Huawei Cloud SIOS DataKeeper DR Architecture

DataKeeper Cluster Edition also provides support for a data replication node outside of the cluster with only the node outside of the cluster in Huawei Cloud. In this particular SQL Server clustering example, WSFC1 and WSFC2 are in an on-site cluster replicating to a Huawei Cloud instance. Then an additional DataKeeper instance is launched in a region in Huawei Cloud. Please see Configuration of Data Replication From a Cluster Node to External DR Site for more information.

Huawei Cloud SIOS DataKeeper Hybrid DR Architecture

Requirements

Description	Requirement
Virtual Private Cloud	In a single region with three availability zones
Instance Type	Minimum recommended instance type: s3.large.2
Operating System	See the DKCE Support Matrix
Elastic IP	One elastic IP address connected to the domain controller
Four instances	One domain controller instance, two SQL Server instances and one quorum/witness instance
Each SQL Server	ENI (Elastic Network Interface) with 4 IPs: the primary ENI IP, statically defined in Windows and used by DataKeeper Cluster Edition, plus three IPs maintained by ECS and used by Windows Failover Clustering, DTC and SQLFC
Volumes	Three volumes (EVS and NTFS only): one primary volume (C drive) plus two additional volumes, one for Failover Clustering and one for MSDTC

Release Notes

Before beginning, make sure you read the DataKeeper Cluster Edition Release Notes for the latest information. It is highly recommended that you read and understand the DataKeeper Cluster Edition Installation Guide.

Create a Virtual Private Cloud (VPC)

A virtual private cloud is the first object you create when using DataKeeper Cluster Edition.

*A Virtual Private Cloud (VPC) is an isolated private cloud consisting of a configurable pool of shared computing resources in a public cloud.

Using the email address and password specified when signing up for Huawei Cloud, sign in to the Huawei Cloud Management Console.
From the Services dropdown, select Virtual Private Cloud.

On the right side of the screen, click on Create VPC and select the region that you want to use.
Input the name that you want to use for the VPC.
Define your VPC subnet by entering its CIDR (Classless Inter-Domain Routing) block.
Input the subnet name, then click Create Now.
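Before clicking Create Now, it can help to confirm that the subnet CIDR actually falls inside the VPC's CIDR block. This minimal sketch uses Python's standard ipaddress module; the CIDRs 15.0.0.0/16 and 15.0.1.0/24 are hypothetical values chosen to match the 15.0.1.x example addresses used later in this guide.

```python
import ipaddress


def subnet_fits_vpc(vpc_cidr: str, subnet_cidr: str) -> bool:
    """True if subnet_cidr lies entirely within vpc_cidr (both IPv4 CIDR strings)."""
    vpc = ipaddress.ip_network(vpc_cidr)        # raises ValueError if host bits are set
    subnet = ipaddress.ip_network(subnet_cidr)
    return subnet.subnet_of(vpc)


if __name__ == "__main__":
    vpc_cidr = "15.0.0.0/16"     # hypothetical VPC block
    subnet_cidr = "15.0.1.0/24"  # hypothetical subnet block
    print(subnet_cidr, "inside", vpc_cidr, "->", subnet_fits_vpc(vpc_cidr, subnet_cidr))
```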

*A Route Table will automatically be created with a “main” association to the new VPC. You can use it later or create another Route Table.

*HELPFUL LINK:
Huawei’s Creating a Virtual Private Cloud (VPC)

Launch an Instance

The following walks you through launching an instance into your subnet. You will want to launch two instances into one availability zone, one for your domain controller instance and one for your SQL instance. Then you will launch another SQL instance into another availability zone and a quorum witness instance into yet another availability zone.

*HELPFUL LINKS:
Huawei Cloud ECS Instances

Using the email address and password specified when signing up for Huawei Cloud, sign in to the Huawei Cloud Management Console.
From the Service List dropdown, select Elastic Cloud Server.

Click Buy ECS and choose the Billing Mode, Region and AZ (Availability Zone) in which to deploy the instance.
Select your Instance Type. (Note: select s3.large.2 or larger.)
Choose an Image. Under Public Image, select the Windows Server 2019 Datacenter 64bit English image.
1. For Configure Network, select your VPC.
2. For Subnet, select the subnet that you want to use, select Manually-specified IP address, and enter the IP address that you want to use.
3. Select the Security Group to use, or click Edit and select an existing one.
4. Assign an EIP if you need the ECS instance to access the internet.
5. Click Configure Advanced Settings, provide a name for the ECS, use Password for Login Mode, and provide a secure password for the Administrator login.
6. Under Advanced Options, click Configure Now, add a Tag to name your instance, and click Confirm.
Perform a final review of the instance and click Submit.

*IMPORTANT: Make a note of this initial administrator password. It will be needed to log on to your instance.

Repeat the above steps for all instances.

Connect to Instances

You can connect to your domain controller instance via Remote Login from the ECS pane.

*BEST PRACTICE: Once logged on, it is best practice to change your password.

Configure the Domain Controller Instance

Now that the instances have been created, we start by setting up the domain controller instance.

This guide is not a tutorial on how to set up an Active Directory server instance. We recommend reading articles on how to set up and configure an Active Directory server. It is very important to understand that even though the instance is running in Huawei Cloud, this is a regular installation of Active Directory.

Static IP Addresses

Configure Static IP Addresses for your Instances

Connect to your domain controller instance.
Click Start > Control Panel.
Click Network and Sharing Center.
Select your network interface.
Click Properties.
Click Internet Protocol Version 4 (TCP/IPv4), then Properties.
Obtain your current IPv4 address, default gateway and DNS server for the network interface from the Huawei Cloud console.
In the Internet Protocol Version 4 (TCP/IPv4) Properties dialog box, under Use the following IP address, enter your IPv4 address.
In the Subnet mask box, type the subnet mask associated with your virtual private cloud subnet.
In the Default Gateway box, type the IP address of the default gateway.
For the Preferred DNS Server, enter the Primary IP Address of your Domain Controller (ex. 15.0.1.72).
Click OK, then select Close. Exit Network and Sharing Center.
Repeat the above steps on your other instances.

Join the Two SQL Instances and the Witness Instance to Domain

*Before attempting to join a domain, make these network adjustments. On your network adapter, add or change the Preferred DNS server so that it points to the new domain controller's DNS server address. Use ipconfig /flushdns to clear the DNS resolver cache after this change. Do this before attempting to join the domain.

*Ensure that Core Networking and File and Printer Sharing options are permitted in Windows Firewall.

On each instance, click Start, then right-click Computer and select Properties.
On the far right, select Change Settings.
Click on Change.
Enter a new Computer Name.
Select Domain.
Enter the Domain Name (ex. docs.huawei.com).
Click Apply.

*Use Control Panel to make sure all instances are using the correct time zone for your location.

*BEST PRACTICE: It is recommended that the System Page File be set to System managed size on the C: drive only, rather than automatically managed for all drives.

Control Panel > Advanced system settings > Performance > Settings > Advanced > Virtual Memory. Select System managed size, Volume C: only, then select Set to save.

Assign Secondary Private IPs to the Two SQL Instances

In addition to the Primary IP, you will need to add three additional IPs (Secondary IPs) to the elastic network interface for each SQL instance.

From the Service List dropdown, select Elastic Cloud Server.
Click the instance for which you want to add secondary private IP addresses.
Select NICs > Manage Virtual IP Address.
Click Assign Virtual IP Address, select Manual, and enter an IP address within the subnet range for the instance (ex. for 15.0.1.25, enter 15.0.1.26). Click OK.
In the IP address row, click the More dropdown, select Bind to Server, and then select the server to bind the IP address to, along with its NIC.
Click OK to save your work.
Repeat these steps for each of the three secondary IPs, on both SQL instances.
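Because each SQL instance ends up with a primary IP plus three secondary IPs (used, in order, for the Windows failover cluster, MSDTC and the SQL Server failover cluster instance), it is worth sanity-checking the address plan before assigning anything in the console. The helper below is hypothetical, not part of any Huawei or SIOS tool; it only checks that each address is inside the subnet, is not the network or broadcast address, and is not reused. It does not account for any additional addresses the cloud provider may reserve in the subnet.

```python
import ipaddress


def check_ip_plan(subnet_cidr: str, primary_ip: str, secondary_ips: dict[str, str]) -> list[str]:
    """Return problems found in one node's IP plan; an empty list means no issues found."""
    subnet = ipaddress.ip_network(subnet_cidr)
    reserved = (subnet.network_address, subnet.broadcast_address)
    problems: list[str] = []
    used: dict = {}  # address -> role that already uses it
    for role, text in [("primary", primary_ip), *secondary_ips.items()]:
        ip = ipaddress.ip_address(text)
        if ip not in subnet:
            problems.append(f"{role}: {ip} is outside {subnet}")
        elif ip in reserved:
            problems.append(f"{role}: {ip} is the network or broadcast address")
        if ip in used:
            problems.append(f"{role}: {ip} is already used by {used[ip]}")
        else:
            used[ip] = role
    return problems


if __name__ == "__main__":
    # Hypothetical plan for one SQL node, following the 15.0.1.25 -> 15.0.1.26 example.
    plan = {"cluster": "15.0.1.26", "msdtc": "15.0.1.27", "sqlfc": "15.0.1.28"}
    print(check_ip_plan("15.0.1.0/24", "15.0.1.25", plan) or "no problems found")
```

Run it once per SQL node; in a multi-AZ layout the two nodes may sit in different subnets, each with its own plan.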

*HELPFUL LINKS:
Managing Virtual IP Addresses
Binding a Virtual IP Address to an EIP or ECS

Create and Attach Volumes

DataKeeper is a block-level volume replication solution and requires that each node in the cluster have additional volume(s) (other than the system drive) of the same size and with the same drive letters. Please review Volume Considerations for additional information regarding storage requirements.
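The same-size, same-drive-letter requirement is easy to check from an inventory of each node's data volumes. This is an illustrative sketch with a hypothetical Volume type; it follows the requirement exactly as stated in this guide, skips the C: system drive, and leaves gathering the real inventory (for example, from Disk Management) to you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """Hypothetical inventory record for one volume on a node."""
    drive_letter: str  # e.g. "E"
    size_gb: int


def mirror_mismatches(node_a: list[Volume], node_b: list[Volume]) -> list[str]:
    """List drive-letter or size differences between two nodes' data volumes (C: is skipped)."""
    a = {v.drive_letter.upper(): v.size_gb for v in node_a if v.drive_letter.upper() != "C"}
    b = {v.drive_letter.upper(): v.size_gb for v in node_b if v.drive_letter.upper() != "C"}
    issues = []
    for letter in sorted(a.keys() | b.keys()):
        if letter not in b:
            issues.append(f"{letter}: only on first node")
        elif letter not in a:
            issues.append(f"{letter}: only on second node")
        elif a[letter] != b[letter]:
            issues.append(f"{letter}: sizes differ ({a[letter]} GB vs {b[letter]} GB)")
    return issues


if __name__ == "__main__":
    sql1 = [Volume("C", 40), Volume("E", 100), Volume("F", 20)]
    sql2 = [Volume("C", 60), Volume("E", 100), Volume("F", 25)]
    print(mirror_mismatches(sql1, sql2) or "volumes match")
```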

Create Volumes

Create two volumes in each availability zone for each SQL server instance, a total of four volumes.

From the Service List dropdown, select Elastic Cloud Server.
Click the instance for which you want to manage
Go to the Disks tab
Click Add Disk to add a new volume of your choice and size, make sure you select the volume in the same AZ as the SQL server that you intend to attach it to
Select the check box to agree to the SLA and Submit
Click Back to Server Console
Attach the disk if necessary to the SQL instance
Do this for all four volumes.

*HELPFUL LINKS:
Elastic Volume Service

Configure the Cluster

Prior to installing DataKeeper Cluster Edition, it is important to have Windows Server configured as a cluster using either a node majority quorum (if there is an odd number of nodes) or a node and file share majority quorum (if there is an even number of nodes). Consult the Microsoft documentation on clustering in addition to this topic for step-by-step instructions. Note: Microsoft released a hotfix for Windows 2008R2 that allows disabling of a node’s vote which may help achieve a higher level of availability in certain multi-site cluster configurations.
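The rule of thumb above (odd node count: node majority; even node count: node and file share majority) can be expressed as a small function that also reports how many votes are needed to keep quorum. This sketch assumes one vote per node plus one for the file share witness and ignores the per-node vote adjustment mentioned in the note. For the two SQL nodes in this guide it yields three votes, so the cluster keeps quorum as long as any two of them (both nodes, or one node plus the witness) remain reachable.

```python
def quorum_plan(node_count: int) -> tuple[str, int, int]:
    """Return (quorum model, total votes, votes required to keep quorum)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if node_count % 2 == 1:
        model, votes = "Node Majority", node_count
    else:
        # Even node count: a file share witness adds one vote to break ties.
        model, votes = "Node and File Share Majority", node_count + 1
    return model, votes, votes // 2 + 1


if __name__ == "__main__":
    for n in (2, 3, 4):
        model, votes, needed = quorum_plan(n)
        print(f"{n} nodes: {model}, {votes} votes, quorum needs {needed}")
```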

Add Failover Clustering

Add the Failover Clustering feature to both SQL instances.

Launch Server Manager.
Select Features in the left pane and click Add Features in the Features pane. This starts the Add Features Wizard.
Select Failover Clustering.
Select Install.

Validate a Configuration

Open Failover Cluster Manager.
Select Failover Cluster Manager, select Validate a Configuration.
Click Next, then add your two SQL instances.

Note: To search, select Browse, then click on Advanced and Find Now. This will list available instances.

Click Next.
Select Run Only Tests I Select and click Next.
In the Test Selection screen, deselect Storage and click Next.
At the resulting confirmation screen, click Next.
Review Validation Summary Report then click Finish.

Create Cluster

In Failover Cluster Manager, click on Create a Cluster then click Next.
Enter your two SQL instances.
On the Validation Warning page, select No then click Next.
On the Access Point for Administering the Cluster page, enter a unique name for your WSFC Cluster. Then enter the Failover Clustering IP address for each node involved in the cluster. This is the first of the three secondary IP addresses added previously to each instance.
IMPORTANT! Uncheck the “Add all available storage to the cluster” checkbox. DataKeeper mirrored drives must not be managed natively by the cluster. They will be managed as DataKeeper Volumes.
Click Next on the Confirmation page.
On the Summary page, review any warnings, then select Finish.

Configure Quorum/Witness

Create a folder on your quorum/witness instance (witness).
Share the folder.
1. Right-click the folder and select Share With / Specific People….
2. From the dropdown, select Everyone and click Add.
3. Under Permission Level, select Read/Write.
4. Click Share, then Done. (Make note of the path of this file share to be used below.)
In Failover Cluster Manager, right-click cluster and choose More Actions and Configure Cluster Quorum Settings. Click Next.
On the Select Quorum Configuration, choose Node and File Share Majority and click Next.
On the Configure File Share Witness screen, enter the path to the file share previously created and click Next.
On the Confirmation page, click Next.
On the Summary page, click Finish.

Install and Configure DataKeeper

After the basic cluster is configured but prior to any cluster resources being created, install and license DataKeeper Cluster Edition on all cluster nodes. See the DataKeeper Cluster Edition Installation Guide for detailed instructions.

Run DataKeeper setup to install DataKeeper Cluster Edition on both SQL instances.
Enter your license key and reboot when prompted.
Launch the DataKeeper GUI and connect to server.

*Note: The domain or server account used must be added to the Local System Administrators Group. The account must have administrator privileges on each server that DataKeeper is installed on. Refer to DataKeeper Service Log On ID and Password Selection for additional information.

Right click on Jobs and connect to both SQL servers.
Create a Job for each mirror you will create: one for your DTC resource and one for your SQL resource.
When asked if you would like to auto-register the volume as a cluster volume, select Yes.

*Note: If installing DataKeeper Cluster Edition on Windows “Core” (GUI-less Windows), make sure to read Installing and Using DataKeeper on Windows 2008R2/2012 Server Core Platforms for detailed instructions.

Configure MSDTC

For Windows Server 2012 and 2016, in the Failover Cluster Manager GUI, select Roles, then select Configure Role.
Select Distributed Transaction Coordinator (DTC), and click Next.

*For Windows Server 2008, in the Failover Cluster Manager GUI, select Services and Applications, then select Configure a Service or Application and click Next.

On the Client Access Point screen, enter a name, then enter the MSDTC IP address for each node involved in the cluster. This is the second of the three secondary IP addresses added previously to each instance. Click Next.
Select the MSDTC volume and click Next.
On the Confirmation page, click Next.
Once the Summary page displays, click Finish.

Install SQL on the First SQL Instance

On the domain controller server, create a folder and share it.
1. For example, “TEMPSHARE” with Everyone permission.
Create a subfolder “SQL” and copy the SQL .iso installer into that subfolder.
On the SQL server, create a network drive and attach it to the shared folder on the domain controller.
1. For example: net use S: \\<domain controller name>\TEMPSHARE
On the SQL server, the S: drive will appear. CD to the SQL folder and find the SQL .iso installer. Right-click the .iso file and select Mount. The mounted image appears as a new drive containing setup.exe. Open a Command window on that drive (F: in this example) and run the following command:

F:\>Setup /SkipRules=Cluster_VerifyForErrors /Action=InstallFailoverCluster

On Setup Support Rules, click OK.
On the Product Key dialog, enter your product key and click Next.
On the License Terms dialog, accept the license agreement and click Next.
On the Product Updates dialog, click Next.
On the Setup Support Files dialog, click Install.
On the Setup Support Rules dialog, you will receive a warning. Click Next, ignoring this message, since it is expected in a multi-site or non-shared storage cluster.
Verify Cluster Node Configuration and click Next.
Configure your Cluster Network by adding the “third” secondary IP address for your SQL instance and click Next. Click Yes to proceed with multi-subnet configuration.
Enter passwords for service accounts and click Next.
On the Error Reporting dialog, click Next.
On the Add Node Rules dialog, skipped operation warnings can be ignored. Click Next.
Verify features and click Install.
Click Close to complete the installation process.

Install SQL on the Second SQL Instance

Installing the second SQL instance is similar to the first one.

On the SQL server, create a network drive and attach it to the shared folder on the domain controller as explained above for the first SQL server.
Once the .iso installer is mounted, run SQL setup again from the command line so that the cluster validation rule is skipped. Open a Command window, browse to your SQL install directory and type the following command:

Setup /SkipRules=Cluster_VerifyForErrors /Action=AddNode /INSTANCENAME="MSSQLSERVER"

(Note: This assumes you installed the default instance on the first node)
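The two setup commands differ only in their /Action value and the instance name passed when adding a node. If you script these steps, a sketch like the one below keeps the arguments in one place. It uses only the switches shown in this guide, assumes it is run on Windows from the mounted ISO drive, and both the helper and the script name in the comment are hypothetical. Setup still continues in its interactive wizard afterward.

```python
import os
import subprocess
import sys


def sql_setup_args(action: str, instance_name: str = "MSSQLSERVER") -> list[str]:
    """Build the setup.exe arguments used in this guide for the first or second node."""
    args = ["setup.exe", "/SkipRules=Cluster_VerifyForErrors"]
    if action == "install":      # first SQL instance
        args.append("/Action=InstallFailoverCluster")
    elif action == "addnode":    # second SQL instance joins the existing failover cluster
        args += ["/Action=AddNode", f"/INSTANCENAME={instance_name}"]
    else:
        raise ValueError("action must be 'install' or 'addnode'")
    return args


if __name__ == "__main__" and sys.platform == "win32" and len(sys.argv) > 1:
    # Run from the mounted ISO drive, e.g.: python sql_setup.py addnode
    cmd = sql_setup_args(sys.argv[1])
    cmd[0] = os.path.abspath(cmd[0])  # resolve setup.exe in the current directory
    subprocess.run(cmd, check=True)
```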

On Setup Support Rules, click OK.
On the Product Key dialog, enter your product key and click Next.
On the License Terms dialog, accept the license agreement and click Next.
On the Product Updates dialog, click Next.
On the Setup Support Files dialog, click Install.
On the Setup Support Rules dialog, you will receive a warning. Click Next, ignoring this message, since it is expected in a multi-site or non-shared storage cluster.
Verify Cluster Node Configuration and click Next.
Configure your Cluster Network by adding the “third” secondary IP address for your SQL Instance and click Next. Click Yes to proceed with multi-subnet configuration.
Enter passwords for service accounts and click Next.
On the Error Reporting dialog, click Next.
On the Add Node Rules dialog, skipped operation warnings can be ignored. Click Next.
Verify features and click Install.
Click Close to complete the installation process.

Common Cluster Configuration

This section describes a common 2-node replicated cluster configuration.

The initial configuration must be done from the DataKeeper UI running on one of the cluster nodes. If it is not possible to run the DataKeeper UI on a cluster node, such as when running DataKeeper on a Windows Core only server, install the DataKeeper UI on any computer running Windows XP or higher and follow the instructions in the Core Only section for creating a mirror and registering the cluster resources via the command line.
Once the DataKeeper UI is running, connect to each of the nodes in the cluster.
Create a Job using the DataKeeper UI. This process creates a mirror and adds the DataKeeper Volume resource to the Available Storage.

*IMPORTANT: Make sure that Virtual Network Names for NIC connections are identical on all cluster nodes.

If additional mirrors are required, you can Add a Mirror to a Job.
With the DataKeeper Volume(s) now in Available Storage, you can create cluster resources (SQL, File Server, etc.) in the same way as if there were a shared disk resource in the cluster. Refer to the Microsoft documentation, in addition to the above, for step-by-step cluster configuration instructions.

Connectivity to the cluster (virtual) IPs

In addition to the Primary IP and secondary IP, you will also need to configure the virtual IP addresses in the Huawei Cloud so that they can be routed to the active node.

From the Service List dropdown, select Elastic Cloud Server.
Click the SQL instance to which you want to add the cluster virtual IP addresses (one for MSDTC, one for the SQL Failover Cluster).
Select NICs > Manage Virtual IP Address.
Click Assign Virtual IP Address, select Manual, and enter an IP address within the subnet range for the instance (ex. for 15.0.1.25, enter 15.0.1.26). Click OK.
In the IP address row, click the More dropdown, select Bind to Server, and then select the server to bind the IP address to, along with its NIC.
Repeat the previous two steps so that both the MSDTC and the SQLFC virtual IPs are assigned and bound.
Click OK to save your work.

Management

Once a DataKeeper volume is registered with Windows Server Failover Clustering, all of the management of that volume will be done through the Windows Server Failover Clustering interface. All of the management functions normally available in DataKeeper will be disabled on any volume that is under cluster control. Instead, the DataKeeper Volume cluster resource will control the mirror direction, so when a DataKeeper Volume comes online on a node, that node becomes the source of the mirror. The properties of the DataKeeper Volume cluster resource also display basic mirroring information such as the source, target, type and state of the mirror.
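The mirror-direction rule can be pictured with a toy model: whichever node the cluster brings the DataKeeper Volume resource online on becomes the source, and every other node becomes a target. The class below is purely illustrative (a hypothetical name, not a SIOS API) and models only source and target, not the mirror type or state that the real resource properties also display.

```python
from typing import Optional


class DataKeeperVolumeModel:
    """Toy model, not a SIOS API: the node where the cluster brings the volume
    online becomes the mirror source; every other node is a target."""

    def __init__(self, volume: str, nodes: list[str]) -> None:
        self.volume = volume
        self.nodes = list(nodes)
        self.source: Optional[str] = None  # no node owns the volume yet

    def bring_online(self, node: str) -> None:
        """Simulate the cluster bringing the resource online on `node` (e.g. after failover)."""
        if node not in self.nodes:
            raise ValueError(f"{node} is not a cluster node")
        self.source = node  # mirror direction follows the online node

    @property
    def targets(self) -> list[str]:
        if self.source is None:
            return []  # no mirror direction until the resource is online somewhere
        return [n for n in self.nodes if n != self.source]


if __name__ == "__main__":
    vol = DataKeeperVolumeModel("E:", ["SQL1", "SQL2"])  # hypothetical names
    vol.bring_online("SQL1")
    print("source:", vol.source, "targets:", vol.targets)
    vol.bring_online("SQL2")  # failover or switchover
    print("source:", vol.source, "targets:", vol.targets)
```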

Troubleshooting

Use the following resources to help troubleshoot issues:

Troubleshooting issues section
For customers with a support contract – http://us.sios.com/support/overview/
For evaluation customers only – Pre-sales support

Additional Resources:

Step-by-Step: Configuring a 2-Node Multi-Site Cluster on Windows Server 2008 R2 – Part 1 — http://clusteringformeremortals.com/2009/09/15/step-by-step-configuring-a-2-node-multi-site-cluster-on-windows-server-2008-r2-%E2%80%93-part-1/

Step-by-Step: Configuring a 2-Node Multi-Site Cluster on Windows Server 2008 R2 – Part 3 — http://clusteringformeremortals.com/2009/10/07/step-by-step-configuring-a-2-node-multi-site-cluster-on-windows-server-2008-r2-%E2%80%93-part-3/

Beginning Well is Great, But Maintaining Uptime Takes Vigilance

September 28, 2021 by Jason Aw

Author Isabella Poretsis states, “Starting something can be easy, it is finishing it that is the highest hurdle.” It is great to have a kickoff meeting. It is invigorating and exciting. Managers and leaders look out at the greenfield with excitement, and optimism is high. But this moment of kickoff, and even the champagne-popping moment of a successful deployment, are just the beginning. Maintaining uptime requires ongoing vigilance.

High availability and the elusive four nines of uptime for your critical applications and databases aren’t momentary occurrences, but rather, a constant endeavor to end the little foxes that destroy the vineyard. Staying abreast of threats, up-to-date on the updates, and properly trained and prepared is the work from which your team “is never entitled to take a vacation.”

For those who want to stay vigilant in maintaining uptime, here are five tips:

1. Monitor the Environment

Very little in enterprise software still follows the “set it and forget it” mindset. Everything, since the day you uncorked the grand opening champagne to now, has been moving toward a state of decline. If you aren’t monitoring the servers, workloads, network traffic, and hardware (virtual or physical), you may lose uptime and stability.

2. Perform Maintenance

One thing I have always noticed in more than twenty years of software development and services is that all software comes with updates. Apply them. Remember to follow sound maintenance policies, including taking and verifying backups. One tech writer suggested the only update you regret is the one you failed to make.

3. Learn Continuously

My first introduction to high availability came when I unplugged one end of the Token Ring for a server in our lab as an intern, fresh from the CE-211 lab. The administrator was in my face in minutes. After an earful, he gave me an education. Ideally, you and your team want to learn without taking down your network, but you do absolutely want to keep learning. Look into paid courses on existing technology, new releases, emerging infrastructure. Check your vendors for courses and items related to your process, environment, software deployments and company enterprise. Free courses for many things also exist if money is an issue.

4. Multiply the learning

In addition to continuous learning, make a plan to multiply the learning. At SIOS, where I am VP of Customer Experience, we have seen the tremendous difference between teams who share their learning and those who don’t. Teams that share their learning avoid the knowledge gaps that lead to downtime. The best way to know that you learned something is to teach it to somebody else. As you learn, share the learning with team members to reduce the risk of downtime due to error (and, for that matter, vacations).

5. End well . . .before the next beginning

All projects, servers, and software have an ending. End well. Decommission correctly. Begin the next phase, deployment, or software relationship well by closing up loose ends and documenting what went well, what did not, and what to do next. Treat your existing vendors well. You just may need them again later. Understand the existing systems and high availability solutions before proceeding with a new deployment. This proper ending helps you begin again from a better starting place, headed toward a stronger outcome.

Keeping the system highly available is a continuous process. “Set it and forget it” is a nice catchphrase, but the reality is that uptime takes vigilance, continual monitoring, proper maintenance, and constant learning.

-Cassius Rhue, VP, Customer Experience

Reproduced with permission from SIOS

High Availability Architecture and Best Practices

September 16, 2021 by Jason Aw

13 Little-Known Facts about High Availability

1. Hypervisor HA is not the Same as Application HA

A key misconception is, “I have high availability because I have redundancy in my hardware or hypervisor.” However, hardware and hypervisor redundancy does not guarantee high availability for applications. Nor does it guarantee that applications will be properly orchestrated during a failure.

2. In High Availability, Bigger Does Not Equal Better

If you are a powerlifter, bigger weights are better and fewer reps are better. The same goes for hugs. (Remember hugs? They are what we used to do when we saw a friend from out of town whom we hadn’t seen in a while.) But bigger doesn’t always mean better. A bigger kidney stone, for example, is definitely not better. In high availability, creating a bigger, more complex solution doesn’t necessarily increase your availability. You might end up with the same availability or less. You may also end up with a bigger, more complex system with a lot of moving pieces to sort through during an outage.
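The reasoning behind “bigger does not equal better” can be quantified with standard reliability arithmetic. When every component in a chain must be up, their availabilities multiply, so each added piece lowers the total; redundancy helps only where any one of several copies is enough. This sketch assumes independent failures and instantaneous, always-successful failover, which real clusters never quite achieve.

```python
from math import prod


def series_availability(components: list[float]) -> float:
    """Every component must be up, so availabilities multiply."""
    return prod(components)


def parallel_availability(a: float, copies: int) -> float:
    """Service is up if at least one of `copies` independent replicas is up."""
    return 1 - (1 - a) ** copies


if __name__ == "__main__":
    print(f"5 parts at 99.9% in series: {series_availability([0.999] * 5):.4%}")
    print(f"2 nodes at 99% in parallel: {parallel_availability(0.99, 2):.4%}")
```

Chaining five 99.9% components drops the whole to about 99.5%, while two independent 99% nodes behind a working failover reach 99.99%, which is why added complexity must be justified by genuine redundancy.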

3. Everything fails… sometimes

Application programming languages date back to the 1950s. And while the languages, processors, IDEs, and quality of code have improved, the reality is that "all applications fail at some point." Failures happen because of exceptions, bugs, unhandled terminations, accidental terminations, resource exhaustion, and more. An active/active or active/passive application availability strategy is still necessary.
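Redundancy works the other way. Assuming independent node failures and a failover that actually succeeds (which is what an application-aware active/passive or active/active setup is there to ensure), a service with n redundant nodes is down only when all n are down at once. The 99% per-node figure is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def redundant_availability(node_availability: float, nodes: int) -> float:
    # Down only if every node is down simultaneously
    # (assumes independent failures and successful failover).
    return 1 - (1 - node_availability) ** nodes


def downtime_minutes_per_year(availability: float) -> float:
    return (1 - availability) * MINUTES_PER_YEAR


for nodes in (1, 2):
    a = redundant_availability(0.99, nodes)
    print(f"{nodes} node(s): {a:.4%} available, "
          f"~{downtime_minutes_per_year(a):,.0f} min/year down")
```

With these illustrative numbers, a second node cuts expected downtime from roughly 5,256 minutes a year to roughly 53, but only if failover is detected and executed reliably.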

4. Focus on ‘why’ as much as ‘how’

Our natural tendency to jump into task-completion mode is a necessary asset, but it needs to be tempered and guided by answers to the question "why?" Adding a solution to an environment without understanding the business, application, database, and stakeholder requirements will lead to one or more of the following:

Failure
Overspending
Underperformance
Confusion and over-architecture
All of the above

Instead of focusing solely on getting availability implemented, spend the necessary resources and effort to understand the business needs and the answers to "why."

5. Unpatched issues are a common source of regret

Whether you patch or not, there will be consequences, and the consequence of leaving known issues unpatched is regret. As VP of Customer Experience, I have seen firsthand the downtime caused by customers failing to address known issues in a timely manner.

6. Undocumented issues cause downtime too

Picture the scene. A new admin is looking into servers on the network. The usage reports indicate that one server is not active and has no connected clients. Not recognizing the server and finding no "tags," documentation, or other identifiers, the new admin decides it should be shut down. Unfortunately, the undocumented and uncommunicated instance is actually a standby server, and removing it means downtime the next time the primary crashes unexpectedly. This isn't fiction; it is the true story of a new admin who mistook a standby server for an idle QA system and shut it down before a patching exercise.
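One guard against this scenario is to make shutdown decisions depend on documented roles rather than on traffic alone. The sketch below assumes a hypothetical inventory in which each server carries a `role` tag; the `Server` class, the tag names, and the rules are illustrative, not any cloud provider's API.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Server:
    name: str
    active_clients: int
    tags: Dict[str, str] = field(default_factory=dict)  # hypothetical tags


def safe_to_shut_down(server: Server) -> Tuple[bool, str]:
    role = server.tags.get("role")
    if role is None:
        # Undocumented: refuse until someone identifies the owner and purpose.
        return False, "no role tag: identify the owner and document it first"
    if role in ("primary", "standby"):
        # A standby is supposed to look idle; low traffic proves nothing.
        return False, f"cluster member ({role}): idle traffic is expected"
    if server.active_clients > 0:
        return False, "clients still connected"
    return True, "ok"


standby = Server("db-02", 0, {"role": "standby", "cluster": "erp-db"})
untagged = Server("srv-17", 0)
print(safe_to_shut_down(standby))
print(safe_to_shut_down(untagged))
```

Both idle-looking servers are blocked: the standby because of its documented role, and the untagged one because nobody has documented it yet.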

7. Complacency is also an enemy

We'd all love it if availability, whether on premises, in the cloud, or anywhere in between, were something we could "set and forget." But few things in life, if any, are really as simple as "set it and forget it." One of the biggest enemies of your future availability is your success with high availability now. When disasters are few and far between and teams feel confident that they have achieved sustained stability, complacency can set in. Success tempts us to think nothing is going to change, which makes complacency an enemy of high availability. Things around and within your enterprise are changing: the cloud is changing, your business needs are changing, and your applications and operating systems are changing too.

8. Change is hard

Change is hard. Just ask anyone with a sweet tooth who's been trying to give up that second slice of cake before bedtime. Similar resistance occurs in high availability. Teams, even those that have experienced disasters, are often reluctant to change, even when the change is good. They need a vision, an understanding of why, and support. Other teams, those with solutions already in place, are reluctant to improve high availability for fear of introducing instability or exposing themselves to new risk.

9. All change is not good change

Change is good when the change is good. When considering a change to your high availability solution and architecture, it is critical to analyze the change against the goals, the requirements, and the aim of increasing availability. Changes that increase stability, add protection for critical components, eliminate workarounds, optimize the availability of services, and are thoroughly tested are good changes.

10. Cheaper is not always better

Cheaper is not always better. While cheaper solutions have a lower price tag, they may also come with limitations that make them less than ideal. When you see a lower price tag, watch for trade-offs such as a lack of application awareness, limited orchestration, hidden complexity, manual recovery and failover, and limited or no user validation. Cheaper solutions may also omit customer support. Be sure to understand whether your cheaper solution includes support, or whether support is an additional, and substantial, add-on cost.

The same applies to cheaper deployments with reduced compute, disk, or storage. While the price tag and monthly cost might be lower, your solution may also be running at less than ideal capacity.

11. Loud does not equal effective

Ever heard the story of the boy who cried wolf? An application monitoring solution that produces an alert storm sooner or later becomes a solution that gets ignored. Having a solution that provides alerts is great, but if it triggers critical alerts in error or in excess, it is ineffective.
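A common way to keep alerts meaningful is to suppress repeats of the same alert within a time window. The following is a minimal sketch of that idea; the `AlertDeduplicator` class, the five-minute default window, and the alert key format are hypothetical choices for illustration, and production monitoring tools typically provide their own grouping and suppression features.

```python
import time
from typing import Dict, Optional


class AlertDeduplicator:
    """Forward an alert at most once per key within a suppression window."""

    def __init__(self, window_seconds: float = 300.0):
        self.window = window_seconds
        self._last_sent: Dict[str, float] = {}

    def should_send(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            return False  # same alert fired recently: suppress the repeat
        self._last_sent[key] = now
        return True


dedup = AlertDeduplicator(window_seconds=300)
# The same disk alert fires at 0s, 30s, 60s, and 400s.
print([dedup.should_send("db01:disk_full", now=t) for t in (0, 30, 60, 400)])
# [True, False, False, True]
```

Four raw alerts become two notifications, so the one that reaches a person is more likely to be read and acted on.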

12. High Availability is a culture and a mindset, not just a product or hardware solution

Software, hardware, processes, solutions, and services are all part of high availability. However, without buy-in across IT functions and business units, high availability will be fraught with frustration and will constantly be the subject of budget discussions instead of discussions about value, business stability, increased customer satisfaction, and reduced risk.

13. Now is not too late

Hope is not a high availability strategy, and hoping that you will never have a critical disaster or application failure doesn't need to be your strategy. You can design and build a highly available enterprise architecture now, even if it has been weeks or months since the last disaster.

Contact SIOS to learn more about high availability solutions for your application.

– Cassius Rhue, VP, Customer Experience

Reproduced from SIOS