High Availability Architecture and Best Practices

Date: September 16, 2021

Tags: High Availability

High Availability Architecture and Best Practices

13 Little Known Facts about High Availability

1. Hypervisor HA is not the Same as Application HA

A key misconception is that I have high availability because I have redundancy in my hardware or hypervisor. However, hardware and hypervisor redundancy does not guarantee high availability for applications. It is also not a guarantee that orchestration of applications will be properly executed in a failure.

2. In High Availability, Bigger Does Not Equal Better

If you are a powerlifter, bigger weights are better and smaller reps are better. Or, if we are talking about hugs. (You remember hugs are the things that we used to do when we saw a friend from a different town, that we hadn’t seen in a while.) But, bigger doesn’t always mean better. A bigger kidney stone, for example, is definitely not better. In higher availability, creating a bigger, more complex solution doesn’t always mean that you’ll have increased your high availability. It might mean you have the same availability or less. It may also mean that you have a bigger, more complex system with a lot of moving pieces to sort through in an outage.

3. Everything fails… sometimes

Application programming languages date back to the 1950’s. And while the languages, processors, IDEs, and quality of the code has improved, the reality is “all applications fail at some point.” Failures due to exceptions, bugs, unhandled terminations, accidental terminations, resource exhaustion, and more happen. Having an active/active, or active/passive application availability strategy is still necessary.

4. Focus on ‘why’ as much as ‘how’

Our natural tendency to jump into task completion mode is a necessary asset, but it needs to be tempered and guided by the answer to our questions of why. Adding a solution to an environment without understanding the business, application, database, and stakeholder requirements will lead to either a:

Failure
Over expenditures
Underperformance
Confusion and over architecture
All of the above

Instead of focusing solely on getting availability implemented, spend the necessary resources and effort to understand the business needs and answers to “why”

5. Unpatched issues are a common source of regret

If you do or you don’t you will have consequences. The consequence of all unpatched issues is regret. As VP of Customer Experience I have seen firsthand the downtime caused by customers failing to address known issues in a timely manner.

6. Undocumented issues cause downtime too

Picture the scene. A new admin is looking into servers on the network. The usage reports indicate the server is not active and no clients are connected. Not recognizing the server and finding no “tags”, documentation or other identifiers, the new admin believes that it should be shut down. Unfortunately the undocumented and uncommunicated instance is actually a standby server whose removal will cause downtime when the primary crashes unexpectedly. This isn’t a fictional story, this is the true story of a new admin who incorrectly identified a server as an idle QA system and shut it down prior to a patching exercise.

7. Complacency is also an enemy

We’d all love it if availability on premises or in the cloud, or anywhere in between was something that we can “set and forget.” But, few, if anything in life is really as simple as “set it and forget it.” One of the biggest enemies of your availability in the future, is your success with high availability now. When disasters are few and far between, and teams feel confident that they have realized sustained stability, complacency can step in. Success tempts us to think nothing is going to change, and complacency in respect to high availability therefore is an enemy to high availability. Things around your enterprise and within your enterprise are changing. The cloud is changing, your business needs are changing, and the applications and Operating Systems are also changing.

8. Change is hard

Change is hard. Just ask anyone with a sweet tooth who’s been trying to give up that second slice of cake before bedtime. Similar resistance occurs even in high availability. Teams, even those who experience disasters, are often reluctant to change even if the change is good. They need a vision, an understanding of why, and support. Other teams, those with solutions in place, are reluctant to improve high availability with fear of introducing instability or exposing themselves to new risk.

9. All change is not good change

Change is good, when change is good. When considering a change to the higher availability solution and architecture it is critical that changes are analyzed against the goals, the requirements, and within the scope of increasing availability. Changes that increase stability, add protection for critical components, eliminate workarounds, optimize the availability of services and are thoroughly tested are good changes.

10. Cheaper is not always better

Cheaper is not always better. While cheaper solutions typically have a lower price tag, they may also come with a number of limitations that make them less than ideal. When there is a lower price tag, beware of missing features such as a lack of application awareness, limited orchestration, hidden complexity, manual recovery and failover, and limited to no user validation. Cheaper solutions may also fail to include customer support. Be sure to understand whether your cheaper solution includes support, or if the support is an additional, and substantial addon cost.

The same applies to cheaper deployments with reduced compute, disk or storage. While the price tag and monthly cost might be lower, your solution may also be functioning at a less than ideal capacity.

11. Loud does not equal effective

Ever heard the story of the boy that cried wolf. An application monitoring solution that produces an alert storm is sooner than later a solution that gets ignored. Having a solution that provides alerts is great, but if that solution triggers critical alerts in error or in excess, it is ineffective.

12. High Availability is a culture and a mindset, not just a product or hardware solution

Software, hardware, processes, solutions and services are all a part of high availability. However, without a buy-in across IT functions and business units, it will be fraught with frustration and constantly the source of budget discussions instead of discussions on value, business stability, increased customer satisfaction, and diminished risk.

13. Now is not too late

Hope is not a strategy for high availability, nor does hoping that you will not have a critical disaster or application failure need to be a strategy. Designing and architecting a highly available enterprise architecture can be made possible now, even if it has been weeks or months since the last disaster.

Contact SIOS to learn more about high availability solutions for your application.

– Cassius Rhue, VP, Customer Experience

Reproduced from SIOS