High Availability Archives

Lessons in Cloud High Availability from the Movies

August 11, 2020 by Jason Aw

Joseph Lalonde of jmlalonde.com has a blog highlighting the leadership lessons from popular movies such as Hancock, The Greatest Showman, and Frozen II. With credit to Joseph’s inspiring leadership lessons, here are five high availability lessons on cloud migration from Disney’s Frozen II.

Disney’s Frozen II and Cloud Migration Lessons

In Disney’s animated adventure Frozen II, the characters Anna, Elsa, Kristoff, Olaf and Sven leave Arendelle to travel to an ancient, autumn-bound forest of an enchanted land. In the adventure, they set out to find the origin of Elsa’s powers in order to save their kingdom. Aside from being fun for a father of six girls, the movie is also rich in leadership, life, and high availability lessons.

1. You can’t go into cloud migration without the proper help.

When Elsa, Anna, Kristoff, Olaf and Sven arrive at the place where the mysterious voice has been calling them, they stand outside an enormous cloud. When Olaf and Kristoff attempt to enter, they are bounced back and repelled. It is only when Elsa approaches the cloud with her magic that the portal opens and they enter.

When migrating to the cloud, there will be a lot of voices calling you. Some will beckon you to AWS, or Azure or GCP, or a myriad of others. But, no matter which you intend to go with, know that you will need the proper help for a successful entry. This help should include:

Architects knowledgeable of the platform, especially the networking aspects
Application administrators who understand the application’s installation, configuration, operation and maintenance characteristics
Security and OS administrators
A cloud partner representative
A knowledgeable availability expert for deploying critical applications in a highly available architecture

2. Things will not make sense when you are older, so document and iterate.

While walking through the enchanted forest, Olaf begins experiencing the strange phenomena and magic of the forest. As he does, he begins to sing:

This will all make sense when I am older
Someday I will see that this makes sense
One day, when I’m old and wise
I’ll think back and realize
That these were all completely normal events
I’ll have all the answers when I’m older

The song concludes with the rousing finish of:

‘Cause when you’re older
Absolutely everything makes sense
This is fine

Stop. If things don’t make sense to you when you are starting out in your cloud journey, take the time to make them make sense. If you are using an agile mindset, launch an investigation spike on the singular topic or topics of confusion. For example, if you don’t understand the magic of the VPC, Security Group, Availability Zone or Set, Region, or Region to Region concepts today, they will make even less sense when you return to this configuration months later. If the test results don’t make sense, don’t move on; run them again. Also, remember to document the architecture: not just the details you think are important, but the details that the older you, who has moved through six other projects and is facing a deadline, would want to know to make it all make sense.

3. Don’t go running into the fire. Choose the right cloud high availability solution.

After a brush with the wind spirit, the enchanted forest is set ablaze by the fire spirit. As the fire spirit spreads chaos and fire, Elsa charges off with icy blasts to cool the fire and calm this spirit. In her zeal, Anna runs into the fire behind her sister and has to be saved. When at last the two are reunited, Elsa admonishes her sister: “You can’t just follow me into fire.” The feisty Anna replies, “You don’t want me following you into fire? Then don’t run into fire!”

Migrating to the cloud and choosing the right availability solution for you can be stressful enough without complicating it by making unrealistic schedules with untested theories and scenarios. No leader of a deployment team wants his or her team in a fire drill, so don’t knowingly run into one. Create a plan. Establish checkpoints and milestones. Include realistic risk and risk management strategies. Communicate frequently with vendors and partners, and especially with your team. Test well, and understand backup and backout plans.

4. Don’t get stuck in the past re: what you bring in your cloud migration.

There is a song woven throughout the movie, “All is Found” with the chilling chorus – “Dive down deep into her sound / But not too far or you’ll be drowned.” Elsa dives down into the deep as the movie peaks and her search and exploration of the past leave her Frozen, her last gasp a burst of flurries to warn and inform her sister.

As one of the heads of our Customer Experience Team at SIOS Technology Corp., I have witnessed too many deployments and migrations get stuck in a comparison trap. The phrase goes like this, “In our old data center, we” or “The old system could do that.” It may be true that your old system of fixed systems, dedicated resources, large teams, specific networking, and high-cost, high-feature SAN storage could do that. (Although, truthfully, sometimes I’ve seen the curtain peeled back and on-premises didn’t really do that either.) As you migrate to the cloud, understand what makes sense to mimic in the cloud and what doesn’t. Understand why the system was architected that way on-premises and, using the help from lesson 1 and the learning from lesson 2, make decisions that make sense.

Which leads me to the final lesson.

5. There will always be two kingdoms: on-premises and in the cloud.

At the end of the movie, Anna becomes queen of Arendelle once ruled by her sister Elsa, while Elsa stays and leads the people of the Northuldra. There is one who stays in or near the forest once surrounded by mystery and cloud, while the other returns to the familiar land.

As you consider migration to the cloud and your High Availability strategy, remember there will always be two kingdoms. Your migration strategy would do well to account for an ongoing need for an on-premises data center to partner with your cloud deployments. Perhaps it isn’t as sprawling as it once was, but not every workload or critical infrastructure can be repurposed and packaged for the cloud. Having an HA solution and strategy that can equip and enable both “kingdoms” is essential.

Going to the cloud takes the right team, the right tools and the right solutions, and a strategy and plan to get there, without going through fire. As you migrate to the cloud, be willing to confront the past, understand it, and remember not to get stuck there. And, like the two sisters of Disney’s Frozen II, you’ll do well to remember that every beautiful enterprise story has two sides; on-premises and cloud just might be yours.

— Cassius Rhue, VP, Customer Experience

Reproduced with permission from SIOS

Planning is Key to Enterprise Availability (and to a Happy Marriage)

July 30, 2020 by Jason Aw

Planning dates, getaways, and fabulously romantic dinners is a great part of loving your spouse well. Seminars and workshops overflowing with tips for improving your relationship abound in nearly every area of the world.

But listen in on the training session provided by SIOS Technology Corp. Project Manager for Professional Services, Edmond Melkomian, and you’ll quickly learn that planning dinners and anniversary retreats isn’t the only way to love your spouse well.

In a recent class on SIOS Protection Suite for Linux, Edmond shared three tips that help you love your spouse well in an enterprise world: plan, plan, plan.

1. “Plan to plan” your enterprise availability solution

In his course, Edmond Melkomian asked students to name the first thing you should do when deploying an enterprise solution. His answer: “Plan, plan, plan.” It seems obvious, but the first step is to start making the plan. A fairly decent start for a plan includes developing the details for each of the project phases, such as milestones, checkpoints, risks, risk mitigation strategies, stakeholders, timelines, and stakeholder communication plans. A decent plan will also include details about kickoff, sign-off and closure, and resources (staffing, management, legal/contracts).

Plan to create, review, modify, and update your plan throughout the solution lifecycle.

2. Plan what to deploy for enterprise availability

Plan what to deploy. It is likely that a large portion of your enterprise infrastructure exists beyond the realm of the current team’s lifespan with your company. As you migrate to the cloud, or update your availability strategy, it is worth the time and effort to make a plan regarding what to deploy. Focus your plan on ensuring that you deploy redundancy at all critical components: network, compute, storage, power, cooling, and applications. Data centers and cloud providers typically ensure cooling, power, and network redundancy to start.

A number of firms offer architectural teams, cloud solution providers, availability experts, application architects, and migration specialists who help teams discover critical and sometimes hidden dependencies as well as high-risk areas vulnerable to Single Points of Failure (SPOFs). This investigative work will feed into your plan of what to deploy and/or update in your availability strategy.
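As a rough illustration of the redundancy review described above, here is a minimal Python sketch that flags component categories with fewer than two instances. The inventory, the instance names, and the helper `find_single_points_of_failure` are hypothetical and exist only for this example; a real review would also need to trace the hidden dependencies between components.

```python
from typing import Dict, List


def find_single_points_of_failure(inventory: Dict[str, List[str]]) -> List[str]:
    """Return the component categories that have fewer than two instances."""
    return sorted(name for name, instances in inventory.items() if len(instances) < 2)


# Hypothetical inventory; the category names follow the list in the text.
inventory = {
    "network": ["nic-a", "nic-b"],
    "compute": ["node-1", "node-2"],
    "storage": ["volume-1"],
    "applications": ["app-1", "app-2"],
}

print(find_single_points_of_failure(inventory))  # ['storage']
```

Counting instances is only a first pass, but it gives the team a concrete list to bring to the architects and availability experts.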

Plan on reviewing what you need to deploy.

3. Plan to keep a QA/pre-production cluster for reliable availability

I’ll never forget a Friday night call from my time on the SIOS Technology Corp. development team, with a long-time but frantic customer. Earlier in the month, the customer had unsuccessfully deployed a new software solution into a production environment. The result was a massive failure. He called our 800 number at 4:30pm (EST) on Friday. Why do I recall that exact time? Friday was date night. My wife and I had dinner plans, a babysitter for the six girls on standby (by the hour), and hopes for a romantic and relaxing evening. I was just about to head out for the day when the phone rang. After a tense first hour, we were back up and running. This unfortunate episode could have been avoided or mitigated by keeping a UAT or QA system on hand.

As Harrison Howell, Software Engineer for Customer Experience at SIOS Technology Corp., noted in his blog post on six common cloud migration challenges, the limits of on-prem are no longer the same limits.

Customers coming from an on-prem system need to remember that resources are no longer a limiting factor. In the cloud, systems can be effortlessly copied and run in isolation of production, something not trivial on-premises. On-demand access to IT resources allows UAT of HA and DR to expand beyond “shut down the primary node”. Networks can be sabotaged, kernels can be panicked, even databases can be corrupted and none of this will impact production! Identifying and testing these scenarios improves HA and DR posture.

Plan on deploying and keeping a UAT system for HA and DR testing. As Harrison mentions, “identifying and testing [issues]” “improves [your overall] HA and DR posture,” and that improves your chances of a successful date night.
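One way to picture the UAT testing Harrison describes is a small scenario runner that refuses to touch production. The sketch below is only an illustration: the `Cluster` type, the `environment` labels, and the `inject` callback (which is assumed to inject a failure and report whether the cluster recovered) are all hypothetical names, and the scenario list simply paraphrases the quote above.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Cluster:
    name: str
    environment: str  # assumed labels: "uat" or "production"


# Scenarios paraphrased from the quoted passage above.
SCENARIOS = [
    "shut down the primary node",
    "sabotage the network",
    "panic the kernel",
    "corrupt the database",
]


def run_failure_tests(
    cluster: Cluster, inject: Callable[[Cluster, str], bool]
) -> Dict[str, bool]:
    """Run every scenario against a non-production cluster and record recovery."""
    if cluster.environment == "production":
        raise ValueError("failure injection is only allowed on the UAT cluster")
    return {scenario: inject(cluster, scenario) for scenario in SCENARIOS}


# Stub injector for demonstration: pretend the cluster recovers every time.
results = run_failure_tests(Cluster("qa-cluster", "uat"), lambda c, s: True)
for scenario, recovered in results.items():
    print(f"{scenario}: {'recovered' if recovered else 'FAILED'}")
```

The guard at the top of `run_failure_tests` is the part worth keeping: destructive tests belong on the copy, never on the system that pays the bills.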

4. Plan regular maintenance and updates (including documentation)

Lastly, plan times for regular maintenance and updates to maintain Enterprise Availability. Your enterprise needs to remain highly available to remain highly profitable and successful. Environments don’t remain stagnant, and patches, security updates, expansion, and general maintenance are a regular occurrence from inception to retirement. Creating a plan for how and when you will incorporate updates and maintenance into your enterprise will ensure that you are not only kept up to date, but that you minimize risks and downtime while doing it. Be sure to include in your plan the use of a test system. Develop a planned routine and process for validating patches, kernel and OS updates, and security software, and don’t forget to update the project documentation and future plans as you go and grow.

If you can remember to plan for a highly redundant, highly reliable and highly available system upfront, plan to keep a QA/pre-production cluster after go-live, and plan for regular maintenance and updates, you will also be able to keep your plans with your spouse for date night. And not just date night: you’ll also be able to keep your nights free from 3 a.m. wake-up calls due to down production systems. This is my tip for loving your spouse well.

I love my wife and so I help customers deploy SIOS Technology Corp.’s DataKeeper Cluster Edition and SIOS Protection Suite for Windows and Linux products as a part of a highly available enterprise protection solution. Contact SIOS.

— Cassius Rhue, VP, Customer Experience

Article reproduced with permission from SIOS

High Availability Software is Insurance Against SAP Downtime

July 18, 2020 by Jason Aw

We all need to buy insurance – for our cars, our houses, our lives. Nobody likes to pay money for a service that we hope we never have to use. But we all know that we should have it just in case. Most people either put off insurance until something awful happens, buy the cheapest, or actually do their homework and buy it from someone they trust. This last group usually fares the best.

High Availability Software is Insurance Against Downtime

Insurance is often for consumers, but it’s critical to businesses too. You have computer systems and applications that run your business. If they fail for some reason, you want your business to continue to run, or it could cost millions of dollars in lost business through lost transactions and customer data, and irreparable damage to your reputation with your customers. High availability software is your “insurance” against system downtime. This is not something you can ignore. Nor is it something you can trust to come along with your hardware or software infrastructure. You want to use high availability solutions from a company that has decades of expertise in high availability and knows how to keep your systems up and running.

A trusted high availability software company should:

Provide a single solution that is platform agnostic – usable on-prem, in the cloud, and on all of your hardware and software platforms
Have a product that is easy to configure and set up without having considerable application expertise
Know your applications and when your applications are having a problem
Take the proper action to attempt to restart or fail over applications
Fail over the application to a secondary server, maintaining application best practices, and bringing the application back up in the proper order
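The last two items in the list above (try a restart, then fail over and bring things up in the proper order) can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not how any particular product works: the resource names, the `restart_locally` callback, and the retry limit are hypothetical, and the ordering relies on the standard library’s `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical dependency map: each resource lists what must be running first.
DEPENDENCIES: Dict[str, Set[str]] = {
    "filesystem": {"volume"},
    "database": {"filesystem"},
    "application": {"database"},
}


def start_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Order resources so each one starts after everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


def recover(restart_locally: Callable[[], bool], max_restarts: int) -> dict:
    """Try local restarts first; fail over only when they are exhausted."""
    for attempt in range(1, max_restarts + 1):
        if restart_locally():
            return {"action": "restart", "attempts": attempt, "start_order": []}
    return {
        "action": "failover",
        "attempts": max_restarts,
        "start_order": start_order(DEPENDENCIES),
    }


# Simulate an application that will not restart on the primary server.
print(recover(lambda: False, max_restarts=2))
```

With the stubbed failure above, the sketch reports a failover and starts the volume, filesystem, database, and application in that order on the secondary server.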

One of the key applications used in enterprises today is SAP S/4HANA, based on the HANA in-memory database. Most SAP customers will be required to run the HANA database with SAP by 2025. You want to find an intelligent HANA availability solution from a company that knows high availability, that knows SAP, knows HANA, and knows what to do to ensure that your critical SAP applications, and your business, continue to run smoothly.

SIOS Technology is the company you can trust for reliable high availability software. The 9.5 release of the LifeKeeper for Linux product contains a new HANA Application Recovery Kit. This will provide you with all you need to keep your SAP and HANA environment running. Want more information about this release? Watch this interview.

Reproduced with permission from SIOS

Test/QA Systems are a Critical Part of Enterprise Availability

July 8, 2020 by Jason Aw

“I could kiss you,” that’s what a friend blurted out to me nearly three decades ago as she ran towards me. She had dropped her reeds for her saxophone on the way to one of the biggest band competitions in our region. I didn’t know whose they were, but when I saw the pack of reeds on the seat on the bus I picked them up and took them with me to the warm-up area. Three minutes into her warm-up, her 1st reed cracked and she panicked as she reached into empty pockets for replacements. When I piped up that I had found them, she blurted out, “I could kiss you right now.”

As the VP of Customer Experience at SIOS Technology Corp. I have the unique and distinct pleasure of working with a number of enterprise customers and partners at different phases of the availability spectrum. Sometimes I have the opportunity of working with end customers for issue resolution, mitigation, and improvements. At other times our teams are actively working with partners and customers to architect and implement enterprise availability to protect their systems from downtime. A recent customer experience reminded me of something that happened nearly 30 years ago when my friend blurted out, “I could kiss you.”

My team and I were on a customer call. The call began with the usual pleasantries, introductions, and an overview of the customer’s enterprise environment. Thirty minutes into the call, things were going so well. Their architecture was solid, thoughtful, and well documented. Their team was knowledgeable, technically sound, and experienced. But then, the customer intimated that due to cost savings they would not be planning to maintain a dedicated test/quality system. I took a deep breath. Actually it was more of an exhale like the rush of air from a gut punch. I prepared to respond, but before I could a voice broke through. “The number one cause of downtime is lack of process,” exclaimed the Partner Rep Architect on the call with us. After a brief banter, the customer agreed to maintain a test/QA system and I nearly blurted out, “I could kiss you!”

On the front lines of many enterprise deployments (new systems, data center migrations, and system updates), my teams in Support and Services have seen dozens of issues that could have been mitigated by utilizing a test system/cluster.

A test/quality system is an invaluable part of an HA strategy to avoid downtime. Common tasks associated with maintaining an enterprise deployment such as patches, updates, and configuration changes come with risk. Enormous risk.

Commonly identified risks of testing in production include several serious and potentially catastrophic issues:

Corrupted or invalid data
Leaked protected data
Incorrect revenue recognition (canceled orders, etc.)
Overloaded systems
Unintended side effects or impacts on other production systems
High error rates that set off alerts and page people on-call
Skewed analytics (traffic funnels, A/B test results, etc.)
Inaccurate traffic logs full of script and bot activity (1)

If a customer attempts to apply risky changes in production, the result can be quite damaging. On top of those listed above, there is an increased risk of downtime, corruption of application installations, and in some cases irreversible damage. Take the case of Customer X (a high profile SAP Enterprise shop in the manufacturing industry).

After reading a critical notice from a reputable site, the OS Administrator quickly updated his production nodes to the latest kernel update available. Within hours, the production nodes began a series of unprompted crashes and kernel panics. In his haste, he had installed a kernel that was incompatible with his configuration: the combination of existing application packages, devices, file systems, and related packages. This caused a production outage and several high priority escalations to multiple vendors.

When patches and critical fixes are applied first to a test/QA or sandbox system, they can be managed and verified to reduce loss of productivity and unplanned downtime. Testing applications in a production-like environment allows you to identify unforeseen problems and correct the issues before they adversely impact your operations. Pre-production design and testing help prevent costly business disruption, improve your customer experience, and protect your brand.
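The idea that a patch should prove itself before it reaches production can be written down as a simple promotion gate. The Python sketch below is a minimal illustration only; the stage names, the `Change` record, and `can_apply` are assumptions made for this example rather than part of any real change-management tool.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Assumed promotion path for this sketch; real pipelines may have more stages.
STAGES: List[str] = ["sandbox", "test/QA", "production"]


@dataclass
class Change:
    name: str
    verified_in: Set[str] = field(default_factory=set)


def can_apply(change: Change, stage: str) -> bool:
    """A change may reach a stage only after being verified in every earlier stage."""
    earlier = STAGES[: STAGES.index(stage)]
    return all(s in change.verified_in for s in earlier)


kernel_update = Change("kernel update")
print(can_apply(kernel_update, "production"))  # False: nothing verified yet

kernel_update.verified_in.update({"sandbox", "test/QA"})
print(can_apply(kernel_update, "production"))  # True: both earlier stages passed
```

Had Customer X’s kernel update been held at a gate like this, the incompatibility would have surfaced on a test node instead of in production.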

Using a Test/QA System to Improve Production Availability and Processes

Here are the basics that a test/QA system can provide for improving your production availability and processes. A controlled environment that resembles production as closely as possible provides the ability to:

Test kernel updates and security updates
Validate settings and configuration tuning
Reproduce production issues and test software updates and patches
Verify application version compatibility and reduce the risk of downtime due to incompatible changes
Provide a safe space to practice and revise go-live, maintenance, outage, and other enterprise procedural activities
Train new hires and team members without impacting enterprise clients

If you have a Test/QA environment for deploying your critical enterprise availability software, I could kiss you right now. Having this environment gives your team the ability “to test, validate and verify” (2) architecture, business requirements, user scenarios, and general integration with a system or set of systems that most closely resembles the production environment (you know, the one that makes the money). Of course, you will still have to schedule windows to maintain your production systems and perform testing on them as well, but only after a safe buffer step has been completed in between.

— Cassius Rhue, VP, Customer Experience

————-

References:

(1) https://opensource.com/article/19/5/dont-test-production Accessed 5/4/2020
(2) https://www.softwaretestingclass.com/system-testing-what-why-how/ Accessed 5/4/2020

Solution Brief: High Availability for SQL Server in Amazon Cloud Environments

May 17, 2020 by Jason Aw

SIOS software provides a simple, cost-efficient way to provide high availability protection for SQL Server in the Amazon Web Services Cloud. Add SIOS DataKeeper Cluster Edition software to a Windows Server Failover Clustering environment such as SQL Server Always On Failover Cluster Instance (FCI) to create a cloud-friendly SANless cluster. Use AWS Quick Start deployment templates to create a SIOS SANless cluster in minutes.

Fast, Cost-Efficient Way to Add High Availability

Like all traditional failover clustering solutions, SQL Server FCI environments require the use of shared storage. This requirement makes them impractical or impossible in public cloud environments, including Amazon Web Services. SIOS SANless clustering software eliminates this requirement in an environment that is fully integrated with Windows Server Failover Clustering. SIOS software adds the flexibility to protect your business critical applications such as SQL Server Standard or Enterprise Edition on Windows or Linux in any combination of physical, virtual, and cloud environments.

Fast, Efficient Synchronization

SIOS software uses highly efficient block-level replication to synchronize storage in all cluster nodes in real time to create a SANless cluster. By replicating data volumes at the block level, SIOS software uses significantly fewer system resources, makes more efficient use of the available bandwidth, and transfers more data faster than file-based replication alternatives. As a result, SIOS software delivers incredibly fast replication speeds without hardware accelerators or compression devices. You get efficient storage without the cost or configuration limitations of a traditional SAN-based environment.
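To make the general idea of block-level replication concrete, here is a toy Python sketch in which each write is mirrored to a second copy one block at a time, so only the blocks a write touches are sent. It is a conceptual illustration only and does not represent how SIOS software is implemented; the `MirroredVolume` class, the 4096-byte block size, and the in-memory "volumes" are all assumptions made for the example.

```python
BLOCK_SIZE = 4096  # assumed block size for this sketch


class MirroredVolume:
    """Toy volume that mirrors every write to a target copy, block by block."""

    def __init__(self, size: int) -> None:
        self.source = bytearray(size)
        self.target = bytearray(size)
        self.blocks_sent = 0  # how many blocks were copied to the target

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > len(self.source):
            raise ValueError("write falls outside the volume")
        if not data:
            return
        self.source[offset:offset + len(data)] = data
        # Replicate only the blocks this write touched, not the whole volume.
        first = offset // BLOCK_SIZE
        last = (offset + len(data) - 1) // BLOCK_SIZE
        for block in range(first, last + 1):
            start = block * BLOCK_SIZE
            end = min(start + BLOCK_SIZE, len(self.source))
            self.target[start:end] = self.source[start:end]
            self.blocks_sent += 1


volume = MirroredVolume(16 * BLOCK_SIZE)
volume.write(4090, b"x" * 10)  # this write straddles blocks 0 and 1
print(volume.blocks_sent, volume.source == volume.target)  # 2 True
```

In this toy model a 10-byte change costs two blocks of transfer regardless of how large the file containing it might be, which is the intuition behind comparing block-level with file-based replication.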

Failover Across Availability Zones for Disaster Protection

SIOS software keeps real-time copies of data synchronized across multiple nodes and across EC2 Availability Zones (AZs) for availability and disaster protection.

High Availability with SQL Server Standard Edition

SIOS DataKeeper Cluster Edition software can be used with SQL Server Standard Edition FCI to create a cost-efficient high availability cluster without the need for more costly SQL Server Enterprise Edition licenses.
