Clustering Simplified Archives

Lessons in Cloud High Availability from the Movies

August 11, 2020 by Jason Aw


Joseph Lalonde of jmlalonde.com has a blog highlighting the leadership lessons from popular movies such as Hancock, The Greatest Showman, and Frozen II. With credit to Joseph’s inspiring leadership lessons, here are five high availability lessons on cloud migration from Disney’s Frozen II.

Disney’s Frozen II and Cloud Migration Lessons

In Disney’s animated adventure Frozen II the characters Anna, Elsa, Kristoff, Olaf and Sven leave Arendelle to travel to an ancient, autumn-bound forest of an enchanted land. In the adventure, they set out to find the origin of Elsa’s powers in order to save their kingdom. Aside from being fun for a father of six girls, the movie is also ripe with leadership, life, and high availability lessons.

1. You can’t go into cloud migration without the proper help.

When Elsa, Anna, Kristoff, Olaf and Sven arrive at the place where the mysterious voice has been calling them, they stand outside an enormous cloud. When Olaf and Kristoff attempt to enter, they are bounced back and repelled. It is only when Elsa approaches the cloud with her magic that the portal opens and they enter.

When migrating to the cloud, there will be a lot of voices calling you. Some will beckon you to AWS, Azure, GCP, or any of a myriad of others. But no matter which you intend to go with, know that you will need the proper help for a successful entry. This help should include:

Architects knowledgeable about the platform, especially its networking aspects
Application administrators who understand the applications’ installation, configuration, operation, and maintenance characteristics
Security and OS administrators
A cloud partner representative
A knowledgeable availability expert for deploying critical applications in a highly available architecture

2. Things will not make sense when you are older, so document and iterate.

While walking through the enchanted forest, Olaf begins experiencing the strange phenomena and magic of the forest. As he does, he begins to sing:

This will all make sense when I am older
Someday I will see that this makes sense
One day, when I’m old and wise
I’ll think back and realize
That these were all completely normal events
I’ll have all the answers when I’m older

The song concludes with the rousing finish of:

‘Cause when you’re older
Absolutely everything makes sense
This is fine

Stop. If things don’t make sense to you when you are starting out on your cloud journey, take the time to make them make sense. If you are using an agile mindset, launch an investigation spike on the topic or topics causing confusion. For example, if you don’t understand the magic of the VPC, Security Group, Availability Zone or Set, Region, or Region-to-Region concepts today, they will make even less sense when you return to this configuration months later. If the test results don’t make sense, don’t move on; run them again. Also, remember to document the architecture: not just the details you think are important, but the details that the older you, who has moved through six other projects and is facing a deadline, would want to know to make it all make sense.

3. Don’t go running into the fire. Choose the right cloud high availability solution.

After a brush with the wind spirit, the enchanted forest is set ablaze by the fire spirit. As the fire spirit spreads chaos and flames, Elsa charges off with icy blasts to cool the fire and calm the spirit. In her zeal, Anna runs into the fire behind her sister and has to be saved. When at last the two are reunited, Elsa admonishes her sister: “You can’t just follow me into fire.” The feisty Anna replies, “You don’t want me following you into fire? Then don’t run into fire!”

Migrating to the cloud and choosing the right availability solution can be stressful enough without complicating it with unrealistic schedules and untested theories and scenarios. No leader of a deployment team wants his or her team in a fire drill, so don’t knowingly run into one. Create a plan. Establish checkpoints and milestones. Include realistic risk assessments and risk management strategies. Communicate frequently with vendors and partners, and especially with your team. Test well, and understand your backup and backout plans.

4. Don’t get stuck in the past when deciding what to bring on your cloud migration.

There is a song woven throughout the movie, “All Is Found,” with the chilling chorus: “Dive down deep into her sound / But not too far or you’ll be drowned.” As the movie peaks, Elsa dives down into the deep, and her search and exploration of the past leave her Frozen, her last gasp a burst of flurries to warn and inform her sister.

As one of the heads of our Customer Experience Team at SIOS Technology Corp., I have witnessed too many deployments and migrations get stuck in a comparison trap. The phrase goes like this: “In our old data center, we…” or “The old system could do that.” It may be true that your old system of fixed systems, dedicated resources, large teams, specific networking, and high-cost, high-feature SAN storage could do that. (Although, truthfully, sometimes I’ve seen the curtain peeled back to reveal that on-premises didn’t really do that either.) As you migrate to the cloud, understand what makes sense to mimic in the cloud and what doesn’t. Understand why the system was architected that way on-premises, and then, using the help from lesson 1 and the learning from lesson 2, make decisions that make sense.

Which leads me to the final lesson.

5. There will always be two kingdoms: on-premises and in the cloud.

At the end of the movie, Anna becomes queen of Arendelle, once ruled by her sister Elsa, while Elsa stays and leads the people of the Northuldra. One sister stays in or near the forest once surrounded by mystery and cloud, while the other returns to the familiar land.

As you consider migration to the cloud and your high availability strategy, remember that there will always be two kingdoms. Your migration strategy should account for the fact that you will always need an on-premises data center to partner with your cloud deployments. Perhaps it isn’t as sprawling as it once was, but not every workload or piece of critical infrastructure can be repurposed and packaged for the cloud. Having an HA solution and strategy that can equip and enable both “kingdoms” is essential.

Going to the cloud takes the right team, the right tools, the right solutions, and a strategy and plan to get there without going through fire. As you migrate to the cloud, be willing to confront the past, understand it, and remember not to get stuck there. And, like the two sisters of Disney’s Frozen II, remember that every beautiful enterprise story has two sides; on-premises and cloud just might be yours.

— Cassius Rhue, VP, Customer Experience

Reproduced with permission from SIOS

Why is AWS EC2 Application Monitoring So Hard?

August 2, 2020 by Jason Aw


Congratulations! You’ve migrated your core applications to the AWS cloud. Or, you are developing new “cloud-native” applications and hosting them in the cloud. Perhaps you are taking advantage of Amazon EC2’s scalability and its elastic architecture. Either way, you now want to ensure that those applications stay up and running, or that you are alerted quickly if and when something happens.

Because something will happen. Our customer data shows that companies using only three EC2 instances experience downtime at least once a month. That means unhappy users unable to access their applications. You need a monitoring solution to tell you what’s going on.

How to narrow down EC2 application monitoring solutions

The first step in your search for the perfect EC2 monitoring solution should be to understand your requirements and your own technical capabilities. Monitoring solutions are not all alike.

Are you interested in a feature-rich solution that monitors a wide array of systems? Or one that focuses on a core set of systems, such as your EC2 environment?

What do you want to do with the output from your application monitoring solution? Do you want as much information as possible to help your developers troubleshoot issues? Or are you looking for quick alerts and assistance in recovering from any failures?

And what is your technical appetite to install and manage another application? Do you love scripting? Or do you want something that is “set-it-and-forget-it”?

A search for “application performance monitoring solutions” on Google returns 1,170,000,000 results! Jump into the Amazon AWS Marketplace and you’ll find 453 products listed in the DevOps – Monitoring category. Having a clear sense of your requirements and your own technical capabilities will help you narrow down your search.

Monitoring applications running on Amazon EC2 with Amazon CloudWatch or other APM solutions

If you are hosting your applications on Amazon EC2, then you might consider using Amazon CloudWatch. How familiar are you with standard and custom metrics? Be aware that running Amazon CloudWatch properly takes quite a lot of technical expertise. Amazon CloudWatch is a great solution for users who need data and actionable insights to respond to system-wide performance changes, optimize resource utilization, and get a unified view of their operational health. But this all comes at a price in terms of the knowledge and experience needed to configure and manage Amazon CloudWatch properly.

Another choice is to evaluate and acquire one of the many commercially available application performance monitoring (“APM”) solutions on the market, such as those from AppDynamics, Datadog, Dynatrace, or New Relic. But keep your requirements in mind. How broadly do you need to monitor? And what do you intend to do with that information? Are you ready to be overwhelmed with alerts? Also be aware that many APM solutions do nothing to help you recover beyond pinpointing the issue. You still have to drop everything to manually restart services or reboot your instances.

Monitor applications running on Amazon EC2 using SIOS AppKeeper

But there is another way. SIOS AppKeeper is a SaaS service that can be configured to automatically discover any EC2 instances and their services. It then automatically takes any number of actions if and when downtime is experienced. So instead of getting alerts that something is wrong, you get notified that something happened and was automatically addressed.
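AppKeeper’s internals aren’t described here, but the general monitor-then-remediate pattern can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not AppKeeper’s actual logic or API: `check`, `restart`, and `reboot` are hypothetical callables you would wire to your own health check and recovery actions, and the restart-then-reboot escalation order is itself an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecoveryPolicy:
    """Hypothetical escalation policy: restart the service first, then reboot."""

    check: Callable[[], bool]    # hypothetical health check; True means healthy
    restart: Callable[[], None]  # hypothetical action, e.g. restart a service
    reboot: Callable[[], None]   # hypothetical action, e.g. reboot the instance
    max_restarts: int = 2
    log: List[str] = field(default_factory=list)

    def run_once(self) -> str:
        """Run one monitoring cycle and report which remediation, if any, worked."""
        if self.check():
            return "healthy"
        for attempt in range(1, self.max_restarts + 1):
            self.restart()
            if self.check():
                self.log.append(f"recovered by restart (attempt {attempt})")
                return "restarted"
        self.reboot()
        if self.check():
            self.log.append("recovered by reboot")
            return "rebooted"
        # Automation could not fix it, so this is the case that needs a human.
        self.log.append("recovery failed; notify an operator")
        return "escalated"
```

The entries in `log` play the role of the “something happened and was automatically addressed” notification; only the `escalated` outcome would require someone to drop everything.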

SIOS AppKeeper starts at only US $40 per instance per month. We invite you to view this short video to see how easy it is to install and use AppKeeper.


One of our customers, Hobby Japan, a publishing company in Tokyo, was initially using Amazon CloudWatch, but its understaffed IT team couldn’t respond fast enough to alerts. The company wanted to leverage automation and moved to SIOS AppKeeper. Since moving to AppKeeper, Hobby Japan hasn’t experienced any issues or unexpected downtime with its EC2 instance. Here’s a link to a case study on Hobby Japan.

Monitoring your cloud applications shouldn’t be a full-time job. You want a monitoring solution that is easy to install and use, doesn’t overwhelm you with alerts, and ideally takes care of system impairments automatically. We encourage you to try a 14-day free trial of SIOS AppKeeper by signing up here.

Article reproduced with permission from SIOS

Planning is Key to Enterprise Availability (and to a Happy Marriage)

July 30, 2020 by Jason Aw


Planning dates, getaways, and fabulously romantic dinners is a great part of loving your spouse well. Seminars and workshops overflowing with tips for improving your relationship abound in nearly every area of the world.

But listen in on the training session provided by Edmond Melkomian, Project Manager for Professional Services at SIOS Technology Corp., and you’ll quickly learn that planning dinners and anniversary retreats aren’t the only way to love your spouse well.

In a recent class on SIOS Protection Suite for Linux, Edmond shared three tips that help you love your spouse well in an enterprise world: plan, plan, plan.

1. “Plan to plan” your enterprise availability solution

In his course, Edmond Melkomian asked students to name the first thing they should do when deploying an enterprise solution. His answer: “Plan, plan, plan.” It seems obvious, but the first step is to start making the plan. A fairly decent start for a plan includes developing the details for each of the project phases, such as milestones, checkpoints, risks and risk mitigation strategies, stakeholders, timelines, and stakeholder communication plans. A decent plan will also include details about kickoff, sign-off and closure, and resources (staffing, management, legal/contracts).

Plan to create, review, modify, and update your plan throughout the solution lifecycle.

2. Plan what to deploy for enterprise availability

Plan what to deploy. It is likely that a large portion of your enterprise infrastructure predates your current team’s tenure with the company. As you migrate to the cloud or update your availability strategy, it is worth the time and effort to make a plan regarding what to deploy. Focus your plan on ensuring that you deploy redundancy at all critical components: network, compute, storage, power, cooling, and applications. Data centers and cloud providers typically provide cooling, power, and network redundancy to start.

A number of firms offer architectural teams, cloud solution providers, availability experts, application architects, and migration specialists who help teams discover critical and sometimes hidden dependencies, as well as high-risk areas vulnerable to Single Points of Failure (SPOFs). This investigative work will feed into your plan of what to deploy and/or update in your availability strategy.

Plan on reviewing what you need to deploy.
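As a rough illustration of what that discovery work feeds into, the Python sketch below flags any critical layer with fewer than two redundant components. The inventory format, the component names, and the two-instance threshold are all assumptions for illustration; a real SPOF analysis also has to follow dependencies between components, which a simple per-layer count does not capture.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# The critical layers named in the plan above.
CRITICAL_LAYERS = ("network", "compute", "storage", "power", "cooling", "applications")


def find_spofs(
    inventory: Iterable[Tuple[str, str]], min_redundancy: int = 2
) -> List[str]:
    """Return the critical layers that have fewer than `min_redundancy` components.

    `inventory` is a hypothetical list of (layer, component_name) pairs.
    """
    counts = Counter(layer for layer, _name in inventory)
    return [layer for layer in CRITICAL_LAYERS if counts[layer] < min_redundancy]


inventory = [
    ("network", "switch-a"), ("network", "switch-b"),
    ("compute", "node-1"), ("compute", "node-2"),
    ("storage", "san-array-1"),
    ("power", "psu-a"), ("power", "psu-b"),
    ("cooling", "crac-1"), ("cooling", "crac-2"),
    ("applications", "erp-primary"),
]
print(find_spofs(inventory))  # ['storage', 'applications']
```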

3. Plan to keep a QA/pre-production cluster for reliable availability

From my time on the SIOS Technology Corp. development team, I’ll never forget a Friday night call with a long-time but frantic customer. Earlier in the month, the customer had unsuccessfully deployed a new software solution into a production environment, and the result was a massive failure. He called our 800 number at 4:30 p.m. (EST) on Friday. Why do I recall that exact time? Friday was date night. My wife and I had dinner plans, a babysitter for the six girls on standby (by the hour), and hopes for a romantic and relaxing evening. I was just about to head out for the day when the phone rang. After a tense first hour, we were back up and running. This unfortunate episode could have been avoided or mitigated by keeping a UAT or QA system on hand.

As Harrison Howell, Software Engineer for Customer Experience at SIOS Technology Corp., noted in his blog post on six common cloud migration challenges, the limits of on-prem are no longer the limits in the cloud:

Customers coming from an on-prem system need to remember that resources are no longer a limiting factor. In the cloud, systems can be effortlessly copied and run in isolation of production, something not trivial on-premises. On-demand access to IT resources allows UAT of HA and DR to expand beyond “shut down the primary node”. Networks can be sabotaged, kernels can be panicked, even databases can be corrupted and none of this will impact production! Identifying and testing these scenarios improves HA and DR posture.

Plan on deploying and keeping a UAT system for HA and DR testing. As Harrison mentions, “identifying and testing [issues]” “improves [your overall] HA and DR posture,” and that improves your chances of a successful date night.
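A UAT cluster makes it practical to run the same set of failure scenarios before every change. The Python sketch below is one hypothetical way to organize that: each scenario pairs a fault injector with a recovery check, and any scenario whose injector raises is recorded as a failure. The `Scenario` type and the scenario names are illustrative; the actual injection (stopping a node, partitioning the network, panicking a kernel) would be done by your own tooling, against the UAT environment only.

```python
from typing import Callable, Dict, List, NamedTuple


class Scenario(NamedTuple):
    name: str
    inject: Callable[[], None]  # hypothetical fault injector, run against UAT only
    verify: Callable[[], bool]  # hypothetical check that the cluster recovered


def run_uat_matrix(scenarios: List[Scenario]) -> Dict[str, bool]:
    """Run each failure scenario and record whether the cluster recovered."""
    results: Dict[str, bool] = {}
    for scenario in scenarios:
        try:
            scenario.inject()
            results[scenario.name] = bool(scenario.verify())
        except Exception:
            # A scenario that cannot even be executed counts as a failed test.
            results[scenario.name] = False
    return results
```

Keeping the results as a dictionary makes it easy to compare runs before and after a change and to attach them to the project documentation.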

4. Plan regular maintenance and updates (including documentation)

Lastly, plan times for regular maintenance and updates to maintain Enterprise Availability. Your enterprise needs to remain highly available to remain highly profitable and successful. Environments don’t remain stagnant, and patches, security updates, expansion, and general maintenance are a regular occurrence from inception to retirement. Creating a plan for how and when you will incorporate updates and maintenance into your enterprise will ensure that you are not only kept up to date, but that you minimize risks and downtime while doing it. Be sure to include in your plan the use of a test system. Develop a planned routine and process for validating patches, kernel and OS updates, and security software, and don’t forget to update the project documentation and future plans as you go and grow.
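One way to make “validate on the test system first” a routine rather than a good intention is to gate every rollout on it. In this minimal Python sketch, a patch is applied stage by stage, production is never touched if validation fails on the test cluster, and every result is appended to a changelog for the project documentation. The stage names and the `apply_to`/`validate` callables are hypothetical placeholders for your own tooling.

```python
from typing import Callable, List, Sequence, Tuple


def roll_out_patch(
    patch: str,
    apply_to: Callable[[str, str], None],  # hypothetical: apply `patch` to a stage
    validate: Callable[[str], bool],       # hypothetical: run checks on a stage
    changelog: List[str],
    stages: Sequence[str] = ("qa", "production"),
) -> Tuple[List[str], bool]:
    """Apply a patch stage by stage, stopping at the first failed validation."""
    applied: List[str] = []
    for stage in stages:
        apply_to(stage, patch)
        applied.append(stage)
        ok = validate(stage)
        changelog.append(f"{patch} on {stage}: {'validated' if ok else 'FAILED validation'}")
        if not ok:
            return applied, False  # later stages, including production, are skipped
    return applied, True
```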

If you remember to plan for a highly redundant, highly reliable, and highly available system up front, plan to keep a QA/pre-production cluster after go-live, and plan for regular maintenance and updates, you will also be able to keep your date-night plans with your spouse. And not just date night: you’ll also be able to keep your nights free from 3 a.m. wake-up calls about downed production systems. This is my tip for loving your spouse well.

I love my wife and so I help customers deploy SIOS Technology Corp.’s DataKeeper Cluster Edition and SIOS Protection Suite for Windows and Linux products as a part of a highly available enterprise protection solution. Contact SIOS.

— Cassius Rhue, VP, Customer Experience

Article reproduced with permission from SIOS

How to Combine Backup, Replication and High Availability Clustering

July 22, 2020 by Jason Aw


Backup, replication, and high availability (HA) clustering are fundamental parts of IT risk management, and they are as indispensable as the wheels on a car. Replication is also essential to IT data protection.

Backup and HA Cluster Environments Are Not Mutually Exclusive

While backup, replication, and failover are all important, there are key distinctions among them that need to be understood to ensure they are applied properly.

For example, you can use replication to maintain a continuously up-to-date copy of data, but if you don’t consider it within the larger data protection environment, you will also copy problem data (such as virus-infected data).

In such cases, a backup is essential to bring the data back to the last known good point. Replication, on the other hand, gives you immediate access to the image replicated just before a system failure, which delivers a superior RPO and RTO in a way that simply storing generational backups in an eDiscovery-style model cannot.
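The distinction is easy to see in a toy model. The Python sketch below is a conceptual illustration only, not how SIOS DataKeeper or any backup product is implemented: every write goes to both the primary and its replica, so a bad write is replicated immediately, while generational backups preserve a last known good point to restore from.

```python
import copy
from typing import Dict, List


class ProtectedVolume:
    """Toy model of a volume with a synchronous replica and generational backups."""

    def __init__(self) -> None:
        self.primary: Dict[str, str] = {}
        self.replica: Dict[str, str] = {}
        self.backups: List[Dict[str, str]] = []

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        self.replica[key] = value  # replication copies every write, good or bad

    def take_backup(self) -> int:
        """Store a point-in-time copy and return its generation number."""
        self.backups.append(copy.deepcopy(self.primary))
        return len(self.backups) - 1

    def restore(self, generation: int) -> None:
        """Roll both copies back to a stored generation."""
        self.primary = copy.deepcopy(self.backups[generation])
        self.replica = copy.deepcopy(self.backups[generation])


volume = ProtectedVolume()
volume.write("orders", "good data")
last_good = volume.take_backup()
volume.write("orders", "virus-infected data")
print(volume.replica["orders"])  # prints "virus-infected data"
volume.restore(last_good)
print(volume.primary["orders"])  # prints "good data"
```

The replica is what lets a standby take over with data that is current up to the moment of a hardware failure; the backup generations are what let you recover from a logical failure that replication has faithfully copied.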

Therefore, SIOS Protection Suite includes both SIOS LifeKeeper clustering software and DataKeeper replication software. SIOS LifeKeeper is an HA failover cluster product that monitors application health and orchestrates application failover, and DataKeeper is block-based storage replication software. However, running an HA cluster does not make backup unnecessary. Consider the following precautions when backing up in an HA cluster environment that uses SIOS Protection Suite.

Five Points of Backup in a High Availability Clustering Environment

Consider the following five backup targets:

Operating System (OS)
SIOS Protection Suite – LifeKeeper/DataKeeper clustering software (program files)
SIOS Protection Suite – LifeKeeper/DataKeeper configuration information
Application programs (e.g., SQL Server, SAP S/4HANA, Oracle, PostgreSQL)
Application data

Back Up the OS

To back up the OS, it is common to use a standard OS utility or third-party backup software. However, since backing up the OS requires no special considerations in a high availability environment, we will not cover it here.

Back Up the SIOS Protection Suite Clustering Software

The SIOS LifeKeeper/DataKeeper program files included in SIOS Protection Suite can also be backed up with a standard OS utility or third-party backup software. If you do not back them up and the program files are lost, for example to a disk failure, you will need to reinstall the software. Some administrators will weigh the effort of backing up the program files against simply reinstalling.

Back Up the SIOS Protection Suite Configuration Information

SIOS LifeKeeper comes with a simple command called lkbackup that enables you to back up the configuration information. You can run lkbackup while SIOS LifeKeeper and its related resources are in service; it does not impact running services.

Run this command in the following three main cases:

Immediately after creating new SIOS LifeKeeper resources
Before and after changing the SIOS LifeKeeper configuration (adding/changing dependencies, adding/deleting resources)
Before and after a SIOS LifeKeeper version upgrade

If you back up the configuration information with lkbackup, you can quickly return to the original operational state even if the configuration information is lost to a disk failure or corrupted by an operator error.
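Those three cases share a pattern: take a configuration backup before a change and another once the change succeeds. The Python context manager below sketches that pattern. `run_backup` is a hypothetical callable; in practice it might invoke `lkbackup` (for example through `subprocess.run`), but take the command’s actual options from the SIOS LifeKeeper documentation rather than from this sketch.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def config_change_window(label: str, run_backup: Callable[[str], None]) -> Iterator[None]:
    """Back up the cluster configuration before and after a configuration change.

    If the change raises, the "after" backup is deliberately skipped, so the
    "before" backup remains the most recent known good configuration.
    """
    run_backup(f"before-{label}")
    yield
    run_backup(f"after-{label}")


calls: list = []
with config_change_window("add-dependency", calls.append):
    calls.append("change applied")
print(calls)  # ['before-add-dependency', 'change applied', 'after-add-dependency']
```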

Back Up Application Programs

Backing up application programs means backing up the business applications protected by your HA cluster. As with the OS and the clustering software, you can create and restore a backup image using a standard OS utility or third-party backup software.

Back Up Business Application Data

In an HA cluster environment, shared storage is provided that both the active and standby servers can access. During normal operation, the shared storage is used by the active cluster node. Application data (for example, database data) is usually stored on this shared storage, so keep the following points in mind when backing it up.

For shared storage configuration

When backing up data in a shared storage configuration, where the storage is shared by the active and standby cluster nodes, the data can only be accessed from the active node (the standby node cannot access it). As a result, the backup must also be taken from the active node. In this case, ensure that there is sufficient processing power to handle a failover and backup restore scenario.


For data replication configuration

In a data replication configuration, the backup is normally taken on the active system. However, by temporarily stopping the mirroring and releasing the lock, you can also run the backup on the standby system. In this case, the data is temporarily out of sync until mirroring resumes.
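Because the standby copy is out of sync while mirroring is stopped, the important detail is making sure mirroring always resumes, even if the backup fails. The Python sketch below shows that ordering with `try`/`finally`. The `pause_mirror`, `resume_mirror`, and `run_backup` callables are hypothetical placeholders (here `pause_mirror` stands for stopping the mirror and releasing the lock); the actual DataKeeper commands for these steps should be taken from the SIOS documentation.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def paused_mirror(
    pause_mirror: Callable[[], None], resume_mirror: Callable[[], None]
) -> Iterator[None]:
    """Stop replication for the duration of the block, then always resume it."""
    pause_mirror()  # the standby copy is now frozen and can be backed up
    try:
        yield
    finally:
        resume_mirror()  # resume even if the backup failed, so the mirror resyncs


def backup_on_standby(
    pause_mirror: Callable[[], None],
    resume_mirror: Callable[[], None],
    run_backup: Callable[[], None],
) -> None:
    """Run a backup on the standby while mirroring is temporarily stopped."""
    with paused_mirror(pause_mirror, resume_mirror):
        run_backup()
```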


Backing up a cluster node from an external backup server

To perform a cluster node backup from an external backup server, use either the virtual or real IP address of the cluster node. The points to note in each case are as follows.

Backing up using the virtual IP address of a cluster node

From the backup server’s perspective, the backup runs against whichever node currently holds the LifeKeeper virtual IP address, so the backup server does not need to know which node is active.


Backing up using the real IP address of the cluster node

From the backup server’s perspective, the backup runs against a node’s real IP address rather than the LifeKeeper virtual IP address. Because the shared storage cannot be accessed from the standby cluster node, the backup server and client must first check which node is currently active.
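The trade-off between the two approaches can be summarized in a small Python sketch. How `active_node` is determined (for example, by querying the cluster’s status) is left as an assumption, and the node names and addresses are placeholders from the 192.0.2.0/24 documentation range.

```python
from typing import Dict, Optional


def choose_backup_target(
    use_virtual_ip: bool,
    virtual_ip: str,
    real_ips: Dict[str, str],
    active_node: Optional[str],
) -> str:
    """Return the address an external backup server should connect to."""
    if use_virtual_ip:
        # The virtual IP always follows the active node, so no lookup is needed.
        return virtual_ip
    # With real IPs, only the active node can reach the shared storage.
    if active_node is None or active_node not in real_ips:
        raise RuntimeError("active node unknown; skip this backup run")
    return real_ips[active_node]


real_ips = {"node-a": "192.0.2.11", "node-b": "192.0.2.12"}
print(choose_backup_target(True, "192.0.2.10", real_ips, None))      # 192.0.2.10
print(choose_backup_target(False, "192.0.2.10", real_ips, "node-b"))  # 192.0.2.12
```

Raising an error when the active node is unknown is a design choice: backing up the standby’s real address would otherwise fail or capture nothing useful.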

When combining backup, replication, and failover clustering, a well-tested and verified backup configuration is indispensable. Be sure to perform sufficient operational verification in advance.

Reproduced with permission from SIOS

High Availability Software is Insurance Against SAP Downtime

July 18, 2020 by Jason Aw


We all need to buy insurance – for our cars, our houses, our lives. Nobody likes to pay money for a service that we hope we never have to use. But we all know that we should have it just in case. Most people either put off buying insurance until something awful happens, buy the cheapest policy, or do their homework and buy it from someone they trust. This last group usually fares the best.

High Availability Software is Insurance Against Downtime

Insurance is often thought of as a consumer purchase, but it’s critical to businesses too. You have computer systems and applications that run your business. If they fail, you want your business to keep running; otherwise, the failure could cost millions of dollars in lost business through lost transactions and customer data, and do irreparable damage to your reputation with your customers. High availability software is your “insurance” against system downtime. This is not something you can ignore, nor something you can trust to come along with your hardware or software infrastructure. You want to use high availability solutions from a company that has decades of expertise in high availability and knows how to keep your systems up and running.

A trusted high availability software company should:

Provide a single solution that is platform agnostic – usable on-prem, in the cloud, and on all of your hardware and software platforms
Have a product that is easy to configure and set up without requiring considerable application expertise
Know your applications and detect when they are having a problem
Take the proper action to attempt to restart or failover applications
Fail over the application to a secondary server, maintaining application best practices, and bringing the application back up in the proper order
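Bringing an application back up “in the proper order” is essentially a dependency-ordering problem. The Python sketch below uses the standard library’s `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical resource hierarchy; the resource names and dependencies are illustrative assumptions, not LifeKeeper’s actual resource types. Startup follows the topological order, and shutdown runs it in reverse.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical hierarchy: each resource maps to the resources it depends on.
HIERARCHY: Dict[str, Set[str]] = {
    "virtual-ip": set(),
    "replicated-volume": set(),
    "filesystem": {"replicated-volume"},
    "database": {"filesystem", "virtual-ip"},
    "application": {"database"},
}


def startup_order(hierarchy: Dict[str, Set[str]]) -> List[str]:
    """Dependencies first: a resource starts only after everything it needs."""
    return list(TopologicalSorter(hierarchy).static_order())


def shutdown_order(hierarchy: Dict[str, Set[str]]) -> List[str]:
    """Reverse of startup: dependents stop before the resources they rely on."""
    return list(reversed(startup_order(hierarchy)))
```

On failover, the secondary server would walk `startup_order(HIERARCHY)`, so `application` always comes last. A cycle in the hierarchy makes `static_order()` raise `graphlib.CycleError`, which doubles as a sanity check on the configuration.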

One of the key applications used in enterprises today is SAP S/4HANA, based on the HANA in-memory database. Most SAP customers will be required to run the HANA database with SAP by 2025. You want to find an intelligent HANA availability solution from a company that knows high availability, that knows SAP, knows HANA, and knows what to do to ensure that your critical SAP applications, and your business, continue to run smoothly.

SIOS Technology is the company you can trust for reliable high availability software. The 9.5 release of the LifeKeeper for Linux product contains a new HANA Application Recovery Kit. This will provide you with all you need to keep your SAP and HANA environment running. Want more information about this release? Watch this interview.

Reproduced with permission from SIOS