5 Signs That It Will Take More Than A Blog Post To Fix Your High Availability

December 8, 2020 by Jason Aw

The signs are there. The warning lights are flashing. In your gut, you can sense it. Maybe you can’t sleep. Your problems with high availability are deep. But, maybe you are not quite sure.

1. If you think your cloud SLA is all you need for high availability

Cloud solutions have provided great advancements in hardware availability and resilience. However, application high availability requires more than just selecting the right hypervisor or cloud provider. Your strategy for high availability cannot stop with the SLA provided by the cloud or a virtualization provider. As quoted by Wired, “The almost four-day Amazon outage of April 2011 did not breach Amazon’s EC2 SLA, which as a FAQ explains, ‘guarantees 99.95% availability of the service within a Region over a trailing 365 period.’” In this DZone article, our own David Bermingham breaks down the differences between cloud SLAs and application availability in detail. If you want a highly available infrastructure, it must include monitoring, recovery, and resilience at the data and application layers as well.
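To put the quoted percentage in perspective, here is a minimal sketch of the arithmetic behind an availability figure. The function name is made up for illustration, and the calculation says nothing about how a provider actually counts downtime, which is defined by its own SLA terms (the quoted SLA, for example, is measured within a Region).

```python
def downtime_budget_hours(availability_pct: float, period_days: int = 365) -> float:
    """Hours of downtime an availability percentage permits over a period.

    Hypothetical helper: plain arithmetic on the percentage, nothing more.
    """
    total_hours = period_days * 24
    return total_hours * (1 - availability_pct / 100)


if __name__ == "__main__":
    budget = downtime_budget_hours(99.95)
    print(f"99.95% over 365 days allows about {budget:.2f} hours of downtime")
```

A few hours per year is the most such a number can promise for the service it covers. It promises nothing about whether your application, its data, or its dependencies came back.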

2. If you are just using the high availability clustering that came with your open source operating system

Chances are you didn’t select your database based on what was bundled with the OS, so why would you select your HA solution based on that criterion alone? Bundled tools go a long way in providing extra assurance, possibilities, and capabilities. However, despite the ease of access, bundled tools and OS clustering software are not always capable of meeting your SLA, RPO, RTO, and availability requirements. If your enterprise runs a combination of operating systems, your team will likely need help navigating the different tools and understanding how they integrate. It’s kind of like choosing the hedge clippers and push reel mower left on the curb to shape “Azalea,” the par-5 13th hole at Augusta. Both tools are designed to cut, but how much time do you have? How are you going to handle the complexity? Which would you trust? Your strategy for high availability requires more than just considering the convenience of what is bundled with the OS; otherwise, you’d be running MySQL instead of SAP HANA.

3. If you think that enterprise application licensing, such as SQL Enterprise or Oracle Enterprise, is the same thing as enterprise high availability

In addition to increased cost, many enterprise application licenses also increase the ability of the application to recover in some high availability scenarios. However, it is highly unlikely that your entire enterprise is based on a single application. Your high availability is going to require more than just a highly available database solution. You’ll need an enterprise grade application monitoring and recovery solution with a breadth of support for all of your applications and databases. In addition, you’ll need the ability to manage and replicate not just database data, but critical application and configuration data as well. Availability for a single database or a simple application is one thing – but HA for a complex, multipart application and supporting database is very different. More services, more parts that need to be coordinated, more complex architecture to orchestrate, more specific best practices to adhere to before, during and after failover/switchover. More than what your enterprise license paid for.

4. If your downtime is growing and your uptime is shrinking

The pace of life is ever increasing in many fields. When was the last time your team recovered from backup, manually restarted the applications that were deemed critical, or restarted a set of failed virtual machines or nodes? The pace of your outage events cannot keep outrunning your team’s ability to move beyond firefighting to fire prevention and fireproofing. As Carey Nieuwhof puts it, “You can only run so hard so long.” Some of you have been firefighting for too long, and your outages are becoming more common than your uptime.

5. If your first failover test was on the production server

A recent client remarked that it is simply impossible to test for every possible disaster scenario. As new software is created, deployed, updated, and patched, the challenges in higher availability are increasing. But your live production data is not the place to find out what does not play well together. And while Go-Live and Post-Go-Live will always have their share of surprises, the inability to actually fail over and run on the backup node should not be one of them.

Scouring blogs can provide you with helpful tips and insights to define, redefine, and improve your higher availability. But, if the warning signs are going off that you’ve traded true availability for some semblance of ‘just enough’, then it will take more than a blog post, or scouring every blog post in the availability world for that matter, to fix your HA.

– Cassius Rhue, Vice President, Customer Experience

Reproduced with permission from SIOS

9 Signs You Have an Application Availability Problem

November 27, 2020 by Jason Aw

You’ve heard the saying “recognizing a problem is the first step in solving it.” But, many small, medium, and surprisingly, even large enterprise businesses aren’t aware that their application availability isn’t what it should be.

Read on for these nine signs that you still have an application availability problem:

1. You spend more time restarting an application than using it

Application crashes may be a fact of life, but if your application is down more often than it is up, that is a problem.

2. You’ve started to snooze through the alert storm in your inbox or control center

You have deployed alerts for application or server downtime, but the alert storm has so overwhelmed your inbox that you have silenced them all.

3. You have one data center for all your critical operations

A single data center for operations may sound convenient, but one well intended but misdirected construction crew has been known to turn single data centers into costly unavailability zones.

4. Your idea of data protection involves backup retrieval and archives

Your data protection strategy is critical. Data replication technology and site to site, region to region replication has become a mainstay, so if your replication or data protection strategy is non-existent or involves a lengthy jog to the vault this could be a big problem.

5. Your recovery procedures always require manual intervention

Manual intervention itself is not a problem. Some events are so difficult and complex that some amount of manual effort could be required. But, if manual intervention is always the first, second and third order of business after a server or application outage, that is a problem.

6. Your RTO is measured in days not hours or minutes

How are you measuring your recovery time objective (RTO)? Is it measured in days or hours rather than minutes? True, every business has its own tolerance level for RTO. However, your RTO should not be a function of server rebuilds and gross instabilities in your architecture.

7. You don’t know your RPO because your standby is never reliably in sync

You’ve checked the box on reliable monitoring and recovery of your application, and taken it a step further to provide a standby cluster ready system. Great job. But, before I let you off the hook, what is your recovery point objective (RPO)? An RPO should be something more accurate than “somewhere between day 0 and last night.”
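If you have never put numbers on signs 6 and 7, a minimal sketch like the one below shows what is being measured. It assumes you can pull three timestamps from your own records: when the outage started, when service was restored, and when the standby last received replicated data. The function names and sample times are illustrative only.

```python
from datetime import datetime, timedelta


def recovery_time(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Actual time to recover; compare this against your RTO."""
    return service_restored - outage_start


def data_loss_window(last_replicated: datetime, outage_start: datetime) -> timedelta:
    """Span of changes the standby never received; compare this against your RPO."""
    return max(outage_start - last_replicated, timedelta(0))


if __name__ == "__main__":
    # Hypothetical incident: the standby last synced the night before.
    last_sync = datetime(2020, 11, 26, 23, 0)
    outage = datetime(2020, 11, 27, 3, 0)
    restored = datetime(2020, 11, 27, 9, 30)

    rto, rpo = timedelta(minutes=30), timedelta(minutes=15)
    print("recovery took", recovery_time(outage, restored), "against an RTO of", rto)
    print("data at risk ", data_loss_window(last_sync, outage), "against an RPO of", rpo)
```

If you cannot fill in the `last_sync` value with confidence, that is sign 7 in a nutshell.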

8. Single points of failure don’t just exist, they are the norm

Where are your single points of failure? Your budget may not allow you to eliminate every single point of failure, but if you can identify a single point of failure in every major category and every critical component of your enterprise…

9. Your last disaster made local, regional, or national news

If the last major storm, grid failure, or failure event put a blight on your business due to downtime, then higher availability is the next order of business.

Downtime costs your business in terms of customers, productivity, and peace of mind. Unaddressed risks have a definite impact on your business and reputation. If these warning signs are there, you may have an availability problem. And if you ignore them, you’ll likely have even bigger problems soon thereafter, hence the importance of application availability.

— Cassius Rhue, VP, Customer Experience

Reproduced with permission from SIOS

Six Reasons Not to Buy SIOS High Availability Software . . . If You Dare

October 25, 2020 by Jason Aw

You need SIOS Protection Suite (for Linux or Windows) or SIOS DataKeeper Cluster Edition for high availability protection for business critical applications.

UNLESS

1. You prefer free solutions only.

I get it. There are definitely times when I do the same thing when I need to learn a new skill, get a quick tip, drop a few pounds, or set up a quick demo. Rather than signing up for a subscription, purchasing a license, or investing in a combination of the two, I have gone the free route.

However, the saying often holds true, you get what you pay for. Free trials are fine. Permanently free high availability is like gas station sushi – is the risk really worth it? Be sure that free doesn’t prevent you from utilizing the fullness available for optimizing uptime and increasing availability. Make sure you aren’t passing over a reasonably priced high availability solution that is proven to protect your mission-critical applications.

2. Being a single-solution shop is more important than meeting your HA needs.

We were a “Ford tough” family for decades. Seriously. I understand what it is like to be a one solution shop. My dad owned a Ford truck for work, a Ford Mustang for leisure, a Ford 3600 tractor for the farm, and a Ford minivan for family travel. There was even a season where we received model toy cars with the brandished blue oval as well.

But when my wife and I branched out to meet our own family’s needs, we broke away from the single solution to address needs that fell outside the Ford wheelhouse (at the time). You may be a single-vendor buyer, but if your needs have changed and the HA provider or solution hasn’t kept up, consider whether expanding the solution set will eliminate risks, improve success, or be worth the investment in a complementary solution for those new needs. When we needed a reliable, fuel-efficient, sleek, family-friendly, and economical solution for our family, we supplemented Ford tough with a Honda Odyssey. If you are a single-solution shop and you are not worried about vendor lock-in, best of luck.

3. You are more of a do-it-yourself coder.

You like coding. You like to write a lot of scripts, and don’t mind pulling out your bash, ksh, perl, python, powershell, batch or command tool kit and wiring things up yourself. You value the joy in flexibility and adding your own tweaks.

I love writing code as well, but there are times when the last thing I want to do is spend time writing a lot of code and scripts for a problem that is solved, proven, and off-the-shelf ready. For the do-it-yourself admin, off-the-shelf may not be your preference, but consider whether 20 years of expertise and experience should be rehashed and re-architected for your enterprise. And if you have to get your code-writing fix in, SIOS provides the Generic Application Recovery Kit for exactly that.

4. You need Ubuntu support (or Solaris).

Your environment is unique. You have customers who’ve cut their teeth on Solaris and are hanging on to it for dear life. Or you’ve got those who have fully embraced the Linux realm and have moved to Ubuntu. In either case, you look at the SIOS products matrix and Ubuntu isn’t currently a match for your SIOS version. Bummer!

While this is true, consider the rich and vast features and flavors of support that are still available. While there are parts of your enterprise that have dug in on Solaris and others that have raced to embrace Ubuntu and newer variants of Linux, it is more likely that you need a solution capable of supporting RHEL, OEL, SuSE, CentOS and possibly Windows as well. Be sure not to single out a high availability solution by what it doesn’t provide and consider the depth of what it does.

5. You don’t run a hybrid of anything in your environment.

I heard it in the middle of a movie last week. The lead character commented on the idea of moving forward with some new idea of an overly excited owner. The classic line: “Sometimes the juice isn’t worth the squeeze.” In your mind you feel that you aren’t running a hybrid environment. Your applications are critical, but not complex. The moving parts are simple- a database, front end and a supporting application. It makes sense that you might not want to “complicate” things with additional processes, products, solutions or services, and you may feel like the juice isn’t worth the squeeze.

Before you make that final decision on high availability software, assess whether a non-hybrid environment is the same as a simple environment. Consider whether the moving parts are as simple as you imagine, or whether a solution with failover orchestration would help reduce your overall RTO and improve your RPO.

6. Endorsements from HA experts and experience don’t matter.

I bought a set of headphones online in mid-April. As I suspected, I discovered that anyone can do bluetooth headphones. But, not everyone can do them well. Ergonomically, the “new to market” headphones are a nightmare. Pairing was a breeze, but accidental unpairing is a constant battle. The sound quality is amazing, but that amplifies my annoyance when the headphones randomly chirp – loudly and clearly – for system sounds or at the end of a song.

You may believe that high availability and application monitoring can be done by anyone and that experience doesn’t matter. However, consider your own experiences and mine and ask if you’d really want to trust your enterprise environment to a group that just started thinking about the complexities of hybrid environments, or the dependencies and application-centric knowledge needed for the applications you use most frequently.

When deciding on the right high availability software for your environment, consider carefully whether you want to go without the many best-in-class features, hardened and tested solutions, knowledgeable experts, broad swath of supported applications and environments, and industry-leading experience and decades of insight. Then, after careful consideration, choose wisely.

-Cassius Rhue, Vice President, Customer Experience, SIOS

Reproduced with permission from SIOS

Expand Your High Availability Metrics

September 20, 2020 by Jason Aw

In the technology field, we love data. We love data about data and all the metrics and measures that our tools can bring. We’ve created industries around analytics, products that capture every detail from thousands of connected devices. We love metrics and measures. In many instances within the higher availability space, we love the high availability metrics that tell us how quickly a system recovered from the failure. We calculate and track the time between detection and remediation, and we obsess over knowing and measuring how much transactional data would be lost in a disaster, system failure, or disk crash.

Ironically, in high availability and disaster recovery (HA/DR) systems, there are some metrics that don’t get enough attention.

Here are eight other high availability metrics you should be watching to manage your environment:

1. Security alerts

Availability isn’t just about application monitoring and recovery. Systems that are publicly available are always under attack. If you aren’t monitoring security alerts and warnings, your applications may be running flawlessly, while your intellectual property is being funneled flawlessly out the door.

2. Idle connections

Idle connections sound harmless, but they are about as harmless as the green leafy kudzu on a southern lawn. Idle connections take up resources and threaten to fill database pools, congest networks, and stifle performance. Furthermore, idle connections can indicate a problem in the application layer or database configuration.
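As a rough illustration of what watching idle connections can look like, the sketch below assumes you already have a list of connections with their idle times (most databases and proxies can report this in some form). The `Connection` type, the 300-second threshold, and the pool size are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Connection:
    conn_id: int
    idle_seconds: float  # time since the connection last did any work


def idle_report(connections: List[Connection], pool_size: int,
                idle_threshold_s: float = 300.0) -> Dict[str, object]:
    """List connections idle past the threshold and the share of the pool they occupy."""
    idle = [c for c in connections if c.idle_seconds >= idle_threshold_s]
    return {
        "idle_ids": [c.conn_id for c in idle],
        "idle_share_of_pool": len(idle) / pool_size,
    }


if __name__ == "__main__":
    sample = [Connection(1, 10), Connection(2, 400), Connection(3, 900)]
    print(idle_report(sample, pool_size=4))
```

Tracking the share of the pool over time, rather than a single snapshot, is what tells you whether the kudzu is spreading.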

3. Long-running queries, commands, or jobs

This applies not just to database queries or jobs, but also to commands and backups. Long-running queries, commands, and jobs can be an indicator of poor system health, slow disk speeds, CPU or other resource contention, or deeper systemic, application compatibility, or OS problems.
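One simple way to decide what counts as "long-running" is to compare each job against its own history. The sketch below flags anything that takes more than a chosen multiple of the median past duration. The factor of 3 and the job names are assumptions for illustration, not a recommended threshold.

```python
from statistics import median
from typing import Dict, List


def long_running(current_s: Dict[str, float], history_s: List[float],
                 factor: float = 3.0) -> List[str]:
    """Names of jobs whose current runtime exceeds factor x the median of past runtimes."""
    baseline = median(history_s)
    return sorted(name for name, secs in current_s.items() if secs > factor * baseline)


if __name__ == "__main__":
    past_runs = [10, 12, 11, 9, 10]          # seconds, hypothetical history
    running = {"backup": 95, "report": 12, "reindex": 31}
    print(long_running(running, past_runs))  # jobs worth a closer look
```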

4. Disk I/O

Disk I/O typically refers to the input/output operations of the system related to disk activity. Measuring disk I/O can help identify bottlenecks, poor hardware configurations, improperly sized disks, or poorly tuned disk layouts for a given workload. Monitoring disk I/O can help tell you whether long-running queries are a function of poor SQL syntax, poorly coded applications, or latency and access problems.

5. Memory

We all think about how much memory is being used, but memory monitoring goes beyond measuring free versus used. Monitoring memory helps you find bottlenecks and leaks, identify improperly sized systems, and understand load, load average, and spikes. In addition, knowing about memory-intensive patterns can help you tune your availability suite to avoid false failures.

6. Disk Space

As VP of Customer Experience I once had the unfortunate experience of waking up early in the morning for an emergency call. The customer was facing a down production system after a power outage. When they tried to restart their system, their protected applications failed to start. After a quick check of the error logs it was clear that the root drive was 100% full. The application could not write to any of the file systems. Disk space monitoring is available in many forms, and having it as a metric can prevent unnecessary problems and costly last-minute scrambles to add more space.
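A full root drive is one of the easier problems to see coming. As a minimal sketch, the check below uses Python's standard `shutil.disk_usage` call. The 80% and 95% thresholds and the status labels are assumptions to adjust for your environment.

```python
import shutil


def classify(pct_used: float, warn_pct: float = 80.0, crit_pct: float = 95.0) -> str:
    """Map a percent-used figure to a status label (thresholds are illustrative)."""
    if pct_used >= crit_pct:
        return "CRITICAL"
    if pct_used >= warn_pct:
        return "WARNING"
    return "OK"


def check_path(path: str = "/") -> str:
    """Check the file system that holds `path` and return its status."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify(100.0 * usage.used / usage.total)


if __name__ == "__main__":
    print("root file system:", check_path("/"))
```

Run on a schedule and wired to an alert, even a check this small would have flagged that root drive long before the power outage.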

7. Errors and alerts

Errors, alerts, and recovery messages in the logs are another good metric to consider. Your availability solution may be keeping your clients online and happy, but it may also be masking an issue that will need your attention soon. Adding log monitoring for FATAL, PANIC, and key ERROR messages can help you identify issues that your availability solution is frequently recovering from, such as database crashes, application panics or core dumps, or fatal errors requiring a cold restart.
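Log monitoring for these keywords does not have to start with a big tool. The sketch below counts FATAL, PANIC, and ERROR lines in whatever log lines you feed it. The sample lines are invented for the example, and real log formats vary, so treat the pattern as a starting point.

```python
import re
from collections import Counter
from typing import Iterable

# Match the severity keywords as whole words anywhere in a line.
SEVERITY = re.compile(r"\b(FATAL|PANIC|ERROR)\b")


def count_severities(lines: Iterable[str]) -> Counter:
    """Count how many lines mention each severity keyword."""
    counts: Counter = Counter()
    for line in lines:
        match = SEVERITY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample_log = [  # hypothetical log lines
        "2020-09-20 02:14:07 ERROR connection reset by peer",
        "2020-09-20 02:14:09 FATAL database terminated unexpectedly",
        "2020-09-20 02:15:00 INFO recovery completed",
        "2020-09-20 03:01:44 PANIC kernel oops in storage driver",
    ]
    print(dict(count_severities(sample_log)))
```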

8. Recovery numbers

Similar to monitoring errors and alerts, the recovery numbers can tell you a lot about the health of your system’s availability. If you are averaging more than one application recovery per week, you’re likely experiencing something more than your normal availability protection. And while the recovery was successful in restarting your application or system, too many of these false or even real recoveries aren’t healthy.
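The "more than one recovery per week" rule of thumb is easy to check once recoveries are recorded with dates. This sketch assumes you can export those dates from your availability tooling. The function name and the sample dates are hypothetical.

```python
from datetime import date
from typing import Iterable


def recoveries_per_week(recovery_dates: Iterable[date],
                        window_start: date, window_end: date) -> float:
    """Average recoveries per week over an inclusive date window."""
    days = (window_end - window_start).days + 1
    in_window = [d for d in recovery_dates if window_start <= d <= window_end]
    return len(in_window) / (days / 7)


if __name__ == "__main__":
    events = [date(2020, 9, d) for d in (2, 5, 11, 12, 20, 27)]  # hypothetical
    rate = recoveries_per_week(events, date(2020, 9, 1), date(2020, 9, 28))
    print(f"{rate:.1f} recoveries per week", "(investigate)" if rate > 1 else "(normal)")
```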

The list of HA/DR metrics that we can monitor and the tools to monitor them are growing by leaps and bounds. Be sure that you and your team consider expanding your current data capture and analysis to include those that make for the best higher availability system possible.

— Cassius Rhue, VP, Customer Experience

Reproduced with permission from SIOS

How to Deliver High Availability for SQL Server in Linux Environments

September 10, 2020 by Jason Aw

If your organization is running business-critical Microsoft SQL Server on Linux, your IT team undoubtedly knows how challenging continually maintaining high availability, performance and security can be. Particularly difficult is how to ensure high availability with robust replication and automatic failover. Using open-source software and an easily configured HA SANless cluster solution can offer a simpler maintenance approach without sacrificing the safety and performance your organization requires.

Limited High Availability Options for Linux

Most Linux distributions give IT departments two inferior choices for high availability: either pay more for the SQL Server Enterprise Edition to implement Always On Availability Groups, or struggle to make complex do-it-yourself HA Linux configurations work well—something that can be extraordinarily difficult to do.

The problem with using the Enterprise Edition is that it undermines the cost-saving strategy for using an open-source operating system on commodity hardware. For a limited number of small SQL Server applications, it might be possible to justify the additional cost. But it’s too expensive for many database applications and will do nothing to provide general-purpose HA for Linux.

Providing HA across all applications running in a Linux environment is possible using open-source software, such as Pacemaker and Corosync, or SUSE Linux Enterprise High Availability Extension. But getting the full software stack to work as desired requires creating (and testing) custom scripts for each application, and these scripts often need to be retested and updated after even minor changes are made to any of the software or hardware being used. Availability-related capabilities that are unsupported in both SQL Server Standard Edition and Linux can make this effort more challenging.

Finding an Alternative High Availability Solution for SQL Server in Linux

To make HA both cost-effective and easy to implement, you may want to consider two different, general-purpose approaches.

One is using storage-based systems that protect data by replicating it within a redundant and resilient storage area network (SAN). This approach is agnostic with respect to the host operating system, but it requires that the entire SAN infrastructure be acquired from a single vendor and relies on separate failover provisions to deliver high availability.

The other approach is host-based and involves creating a storage-agnostic SANless cluster across Linux server instances. As an HA overlay, these clusters are capable of operating across both the LAN and WAN in private, public and hybrid clouds. The overlay is also application-agnostic, enabling organizations to have a single, universal HA solution across all applications. While this approach does consume host resources, these are relatively inexpensive and easy to scale in a Linux environment.

Most HA SANless cluster options provide a combination of real-time block-level data replication, continuous application monitoring, and configurable failover/failback recovery policies to protect all business-critical applications, including those using Always On Failover Cluster Instances available in the Standard Edition of SQL Server.

SIOS Technology Corp. offers more robust HA SANless cluster solutions for Linux with advanced capabilities that are designed to free IT from the complexity and daily challenges of supporting and optimizing computing infrastructures. The SIOS Protection Suite solution with LifeKeeper provides:

Continuous monitoring of the entire Linux application stack
Complete Application-Aware Protection with its application recovery kits (ARK) for fast, safe recovery or failover of complex applications and databases
Wizard-driven setup for Linux clustering
Configuration flexibility, such as using a traditional shared-storage cluster or software to synchronize local storage in a SANless cluster configuration

For example, a SANless cluster can handle two concurrent failures. The basic operation is the same in the LAN and WAN, as well as across private, public, and hybrid clouds.

In a typical two-node cluster, server #1 is initially the primary and replicates data to server #2. When server #1 experiences a problem, a failover to server #2 is triggered automatically, and server #2 becomes the primary.

In this situation, the IT department would likely begin diagnosing and repairing whatever problem caused server #1 to fail. Once fixed, it could take over as the primary again, or server #2 could continue in that capacity, replicating data to server #1.
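The sequence just described can be sketched as a tiny state model. This is an illustration of the roles only, not how SIOS or any other clustering product is implemented. The class and method names are invented, and a real cluster also has to handle detection, data resynchronization, and split-brain protection.

```python
from typing import Optional


class TwoNodeCluster:
    """Toy model of primary/standby roles in a two-node cluster."""

    def __init__(self) -> None:
        self.primary = "server1"
        self.healthy = {"server1": True, "server2": True}

    def standby(self) -> str:
        return "server2" if self.primary == "server1" else "server1"

    def replication_target(self) -> Optional[str]:
        """The primary replicates to the standby only while the standby is healthy."""
        node = self.standby()
        return node if self.healthy[node] else None

    def report_failure(self, node: str) -> None:
        """Mark a node failed; fail over automatically if it was the primary."""
        self.healthy[node] = False
        if node == self.primary and self.healthy[self.standby()]:
            self.primary = self.standby()

    def repair(self, node: str) -> None:
        """Node is fixed and rejoins as standby; roles do not change by themselves."""
        self.healthy[node] = True

    def failback(self) -> None:
        """Operator-initiated switch back to the (healthy) standby node."""
        if self.healthy[self.standby()]:
            self.primary = self.standby()


if __name__ == "__main__":
    cluster = TwoNodeCluster()
    cluster.report_failure("server1")
    print("after failure:", cluster.primary, "replicates to", cluster.replication_target())
    cluster.repair("server1")
    print("after repair: ", cluster.primary, "replicates to", cluster.replication_target())
    cluster.failback()
    print("after failback:", cluster.primary, "replicates to", cluster.replication_target())
```

Note that repair and failback are separate steps in the model: whether server #1 reclaims the primary role after it is fixed is a policy choice, as the text above says.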

With most HA SANless clustering configurations, failovers are automatic, and both failovers and failbacks can be controlled by a browser-based console.

For further information about SIOS LifeKeeper and Protection Suite solutions, visit SIOS SAN and SANless High Availability Clusters for Cluster Server Environments.

Reproduced with permission from SIOS