SIOS SANless clusters

SIOS SANless clusters High-availability Machine Learning monitoring

Fifty Ways to Improve Your High Availability

April 5, 2021 by Jason Aw

I love the start of another year.  Well, most of it.  I love the optimism, the mystery, the potential, and the hope that seems to usher its way into life as the calendar flips to another year.  But there are some downsides to the turn of the calendar.  Every year, the start of the New Year brings ‘____ ways to do ____.’  My inbox is always filled with “Twenty ways to lose weight.”  “Ten ways to build your portfolio.”  “Three tips for managing stress.”  “Nineteen ways to use your new iPhone.”  Lists for self-improvement, culture change, stress management, and weight loss abound for nearly every area of life and work, including “Thirteen ways to improve your home office.”  But what about high availability?  You only have so much time every week. So how do you make your HA solution more efficient and robust than ever?  Where is your list?  Here it is, fifty ways to make your high availability architecture and solution better:

  1. Get more information from the cluster faster
  2. Set up alerts for key monitoring metrics
  3. Add analytics.  Multiply your knowledge
  4. Establish a succinct architecture from an authoritative perspective
  5. Connect more resources. Link up with similar partners and other HA professionals
  6. Hire a consultant who specializes in high availability
  7. 100x existing coverage. Expand what you protect
  8. Centralize your log and management platforms
  9. Remove busywork
  10. Remove hacks and workarounds
  11. Create solid repeatable solution architectures
  12. Utilize your platforms: Public, private, hybrid or multi-cloud
  13. Discover your gaps
  14. Search for Single Points of Failure (SPOFs)
  15. Refuse to implement incomplete solutions
  16. Crowdsource ideas and enhancements
  17. Go commercial and purpose built
  18. Establish a clear strategy for each life cycle phase
  19. Clarify your decision-making process
  20. Document your processes
  21. Document your operational playbook
  22. Document your architecture
  23. Plan staffing rotation
  24. Plan maintenance
  25. Perform regular maintenance (patches, updates, security fixes)
  26. Define and refine on-boarding strategies
  27. Clarify responsibility
  28. Improve your lines of communication
  29. Overcommunicate with stakeholders
  30. Implement crisis resolution before a crisis
  31. Upgrade your infrastructure
  32. Upsize your VM: CPU, memory, and IOPS
  33. Add redundancy at the zone or region level
  34. Add data replication and disaster recovery
  35. Go OS and Cloud agnostic
  36. Get training for the team (cloud, OS, HA solution, etc.)
  37. Keep training the team
  38. Explore chaos testing
  39. Imitate the best in class architectures
  40. Be creative.  Innovation expands what you can protect and automate.
  41. Increase your automation
  42. Tune your systems
  43. Listen more
  44. Implement strict change management
  45. Deploy QA clusters.  Test everything before updating/upgrading production
  46. Conduct root cause analysis exercises on any failures
  47. Address RCA and Closed Loop Corrective Action reports
  48. Learn your lesson the first time.  Reuse key learnings.
  49. Declutter.  Don’t run unnecessary services or applications on production clusters
  50. Be persistent.  Keep working at it.
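
Several of these items, such as searching for single points of failure (item 14) and adding redundancy at the zone or region level (item 33), can start as a simple inventory check. The sketch below is only an illustration under stated assumptions: the inventory format, the role names, and the `find_spofs` helper are hypothetical and are not part of any SIOS product.

```python
from collections import defaultdict

# Hypothetical inventory: (role, instance name, availability zone).
INVENTORY = [
    ("database", "db-1", "zone-a"),
    ("database", "db-2", "zone-b"),
    ("app-server", "app-1", "zone-a"),
    ("app-server", "app-2", "zone-a"),
    ("load-balancer", "lb-1", "zone-a"),
]


def find_spofs(inventory):
    """Return {role: reason} for roles lacking instance or zone redundancy."""
    instances = defaultdict(set)
    zones = defaultdict(set)
    for role, name, zone in inventory:
        instances[role].add(name)
        zones[role].add(zone)

    findings = {}
    for role in instances:
        if len(instances[role]) < 2:
            findings[role] = "only one instance"
        elif len(zones[role]) < 2:
            findings[role] = "all instances in one zone"
    return findings


if __name__ == "__main__":
    for role, reason in sorted(find_spofs(INVENTORY).items()):
        print(f"{role}: {reason}")
```

A check like this only covers what is listed in the inventory; networks, storage, power, and people can be single points of failure too.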

So, what are the ideas and ways that you have learned to increase and improve your enterprise availability? Let us know!

-Cassius Rhue, VP, Customer Experience

Reproduced from SIOS

Filed Under: Clustering Simplified Tagged With: Application availability, High Availability, high availability - SAP, SQL Server High Availability

Why Does High Availability Have To Be So Complicated?

March 1, 2021 by Jason Aw

Why Does love High Availability Have To Be So Complicated?

It’s the Hallmark movie season, I mean Christmas season, I mean Hallmark Christmas movie season… (don’t judge too harshly, I’m a father of six young ladies, a hopeless romantic, and married to an amazing spouse who enjoys a good holiday laugh and happy ending). If you are in the Hallmark movie season, you know that it is highly likely that you’ll hear the phrase, “Why is love so complicated?”  It will be spoken just after the heartbroken young person has developed feelings for a new love interest, and is ready to dance the night away in their arms, just as the old flame walks into the party.  If you aren’t into the Hallmark holiday romances, maybe it isn’t love that you are wondering about.  Perhaps you want to know: “Why does high availability have to be so complex?”

Ten Reasons That High Availability Is So ‘Gosh Darn’ Complicated:

  1. The speed of innovation

    Cloud computing, edge computing, hyper converged, multi-cloud, containers, and machine learning are changing the landscape of enterprise availability at a blistering pace.  By conservative estimates, AWS currently has over 175 services, and “provides a highly reliable, scalable, low-cost infrastructure platform in the cloud that powers hundreds of thousands of businesses in 190 countries around the world.” Choosing an HA solution that allows consistent management across all of these environments, with infrastructure and application awareness is an important way to reduce complexity.

  2. Randomness of disasters

    Someone once said, “make your solution disaster proof, and the universe will build a better disaster.”  Not only are we seeing innovations in the realm of technology, but also in the world of disasters. Resource starvation, cooling system disasters, natural disasters, power grid failures, and a host of new and random disasters often make it harder to insulate the entirety of your enterprise. Last year’s solutions will likely need updates to handle this year’s unprecedented outages. It’s important to work with a vendor that has focused on high availability for many years – who has firsthand experience with finding solutions to the randomness of disasters.

  3. Application complexity

    As technology moves ahead in the realm of virtualization and cloud computing, applications are following suit. As application vendors add new options to take advantage of the cloud, they are also adding complexity.  Your applications should be protected by solutions designed for higher availability and clustering in AWS, Azure, GCP or other environments.  Look for vendors who provide greater application awareness and an understanding of best practices, and who deliver availability solutions that take into account how the application was architected and can optimize the application’s orchestration in the cloud.

  4. Advances in threats

    The threats to your enterprise also impact your availability. Systems have always had to handle the attacks from intruders, hackers, and even the self-inflicted.  These attacks have become more sophisticated, and the solutions and methods to avoid being victimized often impact the layout, architecture, and software that is deployed within your organization. This software has to “play nice” with your availability solution and your applications. As VP of Customer Experience for SIOS Technology, I have seen how an overly aggressive virus scanner can impact your application and your availability solution.  Ensure you understand the impact of your security systems on your HA/DR environment and choose a HA solution that works with, not against your security goals.

  5. Regulatory requirements

    Data breaches impact the architecture for your application, hypervisor and environment, but so too do regulatory requirements.  Businesses that have become global now have to make sure they are compliant with data handling regulations in multiple countries.  This can impact what region your solutions can be deployed in, and how many zones you can use for redundancy.  Additionally, regulatory requirements can affect which teams can support your organization, which may in turn affect your choices for availability software and support.

  6. Shrinking windows

    In the world of 24/7 searches, shopping, gaming, banking, and research the windows are shrinking.  Queries must run faster and take less time.  Responses have to be quicker and have better data.  This means that the allowable downtime for your environment is shrinking faster than you previously imagined.  It also means that maintenance windows are tighter, packed, and have to be optimized and highly coordinated. Work with an HA vendor that can provide guidance on optimizing your cluster configuration for both application performance and fast recovery time.

  7. Increasing competitive pressure

    I grew up in a small town. The hardware store had one competitor. The grocery store had one competitor. The bookstore, antique shop, car dealership, rental office, and bank all had one competitor. Today, you have thousands upon thousands of competitors who want nothing more than to see your customers in their checkout carts. This competition impacts the complexity of your entire business. It weighs heavily on what can and cannot be done in maintenance windows, with upgrades, and at what speed you innovate. Environments that may have been refreshed once every five years have moved to the cloud where optimizations and advancements in processor speed and memory can be had in seconds or minutes. Systems that once had a single run book covering a simple list of applications now look closer to “War and Peace” and cover the growing number of processes, products, services and intelligence being added to increase profits while simultaneously working to reduce risks and downtime.

  8. High availability solution costs

    We all wish we had an unlimited budget, but the reality is that what you have available is often somewhere between a little and not enough.  Teams are often forced to balance consumption versus fixed cost, license costs for applications on the standby clusters, and associated costs for availability software.  Enterprise licenses often add a ‘tough to swallow’ price tag for a standby server in an availability environment.  Architecting an availability solution is never free, even if you are a hard core ‘DIY’ team.  DIY comes with additional costs in maintenance, management, source control, testing, deployment, version management and version control, patches, and patch management.  While your team of experts may be clearly up for the challenge, your business likely would prefer their highly valued talents be applied to creating more revenue opportunities.

  9. Business growth

    Growth of your business due to innovation means that your teams are now responsible for more critical applications, more sites, more offices, and more data that needs to be accessible and highly available. As your business grows and thrives the challenges that come with scaling up and scaling out add to the complexities mentioned previously, but also just expand what you have to prepare and plan for.

  10. Team turnover

    The complexity of the environments, speed of innovation, growth of your business, advances in the application tier, and growth in the competitive landscape bring with them the challenge of retaining top talent to keep your infrastructure running smoothly.  Most companies understand that availability is a merger of people, process, product, and architecture, among other things. So finding ways to reduce the complexity of clustering environments, through automated configuration, documented run books, and products with consistent HA strategies across the infrastructure, is key both to retaining the talent that installs and manages your infrastructure and to reducing the risks and heavy lifting for those responsible for the key components of availability.

Let’s face it, love takes hard work, good communication, time, investment, skill and determination. There are no shortcuts to a successful relationship.  The same can be said about achieving the best outcomes in an ever emerging, increasingly complex, and fluid technology space within your enterprise.  Availability, clustering, disaster recovery and uptime are so ‘gosh darn’ hard because they require a serious, dedicated, non-stop, top-to-bottom cultural shift that accounts for the speed of innovation, the complexity of applications and orchestration, competition and growth, and the other components of keeping applications, databases, and critical infrastructure available to those who need them, when they need them.

-Cassius Rhue, Vice President, Customer Experience

Reproduced from SIOS

Filed Under: Clustering Simplified Tagged With: Application availability, disaster recovery, High Availability, high availability - SAP, SQL Server High Availability

How to Understand & Respond to Availability Alerts

January 29, 2021 by Jason Aw

Houston We Have a Problem (or How to Understand & Respond to Availability Alerts)

A Successful Failure

Houston, we have a problem!  It is an iconic line that reminds countless space buffs and movie fans of the great difficulty, potential disaster, and perilous state of the Apollo 13 space mission – a mission NASA now calls “A Successful Failure.”  Ignoring your own application availability alerts may not go down in history as a defining moment, but it can wreak similar havoc.

Now back to 1970:

“A routine stir of an oxygen tank ignited damaged wire insulation inside it, causing an explosion that vented the contents of both of the Service Module’s (SM) oxygen tanks to space. Without oxygen, needed for breathing and for generating electric power, the SM’s propulsion and life support systems could not operate. The Command Module’s (CM) systems had to be shut down to conserve its remaining resources for reentry, forcing the crew to transfer to the Lunar Module (LM) as a lifeboat. With the lunar landing canceled, mission controllers worked to bring the crew home alive.”

An explosion of oxygen tanks triggered alarms, warnings, pressure and voltage drops, interrupted communications, and then the now famous radio communication between the astronauts and Mission Control.  But what if, after the explosion, the crew did nothing? What if they never checked on the explosion, never responded to the warnings and gauges, and never informed Mission Control of there being an issue?  What if Mission Control, after being notified or alerted back at their dashboard in the control center, never attempted to provide any assistance?  What if the team buried their heads in the sand, or resigned themselves to fate and chance, never tried to learn, improvise, or improve from the failure they encountered?  The result would have been tragic!  It may have made it to a documentary, but hardly a blockbuster movie featuring an iconic line.

What Do You Do When an Alert is Triggered in Your Environment?

Space walks are a far cry from our own day to day activities, unless of course you work for NASA, but recent blogs on Apollo 13 do spark a question applicable to availability.  What do you do when there is an alert triggered in your environment? Do you just ignore it?  Do you downplay it, waiting to see if the alerts, log messages, or other indicators will just go away?  Do you contact your vendor support to understand how you can disable these alerts, warnings, and messages?  Or do you say, “We have a problem here and we need to work it out”?

As VP of Customer Experience at SIOS Technology Corp., I have seen both sides of alerts and indicators.  We have painstakingly walked with customers who chose to ignore warnings, turning off critical alerts that indicated issues ranging from application thresholds to network instability to potential data inconsistency.  And we have also seen customers who tuned into their alerts, investigated why their alarms were going off, uncovered the root cause and enjoyed the fruit of their labor.  This fruit is most often the sweet reward of improved stability, innovation and learning, or an averted disaster.

4 things you can do when your availability product triggers an alert

1. Determine the type and criticality of the availability alert.

Is the alert or error indicative of a warning, an error, or a critical issue? A good place to assist you and your team with understanding criticality is to consult with available documentation. Check the product documentation, online forums, knowledge base articles (KBA), and internal team data and process manuals.

2. Assess the immediacy of the alert. 

For warnings and errors, how likely are they to progress into a critical issue or event?  For critical issues and alerts, the immediacy may be obvious, but an assessment, even of critical events, will provide some guidance on your next steps: self-correction, issue isolation, or immediate escalation.

3. Consult additional sources. 

What other sources can you access to make a determination about the alert condition? For example, if the alert is storage related, are there other tools that can expose the health of your storage?  If the issue is a network alert, are there hypervisor tools, traffic tools, NIC statistics, or other specialized monitoring tools deployed to help with analysis?

4. Contact support.

In other words, if you are unsure, alert Mission Control. After determining the type, assessing the immediacy, and consulting additional sources, it is a good idea to contact your vendor for support.  A warning about a threshold for API calls may seem innocent. But if the API calls will fail once such a limit is reached, this could be cause for immediate action. Getting the authority of the specialist can be helpful in keeping peace of mind and avoiding disaster.
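
The four steps above can be sketched as a small decision function. This is a minimal illustration, not a vendor procedure: the `Alert` fields, the `Severity` levels, and the `next_step` helper are hypothetical names chosen to mirror the steps in the text, and the thresholds a real team uses would come from its own documentation and support contacts.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


@dataclass
class Alert:
    source: str                     # e.g. "storage" or "network"
    severity: Severity              # step 1: type and criticality
    likely_to_escalate: bool        # step 2: immediacy
    confirmed_by_other_tools: bool  # step 3: additional sources agree


def next_step(alert: Alert) -> str:
    """Map an assessed alert to one of the next steps named in the text."""
    if alert.severity is Severity.CRITICAL:
        return "escalate immediately"
    if alert.likely_to_escalate or alert.confirmed_by_other_tools:
        # Step 4: when the alert is trending worse or corroborated, involve support.
        return "isolate the issue and contact support"
    return "self-correct and keep monitoring"


if __name__ == "__main__":
    api_limit = Alert("api", Severity.WARNING, likely_to_escalate=True,
                      confirmed_by_other_tools=False)
    print(next_step(api_limit))
```

The API threshold warning from step 4 is modeled here as a warning that is likely to escalate, which is why it is routed to support rather than left alone.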

An experienced vendor like SIOS can help you quickly identify the causes of problems and recommend the best solution.

Repeatedly ignoring problems in your availability environment can lead to unexpected, but no less devastating, results. Addressing the problems indicated by alerts, log messages, warning indicators, or other installed and configured indicators gives your customers, your business, your teams, and yourself the “opportunity to solve the problems” before they become a disaster, and at the same time strengthens your availability strategy and infrastructure.  Which will you choose?

–  Cassius Rhue, VP, Customer Experience

Reproduced from SIOS

Filed Under: Clustering Simplified Tagged With: Application availability, application monitoring, disaster recovery, High Availability, high availability - SAP, SQL Server High Availability

5 Signs That It Will Take More Than A Blog Post To Fix Your High Availability

December 8, 2020 by Jason Aw

The signs are there. The warning lights are flashing.  In your gut, you can sense it. Maybe you can’t sleep.  Your problems with high availability are deep. But, maybe you are not quite sure.

1. If you think your cloud SLA is all you need for high availability

Cloud solutions have provided great advancements in increased hardware availability and resilience. However, application high availability requires more than just selecting the right hypervisor or cloud provider. Your strategy for high availability cannot stop with the SLA provided by the cloud or a virtualization provider. As quoted by Wired, “The almost four-day Amazon outage of April 2011 did not breach Amazon’s EC2 SLA, which as a FAQ explains, ‘guarantees 99.95% availability of the service within a Region over a trailing 365 period.’” In a DZone article, our own David Bermingham breaks down the differences between cloud SLAs and application availability in detail. If you want a highly available infrastructure, it must include monitoring, recovery, and resilience at the data and application layers as well.
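
To put the gap between an infrastructure SLA and application availability into numbers, here is a rough sketch. Only the 99.95% figure comes from the quote above; the application and database availability values are hypothetical, and the multiplication assumes the layers fail independently and that all of them must be up for the service to work.

```python
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability: float) -> float:
    """Hours of downtime per 365-day year allowed by an availability fraction."""
    return (1.0 - availability) * HOURS_PER_YEAR


def composite_availability(*layers: float) -> float:
    """Availability of a serial stack, assuming independent layer failures."""
    result = 1.0
    for layer in layers:
        result *= layer
    return result


if __name__ == "__main__":
    infrastructure = 0.9995  # the SLA figure quoted in the text
    application = 0.995      # hypothetical, unprotected application tier
    database = 0.999         # hypothetical database tier

    stack = composite_availability(infrastructure, application, database)
    print(f"Infrastructure only: {downtime_hours_per_year(infrastructure):.2f} h/year")
    print(f"Whole stack:         {downtime_hours_per_year(stack):.2f} h/year")
```

Under these assumptions the stack is always less available than its weakest layer, which is the arithmetic behind protecting the data and application layers and not only the infrastructure.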

2. If you are just using the high availability clustering that came with your open source operating system

Chances are you didn’t select your database based on what was bundled with the OS, so why would you select your HA solution based on that criterion alone? Bundled tools go a long way in providing extra assurance, possibilities, and capabilities. However, despite the ease of access, bundled tools and OS clustering software are not always capable of meeting your SLA, RPO, RTO, and availability requirements. If your enterprise has a combination of operating systems, your team will likely need help navigating different tools and understanding how they integrate. It’s kind of like choosing the hedge clippers and push reel mower left on the curb to shape “Azalea” on the 13th hole par 5 (at Augusta). Both tools are designed to cut, but how much time do you have? How are you going to handle the complexity? Which would you trust? Your strategy for high availability requires more than just considering the conveniences of what is bundled with the OS; otherwise, you’d be running MySQL instead of SAP HANA.

3. If you think that enterprise application licensing, such as SQL Enterprise or Oracle Enterprise, is the same thing as enterprise high availability

In addition to increased cost, many enterprise application licenses also increase the ability of the application to recover in some high availability scenarios. However, it is highly unlikely that your entire enterprise is based on a single application. Your high availability is going to require more than just a highly available database solution. You’ll need an enterprise grade application monitoring and recovery solution with a breadth of support for all of your applications and databases. In addition, you’ll need the ability to manage and replicate not just database data, but critical application and configuration data as well. Availability for a single database or a simple application is one thing – but HA for a complex, multipart application and supporting database is very different. More services, more parts that need to be coordinated, more complex architecture to orchestrate, more specific best practices to adhere to before, during and after failover/switchover. More than what your enterprise license paid for.

4. If your downtime is growing and your uptime is shrinking

The pace of life is ever increasing in many fields. When was the last time your team recovered from backup, manually restarted the applications that were deemed critical, or restarted a set of failed virtual machines or nodes? The pace of your outage events cannot continue to outpace sustainability, or your team’s ability to move beyond firefighting to fire prevention and fire proofing. “You can only run so hard so long (Carey Nieuwhof).” For some of you, you’ve been firefighting for too long, and your outages are becoming more common than your up-time.

5. If your first failover test was on the production server

A recent client remarked that it is simply impossible to test for every possible disaster scenario. As new software is created, deployed, updated, and patched the challenges in higher availability are increasing. But, your live, production data is not the place to find out what does not play well together. And while Go-Live and Post-Go-Live will always have their share of surprises, the inability to actually failover and run on the backup node should not be one of them.

Scouring blogs can provide you with helpful tips and insights to define, redefine, and improve your higher availability. But, if the warning signs are going off that you’ve traded true availability for some semblance of ‘just enough’, then it will take more than a blog post, or scouring every blog post in the availability world for that matter, to fix your HA.

– Cassius Rhue, Vice President, Customer Experience

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: Amazon AWS, Application availability, application monitoring, High Availability, high availability - SAP, SQL Server High Availability

9 Signs You Have an Application Availability Problem

November 27, 2020 by Jason Aw

You’ve heard the saying “recognizing a problem is the first step in solving it.”  But, many small, medium, and surprisingly, even large enterprise businesses aren’t aware that their application availability isn’t what it should be.

Read on for these nine signs that you still have an application availability problem:

1. You spend more time restarting an application than using it

Application crashes may be a fact of life, but if your application is down more often than it is up, that is a problem.

2. You’ve started to snooze through the alert storm in your inbox or control center

You have deployed alerts for application or server downtime, but the alert storm has so overwhelmed your inbox that you have silenced them all.

3. You have one data center for all your critical operations

A single data center for operations may sound convenient, but one well intended but misdirected construction crew has been known to turn single data centers into costly unavailability zones.

4. Your idea of data protection involves backup retrieval and archives

Your data protection strategy is critical. Data replication technology and site-to-site or region-to-region replication have become a mainstay, so if your replication or data protection strategy is non-existent or involves a lengthy jog to the vault, this could be a big problem.

5. Your recovery procedures always require manual intervention

Manual intervention itself is not a problem. Some events are so difficult and complex that some amount of manual effort could be required.  But, if manual intervention is always the first, second and third order of business after a server or application outage, that is a problem.

6. Your RTO is measured in days not hours or minutes

How are you measuring your recovery time objective (RTO)? Do you measure your RTO in days or hours instead of minutes?  True, every business has a tolerance level for its RTO.  However, your RTO should not be a function of server rebuilds and gross instabilities in your architecture.

7. You don’t know your RPO because your standby is never reliably in sync

You’ve checked the box on reliable monitoring and recovery of your application, and taken it a step further to provide a standby cluster ready system.  Great job.  But, before I let you off the hook, what is your recovery point objective (RPO)? An RPO should be something more accurate than “somewhere between day 0 and last night.”
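
One way to make signs 6 and 7 measurable is to compute actual recovery time and the data-loss window from incident timestamps and compare them with your objectives. The sketch below is an assumption-laden illustration: the function names and the timestamps are hypothetical, and how you obtain the time of the last replicated write depends on your replication product.

```python
from datetime import datetime, timedelta


def recovery_time(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Actual recovery time for one incident, to compare against the RTO."""
    return service_restored - outage_start


def data_loss_window(last_replicated_write: datetime, outage_start: datetime) -> timedelta:
    """Span of writes not yet on the standby at failure time, to compare against the RPO."""
    return max(outage_start - last_replicated_write, timedelta(0))


def meets_objective(actual: timedelta, objective: timedelta) -> bool:
    """True when the measured value is within the stated objective."""
    return actual <= objective


if __name__ == "__main__":
    # Hypothetical incident timeline.
    outage_start = datetime(2020, 11, 27, 2, 0)
    service_restored = datetime(2020, 11, 27, 2, 45)
    last_replicated_write = datetime(2020, 11, 27, 1, 58)

    rto = timedelta(hours=1)    # hypothetical objective
    rpo = timedelta(minutes=1)  # hypothetical objective

    print("RTO met:", meets_objective(recovery_time(outage_start, service_restored), rto))
    print("RPO met:", meets_objective(data_loss_window(last_replicated_write, outage_start), rpo))
```

If the last replicated write cannot be determined at all, that is the situation sign 7 describes: the RPO is effectively unknown.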

8. Single points of failure don’t just exist, they are the norm

Where are your single points of failure?  Your budget may not allow you to eliminate every single point of failure, but if you can identify a single point of failure in every major category and every critical component of your enterprise…

9. Your last disaster made local, regional, or national news 

If the last major storm, grid failure, or failure event put a blight on your business due to downtime, then higher availability is the next order of business.

Downtime costs your business in terms of customers, productivity, and peace of mind.  Unaddressed risks have a definite impact on your business and reputation.  If these warning signs are there, you may have an availability problem.  And if you ignore them, you’ll likely have even bigger problems soon thereafter, hence the importance of application availability.

— Cassius Rhue, VP, Customer Experience

Reproduced with permission from SIOS

Filed Under: Clustering Simplified Tagged With: Amazon AWS, Application availability, application monitoring, High Availability, high availability - SAP, SQL Server High Availability

