
Fifty Ways to Improve Your High Availability

April 5, 2021 by Jason Aw

I love the start of another year. Well, most of it. I love the optimism, the mystery, the potential, and the hope that seems to usher its way into life as the calendar flips to another year. But there are some downsides to the turn of the calendar. Every year, the start of the New Year brings ‘____ ways to do _____’ lists. My inbox is always filled with “Twenty ways to lose weight.” “Ten ways to build your portfolio.” “Three tips for managing stress.” “Nineteen ways to use your new iPhone.” Lists for self-improvement, culture change, stress management, and weight loss abound for nearly every area of life and work, including “Thirteen ways to improve your home office.” But what about high availability? You only have so much time every week. So how do you make your HA solution more efficient and robust than ever? Where is your list? Here it is, fifty ways to make your high availability architecture and solution better:

Get more information from the cluster faster
Set up alerts for key monitoring metrics
Add analytics. Multiply your knowledge
Establish a succinct architecture from an authoritative perspective
Connect more resources. Link up with similar partners and other HA professionals
Hire a consultant who specializes in high availability
100x existing coverage. Expand what you protect
Centralize your log and management platforms
Remove busywork
Remove hacks and workarounds
Create solid repeatable solution architectures
Utilize your platforms: Public, private, hybrid or multi-cloud
Discover your gaps
Search for Single Points of Failure (SPOFs)
Refuse to implement incomplete solutions
Crowdsource ideas and enhancements
Go commercial and purpose built
Establish a clear strategy for each life cycle phase
Clarify decision making process
Document your processes
Document your operational playbook
Document your architecture
Plan staffing rotation
Plan maintenance
Perform regular maintenance (patches, updates, security fixes)
Define and refine on-boarding strategies
Clarify responsibility
Improve your lines of communication
Overcommunicate with stakeholders
Implement crisis resolution before a crisis
Upgrade your infrastructure
Upsize your VM: CPU, memory, and IOPS
Add redundancy at the zone or region level
Add data replication and disaster recovery
Go OS and Cloud agnostic
Get training for the team (cloud, OS, HA solution, etc.)
Keep training the team
Explore chaos testing
Imitate the best in class architectures
Be creative. Innovation expands what you can protect and automate.
Increase your automation
Tune your systems
Listen more
Implement strict change management
Deploy QA clusters. Test everything before updating/upgrading production
Conduct root cause analysis exercises on any failures
Address RCA and Closed Loop Corrective Action reports
Learn your lesson the first time. Reuse key learnings.
Declutter. Don’t run unnecessary services or applications on production clusters
Be persistent. Keep working at it.

So, what are the ideas and ways that you have learned to increase and improve your enterprise availability? Let us know!

-Cassius Rhue, VP, Customer Experience

Reproduced from SIOS

Seven Skills That Your Team Needs if You are Going with Open Source High Availability

March 31, 2021 by Jason Aw

In the realm of High Availability (HA), there are certain important skills your team needs if you decide to go the route of open source. Open source denotes software whose source code is freely available to use, modify, and redistribute.

Today, there are numerous commercial implementations of high availability clusters for many operating systems, provided by vendors like Microsoft and SIOS Technology Corp. These commercial solutions provide resource monitoring, dependency management, failover and cluster policies, and some form of management, prepackaged and priced. Alternatives to commercial implementations include several open source options that also give companies the opportunity to provide high availability for their enterprise.

As companies continue to look for optimizations, cost savings, and potential tighter control, a growing number of companies and customers are also considering moving to open source availability solutions.

Here are seven skills that your team may need for a move to Open Source HA:

1. Coding skills

In many cases the lack of pre-packaged and bundled support for enterprise applications means that your team will need to be able to develop solutions to protect components, fix issues with bundled components, or write application connectors to ensure application awareness is properly handled. Lots of people can write scripts, but your team will need to know how to create and adhere to sound development practices and standards. The basics of this include things such as:

Design and Architecture Requirements
Design Reviews
Code / Code Reviews and Unit Tests (preferably automated)

2. Knowledge of the technology environment

Many enterprise applications require integration with multiple systems in order to provide high availability that meets the Service Level Agreements (SLA) and Service Level Objectives (SLO). Your team will require deep application awareness and knowledge of the technology environment to build protection and solutions for this integration with multiple enterprise systems. You need people who know the ins and outs of the critical applications, the technology environment for those applications, networking, hardware, hypervisors, and an understanding of the environmental and application dependencies. You’ll also need team members who understand the architecture, features, and limitations of the set of HA technologies that you intend to use from the Open Source community. Consider how much of these areas your team knows and understands:

Data passing and node communication
Node failure
Application management
System recovery and restart
Logging and messages
Data resilience and protection

3. Business process knowledge

You need someone who understands your business requirements and the business process. Your team needs professionals who understand the enterprise’s business and the processes that drive it. Your team will need to know and understand how much budget is available to spend on developing the solution, how much risk the business is willing to take, and how to gather additional requirements that may be unspoken or unspecified.

The team will also need to know, or to hire someone who knows, how to convert those business requirements into software requirements and how to manage a process that brings a minimum viable high availability solution to fruition, one that meets the needs of the business, keeps up with the speed of the business, and fits within the processes of the business.

4. Experience with OS, Applications and Infrastructure

If you are looking to go all open, your team will need experience with operating systems, applications, and infrastructure. You’ll need to understand the various OS release cycles, including kernel versions for Linux and updates and hotfixes for Windows. You have applications in house that need to be supported, but you’ll also need to be diligent about understanding the application update cycles, their dependencies, and the intersection of the application and OS support matrices. If your environment is homogeneous, great. Otherwise, your team will need to know the differences between RHEL, RHEL derivatives, and SUSE. If you run both Linux and Windows, you’ll need to know the differences between those as well. You’ll also need to understand the difference that the infrastructure makes to the application and OS combination. AWS and Azure present differences for high availability that differ from those of GCP, on-premises environments, and other hypervisors.

5. Change management capabilities

Imagine that you have the development team to create the solution, with technical and business knowledge along with a firm grasp of the OS, infrastructure, and applications. But getting the scripts together is just the beginning. Your team will also need change management capabilities. How will your team keep track of the code changes and versions, packages, and package locations? How will your team manage the releases of updates and changes? Your team will need to be versed in source control tools such as Git, project management tools such as Jira, and release trains. You’ll need a team that understands how to make updates to code and deliver patches and fixes, all while avoiding unwanted impact.

6. Data analytics and troubleshooting experience

When you enter the space of delivering your own HA solution, your team will need analytics and troubleshooting experience. You’ll need resources who understand the intersection of application code, system messages, and application error logs and trace files. When a system crash occurs, you’ll have to dig deeper into the logs to troubleshoot and find the root cause, analyze the data to make recommendations, and be prepared to roll out changes (see #5 above). Don’t forget, your team will also need to know and understand what the data from these logs and trace files can tell you about the health of your environment even when there isn’t an error, failure, or system crash.

7. Connections (Dev, QA, Partners, Community)

Let’s be honest, your business isn’t about delivering high availability, but if you decide to dive into the realm of open source HA you are going to need more help than just the brilliance on your team. Key to getting that additional help will be understanding where to start and then making the right connections to community developers, persons who are experts on testing, HA and application partners, and the open source community. Open forums have been really helpful, but you’ll need to double check if the response times are compliant with your SLAs and SLOs.

Using Open Source solutions is an option that many companies choose to pursue because of a perception of flexibility, lower cost, and less risk. But, buyer beware, there may be hidden costs in the form of new skills and management, and hidden risks in the open source programs that any “roll your own HA solution” will depend on.

– Cassius Rhue, VP, Customer Experience

Reproduced from SIOS

Cloud Migration Best Practices for High Availability

March 25, 2021 by Jason Aw

In 2020 we saw more enterprises migrating more of their mission-critical applications, ERPs, and databases to the cloud. However, not all of these migrations have been smooth. I have personally witnessed cloud migration projects dramatically slowed and even stopped due to a lack of planning for application availability, the complexity of retrofitting ‘DIY High Availability’, misunderstanding of what a ‘lift and shift’ entails, and unexpected costs.

There are a number of best practices, cloud checklists, and other ways for organizations to prepare for the cloud. The following best practices should be factored into every migration strategy for high availability clustering for those who have either hit pause on their 2020 cloud migration, or plan to forge ahead in 2021.

Cloud Migration Best Practices

Gather the requirements

Many organizations moving to the cloud think of the cloud as their on-premises architecture simply relocated. This misunderstanding often leads to stalls and delays when on-premises assumptions about networking, storage, disk speeds, and system sizes collide with the cloud reality. A smoother transition to the cloud begins by gathering the real requirements for the infrastructure, governance and compliance, security, sizing, and related controls and resources.

Design and Document

In the design phase, the architecture of on-premises environments is mapped to the cloud environment that has been chosen for maximum availability, and thoroughly documented. In this phase, the architecture takes shape and you identify the strategy for IPs, load balancers, IOPS, and data availability. Teams need to look at how availability native to the cloud needs to be augmented with a robust application and infrastructure availability solution capable of automating the complexities of the cloud. At SIOS, our experts in AWS and Azure clustering and availability work with customers to swap on-premises NFS for AWS EFS, Azure ANF, or a standalone NFS cluster tier. Additionally, a key part of a successful implementation in this phase is documenting everything. Documentation is an often-neglected but essential element of migration success.

Plan for High Availability

Achieving high availability in the cloud requires understanding the requirements, creating the design, and documenting a plan that lays out a strategy for achieving those requirements. A basic plan should include staffing, staff training, deploying a QA system, testing, pre-production steps, deployment, post-deployment validation, and ongoing iterations. The best outcomes for cloud migration arise from a deliberate, planned process, not an ad hoc, break-fix approach.

Staff

How well is your team staffed for the cloud migration? Traditional help desk, client/server IT, or IT teams may not be enough for the cloud migration. If your team is new to the cloud, it may be time to consider adding more resources or professional services-based solutions. Migrating to the cloud can be taxing, tedious, and difficult without the proper insight, information, or training. Does your staff need to incorporate training related to the cloud environment? And while you are looking into training and professional services to assist your IT team, check with your vendor for training related to the availability solution. Many vendors provide flexible training for the HA solution and cloud training can be obtained with the cloud vendors or popular sites such as Udemy.

Deploy QA

The QA deployment phase is the phase in which the team executes the plans for deploying the actual systems into the cloud. Successful deployment teams validate their plans and strategy, understand the data migration process, uncover any missing dependencies, and prepare for the next step in the process, especially testing. When this step is skipped or skimped on, the once-promising migrations often stall or fail. When you reach the QA system deployment phase, your team will do the heavy lifting of the initial migration and configuration of the applications, databases, and critical data in the cloud.

Test Your High Availability

Testing in your QA environment is a critical step. These tests are not a waste of time; they are a time saver. Deploying environments in the cloud is often easier than deploying on-premises. Your QA environment can be scripted with tools like Ansible, deployed quickly as templates from the cloud marketplace or a cloned image, or deployed and built from CloudFormation templates. Once deployed, disaster scenarios can be ironed out and optimized before a disaster, not during one. Test scenarios can be leveraged to identify overprovisioning, underprovisioning, or bottlenecks with networking or disk speeds. A full test scenario can also be used as part of an onboarding strategy for new staff. Additionally, testing should be performed on snapshots and backups as well.
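One way to make a QA failover test measurable is to time how long the service is unreachable while you trigger a failure. The following is a minimal sketch, not a SIOS tool: `measure_outage` and its `probe` argument are hypothetical names, and `probe` stands in for whatever check fits your application (a TCP connect, an HTTP request, a database query).

```python
import time
from typing import Callable, Optional


def measure_outage(
    probe: Callable[[], bool],
    interval: float = 1.0,
    max_checks: int = 600,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `probe` and return the seconds between the first failed check
    and the next successful one. Returns None if no outage starts and
    ends within `max_checks` polls."""
    outage_start = None
    for _ in range(max_checks):
        ok = probe()
        now = clock()
        if not ok and outage_start is None:
            outage_start = now          # first failed check: outage begins
        elif ok and outage_start is not None:
            return now - outage_start   # service is back: outage ends
        sleep(interval)
    return None
```

The result is only as precise as the polling interval, so a one-second interval cannot distinguish a 2.1-second failover from a 2.9-second one. The `clock` and `sleep` parameters exist so the function can be exercised with fake timing before it is pointed at a real cluster.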

Deploy Production

When the testing phase completes and your team has validated the test results, the next phase is to move from QA to pre-production, and from pre-production to go-live. This is the last phase of the heavy lifting, involving final user acceptance testing, a final cutover and update of the production data, and then the cutover of the users.

Review, Revise, and Repeat

A successful migration does not end once you reach the go-live phase, but continues through the lifecycle phases. In the post go-live phase of the cloud migration strategy, your team continues to review, revise, and repeat the steps from ‘Gather’ through ‘Deploy Production’. In fact, your team should repeat this process again and again, based on requirements specific to releases, application updates, security updates, related system maintenance, operating system versions, disaster recovery planning, as well as the requirements from your high availability vendor’s own best practices. The cloud platform is always evolving and adding new features, functionality, and updates that can enhance your existing HA solution and architecture. Reviewing, revising, and repeating the process will be a necessary step in successful onboarding.

In 2021 we’ll see more enterprises migrating more mission-critical applications, ERPs, and databases to the cloud. A key factor in their success will be utilizing cloud migration best practices to avoid delays and failures throughout the process. Understanding your business requirements and needs, documenting the design and plan, deploying in a QA environment with purpose-built clustering solutions, and executing extensive testing before go-live will be essential. Contact SIOS Technology to understand how the SIOS Protection Suite can be included in your thoughtful cloud migration best practices.

-Cassius Rhue, VP, Customer Experience

Reproduced from SIOS

The New Normal Will Still Include High Availability

March 21, 2021 by Jason Aw

The Importance Of Uptime In A Post Pandemic World

As vaccines roll into production and roll out to facilities and communities, companies are beginning to prepare for reentry to normal. Many articles and writers, in both technical and non-technical spheres, are predicting that ‘normal’ in the post-pandemic era will look a lot different from the ‘normal’ we were used to in 2020. Experts vary, but generally agree that every business and type of industry will see a change in what was ‘normal’. This change will affect everything from academics to manufacturing plants to financial institutions to houses of worship. While the new normal will potentially look different than it did when we abruptly left these places in 2020, some things will still be a part of the new normal.

Four Reasons High Availability Will Still Be Included In The New Normal of 2021.

New database and application systems

Predictions abound that home delivery, home schooling, home entertainment, and even home gyms will be a booming part of the future. This boom will lead to new businesses. These new businesses will deploy cloud services and applications that need to be highly available to handle the additional online shopping, not to mention the growth in shipping and related services from manufacturing to accounting. For these new businesses and services, downtime is not an option that they will be willing to accept. Cloud availability SLAs – which only address infrastructure availability – will need to be supplemented with application-level HA. New databases and application systems will require 99.99% availability as an essential requirement.
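To put the 99.99% figure in concrete terms, the arithmetic can be sketched in a few lines. The function names are illustrative, the 99.9% application figure in the usage note is a made-up example rather than a number from any vendor's SLA, and multiplying the layers together assumes they fail independently.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget, in minutes, for an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def serial_availability(*layers_percent: float) -> float:
    """Combined availability (in percent) of layers that must all be up,
    for example infrastructure and application, assuming independent failures."""
    combined = 1.0
    for pct in layers_percent:
        combined *= pct / 100
    return combined * 100


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99):
        print(f"{target}% -> {allowed_downtime_minutes(target):.1f} minutes/year")
```

A 99.99% target leaves roughly 52.6 minutes of downtime per year. If the infrastructure delivers 99.99% but the application on top of it only manages a hypothetical 99.9%, `serial_availability(99.99, 99.9)` comes out near 99.89%, which is why an infrastructure SLA alone does not deliver application-level availability.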

Existing database and application systems

The predicted boom of everything “At Home” will definitely lead to a spike in new businesses with their new databases, applications and services. But, the rise in these new businesses and upstarts will not mean that existing companies will fold their tents and vacate the space. Instead, the boom of new competitors to the various “At Home” spaces will drive an even greater urgency for existing businesses to fortify their databases and applications. The businesses that exist in this space will need to expand to keep up with competition and growth. As they expand, maintaining high availability will be a key focus for their existing database and application systems as they transition to cloud, hybrid cloud, or a multitude of hosting solutions. These existing applications will neither abandon HA in the current locations, nor consider facing a disaster in any new permutation without high availability and disaster recovery.

IoT Management systems

New at-home businesses will spark growth requiring higher availability. The shipping industry will continue to expand and generate new systems requiring availability. In fact, whether the prediction of a continued work-at-home boom pans out or not, the IoT boom will almost certainly come to fruition. As more customers shop online and ship what was once hand delivered, they will require more assistance with their products. These eager and anxious customers will want more ways to track and get updates on the location of their packages. Additional checkpoints, check-in hubs, and IoT devices will probably be added as mainstays of the new normal. At the same time, the plans will need to include making sure the additional systems are reliable and available. That reliability is even more important for generating the currency of trust for the customer and data for the enterprise.

New Causes for Downtime

As the whole world looks forward to a return to “normal”, this return will generate new challenges and opportunities. Alongside these new technologies and expanded deployments of databases and applications, 2021 will bring new causes of downtime, old nemeses causing disasters, and other unexpected outages. Applications – even those with more robust monitoring and capabilities – will still experience the old nemesis of coding bugs that lead to crashes, or integration issues that lead to instability or hangs. Systems, cloud or on-premises, will still be susceptible to hardware faults, human faults, the occasional simple maintenance that isn’t so simple, and forces like mother nature or the new guy with elevated privileges and reduced knowledge. Ushered in with opportunity will be new disasters that require thoughtful approaches to highly available clusters, solutions, and services.

Yes, a lot will have changed for companies since things shut down in early 2020. However, the new normal that any business returns to will definitely require high availability.

Writer’s Note:

For many families, the post-pandemic world will be drastically different, with chairs that sit empty, beds that are no longer filled, and laughter and memories that ended in heart-rending pain. To all of these families, SIOS Technology Corp. extends our deepest and most heartfelt sympathies for your loss and our prayers for your comfort in your grief.

– Cassius Rhue, Vice President, Customer Experience

Reproduced from SIOS

How To Build A Highly Available Server Solution?

March 16, 2021 by Jason Aw

A key component to any high availability solution is figuring out how to redirect the client traffic. Almost every user-based application needs to connect to the server. Redirecting the client traffic will allow users to connect without having to know where the application or the database actually resides.

Most solutions recommend network-based IP redirection or network-based DNS redirection. This works. However, in our experience the best solution for a high availability server is a virtual IP address that can be switched from one server to another. The application listens for connections on the virtual IP address, which is hosted on one server today and can be switched to another server on another day.

To take it one step further, you can automate the failover. This is where the system makes decisions and switches the application when there is a failure detected. Bear in mind this step is key to building a highly available solution.
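The decision logic behind an automated failover can be sketched roughly as follows. This is an illustration under stated assumptions, not how SIOS or any other product implements it: `port_is_open`, `FailoverDetector`, and `vip_takeover_commands` are hypothetical names, the three-failure threshold is arbitrary, and the takeover step assumes a Linux standby where the address can be added with the `ip` command.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverDetector:
    """Signal a failover only after `threshold` consecutive failed checks,
    so that a single dropped probe does not move the virtual IP."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    def record(self, healthy: bool) -> bool:
        """Record one health-check result; return True when failover should start."""
        if healthy:
            self.failures = 0
        else:
            self.failures += 1
        return self.failures >= self.threshold


def vip_takeover_commands(vip_cidr: str, interface: str) -> list[list[str]]:
    """Build (but do not run) the command a Linux standby could use
    to bring the virtual IP up on one of its interfaces."""
    return [["ip", "addr", "add", vip_cidr, "dev", interface]]
```

In use, a loop on the standby would call `detector.record(port_is_open(primary_host, app_port))` every few seconds and act on the takeover commands once it returns True. A production setup also has to guard against both servers holding the virtual IP at once and has to restart the application on the new server, which is where much of the real effort goes.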

Benefits of Buy vs. Build High Availability Solution

Automated failover can be implemented with scripts and logic that check the status of processes and move virtual IP addresses from one server to another. But one of the challenges in a buy vs. build decision for a high availability solution is how much time we really have to spend on the build. This includes time for script coding and for API development, such as work with the CloudWatch API or Lambda functions. Let’s not forget testing and maintenance.

When I was younger, I was eager to write that code. But after working for large Fortune 100 companies, and getting yelled at by a high-level manager when one of my scripts didn’t work at 3 am, I feel differently. The problem was exacerbated when I discovered an issue in code I had written a year earlier. My managers wanted the highly available solution to work 100% of the time. If it didn’t work, it was time to call someone up and yell at them.

SIOS Automates High Availability

Isn’t it cheaper in the long run to buy the solution and spend a little time to tweak it to fit into our setting? This is where SIOS high availability (HA) solutions come in, whatever the application or database. SIOS has the code to switch the stack of the processes from one server to another. This gives users and managers the peace of mind that comes from automating the failover orchestration and high availability.

There are two things that I love about the SIOS HA umbrella. The first is the code for the virtual IP, where the IP address is added to the server and the application is restarted to listen for connections. The second is the application-agnostic API set that SIOS provides, which allows anyone to protect any application through plugins. Contact SIOS today to learn more about high availability solutions specific to your environment.

– Edmond Melkomian, PMP, MCSD, consultant, SIOS technology, Inc.

Reproduced from SIOS