Glossary of Terms: Application Performance Management (APM)
Definition: The software and processes IT professionals use to monitor and manage the performance and availability of software applications.
Reproduced from SIOS
Definition: Tools designed to ensure software applications perform as expected. IT professionals use APM tools to ensure their end-users get the quality of service they expect from important business applications. In virtual environments, application monitoring tools help administrators ensure that application servers operate within the parameters of their service-level agreements (SLAs).
Reproduced from SIOS
Author Carey Nieuwhof hooked me with a blog post on the biggest trap of 2021. While not directly speaking to HA, the topic alone made me reflect on some of the trends of 2020. Cloud innovations are numerous and begin at the most fundamental levels of the infrastructure, not to mention advances in AI, machine learning, compute capacity and algorithms, memory management and sharing, and a battery of others. All of these advances add up to making the current generation of cloud the most robust, reliable, and available data center yet. These centers, optimized with redundant power, cooling, a legion of IoT devices for monitoring and alerting, redundant networking, high-speed interconnects, massive servers, storage, and disks, are impressive, and quite possibly the biggest trap looming in 2021.
The biggest trap of 2021 will be believing that cloud availability alone is the same as, or enough for, higher availability. This is a complex trap to dissect. The list of advances that make up the backbone of many data centers is indeed vast and impressive, and it is only a fraction of the technological innovations driving the cloud. So what makes this massively redundant, high-capacity, AI-driven infrastructure a trap? Namely, that hardware and infrastructure availability still leave your enterprise at risk.
Disks have gotten faster and more intelligent. Eye-popping advances in chipsets, access technology, manufacturing, storage capacity, and RAID technology mean that cloud vendors are able to post gaudy numbers for speed, access, and redundancy. This reduces the risk of single points of failure (SPOFs) in the disk infrastructure and provides confidence that a single disk failure, or even a momentary loss of power to the disks, will not cause a loss of availability.
The storage arrays and enclosures housed within the data center providing access to those disks have also greatly improved. No longer a big eyesore of blinking lights and airboat-sized fans, these units are compact but loaded with capacity and performance enhancements. You’ll be hard pressed to find a modern chassis that isn’t built with redundant power and redundant disk capabilities, and that can’t provide near-zero replication lag between connected storage units, even units dispersed at greater distances. In addition, these units have added the benefits of AI to predict failures, proactively resolve problems, and optimize workloads to reduce performance bottlenecks.
Remember when big-name manufacturers and tech prognosticators were predicting game-changing technology that would reshape the landscape of the future? It seems like decades ago that people were forecasting server advances such as reduced footprints, faster and more complex chipsets, NVMe, battery efficiencies, cooling advances, storage advancements, in-memory and persistent memory, GPUs, and bare metal provisioning. That future has arrived and been surpassed. Servers are now accelerating the pace of cloud computing capabilities and increasing the ability of the cloud to promote redundancy, reliability, and robustness.
Advances in networking solutions, tools, software, and equipment also make the list of things that made cloud availability stronger in 2020. Over the last few years, vendors have released solutions that have expanded the speed, possible topologies, capacities, and distance capabilities of inter- and intra-cloud networks. Like so many other technologies, vendors are automating traffic flow and patterns using AI and machine learning, and taking advantage of advances in manufacturing to build in device redundancy that can be leveraged for availability and reliability.
Applications are still a vulnerable part of the cloud architecture when left unprotected. Applications that are not protected by an application-aware higher availability module or framework, such as a SIOS Application Recovery Kit (ARK), run the risk of being down at the most critical moment in your business lifecycle. A SIOS ARK provides the application in the cloud with critical application-aware monitoring and recovery, as well as failover and disaster recovery orchestration in the event of a failure.
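As a rough illustration of the concept (a minimal sketch, not the SIOS ARK itself), the following Python example shows what application-aware monitoring adds over a simple process check: it asks the application whether it can do real work, attempts local recovery first, and escalates to failover orchestration only when local recovery is exhausted. PostgreSQL is assumed as the protected application here, and the service name, check command, and thresholds are illustrative assumptions.

```python
import subprocess
import time

SERVICE = "postgresql"   # hypothetical protected application
CHECK_INTERVAL = 30      # seconds between health checks
MAX_RESTARTS = 3         # local recovery attempts before escalating

def service_healthy() -> bool:
    """Application-aware check: ask the service itself, not just the OS."""
    # pg_isready verifies PostgreSQL is accepting connections, a deeper
    # test than merely confirming the process exists.
    result = subprocess.run(["pg_isready", "-q"], check=False)
    return result.returncode == 0

def restart_service() -> None:
    """Local recovery: restart the service in place."""
    subprocess.run(["systemctl", "restart", SERVICE], check=False)

def trigger_failover() -> None:
    """Placeholder: hand control to the cluster's failover orchestration."""
    print("Local recovery exhausted; initiating failover to standby node.")

def monitor() -> None:
    failures = 0
    while True:
        if service_healthy():
            failures = 0
        else:
            failures += 1
            if failures <= MAX_RESTARTS:
                restart_service()   # try local recovery first
            else:
                trigger_failover()  # then orchestrated failover
                return
        time.sleep(CHECK_INTERVAL)

if __name__ == "__main__":
    monitor()
```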
While many databases have become more robust, and some even offer built-in replication enhancements, these databases are still a risk on their own. Databases with replication technology still need orchestration, automation, and the intelligence to make sure that they are highly available to the application components that need them. What good is it if your database continues to hum along in your primary Region and Availability Zone while your application has actually failed over to a different Region or DR site? Supplement databases that have replication, such as the SAP HANA database, with the automation and best practices of the SIOS Technology Corp HANA ARK and the SAP-certified SAP S/4 HANA ARK. Protect databases that do not have replication technology, or whose technology is limited, with the combination of the SIOS Protection Suite, SIOS DataKeeper for Linux, and the associated ARK.
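To make that risk concrete, here is a minimal, hedged sketch of the kind of alignment check an orchestration layer performs: it verifies that the database primary is actually reachable from the node where the application is running, rather than trusting that replication is healthy somewhere. The hostnames are hypothetical, and 30015 is the SQL port of an SAP HANA instance numbered 00; adjust both for your environment.

```python
import socket

# Hypothetical endpoints: replace with your own application and database hosts.
DB_PRIMARY_HOST = "db-primary.example.com"
DB_PORT = 30015   # SQL port for a hypothetical SAP HANA instance 00

def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """TCP-level check that a host accepts connections on a given port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # The point of orchestration: the database being "up" somewhere is not
    # enough; it must be up where the application actually runs.
    if reachable(DB_PRIMARY_HOST, DB_PORT):
        print("Database primary is reachable from this node.")
    else:
        print("Database primary unreachable: replication alone has not "
              "kept the database aligned with the application tier.")
```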
In the realm of disks and storage, it can be intriguing to believe that capacity plus software and hardware RAID redundancy mean you are highly available. However, storage is only available if it is accessible to the applications and virtual machines that need it. What technology do you have deployed to monitor and recover mounted cloud shares and volumes such as EFS and ANF? Unplanned downtime and its associated chaos can be as near as an unintended unmount or offline operation by a well-intentioned user.
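A watchdog for mounted shares can be quite small. The sketch below, using a hypothetical mount point, checks whether a path is still a mount point and tries to remount it from the entry already defined in /etc/fstab before raising an alert; a production solution would add escalation and failover logic, and would need root privileges to remount.

```python
import os
import subprocess
import time

# Hypothetical mount point for a cloud file share such as EFS or ANF.
MOUNT_POINT = "/mnt/shared-data"
CHECK_INTERVAL = 15  # seconds between checks

def remount(path: str) -> bool:
    """Attempt to remount using the entry already defined in /etc/fstab."""
    result = subprocess.run(["mount", path], check=False)
    return result.returncode == 0

if __name__ == "__main__":
    while True:
        if not os.path.ismount(MOUNT_POINT):
            # The quiet failure described above: the storage itself is
            # healthy, but the application can no longer see it.
            if remount(MOUNT_POINT):
                print(f"Recovered: {MOUNT_POINT} remounted.")
            else:
                print(f"ALERT: {MOUNT_POINT} is offline and remount failed.")
        time.sleep(CHECK_INTERVAL)
```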
Hypervisor technology has made your virtual machine push-button easy. Integrated cloud solutions promise to monitor whether the VM is available and provide options such as restart or migrate. These solutions are not enough to cover issues with your virtual machine that may stall, delay, or degrade your availability. In addition to what your cloud vendor provides, you need a monitoring and availability solution that understands how to monitor VM health at the application level.
A VM that runs without the ability to process application requests may escape the eye of your cloud-only monitoring, but it shouldn’t escape the watchful monitoring of your higher availability solution.
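As a sketch of the difference, the probe below tests the application layer inside the VM rather than the hypervisor’s view of it: a hung guest may still report as “running” in the cloud console while failing this check. The health endpoint URL is an assumption for the example.

```python
import urllib.request
import urllib.error

# Hypothetical health endpoint served by the application inside the VM.
HEALTH_URL = "http://10.0.0.5:8080/health"
TIMEOUT = 5.0  # a stalled VM often answers pings but not real requests

def vm_serving_requests() -> bool:
    """Probe the application layer, not just the hypervisor's VM status."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=TIMEOUT) as resp:
            return resp.status == 200
    except (urllib.error.URLError, TimeoutError):
        return False

if __name__ == "__main__":
    if vm_serving_requests():
        print("VM is processing application requests.")
    else:
        print("VM may show as running in the cloud console but is not "
              "serving requests: escalate to the availability solution.")
```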
Let’s get real for a moment. All the advances in data center availability, redundancy, and reliability do not negate the need to eliminate your data center as a single point of failure (SPOF). As VP of Customer Experience, I have worked with a customer who deployed best-in-class redundancy within their private cloud data center, much like the major public cloud vendors. If not for the high availability and data replication solution provided by SIOS Technology Corp, this customer would have experienced major downtime when a tropical storm ripped through their area, taking out power, backup generators, cooling, and networking.
With SIOS Technology, however, the customer was able to preemptively fail over ahead of the storm to a data center farther inland. Cooling failures, construction mishaps, and human and natural disasters are continual reminders that a single data center isn’t the same as higher availability.
Don’t fall into the biggest trap of 2021. Make sure you have true high availability; don’t assume the cloud alone has you covered.
– Cassius Rhue, VP, Customer Experience
Reproduced from SIOS
I love the start of another year. Well, most of it. I love the optimism, the mystery, the potential, and the hope that seems to usher its way into life as the calendar flips to another year. But there are some downsides to the turn of the calendar. Every year, the start of the New Year brings “____ ways to do _____” lists. My inbox is always filled with “Twenty ways to lose weight,” “Ten ways to build your portfolio,” “Three tips for managing stress,” “Nineteen ways to use your new iPhone.” The onslaught of lists for self-improvement, culture change, stress management, and weight loss abounds, for nearly every area of life and work, including “Thirteen ways to improve your home office.” But what about high availability? You only have so much time every week. So how do you make your HA solution more efficient and robust than ever? Where is your list? Here it is, fifty ways to make your high availability architecture and solution better:
So, what are the ideas and ways that you have learned to increase and improve your enterprise availability? Let us know!
– Cassius Rhue, VP, Customer Experience
Reproduced from SIOS
In the realm of High Availability (HA), there are certain important skills your team needs if you decide to go the route of open source. Open source, by definition, denotes software whose source code is freely available to use and modify.
Today, there are numerous commercial implementations of high availability clusters for many operating systems, provided by vendors like Microsoft and SIOS Technology Corp. These commercial solutions provide resource monitoring, dependency management, failover and cluster policies, and some form of management, prepackaged and priced. An alternative to commercial implementations is the set of open source options that also give companies the opportunity to provide high availability for their enterprise.
As companies continue to look for optimizations, cost savings, and potentially tighter control, a growing number of companies and customers are also considering a move to open source availability solutions.
In many cases, the lack of prepackaged and bundled support for enterprise applications means that your team will need to be able to develop solutions to protect components, fix issues with bundled components, or write application connectors to ensure application awareness is properly handled. Lots of people can write scripts, but your team will need to know how to create and adhere to sound development practices and standards, starting with basics such as version control, code review, testing, and consistent status-reporting conventions for monitoring and recovery scripts (one such convention is sketched below).
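One concrete example of such a standard: clusters built on Pacemaker expect resource agents to report status through OCF exit codes, and home-grown scripts that blur the distinction between “cleanly stopped” and “failed” can cause flapping or unnecessary failovers. The sketch below, using a hypothetical systemd unit name, shows the convention in miniature.

```python
import subprocess
import sys

# OCF exit codes used by Pacemaker-style resource agents; returning the
# wrong code is a classic source of cluster flapping in home-grown agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7

def monitor() -> int:
    """Report resource state using the convention the cluster expects."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", "myapp"],  # hypothetical unit
        check=False,
    )
    if result.returncode == 0:
        return OCF_SUCCESS
    # A cleanly stopped resource must be distinguished from a failed one:
    # the cluster treats OCF_NOT_RUNNING and OCF_ERR_GENERIC differently.
    return OCF_NOT_RUNNING

if __name__ == "__main__":
    sys.exit(monitor())
```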
Many enterprise applications require integration with multiple systems in order to provide high availability that meets Service Level Agreements (SLAs) and Service Level Objectives (SLOs). Your team will require deep application awareness and knowledge of the technology environment to build protection and solutions for this integration with multiple enterprise systems. You need people who know the ins and outs of the critical applications, the technology environment for those applications, networking, hardware, hypervisors, and the environmental and application dependencies. You’ll also need team members who understand the architecture, features, and limitations of the set of HA technologies that you intend to use from the open source community. Consider how well your team knows and understands each of these areas.
You need someone who understands your business requirements and the business process. Your team needs professionals who understand the enterprise’s business and the processes that drive it, who know how much budget is available to spend on developing the solution, how much risk the business is willing to take, and how to gather additional requirements that may be unspoken or unspecified.
The team will also need to know, or to hire someone who knows, how to convert those business requirements into software requirements, and how to manage a process that brings a minimum viable high availability solution to fruition: one that meets the needs of the business, matches the speed of the business, and fits within the processes of the business.
If you are looking to go fully open source, your team will need experience with operating systems, applications, and infrastructure. You’ll need to understand the various OS release cycles, including kernel versions for Linux and updates and hotfixes for Windows. You have applications in house that need to be supported, and you’ll also need to be diligent about understanding the application update cycle, application dependencies, and the intersection of application and OS support matrices. If your environment is homogeneous, great. Otherwise, your team will need to know the differences between RHEL, RHEL derivatives, and SUSE; if you run both Linux and Windows, you’ll need to know both worlds. You’ll also need to understand the difference the infrastructure makes to the application and OS combination: AWS and Azure present high availability considerations that differ from GCP, on-premises environments, and other hypervisors.
Imagine that you have the development team to create the solution, with technical and business knowledge along with a firm grasp of the OS, infrastructure, and applications. Getting the scripts together is just the beginning. Your team will also need change management capabilities. How will your team keep track of code changes and versions, packages, and package locations? How will your team manage the release of updates and changes? Your team will need to be versed in a source repository such as Git, project management tools such as Jira, and release-train management. You’ll need a team that understands how to make updates to code and deliver patches and fixes, all while avoiding unwanted impact.
When you enter the space of delivering your own HA solution, your team will need analytics and troubleshooting experience. You’ll need resources who understand the intersection of application code, system messages, application error logs, and trace files. When a system crash occurs, you’ll have to dig deep into the logs to troubleshoot and find the root cause, analyze the data to make recommendations, and be prepared to roll out changes (see #5 above). Don’t forget, your team will also need to know what the data from these logs and trace files can tell you about the health of your environment even when there isn’t an error, failure, or system crash.
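A starting point for that kind of proactive log analysis can be as simple as counting known failure signatures over time, as in the sketch below; the log path and patterns are assumptions to adapt to your own applications and environment.

```python
import re
from collections import Counter

# Hypothetical log path and failure signatures: adapt to your applications.
LOG_FILE = "/var/log/myapp/error.log"
PATTERNS = {
    "oom": re.compile(r"Out of memory|oom-killer"),
    "io_error": re.compile(r"I/O error|read-only file system", re.IGNORECASE),
    "timeout": re.compile(r"timed? ?out", re.IGNORECASE),
}

def scan(path: str) -> Counter:
    """Count known failure signatures to surface trends before a crash."""
    hits: Counter = Counter()
    with open(path, errors="replace") as log:
        for line in log:
            for name, pattern in PATTERNS.items():
                if pattern.search(line):
                    hits[name] += 1
    return hits

if __name__ == "__main__":
    for signature, count in scan(LOG_FILE).most_common():
        print(f"{signature}: {count} occurrences")
```

Run regularly (for example, from cron), rising counts for a signature can flag a degrading environment well before an outright failure.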
Let’s be honest: your business isn’t about delivering high availability, but if you decide to dive into the realm of open source HA, you are going to need more help than just the brilliance on your team. Key to getting that additional help will be understanding where to start, and then making the right connections to community developers, testing experts, HA and application partners, and the open source community. Open forums can be really helpful, but you’ll need to double-check whether their response times are compatible with your SLAs and SLOs.
Using open source solutions is an option that many companies pursue out of cost concerns and a perception of flexibility, lower cost, and lower risk. But buyer beware: there may be hidden costs in the form of new skills and management overhead, and hidden risks in the open source programs that any “roll your own HA” solution will depend on.
– Cassius Rhue, VP, Customer Experience
Reproduced from SIOS