SIOS Technology Provides High Availability For Critical Applications in Airports

October 22, 2023 by Jason Aw

Failures of mission-critical systems in airports can quickly cause chaos and prove costly, which is why many airports look to implement effective failover solutions. SIOS Technology helps several large airports keep critical applications highly available and put disaster recovery solutions in place.

In this video, Margaret Hoagland, VP of Global Sales and Marketing at SIOS Technology, discusses high availability in airports, different critical applications airports run and the consequences should they fail. Hoagland goes into detail about how SIOS Technology helps protect these critical applications.

Key highlights of this video:

Hoagland talks about some of the misconceptions around high availability, such as the assumption that putting an application in the cloud makes it highly available: the cloud provides high availability for the infrastructure, not necessarily for the application. She explains how SIOS Technology bridges that gap.
Airports run many service applications that are deadline-driven and dependent on one another, which presents a number of challenges for high availability. Hoagland discusses the sorts of airport applications they protect, such as video surveillance monitoring systems and card swipers for access control.
Hoagland explains some of the different scenarios in which SIOS Technology assists customers with disaster recovery. Some customers run systems on-premises with a node or a disaster recovery site in the cloud, while others run them in the cloud and keep a disaster recovery location on-premises. She talks us through some of the main challenges their customers encounter and how their application recovery kits can help with failover.
Airports are continuously running critical applications, which need high availability. Hoagland talks about the chaos that occurs should these high availability applications fail and why even seemingly non-critical applications like baggage handling systems are in fact critical and can have significant knock-on effects with other applications.
Hoagland goes into detail about how SIOS Technology’s solutions help airports maintain high availability for their systems, telling us that the application or data is run on a server that is connected to a secondary server or multiple secondary ones. She discusses how their software detects potential failures and she explains the failover process in this scenario.
SIOS Technology’s solutions are being used by several large international airports to protect baggage handling, card swipers for security, ticketing and reservations, and arrival and departure boards. Hoagland talks about the negative consequences of downtime in any of these critical systems.
SIOS Technology works with any industry that has highly critical systems that are essential to the success of the business, such as manufacturing and healthcare.

Reproduced with permission from SIOS

How to Protect Applications in Cloud Platforms

September 15, 2023 by Jason Aw

Cloud platforms only protect applications from downtime caused by hardware failures. Mission critical applications require HA/DR protection regardless of the cloud environment they operate in.

When providing high availability protection, the general principle is to make all components redundant to avoid Single Points of Failure (SPOF). That is, ensure that no single element stops the entire system if it fails. However, it is important to note that in the public cloud the operational infrastructure is hard to inspect or control.

In a cloud-based high availability cluster, there is a possibility that the standby node(s) will be located on the same host server, in the same rack, and using the same network switch as the operating node. Unless you configure these elements with redundancy, any of them could be a SPOF and put the application at risk for catastrophic failure.

It is necessary to ensure cluster nodes are in different cloud “regions” and “availability zones”, which physically separate the data centers and operational infrastructure across different geographic locations.

What are the main principles for ensuring availability in the cloud?

You cannot expect the various components that make up a physical IT infrastructure to operate according to specifications forever, because parts wear out, systems become incompatible, and settings change. Although regular maintenance can reduce the risk of downtime, it’s likely that something will fail over the course of the product lifecycle.

In some rare cases, you may have a serious bug that is latent in the OS or embedded software that causes the application to stop working.

As you may have already noticed, the HA cluster configuration is exactly in line with this principle: a single point of failure is eliminated by giving the active (production) server and its resources a redundant counterpart. However, it is important to remember two things: (1) the server hardware is not the only critical component, and (2) other critical SPOF components may be invisible to you in a public cloud infrastructure.

Beware of the pitfalls of a single point of failure hidden in the cloud’s invisible infrastructure

Most public clouds operate in a so-called “multi-tenant” mode. That is, they run the VMs of multiple companies on the same physical host server. And with a regular contract, you can’t specify which host server your system runs on. This can cause problems in several ways.

The standby node in your cloud cluster may be placed on the same host server as the active node. Even if you configure an HA cluster, if the host server goes down, the active node and the standby node will both go down too. In this scenario, your cloud operator decides when and how your system will be restored.

The host server that runs the active node and the host server that runs the standby node may be in the same rack. In this case, the rack becomes a SPOF: if a failure occurs there, both the active and standby nodes in it will fail.

Furthermore, in the upper layers of the infrastructure, such as the network switches that connect multiple racks, gateways and routers, and data center power supply units, the active node and the standby node may depend on the same component. If these key components aren’t redundant, you have an inescapable single point of failure. Again, for a company that is a public cloud user, the data center infrastructure is a black box, and it may be impossible to see into the detailed configuration to identify SPOFs.
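
To make the effect of a hidden shared component concrete, here is a minimal Python sketch. It assumes that failures are independent and uses made-up availability figures (0.999 for each node and for the shared component); the function names are illustrative only and do not come from any SIOS or cloud provider tooling.

```python
def redundant_pair(node_availability: float) -> float:
    """Availability of two nodes where either one can serve the application.

    Assumes the two nodes fail independently of each other.
    """
    return 1.0 - (1.0 - node_availability) ** 2


def cluster_with_shared_component(node_availability: float,
                                  shared_availability: float) -> float:
    """Availability when both nodes depend on one shared host, rack, or switch.

    The shared component sits in series with the redundant pair,
    so the cluster is down whenever the shared component is down.
    """
    return shared_availability * redundant_pair(node_availability)


# Hypothetical figures chosen only for illustration.
node = 0.999
shared = 0.999

print(f"redundant pair alone:       {redundant_pair(node):.6f}")
print(f"pair behind shared element: {cluster_with_shared_component(node, shared):.6f}")
```

Under these assumptions the redundant pair on its own is far more available than a single node, but once both nodes sit behind one non-redundant element, the cluster can never be more available than that element.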

Public cloud availability zones and regions should be leveraged for availability

How can we explicitly avoid hidden single points of failure in the public cloud? The most robust method is to use the “Availability Zones” and “Regions” that the cloud provider offers.

An Availability Zone is a physically independent partition of the provider’s infrastructure within a region, and regions are independent data center locations that are geographically separated. Public clouds let you deliberately choose which Availability Zones or regions your nodes run in.

By constructing an HA cluster configuration in which active nodes and standby nodes are distributed across different availability zones, or across two or more regions, almost all SPOFs can be avoided. If you adhere to these best practices, you can be confident in your availability, DR (Disaster Recovery), and BCP (Business Continuity Planning).
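
As a rough illustration of the zone and region guidance above, the following Python sketch flags pairs of cluster nodes that share an availability zone. The `Node` class, the node names, and the region and zone labels are all hypothetical; in practice the placement data would come from your cloud provider's console or API.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Node:
    name: str
    region: str
    zone: str


def shared_failure_domains(nodes):
    """Return (node_a, node_b, domain) for every pair of nodes in the same zone."""
    findings = []
    for a, b in combinations(nodes, 2):
        if a.region == b.region and a.zone == b.zone:
            findings.append((a.name, b.name, f"{a.region}/{a.zone}"))
    return findings


# Hypothetical cluster layout: active and standby accidentally share a zone.
cluster = [
    Node("active", "region-1", "zone-a"),
    Node("standby", "region-1", "zone-a"),
    Node("dr", "region-2", "zone-a"),
]

findings = shared_failure_domains(cluster)
for a, b, domain in findings:
    print(f"{a} and {b} share {domain}: possible single point of failure")
```

A check like this only covers zone-level placement. It cannot see hosts, racks, or switches inside a zone, which is why separating nodes by zone or region is the more dependable control.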

Reproduced with permission from SIOS

Service Level Agreements and the Four Nines are Not Enough for High Availability in the Cloud

July 15, 2023 by Jason Aw

When most people think of high availability, they set four nines (99.99%), or less than five minutes of downtime every month, as the baseline. But according to Dave Bermingham, Senior Technical Evangelist at SIOS Technology, in this TFiR video interview, high availability is more than that.

Dave argues that a count of nines is a measurement you might be judged against, but that guaranteeing a given level of nines is almost impossible, because so many points in the availability chain can be a single point of failure. Four nines is certainly a great number to be judged against and to strive for, but on its own, four nines for a database server doesn’t mean a lot.
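
For reference, the downtime budget behind the four nines baseline mentioned above can be worked out with a few lines of Python. This is a simple arithmetic sketch that assumes a 30-day month and a 365-day year; the function name is illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability_percent: float, days: float) -> float:
    """Return the downtime, in minutes, allowed over `days` at a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * days * MINUTES_PER_DAY


for label, pct in [("two nines", 99.0), ("three nines", 99.9), ("four nines", 99.99)]:
    per_month = downtime_budget_minutes(pct, 30)
    per_year = downtime_budget_minutes(pct, 365)
    print(f"{label} ({pct}%): {per_month:.2f} min/month, {per_year:.2f} min/year")
```

At 99.99%, the budget comes to about 4.32 minutes in a 30-day month, which matches the "less than five minutes" figure.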

Effective High Availability Covers a Complex Availability Chain

Even with cloud SLAs (Service Level Agreements), one can’t rest fully assured, as most cloud providers offer four nines on compute, which is only one part of the availability chain (along with network, storage, and the hops between). Bermingham warns, “There’s a million points of failure. So, trying to think that my cloud provider offers four nines so I’m covered, you’re kind of fooling yourself there. You have to look at the big picture and do what you can to identify those points of failures, to minimize the potential points of failure and to have a recovery plan, should something happen.”
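
The point about compute being only one link in the chain can be sketched numerically. The example below assumes the links fail independently and that the application needs every link to be up, so the availabilities multiply. The three figures are hypothetical and are not taken from any provider's SLA.

```python
from math import prod


def chain_availability(component_availabilities):
    """Availability of components in series: every one must be up.

    Assumes component failures are independent.
    """
    return prod(component_availabilities)


# Hypothetical figures: each link individually meets four nines.
chain = {"compute": 0.9999, "network": 0.9999, "storage": 0.9999}

end_to_end = chain_availability(chain.values())
minutes_per_month = (1.0 - end_to_end) * 30 * 24 * 60

print(f"end-to-end availability: {end_to_end:.6f}")
print(f"expected downtime: {minutes_per_month:.2f} min per 30-day month")
```

With three four-nines links in series, the end-to-end figure drops to roughly 99.97%, or about 13 minutes of downtime in a 30-day month, even though every individual component meets its own target.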

When considering High Availability/Disaster Recovery (HA/DR), Bermingham believes the thing that causes the most visible downtime is human error. He suggests restricting authorization and access to the system to reduce the number of potential points of failure. “You should only give access to those who absolutely need access to it and you should also ensure that they are highly trained and that you have all the things in place to help minimize potential oops.”

SIOS offers a single solution to meet both high availability and disaster recovery needs across a wide variety of operating systems (Windows, Linux), platforms, and applications, including SAP, SAP HANA, MaxDB, SQL Server, Oracle, and other environments running in SAN-based, shared storage configurations or SANless, local data storage configurations.

Reproduced with permission from SIOS