June 7, 2022
How Workloads Should be Distributed when Migrating to a Cloud Environment

Determining how workloads (nodes) should be distributed is a common topic of discussion when migrating to the public cloud with high availability in mind. In an on-premises environment, workload placement is more often than not dictated by the location(s) of established datacenters; in many cases, choosing another location in which to host a workload is simply not an option. A public cloud offering, by contrast, provides a wide range of geographical Regions as well as Availability Zones to choose from.

An Availability Zone is generally analogous to one or more datacenters (physical locations) situated within the same geographical region (e.g., in California). These datacenters may be located in different areas but are connected by high-speed networks to minimize connection latency between them. (Note that hosting services across several datacenters within an Availability Zone should be transparent to the user.)

As a general rule, the greater the physical distance between workloads, the more resilient the environment becomes. It is a reasonable assumption that natural disasters such as earthquakes won't affect different Regions at the same time (e.g., both the U.S. west coast and east coast simultaneously). However, there is still a chance of experiencing service outages across different Regions simultaneously due to system-wide failures; some cloud providers have previously reported simultaneous cross-region outages, such as in the U.S. and Australia. It may therefore be worth defining a DR (disaster recovery) plan that spans different cloud providers.

Another perspective worthy of consideration is the cost of protecting the resources. Generally, the greater the distance between workloads, the higher the data transfer costs. In many cases, data transfer between nodes within the same datacenter (Availability Zone) is free, while transferring data across Availability Zones might cost $0.01/GB or more. This cost might double (or more) when data is transferred across Regions (i.e., $0.02/GB). In addition, because of the increased physical distance between workloads, greater network latency between nodes should be anticipated.

Weighing these factors, it is generally recommended to distribute workloads across Availability Zones within the same Region.

Reproduced with permission from SIOS
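To make the cost trade-off above concrete, the following minimal sketch (in Python) estimates monthly replication transfer cost under the per-GB rates quoted in the article; the daily volume and the rates themselves are illustrative assumptions, not published cloud pricing.

```python
# Illustrative estimate of monthly data-transfer cost for cluster replication.
# Rates mirror the article's examples; real pricing varies by provider.
RATES_PER_GB = {
    "same_az": 0.00,       # intra-AZ transfer is often free
    "cross_az": 0.01,      # e.g., $0.01/GB across Availability Zones
    "cross_region": 0.02,  # e.g., $0.02/GB across Regions
}

def monthly_transfer_cost(gb_per_day: float, topology: str) -> float:
    """Estimated monthly cost of replicating gb_per_day between nodes."""
    return gb_per_day * 30 * RATES_PER_GB[topology]

# Assume 100 GB/day of replication traffic between cluster nodes.
for topology in RATES_PER_GB:
    print(f"{topology}: ${monthly_transfer_cost(100, topology):,.2f}/month")
```

At an assumed 100 GB/day, this works out to $0, $30, and $60 per month respectively, which illustrates why placing nodes in separate Availability Zones within one Region is usually the sweet spot between resilience and cost.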
June 3, 2022
Benefits of SIOS Protection Suite/LifeKeeper for Linux
Reproduced with permission from SIOS
May 29, 2022
SIOS Protection Suite/LifeKeeper for Linux – Integrated Components

SIOS Protection Suite includes the following software components to protect an organization's mission-critical systems.

SIOS LifeKeeper

SIOS LifeKeeper provides a complete fault-resilient software solution that enables high availability for servers, file systems, applications, and processes. LifeKeeper does not require any customized, fault-tolerant hardware; it simply requires two or more systems grouped in a network, with site-specific configuration data created to provide automatic fault detection and recovery. In the case of a failure, LifeKeeper migrates protected resources from the failed server to a designated standby server. Users experience a brief interruption during the actual switchover, but LifeKeeper restores operations on the standby server without operator intervention.

SIOS DataKeeper

SIOS DataKeeper provides an integrated data replication capability for LifeKeeper environments. This feature enables LifeKeeper resources to operate in both shared and non-shared storage environments.

Application Recovery Kits (ARKs)

Application Recovery Kits (ARKs) include tools and utilities that allow LifeKeeper to manage and control a specific application or service. When an ARK is installed for a specific application, LifeKeeper is able to monitor the health of that application and automatically recover it if it fails. These Recovery Kits are non-intrusive and require no changes within the application for it to be protected by LifeKeeper. A comprehensive library of 'off-the-shelf' Application Recovery Kits is available as part of the SIOS Protection Suite portfolio; the types and quantity of ARKs supplied vary based on the edition of SIOS Protection Suite purchased.

Reproduced with permission from SIOS
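The monitor-and-recover pattern that an ARK embodies can be pictured with a short sketch. This is only an illustrative outline of the general technique, not SIOS's actual ARK interface; the use of systemd for the health probe and the service name are assumptions.

```python
import subprocess
import time

CHECK_INTERVAL = 10  # seconds between health checks (illustrative value)

def check_health(service: str) -> bool:
    """Hypothetical health probe: ask systemd whether the unit is active."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", service])
    return result.returncode == 0

def recover(service: str) -> None:
    # A real clustering product would attempt local recovery first and
    # escalate to a failover onto the standby node if that fails.
    subprocess.run(["systemctl", "restart", service])

def monitor(service: str) -> None:
    """Periodically probe the application and recover it on failure."""
    while True:
        if not check_health(service):
            recover(service)
        time.sleep(CHECK_INTERVAL)

if __name__ == "__main__":
    monitor("postgresql")  # example protected service (assumed name)
```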
May 25, 2022
High Availability, RTO, and RPO

High availability (HA) is an information technology term that refers to computer software or a component that is operational and available more than 99.99% of the time, meaning end users of an application or system experience less than 52.5 minutes per year of service interruption. This level of availability is typically achieved through high availability clustering, a configuration that reduces application downtime by eliminating single points of failure through redundant servers, networks, storage, and software.

What are recovery time objectives (RTO) and recovery point objectives (RPO)?

In addition to 99.99% availability, high availability environments also meet stringent recovery time and recovery point objectives. Recovery time objective (RTO) is a measure of the time elapsed from application failure to the restoration of application operation and availability; it describes how long a company can afford to have that application down. Recovery point objective (RPO) is a measure of how up-to-date the data is when application availability has been restored after a downtime issue; it is often described as the maximum amount of data loss that can be tolerated when a failure happens. SIOS high availability clusters deliver an RPO of zero and an RTO of minutes.

What is a high availability cluster?

In a high availability cluster, important applications run on a primary server node, which is connected to one or more secondary nodes for redundancy. Clustering software, such as SIOS LifeKeeper, monitors clustered applications and dependent resources to ensure they are operational on the active node. System-level monitoring is accomplished via periodic heartbeats between cluster nodes; if the primary server fails, the secondary server initiates recovery once the heartbeat timeout interval is exceeded. For application-level failures, the clustering software detects that an application is not available on the active node and moves the application and its dependent resources to the secondary node(s) in a process called a failover, where operation continues and meets stringent RTOs. In a traditional failover cluster, all nodes in the cluster are connected to the same shared storage, typically a storage area network (SAN); after a failover, the secondary node is granted access to the shared storage, enabling it to meet stringent RPOs.

Reproduced with permission from SIOS
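The heartbeat-timeout mechanism described above can be sketched in a few lines of Python. This is a simplified illustration of the general technique rather than LifeKeeper's implementation; the interval, timeout, and the initiate_failover hook are hypothetical.

```python
import time

HEARTBEAT_INTERVAL = 1.0  # seconds between heartbeat checks (assumed)
HEARTBEAT_TIMEOUT = 5.0   # declare the peer failed after this long (assumed)

def initiate_failover() -> None:
    # Placeholder: bring protected resources online on this standby node.
    print("Peer exceeded heartbeat timeout; taking over protected resources.")

def watch_peer(receive_heartbeat) -> None:
    """Monitor a peer node; receive_heartbeat() returns True when one arrives."""
    last_seen = time.monotonic()
    while True:
        if receive_heartbeat():
            last_seen = time.monotonic()
        elif time.monotonic() - last_seen > HEARTBEAT_TIMEOUT:
            initiate_failover()
            return
        time.sleep(HEARTBEAT_INTERVAL)
```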
May 21, 2022
SIOS Protection Suite for Linux Evaluation Guide for AWS Cloud Environments

Get Started Evaluating SIOS Protection Suite for Linux in AWS

Use this step-by-step guide to configure and test a two-node cluster in AWS to protect resources such as Oracle, SQL Server, PostgreSQL, NFS, SAP, and SAP HANA.

Before You Begin Your Evaluation

Review these links to understand key concepts you'll need before you begin your failover clustering project in AWS.
Configuring Network Components

This section outlines the computing resources required for each node, the network structure, and the process required to configure these components; a rough sketch of the network setup follows.
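As an illustration of the layout (not part of the original guide), a VPC with one subnet per Availability Zone can be created with boto3, so that each cluster node lives in its own AZ. The region name, AZ names, and CIDR blocks are placeholder assumptions.

```python
import boto3

# Placeholder region; substitute the one you are evaluating in.
ec2 = boto3.client("ec2", region_name="us-east-1")

vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")  # assumed address range
vpc_id = vpc["Vpc"]["VpcId"]

# One subnet per Availability Zone, one cluster node per subnet.
for i, az in enumerate(["us-east-1a", "us-east-1b"]):
    subnet = ec2.create_subnet(
        VpcId=vpc_id,
        CidrBlock=f"10.0.{i}.0/24",
        AvailabilityZone=az,
    )
    print(az, subnet["Subnet"]["SubnetId"])
```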
Creating an Instance on AWS EC2 from Scratch
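Launching one node can likewise be sketched with boto3; the AMI ID, instance type, key pair name, and subnet ID below are hypothetical placeholders to replace with real values from your account.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

# All identifiers below are placeholders for illustration only.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",      # a supported Linux AMI
    InstanceType="t3.medium",             # size to your workload
    KeyName="my-keypair",                 # an existing EC2 key pair
    SubnetId="subnet-0123456789abcdef0",  # subnet in the node's AZ
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```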
Configure Linux Nodes to Run SIOS Protection Suite for Linux
Install SIOS Protection Suite for Linux
Login and Basic Configuration
Protecting Critical Resources
Once the IP resource is protected, initiate a switchover (where the “standby” node becomes the “active” node) to test the functionality.
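One simple way to confirm the switchover succeeded (not part of the original guide) is to check that the protected virtual IP still accepts connections after the roles swap; the address and port below are hypothetical.

```python
import socket

VIRTUAL_IP = "10.0.0.100"  # placeholder: the protected IP resource
PORT = 22                  # placeholder: any service listening on the VIP

def vip_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the protected IP succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("VIP reachable after switchover:", vip_reachable(VIRTUAL_IP, PORT))
```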