|March 7, 2019||
SIOS Technology Receives 2018 Excellence Award from Cloud Computing Magazine
SIOS DataKeeper Honored for Innovation
SAN MATEO, CA – February 20, 2019 – SIOS Technology Corp., the industry pioneer in providing intelligent application availability for critical workloads, announced today that TMC, a global, integrated media company, has named SIOS DataKeeper as a 2018 Cloud Computing Excellence Award winner, presented by Cloud Computing Magazine.
The Cloud Computing Excellence Award recognizes companies that have most effectively leveraged cloud computing in their efforts to bring new, differentiated services and solutions to market.
SIOS DataKeeper software is an important ingredient in a cluster solution that lets customers add disaster recovery protection to a Windows cluster or create a SANless cluster for complete failover protection in environments where shared storage clusters are impossible or impractical, including any combination of physical, virtual, cloud, or hybrid cloud infrastructures. SIOS software runs business-critical applications like SAP and databases such as SQL Server, Oracle, and many others in flexible, scalable cloud environments, such as Amazon Web Services (AWS), Azure, and Google Cloud Platform, without sacrificing performance, high availability, or disaster protection.
Cloud Computing Excellence Award winner
“It is a huge leap for IT to trust their central business operations to the cloud. These are big decisions, but clearly, the value of cloud is being broadly appreciated, and adoption for even the most critical apps is showing that. Using SIOS high availability clustering software, customers have found they can achieve the service levels they absolutely need for their most critical applications. Their applications automatically recover from infrastructure and application failures in a matter of minutes with no loss of data, keeping data protected and applications online in the cloud,” said Jerry Melnick, president and CEO, SIOS Technology. “We are honored to be recognized by TMC as a Cloud Computing Excellence Award winner.”
“Recognizing leaders in the advancement of cloud computing, TMC is proud to announce SIOS DataKeeper as a recipient of the 8th Annual Cloud Computing Excellence Award,” said Rich Tehrani, CEO, TMC. “SIOS is being honored for their achievement in bringing innovation and excellence to the market, while leveraging the latest technology trends.”
For more information about TMC, please visit www.tmcnet.com.
About SIOS Technology Corp.
SIOS Technology Corp. makes software products that provide the insights and guidance that IT managers need to manage and protect business-critical applications in large, complex data centers. The company’s SIOS iQ machine learning analytics software helps IT managers optimize performance, efficiency, reliability, and capacity utilization in virtualized environments. SIOS SAN and SANless software is an essential part of any cluster solution that provides the flexibility to build Clusters Your Way™ to protect your choice of Windows or Linux environment in any configuration (or combination) of physical, virtual and cloud (public, private, and hybrid) without sacrificing performance or availability. Founded in 1999, SIOS Technology Corp. (https://us.sios.com) is headquartered in San Mateo, California, and has offices throughout the United States, United Kingdom and Japan.
SIOS, SIOS Technology, SIOS iQ, SIOS DataKeeper, SIOS Protection Suite, Clusters Your Way, SIOS PERC Dashboard, and associated logos are registered trademarks or trademarks of SIOS Technology Corp. and/or its affiliates in the United States and/or other countries. All other trademarks are the property of their respective owners.
|February 1, 2019||
About High Availability Applications For Business Operations – An Interview with Jerry Melnick
We are in conversation with Jerry Melnick, President & CEO, SIOS Technology Corp. Jerry is responsible for directing the overall corporate strategy for SIOS Technology Corp. and leading the company’s ongoing growth and expansion. He has more than 25 years of experience in the enterprise and high availability software markets. Before joining SIOS, he was CTO at Marathon Technologies, where he led business and product strategy for the company’s fault-tolerant solutions. His experience also includes executive positions at PPGx, Inc. and Belmont Research, where he was responsible for building a leading-edge software product and consulting business focused on supplying data warehouse and analytical tools.
Jerry began his career at Digital Equipment Corporation. He led an entrepreneurial business unit that delivered highly scalable, mission-critical database platforms to support enterprise-computing environments in the medical, financial and telecommunication markets. He holds a Bachelor of Science degree from Beloit College with graduate work in Computer Engineering and Computer Science at Boston University.
What is the SIOS Technology survey and what is the objective of the survey?
SIOS Technology Corp., together with ActualTech Media, conducted a survey of IT staff to understand current trends and challenges related to the general state of high availability applications in organizations of all sizes. An organization’s HA applications are generally the ones that ensure that a business remains in operation. Such systems can range from order-taking systems to CRM databases to anything that keeps employees, customers, and partners working together.
We’ve learned that the news is mixed when it comes to how well high availability applications are supported.
Who responded to the survey?
For this survey, we gathered responses from 390 IT professionals and decision makers at US companies of a broad range of sizes. Respondents included people who manage databases, infrastructure, architecture, systems, and software development, as well as those in IT management roles.
What were some of the key findings uncovered in the survey results?
The following are key findings based on the survey results:
Tell us about the Enterprise Application Landscape. Which applications are in use most; and which might we be surprised about?
We focused on tier 1 mission critical applications, including Oracle, Microsoft SQL Server, SAP/HANA. For most organizations operating these kinds of services, they are the lifeblood. They hold the data that enables the organization to achieve its goals.
56% of respondents to our survey are operating Oracle workloads, while 49% are running Microsoft SQL Server. Rounding out the survey, 28% have SAP/HANA in production. These are all clearly critical workloads in most organizations, but there are others. For this survey, we provided respondents an opportunity to tell us what, beyond these three big applications, they are operating that can be considered mission critical. Respondents that availed themselves of this option indicated that they are also operating various web databases, primarily from Amazon, as well as MySQL and PostgreSQL databases. To a lesser extent, organizations are also operating some NoSQL services that are considered mission critical.
How often does an application performance issue affect end users?
Application performance issues are critical for organizations: 98% of respondents indicated that these issues impact end users in some way, ranging from daily (experienced by 18% of respondents) to just once per year (experienced by 8% of respondents) and everywhere in between. Application performance issues lead to customer dissatisfaction and can lead to lost revenue and increased expenses. But there appears to be some disagreement around such issues depending on one’s perspective in the organization. Respondents holding decision-maker roles have a more positive view of the performance situation than others: only 11% of decision makers report daily performance challenges, compared to around 20% of other respondents.
Is it easier to resolve cloud-based application performance issues?
Most IT pros would like to fully eliminate the potential for performance issues in applications that operate in a cloud environment, but the fact is that such situations can and will happen. There is a variety of tools available in the market to help IT understand and address application performance issues, and IT departments have, over the years, cobbled together troubleshooting toolkits. In general, the fewer tools you need to work with to resolve a problem, the more quickly you can bring services back into full operation. That’s why it’s particularly disheartening to learn that only 19% of respondents turn to a single tool to identify cloud application performance issues. This leaves 81% of respondents having to use two or more tools. But it gets worse: 11% of respondents need to turn to five or more tools in order to identify performance issues with their cloud applications.
So now we know cloud-based application performance issues can’t be totally avoided, how long until we can expect a fix?
The real test of an organization’s ability to handle such issues comes when measuring the time it takes to recover when something does go awry. 23% of respondents can typically recover in less than an hour, 56% take somewhere between one and three hours, and 23% take three or more hours. This isn’t to say that these people are recovering from a complete failure; they are reacting to a performance fault somewhere in the application, one serious enough to warrant attention. A goal for most organizations is to reduce the amount of time it takes to troubleshoot problems, which in turn reduces the amount of time it takes to correct them.
Do future plans about moving HA applications to the cloud show stronger migration?
We asked respondents about their future plans for moving additional high availability applications to the cloud. Nine percent (9%) of respondents indicate that all of their most important applications are already in the cloud. By the end of 2018, half of respondents expect to have more than 50% of their HA applications migrated to the cloud, while 29% say that they will have less than half of their HA applications in the cloud. Finally, 12% of respondents say that they will not be moving any more HA applications to the cloud in 2018.
How would you sum up the SIOS Technology survey results?
Although this survey and report represent people’s thinking at a single point in time, there are some potentially important trends that emerge. First, it’s clear that organizations value their mission-critical applications, as they’re protecting them via clustering or other high availability technology. A second takeaway is that even with those safeguards in place, there’s more work to be done, as those apps can still suffer failures and performance issues. Companies therefore need to look at the data and ask themselves whether they’re doing everything they can to protect their crucial assets. You can download the report here.
Contact us if you would like help ensuring high availability for the applications in your project.
Reproduced from Tech Target
|January 30, 2019||
Ensure High Availability for SQL Server on Amazon Web Services
Database and system administrators have long had a wide range of options for ensuring that mission-critical database applications remain highly available. Public cloud infrastructures, like those provided by Amazon Web Services, offer their own additional high availability options backed by service level agreements. But configurations that work well in a private cloud might not be possible in the public cloud. Poor choices in the AWS services used, and/or in how they are configured, can cause failover provisions to fail when actually needed. This article outlines the various options available for ensuring High Availability for SQL Server in the AWS cloud.
For database applications, AWS gives administrators two basic choices, each of which has different high availability (HA) and disaster recovery (DR) provisions: Amazon Relational Database Service (RDS) and Amazon Elastic Compute Cloud (EC2).
RDS is a fully managed service suitable for mission-critical applications. It offers a choice of six different database engines, but its support for SQL Server is not as robust as it is for other choices like Amazon Aurora, MySQL and MariaDB. Here are some of the common concerns administrators have about using RDS for mission-critical SQL Server applications:
Elastic Compute Cloud
The other basic choice is the Elastic Compute Cloud with its substantially greater capabilities. This makes it the preferred choice when HA and DR are of paramount importance. A major advantage of EC2 is the complete control it gives admins over the configuration, and that presents admins with some additional choices.
Picking The Operating System
Perhaps the most consequential choice is which operating system to use: Windows or Linux. Windows Server Failover Clustering is a powerful, proven and popular capability that comes standard with Windows. But WSFC requires shared storage, and that is not available in EC2. Because Multi-AZ, and even Multi-Region, configurations are required for robust HA/DR protection, separate commercial or custom software is needed to replicate data across the cluster of server instances. Microsoft’s Storage Spaces Direct (S2D) is not an option here, as it does not support configurations that span Availability Zones.
The need for additional HA/DR provisions is even greater for Linux, which lacks a fundamental clustering capability like WSFC. Linux gives admins two equally unattractive choices for high availability: either pay for the more expensive Enterprise Edition of SQL Server to implement Always On Availability Groups, or struggle to make complex do-it-yourself HA Linux configurations built on open source software work well.
Both of these choices undermine the cost-saving rationale for using open source software on commodity hardware in public cloud services. SQL Server for Linux is available only for the more recent (and more expensive) versions, beginning with SQL Server 2017. And the DIY HA alternative can be prohibitively expensive for most organizations. Indeed, making Distributed Replicated Block Device (DRBD), Corosync, Pacemaker and, optionally, other open source software work as desired at the application level under all possible failure scenarios can be extraordinarily difficult, which is why only very large organizations have the wherewithal (skill set and staffing) needed to even consider taking on the task.
Owing to the difficulty involved in implementing mission-critical HA/DR provisions for Linux, AWS recommends using a combination of Elastic Load Balancing and Auto Scaling to improve availability. But these services have their own limitations, similar to those in the managed Relational Database Service.
All of this explains why admins are increasingly choosing to use failover clustering solutions designed specifically for ensuring HA and DR protections in a cloud environment.
Failover Clustering Purpose-Built for the Cloud
The growing popularity of private, public and hybrid clouds has led to the advent of failover clustering solutions purpose-built for a cloud environment. These HA/DR solutions are implemented entirely in software that creates, as implied by the name, a cluster of servers and storage with automatic failover to assure high availability at the application level.
Most of these solutions provide a complete HA/DR solution that includes a combination of real-time block-level data replication, continuous application monitoring and configurable failover/failback recovery policies. Some of the more sophisticated solutions also offer advanced capabilities, such as support for Always On Failover Cluster Instances in the less expensive Standard Edition of SQL Server for both Windows and Linux, WAN optimization to maximize multi-region performance, and manual switchover of primary and secondary server assignments to facilitate planned maintenance, including the ability to perform regular backups without disruption to the application.
Most failover clustering software is application-agnostic, enabling organizations to have a single, universal HA/DR solution. This same capability also affords protection for the entire SQL Server application, including the database, logons, agent jobs, etc., all in an integrated fashion. Although these solutions are generally also storage-agnostic, enabling them to work with shared storage area networks, shared-nothing SANless failover clustering is usually preferred for its ability to eliminate potential single points of failure.
Support for Always On Failover Cluster Instances (FCIs) in the less expensive Standard Edition of SQL Server, with no compromises to availability or performance, is a major advantage. In a Windows environment, most failover clustering software supports FCIs by leveraging the built-in WSFC feature, which makes the implementation quite straightforward for both database and system administrators. Linux is becoming increasingly popular for SQL Server and many other enterprise applications, and some failover clustering solutions now make implementing HA/DR provisions just as easy as it is for Windows by offering application-specific integration.
Typical Three-Node SANless Failover Cluster
The example EC2 configuration in the diagram shows a typical three-node SANless failover cluster configured within a Virtual Private Cloud (VPC), with the three SQL Server instances in different Availability Zones. To protect against a local disaster affecting an entire region, one of the AZs is located in a different AWS region.
A three-node SANless failover cluster affords carrier-class HA and DR protection. The basic operation is the same across the LAN and/or WAN for Windows or Linux. Server #1 is initially the primary or active instance, replicating data continuously to both servers #2 and #3. If server #1 experiences a problem, an automatic failover is triggered to server #2, which becomes the new primary, replicating data to server #3.
If the failure was caused by an infrastructure outage, the AWS staff would immediately begin diagnosing and repairing whatever caused the problem. Once fixed, server #1 could be restored as the primary, or server #2 could continue in that capacity, replicating data to servers #1 and #3. Should server #2 fail before server #1 is returned to operation, as shown, server #3 would become the primary after a manual failover. Of course, if the failure was caused by the application software or certain other aspects of the configuration, it would be up to the customer to find and fix the problem.
SANless failover clusters can be configured with only a single standby instance, of course, but such a minimal configuration does require a third node to serve as a witness, which is needed to achieve a quorum for determining the assignment of the primary. This role is normally performed by a domain controller in a separate AZ. Keeping all three nodes (primary, secondary and witness) in different AZs eliminates the possibility of losing more than one vote if any zone goes offline.
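To illustrate the majority-quorum arithmetic behind this rule, here is a minimal Python sketch; the node names, zone labels, and reachability states are hypothetical and are not taken from any particular clustering product.

```python
# Minimal sketch of majority-quorum voting across three Availability Zones.
# Node names, zones, and reachability states are hypothetical illustrations.

def has_quorum(votes_reachable, votes_total):
    """A partition keeps quorum only if it holds a strict majority of votes."""
    return votes_reachable > votes_total // 2

# One vote each: primary, secondary, and witness, all in different AZs.
nodes = {
    "primary   (AZ-1)": True,    # reachable
    "secondary (AZ-2)": True,    # reachable
    "witness   (AZ-3)": False,   # this zone is offline
}

reachable = sum(nodes.values())
status = "quorum retained" if has_quorum(reachable, len(nodes)) else "quorum lost"
print(f"{reachable} of {len(nodes)} votes reachable -> {status}")
```

Because any single zone outage removes at most one vote, the two surviving nodes still hold a majority and the cluster can safely decide which instance is primary.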
It is also possible to have two- and three-node SANless failover clusters in hybrid cloud configurations for HA and/or DR purposes. One such three-node configuration is a two-node HA cluster located in an enterprise data center with asynchronous data replication to AWS or another cloud service for DR protection—or vice versa.
In clusters within a single region, where data replication is synchronous, failovers are normally configured to occur automatically. For clusters with nodes that span multiple regions, where data replication is asynchronous, failovers are normally controlled manually to avoid the potential for data loss. Three-node clusters, regardless of the regions used, can also facilitate planned hardware and software maintenance for all three servers while providing continuous DR protection for the application and its data.
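As a rough illustration of the failover policy just described, the hedged Python sketch below maps replication mode to failover behavior: automatic where replication is synchronous within a region, manual where it is asynchronous across regions. The region names and node layout are hypothetical assumptions.

```python
# Sketch of the failover policy described above: automatic failover between
# synchronously replicated nodes in the same region, manual failover to an
# asynchronously replicated node in another region. Regions are hypothetical.

from dataclasses import dataclass

@dataclass
class Standby:
    name: str
    region: str

PRIMARY_REGION = "us-east-1"

def failover_mode(standby: Standby) -> str:
    # Same region -> synchronous replication -> safe to fail over automatically.
    # Different region -> asynchronous replication -> require a manual failover
    # so that replication lag cannot silently discard committed transactions.
    return "automatic" if standby.region == PRIMARY_REGION else "manual"

for node in (Standby("server-2", "us-east-1"), Standby("server-3", "eu-west-1")):
    print(f"{node.name} ({node.region}): {failover_mode(node)} failover")
```

The same decision would normally be expressed as a setting in whatever clustering software is used; the point is simply that the replication mode drives the recovery policy.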
Maximize High Availability for SQL Server
By offering 55 Availability Zones spread across 18 geographic Regions, the AWS Global Infrastructure affords enormous opportunity to maximize High Availability for SQL Server by configuring SANless failover clusters with multiple, geographically dispersed redundancies. This global footprint also enables all SQL Server applications and data to be located near end users to deliver satisfactory performance.
With a purpose-built solution, carrier-class high availability need not come at a carrier-like high cost. Because purpose-built failover clustering software makes effective and efficient use of EC2’s compute, storage and network resources, while being easy to implement and operate, these solutions minimize both capital and operational expenditures, making high availability more robust and more affordable than ever before.
Reproduced from TheNewStack
|January 27, 2019||
Options for When Public Cloud Service Levels Fall Short
All public cloud service providers offer some form of guarantee regarding availability, which may or may not be sufficient depending on each application’s requirement for uptime. These guarantees typically range from 95.00% to 99.99% uptime during the month, and most impose some type of “penalty” on the service provider for falling short of those thresholds.
Most cloud service providers offer a 99.00% uptime threshold, which equates to about seven hours of downtime per month. For many applications, those two 9’s might be enough. But mission-critical applications need more 9’s, especially given that many common causes of downtime are excluded from the guarantee.
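To put those percentages in concrete terms, the short Python calculation below converts a monthly uptime guarantee into allowable downtime, assuming a 30-day (720-hour) month; the figures are illustrative only.

```python
# Convert monthly uptime guarantees into the downtime they allow,
# assuming a 30-day (720-hour) month. Figures are illustrative only.

HOURS_PER_MONTH = 30 * 24

for uptime_pct in (95.00, 99.00, 99.90, 99.99):
    downtime_hours = HOURS_PER_MONTH * (100 - uptime_pct) / 100
    print(f"{uptime_pct:6.2f}% uptime -> {downtime_hours:6.2f} hours "
          f"({downtime_hours * 60:.0f} minutes) of downtime per month")
```

At 99.00% the allowance is roughly 7.2 hours per month, while a five-9’s target (99.999%) would allow well under a minute.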
There are, of course, cost-effective ways to achieve five-9’s high availability and robust disaster recovery protection in configurations using public cloud services, either exclusively or as part of a hybrid arrangement. This article highlights limitations involving HA and DR provisions in the public cloud, explores three options for overcoming these limitations, and describes two common configurations for failover clusters.
Caveat Emptor in the Cloud
While all cloud service providers (CSPs) define “downtime” or “unavailable” somewhat differently, these definitions include only a limited set of all possible causes of failures at the application level. Generally included are failures affecting a zone or region, or external connectivity. All CSPs also offer credits ranging from 10% for failing to meet four-9’s of uptime to around 25% for failing to meet two-9’s of uptime.
Redundant resources can be configured to span the zones and/or regions within the CSP’s infrastructure, which helps to improve application-level availability. But even with such redundancy, there remain some limitations that are often unacceptable for mission-critical applications, especially those requiring high transactional throughput performance. These limitations include each master being able to create only a single failover replica, the requirement to use the master dataset for backups, and the use of event logs to replicate data. These and other limitations can increase recovery time during a failure and make it necessary to schedule at least some planned downtime.
The more significant limitations involve the many exclusions to what constitutes downtime. Here are just a few examples, taken from actual public cloud service level agreements, of causes of application-level failures that are excluded from “downtime” or “unavailability”:
To be sure, it is reasonable for CSPs to exclude certain causes of failure. But it would be irresponsible for system administrators to use these exclusions as excuses; it is necessary to ensure application-level availability by some other means.
What Public Cloud Service Levels Are Available?
Provisioning resources for high availability in a way that does not sacrifice security or performance has never been a trivial endeavor. The challenge is especially difficult in a hybrid cloud environment, where the private and public cloud infrastructures can differ significantly, making configurations difficult to test and maintain and potentially resulting in failover provisions failing when actually needed.
For applications where the service levels offered by the CSP fall short, there are three additional options available based on the application itself, features in the operating system, or through the use of purpose-built failover clustering software.
Three Options for Improving Application-level Availability
The HA/DR options that might appear to be the easiest to implement are those specifically designed for each application. A good example is Microsoft’s SQL Server database with its carrier-class Always On Availability Groups feature. There are two disadvantages to this approach, however. The higher licensing fees, in this case for the Enterprise Edition, can make it prohibitively expensive for many needs. And the more troubling disadvantage is the need for different HA/DR provisions for different applications, which makes ongoing management a constant (and costly) struggle.
Uptime-Related Features Integrated Into The Operating System
The second option for improving on public cloud service levels involves using uptime-related features integrated into the operating system. Windows Server Failover Clustering, for example, is a powerful and proven feature that is built into the OS. But on its own, WSFC might not provide a complete HA/DR solution because it lacks a data replication feature. In a private cloud, data replication can be provided using some form of shared storage, such as a storage area network. But because shared storage is not available in public clouds, implementing robust data replication requires using separate commercial or custom-developed software.
For Linux, which lacks a feature like WSFC, the need for additional HA/DR provisions and/or custom development is considerably greater. Using open source software like Pacemaker and Corosync requires creating (and testing) custom scripts for each application. These scripts often need to be updated and retested after even minor changes are made to any of the software or hardware being used. But because getting the full HA stack to work well for every application can be extraordinarily difficult, only very large organizations have the wherewithal needed to even consider taking on the effort.
Purpose-Built Failover Cluster
Ideally there would be a “universal” approach to HA/DR capable of working cost-effectively for all applications running on either Windows or Linux across public, private and hybrid clouds. Among the most versatile and affordable of such solutions is the third option: the purpose-built failover cluster. These HA/DR solutions are implemented entirely in software that is designed specifically to create, as their designation implies, a cluster of virtual or physical servers and data storage with failover from the active or primary instance to a standby to assure high availability at the application level.
Benefits Of These Solutions
These solutions provide, at a minimum, a combination of real-time data replication, continuous application monitoring and configurable failover/failback recovery policies. Some of the more robust ones offer additional advanced capabilities, such as a choice of block-level synchronous or asynchronous replication, support for Failover Cluster Instances (FCIs) in the less expensive Standard Edition of SQL Server, WAN optimization for enhanced performance and minimal bandwidth utilization, and manual switchover of primary and secondary server assignments to facilitate planned maintenance.
Although these general-purpose solutions are generally storage-agnostic, enabling them to work with storage area networks, shared-nothing SANless failover clusters are normally preferred based on their ability to eliminate potential single points of failure.
Two Common Failover Clustering Configurations
Every failover cluster consists of two or more nodes. Locating at least one of the nodes in a different datacenter is necessary to protect against local disasters. Presented here are two popular configurations: one for disaster recovery purposes; the other for providing both mission-critical high availability and disaster recovery. High transactional performance is often a requirement for highly available configurations. The example application is a database.
The basic SANless failover cluster for disaster recovery has two nodes with one primary and one secondary or standby server or server instance. This minimal configuration also requires a third node or instance to function as a witness, which is needed to achieve a quorum for determining assignment of the primary. For database applications, replication to the standby instance across the WAN is asynchronous to maintain high performance in the primary instance.
The SANless failover cluster affords rapid recovery in the event of a failure in the primary, resulting in a basic DR configuration suitable for many applications. It is capable of detecting virtually all possible failures, including those not counted as downtime in public cloud services, and it will work in a private, public or hybrid cloud environment.
For example, the primary could be in the enterprise datacenter with the secondary deployed in the public cloud. Because the public cloud instance would be needed only during planned maintenance of the primary or in the event of its failure—conditions that can be fairly quickly remedied—the service limitations and exclusions cited above may well be acceptable for all but the most mission-critical of applications.
Three-Node SANless Failover Clusters
The figure shows an enhanced three-node SANless failover cluster that affords both five-9’s high availability and robust disaster recovery protection. As with the two-node cluster, this configuration will also work in a private, public or hybrid cloud environment. In this example, servers #1 and #2 are located in an enterprise datacenter with server #3 in the public cloud. Within the datacenter, replication across the LAN can be fully synchronous to minimize the time it takes to complete a failover and thereby maximize availability.
When properly configured, three-node SANless failover clusters afford truly carrier-class HA and DR. The basic operation is application-agnostic and works the same for Windows or Linux. Server #1 is initially the primary or active instance that replicates data continuously to both servers #2 and #3. If it experiences a failure, the application would automatically failover to server #2, which would then become the primary replicating data to server #3.
Immediately after a failure in server #1, the IT staff would begin diagnosing and repairing whatever caused the problem. Once fixed, server #1 could be restored as the primary with a manual failback, or server #2 could continue functioning as the primary replicating data to servers #1 and #3. Should server #2 fail before server #1 is returned to operation, as shown, server #3 would become the primary. Because server #3 is across the WAN in the public cloud, data replication is asynchronous and the failover is manual to prevent “replication lag” from causing the loss of any data.
SANless failover clustering software is able to detect failures at the application level, readily overcoming the CSP limitations and exclusions mentioned above and making it possible for this three-node configuration to be deployed entirely within the public cloud. To afford the same five-9’s high availability based on immediate and automatic failovers, servers #1 and #2 would need to be located within a single zone or region where the LAN facilitates synchronous replication.
For appropriate DR protection, server #3 should be located in a different datacenter or region, where the use of asynchronous replication and manual failover/failback would be needed for applications requiring high transactional throughput. Three-node clusters can also facilitate planned hardware and software maintenance for all three servers while continuing to provide continuous DR protection for the application and its data.
By offering multiple, geographically dispersed datacenters, public clouds afford numerous opportunities to improve availability and enhance DR provisions. SANless failover clustering software makes effective and efficient use of all compute, storage and network resources while being easy to implement and operate. These purpose-built solutions minimize capital and operational expenditures, resulting in high availability that is more robust and more affordable than ever before.
# # #
About the Author
Cassius Rhue is Director of Engineering at SIOS Technology. He leads the software product development and engineering team in Lexington, SC. Cassius has over 17 years of software engineering, development and testing experience. He also holds a BS in Computer Engineering from the University of South Carolina.
Article from DRJ.com
|January 23, 2019||
Cost of Cloud for High-Availability Applications
Shortly after contracting with a cloud service provider, a bill arrives that causes sticker shock. There are unexpected and seemingly excessive charges. Those responsible seem unable to explain how this could have happened. The situation is urgent because the amount threatens to bust the IT budget unless cost-saving changes are made immediately. So how do we manage the Cost of Cloud for High-Availability Applications?
This cloud services sticker shock is often caused by mission-critical database applications, which tend to be the most costly for a variety of reasons. These applications need to run 24/7. They require redundancy, which involves replicating the data and provisioning standby server instances. Data replication requires data movement, including across the wide area network (WAN). And providing high availability can result in higher costs to license Windows to get Windows Server Failover Clustering (versus using free open source Linux), or to license the Enterprise Edition of SQL Server to get Always On Availability Groups.
Before offering suggestions for managing the Cost of Cloud for High-Availability Applications, it is important to note that the goal here is not to minimize those costs but rather to optimize the price/performance for each application. In other words, it is appropriate to pay more when provisioning resources for those applications that require higher uptime and throughput performance. It is also important to note that a hybrid cloud infrastructure—with applications running in whole or in part in both the private and public cloud—will likely be the best way to achieve optimal price/performance.
Understanding Cloud Service Provider Business And Pricing Models
The sticker shock experience demonstrates the need to thoroughly understand how cloud services are priced in order to manage the Cost of Cloud for High-Availability Applications. Only then can the available services be utilized in the most cost-effective manner.
All cloud service providers (CSPs) publish their pricing, and unless specified in the service agreement, that pricing is constantly changing. All hardware-based resources, including physical and virtual compute, storage, and networking services, inevitably have some direct or indirect cost, based to some extent on the space, power, and cooling these systems consume. For software, open source is generally free, but all commercial operating systems and/or application software will incur a licensing fee. Be forewarned that some software licensing and pricing models can be quite complicated, so be sure to study them carefully.
In addition to these basic charges for hardware and software, there are potential à la carte costs for various value-added services, including security, load-balancing, and data protection provisions. There may also be “hidden” costs for I/O to storage or among distributed microservices, or for peak utilization that occurs only rarely during “bursts.”
Because every CSP has its own unique business and pricing model, the discussion here must be generalized. And, in general, the most expensive resources involve compute, software licensing, and data movement. Together they can account for 80% or more of the total costs. Data movement might also incur separate WAN charges that are not included in the bill from the CSP.
Storage and networking within the CSP’s infrastructure are usually the least costly resources. Solid state drives (SSDs) normally cost more than spinning media on a per-terabyte basis. But SSDs also deliver superior performance, so their price/performance may be comparable or even better. And while moving data back to the enterprise can be expensive, moving data from the enterprise to the public cloud can usually be done cost-free (notwithstanding the separate WAN charges).
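The relative price/performance of the two storage types is easy to see with some rough numbers. The per-terabyte prices and IOPS figures in the Python sketch below are hypothetical placeholders, not published rates from any provider.

```python
# Illustrative price/performance comparison of SSD versus spinning storage.
# Prices and IOPS figures are hypothetical placeholders, not published rates.

tiers = {
    "spinning": {"usd_per_tb_month": 40.0,  "iops": 500},
    "ssd":      {"usd_per_tb_month": 100.0, "iops": 10_000},
}

for name, t in tiers.items():
    usd_per_1000_iops = t["usd_per_tb_month"] / t["iops"] * 1000
    print(f"{name:8s}: ${t['usd_per_tb_month']:6.2f}/TB-month, "
          f"{t['iops']:>6,d} IOPS -> ${usd_per_1000_iops:.2f} per 1,000 IOPS")
```

With these placeholder figures the SSD tier costs 2.5 times as much per terabyte but delivers a far lower cost per unit of I/O, which is the sense in which its price/performance can be comparable or better.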
Formulating Strategies For Optimizing Price/Performance
Managing the Cost of Cloud for High-Availability Applications requires careful attention. Here are some suggestions for managing resource utilization in the public cloud in ways that can lower costs while maintaining appropriate service levels for all applications, including those that require mission-critical uptime and high throughput.
In general, right-sizing is the foundational principle for managing resource utilization for optimal price/performance. When Willie Sutton was purportedly asked why he robbed banks, he replied, “Because that’s where the money is”. In the cloud, the money is in compute resources, so that should be the highest priority for right-sizing.
For new applications, start with minimal virtual machine configurations for compute resources, and add CPU cores, memory and/or I/O only as required to achieve satisfactory performance. All virtual machines for existing applications should eventually be right-sized, beginning with those that cost the most: reduce allocations gradually while monitoring performance constantly, until reaching the point of diminishing returns.
It is worth noting that a major risk associated with right-sizing is the potential for under-sizing, which can result in unacceptably poor performance. Unfortunately, the best way to assess an application’s actual performance is with a production workload, making the real world the right place to right-size. Fortunately, the cloud mitigates this risk by making it easy to quickly resize configurations on demand. So right-size aggressively where needed, but be prepared to react quickly in response to each change.
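As a rough sketch of what a right-sizing pass might look like, the Python below compares observed utilization against thresholds and suggests a next action. The thresholds, instance names, and utilization figures are hypothetical assumptions, not vendor guidance.

```python
# Sketch of a right-sizing pass: compare observed utilization against
# hypothetical thresholds and suggest the next action for each instance.

DOWNSIZE_BELOW = 0.30   # sustained peak utilization below 30% -> candidate to shrink
UPSIZE_ABOVE   = 0.80   # sustained peak utilization above 80% -> candidate to grow

instances = [
    {"name": "db-prod-1",  "cpu_util": 0.12, "mem_util": 0.25},
    {"name": "app-prod-2", "cpu_util": 0.85, "mem_util": 0.70},
    {"name": "web-prod-3", "cpu_util": 0.55, "mem_util": 0.60},
]

for vm in instances:
    peak = max(vm["cpu_util"], vm["mem_util"])
    if peak < DOWNSIZE_BELOW:
        action = "consider a smaller size (watch performance closely afterward)"
    elif peak > UPSIZE_ABOVE:
        action = "consider a larger size before performance suffers"
    else:
        action = "leave as-is"
    print(f"{vm['name']}: peak utilization {peak:.0%} -> {action}")
```

In practice the utilization data would come from the CSP's monitoring tools, and each suggested change would be applied gradually, as described above.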
Storage, in direct contrast to compute, is generally relatively inexpensive in the cloud. But be careful using cheap storage, because I/O might incur a separate—and costly—charge with some services. If so, make use of potentially more cost-effective performance-enhancing technologies such as tiered storage, caching, and/or in-memory databases, where available, to optimize the utilization of all resources.
Software licenses can be a significant expense in both private and public clouds. For this reason, many organizations are migrating from Windows to Linux, and from SQL Server to less-expensive commercial and/or open source databases. But for those applications for which “premium” operating system and/or application software is warranted, check different CSPs to see if any pricing models might afford some savings for the configurations required.
Finally, all CSPs offer discounts, and combinations of these can sometimes achieve a savings of up to 50%. Examples include pre-paying for services, making service commitments, and/or relocating applications to another region.
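As a hedged example of how such discounts can stack, the arithmetic below applies a hypothetical 30% commitment discount and then a hypothetical 25% pre-payment discount to an illustrative monthly spend; actual discount rates, and whether a given CSP allows them to combine this way, vary by provider.

```python
# How independent discounts can combine: a hypothetical 30% commitment discount
# applied first, then a hypothetical 25% pre-payment discount on the remainder.
# Rates are placeholders; actual CSP discount rules and stacking behavior vary.

list_price = 10_000.00          # illustrative monthly spend at list price
commitment_discount = 0.30
prepay_discount = 0.25

after_commitment = list_price * (1 - commitment_discount)
after_prepay = after_commitment * (1 - prepay_discount)
total_savings = 1 - after_prepay / list_price

print(f"List price:        ${list_price:,.2f}")
print(f"After commitment:  ${after_commitment:,.2f}")
print(f"After pre-payment: ${after_prepay:,.2f}")
print(f"Combined savings:  {total_savings:.1%}")   # roughly 47.5%
```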
Creating And Enforcing Cost Containment Controls
Self-provisioning of cloud services might be popular with users, but without appropriate controls this convenience makes it too easy to over-utilize resources, including those that cost the most.
Begin the effort to gain better control by taking full advantage of the monitoring and management tools all CSPs offer. This is likely to involve a learning curve, of course, because the CSP’s tools may be very different from, and potentially more sophisticated than, those being used in the private cloud.
One of the more useful cost containment tools is the tagging of resources. Tags consist of key/value pairs and metadata associated with individual resources, and some can be quite granular. For example, each virtual machine, along with the CPU, memory, I/O, and other billable resources it uses, might have a tag. Other useful tags might show which applications are in a production versus development environment, or to which cost center or department each is assigned. Collectively, these tags can account for the total utilization of resources reflected in the bill.
Organizations that make extensive use of public cloud services might also be well served by creating a script that loads information from all available monitoring, management, and tagging tools into a spreadsheet or similar application for detailed analyses and other uses, such as chargeback, compliance, and trending/budgeting. Ideally, information from all CSPs and the private cloud would be normalized for inclusion in a holistic view to enable optimizing price/performance for all applications running throughout the hybrid cloud.
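One possible shape for such a script is sketched below in Python: it reads per-resource billing exports (one CSV per cloud), rolls costs up by a cost-center tag, and writes a normalized summary suitable for chargeback or trending. The file names and column headers are hypothetical assumptions; real exports would need to be mapped to them.

```python
# Sketch of a consolidation script: read per-resource billing exports
# (one CSV per cloud), roll costs up by a "cost_center" tag, and write a
# normalized summary for chargeback, compliance, and trending/budgeting.
# File names and column headers are hypothetical assumptions.

import csv
from collections import defaultdict

billing_exports = ["public_cloud_billing.csv", "private_cloud_billing.csv"]
totals = defaultdict(float)

for path in billing_exports:
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Assumed columns: resource_id, cost_usd, cost_center
            cost_center = row.get("cost_center") or "untagged"
            totals[cost_center] += float(row["cost_usd"])

with open("chargeback_summary.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["cost_center", "total_cost_usd"])
    for cost_center, cost in sorted(totals.items(), key=lambda kv: -kv[1]):
        writer.writerow([cost_center, f"{cost:.2f}"])

print(f"Wrote chargeback_summary.csv covering {len(totals)} cost centers")
```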
Handling The Worst-Case Use Case: High Availability Applications
In addition to the reasons cited in the introduction for why high-availability applications are often the most costly, all three major CSPs—Google, Microsoft, and Amazon—have at least some high availability-related limitations. Examples include failovers normally being triggered only by zone outages and not by many other common failures; master instances only being able to create a single failover replica; and the use of event logs to replicate data, which creates a “replication lag” that can result in temporary outages during a failover.
None of these limitations is insurmountable, of course—with a sufficiently large budget. The challenge is finding a common and cost-effective solution for implementing high availability across public, private, and hybrid clouds. Among the most versatile and affordable of such solutions is the storage area network (SAN)-less failover cluster. These high-availability solutions are implemented entirely in software that is purpose-built to create, as implied by the name, a shared-nothing cluster of servers and storage with automatic failover across the local area network and/or WAN to assure high availability at the application level. Most of these solutions provide a combination of real-time block-level data replication, continuous application monitoring, and configurable failover/failback recovery policies.
Some of the more robust SAN-less failover clusters also offer advanced capabilities, such as WAN optimization to maximize performance and minimize bandwidth utilization, robust support for the less-expensive Standard Edition of SQL Server, manual switchover of primary and secondary server assignments for planned maintenance, and the ability to perform routine backups without disruption to the applications.
Maintaining The Proper Perspective
While trying out some of these suggestions in your hybrid cloud, endeavor to keep the monthly CSP bill in its proper perspective. With the public cloud, all costs appear on a single invoice. By contrast, the total cost to operate a private cloud is rarely presented in such a complete, consolidated fashion. And if it were, that total cost might also cause sticker shock. A useful exercise, therefore, might be to understand the all-in cost of operating the private cloud—taking nothing for granted—as if it were a standalone business such as that of a cloud service provider. Then those bills from the CSP for your mission-critical applications might not seem so shocking after all.
Article from www.dbta.com