There is a misconception that the cloud is plagued with down-time, leading many organizations to believe on-prem is the better solution.
A recent failure in Amazon’s Simple Storage Service (S3) brought down thousands of websites. While the outage was limited to Amazon’s Northern Virginia region, that happens to be their most widely used region. Such an outage would not have been visible to customers had those websites been properly load balanced across multiple regions, something that is the responsibility of the application owner rather than Amazon.
From an end-user perspective, any service interruption translates into “the cloud is down”. Yet, the failure could be with the application rather than the underlying infrastructure services provided by the public cloud provider. When cloud-based applications fail, it can magnify the misconception that the cloud is unreliable.
In reality, infrastructure in the cloud is designed with failure in mind. It is commonly built using commodity hardware to keep costs down. Cloud providers bake in layers of high availability on top of the commodity hardware, such as keeping multiple copies of data across data centers and re-routing traffic around failed network gear. Service level agreements are based on what can be reliably delivered to the customer, with some amount of downtime expected and planned for.
For example, Microsoft Azure provides a service level agreement that states Virtual Machines using premium disks will have an up-time of 99.9%. This leaves a 0.1% allowance for the service delivered by that single virtual machine to fail or see down-time, which translates to 8.76 hours per year. Applications must be designed with potential down-time in mind, and cloud providers give high availability design guidance based on their service level agreements.
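The arithmetic behind that figure can be checked with a short Python sketch. The function name and the 365-day year are assumptions made for illustration; they are not part of any provider's tooling.

```python
# Illustrative sketch: convert an SLA up-time percentage into the
# down-time it still permits per year. Assumes a 365-day year.
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def allowed_downtime_hours(sla_percent: float) -> float:
    """Hours per year a service may be down while still meeting the SLA."""
    return (100.0 - sla_percent) / 100.0 * HOURS_PER_YEAR


if __name__ == "__main__":
    for sla in (99.9, 99.95):
        hours = allowed_downtime_hours(sla)
        print(f"{sla}% up-time allows about {hours:.2f} hours of down-time per year")
```

At 99.9% this gives the 8.76 hours mentioned above; at 99.95% the allowance is halved to about 4.38 hours.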
Here are some examples of high availability, high up-time designs within Microsoft Azure that can keep a web service running even when failures occur:
Deploy your application on two Virtual Machines in an Availability Set
Use a Load Balancer to manage traffic, which brings your service level agreement up to 99.95%. This type of design will be resilient against single virtual machine outages.
Deploy your application on three Virtual Machines in an Availability Set
Use a Load Balancer to manage traffic, to take advantage of all three available fault domains in a region. This type of design will be resilient against multiple virtual machine outages.
Deploy your application on six Virtual Machines in two Availability Sets across two Regions
Use Traffic Manager to manage traffic across regions, and Load Balancers to manage traffic within a region, to take advantage of multiple fault domains across multiple regions. This type of design will be resilient against entire region outages.
While these designs can sound complicated, Traffic Manager, Load Balancers, and Availability Sets are actually simple to deploy for HTTP/HTTPS-based applications.
Deploying your application across multiple fault domains, multiple regions, or even multiple cloud providers will significantly improve your application’s availability and up-time.
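To see why redundancy helps, here is a rough Python sketch of the textbook formula for parallel redundancy. It assumes each instance fails independently, which is a simplifying assumption: real failures are often correlated, and a provider's published SLA (such as the 99.95% figure above) is a contractual commitment, not a number derived from this formula. The function name is hypothetical.

```python
# Illustrative sketch: availability of N redundant instances behind a
# load balancer, ASSUMING instances fail independently of one another.
# This is a simplification, not how providers compute their SLAs.


def combined_availability(instance_availability: float, instances: int) -> float:
    """Probability that at least one of the instances is up."""
    if not 0.0 <= instance_availability <= 1.0:
        raise ValueError("instance_availability must be between 0 and 1")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    # The service is down only if every instance is down at the same time.
    all_down = (1.0 - instance_availability) ** instances
    return 1.0 - all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} instance(s): {combined_availability(0.999, n):.9f}")
```

Under this idealized model, each extra instance multiplies the chance of a total outage by the single-instance failure rate, which is why spreading instances across fault domains and regions, where failures are less likely to coincide, pays off.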
Returning to the question at hand: it is not simply a matter of “Is my data safe in the cloud?” There are many factors that need to be considered and understood about cloud services such as Microsoft Azure and Amazon Web Services (AWS) in order to compare the benefits and reliability of hosting your data in the cloud vs. on-prem.
We would be happy to sit down with your organization to strategize your cloud transformation.