AWS Suffers Significant Outage – How Do You Minimise Business Disruption?

As has been widely reported, Amazon Web Services (AWS) suffered a significant outage in its ‘US-EAST-1’ region, which affected 33 of its services and caused problems for a number of websites and cloud-based services such as Netflix.

“We’ve identified the issue as high error rates with S3 in US-EAST-1, which is also impacting applications and services dependent on S3. We are actively working on remediating the issue.”

Who’s to blame?

AWS subsequently attributed the outage to human error by a member of its own S3 team and apologised to all affected customers.

“At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process,” Amazon said in a post-mortem.

“Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems.”

What happened next?

Outages are clearly a concern for any cloud services and hosting organisation, but the scale and speed with which a provider can resolve them are part of the service you are paying for. In this case, the affected services were partially or fully restored within a matter of hours.

This is one of the trade-offs of cloud services: you have no direct input into how a problem is resolved, but you also do not need to tie up your own IT teams in resolving it.

Not just AWS…

Bear in mind that other cloud service providers, including Microsoft (Azure) and Google (GCP), will also have outages, but because their entire business model is based on service delivery, they have every incentive to fix these issues quickly.

The business case for cloud…

…remains the same. Defining and executing a cloud strategy will deliver against your goals, as long as you seek advice and guidance to ensure that the full range of risk management measures contributes to that strategy. At its most basic level, a cloud service should be viewed as someone else’s datacentre, so wraparound services such as security, disaster recovery and information assurance should be a critical part of the plan.

The bad news…

Relying solely on cloud services carries the risk of outages, and failing to consider the wider range of datacentre services will create issues in the short, medium or long term.

The good news…

SCC have a breadth of cloud services that can help you deliver against a cloud strategy.

What should you consider?

  • Service resilience / disaster recovery plan
  • Securing the perimeter… in the cloud
  • Managing predictable software spend / controlling costs
  • Shadow IT / process implementation
  • Workload migration
  • What can migrate and, more importantly, what can’t
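
To illustrate the first consideration, a resilience plan often includes falling back to a second copy of your data when the primary location is unavailable. The Python sketch below is a minimal example of that idea under stated assumptions, not a production design: `fetch_with_failover`, `primary_store` and `secondary_store` are hypothetical names, and the two store functions are stand-ins that simulate an outage rather than calls to any real cloud SDK.

```python
from typing import Callable, List, Sequence, Tuple

# A fetcher takes an object key and returns its contents (hypothetical type).
Fetcher = Callable[[str], bytes]


class AllSourcesFailed(Exception):
    """Raised when every configured source has failed."""


def fetch_with_failover(
    key: str, sources: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, bytes]:
    """Try each source in order and return (source name, data) from the first that works."""
    errors: List[str] = []
    for name, fetch in sources:
        try:
            return name, fetch(key)
        except Exception as exc:  # a real plan would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllSourcesFailed("; ".join(errors))


def primary_store(key: str) -> bytes:
    # Stand-in for the primary region; simulates an outage.
    raise ConnectionError("simulated regional outage")


def secondary_store(key: str) -> bytes:
    # Stand-in for a replica held in another region, provider or datacentre.
    return b"replica of " + key.encode()


if __name__ == "__main__":
    source, data = fetch_with_failover(
        "report.pdf",
        [("primary", primary_store), ("secondary", secondary_store)],
    )
    print(f"served from {source}: {data!r}")
```

The order of the list is the failover order, so the same function covers a second region, a second provider or an on-premise copy. It only helps if the replica actually exists and is kept up to date, which is what the disaster recovery plan needs to define.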

With SCC’s expertise in cloud platform management, we are well placed to guide customers on their journey to, or within, cloud environments. Whether you need a public cloud, an SCC private cloud platform, colocation or a hybrid model, SCC have a solution!

SCC works closely with customers to determine their detailed requirements in terms of service resilience, availability, security and IT asset cost management to help ensure the services are delivered to the individual needs of their business.

Each service comes as standard with industry-leading SLAs, detailed consumption reporting, and IT asset utilisation and cost controls, along with ITIL-based service management to ensure our customers meet their compliance, governance and audit requirements.

We work closely with our partners Amazon and Microsoft to deliver the best platforms and services that are right for our customers.

Get in touch today [email protected]
