
Amazon Web Services (AWS) has “returned to normal operations”, the company has said, following an outage that caused widespread chaos and again exposed the fragile foundations that today’s digital world is built on.
Slack, Snapchat, Signal and Perplexity were some of the affected apps and websites, among a host of big names. AWS offers cloud servers that allow these services, and millions of other websites and platforms, to run.
What exactly is AWS?
AWS is a cloud-computing platform that provides the infrastructure underpinning much of the internet.
It is one of the world’s biggest web-hosting providers, offering storage space and database management, and connecting traffic to more than 76 million websites around the world.
It has “positioned itself as the backbone of the internet”, said BBC technology editor Zoe Kleinman. And “that’s how it sells its services: let us look after your business’s computing needs for you.”
Bringing in $108 billion (£80 billion) last year, AWS now accounts for the majority of Amazon’s profits.
What went wrong?
Within hours of the outage, Amazon engineers had identified the root cause of the issue: a Domain Name System (DNS) error. DNS effectively serves as a map or phonebook that links the domain names in web URLs to server IP addresses, so traffic is directed to the correct website.
“To keep with the phonebook analogy”, when DNS resolution issues occur, servers provide the “wrong numbers for a given name, or vice versa”, said Wired.
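For readers who want to see the phonebook analogy in practice, here is a minimal Python sketch. It is illustrative only and is not how AWS runs DNS: the names and addresses in `PHONEBOOK` are made up (they use reserved documentation ranges), and `toy_resolve` and `real_resolve` are hypothetical helper names. The first function mimics a lookup table; the second asks the operating system's resolver through the standard library's `socket.getaddrinfo`.

```python
import socket

# Toy "phonebook": hypothetical names mapped to documentation-range IP addresses.
PHONEBOOK = {
    "shop.example": "192.0.2.10",
    "chat.example": "192.0.2.20",
}


def toy_resolve(name, records=PHONEBOOK):
    """Look a name up in the toy phonebook; raise LookupError if it is missing."""
    try:
        return records[name]
    except KeyError:
        # A missing entry is the toy equivalent of a failed DNS lookup.
        raise LookupError(f"no record for {name}") from None


def real_resolve(hostname):
    """Ask the operating system's resolver for the addresses behind a hostname."""
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Resolution failed: the service cannot be reached by name,
        # even if its servers are still running.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})
```

When a lookup fails or returns the wrong address, an application has no way of finding the server, which is why a DNS fault can make otherwise healthy services appear to be down.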
Because so much of today’s online ecosystem is reliant on a small number of cloud platforms, when an outage of this magnitude occurs on one, “the ripple effects can quickly spread across industries and into people’s daily lives”, Rob van Lubek, of US software development firm Dynatrace, told The National.
That is what happened on Monday, with banking services, social networks, messaging apps, government services, airline booking sites and online shopping all affected. Even Amazon.com itself was down for a time, while the company’s Alexa smart speakers and Ring doorbells stopped working.
“The headlines will focus on streaming services being down,” Ismael Wrixen, of US software developer ThriveCart, told The National. “The real, untold story is the unrecoverable loss of conversions for millions of small businesses. Every minute this occurs, entrepreneurs are learning the most painful lesson in e-commerce: your perfectly optimised ad funnel means nothing if the ‘buy’ button is dead.”
Surely this shouldn’t happen?
Monday’s outage has shown how integral AWS, and the other major cloud-computing services run by Google and Microsoft, have become.
Put bluntly, “when AWS sneezes, half the internet catches the flu”, Monica Eaton, of US payment services company Chargebacks911, told The National.
But after similar AWS disruptions in 2021 and 2023 – as well as last year’s faulty CrowdStrike update, which brought down Microsoft Windows systems, causing $5 billion (£3.7 billion) in direct business losses – many are asking how this keeps happening, and why there are no fail-safes given how important these services are to people all around the world.
It raises “some difficult questions”, said The Register. “After all, cloud operations are supposed to have some built-in resiliency, right?”
When so much of the world’s digital infrastructure runs on a handful of American cloud providers, “resilience becomes as much a geopolitical issue as a technical one”, said Tech.eu, noting how even the UK’s tax authority HMRC was affected by the AWS outage.
It has “underscored just how dependent governments, businesses and users have become on the ‘big three’ cloud giants” and highlighted the “urgent need for multi-region, multi-provider strategies to mitigate systemic risk”.
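What a multi-region, multi-provider strategy means in code can be sketched very roughly as follows. This is a simplified illustration under stated assumptions, not a production design: the endpoint URLs are invented, and `first_healthy` and the `is_healthy` callback are hypothetical names. The idea is simply to try a list of independent deployments in order of preference and use the first one that passes a health check.

```python
# Hypothetical deployments, in order of preference: two regions with one
# provider, then a fallback hosted with a different provider.
ENDPOINTS = [
    "https://api.region-a.provider-one.example",
    "https://api.region-b.provider-one.example",
    "https://api.provider-two.example",
]


def first_healthy(endpoints, is_healthy):
    """Return the first endpoint that passes the health check, or None."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None  # every deployment is down


# Usage: simulate an outage that takes out the whole of provider one.
def simulated_check(endpoint):
    return "provider-one" not in endpoint


active = first_healthy(ENDPOINTS, simulated_check)
```

In a real system the health check would be a network probe and the switch would usually happen at the DNS or load-balancer level, which is part of why a DNS fault is so disruptive.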