Glass Cannons
One can hardly navigate the world these days without the internet. From shopping and paying bills to social media and remote work, it’s unthinkable to be without it. That’s a lot of traffic. So when something like Amazon Web Services (AWS) goes down, it’s a significant problem: outages can spread across multiple layers of the web and take down the goods and services people use every moment of the day. And yesterday, that’s exactly what happened.
Every location in digital space has a unique identifier, its IP address. But unless you’re like me and decipher them for a living, you never see that backbone. The average user sees a browser, a website, a streaming service, a game app, etc. The reason for that is the Domain Name System (DNS). DNS translates the human-readable name of a website, app or service into the numerical IP address that machines use to reach it. Conceptually, it’s a pretty straightforward lookup between client and provider, although a single name can map to more than one address.
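To make that lookup concrete, here is a minimal Python sketch that asks the operating system's resolver for the addresses behind a name. The function name resolve is my own choice for illustration; socket.getaddrinfo is the standard library call doing the work, and it consults DNS along with local sources such as the hosts file.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses the system resolver reports for hostname."""
    # getaddrinfo hands the name to the operating system's resolver.
    # Restricting to TCP avoids one duplicate entry per socket type.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})
```

Calling resolve with a real hostname returns whatever addresses your resolver hands back at that moment, which is often more than one for large sites. When the lookup fails, the call raises socket.gaierror, and that is the error a client sees when a name stops resolving.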
So what happened?
There are billions of IP addresses out there. And the data behind them must be stored and organized somewhere in order for anything to move online in a timely manner. AWS offers a database service designed to handle storage and organization at that scale: DynamoDB. In the words of Mike Chapple, quoted in PBS’s article on the outage, it’s ‘one of the record-keepers of the modern internet’. And in the case of this particular outage, the issue wasn’t that the data was missing. It was that the name clients use to reach DynamoDB stopped resolving in DNS, so DynamoDB effectively ‘lost’ its ability to communicate where it was. Without the name to address translation, traffic could not connect with its destination.
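The failure mode is easier to see in a toy model. The sketch below is purely illustrative and is not how AWS or DynamoDB is implemented: the hostname db.example.com, the reserved documentation address 192.0.2.10, and the names BACKENDS, fetch and NameNotFound are all hypothetical. It shows that once the name record is gone, the client cannot reach data that is still sitting safely in storage.

```python
class NameNotFound(Exception):
    """Raised when a name has no record in the lookup table."""


# Hypothetical backend storage, keyed by IP address.
BACKENDS = {"192.0.2.10": {"order-1001": "2 coffees"}}


def fetch(name, key, dns_records):
    """Look up the address for name, then read key from that backend."""
    address = dns_records.get(name)
    if address is None:
        # The data still exists, but the client has no way to find it.
        raise NameNotFound(f"no DNS record for {name}")
    return BACKENDS[address][key]


healthy_dns = {"db.example.com": "192.0.2.10"}
broken_dns = {}  # the record is gone, the data is not

print(fetch("db.example.com", "order-1001", healthy_dns))  # prints: 2 coffees

try:
    fetch("db.example.com", "order-1001", broken_dns)
except NameNotFound as err:
    print(f"unreachable: {err}")  # prints: unreachable: no DNS record for db.example.com

# The stored data is untouched by the failed lookup.
assert BACKENDS["192.0.2.10"]["order-1001"] == "2 coffees"
```

Nothing in BACKENDS changes between the two calls. The only difference is whether the name can be turned into an address.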
AWS was back up and running within a few hours. And while many services and sites were disrupted, ultimately no damage was done in terms of vulnerability or leaked data. And to be honest, the outage itself isn’t really my point in this report. DNS errors are common, often simply because of the sheer volume of requests. Anywhere from 60 to 90% of pings do not connect from host to client (source: every pcap I’ve ever looked at in Kibana). In my work, failed lookups are often the first thing I filter out, just so I can feasibly see what did connect. My point is to highlight the danger of corporate monoculture.
Amazon is a powerful entity, and that’s putting it rather mildly. AWS provides cloud computing infrastructure to government departments, businesses, universities, gaming platforms, social media sites and even things like the McDonald’s online ordering app. But as large and powerful as AWS is, it’s also fragile. A single error in a single hub affected 64 internal services, which rippled outward to affect millions of users globally.
In biology, too much genetic similarity results in a slow decline towards extinction. A single disaster can wipe out a species because there isn’t enough variation for some individuals to survive and adapt, especially if the core population is all in the same place.
Obviously, the internet isn’t a biological creature, but it is a microcosm of evolution in real time. Just 25 years ago, a single generation, the internet as it exists today was still a concept in the earliest stages of development. 50 years ago, it didn’t exist at all. Round-the-clock global connectivity is young. And the problem we currently have with it is a lack of diversity in how and where data is stored and backed up, and in how and where that data is communicated between devices. A mere handful of tech companies like Amazon hold the majority of the power over the web’s operations and functions. It would not take much to create a global digital disaster, and we should be looking at ways to spread the logistics of keeping the internet running a little (a lot) more evenly.
Posted on LinkedIn 10/21/25