On Monday, October 20, 2025, a massive outage at Amazon Web Services (AWS) sent shockwaves across the internet, disrupting thousands of websites, apps, and online platforms globally. The outage began around midnight Pacific Time, primarily impacting AWS’s critical US-East-1 data center cluster in Northern Virginia—a hub that supports numerous cloud services and powers a significant portion of the internet’s infrastructure.
The root cause, according to AWS, was a malfunction within an internal subsystem responsible for monitoring the health of network load balancers. This glitch led to cascading failures, including a critical Domain Name System (DNS) issue. DNS acts like the internet’s phone book, translating easy-to-remember domain names like “amazon.com” into machine-readable IP addresses. When the DNS service falters, websites and apps fail to connect, leading to widespread outages.
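To make the phone-book analogy concrete, here is a minimal Python sketch of the lookup an application performs before it can connect. The function name `resolve` and its injectable `lookup` parameter are illustrative assumptions, not part of any AWS tooling; `socket.getaddrinfo` is the standard-library call that asks the system resolver for addresses.

```python
import socket


def resolve(hostname, port=443, lookup=socket.getaddrinfo):
    """Return the sorted unique IP addresses for hostname, or [] if the lookup fails.

    `lookup` defaults to the system resolver; it is a parameter only so the
    failure path can be simulated without a real DNS outage (an assumption
    made for this sketch).
    """
    try:
        results = lookup(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name resolution failed: the application has no address to connect to,
        # even if the servers behind the name are healthy.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address.
    return sorted({sockaddr[0] for *_, sockaddr in results})


if __name__ == "__main__":
    # Performs a real lookup; the addresses returned vary by network and time.
    print(resolve("amazon.com"))
```

When the lookup raises `socket.gaierror`, the caller gets an empty list and cannot open a connection at all, which is roughly what client applications experienced when name resolution failed during the outage.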
Among the affected services were some of the internet’s biggest names, including Snapchat, Reddit, Roblox, Fortnite, Coinbase, Robinhood, Venmo, Ring, and even Amazon’s own retail site and Prime Video. Users worldwide reported difficulties accessing social media, gaming, streaming, and financial platforms, underscoring how heavily these services depend on AWS’s cloud infrastructure. For many, platforms essential for communication, entertainment, and financial transactions were unreachable for hours.
Cybersecurity expert David Choffnes of Northeastern University highlighted how centralized the internet’s backbone has become: “When one cloud provider goes down, so much of what we depend on goes down.” The outage revealed the fragility of a global internet architecture in which a failure in a single regional cluster can cascade into disruption across much of the internet.
AWS moved quickly to mitigate the problem, and its service status page confirmed increased error rates and latency. By Monday afternoon Eastern Time, Amazon announced that the core issue had been resolved, though some customers faced lingering performance issues. This incident marks the second major outage at the US-East-1 cluster in four years, leading experts to stress the importance of building resilience by spreading workloads across multiple regions and availability zones.
The impact went beyond consumer inconvenience; businesses relying on AWS for cloud computing, data storage, and AI services faced operational challenges, revealing vulnerabilities in critical digital supply chains. Luke Kehoe, an industry analyst, remarked, “The key takeaway for companies is the need to architect for failure—deploy across multiple zones to minimize risk.”
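One way to picture "architecting for failure" is client-side failover: try the primary region first, then fall back to another if it is unreachable. The Python sketch below is a simplified illustration under stated assumptions, not AWS's recommended implementation. The names `call_with_failover` and `AllRegionsFailed` are hypothetical, and `request` stands in for whatever call an application makes against a regional endpoint.

```python
from typing import Callable, Dict, Sequence, Tuple


class AllRegionsFailed(Exception):
    """Raised when every region in the list failed; args[0] maps region to error."""


def call_with_failover(
    regions: Sequence[str],
    request: Callable[[str], str],
) -> Tuple[str, str]:
    """Try `request` against each region in order and return (region, response).

    `request` is a hypothetical callable that takes a region name and either
    returns a response or raises ConnectionError/TimeoutError.
    """
    errors: Dict[str, Exception] = {}
    for region in regions:
        try:
            return region, request(region)
        except (ConnectionError, TimeoutError) as exc:
            # Record the failure and move on to the next region.
            errors[region] = exc
    raise AllRegionsFailed(errors)


if __name__ == "__main__":
    def simulated_request(region: str) -> str:
        # Simulated outage: the primary region is unreachable.
        if region == "us-east-1":
            raise ConnectionError("simulated regional outage")
        return f"ok from {region}"

    print(call_with_failover(["us-east-1", "us-west-2"], simulated_request))
```

Failover of this kind only helps if the data and services the application needs are already running in the second region, which is why the experts quoted here frame resilience as an architectural decision and not a last-minute switch.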
This outage serves as a stark reminder of the internet’s centralized nature and the cascading risks that accompany it. Looking ahead, companies and consumers alike may push for more diversified and robust cloud strategies to safeguard against future disruptions.