Amazon's DNS Problem Knocked Out Half the Web, Likely Costing Billions
21 octobre 2025 à 19:45
An anonymous reader quotes a report from Ars Technica: On Monday afternoon, Amazon confirmed that an outage affecting Amazon Web Services' cloud hosting, which had impacted millions across the Internet, had been resolved. Considered the worst outage since last year's CrowdStrike chaos, Amazon's outage caused "global turmoil," Reuters reported. AWS is the world's largest cloud provider and, therefore, the "backbone of much of the Internet," ZDNet noted. Ultimately, more than 28 AWS services were disrupted, causing perhaps billions in damages, one analyst estimated for CNN.
[...] Amazon's problems originated at a US site that is its "oldest and largest for web services" and often "the default region for many AWS services," Reuters noted. The same site has experienced two outages before in 2020 and 2021, but while the tech giant had confirmed that those prior issues had been "fully mitigated," apparently the fixes did not ensure stability into 2025. ZDNet noted that Amazon's first sign of the outage was "increased error rates and latency across numerous key services" tied to its cloud database technology. Although "engineers later identified a Domain Name System (DNS) resolution problem" as the root of these issues and quickly fixed it, "other AWS services began to fail in its wake, leaving the platform still impaired" as more than two dozen AWS services shut down. At the peak of the outage on Monday, Down Detector tracked more than 8 million reports globally from users panicked by the outage, ZDNet reported. Ken Birman, a computer science professor at Cornell University, told Reuters that "software developers need to build better fault tolerance."
"When people cut costs and cut corners to try to get an application up, and then forget that they skipped that last step and didn't really protect against an outage, those companies are the ones who really ought to be scrutinized later."
Read more of this story at Slashdot.