Although some details are still vague, Amazon has released “an explanation” of why its systems suddenly shut down last Tuesday.
The outage knocked streaming services such as Netflix offline and delayed operations at Amazon’s package facilities. The servers went down because of a software issue, and Amazon has yet to say whether human error was involved.
Amazon gave its statement in a technical manner, writing on the AWS website, “An automated activity to scale capacity of one of the AWS services hosted in the main AWS network triggered an unexpected behavior from a large number of clients inside the internal network.”
In simpler terms, an automated capacity-scaling activity on the main AWS network triggered “unexpected behavior” from many clients on the internal network, causing havoc across servers in the US-East-1 region.
Forrester analyst Brent Ellis says the bad code caused a “snowball effect,” which in turn caused “their internal controls and monitoring systems [to be] taken offline by the storm of traffic caused by the original problem.”
The AWS servers in that region were overwhelmed by the traffic triggered by that “unexpected behavior,” which Amazon has yet to define. The problem arose between the main AWS network and the “internal network.”
The two networks constantly exchange data, which makes for a complex and fragile ecosystem. During the outage, anything that ran on AWS had connection difficulties.
Workers at Amazon warehouses couldn’t scan packages, robot vacuums that depend on AWS shut down, and ticket sales using AWS slowed to a crawl.
AWS infrastructure is highly technical, but some critics faulted Amazon for not offering a complete explanation.
Cloud economist Corey Quinn stated, “They don’t explain what this unexpected behavior was and they didn’t know what it was. So, they were guessing when trying to fix it, which is why it took so long.”
The outage lasted only a few hours, but people reliant on smart-home technology were left without working refrigerators or doorbells. The last major AWS crash, in 2017, was caused by human error.
Amazon apologized for the crash and said it is taking steps to upgrade its Service Health Dashboard, which shows the status of different AWS services. The dashboard itself was hit by Tuesday’s crash, making it difficult for people to get information about the status of AWS services.