The important quote from the timeline:
Mar 01 9:41 AM PST
We want to provide some additional information on the power issue in a single Availability Zone in the ME-CENTRAL-1 Region. At around 4:30 AM PST, one of our Availability Zones (mec1-az2) was impacted by objects that struck the data center, creating sparks and fire. The fire department shut off power to the facility and generators as they worked to put out the fire. We are still awaiting permission to turn the power back on, and once we have, we will ensure we restore power and connectivity safely. It will take several hours to restore connectivity to the impacted AZ. The other AZs in the region are functioning normally.
This reminds me of a visit to an Equinix data centre where the sales person was droning on and on about how incredibly reliable their power supplies were, how uninterruptible everything was, etc, etc…
Essentially, he was trying to assure us that no-no-no, we don’t need multiple zones like the public clouds, they can instead guarantee 100% uninterrupted power under all circumstances.
A bit bored and annoyed, I pointed to the giant red button conspicuously placed in the middle of a pillar and asked what it was for.
“Oh, that’s in case there’s a fire!”
“What does it do?”
“It cuts… the power… uhh… for the safety of the fire department.”
“So… if there’s a wisp of smoke in a corner somewhere, the fireys turn up, the first thing they do is… cut the power?”
“… yes.”
“Not 100% then, is it?”
Should have pushed it.
> we will ensure we restore power and connectivity safely
This would require human intervention, and I am a bit worried: what if a strike happens again and human lives are lost?
IIRC there have been cases in history where the same location was targeted across multiple days. Obviously, AWS might have local employees working in the region, but would the relevant team at AWS evaluate this threat? What if they try to bring the service back, missiles strike again, and human lives are lost? Let's just hope that is part of the evaluation as well.
> this would require human intervention
That's the difference between heroes and ordinary employees who bitch about having to go into the office twice a month.
Same as the stories you hear of guys taking snow-cats up a mountain in a blizzard to restore phone circuits or radio transmitters that have gone offline.
Man, don’t be a “hero” trying to restore a lower ping to someone trying to buy a Kindle in Jeddah.
What about local hospitals that may have service from that data center? There are heroes needed everywhere, all the time.
In that case, the hero was the person who avoided relying on a single AZ when they deployed to cloud.
I'm sure Bezos will be really happy someone is being a hero for him in a war zone while he sails his newest yacht to wherever the new version of the island is.
On second thought, there is a difference between restoring critical infrastructure in times of crisis and restoring bot infrastructure for Indian spamming operations. Choose wisely.
But I mean, are the employees safe at home? I guess if they really targeted the data center then home is safer, but in the fog of war maybe the data center wasn't the target?