David Goldman, CNNMoney.com (April 22, 2011)
"This was never supposed to happen.
"Amazon Web Services is the Titanic of cloud hosting, designed with backups to the backups' backups that prevent hosted websites and applications from failing.
"Yet, like the famous ocean liner, Amazon's cloud crashed this week, taking with it Reddit, Quora, FourSquare, Hootsuite, parts of the New York Times, ProPublica and about 70 other sites. The massive outage raised questions about the reliability of AWS and the cloud itself.
"It was supposed to work like this: Thousands of companies use AWS to run their websites through a service called Elastic Compute Cloud, or EC2. Rather than hosting their sites on their own servers, these customers turn to Amazon, which essentially rents out its unused -- and highly intricate -- server capacity.
"EC2 is hosted in five regions across the globe: Northern Virginia, Northern California, Ireland, Tokyo and Singapore....
Spread out that way, with facilities on three different continents (two, if you count Eurasia as one continent), Amazon's EC2 looks pretty safe, with five-way redundancy. A big question, as CNNMoney.com's article pointed out, is - - -
Cloud Computing Meets Murphy's Law
"...So what went wrong exactly?
"Amazon (AMZN, Fortune 500) has been tight-lipped about the incident, and the company said it won't be able to fully comment on the situation until it does a 'post-mortem.' So it's not clear yet exactly how the problem occurred.
"But bits and pieces of information from Amazon, its customers and cloud experts help to explain what happened.
"Thursday's crash happened at Amazon's northern Virginia data center, located in one of its East Coast availability zones. In its status log, Amazon said that a 'networking event' caused a domino effect across other availability zones in that region, in which many of its storage volumes created new backups of themselves. That filled up Amazon's available storage capacity and prevented some sites from accessing their data.
"Amazon didn't say what that 'networking event' was....
The Lemming doesn't blame Amazon for not saying what the "networking event" was. It's possible that the techs who keep EC2 working (most of the time) don't know themselves. With a SNAFU this big, whoever's in charge of EC2 would reasonably want to be really certain of facts before publicly stating anything.
Then, there's the possibility that the "networking event" was something trivial-sounding, like the possible "wiring problem" cited in the article.
Whatever the explanation, the Lemming's glad to be well clear of having to explain what happened. As well-known as Murphy's law is, "...anything that can go wrong will go wrong" (Princeton's WordNet), folks often seem surprised when it crops up in something they're using.
The Usual Anonymous Experts
"...EC2 is so simple to use -- a credit card and a few keystrokes literally gets your business into the cloud -- that some experts say can give a false sense of security. They see in Amazon customers a certain level of naivety that nothing could possibly go wrong.
"Of course, things go wrong and systems fail. Other cloud-hosted products like Google's (GOOG, Fortune 500) Gmail have gone down from time to time...."
The Lemming's gotten a bit tired of the anonymous "experts" that pop up in the news. It's one thing to have a reputation, and make a statement based on that reputation; it's something else to come up with this sort of 'false sense of security' proclamation.
In contrast, there's this non-anonymous chap quoted in the article:
"...'Amazon's products are only as good as the people putting the architecture up,' said Michael Kirven, co-founder of cloud services provider Bluewolf. 'If you put all of your eggs in one basket, you put yourself at risk.'..."
True enough - and fairly obvious, when you think about it.
Which brings up what the Lemming sees as a serious problem with cloud computing.
The idea of getting access to (relatively) low-cost storage and computing power, over (generally) reliable connections that (usually) don't stay down for long, looks good. And it might make sense, if a company's data is backed up through an independent system - and it doesn't matter if operations stop dead in their tracks on occasion.
It's one of those things, in the Lemming's opinion, that looks good on paper; may be practical in some cases; and isn't going to solve all our problems.
Cloud Computing and Dilbert
There's been an interesting evolution - maybe - in Dilbert's attitude toward cloud computing:
November 19, 2009
January 7, 2011