Enjoy this special Valentine’s Day edition of True Tales of Survival: Twenty Years of (Server) Solitude. Names have been redacted to protect the (possibly not so) innocent.
The following is a true story from [redacted], now a senior staff engineer at [redacted]. We collected this and other true tales of technical survival in The Survival Guide for Engineers: Expert advice for handling workload (and work-life) disasters, published by Cockroach Labs. This free book, deftly and hilariously illustrated by Giovanni Cruz, offers top tips from experts for surviving your job, surviving the workplace, and surviving whatever comes next (which, these days, could be anything).
I used to work for [redacted], a provider of IT production and disaster recovery services started by an oil company that had experienced their own catastrophic IT outage. After learning the hard way that they needed to set up disaster recovery for their own business, they realized they could sell disaster recovery solutions to other companies. This spinoff turned into a very successful enterprise that over a few decades grew to 45 data centers and data recovery centers in five different countries.
At one point, leadership decided they wanted to consolidate some of these data centers. During the consolidation project, the engineers went to begin decommissioning one of them, and there they found an AS/400 server.
That in and of itself was no surprise. It’s a data center; there are tons of servers. But this one they found under a desk. Nobody who worked there had even known it was under there, where it had apparently been running, completely on its own, for 20 years.
No one had the slightest idea what this mystery server did. Was this server for the company? For customers? Was an actual customer application running on this, or maybe someone’s database? No one had the slightest clue as to why it was here or what it did or how it had just been left to run on its own for two decades.
The discovery actually held up the consolidation initiative for this data center, because they couldn’t unplug it until they knew what it did! For almost a year, people were trying to figure out: what does this thing even connect to? No one could solve the mystery. There was no documentation, and nobody in the company knew anything about it. Finally, though, the deadline for vacating the facility arrived.
There was no other choice: they just had to unplug it and wait to see what would happen next.
What happened next was, some random guy out of, like, Arizona emailed to say, “Hey, my website stopped working! What did you guys do???”
That was it. The only complaint.
Everyone had been waiting to see what terrible disaster would unfold, what critical things might unexpectedly break, as a result of unplugging this mystery server. This came after an entire year of trying, and failing, to find out what it did or what it was for, followed by a moment of true terror when we actually unplugged it.
To me, the funniest part is that they actually transported this server to the new data center and plugged it in again. There was an RTO/RPO (recovery time objective/recovery point objective) agreement with the customer, so they had to make sure that it got plugged back in as soon as they got the email. That AS/400 is probably still there, going into its third decade, just doing its job.
— [name redacted], Senior Staff Software Engineer