Bang!
What should have been a routine memory replacement was not going well. It's 4am, and the engineers are staring at POST, wondering how the hell they got bumped onto the night shift and how the hell they're going to get the machine up and running.
Crash!
Down again. Diagnosis: faulty system board, faulty backplane, faulty memory. Jury-rig something until parts arrive.
Fast forward 24 hours: the DBA team and an admin are restoring corrupt datafiles from tape when the box disappears. Sleepy panic ensues.
It's a further 24 hours before normal service resumes.
Architecting downtime
This sounds like a great hypothetical example, only it actually happened last week, and it wasn't that great. I know; I was there.
The folly of not carrying redundant hardware for essential services is exceeded only by an application architecture that makes it nearly impossible to do so.
The applications in question resemble an intergenerational ball of yarn infused with many different colours of trends long since past. Logically disparate systems are so intertwined that they cannot be deployed individually, and as such there are no favourites: the reporting application used by 10 people attains the same level of precedence as the application used by 80,000. Need one, and you've got to deploy the other (and every other application inhabiting the same infrastructure).
I wish it were a joke, but alas...
We, as application developers, have a responsibility to our customers (employers) to create modular, easily deployable systems. Make it your core promise: the consequences of failing to deliver can be dire.
Have a read of this.
