SME_ITDR: The interesting part about ITDR is timing

Different types of measures can be included in disaster recovery plan (DRP). Disaster recovery planning is a subset of a larger process known as business continuity planning and includes planning for resumption of applications, data, hardware, electronic communications (such as networking) and other IT infrastructure.
Disaster recovery – Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/Disaster_recovery

# – # – # – # – # 

The interesting part of splitting this into parts is that no one has a holistic view of a recovery.

The timing of a recovery is essential to success. It really doesn’t matter if one can “turn the lights on” (i.e., power up backup components and get the parts running). What matters is to resync the whole “mess”.

From my first assignment in ITDR to my latest, no one seems to understand that.

From the internal job schedulers, to the external third parties, and in all the intra-system interfaces: everything must be set to a common point in time.
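
One way to picture the “common point in time” is as the earliest of the latest good checkpoints across every component, since nothing can be resynced to a moment that any one participant cannot reach. The sketch below is only an illustration under that assumption; the component names and timestamps are hypothetical and not taken from any real recovery.

```python
from datetime import datetime


def common_recovery_point(last_good: dict[str, datetime]) -> datetime:
    """Return the latest point in time that every component can be restored to.

    Assumption: each component reports the timestamp of its most recent
    known good checkpoint. The common point is the earliest of those.
    """
    if not last_good:
        raise ValueError("no components given")
    return min(last_good.values())


# Hypothetical components and checkpoint times, for illustration only.
checkpoints = {
    "job_scheduler": datetime(2024, 1, 3, 2, 0),
    "third_party_feed": datetime(2024, 1, 2, 23, 30),
    "ledger_interface": datetime(2024, 1, 3, 1, 15),
}

sync_point = common_recovery_point(checkpoints)
print(sync_point)
```

The slowest participant sets the sync point for everyone, which is why one forgotten interface or third party can drag the whole recovery backward.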

And, the “real world” keeps on going without you. So that makes “catching up” even harder.

So, “recovery” must be automated. Push that “big red easy button”, with apologies to Staples, and the systems must “automagically”, on command: instantiate the recovery environment, fall back to a known good restart point, replay the entire transaction “book” between the recovery point and the disaster, and present the systems for acceptance by Business Users.
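
The “fall back and replay” steps can be sketched in miniature. This is a toy model, not a real recovery tool: the `Transaction` type, the integer clock ticks, and the account balances are all hypothetical, and a real environment would replay against databases and interfaces rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    timestamp: int  # hypothetical logical clock tick
    account: str
    amount: int


def recover(snapshot: dict[str, int], snapshot_time: int,
            log: list[Transaction], disaster_time: int) -> dict[str, int]:
    """Rebuild state from a known good snapshot plus the transaction log."""
    # Fall back to the known good restart point (copy, so the snapshot is untouched).
    state = dict(snapshot)
    # Replay, in time order, only what happened after the snapshot
    # and up to the moment of the disaster.
    for tx in sorted(log, key=lambda t: t.timestamp):
        if snapshot_time < tx.timestamp <= disaster_time:
            state[tx.account] = state.get(tx.account, 0) + tx.amount
    return state


# Illustrative data only.
snapshot = {"A": 100}
log = [
    Transaction(5, "A", 50),    # already reflected in the snapshot, skipped
    Transaction(12, "A", -30),  # replayed
    Transaction(15, "B", 20),   # replayed
    Transaction(25, "A", 999),  # after the disaster, skipped
]

recovered = recover(snapshot, snapshot_time=10, log=log, disaster_time=20)
print(recovered)
```

Even in this toy form, the result is only right if the snapshot time, the log timestamps, and the disaster time all refer to the same clock, which is the timing problem in a nutshell.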

That’s a tall order.

In my first assignment, the arithmetic worked out that regardless of when during the processing week a disaster occurred, the environment would always be ready on the following Monday. (Quite a novel discovery. And it shook the Business and IT Leadership awake. “Hey, we need a better BCP for a Monday disaster!”)

Unfortunately, without the holistic view, everyone sees “trees”, but not the “forest”.

It’s a good thing that disasters are relatively rare. Most corporations don’t survive them.

— 30 —