SME_ITDR: Start of Day and End of Day as known recovery points

2015-May-08

Had a discussion the other day about how an application should recover.

It’s obvious that real-time replication of data, databases, and “stuff” is NOT the same as a restart after a recovery.

(Well, maybe not so obvious to the techies who love these things.)

I made a case to a group of IT folks that an “application”, or suite of applications, or a portfolio of applications had to accomplish certain functions:

  • Determine the state of the world when it starts
  • Verify the data and files being presented are “correct” (i.e., IAAA checked)
  • Align all its data and files to the “correct” starting point
  • Run any transaction logs necessary to bring the app up to speed (i.e., roll forward from the recovery point to the current time or the last good transaction time)
  • Allow for correction and catchup to external sources
  • Preserve trails to demonstrate “correct” recovery

If the “application” can NOT do all of these things then it is doomed to a completely manual recovery.
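
To make the list above concrete, here is a minimal sketch in Python of what such a restart routine might look like for a toy ledger. Everything in it is an assumption made for illustration: the `Checkpoint` shape, the `recover` and `digest` names, the transaction format, and the use of a SHA-256 checksum as a stand-in for the fuller "correctness" check. It is not how any particular product does it.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """A known recovery point, e.g. a Start of Day snapshot (hypothetical shape)."""
    label: str
    balances: dict
    checksum: str


def digest(balances):
    """Checksum over a canonical JSON encoding of the snapshot data."""
    payload = json.dumps(balances, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def recover(checkpoint, txn_log, audit):
    """Rebuild state from a checkpoint plus a transaction log (illustrative only)."""
    # 1. Determine the state of the world: which recovery point were we handed?
    audit.append(f"starting from checkpoint {checkpoint.label}")

    # 2. Verify the data being presented matches what was written.
    if digest(checkpoint.balances) != checkpoint.checksum:
        audit.append("checkpoint failed verification")
        raise ValueError("checkpoint checksum mismatch; manual recovery needed")
    audit.append("checkpoint verified")

    # 3. Align working data to the verified starting point.
    state = dict(checkpoint.balances)

    # 4. Replay logged transactions in sequence, stopping at the first gap.
    expected_seq = 1
    for txn in sorted(txn_log, key=lambda t: t["seq"]):
        if txn["seq"] != expected_seq:
            audit.append(f"gap before seq {txn['seq']}; stopping replay")
            break
        state[txn["account"]] = state.get(txn["account"], 0) + txn["amount"]
        expected_seq += 1
    last_good = expected_seq - 1

    # 6. Preserve a trail showing how far the recovery got.
    audit.append(f"replayed through seq {last_good}")

    # 5. The caller uses last_good to request catchup from external sources.
    return state, last_good


# Usage: a Start of Day snapshot plus a log with a missing transaction (seq 3).
sod = {"A": 100, "B": 50}
cp = Checkpoint("SOD", sod, digest(sod))
log = [
    {"seq": 1, "account": "A", "amount": -30},
    {"seq": 2, "account": "B", "amount": 30},
    {"seq": 4, "account": "A", "amount": 5},
]
audit = []
state, last_good = recover(cp, log, audit)
```

In this sketch the replay stops at the gap, so the caller knows to ask upstream for everything after `last_good`, and the `audit` list is the trail. A real application would need far more than a checksum to call its data "correct", but even this toy version shows that restart logic has to be designed in; replication alone does not supply it.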

When an enterprise has thousands of “applications”, usually intertwined in a rat’s nest of complexity, the likelihood of a timely, successful recovery is directly proportional to the enterprise’s “luck”. (More likely to win the Lotto.)

And, it’s interesting what Leadership, Regulators, External Auditors, Internal Auditors, and Risk Managers will accept as “proof” this will all work when needed.

Glad it’s not my paycheck on the “pass line” at this particular crap shoot.

# – # – # – # – #