Why do enterprise production deployments go wrong so often?
The thought typically occurs to senior IT managers at some point in their careers - especially during times of rapid growth, severe resource limitations or big organizational changes. A weekend project failure, or a series of failures, followed by the inevitable post-mortem and lessons learned, inspires new management initiatives in an attempt to fill the gaps. But driving real progress and consistent performance across a large organization takes time. Your project deployments continue to falter and require several tries to achieve success. Your business community continues to wait for new technology. Nobody’s happy!
In the photograph we see that the Queen’s Guard had a bad day on the parade grounds. The non-reaction by the men who are still at attention is interesting. Their colleague lies unconscious with his face plowed into the ground. The other troops stand firmly at attention and ignore him. No assistance is offered. But, in fact, the Queen’s Guard is executing their contingency plan flawlessly. Stand at attention, ignore him, and await orders. That’s the plan.
In contrast, the science and art of planning enterprise IT production deployments across global systems requires a more sophisticated strategy.
Weekend deployments of ECM application software upgrades, infrastructure expansions, new custom applications, content migrations and the like can be jeopardized by a technical surprise in one of the integrated product stacks or by a project engineer making a configuration mistake. If the unexpected problem can’t be resolved within the narrow window of weekend time, then backing out is the only option. Executing the back-out procedure adds new technical and business risks. Finally, after a lot of work and stress - nothing accomplished. A Busted Weekend!
The aftermath of the Busted Weekend begins with the ticking sound of a freshly wound Late IT Project Clock. IT leadership is motivated to urgently resolve the open technical issues and schedule a second try at a future date. It’s not over yet!
If this is a familiar scenario, please allow me to offer some ideas and encouragement. With a clear-eyed view of what’s been going wrong, and why, it’s certainly possible to nail your ECM production cut-overs on the first try!
An honest and rational IT leadership re-assessment of production system deployment risk is the right starting point.
Large and geographically dispersed organizations are ripe for overly optimistic “group think” estimation of system-wide risks. It’s human nature to focus primarily on the piece that you own personally, be confident in your own abilities, and also harbor an unconsciously positive bias about the capabilities of other people - especially if you’re not personally close to their activities or the various managers responsible for global team members. See the “References” section of this document for interesting links from the cognitive and management sciences about the human tendency to underestimate complex risk (for a live demonstration visit the nearest casino!).
To illustrate the difference between an intuitive, but flawed, risk assessment and one that is solidly grounded in a quantitative analysis, let’s imagine an ECM production system deployment planned for an upcoming weekend. The project goal is to upgrade the ECM software stack, configure additional virtual server capacity and then deploy updated workflows and custom applications into the production environment. This major project requires that 5 technical specialists make the needed system changes in a coordinated sequence:
- Systems Engineer #1 - Server OS and storage path updates
- Systems Engineer #2 - Network and load balancer update
- Systems Engineer #3 - ECM software upgrade and expanded capacity configured on new servers
- Developer - Custom application and workflow updates and redeployments
- Database Administrator - Database update
It’s essential that these 5 activities are executed with precision and that the new technologies work as expected to avoid a Busted Weekend. Upon reflection, you estimate about a 95% confidence level in success of the plan, people, products, and integrating technologies at each of the 5 steps. Based on this assessment, you might even report to your boss that you have a 95%+ confidence level in overall success.
But that would be a mistake.
The actual, mathematically sound probability of success for a project with 5 key process stage gates, each with an estimated 95% success rate, is only about 77%. That is roughly a one-in-four probability of failure. This degradation from 95% confidence in individual elements is the result of a significant mathematical headwind inherent to multi-step probability calculations. Overall probability of success is not a simple average of the individual critical step probabilities (as human nature is unconsciously inclined to assume). Instead, when the steps are treated as independent, the overall probability of success is the product of the individual critical node probabilities: .95 x .95 x .95 x .95 x .95 = 77% (rounded).
If you are only 85% confident in 2 of the 5 steps, then the odds of success plunge to 62%.
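For readers who want to check the arithmetic themselves, here is a minimal Python sketch. It assumes the 5 steps are independent (the same simplification noted in the References), and the function name `overall_success` is just an illustrative label, not part of any tool. It needs Python 3.8 or later for `math.prod`.

```python
from math import prod


def overall_success(step_confidences):
    """Probability that every step succeeds, assuming the steps are independent."""
    return prod(step_confidences)


# Scenario 1: 95% confidence in each of the 5 steps.
five_at_95 = [0.95] * 5

# Scenario 2: only 85% confidence in 2 of the 5 steps.
two_at_85 = [0.95, 0.95, 0.95, 0.85, 0.85]

# The intuitive (but wrong) estimate: a simple average of the step confidences.
naive_average = sum(five_at_95) / len(five_at_95)

print(f"Naive average of step confidences: {naive_average:.0%}")
print(f"All five steps at 95%:             {overall_success(five_at_95):.0%}")
print(f"Two of five steps at 85%:          {overall_success(two_at_85):.0%}")
```

The average stays at 95%, while the product drops to about 77% and 62% for the two scenarios.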
The enterprise risk assessment lesson here is that in addition to the basic project management responsibility to identify production deployment risks and assign individual ownership to “mitigate” them, it’s also wise to honestly assess and prepare for the meta-level statistical risk of production deployment failure. Note that the overall probability of failure can be unexpectedly high while at the same time the risk calculation for individual steps is considered to be fairly low.
The key technology stacks in the ECM integrated solution – virtual servers and storage, network, ECM software, custom applications, database – plus the systems engineers and administrators making the changes should each be counted as a potential single-point-of-failure.
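To get a feel for how quickly the odds erode as more single points of failure are counted, the short Python sketch below runs the same product rule in both directions. Two assumptions are mine and purely illustrative: that every node is independent and gets the same 95% confidence, and that counting the 5 technology stacks and the 5 people separately gives 10 nodes. The function name `required_step_confidence` is hypothetical as well.

```python
def required_step_confidence(target_overall, n_nodes):
    """Per-node confidence needed for n independent, equally reliable nodes
    to reach the target overall probability of success."""
    return target_overall ** (1.0 / n_nodes)


# 5 nodes: the five deployment steps.
# 10 nodes: illustrative assumption, five stacks plus five people counted separately.
for n_nodes in (5, 10):
    overall = 0.95 ** n_nodes
    needed = required_step_confidence(0.95, n_nodes)
    print(
        f"{n_nodes} nodes at 95% each -> {overall:.0%} overall; "
        f"a 95% overall target needs about {needed:.1%} per node"
    )
```

Under these assumptions, a genuine 95% overall confidence level requires roughly 99% confidence in every individual node, which is a much higher bar than most intuitive estimates clear.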
The most effective way to collapse the meta-level, enterprise-wide, ECM production deployment risk is to make sure you have real ECM systems engineering experts on your project team. Their extensive product training, certification, and experience with a broad variety of enterprise environment challenges will serve you well during all phases of major ECM projects. ECM systems engineering experts will also help drive the whole project team’s success with updating (or troubleshooting) the integrated enterprise technologies. In short, having the right people with the right ECM skills and experience on the project team will significantly reduce your enterprise production deployment risk.
With complex IT projects, we don’t have the Queen’s Guard option of simply continuing to completion while ignoring a major individual failure. Make sure you have seasoned ECM experts on your project team. Avoid the Busted Weekend!
References
The Queen’s Guard: http://en.wikipedia.org/wiki/Queen's_Guard
Cognitive abilities and superior decision making under risk: A protocol analysis and process model evaluation; Edward T. Cokely and Colleen M. Kelley in Judgment and Decision Making, vol. 4, no. 1, February 2009, pp. 20-33; (mb: This study describes how people don’t intuitively perform mathematically sound calculations of risk probability or “expected-value calculations” in the words of the article).
The Dunning-Kruger effect: http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
How to Calculate Probability, Part 2 of 4: Calculating the Probability of Multiple Random Events; www.Wikihow.com/Calculate-Probability (mb: note that for purposes of simple calculation, the production deployment steps described in the blog article are treated as “independent” probability events, meaning that your confidence in success is estimated in advance, and the success or failure of any one step does not directly influence the success probability of other steps. An analogy would be to have 5 jars of jellybeans, each with 19 blue and 1 red, representing a 95% chance of blue in each. If you randomly select one jellybean from each jar, then the probability of getting 5 in a row that are blue is .95 x .95 x .95 x .95 x .95 = 77%).
Checklist for ECM Success – 14 Steps; AIIM Training: By Betsy Fanning, Director, Standards and Chapter Relations.
Imagine Solutions Platform Services are available to help make your next ECM project very successful!