What do you do when a ShowStopper escapes into production?

When you are a true professional, you see failure
as an opportunity to become a better professional.

Here’s a real-life scenario; let’s see how many of you can relate to it:

It is the middle of a regular Tuesday afternoon. You are the QA Manager of a company developing an Enterprise Application, and last week your team released a minor version that included the fixes from all the patches of the last two months, plus four minor features that product marketing requested in order to “close some pretty important deals”.

All of a sudden the phone rings. It is your R&D director, “inviting” you to an urgent meeting in his room.

You arrive to find the R&D director there, together with his development team managers, the product marketing manager, and the support team leader in charge of your product, who is standing next to the whiteboard…

As you sit down, the support team leader tells all of you about an urgent showstopper that was released in last week’s version and is expected to affect about a third of the companies that install this upgrade.

I want to stop this scenario (that happened to me about 7 years ago) to ask you a simple question:

What would you and your team do in this situation?

I believe that no two teams would react in the same way, and I don’t want to come up with best- and worst-case scenarios, but here are two contrasting approaches to serve as points on the continuum of possible behaviors.

Scenario No. 1 – Blame & Panic

Step 1
The meeting turns into a witch-hunt where development blames testing for not finding the bug, then testing blames development for not documenting all the changes made in the system, then development and testing blame product marketing for pushing the teams to release even though not all the tests had been completed, etc…

Step 2
After the meeting dissolves without any clear action items, support starts telling customers not to install the new release, the programmers start working on a solution without fully understanding the problem, and you are left on the side wondering how you missed this bug and trying to find the person who should be held responsible for it.

Step 3
Since the developers think this is absolutely urgent, they decide to send the fix directly to support and, in parallel, to your team for validation and verification.
They do this at 8:30 PM, when no one is left in the office, so you can only start testing it at 8:30 AM the next day. About 30 minutes into your testing cycle you start finding bugs in their new version.
The problem is that your support team has already started delivering this fix to the first customers who complained about the bug.

Step 4
By midday your team has finished testing the system. They found that the initial fix only works on about half the supported configurations and, more importantly, that it causes a regression bug in an area not directly related to the fix.
Within 30 minutes you get a new version that is verified and released to the support team by 4:00 PM.

Step 5
Customer support wants to kill both your testing team and the developers, because now they need to find every company that downloaded the first fix and call to ask them not to install it or, even worse, to install yet another fix on top of it.
Product Marketing is also mad at you, since they have already started getting calls from customers, account managers, and even some of your company’s top executives, all complaining about the mess and the bad publicity this fiasco is creating in the field. They let you know that, as a result, your company will need to offer large discounts to all the customers who complain, and they think this issue may cause a number of important deals to be delayed or lost…

Step 6

(Step into your time-machine and fast-forward 3-4 months ahead)

Everything is still the same. No one was fired over the fiasco, but the atmosphere was tense for about a week afterwards; after that it became water under the bridge.

Your team is about to release another minor version including all the patches of the last months, plus another three features needed for important deals.
As always, product marketing is pushing for the release to go out on schedule, even though your team got the final build a week late and you learned only yesterday that it includes another feature that you were not even aware of…

As they say…

 

Scenario No. 2 – Solve, Learn & Improve

Step 0 – Don’t Panic!

The last thing you want to do is start looking for someone to blame. Chances are that more than one person is “to blame” for the mistakes that led to the issue being released. It is almost certain that no one did this on purpose and, most importantly, blaming will not help you solve the issue!

So try to resist your basic instinct to blame someone else for the issue.

If another member of the team starts the blame game, immediately ask them how this contributes to solving the issue any faster. You can also state that there will be enough time, after the issue is solved, to understand what went wrong and why.

Step 1 – Fix & Test
OK, so there is a bug out there and you had better fix it quickly!
Put together the best team you can that will (1) analyze the issue, (2) define the quickest, safest and most effective fix, (3) create this fix, and finally (4) recommend how to test it.
Choosing a testing approach is not a simple decision either. Depending on the bug and the product you are working on, you may choose to deliver the fix directly without running any tests and verify it only after the issue has been solved (for example, if your application is web-based and the servers are down, it is better to get them back up first and test once they are running). On other occasions you may choose to run some or all possible tests in-house before releasing the fix; this is usually the case when the bug is not really critical and the fix could cause bigger problems, such as data loss or business disruption.

In short, the first step is to create the best solution and find the most appropriate approach to get it to your customers.

Step 2 – Analyze
Once the critical part is over and the issue has been solved, but before people “move on” with their lives and tasks, it’s better to make sure you understand what went wrong.
The analysis process should never be a witch-hunt. It should be an opportunity where everybody feels safe to collaborate and bring forward all the factors, both internal and external, that contributed to the problem happening in the first place.
This activity is fairly common and it is called a retrospective or post-mortem.

Step 3 – Corrective actions
Once the factors and issues have been identified as part of the post-mortem, the next step is to define corrective actions to prevent this from happening again.
Make sure these actions are clearly defined and actionable (duh!).
Many times we see stuff like “make sure communication is better”, but this is not actionable at all, and it will not help anybody change the way they have communicated up to now.
So as dumb as it may sound, make sure your corrective actions are actionable and that they will lead to a change in the way things have been done in your company up to now.

Strong teams share their failures, while weak teams sweep them under the rug

I remember coming up with a phrase a couple of years ago as part of a presentation I did for a group of testers: “When you are a true professional, you see failure as an opportunity to become a better professional”.

I have seen many development teams that perform retrospectives but are then afraid or ashamed to share the results with the rest of the Company. Would you blame the team for being self-centered and egotistical? I would actually blame their companies for not making sure the work atmosphere encourages teams to take risks and learn from their failures…

You need to make sure your team, and preferably your Company, encourages the sharing of retrospectives and corrective actions.  It is one of the best sources of free advice available.

Also, if you look closely, teams with the confidence and maturity required to openly share their risks and failures are also the most fun and challenging teams to work in.

And if all this was not enough to convince you, just go ahead and read the Dilbert Comic Strip published earlier this week on the subject. Things that make you go hmmmm.

How would your team react?

Any war stories or insights into good or bad reactions?
