The following is based on a true story 🙂
Imagine you are hired to work as a software tester (or test lead) for a company that develops a web-based product.
When you get to the place you notice that these guys are really sharp and work with the latest technology and methodologies available; but very quickly you also notice a very strange behavior in the company: they run only a small number of tests before rolling new features out into production!
So you think to yourself: “EUREKA! I found the first thing to change in the company: release better products by running more tests before rolling stuff out!”
But when you take a closer look at the process and its numbers you realize two things:
1. Even though some bugs are being released into production, their numbers are not a lot higher than in other places you’ve worked. And what’s even more interesting is that whenever these bugs are released they are handled very quickly (in a matter of minutes) and with next to no effect on the users.
2. If you were to introduce more testing into the system (even automation!) it would dramatically increase the cost and extend the development cycle, delaying time to market to levels far from what the company is willing to accept.
So what can you do?
Well… apparently you should not go out and automatically dictate that more tests should be run! Maybe you should start by understanding what they are doing and why?!
Sometimes exhaustive testing is not the best solution
It is pretty cool when you are working in ways that may seem unconventional and all-of-a-sudden someone shows you that other people are doing just about the same and writing about it in their blogs 🙂
I wanted to thank wonitta who last week pointed me to a post by Gojko Adzic where he wrote about companies that manage to reduce their exhaustive regression testing by using alternative approaches. I won’t repeat what Gojko wrote, but I definitely recommend you read the article.
What I can do is explain how the company that I wrote about above successfully (IMHO) handles its development and testing process. They work with a number of principles that help them achieve their goal of very-fast-time-to-market while maintaining high levels of product quality.
No magic spells, only common sense and discipline
I don’t think these guys are unique, nor do they use any technique or technology that is out of reach to most development teams today. They have simply evolved their process to adapt to their highly competitive business environment, looking for non-trivial ways of achieving their goals.
The process is not hard, but it does require discipline and the cooperation of all the team (developers, testers, product, etc).
Here are the principles I was able to identify from their process:
1. Short iterations & small incremental changes – by working on small agile cycles and making sure they break down large features into smaller iterations they are able to lower the complexity and the underlying risk of each release.
2. Good designs & risk analysis from the start – it is incredible how people always say that it is better to catch bugs in the design phase, but we still never do anything about it! Right 😉
Well, when you don’t have time to make mistakes it is better you work right from the start, and the best way to do this is by reviewing your assumptions and the design of your feature before you write your code.
By doing this these guys are able to make a good product (features, GUI, UX, integrations, etc) from the start; and by identifying and analyzing the high risk areas up-front they are able to design and write the feature in ways that lower these risks significantly.
The math is really simple: LOWER RISK = FEWER BUGS.
3. All the team tests – one thing about good developers is that they are not snobs!
In order to achieve high quality products this company has an “Agile Testing” whole-team-tests policy. Even though it is assumed that testers can perform tests better than programmers, this doesn’t mean that the latter should not run their own tests.
Testers help programmers define their tests, and in the most risky features they also perform a large number of the tests themselves, but there are also features where testers will not run a single test and all these tasks will fall upon the shoulders of the programmers themselves.
I think that this policy serves 2 main purposes: (1) it relieves testers from being the bottlenecks in the process, and (2) it gives developers a higher sense of responsibility towards their programming standards (you take more careful steps on the tight-rope when you are your own safety net!).
4. Continuous integration – if you don’t know what this means then go and read more about it here.
The best thing about CI is that it allows the team to work on a continuously stable product, to find trivial regression bugs fast and to fix them even faster.
5. Organized pushes to production – releases and changes to production need to be controlled, performed gradually, and scheduled in advance to make sure there are no conflicting pushes being done simultaneously.
Multiple changes to the same components, especially when done by different internal teams, automatically increase the risk of having bugs in production.
6. Gradual deployments – these allow you to deploy your code while keeping the effect it may have on your system under control.
By using techniques like A/B testing with limited percentages, or allowing only a subset of your users to access a specific change you are able to contain the effect of the changes to a small number of your users and thus limit the risk to your business.
These “production tests” are similar to Beta programs and provide the same type of information but a lot quicker!
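To make the "limited percentages" idea a bit more concrete, here is a minimal sketch of one common way to do it: hash the user into a stable bucket and only expose the change to the buckets below the rollout percentage. This is not the company's actual code; the function name `in_rollout`, the feature name `new-checkout` and the user ids are all made up for illustration.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Return True if this user falls inside the rollout percentage.

    Hypothetical helper: hashing feature + user id gives every user a
    stable bucket, so the same user always gets the same answer.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # 0..9999, i.e. 0.01% granularity
    return bucket < percent * 100


# Expose the (made-up) "new-checkout" change to roughly 5% of users.
users = [f"user-{i}" for i in range(10_000)]
exposed = sum(in_rollout(u, "new-checkout", 5) for u in users)
print(f"{exposed} of {len(users)} users see the change")
```

Because the bucket depends only on the feature and the user, raising the percentage later keeps the already-exposed users exposed and simply adds more, and setting it back to 0 turns the change off for everyone.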
7. Extensive monitoring – because when you have a bug in production you want to be the first to know about it and to get all the information up-front.
There are a number of different monitoring tools available, but putting them in place is only half the work. You also need to develop your product in a way that it will integrate with the monitoring tools you use, and provide them with the information that will allow the team to understand the issue and fix it right away.
In a sense it is like having good-old Dr. Watson, but being able to solve the issues reported by your users RIGHT AWAY (instead of in the next version of Windows).
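As a rough illustration of what "integrating with your monitoring" can mean, the sketch below keeps a sliding window of recent request results, remembers the context of the last failure, and says when the error rate crosses a threshold. The class `ErrorRateMonitor`, its parameters and the sample context are assumptions of mine, not a description of any specific monitoring tool.

```python
from collections import deque
from typing import Optional


class ErrorRateMonitor:
    """Hypothetical in-process monitor over the last `window` requests."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        self.results = deque(maxlen=window)  # True = ok, False = failed
        self.threshold = threshold
        self.last_failure: Optional[dict] = None

    def record(self, ok: bool, context: Optional[dict] = None) -> None:
        self.results.append(ok)
        if not ok:
            # Keep the details the team will need to understand the issue.
            self.last_failure = context

    def error_rate(self) -> float:
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def should_alert(self) -> bool:
        # Only alert once the window is full, to avoid noise right after start-up.
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.error_rate() > self.threshold


monitor = ErrorRateMonitor(window=10, threshold=0.2)
for _ in range(8):
    monitor.record(True)
monitor.record(False, {"release": "example-release", "error": "timeout"})
monitor.record(False, {"release": "example-release", "error": "timeout"})
print(monitor.error_rate(), monitor.should_alert())
```

In a real system the alert would go to whatever paging or dashboard tool the team already uses; the point is that the product itself hands over the release and error details up-front.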
8. Predefined rollback and patching process – when you want to do something quickly and correctly under pressure you better define in advance what you want to do. There is a saying in Hebrew that goes: “Work hard in practice, so that it comes easily during battle”.
The team should have a step-by-step rollback procedure with the scripts to run, the files to modify and all the rest of the operations required in order to quickly return the system to its last known good state (how it was before the release). Since these operations may change from release to release it is necessary to create this procedure for each push (or at least to review the generic process and make the needed changes).
Remember also that rollbacks should not be the only option. There should also be a procedure that defines what bugs can be “fixed in production” and how this should be done and tested.
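One way to picture a predefined rollback is to write every push as a list of steps where each step already comes with its own undo, so nobody has to improvise under pressure. The sketch below is only an illustration under that assumption; `push_release`, the step names and the `healthy` check are hypothetical and stand in for real scripts.

```python
from typing import Callable, List, Tuple

# (name, apply, undo): every step is defined together with its rollback.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def push_release(steps: List[Step], healthy: Callable[[], bool]) -> bool:
    """Apply the steps in order; on any failure undo the completed ones in reverse."""
    done: List[Step] = []
    try:
        for step in steps:
            _name, apply, _undo = step
            apply()
            done.append(step)  # only steps that finished get rolled back
        if not healthy():
            raise RuntimeError("health check failed after push")
        return True
    except Exception as exc:
        print(f"push failed ({exc}); rolling back")
        for _name, _apply, undo in reversed(done):
            undo()
        return False


log = []
steps = [
    ("config", lambda: log.append("apply config"), lambda: log.append("undo config")),
    ("code", lambda: log.append("apply code"), lambda: log.append("undo code")),
]
ok = push_release(steps, healthy=lambda: False)
print(ok, log)
# prints: False ['apply config', 'apply code', 'undo code', 'undo config']
```

Reviewing this list before each push is the code equivalent of reviewing the generic procedure and making the needed changes for the release at hand.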
9. Continuous learning and retrospective culture – because mistakes will be made, but you should learn from them in order not to commit them again.
The team has in place a process where each problem detected in production (bug, configuration issue, etc) is reviewed in a (semi) formal retrospective session, with lessons learned and action items to ensure the same issue won’t happen again.
An important fact to understand is that no two companies are the same; and this is sometimes true also for two groups or products within the same company. Technology, company culture, and most importantly your users and the way they interact with your product, will define the way you develop and deploy your apps and the degrees of freedom you have as part of this process.
There are some industries where the price to pay for a mistake is really high. If you work on a life-sciences project where any bug can mean life or death for a patient, in the banking industry where a mistake can mean thousands or millions of dollars being mishandled, or in aviation where bugs can mean a plane going down, then you should do exhaustive testing.
Still, the vast majority of companies working with web-based products don’t fall within the group above, and they have an intrinsic advantage that gives them a higher degree of freedom not available to firms that sell software that is installed in-house.
It is very easy to come up with many reasons why you cannot implement the process these guys use in your company, and you are probably right on most of them. But you also need to take into account the business reality you work in, realize that 95% of the companies world-wide can afford to have a bug or two in production once in a while, and balance this against the advantages of being able to release higher quality products with shorter release cycles.
A final thought:
After all there is no such thing as perfect & bug-free software, and most of us really want to make the best we can for the companies we work for…