When your job is NOT TO TEST

The following is based on a true story 🙂

Imagine you are hired to work as a software tester (or test lead) for a company that develops a web-based product.

When you get there you notice that these guys are really sharp and work with the latest technology and methodologies available; but very quickly you also notice some very strange behavior in the company: they run only a small number of tests before rolling new features out into production!

So you think to yourself: “EUREKA! I found the first thing to change in the company: release better products by running more tests before rolling stuff out!”

But when you take a closer look at the process and its numbers you realize two things:

1.  Even though some bugs are being released into production, there are not many more of them than in other places you’ve worked.  And, what’s even more interesting, whenever these bugs do reach production they are handled very quickly (in a matter of minutes) and with next to no effect on the users.

and

2.  If you were to introduce more testing into the system (even automation!) it would dramatically increase the cost and extend the development cycle, delaying time to market far beyond what the company is willing to accept.

So what can you do?

Well… apparently you should not go out and automatically dictate that more tests should be run!  Maybe you should start by understanding what they are doing and why?!

Sometimes exhaustive testing is not the best solution

It is pretty cool when you are working in ways that may seem unconventional and all of a sudden someone shows you that other people are doing just about the same thing and writing about it in their blogs 🙂

I wanted to thank wonitta who last week pointed me to a post by Gojko Adzic where he wrote about companies that manage to reduce their exhaustive regression testing by using alternative approaches.  I won’t repeat what Gojko wrote, but I definitely recommend you read the article.

What I can do is explain how the company I wrote about above successfully (IMHO) handles its development and testing process. They work with a number of principles that help them achieve their goal of very fast time to market while maintaining high levels of product quality.

No magic spells, only common sense and discipline

I don’t think these guys are unique, nor do they use any technique or technology that is out of reach for most development teams today.  They have simply evolved their process while adapting to their highly competitive business environment and looking for non-trivial ways of achieving their goals.

The process is not hard, but it does require discipline and the cooperation of the whole team (developers, testers, product, etc.).

Here are the principles I was able to identify from their process:

1. Short iterations & small incremental changes – by working in short agile cycles and making sure they break large features down into smaller iterations, they are able to lower the complexity and the underlying risk of each release.

2. Good designs & risk analysis from the start – it is incredible how people always say that it is better to catch bugs in the design phase, yet we still never do anything about it!  Right 😉
Well, when you don’t have time to make mistakes you had better work right from the start, and the best way to do this is by reviewing your assumptions and the design of your feature before you write any code.
By doing this these guys are able to make a good product (features, GUI, UX, integrations, etc.) from the start; and by identifying and analyzing the high-risk areas up-front they are able to design and write each feature in ways that lower those risks significantly.
The math is really simple:  LOWER RISK = FEWER BUGS.
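
To make the risk-analysis part a bit more concrete, here is one generic, lightweight way of scoring feature areas up-front (the scoring scheme and the example areas below are my own illustration, not the company's actual method):

```python
# risk_score.py - a generic likelihood x impact scoring of feature areas,
# used to decide where to invest design, review and testing effort.
# All areas and numbers below are made-up examples.
areas = {
    # area: (likelihood of bugs 1-5, impact if it breaks 1-5)
    "payment integration": (4, 5),
    "profile page CSS":    (3, 1),
    "search indexing":     (2, 4),
}

# Highest-risk areas first: these get the deepest design review and testing.
for area, (likelihood, impact) in sorted(
        areas.items(), key=lambda kv: kv[1][0] * kv[1][1], reverse=True):
    print(f"{area:22} risk={likelihood * impact:2}")
```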

3. All the team tests – one thing about good developers is that they are not snobs!
In order to achieve high-quality products this company has an “Agile Testing” whole-team-tests policy: even though it is assumed that testers can perform tests better than programmers, this doesn’t mean that the latter should not run their own tests.
Testers help programmers define their tests, and for the riskiest features they also perform a large number of the tests themselves; but there are also features where testers will not run a single test, and all these tasks fall upon the shoulders of the programmers themselves.

I think that this policy serves 2 main purposes: (1) it relieves testers from being the bottleneck in the process, and (2) it gives developers a higher sense of responsibility towards their programming standards (you take more careful steps on the tightrope when you are your own safety net!).

4. Continuous integration – if you don’t know what this means then go and read more about it here.
The best thing about CI is that it allows the team to work on a continuously stable product, to find trivial regression bugs fast and to fix them even faster.
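
Just to illustrate the kind of fast feedback a CI job gives, here is a minimal post-commit smoke check (a sketch of mine, not from the post; the base URL and paths are hypothetical):

```python
# smoke_check.py - a minimal smoke check a CI job could run after every commit.
# The server URL and paths are illustrative, not from the original post.
import sys
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical staging server started by the CI job

def check(path: str) -> None:
    """Fail fast if a core page stops responding after a commit."""
    with urllib.request.urlopen(BASE_URL + path, timeout=10) as resp:
        if resp.status != 200:
            sys.exit(f"{path} returned HTTP {resp.status}")

if __name__ == "__main__":
    for path in ("/", "/login", "/health"):
        check(path)
    print("smoke checks passed - safe to keep integrating")
```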

5. Organized pushes to production – releases and changes to production need to be controlled, performed gradually, and scheduled in advance to make sure no conflicting pushes are being done simultaneously.
Multiple changes to the same components, especially when done by different internal teams, automatically increase the risk of having bugs in production.
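
One toy way of enforcing the “no simultaneous pushes” rule is an exclusive deployment lock; the sketch below is purely illustrative (the file path and messages are made up), but it shows the idea:

```python
# deploy_lock.py - a toy exclusive lock so two teams cannot push to
# production at the same time. All names here are hypothetical.
import os
import sys

LOCK_FILE = "/tmp/production-deploy.lock"

def acquire_push_slot(team: str) -> None:
    """O_CREAT | O_EXCL makes creation atomic: only one pusher can win."""
    try:
        fd = os.open(LOCK_FILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(fd, team.encode())
        os.close(fd)
    except FileExistsError:
        with open(LOCK_FILE) as f:
            sys.exit(f"production is locked by '{f.read()}' - wait for their push to finish")

def release_push_slot() -> None:
    os.remove(LOCK_FILE)
```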

6. Gradual deployments – these allow you to deploy your code while keeping the effect it may have on your system under control.
By using techniques like A/B testing with limited percentages, or allowing only a subset of your users to access a specific change, you are able to contain the effect of the changes to a small number of your users and thus limit the risk to your business.
These “production tests” are similar to Beta programs and provide the same type of information, but a lot quicker!
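
A minimal sketch of how such a limited-percentage rollout can work (my own illustration; the hashing scheme, feature name, and user IDs are not from the post):

```python
# rollout.py - deterministic percentage rollout of a new feature.
# The feature name and user IDs are illustrative.
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Bucket each user into 0-99 with a stable hash, so the same user
    always gets the same answer and raising `percent` only adds users."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent

# Start by exposing 5% of users to the change, watch the monitors,
# then gradually raise the percentage.
if in_rollout("user-42", "new-checkout-flow", percent=5):
    ...  # serve the new code path
else:
    ...  # serve the stable code path
```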

7. Extensive monitoring – because when you have a bug in production you want to be the first to know about it and to get all the information up-front.
There are a number of different monitoring tools available, but putting them in place is only half the work.  You also need to develop your product in a way that integrates with the monitoring tools you use, and provides them with the information that will allow the team to understand the issue and fix it right away.
In a sense it is like having good old Dr. Watson, but being able to solve the issues reported by your users RIGHT AWAY (instead of in the next version of Windows).
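
As one possible sketch of what “providing the monitoring tools with the right information” can look like, here is a structured error event (the field names and setup are assumptions of mine, not the company’s):

```python
# error_events.py - emit one structured JSON line per production failure,
# so a log shipper or monitoring tool can aggregate and alert on it.
# The field names here are illustrative.
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")

def report_error(error: Exception, user_id: str, release: str) -> None:
    """Attach the context the team needs to understand and fix the issue fast."""
    event = {
        "ts": time.time(),
        "level": "error",
        "error_type": type(error).__name__,
        "message": str(error),
        "user_id": user_id,   # who was affected
        "release": release,   # which push introduced it
    }
    logger.error(json.dumps(event))

try:
    1 / 0  # stand-in for real application code failing
except ZeroDivisionError as e:
    report_error(e, user_id="user-42", release="2011-07-20-r3")
```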

8. Predefined rollback and patching process – when you want to do something quickly and correctly under pressure, you had better define in advance what you want to do.  There is a saying in Hebrew that goes: “Work hard in practice, so that it comes easily during battle”.
The team should have a step-by-step rollback procedure with the scripts to run, the files to modify, and all the rest of the operations required to quickly return the system to its last known good state (how it was before the release).  Since these operations may change from release to release, it is necessary to create this procedure for each push (or at least to review the generic process and make the needed changes).
Remember also that rollback should not be the only option; there should also be a procedure that defines which bugs can be “fixed in production” and how this should be done and tested.
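
To make this concrete, here is the bare-bones shape such a predefined, per-release rollback script could take (every command, tag, and file name below is hypothetical):

```python
# rollback.py - a predefined, per-release rollback: run each step in
# order and stop at the first failure. Every name here is made up.
import subprocess
import sys

LAST_GOOD_TAG = "release-2011-07-14"  # recorded at push time, before the new release

STEPS = [
    ["git", "checkout", LAST_GOOD_TAG],         # code back to last known good state
    ["./migrate.sh", "--down", "1"],            # undo this release's schema change, if any
    ["./deploy.sh", "--target", "production"],  # redeploy the known-good build
]

def rollback() -> None:
    """Under pressure you want a script, not someone improvising from memory."""
    for step in STEPS:
        print("running:", " ".join(step))
        if subprocess.run(step).returncode != 0:
            sys.exit(f"rollback step failed: {' '.join(step)}")

if __name__ == "__main__":
    rollback()
```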

9. Continuous learning and retrospective culture – because mistakes will be made, but you should learn from them in order not to repeat them.
The team has a process in place where each problem detected in production (bug, configuration issue, etc.) is reviewed in a (semi-)formal retrospective session, with lessons learned and action items to ensure the same issue won’t happen again.

WARNING:
Not all companies are created equal!

An important fact to understand is that no two companies are the same; and this is sometimes true even for two groups or products within the same company.  Technology, company culture, and most importantly your users and the way they interact with your product, will define the way you develop and deploy your apps and the degrees of freedom you have as part of this process.

There are some industries where the price to pay for a mistake is really high.  If you work on a life-sciences project where any bug can mean life or death for a patient, or in the banking industry where a mistake can mean thousands or millions of dollars being mishandled, or in aviation where a bug can mean a plane going down, then you should do exhaustive testing.

Still, the vast majority of companies working with web-based products don’t fall within the group above, and they have an intrinsic advantage that gives them a higher degree of freedom not available to firms that sell software that is installed in-house.

It is very easy to come up with many reasons why you cannot implement the process these guys use in your company, and you are probably right about most of them.  But you also need to take into account the business reality you work in, realize that 95% of the companies worldwide can afford to have a bug or two in production once in a while, and balance this against the advantages of being able to release higher-quality products with shorter release cycles.

A final thought:

Maybe this is what we should aim for when we define our jobs as QA (Quality Assurance) Engineers and not as Testers?

After all, there is no such thing as perfect, bug-free software, and most of us really want to do the best we can for the companies we work for…


5 Responses to When your job is NOT TO TEST

  1. Omri Lapidot July 21, 2011 at 9:07 am #

    Well, it’s good to see that here at Sears Israel we have managed to implement 7 out of your 9 principles and will implement the rest by September…

  2. ElizaF July 21, 2011 at 10:07 am #

    Fantastic post, and as a model for best practices in non-risk-based environments, I would love to work this way.

  3. Anonymous July 22, 2011 at 7:53 am #

    Hey Omri, 

    You are posing me a challenge: to keep updating the list with the rest of the practices these guys keep adding (and taking out!) to mold to their changing reality – we’ll see if by September I can add some more so that it doesn’t get boring at SEARS IL…

    (btw, there is one thing you can start implementing right away that is not in the list – these guys use PractiTest as their Test Management solution 🙂)

    -joel

  4. Anonymous July 22, 2011 at 7:59 am #

    Hey Eliza,

    Not sure what the nature of your project is, but risk is never black & white; along its continuum there are always practices that can be adopted and those that are out of your reach.

    The idea is to find the stuff you can implement and adapt it to the changing reality of your work.  The best thing the company I wrote about above has is a Kaizen approach to their process – http://en.wikipedia.org/wiki/Kaizen.

    This continuous improvement drive allows them to keep looking at themselves and finding the places where they can keep improving and gaining – at small as well as larger scales.

    Good luck!

    -joel

  5. halperinko July 24, 2011 at 6:57 am #

    I would like to thank Gojko and Joel for raising this issue; many times it’s hard to put a finger on the source of the problem, and your posts clarified some things for me.
    I can’t say I fully agree, especially with the post above, which gives much weight to Agile methods as the cure – mainly since agile, just like waterfall methods, may inflict the same level of failures in legacy parts of the system.
    While there is no doubt about the advantages of Continuous Integration, I claim it is yet another regression method, which should be well focused to reduce redundant effort – and is viable for waterfall just as it is for agile.

    As opposed to the development teams, who mainly focus on new functionality, we as testers bear the burden of supporting all of the company’s legacy (be it products or features of past releases, and even just combinations of supported environments), which leads us into a cycle of constantly increasing content.

    On the other hand, I do agree that we waste too much time & effort on regression, which yields a relatively low number of bugs.
    From my experience, most of the bugs raised in automated regression were raised during the debugging of these test suites, or while reusing them on newly developed units which share much resemblance (again – you can’t call that regression yet).
    I have claimed in the past that we tend to execute most of our regression tests just because they are there, without constantly verifying the usefulness of these actions.
    More than that, I claim that we tend to use atomic tests which are well written for the initial release of new features, and instead of throwing these away and designing much more useful combined test cases for regression, we drag these redundant and useless tests along into the regressions of the next releases.

    I envy the Web & Cloud industries, which can control and get constant feedback from their products, while “commercial” SW/products have much less chance of that (although in some cases improving our SW and user-activity logs, as well as verifying that more of these return from the field and are properly investigated, may give some advantage).
