Archive for February, 2008

A pair-wise solution for planning your testing environment matrix

Posted in Best Practices on February 26th, 2008 by Joel Montvelisky – 2 Comments

Most development projects need to handle multiple user environments. This is especially true for web applications and J2EE systems that support multiple back-end and front-end configurations at once. A major concern in these projects is how to test all the platform combinations needed to achieve the required level of environment support.

Since there is no economical or realistic way to test all the possible combinations, I use a method based on the pair-wise (or all-pairs) approach, testing the most common interactions between every 2 relevant environment variables. Citing Wikipedia and a description by Rex Black:

“…the simplest bugs in a program are generally triggered by a single input parameter. The next simplest category of bugs consists of those dependent on interactions between pairs of parameters, which can be caught with all-pairs testing. Bugs involving interactions between three or more parameters are progressively less common, whilst at the same time being progressively more expensive to find by exhaustive testing…”

If so, the question becomes how to plan the testing strategy so that it efficiently covers all the possible pairs. For this I use an iterative heuristic based on simultaneous coverage matrices. The procedure is not as complicated as it sounds, and I will demonstrate it with a simplified example from a typical J2EE application testing process.

Step 1.

List all the relevant parameters and their possible values. Since our Testing Application is a J2EE System we have both server-side and client-side components:

a. Server O/S

b. Server App Server

c. Server Database

d. Client O/S

e. Client Browser

f. Upgrading User Client vs. New User Client (this reflects whether the user has a clean environment or already has some of our previous components on their machine).

Step 2.

Define the testing coverage distribution rate for each parameter. One way of doing this is by thinking of the percentage of users on each environment. In our example we will use the following distributions:

Server Side:

a. Win 2003 30% — AIX 30% — RH Linux 40%

b. JBoss 50% — WebLogic 50%

c. Oracle 9i 20% — Oracle 10g 50% — DB2 30%

Client Side:

d. Win XP SP2 75% — Win Vista 25%

e. IE 6.5 20% — IE 7 40% — FireFox 40%

f. Upgrading Users 60% — New Users 40%
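To get a feel for the numbers involved, here is a minimal Python sketch that counts the full combinations of the values listed above and compares them with the number of value pairs. It assumes parameters are paired only within the server side and within the client side, as the pivot tables in Step 4 do; the names SERVER, CLIENT, full_combinations and value_pairs are only illustrative.

```python
from itertools import combinations
from math import prod

# Number of values per parameter, taken from the lists above.
SERVER = {"Server O/S": 3, "App Server": 2, "Database": 3}
CLIENT = {"Client O/S": 2, "Browser": 3, "User type": 2}

def full_combinations(*groups):
    """Every value of every parameter combined with all the others."""
    return prod(n for group in groups for n in group.values())

def value_pairs(group):
    """Distinct value pairs across every 2 parameters of one group."""
    return sum(a * b for a, b in combinations(group.values(), 2))

print(full_combinations(SERVER, CLIENT))         # 216
print(value_pairs(SERVER), value_pairs(CLIENT))  # 21 16
```

Even in this small example there are 216 complete environments but only 37 pairs to cover, and a single test environment covers several pairs at once.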

Step 3.

We make a list of all our tests and randomly perform an initial distribution of the testing environments based on the rate numbers defined above.

Starting from this step I will use MS-Excel and Data Pivot Tables, mainly since I have not found any testing tool that provides these views in a comfortable way to perform these operations.

In our example I have a total of only 50 tests even though a real testing project may have over 1000 test cases or instances.

See what the list looks like in example_file_No.1
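For readers who prefer a script to a spreadsheet, the initial distribution could be sketched roughly as follows. This is not the Excel file itself: it assumes that each parameter gets an exact quota derived from its rate (so the Grand Totals match the targets) and that the values are then shuffled independently across the tests. The dictionary keys and function names are hypothetical, and only the server side is shown.

```python
import random

# Rates from Step 2, as whole percentages (server side only).
RATES = {
    "server_os": {"Win 2003": 30, "AIX": 30, "RH Linux": 40},
    "app_server": {"JBoss": 50, "WebLogic": 50},
    "database": {"Oracle 9i": 20, "Oracle 10g": 50, "DB2": 30},
}

def quotas(percentages, total):
    """Turn percentages into whole test counts that add up to total."""
    counts = {v: pct * total // 100 for v, pct in percentages.items()}
    # Hand out tests lost to rounding, largest remainder first.
    by_remainder = sorted(percentages,
                          key=lambda v: -(percentages[v] * total % 100))
    for value in by_remainder[: total - sum(counts.values())]:
        counts[value] += 1
    return counts

def initial_plan(rates, total, seed=0):
    """Assign one value of every parameter to each test, at random."""
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    columns = {}
    for param, percentages in rates.items():
        column = [v for v, n in quotas(percentages, total).items()
                  for _ in range(n)]
        rng.shuffle(column)
        columns[param] = column
    return [{p: columns[p][i] for p in rates} for i in range(total)]

plan = initial_plan(RATES, 50)
print(plan[0])
```

Integer percentages are used on purpose, so that a rate such as 75% of 50 tests is rounded in a predictable way (38 and 12) instead of depending on floating-point behaviour.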

Step 4.

For each relevant combination we create a Pivot Table with 2 of the parameters.
i. Server O/S vs. App Server
ii. Server O/S vs. DB Server
iii. DB Server vs. App Server
iv. Client O/S vs. Browser
v. Client O/S vs. User type
vi. Browser vs. User type

For example:

[Image: Pivot Table]

All the information for the tables is taken from our test list. The Grand Total for each parameter represents the target number of tests we want to perform on the individual environment, while the body of the matrix shows the 2-parameter environment combinations that are being covered by our current Test Execution Plan.

Note that the Pivot Tables do not update automatically: after changing the test distribution in the Execution Plan you need to right-click on each table and refresh it.
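Outside Excel, the same view can be approximated with a few lines of Python. The sketch below is only an illustration: the five-row plan is a hypothetical miniature, and pivot and uncovered are made-up helper names. It counts the tests per cell, computes the Grand Totals, and lists the pairs that no test covers yet.

```python
from collections import Counter

# Hypothetical mini execution plan; a real one has one row per test.
plan = [
    {"server_os": "AIX", "app_server": "JBoss"},
    {"server_os": "AIX", "app_server": "JBoss"},
    {"server_os": "RH Linux", "app_server": "JBoss"},
    {"server_os": "RH Linux", "app_server": "WebLogic"},
    {"server_os": "Win 2003", "app_server": "WebLogic"},
]

def pivot(plan, row_param, col_param):
    """Return cell counts plus the row and column Grand Totals."""
    cells = Counter((t[row_param], t[col_param]) for t in plan)
    row_totals = Counter(t[row_param] for t in plan)
    col_totals = Counter(t[col_param] for t in plan)
    return cells, row_totals, col_totals

def uncovered(cells, row_totals, col_totals):
    """Pairs of values that no test in the plan exercises together."""
    return sorted((r, c) for r in row_totals for c in col_totals
                  if cells[(r, c)] == 0)

cells, rows, cols = pivot(plan, "server_os", "app_server")
print(cells[("AIX", "JBoss")], rows["AIX"], cols["JBoss"])  # 2 2 3
print(uncovered(cells, rows, cols))
# [('AIX', 'WebLogic'), ('Win 2003', 'JBoss')]
```

Because the counts are recomputed from the plan on every call, there is no refresh step to forget.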

Step 5.

Our iterative heuristic consists of leveling the numbers within the pivot tables while making sure that the Grand Totals stay as they are right now.

By “leveling” we mean that each row and column needs to have a statistical distribution similar to its respective Grand Totals (these being the distribution rates we defined in Step 2 above for each of the individual parameters).

I will start with the first Pivot Table in the servers tab of the Excel sheet. In principle it does not matter which table or tab comes first, since we will need to cover all of them.

As explained above, each cell in the table represents a testing environment defined by the parameters in its row and column, and the number in the cell is the number of tests we plan to run on that environment under our current distribution. We now need to redistribute the tests across the testing environments according to our coverage rates.

For example, to level the first table:

1. I will move 10 tests from RH-Linux/WebLogic to RH-Linux/JBoss

2. Then I’ll move 3 tests from AIX/JBoss to AIX/WebLogic and another 7 from Win-2003/JBoss to Win-2003/WebLogic in order to maintain the grand total numbers with the original distribution (50% JBoss & 50% WebLogic; 30% AIX & 40% RH-Linux & 30% Win-2003).

You should end up with the following table:

[Image: Pivot Table]

It’s important to remember to make all the changes in the Test Execution Plan tabs and then refresh the table to see the correct distribution.
After leveling the first table our file ends up looking like this: example_file_No.3
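One way to read “leveled” in numbers, offered here as an interpretation rather than a formula from the Excel files, is that each cell should be close to (row Grand Total × column Grand Total) / total tests. The sketch below computes those targets for the Server O/S vs. App Server table with 50 tests, and checks that the three moves from the example cancel out, so the App Server Grand Totals are preserved. All names are illustrative.

```python
TOTAL = 50
OS_TOTALS = {"Win 2003": 15, "AIX": 15, "RH Linux": 20}  # 30/30/40% of 50
APP_TOTALS = {"JBoss": 25, "WebLogic": 25}               # 50/50% of 50

def leveled_targets(row_totals, col_totals, total):
    """Cell value at which a row mirrors the column Grand Totals."""
    return {(r, c): nr * nc / total
            for r, nr in row_totals.items()
            for c, nc in col_totals.items()}

targets = leveled_targets(OS_TOTALS, APP_TOTALS, TOTAL)
print(targets[("RH Linux", "JBoss")])  # 10.0
print(targets[("AIX", "WebLogic")])    # 7.5

# The moves from the example: (server O/S, from, to, number of tests).
MOVES = [
    ("RH Linux", "WebLogic", "JBoss", 10),
    ("AIX", "JBoss", "WebLogic", 3),
    ("Win 2003", "JBoss", "WebLogic", 7),
]

def net_change(moves):
    """Net number of tests gained or lost by each App Server."""
    change = {}
    for _os, source, target, count in moves:
        change[source] = change.get(source, 0) - count
        change[target] = change.get(target, 0) + count
    return change

print(net_change(MOVES))  # {'WebLogic': 0, 'JBoss': 0}
```

Every move stays within a single Server O/S row, so the O/S Grand Totals cannot change either. Targets such as 7.5 obviously cannot be hit exactly; in practice a cell ends up at 7 or 8.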

Step 6.

We perform the same operation for the second Pivot Table (Server O/S vs. DB Server), remembering that our goal is to maintain the Grand Totals while leveling the internal numbers. At the end of our modification the table will look like this: example_file_No.4

Step 7.

The third table is handled in the same way as the first 2, and we should end up with the following Pivot Table: example_file_No.5

Step 8.
If you now look at our first Pivot Table in the Excel sheet you will see that the internal rates we set at the beginning were distorted by our operations in Steps 6 & 7. Don’t worry, this is a normal side-effect of our method!
To solve this we return to our first table and repeat the leveling operations until its internal distributions are once again similar to our target Grand Totals.

After finishing this operation we see in example_file_No.6 that all 3 Pivot Tables provide the internal and external coverage rates we are looking for, which means we can stop at this point. This might not have been achieved with the first correction step, in which case we would have continued with incremental changes on the following Pivot Tables until we reached the desired distribution (FYI: larger numbers of tests tend to take 2 to 3 complete iterations over all the Pivot Tables to reach the desired leveling).
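The manual loop over the tables could, in principle, be automated. The sketch below is not the procedure described above but a simple hill-climbing substitute: it repeatedly swaps the value of one parameter between two tests (which never changes a Grand Total) and keeps the swap only if the combined deviation of all the tables from their leveled targets goes down. The target formula (row total × column total / number of tests), the skewed sample plan and all names are assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations

def deviation(plan, params):
    """Sum, over all 2-parameter tables, of |actual - target| per cell."""
    n = len(plan)
    totals = {p: Counter(t[p] for t in plan) for p in params}
    score = 0.0
    for a, b in combinations(params, 2):
        cells = Counter((t[a], t[b]) for t in plan)
        for va, na in totals[a].items():
            for vb, nb in totals[b].items():
                score += abs(cells[(va, vb)] - na * nb / n)
    return score

def level(plan, params, attempts=2000, seed=0):
    """Hill climbing: keep a swap only when it lowers the deviation."""
    rng = random.Random(seed)
    plan = [dict(t) for t in plan]  # work on a copy
    best = deviation(plan, params)
    for _ in range(attempts):
        i, j = rng.sample(range(len(plan)), 2)
        p = rng.choice(params)
        if plan[i][p] == plan[j][p]:
            continue
        plan[i][p], plan[j][p] = plan[j][p], plan[i][p]
        score = deviation(plan, params)
        if score < best:
            best = score
        else:  # no improvement: undo the swap
            plan[i][p], plan[j][p] = plan[j][p], plan[i][p]
    return plan

PARAMS = ["server_os", "app_server", "database"]
# Deliberately skewed plan: every parameter is perfectly correlated.
skewed = (
    [{"server_os": "AIX", "app_server": "JBoss", "database": "Oracle 10g"}] * 4
    + [{"server_os": "RH Linux", "app_server": "WebLogic", "database": "DB2"}] * 4
)

leveled = level(skewed, PARAMS)
print(deviation(skewed, PARAMS))   # 24.0
print(deviation(leveled, PARAMS))  # lower than before
```

Because the score covers all the tables at once, a swap that fixes one table at the expense of another is rejected, which is the same side-effect the repeated manual passes are compensating for. Hill climbing can stall before reaching a perfect leveling, so the result is worth reviewing by hand.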

Step 9.
After finishing the Server Side configuration we need to do the same for the client side.
Since the two sets of configurations and their parameters are independent of each other, the changes made on the client side will have no effect on the 3 matrices we have just worked on.
After doing the “dirty work” you can see the results in example_file_No.7

End Product:

After all our iterations we can go back to our Test Execution Plan and extract from it our new Leveled and Efficient Environment Testing Matrix.

This process is not infallible, but it has helped me to plan my testing cycles for over 10 years now with very good results.

Improving the efficiency by keeping track of your waste

Posted in Bug Reporting, Metrics & Statistics, Testing Intelligence on February 17th, 2008 by Joel Montvelisky – Be the first to comment

All development organizations have a number of recurring events that waste the time of their teams. In Defect Lifecycle Management, two of the most significant undesired incidents are:
(1) Rejected Defects – Defects that are reported by the QA and rejected by the product or development teams.
(2) Reopened Defects – Defects that are fixed or rejected by the development and are reopened by the QA.

These incidents continually waste the time required to detect, report, review, analyze, assign, and/or fix a significant number of defects, and they are usually the result of human error and/or lack of communication (and understanding) between people in different teams. Even if at first glance the number of defects appears small, on closer review they add up to days and weeks of wasted time per project.

A couple of years back I worked at a company that implemented an effective way to fight this unnecessary waste. As part of our monthly QA reports with all sorts of project and defect data we started providing the accumulated statistics for the Percentage of Rejected and Reopened Defects per project together with a threshold and traffic light dashboard for each. Deviation of up to 10% brought a yellow-light indicator and after that the projects turned to red on these specific measurements.
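A dashboard of this kind boils down to very little logic. The following Python sketch is an illustration, not the report we actually used: the threshold values are hypothetical, each rate is assumed to be a simple percentage of the defects handled, and “deviation of up to 10%” is read as 10% relative to the threshold.

```python
def rate(count, total):
    """Percentage of defects that were rejected (or reopened)."""
    return 100.0 * count / total if total else 0.0

def light(value, threshold, tolerance_pct=10.0):
    """Green up to the threshold, yellow within the tolerance, else red."""
    if value <= threshold:
        return "green"
    if value <= threshold * (1 + tolerance_pct / 100.0):
        return "yellow"
    return "red"

# Hypothetical project figures and a hypothetical 10% threshold.
print(light(rate(8, 100), threshold=10.0))   # green
print(light(rate(21, 200), threshold=10.0))  # yellow
print(light(rate(30, 200), threshold=10.0))  # red
```

What matters more than the exact formula is that the same definition is applied to every team and every month, so that the trend is comparable.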

The reports were sent each week to each Development & QA Manager and the dashboards were presented once a month to the VP R&D during the periodic Directors’ Meeting where they quickly became a topic of choice, especially as our organization was looking for ways to improve the efficiency of our internal processes.

Each manager was responsible for his own area and was required to explain any major deviations from the set threshold. At the beginning only one in seven teams was below the threshold and two more were close to it; more importantly, many managers thought the measurement unfair and the threshold impossible to maintain over time.

Then, with the constant pounding of Management and after some months of analyzing the chain of events around defect reporting, more teams started showing green-light indicators. We continued producing and studying these numbers until most teams had improved their metrics and defect management processes considerably around this area.

The above exercise showed me 2 things:
(1) Teams can work more effectively by making sure they communicate better, creating less unnecessary garbage and friction in the process.
(2) There are many hard things that can be achieved once the correct information is placed in the spotlight and enough focus is put in the right places.

Organizations should make a habit of keeping track of statistics for these kinds of unwanted behaviours. They should set thresholds for the maximum amount of waste their systems produce, and whenever these thresholds are exceeded they should find the root cause and fix it.

Leveraging Customers Rejections into Your Testing Environment

Posted in Test Process, Tools on February 9th, 2008 by Joel Montvelisky – Be the first to comment

Some years ago I managed the testing team for an off-the-shelf platform aimed at Enterprise Companies. One of our biggest problems happened each time we changed the platform’s DB schema and released a utility to upgrade the system. In too many cases the automatic upgrade would get stuck in the middle of the process, leaving customers dead in the water with a half-upgraded system and no way to roll back to the previous version. It got so bad that we instructed our users to always back up their databases before upgrading, and for some months after each major release we kept a team of 3 engineers ready to fly around the world fixing the issues and manually upgrading the platforms.

We analyzed the problem and concluded that we were failing on optimizations and customizations that users had made to their databases; we had not tested all possible configurations and got stuck on things that were within the boundaries of a “supported environment”. We now had around 25,000 configurations to test if we wanted to cover all possible settings for all the parameters at stake. Since that was not feasible, we tested on the 20 configurations we judged to be the most popular and went out with our next release.

The results were not significantly better; we needed another approach.

During a brainstorming session one of the testers came up with a brilliant idea: he said that the problem would stop if we could test the upgrade on all customer databases before releasing the product. This solution was not feasible either, but it got us thinking in the right direction: since we could not test on all databases, we could at least test on all the databases we already knew had problems.

We created a plan together with our Customer Support Organization: whenever a client called with a DB upgrade issue, in parallel to solving their specific problem we would ask them for a copy of their database project. We told them that, to ensure the problem would not return in the future, we would specifically test our upgrade procedure on their database and correct any issues before the release. With this approach we obtained over 75% of the database projects for which customers reported problems.

At the beginning we did the tests manually: we restored the projects into our DB servers, tested the upgrade, ran a short sanity check and reported any defects we found. The effort gave good results; even when the bugs from the previous upgrades had been corrected, we found new ones related to changes made for the current upgrade utility. The issues we found were valuable, but it was taking too much effort to run these tests once every 2 to 3 weeks, especially as we were getting an additional 5 to 10 new database projects a month.

We decided to invest some resources (both development time and machines) and created a complete environment specifically intended for testing the upgrade procedure in an almost automatic way. It had multiple database servers to run tests in parallel, and an automatic process that systematically restored each customer’s project, tried to upgrade the DB schema, performed a simple sanity check, and reported the results, one project after the other. It took 2 to 3 days to upgrade around 100 projects, and we ran the complete test once a week!
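The shape of such a harness can be sketched in a few lines of Python. This is a rough outline under stated assumptions, not the system we built: the restore, upgrade and sanity-check steps are passed in as plain functions (stubs here, since the real ones depend on the product and the database), and a thread pool stands in for the multiple database servers.

```python
from concurrent.futures import ThreadPoolExecutor

def run_project(project, restore, upgrade, sanity_check):
    """Run the three steps in order and stop at the first failure."""
    steps = (("restore", restore), ("upgrade", upgrade),
             ("sanity", sanity_check))
    for name, step in steps:
        try:
            step(project)
        except Exception as exc:
            return {"project": project, "status": "failed",
                    "step": name, "error": str(exc)}
    return {"project": project, "status": "passed",
            "step": None, "error": None}

def run_all(projects, restore, upgrade, sanity_check, workers=4):
    """Process several customer projects in parallel, keeping their order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda p: run_project(p, restore, upgrade, sanity_check),
            projects))

# Hypothetical stubs standing in for the real steps.
def restore(project):
    pass

def upgrade(project):
    if project == "customer_b":
        raise RuntimeError("schema change failed on a custom index")

def sanity_check(project):
    pass

for result in run_all(["customer_a", "customer_b", "customer_c"],
                      restore, upgrade, sanity_check):
    print(result["project"], result["status"], result["step"])
```

Recording the step at which a project failed is what turns a weekly run into a usable defect report: a failed restore points at the test environment, while a failed upgrade or sanity check points at the product.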

After 2 releases, and once we had around 150 customer projects, we were able to declare our upgrade procedure a non-issue. And to the best of my knowledge the system is still in place within the testing organization.

I was reminded of this testing system this week, talking with a customer who told me about the issues they were having with all sorts of user server configurations and their inability to test all the possible environment configurations as part of their testing efforts.

My take on the subject is that at the end of the day you don’t need to test all the environment configurations, it is enough to have a sample of environments that is (1) large enough to be representative; and (2) close enough to the real world environments in order to provide the same issues and effects your customers experience.

Using your Kitchen as a Communication Channel

Posted in Metrics & Statistics, Testing Intelligence, Tools on February 1st, 2008 by Joel Montvelisky – 1 Comment

Do you want a simple way to keep your team updated on what’s happening in your project? Here’s one: a Kitchen Screen.

Most companies have a small kitchen or resting area where people go to make themselves a cup of coffee or grab something quick to eat a couple of times a day. Team members stay there for 5 minutes clearing their heads and informally chatting with colleagues about all sorts of stuff. Did you ever think about putting there, in a non-intrusive way, a small monitor showing important information about your project?

Some years ago my team placed in our kitchen a short 3-slide PowerPoint presentation running on an endless loop with some graphs about our project. We showed a bug detection & fixing convergence graph, our test execution progress graph, and a table with defect statistics showing the amount of open and closed bugs per team. Suddenly people came for coffee and started looking at the screen and talking about the stuff in it.

Two weeks after our “first showing” I got a request from a development manager to add an additional slide with a couple of graphs showing the progress of his team. A week later the project manager asked me to place the updated status of our project Gantt.

The initial presentation was very simple, and I had to remember each morning to manually update the graphs using a template we had defined in Excel. After a while I got one of my engineers to create a small exe file that would generate the graphs and update the presentation automatically twice a day. It got so popular that other teams from within the company started using our “platform” to place small screens in their own kitchens. Our kitchen screen managed to get the information to more people in a more effective way than our weekly Testing Update Reports.

If you decide to use these screens there are two important things to keep in mind:

1. Make sure to keep the information updated. Once you stop updating your graphs for more than a couple of days people will stop noticing the screen, and it will take extra effort to regain their attention.

2. Don’t put too much information in your presentation, and give people time to digest it. Use a maximum of 5-7 slides, place only 1 or 2 graphs on each slide, and show each slide for about 12 to 20 seconds. The optimal screening time is around 1 minute for the whole presentation.

Finally, keep your screens and presentations fun and imaginative; people like surprises once in a while.