Monday, August 29, 2011

Why Tests Don't Really Pass or Fail

Most testers think of tests as passing or failing. (We often compute pass/fail statistics as soon as a test suite is run.) We run a test. Either it found a bug or it didn't. Mark the checkbox and move on to the next test. This is especially true when we run batches of automated tests.

Unfortunately, experience repeatedly shows us that passing a test doesn't really mean there is no bug in the area being tested. If the test "passes," there will be no further action because there's nothing to look for. Further action is indicated only when the test results flag an error or we observe something unusual while running the test. Bugs can still exist in the feature being tested in spite of the passing test. The test may miss the conditions necessary to show the bug. It is also quite possible to miss noticing an error even though the test surfaces it. Passing a test really means that we didn't notice anything interesting. We'll probably never know whether it passed erroneously.
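
To make that concrete, here is a minimal, hypothetical Python sketch (the discount function and its test are mine, not from any real project). The test exercises inputs on either side of a boundary but never the boundary itself, so it "passes" while the bug remains.

```python
# Hypothetical example: the discount rule and its test are illustrative only.

def discount(price, quantity):
    """Apply a 10% bulk discount for orders of 10 or more items."""
    # Bug: the intended rule is "10 or more," but the strict comparison
    # means an order of exactly 10 items gets no discount.
    if quantity > 10:
        return price * quantity * 0.9
    return price * quantity

def test_discount():
    # These inputs sit on either side of the boundary but never on it,
    # so the test "passes" while the off-by-one bug goes unnoticed.
    assert discount(5.0, 2) == 10.0    # small order, full price
    assert discount(5.0, 20) == 90.0   # large order, discounted

test_discount()
print("test_discount passed, but quantity == 10 was never checked")
```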

Likewise, failing a test does not guarantee that a bug is present. What most people call failing a test is really the conclusion that further work is required. Determining whether or not there is an underlying bug requires further investigation. There could be a bug in the test itself, a configuration problem, corrupted data, or a host of other explainable reasons that do not mean there is anything wrong with the software being tested. It could be behaving exactly as expected given the circumstances. Failing really only means that something was noticed that warrants further investigation. Because we follow up, we'll probably figure out (and possibly fix) the real root cause.
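
As a hedged illustration (the greeting function, the expected string, and the config file name below are all hypothetical), here are two Python tests that fail without any defect in the product: one because the test's expected value is wrong, the other because of a missing configuration file on the test machine.

```python
# Hypothetical examples of failures that are not product bugs.
import os

def greeting(name):
    """Product code under test: builds a greeting string."""
    return f"Hello, {name}!"

def test_greeting():
    # Bug in the test, not the product: the expected value omits the
    # trailing "!", so this assertion fails even though greeting()
    # behaves exactly as specified.
    assert greeting("World") == "Hello, World"

def test_reads_config():
    # Environment problem: if app_config.ini is missing on this machine,
    # the test fails, yet nothing is wrong with the software itself.
    assert os.path.exists("app_config.ini"), "config missing; investigate before blaming the product"
```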

Pass/fail metrics don't really give us interesting information until the cause of every pass and fail is understood. Unless we validate the pass indications, we don't know which tests missed a bug in the software under test (SUT). Because we don't really know which passes are real (and we are unlikely to investigate to figure out which are), any count of passes misrepresents the state of the SUT. Likewise, the count of failures is only meaningful after thorough investigation identifies which failures are due to SUT errors and which are not.
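
One way to act on this, sketched below with made-up dispositions and counts, is to report results only after each one has been classified by investigation, rather than as raw pass/fail totals.

```python
# Hypothetical sketch: classify outcomes after investigation instead of
# reporting raw pass/fail counts. The dispositions and tallies are made up.
from collections import Counter

results = [
    "pass-unverified",   # no anomaly noticed; could still hide a bug
    "pass-unverified",
    "pass-verified",     # result checked against an independent oracle
    "fail-test-bug",     # the test's expected value was wrong
    "fail-product-bug",  # investigation confirmed a defect in the SUT
    "fail-environment",  # missing config file on the test machine
]

tally = Counter(results)
for disposition, count in sorted(tally.items()):
    print(f"{disposition:18} {count}")

# This tells a richer story than "3 passed, 3 failed".
```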

Skepticism is healthy when reporting test results. As much as we’d like to have definitive answers and absolutes, the results from our testing are inconclusive (especially before failures have been completely investigated). Initial test reports should be qualified with the fact that the numbers are subject to change as more information is gathered about failing tests. Just as the government regularly revises past economic indicators when new information becomes available, so should we treat passes and failures as estimates until we have gathered all the evidence we expect to glean from the tests.

(A paper I wrote on this topic is available at: Why Tests Don't Pass)

On Doug Hoffman's blogging

I’ve been testing a broad spectrum of software and systems for a very long time. I've been consulting in SQA for over 20 years after having a similar amount of industry experience. I've studied and taught about a bunch of subjects like CS, EE, QA, business, management, and testing. I've been a software developer, support engineer, teacher, manager, and at heart I'm mostly a quality engineer/tester. I've earned a string of degrees and awards. I've written and published papers and presented at conferences for a long time (see my papers and presentations).

OK. So what?

I’ve still got a lot to learn, but I think I've got a lot to share. I've decided that it's way past time for me to share more online in a blog.

What I want to share are generally new ideas: things that I've learned from others or figured out through the school of hard knocks. I don't want to rehash topics I agree with that others have already presented. I want to go after things that I don't see generally published or that may be contrary to the accepted norms in software testing and quality assurance. For example: things that help explain why software still has bugs after we test it thoroughly, how software metrics may get us into trouble, why tests don't really pass or fail, how to approach test automation in far more powerful ways, the different types of test oracles, the different approaches to results comparison, issues in managing quality assurance and test groups, and on and on.

I've acquired a lot of ideas from a lot of sources, many from the school of hard knocks. I've heard it said that we learn more from our mistakes than from our successes; I agree, and I've done my share of learning that way. I'm hoping that my posts are thought-provoking. I welcome challenges to the ideas I present, especially those coming from contrary experiences. Therein lies a rich opportunity for all of us to learn more.

- Doug

It's a shame to make the same old mistakes when there are still plenty of new ones to discover. - Doug Hoffman

Friday, June 24, 2011

CAST 2011 August 8 - 10, 2011

This year's Conference of the Association for Software Testing (CAST) promises to be another outstanding opportunity to learn about and contribute to context-driven software testing. Last year's conference was again critically acclaimed, and most attendees are returning for more. The 2011 conference is being held in Seattle, WA on August 8 - 10 (http://www.associationforsoftwaretesting.org/conference/). The price is kept reasonable because the AST is a not-for-profit professional society dedicated to advancing the understanding of the science and practice of software testing (not dedicated first to making money).

Besides being reasonably priced, CAST is really unusual in the realm of software quality and software testing conferences because the participants actually confer. It's not just talking heads and experts telling you the way things are according to them. Half of the session time at CAST is devoted to facilitated questions and shared experiences from the floor. Rooms are available (and have been used) to continue discussions after a session is over. Networking time is scheduled into the conference and the culture encourages open, professional questioning and debate. New ideas, emerging topics, debates, and expressing contrary opinions are all encouraged because we learn so much more from our differences and failures than from similarities and successes.

It looks like CAST is going to have a full house this year, with the majority of seats already sold months before the conference. There are still seats available and time to register. I'm looking forward to meeting up with associates and making new friends.