Monday, August 29, 2011

Why Tests Don't Really Pass or Fail

Most testers think of tests as passing or failing. (We often compute pass/fail statistics as soon as a test suite has run.) We run a test. Either it found a bug or it didn't. Mark the checkbox and move on to the next test. This is especially true when we run batches of automated tests.

Unfortunately, experience repeatedly shows us that passing a test doesn't really mean there is no bug in the area being tested. If it "passes," there will be no further action because there's nothing to look for. Further action is indicated only when the test results flag an error or we observe something unusual while running the test. It is still possible for bugs to exist in the feature being tested in spite of the passing test. The test may miss the conditions necessary to show the bug. It is also quite possible to miss noticing an error even though a test surfaces it. Passing a test really means that we didn't notice anything interesting. We'll probably never know if it passes erroneously.

Likewise, failing a test does not guarantee that a bug is present. What most people call failing a test is really the conclusion that further work is required. Determining whether or not there is an underlying bug requires further investigation. There could be a bug in the test itself, a configuration problem, corrupted data, or a host of other explainable reasons that do not mean there is anything wrong with the software being tested. It could be behaving exactly as expected given the circumstances. Failing really only means that something we noticed warrants further investigation. Because we follow up, we'll probably figure out (and possibly fix) the real root cause.
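
To make that distinction concrete, here is a minimal sketch in Python (with names I've invented purely for illustration, not anyone's actual framework) of how a test run might be recorded so that a flag is only a prompt for follow-up, never a verdict:

from enum import Enum, auto

class Observation(Enum):
    NOTHING_NOTICED = auto()          # what we usually record as "pass"
    WARRANTS_INVESTIGATION = auto()   # what we usually record as "fail"

class RootCause(Enum):
    NOT_YET_INVESTIGATED = auto()
    SUT_BUG = auto()            # the software under test really misbehaved
    TEST_BUG = auto()           # the test itself was wrong
    CONFIG_PROBLEM = auto()     # environment or setup issue
    CORRUPTED_DATA = auto()     # bad inputs or fixtures, not a product defect
    EXPECTED_BEHAVIOR = auto()  # the software did what it should, given the circumstances

class TestRun:
    """One execution of a test: an observation plus (eventually) an explanation."""

    def __init__(self, name, observation):
        self.name = name
        self.observation = observation
        # A flag starts out unexplained; only investigation assigns a cause.
        self.root_cause = (RootCause.NOT_YET_INVESTIGATED
                           if observation is Observation.WARRANTS_INVESTIGATION
                           else None)

    def record_investigation(self, cause):
        """Call this only after someone has actually looked into the flagged run."""
        self.root_cause = cause

Of the possible explanations, only SUT_BUG is an actual defect in the product; the others are exactly the kinds of explainable reasons described above.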

Pass/Fail metrics don't really give us interesting information until the cause of every pass and fail is understood. Unless we validate the pass indications, we don't know which tests missed a bug in the system under test (SUT). Because we don't really know which passes are real (and we are unlikely to investigate to figure out which are real), any measure of the count of passes misrepresents the state of the SUT. Likewise, the count of failures is only meaningful after thorough investigation and identification of which failures are due to SUT errors and which are not.
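
As a small illustration of why the raw counts mislead, here is a self-contained sketch (again Python, with test names and fields I've made up for the example) of a report that keeps quiet runs, flagged runs, and confirmed SUT bugs in separate columns, and labels everything an estimate while any flagged run remains uninvestigated:

# Each result is a dict; root_cause stays None until someone investigates a flagged run.
results = [
    {"name": "login_smoke",   "flagged": False, "root_cause": None},
    {"name": "upload_limits", "flagged": True,  "root_cause": "test bug"},
    {"name": "search_paging", "flagged": True,  "root_cause": None},  # not yet investigated
    {"name": "export_csv",    "flagged": True,  "root_cause": "SUT bug"},
]

quiet = sum(not r["flagged"] for r in results)
flagged = [r for r in results if r["flagged"]]
uninvestigated = [r for r in flagged if r["root_cause"] is None]
confirmed = [r for r in flagged if r["root_cause"] == "SUT bug"]

print(f"{quiet} runs with nothing noticed (not proof those features are bug-free)")
print(f"{len(flagged)} runs flagged for follow-up; {len(confirmed)} confirmed SUT bugs so far")
if uninvestigated:
    print(f"{len(uninvestigated)} flagged runs still uninvestigated -- treat all counts as estimates")

The point isn't the code; it's that "flagged" and "bug" are different things, and the report should say so.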

Skepticism is healthy when reporting test results. As much as we’d like to have definitive answers and absolutes, the results from our testing are inconclusive (especially before failures have been completely investigated). Initial test reports should be qualified with the fact that the numbers are subject to change as more information is gathered about failing tests. Just as the government regularly revises past economic indicators when new information becomes available, so should we treat passes and failures as estimates until we have gathered all the evidence we expect to glean from the tests.

(A paper I wrote on this is available at: Why Tests Don't Pass)

On Doug Hoffman's blogging

I've been testing a broad spectrum of software and systems for a very long time. I've been consulting in SQA for over 20 years after having a similar amount of industry experience. I've studied and taught a bunch of subjects, like CS, EE, QA, business, management, and testing. I've been a software developer, support engineer, teacher, and manager, but at heart I'm mostly a quality engineer/tester. I've earned a string of degrees and awards. I've written and published papers and presented at conferences for a long time (see my papers and presentations).

OK. So what?

I’ve still got a lot to learn, but I think I've got a lot to share. I've decided that it's way past time for me to share more online in a blog.

What I want to share are generally new ideas: things that I've learned from others or figured out through the school of hard knocks. I don't want to rehash topics I agree with that others have already presented. I want to go after things that I don't see generally published or that may be contrary to the accepted norms in software testing and quality assurance. For example, things that help explain why software has bugs after we thoroughly test, how software metrics may get us into trouble, that tests don't really pass or fail, how to approach test automation in far more powerful ways, that there are different types of test oracles, that there are different approaches to results comparison, issues in managing quality assurance and test groups, and on and on.

I've acquired a lot of ideas from a lot of sources, many from the school of hard knocks. I've heard that we learn more from our mistakes than from our successes. I agree, and I've done my share of learning that way. I'm hoping that my posts are thought-provoking. I welcome challenges to the ideas I present, especially those coming from contrary experiences. Therein lies a rich opportunity to learn more.

- Doug

It's a shame to make the same old mistakes when there are still plenty of new ones to discover. - Doug Hoffman

Friday, June 24, 2011

CAST 2011 August 8 - 10, 2011

This year's Conference of the Association for Software Testing (CAST) promises to be another outstanding opportunity to learn about and contribute to context-driven software testing. Last year's conference was again critically acclaimed, and most attendees are returning for more. The 2011 conference is being held in Seattle, WA on August 8 - 10 (http://www.associationforsoftwaretesting.org/conference/). The price is kept reasonable because the AST is a not-for-profit professional society dedicated to advancing the understanding of the science and practice of software testing (not dedicated first to making money).

Besides being reasonably priced, CAST is really unusual in the realm of software quality and software testing conferences because the participants actually confer. It's not just talking heads and experts telling you the way things are according to them. Half of the session time at CAST is devoted to facilitated questions and shared experiences from the floor. Rooms are available (and have been used) to continue discussions after a session is over. Networking time is scheduled into the conference and the culture encourages open, professional questioning and debate. New ideas, emerging topics, debates, and expressing contrary opinions are all encouraged because we learn so much more from our differences and failures than from similarities and successes.

It looks like CAST is going to have a full house this year, with the majority of seats already sold months before the conference. There are still seats available and time to register. I'm looking forward to meeting up with associates and making new friends.

Tuesday, July 13, 2010

CAST 2010, August 2-4

For each of the past 4 years the AST has put together a conference focused on software testing. This year CAST runs August 2-4 in Grand Rapids, Michigan. It's not just any conference. It's really a different kind of conference because of the way we encourage dialog - actual conferring at a conference. It's not for everyone, only for people who want to think about what they do and are willing to learn and share new ideas. The track sessions are not just talking heads dumping their messages unchallenged and unchallengeable. (For example, the speakers only get half the time to speak - the rest of the time is for questions and counter-examples from the attendees.) We designed CAST to allow open discussion of issues and real thinking to take place, not just to serve as a venue for consultants and vested interests to dump their messages and run.

What we did was design a unique conference for software testing:
o For Professional Testers (or those who want to be professional)
o Striving to advance and improve the science of software testing
o Opportunities for networking and dialog are encouraged by the conference format
o Many testing leaders with diverse (and often conflicting) views participate
o Debate and passionate discussion is encouraged
o Half the session time is set aside for moderated feedback, questions, and experience reports from participants
o There are spare rooms available if discussion/debate spills over past the end of the session

I'd really like to encourage people to attend CAST. It's not your grandfather's software conference, though. Plan to meet with people, talk about testing, share your ideas, and hear unique viewpoints from other professionals.

Saturday, July 10, 2010

Hello world!

I'm a late bloomer when it comes to blogging. I've resisted for years now, but have finally decided to share my thoughts this way, too. People tell me that I've got a lot to share, and I guess that's true sometimes. I hope my sharing stimulates some thinking.

Who am I (professionally)?

I've been doing software engineering since before there were university degrees in it. I've especially focused on software quality and testing. For the past 20 years I've been a management consultant. I've worked with a broad range of software and systems, from computer systems manufacturers to government, and from database engines to web applications. Most of my work has been with commercial software, but a good deal has been with IT organizations, and my most recent work has been architecting an organizational transformation for the US Treasury Department.

My background is solidly based in formal training and methods in computer science and quality assurance, but my application of the ideas is often unorthodox. I subscribe to the notion that what works best is situational; there are no best practices applicable to all situations. So-called standards are excellent sources of ideas but can be counterproductive when applied by rote. Therefore, I develop custom solutions for the organizations I work with.

I have been very active professionally: speaking at and putting on software conferences (too numerous to list, but currently PNSQC and CAST) and co-founding and running professional organizations (e.g., SSQA, ASQ, and AST). I'm President of the Association for Software Testing (AST, sponsor of CAST); Keynote, Invited Speaker, and Tutorial Chair for the Pacific Northwest Software Quality Conference (PNSQC); and Auditor for the Silicon Valley Software Quality Association (SSQA). I'm also on the Computer Science Department Advisory Board for the Florida Institute of Technology (FIT). I've been teaching AST's on-line Foundations and Bug Advocacy classes for several years.

I have a web site where I've posted dozens of my published papers and presentations. I'm also on LinkedIn and in various other places on the web.

In my spare time I like to solve puzzles (math in particular), work in the yard (chainsaw and industrial weed whacker), and fix various things around the house (lots of opportunities there).