Is Code Coverage Really All That Useful?
By Kevin Pang
Test-driven development proponents often push code
coverage as a useful metric for gauging how well tested an application
is. 100% code coverage has long been the ultimate goal of testing
fanatics. But is code coverage really all that useful? If I
told you that my application has 100% code coverage, should that mean
anything to you?
What does code coverage tell us?
Code coverage tells us which lines in our application are executed by our
unit tests. For example, the code below has 50% code coverage if the
unit tests only call Foo with condition = true:
string Foo(bool condition)
{
    if (condition)
        return "true";
    else
        return "false";
}
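To make the 50% figure concrete, here is a minimal sketch of such a test
run. The CoverageExample class and the Check helper are hypothetical names
standing in for a real test framework, and the exact percentage a tool
reports depends on whether it counts lines, statements, or branches.

```csharp
using System;

public static class CoverageExample
{
    // The method from the article, unchanged in behavior.
    public static string Foo(bool condition)
    {
        if (condition)
            return "true";
        else
            return "false";
    }

    // Hypothetical stand-in for a test framework's assertion.
    static void Check(bool ok, string name)
    {
        if (!ok) throw new Exception("Test failed: " + name);
        Console.WriteLine("passed: " + name);
    }

    public static void Main()
    {
        // Executes only the "if" branch. The "else" branch never runs,
        // so a coverage tool would flag `return "false";` as uncovered.
        Check(Foo(true) == "true", "Foo(true)");

        // Enabling this second test would exercise the other branch
        // and bring Foo to full coverage.
        // Check(Foo(false) == "false", "Foo(false)");
    }
}
```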
What does code coverage not tell us?
Code coverage does not tell us what code is working and
what code is not. Again, code coverage only tells us what was
executed by our unit tests, not what executed correctly.
This is an important distinction to make. Just because a line
of code is executed by a unit test does not mean that the
line is working as intended.
For example, the following code could have 100% code coverage and pass all
unit tests if it is never called with b = 0. However, once this code
is introduced into the wild it could very well crash with a divide-by-zero
exception:
int Foo(int a, int b)
{
    return a / b;
}
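A short sketch of that situation, assuming integer arguments: in C#,
integer division by zero throws DivideByZeroException, whereas
floating-point division by zero yields Infinity or NaN without throwing.
The DivisionExample class name is hypothetical.

```csharp
using System;

public static class DivisionExample
{
    // Integer version of Foo. It is a single line, so any one call
    // gives it complete line coverage.
    public static int Foo(int a, int b)
    {
        return a / b;
    }

    public static void Main()
    {
        // This one passing test is enough for 100% line coverage of Foo.
        if (Foo(10, 2) != 5) throw new Exception("Test failed: Foo(10, 2)");
        Console.WriteLine("passed: Foo(10, 2)");

        // The same fully covered line still fails for an untested input.
        try
        {
            Foo(1, 0);
            Console.WriteLine("Foo(1, 0) returned normally");
        }
        catch (DivideByZeroException)
        {
            Console.WriteLine("Foo(1, 0) threw DivideByZeroException");
        }
    }
}
```

The coverage report looks identical before and after the second call is
added; only the test that supplies b = 0 reveals the failure.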
So what is code coverage good for then?

To borrow an analogy from Scott
Hanselman's interview with Quetzal Bradley, imagine you are a civil
engineer responsible for testing a newly constructed series of roads.
To test the roads, your first thought might be to drive over them in
your car, making sure that there are no potholes, missing bridges,
etc. After driving over all of the roads a few times, you might
conclude that they have been tested and are ready for public use. But
once you open the roads to the public, you discover that the bridge
overhangs are too low for big rigs, the turns are too sharp for sports
cars, and that certain areas of the roads flood when it rains.
In the above scenario, you had the equivalent of 100% code coverage since
you had driven over all the roads, but you only superficially tested their
behavior. Specifically, you didn't test the roads in different
vehicles and under different weather conditions. So although you went
through each possible execution path, you failed to account for
different states while doing so.
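The "different vehicles" idea can be sketched as a table of inputs driven
over the same line of code. The Average method and StateExample class are
hypothetical, chosen only because a single call covers the method fully.
The sketch assumes default compiler settings, under which integer
arithmetic is unchecked and the sum in the last row silently overflows.

```csharp
using System;

public static class StateExample
{
    // Hypothetical one-line method: any single call covers it completely.
    public static int Average(int a, int b)
    {
        return (a + b) / 2;
    }

    public static void Main()
    {
        // Each row drives the same line in a different "vehicle":
        // two inputs and the result a caller would expect.
        var cases = new (int a, int b, int expected)[]
        {
            (2, 4, 3),                                   // the everyday case
            (-2, -4, -3),                                // negative inputs
            (int.MaxValue, int.MaxValue, int.MaxValue),  // extreme inputs
        };

        foreach (var c in cases)
        {
            int actual = Average(c.a, c.b);
            string verdict = actual == c.expected ? "ok" : "FAILED";
            Console.WriteLine(
                $"Average({c.a}, {c.b}) = {actual}, expected {c.expected}: {verdict}");
        }
    }
}
```

All three rows execute exactly the same path, so coverage is 100% after the
first one. Only the last row, which changes the state rather than the path,
exposes the problem.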
In light of this, the only solid conclusion you can draw from code
coverage is which lines of your code have definitely not been tested.
The lines that have been executed are still in question unless you are
willing to run them in every possible state the application can be in.
This makes code coverage far less useful as a metric: it tells you
what still needs testing, but offers no help in determining when you are
done testing.
What *is* a good metric then?
Unfortunately, there doesn't seem to be a good metric for determining
whether a line of code has been thoroughly tested or when a developer is
done testing. Perhaps this is a good thing, as it keeps us from
falling into a false sense of security. It simply isn't feasible
in even a moderately complex application to test each and every line of
code under every possible circumstance. The best approach seems
to be to test the most common scenarios and reasonable edge cases, then add
tests as functionality inevitably breaks in the scenarios
you didn't account for. It's an admittedly clumsy system, but it's a
realistic one compared to depending on 100% code coverage to weed out all
possible bugs. That's not to say that there is no value in achieving
100% code coverage. Executing the code in one particular state still
has value, just not as much as developers seem to give it.
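As a rough sketch of that workflow, suppose a divide-by-zero failure was
reported from the field. One possible response is to decide how the code
should behave and pin that decision down with a new test. The TryDivide
method, the RegressionExample class, and the Check helper below are
hypothetical names used for illustration only.

```csharp
using System;

public static class RegressionExample
{
    // Hypothetical fix: report failure instead of throwing when b is zero.
    public static bool TryDivide(int a, int b, out int result)
    {
        if (b == 0)
        {
            result = 0;
            return false;
        }
        result = a / b;
        return true;
    }

    // Hypothetical stand-in for a test framework's assertion.
    static void Check(bool ok, string name)
    {
        if (!ok) throw new Exception("Test failed: " + name);
        Console.WriteLine("passed: " + name);
    }

    public static void Main()
    {
        // The original test for the common scenario.
        Check(TryDivide(10, 2, out int quotient) && quotient == 5, "divides evenly");

        // The test added after the failure was found in the wild.
        Check(!TryDivide(1, 0, out _), "zero divisor is rejected");
    }
}
```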
Until next time,
Kevin Pang
To read more of Kevin's work, visit his blog.