Test Coverage and Mutation Testing
After writing a test suite, how do you assess its quality? How do you know when to stop testing? We need adequacy criteria, which are metrics for measuring test suite quality.
These criteria let us compare two test suites or decide if our current suite is sufficient.
The most common adequacy criteria are based on structural coverage. The principle is that a failure cannot be observed if the faulty code is never executed. While 100% coverage does not guarantee a bug-free program, low coverage indicates an inadequate test suite.
Here are the common structural coverage criteria, from weakest to strongest.
1. Statement Coverage
- Goal: Execute each statement in the program at least once.
- Metric: \(\frac{\#~statements~executed}{\#~statements~in~program}\)
- Discussion: This is the most basic coverage metric. It can miss faults, such as an empty `else` branch that should have contained code.
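A minimal sketch of this weakness, using a hypothetical `absolute` function whose `else` branch was never written. The function and test names are illustrative assumptions, not taken from any particular code base.

```python
def absolute(x):
    # Intended behaviour: return the absolute value of x.
    # Fault: the `else` branch that should negate x was never written.
    result = x
    if x >= 0:
        result = x
    return result


def test_positive_input():
    # This single test executes every statement in `absolute`,
    # so statement coverage is 100%, yet the fault stays hidden.
    assert absolute(5) == 5


test_positive_input()
print(absolute(-3))  # prints -3, although 3 was intended
```

Because the missing branch contains no statements, no statement-based metric can report it as unexecuted.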
2. Branch Coverage
- Goal: Execute each branch in the program's control flow at least once.
- Metric: \(\frac{\#~branches~executed}{\#~branches~in~program}\)
- Discussion: This metric requires every `if` and `while` condition to evaluate to both true and false. It subsumes statement coverage, but it can be weak for complex logical conditions. For example, a test suite for `(A || B)` might achieve 100% branch coverage without ever evaluating `B` as true (due to short-circuiting).
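The short-circuit case can be sketched as follows. Python's `or` stands in for `||`, and the `atom` helper is a hypothetical instrument that records which sub-expressions were actually evaluated.

```python
evaluated = []  # records (atom name, value) for every atom that was evaluated


def atom(name, value):
    # Hypothetical helper: log the evaluation, then pass the value through.
    evaluated.append((name, value))
    return value


def decision(a, b):
    # `or` short-circuits: B is evaluated only when A is false.
    if atom("A", a) or atom("B", b):
        return "then"
    return "else"


# Two tests are enough to take both branches of the `if`.
outcomes = {decision(True, True), decision(False, False)}

b_seen_true = ("B", True) in evaluated
print(outcomes == {"then", "else"}, b_seen_true)  # prints True False
```

Both branches are covered, yet `B` was never observed as true, so a fault that only shows up when `B` is true would go unnoticed.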
3. Modified Condition/Decision Coverage (MC/DC)
- Goal: Show that every atom (an atomic Boolean sub-expression of a condition) independently affects that condition's outcome.
- Discussion: To show independent effect for an atom `A` in a condition like `(A && B)`, you need a pair of tests where:
   1. The value of `B` is the same in both tests.
   2. The value of `A` is true in one test and false in the other.
   3. The outcome of `(A && B)` differs between the two tests (which is only possible when `B` is true).
- Certification standards for safety-critical systems, such as DO-178C in avionics, require this metric for the most critical software (Level A).
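The three requirements above can be checked mechanically. The following is a rough sketch that searches for such test pairs by brute force; `independence_pairs` and `cond` are hypothetical names, and the search holds all other atoms fixed, which is only one interpretation of MC/DC.

```python
from itertools import product


def independence_pairs(condition, n_atoms, index):
    """Return test pairs showing that atom `index` independently affects `condition`."""
    pairs = []
    for inputs in product([False, True], repeat=n_atoms):
        if inputs[index]:
            continue  # start from the atom being False to avoid duplicate pairs
        # Flip only the atom under test; every other atom keeps its value.
        flipped = inputs[:index] + (True,) + inputs[index + 1:]
        if condition(*inputs) != condition(*flipped):
            pairs.append((inputs, flipped))
    return pairs


def cond(a, b):
    return a and b


pairs_for_a = independence_pairs(cond, 2, 0)
pairs_for_b = independence_pairs(cond, 2, 1)
print(pairs_for_a)  # prints [((False, True), (True, True))]
print(pairs_for_b)  # prints [((True, False), (True, True))]
```

For `(A && B)` the two pairs share the test `(True, True)`, so three tests are enough to satisfy MC/DC for this condition.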
4. Path Coverage
- Goal: Execute each program path at least once.
- Metric: \(\frac{\#~executed~paths}{\#~total~paths}\)
- Discussion: This is the most thorough structural metric. For any program with loops, the number of paths can be infinite, making 100% path coverage impractical.
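To get a feel for how quickly paths multiply, here is a small sketch. `classify` is a hypothetical function with three independent `if` statements, and `loop_paths` assumes a loop whose body contains a single `if` and which runs at most a given number of times.

```python
from itertools import product


def classify(a, b, c):
    """Hypothetical function; the returned trace identifies the path taken."""
    trace = []
    if a:
        trace.append("a")
    if b:
        trace.append("b")
    if c:
        trace.append("c")
    return tuple(trace)


# Enumerate all inputs and collect the distinct paths they exercise.
paths = {classify(*inputs) for inputs in product([False, True], repeat=3)}
print(len(paths))  # prints 8


def loop_paths(max_iterations):
    # A loop whose body holds one `if` has 2**k distinct paths for exactly
    # k iterations; summing over 0..max_iterations gives the total.
    return sum(2 ** k for k in range(max_iterations + 1))


print([loop_paths(n) for n in (1, 5, 10)])  # prints [3, 63, 2047]
```

Without a bound on the number of iterations the sum never stops growing, which is why path coverage is usually restricted to a bounded subset of paths in practice.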
---
Mutation Testing
Mutation testing assesses test suite quality by measuring how many faults it can detect.
The process:
- Generate Mutants: A mutation tool inserts small faults ("mutations") into your program. For example, it might change a `+` to a `-` (arithmetic operator replacement, AOR) or a `>` to a `>=` (relational operator replacement, ROR).
- Execute Tests: The test suite runs against each mutant program.
- Assess:
* If at least one test fails, the suite has killed the mutant, meaning it detected the injected fault.
* If all tests pass, the mutant survived, indicating a gap in the test suite.
The "Mutation Score Indicator" (MSI) is the percentage of mutants killed: \(\frac{\#~mutants~killed}{\#~mutants~generated}\). This metric measures the fault-finding ability of your tests.
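The whole process can be sketched in a few lines. This is a toy illustration under stated assumptions: the `age >= 18` check, the hand-written ROR-style mutants, and the helper names are all hypothetical, and a real mutation tool would generate the mutants automatically from the source code.

```python
import operator


def make_is_adult(compare):
    # Build a variant of the check `age >= 18` with the given comparison.
    return lambda age: compare(age, 18)


original = make_is_adult(operator.ge)

# Hand-written ROR-style mutants: each replaces `>=` with another operator.
mutants = {
    ">": make_is_adult(operator.gt),
    "<": make_is_adult(operator.lt),
    "==": make_is_adult(operator.eq),
    "!=": make_is_adult(operator.ne),
    "<=": make_is_adult(operator.le),
}


def run_suite(is_adult, tests):
    # The suite passes when every (input, expected) pair matches.
    return all(is_adult(age) == expected for age, expected in tests)


def mutation_score(tests):
    # A mutant is killed when the suite fails against it.
    killed = [name for name, mutant in mutants.items() if not run_suite(mutant, tests)]
    return 100 * len(killed) / len(mutants), killed


weak_tests = [(30, True), (5, False)]
strong_tests = weak_tests + [(18, True)]  # adds the boundary value

print(mutation_score(weak_tests)[0])    # prints 80.0
print(mutation_score(strong_tests)[0])  # prints 100.0
```

The `>` mutant survives the weak suite because neither test probes the boundary at 18; adding that one test kills it and raises the score to 100%.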