Test Coverage and Mutation Testing

After writing a test suite, how do you assess its quality? How do you know when to stop testing? We need adequacy criteria, which are metrics for measuring test suite quality.

These criteria let us compare two test suites or decide if our current suite is sufficient.

The most common adequacy criteria are based on structural coverage. The principle is that a failure cannot be observed if the faulty code is never executed. While 100% coverage does not guarantee a bug-free program, low coverage indicates an inadequate test suite.

Here are the common structural coverage criteria, from weakest to strongest.

1. Statement Coverage

Goal: Execute each statement in the program at least once.
Metric: $\frac{\#~statements~executed}{\#~statements~in~program}$
Discussion: This is the most basic coverage metric. It can miss faults that only appear when a branch is not taken, such as an if statement with no else branch where the else case should have contained code.
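To make this weakness concrete, here is a minimal sketch in Python. The function name `ticket_price`, the pricing rule, and the test values are all hypothetical and invented for illustration. A single test executes every statement, so statement coverage is 100%, yet the fault in the missing else case goes unnoticed.

```python
def ticket_price(base, is_member):
    # Hypothetical spec: members get 10 off, non-members pay a surcharge of 5.
    # Fault: the else branch that should add the surcharge was never written.
    price = base
    if is_member:
        price = base - 10
    return price


# This one test executes all four statements: 100% statement coverage.
assert ticket_price(100, True) == 90

# The faulty behavior only shows when the if condition is false.
# The assumed spec expects 105 here, but statement coverage never
# forces us to write this test.
non_member_price = ticket_price(100, False)  # returns 100, not 105
```

Branch coverage would require a second test with `is_member` false, which is the test that exposes the fault.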

2. Branch Coverage

Goal: Execute each branch in the program's control flow at least once.
Metric: $\frac{\#~branches~executed}{\#~branches~in~program}$
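Discussion: This metric requires every decision, such as an if or while condition, to evaluate to both true and false. Branch coverage subsumes statement coverage: a suite that takes every branch also executes every statement. It is weak for compound conditions, though. For example, two tests for (A || B), one with A true and one with both A and B false, achieve 100% branch coverage without B ever evaluating to true, because short-circuiting skips B whenever A is true.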
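The short-circuit case can be observed directly. The sketch below is an illustration only: the helper `atom` and the function `decision` are hypothetical names, and Python's `or` stands in for `||`. Each atom logs when it is actually evaluated, so we can check which values B took across the two tests.

```python
evaluated = []  # log of (atom name, value) in evaluation order


def atom(name, value):
    """Record that an atom was evaluated, then return its value."""
    evaluated.append((name, value))
    return value


def decision(a, b):
    # `or` short-circuits: atom("B", ...) only runs when A is false.
    if atom("A", a) or atom("B", b):
        return "taken"
    return "not taken"


# Two tests that together cover both branches of the if.
outcomes = {decision(True, True), decision(False, False)}

# Values B actually took when it was evaluated.
b_values = {value for name, value in evaluated if name == "B"}
```

After running this, `outcomes` contains both branch results, while `b_values` contains only `False`. The first test passes `True` for B, but B is never evaluated there because A is already true.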

3. Modified Condition/Decision Coverage (MC/DC)

Goal: Show that every atom (an atomic sub-expression of a condition) independently affects that condition's outcome.
Discussion: To show independent effect for an atom A in a condition like (A && B), you need a pair of tests where:

1. The value of B is the same in both tests.

2. The value of A is true in one test and false in the other.

3. The outcome of (A && B) differs between the two tests.

Regulations for safety-critical systems require this metric. In avionics, for example, DO-178C requires MC/DC for the most critical (Level A) software.
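The three conditions listed above can be checked mechanically. The following sketch is a brute-force illustration, not how a qualified coverage tool works: the function `independence_pairs` is a hypothetical helper that enumerates all input vectors and keeps those pairs where flipping one atom, with all others held fixed, flips the outcome.

```python
from itertools import product


def independence_pairs(decision, n_atoms, index):
    """Return pairs of input vectors showing that atom `index`
    independently affects the outcome of `decision`."""
    pairs = []
    for values in product([False, True], repeat=n_atoms):
        if values[index]:
            continue  # start each pair from the vector where the atom is False
        # Same vector with only the chosen atom flipped to True.
        flipped = values[:index] + (True,) + values[index + 1:]
        if decision(*values) != decision(*flipped):
            pairs.append((values, flipped))
    return pairs


def a_and_b(a, b):
    return a and b


pairs_for_a = independence_pairs(a_and_b, 2, 0)
pairs_for_b = independence_pairs(a_and_b, 2, 1)

# Every test vector needed to show independence of both atoms.
mcdc_tests = {vector for pair in pairs_for_a + pairs_for_b for vector in pair}
```

For (A && B), A only shows independent effect while B is true, and B only while A is true. The resulting `mcdc_tests` set holds three vectors, fewer than the four an exhaustive truth table would need.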

4. Path Coverage

Goal: Execute each program path at least once.
Metric: $\frac{\#~executed~paths}{\#~total~paths}$
Discussion: This is the most thorough structural metric, but it rarely scales. Each additional two-way branch can double the number of paths, and in a program with loops the number of paths can be unbounded, which makes 100% path coverage impractical.
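To see how quickly paths multiply, consider a sketch that counts paths through a small control-flow graph. The graph `CFG`, its node names, and the function `count_paths` are assumptions made up for this illustration: a single loop whose body contains one if/else. Because the true path count is unbounded, the sketch caps how many times the loop body may run.

```python
# Hypothetical control-flow graph: a loop whose body holds one if/else.
CFG = {
    "entry": ["loop"],
    "loop": ["body", "exit"],  # loop condition true -> body, false -> exit
    "body": ["then", "else"],  # the if/else inside the loop body
    "then": ["loop"],
    "else": ["loop"],
    "exit": [],
}


def count_paths(cfg, node, exit_node, max_body_runs, body_runs=0):
    """Count paths from `node` to `exit_node` that execute the loop
    body at most `max_body_runs` times."""
    if node == exit_node:
        return 1
    total = 0
    for successor in cfg[node]:
        runs = body_runs + int(successor == "body")
        if runs <= max_body_runs:
            total += count_paths(cfg, successor, exit_node, max_body_runs, runs)
    return total


# Path counts for loop bounds 0, 1, 2, 3, 4.
path_counts = [count_paths(CFG, "entry", "exit", bound) for bound in range(5)]
```

Each extra permitted iteration roughly doubles the count (1, 3, 7, 15, 31), and with no bound on the loop the count never stops growing.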

---

Mutation Testing

Mutation testing assesses test suite quality by measuring how many deliberately injected faults the suite detects.

The process:

Generate Mutants: A mutation tool inserts small faults ("mutations") into copies of your program, one fault per mutant. For example, it might change a + to a - (arithmetic operator replacement, AOR) or a > to a >= (relational operator replacement, ROR).
Execute Tests: The test suite runs against each mutant program.
Assess:

* If at least one test fails, the suite has killed the mutant, meaning it detected the fault.

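* If all tests pass, the mutant survived, indicating a gap in the test suite (or, occasionally, a mutant that behaves identically to the original program and so cannot be killed).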

The "Mutation Score Indicator" (MSI) is the percentage of mutants killed: $\frac{\#~mutants~killed}{\#~mutants~generated} \times 100$. This metric estimates the fault-finding ability of your tests.
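The generate, execute, and assess steps can be sketched in a few lines of Python using the standard `ast` module. This is a toy under stated assumptions, not a real mutation tool: the function under test (`is_over_budget`), the two test suites, and all helper names are hypothetical, and only the two replacements mentioned above (+ to -, > to >=) are implemented.

```python
import ast

SOURCE = """
def is_over_budget(cost, fee, budget):
    total = cost + fee
    return total > budget
"""


def mutation_sites(tree):
    """Find nodes we know how to mutate: `+` (AOR) and `>` (ROR)."""
    sites = []
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            sites.append(node)
        elif isinstance(node, ast.Compare) and isinstance(node.ops[0], ast.Gt):
            sites.append(node)
    return sites


def generate_mutants(source):
    """Yield one mutated syntax tree per mutation site."""
    n_sites = len(mutation_sites(ast.parse(source)))
    for i in range(n_sites):
        tree = ast.parse(source)  # fresh tree, so each mutant has one fault
        node = mutation_sites(tree)[i]
        if isinstance(node, ast.BinOp):
            node.op = ast.Sub()      # AOR: + becomes -
        else:
            node.ops[0] = ast.GtE()  # ROR: > becomes >=
        yield tree


def load(tree, name):
    """Compile a (mutated) tree and return the function called `name`."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace[name]


def suite_passes(func, tests):
    return all(func(*args) == expected for args, expected in tests)


def mutation_score(source, tests, name="is_over_budget"):
    """Return (mutants killed, mutants generated)."""
    mutants = list(generate_mutants(source))
    killed = sum(not suite_passes(load(m, name), tests) for m in mutants)
    return killed, len(mutants)


WEAK_TESTS = [((8, 5, 10), True)]
STRONG_TESTS = WEAK_TESTS + [((5, 5, 10), False)]  # adds the boundary case

weak_score = mutation_score(SOURCE, WEAK_TESTS)
strong_score = mutation_score(SOURCE, STRONG_TESTS)
```

With the single test, the `>=` mutant survives because 13 exceeds the budget under both operators, so the score is 1 of 2. Adding the boundary test, where the total equals the budget, kills it and raises the score to 2 of 2.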