Joy of Coding… and mutation testing in Java
For many years now it has been good practice to write unit tests for your source code, and to use test coverage reporting to see how much of your code is covered by tests. Although line and branch coverage reporting is quite useful, it doesn't tell you how good your unit tests actually are: it's even possible to achieve 100% coverage without a single assert in your tests. Being interested in better ways of testing, I attended the "Mutation testing" workshop during this year's Joy of Coding conference.
Mutation testing is a radically different approach to executing and analyzing the result and coverage of your unit tests. Instead of measuring how much of your code is "accessed from" your unit tests, it determines how much of your code is actually "tested by" your unit tests.
So how does it actually work
The basic idea behind mutation testing is to make a small change (a mutation) to the (byte) code and then execute your tests to see if it is detected by the unit tests.
Possible mutations include altering a ">" into a ">=", replacing "++" with "--", and removing "void" method invocations.
Each mutation therefore creates an altered version of your code called a "mutant". Running the unit tests against a mutant has two possible outcomes:
- the mutant is detected by our unit tests: a test fails, and the "mutant" is considered "killed";
- the mutant remains unnoticed by our unit tests: no test fails (the "mutant" is considered "alive"), which means that the mutated code is actually not tested by the unit tests.
An example of mutation testing
So how does this “mutation testing” actually work?
Consider the following method:
```java
public String foo(int i) {
    if (i >= 0) {
        return "foo";
    } else {
        return "bar";
    }
}
```
And the fact that the unit tests consist of only one test method:
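The post's original test isn't shown here; a minimal sketch of such an assertion-free test (plain Java instead of JUnit, names assumed) could look like this:

```java
// Sketch of an assertion-free test (names assumed; the original used JUnit).
// The test calls foo but never checks the result, so line coverage is 100%
// while no mutant of foo can ever be detected.
public class FooTest {
    static String foo(int i) {
        if (i >= 0) {
            return "foo";
        } else {
            return "bar";
        }
    }

    static void testFoo() {
        foo(0); // result ignored: no assertion, so mutations go unnoticed
    }

    public static void main(String[] args) {
        testFoo();
        System.out.println("testFoo ran without failing");
    }
}
```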
What if we would create a "mutant" of our code in which ">=" is altered into ">"?
We would expect our unit test method to detect this, right? Well, in this case it doesn't, since the test method doesn't contain a single assertion.
What if we would change the "testFoo" method to include an assertion:
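The strengthened test isn't reproduced in this extract either; a sketch (plain Java standing in for JUnit's assertEquals, names assumed) might be:

```java
// Sketch (names assumed): the same test, now with an assertion. The
// ">= into >" mutant makes foo(0) return "bar", so this test kills it.
public class FooAssertTest {
    static String foo(int i) {
        if (i >= 0) {
            return "foo";
        } else {
            return "bar";
        }
    }

    static void testFoo() {
        // stands in for JUnit's assertEquals("foo", foo(0))
        if (!foo(0).equals("foo")) {
            throw new AssertionError("expected \"foo\"");
        }
    }

    public static void main(String[] args) {
        testFoo();
        System.out.println("testFoo passed");
    }
}
```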
Now our unit test method will fail, detecting (and thereby "killing") the "mutant" code.
Two more "mutants" are possible in our example:
- the first return statement could be altered to return null (instead of "foo"); this "mutant" is "killed" by the "testFoo" method due to the "assertEquals" statement, but remains unnoticed by the original "testFoo" method (without any assertions);
- the second return statement could be altered to return null (instead of "bar"); since no test method actually covers this execution path, this "mutant" will remain unnoticed.
NOTE: some mutation testing tooling (like PIT for Java) won’t even bother creating a “mutant” for the second return statement as it will never be covered by the unit tests (as detected by traditional line coverage).
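As an aside, PIT integrates with Maven; a minimal plugin configuration looks roughly like this (the version number and example package names are assumptions; see the PIT website for current details):

```xml
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <version>1.15.0</version>
  <configuration>
    <targetClasses>
      <param>com.example.*</param>
    </targetClasses>
    <targetTests>
      <param>com.example.*</param>
    </targetTests>
  </configuration>
</plugin>
```

Running `mvn org.pitest:pitest-maven:mutationCoverage` then produces an HTML report of killed and surviving mutants.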
Equivalent mutations causing false-positives
As opposed to traditional line and branch coverage, mutation coverage can lead to false positives: it could incorrectly report a "mutant" as not being detected by your unit tests.
For instance consider the following Java code:
```java
public int someNonVoidMethod() {
    return 0;
}

public void foo() {
    int i = someNonVoidMethod();
    // do more stuff with i
}
```
During mutation testing (using PIT Mutation testing with some “non”-default configuration) the following “mutant” could have been created:
```java
public int someNonVoidMethod() {
    return 0;
}

public void foo() {
    int i = 0;
    // do more stuff with i
}
```
The "int i = 0" statement in the "mutant" is functionally "equivalent" to the original code, in which "someNonVoidMethod" returns 0.
Such an "equivalent mutation" cannot be detected, since the unit tests will (and should) not fail on it.
It will therefore be reported as not covered, whereas it is actually a false positive.
When using PIT, a mutation testing framework for Java, "equivalent mutations" should, according to the documentation, be minimal when using the "default" set of mutators.
For instance the "Non Void Method Call Mutator" of PIT, which causes the "int i = 0" equivalent mutation above, is disabled by default.
Conclusion
After participating in the workshop, doing some additional investigation, and playing around with PIT, I got really enthusiastic about using "mutation testing" in the near future (starting with new components) on my current project.
As opposed to traditional coverage reporting, mutation test coverage actually measures the quality of your tests and cannot be fooled the way traditional coverage reporting can.
- check out this very funny presentation from Chris Rimmer about the basic concept of mutation testing.
- furthermore there’s an interesting article from a company called TheLadders using the PIT mutation testing tool.
- also there's an extensive article from Filip van Laenen about "mutation testing" in edition 108 of the Overload magazine.
- last but not least there’s the documentation on the PIT mutation testing website.
1 What Is Mutation in Java?
Clearly, as programmers we need to think about whether or not to use mutation. We have examples and warnings here, but not principles. We should try to articulate the impact of mutation on computation and some guidelines on how to use it effectively.
Philosophically, we can state the following principle:
Mutation introduces time into computation
To help explain this, consider the following sequence of operations:
```java
LinkedList<Integer> myList = new LinkedList<Integer>();
myList.contains(3);   // returns false
myList.add(3);
myList.contains(3);   // returns true
```
Clearly, the two evaluations of myList.contains(3) yield different results. This means that when we call a computation matters. We're not measuring "when" in clock time, but in time as an ordered sequence of events. Depending on what events happened in the past (a notion of time), we get different answers.
Mathematically, this statement is somewhat bizarre: a function, by definition, returns the same output every time it is called on the same input. How do we make sense of this?
We understand this by realizing that there is always an implicit argument to every function: the contents of memory. The contains method is still a function (in the mathematical sense), but if we are being honest about it, its type is
contains : LinkedList<E> * E * (VarNames -> Addresses) -> boolean
Between the two calls to contains , memory changes. Hence, contains really is a function; it just doesn’t always appear that way since the memory argument is implicit. This is our second main observation about mutation:
Mutation forces you to consider an implicit parameter.
But wait: isn't memory also an implicit parameter in programs without mutation? Yes, but the whole point of mutation is that it can change the contents of that implicit parameter (beyond adding new names). When you program without mutation, the address tied to a variable name remains the same, so the implicit argument is irrelevant. When you program with mutation, you have to think about the contents of that implicit parameter.
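One way to see this is to rewrite contains so its memory argument is explicit data. A minimal sketch (names assumed; "memory" here is just the list contents, passed in rather than read from a mutated object):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (names assumed): contains with its memory argument made explicit.
// The same explicit inputs always produce the same answer, so contains is
// visibly a function again.
public class ExplicitMemory {
    static boolean contains(List<Integer> memory, int x) {
        return memory.contains(x);
    }

    public static void main(String[] args) {
        List<Integer> before = new ArrayList<>();   // memory before the add
        List<Integer> after = new ArrayList<>();
        after.add(3);                               // memory after the add

        // Same question, different explicit memory argument:
        System.out.println(contains(before, 3));    // prints false
        System.out.println(contains(after, 3));     // prints true
    }
}
```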
But so what? Why is this an issue? Because people tend to forget to think about implicit data. Remember our backtracking tic-tac-toe example? People often forget to "undo" the board edit. With memory not being explicit in the program, that's easy to do.
2 When To Use Mutation
Mutation is not uniformly evil. We’ve seen that we cannot create cyclic data (graphs) without it, for example. Many programs also require that you use mutation to maintain information. To understand these conditions, consider our observation that mutation introduces time into computation (via past events). There are many computations when you do want the result to be dependent on past events. In a banking system, for example, the history of past transactions determines a customer’s current balance. This suggests that a banking system should use mutation to reflect transactions over time.
But hang on: when we discussed tic-tac-toe yesterday, we said that we should not have used mutation there. We should have just passed the current board around as an argument. Didn't that "current board" capture past choices of moves, which would justify state?
The current board does indeed capture past move choices. However, only the search function can modify or consult those choices. In a banking application, different functions add to or consult the history of transactions. Mutation helps coordinate the actions between those different functions. Put differently, mutation helps manage the combination of remembering (the effect of) past events and sharing that knowledge across multiple functions.
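The banking case can be sketched in a few lines (class and method names are assumptions, not from the source): several methods add to or consult one shared, mutable history.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (names assumed): shared mutable history that different methods
// add to or consult, as in the banking example above.
public class Account {
    private final List<Integer> transactions = new ArrayList<>();

    public void deposit(int amount)  { transactions.add(amount); }
    public void withdraw(int amount) { transactions.add(-amount); }

    // A different method consults the same shared history.
    public int balance() {
        int sum = 0;
        for (int t : transactions) {
            sum += t;
        }
        return sum;
    }

    public static void main(String[] args) {
        Account acct = new Account();
        acct.deposit(100);
        acct.withdraw(30);
        System.out.println(acct.balance()); // prints 70
    }
}
```

Here mutation is doing exactly the coordination work described above: deposit, withdraw, and balance all share the effect of past events through the transactions list.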
3 A Practical Perspective
A friend of mine is on the Google Chrome team. These are his comments on the role of functional programming (versus mutation-based programming) at Google.
What should the type of a search method be? Assume we want (roughly) the same results each time we search on the same terms.
What’s the type of a typical command in a system for searching for flights (for example)?
Both of these operations search massive amounts of data. We therefore want to distribute the computation into smaller chunks that different machines can run in parallel. Which of these two type contracts is easier to parallelize?
- Functional code is easy to parallelize; mutation-based code is not.
- The average PC stays up roughly 2 months without crashing. If you build your infrastructure on thousands of cheap machines, you’ll have one failing every few minutes. You must be able to move computations to new machines when a machine goes down.
- Testing is much harder with shared state or lots of local state.
- How do you schedule updates to highly-shared data structures when machines might die mid-computation? Functions have to return effects (ie, operations to perform on data). Now, highly-parallel algorithms such as map-reduce can work with the results.
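The "functions return effects" idea can be sketched as follows (all names here are assumptions for illustration): a pure function computes a description of an update, and a single apply step performs it, so a lost computation can simply be rerun elsewhere.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch (names assumed): a function returns a description of an update
// instead of performing it, making the memory change explicit data.
public class Effects {
    // An effect: "set this key to this value".
    static final class Put {
        final String key;
        final int value;
        Put(String key, int value) { this.key = key; this.value = value; }
    }

    // Pure function: computes an effect, touches no shared state.
    static Put incrementEffect(Map<String, Integer> store, String key) {
        return new Put(key, store.getOrDefault(key, 0) + 1);
    }

    // One place applies effects; if a machine dies mid-computation, the
    // effect can be recomputed and replayed on another machine.
    static void apply(Map<String, Integer> store, Put effect) {
        store.put(effect.key, effect.value);
    }

    public static void main(String[] args) {
        Map<String, Integer> store = new HashMap<>();
        apply(store, incrementEffect(store, "hits"));
        apply(store, incrementEffect(store, "hits"));
        System.out.println(store.get("hits")); // prints 2
    }
}
```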
3.1 Reflection
Think about this proposal that a function return effects rather than modify data: what it really does is make an implicit parameter (memory) explicit. When we pass a tic-tac-toe board as an argument rather than modify it, we are making the board data explicit. Making data updates explicit turns methods back into functions. This is a powerful and useful technique that simplifies many real-world concerns such as those Google faces.
4 Back to Graph Representations
Buried in our lecture notes on graphs is a proposal that we didn't have time to cover in lecture: rather than accumulate the visited nodes in a list, we could have a "visited" flag within each node that we set when a node is visited.
What are the implications and tradeoffs of this representation versus our representation that passed around a list of visited nodes?
- You have to remember to tear down (clear the flags) after each operation.
- Modifying the nodes requires less memory than passing around a list.
- Modifying the nodes assumes that you have access to them (if you are writing a mashup over the Google-Maps API, you don’t get to modify the map graph).
- Modifying the nodes assumes that only one graph traversal will be running at a time.
Note that none of this discussion was about whether we used a mutation-based data structure (ie, a LinkedList ) to store the visited nodes. That list is a local data structure; you can handle that however you want. The question is whether we modify the overall data structure or use a parameter to store our relevant history. There are cases in which each approach is arguably preferable.
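The two representations can be sketched side by side (graph structure and names are assumptions for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (structure assumed): reachability with a visited flag stored in
// each node versus an explicit visited list passed as a parameter.
public class GraphDemo {
    static final class Node {
        final List<Node> neighbors = new ArrayList<>();
        boolean visited = false; // mutation-based bookkeeping
    }

    // Flag version: mutates shared nodes; flags must be cleared before the
    // next traversal, and two concurrent traversals would interfere.
    static boolean reachableFlag(Node from, Node to) {
        if (from == to) {
            return true;
        }
        from.visited = true;
        for (Node n : from.neighbors) {
            if (!n.visited && reachableFlag(n, to)) {
                return true;
            }
        }
        return false;
    }

    // List version: the history lives in a parameter; nodes are untouched,
    // so no teardown is needed and traversals cannot interfere.
    static boolean reachableList(Node from, Node to, List<Node> visited) {
        if (from == to) {
            return true;
        }
        visited.add(from);
        for (Node n : from.neighbors) {
            if (!visited.contains(n) && reachableList(n, to, visited)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node a = new Node();
        Node b = new Node();
        a.neighbors.add(b);
        System.out.println(reachableList(a, b, new ArrayList<Node>())); // prints true
    }
}
```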
5 Summary
The message from all of this is that you should think carefully about not just your data structures, but the properties of their implementations, when writing larger-scale software. We're not saying "never use a mutation-based data structure": that would be silly, as it would imply that you never use the built-in Java libraries, for example.
We are, however, saying that you need to think about mutation across the interfaces of your code. Do you want clients of your code to have a stateful view of your code (regardless of whether you use state internally)? Should your code return void, or return effects? You have control over your APIs and over how you expose your objects (for example, making data immutable via final fields or unmodifiable views, which prevents clients from modifying it).