Tags: testing
I’m going to start this post with a question: which kind of achievement makes you happiest as a developer? Not necessarily a big one; something that happens often, maybe daily.
Let me share one of mine: the entire test suite passing after a change, plus a clean SonarQube analysis report after pushing my changes 👯💘
Lovely, right? Of course, the rollout and the measurement of the business impact will come later, but from a developer’s perspective, and being completely honest, most of the initial pressure is released once that shiny, beautiful dashboard shows up. Why?
Well, there’s a sort of unspoken agreement that says that if you integrate your new code into a codebase with 100% coverage and all the tests pass, then you can be completely calm and rely on that safety net: you’re done with the implementation, it’s automatically tested, and therefore nothing bad can happen from now on. Rollouts on a Friday afternoon are more than welcome, being on-call doesn’t mean becoming a firefighter, your sleep quality improves thanks to maximum confidence, and so on.
Unfortunately, this is only partially true. Code coverage (the manager’s best friend and a sort of Holy Grail of code quality) actually only tells you that the test execution passes through all the code that is considered “covered”. That matters, but it doesn’t guarantee that the output of the functionality is asserted, or that the tests are designed to be robust against future changes; in fact, a test without a single assertion can still report 100% coverage.
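To illustrate that last point, here’s a minimal sketch, assuming JUnit 5 (the PriceCalculator class and the test are hypothetical, not part of the example that follows). Calling the method executes every line of it, so the coverage report shows 100%, yet nothing about the result is ever verified:

import org.junit.jupiter.api.Test;

class PriceCalculator {
    double applyDiscount(double price, double rate) {
        return price - price * rate;
    }
}

class NoAssertionTest {

    @Test
    void coversEverythingButVerifiesNothing() {
        // This call executes every line of applyDiscount, so coverage is 100%...
        new PriceCalculator().applyDiscount(100.0, 0.2);
        // ...but there is no assertion: any buggy implementation would pass too.
    }
}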
Let me demonstrate the robustness problem with a concrete example. We have this simple class representing the size of an object, and in the constructor we have added a guard clause to ensure data integrity for each instance:
import lombok.Getter;

@Getter
public class Dimension {

    private final double height;
    private final double width;

    public Dimension(double height, double width) {
        // Guard clause: reject non-positive dimensions so every instance is valid
        if (height <= 0 || width <= 0) {
            throw new RuntimeException("Provided piece dimensions are invalid: [height: " + height + ", width: " + width + "]");
        }
        this.height = height;
        this.width = width;
    }
}
To test the logic of this class, we have also added a unit test that checks both corner cases and the happy path:
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

public class DimensionTest {

    public static final int INVALID_SIZE = -1;
    public static final int VALID_SIZE = 1;

    @Test
    void givenInvalidHeightWhenDimensionIsCreatedThenExceptionIsThrown() {
        // When/Then
        assertThrows(RuntimeException.class, () -> new Dimension(INVALID_SIZE, VALID_SIZE));
    }

    @Test
    void givenInvalidWidthWhenDimensionIsCreatedThenExceptionIsThrown() {
        // When/Then
        assertThrows(RuntimeException.class, () -> new Dimension(VALID_SIZE, INVALID_SIZE));
    }

    @Test
    void givenValidHeightAndWidthWhenDimensionIsCreatedThenInstanceIsCreated() {
        // When
        Dimension dimension = new Dimension(VALID_SIZE, VALID_SIZE);

        // Then
        assertEquals(VALID_SIZE, dimension.getHeight());
        assertEquals(VALID_SIZE, dimension.getWidth());
    }
}
If we analyze the code coverage for this code, the result shows that we have 100% coverage:
As I said at the beginning, this would mean that everything is fine: we’re delivering code that has automated tests which verify the expected behaviour and act as a safety net for future changes around it. In this case, the first claim is true, but the second one isn’t. Once again, the question is why, right?
Let’s see what would happen if someone made this small change by mistake, relying on the tests to catch it:
import lombok.Getter;

@Getter
public class Dimension {

    private final double height;
    private final double width;

    public Dimension(double height, double width) {
        // The mistake: <= became <, so the value 0 is now silently accepted
        if (height < 0 || width < 0) {
            throw new RuntimeException("Provided piece dimensions are invalid: [height: " + height + ", width: " + width + "]");
        }
        this.height = height;
        this.width = width;
    }
}
We would agree that the business logic has changed: from now on, the value 0 is allowed for the height and width attributes, and therefore at least one of the tests should fail to warn us about this mistake. Unfortunately, that doesn’t happen. Wait a minute: I had 100% coverage, I modified my business logic, and after such an important change the suite keeps passing with 100% coverage??? Sadly, yes.
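To make the regression concrete, here’s a hypothetical snippet (not part of the suite) showing what the relaxed guard now lets through:

public class ZeroSizeDemo {

    public static void main(String[] args) {
        // With the mutated guard (height < 0 || width < 0), this no longer
        // throws, even though a zero-sized piece makes no business sense:
        Dimension degenerate = new Dimension(0, 0);
        System.out.println(degenerate.getHeight() + " x " + degenerate.getWidth()); // prints 0.0 x 0.0
    }
}

None of the existing tests exercises this value, so nothing fails.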
If you take a look at the test code, the permutations we’re covering are based on values above or below the conditional boundary, but we’re missing a test for the value that sits exactly on the boundary, which in this case is zero. Let’s modify the test class a little to make it more robust:
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

public class DimensionTest {

    public static final int ZERO_SIZE = 0;
    public static final int INVALID_SIZE = -1;
    public static final int VALID_SIZE = 1;

    @Test
    void givenInvalidHeightWhenDimensionIsCreatedThenExceptionIsThrown() {
        // When/Then
        assertThrows(RuntimeException.class, () -> new Dimension(INVALID_SIZE, VALID_SIZE));
    }

    @Test
    void givenInvalidWidthWhenDimensionIsCreatedThenExceptionIsThrown() {
        // When/Then
        assertThrows(RuntimeException.class, () -> new Dimension(VALID_SIZE, INVALID_SIZE));
    }

    @Test
    void givenZeroHeightWhenDimensionIsCreatedThenExceptionIsThrown() {
        // When/Then
        assertThrows(RuntimeException.class, () -> new Dimension(ZERO_SIZE, VALID_SIZE));
    }

    @Test
    void givenZeroWidthWhenDimensionIsCreatedThenExceptionIsThrown() {
        // When/Then
        assertThrows(RuntimeException.class, () -> new Dimension(VALID_SIZE, ZERO_SIZE));
    }

    @Test
    void givenValidHeightAndWidthWhenDimensionIsCreatedThenInstanceIsCreated() {
        // When
        Dimension dimension = new Dimension(VALID_SIZE, VALID_SIZE);

        // Then
        assertEquals(VALID_SIZE, dimension.getHeight());
        assertEquals(VALID_SIZE, dimension.getWidth());
    }
}
If we execute the tests now, some of them fail, which in this case is exactly what we want: if we roll back the mistake in the code and run the tests again, we’ll see that they pass just as they did before the mistake, but the suite is now more robust against this kind of error.
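As a side note, if writing out every boundary permutation by hand feels verbose, JUnit 5’s @ParameterizedTest can group them compactly. A sketch, assuming the junit-jupiter-params dependency is on the classpath (the class name is mine):

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

import static org.junit.jupiter.api.Assertions.assertThrows;

public class DimensionBoundaryTest {

    @ParameterizedTest
    @CsvSource({
            "-1, 1", // height below the boundary
            "1, -1", // width below the boundary
            "0, 1",  // height exactly on the boundary
            "1, 0"   // width exactly on the boundary
    })
    void givenInvalidDimensionsWhenDimensionIsCreatedThenExceptionIsThrown(double height, double width) {
        assertThrows(RuntimeException.class, () -> new Dimension(height, width));
    }
}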
At this point, the question is how to protect ourselves from this kind of developer mistake, given that code coverage is not a metric we can rely on 100%. Personally, I used to introduce these kinds of mistakes into my code by hand once the entire test suite was passing locally, just to check the tests’ robustness. I don’t blindly trust my own skills and, at least to me, it always looks suspicious when all the new tests pass on the first run. Incorrect mocks? Missing assertions? Am I really testing this piece of code properly?
Of course, you’re probably thinking that this is not a proper, sustainable way of working: real code doesn’t look that simple, and producing all the permutations by hand would be crazy. I fully agree, it makes no sense. Fortunately, there’s a type of testing that allows us to do it in an automated way: mutation testing.
Mutation testing is a type of testing that performs all these permutations for us, based on a set of mutation rules, letting us check the solidity of our tests. I always define mutation testing as a test of the tests, a sort of meta-testing, the WhatsApp double check. In this post I’m not going to go deep into the concepts, setup and usage, because that will come in a second post, but just as a seed I’ll show you how mutation testing would have helped us detect this hidden bug and increase our confidence in the code. This is how the report looks after executing the mutation testing tool (in this case I’m using Pitest) as part of our local development loop and/or CI: briefly explained, the report alerts us that after some mutations of the code, some tests keep passing, which is a smell that something is not well designed.
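To give you a taste of what a mutation looks like: one of Pitest’s default mutators, the conditionals-boundary mutator, rewrites a relational operator into its boundary counterpart, which is exactly the mistake we introduced by hand earlier. A sketch of the idea on our guard clause (the class and method names here are mine for illustration, not Pitest output):

public class ConditionalsBoundaryMutantSketch {

    // The guard condition as we wrote it:
    static boolean isInvalid(double height, double width) {
        return height <= 0 || width <= 0;
    }

    // The mutant Pitest generates in memory (<= becomes <). It then re-runs
    // the tests against it; if they all still pass, the mutation "survives",
    // telling us that no test pins down the behaviour at the boundary.
    static boolean isInvalidMutant(double height, double width) {
        return height < 0 || width < 0;
    }
}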
Hopefully this post helped you understand the problem of relying completely on the code coverage metric, as well as what this type of testing aims to solve.
In the next post you’ll find an overview of mutation testing concepts, the tools that are available, and a hands-on guide on how to use and integrate them so that, now for real, you can sleep soundly.