A ramble on Unit Tests

Before reading I’d like to issue a warning: this post is highly subjective based on my own experiences. You may not feel the same.

It’s also worth mentioning that the topic of unit testing is so vast that this article barely scratches the surface. I’ll leave a list of references at the end of the article if you want to learn more about the subject.

What is the goal of unit testing?

Before anything else, it’s best to think about why you’re doing something. Unit testing is no different. Unit testing wasn’t always considered the norm (and in some places it still isn’t).

Generally having unit tests leads to an overall increase in the quality of the code. But that’s not the main goal, it’s just a side effect. Unit testing allows a project to grow in a sustainable way. That’s the main goal.

Imagine you’re just starting out on a project. Initially features are cranked out at breakneck speed, but as time goes on this speed starts to feel more like a walk, then a crawl, and if you’re unlucky it feels like wading through mud. What took a few hours at the start of the project now takes days.

Unit tests help quite a lot here. The initial investment is noticeable, in some cases even large, but as the project progresses the speed of development should stay mostly the same. Of course development time will increase somewhat, and that’s normal, but it won’t grind to a halt.

The end goal of unit tests is the sustainability and scalability of a project.

Good and bad tests

Unit tests are a very accurate negative indicator: code that’s hard to unit test is usually badly written code. They’re a terrible positive indicator, though: the fact that you can easily write unit tests for a piece of code doesn’t mean the code itself is good.

I view having unit tests for the sake of having unit tests as more harmful than having no unit tests. Having more code is not good: the ideal amount of code is no code, and the next best is the minimum amount of code that gets the job done. The more code you have, the more time you have to invest in maintaining it. Having more code can be considered an asset the same way having more holes in your Swiss cheese can be considered an asset.

And tests are no different, they’re still code.

Out of all the various definitions I’ve read about what makes a unit test good, this one stuck with me the most:

“A good unit test has the following four attributes: fast feedback, maintainability, offers protection against regression bugs, is resistant to refactoring (this refers to refactoring of the code being tested, not the test itself)”

Definition of refactoring, taken from Wikipedia:

In computer programming and software design, code refactoring is the process of restructuring existing computer code—changing the factoring—without changing its external behavior

Resistance to refactoring is, I believe, the most important one, as this is where most of the work goes (excluding the initial setup). It’s very hard to strike a good balance between a simple test and a useful test. This point also goes hand in hand with maintainability, as you’d want to do very little work, if any, on a unit test when refactoring the code that it’s testing.

Fast feedback quite literally means how fast the unit test runs. It should be in the double-digit milliseconds.

Protection against regression bugs is in fact the reason to have a unit test in the first place. If it doesn’t catch bugs then what’s the point of having it?

So the goal here is to have as few unit tests as possible, keep them maintainable and easy to understand but also resistant to refactoring of the code being tested while also providing fast feedback.

That’s a lot to ask from anything.

But unfortunately you can’t have your cake and eat it too. All unit tests make some trade-offs between those 4 attributes.

In my experience resistance to refactoring should never be traded off for anything. So in fact you’re left choosing between protection against regression, maintainability, and fast feedback.

If you’re familiar with the CAP theorem, it’s the same kind of situation: you can’t maximize everything at once.

Of course in real life there’s no universal answer. It falls onto you, dear reader, to make those trade-offs as best you can.

Are you from Detroit or London?

Strange way to talk about unit tests when I’ve never actually given a definition.

Generally a unit test is an automated test that verifies code, does it fast, and it does it in an isolated manner.

Different people have different ideas about what “in an isolated manner” means. So much so that there are 2 different schools of thought when it comes to this.

One is the classicist way of doing unit tests, also known as the Detroit style of unit testing.

The other is the mockist way of doing unit tests, also known as the London style of unit testing.

Each of these styles treats isolation differently.

Let’s start with the mockist take.

But before we do that it’s important to know the difference between stubs, mocks, fakes, spies and dummies. Test double is an all-encompassing term for them (like stunt doubles in movies, which is actually where the term comes from).

Out of all of them, mocks and stubs are used the most (in my experience).

I will quite literally copy-paste Martin Fowler’s definitions from his Mocks Aren’t Stubs article (which, if you haven’t read it, I highly recommend).

  • Dummy objects are passed around but never actually used. Usually they are just used to fill parameter lists.
  • Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in memory database is a good example).
  • Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what’s programmed in for the test.
  • Spies are stubs that also record some information based on how they were called. One form of this might be an email service that records how many messages it was sent.
  • Mocks are what we are talking about here: objects pre-programmed with expectations which form a specification of the calls they are expected to receive.
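
To make these definitions a bit more concrete, here’s a minimal hand-rolled sketch in plain PHP, without any mocking library. The Mailer interface and every class below are hypothetical names I made up for illustration; they are not from Fowler’s article, and a real mocking framework would generate most of this for you.

```php
<?php
declare(strict_types=1);

// Hypothetical dependency, invented for this illustration.
interface Mailer
{
    public function send(string $to, string $body): bool;
}

// Dummy: only fills a parameter list; it should never actually be called.
final class DummyMailer implements Mailer
{
    public function send(string $to, string $body): bool
    {
        throw new LogicException('A dummy should never be used.');
    }
}

// Stub: returns a canned answer and does nothing else.
final class StubMailer implements Mailer
{
    public function send(string $to, string $body): bool
    {
        return true;
    }
}

// Spy: a stub that also records how it was called.
final class SpyMailer implements Mailer
{
    /** @var string[] */
    public array $recipients = [];

    public function send(string $to, string $body): bool
    {
        $this->recipients[] = $to;
        return true;
    }
}

// Fake: a working but simplified implementation (keeps mail in memory).
final class InMemoryMailer implements Mailer
{
    /** @var array<string, string[]> */
    private array $inbox = [];

    public function send(string $to, string $body): bool
    {
        $this->inbox[$to][] = $body;
        return true;
    }

    /** @return string[] */
    public function messagesFor(string $to): array
    {
        return $this->inbox[$to] ?? [];
    }
}

// Mock: pre-programmed with an expectation that it checks itself.
final class MockMailer implements Mailer
{
    private int $expectedCalls;
    private int $calls = 0;

    public function __construct(int $expectedCalls)
    {
        $this->expectedCalls = $expectedCalls;
    }

    public function send(string $to, string $body): bool
    {
        $this->calls++;
        return true;
    }

    // Throws if the number of calls differs from the expectation.
    public function verify(): void
    {
        if ($this->calls !== $this->expectedCalls) {
            throw new RuntimeException(
                "Expected {$this->expectedCalls} call(s), got {$this->calls}."
            );
        }
    }
}
```

The difference that matters most for the rest of this post: with a stub, spy or fake the test decides what to assert afterwards, while a mock carries the expectation inside itself.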

The whole idea behind the mockist take is that a class is a unit. Any dependencies it has should not be real objects but rather mocks.

The general approach when doing TDD with a mockist take is to work outside-in, starting at the top layer (usually the UI) and working your way to the database (or whatever your last layer may be). Much like an onion being peeled away.

It focuses heavily on using abstractions (usually interfaces). It also focuses heavily on the implementation details of the code itself, which leads to heavy coupling of the tests to the code.

I think an example is in order so you can get a better understanding of what I mean.

Let’s assume you have to program a coffee machine of sorts. We’ll start with the UI, which in our case is a button.

A button is in fact the best analogy for an interface. You press it and you get coffee. You don’t care how you get coffee, you just know that pressing the button gets you coffee.

interface CoffeeButton
{
    public function getCoffee();
}
class CoffeeMachine
{
    private CoffeeButton $coffeeButton;

    public function __construct(CoffeeButton $coffeeButton)
    {
        $this->coffeeButton = $coffeeButton;
    }

    public function brewCoffee()
    {
        return $this->coffeeButton->getCoffee();
    }
}

And the test

use PHPUnit\Framework\TestCase;

class CoffeeMachineTest extends TestCase
{
    public function testCoffeeMachine() {
        $coffeeButton = $this->createMock(CoffeeButton::class);
        $coffeeButton->expects($this->once())
            ->method("getCoffee")
            ->willReturn("freshly brewed coffee");

        $coffeeMachine = new CoffeeMachine($coffeeButton);
        // ...
    }
}

This approach completely isolates the class from its dependencies. It can also reliably tell you exactly what fails and where, since everything is isolated.

Now, of course this is a very naive example. Classes in real life may depend on more than one class, can sometimes have circular dependencies, and can lead to pretty complex mocks.

Trying to test interconnected code without using mocks is hard, unless you want to recreate the whole object graph, which may not be feasible.

So far so good. But how would the classicist approach this?

When doing classic unit testing the isolation level is different.

Looking at some code will make this a lot clearer. We’ll be using the same classes/interfaces as before but we’ll add a concrete implementation.

class CoffeeButtonImpl implements CoffeeButton
{
    public function getCoffee(): string
    {
        // does some complex logic
        return "freshly brewed coffee";
    }
}
class CoffeeMachineTest extends TestCase
{
    public function testCoffeeMachineClassic() {
        $coffeeButton = new CoffeeButtonImpl();
        $coffeeMachine = new CoffeeMachine($coffeeButton);

        $this->assertEquals("freshly brewed coffee", $coffeeMachine->brewCoffee());
    }
}

You’ll immediately notice that the level of isolation depends on how many dependencies the class has.

In this case the level of isolation is just 2 classes. But in a real life scenario it can be many classes.

So in the eyes of a classicist the isolation is between a unit test and other unit tests, NOT between classes/methods.

You can very much say that unit testing in a classicist way is testing a use case or a business requirement.

In that sense unit testing in the classic style can feel like integration testing.

Classic also takes an inside out approach. You begin at the component level and work your way out, much like a worm eating an apple from inside out.

So which school should you adhere to?

There’s no correct answer to this. We’re supposed to be engineers, and as such we must take a pragmatic approach. However, this is where the bias I was talking about at the start comes in.

I personally prefer a classicist approach when writing unit tests but I don’t think for a second that test doubles aren’t useful.

Imagine the very real scenario that our coffee machine has to check for a firmware update and, if it finds one, install it. Should your unit tests call a real API to check that?

This is where test doubles come in. You can use a mock to simulate API data, or a stub to simulate the API call with a fake answer but leave the rest of the class to do real work on the fake answer the stub provides.
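
A rough sketch of the stub variant, in plain PHP so it runs on its own. UpdateServer, StubUpdateServer and FirmwareUpdater are hypothetical names I’m making up here, and the "update" is just a version string being replaced; a real machine would obviously do more.

```php
<?php
declare(strict_types=1);

// Hypothetical boundary to the outside world (a real one would call an API).
interface UpdateServer
{
    public function latestVersion(): string;
}

// Stub: returns a canned version instead of touching the network.
final class StubUpdateServer implements UpdateServer
{
    private string $version;

    public function __construct(string $version)
    {
        $this->version = $version;
    }

    public function latestVersion(): string
    {
        return $this->version;
    }
}

// Real logic under test: decides whether an update is needed.
final class FirmwareUpdater
{
    private UpdateServer $server;
    private string $installedVersion;

    public function __construct(UpdateServer $server, string $installedVersion)
    {
        $this->server = $server;
        $this->installedVersion = $installedVersion;
    }

    // Returns true if a newer version was found and "installed".
    public function updateIfAvailable(): bool
    {
        $latest = $this->server->latestVersion();
        if (version_compare($latest, $this->installedVersion, '>')) {
            $this->installedVersion = $latest; // stand-in for the real flashing step
            return true;
        }
        return false;
    }

    public function installedVersion(): string
    {
        return $this->installedVersion;
    }
}

// The test only fakes the answer from the server; the comparison logic is real.
$updater = new FirmwareUpdater(new StubUpdateServer('1.2.0'), '1.1.0');
$updated = $updater->updateIfAvailable();
```

Only the network boundary is replaced. The version comparison, which is the part worth testing, still runs for real.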

The London approach offers better granularity, checking one class at a time. Having interconnected classes is not a big issue since you can mock them. Failing tests point you exactly to the class that’s the culprit since the other classes are mocks.

The mockist approach is a clear white box approach. You have to think about the implementation details when writing tests, which leads to brittle tests that easily break when changing the implementation details of a method. Depending on how many tests you have for that one class, or how many areas you touch when refactoring, this can quickly get out of hand.

The Detroit approach is more coarse-grained; it (usually) doesn’t check one class at a time. Having many interconnected classes can lead to complex setups. Failing tests don’t really point to a specific class, since a single bug can lead to cascading failures. Usually this is not an issue since the bug is generally in whatever you last edited.

The Detroit style takes a black box approach. You only care about what the end result is; the implementation details are not important. This leads to tests that are highly resistant to refactoring of the underlying code (which is why I prefer the classic approach).

The classic approach requires more initial infrastructure investment when writing tests, as you have to set up dependencies, stubs, etc. But careful management of this can lead to a lot of reusable code.

The London approach has minimal initial infrastructure investment because you can just mock everything. Of course, this can lead to having many mocks.

I also find classic approach much easier to read, since you don’t care about how methods reach that end result, only that they reach it. Mock approach on the other hand is much more verbose, and it takes extra brain power to understand what should happen.

My very biased opinion is that classic testing is better than mock testing. You should use real objects wherever you can, and mock strictly what’s necessary to get the job done. Tests should focus on the end result, not on how you reach that end result (the steps taken). Black box testing is far more resistant to refactoring of the class(es) being tested than white box testing. Refactoring should have minimal impact on your tests unless it changes the end result.
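
Here’s a tiny sketch of what I mean by resistance to refactoring. SpyWaterTank, brewV1 and brewV2 are hypothetical names invented for this example: two implementations that produce the same result through different steps.

```php
<?php
declare(strict_types=1);

// Hypothetical collaborator; it counts calls so a test can inspect them (a spy).
final class SpyWaterTank
{
    public int $drawCalls = 0;

    public function draw(int $ml): int
    {
        $this->drawCalls++;
        return $ml;
    }
}

// Original implementation: draws all the water in one go.
function brewV1(SpyWaterTank $tank): string
{
    $water = $tank->draw(200);
    return "coffee ({$water} ml)";
}

// Refactored implementation: same observable result, different steps.
function brewV2(SpyWaterTank $tank): string
{
    $water = $tank->draw(100) + $tank->draw(100);
    return "coffee ({$water} ml)";
}

$tankV1 = new SpyWaterTank();
$tankV2 = new SpyWaterTank();

// End-result check: identical for both versions, so it survives the refactoring.
$sameResult = brewV1($tankV1) === brewV2($tankV2);

// Behavior check ("draw() is called exactly once"): holds only before the refactoring.
$v1DrawsOnce = $tankV1->drawCalls === 1;
$v2DrawsOnce = $tankV2->drawCalls === 1;
```

A test built on $sameResult keeps passing after the refactoring. A test built on the call count fails, even though nothing the user cares about has changed.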

OK, so what makes a unit test actually good?

Expanding on the definition above.

“A good unit test has the following four attributes: fast feedback, maintainability, offers protection against regression bugs, is resistant to refactoring (this refers to refactoring of the code being tested, not the test itself)”

We’ve talked about what a good and bad test is from the perspective of the test itself. But testing for the sake of testing is pointless.

A good unit test should first and foremost be integrated in your development cycle. There’s very little point in having a fast, maintainable, resistant to refactoring unit test that also catches bugs if you’re not going to use it!

This leads me to my next point: target only specific parts of your code base. Yes, I said it, not everything needs to have unit tests. If you’re familiar with the Pareto principle, also known as the 80/20 rule, you’ll know it makes very little sense to invest time in something that provides little value.

The domain model (whatever that may be for you) deserves the most attention. Perhaps this translates into a service, or some specific component that does some intense calculations. I see very little use in unit testing a component that’s used only for gluing stuff together.

A good unit test should also provide maximum value with minimal maintenance. If for every line of refactored code you end up changing 3, 4…n lines of unit test code, then maybe you should reconsider your approach to writing unit tests.

Worse, if you’re working in a team, which I assume the vast majority of people are, then developers will simply use tricks to avoid changing the tests.

Let’s face it, writing unit tests can be pretty boring. They’re fundamental to a good development cycle but you’re not exactly doing world-changing work here. So the less of it you have to do, the better.

Avoid writing unit tests for trivial code. You really don’t need to test setters/getters.

Don’t specifically test 3rd party libraries, frameworks, or external systems. They will be included in your tests by default when you test classes that use those libraries/frameworks. This is a good thing. Libraries/frameworks can have bugs too. But don’t specifically write tests for them.

Now, I want to stress this: writing the least amount of code doesn’t mean skipping out on legitimate tests. It means being smart about it: reusing code where possible, creating helper functions, using a faking library, stuff like that.
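
As one example of such a helper, here’s a small sketch. The classes are condensed copies of the coffee machine code from earlier so that the snippet runs on its own; makeCoffeeMachine() is a hypothetical helper name, not something PHPUnit provides.

```php
<?php
declare(strict_types=1);

// Condensed copies of the earlier classes so the sketch is self-contained.
interface CoffeeButton
{
    public function getCoffee(): string;
}

final class CoffeeButtonImpl implements CoffeeButton
{
    public function getCoffee(): string
    {
        return "freshly brewed coffee";
    }
}

final class CoffeeMachine
{
    private CoffeeButton $coffeeButton;

    public function __construct(CoffeeButton $coffeeButton)
    {
        $this->coffeeButton = $coffeeButton;
    }

    public function brewCoffee(): string
    {
        return $this->coffeeButton->getCoffee();
    }
}

// Hypothetical helper: the one place that knows how to build the object graph.
// Tests override only the collaborator they care about.
function makeCoffeeMachine(?CoffeeButton $coffeeButton = null): CoffeeMachine
{
    return new CoffeeMachine($coffeeButton ?? new CoffeeButtonImpl());
}

// A test that needs a special button passes its own (here an anonymous-class stub).
$decafButton = new class implements CoffeeButton {
    public function getCoffee(): string
    {
        return "decaf";
    }
};

$default = makeCoffeeMachine()->brewCoffee();
$decaf = makeCoffeeMachine($decafButton)->brewCoffee();
```

When the constructor later gains a parameter, only the helper has to change instead of every test that builds a CoffeeMachine.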

Imagine our coffee machine gets an extra button for adding sugar to the coffee. This new functionality has no bearing on the end result. You’re still getting coffee, but now it has sugar.

Let’s see a code example.

class SugarButton
{
    public function getSugar() : string
    {
        return "sugar";
    }
}

And the changes in the unit test

class CoffeeMachineTest extends TestCase
{
    public function testCoffeeMachineMock() {
        $coffeeButton = $this->createMock(CoffeeButton::class);
        $coffeeButton->expects($this->once())
            ->method("getCoffee")
            ->willReturn("freshly brewed coffee");

        $sugarButton = $this->createMock(SugarButton::class);
        $sugarButton->expects($this->once())
            ->method("getSugar")
            ->willReturn("sugar");

        $coffeeMachine = new CoffeeMachine($coffeeButton, $sugarButton);
        // ...
    }

    public function testCoffeeMachineClassic() {
        $coffeeButton = new CoffeeButtonImpl();
        $sugarButton = new SugarButton();
        $coffeeMachine = new CoffeeMachine($coffeeButton, $sugarButton);

        $this->assertEquals("freshly brewed coffee with sugar", $coffeeMachine->brewCoffee());
    }
}

You’ll notice that the mock style testing requires a bit of extra work. This can seem trivial at a glance but with real world objects this can lead to quite a lot of mocking and tight coupling of unit tests to the implementation. This in turn leads to brittle tests that can break during any refactoring.

The classic style just creates the new objects and changes the assert(s), because it does not care how the end result is created, only what the end result is.

The end result is the only thing that should matter, since there is a ridiculous number of ways in which you can arrange the code to reach that end result.

Good test: is the end result correct?

Bad test: are the steps correct?

Classic testing is more resistant to change.

And one of the most overlooked indicators of a good unit test: how hard is it to understand the test?

Classical unit testing is like telling a story: press the coffee button, then press the sugar button, and get coffee.

Mock unit testing is like telling the steps of telling a story: coffeeButton is pressed exactly once and will call method getCoffee which will return freshly brewed coffee, sugarButton is pressed exactly once and will call method getSugar which will return sugar and so on…

The first feels more natural to me.

With that said, white box testing still has its merits when using code coverage tools. By analyzing the code itself you’re basically checking which branches have been covered, and there’s no other way to do this.

A combination of these two methods can lead to good results.

Different styles of unit testing

Generally when doing unit tests you do it in one of these 3 ways: you check the end result, you check the state, or you check the behavior.

So far in this article I’ve shown examples for the first (does the coffee machine produce coffee) and the last (does the coffee machine take the correct steps in the correct order to produce coffee) but no example of the second one (state).

A state change has no return value. It just means that the internal state of the system has changed. That can mean anything, so let’s use a concrete example.

Let’s assume that we want to keep track of the coffees produced so far.

We’ll add a new private attribute to the CoffeeMachine class to keep track of that.

class CoffeeMachine
{
    private CoffeeButton $coffeeButton;
    private SugarButton $sugarButton;
    private int $coffeesServed = 0; // must be initialized before the first ++

    public function __construct(CoffeeButton $coffeeButton, SugarButton $sugarButton)
    {
        $this->coffeeButton = $coffeeButton;
        $this->sugarButton = $sugarButton;
    }

    public function brewCoffee()
    {
        $this->coffeesServed++;
        return $this->coffeeButton->getCoffee() . " with " . $this->sugarButton->getSugar();
    }

    public function getCoffeesServed(): int
    {
        return $this->coffeesServed;
    }
}

Great, now we have a way to keep track of the coffees served and a way to get that number.

Let’s write a test for it too.

    public function testCoffeeMachineTotalCoffeesServed() {
        $coffeeButton = new CoffeeButtonImpl();
        $sugarButton = new SugarButton();
        $coffeeMachine = new CoffeeMachine($coffeeButton, $sugarButton);

        $coffeeMachine->brewCoffee();

        $this->assertEquals(1, $coffeeMachine->getCoffeesServed());
    }

Now, there’s nothing new about these styles of testing. They’ve existed for quite a while (although I’ll admit I didn’t put a name to them until I read Unit Testing Principles, Practices, and Patterns). You’ve probably used all of the styles without knowing their names.

You can use all of these styles in a single test, or use only one style. Or any combination of them, it’s not a rule written in stone.
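
A quick sketch of all three styles applied to a single call. This uses a simplified CoffeeMachine (no sugar button) and a hypothetical SpyCoffeeButton, both redefined here so the snippet runs on its own, with plain boolean checks instead of PHPUnit assertions.

```php
<?php
declare(strict_types=1);

interface CoffeeButton
{
    public function getCoffee(): string;
}

// Hypothetical spy: canned answer plus a call counter.
final class SpyCoffeeButton implements CoffeeButton
{
    public int $presses = 0;

    public function getCoffee(): string
    {
        $this->presses++;
        return "freshly brewed coffee";
    }
}

// Simplified version of the CoffeeMachine from this post (no sugar button).
final class CoffeeMachine
{
    private CoffeeButton $coffeeButton;
    private int $coffeesServed = 0;

    public function __construct(CoffeeButton $coffeeButton)
    {
        $this->coffeeButton = $coffeeButton;
    }

    public function brewCoffee(): string
    {
        $this->coffeesServed++;
        return $this->coffeeButton->getCoffee();
    }

    public function getCoffeesServed(): int
    {
        return $this->coffeesServed;
    }
}

$button = new SpyCoffeeButton();
$machine = new CoffeeMachine($button);
$coffee = $machine->brewCoffee();

$endResultOk = $coffee === "freshly brewed coffee";  // end result: what came out
$stateOk = $machine->getCoffeesServed() === 1;       // state: what changed inside
$behaviorOk = $button->presses === 1;                // behavior: how the dependency was used
```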

But these styles are not equal. Let’s see what’s different about them.

When it comes to fast feedback all are equally fast, unless you’re doing work outside of the scope of unit tests.

All styles will offer good protection against regression bugs, except maybe behavior tests if you skimp out on checking all the steps.

When it comes to maintainability things start to get dicey.

Tests that check the end result are by far the most maintainable since they generally have less code and are easier to read.

State-based tests are hit or miss. They can be easy to maintain, but if you have to check a lot of states, things can get out of hand pretty fast.

Behavior tests are by far the worst offenders; checking every step of a method is by itself time-consuming. Refactoring that method, or worse, adding new functionality, can mean tediously checking which steps are no longer required, finding those steps in the mock, and changing them (or removing them if no longer used). And that’s assuming it’s only one method that’s changing (is it ever?). So in this regard behavior tests are…not that great.

Resistance to refactoring is pretty much the same as maintainability here. End result testing is the most resistant, followed by state, and lastly behavior.

Now you may have noticed that I put a heavy emphasis on maintainability (and by extension resistance to refactoring, I suppose). That’s because maintainable tests are the most pleasant to work with. That aspect should not be underestimated. Having maintainable code is one of the best (if not the best) virtues a code base can have; it’s one of those magic attributes that tends to hang out with all the right people. Maintainable means proper design, good encapsulation, easy to change, easy to work with, and (generally) easy to understand. Maintainable code is like having good socks while hiking: you won’t miss them if you never had good socks, but once you get a good pair you’ll never want to go back to a bad pair.

Since the most maintainable tests are by far end-result-based tests, I highly encourage writing those sorts of tests. Of course state and behavior tests also have their place, but they should not make up the majority of your unit tests.

Realistically, in most cases you’ll end up using a combination of mocks (for database layers, API calls, email services, or what have you) and classic testing.

Unit testing in the real world

I don’t know about you but I’m mostly stuck working on legacy code bases. That can be good or bad, depending on the code. It’s generally somewhere in the middle, and rarely, if ever, towards the good side.

With such code bases unit testing becomes hard: classes are tightly coupled, the logic can be highly convoluted, and tests can be missing completely. In these cases mocks become valuable, but they should be used as a transition tool towards end result testing.

Mocking a complex class that’s a dependency for another class you want to write a test for can be an easy way to write a unit test for it. Ideally in the future an opportunity will present itself when you can decouple the classes and transition towards a classic, maintainable, resistant to refactoring unit test for that class. But in the meantime mocks are good.
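
A minimal sketch of that situation, with a hand-rolled stub instead of a mocking library. LegacyPriceService, CoffeeInvoice and StubPriceService are hypothetical names invented for this example, and the exception in the legacy class simply stands in for "this can’t run in a test environment".

```php
<?php
declare(strict_types=1);

// Hypothetical legacy class: hard to use in a test because it does slow or external work.
class LegacyPriceService
{
    public function priceFor(string $product): float
    {
        // Imagine database queries, HTTP calls and global state here.
        throw new RuntimeException('Not available in a test environment.');
    }
}

// The class we actually want to test; it is tightly coupled to the legacy service.
final class CoffeeInvoice
{
    private LegacyPriceService $prices;

    public function __construct(LegacyPriceService $prices)
    {
        $this->prices = $prices;
    }

    public function total(int $cups): float
    {
        return $cups * $this->prices->priceFor('coffee');
    }
}

// Hand-rolled stub: overrides only the troublesome method with a canned answer.
final class StubPriceService extends LegacyPriceService
{
    public function priceFor(string $product): float
    {
        return 2.5;
    }
}

// CoffeeInvoice is now testable without touching the legacy internals.
$invoice = new CoffeeInvoice(new StubPriceService());
$total = $invoice->total(4);
```

Once the legacy class is hidden behind an interface, the subclassing trick can go away and the test barely has to change.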

Even when you’re not working on legacy code, mocking remains valuable due to its unique ability to replace complex objects or outside calls with a fake answer that’s just good enough to write a unit test with.

As I said in an earlier section, there’s no correct answer when it comes to choosing a specific school of thought, so be pragmatic. But keep in mind the trade-offs that each school offers.


If you want to learn more about unit testing (and testing in general) I highly recommend reading:

  • Unit Testing Principles, Practices, and Patterns by Vladimir Khorikov
  • Test-Driven Development: By Example by Kent Beck
  • Growing Object-Oriented Software, Guided by Tests by Steve Freeman and Nat Pryce
  • Working Effectively with Legacy Code by Michael C. Feathers

