I’m sure you know about unit tests. And you probably know that you don’t write enough of them. More is better than less, and some is better than none. Maybe your project is fully covered by thousands of unit tests. There is only one “tiny” problem: most of those tests are probably wrong…
I think the first thing you learned about unit tests is that every new class, function, or line of code must be tested. Some argue that you should even write unit tests before writing the actual code. But I’m not here to teach you Test Driven Development (TDD). For me it isn’t really important which comes first, the code or the unit test. What is important is that your code is fully covered. But even if every line of code in your project is covered by unit tests, there is a good chance that most of them are the wrong kind of tests.
Let me explain.
There are two categories of unit tests: black box and white box. Black box tests know nothing about the internal architecture and only call the external interfaces of your application. White box tests are the opposite: they know the internal architecture and try to test every single facet of it. For example, black box tests would instantiate your entire application, module, or subsystem and invoke its public API (e.g. a REST API). White box tests would instantiate internal classes and invoke every single method to verify the logic of each internal component separately.
Some will argue that black box tests aren’t unit tests but integration tests. I don’t care! For all practical purposes, if they test your code and they are fast enough that you can run them as often as you want, they are unit tests.
White box tests are usually easier to write, and therefore, I think, many developers prefer writing them. I would argue, however, that white box tests are the wrong tests, and you’d be better off writing black box tests whenever possible.
Imagine two development teams working on a backend application. Team A writes only black box tests. Team B writes mostly white box tests. Both teams reach a point where they realize that their code architecture requires a big refactoring. Maybe even a full rewrite!
Team A throws away their old, badly designed code and rewrites it. They run their existing black box unit tests, because the REST APIs stay the same for compatibility with the client application; only the internal implementation of those APIs has changed. And since they had full coverage of their REST APIs, they can rely on the existing unit tests. They should not modify them. Quite the opposite: those unit tests are a time-tested source of truth. Once all unit tests pass, the team is confident that the refactoring has been completed successfully.
Team B, on the other hand, will need to throw away not only the old code but almost all of the unit tests as well. Those unit tests were designed to verify the internal components of the old, badly designed code. The developers will need to write new unit tests for the new architecture. Writing unit tests is time-consuming and can take as much time as writing the code itself. As a result, team B can end up spending twice as much time on the refactoring: rewriting the code and writing new unit tests. And since they don’t have old, time-tested unit tests, they will probably make mistakes implementing the new ones; that is, they may miss a few bugs here and there. Though I’m sure they will find those in production and fix them very quickly.
There is also another reason why black box unit tests are better, and it matters even more than you may think.
Now imagine not a rewrite-everything project, but the small, regular refactoring tasks that every software developer does every now and then to fix little design issues in the code. Team B developers often find that when a refactoring is finished, some of the unit tests fail. Of course they do: those are white box tests! If you change some of the internal components, you break the corresponding unit tests. Therefore team B often has to fix their unit tests after changing the code.
Team A, on the other hand, will rarely need to fix existing unit tests after a refactoring, because a refactoring should never change the external behavior of the system. In most cases, when unit tests fail, team A knows that something went wrong during the refactoring and that the code, not the unit tests, needs to be fixed.
White box tests are fragile. Team B often finds that unit test failures don’t point to actual issues in the new code. The new code is correct, but the unit tests need to be updated to account for the new behavior. More often than not, team B developers will start treating their code as the source of truth and their unit tests as pesky kids that need babysitting. Every time the code changes, they have to go and fix those “baby tests” that don’t know what new things “daddy’s new code” is doing. Over time this attitude will gradually erode their belief in unit tests in general. Developers will write fewer and fewer new tests and will start “fixing” some tests even when those tests identify genuine bugs in the code, just because their past experience tells them that unit tests are usually wrong.
Team A appreciates that their unit tests pinpoint actual bugs in the new code and rarely need to be updated: only when external interfaces change, which is expected. Black box unit tests become a time-tested source of truth, a bulletproof wall that only gets stronger over time.
P.S. I’m not 100% against white box unit tests. They are absolutely fine! Maybe your external interfaces change often while your internal architecture is rigid; in that case your internal components may be the more stable thing to test against. This article is an exaggeration meant to demonstrate the main point: if most of your unit tests use stable interfaces, which are often the external interfaces, your tests stay relevant longer and give your development team more useful feedback and a better development experience, because they don’t require frequent “babysitting”, which I would argue is very important.