Update 2009-06-30: Despite the attention seeking title and first few paragraphs, the point I was trying to make with this post was about the benefits of automated functional testing. Since it’s been posted to proggit and hacker news, a number of people have objected to the misleading intro. Apologies for that, and please start reading from “Benefits of automated functional tests” if you’re not interested in the misdirection! — Tim.
In the last few years of writing “enterprise” software in Java I’ve created and maintained thousands of unit tests, often working on projects that require 80% or better code coverage. Lots of little automated tests, each mocking out dependencies in order to isolate just one unit of code, ensuring minimal defects and allowing us to confidently refactor code without breaking everything. Ha! I’ve concluded that we’d be better off deleting 90% of these tests and saving a lot of time and money in the process.
The root of the problem is that most “units” of code in business software projects are trivial. We aren’t implementing tricky algorithms, or libraries that are going to be used in unexpected ways by third parties. If you take most Java methods and mock out the services or components they call then you’ll find you aren’t testing very much at all. In fact the unit test will look very similar to the code it is testing. It will have the same mistakes and need parallel changes whenever the method is modified. How often do you find a bug thanks to a unit test? How often do you spend half the day fixing up test failures from perfectly innocent code changes?
The complexity that does arise in this kind of software is all down to interactions between components, messy and changing business requirements and how the whole blob integrates at runtime.
Should we do away with automated testing altogether then? No. In fact I have found “functional” end-to-end automated tests to be incredibly useful. By functional tests, I mean those that test software in more-or-less the same way as a human tester does, covering wide swathes of the application without checking the result of every possible variation.
The benefits of functional tests are outlined below, and after discussing the one big downside I admit where unit tests can be appropriate.
Benefits of automated functional tests
If we deploy this release of the application, is the application going to release a deployment of toxic fumes and metal scraps?
I’ve worked on a project where, with the “final” release looming, we were creating a release for manual testing at the end of every day. And the next morning, testers would try to run a few simple scenarios and the application would fall over at the first hurdle. Incorrect database patches, configuration errors, NullPointerExceptions in critical paths and the like.
Later on I developed a set of a dozen or so functional tests, covering just a few basic scenarios. Nevertheless, this was sufficient to ensure that manual testers were able to find useful bugs in the application rather than having to sit around for days twiddling their thumbs because nothing works.
It’s tough being a new developer on a project. Faced with a mountain of code I haven’t seen before, performing some process I don’t really understand – wait, what does this application do again? – the first thing I like to do is run the damned thing and get a few ideas on what it actually does. To play tester for a while. Unit tests aren’t very useful here. We’re wanting a high-level overview of the application rather than worrying about how some tiny part of it might be implemented.
So how do you get the application up and running in your development environment? If you’re lucky, there’ll be some up-to-date instructions for getting it to kind of start up. Then you’ll get one of the other developers to show you how to run a few things. “Um, let’s do it on my machine,” they’ll say. See, you need to hack up the reference data in the database like this and take this sample XML file I’ve got sitting around and modify this, this and this field, stick it in this table with these dozen values, and then … it goes on. Good luck ever replicating that yourself.
If, however, you have some automated functional tests, you can say “run this test that covers the simplest scenario with everything going right”. They can then explore the database and see what kinds of information have been written, and even set breakpoints in a debugger to step through the application and get an idea of how everything fits together.
This is massively handy, not just for the new developer but for everyone around them. Instead of having to spend weeks or months babysitting each new developer you can just point them at a set of tests and tell them to play around for a while.
What’s more, a “new developer” is not just one who is new to the entire project. With any project of a reasonable size, you will naturally find that each developer has parts of the application that they don’t see, touch or understand. If you don’t want to maintain the same part of the application for all time then it’s in your best interests to make sure you can hand it over to another person with the least impact possible. Having functional tests is a great way of doing this.
The benefits described here give us a few clues to how our functional tests should be implemented.
First, they need to set up their own reference data so that a new developer can sit down and run them straight away without hacking their database. In fact, the first thing any test should do is to clear out all the reference data, and set up just the absolute minimal data required to run through this test successfully. This helps people understand the impact of setting different reference data values (this is often woefully under-documented!) and by having the data be independent, we avoid the problem where adding a new test requires jumping through hoops to avoid breaking every other test in the system. (Aside – I have a convenient method for setting up reference data in code that I may share at some future stage.)
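The clear-then-set-up pattern above can be sketched in a few lines. This is a minimal illustration only – the `ReferenceData` class and the key names are invented for this example, not taken from any real project or framework:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of in-code reference data setup. Every test wipes the slate clean
// and then declares only the minimal data it actually needs, so no test
// depends on leftover state from another test.
public class ReferenceData {
    private final Map<String, String> values = new HashMap<>();

    // Called first in every test: remove all existing reference data.
    public void clearAll() {
        values.clear();
    }

    // Fluent setter keeps each test's data requirements easy to read.
    public ReferenceData set(String key, String value) {
        values.put(key, value);
        return this;
    }

    public String get(String key) {
        return values.get(key);
    }
}
```

A test would then open with something like `ref.clearAll(); ref.set("country", "AU").set("currency", "AUD");` – its data dependencies are explicit, documented by the test itself, and independent of every other test.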
Second, if people are going to be running through the test in a debugger and stopping at breakpoints to explore the runtime state of the application, then we can’t get away with putting sleep(fixedTime) calls throughout the tests. You can’t say “the application should have finished processing by now” if someone has actually paused it for half an hour. Tests that sleep are slow and unreliable in any case.
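The usual alternative to a fixed sleep is to poll for the condition you actually care about, bounded by a generous timeout. Here is a minimal self-contained sketch (a library such as Awaitility offers a richer version of the same idea):

```java
import java.util.function.BooleanSupplier;

// Instead of sleep(fixedTime), poll a condition until it holds or a generous
// deadline passes. The test proceeds as soon as the condition is true, so it
// runs no slower than the application, and the timeout is a safety bound
// rather than a guess at total processing time.
public class WaitFor {
    public static boolean condition(BooleanSupplier check, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!check.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                return false; // timed out; the caller's assertion will fail loudly
            }
            try {
                Thread.sleep(50); // short poll interval between re-checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

A test then reads `assertTrue(WaitFor.condition(() -> order.isProcessed(), 30_000))` rather than sleeping and hoping.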
Experimentation and Precise Requirements
If a client asks you “how does the application behave in this weird scenario,” how long does it take you to give an answer? The requirements documents probably don’t have that level of detail, and in any case the question is how the application really behaves.
If you have functional tests already, then it’s probably straightforward to modify one that checks a related scenario. After you’ve written it, you can read over it again to check that you’re actually testing what you intend to. Checking it manually would be error-prone and involve a lot of set up time.
Odd special cases
I’ve had cases where we decided that one particular client would be able to submit input data that was incorrect in a couple of specific ways, and we’d accept it anyway. This was one of those good times where something causing major headaches at a high level in the business turns out to be an issue that I, as an individual developer, could make go away with some minor code changes. I think sometimes we take for granted how fortunate we are as developers to be able to make such a direct impact on problems instead of being wholly dependent on others.
Right, so I’ve put this worthwhile hack in. What’s to stop another developer coming along and changing the code again? There’s no obvious reason why it behaves in this way – it’s just down to history. I can put comments in the code, and get requirements to update their documents, but no-one reads documents anyway. What’s more, even though I only had to make some minor changes, that was a coincidence of how other parts of the code happen to be implemented at the moment.
What I did was to write a functional test that takes the exact sample file from the client and runs through the applicable scenario, checking that it is accepted. (I also wrote another test to check that the same file is rejected if it’s from any other client.) If a developer ever makes a change that breaks the test, the first thing they’ll do is open it up and find a big comment at the top that I wrote explaining all the history around why this is necessary and what it does.
This makes the requirement discoverable when it matters. Bonus points!
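The shape of that special case can be sketched like this – the client id, class name and validation details here are all invented for illustration:

```java
import java.util.Set;

// HISTORICAL SPECIAL CASE (sketch): one client sends files that fail strict
// validation in a couple of known ways, and the business agreed to accept
// them anyway. A functional test pins this behaviour down in both directions:
// the lenient client's bad file is accepted, everyone else's is rejected.
public class SubmissionValidator {
    // The one grandfathered client. In a real test, a big comment here would
    // explain the history behind this exception.
    private static final Set<String> LENIENT_CLIENTS = Set.of("CLIENT_A");

    public boolean accept(String clientId, boolean passesStrictValidation) {
        // Only the historically grandfathered client gets the lenient path.
        return passesStrictValidation || LENIENT_CLIENTS.contains(clientId);
    }
}
```

The real functional tests would feed the client’s actual sample file through the whole application rather than calling a method directly; the point is that one test asserts acceptance for the special client and a second asserts rejection for any other.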
Fixing a bug, really
A tester raises a defect. “When I do this, the application gives an error.” You check the code and find, oops! We’re doing something stupid. A quick fix, commit, mark the defect as fixed. Weeks later they re-test it, re-open the defect because the same series of steps still fails in the same place. It’s a different error message though! As a developer you can try to say that the original defect was “fixed”, it’s just the application is now getting further along and hitting a different issue. No-one cares though.
If you write a functional test to verify the scenario now works (even if this is just an ad-hoc test that you don’t commit) you’ll ensure that the tester will agree with you that it’s been fixed.
Downsides of automated functional tests
There’s really only one – they are SLOW. Whereas you can easily run a hundred unit tests in a second if you’re using a lot of mock objects, a single functional test may take 2 minutes to complete. This is a big problem. Whenever I’ve been talking about writing an ad-hoc test, and it being beneficial even if you don’t commit it, I’m alluding to the issue that it’s practical to write a functional test for every defect raised, but not so practical to run them all every time someone commits, or even once a day, as a way to guard against regressions.
Because I think that functional tests have enormous benefits, I also think it’s worth investing time getting them to run as fast as you can. This may include running tests in parallel on a cluster of servers (hey, seems I’ve found a way lots of companies could utilise “cloud computing”). It also tends to be better to add more assertions to existing tests than to construct an entirely new independent test. You can make tests that are “data-driven”, so that a single test may have a table of a dozen different options that it loops through, without having to start from the beginning each time.
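The data-driven shape mentioned above can be sketched as follows. The expensive setup happens once; each row of the table then reuses it. The validation rule here is a stand-in for whatever operation a real functional test would exercise:

```java
// Sketch of a data-driven test: one table of cases run in a loop, instead of
// a separate functional test (with separate slow setup) per case.
public class DataDrivenCheck {
    // One row of the table: an input and the result we expect for it.
    record Row(String input, boolean expectedValid) {}

    // Stand-in for the real operation under test.
    static boolean validate(String input) {
        return input != null && !input.isBlank();
    }

    // Returns the number of failing rows; a real test would assert it is zero.
    public static int run() {
        Row[] table = {
            new Row("order-123", true),
            new Row("", false),
            new Row("   ", false),
            new Row(null, false),
        };
        int failures = 0;
        for (Row row : table) {
            if (validate(row.input()) != row.expectedValid()) {
                failures++;
            }
        }
        return failures;
    }
}
```

JUnit’s parameterized tests give you the same pattern with per-row reporting, but even a hand-rolled loop like this amortises the startup cost across a dozen scenarios.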
The slowness means that you’ll only be able to have a small number of functional tests – under a hundred – and this means that there’ll be a lot of scenarios (particularly negative scenarios) that you’re still not testing.
On the other hand, an automated functional test still runs faster than a human tester. It’s not unusual to have teams of testers that take a week to run through a hundred scenarios. With every release they have to go through this entire process again. What’s the cost of this, in both money and time? More than the cost of a set of servers?
(The other option, a key feature of Rails and becoming popular in other recent web frameworks as well, is to support “integration tests” that work at a lower level than functional tests, but still well above unit tests. Such a test would, for example, request a URL and assert that the response is to be forwarded to a different URL, without actually going through a web server or web browser. The exact level at which you test will depend on the type of application, and how easy it is to simulate a user.)
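What such an integration-level check might look like, sketched without any particular framework – the controller, credentials and URLs here are all hypothetical:

```java
// Sketch of an "integration level" test: call the controller directly and
// assert on the redirect target, with no web server or browser involved.
public class LoginController {
    // Returns the URL the framework would redirect the browser to.
    public String handleLogin(String user, String password) {
        if ("admin".equals(user) && "s3cret".equals(password)) {
            return "/dashboard";
        }
        return "/login?error=1";
    }
}
```

A test asserts directly on the returned URL, which runs in microseconds, whereas the equivalent functional test would drive a browser through the login page.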
There’s an interesting post on the RunCodeRun Blog called It’s Okay to Break the Build. It argues that it’s unreasonable for developers to have to run an entire set of slow tests before every commit. I agree, and have found that simply running a couple of “relevant” functional tests before committing ensures that it is unlikely that you will break any other tests. Breakages can be quickly reverted after the test failure shows up on the main build server. (Perhaps there are some DVCS workflows that would be useful here?)
Continuous Deployment at IMVU: Doing the impossible fifty times a day is another fascinating article. It describes a project that has 4.4 hours of automated tests, including an hour of tests that involve running Internet Explorer instances and simulating a user clicking and typing. By running them across dozens of servers, they can run all the tests in 9 minutes. What’s more, they have so much faith in these tests that their code is automatically deployed to live when the tests pass. This happens 50 times a day on average. Brilliant!
Concluding by backtracking
Although I’ve dismissed the benefit of having lots of unit tests in the types of projects I’ve recently been working on, what if your environment or project isn’t the same as mine?
Consider the strength of interfaces and the separation of developers in your project.
Are you writing a library or framework that people may use in ways you don’t expect? Do you have a lot of “generic” code that has strong interface boundaries? What is the impact if a bug is found? If all your development is within one team, as soon as you discover a bug in one section of the code it’s easy enough to write a functional test and fix it. That doesn’t hold, though, if the developers who discover the bug and need the fix have little or no hope of writing it themselves.
In some cases it is useful to test code in isolation and mock out its dependencies – because you actually don’t know what these will be at runtime. In a lot of “business” code though, while people may claim some code is generic it isn’t really. Or if it is, it’s a case of premature indirection (not abstraction!) being the root of all evil.
Another dimension to consider is the rate of change of requirements documents.
If requirements are fuzzy and constantly changing, then what’s the point of testing all sorts of weird edge cases? The desired behaviour won’t be well-defined in any case! With changing requirements, functional tests are very useful to keep track of how the application is supposed to (and does) behave at the current point in time.
I would go so far in certain circumstances to say that good testers should be able to write automated functional tests themselves, based on higher-level requirements, using a domain-specific language (DSL) written by the developers. This provides a precise language with which business analysts, testers and developers can communicate. Imagine receiving a bug report with a test case that you can immediately reproduce instead of having to guess at! With enough skill, a large set of functional tests would act in concert with higher-level requirements documents to produce a system that has well-documented behaviour. Unfortunately the “certain circumstances” I’m talking about is “Tim’s dream world”. In the real world, you need to have at least the first big set of tests written by a decent developer who has good taste.
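A tester-facing DSL of that kind is usually a small fluent API. In this invented sketch each step merely records itself so the example stays self-contained; a real implementation would drive the application at each step and fail the test on any mismatch:

```java
// Invented sketch of a fluent scenario DSL a tester might write against.
// Each method returns the Scenario so steps chain in given/when/then order.
public class Scenario {
    private final StringBuilder steps = new StringBuilder();

    public Scenario givenClient(String clientId) {
        steps.append("given client ").append(clientId).append("; ");
        return this;
    }

    public Scenario whenFileSubmitted(String fileName) {
        steps.append("when file ").append(fileName).append(" submitted; ");
        return this;
    }

    public Scenario thenFileIsAccepted() {
        steps.append("then file is accepted");
        return this;
    }

    // Exposes the recorded script; a real DSL would instead execute it.
    public String script() {
        return steps.toString();
    }
}
```

A tester would then write `new Scenario().givenClient("CLIENT_A").whenFileSubmitted("sample.xml").thenFileIsAccepted()` – readable by business analysts, runnable by the build server.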
To conclude, delete most of your unit tests and replace them with just 10 functional tests for a start. Forget about code coverage metrics, and let me know how it goes!
(If you liked this post, you may also be interested in It’s not done yet, where I talk about some of the issues around getting the kinds of applications I typically work on into a live environment.)