Monday, February 14, 2011

Unit Testing - the Ugly

So you've weighed the benefits and drawbacks to unit testing and decided to do some unit testing. Now you will learn a rarely talked about fact, unit testing is ugly.

A unit test is composed of 3 steps:
  1. Set up state
  2. Execute component
  3. Validate state
For any type of complex components, the code for setting up the state and validating the state can be quite large and tedious. If you look at a set of tests for a single component you'll also find that they tend to be very redundant.

To a certain extent, there is nothing you can do. Comprehensive tests, by necessity, will contain a lot of data and a lot of validation. If you are a believer in the DRY principle, the redundancy will drive you batty. Through standard refactoring ideas you can pull out the common setup and validation code into their own methods and helper classes. However, there are two catches.

Almost the Same
Each of your unit tests are presumably testing different aspects of your component. As such the state will have to be different for each test. And the validation steps will be different for each test. Trying to parameterize your extracted setup and validation methods to support your scenarios can be as challenging a design problem as any portion of your application proper.

Self Contained
One goal of unit testing is that each test is self descriptive. That way when the testFrombulatorWithNull fails you can quickly narrow down the problem. Unfortunately, if you have refactored your unit tests according to the DRY principle, the setup/validation information won't be solely in the specific test method. You'll have to understand how the helper methods and classes interact so you can understand the test so you can understand what went wrong.

Example
Imagine we modify the PhoneFormatter example from before to now take a formatting string in its constructor. Now, we have a lot more cases to test, which might look something like this:
  1 import static org.junit.Assert.*;
  2 import org.junit.Test; 
  3 import org.junit.Before; 
  4  
  5 public class PhoneFormatterTest {
  6  
  7   private PhoneFormatter mFormatter; 
  8  
  9   @Test public void testWithSpace() {
 10     mFormatter = new PhoneFormatter("(###) ###-####"); 
 11     check("(757) 555-1234", "7575551234"); 
 12     check("(757) 555-1234", "17575551234"); 
 13     check("555-1234", "5551234"); 
 14     check("911", "911"); 
 15   } 
 16  
 17   @Test public void testNoSpace() {
 18     mFormatter = new PhoneFormatter("(###)###-####"); 
 19     check("(757)555-1234", "7575551234"); 
 20     check("(757)555-1234", "17575551234"); 
 21     check("555-1234", "5551234"); 
 22     check("911", "911"); 
 23   } 
 24  
 25   @Test public void testNoParen() {
 26     mFormatter = new PhoneFormatter("###-###-####"); 
 27     check("757-555-1234", "7575551234"); 
 28     check("757-555-1234", "17575551234"); 
 29     check("555-1234", "5551234"); 
 30     check("911", "911"); 
 31   } 
 32  
 33   @Test public void testDots() {
 34     mFormatter = new PhoneFormatter("###.###.####"); 
 35     check("757.555.1234", "7575551234"); 
 36     check("757.555.1234", "17575551234"); 
 37     check("555.1234", "5551234"); 
 38     check("911", "911"); 
 39   } 
 40  
 41   /* Test more formats here */ 
 42  
 43   private void check(String expected, String number) {
 44     checkImpl(expected, number); 
 45     checkImpl(expected, "9" + number); // check leading 9 
 46   } 
 47  
 48   private void checkImpl(String expected, String number) {
 49     PhoneNumber pn = new PhoneNumber(number);
 50     assertEquals(expected, mFormatter.format(pn)); 
 51   } 
 52 }
As you can see the unit test is now a lot longer. And this is for testing a very simple class that takes a very simple input and has a very simple output. Imagine how much uglier this would be if PhoneFormatter took more complex objects as parameters to its methods and returned objects?

Even with such a simple class under test there is a ton of duplication, each unit test method is almost identical to each other. This has the usual problems of cut-and-paste. For example, if you want to add international numbers to your test, you need to make sure you add this to each and every method, and it could be easy to miss one.

There's also the possibility for confusion. Notice the method check (lines 43-46) checks numbers both with and without a leading 9. A future developer, if they don't carefully follow the code, may not expect this, making it unclear why a particular test fails. In this case, it is probably worth this extra indirection to cut the size of the test class almost in half, but you are bound to come up with examples that aren't so clear.

Conclusion
Obviously, the goal to find the happy medium where your test code is structured to minimize redundancy while still maximizing readability, just like writing any other code. However, the goals of unit testing (test component, find bugs, show example of how component is used) are not the same as writing other code. This means that you may make different decisions as part of your design trade-offs than you would elsewhere in code. When combined with the fact that comprehensive unit tests can't be concise (otherwise they wouldn't be comprehensive), it is likely that at least some of your unit tests will be ugly.

2 comments:

Unknown said...

I have been thinking about your post as I have recently been working on my first project that employs TDD and unit testing. I'm by no means an expert, but I think in this case the ugliness in the test code serves as an indicator that the tests could be structured better.

The core problem is that the tests are centered around the strings used to construct a formatter. However, the ugliness could be reduced, or possibly eliminated, by restructuring the tests around the behavior of the formatter instead. In this case, a separate test would be created for each input type the formatter supports, such as seven-digit numbers, seven-digit numbers with a prefix, 911, etc. The tests would be named appropriately to indicate the feature they are exercising (formatsSevenDigits, formatsSevenDigitsWithAPrefix, formats911, etc.).

Each test could then contain a table of strings used for the constructor and the expected output for that type of formatter. The code that actually constructs a formatter, formats an input string, and compares it to the expected output string could be put into its own support method. This insulates the tests to changes in how the object is constructed or how an input string is formatted, addressing some of the DRY issues.

If a developer has to add a new constructor string, the tests serve as a list of input types he must support. He can add the new format string and expected output string to each test as he adds support for that type of input.

If a developer needs to add a new feature, such as the international number support you suggested, he adds a new test for it. As he adds each constructor string to the new test, he can add the required code to implement that support.

The test methods also serve as a list of the object's features when organized this way. This can aid in the maintenance of the application, even when the class itself is not being modified.

I'm not arguing that unit tests can't be ugly. However, I think that in some cases ugly tests suggest that something is wrong with the code being tested or with the tests themselves.

Michael Haddox-Schatz said...

You may be right. I agree that ugly code is an indicator that the code could be structured better. And I knew that by putting up "ugly" code, I was opening myself up to the criticism, "Hey it wouldn't be ugly if you did this..."

So maybe my example wasn't good, but my experience still tells me that unit testing is often ugly. I suppose that might just mean that I am bad at writing unit tests. :)

However, even if tests are written cleanly, I am coming to the conclusion that automated tests are, by necessity, verbose. It's often harder to describe correct behavior than to just implement the correct behavior. Hence, automated tests that are at all robust will tend to be longer than the the code they are testing. I guess I find that length "ugly". I suppose, maybe I should just get over that.