Sunday, February 25, 2007

Is Decomposition Readable

Last week I claimed that you could make code more readable by decomposing methods into smaller, well-named methods, which are effectively self-documented. This week I am going to challenge that claim.

On the surface, the technique I showed last week looks good. Each individual method is short enough that you can grasp all of its semantics at once. Its easy to see if each method does what it is supposed to do. With such short well-defined methods bugs should be scarce and easy to find, right?

Unfortunately this isn't true. As I found out last week when I was testing the code, there was a bug in it. The bug was in the original version, so at least I hadn't introduced a new bug. However, when I asked myself honestly which version would be easier to debug, I wasn't sure. The problem with such decomposition is that your code is now scattered about. If you want to trace through code you have to now jump around many methods. While each individual method is short, the algorithm as a whole may now be too large to fit on your screen at once. This makes it harder to debug.

subtle details matterThe problem is that decomposition doesn't actually solve one of the problems of programming - subtle details matter. What happens when you choose the smallest or largest element as your pivot? What happens if your split point is the first or last element in the array? Boundary conditions always matter, but when you decompose a problem, now the subunits have to handle the boundary conditions in a consistent manner. So while decomposition can make some bugs easier to avoid, it can add an insidious subtle new bug of inconsistency.

Note that this problem is not limited to decomposing a single function like I did last week. This problem is endemic to component based software whether your components are methods, objects, or entire software products. Even if every component is bug free, the system will still contain bugs if the components have different assumptions. From experience I can say that some of these bugs can be a doozie.

Despite the objections I have just raised, I still think decomposition is a good thing. However, it is no silver bullet and you have to watch out for its pitfalls.

Monday, February 19, 2007

Writing Readable Code

Writing readable - and therefore maintainable and debuggable - code is one of the challenges of the professional programmer. There seems to be two schools of thought on how to do this. One is to insert inline, or even block, comments wherever code might be unclear or where it might be helpful to know what the developer was thinking. The other school says that the code is its own documentation.

While the second attitude seems obnoxious, the first attitude tends to not work in practice. Either comments aren't written ("I'll add them in when I'm done and know I won't change the code") or they restate the obvious (x = 4; // assign 4 to x ) and through their prevelance actually make the code harder to read. I'm not saying good comments can't be helpful, but it is hard to write good comments, and even good comments can become bad if the code changes and the comment doesn't change with it.

The solution is to strive to write code that is easily readable without comments. To demonstrate, I will show some code I randomly found online and I will show how it can be made more readable. Since that will make this post long enough, I will save my analysis on the tradeoffs of my approach for next week.

First, the "before" code:

void sort(int a[], int lo0, int hi0) {
int lo = lo0;
int hi = hi0;
if (lo >= hi) {
return;
}
int mid = a[(lo + hi) / 2];
while (lo < hi) {
while (lo < hi && a[lo] < mid) {
lo++;
}
while (lo < hi && a[hi] >= mid) {
hi--;
}
if (lo < hi) {
int T = a[lo];
a[lo] = a[hi];
a[hi] = T;
}
}
if (hi < lo) {
int T = hi;
hi = lo;
lo = T;
}
sort(a, lo0, lo);
sort(a, lo == lo0 ? lo+1 : lo, hi0);
}

As you can see, it is a sorting algorithm. In particular, it is an implementation of QuickSort. Is it correct? If it wasn't an algorithm you were familiar with it, how long would it take you to figure out what it does. If you found a bug in it, how confident are you that you could fix it without introducing new bugs? These are the problems that come up when coding clever algorithms. Obviously these issues could be partially mitigated with some inline comments explaining the purpose of the various blocks of code. Rather than show that, I will show an alternative, which is to make the comments be the code.

Basically you take each section of the algorithm that you would comment and you make it a method which is named descriptively:

void quickSortRange(int a[], int low, int high) {
if ( ! indicesInRange(low, high) ) {
return;
}
int pivot = getPivot(a, low, high);
int splitPointIndex = adjustElementsAroundPivot(a, low, high, pivot);
sortSubRanges(a, low, high, splitPointIndex);
}

When you look at this, it is easy to see what is happening. As long as the sub-methods do their job correctly, you can be confident this works. If it's too complicated to describe what a method does with just a name, the description can go in the method header which is much less disruptive than cluttering up the code in line. Modern IDE's will even display this comment in a context sensitive way for the maintenance developer.

So now to define the sub methods:

boolean indicesInRange(int low, int high) {
return low < high;
}

int getPivot(int[] a, int low, int high) {
int middleIndex = (low + high) / 2;
return a[middleIndex];
}

int adjustElementsAroundPivot(int[] a, int low, int high, int pivot) {
while (indicesInRange(low, high) ) {
low = findFirstBadLow(a, low, high, pivot);
high = findFirstBadHigh(a, low, high, pivot);
swapBadElements(a, low, high);
}
return getSplitPoint(low, high);
}

void sortSubRanges(int[] a, int low, int high, int splitPointIndex) {
quickSortRange(a, low, splitPointIndex);
int newLow = (low == splitPointIndex) ? splitPointIndex+1 : splitPointIndex;
quickSortRange(a, splitPointIndex+1, high);
}

As you can see, sub-methods should be implemented with the same idea. i.e. Rather than complicated code, they should call out to their own sub-methods.

int findFirstBadLow(int[] a, int low, int high, int pivot) {
while (indicesInRange(low, high) && isLowOk(a, low, pivot)) {
low++;
}
return low;
}

int findFirstBadHigh(int[] a, int low, int high, int pivot) {
while (indicesInRange(low, high) && isHighOk(a, high, pivot)) {
high--;
}
return high;
}

void swapBadElements(int[] a, int low, int high) {
if (indicesInRange(low, high)) {
int temp = a[low];
a[low] = a[high];
a[high] = temp;
}
}

int getSplitPoint(int low, int high) {
return Math.min(low, high);
}

boolean isLowOk(int[] a, int low, int pivot) {
return a[low] < pivot;
}

boolean isHighOk(int[] a, int high, int pivot) {
return a[high] >= pivot;
}

When code is written this way, inline comments become much less necessary. If you change how the code works, just make sure you rename methods appropriately.

This has gone on long enough for one post, but I think this example provides plenty of fodder to think about. At least it does for me. Next week, I will analyze this example in further detail including the bug that is in both versions of the code.

Sunday, February 18, 2007

President's Day Delay

Due to the 3 day weekend, there will be a one day delay on this weeks post.

Sunday, February 11, 2007

Hockey

This is my first post based on a request. Joshua wants a post about hockey to go with the picture on the front page. I will try to tie this into something technical in a bit, but first a bit about my lunch times.

There are a group of guys (there used to be a woman who played too, but she left to work elsewhere) at work who play street hockey just about every noon, whether its below freezing or in the 90s. I had never played any form of hockey before, but it looked like fun and I figured I could use the exercise, so a few months after I started at the lab I joined them. I was completely awful when I first started, but after some months I got good enough to where I wasn't a complete drag on whatever team I played with. I am now one of the regulars and find that playing is a lot of fun and good cardio exercise, though probably not the best thing if you are afraid of pain.

So how does this tie into anything technical, to go with the theme of my blog? Squint your eyes and bear with me.

This past September I was a finalist for the Google Code Jam. The Daily Press found out about it and wrote an article about me. They even sent a photographer out to get pictures of me. The picture you see on the front of my blog ran in the paper next to main part of the story. They also ran a head shot of me on the front page where the story started (yes, slow news day). It was a great boost to my ego and I think its a great action shot - that's why I use that picture on my page. But what I find interesting is what it says about the field of programming.

The article highlighted my hockey playing. Hockey is a sport that is held in such low esteem in the U.S. that when they cancelled the 2004-2005 season I'm not sure anyone noticed. Why is it that golf and bowling and even poker are regularly televised, but never programming? Why is it that whenever Hollywood portrays programmers it in no ways resembles what I or anyone I know does?

It's because programming isn't accessible to the average person. When you watch NFL games, you can imagine tossing a football and probably have even done it at some point. Watching poker tournaments you can compare how they play with how you think you would play. You can identify with these people - even if you don't play poker or football - so you enjoy it. If you've never programmed, its very hard to identify with that, hence why the article highlighted other interests of mine.

I'll admit it can be frustrating to work in a profession that people can't identify with. But I guess it is appropriate given that what we do is make the translation between what people understand - programs like iTunes - and what they don't - the low level logic and commands that make it work. Besides, I can always take my frustration out on the hockey court.

Monday, February 5, 2007

Don't Want That!

When I was making the transition from the procedural oriented programmer I was in college to the much more object oriented programmer I am now, I had many issues. I wish I could remember the specifics, but this one time at my first job I was trying hard to design the software I was working on in a good OO way. I came to a sticking point where I could figure out how I should do the one part of the design. I could see ways forward but they were decidedly non-OO. I went to Chuck, a coworker with years of OO experience, for advice and wisdom. Instead of showing me an OO way out of my corner, he redesigned the entire component on the white board. I left his office very frustrated.

delete the sentenceJust now, I am reading "On Writing Well" by William Zinsser. (yes, I am trying to improve my writing.) There is a paragraph where he tells you the trick for how to fix those problem sentences, the ones that no matter how you rearrange and reword it always sound awkward. His solution - delete the sentence.

Some years ago, when I was less experienced in Java than I am now, I was writing some code where I really wanted function pointers. I was complaining to Chris, who had the office next to me, about Java's lack of function pointers and he pointed out that if I restructured my code with interfaces I wouldn't need C's function pointers.

redefine the surroundingsTime and time again we come across problems which seem insoluble. When this happens, the stubborn individual persists at trying to find a solution while the more faint of heart gives up. Often times the best way forward is actually take a lesson from the faint of heart. Whatever it is you want - don't want that. The solution is to redefine the surroundings of the problem so the trouble area just goes away. That's what Chuck did, that's what William Zinsser says, that's what Chris did, and now that is what I try to do.