Monday, June 27, 2011

No T-Shirt For Me

This past Saturday my brain turned off and I failed to get even a single problem right in the 2nd round of the TCO 2011. This was the round you had to get past to get a t-shirt.

So what happened? One thing I like to do with problems is work out a couple of simple inputs by hand to make sure I understand the problem. As one of my former co-workers liked to say, if you don't know how to do something, you don't know how to do it with a computer. On the first problem from Saturday's contest, the input was a positive integer, and we had to return an integer based on the problem description. I worked out the answers for the inputs 1 through 6. Unfortunately, I had come up with the wrong answer for an input of 6, and since 6 wasn't one of the sample inputs, I didn't know I was wrong.
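In code form, this habit amounts to checking a candidate solution against the hand-worked answers before trusting either one. Here's a minimal sketch in Python; solve() and the expected values are invented stand-ins, since the actual contest problem isn't described here:

    # Hand-worked answers for small inputs. If one of these is wrong
    # (as mine was for 6), everything built on top of it inherits the error.
    expected = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16, 6: 32}  # hypothetical values

    def solve(n):
        # Stand-in for the actual contest solution.
        return 2 ** (n - 1)

    for n, want in expected.items():
        got = solve(n)
        assert got == want, "n=%d: expected %d, got %d" % (n, want, got)
    print("all hand-worked cases pass")

Of course, the check is only as good as the hand-worked answers, which is exactly where this one fell apart.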

Not surprisingly, trying to build a general solution on top of a wrong hand-worked answer didn't work so well. I eventually gave up, but by then there wasn't enough time left to code up a correct solution to the next problem.

This is a problem that I've had before, and I've seen it in others. Once you make a mistake, it can be really hard to see that mistake. Your mind tends to stay in that rut, and you repeat the mistake when you redo the problem. In the real world (i.e. not the artificially constrained world of programming contests), there are a couple of ways to break out of this kind of rut. The first is to talk through the problem with an external party. If you can, though, don't explain what you did; instead, let them solve it themselves. That way you don't lead their thought process into the same rut as your own. The second approach is to walk away from the problem. Often, just setting a problem aside is enough to let you see the solution when you pick it up the next day.

Unfortunately, neither of those was an option in the middle of a contest. So, instead, I get to kick myself for a year. Oh well. Maybe next year will go better.

Monday, June 20, 2011

Testing in Production

A co-worker of mine has a picture about testing in production posted on her door. Last week we put that into practice. We deployed a new internal app that depended on two other components, both of which were also new. On Thursday at 4:30 PM an email went out telling people they could use it. By 4:45 our server had crashed. Thursday was a late night, but we eventually got the system to where it could run without crashing the server. On Friday morning (when it really got used), we had to scramble to fix some bugs that were less obvious (at least to the end user) but just as critical.

So what happened?
The short answer is, we didn't do enough testing. Every bug we've found (so far) arguably should have been caught in testing. All in all, it was a pretty embarrassing experience, especially since part of it was our first internal Rails app, which I had pushed hard for us to adopt. However, in hindsight, if I had it to do all over again, I think I'd do the same thing. Why?

Obviously, it's not good to have such a big failure. Plus, I never like to *have* to work late. However, nothing happens in a vacuum; every decision has to be compared to the alternative. We probably could have found most or all of the bugs with another week of testing. So which was better for the Lab? That we test for another week, with the opportunity cost that two developers can't be working on other projects? Or that we deploy a poorly tested app, give a few users a poor experience, and have a couple of people work one long day on a Thursday?
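A rough back-of-envelope, using only the numbers above: another week of testing by two developers is on the order of 80 person-hours, while the release-early path actually cost a couple of people one long Thursday night plus a Friday-morning scramble, call it 20-some person-hours, along with a bad first impression for the handful of users who hit the crash.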

The fact of the matter is, 15 minutes of real user testing found bugs that would have taken at least a week of developer testing to uncover. In terms of person-hours, it was much more efficient to get some real use than to continue testing. As Jeff Atwood says, release your buggy software. So what was the downside? In this particular situation it was an internal app, so there were no customers to lose or investors to turn away. The only negative is that a few people probably think less of me and my division. On the plus side, getting the app in front of people helped define requirements better than any amount of up-front analysis. Also, the majority of this code will be reused in future apps, so this shakeout should make for higher-quality releases down the road, letting us regain some of that lost reputation.

So what's the take-home message? Well, I am still embarrassed by how many problems we had with this release. However, given that there were no serious negative consequences, it was probably in everyone's best interest for us to release (potentially) buggy software rather than delay it for more exhaustive testing. Releasing finds missing requirements and shows how the code is really used. No amount of testing can do that.

In general, I try to have this blog be a voice for higher-quality software and against subjecting users to a bad experience. At times, though, it might be better to give a few users a temporary bad experience so that the majority can have a good experience sooner.

Tuesday, June 14, 2011

Too Slow

I have been remiss in getting my posts out. A few weeks ago I was at ICSE. I had intended to write a post based on that conference, but as soon as it was over, I went into vacation mode. While on vacation I found plenty of things to do that kept me away from computers. About the only computer-related thing I did was compete in the 2nd round of the Google Code Jam.

This was more challenging than usual for me because Google tries to schedule their contests at a somewhat reasonable hour for most of the world. They've decided that there is this great big ocean without a lot of people living in it, so they put the contest in the middle of the night for the Pacific. Of course, I happened to be in Hawai'i at the time, and so got up at o'dark-thirty for the 4 AM contest start. Surprisingly, I not only did well enough to get a t-shirt (top 1000), but well enough to advance to the next round (top 500).

So that's why I missed the last two posts. What about this one? Well, after two weeks away, we had a lot of catching up to do at home and at work. What little free computer time I could find went to practicing for the 3rd round of the Google Code Jam, where they would select the top 25. Unfortunately, that was for naught, as I did not advance.

Too Slow
24 of the 25 advancers solved the first 3 problems, as well as the small input set on the 4th. I also knew how to solve the first 3 problems and the small input set on the 4th. So what was the difference between the advancers and me? On this problem set, speed. As I was submitting my solution to the second problem, 25th place was submitting his final solution.

The top competitors were probably a little faster at coming up with algorithms and at coding them. What killed me, though, was debugging. On the first problem, I had numerous bugs that I had to spend time tracking down; careless mistakes on that problem single-handedly knocked me out of the running to advance. On the third problem, bugs were again my downfall. I didn't take my own advice and spend a couple of minutes thinking about how best to structure my data. I just used the first thing that popped into my head, which turned out to be very error-prone. After the contest, with a couple of minutes to think without the pressure, I came up with a much simpler way to code the same algorithm.
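To make the data-structure point concrete, here's a hedged sketch in Python; the actual problem isn't described here, so the counting task and all the names below are invented for illustration. The first version keeps per-item state in parallel lists that must be updated in lockstep; the second keeps it in a single dictionary:

    # Error-prone: parallel lists that have to stay in sync by hand.
    names = []
    counts = []

    def add_parallel(name):
        if name in names:
            counts[names.index(name)] += 1  # update one list...
        else:
            names.append(name)
            counts.append(1)                # ...and remember to grow the other

    # Simpler: one dictionary, nothing to keep in sync.
    from collections import defaultdict
    counts_by_name = defaultdict(int)

    def add_simple(name):
        counts_by_name[name] += 1

The two versions compute the same thing, but under time pressure the first one offers far more places to slip.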

So what's the moral of this story? Don't make mistakes! :) And trying to do things in a rush just makes it more likely that you will. Oh, and going on vacation during a contest is probably not conducive to preparing.

As for why this post is a day late... well, I am just a little bit slow.