Monday, September 6, 2010

Depending on Dependency Management

Back in school, most development projects were standalone, depending on nothing more than the standard libraries that ship with the compiler (typically C or Pascal in those days).  And really, how much in the way of libraries do you need to calculate the effects of different collision resolution techniques on hash tables?  However, in the "real" world, you are writing something much larger, and you'd be foolish not to take advantage of the various libraries out there to help.  Heck, in most groups I've worked in, we've had our own internal libraries for common tasks in addition to external libraries.

One other important difference between school and the "real" world (at least for purposes of this discussion) is that in school, projects are small, both in terms of people and duration.  That means if you do need some library, you just make sure everyone on the team gets it.  You don't typically have the problem of taking a new guy onto a team and telling him to go get a project from 3 years ago and get it working on your current hardware/software platform.

So now, 3 paragraphs in, I've gotten to the plight of our mythical developer.  He checks the code out of CVS (ugh, are they still using CVS?), loads it up in Eclipse, and !?!?!   I (err...I mean he) can't even build it?  What's up with that?  "Oh, you need to go get jar files A and B and C.  They can be found here."  Ok, now it builds, let's deploy it.  "Oh, now you need to install jar file D in your JBoss directory..."  Ok, that wasn't too bad, but how long would it have taken me (I mean him) to figure out all of these dependencies if nobody who knew what was needed had been around?

And now, only 1 paragraph later, we've gotten to the main topic - Dependency Management.  These days everyone has dependencies (see paragraph 1), and here are some ways that you may handle them in the Java world:

Options
  1. Manage the dependencies in your head - go and get the libraries you need when you need them and put them in your CLASSPATH.  Or the slightly better variants, where these dependencies are documented on your team wiki (you do have a team wiki, right?) or fetched by a script that you run after an initial checkout.
  2. Create a lib directory in the project and check in all of the required dependent libraries in this directory.
  3. Have a common network share that can be mounted by all of your developers, and have your project CLASSPATH (as described in an Ant build file, a .classpath file, or something else checked into your version control) refer to these shared jars.
  4. Use Ivy - an Ant plugin for describing dependencies - and let it go grab all the jars that you need when you try to build.
  5. Use Maven - a project management tool that will both grab your dependencies and build your jars.
As you've probably guessed by this point, I am not a fan of the first approach.  It makes it hard for someone to get started on the project.  This startup cost can be the difference between someone helping out or not, if they have just a little bit of time to work on your project.
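
If you are stuck with option 1 anyway, even a tiny check beats keeping the list in someone's head.  Here is a minimal sketch, assuming the documented jar names live in a plain list.  The class name DependencyCheck and the jar names A.jar through D.jar are made up for illustration and are not from any real project.  All it does is report which documented jars are missing from a directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: compares a documented list of jars with what is on disk.
class DependencyCheck {

    // Returns the required jar names that are not in 'present', in documented order.
    static List<String> missingJars(Collection<String> present, List<String> required) {
        Set<String> have = new HashSet<String>(present);
        List<String> missing = new ArrayList<String>();
        for (String jar : required) {
            if (!have.contains(jar)) {
                missing.add(jar);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Directory to inspect; defaults to "lib" (an assumption for this sketch).
        String libDir = args.length > 0 ? args[0] : "lib";
        String[] found = new File(libDir).list(); // null if libDir is not a directory
        List<String> present = (found == null) ? new ArrayList<String>() : Arrays.asList(found);

        // Placeholder names; a real project would list its actual jars here.
        List<String> required = Arrays.asList("A.jar", "B.jar", "C.jar", "D.jar");

        List<String> missing = missingJars(present, required);
        if (missing.isEmpty()) {
            System.out.println("All documented jars found in " + libDir);
        } else {
            System.out.println("Missing from " + libDir + ": " + missing);
        }
    }
}
```

Running it against a fresh checkout tells the new guy what to go fetch before he ever opens Eclipse, though it still doesn't fetch anything for him.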

Option 2, putting all your jars in a lib directory and checking it in, is definitely easy.  The usual concern is that this will make the repository too large.  Let's investigate this concern.  Our largest project (in terms of dependencies) seems to have about 80 megs of dependent jars, so I am going to use that number.  Let's say that we upgrade these jar files once a year and that the version control system thinks the new jars are 100% different from the old ones, so the space is duplicated.  So in 5 years, 400MB of your version control system is taken up with these jars.  And if you are using a DVCS like Mercurial, this means that each developer will have all of this, and will have to copy it across the network when they do their first checkout.  And while that wouldn't have fit on the hard drive of my college computer ("Back in my day..."), nowadays with terabyte drives and gigabit Ethernet that's not really that much, and 5 years from now (which is when it will be that big), it's going to be even less.  Of course, your numbers may be different - so it may pay to do this calculation yourself.
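
If you want to redo this calculation with your own numbers, it fits in a few lines.  This is only a back-of-the-envelope sketch under the same worst-case assumptions as above (one full, uncompressed copy of the jars stored per year).  The class and method names are made up, and the transfer time is an ideal figure that ignores protocol overhead and disk speed.

```java
// Hypothetical back-of-the-envelope calculator for checked-in jar growth.
class RepoGrowth {

    // Worst case: every stored copy of the jar set takes its full size.
    static long worstCaseMegabytes(long jarSetMegabytes, int copiesStored) {
        return jarSetMegabytes * copiesStored;
    }

    // Ideal time to copy the data over a link: 1 megabyte is 8 megabits.
    static double idealTransferSeconds(long megabytes, double linkMegabitsPerSecond) {
        return megabytes * 8.0 / linkMegabitsPerSecond;
    }

    public static void main(String[] args) {
        long jarSetMb = 80;     // size of the dependent jars, from the estimate above
        int years = 5;          // one full copy per year
        double gigabit = 1000;  // gigabit Ethernet, in megabits per second

        long total = worstCaseMegabytes(jarSetMb, years);
        System.out.println(total + " MB of jars after " + years + " years");
        // prints "400 MB of jars after 5 years"

        double seconds = idealTransferSeconds(total, gigabit);
        System.out.println("About " + seconds + " s to copy over gigabit Ethernet (ideal)");
        // prints "About 3.2 s to copy over gigabit Ethernet (ideal)"
    }
}
```

Plug in your own jar size, upgrade frequency, and link speed.  If the answer is still a few seconds or minutes for a first checkout, the "bloat" argument probably doesn't apply to you.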

Option 3 is about as easy as option 2, but without requiring the "bloating" of your version control system.  However, it kind of feels wrong to me, and here are a couple of reasons.
  • Remote working - it's hard to work from home or disconnected if you need access to the shared drive just to compile.
  • Will your team really be as disciplined with these files as with files in version control?  i.e. Are you sure a file won't just get deleted / replaced / modified?  Can you revert to an old version if it does?
I am going to talk about Options 4 and 5 together.  I have used Maven in the past and recently looked into Ivy, which led to this blog post.  They both seem like cool ideas, and if you are in an environment where you may not be able to include the dependent libraries with your code (e.g. open source projects), it may be a really good idea to use one of them.  However, for in-house development, I am not sure they are worth the effort.  Using either of them means one or more configuration files that have to be associated with each project.  It also means one more tool to integrate with all your other tools, and one more tool that everyone on the team has to know and understand.  At least one person has to take the time to understand it well enough to be able to set up or convert your projects to use it.  And it adds one more external dependency, which can be a particular pain going forward - just ask people who had Maven 1 projects and then had to go through the work of upgrading them to Maven 2.  It may not seem like a lot of work, but it's more than just a 10-minute download and install, and any work is typically too much if it isn't helping you write your software.

Conclusion

There are plenty of options out there, but unless you have a good reason to do something different, just check your libraries into a lib directory and then forget about it and get back to work.
