I've fiddled with my blog template because I decided I wanted more horizontal viewing space, given that it was using less than a third of my 1920 horizontal pixels. If it feels too spread out for you, I added a drag-and-drop handle over to the left to let you resize the main content column. The javascript is pretty primitive. If it breaks, drop me a comment.

Monday, July 13, 2009

The Journey to Git, Part I—Distributed vs Central VCS

This one has been a long time coming. I've been using Git as a client to Subversion at work now for a number of months. Over the weekend, I attended a session on Git at the NFJS conference, and it served to solidify some of the concepts in my mind. I think I'm ready to do one or more posts on the subject now.
This first post is going to be more an examination of distributed vs. centralized VCS than anything specific to Git itself. The remainder of the posts in this series of currently unknown length will be a quick start tutorial to get you up and running, teaching you how to use Git at a very basic level with just a single, local Git repository. When I was trying to get going with Git, I couldn't find a good reference online to get me up and running in the way I needed, so I'm going to try to make this as complete as possible. I assume zero Git knowledge here, but I do assume you're familiar with version control in general.


Articles in this series:

Why Git?

First, I'm going to examine why we even need to care about a new VCS. What does Git have that SVN doesn't? That really comes down to the fact that they're two different beasts. It's not so much Git vs. SVN, but distributed VCS (DVCS) vs centralized VCS.

Centralized VCS

With centralized version control, a development team is locked in in terms of workflow. You have a local copy of the project (or more than one). You can make changes to your heart's content, but everything is strictly local until and unless you commit back to the repository. Then whatever you did is visible to everyone. There's no in-between here, no compromise. I actually find it quite remarkable that developers the world over have been content with this model for so long. There are a few very immediate problems that come up.
Problems with Central Control
#1: Collaboration
Scenario: You're working on adding a new feature to an application and you want another developer to give you a hand with part of it or look over some code you think is questionable, or maybe he's just working on something else that interacts with your piece. Your code needs to be available for the other guy to work with. With a centralized VCS, your only options are to 1) commit your code, making it visible to everyone else, or 2) share it some other way outside of the VCS. Option 2) is just messy, if workable. Option 1) would probably be preferred for the most part. The problem here is that your code could be in very early development, possibly even broken. We all know there's a horrible taboo on committing broken code, and rightly so with central control. So you're stuck with not being able to share or with having to stop work to make things non-broken to the point that you can safely commit before you can collaborate on a piece of code.
#2: Releases
This problem is significantly more insidious. Any time you commit something to a centralized VCS, you're essentially affirming that this feature will absolutely be included in the next release. This assumes, of course, that you're not doing work on your own branch specific either to your feature or yourself--a whole other worm can with centralized version control. As your release process becomes more streamlined and automated (that's happening, right?) and your release cycles get shorter because you're aiming to be a more agile--that's agile rather than necessarily Agile--team (that's happening, too, right?), it becomes more important to not schedule features for a release, but to schedule a release for a date and put in whatever features are finished on that date. This means it's imperative that unfinished features never be committed to your mainline development branch, like trunk, since you can't know exactly when a feature will be ready for release. Instead, features should be developed in isolation and only moved into the "trunk" branch when completely finished. This leads back to the branch-per-feature idea, which central VCS in general and SVN in specific just aren't designed for.
#3: History
This one's simple and may be less compelling to some. It's a big hangup for me. A VCS is all about keeping track of history, right? Well, who ever said history was perfect? Only a centralized VCS tries to enforce that by demanding that every commit be 100% bug-free since everyone is going to see it. This leads to larger, less frequent commits, potentially as much as days apart. Days! Why have we come to accept this as the norm? I say version control should be no more than an hour behind you at worst. Small, incremental changes are better. VCS should be a coarse-grained undo system.

Distributed VCS

How does a DVCS answer these concerns? Laying out the problems took quite a bit of space, but interestingly, they're all solved by this fundamental difference: in a DVCS, you not only have your own working copy, you in fact have your own copy of the entire repository, or at least the pieces you care about. You can change anything about it in any way, and nothing leaks out to anyone else unless 1) you push it out or 2) someone else pulls it from you intentionally. This solves all three of my problems since nobody, including a release, will see what you've done until you're ready for them to, but at the same time, your changes are available to anyone who's interested.
Now that you know why a DVCS is a good idea, let's get back to Git itself. There seem to be three mainstream DVCSs out there today: Git, Mercurial, and Bazaar. Of the three, Git seems to be most widely used and gaining traction. That's not necessarily a reason to use it, but Git is what I've used, so it's what I'm writing about.
See the next post to begin a walkthrough tutorial of Git: Part II--Git Started.
Vote for this article on DZone.


Donny said...

Thanks a lot for your post on Git :D
It is very helpful!!

Minearo said...

How is a DVCS better than a centralized VCS? The explanation is extremely small and very fast. Can anyone give a better description of how a DVCS is better?


Ryan said...

@Donny: thx for the feedback. I'm glad you found it helpful.

@Minearo: There seems to be some demand for a more thorough exploration of these issues, so I might do that in the future. I should have made it clear that I was only trying to give a brief view of some of the problems I'm seeing right now and not an exhaustive comparison of centralized vs. distributed version control.