This one has been a long time coming. I've been using Git as a client to Subversion at work now for a number of months. Over the weekend, I attended a session on Git at the NFJS conference, and it served to solidify some of the concepts in my mind. I think I'm ready to do one or more posts on the subject now.
This first post is going to be more an examination of distributed vs. centralized VCS than anything specific to Git itself. The remainder of the posts in this series of currently unknown length will be a quick start tutorial to get you up and running, teaching you how to use Git at a very basic level with just a single, local Git repository. When I was trying to get going with Git, I couldn't find a good reference online to get me up and running in the way I needed, so I'm going to try to make this as complete as possible. I assume zero Git knowledge here, but I do assume you're familiar with version control in general.
Articles in this series:
- Part I--Distributed vs. Central VCS
- Part II--Git Started
- Part III--The Basics
- Part IV--Branching
- Part V--Merging
- Part VI--Rewriting History
- Part VII--Other Useful Stuff
- Part VIII--Connecting Git to Subversion
- Part IX--Communicating from Git to Subversion
- Part X--Communicating Between Repositories
First, I'm going to examine why we even need to care about a new VCS. What does Git have that SVN doesn't? That really comes down to the fact that they're two different beasts. It's not so much Git vs. SVN, but distributed VCS (DVCS) vs centralized VCS.
With centralized version control, a development team is locked in in terms of workflow. You have a local copy of the project (or more than one). You can make changes to your heart's content, but everything is strictly local until and unless you commit back to the repository. Then whatever you did is visible to everyone. There's no in-between here, no compromise. I actually find it quite remarkable that developers the world over have been content with this model for so long. There are a few very immediate problems that come up.
Problems with Central Control
Scenario: You're working on adding a new feature to an application and you want another developer to give you a hand with part of it or look over some code you think is questionable, or maybe he's just working on something else that interacts with your piece. Your code needs to be available for the other guy to work with. With a centralized VCS, your only options are to 1) commit your code, making it visible to everyone else, or 2) share it some other way outside of the VCS. Option 2) is just messy, if workable. Option 1) would probably be preferred for the most part. The problem here is that your code could be in very early development, possibly even broken. We all know there's a horrible taboo on committing broken code, and rightly so with central control. So you're stuck with not being able to share or with having to stop work to make things non-broken to the point that you can safely commit before you can collaborate on a piece of code.
This problem is significantly more insidious. Any time you commit something to a centralized VCS, you're essentially affirming that this feature will absolutely be included in the next release. This assumes, of course, that you're not doing work on your own branch specific either to your feature or yourself--a whole other worm can with centralized version control. As your release process becomes more streamlined and automated (that's happening, right?) and your release cycles get shorter because you're aiming to be a more agile--that's agile rather than necessarily Agile--team (that's happening, too, right?), it becomes more important to not schedule features for a release, but to schedule a release for a date and put in whatever features are finished on that date. This means it's imperative that unfinished features never be committed to your mainline development branch, like trunk, since you can't know exactly when a feature will be ready for release. Instead, features should be developed in isolation and only moved into the "trunk" branch when completely finished. This leads back to the branch-per-feature idea, which central VCS in general and SVN in specific just aren't designed for.
This one's simple and may be less compelling to some. It's a big hangup for me. A VCS is all about keeping track of history, right? Well, who ever said history was perfect? Only a centralized VCS tries to enforce that by demanding that every commit be 100% bug-free since everyone is going to see it. This leads to larger, less frequent commits, potentially as much as days apart. Days! Why have we come to accept this as the norm? I say version control should be no more than an hour behind you at worst. Small, incremental changes are better. VCS should be a coarse-grained undo system.
How does a DVCS answer these concerns? Laying out the problems took quite a bit of space, but interestingly, they're all solved by this fundamental difference: in a DVCS, you not only have your own working copy, you in fact have your own copy of the entire repository, or at least the pieces you care about. You can change anything about it in any way, and nothing leaks out to anyone else unless 1) you push it out or 2) someone else pulls it from you intentionally. This solves all three of my problems since nobody, including a release, will see what you've done until you're ready for them to, but at the same time, your changes are available to anyone who's interested.
Now that you know why a DVCS is a good idea, let's get back to Git itself. There seem to be three mainstream DVCSs out there today: Git, Mercurial, and Bazaar. Of the three, Git seems to be most widely used and gaining traction. That's not necessarily a reason to use it, but Git is what I've used, so it's what I'm writing about.
See the next post to begin a walkthrough tutorial of Git: Part II--Git Started.
Vote for this article on DZone.