
Sunday, September 6, 2009

The Journey to Git, Part X—Communicating Between Repositories

So you want to do some collaboration using Git. If you don't know where to start, you're in the right place. This post, like my earlier Git posts, will take you on a guided tour of how to collaborate with others (or yourself) using Git remoting. It will be light on theory and on the practical application of principles, focusing instead on the "how" so you can start using it as quickly as possible.

In this post, I assume you're comfortable working with a single Git repository with the basic commands like "git add", "git commit", "git branch", "git merge", and so on. If you're not at that point yet, hop back to my earlier posts in this series for a quick walkthrough.

Making a Clone

We need an existing repository to start from, so create a directory named "cloneme", change to it, and set up a repository like so:

git init
echo "foo" > foo
git add foo
git commit -m "first commit"

Simple enough: a repository with one commit and one file being tracked. Now move to the parent directory of cloneme, and run:

git clone file:///path-to-cloneme clone

Note: The "/path-to-cloneme" part should be the absolute path to the cloneme directory. It's best to go absolute here: don't use a relative path unless you understand the implications of having a relative path stored in your .git/config file.

You've just performed your first remote Git operation by cloning an existing repository. As you might expect, you now have a complete copy of the "cloneme" project in the "clone" project. Note, however, that it's not just a copy of the working tree. It's a complete clone of the original repository. Git is, after all, a distributed VCS.
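
If you'd rather check that claim from the command line, here's a minimal sketch. It rebuilds both repositories in a scratch directory and compares their histories. The temporary directory, the "demo" identity, and the symbolic-ref line (which pins the branch name to "master" on Git installs that default to something else) are my assumptions for the sketch, not part of the walkthrough.

```shell
set -eu
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

# The original repository with a single commit.
git init -q cloneme
git -C cloneme symbolic-ref HEAD refs/heads/master   # pin the branch name
echo foo > cloneme/foo
git -C cloneme add foo
git -C cloneme commit -q -m "first commit"

# Clone it over the file:// transport using an absolute path.
git clone -q "file://$work/cloneme" clone

# Compare the full history of both repositories, hashes included.
original_log=$(git -C cloneme log --format='%H %s')
clone_log=$(git -C clone log --format='%H %s')
origin_url=$(git -C clone config remote.origin.url)
echo "$clone_log"
echo "origin is $origin_url"
```

Both logs come out the same, and the clone's remote.origin.url points back at cloneme.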

This first clone never left the local filesystem, since we used the "file://" transport. Git, of course, also supports remote operations over networks with other transports: ssh, rsync, http, https, and a native "git" transport. Each has its own, very similar, URL syntax for specifying how to find a remote repository. I use the ssh transport almost exclusively. It's secure and just as easy to use as the file transport.

At this point, you have two repositories with identical content. Running "git log" in both of them, for instance, would produce identical output. Start up gitk now, and you'll see the familiar "master" designator pointing at the head of the branch, but next to it is another label that reads "remotes/origin/master". The initial "remotes" is a kind of namespace that's set aside for branches that live in remote repositories. The next piece, "origin", is the name of the remote repository, and the final one is the name of the branch in that remote repository. When you clone a repository, the repository you cloned from automatically becomes the "origin" of the clone, making for convenient interaction with it, as we'll see in a moment.

What this gitk output is telling you is that the head of the remote repository's master branch is at the same commit as your local master branch... as far as this repo knows. Changes in a remote repository are not automatically detected by gitk, so something in the remote could've changed, but gitk won't reflect it until you "git fetch" it. Let's take a look.

Getting New Changes from the Origin Repo

Go back to cloneme, and make a new commit:

echo bar >> foo
git commit -am "second commit"

Now go back to clone. Both "git log" and gitk will show exactly the same thing as before. As I mentioned, neither command talks to the remote, so they have no way of knowing about the change. To see the new commit, you need to fetch it:

git fetch

When run with no arguments, this command will retrieve all of the latest changes from the remote repository named "origin". That's some of the convenience that I mentioned earlier. Run gitk again, but this time with "gitk --all", or you'll only see a partial picture. Now you can clearly see that the remote named "origin", which is cloneme, is one commit ahead of clone.

Note: When I say "all of the latest changes", I do mean "all". In this exercise, we're confining our work to a single branch, but "git fetch" retrieves the latest changes from all of the branches of the specified remote, as well as any new branches that have been created.
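
If you don't have gitk handy, the same picture is available from the command line. The sketch below replays the steps so far in a scratch directory and counts how many commits "origin/master" has that the local "master" lacks. The scratch directory, the "demo" identity, and the use of "git rev-list --count" are my choices for illustration.

```shell
set -eu
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

git init -q cloneme
git -C cloneme symbolic-ref HEAD refs/heads/master   # pin the branch name
echo foo > cloneme/foo
git -C cloneme add foo
git -C cloneme commit -q -m "first commit"
git clone -q "file://$work/cloneme" clone

# A new commit appears in the original after the clone was made.
echo bar >> cloneme/foo
git -C cloneme commit -q -am "second commit"

# Before fetching, the clone knows nothing about it.
behind_before=$(git -C clone rev-list --count master..origin/master)

# Fetch updates the remote-tracking branch but leaves local master alone.
git -C clone fetch -q
behind_after=$(git -C clone rev-list --count master..origin/master)

echo "behind before fetch: $behind_before, after fetch: $behind_after"
# Text-mode stand-in for "gitk --all".
git -C clone log --oneline --decorate --all
```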

Next run:

git status

You'll see that it also quite clearly tells you that "origin" is ahead of you with a message like:

Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.

Let's go ahead and do the mentioned "fast forward":

git merge remotes/origin/master

That should seem pretty natural to you. It's just a simple fast-forward merge, the same as you'd use to merge any branch into another. The only difference is that you're effectively merging changes from a remote branch into a local one.
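
Here's a small sketch of that fetch-then-merge sequence, run in a scratch directory. I've added "--ff-only" to the merge, which isn't in the command above; it's there so the script fails loudly if the merge would be anything other than a fast-forward. The "demo" identity and temporary directory are likewise just scaffolding.

```shell
set -eu
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

git init -q cloneme
git -C cloneme symbolic-ref HEAD refs/heads/master   # pin the branch name
echo foo > cloneme/foo
git -C cloneme add foo
git -C cloneme commit -q -m "first commit"
git clone -q "file://$work/cloneme" clone

echo bar >> cloneme/foo
git -C cloneme commit -q -am "second commit"

# Step 1: bring the new commit into the remote-tracking branch.
git -C clone fetch -q
# Step 2: fast-forward local master onto it (refuse anything else).
git -C clone merge -q --ff-only remotes/origin/master

local_tip=$(git -C clone rev-parse master)
remote_tip=$(git -C clone rev-parse origin/master)
echo "master:        $local_tip"
echo "origin/master: $remote_tip"
```

After the merge, both names point at the same commit and the working tree in "clone" has the new line in foo.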

Local and Remote Branches

This is a good place to take a look at exactly what that "remotes/origin/master" thing is. Run:

git branch -a

You should see output like:

* master
  origin/master

The -a flag to "git branch" tells the command to display both local and remote-tracking branches, which is what remotes/origin/master (shown as "origin/master" here) is. It's a local representation of a remote branch. A remote-tracking branch exists for the sole purpose of storing commits that you fetch from remote repositories. You never commit to one or check it out to work on it; you only fetch remote changes into it.

You can, however, make a local branch that "tracks" a remote-tracking branch and make commits there. We'll get into the details of that later, but you already have one of these. The master branch of the repository in "clone" is a local branch that tracks the remotes/origin/master remote-tracking branch. It was set up this way when you did the clone. That's how Git was able to tell you that you were a commit behind the remote branch. It knows that your local branch "master" is tracking a branch named "master" in the remote named "origin".
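
You can see where Git keeps that tracking information by asking for it directly. This sketch reads the two configuration keys that "git clone" writes for the master branch, and then asks Git to resolve the upstream of "master". The scratch directory and "demo" identity are assumptions made for the sake of a runnable example.

```shell
set -eu
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

git init -q cloneme
git -C cloneme symbolic-ref HEAD refs/heads/master   # pin the branch name
echo foo > cloneme/foo
git -C cloneme add foo
git -C cloneme commit -q -m "first commit"
git clone -q "file://$work/cloneme" clone

# The clone recorded which remote and which remote branch master follows.
tracking_remote=$(git -C clone config branch.master.remote)
tracking_merge=$(git -C clone config branch.master.merge)
# Ask Git to resolve that pair to a remote-tracking branch name.
upstream=$(git -C clone rev-parse --abbrev-ref 'master@{upstream}')

echo "remote:   $tracking_remote"
echo "merge:    $tracking_merge"
echo "upstream: $upstream"
```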

The Fast Way: Pull

The fetch and merge are fine for illustrating what's happening, but generally you just want to pull the latest changes from the remote repository directly into your local branch, and the two separate commands are an unnecessary step. Enter "git pull". This command is nothing but a combination of "git fetch" and "git merge". It's even clever enough to figure out what you want it to do without any arguments if you're on a branch that's tracking a remote-tracking branch, like your master branch in "clone". Go make another commit in "cloneme":

echo baz >> foo
git commit -am "third commit"

Now switch to "clone" and simply run:

git pull

Everything happens automatically, and the "master" branch of "clone" now has the new commit in it. As I mentioned, you didn't have to tell "git pull" which branch to merge from because the current branch, "master", tracks "remotes/origin/master", so that's the one it selects for the merge.

Note: Unlike "git fetch", "git pull" doesn't pull all changes from all branches into the matching local branches. Since part of a pull is a fetch, it does fetch all of the changes into the remote-tracking branches, but only the current local branch is updated with changes from its respective remote-tracking branch. That is, only one merge is performed upon a pull.
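
To convince yourself that a pull really is a fetch plus a merge, you can run both routes side by side. The sketch below makes two clones of the same repository, updates one the long way and the other with "git pull", and compares where they end up. The second clone, "clone2", is a hypothetical name I made up for the comparison, as are the scratch directory and the "demo" identity.

```shell
set -eu
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

git init -q cloneme
git -C cloneme symbolic-ref HEAD refs/heads/master   # pin the branch name
echo foo > cloneme/foo
git -C cloneme add foo
git -C cloneme commit -q -m "first commit"
git clone -q "file://$work/cloneme" clone
git clone -q "file://$work/cloneme" clone2   # hypothetical second clone

echo baz >> cloneme/foo
git -C cloneme commit -q -am "third commit"

# Route 1: the two explicit steps.
git -C clone fetch -q
git -C clone merge -q origin/master

# Route 2: one command that does both.
git -C clone2 pull -q

two_step=$(git -C clone rev-parse HEAD)
one_step=$(git -C clone2 rev-parse HEAD)
source_tip=$(git -C cloneme rev-parse HEAD)
echo "fetch+merge: $two_step"
echo "pull:        $one_step"
```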

Everything so far has just been in one direction: from the original repository to the clone. Eventually, you'll want to go back in the other direction. Make a fourth commit in the "clone" project:

echo clone >> foo
git commit -am "commit in clone"

Now switch to the "cloneme" project. When you clone a repository, the original doesn't gain any knowledge of the clone, so it should be no surprise that running a simple "git pull" from "cloneme" will get you an error like:

fatal: 'origin': unable to chdir or not a git archive
fatal: The remote end hung up unexpectedly

Detour: Configuring a New Remote Repository

Remember that "git pull" tries to fetch changes from "origin" if you don't tell it something different. Because this repository wasn't cloned from anything, it doesn't have an "origin". We'll need to tell it where it can get changes from by adding a remote repository:

git remote add theclone file:///path-to-clone

Note: Again, /path-to-clone should be the absolute path to the "clone" project.

This adds a remote named "theclone" to this repository's configuration.
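
"Configuration" here just means a couple of entries in .git/config. The following sketch adds the remote in a scratch copy of the setup and reads those entries back. The scratch directory and "demo" identity are assumptions for the example.

```shell
set -eu
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

git init -q cloneme
git -C cloneme symbolic-ref HEAD refs/heads/master   # pin the branch name
echo foo > cloneme/foo
git -C cloneme add foo
git -C cloneme commit -q -m "first commit"
git clone -q "file://$work/cloneme" clone

# Teach the original repository where the clone lives.
git -C cloneme remote add theclone "file://$work/clone"

# The remote is stored as a URL plus a fetch refspec.
remote_url=$(git -C cloneme config remote.theclone.url)
remote_fetch=$(git -C cloneme config remote.theclone.fetch)
echo "url:   $remote_url"
echo "fetch: $remote_fetch"
git -C cloneme remote -v
```

The fetch entry is what maps branches in the remote onto remote-tracking branches under "remotes/theclone/" in this repository.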

Pull Continued

With the newly configured remote, pulling changes is as simple as:

git pull theclone master

Why the extra arguments? Well, first, we have to specify the name of the remote, since the default is "origin". We could have named our remote "origin", but that's not really what it is, so I picked something else. As for the "master" part, since our current branch--the local branch "master"--isn't set up to track any remote-tracking branches, "git pull" doesn't have any information about which remote branch to merge changes from. Therefore, we explicitly state which branch we want to use. The changes are pulled into the current branch.
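
Put together, the round trip from "clone" back to "cloneme" looks something like this sketch. It runs in a scratch directory with a "demo" identity, both of which are my additions, and checks that the two repositories end up on the same commit.

```shell
set -eu
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

git init -q cloneme
git -C cloneme symbolic-ref HEAD refs/heads/master   # pin the branch name
echo foo > cloneme/foo
git -C cloneme add foo
git -C cloneme commit -q -m "first commit"
git clone -q "file://$work/cloneme" clone

# Work happens in the clone this time.
echo clone >> clone/foo
git -C clone commit -q -am "commit in clone"

# The original has no "origin", so name the remote and the branch explicitly.
git -C cloneme remote add theclone "file://$work/clone"
git -C cloneme pull -q theclone master

cloneme_tip=$(git -C cloneme rev-parse master)
clone_tip=$(git -C clone rev-parse master)
echo "cloneme: $cloneme_tip"
echo "clone:   $clone_tip"
```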

You now know how to clone repositories, add remotes, and pull changes. That's about all you need to know to start using Git to collaborate on projects; however, there's one more thing that Git lets you do: push. Because of what it does, it's somewhat more difficult to use correctly. There are some caveats, which I'll mention as we go along.

Pushing Changes Instead of Pulling

When would you need to push changes out instead of pulling them in? Well, it's great that Git is distributed and that everyone has their own complete repository for working in, but if you were working on a project team of even moderate size, you can imagine how difficult it would be to say what the "current" state of the project is if everybody just has their own repos and swaps changes at will. You would want to create what Git terms a "blessed" repository. That's a repository where finished work gets pushed to and where you pull from to get the latest "official" state of the project.

Warning--Angels Fear This

Before we go on, let me clearly state that the Git FAQ says you should only push to a bare repository "until you know what you are doing". A bare repository is one that was created with the --bare option. It has no working tree. The FAQ gives this advice because pushing into a branch that is checked out to a working tree can be problematic. That's what we're going to do here, though, because properly managed, it's not an issue, and I find it very useful for syncing changes between two different computers that I'm working on. Just realize that the issues we'll encounter related to working tree state don't arise when you follow the FAQ's advice of pushing only to bare repos.
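
For comparison, here's roughly what the FAQ-approved route looks like. The sketch creates a bare repository, pushes "master" into it, and confirms that no working tree files appear there. The name "blessed.git", the remote name "blessed", the scratch directory, and the "demo" identity are all hypothetical choices of mine.

```shell
set -eu
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

git init -q cloneme
git -C cloneme symbolic-ref HEAD refs/heads/master   # pin the branch name
echo foo > cloneme/foo
git -C cloneme add foo
git -C cloneme commit -q -m "first commit"

# A bare repository: history only, no checked-out files.
git init -q --bare blessed.git
is_bare=$(git -C blessed.git rev-parse --is-bare-repository)

# Push finished work into it.
git -C cloneme remote add blessed "file://$work/blessed.git"
git -C cloneme push -q blessed master

blessed_tip=$(git -C blessed.git rev-parse refs/heads/master)
source_tip=$(git -C cloneme rev-parse master)
echo "bare: $is_bare"
echo "blessed master: $blessed_tip"
```

Because there is no working tree or index on the receiving side, there's nothing for a push to leave out of date.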

The Simple Push

At this point, your two repositories, "cloneme" and "clone", should be in sync. That is, they both have the same set of four commits in them. A "git pull" from either side will end with an "Already up-to-date", and neither has any uncommitted changes. Let's add a new commit to "cloneme" and push it to "clone":

echo pushme >> foo
git commit -am "a commit to be pushed"
git push theclone

The first thing to note is that we didn't specify a branch name, only the name of the remote. When you do that, changes in all local branches are pushed to the remote if a branch with the same name already exists there. In other words, if we were to create a new branch named "mybranch" in project "cloneme" and run "git push theclone" again, no changes would be made because that branch doesn't exist in "clone". If you want to send the new branch across, you could do it by specifying the branch name like "git push theclone mybranch".
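
Here's a sketch of sending a brand-new branch across by name. What a bare "git push theclone" does depends on your Git version's push default, so the sketch sticks to the explicit form. The scratch directory and "demo" identity are assumptions for the example.

```shell
set -eu
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

git init -q cloneme
git -C cloneme symbolic-ref HEAD refs/heads/master   # pin the branch name
echo foo > cloneme/foo
git -C cloneme add foo
git -C cloneme commit -q -m "first commit"
git clone -q "file://$work/cloneme" clone
git -C cloneme remote add theclone "file://$work/clone"

# Create a branch that exists only in cloneme.
git -C cloneme branch mybranch
if git -C clone rev-parse -q --verify refs/heads/mybranch >/dev/null; then
    before=present
else
    before=absent
fi

# Naming the branch creates it on the other side.
git -C cloneme push -q theclone mybranch
pushed_tip=$(git -C clone rev-parse refs/heads/mybranch)
local_tip=$(git -C cloneme rev-parse mybranch)
echo "mybranch in clone before push: $before"
echo "mybranch in clone after push:  $pushed_tip"
```

Since "mybranch" isn't checked out in "clone", this push doesn't run into any of the working tree issues discussed in this post.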

Why Push Isn't So Simple

Let's go see what "clone" looks like now. You might be a bit surprised at the result. A "git log" will show you that the latest commit was pushed successfully. However, "git status" shows that you have changes in your index. How did this happen? It was clean before the push. Well, run a "git diff --staged" to see what it says has changed. You should see something like this:

diff --git a/foo b/foo
index 5a347e2..90c3f45 100644
--- a/foo
+++ b/foo
@@ -2,4 +2,3 @@ foo
 bar
 baz
 clone
-pushme

It's saying that in project "clone", you've removed the line that you just added in "cloneme". Why? Because "git push" does not make any changes to the working tree or index of a remote repository, lest work be lost. Particularly when you push to a remote that's not in your control, you have no way of knowing whether somebody else is making changes to the working tree or index at the same time, and you can imagine the havoc if "git push" were to mess with those changes. So while the new commit was added to the repo, the working tree and index haven't been touched, and are in the same state as they were when the HEAD^ commit was the latest. Therefore "git diff --staged" shows exactly that: the output you would expect from running "git diff HEAD HEAD^" in either of the repositories.

To correct this, since you know that no work will be lost, simply run:

git reset --hard

Now your working tree and index properly reflect the tip of the branch, where you want them to be.
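
The whole push-then-reset sequence can be replayed with a sketch like the one below. One caveat: newer versions of Git refuse a push into a checked-out branch by default, so the sketch sets receive.denyCurrentBranch to "ignore" in "clone" to get the behavior described here. That setting, the scratch directory, and the "demo" identity are my assumptions, not steps from the walkthrough.

```shell
set -eu
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

git init -q cloneme
git -C cloneme symbolic-ref HEAD refs/heads/master   # pin the branch name
echo foo > cloneme/foo
git -C cloneme add foo
git -C cloneme commit -q -m "first commit"
git clone -q "file://$work/cloneme" clone
git -C cloneme remote add theclone "file://$work/clone"

# Assumption: allow pushes into the checked-out branch on newer Git.
git -C clone config receive.denyCurrentBranch ignore

echo pushme >> cloneme/foo
git -C cloneme commit -q -am "a commit to be pushed"
git -C cloneme push -q theclone master

# The commit arrived, but the index in clone still matches the old HEAD.
staged_before=$(git -C clone diff --staged --name-only)

# Bring the index and working tree up to the new tip.
git -C clone reset -q --hard
staged_after=$(git -C clone diff --staged --name-only)

echo "staged before reset: ${staged_before:-nothing}"
echo "staged after reset:  ${staged_after:-nothing}"
```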

Another Restriction on Push

There's one more caveat about "git push": by default, it will only succeed if you can fast-forward the remote branch(es) you're pushing to. Put another way, if you're pushing from "cloneme" master to "clone" master, then the set of commits in "cloneme" must be a superset of the ones in "clone", or the push can't succeed. Again, it's a question of overwriting someone else's work. The most likely way for this to happen is if you're trying to push changes to a remote branch that you previously pulled from, but someone else has added new commits to it in the meantime. The solution in that case is to do another "git pull" to get the latest changes, and then you'll be able to push because you'll have the required superset of commits.
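
A sketch of that reject, pull, push cycle follows. To keep the merge free of conflicts, the two sides commit to different files ("theirs" and "ours", both hypothetical names). I've also passed "--no-rebase --no-edit" to the pull so it merges without prompting on current Git, and set receive.denyCurrentBranch to "ignore" in "clone" so newer Git accepts the push at all. All of those, plus the scratch directory and "demo" identity, are assumptions for the example.

```shell
set -eu
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

git init -q cloneme
git -C cloneme symbolic-ref HEAD refs/heads/master   # pin the branch name
echo foo > cloneme/foo
git -C cloneme add foo
git -C cloneme commit -q -m "first commit"
git clone -q "file://$work/cloneme" clone
git -C cloneme remote add theclone "file://$work/clone"
git -C clone config receive.denyCurrentBranch ignore   # assumption, see above

# Someone else commits in clone in the meantime.
echo theirs > clone/theirs
git -C clone add theirs
git -C clone commit -q -m "commit made in clone"

# We commit in cloneme without having seen that commit.
echo ours > cloneme/ours
git -C cloneme add ours
git -C cloneme commit -q -m "commit made in cloneme"

# The push is refused because it would not be a fast-forward.
if git -C cloneme push -q theclone master 2>/dev/null; then
    first_push=accepted
else
    first_push=rejected
fi

# Pull to obtain the superset of commits, then push again.
git -C cloneme pull -q --no-rebase --no-edit theclone master
git -C cloneme push -q theclone master

cloneme_tip=$(git -C cloneme rev-parse master)
clone_tip=$(git -C clone rev-parse master)
echo "first push: $first_push"
```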

Of course, you can force Git to do a non-fast-forward push. Just make sure you understand that this will destroy work that's been done! Let's look at an example. In project "clone", make a new commit:

echo loseme >> foo
git commit -am "this commit will be lost by a bad push"

Now go back to "cloneme" and run:

echo destroyer >> foo
git commit -am "this commit will cause the loss of a commit in clone"

First, try a typical push:

git push theclone

It will result in an error like:

 ! [rejected]        master -> master (non-fast forward)
error: failed to push some refs to 'file:///cygdrive/c/dev/projects/clone'

Now force it to do the push with:

git push theclone +master

The '+' indicates that Git should force the push. Go over to "clone" now, and a "git log" will show you that the last commit we made there has disappeared. Because we're pushing to a non-bare repository, the index will still have the lost change in it, but another "git reset --hard" will bring it up to date with the repo.
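
If you want to watch the commit vanish without clobbering your real repositories, this sketch replays the forced push in a scratch directory. As before, receive.denyCurrentBranch is set to "ignore" in "clone" so that newer Git accepts a push into the checked-out branch. That setting and the "demo" identity are assumptions of mine.

```shell
set -eu
# Throwaway identity so commits work on an unconfigured machine (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
trap 'rm -rf "$work"' EXIT

git init -q cloneme
git -C cloneme symbolic-ref HEAD refs/heads/master   # pin the branch name
echo foo > cloneme/foo
git -C cloneme add foo
git -C cloneme commit -q -m "first commit"
git clone -q "file://$work/cloneme" clone
git -C cloneme remote add theclone "file://$work/clone"
git -C clone config receive.denyCurrentBranch ignore   # assumption, see above

# The commit that is about to be lost.
echo loseme >> clone/foo
git -C clone commit -q -am "this commit will be lost by a bad push"
lost=$(git -C clone rev-parse HEAD)

# A competing commit in cloneme.
echo destroyer >> cloneme/foo
git -C cloneme commit -q -am "this commit will cause the loss of a commit in clone"

# Force the non-fast-forward push, then resync clone's index and files.
git -C cloneme push -q theclone +master
git -C clone reset -q --hard

# Is the lost commit still part of master's history in clone?
if git -C clone merge-base --is-ancestor "$lost" master; then
    lost_reachable=yes
else
    lost_reachable=no
fi
clone_tip=$(git -C clone rev-parse master)
cloneme_tip=$(git -C cloneme rev-parse master)
echo "lost commit still on master: $lost_reachable"
```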


And that, as they say, is that. Journey complete. If you've read and followed along with all of my Git posts, you may be an incurable geek, and you certainly should know enough to be dangerous with Git and to start seeing how great it is in comparison with a centralized VCS. Aside from a quick command reference, which is almost finished, this is all I plan to post about Git for the time being (finally!!! woohoo!!!). If you have any questions, feel free to drop me a comment, and I'll answer it to the best of my ability.

Late addition: I've published a Git reference card on Scribd that should be good for reminding you of the commands you need to use without having to dig back through these posts.