Shotguns and Penguins

Friday, January 22, 2010

Git Quick Reference

I've been lazy/sick/on vacation for a while, but I think I'm finally ready to release my Git Quick Reference into the wild. It's kind of a follow-up to my series of Git tutorial posts, collecting all the important stuff into one, relatively short document for easy access once you've started down the road of learning your way around Git. It's available on ~~Scribd: Git Quick Reference~~ Google Docs: Git Quick Reference

Update (2013-04-19): Since Scribd apparently no longer offers free access, here's the Quick Reference available via my Google Drive. I'm also attempting to include it inline here for convenience. I've made little effort to ensure it looks right here. It's just a quick export from LibreOffice plus a little minor surgery to clean up the major flaws. I recommend the PDF for serious use.

I've tried to list the most commonly used/useful commands here. They're broken up into a handful of categories describing general activities you perform in Git. Most commands have multiple variants, listed separately, composed of different options and arguments. A few commands have variants listed under multiple categories because they have quite varied uses. Options that apply to multiple variants of a command are listed to the far right.

Setting up the repo

Use these commands to create and configure a repository and get up and running with Git.

git init	Create a repo in the current directory.	--bare Create a bare repo (no working tree)
git clone <url>	Clone the repo at the given URL into a local directory.
git clone <url> <dir>	Clone into the specified directory.
git config <key> <value>	Set the "key" to "value" in the config file.	--global Change user-wide configuration. --system Change system-wide configuration.
git config --unset <key>	Remove "key" from the config file.
git config -e|--edit	Edit the config file.
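
A minimal sketch of the setup commands above, run in a throwaway directory. The repository names (demo, demo.git) and the user name and email are placeholders made up for illustration, not anything the commands require.

```shell
set -e

# Work in a scratch directory so nothing else is touched.
work=$(mktemp -d)
cd "$work"

git init -q demo              # normal repo: demo/.git plus a working tree
git init -q --bare demo.git   # bare repo: history only, no working tree

cd demo
# With no --global or --system, these land in this repo's .git/config.
git config user.name "Example User"
git config user.email "user@example.com"
git config core.editor vi
git config --unset core.editor   # take the key back out

git config user.name             # prints the value just set
```

Keeping the identity settings local to the repo means the sketch leaves your user-wide configuration alone.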

Getting things into your repo

These commands deal with turning your work into history (commits) stored in the repository. There are four areas that can hold changes, where a "change" is roughly equivalent to a new file, a removed file, or a modification to an existing file.

1) Working tree—This is your workspace. As soon as you change a file, you've changed your working tree.
2) Index—This is a staging area for changes that are ready to be committed to the repository. When a change is staged to the index, it is no longer considered to be in your working tree, though your filesystem still reflects changes that have been staged. You stage changes when preparing to commit them.
3) Repository—Upon commit, any and all staged changes turn into a commit in the repository. Commits are a permanent part of history.
4) Stash—This is an out-of-sight holding area for changes. When you stash changes, they are no longer in your working tree, and the filesystem doesn't reflect them anymore, either.

The diagram to the right roughly illustrates the flow of changes around your local repository.

The commit is the basic building block of a Git repository. Each commit has, among other things, an author, a timestamp, 0 or more parent commits, and a complete snapshot of the state of the project when the commit was made. Every commit is identified by a SHA-1 hash that is calculated from the previously mentioned properties plus some. This means that if two commits have the same hash, then they have the same parent(s), author, etc. This identity is unique even across repositories.
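
To see those properties for yourself, you can ask Git to print a raw commit object. This is only a sketch in a scratch repository; the file name and the identity values are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "foo" > foo
git add foo
git commit -q -m "first commit"     # root commit: 0 parents
echo "bar" >> foo
git commit -q -am "second commit"   # 1 parent

git cat-file -t HEAD   # object type of HEAD
git cat-file -p HEAD   # tree (the snapshot), parent, author, committer, message
```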

git add <path> [<path>...]	Stage files and directories to the index. Directories are staged recursively.
git add -p [<path>...]	Interactively choose what to stage for commit.
git mv <src> <dest>	Stage a file rename.
git add --all	Stage everything, including deletions and untracked files.
git rm <file>	Stage a file deletion.
git rm -r <dir>	Stage a recursive directory deletion.
git commit	Commit staged changes to your local repository. Opens your configured editor for a commit message.	-m "message" Specify commit message on command line. -C <commit> Use the message from the given commit for this commit. Very useful in conjunction with --amend to reuse the last commit message (-C HEAD).
git commit <path> [<path>...]	Commit changes in the given files and/or directories.
git commit -a	Stage and commit all changes to tracked files.
git commit --amend	Alter the most recent commit to also contain staged changes.
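
As a rough illustration of how these commands combine, the sketch below stages selectively, amends, and then stages a rename and a deletion. The file names and identity are made up for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo "one" > a.txt
echo "two" > b.txt
git add a.txt                   # stage only a.txt
git commit -q -m "add a.txt"    # b.txt stays untracked in the working tree

git add b.txt
git commit -q --amend -C HEAD   # fold b.txt into the last commit, reusing its message

git mv a.txt c.txt              # stage a rename
git rm -q b.txt                 # stage a deletion
git commit -q -m "rename a.txt, drop b.txt"

git log --oneline               # two commits; the amend replaced the first one
```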

Housekeeping

This group of commands is for moving changes between the working tree, index, and stash. At first, the stash may feel like an odd thing to have in a VCS, but it will soon become second nature. A typical workflow goes like this:

•code, code, code
•boss brings a hot bug that needs fixing NOW
•git stash
•fix bug
•commit
•git stash pop
•code, code, code

Any time you need a quick place to shove aside your in-progress work, the stash is there waiting.

git reset	Unstage all changes. I.e. move them from the index back to the working tree.
git reset <path> [<path>...]	Unstage changes to the given files and directories.
git clean -f	Delete any non-ignored, untracked files in your working tree.	-n Dry run; only show what would be deleted.
git clean -fd	Delete directories as well as files.	-n Dry run; only show what would be deleted.
git checkout <path> [<path>...]	Revert working tree changes to the given files and directories. Doesn't affect staged changes.
git stash	Push all changes in the working tree and index onto the top of the stash. (Acts like a stack.)
git stash list	List each stash that's been made.
git stash apply	Apply most recent stash to the working tree.	stash@{n} Use stash number n instead of most recent stash.
git stash pop	Apply most recent stash and remove it (if no conflicts occur).
git stash drop	Delete most recent stash.
git stash show	Show a diffstat of the most recent stash.
git stash show -p	Show a diff of the most recent stash.
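
The "hot bug" workflow described earlier in this section might look something like this in practice. It's a sketch under made-up names (notes.txt, fix.txt), not a prescription.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"
echo "base" > notes.txt
git add notes.txt
git commit -q -m "base"

echo "half-finished idea" >> notes.txt   # code, code, code
git stash                                # shove it aside; the working tree is clean again

echo "urgent fix" > fix.txt              # fix the bug
git add fix.txt
git commit -q -m "fix urgent bug"

git stash pop                            # bring the in-progress work back
git status --short                       # notes.txt shows up as modified again
```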

Working with branches

Branches in Git are lightweight and highly flexible. In fact, a branch is little more than a Post-it note stuck to a commit that shows where the tip of the branch is. When a new commit is made at the branch tip, the Post-it moves to the new commit. Since a branch is so light, it has no knowledge of the commits it contains within itself. It just points at a commit and is considered to "contain" all ancestors of that commit. The ancestry determines the content of the branch, and it makes merging relatively trivial. To merge n branches, a new commit is created which has n parents. Each parent was the tip of one of the branches being merged. The new commit then shares the ancestry of all n branches. As you come to understand the concept of ancestry determining branch content, you'll begin to understand the power that Git places at your fingertips.

While Git is very good at merging, conflicts are sometimes unavoidable. When a conflict occurs, the merge won't be committed automatically. Instead, successfully merged files will be left in your index, and unmerged files will be in your working tree. To finish the merge, resolve the conflicts, stage the unmerged files, and do the commit yourself. Note that merge conflicts occur not only when merging, but can also happen when popping a stash, rebasing, performing remote operations, or anything else that involves combining commits somehow. For information on integrating Git with various merge tools, check out the "merge.tool" option of git config.

git branch	List local branches.
git branch -r	List remote-tracking branches.
git branch -a	List all branches (remote and local).
git branch <name>	Create a branch with the given name at your current commit.	--track Use with the "git branch <name> <commit>" form to create a branch that tracks another branch, typically a remote branch. Then remote operations on the new branch will automatically use the tracked remote branch.
git branch <name> <commit>	Create a branch with the given name pointing at the given branch or commit.
git checkout <branch name>	Make the given branch your current branch. Your working tree will reflect the state of the commit at the tip of the branch, and new commits will be applied to the branch.
git checkout <commit>	Make your working tree reflect the given commit. You won't be on a branch, and any commits made will be lost if you don't create a branch to contain them.
git checkout -b <name>	Create a new branch with your currently checked-out commit as its parent, and check out the new branch.
git checkout -b <name> <commit>	Like the git branch command of similar form, but checks out the new branch.
git branch -d/-D <name>	Delete the given branch. If the branch isn't an ancestor of your current branch (that is, it has commits that haven't been merged), you must force deletion with -D.
git merge <branch name>	Merge the given branch into your current branch.
git merge <branch name> -m "message"	Perform a merge using the given message for the commit created by the merge.
git cherry-pick <commit>	Copy the given commit to your current branch.
git reset <commit>	Move to the given commit, making it your HEAD and the tip of your current branch. Changes introduced are left in working tree.
git reset --soft <commit>	Same as above, but leave changes in the index.
git reset --hard <commit>	Same as above, but don't leave changes anywhere.
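
To make the "merge commit has n parents" idea concrete, here is a small sketch that merges two diverged branches and counts the parents of the result. The branch name "feature" and the file names are invented for the example, and the initial branch name is pinned to "master" to match the text.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # pin the initial branch name
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature               # new branch at the current commit
echo "f" > feature.txt
git add feature.txt
git commit -q -m "feature work"

git checkout -q master
echo "m" > master.txt
git add master.txt
git commit -q -m "master work"           # the two branches have now diverged

git merge -q -m "merge feature" feature  # creates a commit with two parents
git log -1 --format=%P | wc -w           # number of parent hashes on the merge commit
git branch -d feature                    # plain -d works: feature is now an ancestor of master
```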

Rebasing

While this really belongs to the section on branches, it's a difficult concept, so I gave it its own spot. If you're coming from a centralized VCS, rebasing will take some time to wrap your head around. To make it as simple as possible, think of the commits in a branch like they're links in a chain. A rebase lets you measure out a length of the chain, cut it off at a particular link, then reattach the part you cut off to a different link somewhere else, maybe even on a different chain.

On top of that, a rebase actually reattaches a single link of the chain at a time, and you can tell it to let you modify and reorder the links (commits) as it applies them. This is called an "interactive" rebase. You can even use it to go back in history and make some changes without even moving anything.

If a rebase is interrupted by a conflict or if you're doing an interactive rebase and choose to modify a commit during the process, you'll have to give Git further instructions after you do your work. When a rebase stops, it will stop after a commit's changes have been applied to the index and/or working tree, but before the changes are committed. At that point, you can continue it, abort it, or skip the commit that it stopped on.

git rebase <commit>	Take commits in the current branch that aren't ancestors of the given commit, and move them onto the given commit. (<commit> = green; moved commits = purple)
git rebase --onto <commit2> <commit1>	Like above, but move to the "onto" commit instead. This lets you specify a range of commits to be "clipped" and moved over to any arbitrary commit. (<commit1> = orange; <commit2> = green; moved commits = purple)
git rebase -i <commit>	Rewind all commits back to <commit>, and reapply them, optionally editing commits along the way. In fact, any rebase operation can be made interactive by supplying this flag.
git rebase --continue	Continue an interrupted rebase.
git rebase --abort	Abort an interrupted rebase, returning to the state before the rebase started.
git rebase --skip	Skip the current commit and continue with the rebase. The skipped commit won't appear in the final result.
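
The "--onto" form is the hardest to picture, so here is a sketch of it on a tiny history. The branch names (topic, subtopic) and the commit_file helper are hypothetical, invented only for this example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # pin the initial branch name
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

# Hypothetical helper: commit one new file named after the message.
commit_file() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit_file A                  # master:   A
git checkout -q -b topic
commit_file C                  # topic:    A - C
git checkout -q -b subtopic
commit_file D                  # subtopic: A - C - D
git checkout -q master
commit_file B                  # master:   A - B

git checkout -q subtopic
# Clip off everything after "topic" (just D) and reattach it onto master.
git rebase -q --onto master topic

git log --format=%s            # prints D, B, A (newest first); C is no longer in this branch
```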

Working with other repos

This section has all the commands you need to communicate between repositories. Remoting is a key part of Git. It's what makes Git distributed. Remoting in Git consists of connecting a repository to other ones (remotes) and of pushing and pulling commits to and from those repositories. You provide a name and a URL when you configure a remote. The name is your identifier for a remote, and the URL is how Git locates it. There are a number of supported transports, but all use a similar URL format:

• rsync://host.xz/path/to/repo.git/
• http://host.xz/path/to/repo.git/
• https://host.xz/path/to/repo.git/
• git://host.xz/path/to/repo.git/
• ssh://[user@]host.xz[:port]/path/to/repo.git/
• file:///path/to/repo.git/

For a fuller treatment of URLs, see "GIT URLS" in "git help pull".

When you use remote repositories, Git automatically creates branches called "remote-tracking" branches that store the state of those remotes. You never work directly on a remote-tracking branch. They're just mirrors of your remotes that are used as synchronization points for pushing and pulling commits.

It's worth mentioning that when you clone a repository, Git creates a remote in the new repository named "origin" that points at the repository you cloned.

git remote	List this repository's remotes.
git remote add <name> <url>	Add a remote repository with the given name and located at the given URL.
git remote rename <old> <new>	Change the name of a remote repository. Updates branch names and the remote configuration as well.
git remote rm <name>	Remove a remote repository and its remote-tracking branches and configuration entries.
git fetch	Fetch unfetched commits from all branches in the remote named "origin" into remote-tracking branches.
git fetch <remote name>	Fetch from the given remote instead of "origin".
git pull <remote name> <branch>	Fetch a branch from a remote and merge it into your current working branch.
git push	Push all new commits in "matching" branches to the remote named "origin". A "matching" branch is one that exists both locally and on the remote.	--all Push all local branches, creating them in the remote if they don't exist.
git push <remote name>	Push matching branches to the given remote.
git push <remote name> <branch>	Push the given branch to the given remote.
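
Here is a sketch of the remote commands working together, using a bare repository as the shared meeting point. The names hub.git, alice, bob, and the remote name "hub" are placeholders, and everything lives in a scratch directory reached through the file:// transport.

```shell
set -e
work=$(mktemp -d)
cd "$work"

git init -q --bare hub.git
git --git-dir=hub.git symbolic-ref HEAD refs/heads/master  # pin the default branch name

git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"
echo "foo" > foo
git add foo
git commit -q -m "first commit"
git remote add hub "file://$work/hub.git"  # name + URL
git push -q hub master                     # explicit remote and branch

cd "$work"
git clone -q "file://$work/hub.git" bob    # the clone gets a remote named "origin"
cd bob
git remote                                 # lists the remotes of this repo
git branch -r                              # lists its remote-tracking branches

cd "$work/alice"
echo "bar" >> foo
git commit -q -am "second commit"
git push -q hub master

cd "$work/bob"
git fetch -q origin                        # updates origin/master only
git pull -q origin master                  # fetch, then merge into the current branch
```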

Seeing the things in your repo

These commands are all read-only and give you different ways of looking at the history of your files. Many Git commands take a commit as input. In this guide, that is generally denoted with the placeholder <commit>. Most of these commands actually accept what's known as a "treeish", meaning anything that can be dereferenced to a "tree", which is an internal object stored by Git. Here are some popular forms that a treeish comes in:

•commit hash: c42b40edc2b5b09e565e20663079e9c14b37aa21
•small, unique part of a commit hash: c42b4
•branch name: master
•special "current commit" ref: HEAD
•the suffix "^" indicates "parent of": HEAD^ master^ c42b4^
•"^" applies multiple times: HEAD^^^

See "SPECIFYING REVISIONS" in "git help rev-parse" for more ways of naming a commit/treeish.

git status	Display paths with uncommitted changes. Shows three sections: changes in the index, changes in the working tree, and files not being tracked by Git.
git log	Show the history of your current branch or commit.	-p Under each commit shown in the log, show a diff of the changes introduced by that commit. --stat For each commit, show a summary of files changed by the commit.
git log <commit>	Show the history of a particular branch or commit instead of the current one.
git log <path> [<path>...]	Only show commits that affect the given path(s).
git diff	Show changes in the working tree relative to the index.	--staged|--cached Show changes that are in the index instead of the working tree.
git diff <commit1> <commit2>	Show changes introduced by going from commit1 to commit2.
git diff <commit1>...<commit2>	Show changes that would be applied if you were to merge commit2 to commit1.
git diff <path> [<path>...]	Only show changes that affect the given path(s).
git show <commit>	Show details of and changes introduced by the given commit.
git show <commit>:<path>	Show the content of the given path at the given commit.
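
The ways of naming a commit listed earlier in this section can be tried out with a short sketch like the one below. The file name and commit messages are invented so the results are easy to recognize.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

# Three commits, each overwriting file.txt with its own name.
for n in one two three; do
  echo "$n" > file.txt
  git add file.txt
  git commit -q -m "$n"
done

git rev-parse HEAD              # full commit hash
git rev-parse --short HEAD      # short, unique prefix of the same hash
git log -1 --format=%s HEAD^    # subject of the parent commit: two
git log -1 --format=%s HEAD^^   # "^" applied twice: one
git show HEAD^:file.txt         # content of file.txt one commit back: two
```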

Additional references

For when you need a little more help...

git help	Brief list of the most common commands.
git help git	Extensive list of commands and options to the "git" command itself.
git help tutorial[-2]	Two parts of a rapid tutorial introducing basic Git operations.
git help glossary	Glossary of common Git terms to use along with other documentation.
git help <command>	Detailed help on a particular Git command.

You can find a walk-through tutorial for Git on my blog: http://shotgunsandpenguins.blogspot.com/search/label/Git

Feel free to email me with comments or questions:

Ryan Stewart <rds6235@gmail.com>

Creative Commons License

This work is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA.

Sunday, September 6, 2009

The Journey to Git, Part X—Communicating Between Repositories

So you want to do some collaboration using Git. If you don't know where to start, you're in the right place. Start here. This post, like my earlier Git posts, will take you on a guided tour of how to collaborate with others (or yourself) using Git remoting. It will be light on theory and practical application of principles and instead focus on the "how" so you can start using it as quickly as possible.

In this post, I assume you're comfortable working with a single Git repository with the basic commands like "git add", "git commit", "git branch", "git merge", and so on. If you're not to that point yet, hop back to my earlier posts in this series for a quick walkthrough.

Making a Clone

We need an existing repository to start from, so create a directory named "cloneme", change to it, and set up a repository like so:

git init
echo "foo" > foo
git add foo
git commit -m "first commit"

Simple enough: a repository with one commit and one file being tracked. Now move to the parent directory of cloneme, and run:

git clone file:///path-to-cloneme clone

Note: The "/path-to-cloneme" part should be the absolute path to the cloneme directory. It's best to go absolute here for a couple of reasons. Don't use a relative path unless you understand the implications of having a relative path stored in your .git/config file.

You've just performed your first remote Git operation by cloning an existing repository. As you might expect, you now have a complete copy of the "cloneme" project in the "clone" project. Note, however, that it's not just a copy of the working tree. It's a complete clone of the original repository. Git is, after all, a distributed VCS.

All we did in this first clone was basically a filesystem copy since we used the "file://" transport. Git, of course, supports remote operations over networks with other transports: ssh, rsync, http, https, and a native "git" transport. Each has its own, very similar, URL syntax for specifying how to find a remote repository. I use the ssh transport almost exclusively. It's secure and just as easy to use as the file transport.

At this point, you have two repositories with identical content. Running "git log" in both of them, for instance, would produce identical output. Start up gitk now, and you'll see the familiar "master" designator pointing at the head of the branch, but next to it is another thing that says "remotes/origin/master". The initial "remotes" is kind of a namespace that's set aside for specifying branches that are in remote repositories. The next piece, "origin", is the name of the remote repository, and the final one is the name of the branch in that remote repository. When you clone a repository, the cloned one automatically becomes the "origin" for the clone, making for convenient interaction with it, as we'll see in a moment.

What this gitk output is telling you is that the head of the remote repository's master branch is at the same commit as your local master branch... as far as this repo knows. Changes in a remote repository are not automatically detected by gitk, so something in the remote could've changed, but gitk won't reflect it until you "git fetch" it. Let's take a look.

Getting New Changes from the Origin Repo

Go back to cloneme, and make a new commit:

echo bar >> foo
git commit -am "second commit"

Now go back to clone. Both "git log" and gitk will show exactly the same thing as before. As I mentioned, these two commands don't do any remoting, so they have no way of knowing about the change. In order to see the new commit, you need to fetch it:

git fetch

When run with no arguments, this command will retrieve all of the latest changes from the remote repository named "origin". That's some of the convenience that I mentioned earlier. Run gitk again, but this time with "gitk --all", or you'll only see a partial picture. Now you can clearly see that the remote named "origin", which is cloneme, is one commit ahead of clone.

Note: When I say "all of the latest changes", I do mean "all". In this exercise, we're confining our work to a single branch, but "git fetch" retrieves the latest changes from all of the branches of the specified remote, as well as any new branches that have been created.

Next run:

git status

You'll see that it also quite clearly tells you that "origin" is ahead of you with a message like:

Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.

Let's go ahead and do the mentioned "fast forward":

git merge remotes/origin/master

That should seem pretty natural to you. It's just a simple fast-forward merge, the same as you'd use to merge any branch into another. The only difference is that you're effectively merging changes from a remote branch into a local one.

Local and Remote Branches

This is a good place to take a look at exactly what that "remotes/origin/master" thing is. Run:

git branch -a

You should see output like:

* master
  origin/HEAD
  origin/master

The -a flag to "git branch" tells the command to display both local and remote-tracking branches, which is what remotes/origin/master--shown as "origin/master" here--is. It's a local representation of a remote branch. A remote-tracking branch exists for the sole purpose of storing commits that you fetch from remote repositories. You don't ever make commits on them or do anything else to them except fetch remote changes into them.

You can, however, make a local branch that "tracks" a remote-tracking branch and make commits there. We'll get into the details of that later, but you already have one of these. The master branch of the repository in "clone" is a local branch that tracks the remotes/origin/master remote-tracking branch. It was set up this way when you did the clone. That's how Git was able to tell you that you were a commit behind the remote branch. It knows that your local branch "master" is tracking a branch named "master" in the remote named "origin".

The Fast Way: Pull

The fetch and merge are fine for illustrating what's happening, but generally you just want to pull the latest changes from the remote repository directly into your local branch, and the two separate commands are an unnecessary step. Enter "git pull". This command is nothing but a combination of "git fetch" and "git merge". It's even clever enough to figure out what you want it to do without any arguments if you're on a branch that's tracking a remote-tracking branch, like your master branch in "clone". Go make another commit in "cloneme":

echo baz >> foo
git commit -am "third commit"

Now switch to "clone" and simply run:

git pull

Everything happens automatically, and the "master" branch of "clone" now has the new commit in it. As I mentioned, you didn't have to tell "git pull" which branch to merge from because the current branch, "master", tracks "remotes/origin/master", so that's the one it selects for the merge.

Note: Unlike "git fetch", "git pull" doesn't pull all changes from all branches into the matching local branches. Since part of a pull is a fetch, it does fetch all of the changes into the remote-tracking branches, but only the current local branch is updated with changes from its respective remote-tracking branch. That is, only one merge is performed upon a pull.
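
If you'd like to see where the tracking information lives and confirm that a pull really is a fetch plus a merge, a sketch along these lines should do it. It rebuilds the "cloneme" and "clone" repositories in a scratch directory; the identity values are placeholders, and the branch name is pinned to "master" to match this post.

```shell
set -e
work=$(mktemp -d)
cd "$work"

git init -q cloneme
cd cloneme
git symbolic-ref HEAD refs/heads/master  # pin the initial branch name
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"
echo "foo" > foo
git add foo
git commit -q -m "first commit"

cd "$work"
git clone -q "file://$work/cloneme" clone

cd "$work/cloneme"
echo "bar" >> foo
git commit -q -am "second commit"

cd "$work/clone"
# The tracking setup created by the clone:
git config branch.master.remote   # which remote master follows
git config branch.master.merge    # which branch on that remote

# The two halves of a pull, done by hand:
git fetch -q                      # updates remotes/origin/master only
git merge -q origin/master        # fast-forwards the local master
```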

Everything so far has just been in one direction: from the original repository to the clone. Eventually, you'll want to go back in the other direction. Make a fourth commit in the "clone" project:

echo clone >> foo
git commit -am "commit in clone"

Now switch to the "cloneme" project. When you clone a repository, the cloned one doesn't gain any knowledge of the clone, so it should be no surprise that running a simple "git pull" from "cloneme" will get you an error like:

fatal: 'origin': unable to chdir or not a git archive
fatal: The remote end hung up unexpectedly

Detour: Configuring a New Remote Repository

Remember that "git pull" tries to fetch changes from "origin" if you don't tell it something different. Because this repository wasn't cloned from anything, it doesn't have an "origin". We'll need to tell it where it can get changes from by adding a remote repository:

git remote add theclone file:///path-to-clone

Note: Again, /path-to-clone should be the absolute path to the "clone" project.

This adds a remote named "theclone" to this repository's configuration.

Pull Continued

With the newly configured remote, pulling changes is as simple as:

git pull theclone master

Why the extra arguments? Well, first, we have to specify the name of the remote, since the default is "origin". We could have named our remote "origin", but that's not really what it is, so I picked something else. As for the "master" part, since our current branch--the local branch "master"--isn't set up to track any remote-tracking branches, "git pull" doesn't have any information about which remote branch to merge changes from. Therefore, we explicitly state which branch we want to use. The changes are pulled into the current branch.

You now know how to clone repositories, add remotes, and pull changes. That's about all you need to know to start using Git to collaborate on projects; however, there's one more thing that Git lets you do: push. Because of what it does, it's somewhat more difficult to use correctly. There are some caveats, which I'll mention as we go along.

Pushing Changes Instead of Pulling

When would you need to push changes out instead of pulling them in? Well, it's great that Git is distributed and that everyone has their own complete repository for working in, but if you were working on a project team of even moderate size, you can imagine how difficult it would be to say what the "current" state of the project is if everybody just has their own repos and swaps changes at will. You would want to create what Git terms a "blessed" repository. That's a repository where finished work gets pushed to and where you pull from to get the latest "official" state of the project.

Warning--Angels Fear This

Before we go on, let me clearly state that the Git FAQ says you should only push to a bare repository "until you know what you are doing". A bare repository is one that was created with the --bare option. It has no working tree. It says this because pushing into a branch that is checked out to a working tree can be problematic. That's what we're going to do here, though, because properly managed, it's not an issue, and I find it to be very useful to sync changes between two different computers that I'm working on. Just realize that the issues we'll encounter related to working tree state don't arise when you follow the FAQ's advice of pushing only to bare repos.

The Simple Push

At this point, your two repositories, "cloneme" and "clone" should be in sync. That is, they both have the same set of four commits in them. A "git pull" from either side will end with an "Already up-to-date", and neither has any uncommitted changes. Let's add a new commit to "cloneme" and push it to "clone":

echo pushme >> foo
git commit -am "a commit to be pushed"
git push theclone

The first thing to note is that we didn't specify a branch name, only the name of the remote. When you do that, changes in all local branches are pushed to the remote if a branch with the same name already exists there. In other words, if we were to create a new branch named "mybranch" in project "cloneme" and run "git push theclone" again, no changes would be made because that branch doesn't exist in "clone". If you want to send the new branch across, you could do it by specifying the branch name like "git push theclone mybranch".

Why Push Isn't So Simple

Let's go see what "clone" looks like now. You might be a bit surprised at the result. A "git log" will show you that the latest commit was pushed successfully. However, "git status" shows that you have changes in your index. How did this happen? It was clean before the push. Well, run a "git diff --staged" to see what it says has changed. You should see something like this:

diff --git a/foo b/foo
index 5a347e2..90c3f45 100644
--- a/foo
+++ b/foo
@@ -2,4 +2,3 @@ foo
 bar
 baz
 clone
-pushme

It's saying that in project "clone", you've removed the line that you just added in "cloneme". Why? Because "git push" does not make any changes to the working tree or index of a remote repository, lest work be lost. Particularly when you push to a remote that's not in your control, you have no way of knowing whether somebody else is making changes to the working tree or index at the same time, and you can imagine the havoc if "git push" were to mess with those changes. So while the new commit was added to the repo, the working tree hasn't been touched, and is in the same state as it was when the HEAD^ commit was the latest. Therefore "git diff --staged" shows exactly that: the output you would expect from running "git diff HEAD HEAD^" in either of the repositories.

To correct this, since you know that no work will be lost, simply run:

git reset --hard

Now your working tree and index properly reflect the tip of the branch, where you want them to be.

Another Restriction on Push

There's one more caveat about "git push": by default, it will only succeed if you can fast-forward the remote branch(es) you're pushing to. Put another way, if you're pushing from "cloneme" master to "clone" master, then the set of commits in "cloneme" must be a superset of the ones in "clone", or the push can't succeed. Again, it's a question of overwriting someone else's work. The most likely way for this to happen is if you're trying to push changes to a remote branch that you previously pulled from, but someone else has added new commits to it in the meantime. The solution in that case is to do another "git pull" to get the latest changes, and then you'll be able to push because you'll have the required superset of commits.
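
The "superset of commits" condition can be checked before pushing. The sketch below does so with "git merge-base --is-ancestor", which exits successfully when its first argument is an ancestor of its second. To sidestep the working tree issues described above, it pushes to a bare repository; the names hub.git, first, and second are placeholders, and the fetch-then-merge step stands in for the "git pull" mentioned in the text.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare hub.git
git --git-dir=hub.git symbolic-ref HEAD refs/heads/master  # pin the default branch name

git init -q first
cd first
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "base"
git remote add origin "file://$work/hub.git"
git push -q origin master

cd "$work"
git clone -q "file://$work/hub.git" second
cd second
git config user.name "Example User"
git config user.email "user@example.com"
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q -m "work in second"
git push -q origin master                # someone else got there first

cd "$work/first"
echo "mine" > mine.txt
git add mine.txt
git commit -q -m "work in first"
git fetch -q origin                      # learn about the other commit

# A push fast-forwards only if the remote tip is an ancestor of ours.
before="rejected"
if git merge-base --is-ancestor origin/master master; then before="fast-forward"; fi

git merge -q -m "merge origin/master" origin/master   # the merge half of a pull

after="rejected"
if git merge-base --is-ancestor origin/master master; then after="fast-forward"; fi

git push -q origin master                # now succeeds without forcing
echo "$before -> $after"                 # prints: rejected -> fast-forward
```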

Of course, you can force Git to do a non-fast-forward push. Just make sure you understand that this will destroy work that's been done! Let's look at an example. In project "clone", make a new commit:

echo loseme >> foo
git commit -am "this commit will be lost by a bad push"

Now go back to "cloneme" and run:

echo destroyer >> foo
git commit -am "this commit will cause the loss of a commit in clone"

First, try a typical push:

git push theclone

It will result in an error like:

 ! [rejected]        master -> master (non-fast forward)
error: failed to push some refs to 'file:///cygdrive/c/dev/projects/clone'

Now force it to do the push with:

git push theclone +master

The '+' indicates that Git should force the push. Go over to "clone" now, and a "git log" will show you that the last commit we made there has disappeared. Because we're pushing to a non-bare repository, the index will still have the lost change in it, but another "git reset --hard" will bring it up to date with the repo.

Finally!

And that, as they say, is that. Journey complete. If you've read and followed along with all of my Git posts, you may be an incurable geek, and you certainly should know enough to be dangerous with Git and to start seeing how great it is in comparison with a centralized VCS. Aside from a quick command reference, which is almost finished, this is all I plan to post about Git for the time being (finally!!! woohoo!!!). If you have any questions, feel free to drop me a comment, and I'll answer it to the best of my ability.

Late addition: I've published a Git reference card on Scribd that should be good for reminding you of the commands you need to use without having to dig back through these posts.

Friday, August 21, 2009

Book Review: xUnit Test Patterns + Code Hangover

This is not a book review.

This is a book review.

Over the past few weeks, I read another book: xUnit Test Patterns. I posted the review on a different blog: codehangover.com. It's a new blog that I'm coauthoring with some former coworkers of mine. I haven't decided exactly what the division of labor between this and that blog will be, but I intend to put the more formal posts, like book reviews and my Git series (I'll finish it soon!), over there.

Some of the other authors on codehangover.com had their own technical blogs, and we decided to combine efforts to hopefully make a more useful blog and, honestly, one that will draw more traffic and maybe earn us all a bit more from affiliate sales ;)

Friday, August 7, 2009

The Journey to Git, Part IX--Communicating from Git to Subversion

In this second part of the Git/Subversion interaction guide, we'll explore the commands that let you do the equivalent of "svn update" and "svn commit". You need to already have a Git repository that's linked to a Subversion repository. The previous post in this series will help you with that if you need it.

Getting a Git copy of a Subversion repository and making local commits/branches/whatever in it is great. How do you get further updates of commits that others have made to Subversion? What do you do when you're ready to send your changes back to Subversion? First, decide which branch it is you want to send or receive commits for, and make sure it's checked out. Both the commands I'm going to discuss work within your current branch.

Before committing anything to Subversion, it's never a bad idea to update first and see if there have been any changes, so let's look at that command first.

Updating from Subversion with Git

The Rebase

It may not seem apparent at first, but when we pull changes from SVN to Git, what we really want is a rebase. Why? To maintain a perfectly linear history for the sake of Subversion. We've seen the git rebase command before. Recall that it's the one that lets you freely squash, edit, delete, and move commits around in a branch. To pull the latest changes from Subversion into your current branch--every Git branch can be traced back to a Subversion branch from which it originated, and that's the one it pulls commits from--you first need a clean index and working tree, and then run:

git svn rebase

Note: See the note at the beginning of my previous post to learn how "git svn rebase" is currently broken in Cygwin and to find a workaround.

Your index and working tree have to be clean because of what the rebase does, which I'll get to in a minute. First, let's examine in more depth why we have to use a rebase. When you're working out of a Subversion repository with Git, you'll always be building on top of Subversion commits. Say you're on a branch where there are two Subversion commits: A and B. Then you make commit C in Git locally. Meanwhile, someone else has made a new commit to Subversion--call it X. When you go to pull that change from Subversion down to your local Git repo, where should it go? Your first inclination might be that your local history should become A -> B -> C -> X, but this is 100% wrong. Subversion already has A -> B -> X, and you're not easily going to convince it to put C in front of X. Git, on the other hand, has no problem at all sticking the X before the C. That's exactly what "git rebase" is for. Therefore, when you pull commit X from Subversion, you want to end up with A -> B -> X -> C. That is, you want all the Subversion commits to come first, in order, with your local commits stacked on top of them.
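The A -> B -> X -> C ordering can be tried out without a Subversion server. This is only a sketch, and it assumes a plain Git repo named "upstream" (a made-up name) standing in for Subversion, with an ordinary "git rebase" standing in for "git svn rebase". The commit helper and the file names are hypothetical too.

```shell
# Throwaway identity so the sketch does not touch real configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

# Hypothetical helper: one new file per commit, so nothing conflicts.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

git init -q upstream                        # plays the role of Subversion
(cd upstream && git symbolic-ref HEAD refs/heads/master && commit A && commit B)
git clone -q upstream local
(cd local && commit C)                      # your local-only commit
(cd upstream && commit X)                   # someone else's commit upstream

cd local
git fetch -q origin                         # learn about X
git rebase -q origin/master                 # set C aside, take X, replay C on top
order=$(git log --reverse --format=%s | xargs)
echo "$order"
```

The last line prints "A B X C": the upstream commits come first, and the local commit is replayed after them.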

Now, a "git svn rebase" behaves much like a normal "git rebase". First, it moves all of your local commits--the ones that Subversion doesn't know about yet--out of the way, effectively taking them out of the branch and making HEAD point at the latest SVN commit that's in your Git repo. Then it pulls down all of the SVN commits that aren't represented locally and applies them one at a time on the HEAD of the current branch. This part should all go smoothly because it's basically just copying history from Subversion into Git. After that's done, the rebase puts your commits back on to the HEAD of the current branch, again one at a time.

When Conflict Occurs

When your commits are being reapplied, it's quite possible that you'll experience conflicts with the new commits you got from Subversion, just as you would with "svn update", and this is the main reason that your working tree and index must be clean before you start any kind of rebase. In the event of a conflict, you use your working tree and index to resolve it.

It's very important that you pay attention to Git's messages during any kind of rebase because if there is a conflict, it becomes an interactive process. The rebase will stop, and you'll be looking at a dirty working tree with unmerged files and possibly some staged changes in the index. All the changes you see are the content of the commit Git was trying to apply when the conflict happened. It's waiting for you to resolve the conflict somehow and then tell it to continue the rebase with:

git rebase --continue

Conflict resolution works exactly as I described in my post on merging. Note that we continue the rebase with the "git rebase" command and not "git svn rebase". Once the rebase is kicked off, it acts just like any other rebase you'd perform in Git, and as with any other rebase, resolving the conflicts and continuing is only one of your three options. You can also skip the current commit with:

git rebase --skip

You'd use this if, for instance, the current commit is no longer applicable because of an upstream commit that you've received. The commit is effectively deleted, and it won't be in the branch when the rebase is complete. The third option is to abort the rebase entirely with:

git rebase --abort

You can always abort up until the rebase is completely finished. An abort takes you back to the state you had before you started the rebase. If you really get yourself in a bind, or if you decide you just don't have the knowledge to resolve the conflicts effectively, you can always abort and come back to it later.
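To get a feel for a stopped rebase before it happens on real work, here's a small sketch that provokes a conflict on purpose and then backs out. It assumes a throwaway repo with a made-up branch named "topic" and uses a plain "git rebase", which behaves the same way once a "git svn rebase" is under way.

```shell
# Throwaway identity and repo so the sketch does not touch real configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

echo base > foo && git add foo && git commit -q -m "base"
git checkout -q -b topic
echo mine > foo && git commit -q -am "my change"
git checkout -q master
echo theirs > foo && git commit -q -am "their change"
git checkout -q topic
before=$(git rev-parse HEAD)

# Both sides changed the same line, so the rebase stops partway through.
git rebase master >/dev/null 2>&1 || echo "rebase stopped on a conflict"
unmerged=$(git diff --name-only --diff-filter=U)   # files waiting to be resolved
echo "unmerged: $unmerged"

# Backing out puts the branch back exactly where it started.
git rebase --abort
after=$(git rev-parse HEAD)
```

After the abort, the branch tip is the same commit it was before the rebase started, so nothing is lost by trying.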

The really important thing about rebasing is that when it stops in the middle, you must see it through to the end one way or another. Don't go off to work on something else until the rebase is complete, or you're really going to confuse yourself.

Now that we've got all the incoming changes, let's see how to send changes back to Subversion.

Committing to Subversion with Git

Compared to what we've seen so far about Git-SVN interaction, sending your local commits back to SVN is a breeze. Just make sure you have a clean working tree and index, then run:

git svn dcommit

There's not much that can go wrong with this command. The working tree and index have to be clean because a dcommit ends with a rebase or a reset, and we've already seen why you have to clean those up before a rebase. I'm not certain exactly what the rebase/reset does, but I think it has to do with putting the "SVN version" of the commit in your branch in place of your local commit.

When the dcommit is complete, there will be a commit in Subversion matching each of the local commits you had in Git, and the commits in Git will all reflect a git-svn-id, indicating that they're recorded in Subversion.

That's it. Two commands are all you need to swap commits with a Subversion repository. Next up, I'll talk about interacting with remote Git repositories. You'll find that the basics are quite similar to the Subversion interaction, but it's a lot more powerful.

The Journey to Git, Part VIII--Connecting Git to Subversion

In this post, we'll start seeing how to use Git as a client to a Subversion repository. This is an excellent way to get your feet wet with Git without forcing the learning curve on others working on the same project. It might also be a useful intermediate step in moving from SVN to Git by getting all the members of a team accustomed to Git while still having their old SVN client as backup in case they get lost. As has happened previously, when I got to the end of what I thought was one post, I decided it was way too long, so I'm breaking it up into two pieces. This piece discusses cloning an existing Subversion repo and what you'll have after you do that. The next one explains the commands you use to trade commits with Subversion: the equivalents of "svn commit" and "svn update".

Before you dig in here, you should be able to use basic Git commands like commit, checkout, and branch. If you're not comfortable with that, have a look at my earlier posts.

Setting Up for Subversion Interaction

All Subversion interaction is done through a set of special sub-commands that start with "git svn". If you're running Git through Cygwin, there are two things to note. (Non-Cygwin-users can skip to the next paragraph.) First, you need to install the "subversion-perl" package in Cygwin to be able to use the "git svn" set of commands. Second, a change introduced to Cygwin a few months ago slightly broke "git svn" under Cygwin. See this message for a description of the problem and a workaround for it. When I refer to the "git svn rebase" command in the next post, you'll need to use the mentioned workaround in its place. It may also affect the "git svn clone" command, but I've neither checked it for myself nor seen any reports on it.

Only Cygwin Git users need to do anything special. On Linux and in msysgit, everything is already in place. I expect the primary way that Java developers will start using Git is by cloning an existing SVN repo, and that's what I'm going to go through here.

Cloning an Existing Subversion Repo

To use Git to work against an existing SVN repository, your first step is to clone it. Remember that Git is a Distributed VCS, meaning you have your own copy of the entire repository. Cloning SVN is one way to get one.

Note: This can take several hours on a large project with a moderate number of branches because of the way SVN stores branches and the sheer number of files that have to be transferred!

If you're ready to start the clone, get the URL of your SVN repo, and switch to the directory in which you want your project directory to live. The "git svn clone" command creates a subdirectory and checks out the project in it. For a Subversion repo using the standard directory layout--that is, directories named trunk, branches, and tags--run:

git svn clone --stdlayout --username=<your username> <svn url> foo

If your repository structure differs from the standard layout, use this form instead:

git svn clone --trunk=<trunk dir> --tags=<tags dir> --branches=<branches dir> --username=<your username> <svn url> foo

Username is only required if using authentication, obviously, and "foo" is the name of the directory to create to hold the project. If you don't provide this, then the last bit of the URL--after the final '/'--will be used as the directory name.

Working with a Git Clone of a Subversion Repository

After a successful clone, the target directory will have a typical Git repository and working tree in it. Your standard master branch will be there, and master's HEAD is what's checked out. In my experience, the content of master will be the Subversion branch with the most recent commit on it, but I haven't seen this behavior documented anywhere. You can see all of your Subversion branches and tags by running:

git branch -r

Note: I haven't covered remote Git interaction yet, so this may stray a bit into unfamiliar territory. Just understand that master is your local branch, where you do your work. If you were to "git commit" something, this is where that commit would go. All the Subversion branches you just saw are called "remote-tracking branches". For the most part, you can pretend they're not there. They just act kind of like a mirror of the Subversion repo so that you always have a copy of it around. The usefulness of this will become apparent later. Finally, master "tracks" one of the remote Subversion branches, meaning initially it contains exactly the same commits as that branch, and when you commit to or update from Subversion, that's the branch you'll be interacting with.

So now you have a Git copy of your SVN repo. What next? Well, now you can develop away using Git just like you always would: make commits, branch, merge, rebase, etc. There's just one caveat: don't fool with the history that came from Subversion. Don't try to rebase and change SVN commits around, for example. Just treat the commits from SVN as read-only. Immutable. Untouchable. Get the idea? If you screw with SVN's tiny brain in that way, don't come back to me unless it's only to describe how your SVN or Git repo melted down! I'd be interested to hear about that. You'll know the SVN commits because when you "git log", you'll see a special "git-svn-id" line in each SVN commit. Other than that, it's open season for making changes.

Handling Different Subversion Branches from Git

Of course, there's just the one local branch--master--and it's tracking just the one Subversion branch. That means any changes you make on master will always be sent back to that same Subversion branch. How do we send changes to a different branch? Just create a new local branch that tracks the remote-tracking branch you want to interact with. To create and switch to it with one command, run:

git checkout -b <new branch name> --track <remote branch name>

Now all commits made in your Git branch <new branch name> will eventually be sent to the Subversion branch <remote branch name>, and when you update from Subversion on that local branch, you'll get changes from that remote branch. Remember that you can use "git branch -r" to see all the remote branches that Git knows about. If you don't see the branch you want, then you'll need to use this command to refresh your remote-tracking branches with the latest Subversion changes, including new branches:

git svn fetch

Note: If there's a new branch to fetch, it can take a while, though not as long as the initial clone.
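The tracking relationship can be explored without Subversion. The sketch below is an analogy only: it assumes a plain Git repo named "upstream" with a branch named "feature" (both made-up names) in place of the Subversion repo, and a local branch named "work". It shows the same "git checkout -b ... --track ..." form and then asks Git which remote branch the new local branch follows.

```shell
# Throwaway identity so the sketch does not touch real configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"

git init -q upstream                        # stand-in for the Subversion repo
(cd upstream && git symbolic-ref HEAD refs/heads/master &&
  git commit -q --allow-empty -m "start" && git branch feature)
git clone -q upstream local && cd local

git branch -r                               # the remote-tracking branches
git checkout -q -b work --track origin/feature
upstream=$(git rev-parse --abbrev-ref 'work@{upstream}')
echo "work tracks $upstream"
```

In a git-svn clone the remote branch names look different, but the idea is the same: the local branch remembers which remote branch it trades commits with.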

The one thing to keep in mind when branching in Git is to make sure to keep the history linear from Subversion's perspective. Don't branch from master and try to commit both the branch and master back to Subversion. Either merge them together, or commit one back, update the other to get it current, and then commit it, too. Again, I'm only interested in hearing about the details of the meltdown.

Those are the high points of cloning a Subversion repo and working in the clone. The next step is to be able to send commits back to Subversion and update the clone that you've made when someone else commits.

Monday, July 27, 2009

The Journey to Git, Part VII--Other Useful Stuff

My previous Git posts were mostly a walkthrough of the basic workflow to get you up and running with Git fast. This post is less that and more a quick survey of other commands that are regularly used and/or useful. Previous posts aren't a prerequisite for this, but you need to at least have a repository with a few commits and branches in it to be able to run the commands and see what they do.

See What Changed

One of the most frequent commands is one ubiquitous to version control:

git diff

This command, by default, simply shows you what is different in your working tree from your index. In other words, it shows you what you've changed since the last commit but haven't staged yet. To see changes you've staged for commit, use:

git diff --staged

Of course, you can also use it to view the changes between any two arbitrary commits and/or branches:

git diff <commit|branch> <other commit|branch>

Note: Unless you want to see history in reverse, you always put the older commit first and the newer commit second.

And finally, you can see just the changes to a particular file or set of files by listing their names after the command and any options:

git diff file1 file2 ...
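Here's a short sketch of how the same change moves between the two main forms of the command. It assumes a throwaway repo and a file named "foo"; the variable names are just for illustration.

```shell
# Throwaway identity and repo so the sketch does not touch real configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo

echo one > foo && git add foo && git commit -q -m "first"
echo two >> foo                                  # an unstaged change

unstaged=$(git diff --name-only)                 # working tree vs. index
git add foo                                      # stage the change
unstaged_after_add=$(git diff --name-only)       # nothing left to report here
staged=$(git diff --staged --name-only)          # index vs. last commit

echo "before add: '$unstaged'  after add: '$unstaged_after_add'  staged: '$staged'"
```

Staging the file empties the plain "git diff" output and makes the change show up under "git diff --staged" instead.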

When you're using commands like this that refer to commits, it quickly gets old to look up their hashes, even when you can just copy/paste them. Fortunately, Git provides a concise vocabulary for specifying commits without using hashes. First, "HEAD" always refers to the "tip", or latest commit, of the current branch. You can also typically use the branch name to refer to the same commit, so we have at least one commit in each branch we can always refer to without knowing its hash. After that, when you know the name of any commit, you can use a caret to say "previous", so "HEAD^" means the commit before the latest commit on the current branch. Likewise, "master^" would refer to the commit before the latest commit on branch master. Carets stack, and each additional one signifies one more commit backward: "HEAD^^^^" is four commits before the latest commit on the current branch. This can also be expressed with "HEAD~4". Just use the tilde and a number to go back a specified number of commits. This is just the proverbial tip of the iceberg on specifying commits, but it's likely all you'll need for a great majority of what you'll do.
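A quick way to convince yourself that carets and tildes agree is to ask Git to resolve both spellings to a hash. This sketch assumes a throwaway repo with five empty commits whose messages are made up for the example.

```shell
# Throwaway identity and repo so the sketch does not touch real configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

for n in 1 2 3 4 5; do git commit -q --allow-empty -m "commit $n"; done

carets=$(git rev-parse 'HEAD^^^^')               # four steps back, one caret each
tilde=$(git rev-parse 'HEAD~4')                  # the same thing, counted
by_branch=$(git rev-parse 'master~4')            # branch name instead of HEAD
subject=$(git log -1 --format=%s 'HEAD~4')
echo "$subject"
```

All three spellings resolve to the same hash, and the commit they name is "commit 1", four steps back from "commit 5". The quotes keep the shell from interpreting the carets.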

Given this new way of specifying commits, a command I use quite regularly is:

git diff HEAD^ HEAD

That is: show me what I did in the last commit. One final note: there are many diff viewing GUIs out there, but I'm not going to go into that much right now. If you've made it this far, you can probably manage setting one of them up on your own. I'll just point you at:

git help config

Search the output for "diff.external", and go from there. If you need more help, drop me a comment, and I'll see what I can do.

See History

The Command Line Way

Another often-used command that's common to VCSs is the log command:

git log

We've used this command in previous posts, but I'm going to add a few variations and a bit of detail to your toolbox here.

You might be accustomed to a log command that shows all the commits made on the current branch. Git does things slightly differently. The "git log" command shows all commits contained within the current branch. It's a subtle difference. When you merge a branch into another, not only does the merge commit show up in the destination branch, the commits from the branch that was merged appear as well. That's because all those commits are part of the state of that branch now. This can be slightly confusing to look at sometimes, but there's a handy option that helps you sort out where each commit came from when you need to:

git log --graph

The --graph option represents branches as lines to the left of the commits being shown. Each commit will have an asterisk next to it in one of the lines indicating which branch the commit was actually made on.

Note: The --graph option also changes the ordering scheme of the commits, potentially causing them to not appear in chronological order. I suppose this is meant to make the graph easier to read, but I find it distracting. Use the --date-order option to put them back in chronological order.
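To see merged commits turning up in the destination branch's log, here's a minimal sketch. It assumes a throwaway repo and a made-up branch named "topic", with empty commits so there's nothing to conflict.

```shell
# Throwaway identity and repo so the sketch does not touch real configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m "start"
git checkout -q -b topic
git commit -q --allow-empty -m "work on topic"
git checkout -q master
git commit -q --allow-empty -m "work on master"
git merge -q --no-ff -m "merge topic" topic

subjects=$(git log --format=%s)                  # everything contained in master
git log --graph --date-order --format=%s         # same commits, drawn as a graph
```

The log on master lists four commits, including "work on topic", even though that commit was made on the other branch.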

Another useful option lets you search commit messages and show only commits that match the search pattern:

git log --grep="some text"

Finally, sometimes it's handy to just see commits from certain times:

git log --since=yesterday

git log --until="15 Jul"

Note that none of these options are mutually exclusive. This is perfectly valid:

git log --graph --since="last week" --until="two days ago"
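Here's a sketch of the filters working separately and together. It assumes a throwaway repo with three made-up commit messages, one of which is backdated to 2001 by setting the date environment variables, so that the time filter has something to exclude.

```shell
# Throwaway identity and repo so the sketch does not touch real configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo

# Backdate the first commit; the other two get the current time.
GIT_AUTHOR_DATE="2001-01-01T12:00:00" GIT_COMMITTER_DATE="2001-01-01T12:00:00" \
  git commit -q --allow-empty -m "fix parser bug"
git commit -q --allow-empty -m "add parser tests"
git commit -q --allow-empty -m "update docs"

grep_count=$(git log --format=%s --grep="parser" | wc -l | tr -d ' ')
recent_count=$(git log --format=%s --since="last week" | wc -l | tr -d ' ')
both=$(git log --format=%s --grep="parser" --since="last week")
echo "matching text: $grep_count, recent: $recent_count, both: $both"
```

Two commits mention "parser" and two are recent, but only "add parser tests" satisfies both filters at once.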

Note: I don't know what handles the date parsing in Git. I've seen it in the docs somewhere, but I can't figure out where. Whatever it is, it's very versatile.

The Fancy Way

Well, "git log" is great and all that, but what about when you really need to get down in the dirt and pick through the complete history of files? That's what gitk is for. It's similar to "git log" but packs much more detail into a screen. Gitk is one of the GUIs that comes with Git. In Cygwin and msysgit, it's installed along with the Git package, but on Linux, it's a separate package named--wait for it--"gitk". In any of them, the name of the command is also "gitk".

Run "gitk", and take a look around the interface. The top part of the screen is your list of commits, just like in "git log". Click a commit to select it. The bottom part is a diff showing what was changed in the currently-selected commit. In between the two is a control panel that, among other things, shows the SHA-1 of the selected commit, lets you search the selected commit for text, and lets you run rather powerful searches of all the commits appearing in the top pane.

In addition to all of that, "gitk" accepts many options that "git log" does, including some that lend themselves extremely well to the graphical representation. For one, you can see all commits from all branches with:

gitk --all

Another great feature is the ability to view any uncommon history between two branches you're thinking about merging:

gitk branch1...branch2

Run that way, it will show all commits from the latest commit on each of the branches back to the nearest common commit between the two: i.e. all the commits you're about to merge together.
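Since gitk needs a display, the same three-dot range can be tried with "git log", which accepts it too. This sketch assumes a throwaway repo with the two branches literally named "branch1" and "branch2" and made-up commit messages.

```shell
# Throwaway identity and repo so the sketch does not touch real configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m "common"
git checkout -q -b branch1
git commit -q --allow-empty -m "only on branch1"
git checkout -q master
git checkout -q -b branch2
git commit -q --allow-empty -m "only on branch2"

# Three dots: commits reachable from either branch but not from both.
uncommon=$(git log --format=%s branch1...branch2 | sort | xargs)
git log --left-right --format='%m %s' branch1...branch2   # < and > mark the side
echo "$uncommon"
```

The shared "common" commit is left out, and only the two commits that differ between the branches are listed.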

Note: Gitk just runs off of the output from "git log", and it uses the --graph option when it does so. This means the commit ordering isn't necessarily chronological, like I explained above. Use "gitk --date-order" to get them back in order by date.

Hopping Around History

In previous posts, I introduced you to "git checkout" as a way to drop your changes to a file by getting the latest version of the file from the repository:

git checkout <path to file>

and as a way to move to a different branch:

git checkout <branch name>

In fact, "git checkout" can move you to any commit:

git checkout <commit>

This command pulls the state associated with the specified commit from the repository and makes it your working tree.

Note: Although you can use checkout to undo your changes to a file by getting that file from the repo, checking out a commit or branch is different. It isn't allowed to overwrite anything, and it does not perform a merge, so if something is in the way of what you're trying to check out, like a change in your working tree that would be lost by the checkout, Git will refuse to do it. There are two ways out of this situation: stash your changes and do the checkout, or use the -f option to "git checkout" to force the checkout, overwriting any changes.

The message from checking out an arbitrary commit brings up an interesting point: after you do it, you're no longer on a branch. You're on what Git calls a "detached HEAD". Interestingly, though, you can still do just about anything, including commit things. Since you're not on a branch, the commits naturally don't get applied to any branch, but they do happen. They're more or less in limbo, though, and you'll never see them again once you move back to a branch unless you do something to put them into a branch.

Since you can commit outside of a branch, it's rather important that you always ensure you're on a branch when you're working. Both "git status" and "git branch" show what branch you're on, if any. To move back onto a branch when you're not on one, just check out a branch again.
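Here's a sketch of ending up off a branch, committing there, and then giving the stray commit a home. It assumes a throwaway repo; the branch name "rescued" is made up, and creating a branch at the commit's hash is just one way of putting it into a branch.

```shell
# Throwaway identity and repo so the sketch does not touch real configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

git checkout -q 'HEAD^'                          # a commit, not a branch
# symbolic-ref fails when HEAD does not point at a branch.
if git symbolic-ref -q HEAD >/dev/null; then state=branch; else state=detached; fi

git commit -q --allow-empty -m "made in limbo"   # allowed, but on no branch
stray=$(git rev-parse HEAD)

git checkout -q master                           # back on a branch
git branch rescued "$stray"                      # hypothetical name for the rescue
rescued_subject=$(git log -1 --format=%s rescued)
echo "state was $state, rescued: $rescued_subject"
```

The key step is noting the commit's hash before moving back to a branch, because nothing else points at it.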

Looking Without Leaping

If all you want is to see what a file looked like at some time in the past, there's a much quicker way to do that than checking out the whole commit:

git show <commit>:<path to file>
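A small sketch of that, assuming a throwaway repo and a file named "foo" with two made-up versions:

```shell
# Throwaway identity and repo so the sketch does not touch real configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d) && cd "$work"
git init -q repo && cd repo

echo "old text" > foo && git add foo && git commit -q -m "first version"
echo "new text" > foo && git commit -q -am "second version"

past=$(git show 'HEAD^:foo')                     # the file as of the previous commit
now=$(cat foo)                                   # the working tree is left alone
echo "then: $past / now: $now"
```

This prints "then: old text / now: new text". Nothing gets checked out, so the working tree and current branch stay as they were.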

Cool. Let's end on a short section. Stay tuned for still more Git goodness as I further explain how to interact with other Git repositories and with Subversion--a really cool feature!

Next stop, Subversion interaction.