This is a guest blog post by Charles O’Farrell, a developer at Atlassian, that will focus on the reasons a team may choose Git as their DVCS of choice. Charles is focused on coding in any DVCS and has spent some time switching users over from ClearCase to Git.

In our previous blog we explored why teams may choose Mercurial as their distributed version control system of choice. Now let’s explore why Git is a strong option as your distributed version control system (DVCS).

Since the dawn of time (1970), geeks have fought a long and bloody war between right and wrong; good and evil; Vim and Emacs. In recent years another set of tools has called upon us geeks to fight once again for our right to spend hours arguing on blogs instead of doing actual work. I speak, of course, of the bitter conflict between Git and Mercurial.

This article takes the *cough*winning*cough* side of Git and looks at some of the compelling reasons why it may have risen to dominance in this epic struggle.

Caveats *yawn*

Firstly, let me be upfront and admit that I would be the last person to claim Git is perfect. Far from it. I have spent far too many hours of my life trying to explain why Git does something completely unexpected. In particular, I always get nervous and start acting shifty when I have to explain the different ‘modes’ of the checkout command. And while msysgit is indeed an amazing release of Git for Windows, after all these years, it still feels like a second-class citizen.

With that said, I originally started my DVCS life with Mercurial, but later switched to Git and never looked back.

Why is that?

Storage Format

For me, the single most distinguishing part of Git is the repository format. Many of the parts that I love about Git stem from the way that it stores and thinks about content.

On the one hand, Mercurial has bet all its chips on append-only logs, optimising (quite reasonably) for disk seeks on a slow, spinning platter. On the other hand, Git stores every commit/file in a simple hashed document repository. Every commit you make, every version of every file, will end up in this repository as a separate entity. Before they introduced the pack file in the very early days, this process was terribly inefficient. But the idea was a sound one, and it is still used today. What is important to note is that the identity of each object is a hash of the contents, which means everything is immutable. To change something as simple as a commit message, you must create a new commit object first. This leads to…

Safer History with Git

No, really!

It always really irks me when people claim Git is “destructive”. On the contrary – I would claim Git is actually the safest of all the DVCS options. As we saw above, Git never actually lets you change anything, just create new objects. What happened to the old version then? Git, Y U no keep my change?!?

Git actually keeps track of every change you make, storing them in the reflog. Because every commit is unique and immutable, all the reflog has to do is store a reference to them. By default, Git expires reflog entries for commits that are no longer reachable from a branch after thirty days (ninety for everything else), at which point they can finally be garbage collected. You see, Git won’t remove anything that still has a reference to it. Branches are obviously the most useful way to keep references to commits, but the reflog is another and you don’t even have to think about it!
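To make the "nothing is changed, only added" idea concrete, here is a minimal sketch in Python. The `object_id` function follows the way Git derives IDs (a SHA-1 over a short header plus the content), but `ToyRepo` and everything in it is a hypothetical, heavily simplified model: real commits also record trees, parents and authors, and real garbage collection is far more careful.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash a small header plus the body, the way Git derives object IDs."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


class ToyRepo:
    """Hypothetical toy model; not Git's real data structures."""

    def __init__(self):
        self.objects = {}   # object ID -> (kind, body); entries are never edited
        self.branches = {}  # branch name -> commit ID
        self.reflog = []    # every commit ID a branch has pointed at, oldest first

    def commit(self, branch: str, message: str) -> str:
        body = message.encode()
        oid = object_id("commit", body)
        self.objects[oid] = ("commit", body)
        self.branches[branch] = oid
        self.reflog.append(oid)
        return oid

    def gc(self) -> None:
        """Drop objects that neither a branch nor the reflog refers to."""
        keep = set(self.branches.values()) | set(self.reflog)
        self.objects = {oid: obj for oid, obj in self.objects.items() if oid in keep}


repo = ToyRepo()
old = repo.commit("master", "Fix tpyo")
new = repo.commit("master", "Fix typo")  # "amending" really creates a new object

repo.gc()
print(old in repo.objects)   # True: the reflog still refers to the old commit

repo.reflog.clear()          # stand-in for the reflog entries expiring
repo.gc()
print(old in repo.objects, new in repo.objects)  # False True
```

In this model, pointing another branch at `old` before the reflog is cleared would keep it alive indefinitely.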

There is a corresponding command, reflog, which lets you inspect this history of changes just like you would your normal commits with the ‘git log’ command. Don’t leave home without it.

> git reflog
5adb986 HEAD@{0}: rebase: Use JSONObject instead of strings
6a34803 HEAD@{1}: checkout: moving from finagle to 6a3480325f3beeecbafd351d30877694963a3f01^0
74bd03e HEAD@{2}: commit: Use JSONObject instead of strings
36c9142 HEAD@{3}: checkout: moving from 36c9142e81482f6c3eb8ad110642206a4ea3dfec to finagle
36c9142 HEAD@{4}: commit: Finagle and basic folder/json
1090fb7 HEAD@{5}: commit: Ignore Eclipse files
d6e3e63 HEAD@{6}: checkout: moving from master to d6e3e63889fd98e89e12e53a79bf96b53cbf9396^0

Rewriting History

What I never liked about Mercurial is that it makes it very difficult to retroactively tweak commits. “Why would I want to do that?”, you may ask. If a pull request affects many files or involves significant refactoring, it’s much easier to review if the commits tell a comprehensible story. With Git, it’s easy to “go back in time” to edit earlier commits if necessary. As a result, commit logs in Git can be carefully crafted stories, rather than faithful (but messy) recordings of the order in which the changes were actually made.

There is an extension for Mercurial called Mercurial Queues (MQ) that does basically the same thing. Mercurial Queues are a way to stack up pre-commits so that you can re-order them before finally deciding to make an actual commit. MQ comes with a whole bunch of separate commands (which aren’t in SVN!).

hg qnew firstpatch
hg qrefresh
hg qdiff
hg qnew secondpatch
hg qrefresh
hg qcommit

In Git, just commit as normal and worry about what to do with those commits later. When later does eventually come around, there is really only one thing you need to know: interactive rebase. This command launches a text editor and lets you modify the history of Git to your heart’s content.

> git rebase --interactive origin/master
pick   94f56db Debug an error raised when importing view
squash 772e7e8 Re-join comments using DELIM
reword a04f10e Error on filter branch - print line
pick   e09b0a2 Added troubleshooting for msysgit + Cygwin
fixup  276c49a Added troubleshooting for missing master_cc branch
pick   a2c08f6 Added exclude configuration
pick   4c09e5e Ignore errors from _really_ long file paths
pick   9f38cf0 Actually, use fnmatch for exclude

# Rebase f698827..9f38cf0 onto f698827
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
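As a rough illustration of what that to-do list means, the Python sketch below works out which commit messages would survive it. The `apply_todo` function is hypothetical and only models messages: real Git also replays each commit's diff, opens an editor for `reword` and `squash`, and can stop on conflicts.

```python
def apply_todo(todo):
    """Return the commit messages left after following a rebase to-do list.

    Only pick, reword, squash and fixup are modelled, and only their effect
    on messages. A squash or fixup on the first line is not handled.
    """
    result = []
    for line in todo:
        action, _commit, message = line.split(maxsplit=2)
        if action in ("pick", "reword"):
            # reword would open an editor; the message is left unchanged here
            result.append(message)
        elif action == "squash":
            result[-1] += "\n\n" + message  # meld, keeping both messages
        elif action == "fixup":
            pass                             # meld, discarding this message
        else:
            raise ValueError(f"unsupported action: {action}")
    return result


todo = [
    "pick   94f56db Debug an error raised when importing view",
    "squash 772e7e8 Re-join comments using DELIM",
    "reword a04f10e Error on filter branch - print line",
    "pick   e09b0a2 Added troubleshooting for msysgit + Cygwin",
    "fixup  276c49a Added troubleshooting for missing master_cc branch",
    "pick   a2c08f6 Added exclude configuration",
    "pick   4c09e5e Ignore errors from _really_ long file paths",
    "pick   9f38cf0 Actually, use fnmatch for exclude",
]

messages = apply_todo(todo)
print(len(messages))  # 6: eight commits in, six out
```

The squashed and fixed-up commits do not vanish from the repository; they simply stop being part of the branch's history.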

Mercurial has the roughly equivalent histedit extension, but this uses strip to update the normally append-only repository, spitting out an external backup file. How would I query the changes from this backup, I wonder? How long should I keep them lying around? What new command do I need to run to restore it?

Back to Git. I’m worried about losing a commit after the 30-day reflog window closes. If only there were some way to stop Git from garbage collecting it after a month. Some way to, you know, apply a label for future reference “just in case”…

Like a branch?

Right! These ‘backups’ in Git are just commits, minus a branch, and until we add one the reflog has our back. You don’t need to learn another set of commands to know what to do with them.

“Make things as simple as possible, but not simpler”.

Branching in Git

For a long time branching in Git was the ‘killer feature’. Mercurial would (and still does) recommend that you clone a repository for each branch. Wait, isn’t this DVCS and not SVN? They also had an actual ‘branch’ command, which would permanently attach a label to a given commit. Once applied, it would be impossible to modify except when you finally merged or closed it. Eventually, due to popular demand, the Bookmark extension was introduced as a direct clone of Git’s branches, although initially you couldn’t push bookmarks to the server.

One advantage Git still has is that bookmarks in Mercurial share a single namespace, whereas Git keeps each remote’s branches in a namespace of their own. To understand what this means let’s take a look at a fairly normal scenario where someone has pushed some changes to the server.

> git fetch
From bitbucket.org:atlassian/helloworld
* [new branch]      test       -> origin/test
565ad9c..9e4b1b8  master     -> origin/master

> git log --graph --oneline --decorate --all
* 9e4b1b8 (origin/master, origin/test) Remove unused variable
| * 565ad9c (HEAD, master) Added Hello example
|/
* 46f0ac9 Initial commit

Would the real master branch please stand up? Of course, there isn’t anything wrong with this. There are two branches that just so happen to have the same name ‘master’. The namespace of the server (origin in this case) is making it clear which is which.
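The namespacing can be sketched in a few lines of Python. Git stores local branches under `refs/heads/` and, by default, copies a remote's branches into `refs/remotes/<remote>/` on fetch; the `fetch` function below is a hypothetical toy that models only that naming, using the commit IDs from the output above.

```python
def fetch(local_refs: dict, remote_heads: dict, remote: str = "origin") -> dict:
    """Copy the remote's branches into refs/remotes/<remote>/.

    Nothing under refs/heads/ is touched, so a local branch and a remote
    branch with the same short name can never collide.
    """
    updated = dict(local_refs)
    for name, commit in remote_heads.items():
        updated[f"refs/remotes/{remote}/{name}"] = commit
    return updated


local = {"refs/heads/master": "565ad9c"}
server = {"master": "9e4b1b8", "test": "9e4b1b8"}

refs = fetch(local, server)
for name in sorted(refs):
    print(name, refs[name])
# refs/heads/master 565ad9c
# refs/remotes/origin/master 9e4b1b8
# refs/remotes/origin/test 9e4b1b8
```

Both ‘master’ entries survive side by side because their full names differ.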

What about Mercurial?

> hg pull
pulling from default
importing bookmark test
divergent bookmark master stored as master@default

> hg glog
o changeset: 2:98c63da09bb1
| bookmark: master@default
| bookmark: test
| summary: Third commit
|
| o changeset: 1:d9989a0da93e
| | bookmark: master
| | summary: Second commit
|/
o changeset: 0:2e92d3b3d020
summary: First commit

When we do the pull you can see we have one branch that clashes with our own master and one ‘test’ that doesn’t. Because there is no notion of namespaces, we have no way of knowing which bookmarks are local and which ones are remote, and depending on what we call them, we might start running into conflicts.

Staging

This is one of the things that people either love or hate about Git. Git has this strange thing which it confusingly calls the “index”. Some people refer to it as a staging area. Whatever.

For anything to be added to a commit in Git, it must first pass through the index. How do you get content into the index? By calling ‘git add’. This makes sense to SVN users for new files, but it can get a little confusing to have to do it for files you’ve already committed. The thing to keep in mind is that you are ‘adding’ your changes, not the files themselves. What I like about this is you know exactly what is going to be committed each time.

To help explain what I mean, a command that I use almost more than any other is patching. Patching lets you add specific hunks/snippets from a file, rather than taking an all-or-nothing approach.

> git add --patch

diff --git a/OddEven.java b/OddEven.java
index 99c0659..911da1b 100644
--- a/OddEven.java
+++ b/OddEven.java
@@ -32,6 +32,7 @@ public class OddEven {
* Object) and initializes it by calling the constructor.  The next line of code calls
* the "showDialog()" method, which brings up a prompt to ask you for a number
*/
+        System.out.println("Debug");
OddEven number = new OddEven();
number.showDialog();
}
Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? n
@@ -49,7 +50,7 @@ public class OddEven {
* After that, this method calls a second method, calculate() that will
* display either "Even" or "Odd."
*/
-            this.input = Integer.parseInt(JOptionPane.showInputDialog("Please Enter A Number"));
+            this.input = Integer.parseInt(JOptionPane.showInputDialog("Please enter a number"));
this.calculate();
} catch (final NumberFormatException e) {
/*
Stage this hunk [y,n,q,a,d,/,K,g,e,?]? y

You can see I had forgotten to remove a debug statement; good thing we checked before committing! What’s nice about this is that, if I so choose, I can leave it there but accept the second ‘hunk’. All of this from within the same file and without having to re-edit anything after I realize my mistake.
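The idea of staging some hunks and not others can be modelled with Python's standard `difflib`. This is only a sketch of the concept, not Git's algorithm: `stage_hunks` is a hypothetical helper, the file contents are abbreviated stand-ins for OddEven.java, and real hunks carry context lines and can be split or edited.

```python
import difflib


def stage_hunks(committed, working, accept):
    """Build the staged version of a file from the accepted hunks only.

    `accept` is called with (old_lines, new_lines) for each changed region
    and returns True if that change should be staged.
    """
    staged = []
    matcher = difflib.SequenceMatcher(a=committed, b=working, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old, new = committed[i1:i2], working[j1:j2]
        if tag == "equal" or not accept(old, new):
            staged.extend(old)  # keep the committed text
        else:
            staged.extend(new)  # take the working-tree text
    return staged


committed = [
    "OddEven number = new OddEven();",
    "number.showDialog();",
    'input = parse(prompt("Please Enter A Number"));',
    "this.calculate();",
]
working = [
    'System.out.println("Debug");',
    "OddEven number = new OddEven();",
    "number.showDialog();",
    'input = parse(prompt("Please enter a number"));',
    "this.calculate();",
]

# Answer "n" to the debug line and "y" to everything else.
staged = stage_hunks(committed, working,
                     lambda old, new: not any("Debug" in line for line in new))
print("\n".join(staged))
```

The staged result contains the corrected prompt but not the debug line, while `working` is left exactly as it was.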

Not surprisingly, Mercurial does have a Record extension that imitates this behavior. But because it is just an extension (or at least a basic one), it has to copy the un-staged changes to a temporary location, update the working storage files, commit and then revert the changes. If you make a mistake, you have to start again. What’s nice about the Git approach is that at a fundamental level Git knows and cares about the index and doesn’t have to touch your files. When you run status after staging your changes you can double-check everything looks correct before proceeding.

> git status

# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   OddEven.java
#
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   OddEven.java
#

For those who are worried about testing unstaged changes, there is always ‘git stash --keep-index’ to temporarily archive anything that won’t be committed. As an aside, this stash is stored, not surprisingly, as just another commit and can be seen by our old friend reflog.

> git reflog --all
b7004ea refs/stash@{0}: WIP on master: 46f0ac9 Initial commit

The Blame Game

One of the interesting things about Git is that it doesn’t actually track renames. This is a source of concern for some people, but I think Git is actually doing the Right Thing™. What is a rename anyway? We’re just moving content from one file location to another. But what happens if we only move parts of the file? Git blame is a useful command that will normally display the commits that last touched each line of a file. With the magic ‘-C’ option it will now detect lines moved between files. (The ‘-s’ in this case is to suppress some date and author noise for this example).

> git blame -s -C OddEven.java

d46f0ac9 OddEven.java     public void run() {
d46f0ac9 OddEven.java         OddEven number = new OddEven();
d46f0ac9 OddEven.java         number.showDialog();
d46f0ac9 OddEven.java     }
d46f0ac9 OddEven.java
565ad9cd Hello.java       public static void main(final String[] args) {
565ad9cd Hello.java           new Hello();
565ad9cd Hello.java       }
d46f0ac9 OddEven.java }

Notice that not all the lines originated from this one file. This bad boy doesn’t have an equivalent Mercurial extension.

Conclusion

Git means never having to say, “you should have”. There are times though when Mercurial says exactly that. The second you want to rebase/modify a commit or use single-repository branches (aka bookmarks), something I know I do every day, you are stepping outside of Mercurial’s comfort zone. The append-only repository format was intentionally designed without this behaviour in mind. I agree with Scott Chacon (from GitHub) who said Mercurial feels like a “Git Lite”.

Git isn’t perfect. However, I would argue that there are more important things than having a cuddly command line. Sure, it would be nice if Git did things a little better, gave less cryptic errors, ran fast on Windows, etc. But at the end of the day, these things are superficial. Go and write an alias if you don’t like a particular command. Stop running Windows (no seriously). The repository format drives what is possible with our DVCS tools, now and in the future.

Git and Mercurial Cheat Sheet

Hopefully this article and the previous one exploring the advantages of Mercurial over Git will illuminate some of the strengths and weaknesses of both systems.

If you are moving from the centralized version control system Subversion to Git, check out our Git tutorial and workflow guides.