My Git Workflow and Why You Should Use Git

Learning a distributed version control system is kind of a revelation. If you’re like me you’ve probably gone through the following progression. You started off with SourceSafe. You thought, wow, how did we ever get things done without version control?

Sometime after that you switched to cvs. If you were like me you probably thought, wow, I can’t believe we ever put up with all those locks and only allowing a single person to edit a file at once. How stupid was that?

A few years later you might have switched to subversion or svn. At the time I remember thinking, wow, how did we ever live without atomic commits?

Then, after that I got experience with perforce or p4 and I remember thinking, OMG, how did I ever live without this speed?

About 15 months ago I learned mercurial or hg. Joel Spolsky turned me on to it with 2 great articles. The first was about how his employees all started using hg until it finally dawned on him that maybe he should look into why.

That led him to write a great set of tutorials about some of the benefits of distributed version control. I followed those tutorials and promptly set about using hg in some small projects.

Some of the people collaborating on those projects had git experience and they pointed out how disappointed they were using mercurial. I was basically thinking like “whatever, you probably are just not familiar with it.”

Chromium, the project I currently work on, started allowing git as an option. After hearing so many raves about git I finally took the plunge. It’s been at least 6 months since I started using git and I can say that again I had another of those “WOW” moments. This time it’s “Wow, how did I ever get along without easy branching”.

People have tried to explain how git works, what the commands are, how it’s different from svn, p4 or hg. Rather than go into technical details about how it works I’m just going to show my workflow. I believe this workflow helps me be more efficient, helps me write more code, and is not encouraged by svn, p4, or hg.

In git you always work in branches. Git itself has no concept of a main branch. There is a convention that many projects use of a main branch called ‘master’ but nothing in git enforces that. It’s just a convention. I happen to follow it. I keep a ‘master’ branch which is effectively my copy of the official version with all the latest code contributed by everyone on the team. To update that and get the latest I do this

$ git checkout master
Switched to branch 'master'
$ git pull

At setup time I created a ‘master’ branch and it is set to pull the latest stuff from the place I cloned from so ‘git pull’ gets all the latest changes.
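Here’s a minimal sketch of that setup in a throwaway directory, assuming a reasonably recent git. The names ‘upstream’ and ‘mycopy’, the file notes.txt and the example identity are made up for illustration; ‘upstream’ just stands in for the place you cloned from.

```shell
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

# 'upstream' is a hypothetical stand-in for the official repository.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master   # call the first branch 'master' whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"
echo "first" > notes.txt
git add notes.txt
git commit -q -m "first commit"
cd ..

# Cloning creates a local 'master' that tracks origin/master.
git clone -q upstream mycopy
cd mycopy
upstream_of_master=$(git rev-parse --abbrev-ref "master@{upstream}")

# Someone else lands a change in the official repository.
cd ../upstream
echo "second" >> notes.txt
git commit -q -a -m "second commit"

# Back in my copy, 'git pull' on master picks it up.
cd ../mycopy
git checkout -q master
git pull -q
commit_count=$(git rev-list --count master)
echo "master tracks $upstream_of_master and now has $commit_count commits"
```

Because the clone recorded where ‘master’ came from, ‘git pull’ needs no arguments here.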

Now I want to do some work. I’m going to add support for texture compression so I make a new branch

$ git checkout -b texture-compression
Switched to a new branch 'texture-compression'

I edit some files and commit them locally. I upload the changes for code review and then send the changes to our try bots (servers that build the project) to see if they work on various platforms.

$ git commit -a -m "Added texture compression"
$ cl upload
$ try

‘cl’ is a script that uploads the changes to our code review system. ‘try’ is a script that takes the changes on our code review system and sends them to our try bots to build them. Now, being that chromium is a big project, 27k+ files, it can take from 1 to 3 hours before the try bots will be able to compile, build and link everything and then run the thousands and thousands of tests on all the systems they need to run on. So, I decide to start working on adding logging for JavaScript for WebGL.

$ git checkout -b add-webgl-js-logging master

And, generally in under 1 second, I’m now working on a new thing. I write some code, I get halfway in and someone comes by and asks if I can look at why the new TexStorage function we added is having problems. No problem.

$ git commit -a -m "work in progress"
$ git checkout -b texstorage-work master

And now I’m working on a fresh branch; it has none of the other changes I’ve worked on today. I build, debug, find the issue and tell my co-worker how to fix it. Now I want to get back to working on the WebGL JS Logging stuff so

$ git checkout add-webgl-js-logging

And I’m back to working on that. I get an email from the try bots that there’s a problem with my texture compression stuff. I check in my work here and switch to that.

$ git commit -a -m "work in progress"
$ git checkout texture-compression

I make a few changes, commit it locally and start the try servers on my new stuff

$ git commit -a -m "fixed bug in texture-compression, added unit test"
$ cl upload
$ try

And now I switch back to working on webgl javascript logging

$ git checkout add-webgl-js-logging

And I’m back to working on that.

Does this workflow of switching between different things seem useful to you? How did I accomplish this in svn or p4 or hg? I had multiple copies of the entire project checked out in different folders. In git none of that is necessary. hg basically says as much. If you want to work on 2 different things at once you should ‘hg clone’ which means ‘copy everything’. On chromium that takes a couple of minutes in hg. In git, making a new branch to start working on something else takes under 2 seconds, even on chrome which is 27k files!

On top of that, hg, svn and p4 arguably don’t encourage this kind of branching. In git I can see what I’m working on easily.

$ git branch -vv
* add-webgl-js-logging 513b342 work in progress
  master               2bc582a networking latency issue 54125 fixed
  texstorage-work      2bc582a networking latency issue 54125 fixed
  texture-compression  a34d46c fixed bug in texture-compression, added unit test

This shows that I’m on branch ‘add-webgl-js-logging’. There are 4 branches total. It’s pretty clear that ‘texstorage-work’ is at the same state as ‘master’ and what was last done on the ‘texture-compression’ branch.

What would be the equivalent in hg, svn or p4? Given that checking out the entire project again is very slow on those systems most likely the best you can do is make a few folders ‘checkout01’, ‘checkout02’, ‘checkout03’, ‘checkout04’ and try to mentally remember that you were working on texture compression in checkout02 and webgl logging in checkout04. I actually used to do that. In both p4 and svn I’d give my checkout folders more interesting names like ‘mars’, ‘penelope’, ‘samson’ and other random names but I’d have no way of knowing which folder contained which work except to remember in my head or manually switch to each one and do an ‘svn status’ or something similar.

Now it’s easy. Branches are SUPER CHEAP so I can make as many as I want. Switching between them is trivial and fast, and git’s design encourages it by making it easy to see what I’m doing.
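If you want to try the branch switching for yourself, here’s a small sketch in a throwaway repository. The file name and the example identity are made up; the branch names are borrowed from the walkthrough above.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch 'master' whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "initial commit"

# Start one task on its own branch and commit there.
git checkout -q -b texture-compression
echo "compression" >> file.txt
git commit -q -a -m "Added texture compression"

# Start a second task from master; the first task's change is not in the working tree.
git checkout -q -b add-webgl-js-logging master
contents=$(cat file.txt)
branch_count=$(git branch | wc -l | tr -d ' ')
current=$(git rev-parse --abbrev-ref HEAD)
echo "on $current, $branch_count branches, file.txt contains: $contents"
```

Both tasks live in the same folder, and switching swaps the files in place rather than needing a second checkout.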

Let’s keep going just to finish this up

I get an email from the try bots that says everything went well with texture-compression so let’s check that in. Now in chromium’s case they’ve added a commit queue system so at this point I can just go to our code review site and click ‘commit’ and the bots will do some more thorough tests and if they all pass the code would automatically be committed. We happen to still support svn at this point in time (I’m sure all the git fan team members can’t wait for the day we switch to 100% git). But, assuming I want to check in by hand, I save the stuff I’m working on for logging, switch to the texture-compression branch and commit.

$ git commit -a -m "latest work in progress"
$ git checkout texture-compression
$ cl dcommit

It’s now checked in. Let’s switch back to webgl logging and try to check that in.

$ git checkout add-webgl-js-logging
$ cl dcommit
conflict foo/bar/js-logging.cc has changed
aborted

Something changed since I started working on this feature and it’s telling me I should fix that first. Let’s grab the latest. Since I happen to base everything off my own local copy of master I would generally do this.

$ git checkout master
$ git pull
$ git rebase master add-webgl-js-logging
$ cl dcommit

That switches to the ‘master’ branch, gets all the latest changes off the net and merges them into master so it matches. I then switch back to add-webgl-js-logging telling it to ‘rebase’ on master, which effectively means: take all my changes and put them aside, update the add-webgl-js-logging branch so it matches master, then reapply my changes on top. If I didn’t care about my local ‘master’ branch I could just rebase directly off the latest stuff from the net with this

$ git fetch origin
$ git rebase origin/master add-webgl-js-logging
$ cl dcommit
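Here’s a rough sketch of the rebase step on its own in a throwaway repository, with a local commit on master standing in for whatever ‘git pull’ would bring down. The file names and example identity are made up. Note that ‘git rebase’ takes the branch to rebase onto first and the branch being rebased second.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch 'master' whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > shared.txt
git add shared.txt
git commit -q -m "initial commit"

# Feature work on its own branch.
git checkout -q -b add-webgl-js-logging
echo "logging" > logging.txt
git add logging.txt
git commit -q -m "work in progress"

# Meanwhile master moves ahead (this stands in for 'git pull').
git checkout -q master
echo "teammate change" > other.txt
git add other.txt
git commit -q -m "teammate change"

# Replay the feature branch on top of the updated master.
git rebase -q master add-webgl-js-logging

current=$(git rev-parse --abbrev-ref HEAD)
parent_subject=$(git log -1 --format=%s HEAD~1)
echo "on $current, my commit now sits on top of: $parent_subject"
```

After the rebase, my commit’s parent is the newest commit on master, which is what the check-in step wants to see.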

Note that I haven’t gone into any of the details of adding files, resolving conflicts or other things. That’s not the point of this post. The point is to show my workflow in git. This is why git is so popular. It encourages a certain style of workflow that none of the other popular systems do AFAIK.

If you want to learn git I highly recommend you read at least the first and second chapter of ProGit. Another huge advantage to git is this concept called ‘the stage’ which has no equivalent in any other system I know of. It lets you easily edit a bunch of files but then choose which ones will get committed when, even down to selecting individual lines in a file. The first 2 chapters of ProGit will cover the stage and some other differences of a distributed system (hg, git) vs a non-distributed system (svn, cvs, p4).
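As a small taste of the stage, here’s a sketch in a throwaway repository where two files are edited but only one is committed. The file names and example identity are made up. Picking individual hunks is done interactively with ‘git add -p’, which is left out here so the sketch can run unattended.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "one" > a.txt
echo "one" > b.txt
git add a.txt b.txt
git commit -q -m "initial commit"

# Edit both files...
echo "two" >> a.txt
echo "two" >> b.txt

# ...but stage and commit only a.txt.
git add a.txt
git commit -q -m "change a.txt only"

committed=$(git diff --name-only HEAD~1 HEAD)   # files in the commit just made
pending=$(git diff --name-only)                 # files still modified but not committed
echo "committed: $committed, still pending: $pending"
```

The edit to b.txt stays in the working tree, ready to go into a later commit.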

I also highly recommend you start with github and their help

Because it’s so different there is a learning curve for git. It’s not nearly as bad as people make it out to be. I think most of the complaints come from 3-4 years ago and things have gotten much better.

Once you learn git I don’t think you’ll ever want to go back to another system.

NOTE: If you’re curious about ‘cl’ and ‘try’ above, in our actual workflow they are installed as git plugins. Why I have no idea. Git has what some might call a plugin system. Any script named git-name will be run if you type ‘git name’. In the case above, ‘try’ and ‘cl’ are custom shell scripts, git-try and git-cl, that run python scripts git_try.py and git_cl.py respectively and are part of chromium, not git. There’s no particular reason to name them as git plugins AFAIK. They could just as easily be named ‘try’, ‘chromium_try’ and ‘commit_to_chromium’ or something. For your project you’d likely do something different. You might ‘try’ by building locally. You might commit by pushing to a specific remote branch or by putting up a pull request like on github. Again, the point is the workflow, not the specifics.
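To see the git-name mechanism in action, here’s a tiny sketch. The script ‘git-hello’ is a made-up example and has nothing to do with chromium’s git-cl or git-try; it only shows that git runs an executable called git-name found on your PATH when you type ‘git name’.

```shell
# Put a hypothetical 'git-hello' script in a scratch directory.
bindir=$(mktemp -d)
cat > "$bindir/git-hello" <<'EOF'
#!/bin/sh
echo "hello from a custom git subcommand"
EOF
chmod +x "$bindir/git-hello"

# Make the directory visible to git through PATH.
PATH="$bindir:$PATH"
export PATH

# 'git hello' now runs the git-hello script.
output=$(git hello)
echo "$output"
```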

  • Jakub Narebski

    What do those do?

    $ git cl upload
    $ git try

    These are not git commands

  • Shawn Casey

    P4 shelves makes it really easy to go between tasks.

  • http://greggman.com greggman

    p4 shelve is not the same thing as git branches. p4 shelve is more like git stash. For example looking at the docs I see no names associated with p4 shelve so it’s kind of hard to remember which shelve has which changes. Shelves also have the problem that they don’t keep their own change history. Branches do.

  • http://greggman.com greggman

    Thanks for that link.

    I don’t feel like those are the same and that’s basically why I said “doesn’t encourage” in my post. You can work on multiple things in svn, p4 and hg. Just checkout multiple times (p4 checkout, hg clone, svn checkout, etc..) but that’s hardly encouraging a fast workflow.

    hg bookmark seems like an afterthought. In fact you have to turn them on. Support for them is not on by default. On top of that they are pushed to the server which means every user on the team has to agree not to use the same names for their bookmarks else there will be collisions.

    hg named branches are also kind of lame. They seem like a solution to a problem that doesn’t exist in git.

    Interestingly, you are one of the people that got me to switch to git. Have you changed your mind about hg?

  • http://greggman.com greggman

    read the “NOTE:” at the bottom of the article.

  • http://greggman.com greggman

    Actually, I’ll edit those to make it less confusing

  • Jakub Narębski

    Git has sensible branches *and* it does have ‘git stash’ which I think is something like ‘p4 shelve’

  • Jakub Narębski

    The problem with “bookmark branches” in Mercurial was that they were not transferrable, and still is that the namespace is global — compare to perhaps more difficult to understand but also more flexible and powerful system of so called “remote-tracking branches” in Git.

    See also: http://stackoverflow.com/questions/1598759/git-and-mercurial-compare-and-contrast/1599930#1599930

  • http://twitter.com/agrant3d Andrew Grant

    Just curious if you are aware of Perforce Streams?
    http://www.perforce.com/product/product_features/perforce_streams
    There is a nice video tutorial on that page.  

  • http://greggman.com greggman

    Interesting. One of the problems with discussing the topic of “branching” is that everyone and every VCS has a different concept of what a branch is. A branch in svn and p4 is a heavyweight thing the entire team needs to be aware of. This p4 stream thing, from its very description, is following along those lines, talking about allowing the project architect to look at and manage all the streams.

    That’s absolutely nothing like git branches. Git branches can possibly more easily be thought of as personal folders at a certain state. Whether or not anyone but you sees one is up to you. Whether or not they get pushed to anyone else is up to you. It’s functionally no different than typing ‘mkdir somefolder; cp -r projectfolder somefolder’ except it’s way faster and changes are tracked. There’s no management and no contact with a server.

  • Vincent Scheib

    I think some of the checkout commands you list may need a 2nd parameter of origin/master (or a checkout master first) to not cause the new branch to be based on the current branch.

  • http://greggman.com greggman

    ??? I’m not seeing which ones need a 2nd parameter.

  • td

    Mercurial has “record” that lets you commit portions (hunks) of files to your repo and “shelve” lets you temporarily push off changes to a temporary holding area.

    For my newer projects, I’ve been using git exclusively.  That said, if TortoiseHg hadn’t deprecated some features in THG 2.0, I’d probably have stuck with Hg.

  • Amit Bakshi

    Git is awesome, but as you know it doesn’t work for game-dev because of all the large binaries. The new p4 has a ‘streams’ feature that is similar to local git branches, but nothing can hold a candle to git as long as you’re mostly source code (rebase, cherry-pick, merges).

  • http://greggman.com greggman

    So I’ve been curious what I would do if I got back into games and how I’d solve that problem.

    2 things come to mind

    #1) Google has this script, gclient (http://code.google.com/p/gclient/) that manages multiple sub-repos in 1 project using different source control systems. Basically there’s a file in the main project called DEPS that lists which version of some other project to pull in and where to put it. I might not use that exact script but doing something similar I could use git for source and p4 for assets.

    #2) Do something like Naughty Dog used to do (maybe still does?) which is manage assets with a custom system on the server.

    Their assets are built and stored on the server. During dev the game loads assets directly from the server. Like at Namco I assume the game caches assets locally so it doesn’t always have to get them from the server.

    Libraries load from folders by version number so for example library version 3.2 might load a tree from game/assets/v3.2/model/tree.bin That way the libraries only see files that are of the correct version.

    Since they’re loading from the net and the net is running on a unix box then assets are referenced from your home folder on the net. //server/users/amit/game/assets for example. When you sync to the latest assets all that happens is //server/users/amit/game/assets gets populated with links to the latest shared assets. Think “ln -s //server/shared/game/assets/v3.2/model/tree.bin //server/users/amit/game/assets/v3.2/model/tree.bin”. That means sync is very fast.

    As for building, when you build something I assume it builds on your local machine and uploads to your home folder. When you check in it copies the new asset to the shared folder on the net and changes your home folder’s link to that asset to point to the shared one.

    Or something like that. I don’t actually know the details but it sounds a hell of a lot better than most projects I’ve seen where everyone builds all assets or where syncing the latest assets takes tens of minutes.

  • Patrick

    Been looking into getting started with Git, and found this very helpful! Thanks!

  • Sam Izzo

    I was wondering what you mean about the named branches in Mercurial?  That concept seems pretty similar to git branches (for more, see http://hgbook.red-bean.com/read/managing-releases-and-branchy-development.html), although in the docs – and various other sites – they say things like “a good rule of thumb is to use branch names sparingly and for rather longer lived concepts like “release branches” and not for short lived work of single developers”.  Just trying to work out an hg workflow or whether I should switch to git.

  • http://greggman.com greggman

    The simple answer is you should switch to git ;-)  After you’re comfortable with it then you can decide which fits your needs better.

    In git everything is a named branch. You branch all the time in git and delete them all the time as well. It’s the workflow git encourages. You can see hg does not encourage this just by reading the docs.

    Sorry, that’s not really answering your question. It sounds like named branches in hg are local only (it says something about single repo in the docs). git can be local or shared, it’s up to you to decide which ones you want to share.

  • Sam Izzo

    Thanks – yeah I should just bite the bullet and try it :) I definitely like the idea of being able to effortlessly branch anywhere and everywhere!

  • Isopod

    I have used both Git and hg, and I still to this day don’t see how Git’s branching is supposed to be different from hg’s. Using hg, I can just as well work with multiple branches and switch between them as often as I like. There is no need to check out into multiple folders as stated in your article.

    The only “difference” is perhaps that in Git you first create a new branch, then make some changes and commit, while in hg you just make your changes and create a new branch as you commit. But this is just a matter of taste, I guess…