git is an extremely powerful tool once you really get to know it, but many factors conspire to make it very difficult for a beginner to just start contributing to a project without reading a whole manual. git was originally designed with one particular workflow in mind (that of the Linux kernel, which has a hierarchical maintainer structure with a single person maintaining the authoritative repository), and a lot of those design decisions are still with us today.
As of git 1.7.6, there have been some improvements to make your workflow easier, like more informational messages, but there's still a pretty steep learning curve, especially for people working with a shared remote repository.
The staging area
Although it can be extremely useful, the staging area is one of the first
stumbling blocks for new git users, and it's entrenched in everything git does.
Also known as the cache or index, it comes between you and a simple
commit that just checks in everything that has changed in your copy or
git diff that shows you all the differences between the last commit
and your changes. The output of git status has been improved over the
years to give you the exact commands you need to run for common operations, but
the staging area is still mostly a hindrance for beginners.
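To make the "in between" part concrete, here is a minimal sketch of the three different answers git diff can give once the staging area is involved. The file name notes.txt, the identity, and the temporary directory are made up for illustration.

```shell
#!/bin/sh
# Sketch: one file, edited twice, with only the first edit staged.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="you@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"            # throwaway repository (hypothetical location)
git init -q

echo "one" > notes.txt       # notes.txt is a made-up file name
git add notes.txt
git commit -q -m "Initial commit"

echo "two" >> notes.txt      # first edit...
git add notes.txt            # ...which gets staged
echo "three" >> notes.txt    # second edit, left unstaged

# Count added lines in each view (the '+++' file header is excluded).
unstaged=$(git diff | grep -c '^+[^+]')          # working tree vs. staging area
staged=$(git diff --cached | grep -c '^+[^+]')   # staging area vs. last commit
total=$(git diff HEAD | grep -c '^+[^+]')        # working tree vs. last commit
echo "unstaged=$unstaged staged=$staged total=$total"

# 'commit -a' stages every tracked, modified file before committing,
# which is the "just check in everything" behavior beginners expect.
git commit -q -a -m "Commit everything at once"
leftover=$(git status --porcelain)
```

Plain git diff only reports the unstaged edit, so a beginner who has already run git add sees less than they expect; git diff HEAD is the view that matches "everything I changed since the last commit".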
Pulling always merges by default
The recommended workflow for git involves creating a new branch for every
change, no matter how small, but beginners aren't used to that, and branching in
many other systems is scary and expensive. If I commit a change in my local
master branch, and the matching remote master branch has newer updates
before I push my changes, doing a pull always does a merge. This is needed
if I'm sharing my local
repository with multiple people, but a rebase would be better if I only ever
push/pull from a single repository and never merge any branches, so that a
linear history is preserved. A merge in the case of local differences adds an
extra merge commit to the log, which is unnecessary and harder to read later.
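The difference is easy to see side by side. The following is a rough sketch, not a recommended setup: it builds a throwaway "remote" and two clones that each have one unpushed commit on master, then pulls into one with a merge and into the other with --rebase. All repository and file names are invented for the example.

```shell
#!/bin/sh
# Sketch: the same divergence resolved by a merging pull and a rebasing pull.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="you@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d); cd "$work"

# A shared "remote" holding one commit on master (names are hypothetical).
git init -q seed
(cd seed && git symbolic-ref HEAD refs/heads/master &&
  echo base > base.txt && git add base.txt && git commit -q -m "Base")
git clone -q --bare seed remote.git

# Two clones, each with one local commit that has not been pushed.
for name in merged rebased; do
  git clone -q remote.git "$name"
  (cd "$name" && echo "$name" > "$name.txt" && git add . &&
    git commit -q -m "Local change in $name")
done

# Meanwhile, someone else pushes to the remote master.
git clone -q remote.git other
(cd other && echo other > other.txt && git add . &&
  git commit -q -m "Remote change" && git push -q origin master)

# Pull with a merge (the historical default) vs. pull with a rebase.
(cd merged && git pull -q --no-rebase --no-edit origin master)
(cd rebased && git pull -q --rebase origin master)

merges_after_merge=$(cd merged && git rev-list --merges --count HEAD)
merges_after_rebase=$(cd rebased && git rev-list --merges --count HEAD)
echo "merge commits: pull=$merges_after_merge pull --rebase=$merges_after_rebase"
```

The merged clone ends up with an extra merge commit on top of the two real changes, while the rebased clone keeps a straight line of commits.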
Push does not set up tracking by default
Another common point of confusion is that the first time a branch is pushed up
to a remote, tracking is not set up automatically, so a subsequent pull or push
does not work by itself. There is push -u (or --set-upstream) to set up
tracking automatically (it even works on already-existing local branches),
but there is no way to make this the default.
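As a small illustration, this sketch pushes the same branch twice, once without and once with -u, and looks at whether git has recorded an upstream for it. The branch name topic and the repository paths are hypothetical.

```shell
#!/bin/sh
# Sketch: a plain push records no upstream; 'push -u' does.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="you@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d); cd "$work"

git init -q --bare remote.git        # stand-in for the shared remote
git init -q local
cd local
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/remote.git"
echo hello > hello.txt
git add hello.txt
git commit -q -m "Initial commit"

git checkout -q -b topic             # 'topic' is a made-up branch name
git push -q origin topic             # the branch is pushed, but not tracked
before=$(git config --get branch.topic.merge || echo "none")

git push -q -u origin topic          # same push, plus upstream configuration
after=$(git config --get branch.topic.merge)
echo "before=$before after=$after"
```

After the second push, a bare git pull or git push on topic knows which remote branch to talk to.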
Tags
Tags are great if there's only one person who manages them, but for a shared
repository they can get pretty hairy. They're not treated like normal commits.
You can't really modify or delete tags that have already been pulled by someone
else, and there's no mechanism for merging tags created by different people. It
might be a lot better if tags acted like
Dealing with submodules
The submodules system can be very aggravating for newbies (and even experienced users) for a number of reasons, many of which can at least be explained away by the fact that as a distributed system, submodules need to be distributed in nature too.
Committing directly to files in submodules can be very tricky - if you don't remember to check out a named branch before committing, the changes can end up in a limbo state since they're not referenced anywhere. And of course, if you don't push your submodule changes, another user of the repository won't be able to find the right commit. But both of these problems can be worked around with a proper continuous integration tool.
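The "limbo" state is the detached HEAD that a submodule update leaves behind. Here is a hedged sketch of checking for it and getting onto a named branch before committing. The repository names lib, super, and checkout are invented, and the protocol.file.allow override is my assumption about what recent git versions need in order to clone a submodule from a local path; it is not part of the workflow itself.

```shell
#!/bin/sh
# Sketch: a freshly updated submodule sits on a detached HEAD.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="you@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d); cd "$work"

# A library and a superproject that embeds it (hypothetical names).
git init -q lib
(cd lib && git symbolic-ref HEAD refs/heads/master &&
  echo v1 > lib.txt && git add lib.txt && git commit -q -m "Library v1")
git init -q super
(cd super && echo app > app.txt && git add app.txt && git commit -q -m "App" &&
  git -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib &&
  git commit -q -m "Add lib submodule")

# Someone else clones the superproject and fetches the submodule.
git clone -q super checkout
cd checkout
git -c protocol.file.allow=always submodule --quiet update --init
cd vendor/lib

# symbolic-ref fails when HEAD does not point at a branch.
if git symbolic-ref -q HEAD >/dev/null; then
  head_state=branch
else
  head_state=detached
fi

# Check out a named branch first, so the new commit is referenced somewhere.
git checkout -q master
echo v2 >> lib.txt
git commit -q -a -m "Library v2"
current_branch=$(git rev-parse --abbrev-ref HEAD)
echo "before: $head_state, after: on $current_branch"
```

The commit still has to be pushed from inside the submodule before other users of the superproject can find it.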
The other major stumbling block is that git status offers relatively little
information about submodules. If you forget to run git submodule init after
a new submodule is added to the tree and you can't build your software,
git status won't tell you about it; you have to remember to look at
git submodule, and to know what the output means. You also have to
remember to run git submodule update after a pull has fetched submodule
updates. Problems related to missing or out-of-date submodules can be caught
with a version check in whatever build system you're using, but that
duplicates some work.
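For reference, here is a small sketch of what that looks like in a fresh clone: git status is silent, while the first character of each git submodule status line tells you the state ("-" for not initialized, a space for in sync, "+" for a checked-out commit that differs from the recorded one). Repository names are hypothetical, and the protocol.file.allow override is an assumption about what newer git versions require for local-path submodules.

```shell
#!/bin/sh
# Sketch: spotting and fixing an uninitialized submodule after a clone.
set -e
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="you@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d); cd "$work"

# A library and a superproject that embeds it (made-up names).
git init -q lib
(cd lib && echo v1 > lib.txt && git add lib.txt && git commit -q -m "Library v1")
git init -q super
(cd super && echo app > app.txt && git add app.txt && git commit -q -m "App" &&
  git -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib &&
  git commit -q -m "Add lib submodule")

# A fresh clone has an empty vendor/lib directory and nothing to warn you.
git clone -q super checkout
cd checkout
status_before=$(git status --porcelain)           # empty: status says "clean"
marker_before=$(git submodule status | cut -c1)   # "-" means not initialized

# The two steps you have to remember yourself.
git submodule --quiet init
git -c protocol.file.allow=always submodule --quiet update
marker_after=$(git submodule status | cut -c1)    # " " means in sync
echo "before='$marker_before' after='$marker_after'"
```

A build script could run the same git submodule status check and refuse to continue when any line starts with "-" or "+", which is the kind of duplicated work mentioned above.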