Git – Commits

In Git, a commit is used to record changes
to a repository.

At face value, a Git commit seems no different from a commit or
check-in found in other VCS. Under the hood, however, a Git commit operates
in a unique way.

When a commit occurs, Git records a
snapshot of the index and places that snapshot in the
object store. (Preparing the index for a commit is covered in Chapter 5.) This snapshot does
not contain a copy of every file and directory in the
index, because such a strategy would require enormous and prohibitive
amounts of storage. Instead, Git compares the current state of the index to
the previous snapshot and so derives a list of affected files and
directories. Git creates new blobs for any file that has changed and new
trees for any directory that has changed, and it reuses any blob or tree
object that has not changed.

Commit snapshots are chained together, with each new snapshot pointing
to its predecessor. Over time, a
sequence of changes is represented as a series of commits.

It may seem expensive to compare the entire index to some
prior state, yet the whole process is remarkably fast because every Git
object has an SHA1 hash. If two objects, even two subtrees, have the same
SHA1 hash, the objects are identical. Git can avoid swaths of recursive
comparisons by pruning subtrees that have the same content.
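
The reuse of unchanged objects can be observed directly. The following is a minimal sketch, assuming a throwaway repository created with mktemp; the file names, the lib directory, and the Demo identity are made up for illustration. It commits twice, changing only README, and then compares the object IDs that each commit records for the untouched lib directory and for README.

```shell
set -e
cd "$(mktemp -d)"                      # hypothetical scratch repository
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir lib
echo 'unchanged' > lib/util.txt
echo 'v1' > README
git add .
git commit -q -m 'first snapshot'

echo 'v2' > README                     # only README changes
git commit -q -a -m 'second snapshot'

# commit:path names the tree or blob recorded at that path in that commit
old_tree=$(git rev-parse HEAD~1:lib)
new_tree=$(git rev-parse HEAD:lib)
old_blob=$(git rev-parse HEAD~1:README)
new_blob=$(git rev-parse HEAD:README)

echo "lib tree:    $old_tree -> $new_tree"
echo "README blob: $old_blob -> $new_blob"
```

The two IDs printed for lib should be identical, because the second commit points at the same tree object, while the two IDs for README differ.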

There is a one-to-one correspondence between a set of changes
in the repository and a commit: A commit is the only method of introducing
changes to a repository, and any change in the repository must be introduced
by a commit. This mandate provides accountability. Under no circumstance should
repository data change without a record of the change! Just imagine the
chaos if, somehow, content in the master repository changed and there was no
record of how it happened, who did it, or why.

Although commits are most often introduced explicitly by a developer,
Git itself can introduce commits. As you’ll see in Chapter 9, a merge operation causes a commit in the
repository in addition to any commits made by users before the merge.

How you decide when to commit is pretty much up to you and your
preferences or development style. In general, you should perform a commit at
well-defined points in time when your development is at a quiescent stage,
such as when a test suite passes, when everyone goes home for the day, or
at any other natural stopping point.

However, don’t hesitate to introduce commits! Git is well-suited to
frequent commits and provides a rich set of commands for manipulating them.
Later, you’ll see how several commits—each with small, well-defined
changes—can also lead to better organization of changes and easier
manipulation of patch sets.

Atomic Changesets

Every Git commit represents a single, atomic
changeset
with respect to the previous state. Regardless of
the number of directories, files, lines, or bytes that change with a
commit,[12] either all changes apply or none do.

In terms of the underlying object model, atomicity just makes sense:
A commit snapshot represents the total set of modified files and
directories. It must represent one tree state or the other, and a
changeset between two state snapshots represents a
complete tree-to-tree transformation. (You can read about derived
differences between commits in Chapter 8.)

Consider the workflow of moving a function from one file to another.
If you perform the removal with one commit and then follow with a second
commit to add it back, there remains a small semantic gap
in the history of your repository during which time the function is gone.
Performing the two commits in the other order is problematic, too. In either case, before
the first commit and after the second your code is semantically
consistent, but after the first commit, the code is faulty.

However, with an atomic commit that simultaneously deletes and adds
the function, no such semantic gap appears in the history. You can learn
how best to construct and organize your commits in Chapter 10.

Git doesn’t care why files are changing. That
is, the content of the changes doesn’t matter. As the developer, you might
move a function from here to there and expect this to be handled as one
unitary move. But you could, alternatively, commit the removal and then
later commit the addition. Git doesn’t care. It has nothing to do with the
semantics of what is in the files.

But this does bring up one of the key reasons why Git implements
atomicity: It allows you to structure your commits more appropriately by
following some best practice advice.

Ultimately, you can rest assured that Git has not left your
repository in some transitory state between one commit snapshot and the
next.

Identifying Commits

Whether you code individually or with a team, identifying individual commits is an essential task. For example, to
create a branch, you must choose a commit from which to diverge; to
compare code variations, you must specify two commits; and to edit the
commit history, you must provide a collection of commits. In Git, you can
refer to every commit via an explicit or an implied reference.

You’ve already seen explicit references and a few implied
references. The unique, 40-hexadecimal-digit SHA1 commit ID is an explicit
reference, whereas HEAD, which always
points to the most recent commit, is an implied reference. At times,
though, neither reference is convenient. Fortunately, Git provides many
different mechanisms for naming a commit, each with advantages and some
more useful than others, depending on the context.

For example, when discussing a particular commit with a colleague
working on the same data but in a distributed environment, it’s best to
use a commit name guaranteed to be the same in both repositories. On the
other hand, if you’re working within your own repository and need to refer
to the state a few commits back on a branch, a simple relative name works
perfectly.

Absolute Commit Names

The most rigorous name for a commit is its hash
identifier. The hash ID is an absolute name, meaning it can only
refer to exactly one commit. It doesn’t matter where the commit is among
the entire repository’s history; the hash ID always identifies the same
commit.

Each commit ID is globally unique, not just
for one repository but for any and all repositories. For example, if a
developer writes you with reference to a particular commit ID in his repository and if you
find the same commit in your repository, then you can be certain that
you both have the same commit with the same content. Furthermore,
because the data that contribute to a commit ID contain the state of the
whole repository tree as well as the prior commit state, by an inductive
argument, an even stronger claim can be made: You can be certain that
both of you are discussing the same complete line of development leading
up to and including the commit.

Because a 40-hexadecimal-digit SHA1 number makes for a
tedious and error-prone entry, Git allows you to shorten this number to
a unique prefix within a repository’s object database. Here is an
example from Git’s own repository.

    $ git log -1 --pretty=oneline HEAD
    1fbb58b4153e90eda08c2b022ee32d90729582e6 Merge git://repo.or.cz/git-gui

    $ git log -1 --pretty=oneline 1fbb
    error: short SHA1 1fbb is ambiguous.
    fatal: ambiguous argument '1fbb': unknown revision or path
        not in the working tree.
    Use '--' to separate paths from revisions

    $ git log -1 --pretty=oneline 1fbb58
    1fbb58b4153e90eda08c2b022ee32d90729582e6 Merge git://repo.or.cz/git-gui
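
To try prefix abbreviation without a clone of Git's own repository, here is a small sketch in a throwaway repository (the Demo identity and the commit message are assumptions). It asks Git for an unambiguous short form of HEAD and then resolves that short form back to the full ID.

```shell
set -e
cd "$(mktemp -d)"                      # hypothetical scratch repository
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m 'only commit'

full=$(git rev-parse HEAD)             # complete hash ID
short=$(git rev-parse --short HEAD)    # a prefix Git considers unambiguous here
resolved=$(git rev-parse "$short")     # expand the prefix back to the full ID

echo "$short expands to $resolved"
```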

Although a tag name isn’t a globally unique name, it is absolute
in that it points to a unique commit and doesn’t change over time
(unless you explicitly change it, of course).

refs and symrefs

A ref is an SHA1 hash ID that refers to an object within the Git object store. Although
a ref may refer to any Git object, it usually refers to a commit object.
A symbolic reference, or
symref, is a name that indirectly points to a Git
object. It is still just a ref.

Local topic branch names, remote tracking branch names, and tag
names are all refs.

Each symbolic ref has an explicit, full name that begins
with refs/ and each is stored
hierarchically within the repository in the .git/refs/ directory. There are basically
three different namespaces represented in refs/: refs/heads/ref for
your local branches, refs/remotes/ref
for your remote tracking branches, and refs/tags/ref for
your tags. (Branches are covered in more detail in Chapter 7 and in Chapter 12.)

For example, a local topic branch named dev is really a short form of refs/heads/dev. Remote tracking branches are
in the refs/remotes/ namespace, so
origin/master really names refs/remotes/origin/master. And finally, a tag
such as v2.6.23 is short for refs/tags/v2.6.23.
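
The correspondence between short and full ref names can be checked with git rev-parse. This is a sketch in a throwaway repository; the branch name dev and the tag name v1.0 are chosen only for the demonstration.

```shell
set -e
cd "$(mktemp -d)"                      # hypothetical scratch repository
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m 'initial'
git branch dev
git tag v1.0

# The short and the full name resolve to the same commit
short_id=$(git rev-parse dev)
full_id=$(git rev-parse refs/heads/dev)

# --symbolic-full-name prints the full ref name behind a short one
dev_full=$(git rev-parse --symbolic-full-name dev)
tag_full=$(git rev-parse --symbolic-full-name v1.0)

echo "dev  -> $dev_full"
echo "v1.0 -> $tag_full"
```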

You can use either a full ref name or its abbreviation, but if you
have a branch and a tag with the same name, Git applies a disambiguation
heuristic and uses the first match according to this list from the
git rev-parse manpage:

    .git/ref
    .git/refs/ref
    .git/refs/tags/ref
    .git/refs/heads/ref
    .git/refs/remotes/ref
    .git/refs/remotes/ref/HEAD

The first rule is usually just for a few refs described later:
HEAD, ORIG_HEAD, FETCH_HEAD,
CHERRY_PICK_HEAD, and MERGE_HEAD.

Tip

Technically, the name of the Git directory, .git, can be changed. Thus, Git’s internal
documentation uses the variable $GIT_DIR instead of the literal .git.

Git maintains several special symrefs automatically for particular
purposes. They can be used anywhere a commit is used.

HEAD

HEAD always
refers to the most recent commit on the current branch. When you
change branches, HEAD is
updated to refer to the new branch’s latest commit.

ORIG_HEAD

Certain operations, such as merge and reset, record
the previous version of HEAD in
ORIG_HEAD just prior to
adjusting it to a new value. You can use ORIG_HEAD to recover or revert to the
previous state or to make a comparison.

FETCH_HEAD

When remote repositories are used, git fetch records the heads of all
branches fetched in the file .git/FETCH_HEAD. FETCH_HEAD is a shorthand for the head
of the last branch fetched and is valid only immediately after a
fetch operation. Using this symref, you can find the HEAD of commits from git fetch even if an anonymous fetch
that doesn’t specifically name a branch is used. The fetch operation is covered in Chapter 12.

MERGE_HEAD

When a merge is in progress, the tip of the
other branch is temporarily recorded in the
symref MERGE_HEAD. In other
words, MERGE_HEAD is the commit
that is being merged into HEAD.

All of these symbolic references are managed by the
plumbing command git symbolic-ref.
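
As an illustration of HEAD and ORIG_HEAD, the sketch below works in a throwaway repository; the branch names and the Demo identity are assumptions. It reads HEAD with git symbolic-ref before and after a branch switch, then performs a reset and checks what ORIG_HEAD recorded.

```shell
set -e
cd "$(mktemp -d)"                      # hypothetical scratch repository
git init -q
git symbolic-ref HEAD refs/heads/master   # name the initial branch explicitly
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m 'first'
git branch dev

on_master=$(git symbolic-ref HEAD)     # which branch HEAD names now
git checkout -q dev
on_dev=$(git symbolic-ref HEAD)        # HEAD follows the branch switch

git commit -q --allow-empty -m 'second'
old_tip=$(git rev-parse HEAD)
git reset -q --hard HEAD~1             # reset saves the previous HEAD first
saved=$(git rev-parse ORIG_HEAD)

echo "HEAD was $on_master, then $on_dev"
echo "ORIG_HEAD = $saved"
```

After the reset, ORIG_HEAD should name the commit that dev pointed to just before it, which is what makes it useful for undoing or inspecting the operation.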

Warning

Although it is possible to create your own branch with one of
these special symbolic names (e.g., HEAD), it isn’t a good idea.

There is a whole raft of special character variants for
ref names. The two most common,
the caret (^) and tilde (~), are described in the next section. In
another twist on refs, colons can be used to refer to alternate versions
of a common file involved in a merge conflict. This procedure is
described in Chapter 9.

Relative Commit Names

Git also provides mechanisms for identifying a commit relative
to another reference, commonly the tip of a branch.

You’ve seen some of these names already, such as master and master^, where master^ always
refers to the penultimate commit on the master branch.
There are others as well: you can use master^^, master~2, and even a complex name like
master~10^2~2^2.

Except for the first root commit,[13] each commit is derived from at least one earlier commit and possibly many, where direct
ancestors are called parent commits. For a commit
to have multiple parent commits, it must be the result of a merge
operation. As a result, there will be a parent commit for each branch
contributing to a merge commit.

Within a single generation, the caret is used to select a different parent. Given a
commit C, C^1 is the first parent, C^2 is the second parent, C^3 is the third parent, and so on, as shown
in Figure 6-1.

Figure 6-1. Multiple parent names

The tilde is used to go back before an ancestral parent and select
a preceding generation. Again, given the commit C, C~1 is
the first parent, C~2 is the first
grandparent, and C~3 is the first
great-grandparent. When there are multiple parents in a generation, the
first parent of the first parent is followed. You might also notice that
both C^1 and C~1 refer to the first parent; either name is
correct, and is shown in Figure 6-2.

Figure 6-2. Multiple parent names

Git supports other abbreviations and combinations as well. The
abbreviated forms C^ and C~ are the same as C^1 and C~1, respectively. Also, C^^ is the same as C^1^1 and, because that means the first
parent of the first parent of commit C,
it refers to the same commit as
C~2.
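
These equivalences are easy to confirm on a small history. The sketch below builds a throwaway repository with one merge; the commit subjects A, B, C, T, and M and the branch name topic are invented for the example. It then resolves several relative names to their commit subjects.

```shell
set -e
cd "$(mktemp -d)"                      # hypothetical scratch repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m A
git commit -q --allow-empty -m B
git checkout -q -b topic
git commit -q --allow-empty -m T       # side branch commit
git checkout -q master
git commit -q --allow-empty -m C
git merge -q --no-ff -m M topic        # M has parents C (first) and T (second)

# Print the subject line of whatever commit a name resolves to
subject() { git log -1 --format=%s "$1"; }

first_parent=$(subject 'master^1')
second_parent=$(subject 'master^2')
tilde_one=$(subject 'master~1')
caret_caret=$(git rev-parse 'master^^')
tilde_two=$(git rev-parse 'master~2')

echo "master^1=$first_parent master^2=$second_parent master~1=$tilde_one"
```

Here master^1 and master~1 should both name C, master^2 should name T, and master^^ and master~2 should resolve to the same commit, B.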

By combining a ref and instances of
caret and tilde, arbitrary commits may be selected from the ancestral
commit graph of ref. Remember, though, that
these names are relative to the
current value of ref. If
a new commit is made on top of ref, the
commit graph is amended with a new generation and each
parent name shifts further back in the history and
graph.

Here’s an example from Git’s own history when Git’s master branch was at commit 1fbb58b4153e90eda08c2b022ee32d90729582e6.
Using the command:

    git show-branch --more=35

and limiting the output to the final 10 lines, you can inspect the
graph history and examine a complex branch merge structure:

    $ git rev-parse master
    1fbb58b4153e90eda08c2b022ee32d90729582e6

    $ git show-branch --more=35 | tail -10
    -- [master~15] Merge branch 'maint'
    -- [master~3^2^] Merge branch 'maint-1.5.4' into maint
    +* [master~3^2^2^] wt-status.h: declare global variables as extern
    -- [master~3^2~2] Merge branch 'maint-1.5.4' into maint
    -- [master~16] Merge branch 'lt/core-optim'
    +* [master~16^2] Optimize symlink/directory detection
    +* [master~17] rev-parse --verify: do not output anything on error
    +* [master~18] rev-parse: fix using "--default" with "--verify"
    +* [master~19] rev-parse: add test script for "--verify"
    +* [master~20] Add svn-compatible "blame" output format to git-svn

    $ git rev-parse master~3^2^2^
    32efcd91c6505ae28f87c0e9a3e2b3c0115017d8

Between master~15 and master~16, a merge took place that introduced
a couple of other merges as well as a simple commit named master~3^2^2^. That happens to be commit
32efcd91c6505ae28f87c0e9a3e2b3c0115017d8.

The command git rev-parse
is the final authority on translating any form of commit
name—tag, relative, shortened, or absolute—into an actual, absolute
commit hash ID within the object database.

Commit History

Viewing Old Commits

The primary command to show the history of commits is
git log. It has more options,
parameters, bells, whistles, colorizers, selectors, formatters, and
doodads than the fabled ls. But don’t worry. Just as with ls, you don’t need to learn all the details
right away.

In its parameterless form, git log
acts like git log HEAD,
printing the log message associated with every commit in your history
that is reachable from HEAD. Changes
are shown starting with the HEAD
commit and working back through the graph. They are likely to be in
roughly reverse chronological order, but recall that Git
adheres to the commit graph, not time, when traveling back over the
history.

If you supply a commit à la git log commit, the log starts at the
named commit and works backward. This form of the command is useful for
viewing the history of a branch:

    $ git log master

    commit 1fbb58b4153e90eda08c2b022ee32d90729582e6
    Merge: 58949bb... 76bb40c...
    Author: Junio C Hamano <gitster@pobox.com>
    Date:   Thu May 15 01:31:15 2008 -0700

    Merge git://repo.or.cz/git-gui

    * git://repo.or.cz/git-gui:
      git-gui: Delete branches with 'git branch -D' to clear config
      git-gui: Setup branch.remote,merge for shorthand git-pull
      git-gui: Update German translation
      git-gui: Don't use '$$cr master' with aspell earlier than 0.60
      git-gui: Report less precise object estimates for database compression

    commit 58949bb18a1610d109e64e997c41696e0dfe97c3
    Author: Chris Frey <cdfrey@foursquare.net>
    Date:   Wed May 14 19:22:18 2008 -0400

    Documentation/git-prune.txt: document unpacked logic

    Clarifies the git-prune manpage, documenting that it only
    prunes unpacked objects.

    Signed-off-by: Chris Frey <cdfrey@foursquare.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>

    commit c7ea453618e41e05a06f05e3ab63d555d0ddd7d9

    ...

The logs are authoritative, but rolling back through the entire
commit history of your repository is likely not very practical or
meaningful. Typically, a limited history is more informative. One
technique to constrain history is to specify a commit
range using the form
since..until. Given a
range, git log shows all commits
following since running through
until. Here’s an example.

    $ git log --pretty=short --abbrev-commit master~12..master~10

    commit 6d9878c
    Author: Jeff King <peff@peff.net>

    clone: bsd shell portability fix

    commit 30684df
    Author: Jeff King <peff@peff.net>

    t5000: tar portability fix

Here, git log shows the commits
between master~12 and master~10, or the 10th and 11th prior commits
on the master branch. You’ll see more about ranges in Commit Ranges later in this chapter.

The previous example also introduces two formatting
options, --pretty=short and --abbrev-commit. The
former adjusts the amount of information about each commit and has
several variations, including oneline, short, and full. The latter simply requests that hash IDs
be abbreviated.

Use the -p option to print the patch, or
changes, introduced by the commit.

    $ git log -1 -p 4fe86488

    commit 4fe86488e1a550aa058c081c7e67644dd0f7c98e
    Author: Jon Loeliger <jdl@freescale.com>
    Date:   Wed Apr 23 16:14:30 2008 -0500

    Add otherwise missing --strict option to unpack-objects summary.

    Signed-off-by: Jon Loeliger <jdl@freescale.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>

    diff --git a/Documentation/git-unpack-objects.txt b/Documentation/git-unpack-objects.txt
    index 3697896..50947c5 100644
    --- a/Documentation/git-unpack-objects.txt
    +++ b/Documentation/git-unpack-objects.txt
    @@ -8,7 +8,7 @@ git-unpack-objects - Unpack objects from a packed archive

     SYNOPSIS
     --------
    -'git-unpack-objects' [-n] [-q] [-r] <pack-file
    +'git-unpack-objects' [-n] [-q] [-r] [--strict] <pack-file

Notice the -1 option as well: it restricts the
output to a single commit. You can also type
-n (for example, -3) to limit the output to at
most n commits.

The --stat option enumerates the files
changed in a commit and tallies how many lines were modified in each
file.

    $ git log --pretty=short --stat master~12..master~10

    commit 6d9878cc60ba97fc99aa92f40535644938cad907
    Author: Jeff King <peff@peff.net>

    clone: bsd shell portability fix

     git-clone.sh |    3 +--
     1 files changed, 1 insertions(+), 2 deletions(-)

    commit 30684dfaf8cf96e5afc01668acc01acc0ade59db
    Author: Jeff King <peff@peff.net>

    t5000: tar portability fix

     t/t5000-tar-tree.sh |    8 ++++----
     1 files changed, 4 insertions(+), 4 deletions(-)

Tip

Compare the output of git log --stat
with the output of git diff --stat.
There is a fundamental difference in their displays.
The former produces a summary for each individual commit named in the
range, whereas the latter prints a single summary of the total
difference between two repository states named on the command
line.

Another command to display objects from the object store is
git show. You can use it to see a
commit:

    $ git show HEAD~2

or to see a specific blob object:

    $ git show origin/master:Makefile

In the latter display, the blob shown is the Makefile from the branch named origin/master.

Commit Graphs

In Chapter 4, Object Store Pictures introduced some figures to
help visualize the layout and
connectivity of objects in Git’s data model. Such sketches are
illuminating, especially if you are new to Git; however, even a small
repository with just a handful of commits, merges, and patches becomes
unwieldy to render in the same detail. For example, Figure 6-3 shows a more complete but still somewhat
simplified commit graph. Imagine how it would appear if all commits and
all data structures were rendered.

Yet one observation about commits can simplify the
blueprint tremendously: Each commit introduces a tree object that
represents the entire repository. Therefore, a commit can be pictured as just a name.

Figure 6-3. Full commit graph

Figure 6-4 shows the same commit
graph as Figure 6-3 but without depicting the
tree and blob objects. Usually for the purpose of discussion or
reference, branch names are also shown in the commit graphs.

Figure 6-4. Simplified commit graph

In the field of computer science, a
graph is a collection of nodes and a set of edges between the
nodes. There are several types of graphs with different properties. Git
makes use of a special graph called a directed acyclic graph
(DAG). A DAG has two important properties. First, the
edges within the graph are all directed from one node to another.
Second, starting at any node in the graph, there is no path along the
directed edges that leads back to the starting node.

Git implements the history of commits within a repository
as a DAG. In the commit graph, each node is a
single commit, and all edges are directed from one
descendant node to another parent node, forming
an ancestor relationship. The graphs you saw in Figure 6-3 and Figure 6-4 are both DAGs. When speaking of
the history of commits and discussing the relationship between commits
in a graph, the individual commit nodes are often labeled as shown in
Figure 6-5.

In these diagrams, time is roughly left to right. A is the initial commit because it has no
parent, and B occurred after A. Both E
and C occurred after B, but no claim can be made about the relative
timing between C and E; either could have occurred before the
other. In fact, Git doesn’t really care about the time or timing
(absolute or relative) of commits. The actual wall clock
time of a commit can be misleading because a computer’s clock can be set
incorrectly or inconsistently. Within a distributed development
environment, the problem is exacerbated. Time stamps can’t be trusted.
What is certain, though, is that if commit Y points to parent X, then X
captures the repository state prior to the repository state of commit
Y, regardless of what time stamps
might be on the commits.

Figure 6-5. Labeled commit graph

The commits E and C share a common parent, B. Thus, B
is the origin of a branch. The master branch
begins with commits A, B,
C, and D. Meanwhile, the sequence
of commits A, B,
E, F, and G
form the branch named pr-17. The branch
pr-17 points to commit G. (You can read more about branches in Chapter 7.)

The commit H is a
merge commit, where the pr-17
branch has been merged into the master branch. Because it’s a merge, H has more than one commit parent—in this
case, D and G. After this commit is made, master will be updated to refer to the new
commit H, but
pr-17 will continue to refer to G. (The merge operation is discussed in more
detail in Chapter 9.)

In practice, the fine points of intervening commits are considered
unimportant. Also, the implementation detail of a commit pointing back
to its parent is often elided, as shown in Figure 6-6.

Figure 6-6. Commit graph without arrows

Time is still vaguely left to right, there are two branches shown,
and there is one identified merge commit (H), but the actual directed edges are
simplified because they are implicitly understood.

This kind of commit graph is often used to talk about the
operation of certain Git commands and how each might modify the commit
history. The graphs are a fairly abstract representation of the actual
commit history, in contrast to tools (e.g., gitk and git show-branch)
that provide concrete representations of commit
history graphs. With these tools, though, time is usually represented
from bottom to top, oldest to most recent. Conceptually, it is the same
information.

Using gitk to View the Commit Graph

The purpose of a graph is to help you visualize a
complicated structure and relationship. The gitk command[14] can draw a picture of a repository DAG whenever you
want.

Let’s look at our example website:

    $ cd public_html
    $ gitk

The gitk program can do a lot
of things, but let’s just focus on the DAG for now. The graph output
looks something like Figure 6-7.

Figure 6-7. Merge viewed with gitk

Here’s what you must know to understand the DAG of commits.
First of all, each commit can have zero or more
parents, as follows:

  • Normal commits have exactly one parent, which is the
    previous commit in the history. When you make a change, your
    change is the difference between your new commit and its
    parent.

  • There is usually only one commit with zero parents: the
    initial commit, which appears at the bottom of the
    graph.

  • A merge commit, such as the
    one at the top of the graph, has more than one parent.

A commit with more than one child is the
place where history began to diverge and formed a branch. In Figure 6-7, the commit Remove my poem is the branch point.

Tip

There is no permanent record of branch start points,
but Git can algorithmically
determine them via the git merge-base
command.
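
A minimal sketch of that lookup follows, using a throwaway repository whose commit subjects (A, B, C, E) and branch name pr-17 are made up to echo the labeled graph discussed earlier. Two branches diverge after commit B, and git merge-base is asked where they last shared history.

```shell
set -e
cd "$(mktemp -d)"                      # hypothetical scratch repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m A
git commit -q --allow-empty -m B       # history diverges after this commit
git checkout -q -b pr-17
git commit -q --allow-empty -m E
git checkout -q master
git commit -q --allow-empty -m C

base=$(git merge-base master pr-17)    # best common ancestor of the two tips
base_subject=$(git log -1 --format=%s "$base")

echo "branches diverged at: $base_subject"
```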

Commit Ranges

Many Git commands allow you to specify a commit range.
In its simplest instantiation, a commit range is a
shorthand for a series of commits. More complex forms allow you to
include and exclude commits.

A range is denoted with a double-period (..), as in start..end,
where start and
end may be specified as described in Identifying Commits. Typically, a range is used to
examine a branch or part of a branch.

In Viewing Old Commits, you saw how to
use a commit range with git log. The
example used the range master~12..master~10 to specify the 11th and
10th prior commits on the master branch. To visualize the range,
consider the commit graph of Figure 6-8. Branch M is shown over a portion of its commit
history that is linear:

Figure 6-8. Linear commit history

Recall that time flows left to right, so M~14 is the oldest commit shown, M~9 is the most recent commit shown, and
A is the 11th prior commit.

The range M~12..M~10 represents two commits, the 11th and 10th
prior commits, which are labeled A
and B. The range does not include
M~12. Why? It’s a matter of
definition. A commit range, start..end,
is defined as the set of commits reachable from end that are not
reachable from start. In other
words, the commit end is included
whereas the commit start is excluded.
Usually this is simplified to
just the phrase “in end but not
start.”

Reachability in Graphs

In graph theory, a node X is said to be
reachable from another node A if you can start
at A, travel along the arcs of the graph according to the rules, and
arrive at X. The set of reachable nodes for a
node A is the collection of all nodes reachable from A.

In a Git commit graph, the set of reachable commits are those
you can reach from a given commit by traversing the directed parent
links. Conceptually and in terms of dataflow, the set of reachable
commits is the set of ancestor commits that flow into and contribute
to a given starting commit.

When you specify a commit Y to git log,
you are actually requesting Git to show the log for all
commits that are reachable from Y.
You can exclude a specific commit X
and all commits reachable from X with
the expression ^X.

Combining the two forms, git log ^X Y
is the same as git log X..Y
and might be paraphrased as “give me all commits
that are reachable from Y and don’t give me any commit leading up to and
including X.”

The commit range X..Y is
mathematically equivalent to ^X Y.
You can also think of it as a set subtraction: Use everything leading up
to Y minus everything leading up to
and including X.
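
The equivalence of the two notations can be checked on a short linear history. In this sketch the commits are created in a throwaway repository and tagged A through E so they can be named directly; the tags and the Demo identity are assumptions made for the demonstration.

```shell
set -e
cd "$(mktemp -d)"                      # hypothetical scratch repository
git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Linear history A <- B <- C <- D <- E, each commit tagged with its subject
for name in A B C D E; do
    git commit -q --allow-empty -m "$name"
    git tag "$name"
done

dots=$(git rev-list B..E)              # range notation
caret=$(git rev-list ^B E)             # explicit exclude/include notation
subjects=$(git log --format=%s B..E)   # subjects of the selected commits

echo "$subjects"
```

Both forms should select the same three commits, E, D, and C, with B and everything before it excluded.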

Returning to the commit series from the earlier example, here’s
how M~12..M~10 specifies just two commits, A and B.
Begin with everything leading up to M~10 as shown in the first line of Figure 6-9. Find everything leading
up to and including M~12, as shown in the second line
of the figure. And finally, subtract M~12 from
M~10 to get the commits shown in the third line of
the figure.

Figure 6-9. Interpreting ranges as set subtraction

When your repository history is a simple linear series of commits,
it’s fairly easy to understand how a range works. But when branches or
merges are involved in the graph, things can become a bit tricky and so
it’s important to understand the rigorous definition.

Let’s look at a few more examples. In the case of a master branch with a linear history, as shown
in Figure 6-10, the set B..E, the set ^B E,
and the set of C,
D, and E are equivalent.

Figure 6-10. Simple linear history

In Figure 6-11, the
master branch at commit V was merged into the topic branch at B.

Figure 6-11. Master merged into topic

The range topic..master
represents those commits in master,
but not in topic. Because each commit
on the master branch prior to and
including V (i.e., the set {…,
T, U, V})
contributes to topic, those commits
are excluded, leaving W, X, Y, and
Z.
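
A reduced version of this situation can be reproduced in a throwaway repository. The sketch below is an approximation of the figure rather than a copy of it: the commit subjects are borrowed from the text, but the history is shorter, with topic branching from master at U and only W and X added to master after the merge.

```shell
set -e
cd "$(mktemp -d)"                      # hypothetical scratch repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m U
git checkout -q -b topic
git commit -q --allow-empty -m A
git checkout -q master
git commit -q --allow-empty -m V
git checkout -q topic
git merge -q --no-ff -m B master       # master at V is merged into topic as B
git checkout -q master
git commit -q --allow-empty -m W
git commit -q --allow-empty -m X

in_master_only=$(git log --format=%s topic..master)
echo "$in_master_only"
```

Because V and U are reachable from topic through the merge commit B, only the later master commits, X and W, should be listed.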

The inverse of the previous example is shown in Figure 6-12. Here, topic has been merged into master.

Figure 6-12. Topic merged into master

In this example, the range topic..master, again representing those
commits in master but not in topic, is the set of commits on the master branch leading up to and including
V, W, X,
Y, and Z.

However, we have to be a little careful and consider the full
history of the topic branch. Consider
the case where it originally started as a branch of master and then merged again as shown in Figure 6-13.

Figure 6-13. Branch and merge

In this case, topic..master
contains only the commits W, X, Y, and
Z. Remember, the range will exclude
all commits that are reachable (going back or left
over the graph) from topic (i.e., the
commits D, C, B,
A, and earlier), as well as V, U, and
earlier from the other parent of B.
The result is just W through Z.

There are two other range permutations. If you leave either the
start or end
commit out of a range, HEAD is
assumed. Thus, ..end is equivalent
to HEAD..end and
start.. is
equivalent to start..HEAD.

Finally, just as start..end
can be thought of as representing a set subtraction operation, the
notation A...B
(using three periods) represents the symmetric difference
between A and
B, or the set of commits that are reachable
from either A or B
but not from both. Because of the function’s symmetry, neither commit
can really be considered a start or end. In this sense A and B are equal.

More formally, the set of revisions in the symmetric difference
between A and B,
A...B,
is given by

    $ git rev-list A B --not $(git merge-base --all A B)

Let’s look at the example in Figure 6-14.

Figure 6-14. Symmetric difference

We can compute each piece of the symmetric difference
definition:

    master...dev = (master OR dev) AND NOT (merge-base --all master dev)

The commits that contribute to master are (I, H, . . . ,
B, A, W,
V, U). The commits that contribute to dev are (Z,
Y, . . . , U, C,
B, A).

The union of those two sets is (A, . . . , I, U, . . . ,
Z). The merge base between master and dev is commit W. In more complex cases, there might be
multiple merge bases, but here we have only one. The commits that
contribute to W are (W, V,
U, C, B, and
A); those are also the commits that
are common to both master and
dev, so they need to be removed to
form the symmetric difference: (I,
H, Z, Y,
X, G, F,
E, D).

It may be helpful to think of the symmetric difference between two
branches, A and B, as “show everything in branch A or in branch B,
but only back to the point where the two branches
diverged.”
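
The symmetric difference definition can be checked on a small scale. This sketch assumes a throwaway repository with a hypothetical master and dev that diverge after one shared commit (it does not reproduce Figure 6-14) and compares the three-dot shorthand with the longhand rev-list form.

```shell
set -e
# Throwaway repository; identity and commit names are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m base       # shared by both branches
git branch dev
git commit -q --allow-empty -m master-1
git commit -q --allow-empty -m master-2
git checkout -q dev
git commit -q --allow-empty -m dev-1
git checkout -q master

# Three-dot shorthand versus the longhand form; sorted so order is irrelevant.
shorthand=$(git rev-list master...dev | sort)
longhand=$(git rev-list master dev --not $(git merge-base --all master dev) | sort)

# Subjects of the commits in the symmetric difference.
git log --format=%s master...dev
```

Both variables should contain the same three commit IDs; the shared base commit appears in neither, because it is reachable from both branches.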

Now that we’ve described what commit ranges are, how to write
them, and how they work, it’s important to reveal that Git doesn’t
actually support a true range operator. It is purely a notational
convenience that A..B represents the
underlying ^A B form. Git actually
allows much more powerful commit set manipulation on its command line.
Commands that accept a range are actually accepting an arbitrary
sequence of included and excluded commits. For example, you could
use:

    $ git log ^dev ^topic ^bugfix master

to select those commits in master but not in any of the dev, topic,
or bugfix branches.
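
To see that the two-dot notation and the caret form select the same commits, the following sketch may help. It assumes a throwaway repository with hypothetical dev and topic branches whose names and commit messages are invented for illustration.

```shell
set -e
# Throwaway repository; identity and commit names are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m A
git branch dev                            # dev forks after A
git commit -q --allow-empty -m B
git branch topic                          # topic forks after B
git commit -q --allow-empty -m C          # reachable only from master
git checkout -q dev
git commit -q --allow-empty -m D
git checkout -q topic
git commit -q --allow-empty -m E
git checkout -q master

range_form=$(git rev-list topic..master)  # notational convenience
caret_form=$(git rev-list ^topic master)  # underlying form
only_master=$(git log --format=%s ^dev ^topic master)
echo "$only_master"                       # prints C
```

Here master contains A, B, and C, but A is also in dev and B is also in topic, so only C survives both exclusions.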

All of these examples may be a bit abstract, but the power of the
range representation really comes to fruition when you consider that any
branch name can be used as part of the range. As described in “Tracking Branches” in Chapter 12, if one of your branches
represents the commits from another repository, then you can quickly
discover the set of commits that are in your
repository but not in the other repository!

Finding Commits

Part of a good RCS is the support it provides for
archaeology and investigating a repository. Git provides
several mechanisms to help you locate commits that meet certain criteria
within your repository.

Using git bisect

The git bisect command
is a powerful tool for isolating a particular, faulty commit based on
essentially arbitrary search criteria. It is well-suited to those times
when you discover that something wrong or
bad is affecting your repository and you know the code
had been fine. For example, let’s say you are working on the Linux
kernel and a test boot fails, but you’re positive the boot worked
sometime earlier, perhaps last week or at a previous release tag. In
this case, your repository has transitioned from a known
good state to a known bad state.

But when? Which commit caused it to break? That is precisely the
question git bisect is designed to
help you answer.

The only real search requirement is that, given a checked-out
state of your repository, you are able to determine whether it does or does
not meet your search criterion. In this case, you have to be able to
answer the question: “Does the version of the kernel checked out
build and boot?”
You also have to know a good and a bad version
or commit before starting so that the search will be bounded.

The git bisect command
is often used to isolate a particular commit that introduced some
regression or bug into the repository. For example, if you were working
on the Linux kernel, git bisect could
help you find issues and bugs such as “fails to compile,” “fails to boot,”
“boots but can’t perform some task,” or “no longer has a desired
performance characteristic.” In all of these cases, git bisect can help you isolate and determine
the exact commit that caused the problem.

The git bisect command
systematically chooses a new commit in an ever decreasing range bounded
by good behavior at one end and by bad behavior at the other.
Eventually, the narrowing range will pinpoint the one commit that
introduced the faulty behavior.

There is no need for you to do anything more than provide an
initial good and bad commit and then repeatedly answer the question
“Does this version work?”

To start, you first need to identify a good commit and a bad
commit. In practice, the bad version is often your current HEAD, because that’s where you were working
when you suddenly noticed something wrong or were assigned a bug to
fix.

Finding an initial good version can be a bit difficult, because
it’s usually buried in your history somewhere. You can probably name or
guess some version back in the history of the repository that you know
works correctly. This may be a tagged release like v2.6.25 or some commit 100 revisions ago,
master~100, on your master branch.
Ideally, it is close to your bad commit (master~25 is better than master~100) and not buried too far in the
past. In any event, you need to know or be able to verify that it is, in
fact, a good commit.

It is essential that you start the git bisect
process from a clean working directory. The process
necessarily adjusts your working directory to contain various different
versions of your repository. Starting with a dirty work space is asking
for trouble; your working directory could easily be lost.

Using a clone of the Linux kernel in our example, let’s tell Git
to begin a search:

    $ cd linux-2.6
    $ git bisect start

After initiating a bisection search, Git enters a bisect
mode, setting up some state information for itself. Git employs a
detached HEAD to manage the current checked-out
version of the repository. This detached HEAD is essentially an anonymous branch that
can be used to bounce around within the repository and point to
different revisions as needed.

Once started, tell Git which commit is bad. Again, because this is
typically your current version, you can simply default the revision to
your current HEAD.[15]

    # Tell git the HEAD version is broken
    $ git bisect bad

Similarly, tell Git which version works:

    $ git bisect good v2.6.27
    Bisecting: 3857 revisions left to test after this
    [cf2fa66055d718ae13e62451bb546505f63906a2] Merge branch 'for_linus'
        of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6

Identifying a good and bad version delineates a range of commits
over which a good to bad transition occurs. At each step along the way,
Git will tell you how many revisions are in that range. Git also
modifies your working directory by checking out a revision that is
roughly midway between the good and bad end points. It is now up to you
to answer the question: “Is this version good or bad?” Each
time you answer this question, Git cuts the search space in half,
identifies a new revision, checks it out, and repeats the “good or
bad?” question.

Suppose this version is good:

    $ git bisect good
    Bisecting: 1939 revisions left to test after this
    [2be508d847392e431759e370d21cea9412848758] Merge git://git.infradead.org/mtd-2.6

Notice that 3,857 revisions have been narrowed down to 1,939.
Let’s do a few more:

    $ git bisect good
    Bisecting: 939 revisions left to test after this
    [b80de369aa5c7c8ce7ff7a691e86e1dcc89accc6] 8250: Add more OxSemi devices

    $ git bisect bad
    Bisecting: 508 revisions left to test after this
    [9301975ec251bab1ad7cfcb84a688b26187e4e4a] Merge branch 'genirq-v28-for-linus'
        of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

In a perfect bisection run, it takes about log2(N) steps, where N is
the original number of revisions, to narrow the range down to just one
commit.
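
As a rough worked example of that estimate, the helper below counts how many halvings are needed to reduce a range to a single commit. The function name bisect_steps is hypothetical, and real runs can deviate from the ideal because history is not always linear.

```shell
# Smallest number of halvings that reduces n candidates to one,
# that is, the ceiling of log2(n), using integer arithmetic only.
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 3857    # prints 12, since 2^11 = 2048 < 3857 <= 4096 = 2^12
```

For the 3,857 revisions reported at the first step of the session above, that suggests roughly a dozen answers before Git can name a single commit.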

After another good and bad answer:

    $ git bisect good
    Bisecting: 220 revisions left to test after this
    [7cf5244ce4a0ab3f043f2e9593e07516b0df5715] mfd: check for
        platform_get_irq() return value in sm501

    $ git bisect bad
    Bisecting: 104 revisions left to test after this
    [e4c2ce82ca2710e17cb4df8eb2b249fa2eb5af30] ring_buffer: allocate
        buffer page pointer

Throughout the bisection process, Git maintains a log of your
answers along with their commit IDs.

    $ git bisect log
    git bisect start
    # bad: [49fdf6785fd660e18a1eb4588928f47e9fa29a9a] Merge branch
        'for-linus' of git://git.kernel.dk/linux-2.6-block
    git bisect bad 49fdf6785fd660e18a1eb4588928f47e9fa29a9a
    # good: [3fa8749e584b55f1180411ab1b51117190bac1e5] Linux 2.6.27
    git bisect good 3fa8749e584b55f1180411ab1b51117190bac1e5
    # good: [cf2fa66055d718ae13e62451bb546505f63906a2] Merge branch 'for_linus'
        of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
    git bisect good cf2fa66055d718ae13e62451bb546505f63906a2
    # good: [2be508d847392e431759e370d21cea9412848758] Merge
        git://git.infradead.org/mtd-2.6
    git bisect good 2be508d847392e431759e370d21cea9412848758
    # bad: [b80de369aa5c7c8ce7ff7a691e86e1dcc89accc6] 8250: Add more
        OxSemi devices
    git bisect bad b80de369aa5c7c8ce7ff7a691e86e1dcc89accc6
    # good: [9301975ec251bab1ad7cfcb84a688b26187e4e4a] Merge branch
        'genirq-v28-for-linus' of
        git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
    git bisect good 9301975ec251bab1ad7cfcb84a688b26187e4e4a
    # bad: [7cf5244ce4a0ab3f043f2e9593e07516b0df5715] mfd: check for
        platform_get_irq() return value in sm501
    git bisect bad 7cf5244ce4a0ab3f043f2e9593e07516b0df5715

If you get lost during the process, or if you just want to
start over for any reason, type the git bisect replay
command using the log file as input. If needed, this is
an excellent mechanism to back up one step in the process and explore a
different path.
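
One way to back up a step is to save the log, delete the most recent answer, and replay the result. The sketch below tries this in a throwaway repository of eight empty commits; the repository, file names, and the choice of which answer to retract are all assumptions made for illustration.

```shell
set -e
# Throwaway repository; identity and commit names are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
for i in 1 2 3 4 5 6 7 8; do
    git commit -q --allow-empty -m "commit $i"
done

git bisect start >/dev/null
git bisect bad >/dev/null 2>&1             # HEAD (commit 8) is bad
git bisect good master~7 >/dev/null 2>&1   # commit 1 is good
first_candidate=$(git rev-parse HEAD)

git bisect good >/dev/null 2>&1            # the answer to take back
second_candidate=$(git rev-parse HEAD)

# Save the log, drop its last line (the most recent answer), and replay.
logfile=$(mktemp)
git bisect log > "$logfile"
sed '$d' "$logfile" > "$logfile.trimmed"
git bisect replay "$logfile.trimmed" >/dev/null 2>&1
replayed=$(git rev-parse HEAD)

git bisect reset >/dev/null 2>&1
```

After the replay, the checked-out revision should again be the first candidate, so the retracted question can be answered differently.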

Let’s narrow down the defect with five more bad
answers:

    $ git bisect bad
    Bisecting: 51 revisions left to test after this
    [d3ee6d992821f471193a7ee7a00af9ebb4bf5d01] ftrace: make it
        depend on DEBUG_KERNEL

    $ git bisect bad
    Bisecting: 25 revisions left to test after this
    [3f5a54e371ca20b119b73704f6c01b71295c1714] ftrace: dump out
        ftrace buffers to console on panic

    $ git bisect bad
    Bisecting: 12 revisions left to test after this
    [8da3821ba5634497da63d58a69e24a97697c4a2b] ftrace: create
        _mcount_loc section

    $ git bisect bad
    Bisecting: 6 revisions left to test after this
    [fa340d9c050e78fb21a142b617304214ae5e0c2d] tracing: disable
        tracepoints by default

    $ git bisect bad
    Bisecting: 2 revisions left to test after this
    [4a0897526bbc5c6ac0df80b16b8c60339e717ae2] tracing: tracepoints, samples

You may use git bisect visualize
to inspect the set of commits still within
the range of consideration. Git uses the graphical tool gitk if the DISPLAY environment variable is set. If not,
then Git will use git log instead. In
that case, --pretty=oneline might be useful,
too.

    $ git bisect visualize --pretty=oneline

    fa340d9c050e78fb21a142b617304214ae5e0c2d tracing: disable tracepoints
        by default
    b07c3f193a8074aa4afe43cfa8ae38ec4c7ccfa9 ftrace: port to tracepoints
    0a16b6075843325dc402edf80c1662838b929aff tracing, sched: LTTng
        instrumentation - scheduler
    4a0897526bbc5c6ac0df80b16b8c60339e717ae2 tracing: tracepoints, samples
    24b8d831d56aac7907752d22d2aba5d8127db6f6 tracing: tracepoints,
        documentation
    97e1c18e8d17bd87e1e383b2e9d9fc740332c8e2 tracing: Kernel Tracepoints

The current revision under consideration is roughly in the middle
of the range.

    $ git bisect good
    Bisecting: 1 revisions left to test after this
    [b07c3f193a8074aa4afe43cfa8ae38ec4c7ccfa9] ftrace: port to tracepoints

When you finally test the last revision and Git has isolated the
one revision that introduced the
problem,[16] it’s displayed:

    $ git bisect good
    fa340d9c050e78fb21a142b617304214ae5e0c2d is first bad commit
    commit fa340d9c050e78fb21a142b617304214ae5e0c2d
    Author: Ingo Molnar <mingo@elte.hu>
    Date:   Wed Jul 23 13:38:00 2008 +0200

    tracing: disable tracepoints by default

    while it's arguably low overhead, we dont enable new features by default.

    Signed-off-by: Ingo Molnar <mingo@elte.hu>

    :040000 040000 4bf5c05869a67e184670315c181d76605c973931
        fd15e1c4adbd37b819299a9f0d4a6ff589721f6c M  init

Finally, when your bisection run is complete and you are finished
with the bisection log and the saved state, it is vital that you tell
Git that you have finished. As you may recall, the whole bisection
process is performed on a detached HEAD:

    $ git branch
    * (no branch)
      master

    $ git bisect reset
    Switched to branch "master"

    $ git branch
    * master

Running git bisect reset
places you back on your original branch.
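
When the good-or-bad question can be answered by a script, the whole loop can be handed to git bisect run, a subcommand not otherwise covered in this section. The following is a minimal sketch in a throwaway repository; the file name notes.txt, the BUG marker, and the commit in which it appears are invented for illustration.

```shell
set -e
# Throwaway repository; identity and commit names are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# Eight commits; the marker BUG appears from commit 6 onward.
for i in 1 2 3 4 5 6 7 8; do
    echo "revision $i" > notes.txt
    if [ "$i" -ge 6 ]; then echo "BUG" >> notes.txt; fi
    git add notes.txt
    git commit -q -m "commit $i"
    if [ "$i" -eq 6 ]; then culprit=$(git rev-parse HEAD); fi
done

# Bad is HEAD, good is the first commit. The test command exits 0 for
# a good revision and 1 for a bad one.
git bisect start HEAD master~7 >/dev/null
git bisect run sh -c '! grep -q BUG notes.txt' >/dev/null
found=$(git rev-parse refs/bisect/bad)     # read before reset removes it
git bisect reset >/dev/null 2>&1

echo "first bad commit: $(git log -1 --format=%s "$found")"
```

The final line should report commit 6, and the reset at the end returns the repository to master just as in the manual session.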

Using git blame

Another tool you can use to help identify a particular
commit is git blame. This command
tells you who last modified each line of a file and which commit made
the change. In the example below, -L 35, limits the output to
line 35 through the end of the file.

    $ git blame -L 35, init/version.c

    4865ecf1 (Serge E. Hallyn 2006-10-02 02:18:14 -0700 35)         },
    ^1da177e (Linus Torvalds  2005-04-16 15:20:36 -0700 36) };
    4865ecf1 (Serge E. Hallyn 2006-10-02 02:18:14 -0700 37) EXPORT_SYMBOL_GPL(init_uts_ns);
    3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 38)
    c71551ad (Linus Torvalds  2007-01-11 18:18:04 -0800 39) /* FIXED STRINGS!
                                                            Don't touch! */
    c71551ad (Linus Torvalds  2007-01-11 18:18:04 -0800 40) const char linux_banner[] =
    3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 41)       "Linux version "
                                                                  UTS_RELEASE "
    3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 42)       (" LINUX_COMPILE_BY "@"
    3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 43)       LINUX_COMPILE_HOST ")
    3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 44)       (" LINUX_COMPILER ")
    3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 45)       " UTS_VERSION "\n";
    3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 46)
    3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 47) const char linux_proc_banner[] =
    3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 48)       "%s version %s"
    3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 49)       " (" LINUX_COMPILE_BY
                                                                  "@"
    3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 50)       LINUX_COMPILE_HOST ")"
    3eb3c740 (Roman Zippel    2007-01-10 14:45:28 +0100 51)       " (" LINUX_COMPILER ")
                                                                  %s\n";
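
The per-line attribution is easy to try on a small scale. This sketch assumes a throwaway repository with a three-line file edited by two made-up authors (Alice and Bob are placeholders) and uses the -L option to narrow the blame to specific lines.

```shell
set -e
# Throwaway repository; identities and file contents are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

printf 'one\ntwo\nthree\n' > notes.txt
git add notes.txt
git commit -q -m "add notes" --author="Alice Example <alice@example.com>"
printf 'one\nTWO\nthree\n' > notes.txt
git commit -q -am "shout line two" --author="Bob Example <bob@example.com>"

# -L 2,2 restricts blame to line 2; -L 2, runs from line 2 to the end.
line1_author=$(git blame --line-porcelain -L 1,1 notes.txt | sed -n 's/^author //p')
line2_author=$(git blame --line-porcelain -L 2,2 notes.txt | sed -n 's/^author //p')
tail_lines=$(git blame -L 2, notes.txt | wc -l | tr -d ' ')

echo "line 1: $line1_author, line 2: $line2_author"
```

Line 1 is still attributed to the first author, while line 2 is attributed to whoever changed it last.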

Using Pickaxe

Whereas git blame
tells you about the current state of a file, git log -Sstring
searches back through the history of a file’s diffs for the given
string. By searching the actual diffs between
revisions, this command can find commits that change the number of
occurrences of the string, whether through additions or deletions.

    $ git log -Sinclude --pretty=oneline --abbrev-commit init/version.c
    cd354f1... [PATCH] remove many unneeded #includes of sched.h
    4865ecf... [PATCH] namespaces: utsname: implement utsname namespaces
    63104ee... kbuild: introduce utsrelease.h
    1da177e... Linux-2.6.12-rc2

Each of the commits listed on the left (cd354f1, etc.) either adds or deletes lines
that contain the word include. Be
careful, though. If a commit adds and removes exactly the same
number of instances of your key phrase, it won’t be shown.
The commit must change the total number of occurrences of the phrase
in order to count.
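
The counting rule can be observed in a small experiment. The sketch below assumes a throwaway repository with a hypothetical file demo.c: one commit introduces an include line, one merely moves it, and one adds a second.

```shell
set -e
# Throwaway repository; identity, file name, and contents are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

printf '#include <stdio.h>\nint x;\n' > demo.c
git add demo.c
git commit -q -m "add file"                # 0 -> 1 occurrence
printf 'int x;\n#include <stdio.h>\n' > demo.c
git commit -q -am "move include"           # 1 -> 1 occurrence
printf 'int x;\n#include <stdio.h>\n#include <stdlib.h>\n' > demo.c
git commit -q -am "add second include"     # 1 -> 2 occurrences

hits=$(git log -Sinclude --format=%s -- demo.c)
echo "$hits"
```

Only "add second include" and "add file" are listed; the commit that moved the line removed one occurrence and added one, so the total did not change.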

The -S option to git log
is called pickaxe. That’s brute
force archaeology for you.


[12] Git also records a mode flag indicating the executability of
each file. Changes in this flag are also part of a changeset.

[13] Yes, you can actually introduce multiple root commits into a
single repository. This happens, for example, when two different
projects and both entire repositories are brought together and
merged into one.

[14] The gitk command is not a
Git subcommand; it is its own independent command and installable
package.

[15] For the curious reader who would like to duplicate this
example, HEAD is commit 49fdf6785fd660e18a1eb4588928f47e9fa29a9a
here.

[16] No, this commit did not necessarily introduce a problem. The
good and bad answers were fabricated
and landed here.
