Git – Diffs

A diff is a compact summary of the
differences (hence the name diff) between two items. For
example, given two files, the Unix and Linux diff command compares the files line by line and
summarizes the deviations in a diff, as shown in Example 8-1. In the example, initial is one version of some prose and
rewrite is a subsequent revision. The
-u option produces a unified diff, a
standardized format used widely to share modifications.

Example 8-1. Simple Unix diff
    $ cat initial                $ cat rewrite
    Now is the time              Today is the time
    For all good men             For all good men
    To come to the aid           And women
    Of their country.            To come to the aid
                                 Of their country.

    $ diff -u initial rewrite
    --- initial     1867-01-02 11:22:33.000000000 -0500
    +++ rewrite     2000-01-02 11:23:45.000000000 -0500
    @@ -1,4 +1,5 @@
    -Now is the time
    +Today is the time
     For all good men
    +And women
     To come to the aid
     Of their country.

Let’s look at the diff in detail. In the header, the original
file is denoted by --- and the new file
by +++. The @@ line provides line number context for both file
versions. A line prefixed with a minus sign (-) must be removed from the original file to
produce the new file. Conversely, a line with a leading plus sign (+) must be added to the original file to produce
the new file. A line that begins with a space is the same in both files and
is provided by the -u option as context.

By itself, a diff offers no reason or rationale for a change, nor does
it justify the initial or final state. However, a diff offers more than just
a digest of how files differ. It provides a formal description of how to
transform one file to the other. (You’ll find such instructions useful when
applying or reverting changes.) In addition, diff can be extended to show differences among
multiple files and entire directory hierarchies.
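
As a minimal sketch of a diff acting as transformation instructions, the following recreates the two files from Example 8-1 in a scratch directory, saves the unified diff, and feeds it to the standard patch utility. The file names copy and change.patch are illustrative choices, not part of the original example.

```shell
# Work in a scratch directory so nothing else is touched.
cd "$(mktemp -d)"

printf '%s\n' 'Now is the time' 'For all good men' \
    'To come to the aid' 'Of their country.' > initial
printf '%s\n' 'Today is the time' 'For all good men' 'And women' \
    'To come to the aid' 'Of their country.' > rewrite

# diff exits with status 1 when the files differ, so tolerate that status.
diff -u initial rewrite > change.patch || true

# Apply the diff to a copy of the original: the copy becomes the rewrite.
cp initial copy
patch -i change.patch copy > /dev/null
cmp copy rewrite && echo "copy now matches rewrite"

# Apply the same diff in reverse: the copy returns to the original.
patch -R -i change.patch copy > /dev/null
cmp copy initial && echo "copy is back to initial"
```

The same saved diff serves both directions, which is why a diff is useful for reverting a change as well as applying it.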

The Unix diff command can
compute the differences of all pairs of files found in two directory
hierarchies. The command diff -r
traverses each hierarchy in tandem, twins files by pathname (say, original/src/main.c and new/src/main.c), and summarizes the differences
between each pair. Using diff -r -u
produces a set of unified diffs comparing two hierarchies.
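
A small sketch of a recursive comparison, assuming two hypothetical hierarchies named original and new that each hold src/main.c and a README. Only the file that differs should appear in the combined output.

```shell
cd "$(mktemp -d)"
mkdir -p original/src new/src

# The two copies of main.c differ; the two READMEs are identical.
printf 'int main(void) { return 0; }\n' > original/src/main.c
printf 'int main(void) { return 1; }\n' > new/src/main.c
printf 'notes\n' > original/README
printf 'notes\n' > new/README

# Walk both hierarchies in tandem and collect unified diffs for each
# pair of same-named files. Exit status 1 only means differences exist.
diff -r -u original new > tree.diff || true
cat tree.diff
```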

Git has its own diff facility and can likewise produce a digest of
differences. The command git diff can
compare files much akin to Unix’s diff
command. Moreover, like diff -r, Git can
traverse two tree objects and generate a
representation of the variances. But git diff also has its own nuances and powerful features tailored to
the particular needs of Git users.

Note

Technically, a tree object represents only one directory level in
the repository. It contains information on the directory’s immediate files
and immediate subdirectories, but it does not catalog the complete
contents of all subdirectories. However, because a tree object references
the tree objects for each subdirectory, the tree object at the root of the
project effectively represents the entire project at a moment in time.
Hence, we can paraphrase and say git diff traverses two trees.
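
To see the one-level nature of a tree object, here is a brief sketch using git ls-tree in a throwaway repository. The file and directory names, and the identity passed to git config, are placeholders invented for this illustration.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir -p src/lib
echo top > README
echo a > src/main.c
echo b > src/lib/util.c
git add .
git commit -q -m "Initial commit"

# The root tree lists only its immediate entries:
# one blob (README) and one subtree (src).
git ls-tree HEAD

# Following the subtree references reaches every file in the project.
git ls-tree -r --name-only HEAD
```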

In this chapter, we’ll cover some of the basics of git diff and some of its special capabilities. You will learn how to use Git to
show editorial changes in your working directory as well as arbitrary
changes between any two commits within your project history. You will see
how Git’s diff can help you make well-structured commits during your normal
development process and you will also learn how to produce Git patches,
which are described in detail in Chapter 14.

Forms of the git diff Command

If you pick two different root-level tree objects for
comparison, git diff yields all
deviations between the two project states. That’s powerful. You could use
such a diff to convert wholesale from one project state to another. For
example, if you and a co-worker are
developing code for the same project, a root-level diff could effectively
sync the repositories at any time.

There are three basic sources for tree or treelike objects to use
with git diff:

  • Any tree object anywhere within the entire commit graph

  • Your working directory

  • The index

Typically, the trees compared in a git diff command are named via commits, branch
names, or tags, but any commit name discussed in Identifying Commits of Chapter 6 suffices. Also, both the file and directory
hierarchy of your working directory, as well as the complete hierarchy of
files staged in the index, can be treated as trees.

The git diff command can perform
four fundamental comparisons using various combinations of those three
sources.

git diff

git diff shows the
difference between your working directory and the index. It exposes
what is dirty in your working directory and
is thus a candidate to stage for your next
commit. This command does not reveal differences between what’s in
your index and what’s permanently stored in the repository (not to
mention remote repositories you might be working with).

git diff commit

This form summarizes the differences between your working
directory and the given commit. Common
variants of this command name HEAD or a particular branch name as the
commit.

git diff --cached commit

This command shows the differences between the staged
changes in the index and the given
commit. A common commit for the
comparison—and the default if no commit is specified—is HEAD. With HEAD, this command shows you how your next
commit will alter the current branch.

If the option --cached doesn’t make sense to
you, perhaps the synonym --staged will. It is
available in Git version 1.6.1 and later.

git diff commit1 commit2

If you specify two arbitrary commits, the command
displays the differences between the two. This command ignores
the index and working directory, and it is the workhorse for
arbitrary comparisons between two trees that are already in your
object store.

The number of parameters on the command line determines what
fundamental form is used and what is compared. You can compare any two
commits or trees. What’s being compared need not have a direct or even an
indirect parent–child relationship. If you don’t supply a tree object or
two, then git diff compares implied
sources, such as your index or working directory.

Let’s examine how these different forms apply to Git’s object model.
The example in Figure 8-1 shows a project
directory with two files. The file file1 has been modified in the working
directory, changing its content from foo to
quux. That change has been staged in the index using
git add file1, but it is not yet
committed.

Figure 8-1. Various file versions that can be compared

A version of the file file1
from each of your working directory, the index, and the HEAD have been identified. Even though the
version of file1 that is in the
index, bd71363, is actually stored as a
blob object in the object store, it is indirectly referenced through the
virtual tree object that is the index. Similarly, the HEAD version of the file, a23bf, is also indirectly referenced through
several steps.

This example nominally demonstrates the changes within file1. The bold arrows in the figure point to
the tree or virtual tree objects to remind you that the comparison is
actually based on complete trees and not just on individual files.

From Figure 8-1, you can see how using
git diff without arguments is a good
technique for verifying the readiness of your next commit. As long as that
command emits output, you have edits or changes in your working directory
that are not yet staged. Check the edits on each file. If you are
satisfied with your work, use git add
to stage the file. Once you stage a changed file, the next git diff no longer yields diff output for that
file. In this way, you can step progressively through each dirty file in
your working directory until the differences disappear, meaning that all
files are staged in your index. Don’t forget to check for new or deleted
files, too. At any time during the staging process, the command git diff --cached shows the complementary
changes, or those changes already staged in the index that will be present
in your next commit. When you’re finished, git commit captures all changes in your index into a new
commit.

You are not required to stage all the changes from your working
directory for a single commit. In fact, if you find you have conceptually
different changes in your working directory that should be made in
different commits, you can stage one set at a time, leaving the other
edits in your working directory. A commit captures only your staged
changes. Repeat the process, staging the next set of files appropriate for
a subsequent commit.
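
The staging workflow described above might look like the following sketch. The files parser.c and README and the commit messages are hypothetical, chosen only to stand in for two conceptually separate changes.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "code" > parser.c
echo "docs" > README
git add .
git commit -q -m "Initial commit"

# Two unrelated edits now sit in the working directory.
echo "bug fix" >> parser.c
echo "new section" >> README

# Stage only the first change.
git add parser.c
git diff --cached --name-only   # staged for the next commit: parser.c
git diff --name-only            # still unstaged: README

git commit -q -m "Fix parser bug"

# Repeat for the second change.
git add README
git commit -q -m "Document new section"

# Working directory, index, and HEAD now agree, so this prints nothing.
git diff HEAD --name-only
```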

The astute reader might have noticed that, although there are four
fundamental forms of the git diff
command, only three are highlighted with bold arrows in Figure 8-1. So, what is the fourth? There is only one
tree object represented by your working directory, and there is only one tree object
represented by the index. In the example, there is one commit in the
object store along with its tree. However, the object store is likely to
have many commits named by different branches and tags, all of which have
trees that can be compared with git diff. Thus, the fourth form of git diff simply compares any two arbitrary commits (trees) already
stored within the object store.

In addition to the four basic forms of git diff, there are myriad options as well. Here
are a few of the more useful ones.

-M

The -M option detects renames and
generates a simplified output that simply records the file rename
rather than the complete removal and subsequent addition of the
source file. If the rename is not a pure rename but also has some
additional content changes, Git calls those out.

-w or --ignore-all-space

Both -w and
--ignore-all-space compare lines without
considering changes in whitespace as significant.

--stat

The --stat option adds statistics
about the difference between any two tree states. It reports in a
compact syntax how many lines changed in each file and, in total, how
many lines were added and how many were removed.

--color

The --color option colorizes the
output; a unique color represents each of the different types of
changes present in the diff.

Finally, the git diff may be
limited to show diffs for a specific set of files or
directories.
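
The sketch below tries a few of these options on a scratch repository. The file names are made up for the illustration, and recent Git versions may detect renames even without -M; it is given here explicitly to match the description above.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'alpha\nbeta\ngamma\n' > notes.txt
printf 'key = value\n' > settings.conf
git add .
git commit -q -m "Initial commit"

# Stage a pure rename and a whitespace-only edit.
git mv notes.txt journal.txt
printf 'key   =   value\n' > settings.conf
git add settings.conf

# -M records the move as a single rename entry (status R) instead of
# a deletion of notes.txt plus an addition of journal.txt.
git diff --cached -M --name-status

# -w treats the whitespace-only edit as insignificant, so no hunks
# are produced for settings.conf.
git diff --cached -w -- settings.conf

# --stat summarizes the change; the trailing path limits the report
# to settings.conf only.
git diff --cached --stat -- settings.conf
```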

Warning

The -a option for git diff does nothing even remotely like the
-a option for git commit. To get both staged and unstaged changes, use git diff HEAD. The lack of symmetry is
unfortunate and counterintuitive.

Simple git diff Example

Here we construct the scenario presented in Figure 8-1, run through the scenario, and watch the
various forms of git diff in action.
First, let’s set up a simple repository with two files in it.

    $ mkdir /tmp/diff_example
    $ cd /tmp/diff_example

    $ git init
    Initialized empty Git repository in /tmp/diff_example/.git/

    $ echo "foo" > file1
    $ echo "bar" > file2

    $ git add file1 file2

    $ git commit -m "Add file1 and file2"
    [master (root-commit)]: created fec5ba5: "Add file1 and file2"
     2 files changed, 2 insertions(+), 0 deletions(-)
     create mode 100644 file1
     create mode 100644 file2

Next, let’s edit file1 by
replacing the word foo with quux.

    $ echo "quux" > file1

The file1 has been modified in
the working directory but has not been staged. This state is not yet the
situation depicted in Figure 8-1, but you can
still make a comparison. You should expect output if you compare the
working directory with the index or the existing HEAD versions. However, there should be no
difference between the index and the HEAD because nothing has been staged. (In other
words, what is staged is the current HEAD tree still.)

    # working directory versus index
    $ git diff
    diff --git a/file1 b/file1
    index 257cc56..d90bda0 100644
    --- a/file1
    +++ b/file1
    @@ -1 +1 @@
    -foo
    +quux

    # working directory versus HEAD
    $ git diff HEAD
    diff --git a/file1 b/file1
    index 257cc56..d90bda0 100644
    --- a/file1
    +++ b/file1
    @@ -1 +1 @@
    -foo
    +quux

    # index vs HEAD, identical still
    $ git diff --cached
    $

Applying the maxim just given, git diff produced output and so file1 could be staged. Let’s do this
now.

    $ git add file1

    $ git status
    # On branch master
    # Changes to be committed:
    #   (use "git reset HEAD <file>..." to unstage)
    #
    #       modified:   file1

Here you have exactly duplicated the situation described by Figure 8-1. Because file1 is now staged, the working directory and
the index are synchronized and should not show any differences. However,
there are now differences between the HEAD version and both the working directory and
the staged version in the index.

    # working directory versus index
    $ git diff

    # working directory versus HEAD
    $ git diff HEAD
    diff --git a/file1 b/file1
    index 257cc56..d90bda0 100644
    --- a/file1
    +++ b/file1
    @@ -1 +1 @@
    -foo
    +quux

    # index vs HEAD
    $ git diff --cached
    diff --git a/file1 b/file1
    index 257cc56..d90bda0 100644
    --- a/file1
    +++ b/file1
    @@ -1 +1 @@
    -foo
    +quux

If you ran git commit now, the
new commit would capture the staged changes shown by the last command,
git diff --cached (which, as mentioned
before, has the new synonym git diff --staged).

Now, to throw a monkey wrench in the works, what would happen if you
edited file1 before making a commit?
Let’s see!

    $ echo "baz" > file1

    # wd versus index
    $ git diff
    diff --git a/file1 b/file1
    index d90bda0..7601807 100644
    --- a/file1
    +++ b/file1
    @@ -1 +1 @@
    -quux
    +baz

    # wd versus HEAD
    $ git diff HEAD
    diff --git a/file1 b/file1
    index 257cc56..7601807 100644
    --- a/file1
    +++ b/file1
    @@ -1 +1 @@
    -foo
    +baz

    # index vs HEAD
    $ git diff --cached
    diff --git a/file1 b/file1
    index 257cc56..d90bda0 100644
    --- a/file1
    +++ b/file1
    @@ -1 +1 @@
    -foo
    +quux

All three diff operations show some form of difference now!
But which version will be committed? Remember, git commit captures the state present in the
index. And what’s in the index? It’s the content revealed by the git diff --cached or git diff --staged command: the version of
file1 that contains the word quux!

    $ git commit -m "quux uber alles"
    [master]: created f8ae1ec: "quux uber alles"
     1 files changed, 1 insertions(+), 1 deletions(-)

Now that the object store has two commits in it, let’s try
the general form of the git diff
command.

    # Previous HEAD version versus current HEAD
    $ git diff HEAD^ HEAD
    diff --git a/file1 b/file1
    index 257cc56..d90bda0 100644
    --- a/file1
    +++ b/file1
    @@ -1 +1 @@
    -foo
    +quux

This diff confirms that the previous commit changed file1 by replacing foo with
quux.

So is everything synchronized now? No. The working directory copy of
file1 contains
baz.

    $ git diff
    diff --git a/file1 b/file1
    index d90bda0..7601807 100644
    --- a/file1
    +++ b/file1
    @@ -1 +1 @@
    -quux
    +baz

git diff and Commit Ranges

There are two additional forms of git diff that bear some explanation, especially
in contrast to git log.

The git diff command
supports a double-dot syntax to represent the difference between two commits. Thus, the following two
commands are equivalent:

    $ git diff master bug/pr-1
    $ git diff master..bug/pr-1

Unfortunately, the double-dot syntax in git diff means something quite different from the same syntax in
git log, which you learned about in
Chapter 6. It’s worth comparing git diff and git log in this regard because doing so highlights the relationship
of these two commands to changes made in repositories. Some points to keep
in mind for the following example:

  • git diff doesn’t care about
    the history of the files it compares or anything about branches

  • git log is extremely
    conscious of how one file changed to become another—for example, when
    two branches diverged and what happened on each branch

The log and diff commands perform two fundamentally
different operations. Whereas log
operates on a set of commits, diff
operates on two different end points.

Imagine the following sequence of events:

  1. Someone creates a new branch off the master branch to fix bug pr-1, calling the new branch bug/pr-1.

  2. The same developer adds the line Fix Problem report 1 to a file in the bug/pr-1
    branch.

  3. Meanwhile, another developer fixes bug pr-3 in the master branch, adding the line Fix
    Problem report 3 to the same file in the master branch.

In short, one line was added to a file in each branch. If you look
at the changes to branches at a high level, you can see when the bug/pr-1 branch was launched and when each
change was made:

    $ git show-branch master bug/pr-1
    * [master] Added a bug fix for pr-3.
     ! [bug/pr-1] Fix Problem Report 1
    --
    *  [master] Added a bug fix for pr-3.
     + [bug/pr-1] Fix Problem Report 1
    *+ [master^] Added Bob's fixes.

If you type git log -p master..bug/pr-1, you will see one commit, because the syntax
master..bug/pr-1 represents all those
commits in bug/pr-1 that are not also
in master. The command traces back to
the point where bug/pr-1 diverged from
master, but it does not look at
anything that happened to master since
that point.

    $ git log -p master..bug/pr-1
    commit 8f4cf5757a3a83b0b3dbecd26244593c5fc820ea
    Author: Jon Loeliger <jdl@example.com>
    Date:   Wed May 14 17:53:54 2008 -0500

    Fix Problem Report 1

    diff --git a/ready b/ready
    index f3b6f0e..abbf9c5 100644
    --- a/ready
    +++ b/ready
    @@ -1,3 +1,4 @@
     stupid
     znill
     frot-less
    +Fix Problem report 1

In contrast, git diff master..bug/pr-1 shows the total set of differences between the
two trees represented by the heads of the master and bug/pr-1 branches. History doesn’t matter; only
the current state of the files does.

    $ git diff master..bug/pr-1
    diff --git a/ready b/ready
    index f3b6f0e..abbf9c5 100644
    --- a/ready
    +++ b/ready
    @@ -1,4 +1,4 @@
     stupid
     znill
     frot-less
    -Fix Problem report 3
    +Fix Problem report 1

To paraphrase the git diff
output, you can change the file in the master branch to the version in the bug/pr-1 branch by removing the line Fix
Problem report 3 and then adding the line Fix Problem
report 1 to the file.
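
The scenario can be reproduced with a sketch like the one below, which rebuilds the ready file and both branches in a scratch repository. The identity settings are placeholders, and the commit IDs you see will differ from those printed earlier.

```shell
cd "$(mktemp -d)"
git init -q
# Name the initial branch master regardless of local defaults.
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

printf 'stupid\nznill\nfrot-less\n' > ready
git add ready
git commit -q -m "Added Bob's fixes."

# One developer fixes pr-1 on a new branch.
git checkout -q -b bug/pr-1
echo 'Fix Problem report 1' >> ready
git commit -q -a -m 'Fix Problem Report 1'

# Another developer fixes pr-3 on master.
git checkout -q master
echo 'Fix Problem report 3' >> ready
git commit -q -a -m 'Added a bug fix for pr-3.'

# log: only the commits on bug/pr-1 that are not on master (one commit).
git log --oneline master..bug/pr-1

# diff: the two branch tips compared directly, so both lines show up.
git diff master..bug/pr-1
```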

As you can see, this diff reflects changes made on both branches. This
may not seem crucial with this small example, but consider the example in
Figure 8-2 with more expansive lines of
development on two branches.

Figure 8-2. git diff larger history

In this case, git log master..maint represents the five individual commits V, W, …,
Z. On the other hand, git diff master..maint represents the
differences in the trees at H and
Z, an accumulated 11 commits: C, D, …,
H and V, …, Z.

Similarly, both git log
and git diff accept the form
commit1...commit2 to produce a symmetric difference. As
before, however, git log commit1...commit2 and git diff commit1...commit2 yield different results.

As discussed in Commit Ranges of Chapter 6, the command git log commit1...commit2
displays the commits reachable from either commit but not both.
Thus, git log master...maint in the
previous example would yield C,
D, …, H and V, …,
Z.

The symmetric difference in git diff shows the differences between a commit that is a common
ancestor (or merge base) of
commit1 and commit2, and commit2 itself.
Given the same genealogy in Figure 8-2,
git diff master...maint combines the
changes in the commits V, W, …, Z.
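
One way to check this reading of the three-dot form is to compare it with an explicit diff from the merge base, as in the sketch below. The tiny history and its file names are invented for the illustration and are much smaller than the one in Figure 8-2.

```shell
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo base > shared.txt
git add shared.txt
git commit -q -m "Common ancestor"

# One commit on maint ...
git checkout -q -b maint
echo "maint work" > maint.txt
git add maint.txt
git commit -q -m "Work on maint"

# ... and one on master after the branches diverged.
git checkout -q master
echo "master work" > master.txt
git add master.txt
git commit -q -m "Work on master"

base=$(git merge-base master maint)

# These two commands report the same thing: only what maint
# changed since it diverged from master.
git diff --name-status master...maint
git diff --name-status "$base" maint

# The two-dot form compares the branch tips, so the file added
# on master also appears in the report.
git diff --name-status master..maint
```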

git diff with Path Limiting

By default, the command git diff operates on the entire directory structure rooted at a
given tree object. However, you can leverage the same path
limiting technique employed by git log to limit the output of git diff to a subset of the repository.

For example, at one point[18] in the development of Git’s own repository, git diff --stat displayed this:

    $ git diff --stat master~5 master
     Documentation/git-add.txt            |    2 +-
     Documentation/git-cherry.txt         |    6 +++++
     Documentation/git-commit-tree.txt    |    2 +-
     Documentation/git-format-patch.txt   |    2 +-
     Documentation/git-gc.txt             |    2 +-
     Documentation/git-gui.txt            |    4 +-
     Documentation/git-ls-files.txt       |    2 +-
     Documentation/git-pack-objects.txt   |    2 +-
     Documentation/git-pack-redundant.txt |    2 +-
     Documentation/git-prune-packed.txt   |    2 +-
     Documentation/git-prune.txt          |    2 +-
     Documentation/git-read-tree.txt      |    2 +-
     Documentation/git-remote.txt         |    2 +-
     Documentation/git-repack.txt         |    2 +-
     Documentation/git-rm.txt             |    2 +-
     Documentation/git-status.txt         |    2 +-
     Documentation/git-update-index.txt   |    2 +-
     Documentation/git-var.txt            |    2 +-
     Documentation/gitk.txt               |    2 +-
     builtin-checkout.c                   |    7 ++++-
     builtin-fetch.c                      |    6 ++--
     git-bisect.sh                        |   29 ++++++++++++--------------
     t/t5518-fetch-exit-status.sh         |   37 ++++++++++++++++++++++++++++++++++
     23 files changed, 83 insertions(+), 40 deletions(-)

To limit the output to just Documentation changes, you could instead use
git diff --stat master~5 master Documentation:

    $ git diff --stat master~5 master Documentation
     Documentation/git-add.txt            |    2 +-
     Documentation/git-cherry.txt         |    6 ++++++
     Documentation/git-commit-tree.txt    |    2 +-
     Documentation/git-format-patch.txt   |    2 +-
     Documentation/git-gc.txt             |    2 +-
     Documentation/git-gui.txt            |    4 ++--
     Documentation/git-ls-files.txt       |    2 +-
     Documentation/git-pack-objects.txt   |    2 +-
     Documentation/git-pack-redundant.txt |    2 +-
     Documentation/git-prune-packed.txt   |    2 +-
     Documentation/git-prune.txt          |    2 +-
     Documentation/git-read-tree.txt      |    2 +-
     Documentation/git-remote.txt         |    2 +-
     Documentation/git-repack.txt         |    2 +-
     Documentation/git-rm.txt             |    2 +-
     Documentation/git-status.txt         |    2 +-
     Documentation/git-update-index.txt   |    2 +-
     Documentation/git-var.txt            |    2 +-
     Documentation/gitk.txt               |    2 +-
     19 files changed, 25 insertions(+), 19 deletions(-)

Of course, you can view the diffs for a single file, too.

    $ git diff master~5 master Documentation/git-add.txt
    diff --git a/Documentation/git-add.txt b/Documentation/git-add.txt
    index bb4abe2..1afd0c6 100644
    --- a/Documentation/git-add.txt
    +++ b/Documentation/git-add.txt
    @@ -246,7 +246,7 @@ characters that need C-quoting.  `core.quotepath` configuration can be
     used to work this limitation around to some degree, but backslash,
     double-quote and control characters will still have problems.

    -See Also
    +SEE ALSO
     --------
     linkgit:git-status[1]
     linkgit:git-rm[1]

In the following example, also taken from Git’s own repository, the
-S"string" option searches the past
50 commits to the master branch for
changes containing string.

    $ git diff -S"octopus" master~50
    diff --git a/Documentation/RelNotes-1.5.5.3.txt b/Documentation/RelNotes-1.5.5.3.txt
    new file mode 100644
    index 0000000..f22f98b
    --- /dev/null
    +++ b/Documentation/RelNotes-1.5.5.3.txt
    @@ -0,0 +1,12 @@
    +GIT v1.5.5.3 Release Notes
    +==========================
    +
    +Fixes since v1.5.5.2
    +--------------------
    +
    + * "git send-email --compose" did not notice that non-ascii contents
    +   needed some MIME magic.
    +
    + * "git fast-export" did not export octopus merges correctly.
    +
    +Also comes with various documentation updates.

Used with -S, often called the
pickaxe, Git lists the diffs that contain a change
in the number of times the given string is used
in the diff. Conceptually, you can think of this as “Where is the
given string either introduced or
removed?”
You can find an example of the pickaxe used with git log in Using Pickaxe of Chapter 6.
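
A compact sketch of the pickaxe on a scratch repository follows. The two files and their contents are hypothetical; the point is that both files change between the commits, yet only one of them changes the number of occurrences of the search string.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "release notes" > notes.txt
echo "unrelated" > other.txt
git add .
git commit -q -m "Initial commit"

# Both files change, but only notes.txt gains the word "octopus".
echo "fast-export now handles octopus merges" >> notes.txt
echo "more unrelated text" >> other.txt
git commit -q -a -m "Second commit"

# Without -S, both changed files are listed.
git diff --name-only HEAD~1 HEAD

# With -S, only the file in which the count of "octopus" changed is listed.
git diff -S"octopus" --name-only HEAD~1 HEAD
```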

Comparing How Subversion and Git Derive diffs

Most systems, such as CVS or SVN, track a series of
revisions and store just the changes between each successive pair of file revisions. This
technique is meant to save storage space and overhead.

Internally, such systems spend a lot of time thinking about things
like the series of changes between A and B. When you update
your files from the central repository, for example, SVN remembers that
the last time you updated the file you were at revision r1095, but now the repository is at revision
r1123. Thus, the server must send you
the diff between r1095 and r1123. Once your SVN client has these diffs, it
can incorporate them into your working copy and produce r1123. (That’s how SVN avoids sending you all
the contents of all files every time you update.)

To save disk space, SVN also stores its own repository as a series
of diffs on the server. When you ask for the diffs between r1095 and r1123, it looks up all the individual diffs for
each version between those two versions, merges them together into one
large diff, and sends you the result. But Git doesn’t work like
that.

In Git, as you’ve seen, each commit contains a tree, which is a list
of files contained by that commit. Each tree is independent of all other
trees. Git users still talk about diffs and patches, of course, because
these are still extremely useful. Yet, in Git, a diff and a patch are
derived data, not the fundamental data they are in CVS or SVN. If you look
in the .git directory, you won’t find
a single diff; if you look in a SVN repository, it consists mostly of
diffs.

Just as SVN is able to derive the complete set of differences
between r1095 and r1123, Git can retrieve and derive the
differences between any two arbitrary states. But SVN must look at each
version between r1095 and r1123, whereas Git doesn’t care about the
intermediate steps.

Each intermediate revision has its own tree, but Git doesn’t need those to
generate the diff; Git can operate directly on snapshots of the complete
state at each of the two versions. This simple difference in storage
systems is one of the most important reasons that Git is so much faster
than other revision control systems.
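
At the level of Git's object model, the snapshot idea can be observed directly, as in this rough sketch. It reuses the foo and quux contents from the earlier example in a fresh scratch repository; the identity settings are placeholders.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo foo > file1
git add file1
git commit -q -m "Add file1"

echo quux > file1
git commit -q -a -m "Change file1"

# Each commit's tree names a distinct blob for file1.
git rev-parse HEAD~1:file1
git rev-parse HEAD:file1

# Each blob holds the whole file content for that snapshot.
git cat-file -p HEAD~1:file1    # prints: foo
git cat-file -p HEAD:file1      # prints: quux

# The diff is derived on request from the two snapshots.
git diff HEAD~1 HEAD
```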


[18] d2b3691b61d516a0ad2bf700a2a5d9113ceff0b1
