Git – Tips, Tricks, and Techniques

With a plethora of commands and options, Git provides a rich resource
for performing varied and powerful changes to a repository. Sometimes,
though, the actual means for accomplishing some particular task are a bit
elusive. Sometimes, the purpose of a particular command and option isn’t really
clear or becomes lost in a technical description.

This chapter provides a collection of various tips, tricks, and
techniques that highlight Git’s ability to do interesting
transformations.

Interactive Rebase with a Dirty Working Directory

Frequently, when developing a multicommit change sequence on
a local branch, I realize that I
need to make an additional modification to some commit I’ve already made
earlier in the sequence. Rather than scribbling a note about it on the
side and coming back to it later, I will immediately edit and introduce
that change directly into a new commit with a reminder note in the commit
log entry that it should be squashed into a previous commit.

When I eventually get around to cleaning up my commit
sequence and want to use git rebase -i, I am often midstride and
find myself with a dirty working directory. In this case, Git will
refuse to do the rebase.

    $ git show-branch --more=10
    [master] Tinker bar
    [master^] Squash into 'More foo and bar'
    [master~2] Modify bar
    [master~3] More foo and bar
    [master~4] Initial foo and bar.

    $ git rebase -i master~4
    Cannot rebase: You have unstaged changes.
    Please commit or stash them.

As suggested, clean out your dirty working directory with
git stash first!

    $ git stash
    Saved working directory and index state WIP on master: ed6e906 Tinker bar
    HEAD is now at ed6e906 Tinker bar

    $ git rebase -i master~4

    # In the editor, move master^ next to master~3
    # and mark it for squashing.
    pick 1a4be28 More foo and bar
    squash 6195b3d Squash into 'More foo and bar'
    pick 488b893 Modify bar
    pick ed6e906 Tinker bar

    [detached HEAD e3c46b8] More foo and bar with additional stuff.
     2 files changed, 2 insertions(+), 1 deletions(-)
    Successfully rebased and updated refs/heads/master.

Naturally, you will want to recover your working directory changes
now:

    $ git stash pop
    # On branch master
    # Changes not staged for commit:
    #   (use "git add <file>..." to update what will be committed)
    #   (use "git checkout -- <file>..." to discard changes in working directory)
    #
    #    modified:   foo
    #
    no changes added to commit (use "git add" and/or "git commit -a")
    Dropped refs/stash@{0} (71b4655668e49ce88686fc9eda8432430b276470)
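
If your Git is recent enough to support them, the stash, reorder, squash,
and pop steps can likely be folded into one command. The following is only
a sketch in a throwaway repository, not part of the workflow above: it
assumes the --fixup option of git commit and the --autosquash and
--autostash options of git rebase, and the file contents and identity
settings are made up for illustration.

```shell
#!/bin/sh
# Sketch: fixup commit + autosquash + autostash in a scratch repository.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo foo > foo
echo bar > bar
git add foo bar
git commit -q -m "Initial foo and bar."
echo "more foo" >> foo
git commit -q -am "More foo and bar"
target=$(git rev-parse HEAD)          # the commit that needs a later correction

echo "modified" >> bar
git commit -q -am "Modify bar"
echo "additional stuff" >> foo
git commit -q -a --fixup="$target"    # subject becomes "fixup! More foo and bar"
echo "tinker" >> bar
git commit -q -am "Tinker bar"

echo "work in progress" >> foo        # leave the working directory dirty

# GIT_SEQUENCE_EDITOR=true accepts the generated todo list unchanged;
# --autosquash has already moved the fixup commit next to its target.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash --autostash HEAD~4

git log --oneline
git status --short
```

The fixup commit is folded into its target, and the uncommitted change to
foo is stashed before the rebase and restored afterward.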

Remove Left-Over Editor Files

Because the git filter-branch command really drives a shell
operation, the command given to either the --index-filter option or the
--tree-filter option can use normal shell wildcard matching. That can
be handy when you accidentally add, say, temporary editor files on
first creating your repository.

    $ git filter-branch --tree-filter 'rm -f *~' -- --all

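That command removes every file matching the *~ pattern from the commits
reachable from all refs (that is what -- --all selects) in a single pass.
Note that the shell expands *~ only in the top-level directory of each
checked-out tree.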
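
If the stray files also live in subdirectories, an --index-filter variant
may serve better. The following is a sketch under my own assumptions, not
a recipe from this chapter: the file names are invented, the pattern is
handed to git rm as a quoted pathspec so that Git rather than the shell
matches it, and FILTER_BRANCH_SQUELCH_WARNING merely silences the advisory
that newer Git versions print.

```shell
#!/bin/sh
# Sketch: remove editor backup files at any depth with --index-filter.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir src
echo "real content" > notes.txt
echo "backup" > notes.txt~
echo "nested backup" > src/main.c~
git add .
git commit -q -m "Initial import, editor backups included by accident"
echo "more" >> notes.txt
git commit -q -am "Later work"

# The quoted "*~" is a Git pathspec, so it also matches src/main.c~.
git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch -- "*~"' -- --all >/dev/null

# List the files touched by each rewritten commit.
git log --pretty=format:'%h %s' --name-only HEAD
```

Because --index-filter never checks out a tree, it is usually much faster
than --tree-filter on a large history.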

Garbage Collection

In The git fsck Command, I expanded on the concept of reachability
that was first introduced in Chapter 4. In those sections, I explained how the
Git object store and its commit graph might leave unreferenced or dangling
objects within the object store. I also gave a few examples of how some
commands might leave these unreferenced objects in your repository.

Having dangling commits or unreachable objects is not
necessarily bad. You may have moved away from a particular commit
intentionally or added a file blob and then changed it again before
actually committing it. The problem, however, is that over a long period,
manipulating the repository can be messy and leave many unreferenced
objects in your object store.

Historically, within the computer science industry, such
unreferenced objects are cleaned up by an algorithm called garbage
collection. It is the job of the git gc command to perform periodic
garbage collection and keep your repository object stores neat and tidy.

Make that neat, tidy, and small. Git’s garbage collection has one
other very important task: optimizing the size of the repository by
locating unpacked objects (loose objects) and creating pack files for
them.

So when does garbage collection happen, and how often? Is it
automatic or is it something that needs to be done manually? When it runs
does it remove everything it can? Pack everything it can?

All good questions and, as usual, the answers are all, “It
depends.”

For starters, Git runs garbage collection automatically at strategic
times. At other times, you should run git gc directly by hand.

Git runs garbage collection automatically:

  • If there are too many loose objects in the repository

  • When a push to a remote repository happens

  • After some commands that might introduce many loose
    objects

  • When some commands such as git reflog expire explicitly
    request it

And finally, garbage collection occurs when you explicitly request
it using the git gc command. But when
should that be? There’s no solid answer to this question, but there is
some good advice and best practice.

You should consider running git gc manually in a few situations:

  • If you have just completed a git filter-branch. Recall that
    filter-branch rewrites many commits, introduces new ones, and leaves
    the old ones on a ref that should be removed when you are satisfied
    with the results. All those dead objects (which are no longer
    referenced once you remove the one ref pointing to them) should be
    removed via garbage collection.

  • After some commands that might introduce many loose objects.
    This might be a large rebase effort, for example.
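
To see what a manual git gc does to the object store, compare
git count-objects -v before and after. The following sketch builds a
scratch repository with a handful of commits; the file name and commit
count are arbitrary choices of mine.

```shell
#!/bin/sh
# Sketch: watch loose objects turn into a pack file.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

i=1
while [ "$i" -le 5 ]; do
    echo "revision $i" > file
    git add file
    git commit -q -m "Commit $i"
    i=$((i + 1))
done

echo "Before gc:"
git count-objects -v      # "count" is the number of loose objects

git gc -q                 # pack reachable objects, drop the loose copies

echo "After gc:"
git count-objects -v      # loose count drops, "in-pack" and "packs" grow
```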

And on the flip side, when should you be wary of garbage
collection?

  • If there are orphaned refs that you might want to recover

  • In the context of git rerere [44] and you do not need to save the
    resolutions forever

  • When you are relying on anything other than a tag or a branch to
    keep a commit, because only tags and branches are sufficient to
    cause Git to retain a commit permanently

  • In the context of FETCH_HEAD retrievals (URL-direct retrievals via
    git fetch), because the fetched objects are immediately subject to
    garbage collection

Git doesn’t spontaneously jump to life and carry out garbage
collection of its own free will, not even automatically. Instead, what
happens is that certain commands that you run cause Git to then consider
running garbage collection and packing. But just because you run those
commands and Git runs git gc doesn’t mean that Git acts on this
trigger. Instead, Git takes that opportunity to inspect a whole series
of configuration parameters that guide the inner workings of both the
removal of unreferenced objects and the creation of pack files. Some of
the more important git config parameters include:

gc.auto

The number of loose objects allowed to exist in a
repository before garbage collection causes them to be packed. The
default is 6700.

gc.autopacklimit

The number of pack files that may exist in a
repository before pack files are themselves repacked into larger,
more efficient pack files. The default is 50.

gc.pruneexpire

The period of time unreachable objects may linger in
an object store. The default is two weeks.

gc.reflogexpire

The git reflog expire command will remove reflog entries older
than this time period. The default is 90 days.

gc.reflogexpireunreachable

The git reflog expire command will remove reflog entries older
than this time period only if they are unreachable from the current
branch. The default is 30 days.

Most of the garbage collection config parameters have a value that
means either “do it now” or “never do it.”
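
As a small illustration of those special values, the sketch below sets
three of the parameters in a scratch repository. To the best of my
knowledge, 0 disables the loose-object trigger for gc.auto, and the two
expiry settings accept the words now and never; which values suit a real
repository is a judgment call I am not making here.

```shell
#!/bin/sh
# Sketch: "now" and "never" style values for garbage collection settings.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

git config gc.auto 0                # never pack automatically because of loose objects
git config gc.reflogExpire never    # never expire reachable reflog entries
git config gc.pruneExpire now       # prune unreachable objects on the next gc

git config --get-regexp '^gc\.'     # show what was recorded for this repository
```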

Split a Repository

You can use Git’s filter-branch to split a repository or to
extract subdirectories. And in this case, we mean split a repository and
maintain the history that led to this point. (If you don’t care about the
development and commit history and want to split a repository, just clone
the repository and remove the parts from each that you don’t want!) This
approach preserves the appropriate development and commit
history.

For example, let’s say you had a repository with four top-level
directories named part1, part2, part3,
and part4, and you wanted to split the
top-level directory part4 into its own
repository.

For starters, you should work in a clone of the original repository
and remove all of the origin remote
references. This will ensure that you don’t destroy the original
repository, nor will you think you can push or fetch changes from your
original via a lingering remote reference.

Then, use the --subdirectory-filter option like
this:

    $ git filter-branch --subdirectory-filter part4 HEAD

However, there are likely some extenuating circumstances
that will cause you to want to extend that command to allow for incidental
and tricky situations. Do you have tags and want them reflected in the new
part4 repository too? If so, add the
--tag-name-filter cat option. Might a commit end up empty
due to its inapplicability to this subsection of the original repository?
Almost certainly, so add the --prune-empty option too. Are you
interested in only the one current branch indicated by HEAD? Almost
certainly not. Instead, you might
want to cover all branches from the original repository. In that case,
you’ll want to use -- --all in place of the final
HEAD parameter.

The revised command now looks like this:

    $ git filter-branch --prune-empty --tag-name-filter cat \
        --subdirectory-filter part4 -- --all

Naturally, you will want to verify the contents are as expected and
then expire your reflog, remove the original refs, and do garbage
collection on the new repository.
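
Put together, the whole split might look like the following sketch. It
runs in scratch directories; the repository names original and part4-only,
the README files, and the commit messages are placeholders of mine, and
the cleanup at the end is one reasonable way to do it rather than the only
one.

```shell
#!/bin/sh
# Sketch: split part4 out of a four-part repository, then clean up.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
git init -q "$work/original"
cd "$work/original"
for p in part1 part2 part3 part4; do
    mkdir "$p"
    echo "$p" > "$p/README"
done
git add .
git commit -q -m "Add all four parts"
echo "more" >> part4/README
git commit -q -am "Update part4"
echo "other" >> part1/README
git commit -q -am "Update part1 only"

# Work in a clone and cut the tie to the original.
cd "$work"
git clone -q original part4-only
cd part4-only
git remote rm origin

git filter-branch --prune-empty --tag-name-filter cat \
    --subdirectory-filter part4 -- --all >/dev/null

# Verify, then drop the backup refs, expire the reflog, and collect garbage.
git log --oneline
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
    git update-ref -d "$ref"
done
git reflog expire --expire=now --all
git gc -q --prune=now
```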

Finally, you might (or might not) need to return to your original
repository and perform a different git filter-branch to remove part4
from it, too!

Tips for Recovering Commits

Time is the enemy of lost commits. Eventually, Git’s garbage
collection will run and clean out any dangling or unreferenced commits and
blobs. Garbage collection will eventually retire reflog refs as well. At
that point, lost commits are lost and git fsck will no longer be able
to find them. If you know you are
slow to realize a commit has been lost, you may want to adjust the default
timeouts for reflog expiration and retiring unreferenced commits during
garbage collection.

    # default is 90 days
    $ git config --global gc.reflogExpire "6 months"

    # default is 30 days
    $ git config --global gc.reflogExpireUnreachable "60 days"

    # default is 2 weeks
    $ git config --global gc.pruneExpire "1 month"

Sometimes, using a graphical tool such as gitk or viewing a log graph can help find and
establish necessary context for interpreting and understanding the reflog
and other dangling or orphaned commits.

Here are two aliases that you might add to your global .gitconfig:

    # Quote each alias so the shell passes it to git config as one value.
    $ git config --global \
        alias.orphank '!gitk --all $(git reflog | cut -c1-7) &'
    $ git config --global alias.orphanl \
        '!git log --oneline --graph --decorate $(git reflog | cut -c1-7)'
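
While the reflog entry still exists, getting a lost commit back is usually
a matter of pointing a ref at it again. Here is a minimal sketch in a
scratch repository; the branch name recovered and the commit messages are
made up.

```shell
#!/bin/sh
# Sketch: lose a commit with a hard reset, then recover it via the reflog.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo one > file
git add file
git commit -q -m "Keep me"
echo two >> file
git commit -q -am "Oops, about to be lost"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1        # no branch refers to the second commit now
git reflog                        # but the reflog still remembers it

git branch recovered "HEAD@{1}"   # HEAD@{1} is where HEAD was before the reset
git log --oneline recovered
```

Once a branch or tag points at the commit again, garbage collection no
longer threatens it.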

Subversion Conversion Tips

General Advice

Maintaining an SVN repository and a Git repository in
parallel is a lot of work, especially if subsequent new commits to the
SVN repository are allowed. Make absolutely sure that you need to do
this before you commit to this workflow. By far the easiest approach is
to do the SVN to Git conversion once, making the SVN repository
inaccessible when the conversion has been completed.

Plan on doing all of your importing, converting, and cleaning up
once up front before ever publishing the first Git version of your
repository. There are several steps in a well-planned conversion that
you really should do before anyone else has a chance to clone the first
version of your Git repository. For example, all of your global changes,
such as directory renaming, author and email address cleanup, large file
removal, branch fiddling, tag construction, etc., will be significantly
more difficult for both you and your downstream consumers if they happen
after they have cloned the conversion repository.

Do you really want to remove all the SVN commit identifiers from
your Git commit logs? Just because recipes exist to do so and someone
shows you how, doesn’t mean you should. It’s your call.

After doing a conversion, the metadata in the .git directory for the SVN conversion is lost
upon cloning or pushing to a Git repository. Make sure you are
done.

If you can, ensure that you have a good author and email mapping
file prior to doing your import. Having to fix them up later with
git filter-branch is just extra
pain.

If creating and maintaining parallel SVN and Git repositories
seems complicated, and you find you still must use both, using GitHub’s
Subversion Bridge (see Subversion Bridge) is an easy
alternative that meets this requirement.

Remove a Trunk After an SVN Import

Often, after creating a new repository from an SVN import,
you are left with a top-level directory such as trunk that you don’t really want in your Git
repository.

    $ cd OldSVNStuff

    $ ls -R .
    .:
    trunk

    ./trunk:
    Recipes  Stuff  Things

    ./trunk/Recipes:
    Chicken_Pot_Pie  Ice_Cream

    ./trunk/Stuff:
    Note_to_self

    ./trunk/Things:
    Movie_List

There is no real reason to keep trunk. You can use Git’s filter-branch to remove it:

    $ git filter-branch --subdirectory-filter trunk HEAD
    Rewrite b6b4781ee814cbb6fc6a01a91c8d0654ec78fbe1 (1/1)
    Ref 'refs/heads/master' was rewritten

    $ ls
    Recipes  Stuff  Things

Everything under trunk will
be hoisted up one level and the directory trunk will be eliminated.

Removing SVN Commit IDs

First, run git filter-branch --msg-filter using a sed
script to match and delete the SVN commit IDs from your Git log
messages.

    # From the git-filter-branch manual page
    $ git filter-branch --msg-filter 'sed -e "/^git-svn-id:/d"'

Toss the reflog or else it will have lingering references:

    $ git reflog expire --verbose --expire=0 --all

Remember that after a git filter-branch command, Git leaves the
old, original branch refs in refs/original/. You should
remove them and take the garbage out with prejudice:

    # Careful...
    $ rm -rf .git/refs/original

    $ git reflog expire --verbose --expire=0 --all
    $ git gc --prune=0
    $ git repack -ad

Alternatively, clone away from it:

    $ cd /tmp/somewhere/else/
    $ git clone file:///home/jdl/stuff/converted.git

Remember to use a file:/// URL,
because a normal, direct file reference will hard link the files rather
than copy them; that won’t be effective.

Manipulating Branches from Two Repositories

I am occasionally asked the question, “How do I
compare two branches from different repositories?” It is sometimes
asked with slight variations as well: “How do I tell whether my
commits from my repository have been merged into a branch in some other
repository?” Or sometimes something like, “What does the
devel branch in this remote repository
have that isn’t in my repository?”

These are all fundamentally the same question in that they aim to
resolve or compare branches from two different repositories. Developers
are sometimes thrown off by the fact that the branches they wish to
compare are in two or more different repositories, and that those
repositories might also be remote or located on another server.

In order for these questions to make sense at all, the developer
must know that, at some point back in time during the earlier development
of these repositories, they must have had some common ancestor and were
derived from a common basis. Without such a relationship, it makes little
to no sense to even ask how two branches might compare to each other. That
means that Git should be able to discover the commit graph and branch
history of both repositories and be able to relate them.

The key technique for solving all these questions, then, is to
realize that Git can compare branches only within one local repository.
Thus, you need to have all the branches from all the repositories
colocated in one repository. Usually, this is a simple matter of adding a
new remote for each of the
other repositories containing a needed branch, and
then fetching from it.

Once the branches are all in one repository, use any of the usual
diff or comparison commands on those
branches as needed.
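
As a concrete sketch of that technique, the script below builds two
related scratch repositories, adds one as a remote of the other, and then
asks the usual questions with ordinary range syntax. The names mine,
theirs, other, main, and devel are all placeholders I chose.

```shell
#!/bin/sh
# Sketch: bring another repository's branch in as a remote, then compare.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
git init -q "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/main      # fix the branch name for this sketch
echo base > file
git add file
git commit -q -m "Common base"

git clone -q "$work/mine" "$work/theirs"   # the second repository shares history
cd "$work/theirs"
git checkout -q -b devel
echo theirs >> file
git commit -q -am "Work only in theirs"

cd "$work/mine"
echo mine > mine.txt
git add mine.txt
git commit -q -m "Work only in mine"

# Colocate the branches: add a remote and fetch from it.
git remote add other "$work/theirs"
git fetch -q other

git log --oneline main..other/devel        # what their devel has that I lack
git log --oneline other/devel..main        # what I have that their devel lacks
git diff --stat main...other/devel         # their changes since the common ancestor
```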

Recovering from an Upstream Rebase

Sometimes, when working in a distributed environment where
you don’t necessarily control the upstream repository from which you
derived your current development clone, the upstream version of the branch
on which you have developed your work will undergo a non–fast-forward
change or a rebase. That change destroys the basis of your branch, and
prevents you from directly sending your changes upstream.

Unfortunately, Git doesn’t provide a way for an upstream repository
maintainer to state how its branches will be treated. That is, there is no
flag that says “this branch will be rebased at will,” or
“don’t expect this branch to fast-forward.” You, the
downstream developer, just have to know, intuit its intended behavior, or
ask the upstream maintainer. For the most part, other than that, branches
are expected to fast-forward and not be rebased.

Sure, that can be bad. I’ve explained before how changing published
history is bad. Nevertheless, it happens sometimes. Furthermore, there are
some very good development models that even encourage the occasional
rebasing of a branch during the normal course of development. (For an
example, see how the pu, or proposed
updates branches, of the Git repository itself are handled.)

So when it happens, what do you do? How do you recover so that your
work can be sent upstream again?

First, ask yourself whether the rebased branch is really the
right branch on which you should have been basing your work in the first
place. Branches are often intended to be read only. For example, maybe a
collection of branches are being gathered and merged together for testing
purposes into a read only branch, but are otherwise available individually
and should form the basis of development work. In this case, you likely
shouldn’t have been developing on the merged collection branch. (The Linux
next branches tend to operate like
this.)

Depending on the extent of the rebase that occurred
upstream, you may get off easily and be able to recover with a simple
git pull --rebase. Give it a try; if it
works, you win. But I wouldn’t count on it. You should be prepared to
recover from an ensuing mess with a judicious use of the reflog.

The real, more reliable approach is to methodically transfer your
developed and orphaned commit
sequence from your now defunct branch to the new upstream branch. The
basic sequence is to:

  • Rename your old upstream branch. It is important to do
    this before you fetch because it allows a clean fetch of the new
    upstream history. Try something like: git branch save-origin-master
    origin/master.

  • Fetch from upstream to recover the current upstream content. A
    simple git fetch should be
    sufficient.

  • Rebase your commits from the renamed branch onto the new
    upstream branch using commands like cherry-pick or rebase. This
    should be the command: git rebase --onto origin/master
    save-origin-master master.

  • Clean up and remove the temporary branch. Try using the command
    git branch -D save-origin-master.

It seems easy enough, but the key can often be in locating the point
back in the history of the upstream branch where the original history and
the new history begin to diverge. It’s possible that everything between
that point and your first commit isn’t needed at all; that is, the rewritten commit history
changes nothing that intersects with your work. In this case, you win
because a rebase should happen readily. On the other hand, it is also
possible that the rewritten history touches the same ground that you were
developing. In this case, you likely have a tough rebase road ahead of you
and will need to fully understand the semantic meanings of the original
and changed histories in order to figure out how to resolve your desired
development changes.
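
The four steps can be rehearsed safely in scratch repositories. The sketch
below simulates the easy case, in which upstream only rewrites a commit
message so that nothing collides with the local work; the file names and
commit messages are invented for the rehearsal.

```shell
#!/bin/sh
# Sketch: recover local work after upstream rewrites its master branch.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
echo 1 > base
git add base
git commit -q -m "A"
echo 2 > up
git add up
git commit -q -m "B (will be rewritten)"

git clone -q "$work/upstream" "$work/mine"
cd "$work/mine"
echo mine > mine.txt
git add mine.txt
git commit -q -m "My work"

# Upstream rewrites published history.
cd "$work/upstream"
git commit -q --amend -m "B' (rewritten upstream)"

cd "$work/mine"
git branch save-origin-master origin/master    # 1. save the old upstream tip
git fetch -q origin                            # 2. fetch the rewritten history
git rebase -q --onto origin/master \
    save-origin-master master                  # 3. replay only my commits
git branch -q -D save-origin-master            # 4. remove the temporary branch

git log --oneline master
```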

Make Your Own Git Command

Here’s a neat little trick to make your own Git command
that looks like every other git command.

First, write your command or script using a name that begins with
the prefix git-. Place it in your
~/bin directory or some other place
that is found on your shell PATH.

Suppose you wanted a script that checked to see if you were in the
top level of your Git repository. Let’s call it git-top-check, like this:

    #!/bin/sh
    # git-top-check -- Is this the top level directory of a Git repo?

    if [ -d ".git" ]; then
        echo "This is a top level Git development repository."
        exit 0
    fi

    echo "This is not a top level Git development repository."
    exit 1

If you now place that script in the file ~/bin/git-top-check and make it executable, you
can use it like this:

    $ cd ~/Repos/git
    $ git top-check
    This is a top level Git development repository.

    $ cd /etc
    $ git top-check
    This is not a top level Git development repository.
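
Testing for a .git directory is simple, but it can be fooled, for instance
when .git is a file rather than a directory. As a possible variant, and
only a sketch, the script below asks Git itself by way of
git rev-parse --show-cdup, which prints nothing when you are already at
the top level. It installs the command into a temporary directory that I
prepend to PATH purely for the demonstration.

```shell
#!/bin/sh
# Sketch: install a git-top-check variant on PATH and run it as "git top-check".
set -e
bindir=$(mktemp -d)

cat > "$bindir/git-top-check" <<'EOF'
#!/bin/sh
# git-top-check -- variant that asks Git where the top level is.
# "git rev-parse --show-cdup" prints the relative path up to the top
# level and prints an empty line when already there.
if cdup=$(git rev-parse --show-cdup 2>/dev/null) && [ -z "$cdup" ]; then
    echo "This is a top level Git development repository."
    exit 0
fi
echo "This is not a top level Git development repository."
exit 1
EOF
chmod +x "$bindir/git-top-check"
PATH="$bindir:$PATH"
export PATH

repo=$(mktemp -d)
git init -q "$repo"
mkdir "$repo/subdir"

cd "$repo"
at_top=$(git top-check)
cd subdir
below_top=$(git top-check) || true    # a nonzero exit status is expected here

echo "$at_top"
echo "$below_top"
```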

Quick Overview of Changes

If you need to keep a repository up to date by continually
fetching from an upstream source, you may find yourself frequently asking
a question similar to, So, what changed in the last
week?

The answer to your wonderment might be the git whatchanged command.
Like many commands, it accepts a plethora of options centered around
git rev-parse for selecting commits, and formatting options typical
of, say, git log such as the --pretty= options.

Notably, you might want the --since= option.

    # The Git source repository
    $ cd ~/Repos/git
    $ git whatchanged --since="three days ago" --oneline
    745950c p4000: use -3000 when promising -3000
    :100755 100755 d6e505c... 7e00c9d... M  t/perf/p4000-diff-algorithms.sh
    42e52e3 Update draft release notes to 1.7.10
    :100644 100644 ae446e0... a8fd0ac... M  Documentation/RelNotes/1.7.10.txt
    561ae06 perf: export some important test-lib variables
    :100755 100755 f8dd536... cf8e1ef... M  t/perf/p0000-perf-lib-sanity.sh
    :100644 100644 bcc0131... 5580c22... M  t/perf/perf-lib.sh
    1cbc324 perf: load test-lib-functions from the correct directory
    :100755 100755 2ca4aac... f8dd536... M  t/perf/p0000-perf-lib-sanity.sh
    :100644 100644 2a5e1f3... bcc0131... M  t/perf/perf-lib.sh

That’s dense. But we did ask for --oneline! So the
commit log has been summarized in single lines like this:

    561ae06 perf: export some important test-lib variables

And each of those is followed by the list of files that changed
with each commit:

    :100755 100755 f8dd536... cf8e1ef... M  t/perf/p0000-perf-lib-sanity.sh
    :100644 100644 bcc0131... 5580c22... M  t/perf/perf-lib.sh

Each line shows the file mode bits before and after the commit, the
SHA1s of each blob before and after the commit, a status letter
(M here means modified content or mode bits), and
finally the path of the blob that changed.

Although the previous example defaulted the branch reference to
master, you could pick anything of
interest, or explicitly request the
set of changes that were just fetched:

    $ git whatchanged ORIG_HEAD..HEAD

You can also limit the output to the set of changes that affect a
named file:

    $ cd /usr/src/linux
    $ git pull

    $ git whatchanged ORIG_HEAD..HEAD --oneline Makefile
    fde7d90 Linux 3.3-rc7
    :100644 100644 66d13c9... 56d4817... M  Makefile
    192cfd5 Linux 3.3-rc6
    :100644 100644 b61a963... 66d13c9... M  Makefile

The workhorse behind this output is git diff-tree. Grab yourself a caffeinated
beverage prior to reading that
manual page.
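
If git whatchanged is not to your taste, much the same report should be
available from git log, because as far as I know whatchanged is
essentially git log with raw diff output and merges skipped. The sketch
below exercises that equivalent form in a scratch repository whose files
and commit messages are invented.

```shell
#!/bin/sh
# Sketch: a whatchanged-style report produced with git log --raw.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo one > Makefile
echo one > README
git add .
git commit -q -m "Initial import"
echo two >> Makefile
git commit -q -am "Adjust build rules"
echo two >> README
git commit -q -am "Expand documentation"

# Everything from the last three days, one summary line per commit.
git log --raw --no-merges --oneline --since="three days ago"

# The same report limited to changes that touch one file.
git log --raw --no-merges --oneline -- Makefile
```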

Cleaning Up

Everyone enjoys a clean and tidy directory structure now and
then! To help you achieve repository directory nirvana, the git clean command may be used to remove
untracked files from your working tree.

Why bother? Perhaps cleaning is part of an iterative build process
that reuses the same directory for repeated builds but needs to have
generated files cleaned out each time. (Think make clean.)

By default, git clean just
removes all files that are not under version control
from the current directory and down through your directory structure.
(Unless the clean.requireForce configuration variable is set to false,
Git insists on the -f option before it deletes anything.)
Untracked directories are considered slightly more valuable than plain
files and are left in place unless you supply the -d option.

Furthermore, for the purposes of this command, Git uses a slightly
more conservative concept of “under version control.” Specifically, the
manual page uses the phrase “files that are unknown to Git”
for a good reason: even files that are mentioned in the .gitignore and
.git/info/exclude files are actually known to
Git. They represent files that are not version controlled, but Git does
know about them. And because those files are called
out in the .gitignore files, they
must have some known (to you) behavior that shouldn’t be disturbed by Git.
So Git won’t clean out the ignored files unless you explicitly request it
with the -x option.

Naturally, the -X option causes the inverse
behavior: namely, only files explicitly ignored by Git are removed. So choose the
files that are important to you carefully.

If you are skittish, do a --dry-run
first.
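
To get a feel for the differences among these options, try them in a
scratch repository first. The following sketch uses file names I made up
and passes -f explicitly, which stock configurations require before
anything is deleted.

```shell
#!/bin/sh
# Sketch: what git clean removes with and without -d and -X.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo '*.o' > .gitignore
echo 'int main(void) { return 0; }' > main.c
git add .gitignore main.c
git commit -q -m "Initial"

touch scratch.txt main.o        # one untracked file, one ignored file
mkdir tmpdir
touch tmpdir/junk               # an untracked directory

git clean -n                    # dry run: reports, deletes nothing
git clean -f                    # removes scratch.txt only
after_plain=$(ls)

git clean -f -d                 # also removes the untracked directory
git clean -f -X                 # removes only the ignored main.o
after_all=$(ls)

echo "After plain clean:"
echo "$after_plain"
echo "After -d and -X:"
echo "$after_all"
```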

Using git-grep to Search a Repository

You may recall from Using Pickaxe
that I introduced the pickaxe option (spelled
-S string) for the git log command, and then in git diff with Path Limiting, I showed it in use
with the git diff command. It searches
back through a branch’s history of commit changes for commits that
introduce or remove occurrences of a given string or regular
expression.

Another command that can be used to search a repository is git grep. Rather than searching each commit’s
changes to a branch, the git grep
command searches the content of files within a repository. Because
git grep is really a generic Swiss Army
knife with a multitude of options, it is more accurate to say that
git grep searches for text patterns in
tracked blobs (i.e., files) of the work tree, blobs cached in the index,
or blobs in specified trees. By default, it just searches the tracked
files of the working tree.

Thus, pickaxe can be used to
search a series of commit differences, whereas git grep can be used to search the repository
tree at a specific point in that history.

Want to do some ego surfing in a repository? Sure you do. Let’s go
get the Git source repository and find out![45]

    $ cd /tmp
    $ git clone https://github.com/gitster/git.git

    Cloning into 'git'...
    remote: Counting objects: 129630, done.
    remote: Compressing objects: 100% (42078/42078), done.
    Receiving objects: 100% (129630/129630), 28.51 MiB | 1.20 MiB/s, done.
    remote: Total 129630 (delta 95231), reused 119366 (delta 85847)
    Resolving deltas: 100% (95231/95231), done.

    $ cd git

    $ git grep -i loeliger
    Documentation/gitcore-tutorial.txt:Here is an ASCII art by Jon Loeliger
    Documentation/revisions.txt:Here is an illustration, by Jon Loeliger.
    Documentation/user-manual.txt:Here is an ASCII art by Jon Loeliger

    $ git grep jdl
    Documentation/technical/pack-heuristics.txt:  <jdl> What is a "thin" pack?

Ever wonder where the documentation for the git-grep command itself
is located? What files in the git.git repository even mention git-grep
by name? Do you even know where it is located? Here’s how you can find
out:

    # Still in the /tmp/git repository

    $ git grep -l git-grep
    .gitignore
    Documentation/RelNotes/1.5.3.6.txt
    Documentation/RelNotes/1.5.3.8.txt
    Documentation/RelNotes/1.6.3.txt
    Documentation/git-grep.txt
    Documentation/gitweb.conf.txt
    Documentation/pt_BR/gittutorial.txt
    Makefile
    command-list.txt
    configure.ac
    gitweb/gitweb.perl
    t/README
    t/perf/p7810-grep.sh

A few things to note here: git-grep supports many of the normal command
line options to the traditional grep
tool, such as -i for case insensitive searches,
-l for a list of just the matching file names,
-w for word matching, etc. Using the --
separator option, you can limit the paths or directories that Git will
search. To limit the search to the occurrence within the Documentation/ directory, do something like
this:

    # Still in the /tmp/git repository

    $ git grep -l git-grep -- Documentation
    Documentation/RelNotes/1.5.3.6.txt
    Documentation/RelNotes/1.5.3.8.txt
    Documentation/RelNotes/1.6.3.txt
    Documentation/git-grep.txt
    Documentation/gitweb.conf.txt
    Documentation/pt_BR/gittutorial.txt

Using the --untracked option, you can also search
for patterns in untracked (but not ignored) files that have neither been
added to the cache nor committed as part of the repository history. This
option may come in handy if you are developing some feature and have
started adding new files but haven’t yet committed them. A default
git grep wouldn’t search there, even
though your past experience with the traditional grep command might lead you to believe that all
files in your working directory (and possibly its subdirectories) would
otherwise be searched.

So why even bother introducing the git grep command in the first
place? Isn’t the traditional shell tool
sufficient? Yes and no.

There are several benefits to building the git grep command directly into the Git toolset.
First, speed and simplicity. Git doesn’t have to completely check out a
branch in order to do the search; it can operate directly on the objects
from the object store. You don’t have to write some script to check out a
commit from way back in time, then search those files, then restore your
original checked out state. Second, Git can offer enhanced features and
options by being an integrated tool. Notably, it offers searches that are
limited to tracked files, untracked files, files cached in the index,
ignored or excluded files, variations on searching snapshots from the
repository history, and repository-specific pathspec limiters.
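
Two of those features, searching an older snapshot without checking it
out and including untracked files, are easy to try. The sketch below does
both in a scratch repository; the file names and the search word needle
are my own inventions.

```shell
#!/bin/sh
# Sketch: git grep against the work tree, an older commit, and untracked files.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "needle in the first version" > old.txt
git add old.txt
git commit -q -m "First version"

git rm -q old.txt
echo "nothing to see" > new.txt
git add new.txt
git commit -q -m "Second version"

echo "needle in an untracked file" > draft.txt

# Default search: tracked files in the working tree only (no match here).
git grep -n needle || echo "no match in tracked working-tree files"

# Search the tree of an earlier commit directly from the object store.
git grep -n needle HEAD~1

# Include untracked (but not ignored) files as well.
git grep -n --untracked needle
```

Like the traditional grep, git grep exits with a nonzero status when
nothing matches, which is why the first search is guarded.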

Updating and Deleting refs

Way back in refs and symrefs, I
introduced the concept of a ref and mentioned Git also had several
symbolic refs that it maintained. By now, you should be familiar with
branches as refs, how they are maintained under the .git directory, and that the symbolic refs are
also maintained there. Somewhere in there a bunch of SHA1 values exist,
get updated, shuffled around, deleted, and referenced by other
refs.

Occasionally, it is nice or even necessary to directly change or
delete a ref. If you know exactly what you are doing, you could manipulate
all of those files by hand. But if you don’t do it correctly, it is easy
to mess things up.

To ensure that the basic ref manipulations are done
properly, Git supplies the command git update-ref. This command
understands all of the nuances of refs,
symbolic refs, branches, SHA1 values, logging changes, the reflog, etc. If
you need to directly change a ref’s value, you should use a command
like:

    $ git update-ref someref SHA1

where someref is the full name of the branch or ref to be updated
(for example, refs/heads/dev) and SHA1 is its new value.
Furthermore, if you want to delete a ref, the proper way to do so
is:

    $ git update-ref -d someref

Of course, the normal branch operations might be more appropriate,
but if you find yourself directly changing a ref, using git update-ref ensures that all of the
bookkeeping for Git’s infrastructure is done properly, too.
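
Here is a short sketch of both operations in a scratch repository. The ref
name refs/heads/scratch and the reflog message are placeholders of mine;
note that the ref is spelled out in full rather than given as a bare
branch name.

```shell
#!/bin/sh
# Sketch: create, inspect, and delete a ref with git update-ref.
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo one > file
git add file
git commit -q -m "First"
first=$(git rev-parse HEAD)
echo two >> file
git commit -q -am "Second"

# Point a new branch ref at the first commit, recording a reflog message.
git update-ref -m "sketch: scratch at first commit" refs/heads/scratch "$first"
pointed=$(git rev-parse scratch)
git reflog show scratch

# Delete the ref again.
git update-ref -d refs/heads/scratch
git rev-parse --verify -q refs/heads/scratch || echo "scratch is gone"
```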

Following Files that Moved

If, over the history of a file, it is moved from one place
to another within your repository directory structure, Git will usually
only trace back over its history using its current name.

To see the complete history of the file, even across moves, use the
--follow option as well. For example, the following command
shows the commit log for a file currently named file, but includes the log
for its prior names as well:

    $ git log --follow file

Add the --name-only option to have Git also state
the name of that file as it changes:

    $ git log --follow --name-only file

In the following example, file a is first added in the directory foo and then moved to directory bar:

    $ git init
    $ mkdir foo
    $ touch foo/a
    $ git add foo/a
    $ git commit -m "First a in foo" foo/a
    $ mkdir bar
    $ git mv foo/a bar/a
    $ git commit -m "Move foo/a to bar/a"

At this point, a simple git log bar/a will show only the commit
that created file bar/a, but adding option
--follow will trace back through its name changes,
too:

    $ git log --oneline bar/a
    6a4115b Move foo/a to bar/a

    $ git log --oneline --follow bar/a
    6a4115b Move foo/a to bar/a
    1862781 First a in foo

If you want to use its original name, you have to work harder
because only the current name of the file, bar/a, can be referenced normally.
Adding -- and then any of its current or former names will work, because
everything after -- is treated as a path rather than a revision. And adding
--all will produce a comprehensive search as if all refs were searched, too.

    $ git log --oneline foo/a
    fatal: ambiguous argument 'foo/a': unknown revision or path not in the
           working tree.
    Use '--' to separate paths from revisions

    $ git log --oneline -- foo/a
    6a4115b Move foo/a to bar/a
    1862781 First a in foo
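
The --name-only variation mentioned earlier can be tried on the same history. The sketch below rebuilds the example in a throwaway repository. The identity settings are hypothetical, and the file is given some content instead of being created with touch, which is my own precaution so that rename detection has something to compare. The --format=%s option prints just each commit subject.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

mkdir foo
echo "some content" > foo/a
git add foo/a
git commit -q -m "First a in foo"
mkdir bar
git mv foo/a bar/a
git commit -q -m "Move foo/a to bar/a"

# Without --follow, only the commit that introduced the name bar/a appears.
plain=$(git log --format=%s -- bar/a)

# With --follow and --name-only, each commit is listed together with
# the name the file had at that point in history.
followed=$(git log --follow --name-only --format=%s -- bar/a)
printf '%s\n' "$followed"
```

In the output, bar/a appears under the move commit and foo/a under the original commit.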

Keep, But Don’t Track, This File

A common developer problem, described here by Bart Massey,
arises with Makefiles and other
configuration files: the version that the developer works with locally may
be customized in ways that are not intended to be visible upstream. For
example, I commonly change my Makefile CFLAGS from -Wall -g -O2
to -Wall -g -pg during development. Of course, I also
change the Makefile in ways that
should be visible upstream, such as adding new targets.

I could maintain a separate local development branch, which differs
only in the Makefile. Whenever I make a change, I
could merge back to master and push
upstream. I’d have to do an interactive merge in order to omit my custom
CFLAGS (while maybe merging other
changes). This seems hard and error prone.

Another solution would be to implement some form of Makefile snippet that provided local overrides
for certain variable settings. But this approach is specific to Makefiles,
whereas the underlying problem is a general one.

It turns out that git update-index --assume-unchanged Makefile
will leave the Makefile in the repository, but will
cause Git to assume that subsequent changes to the working copy are not to
be tracked. Thus, I can commit the version with the CFLAGS I want published, mark the Makefile with
--assume-unchanged, and edit the CFLAGS to correspond to my development version.
Now, subsequent pushes and commits will ignore the Makefile. Indeed, git add Makefile
will report an error when the Makefile is marked
--assume-unchanged.

When I want to make a published change to my Makefile, I can proceed via:

    $ git update-index --no-assume-unchanged Makefile
    $ git add -p Makefile

    #  [add the Makefile changes I want published]
    $ git commit
    $ git update-index --assume-unchanged Makefile
    $ git push

This work flow does require that I remember to perform the previous
steps when I want a Makefile change
published. But that is relatively infrequent. Further, initially
forgetting carries a low price tag: I can always do it later.
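
Because a marked file no longer shows up in git status, it is easy to forget that the flag is set. One way to check, sketched below in a throwaway repository with a hypothetical identity, is git ls-files -v, which prints a lowercase status letter for entries marked assume-unchanged.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Commit the version that is meant to be published.
printf 'CFLAGS = -Wall -g -O2\n' > Makefile
git add Makefile
git commit -q -m "Add Makefile"

# Mark the file, then make the local-only change.
git update-index --assume-unchanged Makefile
printf 'CFLAGS = -Wall -g -pg\n' > Makefile

# The local edit is not reported as a modification.
status_out=$(git status --porcelain)

# A lowercase "h" in place of the usual "H" marks the entry as assume-unchanged.
flags=$(git ls-files -v Makefile)
printf '%s\n' "$flags"
```

Running git ls-files -v over the whole repository and looking for lowercase letters gives a quick inventory of every file that has been set aside this way.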

Have You Been Here Before?

Ever have that feeling you’ve worked through a complex merge or
rebase over and over again? Are you getting tired of it yet? Do you wish
there was some way to automate it?

I thought so. And so did the Git developers!

Git has a feature named rerere that automates the chore of
solving the same merge or rebase conflicts repeatedly. The seemingly
alliterative name is a shortening of reuse recorded resolution. A long
development cycle sometimes keeps a line of development on a branch
through many iterations before it is finally merged into the mainline.
Along the way, that branch may have to be rebased or merged through the
same set of conflicts and resolutions many times.

To enable and use the git rerere
command, you must first set the Boolean rerere.enabled option to true.

    $ git config --global rerere.enabled true

Once enabled, this feature records the right and left side of a
merge conflict in the .git/rr-cache directory
and, if resolved, also records the manual resolution
to that conflict. If the same conflict is seen again, the automatic
resolution engages and preemptively solves the conflict.

When rerere is enabled and participates in a merge, it will prevent
autocommitting of the merge, giving the opportunity to review the
automatic conflict resolution before making it a part of the commit
history.

Rerere has only one prominent shortcoming: the nonportability of the
.git/rr-cache directory. Conflict and
resolution recording happens on a per-clone basis and is not transmitted
in push or pull operations.
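
To watch the record-and-replay cycle happen, the following sketch stages the same conflict twice in a throwaway repository. The branch name topic, the file name, and the identity are hypothetical, and rerere.enabled is set in the repository's own configuration rather than with --global so that nothing outside the scratch directory is touched.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository for the demonstration
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
git config rerere.enabled true             # local to this repository only

echo "base" > file
git add file
git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)      # whatever the initial branch is called

# Two branches change the same line in different ways.
git checkout -q -b topic
echo "topic" > file
git commit -q -am "topic change"
git checkout -q "$main"
echo "main" > file
git commit -q -am "main change"

# First merge: it conflicts, and the conflict is resolved by hand.
git merge topic >/dev/null 2>&1 || true
echo "resolved" > file
git add file
git commit -q -m "Merge topic"             # the resolution is recorded here

# Throw the merge away and repeat it.
git reset -q --hard HEAD^
git merge topic >/dev/null 2>&1 || true

# The working copy now holds the recorded resolution, not conflict markers.
replayed=$(cat file)
printf '%s\n' "$replayed"
```

The second merge still stops short of committing, which matches the behavior described above: the replayed resolution sits in the working directory for review.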


[44] No, that’s not a typo. See Have You Been Here Before?.

[45] I both elided an obsolete name reference, and shortened the
actual output lines for this example. Oh, and apparently I’m a closet
Git artist!
