Git – Remote Repositories

So far, you’ve worked almost entirely within one local
repository. Now it’s time to explore the much lauded distributed features of
Git and learn how to collaborate with other developers via shared
repositories.

Working with multiple and remote repositories adds a few new terms to
the Git vernacular.

A clone is a copy of a repository. A
clone contains all the objects from the original; as a result, each clone is
an independent and autonomous repository and a true, symmetric peer of the
original. A clone allows each developer to work locally and independently
without centralization, polls, or locks. Ultimately, it’s cloning that
allows Git to easily scale and permit many geographically separated
contributors.

Essentially, separate repositories are useful whenever:

  • Developers work autonomously.

  • Developers are separated by a wide area network. A cluster of
    developers in the same location may share a local repository to amass
    localized changes.

  • A project is expected to diverge significantly along separate
    development paths. Although the regular branching and merging mechanisms
    demonstrated in previous chapters can handle any amount of separate
    development, the resulting complexity may become more trouble than it’s
    worth. Instead, separate development paths can use separate repositories
    to be merged again whenever appropriate.

Cloning a repository is just the first step in sharing code. You must
also relate one repository to another to establish paths for data exchange.
Git establishes these repository connections through
remotes.

A remote is a reference, or handle, to another
repository through a filesystem or network path. You use a remote as a
shorthand name for an otherwise lengthy and complicated Git URL. You can
define any number of remotes in a repository, thus creating terraced
networks of repository sharing.

Once a remote is established, Git can transfer data from one
repository to another using either a push or a pull model. For example, it’s
common practice to occasionally transfer commit data from an original
repository to its clone in order to keep the clone in sync. You can also
create a remote to transfer data from the clone to its original or configure
the two to exchange information bidirectionally.

To keep track of data from other repositories, Git uses
remote-tracking branches. Each remote-tracking branch
in your repository is a branch that serves as a proxy for a specific branch
in a remote repository. You may set up a local-tracking branch
that forms the basis for integrating your local changes
with the remote changes from a corresponding remote-tracking branch.

Finally, you can make your repository available to others. Git
generally refers to this as publishing a repository
and provides several techniques for doing so.

This chapter presents examples and techniques to share, track, and
obtain data across multiple repositories.

Repository Concepts

Bare and Development Repositories

A Git repository is either a bare or a development (nonbare)
repository.

A development repository is used for normal, daily development. It
maintains the notion of a current branch and provides a checked out
copy of the current branch in a working directory. All of the
repositories mentioned in the book so far have been development
repositories.

In contrast, a bare repository has no working directory and
shouldn’t be used for normal development. A bare repository has no
notion of a checked out branch, either. Think of a bare repository as
simply the contents of the .git directory. In other
words, you shouldn’t make commits in a bare repository.

A bare repository might seem to be of little use, but its role is
crucial: to serve as an authoritative focal point for collaborative
development. Other developers clone
and fetch from the bare repository
and push updates to it. We’ll work
through an example later in this chapter that shows how all this works
together.

If you issue git clone
with the --bare option, Git creates a bare repository;
otherwise, a development repository is created.

Tip

Notice that we did not say that git clone --bare
creates a new or empty repository. We said it
creates a bare repository. And that newly cloned
repository will contain a copy of the content from the upstream
repository. The command git init
creates a new and empty repository, and that new repository can come
in both development and bare
variants. Also, be aware of how the --bare flag
affects the directory that is initialized:

    $ cd /tmp
    $ git init fluff2
    Initialized empty Git repository in /tmp/fluff2/.git/
    $ git init --bare fluff
    Initialized empty Git repository in /tmp/fluff/

By default, Git enables a reflog (a record
of changes to refs) on development repositories but not on bare
repositories. This again anticipates that development will take place in
the former and not in the latter. By the same reasoning, no remotes are
created in a bare repository.

If you set up a repository into which developers push changes, it
should be bare. In effect, this is a special case of the more general
best practice that a published repository should be bare.
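
As a minimal sketch of the distinction, the following script clones one small repository twice, once normally and once with --bare, and asks Git which kind each clone is. The scratch directory and the names src, dev, and depot.git are assumptions made only for this illustration.

```shell
#!/bin/sh
# Sketch: compare a development clone with a bare clone of the same repository.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)     # scratch directory; all names below are hypothetical
cd "$work"

# A tiny source repository with a single commit.
git init -q src
echo "hello" > src/index.html
git -C src add index.html
git -C src commit -q -m "Initial commit"

git clone -q src dev                 # development clone (has a working directory)
git clone -q --bare src depot.git    # bare clone (no working directory)

dev_bare=$(git -C dev rev-parse --is-bare-repository)
depot_bare=$(git -C depot.git rev-parse --is-bare-repository)
echo "dev:       bare=$dev_bare"
echo "depot.git: bare=$depot_bare"
```

The development clone keeps its repository data in dev/.git, whereas depot.git holds files such as HEAD and objects/ directly at its top level.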

Repository Clones

The git clone command
creates a new Git repository based on the original you specify via a
filesystem or network address. Git doesn’t have to copy all the
information in the original to the clone. Instead, Git ignores
information that is pertinent only to the original repository, such as
remote-tracking branches.

In normal git clone
use, the local, development branches of the original repository, stored
within refs/heads/, become
remote-tracking branches in the new clone under
refs/remotes/. Remote-tracking
branches within refs/remotes/ in
the original repository are not cloned. (The clone doesn’t need to know
what, if anything, the upstream repository is in turn tracking.)

Tags from the original repository are copied into the clone, as
are all objects that are reachable from the copied refs. However, repository-specific information such
as hooks (see Chapter 15), configuration files, the
reflog, and the stash of the original repository are not reproduced in
the clone.
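
One way to see this mapping for yourself is to build a small repository with an extra branch, a tag, and a repository-specific configuration value, then clone it and list the refs in the clone. This is only a sketch: the names original and copy and the configuration key example.note are hypothetical.

```shell
#!/bin/sh
# Sketch: inspect which refs and settings a clone receives.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

git init -q original
cd original
git symbolic-ref HEAD refs/heads/master   # make the initial branch name predictable
echo "one" > a.txt
git add a.txt
git commit -q -m "First commit"
git branch dev                            # a second local branch
git tag v1.0                              # a tag
git config example.note "only in original"   # hypothetical repository-local setting
cd ..

git clone -q original copy

# List every ref in the clone, then look for the original's local setting.
clone_refs=$(git -C copy for-each-ref --format='%(refname)')
echo "$clone_refs"
note=$(git -C copy config --get example.note || true)
echo "example.note in clone: '${note}'"
```

In the listing, the original's dev branch shows up only as refs/remotes/origin/dev, the tag is copied as refs/tags/v1.0, and the configuration value is absent from the clone.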

In “Making a Copy of Your Repository” in Chapter 3, we showed how git clone can be used to create a copy of your
public_html repository:

    $ git clone public_html my_website

Here, public_html is
considered the original, remote repository. The new,
resulting clone is my_website.

Similarly, git clone can be
used to clone a copy of a repository from network sites:

    # All on one line...
    $ git clone \
        git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

By default, each new clone maintains a link back to its
parent repository via a remote called origin.
However, the original repository has no knowledge of—nor does it
maintain a link to—any clone. It is purely a one-way
relationship.[25]

The name origin isn’t special in any way. If you
don’t want to use it, simply specify an alternate with the
--origin name option during
the clone operation.

Git also configures the default origin remote with a default fetch refspec:

    fetch = +refs/heads/*:refs/remotes/origin/*

Establishing this refspec anticipates that you want to continue
updating your local repository by fetching changes from the originating
repository. In this case, the remote repository’s
branches are available in the clone on branch names prefixed with
origin/, such as
origin/master, origin/dev, or origin/maint.
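
The following sketch clones with an alternate remote name and prints the fetch refspec that Git records for it. The remote name upstream and the directory names are assumptions chosen for the example.

```shell
#!/bin/sh
# Sketch: clone with --origin and inspect the resulting remote definition.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

git init -q original
echo "one" > original/a.txt
git -C original add a.txt
git -C original commit -q -m "First commit"

# Name the remote "upstream" (hypothetical) instead of the default "origin".
git clone -q --origin upstream original copy

remotes=$(git -C copy remote)
fetch_refspec=$(git -C copy config --get remote.upstream.fetch)
echo "remotes:       $remotes"
echo "fetch refspec: $fetch_refspec"
```

The refspec has the same shape as the default one, with the alternate name substituted in the refs/remotes/ namespace.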

Remotes

The repository you’re currently working in is called the
local or current
repository, and the repository with which you exchange files is called
the remote repository. But the latter term is a
bit of a misnomer, because the repository may or may not be on a
physically remote or even different machine; it could conceivably be
just another repository on a local filesystem. In Chapter 13, I discuss how the term
upstream repository is usually used to identify
the remote repository from which your local repository is derived via a
clone operation.

Git uses both the remote and the remote-tracking branch to
reference and facilitate the connection to another repository. The
remote provides a friendly name for the repository and can be used in
place of the actual repository URL. A remote also
forms part of the name basis for the remote-tracking branches for that
repository.

Use the git remote
command to create, remove, manipulate, and view a remote. All the
remotes you introduce are recorded in the .git/config file and can be manipulated using
git config.

In addition to git clone, other
common Git commands that refer to remote repositories are:

git fetch

Retrieves objects and their related metadata from a remote
repository.

git pull

Like git fetch,
but also merges changes into a corresponding local branch.

git push

Transfers objects and their related metadata to a
remote repository.

git ls-remote

Shows a list of references held by a given remote
(on an upstream server). This command indirectly answers the
question “Is an update available?”
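
As a rough sketch of how git ls-remote can answer that question, the script below compares the commit that the remote currently advertises for master with the commit recorded by the local remote-tracking branch. The repository names upstream and clone are assumptions for the example.

```shell
#!/bin/sh
# Sketch: use git ls-remote to detect that the remote has new commits.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo "one" > upstream/a.txt
git -C upstream add a.txt
git -C upstream commit -q -m "First commit"

git clone -q upstream clone

# A new commit appears upstream after the clone was made.
echo "two" >> upstream/a.txt
git -C upstream commit -q -a -m "Second commit"

cd clone
# ls-remote prints "<commit-id><TAB><refname>"; keep only the commit ID.
remote_tip=$(git ls-remote origin refs/heads/master | cut -f1)
tracked_tip=$(git rev-parse refs/remotes/origin/master)
if [ "$remote_tip" != "$tracked_tip" ]; then
    echo "update available"
fi

# Fetching brings the remote-tracking branch up to date.
git fetch -q origin
fetched_tip=$(git rev-parse refs/remotes/origin/master)
```

Note that git ls-remote only reports refs; it does not transfer any objects, which is why the fetch is still needed afterward.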

Tracking Branches

Once you clone a repository, you can keep up with changes
in the original source repository even as you make local commits and
create local branches.

As Git itself has evolved, the terminology around branch names
has also evolved and become more standard. To help clarify the purposes
of the various branches, different namespaces have been created.
Although any branch in your local repository is still considered a local
branch, local branches can be further divided into different categories.

  • Remote-tracking branches are
    associated with a remote and have the specific purpose of following
    the changes of each branch in that remote repository.

  • A local-tracking branch is
    paired with a remote-tracking branch. It is a form of integration
    branch that collects both the changes from your local development
    and the changes from the remote-tracking branch.

  • Any local, nontracking branch is usually generically
    called a topic or development branch.

  • Finally, to complete the namespaces, a
    remote branch is a branch located in a
    nonlocal, remote repository. It is likely an upstream source for a
    remote-tracking branch.

During a clone operation, Git creates a remote-tracking branch in
the clone for each topic branch in the upstream repository. The set of
remote-tracking branches is introduced in a new, separate namespace
within the local repository that is specific to the remote being cloned.
They are not branches in a remote repository. The local repository uses
its remote-tracking branches to follow or track changes made in the
remote repository.

Tip

You may recall from refs and symrefs of Chapter 6 that a local topic branch that you call
dev is really named refs/heads/dev. Similarly, remote-tracking
branches are retained in the refs/remotes/ namespace. Thus, the
remote-tracking branch origin/master is actually refs/remotes/origin/master.
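
To check these names in a repository of your own, you can ask git rev-parse for the full name behind a short branch name, and for the remote-tracking branch paired with a local branch. The sketch below assumes a fresh clone whose only branch is master; the directory names are hypothetical.

```shell
#!/bin/sh
# Sketch: resolve short branch names to full ref names in a fresh clone.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

git init -q original
git -C original symbolic-ref HEAD refs/heads/master
echo "one" > original/a.txt
git -C original add a.txt
git -C original commit -q -m "First commit"

git clone -q original copy
cd copy

local_full=$(git rev-parse --symbolic-full-name master)
tracking_full=$(git rev-parse --symbolic-full-name origin/master)
# The remote-tracking branch that the local master branch is paired with.
upstream_of_master=$(git rev-parse --abbrev-ref 'master@{upstream}')

echo "master        -> $local_full"
echo "origin/master -> $tracking_full"
echo "master tracks    $upstream_of_master"
```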

Because remote-tracking branches are lumped into their own
namespace, there is a clear separation between branches made in a
repository by you (topic branches) and those
branches that are actually based on another, remote repository
(remote-tracking branches). In the early Git
days, the separate namespaces were just convention and best practice,
designed to help prevent you from making accidental conflicts. With
later versions of Git, the separate namespaces are much more than
convention: they are an integral part
of how you are expected to use your branches to interact with your
upstream repositories.

All the operations that you can perform on a regular topic branch
can also be performed on a tracking branch. However, there are some
restrictions and guidelines to observe.

Because remote-tracking branches are used exclusively to follow
the changes from another
repository, you should effectively treat them as read only. You
shouldn’t merge or make commits onto a remote-tracking branch. Doing so
would cause your remote-tracking branch to become out of sync with the
remote repository. Worse, each future update from the remote repository
would likely require merging, making your clone increasingly more
difficult to manage. The proper management of tracking branches is
covered in more detail later in this chapter.

Referencing Other Repositories

To coordinate your repository with another repository, you
define a remote, which here means a named entity
stored in the config file of a repository. It consists of two different
parts. The first part states the name of the other repository in the form
of a URL. The second part, called a refspec,
specifies how a ref (which usually represents a branch) should be mapped
from the namespace of one repository into the namespace of the other
repository.

Let’s look at each of these components in turn.

Referring to Remote Repositories

Git supports several forms of Uniform Resource Locators (URLs)
that can be used to name remote
repositories. These forms specify both an access protocol and the
location or address of the data.

Technically, Git’s forms of URLs are neither true URLs
nor Uniform Resource Identifiers (URIs), because none entirely conform
to RFC 1738 or RFC 2396, respectively. However, because of their
versatile utility in naming the location of Git repositories, Git’s
variants are usually referred to as Git URLs.
Furthermore, the .git/config file
uses the name url as well.

As you have seen, the simplest form of Git URL refers to a
repository on a local filesystem, be it a true physical filesystem or a
virtual filesystem mounted locally via the Network File System (NFS).
There are two permutations:

    /path/to/repo.git
    file:///path/to/repo.git

Although these two formats are essentially identical, there is a
subtle but important distinction between the two. The former uses hard
links within the filesystem to directly share exactly the same objects
between the current and remote repository; the latter copies the objects
instead of sharing them directly. To avoid issues associated with shared
repositories, the file:// form is
recommended.
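
The two local forms can be tried side by side. The sketch below clones the same repository once by plain path and once by file:// URL, then counts the object files in each clone that have more than one hard link. The directory names are hypothetical, and the hard-link count for the plain-path clone depends on the filesystem, so it is printed rather than assumed.

```shell
#!/bin/sh
# Sketch: clone one repository by plain path and by file:// URL.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)     # mktemp prints an absolute path, as file:// requires
cd "$work"

git init -q original
echo "one" > original/a.txt
git -C original add a.txt
git -C original commit -q -m "First commit"

git clone -q "$work/original" by_path          # plain filesystem path
git clone -q "file://$work/original" by_url    # file:// URL

path_head=$(git -C by_path rev-parse HEAD)
url_head=$(git -C by_url rev-parse HEAD)

# Object files that share storage with another repository via hard links.
path_links=$(find by_path/.git/objects -type f -links +1 | wc -l)
url_links=$(find by_url/.git/objects -type f -links +1 | wc -l)

echo "by_path: HEAD=$path_head hard-linked object files: $path_links"
echo "by_url:  HEAD=$url_head hard-linked object files: $url_links"
```

Both clones end up at the same commit; only the way the objects arrived differs.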

The other forms of the Git URL refer to repositories on remote
systems.

When you have a truly remote repository whose data must be
retrieved across a network, the most efficient form of data transfer is
often called the Git native protocol, which
refers to the custom protocol used internally by Git to transfer data.
Examples of a native protocol URL include:

    git://example.com/path/to/repo.git
    git://example.com/~user/path/to/repo.git

These forms are used by git-daemon to publish repositories for
anonymous read. You can both clone and fetch using these URL
forms.

The clients that use these formats are not authenticated, and no
password will be requested. Hence,
whereas a ~user format can be
employed to refer to a user’s home directory, a bare ~ has no context for an expansion; there is
no authenticated user whose home directory can be used. Furthermore, the
~user form works only if the server
side allows it with the --user-path option.

For secure, authenticated connections, the Git native
protocol can be tunneled over a Secure Shell (SSH) connection using the
following URL templates:

    ssh://[user@]example.com[:port]/path/to/repo.git
    ssh://[user@]example.com/path/to/repo.git
    ssh://[user@]example.com/~user2/path/to/repo.git
    ssh://[user@]example.com/~/path/to/repo.git

The third form allows for the possibility of two different user
names. The first is the user under whom the session is authenticated,
and the second is the user whose home directory is accessed.

Git also supports a URL form with scp-like syntax. It’s identical to the SSH
forms, but there is no way to specify a port parameter:

    [user@]example.com:/path/to/repo.git
    [user@]example.com:~user/path/to/repo.git
    [user@]example.com:path/to/repo.git
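
Because defining a remote does not contact the other repository, you can experiment with these URL forms safely. The sketch below records one ssh:// URL and one scp-like URL as remotes; the user name git, the port 2222, and the remote names ssh_form and scp_form are assumptions made up for the illustration.

```shell
#!/bin/sh
# Sketch: record SSH-style URLs as remotes without touching the network.
set -e

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

# ssh:// form, which can carry an explicit port (2222 is hypothetical).
git remote add ssh_form "ssh://git@example.com:2222/path/to/repo.git"
# scp-like form; there is no place to put a port in this syntax.
git remote add scp_form "git@example.com:path/to/repo.git"

git remote -v     # list each remote with its URL

ssh_url=$(git config --get remote.ssh_form.url)
scp_url=$(git config --get remote.scp_form.url)
```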

Although the HTTP and HTTPS URL variants have been supported
since the early days of Git, they underwent some important changes
in Version 1.6.6.

    http://example.com/path/to/repo.git
    https://example.com/path/to/repo.git

Prior to Git Version 1.6.6, neither the HTTP nor the
HTTPS protocols were as efficient as the Git native protocol. In Version
1.6.6, the HTTP protocols were improved dramatically and have become essentially
as efficient as the native Git protocols. Git literature refers to this implementation
as smart in contrast to the prior, so-called
dumb implementation.

With the HTTP efficiency benefit realized now, the utility
of the http:// and https:// URL forms will likely become more
important and popular. Notably, most corporate firewalls allow the HTTP
port 80 and HTTPS port 443 to remain open while the default Git port
9418 is typically blocked and would require an act of Congress to open
it. Furthermore, these URL forms are being favored by popular Git
hosting sites like GitHub.

Finally, the Rsync protocol can be specified:

    rsync://example.com/path/to/repo.git

The use of Rsync is discouraged because it is inferior to the
other options. If absolutely necessary, it should be used only for an
initial clone, at which point the remote repository reference should be
changed to one of the other mechanisms. Continuing to use the Rsync
protocol for later updates may lead to the loss of locally created
data.

The refspec

In refs and symrefs of Chapter 6, I explained how the
ref, or reference, names a
particular commit within the history of the repository. Usually a ref is
the name of a branch. A refspec maps branch names
in the remote repository to branch names within your local
repository.

Because a refspec must simultaneously name branches from the local
repository and the remote repository, complete branch names are common
in a refspec and are often required. In a
refspec, you typically see the names of
development branches with the refs/heads/ prefix and the names of
remote-tracking branches with the refs/remotes/ prefix.

The syntax of a refspec is:

    [+]source:destination

It consists primarily of a source ref, a colon (:), and a
destination ref. The whole format may be prefixed
with an optional plus sign (+). If
present, the plus sign indicates that the normal fast-forward safety
check will not be performed during the transfer. Furthermore, an
asterisk (*) allows a limited form of
wildcard matching on branch names.

In some uses, the source ref is
optional; in others, the colon and
destination ref are optional.

Refspecs are used by both git fetch and git push. The
trick to using a refspec is to understand the data flow it specifies.
The refspec itself is always source:destination,
but the roles of source and
destination depend on the Git operation being
performed. This relationship is summarized in Table 12-1.

Table 12-1. Refspec data flow

    Operation   Source                     Destination
    push        Local ref being pushed     Remote ref being updated
    fetch       Remote ref being fetched   Local ref being updated

A typical git fetch
command uses a refspec such as:

    +refs/heads/*:refs/remotes/remote/*

This refspec might be paraphrased as follows:

All the source branches from a remote repository in namespace
refs/heads/ are (i) mapped into
your local repository using a name constructed from the
remote name and (ii) placed under the
refs/remotes/remote
namespace.

Because of the asterisks, this refspec applies to multiple
branches as found in the remote’s
refs/heads/*. It is
exactly this specification that causes the remote’s
topic branches to be mapped into your repository’s namespace as
remote-tracking branches and separates them into subnames based on the
remote name.

Although not mandatory, it is convention and common best practice
to place the branches for a given remote
under refs/remotes/remote/*.
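
To watch a fetch refspec do its mapping, you can pass one explicitly on the git fetch command line. In this sketch, the branches of a small repository are fetched into an otherwise empty repository and placed under a namespace called mirror. That name, like the directory names, is an assumption chosen to show that the destination side of the refspec decides where the refs land.

```shell
#!/bin/sh
# Sketch: fetch with an explicit refspec and see where the refs are stored.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo "one" > upstream/a.txt
git -C upstream add a.txt
git -C upstream commit -q -m "First commit"
git -C upstream branch dev

git init -q local
cd local

# Source: every branch under refs/heads/ in the other repository.
# Destination: the same names under refs/remotes/mirror/ in this repository.
git fetch -q "$work/upstream" '+refs/heads/*:refs/remotes/mirror/*'

fetched=$(git for-each-ref --format='%(refname)' refs/remotes)
echo "$fetched"
```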

Tip

Use git show-ref to
list the references within your current repository. Use git ls-remote
repository to list the references in a
remote repository.

Because git pull’s
first step is fetch, the fetch
refspecs apply equally to git pull.

Tip

You should not make commits or merges onto a remote-tracking
branch identified on the righthand side of a pull or fetch refspec. Those refs will be used as
remote-tracking branches.

During a git push
operation, you typically want to provide and publish the changes you
made on your local topic branches. To allow others to find your changes
in the remote repository after you upload them, your changes must appear
in that repository as topic branches. Thus, during a typical git push command, the source branches from
your repository are sent to the remote repository using a refspec such
as:

    +refs/heads/*:refs/heads/*

This refspec can be paraphrased as follows:

From the local repository, take each branch name found under the
source namespace refs/heads/ and
place it in a similarly named, matching branch under the destination
namespace refs/heads/ in the remote
repository.

The first refs/heads/ refers to
your local repository (because you’re executing a push), and the second
refers to the remote repository. The asterisks ensure that all branches
are replicated.

Multiple refspecs may be given on the git fetch
and git push command
lines. Within a remote definition, multiple fetch refspecs, multiple
push refspecs, or a combination of both may be specified.

What if you don’t specify a refspec at all on a git push command? How does Git know what to do
or where to send data?

First, without an explicit remote given to the command, Git
assumes you want to use origin.
Without a refspec, the behavior depends on the push.default setting. In
versions of Git before 2.0, the default (matching) makes git push send
your commits to the remote for all branches that are common between your
repository and the upstream repository. Any local branch that is not
already present in the upstream repository will not be sent upstream;
branches must already exist and match names. Since Git 2.0, the default
(simple) pushes only the current branch to its upstream branch of the
same name. Either way, new branches must be
explicitly pushed by name. Later they can be defaulted with a simple
git push. When you name only a branch on the command line, the default
refspec makes the following two commands equivalent:

    $ git push origin branch
    $ git push origin branch:refs/heads/branch

For examples, see Adding and Deleting Remote Branches.
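
As a small sketch of that equivalence, the script below pushes one new branch using the short form and another using the explicit source:destination form, then lists the branches that exist in the bare repository afterward. The branch names topic_a and topic_b and the directory names are hypothetical.

```shell
#!/bin/sh
# Sketch: push new branches with the short and the explicit refspec forms.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

git init -q dev
cd dev
git symbolic-ref HEAD refs/heads/master
echo "one" > a.txt
git add a.txt
git commit -q -m "First commit"
cd ..

git clone -q --bare dev depot.git

cd dev
git remote add origin "$work/depot.git"
git branch topic_a
git branch topic_b

git push -q origin topic_a                      # short form
git push -q origin topic_b:refs/heads/topic_b   # explicit destination ref

depot_refs=$(git -C "$work/depot.git" for-each-ref --format='%(refname)' refs/heads)
echo "$depot_refs"
```

Both branches arrive in the bare repository under refs/heads/, alongside the master branch that the bare clone already had.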

Example Using Remote Repositories

Now you have the basis for some sophisticated sharing via
Git. Without loss of generality and to make examples easy to run on your
own system, this section shows multiple repositories on one physical
machine. In real life, they’d probably be located on different hosts
across the Internet. Other forms of remote URL specification may be used
because the same mechanisms apply to repositories on physically disparate
machines as well.

Let’s explore a common use scenario for Git. For the sake of
illustration, let’s set up a repository that all developers consider
authoritative, although technically it’s no different from other
repositories. In other words, authority lies in how everyone agrees to
treat the repository, not in some technical or security measure.

This agreed-upon authoritative copy is often placed in a
special directory known as a depot. (Avoid using
the terms master or repository when
referring to the depot, because those idioms mean something else in
Git.)

There are often good reasons for setting up a depot. For instance,
your organization may thereby reliably and professionally back up the
filesystems of some large server. You want to encourage your coworkers to
check everything into the main copy within the depot in order to avoid
catastrophic losses. The depot will be the remote origin
for all developers.

The following sections show how to place an initial repository in
the depot, clone development repositories out of the depot, do development
work within them, and then sync them with the depot.

To illustrate parallel development on this repository, a second
developer will clone it, work with his repository, and then push his
changes back into the depot for all to use.

Creating an Authoritative Repository

You can place your authoritative depot anywhere on your
filesystem; for this example, let’s use /tmp/Depot. No actual development work should
be done directly in the /tmp/Depot
directory or in any of its repositories. Instead, individual work should
be performed in a local clone.

In practice, this authoritative upstream repository would likely
already be hosted on some server, perhaps GitHub, git.kernel.org, or one of your private
machines.

These steps, however, outline what is necessary to turn an existing
repository into a bare clone capable of being the
authoritative upstream source repository.

The first step is to populate /tmp/Depot with an initial repository.
Assuming you want to work on website content that is already established
as a Git repository in ~/public_html, make a copy of the ~/public_html repository and place it in
/tmp/Depot/public_html.git.

    # Assume that ~/public_html is already a Git repository

    $ cd /tmp/Depot/
    $ git clone --bare ~/public_html public_html.git
    Initialized empty Git repository in /tmp/Depot/public_html.git/

This clone command
copies the Git remote repository from ~/public_html into the current working
directory, /tmp/Depot. The last
argument gives the repository a new name, public_html.git. By convention, bare
repositories are named with a .git
suffix. This is not a requirement, but it is considered a best
practice.

The original development repository has a full set of project
files checked out at the top level, and the object store and all of the
configuration files are located in the .git subdirectory:

    $ cd ~/public_html/
    $ ls -aF 
    ./   fuzzy.txt  index.html  techinfo.txt
    ../  .git/      poem.html

    $ ls -aF .git
    ./              config       hooks/  objects/
    ../             description  index   ORIG_HEAD
    branches/       FETCH_HEAD   info/   packed-refs
    COMMIT_EDITMSG  HEAD         logs/   refs/

Because a bare repository has no working directory, its
files have a simpler layout:

    $ cd /tmp/Depot/

    $ ls -aF public_html.git
    ./   branches/  description  hooks/  objects/     refs/
    ../  config     HEAD         info/   packed-refs

You can now treat this bare /tmp/Depot/public_html.git repository as the
authoritative version.

Because you used the --bare option during this
clone operation, Git did not introduce the normal,
default origin remote.

Here’s the configuration in the new, bare repository:

    # In /tmp/Depot/public_html.git

    $ cat config
    [core]
            repositoryformatversion = 0
            filemode = true
            bare = true

Make Your Own Origin Remote

Right now, you have two repositories that are virtually
identical, except the initial repository has a working directory and the
bare clone does not.

Moreover, because the ~/public_html repository in your home
directory was created using git init
and not via a clone, it lacks an origin. In fact, it has no remote configured
at all.

It is easy enough to add one, though. And it’s needed if the goal
is to perform more development in your initial repository and then push
that development to the newly established, authoritative repository in
the depot. In a sense, you must manually convert your initial repository
into a derived clone.

A developer who clones from the depot will have an origin remote created automatically. In fact,
if you were to turn around now and clone off the depot, you would see it
set up for you automatically, too.

The command for manipulating remotes is git remote. This operation introduces a few
new settings in the .git/config
file:

    $ cd ~/public_html

    $ cat .git/config
    [core]
            repositoryformatversion = 0
            filemode = true
            bare = false
            logallrefupdates = true

    $ git remote add origin /tmp/Depot/public_html

    $ cat .git/config
    [core]
            repositoryformatversion = 0
            filemode = true
            bare = false
            logallrefupdates = true
    [remote "origin"]
            url = /tmp/Depot/public_html
            fetch = +refs/heads/*:refs/remotes/origin/*

Here, git remote added a new
remote section called origin to our configuration. The name origin isn’t magical or special. You could
have used any other name, but the remote that points back to the basis
repository is named origin by
convention.

The remote establishes a link from your current repository
to the remote repository found, in this case, at /tmp/Depot/public_html.git as recorded in the
url value. As a convenience, the
.git suffix is not required; both
/tmp/Depot/public_html and /tmp/Depot/public_html.git will
work. Now, within this repository, the name origin can be used as a shorthand
reference for the remote repository found in the depot. Note that a
default fetch refspec that follows branch name mapping conventions has
also been added.
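
If you prefer not to read .git/config by hand, the same settings can be queried with git config. The following sketch repeats the steps in a scratch directory rather than in /tmp/Depot and your home directory; the scratch location is an assumption, and the repository names simply mirror the ones used in this example.

```shell
#!/bin/sh
# Sketch: add an origin remote and query the settings it introduces.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)     # stands in for the real home directory and depot
cd "$work"

git init -q public_html
echo "<html></html>" > public_html/index.html
git -C public_html add index.html
git -C public_html commit -q -m "Initial commit"

mkdir Depot
git clone -q --bare public_html Depot/public_html.git

cd public_html
git remote add origin "$work/Depot/public_html.git"

# Show everything recorded for the new remote.
git config --get-regexp '^remote\.origin\.'

origin_url=$(git config --get remote.origin.url)
origin_fetch=$(git config --get remote.origin.fetch)
```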

The relationship between a repository that contains a
remote reference (the referrer) and that remote repository (the referee)
is asymmetric. A remote always points in one direction from referrer to
referee. The referee has no idea that some other repository points to
it. Another way to say this is as follows: a clone knows where its
upstream repository is, but the upstream repository doesn’t know where
its clones are.

Let’s complete the process of setting up the origin remote by establishing new
remote-tracking branches in the original repository to represent the
branches from the remote repository. First, you can see that there is
only one branch, as expected, called master.

    # List all branches

    $ git branch -a
    * master

Now, use git remote update:

    $ git remote update
    Updating origin
    From /tmp/Depot/public_html
     * [new branch]      master     -> origin/master

    $ git branch -a
    * master
      origin/master

Depending on your version of Git,[26] the remote-tracking branch ref may be shown with or without the remotes/ prefix:

    $ git branch -a
    * master
      remotes/origin/master

Git introduced a new branch called origin/master into the repository. It is a
remote-tracking branch within the origin remote. Nobody does development in this
branch. Instead, its purpose is to hold and track the commits made in
the remote origin repository’s
master branch. You could consider it
your local repository’s proxy for commits made in the remote; eventually
you can use it to bring those commits into your repository.

The phrase Updating origin, produced by the git remote update
command, doesn’t mean that the
remote repository was updated. Rather, it means
that the local repository’s notion of the origin has been updated based on information
brought in from the remote repository.

Tip

The generic git remote update
caused every remote within this repository to be updated by checking
for and then fetching any new commits from each repository named in a
remote. Rather than generically updating all remotes, you can restrict
the operation to fetch updates from a single remote by supplying the
desired remote name on the git remote update
command:

    $ git remote update remote_name

Also, using the -f option when the remote is
initially added causes an immediate fetch from that remote repository:

    $ git remote add -f origin repository

Now you’re done linking your repository to the remote repository
in your depot.
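
The effect of the -f option can be sketched by listing the remote-tracking refs before and after the remote is added. As before, the scratch directory is an assumption, and the repository names only imitate the ones in this example.

```shell
#!/bin/sh
# Sketch: git remote add -f creates remote-tracking branches immediately.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)
cd "$work"

git init -q public_html
git -C public_html symbolic-ref HEAD refs/heads/master
echo "<html></html>" > public_html/index.html
git -C public_html add index.html
git -C public_html commit -q -m "Initial commit"

mkdir Depot
git clone -q --bare public_html Depot/public_html.git

cd public_html
before=$(git for-each-ref --format='%(refname)' refs/remotes)

# Add the remote and fetch from it in one step; progress output is discarded.
git remote add -f origin "$work/Depot/public_html.git" >/dev/null 2>&1

after=$(git for-each-ref --format='%(refname)' refs/remotes)
echo "before: '${before}'"
echo "after:  '${after}'"
```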

Developing in Your Repository

Let’s do some development work in the repository and add
another poem, fuzzy.txt:

    $ cd ~/public_html

    $ git show-branch -a
    [master] Merge branch 'master' of ../my_website

    $ cat fuzzy.txt
    Fuzzy Wuzzy was a bear
    Fuzzy Wuzzy had no hair
    Fuzzy Wuzzy wasn't very fuzzy,
    Was he?

    $ git add fuzzy.txt
    $ git commit
    Created commit 6f16880: Add a hairy poem.
     1 files changed, 4 insertions(+), 0 deletions(-)
     create mode 100644 fuzzy.txt

    $ git show-branch -a
    * [master] Add a hairy poem.
     ! [origin/master] Merge branch 'master' of ../my_website
    --
    *  [master] Add a hairy poem.
    -- [origin/master] Merge branch 'master' of ../my_website

At this point, your repository has one more commit than the
repository in /tmp/Depot. Perhaps more interesting
is that your repository has two branches, one (master) with the new
commit on it, and the other (origin/master) that is
tracking the remote repository.

Pushing Your Changes

Any change that you commit is completely local to your
repository; it is not yet present in the remote repository. A convenient
way to get your commits from your master branch into the origin remote repository is to use the
git push command. Depending on your
version of Git, the master parameter
on this command may be assumed if you omit it.

    $ git push origin master
    Counting objects: 4, done.
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (3/3), 400 bytes, done.
    Total 3 (delta 0), reused 0 (delta 0)
    Unpacking objects: 100% (3/3), done.
    To /tmp/Depot/public_html
       0d4ce8a..6f16880  master -> master

All that output means that Git has taken your master branch changes, bundled them up, and
sent them to the remote repository named origin. Git has also performed one more step
here: it has taken those same changes and added them to the origin/master branch in your
repository as well. In effect, Git has caused the changes that were
originally on your master branch to
be sent to the remote repository and then has requested that they be
brought back onto the origin/master
remote-tracking branch as well.

Git doesn’t actually round-trip the changes. After all, the
commits are already in your repository. Git is smart enough to instead
simply fast-forward the remote-tracking branch.

Now both local branches, master and origin/master, reflect the same commit within
your repository:

    $ git show-branch -a
    * [master] Add a hairy poem.
     ! [origin/master] Add a hairy poem.
    --
    *+ [master] Add a hairy poem.

You can also probe the remote repository and verify that it, too,
has been updated. If your remote repository is on a local filesystem, as
it is here, then you can easily check by going to the depot
directory:

    $ cd /tmp/Depot/public_html.git
    $ git show-branch
    [master] Add a hairy poem.

When the remote repository is on a physically different
machine, a plumbing command can be used to determine the branch
information of the remote repository:

    # Go to the actual remote repo and query it

    $ git ls-remote origin
    6f168803f6f1b987dffd5fff77531dcadf7f4b68        HEAD
    6f168803f6f1b987dffd5fff77531dcadf7f4b68        refs/heads/master

You can then show that those commit IDs match your current, local
branches using something like git rev-parse HEAD or
git show commit-id.
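The comparison can also be scripted. The following is a minimal sketch under assumed names: a temporary bare depot and a clone called mine, both hypothetical. It pushes one commit and then compares the hash reported by git ls-remote with the local master and origin/master.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)

# Hypothetical bare depot and one development clone.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "Initial commit"
git clone -q --bare "$work/seed" "$work/depot.git"
git clone -q "$work/depot.git" "$work/mine"

git -C "$work/mine" commit -q --allow-empty -m "Add a hairy poem."
git -C "$work/mine" push -q origin master

local_tip=$(git -C "$work/mine" rev-parse master)
tracking_tip=$(git -C "$work/mine" rev-parse origin/master)
# ls-remote prints "<hash><TAB><refname>"; keep only the hash.
remote_tip=$(git -C "$work/mine" ls-remote origin refs/heads/master | cut -f1)

echo "master:        $local_tip"
echo "origin/master: $tracking_tip"
echo "remote master: $remote_tip"
rm -rf "$work"
```

All three lines should show the same commit ID once the push has completed.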

Adding a New Developer

Once you have established an authoritative repository,
it’s easy to add a new developer to a project simply by letting him
clone the repository and begin working.

Let’s introduce Bob to the project by giving him his own cloned
repository in which to work:

    $ cd /tmp/bob
    $ git clone /tmp/Depot/public_html.git
    Cloning into 'public_html'...
    done.

    $ ls
    public_html
    $ cd public_html

    $ ls 
    fuzzy.txt  index.html  poem.html  techinfo.txt

    $ git branch
    * master

    $ git log -1
    commit 6f168803f6f1b987dffd5fff77531dcadf7f4b68
    Author: Jon Loeliger <jdl@example.com>
    Date:   Sun Sep 14 21:04:44 2008 -0500

        Add a hairy poem.

Immediately, you can see from ls that the clone has a working directory
populated with all the files under version control. That is, Bob’s clone
is a development repository, and not a bare repository. Good. Bob will
be doing some development, too.

From the git log
output, you can see that the most recent commit is available in Bob’s
repository. Additionally, because Bob’s repository was cloned from a
parent repository, it has a default remote called origin. Bob can find out more information
about the origin remote within his
repository:

    $ git remote show origin
    * remote origin
      URL: /tmp/Depot/public_html.git
      Remote branch merged with 'git pull' while on branch master
        master
      Tracked remote branch
        master

The complete contents of the configuration file after a default
clone show how it contains the origin
remote:

    $ cat .git/config
    [core]
            repositoryformatversion = 0
            filemode = true
            bare = false
            logallrefupdates = true
    [remote "origin"]
            url = /tmp/Depot/public_html.git
            fetch = +refs/heads/*:refs/remotes/origin/*
    [branch "master"]
            remote = origin
            merge = refs/heads/master

In addition to having the origin remote in his repository, Bob also has
a few branches. He can list all of the branches in his repository by
using git branch -a:

    $ git branch -a
    * master
      origin/HEAD
      origin/master

The master branch is
Bob’s main development branch. It is the normal, local topic branch. It
is also a local-tracking branch associated with the correspondingly
named master remote-tracking branch.
The origin/master branch is a
remote-tracking branch to follow the commits from the master branch of the origin repository. The origin/HEAD ref indicates which branch the
remote considers the active branch, through a symbolic name. Finally,
the asterisk next to the master
branch name indicates that it is the current, checked-out branch in his
repository.

Let’s have Bob make a commit that alters the hairy poem and then
push that to the main depot
repository. Bob thinks the last line of the poem should be
Wuzzy?, makes this change, and commits it:

    $ git diff

    diff --git a/fuzzy.txt b/fuzzy.txt
    index 0d601fa..608ab5b 100644
    --- a/fuzzy.txt
    +++ b/fuzzy.txt
    @@ -1,4 +1,4 @@
     Fuzzy Wuzzy was a bear
     Fuzzy Wuzzy had no hair
     Fuzzy Wuzzy wasn't very fuzzy,
    -Was he?
    +Wuzzy?

    $ git commit fuzzy.txt 
    Created commit 3958f68: Make the name pun complete!
     1 files changed, 1 insertions(+), 1 deletions(-)

To complete Bob’s development cycle, he pushes his changes
to the depot, using git push as
before:

    $ git push
    Counting objects: 5, done.
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (3/3), 377 bytes, done.
    Total 3 (delta 1), reused 0 (delta 0)
    Unpacking objects: 100% (3/3), done.
    To /tmp/Depot/public_html.git
       6f16880..3958f68  master -> master

Getting Repository Updates

Let’s suppose that Bob goes on vacation and, in the
meantime, you make further changes and push them to the depot
repository. Let’s assume you did this after getting Bob’s latest
changes.

Your commit looks like this:

    $ cd ~/public_html
    $ git diff
    diff --git a/index.html b/index.html
    index 40b00ff..063ac92 100644
    --- a/index.html
    +++ b/index.html
    @@ -1,5 +1,7 @@
     <html>
     <body>
     My web site is alive!
    +<br/>
    +Read a <a href="fuzzy.txt">hairy</a> poem!
     </body>
     <html>

    $ git commit -m "Add a hairy poem link." index.html 
    Created commit 55c15c8: Add a hairy poem link.
     1 files changed, 2 insertions(+), 0 deletions(-)

Using the default push refspec, push your commit upstream:

    $ git push
    Counting objects: 5, done.
    Compressing objects: 100% (3/3), done.
    Unpacking objects: 100% (3/3), done.
    Writing objects: 100% (3/3), 348 bytes, done.
    Total 3 (delta 1), reused 0 (delta 0)
    To /tmp/Depot/public_html
       3958f68..55c15c8  master -> master

Now, when Bob returns he’ll want to refresh his clone of
the repository. The primary command for doing this is git pull:

    $ git pull
    remote: Counting objects: 5, done.
    remote: Compressing objects: 100% (3/3), done.
    remote: Total 3 (delta 1), reused 0 (delta 0)
    Unpacking objects: 100% (3/3), done.
    From /tmp/Depot/public_html
       3958f68..55c15c8  master     -> origin/master
    Updating 3958f68..55c15c8
    Fast forward
     index.html |    2 ++
     1 files changed, 2 insertions(+), 0 deletions(-)

The fully specified git pull command allows both the repository and
multiple refspecs to be specified: git pull options repository refspecs.

If the repository is not specified on the command line, either as
a Git URL or indirectly through a remote name, then the default remote
origin is used. If you don’t specify
a refspec on the command line, the fetch refspec of the remote is used.
If you specify a repository (directly or using a remote) but no refspec,
Git fetches the HEAD ref of the
remote.

The git pull operation is
fundamentally two steps, each implemented by a separate Git command.
Namely, git pull implies git fetch followed by either git merge or
git rebase. By default, the second step is merge because this is almost
always the desired behavior.

Because pull also performs the second merge or rebase step, git push
and git pull are not considered opposites. Instead, git push and
git fetch are considered opposites. Both push and fetch are responsible
for transferring data between repositories, but in opposite
directions.

Sometimes you may want to execute the git fetch and git merge as two
separate operations. For example, you may want to fetch updates into
your repository to inspect them but not necessarily merge immediately.
In this case, you can simply perform the fetch, and then perform other
operations on the remote-tracking branch such as git log, git diff,
or even gitk. Later, when you are ready (if ever!), you may perform
the merge at your convenience.
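As a rough illustration of the fetch-inspect-merge sequence, here is a sketch that uses a temporary bare depot and two hypothetical clones named mine and other. The inspection commands shown (git rev-list --count and git diff --name-only) are just two of many you could run against origin/master before deciding to merge.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)

# Hypothetical bare depot and two clones.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "Initial commit"
git clone -q --bare "$work/seed" "$work/depot.git"
git clone -q "$work/depot.git" "$work/mine"
git clone -q "$work/depot.git" "$work/other"

# The other developer publishes a change.
echo "Read a hairy poem!" > "$work/other/index.html"
git -C "$work/other" add index.html
git -C "$work/other" commit -q -m "Add a hairy poem link."
git -C "$work/other" push -q origin master

# Step 1: fetch only. master is untouched; origin/master advances.
git -C "$work/mine" fetch -q origin

# Inspect what arrived before deciding anything.
incoming=$(git -C "$work/mine" rev-list --count master..origin/master)
changed=$(git -C "$work/mine" diff --name-only master origin/master)
echo "$incoming new commit(s); files touched: $changed"

# Step 2, whenever you are ready: merge the remote-tracking branch.
git -C "$work/mine" merge -q --ff-only origin/master
merged_tip=$(git -C "$work/mine" rev-parse master)
tracking_tip=$(git -C "$work/mine" rev-parse origin/master)
rm -rf "$work"
```

The --ff-only flag is an extra safety choice for this sketch; it makes the merge fail rather than create a merge commit if the histories have diverged.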

Even if you never separate the fetch and merge, you may do complex
operations that require you to know what’s happening at each step. So
let’s look at each one in detail.

The fetch step

In the first fetch step, Git
locates the remote repository. Because the command line did not
specify a direct repository URL or a direct remote name, it assumes
the default remote name, origin.
The information for that remote is in the configuration file:

    [remote "origin"]
            url = /tmp/Depot/public_html.git
            fetch = +refs/heads/*:refs/remotes/origin/*

Git now knows to use the URL /tmp/Depot/public_html.git as the source
repository. Furthermore, because
the command line didn’t specify a refspec, Git will use all of the
fetch = lines from the remote entry. Thus, every refs/heads/* branch from the remote will be
fetched.

Next, Git performs a negotiation protocol with the source
repository to determine what new commits are in the remote repository
and are absent from your repository, based on the desire to fetch all
of the refs/heads/* refs as given
in the fetch refspec.

Tip

You don’t have to fetch all of the topic branches from the
remote repository using the refs/heads/* wildcard form. If you want
only a particular branch or two, list them explicitly:

    [remote "newdev"]
            url = /tmp/Depot/public_html.git
            fetch = +refs/heads/dev:refs/remotes/origin/dev
            fetch = +refs/heads/stable:refs/remotes/origin/stable
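If you would rather not edit the file by hand, the same entry can be built with git config. This is a sketch under assumptions: the depot is a throwaway repository with hypothetical branches dev, stable, and experimental, and the remote name and refspecs simply mirror the fragment above.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)

# Hypothetical depot with three topic branches besides master.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "Initial commit"
for b in dev stable experimental; do
  git -C "$work/seed" branch "$b"
done
git clone -q --bare "$work/seed" "$work/depot.git"

# Define the remote with two explicit fetch refspecs and no wildcard.
git init -q "$work/mine"
git -C "$work/mine" config remote.newdev.url "$work/depot.git"
git -C "$work/mine" config --add remote.newdev.fetch \
    '+refs/heads/dev:refs/remotes/origin/dev'
git -C "$work/mine" config --add remote.newdev.fetch \
    '+refs/heads/stable:refs/remotes/origin/stable'

git -C "$work/mine" fetch -q newdev

# Only the branches named in the refspecs get remote-tracking branches.
fetched=$(git -C "$work/mine" for-each-ref \
    --format='%(refname:short)' refs/remotes/origin | xargs)
echo "fetched: $fetched"
rm -rf "$work"
```

Neither master nor experimental is fetched, because no refspec names them.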

The pull output prefixed by remote: reflects the negotiation,
compression, and transfer protocol, and it lets you know that new
commits are coming into your repository.

    remote: Counting objects: 5, done.
    remote: Compressing objects: 100% (3/3), done.
    remote: Total 3 (delta 1), reused 0 (delta 0)

Git places the new commits in your repository on an appropriate
remote-tracking branch and then tells you what mapping it uses to
determine where the new commits belong:

    From /tmp/Depot/public_html
       3958f68..55c15c8  master     -> origin/master

Those lines indicate that Git looked at the remote repository
/tmp/Depot/public_html, took its master branch, brought its contents
back to your repository, and placed them on your origin/master branch.
This process is the heart of branch tracking.

The corresponding commit IDs are also listed, just in case you
want to inspect the changes directly. With that, the fetch step is finished.

The merge or rebase step

In the second step of the pull operation, Git performs a merge (the default), or a rebase operation. In this example, Git
merges the contents of the remote-tracking branch, origin/master, into your local-tracking
branch, master, using a special
type of merge called a fast-forward.

But how did Git know to merge those particular branches? The
answer comes from the configuration file:

    [branch "master"]
            remote = origin
            merge = refs/heads/master

Paraphrased, this gives Git two key pieces of
information:

When master is the current, checked-out branch, use origin as
the default remote from which to fetch updates during a fetch (or pull).
Further, during the merge step of git pull, use refs/heads/master from
the remote as the default branch to merge into this, the master branch.

For readers paying close attention to detail, the first part of
that paraphrase is the actual mechanism by which Git determines that
origin should be the remote used
during this parameterless git pull
command.

The value of the merge field
in the branch section of the
configuration file (refs/heads/master) is treated like the
remote part of a refspec, and it must match one of the
source refs just fetched during the git pull command. It’s a little convoluted,
but think of this as a hint conveyed from the fetch step to the merge step of a pull command.
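One way to watch the two settings combine is to ask Git which remote-tracking branch it resolves for master. The sketch below assumes a throwaway depot and a hypothetical clone named mine; the @{upstream} revision syntax is standard Git, but its use here is an illustration rather than something the pull command requires of you.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)

# Hypothetical bare depot and a default clone of it.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "Initial commit"
git clone -q --bare "$work/seed" "$work/depot.git"
git clone -q "$work/depot.git" "$work/mine"

# The two values recorded by the clone in the [branch "master"] section.
remote=$(git -C "$work/mine" config --get branch.master.remote)
merge=$(git -C "$work/mine" config --get branch.master.merge)

# Git maps (remote, merge) through the remote's fetch refspec to a
# remote-tracking branch; @{upstream} reports the result.
upstream=$(git -C "$work/mine" rev-parse --abbrev-ref 'master@{upstream}')

echo "remote=$remote merge=$merge upstream=$upstream"
rm -rf "$work"
```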

Because the merge configuration value applies only during git pull,
a manual application of git merge at this point must name the merge
source branch on the command line. The branch is likely a
remote-tracking branch name, such as this:

    # Or, fully specified: refs/remotes/origin/master

    $ git merge origin/master
    Updating 3958f68..55c15c8
    Fast forward
     index.html |    2 ++
     1 files changed, 2 insertions(+), 0 deletions(-)

Note

There are slight semantic differences between the merging
behavior of branches when multiple refspecs are given on the command
line and when they are found in a remote entry. The former causes an
octopus merge, wherein all branches are merged simultaneously in an
n-way operation, whereas the latter does not. Read the git pull manual page carefully!

If you choose to rebase rather than merge, Git will instead
forward port the changes on your local-tracking topic branch to the
newly fetched HEAD of the
corresponding remote-tracking branch. The operation is the same as
that shown in Figure 10-12 and Figure 10-13
in Chapter 10.

The command git pull --rebase will cause Git to rebase (rather than
merge) your local-tracking branch onto the remote-tracking branch
during only this pull. To make rebase the normal operation for a
branch, set the branch.branch_name.rebase configuration variable
to true:

    [branch "mydev"]
        remote = origin
        merge = refs/heads/master
        rebase = true

And with that, the merge (or
rebase) step is also done.
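Here is a small sketch of the rebase variant in action, assuming a temporary depot and two hypothetical clones, mine and other, whose histories have diverged by one commit each. It sets branch.master.rebase and then runs a plain git pull; passing --rebase on the command line instead would have the same effect for a single pull.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)

# Hypothetical bare depot and two clones.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "Initial commit"
git clone -q --bare "$work/seed" "$work/depot.git"
git clone -q "$work/depot.git" "$work/mine"
git clone -q "$work/depot.git" "$work/other"

# The histories diverge: one published commit, one unpublished commit.
echo "from other" > "$work/other/other.txt"
git -C "$work/other" add other.txt
git -C "$work/other" commit -q -m "Commit from other"
git -C "$work/other" push -q origin master
echo "from mine" > "$work/mine/mine.txt"
git -C "$work/mine" add mine.txt
git -C "$work/mine" commit -q -m "Commit from mine"

# Make rebase the second step of every pull on master, then pull.
git -C "$work/mine" config branch.master.rebase true
git -C "$work/mine" pull -q >/dev/null 2>&1

# A rebased history is linear: no merge commits, and the local commit
# now sits directly on top of the fetched origin/master.
merges=$(git -C "$work/mine" rev-list --count --merges master)
parent=$(git -C "$work/mine" rev-parse 'master^')
tracking_tip=$(git -C "$work/mine" rev-parse origin/master)
echo "merge commits: $merges"
rm -rf "$work"
```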

Should you merge or rebase?

So, should you merge or rebase your changes during a
pull operation? The short answer is “Do either as you wish.” So, why
would you choose to do one over the other? Here are some issues to
consider.

By using merge, you will potentially incur an additional merge
commit at each pull to record the updated changes simultaneously
present in each branch. In a sense, it is a true reflection of the two
paths of development that took place independently and were then,
well, merged together. Conflicts will have to be resolved during the
merge. Each sequence of commits on each branch will be based on
exactly the commit on which it was originally written. When pushed
upstream, any merge commits will continue to be present. Some consider
these superfluous merges and would rather not see them cluttering up
the history. Others consider these merges a more accurate portrayal of
the development history and want to see them retained.

As a rebase fundamentally changes the notion of when and where a
sequence of commits was developed, some aspects of the development
history will be lost. Specifically, the original commit on which your
development was originally based will be changed to be the newly
pulled HEAD of the remote-tracking
branch. That will make the development appear to happen later (in
commit sequence) than it actually did. If that’s OK with you, it’s OK
with me. It’ll just be different and simpler than if the history was
merged. Naturally, you will still have to resolve conflicts during the
rebase operation as needed. As the changes that are being
rebased are still strictly local within your repository and haven’t
been published yet, there’s really no reason to fear the “don’t
change history” mantra with this rebase.

With both merge and rebase, you should consider that the new,
final content is different from what was present on either development
branch independently. As such, it might warrant some form of
validation in its new form: perhaps a compilation and test cycle prior
to being pushed to an upstream repository.

I tend to like to see simpler, linear histories. During most of
my personal development, I’m usually not too concerned by a slight
reordering of my changes with respect to those of my coworkers that
came in on a remote-tracking branch fetch, so I am fond of using the
rebase option.

If you really want to set up one consistent approach, consider
setting the config options branch.autosetupmerge (to true, false, or
always) or branch.autosetuprebase (to never, local, remote, or always)
as desired. There are also a few
other options to handle behavior between purely local branches and not
just between a local and a remote branch.
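As a quick sketch of the second option, the following assumes a throwaway depot, a hypothetical clone named mine, and a hypothetical branch named topic. With branch.autosetuprebase set to always, a newly created tracking branch is configured to rebase on pull.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)

# Hypothetical bare depot and one clone.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "Initial commit"
git clone -q --bare "$work/seed" "$work/depot.git"
git clone -q "$work/depot.git" "$work/mine"

# Ask Git to configure every new tracking branch for rebase.
git -C "$work/mine" config branch.autosetuprebase always

# Create a tracking branch after the option is in place.
git -C "$work/mine" branch --track topic origin/master >/dev/null

topic_rebase=$(git -C "$work/mine" config --get branch.topic.rebase)
topic_upstream=$(git -C "$work/mine" rev-parse --abbrev-ref 'topic@{upstream}')
echo "branch.topic.rebase=$topic_rebase (tracks $topic_upstream)"
rm -rf "$work"
```

The option applies only when a branch is created; branches that already exist, such as the master made by the clone, keep whatever configuration they had.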

Remote Repository Development Cycle in Pictures

Integrating your local development with changes from an
upstream repository is at the very core of the distributed development
cycle in Git. Let’s take a moment to visualize what happens to both your
local repository and an upstream origin repository during clone and pull
operations. A few pictures should also clarify the often confusing uses of
the same name in different contexts.

Let’s start with the simple repository shown in Figure 12-1 as the basis for discussion.

Figure 12-1. Simple repository with commits

As with all of our commit graphs, the sequence of commits flows from
left to right and the master label
points to the HEAD of the branch. The
two most recent commits are labeled A
and B. Let’s follow these two commits,
introduce a few more, and watch what occurs.

Cloning a Repository

A git clone command
results in two separate repositories, as shown in Figure 12-2.

Figure 12-2. Cloned repository

This picture illustrates some important results of the clone
operation:

  • All the commits from the original repository are copied to
    your clone; you could now easily retrieve earlier stages of the
    project from your own repository.

  • The branch named master
    from the original repository is introduced into your clone on a new
    remote-tracking branch named origin/master.

  • Within the new clone repository, the new origin/master branch is initialized to
    point to the master HEAD commit, which is B in the figure.

  • A new local-tracking branch called master is created in your clone.

  • The new master branch is
    initialized to point to origin/HEAD, the original repository’s
    active branch HEAD. That happens
    to be origin/master, so it also
    points to the exact same commit, B.

After cloning, Git selects the new master branch as the current branch and checks
it out for you. Thus, unless you change branches, any changes you make
after a clone will affect your
master.

In all of these diagrams, development branches in both the
original repository and the derived clone repository are distinguished
by a dark shaded background, and remote-tracking branches by a lighter
shaded background. It is important to understand that both the
local-tracking development branches and remote-tracking branches are
private and local to their respective repositories. In terms of Git’s
implementation, however, the dark shaded branch labels belong to the
refs/heads/ namespace, whereas the
lighter ones belong to refs/remotes/.

Alternate Histories

Once you have cloned and obtained your development
repository, two distinct paths of development may result. First, you may
do development in your repository and make new commits on your master branch, as shown in Figure 12-3. In this picture, your
development extends the master branch
with two new commits, X and Y, which are based on B.

Figure 12-3. Commits in your repository

In the meantime, any other developer who has access to the
original repository might have done further development and pushed her
changes into that repository. Those changes are represented in Figure 12-4 by the addition of commits
C and D.

Figure 12-4. Commits in original repository

In this situation, we say that the histories of the
repositories have diverged or
forked at commit B. In much the same way that local branching
within one repository causes alternate histories to diverge at a commit,
a repository and its clone can diverge into alternate histories as a
result of separate actions by possibly different people. It is important to realize that this is
perfectly fine and that neither history is more correct than the
other.

In fact, the whole point of the merge operation is that these
different histories may be brought back together and resolved again.
Let’s see how Git implements that!

Non–Fast-Forward Pushes

If you are developing in a repository model in which you
have the ability to git push your
changes into the origin repository,
then you might attempt to push your changes at any time. This could
create problems if some other developer has previously pushed
commits.

This hazard is particularly common when you are using a shared
repository development model in which all developers can push their own
commits and updates into a common repository at any time.

Let’s look again at Figure 12-3, in which you have made new
commits, X and Y, based on B.

If you wanted to push your X and Y
commits upstream at this point, you could do so easily. Git would
transfer your commits to the origin
repository and add them on to the history at B. Git would then perform a special type of
merge operation called a fast-forward on the
master branch, putting in your edits
and updating the ref to point to Y. A
fast-forward is essentially a simple linear history advancement
operation; it was introduced in Degenerate Merges of Chapter 9.

On the other hand, suppose another developer had already pushed
some commits to the origin repository and the picture was more like
Figure 12-4 when you attempted to
push your history up to the origin repository. In effect, you are
attempting to cause your history to be sent to the shared repository
when there is already a different history there. The origin history does not simply fast-forward
from B. This situation is called the
non–fast-forward push problem.

When you attempt your push, Git rejects it and tells you about the
conflict with a message like this:

    $ git push
    To /tmp/Depot/public_html
     ! [rejected]        master -> master (non-fast forward)
    error: failed to push some refs to '/tmp/Depot/public_html'

So what are you really trying to do? Do you want to overwrite the
other developer’s work, or do you want to incorporate both sets of
histories?

Tip

If you want to overwrite all other changes, you can! Just use
the -f option on your git push. We just hope you won’t need that
alternate history!

More often, you are not trying to wipe out the existing origin history but just want your own changes
to be added. In this case, you must perform a merge of the two histories
in your repository before pushing.
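The rejection is easy to reproduce. The sketch below assumes a temporary bare depot and two hypothetical clones, mine and other. The other clone pushes first, after which a plain push from mine is refused and the depot is left exactly as the other developer published it.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)

# Hypothetical bare depot and two clones.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "B: common base"
git clone -q --bare "$work/seed" "$work/depot.git"
git clone -q "$work/depot.git" "$work/mine"
git clone -q "$work/depot.git" "$work/other"

# The other developer gets there first.
git -C "$work/other" commit -q --allow-empty -m "C: pushed first"
git -C "$work/other" push -q origin master

# Your own commit is based on B, not on C.
git -C "$work/mine" commit -q --allow-empty -m "X: local work"

# Git refuses the push because it would not be a fast-forward.
if git -C "$work/mine" push -q origin master 2>/dev/null; then
  push_result=accepted
else
  push_result=rejected
fi

depot_tip=$(git -C "$work/depot.git" rev-parse master)
other_tip=$(git -C "$work/other" rev-parse master)
echo "push was $push_result; depot still at $depot_tip"
rm -rf "$work"
```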

Fetching the Alternate History

For Git to perform a merge between two alternate
histories, both must be present within one repository on two different
branches. Branches that are purely local development branches are a
special (degenerate) case because they are already in the same
repository.

However, if the alternate histories are in different
repositories because of cloning, then the remote branch must be brought
into your repository via a fetch operation. You can carry out the
operation through a direct git fetch
command or as part of a git pull
command; it doesn’t matter which. In either case, the fetch brings the
remote’s commits, here C and D, into your repository. The results are shown
in Figure 12-5.

Figure 12-5. Fetching the alternate history

In no way does the introduction of the alternate history with
commits C and D change the history represented by X and Y;
the two alternate histories both now exist simultaneously in your repository and form
a more complex graph. Your history is represented by your master branch, and the remote history is
represented by the origin/master
remote-tracking branch.

Merging Histories

Now that both histories are present in one repository, all
that is needed to unify them is a merge of the origin/master branch into the master branch.

The merge operation can be initiated either with a direct git merge origin/master command or as the
second step in a git pull request. In
both cases, the techniques for the merge operation are exactly the same
as those described in Chapter 9.

Figure 12-6 shows the commit graph
in your repository after the merge has successfully assimilated the two
histories from commits D and Y into a new merge commit, M. The ref for origin/master remains pointing at D because it hasn’t changed, but master is updated to the merge commit,
M, to indicate that the merge was
into the master branch; this is where
the new commit was made.

Figure 12-6. Merging histories

Merge Conflicts

Occasionally there will be merge conflicts between the
alternate histories. Regardless of the outcome of the merge, the fetch
still occurred. All the commits from the remote repository are still
present in your repository on the tracking branch.

You may choose to resolve the merge normally, as described
in Chapter 9, or you may choose to abort the merge
and reset your master branch to its
prior ORIG_HEAD state using the
command git reset --hard ORIG_HEAD.
Doing so in this example would move master to the prior
HEAD value, Y, and change your working directory to match.
It would also leave origin/master at
commit D.
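The following sketch walks through that abort path. It assumes a throwaway depot and two hypothetical clones, mine and other, which both rewrite the last line of a small file so that the merge is guaranteed to conflict. The labels D and Y in the commit messages echo the figure but are otherwise arbitrary.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)

# Hypothetical depot holding a two-line poem, plus two clones.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
printf 'Fuzzy Wuzzy was a bear\nWas he?\n' > "$work/seed/fuzzy.txt"
git -C "$work/seed" add fuzzy.txt
git -C "$work/seed" commit -q -m "Add a hairy poem."
git clone -q --bare "$work/seed" "$work/depot.git"
git clone -q "$work/depot.git" "$work/mine"
git clone -q "$work/depot.git" "$work/other"

# Both sides change the same line in different ways.
printf 'Fuzzy Wuzzy was a bear\nWuzzy?\n' > "$work/other/fuzzy.txt"
git -C "$work/other" commit -q -am "D: change the last line"
git -C "$work/other" push -q origin master
printf 'Fuzzy Wuzzy was a bear\nWas it?\n' > "$work/mine/fuzzy.txt"
git -C "$work/mine" commit -q -am "Y: change the last line differently"

y=$(git -C "$work/mine" rev-parse master)
git -C "$work/mine" fetch -q origin
d=$(git -C "$work/mine" rev-parse origin/master)

# The merge stops with a conflict; abandon it and return to Y.
if git -C "$work/mine" merge origin/master >/dev/null 2>&1; then
  merge_result=clean
else
  merge_result=conflict
fi
git -C "$work/mine" reset -q --hard ORIG_HEAD

master_after=$(git -C "$work/mine" rev-parse master)
tracking_after=$(git -C "$work/mine" rev-parse origin/master)
dirty=$(git -C "$work/mine" status --porcelain)
echo "merge: $merge_result; master back at Y; origin/master still at D"
rm -rf "$work"
```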

Tip

You can brush up on the meaning of ORIG_HEAD by reviewing refs and symrefs of Chapter 6; also see its use in the section Aborting or Restarting a Merge (Chapter 9).

Pushing a Merged History

If you’ve performed all the steps shown, your repository
has been updated to contain the latest changes from both the origin repository and your repository. But the
converse is not true: the origin
repository still doesn’t have your changes.

If your objective is only to incorporate the latest updates from
origin into your repository, then you
are finished when your merge is resolved. On the other hand, a simple
git push can return the unified and
merged history from your master
branch back to the origin repository.
Figure 12-7 shows the results
after you git push.

Figure 12-7. Merged histories after push

Finally, observe that the origin repository has been updated with your
development even if it has undergone other changes that had to be merged
first. Both your repository and the origin repository have been fully updated and
are again synchronized.
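The whole cycle of fetch, merge, and push can be sketched in a few commands. As before, this assumes a temporary bare depot and two hypothetical clones named mine and other; the file names are arbitrary and chosen so that the merge is free of conflicts.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)

# Hypothetical bare depot and two clones.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "B: common base"
git clone -q --bare "$work/seed" "$work/depot.git"
git clone -q "$work/depot.git" "$work/mine"
git clone -q "$work/depot.git" "$work/other"

# Alternate histories: D is published, Y is still local.
echo "from other" > "$work/other/other.txt"
git -C "$work/other" add other.txt
git -C "$work/other" commit -q -m "D: work from the other developer"
git -C "$work/other" push -q origin master
echo "from mine" > "$work/mine/mine.txt"
git -C "$work/mine" add mine.txt
git -C "$work/mine" commit -q -m "Y: my own work"

# Fetch the alternate history, merge it, and push the result.
git -C "$work/mine" fetch -q origin
git -C "$work/mine" merge -q --no-edit origin/master >/dev/null
git -C "$work/mine" push -q origin master

# The new tip is a merge commit with two parents, and all three
# views of master agree on it.
parents=$(git -C "$work/mine" cat-file -p master | grep -c '^parent ')
local_tip=$(git -C "$work/mine" rev-parse master)
tracking_tip=$(git -C "$work/mine" rev-parse origin/master)
depot_tip=$(git -C "$work/depot.git" rev-parse master)
echo "merge commit $local_tip has $parents parents"
rm -rf "$work"
```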

Remote Configuration

Keeping track of all of the information about a remote
repository reference by hand can become tedious and difficult: you have to
remember the full URL for the repository; you must type and retype remote
references and refspecs on the command line each time you want to fetch
updates; you have to reconstruct the branch mappings; and so on. Repeating
the information is also likely to be quite error prone.

You might also wonder how Git remembers the URL for the remote from
the initial clone for use in subsequent fetch or push operations using
origin.

Git provides three mechanisms for setting up and maintaining
information about remotes: the
git remote command, the git config command, and editing the .git/config file directly. All three
mechanisms ultimately result in configuration information being recorded in the .git/config file.

Using git remote

The git remote command
is a more specialized interface, specific to remotes, that manipulates the configuration file data
and remote refs. It has several subcommands with fairly intuitive names.
There is no help option, but you can circumvent that to display a
message with subcommand names via the “unknown subcommand trick”:

    $ git remote xyzzy
    error: Unknown subcommand: xyzzy
    usage: git remote
       or: git remote add <name> <url>
       or: git remote rm <name>
       or: git remote show <name>
       or: git remote prune <name>
       or: git remote update [group]

        -v, --verbose         be verbose

You saw the git remote add and update commands in the section Make Your Own Origin Remote, earlier in this
chapter, and you saw show in Adding a New Developer. You used git remote add origin to add a new remote
named origin to the newly created
parent repository in the depot, and you ran the git remote show origin command to extract all
the information about the remote origin. Finally, you used the git remote update command to fetch all the
updates available in the remote repository into your local
repository.

The command git remote rm removes the given remote and all of its
associated remote-tracking branches from your local
repository. To remove just one remote-tracking branch from your local
repository, use a command like this:

    $ git branch -r -d origin/dev

But you shouldn’t do that unless the corresponding remote
branch really has been removed from the upstream repository. Otherwise,
your next fetch from the upstream repository is likely to recreate the
branch.

The remote repository may have branches deleted from it by
the actions of other developers,
even though your copies of those branches may linger in your repository.
The git remote prune command may be
used to remove the names of those stale (with respect to the actual
remote repository) remote-tracking branches from your local
repository.

To keep even more in sync with an upstream remote, use the command
git remote update --prune remote to first get updates from
the remote and then prune stale tracking branches all in one
step.
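A short sketch of pruning, assuming a temporary depot with a hypothetical dev branch and a clone named mine: the branch is deleted in the depot, lingers in the clone as origin/dev, and disappears after git remote prune.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)

# Hypothetical depot with a dev branch, and a clone that tracks it.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "Initial commit"
git -C "$work/seed" branch dev
git clone -q --bare "$work/seed" "$work/depot.git"
git clone -q "$work/depot.git" "$work/mine"

# Another developer deletes dev from the depot.
git -C "$work/depot.git" branch -D dev >/dev/null

# The clone still has its stale copy until it is pruned.
if git -C "$work/mine" show-ref --verify --quiet refs/remotes/origin/dev; then
  before=present
else
  before=absent
fi

git -C "$work/mine" remote prune origin >/dev/null 2>&1

if git -C "$work/mine" show-ref --verify --quiet refs/remotes/origin/dev; then
  after=present
else
  after=absent
fi
echo "origin/dev before prune: $before, after prune: $after"
rm -rf "$work"
```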

To rename a remote and all of its refs, use git remote rename old new.
After this command:

    $ git remote rename jon jdl

any ref like jon/bugfixes will
be renamed as jdl/bugfixes.

In addition to manipulations of the remote name and its refs, you
can also update or change the URL of the remote:

    $ git remote set-url origin git://repos.example.com/stuff.git
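Both operations can be checked in a scratch repository. In this sketch the depot, the clone named mine, and the bugfixes branch are all hypothetical; the new URL is the one from the example above and is only recorded in the configuration, never contacted.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
work=$(mktemp -d)

# Hypothetical depot with a bugfixes branch.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "Initial commit"
git -C "$work/seed" branch bugfixes
git clone -q --bare "$work/seed" "$work/depot.git"

# Add the depot as a remote named jon and fetch its branches.
git init -q "$work/mine"
git -C "$work/mine" remote add jon "$work/depot.git"
git -C "$work/mine" fetch -q jon

# Rename the remote; its remote-tracking refs move with it.
git -C "$work/mine" remote rename jon jdl >/dev/null 2>&1
if git -C "$work/mine" show-ref --verify --quiet refs/remotes/jdl/bugfixes; then
  renamed=yes
else
  renamed=no
fi
if git -C "$work/mine" show-ref --verify --quiet refs/remotes/jon/bugfixes; then
  old=present
else
  old=absent
fi

# Point the renamed remote at a different URL.
git -C "$work/mine" remote set-url jdl git://repos.example.com/stuff.git
new_url=$(git -C "$work/mine" config --get remote.jdl.url)
echo "jdl/bugfixes exists: $renamed; jon/bugfixes: $old; url: $new_url"
rm -rf "$work"
```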

Using git config

The git config command
can be used to manipulate the entries in your configuration file
directly. This includes several config variables for remotes.

For example, to add a new remote named publish with a push refspec for all the
branches you would like to publish, you might do something like
this:

    $ git config remote.publish.url 'ssh://git.example.org/pub/repo.git'
    $ git config remote.publish.push '+refs/heads/*:refs/heads/*'

Each of the preceding commands adds a line to the
.git/config file. If no publish remote section exists yet, then the
first command you issue that refers to that remote creates a section in
the file for it. As a result, your .git/config contains, in part, the following
remote definition:

    [remote "publish"]
            url = ssh://git.example.org/pub/repo.git
            push = +refs/heads/*:refs/heads/*
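To confirm that Git treats such a hand-built section as a real remote, you can query it back. This is a minimal sketch in a scratch repository (the directory name repo is hypothetical); nothing is pushed, so the example URL is never contacted.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"

# The same two commands as above, run in a scratch repository.
git -C "$work/repo" config remote.publish.url 'ssh://git.example.org/pub/repo.git'
git -C "$work/repo" config remote.publish.push '+refs/heads/*:refs/heads/*'

# git remote now lists the new name, and the values read back intact.
remotes=$(git -C "$work/repo" remote)
push_spec=$(git -C "$work/repo" config --get remote.publish.push)
push_url=$(git -C "$work/repo" remote get-url --push publish)

echo "remotes:   $remotes"
echo "push spec: $push_spec"
echo "push url:  $push_url"
rm -rf "$work"
```

Because the section has no fetch line, a fetch from publish would not create any remote-tracking branches; the remote is set up for pushing only.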

Tip

Use the -l (lowercase L) option à la git config -l to list the contents of the
configuration file with complete variable names:

    # From a clone of git.git sources

    $ git config -l
    core.repositoryformatversion=0
    core.filemode=true
    core.bare=false
    core.logallrefupdates=true
    remote.origin.url=git://git.kernel.org/pub/scm/git/git.git
    remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
    branch.master.remote=origin
    branch.master.merge=refs/heads/master

Using Manual Editing

Rather than wrestling with either the git remote or git config
commands, directly editing the file with your favorite
text editor may be easier or faster in some situations. There is nothing
wrong with doing so, but it can be error prone and is usually done only
by developers who are very familiar with Git’s behavior and the
configuration file. Yet having seen the parts of the file that influence
various Git behaviors and the changes resulting from commands, you
should have basis enough to understand and manipulate the configuration
file.

Multiple Remote Repositories

Operations such as git remote add remote repository-URL
can be executed multiple
times to add several new remotes to your repository. With multiple
remotes, you can subsequently fetch commits from multiple sources and
combine them in your repository. This feature also allows you to
establish several push destinations that might receive part or all of
your repository.
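
A minimal sketch of a repository with two remotes follows. The remote names alice and bob and their local paths are hypothetical, chosen only to keep the example self-contained.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Two independent source repositories.
for name in alice bob; do
    git init -q "$work/$name"
    git -C "$work/$name" commit -q --allow-empty -m "work from $name"
done

# One repository that collects commits from both.
git init -q "$work/mine"
cd "$work/mine"
git remote add alice "$work/alice"
git remote add bob "$work/bob"
git remote update >/dev/null 2>&1   # fetches from every configured remote

count=$(git remote | wc -l)
tracking=$(git for-each-ref refs/remotes | wc -l)
echo "remotes: $count, remote-tracking branches: $tracking"
```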

In Chapter 13, we’ll show you
how to use multiple repositories in different scenarios during your
development.

Working with Tracking Branches

Because the creation and manipulation of tracking branches
is such a vital part of the Git development methodology, it is important
to understand how and why Git creates the different tracking branches and
how Git expects you to develop using them.

Creating Tracking Branches

In the same way that your master branch can be thought of as extending
the development brought in on the origin/master branch, you can create a new
branch based on any remote-tracking branch and use it to extend that
line of development.

We’ve already seen that remote-tracking branches are
introduced during a clone operation or when remotes are added to a
repository. In later versions of Git, after about 1.6.6 or so, Git makes
it very easy to create a local- and remote-tracking branch pair using a
consistent ref name for them. A simple check out request using the name
of a remote-tracking branch causes a new local-tracking branch to be
created and associated with the remote-tracking branch. However, Git
does this only if your branch name matches just one remote branch name
from all of the repository remotes. By the phrase “branch name matches,”
Git means the full branch name after the name of the
remote in a refspec.

Let’s use Git’s source repository for some examples. By
pulling both from GitHub and git.kernel.org, we’ll create a repository
that has a vast collection of branch names from two remotes, some of
which are duplicates.

    # Grab GitHub's repository
    $ git clone git://github.com/gitster/git.git
    Cloning into 'git'...
    ...

    $ git remote add korg git://git.kernel.org/pub/scm/git/git.git

    $ git remote update
    Fetching origin
    Fetching korg
    remote: Counting objects: 3541, done.
    remote: Compressing objects: 100% (1655/1655), done.
    remote: Total 3541 (delta 1796), reused 3451 (delta 1747)
    Receiving objects: 100% (3541/3541), 1.73 MiB | 344 KiB/s, done.
    Resolving deltas: 100% (1796/1796), done.
    From git://git.kernel.org/pub/scm/git/git
     * [new branch]      maint      -> korg/maint
     * [new branch]      master     -> korg/master
     * [new branch]      next       -> korg/next
     * [new branch]      pu         -> korg/pu
     * [new branch]      todo       -> korg/todo


    # Find a uniquely named branch and check it out.
    $ git branch -a | grep split-blob
      remotes/origin/jc/split-blob

    $ git branch
    * master

    $ git checkout jc/split-blob
    Branch jc/split-blob set up to track remote branch jc/split-blob from origin.
    Switched to a new branch 'jc/split-blob'

    $ git branch
    * jc/split-blob
      master

Notice that we had to use the full branch name jc/split-blob and not simply split-blob.

In the case when the branch name is ambiguous, you can directly
establish and set up the branch yourself.

    $ git branch -a | egrep 'maint$'
      remotes/korg/maint
      remotes/origin/maint

    $ git checkout maint
    error: pathspec 'maint' did not match any file(s) known to git.

    # Just select one of the maint branches.
    $ git checkout --track korg/maint
    Branch maint set up to track remote branch maint from korg.
    Switched to a new branch 'maint'

It is likely that the two branches represent the same commit as
found in two different repositories and you can simply choose one on
which to base your local-tracking branch.

If for some reason you wish to use a different name for
your local-tracking branch, use the -b option.

    $ git checkout -b mypu --track korg/pu
    Branch mypu set up to track remote branch pu from korg.
    Switched to a new branch 'mypu'

Under the hood, Git automatically adds a branch entry to the .git/config file to indicate that the
remote-tracking branch should be merged into your new local-tracking
branch. The collected changes from the previous series of commands
yield the following config file:

    $ cat .git/config
    [core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
    [remote "origin"]
        fetch = +refs/heads/*:refs/remotes/origin/*
        url = git://github.com/gitster/git.git
    [branch "master"]
        remote = origin
        merge = refs/heads/master
    [remote "korg"]
        url = git://git.kernel.org/pub/scm/git/git.git
        fetch = +refs/heads/*:refs/remotes/korg/*
    [branch "jc/split-blob"]
        remote = origin
        merge = refs/heads/jc/split-blob
    [branch "maint"]
        remote = korg
        merge = refs/heads/maint
    [branch "mypu"]
        remote = korg
        merge = refs/heads/pu

As usual, you may also use git config
or a text editor to manipulate the branch entries in the configuration
file.
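
For instance, the two entries behind a local-tracking branch can be queried individually. The sketch below recreates the mypu case against a stand-in repository; the local korg directory is an assumption of the example, not the real kernel.org remote.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in for the korg remote, with a pu branch.
git init -q "$work/korg"
git -C "$work/korg" commit -q --allow-empty -m "initial"
git -C "$work/korg" branch pu

git init -q "$work/mine"
cd "$work/mine"
git remote add korg "$work/korg"
git fetch -q korg

# Create a differently named local-tracking branch, as in the text.
git checkout -q -b mypu --track korg/pu

# These are the values Git recorded in the [branch "mypu"] section.
remote=$(git config --get branch.mypu.remote)
merge=$(git config --get branch.mypu.merge)
echo "branch.mypu.remote=$remote branch.mypu.merge=$merge"
```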

Tip

When you get lost in the tracking branch mire, use the command
git remote show remote
to help sort out all the remotes and branches.

At this point, it should be pretty clear that the default clone
behavior introduces local-tracking branch master for the remote-tracking branch origin/master as a simplifying convenience
just as if you had explicitly checked out the master branch yourself.

To reinforce the idea that making commits directly on a
remote-tracking branch isn’t good form, checking out a remote-tracking
branch using early versions of Git (prior to about 1.6.6 or so) caused a
detached HEAD. As mentioned in Detached HEAD Branches of Chapter 7, a
detached HEAD is essentially an
anonymous branch name. Making commits on the detached HEAD is possible, but you shouldn’t then
update your remote-tracking branch HEAD with any local commits lest you suffer
grief later when fetching new updates from that remote. (If you find you
need to keep any such commits on a detached HEAD, use
git checkout -b my_branch
to create a new, local branch on which to further develop your changes.)
All in all, it isn’t really a good, intuitive approach.

If you don’t want to check out a local-tracking branch
when you create it, you can instead use
git branch --track local-branch remote-branch
to create the local-tracking branch and record
the local- and remote-branch association in the .git/config file for you:

    $ git branch --track dev origin/dev
    Branch dev set up to track remote branch dev from origin.

And, if you already have a topic branch that you decide should be
associated with an upstream repository’s remote-tracking branch, you can
establish the relationship using the --set-upstream-to
option. (Older releases spelled it --set-upstream, with the two branch
arguments in the opposite order; current Git no longer accepts that form.)
Typically, this is done after adding a new remote and fetching from it,
like this:

    $ git remote add upstreamrepo git://git.example.org/upstreamrepo.git
    $ git fetch upstreamrepo

    # Branch mydev already existed.
    # Leave it alone, but associate it with upstreamrepo/dev.
    $ git branch --set-upstream-to=upstreamrepo/dev mydev
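
To see what these associations look like afterward, the following sketch sets up both cases against a local stand-in for upstreamrepo and asks Git for each branch's upstream. It uses the --set-upstream-to spelling found in current Git releases; the directory layout is an assumption of the example.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in for the upstream repository, with a dev branch.
git init -q "$work/upstreamrepo"
git -C "$work/upstreamrepo" commit -q --allow-empty -m "initial"
git -C "$work/upstreamrepo" branch dev

# Your repository, where the topic branch mydev already exists.
git init -q "$work/mine"
cd "$work/mine"
git commit -q --allow-empty -m "local work"
git branch mydev

git remote add upstreamrepo "$work/upstreamrepo"
git fetch -q upstreamrepo

# Associate the existing branch, and create a second one without checking it out.
git branch -q --set-upstream-to=upstreamrepo/dev mydev
git branch -q --track dev upstreamrepo/dev

mydev_upstream=$(git rev-parse --abbrev-ref 'mydev@{upstream}')
dev_upstream=$(git rev-parse --abbrev-ref 'dev@{upstream}')
echo "mydev -> $mydev_upstream, dev -> $dev_upstream"
```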

Ahead and Behind

With the establishment of a local- and remote-tracking
branch pair, relative comparisons between the two branches can be made.
In addition to the normal diff,
log, and other content-based
comparisons, Git offers a quick summary of the number of commits on each
of the branches and states which branch it judges to be ahead of
or behind the other branch.

If your local development introduces new commits on a
local-tracking branch, it is considered to be ahead of the corresponding
remote-tracking branch. Conversely, if you fetch new commits onto
remote-tracking branches and they are not present on your local-tracking
branch, Git considers your local-tracking branch to be behind the
corresponding remote-tracking branch.

The git status command usually
reports this status:

    $ git fetch
    remote: Counting objects: 9, done.
    remote: Compressing objects: 100% (6/6), done.
    remote: Total 6 (delta 4), reused 0 (delta 0)
    Unpacking objects: 100% (6/6), done.
    From example.com:SomeRepo
       b1a68a8..b722324  ver2  -> origin/ver2

    $ git status
    # On branch ver2
    # Your branch is behind 'origin/ver2' by 2 commits, and can be fast-forwarded.

To see which commits you have in master that are not in origin/master, use a command like this:

    $ git log origin/master..master

Yes, it is possible to be both ahead and behind
simultaneously!

    # Make one local commit on top of previous example
    $ git commit -m "Something" main.c
      ...

    $ git status
    # On branch ver2
    # Your branch and 'origin/ver2' have diverged,
    # and have 1 and 2 different commit(s) each, respectively.

And in this case, you probably want to use the symmetric
difference to see the changes:

    $ git log origin/master...master
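
If you want the two numbers on their own, for a script or a prompt, one possible approach is git rev-list with the --left-right and --count options applied to the same symmetric difference. The sketch below builds a diverged pair of throwaway repositories (two new commits upstream, one local) to show it; the repository names are assumptions of the example.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/upstream"
git -C "$work/upstream" commit -q --allow-empty -m "base"
git clone -q "$work/upstream" "$work/clone"

# Two new commits appear upstream, and one is made locally.
git -C "$work/upstream" commit -q --allow-empty -m "upstream 1"
git -C "$work/upstream" commit -q --allow-empty -m "upstream 2"
git -C "$work/clone" commit -q --allow-empty -m "local 1"
git -C "$work/clone" fetch -q origin

# Left count: commits only on the upstream side (behind).
# Right count: commits only on the local side (ahead).
counts=$(git -C "$work/clone" rev-list --left-right --count '@{upstream}...HEAD')
set -- $counts
behind=$1
ahead=$2
echo "behind by $behind, ahead by $ahead"
```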

Adding and Deleting Remote Branches

Any new development you create on branches in your local
clone is not visible in the parent repository until you make a direct
request to propagate it there. Similarly, a branch deletion in your
repository remains a local change and is not removed from the parent
repository until you request it to be removed from the remote as
well.

In Chapter 7, you learned how to add new
branches to and delete existing ones from your repository using the
git branch command. But git branch operates only on a local
repository.

To perform similar branch add and delete operations on a
remote repository, you need to specify different forms of refspecs in a
git push command. Recall that the
syntax of a refspec is:

    [+]source:destination

Pushes that use a refspec with just a
source ref (i.e., with no
destination ref) create a new branch in the
remote repository:

    $ cd ~/public_html

    $ git checkout -b foo
    Switched to a new branch "foo"

    $ git push origin foo
    Total 0 (delta 0), reused 0 (delta 0)
    To /tmp/Depot/public_html
     * [new branch]      foo -> foo

A push that names only a source is just shorthand for using the
same name for both the source and destination ref. A push that names
both a source and a destination ref that are different can be used to
create a new branch with the destination name or to extend
an existing destination remote branch with the content from the local
source branch. That is, git push origin mystuff:dev
will push the local branch mystuff to the upstream repository and either create or
extend a branch named dev. Thus, due to
a series of default behaviors, the following commands have the same
effect:

    $ git push upstream new_dev
    $ git push upstream new_dev:new_dev
    $ git push upstream new_dev:refs/heads/new_dev

Naturally, upstream would be a reference to an
appropriate upstream repository and might typically be
origin.
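
The different refspec forms can be tried safely against a scratch bare repository. In this sketch the depot path is made up for the illustration; the branch names mystuff and dev follow the text.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q --bare "$work/depot.git"
git init -q "$work/dev"
cd "$work/dev"
git remote add origin "$work/depot.git"
git commit -q --allow-empty -m "initial"
git checkout -q -b mystuff
git commit -q --allow-empty -m "my stuff"

# Different source and destination: creates (or extends) dev upstream.
git push -q origin mystuff:dev

# Source only: same name on both sides. The fully spelled form that
# follows is equivalent, so it finds nothing left to do.
git push -q origin mystuff
git push -q origin mystuff:refs/heads/mystuff

local_tip=$(git rev-parse mystuff)
remote_dev=$(git ls-remote origin refs/heads/dev | cut -f1)
remote_mystuff=$(git ls-remote origin refs/heads/mystuff | cut -f1)
echo "local $local_tip, dev $remote_dev, mystuff $remote_mystuff"
```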

Pushes that use a refspec with just a
destination ref (i.e., no
source ref) cause the
destination ref to be deleted from the remote
repository. To denote the ref as the
destination, the colon separator must be
specified:

    $ git push origin :foo
    To /tmp/Depot/public_html
     - [deleted]         foo

If that :branch form causes
you heartache, you can use an equivalent, more readable form:

    $ git push origin --delete foo
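
Both deletion forms can be exercised in a scratch setup like the one below. The depot path and the second branch name, bar, are assumptions added for the illustration.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q --bare "$work/depot.git"
git init -q "$work/dev"
cd "$work/dev"
git remote add origin "$work/depot.git"
git commit -q --allow-empty -m "initial"

# Create two branches upstream.
git push -q origin HEAD:refs/heads/foo HEAD:refs/heads/bar
before=$(git ls-remote --heads origin | wc -l)

# Delete one with the empty-source refspec, the other with --delete.
git push -q origin :foo
git push -q origin --delete bar
after=$(git ls-remote --heads origin | wc -l)
echo "remote branches before: $before, after: $after"
```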

So what about renaming a remote branch? Unfortunately, there is not
a simple solution. The short answer is to create a new upstream branch with
the new name and then delete the old branch. That’s easy enough to do
using the git push commands as shown
previously.

    # Create new name at existing old commit
    $ git branch new origin/old
    $ git push origin new

    # Remove the old name
    $ git push origin :old

But that’s the easy and obvious part. Now what are the distributed
implications? Do you know who has a clone of the upstream repository that
was just modified out from underneath them? If you do, they could all just
run git fetch and git remote prune to get their repositories updated.
But if you don’t, then all those other clones will suddenly have dangling
tracking branches. And there’s no real way to get them renamed in a
distributed way.
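
Seen from a second clone, the rename looks like one branch disappearing and an unrelated one appearing. The sketch below plays this out with two hypothetical developers, alice and bob, sharing a scratch depot; bob uses git fetch --prune to do the fetch and the prune in one step.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# A shared depot with a branch named "old", plus two clones of it.
git init -q "$work/seed"
git -C "$work/seed" commit -q --allow-empty -m "initial"
git -C "$work/seed" branch old
git clone -q --bare "$work/seed" "$work/depot.git"
git clone -q "$work/depot.git" "$work/alice"
git clone -q "$work/depot.git" "$work/bob"

# Alice performs the rename upstream.
git -C "$work/alice" branch -q new origin/old
git -C "$work/alice" push -q origin new
git -C "$work/alice" push -q origin :old

# Bob keeps a stale origin/old until he fetches and prunes.
git -C "$work/bob" fetch -q --prune origin
bob_refs=$(git -C "$work/bob" for-each-ref --format='%(refname:short)' refs/remotes/origin)
has_new=$(echo "$bob_refs" | grep -c -x 'origin/new' || true)
has_old=$(echo "$bob_refs" | grep -c -x 'origin/old' || true)
echo "bob sees origin/new: $has_new, origin/old: $has_old"
```

Any local branch bob had created from origin/old would still carry the old name and the old upstream setting; nothing in the fetch renames it for him.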

Bottom line here: this is just a variant on the “Be careful
how you rewrite history” theme.

Bare Repositories and git push

As a consequence of the peer-to-peer semantics of Git
repositories, all repositories are of equal stature. You can push to and
fetch from development and bare repositories equally, because there is no
fundamental implementation distinction between them. This symmetric design
is critically important to Git, but it also leads to some unexpected
behavior if you try to treat bare and development repositories as exact
equals.

Recall that the git push command
does not check out files in the receiving repository. It simply transfers
objects from the source repository to the receiving repository and then
updates the corresponding refs on the receiving end.

In a bare repository, this behavior is all that can be expected,
because there is no working directory that might be updated by checked out
files. That’s good. However, in a development repository that is the recipient
of a push operation, it can later cause confusion to anyone using the
development repository.

The push operation can update the repository state, including the
HEAD commit. That is, even though the
developer at the remote end has done nothing, the branch refs and HEAD might change, becoming out of sync with the
checked out files and index.

A developer who is actively working in a repository into which an
asynchronous push happens will not see the push. But a subsequent commit
by that developer will occur on an unexpected HEAD, creating an odd history. A forced push
will lose pushed commits from the other developer. The developer at that
repository also may find herself unable to reconcile her history with
either an upstream repository or a downstream clone because they are no
longer simple fast-forwards as they should be. And she won’t know why: the
repository has silently changed out from underneath her. Cats and dogs
will live together. It’ll be bad.

As a result, you are encouraged to push only into a bare repository.
This is not a hard-and-fast rule, but it’s a good guide to the average
developer and is considered a best practice. There are a few instances and
use cases where you might want to push into a development repository, but
you should fully understand its implications. When you
do want to push into a development repository, you
may want to follow one of two basic approaches.

In the first scenario, you really do want to have a working
directory with a branch checked out in the receiving repository. You may
know, for example, that no other developer will ever be doing active
development there and therefore there is no one who might be blindsided
by silent changes being pushed into his repository.

In this case, you may want to enable a hook in the receiving
repository to perform a checkout of some branch, perhaps the one just
pushed, into the working directory as well. To verify that the receiving
repository is in a sane state prior to having an automatic checkout, the
hook should ensure that the nonbare repository’s working directory
contains no edits or modified files and that its index has no files in the
staged but uncommitted state when the push happens. When these conditions
are not met, you run the risk of losing those edits or changes as the
checkout overwrites them.
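
This is not the hook described above. As an alternative, newer versions of Git (2.3 and later) offer a built-in setting that behaves much like it: with receive.denyCurrentBranch set to updateInstead, a push to the checked out branch updates the working directory when it is clean and is refused otherwise. By default, current Git refuses such a push outright. The sketch below assumes a Git new enough to know the setting, and the deploy and dev directory names are invented for the example.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# The receiving, nonbare repository with a branch checked out.
git init -q "$work/deploy"
echo "readme" > "$work/deploy/README"
git -C "$work/deploy" add README
git -C "$work/deploy" commit -q -m "initial"

# Let pushes to the checked out branch update a clean working directory.
git -C "$work/deploy" config receive.denyCurrentBranch updateInstead

# A developer clones, commits, and pushes back to the checked out branch.
git clone -q "$work/deploy" "$work/dev"
echo "hello" > "$work/dev/index.html"
git -C "$work/dev" add index.html
git -C "$work/dev" commit -q -m "Add index.html"
git -C "$work/dev" push -q origin HEAD

# The pushed file is now present in the receiving working directory.
deployed=$(cat "$work/deploy/index.html")
echo "deployed content: $deployed"
```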

There is another scenario where pushing into a nonbare repository
can work reasonably well. By agreement, each developer who pushes changes
must push to a non–checked out branch that is considered simply a
receiving branch. A developer never pushes to a branch that is expected to
be checked out. It is up to some developer in particular to manage what
branch is checked out and when. Perhaps that person is responsible for
handling the receiving branches and merging them into a master branch
before it is checked out.
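
A rough sketch of the receiving-branch convention follows. The incoming/alice branch name, the developer name, and the directory layout are hypothetical; the point is only that the push lands on a branch nobody has checked out, and the integrator merges it later.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# The shared, nonbare repository managed by an integrator.
git init -q "$work/shared"
echo "readme" > "$work/shared/README"
git -C "$work/shared" add README
git -C "$work/shared" commit -q -m "initial"

# Alice works in her own clone and pushes to her receiving branch only.
git clone -q "$work/shared" "$work/alice"
echo "feature" > "$work/alice/feature.txt"
git -C "$work/alice" add feature.txt
git -C "$work/alice" commit -q -m "Add feature"
git -C "$work/alice" push -q origin HEAD:refs/heads/incoming/alice

# The push did not disturb the checked out files in the shared repository.
if [ -e "$work/shared/feature.txt" ]; then present_before=yes; else present_before=no; fi

# Later, the integrator merges the receiving branch into the checked out one.
git -C "$work/shared" merge -q incoming/alice
merged=$(git -C "$work/shared" log -1 --format=%s)
echo "present before merge: $present_before, tip after merge: $merged"
```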


[25] Of course, a bidirectional remote relationship can be set up
later using the git remote
command.

[26] Version 1.6.3 appears to be the delineation here.
