
Git – Introduction

Background

No cautious, creative person starts a project nowadays without a
back-up strategy. Because data is ephemeral and can be lost easily—through
an errant code change or a catastrophic disk crash, say—it is wise to
maintain a living archive of all work.

For text and code projects, the back-up strategy typically includes version control, or
tracking and managing revisions. Each developer can make several revisions
per day, and the ever increasing corpus serves simultaneously as
repository, project narrative, communication medium, and team and product
management tool. Given its pivotal role, version control is most effective when tailored to the working habits and
goals of the project team.

A tool that manages and tracks different versions of software or
other content is referred to generically as a version control system
(VCS), a source code manager (SCM), a revision control system (RCS), or by
several other permutations of the words revision, version, code, content,
control, management, and system. Although the authors and users of each tool might
debate esoterics, each system addresses the same issue: develop and
maintain a repository of content, provide access to historical editions of
each datum, and record all changes in a log. In this book, the term
version control system (VCS) is used to refer
generically to any form of revision control system.

This book covers Git, a particularly powerful, flexible, and
low-overhead version control tool that makes collaborative development a
pleasure. Git was invented by Linus Torvalds to support the development of
the Linux®[1] kernel, but it has since proven valuable to a wide range of
projects.

The Birth of Git

Often, when there is discord between a tool and a project, the
developers simply create a new tool. Indeed, in the world of software,
creating a new tool can be deceptively easy and tempting. In
the face of many existing version control systems, the decision to create
another shouldn’t be made casually. However, given a critical need, a bit
of insight, and a healthy dose of motivation, forging a new tool can be
exactly the right course.

Git, affectionately termed "the information manager from hell" by its
creator (Linus is known for both his irascibility and
his dry wit), is such a tool. Although the precise circumstances and
timing of its genesis are shrouded in political wrangling within the Linux
kernel community, there is no doubt that what came from that fire is a
well-engineered version control system capable of supporting the worldwide
development of software on a large scale.

Prior to Git, the Linux kernel was developed using the commercial
BitKeeper VCS, which provided sophisticated operations not available in
then-current, free software VCSs such as RCS and the Concurrent Versions
System (CVS). However, when the company that owned BitKeeper placed
additional restrictions on its "free as in beer" version in
the spring of 2005, the Linux community realized that BitKeeper was no
longer a viable solution.

Linus looked for alternatives. Eschewing commercial solutions, he
studied the free software packages but found the same limitations and
flaws that led him to reject them previously. What was wrong with the
existing VCSs? What were the elusive missing features or characteristics that Linus wanted and
couldn’t find?

Facilitate Distributed Development

There are many facets to distributed development, and Linus wanted a
new VCS that would cover most of them. It had to allow parallel as well
as independent and simultaneous development in private repositories
without the need for constant synchronization with a central
repository, which could form a development bottleneck. It had to allow
multiple developers in multiple locations even if some of them were
offline temporarily.

Scale to Handle Thousands of Developers

It isn’t enough just to have a distributed development
model. Linus knew that thousands of developers contribute to each
Linux release. So any new VCS had to handle a very large number of
developers whether they were working on the same or different parts
of a common project. And the new VCS had to be able to integrate all
of their work reliably.

Perform Quickly and Efficiently

Linus was determined to ensure that a new VCS was fast and
efficient. In order to support the sheer volume of update operations
that would be made on the Linux kernel alone, he knew that both
individual update operations and network transfer operations would
have to be very fast. To save space and thus transfer time,
compression and delta techniques would be needed.
Using a distributed model instead of a centralized model also
ensured that network latency would not hinder daily
development.

Maintain Integrity and Trust

Because Git is a distributed revision control system, it is
vital to obtain absolute assurance that data integrity is maintained and that the data is not somehow being altered. How
do you know the data hasn’t been altered in transition from one
developer to the next? Or from one repository to the next? Or, for
that matter, that the data in a Git repository is even what it
purports to be?

Git uses a common cryptographic hash function, the
Secure Hash Algorithm 1 (SHA-1), to name and identify objects within its database.
Though perhaps not absolute, in practice it has proven to be solid
enough to ensure integrity and trust for all Git’s distributed
repositories.
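
To make the naming scheme concrete, here is a minimal Python sketch of how a file's contents map to an object name. It assumes the default SHA-1 object format and Git's blob layout (a header of the form "blob <size>" followed by a NUL byte, then the raw content). The function name git_blob_id is made up for this illustration and is not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 name Git would give a blob holding `content`.

    Illustrative helper (hypothetical name), assuming the SHA-1 object format.
    """
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_id(b"hello world\n")
tampered = git_blob_id(b"hello world!\n")

print(original)              # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(original == tampered)  # False: a one-character change yields a different name
```

Because the name is derived from the content, any alteration in transit shows up as a mismatch between the expected name and the recomputed one.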

Enforce Accountability

One of the key aspects of a version control system is
knowing who changed files and, if at all possible, why. Git enforces
a change log on every commit that changes a file. The information
stored in that change log is left up to the developer, project requirements,
management, convention, and so on. Git ensures that changes will not
happen mysteriously to files under version control because there is
an accountability trail for
all changes.

Immutability

Git’s repository database contains data objects that are
immutable. That is, once they have been created
and placed in the database, they cannot be modified. They can be
recreated differently, of course, but the original data cannot be
altered without consequences. The design of the Git database means
that the entire history stored within the version control database
is also immutable. Using immutable objects has several advantages,
including quick comparison for equality.
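
The equality advantage can be sketched with a toy store. This is a simplified illustration, not Git's actual database: ObjectStore and its methods are hypothetical names, objects live in memory, and the identifier is a plain SHA-1 of the content.

```python
import hashlib


class ObjectStore:
    """Hypothetical in-memory store keyed by the SHA-1 of each object's content."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        object_id = hashlib.sha1(content).hexdigest()
        # Existing entries are never overwritten; storing the same content
        # twice leaves the store unchanged.
        self._objects.setdefault(object_id, content)
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]


store = ObjectStore()
first = store.put(b"version 1\n")
second = store.put(b"version 2\n")  # an "edit" creates a new object
again = store.put(b"version 1\n")   # identical content maps to the same id

print(first == again)    # True
print(first == second)   # False
print(store.get(first))  # b'version 1\n'
```

Since an object is never modified after it is stored, comparing two identifiers is enough to decide whether two objects hold the same content (barring hash collisions), with no byte-by-byte comparison.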

Atomic Transactions

With atomic transactions, a number of different but related
changes are performed either all together or not at all. This
property ensures that the version control database is not left in a partially
changed or corrupted state while an update or commit is happening.
Git implements atomic transactions by recording complete, discrete
repository states that cannot be broken down into individual or
smaller state changes.
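
The idea of recording complete, discrete states can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how Git is implemented: TinyRepo is a hypothetical class, a "state" is a dictionary of path to text, and a change is rejected if its content is not a string.

```python
from types import MappingProxyType


class TinyRepo:
    """Hypothetical toy repository: each commit is one complete snapshot."""

    def __init__(self):
        self._snapshots = []  # complete states, oldest first
        self.head = None      # index of the current snapshot, or None

    def current(self):
        """Return a copy of the current state (empty before the first commit)."""
        return {} if self.head is None else dict(self._snapshots[self.head])

    def commit(self, changes):
        # Build the whole new state off to the side first.
        new_state = self.current()
        for path, content in changes.items():
            if not isinstance(content, str):
                raise TypeError(f"{path}: content must be text")
            new_state[path] = content
        # Only after every change has succeeded does the new state become visible.
        self._snapshots.append(MappingProxyType(new_state))
        self.head = len(self._snapshots) - 1
        return self.head


repo = TinyRepo()
repo.commit({"a.txt": "one", "b.txt": "two"})
try:
    # The second change is invalid, so the whole commit must be rejected.
    repo.commit({"a.txt": "changed", "b.txt": None})
except TypeError:
    pass

print(repo.current())  # {'a.txt': 'one', 'b.txt': 'two'}
```

The failed commit leaves no trace: "a.txt" is not half-updated, because the current state only moves from one complete snapshot to the next.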

Support and Encourage Branched Development

Almost all VCSs can name different genealogies of
development within a single project. For instance, one sequence of
code changes could be called "development" while
another is referred to as "test". Each version control
system can also split a single line of development into multiple
lines and then unify, or merge, the disparate threads. As with most
VCSs, Git calls a line of development a
branch and assigns each branch a name.

Along with branching comes merging. Just as Linus wanted easy branching to foster
alternate lines of development, he also wanted to facilitate easy
merging of those branches. Because branch merging has often been a
painful and difficult operation in version control systems, it would
be essential to support clean, fast, easy merging.

Complete Repositories

So that individual developers needn’t query a centralized
repository server for historical revision information, it
was essential that each repository have a complete copy of all
historical revisions of every file.

A Clean Internal Design

Even though end users might not be concerned about a clean
internal design, it was important to Linus and ultimately to other
Git developers as well. Git’s object model has simple structures
that capture fundamental concepts for raw data, directory structure, recording
changes, and so forth. Coupling the object model with a globally
unique identifier technique allowed a very clean data model that
could be managed in a distributed development environment.

Be Free, as in Freedom

’Nuff said.

Given a clean slate to create a new VCS, many talented software
engineers collaborated and Git was born. Necessity was the mother of
invention again!

Precedents

The complete history of VCSs is beyond the scope of this book.
However, there are several landmark, innovative systems that set the stage
for or directly led to the development of Git. (This section is
selective, hoping to record when new features were introduced or became
popular within the free software community.)

The Source Code Control System (SCCS) was one of the original systems on Unix®[2] and was developed by M. J. Rochkind in the very early 1970s. ["The Source Code Control System," IEEE Transactions on Software Engineering 1(4) (1975): 364-370.] This is arguably the first
VCS available on any Unix system.

The central store that SCCS provided was called a
repository, and that fundamental concept remains pertinent to this
day. SCCS also provided a simple locking model to serialize development. If a developer needed files to
run and test a program, he or she would check them out unlocked. However,
in order to edit a file, he or she had to check it out with a lock (a
convention enforced through the Unix file system). When finished, he or
she would check the file back into the repository and unlock it.

The Revision Control System (RCS) was introduced by Walter F. Tichy in the early 1980s. ["RCS: A System for Version Control," Software Practice and Experience 15(7) (1985): 637-654.] RCS introduced both forward and reverse delta
concepts for the efficient storage of different file revisions.

The Concurrent Versions System (CVS), designed and originally implemented by Dick Grune in 1986 and then crafted anew some four years later by
Berliner and colleagues, extended and modified the RCS model with great
success. CVS became very popular and was the de facto standard within the
open source (http://www.opensource.org) community
for many years. CVS provided several advances over RCS, including distributed development and repository-wide
change sets for entire modules.

Furthermore, CVS introduced a new paradigm for the lock. Whereas
earlier systems required a developer to lock each file before changing it
and thus forced one developer to wait for another in serial fashion, CVS
gave each developer write permission in his or her private working copy.
Thus, changes by different developers could be merged automatically by CVS
unless two developers tried to change the same line. In that case, the
conflict was flagged and the developers were left to work out the
solution. The new rules for the lock allowed different developers to write
code concurrently.
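
The merge rule described above can be sketched as a line-by-line three-way comparison. This is a deliberately simplified illustration and not the algorithm CVS actually used: it assumes all three versions have the same number of lines (edits replace lines but never insert or delete them), and the function name merge_lines is hypothetical.

```python
def merge_lines(base, ours, theirs):
    """Merge two edited copies of `base`, line by line.

    Returns (merged, conflicts). `merged` holds None wherever both sides
    changed the same line differently; `conflicts` lists those line numbers.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles same-length versions")

    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:        # both sides agree (changed identically or untouched)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed the same line differently
            merged.append(None)
            conflicts.append(number)
    return merged, conflicts


base = ["int a = 1;", "int b = 2;", "int c = 3;"]
alice = ["int a = 10;", "int b = 2;", "int c = 3;"]
bob = ["int a = 1;", "int b = 2;", "int c = 30;"]
carol = ["int a = 99;", "int b = 2;", "int c = 3;"]

print(merge_lines(base, alice, bob))    # (['int a = 10;', 'int b = 2;', 'int c = 30;'], [])
print(merge_lines(base, alice, carol))  # ([None, 'int b = 2;', 'int c = 3;'], [1])
```

Alice and Bob edited different lines, so their work combines automatically. Alice and Carol both edited line 1, so that line is flagged and left for the developers to resolve.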

As often occurs, perceived shortcomings and faults in CVS eventually
led to a new VCS. Subversion (SVN), introduced in 2001, quickly became
popular within the free software community. Unlike CVS, SVN committed
changes atomically and had significantly better support for
branches.

BitKeeper and Mercurial were radical departures from all the aforementioned
solutions. Each eliminated the central repository; instead, the store was
distributed, providing each developer with his or her own shareable copy. Git is
derived from this peer-to-peer model.

Finally, Mercurial and Monotone contrived a hash fingerprint to uniquely identify a file’s content. The name assigned to
the file is a moniker and a convenient handle for the user and nothing
more. Git features this notion as well. Internally, the Git identifier is
based on the file’s contents, a concept known as a content-addressable
file store. The concept is not new. [See "The Venti Filesystem" (Plan 9), Bell Labs, http://www.usenix.org/events/fast02/quinlan/quinlan_html/index.html.]
Git immediately borrowed the idea from Monotone, according to
Linus.[3] Mercurial was implementing the concept simultaneously with Git.

Timeline

With the stage set, a bit of external impetus, and a dire VCS
crisis imminent, Git sprang to life in April 2005.

Git became self-hosted on April 7 with this commit:

    commit e83c5163316f89bfbde7d9ab23ca2e25604af29
    Author: Linus Torvalds <torvalds@ppc970.osdl.org>
    Date:   Thu Apr 7 15:13:13 2005 -0700

    Initial revision of "git", the information manager from hell

Shortly thereafter, the first Linux commit was made:

    commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
    Author: Linus Torvalds <torvalds@ppc970.osdl.org>
    Date:   Sat Apr 16 15:20:36 2005 -0700

    Linux-2.6.12-rc2

    Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

That one commit introduced the bulk of the entire Linux kernel into
a Git repository.[4] It consisted of

     17291 files changed, 6718755 insertions(+), 0 deletions(-)

Yes, that’s an introduction of 6.7 million lines of code!

It was just three minutes later when the first patch using Git was
applied to the kernel. Convinced that it was working, Linus announced it
on April 20, 2005, to the Linux Kernel Mailing List.

Knowing full well that he wanted to return to the task of developing
the kernel, Linus handed the maintenance of the Git source code to Junio
Hamano on July 25, 2005, announcing that Junio was "the obvious choice."

About two months after the April announcement, version 2.6.12 of the Linux
kernel was released using Git.

What’s in a Name?

Linus himself rationalizes the name "Git" by claiming "I’m an egotistical
bastard, and I name all my projects after myself. First Linux, now
git."[5] Granted, the name Linux for the kernel was
sort of a hybrid of Linus and Minix. The irony of using a British term for
a silly or worthless person was not missed, either.

Since then, others have suggested some alternative and perhaps more
palatable interpretations; "Global Information Tracker" seems to be the
most popular.


[1] Linux® is the registered trademark of Linus Torvalds in the
United States and other countries.

[2] UNIX is a registered trademark of The Open Group in the United
States and other countries.

[3] Private email.

[4] See http://kerneltrap.org/node/13996 for a
starting point on how the old BitKeeper logs were imported into a Git
repository for older history (pre-2.5).

[5] See http://www.infoworld.com/article/05/04/19/HNtorvaldswork_1.html.
