DevOps Zone is brought to you in partnership with:

Ariya is a passionate engineer interested in bleeding-edge technologies. He has been involved in various large projects, from KDE to WebKit. These days, his focus is mostly on software craftsmanship around web technologies. His (little) spare time is spent running the projects PhantomJS (headless WebKit) and Esprima (JavaScript parser). Ariya is a DZone MVB and is not an employee of DZone and has posted 59 posts at DZone. You can read more from them at their website. View Full User Profile

Extracting Parts of Git Repository and Keeping the History

07.08.2014
| 6987 views |
  • submit to reddit

At some point, a software project will grow beyond its original scope. In many cases, some portions of the project become their own mini world. For maintenance purposes, it is often beneficial to separate them into their own projects. Furthermore, the commit history for the extracted project should not be lost. With Git, this can be achieved using git-subtree.

While git-subtree is quite powerful, the feature that we need for this task is its splitting capability. The documentation says the following regarding this split feature:

Extract a new, synthetic project history from the history of the prefix subtree. The new history includes only the commits (including merges) that affected prefix, and each of those commits now has the contents of prefix at the root of the project instead of in a subdirectory. Thus, the newly created history is suitable for export as a separate git repository.

This turns out to be quite simple. In fact, there is already a Stack Overflow answer which describes the necessary step-by-step instructions. The illustration below, also dealing with a real-world repo, hopefully serves as an additional example of this use case.

First of all, make sure you have a fresh version of Git:

git --version

If it says 1.8.3, then get a newer version since there is a bug (fixed in 1.8.4) which will pollute your commit logs badly, i.e. by adding “-n” everywhere.

For this example, let’s say we want to extract the funny automatic name generator (for a container) from the Docker project into its own Git repository. We start by cloning the main Docker repository:

git clone https://github.com/dotcloud/docker.git
cd docker

We then split the name generator, which lives under pkg/namesgenerator, and place it into a separate branch. Here the branch is called namesgen but feel free to name it anything you like.

git-subtree split --prefix=pkg/namesgenerator/ --branch=namesgen

The above process is going to take a while, depending on the size of the repository. When it is completed, we can verify it by inspecting the commit history:

git log namesgen

The next step is to prepare a place for the new repository (choose any directory you prefer). From there, all we need to do is to pull the namesgen branch which was split before:

cd ~
mkdir namesgen
cd namesgen
git init
git pull /path/to/docker/checkout namesgen

That’s it! Of course, normally you want to push this to some remote, e.g. a repository on GitHub or Bitbucket or your own Git endpoint:

git remote add origin git@github.com:joesixpack/namesgen.git
git push -u origin --all

The new repository will only contain the files from pkg/namesgenerator/ directory from Docker repository. And obviously, every commit that touch that directory still appears in the history.

Mission accomplished!



Published at DZone with permission of Ariya Hidayat, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)