-->

archive old revisions of a git repository

Oct 27, 2020

Archive old revisions of a git repository

Sometimes one would like to continue development in a repository without carrying older historical revisions forward. To achieve this without actually rewriting the repository history, one can create a new detached timeline that can be reconnected to the old one later if necessary.

Example

Let’s imagine a repo like this:

mkdir myrepo
cd myrepo
git init .
echo 'this is new' > stuff.txt
echo 'My repo' > README.txt
git add -A
git commit -m "initial revision"

After some time we have a long commit history in the master branch, and we have decided that we no longer care about stuff.txt:

git rm -f stuff.txt
git commit -m "remove old stuff"

However, the history of the master branch still includes revisions that contain that file.

To restart the timeline with a clean history we can create a new orphan branch with the exact same contents as our current master:

head=$(git rev-parse master)
git checkout --orphan newmaster master
git commit -m "replaced with $head"
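As a sanity check, the following sketch rebuilds the example in a throwaway directory and confirms that the orphan branch holds exactly one commit with the same tree as master. The temporary directory, the placeholder identity, and the variable names are assumptions made for illustration only.

```shell
#!/bin/sh
set -e

# Throwaway repository (path and identity are placeholders for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named master
git config user.name "Example"
git config user.email "example@example.invalid"

# Recreate the two commits from the example.
echo 'this is new' > stuff.txt
echo 'My repo' > README.txt
git add -A
git commit -q -m "initial revision"
git rm -q -f stuff.txt
git commit -q -m "remove old stuff"

# Create the orphan branch exactly as in the steps above.
head=$(git rev-parse master)
git checkout -q --orphan newmaster master
git commit -q -m "replaced with $head"

# newmaster should have a single commit and the same tree as master.
commits=$(git rev-list --count newmaster)
master_tree=$(git rev-parse 'master^{tree}')
new_tree=$(git rev-parse 'newmaster^{tree}')
echo "commits on newmaster: $commits"
if [ "$master_tree" = "$new_tree" ]; then
    echo "trees match"
fi
```

Comparing tree hashes rather than running a diff keeps the check independent of the working directory state.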

The newmaster branch doesn’t contain any revisions with stuff.txt anymore! We can proceed to replace the master branch with the newmaster branch like this:

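git checkout -b oldmaster master  # save a local reference to the original history
git checkout -B master newmaster  # point master at the new single-commit history
git push --force                  # overwrite the remote branch (assumes master already tracks a remote)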
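Because oldmaster is only a local branch, the old history exists only in this clone once the remote master has been overwritten. One possible way to keep it reachable on the remote is to push it under a tag first. The sketch below assumes a remote named origin (simulated here with a bare repository in a temporary directory) and uses a hypothetical tag name, archive/oldmaster. It also swaps the plain --force for --force-with-lease, which is a different, more cautious option than the one used above.

```shell
#!/bin/sh
set -e

# Throwaway remote and working repository (all paths and names are placeholders).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/myrepo"
cd "$work/myrepo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
git remote add origin "$work/remote.git"

echo 'My repo' > README.txt
git add -A
git commit -q -m "initial revision"
git push -q -u origin master

# Orphan branch with the same contents, then swap the branches as above.
head=$(git rev-parse master)
git checkout -q --orphan newmaster master
git commit -q -m "replaced with $head"
git checkout -q -b oldmaster master
git checkout -q -B master newmaster

# Keep the old history reachable on the remote before overwriting master.
git tag archive/oldmaster oldmaster                  # hypothetical tag name
git push -q origin refs/tags/archive/oldmaster
# --force-with-lease refuses to overwrite remote commits we have not seen.
git push -q --force-with-lease origin master

remote_master=$(git ls-remote origin refs/heads/master | cut -f1)
remote_archive=$(git ls-remote origin refs/tags/archive/oldmaster | cut -f1)
echo "remote master:  $remote_master"
echo "remote archive: $remote_archive"
```

A tag is used here instead of a branch so that the archived history is less likely to be committed to by accident; pushing oldmaster as a branch would work just as well.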

At this point, the local and remote master branches don’t contain any of the commits that exist in oldmaster. This is great because we don’t have to carry forward references to old files like stuff.txt. However, it is not that great if we still want to view the full commit history, e.g. for README.txt:

git log master -- README.txt

The above command will output only the commits made after our clean break. Additionally, files that were removed before our clean break no longer appear in the history at all:

git log master -- stuff.txt  # returns nothing

However, if we still have a reference to the old master branch, we can use git replace to create a synthetic history like this:

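start=$(git rev-list --max-parents=0 master)  # root commit of the new history
end=$(git rev-parse oldmaster)                # tip of the old history
git replace "$start" "$end"                   # view the root commit as the old tip
git log master -- README.txt
git log master -- stuff.txt
git replace -d "$start"                       # remove the replacement again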

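While the replacement is in place, the output of git log looks as if we had just continued development in the original branch! Deleting the replacement with git replace -d restores the truncated view.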
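To make the effect of git replace visible, the sketch below counts the commits reachable from master before, during, and after the replacement, and also with --no-replace-objects, which tells git to ignore replacement refs. It rebuilds the two-commit example in a temporary directory; the directory, the placeholder identity, and the variable names are assumptions for illustration.

```shell
#!/bin/sh
set -e

# Throwaway repository (path and identity are placeholders for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

echo 'this is new' > stuff.txt
echo 'My repo' > README.txt
git add -A
git commit -q -m "initial revision"
git rm -q -f stuff.txt
git commit -q -m "remove old stuff"

# Clean break: master becomes a single orphan commit, oldmaster keeps the past.
head=$(git rev-parse master)
git checkout -q --orphan newmaster master
git commit -q -m "replaced with $head"
git checkout -q -b oldmaster master
git checkout -q -B master newmaster

before=$(git rev-list --count master)

# Graft the old history onto the new root via a replacement ref.
start=$(git rev-list --max-parents=0 master)
end=$(git rev-parse oldmaster)
git replace "$start" "$end"

during=$(git rev-list --count master)
raw=$(git --no-replace-objects rev-list --count master)
stuff=$(git log --oneline master -- stuff.txt | wc -l | tr -d ' ')

# Remove the replacement; the truncated view comes back.
git replace -d "$start" >/dev/null
after=$(git rev-list --count master)

echo "before=$before during=$during raw=$raw after=$after"
echo "commits touching stuff.txt while replaced: $stuff"
```

In this small example the count rises from 1 to 2 while the replacement is active, and both old commits show up in the log for stuff.txt. Note that the replaced root commit is displayed as the tip of oldmaster, so its own "replaced with" message is hidden for as long as the replacement exists.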