git repos can grow large after long use; fortunately, a few simple commands can reduce their size;

garbage collection

the primary technique for shrinking a repo is garbage collection; the command we use here is:

git gc --aggressive --prune=now

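this repacks the repo more thoroughly than the default (--aggressive, at the cost of taking much longer) and deletes unreachable loose objects regardless of their age (--prune=now; by default only those older than 2 weeks are pruned); you can find more about this in git help gc;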
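
to see whether gc actually helped, you can compare the object counts before and after; the following is a minimal sketch, assuming git is installed; it works on a throwaway repo created with mktemp, and the identity, file name, and variable names are placeholders chosen for illustration:

```shell
# throwaway repo so that nothing real is touched (all names here are placeholders)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.email "demo@example.com"
git config user.name "demo"

# one ordinary commit: a commit, a tree, and a blob, all stored as loose objects
echo "hello" > file.txt
git add file.txt
git commit -q -m "first commit"

# one blob that nothing references, i.e. garbage
garbage=$(echo "nobody points at me" | git hash-object -w --stdin)

# "count" in the output of count-objects -v is the number of loose objects
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q --aggressive --prune=now
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after"
git count-objects -vH   # human-readable summary, including size-pack
```

in a real repo, only the two count-objects calls around git gc are needed; size-pack is usually the number to watch;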

prune reflog

if you have read the doc shown by git help gc, you will have seen that it says:

git gc tries very hard not to delete objects that are referenced anywhere in your repository. In particular, it will keep not only objects referenced by your current set of branches and tags, but also objects referenced by the index, remote-tracking branches, reflogs (which may reference commits in branches that were later amended or rewound), and anything else in the refs/* namespace. Note that a note (of the kind created by git notes) attached to an object does not contribute in keeping the object alive. If you are expecting some objects to be deleted and they aren’t, check all of those locations and decide whether it makes sense in your case to remove those references.

in short, we need to remove unnecessary references to let gc actually delete the objects; as it says, there are several kinds of references:

  • current set of branches and tags;

  • index;

  • remote-tracking branches;

  • reflogs;

  • anything else in the refs/* namespace;
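
the locations in the list above can be inspected with ordinary git commands; the following is a rough sketch, assuming git is installed; show_keepers is a made-up helper name (not a git command), and the throwaway repo it is run against exists only for the demonstration:

```shell
# show_keepers DIR: print the places in repo DIR that can keep objects alive
# (show_keepers is a hypothetical helper, not part of git)
show_keepers() {
    repo=${1:-.}
    echo "== refs/*: branches, tags, remote-tracking branches, anything else =="
    git -C "$repo" for-each-ref --format='%(refname)'
    echo "== index (staged entries) =="
    git -C "$repo" ls-files --stage
    echo "== reflog entries of all refs =="
    git -C "$repo" reflog show --all
    echo "== objects that are unreachable once reflogs are ignored =="
    git -C "$repo" fsck --unreachable --no-reflogs 2>/dev/null
}

# demonstration on a throwaway repo (path and identity are placeholders)
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config user.email "demo@example.com"
git -C "$demo" config user.name "demo"
echo "hello" > "$demo/file.txt"
git -C "$demo" add file.txt
git -C "$demo" commit -q -m "first attempt"
# amending replaces the commit; the old one is now referenced by reflogs only
git -C "$demo" commit -q --amend -m "second attempt"

report=$(show_keepers "$demo")
echo "$report"
```

objects listed in the last part of the report are the ones that gc could delete if the reflog entries pointing at them were gone;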

the one that is common but often overlooked is the reflog, which we can prune with:

git reflog expire --expire=90.days.ago --expire-unreachable=all --all

this command prunes reflog entries older than 90 days, plus every entry that is not reachable from the current tip of its branch, regardless of age; when using both commands, run this one before git gc so that git gc can actually delete the objects those entries kept alive:

git reflog expire --expire=90.days.ago --expire-unreachable=all --all
git gc --aggressive --prune=now
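
the effect of the ordering can be checked on a small experiment; the following is a minimal sketch, assuming git is installed; it amends a commit in a throwaway repo and then asks whether the replaced commit still exists, first after gc alone and then after reflog expire followed by gc; the repo path, identity, and variable names are placeholders:

```shell
# throwaway repo (all names here are placeholders)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.email "demo@example.com"
git config user.name "demo"

echo "hello" > file.txt
git add file.txt
git commit -q -m "first attempt"
old=$(git rev-parse HEAD)                  # the commit that is about to be replaced
git commit -q --amend -m "second attempt"  # old is now referenced by reflogs only

# gc alone: the reflog still references the old commit
git gc -q --aggressive --prune=now
if git cat-file -e "$old" 2>/dev/null; then after_gc_only=kept; else after_gc_only=deleted; fi

# expire the reflog first, then gc
git reflog expire --expire=90.days.ago --expire-unreachable=all --all
git gc -q --aggressive --prune=now
if git cat-file -e "$old" 2>/dev/null; then after_expire_and_gc=kept; else after_expire_and_gc=deleted; fi

echo "after gc only:               $after_gc_only"
echo "after reflog expire then gc: $after_expire_and_gc"
```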

delete all commits but one

warning: this can be destructive!

sometimes a git repo carries a lot of legacy history, and all we really want is its current state (i.e., the last commit); in that case, we can remove all the other commits, which can reduce the repo size quite a bit:

git checkout --orphan newbranch
git add -A
git commit
git branch -D master
git branch -m master
git push -f origin master

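this creates a new orphan branch holding a single commit of the current state, deletes the old master, and renames the new branch to master; every commit that was reachable only from the old branch becomes unreachable; this technique is very potent, so use it with care; the last line is needed only if the branch also lives on a remote, where a forced push is required because the new history does not descend from the old one; finally, the technique is not limited to the master branch, and the name newbranch is arbitrary;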
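
before trying this on a real repo, it may help to watch the steps on a throwaway one; the following is a minimal sketch, assuming git is installed; a freshly initialized repo may not name its first branch master, so the sketch reads the branch name instead of hard-coding it; it also passes -m to git commit so that no editor opens, and it skips the push because there is no remote; the repo path, identity, and commit messages are placeholders:

```shell
# throwaway repo with three commits (all names here are placeholders)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.email "demo@example.com"
git config user.name "demo"
for i in 1 2 3; do
    echo "revision $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done

old_branch=$(git symbolic-ref --short HEAD)   # master, main, or whatever git chose
old_count=$(git rev-list --count HEAD)        # number of commits before
old_tree=$(git rev-parse 'HEAD^{tree}')       # snapshot of the files before

# the same steps as above, minus the push
git checkout -q --orphan newbranch
git add -A
git commit -q -m "current state only"
git branch -q -D "$old_branch"
git branch -m "$old_branch"

new_count=$(git rev-list --count HEAD)
new_tree=$(git rev-parse 'HEAD^{tree}')

echo "commits before: $old_count, after: $new_count"
echo "tree before: $old_tree"
echo "tree after:  $new_tree"
```

if the two tree ids match, the files are identical and only the history has changed;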

once this is done, run the same two downsize commands shown earlier (git reflog expire, then git gc); because the old branch is gone, there may be many objects to delete, so the saving can be huge (at the expense of losing the commit history); for that reason, you may not want to do this in a public git repo;

references