Git-clean

08-11-2012 | Remy van Elst


The script below cleans up a git repository. It can list the biggest files in the repository and its history, or remove a file from the repository and rewrite the history so the file is gone from every commit.

Examples

View the list of big files in the git repo and its history (from the source code repo of this website). Despite the header line in the output, the sizes shown are in bytes, as reported by git verify-pack:

remy at sparcstation-20 in ~/repo/raymiiorg (master)
$ ~/bin/git-clean.sh df
All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file.
size     pack    SHA                                       location
2051757  655294  7e94729f780724d3b00803b018199f137959630e  includes/sec/dutch.txt
901683   901853  52e346e5ce9a2f4d9d531f93c082178684da4c72  images/smalldistro12.png
830998   827833  9afcd85c73a994ce1dc1e5daf1856113d6bf85f0  images/smalldistro10.png
799361   765804  a8e7d464fbb7e5159442b2644a4f60f6352bf238  content/downloads/packages/gource-0.38-2.x86_64.rpm
793898   763445  ea80fcf775a070e8d7c2b9375316f27cdac45bf6  content/downloads/packages/gource_0.38-1_amd64.deb
753400   746864  d9901f182cbd7090efe679847851027e1e05882c  content/downloads/packages/logstalgia-1.0.3-2.x86_64.rpm
747480   743285  e5aa2be4d510c14c1e67f2c07dbcf812a95a0798  content/downloads/packages/logstalgia_1.0.3-1_amd64.deb
635854   636023  6e5c39dff693b861f25ec96ba16c5fe9700ec9c9  images/smalldistro3.png
443121   439081  4461b7489dcb46911e14376e37e7abbe335668a7  images/smalldistro2.png
395247   393304  47a6ab134240e2433310cc40d8909bd11cf80935  images/smalldistro7.png
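
The report is built from the output of git verify-pack -v, which prints one line per object in the form "SHA type size size-in-packfile offset". As a minimal sketch of pulling out the columns shown above, the function below uses awk instead of cut, so the padding after the type column does not matter. The name pack_line_fields is hypothetical and not part of the script.

```shell
#!/bin/bash
# Hypothetical helper, not part of git-clean.sh.
# Reads `git verify-pack -v` lines on stdin and prints
# "size size-in-pack SHA" for every blob object.
pack_line_fields() {
    # Object lines look like: SHA type size size-in-packfile offset [depth base-SHA]
    # awk splits on runs of whitespace, so the padded type column is harmless.
    # Summary lines ("non delta: ...", "chain length = ...") have no "blob" in
    # the second field and are skipped.
    awk '$2 == "blob" { print $3, $4, $1 }'
}
```

Piping git verify-pack -v .git/objects/pack/pack-*.idx through pack_line_fields and then through sort -nr | head -n 10 should give the ten largest blobs, assuming the repository has at least one pack file.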

Remove the file images/smalldistro12.png from the repo and rewrite the history (note the backslash in the filename):

remy at sparcstation-20 in ~/repo/raymiiorg (master)
$ ~/bin/git-clean.sh rm images\/smalldistro12.png
Removing Biggest file images\/smalldistro12.png from git repo
Rewrite 2f65a614583053c57934b1b9ad378b77ac1ddbb9 (259/259)
WARNING: Ref 'refs/heads/master' is unchanged
WARNING: Ref 'refs/remotes/origin/master' is unchanged
WARNING: Ref 'refs/remotes/origin/master' is unchanged
WARNING: Ref 'refs/stash' is unchanged
Counting objects: 2159, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (785/785), done.
Writing objects: 100% (2159/2159), done.
Total 2159 (delta 1366), reused 2152 (delta 1363)
Counting objects: 2159, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2148/2148), done.
Writing objects: 100% (2159/2159), done.
Total 2159 (delta 1368), reused 791 (delta 0)
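
git filter-branch prints "WARNING: Ref ... is unchanged" for refs whose history it did not have to rewrite, so it is worth checking afterwards whether the file is really gone. The sketch below asks git log whether any commit on any ref still touches a path. The function name path_in_history is hypothetical and not part of the script.

```shell
#!/bin/bash
# Hypothetical helper, not part of git-clean.sh.
# Succeeds (exit 0) if at least one commit reachable from any ref
# still touches the given path, fails (exit 1) otherwise.
path_in_history() {
    local path="$1"
    # git log prints one line per commit that changed the path;
    # empty output means no commit in the history touches it.
    [ -n "$(git log --all --oneline -- "$path" 2>/dev/null)" ]
}
```

For example, path_in_history images/smalldistro12.png && echo "still in history" should print nothing once the removal has worked.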

Info

Installation

Copy the script into a text file named git-clean.sh, make it executable with chmod +x ./git-clean.sh and run it from the root of a git repository. If you run it anywhere else, it prints an error and exits.
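
As a small sketch of these steps, the function below copies a saved script to a bin directory and makes it executable. The function name install_git_clean and the default target ~/bin (taken from the examples above) are assumptions; any directory works.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original article.
# Usage: install_git_clean <path-to-saved-script> [target-directory]
install_git_clean() {
    local src="$1"                    # the file you pasted the script into
    local dest_dir="${2:-$HOME/bin}"  # defaults to ~/bin, as in the examples
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/git-clean.sh" || return 1
    chmod +x "$dest_dir/git-clean.sh"
}
```

After install_git_clean ./git-clean.sh, the script can be called as ~/bin/git-clean.sh df from the root of a repository.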

License

GPLv3.

The script

#!/bin/bash
# Script to list the biggest files in a git repository and to remove
# files from the git history.
# Execute this script in the git root directory.
# For a size report: ./git-clean.sh df
# To remove a big file:
# use like: ./git-clean.sh rm $bigfilename
# Example: ./git-clean.sh rm includes/ubuntu-12.04.iso
# It will then search the history of all branches and tags and remove the file from every commit.

# Copyright (C) 2012 Remy van Elst

#     This program is free software: you can redistribute it and/or modify
#     it under the terms of the GNU General Public License as published by
#     the Free Software Foundation, either version 3 of the License, or
#     (at your option) any later version.

#     This program is distributed in the hope that it will be useful,
#     but WITHOUT ANY WARRANTY; without even the implied warranty of
#     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#     GNU General Public License for more details.

#     You should have received a copy of the GNU General Public License
#     along with this program.  If not, see <http://www.gnu.org/licenses/>.

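if [ -n "$1" ]; then
    if [ -d .git ]; then
        if [ "$1" == "rm" ]; then
            if [ -n "$2" ]; then
                echo "Removing Biggest file $2 from git repo"
                DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
                # Drop backup refs from an earlier run, otherwise filter-branch refuses to start.
                rm -rf .git/refs/original/
                # Rewrite every commit on all refs, removing the file from the index.
                git filter-branch --index-filter "git rm --cached --ignore-unmatch $2" --tag-name-filter cat -- --all
                # Remove the new backup refs and expire the reflog so the old objects become unreachable.
                rm -rf .git/refs/original/
                git reflog expire --all --expire-unreachable=0
                # Repack and prune so the unreachable objects are deleted from disk.
                git repack -A -d
                git prune
                git gc --aggressive
            else
                echo "I need a file to remove..."
            fi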
        elif [ "$1" == "df" ]; then
            # Split the verify-pack output below on newlines only.
            IFS=$'\n';
            echo "All sizes are in bytes. The pack column is the size of the object, compressed, inside the pack file."
            objects=`git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | grep -v pack | sort -k3nr | head -n 10`
            output="size,pack,SHA,location"
            for y in $objects
            do
                # extract the size in bytes
                size=$((`echo $y | cut -f 5 -d ' '`))
                # extract the compressed size in bytes
                compressedSize=$((`echo $y | cut -f 6 -d ' '`))
                # extract the SHA
                sha=`echo $y | cut -f 1 -d ' '`
                # find the object's location in the repository tree
                other=`git rev-list --all --objects | grep $sha`
                #lineBreak=`echo -e "\n"`
                output="${output}\n${size},${compressedSize},${other}"
            done

            echo -e $output | column -t -s ', '
        fi
    else
        echo "Cannot find .git directory, exiting. This script should be run from the root of a git working directory."
        exit 1
    fi
else
    echo "Usage:"
    echo "    For a size report:"
    echo "    $0 df"
    echo "    "
    echo "    To remove a big file from git history:"
    echo "    $0 rm <filename>"
    echo "    Example, to remove the file \"includes/ubuntu-12.04.iso\":"
    echo "    $0 rm includes\/ubuntu-12.04.iso"
    echo "    Made by Raymii.org. GPLv3 License"

fi
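
The script only looks for a .git directory in the current directory, so it has to be started from the top of the working tree. As a sketch of one way around that, git rev-parse --show-toplevel prints the top-level directory from anywhere inside a working tree, and a small wrapper could change into it first. The function name cd_repo_root is hypothetical and not part of the script.

```shell
#!/bin/bash
# Hypothetical wrapper helper, not part of git-clean.sh.
# Changes into the top-level directory of the current working tree,
# or fails if the current directory is not inside one.
cd_repo_root() {
    local top
    # rev-parse exits non-zero outside a working tree; its error text is hidden.
    top="$(git rev-parse --show-toplevel 2>/dev/null)" || return 1
    [ -n "$top" ] && cd "$top"
}
```

With this in place, cd_repo_root && ~/bin/git-clean.sh df should work from any subdirectory of the repository.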

Tags: bash, clean, git,