
Generate hashes of files with rhash for archival storage

Published: 11-12-2015 | Author: Remy van Elst


Recently I had to archive a large number of files to archival storage. To save space and reduce the number of files, I decided to create archives with tar. The files will be stored on tapes and DVDs and will be restored in full, so random access times are not an issue, hence the choice of tar.gz.

I do want to make sure that the files are still correct when they need to be restored. I first dabbled with some long shell commands to create checksums and verify them, but then I found the rhash tool in the repositories. It lets you create checksums of files and folders, recursively, with all sorts of checksums, like CRC, MD5, SHA1 and many more. It also makes bulk validation very simple.

This small article shows you how to create an archive file with the checksums included and how to validate these checksums later on.


The data in question are archived tapes, disk copies, source code and documentation for the PDP8 minicomputer. We also have these for the PDP11 and a few VAX machines. The archives contain about 5 million files and are about 700 GB in size. The company decided to phase out the online storage and put this data on tapes and DVDs, since it is not accessed more than once or twice a month.

Creating the hashes

The first archive contains PDP8 files located in the folder pdp8. This command creates the MD5SUMS file, which we place in the same folder:

rhash --recursive --md5 --output=pdp8/MD5SUMS pdp8/

The archive is later on created with a simple tar -czf pdp8.tar.gz pdp8.
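
As a rough sketch of the whole creation step, the script below builds a tiny throwaway pdp8 folder, writes an MD5SUMS file into it, creates the archive and then checks that the archive is readable and contains the checksum file. It uses md5sum from coreutils as a stand-in for rhash, so that it also runs on machines without rhash installed. The sample files and the temporary directory are made up for illustration.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory; the sample files are hypothetical.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p pdp8/docs
printf 'PDP8 readme\n' > pdp8/readme.txt
printf 'PDP8 manual\n' > pdp8/docs/manual.txt

# Stand-in for: rhash --recursive --md5 --output=pdp8/MD5SUMS pdp8/
# The checksum file itself is excluded, since the redirect creates it
# before find starts walking the folder.
find pdp8 -type f ! -name MD5SUMS -print0 | sort -z | xargs -0 md5sum > pdp8/MD5SUMS

# Create the archive, then confirm that gzip can read it back and that
# the checksum file travelled along inside it.
tar -czf pdp8.tar.gz pdp8
gzip -t pdp8.tar.gz
tar -tzf pdp8.tar.gz | grep -x 'pdp8/MD5SUMS'
```

Because of set -e, the script stops at the first step that fails, so a finished run means the archive was written, passes the gzip integrity test and lists pdp8/MD5SUMS.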

Verifying the hashes

Extract the archive to a folder and use the following command to verify all files:

rhash --skip-ok --check pdp8/MD5SUMS 

If all files match the output looks like this:

--( Verifying pdp8/MD5SUMS )----------------------------------------------------
--------------------------------------------------------------------------------
Everything OK

If a file does not match the hash, the output will include it:

--( Verifying pdp8/MD5SUMS )----------------------------------------------------
pdp8/pdp8/readme.txt                                ERR
--------------------------------------------------------------------------------
Errors Occurred: Errors:1   Miss:0   Success:3323 Total:3324

If you leave out the --skip-ok option, all checked files will be shown, which might result in long output.
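
The restore check can also be scripted, so that a failed verification stops whatever runs next. This is a minimal sketch that uses md5sum --check --quiet from coreutils as a stand-in for rhash --skip-ok --check (like --skip-ok, --quiet only prints the files that fail). The sample archive, the restore folder and the simulated corruption are assumptions made for the demonstration.

```shell
#!/bin/sh
set -eu

# Build a small sample archive to restore from (hypothetical content).
workdir=$(mktemp -d)
cd "$workdir"
mkdir pdp8
printf 'PDP8 readme\n' > pdp8/readme.txt
printf 'PDP8 notes\n' > pdp8/notes.txt
md5sum pdp8/readme.txt pdp8/notes.txt > pdp8/MD5SUMS
tar -czf pdp8.tar.gz pdp8

# Restore into a separate folder. md5sum resolves the paths in MD5SUMS
# relative to the current directory, so change into that folder first.
mkdir restore
tar -xzf pdp8.tar.gz -C restore
cd restore

# The exit status of the check tells us whether every file matched.
if md5sum --check --quiet pdp8/MD5SUMS; then
    first_check="ok"
else
    first_check="failed"
fi
echo "after restore: $first_check"

# Simulate bit rot in one file and run the same check again.
printf 'corrupted\n' > pdp8/readme.txt
if md5sum --check --quiet pdp8/MD5SUMS 2>/dev/null; then
    second_check="ok"
else
    second_check="failed"
fi
echo "after corruption: $second_check"
```

Testing the exit status instead of reading the output makes the check usable from a cron job or a larger restore script.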

To manually verify one file, first get the checksum:

$ grep 'pdp8/readme.txt' pdp8/MD5SUMS
53a1aca1631d55de3feece9e1c4d900a  pdp8/pdp8/readme.txt

Then manually execute the correct checksum command to verify the match:

$ md5sum pdp8/pdp8/readme.txt
53a1aca1631d55de3feece9e1c4d900a  pdp8/pdp8/readme.txt
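
The same comparison can be done without reading the two hashes by eye. This is a small sketch, assuming the checksum file uses the two-column "hash, path" layout shown above and that the path contains no spaces. The sample file and the variable names are made up for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical sample data so the sketch can be run on its own.
workdir=$(mktemp -d)
cd "$workdir"
mkdir pdp8
printf 'PDP8 readme\n' > pdp8/readme.txt
md5sum pdp8/readme.txt > MD5SUMS

file="pdp8/readme.txt"

# Expected hash: first column of the line whose path column matches.
expected=$(awk -v f="$file" '$2 == f { print $1 }' MD5SUMS)

# Actual hash: first column of md5sum's output for the file on disk.
actual=$(md5sum "$file" | awk '{ print $1 }')

if [ "$expected" = "$actual" ]; then
    result="match"
else
    result="MISMATCH"
fi
echo "$file: $result"
```

Matching on the exact path column, not on a substring as grep does, avoids picking up other files whose names happen to contain the same text.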