1 Dec 2012 timj   » (Master)

Sayebackup.sh – deduplicating backups with rsync


Due to popular request, I’m putting up a polished version of the backup script that we’ve been using over the years at Lanedo to backup our systems remotely. This script uses a special feature of rsync(1) v2.6.4 for the creation of backups which share storage space with previous backups by hard-linking files.
The various options needed for rsync and ssh to minimize transfer bandwidth over the Internet, time-stamping for the backups and handling of several rsync oddities warranted encapsulation of the logic into a dedicated script.


The GitHub release tag is here: backups-0.0.1
Script URL for direct downloads: sayebackup.sh


This example shows creation of two consecutive backups and displays the sizes.

$ sayebackup.sh -i ~/.ssh/id_examplecom user@example.com:mydir # create backup as bak-.../mydir
$ sayebackup.sh -i ~/.ssh/id_examplecom user@example.com:mydir # create second bak-2012...-snap/
$ ls -l # show all the backups that have been created
drwxrwxr-x 3 user group 4096 Dez  1 03:16 bak-2012-12-01-03:16:50-snap
drwxrwxr-x 3 user group 4096 Dez  1 03:17 bak-2012-12-01-03:17:12-snap
lrwxrwxrwx 1 user group   28 Dez  1 03:17 bak-current -> bak-2012-12-01-03:17:12-snap
$ du -sh bak-* # the second backup is smaller due to hard links
4.1M    bak-2012-12-01-03:16:50-snap
128K    bak-2012-12-01-03:17:12-snap
4.0K    bak-current
Usage: sayebackup.sh [options] sources...
  --inc         make reverse incremental backup
  --dry         run and show rsync with --dry-run option
  --help        print usage summary
  -C <dir>      backup directory (default: '.')
  -E <exclfile> file with rsync exclude list
  -l <account>  ssh user name to use (see ssh(1) -l)
  -i <identity> ssh identity key file to use (see ssh(1) -i)
  -P <sshport>  ssh port to use on the remote system
  -L <linkdest> hardlink dest files from <linkdest>/
  -o <prefix>   output directory name (default: 'bak')
  -q, --quiet   suppress progress information
  -c            perform checksum based file content comparisons
  -x            disable crossing of filesystem boundaries
  --version     script and rsync versions
  This script creates full or reverse incremental backups using the
  rsync(1) command. Backup directory names contain the date and time
  of each backup run to allow sorting and selective pruning.
  At the end of each successful backup run, a symlink '*-current' is
  updated to always point at the latest backup. To reduce remote file
  transfers, the '-L' option can be used (possibly multiple times) to
  specify existing local file trees from which files will be
  hard-linked into the backup.
 Full Backups:
  Upon each invocation, a new backup directory is created that contains
  all files of the source system. Hard links are created to files of
  previous backups where possible, so extra storage space is only required
  for contents that changed between backups.
 Incremental Backups:
  In incremental mode, the most recent backup is always a full backup,
  while the previous full backup is degraded to a reverse incremental
  backup, which only contains differences between the current and the
  last backup.
 RSYNC_BINARY Environment variable used to override the rsync binary path.
See Also

Testbit Tools – Version 11.09 Release

flattr this!

Syndicated 2012-12-01 02:32:59 from Tim Janik

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!