How To Set Up An Automated Home Directory Backup System Using Rsync Hard-Linked Snapshots

By Kevin Wells · 23 September 2025 

Estimated reading time: 7-10 minutes (25 minutes for a line-by-line script audit).

Quick Start
  1. Set your paths. Replace <user> and /mnt/backup with your own values wherever they appear in the block below, including both SRC lines.
  2. Install everything in one go. Paste the block into a shell; sudo runs it as root. It creates the snapshot script, the exclude lists, and a cron schedule, then performs a dry run.
sudo bash <<'ROOT'
set -euo pipefail

SRC="/home/<user>/"
DEST_ROOT="/mnt/backup/home-snapshots"
SNAP_DIR="$DEST_ROOT/snapshots"
LOG_DIR="$DEST_ROOT/logs"

mountpoint -q /mnt/backup || { echo "ERROR: /mnt/backup not mounted."; exit 1; }

mkdir -p "$SNAP_DIR" "$LOG_DIR"

# Canary to block wipe propagation
if [[ ! -f "${SRC}.BACKUP_CANARY" ]]; then
  touch "${SRC}.BACKUP_CANARY"
  # Make the canary immutable where the filesystem supports it
  chattr +i "${SRC}.BACKUP_CANARY" 2>/dev/null || true
fi

# Daily excludes - lean
cat >/etc/rsync-home.exclude <<'EOF'
/.cache/
/**/.cache/
/**/__pycache__/
/**/node_modules/
/**/.npm/
/**/.venv/
/**/.m2/repository/
# optional - usually excluded daily (rsync has no inline comments; keep them on their own line)
/Downloads/**
# large and mutable
*.iso
*.img
*.qcow2
*.vdi
*.vmdk
*.ova
*.ovf
# optional media and archives
#*.zip
#*.tar
#*.tgz
#*.gz
#*.7z
#*.rar
#*.mp4
#*.mkv
#*.mov
# explicit NFS subtree if present
/NFS/**
EOF

# Weekly excludes - start from daily, relax by commenting lines you want included weekly
cp /etc/rsync-home.exclude /etc/rsync-home.weekly.exclude

# Guarded snapshot runner
cat >/usr/local/sbin/home-snapshot-backup.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

SRC="/home/<user>/"
DEST_ROOT="/mnt/backup/home-snapshots"
SNAP_DIR="$DEST_ROOT/snapshots"
LATEST="$DEST_ROOT/latest"
TODAY="$(date +%F)"
DEST_SNAP="$SNAP_DIR/$TODAY"
LOG="$DEST_ROOT/logs/rsync-$TODAY.log"
DRY_LOG="$DEST_ROOT/logs/rsync-$TODAY.dryrun.log"

EXCLUDE_FILE="${EXCLUDE_FILE:-/etc/rsync-home.exclude}"
MIN_FILES="${MIN_FILES:-1000}"
MAX_DELETE="${MAX_DELETE:-5000}"
DELETE_RATIO="${DELETE_RATIO:-0.05}"

DRY_RUN_ONLY=0
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN_ONLY=1

# Report to stderr here: $LOG lives on the backup disk, which is not mounted.
mountpoint -q /mnt/backup || { echo "Backup disk not mounted." >&2; exit 1; }
[[ -f "${SRC}.BACKUP_CANARY" ]] || { echo "Canary missing. Aborting." | tee -a "$LOG"; exit 2; }

SRC_COUNT=$(find "${SRC%/}" -xdev -type f 2>/dev/null | wc -l | tr -d " ")
(( SRC_COUNT < MIN_FILES )) && { echo "Too few files in source ($SRC_COUNT < $MIN_FILES). Aborting." | tee -a "$LOG"; exit 3; }

mkdir -p "$DEST_SNAP"

RSYNC_OPTS=(-aHAX --numeric-ids -x --no-links
  --delete --delete-excluded --delete-delay
  --itemize-changes --human-readable --stats
  --partial-dir=.rsync-partial
)
[[ -f "$EXCLUDE_FILE" ]] && RSYNC_OPTS+=(--exclude-from="$EXCLUDE_FILE")

DRY_OPTS=(--dry-run --log-file="$DRY_LOG")
REAL_OPTS=(--log-file="$LOG" --max-delete="$MAX_DELETE")
[[ -L "$LATEST" ]] && DRY_OPTS+=(--link-dest="$LATEST") && REAL_OPTS+=(--link-dest="$LATEST")

: >"$DRY_LOG" || true
rsync "${RSYNC_OPTS[@]}" "${DRY_OPTS[@]}" "$SRC" "$DEST_SNAP/" >/dev/null || true

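# rsync prefixes each --log-file line with a timestamp and [PID], so the
# "*deleting" marker is usually not at the start of the line.
DEL_COUNT=$(grep -cE '(^|\] )\*deleting ' "$DRY_LOG" || true)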
TOTAL_ITEMS=$(grep -cE '^[^ ]' "$DRY_LOG" || true)
(( TOTAL_ITEMS < MIN_FILES )) && TOTAL_ITEMS=$SRC_COUNT

if (( TOTAL_ITEMS > 0 )); then
  if command -v bc >/dev/null 2>&1; then
    RATIO=$(echo "scale=6; $DEL_COUNT / $TOTAL_ITEMS" | bc)
    OK=$(echo "$RATIO <= $DELETE_RATIO" | bc)
    [[ "$OK" -eq 0 ]] && { echo "Deletion ratio $RATIO > $DELETE_RATIO. Aborting." | tee -a "$LOG"; exit 5; }
  else
    (( DEL_COUNT * 20 > TOTAL_ITEMS )) && { echo "Deletion ratio > 5% approx. Aborting." | tee -a "$LOG"; exit 5; }
  fi
fi

(( DRY_RUN_ONLY == 1 )) && { echo "Dry run complete. Would delete=$DEL_COUNT. Snapshot=$DEST_SNAP"; exit 0; }

nice -n 10 ionice -c2 -n7 rsync "${RSYNC_OPTS[@]}" "${REAL_OPTS[@]}" "$SRC" "$DEST_SNAP/"
ln -sfn "$DEST_SNAP" "$LATEST"

KEEP="${KEEP:-60}"
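# Order by name, not mtime: rsync -a copies the source directory's mtime onto
# each snapshot directory, while YYYY-MM-DD names always sort chronologically.
mapfile -t OLD < <(ls -1d "$SNAP_DIR"/* 2>/dev/null | sort -r | tail -n +$((KEEP+1)) || true)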
(( ${#OLD[@]} > 0 )) && rm -rf -- "${OLD[@]}"

echo "Backup OK: $DEST_SNAP | source files: $SRC_COUNT | deletions in dry-run: $DEL_COUNT" | tee -a "$LOG"
EOF
chmod 0755 /usr/local/sbin/home-snapshot-backup.sh

# Cron schedule
cat >/etc/cron.d/home-snapshots <<'EOF'
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# Daily lean snapshot 02:10
10 2 * * * root /usr/local/sbin/home-snapshot-backup.sh

# Weekly fuller snapshot 02:30 Sunday - relax excludes by editing /etc/rsync-home.weekly.exclude
30 2 * * 0 root EXCLUDE_FILE=/etc/rsync-home.weekly.exclude /usr/local/sbin/home-snapshot-backup.sh
EOF

# One-off dry run probe
/usr/local/sbin/home-snapshot-backup.sh --dry-run || true

echo "Installed. Replace <user> and paths as required."
ROOT
  3. Verify quickly. The service is named cron on Debian and Ubuntu; on Fedora and RHEL-family systems it is crond.
sudo systemctl status cron --no-pager
sudo ls -l /etc/cron.d/home-snapshots
ls -l /usr/local/sbin/home-snapshot-backup.sh
sudo /usr/local/sbin/home-snapshot-backup.sh --dry-run
sudo tail -n +1 /mnt/backup/home-snapshots/logs/rsync-$(date +%F).dryrun.log

Tip: keep heavy, mutable files out of the daily set. Include them weekly if needed.

Executive Summary

This guide describes an automated rsync snapshot system that protects a Linux home directory with daily hard-linked snapshots and a weekly fuller run. It blocks wipe propagation, skips symlinks rather than following them, stays on the source filesystem, and keeps nightly growth small.

Design Goals

  • Daily snapshots with near-zero space for unchanged files using --link-dest.
  • No traversal into mounts or symlinks under the source tree.
  • Preflight safety checks that abort on suspicious deletions or missing canary.
  • Clean restores for single files, subtrees, or the entire home directory.
  • Standard tooling only: rsync and cron.
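
The near-zero space claim rests on hard links: an unchanged file in two snapshots is one inode with two names. A minimal sketch for checking this on your own snapshots follows. The helper name same_inode is hypothetical, and it assumes GNU stat (coreutils) as found on most Linux systems.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when two paths refer to the same inode on
# the same device, i.e. they are hard links to a single copy of the data.
same_inode() {
  local a b
  a=$(stat -c '%d:%i' -- "$1") || return 2   # device:inode of first path
  b=$(stat -c '%d:%i' -- "$2") || return 2   # device:inode of second path
  [[ "$a" == "$b" ]]
}

# Example with placeholder dates and a placeholder file name:
# same_inode /mnt/backup/home-snapshots/snapshots/2025-09-22/notes.txt \
#            /mnt/backup/home-snapshots/snapshots/2025-09-23/notes.txt \
#   && echo "shared: no extra space used"
```

A file that changed between the two dates should fail this check, because rsync writes a fresh copy instead of linking.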

Threat Model

  • Accidental deletion or corruption in the home directory.
  • Mirror jobs deleting the backup when the source is empty.
  • Crossing into mounted filesystems or following symlinks to external volumes.
  • Destination not mounted, which would divert writes to the root filesystem.
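
To make the mass-deletion threat concrete, the ratio check can be written with integer arithmetic alone, so it does not need bc. This is a sketch under assumptions: the function name deletions_within_limit and the percentage parameter are hypothetical, and it treats an empty item count as a failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when planned deletions stay within
# max_pct percent of the total item count. Integer arithmetic only.
# Usage: deletions_within_limit DELETIONS TOTAL [MAX_PCT]
deletions_within_limit() {
  local del=$1 total=$2 max_pct=${3:-5}
  (( total > 0 )) || return 1            # assumption: an empty source never passes
  (( del * 100 <= total * max_pct ))     # del/total <= max_pct/100 without division
}

# Example: abort a run that would delete more than 5% of items.
# deletions_within_limit "$DEL_COUNT" "$TOTAL_ITEMS" 5 || { echo "Too many deletions." >&2; exit 5; }
```

Cross-multiplying avoids both floating point and division by zero.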

High-Level Architecture

Source:        /home/<user>/
Destination:   /mnt/backup/home-snapshots/
Structure:     /mnt/backup/home-snapshots/
                 ├── snapshots/YYYY-MM-DD/
                 ├── latest -> snapshots/YYYY-MM-DD
                 └── logs/rsync-YYYY-MM-DD[.dryrun].log
Scheduler:     /etc/cron.d/home-snapshots

NOTE: Never publish real usernames, hostnames, or mount labels. Use placeholders as shown.

Installation Notes

  • Use a dedicated mount path like /mnt/backup, not a symlink under your home.
  • Create a canary file under the source. If it is missing, the job aborts.
  • Daily and weekly exclude lists keep nightly deltas small while still offering coverage.
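
The mount and canary notes above amount to two cheap preflight tests. A minimal sketch follows. The function names are hypothetical, and mountpoint is assumed to be available from util-linux.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers.

# Succeed only when the given path is an active mount point.
backup_mounted() {
  mountpoint -q "$1"
}

# Succeed only when the source directory exists and still holds its canary.
canary_present() {
  local src=${1%/}                        # tolerate a trailing slash
  [[ -d "$src" && -f "$src/.BACKUP_CANARY" ]]
}

# Example wiring, using the placeholder paths from this post:
# backup_mounted /mnt/backup       || { echo "Backup disk not mounted." >&2; exit 1; }
# canary_present "/home/<user>/"   || { echo "Canary missing." >&2; exit 2; }
```

If the home directory is wiped or replaced by an empty mount, the canary disappears with it and the job stops before rsync runs.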

Daily vs Weekly Profiles

The daily job is lean. It excludes caches and large mutable files such as disk images, so snapshots do not grow by gigabytes each night. The weekly job can include more by relaxing its exclude list. This split keeps daily deltas predictable.
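
One way to relax the weekly list without hand-editing is to generate it from the daily list by dropping selected patterns. The sketch below assumes you want exact-line removal. The helper name relax_excludes is hypothetical, and the patterns in the example are only illustrations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the daily exclude list with the named
# patterns removed, so the weekly run includes those files.
# Usage: relax_excludes DAILY_FILE PATTERN...
relax_excludes() {
  local daily=$1; shift
  local args=() p
  for p in "$@"; do args+=(-e "$p"); done
  if (( ${#args[@]} == 0 )); then
    cat -- "$daily"                       # nothing to relax: copy as-is
  else
    # -v inverts, -x matches whole lines, -F treats patterns literally.
    # grep exits 1 when it prints nothing, which is not an error here.
    grep -vxF "${args[@]}" -- "$daily" || [[ $? -eq 1 ]]
  fi
}

# Example: include disk images and Downloads in the weekly run.
# relax_excludes /etc/rsync-home.exclude '*.iso' '*.qcow2' '/Downloads/**' \
#   > /etc/rsync-home.weekly.exclude
```

Because matching is literal and whole-line, a pattern is removed only if it appears in the daily file exactly as typed.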

Restore Workflow

  1. Identify a snapshot under snapshots/YYYY-MM-DD/ or use latest.
  2. Restore a directory:
    SNAP="/mnt/backup/home-snapshots/snapshots/2025-09-23"
    sudo rsync -aHAX --numeric-ids "$SNAP/Documents/" "/home/<user>/Documents/"
  3. For a full restore, rsync the snapshot root to /home/<user>/. Recreate external mounts and symlinks separately.
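
Before restoring a single file it helps to see which snapshots hold distinct versions of it. The following is a sketch, not part of the installed script. The helper name list_versions is hypothetical, and it assumes GNU stat, regular files, and snapshot directories named YYYY-MM-DD.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one line per distinct version of a file,
# naming the first snapshot in which each version appears.
# Usage: list_versions SNAPSHOT_DIR RELATIVE_PATH
list_versions() {
  local snap_dir=${1%/} rel=$2 snap inode last=""
  for snap in "$snap_dir"/*/; do          # glob order is name order
    [[ -e "$snap$rel" ]] || continue      # file absent from this snapshot
    inode=$(stat -c '%i' -- "$snap$rel")
    [[ "$inode" == "$last" ]] && continue # hard link to the previous version
    last=$inode
    printf '%s\t%s\n' "$(basename "$snap")" "$(stat -c '%y' -- "$snap$rel")"
  done
}

# Example with a placeholder file name:
# list_versions /mnt/backup/home-snapshots/snapshots Documents/report.odt
```

Each output line gives a snapshot name and the file's modification time, which is usually enough to pick the copy to restore.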

Operational Notes

  1. Logs are written to the backup disk. After each run, review logs/rsync-YYYY-MM-DD.log and logs/rsync-YYYY-MM-DD.dryrun.log.
  2. Retention keeps the most recent 60 snapshots by default. Set KEEP to adjust.
  3. Thresholds can be tuned per run:
    sudo MIN_FILES=800 MAX_DELETE=2000 DELETE_RATIO=0.03 /usr/local/sbin/home-snapshot-backup.sh --dry-run
  4. Security stance: the script runs as root from cron, its files are root-owned, and the backup drive should be access-controlled.
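
Retention can be previewed before anything is removed. A minimal sketch follows, assuming snapshot directories are named YYYY-MM-DD so that name order is chronological order. The helper name snapshots_to_prune is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the snapshot directories that fall outside
# the newest KEEP, ordering by directory name.
# Usage: snapshots_to_prune SNAPSHOT_DIR [KEEP]
snapshots_to_prune() {
  local snap_dir=${1%/} keep=${2:-60} d
  for d in "$snap_dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/; do
    [[ -d "$d" ]] || continue             # unmatched glob stays literal
    printf '%s\n' "${d%/}"
  done | sort -r | tail -n +"$((keep + 1))"   # newest first, skip the first KEEP
}

# Example: show what a KEEP of 30 would remove, without deleting anything.
# snapshots_to_prune /mnt/backup/home-snapshots/snapshots 30 |
#   while IFS= read -r dir; do echo "would remove: $dir"; done
```

Restricting the glob to date-shaped names means stray directories under snapshots/ are never proposed for deletion.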

Why Deploy Rsync Snapshots Instead of Using A Deduplicating Repository?

Rsync with hard links is transparent: each snapshot is an ordinary directory tree that can be browsed and restored with standard tools. Deduplicating tools like Borg or Restic can save more space for large files that change only partially, but they add repository formats and extra tooling. For home directory snapshots that exclude heavy churn, rsync snapshots strike a practical balance.

 

All identifiers in this post are placeholders. Replace <user> and mount paths with your environment. Do not publish real hostnames, IPs, usernames, UUIDs, or absolute internal paths.
© 2025 Kevin Wells. You are free to quote with attribution and a link back. If you deploy this stack at work, adapt it to your organisation’s security baseline and change-control process.