July 3, 2008

HowTo: Automatically Back Up Windows Machines to One Centralized Data Storage

I recently needed to set up an automatic backup for a few Windows workstations and a few virtual machines running under VirtualBox. All data should be transferred to one standalone storage device -- an Icybox with RAID 0 disks.

Having read about rsync, my first idea was to set up rsync to do the backup. Rsync, however, requires either an rsync daemon or a Unix shell on both synced machines. Although it is possible to run rsync on Windows using Cygwin, there is no way to get rsync running on the Icybox.

Sidebar: What is rsync?

To clarify things a bit: rsync is a tool for remote synchronization of files. Feed it two arguments -- the source folder and the destination folder -- and it will intelligently sync their contents. It will not blindly copy one folder over another. Instead, it detects which files are outdated in the destination and copies only those, saving bandwidth and transfer time.
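To see that behaviour in isolation, here is a minimal sketch that syncs two throwaway local directories twice. The function name rsync_demo and the file names are made up for illustration, and it assumes rsync and mktemp are installed.

```shell
# Hypothetical demo: sync two temporary directories twice and show
# that the second run only transfers the file that changed.
rsync_demo() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src" "$work/dst"
    echo "one" > "$work/src/a.txt"
    echo "two" > "$work/src/b.txt"

    # First run: both files are new, so both are copied.
    rsync --recursive --times "$work/src/" "$work/dst/"

    # Change only b.txt (the size differs, so rsync notices it).
    echo "changed" > "$work/src/b.txt"

    # Second run: --itemize-changes lists what is actually transferred.
    rsync --recursive --times --itemize-changes "$work/src/" "$work/dst/"

    rm -rf "$work"
}
```

Calling rsync_demo should list b.txt in the second run and leave a.txt alone, because its size and modification time already match in the destination.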

Working around the problem

I solved the problem with the Icybox and the Windows machines by using a third, "man in the middle" server, which actually runs rsync and performs the backup between the mounted network shares.

The scheme is simple. The Icybox is one big smb share, so I mount it on the "middle" server (/media/shares/users-backup). Each user shares all folders that need to be backed up, and these are mounted too. I mount the user shares under one parent directory (/media/shares/users/username), which allows me to run rsync recursively on /media/shares/users, so the whole mounting machinery is transparent to rsync.
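The directory layout for the mount points can be sketched as a small helper. The function name make_mount_points and the second user name in the example are hypothetical; only the paths under /media/shares come from the setup described above.

```shell
# Create one mount point per user under a common parent, plus one
# for the backup share. Base directory and user names are arguments.
make_mount_points() {
    base=$1
    shift
    mkdir -p "$base/users-backup" || return 1
    for user in "$@"; do
        mkdir -p "$base/users/$user" || return 1
    done
}

# Example (needs root for /media; "anna" is a made-up user):
#   make_mount_points /media/shares tomas anna
```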

Running the command rsync --recursive /media/shares/users/ /media/shares/users-backup/ will back up all mounted shares to the remote disk. Note that rsync is blissfully unaware of the fact that it is syncing two remote directories -- it thinks it is doing just a local copy. It also doesn't care which shares are mounted -- whatever is mounted at the time rsync runs is backed up.
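Since only the shares that happen to be mounted get backed up, it can be handy to record which ones those were. The helper below is a rough sketch, not part of the original setup: the name list_mounted_shares is made up, and it assumes the mountpoint utility from util-linux is available on the middle server.

```shell
# Print the directories directly under $1 that are real mount points.
# Relies on the mountpoint utility from util-linux (an assumption
# about the middle server; the article itself does not use it).
list_mounted_shares() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue
        if mountpoint -q "$dir"; then
            echo "${dir%/}"
        fi
    done
}
```

Something like list_mounted_shares /media/shares/users >> /var/log/rsync-users.log could then be run just before rsync.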

The rsync command is added to cron and runs each day at 1:00 am. All that is required from the users is to share their backup folders. The shares may even be protected with passwords (on both sides).

Show me teh code

To cook this delicious meal, we would need:

  • rsync
  • smbfs and/or cifs
  • cron

Step 1: Mount

The mounting needs to be done in /etc/fstab, so that the shares can be mounted at system startup and by running the mount -a command.

Open the /etc/fstab file and add one line for each of your shares:

//tomas/projects /media/shares/users/tomas/ smbfs ro,user,guest,nounix 0 0

If your shares are standard Windows shares or Samba shares, use the smbfs filesystem type; cifs is the alternative for the same kind of share. Smbfs is unmaintained and has been replaced by cifs, but unfortunately cifs cannot resolve NetBIOS hostnames. If your hosts use DHCP addresses, using hostnames instead of IPs is always a good idea.

It might happen that you will not be able to mount your share with smbfs. If this is the case, double check that your share name does not end with a slash. Using //tomas/projects is ok, but using //tomas/projects/ will get you a nice "is not directory" error. If your share name is ok but you are still unable to mount it, try the cifs option.
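To avoid the trailing-slash trap when adding many shares, the fstab lines can be generated instead of typed. This is only a sketch: fstab_line is a hypothetical helper, and it simply reuses the mount options from the example line above, which may need adjusting for your shares.

```shell
# Build an fstab entry for a share, dropping a trailing slash from
# the share name. The filesystem type defaults to smbfs; pass cifs
# as the third argument to switch.
fstab_line() {
    share=${1%/}
    mount_point=$2
    fs_type=${3:-smbfs}
    printf '%s %s %s ro,user,guest,nounix 0 0\n' \
        "$share" "$mount_point" "$fs_type"
}

# Example:
#   fstab_line //tomas/projects/ /media/shares/users/tomas/
```

The output can be reviewed and then appended to /etc/fstab by hand.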

Step 2: Write the rsync script

This is nothing complicated: just run rsync and log the times. First we remount the shares, so that if someone has turned a computer on or off since the last mount, we get the current state.

#!/bin/bash
# Log the start time
echo -e "\n===Rsync Start: $(date) ===" >> /var/log/rsync-users.log

# Remount the shares from /etc/fstab to pick up the current state
mount -a -o remount >> /var/log/rsync-users.log

# Sync the user shares to the backup share and append the output to the log
rsync --verbose --stats --recursive --checksum --update --times /media/shares/users/ /media/shares/users-backup/ | tee -a /var/log/rsync-users.log

# Log the stop time
echo "===Rsync Stop: $(date)===" >> /var/log/rsync-users.log

To ignore some files and directories, I use the --exclude-from option and pass it a file with one exclusion pattern per line. The following sample is used in the script that backs up our VMware servers.

.*
*.vmem
*.WRITELOCK
*.log
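A quick way to check what such a pattern file does is to try it on a scratch directory first. The sketch below does that with the same four patterns; exclude_demo and the file names in it are invented for the test, and it assumes rsync and mktemp are installed.

```shell
# Hypothetical demo: copy a temporary directory while skipping the
# patterns listed in an exclude file.
exclude_demo() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src" "$work/dst"
    touch "$work/src/disk.vmdk" "$work/src/memory.vmem" \
          "$work/src/vmware.log" "$work/src/.hidden"

    # One pattern per line, the same list as in the sample.
    printf '%s\n' '.*' '*.vmem' '*.WRITELOCK' '*.log' > "$work/exclude.txt"

    rsync --recursive --times --exclude-from="$work/exclude.txt" \
        "$work/src/" "$work/dst/"

    # Show what arrived in the destination, hidden files included.
    ls -A "$work/dst"
    rm -rf "$work"
}
```

Of the four test files, only disk.vmdk should end up in the destination.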

Step 3: Add rsync to cron

Edit /etc/crontab and add the following line:

0 1 * * *      backup    /home/backup/rsync-users

The first two fields are the minute and the hour, so this runs at 1:00 am. The sixth field, right after the five time fields, specifies the user the script runs as.
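The field order in /etc/crontab is easy to mix up (minute comes before hour, and the user sits between the time fields and the command), so here is a small sketch that spells a line out. The helper describe_crontab_line is made up for illustration and only handles plain numeric minute and hour values.

```shell
# Split an /etc/crontab line into its fields and print a summary.
# Field order: minute hour day-of-month month day-of-week user command.
describe_crontab_line() {
    printf '%s\n' "$1" | {
        read -r minute hour day month weekday user command
        printf 'runs at %02d:%02d as user %s: %s\n' \
            "$hour" "$minute" "$user" "$command"
    }
}
```

For example, describe_crontab_line '0 1 * * * backup /home/backup/rsync-users' prints "runs at 01:00 as user backup: /home/backup/rsync-users".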

That's it. The machinery starts at 1 am and all computers that are turned on are backed up. The backup is incremental, so it usually takes only a few minutes to resync.