Back up Mac folders with versions using RockStor Snapshots

On a Mac, you can use Time Machine to maintain backups of your hard drive, with versions, so that you can jump back to any point in time at which Time Machine made a backup. But what if you want to do that for only a few folders, and with more control over the whole process? This post describes how to achieve this using rsync and the snapshot feature of RockStor.

Versions of backup folders using rsync

This post was inspired by Time Machine for every Unix out there, which uses the --link-dest option of rsync to hard-link unchanged files against the previous backup, so that each backup looks like a full copy of the source. With RockStor, we can achieve the same thing with snapshots, which are an efficient way of capturing the state of a folder at a specific point in time.
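For a sense of that approach, here is a minimal sketch of a --link-dest style backup (the paths and dates are placeholders, not the exact commands from that article):

# Copy the source into a new dated backup folder; files that have not changed
# are hard-linked against the previous backup instead of being copied again
rsync -avz --link-dest=/backups/2014-02-21 ~/Documents/ /backups/2014-02-22/

Each dated folder then looks like a full copy, while unchanged files share storage through hard links.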

A better way: rsync + Snapshots = Versioned Backup

Here’s how I back up a folder on my Mac to a shared folder on RockStor and maintain versions of it. (See the earlier blog post on making a RockStor Share accessible as a shared folder on your Mac.)

Every time I want to make a backup, I do the following:

  • Copy files using rsync

Using rsync, I copy the Mac folder to the shared folder. The first run can take a long time since it copies the entire folder contents over, but subsequent runs are much faster because only the files that changed since the previous run are transferred.

rsync -avz <source_folder> <shared_folder>

  • Take a snapshot of the share

A snapshot is a point-in-time representation of the contents of the Share. RockStor allows users to take snapshots of shares using either the web-ui or the command line interface (CLI). Since I like to automate my backup process, I use the CLI over ssh to issue a snapshot command for that share. (In my script I generate a snapshot name that contains the current date and time, so that I can identify the snapshot folders easily.) The command to create a snapshot named ‘mysnapshot’ for the share ‘myshare’ is as follows.

ssh macuser@192.168.56.102 "shares share myshare snapshot add -v mysnapshot"

The snapshot command is issued over ssh. The CLI is organized as a series of subsystems: you navigate to the subsystem you want and issue the appropriate command. Here’s what the different components of the command mean:

  • shares – this takes you to the ‘shares’ subsystem
  • share myshare – takes you to the subsystem for operations on the share ‘myshare’
  • snapshot add -v mysnapshot – adds a snapshot named ‘mysnapshot’ for the share. The -v flag makes the snapshot visible as a hidden folder in the Samba export.

This way I have a list of snapshots at multiple points in time, so if I accidentally lose anything in the current folder on my Mac, I can go back and retrieve files from an older snapshot.

(The full script is at https://gist.github.com/sujeetsr/8606389 )
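For reference, here is a minimal sketch of what such a script can look like (the folder paths below are placeholders; the share name and IP match the examples above, and the gist linked here is the actual script):

#!/bin/bash
# Name the snapshot after the current date and time so snapshot folders are easy to identify
SNAP_NAME="backup_$(date +%Y%m%d_%H%M%S)"

# Step 1: copy the Mac folder to the shared folder (only changes are sent after the first run)
rsync -avz ~/Documents/ /Volumes/myshare/Documents/

# Step 2: take a visible snapshot of the share over ssh using the RockStor CLI
ssh macuser@192.168.56.102 "shares share myshare snapshot add -v $SNAP_NAME"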

Retrieving files from snapshots

Let’s see an example of retrieving a file from a snapshot after it has been deleted from the share.

In my folder share2, I have a few image files, which you can see in the screenshot below. At this point, I have not created any snapshots.

[Screenshot: the contents of the share2 folder, before any snapshots]

I now take a snapshot of the share, and it appears as a hidden folder within the share. The snapshot contains all the files that were in the share at the time it was taken, as seen below.

[Screenshot: the share with its snapshot visible as a hidden folder, containing the same files]

Now I delete the file called ‘ocean_bean_1.jpg’. As you can see from the screenshot below, it is no longer present in the share itself, but it is still present in the snapshot. So if I need to retrieve it at any later point in time, I can always get it from that snapshot folder.

[Screenshot: the share after deleting ocean_bean_1.jpg; the file is still present in the snapshot folder]
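If you prefer the command line to the Finder, the restore works the same way. Here is a hedged example, assuming the share is mounted at /Volumes/share2 and the snapshot is exposed as a hidden folder named .mysnapshot (both are placeholders; list the hidden files in the share to see the actual snapshot folder name):

# Show the hidden snapshot folder(s) inside the mounted share
ls -a /Volumes/share2

# Copy the deleted file back from the snapshot into the share
cp /Volumes/share2/.mysnapshot/ocean_bean_1.jpg /Volumes/share2/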

Snapshots are thus an easy and efficient way to maintain versions of folders, and they let you retrieve files, or versions of files, that were present at some point in the past.


Data replication with Rockstor

A lot of users ask us if Rockstor can do replication, which is completely understandable. Who doesn’t need disaster recovery? Local redundancy is a must, and it is provided by the many software RAID levels available in Rockstor. But that only helps with some disaster scenarios, such as broken disk drives. What about replicating data from one Rockstor appliance to another, efficiently and automatically, on a schedule? Enter Rockstor’s replication framework.

The replication framework is a very useful feature and there are many aspects to it. Covering all of it in one post would be confusing, so I’ll focus on the basics here and cover topics such as recovering data and implementation details in future posts.

So what exactly is it?

Replication means different things in different storage products. In Rockstor, replication refers to a fast and efficient periodic backup of Shares to another Rockstor appliance. In tech speak, it’s called asynchronous replication.

Say you have one Rockstor appliance holding your data in multiple Shares. Furthermore, let’s say that one of those Shares is very important and you’d like it to be mirrored on another Rockstor appliance in a different building or a remote branch office. You can schedule a replication task in the web-ui so that your Share is backed up every few minutes, hours, or some other interval you choose. The local Share is then mirrored by a remote Share. In case of a local disaster, the data contained (and possibly destroyed) in your local Share can be recovered from its mirror copy. An important note, however, is that the remote Share does not contain any changes made locally after the last backup.

How does it work?

As many of our users know, the underlying filesystem in Rockstor is BTRFS. One of the many super cool features of BTRFS is send/receive. It is the foundation on which Rockstor’s replication framework is built.

Suppose a Share is scheduled to be replicated every hour. The first replication task runs right away, and the first thing it does is create a Snapshot of the Share, a point-in-time, read-only copy. Because of BTRFS’s copy-on-write capabilities, the Snapshot is created instantaneously and occupies no extra space. The task then copies the entire contents of the Snapshot to the remote Rockstor appliance, where the Share is reconstructed. So the first replication task can take a while, proportional to the amount of data in the Share and subject to constraints like network throughput. But the next task, an hour later, creates a new Snapshot and sends only the changes since the previous one. The receiving side reconstructs the mirror from these changes, and subsequent replication tasks follow the same lockstep pattern.
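Rockstor automates all of this behind the scenes, but the underlying BTRFS mechanics look roughly like the following sketch (the pool, share and snapshot paths, and the destination hostname, are placeholders, and Rockstor’s actual implementation differs in its details):

# Take a read-only snapshot of the share (instantaneous, thanks to copy-on-write)
btrfs subvolume snapshot -r /mnt2/mypool/myshare /mnt2/mypool/snap-1

# First replication: send the full snapshot to the remote appliance
btrfs send /mnt2/mypool/snap-1 | ssh root@destination "btrfs receive /mnt2/targetpool/"

# An hour later: take a new snapshot and send only the changes since snap-1
btrfs subvolume snapshot -r /mnt2/mypool/myshare /mnt2/mypool/snap-2
btrfs send -p /mnt2/mypool/snap-1 /mnt2/mypool/snap-2 | ssh root@destination "btrfs receive /mnt2/targetpool/"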

Cool! How do I set up replication?

It all sounds great, but is it easy to schedule a replication policy for a Share? Is it easy to recover data when needed? Is the framework resilient to network and other errors? The answer to all of these questions is YES! Rockstor’s replication framework addresses all of these concerns and provides a simple, reliable and resilient mechanism.

I’ll give a quick setup tutorial here, but for more information read the Rockstor documentation.

Step 1: Pair two Rockstor appliances

In order for replication to work, there must be two Rockstor appliances that are aware of each other. Let’s call them Source and Destination. We’ll set up a Share on Source to be replicated to Destination every 10 minutes.

Log in to the web-ui of Source and click on the hostname/IP right below the logo in the top left corner. This takes you to the Appliances screen, where you can add a new appliance by clicking the Add Appliance button.

[Screenshot: the Add Appliance form on the Appliances screen]

Once you click Submit in the Add Appliance form, Source connects to Destination using the admin credentials provided and, upon success, adds it. Repeat this process on Destination and add Source as an appliance. Replication will fail if the two appliances are not aware of each other.

Step 2: Turn on Replication service

This can be done from the System -> Services screen in the web-ui. If the service is not enabled, you’ll be warned while adding replication tasks, as described in the next step.

Step 3: Create a replication task

Replication tasks are like cron jobs: you can schedule one task for every Share you want replicated. To create one, click the Add Replication Task button on the Storage -> Replication -> Send screen. Select the Share you want replicated from the drop-down. The target appliance should be the IP of Destination in our case, and it should already be selected if there are no other appliances. Don’t change the default replication ports unless you are setting up replication over a WAN (I’ll cover this in a different post). For the Target pool name, provide the name of a pool on Destination that will house the backup.

[Screenshot: the Add Replication Task form]

Step 4: Watch the progress

Once the replication task is added, the first transfer from Source to Destination is triggered. This could take a while if the Share is large. On the Storage -> Replication -> Send screen, you’ll now see a table with details of the newly added replication task. Click on the link in the Last backup column for a detailed trail of the replication tasks executed so far. In the example below, there are four failed attempts followed by the first successful attempt. The failed attempts occurred because I had not yet added the Source appliance on the Destination appliance. Once I did that, everything started working as expected.

[Screenshot: the trail of replication tasks on the Send screen, showing four failed attempts followed by a successful one]

You can see similar information on the Destination appliance as replication tasks execute over time. To see the receiving side of things, go to the Storage -> Replication -> Receive screen. Make sure to do this on the Destination appliance and not the Source. You’ll see the Destination-side information for the replication task we added earlier. Click on the link in the Last Receive column to display the trail of replication tasks as seen from the receiving side.

[Screenshot: the trail of received replication tasks on the Receive screen]

We don’t see any information about the first four failed attempts because they couldn’t even be recorded: the Destination appliance was not aware of the sender and rejected it. Once we added the Source appliance, it started playing nice.

There’s more to write about, including setting up replication over a WAN, resiliency features and more. I plan to cover these in separate posts soon.
