Backup OpenStack object store or S3 with rclone

17-08-2017 | Remy van Elst


Introduction


This guide shows you how to make backups of an object storage service like OpenStack Swift or Amazon S3. Most object store services save data on multiple servers, but deleting a file also deletes it from all of those servers. Tools like rsync or scp are usually not compatible with these services, unless there is a proxy that translates the object store protocol to something like SFTP. rclone is an rsync-like command line tool that syncs files and directories to and from cloud storage services like OpenStack Swift, Amazon S3, Google Cloud Storage, Google Drive, Dropbox and more. With a local backup of the contents of your cloud object store you can recover from accidental deletion or easily migrate between cloud providers. Syncing between cloud providers is also possible. It can also help lower the RTO (recovery time objective), and backups are just always a good thing to have and to test.

In this guide we'll do the following:

  • Install rclone
  • Configure rclone backends
  • Copy a remote object store locally
  • Sync an object store to a different cloud provider
  • Problems with swift and rclone

Installation

rclone is written in the Go programming language, so installation is quite easy: it's a single binary. They only provide Snap packages, no regular .deb or .rpm packages. I would personally rather have a repository or upstream distribution packages, but snap works as well.

This guide uses an Ubuntu 16.04 server. By default snapd (the snap package manager) should be installed, but if that's not the case, install it:

apt-get install snapd

Use snap to install rclone:

snap install --classic rclone

The --classic argument is required because it disables the security confinement; without it, rclone won't be able to access some user files.

On the test machine I used, the snap binary was not in the $PATH; I had to log out and log back in. rclone's binary is in /snap/bin/rclone.

If you are on another distro or want to do manual installation, you can do so:

curl -O "https://downloads.rclone.org/rclone-current-linux-amd64.zip"
unzip "rclone-current-linux-amd64.zip"
cd rclone-*-linux-amd64

cp rclone /usr/bin/
chown root:root /usr/bin/rclone
chmod 755 /usr/bin/rclone

mkdir -p /usr/local/share/man/man1
cp rclone.1 /usr/local/share/man/man1/
mandb 

Updating this manual installation can be done by repeating the above steps.

Configure cloud storage

rclone has an interactive config menu. The default config file is $HOME/.rclone.conf; after the initial configuration you can edit that file directly or copy it to other computers.

The first cloud storage I'm going to set up is an OpenStack Swift object store. Execute the interactive config wizard:

rclone config

Create a new remote with n:

    2017/08/17 14:22:05 NOTICE: Config file "/root/.config/rclone/rclone.conf" not found - using defaults
    No remotes found - make a new one
    n) New remote  
    r) Rename remote
    c) Copy remote 
    s) Set configuration password
    q) Quit config 
    n/r/c/s/q> n  

Type swift to select the OpenStack Swift storage type:

Type of storage to configure.
Choose a number from below, or type in your own value
 1 / Amazon Drive
   \ "amazon cloud drive"
 2 / Amazon S3 (also Dreamhost, Ceph, Minio)
   \ "s3"
 3 / Backblaze B2
   \ "b2"
 4 / Dropbox   
   \ "dropbox" 
 5 / Encrypt/Decrypt a remote
   \ "crypt"   
 6 / Google Cloud Storage (this is not Google Drive)
   \ "google cloud storage"
 7 / Google Drive
   \ "drive"   
 8 / Hubic
   \ "hubic"   
 9 / Local Disk
   \ "local"   
10 / Microsoft OneDrive
   \ "onedrive"
11 / Openstack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
   \ "swift"   
12 / SSH/SFTP Connection
   \ "sftp"
13 / Yandex Disk
   \ "yandex"  
Storage> swift 

Enter your username and password (without any project/tenant ID's):

User name to log in.
user> example@example.org 
API key or password.
key> hunter2

Enter the authentication URL (keystone auth_url), in my example for CloudVPS it is: https://identity.stack.cloudvps.com/v2.0:

Authentication URL for server.
Choose a number from below, or type in your own value
 1 / Rackspace US
   \ "https://auth.api.rackspacecloud.com/v1.0"
 2 / Rackspace UK
   \ "https://lon.auth.api.rackspacecloud.com/v1.0"
 3 / Rackspace v2
   \ "https://identity.api.rackspacecloud.com/v2.0"
 4 / Memset Memstore UK
   \ "https://auth.storage.memset.com/v1.0"
 5 / Memset Memstore UK v2
   \ "https://auth.storage.memset.com/v2.0"
 6 / OVH
   \ "https://auth.cloud.ovh.net/v2.0"
auth> https://identity.stack.cloudvps.com/v2.0

Depending on the data you received from your cloud provider, some of the following options are required. In my case I enter the tenant_id in the tenant field instead of the tenant name; that way the config won't break whenever someone renames the tenant:

User domain - optional (v3 auth)
domain>
Tenant name - optional for v1 auth, required otherwise
tenant> 2a1[...]662
Tenant domain - optional (v3 auth)
tenant_domain> 
Region name - optional
region>
Storage URL - optional
storage_url>   
AuthVersion - optional - set to (1,2,3) if your auth URL has no version
auth_version>  

The configuration is summarized, press y to confirm:

Remote config
--------------------
[swift1]
user = example@example.org
key = hunter2
auth = https://identity.stack.cloudvps.com/v2.0
domain =
tenant = 2a1[...]662
tenant_domain =
region =
storage_url =  
auth_version = 
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y

We can quit the configuration:

Current remotes:

Name                 Type
====                 ====
swift1               swift

e) Edit existing remote
n) New remote  
d) Delete remote
r) Rename remote
c) Copy remote 
s) Set configuration password
q) Quit config 
e/n/d/r/c/s/q> q

You can find the location of the current configuration file by grepping the help (seriously?):

    rclone help | grep 'config'
    and configuration walkthroughs.
      config          Enter an interactive configuration session.
      listremotes     List all the remotes in the config file.
          --ask-password                      Allow prompt for password for encrypted configuration. (default true)
          --config string                     Config file. (default "/root/.config/rclone/rclone.conf")

In my case after doing the initial config wizard the location changed from /root/.rclone.conf to /root/.config/rclone/rclone.conf. No idea why.

Test the configuration by listing the containers in the object store:

rclone lsd swift1:

     4383157 0001-01-01 00:00:00       183 loadtest
   764180481 0001-01-01 00:00:00         6 olimex
  3219384585 0001-01-01 00:00:00     14458 pics
    22099726 0001-01-01 00:00:00      9145 smallpics

If you have no folders or objects, then create one:

rclone mkdir swift1:rclone_test

Then run the lsd command again to verify the new container shows up.

If you have problems with a Swift backend, see the last part of this guide. Most likely your credentials or other data, like the project ID or region, are wrong.

For this example I have also set up another swift backend at a different OpenStack provider (Fuga by Cyso). You can set up any cloud provider you like, or just use SFTP (via SSH) to some remote location. rclone abstracts this away for you.

One important point with rclone is that by default it does not follow symlinks. This is because the software also works on Windows, where there is no support for symlinks. If you do have symlinks, you must pass the -L / --copy-links command line option.
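For example, copying a local directory tree that contains symlinks could look like this (the local path and container name are illustrative):

```shell
# -L / --copy-links makes rclone follow symlinks and upload their targets;
# without it, symlinked files are skipped. Paths here are example values.
rclone copy -L /srv/www swift1:website
```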

Local backup of an Object Store

After you've set up the rclone remotes, we can configure a backup to the local machine. This can be a server somewhere or your workstation. To keep the example simple there is no automated cleanup in this guide, but you can easily set that up. The command syncs the backend to the local filesystem into a directory named after the current day, so if you schedule it in cron once a day you have a full backup every day.

rclone sync swift1:loadtest /root/backup/$(date +%Y%m%d)/

There is no output. Listing the directory does show a full backup locally:

ls -la /root/backup/20170817/20170412-1138/
total 1208
drwxr-xr-x 7 root root    4096 Aug 17 16:21 .
drwxr-xr-x 4 root root    4096 Aug 17 16:21 ..
drwxr-xr-x 2 root root    4096 Aug 17 16:21 csv_data
drwxr-xr-x 2 root root    4096 Aug 17 16:21 data

Remote via rclone:

rclone lsd swift1:loadtest/20170412-1138
           0 0001-01-01 00:00:00         0 csv_data
           0 0001-01-01 00:00:00         0 data

Using the swift command line:

$ swift list loadtest
20170412-1138/data/...
20170412-1138/csv_data/...

Running this as a cron script every day gives you a versioned backup of the object store at a different location. Note that rclone does not support incremental or differential backups (see the documentation).

Sync two object stores

Syncing two object stores with rclone is useful when you need the contents to always be online, even if one service provider has a large outage. If your application supports it, the best option is to let the application do dual uploads to multiple object stores. It could then also load from a different object store if one is down.

If dual upload is not available, you can use rclone to sync between object stores. rclone does have to transfer every file through the machine running it (downloading from one side and uploading to the other), so the machine you use to sync object stores needs plenty of bandwidth.

Using the above commands, you could also implement a backup of one object store to another. This example just syncs the stores, so that in case of a disruption you can change the configuration in your application and not have downtime or loss of data for a long period.

This example uses two swift object stores, since just changing the swift configuration is applicable in more cases. If you sync Amazon S3 to swift, your software needs both swift and S3 compatibility (or whichever two different protocols you use). Most swift object stores do offer S3 emulation, but compatibility differs between software versions, so test that beforehand.
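If you do end up talking to a provider's S3 emulation layer, an rclone config entry could look roughly like this; all values are placeholders, and the endpoint comes from your provider's documentation:

```
[s3emul]
type = s3
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY
endpoint = https://objects.example.org
```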

In this example I have set up another object store with Cyso (fuga.io) to sync to. The CloudVPS object store is named swift1 and Fuga is named swift2 in the rclone config. The data in the container loadtest goes from CloudVPS to Fuga. Files added, changed or removed at CloudVPS are added, changed and removed at Fuga as well; there is no versioning.

This is the rclone command:

rclone sync swift1:loadtest swift2:loadtest

This can be put in cron just fine.
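An illustrative crontab entry, running the sync every night at 02:30 and logging to a file (schedule, paths and log location are assumptions):

```shell
# m h dom mon dow  command
30 2 * * * /snap/bin/rclone sync swift1:loadtest swift2:loadtest >> /var/log/rclone-sync.log 2>&1
```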

Check and verify that the contents are on the other side:

rclone lsd swift1:loadtest/20170412-1138
           0 0001-01-01 00:00:00         0 csv_data
           0 0001-01-01 00:00:00         0 data
           0 0001-01-01 00:00:00         0 gnuplot_scripts
           0 0001-01-01 00:00:00         0 images
           0 0001-01-01 00:00:00         0 style
rclone sync swift1:loadtest swift2:loadtest
rclone lsd swift2:loadtest/20170412-1138
           0 0001-01-01 00:00:00         0 csv_data
           0 0001-01-01 00:00:00         0 data
           0 0001-01-01 00:00:00         0 gnuplot_scripts
           0 0001-01-01 00:00:00         0 images
           0 0001-01-01 00:00:00         0 style

To script the check if the sync is correct, for example for use with a monitoring system, you can use the rclone check command:

rclone check swift1:loadtest swift2:loadtest
2017/08/18 08:44:14 NOTICE: Swift container loadtest: 0 files not in Swift container loadtest
2017/08/18 08:44:14 NOTICE: Swift container loadtest: 0 files not in Swift container loadtest
2017/08/18 08:44:16 NOTICE: Swift container loadtest: 0 differences found
echo $?
0

If there are differences, the exit code will be 1 and the command outputs the differences. Perfect for monitoring:

rclone delete swift2:loadtest/20170412-1138/csv_data/
rclone check swift1:loadtest swift2:loadtest
2017/08/18 08:46:49 NOTICE: Swift container loadtest: 0 files not in Swift container loadtest
2017/08/18 08:46:49 NOTICE: Swift container loadtest: 42 files not in Swift container loadtest
2017/08/18 08:46:49 ERROR : 20170412-1138/csv_data/graphes-Transactions-mean.csv: File not in Swift container loadtest
[...]
2017/08/18 08:46:49 ERROR : 20170412-1138/csv_data/graphes-Users_Arrival-total.csv: File not in Swift container loadtest
2017/08/18 08:46:49 ERROR : 20170412-1138/csv_data/graphes-freemem-stddev.csv: File not in Swift container loadtest
2017/08/18 08:46:49 ERROR : 20170412-1138/csv_data/graphes-Users_Arrival-rate.csv: File not in Swift container loadtest
2017/08/18 08:46:51 NOTICE: Swift container loadtest: 42 differences found
2017/08/18 08:46:51 Failed to check: 42 differences found
echo $?
1
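Because rclone check exits non-zero on differences, a small wrapper is enough to hook it into monitoring or alerting. This sketch assumes a working mail command; the address and log path are placeholders:

```shell
#!/bin/sh
# Alert by e-mail when rclone check finds differences between the stores.
# admin@example.org and the log path are example values.
LOG=/tmp/rclone-check.log
if ! /snap/bin/rclone check swift1:loadtest swift2:loadtest >"$LOG" 2>&1; then
    mail -s "rclone check: object stores differ" admin@example.org <"$LOG"
fi
```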

By having a second live version of your data, you can meet a lower RTO (recovery time objective). If one service provider has a major outage, you don't have to wait hours or even days until it is fixed. You just restore the backup or change your configuration and are up and running again.

Do note that, as with every backup, it's important to test this regularly. Do a failover once in a while, or try a restore and see what works and what doesn't. Document it so that your team can do it as well; it saves you another call in the middle of the night.

Errors with Swift

I tried to set up a backend at another cloud provider (OpenStack over at Cyso, fuga.io). After setting up the configuration with the correct username, password, auth_url and such (the nova and swift CLIs worked fine), rclone kept giving a non-descriptive error:

rclone mkdir swift2:rclone_test
2017/08/17 15:51:18 Failed to create file system for "swift2:rclone_test": Bad Request

Setting the loglevel to DEBUG or specifying verbose mode did not help. The documentation states the following:

Due to an oddity of the underlying swift library, it gives a "Bad Request" error rather than a more sensible error when the authentication fails for Swift.

So this most likely means your username / password is wrong. You can investigate further with the `--dump-bodies` flag.

Using this --dump-bodies flag gave me more information:

rclone -vvvvvvvv --dump-bodies  mkdir swift2:rclone_test
2017/08/17 15:53:29 DEBUG : rclone: Version "v1.36" starting with parameters ["/snap/rclone/466/bin/rclone" "-vvvvvvvv" "--dump-bodies" "mkdir" "swift2:rclone_test"]
2017/08/17 15:53:29 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2017/08/17 15:53:29 DEBUG : HTTP REQUEST (req 0xc82012e620)
2017/08/17 15:53:29 DEBUG : POST /v2.0/tokens HTTP/1.1
Host: identity.api.fuga.io:5000
User-Agent: rclone/v1.36
Content-Length: 131
Content-Type: application/json
Accept-Encoding: gzip

{"auth":{"passwordCredentials":{"username":"example@example.org","password":"hunter2"},"tenantName":"$TENANT_ID"}}
2017/08/17 15:53:29 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2017/08/17 15:53:29 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2017/08/17 15:53:29 DEBUG : HTTP RESPONSE (req 0xc82012e620)
2017/08/17 15:53:29 DEBUG : HTTP/1.1 401 Unauthorized

The error here was my fault: Fuga doesn't accept the tenant_id as the tenant name. After specifying the correct tenant_name it did work.

For reference, here are the two configuration files. The first is for CloudVPS:

[swift1]
type = swift
user = example@example.org
key = hunter2
auth = https://identity.stack.cloudvps.com/v2.0
domain = 
tenant = 2a1[...]662
tenant_domain = 
region = 
storage_url = 
auth_version = 

Fuga:

[swift2]
type = swift
user = example2@example2.org
key = hunter2
auth = https://identity.api.fuga.io:5000/v2.0
domain = 
tenant = my_tenant_name
tenant_domain = 
region = 
storage_url = http://object.api.fuga.io/swift/v1
auth_version = 

Tags: amazon, backup, cloud, openstack, rclone, rsync, s3, swift,