Research Data Backup
Revision as of 12:18, March 12, 2013
These instructions are for members of Memorial University.
Memorial University provides a service to all researchers on campus called Research Data Backup. The implementation of the system is managed and maintained by and through the Labnet system.
Basic Outline
Anything you put in your home directory on Labnet is automatically backed up daily as part of the Research Data Backup service. You can access up to approximately 6 months of incremental versions of your documents/data through the Labnet 'webtools' interface. The number of incremental versions available is not guaranteed, as it depends on overall system usage. If you have a Labnet account, you can back up data on this service right now, using rsync or scp directly from the placentia head node to either garfield.cs.mun.ca or ganymede.pcglabs.mun.ca.
This is all you need to know if you only need up to 50G of data backed up. If you need more than 50G of space, then a special partition will have to be created for you, and this partition will not be backed up incrementally (i.e. you will have one copy of your data).
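To see which side of the 50G limit your data falls on, you can measure it before copying anything. The following is a minimal sketch, not part of the Labnet tooling: the function names data_size_kib and fits_standard_backup are made up for illustration, and it assumes "50G" means 50 GiB.

```shell
# Print the disk usage of a directory in KiB (du -sk reports 1024-byte units).
data_size_kib() {
    du -sk "$1" | awk '{print $1}'
}

# Succeed if the directory is no larger than the limit (default: 50 GiB,
# an assumed reading of the "50G" threshold described above).
# Usage: fits_standard_backup <directory> [limit_in_GiB]
fits_standard_backup() {
    dir=$1
    limit_gib=${2:-50}
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 2; }
    size_kib=$(data_size_kib "$dir")
    [ "$size_kib" -le $((limit_gib * 1024 * 1024)) ]
}
```

For example, fits_standard_backup ./your_local_data && echo "fits in the standard backup" prints the message only when the directory is within the limit.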
How To Get Access
If you already have a Labnet account, you are ready to go. If you do not have a Labnet account, go to the Labnet 'webtools' page to create one and click on 'Labnet Account Generation' in the left menu. You will need your MUN Login credentials; your Labnet user name and password will be the same as your MUN Login ones. If you do not have MUN Login credentials, go to this web page to get them activated first. Then go to the Labnet 'webtools' page.
NOTE: If you use the so-called IMAP Email service for your MUN email (i.e. you connect to mail.mun.ca with your email client to get your MUN email) then you have already activated these credentials. If you do not know your password, you can change it on the MUNLogin page, but this will also change your password for your email client.
Once you have activated your Labnet account, test it by logging on to garfield.cs.mun.ca or ganymede.pcglabs.mun.ca via ssh from the placentia head node.
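The login test can be run against both hosts in one go. This is a minimal sketch to run on the placentia head node; the function name check_labnet_login is hypothetical, and the 10-second ConnectTimeout is an arbitrary choice. You will be prompted for your Labnet password for each host.

```shell
# Try to log in to each Labnet host and run the no-op command "true".
# Usage: check_labnet_login <labnet_userid>
check_labnet_login() {
    user=$1
    for host in garfield.cs.mun.ca ganymede.pcglabs.mun.ca; do
        if ssh -o ConnectTimeout=10 "$user@$host" true; then
            echo "$host: login OK"
        else
            echo "$host: login FAILED"
        fi
    done
}
```

A FAILED line for both hosts most likely means the Labnet account is not yet active or the password is wrong.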
If you have a small amount of data, say less than 2 or 3 G, then you can simply rsync or scp your data to garfield.cs.mun.ca. However, if you have a significant amount of data, the Labnet people would prefer that you copy it directly to the server that is attached to the storage: carme.pcglabs.mun.ca. To do this, you will need to have access to that machine activated. Please contact support and request MUN RDB Access and this will be arranged. [This process is under revision. Most likely we will point most people to garfield or ganymede, and we won't be giving users direct access to carme. Mike will clarify this later.]
If you have more than 50G of data, please contact support and request MUN RDB Large DATA Access with details of how much space you need. Availability depends on University resources. However, the University has made a commitment to this service, so if your full request cannot be accommodated immediately, it should be possible in the future.
Examples
Create a directory in your Labnet home directory called, e.g., placentia_backup. Log on to garfield.cs.mun.ca or ganymede.pcglabs.mun.ca to do this. (Of course you can call it whatever you like, but you'll need to make the corresponding change wherever placentia_backup appears below.)
Let us further assume that you have a single directory on placentia you wish to back up, called your_local_data.
Manual backup
To back up your_local_data manually, run the following command on the placentia head node (assuming you are in the directory immediately above your_local_data):
rsync -e ssh -aSx ./your_local_data <labnet_userid>@garfield.cs.mun.ca:placentia_backup/
This will create a sub-directory on the Labnet system called placentia_backup/your_local_data. You will be prompted for your password when you do this. If you use -aRSx instead of -aSx the full directory structure of your source directory will be preserved, relative to the destination directory, e.g. on Labnet you might see placentia_backup/home/<user>/your_local_data.
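Before a first real transfer, it can help to preview where rsync would put things. The sketch below wraps the same command with rsync's --dry-run and --itemize-changes options, so nothing is copied and each file that would be transferred is listed. The function name preview_backup is made up for illustration. Note that the source is given without a trailing slash; with a trailing slash rsync copies the directory's contents rather than the directory itself.

```shell
# List what the manual backup command would transfer, without copying anything.
# Usage: preview_backup <source_directory> <labnet_userid>
preview_backup() {
    src=$1
    user=$2
    rsync -e ssh -aSx --dry-run --itemize-changes \
        "$src" "$user@garfield.cs.mun.ca:placentia_backup/"
}
```

A dry run still connects to garfield.cs.mun.ca, so you will be prompted for your password as usual.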
You can do this manually as often as you like. The data will be backed up daily in an incremental manner on the Labnet system. So you do not need to worry about keeping track of versions of the data yourself, unless you expect to need to have versions going back longer than about 6 months.
Read the rsync man page for more information about how it works, and how to use it, e.g., for more complex cases (multiple source directories, etc.). It's possible to get fairly sophisticated, with exclude patterns, multiple sources, incremental, etc. If you go that route it is probably a good idea to script it, rather than try to do it all on the command line. For some examples of rsync scripting, see here. There are a lot of resources for help with understanding and scripting rsync on the Internet.
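As a starting point for scripting, here is a minimal sketch of a function that backs up several source directories in one rsync call and skips some files. The function name backup_to_labnet and the two exclude patterns (*.tmp and core.*) are purely illustrative assumptions; substitute whatever suits your data.

```shell
# Back up one or more directories to placentia_backup/ on Labnet,
# skipping files that match the (example) exclude patterns.
# Usage: backup_to_labnet <labnet_userid> <source_dir> [<source_dir> ...]
backup_to_labnet() {
    if [ "$#" -lt 2 ]; then
        echo "usage: backup_to_labnet <labnet_userid> <source_dir>..." >&2
        return 1
    fi
    user=$1
    shift   # the remaining arguments are the source directories
    rsync -e ssh -aSx \
        --exclude='*.tmp' \
        --exclude='core.*' \
        "$@" \
        "$user@garfield.cs.mun.ca:placentia_backup/"
}
```

Saved in a file and sourced (or placed at the top of a script), it would be called as, e.g., backup_to_labnet <labnet_userid> ./your_local_data ./another_directory.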
Automatic backup
The above process can be made to happen automatically on a recurring schedule, but in order to do so first you must set up passwordless ssh to Labnet. If the backup happens automatically in the middle of the night, you won't be there to supply your Labnet password!
1. Append the contents of ~/.ssh/id_rsa.pub from placentia to the file ~/.ssh/authorized_keys in your Labnet home directory (create the file if it does not exist, and take care not to overwrite any keys already in it). If you've done this correctly, the above rsync command will now work without prompting you for a password.
2. Set up a cron job on placentia to do this for you automatically. Running...
crontab -e
...will open an editor (probably vi). Add a line like this:
0 22 * * * /usr/bin/rsync -e ssh -aSx /home/<acenet_username>/your_local_data <labnet_username>@garfield.cs.mun.ca:placentia_backup/
This will run the command for you automatically at 10pm each night. It is basically the same command you would run manually, but with full path names specified. It's a good idea to use full path specifications for everything in a cron table. See man 5 crontab for information on the meaning of the first part of this line, and how to set up, e.g., weekly rather than daily backups.
3 (optional). If you don't like the idea of using your 'internal' placentia ssh key, then instead of step 1 above, create an alternative key (in your home directory on placentia):
mkdir .sshremote
chmod 700 .sshremote
ssh-keygen -t rsa -f .sshremote/placentia   # Hit enter when asked for a pass phrase
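Append the contents of .sshremote/placentia.pub (rather than ~/.ssh/id_rsa.pub) to ~/.ssh/authorized_keys in your Labnet home directory, then replace -e ssh above with -e "ssh -i .sshremote/placentia". In the cron entry, give the full path to the key, e.g. /home/<acenet_username>/.sshremote/placentia.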
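The key setup can also be done from the command line on placentia. The following is a minimal sketch, with hypothetical function names: install_backup_key appends a public key to authorized_keys on Labnet (you will be asked for your Labnet password once), and check_passwordless confirms that login now works without a prompt. It relies on ssh's BatchMode option, which makes ssh fail instead of asking for a password.

```shell
# Append a public key file to ~/.ssh/authorized_keys on the remote host.
# Usage: install_backup_key <public_key_file> <labnet_userid>@<host>
install_backup_key() {
    pubkey=$1
    target=$2
    [ -r "$pubkey" ] || { echo "cannot read $pubkey" >&2; return 1; }
    # The quoted command runs on the remote side; the key arrives on stdin.
    ssh "$target" 'mkdir -p ~/.ssh && chmod 700 ~/.ssh &&
        cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys' \
        < "$pubkey"
}

# Succeed only if key-based login works without any password prompt.
# Usage: check_passwordless <labnet_userid>@<host> [private_key_file]
check_passwordless() {
    target=$1
    identity=${2:-}
    if [ -n "$identity" ]; then
        ssh -i "$identity" -o BatchMode=yes "$target" true
    else
        ssh -o BatchMode=yes "$target" true
    fi
}
```

For the alternative key from step 3, this would be install_backup_key .sshremote/placentia.pub <labnet_userid>@garfield.cs.mun.ca followed by check_passwordless <labnet_userid>@garfield.cs.mun.ca .sshremote/placentia. If the check fails, the cron job would fail in the same way, so it is worth running before relying on the nightly backup.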