Storage System
Legacy documentation
This page describes a service provided by a retired ACENET system. Most ACENET services are now provided by national systems; please visit https://docs.computecanada.ca for current documentation.
- Main page: User Guide
File storage at ACENET is implemented with one of two file system technologies:
- Lustre, a parallel file system widely used in high-performance computing, at Mahone, Fundy, and Placentia.
- Oracle's SAM-QFS at Glooscap.
Changes
In early 2016 ACENET replaced a variety of aging storage hardware and software at Mahone, Fundy, and Placentia in order to ensure data continuity. Lustre was also introduced at that time, and the following changes affecting users were made:
- The `/globalscratch` file system was merged with `/home`. Files formerly in `/globalscratch/$USER` should now be found in `/home/$USER/scratch`, and `/home/$USER/scratch` is no longer a symbolic link but a real subdirectory. Scripts or programs which explicitly refer to `/globalscratch` will have to be edited (see the sketch after this list).
- The `quota` command is now a wrapper around the `lfs quota` command. The appearance of its output has changed somewhat, but your quota standing is now available immediately.
- Quotas have been adjusted to reflect the merged filesystems, and a new file count quota has been imposed, with a default of 180,000 files per user.
- The tape layer of the old storage systems has not been replaced, for reasons of cost. While file restoration after deletion or other forms of accidental loss was never officially supported, such recovery is now practically impossible in every case.
- Red Hat Enterprise Linux 6 (RHEL6) is now the default operating system on all four clusters. This simplifies job submission for users who have been obliged to specify `os=RHEL6` for the last year or so, and certain applications that were previously only available on RHEL6, such as MATLAB, can now run on any node.
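Here is a minimal sketch of how affected scripts might be located and updated; the directory searched and the script name are placeholders for your own files:
$ grep -rl '/globalscratch' /home/$USER/jobs                  # list scripts that mention the old path
$ sed -i "s|/globalscratch|/home/$USER/scratch|g" myjob.sh    # rewrite the path in one script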
Policies
- Policy document: ACENET Data Policies
Backup
ACENET does not provide backup services. Our filesystems are built with RAID redundancy to protect your data from loss due to hardware failures, but we do not protect you from accidentally deleting or changing your own files. Users are therefore strongly encouraged to make off-site (or multi-site) copies of their critical data. Source code and other such key files should be managed with a version control tool such as Git, Subversion, Mercurial, or CVS.
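For example, a source directory can be placed under Git and pushed to an off-site repository; the project directory and remote URL below are placeholders:
$ cd /home/$USER/myproject                              # placeholder project directory
$ git init
$ git add .
$ git commit -m "Initial snapshot"
$ git remote add origin git@example.org:myproject.git   # placeholder off-site remote
$ git push -u origin master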
You should also be aware of your home institution's data storage policies and follow them. Some institutions offer network backup facilities which you may be able to use. MUN users can take advantage of MUN's RDB system for backing up data from Placentia.
Archiving
ACENET does not provide permanent data archiving.
Data retention policy
Data stored in expired accounts is subject to deletion after a grace period of 4 months.
Layout
There are three types of disk space available to users on most ACENET clusters: one type of Permanent Storage and two types of Temporary Storage. The general outline of the ACENET storage system is given below.
- Permanent Storage system on every cluster:
Name | Location | Function | Resource type |
---|---|---|---|
Home Dir | `/home/<username>` | critical data and code | network |
- Temporary Storage system on every cluster:
Name | Location | Function | Resource type |
---|---|---|---|
No-quota Scratch | `/nqs/<username>` | temporary data, large data | network |
Local Scratch | `/scratch/tmp` | temporary data, fast read/write access | node-local |
Permanent Storage
Main storage (home directory) is your personal and permanent space for research-critical data and code. This is where you should put your data prior to and after computations, and where you should keep source code and executables. It is located in `/home/<username>`, where `<username>` is replaced by your username. When you log in, this is your current working directory. You may create whatever subdirectories you like here. The main storage is networked storage shared among all compute nodes, via Lustre or, at Glooscap, via NFS (Network File System).
Quotas
Storage quotas are implemented on all clusters. The default quota values (soft limits) are given in the table below. The hard limits are 5-10% higher (except at Glooscap). The grace period for exceeding the soft limit is one week.
Location | Limit type | Fundy | Mahone | Placentia | Glooscap |
---|---|---|---|---|---|
`/home/<username>` | bytes per user | 150 GB | 155 GB | 75 GB | 61 GB |
`/home/<username>` | files per user | 180,000 | 180,000 | 180,000 | no limit |
Your usage and limit information can be found with the `quota` command.
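On the Lustre clusters the same information can also be requested from Lustre directly; this is a sketch using the standard `lfs quota` syntax:
$ quota                      # summary of your usage and limits
$ lfs quota -u $USER /home   # query Lustre directly (Lustre clusters only)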
You can also use `du` to determine how much space your files occupy:
$ du -h --max-depth=1 /home/$USER/
DAU
The table below gives the Disk Allocation Unit (DAU) sizes on ACENET clusters. Where two numbers are specified for the DAU, as in X (Y) KB, the first 8 blocks of a file are X KB each and the rest of the blocks are Y KB each; a worked example follows the table.
Location | Fundy | Mahone | Placentia | Glooscap |
---|---|---|---|---|
`/home` | 4 KB | 4 KB | 4 KB | 4 (64) KB |
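For example, under the Glooscap 4 (64) KB scheme a 200 KB file occupies the first 8 blocks at 4 KB each (32 KB of data) plus three 64 KB blocks for the remaining 168 KB, so roughly 224 KB of disk space in total. To compare a file's logical size with the space it actually allocates, you can use standard `du` options (the file name is a placeholder):
$ du -k --apparent-size myfile.dat   # logical file size, in KB
$ du -k myfile.dat                   # space actually allocated on disk, in KB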
Temporary Storage
No-quota Scratch
No-quota Scratch (NQS) is temporary network storage that has no per-user quota limit, but is cleaned periodically to remove old files. It is available at `/nqs/<username>/` on every cluster to users who have requested access to it.
- Note: If you want to use NQS, you should contact support stating that you understand the terms of use and would like NQS turned on. Also, please let us know if you want to be notified when files are scheduled for deletion, and if so, where you want those emails to be sent.
NQS is designed to allow you to store large amounts of data on a temporary basis, for example, files generated and consumed during a single job that cannot be stored on Main Storage or Global Scratch due to the per-user quotas. Because no quotas are enforced on NQS, there is an irreducible risk that the filesystem will fill up. Should that occur, existing data on `/nqs` may be unrecoverable. This means it is unsuitable for storage of critical data. Long-term storage of data, critical or not, is also not appropriate, since this increases the risk of the filesystem filling up during its intended use.
You are expected to delete your files from `/nqs` once the associated job or jobs are complete. Technical staff also reserve the right to delete files manually in the event of a manifest risk of a fill-up emergency.
To ensure that these guidelines are followed and `/nqs` stays usable for its intended purpose, files which have not been accessed for 31 days are automatically deleted. The deletion routine will notify you seven days in advance of removing any of your files if you keep a file named `.nqs` in your home directory (`/home/<username>/.nqs`) with these contents:
U_EMAIL=user@some.address.foo
U_QUIET=no
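A minimal sketch of creating this file from the command line; the e-mail address shown is a placeholder for your own:
$ cat > /home/$USER/.nqs <<EOF
U_EMAIL=your.name@example.com
U_QUIET=no
EOF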
`/nqs` | Fundy | Mahone | Placentia | Glooscap |
---|---|---|---|---|
size | 12 TB | 13 TB | 12 TB | 19 TB |
DAU | 4 KB | 4 KB | 4 KB | 4 KB |
To check how much space is used or available in NQS, use the following command:
$ df -h /nqs/$USER
To examine the last access time of your files:
$ ls -lu /nqs/$USER    # in the given directory
$ ls -luR /nqs/$USER   # in subdirectories too, recursively
To recursively find files which have not been accessed for, say, the last 24 days:
$ find /nqs/$USER -type f -atime +24
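After reviewing that list, the same tests can be combined with `-delete` to remove the files yourself; check the listing from the previous command first, since deletion cannot be undone:
$ find /nqs/$USER -type f -atime +24 -delete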
Local Scratch
- Main page: Local Scratch
Each compute node has its own disk (or in some cases, solid state memory) which is not shared with other compute nodes. We refer to this as local disk. If it is used to store temporary files for an individual job, then we refer to that as "local scratch storage".
Local scratch has an advantage over network storage: it is not prone to slowing down when cluster load is high. If your application does a high volume of input/output, then using local scratch might result in more predictable run times. However, local scratch is more complicated to use than network storage. If you are willing to invest some effort into learning how to use node-local disk in general and the specifics of ACENET's node-local scratch in particular, then please read Local Scratch.
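As a rough illustration of the usual pattern (not a substitute for the Local Scratch page), a job might copy its input to node-local disk, work there, and copy results back. The `/scratch/tmp` path comes from the table above; the per-job directory naming, file names, and program are placeholder assumptions:
$ WORKDIR=/scratch/tmp/$USER.$$          # per-job directory; naming scheme is an assumption
$ mkdir -p $WORKDIR
$ cp /home/$USER/input.dat $WORKDIR/     # stage input onto the node-local disk
$ cd $WORKDIR
$ ./my_program input.dat > output.dat    # placeholder program doing heavy I/O locally
$ cp output.dat /home/$USER/results/     # copy results back to permanent storage
$ rm -rf $WORKDIR                        # clean up local scratch when done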