This page is maintained manually. It gets updated as soon as we learn new information.
Clusters
Click the name of a cluster in the table below to jump to the corresponding section of this page. The outage schedule section lists all scheduled ACEnet outages in one place.
Services
- Legend:
  - Online: cluster is up and running
  - Offline: no users can log in or submit jobs, or the service is not working
  - Online: some users can log in and/or there are problems
Outage schedule
Grid Engine will not schedule any job with a run time (h_rt) that extends into the beginning of a planned outage period. This is so the job will not be terminated prematurely when the system goes down.
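As a rough illustration of the rule above, the following Python sketch checks whether a requested run time would finish before an outage begins, and computes the largest h_rt value that still fits. The function names and the dates are hypothetical, and the exact boundary behaviour of the scheduler (for example, whether a job ending precisely at the outage start is accepted) is an assumption here.

```python
from datetime import datetime, timedelta


def fits_before_outage(now, h_rt, outage_start):
    """Return True if a job started at `now` with run time `h_rt`
    would end no later than `outage_start` (boundary is an assumption)."""
    return now + h_rt <= outage_start


def max_h_rt(now, outage_start):
    """Return the longest run time that ends before the outage,
    formatted as H:MM:SS, or None if the outage has already begun."""
    remaining = outage_start - now
    if remaining <= timedelta(0):
        return None
    total = int(remaining.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Hypothetical example: an outage scheduled for 08:00 the next morning.
now = datetime(2014, 11, 24, 9, 0, 0)
outage = datetime(2014, 11, 25, 8, 0, 0)
print(fits_before_outage(now, timedelta(hours=48), outage))
print(max_h_rt(now, outage))
```

A job that would otherwise wait until after the outage can be resubmitted with a shorter limit, for example `qsub -l h_rt=23:00:00 job.sh`, provided the job really can finish in that time.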
Brasdor
On February 21, 2014, ACEnet's Brasdor cluster suffered serious damage when an A/C malfunction over-cooled the room, causing a sprinkler head to deploy. Assessment is ongoing; however, it is clear that the water damage was extensive enough that we will be unable to return the cluster to service.
A central concern of our recovery work has been the possibility of restoring user data. Data written to /home or /globalscratch on or before February 15, 2014 may have a surviving copy on tape. We have been able to restore such data using Mahone's tape library. Due to disk space limitations, data must be restored one user at a time.
If you need Brasdor data recovered, please contact support and specify which file system you want us to recover (/home and/or /globalscratch). Please use the subject line "File recovery at Brasdor - your_username". Also, please note that /nqs cannot be recovered.
Mahone
- 12:01, November 18, 2014 (AST)
- The cluster is offline for unscheduled NFS maintenance.
- 09:35, November 18, 2014 (AST)
Placentia
- An NFS server has been rebooted. Please check whether your jobs are progressing normally or need to be resubmitted.
- 08:13, November 17, 2014 (AST)
- NFS issues. Users' home directories may not get mounted on the compute nodes; jobs could fail or not start.
- 07:13, November 17, 2014 (AST)
Fundy
- 11:36, November 21, 2014 (AST)
- The cluster is offline while we investigate and fix the storage system problems.
- 10:46, November 20, 2014 (AST)
- NFS problems once again. Users might not be able to log in.
- 23:01, November 19, 2014 (AST)
Glooscap
- Head node locked up late Thursday afternoon, November 6. Service has been restored. Jobs were unaffected.
- 08:56, November 7, 2014 (AST)
- All general production hosts (short.q, medium.q, long.q) at Glooscap are now running the RHEL 6 operating system. Upgrade of the head node to RHEL 6 is being planned.
- 11:16, October 23, 2014 (ADT)