Cluster Status

This page is maintained manually. It gets updated as soon as we learn new information.

Clusters

Click a cluster name in the table below to jump to the corresponding section of this page. The Outage schedule section lists all scheduled ACEnet outages in one place.

Cluster	Status	Planned Outage	Notes
Brasdor	Offline	No outages	Extensive damage
Mahone	Online	No outages	Maintenance complete
Placentia	Online	No outages
Fundy	Online	No outages
Glooscap	Online	No outages

Services

Service	Status	Planned Outage
WebMO	Online	No outages
Account creation	Online	No outages
PGI and Intel licenses	Online	No outages

Legend:

Online	cluster is up and running
Offline	no users can log in or submit jobs, or the service is not working
Online	some users can log in and/or there are problems

Outage schedule

Grid Engine will not schedule any job with a run time (h_rt) that extends into the beginning of a planned outage period. This is so the job will not be terminated prematurely when the system goes down.
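The rule above comes down to a date comparison, which the following minimal Python sketch illustrates. It is not Grid Engine's actual implementation. The function names, the HH:MM:SS-only parsing of h_rt, the treatment of a job that ends exactly at the outage start as acceptable, and the dates in the usage lines are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def parse_h_rt(h_rt: str) -> timedelta:
    """Convert an h_rt value written as HH:MM:SS into a timedelta.

    Assumption: only the HH:MM:SS form is handled in this sketch.
    """
    hours, minutes, seconds = (int(part) for part in h_rt.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def fits_before_outage(start: datetime, h_rt: str, outage_start: datetime) -> bool:
    """Return True if a job starting at `start` with the requested run time
    would finish no later than the start of the outage.

    Assumption: finishing exactly at the outage start counts as fitting.
    """
    return start + parse_h_rt(h_rt) <= outage_start


# Hypothetical dates, chosen only to show the comparison.
job_start = datetime(2014, 12, 1, 8, 0)
outage = datetime(2014, 12, 3, 8, 0)

print(fits_before_outage(job_start, "24:00:00", outage))  # 24 h request ends before the outage
print(fits_before_outage(job_start, "72:00:00", outage))  # 72 h request would run into the outage
```

In practice this means that a job requesting a shorter h_rt may still be scheduled ahead of an outage, while one with a longer request waits until the outage is over.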

Brasdor

On February 21, 2014, ACEnet's Brasdor cluster suffered serious damage when an A/C malfunction over-cooled the room, causing a sprinkler head to deploy. Assessment is ongoing; however, it is clear that the water damage was extensive enough that we will be unable to return the cluster to service.

A central concern of our recovery work has been the possibility of restoring user data. Data written to /home or /globalscratch on or before February 15, 2014 may have a surviving copy on tape, and we have been able to restore such data using Mahone's tape library. Because of disk space limitations, data must be restored user by user.

If you need Brasdor data recovered, please contact support and specify which file system you want us to recover (/home and/or /globalscratch). Use the subject line "File recovery at Brasdor - your_username". Please note that /nqs cannot be recovered.

Mahone

Maintenance complete.

12:01, November 18, 2014 (AST)

The cluster is offline for unscheduled NFS maintenance.

09:35, November 18, 2014 (AST)

Placentia

An NFS server has been rebooted. Please check whether your jobs are progressing normally or need to be resubmitted.

08:13, November 17, 2014 (AST)

NFS issues. Users' home directories may not get mounted on the compute nodes; jobs could fail or not start.

07:13, November 17, 2014 (AST)

Fundy

Fundy is back online.

11:36, November 21, 2014 (AST)

The cluster is offline to investigate and fix the storage system problems.

10:46, November 20, 2014 (AST)

NFS problems once again. Users might not be able to log in.

23:01, November 19, 2014 (AST)

Glooscap

The head node locked up late Thursday afternoon, November 6. Service has been restored. Jobs were unaffected.

08:56, November 7, 2014 (AST)

All general production hosts (short.q, medium.q, long.q) at Glooscap are now running the RHEL 6 operating system. Upgrade of the head node to RHEL 6 is being planned.

11:16, October 23, 2014 (ADT)
