Cluster Status

Notice: This page is maintained manually and is updated as soon as we learn new information.

Clusters

Click on a cluster name in the table below to jump to the corresponding section of this page. The Outage schedule section collects information about all scheduled outages in one place.

Cluster   | Status  | Planned Outage | Notes
Mahone    | Online  | No outages     |
Placentia | Offline | No outages     | Loss of network connection. Jobs are not affected.
Fundy     | Online  | No outages     |
Glooscap  | Online  | No outages     |

Services

Service                          | Status | Planned Outage | Notes
WebMO                            | Online | Date to come   | We are experiencing problems submitting WebMO jobs.
Account creation                 | Online | No outages     |
PGI and Intel licenses           | Online | No outages     |
Videoconferencing (IOCOM Server) | Online | No outages     |
Legend:
  • Online: the cluster or service is up and running
  • Offline: no users can log in or submit jobs, or the service is not working
  • Online: some users can log in and/or there are problems affecting your work

Outage schedule

Grid Engine will not schedule any job with a requested run time (h_rt) that extends into a planned outage period. This ensures the job will not be terminated prematurely when the system goes down.
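For instance, a job whose requested run time ends before a scheduled outage begins can still start, while a longer request will wait until after the outage. A minimal sketch of a submission with an explicit run-time request (the script name myjob.sh and the 12-hour value are placeholders):

  qsub -l h_rt=12:00:00 myjob.sh    # request a 12-hour run-time limit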

  • None

Mahone

  • The cluster may be unreachable due to an upstream provider networking issue.
08:11, December 8, 2016 (AST)

Placentia

  • Due to a network problem at Memorial University, the Placentia cluster has lost its external network connection.
Jobs are not affected as the internal network is still available. We expect this outage to last a few hours.
10:41, September 26, 2017 (ADT)
  • Placentia's head node (clhead) spontaneously rebooted last night around 2:15 am NST.
As far as we can tell, no jobs were affected.
08:14, September 8, 2017 (ADT)
  • A/C repairs have been completed and Placentia is back in production.
Fortunately, we didn't have to kill any jobs or shut down any equipment. Jobs that had been submitted previously are starting normally.
We don't expect any negative effects besides the longer wait times over the past two days.
13:33, August 25, 2017 (ADT)
  • Service technicians have started working on the affected A/C unit.
We are trying to avoid killing jobs that are already running or shutting down compute nodes; however, we are prepared to do so if the temperature rises too high during the maintenance.
08:49, August 25, 2017 (ADT)
  • The Memorial University data centre is having A/C problems, so we are reducing Placentia's capacity.
For now, we are preventing new jobs from starting. If this proves sufficient, already running jobs won't be affected.
15:20, August 23, 2017 (ADT)
  • Placentia is back up after a planned power outage on the Memorial University campus.
The jobs have been restarted and the vast majority are running fine; however, a few failed in the process. Please check whether any of your jobs are among those that failed (example commands are sketched below).
14:07, June 3, 2017 (ADT)
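A minimal sketch of how you might check, assuming the standard Grid Engine client tools are available (the job ID 123456 is a placeholder):

  qstat -u $USER     # list your jobs that are still queued or running
  qacct -j 123456    # accounting record for a finished job, including its exit_status

A job that no longer appears in qstat and shows a non-zero exit_status in qacct most likely failed during the restart and may need to be resubmitted.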

Fundy

  • Fundy head node is unresponsive. Technical staff are investigating the cause.
12:19, April 18, 2017 (ADT)
  • Fundy is back now.
10:59, August 8, 2016 (ADT)

Glooscap

  • The network problem has been resolved. Glooscap is reachable again.
16:36, April 24, 2017 (ADT)
  • Glooscap is inaccessible due to a network problem at the host university. We expect most jobs will continue running uninterrupted while we diagnose the problem.
13:06, April 24, 2017 (ADT)
  • The metadata server was hung overnight on March 7-8. It was rebooted this morning and Glooscap is operating once again, although technical staff continue to be cautious about its future behaviour. To try to alleviate the load on the metadata server, we are withdrawing compute nodes cl002 through cl058 from service. This represents a reduction of 188 cores in the capacity of the cluster.
11:24, March 8, 2017 (AST)
  • The cluster is unresponsive.
17:05, March 7, 2017 (AST)
  • A file system consistency check (fsck) has been completed, jobs have been restarted and logins are once again enabled. We will be monitoring to see if the rate or severity of slowdowns has changed.
10:20, March 7, 2017 (AST)
  • The intermittent slow response on Glooscap continues, with many such events logged Feb 16-18 and Feb 26-Mar 1. Technical staff continue to investigate the cause without vendor support.
09:13, March 1, 2017 (AST)
  • Users report intermittent slowness in interactive use of Glooscap. Symptoms include pauses of several seconds to over a minute in response to shell commands involving files or file metadata (such as "ls"). This is believed to be due to load on the file system, and therefore may also be affecting the run times of jobs doing extensive I/O. Vendor support for the file system is no longer available, so deep troubleshooting is out of reach. We have no reports of loss of data or other actual failures. All we can recommend is great patience.
12:18, February 9, 2017 (AST)