Cluster Status

From ACENET
Revision as of 14:23, January 16, 2015 by Luyang (talk | contribs) (Clusters)
This page is maintained manually. It gets updated as soon as we learn new information.

Clusters

Click a cluster name in the table below to jump to the corresponding section of this page. The outage schedule section lists all scheduled ACEnet outages in one place.

Cluster     Status   Planned Outage   Notes
Mahone      Online   No outages
Placentia   Online   No outages
Fundy       Online   Power outage     Reduced capacity.
Glooscap    Online   No outages

Services

Service                  Status   Planned Outage   Notes
WebMO                    Online   No outages
Account creation         Online   No outages
PGI and Intel licenses   Online   No outages
Legend:
Online: the cluster is up and running
Offline: users cannot log in or submit jobs, or the service is not working
Online: some users can log in and/or there are problems

Outage schedule

Grid Engine will not schedule any job with a run time (h_rt) that extends into the beginning of a planned outage period. This is so the job will not be terminated prematurely when the system goes down.
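The rule above can be illustrated with a small sketch. This is not Grid Engine's own code. It assumes h_rt is given in HH:MM:SS form and that a job ending exactly when the outage begins is still acceptable. The function names and the outage time used below are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def parse_h_rt(h_rt: str) -> timedelta:
    """Convert an h_rt string in HH:MM:SS form to a timedelta."""
    hours, minutes, seconds = (int(part) for part in h_rt.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def fits_before_outage(start: datetime, h_rt: str, outage_start: datetime) -> bool:
    """Return True if a job starting at `start` with run time limit `h_rt`
    would finish no later than `outage_start`."""
    return start + parse_h_rt(h_rt) <= outage_start


# Hypothetical outage start; not a real ACEnet schedule entry.
outage_start = datetime(2015, 1, 20, 8, 0, 0)
now = datetime(2015, 1, 19, 8, 0, 0)

print(fits_before_outage(now, "12:00:00", outage_start))  # short job fits
print(fits_before_outage(now, "48:00:00", outage_start))  # long job would overlap
```

In practice this means a job waiting in the queue before an outage may start sooner if it requests a smaller h_rt that still covers its actual run time.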

Mahone

  • Maintenance complete.
12:01, November 18, 2014 (AST)

Placentia

  • A power event at around 3:40 a.m. this morning took down a number of compute nodes. Please check your jobs and resubmit if necessary.
10:30, January 5, 2015 (AST)

Fundy

  • Thirteen compute nodes have been temporarily taken out of production to allow us to further investigate and replace the faulty line card in the InfiniBand switch. These nodes will be put back in production as Ethernet-only if the repair takes longer than anticipated.
12:28, January 7, 2015 (AST)
  • Mellanox has informed us that there is no replacement or support for products that have reached their End of Life date. We are investigating alternatives.
14:25, January 6, 2015 (AST)
  • We have contacted Mellanox support to deal with what appears to be a problem with the IB switch.
12:24, January 6, 2015 (AST)

Glooscap

  • Some queue changes have been made in connection with the 2015 Compute Canada NRAC. 'qsum' may display incorrect numbers of available slots while jobs drain from reassigned equipment, perhaps as late as January 20th. We regret any confusion this may cause.
10:45, January 8, 2015 (AST)