Difference between revisions of "Cluster Status"

From ACENET
(Fundy)
(Clusters)
Line 17:
  | [[Cluster Status#Placentia | Placentia]] || style="color:green" | '''Online''' || [[Cluster Status#Outage schedule | No outages]] || Reduced capacity
  |- valign=top bgcolor="#f5faff"
- | [[Cluster Status#Fundy | Fundy]] || style="color:red" | '''Offline''' || [[Cluster Status#Outage schedule | No outages]] || Network outage
+ | [[Cluster Status#Fundy | Fundy]] || style="color:green" | '''Online''' || [[Cluster Status#Outage schedule | No outages]] || Network outage
  |- valign=top bgcolor="#f5faff"
  | [[Cluster Status#Glooscap | Glooscap]] || style="color:green" | '''Online''' || [[Cluster Status#Outage schedule | No outages]] ||

Revision as of 20:10, November 5, 2015

This page is maintained manually. It is updated as soon as we learn new information.

Clusters

Click the name of a cluster in the table below to jump to the corresponding section of this page. The outage schedule section lists all scheduled outages in one place.

Cluster    Status  Planned Outage  Notes
Mahone     Online  No outages
Placentia  Online  No outages      Reduced capacity
Fundy      Online  No outages      Network outage
Glooscap   Online  No outages

Services

Service                           Status  Planned Outage  Notes
WebMO                             Online  No outages
Account creation                  Online  No outages
PGI and Intel licenses            Online  No outages
Videoconferencing (IOCOM Server)  Online  No outages
Legend:
Online: cluster is up and running
Offline: no users can log in or submit jobs, or the service is not working
Online: some users can log in and/or there are problems affecting your work

Outage schedule

Grid Engine will not schedule any job whose requested run time (h_rt) would extend into a planned outage period. This ensures that the job is not terminated prematurely when the system goes down.

  • No outages
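
The scheduling rule above can be illustrated with a small sketch. This is not Grid Engine's actual implementation. The function names `parse_h_rt` and `can_start`, the dates in the usage note, and the choice to allow a job that ends exactly when the outage begins are all assumptions made for illustration. The sketch also assumes h_rt is given either as plain seconds or as HH:MM:SS.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_h_rt(h_rt: str) -> timedelta:
    """Convert an h_rt value ('HH:MM:SS' or plain seconds) into a timedelta.

    Assumption: only these two formats are handled in this sketch.
    """
    parts = [int(p) for p in h_rt.split(":")]
    if len(parts) == 1:
        return timedelta(seconds=parts[0])
    if len(parts) == 3:
        hours, minutes, seconds = parts
        return timedelta(hours=hours, minutes=minutes, seconds=seconds)
    raise ValueError(f"unsupported h_rt format: {h_rt!r}")


def can_start(now: datetime, h_rt: str, outage_start: Optional[datetime]) -> bool:
    """Return True if a job started at `now` would finish by the outage start.

    With no planned outage (outage_start is None) every job may start.
    Assumption: a job ending exactly at the outage start is still allowed.
    """
    if outage_start is None:
        return True
    return now + parse_h_rt(h_rt) <= outage_start
```

For example, with a hypothetical outage beginning at 08:00 on November 6, 2015, `can_start(datetime(2015, 11, 5, 12, 0), "12:00:00", datetime(2015, 11, 6, 8, 0))` is true, while the same call with `"48:00:00"` is false. In practice this means a job that requests a shorter h_rt may start before an outage, while a longer request waits until the outage is over.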

Mahone

  • LDAP service has been restored to Mahone and Glooscap. Both clusters are back in service. Some jobs may have failed. Check your jobs.
09:33, October 26, 2015 (ADT)
  • There is a problem with LDAP affecting Mahone and Glooscap.
15:59, October 25, 2015 (ADT)

Placentia

  • A file server (nfs3) failed and had to be rebooted again. Check your jobs.
11:25, October 26, 2015 (ADT)
  • A file server (nfs3) is malfunctioning again. We have disabled the hosts that use that server to prevent jobs from going into error state. While we diagnose the problem, total capacity has been reduced by 608 cores, including parts of short.q, medium.q, long.q, tarasov.q, and gaussian.q.
15:00, September 25, 2015 (ADT)

Fundy

  • There is a power outage on the UNB campus affecting the IT building. The cluster should be running, but is not accessible on the network.
15:00, November 5, 2015 (AST)
  • Fundy is back online. There was an issue with the UPS over the weekend. Machines were shut down and jobs were lost.
15:20, June 15, 2015 (ADT)
  • Users reported Fundy as inaccessible.
08:00, June 15, 2015 (ADT)

Glooscap

  • LDAP service has been restored to Mahone and Glooscap. Both clusters are back in service. Some jobs may have failed. Check your jobs.
09:33, October 26, 2015 (ADT)
  • There is a problem with LDAP affecting Mahone and Glooscap.
15:59, October 25, 2015 (ADT)