Difference between revisions of "Cluster Status"

From ACENET
Line 17:
  | [[Cluster Status#Fundy | Fundy]] || style="color: green" | '''Online''' || [[Cluster Status#Outage Schedule | No outages]] ||
  |- valign=top bgcolor="#f5faff"
- | [[Cluster Status#Glooscap | Glooscap]] || style="color: green" | '''Online''' || [[Cluster Status#Outage Schedule | Upgrade Jan 16-31]] ||
+ | [[Cluster Status#Glooscap | Glooscap]] || style="color: green" | '''Online''' || [[Cluster Status#Outage Schedule | Upgrade Jan 18-Feb 1]] ||
  |- valign=top bgcolor="#f5faff"
  | [[Cluster Status#Courtenay | Courtenay]] || style="color: green" | '''Online''' || [[Cluster Status#Outage Schedule | No outages]] ||
Line 33:
  == Outage schedule ==
- *  '''Glooscap:'''  Installation of '''new equipment''', including a revamp of the storage system, will begin on Monday, January 16, 2012.  The entire cluster will be unavailable beginning midnight Jan 15-16 until further notice.  The expected date of return to production is Tuesday Jan 31, but this may be adjusted earlier or later depending on circumstances.
+ *  '''Glooscap:'''  Installation of '''new equipment''', including a revamp of the storage system, will begin on Wednesday, January 18th, 2012.  The entire cluster will be unavailable beginning midnight Jan 18th until further notice.  The expected date of return to production is Wednesday, February 1st, but this may be adjusted earlier or later depending on circumstances.
  == Brasdor ==

Revision as of 14:33, January 16, 2012

Click a cluster name in the table below to jump to the corresponding section of this page. The outage schedule section lists all scheduled ACEnet outages in one place.

Cluster    Status  Planned Outage        Notes
Brasdor    Online  No outages            NFS problem, possible job failures
Mahone     Online  No outages
Placentia  Online  No outages
Fundy      Online  No outages
Glooscap   Online  Upgrade Jan 18-Feb 1
Courtenay  Online  No outages
Legend:
Online: the cluster is up and running
Offline: no users can log in or submit jobs
Online: some users can log in and/or there are problems

Outage schedule

  • Glooscap: Installation of new equipment, including a revamp of the storage system, will begin on Wednesday, January 18th, 2012. The entire cluster will be unavailable beginning midnight Jan 18th until further notice. The expected date of return to production is Wednesday, February 1st, but this may be adjusted earlier or later depending on circumstances.

Brasdor

  • A faulty fiber cable was replaced yesterday. After overnight testing, the problematic NFS servers were put back into production but malfunctioned shortly afterwards. Users may have experienced job failures again. We continue to investigate.
10:49, January 12, 2012 (AST)
  • We are experiencing problems with one of the NFS servers again. Some jobs may have failed. The affected compute nodes are disabled until the issue is resolved.
12:38, January 10, 2012 (AST)
  • The NFS server problem has been resolved. The nodes are available again in the production queues.
12:19, January 9, 2012 (AST)
  • One of the NFS servers has failed. Some jobs may have failed. The affected compute nodes are now disabled until the NFS server is back online.
21:40, January 8, 2012 (AST)

Mahone

  • The head node is back online now. Users will need to update their cron jobs.
15:10, August 26, 2011 (AST)
  • The hard drive on the head node has failed. It has been replaced, and we are in the process of installing the OS. The rest of the cluster is not affected, and neither are jobs.
09:47, August 26, 2011 (AST)

Placentia

  • The head node was rebooted and is back online now.
12:09, December 16, 2011 (AST)
  • The head node is unavailable. We are investigating.
12:05, December 16, 2011 (AST)

Fundy

  • The head node is back online.
14:31, July 18, 2011 (AST)
  • The head node will be rebooted in 30 min. No new logins until 14:45 AST.
13:48, July 18, 2011 (AST)

Glooscap

  • The Glooscap Dec 21st outage is complete; nodes are back online.
16:09, December 21st, 2011 (AST)
  • The Glooscap Nov 22nd outage has been rescheduled for Dec 21st.
15:36, November 16, 2011 (AST)

Courtenay

  • Back online.
15:01, December 19, 2011 (AST)
  • There is a network problem at UNBSJ. We are working on it.
14:56, December 19, 2011 (AST)