Cluster Status
Revision as of 14:33, January 16, 2012
Click a cluster name in the table below to jump to its section of this page. The Outage schedule section lists all scheduled ACEnet outages in one place.
Cluster | Status | Planned Outage | Notes |
---|---|---|---|
Brasdor | Online | No outages | NFS problem, possible job failures |
Mahone | Online | No outages | |
Placentia | Online | No outages | |
Fundy | Online | No outages | |
Glooscap | Online | Upgrade Jan 18-Feb 1 | |
Courtenay | Online | No outages | |
Legend:
Status | Meaning |
---|---|
Online | cluster is up and running |
Offline | no users can log in or submit jobs |
Online (with problems) | some users can log in and/or there are problems |
Outage schedule
- Glooscap: Installation of new equipment, including a revamp of the storage system, will begin on Wednesday, January 18th, 2012. The entire cluster will be unavailable from midnight on January 18th until further notice. Return to production is expected on Wednesday, February 1st, but this date may move earlier or later depending on circumstances.
Brasdor
- A faulty fiber cable was replaced yesterday. After overnight testing, the problematic NFS servers were put back into production, but they malfunctioned shortly afterwards. Some users may have experienced job failures again. We are continuing to investigate.
- 10:49, January 12, 2012 (AST)
- We are experiencing problems with one of the NFS servers again. Some jobs may have failed. The affected compute nodes are disabled until the issue is resolved.
- 12:38, January 10, 2012 (AST)
- The NFS server problem has been resolved. The nodes are available again in the production queues.
- 12:19, January 9, 2012 (AST)
- One of the NFS servers has failed. Some jobs may have failed. The affected compute nodes are disabled until the NFS server is back online.
- 21:40, January 8, 2012 (AST)
Mahone
- The head node is back online now. Users will need to update their cron jobs.
- 15:10, August 26, 2011 (AST)
- The hard drive on the head node has failed. It has been replaced, and we are now installing the OS. The rest of the cluster, including running jobs, is not affected.
- 09:47, August 26, 2011 (AST)
Placentia
- The head node has been rebooted and is back online.
- 12:09, December 16, 2011 (AST)
- The head node is unavailable. We are investigating.
- 12:05, December 16, 2011 (AST)
Fundy
- The head node is back online.
- 14:31, July 18, 2011 (AST)
- The head node will be rebooted in 30 minutes. No new logins will be possible until 14:45 AST.
- 13:48, July 18, 2011 (AST)
Glooscap
- The Glooscap outage of December 21st is complete, and the nodes are back online.
- 16:09, December 21, 2011 (AST)
- The Glooscap outage planned for November 22nd has been rescheduled to December 21st.
- 15:36, November 16, 2011 (AST)
Courtenay
- Back online.
- 15:01, December 19, 2011 (AST)
- There is a network problem at UNBSJ. We are working on it.
- 14:56, December 19, 2011 (AST)