Difference between revisions of "Cluster Status"
Revision as of 17:04, May 29, 2015
Clusters
Click a cluster name in the table below to jump to the corresponding section of this page. The outage schedule section lists all scheduled outages in one place.
Cluster | Status | Planned Outage | Notes |
---|---|---|---|
Mahone | Online | No outages | |
Placentia | Online | No outages | A/C failure, check your jobs |
Fundy | Online | No outages | |
Glooscap | Online | No outages | |
Services
Service | Status | Planned Outage | Notes |
---|---|---|---|
WebMO | Online | No outages | |
Account creation | Online | No outages | |
PGI and Intel licenses | Online | No outages | |
Videoconferencing (IOCOM Server) | Online | No outages | |
- Legend:
Online | cluster is up and running |
Offline | users cannot log in or submit jobs, or the service is not working |
Online | some users can log in and/or there are problems affecting your work |
Outage schedule
Grid Engine will not schedule any job with a run time (h_rt) that extends into the beginning of a planned outage period. This is so the job will not be terminated prematurely when the system goes down.
- No outages planned at the present time
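To estimate whether a job would be held back by this rule, the check can be sketched as follows. This is only an illustration of the arithmetic, not Grid Engine's own scheduling code: the function names `parse_h_rt` and `fits_before_outage` are hypothetical, the sketch assumes `h_rt` is given either as a number of seconds or as `H:M:S`, and the outage time used in the example is the job-hold cutoff taken from the Glooscap notice on this page.

```python
from datetime import datetime, timedelta


def parse_h_rt(h_rt: str) -> timedelta:
    """Convert an h_rt value into a timedelta.

    Assumes two input forms: plain seconds ("3600") or "H:M:S" ("48:00:00").
    """
    parts = h_rt.split(":")
    if len(parts) == 1:
        return timedelta(seconds=int(parts[0]))
    if len(parts) == 3:
        hours, minutes, seconds = (int(p) for p in parts)
        return timedelta(hours=hours, minutes=minutes, seconds=seconds)
    raise ValueError(f"unsupported h_rt format: {h_rt!r}")


def fits_before_outage(start: datetime, h_rt: str, outage_start: datetime) -> bool:
    """Return True if a job starting at `start` ends no later than `outage_start`."""
    return start + parse_h_rt(h_rt) <= outage_start


# Illustrative values only: cutoff from the Glooscap notice (Tuesday, May 12, 06:00).
outage_start = datetime(2015, 5, 12, 6, 0)
now = datetime(2015, 5, 8, 9, 0)

print(fits_before_outage(now, "48:00:00", outage_start))   # 48-hour job
print(fits_before_outage(now, "100:00:00", outage_start))  # 100-hour job
```

If a job does not fit, requesting a shorter `h_rt` that ends before the outage begins is the usual way to let it be scheduled sooner; otherwise it waits until the outage is over.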
Mahone
- Operational.
- 08:26, May 19, 2015 (ADT)
- We are experiencing an NFS problem on Mahone.
- 14:11, May 16, 2015 (ADT)
Placentia
- The A/C has failed again. We have turned off cl001-cl108 to keep the temperature in the room stable. Check your jobs.
- 18:23, May 19, 2015 (ADT)
- The A/C unit has now been repaired and cl001-cl108 have been returned to service.
- 13:53, May 19, 2015 (ADT)
- Scheduled maintenance has completed and Placentia is once again online. We continue with nodes cl001-cl108 offline pending air conditioning repairs.
- 15:37, April 29, 2015 (NDT)
- The A/C unit that had trouble yesterday has a failing compressor. A replacement is on order but may take 2-3 weeks to arrive. cl001-cl108 will be removed from service until the A/C is repaired.
- 14:00, April 13, 2015 (ADT)
Fundy
- Fundy has been released.
- 14:04, May 29, 2015 (ADT)
- A thunderstorm has caused a power loss on the UNB campus. We are waiting for power to be restored.
- 21:51, May 28, 2015 (ADT)
- The InfiniBand switch line board has finally been replaced. Consequently, the temporarily isolated Ethernet-only part of the cluster has been merged back with the rest of the nodes.
- 15:28, March 20, 2015 (ADT)
Glooscap
- Medium and long jobs are no longer being held.
- 08:54, May 8, 2015 (ADT)
- Glooscap is back in service. The power work could not be completed due to a missing part, and we are waiting to hear if and when another short outage will be required. We are holding jobs which would run beyond Tuesday, May 12 06h00. We hope to be able either to release all jobs or schedule a second outage by end of business Thursday May 7.
- 16:30, May 4, 2015 (ADT)
- Offline to permit work on power and air conditioning in the data center. We are also using this opportunity to carry out preventive maintenance on the filesystem and some compute nodes. Return to service will probably be late on Monday, May 4.
- 09:00, May 1, 2015 (ADT)