Cluster Status/Previous outages

700 bytes added, 16:26, October 27, 2021
2020
* UPS failure. Returning to service on street power is deemed risky. We are investigating repair options for the UPS.
: 15:01, March 30, 2020 (NDT)
 
* Electrical power work at MUN data centre is complete. Siku and Placentia are back in production.
: 17:15, March 17, 2020 (NDT)
 
* The power fluctuation at Memorial University caused issues with the InfiniBand network. While resolving them, we had to temporarily suspend the scheduler and terminate all jobs that had not failed immediately. As of Wednesday 4:40 pm NST, the scheduler has been resumed and Siku is back online.
: 9:00, March 12, 2020 (NST)
 
* We had a campus-wide power event at Memorial University. Some compute nodes were affected and some jobs crashed. We are still investigating and fixing issues, and will prevent jobs from starting in the meantime.
: 13:51, March 11, 2020 (NST)