Cluster Status

From ACENET
This page is maintained manually. It is updated as soon as we learn new information.

Clusters

Cluster | Status | Planned Outage | Notes
Siku | Offline | May 13-16 | Campus electrical work (2 of 3)
Placentia | Offline | May 13-16 | Campus electrical work (2 of 3)
Arbutus | See status.computecanada.ca (west.cloud.computecanada.ca) | |
Béluga | See status.computecanada.ca | |
Cedar | See status.computecanada.ca | |
Graham | See status.computecanada.ca | |
Niagara | See status.computecanada.ca | |

Services

Service | Status | Planned Outage | Notes
WebMO | Retired | End of service 2019 Mar 31 | Retired with Placentia
Account creation | Manual | No outages | Write support
PGI and Intel licenses | Online | No outages |
Legend:
Online: the cluster is up and running
Offline: no users can log in or submit jobs, or the service is not working
Online (with problems): some users can log in and/or there are problems affecting your work

Outage schedule

The scheduler will not start a job whose requested run time (--time=) would extend into the beginning of a planned outage. This prevents jobs from being terminated prematurely when the system goes down.

  • Siku and Placentia will be down beginning May 13 at 11h30 NDT (14h00 UTC) due to electrical power work on the MUN main campus. We expect to return to service during the day on Monday, May 16. There will be a third scheduled power outage from May 28 to May 30.
  • Siku and Placentia will be down beginning May 27 at 11h30 NDT (14h00 UTC) due to electrical power work on the MUN main campus. We expect to return to service during the day on Monday, May 30.
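To help pick a --time= value that the scheduler will still accept before an outage, here is a minimal sketch of the rule described above, assuming a job is only started if its start time plus its requested run time ends at or before the outage start. The function names, the fixed UTC-2:30 offset for NDT, and the exact boundary behaviour (ending exactly at the outage start) are illustrative assumptions, not the scheduler's actual implementation. The outage start used is the May 13, 11h30 NDT (14h00 UTC) entry listed above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Newfoundland Daylight Time modelled as a fixed UTC-2:30 offset.
NDT = timezone(timedelta(hours=-2, minutes=-30))

# Start of the May 13 outage listed above: 11h30 NDT (14h00 UTC).
OUTAGE_START = datetime(2022, 5, 13, 11, 30, tzinfo=NDT)


def fits_before_outage(start: datetime, time_limit: timedelta,
                       outage_start: datetime = OUTAGE_START) -> bool:
    """Hypothetical check: True if the job would finish no later than the outage start."""
    return start + time_limit <= outage_start


def max_time_limit(start: datetime,
                   outage_start: datetime = OUTAGE_START) -> timedelta:
    """Largest run time that does not extend into the outage (zero if already past it)."""
    return max(outage_start - start, timedelta(0))


def slurm_time(td: timedelta) -> str:
    """Format a timedelta as a days-hours:minutes:seconds string usable with --time=."""
    total = int(td.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    now = datetime(2022, 5, 11, 9, 0, tzinfo=NDT)
    print(slurm_time(max_time_limit(now)))  # 2-02:30:00
```

For example, a job submitted on May 11 at 09:00 NDT could request at most --time=2-02:30:00 under this model; anything longer would wait until after the outage. Actual start times also depend on queue load, so requesting somewhat less than this maximum is safer.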

Siku

  • Siku has been back online since 12:30 pm NDT (15h00 UTC). Two similar outages are planned for May: May 13-16 and May 28-30.
12:32, May 2, 2022 (NDT)
  • Siku has been offline since 11:30 am NDT (14h00 UTC) to allow electrical work by Memorial University Facilities Management in the data centre. We expect a return to service by mid-day on Monday, May 2, 2022.
11:40, April 29, 2022 (NDT)
  • A time-sensitive maintenance outage was carried out on Monday, March 28. Work began at 7:30 am Newfoundland time (10h00 UTC) and was completed by 5:30 pm Newfoundland time (20h00 UTC). The work expanded our InfiniBand network and increased the capacity of our back-end infrastructure, allowing almost 30 additional nodes to be added over the coming days.
17:30, March 28, 2022 (NST)
  • Memorial University IT Services interrupted network service to Siku just after midnight Newfoundland time (03h30 UTC) on Tuesday, March 1, 2022, to perform maintenance. The interruption lasted less than 30 minutes. During this time, jobs were prevented from starting to avoid failures caused by the lack of an external network connection; job scheduling has now resumed.
00:25, March 1, 2022 (NST)
  • Memorial University's networks are online again and access to Siku has been restored. Siku's scheduler stopped at some point during the outage, but was restarted on Saturday, January 8 at 10:20 am (NST). Jobs have been running on Siku since then.
Update Monday Jan 10, 13:15 (NST): The onset of the network interruption was also accompanied by a power fluctuation, which caused some (but not all) compute nodes to reboot.
13:00, January 8, 2022 (NST)
  • Memorial University has announced that it is experiencing a widespread internet outage. Access to Siku is therefore currently not possible, but we expect the system to continue running jobs until internet access is restored.
Update 14:10 NST: Memorial University has announced on its Twitter account that the issue was caused by an internal technology malfunction. MUN-ITS is working on a fix.
13:30, January 7, 2022 (NST)

For older outages, see Previous outages.

  • Our newest cluster, Siku, is now in production. Access is currently restricted to invited users only. Access request form.
13:00, December 10, 2019 (NST)

Placentia

  • Placentia was retired from general service as of 2019 Mar 31. A reduced number of compute nodes remain in service, with access restricted to MUN users who have made suitable arrangements. Contact support@ace-net.ca if you believe you should have access.