Difference between revisions of "Cluster Status"
Revision as of 17:50, March 27, 2018
Clusters
Click a cluster name in the table below to jump to the corresponding section of this page. The Outage schedule section lists all scheduled outages in one place.
Cluster | Status | Planned Outage | Notes |
---|---|---|---|
Mahone | Online (orange) | Service ends March 31, 2018 | |
Placentia | Online (green) | No outages - Service extended to March 31, 2019 | |
Fundy | Online (orange) | Service ends March 31, 2018 | |
Glooscap | Online (green) | No outages - Service extended to March 31, 2019 | |
Services
Service | Status | Planned Outage | Notes |
---|---|---|---|
WebMO | Online | No outages | |
Account creation | Manual | No outages | Write support |
PGI and Intel licenses | Online | No outages | |
Videoconferencing (IOCOM Server) | Online | No outages | |
- Legend:
Online (green) | cluster is up and running |
Offline | no users can log in or submit jobs, or the service is not working |
Online (orange) | only some users can log in, and/or there are problems that may affect your work |
Outage schedule
Grid Engine will not schedule any job with a run time (h_rt) that extends into the beginning of a planned outage period. This ensures that the job is not terminated prematurely when the system goes down.
- Fundy and Mahone will run no more jobs after midnight on March 31, 2018, and will be withdrawn from service shortly thereafter. Login nodes and file systems will continue to operate during the first week of April, but may be withdrawn from service with little or no notice at any time thereafter. All researchers should already have copied off any critical data.
- Groups that have not registered their "Transition Ready" status cannot submit new jobs on Glooscap, Placentia, Mahone or Fundy. See New Systems Migration for more information.
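To illustrate the Grid Engine run-time rule described at the top of this section, here is a minimal Python sketch. It is not Grid Engine's own implementation. The function names, the example dates, and the choice to treat a job that ends exactly at the outage start as acceptable are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def fits_before_outage(start: datetime, h_rt_seconds: int, outage_start: datetime) -> bool:
    """True if a job starting at `start` with run-time limit h_rt ends by `outage_start`."""
    # Assumption: a job ending exactly at the outage start still counts as fitting.
    return start + timedelta(seconds=h_rt_seconds) <= outage_start


def max_h_rt(now: datetime, outage_start: datetime) -> Optional[str]:
    """Largest run-time limit (H:MM:SS) that still fits; None once the outage has begun."""
    remaining = int((outage_start - now).total_seconds())
    if remaining <= 0:
        return None
    hours, rest = divmod(remaining, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    outage = datetime(2018, 3, 31, 0, 0)  # hypothetical outage start, for illustration only
    now = datetime(2018, 3, 30, 22, 30)
    print(fits_before_outage(now, 3600, outage))  # True: the job would end at 23:30
    print(fits_before_outage(now, 7200, outage))  # False: the job would end at 00:30
    print(max_h_rt(now, outage))                  # 1:30:00
```

In practice this means a job that requests a shorter run time, for example with `qsub -l h_rt=1:30:00`, may still be scheduled ahead of an outage, while one that requests a longer limit will wait until the outage is over.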
Mahone
- Mahone is back in service after this weekend's electrical power event. Some compute nodes must remain offline because a replacement power distribution bar is not yet available. This represents a reduction in capacity of about 80 cores.
- 13:15, December 4, 2017 (AST)
- A power distribution bar shorted out in one of the racks, which tripped the 150 A breaker in the UPS and took out one entire panel. We are working on bringing the servers up.
- 08:48, December 4, 2017 (AST)
- Mahone has been returned to service.
- 15:47, November 8, 2017 (AST)
- An unplanned overnight power outage at SMU has caused all nodes, including the storage system, to crash. The sysadmins are in the process of powering everything up again and assessing any damage.
- 08:56, November 7, 2017 (AST)
Placentia
- The compute nodes that had to be taken offline yesterday to facilitate repairs to one of the A/C units are now back online and ready to use.
- 09:59, February 20, 2018 (AST)
- This morning at 08h00 NST, compute nodes cl001 to cl108 were shut down to facilitate repairs to one of the A/C units. If all goes as planned, we will be able to bring them back up by Tuesday noon.
- 09:33, February 19, 2018 (AST)
- Due to important repairs to the A/C unit in Placentia's data centre, compute nodes cl001 to cl108 will be unavailable from 07h00 NST Monday, February 19th until noon Tuesday, February 20th. This section includes the Gaussian.q nodes, which have more RAM and local scratch than most other nodes.
- 14:55, February 15, 2018 (AST)
Fundy
- No recent issues
Glooscap
- Electrical power work originally scheduled for the week of Feb 20-23 has been indefinitely postponed.
- 11:46, February 16, 2018 (AST)
- Interactive response of the head node is very slow for many operations. Technical staff are investigating.
- 16:36, December 6, 2017 (AST)
- The metadata server was hung all night March 7-8. It was rebooted this morning and Glooscap is operating once again, although technical staff continue to be cautious about its future behaviour. To try to alleviate the load on the metadata server we are withdrawing compute nodes cl002 through cl058 from service. This represents a reduction of 188 cores in the capacity of the cluster.
- 11:24, March 8, 2017 (AST)