Siku

From ACENET


Siku is a high-performance computer cluster installed in 2019 at Memorial University in St. John's, Newfoundland.

It is funded in large part by the Atlantic Canada Opportunities Agency (ACOA) with the intention of generating regional economic benefits through industry engagement, while recognizing the important work that ACENET does for academic research in the region.

Siku is only accessible to selected clients.

  • Industrial researchers wishing to access Siku should write to info@ace-net.ca.
    • To add a member to an existing industrial account, the principal investigator should write to support@ace-net.ca with the new member's name, email address, telephone number (if available), and a suggested username beginning in "an-", e.g. "an-jdoe".
  • Principal Investigators of academic research groups may use this access request form.
    • If you are a member of an academic group which uses Siku but you don't have access, have your PI write to support@ace-net.ca with your Alliance username and we'll add you to the access list.

Addresses

Login nodes (ssh) for Alliance accounts: siku.ace-net.ca
Login node (ssh) for local accounts: industry.siku.ace-net.ca
Globus collection: alliancecan#siku
Data transfer node (rsync, scp, sftp,...): dtn.siku.ace-net.ca
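To save typing, the addresses above can be given short aliases in the ~/.ssh/config file on your own computer. The following is a minimal sketch, assuming a POSIX shell; the function name siku_ssh_config, the aliases siku and siku-dtn, and the username jdoe are hypothetical placeholders, while Host, HostName and User are standard OpenSSH client configuration keywords. Users with local (an-) accounts would pass industry.siku.ace-net.ca as the second argument.

```shell
#!/bin/sh
# Hypothetical helper: print an OpenSSH client config block for Siku.
#   $1 = your username (Alliance username, or an-... for a local account)
#   $2 = login host (defaults to the login address for Alliance accounts)
siku_ssh_config() {
    siku_user="$1"
    siku_login="${2:-siku.ace-net.ca}"
    cat <<EOF
Host siku
    HostName ${siku_login}
    User ${siku_user}

Host siku-dtn
    HostName dtn.siku.ace-net.ca
    User ${siku_user}
EOF
}

# Preview the block for a placeholder user before adding it to ~/.ssh/config.
siku_ssh_config jdoe
```

After reviewing the output, you could append it with siku_ssh_config jdoe >> ~/.ssh/config. Then ssh siku opens a login session, and a command such as rsync -av results/ siku-dtn:results/ copies a directory through the data transfer node.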

Authentication and authorization

Multi-factor authentication (MFA) using Duo is required to access Siku.

Passwords, changing and resetting your password

Local accounts

If your username has the prefix an-, then you have a local account. When your account was created, you should have received an initial password and a link through which to register for MFA with Duo.

If you have a local account, you can change your password by logging in to the cluster and running passwd. The password must meet the following criteria:

  • Minimum length: 12
  • Minimum number of lowercase characters: 1
  • Minimum number of uppercase characters: 1
  • Minimum number of digits: 1

If you have forgotten your password and need it reset, or if you have trouble with Duo as your second factor, write to support@ace-net.ca.

Alliance accounts

If you have an active account with the Digital Research Alliance of Canada ("The Alliance") and have been granted access to Siku, you can log in using your Alliance username and password. You can change your password or reset a forgotten password by visiting https://ccdb.alliancecan.ca/security/change_password. You must register for multi-factor authentication at https://ccdb.alliancecan.ca/multi_factor_authentications. You can find more guidance for that at https://docs.alliancecan.ca/wiki/Multifactor_authentication.


SSH key pairs recommended

No matter whether you have a local or an Alliance account, we encourage you to use a passphrase-protected SSH key pair for regular access to Siku. Used properly, a key pair is more secure than a password.

SSH with Hardware Security Keys at Siku

In order to help our clients improve the security of their login authentication, we now offer the option to connect to Siku using SSH keys backed by a hardware security key (e.g. Yubikey). You can use either of two new types of cryptographic keys, ecdsa-sk or ed25519-sk, and authenticate to ACENET's systems via one of the most secure methods currently available.

If you have a hardware security key and are working on Windows 11, MacOS 12 or 13, or a modern Linux distribution, you can likely start using this option now.

On older operating systems, you will need to update OpenSSH. If you're not sure how to do that, or are unsure whether this option is right for you, you can experiment with the cross-platform application Termius, which will provide support right out of the box on a trial basis.

Setup

If you’re already using SSH keys to access ACENET's systems, you are familiar with everything you need to know to begin working with the new key types.

You will need to register a public key of either type ecdsa-sk or ed25519-sk. If you do not already have a key of one of these types, you can create one with a graphical application, or by typing ssh-keygen -t ecdsa-sk (or ssh-keygen -t ed25519-sk) into PowerShell or a terminal emulator. You will be prompted to activate your hardware security key during key generation, so have your Yubikey (or a similar device) on hand.

Authentication is the same as with other SSH keys, except that you will again have to respond to a prompt to activate a hardware security key to authenticate successfully.
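Before generating a key, you can check whether your local OpenSSH client knows the new key types: ssh -Q key lists the key types the client supports, and the security-key types appear there as sk-ssh-ed25519@openssh.com and sk-ecdsa-sha2-nistp256@openssh.com. The sketch below wraps that check in a hypothetical POSIX shell helper; the helper names and the key file name ~/.ssh/id_ed25519_sk_siku are examples only, not ACENET conventions.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a list of key types (one per line, as
# printed by `ssh -Q key`) includes a hardware-security-key type.
supports_sk_keys() {
    grep -qx -e 'sk-ssh-ed25519@openssh.com' -e 'sk-ecdsa-sha2-nistp256@openssh.com'
}

# Hypothetical helper: generate an ed25519-sk key pair. ssh-keygen will ask
# you to touch the hardware key and to choose a passphrase.
make_siku_sk_key() {
    ssh-keygen -t ed25519-sk -f "$HOME/.ssh/id_ed25519_sk_siku"
}

# Report whether the installed client supports the new key types.
if ssh -Q key 2>/dev/null | supports_sk_keys; then
    echo "OpenSSH client supports ecdsa-sk/ed25519-sk keys"
else
    echo "OpenSSH client is missing or too old; update it or try Termius"
fi
```

If the check passes, running make_siku_sk_key produces a .pub file that you register like any other public key. On Windows, a passing check does not rule out the Security Key Provider problem described under Troubleshooting.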

Troubleshooting

Windows 10 and 11 Users

Windows users can use PowerShell to access ACENET’s systems and will frequently be able to SSH with the new key types without issue. If you encounter an issue, it’s likely because your version of OpenSSH is not set up to use Windows Hello as a Security Key Provider by default. There is a quick fix. First, download the installer for the most current version of OpenSSH for PowerShell (a file named like OpenSSH-Win64-CURRENT_VERSION.msi), then:

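  1. Run the following command in a PowerShell session, replacing USERNAME with your Windows username and CURRENT_VERSION with the version in the downloaded file's name:
       Start-Process -NoNewWindow msiexec.exe -ArgumentList "/i C:\Users\USERNAME\Downloads\OpenSSH-Win64-CURRENT_VERSION.msi ADDLOCAL=Client ADD_PATH=1" -Wait
  2. Reboot.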

This will resolve most issues in short order. Alternatively, you can choose to experiment with the cross-platform Termius application, which will provide support for generating and using the new key types on a trial basis.

Mac Users

If you are using MacOS 12 or better, you should be able to SSH with the new key types without issue. Users with older versions of the OS can often update to access new features. If you cannot update, you can either choose to install a current version of OpenSSH via the Brew package manager, or use the cross-platform Termius application, which will provide support for generating and using the new key types on a trial basis.

Known issues

  • We have enabled user namespaces to support the operation of Apptainer. We recommend that you make a habit of testing Apptainer containers on compute nodes rather than login nodes, in case we have to disable user namespaces on login nodes without notice for security reasons.
  • Multi-processing using libibverbs is not working as expected. MPI implementations, however, should work.
  • Directories are automatically created at first logon. This may produce a race condition that results in errors like the following:
Could not chdir to home directory /home/username: No such file or directory
/usr/bin/xauth:  error in locking authority file /home/username/.Xauthority
Lmod has detected the following error:  Unable to load module because of error when  evaluating modulefile: ...

Should this occur on first login, simply log out, wait a minute, and log back in again.

  • Julia problems:
    • Multi-node Julia jobs currently end with an "Authentication failed" message. Workarounds are to do the calculation on a single node or to use an Alliance cluster.

Similarities and differences with national GP clusters

Siku was designed using experience gained with the Alliance systems Béluga, Cedar, Graham, Narval, and Niagara. Users familiar with those systems will find much here that is familiar.

  • The filesystem is similarly structured. See Storage and file management.
    • There is no "Nearline" archival filesystem.
  • The same scheduler, Slurm, is used, although with simpler policies. See "Job scheduling" below.
  • The same modules system provides access to the same list of available software.

Job scheduling

Tasks taking more than 10 CPU-minutes or 4 GB of RAM should not be run directly on a login node, but submitted to the job scheduler, Slurm.

Scheduling policies on Siku are different from the policies on Alliance systems. What resources you can use, especially the length of a job you can run, depends on what kind of Slurm account you have. (This is different from what type of login account you have, described under "Authentication and authorization" above. Sorry about that.)

What kind of Slurm account do I have?

  • If you work for an organization which is paying for access to Siku, you have a paid (pd-) account.
  • If you are in an academic research group which is not paying for access to Siku, you have a default (def-) account.
  • If you are an academic in a research group which has contributed equipment to Siku, you also have a contributed (ctb-) account in addition to a default account. Your jobs will normally be submitted under the contributed account, but you may override this by including the --account=def-??? directive in a job script.

If you want to find out what Slurm accounts are available to you, run sacctmgr show associations where user=$USER format=account%20 and examine the prefix on the account code:

  • pd-... is a paid account
  • def-... is a default account
  • ctb-... is a contributed account
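The prefix check can also be scripted. This is a minimal sketch, assuming a POSIX shell on a Siku login node; account_kind and list_my_accounts are hypothetical helper names, and the sort -u step is included because sacctmgr may print the same account on more than one line.

```shell
#!/bin/sh
# Hypothetical helper: classify a Slurm account code by its prefix,
# following the pd-/def-/ctb- convention described above.
account_kind() {
    case "$1" in
        pd-*)  echo "paid" ;;
        def-*) echo "default" ;;
        ctb-*) echo "contributed" ;;
        *)     echo "unknown" ;;
    esac
}

# List your accounts (-n suppresses the header line) and label each one.
# Assumes it is run on Siku, where sacctmgr is available.
list_my_accounts() {
    sacctmgr -n show associations where user="$USER" format=account%20 |
        sort -u |
        while read -r acct; do
            [ -n "$acct" ] && echo "$acct: $(account_kind "$acct")"
        done
}
```

Running list_my_accounts should then print one line per account, such as def-xxx: default.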

If you have a paid account you may also wish to read about Tracking paid accounts.

What is the longest job I can run?

  • If you are NOT requesting a GPU and
    • ... you have a default account, you may request up to 24 hours.
    • ... you have a paid or contributed account, you may request up to 7 days (168 hours).
      • Jobs longer than 3 days but less than 7 days are subject to some restrictions in order to keep waiting times from getting too long. See "Why hasn't my job started?" below for more about this.
  • If you are requesting a GPU and
    • ... your account-owner has NOT contributed a GPU to the cluster, you may request up to 24 hours.
    • ... your account-owner has contributed a GPU to the cluster, you may request up to 3 days (72 hours).
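The rules above amount to a small lookup table. The sketch below is a hypothetical POSIX shell function (not an ACENET tool) whose argument values are our own labels; it only restates the limits listed above, and the scheduler remains the authority on what it will accept.

```shell
#!/bin/sh
# Hypothetical helper encoding the run-time limits listed above.
#   $1 = account kind: paid | default | contributed
#   $2 = "gpu" if the job requests a GPU, anything else otherwise
#   $3 = "yes" if your account-owner has contributed a GPU to Siku
# Prints the longest run time you may request, in hours.
max_walltime_hours() {
    if [ "$2" = "gpu" ]; then
        if [ "$3" = "yes" ]; then echo 72; else echo 24; fi
    else
        case "$1" in
            paid|contributed) echo 168 ;;
            *)                echo 24 ;;
        esac
    fi
}
```

For example, max_walltime_hours paid cpu no prints 168, which corresponds to --time=7-00:00:00, while max_walltime_hours default cpu no prints 24, i.e. at most --time=24:00:00.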


There are a few nodes which accept only 3-hour jobs or shorter from most users (partition contrib3h). Consequently, jobs with a run time of 3 hours or less will often schedule much more quickly than other jobs. If you want an interactive job for testing purposes, we recommend salloc --time=3:0:0 (or less). If you can arrange your production workflow to include 3-hour jobs, those may benefit from more rapid scheduling.

How do I get a node with a GPU?

GPUs should be requested following these examples:

  • Request a Tesla V100 GPU:
#SBATCH --gres=gpu:v100:1
  • Request two RTX 6000 GPUs:
#SBATCH --gres=gpu:rtx6000:2

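See "Node characteristics" below for the numbers and types of GPUs installed. Jobs requesting GPUs may request run times of up to 24 hours, or up to 72 hours if your account-owner has contributed a GPU (see "What is the longest job I can run?" above).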
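A complete GPU job script might look like the sketch below. The core count, memory and the program name my_gpu_program are assumptions for illustration. It requests an RTX 6000 rather than a V100 because the July 2024 known issues list only the RTX 6000 nodes as available.

```shell
#!/bin/bash
# One Quadro RTX 6000 GPU; cores and memory are assumptions to adjust.
#SBATCH --gres=gpu:rtx6000:1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4G
# 24 hours, the GPU limit unless your account-owner has contributed a GPU.
#SBATCH --time=24:00:00

# Slurm normally exports CUDA_VISIBLE_DEVICES for the allocated GPU(s).
echo "Allocated GPU(s): ${CUDA_VISIBLE_DEVICES:-none}"

# Hypothetical program name; replace with your own.
srun ./my_gpu_program
```

Saved under a file name of your choice, e.g. gpu_job.sh, it would be submitted with sbatch gpu_job.sh.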

How do I select a node-type?

Siku has 67 CPU-nodes with 40 CPU-cores per node and 33 nodes with 48 CPU-cores per node. When running highly parallel calculations, it may be worth targeting the 48-core nodes by explicitly requesting #SBATCH --ntasks-per-node=48.

Jobs with #SBATCH --ntasks-per-node=40 may end up running on either of those node-types; on a 48-core node, the remaining 8 CPU cores may then be used by other jobs. In this situation, please avoid requesting all the memory on the node with --mem=0: it would block those 8 cores from being used by other jobs, and the scheduler would charge your job for them even though it cannot use them.
Instead (if possible) use no more than --mem-per-cpu=4775M, which will allow your job to fit on any of the available node-types.

If you want to target the 40-core nodes specifically, e.g. because your calculations cannot easily be adapted to make use of the 48-core nodes, you can use #SBATCH --constraint=40core to prevent the 48-core nodes from being considered for your job.
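Putting these directives together, a job script that fills one 48-core node might look like the following sketch. The run time and the program name my_mpi_program are assumptions. The ##SBATCH line is ignored by Slurm and only shows where --constraint=40core would go if you instead target the 40-core nodes with --ntasks-per-node=40.

```shell
#!/bin/bash
# Fill one 48-core node with 48 tasks (one CPU core each).
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --mem-per-cpu=4775M
#SBATCH --time=12:00:00
# To use the 40-core nodes instead, set --ntasks-per-node=40 and remove one
# "#" from the line below to turn it into an active directive.
##SBATCH --constraint=40core

# Hypothetical MPI program name; replace with your own.
srun ./my_mpi_program
```

At 4775M per core, 48 tasks request 229,200M in total, within the 282000M available on the 48-core nodes.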

See Node characteristics for all available node configurations.

Why was my job rejected?

If you receive "sbatch: error: Batch job submission failed:" when you submit a job, the rest of the message will usually indicate the problem. Here are some messages you might see:

  • Invalid qos specification: You've used a --partition or --qos directive. Remove the directive and try again.
  • Requested time limit is invalid (missing or exceeds some limit)
    • def- accounts are limited to 24h.
    • GPU jobs are limited to 24h (72h if your account-owner has contributed a GPU).
    • pd- and ctb- accounts are limited to 7d.
  • Invalid account or account/partition combination specified
    • Most Siku users do not need to supply --account; delete the directive and try again.
    • If you think you should have access to more than one account, see "What kind of Slurm account do I have?" above.

Why hasn't my job started?

The output from the sq and squeue commands includes a Reason field. Here are some Reasons you may see, and what to do about them.

  • Resources: The scheduler is waiting for other jobs to finish in order to have enough resources to run this job.
  • Priority: Higher priority jobs are being scheduled ahead of this one.
    • pd- and ctb- accounts always have higher priority than def- accounts. Within these tiers, priority is determined by the amount of computing that each account has done recently.
  • QOSGrpBillingMinutes: You are close to the quarterly usage limit for your paid account. See Tracking paid accounts.
  • QOSGrpCpuLimit: This may be due to:
    • The limit on the number of CPUs which can be used by long jobs at one time, considering all accounts in aggregate.
    • The limit on the number of CPUs which can be used by a ctb- account at one time. Jobs for a ctb- account run in the same priority tier as pd- jobs, but they cannot use more than a fixed amount of resources at any one time. Use --account=def-xxx to avoid this limit, at the cost of lower priority.
    • Certain users of ctb- accounts may be subject to a stronger constraint on the number of simultaneous CPUs, at the request of the contributing PI.
    • Such a job should eventually run, when other jobs in the pool sharing the same limit finish.
  • QOSGrpGRES: The ctb- account you are using has a limit on the number of GPUs which may be in use at one time. If sinfo --partition=all_gpus shows there are idle GPUs, try resubmitting the pending job with --account=def-xxx.
  • MaxCpuPerAccount: Long-job limit per account.
  • QOSMaxWallDurationPerJobLimit: You have submitted a job with a time limit >24h from def- account. This job will never run; scancel and resubmit with a time limit less than or equal to 24h.
    • Certain users of ctb- accounts may be subject to a similar constraint with a different time limit, at the request of the contributing PI.
  • ReqNodeNotAvail, Reserved for maintenance: There is maintenance scheduled to take place in the near future. Your job's time limit is long enough that it might not finish before the maintenance outage begins.
    • You can determine the start time of the maintenance outage with scontrol show reservations, and the current time with date.
  • BadConstraints: We have seen jobs which display BadConstraints and then run normally without intervention. Treat it as a meaningless message, unless you think your job has been waiting in this state for an unusually long time, in which case Ask Support for help.
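Two of the checks above can be scripted. The sketch below assumes a POSIX shell on a Siku login node and GNU date; summarize_reasons and fits_before_maintenance are hypothetical helper names. The first groups your pending jobs by their Reason field; the second tells you whether a job of a given length, if it started now, would end before a maintenance reservation's StartTime.

```shell
#!/bin/sh
# Hypothetical helper: count pending jobs by Reason. Reads lines of the
# form "<jobid> <reason>" on stdin, e.g. from:
#   squeue -u "$USER" -t PENDING -h -o "%i %r"
# Everything after the first space is kept, so multi-word reasons survive.
summarize_reasons() {
    cut -d' ' -f2- | sort | uniq -c
}

# Hypothetical helper: would a job of the given length, started now, end
# before a maintenance reservation begins? Assumes GNU date.
#   $1 = reservation StartTime as shown by `scontrol show reservations`,
#        e.g. 2024-07-03T08:00:00 (interpreted in the local time zone)
#   $2 = job time limit in whole hours
#   $3 = optional "now" in seconds since the epoch (default: current time)
fits_before_maintenance() {
    maint_start=$(date -d "$(echo "$1" | tr 'T' ' ')" +%s) || return 2
    now_s=${3:-$(date +%s)}
    [ $((now_s + $2 * 3600)) -le "$maint_start" ]
}
```

For example, squeue -u "$USER" -t PENDING -h -o "%i %r" | summarize_reasons prints a count per Reason. A pending job will start later than now, so if fits_before_maintenance reports only a narrow margin, shortening --time is still worth considering.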

My monthly usage report doesn't look right

See Tracking paid accounts.

Other restrictions

  • Each account may have no more than 5,000 jobs pending or running at one time. This is to prevent overloading the scheduler processes. A job array counts as multiple jobs.
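To see how close an account is to this limit, you can count its jobs with squeue's -r (--array) option, which lists each array element on its own line so that an array is counted as multiple jobs. This is a minimal sketch with a hypothetical helper name; it assumes squeue -A shows the jobs of all members of the account, which depends on the cluster's privacy settings.

```shell
#!/bin/sh
# Hypothetical helper: how many more jobs can this account have pending or
# running before reaching the limit? Reads one line per job on stdin.
#   $1 = optional limit (defaults to the 5,000-job limit stated above)
remaining_job_slots() {
    limit=${1:-5000}
    echo $((limit - $(wc -l)))
}
```

For example: squeue -A def-xxx -h -r -o "%i" | remaining_job_slots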

Storage quotas and filesystem characteristics

Filesystem | Default quota | Backed up? | Purged? | Mounted on compute nodes?
Home space | 52 GB and 512K files per user | Yes | No | Yes
Scratch space | 20 TB and 1M files per user | No | Not yet implemented | Yes
Project space | 1 TB and 512K files per group | Yes | No | Yes

/home is backed up to tape daily and /project weekly. Tapes are duplicated against media failure. All backups are kept on-site.

Node characteristics

Nodes | Cores | Available memory | CPU | Storage | GPU
42 | 40 | 181G or 186000M | 2 x Intel Xeon Gold 6248 @ 2.5GHz | ~720G | -
8 | 40 | 370G or 378880M | 2 x Intel Xeon Gold 6248 @ 2.5GHz | ~720G | -
17 | 40 | 275G or 282000M | 2 x Intel Xeon Gold 6248 @ 2.5GHz | ~720G | -
33 | 48 | 275G or 282000M | 2 x Intel Xeon Gold 6240R @ 2.4GHz | ~720G | -
1 | 40 | 181G or 186000M | 2 x Intel Xeon Gold 6148 @ 2.4GHz | ~720G | 3 x NVIDIA Tesla V100 (32GB memory)
1 | 40 | 181G or 186000M | 2 x Intel Xeon Gold 6148 @ 2.4GHz | ~720G | 2 x NVIDIA Tesla V100 (32GB memory)
1 | 40 | 181G or 186000M | 2 x Intel Xeon Gold 6248 @ 2.5GHz | ~720G | 4 x NVIDIA Quadro RTX 6000 (24GB memory)
1 | 40 | 181G or 186000M | 2 x Intel Xeon Gold 6248 @ 2.5GHz | ~720G | 2 x NVIDIA Quadro RTX 6000 (24GB memory)
  • "Available memory" is the amount of memory configured for use by Slurm jobs. Actual memory is slightly larger to allow for operating system overhead.
  • "Storage" is node-local storage. Access it via the $SLURM_TMPDIR environment variable.
  • Hyperthreading is turned off.
  • All GPUs are PCIe-linked. NVLINK is not currently supported.
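A common pattern for using node-local storage is to copy input into $SLURM_TMPDIR, work there, and copy results back before the job ends. The job script below sketches that pattern under stated assumptions: input.dat, output.dat and my_program are hypothetical names in the directory the job was submitted from, and, as on Alliance clusters, $SLURM_TMPDIR is assumed to be removed when the job finishes.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=4G
#SBATCH --time=3:00:00

# Stop at the first failing command or unset variable.
set -euo pipefail

# Copy the (hypothetical) input file to node-local storage and work there.
cp "$SLURM_SUBMIT_DIR/input.dat" "$SLURM_TMPDIR/"
cd "$SLURM_TMPDIR"

# Run the (hypothetical) program against the local copy.
"$SLURM_SUBMIT_DIR/my_program" input.dat > output.dat

# Copy results back; assume $SLURM_TMPDIR is cleared after the job ends.
cp output.dat "$SLURM_SUBMIT_DIR/"
```

The 3-hour time limit is a deliberate choice here: as noted under "What is the longest job I can run?", jobs of 3 hours or less often schedule more quickly.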

Operating system: Rocky Linux 9.x

SSH host keys

Login nodes:

siku.ace-net.ca

ED25519 (256b)
SHA256:ivMcbACnaXK3C6RHBxE3zh17F/zROsR2E9ZW9vewwv8
MD5:7f:0a:fe:09:4b:0d:8e:41:f9:96:cc:b1:aa:60:6d:1f
RSA (3072b)
SHA256:ILN9LOkykPIG+PGSi0GDNT6fUFbApC4ECxgSfk4OKZk
MD5:95:80:47:b9:d8:a7:c5:31:13:9a:72:22:7c:e7:f8:f8

industry.siku.ace-net.ca

ED25519 (256b)
SHA256:mbl/JVD9LnluYV/9g13oSv5tgzd32U9MEl7H+28oaLE
MD5:d6:e3:7c:9e:93:0c:8a:2b:24:ba:94:ce:77:4f:62:9a
RSA (3072b)
SHA256:C1CNHK0B/i2e8+t9Sya1m/ugofXQFdwTfFR3JcsRIiI
MD5:04:f4:d8:33:a5:cf:e0:7b:e1:16:fb:86:f8:3f:e9:20

Data Transfer Node (DTN): dtn.siku.ace-net.ca

ED25519 (256b)
SHA256:qc5JZcUIJAT/J6Dob3QVIqqdQWTZXwgtGzW3sTj6hZQ
MD5:9f:fd:68:c9:06:03:0d:6c:95:2c:b2:a8:b6:65:9f:58
RSA (3072b)
SHA256:oDqXPh9d2g4UkYv7HwtehZPyRVFOkol/jURyw3yI3Jo
MD5:62:c6:d9:d9:53:c5:71:e9:4f:d9:69:46:e4:3a:0c:38
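To check a host key before trusting it, compare the fingerprint that ssh shows at your first connection (or that ssh-keyscan and ssh-keygen -l report) against the list above. The sketch below is a hypothetical POSIX shell helper for that comparison. Note that ssh-keyscan itself does not authenticate the server, so the comparison with the published fingerprint is the step that matters.

```shell
#!/bin/sh
# Hypothetical helper: compare a fingerprint line (as printed by
# `ssh-keygen -l`) with an expected SHA256 fingerprint from the list above.
#   $1 = fingerprint line, $2 = expected "SHA256:..." string
check_fingerprint() {
    # An empty expected value would match anything, so refuse it.
    [ -n "$2" ] || return 2
    case "$1" in
        *"$2"*)
            echo "match" ;;
        *)
            echo "MISMATCH: fingerprint differs from the published value" >&2
            return 1 ;;
    esac
}
```

For example: check_fingerprint "$(ssh-keyscan -t ed25519 siku.ace-net.ca 2>/dev/null | ssh-keygen -lf -)" "SHA256:ivMcbACnaXK3C6RHBxE3zh17F/zROsR2E9ZW9vewwv8"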

July 2024 changes

During an extended outage between June 17th and July 3rd we implemented a number of changes.

List of changes

  • The operating system was updated to Rocky Linux 9.4
  • The Slurm scheduler has been updated to version 23.11.8
  • Login nodes were renamed to sikulogin1/sikulogin2
  • Compute nodes were renamed to siku1 through siku104 with siku101 to siku104 being the GPU nodes (formerly cg001 to cg004).
  • Industry users need to use the new login node industry.siku.ace-net.ca.
  • The SSH hostkeys have changed. The fingerprints of the new keys can be found above.
You may need to remove the old keys from your known_hosts file by running the following commands on your own machine:
  ssh-keygen -f ~/.ssh/known_hosts -R "siku.ace-net.ca"
  ssh-keygen -f ~/.ssh/known_hosts -R "134.153.246.145"
  ssh-keygen -f ~/.ssh/known_hosts -R "134.153.246.158"
  ssh-keygen -f ~/.ssh/known_hosts -R "dtn.siku.ace-net.ca"
  ssh-keygen -f ~/.ssh/known_hosts -R "134.153.246.139"

Known Issues

These issues are being worked on:

  • Currently only two GPU nodes with RTX6000 GPUs are available. We are working on making the other GPU nodes available.
  • Currently a few CPU compute nodes are still unavailable. We are working on bringing the remaining compute nodes online.
  • Using ssh to connect to running jobs is currently not possible.
  • The diskusage_report (quota) command is currently not being updated.
  • The diskusage_report (quota) command is currently not showing usage of the project filesystem.
  • The following web services are currently not available:
    • JupyterHub
    • WebMO
    • Siku User Portal

Resolved Issues

  • The Data Transfer Node (dtn.siku.ace-net.ca) is now available.