Tracking paid accounts
If your firm or organization has a paid contract with ACENET for compute time on Siku, then this page explains how your computing is measured and tracked. Submitting and troubleshooting jobs is discussed elsewhere.
- "User", that's obvious. One person, one login name.
- "Account" is not the same as "user" in this context. Your firm or organization has a contract with ACENET (or you probably wouldn't be reading this). On Siku, that contract is represented by an "account" with a name like "pd-abc-123".
- "QoS" stands for "Quality of Service", but it might be better to think of a "QoS" as a software object which remembers how many CPU hours etc. an account is allowed to use, and how much has been used already. Each paid account is associated with its own QoS, and the QoS has the same name as the account, like "pd-abc-123".
- "Billing hours" measure the use of the system. One CPU-hour and associated RAM is worth one billing hour. A GPU-hour is worth 35 billing hours. See below for a formula, and examples.
- A "billing minute" is simply one-sixtieth of a billing hour.
- We have also used the term "CPU hours" in the past, with the same meaning as "billing hours".
How much computing can I do?
You can see the number of billing hours available to you through your QoS by running the utility acct-tool. The output will look something like this:
    Available QoSs: pd-abc-123
    Default QoS: pd-abc-123
    For QoS 'pd-abc-123' this quarter:
      Billing (CPU-equiv) hours cap: 2190000.0 (131400000 minutes)
      Billing (CPU-equiv) hours used: 57.7 (3461 minutes)
When your team gets close to the limit, you may find that your jobs are not starting, but instead staying in PD (pending) state with "QOSGrpBillingMinutes" showing in the "Reason" field of squeue (or sq). Or, you may receive this pair of messages on trying to submit a job with sbatch:
    sbatch: error: QOSGrpBillingMinutes
    sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
In either case this is because the job would put you over your billing limit if it ran for the time requested. Have your Principal Investigator contact firstname.lastname@example.org to discuss raising your billing limit.
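The check described above can be pictured with a small sketch. This is a simplified model, not Slurm's actual logic: it assumes the limit is tested as "hours already used plus billing rate times requested time", and it ignores time reserved by other jobs that are still running. The function name `would_exceed_cap` and its parameters are hypothetical.

```python
def would_exceed_cap(cap_hours, used_hours, billing_rate, requested_hours):
    """Return True if a job could push the QoS past its billing cap.

    Simplified assumption: the job is charged billing_rate billing hours
    for every hour of its requested time limit.
    """
    projected = used_hours + billing_rate * requested_hours
    return projected > cap_hours


# Cap and usage figures taken from the sample acct-tool output;
# the job itself (rate 40, 24 hours requested) is made up for illustration.
print(would_exceed_cap(2190000.0, 57.7, 40, 24))
```

Requesting a shorter time limit lowers the projected cost, which may let a job through when the account is close to its cap.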
Who has been using our account, and how much?
To see the billing units charged against your account (your QoS) broken down by user, run acct-history:
    $ acct-history -S 2019-11-01
    Usage report on QoS 'pd-abc-123':
    Start time: 2019-11-01
    End time: 2019-11-14T20:27:50
    Billing (CPU-equiv) hours per user:
         198.4 alice
        9817.9 bob
    ------------------------
       10016.3 TOTAL
A report containing similar information will be sent at the beginning of each month to the senior contact in the organization (i.e., the principal investigator or contract signer).
Billing units formula
BillingHours = MAX( CPUs, RAM_GB * 0.215, GPUs * 35.0 ) * hours
A job that reserves one CPU and 4G of RAM and runs for one hour consumes 1 billing hour. One CPU and 10G of RAM for one hour? 2.15 billing hours.
One of our GPU-equipped nodes has 40 CPUs, 186G of RAM, and two GPUs. To use that node for 24 hours would cost MAX( 40, 186*0.215, 2*35 ) * 24 = 70 * 24 = 1680 billing hours.
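For estimating a job before submitting it, the formula can be written as a few lines of Python. This is only a sketch of the arithmetic shown above; the function and constant names are illustrative and are not part of any ACENET tool.

```python
# Weights copied from the billing formula above.
RAM_WEIGHT_PER_GB = 0.215
GPU_WEIGHT = 35.0


def billing_hours(cpus, ram_gb, gpus, hours):
    """BillingHours = MAX(CPUs, RAM_GB * 0.215, GPUs * 35.0) * hours."""
    rate = max(cpus, ram_gb * RAM_WEIGHT_PER_GB, gpus * GPU_WEIGHT)
    return rate * hours


# The three examples from the text:
print(billing_hours(1, 4, 0, 1))      # one CPU, 4G of RAM, one hour
print(billing_hours(1, 10, 0, 1))     # one CPU, 10G of RAM, one hour
print(billing_hours(40, 186, 2, 24))  # a whole GPU node for 24 hours
```

Because the formula takes the maximum of the three terms, the most heavily requested resource alone determines the cost.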
The rate per GB of RAM is chosen so that using either all the CPUs or all the RAM on a basic node costs the same 40 billing units. Requesting all the memory on a high-memory node costs 80 billing units.
You can get the billing rate for a live job like so:
    [you@login1 ~]$ scontrol show job 7976 | grep billing
       TRES=cpu=12,mem=108000M,node=1,billing=22
This job will be billed at a rate of 22 billing hours per hour of elapsed time. You can determine the billing hours consumed by an individual job, once it has completed, by examining the Slurm accounting record for the job and multiplying the billing rate by the elapsed time:
    [you@login1 ~]$ sacct -X --format=AllocTRES%40,Elapsed --noheader -j 7402
    billing=40,cpu=40,mem=10G,node=1   00:02:09
In this example, the billing rate is 40 and the elapsed time is 2 minutes 9 seconds, or 0.0358 hours, so the job cost 1.43 billing hours.
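The same multiplication can be scripted. The sketch below assumes the two fields are passed in exactly as sacct printed them above, and that the elapsed time has the form HH:MM:SS with an optional leading day count (DD-HH:MM:SS). The helper names are hypothetical.

```python
def elapsed_to_hours(elapsed):
    """Convert an elapsed string ([DD-]HH:MM:SS) to hours."""
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in elapsed.split(":"))
    return days * 24 + hours + minutes / 60 + seconds / 3600


def billing_rate(alloc_tres):
    """Extract the billing=N entry from an AllocTRES string."""
    for item in alloc_tres.split(","):
        key, _, value = item.partition("=")
        if key == "billing":
            return float(value)
    raise ValueError("no billing entry in AllocTRES: " + alloc_tres)


def job_billing_hours(alloc_tres, elapsed):
    """Billing hours consumed: billing rate times elapsed hours."""
    return billing_rate(alloc_tres) * elapsed_to_hours(elapsed)


# Values from the sacct example above.
cost = job_billing_hours("billing=40,cpu=40,mem=10G,node=1", "00:02:09")
print(round(cost, 2))
```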
Why the funny word, QoS?
You may well ask, "Why have QoSs at all, why not just use accounts?" That has to do with Slurm internals. We would like you to be able to think in terms of a "bank account" of computing time, but to implement that we had to use Slurm's QoS mechanism. If we were then to call a QoS a "bank account", when the term "account" in Slurm means something slightly different but closely related, that would cause great confusion if and when you ever have to consult the generic Slurm documentation at https://slurm.schedmd.com.
What if I have more than one account or QoS?
A user typically only has access to one account and one QoS, and your jobs are automatically associated with that account and QoS. You could have more than one account if, for example, your firm made two separate contracts with ACENET for separate projects, and you are working on both, or if you were a freelancer and working for two different firms with ACENET contracts. In that case it will be up to you to assign each job you submit to the correct QoS using the --qos= option to sbatch, salloc, or srun.