Tracking paid accounts

If your firm or organization has a paid contract with ACENET for compute time on Siku, then this page explains how your computing is measured and tracked. Submitting and troubleshooting jobs is discussed elsewhere.

Monthly statements

The principal investigator for your organization should receive an automated email on the first day of every month, summarizing the last twelve months' usage on Siku. It will look something like this:

Here is a report on your firm's recent usage on Siku.

The valid users associated with your account today are:
  aturing, ghopper

Your quarterly usage cap is currently 2000 billing hours.
You can adjust this cap at any time by emailing support@ace-net.ca.

Billing hours used, by month (most recent first):
   Mar 2023        271.48 
   Feb 2023        292.18 
   Jan 2023        217.01 
   Dec 2022        910.99 
   Nov 2022       1150.67 
   Oct 2022        883.90 
   Sep 2022        588.38 
   Aug 2022        854.17 
   Jul 2022         12.58 
   Jun 2022          0.00 
   May 2022          0.00 
   Apr 2022          0.00 
   --------------------
   TOTAL          5181.36

Billing hours used in the above months, by user:
   aturing        4619.63
   ghopper         561.73
   --------------------
   TOTAL          5181.36

Billing hours used in most recent month (March 2023), by user:
   aturing          21.43
   ghopper         250.05
   --------------------
   TOTAL           271.48

Billing hours used in most recent month (March 2023), by comment:
   (none)          271.48 
   --------------------
   TOTAL           271.48

Best regards,
An ACENET robot

This is, we hope, self-explanatory except perhaps for the last table. The "Billing hours used ... by comment" table will be uninformative in most cases, since every job is grouped under "(none)". But if you add a comment to each of your job scripts, like so:

#SBATCH --comment=ALPHA_beta

...then the final table will break out your usage by the comment. A contract research firm, for example, might use a unique comment string for each of its clients, so that the usage on behalf of each client can be easily read from the table.
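As a rough illustration of what the by-comment table computes, here is a minimal Python sketch. The job records and the function name usage_by_comment are hypothetical and are not part of any ACENET tool; the sketch only assumes that jobs without a comment are grouped under "(none)", as in the sample email above.

```python
from collections import defaultdict


def usage_by_comment(jobs):
    """Sum billing hours per comment string; jobs without a comment go under '(none)'."""
    totals = defaultdict(float)
    for comment, hours in jobs:
        totals[comment or "(none)"] += hours
    return dict(totals)


# Hypothetical (comment, billing hours) records, one per job.
jobs = [
    ("ALPHA_beta", 120.0),
    ("", 30.5),
    ("ALPHA_beta", 80.0),
    ("GAMMA", 41.0),
]

for comment, hours in sorted(usage_by_comment(jobs).items()):
    print(f"   {comment:<12}{hours:>10.2f}")
```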

The gory details

Terminology

  • "User", that's obvious. One person, one login name.
  • "Account" is not the same as "user" in this context. Your firm or organization has a contract with ACENET (or you probably wouldn't be reading this). On Siku, that contract is represented by an "account" with a name like "pd-abc-123".
  • "QoS" stands for "Quality of Service", but it might be better to think of a "QoS" as a software object which remembers how many CPU hours etc. an account is allowed to use, and how much has been used already. Each paid account is associated with its own QoS, and the QoS has the same name as the account, like "pd-abc-123".
  • "Billing hours" measure the use of the system. One CPU-hour and associated RAM is worth one billing hour. A GPU-hour is worth 35 billing hours. See below for a formula, and examples.
    • A "billing minute" is simply one-sixtieth of a billing hour.
    • We have also used the term "CPU hours" in the past, with the same meaning.

How much computing can I do?

You can see the number of billing hours available to you through your QoS by running the utility acct-tool. The output will look something like this:

Available QoSs: pd-abc-123
Default QoS:    pd-abc-123

For QoS 'pd-abc-123' this quarter:
 Billing (CPU-equiv) hours cap:     2190000.0   (131400000 minutes)
 Billing (CPU-equiv) hours used:         57.7   (3461 minutes)

When your team gets close to the limit, you may find that your jobs are not starting, but instead staying in PD (pending) state with "QOSGrpBillingMinutes" showing in the "Reason" field of squeue (or sq). Or, you may receive this pair of messages on trying to submit a job with sbatch:

sbatch: error: QOSGrpBillingMinutes
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

In either case this is because the job would put you over your billing limit if it ran for the time requested. Have your Principal Investigator contact support@ace-net.ca to discuss raising your billing limit.
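The check described above can be sketched in Python as follows. This is a simplified reading of that one sentence, not the scheduler's actual code, and the real accounting may differ in detail. The function name and the figures are hypothetical; billing_rate here means billing hours charged per hour of run time.

```python
def would_exceed_cap(cap_hours, used_hours, billing_rate, requested_hours):
    """True if the job, run for its full requested time, would push usage past the cap."""
    projected = used_hours + billing_rate * requested_hours
    return projected > cap_hours


# Hypothetical figures: 2000-hour cap, 1900 hours already used, a job billed at 40 per hour.
print(would_exceed_cap(2000, 1900, 40, 3))  # True:  1900 + 120 = 2020 > 2000
print(would_exceed_cap(2000, 1900, 40, 2))  # False: 1900 + 80 = 1980
```

Under this reading, shortening the requested run time of a job is one way to get it under the limit while waiting for the cap to be raised.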

Who has been using our account, and how much?

To see the billing hours charged against your account (your QoS) broken down by user, run acct-history:

$ acct-history -S 2019-11-01
Usage report on QoS 'pd-abc-123':
     Start time:  2019-11-01
       End time:  2019-11-14T20:27:50

Billing (CPU-equiv) hours per user:
           198.4  alice
          9817.9  bob
  ------------------------
         10016.3  TOTAL

Breakdowns by individual jobs can be obtained from sacct with suitable options. See the sacct man page or contact support for help.

A report containing similar information will be sent at the beginning of each month to the senior contact in the organization (i.e., the principal investigator or contract signer).

Billing units formula

BillingHours = MAX( CPUs, RAM_GB * 0.215, GPUs * 35.0 ) * hours

A job that reserves one CPU and 4G of RAM and runs for one hour consumes 1 billing hour. One CPU and 10G of RAM for one hour? 2.15 billing hours.

One of our GPU-equipped nodes has 40 CPUs, 186G of RAM, and two GPUs. To use that node for 24 hours would cost MAX( 40, 186*0.215, 2*35 ) * 24 = 70 * 24 = 1680 billing hours.
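The formula and the three examples above can be checked with a short Python sketch. The function name billing_hours is ours, chosen for illustration; the weights 0.215 and 35.0 are taken from the formula above.

```python
RAM_WEIGHT = 0.215  # billing units per GB of RAM, from the formula above
GPU_WEIGHT = 35.0   # billing units per GPU, from the formula above


def billing_hours(cpus, ram_gb, gpus, hours):
    """Largest of the three weighted resource requests, times the hours used."""
    rate = max(cpus, ram_gb * RAM_WEIGHT, gpus * GPU_WEIGHT)
    return rate * hours


print(f"{billing_hours(1, 4, 0, 1):.2f}")      # 1.00
print(f"{billing_hours(1, 10, 0, 1):.2f}")     # 2.15
print(f"{billing_hours(40, 186, 2, 24):.2f}")  # 1680.00
```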

The rate per GB of RAM is chosen so that using either all the CPUs or all the RAM on a basic node costs the same: 40 billing hours per hour of run time. Requesting all the memory on a high-memory node costs 80 billing hours per hour.

You can get the billing rate for a live job like so:

[you@login1 ~]$ scontrol show job 7976 | grep billing
  TRES=cpu=12,mem=108000M,node=1,billing=22

This job will be billed at a rate of 22 billing hours per hour of elapsed time. You can determine the billing hours consumed by an individual job, once it has completed, by examining the Slurm accounting record for the job and multiplying the billing rate by the elapsed time:

[you@login1 ~]$ sacct -X --format=AllocTRES%40,Elapsed --noheader -j 7402
       billing=40,cpu=40,mem=10G,node=1   00:02:09

In this example, the billing rate is 40 and the elapsed time is 2 minutes 9 seconds, or 0.0358 hours, so the job cost 1.43 billing hours.
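If you want to automate this arithmetic, a minimal Python sketch follows. It assumes the two-column output shown above (AllocTRES, then Elapsed) and an elapsed time of the form HH:MM:SS with an optional "D-" day prefix. The function names are ours and are not part of Slurm.

```python
def elapsed_to_hours(elapsed):
    """Convert an elapsed time such as '00:02:09' or '1-03:00:00' to hours."""
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (float(x) for x in elapsed.split(":"))
    return days * 24 + hours + minutes / 60 + seconds / 3600


def job_billing_hours(sacct_line):
    """Billing rate times elapsed hours for one 'AllocTRES Elapsed' output line."""
    tres, elapsed = sacct_line.split()
    # Turn 'billing=40,cpu=40,...' into {'billing': '40', 'cpu': '40', ...}
    fields = dict(item.split("=", 1) for item in tres.split(","))
    return float(fields["billing"]) * elapsed_to_hours(elapsed)


line = "       billing=40,cpu=40,mem=10G,node=1   00:02:09"
print(f"{job_billing_hours(line):.2f}")  # 1.43
```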

Why the funny word, QoS?

You may well ask, "Why have QoSs at all, why not just use accounts?" That has to do with Slurm internals. We would like you to be able to think in terms of a "bank account" of computing time, but to implement that we had to use Slurm's QoS mechanism. If we were then to call a QoS a "bank account", when the term "account" in Slurm means something slightly different but closely related, that would cause great confusion if and when you ever have to consult the generic Slurm documentation at https://slurm.schedmd.com.

What if I have more than one account or QoS?

You typically have access to only one account and one QoS, and your jobs are automatically associated with that account and QoS. You could have more than one account if, for example, your firm made two separate contracts with ACENET for separate projects and you are working on both, or if you are a freelancer working for two different firms with ACENET contracts. In that case it is up to you to assign each job you submit to the correct QoS using the --qos= option to sbatch, salloc, or srun.