Qsum
Legacy documentation
This page describes a service provided by a retired ACENET system. Most ACENET services are now provided by national systems; for those, please visit https://docs.computecanada.ca.
qsum is an ACENET custom-made utility that shows a short summary of how busy the Grid Engine queues and waiting list are.
Here is an example of qsum output from Mahone:
$ qsum
--------------------------------------------------
Queue Info                    Processors
--------------------------------------------------
Queue Name     IN USE    AVAIL  UNAVAIL    TOTAL
long.q             55        1        0       56
medium.q          108        0        4      112
short.q           348       16       16      380
test.q              0        8        0        8
---------------------------------------------------------
Users Info       Running       Waiting   Error/Other
---------------------------------------------------------
ID          #JOBS  #CPUS  #JOBS  #CPUS  #JOBS  #CPUS
asmith          1      8
bbrown         11     89      3     24
cjones          6     96     16    256     18    288
dduffy          3    248
ewilson         2     32
ffreleng        3     19
ggeorge         3      3
hpotter         2     16
---------------------------------------------------------
Total          31    511     19    280     18    288
The first table ("Queue Info") summarizes how many slots (CPU cores) are occupied by jobs, and how many are available. The rows (long.q, medium.q, short.q, test.q) indicate how many slots are in use, available, or out of service for
- long jobs, i.e. those requesting more than 168 hours run time,
- medium jobs, requesting more than 48 hours,
- short jobs, requesting less than 48 hours, and
- test jobs, requesting less than 1 hour and "test=true".
Short jobs can also run in medium.q and long.q, and medium jobs can run in long.q.
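The run-time rules above can be sketched as a small Python function. This is only an illustration of the thresholds listed here, not the scheduler's actual logic: the function name eligible_queues is hypothetical, and two points the text does not settle are treated as assumptions, namely that a job requesting exactly 48 hours counts as short and that a test job runs only in test.q.

```python
def eligible_queues(hours, test=False):
    """Return the cluster queues a job could run in, per the rules on this page.

    hours: requested run time in hours.
    test:  True if the job was submitted with "test=true".
    """
    if hours <= 0:
        raise ValueError("requested run time must be positive")
    # Test jobs: less than 1 hour and test=true (assumed to use test.q only).
    if test and hours < 1:
        return ["test.q"]
    # Long jobs: more than 168 hours.
    if hours > 168:
        return ["long.q"]
    # Medium jobs: more than 48 hours; they can also run in long.q.
    if hours > 48:
        return ["medium.q", "long.q"]
    # Short jobs (exactly 48 hours is assumed to be short);
    # they can also run in medium.q and long.q.
    return ["short.q", "medium.q", "long.q"]


if __name__ == "__main__":
    for h in (0.5, 12, 72, 200):
        print(h, eligible_queues(h))
```

This shows why a free slot in long.q may be taken by a short job: shorter jobs are eligible for every queue above their own.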
The second table ("Users Info") summarizes, for each user, the number of jobs and slots currently running and the number waiting.
The columns headed "Error/Other" show whether you have any jobs that are being held back from running due to an error condition. These error conditions are usually correctable by the user, not the system operator. Contact support if your jobs are in an error state and you don't know what to do about it. Jobs in various other non-runnable states like "hqw" or "dr" will also appear in this column.
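As a rough illustration of how job states might map onto the three column groups, here is a minimal Python sketch. The real rules used by qsum are not documented on this page; the function name qsum_column is hypothetical, and the mapping of the Grid Engine states "r" to Running and "qw" to Waiting is an assumption. Only the placement of error states, "hqw" and "dr" under Error/Other comes from the text.

```python
def qsum_column(state):
    """Guess which qsum column group a Grid Engine job state falls under.

    Assumption: only plain "r" counts as Running and plain "qw" as Waiting;
    every other state (e.g. "Eqw", "hqw", "dr") is reported as Error/Other.
    """
    if state == "r":
        return "Running"
    if state == "qw":
        return "Waiting"
    return "Error/Other"


if __name__ == "__main__":
    for s in ("r", "qw", "hqw", "dr", "Eqw"):
        print(f"{s:4s} -> {qsum_column(s)}")
```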
$ qsum
--------------------------------------------------
Queue Info                    Processors
--------------------------------------------------
Queue Name     IN USE    AVAIL  UNAVAIL    TOTAL
cmms.q             63       17        0       80
demirov.q          32        0        0       32
gaussian.q         75        1        0       76
interact.q          0        8        0        8
long.q             43        5        0       48
medium.q           96        0        0       96
short.q           299       33        0      332
sub.q               0       72       88      160
tarasov.q           4       76        0       80
test.q              0        8        0        8
---------------------------------------------------------
Users Info       Running       Waiting   Error/Other
---------------------------------------------------------
ID          #JOBS  #CPUS  #JOBS  #CPUS  #JOBS  #CPUS
asmith         10    160      2     32      2
bbrown          7    112     18    288
cjones          1      1
dduffy          3      7      1      4
..... lines omitted .....
zeno            1      4
---------------------------------------------------------
Total         103    596    768   1108      4      5
This example from Placentia shows the large number of cluster queues present at that site.
- The standard cluster queues, long.q, medium.q, short.q and test.q, are present as at other sites.
- There are several Green ACENET queues, e.g. demirov.q, cmms.q, tarasov.q. Access to these queues is restricted to members of the research group which funded the purchase of the nodes.
- sub.q is a subordinate queue which allows users to run jobs on these Green ACENET nodes. See the sub.q page for how to do this.
- gaussian.q is a set of nodes purpose-built for the Gaussian computational chemistry code, and accessible to all users of that package.
Caveat
The queueing system has a much more complicated state than can be represented in a one-screen summary. In particular, the "AVAIL" column should not simply be read as "idle and waiting for your job". For example, your job may require memory or other resources the available slots don't have. See "Job won't start" in our FAQ for more on this. Also, if adjustments have been made to the Grid Engine configuration, as technical staff have to do from time to time, there may be overcounting of available slots for a day or two.
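For scripted checks, the Queue Info rows can be pulled out of qsum output with a few lines of Python. The sketch below is a minimal example, assuming each queue is printed on its own line as a name ending in ".q" followed by four integers, as in the Mahone example; the function name parse_queue_info and the SAMPLE text are illustrative only. The AVAIL figure it reports is still subject to the caveat above: it is at best an upper bound on the slots your job could actually use.

```python
def parse_queue_info(text):
    """Extract per-queue slot counts from qsum output.

    Returns {queue_name: {"in_use": int, "avail": int,
                          "unavail": int, "total": int}}.
    """
    queues = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed row shape: a queue name ending in ".q" plus four integers.
        if (len(fields) == 5 and fields[0].endswith(".q")
                and all(f.isdigit() for f in fields[1:])):
            in_use, avail, unavail, total = map(int, fields[1:])
            queues[fields[0]] = {
                "in_use": in_use,
                "avail": avail,
                "unavail": unavail,
                "total": total,
            }
    return queues


# Rows copied from the Mahone example on this page.
SAMPLE = """\
Queue Name     IN USE    AVAIL  UNAVAIL    TOTAL
long.q             55        1        0       56
medium.q          108        0        4      112
short.q           348       16       16      380
test.q              0        8        0        8
"""

queues = parse_queue_info(SAMPLE)
for name, q in queues.items():
    busy = 100 * q["in_use"] / q["total"] if q["total"] else 0.0
    print(f"{name:10s} {busy:5.1f}% in use, {q['avail']} slots reported available")
```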