Updated 2021-12-21

Submit Jobs to Phoenix Queues


  1. Navigate to your working directory
  2. Create a PBS script, or download the template found at the bottom of this page.
  3. Designate the account to which the job should be charged with the -A flag.
  4. Indicate the required resources (CPUs, GPUs, memory, local storage, etc.) for the job with the -l flag - the scheduler will route your job to the appropriate resource pool!
    1. The default queue, inferno, will be used for all jobs unless otherwise specified. For the differences between the two queues, see below.
  5. Submit your job with qsub job_name.pbs replacing job_name.pbs with the name of your PBS Script.
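In practice, the steps above boil down to a couple of commands. A minimal sketch (the directory name, script name, and account below are illustrative, not real):

```shell
cd ~/scratch/myproject   # step 1: move to your working directory
# steps 2-4: job.pbs contains the #PBS -A and #PBS -l directives
qsub job.pbs             # step 5: submit; qsub prints the new job's ID
```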


For more information about how to create a PBS Script, please refer to the PBS Scripting Guide. All information found there applies to the Phoenix Cluster except for the queues that a job may be submitted to (a list of the Phoenix Cluster queues can be found towards the end of this page).

Job Accounting

Phoenix's accounting system is based on the most significant processing unit on the compute node:

  • On CPU and CPU-SAS nodes, charge rates are based on CPU-hours (total number of procs * walltime) allocated for the job
  • On GPU-V100 and GPU-RTX6000 nodes, charge rates are based on GPU-hours (total number of GPUs * walltime) allocated for the job
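The accounting arithmetic above can be sketched as follows (the job sizes in the example are made up for illustration; actual charges also depend on the node class rate):

```python
def cpu_hours(nodes, ppn, walltime_hours):
    """Charge unit on CPU nodes: total processors * walltime."""
    return nodes * ppn * walltime_hours

def gpu_hours(gpus, walltime_hours):
    """Charge unit on GPU nodes: total GPUs * walltime."""
    return gpus * walltime_hours

# A 2-node, 24-processors-per-node CPU job running for 10 hours:
print(cpu_hours(2, 24, 10))   # 480 CPU-hours
# A 4-GPU job running for 12 hours:
print(gpu_hours(4, 12))       # 48 GPU-hours
```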

The rates for each of the node classes can be found in this table and a description for each of the compute node classes can be found on this page.

When submitting a job, the account to which the job should be charged must be specified using the -A flag, either on the command line or as a PBS directive. The scheduler will verify that the account has sufficient funds available to run the full length of the job before accepting it, and a corresponding lien will be placed on the account once the job starts running. If the job finishes early, the excess funds will be released.

To see the accounts to which you can submit jobs and their current balances, run the pace-quota command and read the "Job Charge Account Balances" section:

[puser32@login-phoenix-3 ~]$ pace-quota
                                    Job Charge Account Balances
Name                        Balance           Reserved      Available
GT-gburdell3-CODA20       291798.90            3329.35      288469.55
GT-gburdell3-phys         241264.01              69.44      241194.57
GT-gburdell3                  41.72               0.00          41.72

The balance column shows the current total based on completed transactions; the reserved column lists the sum of liens based on running jobs; and the available column displays the total funds available for new job submissions.
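The lien bookkeeping described above can be sketched as follows (the figures are made up, not taken from a real account):

```python
balance, reserved = 1000.00, 0.00

# A job starts: a lien for its full estimated cost is placed on the account.
job_lien = 250.00
reserved += job_lien

# The job finishes early, having consumed only part of the lien;
# the actual cost is deducted from the balance and the excess lien is released.
actual_cost = 180.00
balance -= actual_cost
reserved -= job_lien

# Available funds for new submissions = balance - reserved
print(balance, reserved, balance - reserved)  # 820.0 0.0 820.0
```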

There are several types of accounts currently available to researchers; the appropriate choice depends on your computational needs and the preferences of the researcher responsible for the account. The following accounts may be available on Phoenix:

  • GT-<PI UID> (example: GT-gburdell3): An institute-sponsored account that provides 10k CPU-hours on a base CPU-192GB node, although the credits can be used on any node class. These credits reset on the 1st of each month.
  • GT-<PI UID>-CODA20 (example: GT-gburdell3-CODA20): Account for the 2020 hardware refresh accompanying the move to Coda. Credits correspond to 5 years of cycles on the refreshed hardware, which was sized using the "equivalent or better" refresh rubric based on the SPECfp_rate benchmark.
  • GT-<PI UID>-FY20PhaseN (example: GT-gburdell3-FY20Phase2): Account for compute resources purchased in FY20, credited with the greater of the FY20 expenditure or the equivalent of 5 years of cycles on the purchased hardware.
  • GT-<PI UID>-<group> (example: GT-gburdell3-phys): PI-specific child account of a shared (multi-PI or school-owned) account. This is the account to which jobs should be charged. Depending on the arrangement made by the shared account's managers, a fixed value may be assigned to each PI, or you may have access to the full shared balance; the visible balance may be a lifetime total or a value that resets each month.
  • GT-<group>-CODA20 (example: GT-phys-CODA20): Parent account for a (multi-PI) shared account; child accounts either are allocated a fixed percentage of deposits to this account or draw down from the parent balance. This unseen account cannot be used for job submissions; instead, it provides the funds from which child accounts draw.
  • GT-<PI UID>-<custom> (example: GT-gburdell3-paid): Account opened in Phoenix on the postpaid billing model. PIs are billed for actual usage each month and may set limits if preferred.

Queues on the Phoenix Cluster

Unlike previous PACE-managed clusters, there are only two queues on the Phoenix cluster: inferno and embers. Although the two queues provide access to the same resource pool, the job policies are quite different. Please note that the queue policies are current as of 10/01/2020, and are subject to change to best meet the needs of the research community.

Inferno: The Primary Queue

Inferno is the main, and default, queue for the Phoenix cluster; all jobs submitted to the scheduler are routed here unless otherwise specified. Jobs in this queue consume account credits, but benefit from a larger job limit, higher priority, and longer wallclock limits. This queue should be the main production mechanism for your workflows, as jobs here start sooner and cannot be preempted. For jobs in the inferno queue, the following policies apply:

  • Base priority = 250,000
  • Max jobs per user = 500
    • Max eligible jobs per user = 500
  • Wallclock limit = The minimum of the following:
    • 21 days for CPU resources (e.g. CPU-192GB or CPU-768GB-SAS node classes)
    • 3 days for GPU resources (e.g. GPU-192GB-V100 or GPU-384GB-RTX6000 node classes)
    • 264,960 CPU-hours ÷ Number of Requested Processors

To submit jobs to the inferno queue, you can use the -q inferno flag or omit it when submitting jobs, as this is the default queue for all jobs.


The scheduler will reject a job whose requested walltime exceeds 264,960 CPU-hours ÷ the number of requested processors. If your job is listed as complete with no output, check that nodes * ppn * walltime is less than 264,960 CPU-hours.
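A quick way to sanity-check a CPU job's walltime against these limits before submitting (a sketch; the 264,960 CPU-hours figure and 21-day ceiling come from the policy above):

```python
CPU_HOUR_CAP = 264_960   # inferno per-job CPU-hours cap
MAX_CPU_DAYS = 21        # wallclock ceiling for CPU resources, in days

def max_walltime_hours(nprocs):
    """Longest allowed walltime (hours) for an inferno CPU job using nprocs processors."""
    return min(MAX_CPU_DAYS * 24, CPU_HOUR_CAP / nprocs)

print(max_walltime_hours(24))    # 504: the 21-day ceiling binds for small jobs
print(max_walltime_hours(1024))  # 258.75: the CPU-hours cap binds for large jobs
```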

Embers: The Backfill Queue

Embers is the backfill queue on the Phoenix cluster; jobs submitted here are meant to take advantage of opportunistic cycles left after the inferno jobs have been accommodated. Jobs in this queue have a small job limit, the lowest priority, and shorter wallclock limits. They are also eligible for preemption: after a job has been running for one hour, it may be killed if an inferno job is waiting for the resources it is consuming. You can resubmit the job if you would like to try again. While jobs submitted to this queue still require an associated account, no credits are consumed from it. As such, the embers queue should not be used for critical workflows facing imminent deadlines. For jobs in the embers queue, the following policies apply:

  • Base priority = 0
  • Max jobs per user = 50
    • Max eligible jobs per user = 1
  • Wallclock limit = 8 hours
  • Eligible for preemption after 1 hour

To submit jobs to the embers queue, use the -q embers flag when submitting jobs.
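For example, a short test run could be submitted to embers like this (the script name and account are illustrative):

```shell
qsub -q embers -A GT-gburdell3 -l nodes=1:ppn=4,walltime=2:00:00 debug_run.pbs
```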


The embers queue is ideal for exploratory work as you develop, compile, and debug your applications.

Additional Constraints on Running Jobs

In addition to the above per-job limits, the scheduler is configured with the following limits on concurrently running jobs to provide balanced utilization of the resource by all. These limits apply to jobs submitted to both queues; jobs that violate them are held in the queue until currently running jobs complete and the total number of utilized processors and GPUs, and the remaining CPU-time, fall below the thresholds.

  • Per-group processors = 6000
  • Per-user GPUs = 32
  • Per-group CPU-time = 300,000 CPU-hours
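A sketch of how these concurrent-use thresholds interact when the scheduler decides whether a new job may start (the in-use figures in the example are made up; the actual scheduler logic is more involved):

```python
PER_GROUP_PROCS = 6000        # processors in use across the group
PER_USER_GPUS = 32            # GPUs in use by a single user
PER_GROUP_CPU_HOURS = 300_000 # remaining CPU-time across the group's running jobs

def job_may_start(group_procs_in_use, job_procs,
                  user_gpus_in_use, job_gpus,
                  group_cpu_hours_outstanding, job_cpu_hours):
    """True if starting the job keeps usage under all three concurrent-use limits."""
    return (group_procs_in_use + job_procs <= PER_GROUP_PROCS
            and user_gpus_in_use + job_gpus <= PER_USER_GPUS
            and group_cpu_hours_outstanding + job_cpu_hours <= PER_GROUP_CPU_HOURS)

print(job_may_start(5000, 500, 0, 0, 200_000, 50_000))  # True: all limits satisfied
print(job_may_start(5800, 500, 0, 0, 100_000, 10_000))  # False: would exceed 6000 procs
```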

Targeting a Resource Pool

The scheduler places your job in the appropriate resource pool based on the requested resources for the job. To determine the compute node class most appropriate for the job, the memory per processor (specified via pmem, or as mem ÷ number of processors) and any additional features (e.g., GPU or SAS) are considered. The following guidelines will help route your job to the appropriate resource pool:

  • CPU-192GB: less than 8 GB per processor, no features (-l nodes=1:ppn=4,pmem=6gb)
  • CPU-384GB: less than 16 GB per processor, no features (-l nodes=1:ppn=4,pmem=9gb)
  • CPU-768GB: more than 16 GB per processor, no features (-l nodes=1:ppn=4,pmem=22gb)
  • CPU-384GB-SAS: less than 16 GB per processor, "local-sas" feature request (-l nodes=1:ppn=4:local-sas,pmem=5gb)
  • CPU-768GB-SAS: more than 16 GB per processor, "local-sas" feature request (-l nodes=1:ppn=4:local-sas,pmem=26gb)
  • GPU-192GB-V100: less than 8 GB per processor, at least 1 GPU, and optionally the TeslaV100-16GB or TeslaV100-32GB feature (-l nodes=1:ppn=4:gpus=1:TeslaV100-16GB)
  • GPU-384GB-V100: less than 16 GB per processor, at least 1 GPU, and optionally the TeslaV100-16GB or TeslaV100-32GB feature (-l nodes=1:ppn=4:gpus=1,mem=40gb)
  • GPU-768GB-V100: more than 16 GB per processor, at least 1 GPU, and optionally the TeslaV100-16GB or TeslaV100-32GB feature (-l nodes=1:ppn=2:gpus=2,pmem=20gb)
  • GPU-384GB-RTX6000: less than 16 GB per processor, at least 1 GPU, and the RTX6000 feature (-l nodes=1:ppn=2:gpus=1:RTX6000)
  • GPU-768GB-RTX6000: more than 16 GB per processor, at least 1 GPU, and the RTX6000 feature (-l nodes=1:ppn=6:gpus=1:RTX6000,mem=128gb)
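The routing for the CPU node classes can be sketched as a simple decision on per-processor memory (a simplification that ignores the GPU classes; the list above does not say which class an exactly-16-GB request lands in, so that boundary is an assumption here):

```python
def cpu_node_class(pmem_gb, local_sas=False):
    """Pick the CPU node class from per-processor memory and the local-sas feature (sketch)."""
    if local_sas:
        return "CPU-384GB-SAS" if pmem_gb < 16 else "CPU-768GB-SAS"
    if pmem_gb < 8:
        return "CPU-192GB"
    if pmem_gb < 16:
        return "CPU-384GB"
    return "CPU-768GB"

print(cpu_node_class(6))                  # CPU-192GB
print(cpu_node_class(22))                 # CPU-768GB
print(cpu_node_class(5, local_sas=True))  # CPU-384GB-SAS
```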


The default GPU type is the Tesla V100. To use a Quadro RTX 6000 GPU, you must specify the RTX6000 feature.


  • The following is a PBS Script template that serves as an example.
  • You can download the script here. Make sure to edit the appropriate fields before attempting to submit a job.
  • To find the Accounts you have access to (for the #PBS -A [Account] flag), run the command pace-whoami
#PBS -N [INSERT NAME]           # job name
#PBS -A [Account]               # account to which job is charged, ex: GT-gburdell3
#PBS -l nodes=2:ppn=4           # number of nodes and cores per node required
#PBS -l pmem=2gb                # memory per core
#PBS -l walltime=15:00          # duration of the job (ex: 15 min)
#PBS -j oe                      # combine output and error messages into 1 file
#PBS -o [INSERT NAME].out       # output file name
#PBS -m abe                     # event notification, set to email on start, end, or fail
#PBS -M [INSERT EMAIL ADDRESS]  # email to send notifications to


cd $PBS_O_WORKDIR               # run from the directory the job was submitted from

for i in {1..5}
do
    echo "Test $i"
done
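With the fields filled in, the template can be submitted and monitored as described earlier (output will vary):

```shell
qsub job_name.pbs   # returns the assigned job ID
qstat -u $USER      # show the status of your queued and running jobs
```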