Coming Soon: The Phoenix Cluster will be coming into production in late fall. Stay tuned for announcements on the PACE Website and the PACE Blog¶
Welcome to the Phoenix Cluster.
Each compute node in the Phoenix cluster has dual Intel Xeon Gold 6226 processors (Cascade Lake), providing 2 sockets with 12 cores each (24 cores per node).
Accessing the System¶
Job Submissions on Phoenix¶
Phoenix's accounting system is based on the most significant processing unit on the compute node:
- On CPU and CPU-SAS nodes, charge rates are based on CPU-hours (total number of procs * walltime) allocated for the job
- On GPU-V100 and GPU-RTX6000 nodes, charge rates are based on GPU-hours (total number of GPUs * walltime) allocated for the job
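As an illustration of the accounting model above (a sketch only; the actual per-unit rates vary by node type), charge units scale as resources multiplied by walltime:

```shell
# Sketch: charge units for two hypothetical jobs
cores=24; hours=10
echo "$((cores * hours)) CPU-hours"   # CPU node: 24 cores for 10 hours
gpus=2; hours=12
echo "$((gpus * hours)) GPU-hours"    # GPU node: 2 GPUs for 12 hours
```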
The rates for
Phoenix's job scheduler consists of the Moab Workload Manager, which is responsible for matching jobs to nodes, and the Torque Resource Manager, which provides the low-level functionality to manage jobs. Simply put, the qsub command is used to submit jobs to Torque, which enqueues them in Moab to determine when and where to run them, executes them accordingly, and then returns the results to the user.
Unlike previous PACE-managed clusters, there are only two queues on the Phoenix cluster: inferno and embers. Although the two queues provide access to the same resource pool, the job policies are quite different. Please note that the queue policies are current as of 10/01/2020, and are subject to change to best meet the needs of the research community.
Inferno: The Fiery Blaze That Incinerates Any Computational Workload¶
Inferno is the main, and default, queue for the Phoenix cluster; all jobs submitted to the scheduler will be routed here unless otherwise specified. Jobs in this queue consume account credits, but benefit from a larger job limit, higher priority, and longer wallclock limits. This queue should be the main production mechanism for workflows, as jobs here start sooner and cannot be preempted. For jobs in the inferno queue, the following policies apply:
- Base priority = 250,000
- Max jobs per user = 500
- Max eligible jobs per user = 500*
- Wallclock limit = 21 days*
*Due to high demand for GPU resources, the wallclock limit is 3 days and the maximum eligible jobs per user is 10 for these resources.
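Under these limits, an inferno job script might look like the following sketch (the job name, charge account, resource request, and executable are all placeholders, not values from this document):

```shell
#PBS -N example-job           # job name (hypothetical)
#PBS -A GT-example            # placeholder charge account
#PBS -q inferno               # default queue; consumes account credits
#PBS -l nodes=1:ppn=24        # one full node: 2 x 12-core sockets
#PBS -l walltime=48:00:00     # well under the 21-day limit (3 days on GPU nodes)
cd $PBS_O_WORKDIR             # run from the submission directory
./my_program                  # placeholder executable
```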
Embers: The Remnants of the Inferno Hot Enough to Extend Your Research¶
Embers is the backfill queue on the Phoenix cluster; jobs submitted here are meant to take advantage of opportunistic cycles remaining after inferno jobs have been accommodated. Jobs in this queue have a small job limit, the lowest priority, and shorter wallclock limits. Additionally, they are eligible for preemption: if an inferno job is waiting for the resources being consumed, the embers job will be cancelled, and the user will need to resubmit it to continue their work. While embers jobs still require an associated account to run, no credits are consumed from the account. As such, the embers queue should not be used for critical workflows facing an imminent deadline. For jobs in the embers queue, the following policies apply:
- Base priority = 0
- Max jobs per user = 50
- Max eligible jobs per user = 1
- Wallclock limit = 8 hours
- Eligible for preemption after 1 hour
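Given these limits and the preemption policy, long-running embers work should save intermediate results so a preempted job can be resubmitted. A minimal sketch of the directives (account and resource request are placeholders):

```shell
#PBS -A GT-example            # account still required, but no credits are charged
#PBS -q embers                # backfill queue; preemptible after 1 hour
#PBS -l nodes=1:ppn=4         # placeholder resource request
#PBS -l walltime=08:00:00     # 8 hours is the maximum in embers
```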
Submitting Jobs with qsub¶
The use of msub is disabled, as the resource manager, Torque, is responsible for job management. Please ensure applications are updated to use the qsub command accordingly.
To submit jobs, use the qsub command.
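As a sketch of basic usage (the script name is a placeholder, and `<jobid>` stands in for the ID that qsub prints):

```shell
qsub -q inferno myjob.pbs   # submit myjob.pbs to the inferno queue; prints the job ID
qstat -u $USER              # list your queued and running jobs
qdel <jobid>                # cancel a job if needed
```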