Updated 2020-10-20

Important

Coming Soon: The Phoenix cluster will come into production in late fall. Stay tuned for announcements on the PACE website and the PACE blog.

Introduction

Welcome to the Phoenix cluster, the newest PACE-managed cluster. This page provides an overview of the system and describes how to submit jobs to it.

System Overview

CLX Compute

Each compute node in the Phoenix cluster has dual Intel Xeon Gold 6226 processors (Cascade Lake), providing 2 sockets with 12 cores each, for a total of 24 cores per node.
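
As a quick way to confirm this layout from a shell on a compute node (a sketch, not an official PACE procedure; lscpu is a standard Linux utility), the command below should report 2 sockets with 12 cores per socket:

  # Summarize the CPU topology; on a CLX node expect "Socket(s): 2" and "Core(s) per socket: 12".
  lscpu | grep -E 'Model name|Socket\(s\)|Core\(s\) per socket'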

Login Nodes

File Systems

Network

Accessing the System

Job Submissions on Phoenix

Job Accounting

Phoenix's accounting system is based on the most significant processing unit on the compute node (a worked example follows the list):

  • On CPU and CPU-SAS nodes, charge rates are based on CPU-hours (total number of cores * walltime) allocated for the job
  • On GPU-V100 and GPU-RTX6000 nodes, charge rates are based on GPU-hours (total number of GPUs * walltime) allocated for the job
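
For example (illustrative numbers only, not actual charge rates): a job that allocates 24 cores on a CPU node for 10 hours of walltime accrues 24 * 10 = 240 CPU-hours, while a job that allocates 2 GPUs on a GPU-V100 node for 12 hours accrues 2 * 12 = 24 GPU-hours.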

The rates for each node type are published on the PACE website.

Job Scheduler

Phoenix's job scheduler consists of the Moab Workload Manager, which is responsible for matching jobs to nodes, and the Torque Resource Manager, which provides the low-level functionality to manage jobs. Simply put, the qsub command submits a job to Torque; Moab then determines when and where the job should run, Torque executes it accordingly, and the results are returned to the user.
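
In practice, the day-to-day workflow looks like the following sketch (the script name is a placeholder, and exact output formats depend on the site configuration):

  qsub myjob.pbs     # submit the batch script to Torque; the job ID is printed on success
  qstat -u $USER     # list your jobs and their states (Q = queued, R = running, C = completed)
  qdel <job-id>      # cancel a job by its ID if it is no longer needed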

Phoenix Queues

Unlike previous PACE-managed clusters, Phoenix has only two queues: inferno and embers. Although the two queues provide access to the same resource pool, the job policies are quite different. Please note that the queue policies are current as of 10/01/2020 and are subject to change to best meet the needs of the research community.

Inferno: The Fiery Blaze That Incinerates Any Computational Workload

Inferno is the main, and default, queue for the Phoenix cluster; all jobs submitted to the scheduler are routed here unless another queue is specified. Jobs in this queue consume account credits, but benefit from a larger job limit, higher priority, and longer wallclock limits. This queue should be the primary mechanism for production workflows, as jobs here start sooner and cannot be preempted. For jobs in the inferno queue, the following policies apply (a sample submission is sketched after the list):

  • Base priority = 250,000
  • Max jobs per user = 500
    • Max eligible jobs per user = 500*
  • Wallclock limit = 21 days*

*Due to high demand for GPU resources, the wallclock limit is 3 days and the maximum number of eligible jobs per user is 10 on these resources.
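
As a minimal sketch of targeting this queue (the script name is a placeholder, and 504:00:00 is simply 21 days expressed in hours), an inferno job can state its queue and walltime at submission time:

  # Illustrative only: submit to inferno (the default queue) with the maximum 21-day walltime.
  # GPU jobs are capped at 3 days, i.e. a walltime of at most 72:00:00.
  qsub -q inferno -l walltime=504:00:00 myjob.pbs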

Embers: The Remnants of the Inferno Hot Enough to Extend Your Research

Embers is the backfill queue on the Phoenix cluster; jobs submitted here are meant to take advantage of opportunistic cycles that remain after the inferno jobs have been accommodated. Jobs submitted here have a small job limit, the lowest priority, and shorter wallclock limits. Additionally, jobs in this queue are eligible for preemption: if an inferno job is waiting on the resources being consumed, the embers job will be cancelled and the user will need to resubmit it to continue their work. However, while jobs submitted to this queue still require an associated account to run, no credits will be consumed from that account. As such, the embers queue should not be used for critical workflows facing imminent deadlines. For jobs in the embers queue, the following policies apply (a sample submission is sketched after the list):

  • Base priority = 0
  • Max jobs per user = 50
    • Max eligible jobs per user = 1
  • Wallclock limit = 8 hours
  • Eligible for preemption after 1 hour
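
A job can be routed to embers in the same way. The sketch below assumes the standard Torque -A flag for naming the charge account (confirm the exact flag and account name with PACE) and a placeholder script name:

  # Illustrative only: submit to the backfill queue with the 8-hour maximum walltime.
  # No credits are deducted, but the job may be preempted after its first hour.
  qsub -q embers -A <account-name> -l walltime=8:00:00 myjob.pbs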

Submitting Jobs with qsub

Important

The use of msub is disabled, as the resource manager, Torque, handles job submission directly. Please ensure your scripts and workflows are updated to use the qsub command instead.

To submit jobs, prepare a PBS batch script containing #PBS directives and the commands you want to run, then pass the script to qsub.
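
A minimal batch script might look like the sketch below; the job name, account, resource amounts, and the command to run are placeholders, and the resource syntax should be confirmed against PACE's documentation:

  #PBS -N example-job            # job name (placeholder)
  #PBS -A <account-name>         # charge account (placeholder)
  #PBS -q inferno                # target queue; inferno is the default
  #PBS -l nodes=1:ppn=24         # one full 24-core CLX node (standard Torque syntax)
  #PBS -l walltime=12:00:00      # requested walltime; must fit within the queue's limit
  #PBS -j oe                     # merge stdout and stderr into a single output file

  cd $PBS_O_WORKDIR              # start in the directory the job was submitted from
  echo "Started on $(hostname) at $(date)"
  # Replace the line below with the actual command(s) for your workload.
  ./your_program

Saving the script as, for example, job.pbs and submitting it with "qsub job.pbs" will place it in the queue.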

Optimized Job Routing