Updated 2022-11-11
# Phoenix Cluster Resources

## Detailed Node Specs
- Most nodes include the following common features:
    - Dual Intel Xeon Gold 6226 CPUs @ 2.7 GHz (24 cores/node)
    - DDR4-2933 MHz DRAM
    - HDR100 InfiniBand interconnect
- The cpu-amd nodes include the following common features:
    - Dual AMD Epyc 7713 CPUs @ 2.0 GHz (128 cores/node)
    - 512 GB DDR4 DRAM
    - 1.6 TB NVMe storage
- The gpu-a100 nodes include the following common features:
    - Dual AMD Epyc 7513 CPUs @ 2.6 GHz (64 cores/node)
    - 512 GB DDR4 DRAM
    - 2x Nvidia A100-40GB Tensor Core GPUs
    - 1.6 TB NVMe storage
- The following table provides detailed specifications for the 800 nodes that are part of the Phoenix-Slurm cluster migration in Phase 1 (October 10) and Phase 2 (November 2-4), plus the 4 additional cpu-amd nodes and 5 additional gpu-a100 nodes added on November 7:
| Node Class | Quantity | RAM | Storage | Extra Unique Specs |
|---|---|---|---|---|
| CPU-192GB | 522 | 192 GB | 1.6 TB NVMe storage | |
| CPU-384GB | 144 | 384 GB | 1.6 TB NVMe storage | |
| CPU-768GB | 39 | 768 GB | 1.6 TB NVMe storage | |
| CPU-384GB-SAS | 46 | 384 GB | 8.0 TB SAS storage | |
| CPU-768GB-SAS | 2 | 768 GB | 8.0 TB SAS storage | |
| GPU-192GB-V100 | 11 | 192 GB | | 2x Tesla V100 |
| GPU-384GB-V100 | 14 | 384 GB | | 2x Tesla V100 |
| GPU-768GB-V100 | 2 | 768 GB | | 2x Tesla V100 |
| GPU-384GB-RTX6000 | 18 | 384 GB | | 4x Quadro Pro RTX6000 |
| GPU-768GB-RTX6000 | 2 | 768 GB | | 4x Quadro Pro RTX6000 |
| CPU-512GB-AMD | 4 | 512 GB | 1.6 TB NVMe storage | 2x AMD Epyc 7713 |
| GPU-512GB-A100 | 5 | 512 GB | 1.6 TB NVMe storage | 2x AMD Epyc 7513, 2x Tensor Core A100 |
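To cross-check these node classes against what the scheduler reports, Slurm's standard query tools can be used. The commands below are a generic sketch; the exact output columns and GRES labels depend on the site configuration, and `nvidia-smi` is only present on GPU nodes.

```bash
# Scheduler's view: one line per node with CPU count, schedulable memory (MB), and GPUs (GRES)
sinfo -N -o "%N %c %m %G"

# Inside a running job, verify the hardware directly (GPU nodes only for nvidia-smi)
lscpu | grep 'Model name'
nvidia-smi --query-gpu=name,memory.total --format=csv
```

The memory that `sinfo` reports may be slightly below the nominal RAM in the table above (see the tip at the end of this page).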
## Partitions
- Jobs are assigned to Slurm partitions automatically based on your charge account (internal or external) and the most significant resources requested (GPUs, memory, etc.).
- Jobs are only charged if the `inferno` QOS is selected.
- The assigned partition determines how much users are charged, based on current rates.
- Slurm partitions include the following node classes and are assigned by the scheduler based on availability:
| Partition | Node Class |
|---|---|
| cpu-small | CPU-192GB, CPU-384GB, CPU-384GB-SAS, CPU-768GB, CPU-768GB-SAS |
| cpu-medium | CPU-384GB, CPU-384GB-SAS, CPU-768GB, CPU-768GB-SAS |
| cpu-large | CPU-768GB, CPU-768GB-SAS |
| cpu-sas | CPU-384GB-SAS, CPU-768GB-SAS |
| gpu-v100 | GPU-192GB-V100, GPU-384GB-V100, GPU-768GB-V100 |
| gpu-rtx6000 | GPU-384GB-RTX6000, GPU-768GB-RTX6000 |
| cpu-amd | CPU-512GB-AMD |
| gpu-a100 | GPU-512GB-A100 |
- The partitions for external users have the same names with "-X" appended (e.g. cpu-small-X, cpu-medium-X). A sample batch script is shown below.
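As an illustration of how a request maps onto these partitions, here is a minimal batch-script sketch. The charge account and program name are placeholders, and only the `inferno` QOS comes from the notes above; a small CPU-only request like this would normally be routed to one of the cpu-* partitions by the scheduler.

```bash
#!/bin/bash
#SBATCH -J example-job             # job name
#SBATCH -A <charge-account>        # placeholder: your internal or external charge account
#SBATCH -q inferno                 # QOS; jobs are only charged when inferno is selected
#SBATCH -N 1                       # one node
#SBATCH --ntasks-per-node=24       # all 24 cores of a standard dual Xeon Gold 6226 node
#SBATCH --mem=16G                  # modest memory request
#SBATCH -t 2:00:00                 # 2-hour walltime
#SBATCH -o %x-%j.out               # stdout written to <jobname>-<jobid>.out

cd "$SLURM_SUBMIT_DIR"
srun ./my_program                  # placeholder executable
```

Adding a GPU request (for example `--gres=gpu:1`) or a much larger memory request would instead steer the job toward the gpu-* or larger-memory cpu partitions.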
## Job Submit Flowchart
- When submitting a job to the Slurm scheduler in interactive mode (using `salloc`) or with a batch script (using `sbatch`), the resources requested determine the partition assigned, as illustrated in the flowchart below.
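For reference, a minimal interactive request might look like the following sketch; the charge account is a placeholder, and the resource options shown are standard Slurm flags.

```bash
# Interactive session: 1 node, 4 tasks, 8 GB of memory, 1 hour, charged via the inferno QOS
salloc -A <charge-account> -q inferno -N 1 -n 4 --mem=8G -t 1:00:00
```

The same resource options drive partition assignment for `salloc` as for `sbatch`.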
**Tip:** The scheduler reserves 8 GB of memory for system processes, so the total memory available for jobs on a given node is reduced accordingly.
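For example, a node listed at 192 GB can satisfy a memory request of at most roughly 184 GB. One way to check the value Slurm actually schedules against is sketched below; how the 8 GB reservation is reflected in this figure depends on the site configuration.

```bash
# RealMemory is the per-node memory (MB) that Slurm considers allocatable;
# expect it to sit below the nominal RAM because of the system reservation.
scontrol show node | grep -m 1 RealMemory
```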