Updated 2023-05-03
Phoenix Cluster Resources
Detailed Node Specs
- Most nodes include the following common features:
- Dual Intel Xeon Gold 6226 CPUs @ 2.7 GHz (24 cores/node)
- DDR4-2933 MHz DRAM
- HDR100 InfiniBand interconnect
- 40 cpu-large nodes were added on February 21, 2023, each with Dual Intel Xeon Gold 6226R CPUs @ 2.9 GHz (32 cores/node) and 768 GB of RAM
- The cpu-amd nodes include the following common features (4 added November 7, 2022; 4 added February 23, 2023):
- Dual AMD Epyc 7713 CPUs @ 2.0 GHz (128 cores/node)
- 512GB DDR4 DRAM
- 1.6 TB NVMe
- The gpu-a100 nodes include the following common features (5 added November 7, 2022; 6 added February 23, 2023; 1 added February 28, 2023):
- Dual AMD Epyc 7513 CPUs @ 2.6 GHz (64 cores/node)
- 512GB DDR4 DRAM
- 2x Nvidia A100 Tensor Core GPUs with 40GB (6 nodes) or 80GB (6 nodes) of GPU memory
- 1.6 TB NVMe
- The cpu-pmem node (added April 11, 2023) has 1.5 TB of memory, composed of 192 GB of DDR4-2933 ECC DRAM and 1.3125 TB of 2666 MHz DCPMM (Intel Optane persistent memory). It has 24 cores (Dual Intel Xeon Gold 6226 CPUs @ 2.7 GHz) and an HDR100 InfiniBand interconnect.
- The following table provides detailed specifications for the 1382 nodes that were part of the Phoenix-Slurm cluster migration, covering Phase 1 (October 10-12, 2022) through Phase 6 (January 30-February 3, 2023). Details are also included for the 8 additional cpu-amd nodes and 12 additional gpu-a100 nodes:
Node Class | Quantity | RAM | Storage | Extra Unique Specs |
---|---|---|---|---|
CPU-192GB | 850 | 192 GB | 1.6 TB NVMe storage | |
CPU-384GB | 239 | 384 GB | 1.6 TB NVMe storage | |
CPU-768GB | 104 | 768 GB | 1.6 TB NVMe storage | |
CPU-384GB-SAS | 75 | 384 GB | 8.0 TB SAS storage | |
CPU-768GB-SAS | 4 | 768 GB | 8.0 TB SAS storage | |
CPU-PMEM | 1 | 1.5 TB | 1.6 TB NVMe storage | |
GPU-192GB-V100 | 21 | 192 GB | | 2x Tesla V100 (16GB or 32GB) |
GPU-384GB-V100 | 27 | 384 GB | | 2x Tesla V100 (16GB or 32GB) |
GPU-768GB-V100 | 5 | 768 GB | | 2x Tesla V100 (16GB) |
GPU-384GB-RTX6000 | 32 | 384 GB | | 4x Quadro Pro RTX6000 (24GB) |
GPU-768GB-RTX6000 | 5 | 768 GB | | 4x Quadro Pro RTX6000 (24GB) |
CPU-512GB-AMD | 8 | 512 GB | 1.6 TB NVMe storage | 2x AMD Epyc 7713 |
GPU-512GB-A100 | 12 | 512 GB | 1.6 TB NVMe storage | 2x AMD Epyc 7513, 2x Tensor Core A100 (40GB or 80GB) |
Total | 1383 | | | |
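These node classes can also be inspected from a login node with standard Slurm queries. A minimal sketch (the partition names used here come from the Partitions table in the next section):

```bash
# List node names, core counts, memory (in MB), and feature tags for one partition
sinfo -p cpu-amd -o "%20N %10c %10m %30f"

# Show a partition's full configuration, including its node list and limits
scontrol show partition gpu-a100
```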
Partitions
- Jobs are assigned to Slurm partitions automatically based on your charge account (internal or external) and the most significant resources requested (GPUs, memory requirements, etc.); an example batch script follows the partition table below.
- Jobs will only be charged if the inferno QOS is selected.
- The Slurm partition assigned determines how much users are charged, based on current rates.
- Slurm partitions include the following node classes and are assigned by the scheduler based on availability:
Partition | Node Class |
---|---|
cpu-small | CPU-192GB, CPU-384GB, CPU-384GB-SAS, CPU-768GB, CPU-768GB-SAS |
cpu-medium | CPU-384GB, CPU-384GB-SAS, CPU-768GB, CPU-768GB-SAS |
cpu-large | CPU-768GB, CPU-768GB-SAS |
cpu-sas | CPU-384GB-SAS, CPU-768GB-SAS |
cpu-pmem | CPU-PMEM |
gpu-v100 | GPU-192GB-V100, GPU-384GB-V100, GPU-768GB-V100 |
gpu-rtx6000 | GPU-384GB-RTX6000, GPU-768GB-RTX6000 |
cpu-amd | CPU-512GB-AMD |
gpu-a100 | GPU-512GB-A100 |
- The partitions for external users have the same names with "-X" appended (e.g., cpu-small-X, cpu-medium-X).
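As a sketch of how a resource request maps onto these partitions, the batch script below asks for one full 24-core node with under 192 GB of memory, which the scheduler would typically route to a small-memory CPU partition; [charge-account] is a placeholder for your own internal or external account.

```bash
#!/bin/bash
# Minimal example batch script; [charge-account] is a placeholder.
#SBATCH -J example-job            # job name
#SBATCH -A [charge-account]       # charge account (internal or external)
#SBATCH -q inferno                # jobs are only charged under the inferno QOS
#SBATCH -N 1                      # one node
#SBATCH --ntasks-per-node=24      # all 24 cores on that node
#SBATCH --mem-per-cpu=7G          # ~168 GB total, fits a 192 GB node
#SBATCH -t 1:00:00                # one hour walltime
#SBATCH -o Report-%j.out          # output file named with the job ID

cd $SLURM_SUBMIT_DIR              # run from the submission directory
srun hostname                     # replace with your actual application
```

Submit with `sbatch <script-name>`; the partition assignment itself is made automatically by the scheduler as described above.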
Job Submit Flowchart
- When submitting a job to the Slurm scheduler in interactive mode (using `salloc`) or with a batch script (using `sbatch`), the resources requested determine the partition assigned, as illustrated in the following flowchart:
Tip
The scheduler reserves 8GB of memory for system processes, so the total available memory for jobs on a given node is reduced accordingly.
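Keeping that 8GB reservation in mind (for example, roughly 184 GB at most is usable on a 192 GB node), the following is a minimal sketch of interactive requests made with `salloc`; [charge-account] is again a placeholder:

```bash
# Interactive CPU session: 4 cores, 4 GB per core, one hour
salloc -A [charge-account] -q inferno -N 1 --ntasks-per-node=4 --mem-per-cpu=4G -t 1:00:00

# Interactive GPU session: 8 cores plus one GPU; the generic "gpu:1" GRES form
# is used here because GRES type names vary by site
salloc -A [charge-account] -q inferno -N 1 --ntasks-per-node=8 --gres=gpu:1 -t 1:00:00
```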