Phoenix Cluster Resources
Detailed Node Specs
- Most nodes include the following common features:
- Dual Intel Xeon Gold 6226 CPUs @ 2.7 GHz (24 cores/node)
- DDR4-2933 MHz DRAM
- InfiniBand HDR100 interconnect
- 40 cpu-large nodes were added on February 21, 2023, each with dual Intel Xeon Gold 6226R CPUs @ 2.9 GHz (32 cores/node) and 768 GB of RAM
- The cpu-amd nodes include the following common features (4 added November 7, 2022; 4 added February 23, 2023):
- Dual AMD Epyc 7713 CPUs @ 2.0 GHz (128 cores/node)
- 512 GB DDR4 DRAM
- 1.6 TB NVMe
- The gpu-a100 nodes include the following common features (5 added November 7, 2022; 6 added February 23, 2023; 1 added February 28, 2023):
- Dual AMD Epyc 7513 CPUs @ 2.6 GHz (64 cores/node)
- 512 GB DDR4 DRAM
- 2x NVIDIA A100 Tensor Core GPUs with 40 GB (6 nodes) or 80 GB (6 nodes) of GPU memory
- 1.6 TB NVMe
- The cpu-pmem node (added April 11, 2023) has a large amount of memory (1.5 TB), composed of 192 GB of DDR4-2933 ECC DRAM and 1.3125 TB of 2666 MHz DCPMM (Intel Optane persistent memory). It has 24 cores (dual Intel Xeon Gold 6226 CPUs @ 2.7 GHz) and an InfiniBand HDR100 interconnect.
- The following chart provides detailed specifications for the 1382 nodes that were part of the Phoenix-Slurm cluster migration for Phases 1 (October 10-12, 2022) through 6 (January 30-February 3, 2023). Details are also included for 8 additional cpu-amd nodes and 12 additional gpu-a100 nodes:
| Node Class | Quantity | RAM | Storage | Extra Unique Specs |
|---|---|---|---|---|
| CPU-192GB | 850 | 192 GB | 1.6 TB NVMe | |
| CPU-384GB | 239 | 384 GB | 1.6 TB NVMe | |
| CPU-768GB | 104 | 768 GB | 1.6 TB NVMe | |
| CPU-384GB-SAS | 75 | 192 GB | 8.0 TB SAS | |
| CPU-768GB-SAS | 4 | 384 GB | 8.0 TB SAS | |
| CPU-PMEM | 1 | 1.5 TB | 1.6 TB NVMe | |
| GPU-192GB-V100 | 21 | 192 GB | | 2x Tesla V100 (16GB or 32GB) |
| GPU-384GB-V100 | 27 | 384 GB | | 2x Tesla V100 (16GB or 32GB) |
| GPU-768GB-V100 | 5 | 768 GB | | 2x Tesla V100 (16GB) |
| GPU-384GB-RTX6000 | 32 | 384 GB | | 4x Quadro Pro RTX6000 (24GB) |
| GPU-768GB-RTX6000 | 5 | 768 GB | | 4x Quadro Pro RTX6000 (24GB) |
| CPU-512GB-AMD | 8 | 512 GB | 1.6 TB NVMe | 2x AMD Epyc 7713 |
| GPU-512GB-A100 | 12 | 512 GB | 1.6 TB NVMe | 2x AMD Epyc 7513, 2x Tensor Core A100 (40GB or 80GB) |
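As a quick arithmetic check of the cpu-pmem figures above, the DRAM and DCPMM capacities do sum to the stated 1.5 TB (in binary units):

```python
# Verify the cpu-pmem node's memory composition:
# 192 GB of DDR4 DRAM plus 1.3125 TB of Optane DCPMM.
dram_gb = 192
dcpmm_gb = 1.3125 * 1024  # 1.3125 TB expressed in GB (binary units)

total_gb = dram_gb + dcpmm_gb
print(total_gb)         # 1536.0 GB
print(total_gb / 1024)  # 1.5 TB
```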
- Jobs are assigned to Slurm partitions automatically based on your charge account (internal or external) and the most significant resources requested (GPUs, memory requirements, etc.).
- Jobs will only be charged if the inferno QOS is selected.
- The assigned Slurm partition determines how much users are charged, based on current rates.
- Slurm partitions include the following node classes and are assigned by the scheduler based on availability:
| Partition | Node Classes |
|---|---|
| cpu-small | CPU-192GB, CPU-384GB, CPU-384GB-SAS, CPU-768GB, CPU-768GB-SAS |
| cpu-medium | CPU-384GB, CPU-384GB-SAS, CPU-768GB, CPU-768GB-SAS |
| gpu-v100 | GPU-192GB-V100, GPU-384GB-V100, GPU-768GB-V100 |
- The partitions for external users have the same names with "-X" appended (e.g., cpu-small-X, cpu-medium-X).
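As an illustration of the points above, a minimal batch script might look like the following. The job name, charge account, and executable are placeholders; note that no partition is specified, since the scheduler assigns one automatically:

```shell
#!/bin/bash
#SBATCH --job-name=example           # placeholder job name
#SBATCH --account=gts-example        # placeholder charge account
#SBATCH -q inferno                   # jobs are only charged under the inferno QOS
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --mem-per-cpu=7G             # memory request helps determine the partition
#SBATCH --time=01:00:00

# No --partition flag: the scheduler picks one (e.g., cpu-small)
# based on the charge account and the resources requested above.
srun ./my_program                    # placeholder executable
```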
Job Submit Flowchart
- When submitting a job to the Slurm scheduler in interactive mode (using salloc) or with a batch script (using sbatch), the resources requested determine the partition assigned, as illustrated in the following flowchart:
The scheduler reserves 8 GB of memory for system processes, so the total memory available to jobs on a given node is reduced accordingly.
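To make the 8 GB reservation concrete, here is a small sketch (the helper name is illustrative, not a PACE tool) of the largest whole-node memory request that fits on each common node class:

```python
# The scheduler reserves 8 GB per node for system processes, so the
# largest memory request that fits is the node's RAM minus 8 GB.
SYSTEM_RESERVED_GB = 8

def max_job_memory_gb(node_ram_gb: int) -> int:
    """Largest memory request (in GB) that fits on a node with the given RAM."""
    return node_ram_gb - SYSTEM_RESERVED_GB

for ram in (192, 384, 768):
    print(f"{ram} GB node -> request at most {max_job_memory_gb(ram)} GB")
```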