Updated 2022-11-22
Participation in PACE¶
In January 2021, Georgia Tech formalized the adoption of a new method of funding research cyberinfrastructure. The new model aims to be more sustainable and flexible and has the full support of Georgia Tech’s executive leadership. The new model is based on actual consumption of resources – similar to that of commercial cloud offerings. In this new model, PIs are charged only for what they actually use rather than for some fixed capacity which may remain idle. Other advantages of this model include:
- Researchers have more flexibility to leverage new hardware releases instead of being restricted to hardware purchased at a specific point in time. Researchers can tailor computing resources to fit their scientific workflows rather than being restricted to the specific quantity and configuration of compute nodes purchased.
- Rapid provisioning, without waiting for a lengthy procurement process to complete.
- Insulation from failure of compute nodes. Compute nodes in need of repair can be taken out of service and repaired, allowing jobs to proceed using other compute nodes in the pool rather than decreasing the capacity available to a particular user.
- A free tier that provides any PI the equivalent of 10,000 CPU-hours (per month) on a 192GB compute node and one TB of project storage at no cost.
- PACE staff will monitor the time jobs wait in the queue and procure additional equipment as needed to keep wait times reasonably low.
- Note that a similar consumption model has been used successfully at other institutions, such as the University of Washington and UC San Diego, and this approach is also being developed by key sponsors (e.g., NSF’s cloudbank.org).
PACE has secured the necessary campus approvals to waive the F&A overhead on PACE services, as well as on purchases from commercial cloud providers, for proposals submitted to sponsors between January 1, 2021 and June 30, 2023. While not yet a formal commitment, this waiver is anticipated to continue. Complete details on the cost model and rates are published at https://pace.gatech.edu/update-gts-research-cyberinfrastructure-cost-model
Policies affecting the Phoenix cluster are maintained by the faculty-led PACE Advisory Committee.
PACE Services¶
- Phoenix cluster – The Phoenix cluster is the largest computational resource in PACE, supporting a wide range of scientific disciplines. It ranked #277 on the November 2020 Top500 (top500.org) list of worldwide supercomputers.
- Hive cluster – The Hive cluster is funded by the National Science Foundation (NSF) through Major Research Instrumentation (MRI) award 1828187, “MRI: Acquisition of an HPC System for Data-Driven Discovery in Computational Astrophysics, Biology, Chemistry, and Materials Science", and is dedicated to supporting research in accordance with the terms of that award.
- Firebird cluster – The Firebird cluster supports research involving Controlled Unclassified Information (CUI), including ITAR and other forms of protected data.
- ICE – The Instructional Cluster Environment is an educational resource, separate from production research resources, intended to give undergraduate and graduate students first-hand scientific computing experience, including HPC and GPU programming. It is configured with the same hardware and software as the Phoenix cluster to facilitate transitions between learning and research contexts. PACE manages both the PACE-ICE and CoC-ICE clusters.
- OSG – The Open Science Grid is a national, distributed computing partnership for data-intensive research. PACE operates resources that participate in OSG, supporting various projects including LIGO, VERITAS, and CTA. PACE received an NSF CC* award 1925541: “Integrating Georgia Tech into the Open Science Grid for Multi-Messenger Astrophysics”. With this award, PACE added CPU/GPU/Storage to the existing OSG capacity, as well as the first regional StashCache service that benefits all OSG institutions in the Southeast region, not just Georgia Tech.
- ScienceDMZ – The PACE ScienceDMZ uses Globus to facilitate high-speed data transfers between PACE and research cyberinfrastructure resources at collaborating institutions world-wide.
- Archive storage – Archival storage is a low-cost storage tier designed for long-term storage of data sets, and can be included as a key component of data management plans.
- Collaboration Services – The PACE team provides grant-submission support and partners with other schools, departments and researchers on awarded research projects. The PACE team has been awarded equipment grants, and PACE pursues opportunities that expand the resources and services for the benefit of the Institute.
What is included in a free tier account on the Phoenix cluster¶
All academic and research faculty (“PIs”) participating in PACE are automatically granted a baseline level of resources in addition to any funding they may bring. Each PI is provided 1 TB of project storage and credits equivalent to 10,000 CPU-hours (per month) on a 192GB compute node. These credits may be applied toward any computational resources (e.g., GPUs, high-memory nodes) available within the Phoenix cluster. All PACE users also have access to the preemptible backfill queue at no cost.
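As a rough illustration (not an official calculator), the monthly free-tier credit can be restated in node-hours on the 192GB, 24-core nodes described later on this page; the snippet below uses only figures stated here and assumes the credit is spent entirely on that node class.

```python
# Illustrative only: restate the monthly free-tier credit as node-hours on the
# 192GB nodes (24 cores each, per the compute node configurations below).
FREE_TIER_CPU_HOURS_PER_MONTH = 10_000
CORES_PER_192GB_NODE = 24

node_hours = FREE_TIER_CPU_HOURS_PER_MONTH / CORES_PER_192GB_NODE
node_days = node_hours / 24
print(f"~{node_hours:.0f} node-hours (~{node_days:.1f} node-days) per month")
# -> ~417 node-hours (~17.4 node-days) per month
```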
How to join PACE?¶
- Immediate access to a free tier account on the Phoenix cluster may be provided to a PI (i.e., any GT academic or research faculty) and their group.
- Cost: This option is provided at no cost to participants. NOTE: if the free tier account does not provide sufficient compute time or project storage, PIs may purchase additional compute time and/or project storage at the rates described in the Rate Study section.
- How to sign up: The PI should contact pace-support@oit.gatech.edu to request a free tier account on the Phoenix cluster. To add group members to the allocation, the PI may provide the group members’ GT Account names. An active PI account is required before users can be added to a group.
- If the request comes from someone other than the PI, please provide the following information:
- User’s GT Account name (e.g. gburdell3)
- User’s name
- Authorizing person’s (“PI”) GT Account name
- PI’s name
- PI’s direct approval for the account
- Academic faculty wishing to access the Hive cluster (NSF MRI funded cluster) should contact one of the five PIs for the cluster to request the addition of their group.
- PACE can only grant accounts to academic faculty with the approval of one of the following PIs:
- Srinivas Aluru
- Surya Kalidindi
- C. David Sherrill
- Deirdre Shoemaker
- Richard Vuduc
- Cost: Access to approved PIs and their groups is provided at no cost to participants.
- How to sign up: Academic faculty with approved Hive accounts may request the addition of students, postdocs, and research faculty working with their group. If you are a member of a research group and your PI has Hive access, please ask your PI to request a Hive account for you by contacting pace-support@oit.gatech.edu.
- For access to the instructional clusters (PACE-ICE/CoC-ICE):
- Faculty from a department other than CoC: please fill out this form to submit an application
- If you are teaching a CoC class, or your class is cross-listed with a CoC department: please contact David Mercer directly
- For all other PACE services listed above, please contact pace-support@oit.gatech.edu. The cost of services is provided in the section below, and an internal usage rate calculator is also available. When contacting PACE, please provide the following information:
- Inquiry about the service
- User's GT Account name (e.g., gburdell3)
- User’s name
- Authorizing person’s (“PI”) GT Account name
- PI’s name
- PI’s direct approval for the account
- Workday Worktag to charge
- Expiration date for funds
- Finance approver contact info
How much do services cost?¶
PACE offers the compute options described in the table below. A rate study has been submitted; it will formalize rates for PACE services and will be reviewed annually. A detailed breakdown of charges is provided in the table below, along with an internal usage rate calculator to help convert between research budget, compute resources, and account credits. Participants are encouraged to seek support from PACE in choosing cost-effective and appropriate hardware for their purposes.
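As a minimal sketch of the kind of conversion the internal usage rate calculator performs, the snippet below turns a budget figure into CPU-hours at the internal [GEN] cpu-small rate from the Rate Study table below; the budget amount is a made-up example.

```python
# Minimal sketch of a budget -> CPU-hours conversion; not the official calculator.
# Rate: internal [GEN] cpu-small rate from the Rate Study table below.
CPU_SMALL_INTERNAL_RATE = 0.0068  # $ per CPU-hour

budget_dollars = 5_000            # hypothetical research budget
cpu_hours = budget_dollars / CPU_SMALL_INTERNAL_RATE
print(f"${budget_dollars:,} buys roughly {cpu_hours:,.0f} CPU-hours at ${CPU_SMALL_INTERNAL_RATE}/CPUh")
# -> $5,000 buys roughly 735,294 CPU-hours at $0.0068/CPUh
```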
In CODA, PACE clusters are built on Intel "Cascade Lake" CPUs, with an HDR100 InfiniBand fabric providing a 100-gigabit high-performance network and 10-gigabit Ethernet on all compute nodes.
Compute Node Configurations¶
Participants have the option to run their workflows on the following resource configurations. Reducing the number of configurations helps PACE staff provide efficient service and focus efforts on projects of strategic value. This set of configurations is informed by more than a decade of faculty-driven PACE purchases.
- 192 GB Compute node
  - dual-socket, 12-core Intel Xeon Gold 6226 "Cascade Lake" @ 2.7 GHz (24 cores total)
  - 192 GB DDR4-2933 MHz memory
  - HDR100 InfiniBand
  - 10-gigabit Ethernet
- 384 GB Compute node
  - same configuration as the 192 GB Intel node, just more memory
- 768 GB Compute node
  - same configuration as the 192 GB Intel node, just more memory
- 384 GB Compute node w/ local disk
  - same configuration as the 384 GB Intel node, plus (qty 4) 2 TB SAS drives
- 768 GB Compute node w/ local disk
  - same configuration as the 768 GB Intel node, plus (qty 4) 2 TB SAS drives
Note
Due to licensing restrictions from NVIDIA, PACE cannot support consumer-grade NVIDIA GPUs (e.g., the GTX or Titan lines). For those desiring single-precision GPU performance (e.g., AI or machine learning workloads), we do support the NVIDIA RTX6000.
- 384 GB Compute node w/ single-precision GPU
  - same configuration as the 384 GB compute node, plus (qty 4) NVIDIA RTX6000 GPUs
- 768 GB Compute node w/ single-precision GPU
  - same configuration as the 768 GB compute node, plus (qty 4) NVIDIA RTX6000 GPUs
- 192 GB Compute node w/ double-precision GPU
  - same configuration as the 192 GB compute node, plus (qty 2) NVIDIA V100 GPUs
- 384 GB Compute node w/ double-precision GPU
  - same configuration as the 384 GB compute node, plus (qty 2) NVIDIA V100 GPUs
- 768 GB Compute node w/ double-precision GPU
  - same configuration as the 768 GB compute node, plus (qty 2) NVIDIA V100 GPUs
- Storage
  - provisioned from the shared Lustre filesystem
  - may be easily expanded on demand
  - smallest increment is 1 TB for 1 month, beyond the 1 TB offered through the free tier
  - storage may be purchased monthly (i.e., $6.67/TB per month) or as a lump sum paid up front (e.g., $240.12/TB for 3 years in advance); the arithmetic is sketched after this list
  - includes nightly backups
- Archive Storage
  - accessed through the Globus user interface; not directly accessible from PACE systems
  - no PACE account needed
  - may be easily expanded on demand
  - smallest increment is 1 TB for 1 month; no free tier
  - storage may be purchased monthly (i.e., $3.33/TB per month), or multiple years may be paid up front (e.g., $119.88/TB for 3 years)
  - triple replication for data reliability
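The prepaid lump-sum storage prices quoted above are simply the monthly per-TB rate multiplied over the term; the sketch below reproduces that arithmetic for the project and archive storage examples (illustrative only).

```python
# Illustrative check of the lump-sum storage prices quoted above:
# a prepaid term is the monthly per-TB rate times the number of months.
MONTHS_IN_3_YEARS = 36

project_monthly = 6.67  # $/TB per month, project storage (as quoted above)
archive_monthly = 3.33  # $/TB per month, archive storage (as quoted above)

print(f"Project storage, 3 years prepaid: ${project_monthly * MONTHS_IN_3_YEARS:.2f}/TB")  # $240.12/TB
print(f"Archive storage, 3 years prepaid: ${archive_monthly * MONTHS_IN_3_YEARS:.2f}/TB")  # $119.88/TB
```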
Rate Study¶
Clusters With Slurm¶
Posted Rates Effective October 10, 2022
| Moab Service | Slurm Service | Unit | Internal | External |
|---|---|---|---|---|
| [GEN] cpu-192GB | [GEN] cpu-small | CPUh | $0.0068 | $0.0232 |
| [GEN] cpu-384GB | [GEN] cpu-medium | CPUh | $0.0077 | $0.0218 |
| [GEN] cpu-768GB | [GEN] cpu-large | CPUh | $0.0091 | $0.0242 |
| n/a | [GEN] cpu-amd | CPUh | $0.0027 | $0.0053 |
| [GEN] cpu-384GB-SAS, [GEN] cpu-768GB-SAS | [GEN] cpu-sas | CPUh | $0.0091 | $0.0219 |
| n/a | [GEN] cpu-pmem | CPUh | $0.0131 | $0.0176 |
| [GEN] gpu-384GB-rtx6000, [GEN] gpu-768GB-rtx6000 | [GEN] gpu-rtx6000 | GPUh | $0.1491 | $0.2206 |
| [GEN] gpu-192GB-v100, [GEN] gpu-384GB-v100, [GEN] gpu-768GB-v100 | [GEN] gpu-v100 | GPUh | $0.2307 | $0.4225 |
| n/a | [GEN] gpu-a100 | GPUh | $0.2891 | $0.4373 |
| [CUI] Server-192GB | [CUI] cpu-small | CPUh | $0.0067 | $0.0395 |
| [CUI] Server-384GB | [CUI] cpu-medium | CPUh | $0.0103 | $0.0394 |
| [CUI] Server-768GB | [CUI] cpu-large | CPUh | $0.0127 | $0.0425 |
| [CUI] Server-384GB-SAS | [CUI] cpu-sas | CPUh | $0.0152 | $0.0498 |
| n/a | [CUI] gpu-a100 | GPUh | $0.2769 | $0.4373 |
| [Storage] Project storage | | TB/Mo | $5.67 | $7.78 |
| [Storage] CUI storage | | TB/Mo | $6.67 | $12.41 |
| [Storage] Hive storage | | TB/Mo | $6.60 | $6.81 |
| [Storage] LIGO/OSG storage | | TB/Mo | $2.61 | $3.13 |
| [Storage] IDEaS Data Repository | | TB/Mo | $0.0000 | $1.78 |
| [Storage] Research Data Repository | | TB/Mo | $1.67 | $8.79 |
| General Consulting | | Hour | $89 | $89 |
Note:
- [GEN] - Phoenix cluster
- [CUI] - Firebird cluster
- [Storage] - Storage on Phoenix, Firebird, and/or Hive clusters
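As a hedged example of how the posted Slurm rates translate to job-level costs, the sketch below estimates charges for two hypothetical Phoenix jobs at the internal rates in the table above; the job shapes are invented for illustration, and actual charges are determined by PACE accounting, not by this snippet.

```python
# Rough cost estimate for two hypothetical Phoenix jobs at the internal Slurm
# rates posted above; actual charges come from PACE accounting, not this sketch.
CPU_SMALL_RATE = 0.0068  # $ per CPU-hour, [GEN] cpu-small
GPU_V100_RATE = 0.2307   # $ per GPU-hour, [GEN] gpu-v100

cpu_job_cost = 24 * 10 * CPU_SMALL_RATE  # 24 cores for 10 wall-clock hours
gpu_job_cost = 2 * 10 * GPU_V100_RATE    # 2 V100 GPUs for 10 wall-clock hours

print(f"24-core, 10-hour CPU job: ~${cpu_job_cost:.2f}")  # ~$1.63
print(f"2-GPU, 10-hour V100 job:  ~${gpu_job_cost:.2f}")  # ~$4.61
```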
Clusters Without Slurm¶
Posted Rates Corrected February 16, 2022
| Service | Unit | Internal | External |
|---|---|---|---|
| [GEN] cpu-192GB | CPUh | $0.0068 | $0.0243 |
| [GEN] cpu-384GB | CPUh | $0.0077 | $0.0276 |
| [GEN] cpu-384GB-SAS | CPUh | $0.0091 | $0.0289 |
| [GEN] cpu-768GB | CPUh | $0.0091 | $0.0297 |
| [GEN] cpu-768GB-SAS | CPUh | $0.0119 | $0.0340 |
| [GEN] gpu-192GB-v100 | GPUh | $0.2307 | $0.4415 |
| [GEN] gpu-384GB-RTX6000 | GPUh | $0.1491 | $0.3018 |
| [GEN] gpu-384GB-v100 | GPUh | $0.2409 | $0.45083 |
| [GEN] gpu-768GB-RTX6000 | GPUh | $0.1551 | $0.2832 |
| [GEN] gpu-768GB-v100 | GPUh | $0.2627 | $0.4944 |
| [CUI] Server-192GB | CPUh | $0.0067 | $0.0569 |
| [CUI] Server-384GB | CPUh | $0.0103 | $0.0639 |
| [CUI] Server-384GB-SAS | CPUh | $0.0152 | $0.0816 |
| [CUI] Server-768GB | CPUh | $0.0127 | $0.0650 |
| [Storage] Project Storage | TB/Mo | $6.67 | $7.68 |
| [Storage] CUI | Drive Bay/Mo | $41.22 | $41.22 |
| [Storage] Archival | TB/Mo | $3.33 | $4.93 |
| General Consulting | Hour | $98 | $98 |
PACE Participation Calculator¶
The interactive participation calculator is not reproduced here. On the live page, it accepts a dollar amount and reports the equivalent nodes purchased for 5 years of compute, as well as the CPU hours and GPU hours available on each node class for a given credits allocation.
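As an unofficial approximation of the conversions the calculator performs, the sketch below turns a credit amount into CPU-hours on a few Phoenix node classes using the internal Slurm rates above; the "equivalent nodes for 5 years" figure assumes continuous use of a 24-core node, which is an assumption of this sketch rather than a published formula.

```python
# Unofficial approximation of the participation calculator's conversions.
# Rates are the internal Slurm rates posted above; the "equivalent nodes for
# 5 years" formula (continuous use of a 24-core node) is an assumption here.
INTERNAL_CPU_RATES = {          # $ per CPU-hour
    "cpu-small (192GB)": 0.0068,
    "cpu-medium (384GB)": 0.0077,
    "cpu-large (768GB)": 0.0091,
}
CORES_PER_NODE = 24
HOURS_IN_5_YEARS = 5 * 365 * 24

credits_dollars = 10_000        # hypothetical credits allocation

for node_class, rate in INTERNAL_CPU_RATES.items():
    print(f"{node_class}: {credits_dollars / rate:,.0f} CPU-hours")

nodes = credits_dollars / (INTERNAL_CPU_RATES["cpu-small (192GB)"] * CORES_PER_NODE * HOURS_IN_5_YEARS)
print(f"~{nodes:.2f} equivalent 192GB nodes running continuously for 5 years")
```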