Updated 2020-07-27

testflight-coda Resources

  • You can see the resources available on a queue at any given time with
pace-check-queue <insert queue name>
  • You can see even more detailed information about each node in a queue with
pace-check-queue <insert queue name> -s

Note

pace-check-queue updates every 15 minutes, so changes in resource availability may not be reflected immediately.
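If you want to call the two commands above from a script, the following is a minimal sketch rather than an official PACE tool. The function names `build_check_queue_command` and `check_queue` are hypothetical; only the `pace-check-queue <queue name>` form and the `-s` flag come from this page. The sketch assumes the command is on your PATH and returns None otherwise.

```python
import shutil
import subprocess


def build_check_queue_command(queue, detailed=False):
    """Build the argument list for pace-check-queue (hypothetical helper)."""
    cmd = ["pace-check-queue", queue]
    if detailed:
        # -s requests the more detailed per-node view described above.
        cmd.append("-s")
    return cmd


def check_queue(queue, detailed=False):
    """Run pace-check-queue and return its text output.

    Returns None when the command is not on PATH, for example when this
    is run somewhere other than a cluster login node.
    """
    cmd = build_check_queue_command(queue, detailed)
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(check_queue("testflight-coda", detailed=True))
```

Because the output refreshes only every 15 minutes, polling more often than that should not return newer information.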

Testflight-Coda (CPU Only)

Queue | Number of Nodes | Number of Cores per Node | Usable Memory per Node | Local Storage per Node | Wall Time Limit
testflight-coda | 10 | 24 | 192 GB | 1.4 TB | 12 hours

GPU Testflight Queues

Queue | Number of Nodes | Number of Cores per Node | Number of GPUs per Node | GPU Type | Usable Memory per Node
tf-coda-gpu-dp | 3 | 24 | 2 | Nvidia Tesla V100 16 GB | 384 GB
tf-coda-gpu-sp | 2 | 24 | 4 | Nvidia Quadro RTX6000 | 384 GB

Detailed Information

  • testflight-coda:
    • Dual Intel Xeon Gold 6226 processors (24 cores / node)
    • 192 GB DDR4-2933 memory
    • 1.4 TB local storage
    • InfiniBand EDR interface between nodes and Lustre storage
  • tf-coda-gpu-dp:
    • Stands for tf-coda-gpu double precision
    • Useful for double precision GPU jobs such as accelerated MC simulation or CFD calculations.
    • Each node has 2x Nvidia Tesla V100 16 GB GPUs and 384 GB of memory
  • tf-coda-gpu-sp:
    • Stands for tf-coda-gpu single precision
    • Useful for single precision GPU jobs such as machine learning / deep learning workflows.
    • Each node has 4x Nvidia Quadro RTX6000 GPUs and 384 GB of memory

Coda Testflight Queues General Information

  • This is a shared resource for everyone to test their workflows before migrating to Coda
    • Job submissions are meant to test applications and code in the new software and hardware environment
    • Tests should be comprehensive enough to check accuracy and performance at scales representative of production workloads, e.g. running an MPI job on 2-4 nodes for distributed computations rather than testing exclusively on a single node
    • Users found to be running production workflows will be notified and asked to modify their job submissions accordingly
  • Jobs run on testflight-coda have a wall clock limit of 12 hours
  • Users may have up to 10 jobs enqueued (running, idle, or blocked) at a time
    • Only 1 queued job (idle) is eligible to accumulate priority; however, if resources are available, more than one job may run at a time
  • A single user may concurrently utilize at most 160 processors
    • Additional jobs exceeding this limit will wait in the queue until there are sufficient processors to stay below this threshold
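As a rough illustration of how the limits above combine, the sketch below checks a planned job against them before you submit. It is not part of the scheduler and does not talk to it: `JobRequest` and `check_request` are hypothetical names, and the constants are copied from this page (10 nodes of 24 cores, 12 hour wall time, 10 enqueued jobs, 160 processors per user).

```python
from dataclasses import dataclass

# Limits quoted from the testflight-coda description on this page.
CORES_PER_NODE = 24
TOTAL_NODES = 10
MAX_WALLTIME_HOURS = 12
MAX_JOBS_PER_USER = 10
MAX_PROCS_PER_USER = 160


@dataclass
class JobRequest:
    """Hypothetical description of one planned job (not a scheduler object)."""
    nodes: int
    cores_per_node: int
    walltime_hours: float

    @property
    def procs(self) -> int:
        return self.nodes * self.cores_per_node


def check_request(job, procs_in_use=0, jobs_enqueued=0):
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    if job.nodes > TOTAL_NODES:
        problems.append(f"requests {job.nodes} nodes; the queue has {TOTAL_NODES}")
    if job.cores_per_node > CORES_PER_NODE:
        problems.append(f"requests {job.cores_per_node} cores per node; nodes have {CORES_PER_NODE}")
    if job.walltime_hours > MAX_WALLTIME_HOURS:
        problems.append(f"wall time {job.walltime_hours} h exceeds the {MAX_WALLTIME_HOURS} h limit")
    if jobs_enqueued + 1 > MAX_JOBS_PER_USER:
        problems.append(f"would exceed {MAX_JOBS_PER_USER} enqueued jobs")
    if procs_in_use + job.procs > MAX_PROCS_PER_USER:
        # Per the text above, such a job is not rejected; it waits in the queue.
        problems.append(
            f"{procs_in_use + job.procs} processors exceeds the "
            f"{MAX_PROCS_PER_USER} per-user limit, so the job would wait"
        )
    return problems


if __name__ == "__main__":
    for problem in check_request(JobRequest(nodes=7, cores_per_node=24, walltime_hours=2)):
        print(problem)
```

For example, a 4-node job using all 24 cores per node needs 96 processors and fits, while a 7-node job needs 168 and would exceed the 160-processor limit.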

Note

While the hardware may not be identical to that which will be used for your cluster in Coda, both CPU architectures are from the Cascade Lake family, so many of the optimizations will be valid for either system.