Updated 2022-01-10

(Phoenix) Testflight-Coda Environment for RHEL7.9

Testflight-Coda is designed for Phoenix users to test their workflows in advance of PACE's RHEL7.9 operating system upgrade in February 2022.

(Phoenix) Testflight-Coda Login Node

Access Testflight-Coda by ssh to login-testflight-coda.pace.gatech.edu. You will arrive in your Phoenix home directory, with access to your Phoenix project & scratch storage as well.

Note

If you accessed the former login-testflight-coda during the Rich-to-Phoenix migration from the same computer, you may need to take an extra step to log in. Upon attempting to ssh to login-testflight-coda, you may receive a warning of a changed host key, and your login may be denied by your computer as a security measure.
  • To resolve this, please open your known_hosts file on your local computer with a text editor. The known_hosts file location may be indicated in the warning message. On Mac or Linux, it is generally located at ~/.ssh/known_hosts. On Windows, it may be at C:\Users\<username>\.ssh\known_hosts.
  • Delete the line containing login-testflight-coda.pace.gatech.edu and save the file.
  • Then, make a new ssh connection, which will regenerate the line with the updated host key.
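As an alternative to hand-editing the file, OpenSSH's ssh-keygen can remove the stale entry for you. A minimal sketch, assuming the default known_hosts location on Mac or Linux (adjust the -f path if yours differs):

```shell
# Remove any saved host key for login-testflight-coda.pace.gatech.edu
# from the given known_hosts file; ssh-keygen saves a backup copy
# alongside it as known_hosts.old.
ssh-keygen -R login-testflight-coda.pace.gatech.edu -f ~/.ssh/known_hosts
```

The next ssh connection will then prompt you to accept the updated host key, just as with the manual edit.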

The login node runs RHEL7.9 and has the updated PACE software stack modules, so you can use it to recompile your code in the new environment.

Modifying PBS Scripts

Researchers will need to make several modifications to their Phoenix PBS scripts in order for them to run on the Testflight scheduler. Since these are temporary changes for testing, make a copy of your normal script and edit the copy.

There is no need to modify your module load commands, as the upgraded PACE-provided modules have the same names and versions as the existing ones. If you use the Testflight environment, you will automatically load the upgraded modules.

Choosing a Queue (-q)

Unlike Phoenix, Testflight-Coda has a separate queue for each node type. When submitting a job, please specify the appropriate queue for the resources you need with the -q directive.

  • testflight for CPU jobs
  • testflight-rtx for GPU jobs employing an Nvidia Quadro RTX 6000 GPU
  • testflight-v100 for GPU jobs employing an Nvidia Tesla V100 GPU

Note

On Phoenix, the default GPU is a Tesla V100. If you do not ordinarily specify a GPU type on Phoenix, use the V100 queue here.

Accounting (-A)

There is no charge accounting on Testflight-Coda. If you leave the -A flag with an account in your script, it will be ignored, and you will not be charged for usage.

Resource requests (-l)

  • Testflight is designed for testing code, not production, so please run only a short test to validate your software and workflow. Jobs are limited to 3 hours and 48 CPUs. Adjust your -l resource requests to be within this limit.

  • Since the upgrade primarily affects MPI, be sure to use multiple nodes if you are testing an MPI-dependent code. You can ensure multiple distinct nodes are assigned to your job by requesting more resources in total than are present on a single node (24 CPUs; for GPUs, 2 V100s or 4 RTX 6000s).

  • To request a GPU, include the gpus=1 (or gpus=2) syntax, but do not specify a GPU type under the -l flag. Instead, choose your GPU type by the queue, as noted above.

  • Each user is limited to using 2 GPUs at a time (across all jobs) and to 10 running or queued jobs.

Example PBS Directives

For a simple test, swap the directives below into a copy of your existing script in place of the ones you generally use.

To test MPI code across two full nodes for one hour:

#PBS -q testflight
#PBS -l nodes=2:ppn=24
#PBS -l pmem=7GB
#PBS -l walltime=1:00:00

Note

Be sure to adjust any line in your script (e.g., mpirun commands) that includes the number of cores available to your job, so that it matches the scheduler request.
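One way to keep the two in sync is to derive the core count from the scheduler's node file inside the job script rather than hard-coding it. A sketch, where my_mpi_app is a placeholder for your executable:

```shell
# $PBS_NODEFILE lists one line per core assigned to the job, so its
# line count equals the total cores requested (48 for nodes=2:ppn=24).
# Deriving -np this way keeps the mpirun call matched to the #PBS -l
# request even if you later change the resource lines.
NP=$(wc -l < "$PBS_NODEFILE")
mpirun -np "$NP" ./my_mpi_app
```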

To test a GPU+MPI job on 2 GPUs across 2 nodes for 1 hour, you can use these lines:

#PBS -q testflight-v100
#PBS -l nodes=2:ppn=24:gpus=1
#PBS -l pmem=7GB
#PBS -l walltime=1:00:00