Updated 2021-07-13

Introduction

Welcome to the Firebird cluster, the PACE-managed Controlled Unclassified Information (CUI) High-Performance Computing resource. If you have a project that involves protected data and would like to explore your options with PACE, please start a conversation via PACE-support.

System Overview

The Firebird cluster is a heterogeneous HPC cluster designed to meet the standards defined in NIST Special Publication 800-171. Each CUI project has its own storage (in 20 TB increments) for controlled data and applications, a dedicated headnode into which users log in and submit jobs, and a pool of compute nodes that limit access to a single project at any time.

CLX Compute

Each compute node is equipped with the same base hardware:

  • Dual-socket Intel Xeon Gold 6226 CPUs (12 core, 2.7 GHz, 19.25 MB cache)
  • 192/384/768 GB DDR4-2933 Registered ECC Memory
  • Single channel Mellanox ConnectX-5 HDR100 InfiniBand NIC
  • 1.6 TB Intel NVMe Datacenter SSD

Additionally, some nodes provide expanded local storage with an array of SAS disks, or GPUs for accelerated computation. The compute node resources required for each job are specified at job submission, and the scheduler allocates matching nodes accordingly.

Headnodes

The CUI headnodes are virtual machines dedicated to each project. As they reside on the same hardware as the compute nodes and share access to network storage volumes, they provide an environment in which batch job workflows can be tested and verified before submission. However, there are a few considerations to keep in mind when working on the headnode:

  1. Headnodes are shared by all users within a group. Good stewardship means fairly sharing this resource, and any process that could impede the work of others will be terminated. Please review our headnode use policy for full details.
  2. As virtual machines meant for lightweight staging and testing, headnodes have less memory and fewer CPUs than a compute node. Even moderately sized tests will not run meaningfully on the headnode and should instead be run within a job submission.
  3. The virtual CPUs do not provide the full set of instructions available on the compute node processors; thus, to best optimize compiled code, it should be built on the compute node.

Note

Although the underlying hardware reflects the compute node architecture, the environment is not identical, and thus optimized software compilation should be done on compute nodes.

Caution

While the headnodes provide a convenient environment to test applications before committing to a job, their limited virtual resources make them unsuitable for fully vetting a workflow. Consider interactive jobs and iterative batch development to refine job scaling as needed.

Network

The headnodes, compute nodes, and storage servers are all connected via an isolated 100 Gbps InfiniBand fabric; data can move freely within the CUI environment on this high-bandwidth fabric, allowing for efficient file IO and message passing during jobs.

File Systems

In addition to the local scratch storage available on the compute nodes, each CUI project has access to two globally mounted volumes:

  • home: Each user has a 10 GB quota in this ZFS partition
  • data: Each project has a fixed quota in this ZFS partition. This volume also contains any project-specific CUI software.

These storage volumes are connected to head and compute nodes via the same InfiniBand fabric used for distributed computations, allowing for high-bandwidth access during jobs. To learn more about your home and data quotas, run the pace-quota command.

Important

At this time, we can accommodate total project storage up to 20 TB.

Accessing the System

To access the Firebird cluster, you must be on the GlobalProtect PACE VPN. Once connected, you can log in to the headnode and submit jobs.

PACE VPN Access

The PACE VPN can be accessed using the GlobalProtect VPN client. Please use the Office of Information Technology's (OIT) directions to install the GlobalProtect VPN Client for Windows or macOS.

Note

Palo Alto Networks provides documentation for a Linux client, although this is not supported by OIT at this time.

Once installed, configure your VPN client to connect on the vpn.gatech.edu portal to the pace-ext-gw.vpn.gatech.edu gateway.

Important

The Cisco AnyConnect VPN client is not supported on Firebird. Please use the GlobalProtect VPN Client instead.

Headnode Access

Each CUI project has a designated headnode, located at login-<project>.pace.gatech.edu, from which data can be accessed and compute jobs can be submitted. Once connected to the PACE VPN, use SSH with your GT credentials to log in to the headnode. For example, user George Burdell can log in to his CUI project "project" via

ssh gburdell3@login-project.pace.gatech.edu
# Replace gburdell3 with your username
# Replace project with your project name

Important

Be sure to use the correct credentials for the appropriate CUI project; five failed login attempts will lock your account out for 30 minutes.

CUI File Migration

Because of the sensitive nature of data and programs in the CUI environment, PACE will not migrate your files from the old environment in the Rich data center to the new environment in the Coda data center. Instead, we provide a dedicated queue for data migration jobs, allowing them to run with less impact on other jobs. Principal Investigators and researchers should migrate files and data within the published window for their group to ensure that data is not lost when the old Rich environment is decommissioned.

To move your files effectively, please use rsync rather than scp; scp can be much slower and cannot recover from interruptions.

Setup

Since copying a large quantity of data may take a significant amount of time, please begin at least 1 week before your migration date. While rsync is restartable, it is preferable to avoid interruption by running the transfer as a scheduled job. The mapping of your directories in Rich CUI to the directories on the Firebird cluster is shown below, where <project> should be replaced with your project name and <user> is your username:

Directory Name  Directory Path in Rich CUI  Directory Path in Firebird
Home            /nv/h<project>/<user>       /storage/home/hc-<project>/<user>
Project         /nv/p<project>/<user>       /storage/home/c-<project>/<user>
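As a quick sanity check of this mapping, the Firebird paths can be generated from a project and user name with a couple of shell helpers (firebird_home and firebird_project are illustrative names, not PACE commands):

```shell
# Hypothetical helpers: build the Firebird destination paths for a given
# project and user, following the mapping table above.
firebird_home()    { echo "/storage/home/hc-$1/$2"; }   # home volume
firebird_project() { echo "/storage/home/c-$1/$2"; }    # project volume

firebird_home myproj gburdell3      # -> /storage/home/hc-myproj/gburdell3
firebird_project myproj gburdell3   # -> /storage/home/c-myproj/gburdell3
```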

Login to the headnode on Firebird, and create a "from-rich" directory in both your home and project directories:

ssh <username>@login-<project>.pace.gatech.edu
mkdir -p ~/from-rich
mkdir -p ~/cui_data/from-rich

Additionally, to use rsync via a batch job, you will need to enable passwordless login between hosts by copying your public key to the remote host. This can be done by running the ssh-copy-id command from the Rich cluster:

ssh-copy-id -i ~/.ssh/id_rsa.pub <username>@login-<project>

where you should replace your username and project as appropriate. You can verify that passwordless login is enabled between the two by simply trying to ssh to login-<project> and verifying that you are not prompted for a password.

Example job

Important

File transfer jobs should be submitted from the Rich CUI login nodes to push data to the new Firebird cluster.

You can submit a PBS script like this example. Please specify a queue to which you have access and set a sufficient walltime. Adjust the file paths as indicated to select the data to transfer.

#PBS -N rsync-richcui
#PBS -l nodes=1:ppn=1
#PBS -l walltime=1:00:00
#PBS -q <queue>
#PBS -j oe

cd $PBS_O_WORKDIR

# Set shortcut variables for the destination directories.
# Please ensure that they are within your project storage.
#
HDEST=/storage/home/hc-<project>/<user>/from-rich
PDEST=/storage/home/hc-<project>/<user>/cui_data/from-rich


# The rsync commands below copy your home directory (excluding .ssh)
# and your cui_data project directory. Adjust the paths or add
# additional rsync lines as needed to select the data to transfer.

# Go to home directory on Rich CUI and copy non-dot files
cd
rsync -avuHWP * <username>@login-<project>:"${HDEST}"/
rsync -avuHWP --exclude '.ssh' .??* <username>@login-<project>:"${HDEST}"/
# Copy project information
rsync -avuHWP cui_data/. <username>@login-<project>:"${PDEST}"/

# If the session terminates, you can restart and it will only synchronize
# what hasn't already been done.

Warning

Do not attempt to run a job on Firebird while transferring data, as this may lead to data corruption issues.

Job Submissions on Firebird

Access to Firebird's computational resources is achieved through job submissions. From the headnode, you specify the resources needed to run your job, and the scheduler will determine when and where the job will run. Jobs can be run interactively, providing access to the terminal on the compute node, or through batch submissions, where a script is run non-interactively and results are logged to a file.

Job Accounting

Firebird's accounting system is based on the most significant processing unit on the compute node:

  • On CPU and CPU-SAS nodes, charge rates are based on CPU-hours for the duration of the job (total number of procs * walltime used)
  • On GPU-RTX6000 nodes, charge rates are based on GPU-hours for the duration of the job (total number of GPUs * walltime used)

The current quantities and charge rates for the CUI resources are:

Service Node             Quantity  Charge Unit  Internal  External
[CUI] CPU-192GB                12         CPUh   $0.0067   $0.0569
[CUI] CPU-384GB                28         CPUh   $0.0103   $0.0639
[CUI] CPU-768GB                 4         CPUh   $0.0127   $0.0650
[CUI] CPU-384GB-SAS             4         CPUh   $0.0152   $0.0816
[CUI] GPU-768GB-RTX6000         1         GPUh   $0.1551   $0.4944
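As an illustration of the CPU-hour formula above, the estimated internal cost of a job is the total number of processors times the walltime times the rate from the table (the numbers here are an example, not a quote):

```shell
# Illustrative cost estimate using the internal CPU-384GB rate above:
# charge = total procs * walltime hours * rate per CPU-hour.
procs=24; hours=10; rate=0.0103   # e.g. 2 nodes x 12 cores for 10 hours
awk -v p="$procs" -v h="$hours" -v r="$rate" \
    'BEGIN { printf "estimated charge: $%.4f\n", p * h * r }'
# prints: estimated charge: $2.4720
```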

When submitting a job, the account to which the job should be charged must be specified using the -A flag, either as a command line option or as a PBS directive. The scheduler will verify that the account has sufficient funds available to run the full length of the job before accepting it, and a corresponding lien will be placed on the account once the job starts running. When the job completes, the scheduler will finalize the charge based on the CPU/GPU-time consumed.

Tip

Because of CUI restrictions, only a single project may occupy a given compute node at any moment, so the scheduler will charge for the whole node even if the job uses only a fraction of it. However, if multiple jobs from the same project fit on one compute node, the scheduler will run them on the same resource; the total charge then amounts to the time the project occupies the node multiplied by the node's available compute resources.

To check available accounts and their balance, use the mam-balance command:

[gburdell3@login-project ~]$ mam-balance
Name                    Balance     Reserved Available
----------------------- ----------- -------- -----------
GT-gburdell3-project     47123.9056   3.2160  47120.6896   

The output from this command will tell you the name of the account, the current balance, the total amount reserved in liens for running jobs, and the effective balance available for additional jobs.
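For scripted checks, the Available column can be extracted from mam-balance output with awk. This sketch inlines the sample output above rather than calling the real command; on the headnode you would pipe mam-balance itself into the same awk filter:

```shell
# Sketch: pull the account name and Available column out of mam-balance
# output. The sample output is inlined here for illustration.
mam_output='Name                    Balance     Reserved Available
----------------------- ----------- -------- -----------
GT-gburdell3-project     47123.9056   3.2160  47120.6896'

# Skip the two header lines, then print fields 1 (Name) and 4 (Available).
echo "$mam_output" | awk 'NR > 2 { print $1, "available:", $4 }'
# prints: GT-gburdell3-project available: 47120.6896
```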

Note

The mam-balance command is only available from the headnode.

Job Scheduler

Firebird's job scheduler consists of the Moab Workload Manager, which is responsible for matching jobs to nodes, and the Torque Resource Manager, which provides the low-level functionality to manage jobs. Simply put, the qsub command is used to submit jobs to Torque, which enqueues them in Moab to determine when/where to run them, executes them accordingly, and then returns the results to the user.

Firebird Queues

Unlike previous PACE-managed clusters, there are only two queues on the Firebird cluster: blaze and cinders. Although the two queues provide access to the same resource pool, the job policies are quite different. Please note that the queue policies are current as of 01/15/2020, and are subject to change to best meet the needs of the research community.

Note

Project specific queues with different base priority are not supported on Firebird.

Blaze: Primary Queue for Production Workflow

Blaze is the main, and default, queue for the Firebird cluster; all jobs submitted to the scheduler will be routed here unless otherwise specified. Jobs in this queue consume account credits, but benefit from a greater job limit, higher priority, and longer wallclock limits. This queue should be the main mechanism for production workflows, as jobs here will start sooner and cannot be preempted. For jobs in the blaze queue, the following policies apply:

  • Base priority = 250,000
  • Max jobs per user = 500
    • Max eligible jobs per user = 500*
  • Wallclock limit = 21 days*

*Due to high demand for GPU resources, the wallclock limit is 3 days and the maximum eligible jobs per user is 10 for these resources.

Cinders: Backfill Queue for Supplementary Compute

Cinders is the backfill queue on the Firebird cluster; jobs submitted here take advantage of opportunistic cycles remaining after blaze jobs have been accommodated. Jobs in this queue have a small job limit, the lowest priority, and shorter wallclock limits. Additionally, they are eligible for preemption: if a blaze job is waiting for the resources being consumed, the cinders job will be cancelled and the user will need to resubmit to continue their work. However, while jobs submitted to this queue still require an associated account, no credits will be consumed. As such, the cinders queue should not be used for critical workflows facing an imminent deadline. For jobs in the cinders queue, the following policies apply:

  • Base priority = 0
  • Max jobs per user = 50
    • Max eligible jobs per user = 1
  • Wallclock limit = 8 hours
  • Eligible for preemption after 1 hour

Submitting Jobs with qsub

Important

The use of msub is disabled, as the resource manager, Torque, is responsible for job management. Please ensure applications are updated to use the qsub command accordingly.

To submit jobs to the Firebird cluster, use the qsub command. At a minimum, you must specify the account to which the job should be charged (-A <account>) and either the interactive job flag (-I) or a batch submission script. By default, all jobs will run in the Blaze queue, so there is no need to specify a destination queue unless you want your job to run via the Cinders queue.

Targeting a Resource Pool

The scheduler places your job in the appropriate resource pool based on the requested resources. To determine the most appropriate compute node class for the job, the memory per processor (specified either with pmem or as mem/nprocs) and any additional features (e.g. GPU or SAS) are considered. The following guidelines will help direct your job to the appropriate resource pool:

  • CPU-192GB: less than 8 GB per processor, no features (-l nodes=1:ppn=4,pmem=6gb)
  • CPU-384GB: less than 16 GB per processor, no features (-l nodes=1:ppn=4,pmem=9gb)
  • CPU-768GB: more than 16 GB per processor, no features (-l nodes=1:ppn=4,pmem=22gb)
  • CPU-384GB-SAS: less than 16 GB per processor, "local-sas" feature request (-l nodes=1:ppn=4:local-sas,pmem=5gb)
  • GPU-768GB-RTX6000: more than 16 GB per processor, at least one GPU with the RTX6000 feature (-l nodes=1:ppn=6:gpus=1:RTX6000,mem=128gb)
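Since pool selection hinges on memory per processor, it can help to compute mem/nprocs before submitting. The helper below (per_proc_pool is an illustrative name, not a PACE command) applies the CPU-pool thresholds from the list above to a total memory request in GB and a processor count:

```shell
# Sketch: compute memory per processor for a mem/nprocs request and map it
# to a CPU pool using the thresholds listed above (no feature requests).
per_proc_pool() {
    awk -v mem="$1" -v procs="$2" 'BEGIN {
        pp = mem / procs
        if (pp < 8)       pool = "CPU-192GB"
        else if (pp < 16) pool = "CPU-384GB"
        else              pool = "CPU-768GB"
        printf "%.1f GB/proc -> %s\n", pp, pool
    }'
}

per_proc_pool 24 4    # e.g. -l nodes=1:ppn=4,mem=24gb  -> 6.0 GB/proc -> CPU-192GB
per_proc_pool 128 6   # e.g. -l nodes=1:ppn=6,mem=128gb -> 21.3 GB/proc -> CPU-768GB
```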

Example

The following PBS script is an example template for job submissions on the Firebird cluster. You can download the script here. Make sure to edit the appropriate fields before attempting to submit a job.

#PBS -N [INSERT NAME]           # job name
#PBS -A [Account]               # account to which job is charged
#PBS -l nodes=2:ppn=4           # number of nodes and cores per node required
#PBS -l pmem=2gb                # memory per core
#PBS -l walltime=15:00          # duration of the job (ex: 15 min)
#PBS -j oe                      # combine output and error messages into 1 file
#PBS -o [INSERT NAME].out       # output file name
#PBS -m abe                     # event notification, set to email on start, end, or fail
#PBS -M [INSERT EMAIL ADDRESS]  # email to send notifications to

cd $PBS_O_WORKDIR 

for i in {1..5}
do 
    echo "Test $i"
done

CUI Software Environment

Similar to our other research clusters, the CUI software stack is a modular environment based on Lmod, with a default compiler and MPI environment that mirrors the current Phoenix cluster default configuration. For more information on using the modular software system, please reference the modules guide.