Updated 2020-01-16

How to Use Local Scratch Storage

What is Local Scratch Storage?

  • Every node comes with a tmp directory that is local to the node.
  • Because it is a local disk mounted on the node, it is faster than network storage (/data, /home, etc.)
  • Most nodes have limited local disk space (< 20gb)
  • Because of this limited disk space, a node may be forced offline if a large tmp directory is created on the node and not deleted once the job ends. As a solution, /scratch directories have been created on the nodes to support the use of tmp directories. The per-job /scratch directory is automatically deleted after the job has finished running

Scratch Directory on Node Explained

  • Each node has its own /scratch directory.

Warning

The /scratch directory on the node is not the same as the ~/scratch directory in your /home folder

  • Every job creates a directory on the node under /scratch named after the job ID, for example:
    • /scratch/20986925.shared-sched.pace.gatech.edu
  • This directory is automatically deleted when the job completes, so you don't have to worry about deleting it yourself
  • The environment variable ${TMPDIR} is assigned to this path, and ${TMPDIR} is how you reference the directory in your code
  • Anywhere your code requires a tmp directory (normally /tmp), replace it with ${TMPDIR}
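As a quick check, the variable can be inspected from inside a shell. This is a minimal sketch, not PACE-specific: the /tmp fallback is an assumption added so the snippet also runs outside a PBS job, where TMPDIR may be unset.

```shell
#!/bin/bash
# Inside a PBS job, ${TMPDIR} points at the per-job /scratch directory
# on the node. The /tmp fallback is for running outside a job only.
JOB_TMP="${TMPDIR:-/tmp}"
echo "job-local scratch directory: ${JOB_TMP}"
```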

How to Use Scratch Directory + Example

Important

In your code, use ${TMPDIR} where you use /tmp, or anywhere you need to use local scratch storage

  • Since ${TMPDIR} is automatically deleted after the job, it prevents nodes from running out of local disk space due to leftover tmp files
  • To update your code:
    • Wherever you need to access a tmp directory, use ${TMPDIR}
python code.py > /tmp/intermediate_results.txt #WRONG

would be changed to:

python code.py > ${TMPDIR}/intermediate_results.txt #RIGHT

Example PBS Script

#PBS -N tmpTest
#PBS -l nodes=1:ppn=2
#PBS -l walltime=1:00
#PBS -l file=10g
#PBS -q force-6
#PBS -j oe
#PBS -o tmpTest.out

cd ${TMPDIR}
pwd
echo "this won't do anything but uses the right tmp directory" > ${TMPDIR}/intermediate_results.txt

Warning

The directive #PBS -l file=10g reserves 10gb per core, not per job. In this PBS script, 20gb of total scratch storage would be reserved (10gb per core x 1 node x 2 cores per node). Keep this in mind, as it is easy to accidentally request an excessive amount of storage, which could prevent your job from running.

  • The #PBS directives are standard; more on the directives can be found in the PBS guide
  • cd ${TMPDIR} enters the /scratch directory on the node the job is running on, and pwd prints this path
  • The echo line prints text to a file in the tmp directory on the node. Notice how ${TMPDIR} is used, NOT /tmp. This file is deleted after the job runs, so no actual results from this line will appear in the .out file

Results

  • From the same directory you ran qsub from, use a text editor to view the .out file, such as vim tmpTest.out. The results should look something like this:
Job name:   tmpTest
Queue:      force-6
End PBS Prologue Mon Mar  4 16:09:00 EST 2019
---------------------------------------
/scratch/24537771.shared-sched.pace.gatech.edu
---------------------------------------
Begin PBS Epilogue Mon Mar  4 16:09:01 EST 2019
Job ID:     24537771.shared-sched.pace.gatech.edu

  • The path that ${TMPDIR} represents is printed

Ensure Local Storage Availability for Job

  • PACE clusters consist of many nodes with different specifications, including local drive capacities ranging from 20gb to 7TB
  • The #PBS directive -l file=n allows you to request local storage for your job. For example, if you need 10gb of space:
    • #PBS -l file=10g. Remember: this storage amount is per core, not per job.
  • This directive ensures your job is allocated on a node that has at least 10gb free on ${TMPDIR}, assuming you requested only 1 node and 1 processor per node. Since the -l file=10g directive reserves storage per core, requesting, for example, 2 nodes and 4 ppn would leave you with 80gb of requested storage instead of 10gb (10gb per core * 2 nodes * 4 ppn)
  • See the section below for more information about the -l file= flag and how to ensure that you request the right amount of local storage when using it
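The per-core arithmetic above can be sketched in a few lines of shell. The numbers mirror the 2-node, 4-ppn example and are illustrative only, not a recommendation:

```shell
#!/bin/bash
# Sketch of how '-l file=' scales with core count.
file_gb=10                        # per-core request: '#PBS -l file=10g'
nodes=2
ppn=4
total_gb=$(( file_gb * nodes * ppn ))
echo "total reserved local storage: ${total_gb}gb"   # 80gb, not 10gb
```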

More about -l file=

  • The -l file=<> directive requests local storage per task and also limits any single file to the specified size
  • When using -l nodes=<>:ppn=<> or -l procs=<>, each core counts as one task, so the storage specified with -l file=<> is scaled by the number of processors
  • This can hold up a job in the queue if no node can provide that much local storage
  • However, if your job does not require multiple nodes, you can use the -l ncpus=<> flag to ensure that multiple cores are used for a single task, so that the local storage specified by -l file=<> is not scaled
  • If you need multiple nodes, then you will still need to use -l nodes=<>:ppn=<> or -l procs=<>, and be mindful that no single file can exceed the size indicated by -l file=<> and that the total reserved storage will scale with the number of requested processors
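As a sketch, the two request styles compare like this. The values are placeholders, and other directives (queue, walltime, etc.) are omitted:

```shell
# Sketch only: two ways to ask for 4 cores alongside '-l file=10g'.

# (a) nodes/ppn: each core counts as a task, so 4 cores x 10g = 40g of
#     local storage is reserved
#PBS -l nodes=1:ppn=4
#PBS -l file=10g

# (b) ncpus: one task spanning 4 cores, so only 10g is reserved
#PBS -l ncpus=4
#PBS -l file=10g
```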

Retrieve Files from Unexpectedly Terminated Jobs

  • Since ${TMPDIR} is deleted once a job finishes, you may lose the files stored in it if the job terminates unexpectedly
  • However, you can use the trap command to make sure that files are copied out of ${TMPDIR} before the job terminates and the directory is deleted
  • For trap to work, it must precede the command that causes the unexpected termination, so the suggested location is right after the PBS directives
  • Here is an example command that can be inserted right after the PBS directives:
trap "cp ${TMPDIR}/* ~/data/somewhere_to_store_these/;exit" TERM
  • You should adapt this command based on where you want your files copied
  • A good option is to copy the files over to your global scratch directory ~/scratch/ where you have 7TB of space
  • Note that this copy will ONLY happen if the job terminates as a failure, so you should make sure that the existing procedure you have for copying files during a normal completion is still a part of the script.
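Outside of PBS, the trap mechanism can be simulated in plain bash. This is a sketch only: the directory names are stand-ins invented for the demo, and in a real job TMPDIR is set by PBS, not by you.

```shell
#!/bin/bash
# Simulate rescuing files from the job-local scratch directory when the
# scheduler sends SIGTERM. Directory names are stand-ins for this demo.
TMPDIR=./sim_scratch            # stands in for /scratch/<jobID>
RESCUE=./sim_rescue             # stands in for e.g. ~/scratch/rescue/
mkdir -p "$TMPDIR" "$RESCUE"

# A real PBS script would include 'exit' in the handler, as shown above;
# it is dropped here so the demo can continue after the signal.
trap 'cp "$TMPDIR"/* "$RESCUE"/' TERM

echo "partial results" > "$TMPDIR/intermediate_results.txt"

kill -TERM $$                   # simulate the scheduler killing the job

cat "$RESCUE/intermediate_results.txt"
```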