Updated 2020-11-13

Phoenix Storage Structure

Overview

  • The Coda storage structure is similar to the storage structure previously used in the Rich datacenter
  • However, there are improvements to storage size and naming, which are covered in this guide
  • Storage is split into 3 main types (directories):
    • home and scratch keep the same names as in Rich
    • Project storage will no longer be referred to as data
      • The naming scheme for project storage will be p-<pi-username>-<number>, so for example p-jdoe4-0
  • For more information on where to find your data from Rich that has been migrated to the Phoenix cluster, check out the Where is my Rich Data? guide

Tip

The Home directory quota is 10GB. It is very important to limit usage of the home directory since there are many applications that must be able to write files in the home directory to function.

Important

All symlinks (symbolic links, or shortcuts) will be broken during the migration of data, since the underlying layout of data on Phoenix is different from that of the Rich datacenter based systems. You will have to fix these links manually. Please see How to Create Symlinks for assistance.
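
If you know where a broken link used to point and where that data now lives on Phoenix, you can recreate the link yourself. Below is a minimal Python sketch; the link name my_data and the target path are placeholders for illustration only, so substitute your own locations (the How to Create Symlinks guide remains the authoritative reference):

    import os
    from pathlib import Path

    # Placeholder names: a link in the home directory that should now point
    # at a Coda-era project storage path. Replace both with your own paths.
    link = Path.home() / "my_data"
    target = Path("/storage/coda1/p-jdoe4/0/someuser3/my_data")

    # Remove the broken link (if present) and recreate it against the new target.
    if link.is_symlink():
        link.unlink()
    link.symlink_to(target)
    print(f"{link} -> {os.readlink(link)}")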

Home

  • This is your home directory: your main directory and the one you start in when you log in. You can create whatever files and directories you want in ~
  • Its full path will be something like /storage/home/hcoda1/0/someuser3, but you can always access it as ~
  • Home space is backed up (this includes any files or directories you create in the home directory)
  • Contains symbolic links to your project storage directory and ~/scratch
  • ~ is limited by a 10GB quota, which has been upgraded from 5GB in Rich. Home directories are also limited to 1 million files or directories.
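
To see how close you are to those limits, you can tally sizes and entry counts yourself. The walk below is only a rough, potentially slow estimate (PACE may provide its own quota-reporting tools, which would be the better option if available):

    import os

    home = os.path.expanduser("~")
    total_bytes = 0
    total_entries = 0

    # Walk the home directory, summing file sizes and counting files/directories.
    # Symlinked entries such as ~/scratch are not followed or sized.
    for root, dirs, files in os.walk(home):
        total_entries += len(dirs) + len(files)
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                total_bytes += os.path.getsize(path)

    print(f"~ usage: {total_bytes / 1e9:.2f} GB of the 10GB quota")
    print(f"~ entries: {total_entries:,} of the 1 million allowed")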

Tip

Project storage quotas will not be enforced during the migration period from Rich to the Coda datacenter. Quotas will be enforced starting no earlier than January 1, 2020.

Project Storage, formerly known as data

  • data is no longer an alternate term for project storage
  • The naming scheme for project storage is now p-<pi-username>-<number>, so for example p-jdoe4-0, in most cases.
    • The p stands for "project", jdoe4 is the responsible PI's username, and the trailing number (0 in this example) distinguishes between storage projects associated with the same PI.
    • The PI username in the name of the project indicates the PI (Georgia Tech faculty member) who is responsible for the storage project and for any data contained in it. When a user leaves Georgia Tech, this PI will receive the data contained in the user's project storage.
    • Some project storage locations start with d followed by the abbreviation for a school, college, or unit, e.g., d-chem-0. In this case, the unit, rather than a specific faculty member, is responsible for the data.
  • A user may have access to multiple project storage locations, especially if that user has multiple supervisors. The PI username in each one indicates which PI is responsible for the data in that directory.
  • The symlink in your home directory named p-<pi-username>-<number> points to /storage/coda1/p-<pi-username>/<number>/<username> (see the sketch after this list for confirming this on your own account)
  • Project Storage is the place for any large files that need to be stored long term (data sets, etc.)
  • The quota for project storage is shared by all members of a research group (not set per user) and is determined by the PI's purchased amount.
  • Project Storage does not have a limit on the number of files/directories for each user or research group
  • Project Storage can be expanded. If a research group needs more than the 1 TB provided by PACE, the PI may purchase additional space. Purchase options and additional details can be found on the PACE participation page
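
You can confirm this layout from your own account by resolving the symlink. A small sketch, assuming a hypothetical PI username jdoe4 and project number 0 (replace p-jdoe4-0 with the actual entry in your home directory):

    from pathlib import Path

    # Hypothetical project storage link in the home directory.
    link = Path.home() / "p-jdoe4-0"

    if link.is_symlink():
        # resolve() follows the link to the underlying storage location,
        # e.g. /storage/coda1/p-jdoe4/0/<username>
        print(f"{link} -> {link.resolve()}")
    else:
        print(f"{link} is not a symlink; check the name of your project storage entry")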

Scratch

  • ~/scratch is for short term data and is not backed up, so important data should not be stored there.
  • Great as a working environment, for example for moving files during a job, storing data used by a job that doesn't need to stay on the cluster long term, or as a place to store files generated by a job.
  • Common workflow looks like this:
    • Using a file transfer service like Globus, copy scripts and dataset into ~/scratch folder
    • When the job is executed, the data remains in ~/scratch
    • Output and other generated files will show up in ~/scratch. Move the important results to your project storage directory (see above), or transfer them off the cluster if needed, then remove unneeded temporary files from ~/scratch.
  • Each week, files older than 60 days are automatically deleted from ~/scratch; a sketch for spotting files near that cutoff follows this list
  • The ~/scratch storage limit is 15 TB
  • ~/scratch is also limited to 1 million files or directories
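
Because of that weekly cleanup, it can be useful to list which files in ~/scratch are approaching the cutoff so they can be moved to project storage or off the cluster first. A minimal sketch; it assumes file age is judged by modification time, which may not match exactly how the cleanup measures age:

    import os
    import time

    scratch = os.path.expanduser("~/scratch")
    cutoff = time.time() - 60 * 24 * 60 * 60  # 60 days ago, in seconds

    # Print files whose last modification is older than 60 days.
    for root, dirs, files in os.walk(scratch):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    print(path)
            except OSError:
                pass  # file removed or unreadable while scanning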