Updated 2021-10-18
PACE Storage – Scratch Storage Delete Policy
Summary
This policy describes how often temporary files in the scratch storage of the PACE clusters are reviewed and deleted. It defines:
- Age of the files that will be included in the list to be deleted
- How often the deletion process will happen
- The number of notifications that will be sent to the users
- How to request extensions, and how many a user can request
- Tasks to be performed by the PACE team
The scratch space is a resource shared among all PACE users. It is not intended for long-term storage; any file that needs to be kept for more than 60 days should be moved to the user’s project space, which is protected by backups and snapshots.
Scope
This policy covers the scratch storage used in these PACE clusters:
- Phoenix
  - Head node: login-phoenix
  - Storage technology: Lustre
- Hive
  - Head node: login-hive
  - Storage technology: GPFS
Definitions
- Scratch
  - Temporary file system for storing files used during job execution. Quotas on this file system are usually larger than those on project storage.
- Scratch Sweeper
  - Process that sweeps the file system for files older than a certain number of days. It creates a file in the user’s home directory (SCRATCH.TO.BE.DELETED) that gives the user a complete list of the files that will be deleted.
  - It also generates a notification message for each user with details about the upcoming deletion cycle.
- Deleter
  - Process that uses the output of the sweeper to delete the files.
- Extension
  - To keep files for a longer period, a user should open a ServiceNow incident to request an extension. The PACE team will update the metadata of the user’s files to exclude them from the current sweep and delete cycles.
  - Special exceptions can be requested if the user will be away for an extended period. Please include this information in the ServiceNow incident.
  - Extensions can be requested from the time the first notification is received until the deleter process runs.
- PACE CI
  - PACE Cyber infrastructure group. Provides operations support for the PACE clusters.
- PACE RS
  - PACE research scientists. Provide end-user support and consulting for the PACE services.
Statement of Policy
- Scratch space is cleaned once a month.
- Files that are 60 days or older will be included for deletion.
- Automated notifications are sent by the sweeper script every time it runs. The deleter script does not send notifications.
- The first sweeper run takes place on a Tuesday to avoid holidays. Normally this is the first Tuesday of the month; if that day is a holiday, the first sweeper runs on the next non-holiday business day. The first notification message is sent at this time.
- Users can request extensions after the first notification, up until one day before the deleter runs. When an extension is granted, all of the user’s files are excluded from the current and next deletion cycles.
- The second sweeper run takes place 7 days after the first. The second notification message is sent at this time.
- The deleter runs 14 days after the first sweeper run. No notification is sent to users.
- When a user submits more than 2 consecutive extension requests, PACE will provide assistance and guidance on using the storage resources efficiently.
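The schedule above can be sketched in a few lines of Python. This is only an illustration, not the script PACE runs: the function names and the `holidays` argument are hypothetical, and the sketch assumes that "7 days after" and "14 days after" are both counted from the first sweeper run and that only the first run is shifted when it falls on a holiday.

```python
from datetime import date, timedelta


def first_tuesday(year, month):
    """Return the first Tuesday of the given month."""
    d = date(year, month, 1)
    # date.weekday(): Monday is 0, Tuesday is 1.
    return d + timedelta(days=(1 - d.weekday()) % 7)


def next_business_day(d, holidays):
    """Advance d until it is a weekday that is not in holidays."""
    while d.weekday() >= 5 or d in holidays:
        d += timedelta(days=1)
    return d


def cycle_dates(year, month, holidays=frozenset()):
    """Compute the three dates of one cleanup cycle.

    holidays is a caller-supplied set of date objects (an assumption;
    the policy does not say which holiday calendar is used).
    """
    first = first_tuesday(year, month)
    if first in holidays:
        first = next_business_day(first, holidays)
    return {
        "first_sweep": first,
        "second_sweep": first + timedelta(days=7),
        "delete": first + timedelta(days=14),
    }
```

For example, `cycle_dates(2021, 11)` places the first sweep on 2021-11-02, the second sweep on 2021-11-09, and the delete on 2021-11-16. Users can request extensions up to the day before the computed delete date.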
Procedure
- Day 1: PACE CI runs the scratch sweeper.
  - Files that are 46 days or older are included in the user report.
  - A file listing the files to be deleted is created in the user’s home directory (~/SCRATCH.TO.BE.DELETED).
  - An automated notification is sent to every user with candidate files.
  - PACE CI and RS will track down any notification issues.
- Days 1-13: Users can submit requests for extensions.
  - To submit a request, users should reply to the notification message; the reply creates a ServiceNow request with all the proper details.
  - PACE CI and PACE RS provide support for the requests.
  - The metadata of all the requesting user’s files is updated to exclude them from the cycle.
- Day 7: PACE CI runs the scratch sweeper.
  - Files that are 53 days or older are included in the user report.
  - A new file with the updated list of files to be deleted is created in the user’s home directory (~/SCRATCH.TO.BE.DELETED).
  - An automated notification is sent to every user with candidate files. Users who requested extensions are excluded from this deletion cycle.
  - PACE CI and RS will track down any notification issues.
- Day 14: PACE CI runs the scratch deleter.
  - The sweeper runs one last time to get a final list of files to delete.
  - Using the final list, the files are deleted from storage.
Enforcement
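The report thresholds in the procedure appear to follow from the 60-day limit: a file that is 46 days old at the first sweep, or 53 days old at the second, reaches 60 days by the time the deleter runs 14 or 7 days later. The sketch below shows that arithmetic and how a user might preview their own candidate files. It is not the PACE sweeper: the function names are hypothetical, and it assumes file age is measured from the modification time, which the policy does not specify.

```python
import os
import time

DELETE_AGE_DAYS = 60  # age limit stated in the policy
SECONDS_PER_DAY = 86400


def report_threshold(days_until_delete):
    """Minimum age, in days, of files that will reach the limit by delete day."""
    return DELETE_AGE_DAYS - days_until_delete


def files_older_than(root, min_age_days, now=None):
    """List files under root whose modification time is at least min_age_days ago.

    Using st_mtime is an assumption; the real sweeper may use another timestamp.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * SECONDS_PER_DAY
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime <= cutoff:
                old.append(path)
    return sorted(old)
```

Calling `files_older_than(scratch_dir, report_threshold(14))`, where `scratch_dir` is the path to the user's own scratch directory, gives a rough preview of what a first sweep might report. The authoritative list remains the ~/SCRATCH.TO.BE.DELETED file.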
PACE will continue to monitor scratch storage utilization and will work individually with users to help them use the shared resource efficiently and appropriately. When a user submits more than two consecutive extension requests, PACE will work with the user to either move the files to permanent storage or delete them.
References
PACE web site: https://pace.gatech.edu
PACE documentation: https://docs.pace.gatech.edu
History
Revision Date | Author | Description |
---|---|---|
2021-09-08 | Ruben Lara | Initial creation based on Teams conversation in the PACE -> Cyber infrastructure channel. |
2021-09-09 | Ken Suda, Dan Zhou, Michael Weiner, Ruben Lara | Updated terms and process details |
2021-10-15 | Ruben Lara, Dan Zhou, Michael Weiner | Added note about PACE checking for notification issues |