Updated 2022-03-23

PACE Data Management and Retention Guidelines

This document articulates best practices and operational guidance for managing research data held in the PACE environment at Georgia Tech. For some version of it to become official campus policy, it would need wider review, including by the Faculty Senate, Cybersecurity, the EVPR, and the campus Data Governance Committee. These guidelines have had a reasonable internal review, and PACE intends to follow them until further review and approval can establish a proper data management policy and process. This document will be reviewed or superseded by March 2022.

Guiding Principles

  1. Data arising from sponsored projects are GT’s data, not the PI’s.
  2. PIs are responsible for knowing what data exist related to their projects, and how the data are being managed.
  3. If a PI leaves GT and effort continues on the sponsored project, GT must name an alternate/new PI who is a member of the GT faculty, and that new PI will assume control of the data. If the departing PI wants to take the data (or a copy of the data), it must be done in compliance with an approved data transfer agreement (DTA) or data usage agreement (DUA) between GT and the PI’s new institution.
  4. At termination of sponsored projects, data should be retained in an approved GT system (e.g. PACE, Dropbox, Box, Office365), destroyed, or returned to the provider, depending upon contract terms.
  5. If the data is considered Controlled Unclassified Information (CUI), then additional restrictions apply. By contract terms, CUI must be maintained in a system certified as compliant via a System Security Plan (SSP) and/or Technology Control Plan (TCP) which is prepared and approved by Cybersecurity and other appropriate parties. GT cannot fulfill the terms of the contract and provide oversight of data access if a PI takes a copy of CUI data, unless an appropriate DTA or DUA is in place.
  6. If a PI leaves and their sponsored projects are transferred to the new institution, data access will be addressed in the transfer agreement of the sponsored program (which may require a supporting DUA or DTA).
  7. While certain data sets may be copyrighted, data itself is not intellectual property. [Citation needed?]
  8. Note that any data included in a publication is considered public.
  9. For reference, here is the current GT policy on data access. From that page, there are links to our policies on acceptable use, cybersecurity, and data privacy.

The campus Data Governance Committee uses the following definitions, which are aligned with the USG Business Procedures Manual.

Data Governance definitions:

Information System: The technology, software, and services administered for the purpose of creating, storing, managing, using and gathering data and communication at Georgia Tech.

System Owner: A System Owner is a technical expert who has overall responsibility for the data management, security, and compliance efforts of an Information System. Each Information System must be assigned a System Owner.

Data Steward: Data Stewards are Georgia Tech leaders (at the director level of a division/unit) who have day to day responsibility for Organizational Data within their Data Domain(s) and/or Data Sub-Domain(s). Depending on the size and complexity of a functional division/unit, it may be necessary, and beneficial, for the Data Steward to appoint an Associate Data Steward(s). Data Stewards are appointed by a Data Trustee.

Guidance is provided below for some common situations.

Situation 1: PI is still at GT. Student, postdoc, research scientist, or other researcher working under the direction of a PI leaves.

Outcome of the data: All data remains under the control and management of the PI who sponsored the user’s account on PACE, unless the data is explicitly controlled otherwise by a sponsor’s contract or via a DUA, DTA, purchase agreement, etc. Note that this assumes that a PACE account provided to anyone working under the sponsoring PI is used to store research data and does not contain any personal (private) data.

Acknowledged Gap: There isn’t a well-established system to document data ownership. Do we need a regular attestation/verification of data ownership? Based on guiding principle #2, the PI is responsible for the data, but in practice, a graduate student could put the data anywhere. PIs need assistance and guidance (with examples of dos and don’ts) in developing good data practices.

Other Considerations: When a student, postdoc, etc. changes their affiliated PI, PACE needs to be informed, since the change may affect access controls. Unless specified otherwise, data remains with the PI whose project originally had it. A process will have to be defined by which a student can copy data from the old location to the new location. In this case, the PI will have to authorize the student to copy the data to the new location.

Situation 2: A PI wishes to remove data from PACE by deletion or moving. Note: the PI is still at Georgia Tech.

Outcome of the data: The PI may delete it or securely move it to another GT-provided storage service*, provided that the storage is compatible with the data protection / classification level. The PI is responsible for complying with whatever contractual and regulatory obligations may exist (e.g. NDA, DUA, SSP/TCP, DMP, IRB). Backups would be maintained on PACE for the window specified by the service agreement.

* E.g. Dropbox, Box, departmental storage systems, and other storage devices.

Acknowledged Gap: PACE doesn’t have any way of knowing which datasets might have contractual and regulatory obligations.

Other Considerations: Review or amendment of the NDA, TCP, IRB, etc. may be necessary if data moves.

Situation 3: A PI leaves GT. Who is responsible for the data from their project and, if students remain, how is their access to the data addressed?

Outcome of the data: The chair of the PI’s School, or the Director of their Lab/IRI, will need to decide this, preferably in consultation with the PI before they leave. A formal exit agreement should be decided on before the faculty member leaves, covering how to handle any remaining research loose ends. This would include the best way to handle sponsored work which is still active at GT (including students finishing up), lab equipment, and data.

To avoid interrupting student progress, priority should be given to students, so that they maintain access to the research data. For example:

  • If the student is continuing the same or a related project with a new PI, the data will be transferred to the new PI’s control if the data is relevant to the student’s work.
  • If the former PI and their School chair agree and identify resources, a copy could be made for the new PI while preserving the original files in the appropriate location for former faculty.
  • If the new PI is not supervising the same project or is not appropriate to take control of the data, then the student may be provided continued access to the old research data in a new location.

Acknowledged Gap: The use of exit agreements is not consistent, and data is often forgotten. Consequently, there are a number of sizable datasets remaining in PACE which were not addressed when the faculty member left. PACE will proceed with actions based on the above guidelines.

Other Considerations:

  • If data is owned by GT, there may need to be a DTA to cover details of its transfer.
  • If the data is to remain at GT in PACE, resources will need to be identified to maintain it for the necessary period of time, consistent with contractual and regulatory obligations as well as the wishes of the sponsor. Note that each school can use up to 1 TB of free space on PACE storage in Coda, so smaller quantities of data could be stored there at no charge. This would not handle larger quantities.
  • Deceased faculty require other considerations (e.g. consulting co-PIs to identify critical data).
  • If the PI needs to access/copy their data stored on PACE and take it to their new institution, it must be done in compliance with contractual and regulatory obligations and the wishes of the sponsor.
  • If the faculty member leaves in a non-amicable situation, then the School chair and College representatives will coordinate with PACE, Local IT, and other stakeholders to ensure data is handled properly and resources are identified if needed.

Roles and Responsibilities

After reviewing these situations, the following chart captures the main points and should be used as a guideline going forward, until further review and approval can establish a proper data management policy and process.

Role Responsibilities
PACE System Owner (aka Data service provider)
  • Provide guidance as needed especially with regard to technical capabilities
  • Provide users with tools to see how much data they are using, who has access to it, and how much it costs to maintain.
  • Maintain backups consistent with service offering
  • Remind PI of their responsibility to ensure that data are stored in a GT service compatible with the data protection / classification, and of the need to respect whatever contractual and regulatory obligations may exist (e.g. NDA, DUA, TCP, DMP)
  • If the PI asks PACE to execute an action on the data (e.g. delete some or all of it), PACE will confirm with the PI in writing. This also applies to situations where data needs to be either copied or transferred between PIs.
  • Provide timely cost estimates for data storage with sufficient lead time for faculty to make transitional plans.
PI Data Steward
  • Provide funding for the life of the contract associated with the data while they are affiliated with GT
  • Responsible for ensuring all sponsor-specified contractual and regulatory obligations are met.
  • Provide PACE with necessary information to ensure correct data access provisioning.
  • Responsible for ensuring that data deletion complies with all sponsor-specified contractual and regulatory obligations
  • Responsible for notifying PACE of any changes (such as personnel) in their group so that data access can be updated promptly.
Local IT Unit
  • Notify PACE if PI is leaving GT.
GT Data Owner (use proper data gov language)
  • Ultimately accountable for ensuring all sponsor-specified contractual and regulatory obligations are met.
Student/Postdoc/etc. Data user and data creator
  • Inform Data system owner if they change PI/group
School Chairs, or Lab/IRI Director Interim Data Steward
  • Keep the local IT staff engaged in any HPC-related procurement and data agreements. This includes early negotiations with prospective faculty as well as exit agreements when faculty leave.
  • Ensure proper disposition of research data. Unless otherwise specified in research contract(s), either assume the role of data steward, or designate as needed a new data steward.
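
One of the System Owner responsibilities listed above is giving users tools to see how much data they are using. The following is a minimal sketch of what such a report could look like, assuming the data sits in an ordinary directory tree. The function names `summarize_usage` and `format_report`, and the choice to group totals by top-level subdirectory, are illustrative assumptions and do not describe PACE's actual tooling. It reports apparent file sizes only; access lists and storage costs are not covered.

```python
import os
from collections import defaultdict


def summarize_usage(root):
    """Return {top-level entry name: total bytes} for everything under root.

    Hypothetical helper for illustration; not an actual PACE tool.
    Files that sit directly in root are grouped under the key ".".
    """
    totals = defaultdict(int)
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinked files so linked data is not counted twice.
            if os.path.islink(path):
                continue
            try:
                totals[top] += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; leave it out of the total.
                continue
    return dict(totals)


def format_report(totals):
    """Render the totals as plain-text lines, largest entry first."""
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return "\n".join(f"{size / 1e9:10.3f} GB  {name}" for name, size in ordered)
```

A PI could run `print(format_report(summarize_usage(path)))` on a project directory to see which subdirectories hold the most data before deciding what to retain, move, or delete.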