Transfer Files Using Science DMZ¶
Why use Science DMZ?¶
- Globus is the fastest and most efficient file transfer option, especially for large transfers
- PACE Grid Science DMZ allows high-speed transfers between our local data transfer node and data transfer node endpoints external to campus
Step 1: Configuring read or write access to Science DMZ for internal or external collaborators¶
- Refer to the general PACE Globus documentation at http://docs.pace.gatech.edu/storage/globus.
- The Science DMZ Globus endpoint managed by PACE is named PACE Grid. Use PACE Grid and not "PACE Grid Server": PACE Grid Server is a management endpoint only and is not for public transfers.
- For collaborators external to Georgia Tech who need to upload data, collect the ID or credentials the collaborator uses to log in through CILogon [reference needed]. This is normally the external collaborator's email address, for instance firstname.lastname@example.org; ORCID and Google ID are also available. That CILogon ID or other credential will then be used to assign read and/or write access to the appropriate folder at the PACE Grid Globus endpoint. Send a request to pace-support with the above credentials and the purpose of the transfer into or out of the Science DMZ (sharing research results, transferring input data to a collaborator at Georgia Tech, migrating data for a student, faculty or staff member joining or leaving Georgia Tech, etc.).
- If you are already a Georgia Tech user, then you already have an account, but a directory will still need to be configured for you on the PACE Grid endpoint (probably under input_and_output/gatech/username or input_and_output/pace/username). Send a request to pace-support, and once validated, a read, write or sharable directory will be configured for you. After the directory is configured, go to https://www.globus.org/ and click Log In. Search for and select "Georgia Institute of Technology"; you will then be taken to the Georgia Tech login page. After logging in with your GT credentials, you will be redirected to the main file transfer screen.
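To make the access request above more concrete, here is a minimal sketch of what a folder-level grant can look like on the administrator side. It builds an access-rule document in the general shape used by the Globus Transfer API (one identity, one path, and either read or read/write permission). The function name `build_access_rule`, the all-zero identity UUID, and the choice of folder are illustrative assumptions; this is not a description of how PACE actually provisions directories, and users do not need to run it.

```python
from typing import Dict

# Access rules grant either read-only ("r") or read/write ("rw") permission.
VALID_PERMISSIONS = {"r", "rw"}


def build_access_rule(identity_id: str, path: str, permissions: str = "r") -> Dict[str, str]:
    """Return an access-rule document for one identity and one folder.

    identity_id is a placeholder for the collaborator's Globus identity
    (for example the one resolved from their CILogon login).
    """
    if permissions not in VALID_PERMISSIONS:
        raise ValueError(f"permissions must be one of {sorted(VALID_PERMISSIONS)}")
    # Directory paths in access rules start and end with a slash.
    normalized = "/" + path.strip("/") + "/"
    return {
        "DATA_TYPE": "access",
        "principal_type": "identity",
        "principal": identity_id,
        "path": normalized,
        "permissions": permissions,
    }


if __name__ == "__main__":
    # Hypothetical identity UUID; "username" is the placeholder used on this page.
    rule = build_access_rule(
        "00000000-0000-0000-0000-000000000000",
        "input_and_output/gatech/username",
        "rw",
    )
    print(rule)
```

Keeping the rule scoped to a single folder per collaborator matches the per-user directories described above and limits what any one external identity can read or overwrite.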
Step 2: Set Local and External Endpoints for a Local User doing a transfer¶
- Log in to Globus: choose Georgia Institute of Technology and enter your usual Georgia Tech username and password. You may be prompted for Duo two-factor authentication if you have not already completed it.
After a successful login, you will be presented with the File Manager view. Select PACE Grid as the first endpoint.
You will be presented with some directories. In this example we will use input_and_output/pace/username, since the user in this example is already a PACE user. Depending on your circumstances, you may instead be under input_and_output/gatech, share/project or some other directory, as directed in the response to the ticket you submitted to gain access to Science DMZ.
The External endpoint we will use in this example will be ESNet New York:
- On the right, I search for ESNet and choose the ESNet New York endpoint.
- I initiate a transfer of a 50GB directory as a test. (Note: transfers typically have "verify transfer" selected automatically, but check to make sure it is selected if you want to ensure the integrity of the transfer.)
- You can check the status of active transfers in the Activity link on the left.
- You can refresh the File Manager window and verify the transferred folder is there.
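The same transfer can also be scripted rather than clicked through. The sketch below assembles the equivalent `globus transfer` command for the Globus CLI, with recursive copy and checksum verification enabled. It assumes the Globus CLI is installed and that you have already logged in with it; the endpoint IDs and the source path are placeholders rather than real values, and `build_transfer_command` is a hypothetical helper name.

```python
import shlex
from typing import List


def build_transfer_command(
    src_endpoint: str,
    src_path: str,
    dst_endpoint: str,
    dst_path: str,
    label: str = "Science DMZ transfer",
    verify: bool = True,
) -> List[str]:
    """Build the argument list for a recursive `globus transfer` call."""
    cmd = ["globus", "transfer", "--recursive", "--label", label]
    # Checksum verification corresponds to the integrity check in the web app.
    cmd.append("--verify-checksum" if verify else "--no-verify-checksum")
    # The CLI addresses each side as ENDPOINT_ID:PATH.
    cmd.append(f"{src_endpoint}:{src_path}")
    cmd.append(f"{dst_endpoint}:{dst_path}")
    return cmd


if __name__ == "__main__":
    # Placeholder endpoint IDs and source path; substitute your own values.
    command = build_transfer_command(
        "SOURCE_ENDPOINT_UUID",
        "/path/to/test_directory/",
        "DESTINATION_ENDPOINT_UUID",
        "/input_and_output/pace/username/test_directory/",
    )
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; passing the list to `subprocess.run` would submit the transfer once the placeholders are replaced.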
Step 3: Set Local and External Endpoints for transfer for an External User¶
Ideally, if you have access to an external Globus endpoint, you want its connection to be on par with other sites: a properly tuned DTN (data transfer node) running Globus Connect Server (NOT just Globus Connect Personal, which will be limited to the bandwidth of your workstation or laptop) with a 10GbE, 40GbE or 100GbE connection and a DMZ profile to the WAN (Wide Area Network) that can enable transfers of multiple GB per second. Otherwise, the performance of your transfer will be throttled by whatever network connection your data sits behind, including any throttling through a firewall.
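A rough calculation shows why the slowest link matters. The sketch below estimates a best-case transfer time by dividing the data size by the slowest link on the path; it ignores protocol overhead, disk speed and firewall effects, so real transfers will be slower. The 1 Gb/s workstation link is an assumed value for illustration, and `ideal_transfer_seconds` is a hypothetical helper name.

```python
def ideal_transfer_seconds(size_gb: float, *link_gbps: float) -> float:
    """Best-case transfer time in seconds for size_gb gigabytes.

    link_gbps lists the speed, in gigabits per second, of every link on
    the path; the slowest one is the bottleneck.
    """
    if not link_gbps:
        raise ValueError("at least one link speed is required")
    bottleneck = min(link_gbps)
    if bottleneck <= 0:
        raise ValueError("link speeds must be positive")
    # 8 bits per byte converts gigabytes to gigabits.
    return size_gb * 8 / bottleneck


if __name__ == "__main__":
    size = 50  # GB, the size of the test directory used on this page
    for label, links in [
        ("10GbE DTN to 10GbE DTN", (10, 10)),
        ("100GbE DTN to 100GbE DTN", (100, 100)),
        ("100GbE DTN to assumed 1 Gb/s workstation", (100, 1)),
    ]:
        print(f"{label}: {ideal_transfer_seconds(size, *links):.0f} s")
```

For the 50GB example, the ideal time is about 40 seconds over 10GbE but about 400 seconds once an assumed 1 Gb/s workstation link is on the path, regardless of how fast the other end is.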
In this example, I will log into my external LIGO account, on the assumption that I have already submitted a request for that account to upload to Science DMZ through the Globus endpoint PACE Grid. There will be some throttling because of the LIGO firewall, but the example shows the workflow an external user follows to connect to PACE Grid.
Log in with my LIGO credentials (some login screens skipped for brevity).
- On the left, navigate to the directory we were sent in response to the DMZ access request, which in this example is input_and_output/pace/amcneill3.
Transfer files and folders¶
- Select the folder / file you want to transfer
- Select Transfer or Sync to... on the menu in the middle
- Select the folder you want to transfer to
- Select one of the Start buttons at the bottom of the screen, with the arrow direction corresponding to the direction of transfer (i.e. personal machine to cluster or cluster to personal machine)
- Status of the transfer can be viewed in Activity, located in the menu on the left (click on the top left corner if it isn't shown)
- Globus should send you an email when the transfer is complete
- Final status of your transfer
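Once Globus reports the transfer as complete, an independent spot check is possible when you can read both copies from a shell (for example, before deleting the source). The sketch below is one way to do that with SHA-256 checksums; it is an optional extra, not part of the Globus workflow, and the function names are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> Dict[str, str]:
    """Map each file's path relative to root onto its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differences(source: Dict[str, str], dest: Dict[str, str]) -> Dict[str, str]:
    """Report source files that are missing or differ at the destination."""
    problems = {}
    for name, checksum in source.items():
        if name not in dest:
            problems[name] = "missing"
        elif dest[name] != checksum:
            problems[name] = "checksum mismatch"
    return problems
```

A typical use is to run `manifest` on each side, save the results, and pass both to `differences`; an empty result means every source file arrived with matching contents.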
Step 4: Storage duration policy for temporary transfers¶
- Ideally, most Science DMZ workflows will use the storage as a temporary cache for data being transferred onto or off of campus. Data should therefore stay on any Science DMZ storage for a few days, or at most a few weeks. Exceptions can be made based on extenuating circumstances.
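To keep your own staging directory within this guidance, a small script can list files that have been sitting there too long. The sketch below uses a 14-day threshold, which is an assumed value chosen to fall inside "a few days or at most a few weeks"; it is not an official PACE limit or enforcement mechanism. It relies on file modification times, which are only a rough proxy for arrival time if a transfer preserved the original timestamps.

```python
import time
from pathlib import Path
from typing import List, Optional


def stale_files(root: Path, max_age_days: float = 14, now: Optional[float] = None) -> List[Path]:
    """Return files under root last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

Calling `stale_files(Path("input_and_output/pace/username"))` from the directory that contains your staging folder would list candidates for cleanup; the function only reports files and never deletes anything.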
(to be completed)