Updated 2019-11-07

Storage and File Transfer on Hive with Globus

Access to PACE Data

Screenshot

Warning

For accessing your PACE data on Hive, please use Globus as documented in sections below.

  • When you log into Hive, you will not have access to the PACE data that you reach from other systems, e.g., Shared clusters (login-s) or Dedicated clusters (login-d).
  • The above image illustrates how Globus can be used to transfer your data from shared/dedicated clusters (login-s or login-d) to Hive (login-hive).

Storage

Important

Users have separate home, data, and scratch directories on Hive compared to other PACE systems. If you'd like to transfer files from other PACE systems to Hive, refer to the Set Endpoints for Hive and datamover node section under Step 3: Set Endpoints for transfer.

  • Please refer to the Storage Guide Page in the Storage and File Transfer section for a detailed guide that also applies to the Hive Cluster.

File Transfer

  • This guide will focus on using Globus, a secure research data management tool that PACE provides free access to.

Step 1: Making an account

  • Georgia Tech already provides accounts, so all you have to do is sign in with your GT credentials
  • Go to https://www.globus.org/ and click Log In
  • Search for and select "Georgia Institute of Technology"; you will then be taken to the Georgia Tech login page
  • After logging in with your GT credentials, you will be redirected to the main file transfer screen

Step 2: Installing the Globus Personal Client

  • The Globus Connect Personal Client is how Globus will have access to your files
  • After installing the personal client, you will have to set up your computer as an endpoint to allow files to be transferred to and from it
  • Log in to Globus, then follow these well-documented guides on how to install the Globus Personal Client and set up an endpoint:
  • Windows
  • Mac
  • Linux. Linux is a bit tricky: after using tar to extract the files, if ./globusconnect & gives errors about tcllib, skip to the part of the guide titled "How to Install Globus Connect Personal for Linux Using the Command Line" and follow that.
  • On Linux, try ./globusconnectpersonal -start & if ./globusconnect & doesn't work

Warning

Linux: Globus also has a standalone command line tool, globus-cli, which needs to be installed on Linux if you are installing the personal client through the command line.

  • If you want to create an endpoint and do not have globus-cli installed or cannot log in, you can create the endpoint in the globus.org portal. Navigate to the endpoints link:

Screenshot

Choose Globus Connect Personal.

Screenshot

Select "generate setup key" for a selected name (in our case rich116-f39-15-004).

Screenshot

Copy the setup key and use it in the command globusconnectpersonal -setup <the key you copied>.

Screenshot

Step 3: Set Endpoints for transfer

  • The following three sections describe the procedure for transferring files between the Hive Cluster and different endpoints (local, datamover, and PACE Internal)

Note

If you are asked for authentication when connecting to an endpoint, please enter your regular GT credentials.

Set Endpoints for Hive and Local

  • Used to transfer files between your local file system and the Hive Cluster file system

Important

Make sure the Globus Personal Client is running on your computer before setting up endpoints

  • Log in, and you will be taken to the main file manager screen

Screenshot

  • To set up your computer as one endpoint:
    • Click on the Collection Bar at the top
    • Click on "Your Collections"
    • If you set up your endpoint correctly following the documentation linked above, the endpoint you created should show up
    • Select it and you should be taken back to the file manager page, with your computer's files available to transfer showing up on the left

Screenshot

  • To set up the cluster as the other endpoint:
    • Click "Transfer or Sync to" which should open up a blank endpoint screen right next to your computer's files

Screenshot

  • Select the top bar labeled "collection" (next to the name of your computer endpoint)
  • Search for "PACE Hive", and select it when it shows up.

Screenshot

  • You will be prompted for a username and password; enter your GT username and password

Screenshot

  • Your cluster account and all your files on the cluster should now show up on the right
  • You can now transfer files between your account on the cluster and your personal computer

Tip

After the first time setting the endpoints, they will be saved under recent, so you can easily select them. The Globus Personal Client must be running before setting any endpoints

Set Endpoints for Hive and datamover node

  • Used to transfer files between the PACE Cluster file system and the Hive Cluster file system
  • Uses dual 10 GigE connections
  • Any user with access to both Clusters can use this

Important

Make sure the Globus Personal Client is running on your computer before setting up endpoints

  • Log in, and you will be taken to the main file manager screen

Screenshot

  • To set up datamover node as one endpoint:
    • Click on the Collection Bar at the top
    • Search "gatechpace#datamover" and select it when it shows up
    • You should be taken back to the file manager page, with the files on datamover available to transfer showing up on the left

Screenshot

  • Enter your GT username and password if prompted for authentication.

Screenshot

  • After authentication, the interface should look like this:

Screenshot

  • To set up Hive as the other endpoint:
    • Click the "Transfer or Sync to" button and the focus will shift over to the blank endpoint screen on the right.
    • Click the top bar labeled "select a collection"
    • Search "PACE Hive" and click on it when it shows up

Screenshot

  • You may be prompted for a username and password. Enter your GT username and password.

Screenshot

  • Your Hive cluster account and the files on it should now show up on the right

Screenshot

  • Congratulations! You are now able to transfer files between the Hive cluster and the datamover node.

Set Endpoints for Hive and PACE Internal

  • Used to transfer files between the PACE Cluster file system and the Hive Cluster file system
  • Uses 40 GigE connections
  • Can only be used by approved users

Important

Make sure the Globus Personal Client is running on your computer before setting up endpoints

  • Log in, and you will be taken to the main file manager screen

Screenshot

  • To set up PACE Internal as one endpoint:
    • Click on the Collection Bar at the top
    • Search "PACE Internal" and select it when it shows up
    • You should be taken back to the file manager page, with the files on PACE Internal available to transfer showing up on the left

Screenshot

  • Enter your GT username and password if prompted for authentication.

Screenshot

  • After authentication, the interface should look like this:

Screenshot

  • To set up Hive as the other endpoint:
    • Click the "Transfer or Sync to" button and the focus will shift over to the blank endpoint screen on the right.
    • Click the top bar labeled "select a collection"
    • Search "PACE Hive" and click on it when it shows up

Screenshot

  • You may be prompted for a username and password. Enter your GT username and password.

Screenshot

  • Your Hive cluster account and the files on it should now show up on the right

Screenshot

  • Congratulations! You are now able to transfer files between the Hive cluster and PACE Internal.

Transfer files and folders

  • Select the folder / file you want to transfer
  • Select Transfer or Sync to... on the menu in the middle
  • Select the folder you want to transfer to
  • Select one of the Start buttons at the bottom of the screen, with the arrow direction corresponding to the direction of transfer (i.e., personal machine to cluster or cluster to personal machine)
  • Status of the transfer can be viewed in Activity, located in the menu on the left (click on the top left corner if it isn't shown)
  • Globus should send you an email when the transfer is complete
  • Example: Here I have selected the 3D Objects folder on my laptop and my test folder on the cluster
    • Transfer from my laptop to the cluster: To transfer the 3D Objects folder to the cluster, I would select the start arrow in the bottom left
    • Transfer from the cluster to my laptop: To transfer my test folder from the cluster to my laptop, I would select the start arrow in the bottom right

Screenshot
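For users who have globus-cli installed, the same kind of folder transfer can be sketched from the command line. This is only an illustrative sketch and not part of the PACE instructions: the function name globus_transfer_dir, the DRY_RUN switch, the label text, and the endpoint IDs in the usage comment are hypothetical placeholders. It assumes you have already authenticated with globus login and that you look up the real endpoint IDs yourself, for example with globus endpoint search.

```shell
# Sketch: submit a recursive folder transfer with globus-cli.
# Arguments: SRC_ID SRC_PATH DST_ID DST_PATH
# Set DRY_RUN=1 to print the command instead of running it.
globus_transfer_dir() {
    local src_id="$1"
    local src_path="$2"
    local dst_id="$3"
    local dst_path="$4"

    if [ -z "$src_id" ] || [ -z "$src_path" ] || [ -z "$dst_id" ] || [ -z "$dst_path" ]; then
        echo "usage: globus_transfer_dir SRC_ID SRC_PATH DST_ID DST_PATH" >&2
        return 2
    fi

    # Rebuild the positional parameters as the command to run.
    set -- globus transfer --recursive --label "laptop-to-hive" \
        "${src_id}:${src_path}" "${dst_id}:${dst_path}"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Printed for inspection only; arguments containing spaces are not re-quoted.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical usage (replace the placeholder IDs with real endpoint IDs):
#   DRY_RUN=1 globus_transfer_dir <laptop-endpoint-id> /~/test/ <hive-endpoint-id> /~/test/
```

Running with DRY_RUN=1 first lets you check the source and destination before anything is submitted. The web interface described above remains the documented route.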

Using Globus Recap

Warning

Always make sure the Globus Personal Client is running before you go to transfer files

  1. Start the Globus Personal Client (using its program GUI if on Windows or Mac)
    • On Linux, navigate to where you installed the Globus Connect Personal files with cd globusconnectpersonal-x.y.z (x.y.z is the version number), then run ./globusconnectpersonal -start &
  2. Log into Globus; you should then be directed to the main file manager page. Other pages, including the activity page and endpoint manager, are available in the drop-down menu on the left (click top left if the menu is not visible)
  3. To set your personal machine as an endpoint, click on the "collection" bar on the top left (on the file manager page) and select your personal computer name (the personal endpoint name you made). It should be under "recents" or in "your collections". To set the cluster as the other endpoint, click on the collection bar on the top right, search for PACE, and select PACE Hive. Log in with your GT username and password. The datamover endpoint will also show up in your recents if you have used it before
  4. You can now transfer files and folders as you wish. Monitor transfer status with the Activity tab
  5. When you are done transferring files, log out of Globus online, and log out of the personal client on your computer. On Linux, run globus logout to log out
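On Linux, step 1 depends on knowing which globusconnectpersonal-x.y.z directory to enter. A minimal sketch of locating the newest one is shown here. The function name newest_gcp_dir is hypothetical, and it assumes the tarball was extracted directly under the directory you pass in (your home directory by default) and that your sort supports the -V version-sort option, as GNU sort does.

```shell
# Sketch: print the newest globusconnectpersonal-x.y.z directory under BASE_DIR.
# BASE_DIR defaults to $HOME. Returns 1 if no such directory exists.
newest_gcp_dir() {
    local base_dir="${1:-$HOME}"
    local found
    local d

    found=$(
        for d in "$base_dir"/globusconnectpersonal-*/; do
            # An unmatched glob stays literal, so keep only real directories.
            if [ -d "$d" ]; then
                printf '%s\n' "$d"
            fi
        done | sort -V | tail -n 1
    )

    if [ -z "$found" ]; then
        echo "no globusconnectpersonal-* directory under $base_dir" >&2
        return 1
    fi

    # Strip the trailing slash that the glob pattern added.
    printf '%s\n' "${found%/}"
}

# Possible usage:
#   cd "$(newest_gcp_dir "$HOME")" && ./globusconnectpersonal -start &
```

Version sort is used so that, for example, 2.3.10 is treated as newer than 2.3.6, which a plain alphabetical sort would get wrong.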

Using globus-url-copy command line tool

You will want to do the first part of the globus-toolkit install as outlined in the Install Globus Connect Server guide, namely:

sudo curl -LOs https://downloads.globus.org/toolkit/globus-connect-server/globus-connect-server-repo-latest.noarch.rpm
sudo rpm --import https://downloads.globus.org/toolkit/gt6/stable/repo/rpm/RPM-GPG-KEY-Globus
sudo yum install globus-connect-server-repo-latest.noarch.rpm
#Install EPEL repo
sudo curl -LOs https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo yum install epel-release-latest-7.noarch.rpm

and then the specific globus-url-copy and myproxy software

sudo yum install tcllib
sudo yum install globus-gass-copy-progs
sudo yum install myproxy
sudo yum install globus-proxy-utils
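After the installs finish, it can help to confirm that the command line tools are actually on your PATH. The sketch below is one way to do that. The function name missing_tools is hypothetical, and the tool names in the usage comment assume that globus-gass-copy-progs provides globus-url-copy, myproxy provides myproxy-logon, and globus-proxy-utils provides grid-proxy-info.

```shell
# Sketch: print each named command that cannot be found on PATH, one per line.
# Prints nothing when every command is available.
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Possible usage:
#   missing_tools globus-url-copy myproxy-logon grid-proxy-info
```

An empty result means all of the listed tools were found. Any name that is printed points to a package that still needs to be installed.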

You may also want to set up globusconnectpersonal to manage a Globus Connect Personal endpoint that you can access via the globus.org portal or via globus-cli. It is not necessary for using globus-url-copy to upload files and directories to a remote endpoint.

wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
tar xzf globusconnectpersonal-latest.tgz
cd globusconnectpersonal-x.y.z
# Set up Globus Connect Personal in ~/.globusonline/lta, or pass -dir ~/.somethingelse to set it up in ~/.somethingelse
./globusconnect -setup <setup key from globus-cli or from creating the endpoint on the Globus portal> # -dir ~/.somethingelse
# Add -restrict-paths if you want to add additional shared paths, or edit ~/.globusonline/lta/config-paths
./globusconnect -start -restrict-paths RW~/,RW/writable/directory,R/read/only/directory,N/none/accessible/directory &
# If you used -dir during setup, also pass -dir ~/.somethingelse to the -start command above

Log in to a local Globus endpoint such as globus-research.pace.gatech.edu or iw-dm-4.pace.gatech.edu. In this example I use the PACE Research internal high-speed endpoint, which sits on the VAPOR network and is intended for servers attached to scientific instruments on that network. The -t 8760 argument sets the proxy lifetime to 8760 hours (one year). The -b argument bootstraps the certificate directory from the endpoint (it also implies -T, which fetches trust roots from that server), and the -s argument gives the hostname of the server.

[amcneill3@rich116-f39-15 globusconnectpersonal-2.3.6]$ myproxy-logon -t 8760 -b -T -s globus-research
Bootstrapping MyProxy server root of trust.
New trusted MyProxy server: /C=US/O=Globus Consortium/OU=Globus Connect Service/CN=2bfacbe2-2af2-11e9-9fa4-0a06afd4a22e
New trusted CA (a059cd44.0): /C=US/O=Globus Consortium/CN=Globus Connect CA 3
Server authorization failed.  Server identity does not match expected identity.
If the server identity is acceptable, set
MYPROXY_SERVER_DN="/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=2bfacbe2-2af2-11e9-9fa4-0a06afd4a22e"
and try again.
[amcneill3@rich116-f39-15 globusconnectpersonal-2.3.6]$ export MYPROXY_SERVER_DN="/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=2bfacbe2-2af2-11e9-9fa4-0a06afd4a22e"
[amcneill3@rich116-f39-15 globusconnectpersonal-2.3.6]$ myproxy-logon -t 8760 -b -T -s globus-research
Enter MyProxy pass phrase:
A credential has been received for user amcneill3 in /tmp/x509up_u296017.
Trust roots have been installed in /nv/hp1/amcneill3/.globus/certificates/.
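The transcript above shows that the first myproxy-logon attempt fails and prints the MYPROXY_SERVER_DN value to set before retrying. Rather than copying that value by hand, it can be pulled out of the saved error text. This is a minimal sketch: the function name extract_server_dn is hypothetical, and it assumes the message format matches the transcript, with the value on its own line as MYPROXY_SERVER_DN="...".

```shell
# Sketch: read myproxy-logon output on stdin and print the suggested server DN.
# Prints nothing if no MYPROXY_SERVER_DN="..." line is present.
extract_server_dn() {
    sed -n 's/^MYPROXY_SERVER_DN="\(.*\)"$/\1/p' | head -n 1
}

# Possible usage, with the error text saved to a file named logon.txt:
#   export MYPROXY_SERVER_DN="$(extract_server_dn < logon.txt)"
```

Check the extracted DN before exporting it, since setting MYPROXY_SERVER_DN tells the client to accept that server identity.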

It is a good idea to save the environment variable MYPROXY_SERVER_DN for future use by placing it in your .bashrc or .bash_profile file as:

export MYPROXY_SERVER_DN="/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=2bfacbe2-2af2-11e9-9fa4-0a06afd4a22e"
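If you add this line from a script, it is worth making sure it is not appended a second time on a later run. A minimal sketch of an idempotent append follows. The function name persist_myproxy_dn is hypothetical, and the startup file and DN are whatever you pass in.

```shell
# Sketch: append an export line for MYPROXY_SERVER_DN to RC_FILE,
# but only if that exact line is not already present.
# Arguments: RC_FILE DN
persist_myproxy_dn() {
    local rc_file="$1"
    local dn="$2"
    local line="export MYPROXY_SERVER_DN=\"${dn}\""

    if [ -z "$rc_file" ] || [ -z "$dn" ]; then
        echo "usage: persist_myproxy_dn RC_FILE DN" >&2
        return 2
    fi

    # Create the file if it does not exist yet.
    touch "$rc_file"

    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Possible usage:
#   persist_myproxy_dn "$HOME/.bashrc" "$MYPROXY_SERVER_DN"
```

The DN is written inside double quotes because it contains spaces.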

To list the contents of a remote directory using globus-url-copy, pass -ss to set the source DN. Set it to the DN for globus-research; otherwise the command fails and complains that the hostname globus-research does not match the DN:

[amcneill3@rich116-f39-15 globus]$ globus-url-copy -ss "$MYPROXY_SERVER_DN" -list gsiftp://globus-research/~/scratch/5gbzd/

To upload a single file using globus-url-copy (here we want to set destination DN with -ds), do:

[amcneill3@rich116-f39-15 globus]$ globus-url-copy -ds "$MYPROXY_SERVER_DN" -fast -rst -rst-retries 0 -tcp-bs 1G -p 16 1G.dat gsiftp://globus-research/~/scratch/1Gb.dat
  • -fast = optimize transfer
  • -rst = restart the transfer if there is an interruption
  • -rst-retries 0 = retry an unlimited number of times if there are restarts
  • -tcp-bs = set the TCP buffer size used for the transfer
  • -p 16 = set parallelism (can be 1 to 16)
  • 1G.dat = source file
  • gsiftp://globus-research/~/scratch/1Gb.dat = destination file on globus-server "PACE Research" endpoint
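The flags above can be wrapped in a small helper so the same options are used for every upload. This is a sketch under stated assumptions, not a PACE-provided tool: the function name upload_file and the DRY_RUN switch are hypothetical, the 1 to 16 range for -p is taken from the list above, and the remaining flags are copied from the example command.

```shell
# Sketch: upload one file with the globus-url-copy options described above.
# Arguments: SRC DEST [STREAMS]   (STREAMS defaults to 16)
# Set DRY_RUN=1 to print the command instead of running it.
upload_file() {
    local src="$1"
    local dest="$2"
    local streams="${3:-16}"

    if [ -z "$src" ] || [ -z "$dest" ]; then
        echo "usage: upload_file SRC DEST [STREAMS]" >&2
        return 2
    fi
    if [ -z "${MYPROXY_SERVER_DN:-}" ]; then
        echo "MYPROXY_SERVER_DN is not set" >&2
        return 2
    fi
    # Reject anything that is not a whole number between 1 and 16.
    case "$streams" in
        ''|*[!0-9]*) echo "STREAMS must be an integer" >&2; return 2 ;;
    esac
    if [ "$streams" -lt 1 ] || [ "$streams" -gt 16 ]; then
        echo "STREAMS must be between 1 and 16" >&2
        return 2
    fi

    # Rebuild the positional parameters as the command to run.
    set -- globus-url-copy -ds "$MYPROXY_SERVER_DN" -fast -rst -rst-retries 0 \
        -tcp-bs 1G -p "$streams" "$src" "$dest"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Printed for inspection only; arguments containing spaces are not re-quoted.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Possible usage:
#   DRY_RUN=1 upload_file 1G.dat 'gsiftp://globus-research/~/scratch/1Gb.dat' 8
```

Validating the stream count up front avoids starting a long transfer with an option value outside the range given above.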

To upload a directory using globus-url-copy, add the -r (recursive) and -cd (create destination) arguments:

[amcneill3@rich116-f39-15 globus]$ globus-url-copy -ds "$MYPROXY_SERVER_DN" -fast -rst -rst-retries 0 -tcp-bs 1G -p 16 -r -cd 5gbz/ gsiftp://globus-research/~/scratch/5gbzd/

If you want to verify the checksum of the transfer, add the -verify-checksum argument:

[amcneill3@rich116-f39-15 globus]$ globus-url-copy -ds "$MYPROXY_SERVER_DN" -verify-checksum -fast -rst -rst-retries 0 -tcp-bs 1G -p 16 -r -cd 5gbz/ gsiftp://globus-research/~/scratch/5gbzd/
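Both directory examples above end the source and the destination with a slash. The sketch below assumes that convention matters for recursive copies and normalizes both paths before building the command. The function names with_trailing_slash and upload_dir, the DRY_RUN switch, and the "verify" keyword are hypothetical. The flags themselves are copied from the commands above.

```shell
# Sketch: make sure a path or URL ends with exactly one added slash if missing.
with_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Sketch: recursive directory upload, optionally with checksum verification.
# Arguments: SRC_DIR DEST_URL [verify]
# Set DRY_RUN=1 to print the command instead of running it.
upload_dir() {
    local src
    local dest
    local verify="${3:-no}"

    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: upload_dir SRC_DIR DEST_URL [verify]" >&2
        return 2
    fi
    if [ -z "${MYPROXY_SERVER_DN:-}" ]; then
        echo "MYPROXY_SERVER_DN is not set" >&2
        return 2
    fi
    src=$(with_trailing_slash "$1")
    dest=$(with_trailing_slash "$2")

    # Rebuild the positional parameters as the command to run.
    set -- globus-url-copy -ds "$MYPROXY_SERVER_DN"
    if [ "$verify" = "verify" ]; then
        set -- "$@" -verify-checksum
    fi
    set -- "$@" -fast -rst -rst-retries 0 -tcp-bs 1G -p 16 -r -cd "$src" "$dest"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Printed for inspection only; arguments containing spaces are not re-quoted.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Possible usage:
#   DRY_RUN=1 upload_dir 5gbz 'gsiftp://globus-research/~/scratch/5gbzd' verify
```

Keeping checksum verification optional reflects the two commands above: the plain recursive upload and the variant with -verify-checksum.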

(to be completed)


This material is based upon work supported by the National Science Foundation under grant number 1828187. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.