Updated 2021-02-03
Technical Questions¶
How can I learn more about PACE clusters and users/groups/queues?¶
- PACE offers the tool Ganglia for inspecting performance/utilization/status of PACE managed clusters.
What are those "data" and "scratch" folders in my home directory?¶
- Please refer to the Storage Guide for the most relevant information.
- On the Phoenix Cluster, there are project and scratch directories. The project directory is for long-term storage, while scratch is for short-term high-performance storage. The Phoenix Storage Guide has more information.
- On the Hive Cluster, there are two symbolic links: data and scratch. The data symbolic link points to your project directory space for long-term storage of data sets. The scratch symbolic link points to your space on the high-performance scratch storage.
- As part of your job submission file, you can make additional directories within your scratch space and copy input files from your project directory into the newly created sub-directory on the scratch. During the execution of your job, operate on the copy within the scratch space. When your calculations are complete, copy needed files back to your project directory space and remove the remaining files from the scratch space.
- Remember, the scratch space is limited and not intended to hold data for the long term. Each week, we automatically remove "old" files (more than 60 days old) from the scratch space. In addition, we apply a 7 TB hard quota and a limit of 1 million files per user. We do not back up the scratch storage, but we do back up the project directory and home directory storage.
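The stage-in, compute, stage-out pattern described above could be sketched as a small bash function. This is only an illustration: the function name stage_and_run, the result file name result.out, and all paths are hypothetical placeholders, not part of any PACE tooling.

```shell
#!/bin/bash
# Sketch of the scratch workflow: copy input in, compute, copy results out.
# stage_and_run, result.out and every path used here are hypothetical.

# Usage: stage_and_run PROJECT_DIR SCRATCH_DIR INPUT_FILE COMMAND [ARGS...]
stage_and_run() {
    local project_dir=$1 scratch_dir=$2 input_file=$3
    shift 3

    # Create a unique working sub-directory inside the scratch space.
    local work_dir
    work_dir=$(mktemp -d "$scratch_dir/job.XXXXXX") || return 1

    # Stage in: copy the input file from the project space to scratch.
    cp "$project_dir/$input_file" "$work_dir/" || return 1

    # Compute: run the command from inside the scratch working directory.
    ( cd "$work_dir" && "$@" ) || return 1

    # Stage out: copy the needed result back to the project space.
    cp "$work_dir/result.out" "$project_dir/" || return 1

    # Clean up: remove the remaining files from scratch.
    # On any earlier failure the directory is left in place for inspection.
    rm -rf "$work_dir"
}
```

Inside a job submission file, a call might look like stage_and_run "$HOME/data" "$HOME/scratch" input.dat sh -c 'sort input.dat > result.out', assuming the Hive-style data and scratch links and an input file named input.dat.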
How can I get information about CPUs on a particular node?¶
- Use this shell command from your home directory:
cat /proc/cpuinfo
- The output should look something like this:
foo@joe98 ~> cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2222
stepping : 3
cpu MHz : 3015.524
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 6038.61
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2222
stepping : 3
cpu MHz : 3015.524
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 6030.07
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
processor : 2
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2222
stepping : 3
cpu MHz : 3015.524
cache size : 1024 KB
physical id : 1
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 6030.07
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
processor : 3
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2222
stepping : 3
cpu MHz : 3015.524
cache size : 1024 KB
physical id : 1
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 6030.07
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
foo@joe98 ~>
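Because the full listing repeats one block per logical CPU, a short summary is often easier to read. The following is a minimal sketch, assuming the field names shown above (processor, physical id, model name); the function name cpu_summary is hypothetical, and some systems do not report a physical id field, in which case the socket count will be 0.

```shell
#!/bin/bash
# cpu_summary [FILE]: summarize a cpuinfo-style file (default: /proc/cpuinfo).
# Hypothetical helper; relies on the "name : value" layout shown above.
cpu_summary() {
    local file=${1:-/proc/cpuinfo}
    awk -F': *' '
        /^processor/   { logical++ }          # one block per logical CPU
        /^physical id/ { sockets[$2] = 1 }    # distinct ids = distinct sockets
        /^model name/  { model = $2 }
        END {
            n = 0
            for (s in sockets) n++
            printf "model: %s\nlogical CPUs: %d\nsockets: %d\n", model, logical, n
        }
    ' "$file"
}
```

For the sample output above, this would report 4 logical CPUs across 2 sockets.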
How can I find out how much memory a particular node includes?¶
- Use this shell command from your home directory:
cat /proc/meminfo
- The output should look something like this:
foo@joe98 ~> cat /proc/meminfo
MemTotal: 16419200 kB
MemFree: 15035800 kB
Buffers: 239928 kB
Cached: 283524 kB
SwapCached: 2120 kB
Active: 706044 kB
Inactive: 114520 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 16419200 kB
LowFree: 15035800 kB
SwapTotal: 2096440 kB
SwapFree: 1899200 kB
Dirty: 316 kB
Writeback: 0 kB
Mapped: 303940 kB
Slab: 540140 kB
CommitLimit: 10306040 kB
Committed_AS: 504724 kB
PageTables: 2516 kB
VmallocTotal: 536870911 kB
VmallocUsed: 1320 kB
VmallocChunk: 536869559 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
foo@joe98 ~>
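The values in /proc/meminfo are reported in kB, which can be hard to read at a glance. As a rough sketch, a single field could be converted to GiB like this; the function name mem_gib is hypothetical, and the conversion treats the reported kB as units of 1024 bytes.

```shell
#!/bin/bash
# mem_gib FIELD [FILE]: print one meminfo field in GiB (default file: /proc/meminfo).
# Hypothetical helper; expects lines of the form "MemTotal: 16419200 kB".
mem_gib() {
    local field=$1 file=${2:-/proc/meminfo}
    # Match the field name (with its trailing colon) and divide kB by 1024 twice.
    awk -v key="$field:" '$1 == key { printf "%.1f\n", $2 / 1024 / 1024 }' "$file"
}
```

With the sample output above, mem_gib MemTotal would print 15.7 and mem_gib MemFree would print 14.3.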
What's eating up all my disk space?¶
- Use the following command to show the disk usage of each file and directory in the current directory:
du -sh *
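When a directory has many entries, sorting the du output by size makes the largest ones easy to spot. The sketch below is one possible way to do that; the function name largest_entries is hypothetical, and, like the command above, it skips hidden files because * does not match names starting with a dot.

```shell
#!/bin/bash
# largest_entries [DIR] [COUNT]: list the COUNT largest entries in DIR, biggest first.
# Hypothetical helper; sizes are printed in KiB so a plain numeric sort works.
largest_entries() {
    local dir=${1:-.} count=${2:-10}
    # -s: one total per entry, -k: sizes in KiB; the subshell keeps the cd local.
    ( cd "$dir" && du -sk -- * 2>/dev/null | sort -rn | head -n "$count" )
}
```

For example, largest_entries ~/scratch 5 would list the five largest entries under a scratch directory, assuming that path exists.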
Is there a way to allow my interactive sessions to persist as I travel between my home and lab computers?¶
- Yes, you can use Screen as outlined on the Screen page of the Software Guides section.
- Please refer to the following link for more information: GNU Screen
How do I get system email sent elsewhere?¶
- If you are interested in PBS-issued emails only, you can specify your email address in the PBS submit script with the "#PBS -M" directive:
#PBS -M your_email_address
- If you would like all system emails forwarded, create a .forward file in your home directory:
foo@pacemaker ~> cd
foo@pacemaker ~> echo your_email_address > .forward
What do I do if I have trouble transferring files?¶
- Please refer to the Storage and File Transfer section and make sure that you have followed all the steps correctly. If you have and the problem persists, please try another transfer method in the section.
- If you are still having trouble transferring files off of the cluster, run the following command from your home directory:
pace-support.sh
- If you are having trouble transferring files to the cluster, please contact pace-support@oit.gatech.edu
What do I do if I'm having problems with my password?¶
- PACE clusters use the standard "GT Account" provided to all GT faculty, staff and students. For external collaborators, we can provision guest accounts created by their GT sponsor.
- Guest accounts and password resets can be resolved by using passport.
- If you still have problems with your password, please see your local Computer Support Representative (CSR), or visit the Technology Support Center.
How can I get general information about all clusters?¶
- OIT offers Ganglia, which reports performance, utilization, and status information for the clusters. The web pages can only be browsed on campus or via VPN.
- About Ganglia:
- The main page of Ganglia provides two graphs for CPU and Memory utilization for the past hour, for each cluster.
- You can get historical information up to a year from the menu titled "Last" (see figure below).
- To get more detailed (i.e., per-node) information, click the cluster title or any of the graphs.
- The workload on each node is color-coded; e.g., nodes that use almost 100% of their CPUs appear red.
- If you submitted a job to a cluster and it has not been allocated for a long time, you can check the cluster utilization on this web page to see how many nodes are busy or idle.
- If the cluster looks idle and your job is still not being allocated, please check your PBS parameters for typos.