Updated 2023-05-31
LAMMPS-GPU¶
Overview¶
LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is a classical molecular dynamics code with a focus on materials modeling. It has potentials for solid-state materials (metals, semiconductors), soft matter (biomolecules, polymers), and coarse-grained or mesoscopic systems.
Running LAMMPS-GPU Interactively¶
Allocating Resources¶
- To run LAMMPS-GPU interactively, use the salloc command to request resources, specifying the account (-A), number of nodes (-N), number of tasks (-n), walltime in minutes (-t), QOS (-q), and number of GPUs (-G).
- Here is an example salloc command:
salloc -A [Account] -N 1 -n 8 -t 15 -q embers -G 1
- This requests one node with 8 tasks and 1 GPU for 15 minutes under the embers QOS.
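Once salloc returns a shell, it can help to confirm what was actually granted before launching LAMMPS. The following is a minimal sketch: the helper name show_allocation is hypothetical, the SLURM_* variables are the ones Slurm normally sets inside an allocation, and the nvidia-smi call assumes NVIDIA GPUs and is skipped if the tool is not on the PATH.

```shell
# Print a one-line summary of the current Slurm allocation.
# Falls back to placeholder values when run outside an allocation.
show_allocation() {
    echo "job=${SLURM_JOB_ID:-none} nodes=${SLURM_JOB_NUM_NODES:-0} tasks=${SLURM_NTASKS:-0}"
}

show_allocation

# List the GPUs visible to this shell (assumes NVIDIA GPUs; skipped otherwise).
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || true
fi
```

With the example salloc command, the summary would be expected to report nodes=1 and tasks=8.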
Using an Example Input File¶
- The following example runs LAMMPS interactively using an example input file.
- Here is what the input file lammps-gpu-example should look like:
# This LAMMPS input script simulates LJ particles in a 2D box
# Written by Simon Gravelle (https://simongravelle.github.io/)
# Find more scripts here: https://github.com/simongravelle/lammps-input-files
# LAMMPS tutorials for beginners: https://lammpstutorials.github.io/
# main parameters
units lj
dimension 2
atom_style atomic
pair_style lj/cut 2.5
boundary p p p
# create system and insert atoms
region myreg block -30 30 -30 30 -0.5 0.5
create_box 2 myreg
create_atoms 1 random 1500 341341 myreg
create_atoms 2 random 100 127569 myreg
# atom settings
mass 1 1
mass 2 1
pair_coeff 1 1 1.0 1.0
pair_coeff 2 2 0.5 3.0
neigh_modify every 1 delay 5 check yes
# minimisation
minimize 1.0e-4 1.0e-6 1000 10000
reset_timestep 0
# dynamics
fix mynve all nve
fix mylgv all langevin 1.0 1.0 0.1 1530917
fix myefn all enforce2d
timestep 0.005
# outputs
thermo 1000
dump mydmp all atom 1000 dump.lammpstrj
# run
run 10000
- To run this file with LAMMPS-GPU:
- Load the module:
module load lammps-gpu
- Run the script:
srun -n 6 lmp < lammps-gpu-example
- Your output should look something like this:
LAMMPS (7 Jan 2022)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
Created orthogonal box = (-30 -30 -0.5) to (30 30 0.5)
3 by 2 by 1 MPI processor grid
Created 1500 atoms
using lattice units in orthogonal box = (-30 -30 -0.5) to (30 30 0.5)
create_atoms CPU = 0.000 seconds
Created 100 atoms
using lattice units in orthogonal box = (-30 -30 -0.5) to (30 30 0.5)
create_atoms CPU = 0.000 seconds
WARNING: Using 'neigh_modify every 1 delay 0 check yes' setting during minimization (src/src/min.cpp:187)
generated 1 of 1 mixed pair_coeff terms from geometric mixing rule
...
...
Setting up cg style minimization ...
Unit style : lj
Current step : 0
Per MPI rank memory allocation (min/avg/max) = 4.176 | 4.176 | 4.177 Mbytes
Step Temp E_pair E_mol TotEng Press
0 0 5.8997404e+14 0 5.8997404e+14 1.5732641e+15
81 0 -1.7518285 0 -1.7518285 -0.15730928
Loop time of 0.0161491 on 6 procs for 81 steps with 1600 atoms
99.2% CPU use with 6 MPI tasks x 1 OpenMP threads
Minimization stats:
Stopping criterion = energy tolerance
Energy initial, next-to-last, final =
589974040194331 -1.75166415802626 -1.75182852779174
Force two-norm initial, final = 2.5817498e+20 60.584174
Force max component initial, final = 1.5160091e+20 11.519543
Final line search alpha, max atom move = 6.6931485e-05 0.00077102009
Iterations, force evaluations = 81 197
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.0059371 | 0.0071613 | 0.0088512 | 1.0 | 44.34
Neigh | 0.0015247 | 0.0017042 | 0.0019472 | 0.3 | 10.55
Comm | 0.002375 | 0.0041855 | 0.0055143 | 1.4 | 25.92
Output | 0 | 0 | 0 | 0.0 | 0.00
Modify | 0 | 0 | 0 | 0.0 | 0.00
Other | | 0.003098 | | | 19.18
...
...
Total # of neighbors = 8440
Ave neighs/atom = 5.275
Neighbor list builds = 24
Dangerous builds = 0
generated 1 of 1 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
Unit style : lj
Current step : 0
Time step : 0.005
Per MPI rank memory allocation (min/avg/max) = 4.063 | 4.063 | 4.063 Mbytes
Step Temp E_pair E_mol TotEng Press
0 0 -1.7518285 0 -1.7518285 -0.15730928
1000 0.99279852 -1.3476948 0 -0.35551678 0.78484062
2000 1.022668 -1.3292054 0 -0.30717657 0.81660226
3000 1.0185213 -1.3456334 0 -0.32774866 0.76583719
4000 0.96371604 -1.3062798 0 -0.34316611 0.85257195
5000 0.96229603 -1.3303442 0 -0.36864964 0.70962141
6000 0.94309004 -1.3151242 0 -0.37262362 0.72161759
7000 0.99747756 -1.3064984 0 -0.30964422 0.84469381
8000 1.0138762 -1.3348616 0 -0.32161914 0.78244061
9000 0.9639628 -1.3148769 0 -0.35151653 0.85549524
10000 1.0020337 -1.3172927 0 -0.31588535 0.83832425
Loop time of 0.670191 on 6 procs for 10000 steps with 1600 atoms
Performance: 6445926.779 tau/day, 14921.127 timesteps/s
99.8% CPU use with 6 MPI tasks x 1 OpenMP threads
...
...
Total # of neighbors = 8568
Ave neighs/atom = 5.355
Neighbor list builds = 1152
Dangerous builds = 0
Total wall time: 0:00:00
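The Langevin thermostat in the input script targets a temperature of 1.0 (in LJ units), so the Temp column of the run should fluctuate around that value. As a quick check, the thermo rows can be averaged with awk. This is a sketch under stated assumptions: the function name avg_temp is hypothetical, the output is assumed to be available in a file such as log.lammps (the log file LAMMPS writes by default), and thermo rows are assumed to have the six columns shown above.

```shell
# Average the Temp column (field 2) over thermo rows with Step >= 1000.
# A thermo row here is a line of exactly 6 fields whose first field is an integer.
avg_temp() {
    awk '$1 ~ /^[0-9]+$/ && NF == 6 && $1 >= 1000 { sum += $2; n++ }
         END { if (n) printf "%.3f\n", sum / n }' "$1"
}

# Example: run against the default LAMMPS log file if it exists.
if [ -f log.lammps ]; then
    avg_temp log.lammps
fi
```

The Step >= 1000 filter skips the minimization rows and the initial step of the run, where Temp is 0.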
Running LAMMPS-GPU in Batch Mode¶
- The same example can also be run as a batch job. Here is an example batch script:
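#!/bin/bash
# Job name, charge account (replace [Account] with your own), and resources:
# 2 nodes x 4 tasks per node, 10-minute walltime, embers QOS.
#SBATCH -J SBATCHlammpsTest
#SBATCH -A [Account]
#SBATCH -N 2 --ntasks-per-node=4
#SBATCH -t 10
#SBATCH -q embers
# Write output to Report-<jobid>.out and send email on job begin, end, and failure.
#SBATCH -o Report-%j.out
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=gburdell3@gatech.edu
# Run from the directory the job was submitted from.
cd "$SLURM_SUBMIT_DIR"
module load lammps-gpu
# Launch 6 MPI tasks of LAMMPS on the example input file.
srun -n 6 lmp < lammps-gpu-example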
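The batch script above still needs to be submitted to the scheduler. A minimal sketch follows, assuming the script was saved as lammps-gpu.sbatch (a hypothetical file name). The helper report_file is likewise hypothetical and simply mirrors the -o Report-%j.out pattern from the script. The submission is skipped when sbatch or the script file is not available.

```shell
# Name of the output file the job writes, following "#SBATCH -o Report-%j.out".
report_file() {
    echo "Report-$1.out"
}

SCRIPT=lammps-gpu.sbatch   # hypothetical file name for the batch script

if command -v sbatch >/dev/null 2>&1 && [ -f "$SCRIPT" ]; then
    # --parsable prints the job ID, optionally followed by ";cluster".
    jobid=$(sbatch --parsable "$SCRIPT")
    jobid=${jobid%%;*}
    squeue -j "$jobid"                      # show the job's queue status
    echo "Output will be written to $(report_file "$jobid")"
fi
```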
- The job output (written to Report-<jobid>.out) should look something like this:
---------------------------------------
Begin Slurm Prolog: Apr-28-2023 01:54:28
Job ID: 1774754
User ID: gburdell3
Account: [Account]
Job name: SBATCHlammpsTest
Partition: cpu-small
QOS: embers
---------------------------------------
Lmod is automatically replacing "gcc/10.3.0-o57x6h" with "intel/20.0.4".
The following have been reloaded with a version change:
1) mvapich2/2.3.6-ouywal => mvapich2/2.3.6-z2duuy
LAMMPS (7 Jan 2022)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
Created orthogonal box = (-30 -30 -0.5) to (30 30 0.5)
3 by 2 by 1 MPI processor grid
Created 1500 atoms
using lattice units in orthogonal box = (-30 -30 -0.5) to (30 30 0.5)
create_atoms CPU = 0.000 seconds
Created 100 atoms
using lattice units in orthogonal box = (-30 -30 -0.5) to (30 30 0.5)
create_atoms CPU = 0.000 seconds
WARNING: Using 'neigh_modify every 1 delay 0 check yes' setting during minimization (src/src/min.cpp:187)
generated 1 of 1 mixed pair_coeff terms from geometric mixing rule
...
...
Setting up cg style minimization ...
Unit style : lj
Current step : 0
Per MPI rank memory allocation (min/avg/max) = 4.176 | 4.176 | 4.177 Mbytes
Step Temp E_pair E_mol TotEng Press
0 0 5.8997404e+14 0 5.8997404e+14 1.5732641e+15
81 0 -1.7518285 0 -1.7518285 -0.15730928
Loop time of 0.0161491 on 6 procs for 81 steps with 1600 atoms
99.2% CPU use with 6 MPI tasks x 1 OpenMP threads
Minimization stats:
Stopping criterion = energy tolerance
Energy initial, next-to-last, final =
589974040194331 -1.75166415802626 -1.75182852779174
Force two-norm initial, final = 2.5817498e+20 60.584174
Force max component initial, final = 1.5160091e+20 11.519543
Final line search alpha, max atom move = 6.6931485e-05 0.00077102009
Iterations, force evaluations = 81 197
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.0059371 | 0.0071613 | 0.0088512 | 1.0 | 44.34
Neigh | 0.0015247 | 0.0017042 | 0.0019472 | 0.3 | 10.55
Comm | 0.002375 | 0.0041855 | 0.0055143 | 1.4 | 25.92
Output | 0 | 0 | 0 | 0.0 | 0.00
Modify | 0 | 0 | 0 | 0.0 | 0.00
Other | | 0.003098 | | | 19.18
...
...
Total # of neighbors = 8440
Ave neighs/atom = 5.275
Neighbor list builds = 24
Dangerous builds = 0
generated 1 of 1 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
Unit style : lj
Current step : 0
Time step : 0.005
Per MPI rank memory allocation (min/avg/max) = 4.063 | 4.063 | 4.063 Mbytes
Step Temp E_pair E_mol TotEng Press
0 0 -1.7518285 0 -1.7518285 -0.15730928
1000 0.99279852 -1.3476948 0 -0.35551678 0.78484062
2000 1.022668 -1.3292054 0 -0.30717657 0.81660226
3000 1.0185213 -1.3456334 0 -0.32774866 0.76583719
4000 0.96371604 -1.3062798 0 -0.34316611 0.85257195
5000 0.96229603 -1.3303442 0 -0.36864964 0.70962141
6000 0.94309004 -1.3151242 0 -0.37262362 0.72161759
7000 0.99747756 -1.3064984 0 -0.30964422 0.84469381
8000 1.0138762 -1.3348616 0 -0.32161914 0.78244061
9000 0.9639628 -1.3148769 0 -0.35151653 0.85549524
10000 1.0020337 -1.3172927 0 -0.31588535 0.83832425
Loop time of 0.670191 on 6 procs for 10000 steps with 1600 atoms
Performance: 6445926.779 tau/day, 14921.127 timesteps/s
99.8% CPU use with 6 MPI tasks x 1 OpenMP threads
...
...
Total # of neighbors = 8568
Ave neighs/atom = 5.355
Neighbor list builds = 1152
Dangerous builds = 0
Total wall time: 0:00:00
---------------------------------------
Begin Slurm Epilog: Apr-28-2023 01:54:32
Job ID: 1774754
Array Job ID: _4294967294
User ID: gburdell3
Account: [Account]
Job name: SBATCHlammpsTest
Resources: cpu=8,mem=8G,node=2
Rsrc Used: cput=00:01:28,vmem=336K,walltime=00:00:11,mem=0,energy_used=0
Partition: cpu-small
QOS: embers
Nodes: atl1-1-02-010-22-2,atl1-1-02-010-23-1
---------------------------------------
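Both runs shown on this page start lmp without any accelerator switches. LAMMPS provides the command-line switches -sf gpu (append the gpu suffix to supported styles) and -pk gpu N (use N GPUs per node), along with -in to name the input file. The sketch below only prints such a command so it can be inspected first. It assumes the lmp binary from the lammps-gpu module was built with the GPU package and that the job has a GPU allocated, neither of which is confirmed here. The helper name gpu_cmd is hypothetical.

```shell
# Build (but do not run) an srun command line that enables the LAMMPS GPU package.
# Arguments: number of MPI tasks, number of GPUs per node, input file.
gpu_cmd() {
    echo "srun -n $1 lmp -sf gpu -pk gpu $2 -in $3"
}

gpu_cmd 6 1 lammps-gpu-example
```

If the printed command looks right, it can be run inside an allocation that includes a GPU, for example one requested with -G 1 as in the salloc example.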