Running LAMMPS on Linux with Nvidia GPU or Multi-core CPU

So the other day, one of my friends came to my room asking for help with “LAMMPS”, a molecular dynamics library. He got the basics running with the pre-built Ubuntu Linux executables:

sudo add-apt-repository ppa:gladky-anton/lammps
sudo apt-get update 
sudo apt-get install lammps-daily

But then he realized that the ‘apt-get’ version only uses one CPU thread instead of multiple, which would be too slow for his simulation. He also has an Nvidia Quadro GPU that he wished could be utilized for his simulations. So I dug around, trying to find a way to enable the multi-thread as well as the GPU packages for LAMMPS. The annoying part was that the LAMMPS community threads are a little bit outdated, and the newest one I could find was still discussing Kepler GPUs… So, after ~15 hours of digging around and GDB-level debugging (which is absolutely painful for someone who hasn’t taken a single course in C/C++), the library was working. So here is everything you need to know to get LAMMPS running on Linux with an Nvidia GPU or a multi-core CPU.

I have one in my GitHub repo that is compiled for CUDA compute capability 5.0, so you are also welcome to simply download a compiled version of LAMMPS with GPU support.

Official documentation can be found at http://lammps.sandia.gov/doc/Manual.html.

First, clone the official LAMMPS repository from GitHub, and switch to the stable branch:

~$ git clone https://github.com/lammps/lammps
Cloning into 'lammps'...
remote: Counting objects: 147198, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 147198 (delta 2), reused 0 (delta 0), pack-reused 147194
Receiving objects: 100% (147198/147198), 357.63 MiB | 3.26 MiB/s, done.
Resolving deltas: 100% (125710/125710), done.
Checking out files: 100% (9710/9710), done.
 
~$ cd lammps
~/lammps$ git checkout stable
Switched to branch 'stable'

Before going further, we need to check the dependencies to make sure everything is at the right version. I was originally using an old gcc compiler, and the build kept failing because of it.

~/lammps$ gcc -v
<omitted>
gcc version 6.4.0 20180424 (Ubuntu 6.4.0-17ubuntu1~16.04) 

g++ and gcc should both be 6.x.x here; if not, add the PPA, update, then install the correct version:

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-6 g++-6

Another problem is that multiple gcc/g++ versions can exist on the same computer, each with a different priority. Thus, we need to set the 6.x.x version as the default:

~/lammps$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 100 --slave /usr/bin/g++ g++ /usr/bin/g++-6 

~/lammps$ sudo update-alternatives --config gcc  
There are 3 choices for the alternative gcc (providing /usr/bin/gcc).

  Selection    Path              Priority   Status
------------------------------------------------------------
* 0            /usr/bin/gcc-6     100       auto mode
  1            /usr/bin/gcc-4.8   50        manual mode
  2            /usr/bin/gcc-4.9   60        manual mode
  3            /usr/bin/gcc-6     100       manual mode

Press <enter> to keep the current choice[*], or type selection number: 0

gcc-6 should now be set as the default compiler. If not, type in the number associated with gcc-6 to make it the default.
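
To double-check, both compilers should now report a 6.x.x version:

gcc --version | head -n 1
g++ --version | head -n 1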

The next library to check is Open MPI, a high-performance message passing library. The version that worked for me is 3.1.0; to install it, go to https://www.open-mpi.org/software/ompi/v3.1/ and download the openmpi-3.1.0.tar.gz file.

mpirun --version
mpirun (Open MPI) 3.1.0

Report bugs to http://www.open-mpi.org/community/help/

Then, follow the steps from the official documentation to build the library:

~/Downloads$ gunzip -c openmpi-3.1.0.tar.gz | tar xf -
~/Downloads$ cd openmpi-3.1.0
~/Downloads/openmpi-3.1.0$ ./configure --prefix=/usr/local
<...lots of output...>
~/Downloads/openmpi-3.1.0$ sudo make all install
~/Downloads/openmpi-3.1.0$ cd .. && sudo rm -r openmpi-3.1.0
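
After installing into /usr/local, it is worth refreshing the dynamic linker cache and confirming that the freshly built binaries are the ones being picked up (paths may differ on your setup):

sudo ldconfig
which mpicxx
mpirun --version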

Last, CUDA and the CUDA toolkit should both be version 9.0. Run the following commands to check them:

~/lammps$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
~/lammps$ nvidia-smi
Wed Jun  6 21:53:17 2018       
<omitted>
| NVIDIA-SMI 384.130                Driver Version: 384.130       
<omitted>
| GPU  Name        Persistence-M| Bus-Id        Disp.A | 
<omitted>
|   0  Quadro M2000M       Off  | 00000000:01:00.0 Off |          
<omitted>

If not, search online for instructions to install CUDA 9.0 on your computer.
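
If nvcc is not found even though the toolkit is installed, it is usually just missing from the shell environment; assuming the default CUDA 9.0 install path, adding these two lines to ~/.bashrc should fix it:

export PATH=/usr/local/cuda-9.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH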

Once all the dependencies have been checked, we can go on and explore the LAMMPS build options.

Go to the src directory and type make with no arguments in the terminal to see how the command can be used.

~/lammps$ cd src
~/lammps/src$ make

make clean-all           delete all object files
make clean-machine       delete object files for one machine
make mpi-stubs           build dummy MPI library in STUBS
make install-python      install LAMMPS wrapper in Python
make tar                 create lmp_src.tar.gz for src dir and packages

make package                 list available packages and their dependencies
make package-status (ps)     status of all packages
make package-installed (pi)  list of installed packages
make yes-package             install a single pkg in src dir
make no-package              remove a single pkg from src dir
make yes-all                 install all pkgs in src dir
make no-all                  remove all pkgs from src dir
make yes-standard (yes-std)  install all standard pkgs
make no-standard (no-std)    remove all standard pkgs
make yes-user                install all user pkgs
make no-user                 remove all user pkgs
make yes-lib       install all pkgs with libs (included or ext)
make no-lib        remove all pkgs with libs (included or ext)
make yes-ext                 install all pkgs with external libs
make no-ext                  remove all pkgs with external libs

make package-update (pu) replace src files with updated package files
make package-overwrite   replace package files with src files
make package-diff (pd)   diff src files against package files

make lib-package         help for download/build/install a package library
make lib-package args="..."    download/build/install a package library
make purge               purge obsolete copies of source files

make machine             build LAMMPS for machine
make mode=lib machine    build LAMMPS as static lib for machine
make mode=shlib machine  build LAMMPS as shared lib for machine
make mode=shexe machine  build LAMMPS as shared exe for machine
make makelist            create Makefile.list used by old makes
make -f Makefile.list machine     build LAMMPS for machine (old)

machine is one of these from src/MAKE:

# mpi = MPI with its default compiler
# serial = GNU g++ compiler, no MPI

... or one of these from src/MAKE/OPTIONS:

# big = MPI with its default compiler, BIGBIG switch
# fftw = MPI with its default compiler, FFTW3 support
# g++_mpich = MPICH with compiler set to GNU g++
# g++_mpich_link = GNU g++ compiler, link to MPICH
# g++_openmpi = OpenMPI with compiler set to GNU g++
# g++_openmpi_link = GNU g++ compiler, link to OpenMPI
# gpu = GPU package, MPI with its default compiler
# g++_serial = GNU g++ compiler, no MPI
# icc_mpich = MPICH with compiler set to Intel icc
# icc_mpich_link = Intel icc compiler, link to MPICH
# icc_openmpi = OpenMPI with compiler set to Intel icc
# icc_openmpi_link = Intel icc compiler, link to OpenMPI
# icc_serial = Intel icc compiler, no MPI
# intel_phi = USER-INTEL package with Phi offload support, Intel MPI, MKL FFT
# intel_cpu_intelmpi = USER-INTEL package, Intel MPI, MKL FFT
# intel_cpu_mpich = USER-INTEL package, MPICH with compiler set to Intel icc
# intel_cpu_openmpi = USER-INTEL package, OpenMPI with compiler set to Intel icc
# jpeg = default MPI compiler, default MPI, JPEG support
# knl = Flags for Knights Landing Xeon Phi Processor,Intel Compiler/MPI,MKL FFT
# kokkos_cuda_mpi = KOKKOS/CUDA package, MPICH or OpenMPI with nvcc compiler, Kepler GPU
# kokkos_mpi_only = KOKKOS package, no threading, MPI with its default compiler
# kokkos_omp = KOKKOS/OMP package, MPI with its default compiler
# kokkos_phi = KOKKOS package with PHI support, Intel compiler, default MPI
# mgptfast = MPI with its default compiler, optimizations for USER-MGPT
# omp = USER-OMP package, MPI with its default compiler
# opt = OPT package, MPI with its default compiler
# pgi_mpich_link = Portland group compiler, link to MPICH
# png = default MPI compiler, default MPI, PNG support

... or one of these from src/MAKE/MACHINES:

# linux = RedHat Linux box, Intel icc, MPICH2, FFTW
# bgl = LLNL Blue Gene Light machine, xlC, native MPI, FFTW
# bgq = IBM Blue Gene/Q, multiple compiler options, native MPI, ALCF FFTW2
# multiple compiler options for BGQ
# chama - Intel SandyBridge, mpic++, openmpi, no FFTW
# cori2 = NERSC Cori II KNL, static build, FFTW (single precision)
# cygwin = Windows Cygwin, mpicxx, MPICH, FFTW
# glory = Linux cluster with 4-way quad cores, Intel mpicxx, native MPI, FFTW
# mpi = MPI with its default compiler
# jaguar = ORNL Jaguar Cray XT5, CC, native MPICH, FFTW
# mac = Apple PowerBook G4 laptop, c++, no MPI, FFTW 2.1.5
# mac_mpi = Apple laptop, MacPorts Open MPI 1.4.3, gcc 4.8, fftw, jpeg
# mingw32-cross = Win 32-bit, gcc-4.7.1, MinGW, internal FFT, no MPI, OpenMP
# mingw32-cross-mpi = Win 32-bit, gcc-4.7.1, MinGW, internal FFT, MPICH2, OpenMP
# mingw64-cross = Win 64-bit, gcc-4.7.1, MinGW, internal FFT, no MPI, OpenMP
# mingw64-cross-mpi = Win 64-bit, gcc-4.7.1, MinGW, internal FFT, MPICH2, OpenMP
# myrinet = cluster, g++, myrinet MPI, no FFTs
# power = IBM Power5+, mpCC_r, native MPI, FFTW
# redsky - SUN X6275 nodes, Nehalem procs, mpic++, openmpi, OpenMP, no FFTW 
# serial = RedHat Linux box, g++4, no MPI, no FFTs
# stampede = Intel Compiler, MKL FFT, Offload to Xeon Phi
# storm = Cray Red Storm XT3, Cray CC, native MPI, FFTW
# tacc = UT Lonestar TACC machine, mpiCC, MPI, FFTW
# ubuntu = Ubuntu Linux box, g++, openmpi, FFTW3
# ubuntu_simple = Ubuntu Linux box, g++, openmpi, KISS FFT
# kokkos_cuda = KOKKOS/CUDA package, OpenMPI with nvcc compiler, Kepler GPU
# xe6 = Cray XE6, Cray CC, native MPI, FFTW
# xt3 = PSC BigBen Cray XT3, CC, native MPI, FFTW
# xt5 = Cray XT5, Cray CC, native MPI, FFTW

... or one of these from src/MAKE/MINE:

We are building for mpi with its default compiler. Before proceeding, check which packages your simulation script requires, and install them. It’s always good to do make clean-all and make no-all first to ensure a clean compilation.
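
For example, a quick way to start from a clean slate (make ps is the package-status target from the listing above; an empty result from the grep means every package has been removed):

~/lammps/src$ make clean-all
~/lammps/src$ make no-all
~/lammps/src$ make ps | grep YES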

Multi-core CPU

Running LAMMPS on multiple CPU cores is handled by MPI, and the OPT package adds optimized versions of some pair styles on top of that. The installation is simple: just type in make yes-opt along with the rest of the required packages, then make mpi or make ubuntu_simple.

~/lammps/src$ make yes-opt
Installing package opt

~/lammps/src$ make mpi
make[1]: Entering directory '/home/a***o/lammps/src/Obj_mpi'
cc -O -o fastdep.exe ../DEPEND/fastdep.c
<...lots of output...>
size ../lmp_mpi
   text	   data	    bss	    dec	    hex	filename
3804320	   8056	  14184	3826560	 3a6380	../lmp_mpi
make[1]: Leaving directory '/home/a***o/lammps/src/Obj_mpi'

There might be warnings during the process of creating the executable, but it is okay to ignore them.

~/lammps/src$ find . -name lmp_mpi
./lmp_mpi

The executable is located in src/, and one may use the cp command to copy it elsewhere for later use.
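
For example, to make it available from anywhere (assuming /usr/local/bin is on your PATH):

~/lammps/src$ sudo cp lmp_mpi /usr/local/bin/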

To start a simulation on multiple CPU cores, type the following command:

mpirun -np 8 lmp_mpi -sf opt -in your_simulation.lmp

The number after -np is the number of MPI processes, 8 here to match the 8 threads of my Xeon CPU, and the opt option after -sf indicates that the OPT package is being used. If you would like to start LAMMPS without an input file, remove the -in option and the file name:

~/lammps/src$ mpirun -np 8 lmp_mpi -sf opt 
LAMMPS (11 May 2018)
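
If you want a quick sanity check before running your own script, the repository you cloned ships with small example inputs (this assumes the standard examples/melt directory is present in your checkout):

~/lammps/src$ cd ../examples/melt
~/lammps/examples/melt$ mpirun -np 8 ../../src/lmp_mpi -sf opt -in in.melt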

GPU

The GPU case is more complicated. Since the GPU package is built for a specific sm architecture (corresponding to the compute capability of the card), we need to build the GPU library ourselves.

~/lammps/src$ cd ..
~/lammps$ cd lib
~/lammps/lib$ cd gpu
~/lammps/lib/gpu$ ls -a
~/lammps/lib/gpu$ vim Makefile.linux

Go to the lib/gpu directory, and open the file named Makefile.linux; near the top you will see:

# /* ----------------------------------------------------------------------
#  Generic Linux Makefile for CUDA
#     - Change CUDA_ARCH for your GPU
# ------------------------------------------------------------------------- */

# which file will be copied to Makefile.lammps

EXTRAMAKE = Makefile.lammps.standard

CUDA_HOME = /usr/local/cuda
NVCC = nvcc

# Kepler CUDA
#CUDA_ARCH = -arch=sm_35
# Tesla CUDA
CUDA_ARCH = -arch=sm_21
# newer CUDA
#CUDA_ARCH = -arch=sm_13
# older CUDA
#CUDA_ARCH = -arch=sm_10 -DCUDA_PRE_THREE

# this setting should match LAMMPS Makefile
# one of LAMMPS_SMALLBIG (default), LAMMPS_BIGBIG and LAMMPS_SMALLSMALL

LMP_INC = -DLAMMPS_SMALLBIG

Since the default install path of CUDA 9.0 is /usr/local/cuda-9.0, the CUDA_HOME variable should be changed accordingly. Also, the number after sm_ should match the compute capability of the current Nvidia GPU. For more information, refer to the official Nvidia website.
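
For example, for the Quadro M2000M used here (compute capability 5.0) with CUDA 9.0, the two edited lines look like this:

CUDA_HOME = /usr/local/cuda-9.0
CUDA_ARCH = -arch=sm_50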

If a former build exists, use make -f Makefile.linux clean to remove it. Otherwise, type make -f Makefile.linux into the terminal:

~/lammps/lib/gpu$ make -f Makefile.linux
<...lots of output...>
mpicxx -DMPI_GERYON -DUCL_NO_EXIT -DMPICH_IGNORE_CXX_SEEK -DOMPI_SKIP_MPICXX=1 -fPIC -O2 -DLAMMPS_SMALLBIG  -D_SINGLE_DOUBLE -I/usr/local/cuda-9.0/include -DUSE_CUDPP -Icudpp_mini -o nvc_get_devices ./geryon/ucl_get_devices.cpp -DUCL_CUDADR -L/usr/local/cuda-9.0/lib64 -lcuda 

A new executable, nvc_get_devices, should appear in the directory. Run it to see whether LAMMPS can get the correct information about your GPU.

~/lammps/lib/gpu$ ./nvc_get_devices
Found 1 platform(s).
Using platform: NVIDIA Corporation NVIDIA CUDA Driver
CUDA Driver Version:                           9.0

Device 0: "Quadro M2000M"
  Type of device:                                GPU
  Compute capability:                            5
  Double precision support:                      Yes
  Total amount of global memory:                 3.94745 GB
  Number of compute units/multiprocessors:       5
  Number of cores:                               960
  Total amount of constant memory:               65536 bytes
  Total amount of local/shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum group size (# of threads per block)    1024 x 1024 x 64
  Maximum item sizes (# threads for each dim)    2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Clock rate:                                    1.137 GHz
  Run time limit on kernels:                     Yes
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default
  Concurrent kernel execution:                   Yes
  Device has ECC support enabled:                No

Finally, go back to src/, install the GPU package, and compile with MPI again.

~/lammps/src$ make yes-gpu
Installing package gpu

~/lammps/src$ make mpi
make[1]: Entering directory '/home/a***o/lammps/src/Obj_mpi'
cc -O -o fastdep.exe ../DEPEND/fastdep.c
<...lots of output...>
size ../lmp_mpi
   text	   data	    bss	    dec	    hex	filename
8563129	  12048	 163568	8738745	 8557b9	../lmp_mpi
make[1]: Leaving directory '/home/a***o/lammps/src/Obj_mpi'

Again, there will be lots of warnings, but that is perfectly normal.

To run with GPU package, type:

mpirun lmp_mpi -sf gpu -pk gpu 1 -in your_simulation.lmp

Here, -sf gpu indicates that the GPU package is being used, and -pk gpu 1 tells LAMMPS to use one GPU per node (increase the number if multiple GPUs are available). If your build is successful, you should see something like the following when querying the GPU usage with watch -n 1 nvidia-smi:

~/lammps/src$ mpirun lmp_mpi -sf gpu -pk gpu 1
LAMMPS (16 Mar 2018)

|    0     30068      C   lmp_mpi                                       26MiB |
|    0     30069      C   lmp_mpi                                       26MiB |
|    0     30070      C   lmp_mpi                                       26MiB |
|    0     30071      C   lmp_mpi                                       26MiB |
+-----------------------------------------------------------------------------+
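
The four lmp_mpi entries in the nvidia-smi process list above are the MPI ranks sharing the single GPU; to control the rank count explicitly, add -np just like in the CPU case:

mpirun -np 4 lmp_mpi -sf gpu -pk gpu 1 -in your_simulation.lmp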