# Running LAMMPS on Linux with an Nvidia GPU or Multi-core CPU

So the other day, a friend of mine came to my room asking for help with LAMMPS, a molecular dynamics library. He had gotten the basics running by installing the pre-built Ubuntu Linux executables:

``` shell
sudo add-apt-repository ppa:gladky-anton/lammps
sudo apt-get update
sudo apt-get install lammps-daily
```

But then he realized that the `apt-get` version only uses one CPU thread instead of several, which would be too slow for his simulation. He also has an Nvidia Quadro GPU that he wished to use for his simulations. So I dug around, trying to find a way to build LAMMPS with both multi-core and GPU support. The annoying part was that the LAMMPS community threads on this topic are a bit outdated; the newest one I could find was still discussing Kepler GPUs... So, after ~15 hours of digging around and GDB-level manipulation (absolutely painful for someone who hasn't taken a single course in C/C++), the library was working. Here is everything you need to know to get LAMMPS running on your Linux machine with an Nvidia GPU or a multi-core CPU.

*I have a build in my GitHub repo compiled for CUDA compute capability 5.0, so you are also welcome to simply [download a compiled version of LAMMPS with GPU support.](https://github.com/liaopeiyuan/LAMMPS-simulations)*

*Official documentation can be found at http://lammps.sandia.gov/doc/Manual.html .*

First, clone the official LAMMPS repository from GitHub and switch to the stable branch:

``` shell
~$ git clone https://github.com/lammps/lammps
Cloning into 'lammps'...
remote: Counting objects: 147198, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 147198 (delta 2), reused 0 (delta 0), pack-reused 147194
Receiving objects: 100% (147198/147198), 357.63 MiB | 3.26 MiB/s, done.
Resolving deltas: 100% (125710/125710), done.
Checking out files: 100% (9710/9710), done.
~$ cd lammps
~/lammps$ git checkout stable
Switched to branch 'stable'
```

Before going further, we need to check the versions of the dependencies to make sure everything is in order. I was originally using an old gcc compiler, and the build kept failing.

``` shell
~/lammps$ gcc -v
gcc version 6.4.0 20180424 (Ubuntu 6.4.0-17ubuntu1~16.04)
```

gcc and g++ should both be `6.x.x` here. If not, add the toolchain PPA, update, then install the correct version:

``` shell
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-6 g++-6
```

Another problem is that multiple gcc/g++ versions can exist on the same computer, each with a different priority. We therefore need to make the `6.x.x` version the default:

``` shell
~/lammps$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 100 --slave /usr/bin/g++ g++ /usr/bin/g++-6
~/lammps$ sudo update-alternatives --config gcc
There are 3 choices for the alternative gcc (providing /usr/bin/gcc).

  Selection    Path               Priority   Status
------------------------------------------------------------
* 0            /usr/bin/gcc-6      100       auto mode
  1            /usr/bin/gcc-4.8    50        manual mode
  2            /usr/bin/gcc-4.9    60        manual mode
  3            /usr/bin/gcc-6      100       manual mode

Press <enter> to keep the current choice[*], or type selection number: 0
```

gcc should now be set as the default compiler. If not, type in the number associated with `gcc-6` and set it as the default.
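As a quick optional sanity check, both compilers should now report a 6.x version:

``` shell
# both of these should print a 6.x version after update-alternatives
gcc --version | head -n 1
g++ --version | head -n 1
```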
The next dependency to check is OpenMPI, a high-performance message-passing library. The version that worked for me is `3.1.0`; to install the newest version, go to https://www.open-mpi.org/software/ompi/v3.1/ and download the `openmpi-3.1.0.tar.gz` file.

``` shell
~$ mpirun --version
mpirun (Open MPI) 3.1.0

Report bugs to http://www.open-mpi.org/community/help/
```

Then follow [the steps from the official documentation](https://www.open-mpi.org/faq/?category=building) to build the library:

``` shell
~/Downloads$ gunzip -c openmpi-3.1.0.tar.gz | tar xf -
~/Downloads$ cd openmpi-3.1.0
~/Downloads/openmpi-3.1.0$ ./configure --prefix=/usr/local
<...lots of output...>
~/Downloads/openmpi-3.1.0$ sudo make all install
~/Downloads/openmpi-3.1.0$ cd ..
~/Downloads$ sudo rm -r openmpi-3.1.0
```

Last, CUDA and the CUDA toolkit should both be version 9.0. Run the following commands to check:

``` shell
~/lammps$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
~/lammps$ nvidia-smi
Wed Jun  6 21:53:17 2018
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
| GPU  Name        Persistence-M| Bus-Id        Disp.A |
|   0  Quadro M2000M       Off  | 00000000:01:00.0 Off |
```

If not, search online for instructions to install CUDA 9.0 on your computer.
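Later on, the GPU Makefile will need the compute capability of your card as an `sm_XX` value. If you are unsure what it is, one option (assuming the CUDA 9.0 samples were installed in their default location, which may differ on your machine) is to build and run the bundled `deviceQuery` sample:

``` shell
# build and run the CUDA deviceQuery sample to read off the compute capability
cd /usr/local/cuda-9.0/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery | grep "CUDA Capability"
```

A Quadro M2000M, for example, reports compute capability 5.0, which corresponds to `sm_50`.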
Once all dependencies have been checked, we can explore the LAMMPS build options. Go to the `src` directory and type `make` in the terminal to see how the command is used.

``` shell
~/lammps$ cd src
~/lammps/src$ make

make clean-all           delete all object files
make clean-machine       delete object files for one machine
make mpi-stubs           build dummy MPI library in STUBS
make install-python      install LAMMPS wrapper in Python
make tar                 create lmp_src.tar.gz for src dir and packages

make package                 list available packages and their dependencies
make package-status (ps)     status of all packages
make package-installed (pi)  list of installed packages
make yes-package             install a single pgk in src dir
make no-package              remove a single pkg from src dir
make yes-all                 install all pgks in src dir
make no-all                  remove all pkgs from src dir
make yes-standard (yes-std)  install all standard pkgs
make no-standard (no-std)    remove all standard pkgs
make yes-user                install all user pkgs
make no-user                 remove all user pkgs
make yes-lib                 install all pkgs with libs (included or ext)
make no-lib                  remove all pkgs with libs (included or ext)
make yes-ext                 install all pkgs with external libs
make no-ext                  remove all pkgs with external libs

make package-update (pu)     replace src files with updated package files
make package-overwrite       replace package files with src files
make package-diff (pd)       diff src files against package files

make lib-package             help for download/build/install a package library
make lib-package args="..."  download/build/install a package library
make purge                   purge obsolete copies of source files

make machine                 build LAMMPS for machine
make mode=lib machine        build LAMMPS as static lib for machine
make mode=shlib machine      build LAMMPS as shared lib for machine
make mode=shexe machine      build LAMMPS as shared exe for machine
make makelist                create Makefile.list used by old makes
make -f Makefile.list machine   build LAMMPS for machine (old)

machine is one of these from src/MAKE:
# mpi = MPI with its default compiler
# serial = GNU g++ compiler, no MPI
...

or one of these from src/MAKE/OPTIONS:
# big = MPI with its default compiler, BIGBIG switch
# fftw = MPI with its default compiler, FFTW3 support
# g++_mpich = MPICH with compiler set to GNU g++
# g++_mpich_link = GNU g++ compiler, link to MPICH
# g++_openmpi = OpenMPI with compiler set to GNU g++
# g++_openmpi_link = GNU g++ compiler, link to OpenMPI
# gpu = GPU package, MPI with its default compiler
# g++_serial = GNU g++ compiler, no MPI
# icc_mpich = MPICH with compiler set to Intel icc
# icc_mpich_link = Intel icc compiler, link to MPICH
# icc_openmpi = OpenMPI with compiler set to Intel icc
# icc_openmpi_link = Intel icc compiler, link to OpenMPI
# icc_serial = Intel icc compiler, no MPI
# intel_phi = USER-INTEL package with Phi offload support, Intel MPI, MKL FFT
# intel_cpu_intelmpi = USER-INTEL package, Intel MPI, MKL FFT
# intel_cpu_mpich = USER-INTEL package, MPICH with compiler set to Intel icc
# intel_cpu_openmpi = USER-INTEL package, OpenMPI with compiler set to Intel icc
# jpeg = default MPI compiler, default MPI, JPEG support
# knl = Flags for Knights Landing Xeon Phi Processor, Intel Compiler/MPI, MKL FFT
# kokkos_cuda_mpi = KOKKOS/CUDA package, MPICH or OpenMPI with nvcc compiler, Kepler GPU
# kokkos_mpi_only = KOKKOS package, no threading, MPI with its default compiler
# kokkos_omp = KOKKOS/OMP package, MPI with its default compiler
# kokkos_phi = KOKKOS package with PHI support, Intel compiler, default MPI
# mgptfast = MPI with its default compiler, optimizations for USER-MGPT
# omp = USER-OMP package, MPI with its default compiler
# opt = OPT package, MPI with its default compiler
# pgi_mpich_link = Portland group compiler, link to MPICH
# png = default MPI compiler, default MPI, PNG support
...
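For example, to reset the source tree to a known-clean state and see what is currently selected (all three targets appear in the help listing above):

``` shell
~/lammps/src$ make no-all           # deselect every package
~/lammps/src$ make clean-all        # delete object files from earlier builds
~/lammps/src$ make package-status   # confirm nothing unexpected is installed
```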
## Multi-core CPU

Multi-core runs in LAMMPS are handled by MPI, and the OPT package additionally provides optimized versions of several styles. The installation is simple: type `make yes-opt` along with the rest of the packages you need, then `make mpi` (or `make ubuntu_simple`).

``` shell
~/lammps/src$ make yes-opt
Installing package opt
~/lammps/src$ make mpi
make[1]: Entering directory '/home/a***o/lammps/src/Obj_mpi'
cc -O -o fastdep.exe ../DEPEND/fastdep.c
<...lots of output...>
size ../lmp_mpi
   text    data     bss     dec     hex filename
3804320    8056   14184 3826560  3a6380 ../lmp_mpi
make[1]: Leaving directory '/home/a***o/lammps/src/Obj_mpi'
```

There may be warnings while the executable is being created, but it is okay to ignore them.

``` shell
~/lammps/src$ find . -name lmp_mpi
./lmp_mpi
```

The executable is located in `src/`, and one may use the `cp` command to copy it elsewhere for later use. To start a simulation on a multi-core CPU, type the following command:

``` shell
mpirun -np 8 lmp_mpi -sf opt -in your_simulation.lmp
```

The number after `-np` is the number of MPI processes (8 here, matching the eight hardware threads of my Xeon CPU), and the `opt` after `-sf` tells LAMMPS to use the OPT-package variants of styles where available. If you would like to start LAMMPS without an input file, remove `-in`:

``` shell
~/lammps/src$ mpirun -np 8 lmp_mpi -sf opt
LAMMPS (11 May 2018)
```
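If you do not have a simulation input at hand yet, a short Lennard-Jones melt makes a convenient smoke test for the parallel build. The script below is adapted from the standard LAMMPS `bench/in.lj` example, and `melt_test.lmp` is just a file name I picked:

``` shell
# write a small test input, then run it on 8 MPI processes with the OPT styles
cat > melt_test.lmp << 'EOF'
# 3d Lennard-Jones melt (adapted from the LAMMPS bench/in.lj example)
units           lj
atom_style      atomic
lattice         fcc 0.8442
region          box block 0 10 0 10 0 10
create_box      1 box
create_atoms    1 box
mass            1 1.0
velocity        all create 1.44 87287 loop geom
pair_style      lj/cut 2.5
pair_coeff      1 1 1.0 1.0 2.5
neighbor        0.3 bin
neigh_modify    delay 0 every 20 check no
fix             1 all nve
run             100
EOF
mpirun -np 8 ./lmp_mpi -sf opt -in melt_test.lmp
```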
## GPU

The situation with the GPU is more complicated. Since the GPU package is built for a specific `sm` architecture (corresponding to the compute capability of the card), we need to rebuild the library.

``` shell
~/lammps/src$ cd ..
~/lammps$ cd lib
~/lammps/lib$ cd gpu
~/lammps/lib/gpu$ ls -a
~/lammps/lib/gpu$ vim Makefile.linux
```

Go to the `lib/gpu` directory and open the file named `Makefile.linux`:

``` vim
# /* ----------------------------------------------------------------------
#    Generic Linux Makefile for CUDA
#       - Change CUDA_ARCH for your GPU
# ------------------------------------------------------------------------- */

# which file will be copied to Makefile.lammps
EXTRAMAKE = Makefile.lammps.standard

CUDA_HOME = /usr/local/cuda
NVCC = nvcc

# Kepler CUDA
#CUDA_ARCH = -arch=sm_35
# Tesla CUDA
CUDA_ARCH = -arch=sm_21
# newer CUDA
#CUDA_ARCH = -arch=sm_13
# older CUDA
#CUDA_ARCH = -arch=sm_10 -DCUDA_PRE_THREE

# this setting should match LAMMPS Makefile
# one of LAMMPS_SMALLBIG (default), LAMMPS_BIGBIG and LAMMPS_SMALLSMALL

LMP_INC = -DLAMMPS_SMALLBIG
```

Since CUDA 9.0 installs to `/usr/local/cuda-9.0` by default, the `CUDA_HOME` variable should be changed accordingly. Also, the number after `sm_` should match the compute capability of your Nvidia GPU. For more information, refer to the [official website](https://developer.nvidia.com/cuda-gpus).
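If you prefer not to edit the Makefile by hand, the two lines can also be patched with `sed`. This is a sketch assuming CUDA 9.0 lives in `/usr/local/cuda-9.0` and a compute-capability-5.0 card such as the Quadro M2000M used here; adjust the path and the `sm_` value for your setup:

``` shell
# point CUDA_HOME at the CUDA 9.0 install and select the sm_50 architecture
sed -i 's|^CUDA_HOME = .*|CUDA_HOME = /usr/local/cuda-9.0|' Makefile.linux
sed -i 's|^CUDA_ARCH = .*|CUDA_ARCH = -arch=sm_50|' Makefile.linux
```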
If a previous build exists, remove it with `make -f Makefile.linux clean`. Otherwise, type `make -f Makefile.linux` in the terminal:

``` shell
~/lammps/lib/gpu$ make -f Makefile.linux
<...lots of output...>
mpicxx -DMPI_GERYON -DUCL_NO_EXIT -DMPICH_IGNORE_CXX_SEEK -DOMPI_SKIP_MPICXX=1 -fPIC -O2 -DLAMMPS_SMALLBIG -D_SINGLE_DOUBLE -I/usr/local/cuda-9.0/include -DUSE_CUDPP -Icudpp_mini -o nvc_get_devices ./geryon/ucl_get_devices.cpp -DUCL_CUDADR -L/usr/local/cuda-9.0/lib64 -lcuda
```

A new executable, `nvc_get_devices`, should appear in the directory. Run it to see whether LAMMPS can read the correct information about your GPU.

``` shell
~/lammps/lib/gpu$ ./nvc_get_devices
Found 1 platform(s).
Using platform: NVIDIA Corporation NVIDIA CUDA Driver
CUDA Driver Version:                           9.0

Device 0: "Quadro M2000M"
  Type of device:                                GPU
  Compute capability:                            5
  Double precision support:                      Yes
  Total amount of global memory:                 3.94745 GB
  Number of compute units/multiprocessors:       5
  Number of cores:                               960
  Total amount of constant memory:               65536 bytes
  Total amount of local/shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum group size (# of threads per block)    1024 x 1024 x 64
  Maximum item sizes (# threads for each dim)    2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Clock rate:                                    1.137 GHz
  Run time limit on kernels:                     Yes
  Integrated:                                    No
  Support host page-locked memory mapping:       Yes
  Compute mode:                                  Default
  Concurrent kernel execution:                   Yes
  Device has ECC support enabled:                No
```

Finally, go back to `src/`, install the GPU package, and compile with MPI.

``` shell
~/lammps/src$ make yes-gpu
Installing package gpu
~/lammps/src$ make mpi
make[1]: Entering directory '/home/a***o/lammps/src/Obj_mpi'
cc -O -o fastdep.exe ../DEPEND/fastdep.c
<...lots of output...>
size ../lmp_mpi
   text    data     bss     dec     hex filename
8563129   12048  163568 8738745  8557b9 ../lmp_mpi
make[1]: Leaving directory '/home/a***o/lammps/src/Obj_mpi'
```

Again, there will be lots of warnings, but that is perfectly normal. To run with the GPU package, type:

``` shell
mpirun lmp_mpi -sf gpu -pk gpu 1 -in your_simulation.lmp
```

The `-sf gpu` switch tells LAMMPS to use the GPU-accelerated variants of styles where they exist, and `-pk gpu 1` tells the GPU package how many GPUs per node to use (here, one).

If your build is successful, you should see something like the following when querying GPU usage with `watch -n 1 nvidia-smi`:

``` shell
~/lammps/src$ mpirun lmp_mpi -sf gpu -pk gpu 1
LAMMPS (16 Mar 2018)
```

``` shell
|    0     30068      C   lmp_mpi                                       26MiB |
|    0     30069      C   lmp_mpi                                       26MiB |
|    0     30070      C   lmp_mpi                                       26MiB |
|    0     30071      C   lmp_mpi                                       26MiB |
+-----------------------------------------------------------------------------+
```
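As a closing tip, the command-line switches have input-script equivalents (per the LAMMPS manual, `-pk gpu 1` corresponds to the `package gpu 1` command and `-sf gpu` to `suffix gpu`), which is handy when you cannot modify the launch command. A minimal sketch using the hypothetical `melt_test.lmp` input from earlier:

``` shell
# prepend the accelerator commands to an existing input instead of passing -pk/-sf
printf 'package gpu 1\nsuffix gpu\n' | cat - melt_test.lmp > melt_test_gpu.lmp
mpirun -np 4 ./lmp_mpi -in melt_test_gpu.lmp
```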