AMD ROCm containers
What is AMD ROCm
The AMD Infinity Hub contains a collection of advanced AMD GPU software containers and deployment guides for HPC, AI, and machine learning applications, enabling researchers to speed up their time to science. Containerized applications run quickly and reliably in the high-performance computing environment with full support for AMD GPUs. A collection of Infinity Hub tools has been deployed on our clusters to extend their capabilities, enable powerful software, and deliver results faster. By using Singularity with Infinity Hub ROCm-enabled containers, users can focus on building lean models, producing optimal solutions, and gathering insights faster. For more information, please visit AMD Infinity Hub.
Getting Started
Users can download ROCm containers from the AMD Infinity Hub and run them directly, following the Singularity instructions on the corresponding container's catalog page.
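For example, a manual Singularity workflow might look like the sketch below. The image name and tag are illustrative placeholders (Infinity Hub images are generally distributed through the amdih Docker Hub registry), so take the exact pull and run commands from the container's Infinity Hub catalog page:
singularity pull gromacs.sif docker://amdih/gromacs:<tag>
singularity exec --rocm gromacs.sif gmx --version
The --rocm flag makes the host's AMD GPUs and ROCm runtime available inside the container.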
In addition, we provide a subset of pre-downloaded ROCm containers wrapped into convenient software modules. These modules hide the underlying complexity and provide the same commands you would expect from non-containerized versions of each application.
On clusters equipped with AMD GPUs, type the commands below to see the list of deployed ROCm container modules:
module load rocmcontainers
module avail
------------ ROCm-based application container modules for AMD GPUs -------------
cp2k/20210311--h87ec1599
deepspeed/rocm4.2_ubuntu18.04_py3.6_pytorch_1.8.1
gromacs/2020.3 (D)
namd/2.15a2
openmm/7.4.2
pytorch/1.8.1-rocm4.2-ubuntu18.04-py3.6
pytorch/1.9.0-rocm4.2-ubuntu18.04-py3.6 (D)
specfem3d/20201122--h9c0626d1
specfem3d_globe/20210322--h1ee10977
tensorflow/2.5-rocm4.2-dev
[....]
Deployed Applications
cp2k
Description
CP2K is a quantum chemistry and solid state physics software package that can perform atomistic simulations of solid state, liquid, molecular, periodic, material, crystal, and biological systems. CP2K provides a general framework for different modeling methods such as DFT using the mixed Gaussian and plane waves approaches (GPW and GAPW). Supported theory levels include DFTB, LDA, GGA, MP2, RPA, semi-empirical methods (AM1, PM3, PM6, RM1, MNDO, ...), and classical force fields (AMBER, CHARMM, ...). CP2K can do simulations of molecular dynamics, metadynamics, Monte Carlo, Ehrenfest dynamics, vibrational analysis, core level spectroscopy, energy minimization, and transition state optimization using NEB or the dimer method. CP2K is written in Fortran 2008 and can be run efficiently in parallel using a combination of multi-threading, MPI, and HIP/CUDA.
Versions
- Bell: 8.2, 20210311--h87ec1599
- Negishi: 8.2, 20210311--h87ec1599
Module
You can load the modules by:
module load rocmcontainers
module load cp2k
Example job
Using #!/bin/sh -l as the shebang in your Slurm job script will cause some of these container modules to fail. Please use #!/bin/bash instead.
To run cp2k on our clusters:
#!/bin/bash
#SBATCH -A gpu
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH --gpus-per-node=1
#SBATCH --job-name=cp2k
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module --force purge
ml rocmcontainers cp2k
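The script above ends after loading the containerized CP2K module; append your actual CP2K command to complete it. A minimal sketch, assuming an input file named input.inp of your own (and that the module exposes the standard cp2k.psmp binary, which may vary by version):
cp2k.psmp -i input.inp -o output.out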
deepspeed
Description
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Versions
- Bell: rocm4.2_ubuntu18.04_py3.6_pytorch_1.8.1
- Negishi: rocm4.2_ubuntu18.04_py3.6_pytorch_1.8.1
Module
You can load the modules by:
module load rocmcontainers
module load deepspeed
Example job
Using #!/bin/sh -l as the shebang in your Slurm job script will cause some of these container modules to fail. Please use #!/bin/bash instead.
To run deepspeed on our clusters:
#!/bin/bash
#SBATCH -A gpu
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH --gpus-per-node=1
#SBATCH --job-name=deepspeed
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module --force purge
ml rocmcontainers deepspeed
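The script above ends after loading the module; append your own training command. A minimal sketch, assuming a hypothetical training script train.py that uses the DeepSpeed engine:
deepspeed --num_gpus=1 train.py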
gromacs
Description
GROMACS is a molecular dynamics application designed to simulate Newtonian equations of motion for systems with hundreds to millions of particles. GROMACS is designed to simulate biochemical molecules like proteins, lipids, and nucleic acids that have a lot of complicated bonded interactions.
Versions
- Bell: 2020.3, 2022.3.amd1
- Negishi: 2020.3, 2022.3.amd1
Module
You can load the modules by:
module load rocmcontainers
module load gromacs
Example job
Using #!/bin/sh -l as the shebang in your Slurm job script will cause some of these container modules to fail. Please use #!/bin/bash instead.
To run gromacs on our clusters:
#!/bin/bash
#SBATCH -A gpu
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH --gpus-per-node=1
#SBATCH --job-name=gromacs
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module --force purge
ml rocmcontainers gromacs
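Complete the script with your GROMACS command. A minimal sketch, assuming a prepared run input md.tpr of your own and offloading the non-bonded interactions to the GPU:
gmx mdrun -deffnm md -nb gpu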
lammps
Description
LAMMPS stands for Large-scale Atomic/Molecular Massively Parallel Simulator and is a classical molecular dynamics (MD) code.
Versions
- Bell: 2022.5.04
- Negishi: 2022.5.04
Module
You can load the modules by:
module load rocmcontainers
module load lammps
Example job
Using #!/bin/sh -l as the shebang in your Slurm job script will cause some of these container modules to fail. Please use #!/bin/bash instead.
To run lammps on our clusters:
#!/bin/bash
#SBATCH -A gpu
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH --gpus-per-node=1
#SBATCH --job-name=lammps
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module --force purge
ml rocmcontainers lammps
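Complete the script with your LAMMPS run. A minimal sketch, assuming an input script in.melt of your own and that the module exposes the lmp executable (the wrapped binary name may differ):
lmp -in in.melt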
namd
Description
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.
Versions
- Bell: 2.15a2, 3.0a9
- Negishi: 2.15a2, 3.0a9
Module
You can load the modules by:
module load rocmcontainers
module load namd
Example job
Using #!/bin/sh -l as the shebang in your Slurm job script will cause some of these container modules to fail. Please use #!/bin/bash instead.
To run namd on our clusters:
#!/bin/bash
#SBATCH -A gpu
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH --gpus-per-node=1
#SBATCH --job-name=namd
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module --force purge
ml rocmcontainers namd
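Complete the script with your NAMD run. A minimal sketch for the multicore NAMD 2.15a2 build, assuming a configuration file md.namd of your own; +p matches the 8 cores requested above:
namd2 +p8 md.namd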
openmm
Description
OpenMM is a high-performance toolkit for molecular simulation. It can be used as an application, a library, or a flexible programming environment. OpenMM includes extensive language bindings for Python, C, C++, and even Fortran. The code is open source and developed on GitHub, licensed under MIT and LGPL.
Versions
- Bell: 7.4.2, 7.7.0
- Negishi: 7.4.2, 7.7.0
Module
You can load the modules by:
module load rocmcontainers
module load openmm
Example job
Using #!/bin/sh -l as the shebang in your Slurm job script will cause some of these container modules to fail. Please use #!/bin/bash instead.
To run openmm on our clusters:
#!/bin/bash
#SBATCH -A gpu
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH --gpus-per-node=1
#SBATCH --job-name=openmm
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module --force purge
ml rocmcontainers openmm
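OpenMM is typically driven from Python, so the run step is simply your own script. A minimal sketch, assuming a hypothetical simulate.py that builds and runs an OpenMM Simulation:
python simulate.py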
pytorch
Description
PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
Versions
- Bell: 1.8.1-rocm4.2-ubuntu18.04-py3.6, 1.9.0-rocm4.2-ubuntu18.04-py3.6, 1.10.0-rocm5.0-ubuntu18.04-py3.7
- Negishi: 1.8.1-rocm4.2-ubuntu18.04-py3.6, 1.9.0-rocm4.2-ubuntu18.04-py3.6, 1.10.0-rocm5.0-ubuntu18.04-py3.7
Module
You can load the modules by:
module load rocmcontainers
module load pytorch
Example job
Using #!/bin/sh -l as the shebang in your Slurm job script will cause some of these container modules to fail. Please use #!/bin/bash instead.
To run pytorch on our clusters:
#!/bin/bash
#SBATCH -A gpu
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH --gpus-per-node=1
#SBATCH --job-name=pytorch
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module --force purge
ml rocmcontainers pytorch
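Complete the script with your training or inference command. A minimal sketch, assuming a hypothetical train.py; the ROCm build of PyTorch exposes AMD GPUs through the usual torch.cuda interface:
python train.py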
rochpcg
Description
HPCG is an HPC benchmark intended to better represent the computational and data access patterns that characterize a broad set of scientific workloads. This container implements the HPCG benchmark on top of AMD's ROCm platform.
Versions
- Bell: 3.1.0
- Negishi: 3.1.0
Module
You can load the modules by:
module load rocmcontainers
module load rochpcg
Example job
Using #!/bin/sh -l as the shebang in your Slurm job script will cause some of these container modules to fail. Please use #!/bin/bash instead.
To run rochpcg on our clusters:
#!/bin/bash
#SBATCH -A gpu
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH --gpus-per-node=1
#SBATCH --job-name=rochpcg
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module --force purge
ml rocmcontainers rochpcg
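Complete the script with the benchmark invocation. The sketch below is an assumption about the launcher name and arguments (HPCG-style runs take the local grid dimensions and a runtime in seconds); consult the module's help or the Infinity Hub page for the exact syntax:
rochpcg 280 280 280 60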
rochpl
Description
HPL, or High-Performance Linpack, is a benchmark which solves a uniformly random system of linear equations and reports the floating-point execution rate. This container implements the HPL benchmark on top of AMD's ROCm platform.
Versions
- Bell: 5.0.5
- Negishi: 5.0.5
Module
You can load the modules by:
module load rocmcontainers
module load rochpl
Example job
Using #!/bin/sh -l as the shebang in your Slurm job script will cause some of these container modules to fail. Please use #!/bin/bash instead.
To run rochpl on our clusters:
#!/bin/bash
#SBATCH -A gpu
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH --gpus-per-node=1
#SBATCH --job-name=rochpl
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module --force purge
ml rocmcontainers rochpl
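Complete the script with the benchmark invocation. The sketch below is an assumption about the launcher name and flags (an HPL run is parameterized by the process grid P x Q, the problem size N, and the block size NB); consult the module's help or the Infinity Hub page for the exact syntax:
rochpl -P 1 -Q 1 -N 45056 --NB 512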
specfem3d
Description
SPECFEM3D Cartesian simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic, or seismic wave propagation in any type of conforming mesh of hexahedra (structured or not). It can, for instance, model seismic waves propagating in sedimentary basins or any other regional geological model following earthquakes. It can also be used for non-destructive testing or for ocean acoustics.
Versions
- Bell: 20201122--h9c0626d1
- Negishi: 20201122--h9c0626d1
Module
You can load the modules by:
module load rocmcontainers
module load specfem3d
Example job
Using #!/bin/sh -l as the shebang in your Slurm job script will cause some of these container modules to fail. Please use #!/bin/bash instead.
To run specfem3d on our clusters:
#!/bin/bash
#SBATCH -A gpu
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH --gpus-per-node=1
#SBATCH --job-name=specfem3d
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module --force purge
ml rocmcontainers specfem3d
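The script above only loads the module; a SPECFEM3D run is normally a multi-step workflow executed from a prepared example directory. A rough sketch, assuming your DATA/ directory and Par_file are already set up and that the module exposes the standard SPECFEM3D executables (names may differ in the container):
xmeshfem3D
xgenerate_databases
xspecfem3D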
specfem3d_globe
Description
SPECFEM3D Globe simulates global and regional continental-scale seismic wave propagation.
Versions
- Bell: 20210322--h1ee10977
- Negishi: 20210322--h1ee10977
Module
You can load the modules by:
module load rocmcontainers
module load specfem3d_globe
Example job
Using #!/bin/sh -l as the shebang in your Slurm job script will cause some of these container modules to fail. Please use #!/bin/bash instead.
To run specfem3d_globe on our clusters:
#!/bin/bash
#SBATCH -A gpu
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH --gpus-per-node=1
#SBATCH --job-name=specfem3d_globe
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module --force purge
ml rocmcontainers specfem3d_globe
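As with SPECFEM3D, complete the script with the actual mesher and solver steps. A rough sketch, assuming a prepared DATA/Par_file and the standard SPECFEM3D_GLOBE executable names (which may differ in the container):
xmeshfem3D
xspecfem3D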
tensorflow
Description
TensorFlow is an end-to-end open source platform for machine learning.
Versions
- Bell: 2.5-rocm4.2-dev, 2.7-rocm5.0-dev
- Negishi: 2.5-rocm4.2-dev, 2.7-rocm5.0-dev
Module
You can load the modules by:
module load rocmcontainers
module load tensorflow
Example job
Using #!/bin/sh -l as the shebang in your Slurm job script will cause some of these container modules to fail. Please use #!/bin/bash instead.
To run tensorflow on our clusters:
#!/bin/bash
#SBATCH -A gpu
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH --gpus-per-node=1
#SBATCH --job-name=tensorflow
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out
module --force purge
ml rocmcontainers tensorflow
This example demonstrates how to run TensorFlow on AMD GPUs with the rocmcontainers modules.
First, prepare the matrix multiplication example from the TensorFlow documentation:
# filename: matrixmult.py
import tensorflow as tf
# Report how many GPUs TensorFlow can see
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
# Log device placement
tf.debugging.set_log_device_placement(True)
# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)
print(c)
Submit a Slurm job, making sure to request a GPU-enabled queue and the desired number of GPUs. For illustration purposes, the following example shows an interactive job submission asking for one node (${resource.nodecores} cores) in the "gpu" account and two GPUs for 6 hours, but the same applies to your production batch jobs as well:
sinteractive -A gpu -N 1 -n ${resource.nodecores} -t 6:00:00 --gres=gpu:2
salloc: Granted job allocation 5401130
salloc: Waiting for resource configuration
salloc: Nodes ${resource.hostname}-g000 are ready for job
Inside the job, load necessary modules:
module load rocmcontainers
module load tensorflow/2.5-rocm4.2-dev
And run the application as usual:
python matrixmult.py
Num GPUs Available: 2
[...]
2021-09-02 21:07:34.087607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 32252 MB memory) -> physical GPU (device: 0, name: Vega 20, pci bus id: 0000:83:00.0)
[...]
2021-09-02 21:07:36.265167: I tensorflow/core/common_runtime/eager/execute.cc:733] Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
2021-09-02 21:07:36.266755: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library librocblas.so
tf.Tensor(
[[22. 28.]
[49. 64.]], shape=(2, 2), dtype=float32)