MPI support for EESSI-based containers

The European Environment for Scientific Software Installations (EESSI) is a collaboration between a number of academic and industrial partners in the HPC community to set up a shared stack of scientific software installations, avoiding both duplicate installation work across HPC sites and the execution of sub-optimal applications on HPC resources. The software stack is intended to work on laptops, personal workstations, HPC clusters and in the cloud, which means the project will need to support different CPUs, networks, GPUs, and so on.

EESSI can be used via containers; however, this requires some additional settings for MPI workloads. This module outlines the creation of an initialisation script that facilitates this while also catering to systems which have no direct connection to the internet.

Purpose of Module

The EESSI architecture is built upon the CernVM-FS distributed file system, which provides a scalable, reliable and low-maintenance software distribution service. CernVM-FS uses a local cache so that a client only ever holds copies of the files it actually needs. The cache is populated over HTTP.
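
For context, a CernVM-FS client is normally configured through /etc/cvmfs/default.local (the file that the initialisation script described below bind-mounts into the container). The snippet here is only an illustrative sketch, not the configuration generated by the script; the proxy, cache location and quota values are placeholders that need to be adapted to the site:

# Illustrative /etc/cvmfs/default.local -- all values are site-specific placeholders
CVMFS_REPOSITORIES=cvmfs-config.eessi-hpc.org,pilot.eessi-hpc.org
CVMFS_HTTP_PROXY=DIRECT              # a local Squid proxy is normally preferred over DIRECT
CVMFS_CACHE_BASE=/var/lib/cvmfs      # where the client keeps its local cache
CVMFS_QUOTA_LIMIT=10000              # soft limit on the cache size, in MB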

If CernVM-FS is not installed or configured on the system where a user would like to use EESSI, it is still possible to use EESSI via a Singularity container. The container approach, however, requires additional configuration for MPI workloads.

In addition, there are many cases where worker nodes in HPC systems have no connection to the outside world, which makes it impossible for them to populate their CernVM-FS cache.

This module describes a script created to address both of these issues.

Background Information

The European Environment for Scientific Software Installations (EESSI) is a collaboration between a number of academic and industrial partners in the HPC community. Through the EESSI project, they want to set up a shared stack of scientific software installations to avoid not only duplicate work across HPC sites but also the execution of sub-optimal applications on HPC resources.

The software stack is intended to work on laptops, personal workstations, HPC clusters and in the cloud, which means the project will need to support different CPUs, networks, GPUs, and so on. When using Singularity containers that leverage EESSI on HPC systems, there are additional requirements to ensure that MPI workloads can be launched and run correctly.

Building and Testing

The script itself can be downloaded as described in the next section. It is extensively commented and, at the time of writing, is configured to use the 2020.12 version of the EESSI pilot software stack. You should adjust the settings in the script to match the system you have access to.
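
The kinds of settings involved are, for example, the version of the EESSI stack to use, where the shared (global) cache should live and which container image to run. The fragment below is a hypothetical sketch of such a configuration block; the variable names are illustrative and not necessarily those used in the actual script:

# Hypothetical configuration block -- names and values are illustrative only
EESSI_VERSION="2020.12"                                    # EESSI pilot stack version
SHARED_CACHE_DIR="/path/to/shared/filesystem/alien_cache"  # global cache location
LOCAL_CACHE_DIR="/tmp"                                     # per-node cache location
CONTAINER_IMAGE="client-pilot_centos7-x86_64.sif"          # Singularity image to run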

The script creates two layers of caching for CernVM-FS: a global cache on the shared file system and a per-node cache on local storage. The script should be run from a location that has external internet access as well as access to the shared file system of the HPC resource. It will inspect the architecture of the host it is run on and fully pre-populate the global cache with the software stack for that architecture; the per-node cache is then dynamically populated from the global cache.
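
A rough sketch of the pre-population idea, assuming the fusemounted repositories and the cache bind mounts set up by the script, is simply to read every file in the architecture-specific software tree once from inside the container so that it ends up in the shared cache (the subdirectory value and the image path below are illustrative placeholders):

# Sketch only: warm the shared cache by reading the whole software tree once.
# EESSI_SOFTWARE_SUBDIR and the image path are illustrative placeholders.
EESSI_SOFTWARE_SUBDIR="x86_64/intel/skylake_avx512"
singularity exec --fusemount "$EESSI_CONFIG" --fusemount "$EESSI_PILOT" \
    client-pilot_centos7-x86_64.sif \
    find "/cvmfs/pilot.eessi-hpc.org/2020.12/software/${EESSI_SOFTWARE_SUBDIR}" \
         -type f -exec cat {} + > /dev/null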

After it has run, the script tells the user to set a number of environment variables, e.g.,

export EESSI_CONFIG="container:cvmfs2 cvmfs-config.eessi-hpc.org /cvmfs/cvmfs-config.eessi-hpc.org"
export EESSI_PILOT="container:cvmfs2 pilot.eessi-hpc.org /cvmfs/pilot.eessi-hpc.org"
export SINGULARITY_HOME="/p/project/cecam/singularity/cecam/ocais1/home:/home/ocais1"
export SINGULARITY_BIND="/p/project/cecam/singularity/cecam/alien_2020.12:/shared_alien,/tmp:/local_alien,/p/project/cecam/singularity/cecam/ocais1/home/default.local:/etc/cvmfs/default.local"
export SINGULARITY_SCRATCH="/var/lib/cvmfs,/var/run/cvmfs"

It will also tell you how to start a shell session within the container:

singularity shell --fusemount "$EESSI_CONFIG" --fusemount "$EESSI_PILOT" /p/project/cecam/singularity/cecam/ocais1/client-pilot_centos7-x86_64.sif

Once inside the shell you can initialise the EESSI computing environment, which gives you access to all the software available within EESSI via environment modules. You can then load the software you are interested in and find the path to its executables within the container. Let's do this for the GROMACS executable gmx_mpi.

Singularity> source /cvmfs/pilot.eessi-hpc.org/2020.12/init/bash
Found EESSI pilot repo @ /cvmfs/pilot.eessi-hpc.org/2020.12!
Using x86_64/intel/skylake_avx512 as software subdirectory.
Using /cvmfs/pilot.eessi-hpc.org/2020.12/software/x86_64/intel/skylake_avx512/modules/all as the directory to be added to MODULEPATH.
Found Lmod configuration file at /cvmfs/pilot.eessi-hpc.org/2020.12/software/x86_64/intel/skylake_avx512/.lmod/lmodrc.lua
Initializing Lmod...
Prepending /cvmfs/pilot.eessi-hpc.org/2020.12/software/x86_64/intel/skylake_avx512/modules/all to $MODULEPATH...
Environment set up to use EESSI pilot software stack, have fun!
[EESSI pilot 2020.12] $ module load GROMACS
[EESSI pilot 2020.12] $ which gmx_mpi
/cvmfs/pilot.eessi-hpc.org/2020.12/software/x86_64/intel/skylake_avx512/software/GROMACS/2020.1-foss-2020a-Python-3.8.2/bin/gmx_mpi
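
Since this is a standard Lmod setup, the same session can also be used to explore what else is available; for example (commands only, output omitted):

[EESSI pilot 2020.12] $ module avail GROMACS
[EESSI pilot 2020.12] $ module spider GROMACS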

Now that we know the path to the executable within the container, we can call it directly from outside the container and use it within a batch job. Here we show how one can execute a GROMACS benchmark using the installation found inside EESSI (on JUWELS):

[juwels01 ~]$ SLURM_MPI_TYPE=pspmix OMP_NUM_THREADS=2 \
              srun --time=00:05:00 --nodes=1 --ntasks-per-node=24 --cpus-per-task=2 \
              singularity exec --fusemount "$EESSI_CONFIG" --fusemount "$EESSI_PILOT" \
              /p/project/cecam/singularity/cecam/ocais1/client-pilot_centos7-x86_64.sif \
              /cvmfs/pilot.eessi-hpc.org/2020.12/software/x86_64/intel/skylake_avx512/software/GROMACS/2020.1-foss-2020a-Python-3.8.2/bin/gmx_mpi \
              mdrun -s ion_channel.tpr -maxh 0.50 -resethway -noconfout -nsteps 10 -g logfile
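
The same command can be wrapped in a batch script. The sketch below assumes the JUWELS-like settings above; the sourced environment file is a placeholder, and the EESSI_* and SINGULARITY_* variables reported by the initialisation script must be exported before srun is called:

#!/bin/bash
# Illustrative batch script -- the sourced environment file is a placeholder.
#SBATCH --job-name=gromacs-eessi
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --cpus-per-task=2
#SBATCH --time=00:05:00

# Hypothetical wrapper exporting the EESSI_* and SINGULARITY_* variables shown earlier
source /path/to/eessi_environment.sh

export SLURM_MPI_TYPE=pspmix
export OMP_NUM_THREADS=2

srun singularity exec --fusemount "$EESSI_CONFIG" --fusemount "$EESSI_PILOT" \
     /p/project/cecam/singularity/cecam/ocais1/client-pilot_centos7-x86_64.sif \
     /cvmfs/pilot.eessi-hpc.org/2020.12/software/x86_64/intel/skylake_avx512/software/GROMACS/2020.1-foss-2020a-Python-3.8.2/bin/gmx_mpi \
     mdrun -s ion_channel.tpr -maxh 0.50 -resethway -noconfout -nsteps 10 -g logfile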

Source Code

EESSI is still in a pilot phase, and for this reason the final version of this script cannot be created until the underlying requirements have stabilised. For the time being the script is contained in an issue in the EESSI filesystem layer repository.