Extending available MPI runtime environments

This module is one of a sequence that together form the overall capabilities of the HTC library (see HTC MPI-Aware Tasks for the most relevant previous module, which added support for forked MPI workloads). This module adds support for additional MPI runtimes, making the library a more portable solution across HPC systems.

Purpose of Module

This module extends the MPI runtimes supported by jobqueue_features, beyond the original SLURM and mpiexec, to OpenMPI, Intel MPI and MPICH. This support includes generating reasonable process-pinning arguments for each runtime, based on the system architecture and the resources requested for each worker.

Background Information

To date, we have only included MPI launchers that do not require complex configuration (srun and mpiexec). To extend the set of supported MPI launchers, we also need to take into account how each launcher distributes processes and threads. This information is available to us, since it is dictated by the system configuration file and by the arguments the user provides when creating the Dask cluster to which they submit their tasks.
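As an illustration of the information involved, the following sketch derives a process/thread layout from the kind of resource request a user makes when creating a cluster. The function name and its fields are hypothetical, chosen only for this example; jobqueue_features' internals differ.

```python
# Hypothetical sketch: derive an MPI process/thread layout from a
# worker resource request (node count, cores per node, MPI processes
# per node). Names are illustrative, not the library's API.

def mpi_layout(nodes, cores_per_node, processes_per_node):
    """Split each node's cores evenly over the requested MPI processes."""
    threads_per_process = cores_per_node // processes_per_node
    return {
        "total_processes": nodes * processes_per_node,
        "processes_per_node": processes_per_node,
        "threads_per_process": threads_per_process,
    }

# Example: 2 nodes with 24 cores each, 4 MPI processes per node
# gives 8 processes in total, each with 6 cores for its threads.
layout = mpi_layout(nodes=2, cores_per_node=24, processes_per_node=4)
```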

The main goal here is to make a best-effort mapping between the user's request and the MPI launcher options that will distribute and pin the processes/threads across the target system.
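Such a best-effort mapping can be sketched as follows. The flags used are the launchers' standard options (OpenMPI's mpirun, Intel MPI's mpirun/mpiexec, MPICH's hydra mpiexec and SLURM's srun); the exact policy that jobqueue_features applies may differ, and the function itself is illustrative rather than the library's API.

```python
# Illustrative sketch (not the library's actual code): map a resolved
# layout to pinning flags for each supported launcher.

def pinning_args(launcher, total_procs, procs_per_node, threads_per_proc):
    if launcher == "openmpi":
        # map-by "ppr" (processes per resource), pe= cores per process
        return ["-np", str(total_procs),
                "--map-by", f"ppr:{procs_per_node}:node:pe={threads_per_proc}",
                "--bind-to", "core"]
    if launcher == "intelmpi":
        # Intel MPI takes its pinning policy from environment variables
        return ["-np", str(total_procs), "-ppn", str(procs_per_node),
                "-genv", "I_MPI_PIN_DOMAIN", "omp",
                "-genv", "OMP_NUM_THREADS", str(threads_per_proc)]
    if launcher == "mpich":
        # hydra: bind each rank to a block of threads_per_proc cores
        return ["-n", str(total_procs), "-ppn", str(procs_per_node),
                "-bind-to", f"core:{threads_per_proc}"]
    if launcher == "srun":
        return ["--ntasks", str(total_procs),
                "--ntasks-per-node", str(procs_per_node),
                "--cpus-per-task", str(threads_per_proc),
                "--cpu-bind=cores"]
    raise ValueError(f"unknown launcher: {launcher}")

# Example: 8 processes, 4 per node, 6 cores each, launched with OpenMPI
openmpi_flags = pinning_args("openmpi", 8, 4, 6)
```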

Building and Testing

The library is a Python module and can be installed with

python setup.py install

More details on how to install a Python package can be found in, for example, Install Python packages on the research computing systems at IU.

To run the tests for the MPI launchers within the library, you need the pytest Python package. You can run all the relevant tests from within the jobqueue_features directory with

pytest tests/test_mpi_wrapper.py

Source Code

The latest version of the library is available on the jobqueue_features GitHub repository

The code that was originally created specifically for this module can be seen in the Merge Request that added support for OpenMPI and Intel MPI, and the Merge Request that added support for MPICH.