HTC Multi-node Tasks

This module is the third in a sequence of modules that together form the overall capabilities of the HTC library (the previous module is HTC Library Configuration in YAML). It enables tasks to be run across a set of nodes, specifically MPI/OpenMP tasks.

Purpose of Module

The initial goal is to allow the HTC library to control tasks that are executed via an MPI launcher command. The launcher is forked as a child process from within the library, and the task that Dask tracks is the process created by that launcher.
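The idea of forking a launcher from within a Python task can be illustrated with a minimal sketch. This is not the library's implementation: the function names build_command and run_task are hypothetical, and the sketch only assumes that the launcher is an ordinary command-line prefix placed in front of the task command.

```python
import subprocess
import sys
from typing import Dict, List, Sequence, Union


def build_command(launcher: Sequence[str], task: Sequence[str]) -> List[str]:
    """Prefix the task command with the launcher command (hypothetical helper)."""
    return list(launcher) + list(task)


def run_task(launcher: Sequence[str], task: Sequence[str]) -> Dict[str, Union[int, str]]:
    """Fork the launcher as a child process and wait for it to finish.

    The calling Python function is what a scheduler such as Dask would track;
    the real work happens in the child process started here.
    """
    completed = subprocess.run(
        build_command(launcher, task),
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode the output as str
    )
    return {
        "returncode": completed.returncode,
        "out": completed.stdout,
        "err": completed.stderr,
    }


if __name__ == "__main__":
    # An empty launcher runs the task directly, which is convenient off-cluster.
    # On a cluster the launcher could be, for example, ["srun"].
    result = run_task([], [sys.executable, "-c", "print('hello from the task')"])
    print(result["returncode"], result["out"].strip())
```

Returning the exit code and the captured output lets the caller decide how to report a failed launch, rather than raising from inside the forked call.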

The implementation is intended to be generic, but the specific example implementation provided is for the srun launcher used on the JURECA system.
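One way to keep such a design generic is to describe each launcher as a command template and select it by name. The sketch below is an assumption made for illustration only: the LAUNCHER_TEMPLATES table and the launcher_prefix function are hypothetical and are not taken from the library. The flags shown (--ntasks and --cpus-per-task for srun, -np for mpirun) are standard options of those launchers.

```python
from typing import Dict, List

# Hypothetical table of launcher templates (not the library's configuration).
LAUNCHER_TEMPLATES: Dict[str, List[str]] = {
    # Slurm launcher, as used on systems such as JURECA.
    "srun": ["srun", "--ntasks", "{ntasks}", "--cpus-per-task", "{cpus_per_task}"],
    # A generic MPI launcher; it has no per-task CPU option in this sketch.
    "mpirun": ["mpirun", "-np", "{ntasks}"],
}


def launcher_prefix(name: str, ntasks: int, cpus_per_task: int = 1) -> List[str]:
    """Fill in the template for the named launcher and return it as an argv prefix."""
    template = LAUNCHER_TEMPLATES[name]
    return [
        part.format(ntasks=ntasks, cpus_per_task=cpus_per_task)
        for part in template
    ]


if __name__ == "__main__":
    # 4 MPI ranks with 2 CPUs each (e.g. for 2 OpenMP threads per rank).
    print(launcher_prefix("srun", ntasks=4, cpus_per_task=2))
    print(launcher_prefix("mpirun", ntasks=4))
```

Supporting another launcher then only requires a new entry in the table, while the code that forks the process stays unchanged.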

Background Information

This module builds upon the work described in HTC Library Configuration in YAML.

Building and Testing

The library is a Python module and can be installed with

python setup.py install

More details about how to install a Python package can be found in, for example, Install Python packages on the research computing systems at IU.

To run the tests for the decorators within the library, you need the pytest Python package. You can run all the relevant tests from the jobqueue_features directory with

pytest tests/test_mpi_wrapper.py

Specific examples of usage for the JURECA system are available in the examples subdirectory.

Source Code

The latest version of the library is available on the jobqueue_features GitHub repository.

The code originally created for this module can be seen in the HTC/MPI Merge Request, which is located in the original private repository of the code. Additional, more complex examples were provided in the HTC/MPI examples Merge Request.