Contact Map Parallelization

This module adds the ability to parallelize the calculation of contact frequencies (see the contact map module). It includes improvements to the core of the contact_map package to facilitate parallelization, as well as integration with a framework for practical parallelization.

Purpose of Module

A contact exists when two atoms, or atoms within two groups of atoms (such as residues), are within some cutoff distance of each other. The contact map is the set of all contacts in a given snapshot. The contact frequency is the fraction of the frames in a trajectory in which each contact is present. Calculating the contact frequency therefore requires calculating the contact map for each individual frame in the trajectory.
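To make these definitions concrete, here is a minimal sketch in plain Python. It is not the contact_map implementation: the function names, the tuple-based frame layout, and the toy coordinates are all assumptions made for illustration, and the sketch handles only atom-atom contacts.

```python
import math
from collections import Counter
from itertools import combinations


def contact_map(frame, cutoff):
    """Return the set of atom index pairs (i, j), i < j, closer than cutoff.

    `frame` is a hypothetical layout: a list of (x, y, z) tuples, one per atom.
    """
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(frame), 2)
        if math.dist(a, b) < cutoff
    }


def contact_frequency(trajectory, cutoff):
    """Return, for each contact, the fraction of frames in which it appears."""
    counts = Counter()
    for frame in trajectory:
        # One contact map per frame; each frame is independent of the others.
        counts.update(contact_map(frame, cutoff))
    n_frames = len(trajectory)
    return {pair: n / n_frames for pair, n in counts.items()}


# Toy trajectory (illustrative values only): 2 frames, 3 atoms each.
trajectory = [
    [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.2, 0.0, 0.0)],
]
freq = contact_frequency(trajectory, cutoff=0.45)
print(freq)
```

In this toy data, atoms 0 and 1 are in contact only in the first frame and atoms 1 and 2 only in the second, so each contact has a frequency of 0.5.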

The original contact_map code included OpenMP (shared-memory) parallelization of the calculation of a single contact map (a loop over atoms). However, the contact maps that make up a contact frequency (a loop over the frames of a trajectory) were calculated sequentially. Because each frame is completely independent of the others, frames can be processed on separate nodes. This module implements that parallelization.

This module interfaces with the dask.distributed package for task-based parallelization. The trajectory is separated into segments, and the dask network calculates the contact frequency of each segment in parallel, with every worker reading from a common file source. The partial contact frequencies are then combined into one ContactFrequency object. The module also includes methods, such as serialization into JSON strings, that would be useful for parallelization by other tools.
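The segment-and-combine pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the standard-library ThreadPoolExecutor stands in for a dask.distributed network, the trajectory is held in memory rather than read from a shared file, and the function names and the JSON layout are hypothetical rather than the format used by contact_map.

```python
import json
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def segment(trajectory, n_segments):
    """Split a non-empty trajectory into at most n_segments contiguous chunks."""
    size = math.ceil(len(trajectory) / n_segments)
    return [trajectory[k:k + size] for k in range(0, len(trajectory), size)]


def count_contacts(frames, cutoff):
    """Count contacts in one segment and return the partial result as JSON."""
    counts = Counter()
    for frame in frames:
        for (i, a), (j, b) in combinations(enumerate(frame), 2):
            if math.dist(a, b) < cutoff:
                counts[(i, j)] += 1
    # Hypothetical JSON layout: the frame count plus [i, j, count] triples.
    return json.dumps({
        "n_frames": len(frames),
        "counts": [[i, j, n] for (i, j), n in counts.items()],
    })


def combine(partials):
    """Merge serialized partial results into one contact frequency."""
    total = Counter()
    n_frames = 0
    for text in partials:
        data = json.loads(text)
        n_frames += data["n_frames"]
        for i, j, n in data["counts"]:
            total[(i, j)] += n
    return {pair: n / n_frames for pair, n in total.items()}


# Toy trajectory (illustrative values only): 4 frames, 3 atoms each.
trajectory = [
    [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.5, 0.0, 0.0)],
]

# The thread pool is a stand-in for the dask scheduler/worker network.
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(
        pool.map(lambda seg: count_contacts(seg, 0.45), segment(trajectory, 2))
    )

frequency = combine(partials)
print(frequency)
```

The sketch passes raw counts and frame counts between workers instead of per-segment frequencies, so segments of unequal length are weighted correctly when they are merged. Returning each partial result as a JSON string means it can cross process or node boundaries without relying on any particular parallelization tool.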

Background Information

This is part of the contact map package, which in turn builds on tools in MDTraj.

The parallelization is based on dask.distributed. See its docs for details on setting up a dask scheduler/worker network.

Building and Testing

The contact_map package can be installed with conda, using conda install -c conda-forge contact_map. This module is included in version 0.3.0, which can be specifically installed with conda install -c conda-forge contact_map==0.3.0.

dask.distributed must be installed separately, which can be done with conda install -c conda-forge dask distributed.

Tests for this module can be run with pytest. Install pytest with pip install pytest and then run the command py.test from within the directory with the source code, or py.test --pyargs contact_map from anywhere after installation. Tests specific to integration with dask.distributed will be marked as “skipped” if that framework is not installed.