Contact Map Parallelization
This module adds the ability to parallelize the calculation of contact
frequencies (see the contact map module). It includes improvements to the
core of the contact_map package to facilitate parallelization, as well
as integration with a framework for practical parallelization.
Purpose of Module
A contact is defined as two atoms, or atoms belonging to two groups of atoms (residues), being within some cutoff distance of each other. The contact map is the set of all contacts in a given snapshot. The contact frequency is the fraction of frames in a trajectory in which each contact is present. Calculating the contact frequency therefore requires calculating the contact map for every frame in the trajectory.
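The definitions above can be illustrated with a minimal pure-Python sketch. It is not the contact_map implementation: the function names, the toy coordinates, and the atom-only treatment are assumptions made for illustration, and refinements such as grouping atoms into residues are left out.

```python
from collections import Counter
from itertools import combinations
from math import dist  # Python 3.8+


def contact_map(frame, cutoff):
    """Return the set of atom index pairs (i, j), i < j, closer than cutoff."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(frame), 2)
        if dist(a, b) < cutoff
    }


def contact_frequency(trajectory, cutoff):
    """Return the fraction of frames in which each contact is present."""
    counts = Counter()
    for frame in trajectory:
        # One contact map per frame; each contact found adds 1 to its count
        counts.update(contact_map(frame, cutoff))
    n_frames = len(trajectory)
    return {pair: n / n_frames for pair, n in counts.items()}


# Hypothetical toy trajectory: 2 frames, 3 atoms, coordinates in nm
trajectory = [
    [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.5, 0.0, 0.0)],
]
freq = contact_frequency(trajectory, cutoff=0.45)
print(freq)
```

In this toy data, atoms 0 and 1 are in contact in both frames, while atoms 1 and 2 are in contact only in the second frame. The outer loop over frames is the part that this module parallelizes.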
The original contact_map code included OpenMP (shared-memory)
parallelization of the calculation of a single contact map (a loop over
atoms). The contact maps that make up a contact frequency (the loop over the
frames of a trajectory) were calculated sequentially. However, each frame is
completely independent of the others and can be processed on a separate node.
This module implements that parallelization.
This module interfaces with the dask.distributed package for task-based
parallelization. The trajectory is separated into segments, and the dask
network calculates the contact frequency of each segment in parallel
(reading from a common file source). The partial contact frequencies are
then combined into a single ContactFrequency object. The module also
includes methods, such as serialization into JSON strings, that would be
useful for parallelization with other tools.
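The split, compute, and combine pattern described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the module's code: a thread pool stands in for the dask network, each frame is represented by a precomputed set of contacts, and the helper names and JSON layout are hypothetical.

```python
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(frames, n_segments):
    """Split frames into n_segments contiguous, nearly equal segments."""
    size, extra = divmod(len(frames), n_segments)
    segments, start = [], 0
    for k in range(n_segments):
        stop = start + size + (1 if k < extra else 0)
        segments.append(frames[start:stop])
        start = stop
    return segments


def partial_counts(segment):
    """Count, within one segment, how many frames contain each contact."""
    counts = Counter()
    for contacts in segment:
        counts.update(contacts)
    return {"n_frames": len(segment), "counts": counts}


def to_json(partial):
    """Serialize a partial result; tuple keys become "i,j" strings."""
    counts = {f"{i},{j}": n for (i, j), n in partial["counts"].items()}
    return json.dumps({"n_frames": partial["n_frames"], "counts": counts})


def from_json(text):
    """Rebuild a partial result from its JSON string."""
    data = json.loads(text)
    counts = Counter(
        {tuple(int(x) for x in key.split(",")): n
         for key, n in data["counts"].items()}
    )
    return {"n_frames": data["n_frames"], "counts": counts}


def combine(partials):
    """Merge partial counts and normalize by the total number of frames."""
    total, n_frames = Counter(), 0
    for partial in partials:
        total.update(partial["counts"])  # adds counts key by key
        n_frames += partial["n_frames"]
    return {pair: n / n_frames for pair, n in total.items()}


# Hypothetical data: four frames, each given as its set of contacts
frames = [{(0, 1)}, {(0, 1), (1, 2)}, {(0, 1)}, {(1, 2)}]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Each worker returns its partial result as a JSON string
    messages = list(
        pool.map(lambda seg: to_json(partial_counts(seg)), split(frames, 2))
    )

frequency = combine(from_json(message) for message in messages)
print(frequency)
```

Combining raw counts together with frame counts, rather than averaging per-segment frequencies, keeps the result correct when segments differ in length. Passing the partial results as JSON strings shows why serialization helps when workers run in separate processes or on separate nodes.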
Background Information
This is part of the contact map package, which in turn builds on tools in MDTraj.
The parallelization is based on dask.distributed. See its docs for
details on setting up a dask scheduler/worker network.
Building and Testing
The contact_map package can be installed with conda, using conda
install -c conda-forge contact_map. This module is included in version
0.3.0, which can be installed specifically with conda install -c
conda-forge contact_map==0.3.0.
dask.distributed must be installed separately, which can be done with
conda install -c conda-forge dask distributed.
Tests for this module can be run with pytest. Install pytest with pip
install pytest and then run the command py.test from within the
directory with the source code, or py.test --pyargs contact_map from
anywhere after installation. Tests specific to integration with
dask.distributed will be marked as “skipped” if that framework is
not installed.
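One common way to get this skipping behavior is to check at import time whether the optional dependency is available. The sketch below is an assumption about how such a check could look, not the package's actual test code; can_import and HAS_DASK are hypothetical names.

```python
import importlib


def can_import(name):
    """Return True if the module `name` can actually be imported."""
    try:
        importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError
        return False
    return True


# True only when dask.distributed is importable in this environment
HAS_DASK = can_import("dask.distributed")
print("dask.distributed available:", HAS_DASK)
```

With pytest, a flag like HAS_DASK can be passed to pytest.mark.skipif so that the affected tests are reported as skipped rather than failed.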
Source Code
This module is composed of the following pull requests in the
contact_map repository: