Contact Map Parallelization
This module adds the ability to parallelize the calculation of contact
frequencies (see the
contact map module). It includes improvements to the
core of the contact_map package to facilitate parallelization, as well
as integration with a framework for practical parallelization.
A contact exists when two atoms, or atoms within two groups of atoms (residues), are within some cutoff distance of each other. The contact map is the set of all contacts in a given snapshot. The contact frequency is, for each contact, the fraction of the frames of a trajectory in which that contact is present. Calculating the contact frequency therefore requires calculating the contact map for each individual frame in the trajectory.
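The definitions above can be illustrated with a minimal pure-Python sketch. It is not the contact_map implementation: the function names, the toy coordinates, and the cutoff value of 0.45 are assumptions chosen for illustration, and the sketch works on atom pairs only, without residue grouping.

```python
from itertools import combinations
from math import dist


def contact_map(frame, cutoff):
    """Return the set of atom index pairs (i, j), i < j, closer than cutoff."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(frame), 2)
        if dist(a, b) < cutoff
    }


def contact_frequency(trajectory, cutoff):
    """Return, for each contact, the fraction of frames in which it is present."""
    counts = {}
    for frame in trajectory:  # one contact map per frame
        for pair in contact_map(frame, cutoff):
            counts[pair] = counts.get(pair, 0) + 1
    n_frames = len(trajectory)
    return {pair: n / n_frames for pair, n in counts.items()}


# Hypothetical toy trajectory: 4 frames, 3 atoms, (x, y, z) positions.
trajectory = [
    [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0), (0.6, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
freq = contact_frequency(trajectory, cutoff=0.45)
print(freq)
```

In this toy data, atoms 0 and 1 are in contact in three of the four frames and atoms 1 and 2 in two of them, so their frequencies are 0.75 and 0.5. Pairs that are never in contact do not appear in the result.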
The contact_map code already included OpenMP (shared-memory)
parallelization of the calculation of a single contact map (a loop over
atoms). However, the contact maps that make up a contact frequency (the loop
over the frames of a trajectory) were calculated sequentially. Because each
frame is completely independent, frames can be processed on separate nodes.
This module implements that parallelization over frames. It interfaces with the
dask.distributed package for task-based
parallelization. The trajectory is separated into segments, with the
dask network calculating the contact frequency of each segment in
parallel (reading from a common file source). Then the partial contact
frequencies are combined into one
ContactFrequency object. This module also
includes methods, such as serialization to JSON strings, that would be
useful for parallelization by other tools.
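The segment-and-combine scheme described above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the module's code: a thread pool stands in for the dask network, the toy trajectory is held in memory instead of being read from a common file, and the helper names (split, partial_counts, combine) and the JSON layout are hypothetical. Each worker returns its partial result as a JSON string, mirroring the idea that serializable partial results can be passed between tools.

```python
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import dist


def split(trajectory, n_segments):
    """Split a list of frames into at most n_segments contiguous segments."""
    size = -(-len(trajectory) // n_segments)  # ceiling division
    return [trajectory[k:k + size] for k in range(0, len(trajectory), size)]


def partial_counts(segment, cutoff=0.45):
    """Count contacts over one segment and serialize the result to JSON."""
    counts = Counter()
    for frame in segment:
        for (i, a), (j, b) in combinations(enumerate(frame), 2):
            if dist(a, b) < cutoff:
                counts[f"{i},{j}"] += 1  # JSON object keys must be strings
    return json.dumps({"n_frames": len(segment), "counts": dict(counts)})


def combine(serialized_parts):
    """Merge partial results into one contact frequency.

    Raw counts and frame totals are summed before dividing, so segments
    of unequal length are weighted correctly.
    """
    total_frames = 0
    total = Counter()
    for text in serialized_parts:
        part = json.loads(text)
        total_frames += part["n_frames"]
        total.update(part["counts"])
    return {
        tuple(int(x) for x in key.split(",")): n / total_frames
        for key, n in total.items()
    }


# Hypothetical toy trajectory: 4 frames, 3 atoms, (x, y, z) positions.
trajectory = [
    [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0), (0.6, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
segments = split(trajectory, n_segments=2)
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(partial_counts, segments))
frequency = combine(parts)
print(frequency)
```

Combining counts instead of averaging per-segment frequencies is the design choice that keeps the result identical to a sequential pass over the whole trajectory. With dask.distributed, the role of the thread pool would be played by a client connected to the scheduler/worker network.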
The parallelization is based on
dask.distributed. See its docs for
details on setting up a dask scheduler/worker network.
The contact_map package can be installed with conda, using
conda install -c conda-forge contact_map. This module is included in version
0.3.0, which can be specifically installed with
conda install -c
dask.distributed must be installed separately, which can be done with
conda install -c conda-forge dask distributed.
Tests for this module can be run with pytest. Install pytest with
pip install pytest and then run the command
py.test from within the
directory with the source code, or
py.test --pyargs contact_map from
anywhere after installation. Tests specific to integration with
dask.distributed will be marked as “skipped” if that framework is