For analysis of MD simulations, MDTraj is a fast and commonly used analysis library. However, MDTraj has limitations, such as the requirement that the whole trajectory and the result of the computation fit into memory. This module rewrites part of MDTraj to work with Dask in order to achieve out-of-memory computations. Combined with dask-distributed, it also enables out-of-machine parallelization, which is essential for HPC systems, and gives a (surprising) speed-up even on a single machine.
Using MDTraj is a fast and easy way to analyze MD trajectories. However, MDTraj has a few limitations:
- The whole trajectory needs to fit into memory, or gathering results becomes inconvenient
- The result of the computation also needs to fit into memory
- All processes need access to all the memory, which prevents out-of-machine parallelization and HPC scaling
Dask-traj addresses all three limitations by rewriting the MDTraj functions to work with dask.arrays. This is done for both the trajectory and the computation functions. As dask.arrays know how to spill to disk, neither the trajectory nor the result has to fit into memory.
Together with dask-distributed, it also allows the computation to be executed in a distributed way, so it can scale beyond a single machine. In preliminary tests this approach even leads to a speed-up on a single machine, which is surprising as MDTraj is already a parallel code.
Dask-traj splits everything along the time axis of the MD trajectory. As a lot of analysis is embarrassingly parallel across frames, this leads to nice non-communicating compute graphs as shown here.
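The idea of chunking along the time axis can be sketched with plain dask.array and synthetic coordinates. This is only an illustration under stated assumptions: it does not use the dask-traj API, and the array sizes, chunk size and atom pairs are made up for the example.

```python
import numpy as np
import dask.array as da

# Synthetic stand-in for a trajectory (assumption): 1000 frames, 50 atoms, xyz.
rng = np.random.default_rng(0)
xyz_np = rng.random((1000, 50, 3))

# Chunk only along the time axis (axis 0); each chunk holds 100 complete frames.
xyz = da.from_array(xyz_np, chunks=(100, 50, 3))

# Hypothetical atom index pairs whose distance is wanted in every frame.
pairs = np.array([[0, 1], [2, 3], [10, 40]])

# Per-frame pair distances. No frame depends on any other frame, so each
# time chunk can be processed without talking to the other chunks.
diff = xyz[:, pairs[:, 0], :] - xyz[:, pairs[:, 1], :]
distances = da.sqrt((diff ** 2).sum(axis=-1))

# Nothing has been computed so far; compute() runs the task graph.
result = distances.compute()

# Reference computed in memory with NumPy, for comparison.
expected = np.linalg.norm(xyz_np[:, pairs[:, 0]] - xyz_np[:, pairs[:, 1]], axis=-1)
```

The result keeps one block per time chunk, which is what makes this kind of analysis easy to distribute.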
One very important point is that dask-traj seeks in the trajectory file. So if your files are stored in a format that does not have an efficient seek method, the loading of trajectories will not get a speed-up, and might even be slower than with MDTraj.
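Why seeking matters can be illustrated with a toy binary format in which every frame occupies a fixed number of bytes. This is a hypothetical format invented for the example, not one of the formats MDTraj or dask-traj read, and `read_chunk` is an illustrative helper rather than part of either library.

```python
import os
import tempfile
import numpy as np

n_frames, n_atoms = 20, 4
frame_bytes = n_atoms * 3 * 4  # float32 xyz coordinates, fixed-size record per frame

# Write a toy trajectory in which every coordinate of frame i has the value i.
coords = (np.arange(n_frames, dtype=np.float32)[:, None, None]
          * np.ones((1, n_atoms, 3), dtype=np.float32))
path = os.path.join(tempfile.mkdtemp(), "toy_traj.bin")
coords.tofile(path)


def read_chunk(path, start, stop):
    """Read frames [start, stop) by jumping straight to their byte offset."""
    with open(path, "rb") as fh:
        fh.seek(start * frame_bytes)  # cheap: no earlier frame is read
        raw = fh.read((stop - start) * frame_bytes)
    return np.frombuffer(raw, dtype=np.float32).reshape(-1, n_atoms, 3)


# Each time chunk can be loaded independently of all the others.
chunk = read_chunk(path, 10, 15)
```

In a format without an efficient seek, reaching frame 10 would mean reading through frames 0 to 9 first, and every chunk would repeat that work.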
Also, due to the way the code is written in MDTraj, only a subset of functions is available at the moment, but this will be expanded further in the future. If you have a use-case that requires the conversion of MDTraj functionality not yet present in dask-traj, please open an issue and I will focus on that.
This code can be installed with conda using
conda install -c dask_traj. To
install the specific version associated with this module, use
conda install -c
This code can also be installed with pip by running
pip install dask-traj
Finally, this code can also be installed by downloading the source code (see the
Code section below), and running
python setup.py install from the root
Tests for this module can be run with pytest. Install pytest with
install pytest and then run the command
py.test from within the
directory with the source code, or
py.test --pyargs dask_traj from
anywhere after installation.
The examples require some extra dependencies to be installed, namely:
- jupyter
- distributed
- python-graphviz
All of these are installable through conda and pip.
- An example on how to do analysis using Dask-traj can be found in dask-traj_example.ipynb
- An example on how to combine dask-traj with dask.distributed can be found in dask-traj_distributed example.ipynb
These examples can also be found in the examples directory in the source code. They can be run by starting jupyter notebook from that directory (see the Jupyter notebook documentation at http://jupyter.org/ for more details).