Dask-traj¶

Purpose of Module
Current Limitations
Building and Testing
Examples
Source Code

MDTraj is a fast and commonly used library for analyzing MD simulations. However, MDTraj has limitations, such as the requirement that the whole trajectory and the result of the computation fit into memory. This module rewrites part of MDTraj to work with Dask in order to enable out-of-core (larger-than-memory) computations. Combined with dask-distributed, it also makes out-of-machine parallelization possible, which is essential for HPC systems, and it gives a (surprising) speed-up even on a single machine.

Purpose of Module ¶

Using MDTraj is a fast and easy way to analyze MD trajectories. However, MDTraj has three limitations:

The whole trajectory needs to fit into memory, or gathering results becomes inconvenient
The result of the computation also needs to fit into memory
All processes need access to all the memory, which prevents out-of-machine parallelization and HPC scaling

Dask-traj addresses all three limitations by rewriting the MDTraj functions to work with dask arrays (dask.array). This is done for both the trajectory and the computation functions. Because dask arrays can spill to disk, neither the trajectory nor the result of the computation has to fit into memory.

Together with dask-distributed, it also allows the computation to be executed in a distributed way, which allows scaling beyond a single machine. In preliminary tests this approach even led to a speed-up on a single machine, which is surprising because MDTraj is already parallelized.

Dask-traj splits everything along the time axis of the MD trajectory. Because a lot of analysis is embarrassingly parallel over frames, this leads to simple, non-communicating compute graphs, as shown here.

Figure: compute graph of a trajectory with 1251 frames, split into chunks of 100 frames
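The splitting along the time axis can be illustrated with a minimal, standard-library-only sketch. It does not use the dask-traj or Dask API: the helper names `chunk_bounds` and `distance_per_frame`, the toy two-atom trajectory, and the thread pool standing in for a Dask scheduler are all assumptions made for illustration. It only shows that a per-frame analysis on 1251 frames in chunks of 100 decomposes into 13 independent tasks whose results are concatenated afterwards.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def chunk_bounds(n_frames, chunk_size):
    """Return (start, stop) frame index pairs covering the time axis."""
    return [(start, min(start + chunk_size, n_frames))
            for start in range(0, n_frames, chunk_size)]


def distance_per_frame(frames):
    """Distance between atom 0 and atom 1 for every frame of one chunk."""
    return [math.dist(frame[0], frame[1]) for frame in frames]


# Hypothetical toy trajectory: 1251 frames, 2 atoms, xyz coordinates.
# Atom 1 moves away from atom 0 along x by one unit per frame.
trajectory = [[(0.0, 0.0, 0.0), (float(i), 0.0, 0.0)] for i in range(1251)]

bounds = chunk_bounds(len(trajectory), 100)

# Each chunk is an independent task: no chunk needs data from another one.
# The thread pool is only a stand-in for a real task scheduler.
with ThreadPoolExecutor() as pool:
    per_chunk = list(pool.map(
        lambda b: distance_per_frame(trajectory[b[0]:b[1]]), bounds))

# Concatenate the per-chunk results back along the time axis.
distances = [d for chunk in per_chunk for d in chunk]

print(len(bounds), bounds[-1], len(distances))
```

Note that the last chunk is shorter than the others (51 frames), which is why chunked code should not assume a fixed chunk length.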

Current Limitations ¶

One very important point is that dask-traj seeks in the trajectory file. If your files are stored in a format that does not have an efficient seek method, loading trajectories will not get a speed-up and might even be slower than with MDTraj.
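Why seeking matters can be sketched with a made-up binary layout in which every frame occupies the same number of bytes. This is a hypothetical format invented for illustration, not a real trajectory format and not the way dask-traj reads files; `FRAME_FORMAT`, `write_frames` and `read_chunk` are assumed names. With fixed-size frames, the byte offset of any frame can be computed directly, so a chunk can be read without touching the frames before it. A format without that property would have to be scanned from the start for every chunk.

```python
import io
import struct

N_ATOMS = 2
# Hypothetical layout: little-endian doubles, x, y, z for each atom.
FRAME_FORMAT = f"<{3 * N_ATOMS}d"
FRAME_SIZE = struct.calcsize(FRAME_FORMAT)  # constant number of bytes per frame


def write_frames(handle, n_frames):
    """Write a toy trajectory in which atom 1 sits at x = frame index."""
    for i in range(n_frames):
        coords = [0.0, 0.0, 0.0, float(i), 0.0, 0.0]
        handle.write(struct.pack(FRAME_FORMAT, *coords))


def read_chunk(handle, start, stop):
    """Read frames [start, stop) by jumping straight to the first one."""
    handle.seek(start * FRAME_SIZE)
    raw = handle.read((stop - start) * FRAME_SIZE)
    return list(struct.iter_unpack(FRAME_FORMAT, raw))


# An in-memory buffer stands in for a trajectory file on disk.
handle = io.BytesIO()
write_frames(handle, 1251)

# Load only the last chunk; the first 1200 frames are never read.
chunk = read_chunk(handle, 1200, 1251)
print(len(chunk), chunk[0][3])
```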

Also, due to the way the code is written in MDTraj, only a subset of its functions is available at the moment; this will be expanded in the future. If you have a use case that requires an MDTraj function not yet present in dask-traj, please open an issue and I will prioritize it.

Building and Testing ¶

This code can be installed with conda using conda install -c conda-forge dask_traj. To install the specific version associated with this module, use conda install -c conda-forge dask_traj==0.2.2

This code can also be installed with pip by running pip install dask-traj

Finally, this code can also be installed by downloading the source code (see the Source Code section below), and running python setup.py install from the root directory.

Tests for this module can be run with pytest. Install pytest with pip install pytest, then run pytest from within the directory with the source code, or pytest --pyargs dask_traj from anywhere after installation.

Examples ¶

The examples require some extra dependencies to be installed, namely:

* jupyter
* distributed
* python-graphviz

All of these are installable through conda and pip.

An example of how to do analysis using Dask-traj can be found in dask-traj_example.ipynb
An example of how to combine dask-traj with dask.distributed can be found in dask-traj_distributed example.ipynb

These examples can also be found in the examples directory in the source code. They can be run by starting jupyter notebook from that directory (see the Jupyter notebook documentation at http://jupyter.org/ for more details).

Source Code ¶

The source code for this module, and modules that build on it, is hosted at https://github.com/sroet/dask-traj. This module specifically includes everything up to and including release 0.2.2
