Resampling Statistics¶
The module provides tools for resampling (e.g., statistical bootstrapping)
in the context of pandas
DataFrames in general and specifically
OpenPathSampling. This provides tools for estimating statistical error on
multiple quantities simultaneously.
Purpose of Module¶
Providing an estimate of uncertainty is essential when presenting scientific
results. This module provides the ability to perform statistical analysis of
a large simulation from OpenPathSampling by using the sampled trajectories
to create subsamples, which are then assumed to be independent. The
subsamples are analyzed separately, and this module makes it easy to obtain
mean, standard deviation, or percentile values. In particular, this module
provides the tools to do such an analysis on functions that return a table
of data using a pandas.DataFrame
object, as the OpenPathSampling rate
matrix calculation does.
Most of the code is generic, and could be used for any function that
produces a pandas.DataFrame
as its output. Therefore this module may
be useful for many projects other than OpenPathSampling. Within
OpenPathSampling, this can be used to obtain statistics on rates, fluxes,
and other such quantities.
These tools are implemented in two main classes. The first is
BlockResampling
, which organizes the input (MC steps in OPS) into blocks
to be passed to a function that does the analysis. This allows us to obtain
several results for the analysis. The second is ResamplingStatistics
,
which takes those blocks and a function (that returns a
pandas.DataFrame
) as input. It then applies that function to each of
those blocks, and then makes it easy to access properties such as the mean,
standard deviation, or percentile values for each frame element.
While BlockResampling
is the only resampling method implemented in the
module (as it is the one needed for TIS rate calculations), it would be
straightforward to extend this framework with other resampling methods, such
as variants of bootstrapping.
Background Information¶
This module builds on OpenPathSampling, a Python package for path sampling simulations. To learn more about OpenPathSampling, you might be interested in reading:
- OPS documentation: http://openpathsampling.org
- OPS source code: http://github.com/openpathsampling/openpathsampling
Testing¶
Tests in OpenPathSampling use the nose package.
This module has been included in the OpenPathSampling core. Its tests can
be run by setting up a developer install of OpenPathSampling and running
the command nosetests
from the root directory of the repository.
Source Code¶
This module has been merged into OpenPathSampling. It is composed of the following pull requests: