n2p2 - CG descriptor analysis

This module adds tools to the n2p2 package for assessing the quality of atomic environment descriptors. This is particularly useful when designing a coarse-grained model based on neural network potentials (NNP-CG).

Purpose of Module

Creating a coarse-grained (CG) model from the full description of a system is a two-step process: (1) selecting a reduced set of degrees of freedom and (2) defining interactions that depend on these coarse-grained variables. For example, in a common coarse-graining approach for molecular systems, the atomistic picture is replaced by a simpler description with CG particles located at the center-of-mass coordinates of the actual molecules. The corresponding interactions between CG sites can be modelled with empirical force fields but also, as recently shown in [1] and [2], with machine learning potentials. To simplify the construction of NNP-based coarse-grained models in n2p2, this module adds software to estimate the quality of atomic environment descriptors, which in turn hints at the expected performance of the coarse-grained description.

The overall goal of the descriptor analysis is to show qualitatively whether there is a correlation between the raw atomic environment descriptors (and their derivatives) and the atomic forces. If little or no correlation can be found, we can assume that the descriptors do not encode enough information to construct a (free) energy landscape. On the other hand, if “similar” descriptors correspond to “similar” forces, there is a good chance that a machine learning algorithm can detect this link and that a machine learning potential can be fitted.

To find a possible correlation between descriptors and forces, the following approach is used: First, a clustering algorithm (k-means or HDBSCAN) searches for groups in the high-dimensional descriptor space of all atoms. Then, for every detected cluster, the statistical distribution of the corresponding atomic forces is compared to the statistics of all remaining atomic forces. A hypothesis test (Welch’s t-test) is applied to decide whether the link between descriptors and forces is statistically significant. The percentage of clusters that show a clear link then serves as an indicator of a good descriptor-force correlation.
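The cluster-then-test procedure described above can be illustrated with a minimal Python sketch. This is not the code from analyze-descriptors.ipynb. The function name fraction_of_significant_clusters, the significance level alpha, the choice of one scalar force value per atom and the synthetic data are all assumptions made for illustration. The sketch uses k-means from scikit-learn and Welch's t-test from scipy (ttest_ind with equal_var=False).

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.cluster import KMeans


def fraction_of_significant_clusters(descriptors, forces, n_clusters=5,
                                     alpha=0.01, seed=0):
    """Cluster descriptors, then test each cluster's forces against the rest.

    descriptors: array of shape (n_atoms, n_descriptors)
    forces:      array of shape (n_atoms,), one scalar per atom (assumption)
    alpha:       significance level of the test (hypothetical default)
    """
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(descriptors)
    significant = 0
    tested = 0
    for label in np.unique(labels):
        inside = forces[labels == label]
        outside = forces[labels != label]
        if inside.size < 2 or outside.size < 2:
            continue  # too few samples for a meaningful test
        # equal_var=False selects Welch's t-test (unequal variances).
        _, p_value = ttest_ind(inside, outside, equal_var=False)
        tested += 1
        if p_value < alpha:
            significant += 1
    return significant / tested if tested else 0.0


# Synthetic example data (not produced by nnp-atomenv): three well-separated
# groups in a two-dimensional "descriptor" space.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
group = rng.integers(0, 3, size=600)
descriptors = centers[group] + rng.normal(scale=0.3, size=(600, 2))

# Case 1: the force depends on the group, so descriptors and forces correlate.
forces_correlated = np.array([0.0, 1.0, 5.0])[group] + rng.normal(scale=0.1, size=600)
# Case 2: the force is pure noise, unrelated to the descriptors.
forces_random = rng.normal(size=600)

frac_correlated = fraction_of_significant_clusters(descriptors, forces_correlated, n_clusters=3)
frac_random = fraction_of_significant_clusters(descriptors, forces_random, n_clusters=3)
print("correlated forces:", frac_correlated)
print("random forces:    ", frac_random)
```

A high fraction in the first case and a low one in the second is the qualitative signal the analysis looks for. With real data the number of clusters and the significance level would need to be chosen with care.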

To perform the analysis described above, n2p2 was extended by two software components:

  1. A new application based on the C++ libraries: nnp-atomenv

    This application generates files containing the atomic environment data required for the cluster analysis.

  2. A new Jupyter notebook with the actual analysis: analyze-descriptors.ipynb

    The notebook depends on common Python libraries (numpy, scipy, scikit-learn) and reads in the data provided by nnp-atomenv. It then clusters the data, performs statistical tests and presents graphical results.

Background Information

This module is based on n2p2, a C++ code for generation and application of neural network potentials used in molecular dynamics simulations. The source code and documentation are located here:

Building and Testing

The code changes from this module are already merged with the main n2p2 repository (see the section below for corresponding pull requests).

Note

By the time you read these instructions, n2p2 has most likely been developed further. To restore the state of the software at the time of writing, use these commands:

git clone https://github.com/CompPhysVienna/n2p2
cd n2p2
git checkout 3cfe391377d2792ac29baf8394b3dce712afdad2

To build the new tool nnp-atomenv, the usual n2p2 build instructions apply:

cd src
make nnp-atomenv -j

The analyze-descriptors.ipynb Jupyter notebook requires some Python packages to be installed:

  • numpy
  • scipy
  • matplotlib
  • seaborn
  • scikit-learn
  • hdbscan
  • pickle (part of the Python standard library, no separate installation needed)
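
Before opening the notebook, it can help to check that the third-party packages listed above are importable. The following is a small sketch using only the standard library. The helper name missing_packages is hypothetical, and sklearn is the import name of scikit-learn. pickle is left out because it ships with Python.

```python
import importlib.util

# Third-party packages from the list above, given by their import names.
REQUIRED = ["numpy", "scipy", "matplotlib", "seaborn", "sklearn", "hdbscan"]


def missing_packages(names):
    """Return the names for which no importable module can be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```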

Step-by-step instructions on how to prepare and perform the descriptor analysis are available at this dedicated documentation page.

Regression tests run automatically in n2p2 for each commit to the main repository. This module adds the corresponding tests for the nnp-atomenv tool in test/cpp/. The build log showing the successful test run is available here.