n2p2 - CG descriptor analysis
This module adds tools to the n2p2 package for assessing the quality of atomic environment descriptors. This is particularly useful when designing a coarse-grained model based on neural network potentials (NNP-CG).
Purpose of Module
Creating a coarse-grained (CG) model from the full description of a system is a two-step process: (1) selecting a reduced set of degrees of freedom and (2) defining interactions that depend on these coarse-grained variables. For example, in a common coarse-graining approach for molecular systems, the atomistic picture is replaced by a simpler description with CG particles located at the center-of-mass coordinates of the actual molecules. The corresponding interactions between CG sites can be modelled with empirical force fields but also, as has been recently shown in [1] and [2], with machine learning potentials. To simplify the construction of NNP-based coarse-grained models in n2p2, this module adds software to estimate the quality of atomic environment descriptors, which in turn hints at the expected performance of the coarse-grained description.
The overall goal of the descriptor analysis is to show qualitatively whether there is a correlation between the raw atomic environment descriptors (and their derivatives) and the atomic forces. If little or no correlation can be found, we can assume that the descriptors do not encode enough information to construct a (free) energy landscape. On the other hand, if “similar” descriptors correspond to “similar” forces, there is a good chance that a machine learning algorithm can detect this link and that a machine learning potential can be fitted.
To find a possible correlation between descriptors and forces, the following approach is used: First, a clustering algorithm (k-means or HDBSCAN) searches for groups in the high-dimensional descriptor space of all atoms. Then, for every detected cluster, the statistical distribution of the corresponding atomic forces is compared to the statistics of all remaining atomic forces. A hypothesis test (Welch’s t-test) is applied to decide whether the link between descriptors and forces is statistically significant. The percentage of clusters that show a clear link then serves as an indicator of a good descriptor-force correlation.
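The cluster-then-test idea can be illustrated with a minimal Python sketch. It is not the code from analyze-descriptors.ipynb: the function name `fraction_of_linked_clusters`, the synthetic data, the use of one scalar force value per atom, and the significance level `alpha` are all assumptions made for illustration. Only k-means is shown; HDBSCAN could be substituted for the clustering step.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.cluster import KMeans


def fraction_of_linked_clusters(descriptors, forces, n_clusters=4, alpha=0.01, seed=0):
    """Cluster descriptors, then Welch-test each cluster's forces against the rest.

    Hypothetical helper: returns the fraction of tested clusters whose force
    distribution differs significantly (p < alpha) from all remaining atoms.
    """
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(descriptors)

    tested = 0
    significant = 0
    for k in range(n_clusters):
        inside = forces[labels == k]
        outside = forces[labels != k]
        if len(inside) < 2 or len(outside) < 2:
            continue  # too few samples for a meaningful test
        tested += 1
        # equal_var=False selects Welch's t-test (unequal variances allowed)
        _, p_value = ttest_ind(inside, outside, equal_var=False)
        if p_value < alpha:
            significant += 1
    return significant / tested if tested else 0.0


# Synthetic example (assumed data): four well-separated descriptor groups.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
group = rng.integers(0, 4, size=400)
descriptors = centers[group] + 0.3 * rng.normal(size=(400, 2))

# Forces that depend on the descriptor group vs. forces that do not.
correlated_forces = 2.0 * group + 0.2 * rng.normal(size=400)
uncorrelated_forces = rng.normal(size=400)

linked = fraction_of_linked_clusters(descriptors, correlated_forces)
unlinked = fraction_of_linked_clusters(descriptors, uncorrelated_forces)
print(f"correlated forces:   {linked:.2f} of clusters show a significant link")
print(f"uncorrelated forces: {unlinked:.2f} of clusters show a significant link")
```

In this toy setup the fraction for the correlated forces should be high, while the fraction for the uncorrelated forces is expected to stay low, apart from occasional false positives at the chosen significance level.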
In order to perform the analysis described above, n2p2 was extended by two pieces of software:
- A new application based on the C++ libraries: nnp-atomenv. This application generates files containing the atomic environment data required for the cluster analysis.
- A new Jupyter notebook with the actual analysis: analyze-descriptors.ipynb. The notebook depends on common Python libraries (numpy, scipy, scikit-learn) and reads in data provided by nnp-atomenv. It then clusters the data, performs statistical tests and presents graphical results.
Background Information
This module is based on n2p2, a C++ code for generation and application of neural network potentials used in molecular dynamics simulations. The source code and documentation are located here:
- n2p2 documentation: http://compphysvienna.github.io/n2p2/
- n2p2 source code: http://github.com/CompPhysVienna/n2p2
Building and Testing
The code changes from this module are already merged with the main n2p2 repository (see the section below for corresponding pull requests).
Note
By the time you read these instructions, n2p2 has most likely been developed further. To restore the state of the software at the time of writing, use these commands:
git clone https://github.com/CompPhysVienna/n2p2
cd n2p2
git checkout 3cfe391377d2792ac29baf8394b3dce712afdad2
To build the new tool nnp-atomenv, the usual n2p2 build instructions apply:
cd src
make nnp-atomenv -j
The analyze-descriptors.ipynb Jupyter notebook requires some Python packages to be installed:
- numpy
- scipy
- matplotlib
- seaborn
- scikit-learn
- hdbscan
- pickle (part of the Python standard library; no separate installation needed)
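Before opening the notebook, a short check like the following sketch can confirm that the listed packages are importable in the current Python environment. The helper name `missing_modules` is hypothetical and not part of n2p2; note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util

# Import names of the packages listed above (scikit-learn imports as "sklearn").
REQUIRED = ["numpy", "scipy", "matplotlib", "seaborn", "sklearn", "hdbscan", "pickle"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("missing packages:", ", ".join(missing) if missing else "none")
```

Any name reported as missing would need to be installed with the package manager of your choice before running the analysis.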
Step-by-step instructions on how the descriptor analysis is prepared and performed are available at this dedicated documentation page.
Regression tests are run automatically in n2p2 for each commit to the main repository. This module also adds the corresponding tests for the nnp-atomenv tool in test/cpp/. The build log showing the correct run of tests is available here.
Source Code
The new functionality introduced by this module is collected in two pull requests:
- New tool for symmetry function quality analysis
- Complete coarse-graining/descriptor analysis documentation
The easiest way to view the source code changes is to use the Files changed tab on the pull request pages listed above.
[1] Zhang, L.; Han, J.; Wang, H.; Car, R.; E, W. DeePCG: Constructing Coarse-Grained Models via Deep Neural Networks. J. Chem. Phys. 2018, 149 (3), 034101.
[2] John, S. T.; Csányi, G. Many-Body Coarse-Grained Interactions Using Gaussian Approximation Potentials. J. Phys. Chem. B 2017, 121 (48), 10934–10949.