# n2p2 - CG descriptor analysis

This module adds tools to the *n2p2* package for assessing the quality of
atomic environment descriptors. This is particularly useful when designing a
neural network potential based coarse-grained model (NNP-CG).

## Purpose of Module

Creating a coarse-grained (CG) model from the full description of a system is a
two-step process: (1) selecting a reduced set of degrees of freedom and (2)
defining interactions depending on these coarse-grained variables. For example,
in a common coarse-graining approach for molecular systems the atomistic picture
is replaced by a simpler description with CG particles sitting at the
center-of-mass coordinates of the actual molecules. The corresponding
interactions between CG sites can be modelled with empirical force fields but
also, as has been recently shown in [1] and [2], with machine learning
potentials. To simplify the construction of NNP based coarse-grained models in
*n2p2*, this module adds software to estimate the quality of atomic environment
descriptors, which in turn hints at the expected performance of the
coarse-grained description.

The overall goal of the descriptor analysis is to show qualitatively whether there is a correlation between the raw atomic environment descriptors (and their derivatives) and the atomic forces. If little or no correlation can be found, we can assume that the descriptors do not encode enough information to construct a (free) energy landscape. On the other hand, if “similar” descriptors correspond to “similar” forces, there is a good chance that a machine learning algorithm can detect this link and that a machine learning potential can be fitted.

To find a possible correlation between descriptors and forces, the following approach is used: First, a clustering algorithm (k-means or HDBSCAN) searches for groups in the high-dimensional descriptor space of all atoms. Then, for every detected cluster, the statistical distribution of the corresponding atomic forces is compared to the statistics of all remaining atomic forces. A hypothesis test (Welch’s t-test) is applied to decide whether the link between descriptors and forces is statistically significant. The percentage of clusters which show a clear link then serves as an indicator of the descriptor-force correlation: the higher the percentage, the better the correlation.
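The cluster-then-test procedure described above can be illustrated with a minimal sketch. It is not the code of the `analyze-descriptors.ipynb` notebook: the function name `fraction_of_significant_clusters`, the synthetic data, the choice of k-means with four clusters, the significance level of 0.05 and the use of a single scalar force value per atom are all assumptions made for illustration. It only relies on `sklearn.cluster.KMeans` and `scipy.stats.ttest_ind` (with `equal_var=False`, which selects Welch’s t-test).

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.cluster import KMeans


def fraction_of_significant_clusters(descriptors, forces, n_clusters=4,
                                     alpha=0.05, seed=0):
    """Cluster the descriptors, then compare the forces inside each cluster
    with all remaining forces using Welch's t-test.

    Returns the fraction of tested clusters with p-value below alpha.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(descriptors)
    tested = 0
    significant = 0
    for k in np.unique(labels):
        inside = forces[labels == k]
        outside = forces[labels != k]
        if len(inside) < 2 or len(outside) < 2:
            continue  # too few samples for a meaningful test
        # equal_var=False turns the two-sample t-test into Welch's t-test.
        _, p_value = ttest_ind(inside, outside, equal_var=False)
        tested += 1
        significant += int(p_value < alpha)
    return significant / tested if tested else 0.0


# Synthetic example data (assumption): four well-separated descriptor groups.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(4, 5))
group = rng.integers(0, 4, size=2000)
descriptors = centers[group] + rng.normal(size=(2000, 5))

# Forces that depend on the descriptor group vs. forces that do not.
correlated_forces = 2.0 * group + rng.normal(scale=0.5, size=2000)
uncorrelated_forces = rng.normal(size=2000)

frac_corr = fraction_of_significant_clusters(descriptors, correlated_forces)
frac_uncorr = fraction_of_significant_clusters(descriptors, uncorrelated_forces)
print("correlated forces:  ", frac_corr)
print("uncorrelated forces:", frac_uncorr)
```

In this toy setting, a high fraction for the correlated forces and a low one for the uncorrelated forces would mirror the interpretation given above. Real descriptor data will generally need a more careful choice of clustering algorithm and parameters.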

In order to perform the analysis described above, *n2p2* was extended by two
pieces of software:

- **A new application based on the C++ libraries:** `nnp-atomenv`. This application generates files containing the atomic environment data required for the cluster analysis.
- **A new Jupyter notebook with the actual analysis:** `analyze-descriptors.ipynb`. The notebook depends on common Python libraries (*numpy*, *scipy*, *scikit-learn*) and reads in data provided by `nnp-atomenv`. It then clusters the data, performs statistical tests and presents graphical results.

## Background Information

This module is based on *n2p2*, a C++ code for generation and application of
neural network potentials used in molecular dynamics simulations. The source
code and documentation are located here:

- *n2p2* documentation: http://compphysvienna.github.io/n2p2/
- *n2p2* source code: http://github.com/CompPhysVienna/n2p2

## Building and Testing

The code changes from this module are already merged with the main *n2p2*
repository (see the section below for corresponding pull requests).

Note

By the time you read these instructions, *n2p2* has most likely been
developed further. To restore the state of the software at the time of writing,
use these commands:

```
git clone https://github.com/CompPhysVienna/n2p2
cd n2p2
git checkout 3cfe391377d2792ac29baf8394b3dce712afdad2
```

To build the new tool `nnp-atomenv`, the usual *n2p2* build instructions apply:

```
cd src
make nnp-atomenv -j
```

The `analyze-descriptors.ipynb` Jupyter notebook requires some Python packages
to be installed:

- numpy
- scipy
- matplotlib
- seaborn
- scikit-learn
- hdbscan
- pickle (part of the Python standard library, no separate installation needed)
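
Before opening the notebook, it can help to check which of the packages listed above are importable in the current Python environment. The following is a minimal sketch using only the standard library. The helper name `missing_packages` and the mapping `REQUIRED` are hypothetical, and the import names are assumptions based on common usage (for example, *scikit-learn* is imported as `sklearn`).

```python
import importlib.util

# Package name as listed above -> module name used in "import ..."
# (import names are assumed from common usage, not taken from the notebook).
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "hdbscan": "hdbscan",
    "pickle": "pickle",  # standard library module
}


def missing_packages(required=REQUIRED):
    """Return the names of all packages whose module cannot be found."""
    return [name for name, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Any package reported as missing can then be installed with the package manager of your choice.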

Step-by-step instructions on how the descriptor analysis is prepared and performed are available on this dedicated documentation page.

Regression tests are run automatically in *n2p2* for each commit to the main
repository. This module also adds the corresponding tests for the
`nnp-atomenv` tool in `test/cpp/`. The build log showing the correct run of
tests is available here.

## Source Code

The new functionality introduced by this module is collected in two pull requests:

- New tool for symmetry function quality analysis
- Complete coarse-graining/descriptor analysis documentation

The easiest way to view the source code changes is to use the *Files changed*
tab in the above pull request pages.

[1] Zhang, L.; Han, J.; Wang, H.; Car, R.; E, W. DeePCG: Constructing Coarse-Grained Models via Deep Neural Networks. J. Chem. Phys. 2018, 149 (3), 034101.

[2] John, S. T.; Csányi, G. Many-Body Coarse-Grained Interactions Using Gaussian Approximation Potentials. J. Phys. Chem. B 2017, 121 (48), 10934–10949.