n2p2: Tools for Training Set Size Dependence

This module provides tools to analyse how the residual error of neural network potentials (NNPs) depends on the size of the training set. It is written specifically for use with the NNP package n2p2.

Purpose of Module

NNTSSD is a module that allows

  • automated creation of datasets of varied sizes
  • training of the neural network
  • analysis of the learning curves obtained in the training process

in order to determine representative learning curves showing residual errors for varied sizes of training sets. It also provides tools that allow

  • the usage of external test sets, which might be useful for developing epoch optimization approaches
  • the usage of separate validation datasets, which yield TSSD curves that are independent of the test sets used for epoch optimization
  • graphic representation of learning curves and training performance
  • a user-friendly way of running NNTSSD methods by filling in an input file

Other methods within the module allow

  • processing of input data (namely splitting datasets)
  • analysis of training performance (dependence of the residual error on the number of training epochs)
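
The dataset splitting and the creation of training sets of varied sizes can be illustrated with a minimal sketch. This is not the NNTSSD implementation: the function names split_structures and draw_training_subset are hypothetical, and the sketch assumes that each structure in the data file is delimited by a line containing only "begin" and a line containing only "end".

```python
import random


def split_structures(text):
    """Split the text of a data file into per-structure blocks.

    Assumption: every structure starts with a 'begin' line and
    finishes with an 'end' line; anything outside is ignored.
    """
    structures, current = [], None
    for line in text.splitlines():
        token = line.strip()
        if token == "begin":
            current = [line]
        elif current is not None:
            current.append(line)
            if token == "end":
                structures.append("\n".join(current))
                current = None
    return structures


def draw_training_subset(structures, ratio, seed=0):
    """Randomly draw round(ratio * N) structures without replacement.

    Hypothetical helper: 'ratio' is the fraction of the full dataset
    to keep; at least one structure is always returned.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    size = max(1, round(ratio * len(structures)))
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    return rng.sample(structures, size)
```

Calling draw_training_subset repeatedly with different seeds at each ratio gives several independent training sets per size, which is one way to estimate how much the residual error scatters at a given size.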

Background Information

Neural network potentials are used in molecular dynamics simulations to reproduce the potential energy surfaces of ab initio methods. This module addresses how the NNP’s prediction error (characterized by the RMSE in energy and forces) depends on the size of the training dataset.
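
As an illustration of the quantity being studied, the sketch below computes an RMSE and condenses repeated trainings into one point per training set size. It is not taken from NNTSSD: the function names rmse and summarize_by_size are hypothetical, and summarizing each size by the mean and sample standard deviation over repeated trainings is an assumption about how a representative curve could be built.

```python
import math
from statistics import mean, stdev


def rmse(predicted, reference):
    """Root-mean-square error between predictions and reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def summarize_by_size(results):
    """Condense repeated trainings into (size, mean RMSE, spread) tuples.

    'results' maps a training set size to the list of RMSE values
    obtained from independent trainings at that size (assumed layout).
    The spread is the sample standard deviation, or 0.0 for one value.
    """
    summary = []
    for size in sorted(results):
        values = results[size]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary.append((size, mean(values), spread))
    return summary
```

The same two functions apply to energies and to force components alike; only the lists passed to rmse differ.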

Building, Testing and Examples

Building instructions for NNTSSD, information regarding software tests, and examples can be found here. The additions to n2p2 presented here are not yet merged into the main n2p2 repository. Before following those instructions, check out the n2p2_training_size branch in the author’s fork of n2p2 using these commands:

git clone git@github.com:MadlenReiner/n2p2.git
cd n2p2
git checkout n2p2_training_size

Then, run the build process of n2p2

cd src
make

to create the training tools required for NNTSSD. In some cases you may need to set the paths to external libraries in src/makefile.gnu.

Source Code

The source code of this module can be found in the tools/python/NNTSSD/source directory of the n2p2_training_size branch in the author’s fork:

Another way of reviewing the code additions to n2p2 is to visit the corresponding pull request:

Switch to the Files changed tab to get an overview of all changes.