n2p2 - Symmetry Function Memory Footprint Reduction

This module improves memory management in n2p2. More specifically, it implements a new strategy for storing symmetry function derivatives, which drastically reduces the memory footprint during training.

Purpose of Module

Training high-dimensional neural network potentials (HDNNPs) means minimizing the error between predictions and the reference information in a data set of atomic configurations. The desired potential energy surface is supplied in the form of an energy per configuration and forces acting on each atom. Consider the HDNNP expression for the forces

F_{i,\alpha} = - \sum_{j=1}^{N_\text{atoms}}
\sum_{k=1}^{N_\text{sym.func.}} \frac{\partial E_j}{\partial G_{j,k}}
\frac{\partial G_{j,k}}{\partial x_{i,\alpha}},

where G_{j,k} denotes the k-th symmetry function of atom j. Only the first factor \frac{\partial E_j}{\partial G_{j,k}} depends on the neural network weights and therefore changes during the training process. The symmetry function derivatives with respect to the atomic coordinates \frac{\partial G_{j,k}}{\partial x_{i,\alpha}}, however, stay fixed for each atomic configuration in the data set. Given the high computational cost of symmetry functions, it is essential to pre-calculate these derivatives and store them in memory. While this strategy speeds up the training procedure significantly [1], it also drastically increases the memory footprint, which easily exceeds 100 GB for common data set sizes.

This module alters the core C++ library of n2p2 in order to reduce the memory consumption of all dependent applications, and provides benchmark results quantifying the improvement. The idea is to exploit the fact that for specific combinations of neighboring atoms i and j, the expression \frac{\partial G_{j,k}}{\partial x_{i,\alpha}} always equals zero. Consider a three-component system with elements A, B and C, and let atoms i and j be of element A and B, respectively. Then the derivative of a symmetry function G_{j,k} with signature B-C (i.e. only sensitive to neighbor atoms of type C) with respect to i's coordinates vanishes. Hence, by automatically taking these element-combination relations into account, a significant portion of the memory usage can be avoided. Depending on the symmetry function setup, savings of about 30 to 50% can be achieved for typical systems. These improvements will be particularly helpful for developing HDNNPs for coarse-grained models.
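The element-combination rule from the A/B/C example above can be expressed as a simple predicate. The sketch below is illustrative only; the names (`SymFunc`, `derivativeCanBeNonzero`) are hypothetical and do not correspond to n2p2's class layout:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative sketch (names are assumptions, not n2p2's API): a symmetry
// function of atom j is only sensitive to neighbors of certain elements, so
// dG_{j,k}/dx_i vanishes whenever atom i's element is neither the central
// element nor one of the neighbor elements in the function's signature.
struct SymFunc {
    std::string centerElement;                  // element of the central atom j
    std::vector<std::string> neighborElements;  // elements the function "sees"
};

// True if the derivative of sf w.r.t. an atom of element e can be nonzero.
bool derivativeCanBeNonzero(const SymFunc& sf, const std::string& e)
{
    if (e == sf.centerElement) return true;  // derivative w.r.t. atom j itself
    for (const std::string& n : sf.neighborElements)
        if (n == e) return true;
    return false;
}
```

For the B-C symmetry function in the example, the predicate returns false for element A, so no storage needs to be allocated for those derivative entries.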

Code changes cover most of the classes in the libnnp core library, where they add functionality to identify the relevant (nonzero) element combinations for the symmetry function derivative computation. Additional CI tests ensure that results are not affected.

Background Information

This module is based on n2p2, a C++ code for generation and application of neural network potentials used in molecular dynamics simulations. The source code and documentation are located here:

Building and Testing

Because the change in memory management affects the core library of n2p2, several applications shipped with n2p2 benefit from reduced memory consumption. However, the biggest effect can be observed during training with the nnp-train application. In the src directory, type

make nnp-train

to build this n2p2 tool (see the build documentation for more details). Switch to one of the folders inside the examples/nnp-train directory and run nnp-train (after a successful build the binary is copied to the bin directory). The screen output will contain a section labelled SETUP: SYMMETRY FUNCTION MEMORY which will highlight the memory savings.

The code changes from this module are already merged into the main repository of n2p2 (see pull request). The improved memory management is enabled by default when n2p2 is compiled. However, there are use cases (see this discussion) where the "full" memory layout is more desirable. Hence, a compilation flag allows switching between the two choices. The documentation also shows benchmark results which demonstrate the potential memory savings.
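To give a rough feel for the trade-off between the two layouts, the sketch below contrasts the dense ("full") storage count with the reduced one. Everything here is an assumption for illustration: the function names are invented, and the nonzero fraction is a free parameter standing in for the 30 to 50% savings mentioned above, not a measured n2p2 value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration (not n2p2's actual bookkeeping): the "full"
// layout reserves a derivative slot for every neighbor/symmetry-function
// pair, while the reduced layout keeps only the fraction of entries whose
// element combination can actually be nonzero.

std::size_t denseSlots(std::size_t numNeighbors, std::size_t numSymFuncs)
{
    return numNeighbors * numSymFuncs * 3;  // 3 Cartesian components each
}

std::size_t reducedSlots(std::size_t numNeighbors, std::size_t numSymFuncs,
                         double nonzeroFraction)  // e.g. 0.5 to 0.7
{
    return static_cast<std::size_t>(denseSlots(numNeighbors, numSymFuncs)
                                    * nonzeroFraction);
}
```

With, say, 10 neighbors, 20 symmetry functions and 50% of the entries surviving the element-combination filter, the reduced layout stores half as many derivative slots, matching the upper end of the savings quoted above.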

Regression testing is implemented in n2p2 and automatically performed upon submission of a pull request via Travis CI. The log file showing the successful pass of all tests for the specific pull request can be found here.