NNP training procedure

Warning

This part of the documentation is incomplete!

Creating a new high-dimensional neural network potential from scratch is usually a multi-step, iterative procedure that includes:

data set generation/augmentation,
selection/refinement/pruning of symmetry function parameters,
experimenting with neural network topology/settings,
the actual neural network training, i.e. optimizing the weight parameters.

Tools are ready-made for most of these tasks and can be combined to generate a new NNP from scratch. Here is a rough guideline of the individual steps from a set of configurations to an initial NN potential. However, be aware that creating a reliable NNP usually requires repeated data set refining and adequate testing before it is ready for production!

Step 1: Data set

Prepare a data set (name the file input.data) in the file format described here. Unfortunately, there is no simple recipe for creating a “good” set of configurations; you will find more hints in the literature. As a starting point, you could try some (100+) configurations taken from an ab initio MD simulation.
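As a quick sanity check against the 100+ starting point mentioned above, the number of configurations in input.data can be counted with a few lines of Python. This is only a minimal sketch: it assumes that each configuration is delimited by lines containing just the keywords begin and end, and the function name count_configurations is made up for this illustration.

```python
def count_configurations(path):
    """Count the configurations in a data set file.

    Assumption: every configuration starts with a line holding only the
    keyword "begin" and finishes with a line holding only "end".
    """
    begins = 0
    ends = 0
    with open(path) as handle:
        for line in handle:
            keyword = line.strip()
            if keyword == "begin":
                begins += 1
            elif keyword == "end":
                ends += 1
    # An unbalanced file usually points to a truncated or corrupted data set.
    if begins != ends:
        raise ValueError(
            f"{path}: found {begins} 'begin' lines but {ends} 'end' lines"
        )
    return begins
```

Calling count_configurations("input.data") returns the number of structures, which can then be compared with whatever minimum size you consider reasonable for your system.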

Step 2: Prepare settings file

Prepare a settings file (name the file input.nn): Use the recommended file here and change the settings in the “GENERAL NNP SETTINGS” section according to your system.

Step 3 (optional): Data set normalization

Note

This step is usually not necessary any more because data set normalization is handled on-the-fly during training with the keyword normalize_data_set.

As explained here, it may be useful to ensure that training is independent of the chosen unit system. With input.nn and input.data present, run the tool nnp-norm, which implements a normalization procedure (see chapter 3.1 in [1]) for this purpose. This will write an additional header to the settings file input.nn with three new keyword-value pairs. These instruct other n2p2 tools to enable data set normalization at runtime. Whenever the data set is changed, do not forget to repeat this step.

Important

Besides the addition of the normalization header no other actions are required to enable data set normalization for all other steps below. In particular, neither the data set nor other unit system dependent settings (e.g. cutoff radii, some symmetry function parameters) need to be converted manually. Any unit conversion will be handled internally and no user intervention is necessary.
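To see whether a given input.nn already carries normalization information, one could scan it for the relevant keywords. The sketch below is an illustration only. It assumes that the three header keywords are called mean_energy, conv_energy and conv_length, and that "#" starts a comment in the settings file. All function names are hypothetical helpers, not part of n2p2.

```python
# Assumption: these are the three keywords added by the normalization header.
NORMALIZATION_KEYWORDS = ("mean_energy", "conv_energy", "conv_length")


def settings_keywords(path):
    """Return the set of keywords (first token of each line) in a settings file.

    Assumption: everything after a '#' character is a comment.
    """
    keywords = set()
    with open(path) as handle:
        for line in handle:
            tokens = line.split("#", 1)[0].split()
            if tokens:
                keywords.add(tokens[0])
    return keywords


def missing_normalization_keywords(path):
    """List the normalization header keywords not found in the settings file."""
    present = settings_keywords(path)
    return [key for key in NORMALIZATION_KEYWORDS if key not in present]


def uses_on_the_fly_normalization(path):
    """Check whether the keyword normalize_data_set appears in the file."""
    return "normalize_data_set" in settings_keywords(path)
```

An empty list from missing_normalization_keywords("input.nn") suggests that the header is present. If the data set has changed since the header was written, the values are stale and the normalization step should be repeated regardless.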

Step 4: Symmetry function setup

Change the symmetry function definitions in the “SYMMETRY FUNCTIONS” section of the input.nn file. Again, this is not a trivial task; please find more information in the literature [2] [3] [4]. See also the description of the symfunction_short keyword here.

Note

There is a very useful standalone Python tool written by Florian Buchner (see his pull request) which lets you create sets of symmetry function lines following the guidelines given in [3] and [4]. To use it, just copy the file sfparamgen.py to a local directory and follow the instructions given in this Jupyter notebook.

Step 5: Compute symmetry function statistics

With the files input.data and input.nn ready in the same directory, run the tool nnp-scaling (supports MPI parallelization). This will compute all symmetry functions for all atoms once and store statistics about them in a third file (scaling.data), which is required for training.

Step 6: NNP Training

Run the actual training program nnp-train, preferably in parallel, for example:

mpirun -np 16 nnp-train

Be aware of the memory footprint, which is estimated in the previous step (see the end of the log file or screen output).
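When training is launched from a script, it can help to verify first that the three input files produced so far are actually present. The following is a minimal sketch of such a wrapper. The helper names are hypothetical, and the command is simply the mpirun call shown above with a configurable process count.

```python
import subprocess
from pathlib import Path

# Files that the previous steps are expected to have produced.
TRAINING_INPUTS = ("input.nn", "input.data", "scaling.data")


def missing_inputs(workdir):
    """Return the names of required input files absent from workdir."""
    workdir = Path(workdir)
    return [name for name in TRAINING_INPUTS if not (workdir / name).is_file()]


def training_command(n_procs):
    """Build the parallel training command as an argument list."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return ["mpirun", "-np", str(n_procs), "nnp-train"]


def run_training(workdir, n_procs=16):
    """Check the inputs, then launch the training inside workdir."""
    missing = missing_inputs(workdir)
    if missing:
        raise FileNotFoundError(f"missing in {workdir}: {', '.join(missing)}")
    # check=True raises CalledProcessError if the program exits with an error.
    return subprocess.run(training_command(n_procs), cwd=workdir, check=True)
```

The sketch does not look at memory requirements; the estimate printed by the previous step still has to be compared manually with what your machine offers.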

Step 7: Collect weight files

During training, weight files named weights.???.<epoch> are written for each epoch. Select an epoch with a satisfactory RMSE and rename the corresponding weight files to weights.???.data (??? is the atomic number of each element present).
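Renaming the files by hand becomes tedious for systems with several elements, so the selection can be scripted. The sketch below is an illustration under stated assumptions: it expects file names of the form weights.<three digits>.<epoch digits>, optionally followed by a ".out" suffix, and the function collect_weights is a hypothetical helper. Adjust the pattern if your files are named differently.

```python
import re
import shutil
from pathlib import Path

# Assumed naming scheme: weights.<ZZZ>.<epoch> with an optional ".out" suffix.
WEIGHT_PATTERN = re.compile(r"^weights\.(\d{3})\.(\d+)(?:\.out)?$")


def collect_weights(train_dir, epoch, dest_dir=None):
    """Copy the weight files of one epoch to weights.<ZZZ>.data.

    Files are copied rather than renamed, so the per-epoch files stay
    available in case a different epoch is chosen later.
    """
    train_dir = Path(train_dir)
    dest_dir = Path(dest_dir) if dest_dir is not None else train_dir
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() reads the directory listing once, before any file is written.
    for path in sorted(train_dir.iterdir()):
        match = WEIGHT_PATTERN.match(path.name)
        if match and int(match.group(2)) == epoch:
            target = dest_dir / f"weights.{match.group(1)}.data"
            shutil.copyfile(path, target)
            copied.append(target)
    if not copied:
        raise FileNotFoundError(f"no weight files for epoch {epoch} in {train_dir}")
    return copied
```

For example, collect_weights(".", 10) would produce one weights.???.data file per element from the epoch 10 files in the current directory.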

Step 8: Prediction and MD simulation

Try the potential by predicting energies and forces for a new configuration: collect the files from training (input.nn, scaling.data and weights.???.data) in a folder together with a single configuration (again named input.data) and run the tool nnp-predict. Alternatively, try running an MD simulation with LAMMPS (see the setup instructions here).
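Gathering the files for a prediction run can be sketched in Python as follows. The helper prepare_prediction_dir is hypothetical and only copies files; it assumes that the selected weights have already been stored as weights.???.data in the training directory.

```python
import shutil
from pathlib import Path


def prepare_prediction_dir(train_dir, configuration, dest_dir):
    """Assemble a folder with everything needed for a prediction run.

    Copies input.nn, scaling.data and all weights.???.data files from
    train_dir, and stores the single configuration as input.data.
    Returns the sorted names of the files in dest_dir.
    """
    train_dir = Path(train_dir)
    dest_dir = Path(dest_dir)
    weights = sorted(train_dir.glob("weights.???.data"))
    if not weights:
        raise FileNotFoundError(f"no weights.???.data files in {train_dir}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    for source in [train_dir / "input.nn", train_dir / "scaling.data", *weights]:
        shutil.copyfile(source, dest_dir / source.name)
    # The configuration to predict must be called input.data again.
    shutil.copyfile(configuration, dest_dir / "input.data")
    return sorted(path.name for path in dest_dir.iterdir())
```

After this, nnp-predict can be run inside the destination folder; see the examples directory for how the tool is invoked.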

Please also have a look at the examples directory, which provides working example setups for each tool. If you run into problems, don’t hesitate to ask.