NNP training procedure
Warning
This part of the documentation is incomplete!
Creating a new high-dimensional neural network potential from scratch is usually a multi-step iterative procedure which includes
data set generation/augmentation,
selection/refinement/pruning of symmetry function parameters,
experimenting with neural network topology/settings,
the actual neural network training, i.e. optimizing the weight parameters.
Tools are ready-made for most of these tasks and can be combined to generate a new NNP from scratch. Here is a rough guideline of the individual steps from a set of configurations to an initial NN potential. However, be aware that creating a reliable NNP usually requires repeated data set refining and adequate testing before it is ready for production!
Step 1: Data set
Prepare a data set (name the file input.data) in the file format described here. Unfortunately, there is no simple recipe for creating a “good” set of configurations; you will find more hints in the literature. As a starting point, you could try some (100+) configurations taken from an ab initio MD simulation.
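As a quick sanity check before moving on, the number of configurations in the data set can be counted. The sketch below assumes that each configuration in input.data is delimited by lines containing only `begin` and `end`; this is an assumption made here, so check it against the file format description. The function name `count_configurations` is hypothetical and not part of n2p2.

```python
def count_configurations(path):
    """Count configurations, assuming each one is wrapped in
    a 'begin' line and an 'end' line (assumed file layout)."""
    n_begin = 0
    n_end = 0
    with open(path) as f:
        for line in f:
            keyword = line.strip()
            if keyword == "begin":
                n_begin += 1
            elif keyword == "end":
                n_end += 1
    # A mismatch hints at a truncated or corrupted data set.
    if n_begin != n_end:
        raise ValueError(
            f"unbalanced blocks: {n_begin} 'begin' vs {n_end} 'end'"
        )
    return n_begin
```

Calling `count_configurations("input.data")` returns the number of complete blocks, which can be compared with the rough 100+ starting point mentioned above.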
Step 2: Prepare settings file
Prepare a settings file (name the file input.nn): use the recommended file here and change the settings in the “GENERAL NNP SETTINGS” section according to your system.
Step 3 (optional): Data set normalization
Note
This step is usually not necessary any more because data set normalization is handled on-the-fly during training with the keyword normalize_data_set.
As explained here, it may be useful to ensure that training is independent of the chosen unit system. With input.nn and input.data present, run the tool nnp-norm, which implements a normalization procedure (see chapter 3.1 in [1]) for this purpose. This writes an additional header to the settings file input.nn with three new keyword-value pairs, which instruct other n2p2 tools to enable data set normalization at runtime. Whenever the data set changes, do not forget to repeat this step.
Important
Besides the addition of the normalization header no other actions are required to enable data set normalization for all other steps below. In particular, neither the data set nor other unit system dependent settings (e.g. cutoff radii, some symmetry function parameters) need to be converted manually. Any unit conversion will be handled internally and no user intervention is necessary.
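To illustrate why normalization removes the dependence on the unit system, the following sketch shifts per-atom energies to zero mean and rescales energies and forces to unit standard deviation. It only illustrates the general idea under that assumption and is not the nnp-norm implementation; see chapter 3.1 in [1] for the actual procedure. All names (`normalization_factors`, `energy_scale`, `length_scale`) are chosen here for illustration.

```python
import statistics


def normalization_factors(energies_per_atom, force_components):
    """Return (mean_energy, energy_scale, length_scale).

    With E' = (E - mean_energy) * energy_scale and
    F' = F * energy_scale / length_scale, both E' and F'
    have unit (population) standard deviation.
    """
    mean_energy = statistics.fmean(energies_per_atom)
    sigma_e = statistics.pstdev(energies_per_atom)
    sigma_f = statistics.pstdev(force_components)
    energy_scale = 1.0 / sigma_e
    # Forces have units of energy/length, so the length factor
    # follows from requiring F' = F / sigma_f.
    length_scale = sigma_f / sigma_e
    return mean_energy, energy_scale, length_scale
```

Expressing the same data in a different energy unit changes the three factors but leaves the normalized energies and forces unchanged, which is the property the training benefits from.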
Step 4: Symmetry function setup
Change the symmetry function definitions in the “SYMMETRY FUNCTIONS” section of the input.nn file. Again, this is not a trivial task; please find more information in the literature [2] [3] [4]. See also the description of the symfunction_short keyword here.
Note
There is a very useful standalone Python tool written by Florian Buchner (see his pull request) which allows you to create sets of symmetry function lines following the guidelines given in [3] and [4]. To use it, just copy the file sfparamgen.py to a local directory and follow the instructions given in this Jupyter notebook.
Step 5: Compute symmetry function statistics
With the files input.data and input.nn ready in the same directory, run the tool nnp-scaling (supports MPI parallelization). This computes all symmetry functions for all atoms once and stores statistics about them in a third file (scaling.data), which is required for training.
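The kind of per-symmetry-function statistics meant here can be sketched as follows. Which statistics nnp-scaling actually stores, and the layout of scaling.data, are not specified in this section, so the choice of minimum, maximum, mean and standard deviation below is an assumption, as is the function name `symmetry_function_statistics`.

```python
import statistics


def symmetry_function_statistics(values_per_atom):
    """Compute (min, max, mean, sigma) for each symmetry function.

    values_per_atom: one list per atom, each holding that atom's
    symmetry function values in the same fixed order.
    """
    stats = []
    # zip(*rows) transposes "atoms x functions" into
    # "functions x atoms", giving one column per symmetry function.
    for column in zip(*values_per_atom):
        stats.append((
            min(column),
            max(column),
            statistics.fmean(column),
            statistics.pstdev(column),
        ))
    return stats
```

Statistics like these are typically used to scale and center the network inputs, which is why they have to be available before training starts.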
Step 6: NNP Training
Run the actual training program nnp-train, preferably in parallel via:
mpirun -np 16 nnp-train
Be aware of the memory footprint, which is estimated in the previous step (see end of log file or screen output).
Step 7: Collect weight files
During training, weight files named weights.???.<epoch> are written for each epoch. Select an epoch with a satisfactory RMSE and rename the corresponding weight files to weights.???.data (??? is the atomic number of each element present).
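Collecting the weight files of a chosen epoch can be scripted. The sketch below copies rather than renames, so the per-epoch files are kept. It assumes the per-epoch files are named `weights.<ZZZ>.<epoch>` with a three-digit atomic number and a numeric epoch, optionally followed by a suffix; the exact zero-padding and suffix used by nnp-train are assumptions to verify against your training output. The function name `collect_weights` is hypothetical.

```python
import re
import shutil
from pathlib import Path


def collect_weights(train_dir, epoch, dest_dir=None):
    """Copy weights.<ZZZ>.<epoch>[.suffix] to weights.<ZZZ>.data.

    The epoch is compared as an integer, so zero-padding in the
    file name does not matter. Returns the list of files written.
    """
    train_dir = Path(train_dir)
    dest_dir = Path(dest_dir) if dest_dir is not None else train_dir
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme; adjust if your files look different.
    pattern = re.compile(r"^weights\.(\d{3})\.(\d+)(\.\w+)?$")
    written = []
    for path in sorted(train_dir.iterdir()):
        match = pattern.match(path.name)
        if match and int(match.group(2)) == int(epoch):
            target = dest_dir / f"weights.{match.group(1)}.data"
            shutil.copyfile(path, target)
            written.append(target)
    if not written:
        raise FileNotFoundError(f"no weight files found for epoch {epoch}")
    return written
```

For example, `collect_weights(".", 20)` would look for the epoch 20 files of every element in the current directory and write one weights.???.data file per element.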
Step 8: Prediction and MD simulation
Try the potential by predicting energies and forces for a new configuration: collect the files from training (input.nn, scaling.data and weights.???) in a folder together with a single configuration (again named input.data) and run the tool nnp-predict. Alternatively, try running an MD simulation with LAMMPS (see setup instructions here).
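Collecting the files for a prediction run can be sketched as below. It assumes the renamed weight files from Step 7 match `weights.*.data` in the training directory; the function name `prepare_prediction_dir` and the directory layout are hypothetical. Running nnp-predict in the resulting folder is left to the user.

```python
import shutil
from pathlib import Path


def prepare_prediction_dir(train_dir, configuration_file, predict_dir):
    """Copy input.nn, scaling.data and weights.*.data from train_dir,
    plus one configuration stored as input.data, into predict_dir."""
    train_dir = Path(train_dir)
    predict_dir = Path(predict_dir)
    predict_dir.mkdir(parents=True, exist_ok=True)
    weight_files = sorted(train_dir.glob("weights.*.data"))
    if not weight_files:
        raise FileNotFoundError(f"no weights.*.data files in {train_dir}")
    sources = [train_dir / "input.nn", train_dir / "scaling.data", *weight_files]
    for source in sources:
        shutil.copyfile(source, predict_dir / source.name)
    # The single configuration to predict must again be named input.data.
    shutil.copyfile(configuration_file, predict_dir / "input.data")
    return predict_dir
```

Keeping the prediction folder separate from the training folder avoids overwriting the training data set, since both use the file name input.data.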
Please also have a look at the examples directory, which provides working example setups for each tool. If there are problems, don’t hesitate to ask again…