nnp-train
Warning
Documentation in progress…
This tool implements the actual training procedure for a given data set. It can train both 2G and 4G [1] neural networks (NNs). In the 4G case the training procedure consists of two stages. Stage 1, which must be run first, trains the charge NNs. Once it has finished, stage 2 trains the short-range NNs by fitting them to the reference energies and forces.
Requirements:
input.data
input.nn
scaling.data
Additionally for stage 2 in the 4G case:
hardness.???.data
weightse.???.data
Usage:
When training a 2G NN, the following command trains a NN with the topology given in input.nn on the given data set, using 4 MPI tasks:
mpirun -np 4 nnp-train
If one has specified a 4G NN the command is
mpirun -np 4 nnp-train <n>
where <n> is the stage (1 or 2). After finishing stage 1, one has to choose the preferred training epoch, whose NN parameters are then used in stage 2. Usually one picks the epoch with the lowest RMSE on the training set, but there may be reasons to deviate from that rule. After deciding on epoch <m>, one has to rename the files hardness.???.<m>.out and weightse.???.<m>.out to hardness.???.data and weightse.???.data, respectively.
When the training is finished (after stage 2 with the 4G NN or after the training with the 2G NN), it is again necessary to pick an epoch <m> of this run and rename the files weights.???.<m>.out to weights.???.data.
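The renaming step described above can be scripted. The following is a minimal sketch, not part of nnp-train itself: the helper name promote_epoch is hypothetical, and it assumes that the epoch field <m> in the file name is zero-padded to six digits, as the ?????? placeholder in the file output list suggests. It copies instead of renaming so that the per-epoch files stay available.

```python
import shutil
from pathlib import Path


def promote_epoch(directory, prefix, epoch):
    """Copy <prefix>.<ZZZ>.<epoch>.out to <prefix>.<ZZZ>.data for each element.

    Hypothetical helper. `prefix` is "hardness", "weightse" or "weights".
    """
    directory = Path(directory)
    # Assumption: the epoch is written as a zero-padded six-digit field.
    suffix = f".{epoch:06d}.out"
    copied = []
    for src in sorted(directory.glob(f"{prefix}.???{suffix}")):
        # Extract the three-character element field ("???") from the name.
        element = src.name[len(prefix) + 1 : -len(suffix)]
        dst = directory / f"{prefix}.{element}.data"
        shutil.copyfile(src, dst)  # copy, so the per-epoch file is kept
        copied.append(dst.name)
    return copied
```

After stage 1 one would call it once for "hardness" and once for "weightse"; after stage 2 (or after a 2G training) once for "weights". An empty return value means that no file for the requested epoch was found.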
Sample screen output:
A typical stage 1 training with a 4G network looks like this:
.
.
.
*** TRAINING LOOP *************************************************************
The training loop output covers different errors, update and
timing information. The following quantities are organized
according to the matrix scheme below:
-------------------------------------------------------------------
ep ........ Epoch.
Q_count ... Number of charge updates.
Q_train ... RMSE of training charges.
Q_test .... RMSE of test charges.
Q_pt ...... Percentage of time for charge updates w.r.t. to t_train.
count ..... Total number of updates.
train ..... Percentage of time for training.
error ..... Percentage of time for error calculation.
other ..... Percentage of time for other purposes.
epoch ..... Total time for this epoch (seconds).
total ..... Total time for all epochs (seconds).
-------------------------------------------------------------------
charge ep Q_count Q_train Q_test Q_pt
timing ep count train error other epoch total
-------------------------------------------------------------------
CHARGE 0 0 2.30301E-01 2.75350E-01 0.0
TIMING 0 0 0.0 58.3 41.7 0.09 0.09
------
CHARGE 1 4 1.64420E-02 1.25118E-02 100.0
TIMING 1 4 92.8 3.8 3.3 0.41 0.49
------
CHARGE 2 4 8.13293E-03 4.64616E-03 100.0
TIMING 2 4 91.7 4.5 3.9 0.34 0.83
------
.
.
.
------
CHARGE 10 4 3.41430E-03 2.22138E-03 100.0
TIMING 10 4 90.0 5.0 5.0 0.38 3.65
-------------------------------------------------------------------------------
TIMING Training loop finished: 3.65 seconds.
*******************************************************************************
2G NN training or stage 2 training with a 4G NN, in contrast, produces output similar to this:
.
.
.
*** TRAINING LOOP *************************************************************
The training loop output covers different errors, update and
timing information. The following quantities are organized
according to the matrix scheme below:
-------------------------------------------------------------------
ep ........ Epoch.
E_count ... Number of energy updates.
E_train ... RMSE of training energies per atom.
E_test .... RMSE of test energies per atom.
E_pt ...... Percentage of time for energy updates w.r.t. to t_train.
F_count ... Number of force updates.
F_train ... RMSE of training forces.
F_test .... RMSE of test forces.
F_pt ...... Percentage of time for force updates w.r.t. to t_train.
count ..... Total number of updates.
train ..... Percentage of time for training.
error ..... Percentage of time for error calculation.
other ..... Percentage of time for other purposes.
epoch ..... Total time for this epoch (seconds).
total ..... Total time for all epochs (seconds).
-------------------------------------------------------------------
energy ep E_count E_train E_test E_pt
force ep F_count F_train F_test F_pt
timing ep count train error other epoch total
-------------------------------------------------------------------
ENERGY 0 0 1.80089E-02 1.72559E-02 0.0
FORCE 0 0 1.76247E-01 1.93256E-01 0.0
TIMING 0 0 0.0 82.9 17.1 0.12 0.12
------
ENERGY 1 4 5.48098E-05 2.99658E-05 15.5
FORCE 1 16 3.97965E-03 3.93252E-03 84.5
TIMING 1 20 95.4 3.5 1.1 1.21 1.33
------
ENERGY 2 4 1.62363E-05 8.82677E-06 14.1
FORCE 2 16 3.15635E-03 2.18593E-03 85.9
TIMING 2 20 95.2 3.6 1.1 1.23 2.56
------
.
.
.
------
ENERGY 10 4 2.47602E-05 8.54473E-06 14.2
FORCE 10 16 8.73691E-03 1.41630E-02 85.8
TIMING 10 20 94.5 4.0 1.5 1.23 12.35
-------------------------------------------------------------------------------
TIMING Training loop finished: 12.35 seconds.
*******************************************************************************
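To pick an epoch from a captured screen output, the CHARGE, ENERGY and FORCE rows shown above can be parsed directly. The sketch below is only an illustration: the function names parse_training_rows and best_epoch are hypothetical, and the row layout (tag, epoch, update count, training RMSE, test RMSE, time percentage) is taken from the sample outputs above. learning-curve.out holds the same kind of information, but its column layout is not described here, so the sketch reads the screen output instead.

```python
def parse_training_rows(log_text, tag):
    """Return {epoch: (count, train_rmse, test_rmse)} for rows starting with tag."""
    rows = {}
    for line in log_text.splitlines():
        fields = line.split()
        # Data rows have six fields: TAG ep count train_rmse test_rmse pt
        if len(fields) == 6 and fields[0] == tag:
            try:
                epoch, count = int(fields[1]), int(fields[2])
                train, test = float(fields[3]), float(fields[4])
            except ValueError:
                continue  # not a data row after all
            rows[epoch] = (count, train, test)
    return rows


def best_epoch(rows, column="train"):
    """Return the epoch with the lowest RMSE in the chosen column."""
    index = {"train": 1, "test": 2}[column]
    return min(rows, key=lambda epoch: rows[epoch][index])


# Rows copied from the sample output above (epochs 1, 2 and 10).
sample = """\
ENERGY 1 4 5.48098E-05 2.99658E-05 15.5
FORCE 1 16 3.97965E-03 3.93252E-03 84.5
TIMING 1 20 95.4 3.5 1.1 1.21 1.33
ENERGY 2 4 1.62363E-05 8.82677E-06 14.1
FORCE 2 16 3.15635E-03 2.18593E-03 85.9
TIMING 2 20 95.2 3.6 1.1 1.23 2.56
ENERGY 10 4 2.47602E-05 8.54473E-06 14.2
FORCE 10 16 8.73691E-03 1.41630E-02 85.8
TIMING 10 20 94.5 4.0 1.5 1.23 12.35
"""
energy = parse_training_rows(sample, "ENERGY")
print(best_epoch(energy), best_epoch(energy, "test"))  # prints: 2 10
```

In this sample the training and test energy RMSEs favour different epochs, which is one example of why the lowest training RMSE is a rule of thumb and not a fixed criterion.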
File output:
Always generated:
In the following, [...] denotes a part of the file name that only exists in 4G training.
learning-curve.out[.stage-<n>]: Contains the errors of the NN after each epoch for all quantities that are used for this training.
test.data: Contains the data that is only used for testing but not for training (formatted like input.data).
train.data: Contains the data that is only used for training but not for testing (formatted like input.data).
updater.???.out[.stage-<n>]: Contains information about the optimization algorithm that was used for training the NN.
timing.out[.stage-<n>]: Contains information about the time needed for individual tasks in the training procedure (e.g. update and error calculation).
Optional:
In 4G stage 1 (if write_weights_epoch is set non-zero):
hardness.???.??????.out
weightse.???.??????.out
In 2G and 4G stage 2 (if write_weights_epoch is set non-zero):
weights.???.??????.out
In 4G stage 1 (if write_traincharges is set non-zero):
traincharges.??????.out: Contains a comparison between the reference charges and the predicted charges for the data used in the training after the epoch denoted by ??????.
testcharges.??????.out: Contains a comparison between the reference charges and the predicted charges for the data used for testing after the epoch denoted by ??????.
In 2G or 4G stage 2 (if write_trainpoints is set non-zero):
trainpoints.??????.out: Contains a comparison between the reference energies and the predicted energies for the data used in the training after the epoch denoted by ??????.
testpoints.??????.out: Contains a comparison between the reference energies and the predicted energies for the data used for testing after the epoch denoted by ??????.
In 2G or 4G stage 2 (if write_trainforces is set non-zero):
trainforces.??????.out: Contains a comparison between the reference forces and the predicted forces for the data used in the training after the epoch denoted by ??????.
testforces.??????.out: Contains a comparison between the reference forces and the predicted forces for the data used for testing after the epoch denoted by ??????.
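Since the per-epoch parameter files listed above are optional, it can help to check which epochs were actually written before choosing one. The following sketch lists them from the file names. The helper name available_epochs is hypothetical, and the code assumes that the ??? and ?????? placeholders stand for three and six digits, respectively.

```python
import re
from pathlib import Path


def available_epochs(directory, prefix):
    """Return {epoch: [element fields]} for files <prefix>.<ZZZ>.<EEEEEE>.out.

    Hypothetical helper. `prefix` is "hardness", "weightse" or "weights".
    """
    # Assumption: "???" is a three-digit field, "??????" a six-digit epoch.
    pattern = re.compile(rf"^{re.escape(prefix)}\.(\d{{3}})\.(\d{{6}})\.out$")
    found = {}
    for path in Path(directory).iterdir():
        match = pattern.match(path.name)
        if match:
            element, epoch = match.group(1), int(match.group(2))
            found.setdefault(epoch, []).append(element)
    # Sort by epoch, and sort the element fields within each epoch.
    return {epoch: sorted(elements) for epoch, elements in sorted(found.items())}
```

An epoch is only usable if it lists a file for every element of the system, so comparing the element lists across epochs is a quick consistency check.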