Configuration file format
Atomic configurations are stored on disk by NNP applications in a simple ASCII
file. Data sets with training structures need to be provided in the same format.
The file name for input configurations is usually input.data. A configuration
file may contain multiple structures, each enclosed by the begin and end
keywords. The lines in between must begin with one of the following keywords:
atom
lattice
comment
energy
charge
Here is a sample layout:
begin
comment <comment>
lattice <ax> <ay> <az>
lattice <bx> <by> <bz>
lattice <cx> <cy> <cz>
atom <x1> <y1> <z1> <e1> <c1> <n1> <fx1> <fy1> <fz1>
atom <x2> <y2> <z2> <e2> <c2> <n2> <fx2> <fy2> <fz2>
...
atom <xn> <yn> <zn> <en> <cn> <nn> <fxn> <fyn> <fzn>
energy <energy>
charge <charge>
end
begin
...
end
...
begin
...
end
where the arguments of the keywords are:
<comment>: comment line
<ax> … <cz>: box vectors \(\vec{\mathbf{a}}, \vec{\mathbf{b}}, \vec{\mathbf{c}}\) (see nnp::Structure::calculateInverseBox())
<x1> … <zn>: atom coordinates of n atoms
<e1> … <en>: atom element string (e.g. Cd, S)
<c1> … <cn>: not (yet) used, reserved for atom charge in case of long range neural network (to be implemented)
<n1> … <nn>: not used
<fx1> … <fzn>: force components of n atoms
<energy>: total potential energy
<charge>: total charge (for long range neural network only)
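As an illustration of the layout described above, here is a minimal Python sketch of a reader for this format. It is not part of n2p2; the function name parse_configurations and the dictionary keys are hypothetical and chosen only for this example. The two reserved columns of each atom line are read but ignored.

```python
def parse_configurations(text):
    """Parse configuration file text into a list of structure dictionaries.

    Hypothetical helper for illustration only, not part of n2p2.
    """
    structures = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        keyword = fields[0]
        if keyword == "begin":
            # Start a new structure; the lattice list stays empty if the
            # structure has no lattice lines.
            current = {"comment": None, "lattice": [], "atoms": [],
                       "energy": None, "charge": None}
        elif current is None:
            raise ValueError(f"keyword outside begin/end block: {line!r}")
        elif keyword == "end":
            structures.append(current)
            current = None
        elif keyword == "comment":
            current["comment"] = line[len("comment"):].strip()
        elif keyword == "lattice":
            current["lattice"].append([float(v) for v in fields[1:4]])
        elif keyword == "atom":
            # Columns: x y z element c n fx fy fz (c and n are ignored here).
            x, y, z, element, _c, _n, fx, fy, fz = fields[1:10]
            current["atoms"].append({
                "position": (float(x), float(y), float(z)),
                "element": element,
                "force": (float(fx), float(fy), float(fz)),
            })
        elif keyword == "energy":
            current["energy"] = float(fields[1])
        elif keyword == "charge":
            current["charge"] = float(fields[1])
        else:
            raise ValueError(f"unknown keyword: {keyword!r}")
    return structures
```

Calling parse_configurations(open("input.data").read()) would return one dictionary per begin/end block, in file order.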
The lattice section must be omitted for non-periodic structures. It is
possible to mix periodic and non-periodic structures. Also, configurations may
contain different numbers of atoms. If atoms in a periodic structure are
initially outside of the simulation box they will be automatically mapped back
into the box (see nnp::Structure::remap()). Here is an example
configuration file with 3 structures, 2 periodic and 1 non-periodic:
begin
comment This periodic structure contains 2 Cd and 2 S atoms.
lattice 1.0 0.0 0.0
lattice 0.0 1.0 0.0
lattice 0.0 0.0 1.0
atom 0.1 0.2 0.3 Cd -0.1 0.0 -0.1 -0.3 0.1
atom 0.2 0.4 0.8 Cd -0.1 0.0 -0.2 0.6 -0.6
atom 0.7 0.2 0.7 S 0.1 0.0 -0.8 -0.1 0.1
atom 0.1 0.1 0.4 S 0.1 0.0 1.1 -0.2 0.4
energy 123.456
charge 0.0
end
begin
comment This non-periodic structure contains 1 Cd and 2 S atoms.
atom 0.9 0.1 0.8 Cd -0.1 0.0 -0.3 -0.3 0.1
atom 0.7 0.2 0.2 S 0.1 0.0 -0.8 0.1 0.3
atom 0.6 0.9 0.4 S 0.1 0.0 1.1 0.2 -0.4
energy 1337.00
charge 0.0
end
begin
comment This periodic structure contains 3 Cd and 3 S atoms.
lattice 2.0 0.0 0.0
lattice 1.0 2.0 0.0
lattice 1.0 1.0 2.0
atom 1.9 0.2 1.7 S 0.1 0.0 0.4 -0.1 -0.2
atom 1.1 0.2 0.5 Cd -0.1 0.0 -0.1 -0.3 0.2
atom 0.2 1.4 0.8 Cd -0.1 0.0 -0.2 0.8 0.5
atom 0.9 0.2 1.7 S 0.1 0.0 -0.7 -0.3 -0.6
atom 0.8 1.2 0.1 Cd -0.1 0.0 -0.2 0.1 0.5
atom 0.1 0.1 0.4 S 0.1 0.0 0.8 -0.2 -0.4
energy 543.210
charge 0.0
end
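To make the idea of mapping atoms back into the simulation box concrete, here is a minimal Python sketch. It is not the code of nnp::Structure::remap(). It assumes the common convention that an atom is inside the box when its fractional coordinates along the three box vectors lie in [0, 1). The function name remap_into_box and the helpers are hypothetical.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def remap_into_box(position, a, b, c):
    """Return the periodic image of position inside the box spanned by a, b, c.

    Assumption: "inside" means fractional coordinates in [0, 1).
    """
    volume = dot(a, cross(b, c))
    # Fractional coordinates f such that position = fa*a + fb*b + fc*c.
    frac = (dot(position, cross(b, c)) / volume,
            dot(position, cross(c, a)) / volume,
            dot(position, cross(a, b)) / volume)
    # Wrap each fractional coordinate into [0, 1).
    fa, fb, fc = (f - math.floor(f) for f in frac)
    # Convert back to Cartesian coordinates.
    return tuple(fa * a[i] + fb * b[i] + fc * c[i] for i in range(3))
```

For the third structure above, the box vectors would be passed as a = (2.0, 0.0, 0.0), b = (1.0, 2.0, 0.0) and c = (1.0, 1.0, 2.0).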
Manual train/test set definition
Since n2p2 version 2.3.0 it is possible to manually assign structures to the
training or test set directly in the input.data file. Just add the string
set=train or set=test right after the structure start marker begin.
For example:
begin set=train
lattice 1.0 0.0 0.0
lattice 0.0 1.0 0.0
lattice 0.0 0.0 1.0
atom 0.1 0.2 0.3 Cd -0.1 0.0 -0.1 -0.3 0.1
atom 0.1 0.1 0.4 S 0.1 0.0 1.1 -0.2 0.4
energy 123.456
charge 0.0
end
begin set=test
lattice 1.0 0.0 0.0
lattice 0.0 1.0 0.0
lattice 0.0 0.0 1.0
atom 0.1 0.2 0.3 Cd -0.1 0.0 -0.1 -0.3 0.1
atom 0.1 0.1 0.4 S 0.1 0.0 1.1 -0.2 0.4
energy 123.456
charge 0.0
end
This assignment takes precedence over the usual random split performed by
nnp-train. Structures which are not labelled with set= will still be
assigned randomly to the training/test sets according to the test_fraction
keyword.
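The precedence rule can be sketched in a few lines of Python. This is an illustration only and does not reproduce how nnp-train draws its random numbers. The function name assign_set is hypothetical, and the sketch assumes that test_fraction is the probability of an unlabelled structure going to the test set.

```python
import random


def assign_set(begin_line, test_fraction, rng):
    """Return "train" or "test" for the structure starting at begin_line.

    Hypothetical helper: an explicit set= label wins, otherwise the
    structure is assigned at random using test_fraction.
    """
    for token in begin_line.split()[1:]:
        if token.startswith("set="):
            label = token[len("set="):]
            if label not in ("train", "test"):
                raise ValueError(f"unknown set label: {label!r}")
            return label
    # No label: random assignment (assumed meaning of test_fraction).
    return "test" if rng.random() < test_fraction else "train"


rng = random.Random(0)
print(assign_set("begin set=train", 0.1, rng))  # explicit label is kept
print(assign_set("begin", 0.1, rng))            # randomly assigned
```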