
Efficient Machine Learning Force Field for Large-Scale

Molecular Simulations of Organic Systems


Junbao Hu1,2, Liyang Zhou3*, Jian Jiang1,2*

1 Beijing National Laboratory for Molecular Sciences, State Key Laboratory of Polymer Physics and Chemistry, Institute of Chemistry, Chinese Academy of Sciences, Beijing, 100190, P. R. China.

2 University of Chinese Academy of Sciences, Beijing, 100049, P. R. China.


3 Juhua Group Co., Ltd, Quzhou, 324004, P. R. China.

*Corresponding author(s). E-mail(s): [email protected]; [email protected];

Abstract
To address the computational challenges of ab initio molecular dynamics and the accuracy limitations
of empirical force fields, the introduction of machine learning force fields has proven effective in var-
ious systems including metals and inorganic materials. However, in large-scale organic systems, the
application of machine learning force fields is often hindered by impediments such as the complexity
of long-range intermolecular interactions and molecular conformations, as well as the instability in
long-time molecular simulations. Therefore, we propose a universal multiscale higher-order equivariant
model combined with active learning techniques, efficiently capturing the complex long-range inter-
molecular interactions and molecular conformations. Compared to existing equivariant models, our
model achieves the highest predictive accuracy and magnitude-level improvements in computational
speed and memory efficiency. In addition, a bond length stretching method is designed to improve
the stability of long-time molecular simulations. Utilizing only 901 training samples, each containing 120 atoms, our model successfully extends high precision to systems with hundreds of thousands of atoms.
These achievements guarantee high predictive accuracy, fast simulation speed, minimal memory con-
sumption, and robust simulation stability, satisfying the requirements for high-precision and long-time
molecular simulations in large-scale organic systems.

Main

Molecular dynamics (MD) simulation has gained significant attention in recent years across various disciplines, spanning physics, chemistry, biology, and materials science. This cutting-edge technology, by simulating interactions between molecules or atoms, offers researchers a means to investigate the microstructure of substances and understand their macroscopic properties. This technique has proved invaluable for experimental design, the development of new materials, and advances in biomedical research [1–4].

Traditional molecular simulations grapple with the dilemma of balancing the high computational cost of ab initio molecular dynamics (AIMD) against the low precision of empirical force fields. A resolution to this challenge is found in the application of machine learning, leveraging its powerful fitting capabilities. The fundamental idea of a machine learning force field (MLFF) is to establish a mapping from molecular coordinates to
the labels of high-precision quantum chemistry data, including potential energy and forces. This approach eliminates the need to solve the intricate Schrödinger equation, resulting in a significant acceleration and achieving a balance between prediction precision and simulation speed [5].

Since the advent of BPNN in 2007 [6], numerous MLFF models have been proposed to improve prediction accuracy, and their performance has been systematically investigated on public datasets (MD17 [7], MD22 [8], OC22 [9]). From the perspective of tensor order (denoted by l), existing 3D molecular representation learning models can be categorized into two main classes: one is the invariant graph neural networks with only scalar features (i.e., l = 0), including SchNet [10], DeePMD [11], DTNN [12], PhysNet [13], ComENet [14], and SphereNet [15]; the other is the equivariant graph neural networks with vector features (i.e., l = 1), including EGNN [16], PaiNN [17], GVP-GNN [18], and EQGAT [19], as well as with higher-order features (i.e., l > 1), such as TFN [20], Cormorant [21], SEGNN [22], Equiformer [23], NequIP [24], Allegro [25], BotNet [26], and MACE [27]. Specifically, invariant methods directly use invariant geometric features such as distances and angles as input, ensuring invariance to rotation and translation transformations of the input molecules. In contrast, equivariant models maintain certain symmetries or properties under specific transformations such as rotation and translation [28]. Models with equivariance properties for l ≥ 1 generally outperform invariant ones on various public datasets and tests [16, 24, 29, 30]. Furthermore, in terms of the expressive power of geometric graph neural networks, higher-order equivariant models demonstrate more expressive capabilities compared to first-order equivariant models [31]. Increasing the equivariant order often leads to improved model accuracy [27, 32, 33]. Therefore, higher-order equivariant force field models such as NequIP, Allegro, and MACE achieve excellent performance on multiple MD simulation metrics [30], with nearly a 1000-fold improvement in data efficiency [24, 25].

High-accuracy MLFFs (with errors in atomic energy and forces around 1 meV/atom and 50 meV/Å, respectively [34, 35]) have been successfully applied in diverse systems such as metallic and non-metallic inorganic materials [36]. However, there is limited research to date on the application of MLFFs in organic systems. In fact, due to the weak generalization ability of machine learning models [37], the application of MLFFs in large-scale organic systems is usually challenged by several impediments, such as the complexity of long-range intermolecular interactions and molecular conformations, as well as instability in long-time molecular simulations. Therefore, molecular simulations based on MLFFs often fail (e.g., loss of accuracy, bond breaking, and atomic overlap) when encountering scenarios not present in the training set [30, 38].

Expanding the receptive field of neural network models by appropriately increasing the number of interaction layers has been proven to enhance the predictive accuracy of MLFFs in multi-molecular interaction systems [39]. However, increasing the number of interaction layers results in higher computational costs and can lead to over-smoothing issues that diminish the expressive capability of the machine learning model [40]. In fact, expanding the model's receptive field by increasing the number of interaction layers does not guarantee accurate characterization of long-range intermolecular interactions [29].

The key to accurately capturing long-range intermolecular interactions is increasing the cutoff-radius hyperparameter of the neural network model. However, the number of neighboring atoms increases cubically with increasing cutoff radius, resulting in a significant increase in simulation time and memory consumption. This limitation largely hampers the application of MLFFs in long-time MD simulations for large-scale organic systems. Consequently, many models adopt modular methods to consider long-range intermolecular interactions. For instance, a fragment-based method [39] was used to describe long-range intermolecular interactions. However, this fragment-based approach lacks generality and may require specific fragmentation schemes for certain molecules. The DeePMD-LR method [41] directly incorporates existing empirical equations to account for the long-range intermolecular interactions, but its limited applicability and the requirement for expert knowledge of empirical force fields restrict the use of this method. The fourth-generation HDNNP network [42] employs a
separate neural network module to predict long-range interactions, which results in an overly complex model with significant computational costs, limiting its application in long-time MD simulations. The MFN model [43] considers the long-range intermolecular interactions by adopting a new architecture that incorporates analytic matrix-valued functions. However, its computational complexity and memory consumption scale as O(N^2) and O(N^{4/3}), respectively, where N is the number of atoms. Therefore, there is a pressing need for an efficient end-to-end method to capture the long-range intermolecular interactions in organic systems.

In addition, the extensive conformational variability of organic compounds requires that the training set ensures not only quantity but also diversity [44]. Unfortunately, AIMD proves to be prohibitively expensive for generating a large number of diverse conformations. Consequently, an efficient method for collecting training sets becomes imperative, emphasizing the need for non-redundant datasets to mitigate the labeling costs associated with quantum-mechanical calculations such as density functional theory (DFT). Moreover, the impracticality of preparing the training set from high-precision quantum DFT calculations for large organic systems further complicates the labeling process. As a result, labeled datasets are typically derived from smaller systems [45]. This limitation underscores the necessity of investigating whether MLFF methods can maintain high-precision generalization when applied to larger systems.

Finally, to address the issue of instability in long-time molecular simulations, various methods have been proposed. For example, directly incorporating configurations not present in the training set can enhance the stability of the simulation [38], but it is impractical to exhaustively consider all possible configurations. Reducing the time step in MD is another strategy to decrease the incidence of simulation crashes [24, 25]. However, this approach comes at the cost of a proportional increase in simulation time. Although larger models with millions or more parameters may alleviate the issues mentioned above, they do not guarantee simulation stability [30]. Therefore, a more efficient and straightforward MLFF model is required to ensure stability during long-time MD simulations.

To address the aforementioned challenges, we propose an end-to-end, highly efficient, and broadly applicable higher-order equivariant model to effectively handle the long-range intermolecular interactions. This method exhibits excellent competitiveness in various aspects, including prediction accuracy, simulation speed, memory consumption, and MD results. Moreover, the incorporation of a bond length stretching technique significantly boosts the simulation stability of MLFF models in organic systems. In addition, an active learning approach based on committee queries is adopted to efficiently collect datasets. Notably, this study successfully extends high precision from a limited dataset comprising 901 samples with 120 atoms to larger systems encompassing ten thousand atoms. This achievement plays a pivotal role in facilitating long-time MD simulations in large-scale organic systems.

Results

Efficient and universal multiscale higher-order equivariant modeling architecture

Using the Quantum Mechanics (QM) method to compute large-scale systems remains a challenging task [46, 47]. However, certain studies focus only on specific regions within the system, such as the active sites of enzymes. Therefore, the Quantum Mechanics/Molecular Mechanics (QM/MM) hybrid method, also known as a multiscale method, is widely employed to address this challenge. The key concept of this method is that for the target regions, a high-computational-cost QM method is used, and for non-target regions, a low-computational-cost Molecular Mechanics (MM) method is employed, thereby achieving efficient computations for large-scale systems [47].

Most MLFF models are based on the locality hypothesis of atoms [48, 49]. This assumption claims that atoms only interact with their neighbors and is physically justified by the nearsightedness principle of electronic matter [50]. Furthermore, this assumption implies that interactions between atoms that are far apart are negligible. In order to take long-range intermolecular interactions into account, MLFF models must increase their receptive field.
[Fig. 1: architecture schematic — embedding block; interaction layer(s) with convolution filter, tensor product (TP), MLP, aggregation, attention, and update paths for both local and long-range messages; output block.]

Fig. 1 | The framework of the universal multiscale higher-order equivariant model. The model comprises an embedding layer, an interaction layer, and an output layer. The interaction layer may consist of multiple layers that are responsible for considering interactions between the central atom and its neighboring atoms. The yellow module is designed to efficiently capture the long-range intermolecular interactions. This module achieves efficient processing of long-range messages by reducing the channel numbers of nodes and lowering the expansion order of the direction information. After the aggregation of the long-range message, the feature dimensions of the long-range nodes and the local nodes are aligned, and then they are added together. Here, Σ denotes summation, ∥ represents feature concatenation, TP denotes tensor product, MLP stands for multilayer perceptron, and Attention indicates the attention mechanism.

In this work, inspired by the QM/MM method, we propose a universal multiscale higher-order equivariant model, in which a high-cost module is used to describe the short-range interactions related to the central atom, while a low-cost module is employed to capture the corresponding long-range interactions. This method adopts a multiscale strategy that empowers MLFF models to attain optimal precision and efficiency in handling intermolecular long-range interactions. Specifically, the potential energy of each atom is divided into short-range and long-range parts, and the total potential energy of the system is obtained by summing the contributions from all atoms. The atomic forces are the negative gradients of the total potential energy with respect to the coordinates, ensuring the energy conservation of the model [51] (a minimal code sketch of this construction is given below):

E_pot = Σ_{i∈N_atoms} (E_i^short + E_i^long),    F_i = −∇_i E_pot

As mentioned above, in terms of fitting atomic potential energy, higher-order equivariant models significantly outperform first-order equivariant models and invariant models. Therefore, in this work, we only focus on higher-order equivariant models. Various higher-order equivariant models have been proposed through the construction of different nonlinear functions, convolution filters, message functions, aggregation functions, and update functions. Examples of these models include TFN [20], Equiformer [23], Cormorant [21], MACE [27], NequIP [52], SEGNNs [53], and so on. Our universal and efficient multiscale higher-order equivariant model is proposed based on these groundworks (Fig. 1). This multiscale model achieves fast processing of long-range messages by reducing the channel dimensions of nodes through equivariant linear transformations and by lowering the expansion order of the embedded direction information. After the long-range messages are aggregated, another linear transformation is used to align the long-range and short-range features. Finally, the summation of these two features gives the entire multiscale equivariant model. We have to note that this multiscale model is general: any higher-order equivariant model can be combined with it. In this work, our multiscale higher-order equivariant model is constructed based on the MACE model. Hereafter, it will be referred to as "Our" or "Our (MS-MACE)".
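To make the energy decomposition above concrete, here is a minimal PyTorch sketch — not the authors' released code — of how per-atom short-range and long-range energy contributions can be summed and differentiated to obtain conservative forces; `short_module` and `long_module` are hypothetical stand-ins for the two branches in Fig. 1.

```python
import torch

def energy_and_forces(short_module, long_module, positions):
    """Evaluate E_pot = sum_i (E_i^short + E_i^long) and F = -grad(E_pot).

    positions: (N, 3) tensor of atomic coordinates.
    short_module / long_module: callables returning per-atom energies (N,).
    Taking forces as the exact negative gradient makes the force field
    energy-conserving by construction.
    """
    positions = positions.clone().requires_grad_(True)
    e_atoms = short_module(positions) + long_module(positions)  # (N,)
    e_pot = e_atoms.sum()
    forces = -torch.autograd.grad(e_pot, positions)[0]          # F_i = -dE_pot/dr_i
    return e_pot.detach(), forces
```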
Fig. 2 | The potential energy predictions from three different machine learning models. a The schematic diagram of two formaldehyde molecules. The shortest and farthest distances between the right molecule and the carbon atom of the left molecule are 8 Å and 9 Å, respectively. b The potential energy predictions on the test dataset. The blue dash line denotes the model with three interaction layers (n = 3) and a cutoff radius of 8 Å, the green dash line represents the model with one interaction layer (n = 1) and a cutoff radius of 9 Å, the red dash line represents our multiscale model, and the black line represents the GFN2-xTB results.

Intermolecular interaction

Increasing the number of interaction layers generally enhances the receptive field of the model [54]. However, for long-range intermolecular interactions, there may be scenarios that lack intermediary atoms, which causes an interruption in message propagation and consequently renders this strategy ineffective. On the contrary, increasing the cutoff radius of the neural network model proves to be the key to accurately capturing long-range intermolecular interactions.

For example, we consider the total potential energy of two formaldehyde molecules that are arranged in a face-to-face parallel alignment in space. Taking the carbon atom of the left molecule as the center atom, the shortest and longest distances to the right molecule are 8 Å and 9 Å, respectively, as shown in Fig. 2a. To prepare the dataset, we rotate one of the molecules around the central axis through the two carbon atoms by 1 degree at a time, and collect 360 samples. From this dataset, we select one sample every 3 degrees to include in the training dataset, while the remaining configurations serve as the test dataset. The energy and forces for each sample are calculated using the semi-empirical GFN2-xTB level of theory [55]. Three different machine learning models based on the MACE model are considered: (i) three interaction layers (n = 3) with a cutoff radius of 8 Å; (ii) one interaction layer (n = 1) with a cutoff radius of 9 Å; (iii) our multiscale model with one interaction layer (n = 1) and cutoff radii of 3 Å for short-range interactions and 9 Å for long-range interactions. To guarantee sufficient fitting capacity for all models, in the first two models and the short-range part of the third model, the node features are set to 256x0e, the direction information is embedded using a third-order expansion, the cutoff polynomial envelope is set to 60, and the batch size is set to 2. In the long-range part of the third model (our MS-MACE model), the node features are set to 32x0e with only a first-order expansion for the direction information embeddings. All other hyperparameters remain the same.

Due to the lack of intermediary atoms between these two formaldehyde molecules, the transmission of the message is interrupted. Consequently, merely increasing the number of interaction layers without enlarging the cutoff radius is insufficient to capture long-range intermolecular interactions. Therefore, according to Fig. 2(b), the first model (n = 3, rc = 8 Å) can only predict the average energy (the blue dash line). On the other hand, although the number of interaction layers is reduced to one (n = 1), the variations of the potential energy with respect to the rotation angle can be effectively captured by increasing the cutoff radius to 9 Å (the green dash line). However, a longer cutoff radius incurs significant computational cost and memory consumption, especially for large-scale systems. In our multiscale model, inspired by the QM/MM method, we consider the short-range interactions using a very short cutoff radius (rc = 3 Å) but larger channel dimensions (256x0e) and a higher-order expansion for the direction information, while the long-range intermolecular interactions are captured by a large cutoff radius (rc = 9 Å) but smaller channel dimensions (32x0e) and only a first-order expansion for the embedded direction information.
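The sketch below illustrates one plausible bookkeeping for the two cutoffs, using a brute-force O(N^2) distance matrix for clarity (a production code would use cell lists); whether the long-range branch sees all pairs within 9 Å or only those beyond 3 Å is an implementation choice, not something fixed by the text.

```python
import numpy as np

def dual_cutoff_neighbors(positions, r_short=3.0, r_long=9.0):
    """Build separate short-range and long-range neighbor pair lists.

    The few neighbors within r_short feed the expensive branch (256x0e
    features, third-order direction embedding), while the many neighbors
    out to r_long feed the cheap branch (32x0e, first-order embedding).
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)                 # exclude self-pairs
    short_pairs = np.argwhere(dist <= r_short)
    long_pairs = np.argwhere((dist > r_short) & (dist <= r_long))
    return short_pairs, long_pairs
```

Because the neighbor count grows roughly with the cube of the cutoff radius, routing the 9 Å shell through the low-channel branch is what keeps the cost of the enlarged receptive field affordable.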
According to Fig. 2(b), our model gives the most accurate predictions (red dash line). We have to note that our model not only exhibits the highest accuracy, but also achieves magnitude-level improvements in computational speed and memory efficiency. That is to say, to consider long-range intermolecular interactions, MLFF models should increase the cutoff radius rather than the number of interaction layers, the channel dimensions of the node features, or the expansion order of the embedded direction information. The accuracy and efficiency of our multiscale model will be discussed in detail below and in Fig. S1 of the Supporting Materials.

Stability of long-time simulation

Existing MLFF models tend to form excessively long or short bonds during long-time MD simulations in organic systems, leading to the collapse of the simulations [30, 38, 56]. This is due to the fact that most of the samples in the training dataset are obtained from sampling near the equilibrium state [57], and the proportion of molecular conformations with abnormal bond lengths in the dataset is very low. Consequently, the weak generalization ability of machine learning models leads to the collapse of MD simulations. However, it is challenging to directly collect conformations with extremely long or short bond lengths from MD trajectories, because the MD process would collapse instantaneously in this scenario. Therefore, we propose a data augmentation method to enhance the stability of the model during long-time MD simulations. We first collect conformations from normal MD trajectories and then use a method of stretching bond lengths to expand the training dataset.

To demonstrate this bond length stretching method, we take an organic molecule as an example, i.e., perfluorotri-n-butylamine (molecular formula C12F27N, CAS No. 86508-42-1), which is a non-conducting, thermally and chemically stable dielectric liquid that is widely used as a fully immersible electronics coolant. Molecular simulations of 300 ps are performed at 300 K, 600 K, and 800 K using AIMD simulations based on the semi-empirical GFN2-xTB level of theory under the NVT ensemble. Fig. 3a shows the distribution of C-F bond lengths at different temperatures. The results indicate that higher temperatures lead to broader distributions. However, even at 800 K, the distribution of bond lengths remains too narrow to guarantee the stability of long-time MD simulations. Inspired by this fact, the bond length stretching method is developed as follows. First, the training dataset is constructed by randomly selecting 600 samples of conformations from the trajectories at 300 K. Then, in every sample within this training dataset, we randomly select a chemical bond and multiply the length of this bond by a factor of λ, where λ is a random variable in the range [0.85, 2.0]. This range is chosen to avoid overstretching and overcompression of the bonds (a sketch of this augmentation is given below). For comparison, the bond length distribution of the dataset with stretched bonds is also added to Fig. 3a. The results indicate that the stretching method can significantly improve the bond length distributions.

To verify the effectiveness of this bond stretching method, we first combine two representative equivariant models, i.e., the MACE and NequIP models, with our multiscale model, resulting in the MS-MACE and MS-NequIP models, respectively. Then these two multiscale models are trained on datasets that are either processed or unprocessed by the bond length stretching method. According to Figs. 3b and 3c, our bond length stretching method significantly improves the prediction capabilities for potential energy and forces. As shown in Fig. 3c, without employing the bond length stretching method, these two MLFF models would even yield unphysical predictions of forces for extremely short or long bond lengths. Furthermore, 150 conformations are randomly sampled from the MD trajectories at 800 K to serve as the initial configurations. Then 50000 steps of Langevin dynamics simulations based on the above MLFF models are performed at 800 K with a time step of 3 fs. The MD simulation stability of the MLFF models can be analyzed by counting the number of frames in which collapse occurs (the criterion for collapse is that there exists a bond length deviating by more than 5 Å from its equilibrium state) [30, 38].
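A minimal sketch of the bond length stretching augmentation follows, assuming each sample provides coordinates and a bond list; displacing only one endpoint along the bond axis is the simplest way to realize the stretch, though the original implementation may move atoms differently.

```python
import numpy as np

def stretch_random_bond(positions, bonds, lam_range=(0.85, 2.0), rng=None):
    """Return a copy of `positions` with one random bond scaled by lambda.

    positions: (N, 3) array; bonds: sequence of (i, j) atom-index pairs.
    lambda is drawn uniformly from [0.85, 2.0], the range chosen in the
    text to avoid overstretching or overcompressing the bond.
    """
    rng = rng or np.random.default_rng()
    i, j = bonds[rng.integers(len(bonds))]
    lam = rng.uniform(*lam_range)
    new_positions = positions.copy()
    # Move atom j along the bond vector so that |r_ij| -> lambda * |r_ij|.
    new_positions[j] = positions[i] + lam * (positions[j] - positions[i])
    return new_positions
```

Each augmented conformation is then relabeled with GFN2-xTB, so the dataset pairs abnormal geometries with physically meaningful energies and forces.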

Fig. 3 |Long-time molecular simulation stability. a, The violin plots of the C-F bond length distributions for
perfluorotri-n-butylamine in the training datasets are collected using GFN2-xTB level of theory at different temperatures.
In addition, the bond length distribution of the training dataset for 300 K, improved by the bond length stretching method,
is also presented (the red shape). b and c show the average absolute errors of energy and forces for different machine learn-
ing models on the test dataset of various C-F bond lengths. The plus symbol denotes the MS-NequIP model, while the
dot symbol denotes the MS-MACE model. The blue color represents the original training dataset derived from the 300 K
trajectories (hereafter, this training dataset is referred to as “300K”), while the red color represents the training dataset
improved by the bond length stretching method (hereafter, this training dataset is referred to as “300K-Stretch”). The
black line indicates the GFN2-xTB result and the blue dash line is the zero line. d, The violin plots for the stability of
MD simulations. In e and f, the curves describe the C-F bond lengths change in MD iterations at 600 K with initial bond
lengths of 1.9 Å and 2.1 Å, respectively. The blue and green lines represent the MS-NequIP model trained from the “300K”
dataset and the “300K-Stretch” dataset, respectively. Similarly, the yellow and red lines represent the MS-MACE model
trained from the “300K” and “300K-Stretch” datasets, respectively.

According to Fig. 3d, without employing the bond length stretching method, the stability of the MS-MACE model is superior to that of the MS-NequIP model, with a success rate of 88.67% for the former and only 2.0% for the latter, further emphasizing the excellent simulation stability of the MACE framework [29]. With the bond length stretching method employed, the simulation stability of both models is greatly enhanced, with a success rate of 100%. This suggests that the bond length stretching method exhibits high generality and effectiveness.

In addition, according to the inset of Fig. 3c, without employing the bond length stretching method, the MS-NequIP and MS-MACE models give unphysical predictions (i.e., repulsive forces) when the bond lengths are larger than 1.9 Å and 2.0 Å, respectively. This seems to suggest that the threshold bond length for unphysical predictions determines the stability of long-time MD simulations. Therefore, we infer that the simulation stability of an MLFF model is highly correlated with its predictive capability at extreme bond lengths. To further examine the simulation stability of each model, we investigate the change of the C-F bond length over the simulation time in

Fig. 4 |Active learning workflow and results. a The schematic diagram of active learning techniques within the
framework of the multiscale higher-order equivariant model combined with the bond length stretching method. b, c, d and
e respectively show the distribution of the maximum force variance (σfmax ) in the first, second, third, and tenth rounds of
the data collecting process (or exploring process). Different colors represent exploration at different temperatures, where
the blue, green, yellow, and red lines correspond to 300 K, 500 K, 700 K, and 900 K, respectively.

Langevin dynamics simulations with a time step of 1 fs at 600 K, where the initial bond lengths are 1.9 Å (Fig. 3e) and 2.1 Å (Fig. 3f), respectively. According to Fig. 3c, in the case where the bond length is 1.9 Å (larger than the equilibrium bond length), the MS-MACE model predicts a negative force (i.e., an elastic restoring force). Therefore, the elongated chemical bond can be restored to its equilibrium state (see the yellow line in Fig. 3e). However, in this case, the MS-NequIP model gives an unphysical prediction (a repulsive force) (Fig. 3c), resulting in an abnormal elongation of the C-F bond during the MD simulations (see the blue line in Fig. 3e), and ultimately leading to the collapse of the simulations. At a bond length of 2.1 Å, without employing the bond length stretching method, both models give unphysical predictions (i.e., repulsive forces), resulting in incorrect elongations of the C-F bonds (see the yellow and blue lines in Fig. 3f). On the other hand, with our bond length stretching method applied, both models make correct predictions for the cases where the initial bond lengths are 1.9 Å and 2.1 Å (see the green and red curves in Figs. 3e and 3f). In summary, the bond length stretching method proves to be an effective and universally applicable method to enhance the stability of long-time MD simulations.

Efficient collection of training datasets

Due to the limited generalization capability of neural networks, the performance of an MLFF model is heavily dependent on the quality of the training dataset. Moreover, labeling the dataset requires quantum-mechanical calculations, which involve a high computational cost. Therefore, a key task prior to training an MLFF model is to construct a dataset that is as comprehensive and non-redundant as possible [34, 58]. Active learning techniques have emerged as a prevalent strategy for collecting datasets, with committee querying methods being particularly prominent in the field of MLFF [45, 49, 59].

The fundamental idea of an active learning technique is to use the disagreement of the committee to quantify the generalization error [60]. Specifically, if a training dataset is of sufficiently high quality, an MLFF model, operating under various random seed conditions, will yield accurate predictions with low variance for normal samples
Table 1 The percentage of accurate (σ_f^max < σ_low), effective candidate (σ_low ≤ σ_f^max ≤ σ_high), and failed (σ_f^max > σ_high) samples in the i-th iteration.

i          1     2     3     4     5     6     7     8     9     10
Accurate   73.20 97.42 98.74 98.68 98.15 99.49 99.46 99.51 99.67 99.68
Candidate  26.72 2.58  1.25  1.32  1.85  0.51  0.54  0.49  0.33  0.32
Failed     0.08  0.00  0.01  0.00  0.00  0.00  0.00  0.00  0.00  0.00

that are not present in the training set. Otherwise, predictions with high variance indicate that the training dataset is incomplete and requires the addition of more effective samples for further improvement. Typically, the maximum variance in the force predictions made by the MLFF model under different random seeds is used as a criterion to determine whether a sample should be added to the training set. This maximum variance is calculated based on the following formula:

σ_f^max = max_i √⟨ ∥F_i(R_t) − ⟨F_i(R_t)⟩∥² ⟩    (1)

where i is the index of the atom in the candidate sample. If the maximum variance satisfies the inequality σ_low ≤ σ_f^max ≤ σ_high, then the sample will be added to the training dataset, where the lower and upper limits are set to σ_low = 100 meV/Å and σ_high = 300 meV/Å, respectively [45]. If the maximum variance exceeds the upper limit, it suggests that the corresponding sample may be abnormal; using it as labeled data could potentially be detrimental to the training effectiveness of the MLFF model. An extremely low maximum variance (< σ_low) indicates that the existing training dataset is sufficiently comprehensive for the MLFF model to make correct predictions for the corresponding sample; that is, this sample does not need to be added to the training dataset.

In this section, similar to the previous section, we take perfluorotri-n-butylamine molecules as an example to demonstrate the powerful capabilities of the active learning technique for data collection within the framework of our multiscale higher-order equivariant model. To construct a dataset that is as comprehensive and non-redundant as possible, the workflow of the active learning method involves a series of continuous iterations. First, an annealing simulation is conducted using the general AMBER force field (GAFF) [61] on a system consisting of 200 perfluorotri-n-butylamine molecules in the NPT ensemble. During the annealing simulation, the system cools down from 800 K to 280 K in 60 ns, and 300 frames of the trajectory are randomly extracted with a minimum sampling interval of 10 ps. In each frame, the three molecules closest to the center of the simulation box are selected as a sample (120 atoms) and added to the initial dataset. To enhance the simulation stability during data collection, the initial dataset is improved using the bond length stretching method proposed in the previous section. Then the improved dataset is labeled using the semi-empirical GFN2-xTB level of theory. Second, four MLFF models based on our multiscale higher-order equivariant model (i.e., MS-MACE) with different random seeds and a batch size of 64 are trained on the initial dataset (300 samples). Then high-precision MD simulations are conducted in the NVT ensemble at 300 K, 500 K, 700 K, and 900 K, respectively, using the four trained MLFF models with a step length of 1 fs and a total simulation time of 45 ps. In the meantime, a certain number (10, 20, 40, and 50 for 300 K, 500 K, 700 K, and 900 K, respectively) of new samples are randomly selected as candidate samples from the MD trajectories. Finally, the candidate samples are assessed using Eq. (1) (a sketch of this committee criterion is given below), and the effective candidates are labeled and added to the training dataset. The above process is repeated until the proportion of effective candidates generated from the pool of candidate samples converges to a value of less than 0.5%.

Fig. 4b-e depict the distribution of σ_f^max at different temperatures in the 1st, 2nd, 3rd, and 10th iterations. Table 1 presents the percentages of accurate (σ_f^max < σ_low), effective candidate (σ_low ≤ σ_f^max ≤ σ_high), and failed (σ_f^max > σ_high) samples in each iteration. In the 1st iteration, 26.80% of the samples exceed the lower limit of the force variance (i.e., σ_f^max ≥ σ_low), and the density distribution curves of σ_f^max at different temperatures are relatively broad (Fig. 4b).
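To illustrate the selection criterion, here is a sketch of Eq. (1) and the threshold test, assuming the committee's force predictions for one candidate are stacked into a single array:

```python
import numpy as np

SIGMA_LOW, SIGMA_HIGH = 100.0, 300.0      # meV/Angstrom, as set in the text

def max_force_std(committee_forces):
    """Eq. (1): committee_forces has shape (M, N, 3) for M models, N atoms."""
    mean_f = committee_forces.mean(axis=0)                   # <F_i(R_t)>
    dev2 = ((committee_forces - mean_f) ** 2).sum(axis=-1)   # ||F_i - <F_i>||^2
    per_atom = np.sqrt(dev2.mean(axis=0))                    # committee average, then sqrt
    return per_atom.max()                                    # max over atoms i

def classify_candidate(committee_forces):
    s = max_force_std(committee_forces)
    if s < SIGMA_LOW:
        return "accurate"      # already covered by the training set
    if s <= SIGMA_HIGH:
        return "candidate"     # informative: label with GFN2-xTB and add
    return "failed"            # likely abnormal; discard
```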

Fig. 5 |a and b are the changes of the average absolute errors of the energy and forces per atom with the increase of the
atom numbers, respectively. c and d are the changes of time cost and memory consumption of simulations with the increase
of atom numbers. The straight lines are obtained by linear fittings to the data points using the least squares method. The
blue, green, yellow, and black lines are from the Allegro, NequIP, MACE, and our MS-MACE models, respectively.

In the first iteration, 26.72% of the candidates become effective candidates and are added to the initial training dataset. After just one iteration, the percentage of accurate samples increases to 97.42%, and the density distribution of σ_f^max is primarily concentrated in the range between 0 and σ_low (Fig. 4c). The proportion of effective candidate samples drops to 2.58%. After 10 iterations, the percentage of accurate samples approaches 99.7%, and only 0.32% of the samples need to be added to the dataset. Referring to pertinent literature [45, 59], we conclude that the iterations of data collection using the active learning technique have converged. In the end, we collect a training dataset consisting of 901 samples.

Prediction accuracy and molecular simulation efficiency

In this section, we compare our multiscale model (referred to as the "MS-MACE" model) with three other representative equivariant models, including MACE, NequIP, and Allegro, in terms of prediction accuracy, simulation speed, and GPU memory consumption. It is important to note that achieving complete consistency in hyperparameters for all models is impractical due to differences in model architecture. For example, the MACE and NequIP models utilize atoms for message passing, while the Allegro model employs edges for message passing [28]. However, to ensure the validity of the comparisons, the shared parameters of the models are set identically. Detailed information on the parameter settings can be found in the Supporting Materials and the Method section.

Fig. 6 |Comparisons with AIMD simulations. a is the RDF of F-F atoms. b is the angle distributions of F-C-F
bonds, and c is the dihedral distributions of F-C-C-F. The blue lines are the results from AIMD simulations and the red
dotted lines are the results from our multiscale higher-order equivariant model.

In addition, it merits emphasis that comparisons with invariant models have not been conducted, because equivariant models typically exhibit superior performance over invariant models in the construction of MLFFs.

In the previous section, we obtained a dataset comprising a total of 901 samples, each containing three perfluorotri-n-butylamine molecules. This dataset is randomly divided into training and validation sets, comprising 801 and 100 samples, respectively. Subsequently, we construct seven test sets, each comprising 100 samples. Within a single test set, each sample contains an equal number of molecules with different conformations. Across the first to the seventh test sets, each sample contains 20, 40, 60, 80, 100, 120, and 140 molecules, respectively. In order to obtain these seven sets, we conduct all-atom MD simulations on a system with 1000 molecules in the NPT ensemble at 600 K using the GAFF force field. All of the samples in the seven test sets are randomly extracted from the MD trajectories and then labeled using the semi-empirical GFN2-xTB level of theory. In addition, to reduce the impact of randomness, the average prediction errors for each model are obtained from the model with five different random number seeds.

Fig. 5 shows the results of the comparisons among the different equivariant models. All models exhibit high precision in predicting energies and forces on the test sets. Moreover, the time cost and GPU memory consumption of the simulations change linearly with the increase of the atom number in the system. This could be attributed to the fact that all models are based on a local assumption [48, 49, 62].

Compared to the other models, our MS-MACE model achieves the best performance across various metrics, including accuracy, simulation speed, and GPU memory consumption. Specifically, according to Figs. 5a and 5b, our MS-MACE model gives the most accurate predictions for both energy and force. In particular, the average error in force approaches an asymptotic value of 10.5 meV/Å, indicating that even in a large-scale system comprising several hundred thousand atoms, the force error remains around 10.5 meV/Å. That is to say, although each sample in the training dataset only contains 120 atoms, our multiscale MLFF model can successfully extend high precision to large-scale systems with hundreds of thousands of atoms. In addition, according to Figs. 5c and 5d, our MS-MACE model achieves magnitude-level improvements in computational speed and memory efficiency. Furthermore, due to the linear scaling of the time cost and memory consumption, it is easy to speculate that when the number of atoms in the system exceeds 100000, the computational speed and memory efficiency will see orders-of-magnitude improvements. In summary, our MS-MACE model demonstrates exceptional competitiveness in energy and force prediction, simulation speed, and GPU memory consumption.

Compared with AIMD simulations

In this section, by comparing with the results from AIMD simulations, we validate the capability of
our multiscale higher-order equivariant model. We conduct AIMD simulations in the NVT ensemble at 600 K for 500 ps. The system comprises five perfluorotri-n-butylamine molecules, where periodic boundary conditions are used in all three directions. Under the same conditions, MD simulations based on the MS-MACE model are performed. To reduce the influence of randomness, all simulations are repeated three times independently. Fig. 6 shows the comparisons of the radial distribution functions (RDFs) for F-F atoms, the angle distributions for F-C-F bonds, and the dihedral distributions for F-C-C-F bonds. The results indicate that the MS-MACE model can perfectly reproduce the results from AIMD simulations, highlighting the accuracy of our multiscale higher-order equivariant model.

Discussion

Although the polarity of the perfluorotri-n-butylamine molecule is extremely weak, a cutoff radius of 8 Å is still required to accurately capture the intermolecular long-range interactions. However, a large cutoff radius is a disaster for the MLFF model. According to Figs. 5c and 5d, the cost of simulation time and memory consumption is enormous for a general MLFF model, which is unaffordable for simulating large-scale systems. Moreover, the cutoff radius in systems with dipole-dipole or Coulomb interactions will be even larger. Therefore, a multiscale higher-order equivariant model with significant advantages in prediction accuracy, simulation speed, and memory consumption is necessary for large-scale organic systems.

In addition, long-time simulation stability is one of the most important and challenging issues in the development of an MLFF model. To address this issue, we proposed a bond length stretching method, in which we randomly select chemical bonds from the training dataset and randomly stretch or compress them within a reasonable range. This method achieves long-time simulation stability comparable to AIMD simulations with almost no additional computational cost. Moreover, this bond length stretching method can effectively improve the long-time simulation stability of all MLFF models.

Finally, a training set that is as comprehensive and non-redundant as possible is also extremely important for the MLFF model, as it directly determines the prediction accuracy and training efficiency of an MLFF model. The active learning technique provides a method for constructing a comprehensive training set; however, active learning is merely a concept, and its effectiveness depends on the corresponding MLFF model. In this work, the active learning concept is combined with the bond length stretching method within the framework of our multiscale higher-order equivariant model. The results show that this method can efficiently and stably construct a comprehensive and low-redundancy training set. For example, we built a training set containing only 901 samples, each with merely 120 atoms. However, based on this dataset, our multiscale model can achieve high-precision, high-efficiency, and low-memory-consumption long-time stable simulations of systems with hundreds of thousands of atoms.

In summary, in this work we proposed a universal multiscale higher-order equivariant framework for constructing an MLFF model. Moreover, within this framework, a bond length stretching method and an active learning workflow have been designed to realize the long-time stability of MD simulations and the efficient collection of a comprehensive, low-redundancy training set. By employing our multiscale model, one can achieve high-precision, high-speed, and low-GPU-memory-consumption long-time stable simulations of large-scale organic systems. In our subsequent work, we will explore the application of this multiscale model in systems with polar or charged molecules.

Method

Equivariant model

Incorporating data symmetry in machine learning models can improve the efficiency of data collection and the generalization capability of the models [63]. For atomic systems, if the coordinates of the system rotate, quantities like forces and dipole moments should rotate accordingly. More strictly speaking, if the function f : X → Y is equivariant under the action of a group of transformations G, then the following equation holds:

f(D_X[g] x) = D_Y[g] f(x).    (2)

In Eq. (2), X and Y are two vector spaces; x, y, and g are elements in X, Y, and G, respectively; and D_X[g] and D_Y[g] are the transformation matrices parametrized by g in X and Y.
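Eq. (2) can be verified numerically for any concrete f. The toy example below, unrelated to any specific force-field architecture, maps a point cloud to per-point vectors; for a rotation g, both D_X[g] and D_Y[g] are the same 3×3 rotation matrix.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def f(x):
    """Toy equivariant map: scale each centered point by its squared norm."""
    centered = x - x.mean(axis=0)
    return centered * (centered ** 2).sum(axis=1, keepdims=True)

x = np.random.randn(10, 3)                 # a random "molecule"
R = Rotation.random().as_matrix()          # group element g as a 3x3 matrix
lhs = f(x @ R.T)                           # f(D_X[g] x)
rhs = f(x) @ R.T                           # D_Y[g] f(x)
assert np.allclose(lhs, rhs)               # Eq. (2) holds for this f
```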
A natural and effective method to ensure the equivariance of the transformation is to impose constraints on the transformation matrices so that they respect the symmetric properties of the data [26, 64].

Han [65] categorizes equivariant models into three types: vector-based, Lie-group-based, and irreducible-representation-based. Among these three types of models, the irreducible-representation-based approach, which takes advantage of the transformation properties of the spherical harmonics Y_l^m, exhibits higher-order equivariant capabilities and excels in various force field tasks [30]. Currently, this method can be unified within the e3nn framework [32].

Multi-scale higher-order equivariant model

Taking the MACE model as an example, we illustrate the construction of the multi-scale higher-order equivariant model shown in Fig. 1. First, the atomic numbers are initially mapped to a one-dimensional vector δ_{z_i} through one-hot encoding. Subsequently, a linear transformation initializes the node as h^(0)_{i,c00}, where the index "00" indicates that the current node contains only scalar information. In the subsequent network framework, node features are denoted by h̄^(t,short/long)_{i,cl₂m₂}, where i is the atom index, l₂m₂ labels the specific spherical harmonic features, c denotes the channel count, t indicates the interaction layer, and "short/long" is a shorthand to avoid repetition, because the operations for short-range and long-range interactions in the interaction layer are similar and are implemented through Eqs. (4) to (8). The key difference lies in the fact that the number of neighboring atoms handled by the long-range part is larger. Therefore, the number of channels c and the order l for long-range interactions should be smaller than those for short-range interactions to reduce the computational complexity. Finally, short-range and long-range information is summed using Eq. (9).

Eq. (4) represents a linear transformation that satisfies the equivariance requirements, which only mixes features of the same order; therefore, it is necessary to specify the current l-equivariant information. Eq. (5) denotes the radial embedding function, using Bessel basis functions and polynomial smooth truncation functions [66]. Here, n represents the embedding dimension, and the truncation radius for long-range interactions should be greater than that for short-range interactions. Eq. (6) expands the radial information to a specified dimension using a learnable multilayer perceptron (MLP), which enters the tensor product in Eq. (7).

h^(0)_{i,c00} = Σ_z W_{cz} δ_{z_i}    (3)

h̄^(t,short/long)_{i,cl₂m₂} = Σ_{c̃} W^(t)_{cc̃l₂} h^(t)_{i,c̃l₂m₂}    (4)

j^{short/long}_n(r_ij) = √(2 / r^{short/long}_cut) · sin(nπ r_ij / r^{short/long}_cut) / r_ij × f^{short/long}_cut(r_ij)    (5)

R^(t,short/long)_{cη₁l₁l₂l₃}(r_ij) = MLP{ j^{short/long}_n(r_ij) }    (6)

φ^(t,short/long)_{ij,cη₁l₃m₃} = Σ_{l₁l₂m₁m₂} C^{l₃m₃}_{η₁,l₁m₁l₂m₂} R^(t,short/long)_{cη₁l₁l₂l₃}(r_ij) Y^{m₁}_{l₁}(r̂_ij) h̄^(t,short/long)_{j,cl₂m₂}    (7)

A^(t,short/long)_{i,cl₃m₃} = Σ_{k̃,η₁} W^(t)_{cc̃η₁l₃} Σ_{j∈N(i)} φ^(t,short/long)_{ij,k̃η₁l₃m₃}    (8)

A^(t)_{i,cl₃m₃} = ( A^(t,short)_{i,cl₃m₃} + A^(t,long)_{i,cl₃m₃} ) / 2    (9)

B^(t),ν_{i,η_ν cLM} = Σ_{lm} C^{LM}_{η_ν lm} Π^{ν}_{ξ=1} A^(t)_{i,cl_ξ m_ξ}    (10)

m^(t)_{i,cLM} = Σ_ν Σ_{η_ν} W^(t),ν_{z_i η_ν cL} B^(t),ν_{i,η_ν cLM}    (11)

h^(t+1)_{i,cLM} = Σ_{c̃} W^(t)_{cL,c̃} m^(t)_{i,c̃LM} + Σ_{c̃} W^(t)_{z_i cL,c̃} h^(t)_{i,c̃LM}    (12)

E_i = Σ_t Σ_c W^(t)_c h^(t)_{i,cLM}    (13)

F = −∇ Σ_i E_i    (14)

Eq. (7) is the key part of the equivariant network, merging neighboring atomic features h̄^(t,short/long)_{j,cl₂m₂} with the radial information R^(t,short/long)_{cη₁l₁l₂l₃}(r_ij) and the directional information Y^{m₁}_{l₁} through convolutional filtering operations. Eq. (8) involves pooling operations and linear transformations. It is worth noting that the handling of the long-range information module may require zero-padding to align with the short-range node information, facilitating the final summation in Eq. (9). A numerical sketch of the radial embedding in Eqs. (5) and (6) follows below.
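The envelope in this sketch uses the standard polynomial cutoff form with p = 6 (matching the hyperparameter settings below), although the exact envelope coefficients are a conventional choice rather than something spelled out in the text.

```python
import numpy as np

def polynomial_envelope(u, p=6):
    """Smooth cutoff f_cut(u), u = r/r_cut: equals 1 near 0 and vanishes at
    u = 1 with p smooth derivatives (a common MLFF choice, assumed here)."""
    env = (1.0
           - (p + 1) * (p + 2) / 2.0 * u ** p
           + p * (p + 2) * u ** (p + 1)
           - p * (p + 1) / 2.0 * u ** (p + 2))
    return np.where(u < 1.0, env, 0.0)

def bessel_basis(r, r_cut, n_max=8):
    """Eq. (5): sqrt(2/r_cut) * sin(n*pi*r/r_cut)/r times the envelope.
    n_max = 8 matches the radial embedding dimension used in this work."""
    r = np.atleast_1d(r)[:, None]                   # (batch, 1), r > 0 assumed
    n = np.arange(1, n_max + 1)[None, :]            # (1, n_max)
    radial = np.sqrt(2.0 / r_cut) * np.sin(n * np.pi * r / r_cut) / r
    return radial * polynomial_envelope(r / r_cut)  # feeds the MLP of Eq. (6)
```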
Fig. 7 | Equation calculation flow diagram. A single interaction layer calculation is completed from h^(t)_{i,cLM} to h^(t+1)_{i,cLM}. The final energy is obtained by summing the linearly read-out results from each layer's h^(t+1)_{i,cLM}. The forces come from the negative gradient of the final energy with respect to the coordinates.

Eqs. (10) and (11) outline the process of efficiently calculating many-body interactions in the MACE framework; details can be found in the literature [27, 29]. Eq. (12) involves residual connections [67], designed to update the node features. Fig. 7 illustrates the calculation process.

Hyperparameter settings

All models were trained on an NVIDIA RTX 4090 GPU in single-GPU training using float32 precision. Unless explicitly stated, the default hyperparameter settings for all models in this paper are as follows: the embedding dimension of the radial basis function is 8; smooth truncation is set to be a polynomial envelope function with p = 6; the radial MLP is [64, 64, 64]; the dimension of the readout layer is 16; node features are represented as 64x0e; the number of interaction layers is 2; directional information is expanded to the 3rd order; and the cutoff radius is set to 8 Å.

For multiscale higher-order equivariant models, the hyperparameter settings for the short-range node features, the expansion order of the short-range directional information, and the number of interaction layers remain the same as described above. However, the long-range node features are represented as 8x0e and the long-range directional information is expanded to the first order. The cutoff radii for short-range and long-range interactions are set to 3 Å and 8 Å, respectively. For the Allegro model, env_embed_multiplicity is set to 8, and latent_mlp_latent_dimensions is set to 256.

The training hyperparameters include an initial learning rate of 0.01 and a ReduceLROnPlateau scheduler, which reduces the learning rate when the validation loss does not improve over a certain number of epochs. To update the evaluation and final model weights on the validation dataset, an exponential moving average with a weight of 0.99 is applied. The optimizer is Adam, and the total number of training epochs is set to 4000.
Energy is normalized by the moving average of the potential energy. The loss function is as follows:

L = (λ_E/B) Σ_{b=1}^{B} ((E_b − Ê_b)/N_b)² + (λ_F/3B) Σ_{b=1}^{B} (1/N_b) Σ_{i_b,α=1}^{N_b,3} ( −∂E_b/∂r_{i_b,α} − F̂_{i_b,α} )²    (15)
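In code, Eq. (15) is a batch-averaged, size-normalized energy error plus a mean-squared force error. This sketch assumes predictions are already gathered per configuration and that F_pred comes from the negative gradient of the model energy; the weights are passed in explicitly because, as described below, the text sets the force weight to 1000 and the energy weight to the number of atoms of the system.

```python
import torch

def loss_fn(E_pred, E_ref, F_pred, F_ref, n_atoms, lam_E, lam_F):
    """Eq. (15). E_*: (B,) energies; n_atoms: (B,) atom counts;
    F_*: lists of (N_b, 3) force tensors, one per configuration."""
    energy_term = lam_E * (((E_pred - E_ref) / n_atoms) ** 2).mean()
    # Mean over the 3*N_b force components of each configuration, then over the batch.
    force_term = lam_F * torch.stack(
        [((fp - fr) ** 2).mean() for fp, fr in zip(F_pred, F_ref)]
    ).mean()
    return energy_term + force_term
```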
The initial weight values of the force and energy follow common settings [27, 29], where the force weight is set to 1000, and the energy weight is set to the number of atoms of the system. For the MACE and MS-MACE models, Stochastic Weight Averaging (SWA) [68, 69] is enabled at 75% of the total iteration count. After initiating SWA, as in [29], the energy and force weights of the loss function are reset to 1000 and 10, respectively.

DFT settings

Since this paper requires calculations with thousands of atoms and the need to perform AIMD, the energies and forces of all datasets are calculated using a method with lower computational cost but good accuracy, i.e., the semi-empirical GFN2-xTB level of theory [55]. This choice is made to balance computational cost and accuracy. The convergence level follows the default settings of the xtb software [70].

It is essential to note that in the process of data collection, a pretrained MLFF based on the multiscale higher-order equivariant model is used to perform high-precision MD, which is significantly faster than AIMD simulation. Therefore, the data sampling stage is not computationally intensive. In practice, the primary cost of generating the dataset still lies in labeling with DFT. More accurate quantum-mechanical calculations can be employed based on specific requirements.

Molecular dynamics settings

Periodic AIMD simulations based on the semi-empirical GFN2-xTB level of theory are implemented using the DFTB+ software [71]. The MD simulations based on the MLFF models are executed using version 1.0 of the OpenMM-Torch plugin and version 8.0.0 of the OpenMM software.

Test of simulation speed and memory consumption

Before benchmarking, it is essential to preheat the GPU using partial data. The timer employs torch.cuda.Event(), and the dataset for each atom count contains at least 20 samples. Each sample is run 50 times, and the average value is taken as the final result. Similarly, torch.cuda.max_memory_allocated() from the official torch API is used to record the peak GPU memory consumption. Before each recording round, torch.cuda.reset_peak_memory_stats() and torch.cuda.empty_cache() are employed to reset the counters and release excess cache. The GPU of the test platform is a 40 GB A100.
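This protocol maps directly onto standard PyTorch calls; a sketch, with the model and input left as placeholders:

```python
import torch

def benchmark(model, sample, n_warmup=10, n_runs=50):
    """Average GPU latency (ms) and peak memory (bytes) for one sample."""
    for _ in range(n_warmup):              # preheat the GPU before timing
        model(sample)
    torch.cuda.reset_peak_memory_stats()   # reset the peak-memory counter
    torch.cuda.empty_cache()               # release cached blocks
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(n_runs):
        model(sample)
    end.record()
    torch.cuda.synchronize()               # wait until all kernels finish
    return start.elapsed_time(end) / n_runs, torch.cuda.max_memory_allocated()
```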
Code availability

The code and sample scripts will be released after review.

Data availability

The dataset will be released after review.

Competing interests

The authors declare no competing interests.

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2021YFB3803200) and the National Natural Science Foundation of China under Grant No. 22273112.

References

[1] Zhang, J., Wang, X., Zhu, Y., Shi, T., Tang, Z., Li, M., Liao, G.: Molecular dynamics simulation of the melting behavior of copper nanorod. Computational Materials Science 143, 248–254 (2018)
[2] Zhong, Z., Du, G., Wang, Y., Jiang, J.: Phase behaviors of dialkyldimethylammonium bromide bilayers. Langmuir 39(31), 11081–11089 (2023)

[3] Ma, L., Zhong, Z., Hu, J., Qing, L., Jiang, J.: Long-lived weak ion pairs in ionic liquids: An insight from all-atom molecular dynamics simulations. The Journal of Physical Chemistry B (2023)

[4] Perilla, J.R., Schulten, K.: Physical properties of the HIV-1 capsid from all-atom molecular dynamics simulations. Nature Communications 8(1), 15959 (2017)

[5] Mouvet, F., Villard, J., Bolnykh, V., Rothlisberger, U.: Recent advances in first-principles based molecular dynamics. Accounts of Chemical Research 55(3), 221–230 (2022)

[6] Behler, J., Parrinello, M.: Generalized neural-network representation of high-dimensional potential-energy surfaces. Physical Review Letters 98(14), 146401 (2007)

[7] Chmiela, S., Tkatchenko, A., Sauceda, H.E., Poltavsky, I., Schütt, K.T., Müller, K.-R.: Machine learning of accurate energy-conserving molecular force fields. Science Advances 3(5), 1603015 (2017) https://doi.org/10.1126/sciadv.1603015

[8] Chmiela, S., Vassilev-Galindo, V., Unke, O.T., Kabylda, A., Sauceda, H.E., Tkatchenko, A., Müller, K.-R.: Accurate global machine learning force fields for molecules with hundreds of atoms. Science Advances 9(2), 0873 (2023)

[9] Tran, R., Lan, J., Shuaibi, M., Wood, B.M., Goyal, S., Das, A., Heras-Domingo, J., Kolluru, A., Rizvi, A., Shoghi, N., et al.: The Open Catalyst 2022 (OC22) dataset and challenges for oxide electrocatalysts. ACS Catalysis 13(5), 3066–3084 (2023)

[10] Schütt, K., Kindermans, P.-J., Sauceda Felix, H.E., Chmiela, S., Tkatchenko, A., Müller, K.-R.: SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. Advances in Neural Information Processing Systems 30 (2017)

[11] Wang, H., Zhang, L., Han, J., Weinan, E.: DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics. Computer Physics Communications 228, 178–184 (2018)

[12] Schütt, K.T., Arbabzadah, F., Chmiela, S., Müller, K.R., Tkatchenko, A.: Quantum-chemical insights from deep tensor neural networks. Nature Communications 8(1), 13890 (2017)

[13] Unke, O.T., Meuwly, M.: PhysNet: A neural network for predicting energies, forces, dipole moments, and partial charges. Journal of Chemical Theory and Computation 15(6), 3678–3693 (2019)

[14] Wang, L., Liu, Y., Lin, Y., Liu, H., Ji, S.: ComENet: Towards complete and efficient message passing for 3D molecular graphs. Advances in Neural Information Processing Systems 35, 650–664 (2022)

[15] Liu, Y., Wang, L., Liu, M., Lin, Y., Zhang, X., Oztekin, B., Ji, S.: Spherical message passing for 3D molecular graphs. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=givsRXsOt9r

[16] Satorras, V.G., Hoogeboom, E., Welling, M.: E(n) equivariant graph neural networks. In: International Conference on Machine Learning, pp. 9323–9332 (2021). PMLR

[17] Schütt, K., Unke, O., Gastegger, M.: Equivariant message passing for the prediction of tensorial properties and molecular spectra. In: International Conference on Machine Learning, pp. 9377–9388 (2021). PMLR

[18] Jing, B., Eismann, S., Suriana, P., Townshend, R.J.L., Dror, R.: Learning from protein structure with geometric vector perceptrons. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=1YLJDvSx6J4
[19] Le, T., Noé, F., Clevert, D.-A.: Equivari- S., Mohamed, S., Agarwal, A., Belgrave, D.,
ant Graph Attention Networks for Molecular Cho, K., Oh, A. (eds.) Advances in Neural
Property Prediction (2022) Information Processing Systems, vol. 35, pp.
11423–11436 (2022)
[20] Thomas, N., Smidt, T., Kearnes, S., Yang,
L., Li, L., Kohlhoff, K., Riley, P.: Tensor [28] Zhang, X., Wang, L., Helwig, J., Luo, Y., Fu,
field networks: Rotation- and translation- C., Xie, Y., Liu, M., Lin, Y., Xu, Z., Yan,
equivariant neural networks for 3D point K., et al.: Artificial intelligence for science in
clouds (2018) quantum, atomistic, and continuum systems.
arXiv preprint arXiv:2307.08423 (2023)
[21] Anderson, B., Hy, T.S., Kondor, R.: Cor-
morant: Covariant molecular neural net- [29] Kovács, D.P., Batatia, I., Arany, E.S.,
works. Advances in neural information pro- Csányi, G.: Evaluation of the MACE force
cessing systems 32 (2019) field architecture: From medicinal chemistry
to materials science. The Journal of Chem-
[22] Brandstetter, J., Hesselink, R., Pol, E., ical Physics 159(4), 044118 (2023) https://
Bekkers, E.J., Welling, M.: Geometric and doi.org/10.1063/5.0155322
physical quantities improve e(3) equivariant
message passing. In: International Conference [30] Fu, X., Wu, Z., Wang, W., Xie, T., Keten, S.,
on Learning Representations (2022). https: Gomez-Bombarelli, R., Jaakkola, T.: Forces
//openreview.net/forum?id= xwr8gOBeV1 are not enough: Benchmark and critical
evaluation for machine learning force fields
[23] Liao, Y.-L., Smidt, T.: Equiformer: Equiv- with molecular simulations. arXiv preprint
ariant graph attention transformer for 3d arXiv:2210.07237 (2022)
atomistic graphs. In: The Eleventh Interna-
tional Conference on Learning Representa- [31] Joshi, C.K., Bodnar, C., Mathis, S.V., Cohen,
tions (2023). https://openreview.net/forum? T., Lio, P.: On the expressive power of geo-
id=KwmPfARgOTD metric graph neural networks. arXiv preprint
arXiv:2301.09308 (2023)
[24] Batzner, S., Musaelian, A., Sun, L., Geiger,
M., Mailoa, J.P., Kornbluth, M., Molinari, N., [32] Geiger, M., Smidt, T.: e3nn: Euclidean neural
Smidt, T.E., Kozinsky, B.: E (3)-equivariant networks. arXiv preprint arXiv:2207.09453
graph neural networks for data-efficient and (2022)
accurate interatomic potentials. Nature com-
munications 13(1), 2453 (2022) [33] Rackers, J.A., Tecot, L., Geiger, M., Smidt,
T.E.: A recipe for cracking the quantum scal-
[25] Musaelian, A., Batzner, S., Johansson, A., ing limit with machine learned electron den-
Sun, L., Owen, C.J., Kornbluth, M., Kozin- sities. Machine Learning: Science and Tech-
sky, B.: Learning local equivariant represen- nology 4(1), 015027 (2023)
tations for large-scale atomistic dynamics.
Nature Communications 14(1), 579 (2023) [34] Tokita, A.M., Behler, J.: Tutorial: How
to train a neural network potential. arXiv
[26] Batatia, I., Batzner, S., Kovács, D.P., preprint arXiv:2308.08859 (2023)
Musaelian, A., Simm, G.N.C., Drautz, R.,
Ortner, C., Kozinsky, B., Csányi, G.: The [35] Klicpera, J., Becker, F., Günnemann, S.:
Design Space of E(3)-Equivariant Atom- Gemnet: Universal directional graph neu-
Centered Interatomic Potentials (2022) ral networks for molecules. In: Beygelzimer,
A., Dauphin, Y., Liang, P., Vaughan, J.W.
[27] Batatia, I., Kovacs, D.P., Simm, G., Ortner, (eds.) Advances in Neural Information Pro-
C., Csanyi, G.: Mace: Higher order equiv- cessing Systems (2021). https://openreview.
ariant message passing neural networks for net/forum?id=HS sOaxS9K-
fast and accurate force fields. In: Koyejo,

17
[36] Wen, T., Zhang, L., Wang, H., Weinan, E., [44] Zuo, Y., Chen, C., Li, X., Deng, Z., Chen,
Srolovitz, D.J.: Deep potentials for materi- Y., Behler, J., Csányi, G., Shapeev, A.V.,
als science. Materials Futures 1(2), 022601 Thompson, A.P., Wood, M.A., et al.: Per-
(2022) formance and cost assessment of machine
learning interatomic potentials. The Journal
[37] Gao, X., Ramezanghorbani, F., Isayev, O., of Physical Chemistry A 124(4), 731–745
Smith, J.S., Roitberg, A.E.: Torchani: A free (2020)
and open source pytorch-based deep learn-
ing implementation of the ani neural network [45] Huang, J., Zhang, L., Wang, H., Zhao, J.,
potentials. Journal of chemical information Cheng, J., et al.: Deep potential genera-
and modeling 60(7), 3408–3415 (2020) tion scheme and simulation protocol for the
li10gep2s12-type superionic conductors. The
[38] Wang, Z., Wu, H., Sun, L., He, X., Liu, Z., Journal of Chemical Physics 154(9) (2021)
Shao, B., Wang, T., Liu, T.-Y.: Improving
machine learning force fields for molecular [46] Chung, L.W., Sameera, W., Ramozzi, R.,
dynamics simulations with fine-grained force Page, A.J., Hatanaka, M., Petrova, G.P., Har-
metrics. The Journal of chemical physics ris, T.V., Li, X., Ke, Z., Liu, F., et al.: The
159(3), 035101 (2023) oniom method and its applications. Chemical
reviews 115(12), 5678–5796 (2015)
[39] Li, Y., Wang, Y., Huang, L., Yang, H.,
Wei, X., Zhang, J., Wang, T., Wang, [47] Collins, M.A., Bettens, R.P.: Energy-based
Z., Shao, B., Liu, T.-Y.: Long-short-range molecular fragmentation methods. Chemical
message-passing: A physics-informed frame- reviews 115(12), 5607–5642 (2015)
work to capture non-local interaction for scal-
able molecular dynamics simulation. arXiv [48] Grisafi, A., Ceriotti, M.: Incorporating long-
preprint arXiv:2304.13542 (2023) range physics in atomic-scale machine learn-
ing. The Journal of chemical physics 151(20)
[40] Di Giovanni, F., Giusti, L., Barbero, F., (2019)
Luise, G., Lio, P., Bronstein, M.M.: On over-
squashing in message passing neural net- [49] Fedik, N., Zubatyuk, R., Kulichenko, M.,
works: The impact of width, depth, and Lubbers, N., Smith, J.S., Nebgen, B.,
topology. In: International Conference on Messerly, R., Li, Y.W., Boldyrev, A.I., Bar-
Machine Learning, pp. 7865–7885 (2023). ros, K., et al.: Extending machine learning
PMLR beyond interatomic potentials for predicting
molecular properties. Nature Reviews Chem-
[41] Zhang, L., Wang, H., Muniz, M.C., Pana- istry 6(9), 653–672 (2022)
giotopoulos, A.Z., Car, R., et al.: A deep
potential model with long-range electro- [50] Prodan, E., Kohn, W.: Nearsightedness
static interactions. The Journal of Chemical of electronic matter. Proceedings of the
Physics 156(12) (2022) National Academy of Sciences 102(33),
11635–11638 (2005) https://doi.org/10.1073/
[42] Ko, T.W., Finkler, J.A., Goedecker, S., pnas.0505436102
Behler, J.: A fourth-generation high-
dimensional neural network potential with [51] Chmiela, S., Sauceda, H.E., Poltavsky, I.,
accurate electrostatics including non-local Müller, K.-R., Tkatchenko, A.: sgdml: Con-
charge transfer. Nature communications structing accurate and data efficient molecu-
12(1), 398 (2021) lar force fields using machine learning. Com-
puter Physics Communications 240, 38–45
[43] Batatia, I., Schaaf, L.L., Chen, H., Csányi, (2019)
G., Ortner, C., Faber, F.A.: Equivari-
ant matrix function neural networks. arXiv [52] Batzner, S., Musaelian, A., Sun, L., Geiger,
preprint arXiv:2310.10434 (2023) M., Mailoa, J.P., Kornbluth, M., Molinari, N.,

18
Smidt, T.E., Kozinsky, B.: E (3)-equivariant (2020)
graph neural networks for data-efficient and
accurate interatomic potentials. Nature com- [61] Wang, J., Wolf, R.M., Caldwell, J.W., Koll-
munications 13(1), 2453 (2022) man, P.A., Case, D.A.: Development and
testing of a general amber force field. Journal
[53] Brandstetter, J., Hesselink, R., Pol, E., of computational chemistry 25(9), 1157–1174
Bekkers, E.J., Welling, M.: Geometric and (2004)
physical quantities improve e (3) equiv-
ariant message passing. arXiv preprint [62] Huguenin-Dumittan, K.K., Loche, P., Hao-
arXiv:2110.02905 (2021) ran, N., Ceriotti, M.: Physics-inspired equiv-
ariant descriptors of nonbonded interactions.
[54] Lubbers, N., Smith, J.S., Barros, K.: Hierar- The Journal of Physical Chemistry Letters
chical modeling of molecular energies using a 14, 9612–9618 (2023)
deep neural network. The Journal of chemical
physics 148(24) (2018) [63] Cohen, T., Welling, M.: Group equivari-
ant convolutional networks. In: International
[55] Bannwarth, C., Ehlert, S., Grimme, S.: Gfn2- Conference on Machine Learning, pp. 2990–
xtb—an accurate and broadly parametrized 2999 (2016). PMLR
self-consistent tight-binding quantum chemi-
cal method with multipole electrostatics and [64] Darby, J.P., Kovács, D.P., Batatia, I.,
density-dependent dispersion contributions. Caro, M.A., Hart, G.L., Ortner, C., Csányi,
Journal of chemical theory and computation G.: Tensor-reduced atomic density repre-
15(3), 1652–1671 (2019) sentations. Physical Review Letters 131(2),
028001 (2023)
[56] Stocker, S., Gasteiger, J., Becker, F.,
Günnemann, S., Margraf, J.T.: How robust [65] Han, J., Rong, Y., Xu, T., Huang, W.: Geo-
are modern graph neural network poten- metrically equivariant graph neural networks:
tials in long and hot molecular dynamics A survey. arXiv preprint arXiv:2202.07230
simulations? Machine Learning: Science and (2022)
Technology 3(4), 045010 (2022)
[66] Gasteiger, J., Groß, J., Günnemann, S.:
[57] Liu, Y., He, X., Mo, Y.: Discrepancies and the Directional message passing for molecu-
error evaluation metrics for machine learning lar graphs. In: International Conference on
interatomic potentials. npj Computational Learning Representations (ICLR) (2020).
Materials 9(1), 174 (2023) https://arxiv.org/abs/2003.03123

[58] Schütt, K.T., Chmiela, S., Von Lilienfeld, [67] He, K., Zhang, X., Ren, S., Sun, J.: Deep
O.A., Tkatchenko, A., Tsuda, K., Müller, K.- residual learning for image recognition. In:
R.: Machine learning meets quantum physics. Proceedings of the IEEE Conference on Com-
Lecture Notes in Physics (2020) puter Vision and Pattern Recognition, pp.
770–778 (2016)
[59] Zhang, Y., Wang, H., Chen, W., Zeng, J.,
Zhang, L., Wang, H., Weinan, E.: Dp-gen: A [68] Izmailov, P., Podoprikhin, D., Garipov, T.,
concurrent learning platform for the genera- Vetrov, D., Wilson, A.G.: Averaging weights
tion of reliable deep learning based potential leads to wider optima and better generaliza-
energy models. Computer Physics Communi- tion. arXiv preprint arXiv:1803.05407 (2018)
cations 253, 107206 (2020)
[69] Athiwaratkun, B., Finzi, M., Izmailov,
[60] Schran, C., Brezina, K., Marsalek, O.: Com- P., Wilson, A.G.: There are many con-
mittee neural network potentials control gen- sistent explanations of unlabeled data:
eralization errors and enable active learning. Why you should average. arXiv preprint
The Journal of Chemical Physics 153(10) arXiv:1806.05594 (2018)

19
[70] Bannwarth, C., Caldeweyher, E., Ehlert, S.,
Hansen, A., Pracht, P., Seibert, J., Spicher,
S., Grimme, S.: Extended tight-binding quan-
tum chemistry methods. Wiley Interdisci-
plinary Reviews: Computational Molecular
Science 11(2), 1493 (2021)

[71] Hourahine, B., Aradi, B., Blum, V., Bonafé,


F., Buccheri, A., Camacho, C., Cevallos, C.,
Deshaye, M., Dumitrică, T., Dominguez, A.,
et al.: Dftb+, a software package for effi-
cient approximate density functional theory
based atomistic simulations. The Journal of
chemical physics 152(12) (2020)

20
Supporting Information
for
Efficient Machine Learning Force Field for Large-Scale Molecular Simulations of Organic Systems

Junbao Hu,†,‡ Liyang Zhou,∗,¶ and Jian Jiang∗,†,‡

†Beijing National Laboratory for Molecular Sciences, State Key Laboratory of Polymer Physics and Chemistry, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, P. R. China
‡University of Chinese Academy of Sciences, Beijing 100049, P. R. China
¶Juhua Group Co., Ltd, Quzhou, 324004, P. R. China

E-mail: [email protected]; [email protected]
Local test

Fig. S1 shows the prediction accuracy, simulation speed, and GPU memory consumption of
the MACE model on the perfluorotri-n-butylamine system for different cutoff radii. Energy
and force predictions are measured by averaging the mean absolute error (MAE) of models
trained with five different random seeds to mitigate the effects of random seed variations. The
results indicate that a cutoff radius of 8 Å achieves the best accuracy, but at the cost of slower
simulations and higher GPU memory consumption than smaller cutoffs. Overall, as the cutoff
radius rc increases, the simulation time and memory consumption of the model grow roughly
cubically, since the number of neighbors within the cutoff scales as rc^3.


Figure S1. a and b show the mean absolute errors (MAE) of the energy and forces per atom
as the number of atoms increases, respectively. c and d show the time cost and memory
consumption of the simulations as the number of atoms increases. The straight lines are
least-squares linear fits to the data points. The blue, green, yellow, red, and purple lines
correspond to cutoff radii of 5 Å, 6 Å, 7 Å, 8 Å, and 9 Å, respectively.
