Koopman Neural Operator As A Mesh-Free Solver of Non-Linear Partial Differential Equations∗
Wei Xiong,† Xiaomeng Huang,† Ziyang Zhang,‡ Ruixuan Deng,§ Pei Sun,¶ and Yang Tian∗∗
The lack of analytic solutions of diverse partial differential equations (PDEs) gives birth to a series of computational techniques for numerical solutions. Although numerous latest advances are accomplished in developing neural operators, a kind of neural-network-based PDE solver, these solvers become less accurate and explainable while learning long-term behaviors of non-linear PDE families. In this paper, we propose the Koopman neural operator (KNO), a new neural operator, to overcome these challenges. With the same objective of learning an infinite-dimensional mapping between Banach spaces that serves as the solution operator of the target PDE family, our approach differs from existing models by formulating a non-linear dynamic system of equation solution. By approximating the Koopman operator, an infinite-dimensional operator governing all possible observations of the dynamic system, to act on the flow mapping of the dynamic system, we can equivalently learn the solution of a non-linear PDE family by solving simple linear prediction problems. We validate the KNO in mesh-independent, long-term, and zero-shot predictions on five representative PDEs (e.g., the Navier-Stokes equation and the Rayleigh-Bénard convection) and three real dynamic systems (e.g., global water vapor patterns and western boundary currents). In these experiments, the KNO exhibits notable advantages compared with previous state-of-the-art models, suggesting the potential of the KNO in supporting diverse science and engineering applications (e.g., PDE solving, turbulence modelling, and precipitation forecasting).
(2) Neural-network-based solvers. To revolutionize the computational techniques of PDE solving, three types of neural-network-based solvers have been proposed to approximate or enhance the classic ones in a fast manner [10, 11]:

(2a) Mesh-dependent and finite-dimensional operators. The first type of solvers approximates the solution operator as a parameterized neural network between finite Euclidean spaces after discretizing domains D and T into x and y meshes, i.e., Q_θ : R^x × R^y × Θ → R^x × R^y [12–14]. These solvers are mesh-dependent and require fine-tuning on different values of n, leading to limited generalization capacities [5].

(2b) Neural finite element methods. The second type of solvers directly parameterizes equation solution γ(·) as a neural network, which equivalently gives rise to Q_θ : D × T × Θ → R [10, 15–18]. Although these solvers are mesh-independent and accurate, they are limited to learning a certain instance of the PDE rather than the entire family [5]. Therefore, similar to the classic numerical ones, these solvers require new network designs and training whenever the instance is changed. Meanwhile, most of these solvers are not applicable to the cases where the underlying PDE remains unknown (see an exception in the finite element network [18], which supports spatio-temporal forecasting on real data).

(2c) Neural operators. The third type of solvers is developed to learn a mesh-independent and infinite-dimensional solution operator with neural networks, i.e., Q_θ : Φ × Θ → Γ [5, 9, 19–23]. These solvers overcome the dependence on meshes by learning network parameters in a manner applicable to different discretizations. Because these solvers learn the solution operator directly, they only need to be trained once for a target PDE family. Generating equation solution γ(·) of different instances of the PDE family only requires a forward pass of networks, which is computationally favorable [5, 9]. Although neural operators are initially not competitive with other neural-network-based solvers because evaluating kernel integral operators is costly, the latest approach, named the Fourier neural operator [9], resolves this limitation by fast Fourier transform.

Compared with classic numerical solvers, neural-network-based solvers, especially neural operators, are more efficient in dealing with science and engineering questions where PDEs are complicated [5, 9]. Therefore, our research primarily focuses on neural operator designs.

C. Existing partial differential equation solvers face challenges

Despite substantial progress achieved by neural operators in theoretical foundations (e.g., Ref. [22]), approximator designs (e.g., Refs. [9, 23]), and applications (e.g., Ref. [24]), there still remain numerous challenges in existing neural operator solvers, among which a critical one lies in the limited capacity of existing models to learn the long-term dynamics of complicated PDEs.

To understand this challenge, let us consider a Green function, J_ϕ : (D × T) × (D × T) → R, associated with Eqs. (1-3) [5]

γ(x_{t+ε}) = ∫_{D×{t}} J_ϕ(x_t, y_t) η(y_t) dy_t,  ∀ x_t ∈ D × {t}.   (4)

The initial idea of neural operators [5] is to parameterize this Green function as a kernel integral operator (i.e., see the integral term related to κ_θ presented below) and define an iterative update strategy of neural networks

γ̂(x_{t+ε}) = σ( W γ̂(x_t) + ∫_{D×{t}} κ_θ(x_t, y_t, ϕ(x_t), ϕ(y_t)) γ̂(y_t) dy_t ),  ∀ x_t ∈ D × {t}.   (5)
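To make this update concrete, the sketch below implements one pass of Eq. (5) in PyTorch, evaluating the kernel integral as a learned multiplication on truncated Fourier modes in the spirit of the fast-Fourier-transform trick of Ref. [9]. The class name SpectralUpdate, the 1-dimensional setting, the tensor shapes, and the choice of tanh(·) as σ are illustrative assumptions, not the reference implementation of Refs. [5, 9].

import torch
import torch.nn as nn

class SpectralUpdate(nn.Module):
    # One pass of Eq. (5): v -> sigma(W v + kernel integral), where the kernel
    # integral is evaluated as a multiplication on truncated Fourier modes.
    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes  # number of retained low-frequency modes
        # learned spectral weights standing in for kappa_theta (an assumption)
        self.R = nn.Parameter(0.02 * torch.randn(channels, channels, modes, dtype=torch.cfloat))
        self.W = nn.Conv1d(channels, channels, 1)  # pointwise linear term W

    def forward(self, v):  # v: (batch, channels, s) sampled on an s-point mesh
        v_hat = torch.fft.rfft(v)                  # Fourier transform along the mesh
        out_hat = torch.zeros_like(v_hat)
        out_hat[..., :self.modes] = torch.einsum(
            "iom,bim->bom", self.R, v_hat[..., :self.modes])  # kernel as a spectral multiplier
        kv = torch.fft.irfft(out_hat, n=v.size(-1))           # back to physical space
        return torch.tanh(self.W(v) + kv)          # sigma(W v + kernel integral)

Because the spectral weights act on frequency modes rather than grid points, the same layer accepts inputs discretized on any mesh size, which is the property exploited throughout this line of work.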
linearize the original dynamic system in an appropriate space [31]. For instance, let us take the case where system γ_t is autonomous (i.e., θ_t^{t′} can be simplified as θ_{t′−t}) as a simple illustration. The family of Koopman operators K_ε : G(R^{d_γ} × T) → G(R^{d_γ} × T), parameterized by time difference ε, is defined based on a set of observation functions (also named measurement functions) G(R^{d_γ} × T) = {g | g : R^{d_γ} × T → C} [31]

K_ε g(γ_t) = g(θ_ε(γ_t)) = g(γ_{t+ε}),  ∀ γ_t ∈ R^{d_γ} × T.   (9)

Given an appropriate space defined by G(R^{d_γ} × T), we can linearize the dynamics of γ_t via Eq. (9). This idea has seen notable success in fluid dynamics [37], robotics [38], plasma physics [39], and neuroscience [40].

Different from existing machine-learning-based Koopman operator models that either are limited to autonomous dynamic systems (e.g., the case described by Eq. (9)) [41–44] or require a priori knowledge about the eigenvalue spectrum (e.g., the numbers of real and complex eigenvalues) of the Koopman operator for non-autonomous dynamic systems [45], our framework concerns a more general case where we consider a time-dependent Koopman operator applicable to both non-autonomous and autonomous dynamic systems [46]

K_t^{t+ε} g(γ_t) = g(θ_t^{t+ε}(γ_t)) = g(γ_{t+ε}),  ∀ t ≤ t + ε ∈ T.   (10)

As shown in Eq. (10), this Koopman operator governs a time-dependent linear evolution flow of g(γ_t) in a space defined by G(R^{d_γ} × T)

∂_t g(γ_t) = lim_{ε→0} [K_t^{t+ε} g(γ_t) − g(γ_t)] / ε.   (11)

In mathematics, the adjoint of the Koopman operator defined by Eq. (9) is the Perron-Frobenius operator of dynamic systems [47], while the adjoint of the associated Lie operator (see A for details) is the Liouville operator of Hamiltonian dynamics [48, 49]. These properties relate our approach with well-known theories about linear representation of dynamic systems in statistical physics and quantum mechanics [50].

In parallel, the Koopman operator can be connected to the Lax pair (M, N) of the PDE described by Eqs. (1-3) [51]

M = D_x^n + α γ(x_t) I,  α ∈ C,   (12)
M ψ(x_t) = λ ψ(x_t),  λ ∈ C,   (13)
∂_t ψ(x_t) = N ψ(x_t),   (14)

in which D_x^n is the n-th total derivative operator and I is an identity operator. We notice that Eq. (13) actually defines an eigenvalue problem associated with operator M at moment t. By calculating the time derivative of Eq. (13), we can observe a relation between linear operators M and N

(∂_t M + MN − NM) ψ(x_t) = ∂_t λ ψ(x_t).   (15)

This relation implies

∂_t M + [M, N] = 0,   (16)

where [M, N] = MN − NM stands for the commutator of operators [52]. Given Eqs. (12-16), we can combine them with Eq. (11) to identify a relation between operator N and the time-dependent Koopman operator K_t^{t+ε}

ψ(D × {t}) = g(γ_t)  ⇒  N = lim_{t+ε→t} [K_t^{t+ε} g(γ_t) − g(γ_t)] / ε.   (17)

Note that Eq. (17) still holds when the evolution of equation solution γ_t is autonomous (i.e., we can consider a time-invariant Koopman operator K in Eq. (17) directly).

Therefore, the linearization of g(γ_t) via a Koopman operator is closely related to the Lax pair and can be understood in the aspect of the inverse scattering transform of integrable PDEs [51]. This relation has been comprehensively studied in mathematics and physics [53–56], ensuring the validity and effectiveness of using the Koopman operator theory during PDE solving.
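As a toy numerical illustration of Eq. (9) (our own example, not part of the derivation), consider an autonomous linear flow γ_{t+ε} = A γ_t: on the invariant subspace of linear observables g(γ) = c · γ, the Koopman operator is represented by the matrix A^T, so Eq. (9) can be checked directly:

import numpy as np

# Toy check of Eq. (9) for the autonomous linear flow gamma_{t+eps} = A gamma_t.
# For a linear observable g(gamma) = c . gamma, we have
# (K_eps g)(gamma) = g(A gamma) = (A^T c) . gamma, so A^T represents K_eps
# on the subspace of linear observables.
rng = np.random.default_rng(0)
A = 0.5 * rng.normal(size=(3, 3))   # hypothetical flow map theta_eps
c = rng.normal(size=3)              # coefficients of the observable g
gamma_t = rng.normal(size=3)
lhs = (A.T @ c) @ gamma_t           # (K_eps g)(gamma_t) via the matrix A^T
rhs = c @ (A @ gamma_t)             # g(gamma_{t+eps}) = g(theta_eps(gamma_t))
assert np.isclose(lhs, rhs)         # Eq. (9): K_eps g(gamma_t) = g(gamma_{t+eps})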
lim_{m→∞} ∫_{G(R^{d_γ}×T)} ‖K̂_t^{t+ε} h(γ_t) − K_t^{t+ε} h(γ_t)‖_F dμ = 0,  ∀ h(·) ∈ G(R^{d_γ} × T),   (23)

if the original Koopman operator is bounded and K happens to be its invariant subspace (or simply n → ∞). Notion μ denotes a measure on G(R^{d_γ} × T) and ‖ · ‖_F is the Frobenius norm. Therefore, we are expected to approximate the original Koopman operator via a restricted counterpart such that

H_{m×n}(k + 1) = K̂_{kε}^{(k+1)ε} H_{m×n}(k),  ∀ k = 1, . . . , n,   (24)

where H_{m×n}(k) denotes the k-th column of H_{m×n}.

A challenge lies in that all the derivations presented above (i.e., the Krylov subspace method, Hankel matrix representation, and the Galerkin projection) are initially proposed for autonomous systems [31, 57–59]. Although we can formulate these mathematical expressions for a time-dependent Koopman operator as shown in Eqs. (18-24), this formulation inevitably requires an online optimization in computational implementation because the concerned Koopman operator may change across time.

The expensive online optimization is not favorable for solving PDEs, persuading us to consider an alternative solution. As suggested in Ref. [63], we can assume a constant form of the Koopman operator, K̂_t^{t+ε}, only when ε is sufficiently small (i.e., the time interval is small enough such that the changes of K̂_t^{t+ε} during this duration are negligible). Given this assumption, we can further consider a condition under which the dynamic system of g(γ_t) is approximately ergodic (i.e., the observation of equation solution, g(γ_t), eventually visits all possible states in R^{d_γ} as t → ∞, thus the proportion of time that g(γ_t) spends on a particular state equals the probability of this state) [57, 64]. This condition makes the time-averaging approximate the true expectation of an observable as time approaches infinity. Under this condition, we can define an expectation of the Koopman operator controlled by time difference ε

K_ε = lim_{t→∞} (1/t) ∫_{[0,t)} g(γ_τ)^{−1} g(γ_{τ+ε}) dτ ≃ argmin_{P ∈ R^{d_γ+1}} Σ_{k=1}^{n−1} ‖H_{m×n}(k + 1) − P H_{m×n}(k)‖_F.   (25)

Given a fixed ε, the Koopman operator K_ε : G(R^{d_γ} × T) → K in Eq. (25) can be understood as the time-average of K̂_t^{t+ε} at different t. A representation of K_ε only requires offline optimization and is computationally favorable for solving PDEs. Therefore, the offline optimization of the Koopman operator is required to support a high time resolution (i.e., a small ε). Meanwhile, we need to find an appropriate design of the observation function g(·) such that the ergodicity condition holds with reasonable errors.
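A minimal NumPy sketch of the offline estimate in Eq. (25) reads as follows: it builds the delay-embedded Hankel matrix of a sampled observable and solves the argmin by ordinary least squares. The function name fit_koopman and the toy sine observable are our own assumptions for illustration.

import numpy as np

def fit_koopman(observations, m):
    # Offline estimate of K_eps in Eq. (25): stack delay-embedded windows of
    # the observable sequence into an m x n Hankel matrix H, then solve
    # min_P sum_k ||H(:, k+1) - P H(:, k)||_F by linear least squares.
    n = observations.size - m + 1
    H = np.column_stack([observations[k:k + m] for k in range(n)])
    X, Y = H[:, :-1], H[:, 1:]                  # column pairs (H(k), H(k+1))
    P, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)  # solves X^T P^T = Y^T
    return P.T                                   # m x m approximation of K_eps

# usage on a toy observable sequence (hypothetical data)
g = np.sin(0.1 * np.arange(200))
K_eps = fit_koopman(g, m=16)
window = g[100:116]        # delay-embedded state at one time
pred = K_eps @ window      # approximates the next column g[101:117]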
Although these two requirements make analytic derivations or algorithmic approaches highly difficult (e.g., see Ref. [63] for the usage of an integrated framework of offline low-rank decomposition and online reduced-order modeling in dealing with the first requirement), we suggest the possibility for a neural-network-based approach to satisfy them in practice.

D. Neural network architectures of Koopman neural operator

The non-trivial part of the idea discussed above is how to define an effective neural network architecture to realize a Koopman-operator-based PDE solving pipeline. Below, we introduce the Koopman neural operator (KNO) as a possible architecture design, whose open-source PyTorch toolbox is released in Ref. [65]. In general, a KNO architecture consists of the following parts:

(1) Part 1: Observation. Given an input ϕ_t = ϕ(D × {t}) of the PDE in Eqs. (1-3), we first transform it as g(γ̂_t) in space G(R^{d_γ̂} × T) by an encoder (a single non-linear layer with tanh(·) activation function) that represents an observation function g(·). Please see Figure 1 for illustrations.

(2) Part 2: Fourier transform. We apply the Fourier transform to map g(γ̂_t) as g_F(γ̂_t) = F ∘ g(γ̂_t) and parameterize the subsequent parts of our network in the Fourier space. Similar to Ref. [9], g_F(γ̂_t) is computed by fast Fourier transform, where we truncate the Fourier series at ω, a maximum number of frequency modes. On the one hand, as suggested in Ref. [9], this procedure offers a convenient computation of the iterative update strategy shown in Eq. (5). On the other hand, the truncated Fourier transform is important for our model because it subdivides the system into two parts. The first part corresponds to the remaining low-frequency modes after truncation, while the second part consists of the truncated high-frequency modes. Although there is no theoretical guarantee, low-frequency modes are generally more stable than volatile high-frequency modes in practice unless the system is purely random. In other words, it is more possible for low-frequency modes to satisfy the ergodicity condition with acceptable errors than high-frequency modes. Therefore, by filtering out high-frequency modes and learning the Koopman operator only on low-frequency modes in Part 3, we can empirically reduce the difficulty of data-driven Koopman operator approximation. Meanwhile, we design an extra part (Part 5) in our model to extract the high-frequency modes of g(γ̂_t) to complement the lost information about high-frequency fluctuations. Thus, our model can also capture the volatile parts of real systems that are less ergodic and difficult to predict using the Koopman operator. Please see Figure 1 for details.

(3) Part 3: Hankel representation and offline Koopman operator. Given g_F(γ̂_t) for every t ∈ εN+, we set a dimension of delay-embedding, m ∈ N, to define a Hankel matrix Ĥ_{m×n} of g_F(γ̂_t) (note that n equals the number of accessible samples). To ensure that the space spanned by the Hankel matrix successfully approximates the invariant sub-space of the target Koopman operator, we train an o × o linear layer to learn the r-th power of the Koopman operator, K_ε^r : G(R^{d_γ̂} × T) → K̂, following Eqs. (24-25). Based on the learned operator K_ε^r, we can predict the future state of the latest observable g_F(γ̂_{(m+n−1)ε}) as g_F(γ̂_{(m+n)ε}) = [K_ε^r Ĥ_{m×n}(n)]^T(m), where notion T denotes the transpose of a matrix. Here we choose to learn the r-th power of K_ε because a flexible value of r ∈ N offers opportunities for us to adjust the time resolution. Specifically, given the a priori time resolution ε pre-determined by the data set, we can control the internal time resolution of the learned Koopman operator by setting a prediction length r. Based on this setting, predicting the evolution of g_F(γ̂_t) during a duration of ε requires the Koopman operator to iterate r times in the Fourier space (i.e., each iteration only corresponds to a short period of ε/r). In other words, even though the time resolution ε of the sampled data may not be high enough, we can define a non-one value of r to improve the actual time resolution on which the Koopman operator acts. This procedure has benefits in practice because the changes of the time-dependent Koopman operator are more negligible during a smaller period, which makes the offline optimization less challenging. Please see Figure 1 for illustrations of Part 3.

(4) Part 4: Inverse Fourier transform. Given each predicted state g_F(γ̂_{(m+n)ε}) in Part 3, we transform it from the Fourier space to G(R^{d_γ̂} × T) by an inverse Fourier transform, i.e., g(γ̂_{(m+n)ε}) = F^{−1} ∘ g_F(γ̂_{(m+n)ε}). Note that high-frequency fluctuations in the system have been filtered out in Part 2 and cannot be directly recovered by the inverse Fourier transform (the complement of the lost information is realized by Part 5). Please see Figure 1 for the instances of Part 4.

(5) Part 5: High-frequency information complement. According to the Fourier analysis implemented on feature maps, convolutional layers can amplify high-frequency components [66]. Therefore, we train a convolutional network C on the outputs of Part 1 to extract their high-frequency information, denoted by g_C(γ̂_t), as a complement of Parts 2-4. Meanwhile,

FIG. 1. Conceptual illustrations of neural network architectures of the KNO. Note that the layout of every part is slightly reorganized to offer a clear version.

γ̂_{t′} = g^{−1}( [ F^{−1} ∘ K_ε^r ∘ F ∘ g(γ̂_{[t−mε,t]}) + C ∘ g(γ̂_{[t−mε,t]}) ]^T (m) ),   (26)

where the first term inside the brackets corresponds to Parts 1-4 and the second term corresponds to Part 1 and Part 5.
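Assembling Parts 1-5, a self-contained PyTorch sketch of the forward pass in Eq. (26) might look as follows. The module name MiniKNO, the 1-dimensional setting, and the layer sizes (defaults echo o = 16 and f = 12 used later in Sec. III) are illustrative assumptions; the official implementation is the KoopmanLab toolbox [65].

import torch
import torch.nn as nn

class MiniKNO(nn.Module):
    # Sketch of Eq. (26): encode (Part 1), iterate K_eps^r on truncated Fourier
    # modes (Parts 2-3), decode (Part 4), and add a convolutional
    # high-frequency complement (Part 5).
    def __init__(self, o=16, modes=12, r=4):
        super().__init__()
        self.modes, self.r = modes, r
        self.encoder = nn.Sequential(nn.Conv1d(1, o, 1), nn.Tanh())  # observation g
        self.K = nn.Parameter(torch.eye(o, dtype=torch.cfloat))      # o x o Koopman layer
        self.C = nn.Conv1d(o, o, 3, padding=1)   # high-frequency complement
        self.decoder = nn.Conv1d(o, 1, 1)        # inverse observation g^{-1}

    def forward(self, phi):                              # phi: (batch, 1, s)
        g = self.encoder(phi)                            # Part 1: observation
        g_hat = torch.fft.rfft(g)[..., :self.modes]      # Part 2: keep low modes
        for _ in range(self.r):                          # Part 3: r small Koopman steps
            g_hat = torch.einsum("oi,bim->bom", self.K, g_hat)
        low = torch.fft.irfft(g_hat, n=phi.size(-1))     # Part 4: inverse FFT
        return self.decoder(low + self.C(g))             # Part 5 + inverse observation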
tasks to validate its effectiveness.

III. EXPERIMENTS

A. Experiment design

1. Data set information

We implement our experiments on the 1-dimensional Bateman–Burgers equation [71], the 2-dimensional Navier-Stokes equation [72], the 2-dimensional Rayleigh-Bénard convection (i.e., a kind of turbulent convection) [73], the 2-dimensional shallow-water equations [74], the infrared satellite imagery of water vapor in the Storm EVent ImagRy data set [75], and the western boundary current data collected by the E.U. Copernicus Marine Environment Monitoring Service [76, 77]. The first four data sets correspond to typical problems in turbulence analysis. The last two data sets contain real data governed by unknown PDEs, which offer opportunities to explore the applicability of the KNO in real precipitation forecasting (e.g., storm and rainfall) and western boundary current analysis tasks (e.g., the dynamic modelling of Kuroshio and Gulf Stream currents). Please see B for a full data description.

2. Experiment designs and baselines

We conduct five experiments to validate our model:

(1) Mesh-independent experiment. As suggested in previous works [5, 9, 19–23], neural operator models are expected to be mesh-independent because they learn the solution operator of an entire PDE family. Therefore, we design an experiment to validate the mesh-independent property of the KNO.

(2) Long-term prediction experiment. To validate the long-term prediction capacity of the KNO, we design prediction tasks on eight data sets, including five representative PDE solving problems in turbulence analysis and three complicated dynamic system modelling problems.

(3) Zero-shot prediction experiment (discretization granularity). Following the idea of Ref. [9], we validate the generalization ability of the KNO by testing it on untrained discretization granularity (e.g., in a way similar to super-resolution [9]).

(4) Zero-shot prediction experiment (prediction interval). Apart from the generalization on untrained discretization granularity, we also validate the generalization capacity of the KNO on untrained prediction intervals (e.g., zero-shot temporal interpolation and extrapolation).

(5) Ablation experiment. To demonstrate the significance of the learned Koopman operator in the KNO, we implement an ablation experiment.

In our experiments, the high-frequency complement can be either realized by a single convolutional layer, C_s, or by a simple tripartite convolutional network, C_t. Besides the KNO, we implement the following models for comparison: the Fourier neural operator (FNO) [9], the U-shaped neural operator (UNO) [78], the convolutional neural operator (CNO) [79], the latent spectral model (LSM) [80], the U-Net [81], and the residual neural network (ResNet) [82]. Among these baseline models, the FNO [9] is an extensively validated neural operator model. The UNO [78] and the CNO [79] represent the performance of recently proposed neural operators. The LSM [80] serves as an instance of the neural network parameterization of spectral methods. The U-Net [81] and the ResNet [82] represent the classic ideas of spatio-temporal modeling in the field of deep learning. Other common neural network models in deep learning, such as the convolutional LSTM (ConvLSTM) [83] and the deep hidden physics model (DHPM) [84], are shown to be less efficient on complex dynamic systems [83] and are not considered in our research. Each model is trained by its default optimizer (e.g., the KNO and the FNO are trained by the Adam optimizer). The experiments on the first four PDE data sets are implemented on a single Nvidia V100 GPU with 32GB memory. The experiments on the last two large-scale real data sets are implemented on a single Nvidia A100 GPU with 40GB memory.

B. Experiment results

1. Mesh-independent experiment

Our mesh-independent experiment is implemented on the data of the 1-dimensional Bateman–Burgers equation generated under different discretization conditions (i.e., spatial resolutions of meshes). The data with the highest resolution is generated following the Gaussian initialization introduced in Ref. [9]. The data with lower resolutions are directly down-sampled from the data with higher resolutions.

We choose the FNO as a baseline for comparison. As for other baseline models less efficient than the FNO in the mesh-independent experiment, such as the graph neural operator (GNO) [5] and the multipole graph neural operator (MGNO) [85] (see results reported by Ref. [9]), we no longer discuss them for convenience. We implement multiple versions of the KNO with different hyper-parameters (e.g., operator size o, frequency mode number f, the iteration number r of the Koopman operator, and the relative weight λ that controls the contributions of low- and high-frequency information. Please see Sec. II D for the meanings of these parameters). Under every condition, these models are trained on 1000 samples and conduct 1-second forward prediction (i.e., t′ − t = 1) on 200 samples for performance evaluation.

FIG. 2. Experiment results of mesh-independence. (a-d) respectively present the results under different conditions of high-frequency complement designs and λ in the KNO. Here the high-frequency complement is either realized by a single convolutional layer, C_s, or by a simple tripartite convolutional network, C_t. Parameter λ controls the contributions of low- and high-frequency information. Model performance is measured using the root mean square error (RMSE). In the legends of (a-d), the numbers within brackets indicate the parameter settings and model sizes, where the last number corresponds to the model size and all other numbers denote parameters. Notion w in the FNO stands for the width parameter (i.e., the dimension of latent space) [9]. Note that the results of the FNO are not repeatedly shown in (b-d) since the adjustment of λ has no relation with the FNO. Therefore, the performances of the KNO models in (b-d) can be directly compared with the performance of the FNO in (a).
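Before reporting results, we note that the mechanical reason for mesh independence can be demonstrated with the hypothetical MiniKNO sketch from Sec. II D: because the encoder is pointwise and the Koopman layer acts on truncated frequency modes, the same weights accept inputs on any grid. The grid sizes below are arbitrary illustrations.

import torch

model = MiniKNO()                 # hypothetical module from the sketch after Eq. (26)
coarse = torch.randn(1, 1, 64)    # low-resolution discretization
fine = torch.randn(1, 1, 256)     # high-resolution discretization
out_coarse = model(coarse)        # shape (1, 1, 64)
out_fine = model(fine)            # shape (1, 1, 256), produced by the same weights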
During training, the batch size is fixed as 64. The learning rate is initialized at 0.001 and is halved every 100 epochs. The weights of prediction and reconstruction in the loss function are defined as α = 5 and β = 0.5, respectively.
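For concreteness, this training configuration can be sketched as follows, reusing the hypothetical MiniKNO from the sketch after Eq. (26); the dummy data loader and the epoch count are placeholders, not the released training script.

import torch

model = MiniKNO()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# halve the learning rate every 100 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)
mse, alpha, beta = torch.nn.MSELoss(), 5.0, 0.5
# stand-in for real mini-batches of size 64: (input frame, next frame)
loader = [(torch.randn(64, 1, 64), torch.randn(64, 1, 64))]

for epoch in range(300):                           # epoch count is a placeholder
    for x_t, x_next in loader:
        pred = model(x_t)                          # prediction branch
        recon = model.decoder(model.encoder(x_t))  # g^{-1}(g(x_t)) should recover x_t
        loss = alpha * mse(pred, x_next) + beta * mse(recon, x_t)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()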
As illustrated in Figure 2, the KNO achieves an almost constant prediction error on every resolution. Compared with the FNO, the prediction errors of KNO models are generally smaller under different conditions. Notably, a one-unit KNO only requires about 5 × 10^3 parameters to achieve similar performance to a one-unit FNO that has more than 1.4 × 10^5 parameters. When the model size of the KNO increases to 2 × 10^4 or 4 × 10^4, the KNO can significantly outperform the FNO. These results suggest the possibility for the KNO to be better at maintaining the balance between accuracy and efficiency in PDE solving (i.e., the KNO achieves higher accuracy with fewer parameters). Meanwhile, we notice that a KNO model with a balance between high- and low-frequency information (i.e., λ = 0.5) can maintain more robust performance across different resolutions (i.e., the fluctuations of performance are relatively smaller). Therefore, we speculate that λ = 0.5 is an optimal empirical choice in dealing with cross-resolution prediction tasks. This speculation is also verified in our subsequent zero-shot experiment in Figure 4.

2. Long-term prediction experiment

Our long-term prediction experiment is implemented on all 2-dimensional data sets. Among these implemented data sets, the 2-dimensional Navier-Stokes equation can be defined with three viscosity cases, i.e., ν ∈ {10−3, 10−4, 10−5}. On all data sets, the trained models are required to predict 10 time frames in the future (i.e., t′ = t + zε, where z ∈ {1, . . . , 40} is selected for the 2-dimensional Rayleigh-Bénard convection data while z ∈ {1, . . . , 10} is set for all other data sets). Note that the meaning of the time difference ε between adjacent time frames varies across different data sets. For the 2-dimensional Navier-Stokes equation, the 2-dimensional Rayleigh-Bénard convection, and the 2-dimensional shallow-water equations, the time difference denotes one second. For the western boundary current data (e.g., the Kuroshio and Gulf Stream currents), the time difference corresponds to one day. For the infrared satellite imagery of water vapor, the time difference corresponds to a five-minute interval. Therefore, ten or fifty time frames can already span a long interval of physics time to make the concerned systems exhibit long-term and highly non-linear evolution.

Here we present the details of experiment implementation. For the Navier-Stokes equation, the training set contains 1000 samples. The testing set contains 200 samples for ν ∈ {10−3, 10−4} while it includes 100 samples for ν = 10−5. For the Rayleigh-Bénard convection, models are trained and tested on 1600 and 200 samples, respectively. For the shallow-water equations, the training and testing sets respectively include 1600 and 200 samples. In all western boundary current data prediction tasks, models are trained on 1600 samples and tested on 200 samples. For the water vapor prediction task, models are tested on 400 samples after being trained on 3200 samples. To show that the KNO is not limited to small model designs, we present a possible realization of a relatively large KNO model (defined with o = 16, f = 12, λ = 0.5, and r = 6. High-frequency information complement is realized using the simple tripartite convolutional network C_t. Model size is 19871979). The compared models on all data sets include the FNO (model size is 926517), the UNO (model size is 30478033), the CNO (model size is 2667034), the LSM (model size is 19188162), the U-Net (model size is 24950491), and the ResNet (model size is

FIG. 3. Experiment results of long-term prediction. (a-h) show the performances of all models on different data sets, where each prediction step creates a time frame. (i) visualizes the prediction results (the twenty-sixth time frame after the initial condition is selected as an instance for illustration) and the associated errors of all models on the Rayleigh-Bénard convection. Note that color bars in (i) are shared by all models. Therefore, the results can be directly compared across different models. (j) presents the prediction results (the third time frame after the initial condition) and the associated errors of the KNO on the Kuroshio and Gulf Stream currents.
FIG. 4. Results of the zero-shot experiment (discretization granularity) on the 1-dimensional Bateman–Burgers equation. (a-d) respectively present the results under different conditions of high-frequency complement designs and λ in the KNO. Here the high-frequency complement can be either realized by a single convolutional layer, C_s, or by a simple tripartite convolutional network, C_t. Parameter λ controls the contributions of low- and high-frequency information.
FIG. 5. Results of the zero-shot experiment (prediction interval) on the 2-dimensional Rayleigh-Bénard convection data. (a-b) show the results of the first kind of zero-shot prediction, where models are supervised by the time frames separated by 2 or 4 untrained frames during training, respectively. The result of the ResNet is absent in (b) because the ResNet does not converge well during training. (c-d) present the results of the second type of zero-shot prediction, where there are 20 or 30 unsupervised time frames during testing. Note that the lines shown in (a-d) are smoothed using the B-spline basis offered by SciPy [86] because the raw performances of the ResNet and the CNO are oscillating. (e) visualizes the prediction results (the twenty-sixth time frame after the initial condition is selected as an instance for illustration) and the associated errors of all models during the zero-shot experiment shown in (d).
ideal accuracy in both kinds of zero-shot experiments. In Figure 5(e), we visualize instances of the prediction results of all models during the zero-shot extrapolation experiment with 30 unsupervised time frames. Because these instances correspond to the same system shown in Figure 3(i), we can directly compare between these two cases and find that the long-term prediction of the KNO is still robust even without supervision, while the performances of other models significantly reduce in the zero-shot case. These results suggest that the evolution patterns captured by the approximated Koopman operator are suitable for dynamic system modelling.

5. Ablation experiment

To prove that the learned Koopman operator is a key factor in ensuring the effectiveness of the KNO, we design a simple ablation experiment to compare the prediction performances of the KNO framework with and without the reconstruction loss term. In the former case, the temporal prediction in the model mainly depends on the weight matrix of the linear layer, which functions as the r-th power of a Koopman operator. Other optimizable parts of the model will be driven to learn the observation function and its inverse by the reconstruction loss. In the latter case, the whole model is driven to focus only on the temporal prediction and the linear layer does not necessarily function in a Koopman-operator-like manner.

We use the data of the 1-dimensional Bateman–Burgers equation to implement the ablation experiment. For convenience, the high-frequency complement in the KNO is realized by a single convolutional layer, C_s, with λ = 0.5. As shown in Table I, the KNO defined with an effective reconstruction process generally outperforms the KNO framework only driven by the prediction loss.

IV. CONCLUSION

In summary, we develop the Koopman neural operator (KNO), a mesh-independent neural-network-based solver of partial differential equations. The basic code of the KNO is provided in C and the official toolbox of the KNO can be seen in Ref. [36]. Compared with the existing state-of-the-art approaches in the field of neural-network-based PDE solving, such as the Fourier neural operator [9], the U-shaped neural operator [78], the convolutional neural operator [79], and the latent spectral model [80], our proposed KNO exhibits a higher capacity to
capture the long-term evolution of PDEs or real dynamic systems. Meanwhile, it generally maintains accuracy across different mesh resolutions. This property suggests the potential of the KNO to be considered as a basic unit for constructing large-scale frameworks to solve complex physics equations with non-linear dynamics or predict future states of real dynamic systems. Moreover, the robust mesh-independence property enables the KNO to be trained on low-resolution data before being applied on high-resolution data. This property may create more possibilities to reduce the computational costs of PDE solving.

Although our work primarily focuses on the Koopman operator, one can also model the Lie operator in application.

2021YFC3101600, 2020YFA0608000, 2020YFA0607900). Authors are grateful for the technical support provided by Mr. Hao Wu, who studies at the School of Computer Science and Technology, University of Science and Technology of China. Meanwhile, authors appreciate the help of Mr. Yuan Gao and Mr. Shuyi Zhou, who study at the Department of Earth System Science, Tsinghua University, during data preparation.

Appendix A: The associated Lie operator of the Koopman operator

Appendix B: Details of the implemented data sets

In our experiments, we consider the 1-dimensional Bateman–Burgers equation [71], the 2-dimensional Navier-Stokes equation [72], the 2-dimensional Rayleigh-Bénard convection [73], the 2-dimensional shallow-water equations [74], the infrared satellite imagery of water vapor in the Storm EVent ImagRy data set [75], and the western boundary current data collected by the E.U. Copernicus Marine Environment Monitoring Service [76, 77].
1. Bateman–Burgers equation.

The 1-dimensional Bateman–Burgers equation is

∂_t u(x_t) + ∂_x (u²(x_t)/2) = ν ∂_{xx} u(x_t),  x_t ∈ (0, 1) × (0, 1],   (B1)
u(x_0) = u_I,  x_0 ∈ (0, 1) × {0},   (B2)

where u(·) denotes the velocity, notion u_I is a periodic initial condition, and parameter ν is the viscosity coefficient. The data set of Eqs. (B1-B2) is provided by Ref. [9]. The learning objective of this equation is u(·).

2. Navier-Stokes equation.

Mathematically, the incompressible 2-dimensional Navier-Stokes equation in a vorticity form is defined as

∂_t ω + u · ∇ω = ν ∇²ω + ∇ × f,   (B3)
∇ · u = 0,   (B4)

where ω is the scalar field of vorticity, notion u defines the vector field of velocity, and f denotes a time-independent forcing term. The viscosity coefficient is set as ν ∈ {10−3, 10−4, 10−5}. Similar to the situation of the Bateman–Burgers equation, the data of Eqs. (B3-B4) is provided by Ref. [9]. In experiments, the learning target is the vorticity field ω.

3. Rayleigh-Bénard convection equation.

The 2-dimensional Rayleigh-Bénard convection is defined as [73]

∂_t u + u · ∇u + (1/ρ(θ)) ∇p = ν ∇²u + f,   (B5)
∂_t θ + u · ∇θ = κ ∇²θ,   (B6)
∇ · u = 0,   (B7)
ρ(θ) = ρ(θ_0)[1 − a(θ − θ_0)],   (B8)

where p is the pressure, scalar θ is the temperature, coefficient κ measures the heat conductivity, and ρ(θ) is the density of fluids given the temperature θ. Notion θ_0 is the initial temperature and a denotes the thermal expansion coefficient. The data of the 2-dimensional Rayleigh-Bénard convection is generated by Ref. [83] using the Boussinesq approximation approach. In our experiments, the learning target is the magnitude of velocity u (i.e., a scalar field).

4. Shallow-water equations.

The 2-dimensional shallow-water equations are

∂_t h + ∂_x(hu) + ∂_y(hv) = 0,   (B9)
∂_t(hu) + ∂_x(u²h + (1/2) g_r h²) = −g_r h ∂_x b,   (B10)
∂_t(hv) + ∂_y(v²h + (1/2) g_r h²) = −g_r h ∂_y b,   (B11)

where h denotes a scalar field of water depth, notion b describes the spatially varying bathymetry, and g_r measures the gravitational acceleration. Notions u and v denote the velocities in horizontal and vertical directions, respectively. The data of the 2-dimensional shallow-water equations is offered by Ref. [74]. In our experiments, the learning objective is depth field h.

5. Water vapor data.

The infrared satellite imagery of water vapor data is acquired from the Storm EVent ImagRy data set [75], which can be used for precipitation analysis and forecasting (e.g., storm and rainfall). The spatial resolution of the satellite imagery is 2 kilometers and the time interval between each pair of adjacent imagery data is 5 minutes. For convenience, we have down-sampled the data from 192 × 192 to 128 × 128 in the spatial domain.

6. Western boundary current data.

The western boundary current data is acquired from the E.U. Copernicus Marine Environment Monitoring Service [76]. The data set can be used to analyze the dynamics of western boundary currents, which are an important factor in shaping global ocean circulation and climate variability [77]. Specifically, we use the daily sea surface stream velocity data whose spatial resolution is 0.25 × 0.25 degrees. We choose the data in the Kuroshio region (10-42°N, 123-155°E) and the Gulf Stream region (20-52°N, 33-65°W) to study the Kuroshio and the Gulf Stream currents. The data covers the time interval from 2013/1/1 to 2018/12/31.

Appendix C: Code implementation

The basic code implementation of the KNO based on PyTorch is presented below. The official toolbox of the KNO is presented in https://github.com/Koopman-Laboratory/KoopmanLab (also see Ref. [36]).

import torch
import numpy as np
import torch.nn as nn
[1] L. Debnath and L. Debnath, Nonlinear partial differential equations for scientists and engineers (Springer, 2005).
[2] H. Tanabe, Functional analytic methods for partial differential equations (CRC Press, 2017).
[3] M. S. Gockenbach, Partial differential equations: analytical and numerical methods, Vol. 122 (SIAM, 2005).
[4] R. M. Mattheij, S. W. Rienstra, and J. T. T. Boonkkamp, Partial differential equations: modeling, analysis, computation (SIAM, 2005).
[5] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, Neural operator: Graph kernel network for partial differential equations, arXiv preprint arXiv:2003.03485 (2020).
[6] J. N. Reddy, Introduction to the finite element method (McGraw-Hill Education, 2019).
[7] K. Lipnikov, G. Manzini, and M. Shashkov, Mimetic finite difference method, Journal of Computational Physics 257, 1163 (2014).
[8] E. Tadmor, A review of numerical methods for nonlinear partial differential equations, Bulletin of the American Mathematical Society 49, 507 (2012).
[9] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, Fourier neural operator for parametric partial differential equations, arXiv preprint arXiv:2010.08895 (2020).
[10] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378, 686 (2019).
[11] D. Kochkov, J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer, Machine learning–accelerated computational fluid dynamics, Proceedings of the National Academy of Sciences 118, e2101784118 (2021).
[12] X. Guo, W. Li, and F. Iorio, Convolutional neural networks for steady flow approximation, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016) pp. 481–490.
[13] Y. Zhu and N. Zabaras, Bayesian deep convolutional encoder–decoder networks for surrogate modeling and uncertainty quantification, Journal of Computational Physics 366, 415 (2018).
[14] S. Bhatnagar, Y. Afshar, S. Pan, K. Duraisamy, and S. Kaushik, Prediction of aerodynamic flow fields using convolutional neural networks, Computational Mechanics 64, 525 (2019).
[15] B. Yu et al., The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems, Communications in Mathematics and Statistics 6, 1 (2018).
[16] L. Bar and N. Sochen, Unsupervised deep learning algorithm for PDE-based forward and inverse problems, arXiv preprint arXiv:1904.05417 (2019).
[17] S. Pan and K. Duraisamy, Physics-informed probabilistic learning of linear embeddings of nonlinear dynamics with guaranteed stability, SIAM Journal on Applied Dynamical Systems 19, 480 (2020).
[18] M. Lienen and S. Günnemann, Learning the dynamics of physical systems from sparse observations with finite element networks, in International Conference on Learning Representations (2022).
[19] L. Lu, P. Jin, and G. E. Karniadakis, DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators, arXiv preprint arXiv:1910.03193 (2019).
[20] K. Bhattacharya, B. Hosseini, N. B. Kovachki, and A. M. Stuart, Model reduction and neural networks for parametric PDEs, arXiv preprint arXiv:2005.03180 (2020).
[21] N. H. Nelsen and A. M. Stuart, The random feature model for input-output maps between Banach spaces, SIAM Journal on Scientific Computing 43, A3212 (2021).
[22] N. Kovachki, S. Lanthaler, and S. Mishra, On universal approximation and error bounds for Fourier neural operators, Journal of Machine Learning Research 22 (2021).
[23] Z. Li, D. Z. Huang, B. Liu, and A. Anandkumar, Fourier neural operator with learned deformations for PDEs on general geometries, arXiv preprint arXiv:2207.05209 (2022).
[24] J. Guibas, M. Mardani, Z. Li, A. Tao, A. Anandkumar, and B. Catanzaro, Efficient token mixing for transformers via adaptive Fourier neural operators, in International Conference on Learning Representations (2021).
[25] L. Perko, Differential equations and dynamical systems, Vol. 7 (Springer Science & Business Media, 2013).
[26] P. A. Fishwick, Handbook of dynamic system modeling (CRC Press, 2007).
[27] P. C. Bressloff, Spatiotemporal dynamics of continuum neural fields, Journal of Physics A: Mathematical and Theoretical 45, 033001 (2011).
[28] J. E. Herrera-Estrada, Y. Satoh, and J. Sheffield, Spatiotemporal dynamics of global drought, Geophysical Research Letters 44, 2254 (2017).
[29] W. Wu, C. Meneveau, and R. Mittal, Spatio-temporal dynamics of turbulent separation bubbles, Journal of Fluid Mechanics 883, A45 (2020).
[30] E. Schöll, Nonlinear spatio-temporal dynamics and chaos in semiconductors, Vol. 10 (Cambridge University Press, 2001).
[31] S. L. Brunton, M. Budisic, E. Kaiser, and J. N. Kutz, Modern Koopman theory for dynamical systems, SIAM Review 64, 229 (2022).
[32] J. Pathak, S. Subramanian, P. Harrington, S. Raja, A. Chattopadhyay, M. Mardani, T. Kurth, D. Hall, Z. Li, K. Azizzadenesheli, et al., FourCastNet: A global data-driven high-resolution weather model using adaptive Fourier neural operators, arXiv preprint arXiv:2202.11214 (2022).
[33] D. Cao, Y. Wang, J. Duan, C. Zhang, X. Zhu, C. Huang, Y. Tong, B. Xu, J. Bai, J. Tong, et al., Spectral temporal graph neural network for multivariate time-series forecasting, Advances in Neural Information Processing Systems 33, 17766 (2020).
[34] F. Aminian, E. D. Suarez, M. Aminian, and D. T. Walz, Forecasting economic data with neural networks, Computational Economics 28, 71 (2006).
[35] L. Xu, N. Chen, Z. Chen, C. Zhang, and H. Yu, Spatiotemporal forecasting in earth system science: Methods, uncertainties, predictability and future directions, Earth-Science Reviews 222, 103828 (2021).
[36] W. Xiong, M. Ma, X. Huang, Z. Zhang, P. Sun, and Y. Tian, KoopmanLab: A library for Koopman neural operator with PyTorch (2023), open source codes available at https://github.com/Koopman-Laboratory/KoopmanLab.
[37] C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. S. Henningson, Spectral analysis of nonlinear flows, Journal of Fluid Mechanics 641, 115 (2009).
[38] I. Abraham and T. D. Murphey, Active learning of dynamics for data-driven control using Koopman operators, IEEE Transactions on Robotics 35, 1071 (2019).
[39] R. Taylor, J. N. Kutz, K. Morgan, and B. A. Nelson, Dynamic mode decomposition for plasma diagnostics and validation, Review of Scientific Instruments 89, 053501 (2018).
[40] B. W. Brunton, L. A. Johnson, J. G. Ojemann, and J. N. Kutz, Extracting spatial–temporal coherent patterns in large-scale neural recordings using dynamic mode decomposition, Journal of Neuroscience Methods 258, 1 (2016).
[41] N. Takeishi, Y. Kawahara, and T. Yairi, Learning Koopman invariant subspaces for dynamic mode decomposition, Advances in Neural Information Processing Systems 30 (2017).
[42] O. Azencot, N. B. Erichson, V. Lin, and M. Mahoney, Forecasting sequential data using consistent Koopman autoencoders, in International Conference on Machine Learning (PMLR, 2020) pp. 475–485.
[43] S. E. Otto and C. W. Rowley, Linearly recurrent autoencoder networks for learning dynamics, SIAM Journal on Applied Dynamical Systems 18, 558 (2019).
[44] D. J. Alford-Lago, C. W. Curtis, A. T. Ihler, and O. Issan, Deep learning enhanced dynamic mode decomposition, Chaos: An Interdisciplinary Journal of Nonlinear Science 32, 033116 (2022).
[45] B. Lusch, J. N. Kutz, and S. L. Brunton, Deep learning for universal linear embeddings of nonlinear dynamics, Nature Communications 9, 1 (2018).
[46] S. Macesic, N. Crnjaric-Zic, and I. Mezic, Koopman operator family spectrum for nonautonomous systems, SIAM Journal on Applied Dynamical Systems 17, 2478 (2018).
[47] A. Lasota and M. C. Mackey, Probabilistic properties of deterministic systems (Cambridge University Press, 1985).
[48] P. Gaspard, G. Nicolis, A. Provata, and S. Tasaki, Spectral signature of the pitchfork bifurcation: Liouville equation approach, Physical Review E 51, 74 (1995).
[49] P. Gaspard, Chaos, scattering and statistical mechanics (Cambridge University Press, 2005).
[50] A. Lasota and M. C. Mackey, Chaos, fractals, and noise: stochastic aspects of dynamics, Vol. 97 (Springer Science & Business Media, 1998).
[51] P. D. Lax, Integrals of nonlinear equations of evolution and solitary waves, Communications on Pure and Applied Mathematics 21, 467 (1968).
[52] P. D. Lax, Linear algebra and its applications, Vol. 78 (John Wiley & Sons, 2007).
[53] J. P. Parker and J. Page, Koopman analysis of isolated fronts and solitons, SIAM Journal on Applied Dynamical Systems 19, 2803 (2020).
[54] H. Nakao and I. Mezić, Spectral analysis of the Koopman operator for partial differential equations, Chaos: An Interdisciplinary Journal of Nonlinear Science 30, 113131 (2020).
[55] C. Gin, B. Lusch, S. L. Brunton, and J. N. Kutz, Deep learning models for global coordinate transformations that linearise PDEs, European Journal of Applied Mathematics 32, 515 (2021).
[56] J. Page and R. R. Kerswell, Koopman analysis of Burgers equation, Physical Review Fluids 3, 071901 (2018).
[57] H. Arbabi and I. Mezic, Ergodic theory, dynamic mode decomposition, and computation of spectral properties of the Koopman operator, SIAM Journal on Applied Dynamical Systems 16, 2096 (2017).
[58] N. Črnjarić-Žic, S. Maćešić, and I. Mezić, Koopman operator spectrum for random dynamical systems, Journal of Nonlinear Science 30, 2007 (2020).
[59] S. L. Brunton, B. W. Brunton, J. L. Proctor, E. Kaiser, and J. N. Kutz, Chaos as an intermittently forced linear system, Nature Communications 8, 1 (2017).
[60] Y. Saad, Numerical methods for large eigenvalue problems: revised edition (SIAM, 2011).
[61] M. Korda and I. Mezić, On convergence of extended dynamic mode decomposition to the Koopman operator, Journal of Nonlinear Science 28, 687 (2018).
[62] M. Li and L. Jiang, Reduced-order modeling for Koopman operators of nonautonomous dynamic systems in multiscale media, arXiv preprint arXiv:2204.13180 (2022).
[63] M. Li and L. Jiang, Data-driven reduced-order modeling for nonautonomous dynamical systems in multiscale media, Journal of Computational Physics 474, 111799 (2023).
[64] I. P. Cornfeld, S. V. Fomin, and Y. G. Sinai, Ergodic theory, Vol. 245 (Springer Science & Business Media, 2012).
[65] W. Xiong, M. Ma, X. Huang, Z. Zhang, P. Sun, and Y. Tian, KoopmanLab: A PyTorch module of Koopman neural operator family for solving partial differential equations, arXiv preprint arXiv:2301.01104 (2023).
[66] N. Park and S. Kim, How do vision transformers work?, arXiv preprint arXiv:2202.06709 (2022).
[67] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, Going deeper with convolutions, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015) pp. 1–9.
[68] K. Champion, B. Lusch, J. N. Kutz, and S. L. Brunton, Data-driven discovery of coordinates and governing equations, Proceedings of the National Academy of Sciences 116, 22445 (2019).
[69] K. P. Champion, S. L. Brunton, and J. N. Kutz, Discovery of nonlinear multiscale systems: Sampling strategies and embeddings, SIAM Journal on Applied Dynamical Systems 18, 312 (2019).
[70] U. Fasel, J. N. Kutz, B. W. Brunton, and S. L. Brunton, Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control, Proceedings of the Royal Society A 478, 20210904 (2022).
[71] E. R. Benton and G. W. Platzman, A table of solutions of the one-dimensional Burgers equation, Quarterly of Applied Mathematics 30, 195 (1972).
[72] C. Wang, Exact solutions of the steady-state Navier-Stokes equations, Annual Review of Fluid Mechanics 23, 159 (1991).
[73] E. Bodenschatz, W. Pesch, and G. Ahlers, Recent developments in Rayleigh-Bénard convection, Annual Review of Fluid Mechanics 32, 709 (2000).
[74] M. Takamoto, T. Praditia, R. Leiteritz, D. MacKinlay, F. Alesiani, D. Pflüger, and M. Niepert, PDEBench: An extensive benchmark for scientific machine learning, Advances in Neural Information Processing Systems 35, 1596 (2022).
[75] M. Veillette, S. Samsi, and C. Mattioli, SEVIR: A storm event imagery dataset for deep learning applications in radar and satellite meteorology, Advances in Neural Information Processing Systems 33, 22009 (2020).
[76] E.U. Copernicus Marine Environment Monitoring Service, Daily satellite global sea level data (SEALEVEL_GLO_PHY_L4_OBSERVATIONS_008_047) collected by the Copernicus Marine Environment Monitoring Service (2021).
[77] D. Hu, L. Wu, W. Cai, A. S. Gupta, A. Ganachaud, B. Qiu, A. L. Gordon, X. Lin, Z. Chen, S. Hu, et al., Pacific western boundary currents and their roles in climate, Nature 522, 299 (2015).
[78] M. A. Rahman, Z. E. Ross, and K. Azizzadenesheli, U-NO: U-shaped neural operators, arXiv preprint arXiv:2204.11127 (2022).
[79] B. Raonic, R. Molinaro, T. De Ryck, T. Rohner, F. Bartolucci, R. Alaifari, S. Mishra, and E. de Bézenac, Convolutional neural operators for robust and accurate learning of PDEs, Advances in Neural Information Processing Systems 36 (2024).
[80] H. Wu, T. Hu, H. Luo, J. Wang, and M. Long, Solving high-dimensional PDEs with latent spectral models, in Proceedings of the 40th International Conference on Machine Learning (2023) pp. 37417–37438.
[81] O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2015) pp. 234–241.
[82] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016) pp. 770–778.
[83] R. Wang, K. Kashinath, M. Mustafa, A. Albert, and R. Yu, Towards physics-informed deep learning for turbulent flow prediction, in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2020) pp. 1457–1466.
[84] M. Raissi, Deep hidden physics models: Deep learning of nonlinear partial differential equations, The Journal of Machine Learning Research 19, 932 (2018).
[85] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, A. Stuart, K. Bhattacharya, and A. Anandkumar, Multipole graph neural operator for parametric partial differential equations, Advances in Neural Information Processing Systems 33, 6755 (2020).
[86] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, et al., SciPy 1.0: fundamental algorithms for scientific computing in Python, Nature Methods 17, 261 (2020).
[87] B. O. Koopman, Hamiltonian systems and transformation in Hilbert space, Proceedings of the National Academy of Sciences 17, 315 (1931).
[88] R. Abraham, J. E. Marsden, and T. Ratiu, Manifolds, tensor analysis, and applications, Vol. 75 (Springer Science & Business Media, 2012).
[89] C. Chicone and Y. Latushkin, Evolution semigroups in dynamical systems and differential equations, Vol. 70 (American Mathematical Society, 1999).