Koopman Neural Operator As A Mesh-Free Solver of Non-Linear Partial Differential Equations∗
Wei Xiong,† Xiaomeng Huang,† Ziyang Zhang,‡ Ruixuan Deng,§ Pei Sun,¶ and Yang Tian∗∗
The lack of analytic solutions of diverse partial differential equations (PDEs) gives birth to a series of computational techniques for numerical solutions. Although numerous latest advances are accomplished in developing neural operators, a kind of neural-network-based PDE solver, these solvers become less accurate and explainable while learning long-term behaviors of non-linear PDE families. In this paper, we propose the Koopman neural operator (KNO), a new neural operator, to overcome these challenges. With the same objective of learning an infinite-dimensional mapping between Banach spaces that serves as the solution operator of the target PDE family, our approach differs from existing models by formulating a non-linear dynamic system of equation solution. By approximating the Koopman operator, an infinite-dimensional operator governing all possible observations of the dynamic system, to act on the flow mapping of the dynamic system, we can equivalently learn the solution of a non-linear PDE family by solving simple linear prediction problems. We validate the KNO in mesh-independent, long-term, and zero-shot predictions on five representative PDEs (e.g., the Navier-Stokes equation and the Rayleigh-Bénard convection) and three real dynamic systems (e.g., global water vapor patterns and western boundary currents). In these experiments, the KNO exhibits notable advantages compared with previous state-of-the-art models, suggesting the potential of the KNO in supporting diverse science and engineering applications (e.g., PDE solving, turbulence modelling, and precipitation forecasting).
(2) Neural-network-based solvers. To revolutionize the computational techniques of PDE solving, three types of neural-network-based solvers have been proposed to approximate or enhance the classic ones in a fast manner [10, 11]:

(2a) Mesh-dependent and finite-dimensional operators. The first type of solvers approximates the solution operator as a parameterized neural network between finite Euclidean spaces after discretizing domains D and T into x and y meshes, i.e., Q_θ : R^x × R^y × Θ → R^x × R^y [12–14]. These solvers are mesh-dependent and require fine-tuning on different values of n, leading to limited generalization capacities [5].

(2b) Neural finite element methods. The second type of solvers directly parameterizes equation solution γ(·) as a neural network, which equivalently gives rise to Q_θ : D × T × Θ → R [10, 15–18]. Although these solvers are mesh-independent and accurate, they are limited to learning a certain instance of the PDE rather than the entire family [5]. Therefore, similar to the classic numerical ones, these solvers require new network designs and training whenever the instance is changed. Meanwhile, most of these solvers are not applicable to the cases where the underlying PDE remains unknown (see an exception in the finite element network [18], which supports spatio-temporal forecasting on real data).

(2c) Neural operators. The third type of solvers is developed to learn a mesh-independent and infinite-dimensional solution operator with neural networks, i.e., Q_θ : Φ × Θ → Γ [5, 9, 19–23]. These solvers overcome the dependence on meshes by learning network parameters in a manner applicable to different discretizations. Because these solvers learn the solution operator directly, they only need to be trained once for a target PDE family. Generating equation solution γ(·) of different instances of the PDE family only requires a forward pass of networks, which is computationally favorable [5, 9]. Although neural operators are initially not competitive with other neural-network-based solvers because evaluating kernel integral operators is costly, the latest approach, named the Fourier neural operator [9], resolves this limitation by fast Fourier transform.

Compared with classic numerical solvers, neural-network-based solvers, especially neural operators, are more efficient in dealing with science and engineering questions where PDEs are complicated [5, 9]. Therefore, our research primarily focuses on neural operator designs.

C. Existing partial differential equation solvers face challenges

Despite substantial progress achieved by neural operators in theoretical foundations (e.g., Ref. [22]), approximator designs (e.g., Refs. [9, 23]), and applications (e.g., Ref. [24]), there still remain numerous challenges in existing neural operator solvers, among which a critical one lies in the limited capacity of existing models to learn the long-term dynamics of complicated PDEs.

To understand this challenge, let us consider a Green function, J_ϕ : (D × T) × (D × T) → R, associated with Eqs. (1-3) [5]

γ(x_{t+ε}) = ∫_{D×{t}} J_ϕ(x_t, y_t) η(y_t) dy_t,  ∀ x_t ∈ D × {t}.   (4)

The initial idea of neural operators [5] is to parameterize this Green function as a kernel integral operator (i.e., see the integral term related to κ_θ presented below) and define an iterative update strategy of neural networks

γ̂(x_{t+ε}) = σ( W γ̂(x_t) + ∫_{D×{t}} κ_θ(x_t, y_t, ϕ(x_t), ϕ(y_t)) γ̂(y_t) dy_t ),  ∀ x_t ∈ D × {t}.   (5)
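To make this update concrete, the sketch below implements one pass of Eq. (5) in PyTorch, evaluating the kernel integral as a learned multiplication on truncated Fourier modes in the spirit of the fast-Fourier-transform trick of Ref. [9]. The class name SpectralUpdate, the 1-dimensional setting, the tensor shapes, and the choice of tanh(·) as σ are illustrative assumptions, not the reference implementation of Refs. [5, 9].

import torch
import torch.nn as nn

class SpectralUpdate(nn.Module):
    # One pass of Eq. (5): v -> sigma(W v + kernel integral), where the kernel
    # integral is evaluated as a multiplication on truncated Fourier modes.
    def __init__(self, channels, modes):
        super().__init__()
        self.modes = modes  # number of retained low-frequency modes
        # learned spectral weights standing in for kappa_theta (an assumption)
        self.R = nn.Parameter(0.02 * torch.randn(channels, channels, modes, dtype=torch.cfloat))
        self.W = nn.Conv1d(channels, channels, 1)  # pointwise linear term W

    def forward(self, v):  # v: (batch, channels, s) sampled on an s-point mesh
        v_hat = torch.fft.rfft(v)                  # Fourier transform along the mesh
        out_hat = torch.zeros_like(v_hat)
        out_hat[..., :self.modes] = torch.einsum(
            "iom,bim->bom", self.R, v_hat[..., :self.modes])  # kernel as a spectral multiplier
        kv = torch.fft.irfft(out_hat, n=v.size(-1))           # back to physical space
        return torch.tanh(self.W(v) + kv)          # sigma(W v + kernel integral)

Because the spectral weights act on frequency modes rather than grid points, the same layer accepts inputs discretized on any mesh size, which is the property exploited throughout this line of work.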
linearize the original dynamic system in an appropriate space [31]. For instance, let us take the case where system γ_t is autonomous (i.e., θ_t^{t′} can be simplified as θ_{t′−t}) as a simple illustration. The family of Koopman operators K_ε : G(R^{d_γ} × T) → G(R^{d_γ} × T), parameterized by time difference ε, is defined based on a set of observation functions (also named measurement functions) G(R^{d_γ} × T) = {g | g : R^{d_γ} × T → C} [31]

K_ε g(γ_t) = g(θ_ε(γ_t)) = g(γ_{t+ε}),  ∀ γ_t ∈ R^{d_γ} × T.   (9)

Given an appropriate space defined by G(R^{d_γ} × T), we can linearize the dynamics of γ_t via Eq. (9). This idea has seen notable success in fluid dynamics [37], robotics [38], plasma physics [39], and neuroscience [40].

Different from existing machine-learning-based Koopman operator models that either are limited to autonomous dynamic systems (e.g., the case described by Eq. (9)) [41–44] or require a priori knowledge about the eigenvalue spectrum (e.g., the numbers of real and complex eigenvalues) of the Koopman operator for non-autonomous dynamic systems [45], our framework concerns a more general case where we consider a time-dependent Koopman operator applicable to both non-autonomous and autonomous dynamic systems [46]

K_t^{t+ε} g(γ_t) = g(θ_t^{t+ε}(γ_t)) = g(γ_{t+ε}),  ∀ t ≤ t + ε ∈ T.   (10)

As shown in Eq. (10), this Koopman operator governs a time-dependent linear evolution flow of g(γ_t) in a space defined by G(R^{d_γ} × T)

∂_t g(γ_t) = lim_{ε→0} [K_t^{t+ε} g(γ_t) − g(γ_t)] / ε.   (11)

In mathematics, the adjoint of the Koopman operator defined by Eq. (9) is the Perron-Frobenius operator of dynamic systems [47], while the adjoint of the associated Lie operator (see A for details) is the Liouville operator of Hamiltonian dynamics [48, 49]. These properties relate our approach with well-known theories about linear representation of dynamic systems in statistical physics and quantum mechanics [50].

In parallel, the Koopman operator can be connected to the Lax pair (M, N) of the PDE described by Eqs. (1-3) [51]

M = D_x^n + α γ(x_t) I,  α ∈ C,   (12)
M ψ(x_t) = λ ψ(x_t),  λ ∈ C,   (13)
∂_t ψ(x_t) = N ψ(x_t),   (14)

in which D_x^n is the n-th total derivative operator and I is an identity operator. We notice that Eq. (13) actually defines an eigenvalue problem associated with operator M at moment t. By calculating the time derivative of Eq. (13), we can observe a relation between linear operators M and N

(∂_t M + MN − NM) ψ(x_t) = ∂_t λ ψ(x_t).   (15)

This relation implies

∂_t M + [M, N] = 0,   (16)

where [M, N] = MN − NM stands for the commutator of operators [52]. Given Eqs. (12-16), we can combine them with Eq. (11) to identify a relation between operator N and the time-dependent Koopman operator K_t^{t+ε}

ψ(D × {t}) = g(γ_t)  ⇒  N = lim_{t+ε→t} [K_t^{t+ε} g(γ_t) − g(γ_t)] / ε.   (17)

Note that Eq. (17) still holds when the evolution of equation solution γ_t is autonomous (i.e., we can consider a time-invariant Koopman operator K in Eq. (17) directly).

Therefore, the linearization of g(γ_t) via a Koopman operator is closely related to the Lax pair and can be understood in the aspect of the inverse scattering transform of integrable PDEs [51]. This relation has been comprehensively studied in mathematics and physics [53–56], ensuring the validity and effectiveness of using the Koopman operator theory during PDE solving.
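As a toy numerical illustration of Eq. (9) (our own example, not part of the derivation), consider an autonomous linear flow γ_{t+ε} = A γ_t: on the invariant subspace of linear observables g(γ) = c · γ, the Koopman operator is represented by the matrix A^T, so Eq. (9) can be checked directly:

import numpy as np

# Toy check of Eq. (9) for the autonomous linear flow gamma_{t+eps} = A gamma_t.
# For a linear observable g(gamma) = c . gamma, we have
# (K_eps g)(gamma) = g(A gamma) = (A^T c) . gamma, so A^T represents K_eps
# on the subspace of linear observables.
rng = np.random.default_rng(0)
A = 0.5 * rng.normal(size=(3, 3))   # hypothetical flow map theta_eps
c = rng.normal(size=3)              # coefficients of the observable g
gamma_t = rng.normal(size=3)
lhs = (A.T @ c) @ gamma_t           # (K_eps g)(gamma_t) via the matrix A^T
rhs = c @ (A @ gamma_t)             # g(gamma_{t+eps}) = g(theta_eps(gamma_t))
assert np.isclose(lhs, rhs)         # Eq. (9): K_eps g(gamma_t) = g(gamma_{t+eps})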
lim_{m→∞} ∫_{G(R^{d_γ}×T)} ‖K̂_t^{t+ε} h(γ_t) − K_t^{t+ε} h(γ_t)‖_F dμ = 0,  ∀ h(·) ∈ G(R^{d_γ} × T),   (23)

if the original Koopman operator is bounded and K happens to be its invariant subspace (or simply n → ∞). Notion μ denotes a measure on G(R^{d_γ} × T) and ‖ · ‖_F is the Frobenius norm. Therefore, we are expected to approximate the original Koopman operator via a restricted counterpart such that

H_{m×n}(k + 1) = K̂_{kε}^{(k+1)ε} H_{m×n}(k),  ∀ k = 1, . . . , n,   (24)

where H_{m×n}(k) denotes the k-th column of H_{m×n}.

A challenge lies in that all the derivations presented above (i.e., the Krylov subspace method, Hankel matrix representation, and the Galerkin projection) are initially proposed for autonomous systems [31, 57–59]. Although we can formulate these mathematical expressions for a time-dependent Koopman operator as shown in Eqs. (18-24), this formulation inevitably requires an online optimization in computational implementation because the concerned Koopman operator may change across time.

The expensive online optimization is not favorable for solving PDEs, persuading us to consider an alternative solution. As suggested in Ref. [63], we can assume a constant form of the Koopman operator, K̂_t^{t+ε}, only when ε is sufficiently small (i.e., the time interval is small enough such that the changes of K̂_t^{t+ε} during this duration are negligible). Given this assumption, we can further consider a condition under which the dynamic system of g(γ_t) is approximately ergodic (i.e., the observation of equation solution, g(γ_t), eventually visits all possible states in R^{d_γ} as t → ∞, thus the proportion of time that g(γ_t) spends on a particular state equals the probability of this state) [57, 64]. This condition makes the time-averaging approximate the true expectation of an observable as time approaches infinity. Under this condition, we can define an expectation of the Koopman operator controlled by time difference ε

K_ε = lim_{t→∞} (1/t) ∫_{[0,t)} g(γ_τ)^{−1} g(γ_{τ+ε}) dτ ≃ argmin_{P ∈ R^{d_γ+1}} Σ_{k=1}^{n−1} ‖H_{m×n}(k + 1) − P H_{m×n}(k)‖_F.   (25)

Given a fixed ε, the Koopman operator K_ε : G(R^{d_γ} × T) → K in Eq. (25) can be understood as the time-average of K̂_t^{t+ε} at different t. A representation of K_ε only requires offline optimization and is computationally favorable for solving PDEs. Therefore, the offline optimization of the Koopman operator is required to support a high time resolution (i.e., a small ε). Meanwhile, we need to find an appropriate design of the observation function g(·) such that the ergodicity condition holds with reasonable errors.
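A minimal NumPy sketch of the offline estimate in Eq. (25) reads as follows: it builds the delay-embedded Hankel matrix of a sampled observable and solves the argmin by ordinary least squares. The function name fit_koopman and the toy sine observable are our own assumptions for illustration.

import numpy as np

def fit_koopman(observations, m):
    # Offline estimate of K_eps in Eq. (25): stack delay-embedded windows of
    # the observable sequence into an m x n Hankel matrix H, then solve
    # min_P sum_k ||H(:, k+1) - P H(:, k)||_F by linear least squares.
    n = observations.size - m + 1
    H = np.column_stack([observations[k:k + m] for k in range(n)])
    X, Y = H[:, :-1], H[:, 1:]                  # column pairs (H(k), H(k+1))
    P, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)  # solves X^T P^T = Y^T
    return P.T                                   # m x m approximation of K_eps

# usage on a toy observable sequence (hypothetical data)
g = np.sin(0.1 * np.arange(200))
K_eps = fit_koopman(g, m=16)
window = g[100:116]        # delay-embedded state at one time
pred = K_eps @ window      # approximates the next column g[101:117]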
Although these two requirements make analytic derivations or algorithmic approaches highly difficult (e.g., see Ref. [63] for the usage of an integrated framework of offline low-rank decomposition and online reduced-order modeling in dealing with the first requirement), we suggest the possibility for a neural-network-based approach to satisfy them in practice.

D. Neural network architectures of Koopman neural operator

The non-trivial part of the idea discussed above is how to define an effective neural network architecture to realize a Koopman-operator-based PDE solving pipeline. Below, we introduce the Koopman neural operator (KNO) as a possible architecture design, whose open-source PyTorch toolbox is released in Ref. [65]. In general, a KNO architecture consists of the following parts:

(1) Part 1: Observation. Given an input ϕ_t = ϕ(D × {t}) of the PDE in Eqs. (1-3), we first transform it as g(γ̂_t) in space G(R^{d_γ̂} × T) by an encoder (a single non-linear layer with tanh(·) activation function) that represents an observation function g(·). Please see Figure 1 for illustrations.

(2) Part 2: Fourier transform. We apply the Fourier transform to map g(γ̂_t) as g_F(γ̂_t) = F ∘ g(γ̂_t) and parameterize the subsequent parts of our network in the Fourier space. Similar to Ref. [9], g_F(γ̂_t) is computed by fast Fourier transform, where we truncate the Fourier series at ω, a maximum number of frequency modes. On the one hand, as suggested in Ref. [9], this procedure offers a convenient computation of the iterative update strategy shown in Eq. (5). On the other hand, the truncated Fourier transform is important for our model because it subdivides the system into two parts. The first part corresponds to the remaining low-frequency modes after truncation, while the second part consists of the truncated high-frequency modes. Although there is no theoretical guarantee, low-frequency modes are generally more stable than volatile high-frequency modes in practice unless the system is purely random. In other words, it is more possible for low-frequency modes to satisfy the ergodicity condition with acceptable errors than high-frequency modes. Therefore, by filtering out high-frequency modes and learning the Koopman operator only on low-frequency modes in Part 3, we can empirically reduce the difficulty of data-driven Koopman operator approximation. Meanwhile, we design an extra part (Part 5) in our model to extract the high-frequency modes of g(γ̂_t) to complement the lost information about high-frequency fluctuations. Thus, our model can also capture the volatile parts of real systems that are less ergodic and difficult to predict using the Koopman operator. Please see Figure 1 for details.

(3) Part 3: Hankel representation and offline Koopman operator. Given g_F(γ̂_t) for every t ∈ εN+, we set a dimension of delay-embedding, m ∈ N, to define a Hankel matrix Ĥ_{m×n} of g_F(γ̂_t) (note that n equals the number of accessible samples). To ensure that the space spanned by the Hankel matrix successfully approximates the invariant sub-space of the target Koopman operator, we train an o × o linear layer to learn the r-th power of the Koopman operator, K_ε^r : G(R^{d_γ̂} × T) → K̂, following Eqs. (24-25). Based on the learned operator K_ε^r, we can predict the future state of the latest observable g_F(γ̂_{(m+n−1)ε}) as g_F(γ̂_{(m+n)ε}) = [K_ε^r Ĥ_{m×n}(n)]^T(m), where notion T denotes the transpose of a matrix. Here we choose to learn the r-th power of K_ε because a flexible value of r ∈ N offers opportunities for us to adjust the time resolution. Specifically, given the a priori time resolution ε pre-determined by the data set, we can control the internal time resolution of the learned Koopman operator by setting a prediction length r. Based on this setting, predicting the evolution of g_F(γ̂_t) during a duration of ε requires the Koopman operator to iterate r times in the Fourier space (i.e., each iteration only corresponds to a short period of ε/r). In other words, even though the time resolution ε of the sampled data may not be high enough, we can define a non-one value of r to improve the actual time resolution on which the Koopman operator acts. This procedure has benefits in practice because the changes of the time-dependent Koopman operator are more negligible during a smaller period, which makes the offline optimization less challenging. Please see Figure 1 for illustrations of Part 3.

(4) Part 4: Inverse Fourier transform. Given each predicted state g_F(γ̂_{(m+n)ε}) in Part 3, we transform it from the Fourier space to G(R^{d_γ̂} × T) by an inverse Fourier transform, i.e., g(γ̂_{(m+n)ε}) = F^{−1} ∘ g_F(γ̂_{(m+n)ε}). Note that high-frequency fluctuations in the system have been filtered out in Part 2 and cannot be directly recovered by the inverse Fourier transform (the complement of the lost information is realized by Part 5). Please see Figure 1 for the instances of Part 4.

(5) Part 5: High-frequency information complement. According to the Fourier analysis implemented on feature maps, convolutional layers can amplify high-frequency components [66]. Therefore, we train a convolutional network C on the outputs of Part 1 to extract their high-frequency information, denoted by g_C(γ̂_t), as a complement of Parts 2-4. Meanwhile,

FIG. 1. Conceptual illustrations of neural network architectures of the KNO. Note that the layout of every part is slightly reorganized to offer a clear version.

γ̂_{t′} = g^{−1}( [ F^{−1} ∘ K_ε^r ∘ F ∘ g(γ̂_{[t−mε,t]}) + C ∘ g(γ̂_{[t−mε,t]}) ]^T (m) ),   (26)

where the first term inside the brackets corresponds to Parts 1-4 and the second term corresponds to Part 1 and Part 5.
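Assembling Parts 1-5, a self-contained PyTorch sketch of the forward pass in Eq. (26) might look as follows. The module name MiniKNO, the 1-dimensional setting, and the layer sizes (defaults echo o = 16 and f = 12 used later in Sec. III) are illustrative assumptions; the official implementation is the KoopmanLab toolbox [65].

import torch
import torch.nn as nn

class MiniKNO(nn.Module):
    # Sketch of Eq. (26): encode (Part 1), iterate K_eps^r on truncated Fourier
    # modes (Parts 2-3), decode (Part 4), and add a convolutional
    # high-frequency complement (Part 5).
    def __init__(self, o=16, modes=12, r=4):
        super().__init__()
        self.modes, self.r = modes, r
        self.encoder = nn.Sequential(nn.Conv1d(1, o, 1), nn.Tanh())  # observation g
        self.K = nn.Parameter(torch.eye(o, dtype=torch.cfloat))      # o x o Koopman layer
        self.C = nn.Conv1d(o, o, 3, padding=1)   # high-frequency complement
        self.decoder = nn.Conv1d(o, 1, 1)        # inverse observation g^{-1}

    def forward(self, phi):                              # phi: (batch, 1, s)
        g = self.encoder(phi)                            # Part 1: observation
        g_hat = torch.fft.rfft(g)[..., :self.modes]      # Part 2: keep low modes
        for _ in range(self.r):                          # Part 3: r small Koopman steps
            g_hat = torch.einsum("oi,bim->bom", self.K, g_hat)
        low = torch.fft.irfft(g_hat, n=phi.size(-1))     # Part 4: inverse FFT
        return self.decoder(low + self.C(g))             # Part 5 + inverse observation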
tasks to validate its effectiveness.

III. EXPERIMENTS

A. Experiment design

1. Data set information

We implement our experiments on the 1-dimensional Bateman–Burgers equation [71], the 2-dimensional Navier-Stokes equation [72], the 2-dimensional Rayleigh-Bénard convection (i.e., a kind of turbulent convection) [73], the 2-dimensional shallow-water equations [74], the infrared satellite imagery of water vapor in the Storm EVent ImagRy data set [75], and the western boundary current data collected by the E.U. Copernicus Marine Environment Monitoring Service [76, 77]. The first four data sets correspond to typical problems in turbulence analysis. The last two data sets contain real data governed by unknown PDEs, which offer opportunities to explore the applicability of the KNO in real precipitation forecasting (e.g., storm and rainfall) and western boundary current analysis tasks (e.g., the dynamic modelling of Kuroshio and Gulf Stream currents). Please see B for a full data description.

2. Experiment designs and baselines

We conduct five experiments to validate our model:

(1) Mesh-independent experiment. As suggested in previous works [5, 9, 19–23], neural operator models are expected to be mesh-independent because they learn the solution operator of an entire PDE family. Therefore, we design an experiment to validate the mesh-independent property of the KNO.

(2) Long-term prediction experiment. To validate the long-term prediction capacity of the KNO, we design prediction tasks on eight data sets, including five representative PDE solving problems in turbulence analysis and three complicated dynamic system modelling problems.

(3) Zero-shot prediction experiment (discretization granularity). Following the idea of Ref. [9], we validate the generalization ability of the KNO by testing it on untrained discretization granularity (e.g., in a way similar to super-resolution [9]).

(4) Zero-shot prediction experiment (prediction interval). Apart from the generalization on untrained discretization granularity, we also validate the generalization capacity of the KNO on untrained prediction intervals (e.g., zero-shot temporal interpolation and extrapolation).

(5) Ablation experiment. To demonstrate the significance of the learned Koopman operator in the KNO, we implement an ablation experiment.

In our experiments, the high-frequency complement can be either realized by a single convolutional layer, C_s, or by a simple tripartite convolutional network, C_t. Besides the KNO, we implement the following models for comparison: the Fourier neural operator (FNO) [9], the U-shaped neural operator (UNO) [78], the convolutional neural operator (CNO) [79], the latent spectral model (LSM) [80], the U-Net [81], and the residual neural network (ResNet) [82]. Among these baseline models, the FNO [9] is an extensively validated neural operator model. The UNO [78] and the CNO [79] represent the performance of recently proposed neural operators. The LSM [80] serves as an instance of the neural network parameterization of spectral methods. The U-Net [81] and the ResNet [82] represent the classic ideas of spatio-temporal modeling in the field of deep learning. Other common neural network models in deep learning, such as the convolutional LSTM (ConvLSTM) [83] and the deep hidden physics model (DHPM) [84], are shown to be less efficient on complex dynamic systems [83] and are not considered in our research. Each model is trained by its default optimizer (e.g., the KNO and the FNO are trained by the Adam optimizer). The experiments on the first four PDE data sets are implemented on a single Nvidia V100 GPU with 32GB memory. The experiments on the last two large-scale real data sets are implemented on a single Nvidia A100 GPU with 40GB memory.

B. Experiment results

1. Mesh-independent experiment

Our mesh-independent experiment is implemented on the data of the 1-dimensional Bateman–Burgers equation generated under different discretization conditions (i.e., spatial resolutions of meshes). The data with the highest resolution is generated following the Gaussian initialization introduced in Ref. [9]. The data with lower resolutions are directly down-sampled from the data with higher resolutions.

We choose the FNO as a baseline for comparison. As for other baseline models less efficient than the FNO in the mesh-independent experiment, such as the graph neural operator (GNO) [5] and the multipole graph neural operator (MGNO) [85] (see results reported by Ref. [9]), we no longer discuss them for convenience. We implement multiple versions of the KNO with different hyper-parameters (e.g., operator size o, frequency mode number f, the iteration number r of the Koopman operator, and the relative weight λ that controls the contributions of low- and high-frequency information. Please see Sec. II D for the meanings of these parameters). Under every condition, these models are trained on 1000 samples and conduct 1-second forward prediction (i.e., t′ − t = 1) on 200 samples for performance evaluation.

FIG. 2. Experiment results of mesh-independence. (a-d) respectively present the results under different conditions of high-frequency complement designs and λ in the KNO. Here the high-frequency complement is either realized by a single convolutional layer, C_s, or by a simple tripartite convolutional network, C_t. Parameter λ controls the contributions of low- and high-frequency information. Model performance is measured using the root mean square error (RMSE). In the legends of (a-d), the numbers within brackets indicate the parameter settings and model sizes, where the last number corresponds to the model size and all other numbers denote parameters. Notion w in the FNO stands for the width parameter (i.e., the dimension of latent space) [9]. Note that the results of the FNO are not repeatedly shown in (b-d) since the adjustment of λ has no relation with the FNO. Therefore, the performances of the KNO models in (b-d) can be directly compared with the performance of the FNO in (a).
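Before reporting results, we note that the mechanical reason for mesh independence can be demonstrated with the hypothetical MiniKNO sketch from Sec. II D: because the encoder is pointwise and the Koopman layer acts on truncated frequency modes, the same weights accept inputs on any grid. The grid sizes below are arbitrary illustrations.

import torch

model = MiniKNO()                 # hypothetical module from the sketch after Eq. (26)
coarse = torch.randn(1, 1, 64)    # low-resolution discretization
fine = torch.randn(1, 1, 256)     # high-resolution discretization
out_coarse = model(coarse)        # shape (1, 1, 64)
out_fine = model(fine)            # shape (1, 1, 256), produced by the same weights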
During training, the batch size is fixed as 64. The learning rate is initialized at 0.001 and is halved every 100 epochs. The weights of prediction and reconstruction in the loss function are defined as α = 5 and β = 0.5, respectively.
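For concreteness, this training configuration can be sketched as follows, reusing the hypothetical MiniKNO from the sketch after Eq. (26); the dummy data loader and the epoch count are placeholders, not the released training script.

import torch

model = MiniKNO()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# halve the learning rate every 100 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)
mse, alpha, beta = torch.nn.MSELoss(), 5.0, 0.5
# stand-in for real mini-batches of size 64: (input frame, next frame)
loader = [(torch.randn(64, 1, 64), torch.randn(64, 1, 64))]

for epoch in range(300):                           # epoch count is a placeholder
    for x_t, x_next in loader:
        pred = model(x_t)                          # prediction branch
        recon = model.decoder(model.encoder(x_t))  # g^{-1}(g(x_t)) should recover x_t
        loss = alpha * mse(pred, x_next) + beta * mse(recon, x_t)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()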
As illustrated in Figure 2, the KNO achieves an almost constant prediction error on every resolution. Compared with the FNO, the prediction errors of KNO models are generally smaller under different conditions. Notably, a one-unit KNO only requires about 5 × 10^3 parameters to achieve similar performance to a one-unit FNO that has more than 1.4 × 10^5 parameters. When the model size of the KNO increases to 2 × 10^4 or 4 × 10^4, the KNO can significantly outperform the FNO. These results suggest the possibility for the KNO to be better at maintaining the balance between accuracy and efficiency in PDE solving (i.e., the KNO achieves higher accuracy with fewer parameters). Meanwhile, we notice that a KNO model with a balance between high- and low-frequency information (i.e., λ = 0.5) can maintain more robust performance across different resolutions (i.e., the fluctuations of performance are relatively smaller). Therefore, we speculate that λ = 0.5 is an optimal empirical choice in dealing with cross-resolution prediction tasks. This speculation is also verified in our subsequent zero-shot experiment in Figure 4.

2. Long-term prediction experiment

Our long-term prediction experiment is implemented on all 2-dimensional data sets. Among these implemented data sets, the 2-dimensional Navier-Stokes equation can be defined with three viscosity cases, i.e., ν ∈ {10−3, 10−4, 10−5}. On all data sets, the trained models are required to predict 10 time frames in the future (i.e., t′ = t + zε, where z ∈ {1, . . . , 40} is selected for the 2-dimensional Rayleigh-Bénard convection data while z ∈ {1, . . . , 10} is set for all other data sets). Note that the meaning of the time difference ε between adjacent time frames varies across different data sets. For the 2-dimensional Navier-Stokes equation, the 2-dimensional Rayleigh-Bénard convection, and the 2-dimensional shallow-water equations, the time difference denotes one second. For the western boundary current data (e.g., the Kuroshio and Gulf Stream currents), the time difference corresponds to one day. For the infrared satellite imagery of water vapor, the time difference corresponds to a five-minute interval. Therefore, ten or fifty time frames can already span a long interval of physics time to make the concerned systems exhibit long-term and highly non-linear evolution.

Here we present the details of experiment implementation. For the Navier-Stokes equation, the training set contains 1000 samples. The testing set contains 200 samples for ν ∈ {10−3, 10−4} while it includes 100 samples for ν = 10−5. For the Rayleigh-Bénard convection, models are trained and tested on 1600 and 200 samples, respectively. For the shallow-water equations, the training and testing sets respectively include 1600 and 200 samples. In all western boundary current data prediction tasks, models are trained on 1600 samples and tested on 200 samples. For the water vapor prediction task, models are tested on 400 samples after being trained on 3200 samples. To show that the KNO is not limited to small model designs, we present a possible realization of a relatively large KNO model (defined with o = 16, f = 12, λ = 0.5, and r = 6. High-frequency information complement is realized using the simple tripartite convolutional network C_t. Model size is 19871979). The compared models on all data sets include the FNO (model size is 926517), the UNO (model size is 30478033), the CNO (model size is 2667034), the LSM (model size is 19188162), the U-Net (model size is 24950491), and the ResNet (model size is

FIG. 3. Experiment results of long-term prediction. (a-h) show the performances of all models on different data sets, where each prediction step creates a time frame. (i) visualizes the prediction results (the twenty-sixth time frame after the initial condition is selected as an instance for illustration) and the associated errors of all models on the Rayleigh-Bénard convection. Note that color bars in (i) are shared by all models. Therefore, the results can be directly compared across different models. (j) presents the prediction results (the third time frame after the initial condition) and the associated errors of the KNO on the Kuroshio and Gulf Stream currents.
FIG. 4. Results of the zero-shot experiment (discretization granularity) on the 1-dimensional Bateman–Burgers equation. (a-d) respectively present the results under different conditions of high-frequency complement designs and λ in the KNO. Here the high-frequency complement can be either realized by a single convolutional layer, C_s, or by a simple tripartite convolutional network, C_t. Parameter λ controls the contributions of low- and high-frequency information.
FIG. 5. Results of the zero-shot experiment (prediction interval) on the 2-dimensional Rayleigh-Bénard convection data. (a-b) show the results of the first kind of zero-shot prediction, where models are supervised by the time frames separated by 2 or 4 untrained frames during training, respectively. The result of the ResNet is absent in (b) because the ResNet does not converge well during training. (c-d) present the results of the second type of zero-shot prediction, where there are 20 or 30 unsupervised time frames during testing. Note that the lines shown in (a-d) are smoothed using the B-spline basis offered by SciPy [86] because the raw performances of the ResNet and the CNO are oscillating. (e) visualizes the prediction results (the twenty-sixth time frame after the initial condition is selected as an instance for illustration) and the associated errors of all models during the zero-shot experiment shown in (d).
ideal accuracy in both kinds of zero-shot experiments. In Figure 5(e), we visualize instances of the prediction results of all models during the zero-shot extrapolation experiment with 30 unsupervised time frames. Because these instances correspond to the same system shown in Figure 3(i), we can directly compare between these two cases and find that the long-term prediction of the KNO is still robust even without supervision, while the performances of other models significantly reduce in the zero-shot case. These results suggest that the evolution patterns captured by the approximated Koopman operator are suitable for dynamic system modelling.

5. Ablation experiment

To prove that the learned Koopman operator is a key factor in ensuring the effectiveness of the KNO, we design a simple ablation experiment to compare the prediction performances of the KNO framework with and without the reconstruction loss term. In the former case, the temporal prediction in the model mainly depends on the weight matrix of the linear layer, which functions as the r-th power of a Koopman operator. Other optimizable parts of the model will be driven to learn the observation function and its inverse by the reconstruction loss. In the latter case, the whole model is driven to focus only on the temporal prediction and the linear layer does not necessarily function in a Koopman-operator-like manner.

We use the data of the 1-dimensional Bateman–Burgers equation to implement the ablation experiment. For convenience, the high-frequency complement in the KNO is realized by a single convolutional layer, C_s, with λ = 0.5. As shown in Table I, the KNO defined with an effective reconstruction process generally outperforms the KNO framework only driven by the prediction loss.

IV. CONCLUSION

In summary, we develop the Koopman neural operator (KNO), a mesh-independent neural-network-based solver of partial differential equations. The basic code of the KNO is provided in C and the official toolbox of the KNO can be seen in Ref. [36]. Compared with the existing state-of-the-art approaches in the field of neural-network-based PDE solving, such as the Fourier neural operator [9], the U-shaped neural operator [78], the convolutional neural operator [79], and the latent spectral model [80], our proposed KNO exhibits a higher capacity to
capture the long-term evolution of PDEs or real dynamic systems. Meanwhile, it generally maintains accuracy across different mesh resolutions. This property suggests the potential of the KNO to be considered as a basic unit for constructing large-scale frameworks to solve complex physics equations with non-linear dynamics or predict future states of real dynamic systems. Moreover, the robust mesh-independence property enables the KNO to be trained on low-resolution data before being applied on high-resolution data. This property may create more possibilities to reduce the computational costs of PDE solving.

Although our work primarily focuses on the Koopman operator, one can also model the Lie operator in application.

2021YFC3101600, 2020YFA0608000, 2020YFA0607900). Authors are grateful for the technical support provided by Mr. Hao Wu, who studies at the School of Computer Science and Technology, University of Science and Technology of China. Meanwhile, authors appreciate the help of Mr. Yuan Gao and Mr. Shuyi Zhou, who study at the Department of Earth System Science, Tsinghua University, during data preparation.

Appendix A: The associated Lie operator of the Koopman operator

Appendix B: Details of the implemented data sets

In our experiments, we consider the 1-dimensional Bateman–Burgers equation [71], the 2-dimensional Navier-Stokes equation [72], the 2-dimensional Rayleigh-Bénard convection [73], the 2-dimensional shallow-water equations [74], the infrared satellite imagery of water vapor in the Storm EVent ImagRy data set [75], and the western boundary current data collected by the E.U. Copernicus Marine Environment Monitoring Service [76, 77].
1. Bateman–Burgers equation.

The 1-dimensional Bateman–Burgers equation is

∂_t u(x_t) + ∂_x (u²(x_t)/2) = ν ∂_{xx} u(x_t),  x_t ∈ (0, 1) × (0, 1],   (B1)
u(x_0) = u_I,  x_0 ∈ (0, 1) × {0},   (B2)

where u(·) denotes the velocity, notion u_I is a periodic initial condition, and parameter ν is the viscosity coefficient. The data set of Eqs. (B1-B2) is provided by Ref. [9]. The learning objective of this equation is u(·).

2. Navier-Stokes equation.

Mathematically, the incompressible 2-dimensional Navier-Stokes equation in a vorticity form is defined as

∂_t ω + u · ∇ω = ν ∇²ω + ∇ × f,   (B3)
∇ · u = 0,   (B4)

where ω is the scalar field of vorticity, notion u defines the vector field of velocity, and f denotes a time-independent forcing term. The viscosity coefficient is set as ν ∈ {10−3, 10−4, 10−5}. Similar to the situation of the Bateman–Burgers equation, the data of Eqs. (B3-B4) is provided by Ref. [9]. In experiments, the learning target is the vorticity field ω.

3. Rayleigh-Bénard convection equation.

The 2-dimensional Rayleigh-Bénard convection is defined as [73]

∂_t u + u · ∇u + (1/ρ(θ)) ∇p = ν ∇²u + f,   (B5)
∂_t θ + u · ∇θ = κ ∇²θ,   (B6)
∇ · u = 0,   (B7)
ρ(θ) = ρ(θ_0)[1 − a(θ − θ_0)],   (B8)

where p is the pressure, scalar θ is the temperature, coefficient κ measures the heat conductivity, and ρ(θ) is the density of fluids given the temperature θ. Notion θ_0 is the initial temperature and a denotes the thermal expansion coefficient. The data of the 2-dimensional Rayleigh-Bénard convection is generated by Ref. [83] using the Boussinesq approximation approach. In our experiments, the learning target is the magnitude of velocity u (i.e., a scalar field).

4. Shallow-water equations.

The 2-dimensional shallow-water equations are

∂_t h + ∂_x(hu) + ∂_y(hv) = 0,   (B9)
∂_t(hu) + ∂_x(u²h + (1/2) g_r h²) = −g_r h ∂_x b,   (B10)
∂_t(hv) + ∂_y(v²h + (1/2) g_r h²) = −g_r h ∂_y b,   (B11)

where h denotes a scalar field of water depth, notion b describes the spatially varying bathymetry, and g_r measures the gravitational acceleration. Notions u and v denote the velocities in horizontal and vertical directions, respectively. The data of the 2-dimensional shallow-water equations is offered by Ref. [74]. In our experiments, the learning objective is depth field h.

5. Water vapor data.

The infrared satellite imagery of water vapor data is acquired from the Storm EVent ImagRy data set [75], which can be used for precipitation analysis and forecasting (e.g., storm and rainfall). The spatial resolution of the satellite imagery is 2 kilometers and the time interval between each pair of adjacent imagery data is 5 minutes. For convenience, we have down-sampled the data from 192 × 192 to 128 × 128 in the spatial domain.

6. Western boundary current data.

The western boundary current data is acquired from the E.U. Copernicus Marine Environment Monitoring Service [76]. The data set can be used to analyze the dynamics of western boundary currents, which are an important factor in shaping global ocean circulation and climate variability [77]. Specifically, we use the daily sea surface stream velocity data whose spatial resolution is 0.25 × 0.25 degrees. We choose the data in the Kuroshio region (10-42°N, 123-155°E) and the Gulf Stream region (20-52°N, 33-65°W) to study the Kuroshio and the Gulf Stream currents. The data covers the time interval from 2013/1/1 to 2018/12/31.

Appendix C: Code implementation

The basic code implementation of the KNO based on PyTorch is presented below. The official toolbox of the KNO is presented in https://github.com/Koopman-Laboratory/KoopmanLab (also see Ref. [36]).

import torch
import numpy as np
import torch.nn as nn
[1] L. Debnath and L. Debnath, Nonlinear partial differential equations for scientists and engineers (Springer, 2005).
[2] H. Tanabe, Functional analytic methods for partial differential equations (CRC Press, 2017).
[3] M. S. Gockenbach, Partial differential equations: analytical and numerical methods, Vol. 122 (SIAM, 2005).
[4] R. M. Mattheij, S. W. Rienstra, and J. T. T. Boonkkamp, Partial differential equations: modeling, analysis, computation (SIAM, 2005).
[5] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, Neural operator: Graph kernel network for partial differential equations, arXiv preprint arXiv:2003.03485 (2020).
[6] J. N. Reddy, Introduction to the finite element method (McGraw-Hill Education, 2019).
[7] K. Lipnikov, G. Manzini, and M. Shashkov, Mimetic finite difference method, Journal of Computational Physics 257, 1163 (2014).
[8] E. Tadmor, A review of numerical methods for nonlinear partial differential equations, Bulletin of the American Mathematical Society 49, 507 (2012).
[9] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, Fourier neural operator for parametric partial differential equations, arXiv preprint arXiv:2010.08895 (2020).
[10] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378, 686 (2019).
[11] D. Kochkov, J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer, Machine learning–accelerated computational fluid dynamics, Proceedings of the National Academy of Sciences 118, e2101784118 (2021).
[12] X. Guo, W. Li, and F. Iorio, Convolutional neural networks for steady flow approximation, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016) pp. 481–490.
[13] Y. Zhu and N. Zabaras, Bayesian deep convolutional encoder–decoder networks for surrogate modeling and uncertainty quantification, Journal of Computational Physics 366, 415 (2018).
[14] S. Bhatnagar, Y. Afshar, S. Pan, K. Duraisamy, and S. Kaushik, Prediction of aerodynamic flow fields using convolutional neural networks, Computational Mechanics 64, 525 (2019).
[15] B. Yu et al., The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems, Communications in Mathematics and Statistics 6, 1 (2018).
[16] L. Bar and N. Sochen, Unsupervised deep learning algorithm for PDE-based forward and inverse problems, arXiv preprint arXiv:1904.05417 (2019).
[17] S. Pan and K. Duraisamy, Physics-informed probabilistic learning of linear embeddings of nonlinear dynamics with guaranteed stability, SIAM Journal on Applied Dynamical Systems 19, 480 (2020).
[18] M. Lienen and S. Günnemann, Learning the dynamics of physical systems from sparse observations with finite element networks, in International Conference on Learning Representations (2022).
[19] L. Lu, P. Jin, and G. E. Karniadakis, DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators, arXiv preprint arXiv:1910.03193 (2019).
[20] K. Bhattacharya, B. Hosseini, N. B. Kovachki, and A. M. Stuart, Model reduction and neural networks for parametric PDEs, arXiv preprint arXiv:2005.03180 (2020).
[21] N. H. Nelsen and A. M. Stuart, The random feature model for input-output maps between Banach spaces, SIAM Journal on Scientific Computing 43, A3212 (2021).
[22] N. Kovachki, S. Lanthaler, and S. Mishra, On universal approximation and error bounds for Fourier neural operators, Journal of Machine Learning Research 22 (2021).
[23] Z. Li, D. Z. Huang, B. Liu, and A. Anandkumar, Fourier neural operator with learned deformations for PDEs on general geometries, arXiv preprint arXiv:2207.05209 (2022).
[24] J. Guibas, M. Mardani, Z. Li, A. Tao, A. Anandkumar, and B. Catanzaro, Efficient token mixing for transformers via adaptive Fourier neural operators, in International Conference on Learning Representations (2021).
[25] L. Perko, Differential equations and dynamical systems, Vol. 7 (Springer Science & Business Media, 2013).
[26] P. A. Fishwick, Handbook of dynamic system modeling (CRC Press, 2007).
[27] P. C. Bressloff, Spatiotemporal dynamics of continuum neural fields, Journal of Physics A: Mathematical and Theoretical 45, 033001 (2011).
[28] J. E. Herrera-Estrada, Y. Satoh, and J. Sheffield, Spatiotemporal dynamics of global drought, Geophysical Research Letters 44, 2254 (2017).
[29] W. Wu, C. Meneveau, and R. Mittal, Spatio-temporal dynamics of turbulent separation bubbles, Journal of Fluid Mechanics 883, A45 (2020).
[30] E. Schöll, Nonlinear spatio-temporal dynamics and chaos in semiconductors, Vol. 10 (Cambridge University Press, 2001).
[31] S. L. Brunton, M. Budisic, E. Kaiser, and J. N. Kutz, Modern Koopman theory for dynamical systems, SIAM Review 64, 229 (2022).
[32] J. Pathak, S. Subramanian, P. Harrington, S. Raja, A. Chattopadhyay, M. Mardani, T. Kurth, D. Hall, Z. Li, K. Azizzadenesheli, et al., FourCastNet: A global data-driven high-resolution weather model using adaptive Fourier neural operators, arXiv preprint arXiv:2202.11214 (2022).
[33] D. Cao, Y. Wang, J. Duan, C. Zhang, X. Zhu, C. Huang, Y. Tong, B. Xu, J. Bai, J. Tong, et al., Spectral temporal graph neural network for multivariate time-series forecasting, Advances in Neural Information Processing Systems 33, 17766 (2020).
[34] F. Aminian, E. D. Suarez, M. Aminian, and D. T. Walz, Forecasting economic data with neural networks, Computational Economics 28, 71 (2006).
[35] L. Xu, N. Chen, Z. Chen, C. Zhang, and H. Yu, Spatiotemporal forecasting in earth system science: Methods, uncertainties, predictability and future directions, Earth-Science Reviews 222, 103828 (2021).
[36] W. Xiong, M. Ma, X. Huang, Z. Zhang, P. Sun, and Y. Tian, KoopmanLab: A library for Koopman neural operator with PyTorch (2023), open source codes available at https://github.com/Koopman-Laboratory/KoopmanLab.
[37] C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. S. Henningson, Spectral analysis of nonlinear flows, Journal of Fluid Mechanics 641, 115 (2009).
[38] I. Abraham and T. D. Murphey, Active learning of dynamics for data-driven control using Koopman operators, IEEE Transactions on Robotics 35, 1071 (2019).
[39] R. Taylor, J. N. Kutz, K. Morgan, and B. A. Nelson, Dynamic mode decomposition for plasma diagnostics and validation, Review of Scientific Instruments 89, 053501 (2018).
[40] B. W. Brunton, L. A. Johnson, J. G. Ojemann, and J. N. Kutz, Extracting spatial–temporal coherent patterns in large-scale neural recordings using dynamic mode decomposition, Journal of Neuroscience Methods 258, 1 (2016).
[41] N. Takeishi, Y. Kawahara, and T. Yairi, Learning Koopman invariant subspaces for dynamic mode decomposition, Advances in Neural Information Processing Systems 30 (2017).
[42] O. Azencot, N. B. Erichson, V. Lin, and M. Mahoney, Forecasting sequential data using consistent Koopman autoencoders, in International Conference on Machine Learning (PMLR, 2020) pp. 475–485.
[43] S. E. Otto and C. W. Rowley, Linearly recurrent autoencoder networks for learning dynamics, SIAM Journal on Applied Dynamical Systems 18, 558 (2019).
[44] D. J. Alford-Lago, C. W. Curtis, A. T. Ihler, and O. Issan, Deep learning enhanced dynamic mode decomposition, Chaos: An Interdisciplinary Journal of Nonlinear Science 32, 033116 (2022).
[45] B. Lusch, J. N. Kutz, and S. L. Brunton, Deep learning for universal linear embeddings of nonlinear dynamics, Nature Communications 9, 1 (2018).
[46] S. Macesic, N. Crnjaric-Zic, and I. Mezic, Koopman operator family spectrum for nonautonomous systems, SIAM Journal on Applied Dynamical Systems 17, 2478 (2018).
[47] A. Lasota and M. C. Mackey, Probabilistic properties of deterministic systems (Cambridge University Press, 1985).
[48] P. Gaspard, G. Nicolis, A. Provata, and S. Tasaki, Spectral signature of the pitchfork bifurcation: Liouville equation approach, Physical Review E 51, 74 (1995).
[49] P. Gaspard, Chaos, scattering and statistical mechanics (Cambridge University Press, 2005).
[50] A. Lasota and M. C. Mackey, Chaos, fractals, and noise: stochastic aspects of dynamics, Vol. 97 (Springer Science & Business Media, 1998).
[51] P. D. Lax, Integrals of nonlinear equations of evolution and solitary waves, Communications on Pure and Applied Mathematics 21, 467 (1968).
[52] P. D. Lax, Linear algebra and its applications, Vol. 78 (John Wiley & Sons, 2007).
[53] J. P. Parker and J. Page, Koopman analysis of isolated fronts and solitons, SIAM Journal on Applied Dynamical Systems 19, 2803 (2020).
[54] H. Nakao and I. Mezić, Spectral analysis of the Koopman operator for partial differential equations, Chaos: An Interdisciplinary Journal of Nonlinear Science 30, 113131 (2020).
[55] C. Gin, B. Lusch, S. L. Brunton, and J. N. Kutz, Deep learning models for global coordinate transformations that linearise PDEs, European Journal of Applied Mathematics 32, 515 (2021).
[56] J. Page and R. R. Kerswell, Koopman analysis of Burgers equation, Physical Review Fluids 3, 071901 (2018).
[57] H. Arbabi and I. Mezic, Ergodic theory, dynamic mode decomposition, and computation of spectral properties of the Koopman operator, SIAM Journal on Applied Dynamical Systems 16, 2096 (2017).
[58] N. Črnjarić-Žic, S. Maćešić, and I. Mezić, Koopman operator spectrum for random dynamical systems, Journal of Nonlinear Science 30, 2007 (2020).
[59] S. L. Brunton, B. W. Brunton, J. L. Proctor, E. Kaiser, and J. N. Kutz, Chaos as an intermittently forced linear system, Nature Communications 8, 1 (2017).
[60] Y. Saad, Numerical methods for large eigenvalue problems: revised edition (SIAM, 2011).
[61] M. Korda and I. Mezić, On convergence of extended dynamic mode decomposition to the Koopman operator, Journal of Nonlinear Science 28, 687 (2018).
[62] M. Li and L. Jiang, Reduced-order modeling for Koopman operators of nonautonomous dynamic systems in multiscale media, arXiv preprint arXiv:2204.13180 (2022).
[63] M. Li and L. Jiang, Data-driven reduced-order modeling for nonautonomous dynamical systems in multiscale media, Journal of Computational Physics 474, 111799 (2023).
[64] I. P. Cornfeld, S. V. Fomin, and Y. G. Sinai, Ergodic theory, Vol. 245 (Springer Science & Business Media, 2012).
[65] W. Xiong, M. Ma, X. Huang, Z. Zhang, P. Sun, and Y. Tian, KoopmanLab: A PyTorch module of Koopman neural operator family for solving partial differential equations, arXiv preprint arXiv:2301.01104 (2023).
[66] N. Park and S. Kim, How do vision transformers work?, arXiv preprint arXiv:2202.06709 (2022).
[67] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, Going deeper with convolutions, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015) pp. 1–9.
[68] K. Champion, B. Lusch, J. N. Kutz, and S. L. Brunton, Data-driven discovery of coordinates and governing equations, Proceedings of the National Academy of Sciences 116, 22445 (2019).
[69] K. P. Champion, S. L. Brunton, and J. N. Kutz, Discovery of nonlinear multiscale systems: Sampling strategies and embeddings, SIAM Journal on Applied Dynamical Systems 18, 312 (2019).
[70] U. Fasel, J. N. Kutz, B. W. Brunton, and S. L. Brunton, Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control, Proceedings of the Royal Society A 478, 20210904 (2022).
[71] E. R. Benton and G. W. Platzman, A table of solutions of the one-dimensional Burgers equation, Quarterly of Applied Mathematics 30, 195 (1972).
[72] C. Wang, Exact solutions of the steady-state Navier-Stokes equations, Annual Review of Fluid Mechanics 23, 159 (1991).
[73] E. Bodenschatz, W. Pesch, and G. Ahlers, Recent developments in Rayleigh-Bénard convection, Annual Review of Fluid Mechanics 32, 709 (2000).
[74] M. Takamoto, T. Praditia, R. Leiteritz, D. MacKinlay, F. Alesiani, D. Pflüger, and M. Niepert, PDEBench: An extensive benchmark for scientific machine learning, Advances in Neural Information Processing Systems 35, 1596 (2022).
[75] M. Veillette, S. Samsi, and C. Mattioli, SEVIR: A storm event imagery dataset for deep learning applications in radar and satellite meteorology, Advances in Neural Information Processing Systems 33, 22009 (2020).
[76] E.U. Copernicus Marine Environment Monitoring Service, Daily satellite global sea level data (SEALEVEL_GLO_PHY_L4_OBSERVATIONS_008_047) collected by the Copernicus Marine Environment Monitoring Service (2021).
[77] D. Hu, L. Wu, W. Cai, A. S. Gupta, A. Ganachaud, B. Qiu, A. L. Gordon, X. Lin, Z. Chen, S. Hu, et al., Pacific western boundary currents and their roles in climate, Nature 522, 299 (2015).
[78] M. A. Rahman, Z. E. Ross, and K. Azizzadenesheli, U-NO: U-shaped neural operators, arXiv preprint arXiv:2204.11127 (2022).
[79] B. Raonic, R. Molinaro, T. De Ryck, T. Rohner, F. Bartolucci, R. Alaifari, S. Mishra, and E. de Bézenac, Convolutional neural operators for robust and accurate learning of PDEs, Advances in Neural Information Processing Systems 36 (2024).
[80] H. Wu, T. Hu, H. Luo, J. Wang, and M. Long, Solving high-dimensional PDEs with latent spectral models, in Proceedings of the 40th International Conference on Machine Learning (2023) pp. 37417–37438.
[81] O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2015) pp. 234–241.
[82] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016) pp. 770–778.
[83] R. Wang, K. Kashinath, M. Mustafa, A. Albert, and R. Yu, Towards physics-informed deep learning for turbulent flow prediction, in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2020) pp. 1457–1466.
[84] M. Raissi, Deep hidden physics models: Deep learning of nonlinear partial differential equations, The Journal of Machine Learning Research 19, 932 (2018).
[85] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, A. Stuart, K. Bhattacharya, and A. Anandkumar, Multipole graph neural operator for parametric partial differential equations, Advances in Neural Information Processing Systems 33, 6755 (2020).
[86] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, et al., SciPy 1.0: fundamental algorithms for scientific computing in Python, Nature Methods 17, 261 (2020).
[87] B. O. Koopman, Hamiltonian systems and transformation in Hilbert space, Proceedings of the National Academy of Sciences 17, 315 (1931).
[88] R. Abraham, J. E. Marsden, and T. Ratiu, Manifolds, tensor analysis, and applications, Vol. 75 (Springer Science & Business Media, 2012).
[89] C. Chicone and Y. Latushkin, Evolution semigroups in dynamical systems and differential equations, Vol. 70 (American Mathematical Society, 1999).