Multi-Region Markovian Gaussian Process: An Efficient Method to Discover Directional Communications Across Multiple Brain Regions

Weihan Li    Chengrui Li    Yule Wang    Anqi Wu
Abstract

Studying the complex interactions between different brain regions is crucial in neuroscience. Various statistical methods have explored the latent communication across multiple brain regions. Two main categories are the Gaussian Process (GP) and Linear Dynamical System (LDS), each with unique strengths. The GP-based approach effectively discovers latent variables with frequency bands and communication directions. Conversely, the LDS-based approach is computationally efficient but lacks powerful expressiveness in latent representation. In this study, we merge both methodologies by creating an LDS mirroring a multi-output GP, termed Multi-Region Markovian Gaussian Process (MRM-GP). Our work establishes a connection between an LDS and a multi-output GP that explicitly models frequencies and phase delays within the latent space of neural recordings. Consequently, the model achieves a linear inference cost over time points and provides an interpretable low-dimensional representation, revealing communication directions across brain regions and separating oscillatory communications into different frequency bands.

Machine Learning, ICML

1 Introduction

The number of simultaneous neural recordings from various brain regions has increased recently. These recordings offer opportunities to explore the mechanisms through which inter-areal communication supports brain function (Kohn et al., 2020). Brain regions linked to sensory and cognitive functions often display interconnectedness, with signals transmitted bidirectionally and potentially simultaneously (Harris & Mrsic-Flogel, 2013; Miller et al., 2018; Wang et al., 2024). However, the high-dimensional neural recordings typically present a complex view of this concurrent communication—for example, neurons may concurrently represent overlapping neural activities within a certain region. Therefore, uncovering the interactions between different brain regions presents a challenging task.

Many statistical methodologies have been employed to address the challenge of understanding communications across multiple brain regions. Hultman et al. 2018 examined multi-region local field potential data and identified frequency-based interactions across brain regions using a Gaussian Process Factor Analysis model. Following a similar approach, Gokcen et al. 2022, 2023 suggested that latent variables can be divided into across- and within-region components. This model was applied to disentangle the concurrent and bidirectional communications across brain regions with a multi-output Squared Exponential kernel. Glaser et al. 2020 developed a switching linear dynamic system to uncover low-dimensional interactions among multiple brain regions. This method captured regions responsible for transitioning between latent states by specifying a novel transition rule.

Broadly categorized into Gaussian Process (GP) and Linear Dynamical System (LDS) classes, these methods offer distinct advantages. The GP-based approach, leveraging the robust representational capability of multi-output kernels, performs well in discovering latent variables with crucial information, such as frequencies and directional communications. Conversely, the LDS-based approach, while computationally efficient with a linear cost in time points, lacks the powerful expressiveness of GP in latent representation.

Our goal is to combine the strengths of both methodologies by constructing an LDS that mirrors a GP. Several studies have explored this connection: Hartikainen & Särkkä 2010 established a framework about converting a single-output GP with Matern or Squared Exponential kernel to an LDS, which relied on spectral factorization (Sayed & Kailath, 2001). Building upon this, Solin & Särkkä 2014 proposed the conversion for single-output periodic kernels, and Särkkä et al. 2013 extended the single-output conversions to a spatiotemporal GP. However, applying these conversions for GP-based multi-region methods is non-trivial because a gap exists in converting a multi-output GP to an LDS. One approach to bridge this gap is to assume the kernel is separable over spatial and temporal domain (Solin et al., 2016). This allows us to create a closed-form multi-output GP-LDS conversion following the framework proposed in single-output cases.

Consequently, choosing a separable kernel becomes essential in this study. An effective option is the complex-valued multi-region kernel (Ulrich et al., 2015), specifically designed to facilitate learning latent interactions encompassing frequencies and phase delays across brain regions. However, it is important to note that the connection between an LDS and a GP with a complex-valued multi-output kernel remains unknown.

We introduce the Multi-Region Markovian Gaussian Process (MRM-GP) to model latent representations, where Markovian means the discrete state space representation of a GP. Our work establishes a connection between an LDS and a multi-output GP that explicitly discovers frequency-based latent communications and their directionality via phase delays. By doing so, we can have three advantages: (1) utilizing the powerful representational capability of kernel functions; (2) employing the efficient inference algorithm to ensure a linear computational cost over time points; (3) extending the LDS to incorporate time-varying frequencies and delays by switching states.

We test MRM-GP using multi-region spike trains and local field potential recordings. The model proves its capability to produce understandable low-dimensional representations. These representations illustrate the direction of communication flow among regions and effectively disentangle oscillatory interactions into diverse frequencies.

2 Background

We introduce the multi-region kernel, a multi-output kernel for modeling interactions across different brain regions. Then, we demonstrate how a Gaussian Process with this kernel can be employed to model latent communications across regions. It is worth noting that various mapping methods can be used to project latent representations onto neural recordings, and in this case, we opt for Factor Analysis.

2.1 Multi-Region Kernel

The complex-valued multi-region kernel proposed in (Ulrich et al., 2015) explicitly models communication frequencies and phase delays within the latent space of neural data:

𝐊pp(τ)subscript𝐊𝑝superscript𝑝𝜏\displaystyle\mathbf{K}_{pp^{\prime}}(\tau)\!\!\!\!bold_K start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_τ ) =\displaystyle== r=1Rapraprexp(12σ2τ2+iη(τ+ϕpp)),superscriptsubscript𝑟1𝑅superscriptsubscript𝑎𝑝𝑟superscriptsubscript𝑎superscript𝑝𝑟12superscript𝜎2superscript𝜏2𝑖𝜂𝜏subscriptitalic-ϕ𝑝superscript𝑝\displaystyle\!\!\!\!\sum_{r=1}^{R}a_{p}^{r}a_{p^{\prime}}^{r}\exp\left(-\frac% {1}{2\sigma^{2}}\tau^{2}+i\eta(\tau+\phi_{pp^{\prime}})\right),∑ start_POSTSUBSCRIPT italic_r = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_i italic_η ( italic_τ + italic_ϕ start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) ) ,
=\displaystyle== exp(12σ2τ2+iητ)temporalr=1Rapraprexp(iηϕpp)spatial.subscript12superscript𝜎2superscript𝜏2𝑖𝜂𝜏temporalsubscriptsuperscriptsubscript𝑟1𝑅superscriptsubscript𝑎𝑝𝑟superscriptsubscript𝑎superscript𝑝𝑟𝑖𝜂subscriptitalic-ϕ𝑝superscript𝑝spatial\displaystyle\!\!\!\!\underbrace{\exp\left(-\frac{1}{2\sigma^{2}}\tau^{2}+i% \eta\tau\right)}_{\text{temporal}}\underbrace{\sum_{r=1}^{R}a_{p}^{r}a_{p^{% \prime}}^{r}\exp(i\eta\phi_{pp^{\prime}})}_{\text{spatial}}.under⏟ start_ARG roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_i italic_η italic_τ ) end_ARG start_POSTSUBSCRIPT temporal end_POSTSUBSCRIPT under⏟ start_ARG ∑ start_POSTSUBSCRIPT italic_r = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT roman_exp ( italic_i italic_η italic_ϕ start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) end_ARG start_POSTSUBSCRIPT spatial end_POSTSUBSCRIPT .

This kernel ensures separability over space and time, where p𝑝pitalic_p and psuperscript𝑝p^{\prime}italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT are two brain regions, τ=tt𝜏𝑡superscript𝑡\tau=t-t^{\prime}italic_τ = italic_t - italic_t start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is the time interval. In the temporal part, σ𝜎\sigmaitalic_σ signifies the length scale, i=1𝑖1i=\sqrt{-1}italic_i = square-root start_ARG - 1 end_ARG denotes the imagery unit, η𝜂\etaitalic_η represents the communication frequency between regions. In the spatial part, ϕppsubscriptitalic-ϕ𝑝superscript𝑝\phi_{pp^{\prime}}italic_ϕ start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT represents the phase delay between region p𝑝pitalic_p and psuperscript𝑝p^{\prime}italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, aprsuperscriptsubscript𝑎𝑝𝑟a_{p}^{r}italic_a start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT and aprsuperscriptsubscript𝑎superscript𝑝𝑟a_{p^{\prime}}^{r}italic_a start_POSTSUBSCRIPT italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT are amplitudes, and R>1𝑅1R>1italic_R > 1 denotes the rank number ensuring positive definiteness.

The separability is required to establish a connection between the multi-region kernel and a linear dynamic system (LDS). Moreover, the real part of this kernel Re[𝐊pp(τ)]=r=1Rapraprexp(12σ2τ2)cos(η(τ+ϕpp))Resubscript𝐊𝑝superscript𝑝𝜏superscriptsubscript𝑟1𝑅superscriptsubscript𝑎𝑝𝑟superscriptsubscript𝑎superscript𝑝𝑟12superscript𝜎2superscript𝜏2𝜂𝜏subscriptitalic-ϕ𝑝superscript𝑝\operatorname{Re}[\mathbf{K}_{pp^{\prime}}(\tau)]=\sum_{r=1}^{R}a_{p}^{r}a_{p^% {\prime}}^{r}\exp(-\frac{1}{2\sigma^{2}}\tau^{2})\cos(\eta(\tau+\phi_{pp^{% \prime}}))roman_Re [ bold_K start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_τ ) ] = ∑ start_POSTSUBSCRIPT italic_r = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) roman_cos ( italic_η ( italic_τ + italic_ϕ start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) ), denoted as the Cross-Spectral Mixture (CSM) kernel (Ulrich et al., 2015), has been shown to effectively capture frequency-based communications among various brain regions (Hultman et al., 2018). However, due to CSM’s non-separability, we work with the complex-valued kernel as represented in Eq. 2.1 to build an LDS.

2.2 Multi-Region Gaussian Process Factor Analysis

Refer to caption
Figure 1: An example of two dimensions across-region latent variables and one dimension within-region latent variable. Brain region A and region B have bidirectional communications within different frequency bands. Each region also has a one-dimensional neural activity unrelated to the other region.

A Gaussian Process Factor Analysis model, utilizing the real part of the multi-region kernel in Eq. 2.1 and named CSM-GPFA, can identify latent variables that capture frequencies and phase delays across brain regions.

Given single region neural recording ypnp×Tsuperscript𝑦𝑝superscriptsuperscript𝑛𝑝𝑇y^{p}\in\mathbb{R}^{n^{p}\times T}italic_y start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT × italic_T end_POSTSUPERSCRIPT, p{1,,P}𝑝1𝑃p\in\left\{1,\dots,P\right\}italic_p ∈ { 1 , … , italic_P } is the brain region index, npsuperscript𝑛𝑝n^{p}italic_n start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT denotes the number of neurons in region p𝑝pitalic_p, and T𝑇Titalic_T represents time steps. Our goal is to find the M𝑀Mitalic_M independent low-dimensional variables xpM×Tsuperscript𝑥𝑝superscript𝑀𝑇x^{p}\in\mathbb{R}^{M\times T}italic_x start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_M × italic_T end_POSTSUPERSCRIPT for each region’s neural data ypsuperscript𝑦𝑝y^{p}italic_y start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT. These variables from P𝑃Pitalic_P regions together form as x=[x1,,xP]MP×T𝑥superscriptsuperscript𝑥1superscript𝑥𝑃topsuperscript𝑀𝑃𝑇x=[x^{1},\dots,x^{P}]^{\top}\in\mathbb{R}^{MP\times T}italic_x = [ italic_x start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , … , italic_x start_POSTSUPERSCRIPT italic_P end_POSTSUPERSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_M italic_P × italic_T end_POSTSUPERSCRIPT, representing a latent representation for multi-region recordings y=[y1,,yP]N×T𝑦superscriptsuperscript𝑦1superscript𝑦𝑃topsuperscript𝑁𝑇y=[y^{1},\dots,y^{P}]^{\top}\in\mathbb{R}^{N\times T}italic_y = [ italic_y start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , … , italic_y start_POSTSUPERSCRIPT italic_P end_POSTSUPERSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × italic_T end_POSTSUPERSCRIPT, where N=n1++nP𝑁superscript𝑛1superscript𝑛𝑃N=n^{1}+\dots+n^{P}italic_N = italic_n start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT + ⋯ + italic_n start_POSTSUPERSCRIPT italic_P end_POSTSUPERSCRIPT is the total number of neurons over P𝑃Pitalic_P regions. Besides, y𝑦yitalic_y is a linear mapping of x𝑥xitalic_x: y=𝐂x+d+ϵ𝑦𝐂𝑥𝑑italic-ϵy=\mathbf{C}x+d+\epsilonitalic_y = bold_C italic_x + italic_d + italic_ϵ, where 𝐂𝐂\mathbf{C}bold_C is a block diagonal matrix 𝐂=diag{𝐂1,,𝐂p,,𝐂P}N×MP𝐂diagsuperscript𝐂1superscript𝐂𝑝superscript𝐂𝑃superscript𝑁𝑀𝑃\mathbf{C}=\text{diag}\{\mathbf{C}^{1},\dots,\mathbf{C}^{p},\dots,\mathbf{C}^{% P}\}\in\mathbb{R}^{N\times MP}bold_C = diag { bold_C start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , … , bold_C start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT , … , bold_C start_POSTSUPERSCRIPT italic_P end_POSTSUPERSCRIPT } ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × italic_M italic_P end_POSTSUPERSCRIPT, dN×1𝑑superscript𝑁1d\in\mathbb{R}^{N\times 1}italic_d ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT is bias, and ϵ𝒩(0,𝐕)similar-toitalic-ϵ𝒩0𝐕\epsilon\sim\mathcal{N}(0,\mathbf{V})italic_ϵ ∼ caligraphic_N ( 0 , bold_V ) is Gaussian noise with 𝐕N×N𝐕superscript𝑁𝑁\mathbf{V}\in\mathbb{R}^{N\times N}bold_V ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × italic_N end_POSTSUPERSCRIPT.

Meanwhile, a widely used assumption of xpsuperscript𝑥𝑝x^{p}italic_x start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT is to split it into across- and within-region parts (Gokcen et al., 2022): xp=[xp,a,xp,w],xp,ama×T,xp,wmw×Tformulae-sequencesuperscript𝑥𝑝superscriptsuperscript𝑥𝑝𝑎superscript𝑥𝑝𝑤topformulae-sequencesuperscript𝑥𝑝𝑎superscriptsubscript𝑚𝑎𝑇superscript𝑥𝑝𝑤superscriptsubscript𝑚𝑤𝑇x^{p}=[x^{p,a},x^{p,w}]^{\top},x^{p,a}\in\mathbb{R}^{m_{a}\times T},x^{p,w}\in% \mathbb{R}^{m_{w}\times T}italic_x start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT = [ italic_x start_POSTSUPERSCRIPT italic_p , italic_a end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT italic_p , italic_a end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT × italic_T end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT × italic_T end_POSTSUPERSCRIPT, where ma,mwsubscript𝑚𝑎subscript𝑚𝑤m_{a},m_{w}italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT are the number of dimensions for across- or within-region part and ma+mw=Msubscript𝑚𝑎subscript𝑚𝑤𝑀m_{a}+m_{w}=Mitalic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT + italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT = italic_M. The across-region variables xp,asuperscript𝑥𝑝𝑎x^{p,a}italic_x start_POSTSUPERSCRIPT italic_p , italic_a end_POSTSUPERSCRIPT describe neural activity that is shared across all brain regions, meaning that for the remaining P1𝑃1P-1italic_P - 1 regions, they have the latent variables with the same frequencies and dynamics except phase delays, while the within-region variables xp,wsuperscript𝑥𝑝𝑤x^{p,w}italic_x start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT describe the neural activity of region p𝑝pitalic_p that is not related to other regions (see Figure 1).

Consequently, we model xp,asuperscript𝑥𝑝𝑎x^{p,a}italic_x start_POSTSUPERSCRIPT italic_p , italic_a end_POSTSUPERSCRIPT and xp,wsuperscript𝑥𝑝𝑤x^{p,w}italic_x start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT separately with 𝐊𝐊\mathbf{K}bold_K in Eq. 2.1. For region p𝑝pitalic_p, there are masubscript𝑚𝑎m_{a}italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT dimensions of across-region variables, and each dimension xmp,aT×1,m[1,ma]formulae-sequencesuperscriptsubscript𝑥𝑚𝑝𝑎superscript𝑇1𝑚1subscript𝑚𝑎x_{m}^{p,a}\in\mathbb{R}^{T\times 1},m\in[1,m_{a}]italic_x start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p , italic_a end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_T × 1 end_POSTSUPERSCRIPT , italic_m ∈ [ 1 , italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ] has spatial correlations with the remaining P1𝑃1P-1italic_P - 1 regions. So, the mthsuperscript𝑚𝑡m^{th}italic_m start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT dimension across-region variables over P𝑃Pitalic_P regions: xma=[xm1,a,,xmP,a]P×Tsubscriptsuperscript𝑥𝑎𝑚subscriptsuperscript𝑥1𝑎𝑚subscriptsuperscript𝑥𝑃𝑎𝑚superscript𝑃𝑇x^{a}_{m}=[x^{1,a}_{m},\dots,x^{P,a}_{m}]\in\mathbb{R}^{P\times T}italic_x start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = [ italic_x start_POSTSUPERSCRIPT 1 , italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , … , italic_x start_POSTSUPERSCRIPT italic_P , italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ] ∈ blackboard_R start_POSTSUPERSCRIPT italic_P × italic_T end_POSTSUPERSCRIPT are considered as a group and modeled as the real part of a multi-output Complex Gaussian Process with 𝐊msuperscript𝐊𝑚\mathbf{K}^{m}bold_K start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT. For within-region variables, each dimension xmp,wT×1,m[1,mw]formulae-sequencesubscriptsuperscript𝑥𝑝𝑤𝑚superscript𝑇1𝑚1subscript𝑚𝑤x^{p,w}_{m}\in\mathbb{R}^{T\times 1},m\in[1,m_{w}]italic_x start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_T × 1 end_POSTSUPERSCRIPT , italic_m ∈ [ 1 , italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT ] is independently modeled as the real part of a single-output Complex Gaussian Process with 𝐊(τ)m=r=1Rarm2exp(12σm2τ2+iηmτ)𝐊superscript𝜏𝑚superscriptsubscript𝑟1𝑅superscriptsuperscriptsubscript𝑎𝑟𝑚212superscriptsuperscript𝜎𝑚2superscript𝜏2𝑖superscript𝜂𝑚𝜏\mathbf{K}(\tau)^{m}=\sum_{r=1}^{R}{a_{r}^{m}}^{2}\exp(-\frac{1}{2{\sigma^{m}}% ^{2}}\tau^{2}+i\eta^{m}\tau)bold_K ( italic_τ ) start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_r = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_σ start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_i italic_η start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT italic_τ ), where p=p𝑝superscript𝑝p=p^{\prime}italic_p = italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and ϕpp=0subscriptitalic-ϕ𝑝superscript𝑝0\phi_{pp^{\prime}}=0italic_ϕ start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT = 0. Furthermore, we also assume independence among different dimensions of across- and within-region variables, implying that different index m𝑚mitalic_m refers to distinct kernel parameters.

Unlike the approach in (Ulrich et al., 2015), which uses a mixture of frequencies, we employ a single frequency in Eq 2.1 to achieve frequency disentanglement. Specifically, each dimension of the across-region latent variable will have a single frequency peak. Consequently, the mixture of frequencies present in the data will be captured by multiple dimensions.

3 Method

Modeling latent variables xpsuperscript𝑥𝑝x^{p}italic_x start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT with Gaussian Process is inefficient with a 𝒪(T3)𝒪superscript𝑇3\mathcal{O}(T^{3})caligraphic_O ( italic_T start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ) time complexity. So, we want to build Markovian representations of these latent variables, indicating the state space representations of each dimension: across-region xmasubscriptsuperscript𝑥𝑎𝑚x^{a}_{m}italic_x start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT and within-region xmp,wsubscriptsuperscript𝑥𝑝𝑤𝑚x^{p,w}_{m}italic_x start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, where every Markovian representation follows a Linear Dynamical System (LDS) (Solin et al., 2016).

Spectral factorization has been used in multi-output GP cases (Zhu et al., 2023). However, for the complex-valued multi-output kernel, we need to develop a new spectral factorization-based method to first convert the complex-valued temporal part to an LDS (Section 3.1) and then use the kernel’s separability to combine the complex-valued spatial part to get the final LDS (Section 3.2).

3.1 Markovian Within-Region Latent Variables

The Markovian representation of region p𝑝pitalic_p’s mthsuperscript𝑚𝑡m^{th}italic_m start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT dimension within-region latent variable xmp,wT×1subscriptsuperscript𝑥𝑝𝑤𝑚superscript𝑇1x^{p,w}_{m}\in\mathbb{R}^{T\times 1}italic_x start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_T × 1 end_POSTSUPERSCRIPT follows a discrete-time LDS structure:

fm,tp,w=𝐀mwfm,t1p,w+qt1,qt1𝒞𝒩(0,𝐐mw),\displaystyle\begin{split}f_{m,t}^{p,w}=\mathbf{A}_{m}^{w}f_{m,t-1}^{p,w}+q_{t% -1}&,\quad q_{t-1}\sim\mathcal{CN}(0,\mathbf{Q}_{m}^{w}),\\ \end{split}start_ROW start_CELL italic_f start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT = bold_A start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT italic_f start_POSTSUBSCRIPT italic_m , italic_t - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT + italic_q start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL start_CELL , italic_q start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ∼ caligraphic_C caligraphic_N ( 0 , bold_Q start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT ) , end_CELL end_ROW (2)

where fm,tp,w=[gm,tp,w,dgm,tp,wdt,,dk1gm,tp,wdtk1]Tk×Tsuperscriptsubscript𝑓𝑚𝑡𝑝𝑤superscriptsubscriptsuperscript𝑔𝑝𝑤𝑚𝑡𝑑subscriptsuperscript𝑔𝑝𝑤𝑚𝑡𝑑𝑡superscript𝑑𝑘1subscriptsuperscript𝑔𝑝𝑤𝑚𝑡𝑑superscript𝑡𝑘1𝑇superscript𝑘𝑇f_{m,t}^{p,w}=[g^{p,w}_{m,t},\frac{dg^{p,w}_{m,t}}{dt},\dots,\frac{d^{k-1}g^{p% ,w}_{m,t}}{dt^{k-1}}]^{T}\in\mathbb{C}^{k\times T}italic_f start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT = [ italic_g start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT , divide start_ARG italic_d italic_g start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_d italic_t end_ARG , … , divide start_ARG italic_d start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT italic_g start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_d italic_t start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT end_ARG ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_k × italic_T end_POSTSUPERSCRIPT, denoting the complex-valued dynamics gm,tp,wsubscriptsuperscript𝑔𝑝𝑤𝑚𝑡g^{p,w}_{m,t}italic_g start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT and its derivatives up to (k1)thsuperscript𝑘1𝑡(k-1)^{th}( italic_k - 1 ) start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT order at time t𝑡titalic_t. Especially, within-region latent variable xmp,wsubscriptsuperscript𝑥𝑝𝑤𝑚x^{p,w}_{m}italic_x start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT is the real part of gm,tp,wsubscriptsuperscript𝑔𝑝𝑤𝑚𝑡g^{p,w}_{m,t}italic_g start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT. 𝐀mwk×ksuperscriptsubscript𝐀𝑚𝑤superscript𝑘𝑘\mathbf{A}_{m}^{w}\in\mathbb{C}^{k\times k}bold_A start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_k × italic_k end_POSTSUPERSCRIPT represents the complex-valued transition matrix, and qt1subscript𝑞𝑡1q_{t-1}italic_q start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT is the sampling from a complex normal distribution 𝒞𝒩()𝒞𝒩\mathcal{CN}(\cdot)caligraphic_C caligraphic_N ( ⋅ ) with the complex-valued measurement (Hermitian) matrix 𝐐mwk×ksuperscriptsubscript𝐐𝑚𝑤superscript𝑘𝑘\mathbf{Q}_{m}^{w}\in\mathbb{C}^{k\times k}bold_Q start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_k × italic_k end_POSTSUPERSCRIPT.

The key question now becomes how to associate single-output 𝐊(τ)m=r=1Rarm2exp(12σm2τ2+iηmτ)𝐊superscript𝜏𝑚superscriptsubscript𝑟1𝑅superscriptsuperscriptsubscript𝑎𝑟𝑚212superscriptsuperscript𝜎𝑚2superscript𝜏2𝑖superscript𝜂𝑚𝜏\mathbf{K}(\tau)^{m}=\sum_{r=1}^{R}{a_{r}^{m}}^{2}\exp(-\frac{1}{2{\sigma^{m}}% ^{2}}\tau^{2}+i\eta^{m}\tau)bold_K ( italic_τ ) start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_r = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_σ start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_i italic_η start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT italic_τ ) with 𝐀mwsuperscriptsubscript𝐀𝑚𝑤\mathbf{A}_{m}^{w}bold_A start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT and 𝐐mwsuperscriptsubscript𝐐𝑚𝑤\mathbf{Q}_{m}^{w}bold_Q start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT. To achieve this, our approach involves two steps: forming a continuous-time LDS for within-region variables in each region through spectral factorization (Kailath et al., 2000), and subsequently transforming it into the discrete-time version as specified in Eq. 2. This linkage is new and differs from previous connections (Hartikainen & Särkkä, 2010; Solin & Särkkä, 2014) as the kernel is situated in the complex domain.

Forming a continuous-time LDS.

Given single-output 𝐊msuperscript𝐊𝑚\mathbf{K}^{m}bold_K start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT, the continuous-time LDS we want to form is:

df(t)mp,wdt=𝐅mwf(t)mp,w+𝐋u(t),𝑑𝑓subscriptsuperscript𝑡𝑝𝑤𝑚𝑑𝑡superscriptsubscript𝐅𝑚𝑤𝑓subscriptsuperscript𝑡𝑝𝑤𝑚𝐋𝑢𝑡\displaystyle\begin{split}\frac{df(t)^{p,w}_{m}}{dt}&=\mathbf{F}_{m}^{w}f(t)^{% p,w}_{m}+\mathbf{L}u(t),\end{split}start_ROW start_CELL divide start_ARG italic_d italic_f ( italic_t ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_ARG start_ARG italic_d italic_t end_ARG end_CELL start_CELL = bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT italic_f ( italic_t ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT + bold_L italic_u ( italic_t ) , end_CELL end_ROW (3)

where f(t)mp,w=[g(t)mp,w,dg(t)mp,wdt,,dk1g(t)mp,wdtk1]k×T𝑓subscriptsuperscript𝑡𝑝𝑤𝑚superscript𝑔subscriptsuperscript𝑡𝑝𝑤𝑚𝑑𝑔subscriptsuperscript𝑡𝑝𝑤𝑚𝑑𝑡superscript𝑑𝑘1𝑔subscriptsuperscript𝑡𝑝𝑤𝑚𝑑superscript𝑡𝑘1topsuperscript𝑘𝑇f(t)^{p,w}_{m}=[g(t)^{p,w}_{m},\frac{dg(t)^{p,w}_{m}}{dt},\dots,\frac{d^{k-1}g% (t)^{p,w}_{m}}{dt^{k-1}}]^{\top}\in\mathbb{C}^{k\times T}italic_f ( italic_t ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = [ italic_g ( italic_t ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , divide start_ARG italic_d italic_g ( italic_t ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_ARG start_ARG italic_d italic_t end_ARG , … , divide start_ARG italic_d start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT italic_g ( italic_t ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_ARG start_ARG italic_d italic_t start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT end_ARG ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_k × italic_T end_POSTSUPERSCRIPT, denoting the continuous-time version of gm,tasubscriptsuperscript𝑔𝑎𝑚𝑡g^{a}_{m,t}italic_g start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT and its derivatives up to (k1)thsuperscript𝑘1𝑡(k-1)^{th}( italic_k - 1 ) start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT order. 𝐅mwk×ksuperscriptsubscript𝐅𝑚𝑤superscript𝑘𝑘\mathbf{F}_{m}^{w}\in\mathbb{C}^{k\times k}bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_k × italic_k end_POSTSUPERSCRIPT is a continuous-time transition matrix, 𝐋=[0,,0,1]k×1𝐋superscript001topsuperscript𝑘1\mathbf{L}=[0,\dots,0,1]^{\top}\in\mathbb{R}^{k\times 1}bold_L = [ 0 , … , 0 , 1 ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_k × 1 end_POSTSUPERSCRIPT signifies a constant vector, and u(t)𝑢𝑡u(t)italic_u ( italic_t ) denotes a single-dimensional white noise with spectral density v𝑣vitalic_v. We need to obtain both 𝐅mwsuperscriptsubscript𝐅𝑚𝑤\mathbf{F}_{m}^{w}bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT and v𝑣vitalic_v from 𝐊msuperscript𝐊𝑚\mathbf{K}^{m}bold_K start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT.

𝐅mwsuperscriptsubscript𝐅𝑚𝑤\mathbf{F}_{m}^{w}bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT takes a companion form of LDS (Grewal & Andrews, 2014):

𝐅mw=[01011a0ak2ak1],superscriptsubscript𝐅𝑚𝑤matrix01missing-subexpressionmissing-subexpressionmissing-subexpression01missing-subexpressionmissing-subexpressionmissing-subexpression1subscript𝑎0subscript𝑎𝑘2subscript𝑎𝑘1\begin{split}\mathbf{F}_{m}^{w}=\begin{bmatrix}0&1&&\\ &0&1&\\ &&\ddots&1\\ -a_{0}&\dots&-a_{k-2}&-a_{k-1}\\ \end{bmatrix},\end{split}start_ROW start_CELL bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT = [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL 1 end_CELL start_CELL end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL 0 end_CELL start_CELL 1 end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL end_CELL start_CELL ⋱ end_CELL start_CELL 1 end_CELL end_ROW start_ROW start_CELL - italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_CELL start_CELL … end_CELL start_CELL - italic_a start_POSTSUBSCRIPT italic_k - 2 end_POSTSUBSCRIPT end_CELL start_CELL - italic_a start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] , end_CELL end_ROW (4)

where a0,,ak1subscript𝑎0subscript𝑎𝑘1a_{0},\dots,a_{k-1}italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_a start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT are the coefficients in a stochastic differential equation that is equivalent to Eq. 3:

dkg(t)mp,wdtk+ak1dk1g(t)mp,wdtk1++a0g(t)mp,w=u(t).superscript𝑑𝑘𝑔subscriptsuperscript𝑡𝑝𝑤𝑚𝑑superscript𝑡𝑘subscript𝑎𝑘1superscript𝑑𝑘1𝑔subscriptsuperscript𝑡𝑝𝑤𝑚𝑑superscript𝑡𝑘1subscript𝑎0𝑔subscriptsuperscript𝑡𝑝𝑤𝑚𝑢𝑡\begin{split}\frac{d^{k}g(t)^{p,w}_{m}}{dt^{k}}+a_{k-1}\frac{d^{k-1}g(t)^{p,w}% _{m}}{dt^{k-1}}+\dots+a_{0}g(t)^{p,w}_{m}=u(t).\end{split}start_ROW start_CELL divide start_ARG italic_d start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_g ( italic_t ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_ARG start_ARG italic_d italic_t start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT end_ARG + italic_a start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT divide start_ARG italic_d start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT italic_g ( italic_t ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_ARG start_ARG italic_d italic_t start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT end_ARG + ⋯ + italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT italic_g ( italic_t ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = italic_u ( italic_t ) . end_CELL end_ROW (5)

To obtain 𝐅mwsuperscriptsubscript𝐅𝑚𝑤\mathbf{F}_{m}^{w}bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT and v𝑣vitalic_v, we first apply Fourier transform on both sides of the continuous-time LDS in Eq. 3 to achieve a frequency domain representation (see Appendix A for derivation):

S(ω)=𝐆(𝐅mwiω𝐈)1𝐋v𝐋(𝐅mw+iω𝐈)T𝐆𝑆𝜔𝐆superscriptsuperscriptsubscript𝐅𝑚𝑤𝑖𝜔𝐈1𝐋𝑣superscript𝐋topsuperscriptsuperscriptsubscript𝐅𝑚𝑤𝑖𝜔𝐈𝑇superscript𝐆top\begin{split}S(\omega)=\mathbf{G}(\mathbf{F}_{m}^{w}-i\omega\mathbf{I})^{-1}% \mathbf{L}v\mathbf{L}^{\top}(\mathbf{F}_{m}^{w}+i\omega\mathbf{I})^{-T}\mathbf% {G}^{\top}\end{split}start_ROW start_CELL italic_S ( italic_ω ) = bold_G ( bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT - italic_i italic_ω bold_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_L italic_v bold_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT + italic_i italic_ω bold_I ) start_POSTSUPERSCRIPT - italic_T end_POSTSUPERSCRIPT bold_G start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL end_ROW (6)

where S(ω)=2πσmexp((ηmω)22σm)𝑆𝜔2𝜋superscript𝜎𝑚superscriptsuperscript𝜂𝑚𝜔22superscript𝜎𝑚S(\omega)=\sqrt{2\pi}\sigma^{m}\exp(-\frac{(\eta^{m}-\omega)^{2}}{2\sigma^{m}})italic_S ( italic_ω ) = square-root start_ARG 2 italic_π end_ARG italic_σ start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT roman_exp ( - divide start_ARG ( italic_η start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT - italic_ω ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_σ start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT end_ARG ) is the spectral density of single-output 𝐊msuperscript𝐊𝑚\mathbf{K}^{m}bold_K start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT, 𝐆=[1,0,,0]1×k𝐆100superscript1𝑘\mathbf{G}=[1,0,\dots,0]\in\mathbb{R}^{1\times k}bold_G = [ 1 , 0 , … , 0 ] ∈ blackboard_R start_POSTSUPERSCRIPT 1 × italic_k end_POSTSUPERSCRIPT represents a constant vector, 𝐈k×k𝐈superscript𝑘𝑘\mathbf{I}\in\mathbb{R}^{k\times k}bold_I ∈ blackboard_R start_POSTSUPERSCRIPT italic_k × italic_k end_POSTSUPERSCRIPT denotes an identity matrix, and, notably, v=2πσm𝑣2𝜋superscript𝜎𝑚v=\sqrt{2\pi}\sigma^{m}italic_v = square-root start_ARG 2 italic_π end_ARG italic_σ start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT. Now, we only need to solve Eq. 6 to obtain the coefficients a0,,ak1subscript𝑎0subscript𝑎𝑘1a_{0},\dots,a_{k-1}italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_a start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT in 𝐅mwsuperscriptsubscript𝐅𝑚𝑤\mathbf{F}_{m}^{w}bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT.

On the left-hand side, S(ω)𝑆𝜔S(\omega)italic_S ( italic_ω ) follows an exponential family, which is infinitely differentiable. On the right-hand side, however, the finite number coefficients a0,,ak1subscript𝑎0subscript𝑎𝑘1a_{0},\dots,a_{k-1}italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_a start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT in 𝐅mwsuperscriptsubscript𝐅𝑚𝑤\mathbf{F}_{m}^{w}bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT determine a finite polynomial function: 𝐆(𝐅mwiω𝐈)1𝐋𝐆superscriptsuperscriptsubscript𝐅𝑚𝑤𝑖𝜔𝐈1𝐋\mathbf{G}(\mathbf{F}_{m}^{w}-i\omega\mathbf{I})^{-1}\mathbf{L}bold_G ( bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT - italic_i italic_ω bold_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_L. Therefore, we can only construct a finite polynomial approximation of S(ω)𝑆𝜔S(\omega)italic_S ( italic_ω ). But one observation from Eq. 6 is that S(ω)𝑆𝜔S(\omega)italic_S ( italic_ω ) can be factorized into two parts, i.e., a complex function multiplying its conjugate.

Previous established connections between real-valued kernel and LDS assume a symmetric S(ω)𝑆𝜔S(\omega)italic_S ( italic_ω ), implying using a Taylor expansion to approximate S(ω)𝑆𝜔S(\omega)italic_S ( italic_ω ) as a polynomial of ω𝜔\omegaitalic_ω (Hartikainen & Särkkä, 2010). However, in the case of complex-valued 𝐊msuperscript𝐊𝑚\mathbf{K}^{m}bold_K start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT, S(ω)𝑆𝜔S(\omega)italic_S ( italic_ω ) is non-symmetric due to frequency ηmsuperscript𝜂𝑚\eta^{m}italic_η start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT, so our solution is to approximate it as a polynomial of iω𝑖𝜔i\omegaitalic_i italic_ω:

1S(ω)1𝑆𝜔\displaystyle\frac{1}{S(\omega)}\!\!\!\!divide start_ARG 1 end_ARG start_ARG italic_S ( italic_ω ) end_ARG \displaystyle\approx σm2π(b0+b1iω+b2(iω)2++b2k(iω)2k),superscript𝜎𝑚2𝜋subscript𝑏0subscript𝑏1𝑖𝜔subscript𝑏2superscript𝑖𝜔2subscript𝑏2𝑘superscript𝑖𝜔2𝑘\displaystyle\!\!\!\!\sqrt{\frac{\sigma^{m}}{2\pi}}(b_{0}+b_{1}i\omega+b_{2}(i% \omega)^{2}+\dots+b_{2k}(i\omega)^{2k}),square-root start_ARG divide start_ARG italic_σ start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_π end_ARG end_ARG ( italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_i italic_ω + italic_b start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_i italic_ω ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ⋯ + italic_b start_POSTSUBSCRIPT 2 italic_k end_POSTSUBSCRIPT ( italic_i italic_ω ) start_POSTSUPERSCRIPT 2 italic_k end_POSTSUPERSCRIPT ) , (7)
=\displaystyle== T(iω),𝑇𝑖𝜔\displaystyle\!\!\!\!T(i\omega),italic_T ( italic_i italic_ω ) ,

where b2k=1subscript𝑏2𝑘1b_{2k}=1italic_b start_POSTSUBSCRIPT 2 italic_k end_POSTSUBSCRIPT = 1 if k𝑘kitalic_k is even, b2k=1subscript𝑏2𝑘1b_{2k}=-1italic_b start_POSTSUBSCRIPT 2 italic_k end_POSTSUBSCRIPT = - 1 if k𝑘kitalic_k is odd, b0,b2,,b2k2subscript𝑏0subscript𝑏2subscript𝑏2𝑘2b_{0},b_{2},\dots,b_{2k-2}italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_b start_POSTSUBSCRIPT 2 italic_k - 2 end_POSTSUBSCRIPT are real numbers, and b1,b3,,b2k1subscript𝑏1subscript𝑏3subscript𝑏2𝑘1b_{1},b_{3},\dots,b_{2k-1}italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , … , italic_b start_POSTSUBSCRIPT 2 italic_k - 1 end_POSTSUBSCRIPT are complex numbers that only have imagery parts. These coefficients’ values depend on σmsuperscript𝜎𝑚\sigma^{m}italic_σ start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT, ηmsuperscript𝜂𝑚\eta^{m}italic_η start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT, and k𝑘kitalic_k. Figure 2(A-C) shows the approximation of S(ω)𝑆𝜔S(\omega)italic_S ( italic_ω ) when k=2,3,4𝑘234k=2,3,4italic_k = 2 , 3 , 4, demonstrating a reliable approximation even in the case of k=2𝑘2k=2italic_k = 2. Appendix D shows the effect of k𝑘kitalic_k on generated samples.

Refer to caption
Figure 2: Approximate S(ω)𝑆𝜔S(\omega)italic_S ( italic_ω ) with η=0.5𝜂0.5\eta=0.5italic_η = 0.5 rad/s, σ=4.5𝜎4.5\sigma=4.5italic_σ = 4.5 when varying the number of derivatives k=2,3,4𝑘234k=2,3,4italic_k = 2 , 3 , 4. k=2𝑘2k=2italic_k = 2 could provide a satisfactory approximation.

Now our target becomes solving the following equation for a0,,ak1subscript𝑎0subscript𝑎𝑘1a_{0},\dots,a_{k-1}italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_a start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT:

T(iω)=σm2πH(iω)H(iω),H(iω)=a0+a1iω++ak1(iω)k1+(iω)k,formulae-sequence𝑇𝑖𝜔superscript𝜎𝑚2𝜋𝐻𝑖𝜔𝐻𝑖𝜔𝐻𝑖𝜔subscript𝑎0subscript𝑎1𝑖𝜔subscript𝑎𝑘1superscript𝑖𝜔𝑘1superscript𝑖𝜔𝑘\begin{split}T(i\omega)&=\sqrt{\frac{\sigma^{m}}{2\pi}}H(i\omega)H(-i\omega),% \\ H(i\omega)&=a_{0}+a_{1}i\omega+\dots+a_{k-1}(i\omega)^{k-1}+(i\omega)^{k},\end% {split}start_ROW start_CELL italic_T ( italic_i italic_ω ) end_CELL start_CELL = square-root start_ARG divide start_ARG italic_σ start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_π end_ARG end_ARG italic_H ( italic_i italic_ω ) italic_H ( - italic_i italic_ω ) , end_CELL end_ROW start_ROW start_CELL italic_H ( italic_i italic_ω ) end_CELL start_CELL = italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_i italic_ω + ⋯ + italic_a start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ( italic_i italic_ω ) start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT + ( italic_i italic_ω ) start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT , end_CELL end_ROW (8)

where H(iω)𝐻𝑖𝜔H(i\omega)italic_H ( italic_i italic_ω ) is commonly referred to as the transfer function with a0,,ak1subscript𝑎0subscript𝑎𝑘1a_{0},\dots,a_{k-1}italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_a start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT acting as its coefficients. Its reciprocal 1H(iω)1𝐻𝑖𝜔\frac{1}{H(i\omega)}divide start_ARG 1 end_ARG start_ARG italic_H ( italic_i italic_ω ) end_ARG is the function form of 𝐆(𝐅mwiω𝐈)1𝐋𝐆superscriptsuperscriptsubscript𝐅𝑚𝑤𝑖𝜔𝐈1𝐋\mathbf{G}(\mathbf{F}_{m}^{w}-i\omega\mathbf{I})^{-1}\mathbf{L}bold_G ( bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT - italic_i italic_ω bold_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_L. Solving Eq. 8 is often referred to as spectral factorization, and an advantageous aspect of this factorization is that a0,,ak1subscript𝑎0subscript𝑎𝑘1a_{0},\dots,a_{k-1}italic_a start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_a start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT are a subset of complex-valued roots of T(iω)𝑇𝑖𝜔T(i\omega)italic_T ( italic_i italic_ω ), where they are all situated within the left-half complex plane (Kailath et al., 2000) and can be found by QR algorithm with time complexity 𝒪(k)𝒪𝑘\mathcal{O}(k)caligraphic_O ( italic_k ). See Appendix A for derivation.

Forming a discrete-time LDS.

Given 𝐅mwsuperscriptsubscript𝐅𝑚𝑤\mathbf{F}_{m}^{w}bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT, 𝐋𝐋\mathbf{L}bold_L and v=2πσm𝑣2𝜋superscript𝜎𝑚v=\sqrt{\frac{2\pi}{\sigma^{m}}}italic_v = square-root start_ARG divide start_ARG 2 italic_π end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT end_ARG end_ARG in Eq. 6, the computation of 𝐀mwsuperscriptsubscript𝐀𝑚𝑤\mathbf{A}_{m}^{w}bold_A start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT and 𝐐mwsuperscriptsubscript𝐐𝑚𝑤\mathbf{Q}_{m}^{w}bold_Q start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT in Eq. 2 are as follows (Solin et al., 2016):

d𝐏dt=𝐅mw𝐏+𝐏𝐅mwH+𝐋v𝐋=0,𝐀mw=expm(𝐅mwΔt),𝐐mw=𝐏𝐀mw𝐏𝐀mwH,formulae-sequence𝑑subscript𝐏𝑑𝑡superscriptsubscript𝐅𝑚𝑤subscript𝐏subscript𝐏superscriptsuperscriptsubscript𝐅𝑚𝑤𝐻𝐋𝑣superscript𝐋top0formulae-sequencesuperscriptsubscript𝐀𝑚𝑤expmsuperscriptsubscript𝐅𝑚𝑤Δ𝑡superscriptsubscript𝐐𝑚𝑤subscript𝐏superscriptsubscript𝐀𝑚𝑤subscript𝐏superscriptsuperscriptsubscript𝐀𝑚𝑤𝐻\displaystyle\begin{split}\frac{d\mathbf{P}_{\infty}}{dt}&=\mathbf{F}_{m}^{w}% \mathbf{P}_{\infty}+\mathbf{P}_{\infty}{\mathbf{F}_{m}^{w}}^{H}+\mathbf{L}v% \mathbf{L}^{\top}=0,\\ \mathbf{A}_{m}^{w}&=\text{expm}(\mathbf{F}_{m}^{w}\Delta t),\\ \mathbf{Q}_{m}^{w}&=\mathbf{P}_{\infty}-\mathbf{A}_{m}^{w}\mathbf{P}_{\infty}{% \mathbf{A}_{m}^{w}}^{H},\end{split}start_ROW start_CELL divide start_ARG italic_d bold_P start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT end_ARG start_ARG italic_d italic_t end_ARG end_CELL start_CELL = bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT bold_P start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT + bold_P start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT + bold_L italic_v bold_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT = 0 , end_CELL end_ROW start_ROW start_CELL bold_A start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT end_CELL start_CELL = expm ( bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT roman_Δ italic_t ) , end_CELL end_ROW start_ROW start_CELL bold_Q start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT end_CELL start_CELL = bold_P start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT - bold_A start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT bold_P start_POSTSUBSCRIPT ∞ end_POSTSUBSCRIPT bold_A start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT , end_CELL end_ROW (9)

where v𝑣vitalic_v is the spectral density of white noise u(t)𝑢𝑡u(t)italic_u ( italic_t ), expm()expm\text{expm}(\cdot)expm ( ⋅ ) represents the matrix exponential function, and ΔtΔ𝑡\Delta troman_Δ italic_t signifies the time interval in discrete-time LDS.

3.2 Markovian Across-Region Latent Variables

Region p𝑝pitalic_p’s mthsuperscript𝑚𝑡m^{th}italic_m start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT dimension across-region latent variable xmaP×Tsubscriptsuperscript𝑥𝑎𝑚superscript𝑃𝑇x^{a}_{m}\in\mathbb{R}^{P\times T}italic_x start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_P × italic_T end_POSTSUPERSCRIPT can be expressed as xma=[xm1,a,,xmp,a,,xmP,a]subscriptsuperscript𝑥𝑎𝑚subscriptsuperscript𝑥1𝑎𝑚subscriptsuperscript𝑥𝑝𝑎𝑚subscriptsuperscript𝑥𝑃𝑎𝑚x^{a}_{m}=[x^{1,a}_{m},\dots,x^{p,a}_{m},\dots,x^{P,a}_{m}]italic_x start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = [ italic_x start_POSTSUPERSCRIPT 1 , italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , … , italic_x start_POSTSUPERSCRIPT italic_p , italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , … , italic_x start_POSTSUPERSCRIPT italic_P , italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ]. This indicates that they consist of P𝑃Pitalic_P variables sharing the same ηmsuperscript𝜂𝑚\eta^{m}italic_η start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT and σmsuperscript𝜎𝑚\sigma^{m}italic_σ start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT for describing temporal features while using phase delays {ϕppm}p,p=1Psuperscriptsubscriptsuperscriptsubscriptitalic-ϕ𝑝superscript𝑝𝑚𝑝superscript𝑝1𝑃\{\phi_{pp^{\prime}}^{m}\}_{p,p^{\prime}=1}^{P}{ italic_ϕ start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_p , italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_P end_POSTSUPERSCRIPT to capture cross-spatial differences. It’s important to note that the temporal and spatial components are separable in 𝐊msuperscript𝐊𝑚\mathbf{K}^{m}bold_K start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT. Consequently, we can initially create a within-region Markovian representation, denoted as 𝐀mw,𝐐mwsuperscriptsubscript𝐀𝑚𝑤superscriptsubscript𝐐𝑚𝑤\mathbf{A}_{m}^{w},\mathbf{Q}_{m}^{w}bold_A start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT , bold_Q start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT, for the temporal features in each xmp,asubscriptsuperscript𝑥𝑝𝑎𝑚x^{p,a}_{m}italic_x start_POSTSUPERSCRIPT italic_p , italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT. Then, this representation is extended to the across-region Markovian representation for xmasubscriptsuperscript𝑥𝑎𝑚x^{a}_{m}italic_x start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT through the incorporation of phase delays {ϕppm}p,p=1Psuperscriptsubscriptsuperscriptsubscriptitalic-ϕ𝑝superscript𝑝𝑚𝑝superscript𝑝1𝑃\{\phi_{pp^{\prime}}^{m}\}_{p,p^{\prime}=1}^{P}{ italic_ϕ start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_p , italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_P end_POSTSUPERSCRIPT.

Therefore, the Markovian representation of mthsuperscript𝑚𝑡m^{th}italic_m start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT dimension across-region latent variable xmasubscriptsuperscript𝑥𝑎𝑚x^{a}_{m}italic_x start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT is:

fm,ta=𝐀mafm,t1a+qt1,qt1𝒞𝒩(0,𝐐ma),𝐀ma=𝐈𝐀mw,𝐐ma=𝐊spatialm𝐐mw,formulae-sequencesuperscriptsubscript𝑓𝑚𝑡𝑎superscriptsubscript𝐀𝑚𝑎superscriptsubscript𝑓𝑚𝑡1𝑎subscript𝑞𝑡1formulae-sequencesimilar-tosubscript𝑞𝑡1𝒞𝒩0superscriptsubscript𝐐𝑚𝑎formulae-sequencesuperscriptsubscript𝐀𝑚𝑎tensor-product𝐈superscriptsubscript𝐀𝑚𝑤superscriptsubscript𝐐𝑚𝑎tensor-productsubscriptsuperscript𝐊𝑚spatialsuperscriptsubscript𝐐𝑚𝑤\displaystyle\begin{split}f_{m,t}^{a}=\mathbf{A}_{m}^{a}f_{m,t-1}^{a}+q_{t-1},% &\quad q_{t-1}\sim\mathcal{CN}(0,\mathbf{Q}_{m}^{a}),\\ \mathbf{A}_{m}^{a}=\mathbf{I}\otimes\mathbf{A}_{m}^{w},&\quad\mathbf{Q}_{m}^{a% }=\mathbf{K}^{m}_{\text{spatial}}\otimes\mathbf{Q}_{m}^{w},\end{split}start_ROW start_CELL italic_f start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT = bold_A start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT italic_f start_POSTSUBSCRIPT italic_m , italic_t - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT + italic_q start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT , end_CELL start_CELL italic_q start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ∼ caligraphic_C caligraphic_N ( 0 , bold_Q start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ) , end_CELL end_ROW start_ROW start_CELL bold_A start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT = bold_I ⊗ bold_A start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT , end_CELL start_CELL bold_Q start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT = bold_K start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT start_POSTSUBSCRIPT spatial end_POSTSUBSCRIPT ⊗ bold_Q start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT , end_CELL end_ROW (10)

where fm,ta=[gm,ta,dgm,tadt,,dk1gm,tadtk1]Pk×Tsuperscriptsubscript𝑓𝑚𝑡𝑎superscriptsubscriptsuperscript𝑔𝑎𝑚𝑡𝑑subscriptsuperscript𝑔𝑎𝑚𝑡𝑑𝑡superscript𝑑𝑘1subscriptsuperscript𝑔𝑎𝑚𝑡𝑑superscript𝑡𝑘1topsuperscript𝑃𝑘𝑇f_{m,t}^{a}=[g^{a}_{m,t},\frac{dg^{a}_{m,t}}{dt},\dots,\frac{d^{k-1}g^{a}_{m,t% }}{dt^{k-1}}]^{\top}\in\mathbb{C}^{Pk\times T}italic_f start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT = [ italic_g start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT , divide start_ARG italic_d italic_g start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_d italic_t end_ARG , … , divide start_ARG italic_d start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT italic_g start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_d italic_t start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT end_ARG ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_P italic_k × italic_T end_POSTSUPERSCRIPT, xm,tasubscriptsuperscript𝑥𝑎𝑚𝑡x^{a}_{m,t}italic_x start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT is the real part of gm,tasubscriptsuperscript𝑔𝑎𝑚𝑡g^{a}_{m,t}italic_g start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m , italic_t end_POSTSUBSCRIPT, 𝐀maPk×Pksuperscriptsubscript𝐀𝑚𝑎superscript𝑃𝑘𝑃𝑘\mathbf{A}_{m}^{a}\in\mathbb{C}^{Pk\times Pk}bold_A start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_P italic_k × italic_P italic_k end_POSTSUPERSCRIPT is transition matrix, denoting the Kronecker product of identity matrix 𝐈P×P𝐈superscript𝑃𝑃\mathbf{I}\in\mathbb{R}^{P\times P}bold_I ∈ blackboard_R start_POSTSUPERSCRIPT italic_P × italic_P end_POSTSUPERSCRIPT and 𝐀mwk×ksuperscriptsubscript𝐀𝑚𝑤superscript𝑘𝑘\mathbf{A}_{m}^{w}\in\mathbb{C}^{k\times k}bold_A start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_k × italic_k end_POSTSUPERSCRIPT, and 𝐐maPk×Pksuperscriptsubscript𝐐𝑚𝑎superscript𝑃𝑘𝑃𝑘\mathbf{Q}_{m}^{a}\in\mathbb{C}^{Pk\times Pk}bold_Q start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_P italic_k × italic_P italic_k end_POSTSUPERSCRIPT is measurement matrix, denoting the Kronecker product of 𝐊msuperscript𝐊𝑚\mathbf{K}^{m}bold_K start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT’s spatial part 𝐊spatialm=r=1Rapm,rapm,rexp(iηmϕppm)subscriptsuperscript𝐊𝑚spatialsuperscriptsubscript𝑟1𝑅superscriptsubscript𝑎𝑝𝑚𝑟superscriptsubscript𝑎superscript𝑝𝑚𝑟𝑖superscript𝜂𝑚superscriptsubscriptitalic-ϕ𝑝superscript𝑝𝑚\mathbf{K}^{m}_{\text{spatial}}=\sum_{r=1}^{R}a_{p}^{m,r}a_{p^{\prime}}^{m,r}% \exp(i\eta^{m}\phi_{pp^{\prime}}^{m})bold_K start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT start_POSTSUBSCRIPT spatial end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_r = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m , italic_r end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m , italic_r end_POSTSUPERSCRIPT roman_exp ( italic_i italic_η start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT italic_ϕ start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT ) and 𝐐mwk×ksuperscriptsubscript𝐐𝑚𝑤superscript𝑘𝑘\mathbf{Q}_{m}^{w}\in\mathbb{C}^{k\times k}bold_Q start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_k × italic_k end_POSTSUPERSCRIPT.

3.3 Multi-Region Markovian Gaussian Process

Given our assumption of independence among different dimensions of across-region variables and distinct dimensions of within-region variables, the Markovian representation for all variables xMP×T𝑥superscript𝑀𝑃𝑇x\in\mathbb{R}^{MP\times T}italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_M italic_P × italic_T end_POSTSUPERSCRIPT, both across- and within-region, spanning P𝑃Pitalic_P brain regions can be expressed as:

ft=𝐀ft1+qt1,qt1𝒞𝒩(0,𝐐),\displaystyle\begin{split}f_{t}=\mathbf{A}f_{t-1}+q_{t-1}&,\quad q_{t-1}\sim% \mathcal{CN}(0,\mathbf{Q}),\end{split}start_ROW start_CELL italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = bold_A italic_f start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT + italic_q start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT end_CELL start_CELL , italic_q start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ∼ caligraphic_C caligraphic_N ( 0 , bold_Q ) , end_CELL end_ROW (11)

where ft=[gt,dgtdt,,dk1gtdtk1]MPk×Tsubscript𝑓𝑡superscriptsubscript𝑔𝑡𝑑subscript𝑔𝑡𝑑𝑡superscript𝑑𝑘1subscript𝑔𝑡𝑑superscript𝑡𝑘1topsuperscript𝑀𝑃𝑘𝑇f_{t}=[g_{t},\frac{dg_{t}}{dt},\dots,\frac{d^{k-1}g_{t}}{dt^{k-1}}]^{\top}\in% \mathbb{C}^{MPk\times T}italic_f start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = [ italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , divide start_ARG italic_d italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_d italic_t end_ARG , … , divide start_ARG italic_d start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_d italic_t start_POSTSUPERSCRIPT italic_k - 1 end_POSTSUPERSCRIPT end_ARG ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ∈ blackboard_C start_POSTSUPERSCRIPT italic_M italic_P italic_k × italic_T end_POSTSUPERSCRIPT, xtsubscript𝑥𝑡x_{t}italic_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is the real part of gtsubscript𝑔𝑡g_{t}italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, and 𝐀MPk×MPk𝐀superscript𝑀𝑃𝑘𝑀𝑃𝑘\mathbf{A}\in\mathbb{C}^{MPk\times MPk}bold_A ∈ blackboard_C start_POSTSUPERSCRIPT italic_M italic_P italic_k × italic_M italic_P italic_k end_POSTSUPERSCRIPT, 𝐐MPk×MPk𝐐superscript𝑀𝑃𝑘𝑀𝑃𝑘\mathbf{Q}\in\mathbb{C}^{MPk\times MPk}bold_Q ∈ blackboard_C start_POSTSUPERSCRIPT italic_M italic_P italic_k × italic_M italic_P italic_k end_POSTSUPERSCRIPT are block diagonal matrices: 𝐀=diag{𝐀1a,,𝐀maa,𝐀1w𝐀mww}𝐀diagsuperscriptsubscript𝐀1𝑎superscriptsubscript𝐀subscript𝑚𝑎𝑎superscriptsubscript𝐀1𝑤superscriptsubscript𝐀subscript𝑚𝑤𝑤\mathbf{A}=\text{diag}\{\mathbf{A}_{1}^{a},\dots,\mathbf{A}_{m_{a}}^{a},% \mathbf{A}_{1}^{w}\dots\mathbf{A}_{m_{w}}^{w}\}bold_A = diag { bold_A start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT , … , bold_A start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT , bold_A start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT … bold_A start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT }, 𝐐=diag{𝐐1a,,𝐐maa,𝐐1w,,𝐐mww}𝐐diagsuperscriptsubscript𝐐1𝑎superscriptsubscript𝐐subscript𝑚𝑎𝑎superscriptsubscript𝐐1𝑤superscriptsubscript𝐐subscript𝑚𝑤𝑤\mathbf{Q}=\text{diag}\{\mathbf{Q}_{1}^{a},\dots,\mathbf{Q}_{m_{a}}^{a},% \mathbf{Q}_{1}^{w},\dots,\mathbf{Q}_{m_{w}}^{w}\}bold_Q = diag { bold_Q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT , … , bold_Q start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT , bold_Q start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT , … , bold_Q start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT }. Meanwhile, the neural recordings yN×T𝑦superscript𝑁𝑇y\in\mathbb{R}^{N\times T}italic_y ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × italic_T end_POSTSUPERSCRIPT can be reconstructed by y=𝐂Re[𝐆f]+d+ϵ𝑦𝐂Re𝐆𝑓𝑑italic-ϵy=\mathbf{C}\operatorname{Re}[\mathbf{G}f]+d+\epsilonitalic_y = bold_C roman_Re [ bold_G italic_f ] + italic_d + italic_ϵ, with 𝐂𝐂\mathbf{C}bold_C, d𝑑ditalic_d, ϵitalic-ϵ\epsilonitalic_ϵ from CSM-GPFA in Section 2.2, and 𝐆𝐆\mathbf{G}bold_G in Eq. 6.

3.4 Multi-Region Markovian Gaussian Process with Switching States

After the link between the multi-region Gaussian Process and linear dynamical system (LDS) is established, we can seamlessly extend the across-region discrete-time LDS in Eq. 10 to incorporate switching states.

Integrating a Hidden Markov Model (HMM) into LDS leads to Switching LDS (Fox et al., 2008), and similarly, combining HMM with MRM-GP results in Switching MRM-GP. A significant advantage of this integration is the ability to link the across-region’s transition and measurement matrices with distinct, discrete states z{1,,Z}𝑧1𝑍z\in\{1,\dots,Z\}italic_z ∈ { 1 , … , italic_Z }: 𝐀za=diag{𝐀1,za,,𝐀ma,za},𝐐za=diag{𝐐1,za,,𝐐ma,za}formulae-sequencesuperscriptsubscript𝐀𝑧𝑎diagsuperscriptsubscript𝐀1𝑧𝑎superscriptsubscript𝐀subscript𝑚𝑎𝑧𝑎superscriptsubscript𝐐𝑧𝑎diagsuperscriptsubscript𝐐1𝑧𝑎superscriptsubscript𝐐subscript𝑚𝑎𝑧𝑎\mathbf{A}_{z}^{a}=\text{diag}\{\mathbf{A}_{1,z}^{a},\dots,\mathbf{A}_{m_{a},z% }^{a}\},\mathbf{Q}_{z}^{a}=\text{diag}\{\mathbf{Q}_{1,z}^{a},\dots,\mathbf{Q}_% {m_{a},z}^{a}\}bold_A start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT = diag { bold_A start_POSTSUBSCRIPT 1 , italic_z end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT , … , bold_A start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , italic_z end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT } , bold_Q start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT = diag { bold_Q start_POSTSUBSCRIPT 1 , italic_z end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT , … , bold_Q start_POSTSUBSCRIPT italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , italic_z end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT }, which makes it easy to accommodate time-varying frequencies and delays in across-region latent variables.

4 Inference

We have now established a connection between a Gaussian Process with a multi-region kernel and a linear dynamical system (LDS). The next step is to learn discrete states, model parameters, and latent variables.

MRM-GP, as a discrete-time LDS, affords a significant advantage: the ability to learn its parameters with a cost linear in time steps: 𝒪(T)𝒪𝑇\mathcal{O}(T)caligraphic_O ( italic_T ). To achieve this, we employ the variational Laplace EM inference algorithm proposed in the general recurrent state space framework for decision-making (Zoltowski et al., 2020).

If denoting the number of discrete states as Z𝑍Zitalic_Z, the parameters θ𝜃\thetaitalic_θ of MRM-GP can be categorized into two groups: (1) kernel parameters: {σm,z,ηm,z}m=1,z=1ma,Zsuperscriptsubscriptsuperscript𝜎𝑚𝑧superscript𝜂𝑚𝑧formulae-sequence𝑚1𝑧1subscript𝑚𝑎𝑍\{\sigma^{m,z},\eta^{m,z}\}_{m=1,z=1}^{m_{a},Z}{ italic_σ start_POSTSUPERSCRIPT italic_m , italic_z end_POSTSUPERSCRIPT , italic_η start_POSTSUPERSCRIPT italic_m , italic_z end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_m = 1 , italic_z = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , italic_Z end_POSTSUPERSCRIPT, {σm,p,ηm,p}m=1,p=1mw,Psuperscriptsubscriptsuperscript𝜎𝑚𝑝superscript𝜂𝑚𝑝formulae-sequence𝑚1𝑝1subscript𝑚𝑤𝑃\{\sigma^{m,p},\eta^{m,p}\}_{m=1,p=1}^{m_{w},P}{ italic_σ start_POSTSUPERSCRIPT italic_m , italic_p end_POSTSUPERSCRIPT , italic_η start_POSTSUPERSCRIPT italic_m , italic_p end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_m = 1 , italic_p = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT , italic_P end_POSTSUPERSCRIPT, {ϕppm,z}m=1,p=1,p=p+1,z=1ma,P,Zsuperscriptsubscriptsuperscriptsubscriptitalic-ϕ𝑝superscript𝑝𝑚𝑧formulae-sequence𝑚1formulae-sequence𝑝1formulae-sequencesuperscript𝑝𝑝1𝑧1subscript𝑚𝑎𝑃𝑍\{\phi_{pp^{\prime}}^{m,z}\}_{m=1,p=1,p^{\prime}=p+1,z=1}^{m_{a},P,Z}{ italic_ϕ start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m , italic_z end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_m = 1 , italic_p = 1 , italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = italic_p + 1 , italic_z = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , italic_P , italic_Z end_POSTSUPERSCRIPT; (2) emissions parameters: 𝐂,d,𝐕𝐂𝑑𝐕\mathbf{C},d,\mathbf{V}bold_C , italic_d , bold_V. Additionally, the hyper-parameters consist of the number of discrete states Z𝑍Zitalic_Z, the number of derivatives k𝑘kitalic_k, the kernel rank R𝑅Ritalic_R, and the number of latent dimensions M𝑀Mitalic_M. The value of Z𝑍Zitalic_Z depends on the data, k𝑘kitalic_k is discussed in Section 3.1 and Figure 2, M𝑀Mitalic_M is determined through a cross-validation strategy (Section 5.2), and the rank R𝑅Ritalic_R is consistently set to 2 to ensure positive definiteness without introducing many amplitude parameters. Besides, there is no need to learn the amplitude parameters, denoted as {apm,r}p=1,m=1,r=1P,M,Rsuperscriptsubscriptsuperscriptsubscript𝑎𝑝𝑚𝑟formulae-sequence𝑝1formulae-sequence𝑚1𝑟1𝑃𝑀𝑅\{a_{p}^{m,r}\}_{p=1,m=1,r=1}^{P,M,R}{ italic_a start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m , italic_r end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_p = 1 , italic_m = 1 , italic_r = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_P , italic_M , italic_R end_POSTSUPERSCRIPT, since the emissions parameter 𝐂𝐂\mathbf{C}bold_C fulfills a similar role in MRM-GP.

The variational Laplace EM inference algorithm alternatively updates discrete switching states z{1,,Z}𝑧1𝑍z\in\{1,\dots,Z\}italic_z ∈ { 1 , … , italic_Z }, latent dynamics fMPk×T𝑓superscript𝑀𝑃𝑘𝑇f\in\mathbb{C}^{MPk\times T}italic_f ∈ blackboard_C start_POSTSUPERSCRIPT italic_M italic_P italic_k × italic_T end_POSTSUPERSCRIPT, and model parameters θ𝜃\thetaitalic_θ. The time complexity and memory storage of each step are all linear in time as follows: (1) updating z𝑧zitalic_z: 𝒪(Z)𝒪𝑍\mathcal{O}(Z)caligraphic_O ( italic_Z ), 𝒪(ZT)𝒪𝑍𝑇\mathcal{O}(ZT)caligraphic_O ( italic_Z italic_T ); (2) updating f𝑓fitalic_f: 𝒪(T)𝒪𝑇\mathcal{O}(T)caligraphic_O ( italic_T ), 𝒪(2M2P2k2T)𝒪2superscript𝑀2superscript𝑃2superscript𝑘2𝑇\mathcal{O}(2M^{2}P^{2}k^{2}T)caligraphic_O ( 2 italic_M start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_P start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_T ); (3) updating θ𝜃\thetaitalic_θ: 𝒪(ZMk)𝒪𝑍𝑀𝑘\mathcal{O}(ZMk)caligraphic_O ( italic_Z italic_M italic_k ), 𝒪(MPkT)𝒪𝑀𝑃𝑘𝑇\mathcal{O}(MPkT)caligraphic_O ( italic_M italic_P italic_k italic_T ).

Furthermore, to avoid the calculation of the complex number when updating f𝑓fitalic_f in our implementations, we rewrite the complex latent dynamics f𝑓fitalic_f in Eq. 11 to be a joint signal in the real domain, such that the latent dynamics becomes:

[frfi]t=[𝐀r𝐀i𝐀i𝐀r][frfi]t1+[qrqi]t1,[qrqi]t1𝒩([00],[𝐐r𝐐i𝐐i𝐐r]),formulae-sequencesubscriptmatrixsubscript𝑓𝑟subscript𝑓𝑖𝑡matrixsubscript𝐀𝑟subscript𝐀𝑖subscript𝐀𝑖subscript𝐀𝑟subscriptmatrixsubscript𝑓𝑟subscript𝑓𝑖𝑡1subscriptmatrixsubscript𝑞𝑟subscript𝑞𝑖𝑡1similar-tosubscriptmatrixsubscript𝑞𝑟subscript𝑞𝑖𝑡1𝒩matrix00matrixsubscript𝐐𝑟subscript𝐐𝑖subscript𝐐𝑖subscript𝐐𝑟\displaystyle\begin{split}&\begin{bmatrix}f_{r}\\ f_{i}\end{bmatrix}_{t}=\begin{bmatrix}\mathbf{A}_{r}&-\mathbf{A}_{i}\\ \mathbf{A}_{i}&\mathbf{A}_{r}\end{bmatrix}\begin{bmatrix}f_{r}\\ f_{i}\end{bmatrix}_{t-1}+\begin{bmatrix}q_{r}\\ q_{i}\end{bmatrix}_{t-1},\\ &\begin{bmatrix}q_{r}\\ q_{i}\end{bmatrix}_{t-1}\sim\mathcal{N}\left(\begin{bmatrix}0\\ 0\end{bmatrix},\begin{bmatrix}\mathbf{Q}_{r}&-\mathbf{Q}_{i}\\ \mathbf{Q}_{i}&\mathbf{Q}_{r}\end{bmatrix}\right),\end{split}start_ROW start_CELL end_CELL start_CELL [ start_ARG start_ROW start_CELL italic_f start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = [ start_ARG start_ROW start_CELL bold_A start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT end_CELL start_CELL - bold_A start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_A start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL start_CELL bold_A start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_f start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT + [ start_ARG start_ROW start_CELL italic_q start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL [ start_ARG start_ROW start_CELL italic_q start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ∼ caligraphic_N ( [ start_ARG start_ROW start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL end_ROW end_ARG ] , [ start_ARG start_ROW start_CELL bold_Q start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT end_CELL start_CELL - bold_Q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_Q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_CELL start_CELL bold_Q start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] ) , end_CELL end_ROW (12)

where fr,fi,qr,qi,𝐀r,𝐀i,𝐐r,𝐐isubscript𝑓𝑟subscript𝑓𝑖subscript𝑞𝑟subscript𝑞𝑖subscript𝐀𝑟subscript𝐀𝑖subscript𝐐𝑟subscript𝐐𝑖f_{r},f_{i},q_{r},q_{i},\mathbf{A}_{r},\mathbf{A}_{i},\mathbf{Q}_{r},\mathbf{Q% }_{i}italic_f start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT , italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_q start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT , italic_q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_A start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT , bold_A start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_Q start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT , bold_Q start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT are the real and imagery part of f,q,𝐀,𝐐𝑓𝑞𝐀𝐐f,q,\mathbf{A},\mathbf{Q}italic_f , italic_q , bold_A , bold_Q, respectively.

5 Experiments

Our code is available at https://github.com/WeihanLikk/MRM-GP.

Datasets.

We evaluate MRM-GP on three datasets:
\bullet\quad Synthetic Data: We generate simulated data incorporating both across-region communications and within-region neural activities, along with time-varying frequencies and phase delays introduced by various states.
\bullet\quad Local Field Potential Recordings (LFP) (Siegle et al., 2021): Local Field Potential recordings from mouse’s primary visual area (V1) and visual anteromedial area (VISam). The external stimulus consisted of an 8Hz drifting grating with eight orientation directions.
\bullet\quad Neural Spike Trains (Semedo et al., 2019; Zandvakili & Kohn, 2019): Simultaneous spike trains from monkey’s primary visual area (V1) and secondary visual cortex (V2). The external stimulus is a 6Hz drifting grating with eight orientation directions.

Baselines for comparison.

We compare MRM-GP with two methods designed to discover the directional communications in the latent space of multi-region recordings:
\bullet\quad DLAG (Gokcen et al., 2022): A Gaussian Process Factor Analysis employs a multi-output Squared Exponential kernel. Its goal is to uncover simultaneous or bidirectional latent communications across different regions. The kernel function incorporates a time delay parameter to determine the directions for learned communications.
\bullet\quad CSM-GPFA: The Gaussian Process Factor Analysis, using a multi-region kernel as described in Section 2.2, is an extension of the model presented in (Hultman et al., 2018). This extension introduces a new classification assumption, distinguishing latent variables into across-region and within-region types.

Metrics.

For every model and dataset, we fit the model on the training set, denoted as ytrainsubscript𝑦trainy_{\text{train}}italic_y start_POSTSUBSCRIPT train end_POSTSUBSCRIPT, and test its performance on the test set ytestsubscript𝑦testy_{\text{test}}italic_y start_POSTSUBSCRIPT test end_POSTSUBSCRIPT. Specifically, we randomly select some trials as ytrainsubscript𝑦trainy_{\text{train}}italic_y start_POSTSUBSCRIPT train end_POSTSUBSCRIPT, while the remaining trials serve as ytestsubscript𝑦testy_{\text{test}}italic_y start_POSTSUBSCRIPT test end_POSTSUBSCRIPT. Additionally, we randomly divide the test data ytestsubscript𝑦testy_{\text{test}}italic_y start_POSTSUBSCRIPT test end_POSTSUBSCRIPT into two parts: ytestheld-insuperscriptsubscript𝑦testheld-iny_{\text{test}}^{\text{held-in}}italic_y start_POSTSUBSCRIPT test end_POSTSUBSCRIPT start_POSTSUPERSCRIPT held-in end_POSTSUPERSCRIPT with 90%percent9090\%90 % neurons as held-in test data and ytestheld-outsuperscriptsubscript𝑦testheld-outy_{\text{test}}^{\text{held-out}}italic_y start_POSTSUBSCRIPT test end_POSTSUBSCRIPT start_POSTSUPERSCRIPT held-out end_POSTSUPERSCRIPT with 10%percent1010\%10 % neurons as held-out test data. We infer xtestheld-insuperscriptsubscript𝑥testheld-inx_{\text{test}}^{\text{held-in}}italic_x start_POSTSUBSCRIPT test end_POSTSUBSCRIPT start_POSTSUPERSCRIPT held-in end_POSTSUPERSCRIPT based on ytestheld-insuperscriptsubscript𝑦testheld-iny_{\text{test}}^{\text{held-in}}italic_y start_POSTSUBSCRIPT test end_POSTSUBSCRIPT start_POSTSUPERSCRIPT held-in end_POSTSUPERSCRIPT, which is then used as the test latent variables when computing test log-likelihood (LL) p(ytestheld-out|xtestheld-in;θ)𝑝conditionalsuperscriptsubscript𝑦testheld-outsuperscriptsubscript𝑥testheld-in𝜃p(y_{\text{test}}^{\text{held-out}}|x_{\text{test}}^{\text{held-in}};\theta)italic_p ( italic_y start_POSTSUBSCRIPT test end_POSTSUBSCRIPT start_POSTSUPERSCRIPT held-out end_POSTSUPERSCRIPT | italic_x start_POSTSUBSCRIPT test end_POSTSUBSCRIPT start_POSTSUPERSCRIPT held-in end_POSTSUPERSCRIPT ; italic_θ )(Pei et al., 2021), serving as the final metric in our experiments. To reduce the randomness when creating ytestheld-insuperscriptsubscript𝑦testheld-iny_{\text{test}}^{\text{held-in}}italic_y start_POSTSUBSCRIPT test end_POSTSUBSCRIPT start_POSTSUPERSCRIPT held-in end_POSTSUPERSCRIPT and ytestheld-outsuperscriptsubscript𝑦testheld-outy_{\text{test}}^{\text{held-out}}italic_y start_POSTSUBSCRIPT test end_POSTSUBSCRIPT start_POSTSUPERSCRIPT held-out end_POSTSUPERSCRIPT, we also average p(ytestheld-out|xtestheld-in;θ)𝑝conditionalsuperscriptsubscript𝑦testheld-outsuperscriptsubscript𝑥testheld-in𝜃p(y_{\text{test}}^{\text{held-out}}|x_{\text{test}}^{\text{held-in}};\theta)italic_p ( italic_y start_POSTSUBSCRIPT test end_POSTSUBSCRIPT start_POSTSUPERSCRIPT held-out end_POSTSUPERSCRIPT | italic_x start_POSTSUBSCRIPT test end_POSTSUBSCRIPT start_POSTSUPERSCRIPT held-in end_POSTSUPERSCRIPT ; italic_θ ) over five distinct partitions.

Refer to caption
Figure 3: Applying MRM-GP to synthetic data. (A): Compare the estimated latent variables and discrete states with the ground truth. MRM-GP accurately identifies two states with time-varying frequencies and phase delays, aligning with the ground truth. (B): Compare learned parameters with the ground truth indicated by dashed lines. (C): Examine the test LL with varying numbers of discrete states Z𝑍Zitalic_Z. The findings indicate that Z=2𝑍2Z=2italic_Z = 2 and Z=3𝑍3Z=3italic_Z = 3 exhibit similar LL, both larger than Z=1𝑍1Z=1italic_Z = 1.

5.1 Synthetic Data

This section aims to assess how well MRM-GP can identify switching states, latent variables, and parameters.

Experimental setup.

We generate 50 independent trials for two brain regions P=2𝑃2P=2italic_P = 2, where each region has 30303030 neurons, ma=1subscript𝑚𝑎1m_{a}=1italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT = 1 dimension across-region variables, and mw=1subscript𝑚𝑤1m_{w}=1italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT = 1 dimension within-region variable. We also introduce the time-varying across-region frequencies and phase delays by two discrete states Z={z1,z2}𝑍subscript𝑧1subscript𝑧2Z=\{z_{1},z_{2}\}italic_Z = { italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT }: (1) state 1: ηz1,a=1.0superscript𝜂subscript𝑧1𝑎1.0\eta^{z_{1},a}=1.0italic_η start_POSTSUPERSCRIPT italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_a end_POSTSUPERSCRIPT = 1.0 rad/s, ϕ1,2z1=10superscriptsubscriptitalic-ϕ12subscript𝑧110\phi_{1,2}^{z_{1}}=-10italic_ϕ start_POSTSUBSCRIPT 1 , 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT = - 10 ms, σz1,a=10superscript𝜎subscript𝑧1𝑎10\sigma^{z_{1},a}=10italic_σ start_POSTSUPERSCRIPT italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_a end_POSTSUPERSCRIPT = 10, state 2: ηz2,a=0.25superscript𝜂subscript𝑧2𝑎0.25\eta^{z_{2},a}=0.25italic_η start_POSTSUPERSCRIPT italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_a end_POSTSUPERSCRIPT = 0.25 rad/s, ϕ1,2z2=10superscriptsubscriptitalic-ϕ12subscript𝑧210\phi_{1,2}^{z_{2}}=10italic_ϕ start_POSTSUBSCRIPT 1 , 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT = 10 ms, σz2,a=10superscript𝜎subscript𝑧2𝑎10\sigma^{z_{2},a}=10italic_σ start_POSTSUPERSCRIPT italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_a end_POSTSUPERSCRIPT = 10. Different sign of ϕ1,2subscriptitalic-ϕ12\phi_{1,2}italic_ϕ start_POSTSUBSCRIPT 1 , 2 end_POSTSUBSCRIPT means the change of directions. We set ηw=0.75superscript𝜂𝑤0.75\eta^{w}=0.75italic_η start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT = 0.75 rad/s and σw=10superscript𝜎𝑤10\sigma^{w}=10italic_σ start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT = 10 for within-region variables. For the generative and inference process, we set hyperparameters k=2𝑘2k=2italic_k = 2, R=2𝑅2R=2italic_R = 2, and compare the test log likelihood when Z=1,2,3𝑍123Z=1,2,3italic_Z = 1 , 2 , 3.

Results.

We fit an MRM-GP to the synthetic data, specifying ma=1subscript𝑚𝑎1m_{a}=1italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT = 1 dimension across-region variables, mw=1subscript𝑚𝑤1m_{w}=1italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT = 1 dimension within-region variable, and Z=2𝑍2Z=2italic_Z = 2 states. Figure 3(A) shows single-trial latent variable estimations that accurately reflect the latent dynamics and communications influenced by discrete states over two brain regions. State z1subscript𝑧1z_{1}italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT (depicted in blue) exhibits a periodic signal with a higher frequency and forward communication from brain region 1 to brain region 2. In contrast, state z2subscript𝑧2z_{2}italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT (shown in purple) displays an oscillatory pattern with a lower frequency and feedback communication from region 2 to region 1.

For a quantitative assessment of learned parameters, Figure 3(B) displays the estimated phase delays, frequencies, and length scales across different initializations, demonstrating close alignment with ground truths. Furthermore, Figure 3(C) illustrates the test log-likelihood for varying Z𝑍Zitalic_Z, revealing that both Z=2𝑍2Z=2italic_Z = 2 and Z=3𝑍3Z=3italic_Z = 3 provide similar and superior estimations compared to Z=1𝑍1Z=1italic_Z = 1 (see Appendix J for Z=3𝑍3Z=3italic_Z = 3 visualization). We also have synthetic experiments about parameter initialization and different parameter setting in Appendix F,G,H.

5.2 Local Field Potential Recordings

This section aims to explore interactions between the mouse’s primary visual area (V1) and the visual anteromedial area (VISam) in the presence of an 8Hz drifting grating. We also aim to compare the performance and inference time cost with other multi-region methods, namely DLAG and CSM-GPFA.

Experimental setup.

We conduct experiments using two sessions, each comprising eight orientation directions, resulting in 16 datasets. Each dataset includes 15 trials (10 as a training set and 5 as a testing set) of continuous-time local field potential recordings from approximately 20 neurons in V1 and approximately 25 in VISam. The initial sampling rate is 1000Hz, and we downsample it to 100Hz, resulting in 200 time points with 10ms bin size. We set hyperparameters k=2𝑘2k=2italic_k = 2, R=2𝑅2R=2italic_R = 2.

To determine the dimensionalities of across- and within-region latent variables, we adopt the approach outlined in (Gokcen et al., 2022). Initially, we apply Factor Analysis to identify the total number of latent variables required to elucidate the neural recordings for each region. A 5-fold cross-validation was employed to select the configuration yielding the highest test LL. Subsequently, given the selected total number of latent variables (M𝑀Mitalic_M), we conduct a grid search for the dimensionalities of across- (masubscript𝑚𝑎m_{a}italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT) and within-region (mwsubscript𝑚𝑤m_{w}italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT), respectively. For each pair of (ma,mw)subscript𝑚𝑎subscript𝑚𝑤(m_{a},m_{w})( italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT ), we run 5-fold cross-validation with MRM-GP and chose the setting with the highest test LL. Given this procedure, our final choice was ma=1subscript𝑚𝑎1m_{a}=1italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT = 1 across-region variables and mw=3subscript𝑚𝑤3m_{w}=3italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT = 3 within-region variables for both V1 and VISam. See Appendix B for the full comparison.

Refer to caption
Figure 4: Applying MRM-GP to LFP recordings. A: Compare the across-region latent variables with DLAG. B-C: Visualize the learned phase delays and frequencies from V1-VISam and V1a-V1b. D: Compare the test LL with other multi-region methods: DLAG and CSM-GPFA. E: The power spectrum of across-region latent variables in A. The latent variable from DLAG exhibits three frequency peaks, while MRM-GP’s latent variable has only one peak. F: Inference time comparison: MRM-GP has a linear time cost.

Results.

We applied an MRM-GP to local field potential recordings with Z=1𝑍1Z=1italic_Z = 1 state. Figure 4(A) shows a comparison of single-trial across-region latent variables for one dataset (orientation 135°, session 721123822), and the within-region variables are in Appendix B. Both latent variables demonstrate an oscillatory structure, capturing characteristics of the external 8Hz drifting grating stimulus.

The MRM-GP’s latent variable in this dataset is linked to a communication direction from V1 to VISam with an 8.2 ms phase delay, and the DLAG’s latent variable shows a 1.1 ms time delay. Both delays fall within a single time bin (10ms) and are positive, suggesting consistent communication direction from V1 to VISam. The difference in their values arises because DLAG models time delay (δppsubscript𝛿𝑝superscript𝑝\delta_{pp^{\prime}}italic_δ start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT) in the kernel equation Kpp(τ)=exp(12σ2(τδpp)2)subscript𝐾𝑝superscript𝑝𝜏12superscript𝜎2superscript𝜏subscript𝛿𝑝superscript𝑝2K_{pp^{\prime}}(\tau)=\exp(-\frac{1}{2\sigma^{2}}(\tau-\delta_{pp^{\prime}})^{% 2})italic_K start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_τ ) = roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ( italic_τ - italic_δ start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ), which is independent of frequency and has a different interpretation from the phase delay (ϕppsubscriptitalic-ϕ𝑝superscript𝑝\phi_{pp^{\prime}}italic_ϕ start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT). In this context, ϕppsubscriptitalic-ϕ𝑝superscript𝑝\phi_{pp^{\prime}}italic_ϕ start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT represents the delay in a specific frequency band. In contrast, δppsubscript𝛿𝑝superscript𝑝\delta_{pp^{\prime}}italic_δ start_POSTSUBSCRIPT italic_p italic_p start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT signifies the delay for a latent variable with a mixture of multiple frequencies, as evidenced by its power spectrum with three frequency peaks (Figure 4(E)). Therefore, the divergence in values is acceptable if their directions align.

The left chart in Figure 4(B) illustrates the estimated phase delays across all 16 datasets. Each data point represents the phase delay for an individual run on a specific dataset. The findings suggest consistent communication from V1 to VISam across all datasets, with the phase delays clustered around 7.5Hz (the left chart in Figure 4(C)), which is consistent with external 8Hz stimulus.

To demonstrate that the MRM-GP itself is not the cause of delays, we first divide V1 randomly into two parts, V1a and V1b, each with channels of equal size. We then estimated the phase delays between them. The right chart in Figure 4(B) illustrates that across 16 datasets, all phase delays hover around zero within frequency bands around 7.5Hz (the right chart in Figure 4(C)). This suggests that the learned delays are a consequence of the data rather than the model.

Figure 4(D) shows that MRM-GP, a linear dynamics system approximation of CSM-GPFA, exhibits a similar test LL to CSM-GPFA. The higher test LL compared to DLAG suggests that the multi-region kernel (Eq. 2.1) outperforms DLAG’s Squared Exponential kernel on these datasets. This is attributed to the former explicitly modeling frequencies through its kernel parameters and having a better frequency separation. Specifically, the across-region variable of the former has only one prominent frequency, whereas DLAG’s across-region variable exhibits three peaks in Figure4(E), keeping consistent with the data spectrum in Appendix E.

Lastly, in Figure 4(F), we compare the inference time of MRM-GP, CSM-GPFA, and DLAG for 500 iterations. We achieved this by downsampling the recordings and creating four datasets with varying lengths of time points: 50, 100, 150, and 200. The results indicate that the time cost of MRM-GP increases linearly, whereas both CSM-GPFA and DLAG exhibit cubic growth.

5.3 Neural Spike Trains

This section aims to evaluate MRM-GP’s ability to identify switching states within the communications subspace while also discovering across-region communications using a distinct type of neural data.

Experimental setup.

The simultaneous spike trains were obtained from the monkey’s primary visual area (V1) and secondary visual cortex (V2) in the presence of a 6Hz moving grating. This dataset comprises four sessions, each featuring eight orientation directions, resulting in 32 datasets. Each dataset comprises 400 trials (64 time points with 20ms bin size for every trial), with 300 trials randomly selected as the training set and 100 trials as the testing set. In V1, there are approximately 90 neurons, while in V2, there are around 20 neurons. We set hyperparameters k=2𝑘2k=2italic_k = 2, R=2𝑅2R=2italic_R = 2.

Results.

We fitted an MRM-GP to neural spikes trains with ma=2subscript𝑚𝑎2m_{a}=2italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT = 2 dimensions across-region variables, mw=2subscript𝑚𝑤2m_{w}=2italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT = 2 dimension within-region variable, and Z=2𝑍2Z=2italic_Z = 2 states. The configuration of masubscript𝑚𝑎m_{a}italic_m start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT and mwsubscript𝑚𝑤m_{w}italic_m start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT follows previous work (Gokcen et al., 2022) and adopts the same strategy mentioned in Section 5.2.

Figure 5(A) shows the across-region latent variables for one dataset (orientation 0°, session 106r001p26, ten trials are displayed, all variables are scaled by the variance explained in each region), and the within-region variables are in Appendix B. These latent variables indicate time-varying forward and feedback communications between V1 and V2. Different states exhibit distinct phase delays and frequencies. The first dimension of across-region variables (denoted as x1asuperscriptsubscript𝑥1𝑎x_{1}^{a}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT) displays a periodic pattern caused by the external drifting grating stimulus, whereas the second dimension (denoted as x2asuperscriptsubscript𝑥2𝑎x_{2}^{a}italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT) exhibits a non-periodic signal with a single peak shortly after the stimulus onset.

Figure 5(B-C) presents the estimated phase delays and frequencies over multiple independent runs for 32 datasets. Each data point represents a dimension of across-region variables. Figure 5(B) corresponds to state z1subscript𝑧1z_{1}italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, indicating that most state z1subscript𝑧1z_{1}italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT dimensions exhibit across-region interactions within the 2Hz-8Hz range. Additionally, some dimensions display feedback communication with a large phase delay (>>>10ms) from V2 to V1, corresponding to state z1subscript𝑧1z_{1}italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT of x2asuperscriptsubscript𝑥2𝑎x_{2}^{a}italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT in Figure 5(A). However, there is variability across datasets for state z1subscript𝑧1z_{1}italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT with smaller phase delays (<<<10ms). Some show forward communication from V1 to V2, akin to state z1subscript𝑧1z_{1}italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT of x1asuperscriptsubscript𝑥1𝑎x_{1}^{a}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT in Figure 5(A), while others indicate feedback communications from V2 to V1.

One explanation for this variability is that in certain datasets, x1asuperscriptsubscript𝑥1𝑎x_{1}^{a}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT has a much weaker amplitude compared to x2asuperscriptsubscript𝑥2𝑎x_{2}^{a}italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT, making the weaker latent affected by the stronger one along with its delay. This leads to a feedback signal from V2 to V1 at state z1subscript𝑧1z_{1}italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. On the other hand, in some datasets (e.g., orientation 0°, session 106r001p26), x1asuperscriptsubscript𝑥1𝑎x_{1}^{a}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT is not as weak, resulting in a forward signal from V1 to V2 at state z1subscript𝑧1z_{1}italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT.

Figure 5(C) depicts the estimated phase delays and frequencies associated with state z2subscript𝑧2z_{2}italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. The findings indicate a clear separation of oscillatory communications into two frequencies. One involves 6Hz communications with small phase delays (referring to state z2subscript𝑧2z_{2}italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT of x1asuperscriptsubscript𝑥1𝑎x_{1}^{a}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT in Figure 5(A)), while the other involves 1Hz communications with large phase delays (akin to state z2subscript𝑧2z_{2}italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT of x2asuperscriptsubscript𝑥2𝑎x_{2}^{a}italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT in Figure 5(A)).

The time-varying phase delays can be explained as follows: (1) For x1asuperscriptsubscript𝑥1𝑎x_{1}^{a}italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT, V1 triggers V2 to have oscillatory dynamics during state z1subscript𝑧1z_{1}italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, while in state z2subscript𝑧2z_{2}italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, V2 is already engaged, causing both regions to oscillate synchronously, resulting in a smaller phase delay than in state z1subscript𝑧1z_{1}italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. (2) For x2asuperscriptsubscript𝑥2𝑎x_{2}^{a}italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT, V2 consistently sends signals with a low frequency to V1, resulting in a larger phase delay due to the longer period as indicated by z2subscript𝑧2z_{2}italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. During state z1subscript𝑧1z_{1}italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, the stimulus onset triggers an intense signal from V2 to V1, leading to a smaller phase delay, which can be considered as an emergence of surprise or prediction error from V2 to V1 (Rao & Ballard, 1999).

Similar to Section 5.2, we also perform a control experiment by learning the phase delays between V1a and V1b. In Figure 5(D), the outcomes reveal zero-delay communications that are distributed across two frequencies (6Hz, 1Hz), suggesting that learned delays are a consequence of the data rather than the model.

Finally, we compare the test LL of MRM-GP, CSM-GPFA, and DLAG in Figure 5(E). The results indicate that MRM-GP with Z=2𝑍2Z=2italic_Z = 2 states achieves the highest LL, while MRM-GP with Z=1𝑍1Z=1italic_Z = 1 state exhibits a similar LL compared to CSM-GPFA, and both outperform DLAG. This suggests that (1) switching states exist in these datasets; (2) the multi-region kernel is more appropriate than the Squared Exponential kernel for modeling signals with sinusoidal structures.

Refer to caption
Figure 5: Applying MRM-GP to neural spike trains. A: Visualize across-region latent variables with state z1subscript𝑧1z_{1}italic_z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT (in red) and state z2subscript𝑧2z_{2}italic_z start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT (in blue). The results suggest forward and feedback communications between V1 and V2, along with time-varying delays. B-C: Visualize estimated parameters through phase delays on the x-axis and frequencies on the y-axis. D: Depict phase delays and frequencies learned in the V1a-V1b control experiment, with their representation on the x-axis and y-axis, respectively. E: Compare the test LL with discrete states Z=1,2𝑍12Z=1,2italic_Z = 1 , 2 and other multi-region methods: DLAG and CSM-GPFA.

6 Discussion

MRM-GP establishes the connection between a linear dynamics system (LDS) and a multi-output Gaussian Process (GP) explicitly modeling frequency-based communications and their directionality via phase delays within the latent space of neural data.

Connecting a complex-valued GP with an LDS is non-trivial. Although a complex-valued GP can be written as a multi-output real-valued GP (by twice the dimension), the resulted multi-output GP cannot be converted to an LDS by spectral factorization because the separability of the resulted multi-output kernel is not guaranteed.

Once the link is established, we can harness several advantages: (1) using the powerful representational capability of kernels, such as applying a multi-region kernel to model latent variables with periodic patterns; (2) achieving a linear computational cost; (3) incorporating time-varying frequencies and delays by introducing different discrete states.

We test MRM-GP using two distinct types of neural data. The findings showcase its capability to discover state-dependent latent communications across brain regions with a linear time inference cost.

Finally, the limitations of MRM-GP are twofold: (1) its reliance on separability for multi-output kernels, which restricts kernel selection options, and (2) its current model assumptions are unable to capture phenomena such as phase resetting and phase variability across different trials.

Acknowledgement

This work is supported by National Institutes of Health BRAIN initiative (1U01NS131810).

Impact Statement

The MRM-GP introduces an innovative and efficient method for investigating intricate interactions among brain regions. Its capacity to deliver an interpretable representation of multi-region neural data is poised to advance neuroscience, offering the potential for a more profound comprehension of brain function and disorders. This enhanced understanding of brain interactions has the prospect to drive advancements in neurotechnology, with potential benefits extending to fields such as brain-computer interfaces and personalized medicine.

References

  • Fox et al. (2008) Fox, E., Sudderth, E., Jordan, M., and Willsky, A. Nonparametric bayesian learning of switching linear dynamical systems. Advances in neural information processing systems, 21, 2008.
  • Glaser et al. (2020) Glaser, J., Whiteway, M., Cunningham, J. P., Paninski, L., and Linderman, S. Recurrent switching dynamical systems models for multiple interacting neural populations. Advances in neural information processing systems, 33:14867–14878, 2020.
  • Gokcen et al. (2022) Gokcen, E., Jasper, A. I., Semedo, J. D., Zandvakili, A., Kohn, A., Machens, C. K., and Yu, B. M. Disentangling the flow of signals between populations of neurons. Nature Computational Science, 2(8):512–525, 2022.
  • Gokcen et al. (2023) Gokcen, E., Jasper, A. I., Xu, A., Kohn, A., Machens, C. K., and Byron, M. Y. Uncovering motifs of concurrent signaling across multiple neuronal populations. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • Grewal & Andrews (2014) Grewal, M. S. and Andrews, A. P. Kalman filtering: Theory and Practice with MATLAB. John Wiley & Sons, 2014.
  • Harris & Mrsic-Flogel (2013) Harris, K. D. and Mrsic-Flogel, T. D. Cortical connectivity and sensory coding. Nature, 503(7474):51–58, 2013.
  • Hartikainen & Särkkä (2010) Hartikainen, J. and Särkkä, S. Kalman filtering and smoothing solutions to temporal gaussian process regression models. In 2010 IEEE international workshop on machine learning for signal processing, pp.  379–384. IEEE, 2010.
  • Hom & Johnson (1985) Hom, R. A. and Johnson, C. R. Matrix analysis. Cambridge University Express, 455, 1985.
  • Hultman et al. (2018) Hultman, R., Ulrich, K., Sachs, B. D., Blount, C., Carlson, D. E., Ndubuizu, N., Bagot, R. C., Parise, E. M., Vu, M.-A. T., Gallagher, N. M., et al. Brain-wide electrical spatiotemporal dynamics encode depression vulnerability. Cell, 173(1):166–180, 2018.
  • Kailath et al. (2000) Kailath, T., Sayed, A. H., and Hassibi, B. Linear estimation. Number BOOK. Prentice Hall, 2000.
  • Kohn et al. (2020) Kohn, A., Jasper, A. I., Semedo, J. D., Gokcen, E., Machens, C. K., and Byron, M. Y. Principles of corticocortical communication: proposed schemes and design considerations. Trends in Neurosciences, 43(9):725–737, 2020.
  • Miller et al. (2018) Miller, E. K., Lundqvist, M., and Bastos, A. M. Working memory 2.0. Neuron, 100(2):463–475, 2018.
  • Pei et al. (2021) Pei, F., Ye, J., Zoltowski, D., Wu, A., Chowdhury, R. H., Sohn, H., O’Doherty, J. E., Shenoy, K. V., Kaufman, M. T., Churchland, M., et al. Neural latents benchmark’21: evaluating latent variable models of neural population activity. arXiv preprint arXiv:2109.04463, 2021.
  • Rao & Ballard (1999) Rao, R. P. and Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature neuroscience, 2(1):79–87, 1999.
  • Särkkä et al. (2013) Särkkä, S., Solin, A., and Hartikainen, J. Spatiotemporal learning via infinite-dimensional bayesian filtering and smoothing: A look at gaussian process regression through kalman filtering. IEEE Signal Processing Magazine, 30(4):51–61, 2013.
  • Sayed & Kailath (2001) Sayed, A. H. and Kailath, T. A survey of spectral factorization methods. Numerical linear algebra with applications, 8(6-7):467–496, 2001.
  • Semedo et al. (2019) Semedo, J. D., Zandvakili, A., Machens, C. K., Byron, M. Y., and Kohn, A. Cortical areas interact through a communication subspace. Neuron, 102(1):249–259, 2019.
  • Siegle et al. (2021) Siegle, J. H., Jia, X., Durand, S., Gale, S., Bennett, C., Graddis, N., Heller, G., Ramirez, T. K., Choi, H., Luviano, J. A., et al. Survey of spiking in the mouse visual system reveals functional hierarchy. Nature, 592(7852):86–92, 2021.
  • Solin & Särkkä (2014) Solin, A. and Särkkä, S. Explicit link between periodic covariance functions and state space models. In Artificial Intelligence and Statistics, pp.  904–912. PMLR, 2014.
  • Solin et al. (2016) Solin, A. et al. Stochastic differential equation methods for spatio-temporal gaussian process regression. 2016.
  • Ulrich et al. (2015) Ulrich, K. R., Carlson, D. E., Dzirasa, K., and Carin, L. Gp kernels for cross-spectrum analysis. Advances in neural information processing systems, 28, 2015.
  • Wang et al. (2024) Wang, Y., Wu, Z., Li, C., and Wu, A. Extraction and recovery of spatio-temporal structure in latent dynamics alignment with diffusion model. Advances in Neural Information Processing Systems, 36, 2024.
  • Zandvakili & Kohn (2019) Zandvakili, A. and Kohn, A. Simultaneous v1–v2 neuronal population recordings in anesthetized macaque monkeys. CRCNS https://doi. org/10.6080/K0B27SHN, 2019.
  • Zhu et al. (2023) Zhu, H., Balsells-Rodas, C., and Li, Y. Markovian gaussian process variational autoencoders. In International Conference on Machine Learning, pp.  42938–42961. PMLR, 2023.
  • Zoltowski et al. (2020) Zoltowski, D., Pillow, J., and Linderman, S. A general recurrent state space framework for modeling neural dynamics during decision-making. In International Conference on Machine Learning, pp.  11680–11691. PMLR, 2020.

Appendix A Spectral Factorization.

Derivation for Eq. 6.

Start with Eq. 3, taking Fourier transforms on both sides gives:

(iω)𝐉(iω)mp,w=𝐅mw𝐉(iω)mp,w+𝐋𝐔(iω).𝑖𝜔𝐉subscriptsuperscript𝑖𝜔𝑝𝑤𝑚superscriptsubscript𝐅𝑚𝑤𝐉subscriptsuperscript𝑖𝜔𝑝𝑤𝑚𝐋𝐔𝑖𝜔\displaystyle\begin{split}(i\omega)\mathbf{J}(i\omega)^{p,w}_{m}&=\mathbf{F}_{% m}^{w}\mathbf{J}(i\omega)^{p,w}_{m}+\mathbf{L}\mathbf{U}(i\omega).\end{split}start_ROW start_CELL ( italic_i italic_ω ) bold_J ( italic_i italic_ω ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_CELL start_CELL = bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT bold_J ( italic_i italic_ω ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT + bold_LU ( italic_i italic_ω ) . end_CELL end_ROW (13)

Solving for 𝐉(iω)mp,w𝐉subscriptsuperscript𝑖𝜔𝑝𝑤𝑚\mathbf{J}(i\omega)^{p,w}_{m}bold_J ( italic_i italic_ω ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT gives:

𝐉(iω)mp,w=((iω𝐅mw)𝐈)1𝐋𝐔(iω).𝐉subscriptsuperscript𝑖𝜔𝑝𝑤𝑚superscript𝑖𝜔superscriptsubscript𝐅𝑚𝑤𝐈1𝐋𝐔𝑖𝜔\displaystyle\begin{split}\mathbf{J}(i\omega)^{p,w}_{m}=((i\omega-\mathbf{F}_{% m}^{w})\mathbf{I})^{-1}\mathbf{L}\mathbf{U}(i\omega).\end{split}start_ROW start_CELL bold_J ( italic_i italic_ω ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = ( ( italic_i italic_ω - bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT ) bold_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_LU ( italic_i italic_ω ) . end_CELL end_ROW (14)

Recall that f(t)mp,w𝑓subscriptsuperscript𝑡𝑝𝑤𝑚f(t)^{p,w}_{m}italic_f ( italic_t ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT in Eq. 3 is a Complex Gaussian Process with single-output kernel 𝐊(τ)m=r=1Rarm2exp(12σm2τ2+iηm(τ))𝐊superscript𝜏𝑚superscriptsubscript𝑟1𝑅superscriptsuperscriptsubscript𝑎𝑟𝑚212superscriptsuperscript𝜎𝑚2superscript𝜏2𝑖superscript𝜂𝑚𝜏\mathbf{K}(\tau)^{m}=\sum_{r=1}^{R}{a_{r}^{m}}^{2}\exp(-\frac{1}{2{\sigma^{m}}% ^{2}}\tau^{2}+i\eta^{m}(\tau))bold_K ( italic_τ ) start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_r = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_R end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_σ start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG italic_τ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_i italic_η start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT ( italic_τ ) ) and its derivatives up to (k1)thsuperscript𝑘1𝑡(k-1)^{th}( italic_k - 1 ) start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT. So, the spectral density matrix of this process f(t)mp,w𝑓subscriptsuperscript𝑡𝑝𝑤𝑚f(t)^{p,w}_{m}italic_f ( italic_t ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT is:

𝐒J(ω)=𝔼[𝐉(iω)mp,wJ(iω)mp,w].subscript𝐒𝐽𝜔𝔼delimited-[]𝐉subscriptsuperscript𝑖𝜔𝑝𝑤𝑚𝐽superscriptsubscriptsuperscript𝑖𝜔𝑝𝑤𝑚top\displaystyle\begin{split}\mathbf{S}_{J}(\omega)=\mathbb{E}[\mathbf{J}(i\omega% )^{p,w}_{m}{J(-i\omega)^{p,w}_{m}}^{\top}].\end{split}start_ROW start_CELL bold_S start_POSTSUBSCRIPT italic_J end_POSTSUBSCRIPT ( italic_ω ) = blackboard_E [ bold_J ( italic_i italic_ω ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT italic_J ( - italic_i italic_ω ) start_POSTSUPERSCRIPT italic_p , italic_w end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ] . end_CELL end_ROW (15)

Bring Eq. 14 into Eq. 15 gives:

𝐒J(ω)=(𝐅mwiω𝐈)1𝐋𝔼[𝐔(iω)U(iω)]𝐋(𝐅mw+iω𝐈)T,=(𝐅mwiω𝐈)1𝐋v𝐋(𝐅mw+iω𝐈)T.\displaystyle\begin{split}\mathbf{S}_{J}(\omega)=&(\mathbf{F}_{m}^{w}-i\omega% \mathbf{I})^{-1}\mathbf{L}\mathbb{E}[\mathbf{U}(i\omega)U(-i\omega)^{\top}]% \mathbf{L}^{\top}(\mathbf{F}_{m}^{w}+i\omega\mathbf{I})^{-T},\\ &=(\mathbf{F}_{m}^{w}-i\omega\mathbf{I})^{-1}\mathbf{L}v\mathbf{L}^{\top}(% \mathbf{F}_{m}^{w}+i\omega\mathbf{I})^{-T}.\end{split}start_ROW start_CELL bold_S start_POSTSUBSCRIPT italic_J end_POSTSUBSCRIPT ( italic_ω ) = end_CELL start_CELL ( bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT - italic_i italic_ω bold_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_L blackboard_E [ bold_U ( italic_i italic_ω ) italic_U ( - italic_i italic_ω ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ] bold_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT + italic_i italic_ω bold_I ) start_POSTSUPERSCRIPT - italic_T end_POSTSUPERSCRIPT , end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = ( bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT - italic_i italic_ω bold_I ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_L italic_v bold_L start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_F start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT + italic_i italic_ω bold_I ) start_POSTSUPERSCRIPT - italic_T end_POSTSUPERSCRIPT . end_CELL end_ROW (16)

Finally S(ω)𝑆𝜔S(\omega)italic_S ( italic_ω ) in Eq. 6 is:

S(ω)=𝐆𝐒J(ω)𝐆.𝑆𝜔subscript𝐆𝐒𝐽𝜔superscript𝐆top\displaystyle\begin{split}S(\omega)=\mathbf{G}\mathbf{S}_{J}(\omega)\mathbf{G}% ^{\top}.\end{split}start_ROW start_CELL italic_S ( italic_ω ) = bold_GS start_POSTSUBSCRIPT italic_J end_POSTSUBSCRIPT ( italic_ω ) bold_G start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT . end_CELL end_ROW (17)

Finding roots for T(iω)𝑇𝑖𝜔T(i\omega)italic_T ( italic_i italic_ω ) in Eq. 8.

Using the T(iω)𝑇𝑖𝜔T(i\omega)italic_T ( italic_i italic_ω )’s coefficients b0,b1,,b2ksubscript𝑏0subscript𝑏1subscript𝑏2𝑘b_{0},b_{1},\dots,b_{2k}italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_b start_POSTSUBSCRIPT 2 italic_k end_POSTSUBSCRIPT, we can create a companion matrix:

𝐁=[000b2kb0100b2k1b0001b1b0],𝐁matrix000subscript𝑏2𝑘subscript𝑏0100subscript𝑏2𝑘1subscript𝑏0001subscript𝑏1subscript𝑏0\begin{split}\mathbf{B}&=\begin{bmatrix}0&0&\dots&0&-\frac{b_{2k}}{b_{0}}\\ 1&0&\dots&0&-\frac{b_{2k-1}}{b_{0}}\\ \vdots&\vdots&\dots&\vdots&\vdots\\ 0&0&\dots&1&-\frac{b_{1}}{b_{0}}\\ \end{bmatrix},\end{split}start_ROW start_CELL bold_B end_CELL start_CELL = [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL … end_CELL start_CELL 0 end_CELL start_CELL - divide start_ARG italic_b start_POSTSUBSCRIPT 2 italic_k end_POSTSUBSCRIPT end_ARG start_ARG italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_ARG end_CELL end_ROW start_ROW start_CELL 1 end_CELL start_CELL 0 end_CELL start_CELL … end_CELL start_CELL 0 end_CELL start_CELL - divide start_ARG italic_b start_POSTSUBSCRIPT 2 italic_k - 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_ARG end_CELL end_ROW start_ROW start_CELL ⋮ end_CELL start_CELL ⋮ end_CELL start_CELL … end_CELL start_CELL ⋮ end_CELL start_CELL ⋮ end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL start_CELL … end_CELL start_CELL 1 end_CELL start_CELL - divide start_ARG italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_ARG end_CELL end_ROW end_ARG ] , end_CELL end_ROW (18)

where the eigenvalues of this matrix are the roots for T(iω)𝑇𝑖𝜔T(i\omega)italic_T ( italic_i italic_ω ) (Hom & Johnson, 1985). Notably, the companion matrix is structured as a Hessenberg matrix, suggesting that its eigenvalues can be obtained through the QR algorithm with Givens rotation. This process has a time complexity of 𝒪(k)𝒪𝑘\mathcal{O}(k)caligraphic_O ( italic_k ) for each iteration.

Appendix B DLAG latent variables for V1-V2 spike train data

Refer to caption
Figure 6: Across- and within-region latent variables for DLAG on V1-V2 spike train data. DLAG’s two across-region latent variables both have some oscillations. However, there are fewer oscillations in the first dimension of our model’s across-region variables (Figure 5), indicating a better disentanglement of frequencies.

Appendix C Within-region latent variables for V1-V2 spike train data

Refer to caption
Figure 7: Within-region latent variables for MRM-GP and DLAG on local field potential recordings from V1 and VISam. MRMGP and DLAG exhibit similar within-region latent variables associated with high-frequency noise, indicating neural activities specific to a certain region.

Appendix D Compare latent variables with different k

Refer to caption
Figure 8: Compare the smoothness of latent variables with k=2,3,4𝑘234k=2,3,4italic_k = 2 , 3 , 4. We generate latent variables with the same frequency, length scale and phase delays, while changing the value of k𝑘kitalic_k. The results show that a larger k𝑘kitalic_k makes the latent variable smoother.

Appendix E Spectrum of the LFP data

Refer to caption
Figure 9: The spectrum for V1 and VISam LFP data with one trial and one channel shown. The multiple peaks in both V1 and VISam spectra indicate a mix of frequencies. In Figure 4, the spectrum of DLAG’s across-region latent variable has three peaks, while our model’s across-region latent variable has only one dominant peak, indicating a better frequency separation brought by the multi-region kernel in Eq.1.

Appendix F Synthetic data with a wide range of frequencies and delays

Refer to caption
Figure 10: Synthetic data experiments with a wide range of frequencies and delays. We tested our model on a wide range of parameters, where the frequencies vary from 0.1 rad/s to 1.5 rad/s, and the delays vary from -2 to 2. Under all these cases, our model could perfectly recover the latent variables.

Appendix G Synthetic data with different parameter initialization

Refer to caption
Figure 11: Synthetic data experiments with different parameter initialization. We tested our model with three initialization settings: from smoothness to unsmoothness. Under all these cases, our model could perfectly recover the latent variables, indicating that our model is not very sensitive to parameter initialization.

Appendix H Synthetic data without pure oscillation

Refer to caption
Figure 12: Synthetic data experiment without pure oscillation. We generated latent variables from a Gaussian Process with SE kernel. We added a manual shift to create delays, where one latent communication has a positive delay from Region A to Region B while the other has an opposite delay. The results show that our model can recover the true latent variables and correct delays between different regions. In summary, this demonstrates our model’s ability to handle non-periodic data.

Appendix I Cross-validating for the dimensionality of MRM-GP

Refer to caption
Figure 13: Determine the number of dimensions: (A-B) Factor Analysis is applied to V1 and VISam, revealing a gradual increase in test log-likelihood (LL) when the number of dimensions surpasses four (averaged over 5-fold cross-validation) . Considering both the inference time for all three models, we opt for four as the total dimensions for each region. (C) Conducting a grid search for across- and within-region variables using relative test LL as the metric—the difference between the model’s test LL and Factor Analysis’s test LL. The outcomes indicate that one across-region and three within-region variables yield the highest LL.

Appendix J Applying MRM-GP to synthetic data with Z=3𝑍3Z=3italic_Z = 3

Refer to caption
Figure 14: Applying MRM-GP to synthetic data with Z=3𝑍3Z=3italic_Z = 3. The result closely resembles Figure 3 when Z=2𝑍2Z=2italic_Z = 2, suggesting that the model can effectively learn the correct number of discrete states even specifying a large Z𝑍Zitalic_Z.