Goal-Oriented Sensor Reporting Scheduling for Non-linear Dynamic System Monitoring

Prasoon Raghuwanshi (ORCID 0000-0002-9629-9742), Onel Luis Alcaraz López (ORCID 0000-0003-1838-5183), Vimal Bhatia (ORCID 0000-0001-5148-6643), Matti Latva-aho (ORCID 0000-0002-6261-0969)

Prasoon Raghuwanshi, Onel Luis Alcaraz López, and Matti Latva-aho are with the Centre for Wireless Communications, University of Oulu, 90570, Oulu, Finland (e-mail: [email protected]; [email protected]; [email protected]). Vimal Bhatia is with the Department of Electrical Engineering, Indian Institute of Technology Indore, 453552, Indore, India (e-mail: [email protected]).

This research has been supported by the Research Council of Finland (former Academy of Finland) 6G Flagship Programme (Grant 346208), the Finnish Foundation for Technology Promotion, the INDIFICORE project, and the European Commission through the Horizon Europe/JU SNS project Hexa-X-II (Grant 101095759).

Abstract

Goal-oriented communication (GoC) is a form of semantic communication where the effectiveness of information transmission is measured by its impact on achieving the desired goal. In the context of the Internet of Things (IoT), GoC can enable IoT sensors to selectively transmit data pertinent to the intended goals of the receiver. Therefore, GoC holds significant value for IoT networks as it facilitates timely decision-making at the receiver, reduces network congestion, and enhances spectral efficiency. In this paper, we consider a scenario where an edge node polls sensors monitoring the state of a non-linear dynamic system (NLDS) to respond to the queries of several clients. We address this GoC problem, which we term goal-oriented scheduling (GoS). Our proposed GoS utilizes deep reinforcement learning (DRL) with a carefully designed action space, state space, and reward function. The devised action space and reward function play a pivotal role in reducing the number of sensor transmissions. Meanwhile, the devised state space enables our DRL scheduler to poll the sensor whose observation is expected to minimize the mean square error (MSE) of the query responses. Our numerical analysis demonstrates that, depending on the type of query, the proposed GoS either achieves a lower query response MSE than the benchmark scheduling methods or attains a comparable MSE. Furthermore, the proposed GoS proves to be energy-efficient for the sensors and of lower complexity compared to the benchmark scheduling methods.

Index Terms:

Deep Reinforcement Learning, Goal-oriented Scheduling, Internet of Things, Non-linear Dynamic System.

I Introduction

There are billions of Internet of Things (IoT) devices worldwide and the number will keep growing in the coming years [1]. Notably, a significant share of the IoT landscape comprises low-cost/power sensors monitoring dynamic systems, which are usually high-dimensional. As a result, massive amounts of data are increasingly exchanged in IoT communications, often under stringent quality of service, e.g., latency and reliability, requirements [2, 3].

Given the resource limitations inherent to IoT sensors and networks, there has been a growing interest in remotely estimating the system states at a fusion center/edge node [4, 5, 6, 7]. Notably, an edge node may remotely estimate the entire system state by gathering observations from a subset of IoT sensors, rather than the entire sensor network, which ultimately results in energy-efficient state observation. The application of remote state estimation (RSE)-assisted sensor reporting scheduling is diverse, spanning fields such as voltage regulation in power systems [8], strategic actuator placement in control systems [9], and sensing/reporting scheduling in wireless networks [4, 5, 6].

The value-of-information ( $\operatorname{\textsc{VoI}}$ ) [10] has been suggested in [6, 7] as a suitable metric for quantifying the impact of sensor transmission on the RSE error. Here, RSE error is defined with respect to the desired goal. A goal might be to accurately (i) identify the system state, or (ii) respond to queries from clients regarding the system state. Table I provides examples of potential client queries.

Recently, the authors in [4, 5, 6] utilized RSE-assisted sensor reporting scheduling at an edge node. The objective in [4] is to identify the state of a linear dynamic system, whereas in [5, 6], the focus is on effectively addressing client queries regarding the state of a linear dynamic system. Thus, the $\operatorname{\textsc{VoI}}$ adopted in [4] corresponds to the mean square error $(\operatorname{MSE})$ of the state estimation. Meanwhile, $\operatorname{\textsc{VoI}}$ is defined in [5, 6] as the difference between $\operatorname{MSE}$ of query response relative to the prior and posterior estimates of the state estimator [7]. Here, prior and posterior estimates denote estimates obtained before and after the sensor transmission, respectively. Furthermore, [4, 5, 6] exploit a key advantage offered by RSE, namely, the ability to observe system states by selectively polling a subset of sensors. In [4], the sensor scheduling strategy is devised to minimize the state estimation $\operatorname{MSE}$ , whereas in [5, 6], it aims to minimize the query response $\operatorname{MSE}$ .

TABLE I: Examples of Client Queries

Query	Definition, $z_{c}(\mathbf{x}(t))$
Current state	$\mathbf{x}(t)$
Maximum component	$\max(\mathbf{x}(t))$
Count range	$\sum_{m=1}^{M}\mathbbm{1}(x_{m}(t)\in[\fgee,\fges])$
Sample mean	$z_{mean}=\frac{1}{M}\sum_{m=1}^{M}x_{m}(t)$
Sample variance	$\frac{1}{M-1}\sum_{m=1}^{M}(x_{m}(t)-z_{mean})^{2}$
^∗ Herein, ${\mathbf{x}(t)=[x_{1}(t),\cdots,x_{M}(t)]^{T}\in\mathbb{R}^{M\times 1}}$ .

Notably, a closed-form expression for the query response $\operatorname{MSE}$ can be obtained for certain queries, such as sample mean, sample variance, and current state. Thus, sensor reporting scheduling strategies for such queries can be determined analytically, as shown in [5]. Conversely, for queries such as the maximum system state component and count range, a closed-form expression for the query response $\operatorname{MSE}$ cannot be derived. Therefore, addressing such queries requires approaches such as deep reinforcement learning (DRL) to tackle the sensor reporting scheduling problem, as outlined in [6].

Figure 1: GoC illustration. Clients ask queries to the edge node about the NLDS state observed by the sensors. The edge node, based on the decision taken by its scheduler, may poll a sensor. Besides, the edge node responds to queries based on the state estimate computed by the CQKF.

Note that the proposals in [4, 5, 6] have one common flaw: they assume that the linear dynamic system model is perfectly known at the edge node, a prerequisite for Kalman filter-based RSE. Unfortunately, obtaining such information is often challenging or even impossible, especially in the case of a non-linear dynamic system (NLDS). Moreover, the Kalman filter cannot even deal with NLDS. Besides, in [6], a sensor must be polled at every time step, even when there are no client queries, resulting in unnecessary depletion of sensor energy. Apart from that, the complete state of the Kalman filter is provided as input in [6] to its DRL-based sensor scheduler. This input significantly inflates the size of the deep neural network (DNN) utilized by the DRL-based sensor scheduler, as it must also extract relevant information from the input. On top of that, time instances where no queries are posed are treated uniformly, providing the same reward to the DRL-based sensor scheduling algorithm on all those time instances. Consequently, the proposal in [6] struggles to determine the optimal action in the absence of queries.

Motivated by the aforementioned limitations of RSE-assisted sensor reporting scheduling and the challenges posed by NLDS, we propose a novel approach termed goal-oriented scheduling (GoS) for IoT sensors tasked with monitoring NLDS. In our goal-oriented communication (GoC) system model, illustrated in Fig. 1, clients pose queries about the NLDS state to the edge node, which then orchestrates sensor reporting scheduling to gather partial yet informative sensor observations. These observations are utilized by the edge node to perform RSE and address client queries. The sole purpose of sensor reporting scheduling is to minimize the $\operatorname{MSE}$ of future query responses, hence the phrase goal-oriented scheduling. Within our system model, the edge node employs a DRL-based scheduler, which decides whether to poll a sensor at each time step. We have devised a reward function such that our DRL-based sensor scheduler makes judicious decisions even when no queries are posed. Furthermore, the edge node utilizes the observation from the polled sensor and the cubature quadrature Kalman filter (CQKF) [11] to estimate the entire NLDS state and respond to the client queries. However, since CQKF requires a mathematical model for the NLDS, we employ Holt’s method [12, 11] to iteratively estimate it. Additionally, we provide a specific attribute of the CQKF state as input to our DRL-based sensor scheduler. This input not only aids in minimizing the query response $\operatorname{MSE}$ but also significantly shrinks the size of the DNN utilized by our DRL-based sensor scheduler. Lastly, we compare the performance of our proposed scheduler against two benchmark schedulers: the scheduler adopted in [6] and the Monte Carlo scheduler. Our complexity analysis indicates that the proposed scheduler exhibits the least complexity among the considered schedulers. Moreover, the numerical results reveal that, depending on the query type, our proposed scheduler either achieves a lower query response $\operatorname{MSE}$ than the benchmark schedulers or attains a comparable $\operatorname{MSE}$ . In either case, this is accomplished while reducing the number of sensor transmissions, thereby saving sensor energy.

Algorithm 1 $\operatorname{\textsc{CQpoints}}$

Input: $M,n^{\prime}$

1. Find the intersection points $\boldsymbol{\psi}_{j},\forall j\in\{1,\cdots,2M\}$ of the unit $M$ -hyper-sphere and its axes $\triangleright\ \boldsymbol{\psi}_{j}:$ cubature point
2. Compute the roots ${\lambda_{j^{\prime}},\forall j^{\prime}\in\{1,\cdots,n^{\prime}\}}$ of the CL polynomial $\triangleright\ \lambda_{j^{\prime}}:$ quadrature point
3. $\boldsymbol{\xi}_{j^{\prime}+(j-1)n^{\prime}}=\sqrt{2\lambda_{j^{\prime}}}\boldsymbol{\psi}_{j},$ $\triangleright$ CQ point
   $w_{j^{\prime}+(j-1)n^{\prime}}=\frac{n^{\prime}!}{2M}\frac{\Gamma(\iota+n^{\prime}+1)}{\Gamma(M/2)\lambda_{j^{\prime}}}\frac{1}{L^{\prime}(\lambda_{j^{\prime}})^{2}},$ $^{\ddagger\ddagger}$ $\forall j\in\{1,\cdots,2M\},\forall j^{\prime}\in\{1,\cdots,n^{\prime}\}$

Output: $\mathbf{w}=[w_{1},\cdots,w_{2Mn^{\prime}}]^{T},\boldsymbol{\Xi}=[\boldsymbol{\xi}_{1},\cdots,\boldsymbol{\xi}_{2Mn^{\prime}}]^{T}$

$^{\ddagger\ddagger}$ $L^{\prime}(\lambda_{j^{\prime}})$ is the first derivative of $L(\cdot)$ at $\lambda=\lambda_{j^{\prime}}$ .

The paper is structured as follows. Section II delineates the system model. Section III describes the components of the GoS framework and presents the scheduling problem. Section IV introduces benchmark schedulers and Section V discusses the computational complexities of all the considered schedulers. Section VI presents the numerical results. Lastly, Section VII concludes the paper and outlines potential avenues for future research.

Notation: ${\operatorname{argmax}(\cdot)}$ and ${\max(\cdot)}$ denote the argument of the maximum function and the maximum function itself, respectively. Similarly, ${\operatorname{argmin}(\cdot)}$ and ${\min(\cdot)}$ denote the argument of the minimum function and the minimum function itself, respectively. The cardinality of a set is represented by ${|\cdot|}$ , while the transpose operation is denoted by ${[\cdot]^{T}}$ . Column vectors/matrices are indicated by boldface lowercase/uppercase letters. The determinant, trace of a square matrix, and the expected value are denoted by ${\det(\cdot)}$ , ${\operatorname{Tr}(\cdot)}$ and ${\mathbb{E}[\cdot]}$ , respectively. ${\mathbf{I}_{M}}$ and ${\mathbf{0}_{M}}$ signify the ${M\times M}$ identity matrix and null vector of dimension ${M\times 1}$ , respectively. Additionally, ${\mathbf{1}_{p}}$ denotes a vector of dimension ${M\times 1}$ with all elements set to zero except the $p^{th}$ element, which is $1$ . The sets ${\mathbb{R}^{M\times 1}}$ and ${\mathbb{N}^{C\times 1}}$ represent real vectors of dimension ${M\times 1}$ and non-negative integer vectors of dimension ${C\times 1}$ , respectively. A Gaussian sample vector with mean ${\mathbf{\bar{y}}}$ and covariance matrix ${\mathbf{Z}}$ is denoted as ${\mathbf{y}\sim\mathcal{N}(\mathbf{\bar{y}},\mathbf{Z})}$ . Meanwhile, a Gaussian sample observation with mean ${\mathbf{1}_{n}^{T}\mathbf{\bar{y}}}$ and covariance ${\mathbf{1}_{n}^{T}\mathbf{Z}\mathbf{1}_{n}}$ is denoted as $y\sim\mathcal{N}(\mathbf{1}_{n}^{T}\mathbf{\bar{y}},\mathbf{1}_{n}^{T}\mathbf{Z}\mathbf{1}_{n})$ . The indicator function, Cholesky decomposition, sample variance, and uniform distribution between $0$ and $1$ are denoted by ${\mathbbm{1}(\cdot)}$ , ${\operatorname{\textsc{Chol}}(\cdot)}$ , ${\operatorname{\textsc{Var}}(\cdot)}$ , and ${\mathcal{U}(0,1)}$ , respectively.

II System Model

Consider the GoC system illustrated in Fig. 1. In this system, an edge node receives data from $N$ sensors indexed by ${n\in\{1,2,\cdots,N\}}$ and is tasked with responding to queries from a set $\mathscr{C}$ of $C$ remote clients. A query from client ${c\in\mathscr{C}}$ is a request for the value of the function ${z_{c}(\mathbf{x}(t))}$ , while the edge node responds to it with an estimate ${\hat{z}_{c}}$ . Each client asks a different type of query about the system state. The system operates in discrete time slots, labeled as ${t}$ . In each slot, the edge node decides whether to poll a single sensor or refrain from doing so. The sensors observe NLDS, with its state represented as

$$\mathbf{x}(t)=\mathbf{f}(\mathbf{x}(t-1))+\mathbf{v}_{1}(t)\in\mathbb{R}^{M\times 1},\tag{1}$$

where $M$ is the dimensionality of the NLDS state, ${\mathbf{f}(\cdot)}$ represents a nonlinear state dynamics (NLSD) function, and ${\mathbf{v}_{1}(t)\sim\mathcal{N}(\mathbf{0}_{M},\mathbf{\Sigma}_{v_{1}})}$ denotes the Gaussian noise with zero mean and covariance ${\mathbf{\Sigma}_{v_{1}}\in\mathbb{R}^{M\times M}}$ .

The sensors observe the system state as captured by

$$\mathbf{y}(t)=\mathbf{H}\mathbf{x}(t)+\mathbf{v}_{2}(t)\in\mathbb{R}^{N\times 1}.\tag{2}$$

Herein, ${\mathbf{H}\in\mathbb{R}^{N\times M}}$ represents the observation matrix, and ${\mathbf{v}_{2}(t)\sim\mathcal{N}(\mathbf{0}_{N},\mathbf{\Sigma}_{v_{2}})}$ is the zero-mean Gaussian measurement noise with covariance matrix ${\mathbf{\Sigma}_{v_{2}}\in\mathbb{R}^{N\times N}}$ . Additionally, we model the channel between sensor $n$ and edge node as a packet erasure channel with a transmission error probability $\hbar_{n}$ .

Algorithm 2 CQKF at $t$

Input: $\hat{\mathbf{x}}_{pos}(t-1),\mathbf{\Psi}_{pos}(t-1),\mathbf{\Sigma}_{v_{1}},\mathbf{w},\boldsymbol{\Xi},\varpi,\varsigma,\boldsymbol{a}(t-1),\boldsymbol{b}(t-1),\mathbf{\Sigma}_{v_{2}},\mathbf{H},p$

1. $\hat{\mathbf{x}}_{pri}(t),\mathbf{\Psi}_{pri}(t),\boldsymbol{a}(t),\boldsymbol{b}(t),\mathbf{Z}^{*}(t-1)\leftarrow\operatorname{\textsc{PredictionStep}}(\hat{\mathbf{x}}_{pos}(t-1),\mathbf{\Psi}_{pos}(t-1),\mathbf{\Sigma}_{v_{1}},\mathbf{w},\boldsymbol{\Xi},\varpi,\varsigma,\boldsymbol{a}(t-1),\boldsymbol{b}(t-1))$
2. Draw $\theta$ from $\mathcal{U}(0,1)$
3. if $(p\neq 0)$ and $(\theta\geq 0.02\lceil{\frac{p-1}{10}}\rceil)$ then
4.    $\hat{\mathbf{x}}_{pos}(t),\mathbf{\Psi}_{pos}(t)\leftarrow\operatorname{\textsc{UpdateStep}}(\hat{\mathbf{x}}_{pri}(t),\mathbf{\Psi}_{pri}(t),\mathbf{Z}^{*}(t-1),\mathbf{\Sigma}_{v_{2}},\mathbf{H},\mathbf{w},\boldsymbol{\Xi},y(t),p)$
5. else
6.    $\{\hat{\mathbf{x}}_{pos}(t),\mathbf{\Psi}_{pos}(t)\}=\{\hat{\mathbf{x}}_{pri}(t),\mathbf{\Psi}_{pri}(t)\}$
7. end if

Output: $\hat{\mathbf{x}}_{pri}(t),\mathbf{\Psi}_{pri}(t),\boldsymbol{a}(t),\boldsymbol{b}(t),\hat{\mathbf{x}}_{pos}(t),\mathbf{\Psi}_{pos}(t)$

III Goal-Oriented Scheduling

The proposed GoS framework comprises the following three key components: state estimator, sensor scheduler, and query process at the clients. Detailed descriptions of each component are provided next.

III-A System State Estimator

We employ CQKF for NLDS state estimation. As initialization, CQKF requires cubature quadrature (CQ) points $(\boldsymbol{\Xi})$ and their corresponding weights $(\mathbf{w})$ , whose computation procedure is available in Algorithm 1. Initially, we determine the cubature points ${\boldsymbol{\psi}_{j},\forall j\in\{1,\cdots,2M\}}$ , which are the intersection points of the unit $M$ -hyper-sphere and its axes. For example, the unit $2$ -hyper-sphere, also known as the unit circle, intersects its axes at ${[1,0]^{T},[0,1]^{T},[-1,0]^{T}}$ and ${[0,-1]^{T}}$ , which are its four cubature points. Likewise, the unit $M$ -hyper-sphere has ${\boldsymbol{\psi}_{j}=\mathbf{1}_{j},\boldsymbol{\psi}_{M+j}=-\mathbf{1}_{j},\forall j\in\{1,\cdots,M\}}$ , as its cubature points. Subsequently, we compute the roots ${\lambda_{j^{\prime}},\forall j^{\prime}\in\{1,\cdots,n^{\prime}\}}$ of the Chebyshev-Laguerre (CL) polynomial, known as quadrature points. Here, the CL polynomial is given as

$$\begin{split}L(\lambda)=&\sum_{k=0}^{n^{\prime}}\binom{n^{\prime}}{k}(-1)^{k}\frac{(n^{\prime}+\iota)!}{(n^{\prime}+\iota-k)!}\lambda^{n^{\prime}-k}\\ =&\ell_{0}+\ell_{1}\lambda+\cdots+\ell_{n^{\prime}-1}\lambda^{n^{\prime}-1}+\lambda^{n^{\prime}},\end{split}\tag{3}$$

where $\iota=\frac{M}{2}-1$ . Consider $\boldsymbol{\ell}=[\ell_{1},\cdots,\ell_{n^{\prime}-1}]^{T}$ . To compute quadrature points, we first have to formulate the companion matrix ${(\mathbf{D})}$ corresponding to $L(\lambda)$ , where

$$\mathbf{D}=\begin{bmatrix}\mathbf{0}_{n^{\prime}-1}&\mathbf{I}_{n^{\prime}-1}\\ -\ell_{0}&-\boldsymbol{\ell}^{T}\end{bmatrix}.\tag{4}$$

Next, we formulate the characteristic polynomial of ${\mathbf{D}}$ , which is ${\det(\mathbf{D}-\lambda\mathbf{I}_{n^{\prime}})}$ and whose roots are the eigenvalues of $\mathbf{D}$ . Note that ${\det(\mathbf{D}-\lambda\mathbf{I}_{n^{\prime}})=(-1)^{n^{\prime}}L(\lambda)}$ , i.e., it coincides with ${L(\lambda)}$ up to sign. Therefore, the eigenvalues of $\mathbf{D}$ are the roots of ${L(\lambda)}$ . Finally, we determine $\boldsymbol{\Xi}$ and $\mathbf{w}$ from the cubature and quadrature points in step 3 of Algorithm 1. Note that ${L^{\prime}(\lambda_{j^{\prime}})}$ in step 3 of Algorithm 1 is the first derivative of ${L(\cdot)}$ at ${\lambda=\lambda_{j^{\prime}}}$ .
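The computation of the CQ points and weights described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `cq_points` is hypothetical, the factorial ratio in (3) is evaluated with the Gamma function so that half-integer $\iota$ is handled, and the rows of the returned array follow the ordering $\boldsymbol{\xi}_{j^{\prime}+(j-1)n^{\prime}}$ of Algorithm 1.

```python
import math
import numpy as np

def cq_points(M, n_prime):
    """Sketch of Algorithm 1: returns weights w (2Mn',) and points Xi (2Mn', M)."""
    iota = M / 2 - 1
    # Coefficients of L(lambda) in descending powers, following Eq. (3).
    # (n'+iota)!/(n'+iota-k)! is written with Gamma (assumption for non-integer iota).
    coeffs = np.array([
        math.comb(n_prime, k) * (-1) ** k
        * math.gamma(n_prime + iota + 1) / math.gamma(n_prime + iota - k + 1)
        for k in range(n_prime + 1)
    ])
    # Companion matrix D of Eq. (4); its eigenvalues are the roots of L.
    ell = coeffs[::-1][:n_prime]               # ell_0, ..., ell_{n'-1}
    D = np.zeros((n_prime, n_prime))
    D[:-1, 1:] = np.eye(n_prime - 1)
    D[-1, :] = -ell
    lam = np.sort(np.linalg.eigvals(D).real)   # quadrature points
    # First derivative of L evaluated at each root.
    dL = np.polyval(np.polyder(coeffs), lam)
    # Cubature points: +/- unit vectors along each axis (rows of psi).
    psi = np.vstack([np.eye(M), -np.eye(M)])
    # Step 3 of Algorithm 1: CQ points and weights, index j' + (j-1)n'.
    w_q = (math.factorial(n_prime) / (2 * M)
           * math.gamma(iota + n_prime + 1) / (math.gamma(M / 2) * lam) / dL ** 2)
    Xi = np.vstack([np.sqrt(2 * lam)[:, None] * psi[j] for j in range(2 * M)])
    w = np.tile(w_q, 2 * M)
    return w, Xi
```

For instance, `cq_points(2, 1)` yields four points at distance $\sqrt{2}$ from the origin along the axes, each with weight $1/4$, which is the familiar third-degree cubature rule.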

The CQKF is detailed in Algorithm 2 and encompasses two steps: a prediction step and an update step, detailed in Algorithms 3 and 4, respectively.

Algorithm 3 $\operatorname{\textsc{PredictionStep}}$

Input: $\hat{\mathbf{x}}_{pos}(t-1),\mathbf{\Psi}_{pos}(t-1),\mathbf{\Sigma}_{v_{1}},\mathbf{w},\boldsymbol{\Xi},\varpi,\varsigma,\boldsymbol{a}(t-1),\boldsymbol{b}(t-1)$

1. $\mathbf{\Sigma}_{pri}=\operatorname{\textsc{Chol}}(\mathbf{\Psi}_{pos}(t-1))$ $\triangleright$ Cholesky decomposition
2. $\boldsymbol{\zeta}_{i}(t-1)=\mathbf{\Sigma}_{pri}\boldsymbol{\xi}_{i}+\hat{\mathbf{x}}_{pos}(t-1),\forall i\in\{1,\cdots,2Mn^{\prime}\}$ $\triangleright$ $\mathbf{Z}(t-1)=[\boldsymbol{\zeta}_{1}(t-1),\cdots,\boldsymbol{\zeta}_{2Mn^{\prime}}(t-1)]^{T}$
3. $\mathbf{Z}^{*}(t-1),\boldsymbol{a}(t),\boldsymbol{b}(t)\leftarrow\operatorname{\textsc{HoltsMethod}}(\varpi,\varsigma,{\mathbf{Z}(t-1)},{\boldsymbol{a}(t-1)},\boldsymbol{b}(t-1),\hat{\mathbf{x}}_{pos}(t-1))$ $\triangleright$ $\mathbf{Z}^{*}(t-1)=[\boldsymbol{\zeta}_{1}^{*}(t-1),\cdots,\boldsymbol{\zeta}_{2Mn^{\prime}}^{*}(t-1)]^{T}$
4. $\hat{\mathbf{x}}_{pri}(t)=\sum_{i=1}^{2Mn^{\prime}}w_{i}\boldsymbol{\zeta}_{i}^{*}(t-1)$
5. $\mathbf{\Psi}_{pri}(t)=\sum_{i=1}^{2Mn^{\prime}}w_{i}\boldsymbol{\zeta}_{i}^{*}(t-1)\boldsymbol{\zeta}_{i}^{*T}(t-1)-\hat{\mathbf{x}}_{pri}(t)\hat{\mathbf{x}}_{pri}^{T}(t)+\mathbf{\Sigma}_{v_{1}}$

Output: $\hat{\mathbf{x}}_{pri}(t),\mathbf{\Psi}_{pri}(t),\boldsymbol{a}(t),\boldsymbol{b}(t),\mathbf{Z}^{*}(t-1)$

III-A1 Prediction step

The prediction step computes the prior estimates, $\hat{\mathbf{x}}_{pri}(t)$ and $\mathbf{\Psi}_{pri}(t)$ . First, we compute the Cholesky decomposition $\mathbf{\Sigma}_{pri}$ of the previous posterior covariance $\mathbf{\Psi}_{pos}(t-1)$ , which is then used to determine the sampling points $\mathbf{Z}(t-1)$ . Next, Holt’s method, described in the next paragraph, transforms $\mathbf{Z}(t-1)$ into the updated sampling points $\mathbf{Z}^{*}(t-1)$ . Finally, we compute $\hat{\mathbf{x}}_{pri}(t)$ and $\mathbf{\Psi}_{pri}(t)$ by utilizing $\mathbf{Z}^{*}(t-1)$ in steps 4 and 5 of Algorithm 3.
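A minimal NumPy sketch of steps 1, 2, 4, and 5 of Algorithm 3 is given below. The function name `prediction_step` and the `propagate` argument are illustrative assumptions: `propagate` stands in for whatever transformation maps $\mathbf{Z}(t-1)$ to $\mathbf{Z}^{*}(t-1)$ (Holt's method in this paper), and the sampling points are stored row-wise, matching $\mathbf{Z}=[\boldsymbol{\zeta}_{1},\cdots]^{T}$.

```python
import numpy as np

def prediction_step(x_pos, Psi_pos, Sigma_v1, w, Xi, propagate):
    """Sketch of Algorithm 3 with the point transformation passed as a callable."""
    # Step 1: lower-triangular Cholesky factor of the previous posterior covariance.
    S = np.linalg.cholesky(Psi_pos)
    # Step 2: sampling points; row i is zeta_i = S xi_i + x_pos.
    Z = Xi @ S.T + x_pos
    # Step 3 (placeholder): transform the points, e.g. with Holt's method.
    Z_star = propagate(Z)
    # Step 4: prior mean as the weighted sum of the transformed points.
    x_pri = w @ Z_star
    # Step 5: prior covariance, including the process noise covariance.
    Psi_pri = (Z_star * w[:, None]).T @ Z_star - np.outer(x_pri, x_pri) + Sigma_v1
    return x_pri, Psi_pri, Z_star
```

With an identity `propagate` and points whose weighted second moment is the identity matrix, the sketch returns the previous mean and the previous covariance plus $\mathbf{\Sigma}_{v_{1}}$ , which is a convenient sanity check.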

Knowing $\mathbf{f}(\cdot)$ is essential to transform $\mathbf{Z}(t-1)$ into $\mathbf{Z}^{*}(t-1)$ , but such information is not available at the edge node. Therefore, we opt for Holt’s method, described in detail in Algorithm 5, as a reliable way to model the NLSD function $\mathbf{f}(\cdot)$ . Holt’s method estimates $\mathbf{f}(\cdot)$ according to the expression in step 1 of Algorithm 5, which is updated at each time step with the help of the following smoothing parameters: ${\varpi,\varsigma,\boldsymbol{a}(t)\ \textrm{and}\ \boldsymbol{b}(t)}$ . Here, ${\varpi\ \textrm{and}\ \varsigma}$ are constants, while ${\boldsymbol{a}(t)\ \textrm{and}\ \boldsymbol{b}(t)}$ are variables whose update procedure is given in steps 2 and 3 of Algorithm 5.
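The recursions for $\boldsymbol{a}(t)$ and $\boldsymbol{b}(t)$ in steps 2 and 3 of Algorithm 5 can be sketched as follows. The function name `holt_update` is hypothetical, and the sketch covers only these two recursions as listed; the sampling-point transformation of step 1 is deliberately left out.

```python
import numpy as np

def holt_update(varpi, varsigma, a_prev, b_prev, x_pos_prev):
    """Sketch of steps 2 and 3 of Algorithm 5 (smoothing-variable updates)."""
    # Step 2: a(t) blends the previous posterior estimate with a(t-1).
    a = varpi * x_pos_prev + (1.0 - varpi) * a_prev
    # Step 3: b(t) blends the change in a with b(t-1).
    b = varsigma * (a - a_prev) + (1.0 - varsigma) * b_prev
    return a, b

# Example: one update starting from zero-initialized smoothing variables.
a, b = holt_update(0.5, 0.5, np.zeros(2), np.zeros(2), np.array([2.0, 4.0]))
```

Here $\varpi$ controls how quickly $\boldsymbol{a}(t)$ follows the latest posterior estimate, while $\varsigma$ controls how quickly $\boldsymbol{b}(t)$ follows changes in $\boldsymbol{a}(t)$ .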

Note that CQKF requires $p$ , the index of the selected action, and a random number ${\theta\sim\mathcal{U}(0,1)}$ . Both the notion of an action and $p$ are defined in Algorithm 6. If ${p>0}$ and ${\theta\geq\hbar_{p}}$ , where $\hbar_{p}=0.02\lceil{\frac{p-1}{10}}\rceil$ [5], we advance to the update step to compute the posterior estimates, $\hat{\mathbf{x}}_{pos}(t)$ and $\mathbf{\Psi}_{pos}(t)$ . Otherwise, ${\{\hat{\mathbf{x}}_{pos}(t),\mathbf{\Psi}_{pos}(t)\}=\{\hat{\mathbf{x}}_{pri}(t),\mathbf{\Psi}_{pri}(t)\}}$ .
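The gating condition of step 3 in Algorithm 2 can be illustrated with a short sketch. The function names are hypothetical; the erasure probability follows the expression $\hbar_{p}=0.02\lceil{\frac{p-1}{10}}\rceil$ stated above.

```python
import math

def erasure_probability(p):
    """Transmission error probability of sensor p (p >= 1)."""
    return 0.02 * math.ceil((p - 1) / 10)

def update_is_performed(p, theta):
    """True when a sensor is polled (p != 0) and its packet is not erased."""
    return p != 0 and theta >= erasure_probability(p)
```

Under this expression, sensor 1 never suffers erasures, sensors 2 to 11 have an erasure probability of 0.02, sensors 12 to 21 of 0.04, and so on.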

III-A2 Update step

In the update step, we compute the Cholesky decomposition $\mathbf{\Sigma}_{pos}$ of $\mathbf{\Psi}_{pri}(t)$ . Following this, we determine the sampling points $\mathbf{Z}(t)$ , which undergo a linear transformation to become the updated sampling points $\mathbf{Z}^{*}(t)$ , as delineated in step 3 of Algorithm 4. Subsequently, we compute a vector $\hat{\mathbf{y}}(t)$ , representing the predicted sensor measurements, which is then used to determine the innovation error covariance $\mathbf{\Psi}_{yy}(t)$ , cross-covariance $\mathbf{\Psi}_{xy}(t)$ , and Kalman gain $\mathbf{K}(t)$ . Lastly, we compute $\hat{\mathbf{x}}_{pos}(t)$ and $\mathbf{\Psi}_{pos}(t)$ by employing $\mathbf{K}(t)$ , $\mathbf{\Psi}_{yy}(t)$ , and $y(t)$ in steps 8 and 9 of Algorithm 4. Here, $y(t)$ denotes the measurement of the polled sensor.
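The update step can be sketched in NumPy as follows, mirroring the steps of Algorithm 4 as listed. The function name `update_step` is an assumption, points are stored row-wise, and $p$ is treated as a 1-based sensor index; this is an illustrative sketch rather than the authors' code.

```python
import numpy as np

def update_step(x_pri, Psi_pri, Z_star_prev, Sigma_v2, H, w, Xi, y, p):
    """Sketch of Algorithm 4 for a scalar measurement y of the polled sensor p."""
    S = np.linalg.cholesky(Psi_pri)                 # step 1
    Z = Xi @ S.T + x_pri                            # step 2: rows are zeta_i(t)
    Zy = Z @ H.T                                    # step 3: rows are H zeta_i(t)
    y_hat = w @ Zy                                  # step 4: predicted measurements
    # Step 5: innovation error covariance.
    Psi_yy = (Zy * w[:, None]).T @ Zy - np.outer(y_hat, y_hat) + Sigma_v2
    # Step 6: cross-covariance, using the points Z*(t-1) from the prediction step.
    Psi_xy = (Z_star_prev * w[:, None]).T @ Zy - np.outer(x_pri, y_hat)
    K = Psi_xy @ np.linalg.inv(Psi_yy)              # step 7: Kalman gain
    one_p = np.zeros(H.shape[0])
    one_p[p - 1] = 1.0                              # selects the polled sensor
    x_pos = x_pri + (K @ one_p) * (y - y_hat[p - 1])   # step 8
    Psi_pos = Psi_pri - K @ Psi_yy @ K.T               # step 9
    return x_pos, Psi_pos
```

In this form only the column of $\mathbf{K}(t)$ associated with sensor $p$ affects the mean update, since the selector vector zeroes out all other innovation components.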

Algorithm 4 $\operatorname{\textsc{UpdateStep}}$

Input: $\hat{\mathbf{x}}_{pri}(t),\mathbf{\Psi}_{pri}(t),\mathbf{Z}^{*}(t-1),\mathbf{\Sigma}_{v_{2}},\mathbf{H},\mathbf{w},\boldsymbol{\Xi},y(t),p$

1. $\mathbf{\Sigma}_{pos}=\operatorname{\textsc{Chol}}(\mathbf{\Psi}_{pri}(t))$ $\triangleright$ Cholesky decomposition
2. $\boldsymbol{\zeta}_{i}(t)=\mathbf{\Sigma}_{pos}\boldsymbol{\xi}_{i}+\hat{\mathbf{x}}_{pri}(t),i\in\{1,\cdots,2Mn^{\prime}\}$ $\triangleright$ $\mathbf{Z}(t)=[\boldsymbol{\zeta}_{1}(t),\cdots,\boldsymbol{\zeta}_{2Mn^{\prime}}(t)]^{T}$
3. $\boldsymbol{\zeta}_{i}^{*}(t)=\mathbf{H}\boldsymbol{\zeta}_{i}(t),i\in\{1,\cdots,2Mn^{\prime}\}$ $\triangleright$ $\mathbf{Z}^{*}(t)=[\boldsymbol{\zeta}_{1}^{*}(t),\cdots,\boldsymbol{\zeta}_{2Mn^{\prime}}^{*}(t)]^{T}$
4. $\hat{\mathbf{y}}(t)=\sum_{i=1}^{2Mn^{\prime}}w_{i}\boldsymbol{\zeta}_{i}^{*}(t)$ $\triangleright$ $\hat{\mathbf{y}}(t)=[\hat{y}_{1}(t),\cdots,\hat{y}_{N}(t)]^{T}$
5. $\mathbf{\Psi}_{yy}(t)=\sum_{i=1}^{2Mn^{\prime}}w_{i}\boldsymbol{\zeta}_{i}^{*}(t)\boldsymbol{\zeta}_{i}^{*T}(t)-\hat{\mathbf{y}}(t)\hat{\mathbf{y}}^{T}(t)+\mathbf{\Sigma}_{v_{2}}$
6. $\mathbf{\Psi}_{xy}(t)=\sum_{i=1}^{2Mn^{\prime}}w_{i}\boldsymbol{\zeta}_{i}^{*}(t-1)\boldsymbol{\zeta}_{i}^{*T}(t)-\hat{\mathbf{x}}_{pri}(t)\hat{\mathbf{y}}^{T}(t)$ $\triangleright$ $\mathbf{Z}^{*}(t-1)=[\boldsymbol{\zeta}_{1}^{*}(t-1),\cdots,\boldsymbol{\zeta}_{2Mn^{\prime}}^{*}(t-1)]^{T}$
7. $\mathbf{K}(t)=\mathbf{\Psi}_{xy}(t)\mathbf{\Psi}_{yy}(t)^{-1}$ $\triangleright$ Kalman gain
8. $\hat{\mathbf{x}}_{pos}(t)=\hat{\mathbf{x}}_{pri}(t)+\mathbf{K}(t)\mathbf{1}_{p}(y(t)-\hat{y}_{p}(t))$
9. $\mathbf{\Psi}_{pos}(t)=\mathbf{\Psi}_{pri}(t)-\mathbf{K}(t)\mathbf{\Psi}_{yy}(t)\mathbf{K}^{T}(t)$

Output: $\hat{\mathbf{x}}_{pos}(t),\mathbf{\Psi}_{pos}(t)$

Algorithm 5 $\operatorname{\textsc{HoltsMethod}}$

Input: $\varpi,\varsigma,\mathbf{Z}(t-1),\boldsymbol{a}(t-1),\boldsymbol{b}(t-1),\hat{\mathbf{x}}_{pos}(t-1)$

1. $\boldsymbol{\zeta}_{i}^{*}(t-1)=\varpi(1+\varsigma)\boldsymbol{\zeta}_{i}(t-1)+(1+\varsigma)(1-\varpi)\boldsymbol{\zeta}_{i}(t-1)-\varsigma\boldsymbol{a}(t-1)+(1-\varsigma)\boldsymbol{b}(t-1),\forall i\in\{1,\cdots,2Mn^{\prime}\}$ $\triangleright$ $\mathbf{Z}^{*}(t-1)=[\boldsymbol{\zeta}_{1}^{*}(t-1),\cdots,\boldsymbol{\zeta}_{2Mn^{\prime}}^{*}(t-1)]^{T}$
2. $\boldsymbol{a}(t)=\varpi\hat{\mathbf{x}}_{pos}(t-1)+(1-\varpi)\boldsymbol{a}(t-1)$
3. $\boldsymbol{b}(t)=\varsigma(\boldsymbol{a}(t)-\boldsymbol{a}(t-1))+(1-\varsigma)\boldsymbol{b}(t-1)$

Output: $\mathbf{Z}^{*}(t-1),\boldsymbol{a}(t),\boldsymbol{b}(t)$

III-B Query Process and Query Response

The query process can be modeled as a Markov chain (MC). Each client $c$ operates independently, following its own MC, with its state at time $t$ denoted as ${q_{c}(t)\in\mathcal{Q}_{c}}$ , governed by a known transition matrix $\mathbf{T}_{c}$ . Client $c$ always requests the same function $z_{c}$ when its MC is within a subset of states, denoted as $\tilde{\mathcal{Q}}_{c}$ , where ${\tilde{\mathcal{Q}}_{c}\subset\mathcal{Q}_{c}}$ . Besides, the state of each client remains unknown to the edge node.

The edge node responds to a query, from client ${c\in\mathscr{C}}$ , with an estimate ${\hat{z}_{c}(\hat{\mathbf{x}}_{pos}(t),\mathbf{\Psi}_{pos}(t))}$ . The objective of the edge node is to respond to queries as accurately as possible, essentially minimizing the error in query responses. This error is quantified by the query response $\operatorname{MSE}$ , which for client $c$ is defined as [5, 6]

$$\operatorname{MSE}_{c}(t)=\mathbb{E}\bigl[(\hat{z}_{c}(\hat{\mathbf{x}}_{pos}(t),\mathbf{\Psi}_{pos}(t))-z_{c}(\mathbf{x}(t)))^{2}\bigr].\tag{5}$$

III-C GoS Problem

The problem is to anticipate future queries and schedule sensor transmissions to minimize the $\operatorname{MSE}$ on future query responses. This task demands foresight, necessitating an understanding not only of the monitored NLDS but also of the query process and the interplay among various query functions.

We can model the GoS problem at the edge node as a partially observable Markov decision process (POMDP), in which the edge node must decide whether to poll a sensor. Herein, the action space is ${\mathcal{A}=\{0,1,\cdots,N\}}$ , where action ${p=0}$ signifies no device is polled, and action ${p=n\in\{1,\cdots,N\}}$ represents sensor $n$ is polled.

Before initiating the sensor scheduling operation, the edge node possesses prior estimates. Moreover,

$$\operatorname{Tr}(\mathbf{\Psi}_{pri}(t))=\mathbb{E}[(\mathbf{x}(t)-\hat{\mathbf{x}}_{pri}(t))^{T}(\mathbf{x}(t)-\hat{\mathbf{x}}_{pri}(t))].\tag{6}$$

Consequently, the state in POMDP can be represented as ${\boldsymbol{s}(t)=(\operatorname{Tr}(\mathbf{\Psi}_{pri}(t)),\boldsymbol{q}(t))}$ , where ${\boldsymbol{q}(t)=[q_{1}(t),\cdots,q_{C}(t)]^{T}}$ and the state space is ${\mathcal{S}=\mathbb{R}\times\prod_{c=1}^{C}\mathcal{Q}_{c}}$ . However, the edge node lacks knowledge of ${\boldsymbol{q}(t)}$ , instead possessing information about the time ${\boldsymbol{\tau}(t)=[\tau_{1}(t),\cdots,\tau_{C}(t)]^{T}\in\mathbb{N}^{C\times 1}}$ that elapsed since the last query [6]. Consequently, the edge node has an observation ${\boldsymbol{o}(t)=(\operatorname{Tr}(\mathbf{\Psi}_{pri}(t)),\boldsymbol{\tau}(t))}$ , with an observation space ${\mathcal{O}=\mathbb{R}\times\mathbb{N}^{C}}$ .
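As an illustration, the observation can be assembled as a vector of length $C+1$ , matching the input dimension listed in Table II. The helper names below are hypothetical, and the convention that $\tau_{c}$ is reset to zero in the slot where client $c$ poses a query is an assumption, chosen to agree with the indicator $\mathbbm{1}(\tau_{c}==0)$ used in the reward.

```python
import numpy as np

def update_elapsed(tau, queried):
    """Reset tau_c to 0 where client c queries now; otherwise increment it."""
    tau = np.asarray(tau)
    queried = np.asarray(queried, dtype=bool)
    return np.where(queried, 0, tau + 1)

def build_observation(Psi_pri, tau):
    """Observation o(t) = (Tr(Psi_pri(t)), tau(t)) as a flat vector of length C+1."""
    return np.concatenate(([np.trace(Psi_pri)], np.asarray(tau, dtype=float)))
```

Summarizing the filter state by the single scalar $\operatorname{Tr}(\mathbf{\Psi}_{pri}(t))$ keeps the network input small regardless of the state dimension $M$ .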

The reward ${r_{p}(t)}$ in POMDP is defined as

$$r_{p}(t)=\begin{cases}-\mu^{\mathbbm{1}(p==0)}\operatorname{Tr}(\mathbf{\Psi}_{pos}(t)),&\textrm{no query},\\ -\sum_{c=1}^{C}\alpha_{c}\operatorname{MSE}_{c}(t)\mathbbm{1}(\tau_{c}==0),&\textrm{otherwise},\end{cases}\tag{9}$$

where ${\mu\in(0,1)}$ , ${p\in\mathcal{A}}$ denotes the selected action, while ${\alpha_{c}\in[0,1],\forall c\in\mathscr{C}}$ , signifies the relative importance of client $c$ . Additionally, we presume that ${\alpha_{c}}$ is known to the edge node.
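The reward in (9) can be sketched as a small function. This is an illustrative reading of the equation, not Algorithm 7 itself; the function name `reward` and its argument layout are assumptions, "no query" is interpreted as no client having $\tau_{c}=0$ , and the default $\mu=0.1$ is taken from Table II.

```python
import numpy as np

def reward(p, trace_pos, mse, tau, alpha, mu=0.1):
    """Sketch of Eq. (9). mse, tau, alpha are length-C sequences."""
    asked = np.asarray(tau) == 0              # clients with a query in this slot
    if not asked.any():
        # No query: penalize posterior uncertainty, scaled by mu when p == 0.
        scale = mu if p == 0 else 1.0
        return -scale * trace_pos
    # Otherwise: weighted query-response MSE of the querying clients only.
    return -float(np.sum(np.asarray(alpha) * np.asarray(mse) * asked))
```

Because $\mu\in(0,1)$ , the no-query penalty is smaller when no sensor is polled, so remaining silent is favored unless polling reduces $\operatorname{Tr}(\mathbf{\Psi}_{pos}(t))$ by more than the factor $\mu$ .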

The long-term reward $R(\pi)$ can be stated as

$$R(\pi(t))=\mathbb{E}\Biggl[\sum_{t^{\prime}=0}^{\infty}\gamma^{t^{\prime}}r_{p}(t+t^{\prime})\bigg|\boldsymbol{o}(t),\pi(t)\Biggr],\tag{10}$$

where ${\gamma\in[0,1)}$ is the exponential discount factor. Moreover, ${\pi:\mathcal{O}\rightarrow\mathbf{\Phi}(\mathcal{A})}$ represents the policy which maps $\mathcal{O}$ to ${\mathbf{\Phi}(\mathcal{A})}$ , where ${\mathbf{\Phi}(\mathcal{A})}$ encompasses the probability of selecting each action. Finally, the GoS problem can be defined as [6]

$$\pi^{*}(t)=\underset{\pi:\mathcal{O}\rightarrow\mathbf{\Phi}(\mathcal{A})}{\operatorname{argmax}}R(\pi(t)),\tag{11}$$

where $\pi^{*}$ represents the optimal policy.

TABLE II: Online and Target Network Architecture and Parameters

Parameters	Values
Input dimension	$C+1$
Output dimension	$N+1$
Number of hidden layers	$1$
Hidden layers dimension	$\{4\}$
Activation function	ReLU
Optimizer	RMSProp
Initial learning rate	$1.0$
Mini-batch size $(B)$	$|\mathcal{A}|\times 30$
Memory buffer size $(|\mathcal{E}|)$ [13]	$|\mathcal{A}|\times 100$
Exponential discount factor $(\gamma)$ [6]	$0.9$
Threshold for global norm of gradient vector $(\delta)$ [13]	$5.0$
$\Theta_{onl},\Theta_{tar}$ (initialize)	$[-0.3,0.3]$
$\varepsilon$ (initial value)	$1$
$\mu$	$0.1$

III-D CQKF-cum-DRL-based Scheduler

We solve (11) using DRL; hence, we name our scheduler the CQKF-cum-DRL-based scheduler, described in detail in Algorithm 6. We maintain two DNNs, named the online network and the target network, to improve the stability of our DRL scheduler. For insights into the architecture of both networks, refer to Table II. A schematic of our proposed GoS is available in Fig. 2.

Algorithm 6 operates as follows. It first computes the prior estimates to formulate ${\boldsymbol{o}(t)}$. The online network, characterized by its weights $\Theta_{onl}$, then takes ${\boldsymbol{o}(t)}$ as its input and outputs the action values ${\hat{q}_{i}(\boldsymbol{o}(t)),}{\forall i\in\mathcal{A}}$. Here, ${\hat{q}_{i}(\boldsymbol{o}(t))}$ serves as an estimate of the reward that the scheduler would gain if action $i$ were chosen. The $\epsilon$-greedy method then uses the action values to select an action ${p\in\mathcal{A}}$. Most of the time, it selects $p$ as the argument of the maximum action value, which is called exploitation. However, to explore the whole action space, it occasionally selects $p$ randomly from the set $\mathcal{A}$, which is called exploration. The posterior estimates are then computed according to steps 2-7 of Algorithm 2. Subsequently, the reward ${r_{p}(t)}$ gained by the online network for selecting action $p$ is computed using Algorithm 7. If there is no query, the reward is ${-\mu^{\mathbbm{1}(p==0)}\operatorname{Tr}(\mathbf{\Psi}_{pos}(t))}$, which conveys the mean square error in the posterior estimate to the DRL scheduler. Note that, because of ${\mu^{\mathbbm{1}(p==0)}}$, this expression gives the DRL scheduler an extra incentive to select action-0 when no query is asked and $\mu<1$. If a query has been asked, Algorithm 7 first computes ${\operatorname{MSE}_{c}(t),\forall c\in\mathscr{C}}$, required in $(\ref{rewardequation})$. Computing $\operatorname{MSE}_{c}(t)$ involves drawing $S$ samples from a Gaussian distribution with mean $\hat{\mathbf{x}}_{pos}(t)$ and covariance $\mathbf{\Psi}_{pos}(t)$. These samples are used to obtain the vector $\boldsymbol{u}=[u_{1},\cdots,u_{S}]^{T}$, where $u_{s}=z_{c}(\mathbf{x}_{s})$ and $\mathbf{x}_{s}$ is the $s^{th}$ sample. The sample variance of $\boldsymbol{u}$ yields $\operatorname{MSE}_{c}(t)$. Once ${\operatorname{MSE}_{c}(t),\forall c\in\mathscr{C}}$, has been computed, $r_{p}(t)$ is obtained using $(\ref{rewardequation})$.
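The $\epsilon$-greedy selection described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `epsilon_greedy` and the list `q_values`, which stands in for the online network outputs ${\hat{q}_{i}(\boldsymbol{o}(t))}$, are hypothetical names introduced here.

```python
import random


def epsilon_greedy(q_values, epsilon):
    """Return the index p of the selected action.

    q_values: assumed list of action values, one per action in A = {0, ..., N}.
    epsilon:  exploration threshold (the varepsilon of Algorithm 6).
    """
    theta = random.random()  # theta drawn from U(0, 1)
    if theta > epsilon:
        # Exploitation: index of the maximum action value.
        return max(range(len(q_values)), key=lambda i: q_values[i])
    # Exploration: random draw over the whole action space.
    return random.randrange(len(q_values))
```

With a large threshold the function explores almost always, and as the threshold shrinks it increasingly returns the action with the largest estimated value.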

Algorithm 6 CQKF-cum-DRL-based Scheduler at $t$

Input: ${\Theta_{tar},\boldsymbol{o}(t-1),\boldsymbol{o}(t),\hat{\mathbf{x}}_{pos}(t-1),\mathbf{\Psi}_{pos}(t-1),\boldsymbol{a}(t-1),\boldsymbol{b}(t-1),\eta,\varepsilon,\jmath}$

1. Compute ${\{\hat{\mathbf{x}}_{pri}(t),\mathbf{\Psi}_{pri}(t),\boldsymbol{a}(t),\boldsymbol{b}(t),\mathbf{Z}^{*}(t-1)\}}$ using step 1 of Algorithm 2
2. Evaluate ${\hat{q}_{i}(\boldsymbol{o}(t)),\forall i\in\mathcal{A}}$ using the online network
3. Draw $\theta$ from $\mathcal{U}(0,1)$
4. if $\theta>\varepsilon$ then
5.     ${p\leftarrow\operatorname{argmax}_{i\in\mathcal{A}}\hat{q}_{i}(\boldsymbol{o}(t))}$  $\triangleright$ Exploitation
6. else
7.     Select $p$ randomly from $\{0,\cdots,N\}$  $\triangleright$ Exploration
8. end if  $\triangleright$ $p$: index of selected action
9. Compute ${\{\hat{\mathbf{x}}_{pos}(t),\mathbf{\Psi}_{pos}(t)\}}$ using steps 2-7 of Algorithm 2
10. ${r_{p}(t)\leftarrow\operatorname{\textsc{Reward}}(\mathscr{C},S,\hat{\mathbf{x}}_{pos}(t),\mathbf{\Psi}_{pos}(t),\boldsymbol{\tau},p,{\{\alpha_{c},\forall c\}})}$
11. if $\jmath==|\mathcal{E}|$ then
12.     Remove $\operatorname{\textsc{Tuple}}_{B}$ from $\mathcal{E}$
13.     $\jmath\leftarrow\jmath-1$
14. end if  $\triangleright$ $\mathcal{E}$: memory buffer at the edge node
15. if $t>1$ then
16.     Store ${\{\boldsymbol{o}(t-1),p,r_{p}(t),\boldsymbol{o}(t)\}}$ as the ${(\jmath+1)^{th}}$ tuple in $\mathcal{E}$
17.     $\jmath\leftarrow\jmath+1$  $\triangleright$ $\jmath$: number of tuples available in $\mathcal{E}$
18. end if
19. $\eta\leftarrow\eta+1$
20. if $\eta==20$ then
21.     $\Theta_{tar}=\Theta_{onl}$  $\triangleright$ Update target network
22.     $\eta=0$  $\triangleright$ Restart counter
23. end if
24. Sample a mini-batch $\mathcal{B}$ of size $B$ from $\mathcal{E}$. Then, provide ${\operatorname{\textsc{Tuple}}_{j,4},\forall j\in\{1,\cdots,B\}}$, as input to the target network and utilize the target network’s outputs in (12) to determine the target values ${\bar{\bar{q}}_{j},\forall j\in\{1,\cdots,B\}}$ for $\mathcal{B}$
25. Provide ${\operatorname{\textsc{Tuple}}_{j,1},\forall j\in\{1,\cdots,B\}}$, as input to the online network and utilize the corresponding target values ${\bar{\bar{q}}_{j},\forall j\in\{1,\cdots,B\}}$, as labels for updating $\Theta_{onl}$ by minimizing $\Omega$, in (13), using RMSProp
26. $\varepsilon\leftarrow\max(0.1,\ \varepsilon-0.005)$

Output: ${\hat{\mathbf{x}}_{pos}(t),\mathbf{\Psi}_{pos}(t),\boldsymbol{a}(t),\boldsymbol{b}(t),\varepsilon,\Theta_{tar},\boldsymbol{o}(t),\eta,\jmath}$

Now that both $r_{p}(t)$ and $p$ are available, we store ${\{\boldsymbol{o}(t-1),p,r_{p}(t),\boldsymbol{o}(t)\}}$ as $\operatorname{\textsc{Tuple}}_{\jmath+1}$, i.e., the ${(\jmath+1)^{th}}$ tuple, in the finite memory $\mathcal{E}$ and increase $\jmath$ by $1$. Here, $\jmath$ represents the number of tuples available in $\mathcal{E}$. If $\mathcal{E}$ is full, we remove $\operatorname{\textsc{Tuple}}_{B}$ from $\mathcal{E}$ and decrease $\jmath$ by $1$ before storing the new tuple. Following this, we update the target network weights, denoted as $\Theta_{tar}$, by setting ${\Theta_{tar}=\Theta_{onl}}$ if the counter $\eta$ has reached its threshold value, herein set to $20$.

Next, the training of the online network begins by sampling a mini-batch $\mathcal{B}$ of size $B$ from $\mathcal{E}$. Then, we provide ${\operatorname{\textsc{Tuple}}_{j,4},\forall j\in\{1,\cdots,B\}}$, i.e., the fourth element of ${\operatorname{\textsc{Tuple}}_{j}\in\mathcal{B}}$, as input to the target network and obtain its output ${\vec{\boldsymbol{q}}_{j}=\{\vec{q}_{j,i}|\forall i\in\mathcal{A}\},}{\forall j\in\{1,\cdots,B\}}$. We then use the outputs of the target network to determine the target values for $\mathcal{B}$ as

$\displaystyle\bar{\bar{q}}_{j}=\operatorname{\textsc{Tuple}}_{j,3}+\gamma\underset{i\in\mathcal{A}}{\max}\ \vec{q}_{j,i},\quad\forall j\in\{1,\cdots,B\}.$	(12)

Note that ${\bar{\bar{q}}_{j},\forall j\in\{1,\cdots,B\}}$, is an estimate of $R(\pi)$. Thereupon, we provide ${\operatorname{\textsc{Tuple}}_{j,1},\forall j\in\{1,\cdots,B\}}$, as input to the online network. The corresponding target values ${\bar{\bar{q}}_{j},\forall j\in\{1,\cdots,B\}}$, serve as labels for updating $\Theta_{onl}$ by minimizing $\Omega$ using the RMSProp optimizer, where

$\displaystyle\Omega=\frac{1}{B}\sum_{j=1}^{B}\big[\bar{\bar{q}}_{j}-\hat{q}_{\operatorname{\textsc{Tuple}}_{j,2}}(\operatorname{\textsc{Tuple}}_{j,1})\big]^{2}.$	(13)
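The target values in (12) and the loss in (13) can be illustrated with the following minimal Python sketch. It is not the authors' code: `target_net` and `online_net` are hypothetical callables that map an observation to a list of action values, and each stored tuple is assumed to be ordered as (previous observation, action index, reward, current observation), following the order in which the tuple elements are listed above.

```python
def target_values(batch, target_net, gamma):
    """Eq. (12): reward plus the discounted maximum target-network action value."""
    return [r + gamma * max(target_net(o_next)) for (_, _, r, o_next) in batch]


def dqn_loss(batch, targets, online_net):
    """Eq. (13): mean squared gap between each target value and the online
    network's action value for the action stored in the same tuple."""
    errors = [
        (q_bar - online_net(o_prev)[p]) ** 2
        for (o_prev, p, _, _), q_bar in zip(batch, targets)
    ]
    return sum(errors) / len(errors)
```

Only the action value of the stored action $\operatorname{\textsc{Tuple}}_{j,2}$ enters the loss, so the other outputs of the online network are left unconstrained by that tuple.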

To deal with the exploding gradient problem during the training of the online network, we perform gradient-norm clipping [14]. This involves clipping the gradient vector $\nabla_{\Theta_{onl}}\Omega$ as

$\displaystyle\boldsymbol{\chi}=\frac{\delta\ \nabla_{\Theta_{onl}}\Omega}{\max({\|\nabla_{\Theta_{onl}}\Omega\|}_{2},\delta)}.$	(14)

Herein, $\delta$ represents the threshold value for ${\|\nabla_{\Theta_{onl}}\Omega\|}_{2}$ and the vector $\boldsymbol{\chi}$ stores the clipped gradients. Finally, to emphasize exploitation over exploration in the $\epsilon$-greedy method, it is necessary to gradually decrease $\varepsilon$. Thus, we reduce $\varepsilon$ by $0.005$ at every time step until it reaches $0.1$.
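As a minimal sketch of the clipping rule in (14) and of the $\varepsilon$ decay in step 26 of Algorithm 6, consider the Python functions below. The gradient is represented as a flat list of floats, and the names `clip_by_norm` and `decay_epsilon` are assumptions made for this illustration; they do not refer to any library API.

```python
import math


def clip_by_norm(grad, delta):
    """Eq. (14): rescale grad so that its Euclidean norm never exceeds delta > 0."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = delta / max(norm, delta)  # equals 1 whenever norm <= delta
    return [scale * g for g in grad]


def decay_epsilon(epsilon, step=0.005, floor=0.1):
    """Step 26 of Algorithm 6: lower epsilon by `step`, but never below `floor`."""
    return max(floor, epsilon - step)
```

Because the denominator in (14) is at least $\delta$, gradients whose norm is already below the threshold pass through unchanged, while larger gradients keep their direction and are rescaled to norm $\delta$.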

Algorithm 7 $\operatorname{\textsc{Reward}}$

Input: ${\mathscr{C},S,\hat{\mathbf{x}}_{pos}(t),\mathbf{\Psi}_{pos}(t),\boldsymbol{\tau},p,\{\alpha_{c},\forall c\in\mathscr{C}\}}$

1. if Query has been asked at $t$ then
2.     for every $c$ that asked a query do
3.         Draw $\mathbf{x}_{s}$ from ${\mathcal{N}(\hat{\mathbf{x}}_{pos}(t),\mathbf{\Psi}_{pos}(t)),\forall s\in\{1,\cdots,S\}}$
4.         ${u_{s}=z_{c}(\mathbf{x}_{s}),\forall s\in\{1,\cdots,S\}}$  $\triangleright$ $\boldsymbol{u}=[u_{1},\cdots,u_{S}]^{T}$
5.         $\operatorname{MSE}_{c}(t)=\operatorname{\textsc{Var}}(\boldsymbol{u})$  $\triangleright$ Sample variance
6.     end for
7.     ${r_{p}(t)=-\sum_{c\in\mathscr{C}}\alpha_{c}\operatorname{MSE}_{c}(t)\mathbbm{1}(\tau_{c}==0)}$  $\triangleright$ Reward
8. else
9.     ${r_{p}(t)=-\mu^{\mathbbm{1}(p==0)}\operatorname{Tr}(\mathbf{\Psi}_{pos}(t))}$  $\triangleright$ Reward
10. end if

Output: $r_{p}(t)$
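The following Python sketch mirrors the two branches of Algorithm 7 using only the standard library. It is a hedged illustration rather than the authors' implementation: the helper names, the dictionary-based interface, and the convention that ${\tau_{c}==0}$ marks a client asking a query at $t$ are assumptions made here, and the covariance is assumed to be positive definite so that a Cholesky factor exists.

```python
import math
import random
import statistics


def cholesky(a):
    """Lower-triangular L with a = L L^T (a is assumed positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gaussian(mean, low):
    """One draw from N(mean, L L^T) via x = mean + L z with z ~ N(0, I)."""
    z = [random.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def query_mse(z_c, x_pos, psi_pos, num_samples):
    """Steps 3-5: sample variance of the query responses over S draws."""
    low = cholesky(psi_pos)
    u = [z_c(sample_gaussian(x_pos, low)) for _ in range(num_samples)]
    return statistics.variance(u)


def reward(queries, alpha, tau, x_pos, psi_pos, p, mu, num_samples):
    """queries: dict client -> query function z_c (hypothetical interface);
    tau[c] == 0 is assumed to mean that client c asks a query at t."""
    asking = [c for c in queries if tau[c] == 0]
    if asking:  # step 7: weighted sum of the query MSEs
        return -sum(alpha[c] * query_mse(queries[c], x_pos, psi_pos, num_samples)
                    for c in asking)
    trace = sum(psi_pos[i][i] for i in range(len(psi_pos)))
    return -(mu if p == 0 else 1.0) * trace  # step 9: mu applies only to action-0
```

In the no-query branch, choosing action-0 multiplies the penalty by $\mu$, so for $\mu<1$ staying silent is penalized less than polling a sensor with the same posterior covariance.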

IV Benchmark Schedulers

IV-A Monte Carlo scheduler

The Monte Carlo scheduler, described in detail in Algorithm 8, is adopted as a benchmark because it can handle any query type. For a given client ${c\in\mathscr{C}}$, Algorithm 8 operates as follows. It first computes the prior estimates. Then, for sensor $n$, it iteratively draws $S$ distinct Gaussian samples in step 11, each from a posterior estimate computed either in step 7 or in step 9, depending on the inequality in step 5. The $S$ Gaussian samples are then used to compute $S$ distinct query responses in step 12, which are stored in ${\boldsymbol{u}}$. Next, in step 14, ${\operatorname{\textsc{Var}}(\boldsymbol{u})}$ is computed and stored in ${\boldsymbol{\nu}}$. Here, ${\operatorname{\textsc{Var}}(\boldsymbol{u})}$ represents the ${\operatorname{MSE}_{c}(t)}$ expected if sensor $n$ is polled. The procedure from step 3 to step 14 is repeated $N$ times in total to calculate ${\operatorname{\textsc{Var}}(\boldsymbol{u})}$ for every sensor. In step 16, the sensor whose index corresponds to the minimum element of ${\boldsymbol{\nu}}$ is polled. Finally, to compute the actual ${\operatorname{MSE}_{c}(t)}$ in step 18, Algorithm 8 recomputes the posterior estimates using the observation received from the polled sensor.

The Monte Carlo scheduler comes with limitations. Unlike the proposed CQKF-cum-DRL-based scheduler, it requires designing $C$ separate Monte Carlo schedulers for $C$ clients. Moreover, the Monte Carlo scheduler does not take into account the information related to the query requests while polling a sensor. It simply polls a sensor whenever a query is asked.

Note that the utilization of CQKF necessitates modifications to the original Monte Carlo scheduler available in [5]. Specifically, we have modified the procedure by relocating the computation of prior estimates, moving it outside the for loops present in steps 2 and 3. This alteration is due to the use of Holt’s method, whose smoothing parameters ${\boldsymbol{a}(t)\ \textrm{and}\ \boldsymbol{b}(t)}$ should only be updated once per time step.

Algorithm 8 Monte Carlo Scheduler for Client ${c\in\mathscr{C}}$

Input: ${\hat{\mathbf{x}}_{pos}(t-1),\mathbf{\Psi}_{pos}(t-1),\boldsymbol{a}(t-1),\boldsymbol{b}(t-1)}$

1. Compute ${\{\hat{\mathbf{x}}_{pri}(t),\mathbf{\Psi}_{pri}(t),\boldsymbol{a}(t),\boldsymbol{b}(t),\mathbf{Z}^{*}(t-1)\}}$ using step 1 of Algorithm 2
2. for $n\in\{1,\cdots,N\}$
3.     for $s\in\{1,\cdots,S\}$
4.         Draw $\theta$ from $\mathcal{U}(0,1)$
5.         if ${\theta\geq 0.02\lceil{\frac{n-1}{10}}\rceil}$ then
6.             Draw $y$ from ${\mathcal{N}(\mathbf{1}_{n}^{T}\hat{\mathbf{x}},\mathbf{1}_{n}^{T}\mathbf{\Psi}\mathbf{1}_{n})}$
7.             ${\hat{\mathbf{x}},\mathbf{\Psi}\leftarrow\operatorname{\textsc{UpdateStep}}(\hat{\mathbf{x}}_{pri}(t),\mathbf{\Psi}_{pri}(t),\mathbf{Z}^{*}(t-1),\mathbf{\Sigma}_{v_{2}},\mathbf{H},\mathbf{w},\boldsymbol{\Xi},n)}$
8.         else
9.             ${\{\hat{\mathbf{x}},\mathbf{\Psi}\}=\{\hat{\mathbf{x}}_{pri}(t),\mathbf{\Psi}_{pri}(t)\}}$
10.         end if
11.         ${\mathbf{x}_{s}=\mathcal{N}(\hat{\mathbf{x}},\mathbf{\Psi})}$
12.         ${u_{s}=z_{c}(\mathbf{x}_{s})}$
13.     end for
14.     ${\nu_{n}=\operatorname{\textsc{Var}}(\boldsymbol{u})}$  $\triangleright$ Sample variance
15. end for
16. ${p=\operatorname{argmin}_{n\in\{1,\cdots,N\}}\boldsymbol{\nu}}$  $\triangleright$ ${\boldsymbol{\nu}=[\nu_{1},\cdots,\nu_{N}]^{T}}$
17. Compute ${\{\hat{\mathbf{x}}_{pos}(t),\mathbf{\Psi}_{pos}(t)\}}$ using steps 2-7 of Algorithm 2
18. Compute ${\operatorname{MSE}_{c}(t)}$ using steps 3-5 of Algorithm 7

Output: ${\hat{\mathbf{x}}_{pos}(t),\mathbf{\Psi}_{pos}(t),\boldsymbol{a}(t),\boldsymbol{b}(t)}$
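The sensor-selection loop in steps 2-16 of Algorithm 8 can be sketched in Python as follows. This is a simplified illustration under explicit assumptions: the CQKF update of steps 6-7 is not reproduced, and the two callables `sample_prior` and `sample_posterior` are hypothetical stand-ins that return one state sample from the prior estimate and from the simulated posterior estimate for sensor $n$, respectively.

```python
import math
import random
import statistics


def monte_carlo_select(num_sensors, num_samples, z_c, sample_prior, sample_posterior):
    """Return (p, nu): the 1-based index of the sensor to poll and the list of
    per-sensor sample variances of the query response."""
    nu = []
    for n in range(1, num_sensors + 1):
        threshold = 0.02 * math.ceil((n - 1) / 10)  # right-hand side of step 5
        u = []
        for _ in range(num_samples):
            theta = random.random()  # theta drawn from U(0, 1)
            if theta >= threshold:
                x_s = sample_posterior(n)  # stands in for steps 6-7 and 11
            else:
                x_s = sample_prior()  # stands in for steps 9 and 11
            u.append(z_c(x_s))
        nu.append(statistics.variance(u))  # step 14
    p = min(range(num_sensors), key=lambda i: nu[i]) + 1  # step 16
    return p, nu
```

The sketch makes the cost structure visible: the inner body runs $NS$ times per scheduling decision, which is the source of the $NS$ factors in the complexity expressions of Section V.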

IV-B Benchmark DRL Scheduler

Our second benchmark scheduler adopts the action space, POMDP state/observation space, reward function, and online and target network architecture utilized by the scheduler in [6]. The benchmark DRL scheduler works in the same way as described in Algorithm 6, except for the following changes:

• Change $\mathcal{A}$ to ${\{1,\cdots,N\}}$, indicating that the edge node must poll a sensor at every time step.
• In Algorithm 6, provide ${\boldsymbol{o}(t)=(\hat{\mathbf{x}}_{pri}(t),\mathbf{\Psi}_{pri}(t),\boldsymbol{\tau}(t))}$, with ${\mathcal{O}=\mathbb{R}^{M+M^{2}}\times\mathbb{N}^{C}}$, as input to the online network. Here, ${\{\hat{\mathbf{x}}_{pri}(t),\mathbf{\Psi}_{pri}(t)\}}$ indicates the complete state of CQKF after the prediction step.
• Change step 9 of Algorithm 7 to ${r_{p}(t)=0}$, indicating a zero reward when no query is posed at $t$.
• Change the online and target network architecture by increasing the number of hidden layers to three, having ${\{2.5M,M,N\}}$ neurons and dropout probabilities of ${\{0.1,0.1,0\}}$, respectively.

Thus, the distinctive features that set apart the benchmark DRL scheduler from the proposed scheduler are its action space, observation space, reward function, and DNN architecture.

V Complexity of the Considered Schedulers

TABLE III: Complexity of Fundamental Operations

Operations	Complexity	Operations	Complexity
$\operatorname{\textsc{Chol}}(\mathbf{\Psi}_{pri}(t))$	$M^{3}/3$	ReLU	$1$
$\mathcal{N}(\mathbf{1}_{i}^{T}\hat{\mathbf{x}},\mathbf{1}_{i}^{T}\mathbf{\Psi}\mathbf{1}_{i})$	$1$	$\mathbf{\Psi}_{yy}(t)^{-1}$	$M^{3}$
$\operatorname{argmin}_{n\in\{1,\cdots,N\}}\nu_{n}$	$N$	$\mathcal{N}(\hat{\mathbf{x}},\mathbf{\Psi})$	$M$
Inequality	$1$	$\operatorname{\textsc{Var}}(\boldsymbol{u})$	$S$
Draw $\theta$ from $\mathcal{U}(0,1)$	$1$	$z_{c}(\mathbf{x}_{s})$	$1$

Herein, we quantify the computational complexity of our considered schedulers in terms of the number of arithmetic operations they perform. Table III presents the complexity expressions for fundamental operations utilized in the algorithms. Note that the complexity expressions for our considered schedulers pertain specifically to the complexity associated with making a scheduling decision at a single time step.

Notice that, because of the random branch in step 5 of Algorithm 8, deriving an exact expression for the complexity of the Monte Carlo scheduler is not feasible. However, we can derive expressions for both the lower and upper bound of its complexity. The lower bound corresponds to the case in which the inequality in step 5 of Algorithm 8 is never satisfied. Conversely, the upper bound corresponds to the case in which this inequality is always satisfied. The lower and upper bound complexity expressions are given by


	$\displaystyle\begin{split}\vartheta_{1,lb}=&\ \frac{M^{3}}{3}+8M^{3}n^{\prime}+22M^{2}n^{\prime}+4M^{2}+12Mn^{\prime}\\ &+NS(4+M)+N,\end{split}$			(15a)
	$\displaystyle\begin{split}\vartheta_{1,ub}=&\ \vartheta_{1,lb}+NS\Bigl(\frac{22M^{3}}{3}+16M^{3}n^{\prime}+10M^{2}n^{\prime}\\ &+8M^{2}+M+3\Bigr),\end{split}$			(15b)

respectively. By taking into account the dominant terms in (15a) and (15b), the final lower and upper bound complexity expressions for the Monte Carlo scheduler, in terms of big-O notation, are given by


$\displaystyle\vartheta_{1,lb}$	$\displaystyle=O(8M^{3}n^{\prime}+NSM),$	(16a)
$\displaystyle\vartheta_{1,ub}$	$\displaystyle=O(NSM^{3}n^{\prime}).$	(16b)

The complexity of the proposed scheduler is the sum of the complexities of three distinct phases: the action value generation phase, the action selection phase, and the training phase. The complexity expressions for the first and third phases are ${\hat{\vartheta}_{1}=\sum_{i=1}^{|\boldsymbol{l}|-1}l_{i+1}(2l_{i}+1)}$ and ${\hat{\vartheta}_{3}=B\hat{\vartheta}_{1}}$, respectively, as derived in [6]. Here, ${\boldsymbol{l}=[l_{1},\cdots,l_{|\boldsymbol{l}|}]^{T}}$ with ${l_{1}=|\boldsymbol{o}(t)|}$ and ${l_{|\boldsymbol{l}|}=|\mathcal{A}|}$, while the remaining elements of ${\boldsymbol{l}}$ are the hidden layer sizes. Moreover, because of steps 4-7 of Algorithm 6, the complexity of the second phase falls within the range ${[3,(2+|\mathcal{A}|)]}$. In the case of the proposed scheduler, ${\boldsymbol{l}=[(C+1),4,(N+1)]^{T}}$. Thus, the lower and upper bound complexity expressions for the proposed scheduler are


	$\displaystyle\begin{split}\vartheta_{2,lb}=&\ \hat{\vartheta}_{1}+\hat{\vartheta}_{3}+3,\\ =&\ (B+1)\hat{\vartheta}_{1}+3,\\ =&\ (30N+31)(8C+9N+21)+3,\end{split}$			(17a)
	$\displaystyle\begin{split}\vartheta_{2,ub}=&\ \hat{\vartheta}_{1}+\hat{\vartheta}_{3}+2+(N+1),\\ =&\ \vartheta_{2,lb}+N,\end{split}$			(17b)

respectively. By taking into account the dominant terms in (17a) and (17b), the final complexity expression for the proposed scheduler is given by

$\displaystyle\vartheta_{2}$	$\displaystyle=O(9N^{2}+8CN).$	(18)

TABLE IV: Complexities for Various System Parameter Configurations

$\{N,M,C,S,n^{\prime}\}$	Proposed Scheduler	Benchmark DRL Scheduler	Monte Carlo Scheduler
$\{20,20,2,100,2\}$	$[136930, 136950]$	$[27591913, 27591932]$	$[198367, 651977700]$
$\{30,20,2,100,2\}$	$[285820, 285850]$	$[42644333, 42644362]$	$[222377, 977891377]$
$\{20,30,2,100,2\}$	$[136930, 136950]$	$[59090323, 59090342]$	$[552940, 2175018940]$
$\{20,20,8,100,2\}$	$[167218, 167238]$	$[27952513, 27952532]$	$[198367, 651977700]$

As mentioned in subsection IV-B, the benchmark DRL scheduler works in the same way as the proposed scheduler. Thus, the general complexity expressions derived for the proposed scheduler also apply to the benchmark DRL scheduler. However, this time ${\boldsymbol{l}=[(M+M^{2}+C),2.5M,M,N,N]^{T}}$ and ${|\mathcal{A}|=N}$. Thus, the lower and upper bound complexity expressions for the benchmark DRL scheduler are


	$\displaystyle\begin{split}\vartheta_{3,lb}=&\ (30N+1)(5M^{3}+10M^{2}+5MC+3.5M\\ &+2NM+2N^{2}+2N)+3,\end{split}$			(19a)
	$\displaystyle\vartheta_{3,ub}=\vartheta_{3,lb}+N-1,$			(19b)

respectively. By taking into account the dominant terms in (19a) and (19b), the final complexity expression for the benchmark DRL scheduler is given by

$\displaystyle\vartheta_{3}$	$\displaystyle=O(5M^{3}N+5MCN+2MN^{2}+2N^{3}).$	(20)

From (16), (18), and (20), we observe that the complexity of the proposed scheduler grows quadratically with $N$ and is independent of $M$, whereas the complexities of the benchmark schedulers grow cubically with $M$. Moreover, based on (15a), (15b), (17a), (17b), (19a), and (19b), Table IV lists the complexity ranges of the considered schedulers for various system parameter configurations. As can be seen in Table IV, both the lower and upper bound complexities of the proposed scheduler remain on the order of $10^{5}$ for all the system parameter configurations. Specifically, the upper bound complexity of the proposed scheduler is at least two orders of magnitude lower than that of either benchmark scheduler. Furthermore, this low complexity renders it suitable for implementation on an embedded processor-based edge node.
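The bounds in (15a), (15b), (17a), (17b), (19a), and (19b) are straightforward to evaluate numerically, as the short Python sketch below shows. The function names are introduced here for illustration only. The mini-batch size ${B=30|\mathcal{A}|}$ is not stated explicitly in this section; it is inferred from the factors ${(30N+31)}$ in (17a) and ${(30N+1)}$ in (19a) and should be read as an assumption.

```python
def dnn_forward_cost(layers):
    """Sum over i of l_{i+1} (2 l_i + 1) for the layer sizes l."""
    return sum(nxt * (2 * cur + 1) for cur, nxt in zip(layers, layers[1:]))


def drl_bounds(layers, num_actions, batch):
    """Lower and upper bounds of a DRL scheduler: (B + 1) forward costs plus an
    action selection cost between 3 and 2 + |A|."""
    cost = (batch + 1) * dnn_forward_cost(layers)
    return cost + 3, cost + 2 + num_actions


def proposed_bounds(N, C):
    """Eqs. (17a)-(17b), with |A| = N + 1 and B = 30 |A| (inferred)."""
    return drl_bounds([C + 1, 4, N + 1], N + 1, 30 * (N + 1))


def benchmark_drl_bounds(N, M, C):
    """Eqs. (19a)-(19b), with |A| = N and B = 30 |A| (inferred)."""
    return drl_bounds([M + M * M + C, 2.5 * M, M, N, N], N, 30 * N)


def monte_carlo_bounds(N, M, S, n_prime):
    """Eqs. (15a)-(15b)."""
    lower = (M**3 / 3 + 8 * M**3 * n_prime + 22 * M**2 * n_prime
             + 4 * M**2 + 12 * M * n_prime + N * S * (4 + M) + N)
    upper = lower + N * S * (22 * M**3 / 3 + 16 * M**3 * n_prime
                             + 10 * M**2 * n_prime + 8 * M**2 + M + 3)
    return lower, upper
```

For the configuration ${\{N,M,C,S,n^{\prime}\}=\{20,20,2,100,2\}}$, calling `proposed_bounds(20, 2)`, `benchmark_drl_bounds(20, 20, 2)`, and `monte_carlo_bounds(20, 20, 100, 2)` and rounding the last result to integers reproduces the first row of Table IV.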

VI Results

TABLE V: Parameters Used in Simulations

Parameters	Values
$\mathbf{\Sigma}_{v_{1}}$	$2.5\times 10^{-3}\mathbf{I}_{M}$
$\mathbf{\Sigma}_{v_{2}}$	$\mathbf{I}_{N}$
$\mathbf{H}$	$\mathbf{I}_{M}$
$N,M$	$20$
$n^{\prime}$	$2$
$S$	$100$
$\alpha_{c},\forall c\in\mathscr{C}$	$1$
$\eta$	$0$ (initial value)
$\boldsymbol{a}(0),\boldsymbol{b}(0)$	$\mathbf{0}_{M}$ (initial value)
$\hat{\mathbf{x}}_{pos}(0)$	$\mathbf{0}_{M}$ (initial value)
$\mathbf{\Psi}_{pos}(0)$	$\mathbf{I}_{M}$ (initial value)

	NLSD (21a)	NLSD (21b)
$\{\varpi,\varsigma\}$	$\{0.77,0.02\}$	$\{0.75,0.025\}$
$[\fgee,\fges]$	$[-0.5,-0.2]$	$[-0.2,-0.1]$

Our simulations consider the following two NLSD functions


$\displaystyle\mathbf{f}(\mathbf{x}(t))$	$\displaystyle=\mathbf{x}(t)+0.05\mathbf{x}(t)\odot(\mathbf{1}_{M}-\mathbf{x}(t)\odot\mathbf{x}(t)),$	(21a)
$\displaystyle\mathbf{f}(\mathbf{x}(t))$	$\displaystyle=\mathbf{x}(t)\odot\operatorname{\textsc{Roll}}(\mathbf{x}(t)),$	(21b)

where ${\operatorname{\textsc{Roll}}(\mathbf{x}(t))=[x_{2},\cdots,x_{M},x_{1}]^{T}}$, and $\odot$ signifies the element-wise product. Notably, (21a) and (21b) lead to NLDSs with non-correlated and correlated states, respectively. Furthermore, we model the query process at the client side using periodic and memoryless MCs, depicted in Fig. 3. Herein, a client generates a query when its corresponding MC reaches state A. Table VI provides information about the clients and the queries they ask for the case of $C=2$. Note that the parameter $\mathfrak{C}$ in Table VI indexes the MC combinations possible at the client side.
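For concreteness, the two NLSD functions (21a) and (21b) can be written in plain Python as below. This is a noise-free sketch operating on lists of floats; the function names `nlsd_a`, `nlsd_b`, and `roll` are chosen here for illustration, and the process noise of the system model is deliberately omitted.

```python
def nlsd_a(x):
    """Eq. (21a): x + 0.05 * x ⊙ (1 - x ⊙ x), applied element-wise."""
    return [xi + 0.05 * xi * (1.0 - xi * xi) for xi in x]


def roll(x):
    """Roll(x) = [x_2, ..., x_M, x_1]^T: cyclic shift by one position."""
    return x[1:] + x[:1]


def nlsd_b(x):
    """Eq. (21b): x ⊙ Roll(x), so each state is multiplied by its cyclic neighbour."""
    return [xi * xj for xi, xj in zip(x, roll(x))]
```

In `nlsd_a` every state component evolves on its own, whereas in `nlsd_b` the next value of each component depends on its cyclic neighbour, which is the structural reason why (21b) produces correlated states.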

TABLE VI: Information about the Clients when $C=2$

Parameters	Client-1	Client-2
$\mathfrak{C}=1$	Periodic MC, Initial MC state: D	Periodic MC, Initial MC state: B
$\mathfrak{C}=2$	Memoryless MC	Memoryless MC
$\mathfrak{C}=3$	Memoryless MC	Periodic MC
Query asked	Maximum query	Count range query

The schedulers are evaluated over a duration of $4000$ time steps using the ${\operatorname{MSE}_{c}(t),\forall c\in\mathscr{C}}$, and action selection frequency (ASF) metrics. We treat the first $2000$ time steps as a warm-up period for Holt’s method. Consequently, any actions selected and ${\operatorname{MSE}_{c}(t)}$ values, ${\forall c\in\mathscr{C}}$, computed during the warm-up period are discarded.
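A minimal Python sketch of this post-processing is given below. It assumes that the ASF of an action is its relative selection frequency over the time steps retained after the warm-up period, and that action-0 means no sensor is polled; the function names and the list `actions`, holding one selected action index per time step, are hypothetical.

```python
from collections import Counter


def action_selection_frequency(actions, num_actions, warm_up):
    """Relative frequency of each action over the post-warm-up time steps."""
    kept = actions[warm_up:]  # discard the warm-up period
    counts = Counter(kept)
    return [counts[a] / len(kept) for a in range(num_actions)]


def sensor_transmissions(actions, warm_up):
    """Number of post-warm-up time steps in which a sensor was polled."""
    return sum(1 for a in actions[warm_up:] if a != 0)
```

Under this convention, a scheduler that must poll a sensor at every time step produces exactly $2000$ transmissions over the $2000$ retained time steps.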

TABLE VII: Number of Sensor Transmissions

Proposed Scheduler
$\mathfrak{C}$	$\mu=0.1$, NLSD {(21a),(21b)}	$\mu=0.01$, NLSD {(21a),(21b)}	$\mu=1$, NLSD {(21a),(21b)}
1	$\{190,199\}$	$\{193,195\}$	$\{1947,1983\}$
2	$\{192,200\}$	$\{186,175\}$	$\{1981,1995\}$
3	$\{169,203\}$	$\{182,191\}$	$\{1987,1971\}$

$\mathfrak{C}$	Benchmark DRL Scheduler	Monte Carlo Scheduler
1	2000	667
2	2000	682
3	2000	658

Considering NLSD (21b), the bar plots in Fig. 4 reveal that action-0 is the action most frequently selected by the proposed scheduler, while the ASFs of all its remaining actions are below $10^{-2}$. This dominance of action-0 stems from the reward ${-0.1\operatorname{Tr}(\mathbf{\Psi}_{pos}(t))}$, which incentivizes the proposed scheduler to opt for action-0 in the absence of queries. Opting for action-0 also minimizes sensor transmissions and consequently saves sensor energy. Meanwhile, the Monte Carlo scheduler predominantly selects action-1 across all ${\mathfrak{C}}$, resulting in substantial energy depletion at sensor-1. With the benchmark DRL scheduler, the ASFs of most of the actions are below $10^{-1}$ across all ${\mathfrak{C}}$. However, the ASFs obtained with the benchmark schedulers are still higher than those obtained with the proposed scheduler. Furthermore, the proposed scheduler requires the lowest number of sensor transmissions for every ${\mathfrak{C}}$, as evidenced by Table VII. Consequently, sensor energy depletion is higher with the benchmark schedulers than with the proposed scheduler.

As illustrated by the box plots in Fig. 5, the benchmark schedulers obtain a lower ${\operatorname{MSE}_{c}(t)}$ of the maximum query response than the proposed scheduler. However, note that the ${\operatorname{MSE}_{c}(t)}$ values for all three schedulers are on the order of $10^{-2}$. Thus, the disparity in the ${\operatorname{MSE}_{c}(t)}$ of the maximum query response between the proposed scheduler and the benchmark schedulers is marginal.

Considering NLSD (21b), the box plots in Fig. 6 reveal that the proposed scheduler achieves a lower ${\operatorname{MSE}_{c}(t)}$ of the count range query response than the benchmark schedulers across all ${\mathfrak{C}}$. Meanwhile, in the case of NLSD (21a), the proposed and benchmark schedulers obtain similar ${\operatorname{MSE}_{c}(t)}$.

Furthermore, as illustrated in Figs. 5 and 6, the proposed scheduler compares more favorably with the benchmark schedulers for the count range query than for the maximum query. This disparity arises because the ${\operatorname{MSE}_{c}(t)}$ of the maximum query response is notably more susceptible to outliers within the data gathered in ${\boldsymbol{u}}$ in steps 3-4 of Algorithm 7. Consequently, the ${\operatorname{MSE}_{c}(t)}$ of the maximum query response, computed in step 5 of Algorithm 7, typically fails to offer accurate insights into the central tendency of the collected data. Therefore, obtaining a satisfactory ${\operatorname{MSE}_{c}(t)}$ of the maximum query response with the proposed scheduler necessitates a higher value of $\mu$. Fig. 7 supports this claim, as increasing $\mu$ from $0.1$ to $1$ reduces the ${\operatorname{MSE}_{c}(t)}$ of the maximum query response obtained with the proposed scheduler. An increment in $\mu$ leads to an increase in the number of sensor transmissions, which, in turn, improves the accuracy of the posterior estimates and reduces the number of outliers within ${\boldsymbol{u}}$. The drawback is the additional sensor transmissions themselves: Table VII shows that increasing $\mu$ from $0.1$ to $1$ raises the number of sensor transmissions roughly tenfold, from about $200$ to nearly $2000$.

Based on the preceding discussion, it is apparent that the proposed scheduler either reduces ${\operatorname{MSE}_{c}(t)}$ or obtains a comparable ${\operatorname{MSE}_{c}(t)}$ relative to the benchmark schedulers. Furthermore, the proposed scheduler accomplishes this while reducing the number of sensor transmissions. The key to the satisfactory performance of the proposed scheduler lies in its input. Instead of feeding the complete prior state of CQKF, i.e., ${(\hat{\mathbf{x}}_{pri}(t),\mathbf{\Psi}_{pri}(t))}$, as input to the DRL scheduler, we provide a specific attribute of the prior state of CQKF, which is ${\operatorname{Tr}(\mathbf{\Psi}_{pri}(t))}$. As mentioned in Section III-C, ${\operatorname{Tr}(\mathbf{\Psi}_{pri}(t))}$ reflects the mean square error in the prior estimate. By using ${\operatorname{Tr}(\mathbf{\Psi}_{pri}(t))}$ as input, the DRL scheduler focuses solely on selecting the most fruitful action, which later minimizes ${\operatorname{MSE}_{c}(t)}$. In contrast, providing the complete prior state of CQKF as input, as done with the benchmark DRL scheduler, burdens the DRL scheduler with the extra workload of extracting the valuable information from its input.

Meanwhile, relieving the DRL scheduler of the aforementioned extra workload positively impacts its ability to leverage correlation among NLDS states. In Fig. 6, for NLSD (21b), the proposed scheduler capitalizes on the correlation among NLDS states better than the benchmark schedulers. Better exploitation of correlation implies that the proposed scheduler possesses superior insights about the most fruitful sensor at the time of sensor polling. This, in turn, yields posterior estimates that are better than the ones obtained with the benchmark schedulers. Consequently, this leads to a decline in its ${\operatorname{MSE}_{c}(t)}$ of the count range query response, relative to the benchmark schedulers. However, in the case of NLSD (21a), no such correlation among NLDS states is available for the proposed scheduler to exploit, so its ${\operatorname{MSE}_{c}(t)}$ of the count range query response is similar to that of the benchmark schedulers.

Moreover, because of this extra workload, the benchmark DRL scheduler requires a more complex DNN architecture, featuring three hidden layers with ${\{2.5M,M,N\}}$ neurons. In contrast, the DNN architecture of the proposed scheduler comprises just one hidden layer with four neurons. This streamlined architecture is another advantage of utilizing ${\operatorname{Tr}(\mathbf{\Psi}_{pri}(t))}$ as input.

Fig. 8 considers the scenario where, alongside the maximum and count range queries, two additional queries, sample mean and variance, are posed to the edge node by two additional clients. Note that there is a negligible disparity between the ${\operatorname{MSE}_{c}(t)}$ of the query responses obtained with the proposed scheduler and the benchmark schedulers for the maximum, sample mean, and variance queries. Besides, Fig. 8 shows that the proposed scheduler leads to a decline in the ${\operatorname{MSE}_{c}(t)}$ of the count range query response, relative to the benchmark schedulers, for NLSD (21a). Meanwhile, in the case of NLSD (21b), the ${\operatorname{MSE}_{c}(t)}$ of the count range query response is similar for all three schedulers. Finally, even with an increase in the number of clients, the performance of the proposed scheduler does not degrade relative to the benchmark schedulers.

VII Conclusion

This paper introduced a GoS method tailored for IoT sensors tasked with sensing an NLDS. The reporting operation is scheduled by the edge node, and the phrase goal-oriented in GoS emphasizes its primary objective, which is to accurately respond to client queries regarding the NLDS state. Through GoS, the edge node gathers partial yet insightful sensor observations to advance towards its objective. These observations, along with a state estimator, are used to estimate the complete NLDS state, which is later employed to generate query responses. Notably, our state estimator operates effectively without requiring a mathematical model of the NLDS. Moreover, our findings showed that the proposed GoS yields energy-efficient state observation from the sensor perspective.

Our work here considers only a single RL agent due to the centralized nature of the scheduling. A promising avenue for future research would be to adapt the proposed goal-oriented sensor scheduling framework to a multi-agent RL system, such as an unmanned aerial vehicle swarm where each RL agent acts as a sensor scheduler.

References

[1] O. L. A. López, O. M. Rosabal, D. E. Ruiz-Guirola, P. Raghuwanshi, K. Mikhaylov, L. Lovén, and S. Iyer, “Energy-sustainable IoT connectivity: Vision, technological enablers, challenges, and future directions,” IEEE Open Journal of the Communications Society, vol. 4, pp. 2609–2666, 2023.
[2] P. Di Lorenzo, M. Merluzzi, F. Binucci, C. Battiloro, P. Banelli, E. C. Strinati, and S. Barbarossa, “Goal-oriented communications for the IoT: System design and adaptive resource optimization,” IEEE Internet of Things Magazine, vol. 6, no. 4, pp. 26–32, 2023.
[3] C. Zhang, H. Zou, S. Lasaulce, W. Saad, M. Kountouris, and M. Bennis, “Goal-oriented communications for the IoT and application to data compression,” IEEE Internet of Things Magazine, vol. 5, no. 4, pp. 58–63, 2022.
[4] A. Hashemi, M. Ghasemi, H. Vikalo, and U. Topcu, “Randomized greedy sensor selection: Leveraging weak submodularity,” IEEE Transactions on Automatic Control, vol. 66, no. 1, pp. 199–212, 2021.
[5] F. Chiariotti, A. E. Kalør, J. Holm, B. Soret, and P. Popovski, “Scheduling of sensor transmissions based on value of information for summary statistics,” IEEE Networking Letters, vol. 4, no. 2, pp. 92–96, 2022.
[6] J. Holm, F. Chiariotti, A. E. Kalør, B. Soret, T. B. Pedersen, and P. Popovski, “Goal-oriented scheduling in sensor networks with application timing awareness,” IEEE Transactions on Communications, vol. 71, no. 8, pp. 4513–4527, 2023.
[7] D. Gündüz, F. Chiariotti, K. Huang, A. E. Kalør, S. Kobus, and P. Popovski, “Timely and massive communication in 6G: Pragmatics, learning, and inference,” IEEE BITS the Information Theory Magazine, vol. 3, no. 1, pp. 27–40, 2023.
[8] Z. Liu, A. Clark, P. Lee, L. Bushnell, D. Kirschen, and R. Poovendran, “Towards scalable voltage control in smart grid: A submodular optimization approach,” in Proceedings of the ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS), 2016, pp. 1–10.
[9] V. Tzoumas, M. A. Rahimian, G. J. Pappas, and A. Jadbabaie, “Minimal actuator placement with optimal control constraints,” in Proceedings of the American Control Conference (ACC), 2015, pp. 2081–2086.
[10] A. Li, S. Wu, S. Meng, and Q. Zhang, “Towards goal-oriented semantic communications: New metrics, open challenges, and future research directions,” arXiv preprint arXiv:2304.00848, 2023.
[11] S. K. Nanda, “Advanced Kalman filtering with applications to power system and epidemiological data analysis,” PhD dissertation, Indian Institute of Technology Indore, May 2023.
[12] G. Valverde and V. Terzija, “Unscented Kalman filter for power system dynamic state estimation,” IET Generation, Transmission & Distribution, vol. 5, pp. 29–37, Jan. 2011.
[13] O. Nabati, T. Zahavy, and S. Mannor, “Online limited memory neural-linear bandits with likelihood matching,” in Proceedings of the International Conference on Machine Learning (ICML), Jul. 2021, pp. 7905–7915.
[14] “Tensorflow.” [Online]. Available: https://www.tensorflow.org/api_docs/python/tf/clip_by_global_norm