Goal-Oriented Sensor Reporting Scheduling for Non-linear Dynamic System Monitoring
Abstract
Goal-oriented communication (GoC) is a form of semantic communication where the effectiveness of information transmission is measured by its impact on achieving the desired goal. In the context of the Internet of Things (IoT), GoC can enable IoT sensors to selectively transmit data pertinent to the intended goals of the receiver. Therefore, GoC holds significant value for IoT networks as it facilitates timely decision-making at the receiver, reduces network congestion, and enhances spectral efficiency. In this paper, we consider a scenario where an edge node polls sensors monitoring the state of a non-linear dynamic system (NLDS) to respond to the queries of several clients. We term the resulting GoC problem goal-oriented scheduling (GoS). Our proposed GoS utilizes deep reinforcement learning (DRL) with a carefully designed action space, state space, and reward function. The action space and reward function play a pivotal role in reducing the number of sensor transmissions, while the state space enables our DRL scheduler to poll the sensor whose observation is expected to minimize the mean square error (MSE) of the query responses. Our numerical analysis demonstrates that, depending on the type of query, the proposed GoS either reduces the query response MSE further or attains an MSE comparable to that of benchmark scheduling methods. Furthermore, the proposed GoS is energy-efficient for the sensors and has lower complexity than the benchmark scheduling methods.
Index Terms:
Deep Reinforcement Learning, Goal-oriented Scheduling, Internet of Things, Non-linear Dynamic System.
I Introduction
There are billions of Internet of Things (IoT) devices worldwide and the number will keep growing in the coming years [1]. Notably, a significant share of the IoT landscape comprises low-cost, low-power sensors monitoring dynamic systems, which are usually high-dimensional. As a result, massive amounts of data are increasingly exchanged in IoT communications, often under stringent quality-of-service requirements, e.g., on latency and reliability [2, 3].
Given the resource limitations inherent to IoT sensors and networks, there has been a growing interest in remotely estimating the system states at a fusion center/edge node [4, 5, 6, 7]. Notably, an edge node may remotely estimate the entire system state by gathering observations from a subset of IoT sensors, rather than the entire sensor network, which ultimately results in energy-efficient state observation. The applications of remote state estimation (RSE)-assisted sensor reporting scheduling are diverse, spanning fields such as voltage regulation in power systems [8], strategic actuator placement in control systems [9], and sensing/reporting scheduling in wireless networks [4, 5, 6].
The value-of-information (VoI) [10] has been suggested in [6, 7] as a suitable metric for quantifying the impact of sensor transmission on the RSE error. Here, RSE error is defined with respect to the desired goal. A goal might be to accurately (i) identify the system state, or (ii) respond to queries from clients regarding the system state. Table I provides examples of potential client queries.
Recently, the authors in [4, 5, 6] utilized RSE-assisted sensor reporting scheduling at an edge node. The objective in [4] is to identify the state of a linear dynamic system, whereas in [5, 6], the focus is on effectively addressing client queries regarding the state of a linear dynamic system. Thus, the VoI adopted in [4] corresponds to the mean square error (MSE) of the state estimation. Meanwhile, VoI is defined in [5, 6] as the difference between the MSE of the query response obtained with the prior and posterior estimates of the state estimator [7]. Here, prior and posterior estimates denote estimates obtained before and after the sensor transmission, respectively. Furthermore, [4, 5, 6] exploit a key advantage offered by RSE, namely, the ability to observe system states by selectively polling a subset of sensors. In [4], the sensor scheduling strategy is devised to minimize the state estimation MSE, whereas in [5, 6], it aims to minimize the query response MSE.
Query | Definition, |
---|---|
Current state | |
Maximum component | |
Count range | |
Sample mean | |
Sample variance | |
∗ Herein, . |
Interestingly, a closed-form mathematical expression for the query response can be obtained for certain queries like sample mean, sample variance, and current state. Thus, sensor reporting scheduling strategies for such queries can be determined analytically, as shown in [5]. Conversely, for queries such as the maximum system state component and count range, closed-form expressions for the query response are not available. Therefore, addressing such queries requires approaches such as deep reinforcement learning (DRL) to tackle the sensor reporting scheduling problem, as outlined in [6].
Note that the proposals in [4, 5, 6] share a common limitation: they assume that the linear dynamic system model is perfectly known at the edge node, a prerequisite for Kalman filter-based RSE. Unfortunately, obtaining such information is often challenging or even impossible, especially in the case of a non-linear dynamic system (NLDS). Moreover, the standard Kalman filter is not designed to handle NLDS. Besides, in [6], a sensor must be polled at every time step, even when there are no client queries, resulting in unnecessary depletion of sensor energy. In addition, the complete state of the Kalman filter is provided as input in [6] to its DRL-based sensor scheduler. This input significantly inflates the size of the deep neural network (DNN) utilized by the DRL-based sensor scheduler, as it must also extract relevant information from the input. Finally, time instances where no queries are posed are treated uniformly, providing the same reward to the DRL-based sensor scheduling algorithm at all those time instances. Consequently, the proposal in [6] struggles to determine the optimal action in the absence of queries.
Motivated by the aforementioned limitations regarding NLDS and RSE-assisted sensor reporting scheduling, we propose a novel approach termed goal-oriented scheduling (GoS) for IoT sensors tasked with monitoring NLDS. In our goal-oriented communication (GoC) system model, illustrated in Fig. 1, clients pose queries about the NLDS state to the edge node, which then orchestrates sensor reporting scheduling to gather partial yet informative sensor observations. These observations are utilized by the edge node to perform RSE and address client queries. The sole objective of sensor reporting scheduling is to minimize the MSE of future query responses, hence the phrase goal-oriented scheduling. Within our system model, the edge node employs a DRL-based scheduler, which decides whether to poll a sensor at each time step. We have devised a reward function such that our DRL-based sensor scheduler makes judicious decisions even when no queries are posed. Furthermore, the edge node utilizes the observation from the polled sensor and the cubature quadrature Kalman filter (CQKF) [11] to estimate the entire NLDS state and respond to the client queries. However, since CQKF requires a mathematical model for the NLDS, we employ Holt’s method [12, 11] to iteratively estimate it. Additionally, we provide a specific attribute of the CQKF state as input to our DRL-based sensor scheduler. This input not only aids in minimizing the query response MSE but also significantly shrinks the size of the DNN utilized by our DRL-based sensor scheduler. Lastly, we compare the performance of our proposed scheduler against two benchmark schedulers: the scheduler adopted in [6] and the Monte Carlo scheduler. Our complexity analysis indicates that the proposed scheduler exhibits the least complexity among the considered schedulers.
Moreover, the numerical results reveal that, depending on the query type, our proposed scheduler either further reduces the query response MSE or attains an MSE comparable to that of the benchmark schedulers. In either case, this is accomplished with fewer sensor transmissions, thereby saving sensor energy.
The paper is structured as follows. Section II delineates the system model. Section III describes the components of the GoS framework and presents the scheduling problem. Section IV introduces benchmark schedulers and Section V discusses the computational complexities of all the considered schedulers. Section VI presents the numerical results. Lastly, Section VII concludes the paper and outlines potential avenues for future research.
Notation: and denote the argument of the maximum function and the maximum function itself, respectively. Similarly, and denote the argument of the minimum function and the minimum function itself, respectively. The cardinality of a set is represented by , while the transpose operation is denoted by . Column vectors/matrices are indicated by boldface lowercase/uppercase letters. The determinant, trace of a square matrix, and the expected value are denoted by , and , respectively. and signify the identity matrix and null vector of dimension , respectively. Additionally, denotes a vector of dimension with all elements set to zero except the element, which is . The sets and represent real vectors of dimension and non-negative integer vectors of dimension , respectively. A Gaussian sample vector with mean and covariance matrix is denoted as . Meanwhile, a Gaussian sample observation with mean and covariance is denoted as . The indicator function, Cholesky decomposition, sample variance, and uniform distribution between and are denoted by , , , and , respectively.
II System Model
Consider the GoC system illustrated in Fig. 1. In this system, an edge node receives data from sensors indexed by and is tasked with responding to queries from a set of remote clients. A query from client is a request for the value of the function , while the edge node responds to it with an estimate . Each client asks a different type of query about the system state. The system operates in discrete time slots, labeled as . In each slot, the edge node decides whether to poll a single sensor or refrain from doing so. The sensors observe NLDS, with its state represented as
(1) |
where is the dimensionality of the NLDS state, represents a nonlinear state dynamics (NLSD) function, and denotes the Gaussian noise with zero mean and covariance .
The sensors observe the system state as captured by
(2) |
Herein, represents the observation matrix, and is the zero-mean Gaussian measurement noise with covariance matrix . Additionally, we model the channel between sensor and edge node as a packet erasure channel with a transmission error probability .
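The system model above can be exercised with a small simulation. The following is a minimal sketch, assuming an additive-noise state recursion, one row of the observation matrix per sensor, and independent packet erasures; the names `simulate_step` and `poll_sensor` and the example dynamics `np.tanh` are illustrative choices, not taken from the paper.

```python
import numpy as np

def simulate_step(x, f, Q, rng):
    """Advance the hidden state by one slot: x_next = f(x) + w, with w ~ N(0, Q)."""
    w = rng.multivariate_normal(np.zeros(len(x)), Q)
    return f(x) + w

def poll_sensor(x, H, R, sensor, p_err, rng):
    """Return the noisy scalar reading of one sensor, or None if the packet is erased.

    Assumption: sensor i observes the state through row i of H, with
    measurement-noise variance R[i, i].
    """
    if rng.random() < p_err:
        return None  # packet erasure: the edge node receives nothing
    noise = rng.normal(0.0, np.sqrt(R[sensor, sensor]))
    return float(H[sensor] @ x + noise)

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])
x = simulate_step(x, np.tanh, 0.01 * np.eye(2), rng)   # hypothetical dynamics
reading = poll_sensor(x, np.eye(2), 0.1 * np.eye(2), sensor=0, p_err=0.1, rng=rng)
```

A return value of `None` models the case in which the edge node polled a sensor but has to continue with its prior estimate.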
III Goal-Oriented Scheduling
The proposed GoS framework comprises the following three key components: state estimator, sensor scheduler, and query process at the clients. Detailed descriptions of each component are provided next.
III-A System State Estimator
We employ CQKF for NLDS state estimation. As initialization, CQKF requires cubature quadrature (CQ) points and their corresponding weights , whose computation procedure is available in Algorithm 1. Initially, we determine the cubature points , which are the intersection points of the unit -hyper-sphere and its axes. For example, the unit -hyper-sphere, also known as the unit circle, has and as its four cubature points. Likewise, the unit -hyper-sphere has , as its cubature points. Subsequently, we compute the roots of the Chebyshev-Laguerre (CL) polynomial, known as quadrature points. Here, the CL polynomial is given as
(3) |
where . Consider . To compute quadrature points, we first have to formulate the companion matrix corresponding to , where
(4) |
Next, we formulate the characteristic polynomial of , which is , here corresponds to the eigenvalues of . Note that, . Therefore, the eigenvalues of are the roots of . Finally, we determine and by utilizing the cubature and quadrature points in step 3 of Algorithm 1, respectively. Note that, in step 3 of Algorithm 1 is the first derivative of at .
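The two ingredients described above, cubature points on the coordinate axes and quadrature points obtained as eigenvalues of a companion matrix, can be sketched as follows. This is an illustrative sketch only: it assumes the CL polynomial is the monic generalized Laguerre polynomial with parameter alpha = n/2 - 1, which is the form commonly used in the CQKF literature, and it does not reproduce the weight computation of Algorithm 1. All function names are hypothetical.

```python
import math
import numpy as np

def cubature_points(n):
    """Intersections of the unit hyper-sphere with its axes: +e_i and -e_i."""
    eye = np.eye(n)
    return np.vstack([eye, -eye])  # 2n points, one per row

def monic_laguerre_coeffs(order, alpha):
    """Coefficients c_0..c_{order-1} of lambda^order + sum_i c_i lambda^i.

    Assumed form: monic generalized Laguerre polynomial, i.e.
    c_i = (-1)^(order-i) * C(order, i) * prod_{j=i+1}^{order} (alpha + j).
    """
    coeffs = []
    for i in range(order):
        prod = 1.0
        for j in range(i + 1, order + 1):
            prod *= alpha + j
        coeffs.append((-1) ** (order - i) * math.comb(order, i) * prod)
    return np.array(coeffs)

def companion_matrix(coeffs):
    """Companion matrix whose characteristic polynomial is the monic polynomial."""
    m = len(coeffs)
    C = np.zeros((m, m))
    C[1:, :-1] = np.eye(m - 1)   # ones on the sub-diagonal
    C[:, -1] = -coeffs           # last column holds the negated coefficients
    return C

def quadrature_points(order, n):
    """Roots of the polynomial, computed as eigenvalues of the companion matrix."""
    coeffs = monic_laguerre_coeffs(order, alpha=n / 2 - 1)
    return np.sort(np.linalg.eigvals(companion_matrix(coeffs)).real)

print(cubature_points(2))        # the four axis points of the unit circle
print(quadrature_points(2, 2))   # roots of lambda^2 - 4*lambda + 2
```

Using an eigenvalue solver on the companion matrix avoids explicit polynomial root finding, which is the reason the text equates the eigenvalues with the roots.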
The CQKF is detailed in Algorithm 2 and encompasses two steps: a prediction step and an update step, elaborated in Algorithms 3 and 4, respectively.
III-A1 Prediction step
The prediction step computes the prior estimates, and . First, we compute the Cholesky decomposition of the previous posterior covariance , which is then used to determine the sampling points . Next, Holt’s method, explained in the next paragraph, transforms into the updated sampling points . Finally, we compute and by utilizing in steps 4 and 5 of Algorithm 3.
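The flow of the prediction step (Cholesky factor, sampling points, propagation, weighted moments) can be sketched as below. This is a hedged illustration rather than Algorithm 3 itself: it uses the simpler third-degree cubature point set as a stand-in for the CQ points, takes the propagation function as a plain callable instead of Holt’s method, and assumes additive process noise with covariance `Q`. All names are hypothetical.

```python
import numpy as np

def third_degree_points(n):
    """Stand-in point set: 2n points at +/- sqrt(n) e_i with equal weights."""
    pts = np.sqrt(n) * np.vstack([np.eye(n), -np.eye(n)])
    wts = np.full(2 * n, 1.0 / (2 * n))
    return pts, wts

def predict(x_post, P_post, propagate, Q, pts, wts):
    """Sigma-point prediction: returns the prior mean and prior covariance."""
    S = np.linalg.cholesky(P_post)                 # P_post = S S^T
    chi = x_post + pts @ S.T                       # sampling points around the mean
    chi_next = np.array([propagate(c) for c in chi])
    x_prior = wts @ chi_next                       # weighted mean
    diff = chi_next - x_prior
    P_prior = (wts[:, None] * diff).T @ diff + Q   # weighted covariance plus noise
    return x_prior, P_prior

pts, wts = third_degree_points(2)
x_prior, P_prior = predict(np.array([1.0, -1.0]),
                           np.array([[2.0, 0.5], [0.5, 1.0]]),
                           np.tanh, 0.1 * np.eye(2), pts, wts)
```

With an identity propagation function the sketch returns the posterior mean unchanged and the posterior covariance inflated by `Q`, which is a convenient sanity check.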
Knowing is essential to transform into , but such information is not available at the edge node. Therefore, we opt for Holt’s method, described in detail in Algorithm 5, a reliable way to model the NLSD function . Holt’s method estimates according to the expression available in step 1, which is updated at each time step with the help of the following smoothing parameters: . Here, are constants, while are variables whose update procedure is given in steps 2 and 3 of Algorithm 5.
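For intuition, the textbook form of Holt’s linear-trend method (double exponential smoothing) is sketched below for a scalar series. This is not a reproduction of Algorithm 5, whose exact expressions are not recoverable here; the class name, the initialization from the first two samples, and the default values of `alpha` and `beta` are assumptions made for illustration.

```python
class HoltForecaster:
    """Textbook Holt linear-trend smoothing for a scalar series."""

    def __init__(self, first, second, alpha=0.5, beta=0.5):
        self.alpha = alpha            # constant smoothing factor for the level
        self.beta = beta              # constant smoothing factor for the trend
        self.level = second           # level variable, updated every step
        self.trend = second - first   # trend variable, updated every step

    def forecast(self):
        """One-step-ahead prediction: current level plus current trend."""
        return self.level + self.trend

    def update(self, value):
        """Fold a new sample into the level and trend estimates."""
        prev_level = self.level
        self.level = self.alpha * value + (1 - self.alpha) * (prev_level + self.trend)
        self.trend = self.beta * (self.level - prev_level) + (1 - self.beta) * self.trend

holt = HoltForecaster(first=1.0, second=3.0)
print(holt.forecast())   # prediction for the third sample of the series 1, 3, 5, ...
holt.update(5.0)
print(holt.forecast())   # prediction for the fourth sample
```

The sketch also illustrates why the smoothing variables should be updated only once per time step: every call to `update` changes the level and trend that all later forecasts rely on.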
III-A2 Update step
In the update step, we compute the Cholesky decomposition of . Following this, we determine the sampling points , which undergo a linear transformation to become the updated sampling points , as delineated in step 3 of Algorithm 4. Subsequently, we compute a vector , representing the predicted sensor measurements, which is then used to determine the innovation error covariance , cross-covariance , and Kalman gain . Lastly, we compute and by employing , , and in step 8 and step 9 of Algorithm 4. Here, denotes the measurement of the polled sensor.
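The update step for a single polled sensor can be sketched as follows. The sketch assumes a scalar measurement obtained through one row `h` of the observation matrix with noise variance `r`, and again uses the third-degree cubature points as a stand-in for the CQ points; it is an illustration of the general sigma-point update, not Algorithm 4 itself, and all names are hypothetical.

```python
import numpy as np

def third_degree_points(n):
    """Stand-in point set: 2n points at +/- sqrt(n) e_i with equal weights."""
    pts = np.sqrt(n) * np.vstack([np.eye(n), -np.eye(n)])
    return pts, np.full(2 * n, 1.0 / (2 * n))

def update(x_prior, P_prior, h, r, z, pts, wts):
    """Sigma-point measurement update for one scalar sensor reading z."""
    S = np.linalg.cholesky(P_prior)       # P_prior = S S^T
    chi = x_prior + pts @ S.T             # sampling points around the prior mean
    zeta = chi @ h                        # linear transformation of each point
    z_pred = wts @ zeta                   # predicted sensor measurement
    dz = zeta - z_pred
    dx = chi - x_prior
    P_zz = wts @ (dz * dz) + r            # innovation error (co)variance
    P_xz = (wts * dz) @ dx                # cross-covariance between state and measurement
    K = P_xz / P_zz                       # Kalman gain (vector, scalar measurement)
    x_post = x_prior + K * (z - z_pred)
    P_post = P_prior - P_zz * np.outer(K, K)
    return x_post, P_post

pts, wts = third_degree_points(2)
x_post, P_post = update(np.array([1.0, -1.0]),
                        np.array([[2.0, 0.5], [0.5, 1.0]]),
                        h=np.array([1.0, 0.0]), r=0.2, z=1.4, pts=pts, wts=wts)
```

Because the measurement model is linear, this sketch reproduces the classical Kalman update, which makes it easy to validate.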
III-B Query Process and Query Response
The query process can be modeled as a Markov chain (MC). Each client operates independently, following its own MC, with its state at time denoted as , governed by a known transition matrix . Client always requests the same function when its MC is within a subset of states, denoted as , where . Besides, the state of each client remains unknown to the edge node.
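A query process of this kind is straightforward to simulate. The sketch below is a minimal illustration assuming a row-stochastic transition matrix given as nested lists and a set of query-generating states; the function name and the three-state periodic example are hypothetical.

```python
import random

def simulate_queries(T, query_states, start, steps, seed=0):
    """Simulate one client's Markov chain and flag the slots in which it queries.

    T is a row-stochastic transition matrix (list of lists); the client issues
    its query whenever the chain is in one of `query_states`.
    """
    rng = random.Random(seed)
    state, asked = start, []
    for _ in range(steps):
        # Draw the next state according to the current row of T.
        state = rng.choices(range(len(T)), weights=T[state])[0]
        asked.append(state in query_states)
    return asked

# Hypothetical periodic chain with three states: a query every third slot.
periodic = [[0, 1, 0],
            [0, 0, 1],
            [1, 0, 0]]
print(simulate_queries(periodic, query_states={0}, start=0, steps=6))
```

The edge node does not see `state`; it only observes whether a query arrives, which is why the time elapsed since the last query becomes part of its observation.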
III-C GoS Problem
The problem is to anticipate future queries and schedule sensor transmissions to minimize the MSE of future query responses. This task demands foresight, requiring an understanding not only of the monitored NLDS but also of the query process and the interplay among various query functions.
We can model the GoS problem at the edge node as a partially observable Markov decision process (POMDP), in which the edge node must decide whether to poll a sensor. Herein, the action space is , where action signifies no device is polled, and action represents sensor is polled.
Before initiating the sensor scheduling operation, the edge node possesses prior estimates. Moreover,
(6) |
Consequently, the state in POMDP can be represented as , where and the state space is . However, the edge node lacks knowledge of , instead possessing information about the time that elapsed since the last query [6]. Consequently, the edge node has an observation , with an observation space .
The reward in POMDP is defined as
(9) |
where , denotes the selected action, while , signifies the relative importance of client . Additionally, we presume that is known to the edge node.
The long-term reward can be stated as
(10) |
where is the exponential discount factor. Moreover, represents the policy which maps to , where encompasses the probability of selecting each action. Finally, the GoS problem can be defined as [6]
(11) |
where represents the optimal policy.
Parameters | Values |
---|---|
Input dimension | |
Output dimension | |
Number of hidden layers | |
Hidden layers dimension | |
Activation function | ReLU |
Optimizer | RMSProp |
Initial learning rate | |
Mini-batch size | |
Memory buffer size [13] | |
Exponential discount factor [6] | |
Threshold for global norm of gradient vector [13] | |
(initialize) | |
(initial value) | |
III-D CQKF-cum-DRL-based Scheduler
We solve using DRL and therefore name our scheduler the CQKF-cum-DRL-based scheduler, described in detail in Algorithm 6. We maintain two DNNs, named the online network and the target network, to improve the stability of our DRL scheduler. For insights into the architecture of both networks, refer to Table II. A schematic of our proposed GoS is available in Fig. 2.
Algorithm 6 operates as follows. Initially, it computes the prior estimates to formulate . Subsequently, the online network, characterized by its weights , takes as its input and outputs the action values . Here, serves as an estimate of the reward that the scheduler would gain if action is chosen. The ε-greedy method then employs the action values to select an action . In most cases, the ε-greedy method selects as the argument of the maximum action value, which is called exploitation. However, to explore the whole action space, it occasionally selects randomly from the set , which is called exploration. The posterior estimates are then computed according to steps 2-7 of Algorithm 2. Subsequently, , gained by the online network for selecting action , is computed using Algorithm 7. If there is no query, is used as the reward to convey the mean square error in the posterior estimate to the DRL scheduler. Note that, because of , the reward expression provides an extra incentive to the DRL scheduler for selecting action-0 when there is no query. If a query has been asked, the following procedure applies. First, compute , required in . The computation of involves taking samples from a Gaussian distribution with mean and covariance . These samples are then used to obtain the vector , where and is the sample. The variance of yields . Once has been computed, compute using .
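The sampling procedure described above for the case in which a query has been asked can be sketched as follows. This is an illustration under stated assumptions: the maximum query is taken to be the largest state component and the count range query the number of components inside an interval, since the exact definitions of Table I are not reproduced here; the function names and the sample count are hypothetical.

```python
import numpy as np

def max_query(x):
    """Assumed form of the maximum query: the largest state component."""
    return float(np.max(x))

def count_range_query(x, low, high):
    """Assumed form of the count range query: components lying in [low, high]."""
    return float(np.sum((x >= low) & (x <= high)))

def estimate_query_mse(x_post, P_post, query, num_samples, rng):
    """Draw Gaussian samples around the posterior estimate, evaluate the query
    on each sample, and return the sample variance of the responses."""
    samples = rng.multivariate_normal(x_post, P_post, size=num_samples)
    responses = np.array([query(s) for s in samples])
    return float(np.var(responses, ddof=1))

rng = np.random.default_rng(0)
x_post = np.array([5.0, 0.0])
P_post = np.diag([4.0, 0.01])
print(estimate_query_mse(x_post, P_post, max_query, 2000, rng))
print(estimate_query_mse(x_post, P_post,
                         lambda x: count_range_query(x, -1.0, 1.0), 2000, rng))
```

The estimate works for any query function, including those without a closed-form response, at the cost of one query evaluation per sample.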
Now that both and are available, we proceed to store as , i.e., tuple, in the finite memory and increase by . Here, represents the number of tuples available in . If is full, we remove from and decrease by before storing the new tuple. Following this, we update the target network weights, denoted as , by setting , if the counter reached its threshold value, herein set to .
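The memory handling and periodic target-network refresh described above can be sketched as below. The sketch assumes that the tuple removed from a full memory is the oldest one and represents network weights as plain lists; both choices, as well as the class names, are illustrative assumptions.

```python
from collections import deque

class ReplayMemory:
    """Finite memory of (observation, action, reward, next observation) tuples."""

    def __init__(self, capacity):
        # With maxlen set, appending to a full deque evicts the oldest tuple.
        self.buffer = deque(maxlen=capacity)

    def store(self, obs, action, reward, next_obs):
        self.buffer.append((obs, action, reward, next_obs))

    def __len__(self):
        return len(self.buffer)

class TargetSync:
    """Copies the online weights to the target network every `period` steps."""

    def __init__(self, period):
        self.period = period
        self.counter = 0

    def step(self, online_weights, target_weights):
        self.counter += 1
        if self.counter >= self.period:
            self.counter = 0
            return list(online_weights)   # fresh copy of the online weights
        return target_weights             # keep the old target weights

memory = ReplayMemory(capacity=2)
for t in range(3):
    memory.store(obs=t, action=0, reward=-1.0, next_obs=t + 1)
print(len(memory), memory.buffer[0])      # the first tuple has been evicted
```

Keeping the target weights frozen between refreshes is what stabilizes the training targets used by the online network.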
Next, the training process for the online network commences by sampling a mini-batch of size from . Then, we provide , i.e., fourth element of , as input to the target network and obtain its output . Now, we utilize outputs of the target network to determine the target values as
(12) |
for . Note that is an estimate of . Thereupon, we provide as input to the online network. The corresponding target values serve as labels for updating by minimizing using the RMSProp optimizer, where
(13) |
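For orientation, the standard deep Q-learning form of the target values and training loss is sketched below. It is an assumption that (12) and (13) follow this standard form (reward plus discounted maximum target-network action value, and mean squared error between targets and online action values); the expressions in the paper may differ in detail, and the function names are hypothetical.

```python
def td_targets(rewards, next_q_values, gamma):
    """Standard DQN targets: reward plus discounted best target-network value."""
    return [r + gamma * max(q) for r, q in zip(rewards, next_q_values)]

def mse_loss(targets, q_values, actions):
    """Mean squared error between the targets and the online network's value
    for the action that was actually taken in each stored tuple."""
    errors = [(y - q[a]) ** 2 for y, q, a in zip(targets, q_values, actions)]
    return sum(errors) / len(errors)

# Hypothetical mini-batch of two tuples with two actions each.
targets = td_targets(rewards=[1.0, 0.0],
                     next_q_values=[[0.5, 2.0], [1.0, -1.0]],
                     gamma=0.5)
loss = mse_loss(targets, q_values=[[2.0, 0.0], [0.0, 0.5]], actions=[0, 1])
print(targets, loss)
```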
To deal with the exploding gradient problem during the online network’s training phase, we apply gradient-norm clipping [14]. This involves clipping the gradient vector as
(14) |
Herein, represents the threshold value for and the vector stores the clipped gradients. Finally, to emphasize exploitation over exploration in the ε-greedy method, it is necessary to gradually decrease ε. Thus, we reduce ε by , unless it has already reached .
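Gradient-norm clipping, ε-greedy selection, and the gradual reduction of the exploration rate can be sketched together as follows. The clipping rule shown is the common rescale-to-threshold form, which is assumed to match (14); the subtractive decay with a floor mirrors the description above, while the function names and numeric values are illustrative.

```python
import math
import random

def clip_by_global_norm(grad, threshold):
    """Rescale the gradient vector if its Euclidean norm exceeds the threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)
    return [g * threshold / norm for g in grad]

def epsilon_greedy(action_values, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))        # exploration
    return max(range(len(action_values)), key=action_values.__getitem__)  # exploitation

def decay_epsilon(epsilon, step, floor):
    """Reduce epsilon by a fixed step unless it has already reached the floor."""
    return max(floor, epsilon - step)

rng = random.Random(0)
clipped = clip_by_global_norm([3.0, 4.0], threshold=1.0)   # norm 5 rescaled to norm 1
action = epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.0, rng=rng)
epsilon = decay_epsilon(0.1, step=0.2, floor=0.05)
print(clipped, action, epsilon)
```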
IV Benchmark Schedulers
IV-A Monte Carlo scheduler
The Monte Carlo scheduler, described in detail in Algorithm 8, is adopted as a benchmark due to its versatility in handling any query type. For a given client , Algorithm 8 operates as follows. Initially, it computes the prior estimates. Then, in an iterative manner, distinct Gaussian samples are drawn for sensor in step 11, by computing distinct posterior estimates either in step 7 or in step 9 depending on the inequality in step 5. The Gaussian samples are then employed to compute distinct query responses in step 12, in an iterative manner. These query responses are stored in . Next, in step 14, is computed and stored in . Here, represents the expected MSE in case sensor is polled. The procedure outlined from step 3 to step 14 is repeated a total of times to calculate for every sensor. Then, in step 16, the sensor whose index corresponds to the minimum element in is polled. Following this, to compute the actual MSE in step 18, Algorithm 8 again computes the posterior estimates by leveraging the received observation from the polled sensor.
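The selection logic of a Monte Carlo scheduler can be illustrated with the simplified sketch below. It departs from Algorithm 8 in ways that should be kept in mind: it assumes linear scalar observations so that a classical Kalman covariance update can stand in for the CQKF, it centers the samples on the prior mean, and it models packet erasure by falling back to the prior covariance. All names and numeric values are hypothetical.

```python
import numpy as np

def kalman_cov_update(P, h, r):
    """Posterior covariance after a scalar linear observation h @ x + noise."""
    K = P @ h / (h @ P @ h + r)
    return P - np.outer(K, h @ P)

def monte_carlo_poll(x_prior, P_prior, H, R, p_err, query, num_samples, rng):
    """Estimate, per sensor, the expected query-response variance if that sensor
    were polled, and return the index of the sensor with the smallest value."""
    expected_mse = []
    for i in range(H.shape[0]):
        P_success = kalman_cov_update(P_prior, H[i], R[i, i])
        responses = []
        for _ in range(num_samples):
            erased = rng.random() < p_err[i]           # simulated packet erasure
            P = P_prior if erased else P_success
            sample = rng.multivariate_normal(x_prior, P)
            responses.append(query(sample))
        expected_mse.append(float(np.var(responses)))
    return int(np.argmin(expected_mse)), expected_mse

rng = np.random.default_rng(1)
best, scores = monte_carlo_poll(np.array([5.0, 0.0]), np.diag([4.0, 0.01]),
                                np.eye(2), 0.01 * np.eye(2), [0.0, 0.0],
                                lambda x: float(np.max(x)), 300, rng)
print(best, scores)
```

In this example the maximum is dominated by the first, highly uncertain component, so polling the sensor that observes it yields the smaller expected variance.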
It is worth noting that the Monte Carlo scheduler comes with limitations. Unlike the proposed CQKF-cum-DRL-based scheduler, we need to design Monte Carlo schedulers in the case of clients. Moreover, the Monte Carlo scheduler does not take into account the information related to the query requests while polling a sensor. It simply polls a sensor whenever a query is asked.
Note that the utilization of CQKF necessitates modifications to the original Monte Carlo scheduler available in [5]. Specifically, we have modified the procedure by relocating the computation of prior estimates, moving it outside the for loops present in steps 2 and 3. This alteration is due to the use of Holt’s method, whose smoothing parameters should only be updated once per time step.
IV-B Benchmark DRL Scheduler
Our second benchmark scheduler adopts the action space, POMDP state/observation space, reward function, and online and target network architecture utilized by the scheduler in [6]. The benchmark DRL scheduler operates as described in Algorithm 6, except for the following changes:
• Change to , indicating that the edge node must poll a sensor at every time step.
• In Algorithm 6, provide , with , as input to the online network. Here, indicates the complete state of CQKF after the prediction step.
• Replace the reward function with the one utilized in [6].
• Change the online and target network architecture by increasing the number of hidden layers to three, having neurons and a dropout probability of , respectively.
Thus, the distinctive features that set apart the benchmark DRL scheduler from the proposed scheduler are its action space, observation space, reward function, and DNN architecture.
V Complexity of the Considered Schedulers
Operations | Complexity | Operations | Complexity |
---|---|---|---|
ReLU | |||
Inequality | |||
Draw from |
Herein, we quantify the computational complexity of our considered schedulers in terms of the number of arithmetic operations they perform. Table III presents the complexity expressions for fundamental operations utilized in the algorithms. Note that the complexity expressions for our considered schedulers pertain specifically to the complexity associated with making a scheduling decision at a single time step.
Notice that, because of step 5 of Algorithm 8, deriving an exact expression for the complexity of the Monte Carlo scheduler is not feasible. However, we can derive expressions for both the lower and upper bound of the complexity of the Monte Carlo scheduler. The lower bound expression pertains to the case that the inequality in step 5 of Algorithm 8 is never satisfied. Conversely, the upper bound expression represents the case that the aforesaid inequality is always satisfied. The lower and upper bound complexity expressions are given by
(15a) | ||||
(15b) |
respectively. By taking into account the dominant terms in and , the final lower and upper bound complexity expressions for the Monte Carlo scheduler, in terms of big-O notation, are given by
(16a) | ||||
(16b) |
The complexity expression for the proposed scheduler is the sum of the complexities of three distinct phases: the action value generation phase, the action selection phase, and the training phase. The complexity expressions for the first and third phases are and , respectively, as derived in [6]. Here, with and , while the remaining elements of are the hidden layer sizes. Moreover, because of steps 4-7 of Algorithm 6, the complexity of the second phase falls within the range . In the case of the proposed scheduler, . Thus, the lower and upper bound complexity expressions for the proposed scheduler are
(17a) | ||||
(17b) |
respectively. By taking into account the dominant terms in and , the final complexity expression for the proposed scheduler is given by
(18) |
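To build intuition for why the DNN layer sizes drive the scheduler complexity, the sketch below counts the multiply-accumulate operations of one forward pass through a fully connected network. This is a rough proxy only and is not the complexity expression derived in [6]; the layer sizes in the example, including the input and output dimensions, are hypothetical.

```python
def forward_pass_macs(layer_sizes):
    """Multiply-accumulate count of one forward pass through a fully connected
    network whose layer widths are given in order (input, hidden..., output)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical comparison: one narrow hidden layer versus three wide ones.
small = forward_pass_macs([3, 4, 5])
large = forward_pass_macs([30, 64, 64, 64, 5])
print(small, large)
```

The count grows with the product of adjacent layer widths, so both a wider input and additional hidden layers increase the cost of every scheduling decision.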
Table IV: Complexity ranges of the considered schedulers for various system parameter configurations.
As mentioned in subsection IV-B, the benchmark DRL scheduler operates in the same way as the proposed scheduler. Thus, the general complexity expression for the benchmark DRL scheduler is the same as that derived for the proposed scheduler. However, this time . Thus, the lower and upper bound complexity expressions for the benchmark DRL scheduler are
(19a) | ||||
(19b) |
respectively. By taking into account the dominant terms in and , the final complexity expression for the benchmark DRL scheduler is given by
(20) |
From , and , we observe that the proposed scheduler has quadratic computational complexity, while the benchmark schedulers have polynomial computational complexity. Moreover, taking into account , , , , , , the complexity ranges of the considered schedulers for various system parameter configurations are given in Table IV. As can be seen in Table IV, both the lower and upper bound complexities of the proposed scheduler are extremely small for all the system parameter configurations. Specifically, the upper bound complexity of the proposed scheduler is significantly lower than that of the benchmark schedulers. Furthermore, its notably low complexity renders it suitable for implementation on an embedded processor-based edge node.
VI Results
Parameters | Values | ||
---|---|---|---|
(initial value) | |||
(initial value) | |||
(initial value) | |||
(initial value) | |||
|
|||
|
|||
|
Our simulations consider the following two NLSD functions
(21a) | ||||
(21b) |
where , and signifies the element-wise product. Notably, (21a) and (21b) lead to NLDSs with non-correlated and correlated states, respectively. Furthermore, we model the query process at the client side using periodic and memoryless MC, depicted in Fig. 3. Herein, a client generates a query when its corresponding MC reaches state A. Table VI showcases the information about the clients and the queries asked by them, for the case of . Note that, the parameter in Table VI refers to the MC combinations possible at the client side.
Parameters | Client-1 | Client-2 |
---|---|---|
| Memoryless MC | Memoryless MC |
| Memoryless MC | Periodic MC |
Query asked | Maximum query | Count range query |
The performance of the schedulers is evaluated over a duration of time steps through , and action selection frequency (ASF) metrics. Besides, we treat the first time steps as a warm-up period for Holt’s method. Consequently, any actions selected and values, , computed during the warm-up period are discarded.
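The ASF metric with warm-up discarding can be computed as sketched below. The sketch assumes that the ASF of an action is the fraction of retained time steps in which that action was selected; the function name and the example action trace are hypothetical.

```python
from collections import Counter

def action_selection_frequency(actions, num_actions, warm_up):
    """Fraction of post-warm-up time steps in which each action was selected."""
    kept = actions[warm_up:]                 # discard the warm-up period
    counts = Counter(kept)
    return [counts[a] / len(kept) for a in range(num_actions)]

# Hypothetical trace: action 0 means that no sensor is polled.
trace = [1, 1, 0, 0, 0, 2]
print(action_selection_frequency(trace, num_actions=3, warm_up=2))
```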
Table VII: Number of sensor transmissions for the considered schedulers.
Considering NLSD (21b), the bar-plots in Fig. 4 reveal that action-0 is the action most frequently selected by the proposed scheduler. Moreover, the ASFs of all of its remaining actions are below . This dominance of action-0 stems from the reward , which incentivizes the proposed scheduler to opt for action-0 in the absence of queries. Besides, opting for action-0 minimizes sensor transmissions, consequently saving sensor energy. Meanwhile, the Monte Carlo scheduler predominantly selects action-1 across all , resulting in substantial energy depletion at sensor-1. On the other hand, the ASFs of most of the actions are below across all when using the benchmark DRL scheduler. However, the ASFs obtained through the benchmark schedulers are still higher than those obtained through the proposed scheduler. Furthermore, the proposed scheduler requires the lowest number of sensor transmissions in every , as evidenced by Table VII. Consequently, the sensor energy depletion is higher for the benchmark schedulers than for the proposed scheduler.
As illustrated by the box-plots in Fig. 5, the benchmark schedulers obtain a lower MSE of the maximum query response compared to the proposed scheduler. However, note that the MSE values for all three schedulers vary in the range of . Thus, the disparity in the MSE of the maximum query response between the proposed scheduler and the benchmark schedulers is marginal.
Considering NLSD (21b), the box-plots in Fig. 6 show that the proposed scheduler leads to a decline in the MSE of the count range query response, relative to the benchmark schedulers, across all . Meanwhile, in the case of NLSD (21a), the proposed and benchmark schedulers obtain similar MSE.
Furthermore, as illustrated in Figs. 5 and 6, the proposed scheduler exhibits superior performance in count range query compared to maximum query when contrasted with benchmark schedulers. This disparity arises because the of the maximum query response is notably more susceptible to outliers within the data gathered in , in steps 3-4 of Algorithm 7. Consequently, the of the maximum query response (see step 5 of Algorithm 7) typically fails to offer accurate insights into the central tendency of the collected data. Therefore, estimating a satisfactory of the maximum query response in the case of the proposed scheduler necessitates a higher value of . Fig. 7 supports this claim, as increasing from to reduces the of the maximum query response in the case of the proposed scheduler. An increment in leads to an increase in the number of sensor transmissions, which, in turn, improves the accuracy of posterior estimates. Consequently, this leads to a decline in the number of outliers within . However, increasing the value of has one significant drawback, namely the increase in the number of sensor transmissions. Table VII shows that increasing from to significantly increases the number of sensor transmissions.
Based on the preceding discussion, it is apparent that the proposed scheduler either succeeds in reducing the MSE or obtains a comparable MSE, relative to the benchmark schedulers. Furthermore, the proposed scheduler accomplishes this with fewer sensor transmissions. The key to the satisfactory performance of the proposed scheduler lies in its input. Instead of feeding the complete prior state of CQKF, i.e., , as input to the DRL scheduler, we provide a specific attribute of the prior state of CQKF, which is . As mentioned in Section III-C, reflects the mean square error in the prior estimate. By using as input, the DRL scheduler focuses solely on selecting the most fruitful action, which later minimizes . In contrast, providing the complete prior state of CQKF as input, as done with the benchmark DRL scheduler, adds the extra workload of extracting the valuable information from the input to the DRL scheduler.
Meanwhile, relieving the DRL scheduler of the aforementioned extra workload positively impacts its ability to leverage correlation among NLDS states. In Fig. 6, for NLSD (21b), the proposed scheduler demonstrates a superior ability to capitalize on the correlation among NLDS states compared to the benchmark schedulers. Better exploitation of correlation implies that the proposed scheduler possesses superior insights about the most fruitful sensor at the time of sensor polling. This, in turn, yields posterior estimates that are better than the ones obtained in the case of the benchmark schedulers. Consequently, this leads to a decline in its MSE of the count range query response, relative to the benchmark schedulers. However, in the case of NLSD (21a), no such correlation among NLDS states is available for the proposed scheduler to exploit, so its MSE of the count range query response is similar to that of the benchmark schedulers.
Moreover, because of this extra workload, the benchmark DRL scheduler requires a more complex DNN architecture, featuring three hidden layers with neurons. In contrast, the DNN architecture of the proposed scheduler comprises just one hidden layer with four neurons. This streamlined architecture is another advantage of utilizing as input.
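To give a rough sense of the gap in model size, the following sketch counts the trainable parameters of two fully connected networks. Only the depths (one hidden layer with four neurons versus three hidden layers) come from the text. The number of states, the input and output dimensions, and the hidden width of the benchmark network are hypothetical placeholders chosen for illustration.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input and output included.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


N_STATES = 8       # hypothetical number of NLDS states / sensors
HIDDEN_WIDTH = 64  # hypothetical width of each benchmark hidden layer

# Proposed scheduler: one scalar attribute per state as input (assumed),
# one hidden layer with four neurons, one output per sensor (assumed).
proposed = mlp_param_count([N_STATES, 4, N_STATES])

# Benchmark scheduler: the complete prior state as input, assumed here to
# be a mean vector plus a full covariance matrix, three hidden layers.
benchmark_input = N_STATES + N_STATES ** 2
benchmark = mlp_param_count(
    [benchmark_input, HIDDEN_WIDTH, HIDDEN_WIDTH, HIDDEN_WIDTH, N_STATES]
)

print(f"proposed:  {proposed} parameters")
print(f"benchmark: {benchmark} parameters")
```

Under these assumed dimensions, the smaller network has roughly two orders of magnitude fewer parameters. The exact ratio depends on the actual input size and hidden width.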
Fig. 8 considers the scenario where, alongside the maximum and count range queries, two additional queries, sample mean and variance, are posed to the edge node by two additional clients. Note that there is a negligible disparity between the MSE of the query responses obtained with the proposed scheduler and with the benchmark schedulers for the maximum, sample mean, and variance queries. Besides, Fig. 8 shows that the proposed scheduler lowers the MSE of the count range query response relative to the benchmark schedulers for NLDS (21a). Meanwhile, for NLDS (21b), the MSE of the count range query response is similar for all three schedulers. Finally, even with an increase in the number of clients, the performance of the proposed scheduler does not degrade relative to the benchmark schedulers.
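The four query types can be sketched as simple functions of a state vector, as shown below. This is an illustrative sketch rather than the implementation used for our results: the count range query is assumed to return the number of state components inside a closed interval, the variance is assumed to be the unbiased sample variance, and the state values and interval bounds are hypothetical.

```python
from statistics import mean, variance


def count_range_query(x, low, high):
    """Number of state components lying in [low, high] (assumed definition)."""
    return sum(1 for v in x if low <= v <= high)


def query_squared_errors(x_true, x_est, low, high):
    """Squared error of each query response computed from an estimate."""
    queries = {
        "maximum": max,
        "count_range": lambda x: count_range_query(x, low, high),
        "sample_mean": mean,
        "sample_variance": variance,  # unbiased (n - 1) sample variance
    }
    return {name: (q(x_true) - q(x_est)) ** 2 for name, q in queries.items()}


# Hypothetical true state and state estimate of a four-dimensional system.
x_true = [1.0, 2.0, 3.0, 4.0]
x_est = [1.1, 1.9, 3.2, 3.7]

errors = query_squared_errors(x_true, x_est, low=1.5, high=3.5)
for name, err in errors.items():
    print(f"{name}: {err:.6f}")
```

Averaging such squared errors over many time steps gives a per-query MSE. In this toy example the count range response is exact even though the estimate is imperfect, whereas the maximum response inherits the full error of the largest component.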
VII Conclusion
This paper introduced a GoS method tailored for IoT sensors tasked with sensing an NLDS. The reporting operation is scheduled by the edge node, and the phrase goal-oriented in GoS emphasizes its primary objective, which is to accurately respond to client queries regarding the NLDS state. Through GoS, the edge node gathers partial yet insightful sensor observations to advance towards its objective. These observations, along with a state estimator, are used to estimate the complete NLDS state, which is later employed to generate query responses. Notably, our state estimator operates effectively without requiring a mathematical model of the NLDS. Moreover, our findings showed that the proposed GoS yields energy-efficient state observation from the sensor perspective.
Our work here considers only a single RL agent due to the centralized nature of the scheduling. A promising avenue for future research would be to adapt the proposed goal-oriented sensor scheduling framework to a multi-agent RL system, such as an unmanned aerial vehicle swarm where each RL agent acts as a sensor scheduler.
References
- [1] O. L. A. López, O. M. Rosabal, D. E. Ruiz-Guirola, P. Raghuwanshi, K. Mikhaylov, L. Lovén, and S. Iyer, “Energy-sustainable IoT connectivity: Vision, technological enablers, challenges, and future directions,” IEEE Open Journal of the Communications Society, vol. 4, pp. 2609–2666, 2023.
- [2] P. Di Lorenzo, M. Merluzzi, F. Binucci, C. Battiloro, P. Banelli, E. C. Strinati, and S. Barbarossa, “Goal-oriented communications for the IoT: System design and adaptive resource optimization,” IEEE Internet of Things Magazine, vol. 6, no. 4, pp. 26–32, 2023.
- [3] C. Zhang, H. Zou, S. Lasaulce, W. Saad, M. Kountouris, and M. Bennis, “Goal-oriented communications for the IoT and application to data compression,” IEEE Internet of Things Magazine, vol. 5, no. 4, pp. 58–63, 2022.
- [4] A. Hashemi, M. Ghasemi, H. Vikalo, and U. Topcu, “Randomized greedy sensor selection: Leveraging weak submodularity,” IEEE Transactions on Automatic Control, vol. 66, no. 1, pp. 199–212, 2021.
- [5] F. Chiariotti, A. E. Kalør, J. Holm, B. Soret, and P. Popovski, “Scheduling of sensor transmissions based on value of information for summary statistics,” IEEE Networking Letters, vol. 4, no. 2, pp. 92–96, 2022.
- [6] J. Holm, F. Chiariotti, A. E. Kalør, B. Soret, T. B. Pedersen, and P. Popovski, “Goal-oriented scheduling in sensor networks with application timing awareness,” IEEE Transactions on Communications, vol. 71, no. 8, pp. 4513–4527, 2023.
- [7] D. Gündüz, F. Chiariotti, K. Huang, A. E. Kalør, S. Kobus, and P. Popovski, “Timely and massive communication in 6G: Pragmatics, learning, and inference,” IEEE BITS the Information Theory Magazine, vol. 3, no. 1, pp. 27–40, 2023.
- [8] Z. Liu, A. Clark, P. Lee, L. Bushnell, D. Kirschen, and R. Poovendran, “Towards scalable voltage control in smart grid: A submodular optimization approach,” in Proceedings of the ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS), 2016, pp. 1–10.
- [9] V. Tzoumas, M. A. Rahimian, G. J. Pappas, and A. Jadbabaie, “Minimal actuator placement with optimal control constraints,” in Proceedings of the American Control Conference (ACC), 2015, pp. 2081–2086.
- [10] A. Li, S. Wu, S. Meng, and Q. Zhang, “Towards goal-oriented semantic communications: New metrics, open challenges, and future research directions,” arXiv preprint arXiv:2304.00848, 2023.
- [11] S. K. Nanda, “Advanced Kalman filtering with applications to power system and epidemiological data analysis,” PhD dissertation, Indian Institute of Technology Indore, May 2023.
- [12] G. Valverde and V. Terzija, “Unscented Kalman filter for power system dynamic state estimation,” IET Generation, Transmission & Distribution, vol. 5, pp. 29–37, Jan. 2011.
- [13] O. Nabati, T. Zahavy, and S. Mannor, “Online limited memory neural-linear bandits with likelihood matching,” in Proceedings of the International Conference on Machine Learning (ICML), Jul. 2021, pp. 7905–7915.
- [14] “TensorFlow.” [Online]. Available: https://www.tensorflow.org/api_docs/python/tf/clip_by_global_norm