AI Enabled Distributed Edge Computing - MS Thesis
Fall 11-10-2020
Recommended Citation
Fragkos, Georgios. "Artificial Intelligence Enabled Distributed Edge Computing for Internet of Things
Applications." (2020). https://digitalrepository.unm.edu/ece_etds/494
by
Georgios Fragkos
THESIS
Master of Science
Computer Engineering
December, 2020
Dedication
To my parents, Kostas and Gioula, and brother John, who have constantly
supported, loved, and encouraged me throughout my graduate degree.
Acknowledgments
I would like to thank my Ph.D. advisor, Dr. Eirini Eleni Tsiropoulou, for the
opportunity that she provided me with to pursue my research interests in Reinforce-
ment Learning and Game Theory, and for being there for me when I needed guidance.
I also want to acknowledge both Dr. Pattichis and Dr. Sun for being a part of my
thesis committee.
I would also like to thank my friends from the PROTON lab for their help,
especially when I was new to this type of research environment. I want to say thank
you to all of my colleagues that helped me to overcome any obstacles that I came
across during my research journey.
Finally, I want to express my gratitude towards my parents and brother for
providing me with unfailing support and continuous encouragement throughout my
years of study. This accomplishment would not have been possible without them.
Thank you.
Artificial Intelligence Enabled Distributed
Edge Computing for Internet of Things
Applications
by
Georgios Fragkos
Abstract
Artificial Intelligence (AI) based techniques are typically used to model decision
making in terms of strategies and mechanisms that can lead to optimal payoffs
for a number of interacting entities, which often exhibit competitive behaviors. In this
thesis, an AI-enabled multi-access edge computing (MEC) framework is proposed,
supported by computing-equipped Unmanned Aerial Vehicles (UAVs) to facilitate
Internet of Things (IoT) applications. Initially, the problem of determining the
IoT nodes' optimal data offloading strategies to the UAV-mounted MEC servers,
while accounting for the IoT nodes' communication and computation overhead, is
formulated based on a game-theoretic model. The existence of at least one Pure
Nash Equilibrium (PNE) point is shown by proving that the game is submodular.
Furthermore, different operation points (i.e., offloading strategies) are obtained and
studied, based either on the outcome of the Best Response Dynamics (BRD) algorithm,
or via alternative reinforcement learning approaches, such as gradient ascent, log-
linear and Q-learning algorithms, which explore and learn the environment towards
determining the users’ stable data offloading strategies. The respective outcomes
and inherent features of these approaches are critically compared against each other,
via modeling and simulation.
Contents
List of Figures
List of Tables
Glossary
1 Overview
1.1 Introduction
1.3 Contributions
1.4 Outline
2.4.3 Q-Learning
3 Experiments
References
List of Figures
List of Tables
Glossary
$t$ : Timeslot instance
$T_d^{(t)}$ : Computation task of IoT device $d$
$I_d^{(t)}$ : Total amount of data of IoT device $d$'s task $T_d^{(t)}$
$\varphi_d^{(t)}$ : Computation intensity of IoT device $d$'s task $T_d^{(t)}$
$a_{d,j}^{(t)}$ : Percentage of the overall amount of the device's computation task's data
$R_d^{(t)}$ : IoT device $d$'s uplink data rate to the UAV-mounted MEC server
$W$ : System's bandwidth
$p_d^{(t)}$ : Device's transmission power
$g_d^{(t)}$ : Device's channel gain to communicate with the UAV
$O_{time,d}^{(t)}$ : Time overhead experienced by the IoT device $d$
$O_{energy,d}^{(t)}$ : Energy overhead experienced by the IoT device $d$
$T$ : Duration of timeslot $t$
$e_d^{(t)}$ : Energy availability of the IoT device $d$
$O_d^{(t)}$ : Total experienced normalized overhead of IoT device $d$
$O_{ij}^{tr,e}$ : Transmission energy overhead
$U_d^{(t)}$ : Utility function of IoT device $d$
$\mathbf{a}_{-d,j}^{(t)}$ : Data offloading strategy vector of all IoT devices excluding $d$
$a_{d,j}^{(t)*}$ : Optimal data offloading strategy of IoT device $d$
$\mathbf{a}_{-d,j}^{(t)*}$ : Data offloading vector of the IoT devices at the PNE point, excluding $d$
$BR_d(\mathbf{a}_{-d,j}^{(t)*})$ : IoT device $d$'s best offloading response
$P_d^{(ite)}$ : Action probability vector of IoT device $d$
$\hat{U}_d^{(t)(ite)}$ : Normalized utility function
$Q_d^{(ite)}(a)$ : Action values vector of IoT device $d$
$Q_{a_{d,j}^{(t)}}^{(ite)}$ : Q-value of an offloading strategy for IoT device $d$
Chapter 1
Overview
1.1 Introduction
The rapid deployment of Internet of Things (IoT) devices [1, 2], such as sensors,
smartphones, autonomous vehicles, wearable smart devices, along with the recent
advances in the Artificial Intelligence (AI) and Reinforcement Learning (RL) tech-
niques [3], have paved the way to a future of using distributed edge computing to assist
humans’ everyday activities, in several domains such as transportation, healthcare,
public safety and others [4–6]. The ubiquity of the IoT devices with enhanced sensing
capabilities creates increasingly large streams of data that need to be collected and
processed in an energy and time efficient manner.
This computing model operates at the edge of the network, offering computational resources closer to
the physical location of data producers/consumers [9–11].
Besides Game Theory, another important tool that enables the research
community to take advantage of AI's power is Reinforcement Learning (RL).
RL is a subset of Machine Learning, where the agents learn to achieve a goal, i.e.,
maximize the expected cumulative future reward, in an uncertain and potentially
complex environment which demonstrates dynamic variability and stochasticity [29–
32]. Since the agent’s actions have short and long term consequences, the agent
needs to gain some understanding of the complex effects its actions have on the
environment and it should find the perfect balance between exploration (exploring
potential hypotheses in terms of choosing its actions) and exploitation (exploiting
limited knowledge about what is already learned should work in a satisfactory way).
The main difference between RL and the traditional Supervised Learning [33] is
that there is no need for labeled input/output pairs and that RL focuses on finding a
balance between exploration and exploitation, achieving thus near-optimal solutions.
The latter observation reveals that the reinforcement learning techniques can be
applied in a real-time decision-making problem, which is and important component
within the dynamically changing networking and communications environments. The
selected actions of the agents transition the current state of the environment to
the next state and finally the agents experience a reward as a feedback from the
environment.
by jointly optimizing the devices’ data offloading, transmission power, and the UAVs’
trajectory. In [42], the problem of partial data offloading from the IoT devices to
ground or UAV-mounted MEC servers is studied so that the devices satisfy their
minimum Quality of Service (QoS) prerequisites, by adopting the novel concept of
Satisfaction Equilibrium. In [43], the authors target the UAVs' energy efficiency,
aiming to extend the UAVs' battery lifetime by jointly optimizing their
hovering time and the devices' scheduling and data offloading, while considering the
constraints of the UAVs' computation capability and the devices' QoS requirements.
A similar problem is studied in [44] by exploiting the uplink and downlink communication
between the devices and the UAVs, in terms of offloading and receiving data
respectively, while guaranteeing the energy-efficient operation of the overall system.
order to introduce a more social behavior to the users with respect to competing for
the UAV-mounted MEC servers’ computation resources.
In [49], the UAVs act as cache and edge computing nodes, and two sequentially
solved optimization problems are considered, to minimize the communication and
computation delay and maximize the energy efficiency of the system. In [50], the
UAVs act both as MEC servers and as wireless power transfer nodes charging the IoT
devices. The problem of maximizing the UAVs’ computation rate is examined under
the UAVs’ energy provision and speed constraints. This problem has been extended
in [51] by studying the minimization of the overall system’s energy consumption by
jointly optimizing the devices’ association to the UAVs, the UAVs’ flying time, and
their wireless powering duration.
The authors in [52] study a MEC environment, where a UAV is served by cellular
ground base stations (GBSs) for computation offloading. Since they aim at minimiz-
ing the UAV’s computation offloading scheduling time by optimizing its trajectory
subject to the maximum speed constraint of the UAV and the computation capacity
constraints at the GBSs, they propose an iterative algorithm based on Successive
Convex Approximation (SCA), which obtains near-optimal solutions. In [53] and [54],
the traditional problem where a set of ground users offload tasks to a UAV has been
extended, since the authors examine a UAV-assisted MEC architecture where the
UAV has a twofold role, i.e., contributing to the users' task execution and acting as a
relay node for offloading the users' received computation tasks to an access point
(AP). The non-convex minimization of the weighted sum energy of both the users
and the UAV is achieved using a centralized iterative algorithm. Similarly, in [55]
a two-hop uplink communication scheme for Space-Air-Ground Internet of Remote Things
(SAG-IoRT) networks is studied, assisted by UAV relays in order to facilitate
complete offloading of the terrestrial smart devices' data to satellites. The authors target
the maximization of the whole system's energy efficiency by jointly optimizing the
subchannel selection, uplink transmission power control, and the UAV relay deployment.
The authors in [56] introduce a UAV-enabled MEC system, where the UAVs act
jointly as relay and data processing nodes to support the communication and com-
puting demands of the ground devices. A joint optimization problem is formulated
to minimize the service delay of the ground devices and the UAVs by determining
the UAVs' optimal positions, the communication and computing resource allocation,
and the devices’ task splitting.
Additionally, many research papers deal with the problem of data offloading
among a cluster of UAVs. More specifically, in [59] the authors propose the Fog
Computing aided Swarm of Drones (FCSD), where a drone will have a computation
task to execute and will partially offload its data to nearby drones in order to perform
the computations, thus acting as fog nodes. The scope of this research work is to
minimize the energy consumption of the FCSD system subject to the reliability and
latency constraints by introducing and utilizing an iterative distributed algorithm
based on the Proximal Jacobi method. As far as [60] is concerned, a network of
capacitated UAV-mounted cloudlets (NUMC) covering a region is considered, where
each UAV is endowed with limited computational resources and a restricted capacity
for providing edge computing services to IoT users in that region. The UAVs perform
binary offloading to other UAVs and as a consequence the authors formulate an ex-
act potential game in order to capture the UAVs’ competitive behavior in terms of
minimizing their energy consumption with respect to the QoS satisfaction of the IoT
users’ requirements. Moreover, the research work [61] proposes a task-scheduling
algorithm based on reinforcement learning targeting at the collaboration of multiple
UAV tasks within a UAV cluster. Specifically, the proposed algorithm enables the
UAV to adjust its task strategy automatically using its calculation of task perfor-
mance efficiency, while reinforcement learning has been deployed in order the UAVs
to learn tasks according to real-time data and as a consequence to perform decision
making regarding the channel allocation problem in a distributed manner.
order to maximize the total system throughput within the lifetime of the SUAVs,
by optimizing the energy harvesting and resource allocation of the power cognitive
SUAVs.
In [67] the authors focus on the UAVs’ data offloading to a MEC server and thus,
they formulate a multi-nature strategy non-cooperative game among the UAVs tak-
ing into consideration the energy consumption, time delay and computation cost.
As a result, they prove the existence of a Pure Nash Equilibrium and propose a
distributed algorithm to determine the UAVs' strategies at the PNE point. This re-
search work is further extended in [68], where the authors also aim at minimizing
a combination of energy overhead and delay for each UAV concurrently. Addition-
ally, in [69] the authors examine a UAV-assisted crowd surveillance use case, where
the UAVs acquire videos from cameras on the ground and they perform computation
either on board or at the ground servers. The research work in [70] studies the joint
optimization problem of the UAV’s trajectory and radio resource allocation via a
Successive Convex Approximation (SCA) technique, in order to maximize the num-
ber of served devices in terms of achievable uplink data rate. In [71], the UAV's
flight time is minimized by optimizing its altitude, while jointly maximizing the number
of bits offloaded by the ground devices.
However, despite the significant advances achieved by the
aforementioned research efforts, the problem of the IoT devices' distributed and
autonomous decision-making with respect to their data offloading strategies, towards
jointly optimizing their communication and computation overhead, has not yet been
fully explored, especially under the prism of artificial intelligence. In this thesis,
a field of IoT devices is considered supporting latency- and energy-sensitive IoT
applications. Accordingly, each IoT device has the option to execute its computation
task either locally or to offload part of it to a UAV-mounted MEC server, by considering
the joint optimization of the involved communication and computation overhead.
1.3 Contributions
The key technical contributions of this thesis are summarized as follows. First of all,
we model and formulate the IoT devices' communication, computation, and energy
overhead due to data offloading and, based on this, the utility that each device obtains by
offloading and processing its computation task's data at the UAV-mounted MEC
server is reflected in representative functions.
Moreover, in order to capture the competitive behavior of the IoT devices, we formulate
a non-cooperative game among them, in which each device aims at maximizing its own utility
at every timeslot, while considering at the same time the experienced communica-
tion and computation time overhead, from offloading and processing their data at
the UAV. As a consequence, this process enables the devices to learn from history,
scrutinize the performance of other nodes, and adjust their behavior accordingly.
We also show the existence of at least one Pure Nash Equilibrium (PNE) point, by
proving that the game is submodular. Thus, we introduce a best response dynamics
approach which converges to a PNE.
1.4 Outline
The rest of this thesis is organized as follows. In Section 2.1 we present the for-
mulated IoT devices’ communication and computation overhead, while in Section
2.2 we model the experienced utility of each device. Furthermore, in Section 2.3 we
present our proposed game-theoretic distributed edge computing framework, by first
formulating a non-cooperative game among the IoT devices and afterwards proving
that there is at least one Pure Nash Equilibrium (PNE), in Sections 2.3.1 and 2.3.2
respectively. Then, in Section 2.3.3 we introduce a best response dynamics method
that allows the IoT devices to converge to the aforementioned PNE. In Section 2.4 we
introduce three different families of Reinforcement Learning algorithms, aiming at
enabling the IoT devices to converge to a PNE in an autonomous and distributed manner.
Specifically, in Section 2.4.1 we present the Linear Reward-Inaction (LRI) algorithm,
while Sections 2.4.2 and 2.4.3 present the Binary Log-Linear (BLLL) and the stateless
Q-Learning algorithms, respectively. Finally, detailed numerical results and a comparative
performance evaluation of the different proposed approaches are provided in
Chapter 3, while Chapter 4 concludes this master's thesis.
Chapter 2

AI-enabled Distributed Edge Computing System for IoT Applications
cost. The IoT device’s d set of data offloading strategies at timeslot t is denoted as
(t) (t) (t) (t) (t)
Ad = {ad,min , . . . , ad,j , . . . , ad,max }, where ad,j œ [0, 1] is a percentage of the overall
amount of the device’s computation task’s data.
where $W$ [Hz] is the system's bandwidth, $p_d^{(t)}$ is the device's transmission power, and
$g_d^{(t)}$ is the device's channel gain to communicate with the UAV at timeslot $t$.
Each device's transmission power is considered fixed in the following analysis and its
absolute value depends on its hardware characteristics. Also, following the NOMA
and SIC principles [73], without loss of generality, we consider $g_{|D|}^{(t)} \leq \dots \leq g_d^{(t)} \leq \dots \leq g_1^{(t)}$;
thus, the interference that the IoT device $d$ experiences is $\sigma_o^2 + \sum_{d' \geq d+1}^{|D|} p_{d'}^{(t)} \cdot g_{d'}^{(t)}$,
where $\sigma_o^2$ is the variance of the Additive White Gaussian Noise [74].
The first term of Eq. (2.2) represents the communication time overhead that the IoT
device experiences to offload its data to the UAV, while the second term captures the
experienced computation time overhead. Also, as observed in the denominator
of the second term of Eq. (2.2), each IoT device exploits only a portion of the UAV's
computation capability, as the latter is shared in a fair manner among the IoT devices
with respect to how many bits they offload to the UAV.
Furthermore, the energy overhead that each IoT device experiences by offloading
its computation task's data to the UAV at timeslot $t$ is given as follows:

$$O_{energy,d}^{(t)} = \frac{a_{d,j}^{(t)} \cdot I_d^{(t)}}{R_d^{(t)}} \cdot p_d^{(t)} \qquad (2.3)$$

Accordingly, the total normalized overhead experienced by the IoT device $d$ is:

$$O_d^{(t)} = \frac{O_{time,d}^{(t)}}{T} + \frac{O_{energy,d}^{(t)}}{e_d^{(t)}} \qquad (2.4)$$
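To make the overhead model concrete, the following Python sketch evaluates Eqs. (2.2)-(2.4) for a single device. The exact denominator of Eq. (2.2) is not reproduced in this excerpt, so the sketch assumes, as described above, that the UAV's computation capability $F_{UAV}$ is shared among the devices proportionally to their offloaded bits; all names and values are illustrative.

```python
def total_overhead(a_d, I_d, phi_d, R_d, p_d, e_d, T,
                   offloaded_bits_all, F_uav):
    """Normalized overhead of one IoT device (sketch of Eqs. 2.2-2.4).

    a_d  : offloading percentage a_{d,j}        I_d  : task data [bits]
    phi_d: computation intensity [cycles/bit]   R_d  : uplink rate [bits/sec]
    p_d  : transmission power [Watts]           e_d  : energy availability [Joules]
    T    : timeslot duration [sec]              F_uav: UAV capability [cycles/sec]
    offloaded_bits_all : total bits offloaded by all devices in the timeslot
    """
    t_comm = a_d * I_d / R_d                      # communication time (1st term of Eq. 2.2)
    # Assumed proportional-fair share of the UAV CPU (see discussion of Eq. 2.2).
    f_share = F_uav * (a_d * I_d) / offloaded_bits_all
    t_comp = a_d * I_d * phi_d / f_share          # computation time (2nd term of Eq. 2.2)
    O_time = t_comm + t_comp
    O_energy = (a_d * I_d / R_d) * p_d            # transmission energy (Eq. 2.3)
    return O_time / T + O_energy / e_d            # total normalized overhead (Eq. 2.4)
```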
In this section, we cast the IoT devices’ distributed data offloading problem into the
analytical framework of non-cooperative game theory. Initially, the non-cooperative
data offloading game among the IoT devices is formulated, while subsequently an
analytical solution is provided to determine a Pure Nash Equilibrium point of the
game.
Each IoT device aims at maximizing its perceived utility, as expressed in Eq.2.5, at
each timeslot in order to improve its perceived benefit from offloading and processing
its data at the UAV-mounted MEC server, while mitigating its personal cost, as ex-
pressed by its experienced overhead (Eq.2.4). Thus, the corresponding optimization
problem for each IoT device is expressed as the maximization of its utility, as follows.
$$\max_{a_{d,j}^{(t)} \in \mathcal{A}_d^{(t)}} U_d^{(t)}(a_{d,j}^{(t)}, \mathbf{a}_{-d,j}^{(t)}) = b \cdot e^{\frac{a_{d,j}^{(t)}}{\sum_{\forall d' \neq d,\, d' \in D} a_{d',j'}^{(t)}}} - c \cdot e^{O_d^{(t)}} \qquad \text{s.t. } a_{d,j}^{(t)} \in \mathcal{A}_d^{(t)} \qquad (2.6)$$
Based on the maximization problem in Eq. (2.6), we observe that the IoT devices'
data offloading strategies are interdependent, and the devices demonstrate competitive
behavior in terms of exploiting the UAV's computing capabilities. Thus, the
utility maximization problem in Eq. (2.6) is treated as a non-cooperative game
among the IoT devices. Let $G = [D, \{\mathcal{A}_d^{(t)}\}_{d \in D}, \{U_d^{(t)}\}_{d \in D}]$ denote the Distributed
Data Offloading (DDO) game played among the IoT devices at each timeslot $t$,
where, as mentioned before, $D$ is the set of IoT devices, $\mathcal{A}_d^{(t)}$ is the data offloading
strategy set of each device $d \in D$, and $U_d^{(t)}$ denotes the device's utility.
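For illustration, a direct Python transcription of the utility in Eq. (2.6) is sketched below; the weights $b$ and $c$ are the values used later in Chapter 3, while the overhead value is assumed to be computed as in the sketch following Eq. (2.4).

```python
import math

def utility(a_d, a_others_sum, O_d, b=0.74, c=0.0043):
    """Utility of IoT device d (Eq. 2.6).

    a_d          : device d's offloading percentage a_{d,j}
    a_others_sum : sum of the offloading percentages of all other devices
    O_d          : device d's total normalized overhead (Eq. 2.4)
    """
    reward = b * math.exp(a_d / a_others_sum)   # benefit from offloading to the UAV
    cost = c * math.exp(O_d)                    # communication/computation cost
    return reward - cost

# Example: a device offloading 40% of its data while the rest of the
# devices offload 100.0 percentage units in total.
print(utility(a_d=0.4, a_others_sum=100.0, O_d=0.5))
```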
The solution of the DDO game should determine an equilibrium point, where
the IoT devices have maximized their perceived utility by selecting their optimal
data offloading strategy $a_{d,j}^{(t)*}$. If the DDO game has a feasible PNE point, then at
that point, no device has the incentive to unilaterally change its equilibrium data
offloading strategy $a_{d,j}^{(t)*}$, given the strategies of the rest of the devices, as it cannot
further improve its perceived utility. More precisely, the PNE of the non-cooperative
DDO game is defined as follows.
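Definition 1 itself falls on a page that is not reproduced here; a standard statement, consistent with the surrounding discussion, would be the following. A data offloading vector $\mathbf{a}^{(t)*} = (a_{1,j'}^{(t)*}, \dots, a_{|D|,j'}^{(t)*})$ is a PNE of the DDO game if, for every IoT device $d \in D$, it holds that $U_d^{(t)}(a_{d,j}^{(t)*}, \mathbf{a}_{-d,j}^{(t)*}) \geq U_d^{(t)}(a_{d,j}^{(t)}, \mathbf{a}_{-d,j}^{(t)*})$, $\forall a_{d,j}^{(t)} \in \mathcal{A}_d^{(t)}$.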
Based on Definition 1, we conclude that the existence of a PNE in the DDO game
guarantees the stable operation of the distributed edge computing system, while the
IoT devices maximize their perceived utility. On the other hand, if the DDO game
does not have at least one PNE, that is translated to an unsteady and unstable state
of the examined system.
The theory of S-modular games is adopted in order to show the existence of at least
one PNE for the DDO game [26, 76]. The basic intuition of submodular games
is that an increase in one player's action, for given actions of the rivals, reinforces the
desire of all other players to decrease their actions because of strategic substitutability. S-modular
games have gained great attention in resource allocation problems in wireless
networks [77–80] because: a) the existence of a Pure Nash Equilibrium in S-modular games
can be proved; b) if each player initially adopts its lowest or its largest
strategy, it converges monotonically to an equilibrium, which depends on the
initial state; and c) if the S-modular game has a unique Nash Equilibrium,
then it is dominance solvable and learning rules, such as best
response dynamics, converge to it. Specifically, we show that the DDO game is submodular, which
means that when an IoT device tends to offload a large amount of data to the UAV-mounted
MEC server, the rest of the devices follow the exact opposite philosophy,
i.e., they become more conservative in terms of their data offloading, as the MEC
server is congested with tasks. Thus, in general, a submodular game is characterized
by strategic substitutes and has at least one PNE [26], [77]. Considering the DDO
game with strategy space $\mathcal{A}_d^{(t)}$, we can prove the following theorem.
Theorem 1 (Submodular Game). The DDO game $G = [D, \{\mathcal{A}_d^{(t)}\}_{d \in D}, \{U_d^{(t)}\}_{d \in D}]$
is submodular if for all $d \in D$ the following conditions hold true:

(i) $\forall d \in D$, $\mathcal{A}_d^{(t)}$ is a compact subset of the Euclidean space.

(ii) $U_d^{(t)}$ is smooth in $\mathcal{A}_d^{(t)}$ and has non-increasing differences, i.e., $\frac{\partial^2 U_d^{(t)}}{\partial a_{d,j}^{(t)}\,\partial a_{d',j'}^{(t)}} \leq 0$, $\forall d, d' \in D$, $d \neq d'$, $\forall j, j'$.
Proof. Towards proving that the DDO game is submodular, we consider that each
IoT device can partition its task into any feasible set of data and offload it to the
UAV-mounted MEC server. Thus, the strategy space $\mathcal{A}_d^{(t)} = (0, 1]$ is continuous and
a compact subset of the Euclidean space, and $U_d^{(t)}$ is a smooth function. Also, we have:

$$\frac{\partial^2 U_d^{(t)}}{\partial a_{d,j}^{(t)}\,\partial a_{d',j'}^{(t)}} = b \cdot \lambda - c \cdot \mu$$

where we set

$$\lambda = e^{\frac{a_{d,j}^{(t)}}{\sum_{\forall d' \neq d,\, d' \in D} a_{d',j'}^{(t)}}} \cdot \left( \frac{-1}{\left(\sum_{\forall d' \neq d,\, d' \in D} a_{d',j'}^{(t)}\right)^{2}} + \frac{-1}{\left(\sum_{\forall d' \neq d,\, d' \in D} a_{d',j'}^{(t)}\right)^{3}} \cdot a_{d,j}^{(t)} \right)$$

and

$$\mu = e^{O_d^{(t)}} \cdot \left( \frac{\varphi_d^{(t)} \cdot I_d^{(t)} \cdot \frac{I_{d'}^{(t)}}{B_{UAV}}}{\left[1 - \frac{\sum_{d' \neq d} a_{d',j'}^{(t)} \cdot I_{d'}^{(t)}}{B_{UAV}}\right]^{2} \cdot F_{UAV} \cdot T} \right) \cdot \left(1 + O_d^{(t)}\right).$$

Thus, we observe that $\lambda < 0$ and $\mu > 0$. Therefore, we conclude that $\frac{\partial^2 U_d^{(t)}}{\partial a_{d,j}^{(t)}\,\partial a_{d',j'}^{(t)}} < 0$ and the
DDO game is submodular. ∎
Consequently, taking into account that a submodular game has a non-empty set
of Pure Nash Equilibrium points [26], [77], we conclude that the DDO game has at
least one PNE $\mathbf{a}^{(t)*} = (a_{1,j'}^{(t)*}, \dots, a_{d,j}^{(t)*}, \dots, a_{|D|,j'}^{(t)*})$.

Towards determining the PNE of the DDO game, the Best Response Dynamics
(BRD) method [81] is adopted. The BRD is a natural method by which the IoT
devices proceed to a PNE via a local search. However, it is noted that
the quality of the PNE depends on the order in which the IoT devices update their
data offloading strategies. In this research work, we consider an asynchronous BRD
algorithm, where all the IoT devices update their data offloading strategies simultaneously.
The best response strategy of each IoT device to the other devices’ data offloading
strategies is defined as follows.
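The closed-form best response of Eq. (2.7) is not reproduced in this excerpt; the Python sketch below therefore approximates each device's best response by a discrete search over a grid of offloading percentages and iterates the simultaneous updates described above until no device changes its strategy. The helper `utility_of` is a hypothetical stand-in for Eq. (2.6) combined with the overhead model.

```python
import numpy as np

def best_response_dynamics(utility_of, n_devices, actions, max_iters=50):
    """Sketch of the BRD loop: each device picks the action that maximizes
    its utility, given the current strategies of all the other devices.

    utility_of : callable (d, a_d, strategies) -> utility of device d when it
                 plays a_d and the others keep the strategies vector (assumed helper)
    actions    : 1-D array of candidate offloading percentages in (0, 1]
    """
    strategies = np.full(n_devices, actions[0])     # start every device at its lowest strategy
    for _ in range(max_iters):
        new_strategies = strategies.copy()
        for d in range(n_devices):
            # Discrete stand-in for the closed-form best response (Eq. 2.7).
            utils = [utility_of(d, a, strategies) for a in actions]
            new_strategies[d] = actions[int(np.argmax(utils))]
        if np.array_equal(new_strategies, strategies):   # no device moved: PNE candidate
            break
        strategies = new_strategies
    return strategies
```

Starting all devices from their lowest strategy mirrors the monotone-convergence property of S-modular games noted above.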
In the gradient ascent reinforcement learning approach, the IoT devices act as Learn-
ing Automata (LA) and they learn their environment by performing gradient updates
of their perceived utility. Specifically, Learning Automata are policy iterators that
keep a vector of action probabilities over the available action set and, as is common
in Reinforcement Learning, these probabilities are updated based on feedback signals
received from the environment. These learning schemes perform very
well in game-theoretic environments, even though they do not require any information
exchange (actions, rewards, strategies) with the other players in the game. Each
device's data offloading decisions are characterized by an action probability vector
$P_d^{(ite)} = [P_{a_{d,min}^{(t)}}^{(ite)}, \dots, P_{a_{d,j}^{(t)}}^{(ite)}, \dots, P_{a_{d,max}^{(t)}}^{(ite)}]$. At each iteration of the gradient ascent
algorithm, each device probabilistically chooses its potential data offloading strategy.
The IoT devices make their stable data offloading decision if $P_{a_{d,j}^{(t)}}^{(ite)} \geq P_{thres}$, $\forall d \in D$,
where Pthres is a threshold value of the action probability. The most commonly ap-
plied gradient ascent learning algorithm is called Linear Reward-Inaction (LRI) [82]
and the corresponding action probability updating rule is given as follows [83].
$$P_{a_{d,j}^{(t)}}^{(ite+1)} = P_{a_{d,j}^{(t)}}^{(ite)} + \eta \cdot \hat{U}_d^{(t)(ite)} \cdot \left(1 - P_{a_{d,j}^{(t)}}^{(ite)}\right), \quad \text{if } a_{d,j}^{(t)}|_{ite} = a_{d,j}^{(t)}|_{ite+1} \qquad (2.8a)$$
where $\eta \in (0, 1]$ is the learning rate of the IoT devices. For large values of the
learning rate ÷, the IoT devices explore less thoroughly their available data offloading
strategies, thus they converge fast to their stable decisions, however, they achieve
lower utility. The exact opposite holds true for small values of the learning rate. The
reward that each device receives from its data offloading decision at each iteration $ite$
of the LRI algorithm is the normalized utility $\hat{U}_d^{(t)(ite)} = \frac{[U_d^{(t)}]^{(ite)}}{\sum_{d \in D}[U_d^{(t)}]^{(ite)}}$.
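A minimal Python sketch of the LRI update for a single device is given below. The complementary decrease rule for the non-selected strategies (Eq. 2.8b) is not reproduced in this excerpt, so the sketch uses its standard form; the normalized reward is assumed to be supplied by the environment.

```python
import numpy as np

def lri_update(probs, chosen_idx, reward_norm, eta=0.3):
    """One LRI iteration for one IoT device (Eq. 2.8a plus the standard
    complementary rule for the non-selected strategies).

    probs       : current action probability vector P_d (sums to 1)
    chosen_idx  : index of the data offloading strategy selected this iteration
    reward_norm : normalized utility in [0, 1] received for that choice
    eta         : learning rate (eta = 0.3 is the value used in Chapter 3)
    """
    probs = probs.copy()
    probs[chosen_idx] += eta * reward_norm * (1.0 - probs[chosen_idx])   # Eq. (2.8a)
    for j in range(len(probs)):
        if j != chosen_idx:
            probs[j] -= eta * reward_norm * probs[j]                     # assumed Eq. (2.8b) form
    return probs / probs.sum()   # renormalize to guard against numerical drift

# Example: 5 offloading strategies, uniform start, strategy 3 received reward 0.8.
print(lri_update(np.ones(5) / 5, chosen_idx=3, reward_norm=0.8))
```

A device is considered converged once one entry of its probability vector exceeds the threshold $P_{thres}$, as described above.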
Since the rest of the LRI algorithm contains only algebraic calculations of constant
time complexity, i.e., O(1), the aforementioned overall complexity holds true.
$$P_{a_{d,j}^{(t)}}^{(ite+1)} = \frac{e^{[U_d^{(t)}]^{(ite)} \cdot \beta}}{e^{[U_d^{(t)}]'^{(ite)} \cdot \beta} + e^{[U_d^{(t)}]^{(ite)} \cdot \beta}}, \quad \text{if } a_{d,j}^{(t)}|_{ite+1} \neq a_{d,j'}^{(t)}|_{ite} \qquad (2.9b)$$
where $\beta \in \mathbb{R}^+$ is the learning parameter, and for large values of $\beta$ the IoT devices
explore more thoroughly their available data offloading strategies. The BLLL
algorithm converges when the summation of the devices' perceived utilities remains
approximately the same for a small number $K$ of consecutive iterations (convergence criterion).
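A sketch of one BLLL step for a single device is shown below. The trial-action selection rule (Eq. 2.9a) is not reproduced in this excerpt, so a uniformly random alternative strategy is assumed; the switching probability follows Eq. (2.9b), implemented with a numerically stable logistic form since $\beta$ can be large.

```python
import math
import random

def blll_step(current_idx, actions, utility_of, beta=1000.0):
    """One Binary Log-Linear Learning step for one IoT device.

    current_idx : index of the currently played offloading strategy
    actions     : list of available offloading percentages
    utility_of  : callable a -> utility of this device when playing a, with the
                  other devices' strategies held fixed (assumed helper)
    beta        : learning parameter (beta = 1000 is the value used in Chapter 3)
    """
    # Assumed trial rule: pick a uniformly random alternative strategy.
    trial_idx = random.choice([j for j in range(len(actions)) if j != current_idx])
    u_curr = utility_of(actions[current_idx])
    u_trial = utility_of(actions[trial_idx])
    # Eq. (2.9b): P(switch) = exp(beta*u_trial) / (exp(beta*u_curr) + exp(beta*u_trial)),
    # rewritten as a stable logistic of the utility difference.
    x = beta * (u_trial - u_curr)
    p_switch = 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x))
    return trial_idx if random.random() < p_switch else current_idx
```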
2.4.3 Q-Learning
An indicative way to estimate the aforementioned $Q_{a_{d,j}^{(t)}}^{(ite)}$ value is based on the
standard Q-Learning update rule, which is given as follows.
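The rule as printed in the thesis falls on a page that is not reproduced in this excerpt; its standard stateless form, with learning parameter $\theta$ and the device's normalized utility serving as the reward, would be:

$$Q_{a_{d,j}^{(t)}}^{(ite+1)} = Q_{a_{d,j}^{(t)}}^{(ite)} + \theta \cdot \left(\hat{U}_d^{(t)(ite)} - Q_{a_{d,j}^{(t)}}^{(ite)}\right)$$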
where $\theta \in (0, 1]$ is the learning parameter. Since each IoT device selects an offloading
strategy at each iteration ite, we introduce the widely used action selection rule
known as the greedy approach. According to the greedy rule, the IoT devices select
the offloading strategies with the highest expected utility (Eq.2.12), thus they only
exploit the knowledge that is acquired up to the iteration ite.
$$a_{d,j}^{(t)}|_{ite+1} = \arg\max_{a_{d,j}^{(t)} \in \mathcal{A}_d^{(t)}} Q_d^{(ite)}(a) \qquad (2.12)$$
The proposed Q-Learning algorithm that converges to a PNE of the DDO game is
described extensively in Algorithm 4. We indicate as Ite the number of epochs that
the reinforcement learning algorithm will execute in order to approach a potential
Pure Nash Equilibrium at a specific timeslot t. The respective total complexity is
O(Ite·|D|), because all the IoT devices select actions with respect to the corresponding
action values based on the $\epsilon$-greedy approach, experience a reward and afterwards
they update this Q-value. All of these steps are performed in a sequential way and
since the rest of the stateless Q-Learning algorithm contains only algebraic calcula-
tions of constant time complexity, i.e., O(1), the aforementioned overall complexity
holds true.
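The following Python sketch captures one epoch of the stateless Q-learning scheme for a single device, combining ε-greedy action selection with the exponential-averaging update assumed above; the reward helper is hypothetical.

```python
import random

def q_learning_epoch(q_values, reward_of, epsilon=0.01, theta=0.6):
    """One stateless Q-learning epoch for one IoT device.

    q_values  : list of Q-values, one per data offloading strategy
    reward_of : callable idx -> normalized reward of playing strategy idx
                (assumed helper; e.g., the device's normalized utility)
    epsilon   : exploration probability (epsilon = 0 recovers the purely greedy rule)
    theta     : learning parameter (theta = 0.6 is the value used in Chapter 3)
    """
    if random.random() < epsilon:
        idx = random.randrange(len(q_values))                       # explore
    else:
        idx = max(range(len(q_values)), key=lambda j: q_values[j])  # exploit, Eq. (2.12)
    reward = reward_of(idx)
    q_values[idx] += theta * (reward - q_values[idx])               # assumed update rule
    return idx, q_values
```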
Chapter 3
Experiments
In this section, indicative numerical results are presented to illustrate the perfor-
mance of the proposed artificial intelligence-enabled distributed edge computing
framework (Section 3.2). A detailed comparative analysis is performed to gain insight
about the behavior of the different learning and exploitation approaches adopted in
this thesis, by highlighting the drawbacks and benefits of the BRD model versus
the examined reinforcement learning approaches (Section 3.3). Additional discus-
sions regarding the robustness and applicability of the proposed learning methods
are provided in Section 3.4.
We consider an environment consisting of |D| = 250 IoT devices, where each IoT
device’s distance from the UAV-mounted MEC server is randomly and uniformly
distributed in the interval (10m, 400m). The simulation parameters are as follows:
$I_d^{(t)} \in [20, 100]$ MBytes, $C_d^{(t)} \in [1, 5] \cdot 10^9$ CPU cycles, $\varphi_d^{(t)} = \frac{C_d^{(t)}}{I_d^{(t)}}$, $p_d^{(t)} \in [1.2, 2]$ Watts,
$W = 5$ MHz, $b = 0.74$, $c = 0.0043$, $B_{UAV} \geq \sum_{d \in D} I_d^{(t)}$, and $F_{UAV} = 15 \cdot 10^9$ CPU cycles/sec.
Unless otherwise explicitly stated, we consider $a_{d,min}^{(t)} \in (0, 0.2]$ and $a_{d,max}^{(t)} \in [0.8, 1.0]$
with an intermediate step of 0.05, $\eta = 0.3$, $\beta = 1000$, and $\theta = 0.6$. The proposed
framework’s evaluation was conducted via modeling and simulation and was executed
in a MacBook Pro Laptop, 2.5GHz Intel Core i7, with 16GB LPDDR3 available
RAM.
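For reference, the simulation setup above can be collected into a single configuration structure. The Python sketch below simply mirrors the stated parameter ranges; the uniform sampling of the per-device quantities is an assumption of this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_DEVICES = 250

config = {
    "distance_m": rng.uniform(10, 400, N_DEVICES),        # device-UAV distance [m]
    "I_bits": rng.uniform(20e6, 100e6, N_DEVICES) * 8,    # task data: 20-100 MBytes in bits
    "C_cycles": rng.uniform(1e9, 5e9, N_DEVICES),         # task computation [CPU cycles]
    "p_watts": rng.uniform(1.2, 2.0, N_DEVICES),          # transmission power [W]
    "W_hz": 5e6,                                           # system bandwidth [Hz]
    "b": 0.74, "c": 0.0043,                                # utility weights
    "F_uav": 15e9,                                         # UAV capability [CPU cycles/sec]
    "eta": 0.3, "beta": 1000.0, "theta": 0.6,              # learning parameters
}
config["phi"] = config["C_cycles"] / config["I_bits"]      # computation intensity [cycles/bit]
```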
Figure 3.1: BRD Average Offloaded Data & Overhead (average offloaded data [bits] and average overhead vs. BRD iterations and execution time [sec]).
In particular, Fig.3.1 presents the IoT devices’ average offloaded data to the UAV
and the corresponding experienced overhead as a function of the BRD algorithm’s
iterations and real execution time (lower and upper horizontal axis respectively).
Figure 3.2: BRD Social Welfare & Utility (social welfare and average utility vs. BRD iterations).
The results reveal that the BRD algorithm converges fast to a PNE (i.e., practically
in less than 4 iterations, equivalent to 0.18 sec). Also, the IoT devices converge
to a PNE, where they experience low average overhead (Fig.3.1) and high levels of
utility (Fig. 3.2). Moreover, by studying the BRD framework from the system’s
perspective, we observe that at the PNE high levels of social welfare are obtained
(Fig.3.2).
Figure 3.3: LRI Action Probabilities (action probabilities of an indicative IoT device vs. LRI iterations).
Fig.3.3 presents the convergence of the data offloading strategies of one indicative
IoT device to a stable data offloading decision following the LRI algorithm. It is
observed that the devices’ data offloading converge to a stable decision in less than
100 iterations i.e., 0.32 sec, following the learning procedure of the gradient ascent
Figure 3.4: LRI Average Offloaded Data & Overhead (average offloaded data [bits] and average total overhead vs. LRI iterations and execution time [sec]).
Figure 3.5: LRI Social Welfare & Utility (social welfare and average utility vs. LRI iterations).
learning algorithm. Also, Fig. 3.4, 3.5 present the convergence of the IoT devices’
average offloaded data, overhead, and utility, as well as the system’s social welfare.
The results show that the IoT devices learn in a distributed manner their surrounding
environment and they strategically decide their data offloading strategies in order to
achieve low overhead and high utility, while collectively enjoying high levels of social
welfare. Furthermore, Fig. 3.6 presents the trade-off between the achieved average
utility of the IoT devices and the corresponding execution time of the LRI algorithm
in order to converge to a stable data offloading decision as a function of the learning
parameter $\eta$. The results reveal that for increasing values of the learning parameter
$\eta$, the devices learn their environment faster and reach a data offloading decision sooner.
Figure 3.6: LRI Learning Parameter (LRI execution time [sec] and average utility vs. the learning parameter η).
However, this comes at the cost of lower achieved utility, as they under-explore their
available data offloading decisions.
Figure 3.7: BLLL Social Welfare (social welfare vs. BLLL iterations for β = 100, 500, 1000).
Fig.3.7-3.10 examine the behavior of the BLLL algorithm, for different values
of the learning parameter $\beta$, as a function of the iterations and the real execution
time. The results show that the BLLL algorithm converges to the PNE with high
probability, while the IoT devices follow a learning approach, bearing however the
cost of longer convergence time. Thus, the IoT devices converge close to the PNE
and they achieve high utility levels (Fig.3.8), and low overhead (Fig.3.10), while
intelligently deciding their data offloading strategies (Fig.3.9). Furthermore, the
system converges to high levels of social welfare (Fig.3.7). Moreover, it is observed
Figure 3.8: BLLL Average Utility (average utility vs. BLLL iterations for β = 100, 500, 1000).
Figure 3.9: BLLL Average Offloaded Data (average offloaded data vs. BLLL iterations for β = 100, 500, 1000).
that better results are achieved for higher values of the learning parameter $\beta$.
Figure 3.10: BLLL Average Total Overhead (average total overhead vs. BLLL iterations for β = 100, 500, 1000).
Figure 3.11: Q-Learning Social Welfare (social welfare vs. Q-Learning iterations for greedy (ε = 0) and ε-greedy (ε = 0.01, 0.1)).
The results show that the ε-greedy implementations with a small exploration probability
(ε = 0.01), which allow the devices to explore other data offloading strategies than the
ones that maximize the expected utilities, achieve the best results among the different
Q-learning implementations. This is due to the fact that the IoT devices can explore
alternative actions compared to the greedy Q-learning algorithm (ε = 0), where they
myopically choose the strategies that offer them the maximum expected utility. On the
other hand, if the devices over-explore alternative strategies, i.e., ε = 0.1, they deviate
from good outcomes, getting "lost" in the exploration phase.
Figure 3.13: Q-Learning Average Offloaded Data (average offloaded data [bits] vs. Q-Learning iterations for greedy (ε = 0) and ε-greedy (ε = 0.01, 0.1)).
Fig.3.15-3.17 present the system’s social welfare, the social welfare’s mean square
error with respect to the BRD model, and the execution time of all the examined
algorithms, respectively. The results reveal that the game-theoretic model - as re-
flected by the BRD algorithm - illustrates the best results, both in terms of achieved
Figure 3.14: Q-Learning Average Total Overhead (average total overhead vs. Q-Learning iterations for the ε-greedy variants).
Figure 3.15: RL Social Welfare Comparison (social welfare achieved by BRD, LRI, BLLL, and the Q-Learning variants).
social welfare and execution time. Then, the BLLL algorithm achieves the highest
social welfare among all the reinforcement learning algorithms, given its inherent
attribute of converging to a PNE with high probability, as demonstrated in the previous
subsection. On the other hand, the LRI approach, given its simplistic action update
rule (Eqs. 2.8a, 2.8b), converges fast (Fig. 3.17) to a stable data offloading vector for all
the IoT devices, while sacrificing the achieved welfare (Fig. 3.15). The Q-Learning
approaches, i.e., ε = 0, 0.01, 0.1, illustrate similar execution times (Fig. 3.17) and high
levels of social welfare (Fig. 3.15), close to the BRD algorithm's PNE outcome. In a nut-
shell, based on the results in Fig.3.16, we observe that the smallest mean square error
of the social welfare with respect to the BRD algorithm’s outcome is achieved by the
Figure 3.16: Social welfare mean square error of the reinforcement learning algorithms with respect to the BRD outcome.
Figure 3.17: RL Execution Time (execution time [sec] for BRD, LRI, BLLL, and the Q-Learning variants).
BLLL algorithm and then by the ε-greedy Q-learning algorithms with ε = 0.01 and
ε = 0.1, respectively. Also, by allowing the IoT devices to slightly deviate from the
strategies that maximize their expected utilities, they achieve better results than the
other reinforcement learning approaches, as they thoroughly explore their alternative
strategies.
The greedy Q-learning algorithm still illustrates results close to the BRD algorithm's
ones, while the LRI algorithm achieves the worst outcome in terms of the system's
social welfare. Finally, the comparative results between the different reinforcement
learning algorithms that were discussed above and presented in the graphs are all
included in Table 3.1.
Figure 3.18: BLLL MSE for different numbers of actions (social welfare mean square error with respect to BRD for 20, 100, 1,000, and 10,000 actions).
Figure 3.19: BLLL Execution Time for different numbers of actions (execution time [sec] for 20, 100, 1,000, and 10,000 actions).
Fig.3.18 presents the mean square error of the BLLL algorithm’s achieved social
welfare compared to the outcome of the BRD algorithm for 20, 100, 1,000, and 10,000
data offloading strategies, while Fig. 3.19 shows the corresponding execution time
of the BLLL algorithm. The results illustrate that as the devices’ strategy space
increases, the achieved social welfare by the BLLL algorithm approaches the corre-
sponding one by the BRD algorithm, at the cost of increased execution time.
Based on the results provided in the latter two subsections, we observe that the
game-theoretic BRD algorithm converges to better results both from the devices'
and the system's perspective, primarily due to the closed-form expression used to
determine the PNE (Eq. 2.7). Nevertheless, this requires that the devices are aware
of the closed-form solution or can extrapolate it, which bears additional overhead.
The reinforcement learning algorithms on the other hand, eliminate this assumption,
by enabling the devices to learn their environment without having a priori knowl-
edge of the optimal strategy rule. Last but not least, it should be noted that the
reinforcement learning approaches can be better applied in realistic cases where the
devices’ strategy space is not continuous as considered in the game-theoretic model
(i.e the devices may arbitrarily select any percentage of their data to offload), but in-
stead the devices are allowed to select their data offloading strategies from a discrete
Chapter 4

Conclusion and Future Works
Part of our future work is to extend and evaluate the presented framework, while
considering a setup with multiple UAV-mounted MEC servers, where the IoT devices can exploit
the different computation choices of the environment. Moreover, another aspect of
our future work is to examine the case where the actions of the IoT devices with
respect to the UAV-mounted MEC server, i.e., offloaded bits, reside in a contin-
uous space and also design a satisfactory UAV trajectory in the continuous two-
dimensional area. In this case, it becomes impossible to represent the action values
in a finite data structure such as a 1D matrix and thus we will have to construct a
non-linear function approximator via deep neural networks. As a consequence, we
will utilize Deep Reinforcement Learning (DRL) where we will deploy several Tem-
poral Difference (TD)-based Value-based methods such as Deep Q-Networks (DQN),
Double Deep Q-Networks (DDQN), Dueling Networks, as well as Policy-based meth-
ods such as Advantage Actor Critic (A2C) and Deep Deterministic Policy Gradients
(DDPG).
Additionally, we envision the integration of the blockchain data structure [84] and
of a truth-inducing, sybil-resistant decentralized blockchain oracle [85], so that the
IoT devices can vote regarding their satisfaction with the perceived Quality of Service
(QoS) and Quality of Experience (QoE) from the UAV. Moreover, another important
aspect which is interesting to examine in the future is the incentivization of the IoT
devices to offer their data to the UAV following a labor economic approach [86] as
well as the importance of the information that each IoT device wants to offload in
a public safety scenario [72]. The security aspect regarding these use cases is also
essential, since in a public safety scenario, e.g., a terrorist attack, the IoT devices may
have to mask their communication's information in a way that it is not traceable by
malicious users [87, 88].
We are also inclined to examine the case where there are multiple UAVs serving
the IoT devices and the latter have to perform autonomous decision-making
regarding to which UAV they will partially offload their data [46, 89]. In this case,
we should also consider the incentivization and management of the UAVs
to process the IoT devices' data [90, 91], as well as the resource orchestration
in such a heterogeneous communication environment [92–94], where the UAVs may
have different characteristics.
References
[1] L. Atzori, A. Iera, and G. Morabito, “The internet of things: A survey,” Com-
puter networks, vol. 54, no. 15, pp. 2787–2805, 2010.
[4] N. Hassan, S. Gillani, E. Ahmed, I. Yaqoob, and M. Imran, “The role of edge
computing in internet of things,” IEEE Communications Magazine, vol. 56,
no. 11, pp. 110–115, 2018.
[5] P. J. Werbos, “The new ai: Basic concepts, and urgent risks and opportunities
in the internet of things,” in Artificial Intelligence in the Age of Neural Networks
and Brain Computing, pp. 161–190, Elsevier, 2019.
[6] L. Lei, Y. Tan, K. Zheng, S. Liu, K. Zhang, and X. Shen, “Deep reinforcement
learning for autonomous internet of things: Model, applications and challenges,”
IEEE Communications Surveys & Tutorials, 2020.
[13] Q. Han, S. Liang, and H. Zhang, “Mobile cloud sensing, big data, and 5g net-
works make an intelligent and smart world,” IEEE Network, vol. 29, no. 2,
pp. 40–45, 2015.
[14] R. Li, Z. Zhao, X. Zhou, G. Ding, Y. Chen, Z. Wang, and H. Zhang, “Intel-
ligent 5g: When cellular networks meet artificial intelligence,” IEEE Wireless
Communications, vol. 24, no. 5, pp. 175–183, 2017.
[15] M. J. Osborne and A. Rubinstein, A course in game theory. MIT press, 1994.
[16] M. Rabin, “Incorporating fairness into game theory and economics,” The Amer-
ican economic review, pp. 1281–1302, 1993.
[19] M. Le Breton and K. Van der Straeten, “Government formation and electoral
alliances: The contribution of cooperative game theory to political science,”
Revue d’économie politique, vol. 127, pp. 637–736, 2017.
[22] J. Chen, C. Hua, and C. Liu, “Considerations for better construction and de-
molition waste management: Identifying the decision behaviors of contractors
and government departments through a game theory decision-making model,”
Journal of cleaner production, vol. 212, pp. 190–199, 2019.
[23] Z. Han, D. Niyato, W. Saad, and T. Başar, Game Theory for Next Genera-
tion Wireless and Communication Networks: Modeling, Analysis, and Design.
Cambridge University Press, 2019.
[24] R. Azad Gholami, L. K. Sandal, and J. Uboe, “Solution algorithms for optimal
buy-back contracts in multi-period channel equilibria with stochastic demand
and delayed information,” NHH Dept. of Business and Management Science
Discussion Paper, no. 2019/10, 2019.
[25] Z. Zheng, L. Song, Z. Han, G. Y. Li, and H. V. Poor, “Game theory for big data
processing: multileader multifollower game-based admm,” IEEE Transactions
on Signal Processing, vol. 66, no. 15, pp. 3933–3945, 2018.
[26] Y. Zhang and M. Guizani, Game theory for wireless communications and net-
working. CRC press, 2011.
[28] A. Agah and S. K. Das, "Preventing dos attacks in wireless sensor networks: A
repeated game theory approach," IJ Network Security, vol. 5, no. 2, pp. 145–
153, 2007.
[44] H. Guo and J. Liu, “Uav-enhanced intelligent offloading for internet of things at
the edge,” IEEE Transactions on Industrial Informatics, vol. 16, no. 4, pp. 2737–
2746, 2020.
[45] Z. Yang, C. Pan, K. Wang, and M. Shikh-Bahaei, “Energy efficient resource
allocation in uav-enabled mobile edge computing networks,” IEEE Tran. on
Wir. Com., vol. 18, no. 9, pp. 4576–4589, 2019.
[46] K. Rael, G. Fragkos, J. Plusquellic, and E. E. Tsiropoulou, “Uav-enabled hu-
man internet of things,” in 2020 16th International Conference on Distributed
Computing in Sensor Systems (DCOSS), pp. 312–319, 2020.
[47] Y. Liu, M. Qiu, J. Hu, and H. Yu, “Incentive uav-enabled mobile edge computing
based on microwave power transmission,” IEEE Access, vol. 8, pp. 28584–28593,
2020.
[48] G. Mitsis, E. E. Tsiropoulou, and S. Papavassiliou, “Data offloading in uav-
assisted multi-access edge computing systems: A resource-based pricing and
user risk-awareness approach,” Sensors, vol. 20, no. 8, p. 2434, 2020.
[49] Z. Tan, H. Qu, J. Zhao, S. Zhou, and W. Wang, “Uav-aided edge/fog computing
in smart iot community for social augmented reality,” IEEE Internet of Things
Journal, pp. 1–1, 2020.
[50] F. Zhou, Y. Wu, R. Q. Hu, and Y. Qian, “Computation rate maximization in
uav-enabled wireless-powered mobile-edge computing systems,” IEEE Journal
on Selected Areas in Communications, vol. 36, no. 9, pp. 1927–1941, 2018.
[51] Y. Du, K. Yang, K. Wang, G. Zhang, Y. Zhao, and D. Chen, “Joint resources
and workflow scheduling in uav-enabled wirelessly-powered mec for iot systems,”
IEEE Transactions on Vehicular Technology, vol. 68, no. 10, pp. 10187–10200,
2019.
[52] X. Cao, J. Xu, and R. Zhangt, “Mobile edge computing for cellular-connected
uav: Computation offloading and trajectory optimization,” in 2018 IEEE 19th
International Workshop on Signal Processing Advances in Wireless Communi-
cations (SPAWC), pp. 1–5, IEEE, 2018.
[53] X. Hu, K.-K. Wong, K. Yang, and Z. Zheng, “Uav-assisted relaying and edge
computing: Scheduling and trajectory optimization,” IEEE Transactions on
Wireless Communications, vol. 18, no. 10, pp. 4738–4752, 2019.
[54] T. Zhang, Y. Xu, J. Loo, D. Yang, and L. Xiao, “Joint computation and com-
munication design for uav-assisted mobile edge computing in iot,” IEEE Trans-
actions on Industrial Informatics, 2019.
[55] Z. Li, Y. Wang, M. Liu, R. Sun, Y. Chen, J. Yuan, and J. Li, “Energy efficient
resource allocation for uav-assisted space-air-ground internet of remote things
networks,” IEEE Access, vol. 7, pp. 145348–145362, 2019.
[56] Z. Yu, Y. Gong, S. Gong, and Y. Guo, “Joint task offloading and resource
allocation in uav-enabled mobile edge computing,” IEEE Internet of Things
Journal, vol. 7, no. 4, pp. 3147–3159, 2020.
[58] N. Cheng, W. Xu, W. Shi, Y. Zhou, N. Lu, H. Zhou, and X. Shen, “Air-ground
integrated mobile edge networks: Architecture, challenges, and opportunities,”
IEEE Communications Magazine, vol. 56, no. 8, pp. 26–32, 2018.
[60] W. Ma, X. Liu, and L. Mashayekhy, “A strategic game for task offloading among
capacitated uav-mounted cloudlets,” in 2019 IEEE International Congress on
Internet of Things (ICIOT), pp. 61–68, IEEE, 2019.
[63] F. Zhou, Y. Wu, H. Sun, and Z. Chu, “Uav-enabled mobile edge computing: Of-
floading optimization and trajectory design,” in 2018 IEEE International Con-
ference on Communications (ICC), pp. 1–6, IEEE, 2018.
[65] Z. Na, M. Zhang, J. Wang, and Z. Gao, “Uav-assisted wireless powered inter-
net of things: Joint trajectory optimization and resource allocation,” Ad Hoc
Networks, vol. 98, p. 102052, 2020.
[66] J. Zhang, M. Lou, L. Xiang, and L. Hu, “Power cognition: Enabling intelli-
gent energy harvesting and resource allocation for solar-powered uavs,” Future
Generation Computer Systems, 2019.
[67] M.-A. Messous, S.-M. Senouci, H. Sedjelmaci, and S. Cherkaoui, “A game theory
based efficient computation offloading in an uav network,” IEEE Transactions
on Vehicular Technology, vol. 68, no. 5, pp. 4964–4974, 2019.
[73] M. Liu, T. Song, and G. Gui, “Deep cognitive perspective: Resource allocation
for noma-based heterogeneous iot with imperfect sic,” IEEE Internet of Things
Journal, vol. 6, no. 2, pp. 2885–2894, 2018.
[79] E. Altman and Z. Altman, “S-modular games and power control in wireless
networks,” IEEE Transactions on Automatic Control, vol. 48, no. 5, pp. 839–
842, 2003.
[80] S. Koulali, E. Sabir, T. Taleb, and M. Azizi, “A green strategic activity schedul-
ing for uav networks: A sub-modular game perspective,” IEEE Communications
Magazine, vol. 54, no. 5, pp. 58–64, 2016.
[81] A. Matsui, “Best response dynamics and socially stable strategies,” Journal of
Economic Theory, vol. 57, no. 2, pp. 343–362, 1992.
[82] A. Nowé, P. Vrancx, and Y.-M. De Hauwere, “Game theory and multi-agent re-
inforcement learning,” in Reinforcement Learning, pp. 441–470, Springer, 2012.