Computer Networks 188 (2021) 107829
journal homepage: www.elsevier.com/locate/comnet
5G network slices resource orchestration using Machine Learning techniques✩
Nazih Salhab a,b,∗, Rami Langar a, Rana Rahim b,c
a LIGM, CNRS-UMR 8049, University Gustave Eiffel, Champs-sur-Marne 77420, France
b Doctoral School of Sciences and Technologies (DSST), Lebanese University, Tripoli, Lebanon
c Faculty of Science, Lebanese University, Tripoli, Lebanon
ARTICLE INFO
Keywords: Network slicing; Machine-Learning; Resource orchestration; 5G and beyond; OpenAirInterface OAI

ABSTRACT
To efficiently serve heterogeneous demands in terms of data rate, reliability, latency and mobility, network operators must optimize the utilization of their infrastructure resources. In this context, we propose a framework to orchestrate resources for 5G networks by leveraging Machine Learning (ML) techniques. We start by classifying the demands for resources into groups in order to adequately serve them by dedicated logical virtual networks or Network Slices (NSs). To optimally implement these heterogeneous NSs that share the same infrastructure, we develop a new dynamic slicing approach of Physical Resource Blocks (PRBs). On the one hand, we propose a predictive approach to achieve optimal slicing decisions of the PRBs from a limited resource pool. On the other hand, we design an admission controller and a slice scheduler and formalize them as Knapsack problems. Finally, we design an adaptive resource manager by leveraging Deep Reinforcement Learning (DRL). Using our 5G experimental prototype based on OpenAirInterface (OAI), we generate a realistic dataset for evaluating ML-based approaches as well as two baseline solutions (i.e., static slicing and uninformed random slicing decisions). Simulation results show that using regression trees for both classification and prediction, coupled with the DRL-based adaptive resource manager, outperforms alternative approaches in terms of prediction accuracy, resource smoothing, system utilization and network throughput.
✩ A preliminary version of this paper appeared in the proceedings of the 2019 IEEE Global Communication Conference (GLOBECOM 2019).
∗ Corresponding author at: LIGM, CNRS-UMR 8049, University Gustave Eiffel, Champs-sur-Marne 77420, France.
E-mail addresses: [email protected] (N. Salhab), [email protected] (R. Langar), [email protected] (R. Rahim).
https://doi.org/10.1016/j.comnet.2021.107829
Received 14 August 2020; Received in revised form 27 December 2020; Accepted 7 January 2021
Available online 13 January 2021
1389-1286/© 2021 Elsevier B.V. All rights reserved.

1. Introduction

Mobile networks are anticipated to provide three classes of services known as enhanced Mobile Broadband (eMBB), massive Machine Type Communication (mMTC) and ultra-Reliable Low-Latency Communication (uRLLC) [1]. According to a feasibility study technical report from Third Generation Partnership Project (3GPP) [2], each class (eMBB, mMTC and uRLLC) has its requirements in terms of throughput, mobility, reliability, latency, energy efficiency, in addition to different connectivity and traffic densities. In order to keep costs affordable, mobile network operators have to optimize their resources to serve all of these heterogeneous demands. This is particularly interesting due to natural scarcity of resources, whether external (spectrum and power) or infrastructural (compute, networking, and storage). Leveraging two enabling technologies, namely Software-Defined Networking (SDN) and Network Function Virtualization (NFV), a solution consists of provisioning dedicated logical networks, also known as Network Slices (NSs) [3], to assure required Quality of Service (QoS). The first step consists of classifying the requests. In particular, we consider QoS Class Identifiers (QCIs), in order to have each type of traffic associated to a class of service supported by a tailor-made NS. NSs are fine-tuned slices that are optimized to maximize service level objectives while meeting underlying system's constraints. Note that we denote by slicing ratios the amount of Physical Resource Blocks (PRBs), in the Radio Access Network (RAN), that are divided among the instantiated slices.

Aiming to provide an End-To-End (E2E) QoS network design, we implement four building blocks, including: (i) demands classification, (ii) optimum slicing ratios prediction, (iii) admission control coupled with scheduling, and (iv) adaptive resource management. In this context, we aim to address resource orchestration for 5G network slices. Recall that resource orchestration is about effectively consuming available scarce resources, streamlining them into capabilities and leveraging the capabilities to create added-value through an optimized performance.

Our proposed resource orchestration process acts as follows. After demands classification, we address predicting the optimum slicing ratios for several NSs sharing resources from a limited resource pool bounded by underlying systems capacities. To do so, we first propose employing
Machine Learning (ML) techniques and particularly, Regression Trees (RTs) for demands classification and slicing ratios forecasting. Recall that an RT is a graph that uses a branching method to illustrate every possible outcome of a decision. Then, we propose a Deep Reinforcement Learning (DRL) implementation for an adaptive resource manager.

We validate our proposal using an experimental 5G prototype [4], which is implemented as a micro-service oriented architecture based on OpenAirInterface™ (OAI) [5] and Docker [6] containers. Specifically, we configure multiple slices, generate a dataset and benchmark the performance of several ML models as well as two baseline solutions, namely static slicing and uninformed random slicing decisions. Accordingly, we compare the ground-truth with our predicted value of slicing ratios, and we analyze the effect of resource orchestration using four different metrics: Number of Uplink (UL) PRBs, Buffer Status Report (BSR), system utilization and network throughput.

The main contributions of our paper can be summarized as follows:

• First, we comprehensively review the state of the art on resource orchestration for 5G network slices.
• Second, we design a framework consisting of four building blocks for resource orchestration.
• Third, we provide formulation, models and algorithms to implement these building blocks. Specifically, we propose (i) using ML techniques for classification, (ii) predicting slicing ratios based on RTs, (iii) modeling admission control and scheduling as Knapsack optimization problems and (iv) leveraging DRL for adaptive resource management.
• Finally, we show the effectiveness of our proposal using our 5G experimental prototype based on OAI including ground-truth measurements and other metrics, namely, UL PRBs, BSR, system Central Processing Unit (CPU) utilization and network throughput.

The remainder of this paper is organized as follows. In Section 2, we present an overview of the related works. Section 3 describes our system design and details our proposed models for implementing the building blocks of our framework. Section 4 presents the performance evaluation including a description of our prototype, the dataset used, and the discussion of the results. We conclude this paper in Section 5.

2. Related work

Resource orchestration has triggered interest among researchers in the past few years. In what follows, we discuss a selection of relevant papers grouped by research areas.

2.1. Classification and slicing approaches

Authors in [7] proposed a framework integrating various ML algorithms, SDN and NFV. They used a traffic classification module and network slicing for self-organizing networks. In addition, they implemented such traffic classification and network slicing for eMBB, but they did not consider admission control and scheduling processes in their proposed framework.

On the other hand, authors in [8] investigated a management and orchestration architecture incorporating SDN and NFV for instantiating and managing the federated network slices. They elaborated on their proposed architecture, but they did not address the validation of such architecture in a 3GPP compliant testbed.

Authors in [9] used a DRL-based approach to allow network entities to learn about the network, aiming to make optimal decisions related to network slicing in 5G. They concluded that DRL outperforms Q-learning as well as greedy and random approaches in terms of average utility per service request, but they did not provide implementation details.

A multi-access edge computing broker to answer heterogeneous tenant demands and related privileges in a network slicing environment was proposed by authors in [10]. They devised an orchestration mechanism able to fulfill tenant requests while avoiding Service Level Agreement (SLA) violations, but they did not consider the elasticity of cloud resources.

2.2. Traffic and resource prediction approaches

Authors in [11] proposed a supervised ML model based on Decision Trees (DTs) for root-cause analysis of QoS degradation. They observed that DTs can predict future QoS anomalies with confidence in order to proactively exploit these findings. They shed some light on a real use case where inputs about path, device, board, port and link faults are processed to detect anomalies, but no validation through system implementation was included. Conversely, in this paper, we elaborate on the implementation of the 5G experimental prototype used to validate our proposal in Section 4.

Authors in [12] used feature-selection based prioritization to predict mobile traffic leveraging data from an open dataset used in a big-data challenge. They proposed to reduce the volume of traffic log data sent from base stations to the server while maintaining high prediction accuracy, but they did not consider adaptive resource management of the server.

Authors in [13] used a stochastic model to represent time series data using a hierarchical hidden Markov model, which includes two nested hidden Markov chains and one observable process. Knowing that Markov Chains (MCs) are memory-less by design, they did not consider the elasticity provided by cloud resources.

A multi-objective genetic algorithm to optimize resource allocation while minimizing CPU and memory utilization and the energy consumption was proposed by authors in [14]. Their approach consisted of forecasting the resource requirement according to historical time slots in addition to Virtual Machines (VMs) placement, whereas, in our work, our formulation applies not only to VMs but also to containers that are suitable for a cloud-native deployment [15]. We used Docker containers [6] for implementing our platform used for evaluating our proposals. Moreover, in our previous work [4], we demonstrated a micro-service based deployment of 4G EPC core using OpenAirInterface (OAI). Finally, multiple authors used machine learning-based approaches for time series predictions [16]. This includes Trees [17], K-Nearest Neighbor (KNN) [18], Discriminant-based [19], Random Forests [20], Support Vector Machine (SVM) [21] and Gaussian Process Regression (GPR) [22]. We use these techniques as baselines when benchmarking our results.

2.3. Admission control and scheduling optimization approaches

Authors in [23] presented a testbed called OVNES (OVerbooking NEtwork Slices) in charge of collecting network statistics, predicting traffic behaviors by leveraging ML and applying admission control policies to select requests increasing network efficiency and scheduling. The paper did not provide details about the implementation of these modules. The same authors in [24] designed a hierarchical control plane to manage the E2E orchestration of slices. They formulated the orchestration problem as a stochastic yield management problem and proposed optimal and heuristic approaches. However, they assumed a static maximum capacity of resources, which is not the case in cloud-oriented deployment, as there is elasticity of cloud resources through auto-scalability.

An intelligent resource scheduling strategy for 5G RAN by exploiting a collaborative learning framework leveraging deep learning and reinforcement learning was proposed in [25].

Authors in [26] designed a network slice admission control algorithm leveraging ML that learns the best acceptance policy while satisfying service guarantees to tenants. They provided an analytical
model for slice admissibility, analyzed the system using a Semi-Markov decision process and optimized the benefits using a practical ML approach.

Authors in [27] proposed an admission control algorithm using a multi-unit combinatorial auction model for fast winner determination when reserving resources with performance guarantees. They developed a reinforcement learning-based utility-maximizing strategy to distribute resources across tenants.

Authors in [28] proposed a DRL-based approach for optimizing network latency in an SDN context. They collected optimal paths from the DRL agent and aimed at predicting future demands using deep neural networks. Moreover, they formulated the flow rules placement as an integer linear program to minimize the total network delay.

Inspired by these works, we formalize four building blocks along with their models and implement the whole system in a 5G experimental prototype.

2.4. Adaptive resource management approaches

A framework for the configuration of radio resource management in a sliced RAN using static slicing ratios was proposed by authors in [29]. They evaluated the blocking rate and the throughput per data radio bearer of different types of slices. However, to achieve efficient resource allocation within a changing environment, dynamic slicing based on traffic load is necessary.

Authors in [30] designed three key building blocks for network slicing, namely a forecasting module, an admission control agent and a scheduler. They used the Holt–Winters method for traffic prediction. They tolerated some violation of SLAs for an increase in resource utilization. Details of the dataset used and its characteristics were not provided. In contrast, we include an adaptive resource manager that avoids SLA violations and guarantees isolation between slices. Also, we elaborate the details of our dataset generation.

On the other hand, authors in [31] proposed using Seasonal Auto-Regressive Moving-Average (SARIMA) for predictions. Although SARIMA is not complex, as it does not rely on predictor variables, it fails when there is an unusual growth or slowdown in the time series trends. Conversely, predictor-based approaches capture such changes through their predictors.

We propose a framework for resource orchestration in 5G networks consisting of multiple stages leveraging ML techniques, in contrast to the aforementioned approaches using the Holt–Winters method or SARIMA. Note that ML-based approaches are expected to be competitive in accuracy thanks to the data used for training the ML models [32].

As a conclusion, differently from these works, which focused on a single QoS aspect, in our work, we design a comprehensive QoS provisioning framework including classification, forecasting, admission control, scheduling and resource management to fill the gaps that we identified in the state of the art.

3. System design

The design of our network slices orchestrator, depicted in Fig. 1, is in line with the 3GPP technical specification detailing the concept and requirements for network sharing and management architecture [33]. After classifying tenants' demands, accomplished by the Gatekeeper building block, the tenants' traffic profile along with classified requirements are used to predict the adequate slicing ratios for each classified demand. This process is followed by admission control, resource management and scheduling processes. The starting point is to get NSs requirements grouped in terms of network characteristics such as spectral efficiency, latency, reliability, and energy efficiency, for a service instance [34]. We use supervised ML for implementing the classification [35]. The Decision Maker building block is composed of two sub-modules: a Forecast Aware Slicer using ML-based regression and an Admission Controller that either grants or denies resource requests according to current and predicted loads. Two outcomes are anticipated from the admission controller. The granted requests are forwarded to the Slice Scheduler to be served in the nearest time window. The observations about denied requests, in case of no admission, are sent to the adaptive Resource Manager to train its resource management techniques. We use DRL by implementing an automatic flow control system to efficiently handle high resource utilization and maximize the throughput. The Slice Scheduler provides feedback to the Decision Maker to close the optimization loop and serve the postponed requests. In what follows, we present in detail the aforementioned building blocks by formulating each sub-problem and proposing corresponding solutions.

Fig. 1. Block diagram of 5G network slice orchestrator.

3.1. Gatekeeper model and problem formulation

Based on slice blueprints, used as an input to our orchestrator in Fig. 1, we implement an initial phase of classification of the demands. It is impractical to let every use case dictate a tailor-made network slice to meet its requirements. Instead, a simple approach consists of aggregating the traffic per slice type. Table 1 reports our view of some QoS classes of traffic according to blueprints and the SLAs [36]. For instance, traffic class $l = 3$, with QCI = 65, a Guaranteed Bit Rate (GBR) bearer type, a delay budget of 10 ms, a packet loss tolerance of $10^{-2}$ and a priority of 0.7, fits the Mission Critical Push To Talk (MCPTT) service [36].

Table 1
Standardized QCI characteristics [36].

l   QCI   Bearer type   Delay budget   Loss rate    Priority
0   1     GBR           100 ms         $10^{-2}$    2
1   3     GBR           50 ms          $10^{-3}$    3
2   6     non-GBR       300 ms         $10^{-6}$    6
3   65    GBR           10 ms          $10^{-2}$    0.7
4   66    GBR           100 ms         $10^{-2}$    2

Denoting by $r_h^{(l)}(t)$ a request of a tenant $h$ for a traffic class $(l)$ over time $(t)$, every instance (or event) of a point process $\xi$ can be represented by $\xi_h^{(l)} = \sum_{t=0}^{T} \delta_t \, r_h^{(l)}(t)$ to constitute a feature vector, where $\delta_t$ denotes the Dirac function.

Using the classification formulation elaborated in [37], let us denote by $k$ a possible category, and by $\alpha_k$ the transpose of the corresponding weights vector. Recall that a weight vector is a set of parameters that are calculated during the training phase to correctly classify the training set and maximize the utility function. Our classification problem consists of assigning a score to each possible category $k$ through the multiplication, using a dot product, of the feature vector of an instance by its related weights vector. The selected category would be the one with the highest utility resulting from assigning instance $h$ to category $k$. Accordingly, our utility function can be formulated as follows.

$$\mathrm{utility}(\xi_h^{(l)}, k) = \alpha_k \cdot \xi_h^{(l)} \tag{1}$$

This formulation applies to multiple classification techniques including regression trees. Recall that a regression tree is built using a binary recursive partitioning, which is an iterative process that splits the data into partitions or branches. Then, it continues splitting each partition into smaller groups as the method moves up through each branch [17].

3.2. Decision maker model and problem formulation

3.2.1. Forecast aware slicer

A traffic profile is a graph of network traffic based on data collected over a profiling time window. This serves as an input to the orchestrator for an enriched decision making process. Based on these consolidated traffic profiles, used as an input in Fig. 1, the forecast aware slicer predicts the optimum slicing ratios for the different slices in order to have a good starting point for the slicing process.
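To illustrate how predicted slicing ratios can serve as a starting point for the split of the PRBs, the following minimal sketch converts a set of ratios into integer PRB counts. The function name `split_prbs`, the slice labels, the example ratios, the pool of 25 PRBs and the largest-remainder rounding rule are all assumptions made for illustration; they are not taken from our prototype.

```python
from typing import Dict


def split_prbs(ratios: Dict[str, float], total_prbs: int) -> Dict[str, int]:
    """Turn slicing ratios into integer PRB counts (largest-remainder rounding).

    Hypothetical helper: the rounding rule is an illustrative assumption.
    """
    if total_prbs < 0:
        raise ValueError("total_prbs must be non-negative")
    if not ratios or any(r < 0 for r in ratios.values()):
        raise ValueError("ratios must be a non-empty mapping of non-negative values")
    total_ratio = sum(ratios.values())
    if total_ratio <= 0:
        raise ValueError("at least one ratio must be positive")

    # Ideal (fractional) share of the pool for each slice.
    ideal = {s: total_prbs * r / total_ratio for s, r in ratios.items()}
    # Start from the integer part of each share.
    alloc = {s: int(share) for s, share in ideal.items()}
    # Hand the PRBs that are left to the slices with the largest remainders.
    leftover = total_prbs - sum(alloc.values())
    by_remainder = sorted(ideal, key=lambda s: ideal[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


# Assumed example: three slices sharing a hypothetical pool of 25 PRBs.
predicted = {"eMBB": 0.5, "mMTC": 0.25, "uRLLC": 0.25}
allocation = split_prbs(predicted, total_prbs=25)
print(allocation)
```

Whatever the rounding rule, the point of the sketch is that the integer counts always add up to the size of the pool, so a predicted ratio never over-commits the limited resource pool.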
In this module, we use multiple predictor variables extracted from the enriched historical traffic profile, namely: timestamp, day-of-the-week, planned-event existence, and the cloudy-conditions environmental factor, to predict optimal values for the slicing ratio. Note that cloudy conditions and other environmental factors affect the slicing forecast in two ways. From a technical point of view, radio propagation is affected in general by moisture content in the air, as water tends to absorb electro-magnetic waves, reducing the range and bandwidth of wireless systems. From an ecological point of view, cloudy conditions are usually coupled with a decrease in mobility. Accordingly, users tend to use their data connections more frequently, and the demand for radio resources typically increases.

To choose the best performing forecasting technique, we evaluated several methods including the RTs. Denoting by $X = (x_1, x_2, \ldots, x_n)' \in \mathbb{R}^n$ a vector of $n$ predictor variables, and by $y \in \mathbb{R}$ a scalar output denoting the response variable, we formulate our regression model as follows.

$$y = f(X, \beta) + e \tag{2}$$

where $e$ is an independent random noise involved in the statistical relationship between response variable $y$ and predictor variables $x_i$, allowing a non-perfect deterministic relation. Parameter $\beta = (\beta_1, \beta_2, \ldots, \beta_n)'$ is a vector of $n$ unknowns that are evaluated during the training based on the chosen regression model by minimizing the sum of squared errors $(S)$, which is considered as our cost function. In particular, our proposed objective function for growing an RT consists of minimizing the sum of squared errors over all the leaves $c$ of our tree $T$ as follows:

$$S^{*} = \operatorname{minimize} \sum_{c \in \mathrm{leaves}(T)} \sum_{g \in c} (y_g - m_c)^2 \tag{3}$$

where $m_c = \frac{1}{n_c} \sum_{g \in c} y_g$ is the prediction for leaf $c$ having $n_c$ points in it.

Please note that we will focus on the RTs as they provide the best performance compared to other methods, as we will see in the evaluation section.

3.2.2. Proposed algorithm for growing regression trees

We propose a simple yet efficient algorithm (Algorithm 1) for growing regression trees. Starting initially with a current node containing all the $(M)$ points, we calculate the prediction for leaf $(c)$ and the cost function $(S^{*})$. Note that the object node is an internal variable denoting the current node in which the decomposition is taking place. Let us denote by $(q)$ a quality indicator that is a function of the minimum leaf size for the sought tree and by parameter $(\varepsilon_0)$ the threshold for the largest variation of $\Delta(S)$. The search for optimum $S^{*}$ is repeated until either the largest decrease of $S$ would be less than $\varepsilon_0$ or one of the resulting nodes would contain less than $q$ points. The result of this algorithm is a grown tree and its depth that allows us to predict the adequate slicing ratios. These are used as a starting point for the split of the PRBs. Afterwards, the admission control takes place.

Algorithm 1: ML-based Regression Tree Growing
Data: M points of k predictors and their responses
Result: Grown Tree T for responses and its depth δ
 1 do
 2   Initialize a single node containing all M points;
 3   Calculate m_c and S;
 4   if ∀ points in current node, predictors are same then
 5     break;
 6   else
 7     search over all binary splits of all variables for the one which reduces S
 8   end
 9   if (Max(Δ(S)) < ε_0 or ∃ cardinal(node) < q) then
10     break;
11   else
12     take that split and create 2 new nodes
13   end
14 while no more new nodes;
15 S* ← S;
16 return grown tree and its depth δ

3.2.3. Complexity analysis of regression tree growing

Based on Algorithm 1, our regression tree calculates a quality condition that is used as a stopping criterion (line 9) before proceeding with the split of the data. It does this, for each predictor in every node
that is not a leaf node. The process repeats as long as there are some levels (affecting the depth) to be treated. Denoting by $M$ the number of points used for training, in the best case of a balanced tree, the depth $\delta$ is $\mathcal{O}(\log_2(M))$ because of the split into two nodes (line 12). However, in the worst case of depth, $\delta$ is $\mathcal{O}(M)$ because each split decomposes the data into 1 and $(M' - 1)$ examples, where $M'$ is the number of points of the current node. Denoting by $k$ the number of predictors, the time complexity for the regression tree growing is $\mathcal{O}(k \cdot M \cdot \delta)$, which corresponds to $\mathcal{O}(k \cdot M^2)$ in the worst case or $\mathcal{O}(k \cdot M \cdot \log_2(M))$ in the best case.

3.3. Admission controller

An Admission Controller receives the requests that need to be scheduled. Based on the current system load $(l)$ and supported by the data generated by the Forecast Aware Slicer, it decides whether to grant or deny each individual request according to its priority class, as reported in Table 1.

Afterwards, granted requests are sent to the Slice Scheduler. In addition, it forwards the observations to the Resource Manager so that reinforcement learning can take place. Accordingly, for high priority resource requests, a pool re-dimensioning could take place in order to decrease the chances of service denial, taking into consideration the capabilities of the underlying system and the availability of additional infrastructure resources. We will address the priority concept in Section 3.5.

3.3.1. Problem formulation and resolution

At an instant $(t)$, we denote by $x_{ij}$ a binary decision variable indicating whether a request $j$ is served by slice $i$ and thus admitted into the system or not. Index variables $n$ and $m$ denote the number of requests and the number of slices, respectively. Each admitted request $j$ is valued at $v_j$, which reflects its individual revenue corresponding to the amount of consumed resources. In this context, we assume that a slice tenant pays a monetary amount corresponding to the consumed resources. For simplicity, we will not get into a particular pricing of a multi-tenancy environment. Several models are available online from major Cloud service providers, such as Google Cloud Platform (GCP) [38] or Amazon Web Services (AWS) [39]. We formalize the Admission Controller problem as a D-dimensional Multiple-Choice Knapsack problem that is bounded by $D$ constraints imposed by the hosting system capacities. The admission controller problem is formalized as follows.

$$\max_{x} \; a = \sum_{i=1}^{m} \sum_{j=1}^{n} v_j \, x_{ij}^{(t)} \tag{4a}$$

s.t.

$$\sum_{j=1}^{n} w_j^{(d)} \, x_{ij}^{(t)} \le C_i^{(d)(t)}, \quad d \in \{1, \ldots, D\}, \; i \in \{1, \ldots, m\} \tag{4b}$$

$$\sum_{i=1}^{m} x_{ij}^{(t)} \le 1, \quad j \in \{1, \ldots, n\} \tag{4c}$$

$$x_{ij}^{(t)} \in \{0, 1\}, \quad i \in \{1, \ldots, m\}, \; j \in \{1, \ldots, n\} \tag{4d}$$

The objective function in (4a) aims to maximize the value resulting from the resource utilization, while serving network slice requests having weights $w_j^{(d)}$ expressed in terms of the $D$ system capacities. The set of constraints (4b) specifies the constraints on the infrastructure computing resources in regard to the required demands in terms of the $D$ capacities. These resources cannot exceed the upper bounds imposed by underlying system capabilities denoted by $C_i^{(d)}$. Assuming that a single task is assigned to exactly one system, constraint (4c) enforces such exclusivity. Constraint (4d) ensures atomic mapping with binary values. In the evaluation Section 4, we will consider that the number of capacities $D$ is 2, representing the number of virtual Central Processing Units (vCPUs) and the amount of memory [14].

Problem (4) is NP-hard [40], but a solution can be obtained using a polynomial-time heuristic, listed in Algorithm 2 [41], and described as follows. The process starts by sorting the constraints on computing resources in increasing order to identify the bottleneck constraint. By bottleneck constraint, we mean the tightest of the $D$ dimensions of the D-dimensional Multiple Choice Knapsack Problem (D-MCKP), that is, the one that is consumed first when solving the mapping problem. Accordingly, the index of such bottleneck is denoted $d^{*}$. We reject any network slice requests for which requirements cannot be satisfied. Then, we calculate the efficiency of each request, defined as the ratio between the network resource utilization value $v$ of each request and the bottleneck constraint value $C^{(d^{*})}$. These values are then sorted in decreasing order starting with the requests with the highest efficiency. This process is reiterated in order to include additional requests that satisfy the constraints on resources. The algorithm stops once no more requests can be satisfied. In this case, the final network resource utilization as well as the selected set of slice requests to be served are obtained.

Algorithm 2: Heuristic for D-dimension Multiple Choice Knapsack Problem
Data: Requests r_j with their values v_j and weights w_j, systems s_i with constraints C_i^(d);
Result: Max resource utilization mapping
 1 Sort the constraints on s_i in increasing order;
 2 Prioritize the bottlenecks (to get d*);
 3 for r_j = 1 to n do
 4   for constraint d = 1 to D do
 5     if (w_j^(d) > C_i^(d)) then
 6       disregard this r_j;
 7       break;
 8     end
 9   end
10 end
11 Get shortlist of eligible requests (n')
12 for r_j = 1 to n' do
13   efficiency(r_j) ← v_j / C_i^(d*);
14 end
15 Sort requests by decreasing efficiency per bottleneck;
16 for j = 1 to n' do
17   if (r_j fits in s_i) then consider it;
18 end
19 if (r_j has non-integer variables) then disregard it;
20 Return decision variables that maximize the resource utilization;

3.3.2. Complexity analysis of the proposed D-MCKP

The algorithm used to solve our D-dimension Multiple Choice Knapsack Problem (D-MCKP) is based on the well-known Quicksort algorithm, which employs a divide and conquer strategy to do the sorting [42]. It is known that the time complexity of Quicksort of $n$ items is $\mathcal{O}(n \cdot \log_2(n))$ in both best and average cases, and $\mathcal{O}(n^2)$ in the worst case [42]. As Quicksort has the best performance in the average case for most inputs, it is generally considered the "fastest" sorting algorithm among known sorting algorithms [42]. Thus, the sort instruction (line 1) has a time complexity of $\mathcal{O}(D \cdot \log_2(D))$ as $D$ is the number of constraints. The for-loop (lines 3–10) in Algorithm 2 consists of $(D \cdot n)$ iterations in the worst case, that is, when no request violates the constraints and the for-loop is not prematurely ended. Thus, its complexity is $\mathcal{O}(D \cdot n)$. The sorting process is the time dominant task in the second part of the algorithm (lines 12–20). The for-loops in the last part of the algorithm consist of $(n')$ iterations. The second sort instruction of the algorithm (line 15) consists of a sort operation of $(n')$ numbers that should be less complex than the first sort instruction knowing that $(n' < n)$ by design. All in all, provided that $D$ and the
number of slices $m$ are far less than $n$, the time complexity of the proposed algorithm is $\mathcal{O}(n \cdot \log_2(n))$, where $n$ is the number of requests.

3.4. Slice scheduler model and problem formulation

Once the Admission Controller has mapped the demands to network slices, the mapping is addressed to the Slice Scheduler to properly serve corresponding demands with minimal time duration. We denote by $p_j$ the processing time of a transmission request $j$, out of $n$ requests, such that its time-span is $c_{jt}$. Note that, as we are considering a micro-service based architectural deployment, the processing time is independent from the processing capabilities of the processing node as each micro-service is atomic and is similar to any of its replicas [43]. Our problem consists in finding a schedule minimizing the total time duration. We define a binary decision variable $z_{jt}$ to indicate if a request $j$ is scheduled in time window $(\tau)$. Our slice scheduler can be formalized, once again, as an optimization problem as follows.

$$\min_{z} \; \dot{s} = \sum_{j=1}^{n} \sum_{t=1}^{\tau} c_{jt} \, z_{jt} \tag{5a}$$

s.t.

$$\sum_{j=1}^{n} \; \sum_{\sigma = \max\{0;\, t - p_j + 1\}}^{\tau} z_{j\sigma} \le N, \quad t \in \{1, \ldots, \tau\} \tag{5b}$$

$$\sum_{t=1}^{\tau} z_{jt} \le 1, \quad j \in \{1, \ldots, n\} \tag{5c}$$

$$z_{jt} \in \{0, 1\}, \quad j \in \{1, \ldots, n\}, \; t \in \{1, \ldots, \tau\} \tag{5d}$$

Provided that the host implementing a slice has finite capacity and can handle a maximum of up to $N$ requests concurrently, constraint (5b) stipulates that during time window $(\tau)$, up to $N$ requests can be

optimal behavior, by controlling the variation of the resource manager processing rate over time through elasticity. Such elasticity consists of scaling-in and/or scaling-out the operations capacities in terms of resources. Using a micro-service based architecture, and particularly Docker containers, allows us to seamlessly do this auto-scaling as we demonstrated in one of our previous works [4]. Note that leveraging the prediction achieved by the forecast aware slicer typically favors fewer scaling commands. In addition, with Docker containers, such elasticity guarantees that the sessions of the application are not interrupted, as reported by [46] and verified by ourselves in our previous work [4]. As it is not efficient to keep scaling-in/out on a constant basis, two questions need to be thoughtfully analyzed. First, provided that we use a particular policy $(\Pi)$, how to automatically determine an optimum flow rate $(f_{\mathrm{set}})$, to maximize the performance of the resource manager? Second, how to determine such an ideal particular policy? Thus, the rationale behind the design of our resource manager is to act as an adaptive flow control system by leveraging DRL. Indeed, using DRL allows capturing all the intricate details of the acquired knowledge and thus relieves us from explicitly doing the feature engineering process, as elaborated in the following.

3.5.1. System model

Our resource manager is supposed to adapt its available resources according to the received binary priority $\rho_t$ from the Admission Controller. Two main actions are possible: (i) increase/decrease when $\rho_t = 1$ or (ii) maintain the current processing flow rate, when $\rho_t = 0$. Accordingly, the resource manager adapts its flow rate by considering the dynamics of the resource demands. To this end, we will consider the following three variables: (i) the flow variation over time $(\gamma)$ representing the acceleration/deceleration, (ii) the flow rate per time-unit $(f)$ and (iii) the current system load $(l)$. Note that $\gamma$ has a direct impact on the cost in a cloud computing deployment [47]. Ideally, $\gamma$ should
executed. Constraint (5c) ensures that each request has to be scheduled follow the demands flow (𝛾𝑑 ) change in terms of increase/decrease.
only once. Finally, constraint (5d) stipulates that each request should To bound this flow variation, we define two parameters 𝛾min and 𝛾max
be either served in current time window 𝜏 or deferred to following time representing the minimum and maximum permissible flow variation
window. over time, respectively.
Note that 𝜏 is chosen according to the lowest granularity reported in Prior formalizing our problem, we give an illustrative example, in
the delay budget of the requested quality of services profiles, reported the context of automotive industry. Our resource manager is similar to
in Table 1. In our case, we considered 𝜏 to be 100 ms. Such a value the cruise control system that is used to automatically set the vehicle’s
is reported in [44] as the delay budget for conversational voice, Non- speed. In this case, our resource manager can decide to accelerate,
mission critical user plane Push to Talk voice, Mission critical video maintain or decelerate according to the changes in the environmental
user plane. It is worth noting that the formulated problem in (5) is NP- factors, while keeping a safe distance from the demands.
hard [40] and corresponds to a Knapsack problem, which is a particular In this context, inspired by vehicle dynamics, that maps the safe
case of a D-MCKP. The same Algorithm 2 proposed above to solve the distance between two vehicles to the velocity and the time gap in
D-MCKP problem can be thus used to solve our particular scheduling between, we formulate the load difference between the demand and the
problem in a polynomial time. In particular, we consider that D in this system load as a linear function of the flow rate per time unit and the
case to be equal to one. time gap. Accordingly, let us formalize the safety load margin (𝑙𝑔 ) that
determines the reference flow rate of our system (𝑓ref ). We consider a
3.5. Resource manager model and problem formulation simple model to let the flow rate be the distance between the demanded
and current systems load over the time in between. We formulate 𝑙𝑔 as
As stated earlier, denied requests from the Admission Controller a linear function of 𝑓 , as follows.
module are sent to the adaptive Resource Manager to train its resource 𝑙𝑔 = 𝑡gap .𝑓 + 𝑙𝑔0 (6)
management techniques and decrease the chances of service denial in
the future. where 𝑙𝑔0 is the initial load margin and 𝑡𝑔𝑎𝑝 is the time gap to transit
Let us denote by 𝑙𝑑 and 𝑙, the demanded load level and the actual from current to desired system state.
system load, respectively. These loads are expressed in bit and imply Let us denote by the relative load margin (𝑙𝑟𝑒𝑙 ) the difference
certain infrastructure requirements in terms of vCPUs and memory as between the demanded load (𝑙𝑑 ) and the system load (𝑙) as follows.
elaborated in [45]. As an initial setting, we can either be based on 𝑙rel = 𝑙𝑑 − 𝑙 (7)
readings from a similar application running on another VM/container
or by allocating the minimum of available vCPUs, that is usually a unit. The system maintains some safety margin to account for the demands
Accordingly, our goal here is to let the resource manager function variability as follows.
{
at an optimum flow rate (𝑓set ), expressed in bit/s, while maintaining min(𝑓𝑑 , 𝑓𝑠𝑒𝑡 ) if 𝑙𝑟𝑒𝑙 < 𝑙𝑔
a safety load margin (𝑙𝑔 ) to account for the shortfall between the de- 𝑓= (8)
𝑓𝑠𝑒𝑡 otherwise
manded and actual system loads. We define the flow rate as the number
of requests passing through the resource manager point in a given time Three observations are collected from the environment: (i) the flow
period usually expressed in a per-second basis. We can achieve this error (𝑒𝑡 ), defined as the difference between the reference flow rate 𝑓ref
and $f$ for each time step $t$, (ii) its integral $\int e_t \,\mathrm{d}t$, which allows eliminating the steady-state error, and (iii) the current flow ($f$), which provides a boost effect.

The mechanism is similar to a Proportional–Integral–Derivative (PID) controller, where the integral term seeks to eliminate a residual error by adding some control effect based on the historic cumulative value of the error. Accordingly, when the error is eliminated, the integral term ceases to grow. This results in diminishing the proportional effect when the error decreases, or in compensating such error by the growing integral effect.

Finally, the initial conditions of load and flow for the demands and the resource manager are denoted by ($l_{d0}$, $f_{d0}$) and ($l_0$, $f_0$), respectively.

3.5.2. Problem formulation

Keeping in mind that our objective is to maximize the resource utilization, our adaptive resource manager problem can be formalized as follows.

$$\max \sum_{t \ge 0} \rho_t \cdot l_t \quad \text{s.t. (6), (7), (8)} \tag{9}$$

Problem (9) is NP-hard as it has non-linear and conditional constraints. Accordingly, we propose to decompose it, as depicted in Fig. 2, and solve it using DRL, as explained hereafter.

First, recall that reinforcement learning consists of dynamically learning through trial and error to maximize an outcome. By following a policy $\Pi$, the system follows sample paths of state $s \in \mathcal{S}$, action $a \in \mathcal{A}$, and reward $r \in \mathcal{R}$ (e.g., $s_0, a_0, r_0, s_1, a_1, r_1$, etc.). To enable our system to learn the best set of actions autonomously, we also define a reward $r_t$, which is a function of the control input ($u_t$), the flow error ($e_t$) and a binary bias ($b_t$), as follows:

$$r_t = -(0.1\, e_t^2 + u_{t-1}^2) + b_t \tag{10}$$

Note that $e_t$ is the flow error between the desired flow rate and the flow rate from the previous iteration, while $u_{t-1}$ represents the control input from the previous time step. We consider $b_t$ as a binary variable reflecting the minimized magnitude of the flow error, such that $b_t = 1$ if $e_t^2 \le \varepsilon$, or $b_t = 0$ otherwise, with $\varepsilon$ being a small preset threshold. Since the squared error $e_t^2$ and the squared previous control input $u_{t-1}^2$ are both preceded by a negative sign, the reward expressed in (10) is large when the magnitudes of the error and of the previous control input are small, and vice versa. In addition, it is worth noting that we further decreased the impact of the error by using a 10% coefficient in front of the squared error term. This multiplier is arbitrarily chosen, less than one, to make sure that only a small proportion of the error affects the reward value.

We denote by $p$ the transition probability from a state $s$ to another, and by $\mathbb{E}$ the expectation. On the one hand, we need to find the optimal policy $\Pi^*$ that maximizes the total reward, as follows.

$$\Pi^* = \arg\max_{\Pi} \; \mathbb{E}\Big[\sum_{t \ge 0} \delta^t r_t \,\Big|\, \Pi\Big] \quad \text{s.t.} \quad s_0 \sim p(s_0), \;\; a_t \sim \Pi(\cdot \mid s_t), \;\; s_{t+1} \sim p(\cdot \mid s_t, a_t) \tag{11}$$

where $t$ is a time step and $\delta \le 1$ is a discount factor. In addition, we define the value function $V^{\Pi}(s)$ as the expected cumulative reward from following a policy $\Pi$, starting from state $s$, as follows.

$$V^{\Pi}(s) = \mathbb{E}\Big[\sum_{t \ge 0} \delta^t r_t \,\Big|\, s_0 = s, \Pi\Big] \tag{12}$$

We define the Quality (Q) value $Q^{\Pi}(s, a)$ as the expected cumulative reward from taking action $a$ in state $s$ and then following the policy $\Pi$, as follows.

$$Q^{\Pi}(s, a) = \mathbb{E}\Big[\sum_{t \ge 0} \delta^t r_t \,\Big|\, s_0 = s, a_0 = a, \Pi\Big] \tag{13}$$

On the other hand, based on the actions defined in the system model, as well as the Q-value in (13) and our designed instant reward, we can write the DRL problem for a particular policy $\Pi$ as a maximization of the expected Q-value, as follows.

$$Q^*(s, a) = \max_{\Pi} \; \mathbb{E}\Big[\sum_{t \ge 0} \delta^t r_t \,\Big|\, s_0 = s, a_0 = a, \Pi\Big] \tag{14a}$$

s.t.

$$a \in \mathcal{A}, \; s \in \mathcal{S} \tag{14b}$$

Based on the Bellman equation [48], problem (14a) can be rewritten in a recursive form, as follows.

$$Q^*(s, a) = \mathbb{E}_{s'}\Big[r + \delta \max_{a'} Q^*(s', a') \,\Big|\, s, a\Big] \tag{15}$$

where $Q^*(s', a')$ is the next time-step Q-value. To solve this problem to optimality, we can use value iteration, as follows.

$$Q_{i+1}(s, a) = \mathbb{E}\Big[r + \delta \max_{a'} Q_i(s', a') \,\Big|\, s, a\Big] \tag{16}$$

$Q_i$, expressed in (16), converges to $Q^*$ when $i \to \infty$. However, this approach is not scalable. Thus, we propose to use a function approximator to estimate $Q(s, a)$ by exploiting deep neural networks. First, we propose using a Deep Q-Learning (DQL) function approximator of the Q-value, as shown in Section 3.5.3. In order to avoid an overly complicated Q-function and to remain future-proof for a possible continuous action space, we also consider learning an optimum policy $\Pi^*$ for all possible actions [9]. Accordingly, we also propose using deep neural networks to learn both the Q-value and the optimum policy, using Q-learning and Policy Gradient methods, respectively, by training both an actor (the policy) and a critic (the Q-value), as shown in Section 3.5.4. In this case, we use a Deep Deterministic Policy Gradient (DDPG) agent that learns both $\Pi$ and $Q$ and finds an optimal policy ($\Pi^*$) that maximizes the long-term reward. We investigate the performance of both approaches in our evaluation section.

Algorithm 3: Listing of Deep Q-Learning Algorithm
Data: Replay buffer $R$, with capacity $N$
Result: Optimal $Q(\phi(s_t), a, \theta)$
 1  Initialize replay buffer $R$ to capacity $N$
 2  Initialize $Q$ with random weights
 3  while (simulation-condition) do
 4      while (episode-iteration) do
 5          Initialize sequence $s_1 = \{x_1\}$ and preprocessed sequence $\phi_1 = \phi(s_1)$
 6          for $t = 1 \dots T$ do
 7              According to probability $\epsilon$
 8                  case: exploration, select random action $a_t$
 9                  case: exploitation, select $a_t = \arg\max_a Q^*(\phi(s_t), a, \theta)$
10              Execute $a_t$ and observe $r_t$ and state $x_{t+1}$
11              Set $s_{t+1} = s_t, a_t, x_{t+1}$
12              Preprocess $\phi_{t+1} = \phi(s_{t+1})$
13              Store transition $(\phi_t, a_t, r_t, \phi_{t+1})$ in $R$
14              Get random minibatch of transitions $(\phi_i, a_i, r_i, \phi_{i+1})$ from $R$
15              if isterminal($\phi_{i+1}$) then
16                  Set $y_i = r_i$
17              else
18                  Set $y_i = r_i + \delta \max_{a'} Q(\phi_{i+1}, a', \theta)$
19              end
20              Perform a gradient descent step on $(y_i - Q(\phi_i, a_i, \theta))^2$
21          end
22      end
23  end
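As a rough illustration of the reward in (10) and of the target value computed in Lines 15–18 of Algorithm 3, the following Python sketch may help. It is not the implementation used in this work: the function names `reward` and `td_target` are hypothetical, the threshold 0.25 is the value of $\varepsilon$ reported in Table 5, and the default discount factor 0.99 is an assumption, since the value of $\delta$ is not stated here.

```python
def reward(e_t: float, u_prev: float, eps: float = 0.25) -> float:
    """Instant reward of Eq. (10): r_t = -(0.1*e_t^2 + u_{t-1}^2) + b_t."""
    # Binary bias b_t: 1 when the squared flow error is within the threshold.
    b_t = 1.0 if e_t ** 2 <= eps else 0.0
    return -(0.1 * e_t ** 2 + u_prev ** 2) + b_t


def td_target(r_i: float, next_q_values, delta: float = 0.99,
              terminal: bool = False) -> float:
    """Target y_i of Algorithm 3 (Lines 15-18).

    next_q_values holds Q(phi_{i+1}, a', theta) for every action a'.
    delta = 0.99 is an assumed discount factor, not a value from the paper.
    """
    if terminal:
        return r_i
    return r_i + delta * max(next_q_values)


if __name__ == "__main__":
    # Small error and no control effort: the bias b_t dominates the reward.
    print(reward(e_t=0.0, u_prev=0.0))
    # Large error and control effort: the reward becomes negative.
    print(reward(e_t=2.0, u_prev=1.0))
    print(td_target(1.0, [0.5, 2.0], delta=0.9))
```

The gradient descent step of Line 20 would then minimize the squared difference between this target and the current estimate $Q(\phi_i, a_i, \theta)$.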
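Returning to the system model of Section 3.5.1, the flow-selection rule in (6)–(8) can be checked numerically with a short Python sketch. This is an illustration under stated assumptions rather than the authors' code: the helper names `safety_margin` and `select_flow` are hypothetical, and the default values of $t_{\mathrm{gap}}$ and $l_{g0}$, as well as the loads and flows used in the example, are the initial values reported in Table 5.

```python
def safety_margin(f: float, t_gap: float = 1.4, l_g0: float = 20.0) -> float:
    """Safety load margin of Eq. (6): l_g = t_gap * f + l_g0."""
    return t_gap * f + l_g0


def select_flow(l_d: float, l: float, f: float, f_d: float, f_set: float,
                t_gap: float = 1.4, l_g0: float = 20.0) -> float:
    """Flow rate chosen by Eq. (8), using the relative load margin of Eq. (7)."""
    l_rel = l_d - l                        # Eq. (7)
    l_g = safety_margin(f, t_gap, l_g0)    # Eq. (6)
    # Inside the safety margin: do not exceed the demands flow.
    if l_rel < l_g:
        return min(f_d, f_set)
    # Otherwise: run at the desired flow rate.
    return f_set


if __name__ == "__main__":
    # Initial conditions of Table 5: l_d0 = 80, l_0 = 10, f_0 = 20,
    # f_d0 = 25, f_set = 30. Here l_rel = 70 is above l_g (about 48).
    print(select_flow(l_d=80, l=10, f=20, f_d=25, f_set=30))
    # Once the system load catches up (l = 50), l_rel = 30 is below l_g.
    print(select_flow(l_d=80, l=50, f=20, f_d=25, f_set=30))
```

With the initial conditions, the relative load margin exceeds the safety margin, so the rule returns $f_{\mathrm{set}}$; when the system load approaches the demanded load, the rule falls back to $\min(f_d, f_{\mathrm{set}})$.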
Fig. 2. Block diagram of the resource manager with adaptive flow control.

3.5.3. Algorithm for Deep Q-Learning with experience replay

The proposed algorithm for Deep Q-Learning with experience replay is listed in Algorithm 3 and works as follows. First, we initialize the replay buffer $R$ and the Q-network with random weights. The simulation is terminated when the (simulation-condition) is no longer true (Line 3), that is, when $f < 0$ or $l_{\mathrm{rel}} < 0$. During the training of the agent, we run at most $N_{ep}$ training episodes, with each episode lasting for up to $T$ time steps. We stop the training process when the agent receives an episode reward greater than $V$, a value that we set to be relatively high. We denote by $N_{ep}$ the maximum number of episodes and by $V$ the stop training value. The condition (episode-iteration) is fulfilled as long as both conditions on $N_{ep}$ and $V$ are satisfied. We initialize the start state at the beginning of each episode. For each time step, with a small probability $\epsilon$, we select an exploration action (try a new action); otherwise, we select an exploitation action (the greedy action from the current policy). We then observe the reward $r_t$ and the next state $s_{t+1}$ (Lines 6–10). We store the transition in the replay buffer. Finally, we sample a random minibatch of transitions from $R$ and perform a gradient descent step (Lines 14–20).

3.5.4. Algorithm for deep deterministic policy gradient

The proposed algorithm for training our DDPG agent is listed in Algorithm 4. First, we initialize the critic $Q(s, a|\phi)$ and the actor network $\Pi(s|\theta)$ with random values of $\phi$ and $\theta$, respectively, and use them to set the target networks $Q'$ and $\Pi'$. Then, we start our iterative training process (Line 7). We also initialize a replay buffer $R$, which we populate during the iteration of each time step (Lines 5 and 15). Within each time step ($t$), we select an action according to the policy $\Pi$ with a certain random noise ($\mathcal{N}$) and we execute it as an exploration (Line 13). We store the transition in the replay buffer $R$, accordingly. Note that in the forward pass, we compute a loss function (Line 18) to update the critic by minimizing such loss over the random transitions $i$ chosen from the replay buffer $R$. We also update the policy $\Pi$ using the policy gradient (Line 21), followed by a soft update of the target policy and critic parameters, ($\theta'$) and ($\phi'$), respectively (Lines 23–24).

Algorithm 4: Listing of DDPG Algorithm
Data: Policy $\Pi$ parameters ($\theta$), critic $Q$ parameters ($\phi$) and discount factor ($\delta$)
Result: Optimal policy $\Pi^*$ with maximum reward
 2  Initialize critic network $Q(s, a|\phi)$ with random $\phi$
 3  Initialize actor network $\Pi(s|\theta)$ with random $\theta$
 4  Update target networks $Q'$ and $\Pi'$ with $\phi' \leftarrow \phi$, $\theta' \leftarrow \theta$
 5  Initialize the replay buffer $R$
 6  Let episode $\leftarrow$ 0
 7  while (simulation-condition) do
 8      while (episode-iteration) do
 9          episode $\leftarrow$ episode + 1
10          Initialize a random process $\mathcal{N}$ as noise
11          Collect initial observation state $s_1$
12          for $t = 1 \dots T$ do
13              Execute action $a_t = \Pi(s_t|\theta) + \mathcal{N}$
14              Observe reward $r_t$ and new state $s_{t+1}$
15              Store transition $(s_t, a_t, r_t, s_{t+1})$ in $R$
16              Get random ($i = 1..N$) transitions from $R$
17              Set $y_i = r_i + \delta\, Q'(s_{i+1}, \Pi'(s_{i+1}|\theta')|\phi')$
18              Loss $L = \frac{1}{N} \sum_{i=1}^{N} (y_i - Q(s_i, a_i|\phi))^2$
19              Update critic by minimizing $L$ across all $i$
20              Update $\Pi$ using policy gradient:
21              $\nabla_\theta J = \frac{1}{N} \sum_i \nabla_a Q(s, a|\phi)|_{s=s_i, a=\Pi(s_i)} \, \nabla_\theta \Pi(s|\theta)|_{s_i}$
22              Perform soft-update with $\mu \ll 1$ as follows:
23              $\phi' \leftarrow \mu\phi + (1 - \mu)\phi'$
24              $\theta' \leftarrow \mu\theta + (1 - \mu)\theta'$
25          end
26      end
27  end

4. Performance evaluation

In this section, we start by presenting the implementation of the resource orchestrator in our 5G prototype. We have used this prototype to generate realistic datasets to train the different modules, prior to exporting them. Before presenting the overall performance of the network slice orchestrator used in our prototype, we evaluate the performance of each building block from a standalone viewpoint.

4.1. 5G experimental prototype overview

Fig. 3 depicts our 5G Non-Standalone (NSA) experimental prototype based on OAI, implementing the RAN and the Core Network (CN). The Northbound interface (NBI) for Configuration Management (Configuration Manager in orange) interacts with our proposed orchestrator. In turn, the configuration manager commands the Software-Defined RAN Controller, namely FlexRAN [5], through the NBI to manage the underlying RAN nodes implemented using OAI.

We implement an Operation Support Subsystem/Business Support Subsystem using open-source projects, by setting up a "TICK Stack" (Telegraf, InfluxDB, Chronograf and Kapacitor) [49]. Telegraf is a
plugin-driven server agent for collecting and sending metrics and events from databases, systems, IoT sensors, and HTTP APIs. InfluxDB is a time-series database optimized for fast, high-availability storage and retrieval of time series data in fields such as operations monitoring, application metrics, Internet of Things sensor data, and real-time analytics. Chronograf allows rapidly building dashboards with real-time visualizations of the accumulated data. Finally, Kapacitor is a native data processing engine. It can process both stream and batch data from InfluxDB, acting in real time via its programming language called TICKscript. The User Equipments (UEs) are conventional Commercial Off-The-Shelf (COTS) smartphones. We used OAI-5G for the RAN to implement both the Digital Unit (DU) and the Central Unit (CU) of a gNB as Docker containers. OAI-CN, including the Home Subscriber Server (HSS), the Mobility Management Entity (MME) and the Serving/Packet Gateways (S/P-GWs), is implemented as services. This implementation is in line with 5G deployment option 3 for NSA [34].

Fig. 3. Our 5G experimental prototype block diagram. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

We used a B210 USRP connected to a USB3 port on a laptop (Quad Core i7, 16 GB of RAM with Low Latency Kernel), implementing the DU and the CU. Accordingly, the distributed RAN is backhauled using a Gigabit Ethernet cabled network to a second laptop (Quad Core i7, 16 GB of RAM). Both laptops run Ubuntu 16.04. We argue that our prototype is 5G-based. First, we use a functional split of the former 4G Baseband Unit (BBU) into a central unit (CU) and a digital unit (DU). Second, we employ a software-defined RAN controller (FlexRAN) to manage the RAN slicing. Both the RAN functional split and the use of an SDN controller are 5G concepts. We also argue that, for the sake of simplicity, a limited number of three UEs is sufficient, as we used two UEs as users of an eMBB slice, and the third UE as an IoT gateway behind which the traffic of multiple IoT sensors/devices is aggregated to implement an mMTC slice. Note that multi-base station validation is not needed in our case since we are not evaluating handover or cell-reselection based metrics or related Key Performance Indicators (KPIs). Instead, we focus on how to orchestrate the capacities of network slices sharing a BS within one mobile network; therefore, a single USRP is sufficient.

4.2. Dataset generation and simulation environment

A dataset is generated using our 5G experimental prototype by running, for 24 h, a background script interfacing with FlexRAN to collect the configuration and performance management data of our COTS UEs. The collected statistics, encoded as a JSON (JavaScript Object Notation) file, include provisioned slicing ratios, priority, QCI and power measurements, among other configuration and performance management metrics [5]. With background processes requiring an Internet connection, the UEs are moved from time to time in our laboratory space to change their radio conditions. ML models are implemented using MATLAB® [50] on a Dual Intel Core i7, 2.4 GHz, 4-Cores 7th Gen. with 16 GB of RAM. Afterwards, the RT models are compiled as standalone applications for Linux so that they can be deployed in our 5G experimental prototype. Several ML-based models are evaluated in MATLAB to classify different NSs (QCI, resource type, loss rate, and priority level) corresponding to the three conventional types of 5G slices: eMBB, URLLC and mMTC, as summarized in Table 1. Simulation parameters for the gatekeeper and the decision maker are listed in Table 2. Note that we used both 10 MHz and 20 MHz channel bandwidths for our Universal Software Radio Peripheral (USRP) [51], which correspond to 50 Physical Resource Blocks (PRBs) and 100 PRBs, respectively.

Table 2
Gatekeeper and decision maker simulation parameters.

| Parameter | Explanation | Value |
|---|---|---|
| $C^{(1)}$ | vCPUs constraint | 2 |
| $C^{(2)}$ | RAM constraint (GB) | 1 |
| D | # of Knapsack constraints | 2 |
| slices | number of slices | 3 |
| UEs | number of UEs | 3 |
| $\varepsilon_0$ | threshold for tree growing | 0.05 |
| S (Complex) | # of splits in the tree | 100 |
| S (Medium) | # of splits in the tree | 20 |
| S (Simple) | # of splits in the tree | 4 |
| q (Complex) | minimum leaf size | 4 |
| q (Medium) | minimum leaf size | 12 |
| q (Simple) | minimum leaf size | 36 |
| N | # of PRBs | 50/100 |
| Frame Type | duplexing type | FDD |
| EUTRA band | LTE band (F = 2600 MHz) | 7 |
| K (Fine KNN) | # of nearest neighbors | 1 |
| K (Coarse KNN) | # of nearest neighbors | 100 |
| K (Other KNN) | # of nearest neighbors | 10 |
| $\tau$ | Time window for closed-loop (ms) | 100 |
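To make the relation between channel bandwidth, PRBs and slicing ratios concrete, the short Python sketch below maps an LTE channel bandwidth to its number of PRBs and splits these PRBs among slices according to provisioned ratios. It is only an illustration: the function name `split_prbs` and the largest-remainder rounding are assumptions and do not describe how FlexRAN or our prototype enforces the ratios. The bandwidth-to-PRB table follows the standard LTE transmission bandwidth configurations.

```python
# Standard LTE channel bandwidth (MHz) -> number of PRBs.
LTE_PRBS = {1.4: 6, 3: 15, 5: 25, 10: 50, 15: 75, 20: 100}


def split_prbs(bandwidth_mhz, ratios_percent):
    """Split the PRBs of a carrier among slices.

    ratios_percent maps a slice name to an integer percentage; the
    percentages must sum to 100. Rounding uses the largest-remainder
    rule (an assumption made for this sketch) so that no PRB is lost.
    """
    if sum(ratios_percent.values()) != 100:
        raise ValueError("slicing ratios must sum to 100%")
    total = LTE_PRBS[bandwidth_mhz]
    # Integer arithmetic avoids floating-point rounding surprises.
    alloc = {s: total * pct // 100 for s, pct in ratios_percent.items()}
    remainder = {s: total * pct % 100 for s, pct in ratios_percent.items()}
    leftover = total - sum(alloc.values())
    # Hand the remaining PRBs to the slices with the largest remainders.
    for s in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc


if __name__ == "__main__":
    # Static 50/50 split on the 20 MHz (100 PRBs) setup.
    print(split_prbs(20, {"eMBB": 50, "mMTC": 50}))
    # A 70/30 split on the 10 MHz (50 PRBs) setup.
    print(split_prbs(10, {"eMBB": 70, "mMTC": 30}))
```

A dynamic slicer would call such a function with the ratios predicted for the next time window instead of fixed percentages.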
4.3. Simulation and implementation results

We start by evaluating the different ML-based approaches used for classification, forecasting and reinforcement learning for the resource manager. Then, we benchmark ML-based Regression Trees, which outperformed the other evaluated techniques, against different strategies: Optimum, Static, and Random-slicing. Finally, we evaluate the system performance in terms of number of PRBs, BSR, system utilization and network throughput.

4.3.1. Performance of ML-based classification models

Several ML-based classification techniques are benchmarked when classifying 150 different requirements (QCI, Resource Type, Loss Rate, Priority Level) into the three conventional slices anticipated for 5G (eMBB, URLLC, mMTC). Table 3 reports the Forecasting Accuracy (FA), Forecasting Speed (FS) and Training Time (TT) of different classification models for the simulated 150 tenant requests with corresponding requirements, considering one requirement sheet per tenant. We can notice that the majority of model types provide an accuracy of more than 90%, except for coarse KNN, cosine KNN, boosted trees and Random UnderSampling Boosted (RUSBoosted) trees. Linear Discriminant provides the highest accuracy.

Table 3
Classification accuracy, speed, and training time.

| Class | Model | FA (%) | FS (obs/s) | TT (s) |
|---|---|---|---|---|
| Trees [17] | Complex Tree | 94.7 | 6000 | 1.148 |
| | Medium Tree | 94.7 | 5800 | 0.782 |
| | Simple Tree | 95.3 | 9200 | 0.646 |
| KNN [18] | Fine KNN | 94.7 | 2200 | 1.618 |
| | Medium KNN | 94.7 | 2000 | 1.530 |
| | Coarse KNN | 64.7 | 3600 | 1.444 |
| | Cosine KNN | 84.7 | 2700 | 1.704 |
| | Cubic KNN | 94 | 3500 | 1.623 |
| | Weighted KNN | 95.3 | 4400 | 1.892 |
| SVM [21] | Linear | 96.7 | 1700 | 3.234 |
| | Quadratic | 96 | 1900 | 2.941 |
| | Cubic | 94.7 | 2800 | 3.879 |
| | Fine Gaussian | 92 | 2800 | 3.792 |
| | Medium Gaussian | 96.7 | 2900 | 3.691 |
| | Coarse Gaussian | 95.3 | 2500 | 3.545 |
| Ensemble [20] | Boosted Trees | 33.3 | 3100 | 2.186 |
| | Bagged Trees | 94 | 480 | 4.762 |
| | Subspace Discrim. | 95.3 | 410 | 6.859 |
| | Subspace KNN | 93.3 | 320 | 7.471 |
| | RUSBoosted Tree | 33.3 | 8200 | 6.495 |
| Discriminant [19] | Linear | 98 | 6200 | 1.294 |
| | Quadratic | 96.7 | 3700 | 1.922 |

Table 4 reports different performance metrics, including the Root Mean Square Error (RMSE), the coefficient of determination ($R^2$) and the Mean Absolute Error (MAE) [17], for the simulated forecasting models. We can see that trees (complex, medium, simple and even boosted trees) perform the best. However, this comes at the expense of an increased training time, as shown in Fig. 4. Indeed, from that figure, we can clearly see that, although RTs provide the lowest RMSE and the highest forecasting speed, their training time is not the lowest among the evaluated methods. However, this is acceptable since training is needed less frequently than forecasting. Accordingly, we chose to implement the simple trees for subsequent evaluations due to their outstanding performance and lower complexity compared to other trees.

Table 4
Forecasting methods benchmarking.

| Class | Regression | RMSE | $R^2$ | MAE |
|---|---|---|---|---|
| Linear [16] | Basic | 28.77 | 0.16 | 25.61 |
| | Interactions Linear | 29.34 | 0.12 | 25.52 |
| | Robust Linear | 29.1 | 0.14 | 24 |
| | Stepwise Linear | 29.41 | 0.12 | 25.82 |
| Trees [17] | Complex Tree | 5 | 0.97 | 3.07 |
| | Medium Tree | 5.17 | 0.97 | 3.15 |
| | Simple Tree | 5.76 | 0.97 | 3.77 |
| SVM [21] | Linear | 31.48 | 0.01 | 21.74 |
| | Quadratic | 18.21 | 0.66 | 14.25 |
| | Cubic | 16.82 | 0.71 | 13.4 |
| | Fine Gaussian | 23.69 | 0.43 | 19.51 |
| | Medium Gaussian | 17.48 | 0.69 | 14.12 |
| | Coarse Gaussian | 29.06 | 0.14 | 20.32 |
| Ensemble [20] | Boosted Trees | 5.41 | 0.97 | 3.24 |
| | Bagged Trees | 12.52 | 0.84 | 10.13 |
| GPR [22] | Square Exponential | 16.87 | 0.71 | 13.62 |
| | Matern 5/2 | 17.05 | 0.7 | 13.73 |
| | Exponential | 17.91 | 0.67 | 13.6 |
| | Rational Quadratic | 16.87 | 0.71 | 13.62 |

Fig. 4. Performance comparison of regression methods.

4.3.2. ML-based forecasting using regression trees

Ground-truth metric:
To measure the ground truth, we consider a simple two-slice scenario (eMBB and mMTC) using the 100 PRBs setup (i.e., the USRP card is configured with 20 MHz channel bandwidth). Fig. 5(a) depicts the predicted values and the ground truth. From this figure, we can see that the predicted values are close to the empirical ground truth. Indeed, the predicted values are spread around the straight line displaying a perfect match ($Y = X$), as shown in Fig. 5(b).

Fig. 5. Comparison of predicted vs. ground truth.

Slicing ratio metric:
To show the benefit of the ML-based RTs for the Forecast Aware Slicer, we compare in Fig. 6 the predicted values of slicing ratios with three alternative schemes: Optimum, Static and Random-slicing approaches. Note that Optimum values are calculated using a bottom-up estimation by aggregating the demands of each slice and deducing the ratios. For the static approach, we assume a ratio of 50% for the eMBB slice and 50% for the mMTC slice. We can see that the random approach performs the worst. On the other hand, ML-based RTs outperform both static and random approaches, with an average gap of only 5% to the optimal approach. This is due to their forecasting accuracy, which is the highest among the evaluated schemes.

Fig. 6. Ratio of optimum, ML, static, random.

4.3.3. Resource manager training and validation

Table 5
Resource manager parameters and values.

| Parameter | Explanation | Value |
|---|---|---|
| $\gamma_{\max}$ | flow variation upper bound | 1 |
| $\gamma_{\min}$ | flow variation lower bound | −1 |
| $\varepsilon$ | threshold for reward bias | 0.25 |
| $f_{\mathrm{set}}$ | desired processing flow | 30 |
| $f_{d0}$ | demands initial flow | 25 |
| $f_0$ | system initial flow | 20 |
| $L$ | critic DNN neurons | 48 |
| $l_{d0}$ | demands initial load | 80 |
| $l_{g0}$ | initial load margin | 20 |
| $l_0$ | system initial load | 10 |
| $N_{ep}$ | number of episodes | 5000 |
| $T$ | number of time steps | 600 |
| $t_{\mathrm{gap}}$ | initial time gap | 1.4 |
| $V$ | stop training value | 260 |
| $\lambda_c$ | learning rate for critic | $10^{-3}$ |
| $\lambda_a$ | learning rate for actor | $10^{-4}$ |

Fig. 7. Reward evolution in training stage. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Simulation parameters of the resource manager are reported in Table 5. Fig. 7 shows the reward for the DDPG (black) along with the reward of DQL (red). As our agent is based on the actor–critic method, we plot on the same figure the critic's estimate of the discounted long-term reward at the start of each episode (Q0), based on the initial observation of the environment. As training progresses through episodes, we can see that Q0 approaches the true discounted long-term
reward, which confirms the suitability of the design of the DDPG critic. At the bottom of Fig. 7, we display the number of steps of each episode. We can observe that the number of steps is at most 600, as stipulated in our algorithm. We note that DDPG takes fewer steps than DQL.

4.3.4. Overall system performance

To further show the effectiveness of our system leveraging the RT-based Forecast Aware Slicer, we compare it with a simple baseline where the modules are disabled. We create two slices with identifiers (IDs) "0" and "1", map our devices, successively, to one of these slices, and trigger a YouTube video upload. Slice 0 is not configured for dynamic slicing ratios with forecasting capability, whereas Slice 1 exploits the RT-based forecasting. In order to assess the performance of this scenario, we consider four metrics, namely the UL PRBs, the BSR, the system utilization and the network throughput. It is worth noting that the BSR provides information on how much data is accumulated in the UE's buffer and is waiting to be uploaded from the UE to the Evolved Node-B/next Generation Node-B (eNB/gNB), as sent in a MAC Control Element (CE) [52]. According to the BSR, the eNB/gNB allocates the minimum amount of UL grant, that is, resources for the Physical Uplink Shared Channel (PUSCH), when such resources are available. Using the BSR, the network optimizes UL resources in two ways. First, the network allocates UL resources (UL grant) only when the UE has data to transmit. Second, the BSR allows avoiding the allocation of too many UL resources (more than what is needed by the UEs), and thus the waste of such resources. As FlexRAN reports the buffer size value in bytes, we map these values to BSR indexes based on a table in [52]. Note that BSR index values range between 0 and 63. The BSR index is a unit-less simplified metric such that a value of "0" refers to a "UE with no data to transmit", and a larger index means that the UE has additional data in the buffer, waiting for transmission.

Number of UL PRBs:
In this experiment, we collected 23586 records of measurements, including the total number of used UL PRBs, using the 10 MHz bandwidth (50 PRBs) USRP configuration. We plot in Fig. 8(a) the two normalized histograms of the used PRBs in both cases: "With" and "Without" the decision maker and the scheduler. Since the sample sizes are different, we have normalized the two histograms so that the sum of all bar heights equals 1 and they can be plotted on the same figure for benchmarking. We use a uniform bin width of 5 PRBs for clarity. The X-axis refers to the number of used UL PRBs at each observation. The Y-axis refers to the normalized distribution of used UL PRBs. Two main observations can be made here. First, the number of occurrences decreases with growing UL PRB usage in both approaches. We can explain this behavior as a result of the completion of the video upload process. Second, by enabling the decision maker and the scheduler, we significantly favor the use of a low number of UL PRBs compared to the case where they are disabled. This is particularly visible for the first two bins (i.e., 1–5 and 6–10 used PRBs), where the frequency of used PRBs is higher when these modules are enabled than when they are disabled. This behavior is reversed for high UL PRB usage. From a resource management standpoint, this is a positive behavior, as our system allocates sufficient resources to UEs without waste and tries to smooth the usage of the PRBs to avoid sudden spikes in demands. This finding will be confirmed when analyzing the BSR index in Fig. 8(b) and Table 6.

Number of BSR, system utilization and network throughput:
To illustrate the number of occurrences where the UEs have data to transmit but no UL resources are available, we plot in Fig. 8(b) the
12
Fig. 8. Decision maker and scheduler effect on PRBs and non-zero BSRs.
Table 6
Decision maker and scheduler effect on non-zero BSRs for UL.
UL ratio Without decision maker and scheduler With decision maker and scheduler
10% 29 27
20% 5 0
30% 3 0
50% 111 4
80% 107 42
90% 104 0
100% 101 0
normalized histograms of non-zero BSR index values, with a bin size of 10. Interestingly, we can observe that, by enabling the Decision Maker and the Scheduler modules, the bytes left in the buffers are decreased compared to the case when they are disabled, and this holds for all BSR index values. In particular, there are no occurrences in bin 51 when these modules are activated, unlike the case when they are disabled. Moreover, we report in Table 6 the number of events where the BSR index is not zero. From this table, we can see that, when the Decision Maker and Scheduler are not activated, there are more than 400 non-zero BSR indexes. These observations hold even when the maximum possible slice ratios (90% and 100%, respectively) are considered. However, no non-zero BSR index values are observed at these ratios when the modules are enabled. This means that no bytes are left in the buffer in this case and all the data is transmitted in a timely manner. As for the non-zero BSR counts gradually increasing from the null values observed at UL slicing ratios of 20% and 30%, we interpret this as follows. When the UL ratio increased, additional PRBs were allowed to be used, and thus more bytes were uploaded, but the bottleneck in this case shifts towards the resource manager, which causes the increase seen in the values 4 and then 42. In this case, the resource manager autonomously triggered its scale-out, which explains why the count decreased again for the 90% and 100% ratios. This confirms our previous results and shows the effectiveness of the orchestration process in processing data in a timely manner.

Finally, Fig. 9 depicts the normalized throughput for the eMBB slice and the system CPU utilization during two hours of simulation, respectively. We can observe in Fig. 9(a) that, overall, when the decision maker is enabled, the normalized throughput is higher compared to the case where it is deactivated. However, this comes at the cost of an increased system utilization, as seen in Fig. 9(b). Interestingly, we note that the same normalized throughput was maintained although we queued a new video for upload. This is clearly seen in the system utilization, which increased from 50 min onwards, reflecting such additional load.

5. Conclusion

In this paper, we have presented a novel framework based on Machine-Learning (ML) for network slice orchestration in 5G networks. Specifically, we have designed and implemented four building blocks, namely, a Gatekeeper for classification, a Decision Maker with Forecast Aware Slicer and Admission Controller sub-modules, a Slice Scheduler, and an adaptive deep reinforcement learning based Resource Manager. We evaluated the system performance using a 5G-ready experimental prototype based on OpenAirInterface and FlexRAN. From our experiments, we have observed that Regression Trees (RTs) outperform other ML models in terms of classification and prediction accuracy. In particular, compared to linear-based regressions, the Root Mean Square Error (RMSE) is divided by six and the prediction speed almost quadrupled, while the training time slightly increased. We also found that the average gap between RTs and the optimal approach is only 5% and that its trend is very close to the ground-truth slicing ratio. In addition, after implementing the RTs, the whole system reduced the number of wasted Physical Resource Blocks and increased the network throughput compared to the case where the system modules are disabled. However, this resulted in an acceptable increase of the system resource utilization. This effect is usually welcomed as it implies higher revenues in a multi-tenant environment, which is a highly desirable behavior for both public and private cloud deployments.

CRediT authorship contribution statement

Nazih Salhab: Conceptualization, Methodology, Software development and integration, Data curation, Writing - original draft, Visualization, Investigation. Rami Langar: Supervision, Writing, Reviewing and editing. Rana Rahim: Supervision, Writing, Reviewing and editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was partially supported by the FUI SCORPION project, France (Grant no. 17/00464), the CNRS PRESS project, France (Grant no. 07771), ‘‘Azm & Saade’’ Association, and the Lebanese University.
Fig. 9. Resource manager effect on network throughput and system utilization.
References

[1] N. Salhab, R. Rahim, R. Langar, R. Boutaba, Machine learning based resource orchestration for 5G network slices, in: 2019 IEEE Global Communications Conference, GLOBECOM, 2019, pp. 1–6.
[2] 3GPP TR 22.864: Feasibility Study on New Services and Markets Technology Enablers - Network Operation; Stage 1 (R.15), 2016.
[3] K. Samdanis, S. Wright, A. Banchs, A. Capone, M. Ulema, K. Obana, 5G Network slicing – Part 2: Algorithms and practice, IEEE Commun. Mag. 55 (8) (2017) 110–111, http://dx.doi.org/10.1109/MCOM.2017.8004164.
[4] N. Salhab, R. Rahim, R. Langar, NFV orchestration platform for 5G over on-the-fly provisioned infrastructure, in: IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS, 2019, pp. 971–972.
[5] OpenAirInterface (OAI), 2019, https://www.openairinterface.org/.
[6] DOCKER: Platform for high-velocity innovation, 2019, https://www.docker.com/.
[7] L. Le, B.P. Lin, L. Tung, D. Sinh, SDN/NFV, machine learning, and big data driven network slicing for 5G, in: 2018 IEEE 5G World Forum, 5GWF, 2018, pp. 20–25, http://dx.doi.org/10.1109/5GWF.2018.8516953.
[8] T. Taleb, I. Afolabi, K. Samdanis, F.Z. Yousaf, On multi-domain network slicing orchestration architecture and federated resource control, IEEE Netw. (2019) 1–11, http://dx.doi.org/10.1109/MNET.2018.1800267.
[9] Z. Xiong, Y. Zhang, D. Niyato, R. Deng, P. Wang, L. Wang, Deep reinforcement learning for mobile 5G: Fundamentals, applications, and challenges, IEEE Veh. Technol. Mag. 14 (2019) http://dx.doi.org/10.1109/MVT.2019.2903655.
[10] L. Zanzi, F. Giust, V. Sciancalepore, M2EC: A multi-tenant resource orchestration in multi-access edge computing, in: IEEE Wireless Communications and Networking Conference, WCNC, 2018, pp. 1–6, http://dx.doi.org/10.1109/WCNC.2018.8377292.
[11] G. Zhu, J. Zan, Y. Yang, X. Qi, A supervised learning based QoS assurance architecture for 5G networks, IEEE Access (2019) http://dx.doi.org/10.1109/ACCESS.2019.2907142.
[12] Y. Yamada, R. Shinkuma, T. Sato, E. Oki, Feature-Selection based data prioritization in traffic prediction using machine learning, in: IEEE Global Communications Conference, GLOBECOM, 2018, pp. 1–6.
[13] Y. Xie, J. Hu, Y. Xiang, S. Yu, S. Tang, Y. Wang, Modeling oscillation behavior of network traffic, IEEE Trans. Parallel Distrib. Syst. 24 (9) (2013).
[14] F. Tseng, X. Wang, L. Chou, H. Chao, V.C.M. Leung, Dynamic resource prediction and allocation for cloud data center using the multiobjective genetic algorithm, IEEE Syst. J. 12 (2018).
[15] N. Salhab, R. Rahim, R. Langar, Optimization of virtualization cost, processing power and network load of 5G software-defined data centers, IEEE Trans. Netw. Serv. Manag. (2020) 1–12.
[16] D.J. MacKay, Bayesian interpolation, Neural Comput. 4 (3) (1992) 415–447.
[17] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and Regression Trees, Chapman and Hall by CRC, 1984, p. 368.
[18] S. Zhang, X. Li, M. Zong, X. Zhu, R. Wang, Efficient kNN classification with different numbers of nearest neighbors, IEEE Trans. Neural Netw. Learn. Syst. 29 (5) (2018) 1774–1785.
[19] R. Gribonval, From projection pursuit and CART to adaptive discriminant analysis? IEEE Trans. Neural Netw. 16 (3) (2005) 522–532.
[20] Y. Wang, S. Xia, Q. Tang, J. Wu, X. Zhu, A novel consistent random forest framework, IEEE Trans. Neural Netw. Learn. Syst. 29 (8) (2018) http://dx.doi.org/10.1109/TNNLS.2017.2729778.
[21] A.Y. Nikravesh, S.A. Ajila, C. Lung, W. Ding, Mobile network traffic prediction using MLP, MLPWD, and SVM, in: 2016 IEEE International Congress on Big Data, BigData Congress, 2016, pp. 402–409.
[22] Y. Xu, F. Yin, W. Xu, J. Lin, S. Cui, Wireless traffic prediction with scalable Gaussian process, IEEE J. Sel. Areas Commun. (2019).
[23] L. Zanzi, V. Sciancalepore, A. Garcia-Saavedra, X. Costa-Perez, OVNES: Demonstrating 5G network slicing overbooking on real deployments, in: IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS, 2018, pp. 1–2, http://dx.doi.org/10.1109/INFCOMW.2018.8406867.
[24] J.X. Salvat, L. Zanzi, A. Garcia-Saavedra, V. Sciancalepore, X. Costa-Perez, Overbooking network slices through yield-driven end-to-end orchestration, in: ACM CoNEXT ’18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 353–365, http://dx.doi.org/10.1145/3281411.3281435.
[25] M. Yan, G. Feng, J. Zhou, Y. Sun, Y. Liang, Intelligent resource scheduling for 5G radio access network slicing, IEEE Trans. Veh. Technol. 68 (8) (2019) 7691–7703, http://dx.doi.org/10.1109/TVT.2019.2922668.
[26] D. Bega, M. Gramaglia, A. Banchs, V. Sciancalepore, X. Costa-Perez, A machine learning approach to 5G infrastructure market optimization, IEEE Trans. Mob. Comput. (2019) 1, http://dx.doi.org/10.1109/TMC.2019.2896950.
[27] M. Harishankar, S. Pilaka, P. Sharma, N. Srinivasan, C. Joe-Wong, P. Tague, Procuring spontaneous session-level resource guarantees for real-time applications: An auction approach, IEEE J. Sel. Areas Commun. 37 (7) (2019) 1534–1548, http://dx.doi.org/10.1109/JSAC.2019.2916487.
[28] E.H. Bouzidi, A. Outtagarts, R. Langar, Deep reinforcement learning application for network latency management in software defined networks, in: 2019 IEEE Global Communications Conference, GLOBECOM, 2019, pp. 1–6.
[29] J. Pérez-Romero, O. Sallent, R. Ferrús, R. Agustí, On the configuration of radio resource management in sliced RAN, in: NOMS 2018 - IEEE/IFIP Network Operations and Management Symposium, 2018, pp. 1–6.
[30] V. Sciancalepore, K. Samdanis, X. Costa-Perez, D. Bega, M. Gramaglia, A. Banchs, Mobile traffic forecasting for maximizing 5G network slicing resource utilization, in: IEEE Conference on Computer Communications, INFOCOM, 2017, pp. 1–9.
[31] Y. Yu, J. Wang, M. Song, J. Song, Network traffic prediction and result analysis based on seasonal ARIMA and correlation coefficient, in: 2010 International Conference on Intelligent System Design and Engineering Application, 2010, pp. 980–983.
[32] MIT, Google, The New Proving Ground for Competitive Advantage, Tech. Rep., MIT Technology Review, 2017.
[33] 3GPP TS 32.130: Network sharing: Concepts (rel. 14), 2016.
[34] 3GPP TR 28.801: Study on management and orchestration of network slicing for next generation network (Release 15), 2018.
[35] E. Brynjolfsson, T. Mitchell, What can machine learning do? Workforce implications, Science 358 (6370) (2017).
[36] 3GPP TS 23.203: Policy and charging architecture (Rel. 7), 2016.
[37] S.M. Wong, Y. Yao, Linear structure in information retrieval, in: Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1988, pp. 219–232.
[38] Google, 2019, https://cloud.google.com/compute/all-pricing.
[39] Amazon, 2019, https://aws.amazon.com/ec2/pricing/on-demand/.
[40] R.M. Nauss, The 0–1 knapsack problem with multiple choice constraints, European J. Oper. Res. 2 (2) (1978) 125–131.
[41] N. Salhab, R. Rahim, R. Langar, Throughput-Aware RRHs clustering in cloud radio access networks, in: 2018 Global Information Infrastructure and Networking Symposium, GIIS, 2018, pp. 1–5.
[42] N. Wirth, Algorithms and Data Structures, CUMINCAD, 1986.
[43] A.R. Sampaio, J. Rubin, I. Beschastnikh, N.S. Rosa, Improving microservice-based applications with runtime placement adaptation, J. Internet Serv. Appl. 10 (1) (2019) 1–30.
[44] 3GPP TS 23.501: System architecture for the 5G System (5GS), V16.6.0 (Rel. 16), 2020.
[45] D. Davis, A. Rosemblat, vCPU Sizing Consideration, White Paper, Dell Software, 2019, https://virtualizationreview.com.
[46] Y. Al-Dhuraibi, F. Paraiso, N. Djarallah, P. Merle, Autonomic vertical elasticity of docker containers with elasticdocker, in: 2017 IEEE 10th International Conference on Cloud Computing, CLOUD, IEEE, 2017, pp. 472–479.
[47] H. Xu, B. Li, Dynamic cloud pricing for revenue maximization, IEEE Trans. Cloud Comput. 1 (2) (2013) 158–171.
[48] R. Bellman, Dynamic Programming, Princeton University Press, USA, 2010.
[49] Time series database, 2019, https://www.influxdata.com.
[50] Matrix laboratory: MATLAB and Simulink, 2019, https://mathworks.com/.
[51] Universal software radio peripheral (USRP), 2019, http://www.ettus.com.
[52] 3GPP TS 36.321: Medium access control (MAC) protocol specification (Rel. 12), 2015.

Nazih Salhab received his Computer Science and Telecommunication Engineer degree and his Master II Research from the Lebanese University, Engineering Faculty I. After that, he received his double Ph.D. degrees from University Paris-Est in France and the Lebanese University in Lebanon. He is a senior advisor with more than 15 years of international experience in mobile networks operation management, project management and business analysis for major mobile network operators around the world. He also joined several international labs as visiting scientist, including EPFL-Switzerland and D.R. Cheriton, University of Waterloo, ON, Canada. His research interests include Network Function Virtualization (NFV), Software-Defined Networking (SDN), Cloud Radio Access Network (C-RAN), Orchestration, Artificial Intelligence/Machine Learning (AI/ML) and network resource management.

Rami Langar is currently a Full Professor at University Gustave Eiffel (UGE), France. Before joining UGE, he was an Associate Professor at LIP6, University Pierre and Marie Curie (now Sorbonne University) between 2008 and 2016, and a Post-Doctoral Research Fellow at the School of Computer Science, University of Waterloo, Waterloo, ON, Canada between 2006 and 2008. He received the M.Sc. degree in network and computer science from UPMC in 2002, and the Ph.D. degree in network and computer science from Telecom ParisTech, Paris, France, in 2006. Prof. Langar is involved in many European and National French research projects, such as MobileCloud (FP7), GOLDFISH (FP7), ANR ABCD, ANR 5G-INSIGHT, FUI PODIUM, FUI ELASTIC, FUI SCORPION. He was chair of IEEE ComSoc Technical Committee on Information Infrastructure and Networking (TCIIN) for the term Jan. 2018-Dec. 2019, and co-recipient of the IEEE/IFIP International Conference on Network and Service Management 2014 (IEEE/IFIP CNSM 2014) best paper award. His research interests include resource management in future wireless systems, Cloud-RAN, network slicing in 5G/5G+/6G, software-defined wireless networks, smart cities, and mobile Cloud offloading.

Rana Rahim received her Computer Science and Telecommunication Engineer degree from the Lebanese University in 2002. She then obtained her Master degree (DEA) in 2003 from the USJ University (Lebanon) and the Lebanese University, and her Ph.D. degree in January 2008 from the University of Technology of Troyes (UTT) - France. She was a postdoctoral researcher at the UTT from October 2008 to October 2009. She obtained her HDR (Habilitation à Diriger des Recherches) in 2016. She is currently an Associate Professor at the Lebanese University. Her research interests include System management, Quality of Service, IoT Networks, Smart Grids, cloud radio access networks, slicing in 5G and software-defined networks.