
Load Balancing Models based on Reinforcement Learning for Self-Optimized Macro-Femto LTE-Advanced Heterogeneous Network
Sameh Musleh, Mahamod Ismail and Rosdiadee Nordin
Department of Electrical, Electronics and Systems Engineering,
Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia,
43600, Bangi, Selangor, Malaysia.
[email protected]

Abstract—A heterogeneous Long Term Evolution-Advanced (LTE-A) network (HetNet) utilizes small cells to enhance its capacity and coverage. The intensive deployment of small cells such as pico- and femto-cells to complement macro-cells has resulted in an unbalanced distribution of traffic load among cells. Machine learning techniques are employed in cooperation with Self-Organizing Network (SON) features to achieve load balancing between highly loaded Macro cells and underlay small cells such as Femto cells. In this paper, two algorithms are proposed to balance the traffic load between Macro and Femto cells: Load Balancing based on Reinforcement Learning of end-user SINR (LBRL-SINR) and Load Balancing based on Reinforcement Learning of Macro cell throughput (LBRL-T). Both algorithms use Reinforcement Learning (RL) to control the reference signal power of each Femto cell that underlays a highly loaded Macro cell. At the same time, each algorithm monitors any degradation in the performance metrics of both the Macro cell and its neighbor Femto cells and reacts to troubleshoot the degradation in real time. Simulation results show that both algorithms are able to off-load end-users from a highly loaded Macro cell and redistribute the traffic load fairly to its neighbor Femto cells. As a result, both the call drop rate and the call block rate of the highly loaded Macro cell are decreased.

Index Terms—Load Balancing; LTE-A HetNet; Small Cells; Reinforcement Learning.

I. INTRODUCTION

One of the 3GPP technologies that meets the high demand for new services is the LTE-A HetNet. It integrates various network structures and cell types in order to offer new data and voice services, improved latency, and higher throughput for end-users. The main nodes of a HetNet include High Power Nodes (HPNs), such as Macro eNodeBs, and Low Power Nodes (LPNs), such as Pico and Femto cells. LPNs are defined in 3GPP as small cells. They have become important elements of the LTE-A HetNet: they increase both link and system capacity and extend network coverage in outdoor and indoor deployments [1]. The deployment of open-access Femto cells reduces the chance of Macro cells becoming overloaded or congested with a high number of end-users. Moreover, the cost of deploying additional Macro sites to solve network capacity and coverage problems is reduced.

A Femto cell is a low power node. Many processes, including the installation and troubleshooting of Femto cells, must therefore be automated, because the end-user is not expected to have enough technical knowledge to install or troubleshoot Femto cells. As a result, the Self-Organizing Network (SON) for LTE-A is a new technology that consists of new concepts and functionalities to automate the operation of LTE-A HetNets towards better performance and higher quality of service [1]. Specifically, the operations of self-tuning and self-optimization are defined in SON-enabled LTE-A networks [2]. SON is a recent development and is part of the 3GPP standard for LTE-A [3]. Recently, diverse challenges related to SON-enabled HetNets have been widely researched in various international research projects, including 3GPP projects [4],[5]. Various efforts have been made to develop advanced Radio Resource Management (RRM) algorithms that decrease the effect of interference in dense LTE-A HetNets [6].

Traffic load balancing is one of the most demanding topics for both the automation and the self-optimization processes in the context of LTE-A networks [7]. High traffic volumes, as well as the unbalanced traffic volumes generated by end-users, are the motivation for researching load balancing techniques. Traffic load balancing aims to achieve a balance between LTE-A radio resources and end-user traffic. The load balancing process affects the Grade of Service (GoS), which is specifically related to call maintainability. Parameters such as radiation pattern power [8], handover power margins [9] and reference signal power are optimized to cope with end-user traffic. There have been a few studies in the field of load balancing between Macro and small cells in HetNets [10, 11]. Unbalanced traffic is a prominent issue that should be investigated in depth for indoor and outdoor HetNet deployment scenarios.

Reinforcement Learning (RL) is a technique that is specifically used for interactive learning [12]. It is based on the Q-Learning (QL) technique, which does not require the system to be defined by a formula or transfer function. As a result, it is an attractive technique for optimizing the operation of the LTE-A radio access network in real time [13-16].

In this paper, two load balancing techniques are proposed to overcome the high traffic-load problem of Macro cells in an LTE-A HetNet.


Both proposed techniques, named LBRL-SINR and LBRL-T, mainly employ the Q-Learning method to process the degraded performance metrics of Macro cells and to deliver higher link quality for end-users.

II. RELATED WORK

Most research related to traffic load balancing in LTE and LTE-A is based on adjusting the handover or cell selection process in order to manage the traffic distribution between neighbor cells [17]. The approaches in this field can be classified into handover-based control and coverage control of a given cell. In handover-based control, UEs are steered into specific cells by adjusting the handover offsets of each cell. In the coverage control approach, an eNodeB either extends its coverage to reach more UEs or reduces its coverage in case of overloading, so that more UEs hand over to its neighbor eNodeBs. The author in [18] explained a method for monitoring the usage of Resource Blocks (RBs) in an eNodeB. Whenever the RB utilization ratio crosses a specific limit, a high-load status is triggered, which initiates optimization of the eNodeB's reference signal power. This reduces the high load at the eNodeB and enables neighbor cells to collaborate in the offloading process.

The author in [19] presented a technique to optimize Jain's fairness index. The proposed technique reallocates UEs towards underlay small cells, namely Pico, Relay and Femto cells. Both Pico and Femto cells use a wired backhaul to connect to the closest eNodeB, whereas Relay nodes use a completely wireless connection to their neighbor eNodeBs. In [20], the author proposed an algorithm that monitors eNodeB load based on the handover process and the capacity of neighbor eNodeBs. The algorithm triggers an offloading process whenever neighbor eNodeBs are found to have adequate capacity. The technique achieved noticeable performance improvements, especially in UE throughput and BLER.

In [21], the author proposed an algorithm to fairly distribute the eNodeB load by reducing the handover overhead that is necessary for initiating any handover process. The algorithm is designed by solving a multi-objective optimization problem with two conflicting targets, signaling overhead and traffic load, where a higher weight is given by the optimizer to the desired target.

III. FORMULATION OF REINFORCEMENT LEARNING TECHNIQUE

The LTE-A HetNet is modeled as a Multi-Agent Reinforcement Learning system, in which each Femto cell is defined as an agent [12]. Reinforcement learning deals with the problem of finding a strategy for an autonomous agent that perceives and acts in its environment, so that it selects optimal actions to reach its objective. For every action that the agent takes in its environment, a trainer sets a reward or penalty to trigger the agent to decide about a new state. The states are defined in this paper as a range of possible reference signal power values, and an action is defined as the selected reference signal power value. The agent learns from the delayed reward in order to select actions that result in the highest possible cumulative reward. A Q-learning algorithm is able to achieve the most effective Q-value based on delayed rewards, regardless of the agent's awareness of the impact of its actions on the system where the actions are applied. Reinforcement learning techniques are associated with dynamic programming techniques, which are used to solve optimization problems. The agents collaborate during the learning process to converge to an optimal policy faster. Meanwhile, each agent puts the learned policy into action separately, which increases the capability of the designed self-optimization algorithm to run in a distributed manner. The nature of an LTE-A HetNet changes rapidly due to the dynamic change in parameters and values related to the mobility of User Equipment (UE), multipath fading, changing traffic distributions, etc.

Each agent learns through the well-known Markov Decision Process (MDP), in which the agent is aware of a set $S$ of discrete states. Additionally, there is a set $A$ of actions for the agent to implement. At every time interval $t$ of the optimization epoch, the agent acquires the current state $s_t$ before it selects a current action $a_t$ and executes it. The agent receives a reward $r(s_t, a_t)$ and the environment moves to the next state $s_{t+1} = \delta(s_t, a_t)$. Both $\delta$ and $r$ are functions of the environment, and the agent might be unaware of them. In an MDP, the functions $\delta(s_t, a_t)$ and $r(s_t, a_t)$ depend only on the current state and action, not on previous states or actions.

The agent learns a policy $\pi$ to decide about the next action, depending on the currently acquired state $s_t$, that is, $\pi(s_t) = a_t$. A precise way to specify which policy $\pi$ the agent should learn is to require the policy that results in the greatest cumulative reward for the agent. In order to make this requirement specific, we define the cumulative value $V_\pi(s_t)$ obtained by following an arbitrary policy $\pi$ from an arbitrary initial state $s_t$ as follows:

$$V_\pi(s_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k} \quad (1)$$

where the sequence of reward values $r_{t+k}$ is produced by starting from state $s_t$ and iteratively using the policy $\pi$ to choose actions as mentioned above (i.e., $a_t = \pi(s_t)$, $a_{t+1} = \pi(s_{t+1})$, etc.).

Each Femto cell is defined as an agent that interacts in real time with the environment and selects an action in response to the changing system states. The agent relies on the current Q-values to obtain the highest possible reward. Meanwhile, it has to identify the actions that produce the highest reward in the long term.

Here $0 \le \gamma < 1$ is a constant that expresses the relative value of future rewards compared to the current reward. Specifically, future rewards that are yet to be received are discounted by $\gamma^k$. If $\gamma$ has the value of 0, then only the instant reward is considered. When the value of $\gamma$ is close to 1, higher priority is given to future rewards than to the instant reward.

The discounted cumulative reward $V_\pi(s_t)$ is obtained by following the policy $\pi$ from the initial state $s_t$. Logically, future rewards should be discounted relative to immediate rewards because, generally, the agent prefers to acquire the reward in the shortest possible number of time steps.
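As a brief illustration of Equation 1 (our own example, not part of the original formulation), the discounted return can be computed from a recorded reward sequence as follows; the function name and the sample values are illustrative only.

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted cumulative reward of Equation 1:
    V = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ...
    `rewards` is the sequence r_t, r_{t+1}, ... observed while
    following a fixed policy from state s_t."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Example: three optimization epochs with rewards 0.8, 0.5, 0.2.
# With gamma = 0.9 the return is 0.8 + 0.9*0.5 + 0.81*0.2 ~= 1.412.
print(discounted_return([0.8, 0.5, 0.2], gamma=0.9))
```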


We require that each Femto cell learns a policy $\pi$ that produces the maximum value of $V_\pi(s)$ over all states $s$; this policy will be referred to as the optimal policy, denoted $\pi^*$:

$$\pi^* = \arg\max_{\pi} V_\pi(s) \quad (2)$$

$V_{\pi^*}(s)$ is defined as the highest discounted cumulative reward that the agent can gain starting from the initial state $s$. In other words, it is the discounted cumulative reward achieved by executing the optimal policy starting from state $s$.

It is a challenge for the agent to learn the optimal policy $\pi^*$ because the available training data does not offer training examples in the form $(s, a)$. However, the learner is informed about one thing, which is the sequence of instant rewards $r(s_k, a_k)$ for $k = 0, 1, 2, \dots$ This data facilitates learning a numerical evaluation function defined over states and actions, and then obtaining the optimal policy in terms of this evaluation function.

One choice of evaluation function is $V_{\pi^*}(s)$. The proposed LBRL algorithms in this paper should prefer state $s_1$ over state $s_2$ whenever $V_{\pi^*}(s_1)$ is higher than $V_{\pi^*}(s_2)$, because the cumulative future reward starting from $s_1$ is higher. The algorithm policy makes a selection from the state space, not from the action space. However, in some cases $V_{\pi^*}(s)$ can be used to select from the action space as well. The optimal action to be selected in state $s$ is the action $a$ that produces the highest instant reward $r(s,a)$ plus the value $V_{\pi^*}$ of the next state, discounted by $\gamma$, as shown in Equation 3:

$$\pi^*(s) = \arg\max_{a} \left[ r(s,a) + \gamma V_{\pi^*}(\delta(s,a)) \right] \quad (3)$$

Recall that the function $\delta(s,a)$ identifies the state reached by applying action $a$ in state $s$. Further, an agent is defined in this paper as a Femto cell that underlays a Macro cell. For the agent that runs the LBRL algorithms to adopt an optimal policy by learning $V_{\pi^*}(s)$, the agent must be equipped with complete knowledge of the instant reward function $r$ and the state transition function $\delta$. Once the agent knows the functions $r$ and $\delta$ that the environment employs to react to its actions, the optimal action $a$ for any state $s$ can be determined. Even though learning $V_{\pi^*}(s)$ is an efficient way to obtain the optimal policy, it can be used only when the agent has complete knowledge of $\delta$ and $r$. This requires the capability to predict the instant reward and the resulting next state for each state-action pair. Practically, the agent will not be able to predict an accurate result of applying an arbitrary action in an arbitrary state. Whenever $\delta$ or $r$ is unknown, learning $V_{\pi^*}(s)$ is useless for choosing the optimal policy, and the agent cannot evaluate Equation 3 in this case. Therefore, another evaluation function should be used by the agent in this framework.

The evaluation function $Q(s,a)$ is defined as shown in Equation 4, so that its value is the highest discounted cumulative reward that can be gained by starting from state $s$ and executing action $a$ as the first action:

$$Q(s,a) = r(s,a) + \gamma V_{\pi^*}(\delta(s,a)) \quad (4)$$

Note that $Q(s,a)$ is exactly the quantity that is maximized in Equation 3 to choose the optimal action $a$ in state $s$. Therefore, we can rewrite Equation 3 in terms of $Q(s,a)$ as

$$\pi^*(s) = \arg\max_{a} Q(s,a) \quad (5)$$

which indicates that learning the Q-function, instead of learning $V_{\pi^*}(s)$, makes the agent able to choose an optimal action even though the functions $r$ and $\delta$ are unknown to the agent.

Learning the Q-function is equivalent to learning the optimal policy. The main issue is figuring out a reliable method to estimate Q-values from the instant reward values $r$. Such a method can be achieved by iterative approximation. This conclusion follows from the very close relationship between $V_{\pi^*}$ and $Q$ shown in Equations 6 and 7:

$$V_{\pi^*}(s) = \max_{a'} Q(s,a') \quad (6)$$

which allows Equation 4 to be rewritten as:

$$Q(s,a) = r(s,a) + \gamma \max_{a'} Q(\delta(s,a),a') \quad (7)$$

This recursive equation provides the foundation for an algorithm that iteratively approximates $Q$. A Q-learning algorithm learns by repeatedly decreasing the difference between the Q-values of succeeding states. It is able to solve optimization problems for systems that cannot be defined in a closed-form expression, and it relies on the Temporal Difference (TD) method during the learning process. To estimate the Q-value in Equation 7, the agent's target is to choose the action that produces the highest long-term reward.

In Sections V and VI of this paper, two formulas are proposed to calculate the reward $r$, one for each of the proposed algorithms. The proposed LBRL algorithms are characterized by, firstly, controlling the transmitted power of the Reference Signal (RS) at each Femto cell and, secondly, employing Reinforcement Learning (RL), as one of the machine learning techniques, to convert each Femto cell into a smart node that is able to take decisions and auto-tune itself towards an optimal state.
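To make the iterative approximation of Equation 7 concrete, the sketch below shows a minimal tabular Q-learning update over a discrete set of RS power states. It is our illustration rather than the authors' implementation; the state/action values are only examples, and the note about a learning rate is a common extension that is not part of Equation 7 itself.

```python
# Minimal sketch (ours, not the authors' code) of the tabular update in
# Equation 7.  States are discrete RS power levels (dBm) and actions are
# the candidate levels to switch to; both sets are illustrative only.
from collections import defaultdict

GAMMA = 0.9                      # discount factor, 0 <= gamma < 1
RS_LEVELS = [10, 13, 16, 19, 22] # example state/action space (dBm)

Q = defaultdict(float)           # Q[(state, action)], initialised to 0

def q_update(s, a, r, s_next):
    """Deterministic Q-learning update of Equation 7:
    Q(s,a) <- r(s,a) + gamma * max_a' Q(delta(s,a), a').
    (In a noisy environment a learning rate is usually blended in,
    but that refinement is not part of Equation 7.)"""
    best_next = max(Q[(s_next, a2)] for a2 in RS_LEVELS)
    Q[(s, a)] = r + GAMMA * best_next

def greedy_action(s):
    """Equation 5: pi*(s) = argmax_a Q(s,a)."""
    return max(RS_LEVELS, key=lambda a: Q[(s, a)])
```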


IV. MACRO-FEMTO SELF ORGANIZING NETWORK MODEL

The Self Organizing Network (SON) features are considered a powerful development in the 4th generation (4G) of mobile networks and in the next stage of development, which includes 4G and beyond-4G networks [3]. SON features are used when there is rapidly changing traffic or a highly fluctuating RF channel, or to automate the operator policies that are specifically related to the mobile radio access network. Its main features fall into four categories: self-optimization, self-configuration, self-diagnosis and self-healing [18]. SON functions have been adopted by multiple mobile service operators, as they lead to simplified operations and increased profitability.

Our proposed algorithms utilize SON functions, which include self-diagnosis, self-healing and self-optimization of Macro and Femto cells in the LTE-A HetNet. In order to achieve a fair distribution of end-users between a highly loaded Macro cell and its neighbor Femto cells, both proposed algorithms are mainly based on the self-optimization concept for a SON-enabled LTE-A HetNet, employing Reinforcement Learning (RL) and Q-learning techniques to offload end-users from the Macro cell to its neighbor Femto cells.

A set of three performance metrics of the highly loaded Macro cell are the main inputs for each of the proposed algorithms, LBRL-SINR and LBRL-T. Call block rate (B), call drop rate (D) and average SINR are the specific inputs of the LBRL-SINR algorithm, whereas B, D and cell throughput (T) are the specific inputs of the LBRL-T algorithm. The SON module at each Femto cell is triggered only when a Macro eNodeB declares a high-load state or its overload indicator (OI) is activated; the Macro cell then triggers the LBRL algorithm to be executed at its neighbor Femto cells, as shown in Figure 1. The signaling between each Femto and Macro cell is carried over the X2 or S1 interface. Each Femto cell will independently increase its reference signal (RS) power to enlarge its coverage region. As a result, the traffic in hot areas is redirected to lightly loaded areas under Femto cells, and thus load balancing is achieved.

Figure 1: Macro-Femto SON model (flowchart: when the Macro cell OI status indicates high load, the Macro cell triggers its neighbor underlay Femto cells over the X2 or S1 interface to run an off-loading algorithm, LBRL-SINR or LBRL-T; otherwise both Macro and Femto cells continue normal operation)

The proposed SON architecture is distributed, not centralized. In other words, the LBRL algorithms do not need to connect to a database to exchange the performance metric data while running on a live network. The normal signaling over the X2 or S1 interface is enough for each Femto cell to acquire the required performance metrics from its neighbor Macro cell.
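The trigger logic of Figure 1 can be summarized with a short sketch. This is our paraphrase of the figure; the class and helper names are hypothetical, and a real deployment would carry the metrics over standardized X2/S1 signaling rather than a local function call.

```python
# Sketch (ours) of the Figure 1 trigger: the Macro cell checks its overload
# indicator each epoch and, when it is set, asks its underlay Femto cells
# to run an LBRL off-loading algorithm.  Names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class MacroCell:
    overload_indicator: bool = False                 # OI status
    metrics: dict = field(default_factory=dict)      # e.g. B, D, SINR, T

def run_lbrl(femto_id: int, metrics: dict, algorithm: str) -> None:
    # Placeholder for one LBRL-SINR / LBRL-T optimization epoch at the Femto.
    print(f"Femto {femto_id}: running {algorithm} with {metrics}")

def macro_epoch(macro: MacroCell, femto_ids, algorithm="LBRL-SINR"):
    if macro.overload_indicator:                     # high load -> off-load
        for fid in femto_ids:
            run_lbrl(fid, macro.metrics, algorithm)  # triggered over X2/S1
    # otherwise: normal operation for both Macro and Femto cells

macro_epoch(MacroCell(True, {"B": 0.27, "D": 0.16}), femto_ids=range(1, 7))
```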
V. LOAD BALANCING BASED ON REINFORCEMENT LEARNING OF END-USER SINR (LBRL-SINR)

It is normal for the CQI of each User Equipment (UE) to decrease on the Macro cell side, which implies that the Signal-to-Interference-plus-Noise Ratio (SINR) of the PDSCH channel is not sufficient. As a result, the cell throughput of the Macro cell will decrease. When the LBRL-SINR algorithm is triggered at each underlay Femto cell, the algorithm reacts by adjusting the reference signal power, either increasing or decreasing it, to adjust the size of the coverage region of each Femto cell. The algorithm decides on a suitable power level at each Femto cell, which in turn balances the traffic load among the Macro cell and its surrounding Femto cells.

The LBRL-SINR algorithm uses the Q-learning technique to learn the optimal policy (Q-value) that determines the best power level for the Femto cell, mainly based on the degraded performance metrics of the overlay Macro cell. The state (s), action (a) and reward (r) are the integral parts that need to be defined at each Femto cell, i.e. Femto cell-i, as shown in Figure 2. The state is defined as the Reference Signal (RS) power of Femto cell-i at time t. The action of Femto cell-i is the selection of a reference signal power level from a range of pre-defined power levels for Femto cell-i at time t.

Figure 2: The main modules and execution sequence of the LBRL-SINR algorithm (the algorithm is triggered at Femto cell-i; three performance metrics, average SINR, Call Drop Rate (D) and Call Block Rate (B), are acquired from the overloaded Macro cell and exchanged with the neighbor Femto cell-i; the reward $r^{f(i)}_{t+1}$ is calculated at Femto cell-i; the Q-table is updated after estimating Q(s,a); and an action is applied to select the RS power state s that maximizes the received reward)

As soon as the selected action, a, is applied, the reward $r_f^t$ at Femto cell-i is estimated as proposed in Equation 8. The value of $r_f^t$ is an indicator of the current performance of both the Macro cell and its neighbor Femto cell-i. The overlay Macro cell and Femto cell-i collaborate in each optimization cycle and exchange the load information and performance metrics through the X2 interface, or the S1 interface as an alternative. The three performance metrics used to calculate the reward at Femto cell-i are: the average SINR of all end-users at the Macro cell and at Femto cell-i at time t ($SINR_m^t$ and $SINR_f^t$), the Call Drop Rate at the Macro cell and at Femto cell-i at time t ($D_m^t + D_f^t$), and the Call Block Rate at the Macro cell and at Femto cell-i at time t ($B_m^t + B_f^t$). The proposed reward function is defined as follows:

$$r_f^t = \left( w_1 (SINR_m^t + SINR_f^t) + w_2 (D_m^t + D_f^t) + w_3 (B_m^t + B_f^t) \right) \cdot \frac{1}{c} \quad (8)$$

where $w_1$, $w_2$ and $w_3$ are the weights. $SINR_m^t$ is the average of $SINR_{m,k}^t$ over all end-users at time t, and $SINR_{m,k}^t$ is defined as the SINR of UE $k$ at Macro cell $m$, as given in Equation 9. The constant $c$ keeps the reward $r_f^t$ between 0 and 1.

$$SINR_{m,k}^t(\mathrm{dB}) = P_m + G_m - PL_{m,k} - (I_{m,k} + n^2) \quad (9)$$

where:
$P_m$ = downlink transmit power from Macro cell $m$ to end-user $k$
$G_m$ = downlink antenna gain of Macro cell $m$
$PL_{m,k}$ = path loss between Macro cell $m$ and end-user $k$
$I_{m,k}$ = received downlink interference at end-user $k$ connected to Macro cell $m$
$n$ = thermal noise
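As a concrete reading of Equations 8 and 9, the sketch below evaluates the LBRL-SINR reward from the exchanged metrics. The weight values, the normalization constant c, the sample metrics and the function names are our assumptions; the paper only states that c keeps the reward between 0 and 1.

```python
# Sketch (ours) of the LBRL-SINR reward of Equation 8.  The weights w1..w3
# and the normalizing constant c are illustrative assumptions.
def lbrl_sinr_reward(sinr_m, sinr_f, d_m, d_f, b_m, b_f,
                     w1=0.02, w2=-0.5, w3=-0.5, c=1.0):
    """r_f^t = (w1*(SINR_m + SINR_f) + w2*(D_m + D_f) + w3*(B_m + B_f)) / c.
    Drop/block rates degrade performance, so negative weights are one way
    to make them reduce the reward; the paper leaves the choice of weights
    (and their signs) open."""
    return (w1 * (sinr_m + sinr_f) + w2 * (d_m + d_f) + w3 * (b_m + b_f)) / c

def sinr_db(p_m, g_m, pl_mk, i_mk, noise):
    """Equation 9 in the dB-domain form given in the text:
    P_m + G_m - PL_{m,k} - (I_{m,k} + n^2)."""
    return p_m + g_m - pl_mk - (i_mk + noise)

# Example with made-up metrics: average SINR 12/15 dB, D and B as fractions.
print(lbrl_sinr_reward(12.0, 15.0, 0.16, 0.14, 0.27, 0.20))
```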


𝐼𝑚,𝑘 = The received downlink interference at will be achieved by decreasing the chance for a Macro cell
end-user (k) who connects to Macro cell with high number of end-users to have high rates of dropped
(m) or blocked calls (D or B).
n = Thermal noise However, if the increment in the refernce signal power at
The downlink inter-cell interference model is simulated for the LTE-A downlink. LTE-A employs the Orthogonal Frequency Division Multiple Access (OFDMA) technique in its physical layer, which contributes to the higher spectral efficiency of LTE-A in comparison with previous generations of mobile technology. The smallest unit of bandwidth that can be assigned to an end-user is the Physical Resource Block (PRB), and each PRB serves a single end-user at a time. Hence, the risk of intra-cell interference is mitigated by this PRB assignment scheme.

The higher the value of the reward $r_f^t$, the wider the coverage of Femto cell-i becomes. As a result, the optimized reference signal power level will force more end-users to camp on the Femto cell instead of camping on the overlay Macro cell.

VI. LOAD BALANCING BASED ON REINFORCEMENT LEARNING OF MACRO CELL THROUGHPUT (LBRL-T)

This algorithm mainly considers the cell throughput (T) for all UEs, instead of the average SINR used in LBRL-SINR, to dynamically control the RS power at each Femto cell. It is assumed that the reference signal power of the Macro cell remains the same and is not changed by the algorithm. This ensures full network coverage and minimizes the chance of creating coverage holes, since at some instant the Macro cell and its neighbor Femto cell might otherwise reduce their coverage at the same time, which would create a coverage hole.

In this algorithm, the reward is estimated based on the cell throughput (T) of the Macro cell. The T value is one of the main components of the reward function $r_f^t$, as shown in Equation 10. The state and action of Femto cell-i are modeled in the same way as in LBRL-SINR (Section V), while the process of estimating the reward is different.

Three performance metrics are required to estimate $r_f^t$ in LBRL-T, and all three are acquired from the Macro cell and its neighbor Femto cell-i simultaneously. The first metric is the average cell throughput at time t ($T_m^t + T_f^t$), the second is the Call Drop Rate at time t ($D_m^t + D_f^t$) and the third is the Call Block Rate at time t ($B_m^t + B_f^t$). These metrics construct the reward function, which is defined as follows:

$$r_f^t = \left( w_1 (T_m^t + T_f^t) + w_2 (D_m^t + D_f^t) + w_3 (B_m^t + B_f^t) \right) \cdot \frac{1}{c} \quad (10)$$

The LBRL-T algorithm keeps monitoring the cell throughput (T) so that it does not degrade at any time instance after a new action, a, is applied. The immediate response of the algorithm after an action is to estimate the new reward value, $r_f^{t+1}$. The higher $r_f^{t+1}$ is, the higher the RS power value assigned to Femto cell-i, which increases the chance for Femto cell-i to off-load more end-users from its neighbor Macro cell. As a result, improved performance is achieved by decreasing the chance for a Macro cell with a high number of end-users to have high rates of dropped or blocked calls (D or B).

However, if the increment in the reference signal power at Femto cell-i was unnecessary or led to unstable performance, in terms of causing a higher Call Drop Rate (D) or a higher Call Block Rate (B) on the Macro cell side, the algorithm detects the degraded B or D and estimates a new reward value, $r_f^{t+1}$, in the next optimization epoch, which should be lower than the previous reward, $r_f^t$. As a result, an optimized action, a, is applied to reduce the RS power to a lower level.
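The per-epoch reaction described above can be sketched as follows. This is our reading of the LBRL-T behaviour (Equation 10 plus the raise/lower decision); the weights are assumptions, and the step size and bounds are taken from the ranges reported later in the simulation and results sections (10 to 22 dBm, adjustments of 1 to 3 dB).

```python
# Sketch (ours) of one LBRL-T reaction at Femto cell-i: compute the reward
# of Equation 10, then raise the RS power if the reward improved and lower
# it if the Macro-side metrics degraded.  Weights and step size are
# illustrative assumptions.
RS_MIN_DBM, RS_MAX_DBM, STEP_DB = 10.0, 22.0, 1.0

def lbrl_t_reward(t_m, t_f, d_m, d_f, b_m, b_f,
                  w1=0.005, w2=-0.5, w3=-0.5, c=1.0):
    """Equation 10 with throughput in Mbps and rates as fractions in [0, 1]."""
    return (w1 * (t_m + t_f) + w2 * (d_m + d_f) + w3 * (b_m + b_f)) / c

def adjust_rs_power(rs_dbm, reward_prev, reward_new):
    """Raise RS power while the reward keeps improving, otherwise back off."""
    if reward_new >= reward_prev:
        return min(rs_dbm + STEP_DB, RS_MAX_DBM)   # extend Femto coverage
    return max(rs_dbm - STEP_DB, RS_MIN_DBM)       # shrink coverage again
```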
VII. SIMULATION ENVIRONMENT

An LTE-A Heterogeneous Network (HetNet) consists of two types of cells, Macro cells and underlying Femto cells. In 3GPP [22], a dense LTE-A HetNet is defined as a heterogeneous network in which the number of underlay small cells defined as neighbors of their overlay Macro cell varies from 4 to 10. Our scenarios are conducted in a system-level simulation comprising 7 Macro cells and 42 underlay Femto cells, as shown in Figure 3. Six Femto cells are distributed randomly within the coverage area of each Macro cell, and each Femto cell is defined as a neighbor of its nearest overlay Macro cell. The underlay Femto cells are able to communicate with the Macro cell through the X2 or S1 interface to exchange performance metrics and load information.

The system topology shown in Figure 3 consists of 7 Macro cells. The center Macro cell is simulated with a high traffic load originating from a maximum of 100 end-users, while the remaining 6 Macro cells are simulated with a normal traffic load originating from a maximum of 20 end-users each. The system bandwidth varies according to the cell type. Each Macro cell has a total bandwidth of 100 MHz, which is the total available bandwidth from deploying 5 Component Carriers (CCs), each providing a channel bandwidth of 20 MHz. Each Femto cell provides a channel bandwidth of 10 MHz. The traffic load of the center Macro cell in the three simulation scenarios is set to utilize 70% to 99% of the Macro cell bandwidth, while the normal traffic load utilizes a maximum of 25% of the available bandwidth at each of the 6 surrounding Macro cells.

Figure 3: System topology of dense LTE-A HetNet
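For reference, the simulation parameters described above can be collected into a single configuration structure. This is merely a restatement of the setup reported in Sections VII and VIII; the field names are our own.

```python
# Simulation setup restated as a configuration dict (ours); values are taken
# from the text of Sections VII and VIII, field names are our own choice.
SIM_CONFIG = {
    "num_macro_cells": 7,
    "femto_cells_per_macro": 6,          # 42 underlay Femto cells in total
    "macro_bandwidth_mhz": 100,          # 5 CCs x 20 MHz
    "femto_bandwidth_mhz": 10,
    "center_macro_max_users": 100,       # highly loaded cell
    "outer_macro_max_users": 20,
    "center_macro_load_range": (0.70, 0.99),
    "outer_macro_max_load": 0.25,
    "femto_rs_power_dbm": {"nominal": 19, "min": 10, "max": 22},
    "dropped_call_rsrp_threshold_dbm": -110,
}
```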


Three simulation scenarios have been executed: fixed reference signal power allocation, dynamic reference signal power allocation by the LBRL-SINR algorithm, and dynamic reference signal power allocation by the LBRL-T algorithm. In each of the three scenarios, each UE is admitted to either the Macro cell or its neighbor Femto cell depending on which cell has the higher reference signal power value, as shown in Figure 4. If the cell Overload Indicator (OI) is not active, the cell is still able to provide RBs to any new end-user that requests a connection or call; otherwise, the call/connection request from the end-user is blocked. A dropped call is recorded if the received signal power of an end-user that has an established connection with either a Macro or a Femto cell falls below a pre-determined threshold value of -110 dBm.

Figure 4: Basic procedures for estimating Call Block Rate (B) and Call Drop Rate (D) (flowchart: UE(k) requests a service; the Macro or Femto cell with the maximum RSRP is selected as the serving cell; if the serving-cell RSRP is below the RSRP threshold, the call is blocked when UE(k) is in IDLE mode or dropped when UE(k) is in CONNECTED mode; if the serving cell's Overload Indicator is set, the request is blocked; otherwise RBs are allocated to UE(k))
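A compact sketch of the admission and accounting procedure in Figure 4 follows. It is our paraphrase of the figure: the data structures are illustrative, and the branch where the serving cell's OI is active is assumed to block the request, consistent with the text above.

```python
# Sketch (ours) of the Figure 4 procedure used to count blocked and
# dropped calls.  Cells are plain dicts here; RSRP values are in dBm.
RSRP_THRESHOLD_DBM = -110.0

def serve_request(ue_rsrp_per_cell, cells, ue_connected, counters):
    """ue_rsrp_per_cell: {cell_id: measured RSRP}, cells: {cell_id: {"oi": bool}}.
    Updates counters["blocked"] / counters["dropped"] and returns the
    serving cell id, or None if the request fails."""
    serving = max(ue_rsrp_per_cell, key=ue_rsrp_per_cell.get)  # max-RSRP cell
    if ue_rsrp_per_cell[serving] < RSRP_THRESHOLD_DBM:
        # Below coverage threshold: block in IDLE mode, drop in CONNECTED mode.
        counters["dropped" if ue_connected else "blocked"] += 1
        return None
    if cells[serving]["oi"]:            # overload indicator active
        counters["blocked"] += 1        # no RBs available for the request
        return None
    return serving                      # RBs allocated to UE(k)

counters = {"blocked": 0, "dropped": 0}
print(serve_request({"macro1": -95.0, "femto3": -88.0},
                    {"macro1": {"oi": True}, "femto3": {"oi": False}},
                    ue_connected=False, counters=counters), counters)
```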

VIII. RESULTS AND DISCUSSION

To assess the performance of the proposed algorithms, the same performance metrics used in the input stage to estimate the reward values were used again in the output stage. Both the Call Drop Rate (D) and the Call Block Rate (B) were estimated for each simulation scenario and are plotted in Figures 5 and 6; the y-axis in the two figures represents the percentage of B and D, respectively. In the first simulation scenario, a fixed RS power level of 19 dBm was set for each Femto cell. This scenario led to degraded performance at the Macro cell and generated a considerable percentage of dropped calls, D, and blocked calls, B. In particular, B was the metric most affected by the congestion situation.

In Figure 5, a lower Call Block Rate (B) is shown for both algorithms in comparison with the fixed RS power assignment scheme, which indicates that the available bandwidth is managed fairly among the Macro cell and its neighbor Femto cells. As a result, the chance for the Macro cell to recover from congestion becomes higher when the LBRL algorithms are used, and both LBRL-SINR and LBRL-T showed a reduced rate of blocked calls compared with the normal scheme of fixed RS power assignment.

In Figure 6, the improved performance of the Macro cell is shown through the reduced rate of dropped calls (D). In other words, the low Call Drop Rate (D) is an indicator of a higher percentage of successful handovers (HO) among cells. When the LBRL-SINR algorithm is triggered at an underlay Femto cell, it showed the lowest Call Drop Rate (D) as well as the lowest Call Block Rate (B) in comparison with both the reference case and the LBRL-T algorithm. This confirms that acquiring the average SINR of end-users, instead of the average cell throughput (T), contributes to more accurate decisions by the QL optimizer when selecting the best RS power level at each Femto cell: more accurate reward values ($r_f^t$) were fed to the QL optimizer when LBRL-SINR was triggered. As a result, the LBRL-T algorithm showed sub-optimal performance in comparison with LBRL-SINR, as shown in Figures 5 and 6.

Figure 5: The output Call Block Rate (B) for the highly loaded Macro cell (Call Block Rate versus load percentage for LBRL-SINR, LBRL-T and fixed RS power)

Figure 6: The output Call Drop Rate (D) for the highly loaded Macro cell (Call Drop Rate versus load percentage for LBRL-SINR, LBRL-T and fixed RS power)

In the second and third simulation scenarios, both LBRL-SINR and LBRL-T evolved new reference signal power values that fluctuated in the range of 19 ± 3 dBm at each underlay Femto cell. Figure 7 compares the average reference signal power of the 6 Femto cells that underlay Macro cell 1 (the central Macro cell), where the LBRL algorithms were triggered and executed during one optimization cycle for each simulation scenario.


At each Femto cell, the minimum RS power level was set to 10 dBm, a threshold below which neither LBRL-SINR nor LBRL-T will go. Further, a maximum value of 22 dBm was set for the RS power at each Femto cell.

As shown in Figure 7, in order to achieve the prospective load balancing among the Macro cell and its neighbor Femto cells, the LBRL-SINR algorithm applied an increment of 1 to 3 dBm of RS power at Femto cells 1, 2 and 4. In the third simulation scenario, LBRL-T applied the same increment of 1 to 3 dBm at Femto cells 1, 5 and 6. The increment in reference signal power means that Femto cells 1, 2, 4, 5 and 6 extend their coverage, so more end-users are able to camp on those 5 Femto cells instead of camping on their overlay Macro cell. However, if degraded performance is detected by the algorithm, whether on the Macro cell side or on the neighbor Femto cell side, the algorithm reacts and decreases the Femto cell RS power. A decrement of 1 to 3 dBm was applied by LBRL-SINR at Femto cells 3, 5 and 6, and the same decrement was applied at Femto cells 2, 3 and 4 by the LBRL-T algorithm, as shown in Figure 7. As mentioned in the previous sections of this paper, there are four performance metrics that the algorithm can detect for a highly loaded Macro cell: high Call Drop Rate (D), high Call Block Rate (B), low cell throughput (T) and low average SINR. The degradation of any of these metrics affects the reward values, as stated previously in Equations 8 and 10. As a result, the algorithm reduces the RS power level at the Femto cell where the reward is estimated, in order to keep optimal values of B, D and SINR if the LBRL-SINR algorithm is triggered, or of B, D and T if the LBRL-T algorithm is triggered.

The LBRL-T algorithm is recommended for use where the mobile operator observes throughput-related issues, such as low end-user throughput or low cell throughput, since LBRL-T makes the offloading decision based on the cell throughput, as shown previously in Equation 10. On the other hand, LBRL-SINR uses the end-user SINR as part of its reward formula (Equation 8), which makes it more suitable for areas where there is a clear indication of high-interference spots.

Figure 7: RS power allocation for the 6 Femto cells that underlay the Macro cell with high load (average RS power level in dBm per Femto cell for LBRL-SINR, LBRL-T and the reference power level)

The complexity and computational cost of LBRL-SINR and LBRL-T are considered negligible, since the proposed algorithms take only a few minutes to compute an output with all the needed calculations during each optimization epoch. In addition, the memory requirement is limited: the needed look-up table is small, as it contains a set of 4 performance metrics (B, D, SINR and T) to be exchanged between the Macro cell and its neighbor Femto cell once an LBRL algorithm is triggered to run.

IX. CONCLUSION

This paper proposed two algorithms that optimize the degraded performance of LTE-A Macro cells under high traffic load. The proposed algorithms utilize Reinforcement Learning (RL) techniques to auto-tune the reference signal power of Femto cells, which results in offloading end-users from a congested overlay Macro cell. Both the LBRL-SINR and the LBRL-T algorithm optimize the RS power level of Femto cells in real time during every optimization epoch of an on-air Macro cell. As a result, the distribution of traffic load among Macro and Femto cells is improved, and lower rates of dropped and blocked calls are achieved for the highly loaded Macro cell.

REFERENCES

[1] T. Nakamura, S. Nagata, A. Benjebbour, Y. Kishiyama, T. Hai, S. Xiaodong, et al., "Trends in small cell enhancements in LTE advanced," IEEE Communications Magazine, vol. 51, pp. 98-105, 2013.
[2] M. Peng, D. Liang, Y. Wei, J. Li, and H. H. Chen, "Self-configuration and self-optimization in LTE-advanced heterogeneous networks," IEEE Communications Magazine, vol. 51, pp. 36-45, 2013.
[3] L. Jorguseski, A. Pais, F. Gunnarsson, A. Centonza, and C. Willcock, "Self-organizing networks in 3GPP: standardization and future trends," IEEE Communications Magazine, vol. 52, pp. 28-34, 2014.
[4] W. Wang, J. Zhang, and Q. Zhang, "Cooperative cell outage detection in self-organizing femtocell networks," in INFOCOM, 2013 Proceedings IEEE, 2013, pp. 782-790.
[5] A. Aguilar-Garcia, S. Fortes, M. Molina-García, J. Calle-Sánchez, J. I. Alonso, A. Garrido, et al., "Location-aware self-organizing methods in femtocell networks," Computer Networks, vol. 93, Part 1, pp. 125-140, 2015.
[6] M. Behjati and J. Cosmas, "Self-organizing network interference coordination for future LTE-advanced networks," in 2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), 2013, pp. 1-5.
[7] S. Jia, W. Li, X. Zhang, Y. Liu, and X. Gu, "Advanced Load Balancing Based on Network Flow Approach in LTE-A Heterogeneous Network," International Journal of Antennas and Propagation, vol. 2014, p. 10, 2014.
[8] Y. Khan, B. Sayrac, and E. Moulines, "Centralized self-optimization in LTE-A using Active Antenna Systems," in Wireless Days (WD), 2013 IFIP, 2013, pp. 1-3.
[9] Z. Altman, S. Sallem, R. Nasri, B. Sayrac, and M. Clerc, "Particle swarm optimization for Mobility Load Balancing SON in LTE networks," in Wireless Communications and Networking Conference Workshops (WCNCW), 2014 IEEE, 2014, pp. 172-177.
[10] A. L. Yusof, M. A. Zainali, M. T. M. Nasir, and N. Ya'acob, "Handover adaptation for load balancing scheme in femtocell Long Term Evolution (LTE) network," in Control and System Graduate Research Colloquium (ICSGRC), 2014 IEEE 5th, 2014, pp. 242-246.
[11] K. Lee, S. Kim, S. Lee, and J. Ma, "Load balancing with transmission power control in femtocell networks," in Advanced Communication Technology (ICACT), 2011 13th International Conference on, 2011, pp. 519-522.
[12] L. Buşoniu, R. Babuška, and B. De Schutter, "A Comprehensive Survey of Multiagent Reinforcement Learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 38, pp. 156-172, 2008.
[13] E. Bikov and D. Botvich, "Multi-agent Learning for Resource Allocation in Dense Heterogeneous 5G Network," in 2015 International Conference on Engineering and Telecommunication (EnT), 2015, pp. 1-6.
[14] I. S. Comşa, M. Aydin, S. Zhang, P. Kuonen, and J. F. Wagen, "Reinforcement learning based radio resource scheduling in LTE-advanced," in Automation and Computing (ICAC), 2011 17th International Conference on, 2011, pp. 219-224.

[15] J. Moysen and L. Giupponi, "A Reinforcement Learning Based Solution for Self-Healing in LTE Networks," in 2014 IEEE 80th Vehicular Technology Conference (VTC2014-Fall), 2014, pp. 1-6.
[16] O. Iacoboaiea, B. Sayrac, S. B. Jemaa, and P. Bianchi, "SON Coordination for parameter conflict resolution: A reinforcement learning framework," in Wireless Communications and Networking Conference Workshops (WCNCW), 2014 IEEE, 2014, pp. 196-201.
[17] A. Giovanidis, L. Qi, and S. Stańczak, "A distributed interference-aware load balancing algorithm for LTE multi-cell networks," in 2012 International ITG Workshop on Smart Antennas (WSA), 2012, pp. 28-35.
[18] H. Zhang, X. S. Qiu, L. M. Meng, and X. D. Zhang, "Achieving distributed load balancing in self-organizing LTE radio access network with autonomic network management," in 2010 IEEE Globecom Workshops, 2010, pp. 454-459.
[19] K. M. Ronoh, "Load Balancing in Heterogeneous LTE-A Networks," Linköping University, 2012.
[20] A. Lobinger, S. Stefanski, T. Jansen, and I. Balan, "Load Balancing in Downlink LTE Self-Optimizing Networks," in 2010 IEEE 71st Vehicular Technology Conference, 2010, pp. 1-5.
[21] Z. Li, H. Wang, Z. Pan, N. Liu, and X. You, "Joint optimization on load balancing and network load in 3GPP LTE multi-cell networks," in 2011 International Conference on Wireless Communications and Signal Processing (WCSP), 2011, pp. 1-5.
[22] 3GPP, "Small cell enhancements for E-UTRA and E-UTRAN — Physical layer aspects (Release 12)," TR 36.872 (v12.1.0), Dec. 2013.
