This section first introduces the bounded multi-armed bandit (MAB) model and then describes the packet size adaptation problem. Finally, we show how packet size adaptation maps onto the MAB model through exploration and exploitation.
4.1. Bounded Multi-Armed Bandits
The proposed MAB consists of K arms, represented as $\{1, 2, \dots, K\}$. At each time step $t$, an arm is chosen to draw from a non-empty subset $S(t) \subseteq \{1, 2, \dots, K\}$, and drawing arm $i$ incurs a cost $c_i$. The cost budget is $B$, which means that the total cost of all draws must not exceed $B$. Each draw yields a nonnegative return whose distribution is associated with the arm drawn. We assume that the return distribution of each arm has bounded support, because reward values are usually restricted in practical applications. When arm $i$ is drawn, the agent receives a reward with mean value $\mu_i$. The goal is to maximize the total reward collected from the draws. However, the agent has no prior knowledge of the $\mu_i$ of each arm, so it must learn these values in order to select a strategy that maximizes the total return. In view of this, the goal is to find the sequence of draws that achieves the maximum expected total reward at a total cost of no more than $B$ [31].
Formally, let $A$ be an arm-pulling algorithm, which produces a finite sequence of pulls, and let $N_i^B(A)$ be the number of times that $A$ pulls arm $i$ under the budget limit $B$. $N_i^B(A)$ is a random variable because the behavior of $A$ depends on the rewards that are observed. Therefore, we have:

$$N_i^B(A) = \sum_{t \ge 1} \mathbb{I}\{A(t) = i\}$$

where $A(t) \in S(t)$ is the arm that the agent chooses to draw at time step $t$ from the subset $S(t)$, and $\mathbb{I}\{\cdot\}$ is the indicator function. To ensure that the total cost of the sequence is no more than $B$, we have:

$$P\left(\sum_{i=1}^{K} N_i^B(A)\, c_i \le B\right) = 1$$

where $P(\cdot)$ denotes probability. Furthermore, we assume that the agent can draw each arm $i$ no more than $L_i$ times. That is:

$$N_i^B(A) \le L_i, \quad i = 1, \dots, K$$
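Now, let $G_B(A)$ be the total return obtained by using algorithm $A$ to draw the arms at a total cost of no more than $B$. The expected value of $G_B(A)$ is:

$$\mathbb{E}[G_B(A)] = \sum_{i=1}^{K} \mathbb{E}[N_i^B(A)]\, \mu_i$$

Then, $A^*$ represents the optimal algorithm, which maximizes the expected total return:

$$A^* = \arg\max_{A} \mathbb{E}[G_B(A)]$$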
Running the optimal algorithm $A^*$ requires knowing the values $\mu_i$ in advance, which are not available in our case. Therefore, $A^*$ is a theoretical optimum that cannot be achieved in practice. It nevertheless serves as a benchmark: the regret of an algorithm $A$ is defined as the difference between the expected total return of $A^*$ and that of $A$ [40]. That is:

$$R_B(A) = \mathbb{E}[G_B(A^*)] - \mathbb{E}[G_B(A)]$$
The purpose here is to find a sequence of arm pulls that minimizes the regret defined above. This is the bounded multi-armed bandit problem. Taking the limit $L_i \to \infty$ for every arm removes the pull limits and gives the budget-limited MAB. Moreover, when additionally setting $c_i = 1$ for every arm, we obtain the standard MAB model.
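The quantities defined in this subsection can be illustrated with a small simulation. The following Python sketch is not the algorithm used in this paper: it assumes Bernoulli rewards, integer costs, and a simple epsilon-greedy policy on the estimated reward per unit cost, and the function names `optimal_expected_return` and `run_epsilon_greedy` are hypothetical. Because the simulation knows every $\mu_i$, it can compute the expected return of the optimal algorithm $A^*$ as a bounded knapsack problem over pull counts and report the regret of one run.

```python
import random


def optimal_expected_return(means, costs, limits, budget):
    """Benchmark with known means: bounded knapsack over pull counts.

    Maximizes sum(n_i * mu_i) subject to sum(n_i * c_i) <= budget
    and n_i <= L_i. Assumes integer costs and an integer budget.
    """
    best = [0.0] * (budget + 1)  # best[b]: max return with total cost <= b
    for mu, cost, limit in zip(means, costs, limits):
        for _ in range(limit):  # each allowed pull is one 0/1 knapsack item
            for b in range(budget, cost - 1, -1):
                best[b] = max(best[b], best[b - cost] + mu)
    return best[budget]


def run_epsilon_greedy(means, costs, limits, budget, epsilon=0.1, seed=0):
    """Illustrative policy (an assumption, not the paper's method)."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k          # N_i: number of times arm i was pulled
    reward_sum = [0.0] * k   # observed rewards per arm
    spent = 0
    while True:
        # Arms that still respect both the pull limit and the budget.
        feasible = [i for i in range(k)
                    if pulls[i] < limits[i] and spent + costs[i] <= budget]
        if not feasible:
            break
        untried = [i for i in feasible if pulls[i] == 0]
        if untried:
            arm = untried[0]                      # try every arm once
        elif rng.random() < epsilon:
            arm = rng.choice(feasible)            # explore
        else:                                     # exploit reward per cost
            arm = max(feasible,
                      key=lambda i: reward_sum[i] / pulls[i] / costs[i])
        reward_sum[arm] += 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        spent += costs[arm]
    return pulls, spent


means, costs, limits, budget = [0.2, 0.5, 0.8], [1, 2, 4], [30, 30, 30], 60
pulls, spent = run_epsilon_greedy(means, costs, limits, budget)
expected_return = sum(n * mu for n, mu in zip(pulls, means))
regret = optimal_expected_return(means, costs, limits, budget) - expected_return
print(pulls, spent, round(regret, 3))
```

In this example the benchmark spends the whole budget on the second arm, which has the highest mean reward per unit cost, so any learning policy can only match it or fall short of it.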
4.2. Packet Size Adaptation
The wireless channel is time-varying by nature, so using a fixed frame size is very inefficient. As previously discussed, a variable frame size reduces the frame errors that occur when large data packets are transmitted over a poor-quality channel, while a larger packet size can be sent when the channel quality is better. A variable frame size therefore reduces the number of retransmissions, increases the goodput of the system, and saves energy. This paper uses MAB to find the optimal frame size for each channel quality. The frame size found is a local optimum, because the number of users and the channel quality change over time, and the network environment changes with them [41,42,43,44].
To use multi-armed bandits to find the optimal frame size for a given channel quality, we need to design the exploration and exploitation steps. Our goal is to maximize the goodput of the network. The channel goodput used for exploration and exploitation depends on the packet size, the collision rate, the data rate, the protocol delay, and the channel quality. The following equation gives the relationship between throughput and frame size [3]:
$L$: the frame size
$L_{ACK}$: the length of the acknowledgment frame
$L_{collision}$: the average collision length
$R$: the data transmission rate
$H_{MAC}$: the MAC-layer frame header
$H_{PHY}$: the PHY-layer frame header
$N$: the average number of collisions between two renewals
$T$: the average number of backoff time slots under a given channel
$D$: the DCF Inter-Frame Space (DIFS)
$P_{error}$: the bit error probability for a known channel quality
$O_{ACK}$: the overhead of the acknowledgment
$O_{protocol}$: the processing delay overhead of the MAC and PHY protocols
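To make the mapping from frame sizes to bandit arms concrete, the following Python sketch treats each candidate frame size as one arm and applies the standard UCB1 rule. It is only an illustration under strong assumptions: the reward model is a simplified stand-in (payload fraction of the frame if no bit error occurs, zero otherwise) that ignores the collision, backoff, DIFS, and acknowledgment terms of the throughput relation from [3], and the function name `ucb1_frame_size` and all parameter values are hypothetical.

```python
import math
import random


def ucb1_frame_size(sizes, header_bits, bit_error_prob, rounds, seed=0):
    """UCB1 over candidate frame sizes (payload lengths in bits).

    Assumed reward for one transmission, in [0, 1]:
    payload / (payload + header) if the frame has no bit error, else 0.
    """
    rng = random.Random(seed)
    counts = [0] * len(sizes)         # how often each size was used
    mean_reward = [0.0] * len(sizes)  # running average reward per size
    for t in range(1, rounds + 1):
        if t <= len(sizes):
            arm = t - 1               # exploration: use every size once
        else:                         # exploitation with a confidence bonus
            arm = max(range(len(sizes)),
                      key=lambda i: mean_reward[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        total_bits = sizes[arm] + header_bits
        delivered = rng.random() < (1.0 - bit_error_prob) ** total_bits
        reward = sizes[arm] / total_bits if delivered else 0.0
        counts[arm] += 1
        mean_reward[arm] += (reward - mean_reward[arm]) / counts[arm]
    return counts, mean_reward


sizes = [256, 1024, 4096, 12000]
counts, mean_reward = ucb1_frame_size(sizes, header_bits=400,
                                      bit_error_prob=1e-4, rounds=2000)
for size, n, avg in zip(sizes, counts, mean_reward):
    print(size, n, round(avg, 3))
```

Small frames waste capacity on headers and large frames are more likely to contain a bit error, so the estimated reward is highest at an intermediate size, which is the trade-off that the adaptation is meant to learn.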
First, from the PU behavior described by Equation (19), the probability that a packet is transmitted successfully decreases as the packet size increases, and energy is wasted whenever the packet collides with PU packets. Second, if the packet size is reduced, the ratio of the energy consumed in the data transmission slots of Figure 2 to the energy consumed in the access control slot decreases, which also lowers the energy efficiency. There is a trade-off between these two effects, so there may exist an optimal packet size that yields the best energy efficiency. Improving energy efficiency lets sensors transmit as many packets as possible during their lifetime. In this paper, the energy-per-bit (EPB) metric denotes the ratio of the total energy consumption to the amount of data successfully transmitted. As introduced before, the protocol is designed to minimize the EPB for the network as a whole, not only for an individual node [45]:
The packet size must be adapted in every frame because both the PU behavior and the sensor activity are time-varying. A CM wakes up and transmits an access request message when it has data to send. When the CH receives the access requests from the CMs, it determines the packet size for data transmission in the current frame according to the sensor activity and the PU behavior. The total energy consumption in the network includes:
- The energy consumed in the access control slot, which covers the transmission of access request packets and the reception of the broadcast access reply packets.
- The energy consumed in the data transmission slots.
In this work, a star topology is used in which every CM is at an equal distance $d$ from the CH. The size of the access request packet is $K_1$ bits, and the size of the access reply packet is $K_2$ bits. In the access control slot, the energy consumption of the whole network is as follows:
Data packet transmission follows the frame structure in Figure 6. A complete data transmission occupies $L$ slots. For example, suppose CM $i$ tries to transmit a packet with a length of $L$ slots on channel $j$. The energy consumption is
. Let $P_j(l)$ denote the probability that the transmission of the CM collides with the PU. The CM only sends for
slots on channel $j$. $P_j(l)$ can be expressed as:
where $p_j$ is the transition probability from idle to busy. The probability of successfully transmitting a packet with a length of $L$ slots is
. According to the above equations, the expected energy consumption of CM $i$ transmitting data on channel $j$ is derived as follows:
is independent of $i$ because every CM is at the same distance from the CH, so . The probability of successful transmission depends only on the PU behavior, so the expected amount of successful transmissions on channel $j$ is .
If the number of available channels is greater than the number of active CMs, the channels with the smaller $p_j$ are selected to reduce the probability of collisions with PUs during data transmission. This is because a channel with a smaller $p_j$ is more likely to remain idle once it has been sensed idle in the first slot. $N_{active}$ denotes the number of available channels that are selected.
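The selection rule described above can be sketched in a few lines of Python. The sketch assumes that the CH already holds an estimate of the idle-to-busy transition probability $p_j$ of every available channel; the function name `select_channels` and the example values are hypothetical.

```python
def select_channels(idle_to_busy_prob, n_active):
    """Return the indices of the n_active channels with the smallest p_j.

    idle_to_busy_prob[j] is the (assumed known) idle-to-busy transition
    probability of available channel j.
    """
    ranked = sorted(range(len(idle_to_busy_prob)),
                    key=lambda j: idle_to_busy_prob[j])
    return sorted(ranked[:n_active])


# Four available channels, two active CMs: channels 1 and 2 are kept.
print(select_channels([0.30, 0.05, 0.12, 0.20], 2))
```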
The optimal packet size, in terms of the number of slots, is obtained by

$$L^{*} = \arg\min_{L_{\min} \le L \le L_{\max}} \mathrm{EPB}(L),$$

where $L_{\min}$ (the minimum packet size) depends on the MAC frame format of the specific network, and $L_{\max}$ (the maximum packet size) is generally chosen as the maximum transmission unit (MTU) allowed by the network, to avoid packet fragmentation. Because the active CMs and the available channels change over time, the packet size should adapt to these changes to minimize the EPB of the network in the current frame. The CH keeps track of the changes in the channel states by the time interval between ACK and join and by the residual energy balance of the CMs, and decides the packet size at the beginning of each frame [46,47,48].
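The search for the EPB-minimizing packet size between the minimum and maximum sizes can be sketched as follows. The energy model in this Python example is a simplified stand-in and not the expressions derived in this section: it assumes a fixed access-control energy per frame, a fixed energy per data slot, and a success probability of $(1 - p_j)^{L-1}$ for a packet of $L$ slots on a channel that was sensed idle in the first slot. The function names and all numerical values are hypothetical.

```python
def energy_per_bit(length_slots, p_busy, access_energy, slot_energy,
                   bits_per_slot):
    """Assumed EPB model: energy spent / expected bits delivered."""
    # Channel was idle in the first slot and must stay idle afterwards.
    success_prob = (1.0 - p_busy) ** (length_slots - 1)
    energy = access_energy + length_slots * slot_energy
    return energy / (length_slots * bits_per_slot * success_prob)


def optimal_packet_size(l_min, l_max, **model):
    """Exhaustive search for the packet size (in slots) with minimum EPB."""
    return min(range(l_min, l_max + 1),
               key=lambda length: energy_per_bit(length, **model))


model = dict(access_energy=5.0, slot_energy=1.0, bits_per_slot=100)
busy = optimal_packet_size(1, 50, p_busy=0.05, **model)   # frequent PU returns
quiet = optimal_packet_size(1, 50, p_busy=0.01, **model)  # rare PU returns
print(busy, quiet)
```

Under these assumptions the search returns a packet of 8 slots for the busier channel and a longer packet for the quieter one, which mirrors the reason for re-deciding the packet size at the beginning of each frame.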