Massive MIMO Adaptive Modulation and Coding Using Online Deep Learning Algorithm

Evgeny Bobrov, Dmitry Kropotov, Hao Lu, and Danila Zaev The work has been supported by Huawei Technologies and Interdisciplinary Scientific and Educational School of Moscow University «Brain, Cognitive Systems, Artificial Intelligence». Evgeny Bobrov is with Moscow Research Center, Huawei Technologies, Russia, and M. V. Lomonosov Moscow State University, Russia (e-mail: [email protected]). Dmitry Kropotov is with National Research University Higher School of Economics, Russia, and M. V. Lomonosov Moscow State University, Russia (e-mail: [email protected]). Hao Lu and Danila Zaev are with Moscow Research Center, Huawei Technologies, Russia (e-mail: [email protected], [email protected]).

Abstract

The paper describes an online deep learning algorithm (ODL) for adaptive modulation and coding in massive MIMO. The algorithm is based on a fully connected neural network, which is initially trained on the output of the traditional algorithm and then incrementally retrained by the service feedback of its output. We show the advantage of our solution over the state-of-the-art Q-learning approach. We provide system-level simulation results to support this conclusion in various scenarios with different channel characteristics and different user speeds. Compared with traditional OLLA, the algorithm shows a 10% to 20% improvement in user throughput in the full-buffer case.

Index Terms:

Adaptive Modulation and Coding, Link Adaptation, Olla, Deep Learning, Reinforcement Learning, Massive MIMO, Wireless Communications, Online Training

I Introduction

The adaptive modulation and coding (AMC) process carried out in the link adaptation is a crucial part of current wireless communication systems. It becomes especially important and challenging in massive MIMO systems with dynamic beamforming. Advanced AMC techniques allow a significant increase in the data rate that can be reliably transmitted [1].

Following New Radio (5G) downlink AMC procedure [2], user equipment (UE) has to suggest to the serving base station (BS) an appropriate modulation and coding scheme (MCS) to be used in the next transmission. The proposed MCS is provided by UE using a channel quality indicator (CQI). However, this indication is not enough for high-performance service. The first reason is that each CQI is associated with an interval of signal-to-inference-and-noise ratio (SINR), which could correspond to more than one MCS. In addition, in massive MIMO systems, the accuracy of CQI is limited by the number of specific antenna ports, which is usually less than the number of transmit antennas at BS. Due to this, BS cannot rely solely on the user’s CQI report in MCS selection. That is why various AMC methods are proposed for this goal.

The well-known outer loop link adaptation (OLLA) technique was first proposed in [3]. OLLA modifies the SINR CQI-based estimation by an offset [4, 5] which can be positive (making the MCS selection more conservative) or negative (when the CQI selection was too optimistic). This offset is updated based on transport blocks’ transmission success rate so that the average block error rate is kept as close as possible to the predefined target [6].

It should be noted that the OLLA family of algorithms uses only the last binary acknowledgment information and does not take into account more refined SINR channel data, e.g., sounding (SRS) based measurements. Contrary to that, we offer an adaptive and self-learning method that predicts the next MCS using the available SINR-related measurements. The method performs both the mapping from SINR and channel data to the optimal MCS and the training (self-learning) in an online manner.

The main advantage of the proposed online deep learning (ODL) algorithm is its ability to adapt to different environments, different channel types, and different scenario conditions that BS cannot measure directly, e.g., UE speed. Due to the channel aging effect, user speed is an important hidden factor for the optimal choice of MCS, and it is hard to catch it with an offline pre-trained AI-based model. In the proposed approach, the model is able to adaptively learn the behavior of the UE and implicitly take into account its speed. In the state of the art, this challenge is called c̱oncept drift [7]. It is described as a situation when some hidden features are important and change over time, but cannot be measured. This way, our task falls into the class of incremental learning [8] algorithms, which proceed with optimization in non-stationary environments such as the massive MIMO service of a mobile UE. The deep learning approach in massive MIMO scenarios was also studied in the work [9].

Traditional OLLA adapts its offset based on HARQ acknowledgment (ACK/NACK) feedback for a transmitted transport block. The adaptation is done only if the transmission is performed. In this respect, the OLLA technique is highly dependent on traffic characteristics. If traffic is sparse compared with the channel variation, the OLLA adaptation may not achieve satisfactory quality. However, other modern techniques, like e.g., eOLLA [6], can update their offset independently of whether a transmission is carried out or not, which is very convenient for bursty traffic scenarios. In this manuscript, the proposed solution updates its parameters only using ACK/NACK feedback and assumes continuous (i.e. full-buffer) traffic. The proposed solution is fully compatible with 5G NR specifications (Release 15 or higher). It does not require any modification to the standard.

The novelty of our work is in the proposed scheme of online deep learning with a new optimization target. On the one hand, it is simpler and more effective than the existing Q-learning approach ([10, 11]) to the AMC problem. On the one hand, it outperforms the basic OLLA approach because of the better utilization of the available channel//SINR information.

In this manuscript, machine training and execution are carried out exclusively on the base station side. The input (set of ’features’) of the algorithm consists of the subband SINR measurements, CQI, time period from the last sounding, and the last reference signal received power (RSRP) measurement. Training data (’features’ and ’labels’) is collected in real-time and stored in a limited memory buffer. The computational complexity and storage requirements of the ODL approach have been investigated. Simulation results prove the stable behavior of the proposal and its uniform advantage over OLLA and Q-learning baselines. Quantitatively, the proposal increases throughput value compared to OLLA by 10% to 20%, depending on the agent speed.

We summarize the advantages of our proposal as follows: (i) the ODL can adapt to different agent speeds, (ii) the proposed approach is fully compliant with the existing NR 5G specifications, (iii) the entire online machine learning process is conducted at the base station side, has feasible storage and computational overheads.

This paper is organized as follows. Section 2 briefly describes the massive MIMO model. Section 3 carries out the proposed algorithm structure, the neural network model, and the complexity of the online training with the sample buffer approach. Section 4 describes the simulation results. Section 5 contains the conclusion.

II System Model

In the MIMO system, it is possible to send several information symbols to a multi-antenna user on a single physical resource. The number of such symbols is called the rank of the user. Under certain channel conditions, the higher rank can significantly increase the amount of transmitted information, but at the same time, it increases the requirements for channel quality. The single-user MIMO model is described by the following linear system:

r=G(HWx+n).

(1)

Where $r\in\mathbb{C}^{L}$ is a vector of detected symbols at receiver, $x\in\mathbb{C}^{L}$ is a vector of sent symbols, $H\in\mathbb{C}^{R\times T}$ is a channel matrix, $W\in\mathbb{C}^{T\times L}$ is a precoding matrix, $G\in\mathbb{C}^{L\times R}$ is a detection matrix, and $n\sim\mathcal{CN}(0,I_{L})$ is a noise-vector. The constant $T$ is the number of transmit antennas, $R$ is the number of receiver antennas, $L$ is the user rank. We assume they are related as follows: $L\leqslant R\leqslant T$ . As for detection matrix $G$ we assume linear MMSE [12] and for the precoding $W$ we assume the SVD-based transmission scheme [13].

The optimization objective is to maximize the expected throughput and was also considered, e.g., in [14]. The parameters of the model, including the bandwidth and the sounding period, are provided in the Table I.

III Structure of the proposed algorithm

The general structure of the solution follows [6]. The algorithm predicts the success acknowledgment (ACK) probability for each MCS given the available SINR measurements.

We propose to consider the product of spectral efficiency and the probability of successful transmission, and maximize the resulting value over possible choices of MCS:

\widehat{mcs}_{SE}(sinr)=\arg\max\limits_{mcs}\big{\{}p_{w}(ack|mcs,sinr)SE(% mcs)\big{\}}

(2)

This approach corresponds to the maximization of the expected throughput under the assumption of the Bernoulli probabilistic scheme.

Here, $p_{w}(ack|mcs,sinr)$ is a neural network model that predicts probabilities and has weights $w$ as the parameters for optimization. At the inference stage, the neural network takes as input the frequency-specific SINR estimations and an MCS and provides an acknowledgment probability as an output. The algorithm iterates through the MCS values and selects the scheme that provides the maximum expected throughput.

In the current state-of-the-art, there is a tendency to use the Q-learning (also called reinforcement learning) technique for the AMC problem [10, 11]. This technique considers MCS selection as an agent action. While deep Q-learning is widely applied in wireless communication systems and can be applied to this task as well, we argue that this application is not natural. We propose an alternative scheme (2) using classical deep learning that appears to be superior to Q-learning. Our choice of architecture is based on the following observations:

(a)

All actions in AMC are performed immediately and the reward delay is strictly specified in advance. The reward does not depend on the future actions, as, e.g., in a chess game that is modeled by Q-learning.
(b)

There is no influence on the system from our actions. The actual SINR of the transmission is independent of the MCS we choose.
(c)

The actual channel, the BS measurements, and the precoding are time-varying in general. Thus, we have access to the input data (features) and training outputs (labels) sequentially. Older data samples tend to become irrelevant.

Observations (a) and (b) motivate the use of the traditional deep learning approach rather than Q-learning. We consider acknowledgment prediction as a binary classification problem and use the scheme (2) to select the optimal MCS. Observation c) motivates the use of the online approach.

Compared with Q-learning, the main difference in our ODL approach is the use of a binary logarithmic loss function (log-loss) instead of Q-learning Temporal-Difference (TD)-Loss [15]. This way, we move to the binary classification problem instead of maximizing the delayed rewards (a) and modeling the influence on the system of our actions (b).

Note that we do not need to model a chain of future actions for this type of optimization. Indeed, the proposed ODL method predicts only the MCS for the next transmission, while the Q-learning approach predicts a chain of future actions (which are enclosed in Q-values). Thus, the ODL method is more suitable for the MCS selection problem and, as we show later, provides more stable performance.

Refer to caption — Figure 1: Online Deep Learning algorithm block scheme.

As a competitor to our solution, we consider the following Q-learning regression model [10, 11], which selects MCS based on the following maximization principle:

\widetilde{mcs}_{SE}(sinr)=\arg\max\limits_{mcs}\big{\{}q_{w}^{SE}(ack|mcs,% sinr)\big{\}}

(3)

Here, $q_{w}^{SE}(ack|mcs,sinr)$ is the neural network regression that predicts real scalar values. The Q-learning model is trained on the rewards $r(ack,mcs)=SE(mcs)[ack]$ , where $[x]$ is the indicator function that returns $1$ if condition $x$ is true and $0$ otherwise. The condition $ack$ corresponds to the receipt of the success acknowledgment. We will discuss this in more detail in the next section.

III-A Neural Network Model

In this work, we propose using the simplest neural network for binary classification without hidden layers (logistic regression). This model is lightweight, fast trainable, and robust to the environmental changes in the online-learning setting.

Our classification model uses the standard sigmoid function, which takes any real input $t$ , and outputs a value between zero and one. The sigmoid function $\sigma:\mathbb{R}\rightarrow(0,1)$ is defined as follows: $\sigma(t)=1/(1+e^{-t})$ .

Thus, we can express the probability of receiving acknowledgement in terms of the sigmoid function $\sigma$ depending on $mcs$ and $sinr$ arguments through the function $f_{w}$ , which is the neural network function with weights $w$ :

p_{w}(ack|mcs,sinr)=\sigma(f_{w}(mcs,sinr))).

(4)

The output of the model for a given vector of input features can be interpreted as a probability and serves as the basis for classification. The optimization method computes the log-loss for all the observations $n~{}\in~{}\{1~{}\dots~{}N\}$ on which it is trained. The function $J$ counts the log-probabilities of ACKs in the following way:

J(w)=\frac{1}{N}\sum_{n=1}^{N}\big{(}ack_{n}\log p_{w}(ack_{n}|sinr_{n},mcs_{n% })+\\ (1-ack_{n})\log(1-p_{w}(ack_{n}|sinr_{n},mcs_{n}))\big{)}\rightarrow\max% \limits_{w}

(5)

Here $ack_{n}\in\{0,1\}$ is the "true" acknowledgement, which we get to know after the action is completed, and $p_{w}(ack_{n}|sinr_{n},mcs_{n})$ is the probability model of the $ack_{n}$ reception, which is a function of features: $\{sinr_{n},mcs_{n}\}$ .

For the Q-learning approach, we apply the MSE-Loss function to the reward. Since we do not have a delayed reward, this is the TD-Loss with $\gamma=0$ [15]:

F(w)=\frac{1}{N}\sum_{n=1}^{N}\big{(}q_{w}(ack_{n}|sinr_{n},mcs_{n})-r(ack_{n}% ,mcs_{n})\big{)}^{2}\rightarrow\min\limits_{w}

(6)

III-B Proposed algorithm complexity

For the process of online learning, we use the Adam [16] method as one of the simplest gradient-based algorithms. It is worth noting that the previously obtained solution $w_{t}^{*}$ (optimal weights of the model) can be used as the starting point for the next re-training step $w_{t+1}^{o}$ , resulting in: $w_{t+1}^{o}=w_{t}^{*}$ . Therefore, in practical implementation, it is enough to just make a few gradient steps at the re-training step.

Since the algorithm works online, it needs to be retrained on the new data. We suggest using a buffer for every user containing the recent samples (transmission examples): features, the selected MCS, and the result of the transmission (ack/nack). Buffer samples are updated in FIFO order; the oldest samples are replaced with the newest ones. We can visualize this mechanism as follows: (Fig. 3).

We propose adding new samples to the buffer with a (possibly adaptive) subsampling rate to avoid the situation where most features remain the same between the channel measurements. By doing so, we significantly reduce the memory buffer size and retraining speed without sacrificing prediction quality. The quality may even get better since we can expand the buffer to our storage limits. For our experiments with full-buffer users, we chose subsampling rate as an inverse probability of sounding length, excluding the pilot signals.

Parameters: Initial value $y_{o}$ and step size $d$ of OLLA. Initial CQI $c_{o}$ , target BLER $b$ , buffer size $U$ , retraining period: $N$ .

Initialize: OLLA: $y=y_{o}$ , CQI: $c=c_{o}$ , the sample buffer of the size $U$ and the neural network model $A(w)$ with number of nodes $P$ and number of connections $Q$ .

Complexity. Computations $\mathcal{O}(Q/N)$ and memory $\mathcal{O}(PU)$

procedure An Agent Scheme

for each of the first

U

TTIs do

Set

MCS\leftarrow\min(\max(\text{round}(c+y),1),29)

Receive and put to the buffer the labels: ACK or NACK:

a\in\{0,1\}

, and the features:
CQI

c\in\{1\dots n\}

, SINR

s\in\mathbb{R}^{m}

,
MCS

\in\{1\dots 29\}

Update OLLA:

y\leftarrow y+da-d(1-a)(1-b)/b

[6]

Train

A(w)

targeting

J

(5) or

F

(6) and using buffer

end for

for each time frame

k=U+1,U+2\dots

Set MCS

\leftarrow A(w)

NN prediction by (2) or (3)

Receive and replace the oldest values of the buffer labels: ACK or NACK:

a\in\{0,1\}

, and features: CQI:

c\in\{1\dots n\}

, SINR

s\in\mathbb{R}^{m}

,
MCS

\in\{1\dots 29\}

k\text{ mod }N=0

then

Initialize NN

w_{k}^{o}=w_{k-N}^{*}

from the previous step

Retrain

A(w)

using (5) or (6) and memory buffer

end if

end for

end procedure

Figure 4: The scheme of both Online Deep Learning (proposed) and Q-learning with a sample buffer. The key difference between the algorithms is in the different activation and loss functions.

III-C The structure of the neural network

The input of the neural network in the proposed approach and in the Q-learning approach is designed to be the same. The feature space of the algorithm consists of SINR for each user antenna ( $RxNum=4$ for our experiments), reported CQI, the time interval from the last sounding, cell RSRP and one of the MCS values. Additional bias parameters are configured for each layer of the network. On the output layer the acknowledge success is predicted. We apply the standard scaling normalization method by subtracting the average value and dividing it by the standard deviation for each feature across all samples. The structure of the neural network is presented in Fig. 5.

The structure of ODL and Q-learning neural networks is the same in all aspects except the activation functions at the output layer. For the output layer, ODL uses the sigmoid function, while the Q-learning method uses the identity function. This difference is motivated by a difference in the problems the models solve. ODL solves the binary classification problem by predicting the probability of success, while the Q-learning model solves the regression problem by predicting real Q-values. Thus, we selected the activation functions that give the best quality results for each of the considered approaches.

IV System-level simulation results

First, we compare the proposed machine learning algorithm with the traditional OLLA method. The provided performance gains and losses are calculated with respect to OLLA performance. We have gotten stable, uniformly better results, which have never failed in our experiments. On average, the proposal increases throughput values from 12.64% to 21.52% depending on UE speed.

The advantage of the proposal is explained by the use of additional information based on SRS-based SINR measurements. We should also notice that the step-by-step behavior of OLLA is too conservative in a rapidly changing environment.

IV-A Quality improvement with machine learning

We provide experimental results for different speeds, user ranks, and random seeds. Note that the proposed algorithm is not manually tuned for the various conditions: all its hyper-parameters remain the same. It is important since in the real-life commercial system, BS does not have information about the user speed and, especially, about the user environment (e.g., urban, rural, etc.).

The proposed online deep learning model performs the mapping from the SINR measurements to the optimal MCS. The most significant advantage is achieved on the rises and falls of the SINR quality because ODL is more adaptive to the instant SINR than OLLA and instantly converges to the optimal MCS. The following Fig. 8 shows the uniform advantage of the ODL algorithm over OLLA.

IV-B ODL and Q-learning performance comparison

Next, we compare ODL performance with the performance of the Q-learning algorithm. Our simulation results show that the ODL method works uniformly better for all user ranks at a speed of 30 km/h and a random trajectory.

TABLE I: System configuration of simulation-based experiments.

CellMaxPower	40 dBm
ThermalNoisePower	-174 dBm/Hz
Bandwith	20 MHz
TxAntNum, $T$	64
RxAntNum, $R$	4
Sounding Period	5 ms

V Conclusion

This paper proposes a novel online deep learning solution for adaptive modulation and coding for massive MIMO systems. It learns to predict the probability of transmission success for different MCS values and selects the MCS with the highest expected throughput. Simulation results show that the proposed approach outperforms both the Q-learning approach and the traditional outer loop link adaptation method. When compared to standard OLLA, our method improves user performance by 10% to 20% in the full-buffer scenario. We provided an explanation for this advantage. The proposed approach has lower complexity than the Q-learning method and provides better and more stable performance. In addition, the proposed method is fully compatible with the current 5G RAN specifications. We hope that the analysis of the AMC problem provided in this paper will help to design better and simpler NN-based solutions for adaptive MCS selection in massive MIMO systems.

References

[1] S. T. Chung and A. J. Goldsmith, “Degrees of freedom in adaptive modulation: a unified view,” IEEE Transactions on Communications, vol. 49, no. 9, pp. 1561–1571, 2001.
[2] J. Wannstrom, “LTE-advanced,” Third Generation Partnership Project (3GPP), 2013.
[3] A. Sampath, P. S. Kumar, and J. M. Holtzman, “On setting reverse link target SIR in a CDMA system,” in 1997 IEEE 47th Vehicular Technology Conference. Technology in Motion, vol. 2. IEEE, 1997, pp. 929–933.
[4] P. Song and S. Jin, “Performance evaluation on dynamic dual layer beamforming transmission in TDD LTE system,” in 2013 Third International Conference on Communications and Information Technology (ICCIT). IEEE, 2013, pp. 269–274.
[5] K. I. Pedersen, G. Monghal, I. Z. Kovacs, T. E. Kolding, A. Pokhariyal, F. Frederiksen, and P. Mogensen, “Frequency domain scheduling for OFDMA with limited and noisy channel feedback,” in 2007 IEEE 66th Vehicular Technology Conference. IEEE, 2007, pp. 1792–1796.
[6] F. Blanquez-Casado, G. Gomez, M. del Carmen Aguayo-Torres, and J. T. Entrambasaguas, “eOLLA: an enhanced outer loop link adaptation for cellular networks,” EURASIP Journal on Wireless Communications and Networking, vol. 2016, no. 1, pp. 1–16, 2016.
[7] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, “A survey on concept drift adaptation,” ACM computing surveys (CSUR), vol. 46, no. 4, pp. 1–37, 2014.
[8] B. Krawczyk and A. Cano, “Online ensemble learning with abstaining classifiers for drifting and noisy data streams,” Applied Soft Computing, vol. 68, pp. 677–692, 2018.
[9] C.-K. Wen, W.-T. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Communications Letters, vol. 7, no. 5, pp. 748–751, 2018.
[10] M. P. Mota, D. C. Araujo, F. H. C. Neto, A. L. de Almeida, and F. R. Cavalcanti, “Adaptive modulation and coding based on reinforcement learning for 5g networks,” in 2019 IEEE Globecom Workshops (GC Wkshps). IEEE, 2019, pp. 1–6.
[11] L. Zhang, J. Tan, Y.-C. Liang, G. Feng, and D. Niyato, “Deep reinforcement learning-based modulation and coding scheme selection in cognitive heterogeneous networks,” IEEE Transactions on Wireless Communications, vol. 18, no. 6, pp. 3281–3294, 2019.
[12] D. Wubben, R. Bohnke, V. Kuhn, and K.-D. Kammeyer, “Near-maximum-likelihood detection of mimo systems using MMSE-based lattice-reduction,” in 2004 IEEE International Conference on Communications (IEEE Cat. No. 04CH37577), vol. 2. IEEE, 2004, pp. 798–802.
[13] L. Sun and M. R. McKay, “Eigen-based transceivers for the MIMO broadcast channel with semi-orthogonal user selection,” IEEE Transactions on Signal Processing, vol. 58, no. 10, pp. 5246–5261, 2010.
[14] P. Fan and K. B. Letaief, “Understanding of transmission throughput and channel capacity in a systematic way,” in 2011 20th Annual Wireless and Optical Communications Conference (WOCC). IEEE, 2011, pp. 1–5.
[15] G. Tesauro et al., “Temporal difference learning and TD-Gammon,” Communications of the ACM, vol. 38, no. 3, pp. 58–68, 1995.
[16] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.