1. Introduction
The exponential increase in wireless throughput for many different types of users with high quality-of-service demands has been predicted to continue in the upcoming years [1]. Fifth-generation (5G) and beyond wireless communication has been developed by integrating several disruptive technologies, such as Massive MIMO, mmWave communications, and reconfigurable intelligent surfaces, to handle the fast growth in wireless data traffic and the demand for reliable communications [2,3,4]. The orthogonal frequency division multiplexing (OFDM) technique has proven to be a key contributor owing to its success in wide-band communication networks. In fact, OFDM is still deployed in 5G systems to combat frequency-selective fading effects, therefore offering good communication quality in multi-path propagation environments [5]. Specifically, the OFDM technique increases the spectral efficiency significantly compared with a single-carrier approach. When the transmitted signals propagate through wireless multi-path channels, they are distorted by many detrimental effects, for example, large obstacles, multi-path propagation, local scattering, and mutual interference caused by sharing the same time and frequency radio resources. To decode the desired signal effectively, the channel state information must be estimated and its effects compensated at the receiver. For this purpose, pilot signals known to both the transmitter and the receiver are exploited to perform the channel estimation. In a 5G system, the structure of the pilot symbols in each data frame can vary depending on the different use cases in practice [6]. We note that, among the traditional channel estimation methods, least squares (LS) estimation is well known for its low computational complexity because it requires no prior channel statistics [7,8]. However, LS estimation yields relatively high channel estimation errors in many practical applications, especially for multi-path channels. As an alternative, minimum mean square error (MMSE) estimation provides much better channel estimation quality than LS estimation by minimizing the channel estimation errors on average [9]. The closed-form expression of the channel estimates obtained by MMSE estimation relies on the assumption that, for instance, the propagation channels are modeled by a linear system, while each channel response follows a circularly symmetric complex Gaussian distribution [10,11]. Nonetheless, MMSE estimation usually has high computational complexity since channel statistics, i.e., the mean values and the covariance matrices of the propagation channels, are required. In many propagation environments, this statistical information is either extremely difficult to obtain or varies quickly within a short coherence time, making MMSE estimation challenging to implement [12,13].
Machine learning has recently attracted a great deal of attention in both academia and industry for various applications in wireless communications, such as radio resource allocation, physical-layer security, signal decoding, and channel estimation [14,15,16,17,18]. Regarding the channel estimation application, the authors in [19] reported the use of a trained deep neural network (DNN) model with the help of a pilot signal to estimate underwater channels in an efficient manner. In [20], the authors suggested exploiting the channel correlation in both the time and frequency domains with a DNN model to perform channel estimation for the IEEE 802.11p standard. Furthermore, in [21], the authors investigated the effects of the channel estimation phase on a wireless energy transfer system and demonstrated that downlink channel estimation is necessary to harvest energy feedback information. In the considered system, a DNN structure produces better channel estimates than the traditional estimators, namely LS estimation and linear MMSE (LMMSE) estimation. We emphasize that several sophisticated techniques have been applied to estimate channel state information (CSI) to date. In a MIMO system, we can assume in practice that the CSI from each antenna at the BS shares the same autocorrelation pattern, which can be used to enhance the channel estimation quality of a particular terminal [22]. By effectively exploiting this property and arranging the CSI from the multiple antennas into a matrix, the system can apply well-known techniques from the fields of image recognition and image denoising [15,23,24,25] to predict the pattern of CSI variation by means of the channel structure. In particular, a convolutional neural network (CNN) was applied in [26] for channel estimation in a mmWave Massive MIMO system to reduce noise in the estimated channel, thus outperforming the traditional counterparts. In [27], the authors proposed a CNN-based scheme to predict channels in a large-scale MIMO system as the channels age. The authors in [28] used a deep CNN to enhance the channel estimation quality while retaining high performance compared to the traditional methods by utilizing less pilot overhead. The numerical results showed that the data-driven method remarkably improved the prediction quality. However, the authors of those papers did not consider the influence of Doppler frequencies, which can cause significant changes in the channels over time and even make the channels nonstationary. In addition, the velocity of the receiver may often vary; thus, it is important to evaluate the effect of a mismatch in the Doppler frequency between the training and testing stages of a DNN model. Another approach is to treat the instantaneous channels as time-series data and cast CSI estimation as a typical time-series learning problem. In this case, several powerful architectures exist in the literature that can track the long-term correlation of the channel profile effectively, including long short-term memory (LSTM) [29] and the gated recurrent unit (GRU) [30]. The authors in [31] suggested a scheme that integrates an LSTM network and a feed-forward neural network (FNN) in a unified structure to track time-varying channels, but without mobility. Apart from this, the authors in [32] reported the use of a bidirectional GRU network to estimate time-selective fading channels. Owing to their ability to learn and predict the relationships among the various realizations of the propagation channels, these recurrent neural network structures showed unprecedented improvements over the traditional suboptimal channel estimation methods. Nonetheless, in both papers, the authors only considered channel estimation in SISO systems. Since MIMO technology has been widely used in many modern wireless communication systems, evaluating the use of a recurrent neural network for estimating channel information under the Doppler effect is necessary.
In this paper, we extend our preliminary work [6], which only used a fully-connected deep neural network (FDNN) model to enhance the channel estimation of a MIMO-OFDM system over frequency-selective fading channels. We show the system performance of the proposed deep learning-based channel estimation framework with different receiver velocities and different neural network structures. The channel parameters in each scenario are generated based on the tapped delay line type C (TDL-C) model reported by 3GPP [33]. Our main contributions are summarized as follows:
We construct a MIMO-OFDM system with the channel profile suggested by 3GPP for 5G-and-beyond systems, accounting for the effects of mobility and frequency-selective fading. We make the practical assumption that the receiver does not know the instantaneous channels and that the transmitted data symbols should include pilot signals for the channel estimation;
We propose a general deep neural network that assists the traditional channel estimation technique. Our framework does not require any prior knowledge of channel statistics. In particular, the proposed deep learning-based channel estimation framework exploits a neural network to learn the features of the actual channels by utilizing the channel estimates obtained from LS estimation as the input;
We provide three examples of exploiting DNN structures: a fully connected DNN, a CNN, and a bi-LSTM network. With these typical examples, we evaluate the degree to which the system performance is improved by the assistance of a DNN in comparison to LS estimation;
We evaluate the performance of the DNN-based channel estimation framework by extensive numerical results and show its effectiveness by comparing it with the traditional LS estimation and LMMSE estimation, in terms of both the mean square error (MSE) and bit error rate (BER). We further analyze whether the proposed estimation is robust to Doppler effects.
This paper is organized as follows: Section 2 presents in detail the considered MIMO-OFDM system with the 5G-and-beyond channel profile. The deep learning framework that enhances the channel estimation quality is presented in Section 3 together with the three popular neural network structures; the computational complexity of the proposed framework is also analyzed in this section. The extensive simulations used to verify the machine learning-based channel estimation are shown in Section 4 with different setups. Finally, Section 5 presents the conclusions of the paper.
Notation: Upper- and lower-case bold letters are used to denote matrices and vectors, respectively. The notation $\mathcal{CN}(\cdot,\cdot)$ denotes the circularly symmetric complex Gaussian distribution, and $\mathbb{C}$ is the complex field. The notation $\mathbb{E}\{\cdot\}$ is the expectation of a random variable. The notation ⊗ is the convolution operator, while ⊙ is the Hadamard product. $\mathcal{O}(\cdot)$ is the big-$\mathcal{O}$ notation that expresses the order of computational complexity. Finally, $\|\cdot\|$ and $\|\cdot\|_F$ denote the Euclidean norm of a vector and the Frobenius norm of a matrix, respectively.
3. Deep Learning-Based Channel Estimation
In wireless communications systems, coherent detection requires knowledge of the propagation channels between the transmitter and the receiver, which can be estimated by utilizing conventional estimation techniques. In this section, we present the two widely-used channel estimation schemes whose limitations motivate us to exploit deep learning frameworks to reduce the channel estimation errors.
3.1. Motivations
As long as no inter-carrier interference occurs, each subcarrier can be treated as an independent channel, therefore preserving the orthogonality among the subcarriers. The orthogonality allows each subcarrier component of the signal in (10) to be expressed as the Hadamard product of the transmitted signal and the channel frequency response at the subcarrier [34] as
$$\mathbf{Y}_b = \mathbf{X} \odot \mathbf{H}_b + \mathbf{W}_b,$$
where $\mathbf{W}_b$, $\mathbf{H}_b$, and $\mathbf{X}$ are the Fourier transforms of the noise, the channel, and the signal, respectively, i.e., all quantities are expressed in the frequency domain.
Of all the traditional channel estimation methods, LS estimation is one of the most common approaches. We denote by $\hat{\mathbf{H}}_b^{\mathrm{LS}}$ the channel estimate from the transmission antennas at the $b$-th receiver antenna obtained by this estimation method. LS estimation gives the closed-form expression of the channel estimate as [8]
$$\hat{\mathbf{H}}_b^{\mathrm{LS}} = \big( \mathbf{X}^H \mathbf{X} \big)^{-1} \mathbf{X}^H \mathbf{Y}_b,$$
where $(\cdot)^H$ denotes the Hermitian transpose; $\mathbf{X}$ is the $K_p \times N_t$ matrix denoting the transmitted pilot signals from the $N_t$ transmission antennas; $K_p$ is the number of pilot signals in an OFDM symbol; and $(\cdot)^T$ is the regular transpose. The channel estimate from each transmission antenna can be formulated as
$$\hat{\mathbf{H}}_b^{\mathrm{LS}} = \big[ \hat{\mathbf{h}}_{1,b}^{\mathrm{LS}}, \ldots, \hat{\mathbf{h}}_{N_t,b}^{\mathrm{LS}} \big]^T.$$
Then, the channel responses at all sub-carriers can be obtained by applying a linear interpolation method. It should be noted that LS estimation is a widely-used approach because of its simplicity. Nevertheless, this technique does not exploit side information from the noise or the statistical channel properties, such as the spatial correlation among antennas, and thus high channel estimation errors can occur when LS estimation is applied to propagation environments with high mobility.
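For concreteness, the following is a minimal NumPy sketch of LS estimation at the pilot subcarriers followed by linear interpolation over all subcarriers, shown for a single transmit-receive antenna pair; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def ls_estimate(y_p, x_p, pilot_idx, n_sub):
    """LS channel estimate at pilot subcarriers, then linear interpolation.

    y_p:       (K_p,) received pilot symbols at one receive antenna
    x_p:       (K_p,) known transmitted pilot symbols
    pilot_idx: (K_p,) subcarrier indices carrying the pilots
    n_sub:     total number of subcarriers in the OFDM symbol
    """
    # Per-pilot LS solution; for orthogonal pilots this coincides with
    # (X^H X)^{-1} X^H y evaluated subcarrier by subcarrier.
    h_ls = y_p / x_p
    # Interpolate the real and imaginary parts separately to all subcarriers.
    k = np.arange(n_sub)
    return (np.interp(k, pilot_idx, h_ls.real)
            + 1j * np.interp(k, pilot_idx, h_ls.imag))
```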
To cope with the above drawbacks, one can utilize the LMMSE estimation approach, which minimizes the mean square error. For LMMSE estimation, the channel estimate is formulated in the closed-form expression as [34]
$$\hat{\mathbf{h}}_{a,b}^{\mathrm{LMMSE}} = \mathbf{R}_{\mathbf{h}\hat{\mathbf{h}}} \Big( \mathbf{R}_{\mathbf{h}\mathbf{h}} + \frac{\sigma_w^2}{\sigma_x^2} \mathbf{I}_K \Big)^{-1} \hat{\mathbf{h}}_{a,b}^{\mathrm{LS}},$$
where $\hat{\mathbf{h}}_{a,b}^{\mathrm{LMMSE}}$ is the LMMSE-estimated channel from the $a$-th transmission antenna at the $b$-th receiver antenna; $\mathbf{R}_{\mathbf{h}\mathbf{h}}$ is the auto-correlation matrix of the channel response in the frequency domain with the size of $K \times K$; $\mathbf{R}_{\mathbf{h}\hat{\mathbf{h}}}$ is the cross-correlation between the actual channel and the channel estimate obtained by LS estimation with the size of $K \times K$; $\sigma_w^2$ and $\sigma_x^2$ are the variances of the noise and the transmitted signals, respectively; and $\mathbf{I}_K$ is the identity matrix of size $K \times K$. The impacts of both the noise and the spatial correlation among the antennas are taken into account by LMMSE estimation, which is thereby able to improve the channel estimation accuracy. However, LMMSE estimation requires prior knowledge of the channel statistical properties; thus, its computational complexity is higher than that of LS estimation. Additionally, since it may be difficult to obtain the exact distribution of the channel impulse responses in general [38], the performance of LMMSE estimation cannot always be guaranteed.
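As an illustration, the following NumPy sketch refines an LS estimate with an LMMSE filter of the above form, under the common simplification that the cross-correlation $\mathbf{R}_{\mathbf{h}\hat{\mathbf{h}}}$ equals the channel auto-correlation $\mathbf{R}_{\mathbf{h}\mathbf{h}}$ (i.e., the LS estimate is the true channel plus independent noise); the names and this simplification are assumptions, not the paper's exact implementation.

```python
import numpy as np

def lmmse_estimate(h_ls, R_hh, snr):
    """LMMSE refinement of an LS estimate over K subcarriers.

    h_ls: (K,) LS channel estimate
    R_hh: (K, K) frequency-domain channel auto-correlation matrix
    snr:  sigma_x^2 / sigma_w^2, the transmit-signal-to-noise ratio
    """
    K = h_ls.shape[0]
    # Filter W = R_hh (R_hh + (1/snr) I)^{-1}; use a linear solve rather
    # than forming the matrix inverse explicitly, for numerical stability.
    A = R_hh + np.eye(K) / snr
    W = np.linalg.solve(A.T, R_hh.T).T
    return W @ h_ls
```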
3.2. Fully Connected Deep Neural Network-Based Channel Estimation
To overcome the aforementioned drawbacks of the LS and LMMSE estimation approaches, we propose an FDNN-aided estimation that minimizes the MSE between the channel estimate obtained by LS estimation and the actual channel. The structure of the proposed FDNN-based channel estimation is depicted in Figure 5. As shown in this figure, the proposed FDNN structure is organized in layers, including the input layer, the hidden layers, and the output layer. Notice that an FDNN may have many hidden layers; for the considered MIMO-OFDM system, the proposed FDNN structure is designed with three hidden layers, each containing multiple neurons. In particular, a neuron is a computational unit that performs the following calculation:
$$o = f\Big( \sum_{i=1}^{M} w_i x_i + b \Big), \qquad (26)$$
where $M$ is the number of inputs to the neuron, for which $x_i$ is the $i$-th input ($1 \le i \le M$); $w_i$ is the $i$-th weight corresponding to the $i$-th input; $b$ is a bias; and $o$ is the output of this neuron. In Equation (26), $f(\cdot)$ is an activation function that is used to characterize the non-linearity of the channel data. In our proposed FDNN-based channel estimation, we adopt the tanh function as the activation function, which is defined as
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}},$$
where $e$ is Euler's number. To minimize the mean square error, the FDNN-based channel estimation learns the actual channel information by utilizing the channel estimates obtained from LS estimation as the input. In more detail, we define a realization of the input for the training process as
$$\mathbf{x}^{(n)} = \Big[ \Re\big\{\hat{\mathbf{h}}^{\mathrm{LS},(n)}\big\}^T, \, \Im\big\{\hat{\mathbf{h}}^{\mathrm{LS},(n)}\big\}^T \Big]^T \in \mathbb{R}^{2K}, \qquad (28)$$
where $\hat{\mathbf{h}}^{\mathrm{LS},(n)}$ is the LS-estimated channel gathered from all receive antennas, with the superscript $n$ denoting the $n$-th realization; $K$ is the number of channel samples that the FDNN can handle; and the $\Re\{\cdot\}$ and $\Im\{\cdot\}$ operators give the real and imaginary parts of a complex number, respectively. The output of the neural network is formulated as
$$\mathbf{o}^{(n)} = \Big[ \Re\big\{\hat{\mathbf{h}}^{(n)}\big\}^T, \, \Im\big\{\hat{\mathbf{h}}^{(n)}\big\}^T \Big]^T, \qquad (29)$$
where $\mathbf{o}^{(n)}$ is the output of the neural network at the $n$-th realization. In Equations (28) and (29), we separate the channel estimates into their real and imaginary parts so that the FDNN can handle the complex numbers. The learning process handles the one-by-one mapping as
$$\mathbf{x}^{(n)} \mapsto \mathbf{o}^{(n)}.$$
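As a small illustration of this real/imaginary stacking in Equations (28) and (29), the following NumPy sketch converts between complex channel vectors and their real-valued representations; the helper names are hypothetical.

```python
import numpy as np

def to_real_vector(h):
    """Stack the real and imaginary parts of a complex channel vector,
    as in Equations (28) and (29). h: (K,) complex -> (2K,) real."""
    return np.concatenate([h.real, h.imag])

def to_complex_vector(v):
    """Inverse mapping: (2K,) real -> (K,) complex channel vector."""
    K = v.shape[0] // 2
    return v[:K] + 1j * v[K:]
```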
As desired, the output of the neural network should be identical to the actual channels. In other words, the purpose of the FDNN-aided estimation is to minimize the MSE between the predicted and actual channels on average; thus, the loss function utilized for the training phase is defined as
$$\mathcal{L}(\mathbf{W}, \mathbf{b}) = \frac{1}{N} \sum_{n=1}^{N} \big\| \mathbf{o}^{(n)} - \mathbf{h}^{(n)} \big\|^2, \qquad (31)$$
where $N$ is the number of realizations used for training, and $\mathbf{h}^{(n)}$ is the actual channel corresponding to $\mathbf{x}^{(n)}$; $\mathbf{W}$ and $\mathbf{b}$ include all the weights and biases, respectively. From a set of initial values, the weights and biases are updated by minimizing the loss function (31) with forward and backward propagation [15].
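To make the training procedure concrete, here is a minimal PyTorch sketch of an FDNN with three tanh hidden layers trained under the MSE loss in (31); the layer widths, learning rate, and dimension K are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

K = 64                      # channel samples per realization (assumption)
widths = [512, 512, 512]    # neurons per hidden layer (assumption)

# Input and output are the stacked real/imag vectors of length 2K.
layers, in_dim = [], 2 * K
for w in widths:
    layers += [nn.Linear(in_dim, w), nn.Tanh()]
    in_dim = w
layers.append(nn.Linear(in_dim, 2 * K))   # linear output layer
fdnn = nn.Sequential(*layers)

loss_fn = nn.MSELoss()                     # realizes the loss in (31)
optimizer = torch.optim.Adam(fdnn.parameters(), lr=1e-3)

def train_step(x, h_true):
    """x, h_true: (N, 2K) batches of stacked real/imag vectors."""
    optimizer.zero_grad()
    loss = loss_fn(fdnn(x), h_true)
    loss.backward()                        # backward propagation
    optimizer.step()                       # weight/bias update
    return loss.item()
```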
3.3. Convolutional Neural Network-Based Channel Estimation
CNN models have been proposed for image denoising and have been well studied by the image processing community. CNN models can be applied to learn the mapping from noisy images to clean images [39,40], therefore mitigating the noise in the images. In addition, due to the sharing of weights and biases, a CNN can reduce the number of parameters, which reduces the complexity of the system. Based on these ideas, we can use a CNN to learn the mapping from the noisy channels obtained by an LS estimator to the true channels. The structure of the proposed CNN-aided estimation is shown in Figure 6. As depicted in the figure, the proposed CNN consists of a 2D input layer, convolution layers, activation layers, and a linear layer. The 2D input layer takes the LS-estimated channel as the input, which is separated into its real and imaginary parts and reshaped into a 2D matrix form. The channel matrix is then fed to the convolution layers. We denote by $\mathcal{L}_c$ the set of convolution layers of the CNN. Each convolution layer $l \in \mathcal{L}_c$ includes $n_l$ convolution kernels of size $s_l \times s_l$ that are convolved with the layer input $\mathbf{I}_{l-1}$, where $a_l$ and $b_l$ are the sizes of the $l$-th convolution layer. The output of the $l$-th convolution layer is
$$\mathbf{I}_l = f\big( \mathbf{W}_l \otimes \mathbf{I}_{l-1} + \mathbf{b}_l \big),$$
where $\mathbf{W}_l$ and $\mathbf{b}_l$ are the weights and biases of the convolution kernels for the $l$-th convolution layer, respectively, and ⊗ is the convolution operator. For the proposed CNN model, after each convolution layer, we apply the well-known rectified linear unit (ReLU) activation layer, which is given as
$$f(x) = \max(0, x).$$
In particular, to train the CNN model, we first reshape the LS-estimated channel from all antennas into the matrix form $\hat{\mathbf{H}}^{\mathrm{LS}}$, separate it into its real and imaginary parts, and then define a realization of the input for the training process as
$$\mathbf{X}^{(n)} = \Big[ \Re\big\{\hat{\mathbf{H}}^{\mathrm{LS},(n)}\big\}, \, \Im\big\{\hat{\mathbf{H}}^{\mathrm{LS},(n)}\big\} \Big].$$
In a similar manner, the corresponding output of the CNN is formulated as
$$\mathbf{O}^{(n)} = \Big[ \Re\big\{\hat{\mathbf{H}}^{(n)}\big\}, \, \Im\big\{\hat{\mathbf{H}}^{(n)}\big\} \Big],$$
which contains the real and imaginary matrices of the channel estimates. The CNN model is trained to handle the following matrix mapping:
$$\mathbf{X}^{(n)} \mapsto \mathbf{O}^{(n)}.$$
The purpose of applying the CNN model is to minimize the mean square error between the estimated and the true channels. Therefore, we use the loss function defined as follows:
$$\mathcal{L}(\mathbf{W}, \mathbf{b}) = \frac{1}{N} \sum_{n=1}^{N} \big\| \mathbf{O}^{(n)} - \mathbf{H}^{(n)} \big\|_F^2, \qquad (37)$$
where $N$ is the number of realizations used for training, and $\mathbf{H}^{(n)}$ is the actual channel in matrix form corresponding to $\mathbf{X}^{(n)}$; $\mathbf{W}$ and $\mathbf{b}$ include all the weights and biases, respectively. During the training process, the weights and biases of the CNN are updated by minimizing the loss function (37). We stress that the loss function (37) shares the same training data as that in Equation (31), but the fine structure is different. Specifically, the instantaneous channels are stacked in vector form in Equation (31), while they are arranged in matrix form in Equation (37) to make use of the benefits of the CNN.
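A minimal PyTorch sketch of such a denoising CNN is given below: the input is the LS estimate arranged as a two-channel image (real and imaginary planes), and stacked convolution + ReLU layers map it back to two denoised planes; the number of layers, kernel counts, and kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Two input channels (real/imag planes of the LS estimate), two output
# channels (real/imag planes of the denoised channel estimate).
cnn = nn.Sequential(
    nn.Conv2d(2, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 2, kernel_size=3, padding=1),
)

def ls_to_input(h):
    """h: (N, H, W) complex channel matrices -> (N, 2, H, W) real tensor."""
    return torch.stack([h.real, h.imag], dim=1).float()

# Training uses the Frobenius-norm loss in (37), i.e., MSE over the planes:
# loss = nn.MSELoss()(cnn(ls_to_input(h_ls)), ls_to_input(h_true))
```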
3.4. Long Short-Term Memory-Based Channel Estimation
In the two previous subsections, we proposed two deep learning-based channel estimation methods: the FDNN-based and CNN-based channel estimation approaches. However, these two methods cannot exploit the long-term correlation of the channels, and thus they cannot reach the optimal performance in general. To address this issue, one good choice is to apply a neural network that is able to learn the behaviors of the channel correlations, such as a recurrent neural network (RNN). The simple structure of a one-layer RNN is given in Figure 7. As we can see from this figure, the input of the RNN cell at the current time step includes the output of the RNN cell at the previous time step. Working in this way, the RNN can remember past information about the input. The basic RNN cell is a computation unit that performs the following calculation [41]:
$$\mathbf{h}_t = f\big( \mathbf{W}_{xh} \mathbf{x}_t + \mathbf{b}_{xh} + \mathbf{W}_{hh} \mathbf{h}_{t-1} + \mathbf{b}_{hh} \big),$$
$$\mathbf{o}_t = f\big( \mathbf{W}_{ho} \mathbf{h}_t + \mathbf{b}_{ho} \big),$$
where $f(\cdot)$ is the activation function; $\mathbf{h}_t$ and $\mathbf{h}_{t-1}$ are the hidden states at the time steps $t$ and $t-1$, respectively; $\mathbf{x}_t$ and $\mathbf{o}_t$ are the input and the output at the time step $t$; $\mathbf{W}_{xh}$, $\mathbf{W}_{hh}$, and $\mathbf{W}_{ho}$ are the weights for the input layer to the hidden layer, the hidden layer to the next hidden layer, and the hidden layer to the output layer, respectively; and $\mathbf{b}_{xh}$, $\mathbf{b}_{hh}$, and $\mathbf{b}_{ho}$ are the corresponding biases.
However, the simple RNN cell has several weaknesses. First, it cannot exploit future information in the data, while the channel at time step $t$ is related not only to the past but also to the future; thus, a bidirectional network should be used in this case to obtain better performance. Second, a simple RNN cell cannot capture long-term information. One solution to this problem is to use an LSTM instead. Consequently, in this paper, we propose a bidirectional long short-term memory (bi-LSTM) network for 5G channel estimation to overcome the above-mentioned weaknesses.
The structure of the proposed bi-LSTM network for channel estimation is illustrated in Figure 8. In the bi-LSTM structure, the simple RNN cell is replaced by the corresponding LSTM cell, whose structure is shown at the top of Figure 8. The computation of the LSTM cell is given by the following equations [41]:
$$\mathbf{f}_t = \sigma\big( \mathbf{W}_{xf} \mathbf{x}_t + \mathbf{W}_{hf} \mathbf{h}_{t-1} + \mathbf{b}_f \big),$$
$$\mathbf{i}_t = \sigma\big( \mathbf{W}_{xi} \mathbf{x}_t + \mathbf{W}_{hi} \mathbf{h}_{t-1} + \mathbf{b}_i \big),$$
$$\tilde{\mathbf{c}}_t = \tanh\big( \mathbf{W}_{xc} \mathbf{x}_t + \mathbf{W}_{hc} \mathbf{h}_{t-1} + \mathbf{b}_c \big),$$
$$\mathbf{c}_t = \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \tilde{\mathbf{c}}_t,$$
$$\mathbf{o}_t = \sigma\big( \mathbf{W}_{xo} \mathbf{x}_t + \mathbf{W}_{ho} \mathbf{h}_{t-1} + \mathbf{b}_o \big),$$
$$\mathbf{h}_t = \mathbf{o}_t \odot \tanh(\mathbf{c}_t),$$
where $\tanh(\cdot)$ is the hyperbolic tangent function, $\sigma(\cdot)$ is the sigmoid function, and $\mathbf{W}_{xf}$, $\mathbf{W}_{hf}$, $\mathbf{W}_{xi}$, $\mathbf{W}_{hi}$, $\mathbf{W}_{xc}$, $\mathbf{W}_{hc}$, $\mathbf{W}_{xo}$, $\mathbf{W}_{ho}$, $\mathbf{b}_f$, $\mathbf{b}_i$, $\mathbf{b}_c$, and $\mathbf{b}_o$ are correspondingly the weight matrices and biases. The forget gate $\mathbf{f}_t$ defines which information will be forgotten by the LSTM cell; $\mathbf{c}_t$ is the cell state that contains the important information from the past; $\tilde{\mathbf{c}}_t$ is a new candidate value that defines which information will be updated to the cell state $\mathbf{c}_t$; and $\mathbf{h}_t$ is the hidden state of the LSTM cell. By working in this way, the LSTM cell can capture the important information from the past and avoid the redundant information, thus providing a greater ability to capture long-term dependencies than the simple RNN cell. The bottom of Figure 8 shows the structure of the bi-LSTM network. As we can see, the bi-LSTM approach is the combination of two LSTM networks operating in two different directions. The output of the bi-LSTM takes the outputs of the two LSTM cells into consideration via the linear layer as
$$\hat{\mathbf{y}}_t = \mathbf{W}_{y} \mathbf{h}_t + \mathbf{b}_{y},$$
where $\mathbf{h}_t = [\overrightarrow{\mathbf{h}}_t; \overleftarrow{\mathbf{h}}_t]$ is the hidden state concatenated from the forward hidden state $\overrightarrow{\mathbf{h}}_t$ and the backward hidden state $\overleftarrow{\mathbf{h}}_t$, and $\mathbf{W}_{y}$ and $\mathbf{b}_{y}$ are the weights and biases of the linear layer, respectively. Therefore, the bi-LSTM approach can exploit the relation of both the past and the future to the data at the current time step. To apply the bi-LSTM model to our system, we first gather the LS-estimated channels from all antennas and then define a realization of the input for the training process as
$$\mathbf{x}_t^{(n)} = \Big[ \Re\big\{\hat{\mathbf{h}}_t^{\mathrm{LS},(n)}\big\}^T, \, \Im\big\{\hat{\mathbf{h}}_t^{\mathrm{LS},(n)}\big\}^T \Big]^T, \quad t = 1, \ldots, L,$$
where $L$ is the sequence length considered for the bi-LSTM network. Note that the input of the bi-LSTM, $\mathbf{x}_t^{(n)}$, is the LS-estimated channel for all $N_t N_r$ channel streams, so the number of features of the input is $2 N_t N_r$. The output of the bi-LSTM network is the corresponding true channel as
$$\mathbf{y}_t^{(n)} = \Big[ \Re\big\{\mathbf{h}_t^{(n)}\big\}^T, \, \Im\big\{\mathbf{h}_t^{(n)}\big\}^T \Big]^T.$$
The purpose of using a bi-LSTM network is to minimize the MSE between the predicted channel and the true channel; thus, the MSE loss function is considered. The objective function of the bi-LSTM network is expressed as
$$\mathcal{L}(\mathbf{W}, \mathbf{b}) = \frac{1}{N} \sum_{n=1}^{N} \sum_{t=1}^{L} \big\| \hat{\mathbf{y}}_t^{(n)} - \mathbf{y}_t^{(n)} \big\|^2,$$
where $\mathbf{y}_t^{(n)}$ is the true channel corresponding to $\mathbf{x}_t^{(n)}$; $\mathbf{W}$ and $\mathbf{b}$ are all the weights and biases of the bi-LSTM; $N$ is the total number of training samples; and the superscript $n$ denotes the $n$-th training sample. The loss function can be minimized by updating $\mathbf{W}$ and $\mathbf{b}$ using gradient descent algorithms. We note that this paper considers the perfect instantaneous channels to be available for the training stage, and we therefore emphasize imperfect channel state information as a potential extension of our work in the future.
Remark 1. The deep learning-based channel estimation framework studied in this paper is based on the assumption that perfect CSI is available during the training stage. Such information can be estimated very accurately by orthogonal pilot signals with a sufficiently large power budget. Even though these conditions on the pilot signals increase the cost of the training stage, they allow the neural networks to learn the channel profile properly. The effects of imperfect channels on the training of neural networks, along with the consequent performance reduction in the testing stage, are of practical interest and are left for future work.
3.5. Computational Complexity
In this section, the complexity of the three deep learning models proposed to assist the channel estimation phase is analyzed by utilizing the big-$\mathcal{O}$ notation, which is a common method to describe the complexity of deep learning-based channel estimation. The computational complexity of the proposed models involves two main parts: offline training and online prediction. The complexity analysis of offline training is still an open problem due to the complex implementation of the back-propagation process; however, we assume that the cost of offline training is affordable since it is an offline process [42]. Therefore, we concentrate only on the complexity of the online prediction phase. The number of arithmetic operations with the dominant costs is used as the metric to obtain the computational complexity order [7].
For the FDNN-based channel estimation, from (26), we can see that if the model has $H$ hidden layers, the total number of arithmetic operations has a computational complexity in the order of
$$\mathcal{O}\Big( I N_1 + \sum_{i=1}^{H-1} N_i N_{i+1} + N_H K \Big), \qquad (50)$$
where $I$, $K$, and $N_i$ denote the input size, the output size, and the number of neurons in the $i$-th hidden layer, respectively. For one OFDM symbol, the input and output sizes are chosen to be equal to the number of real-valued channel features, i.e., $I = K$. By using (50), the FDNN model has a complexity in the order of
$$\mathcal{O}\Big( K N_1 + \sum_{i=1}^{H-1} N_i N_{i+1} + N_H K \Big).$$
We now investigate the computational complexity of the CNN-based channel estimation. Given that there are $n_l$ kernels of size $s_l \times s_l$ in the $l$-th convolution layer, the number of multiplications for the $l$-th convolution layer is $n_{l-1} s_l^2 n_l a_l b_l$, where $a_l$ and $b_l$ are the sizes of the $l$-th layer. Therefore, the complexity of all the convolution layers is $\mathcal{O}\big( \sum_{l} n_{l-1} s_l^2 n_l a_l b_l \big)$ [43]. The number of multiplications for the linear layer equals the product of its input and output sizes. Since, for one OFDM symbol, the sizes of the convolution layers and the linear layer are fixed by the number of channel features, the total number of multiplications required by the CNN model is in the order of
$$\mathcal{O}\Big( \sum_{l} n_{l-1} s_l^2 n_l a_l b_l \Big).$$
For the bi-LSTM network, it is well known that the computational complexity of a bi-LSTM cell per time step is in the order of
$$\mathcal{O}\big( d \left( 4 I C + 4 C^2 + 3 C + C K \right) \big)$$
[44], where $d$ is the bidirectional flag ($d = 2$ for bi-LSTM). The notations $I$, $C$, and $K$ denote the input size, the number of memory cells, and the output size, respectively. As mentioned before, the input and output of the bi-LSTM network include $2 N_t N_r$ features. The sequence length for one OFDM symbol can be chosen as $L$. Therefore, the computational complexity of the bi-LSTM network is in the order of
$$\mathcal{O}\big( 2 L \left( 4 I C + 4 C^2 + 3 C + C K \right) \big).$$
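As a sanity check on these orders, the small Python sketch below counts the dominant multiplications of each model for one OFDM symbol; all sizes are illustrative assumptions, not the paper's simulation settings.

```python
def fdnn_ops(I, hidden, K):
    """Dominant multiplications of an FDNN I -> hidden -> K, as in (50)."""
    dims = [I] + hidden + [K]
    return sum(a * b for a, b in zip(dims, dims[1:]))

def cnn_ops(channels, s, a, b):
    """Convolution-layer multiplications: sum of n_{l-1} s^2 n_l a b,
    with kernel size s and spatial size (a, b) kept constant by padding."""
    return sum(c_in * s * s * c_out * a * b
               for c_in, c_out in zip(channels, channels[1:]))

def bilstm_ops(I, C, K, L, d=2):
    """Per-step cost 4IC + 4C^2 + 3C + CK, over L steps and d directions."""
    return d * L * (4 * I * C + 4 * C * C + 3 * C + C * K)

# Illustrative sizes only:
print(fdnn_ops(128, [512, 512, 512], 128))
print(cnn_ops([2, 32, 32, 2], 3, 16, 8))
print(bilstm_ops(16, 128, 16, 64))
```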