2.1. Random over-sampling (ROS)
For an NN classification algorithm, class imbalance occurs when some classes contain far more samples than others. Here, we define the classes with more samples as majority classes and those with fewer samples as minority classes. For long-range high-speed wireless channels impaired by nonlinearity, phase noise, and high loss, it is difficult to learn from minority-class samples in the presence of serious class overlapping. The main function of the random over-sampling (ROS) algorithm is to overcome this imbalance problem by redistributing the training dataset. In our work, we employ the ROS technique on the Rx side of the W-band PS-16QAM RoF delivery. The basic principle of ROS is shown in Figure 1. Equalization of an N-order modulated signal can be regarded as N-class classification. The PS technique introduces a severe skew into the class distribution of the baseband PS-16QAM data: constellation points on the inner rings form majority classes, while the few points on the outer rings form minority classes, which results in the original imbalanced dataset shown in Figure 1(a). The core idea of the ROS algorithm is to randomly extract samples from the minority classes and replicate them multiple times in order to balance the class distribution of the training set [32,33]. The procedure of the classic ROS algorithm for generating a balanced training set is described as follows:
Step 1. For the original training dataset, the data size of the smallest minority class is counted as $N_{\min}$, the data size of the biggest majority class is written as $N_{\max}$, and the sampling rate is $SR$. Here $T_c = SR \times N_{\max}$ is the length of the final training dataset for every class, and the imbalance ratio is $IR = N_{\max}/N_{\min}$.
Step 2. For one specific minority class and the biggest majority class, all their original samples construct the original minority-class training set $S_{\min}$ and majority-class training set $S_{\max}$, respectively.
Step 3. We randomly choose $T_c - N_{\min}$ indices from $\{1, 2, \ldots, N_{\min}\}$ with replacement and take the corresponding sample set $S_{rep}$ from $S_{\min}$.
Step 4. Add the selected samples $S_{rep}$ to the minority-class set $S_{\min}$.
Step 5. Then the generated training set $S'_{\min} = S_{\min} \cup S_{rep}$ of length $T_c$ is obtained.
Step 6. We repeat the operations above for all minority classes to obtain a sufficient number of samples, while for the majority classes we extract only a subset of $T_c$ samples from the imbalanced dataset. Finally, the balanced dataset is generated.
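As a concrete illustration of Steps 1–6, a minimal NumPy sketch of this balancing procedure is given below; the function name `random_oversample` and its arguments are illustrative choices rather than part of our experimental code.

```python
import numpy as np

def random_oversample(samples, labels, sr=1.0, seed=None):
    """Classic ROS: resample every class to the same length T_c = SR * N_max.

    Minority classes keep all their original samples and add random
    replications (Steps 3-5); majority classes are randomly reduced to a
    subset of T_c samples (Step 6).
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = int(sr * counts.max())                  # T_c = SR * N_max

    bal_x, bal_y = [], []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        if len(idx) < target:
            # minority class: replicate randomly chosen samples
            extra = rng.choice(idx, size=target - len(idx), replace=True)
            chosen = np.concatenate([idx, extra])
        else:
            # majority class: extract only a subset
            chosen = rng.choice(idx, size=target, replace=False)
        bal_x.append(samples[chosen])
        bal_y.append(labels[chosen])

    return np.concatenate(bal_x), np.concatenate(bal_y)
```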
2.2. Two-lane DNN (TLD) Equalizer
A two-lane DNN (TLD) equalizer is employed in our experiment to mitigate the nonlinear impairment. The name “two-lane DNN” indicates that the equalizer is composed of two DNNs: since the modulated 16QAM signal is a complex signal, one DNN processes the real (I) signal sequence and the other processes the imaginary (Q) signal sequence.
Figure 2 shows the schematic diagram of the TLD equalizer combined with ROS.
In the training process, our proposed TLD equalizer is trained in two steps. In the first step, the training dataset is randomly over-sampled to generate balanced 16QAM signals. In the second step, the equalizer is trained on the balanced 16QAM training sequence, and the weight values of the TLD equalizer are optimized until the target error value is reached. In the testing process, by contrast, 30% of the original received PS-16QAM signal, used as the testing dataset, is fed directly into the well-trained TLD neural network, and the BER decision is then made on the equalized testing signals. It is therefore worth noting that ROS is applied only to the training data, and the TLD network is trained on the balanced training data.
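Assuming a labelled set of received symbols, the 70/30 split and training-only balancing described above could be sketched as follows; the synthetic PS-16QAM data generation is purely illustrative, and `random_oversample` is the helper sketched in Section 2.1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-in for the received baseband PS-16QAM symbols:
# inner-ring constellation points are drawn more often than outer-ring
# ones, mimicking the probabilistically shaped (skewed) class distribution.
levels = np.array([-3.0, -1.0, 1.0, 3.0])
points = np.array([i + 1j * q for i in levels for q in levels])
probs = np.exp(-0.3 * np.abs(points) ** 2)
probs /= probs.sum()
labels = rng.choice(16, size=20000, p=probs)
symbols = points[labels] + 0.1 * (rng.standard_normal(20000)
                                  + 1j * rng.standard_normal(20000))

# 70% of the data forms the training set; the remaining 30% is the test set.
split = int(0.7 * len(symbols))
train_x, train_y = symbols[:split], labels[:split]
test_x, test_y = symbols[split:], labels[split:]

# ROS is applied to the training portion only; the untouched test set is
# later fed to the trained TLD network for the final BER decision.
train_x_bal, train_y_bal = random_oversample(train_x, train_y, sr=1.0)
```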
In general, an NN is made up of an input layer, several hidden layers, and an output layer. Our proposed TLD equalizer has one input layer, $L$ (set as 2 in our experiment) hidden layers, and one output layer. $w_{I,mj}^{(k)}$ and $w_{Q,mj}^{(k)}$ are the weight values of the two lanes, respectively, which link adjacent layers of the network, where the subscript $I$ or $Q$ denotes the I or Q lane of signals the DNN deals with, $k$ represents the current $k$-th layer ($0$-th for the input layer and $3$-rd for the output layer), and $m$ and $j$ represent the $m$-th node in the former layer and the $j$-th node in the current layer, respectively. Firstly, the TLD equalizer is initialized by randomizing the weight values $w_{I,mj}^{(k)}$ and $w_{Q,mj}^{(k)}$ and setting the learning rate and the number of training epochs. Secondly, the over-sampled PS-16QAM dataset is separated into I-lane and Q-lane vectors, which are then sent to the input layer with a length of $N_0$. As shown in Figure 2, the input vectors can be written as $\mathbf{x}_I = [x_{I,1}, \ldots, x_{I,N_0}]$ and $\mathbf{x}_Q = [x_{Q,1}, \ldots, x_{Q,N_0}]$, and are then multiplied by the weight values $w_{I,mj}^{(1)}$ and $w_{Q,mj}^{(1)}$ of the first hidden layer, respectively. Because the neurons between layers are fully connected, the output of the $j$-th neuron in the first hidden layer can be described for the two lanes as
$$y_{I,j}^{(1)} = f\!\left(\sum_{m=1}^{N_0} w_{I,mj}^{(1)} x_{I,m}\right), \qquad y_{Q,j}^{(1)} = f\!\left(\sum_{m=1}^{N_0} w_{Q,mj}^{(1)} x_{Q,m}\right),$$
where $N_i$ denotes the number of nodes in the $i$-th layer, with $N_0$ and $N_3$ defining the numbers of nodes in the input layer and the output layer, respectively. Based on the feed-forward training process, the output of the $l$-th neuron in the second hidden layer can be calculated for the two lanes as
$$y_{I,l}^{(2)} = f\!\left(\sum_{m=1}^{N_1} w_{I,ml}^{(2)} y_{I,m}^{(1)}\right), \qquad y_{Q,l}^{(2)} = f\!\left(\sum_{m=1}^{N_1} w_{Q,ml}^{(2)} y_{Q,m}^{(1)}\right).$$
Here, as mentioned above, $L$ is set to 2, and $N_1$ is the number of nodes in the first hidden layer. Taking $w_{I,ml}^{(2)}$ in the DNN dealing with I-lane signals as an example, it represents the weight in the second hidden layer that connects the $m$-th node of the $(L-1)$-th hidden layer to the $l$-th node of the current layer. $y_{I,m}^{(L)}$ and $y_{Q,m}^{(L)}$ denote the outputs of the $m$-th neuron in the $L$-th hidden layer for the I-lane DNN and the Q-lane DNN, respectively. It is worth noting that $f(\cdot)$ denotes the nonlinear activation function between the hidden layers. Although DNNs commonly use activation functions such as “sigmoid” and “tanh”, we chose the “ReLU” function to avoid gradient explosion and gradient vanishing; it can be described as
$$f(x) = \max(0, x).$$
Meanwhile, the transformation from the last hidden layer to the output layer is linear, in contrast to the nonlinear transformations of the preceding layers. We therefore chose the linear “purelin” function as the activation function $g(\cdot)$ of the output layer, and the final equalized output is given as
$$\hat{y}_{I,j} = g\!\left(\sum_{m=1}^{N_2} w_{I,mj}^{(3)} y_{I,m}^{(L)}\right), \qquad \hat{y}_{Q,j} = g\!\left(\sum_{m=1}^{N_2} w_{Q,mj}^{(3)} y_{Q,m}^{(L)}\right).$$
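A compact sketch of this two-lane feed-forward pass is given below (NumPy, without bias terms, following the weight-only formulation above); the layer sizes and the random weight initialization are illustrative assumptions, not the experimental settings.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)            # f(x) = max(0, x)

def tld_forward(x_i, x_q, weights_i, weights_q):
    """Feed-forward pass of the two-lane DNN.

    x_i, x_q             : I-lane and Q-lane input vectors (length N_0)
    weights_i, weights_q : lists of weight matrices [W1, W2, W3] for the
                           two hidden layers and the linear output layer
    """
    y_i, y_q = x_i, x_q
    # two ReLU hidden layers (L = 2)
    for w_i, w_q in zip(weights_i[:-1], weights_q[:-1]):
        y_i = relu(y_i @ w_i)
        y_q = relu(y_q @ w_q)
    # linear ("purelin") output layer: g(x) = x
    return y_i @ weights_i[-1], y_q @ weights_q[-1]

# Illustrative layer sizes: N_0 = 31 input taps, 64/32 hidden nodes,
# and one output node per lane (the equalized I or Q amplitude).
rng = np.random.default_rng(0)
sizes = [31, 64, 32, 1]
w_i = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(sizes, sizes[1:])]
w_q = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(sizes, sizes[1:])]
out_i, out_q = tld_forward(rng.standard_normal(31), rng.standard_normal(31),
                           w_i, w_q)
```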
We note that the weight values are adaptively updated based on the least mean squares (LMS) error function, which can be given as
$$E = \frac{1}{T}\sum_{t=1}^{T}\bigl(d(t) - \hat{y}(t)\bigr)^{2},$$
where $T$ refers to the length of the training dataset. The predetermined expected output value $d(t)$ is subtracted from the obtained output $\hat{y}(t)$ to get the error value $e(t) = \hat{y}(t) - d(t)$, which is fed back to the TLD equalizer to participate in the calculation. With the aid of the back-propagation (BP) algorithm, the weight values $w_{I,mj}^{(k)}$ and $w_{Q,mj}^{(k)}$ are updated constantly until the preset epoch number or target error value is reached. The iterative process can be given as
$$w_{I,mj}^{(k)} \leftarrow w_{I,mj}^{(k)} - \eta \nabla_{w_{I,mj}^{(k)}} E, \qquad w_{Q,mj}^{(k)} \leftarrow w_{Q,mj}^{(k)} - \eta \nabla_{w_{Q,mj}^{(k)}} E.$$
Here, $\eta$ denotes the learning rate and the symbol $\nabla$ refers to the calculation of the gradients. In our proposed TLD-ROS equalization scheme, the original received PS-16QAM signal is first processed by the deployed ROS, and the target output is modified correspondingly. Finally, the balanced (uniformly distributed) 16QAM data are put into the TLD classifier to obtain the optimum network parameters.
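To make the update rule concrete, the following sketch implements the feed-forward/BP iteration for one lane with the LMS error above; the learning rate, epoch count, and layer sizes are illustrative assumptions rather than the experimental settings.

```python
import numpy as np

def train_lane(x, d, sizes=(31, 64, 32, 1), eta=1e-3, epochs=50, seed=0):
    """Train one lane of the TLD equalizer with the LMS error and BP.

    x : (T, N_0) matrix of input vectors for this lane
    d : (T, N_3) matrix of expected (target) outputs
    """
    rng = np.random.default_rng(seed)
    w = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(sizes, sizes[1:])]

    for _ in range(epochs):
        # forward pass: two ReLU hidden layers, linear output
        a1 = np.maximum(0.0, x @ w[0])
        a2 = np.maximum(0.0, a1 @ w[1])
        y = a2 @ w[2]

        # gradients of E = (1/T) * sum((d - y)^2) via back-propagation
        delta3 = 2.0 * (y - d) / len(x)          # dE/dy
        g2 = a2.T @ delta3                        # dE/dW3
        delta2 = (delta3 @ w[2].T) * (a2 > 0)     # back through ReLU
        g1 = a1.T @ delta2                        # dE/dW2
        delta1 = (delta2 @ w[1].T) * (a1 > 0)
        g0 = x.T @ delta1                         # dE/dW1

        # iterative update: W <- W - eta * grad_W(E)
        for wi, gi in zip(w, (g0, g1, g2)):
            wi -= eta * gi
    return w
```

The Q-lane weights would be updated in the same way, and in practice the loop stops once the preset epoch count or target error value is reached.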