3.1. The Principle of Support Vector Regression
The concept of the support vector machine (SVM) was first proposed by Cortes and Vapnik on the basis of statistical learning theory, and it has been successfully applied in the fields of classification and regression. SVM replaces the traditional empirical risk minimization principle with structural risk minimization, so it has good generalization ability. When the SVM deals with nonlinear problems, it first transforms the nonlinear problem into a linear problem in a high-dimensional space, and then uses the kernel function to avoid the inner product operation in the high-dimensional space, effectively overcoming the curse of dimensionality and the problem of local minima. When SVM is used to solve regression problems, it is also called Support Vector Regression (SVR) [40,41]. The principle of support vector regression is introduced below.
Suppose there is a sample set $\{(x_i,y_i)\}_{i=1}^{l}$, where $x_i$ and $y_i$ are the input and output of the training set. In the high-dimensional feature space, there exists a linear function $f(x)$ that can establish the nonlinear relationship between $x_i$ and $y_i$. Such a fitting function is expressed as Equation (13),
where $\varphi(x)$ is a nonlinear transformation that maps the input space to a high-dimensional feature space (Hilbert space), in which the approximation can conveniently be carried out linearly, and $w$ and $b$ are the parameters to be determined. Different from the traditional regression model, SVR is committed to minimizing both the empirical risk and the structural risk. The objective function and constraints of the SVR model are given as Equation (14),
where $l$ is the number of training samples, $\xi_i$ and $\xi_i^{*}$ are non-negative slack variables used to measure the deviation from the $\varepsilon$-insensitive boundary, $\frac{1}{2}\|w\|^{2}$ is the structural risk term used to prevent overtraining, and $C$ is the boundary coefficient that balances the structural risk and the empirical risk; the choice of its value directly affects the generalization ability of the model. The basic principle of SVR can be seen in Figure 2.
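For reference, the widely used form of this optimization problem, written with the symbols defined above, is sketched below; Equation (14) presumably takes this or an equivalent form, and the fitting function of Equation (13) is correspondingly $f(x)=w^{T}\varphi(x)+b$.

$\min_{w,b,\xi_i,\xi_i^{*}} \ \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{l}\left(\xi_i+\xi_i^{*}\right)$, subject to $y_i-w^{T}\varphi(x_i)-b\le\varepsilon+\xi_i$, $w^{T}\varphi(x_i)+b-y_i\le\varepsilon+\xi_i^{*}$, and $\xi_i,\xi_i^{*}\ge 0$ for $i=1,\ldots,l$.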
In Figure 3, $\varepsilon$ is the insensitive loss function parameter. If the error between the regression prediction result and the training sample is less than $\varepsilon$, no error penalty is generated; only the sample points lying outside the $\varepsilon$-insensitive band, i.e., with non-zero slack variables, influence the minimization objective.
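The underlying $\varepsilon$-insensitive loss is commonly written as

$L_{\varepsilon}\big(y_i,f(x_i)\big)=\max\big(0,\ |y_i-f(x_i)|-\varepsilon\big),$

so that deviations smaller than $\varepsilon$ incur no penalty, while larger deviations are penalized through the slack variables.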
By introducing the Lagrange multiplier method, the problem is converted into the dual space for solution:
In Equation (15), $\alpha_i$ and $\alpha_i^{*}$ are the Lagrange multipliers. According to the Karush-Kuhn-Tucker (KKT) conditions, the expression of the final regression function at the optimum can be solved and organized as Equation (16). A kernel function $K(x_i,x_j)=\varphi(x_i)^{T}\varphi(x_j)$ that satisfies the Mercer condition is then introduced, and Equation (16) becomes Equation (17).
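For reference, the regression function obtained from the dual solution is commonly written, before and after introducing the kernel, as

$f(x)=\sum_{i=1}^{l}\left(\alpha_i-\alpha_i^{*}\right)\varphi(x_i)^{T}\varphi(x)+b \quad\Longrightarrow\quad f(x)=\sum_{i=1}^{l}\left(\alpha_i-\alpha_i^{*}\right)K(x_i,x)+b,$

which is presumably the content of Equations (16) and (17).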
Different kernel functions construct different support vector machines. Compared with other kernel functions, the Radial Basis Function (RBF) has few parameters and a low computational cost. In order to improve the computational efficiency of real-time compensation, this paper selects the RBF as the kernel function of the SVR; Equation (18) gives the expression of this function.
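The RBF kernel is commonly written as $K(x_i,x_j)=\exp\big(-\|x_i-x_j\|^{2}/(2\sigma^{2})\big)$, or equivalently with a width parameter $\gamma$; Equation (18) presumably takes one of these forms. As a purely illustrative sketch (not the implementation used in this paper), an $\varepsilon$-SVR with an RBF kernel can be fitted as follows, where the hyperparameters C, epsilon, and gamma are placeholder values.

```python
# Illustrative sketch only: epsilon-SVR with an RBF kernel using scikit-learn.
# C, epsilon, and gamma are placeholder hyperparameters, not values from the paper.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))                  # training inputs x_i
y = np.sinc(X).ravel() + 0.05 * rng.standard_normal(200)   # training outputs y_i

model = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma=0.5)
model.fit(X, y)                    # solves the dual problem of Equation (15) internally
y_fit = model.predict(X)           # f(x) = sum_i (a_i - a_i*) K(x_i, x) + b
print("support vectors:", model.support_.size,
      "RMSE:", np.sqrt(np.mean((y - y_fit) ** 2)))
```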
3.2. Principle of Adaptive Filter Based on Variational Bayesian Theory
Firstly, the discrete-time linear stochastic system obtained from the state-space model of the integrated navigation system needs to be considered.
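In standard form, the system presumably described by Equation (19) can be sketched as

$x_k=F_{k-1}x_{k-1}+w_{k-1},\qquad z_k=H_kx_k+v_k,$

where $x_k$ is the state vector, $z_k$ is the measurement vector, and $F_{k-1}$ and $H_k$ denote the state transition and measurement matrices (names assumed here).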
In Equation (19), $k$ is the discrete time index, and both the process noise $w_{k-1}$ and the measurement noise $v_k$ obey Gaussian distributions with zero mean and covariance matrices $Q_k$ and $R_k$, respectively; the state equation and the measurement equation are both linear. In such a case, the estimation result of the KF is optimal. In practical applications, however, the noise is time-varying, and using a noise matrix that does not satisfy the assumed Gaussian distribution will seriously degrade the filtering results. Therefore, it is necessary to adopt an adaptive filtering strategy to solve such problems. Traditional adaptive filtering aims to estimate the joint posterior distribution of the state and the measurement noise covariance, $p(x_k,R_k\mid z_{1:k})$, where $z_{1:k}$ represents the collection of measurements from the initial time to time $k$. Since such a joint probability density function does not have an exact analytical solution, the conventional variational Bayesian approximation is used to decompose the joint probability density function of the state and the measurement noise covariance as follows
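In its conventional form, this factored (mean-field) approximation reads

$p(x_k,R_k\mid z_{1:k})\approx q(x_k)\,q(R_k),$

which is presumably the content of Equation (20).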
In Equation (20), $q(x_k)$ and $q(R_k)$ are the approximate posterior probability density functions. At its root, however, this assumption relies on a precisely known distribution of the process noise; when the statistical distribution of the process noise is unknown, it can lead to degradation of the filter performance. Therefore, in order to deal with the problem of inaccurate process noise, the prediction error covariance matrix is added to the original method [39], and the above formula can be reconstructed as
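Written out, the reconstructed factorization, presumably the content of Equation (21), is

$p(x_k,P_{k|k-1},R_k\mid z_{1:k})\approx q(x_k)\,q(P_{k|k-1})\,q(R_k).$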
It can be seen from Equation (21) that, after the VB approximation, the posterior joint probability density function $p(x_k,P_{k|k-1},R_k\mid z_{1:k})$ can be decomposed into a product of approximate posterior probability density functions. In this way, the problem of finding the joint probability density function is transformed into the approximation of three probability density functions. As for how to obtain these three approximations, it must be clear that the core idea of the VB approximation is to minimize the Kullback-Leibler divergence (KLD) between the true distribution and the approximation, as expressed in Equation (22) [42].
Equation (23) gives the calculation formula of the KLD.
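For completeness, the KLD between an approximate density $q(x)$ and the true density $p(x)$ is defined as

$\mathrm{KLD}\big(q(x)\,\|\,p(x)\big)=\int q(x)\log\dfrac{q(x)}{p(x)}\,\mathrm{d}x,$

which is the quantity minimized by the VB approximation.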
Once the calculation formula is clear, the optimal solution of the above minimization can be obtained [42].
In Equation (24), $\mathrm{E}[\cdot]$ indicates the expectation operation, $\Theta$ represents the set of the three elements $\{x_k,P_{k|k-1},R_k\}$, $\theta$ denotes any one of the elements in $\Theta$, and $\Theta^{(-\theta)}$ represents all elements in the collection except $\theta$. Since the three approximate posterior probability density functions are coupled to each other, fixed-point iteration is required to solve the above equation; that is, the approximate posterior probability density function of any element in the set is updated to $q^{(i+1)}(\theta)$, where $i$ is the number of fixed-point iterations.
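The optimal factors of such a mean-field approximation are usually expressed in the following form, which Equation (24) presumably follows:

$\log q^{(i+1)}(\theta)=\mathrm{E}_{\Theta^{(-\theta)}}\big[\log p(\Theta,z_{1:k})\big]+c_{\theta},$

where $c_{\theta}$ is a constant with respect to $\theta$ and the expectation is taken over the current approximations of the remaining elements.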
From the analysis of the above formula it can be seen that, in order to obtain the approximate posterior $q(\theta)$, it is necessary to solve the joint probability density function $p(\Theta,z_{1:k})$ first. According to the conditional independence of the hierarchical state-space model, this probability density function can be decomposed as follows
The first two terms of Equation (25) are familiar. Since the traditional Kalman filter is based on Gaussian approximations of $p(x_k\mid z_{1:k-1})$ and $p(z_k\mid x_k)$, the one-step prediction probability density function and the probability density function of the measurement error distribution are both Gaussian, so Equations (26) and (27) hold
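Under the stated Gaussian assumptions, these two densities are commonly written as

$p\big(x_k\mid P_{k|k-1},z_{1:k-1}\big)=\mathrm{N}\big(x_k;\hat{x}_{k|k-1},P_{k|k-1}\big),\qquad p\big(z_k\mid x_k,R_k\big)=\mathrm{N}\big(z_k;H_kx_k,R_k\big),$

where $\hat{x}_{k|k-1}$ is the one-step state prediction and $H_k$ denotes the measurement matrix (names assumed here); Equations (26) and (27) presumably take this form.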
where $\mathrm{N}(x;\mu,\Sigma)$ represents a Gaussian probability density function with mean $\mu$ and covariance matrix $\Sigma$. The true $P_{k|k-1}$ and $R_k$ are not available in the one-step prediction; therefore, it is necessary to select appropriate prior distributions for the measurement noise covariance matrix and the prediction error covariance matrix. In order to ensure that the prior distribution and the posterior distribution have the same functional form, such a prior distribution must be conjugate. The inverse-Wishart (Inv-Wishart) distribution is often used as the conjugate prior of the covariance matrix of a Gaussian distribution; the specific form and the relevant parameters of the Inv-Wishart distribution can be found in reference [43]. Because $P_{k|k-1}$ and $R_k$, as shown in the above equations, are both covariance matrices of Gaussian distributions, their prior distributions are selected as Inv-Wishart distributions.
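In the usual notation, these two priors can be sketched as

$p\big(P_{k|k-1}\mid z_{1:k-1}\big)=\mathrm{IW}\big(P_{k|k-1};\hat{t}_{k|k-1},\hat{T}_{k|k-1}\big),\qquad p\big(R_k\mid z_{1:k-1}\big)=\mathrm{IW}\big(R_k;\hat{u}_{k|k-1},\hat{U}_{k|k-1}\big),$

which is presumably the content of Equations (28) and (29), the hatted quantities being the respective dof parameters and inverse scale matrices.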
In Equations (28) and (29), $\mathrm{IW}(\cdot;u,U)$ represents the Inv-Wishart distribution whose degrees-of-freedom (dof) parameter is $u$ and whose inverse scale matrix is $U$. Referring to Equation (30) for calculating the mean value of the Inv-Wishart distribution in [41], there are
where $n$ is the dimension of the system state equation. Letting $\tau$ be the adjustment parameter, the dof parameter can be described as Equation (31).
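For an $n\times n$ matrix, the mean of $\mathrm{IW}(\cdot;u,U)$ is $U/(u-n-1)$ for $u>n+1$, which is presumably what Equation (30) states. A common choice, matching the description above, is to make the prior mean of the prediction error covariance equal to its nominal value $\tilde{P}_{k|k-1}$ by setting

$\hat{t}_{k|k-1}=n+\tau+1,\qquad \hat{T}_{k|k-1}=\tau\,\tilde{P}_{k|k-1},$

with the adjustment parameter $\tau\ge 0$; Equation (31) presumably expresses this choice.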
According to the Bayesian principle, the prior distribution of the measurement noise covariance matrix can be expressed as Equation (32)
However, the transition model of the measurement noise covariance is difficult to determine exactly. Considering that the measurement noise covariance changes slowly in an actual integrated navigation system, the heuristic method proposed in [30] is used here to spread the previous approximate posterior through a forgetting factor $\rho$. The specific process is as follows
In Equations (33) and (34), the value range of the forgetting factor is $\rho\in(0,1]$, and the initial value of the measurement noise covariance matrix is specified at the start of the filtering.
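In the inverse-Wishart setting, this spreading heuristic is typically written as

$\hat{u}_{k|k-1}=\rho\big(\hat{u}_{k-1}-m-1\big)+m+1,\qquad \hat{U}_{k|k-1}=\rho\,\hat{U}_{k-1},$

where $m$ is the measurement dimension; Equations (33) and (34) presumably take this or a closely related form.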
Equation (25) can then be rewritten as Equation (35). The approximate posteriors $q(x_k)$, $q(P_{k|k-1})$, and $q(R_k)$ are obtained in turn, following the principle that a single variable is introduced and updated at a time. The specific calculation process can be found in [34], and the specific algorithm flow is given in the next section. After $N$ fixed-point iterations, the variational approximation of the posterior probability density function is given by Equation (36).
3.3. Specific Process of the Support Vector Regression-Assisted Adaptive Filter Algorithm
The principle of the adaptive filter based on VB theory was described in the last section. It cannot be denied, however, that in practical applications the anti-interference ability of this filter is low, especially against the outliers that arise in the DVL measurement process, which can easily lead to filter instability. The idea of the traditional robust filter is to reconstruct the measurement noise matrix based on statistical theory and to discriminate and eliminate outliers. However, when the number of iterations is too large or the filter is affected by external non-Gaussian noise, it is easy to cause inaccurate singular value decomposition or even divergence in the filtering process. Therefore, a method specially designed to eliminate the outliers of external sensors is adopted to improve the robustness of the integrated navigation system. The specific principles are as follows:
Firstly, the time update can be described as Equation (37):
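In standard Kalman filter notation, the time update referenced by Equation (37) presumably has the form

$\hat{x}_{k|k-1}=F_{k-1}\hat{x}_{k-1|k-1},\qquad \tilde{P}_{k|k-1}=F_{k-1}P_{k-1|k-1}F_{k-1}^{T}+Q_{k-1},$

where $F_{k-1}$ is the state transition matrix and $Q_{k-1}$ the nominal process noise covariance (both names assumed here).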
After the time update, the measurement update would normally be performed; here, the DVL data are first processed by the SVR method before the measurement update. A sliding window of length $m$ is established, so that at the current moment $k$ the historical measurement data set of the external sensor is $\{z_{k-m},\ldots,z_{k-1}\}$. When one of these data points is predicted from the points preceding it, the mapping relationship of Equation (38) can be established, where $p$ is called the model order.
In order to use the SVR algorithm to train the prediction function, the historical data samples of the sensor are reconstructed and transformed into input samples in matrix form. The input and output samples are $X$ and $Y$, respectively; their specific form is given in Equation (39).
By training on the above samples, the predicted value $\hat{z}_k$ at the current moment is obtained, as shown in Equation (40).
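A minimal sketch of how the training matrices of Equation (39) and the one-step prediction of Equation (40) could be formed from the sliding-window data is given below; the window length $m$, the model order $p$, and the use of scikit-learn's SVR are illustrative assumptions rather than the paper's exact settings.

```python
# Illustrative sketch: build input/output samples from a sliding window of DVL
# measurements and predict the current value with SVR. The window length and the
# model order p are placeholder choices, not values from the paper.
import numpy as np
from sklearn.svm import SVR

def predict_current(history, p=4):
    """history: the last m DVL measurements z_{k-m}, ..., z_{k-1} (1-D array)."""
    history = np.asarray(history, dtype=float)
    # Input matrix X: rows of p consecutive past values; output vector Y: the next value.
    X = np.array([history[i:i + p] for i in range(len(history) - p)])
    Y = history[p:]
    svr = SVR(kernel="rbf", C=100.0, epsilon=0.001, gamma="scale")
    svr.fit(X, Y)
    # Predict z_k from the p most recent measurements in the window.
    return float(svr.predict(history[-p:].reshape(1, -1))[0])

window = [1.02, 1.01, 1.03, 1.04, 1.02, 1.05, 1.06, 1.04, 1.07, 1.08]  # example data
z_pred = predict_current(window)
print("predicted z_k:", z_pred)
```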
After the predicted value is obtained, it is necessary to discriminate the outliers. Here, the innovation is introduced to discriminate the outliers; its specific form is given in Equation (41)
The discriminant threshold is usually selected in accordance with the $3\sigma$ principle, and its value can be chosen from the differences between the actual values and the predicted values within the sliding window. The mean value and standard deviation within the sliding window are defined as Equation (42).
After obtaining the mean and standard deviation of the difference between the actual value and the predicted value in the sliding window, the threshold is set. When the innovation between the real value and the predicted value is greater than the threshold, the predicted value is used to replace the real value for the measurement update. The specific schematic diagram is shown in Figure 4.
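A hedged sketch of this discrimination step is given below: the residuals between the actual and predicted values inside the window provide the mean and standard deviation of Equation (42), and a measurement whose innovation exceeds the resulting threshold is replaced by the SVR prediction. The factor of 3 and the way the window statistics are combined into the threshold are assumptions for illustration.

```python
# Illustrative sketch: innovation-based outlier discrimination with a 3-sigma
# threshold computed inside the sliding window (notation is not the paper's).
import numpy as np

def screen_measurement(z_actual, z_pred, window_actual, window_pred, n_sigma=3.0):
    """Return the measurement value to feed into the Kalman measurement update."""
    residuals = np.asarray(window_actual) - np.asarray(window_pred)  # actual - predicted
    mu = residuals.mean()                  # in-window mean, cf. Equation (42)
    sigma = residuals.std(ddof=1)          # in-window standard deviation, cf. Equation (42)
    threshold = abs(mu) + n_sigma * sigma  # one plausible combination of the statistics
    innovation = z_actual - z_pred         # cf. Equation (41)
    if abs(innovation) > threshold:
        return z_pred                      # outlier: replace the real value by the prediction
    return z_actual                        # normal point: keep the real measurement

z_used = screen_measurement(1.35, 1.07,
                            window_actual=[1.02, 1.04, 1.05, 1.06],
                            window_pred=[1.01, 1.03, 1.06, 1.05])
print("measurement used in update:", z_used)
```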
The next step is to perform the measurement update; the parameters need to be initialized before the update, and Equation (43) represents the initialization process. After the parameter values are initialized, the fixed-point iterative update process is entered.
The first step is to update the approximate probability density function $q(P_{k|k-1})$ of the one-step prediction error covariance matrix; Equation (44) gives the calculation process. The approximate probability density function $q(R_k)$ of the imprecise measurement noise covariance matrix is then updated by Equation (45).
Next, the approximate probability density function of the state is updated by using the update expressions of the approximate probability density functions obtained above; the specific calculation process is shown in Equation (46). Equation (47) gives the result at the end of the iterative process.
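To make the fixed-point structure of Equations (43)-(47) concrete, a compact numerical sketch of one measurement update is given below. It follows the general variational Bayesian adaptive filtering scheme described above with inverse-Wishart models for $P_{k|k-1}$ and $R_k$; the matrices F and H, the tuning values tau, rho, and N, and the detailed update expressions are assumptions for illustration and may differ from the paper's exact Equations (44)-(46).

```python
# Illustrative sketch of one VB measurement update with inverse-Wishart models for
# the prediction error covariance P and the measurement noise covariance R.
# All matrices and tuning values (tau, rho, N) are placeholders, not the paper's.
import numpy as np

def vb_measurement_update(x_pred, P_nom, z, H, u_prev, U_prev,
                          tau=3.0, rho=0.95, N=10):
    n, m = x_pred.size, z.size
    # Initialization, cf. Equation (43): inverse-Wishart priors for P_{k|k-1} and R_k.
    t0, T0 = n + tau + 1.0, tau * P_nom
    u0, U0 = rho * (u_prev - m - 1.0) + m + 1.0, rho * U_prev
    x_upd, P_upd = x_pred.copy(), P_nom.copy()
    for _ in range(N):                                    # fixed-point iterations
        # Update q(P_{k|k-1}), cf. Equation (44).
        A = P_upd + np.outer(x_upd - x_pred, x_upd - x_pred)
        t_new, T_new = t0 + 1.0, T0 + A
        # Update q(R_k), cf. Equation (45).
        r = z - H @ x_upd
        B = np.outer(r, r) + H @ P_upd @ H.T
        u_new, U_new = u0 + 1.0, U0 + B
        # Expected covariances from the inverse-Wishart posteriors (their means).
        P_exp = T_new / (t_new - n - 1.0)
        R_exp = U_new / (u_new - m - 1.0)
        # Update q(x_k) with a standard Kalman measurement update, cf. Equation (46).
        S = H @ P_exp @ H.T + R_exp
        K = P_exp @ H.T @ np.linalg.inv(S)
        x_upd = x_pred + K @ (z - H @ x_pred)
        P_upd = P_exp - K @ H @ P_exp
    return x_upd, P_upd, u_new, U_new                     # final results, cf. Equation (47)

# Tiny example with an assumed 2-state / 1-measurement model.
x_pred = np.array([0.0, 1.0])
x_est, P_est, u_est, U_est = vb_measurement_update(
    x_pred, np.eye(2), z=np.array([0.3]), H=np.array([[1.0, 0.0]]),
    u_prev=5.0, U_prev=np.array([[0.1]]))
print(x_est, np.diag(P_est))
```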
The specific flow chart of the support vector regression-assisted adaptive filter is given in Figure 5.