1. Introduction
A hyperspectral remote sensing system uses sensors to collect the energy reflected by ground materials in a wide electromagnetic band range, producing hyperspectral imagery (HSI) with abundant spectral information [
1,
2]. HSI is a data cube with one spectral dimension and two spatial dimensions. Each pixel in HSI corresponds to a spectral vector, which can reflect its material characteristic and provide the basis for analysis in HSI [
3]. Target detection (TD), an important part of HSI processing, is applied in both civil and military domains [
4,
5].
Different materials have unique spectral characteristics, so spectral information plays a critical role in TD. In the absence of interference and spectral aliasing, whether two materials belong to the same category depends on the consistency of their spectra. Under this ideal assumption, the TD task in HSI becomes extremely simple, because the main work reduces to judging the consistency of two spectral vectors. However, real scenes are far from this assumption, as shown in
Figure 1. Although the six samples in the HSI are chosen from the same type of aircraft, or even from the same aircraft, differences between their spectral curves still exist. The intrinsic reason lies in diverse interference factors, such as noise in the imaging device, a non-ideal electromagnetic wave transmission environment and reflection surface, and aliasing between adjacent but different materials. All of these interferences cause spectral variability and make the TD problem more challenging [
3,
6,
7].
To simplify the TD problem, researchers commonly divide the pixels in HSI into two parts: targets of interest and backgrounds of no interest. Given this, the TD problem is described by a binary hypothesis model consisting of two hypotheses: $H_0$ (target absent) and $H_1$ (target present) [
8,
9,
10].
Several models have been proposed to describe spectral variability and thereby better solve the TD task. The probability density model, which is widely used in the TD field, supposes that the spectra of each class obey a Gaussian distribution. Many classical TD algorithms are derived from this model, such as the matched filter (MF) [
3], spectral matched filter (SMF) [
11], and adaptive coherence estimator (ACE) [
12,
13]. These methods assume that the background pixels have the same covariance structure but different means under two hypotheses, and adopt the generalized likelihood ratio test (GLRT) [
14]. Subspace models are also widely used to explain spectral variability [
3]. They suppose that the spectral vectors vary in a subspace of the band space, where the dimension of the band space is equal to the length of the spectral vector. Based on the above assumptions, the matched subspace detector (MSD) [
15], adaptive subspace detector (ASD) [
12,
16], and orthogonal subspace projection (OSP) [
17,
18] detector identify targets through the projection of the spectral vectors of tested pixels in a subspace. Unlike the above algorithms, the spectral angle mapper (SAM) [
19] distinguishes targets from backgrounds by measuring the angle distance (AD). Furthermore, the constrained energy minimization (CEM) algorithm employs a finite impulse response (FIR) filter with a constraint, which minimizes the output energy while preserving the target [
4,
20]. A robust detection algorithm uses an inequality constraint on CEM (CEM-IC) based on the spectral variability rather than the equality constraint in CEM [
21]. These methods have simple structures and can be easily implemented; however, they rely on strict assumptions and make insufficient use of target information. Therefore, the TD results are often not accurate enough.
Recently, numerous TD methods have adopted machine learning techniques. The kernel method has been applied to many classical algorithms to produce kernel-based detectors, including kernel SMF, kernel MSD, kernel OSP, and kernel ASD [
1,
22]. These methods exploit higher-order statistics rather than only first- and second-order statistics, and capture crucial information about the data by implicitly exploiting nonlinear features through a kernel. Furthermore, techniques based on low-rank matrices and sparse representation are also employed in the TD field. The sparsity-based target detector (STD) [
23,
24] and the sparse representation-based binary hypothesis (SRBBH) [
25] express a pixel by a linear combination of very few atoms from an overcomplete dictionary consisting of targets and backgrounds. Additionally, the detector based on low-rank and sparse matrix decomposition (LRaSMD) [
26,
27] utilizes the low-rank property of backgrounds and the sparse property of targets. In addition, multitask learning (MTL) is also used in target detection [
28,
29]. Moreover, some methods combine sparsity with other structured or unstructured detectors, such as the hybrid sparsity and statistics detector (HSSD) [
30], the hybrid sparsity and distance-based discrimination (HSDD) detector [
31], the sparse CEM, and the sparse ACE [
32]. The above methods take advantage of some of the properties of HSI, but they often use complex optimization algorithms to solve the related objective functions, which is more time consuming than the traditional and classical methods.
Any TD method, whether classical or based on newer techniques, relies on prior spectra to locate targets, so it is important to acquire one or more accurate and representative target spectra. If several target samples are randomly selected as the input, the detection results fluctuate and are biased [
33]. The interfering factors mentioned above are an obstacle to finding a satisfactory spectrum. Recently, some scholars have studied this problem and proposed various methods, such as the extraction method based on endmember extraction (EE) [
34,
35], adaptive weighted learning method using a self-completed background dictionary (AWLM_SCBD) [
36], and target signature optimization based on sparse representation [
33]. These methods improve the accuracy of detection results by finding more accurate target spectra.
Moreover, there is another issue that is easily neglected: how to make better use of the known prior information of the target. This is also an important factor affecting detection performance, and it is the focus of our work. At present, few studies analyze the TD problem in HSI from this perspective. Due to spectral variation, there is often no clear demarcation line between the target and the background pixels. However, it is reasonable to make full use of spectral information to enhance the separability between them. Recently, Zou et al. [
37] proposed a hierarchical CEM (hCEM) algorithm as an improvement of CEM, greatly improving on its detection accuracy. The hCEM builds a hierarchical architecture to suppress the background spectra while preserving the targets. This method overcomes a deficiency of CEM, namely that it cannot completely suppress the backgrounds in one round of filtering. In hCEM, the prior target spectrum is used more than once, so hCEM makes better use of prior target information than CEM, which is why hCEM outperforms CEM. However, when the backgrounds are suppressed by the hCEM algorithm, some target pixels are also suppressed, which reduces the accuracy of the detection result.
The AD metric, an important measure besides the distance metric, is considered in this paper. SAM, a classical detection method, uses the AD metric, which makes it intuitive and simple. Because it uses AD, SAM conveniently handles spectra of the same category that have the same direction but different amplitudes. However, directly using the spectral AD between the tested pixel and the prior target as the basis for deciding whether the pixel is a target is crude. Because of the aliasing of the target and background, SAM cannot satisfy the requirements of both the detection rate and the false alarm rate. To better distinguish targets from backgrounds, as mentioned above, a good idea is to increase the angular separability.
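As an illustration of the amplitude invariance noted above, the following minimal Python sketch computes the spectral angle between two vectors. The function name `angle_distance` and the three toy spectra are hypothetical and chosen only for illustration; they are not taken from the datasets used in this paper.

```python
import math

def angle_distance(x, d):
    """Spectral angle in degrees between two equal-length spectra."""
    dot = sum(a * b for a, b in zip(x, d))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_d = math.sqrt(sum(b * b for b in d))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_x * norm_d)))
    return math.degrees(math.acos(cos_angle))

target = [0.2, 0.4, 0.6]        # hypothetical prior target spectrum
brighter = [0.4, 0.8, 1.2]      # same direction, twice the amplitude
background = [0.6, 0.4, 0.2]    # hypothetical background spectrum

angle_same_shape = angle_distance(target, brighter)
angle_background = angle_distance(target, background)
```

Scaling a spectrum leaves its angle to the target at essentially zero, whereas a background with a different spectral shape yields a clearly larger angle.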
In this paper, we propose an AD-based hierarchical background separation (ADHBS) model for the TD problem. The proposed method adopts a hierarchical architecture and combines the AD metric in whitened space between the spectra of tested pixels and the prior target to gradually separate the targets and backgrounds. In each layer, the AD is firstly obtained through the whitened data rather than the original data. Secondly, every pixel in the original data adjusts its value according to its corresponding AD. Next, the result is transmitted to the next layer, and the process above is repeated until the stop condition is satisfied. In order to alleviate the influence of interference, we adopt a simple smoothing preprocessing operation for HSI before the iteration. The contributions of our work are summarized as follows:
- Combined with the AD in whitened space, a hierarchical architecture is employed to increase the angular separability between the background and the target pixels.
- A vector perpendicular to the prior target spectral vector is introduced, and all background pixels move to the same point represented by this vector in the hierarchical processing, which is very helpful for separating the background from HSI.
- A simple smoothing preprocessing operation is adopted to alleviate the influence of interference and to improve the accuracy of the detection result.
The rest of this paper is organized as follows.
Section 2 briefly describes the application of the AD metric to the TD problem. The ADHBS algorithm is presented in
Section 3. The effectiveness of the proposed model and the detection algorithm is demonstrated by extensive experiments presented in
Section 4. Discussions are presented in
Section 5. Finally, conclusions are drawn in
Section 6.
3. Proposed Method
In this paper, we were motivated by the spirit of iteration to gradually enlarge the difference between the target and background pixels based on AD. For convenience, we use $\mathbf{x}$ and $\mathbf{d}$ to represent the vectors of the tested pixel and the given prior target spectrum, respectively. The target spectrum $\mathbf{d}$ can be obtained by averaging the target samples of a certain material in HSI, which is consistent with [37]. The proposed method employs a hierarchical structure. In each layer, all vectors corresponding to background pixels move toward the direction perpendicular to the target spectral vector $\mathbf{d}$. The moving speed depends on the AD between the currently tested pixel $\mathbf{x}$ and the prior target $\mathbf{d}$. The bigger the AD, the faster the background vectors move. During the iteration process, the AD between the background and the target pixels gradually increases.
Figure 5 exhibits a simple diagram showing the target vector and two kinds of background vectors. The dotted line represents the direction perpendicular to the target. As the iteration proceeds, the background vectors gradually move away from the target and toward the dotted line. As described above, the moving speed of a background vector depends on the magnitude of the AD between itself and the target: a large AD makes the background vector move faster. The background that is farther from the target in the initial state will therefore reach the dotted line first. Whether fast or slow, the background vectors eventually reach the dotted line, at which point the target and the background pixels are separated. In the diagram, two types of backgrounds are used to illustrate the basic idea. In practice, uninteresting backgrounds usually contain multiple categories and are more complex than a single category [6,39]. The proposed method can be viewed as follows: different categories of backgrounds are separated in turn, according to the AD between them and the prior target spectrum, and the target is finally retained.
The AD between the background and the target is an important factor for separating the background. In the proposed method, the AD is calculated from the whitened data rather than the original data. If the original data were used directly to calculate the AD between the tested pixel and the prior target, a serious aliasing phenomenon would still exist. The overlapping part of the target and background pixels could hardly be separated, because the pixels in the aliasing area would move uniformly, as they have the same AD to the given prior target. As stated in
Section 2, the whitening transformation can alleviate the aliasing phenomenon, so it is appropriate to calculate the AD from the whitened data rather than the original data to separate the background.
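A minimal sketch of computing the AD in whitened space is given below. It assumes that whitening is built from the sample mean and sample covariance of the pixels, and that the target spectrum is whitened with the same statistics; the exact transformation is the one defined in Section 2 and may differ in detail. A Cholesky factor is used as the whitening operator purely for convenience, and the toy two-band data and all function names are hypothetical.

```python
import math

def mean_and_covariance(pixels):
    """Sample mean vector and covariance matrix (normalized by the pixel count)."""
    n, bands = len(pixels), len(pixels[0])
    mean = [sum(p[b] for p in pixels) / n for b in range(bands)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pixels) / n
            for j in range(bands)] for i in range(bands)]
    return mean, cov

def cholesky(a):
    """Lower-triangular factor low with low * low^T == a (a symmetric positive definite)."""
    size = len(a)
    low = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def whiten(vector, mean, low):
    """Solve low * z = (vector - mean) by forward substitution."""
    centered = [v - m for v, m in zip(vector, mean)]
    z = []
    for i in range(len(centered)):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((centered[i] - s) / low[i][i])
    return z

def angle_degrees(x, d):
    """Angle in [0, 90] degrees; the absolute value folds obtuse angles back."""
    dot = abs(sum(a * b for a, b in zip(x, d)))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in d))
    return math.degrees(math.acos(min(1.0, dot / norm)))

pixels = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]  # toy two-band data
target = [4.0, 3.5]                                        # hypothetical target spectrum

mean, cov = mean_and_covariance(pixels)
low = cholesky(cov)
whitened_pixels = [whiten(p, mean, low) for p in pixels]
whitened_target = whiten(target, mean, low)
angles = [angle_degrees(p, whitened_target) for p in whitened_pixels]
```

Any whitening operator $W$ with $W^T W = \Sigma^{-1}$ yields the same angles, because two such operators differ only by a rotation, so the choice of a Cholesky factor does not change the AD values in this sketch.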
As mentioned above, the proposed method employs a hierarchical structure and the AD metric in the whitened space between the spectra of tested pixels and the prior target. The lowercase letter $k$ denotes the number of the layer reached by the current process, that is, the iteration count. It starts at 1 and increases by 1 with each new iteration until the iteration ends. We use $\mathbf{x}_n^k$ and $\tilde{\mathbf{x}}_n^k$ to represent the vector of the $n$th tested pixel in the $k$th layer before and after the whitening process, respectively. $\mathbf{d}$ is the prior target spectrum, and $\mathbf{d}_\perp$ is a vector orthogonal to $\mathbf{d}$. $\mathbf{d}_\perp$ can be generated by solving the equation $\mathbf{d}^T\mathbf{v} = 0$. $\mathbf{d}$ and $\mathbf{d}_\perp$ are perpendicular to each other, satisfying $\mathbf{d}^T\mathbf{d}_\perp = 0$, so $\mathbf{d}_\perp$ is one solution of the equation $\mathbf{d}^T\mathbf{v} = 0$. The output of the $k$th layer is formulated as follows, and it produces the $n$th tested pixel in the $(k+1)$th layer:

$$\mathbf{x}_n^{k+1} = (1 - \alpha_n^k)\,\mathbf{x}_n^k + \alpha_n^k\,\mathbf{d}_\perp \qquad (8)$$

In Equation (8), $\alpha_n^k$ is a parameter determining the degree of change in $\mathbf{x}_n^k$, and it should lie in [0, 1]. Because the change of $\mathbf{x}_n^k$ is related to the AD between $\tilde{\mathbf{x}}_n^k$ and $\tilde{\mathbf{d}}^k$, the value of $\alpha_n^k$ can be obtained through the AD metric, which is denoted as $\theta_n^k$. The expression of $\theta_n^k$ is:

$$\theta_n^k = \arccos\left(\frac{\left|(\tilde{\mathbf{x}}_n^k)^T\,\tilde{\mathbf{d}}^k\right|}{\|\tilde{\mathbf{x}}_n^k\|\,\|\tilde{\mathbf{d}}^k\|}\right) \qquad (9)$$

where $\tilde{\mathbf{d}}^k$ is the whitened target spectrum in the $k$th layer. The absolute value operation ensures that the obtained angle is between 0 and 90 degrees. A large $\theta_n^k$ should make its corresponding $\mathbf{x}_n^k$ change faster, so $\alpha_n^k$ should increase with $\theta_n^k$. Here, a power function is adopted to calculate $\alpha_n^k$:

$$\alpha_n^k = \left(\frac{\theta_n^k}{90}\right)^p \qquad (10)$$

The parameter $p$ of the power function controls the rate of change of $\alpha_n^k$, and further controls the moving speed of $\mathbf{x}_n^k$ in the direction of $\mathbf{d}_\perp$. The constant 90 in the denominator ensures that $\alpha_n^k$ is between 0 and 1. The effect of the parameter $p$ on Equation (10) is shown in
Figure 6.
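The per-layer adjustment described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the reference implementation: the update is assumed to be a convex blend of the pixel with the perpendicular vector, weighted by a power function of the angle divided by 90, and the perpendicular vector is built here by removing from an all-ones vector its component along the target, which is a simple substitute for a null-space construction. All names and numbers are hypothetical.

```python
import math

def unit_perpendicular(d):
    """Unit vector orthogonal to d, obtained by projecting the all-ones vector off d."""
    ones = [1.0] * len(d)
    scale = sum(d) / sum(v * v for v in d)          # (ones . d) / (d . d)
    residual = [o - scale * v for o, v in zip(ones, d)]
    norm = math.sqrt(sum(r * r for r in residual))
    if norm < 1e-12:
        raise ValueError("d is parallel to the all-ones vector; use another seed")
    return [r / norm for r in residual]

def alpha_from_angle(theta_deg, p):
    """Power-function weight in [0, 1] for an angle in [0, 90] degrees."""
    return (theta_deg / 90.0) ** p

def separate(x, d_perp, alpha):
    """Assumed layer update: replace a fraction alpha of the pixel by d_perp."""
    return [(1.0 - alpha) * xi + alpha * di for xi, di in zip(x, d_perp)]

d = [1.0, 2.0, 2.0]                         # hypothetical prior target spectrum
d_perp = unit_perpendicular(d)

alpha_small = alpha_from_angle(10.0, p=6)   # pixel close to the target in angle
alpha_large = alpha_from_angle(80.0, p=6)   # pixel far from the target in angle
moved = separate([2.0, 1.0, 0.5], d_perp, alpha_large)
```

With a large exponent, pixels at small angles receive a weight close to zero and are almost unchanged in one layer, whereas pixels at large angles are pulled strongly toward the perpendicular vector; this is the behavior that the parameter of the power function is meant to control.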
As can be seen from Equation (8), in each layer, part of the tested pixel is replaced by the perpendicular vector $\mathbf{d}_\perp$, and thus pixels gradually transfer to $\mathbf{d}_\perp$ from the perspective of the AD metric. There are three reasons why we use Equation (8) to adjust the pixel value:
- It makes the backgrounds shift toward the direction perpendicular to the target, thus gradually increasing the AD between the background and the target pixels.
- It gradually turns all backgrounds into one genuine category. In HSI target detection, all non-target pixels are considered background pixels. Some algorithms, such as SMF and ACE, treat the background as one single category. However, real applications contain more than one kind of background, which is an obstacle in the TD problem. Under the adjustment of Equation (8), layer by layer, all background pixels tend to move toward the same point and turn into one category, which is helpful for solving the TD problem.
- It makes the final result converge. As long as a pixel is not exactly equal to the given target spectral vector, it will eventually be captured by the point $\mathbf{d}_\perp$ after a limited number of iterations.
As mentioned in the third point, if a pixel is not exactly equal to the prior target spectrum, even if it belongs to the target, it will also tend to move toward $\mathbf{d}_\perp$, finally reaching the point determined by $\mathbf{d}_\perp$ after sufficient iterations. Because the tested pixels have different values, they move away from the given target spectrum at different speeds; consequently, during the continuous iteration process, the AD between the target and background pixels gradually enlarges at first and then slowly decreases. The reason for this decrease is as follows: when the backgrounds have been separated but the iteration process has not stopped, the target pixels that are not exactly identical to the prior target $\mathbf{d}$ also begin to move in the direction orthogonal to $\mathbf{d}$. Thus, the iteration process should be stopped in an appropriate state, in which the majority of the backgrounds have converged to $\mathbf{d}_\perp$ and a large AD exists between the target and the background pixels.
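Assuming that the iteration is stopped after the $K$th layer, a new dataset is obtained, represented as $\mathbf{X}^K$ with pixel vectors $\mathbf{x}_n^K$. The final detection result is obtained from the AD between the new data $\mathbf{X}^K$ and the given prior target spectrum $\mathbf{d}$:

$$\theta_n = \arccos\left(\frac{\left|\mathbf{d}^T\mathbf{x}_n^K\right|}{\|\mathbf{d}\|\,\|\mathbf{x}_n^K\|}\right), \qquad y_n = \cos\theta_n = \frac{\left|\mathbf{d}^T\mathbf{x}_n^K\right|}{\|\mathbf{d}\|\,\|\mathbf{x}_n^K\|} \qquad (11)$$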
These two expressions are equivalent. Thus, we take the second case to analyze the stopping criterion. In fact, each iteration yields one detection result. We use $\mathbf{y}^k$ to represent the detection result of the $k$th iteration, and $E(\mathbf{y}^k)$ to represent the energy of the result. As the iteration proceeds, the AD between the background and target pixels becomes larger and the cosine value becomes smaller. Thus, the energy of the detection results decreases. Based on this decreasing trend, we stop the iterations by setting a threshold, as formulated by the following equation, in which the energy of the first-layer detection result is used as a reference:

$$\frac{E(\mathbf{y}^k)}{E(\mathbf{y}^1)} < \delta \qquad (12)$$

where $\delta$ is a threshold.
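The stopping rule can be sketched as follows. The energy of a detection map is assumed here to be the mean of its squared outputs, which is one common definition and not necessarily the one used in this paper; the per-layer detection maps and the function names are hypothetical.

```python
def energy(result):
    """Energy of a detection map, assumed here to be the mean squared output."""
    return sum(v * v for v in result) / len(result)

def stopping_layer(results, threshold):
    """1-based index of the first layer whose energy ratio to layer 1
    falls below the threshold, or None if no layer qualifies."""
    reference = energy(results[0])
    for k, result in enumerate(results, start=1):
        if energy(result) / reference < threshold:
            return k
    return None

# Hypothetical detection maps (cosine values) of four pixels over three layers.
# The first pixel plays the role of a target and keeps a value close to 1.
results = [
    [1.0, 0.8, 0.6, 0.4],
    [1.0, 0.6, 0.3, 0.2],
    [1.0, 0.2, 0.1, 0.0],
]

layer_for_half = stopping_layer(results, 0.5)
```

Because the reference is the energy of the first layer, the ratio starts at 1 and decreases as background responses shrink, so a smaller threshold lets the iteration run for more layers.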
In the first section, we mentioned that a series of interference factors leads to spectral variability, which degrades the accuracy of the TD task. Similarly, the same effect occurs in the AD metric between pixels. Here, a simple smoothing preprocessing operation is adopted, which can be seen as noise reduction, to further improve the detection result. The smoothing preprocessing operation uses a
mask as shown in
Figure 7c. When we directly employed the mask in
Figure 7a, the difference between adjacent pixels of the same class was reduced; however, the degree of aliasing between adjacent pixels of different categories increased. Finally, a compromise method was adopted: The mean of the original image and the smoothed image.
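The compromise described above can be sketched for a single band as follows. The window size is an assumption (a 3 × 3 mean filter is used for illustration, and border windows are simply clipped to the image); the function names and the toy band are hypothetical. For an HSI cube, the same operation would be applied to each band.

```python
def mean_filter(band, radius=1):
    """Average each pixel over a (2*radius+1)^2 window clipped at the borders."""
    rows, cols = len(band), len(band[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [band[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            out[r][c] = sum(window) / len(window)
    return out

def compromise_smooth(band, radius=1):
    """Mean of the original band and its mean-filtered version."""
    smoothed = mean_filter(band, radius)
    return [[0.5 * (o + s) for o, s in zip(orig_row, smooth_row)]
            for orig_row, smooth_row in zip(band, smoothed)]

band = [[0.0, 0.0, 0.0],
        [0.0, 9.0, 0.0],
        [0.0, 0.0, 0.0]]   # toy band with one bright pixel
result = compromise_smooth(band)
```

Averaging the original and the smoothed band is equivalent to applying a single mask whose center weight is increased, so each pixel keeps more of its own value than under plain mean filtering.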
Figure 8 shows the flowchart of the proposed ADHBS algorithm, which includes the following steps:
- Step 1: Smooth the input HSI;
- Step 2: Generate a vector perpendicular to the target spectral vector;
- Step 3: Implement the whitening treatment and obtain the whitened data and the whitened target spectrum;
- Step 4: Calculate the AD and the corresponding update parameter, and implement the separation operation to obtain the data for the next layer;
- Step 5: Calculate the detection result of the current layer. If the stopping condition is satisfied, go to step 6; otherwise, increase the layer index by 1 and go to step 3;
- Step 6: Output the detection result.
Algorithm 1 gives the outline of the proposed method.
4. Experiments
In this section, we conducted experiments on three HSI datasets to evaluate the performance of the proposed ADHBS algorithm. We then compared the ADHBS method with nine competitive methods: CEM, ACE, SMF, SAM, MSD, CEM-IC, STD, SRBBH, and hCEM. The first five are traditional, classical methods, and the last four were proposed more recently. All experiments were run in MATLAB 2018b.
Algorithm 1: ADHBS algorithm for target detection in HSI
Input (and preprocessing): spectral matrix, target spectrum, threshold
Initialization
Hierarchical Separation: steps 1 to 8
Stop Criterion: step 9; if the stopping condition is not satisfied, go back to step 1; otherwise, go to the output step
Output: step 10, the detection result
Three evaluation criteria were employed for analysis, including the receiver operating characteristic (ROC) curves, the area under the ROC curves, and the separability maps.
The ROC curve is widely used in the TD field. It describes the relationship between the probability of detection ($P_d$) and the false alarm rate (FAR). Based on the ground truth image, $P_d$ and FAR can be obtained by applying different thresholds to the output of the detection system:

$$P_d = \frac{N_c}{N_t}, \qquad \mathrm{FAR} = \frac{N_f}{N_{total}}$$

where $N_c$ and $N_t$ are the number of correctly detected target pixels and the total number of true target pixels, respectively, and $N_f$ and $N_{total}$ are the number of false alarm pixels and the total number of pixels, respectively. If one algorithm obtains a higher $P_d$ than another under the same FAR, we can conclude that the former algorithm is better than the latter. When the ROC curves of two methods are very close, their detection results are very similar, and it is difficult to judge which is better. In this case, the area under the ROC curve (AUC) can be used to compare the performance of different methods. Additionally, to further illustrate the performance, the normalized AUC at low FAR (0.001) is reported in this paper. The area under the ROC curve for false alarm rates between 0 and 0.001 is first calculated and then divided by 0.001 to obtain the normalized AUC at low FAR.
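A minimal sketch of these quantities is given below, assuming that a pixel is declared a target when its detector output is at least the threshold, and using the total number of pixels as the FAR denominator as stated above. The trapezoidal rule for the area and the toy scores and labels are assumptions made for illustration; the normalized AUC at low FAR would additionally require clipping the curve at FAR = 0.001, which is omitted here.

```python
def pd_far(scores, labels, threshold):
    """Return (Pd, FAR) when pixels with score >= threshold are declared targets."""
    n_total = len(scores)
    n_target = sum(labels)
    n_correct = sum(1 for s, t in zip(scores, labels) if s >= threshold and t == 1)
    n_false = sum(1 for s, t in zip(scores, labels) if s >= threshold and t == 0)
    return n_correct / n_target, n_false / n_total

def roc_curve(scores, labels):
    """(FAR, Pd) points obtained by sweeping the threshold over every distinct score."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        p_d, far = pd_far(scores, labels, threshold)
        points.append((far, p_d))
    return points

def area_under_curve(points):
    """Trapezoidal area under (FAR, Pd) points ordered by increasing FAR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

scores = [0.9, 0.8, 0.3, 0.2, 0.1]   # hypothetical detector outputs
labels = [1, 0, 1, 0, 0]             # 1 = true target pixel, 0 = background

operating_point = pd_far(scores, labels, 0.5)
auc_value = area_under_curve(roc_curve(scores, labels))
```

Because the FAR denominator is the total pixel count rather than the number of background pixels, the curve ends slightly below FAR = 1 whenever targets are present, which is negligible when targets occupy only a small fraction of the image.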
The separability maps describe the distribution of the true target and background pixels on the detection results based on the statistical perspective. They can intuitively reflect the degree of separation between targets and backgrounds by different detectors.
In this paper, we used the “null()” function in MATLAB 2018b to find a set of basis vectors for the null space of the transposed target spectrum $\mathbf{d}^T$, and then linearly combined these basis vectors to form the perpendicular vector $\mathbf{d}_\perp$. Here, we set the coefficients of all basis vectors to 1 and normalized the result of the linear combination so that the length of $\mathbf{d}_\perp$ was 1. According to [15,21,23,24,25,37], the parameters of the comparison algorithms were set as follows: for CEM-IC, the radius parameter was set to 60 without normalizing the HSI. For hCEM, the parameter of the exponential function was 20. The sizes of the inner and outer windows for MSD, STD, and SRBBH in the three datasets were 13 and 19, 23 and 31, and 13 and 19, respectively. The number of target dictionary atoms and the dimension of the target subspace were equal to one tenth of the number of target pixels in the dataset. For the ADHBS method, the value of the power-function parameter
p is given in each specific experiment.
4.1. Experiment on San Diego Airport Data
The first HSI dataset was collected by the AVIRIS from the San Diego airport area, which was introduced in
Section 2 and shown in
Figure 3. This dataset is a classical one that often appears in experiments on HSI target detection. The three airplanes located on the left of the image are the targets. The parameter of the power function
p is set to 8 for the San Diego airport data. The detection results of different methods are shown in
Figure 9. According to the variation of the energy ratio of each layer to the initial layer in Equation (12), we exhibit five results corresponding to different values of this ratio in the first row of
Figure 9. It can be seen that the background pixels are gradually separated as the iteration proceeds, while the target pixels are retained. The result in
Figure 9d is very close to the ground truth, but a few background pixels have not been separated. The backgrounds in the result of
Figure 9e are separated very well. However, the retained target pixels are not as clear as those in
Figure 9d. The result of ADHBS in
Figure 9e is similar to the result of hCEM shown in
Figure 9j, and both are better than the results of the other methods.
The ROC curves and the separability map for the San Diego airport dataset are shown in
Figure 10. The AUC results of the different methods are in the first column of
Table 1 and
Table 2. It is necessary to note that the stop criterion in ADHBS is
, and the iteration in ADHBS stops in the 23rd layer. According to the ROC curves in
Figure 10a and AUC in
Table 1 and
Table 2, it can be seen that the proposed method performs better than the other methods. The performance of hCEM is only slightly inferior to that of ADHBS. From the separability map in
Figure 10b, we find that the separability between the target and background pixels is the greatest for ADHBS among all the results, with hCEM second.
4.2. Experiment on Indian Pines Data
The second HSI dataset was that of Indian Pines collected in North-western Indiana by AVIRIS, which is shown in
Figure 11a [
40,
41]. The image used for the experiment consists of 145 × 145 pixels and is a subset of a larger scene. This dataset contains 224 spectral reflectance bands covering 0.4–2.5 μm. However, the number of bands is usually reduced to 200 by removing the bands that cover the water absorption region. In this dataset, two-thirds of the area is covered by agricultural vegetation, and the remaining one-third is covered by forest or other natural perennial vegetation. The dataset includes 16 kinds of labeled vegetation, such as alfalfa, oats, and wheat.
Figure 11b,c show the ground truth and legend of the different categories. In our experiment, we chose alfalfa, which accounts for only 0.2% of this dataset (46 pixels), as the target to test the effectiveness of the proposed method. The parameter of the power function
p is set to 6 for this data.
The TD results are shown in
Figure 12. The ground truth of the target is shown in the last sub-image. Five results from different layers in ADHBS are exhibited in the first row of
Figure 12. They correspond to different values of the energy ratio. The background pixels are gradually separated and the target pixels are retained. For this data, the results of ADHBS and hCEM shown in
Figure 12e,j are both accurate. The other methods either separate the background incompletely or suppress the target, resulting in poor detection performance.
The iteration in ADHBS stops in the 58th layer for this data when the stop criterion is set to
. The output is used for calculating the evaluation criterion. The ROC curves and the separability map for the Indian Pines data are shown in
Figure 13. The AUC results of the different methods are in the second column of
Table 1 and
Table 2. In
Figure 13a, the curves of ADHBS and hCEM coincide, so we increased the line width of the hCEM curve for easier observation. Clearly, ADHBS and hCEM perform best. In the separability map, hCEM perfectly separates the target from the backgrounds, and the separation performance of ADHBS is second only to that of hCEM.
4.3. Experiment on Okavango Delta Data
The third tested dataset was acquired from the Okavango Delta, Botswana in 2001–2004 by the National Aeronautics and Space Administration (NASA) EO-1 satellite [
42]. It includes 242 bands covering the 400- to 2500-nm portion of the spectrum in 10-nm windows. After removing the low-SNR and bad bands, 145 bands were retained. This dataset consists of 14 identified classes representing different land cover types in seasonal swamps, occasional swamps, and drier woodlands located in the distal portion of the Okavango Delta. Specific categories include water, riparian, and grasslands. The scene of this Okavango Delta data is shown in
Figure 14a, consisting of
pixels.
Figure 14b,c show the ground truth and legend of the different categories. Here, we chose the area covered by water as the target. A subset of this data, enclosed by a black box, was chosen for the experiment. To clearly show the distribution of the target, the subset and the corresponding ground truth of water are shown in
Figure 14d,e. The size of the subset is
. The parameter of power function
p is set to 6 for the Okavango Delta data. The detection results are shown in
Figure 15. As in the previous two experiments, the first row of
Figure 15 shows the results of the proposed method from different layers, corresponding to different values of the energy ratio. During the iteration process, the background pixels of the image are gradually separated. The target pixels in
Figure 15a–c are detected well by ADHBS, but many unseparated background pixels remain around the target and in other areas. In
Figure 15d,e, the residual background pixels are reduced. Meanwhile, the target pixels of interest are also suppressed to some extent, but without affecting discrimination. ACE and hCEM are inferior to ADHBS in separating the target and background pixels but better than the other methods. The target pixels in the outputs of the other methods are submerged in the unseparated backgrounds.
For this data, the iteration in ADHBS stops in the 141st layer when the stop criterion is
. The ROC curves and the separability map are exhibited in
Figure 16. Furthermore, the AUC results are listed in the third column of
Table 1 and
Table 2. From the ROC in
Figure 16a and AUC in
Table 1 and
Table 2, we find that the performance of ADHBS is the best. The separability map shows that ACE, hCEM, and ADHBS all suppress most of the background very well, but hCEM suppresses the target more than the other two methods. That is why the AUC value of hCEM is lower than those of ACE and ADHBS.
Through the above three groups of experiments, we can conclude that the proposed method has a better ability to separate background pixels and retain target pixels than the other methods. At the same time, we find that hCEM also shows good detection results. It should be noted that both hCEM and ADHBS adopt hierarchical structures. In other words, both of them use the prior target information more than once. The experimental results show that better detection results can be obtained by fully utilizing the target information.