1. Introduction
Changes to the Earth arise from both natural hazards, such as floods and earthquakes, and human activities, like urban development [1]. Consequently, Change Detection (CD) algorithms are essential tools for disaster and resource management. Among the various data sources for landscape CD, Remote Sensing (RS) stands out as particularly important [2,3,4,5,6]. RS data track changes in objects within specific regions over time [7], providing a valuable data source with several advantages, including frequent updates, the ability to monitor vast areas, and cost-effectiveness [8]. RS data are utilized in a wide range of CD applications, such as fire monitoring [9,10], climate change studies [11,12,13,14,15], and flood mapping [16,17,18,19].
One type of RS imagery that provides finer spectral resolution is Hyperspectral RS imagery (HSI) [20,21,22,23]. Owing to its high number of spectral bands, HSI improves CD performance for spectrally similar targets [24,25] compared to multispectral imagery. However, the specific nature of HSI makes extracting change information from multi-temporal imagery a significant challenge [26]. As a result, Hyperspectral Change Detection (HCD) remains a dynamic and challenging area of study. Atmospheric conditions, noise levels, and data overload are among the most challenging factors affecting HCD results [27]. Hyperspectral sensors can be divided into two categories: (1) airborne (e.g., the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS)); and (2) space-borne (e.g., the PRecursore IperSpettrale della Missione Applicativa (PRISMA) and EnMAP). In the near future, new space-borne sensors will be deployed (HyspIRI, SHALOM, and HypXIM) [27].
Numerous studies and methodologies have so far used HSI for HCD [27,28,29,30,31]. For example, Ertürk et al. [32] suggested a CD technique by applying sparse spectral unmixing to bi-temporal hyperspectral images. This method first predicts the changed areas using spectral unmixing and then creates a binary change map by thresholding the abundance maps. Ertürk [33] also designed an HCD framework based on a fuzzy fusion strategy that combines similarity measures, the spectral angle mapper (SAM) algorithm, and change vector analysis (CVA) to predict changed areas. The fuzzy inference fusion strategy was used to fuse the magnitude and angle measurements obtained by the CVA and SAM algorithms, respectively. Additionally, López-Fandiño et al. [30] proposed a two-step HCD framework for performing binary and multi-class CD. They first generated a binary change map based on segmentation and thresholding, using the SAM algorithm. The image differencing algorithm was then used to combine the multi-temporal images, and a Stacked Auto-encoder was employed to reduce the dimensionality of the HSI. Finally, the binary change map and the reduced HSI were used to produce the multi-class change map. In recent work, Ghasemian and Shah-Hosseini [34] also designed an HCD framework for binary and multiple CD based on several steps: (1) stacking the bi-temporal dataset and generating sample data based on the peak density clustering algorithm, (2) implementing target detection methods based on the produced sample data, (3) generating a binary change map based on Otsu thresholding, and (4) utilizing the sparse coding algorithm and the support vector domain description (SVDD) to generate multiple change maps. Furthermore, Saha et al. [35] proposed an HCD framework based on an untrained deep model. This method extracts deep features from the pre- and post-change hyperspectral images using the untrained model and measures the similarity of the deep features through the Euclidean norm.
Tong et al. [36] also proposed a framework for HCD by analyzing and transfer-learning uncertain areas. This method is applied in four main steps: (1) generating a binary change map according to the uncertain-area analysis using K-Means clustering, CVA, and rule-based methods, (2) classifying the source image based on an active learning framework, (3) classifying the second-date image based on improved transfer learning and a support vector machine (SVM) classifier, and (4) utilizing post-classification analysis to detect the multiple change map. Moreover, Seydi and Hasanlou [37] designed an HCD method based on a 3D convolutional neural network (3D-CNN) and an image differencing algorithm. This framework utilized the image differencing procedure to predict change and no-change areas and then employed the 3D-CNN to classify the change areas and generate a binary change map. Finally, Borsoi et al. [38] proposed a fast spectral unmixing method for HCD based on the high temporal correlation between the abundances. This method detects abrupt changes by considering the residuals of end-member selection.
Recent progress in HCD has emphasized the value of Siamese networks and double-stream architectures for improved spectral–spatial analysis [39,40]. Innovative methods such as meta-learning and self-supervised learning have also proven effective in overcoming HCD challenges. For instance, Wang et al. (2022) applied meta-learning with Siamese networks for target detection, while Huang et al. (2023) developed a contrastive self-supervised network for HSI analysis [41,42,43]. These advancements have not only enhanced HCD techniques but also laid the groundwork for our HCD-Net framework, which leverages these developments for more accurate HCD.
Although the current HCD methods have shown promising results, they usually have several limitations, including the following:
They require a threshold, and selecting a suitable threshold can be challenging.
They primarily focus on spectral data while ignoring the potential of spatial features to improve HCD results, a potential demonstrated by multiple studies.
Most HCD methods are complex to implement and computationally demanding.
Noise and atmospheric conditions can negatively affect the automatic generation of pseudo-sample data through simple predictors and thresholding methods.
Most HCD methods require additional pre-processing steps, such as highlighting changes (recognizing changes from no-changes) or dimensional reduction. The dependence of HCD results on the chosen method for conducting these pre-processing steps makes it difficult to obtain robust results in different study areas.
Given these limitations, a novel method is proposed in this study to minimize these challenges and improve HCD results. This study introduces a new framework for HCD based on double-stream CNNs, called the HCD-Net. The HCD-Net uses multiscale 3D/2D convolution layers and 3D/2D attention blocks. The advantages of the HCD-Net are: (1) the use of multiscale multi-dimensional kernels, (2) the use of 3D/2D attention blocks, and (3) high efficiency and robust results in HCD. The key contributions of this study are:
Proposing and implementing HCD-Net, a novel double-stream deep feature extraction framework for HCD that integrates both 3D and 2D convolution layers in an end-to-end manner without the need for additional processing.
Introducing a 3D/2D attention mechanism within HCD-Net to significantly enhance the extraction of informative deep features, leading to improved accuracy in detecting changes in HSI. This attention mechanism allows the model to focus on salient features in both spatial and spectral dimensions, providing a more nuanced understanding of the changes occurring within the scene.
Demonstrating the robustness and versatility of HCD-Net by evaluating its performance across diverse geographic regions using both space-borne and airborne HSI. The results showcase HCD-Net’s ability to adapt to various landscapes and sensor characteristics, establishing its efficacy in a wide range of HCD applications.
2. Methodology
The HCD-Net is designed in three main steps: (1) pre-processing, (2) parameter tuning and model training, and (3) CNN-based binary classification and accuracy assessment. The details of the HCD-Net are presented in Figure 1 and are further discussed in the following subsections.
2.1. Pre-Processing
The RS datasets require pre-processing before they can be utilized for CD. These steps can be divided into spectral and spatial corrections, with spectral correction performed first and spatial correction afterwards. Spectral rectification of the hyperspectral level-1 raw (L1R) Hyperion data (space-borne sensor) involved eliminating bands with no-data values, de-striping, de-noising, and smile, radiometric, and atmospheric corrections. Additionally, geometric correction was performed during pre-processing. After pre-processing, 154 spectral bands were used in this study for HCD. The pre-processing of the airborne sensor (AVIRIS) data had already been completed, and the pre-processed data are used in this article.
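As an illustration, a minimal sketch of the band-screening part of this step is given below. It assumes a single-date Hyperion scene is already loaded as a NumPy array of shape (rows, cols, bands); the no-data convention and the synthetic example values are illustrative assumptions, not the exact settings used in this study.

```python
import numpy as np

def screen_hyperion_bands(cube, nodata_value=0):
    """Drop bands that contain only no-data values from a hyperspectral cube.

    cube: ndarray of shape (rows, cols, bands) -- a single-date Hyperion scene.
    Returns the reduced cube and the indices of the retained bands.
    """
    # A band is discarded when every pixel equals the no-data value
    # (uncalibrated Hyperion bands are typically stored as all zeros).
    valid = ~np.all(cube == nodata_value, axis=(0, 1))
    return cube[:, :, valid], np.where(valid)[0]

# Example usage with a synthetic cube standing in for an L1R scene.
scene = np.random.randint(0, 4000, size=(100, 100, 242)).astype(np.float32)
scene[:, :, :7] = 0          # mimic a block of uncalibrated bands
reduced, kept = screen_hyperion_bands(scene)
print(reduced.shape, len(kept))
```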
2.2. CNN-Based HCD
Deep Learning (DL) methods can automatically extract informative features from input datasets with a high degree of abstraction [44]. Among all DL frameworks, including CNNs, deep belief networks (DBNs), generative adversarial networks (GANs), recurrent neural networks (RNNs), and auto-encoders (AEs), CNN is the most commonly employed method [45]. Renowned for its applications across various fields, CNN uses stacked convolutional kernels to learn spectral and texture information in the spatial domain [46], enabling classification based on the interrelationships between the input data and target labels. The CNN network consists of two main components: a feature extractor and a softmax classifier, which is typically implemented as a multi-layer perceptron (MLP) that assigns class labels. Furthermore, the CNN network includes several operational layers, such as convolutional layers, pooling layers, nonlinear activation functions, and normalization layers [22].
This research proposes a novel double-stream HCD framework using a CNN network, as illustrated in Figure 2, which details the architecture of our dual-stream CNN framework for HCD. To improve efficiency, the two input patch datasets are merged into a single patch dataset through image differencing, unlike many DL-based frameworks that stack the input patch data; this reduces computation time and processing.
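The following minimal sketch illustrates this differencing step under simple assumptions: both dates have already been co-registered and cropped into patch arrays of identical shape, the difference is taken as an absolute value (the sign convention is our assumption), and the patch size and band count are illustrative.

```python
import numpy as np

def difference_patches(patches_t1, patches_t2):
    """Merge bi-temporal patch sets into a single input by image differencing.

    patches_t1, patches_t2: arrays of shape (n_patches, win, win, bands)
    holding co-registered patches from the pre- and post-change images.
    """
    assert patches_t1.shape == patches_t2.shape
    # The spectral difference (here its absolute value) is fed to the network
    # instead of stacking the two dates, halving the input depth.
    return np.abs(patches_t2.astype(np.float32) - patches_t1.astype(np.float32))

# Example with random stand-ins for 7x7 patches of a 154-band image.
t1 = np.random.rand(32, 7, 7, 154).astype(np.float32)
t2 = np.random.rand(32, 7, 7, 154).astype(np.float32)
diff = difference_patches(t1, t2)
print(diff.shape)  # (32, 7, 7, 154)
```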
HCD-Net’s architecture combines 3D and 2D convolutional layers and SE blocks to effectively extract and analyze the complex spectral–spatial relationships in hyperspectral data, while keeping computational efficiency in mind. The first stream uses multi-scale 3D convolution blocks, pooling layers, and 3D-SE blocks to explore the spectral–spatial dimensions, capturing the detailed relationships across the various bands of hyperspectral images. This approach enables a thorough analysis, utilizing the full spectrum of spectral information to effectively detect subtle environmental changes.
On the other hand, the second stream focuses on extracting 2D deep features through 2D convolution layers and 2D-SE blocks. Designed to be deeper, this stream includes a broader range of multiscale convolution blocks without pooling layers, aiming to capture spatial details precisely. The 2D convolutional layers in our framework are especially effective at extracting high-resolution spatial features with lower computational demands, without considering the relationship between spectral bands, thus ensuring the model's efficiency. This differentiation in convolutional approaches allows for a comprehensive analysis, with the detailed spatial information complementing the spectral–spatial features identified by the 3D convolutions and enhancing the framework's overall ability to detect changes within the environment. Additionally, the attention modules, implemented as 3D-SE and 2D-SE blocks, are aimed at extracting highly informative features and enhancing the model's representational power. Rooted in the broader concept of attention mechanisms within CNNs, which prioritize crucial information within the input data, they refine the model's focus on salient spectral–spatial features, thereby improving its performance in detecting nuanced changes in hyperspectral imagery.
The 3D and 2D features from both branches are combined through a concatenation and flattening process. After aligning their dimensions, the 2D and transformed 3D features are merged along the channel dimension to form a composite feature map. This map is then flattened into a one-dimensional vector for analysis by dense layers. The effectiveness of this integration mechanism has been confirmed through the ablation analysis in Section 5, showing the significant potential of using both 3D and 2D convolutional layers to enhance the framework's performance.
These features are then brought together in a concatenating layer, which merges the deep features from both streams before passing them to the classification layers. A fully connected layer bridges the CNN framework and MLP layers, leading to a softmax layer that classifies the input feature data.
This dual-stream approach, characterized by its innovative use of 2D and 3D convolutional layers and SE blocks, underlines our commitment to developing a robust framework capable of leveraging the unique advantages of hyperspectral imagery for environmental change detection. Through this method, the HCD-Net framework ensures a balanced and comprehensive analysis, achieving high accuracy and efficiency in monitoring and analyzing environmental changes.
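For concreteness, a compact TensorFlow/Keras sketch of this dual-stream layout is given below (the framework choice is ours for illustration). It is not the exact HCD-Net configuration: the patch size, filter counts, kernel sizes, pooling settings, and layer depths are illustrative assumptions, and the SE blocks follow the generic squeeze-and-excitation pattern described in Section 2.3.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def se_block_2d(x, ratio=8):
    # Generic 2D squeeze-and-excitation: GAP -> two dense layers -> rescale.
    c = x.shape[-1]
    w = layers.GlobalAveragePooling2D()(x)
    w = layers.Dense(c // ratio, activation="relu")(w)
    w = layers.Dense(c, activation="sigmoid")(w)
    return layers.Multiply()([x, layers.Reshape((1, 1, c))(w)])

def se_block_3d(x, ratio=8):
    # 3D counterpart using global average pooling over the spatial-spectral cube.
    c = x.shape[-1]
    w = layers.GlobalAveragePooling3D()(x)
    w = layers.Dense(c // ratio, activation="relu")(w)
    w = layers.Dense(c, activation="sigmoid")(w)
    return layers.Multiply()([x, layers.Reshape((1, 1, 1, c))(w)])

def build_dual_stream(win=7, bands=154):
    # Single differenced patch as input (see the differencing step above).
    inp = layers.Input(shape=(win, win, bands))

    # Stream 1: 3D convolutions explore the spectral-spatial cube.
    x3 = layers.Reshape((win, win, bands, 1))(inp)
    x3 = layers.Conv3D(8, (3, 3, 7), padding="same", activation="relu",
                       kernel_initializer="he_normal")(x3)
    x3 = layers.MaxPooling3D((1, 1, 2))(x3)
    x3 = se_block_3d(x3)
    x3 = layers.Flatten()(x3)

    # Stream 2: 2D convolutions capture spatial detail without pooling.
    x2 = layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                       kernel_initializer="he_normal")(inp)
    x2 = layers.Conv2D(32, (5, 5), padding="same", activation="relu",
                       kernel_initializer="he_normal")(x2)
    x2 = se_block_2d(x2)
    x2 = layers.Flatten()(x2)

    # Fuse both streams and classify change / no-change with a softmax head.
    fused = layers.Concatenate()([x3, x2])
    fused = layers.Dense(128, activation="relu")(fused)
    out = layers.Dense(2, activation="softmax")(fused)
    return Model(inp, out)

model = build_dual_stream()
model.summary()
```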
The HCD-Net has several differences compared to other HCD frameworks, including:
Taking advantage of SE blocks in extracting informative deep features.
Utilizing the advantages of spectral information in the hyperspectral dataset through 3D convolution layers.
Combining 3D and 2D convolutions to explore high-level spectral and spatial information.
Utilizing a multiscale convolution block to increase the robustness of the network against different object sizes.
Employing a differencing algorithm to reduce computational and time costs instead of concatenating deep features in the first layers.
The multidimensional kernel convolution used in this study encompasses 3D, 2D, and 1D kernel convolutions. The distinction between these kernel convolutions is illustrated in Figure 3.
2.3. Squeeze-and-Excitation (SE) Blocks
The proposed SE block adaptively adjusts channel-wise feature responses to explicitly model the interconnections between channels, thus improving channel interdependencies with minimal computational cost. The block consists of three components: (1) Squeeze, (2) Excitation, and (3) Rescale. The Squeeze module uses global average pooling (GAP) to reduce the spatial dimensions of the input feature data to a single value per channel. The Excitation module then processes the output of the Squeeze module, learning adaptive scaling weights for each channel through two MLP layers, the first with a ReLU activation and the second with a Sigmoid activation. Finally, the Rescale component multiplies the input features element-wise by the learned channel weights, restoring the features to their original size. In this research, 3D/2D SE blocks are used in the proposed architecture for HCD.
Figure 3 illustrates the differences between 2D/3D SE blocks. The main difference between the two blocks is noted in the Squeeze module, where the 3D Squeeze module uses a 3D GAP, and the 2D Squeeze module uses a 2D GAP.
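As a toy numeric illustration of this squeeze/excite/rescale flow (with made-up weights and a made-up 2D feature map, not values from the trained network), consider the following:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 2D feature map: 4x4 spatial grid with 8 channels.
features = rng.random((4, 4, 8)).astype(np.float32)

# Squeeze: 2D global average pooling reduces each channel to one value.
squeezed = features.mean(axis=(0, 1))            # shape (8,)

# Excitation: two small dense layers (ReLU then Sigmoid) with random weights.
w1, w2 = rng.random((8, 2)), rng.random((2, 8))  # illustrative reduction to 2 units
hidden = np.maximum(squeezed @ w1, 0.0)          # ReLU
scale = 1.0 / (1.0 + np.exp(-(hidden @ w2)))     # Sigmoid, shape (8,)

# Rescale: broadcast the per-channel weights over the spatial grid.
recalibrated = features * scale                  # same shape as the input
print(recalibrated.shape)                        # (4, 4, 8)
```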
2.4. Convolution Layers
The convolution layers form the central core of CNN frameworks and are capable of extracting deep, high-level features. They can be categorized into three types based on their kernel dimensionality: (1) 3D kernel convolution, (2) 2D kernel convolution, and (3) 1D kernel convolution. The proposed architecture leverages the benefits of both 3D and 2D convolution layers to extract deep features. Additionally, this research incorporates multiscale convolution blocks to enhance the network's resilience against variations in object size.
The strength of 3D convolution layers lies in considering both spatial and spectral features. In other words, the 3D convolution layers consider both the relation between the central pixel and its neighborhood and the relation between spectral bands. The feature map $H$ of the 3D convolution layer at position $(i, j, k)$ on the $y$th feature of the $x$th layer is given by Equation (1):

$$H_{x,y}^{i,j,k} = F\left( b_{x,y} + \sum_{m} \sum_{p=0}^{P_x - 1} \sum_{q=0}^{Q_x - 1} \sum_{r=0}^{R_x - 1} w_{x,y,m}^{p,q,r}\, H_{x-1,m}^{i+p,\, j+q,\, k+r} \right) \tag{1}$$

where $F$ is the activation function, $b_{x,y}$ is the bias parameter, $m$ indexes the feature maps of the $(x-1)$th layer, $w_{x,y,m}^{p,q,r}$ is the kernel weight, and $P_x$, $Q_x$, and $R_x$ are the width and height of the kernel and its depth along the spectral dimension, respectively. In the 2D convolution layer, the feature map value at position $(i, j)$ is computed analogously, without the summation over the spectral dimension:

$$H_{x,y}^{i,j} = F\left( b_{x,y} + \sum_{m} \sum_{p=0}^{P_x - 1} \sum_{q=0}^{Q_x - 1} w_{x,y,m}^{p,q}\, H_{x-1,m}^{i+p,\, j+q} \right) \tag{2}$$

The activation function is the rectified linear unit (ReLU), defined as:

$$F(z) = \max(0, z) \tag{3}$$

Furthermore, the Sigmoid activation function can be formulated by Equation (4):

$$\sigma(z) = \frac{1}{1 + e^{-z}} \tag{4}$$
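To make the multiscale convolution blocks mentioned above concrete, the sketch below builds a parallel 3D convolution block whose branches use different kernel sizes and whose outputs are concatenated; the specific filter counts and kernel sizes are illustrative assumptions, not the exact HCD-Net settings.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def multiscale_conv3d_block(x, filters=8):
    """Parallel 3D convolutions with different kernel sizes, concatenated.

    Applying several kernel scales side by side makes the extracted
    spectral-spatial features less sensitive to object size.
    """
    branches = []
    for k in [(1, 1, 3), (3, 3, 5), (3, 3, 7)]:   # illustrative kernel sizes
        branches.append(layers.Conv3D(filters, k, padding="same",
                                      activation="relu",
                                      kernel_initializer="he_normal")(x))
    return layers.Concatenate()(branches)

# Example: apply the block to a 7x7x154 differenced patch.
inp = layers.Input(shape=(7, 7, 154, 1))
out = multiscale_conv3d_block(inp)
Model(inp, out).summary()
```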
2.5. Model Parameters Optimization
The parameters of the HCD-Net are optimized using backpropagation. The model parameters are iteratively tuned by an optimizer based on the loss value. To achieve this, the model is trained on the training sample data, with the parameters initialized by the He-Normal method. The network error is then calculated using a loss function on the validation dataset, and the optimizer updates the network parameters based on the feedback from the loss value. The Adam optimization algorithm is utilized to tune the parameters of the model [27,47]. Furthermore, the cost function used is binary cross-entropy, defined as:

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

where $y_i$ is the label (true value), $\hat{y}_i$ is the predicted probability, and $N$ is the number of samples.
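A minimal Keras sketch of this training setup is shown below. The stand-in model, the random data, the learning rate, batch size, and epoch count are all illustrative assumptions; only the He-Normal initialization, Adam optimizer, and binary cross-entropy loss come from the description above.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Stand-in for the dual-stream network of Section 2.2, plus random stand-ins
# for the prepared patch data (shapes are illustrative).
model = tf.keras.Sequential([
    layers.Input(shape=(7, 7, 154)),
    layers.Conv2D(16, 3, padding="same", activation="relu",
                  kernel_initializer="he_normal"),   # He-Normal initialization
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),
])
X_train = np.random.rand(256, 7, 7, 154).astype(np.float32)
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 2, 256), 2)
X_val = np.random.rand(64, 7, 7, 154).astype(np.float32)
y_val = tf.keras.utils.to_categorical(np.random.randint(0, 2, 64), 2)

# Adam optimizer and binary cross-entropy loss, as described for HCD-Net.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy",
              metrics=["accuracy"])

# The held-out validation set is used to monitor the loss during training.
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          batch_size=64, epochs=5)  # illustrative batch size / epochs
```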
2.6. Accuracy Assessment and Comparison with Other Methods
The primary purpose of this step is to evaluate the results obtained by comparing them with the reference dataset. To achieve this, several comparison indices, including Overall Accuracy (OA), Kappa Coefficient (KC), F1-Score, Recall, Precision, and Balanced Accuracy (BA), were used for accuracy assessment. The results were also compared with four state-of-the-art methods: the 2D Siamese Network, the 3D Siamese Network, the General End-to-end Two-dimensional CNN Framework (GETNET), and Iteratively Reweighted Multivariate Alteration Detection (IR-MAD) combined with a Support Vector Machine (SVM) [48,49,50]. The 2D Siamese Network has two deep feature extraction channels, with the first channel analyzing the pre-change hyperspectral dataset and the second channel focusing on the post-change hyperspectral dataset. The 3D Siamese Network has a similar architecture but utilizes 3D convolution layers.
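These indices can be computed from the predicted and reference change maps as in the following sketch, which uses scikit-learn implementations; the array names and the random example data are placeholders.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, recall_score,
                             balanced_accuracy_score)

def evaluate_change_map(y_true, y_pred):
    """Compute OA, KC, F1-Score, Recall, Precision, and BA for a binary change map.

    y_true, y_pred: 1D arrays of 0 (no-change) / 1 (change) labels.
    """
    return {
        "OA": accuracy_score(y_true, y_pred),
        "KC": cohen_kappa_score(y_true, y_pred),
        "F1-Score": f1_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "BA": balanced_accuracy_score(y_true, y_pred),
    }

# Example with random stand-ins for reference and predicted maps.
rng = np.random.default_rng(1)
ref = rng.integers(0, 2, 10000)
pred = rng.integers(0, 2, 10000)
print(evaluate_change_map(ref, pred))
```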
3. Case Study
The HCD-Net is evaluated against various types of HCD methods using three bi-temporal hyperspectral datasets. The quality of the ground truth data is a critical factor in such evaluations; hence, these datasets were selected primarily because reference (ground control) data are available for them, and they are widely utilized in HCD studies such as Refs. [48,51]. In these studies, the ground control datasets were created through visual inspection.
The first dataset was acquired near Yuncheng, Jiangsu province, China, on 3 May 2006 and 23 April 2007. Soil, river, trees, buildings, roads, and agricultural fields are the main characteristics of this location. The second dataset was collected over irrigated agricultural fields in Hermiston, a city in Umatilla County, OR, USA, on 1 May 2004 and 8 May 2007. The China and USA datasets were acquired by the Hyperion sensor and are available at https://rslab.ut.ac.ir/data (accessed on 25 February 2024) (Figure 4). The third dataset is a product of the AVIRIS sensor, taken in 2013 and 2015 over the Bay Area surrounding the city of Patterson, California, and is available at https://citius.usc.es/investigacion/datasets/hyperspectral-change-detection-dataset (accessed on 25 February 2024). Its land cover includes soil, irrigated fields, rivers, human constructions, cultivated land, and grassland. In all datasets, the changes are associated with land cover types and water bodies. The details of the datasets and the sample data for the three case study areas are given in Table 1 and illustrated in Figure 4.
For a fair comparison, 5% of the samples from the USA and China datasets and 1% of the samples from the Bay Area ground reference data were used for training the network. The sample data are divided into three parts: (1) training data (72%), (2) validation data (18%), and (3) test data (10%).
Figure 5 displays the pixels used for training, validation, and testing for each scene. Green represents no-change pixels and red represents change pixels for both training and validation. Test pixels are shown in beige.
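A sketch of this sampling scheme is given below; it assumes the labeled pixels are available as flat arrays and uses scikit-learn's stratified splitting (stratification is our assumption, not stated in the text), with random stand-in data for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def sample_and_split(features, labels, sample_fraction=0.05, seed=0):
    """Draw a labeled sample (e.g., 5% for the USA/China scenes, 1% for the
    Bay Area) and split it 72% / 18% / 10% into train / validation / test."""
    # Draw the labeled sample from the ground reference data.
    X_s, _, y_s, _ = train_test_split(features, labels,
                                      train_size=sample_fraction,
                                      stratify=labels, random_state=seed)
    # First carve off the 10% test portion, then split the rest 80/20,
    # which yields 72% training and 18% validation of the drawn sample.
    X_tmp, X_test, y_tmp, y_test = train_test_split(X_s, y_s, test_size=0.10,
                                                    stratify=y_s, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.20,
                                                      stratify=y_tmp, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

# Example with random stand-ins for per-pixel difference features and labels.
X = np.random.rand(20000, 154).astype(np.float32)
y = np.random.randint(0, 2, 20000)
train, val, test = sample_and_split(X, y)
print(len(train[0]), len(val[0]), len(test[0]))
```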
5. Discussion
The results indicate that HSI has a high potential for CD purposes. For example, the OAs of all HCD methods exceeded 90%. According to the visual and statistical inspections in Figure 6, Figure 7 and Figure 8 and Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7, the various methods provided different results for the study areas. However, the HCD-Net provided robust results in all three case study areas. Compared to the other methods, the variation in the KC values of the HCD-Net is low. The mean KC value for the USA dataset is lower than the mean KCs of the other datasets; thus, the complexity of the classes may affect the HCD results. While this issue is negligible in the HCD-Net results, the other methods are highly dependent on the complexity of the dataset.
Table 8 presents the average OAs and BAs over the three study areas for HCD-Net and all other methods (GETNET, 3D-Siamese, and 2D-Siamese). The HCD-Net demonstrated promising results (more than 97%) for the three study areas, whereas the performance of the other HCD methods was not satisfactory across all study areas. The BA index evaluates the performance of CD in both the change and no-change classes. Because the datasets are imbalanced, most HCD algorithms favor either the change or the no-change class and therefore lose performance when both classes are considered. Based on Table 8, the HCD-Net had the highest performance, confirming the robustness of the HCD-Net in HCD. There is a trade-off between detecting change and no-change pixels in HCD, and an accurate HCD method should provide robust, high performance in detecting both classes. The HCD-Net could detect change and no-change pixels with a high level of accuracy, whereas the other methods had lower performance in detecting both classes.
The visual results (see Figure 6a, Figure 7a and Figure 8a) illustrate HCD-Net's superior performance in HCD compared to the IR-MAD SVM, showcasing its enhanced capability to accurately identify changes across diverse landscapes. The comparison between IR-MAD SVM (a non-CNN-based CD algorithm) and HCD-Net across the China, USA, and Bay Area datasets further reveals HCD-Net's superior performance in all evaluated metrics, including OA, Precision, Recall, F1-Score, BA, and KC (Table 2, Table 4 and Table 6). HCD-Net consistently outperforms IR-MAD SVM, demonstrating its effectiveness in hyperspectral change detection. These results underscore HCD-Net's advanced capability to accurately identify changes within diverse landscapes, benefiting from its DL architecture and attention mechanisms, which enhance its sensitivity to subtle spectral–spatial features not as effectively captured by IR-MAD SVM.
Many HCD methods tend to overlook spatial features and primarily concentrate on spectral features. Moreover, even when some methods do incorporate spatial features, they often fail to utilize optimized features. The optimization of features can be a daunting and time-consuming task, leading to potential shortcomings in HCD by traditional and other state-of-the-art methods. The HCD-Net, on the other hand, automatically extracts deep features encompassing both spatial and spectral aspects. Consequently, when conducting HCD with the HCD-Net, the results are notably accurate and reliable, especially in large-scale areas where ground truth verification is challenging. Additionally, the HCD-Net incorporates a multi-dimensional convolution layer, thereby enhancing HCD performance. Unlike many other HCD methods that focus solely on either 2D or 3D convolution layers, the HCD-Net leverages both 3D and 2D convolution layers. Furthermore, the integration of 3D/2D SE blocks facilitates the extraction of informative deep features.
Several HCD methods, such as GETNET, necessitate additional processing steps like dimensionality reduction and spectral unmixing. Given the high dimensionality of hyperspectral datasets, such preprocessing poses a significant challenge. In contrast, the HCD-Net does not require any additional processing for HCD, unlike the GETNET algorithm, which relies on spectral unmixing prior to CD.
To evaluate the effectiveness of each part of the HCD-Net framework, we conducted ablation studies for the China dataset. These studies examined the impact of removing or altering specific components of the model. We focused on four scenarios: (1) the model without the 3D convolutional channel (S#1), (2) the model without the 2D convolutional channel (S#2), (3) the model without the attention modules (S#3), and (4) the full HCD-Net model with all components (S#4).
The results, shown in Table 9, highlight the importance of the attention modules. The performance drop in scenario S#3 demonstrates their role in improving the model's focus and feature representation, which significantly enhances detection accuracy. Additionally, the comparison between S#1 and S#2 indicates that the 3D convolutional channel has a more significant impact on the model's performance than the 2D channel. This suggests that the 3D channel is crucial for capturing complex spectral–spatial relationships in hyperspectral data, which is essential for accurate change detection.
Table 10 shows a comparison of computational times for different models, including HCD-Net. With an execution time of 465.21 s, HCD-Net is faster than the 3D-Siamese model but not as quick as the 2D-Siamese or the IR-MAD SVM, the latter being the fastest. This comparison highlights HCD-Net’s effective balance of computational speed and high precision in change detection, making it a strong choice for scenarios where detailed accuracy is more critical than the fastest processing time.