1. Introduction
InSAR (interferometric synthetic aperture radar) stands as a groundbreaking technology in the fields of surveying and remote sensing and represents a substantial innovation in synthetic aperture radar technology. InSAR acquires interferograms by analyzing the interference between two strongly coherent SAR images of the same geographical region. However, accurately recovering the true phase from the interferometric phase is difficult: the two-dimensional phase unwrapping (2-D PU) problem is inherently ill-posed, admitting infinitely many solutions. To obtain a unique solution, the 2-D PU process relies on the Itoh condition [1]. This condition requires the phase difference between adjacent pixels to be less than $\pi$ in absolute value [2], enabling the estimation of the true phase differences. In practical scenarios, however, phase noise and the roughness of the true phase violate the assumption of phase continuity, making 2-D PU a formidable problem. It is therefore crucial to develop robust techniques capable of handling phase noise and abrupt phase variations in order to advance InSAR and achieve accurate phase unwrapping in complex environments.
Traditional phase unwrapping algorithms are usually classified into three categories. (1) Path-following methods [3,4] improve the accuracy of the unwrapped phase by choosing appropriate integration paths that limit error propagation. The branch-cut (BC) method pioneered by Goldstein is a typical path-following algorithm. However, in regions with high noise levels or dense branch cuts, the BC method may fail to unwrap or may leave isolated islands that cannot be unwrapped. To address these limitations, researchers subsequently proposed quality-guided (QG) algorithms, which achieve high accuracy in areas with high data quality. (2) Optimization-based approaches [5,6] formulate an objective function that minimizes the difference between the true phase gradient and the computed phase gradient. (3) Integrated denoising and unwrapping methods improve the accuracy of phase unwrapping by combining the denoising and unwrapping processes. For example, Bayesian algorithms cast phase unwrapping as a state-estimation problem and perform noise suppression and phase unwrapping simultaneously. These include the extended Kalman filtering phase unwrapping algorithm [7], the unscented Kalman filtering phase unwrapping algorithm, the iterated unscented Kalman filtering phase unwrapping method, an unscented Kalman filtering method with an inserted fading factor [8,9], and so on. Integrated denoising and unwrapping methods cope with nonlinear phase differences and noise, but linearizing the nonlinear system can discard high-order phase information, reducing the accuracy of phase unwrapping [10,11]. In addition, their high computational cost makes them unsuitable for real-time applications. In areas with high phase gradients and severe noise, these algorithms are prone to unwrapping failure; since PU errors propagate throughout the image, the effect of a local failure spreads to other regions, further degrading the unwrapping accuracy.
Over the past few years, researchers have explored deep learning techniques for processing radar data. Wang [12] describes a one-step PU method based on a convolutional neural network (CNN) that predicts the unwrapped phase directly from the wrapped interferogram without an intermediate step. However, the downsampling and upsampling operations within this framework risk information loss, which affects the accuracy of the unwrapped phase. For the PU problem, Zhang [13] proposed a classification network that predicts the wrapped count of each pixel in the wrapped interferogram: the wrapped count is computed through a classification task, but since the set of wrapped-count categories is finite, the network becomes inaccurate once this range is exceeded. Other works transform the PU problem into a pixel classification task by training a semantic segmentation network to predict the category of each pixel and thus obtain the PU result. For example, Zhou's [14] approach treats the PU problem as a segmentation task in order to recover the true phase from the estimated phase gradients. Sica [15,16] proposed a new network that estimates the phase gradients in both directions and reconstructs the entire unwrapped map using the L2-norm. Moreover, in [17], Wu proposed PUNet, a network with regression properties; however, a regression network cannot discriminate phase differences that are integer multiples of $2\pi$, and it requires a large amount of training data and computational resources. Although Wu's method can obtain phase unwrapping results from wrapped interferograms, it is only applicable to small-area interferograms, and its unwrapping results are inaccurate for large-scale interferograms. Compared with traditional PU methods, these approaches achieve state-of-the-art performance and demonstrate that deep learning is a feasible route for unwrapping InSAR data. Zhou [18] emphasized the critical need for advancements in AI-based PU technology to address the complexities of real-world environments effectively. By providing insights into the developmental trajectory of AI-driven PU, Zhou's work laid a solid foundation for future endeavors in this area. Nevertheless, more accurate phase unwrapping methods are still needed for real-world applications.
Research has shown that the U-Net architecture achieves multi-level feature extraction and information propagation through skip connections, helping to capture both local and global features effectively. However, when dealing with high-resolution images, the information bottleneck between the encoder and decoder of U-Net may lead to the loss of fine details. Additionally, as the network depth increases, U-Net may encounter gradient vanishing or exploding, making training difficult. To address these challenges, this study introduces ResNet together with two attention mechanisms, GAU and FPA, into a U-Net-based network framework. The residual connections of ResNet alleviate the gradient vanishing issue, speeding up training and enabling the network to learn complex features more effectively as depth increases. Moreover, the incorporation of GAU and FPA helps mitigate the loss of feature information at module connections [19,20,21,22]. GAU guides the network to focus on global information, while FPA helps the network capture features at different scales, enhancing the robustness of phase unwrapping. Consequently, we propose a robust dual-attention network for 2-D PU, named ResDANet, in this paper. ResDANet is designed to estimate the phase gradient information of interferograms. In particular, its deep architecture learns phase gradients from a large dataset of training images with varying noise levels and terrain features, which allows ResDANet to discern the correct phase gradient pattern without relying on the assumption of phase continuity. To this end, we design and train ResDANet on a variety of simulated datasets with different noise levels and terrain characteristics and then deploy it to predict phase gradients. We subsequently employ an L1-norm objective function to compute the final PU result, minimizing the disparity between the wrapped phase gradient and the gradient estimated by ResDANet. The accuracy of the phase gradient information obtained by ResDANet is improved, and the final unwrapping performance is measured by the root mean square error (RMSE) and the unwrapping time. Although the reductions in RMSE and unwrapping time vary across test data, ResDANet's unwrapping performance is overall superior to that of traditional 2-D PU methods. Furthermore, ResDANet remains robust under challenging conditions such as severe noise and diverse terrain.
2. Principles and Related Work
We begin by offering a comprehensive introduction to the fundamental principles underlying traditional 2-D PU methods. PU is the process of restoring unambiguous phase values from a set of 2-D principal phase values that are known only modulo $2\pi$ rad [1]. However, the exact wrapped count $k$ is an unknown integer, making its determination a vital objective for obtaining an accurate unwrapped phase.
The PU process can be expressed as:

$$\varphi(s) = \psi(s) + 2\pi k(s), \quad (1)$$

where $s$ denotes the pixel position, $\varphi(s)$ is the true phase, $\psi(s)$ is the wrapped phase, and $k(s)$ is the unknown ambiguity number of pixel $s$, which is an integer. From (1), it can be seen that the true phase can be obtained by adding an integer multiple of $2\pi$ to each wrapped phase value. However, since the PU problem admits an infinite number of solutions, a unique solution must be determined. If the phase difference of the true phase between adjacent pixels is less than $\pi$ in absolute value, the true phase can be obtained by integrating the wrapped differences of the wrapped phase.
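For concreteness, the following minimal NumPy sketch illustrates (1) and the Itoh-condition integration in one dimension; the variable names and the simulated signal are illustrative and are not taken from the paper.

```python
import numpy as np

# Minimal 1-D illustration of Eq. (1) and the Itoh condition (names are ours).
rng = np.random.default_rng(0)

true_phase = np.cumsum(rng.uniform(-0.9 * np.pi, 0.9 * np.pi, size=200))  # |delta phi| < pi
wrapped = np.angle(np.exp(1j * true_phase))                               # psi = W{phi} in (-pi, pi]

# Itoh integration: re-wrap the wrapped-phase differences and integrate them.
dpsi = np.diff(wrapped)
dphi = np.angle(np.exp(1j * dpsi))            # wrapped gradient equals true gradient when |delta phi| < pi
unwrapped = wrapped[0] + np.concatenate(([0.0], np.cumsum(dphi)))

k = np.round((unwrapped - wrapped) / (2 * np.pi))        # integer ambiguity numbers of Eq. (1)
print(np.allclose(unwrapped, wrapped + 2 * np.pi * k))   # True
print(np.allclose(unwrapped, true_phase))                # True: Itoh condition holds for this signal
```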
The phase ambiguity gradient is defined in (2):

$$\Delta k(s) = k(s) - k(s-1) = \mathrm{round}\!\left(\frac{\left[\varphi(s)-\varphi(s-1)\right]-\left[\psi(s)-\psi(s-1)\right]}{2\pi}\right), \quad (2)$$

where $k(s)$ and $k(s-1)$ represent the phase ambiguities of adjacent pixels, and $\mathrm{round}(\cdot)$ denotes rounding to the nearest integer. For the 2-D PU problem, ambiguity gradients exist in both the range and azimuth directions. When the assumption of phase continuity holds at every position in the wrapped phase, the phase ambiguity gradient can be obtained according to (3):

$$\Delta k(s) = \mathrm{round}\!\left(\frac{-\left[\psi(s)-\psi(s-1)\right]}{2\pi}\right), \quad (3)$$

where $\psi$ is the wrapped phase value. However, due to factors such as noise and sudden terrain changes, the assumption of phase continuity is violated, so the ambiguity gradient estimated from (3) no longer equals the true ambiguity gradient. Regardless of the optimization algorithm employed, the accuracy of PU heavily relies on the phase ambiguity gradient. In scenarios characterized by abrupt terrain changes and severe noise, conventional PU approaches based on the assumption of phase continuity may therefore yield relatively low accuracy. In this study, a two-stage approach to phase unwrapping is proposed. In the first stage, ResDANet predicts the phase ambiguity gradients in both the range and azimuth directions. In the second stage, the ResDANet outputs are refined using the L1-norm, which improves the unwrapping accuracy for the minority of pixels that the network misclassifies and prevents the potential degradation such misclassifications could cause.
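The relation in (3) can be checked with a short sketch; note that in the proposed method these gradients are predicted by ResDANet rather than computed this way, and the function name below is ours.

```python
import numpy as np

def ambiguity_gradient(wrapped, axis):
    """Phase-ambiguity gradient under the phase-continuity assumption, as in (3);
    axis=0 gives one direction (e.g. azimuth), axis=1 the other (e.g. range).
    This is only a reference implementation; the paper predicts these values with ResDANet."""
    dpsi = np.diff(wrapped, axis=axis)            # wrapped-phase gradient psi(s) - psi(s-1)
    return np.round(-dpsi / (2 * np.pi)).astype(int)

# Example: the gradient is nonzero only where the wrapped phase jumps across the +/- pi boundary.
wrapped = np.angle(np.exp(1j * np.outer(np.linspace(0, 8 * np.pi, 64), np.ones(64))))
print(np.unique(ambiguity_gradient(wrapped, axis=0)))   # [0 1]
```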
3. Training Strategy and Network Structure
3.1. Training Dataset Generation
In the context of sudden terrain changes and severe noise, where reliable PU is challenging, it is impractical to gather a sufficient amount of ground measurement data for training. As a result, synthetic interferograms that closely mimic real-world features must be generated prior to training. This article describes three types of simulated data designed for such scenarios; the networks trained separately on them all yield favorable unwrapping outcomes.
3.1.1. Digital Elevation Inversion
This article uses digital elevation model (DEM) inversion to obtain the true phase information. The relationship between the true phase and the altitude is shown in (4) [23]:

$$\varphi_{i,j} = \frac{4\pi B_{\perp} h_{i,j}\cos\theta}{\lambda H \sin\theta}, \quad (4)$$

where $\varphi_{i,j}$ represents the true phase value of the pixel in the $i$-th row and $j$-th column; $\lambda$ is the wavelength of the synthetic aperture radar; $B_{\perp}$ is the effective vertical baseline length; $h_{i,j}$ is the corresponding altitude, which can be obtained from the DEM; $H$ is the orbital altitude of the satellite; and $\theta$ is the incidence angle of the radar waves illuminating the ground. We use the parameters of the TanDEM-X onboard synthetic aperture radar to generate the true phase and then obtain the wrapped phase through wrapping. To bring the data closer to real conditions, random noise is added.
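A rough sketch of this simulation step is given below; the height-to-phase relation and all sensor parameter values in it are illustrative assumptions rather than the exact configuration of Eq. (4) or the TanDEM-X parameters used in the paper.

```python
import numpy as np

# Sketch of DEM-based interferogram simulation; all parameter values are illustrative assumptions.
wavelength = 0.031           # m, X-band (assumed)
b_perp     = 200.0           # m, effective perpendicular baseline (assumed)
H          = 514e3           # m, orbital altitude (assumed)
theta      = np.deg2rad(35)  # incidence angle (assumed)

def dem_to_phase(height):
    slant_range = H / np.cos(theta)          # flat-Earth approximation of the slant range
    return 4 * np.pi * b_perp * height / (wavelength * slant_range * np.sin(theta))

rows, cols = np.mgrid[0:256, 0:256]
height = 500.0 * np.exp(-((rows - 128) ** 2 + (cols - 128) ** 2) / (2 * 60.0 ** 2))  # toy DEM (m)

true_phase = dem_to_phase(height)
noisy = true_phase + np.random.normal(0.0, 0.4, true_phase.shape)   # additive phase noise
wrapped = np.angle(np.exp(1j * noisy))                              # training input
```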
3.1.2. Random Sine and Cosine Function Superposition
This method generates a three-dimensional surface by superimposing N sine and cosine functions (N is a random number), where the frequency and phase of each sine and cosine term are random. The generated surface serves as the true phase and allows good control over the amplitude, frequency, and phase of the generated terrain. Stacking random sine and cosine functions produces terrain with a known structure, which facilitates understanding and verifying the performance of phase unwrapping algorithms. After generating the true phase, the wrapped phase is produced by wrapping and adding random noise, yielding the training dataset. The superposition of random sine and cosine functions is given in (5):

$$\varphi(x, y) = \sum_{n=1}^{N} \left[ a_n \sin\left(2\pi f_n x + \alpha_n\right) + b_n \cos\left(2\pi g_n y + \beta_n\right) \right], \quad (5)$$

where $a_n$ and $b_n$ are the amplitudes of the sine and cosine functions, respectively; $f_n$ and $g_n$ are their frequencies; and $\alpha_n$ and $\beta_n$ are their phases.
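The following sketch shows one plausible realization of this superposition; the parameter ranges, noise level, and grid size are illustrative choices, not the paper's settings.

```python
import numpy as np

# Illustrative realization of the random sine/cosine superposition in (5).
rng = np.random.default_rng(1)
x, y = np.meshgrid(np.linspace(0, 1, 256), np.linspace(0, 1, 256))

def random_surface(n_terms):
    phase = np.zeros_like(x)
    for _ in range(n_terms):
        a, b = rng.uniform(1, 10, size=2)            # amplitudes (assumed range)
        f, g = rng.uniform(0.5, 4.0, size=2)         # frequencies (assumed range)
        alpha, beta = rng.uniform(0, 2 * np.pi, 2)   # phase offsets
        phase += a * np.sin(2 * np.pi * f * x + alpha) + b * np.cos(2 * np.pi * g * y + beta)
    return phase

true_phase = random_surface(n_terms=rng.integers(3, 8))
wrapped = np.angle(np.exp(1j * (true_phase + rng.normal(0, 0.3, true_phase.shape))))
```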
3.1.3. Distorted 2-D Elliptical Gaussian Surface
A distorted 2-D Gaussian surface is similar to bell-shaped mining subsidence terrain [17], and local regions of a large distorted surface resemble the deformation caused by slope subsidence. We therefore use it to simulate local mining subsidence and slope subsidence. Simulation signals with different patterns and deformation intensities can be generated by adjusting the parameters. The 2-D elliptical Gaussian function can be expressed as (6):

$$g(X) = \exp\!\left(-\frac{1}{2}\left(X-u\right)\Sigma^{-1}\left(X-u\right)^{\mathrm{T}}\right), \quad (6)$$

where $X = (x_1, x_2)$ represents the 2-D grid of a training sample, and $u = (u_1, u_2)$ controls the position of the deformation center. The covariance matrix is expressed as:

$$\Sigma = s\, U D U^{\mathrm{T}}, \quad (7)$$

where the shape and size of the deformation area are jointly regulated by a 2-D random diagonal matrix $D$, the orthogonal basis $U$ of another 2-D random matrix, and a scaling factor $s$.
This article simulates deformation caused by multiple factors by randomly adjusting the deformation position $u$ and the deformation intensity $\Sigma$. To account for phase changes caused by atmospheric turbulence, fractal Perlin noise is obtained by superimposing Perlin noise of different frequencies and amplitudes to simulate the impact of atmospheric turbulence on the true phase. Finally, the true phase and wrapped phase obtained by combining the 2-D Gaussian surface deformation with the turbulence are used as the training dataset.
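A simplified sketch of the elliptical-Gaussian deformation generator is shown below; the amplitude and parameter ranges are illustrative, and the fractal Perlin-noise turbulence term is omitted for brevity.

```python
import numpy as np

# Illustrative generator for the distorted 2-D elliptical Gaussian deformation of (6)-(7).
rng = np.random.default_rng(2)
grid = np.stack(np.meshgrid(np.arange(256), np.arange(256)), axis=-1).astype(float)  # X = (x1, x2)

u = rng.uniform(64, 192, size=2)                 # deformation center
D = np.diag(rng.uniform(200, 1200, size=2))      # random diagonal matrix
U, _ = np.linalg.qr(rng.normal(size=(2, 2)))     # orthogonal basis of a random 2-D matrix
s = rng.uniform(0.5, 2.0)                        # scaling factor
sigma = s * U @ D @ U.T                          # covariance controlling shape and size

diff = grid - u
mahal = np.einsum('...i,ij,...j->...', diff, np.linalg.inv(sigma), diff)
deformation = rng.uniform(10, 40) * np.exp(-0.5 * mahal)   # deformation phase (amplitude assumed)
wrapped = np.angle(np.exp(1j * deformation))
```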
This article divides the above three types of simulated data into two datasets, named dataset1 and dataset2, to train ResDANet separately. Dataset1 is composed of samples generated by the DEM inversion method and the random trigonometric function method in a 1:1 ratio. The simulated data represent terrain features such as mountains, valleys, and plains and capture the complexity of geological structures, the randomness of land use, and irregular changes in surface processes. Dataset2 simulates various irregular shapes and deformations, including the terrain changes that occur during mining. It is commonly used for testing and evaluating algorithms to ensure that they exhibit good robustness and accuracy when handling various distortions and deformations.
3.2. Proposed Network Model Structures
While the shape of the phase ambiguity gradient resembles the boundaries of interference fringes, calculating it differs significantly from typical image segmentation tasks. One crucial distinction lies in the precision required: nonzero gradients must be located and categorized with high accuracy, because a single error can propagate along an entire row or column. This article uses downsampling to extract features, followed by upsampling to restore resolution, and combines FPA and GAU to calculate the phase ambiguity gradient. To mitigate network degradation in deep neural networks, the downsampling path employs a residual structure composed of multiple cascaded residual blocks.
This article employs an FPA [22] structure to collect spatial information at different scales. Figure 1 illustrates the FPA network structure. In order to capture features of different sizes at different scales, we use $7 \times 7$, $5 \times 5$, and $3 \times 3$ convolutions in the pyramid structure. The $7 \times 7$, $5 \times 5$, and $3 \times 3$ convolution kernels help to capture phase changes over large, medium, and small ranges, respectively. These different kernels enable the network to capture surface features more comprehensively and thus perform the phase unwrapping task more effectively. The FPA structure continuously fuses information of different sizes and then applies pixel-level attention to the feature map, improving the accuracy with which the network computes the phase ambiguity gradient. The attended feature map is added to the output of the global average pooling branch to form the final output, further improving the performance of the FPA module.
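A simplified PyTorch sketch of an FPA-style module is given below; the channel counts, exact layer arrangement, and normalization details of the module in Figure 1 may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPASketch(nn.Module):
    """Simplified FPA-style module: a 7x7/5x5/3x3 pyramid, pixel-level attention on the
    main branch, and a global-average-pooling branch. Channel counts are illustrative."""
    def __init__(self, channels):
        super().__init__()
        self.down7 = nn.Conv2d(channels, channels, 7, stride=2, padding=3)  # large-scale changes
        self.down5 = nn.Conv2d(channels, channels, 5, stride=2, padding=2)  # medium-scale changes
        self.down3 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)  # small-scale changes
        self.mid = nn.Conv2d(channels, channels, 1)
        self.gap = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1))

    def forward(self, x):
        p7 = self.down7(x)
        p5 = self.down5(p7)
        p3 = self.down3(p5)
        up = F.interpolate(p3, size=p5.shape[-2:]) + p5       # fuse scales bottom-up
        up = F.interpolate(up, size=p7.shape[-2:]) + p7
        att = F.interpolate(up, size=x.shape[-2:])            # pixel-level attention map
        return self.mid(x) * att + self.gap(x)                # attended features + global branch

feats = torch.randn(1, 64, 32, 32)
print(FPASketch(64)(feats).shape)   # torch.Size([1, 64, 32, 32])
```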
The GAU [19] module introduces a mechanism that assigns weights to one feature based on another input feature, enabling the network to focus on crucial information. The network structure of the GAU module, depicted in Figure 2, involves several key steps. First, the low-level features undergo a $3 \times 3$ convolution to extract further features. Next, the high-level features are globally pooled and passed through a $1 \times 1$ convolution to adjust the number of channels, ensuring that the high-level and low-level features possess an equal number of channels. The low-level features are then multiplied channel-wise by these pooled high-level features to achieve the weighting operation. Finally, the network upsamples the high-level features and adds them to the weighted low-level features. In the application considered in this article, the GAU module improves, to some extent, the accuracy with which the network computes the phase ambiguity gradient.
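A corresponding sketch of a GAU-style module follows; again, the exact layer configuration used in Figure 2 may differ from this simplified version.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GAUSketch(nn.Module):
    """Simplified GAU-style module: low-level features pass through a 3x3 convolution and are
    weighted by a global descriptor of the high-level features (global pooling + 1x1 convolution);
    the upsampled high-level features are then added. Details are illustrative."""
    def __init__(self, low_ch, high_ch):
        super().__init__()
        self.conv3 = nn.Sequential(nn.Conv2d(low_ch, low_ch, 3, padding=1),
                                   nn.BatchNorm2d(low_ch), nn.ReLU(inplace=True))
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(high_ch, low_ch, 1), nn.Sigmoid())
        self.up = nn.Conv2d(high_ch, low_ch, 1)

    def forward(self, low, high):
        low = self.conv3(low)                                      # refine low-level features
        low = low * self.gate(high)                                # channel weighting from the high level
        high = F.interpolate(self.up(high), size=low.shape[-2:])   # match channels and spatial size
        return low + high

low = torch.randn(1, 64, 64, 64)
high = torch.randn(1, 128, 32, 32)
print(GAUSketch(64, 128)(low, high).shape)   # torch.Size([1, 64, 64, 64])
```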
Residual neural networks can better capture complex features and patterns, thereby improving model accuracy [24]. Residual connections allow information to skip directly between layers, preserving the original feature information and preventing gradients from decaying rapidly during propagation, which helps train deep networks. This article adopts a residual structure to avoid network degradation as the network used to calculate the ambiguity gradient deepens, and it cascades multiple residual blocks to complete the computation. The residual block structure is shown in Figure 3: it is mainly composed of two $3 \times 3$ convolutions, whose output is added to the shallow features passed through a $1 \times 1$ convolution on the shortcut path; the $1 \times 1$ convolution is mainly used to adjust the number of network channels.
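The basic residual block described above can be sketched as follows; the batch normalization and activation placement are assumptions based on the standard ResNet design.

```python
import torch
import torch.nn as nn

class ResBlockSketch(nn.Module):
    """Basic residual block: two 3x3 convolutions, with a 1x1 convolution on the
    shortcut to adjust the channel count when it changes."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1), nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch))
        self.shortcut = (nn.Identity() if in_ch == out_ch and stride == 1
                         else nn.Conv2d(in_ch, out_ch, 1, stride=stride))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))   # shallow features added back

print(ResBlockSketch(64, 128, stride=2)(torch.randn(1, 64, 64, 64)).shape)  # [1, 128, 32, 32]
```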
The ResDANet structure, which combines the residual structure, FPA, and GAU, is shown in Figure 4. The feature maps of the network's downsampling and upsampling modules are of the same size, so feature maps can be passed into the corresponding upsampling block without cropping, avoiding the feature loss that cropping would cause. The bottleneck connecting downsampling and upsampling uses the FPA to provide pixel-level attention for the network. The input received by each upsampling layer comes from the output of the previous upsampling layer, the output of the GAU module, and the output of the corresponding residual block. Through the GAU mechanism, the network uses deep features to add attention to shallow features, improving the quality of the features passed into the upsampling blocks to a certain extent, and it implements a skip structure similar to that of U-Net [15] to enhance feature fusion. This approach combines the learning capability of U-Net, the deep feature extraction of ResNet, and the GAU and FPA attention mechanisms, potentially offering superior feature learning, accuracy, generalization, and robustness in phase unwrapping tasks compared to single structures or traditional methods.
3.3. Training Process
We employed a variable learning rate strategy for training, starting with a smaller learning rate in the initial phase to ensure the correct parameterization of the network and then increasing it to the maximum learning rate after a certain number of steps. The maximum learning rate was , and the minimum was . We trained ResDANet twice, once on dataset1 and once on dataset2 as described earlier, obtaining different sets of weights. Each training run used a dataset of 2000 samples, and the size of the samples was . The computer configuration for training was CPU: Intel Core i5-9400F 2.90 GHz, RAM: 64 GB (2666 MHz), GPU: NVIDIA GeForce RTX 2060 SUPER (8192 MB).
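A warm-up learning-rate schedule of this kind can be expressed, for example, as follows; since the exact maximum and minimum learning rates and the number of warm-up steps are not reproduced here, the values below are placeholders.

```python
import torch

# Linear warm-up then hold, as a sketch of the variable learning-rate strategy.
# LR_MAX, LR_MIN, and WARMUP_STEPS are hypothetical placeholders, not the paper's values.
LR_MAX, LR_MIN, WARMUP_STEPS = 1e-3, 1e-5, 500

def lr_factor(step):
    """Scale factor relative to LR_MAX: linear warm-up from LR_MIN, then hold at LR_MAX."""
    if step >= WARMUP_STEPS:
        return 1.0
    return (LR_MIN + (LR_MAX - LR_MIN) * step / WARMUP_STEPS) / LR_MAX

model = torch.nn.Conv2d(1, 2, 3)                 # stand-in for ResDANet
optimizer = torch.optim.Adam(model.parameters(), lr=LR_MAX)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_factor)
```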
3.4. Unwrapping Using Phase Gradients from ResDANet
In this stage, the complete unwrapped phase image is computed from the phase ambiguity gradients estimated by ResDANet. However, due to the presence of noise in the wrapped phase and the differences between the models ResDANet uses in the two directions, the obtained phase gradient field may not be curl-free. To solve this problem, ResDANet is combined with a traditional optimization-based unwrapping method. First, the range-direction and azimuth-direction ambiguity gradients are estimated with ResDANet; then, the network outputs are optimized to obtain the unwrapped phase values. The post-processing stage uses the L1-norm [25,26] to find the solution that best approximates the phase gradients computed by ResDANet:

$$\min \sum_{s} c(s)\, \left| \Delta\varphi(s) - \Delta\hat{\varphi}(s) \right|, \quad (8)$$

where $c(s)$ is a weighting factor, $\Delta\varphi(s)$ represents the phase derivatives between adjacent pixels in the true phase map, and $\Delta\hat{\varphi}(s)$ represents the phase derivatives between adjacent pixels as estimated by ResDANet. Equation (8) is the model of the MCF (minimum cost flow) [27], which balances the accuracy and speed of phase unwrapping and can therefore improve the efficiency of phase unwrapping. Finally, we substitute the obtained count $k$ into Formula (1) to obtain the unwrapped phase.
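The sketch below shows the final substitution of the ambiguity numbers into Formula (1) after naively integrating the predicted ambiguity gradients; the L1-norm/MCF refinement of (8) is omitted here and would replace the naive integration in practice.

```python
import numpy as np

# Simplified post-processing sketch: integrate the predicted ambiguity gradients and
# substitute k into Eq. (1). The paper instead refines the gradients with an L1-norm /
# minimum-cost-flow solver before this step; that optimization is omitted here.
def integrate_ambiguity(dk_range, dk_azimuth):
    """dk_range: delta k between columns, shape (H, W-1); dk_azimuth: delta k between rows, shape (H-1, W)."""
    h, w = dk_range.shape[0], dk_azimuth.shape[1]
    k = np.zeros((h, w), dtype=int)
    k[0, 1:] = np.cumsum(dk_range[0, :])                  # first row: integrate along range
    k[1:, :] = k[0, :] + np.cumsum(dk_azimuth, axis=0)    # then down each column
    return k

def unwrap(wrapped, dk_range, dk_azimuth):
    return wrapped + 2 * np.pi * integrate_ambiguity(dk_range, dk_azimuth)

# Usage with "oracle" gradients derived from a known true phase (stand-in for ResDANet output):
true = np.outer(np.linspace(0, 10 * np.pi, 64), np.ones(64))
wrapped = np.angle(np.exp(1j * true))
k_true = np.round((true - wrapped) / (2 * np.pi)).astype(int)
recon = unwrap(wrapped, np.diff(k_true, axis=1), np.diff(k_true, axis=0))
print(np.allclose(recon, true))   # True
```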
4. Experimental Results and Analysis
To demonstrate the performance of ResDANet, we conducted unwrapping tests on 2-D simulated data and real data and compared it with traditional algorithms such as BC [3], QG [28], MCF [27], and LS (least squares) [29] as well as with RUKF [30] and the neural network PUNet [17]. For a quantitative assessment of ResDANet-PU's robustness, the RMSE between the unwrapped phase and the true phase, along with the time required for unwrapping, were employed as evaluation metrics for the simulated data.
For real-data experiments, we used the residual count of the rewrapped phase and the time consumption for unwrapping the phase as evaluation metrics, and we compared them for different unwrapping algorithms.
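For reference, the two evaluation metrics can be implemented as follows; this is our own sketch of the standard definitions, not code from the paper.

```python
import numpy as np

# Evaluation metrics used in Section 4: RMSE against the true phase for simulated data,
# and the residual (residue) count of a rewrapped phase for real data.
def rmse(unwrapped, true_phase):
    return float(np.sqrt(np.mean((unwrapped - true_phase) ** 2)))

def residual_count(wrapped):
    """Count 2x2 loops whose wrapped-gradient circulation is a nonzero multiple of 2*pi."""
    def rewrap(a):
        return np.angle(np.exp(1j * a))                    # re-wrap to (-pi, pi]
    d_row = rewrap(np.diff(wrapped, axis=0))               # vertical wrapped gradients
    d_col = rewrap(np.diff(wrapped, axis=1))               # horizontal wrapped gradients
    loop = d_col[:-1, :] + d_row[:, 1:] - d_col[1:, :] - d_row[:, :-1]
    return int(np.sum(np.abs(np.round(loop / (2 * np.pi))) > 0))
```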
4.1. 2-D Simulation Data Experimental Results
4.1.1. Analysis of the Results of the First Training Set
In this section, we present the PU results of ResDANet trained on dataset1. We conducted comprehensive unwrapping tests on four simulated images selected from dataset1 and compared ResDANet with the traditional phase unwrapping methods BC, QG, LS, and MCF, as well as with RUKF and PUNet. The evaluation criterion for the simulated datasets was the RMSE. As shown in Figure 5, ResDANet produces smooth and clear unwrapping results. Particularly in regions with dense fringes and significant noise, ResDANet exhibits a noticeable advantage, whereas the BC and QG algorithms fail to unwrap some areas. PUNet does not obtain a clear unwrapping result on dataset1. For examples 1 and 2, the RMSEs of RUKF are slightly lower than those of ResDANet, but its unwrapping times are much higher; ResDANet otherwise achieves both smaller RMSE values and shorter computation times. Overall, ResDANet exhibits superior unwrapping efficiency compared to the six other unwrapping methods. Further details can be found in Table 1 and Table 2.
4.1.2. Analysis of the Results of the Second Training Set
In this section, we discuss the PU results of ResDANet trained on dataset2. We conducted unwrapping tests on simulated data from dataset2 and compared ResDANet with traditional phase unwrapping methods, including BC, QG, LS, and MCF, as well as RUKF and PUNet.
Figure 6 illustrates the unwrapping results obtained. It is worth noting that even when the dataset used to train ResDANet is changed, the unwrapping results remain excellent. Particularly for images with sloped phases, as shown in
Figure 6, ResDANet outperforms PUNet and the traditional algorithms in unwrapping quality, exhibiting a lower RMSE. While RUKF achieves unwrapping performance comparable to that of ResDANet, it requires longer unwrapping times. Traditional algorithms, in contrast, show limited robustness when unwrapping different types of data. Overall, ResDANet demonstrates superior unwrapping efficiency compared to the six other unwrapping methods discussed in this article. Furthermore, ResDANet remains robust, producing satisfactory unwrapping results across diverse datasets. Detailed RMSEs and computation times for the unwrapped phase relative to the true phase can be found in
Figure 6 and
Table 3 and
Table 4.
4.1.3. Real-Data Test Results
For the real-data tests, the wrapped phases we used cover part of the Three Gorges region of China and part of an Italian volcano, as shown in Figure 7 [31,32]. As shown in Figure 8, the proposed method produces smooth phase distributions when handling real data of varying sizes and minimizes interference and ambiguity relative to the original phase. Even for large-scale real data, the method exhibits outstanding accuracy and reliability, laying a solid foundation for further analysis and applications. The residual counts of the rewrapped results and the time required for phase unwrapping with the different methods are shown in Table 5 and Table 6, respectively. From Table 5, it can be seen that, for the real data, the residual counts of the phases rewrapped from our results are closest to the residual counts of the original wrapped phases. PUNet performs poorly when unwrapping the real data of the Three Gorges and the Italian volcano, underscoring the robustness of the methodology presented in this article. We also attempted to use BC to compute the unwrapped phase of the Italian volcano, but because the branch cuts in this interferogram are very dense, the BC method cannot complete the unwrapping; its unwrapped result is therefore not shown.
4.2. Ablation Experiments
We conducted two rounds of ablation experiments. The network obtained by removing the FPA module from the ResDANet architecture is designated ResGNet (residual and GAU net), while the network obtained by removing the GAU module is termed ResFNet (residual and FPA net). Our experiments reveal a notable degradation in the unwrapping performance of both ResGNet and ResFNet. When unwrapping data1 and data2 of dataset1, ResGNet and ResFNet exhibit noticeable unresolved patches. Furthermore, when unwrapping data3 and data4, the unwrapped phase is insufficiently smooth; this is particularly evident for data4, for which the unwrapping effectiveness decreases significantly. When unwrapping data2 from dataset2, sloping stripes and small patches appear, resulting in a subpar unwrapping outcome, and for the low-noise data4, unresolved small spots emerge during unwrapping. These ablation results show that removing either the FPA or the GAU module significantly degrades network performance. Conversely, ResDANet consistently delivers clear phase unwrapping results across various data types, fringe densities, and noise levels, often accompanied by lower RMSE values. This underscores the effectiveness of the FPA and GAU modules within ResDANet and shows that they contribute significantly to the overall performance of the architecture.
Figure 9 illustrates the unwrapping outcomes, while detailed RMSE and unwrapping time information can be found in
Table 7 and
Table 8, respectively.
5. Conclusions
This article introduces ResDANet, a novel neural network that estimates phase gradient information without assuming phase continuity. ResDANet combines the learning ability of U-Net, the deep feature extraction ability of ResNet, and the feature extraction abilities of the GAU and FPA attention mechanisms. ResDANet improves the localization and classification accuracy of nonzero phase ambiguity gradients in phase unwrapping tasks and offers better feature learning ability, accuracy, generalization, and robustness. It can effectively learn correct phase gradient patterns from diverse wrapped images with varying noise levels and terrain features, yielding more accurate phase gradient information. In the network post-processing, the estimated phase gradients are fitted so as to best approximate the true phase gradients, resulting in more accurate PU results. To demonstrate the robustness of ResDANet, the network is trained on two different datasets. This article also conducts ablation experiments to corroborate the superiority of ResDANet; these experiments demonstrate the efficacy of the FPA and GAU modules, which contribute significantly to the overall performance of the architecture. Compared with conventional unwrapping algorithms, ResDANet not only reduces computation time but also alleviates the potential impact of phase unwrapping error propagation when dealing with simulated and real data. Overall, the experimental evidence shows that ResDANet outperforms individual structures and traditional methods and has significant practical value in this field.