Communication

A Transformer-Unet Generative Adversarial Network for the Super-Resolution Reconstruction of DEMs

School of Artificial Intelligence, Beijing Normal University, Beijing 100875, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(19), 3676; https://doi.org/10.3390/rs16193676
Submission received: 17 July 2024 / Revised: 25 September 2024 / Accepted: 27 September 2024 / Published: 1 October 2024

Abstract

A new model called the Transformer-Unet Generative Adversarial Network (TUGAN) is proposed for the super-resolution reconstruction of digital elevation models (DEMs). DEMs are used in many fields, including environmental science, geology and agriculture. The proposed model uses a self-similarity Transformer (SSTrans) as the generator and U-Net as the discriminator. SSTrans, a model that we previously proposed, yields good reconstruction results in structurally complex areas but offers little advantage where the surface is simple and smooth, because it introduces superfluous detail into the data. To resolve this issue, we propose the novel TUGAN model, in which the multilayer skip connections of the U-Net discriminator allow it to weigh both global and local information when making judgments. The experiments show that TUGAN achieves state-of-the-art results across all types of terrain detail.

1. Introduction

A digital elevation model (DEM) is a ground model that represents terrain height as a series of ordered numeric arrays, numerically simulating the terrain from a finite set of topographic elevation measurements of the Earth's surface. DEMs have been widely used in the fields of natural disaster prevention, surveying, mapping, hydrology, meteorology, geological geomorphology, etc. [1,2,3]. DEMs with high resolution contain more information and can provide accurate descriptions. For example, urban river flood models constructed using high-resolution DEMs can improve the accuracy of flood predictions [4]; thus, the quality of DEMs plays an important role in their investigation and application.
The resolution of a DEM depends on the method used to generate the data, and many of the existing methods [5,6] for acquiring DEM data (e.g., LiDAR, InSAR interferometry, photogrammetry and existing topographic maps) are either too costly, limited in regional coverage, vulnerable to weather or unable to provide up-to-date data. In addition, many super-resolution techniques in the digital image field have proven effective. Therefore, super-resolution reconstruction of DEMs has become one of the most effective and feasible ways to obtain high-precision, high-resolution DEM data.
Recently, with the rapid advancement of deep learning, methods based on deep learning have swiftly developed in the DEM super-resolution domain. These methods [7,8,9,10] use artificial neural networks to learn the complex connections between low-resolution and high-resolution DEMs, and then use the connections to generate high-quality super-resolution DEMs. This approach overcomes the shortcomings of traditional interpolation methods [11,12,13,14,15], learns from a large amount of training data and generates high-resolution DEMs even in regions with complex terrains or incomplete data.
Image super-resolution reconstruction algorithms with deep learning [16] are categorized into single-image super-resolution (SISR) reconstruction methods [17] and reference-based image super-resolution (RefSR) reconstruction methods [18,19]. RefSR generates output images with high resolution by utilizing high-resolution reference images combined with input images with low resolution. In the super-resolution reconstruction process, a high-resolution reference image serves as a guide, supplementing the low-resolution input image with missing extra information while providing a more accurate representation of the details [20,21]. In the realm of DEM super-resolution reconstruction, algorithms based on deep learning primarily employ the SISR method for reconstruction [7,8,9,10]. Previously, we proposed SSTrans, which makes innovative use of terrain self-similarity to reconstruct images via super-resolution [22], largely solving the problem of difficult-to-obtain reference images. SSTrans can achieve good results in nonsmooth areas; however, the effect is not significant in smooth areas.
The remainder of this article is organized as follows. Section 2 introduces the background related to our research, which includes the basic model SSTrans and the discriminator U-Net. Section 3 introduces the model structure, including the overall framework of TUGAN, the architecture of the U-Net discriminator network, the introduction of spectral normalization and the introduction of the loss function. Section 4 analyses the outcomes of the experiments, including an evaluation of the reconstructed outcomes, a comparison with the other super-resolution algorithms and a comparison of the quality of reconstruction of the terrain attributes of the DEM. Section 5 concludes the paper.

2. Background

The Transformer network proposed by Vaswani et al. [23] is a neural network built on the self-attention mechanism, originally employed in natural language processing tasks such as machine translation. In traditional neural network models (such as CNNs), information passes through layers of neurons, and each neuron is connected only to neurons in the adjacent layers. The position of each word or clause is an important parameter in this computation. The self-attention mechanism strengthens the connections between units, which in turn improves accuracy.
Previously, to address the lack of reference images, we proposed SSTrans, which utilizes the terrain self-similarity feature to construct a DEM reference dataset; this approach can learn a deep relationship between the low-resolution DEM data and the reference data and solves the problem of there being no publicly available DEM reference dataset for RefSR work. Using topographic self-similarity, the constructed reference data can provide effective detailed information for low-resolution DEMs during the reconstruction process.
However, SSTrans reconstructs nonsmooth areas (for example, areas with large elevation differences) better than smooth ones. This is because self-similarity makes the model focus on the special and complex parts of the terrain, whereas for smooth terrain the global features matter more than the local content, so the model cannot take full advantage of the data. To solve this problem, we propose a new GAN structure that uses SSTrans as the generator and U-Net as the discriminator, with the aim of addressing global information and local conditions concurrently.
In addition, Schonfeld et al. [24] introduced a GAN architecture based on a U-Net [25,26,27] discriminator. U-Net GAN can supply comprehensive feedback, encompassing both global and local aspects to the generator through an encoding–decoding process, and the discriminator used in this method can assist the generator in producing more realistic images both globally and locally.
In this paper, we integrate ideas from the U-Net GAN to construct a new, more efficient super-resolution network model for DEMs called TUGAN, mainly by using U-Net as the discriminator. On this basis, we can provide both global and local feedback to the generator through the encoding-decoding process, helping the generator to produce realistic images.

3. Model Structure

We create a new DEM super-resolution reconstruction method that combines a Transformer with a generative adversarial network on the basis of SSTrans. In this new method, we use SSTrans as the generator and U-Net as the discriminator to construct TUGAN, so that the reconstruction information of each elevation point can be better fed back to the generator.
As shown in Figure 1, the low-resolution DEM data $I_{LR}$ and the corresponding reference DEM data $I_{Ref}$ are input into the generator, and the generator output $G(I_{LR}, I_{Ref})$ is obtained after the backbone network operation, residual feature extraction and feature fusion. $I_{HR}$ denotes the real high-resolution DEM data. Both $G(I_{LR}, I_{Ref})$ and $I_{HR}$ are fed into $D_{enc}$, the encoder part of the U-Net discriminator. The encoder obtains global information by stepwise downsampling and classifies each data point; $D_{dec}$, the decoder of the U-Net, then performs stepwise upsampling to produce output data with the same resolution as the input, so we can attain accurate positioning between the elevation points. Adding skip connections between encoder and decoder modules of matching resolution further enhances the ability of the U-Net to accurately segment detailed information. By combining the encoder and decoder modules, the discriminator learns to distinguish authentic data from synthetic data and transfers the loss information to the generator, so that the generator can improve its local and global constructions to better deceive the discriminator.

3.1. Generator Network Architecture

We use SSTrans [22] as the generator of TUGAN; it is a Transformer-based model with significant generative power, whose structure is shown in Figure 2. SSTrans utilizes the self-similarity of the terrain to construct corresponding reference data for each DEM tile. $I_{Ref\downarrow\uparrow}$ refers to the reference after bicubic downsampling followed by bicubic upsampling, and $I_{LR\uparrow}$ refers to the LR input after bicubic upsampling; both operations belong to the data preprocessing step. Details in $I_{LR}$ and $I_{Ref}$ are extracted during residual feature extraction. SSTrans employs an attention mechanism to identify the most pertinent details for super-resolution reconstruction. Finally, the features V and F, along with the maximum-confidence weight matrix W for $I_{LR}$ and $I_{Ref}$, generate the final $I_{SR}$ after feature synthesis.

3.2. Discriminator Network Architecture

We use the U-Net structure as our discriminator for two reasons: (1) the reconstruction quality of every reconstruction method degrades severely in regions with complex terrain surfaces, so the performance of both the generator and the discriminator must be enhanced, which is where U-Net comes in; (2) U-Net offers both a global and a detailed understanding of the terrain, providing adequate error feedback to the generator at each elevation point.
As shown in Figure 3, the network uses a 4 × 4 convolutional kernel for the downsampling operations that encode the DEM data. A 2 × 2 upsampling followed by a 3 × 3 convolution is then used to recover data at the same resolution as the input, achieving accurate localization.
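As an illustration of this encoder and decoder layout, the following PyTorch sketch implements one downsampling stage and one upsampling stage with a skip connection. The channel widths, activation choice and upsampling mode are illustrative assumptions of ours; the paper does not specify them.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One discriminator encoder stage: a stride-2, 4x4 convolution halves the resolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class DecoderBlock(nn.Module):
    """One decoder stage: 2x2 upsampling followed by a 3x3 convolution;
    the matching encoder feature map enters through a skip connection."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode='nearest')
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)
        return self.conv(torch.cat([x, skip], dim=1))

# Example: a 1-channel 64x64 DEM patch passes one stage down and back up.
x = torch.randn(1, 1, 64, 64)
enc = EncoderBlock(1, 64)
dec = DecoderBlock(64, 1, 64)
y = dec(enc(x), x)   # y has shape (1, 64, 64, 64)
```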
Since the U-Net structure contributes to training instability, Lipschitz constraints must be added to limit dramatic changes in the function. The TUGAN method therefore adds spectral normalization after the convolutional layers of the discriminator to better maintain Lipschitz continuity. In a GAN, for a discriminator D, if any x and y in the image space satisfy Equation (1),
$\| D(x) - D(y) \| \le K \| x - y \|,$
then the discriminator is said to be K-Lipschitz continuous, where $\| \cdot \|$ is the $L_2$ norm and K is the Lipschitz constant. Spectral normalization achieves 1-Lipschitz continuity by decomposing the parameter matrix W of a convolutional layer of the discriminator D into its singular values, taking the maximum singular value and dividing W by it. The formula for spectral normalization is given by
$W_{SN} = \frac{W}{\sigma(W)},$
where $\sigma(W)$ is the spectral norm of the matrix W, i.e., its maximum singular value.
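In PyTorch, this constraint can be imposed with the built-in spectral normalization wrapper, which estimates $\sigma(W)$ by power iteration and rescales W on every forward pass. The layer shape below is an illustrative assumption, not the exact TUGAN configuration.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Wrapping a discriminator convolution divides its weight by the largest singular
# value sigma(W) at each forward pass, enforcing the 1-Lipschitz constraint above.
sn_conv = spectral_norm(nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1))
```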

3.3. Loss Function

Many GAN-based super-resolution algorithms use the objective function of the original GAN, which may make training unstable and lead to vanishing or exploding gradients. For the discriminator, TUGAN adopts the spectral normalization of spectrally normalized GANs (SNGANs) [28] in the discriminator network to strengthen training stability; for the loss function, TUGAN borrows from the Wasserstein GAN with gradient penalty (WGAN-GP) [29] and adds a gradient penalty to further stabilize the training process.
Before introducing WGAN-GP, its predecessor, the Wasserstein GAN (WGAN) [30], should first be introduced. WGAN is a GAN that uses the Wasserstein distance; studies have shown that using the Wasserstein distance instead of the KL divergence and the JS divergence as the similarity measure resolves the gradient-vanishing and mode-collapse problems during training. Calculating the Wasserstein distance requires Lipschitz continuity, so that the neural network f(x) satisfies, for any two elements $x_1$ and $x_2$:
$| f(x_1) - f(x_2) | \le k | x_1 - x_2 |.$
After rearranging, the distribution distance can be formulated as follows:
$L = D(real) - D(G(x)).$
The aim of the discriminator D is to distinguish the real data from the generated data and to increase the distance between the two distributions; thus, the discriminator's loss can be defined as follows:
$D_{loss} = D(G(x)) - D(real).$
The generator's aim is to bring the generated data close to the real data; thus, the distance L must be minimized, and the generator's loss can be defined as follows:
$G_{loss} = -D(G(x)).$
WGAN uses weight clipping to maintain training stability, but this approach is relatively rigid; when the discriminator is a multilayer network, it is difficult to set an appropriate clipping threshold, so WGAN remains prone to vanishing or exploding gradients. WGAN-GP therefore replaces weight clipping with a gradient penalty, expressed as follows:
$Norm = \left\| \nabla_{X_{inter}} D(X_{inter}) \right\|,$
$gradient\_penaltys = \mathrm{MSE}(Norm, k),$
where $X_{inter}$ denotes samples of x drawn from the joint distribution space, $Norm$ denotes the norm of the gradient of D with respect to $X_{inter}$, and $gradient\_penaltys$ is the mean squared error between $Norm$ and k. Usually, k is set to one. The loss of the discriminator after the addition of the gradient penalty can be defined as follows:
$D_{loss} = D(G(x)) - D(real) + \lambda \cdot gradient\_penaltys,$
where λ is the parameter utilized to adjust the intensity of the gradient penalty.
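A common PyTorch realization of this penalty is sketched below; it follows the standard WGAN-GP recipe, sampling $X_{inter}$ on straight lines between real and generated DEM batches and pushing the gradient norm of the discriminator towards k.

```python
import torch

def gradient_penalty(D, real, fake, k=1.0):
    """Penalize deviations of the gradient norm of D from k (usually 1) at points
    sampled on straight lines between real and generated DEM batches."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_inter = (alpha * real + (1.0 - alpha) * fake).requires_grad_(True)
    d_inter = D(x_inter)
    grads = torch.autograd.grad(outputs=d_inter, inputs=x_inter,
                                grad_outputs=torch.ones_like(d_inter),
                                create_graph=True)[0]
    norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return ((norm - k) ** 2).mean()  # MSE(Norm, k)
```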
The loss function of the TUGAN generator consists of two parts, adversarial loss and reconstruction loss, which can be expressed as follows:
$L_G = \lambda_{rec} L_{rec} + \lambda_{adv} L_{adv},$
where $L_{rec}$ is the $L_1$ loss, denoted as
$L_{rec} = \frac{1}{WH} \sum_{x=1}^{W} \sum_{y=1}^{H} \left\| G(I_{x,y}^{LR}) - I_{x,y}^{HR} \right\|_1 .$
The adversarial loss $L_{adv}$ of the generator can be denoted as
$L_{adv} = -D(G(I_{LR})).$
The loss function of the TUGAN discriminator includes three parts: the encoding loss, the decoding loss and the gradient penalty, which can be represented as follows:
$L_D = L_{D_{enc}} + L_{D_{dec}} + \lambda \cdot gradient\_penaltys,$
where $L_{D_{enc}}$ is the loss of the encoder module of the discriminator, denoted as
$L_{D_{enc}} = D(G(I_{LR})) - D(I_{HR}).$
$L_{D_{dec}}$ is the loss of the decoder module of the discriminator, denoted as
$L_{D_{dec}} = \frac{1}{WH} \sum_{x=1}^{W} \sum_{y=1}^{H} \left[ D(G(I_{x,y}^{LR})) - D(I_{x,y}^{HR}) \right].$
The term $gradient\_penaltys$ is given in Equation (8), and its weight λ usually takes the value of 10.
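To make the interaction of these terms concrete, the following sketch performs one adversarial update. It assumes, as illustrations, that the generator takes $(I_{LR}, I_{Ref})$ and that the U-Net discriminator returns an encoder score together with a per-pixel decoder map; it reuses the gradient_penalty helper above.

```python
import torch.nn.functional as F

def train_step(G, D, I_LR, I_Ref, I_HR, opt_G, opt_D,
               lam_rec=1.0, lam_adv=1e-3, lam_gp=10.0):
    """One TUGAN update: discriminator first, then generator."""
    # Discriminator update: encoder loss + decoder loss + gradient penalty.
    I_SR = G(I_LR, I_Ref).detach()
    enc_f, dec_f = D(I_SR)
    enc_r, dec_r = D(I_HR)
    d_enc_only = lambda x: D(x)[0]                 # penalize the encoder head
    loss_D = (enc_f.mean() - enc_r.mean()          # L_D_enc
              + (dec_f - dec_r).mean()             # L_D_dec (per-pixel average)
              + lam_gp * gradient_penalty(d_enc_only, I_HR, I_SR))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator update: L1 reconstruction loss plus adversarial loss.
    I_SR = G(I_LR, I_Ref)
    enc_f, _ = D(I_SR)
    loss_G = lam_rec * F.l1_loss(I_SR, I_HR) - lam_adv * enc_f.mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_G.item(), loss_D.item()
```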

3.4. Data Normalization

Consistent with the SSTrans method, we choose four regions in China, namely the Inner Mongolia Plateau, the Qinling Mountains, the Tarim Basin and the North China Plain, and automatically match a reference DEM to each high-resolution DEM. Because the range of elevation values in DEM data is large, the elevation values must be normalized to the range $[-1, 1]$, as shown below:
$x_{normalization} = \frac{2(x - x_{min})}{x_{max} - x_{min}} - 1,$
where x denotes the elevation value in the DEM data, $x_{min}$ and $x_{max}$ denote the minimum and maximum elevation values in the DEM data, respectively, and $x_{normalization}$ denotes the normalized result.
The hyperparameters used in training were the Adam optimizer [31] with $\beta_1 = 0.9$, $\beta_2 = 0.99$ and $\varepsilon = 1 \times 10^{-8}$; the learning rate was set to $1 \times 10^{-4}$; the reconstruction loss weight $\lambda_{rec} = 1$; and the adversarial loss weight $\lambda_{adv} = 1 \times 10^{-3}$. The network is first trained with only the adversarial loss for 2 epochs, and then with all losses for 50 epochs.
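A minimal sketch of the preprocessing and optimizer setup just described; normalize_dem assumes a tile with a nonzero elevation range, and the generator and discriminator modules are placeholders supplied by the caller.

```python
import numpy as np
import torch

def normalize_dem(dem: np.ndarray) -> np.ndarray:
    """Scale elevation values to [-1, 1]; assumes dem.max() > dem.min()."""
    x_min, x_max = dem.min(), dem.max()
    return 2.0 * (dem - x_min) / (x_max - x_min) - 1.0

def make_optimizers(generator, discriminator):
    """Adam with the hyperparameters stated in the text."""
    kwargs = dict(lr=1e-4, betas=(0.9, 0.99), eps=1e-8)
    return (torch.optim.Adam(generator.parameters(), **kwargs),
            torch.optim.Adam(discriminator.parameters(), **kwargs))
```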

4. Experimental Results and Analysis

4.1. Data Descriptions

The ASTER GDEM V3 dataset provides the experimental data needed for this study, with a data resolution of 30 m. As shown in Figure 4, we selected four representative regions in mainland China, namely the Inner Mongolia Plateau, the Qinling Mountains, the Tarim Basin and the North China Plain, as the geographical dataset for assessing the performance of TUGAN. These regions have diverse topographic features with significant differences in terrain and elevation. Based on the characteristics of these regions, we established a DEM dataset on the basis of self-similarity. A total of 40,000 DEM data pairs were included, of which 30,000 pairs were used as our training set and the remaining pairs as our validation set. Each region stores 10,000 data pairs, of which 7500 pairs were used for training and 2500 pairs for validation. After super-resolution, the resolution was improved from 30 m to 10 m.
The reconstructed DEM super-resolution results were assessed using the root mean square error (RMSE) and the mean absolute error (MAE):
$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (I_{HR} - I_{SR})^2},$
$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| I_{HR} - I_{SR} \right|.$
The peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) were also evaluated in this article. A larger PSNR indicates a smaller gap between the original image and the reconstructed image.
$SSIM(x, y) = \frac{(2 \mu_x \mu_y + c_1)(2 \sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)},$
$PSNR = 10 \log_{10} \frac{\Delta S^2}{MSE}.$
The SSIM is an index used to evaluate the similarity between reconstructed data and original data, and its results closely track the human visual judgment standard; the higher the SSIM, the more similar the reconstructed data are to the original data and the better the reconstruction quality.
SSIM is defined as in Equation (19), where x denotes the original DEM data, y denotes the reconstructed DEM data, $\mu_x$ and $\mu_y$ denote the average grayscale of x and y, $\sigma_{xy}$ denotes the covariance of x and y, and $\sigma_x$ and $\sigma_y$ denote the standard deviations of x and y, respectively. The parameters $c_1$ and $c_2$ are two constants set according to empirical values.
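These metrics can be computed directly on elevation arrays, as in the sketch below. We take $\Delta S$ to be the elevation range of the reference tile, which is our assumption about the paper's definition, and obtain the SSIM from scikit-image.

```python
import numpy as np
from skimage.metrics import structural_similarity

def dem_metrics(hr: np.ndarray, sr: np.ndarray):
    """RMSE, MAE, PSNR and SSIM between a reference DEM and its reconstruction."""
    err = hr - sr
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    delta_s = hr.max() - hr.min()          # assumed definition of Delta_S
    psnr = 10.0 * np.log10(delta_s ** 2 / mse)
    ssim = structural_similarity(hr, sr, data_range=delta_s)
    return rmse, mae, psnr, ssim
```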
The terrain attributes of the DEM data are evaluated using the mean absolute error (MAE), represented by $E_{tp}$ as follows:
$E_{tp} = \frac{1}{N} \sum_{i=1}^{N} \left| t_i - t_i' \right|,$
where $t_i$ and $t_i'$ represent the terrain attribute of the original high-resolution DEM data and of the reconstructed DEM data, respectively.

4.2. Results for the Four Test Regions

To verify the validity of the TUGAN method proposed in this article, 900 × 900-sized DEM data from the test sets of the above four regions were selected. The maximum height differences among the four regions are shown in Table 1, and the evaluation indices of the TUGAN reconstruction results in the four regions are shown in Table 2. The visualization results of the TUGAN reconstruction are shown in Figure 5, where Figure 5b1–b4 shows the TUGAN reconstructions of the four regions. The experimental results are analyzed as follows.
Region 1, situated on the Inner Mongolia Plateau, has high relief with simple and smooth surfaces, and the maximum elevation difference is 946 m. As shown in Table 2, the MAE is 3.08, the RMSE is 4.24 and the average MAE per 1000 m of elevation difference is 3.25. The PSNR reaches 35.57 and the SSIM reaches 99.21%, which is also reflected in Figure 5b1, where the reconstructed DEM of the region is visually consistent with the original one. The error between the original DEM and the reconstructed DEM in this region is small, and the reconstruction result is stable.
Region 2 is located in the Qinling Mountains and has large topographic changes and many mountain ranges; however, its surface characteristics are similar to those of Region 1, both being relatively simple and smooth. The maximum elevation difference in Region 2 is 2338 m, the largest among the four regions. As shown in Table 2, the MAE of this region is 10.32, the RMSE is 13.52, the average MAE per 1000 m of elevation difference is 4.41 and the PSNR reaches 25.51. Compared with Region 1, the SSIM of this region is further improved to 99.32%, closely matching the human visual judgment standard, as shown in Figure 5b2. Due to the large elevation difference in this region, the errors in all the indicators except the SSIM increase greatly, which demonstrates that the errors in DEM super-resolution reconstruction strongly correlate with the elevation difference.
Region 3 is situated in the Tarim Basin and features little change in terrain, with a smaller maximum elevation difference of 203 m and a more complex topographic surface. As shown in Table 2, the MAE of this region is 1.29, the RMSE is 1.69, the average MAE per 1000 m of elevation difference is 6.35, the PSNR reaches 43.57 and the SSIM reaches 97.21%. As shown in Figure 5b3, the complexity of the topographic surface yields a visualization with more fine-grained pixels, and some subtle differences can be observed after local zooming. However, as the average MAE per 1000 m of elevation difference increases, the SSIM, an index for assessing similarity, decreases.
Region 4 is a plain located in North China, where the maximum elevation difference is 124 m and most locations are below 50 m. In terms of the complexity of the terrain surface, the reconstruction task is more difficult. As shown in Table 2, the MAE is 1.25, the RMSE is 1.64, the mean absolute error per 1000 m is 10.08, the PSNR is 43.84 and the SSIM is 95.99%. As shown in Figure 5b4, the errors in this region can be observed via local magnification in comparison with the original DEM. Due to the smaller height difference and the more complex surface of Region 4 compared with Region 3, the reconstruction quality is reduced, and the mean absolute error per 1000 m is significantly greater.

4.3. Comparison with Other SR Methods

In this article, the TUGAN method is compared with the bicubic method [32], super-resolution convolutional neural network (SRCNN) [8], super-resolution generative adversarial network (SRGAN) [9], enhanced deep super-resolution network (EDSR) [7] and SSTrans [22] methods.
A comparison of each method in the four regions is shown in Table 3 and Figure 6. The first column of Figure 6 shows the visualization of the original DEM, the second column shows a locally magnified region of the original DEMs, and the subsequent columns show the visualized reconstruction outcomes of each method in the four regions.
Region 1 and Region 2 have large elevation differences and simple, smooth surfaces, and the EDSR method holds a slight advantage over the SSTrans method in both regions. The evaluation indicators of TUGAN, the improvement of SSTrans proposed in this article, exceeded those of the EDSR method, indicating significant improvement. In Region 1, the MAE of TUGAN was 30.6% lower than that of SSTrans, the RMSE was 24.96% lower, the PSNR was 7.59% greater and the SSIM was 0.28% greater.
In Region 2, the TUGAN model still achieved good reconstruction results. The method reduces the MAE by 20.37% and the RMSE by 18.16% compared with SSTrans, improves the PSNR by 7.32%, and raises the SSIM by 0.28%, matching the SSIM value of EDSR. As shown in Figure 6, visual analysis revealed that Regions 1 and 2 have high SSIM values and high similarity, making it challenging to subjectively assess the advantages and disadvantages of the methods through visual perception.
The elevation difference between Region 3 and Region 4 is relatively small, and the terrain surface is more complex, requiring more detailed information for reconstruction. Therefore, the reconstruction quality of each method decreases to some extent compared to that of Region 1 and Region 2. The TUGAN method proposed in this article, an improved method based on SSTrans, has achieved further improvements in reconstruction evaluation indicators.
In Region 3, the MAE of TUGAN was 16.7% lower than that of SSTrans, the RMSE was 16.75% lower, the PSNR was 3.76% greater and the SSIM was 1.08% greater. In this region, the reconstruction quality of SSTrans and TUGAN was much better than that of the other methods: both achieved an SSIM above 90%, while the other methods did not exceed 90%.
The terrain surface of Region 4 is more complex than that of Region 3, and the reconstruction quality of each method decreased. However, SSTrans and TUGAN maintained the advantages gained from the reference data, with TUGAN slightly outperforming SSTrans. The MAE of TUGAN was 23.31% lower than that of SSTrans, the RMSE was 23.36% lower, the PSNR was 5.61% higher and the SSIM was 1.88% higher. In regions with more complex topographic surfaces, such as Region 4, TUGAN achieved a greater improvement than in Region 3.
In general, in Regions 1 and 2, the reconstruction effect of SSTrans was slightly worse than that of EDSR; after the improvements proposed in this article, the reconstruction quality of the TUGAN model exceeded that of the EDSR model in these two regions. In Regions 3 and 4, TUGAN maintained the advantages of SSTrans and further improved the reconstruction quality.
To further elucidate the research findings, Figure 7 presents a visual depiction of the reconstruction of North China. The figure illustrates the absolute difference between each pixel of the HR image and the super-resolved DEM image. Where the difference is small, the corresponding pixel appears grey; where it is large, the pixel takes on a blue tint. This chromatic variation aids in identifying the precision of each region in the reconstruction process and reflects how effectively the model responds to diverse terrain characteristics. In this visual assessment, the other models exhibit markedly larger differences than our model.

4.4. Terrain Attribute Analysis for DEM

In addition to an analysis of the reconstruction results of the DEM elevation value, this article also selects slope and aspect to analyze the reconstruction effect of the model.

4.4.1. Slope Reconstruction Analysis

The slope attribute refers to the terrain slope at each location and describes the steepness of the terrain. The slope is an important indicator in many geographic analyses and applications, usually expressed as a percentage or an angle. The slope calculation formula is
$Slope = 57.29578 \times \arctan \sqrt{\left( \frac{dz}{dx} \right)^2 + \left( \frac{dz}{dy} \right)^2},$
where 57.29578 is the value of $\frac{180}{\pi}$ and the scanning window is shown in Figure 8. $\frac{dz}{dx}$ and $\frac{dz}{dy}$ can be represented as follows:
$\frac{dz}{dx} = \frac{\frac{4 (c + 2f + i)}{wght_1} - \frac{4 (a + 2d + g)}{wght_2}}{8 \times xCellsize},$
$\frac{dz}{dy} = \frac{\frac{4 (g + 2h + i)}{wght_3} - \frac{4 (a + 2b + c)}{wght_4}}{8 \times yCellsize},$
where $wght_1$, $wght_2$, $wght_3$ and $wght_4$ represent the numbers of valid cells. For example, if the values c, f and i are all valid, then $wght_1 = (1 + 2 + 1) = 4$. The same calculation applies to $wght_2$, $wght_3$ and $wght_4$. Variables $xCellsize$ and $yCellsize$ represent the resolution of the DEM in the x-axis and y-axis directions, respectively.
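The sketch below vectorizes this slope computation over an entire grid, under the simplifying assumption that all nine cells of every window are valid, so each weight equals 4 and the formulas reduce to the standard Horn form; border cells are not computed.

```python
import numpy as np

def slope_deg(dem: np.ndarray, x_cellsize: float = 30.0,
              y_cellsize: float = 30.0) -> np.ndarray:
    """Slope in degrees for the interior cells of a DEM (fully valid windows)."""
    # Neighbours of the centre cell e in the 3x3 scanning window of Figure 8.
    a, b, c = dem[:-2, :-2], dem[:-2, 1:-1], dem[:-2, 2:]
    d, f = dem[1:-1, :-2], dem[1:-1, 2:]
    g, h, i = dem[2:, :-2], dem[2:, 1:-1], dem[2:, 2:]
    dzdx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * x_cellsize)
    dzdy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * y_cellsize)
    # 57.29578 * arctan(...) is exactly the degree conversion below.
    return np.degrees(np.arctan(np.sqrt(dzdx ** 2 + dzdy ** 2)))
```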
The evaluation of the slope reconstruction results of TUGAN and other methods in four regions is shown in Table 4. The slope value range is [0, 90], and the calculation method of the evaluation index is shown in Equation (21). Figure 9 shows the visual comparison of the slope reconstruction results of TUGAN and other methods. This article selects the slope reconstruction results of TUGAN in Region 4 for visual comparison with SRGAN, EDSR and SSTrans.
In comparison with the alternative methods, TUGAN demonstrated superior performance across all four regions: its reconstruction index was 5.61% lower than that of SSTrans in Region 1, 10.64% lower in Region 2, 9.82% lower in Region 3 and 43.9% lower in Region 4. The visualization results for Region 4 are presented in Figure 9. The reconstruction outcomes of TUGAN evidently restore more intricate details than those of SSTrans, SRGAN and EDSR; the reconstruction results of TUGAN improve on both the indicators and the visual effect.

4.4.2. Aspect Reconstruction Analysis

The aspect attribute refers to the direction of the terrain slope and describes the orientation of the terrain. It is an important indicator in many geographic analyses and applications, usually expressed in degrees or as an azimuth. The calculation formula for the aspect is
$Aspect = 57.29578 \times \mathrm{atan2}\left( \frac{dz}{dx}, \frac{dz}{dy} \right),$
where the scanning window is shown in Figure 8, and $\frac{dz}{dx}$ and $\frac{dz}{dy}$ are calculated as follows:
$\frac{dz}{dx} = \frac{\frac{4 (c + 2f + i)}{wght_1} - \frac{4 (a + 2d + g)}{wght_2}}{8},$
$\frac{dz}{dy} = \frac{\frac{4 (g + 2h + i)}{wght_3} - \frac{4 (a + 2b + c)}{wght_4}}{8},$
where $wght_1$, $wght_2$, $wght_3$ and $wght_4$ are consistent with those in Equations (23) and (24).
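The same scanning-window gradients, now without the cell size, yield the aspect. Mapping the atan2 result into [0, 360) is our assumption to match the stated value range.

```python
import numpy as np

def aspect_deg(dem: np.ndarray) -> np.ndarray:
    """Aspect in degrees for interior cells, reusing the window of slope_deg."""
    a, b, c = dem[:-2, :-2], dem[:-2, 1:-1], dem[:-2, 2:]
    d, f = dem[1:-1, :-2], dem[1:-1, 2:]
    g, h, i = dem[2:, :-2], dem[2:, 1:-1], dem[2:, 2:]
    dzdx = ((c + 2 * f + i) - (a + 2 * d + g)) / 8.0
    dzdy = ((g + 2 * h + i) - (a + 2 * b + c)) / 8.0
    return np.degrees(np.arctan2(dzdx, dzdy)) % 360.0  # mapped into [0, 360)
```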
Table 5 presents a quantitative evaluation of the aspect reconstruction results of TUGAN and various methods in four regions. The range of aspect values is [0, 360], and the calculation method of the evaluation index is illustrated in Equation (21). Figure 10 depicts a visual comparison of the aspect reconstruction results of TUGAN and various methods. This article selects the aspect reconstruction results of TUGAN in Region 4 for visual comparison with SRGAN, EDSR and SSTrans methods.
In Region 1, TUGAN's reconstruction index is 5.7% lower than that of SSTrans; in Region 2, it is 8.16% lower; in Region 3, 0.42% lower; and in Region 4, 8.53% lower. Compared with the alternative methods, the reconstruction results of TUGAN are considerably superior. The visualization results for Region 4 are shown in Figure 10; in this region, TUGAN produces high-quality detail that is more accurate than that generated by the SSTrans, SRGAN and EDSR methods.

5. Future Work

The TUGAN proposed in this article generates high-quality reconstruction results, but there is still room for further research, outlined as follows.
This article conducted its study in different topographic regions of China, for which only publicly available datasets with 30 m resolution could be downloaded. The proposed method performs well on the 30 m resolution dataset but has not yet been validated on higher-resolution data, for example, a 10 m dataset. Future research will therefore deepen cooperation with researchers in the field of geography to obtain higher-resolution data from China and around the world, in order to validate the effectiveness of TUGAN on high-resolution DEM data.
In addition, in super-resolution reconstruction, the larger the scale factor, the worse the reconstruction quality. In this article, the experiments were carried out with a magnification scale of four; in future work, larger-scale reconstruction experiments will be attempted in order to obtain higher-resolution DEM data.

6. Conclusions

In conclusion, this article proposes a novel DEM super-resolution reconstruction method called TUGAN, which inherits the advantages of SSTrans and U-Net GAN to enhance the resolution of DEMs. TUGAN uses self-similarity and transformer-based features to construct high-resolution DEMs on the basis of low-resolution inputs, which overcomes the challenges posed by smooth terrain surfaces.
The results of our experiments prove the effectiveness of TUGAN in the super-resolution reconstruction of DEMs, especially in regions with varying terrain complexities and elevation differences. TUGAN outperforms several existing super-resolution methods, including Bicubic, SRGAN, SRCNN, EDSR and its base method, SSTrans, with regard to diverse evaluation metrics such as the MAE, RMSE, PSNR and SSIM. The improvements in reconstruction quality are particularly noticeable in regions with complex terrains.
The TUGAN model is capable of obtaining more accurate and detailed terrain information at a reduced cost due to its superior super-resolution reconstruction effect. The TUGAN model can be employed to conduct a more comprehensive analysis of terrain features, such as assisting geological departments in the development of more precise water flow models, which in turn can facilitate more accurate predictions of flood occurrences.
Future research directions may involve further optimization of the TUGAN model and its application to other geographical regions and datasets. Additionally, exploring other effective models for super-resolution reconstruction might prove to be valuable for future endeavors in this field.

Author Contributions

Conceptualization, X.Z. and Q.Y.; methodology, X.Z. and Z.B.; validation, X.Z. and Z.B.; formal analysis, Z.B.; data curation, Z.B. and Z.C.; writing—original draft preparation, Z.B., Z.X., Z.C. and S.W.; writing—review and editing, Z.X., Z.C. and S.W.; supervision, X.Z.; project administration, X.Z. and Q.Y.; funding acquisition, X.Z. and Q.Y. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by STI2030-Major Projects 2022ZD0205000 and the Joint Research Fund in Astronomy (U2031136) under cooperative agreement between the NSFC and CAS.

Data Availability Statement

These data were derived from the following resources available in the public domain: https://lpdaac.usgs.gov/products/astgtmv003/, accessed on 29 September 2024.

Acknowledgments

The authors express their gratitude to the reviewers for their valuable feedback and helpful suggestions. Qian Yin is the author to whom all correspondence should be addressed.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. de Almeida, G.A.; Bates, P.; Ozdemir, H. Modelling urban floods at submetre resolution: Challenges or opportunities for flood risk management? J. Flood Risk Manag. 2018, 11, S855–S865. [Google Scholar] [CrossRef]
  2. Cook, A.; Merwade, V. Effect of topographic data, geometric configuration and modeling approach on flood inundation mapping. J. Hydrol. 2009, 377, 131–142. [Google Scholar] [CrossRef]
  3. Baier, G.; Rossi, C.; Lachaise, M.; Zhu, X.X.; Bamler, R. A nonlocal InSAR filter for high-resolution DEM generation from TanDEM-X interferograms. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6469–6483. [Google Scholar] [CrossRef]
  4. Muthusamy, M.; Casado, M.R.; Butler, D.; Leinster, P. Understanding the effects of Digital Elevation Model resolution in urban fluvial flood modelling. J. Hydrol. 2021, 596, 126088. [Google Scholar] [CrossRef]
  5. Liu, X. Airborne LiDAR for DEM generation: Some critical issues. Prog. Phys. Geogr. 2008, 32, 31–49. [Google Scholar] [CrossRef]
  6. Shan, J.; Aparajithan, S. Urban DEM generation from raw LiDAR data. Photogramm. Eng. Remote Sens. 2005, 71, 217–226. [Google Scholar] [CrossRef]
  7. Zhou, A.; Chen, Y.; Wilson, J.P.; Su, H.; Xiong, Z.; Cheng, Q. An enhanced double-filter deep residual neural network for generating super resolution DEMs. Remote Sens. 2021, 13, 3089. [Google Scholar] [CrossRef]
  8. Chen, Z.; Wang, X.; Xu, Z.; Hou, W. Convolutional neural network based dem super resolution. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 41, 247–250. [Google Scholar] [CrossRef]
  9. Demiray, B.Z.; Sit, M.; Demir, I. D-SRGAN: DEM super-resolution with generative adversarial networks. SN Comput. Sci. 2021, 2, 48. [Google Scholar] [CrossRef]
  10. Wang, Y.; Jin, S.; Yang, Z.; Guan, H.; Ren, Y.; Cheng, K.; Zhao, X.; Liu, X.; Chen, M.; Liu, Y.; et al. TTSR: A Transformer-based Topography Neural Network for Digital Elevation Model Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4403719. [Google Scholar] [CrossRef]
  11. Shepard, D. A two-dimensional interpolation function for irregularly-spaced data. In Proceedings of the 23rd ACM National Conference, Princeton, NJ, USA, 27–29 August 1968; pp. 517–524. [Google Scholar] [CrossRef]
  12. Chaplot, V.; Darboux, F.; Bourennane, H.; Leguédois, S. Accuracy of interpolation techniques for the derivation of digital elevation models in relation to landform types and data density. Geomorphology 2006, 77, 126–141. [Google Scholar] [CrossRef]
  13. Sibson, R. A brief description of natural neighbour interpolation. In Interpreting Multivariate Data; John Wiley & Sons: New York, NY, USA, 1981; pp. 21–36. [Google Scholar]
  14. Wang, B.; Shi, W.; Liu, E. Robust methods for assessing the accuracy of linear interpolated DEM. Int. J. Appl. Earth Obs. Geoinf. 2015, 34, 198–206. [Google Scholar] [CrossRef]
  15. Aguilar, F.J.; Agüera, F.; Aguilar, M.A.; Carvajal, F. Effects of terrain morphology, sampling density, and interpolation methods on grid DEM accuracy. Photogramm. Eng. Remote Sens. 2005, 71, 805–816. [Google Scholar] [CrossRef]
  16. Yang, W.; Zhang, X.; Tian, Y.; Wang, W.; Xue, J.H.; Liao, Q. Deep Learning for Single Image Super-Resolution: A Brief Review. IEEE Trans. Multimed. 2019, 21, 3106–3121. [Google Scholar] [CrossRef]
  17. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. Swinir: Image restoration using swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 1833–1844. [Google Scholar] [CrossRef]
  18. Lu, L.; Li, W.; Tao, X.; Lu, J.; Jia, J. Masa-sr: Matching acceleration and spatial adaptation for reference-based image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6368–6377. [Google Scholar] [CrossRef]
  19. Yang, F.; Yang, H.; Fu, J.; Lu, H.; Guo, B. Learning texture transformer network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5791–5800. [Google Scholar] [CrossRef]
  20. Zheng, H.; Ji, M.; Wang, H.; Liu, Y.; Fang, L. Crossnet: An end-to-end reference-based super resolution network using cross-scale warping. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 88–104. [Google Scholar] [CrossRef]
  21. Yue, H.; Sun, X.; Yang, J.; Wu, F. Landmark image super-resolution by retrieving web images. IEEE Trans. Image Process. 2013, 22, 4865–4878. [Google Scholar] [CrossRef]
  22. Zheng, X.; Bao, Z.; Yin, Q. Terrain Self-Similarity-Based Transformer for Generating Super Resolution DEMs. Remote Sens. 2023, 15, 1954. [Google Scholar] [CrossRef]
  23. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  24. Schonfeld, E.; Schiele, B.; Khoreva, A. A u-net based discriminator for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8207–8216. [Google Scholar] [CrossRef]
  25. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 June 2015; pp. 234–241. [Google Scholar] [CrossRef]
  26. Guo, C.; Li, C.; Guo, J.; Cong, R.; Fu, H.; Han, P. Hierarchical features driven residual learning for depth map super-resolution. IEEE Trans. Image Process. 2018, 28, 2545–2557. [Google Scholar] [CrossRef]
  27. Gao, F.; Xu, X.; Yu, J.; Shang, M.; Li, X.; Tao, D. Complementary, heterogeneous and adversarial networks for image-to-image translation. IEEE Trans. Image Process. 2021, 30, 3487–3498. [Google Scholar] [CrossRef]
  28. Miyato, T.; Kataoka, T.; Koyama, M.; Yoshida, Y. Spectral normalization for generative adversarial networks. arXiv 2018, arXiv:1802.05957. [Google Scholar]
  29. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A.C. Improved training of wasserstein gans. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  30. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar] [CrossRef]
  31. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
  32. Ruangsang, W.; Aramvith, S. Efficient super-resolution algorithm using overlapping bicubic interpolation. In Proceedings of the 2017 IEEE 6th Global Conference on Consumer Electronics (GCCE), Nara, Japan, 10–13 October 2017; pp. 1–2. [Google Scholar]
Figure 1. G is SSTrans, $I_{LR}$ is the low-resolution DEM data, $I_{Ref}$ is the corresponding reference DEM data, $G(I_{LR}, I_{Ref})$ is the output of the generator, $D_{enc}$ is the encoder part of the U-Net discriminator and $D_{dec}$ is the decoder part of the U-Net discriminator.
Figure 2. The structure of SSTrans [22]. The features of $I_{LR}$, $I_{Ref}$, $I_{Ref\downarrow\uparrow}$ and $I_{LR\uparrow}$, which are extracted by the residual network, are F, V, K and Q, respectively. The matrices P and W are the position matrix and the weight matrix, respectively, calculated by the relevance calculation. $V'$ is the HR feature representation of $I_{LR}$. The super-resolution output, denoted by $I_{SR}$, is obtained by a pixel-wise addition between F and a feature that is the synthesis of the matrices F, $V'$ and W.
Figure 3. TUGAN discriminator.
Figure 4. Four representative regions used for analysis. The Inner Mongolia Plateau is shown in (a), which covers the area between 41°N and 43°N and 111°E and 113°E. The Qinling Mountains are shown in (b), ranging from 32°N to 34°N and 108°E to 110°E. The Tarim Basin is shown in (c), ranging from 39°N to 41°N and 81°E to 83°E. The North China Plain is shown in (d), lying between 35°N and 37°N and 115°E and 117°E.
Figure 5. Images of the TUGAN reconstruction. (a1,b1,a3,b3) illustrate the original high-resolution images of Region 1, Region 2, Region 3 and Region 4, respectively; (a2,b2,a4,b4) display the TUGAN-reconstructed images of Region 1, Region 2, Region 3 and Region 4, respectively.
Figure 6. The effect of each method.
Figure 7. The visual comparison of each method.
Figure 8. In calculating the DEM attributes at e, a nine-cell scanning window is employed, with its center located at e.
Figure 9. The comparison of the slope reconstruction results of each method.
Figure 10. The comparison of the aspect reconstruction results of each method.
Table 1. The maximum elevation difference of the four regions.

| Regions | Maximum Elevation (m) | Minimal Elevation (m) | Maximum Elevation Difference (m) |
|---|---|---|---|
| Region 1 | 2206 | 1260 | 946 |
| Region 2 | 2528 | 190 | 2338 |
| Region 3 | 1109 | 906 | 203 |
| Region 4 | 129 | 5 | 124 |
Table 2. The TUGAN reconstruction results in the four regions.

| Regions | MAE (m) | RMSE (m) | PSNR (dB) | SSIM (%) |
|---|---|---|---|---|
| Region 1 | 3.08 | 4.24 | 35.57 | 99.21 |
| Region 2 | 10.32 | 13.52 | 25.51 | 99.32 |
| Region 3 | 1.29 | 1.69 | 43.57 | 97.21 |
| Region 4 | 1.25 | 1.64 | 43.84 | 95.99 |
Table 3. The comparison of each method in the four regions.

| Regions | Methods | MAE (m) | RMSE (m) | PSNR (dB) | SSIM (%) |
|---|---|---|---|---|---|
| Region 1 | Bicubic | 6.12 | 7.30 | 30.47 | 97.31 |
| | SRGAN | 6.17 | 8.15 | 29.90 | 96.09 |
| | SRCNN | 5.02 | 6.26 | 31.91 | 98.38 |
| | EDSR | 3.80 | 5.06 | 35.31 | 98.60 |
| | SSTrans | 4.44 | 5.65 | 33.06 | 98.93 |
| | TUGAN | 3.08 | 4.24 | 35.57 | 99.21 |
| Region 2 | Bicubic | 15.24 | 19.28 | 21.39 | 98.21 |
| | SRGAN | 17.79 | 23.10 | 20.86 | 97.54 |
| | SRCNN | 14.86 | 18.20 | 22.02 | 98.85 |
| | EDSR | 11.40 | 15.34 | 27.38 | 99.32 |
| | SSTrans | 12.96 | 16.52 | 23.77 | 99.04 |
| | TUGAN | 10.32 | 13.52 | 25.51 | 99.32 |
| Region 3 | Bicubic | 2.46 | 3.18 | 38.08 | 86.32 |
| | SRGAN | 2.10 | 2.78 | 39.22 | 87.71 |
| | SRCNN | 2.22 | 2.87 | 38.99 | 89.37 |
| | EDSR | 1.98 | 3.24 | 39.91 | 89.80 |
| | SSTrans | 1.55 | 2.03 | 41.99 | 96.13 |
| | TUGAN | 1.29 | 1.69 | 43.57 | 97.21 |
| Region 4 | Bicubic | 2.53 | 3.32 | 37.07 | 74.76 |
| | SRGAN | 2.48 | 3.29 | 37.79 | 77.08 |
| | SRCNN | 2.40 | 3.17 | 38.10 | 78.35 |
| | EDSR | 2.31 | 3.08 | 38.36 | 79.22 |
| | SSTrans | 1.63 | 2.14 | 41.51 | 94.11 |
| | TUGAN | 1.25 | 1.64 | 43.84 | 95.99 |
Table 4. Quantitative evaluation of slope reconstruction results of TUGAN and other methods.

| Regions | Bicubic | SRGAN | SRCNN | EDSR | SSTrans | TUGAN |
|---|---|---|---|---|---|---|
| Region 1 | 3.30 | 4.07 | 3.05 | 3.02 | 1.96 | 1.85 |
| Region 2 | 5.28 | 7.66 | 5.17 | 4.66 | 4.04 | 3.61 |
| Region 3 | 2.50 | 2.13 | 2.11 | 2.19 | 1.12 | 1.01 |
| Region 4 | 2.93 | 2.42 | 2.52 | 2.48 | 1.64 | 0.92 |
Table 5. Quantitative evaluation of aspect reconstruction results of TUGAN and other methods.

| Regions | Bicubic | SRGAN | SRCNN | EDSR | SSTrans | TUGAN |
|---|---|---|---|---|---|---|
| Region 1 | 68.11 | 75.39 | 63.97 | 62.58 | 39.78 | 37.51 |
| Region 2 | 29.60 | 42.39 | 28.71 | 25.37 | 22.17 | 20.36 |
| Region 3 | 84.41 | 86.05 | 79.25 | 78.13 | 33.29 | 33.15 |
| Region 4 | 86.99 | 87.07 | 83.74 | 81.83 | 33.75 | 30.87 |