1. Introduction
Precipitation is one of the key components of the hydrological cycle [
1], and data related to it are often employed as the primary input data for rainfall–runoff models. Adequate and accurate precipitation information is required not only for climate studies but, also, for water resource management, as well as for forecasting extreme events such as floods or droughts [
2,
3]. Nowadays, precipitation data can be obtained from various sources. The network of ground-based rainfall gauges using automatic or semi-automatic sensors is one of the most widely used data sources because of its accuracy and recorded historical data series. An important characteristic of gauge-based rainfall data is that they are point data and represent an area defined by a limited radius around the location of the device [
4]. Moreover, the density of the measuring stations is uneven between areas, and stations tend to be located in accessible lower-lying areas [
5]. Therefore, it is necessary to have a high-resolution spatial dataset that can effectively capture the variations in spatial precipitation.
Satellite-based precipitation products covering a large area over an extended time interval have received growing interest from scientists, as such data promise to be a reliable source [
5]. Several gridded satellite-based precipitation products with high resolution and global (or near-global) coverage are now available, such as the Tropical Rainfall Measuring Mission (TRMM) product with a resolution of 0.25° [
6], the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) product with a resolution of 0.25° [
7,
8], the Climate Hazards Center InfraRed Precipitation with Station data (CHIRPS) product with a resolution of 0.05° [
9], and the Climate Prediction Center morphing technique (CMORPH) product with a grid resolution of 8 km [
10]. Precipitation information obtained from these products constitutes a valuable data source for distributed hydrological models, especially for areas where the density of ground-based rainfall gauges is low or for large river basins spanning many countries. The differences between satellite-based precipitation products stem mainly from their measurement technologies and data-retrieval algorithms. The effectiveness of using gridded satellite-derived precipitation products as inputs to rainfall-runoff models has been previously reported [
2,
4,
11,
12,
13,
14,
15]. However, certain limitations of satellite-based precipitation datasets have also been identified in several studies [
3,
16]. According to Derin and Yilmaz [
17], satellite-derived precipitation products generally struggle to represent precipitation over complex terrain, because these areas are characterized by high spatial variability. Moreover, the characteristics of the study area, such as geographic location, topographic features, and vegetation cover, also have a considerable influence on the hydrological performance of satellite precipitation products [
5,
18].
Therefore, the reanalysis of satellite-based precipitation products by region is necessary before these data can be used for climate studies or rainfall-runoff modeling. A widely used method of reanalyzing satellite-based products is precipitation bias correction. The nature of the bias correction depends closely on the density of the ground-based gauge networks and the retrieval algorithms. A variety of bias correction methods have been developed for this purpose, for instance, the linear scaling [
19], multiplicative [
20], distribution mapping [
21,
22], and genetic algorithm [
23] methods. In these studies, the precipitation adjusted by one of the bias correction methods was then used as the input of the hydrological model for a case study basin. The basins selected for the aforementioned studies are generally medium-sized basins for which a network of rain gauges is available. For large river basins flowing through many countries, collecting data over a long time period is challenging because of the data-sharing policies of the national meteorological agencies of the countries involved.
In this study, a bias correction method based on the convolutional neural network (CNN) model was developed for the purpose of reanalyzing satellite-derived precipitation data. The Mekong River basin was selected as the study area because it is one of the largest river basins in the world [
24]. The Mekong River basin covers six countries (mostly developing countries) with a variety of climates and topographical features. Reliable information on precipitation for the Mekong River basin will be useful in forecasting extreme events such as floods or droughts.
The two gridded precipitation data sources used in this study are the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR) [
25] and Asian Precipitation-Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE) [
26]. PERSIANN-CDR is a member of the PERSIANN family of products and is useful for research on a scale suitable for extreme weather events [
25]. Moreover, PERSIANN-CDR products are readily available and can be easily accessed for various purposes [
8]. Meanwhile, APHRODITE is the gridded precipitation product of an international cooperation program conducted by the Japanese Meteorological Agency and other countries through the collection and analysis of data from thousands of Asian stations [
26]. Therefore, APHRODITE datasets are often treated as observation data for research in the Mekong region [
27,
28,
29], as well as for Asia [
30,
31]. However, a considerable limitation of studies using APHRODITE precipitation data is their availability: because the product results from international cooperation projects (APHRODITE projects), the data are only available up to 2015 (1998–2015 for version V1901).
With the aim of producing a dataset that is more up to date than the APHRODITE product (which was paused in 2015) and sufficiently reliable for studies of the Mekong basin, a convolutional autoencoder (ConvAE) neural network model was constructed to correct the rainfall bias of satellite-based products. PERSIANN-CDR is treated as the satellite-derived precipitation product, while APHRODITE is treated as the gauge-based observation product, and both products have the same spatial resolution of 0.25°. In addition to the ConvAE neural network model, another bias correction method based on the standard deviation statistic was applied to correct the pixel-to-pixel bias of the satellite-based product. The performance of these two methods was examined by comparing statistical properties (for example, mean, standard deviation, distribution, and correlation) with an independent observation dataset.
3. Model Application
This study proposes two approaches to reanalyze the satellite-derived precipitation: (1) the ConvAE model and (2) a statistical method. The implementation relies on open-source software libraries, and the programming language used throughout the study was Python [
42]. Data processing, data management, and data visualization were carried out using the NumPy [
43], Pandas [
44], and Matplotlib [
45] libraries. For the ConvAE model, our work used the Python deep-learning library Keras, a high-level neural network API (application programming interface) [
46], with TensorFlow [
47] as the backend. All ConvAE models were implemented on Google Colaboratory (also known as Colab), which is a free Google cloud service based on the Jupyter Notebook [
48].
3.1. ConvAE Neural Network
For the ConvAE network model, the input data (PERSIANN-CDR) and target data (APHRODITE) are two daily gridded precipitation products with the same grid size of 100 × 60, as stated above. As with other neural network models, the ConvAE model is evaluated through training, validation, and testing. All 18 years of available data were divided into three nonoverlapping datasets for these three purposes. The first dataset, used to train the model, covered 14 years (1998–2011). The second dataset, spanning 2 years (2012–2013), was used to validate the model performance. The remaining dataset, spanning 2 years (2014–2015), was used as an independent test set to compare the corrected datasets produced by the ConvAE neural network model and the standard deviation method.
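The chronological split described above can be sketched in a few lines of Python. This is only an illustration: the names `SPLITS`, `assign_split`, and `count_days_per_split` are hypothetical, and the validation range 2012–2013 is an assumption chosen so that the three subsets do not overlap.

```python
from datetime import date, timedelta

# Inclusive year ranges. The validation range 2012-2013 is an assumption
# chosen so that the three subsets do not overlap.
SPLITS = {
    "train": (1998, 2011),
    "validation": (2012, 2013),
    "test": (2014, 2015),
}

def assign_split(day):
    """Return the name of the subset that a calendar day belongs to."""
    for name, (first_year, last_year) in SPLITS.items():
        if first_year <= day.year <= last_year:
            return name
    raise ValueError(f"{day} is outside the 1998-2015 study period")

def count_days_per_split(start, end):
    """Count the daily grids per subset between two dates (both inclusive)."""
    counts = {name: 0 for name in SPLITS}
    day = start
    while day <= end:
        counts[assign_split(day)] += 1
        day += timedelta(days=1)
    return counts

counts = count_days_per_split(date(1998, 1, 1), date(2015, 12, 31))
print(counts)
```

Splitting by whole years rather than by random sampling keeps the test period strictly later than the training period, which matches the way the corrected product would be used in practice.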
For most CNN models, there is no specific reference structure for the selection of layers, number of layers, and order of layers, or for the hyperparameters inside the model. Proposing an optimal architecture is usually based on a careful trial-and-error evaluation process. For the bias correction of satellite-based precipitation, several ConvAE models based on typical structures, such as VGGNet [
49] or U-Net [
50], were also considered. However, the corrected data from these models were not satisfactory when compared to the observed data.
According to Karpathy [
37], the most prevalent form of CNN architecture is stacking several convolution layers, followed by pooling layers, and repeating this pattern until the desired spatial dimension is reached. In this study, the proposed ConvAE model has the structure illustrated in
Figure 6, which combines two networks: an encoder and a decoder.
The model’s input and target data are raster data (2-dimensional) and have the same dimensions of 100 × 60 × 1, where the three values correspond to the height, width, and depth in the CNN model [
37], respectively. In the first part, the encoder, two convolution layers are stacked before every pooling layer, with the aim of making the network larger and deeper so that it better captures the complex features of the input data [
37].
For the convolution layer, the filters parameter is the number of output filters, that is, the number of feature maps generated by the convolution operations. The number of filters used in this study starts at 32 and increases to 64, 128, and 256 in the deeper layers. The numbers of filters are chosen as powers of 2, with the aim of saving computing resources when processing data. In addition to the number of filters, the kernel size is also an important parameter in the convolution layer, specifying the width and height of the 2D convolution window [
46]. According to Rosebrock [
51], the kernel size values are usually odd numbers, and large kernels (5 × 5 or 7 × 7) are typically applied to inputs larger than 128 × 128 in order to quickly reduce the spatial dimensions. For this study, the kernel size chosen for the convolution layers is 3 × 3, because the spatial dimension of the input data is only 100 × 60.
Adding a pooling layer after the convolution layer is a popular pattern used for arranging layers within the CNN. The pooling operation is applied on each feature map (which is created after the convolution operation) using the pool size parameters to produce a new set with the same number of feature maps; however, the dimension of each feature map will be reduced. The size of the pooling operation (pool size) is smaller than the size of the feature map, and a pool size value of 2 × 2 pixels is usually applied to each pooling operation [
52]. This means the spatial dimensions of the feature map are halved (both horizontally and vertically) after the pooling operation. AveragePooling and MaxPooling are two widely used functions for reducing the spatial dimensions of a feature map: AveragePooling calculates the average value of a patch on a feature map, whereas MaxPooling takes the maximum value. Before MaxPooling was chosen as the pooling function in this study, the model performance was compared by applying the two functions in turn. The results indicated that the MaxPooling function is better at capturing higher values than the AveragePooling function.
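The difference between the two pooling functions can be illustrated with a small pure-Python sketch. The helper `pool2d` and the toy grid `rain` are hypothetical and are not the Keras layers used in the study; they only show why maximum pooling preserves an isolated high value that average pooling smooths out.

```python
def pool2d(grid, size=2, mode="max"):
    """Apply non-overlapping size x size pooling to a 2D list of numbers."""
    rows, cols = len(grid) // size, len(grid[0]) // size
    pooled = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Collect the values of one size x size patch.
            patch = [grid[r * size + i][c * size + j]
                     for i in range(size) for j in range(size)]
            row.append(max(patch) if mode == "max" else sum(patch) / len(patch))
        pooled.append(row)
    return pooled

# Toy 4 x 4 daily rainfall grid (mm) with one isolated heavy-rain cell.
rain = [
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 40.0, 3.0, 2.0],
    [5.0, 5.0, 0.0, 0.0],
    [5.0, 5.0, 0.0, 8.0],
]

max_pooled = pool2d(rain, mode="max")      # [[40.0, 3.0], [5.0, 8.0]]
avg_pooled = pool2d(rain, mode="average")  # [[10.0, 2.0], [5.0, 2.0]]
print(max_pooled, avg_pooled)
```

In the top-left patch the 40 mm value survives maximum pooling but is reduced to 10 mm by averaging, which is consistent with the observation that MaxPooling captures higher values better.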
In the decoder part, the encoded data are reconstructed by combining each UpSampling layer with two stacked convolution layers and repeating this pattern until the desired size is reached. The UpSampling layer can be understood simply as a way of scaling up the data using the nearest neighbor algorithm or bilinear interpolation. Here, a size parameter of (2, 2) inside the UpSampling layer was selected to double the dimensions of the input. In accordance with each UpSampling layer, the number of filters in the convolution layers decreases from 256 to 128, 64, and, finally, 32 once the desired size of 100 × 60 is reached. At the last convolution layer, the number of filters was set to 1 so that the reconstructed data had the same size as the input.
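To see how 2 × 2 pooling and (2, 2) upsampling interact with a 100 × 60 grid, the spatial sizes can be traced with a short sketch. The function `trace_shapes` is hypothetical, the convolutions are assumed to use padding that preserves the spatial size, and the number of pooling stages is a free parameter here because it is not stated in the text.

```python
def trace_shapes(height, width, n_pool, pool=2):
    """Trace spatial sizes through n_pool pooling and n_pool upsampling stages.

    Convolutions are assumed to preserve the spatial size, so only the
    pooling and upsampling steps change it.
    """
    shapes = [(height, width)]
    for _ in range(n_pool):
        # Pooling without padding floors odd sizes.
        height, width = height // pool, width // pool
        shapes.append((height, width))
    for _ in range(n_pool):
        # Upsampling multiplies each dimension by the same factor.
        height, width = height * pool, width * pool
        shapes.append((height, width))
    return shapes

two_stages = trace_shapes(100, 60, n_pool=2)
three_stages = trace_shapes(100, 60, n_pool=3)
print(two_stages)    # ends at (100, 60)
print(three_stages)  # ends at (96, 56)
```

Under these assumptions, two pooling stages round-trip exactly (100 × 60 → 50 × 30 → 25 × 15 → 50 × 30 → 100 × 60), whereas a third stage would hit the odd sizes 25 and 15 and would need padding or cropping to recover 100 × 60.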
In addition to the construction of the ConvAE model structure, one of the important issues for deep-learning neural network problems is the selection of hyperparameters, such as loss function, optimization algorithm, or the number of epochs for the training process. The recommended parameters in this study have undergone careful evaluation and comparison of performance. The proposed loss function is the mean square error (MSE), which has shown superior performance compared to other loss functions, such as the mean absolute error. Along with the loss function, the Adam optimization algorithm [
53] was considered suitable for this study, as it is widely applied in deep-learning applications. The ConvAE model was also set up to record the necessary information during the training and validation processes. The number of epochs was set to 5000, with a batch size of 32. Finally, so that the ConvAE model could be adjusted effectively, early stopping was applied to prevent overfitting [
54], and model checkpointing was used to save the model information before training stopped.
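The interplay of early stopping and checkpointing can be sketched without any deep-learning library. The function `run_early_stopping`, the `patience` value of 3, and the list of validation losses are all hypothetical; the study's actual patience setting is not given in the text.

```python
def run_early_stopping(val_losses, patience=3, max_epochs=5000):
    """Simulate early stopping over per-epoch validation losses.

    Returns (stopped_epoch, best_epoch, best_loss) with 1-based epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    last_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        last_epoch = epoch
        if loss < best_loss:
            # "Checkpoint": remember the best state seen so far.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            break
    return last_epoch, best_epoch, best_loss

# Hypothetical validation losses: improvement stalls after epoch 3.
losses = [5.0, 4.0, 3.5, 3.6, 3.7, 3.8, 3.4]
result = run_early_stopping(losses, patience=3)
print(result)  # (6, 3, 3.5)
```

With this logic the large epoch budget acts only as an upper bound; training ends once the validation loss stops improving, and the checkpoint keeps the best state rather than the last one.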
3.2. Standard Deviation Method
For the standard deviation method, data were corrected from PERSIANN-CDR products based on Equation (1). All available data for 18 years were divided into two independent datasets. The statistical dataset for 1998 to 2013 (16-year baseline period) was used to calculate the statistical indicators mentioned in Equation (1) of the PERSIANN-CDR and APHRODITE precipitation products. The remaining 2-year dataset (2014–2015) was employed to examine the performance of the statistical method and compare it with the corrected data acquired from the ConvAE model.
Both gridded precipitation products had the same dimensions of 100 × 60 after processing, giving a total of 6000 cells. Because the study focuses on the Mekong River basin, only the 1112 grid cells inside the catchment were considered, and cells outside the catchment were ignored. From the data for the 16-year baseline period, statistical properties such as the mean and the standard deviation corresponding to each month were calculated for the two products. Note that each grid cell has different statistical properties. The basic statistical properties of a grid cell in the Mekong basin are illustrated in
Figure 7.
The two-year independent dataset (2014–2015) was used to evaluate the method performance through the cell-by-cell pairing of the corrected data and observed data, for which the corrected data were calculated using Equation (1) from the PERSIANN-CDR product.
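As a rough illustration of a per-cell, per-month correction built from means and standard deviations, the sketch below uses a common mean-and-variance scaling form. This form is an assumption and may differ from Equation (1); the helper names `fit_stats` and `correct` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def fit_stats(values):
    """Return (mean, standard deviation) of baseline values for one cell and month."""
    return mean(values), pstdev(values)

def correct(sat_value, sat_stats, obs_stats):
    """Rescale a satellite value to the observed mean and spread.

    Assumed form (not necessarily Equation (1)):
        corrected = obs_mean + (sat_value - sat_mean) * obs_std / sat_std
    Negative results are clipped to zero because rainfall cannot be negative.
    """
    sat_mean, sat_std = sat_stats
    obs_mean, obs_std = obs_stats
    if sat_std == 0:
        return max(obs_mean, 0.0)
    corrected = obs_mean + (sat_value - sat_mean) * obs_std / sat_std
    return max(corrected, 0.0)

# Hypothetical baseline values (mm) for a single grid cell and month.
sat_stats = fit_stats([0.0, 10.0, 20.0, 30.0])
obs_stats = fit_stats([0.0, 5.0, 10.0, 15.0])

corrected_20 = correct(20.0, sat_stats, obs_stats)
print(corrected_20)  # close to 10.0
```

Because each cell is corrected from its own statistics only, a scheme of this kind treats neighboring cells independently, which is the property contrasted with the ConvAE model later in the paper.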
4. Results and Discussion
In this section, an independent dataset (testing dataset) spanning two years (2014–2015) was adopted to evaluate the performance of two methods of correcting the daily precipitation bias from satellite-based products. First, the PERSIANN-CDR data were employed as the input for the models to generate two corrected datasets, which correspond to the two methods mentioned above. Then, these corrected data were used to evaluate the performance of the two methods by comparison with the gauge-based data (APHRODITE).
As for the ConvAE neural network model, before verification using a testing dataset was performed, the model underwent the training and validation process, as described in
Section 3.1. The validation step is necessary to select the optimal parameters of the model and to prevent the overfitting problems often faced when working with neural networks. The results of the validation step are not presented here. Instead, the optimal parameters of the model obtained from the validation step were used to conduct the testing step.
4.1. Temporal Correlation
The performance metric indicators used to evaluate the temporal correlation between the observed and corrected precipitation products over the Mekong River basin during the testing period are the MAD, RMSE, and NSE. The comparison results are depicted in
Table 2 and
Table 3, and
Figure 8 and
Figure 9.
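The three indicators can be computed with a few lines of Python. The sketch assumes the usual definitions (MAD as the mean absolute difference, RMSE as the root mean square error, and NSE as the Nash–Sutcliffe efficiency); the function names and the sample series are hypothetical.

```python
from math import sqrt

def mad(obs, sim):
    """Mean absolute difference between simulated and observed values."""
    return sum(abs(s - o) for o, s in zip(obs, sim)) / len(obs)

def rmse(obs, sim):
    """Root mean square error."""
    return sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs_mean = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - residual / variance

# Hypothetical monthly totals (mm): observed versus corrected.
observed = [10.0, 20.0, 30.0, 40.0]
corrected = [12.0, 18.0, 33.0, 39.0]

scores = (mad(observed, corrected), rmse(observed, corrected), nse(observed, corrected))
print(scores)
```

MAD and RMSE are expressed in the units of the data (mm), while NSE is dimensionless, which is why the text reports NSE alongside an error measure in millimetres.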
Table 2 provides information on the mean annual precipitation over the Mekong basin for each rainfall product during the two-year testing period from January 2014 to December 2015. Overall, the satellite-based precipitation tends to be overestimated compared to the observed data. Over the Mekong River basin, the average annual rainfall based on the observed data (APHRODITE) was only 1068 mm, about 500 mm less than the corresponding amount given by the satellite-based data (PERSIANN-CDR). Of the two corrected precipitation products, the ConvAE model performs better than the standard deviation method; the average annual rainfall values for these two products are 1110 mm and 924 mm, respectively.
An opposite trend was witnessed more clearly when the total monthly precipitation obtained from the products was of interest (see
Table 3 and
Figure 8). In terms of the correlation between the corrected data series and the observed data, the ConvAE model showed superior performance compared to the statistical method, with an NSE value of 0.97 and a MAD value of 12.6 mm. The corresponding NSE and MAD values for the statistical method were more modest, at 0.83 and 22.3 mm, respectively. Additionally,
Figure 8 also indicates the uncertainty of the standard deviation method, as the total amount of rainfall adjusted in July 2015 was abnormal compared to that in the remaining months. One plausible cause of this irregularity in the total corrected precipitation is the satellite-based data themselves.
4.2. Probability Distribution
In addition to comparing the mean annual precipitation correlation of the products over the Mekong River basin, the probability distribution of the rainfall data by grid cell was also considered. The probability density function (PDF) and cumulative distribution function (CDF) are two statistical functions utilized to describe the probability distribution of the total precipitation by grid cells. The probability distribution of the total rainfall in the two-year testing period (2014–2015) is shown in
Figure 9 and
Figure 10.
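An empirical PDF and CDF of annual totals per grid cell can be obtained by simple binning. In the sketch below, the 300 mm bin width is an assumption inferred from the ranges quoted in the discussion (for example, 900 mm to 1200 mm), and the function name and sample totals are hypothetical.

```python
def binned_distribution(totals, bin_width=300.0):
    """Return a list of (lower_edge, share, cumulative_share) per bin."""
    n = len(totals)
    n_bins = int(max(totals) // bin_width) + 1
    counts = [0] * n_bins
    for total in totals:
        counts[int(total // bin_width)] += 1
    result, running = [], 0
    for index, count in enumerate(counts):
        running += count
        # share approximates the PDF, cumulative share approximates the CDF.
        result.append((index * bin_width, count / n, running / n))
    return result

# Hypothetical annual totals (mm) for eight grid cells.
annual_totals = [450.0, 950.0, 1000.0, 1150.0, 1300.0, 1700.0, 2100.0, 1100.0]

distribution = binned_distribution(annual_totals)
for lower, share, cumulative in distribution:
    print(f"{lower:6.0f}-{lower + 300:6.0f} mm  share={share:.3f}  cumulative={cumulative:.3f}")
```

Here the 900–1200 mm bin holds half of the cells, which is the kind of percentage quoted for each product in the comparison that follows.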
As can be seen in
Figure 9 and
Figure 10, the corrected precipitation data from satellite-based products show a certain similarity to the observed data. For the statistical method, the two-year corrected data reveal that this method continues to exhibit uncertainty, which is most evident in the PDF curves of both 2014 and 2015. By contrast, the ConvAE model shows stable performance in both the PDF and CDF curves. In the observed data, the annual precipitation in the Mekong River basin was concentrated in the range of 900 mm to 1200 mm, which accounted for nearly 40% in 2014 and approximately 25% in 2015. In contrast to the observed product, the satellite-based rainfall product showed significant differences in both the probability distribution and precipitation intensity. Specifically, its total rainfall mainly ranged from 1200 mm to 2400 mm in 2014 (approximately 79%) and from 1200 mm to 2100 mm in 2015 (about 78%).
As for the two corrected rainfall products,
Figure 9 and
Figure 10 also reveal that the ConvAE model outperforms the statistical method. Although the statistical method achieves notable performance when evaluating the temporal correlation with the NSE value of 0.83 and the RMSE value of 38.4 mm (
Table 3), the probability distribution of this precipitation product shows poor agreement with that of the observed precipitation. In addition, the adjusted rainfall from the statistical method was mainly in the range of 600 mm to 900 mm for the two-year testing period, accounting for more than 31% in 2014 and nearly 38% in 2015.
In the case of the ConvAE model, the corrected data showed better agreement with the observed data in terms of the PDF and CDF. The total annual precipitation over the Mekong basin from the ConvAE model had the same probability distribution pattern as the observed data in both testing years. Moreover, it chiefly ranged from 900 mm to 1200 mm, a range that accounted for about 31% in both years.
In addition, another statistical comparison was also conducted to evaluate the correlation of the annual precipitation per grid cell between the rainfall products. These statistical criteria are presented in the Taylor diagram (
Figure 11).
In the Taylor diagrams [
55,
56], the datasets represent the total annual rainfall of each grid cell across the Mekong basin for the observed, ConvAE, statistical, and satellite-based precipitation products, respectively. It can be seen that the ConvAE model generally outperforms the other products, with higher correlation coefficients (about 0.91 for 2014 and 0.84 for 2015) and lower RMSD and standard deviation values in both years of the testing period. For 2014, the ConvAE model agrees well with the observations, with a standard deviation of 410 mm/year compared to the observed value of 390 mm/year. Meanwhile, the statistical model performs markedly worse than the satellite-based product on evaluation criteria such as the correlation coefficient and RMSD (see
Figure 11a).
In the case of 2015 (
Figure 11b), the Taylor diagram shows a trend similar to that of 2014: the ConvAE model performed best, while the statistical model again showed uncertainty. The poor performance of the statistical method is reflected in all the statistics represented in the Taylor diagram, including a correlation coefficient of 0.50, an RMSD of 400 mm/year, and a standard deviation of 390 mm/year. The satellite-based data have a moderate correlation coefficient (only 0.62 compared to 0.84 for the ConvAE model); however, they show less spatial variation than the other two products (with a standard deviation of 350 mm/year compared to the observed value of 420 mm/year).
An overview of the comparisons of the temporal correlation and probability distribution reveals the instability and uncertainty of the statistical method in adjusting the rainfall products in the Mekong River basin. Although its mean annual rainfall across the basin in the two-year testing period was 924 mm/year, which is close to the observed value of 1068 mm/year (see
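The three statistics on a Taylor diagram are linked by the law of cosines, so any one of them can be checked from the other two. The sketch below assumes that the RMSD shown in the diagram is the centered pattern RMS difference, as in the standard Taylor diagram; the function name is hypothetical, and the inputs are the 2015 values quoted above for the statistical method.

```python
from math import sqrt

def centered_rmsd(std_model, std_obs, corr):
    """Centered RMS difference implied by two standard deviations and a correlation.

    Taylor diagram relation: E'^2 = s_m^2 + s_o^2 - 2 * s_m * s_o * R
    """
    return sqrt(std_model ** 2 + std_obs ** 2 - 2.0 * std_model * std_obs * corr)

# 2015, statistical method: standard deviation 390 mm/year, observed
# standard deviation 420 mm/year, correlation coefficient 0.50.
rmsd_statistical_2015 = centered_rmsd(390.0, 420.0, 0.50)
print(round(rmsd_statistical_2015, 1))  # about 405.8
```

The result of roughly 406 mm/year is consistent with the RMSD of about 400 mm/year read from the diagram, given the rounding of the quoted values.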
Table 2 and
Table 3), the annual rainfall per grid cell exhibited an opposite trend (see
Figure 11). On the other hand, a stable high performance was noted in the case of the ConvAE model in both comparisons conducted above.
4.3. Spatial Correlation
In addition to the temporal correlation, the spatial correlation between the corrected precipitation data and observed data was compared to evaluate the effectiveness of the two bias correction methods. The spatial correlation between the precipitation products was assessed by comparing the average pixel-by-pixel differences (RMSE, MAD, and bias values) and the correlation index (Corr). The spatial distribution pattern of the precipitation products is illustrated in
Figure 12,
Figure 13 and
Figure 14. The comparative results in the two-year testing dataset are summarized in
Table 4.
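A pixel-by-pixel comparison restricted to cells inside the basin can be sketched as follows. The function `spatial_scores` is hypothetical, cells outside the catchment are marked with `None` as an illustrative masking convention, and the correlation index is assumed to be the Pearson coefficient.

```python
from math import sqrt

def spatial_scores(obs_grid, sim_grid):
    """Compute bias, MAD, RMSE, and Pearson correlation over valid cell pairs."""
    pairs = [(o, s)
             for obs_row, sim_row in zip(obs_grid, sim_grid)
             for o, s in zip(obs_row, sim_row)
             if o is not None and s is not None]  # skip cells outside the basin
    n = len(pairs)
    diffs = [s - o for o, s in pairs]
    obs_mean = sum(o for o, _ in pairs) / n
    sim_mean = sum(s for _, s in pairs) / n
    cov = sum((o - obs_mean) * (s - sim_mean) for o, s in pairs)
    obs_ss = sum((o - obs_mean) ** 2 for o, _ in pairs)
    sim_ss = sum((s - sim_mean) ** 2 for _, s in pairs)
    return {
        "bias": sum(diffs) / n,                  # positive means overestimation
        "mad": sum(abs(d) for d in diffs) / n,
        "rmse": sqrt(sum(d * d for d in diffs) / n),
        "corr": cov / sqrt(obs_ss * sim_ss),
    }

# Hypothetical 2 x 3 grids of annual totals (mm); None marks cells outside the basin.
observed = [[None, 1000.0, 1200.0], [800.0, 1500.0, None]]
corrected = [[None, 1100.0, 1150.0], [900.0, 1400.0, None]]

scores = spatial_scores(observed, corrected)
print(scores)
```

Positive and negative differences cancel in the bias but not in MAD or RMSE, so a small bias can coexist with large pixel-level errors.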
With the visualization of the gridded products, the spatial distribution of the annual precipitation could be clearly identified in
Figure 12 and
Figure 13. In general, there were significant differences between the precipitation products, and the annual rainfall was unevenly distributed over the Mekong River basin, ranging from roughly 250 mm to well above 2250 mm. Moreover, the LMB received much higher average annual precipitation than the UMB. The observed data revealed that north-central Lao PDR and the eastern mountainous areas bordering Vietnam receive the largest annual rainfall (more than 2000 mm).
A similar pattern of rainfall distribution was also noted in the case of the monthly precipitation.
Figure 14 illustrates the spatial distribution of the precipitation in August 2014, one of the wettest months of the year in the Mekong basin. The maps again clearly show that the satellite-based precipitation product overestimates both the annual and monthly precipitation, especially in the LMB. The two corrected rainfall products show opposite spatial patterns: while the ConvAE model closely matches the observed rainfall data, the adjusted precipitation from the statistical method does not.
Table 4 provides quantitative information on the differences between the precipitation products.
As can be seen in
Table 4, the figures again indicate the effectiveness of the ConvAE model in both verification years. The correlation coefficient (Corr) of the ConvAE model, which measures the pixel-by-pixel agreement with the observed data in the spatial distribution, was 0.91 and 0.84 for 2014 and 2015, respectively. In addition, other indicators of the ConvAE model (for example, the RMSE of 174 mm, MAD of 134 mm, and bias of 39 mm in 2014) also showed the smallest pixel-by-pixel difference. For satellite-based precipitation, the overestimation was clearly evident from the MAD and bias indicators (the bias was positive). Moreover, its average annual rainfall over the Mekong River basin differed greatly from the observed data, by 574 mm in 2014 and 448 mm in 2015. Nevertheless, the satellite-based precipitation product achieved a notable spatial correlation. The correlation values in the two years of the testing period were 0.61 and 0.63, respectively, which were higher than those of the statistical method but lower than those of the ConvAE model.
Another important point concerns the corrected data from the statistical method. Despite an impressive temporal correlation with the observed data, with an NSE value of 0.83 (see
Table 3),
Table 4 reveals the uncertainty of the statistical method in the spatial distribution, as well as spatial correlation. The correlation values for this method were only 0.32 for 2014 and 0.46 for 2015, which were the lowest values out of the three products mentioned in
Table 4. In addition, the bias is negative, which means that the average annual precipitation per grid cell from the statistical method is smaller than that of the observed data, with values of −61 mm in 2014 and −226 mm in 2015.
4.4. Spatial Bias Correlation
Finally, pixel-by-pixel precipitation differences between the corrected products and observed data are also of interest and have been visualized in
Figure 15,
Figure 16 and
Figure 17.
The spatial bias distribution of the precipitation products is obtained by pairing each pixel of a precipitation product with the corresponding pixel of the observation data and calculating the difference for each pair. Positive values indicate that the compared precipitation was higher than the observed precipitation, and negative values indicate the opposite. In order to clearly illustrate the pixel-by-pixel differences between the compared precipitation products and observed data, a fixed scale was applied to visualize the results. This scale ranged from −1000 mm to 1000 mm for the annual rainfall bias (
Figure 15 and
Figure 16) and ranged from −200 mm to 200 mm for the monthly rainfall bias (
Figure 17).
Overall, the ConvAE model demonstrated the lowest bias among the three products described. The satellite-based precipitation again clearly showed overestimation, especially in the LMB, where the pixel-by-pixel bias of this product was mostly positive, with differences of more than 1000 mm in the annual rainfall. Meanwhile, instability and uncertainty were observed for the statistical method in both the annual and monthly rainfall. The spatial bias pattern of this method showed considerable differences between pixels over the Mekong River basin. Although the bias value, that is, the average of the pixel-by-pixel differences, was not high (see
Table 4), at 37 mm for 2014 and −61 mm for 2015, the spatial bias of the precipitation fluctuated sharply.
In the case of the ConvAE model, there was satisfactory agreement between the adjusted precipitation and the observed data in both years of the testing phase. Furthermore, the spatial pattern of the precipitation bias shows that the pixel-by-pixel differences of the ConvAE model are negligible compared to those of the statistical method or the satellite-based product. However, the 2015 precipitation data showed an anomaly at one grid cell, where the adjusted value was much smaller than the observed value (see
Figure 16a). This was also the location that recorded unusually heavy rainfall in 2015 (more than 2250 mm) compared to the other precipitation products (see
Figure 13). The cause of this anomaly may be the observed data.
5. Conclusions
This paper proposes an effective approach based on the CNN model, called the ConvAE model, to address the problem of daily gridded precipitation bias correction for satellite-derived precipitation data. In addition to the ConvAE model, a statistical bias correction method, called the standard deviation method, was also applied in this study. The performance of the bias correction methods was carefully evaluated by comparing the corrected data with observed data in terms of both the temporal correlation and spatial correlation. The Mekong River basin was selected as the case study area, because it is one of the largest river basins in the world, covering six countries (most of which are developing countries). Therefore, reliable information on the precipitation over the Mekong River basin is valuable in forecasting extreme events such as floods or droughts.
With respect to the standard deviation method, the adjusted precipitation showed a notable temporal correlation. However, this method revealed instability and uncertainty in terms of the probability distribution, spatial correlation, and spatial bias distribution of precipitation. In contrast to the standard deviation method, the ConvAE model demonstrated superior and more stable performance in most comparisons conducted in this study. Moreover, the precipitation spatial distribution patterns illustrated the outstanding performance of the ConvAE model compared to the standard deviation method in describing the spatial relationships between adjacent grid cells. This can be explained by the fact that the ConvAE model is built on the CNN model, which has proved very effective in the field of computer vision. Meanwhile, the standard deviation method treats pixels as independent values and does not take into account the spatial relationship of precipitation. Another advantage of the ConvAE model is its ability to capture extreme rainfall events and rainfall distribution trends owing to the design of its architectural layers, i.e., the convolutional and pooling layers.
Although the precipitation bias correction problem was effectively solved by the ConvAE model, some limitations need to be considered. The results of this study depend closely on the gridded precipitation data sources. In particular, PERSIANN-CDR is used as the satellite-derived precipitation dataset, and APHRODITE is treated as the observed precipitation dataset. Both of these gridded daily precipitation products have the same spatial resolution of 0.25°. APHRODITE is the gridded precipitation product of an international cooperation program; therefore, it depends closely on the data sources provided by the countries in the region of interest. Moreover, precipitation data are usually used in conjunction with hydrological models to simulate rainfall–runoff processes for a specific basin. A further limitation of this study is that the rainfall-runoff process has not been simulated with the corrected data.
For hydrological studies in large areas, such as the Mekong basin, which spans many countries, updating the rainfall data continuously is an important requirement to ensure an accurate rainfall-runoff process simulation. However, it is difficult to construct updated rainfall datasets at the same time because of the close reliance on the data collection and distribution methods of the countries involved. On the other hand, satellite-based precipitation data, with their various products, availability, and coverage of large areas, may be a good option for large study basins, provided that these data are well-calibrated both spatially and temporally by the proposed technique.
This study is a first step towards enhancing our understanding of the application of deep-learning neural network models to hydrology-related problems. The findings of this study highlight the potential of the ConvAE model for the daily precipitation bias correction problem. Given that the APHRODITE project has been paused (since 2015), the corrected data from the ConvAE model promise to be a reliable alternative data source. Furthermore, the ConvAE model could be applied to other satellite-based precipitation products, higher-resolution precipitation data (for example, data with a spatial resolution of 0.05° or radar data), or other problems related to gridded data.