Tutorial For Merging Satellite-Based Precipitation Datasets With Ground Observations Using RFmerge
21 May 2020
1 About
This vignette describes a basic application of the RFmerge function to create an improved precipitation
dataset by combining two different satellite-based precipitation products with ground-based observations and
user-selected covariates (i.e., digital elevation model and Euclidean distances to rain gauge stations).
We use Valparaiso (Chile) as a case study to show how to generate this product at a daily temporal
scale and a spatial resolution of 0.05°, from January to August 1983. This example requires the following data
from the user:
i) time series of rainfall observations,
ii) metadata describing the spatial coordinates of the rain gauges,
iii) the Climate Hazards Group InfraRed Precipitation with Station data version 2.0 (CHIRPSv2),
iv) the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks -
Climate Data Record (PERSIANN-CDR), and
v) the Shuttle Radar Topography Mission (SRTM-v4) digital elevation model (DEM).
In addition, Euclidean distances are used as covariates; they are computed automatically within
RFmerge as the Euclidean distances from each rain gauge station to every grid cell within the study area.
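As a rough illustration of what a Euclidean-distance covariate looks like, the following base R sketch computes the distance from each of two made-up stations to every cell of a small hypothetical grid in projected coordinates (metres). The names `grid`, `stations.xy` and `dist.layers` are assumptions for this example only; RFmerge builds these layers internally and its actual implementation may differ.

```r
# Hypothetical 3 x 3 grid of cell centres, in projected coordinates (metres)
grid <- expand.grid(x = c(0, 1000, 2000), y = c(0, 1000, 2000))

# Two made-up rain gauge stations (ID, x, y)
stations.xy <- data.frame(ID = c("st1", "st2"),
                          x  = c(0, 2000),
                          y  = c(0, 1000))

# One distance layer per station: distance from that station to every grid cell
dist.layers <- sapply(seq_len(nrow(stations.xy)), function(i) {
  sqrt((grid$x - stations.xy$x[i])^2 + (grid$y - stations.xy$y[i])^2)
})
colnames(dist.layers) <- stations.xy$ID

dim(dist.layers)   # 9 grid cells (rows) x 2 stations (columns)
```

Each column can be thought of as one static covariate layer, so a study area with n stations contributes n additional distance layers.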
2 Installation
Install the latest stable version (from CRAN):
install.packages("RFmerge")
Alternatively, you can try the development version from GitHub:
if (!require(devtools)) install.packages("devtools")
library(devtools)
install_github("hzambran/RFmerge")
3 Setting up the environment
1. Load other packages that will be used in this analysis:
library(zoo)
library(sf)
library(rgdal)
library(raster)
2. Load the RFmerge package, which contains the main function used in the analysis and required datasets:
library(RFmerge)
Next, we need to load the satellite-based precipitation datasets and the other covariates. For this example,
CHIRPSv2 (Funk et al., 2015) and PERSIANN-CDR (Ashouri et al., 2015), both at a spatial resolution of 0.05°, are
used as dynamic covariates (in this context, dynamic means time-varying). The SRTM-v4 digital elevation
model (DEM), also at a spatial resolution of 0.05°, is used as a static covariate to account for the
impact of elevation on precipitation (in this context, static means that the covariate does not change in time).
chirps.fname <- system.file("extdata/CHIRPS5km.tif",package="RFmerge")
prsnncdr.fname <- system.file("extdata/PERSIANNcdr5km.tif",package="RFmerge")
dem.fname <- system.file("extdata/ValparaisoDEM5km.tif",package="RFmerge")
Then, we want to visualise the first six rows of the spatial metadata:
head(ValparaisoPPgis)
Plotting the accumulated precipitation estimates for the first eight months of 1983 from CHIRPS and
PERSIANN-CDR, and overlaying the boundaries of the study area (only its first attribute):
chirps.total <- sum(CHIRPS5km, na.rm= FALSE)
persiann.total <- sum(PERSIANNcdr5km, na.rm= FALSE)
plot(chirps.total, main = "CHIRPSv2 [Jan - Aug] ", xlab = "Longitude", ylab = "Latitude")
plot(ValparaisoSHP[1], add=TRUE, col="transparent")
[Figure: CHIRPSv2 [Jan - Aug]; x-axis: Longitude, y-axis: Latitude]
plot(persiann.total, main = "PERSIANN-CDR [Jan - Aug]", xlab = "Longitude", ylab = "Latitude")
plot(ValparaisoSHP[1], add=TRUE, col="transparent")
6 Preparing input data
In order to use the spatial information stored in ValparaisoPPgis, we first need to convert it into an
sf (simple features) object, using the longitude and latitude fields stored in the lon and lat columns:
stations <- ValparaisoPPgis
( stations <- st_as_sf(stations, coords = c('lon', 'lat'), crs = 4326) )
[Figure: SRTM-v4; x-axis: Longitude, y-axis: Latitude]
Note that for this example we want to produce a merged precipitation product only from January 1st to
August 31st, 1983, i.e., 243 days; therefore, the precipitation products used as covariates have 243 layers
each (one for each day in the analysis period), which is the same as the number of rows of the
ValparaisoPPts object.
nlayers(CHIRPS5km)
## [1] 243
( nlayers(CHIRPS5km) == nlayers(PERSIANNcdr5km) )
## [1] TRUE
( nlayers(CHIRPS5km) == nrow(ValparaisoPPts) )
## [1] TRUE
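The figure of 243 days can be checked quickly with base R by building the daily date sequence for the analysis period. The variable name `dates` is only used for this illustration.

```r
# Daily sequence covering the analysis period (1983 is not a leap year)
dates <- seq(from = as.Date("1983-01-01"), to = as.Date("1983-08-31"), by = "day")
length(dates)   # 243
```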
Now, we have to verify that the precipitation products and the DEM have the same spatial extent:
extent(CHIRPS5km)
## class : Extent
## xmin : -71.85
## xmax : -69.95
## ymin : -34
## ymax : -32
( extent(CHIRPS5km) == extent(PERSIANNcdr5km) )
## [1] TRUE
( extent(CHIRPS5km) == extent(ValparaisoDEM5km) )
## [1] TRUE
and the same spatial resolution:
res(CHIRPS5km)
If you would like to use Euclidean distances as covariates in RFmerge, you need to be sure that
they will be correctly computed from your datasets.
Because ValparaisoPPgis, CHIRPS5km, and PERSIANNcdr5km all use geographical coordinates, we first need
to reproject them into a projected coordinate reference system (CRS), i.e., one defined on a flat,
two-dimensional surface.
First, we reproject the rainfall observations from geographic coordinates into WGS 84 / UTM zone 19S
(EPSG:32719):
stations.utm <- sf::st_transform(stations, crs=32719) # for 'sf' objects
Second, we reproject the satellite products from geographic coordinates into WGS 84 / UTM zone 19S
(EPSG:32719):
#utmz19s.p4s <- CRS("+init=epsg:32719") # WGS 84 / UTM zone 19S
utmz19s.p4s <- sf::st_crs(stations.utm)$proj4string # WGS 84 / UTM zone 19S
Third, we reproject the polygon with the boundaries used to define the study area from geographic coordinates
into WGS 84 / UTM zone 19S (EPSG:32719):
ValparaisoSHP.utm <- sf::st_transform(ValparaisoSHP, crs=32719)
Fourth, we create a new data.frame with the expected metadata, i.e., at least, ID, lat, lon:
st.coords <- st_coordinates(stations.utm)
lon <- st.coords[, "X"]
lat <- st.coords[, "Y"]
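The lines above only extract the projected coordinates. A minimal sketch of assembling such coordinates into a metadata data.frame with ID, lon and lat columns could look as follows. The coordinate values, the station identifiers and the object names ending in `.demo` are made up for this illustration; in practice the identifiers would come from the station metadata.

```r
# Made-up projected coordinates, shaped like the matrix returned by sf::st_coordinates()
st.coords.demo <- cbind(X = c(250000, 300000, 350000),
                        Y = c(6300000, 6350000, 6400000))

# Hypothetical station identifiers
ids.demo <- c("A01", "A02", "A03")

# Metadata table with the three expected columns
metadata.demo <- data.frame(ID  = ids.demo,
                            lon = st.coords.demo[, "X"],   # easting, in metres
                            lat = st.coords.demo[, "Y"],   # northing, in metres
                            stringsAsFactors = FALSE)
str(metadata.demo)
```

Note that after reprojection the lon and lat columns hold eastings and northings in metres rather than degrees; the column names are kept only because they are the ones passed to RFmerge.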
You might want to skip (at your own risk) this reprojection step when the study area is small enough to
neglect the impact of using geographic coordinates in the computation of Euclidean distances.
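To get a feeling for the size of the distortion, the base R sketch below compares two pairs of points that are both exactly one degree apart near the latitude of the study area. It uses the haversine formula on a sphere of radius 6371 km, which is an assumption of this example and not something RFmerge does; the helper name `haversine.km` is hypothetical.

```r
# Great-circle (haversine) distance in km, assuming a spherical Earth of radius 6371 km
haversine.km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to.rad <- pi / 180
  dlat <- (lat2 - lat1) * to.rad
  dlon <- (lon2 - lon1) * to.rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(dlon / 2)^2
  2 * R * asin(sqrt(a))
}

# Two pairs of points, both 1 degree apart in geographic coordinates
ew.km <- haversine.km(-71, -33, -70, -33)   # east-west separation
ns.km <- haversine.km(-71, -33, -71, -32)   # north-south separation

round(c(east.west = ew.km, north.south = ns.km), 1)
```

At this latitude one degree of longitude is roughly 93 km while one degree of latitude is roughly 111 km, so distances computed directly on degrees treat the east-west direction as longer than it really is.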
7 Running RFmerge
7.1 Covariates
Now, we can create the covariates object to be used in the RFmerge function. To do this, we will create a
list object with the different covariates. Please note that the order and names of the covariates in the
list are not important.
covariates.utm <- list(chirps=CHIRPS5km.utm, persianncdr=PERSIANNcdr5km.utm,
dem=ValparaisoDEM5km.utm)
7.2 Setup
Finally, if you want the resulting merged files to be written to disk, you need to define the output directory
(drty.out) before running RFmerge. Then, you can run the RFmerge function as follows:
Without using parallelisation (default option):
drty.out <- file.path(tempdir(), "Test.nop")
rfmep <- RFmerge(x=ValparaisoPPts, metadata=ValparaisoPPgis.utm, cov=covariates.utm,
id="ID", lat="lat", lon="lon", mask=ValparaisoSHP.utm,
training=0.8, write2disk=TRUE, drty.out=drty.out)
Detecting if your OS is Windows or GNU/Linux, and setting the ‘parallel’ argument accordingly:
# TRUE when R is running on Windows, FALSE otherwise (e.g., GNU/Linux)
onWin <- R.version$os %in% c("mingw32", "mingw64")
( parallel <- if (onWin) "parallelWin" else "parallel" )
## [1] "parallel"
Using parallelisation, with a maximum number of nodes/cores to be used equal to 2:
par.nnodes <- min(parallel::detectCores()-1, 2)
drty.out <- file.path(tempdir(), "Test.par")
rfmep <- RFmerge(x=ValparaisoPPts, metadata=ValparaisoPPgis.utm, cov=covariates.utm,
id="ID", lat="lat", lon="lon", mask=ValparaisoSHP.utm,
training=0.8, write2disk=TRUE, drty.out=drty.out,
parallel=parallel, par.nnodes=par.nnodes)
7.3 Expected outputs
If RFmerge runs without problems and write2disk=TRUE, the following output files will be stored in your
user-defined drty.out directory:
i) the rain gauge stations used as training and evaluation datasets; and
ii) the final merged product (individual GeoTiff files).
The aforementioned resulting objects will be stored within drty.out as follows:
• Ground_based_data/Training/: This directory stores the time series and metadata used to train RF-MEP,
as a zoo file (Training_ts.txt) and a text file (Training_metadata.txt), respectively.
• Ground_based_data/Evaluation/: This directory stores the time series and metadata that were not used
to train the Random Forest model and are therefore available as an independent dataset for evaluating
the results. The Evaluation_ts.txt and Evaluation_metadata.txt files are stored as zoo and CSV files,
respectively.
• RF-MEP/: This directory stores the individual GeoTiff files produced by the RF-MEP algorithm,
at the same spatial resolution as the selected covariates.
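The split into training and evaluation stations is controlled by the training argument (0.8 in the calls above). As a hedged illustration of what an 80/20 split of stations means, the base R sketch below randomly assigns ten made-up station IDs to the two groups. The sampling scheme and all object names are assumptions of this example; how RFmerge selects the stations internally is not described in this tutorial.

```r
set.seed(42)                              # for a reproducible split
station.ids <- sprintf("st%02d", 1:10)    # ten made-up station IDs
training    <- 0.8                        # fraction of stations used for training

n.train   <- round(training * length(station.ids))
train.ids <- sample(station.ids, size = n.train)   # stations used to train the model
eval.ids  <- setdiff(station.ids, train.ids)       # stations held out for evaluation

length(train.ids)   # 8
length(eval.ids)    # 2
```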
7.4 Evaluation
After running RFmerge, we will use the evaluation dataset of rain gauge observations to evaluate the performance
of RF-MEP at stations not used in the merging procedure. For this purpose, we will create a stack with
the obtained merged product and will import the time series and metadata from the evaluation set:
ts.path <- paste0(drty.out, "/Ground_based_data/Evaluation/Evaluation_ts.txt")
metadata.path <- paste0(drty.out, "/Ground_based_data/Evaluation/Evaluation_metadata.txt")
Visualisation of one day of precipitation using RF-MEP, overlaying the boundaries of the study area (only
its first attribute):
plot(rfmep[[11]], main="RF-MEP precipitation for 1983-01-11", xlab="Longitude", ylab="Latitude")
plot(ValparaisoSHP.utm[1], add=TRUE, col="transparent")
plot(eval.gis.utm, add=TRUE, pch = 16, col="black")
The total amount of precipitation over Valparaiso for January to August 1983 according to RF-MEP,
overlaying the boundaries of the study area (only its first attribute):
rfmep.total <- sum(rfmep, na.rm= FALSE)
First, we will compare RF-MEP (and the two precipitation products used as covariates) with the rain gauge
data from the evaluation set. To extract the RF-MEP precipitation values at the gauge locations, we will use
the extract function from the raster package:
coordinates(eval.gis) <- ~ lon + lat
chirps.ts <- t(raster::extract(CHIRPS5km.utm, eval.gis))
persiann.ts <- t(raster::extract(PERSIANNcdr5km.utm, eval.gis))
To evaluate the performance of RF-MEP and the products used in its computation, the Nash-Sutcliffe
efficiency (NSE) will be used. The optimal value for the NSE is one.
# Defining a function to compute the Nash-Sutcliffe efficiency(NSE)
NSE <- function(sim, obs) return( 1 - (sum((obs - sim)^2)/ (sum((obs - mean(obs))^2)) ) )
# Computing the NSE between the observed rainfall measured at each rain gauge
# of the evaluation dataset and CHIRPSv2, PERSIANN-CDR, and the merged product `rfmep`:
for (i in 1:nsres) {
ldates <- time(eval.ts)
lsim <- zoo(sres[[i]], ldates)
nse.table[, (i+1)] = NSE(sim= lsim, obs= eval.ts)
} # FOR end
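The NSE function defined above returns a single value for whatever it is given and does not handle missing data. A minimal sketch of a variant that skips time steps with missing values and is applied station by station is shown below. The function name `nse`, the two small matrices and the station names are made up for this illustration and are not part of RFmerge.

```r
# NSE for a single pair of series, ignoring time steps where either value is missing
nse <- function(sim, obs) {
  ok <- !is.na(sim) & !is.na(obs)
  1 - sum((obs[ok] - sim[ok])^2) / sum((obs[ok] - mean(obs[ok]))^2)
}

# Made-up observations (rows = days, columns = stations) and a simulated product
obs <- cbind(st1 = c(0, 2, 4, 6), st2 = c(1, 1, 3, NA))
sim <- cbind(st1 = c(0, 2, 4, 6), st2 = c(2, 2, 2, 5))

# One NSE value per station
nse.by.station <- sapply(colnames(obs), function(s) nse(sim[, s], obs[, s]))
nse.by.station
```

Here st1 is reproduced perfectly (NSE of 1), whereas st2 gets a negative NSE (-0.125), meaning the simulated series performs worse than simply using the mean of the observations.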
Finally, a boxplot comparing the performance, in terms of NSE, of the merged product with that of the
original satellite-based products used as covariates will be produced:
# Boxplot with a graphical comparison
sres.cols <- c("powderblue", "palegoldenrod", "mediumseagreen")
boxplot(nse.table[,2:4], main = "NSE evaluation for Jan - Aug 1983",
xlab = "P products", ylab = "NSE", ylim = c(0, 1), # horizontal=TRUE,
col=sres.cols)
legend("topleft", legend=c("CHIRPS", "PERSIANN-CDR", "RF-MEP"), col=sres.cols, pch=15, cex=1.5, bty="n")
grid()
8 Full vignette
In order to reduce the package dependencies for CRAN, this vignette was built with reduced functionality.
The full vignette can be found here.
9 Software details
This tutorial was built under:
## [1] "x86_64-pc-linux-gnu (64-bit)"
## [1] "R version 4.0.0 (2020-04-24)"
## [1] "RFmerge 0.1-10"
10 References
1. Ashouri, H., Hsu, K.-L., Sorooshian, S., Braithwaite, D. K., Knapp, K. R., Cecil, L. D., Nelson, B. R.,
and Prat, O. P. (2015). PERSIANN-CDR: Daily Precipitation Climate Data Record from Multisatellite
Observations for Hydrological and Climate Studies, Bulletin of the American Meteorological Society,
96, 69–83, doi:10.1175/BAMS-D-13-00068.1.
2. Baez-Villanueva, O. M., Zambrano-Bigiarini, M., Beck, H., McNamara, I., Ribbe, L., Nauditt, A.,
Birkel, C., Verbist, K., Giraldo-Osorio, J. D., and Thinh, N. X. (2020). RF-MEP: A novel Random Forest
method for merging gridded precipitation products and ground-based measurements, Remote Sensing
of Environment, 239, 111606, doi:10.1016/j.rse.2019.111606.
3. Funk, C., Peterson, P., Landsfeld, M., Pedreros, D., Verdin, J., Shukla, S., Husak, G., Rowland, J.,
Harrison, L., Hoell, A., and Michaelsen, J. (2015). The climate hazards infrared precipitation with stations -
a new environmental record for monitoring extremes, Scientific Data, 2, 150066, doi:10.1038/sdata.2015.66.
4. Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B., and Gräler, B. (2018). Random forest as
a generic framework for predictive modeling of spatial and spatio-temporal variables, PeerJ, 6, e5518.