1. Introduction
Fossil fuels (coal, petroleum, and natural gas) encompassed 81% of the total primary energy supply in 2016 [
1]. These energy sources are characterized by their limited reserves and for producing greenhouse gases (GHG), which cause, as consequences, the increase of the average temperature and extreme changes in the climate, among others. Climate change stabilization requires us to impose limitations on GHG emissions from fossil fuel combustion, thus, to try to minimize the effects of climate change, governments of different countries have accepted and ratified agreements, including an energy transition from fossil fuels to renewable energies [
2].
One of the least widespread renewable energies in the world is geothermal [
3], which uses the Earth’s internal heat to generate heat in buildings and to generate energy in geothermal plants. Geothermal energy is defined as the energy stored in the form of heat beneath the surface of solid Earth [
4]. Among the different types of geothermal energy: very low, low, high, and very high enthalpy, the present research is focused in low or very-low enthalpy types, also known as shallow types [
5]. This energy is characterized by being partially renewable, having high availability, reducing GHG emissions [
6], and producing less waste than other energy sources.
The detection of areas with geothermal potential was initially carried out by means of perforations to measure the temperature in the subsoil. Despite the fact that this method is accurate, it relies on the estimation of the geothermal reservoir parameters (thickness, porosity, permeability, and temperature) to estimate the geothermal potential [
7]. Another factor to consider are the significant costs for the drilling operations. For this reason, different researchers have tried to develop methodologies to reduce costs in the search for these areas, taking into great consideration the use of remote sensing as a means to prioritize zones of potential geothermal energy [
8,
9], due to the availability of free satellite imagery and its cost-effective ratio for the geothermal exploration of large areas. One approach that involves remote sensing is the search for areas of rocks altered by hydrological action using images from one or more sensors, since they are indicative of thermal fluids being discharged along faults [
10]. Surface thermal properties are an indicator of geothermal resources, which can be measured by the use of images from thermal sensors [
11,
12]. Additionally, temperatures can also be obtained using multispectral imagery [
13] acquiring land surface temperature (LST) values.
Among the geomatic techniques, geographic information systems (GIS) offer a whole new dimensionality by combining the remote sensing data with other geographic information sources as in [
14], in which gravimetric anomalies, thermal images, and the location of geological faults were employed. Among the available GIS techniques, the use of random forest classifiers stands out, since they allow the ability to extract information from categorical and numerical variables. As a result, they support the decision-taking process without significant computational requirements. These algorithms are popular among the scientific community due to its ease of use and being able to be applied in multiple fields as in the case of [
15], in which they were used to obtain a model of the demand of electricity. Another transversal example is the application to the solar energy field [
16], in which authors employed classifiers such as random forest, extra trees, and support vector regression to carry out a comparison among them and to evaluate which hour range is more useful for a solar thermal system.
The present article proposes a combination of geomatic techniques along with different free/open sources to delimit potential geothermal areas potential in a simple and non-invasive way. The rest of the paper is organized as follows:
Section 2 describes the methodology of the developed approach and the materials employed. Prediction results are presented in
Section 3, along with the validation of the classification. Discussion is detailed in
Section 4, whereas concluding remarks and future research directions are presented at the end of the paper in
Section 5.
4. Discussion
The result of the classification, as can be seen in
Figure 11a, shows that areas with potential greater than 80 W·m
−1 (highlighted in red) are found mainly in the southeast and east of the study area. Potential zones between 35 and 50 W·m
−1, as expected, are located near the potential area, greater than 80 W·m
−1, located in the east, in addition to a small area located near the center of the study area. The surface of each geothermal class is summed up in
Table 4.
As can be seen in
Figure 11b, there are a large number of common areas between the results obtained from the random forest classification and the IRENA map. Besides, it can be seen that the map obtained from the proposed classifier, the area related to a geothermal potential of 40–60 W·m
−1 is larger than the stated by ground truth, which categorizes as 60–80 W·m
−1. Comparing this area with the Bouguer gravimetric anomalies map we can notice that the area of potential greater than 80 W·m
−1 is located where the gravimetric anomalies are at its lowest, meaning that there is a relationship between them, as it was described in several works like [
14,
58,
59].
Paying attention to the geological map (
Figure 7) and the geothermal potential map (
Figure 10) there is a clear resemblance, as some areas look similar in both maps. For example, the area of potential greater than 80 W·m
−1 located in the bottom east of the study area occupies the same area in the geological map as the loose sediments. Another example that stands out more is the case of the area of potential between 60 and 80 W·m
−1, which occupies the same area as the plutonic or intrusive igneous rocks. This relationship between some types of rocks and sediments is not unfamiliar as some other researchers have noticed this relationship before [
10,
12].
As for the geological faults map and the LST map, it is difficult to associate them with the geothermal potential map directly. In the case of the geological faults map (
Figure 8), both known and unknown faults are distributed all over the study area in areas occupied by different types of geothermal potential classes. In the case of the LST map (
Figure 6) this is due to the fact that, as it was said before, the areas with the highest temperatures in the map correspond with areas of infrastructures, whereas in other areas the temperature variation is not enough to establish comparisons. The relationship between temperature in the subsoil and the surface using remote sensing techniques has been a goal of many works like [
14,
60] making it a reliable technique to apply in a geothermal study.
To refine the areas related to geothermal reservoirs, the land cover map was used to take into account those areas that cannot be exploited because they are occupied by infrastructures (roads, buildings, etc.) water and vegetation zones, and areas without reliable data, as the zones affected by the cloud cover during the image acquisition. As can be seen in
Figure 12 and in
Table 5, the number of potential zones has been reduced to almost half for the classes 35–50 W·m
−1 and 40–60 W·m
−1.
To assess the influence of the various elements of the presented workflow in the detecting ability of areas of geothermal potential, a sensitivity study was carried out. It aimed to highlight the importance the various features (see
Figure 1) in detecting locations with the RF classifier.
It was considered a total of six combinations with three elements, namely: geological map, faults, and gravimetric anomalies. The land cover and LST map are included always in the assessment. A RF classification with a depth = 8 and number of trees = 100 trees was applied with a
k-fold cross-validation with
k = 10. In
Table 6, the overall accuracy, the Kappa index, and the area under the ROC curve (AUC) for the six combinations, as well as the reference are reported.
As it is shown in the previous table, there is a common behavior for the three variables considered. The faults’ influence zone is the class with the lowest influence in the final results, lowering the Kappa index about 5.4% when it is absent. However, when the geological map or the gravimetric anomalies do not participate in the classification the Kappa index worsens between 14.2% and 16.0% respectively. In the last three rows of
Table 6 the effect is shown when only three variables are considered: The land cover and LST map plus the highlighted variable. First of all, the results when only the faults are taken into account are very significant, yielding results similar to a random classification (Kappa = 0, and AUC = 0.5). Secondly, the geological map plays the most significant role, with a decrease of the Kappa index of only 16.6%, whereas the AUC remains approximately 90%. Finally, the inclusion of the gravimetric anomalies is not sufficient for an adequate classification, as stated by the low Kappa index (56%).
In order to assess the ability of the RF algorithm to forecast the geothermal potential, the proposed methodology is tested, training the RF classifier with a non-random subset of the training polygons. Next, the overall accuracy of the rest of the ground truth information, which has stayed completely unknown to the RF during training, is quantified. So, the RF is trained using only the training polygons located in the southeast quadrant of the study area, where the four classes of geothermal potential are present. A total of 171 polygons are considered, namely 30% of the original set. The RF configuration remains the same: depth = 8, number of trees = 100, and a
k-fold validation with k = 10. The RF classification for the cross-validation achieved a Kappa index of 86.4%, an overall accuracy of 92.7%, and an AUC equal to 98.6%. These values are pretty similar to those obtained for the complete study area (see previous section). The subset RF classification is applied to the southeast quadrant of 529.8 km
2 (23.8% of the total area of the study region). Next, the trained classifier is applied to areas progressively larger up to the complete study region (2227.31 km
2). The overall accuracy and Kappa index for the different areas is evaluated in relation to the original RF classification (
Figure 11a) as shown in
Table 7.
As shown in
Table 7, the results of the subset RF classification diverges from the reference classification (
Figure 11a) proportionally to the distance from the training area. It is significant that the Kappa index decrease abruptly when the tested region is about 2.8 times the trained area and then the values decrease with a soft trend. For predictions up to 2.1 times the original training area, the proposed methodology is able to achieve an adequate classification with a Kappa > 75% [
61,
62].
As future lines of improvement, the enhancement of the initial dataset, as for example the generation of gravimetric anomalies by field observations, for a fine detection of geothermal potential zones, is recommended. Another improvement would be a geological survey of the study area, which would imply an enrichment of the geological classes as well as the reclassified informational classes. However, the potential improvement of the final precision will imply a significant increase in the cost of the approach; by these reasons, the present research, based on open/free data sources, provides an optimal baseline according to the authors.
The addition of data related to aquifer zones would allow obtaining the areas suitable for the exploitation of geothermal energy through vertical circuits, which take advantage of geothermal energy by taking and reinjecting water from aquifers. The thermal data would be improved by satellite night thermal images that would allow the ability to compare with the day images, and to derive thermal inertia maps.
Finally, it would be of interest to carry out geothermal drill tests in the areas identified with great geothermal potential, in addition to several samples in other areas spread over the study area, to validate the technique and the results obtained.