1.2. Tropospheric Ozone
Tropospheric ozone, also called environmental ozone, is a colorless gas that is mainly created through photochemical reactions of nitrogen oxides (NOx) and volatile organic compounds (VOCs) [10]. Additionally, soil and plants emit NOx and VOCs (biogenic sources) [11].
O3 is formed through reactions driven by sunlight. The phenomenon in which precursors are emitted and accumulate, allowing the reactions that form ozone to occur, is called photochemical smog [12]. Smog episodes are the main production process for tropospheric ozone [13]. Ozone is the most important oxidant in the lower levels of the atmosphere, which makes it potentially dangerous because it reacts with, and degrades, most compounds. It therefore affects both materials and living things, externally but also internally through the gas exchange that takes place when breathing, which is the main way humans are affected by ozone pollution. The response also varies greatly between individuals owing to genetics (antioxidant capacity of cells), age (children and the elderly are the most sensitive groups), and the presence of respiratory conditions (allergies and asthma) [14].
When the ozone concentration is high, physical exercise is not advisable, especially in the central hours of the day, since exercise raises the breathing rate and therefore the amount of ozone entering the lungs [11,14]. With regard to vegetation, a high level of ozone damages the leaves, reduces plant growth, and leads to a lower crop yield [11,14]. In the vicinity of the sources (cities, roads, and industries), fresh emissions can react with ozone and locally reduce its concentration; nitric oxide emitted locally by automobiles removes ozone carried from other areas [3]:
NO + O3 → NO2 + O2
where NO is nitric oxide and NO2 is nitrogen dioxide. However, at a certain distance, the photochemical formation of O3 is reactivated, which is why the concentrations of O3 are normally low in industrial centers and urban areas, whereas in rural areas and on the outskirts of urban areas the concentration is higher [11].
The photochemical formation of ozone does not occur during the night, because there is no sunlight [11]. During the day, O3 concentrations peak from noon onward, when the radiation is highest; over the year, the maxima occur during the spring and summer months.
The World Health Organization [15] has established that when the ozone concentration in the air that is breathed is higher than
and is maintained for more than 8 h, there is a probability of significant health effects [16].
To synthesize ozone, a machine called an ozone generator or ozonizer is used, in which a high voltage produces an electrical discharge known as the corona effect, which generates ozone from O2 [17]. Ozone generation is applied in the elimination of bad odors and disinfection of the air, in the treatment and purification of water, and in ozone therapy.
The preceding paragraphs justify carrying out a robust analysis [18,19,20] of the information given by ozone measuring systems, so that the behavior of ozone concentration in cities can be described with high precision, even when there are a large number of extreme observations and the data under study follow heavy-tailed distributions [19,21]. This last aspect has become a bottleneck because outliers often represent the response of the physical system to certain types of inputs. These values therefore carry useful information and should not be eliminated; they must be treated appropriately in order to understand what they reveal about the system from which they come. For this reason, robust data analysis gains strength as a highly precise way to explain the behavior of air pollution variables, even when they come from heavy-tailed distributions with a large number of outliers.
This approach, applied to the analysis of ozone concentration in urban areas, constitutes one of the contributions of the research presented in this paper.
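The sensitivity of classical estimators to outliers, and the stability of robust ones, can be illustrated with a minimal sketch. The readings below are hypothetical values invented for illustration (they are not data from Belisario station), and the median and the scaled median absolute deviation (MAD) are used only as familiar examples of robust location and scale estimators; they are not necessarily the estimators used later in this paper.

```python
import statistics


def mad(values, scale=1.4826):
    """Median absolute deviation.

    The factor 1.4826 makes the MAD comparable to the standard
    deviation when the data are Gaussian.
    """
    med = statistics.median(values)
    deviations = [abs(v - med) for v in values]
    return scale * statistics.median(deviations)


# Hypothetical ozone readings with a single extreme observation (150).
readings = [30, 32, 28, 31, 29, 33, 30, 150]

classical_location = statistics.mean(readings)   # pulled up by the outlier
classical_scale = statistics.stdev(readings)     # inflated by the outlier
robust_location = statistics.median(readings)    # barely affected
robust_scale = mad(readings)                     # barely affected

print(f"mean   = {classical_location:.3f}, stdev = {classical_scale:.3f}")
print(f"median = {robust_location:.3f}, MAD   = {robust_scale:.3f}")
```

The extreme value is kept in the sample rather than deleted; the robust estimators simply limit its influence on the summary.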
The foremost objective of this research is the robust analysis of the concentration of ozone at the Belisario air quality monitoring station in Quito, Ecuador. This analysis covers the period from 1 January 2008 to 31 December 2019 [22,23].
Quito was chosen because of the effect of the city's vehicular traffic, poor fuel quality, and traffic jams on air pollution levels. Furthermore, other factors in the city, such as industrial zones and population growth, could increase air pollution levels [23].
The design of a sensor-based system aimed at measuring and georeferencing atmospheric variables, including ozone, is presented in [24]. A low-cost air quality monitoring station was presented in [25], where the authors proposed a calibration procedure based on artificial intelligence techniques. Another air quality monitoring system was presented in [26], which employed a Zigbee network to improve the network layout. Additionally, a long short-term memory (LSTM) network was used in [26] to predict the urban air quality pollution period. Moreover, O3 and seven other indicators were used in [26] to qualitatively validate the experiential knowledge of the authors; to give the cells long-term memory, the hidden-layer cells of the recurrent neural network were replaced by LSTM cells.
A meta-analysis of ground-level ozone pollution was carried out in [27]. This analysis was performed to find the similarities and differences between the risk of asthma aggravation and ground-level ozone exposure measurements. In [27], three different averaging periods (i.e., 1-h daily maximum, 8-h daily maximum, and 24-h average concentrations) were used. The meta-analysis performed in [27] was carried out using pooled relative risks, 95% confidence intervals, and random-effects models [28]. To achieve robustness in the results, subgroup and sensitivity analyses were conducted in [28]. Moreover, Cochran's Q statistic and the I2 statistic [29] were used to measure heterogeneity and inconsistency in the meta-analysis. Furthermore, to test whether there was publication bias, the authors made use of a funnel plot and Egger's test [30].
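As a rough illustration of the heterogeneity measures mentioned above, the following sketch computes Cochran's Q and the I2 statistic with the usual inverse-variance weights. The function name and the input values are hypothetical, and the code is not taken from the cited studies.

```python
def cochran_q_and_i2(effects, std_errors):
    """Return (Q, I2 in percent) for a set of study effect estimates.

    effects    : per-study effect estimates (e.g., log relative risks)
    std_errors : per-study standard errors of those estimates
    """
    # Inverse-variance weights: more precise studies count more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Q: weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I2: share of total variation attributed to heterogeneity,
    # truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical example: two studies with very different effects.
q, i2 = cochran_q_and_i2([0.0, 4.0], [1.0, 1.0])
print(q, i2)
```

Values of I2 near zero suggest that the studies are consistent with one another, whereas large values indicate substantial between-study heterogeneity.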
In [31], a study was carried out in 48 cities of China from 2013 to 2017. In that study, time-series analyses were conducted and the authors used a generalized model combined with a random-effects model to estimate ozone levels. Another time-series study was performed in [32] in 184 cities of China from 2014 to 2017. In [32], the relationship between patients with pneumonia and ozone concentration was assessed by using a generalized additive model, and the authors provided robust evidence that such a relationship exists. The robustness of the relationship was tested by fitting two-pollutant models.
A 10-year study on the relationship between particulate matter and ozone exposure (PM and OE), and a depression and anxiety diagnosis (DAD) in Saxony, Germany, was conducted in [33]. In that research, the data used for the analysis corresponded to the information collected from 2005 to 2014. In [33], the analysis was performed by using generalized estimating equations [34], and the robust metric was the number of days for which the maximum value of the 8-h average ozone concentration was greater than
. Additionally, two-pollutant models were built and a sensitivity analysis was carried out, aimed at studying the relationship between PM and OE and DAD.
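An exceedance metric of this kind (the number of days on which the maximum 8-h average ozone concentration is above a threshold) can be sketched as follows. The function names, the sample data, and the threshold value are hypothetical, and, as a simplifying assumption, the 8-h windows are restricted to a single day instead of being allowed to span midnight.

```python
def daily_max_8h_mean(hourly):
    """Maximum of the running 8-h means within one day of hourly values."""
    window = 8
    if len(hourly) < window:
        raise ValueError("at least 8 hourly values are required")
    means = [
        sum(hourly[i:i + window]) / window
        for i in range(len(hourly) - window + 1)
    ]
    return max(means)


def count_exceedance_days(days, threshold):
    """Number of days whose maximum 8-h mean is above the threshold."""
    return sum(1 for hourly in days if daily_max_8h_mean(hourly) > threshold)


# Hypothetical data: one calm day and one day with an 8-h ozone episode.
calm_day = [10.0] * 24
episode_day = [0.0] * 8 + [100.0] * 8 + [0.0] * 8

# The threshold of 50.0 is an arbitrary illustrative value.
print(count_exceedance_days([calm_day, episode_day], threshold=50.0))
```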
The reasons for the high ozone concentration in Chengdu, China, in July 2017 were studied in [35], using both measurements and air quality simulations. To identify and quantify the VOC sources, positive matrix factorization [36,37,38] was used. Moreover, in [35], the impact of physical and chemical processes on ozone concentrations was analyzed by the integrated process rate method [39,40,41].
When studying the behavior of air pollution variables, researchers often face the problem that these variables either do not follow a Gaussian distribution or do not follow any known parametric distribution. In such cases, classical statistical inference methods cannot be used for data analysis, and nonparametric statistical inference must be used instead. For example, both Mann–Whitney U and Kruskal–Wallis tests [42,43] were applied in [44] to analyze vehicle emissions. In [44], 1000 vehicles were tested in order to find significant differences between the mean emissions of air pollutants. Other instances of authors using robust techniques to analyze the concentration of air pollutants are presented in [45,46,47,48,49,50,51,52,53,54].
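To make the nonparametric approach concrete, the following sketch computes the Kruskal–Wallis H statistic from pooled ranks, with average ranks for tied values and the standard tie correction. The helper names and the sample groups are hypothetical; in practice a statistical library would normally be used, and the p-value is obtained by comparing H with a chi-square distribution with k − 1 degrees of freedom, where k is the number of groups.

```python
def average_ranks(values):
    """Return (ranks, tie_sizes); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over the groups.
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)

    h = 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    return h / correction if correction > 0 else 0.0


# Hypothetical emission measurements for three groups of vehicles.
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Because the statistic depends only on ranks, a single extreme measurement changes H far less than it would change a comparison of means.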
In this research, 12 years of tropospheric ozone concentration measurements were analyzed using robust techniques. The urban area chosen for the study was Belisario station [22], and robust statistics [18,19,20] were used to determine robust central tendency and scale estimates of tropospheric ozone and to find parametric, nonparametric, and robust confidence intervals that explain the O3 concentration [18,19,20,42,43]. The O3 concentration measurements analyzed in this paper were taken from 1 January 2008 to 31 December 2019.
The analysis presented here allowed us to judge the central tendency of the data in light of their variability. The data were grouped and classified, and similarities and differences were determined. Confidence intervals were used to perform this classification, and several methodologies were used to analyze the data: classic, nonparametric, bootstrap, and robust methodologies.
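As an illustration of the bootstrap methodology mentioned above, the following sketch builds a percentile bootstrap confidence interval for the median. The function name, the number of resamples, the seed, and the sample values are assumptions made for illustration; this is not the exact procedure used in this paper.

```python
import random
import statistics


def bootstrap_median_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median.

    The data are resampled with replacement, the median of each
    resample is stored, and the interval is read from the empirical
    quantiles of those medians.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(data)
    medians = sorted(
        statistics.median(rng.choices(data, k=n))
        for _ in range(n_resamples)
    )
    lower = medians[int((alpha / 2) * n_resamples)]
    upper = medians[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical ozone readings with one extreme observation.
readings = [30, 32, 28, 31, 29, 33, 30, 150]
print(bootstrap_median_ci(readings))
```

Two groups whose intervals do not overlap can be regarded as different, which is the kind of comparison that an interval-based classification relies on.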
In [23], another analysis of air pollution variables in Quito is presented. However, the analysis performed in [23] is not robust and is based only on the mean and maximum values. For this reason, the study performed in this research can be considered essential to comprehensively understand, in a formal and rigorous manner, the behavior of tropospheric ozone at Belisario monitoring station.
In this paper, in order to perform robust data analyses, each year under study was considered as a random variable. Additionally, it was shown that the distribution of these variables was heavy-tailed [19,21]. The concentration of other air pollution variables in Quito was robustly analyzed in [47,48,49,50,51]. Furthermore, in [55,56] some robust estimators were also used. Additional research in which statistical tools have been used to analyze the O3 concentration is as follows.
A pollution weather prediction system was proposed in [57] and used to measure O3 among other pollutants. In [57], linear regression and artificial neural networks were used to carry out predictions.
An air quality monitoring network aimed at analyzing the changeable nature of ozone across several communities of California, USA, was shown in [58], where the mean absolute error was used to analyze O3 concentrations, and the accuracy of the measurement nodes and their correlation with reference instrumentation were assessed by least squares regression. Moreover, summary statistics based on the mean, standard deviation, minimum, maximum, mean bias deviation, mean absolute deviations, and ordinary least squares statistics were used in [58] to present the data in a meaningful way.
In order to produce well-calibrated data, both multiple linear regression and nonlinear techniques were used in [59]. Additionally, a recalibration was done to mitigate the bias presented by the sensors and improve the variance.
A low-cost air quality monitoring system was presented in [60]. This system was aimed at monitoring O3, among other pollutants, and a comparative analysis between a neuro-fuzzy system and a multilayer feed-forward perceptron was performed.
Finally, a statistical analysis of ensembles of O3 profiles at the P. N. Lebedev Physical Institute, Moscow, Russia, from 1996 to 2017 was carried out in [61]. This analysis was based on radiometric ozone monitoring, and several statistical parameters were calculated: mean, variance, root-mean-square error, probability density function, probability distribution function, covariance function, correlation function, and frequency spectra.
The contribution of the present research with respect to the studies mentioned above is as follows. To optimize the sampling process and reduce power consumption when researchers use portable, battery-powered devices, variables were defined that represent the hours of the day in which the ozone concentration is the highest. This was done both for hours grouped by months and for hours grouped by days of the week, separating working days from weekends, and it was shown that all the variables considered were different. For example, what happens in a particular month is unrelated to what happens in other months. Predictions are therefore difficult, since the variables have different distributions; that is, they do not come from the same statistical population. Consequently, it is shown here that using robust methods is more effective than using nonrobust ones.
The objectives of this paper were as follows:
Compare the values of tropospheric ozone measurements based on four sets whose elements are variables that represent the following: (a) the 12 years under study, (b) the months: January to December, (c) the days: Monday to Sunday, and (d) the hours in pairs: from 0:00–1:00 to 22:00–23:00.
Analyze the behavior of the abovementioned variables in comparison with the different categories of air pollution established by the IAQ of Quito [23].
Estimate the data’s central tendency and variability, and quantify differences using robust and nonrobust confidence intervals.
In this paper, it is shown that, for the data under study, the tropospheric ozone concentration at Belisario station has been stable across years, highly variable across months and across the hours of the day, and moderately variable across the days of the week.
Some general comments on ozone sensors are made in Section 2. Section 3 describes the data and presents summary statistics on the collected data. Nonparametric statistical inference is used in Section 4 to classify the data. Section 5 presents a robust data analysis and classification, and Section 6 is the discussion. Finally, Section 7 presents the conclusions of the paper.