We first retrieved the XCO
2 using a full physics retrieval algorithm. However, it is difficult to directly analyze the spatial distribution of XCO
2 in the flight test area using the limited retrieval results. The high accuracy surface modeling (HASM) method has been widely used in the simulation of climate change, CO
2 concentrations in the surface layer, and terrestrial land cover change [
17,
18,
19]. We therefore used HASM to fill the gaps where there was no information on XCO
2 in the flight test area. The entire workflow reproduced in
Figure 4 includes three main steps. First, we used a full physics retrieval algorithm to retrieve XCO
2. Second, we used regression modeling to establish the approximate (i.e., not very accurate) surface of XCO
2 in the flight test area. And third, we used HASM, which took the approximate surface of XCO
2 as its driving field and retrieved XCO
2 as its optimum control constraints, to simulate a high accuracy XCO
2 surface for the whole flight test area.
3.1. Retrieval Algorithm
In this study, the retrieval algorithm is primarily used for simultaneously fitting the O
2 A-band at 760 nm and the CO
2 band at 1610 nm to get column abundances of CO
2 and O
2 based on the optimal estimation method [
20]. The retrieval algorithm is an iterative retrieval method which calls the radiative transfer model with updated parameters after each iteration step. The primary components of the flight XCO
2 retrieval processing workflow are summarized in
Figure 5. Given that the atmospheric and surface states and the observing geometry for a sounding, the forward model generates radiance spectra and Jacobians (as will be described in more detail in
Section 3.1.1 below). This model first generates two synthetic spectra that fully resolves the solar spectrum, the absorption and scattering cross sections for each atmospheric gas and the reflecting surface. The inverse method is based on Rodger’s optimal estimation approach. This method modifies the initial state vector to minimize differences between the observed and simulated spectra from each sounding. This inverse method is described in more detail in
Section 3.1.2 below.
3.1.1. Forward Model
Synthetic spectra were generated using a radiative transfer algorithm. Losses can be caused by absorption and the scattering of the atmosphere. The sources (i.e., gains) mainly come from atmospheric emission and multiple scattering, and the general expression of the radiative transfer is given by the following equation [
21]:
where μ is the cosine of the zenith angle,
I is the specific intensity,
J is the source function (multiple scattering), and τ is the optical depth.
The radiative transfer model SCIATRAN was used to solve Equation (1). SCIATRAN is a comprehensive software package for the modeling of radiative transfer processes in the terrestrial atmosphere and ocean in the spectral range from the ultraviolet to the thermal infrared (i.e., 0.18–40 μm) including multiple scattering processes, polarization, thermal emission and ocean-atmosphere coupling. The software is capable of modeling spectral and angular distributions of the intensity or the Stokes vector of the transmitted, scattered, reflected, and emitted radiation assuming either a plane-parallel or a spherical atmosphere [
22].
The inputs and outputs of the SCIATRAN model can be seen in the
Table 4. The solar irradiance spectra were acquired from Dr. Kurucz (
http://kurucz.harvard.edu). Gas absorption and scattering cross sections were obtained from the SCIATRAN software package which incorporates a climatological database obtained from a 2D chemical transport model. The atmospheric state (temperature, humidity and pressure profiles), surface state and aerosol optical properties were taken from ground synchronous observations and the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis data. A set of instrument line shape functions were provided by the Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences. The outputs of the forward model were synthetic spectra and Jacobians.
3.1.2. Inverse Method
The inversion was used to obtain the unknown geophysical parameters (CO
2 and O
2 profiles) which minimize the differences between the observed and synthetic spectra from each sounding. The inverse method is based on Rodger’s optimization algorithm in which the spectrum is expressed symbolically as in Equation (2):
where
x is the unknown state vector,
b is a set of quantities required by the forward model but not retrieved,
F is the forward model, and
are the spectral errors due to the measurements and the forward model.
To find the state vector with the maximum a posteriori probability, we minimized the cost function (Equation (3)):
where
is the error covariance matrix corresponding to the measurement vector,
is the a priori state vector which holds the prior knowledge about the state vector elements and
is the corresponding a priori error covariance matrix which specifies the uncertainties of the a priori state vector elements as well as their cross-correlations.
The Levenberg-Marquardt method was used to obtain the solution in an iterative manner (Equation (4)):
where
K is the Jacobian or weighting function matrix consisting of the derivatives of the forward model with respect to the state vector elements
K = ∂
F(
x,
b)/∂
x. In the case of convergence,
is the most probable solution given the measurements and prior knowledge and it is then denoted as the maximum a posteriori solution
of the inverse problem.
is the corresponding covariance matrix consisting of the variances of the retrieval state vector elements and their correlations. The damping factor
adjusts the step size of the iteration in a way which ensures that each step further minimizes the cost function.
The inverse method continued to provide iterative improvements of the solutions until both of convergence criteria listed below were achieved and we could obtain CO
2 and O
2 column abundances:
- (1)
The fitted residuals with root mean square error (RMSE) differences between synthetic and observation spectra less than some pre-determined threshold: 0.25% for CO
2 and 2% for O
2 window [
4,
13].
- (2)
The normalized successive difference of the state vector is less than some pre-determined threshold (1%) in the TANSO-FTS SWIR L2 algorithm (see [
2] for details).
For CO
2 we derived column-averaged dry air mole fractions by normalizing the CO
2 columns with the simultaneously retrieved oxygen columns retrieved from the O
2 A-band. Oxygen is an accurate proxy for the air column because its mole fraction is well known and has negligibly small variations. Then XCO
2 was calculated using Equation (6) [
4,
23] as follows:
where
is the retrieved absolute CO
2 column (in molecules/cm
2),
is the retrieved absolute O
2 column (in molecules/cm
2), and
is the assumed (column-averaged) mole fraction of O
2 used to convert the O
2 column into a corresponding dry air column.
3.2. Derivation of Approximate XCO2 Surface
The approximate XCO
2 surface in the flight test area was required to establish as the driving field of HASM (described in
Section 3.3 below). The processing steps used to estimate the approximate XCO
2 are summarized in
Figure 6. Firstly, we used the least squares fit for the captive balloon’s CO
2 profiles, the surface CO
2 concentration in the surface layer and other synchronous ground observations to build a regression model [
17,
18]. Secondly, we combined the regression model with the Weather Research and Forecasting (WRF) model [
24,
25] to derive the approximate CO
2 surface for every pressure layer in the whole flight test area. Finally, the pressure weighting function was used to merge every pressure layer’s CO
2 surface to estimate the approximate XCO
2 surface.
3.2.1. Initial Regression Model
We used an ordinary least squares multivariate linear fit method to build the regression models. The two groups of explanatory variables listed in
Table 5 were used to build separate regression models for the CO
2 concentrations in the surface layer [
18] and the CO
2 profile. We used the modeling spatial relationships/ordinary least square tools in ArcGIS to perform this work, and both models passed the F test at the 95% confidence level.
The CO
2 concentration in every pressure layer at every grid point in the flight test area was then calculated as follows:
where
is the CO
2 concentration in every pressure layer, b is a constant,
x represents the explanatory variables,
a is the corresponding coefficient for each of the explanatory variables, and
n is the number of explanatory variables.
3.2.2. WRF Model
WRF is a next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting needs. The model serves a wide range of meteorological applications across scales from tens of meters to thousands of kilometers. WRF can generate atmospheric simulations using real-world data or idealized conditions. WRF offers operational forecasting a flexible and computationally-efficient platform for operational forecasting using initial data from the US National Center for Environmental Prediction (NCEP).
The initial data used in this study were derived from reanalysis datasets of NCEP. We generated forecasts at 20:00 each day using an integrate interval of 30 h and 1 km by 1 km grid. Physical parameters were the YSU scheme, Monin Obukhov scheme, WSM 6-class graupel scheme, RRTM scheme (longwave), Goddard scheme (shortwave), Noah-MP land-surface model and Kain-Fritsch (new Eta) scheme respectively. We used WRF Version 3.5 to model the CO2 concentration at every point using the aforementioned explanatory variables in the flight test area on an hourly basis from 10:00 to 14:00 on the 11th, 14th, and 16th of September. We then used Equation (7) to calculate the approximate CO2 concentration of every pressure layer at each point in the flight test area.
3.2.3. Pressure Weighting Function
The pressure weighting function h relates the local CO
2 concentration specified on the discrete pressure levels to the profile-weighted average [
8], so XCO
2 could be calculated as follows:
where
u denotes the CO
2 concentration and the subscripts refer to the layers, and:
where
p denotes the pressure and the subscripts again refer to the layers, and
denotes the surface pressure. For the edge layers, if
i = 1, only the left first term applies, that is
, while if
i =
n, only the left second term applies, that is
.
The approximate CO
2 concentration for every pressure layer at each point in the flight test area was calculated using the methods described in
Section 3.2.1 and
Section 3.2.2. With the methods specified at this step, the approximate XCO
2 was obtained by averaging the approximate CO
2 concentration of every pressure layer, weighted by the pressure weighting function.
3.3. High Accuracy Surface Modeling
The high accuracy surface modeling [
26] platform takes global approximate information (e.g., remote sensing images or model simulation results) as its driving field and local accurate information (e.g., ground observation and/or sampling data) as its optimum control constraints. A surface can be uniquely defined by the first and the second fundamental coefficients [
27,
28,
29,
30,
31] in terms of the fundamental theorem of surfaces. The first fundamental coefficients are used to express the intrinsic geometric properties that do not depend on the shape of the surface, but only on measurements that we can carry out while on the surface itself. The second fundamental coefficients reflect the local warping of the surface, namely its deviation from a tangent plane at the point under consideration, which can be observed from outside the surface. The Earth’s surface system or a component surface of the Earth’s surface environment can be simulated with HASM when its spatial resolution is fine enough and is uniquely defined by both extrinsic and intrinsic invariants of the surface.
If a surface is a graph of a function
, the first fundamental coefficients
E,
F and
G can be formulated as
The second fundamental coefficients
L,
M and
N can be formulated as
The first and the second fundamental coefficients should satisfy the following Gauss equation set:
where
,
,
,
,
are the Christoffel symbols of the second kind which depend only on the first fundamental coefficients and their derivatives.
Finite difference methods are used for solving the Gauss equation set (Equation (12)). It can be simplified as the following equation set [
26]:
If
is the value of
at the pth sampled point
in the computational domain, the simulation value should be equal or approximate to the sampling value of this lattice so that the constraint equation is added to the simplified equation set (Equation (13)). The matrix formulation of the HASM master equations can be expressed as follows [
26]:
where the parameter λ is the weight of the sampling points and determines the contribution of the sampling points to the simulated surface. λ could be a real number, which means all sampling points have the same weight, or a vector, which means every sampling point has its own weight. An area affected by a sampling point in a complex region is smaller than in a flat region. Therefore, a smaller value of λ is selected in a complex region and a bigger value of λ is selected in a flat region.
High accuracy surface modeling can simulate continuous attribute variations in three-dimensional space, and has been successfully used in simulating climate change [
17,
32,
33], constructing DEMs [
28,
29], interpolating soil properties [
34,
35,
36] and terrestrial land cover change [
19]. For this study, HASM took the approximate XCO
2 surface as its driving field and the flight retrieved XCO
2 as its optimum control constraints as shown in Equation (15):
where
is the final XCO
2 surface which is calculated by HASM,
is the approximate XCO
2 surface which is obtained by regression modeling, and
is the retrieval XCO
2. The main inputs of HASM are the retrieval XCO
2 and the approximate XCO
2 surface with the output being the high accuracy surface of XCO
2.