
On conditional least squares estimation for affine diffusions
based on continuous time observations
Beáta Bolyog∗ , Gyula Pap
Bolyai Institute, University of Szeged, Aradi vértanúk tere 1, H–6720 Szeged, Hungary
e–mails: [email protected] (B. Bolyog), [email protected] (G. Pap).
* Corresponding author.
arXiv:1703.02376v2 [math.PR] 26 Jul 2017
Abstract
We study asymptotic properties of conditional least squares estimators for the drift parameters
of two-factor affine diffusions based on continuous time observations. We distinguish three cases:
subcritical, critical and supercritical. For all the drift parameters, in the subcritical and supercritical
cases, asymptotic normality and asymptotic mixed normality are proved, respectively, while in the critical case,
non-standard asymptotic behavior is described.
1 Introduction
Affine processes are applied in mathematical finance in several models including interest rate models
(e.g. the Cox–Ingersoll–Ross, Vasiček or general affine term structure short rate models), option pricing
(e.g. the Heston model) and credit risk models, see e.g. Duffie, Filipović and Schachermayer [16],
Filipović [17], Baldeaux and Platen [2], and Alfonsi [1]. In this paper we consider two-factor affine
processes, i.e. affine processes with state-space [0, ∞) × R. Dawson and Li [13] derived a jump-type
stochastic differential equation (SDE) for such processes. Specializing this result to the diffusion case,
i.e. two-factor affine processes without jumps, we obtain that for every a ∈ [0, ∞), b, α, β, γ ∈ R,
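σ1 , σ2 , σ3 ∈ [0, ∞) and ϱ ∈ [−1, 1], the SDE
\[
\begin{cases}
dY_t = (a - bY_t)\,dt + \sigma_1\sqrt{Y_t}\,dW_t, \\
dX_t = (\alpha - \beta Y_t - \gamma X_t)\,dt + \sigma_2\sqrt{Y_t}\,\bigl(\varrho\,dW_t + \sqrt{1-\varrho^2}\,dB_t\bigr) + \sigma_3\,dL_t,
\end{cases}
\qquad t \in [0,\infty), \tag{1.1}
\]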
with an arbitrary initial value (Y0 , X0 ) with P(Y0 ∈ [0, ∞)) = 1 and independent of a 3-dimensional
standard Wiener process (Wt , Bt , Lt )t∈[0,∞) , has a pathwise unique strong solution being a two-
factor affine diffusion process, and conversely, every two-factor affine diffusion process is a pathwise
unique strong solution of a SDE (1.1) with appropriate parameters a ∈ [0, ∞), b, α, β, γ ∈ R,
σ1 , σ2 , σ3 ∈ [0, ∞) and ̺ ∈ [−1, 1].
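To experiment with the model numerically, one can discretize (1.1) by an Euler–Maruyama scheme. The following is only a minimal sketch: the function name `simulate_affine_diffusion`, the step count `n_steps` and the truncation of negative iterates inside the square root are illustrative assumptions, not part of the paper.

```python
import numpy as np

def simulate_affine_diffusion(a, b, alpha, beta, gamma, sigma1, sigma2, sigma3,
                              rho, y0, x0, T, n_steps, rng=None):
    """Euler-Maruyama sketch for the SDE (1.1) on [0, T].

    Assumption (not from the paper): the square root is applied to
    max(Y, 0), so a negative Euler iterate cannot produce NaN.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    Y = np.empty(n_steps + 1)
    X = np.empty(n_steps + 1)
    Y[0], X[0] = y0, x0
    # Independent Wiener increments for W, B and L.
    dW, dB, dL = rng.normal(0.0, np.sqrt(dt), size=(3, n_steps))
    for k in range(n_steps):
        root = np.sqrt(max(Y[k], 0.0))
        Y[k + 1] = Y[k] + (a - b * Y[k]) * dt + sigma1 * root * dW[k]
        X[k + 1] = (X[k]
                    + (alpha - beta * Y[k] - gamma * X[k]) * dt
                    + sigma2 * root * (rho * dW[k] + np.sqrt(1.0 - rho**2) * dB[k])
                    + sigma3 * dL[k])
    return Y, X
```

The scheme has a discretization bias, so it should be read as an illustration of the dynamics rather than as an exact sampler.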
In this paper we study asymptotic properties of conditional least squares estimators (CLSE)
$(\hat a_T, \hat b_T, \hat\alpha_T, \hat\beta_T, \hat\gamma_T)$ of the drift parameters (a, b, α, β, γ) based on continuous time observations
(Yt , Xt )t∈[0,T ] with T > 0. This estimator is the high frequency limit in probability as n → ∞ of
the CLSE based on discrete time observations (Yk/n , Xk/n )k∈{0,...,⌊nT ⌋} , n ∈ N. We do not estimate
the parameters σ1 , σ2 , σ3 and ϱ, since for all T ∈ (0, ∞), they are measurable functions (i.e., statistics)
of (Yt , Xt )t∈[0,T ] , see Appendix C. It will turn out that for the calculation of $(\hat a_T, \hat b_T, \hat\alpha_T, \hat\beta_T, \hat\gamma_T)$
one does not need to know the values of the diffusion coefficients σ1 , σ2 , σ3 and ϱ, see (3.6).
2010 Mathematics Subject Classifications: 60J60, 62F12.
Key words and phrases: affine processes, conditional least squares estimators.
The first coordinate process Y in (1.1) is called a Cox–Ingersoll–Ross (CIR) process (see Cox,
Ingersoll and Ross [12]). In the submodel consisting only of the process Y , Overbeck and Rydén
[27, Theorems 3.4, 3.5 and 3.6] derived the CLSE of (a, b) based on continuous time observations
(Yt )t∈[0,T ] with T > 0, i.e., the limit in probability as n → ∞ of the CLSE based on discrete
time observations (Yk/n )k∈{0,...,⌊nT ⌋} , n ∈ N, which turns out to be the same as the CLSE $(\hat a_T, \hat b_T)$
of (a, b) based on continuous time observations (Yt , Xt )t∈[0,T ] , and they proved strong consistency
and asymptotic normality in case of a subcritical CIR process Y , i.e., when b > 0 and the initial
distribution is the unique stationary distribution of the model.
Barczy et al. [6] considered a submodel of (1.1) with a ∈ (0, ∞), β = 0, σ1 = 1, σ2 = 1,
̺ = 0 and σ3 = 0. The estimator of the parameters (α, γ) based on continuous time observations
(Xt )t∈[0,T ] with T > 0 (which they call a least square estimator) is in fact the CLSE, i.e., the limit in
probability as n → ∞ of the CLSE based on discrete time observations (Xk/n )k∈{0,...,⌊nT ⌋} , n ∈ N,
which can be shown by the method of the proof of Lemma 3.3. They proved strong consistency and
asymptotic normality in case of a subcritical process (Y, X), i.e., when b > 0 and γ > 0.
Barczy et al. [7] considered the so-called Heston model, which is a submodel of (1.1) with a, σ1 , σ2 ∈
(0, ∞), γ = 0, ̺ ∈ (−1, 1) and σ3 = 0. The estimator of the parameters (a, b, α, β) based on
continuous time observations (Yt , Xt )t∈[0,T ] with T > 0 (which they call least square estimator)
is in fact the CLSE, i.e., the limit in probability as n → ∞ of the CLSE based on discrete time
observations (Yk/n , Xk/n )k∈{0,...,⌊nT ⌋} , n ∈ N which can be shown by the method of the proof of
Lemma 3.3. They proved strong consistency and asymptotic normality in case of a subcritical process
(Y, X), i.e., when b > 0. Note that Barczy and Pap [8] studied the maximum likelihood estimator
(MLE) of the parameters (a, b, α, β) in this Heston model under the additional assumption $a > \frac{\sigma_1^2}{2}$.
In the subcritical case, i.e., when b > 0, they proved strong consistency and asymptotic normality
of the MLE of (a, b, α, β) under the additional assumption $a > \frac{\sigma_1^2}{2}$. In the critical case, namely,
if b = 0, they showed weak consistency of the MLE of (a, b, α, β) and determined the asymptotic
behavior of the MLE under the additional assumption $a > \frac{\sigma_1^2}{2}$. In a special supercritical case, namely,
when b < 0, they showed strong consistency of the MLE of b, weak consistency of the MLE of
β and proved asymptotic mixed normality of the MLE of (a, b, α, β). Barczy et al. [3, 4] studied
the asymptotic behavior of maximum likelihood estimators for a jump-type Heston model and for the
growth rate of a jump-type CIR process, respectively, based on continuous time observations.
We consider general two-factor affine diffusions (1.1). In the subcritical case, i.e., when b > 0
and γ > 0, we prove strong consistency and asymptotic normality of $(\hat a_T, \hat b_T, \hat\alpha_T, \hat\beta_T, \hat\gamma_T)$ under the
additional assumptions a > 0, σ1 > 0 and $(1-\varrho^2)\sigma_2^2 + \sigma_3^2 > 0$. In a special critical case, namely
if b = 0 and γ = 0, we show weak consistency of $(\hat b_T, \hat\beta_T, \hat\gamma_T)$ and determine the asymptotic
behavior of $(\hat a_T, \hat b_T, \hat\alpha_T, \hat\beta_T, \hat\gamma_T)$ under the additional assumptions β = 0 and $(1-\varrho^2)\sigma_2^2 + \sigma_3^2 > 0$.
In a special supercritical case, namely, when γ < b < 0, we show strong consistency of $\hat b_T$, weak
consistency of $(\hat\beta_T, \hat\gamma_T)$ and prove asymptotic mixed normality of $(\hat a_T, \hat b_T, \hat\alpha_T, \hat\beta_T, \hat\gamma_T)$ under the
additional assumptions αβ ≤ 0, σ1 > 0, and either σ3 > 0, or $\bigl(a - \frac{\sigma_1^2}{2}\bigr)(1-\varrho^2)\sigma_2^2 > 0$. Note
that we decided to deal with the CLSE of (a, b, α, β, γ), since the MLE of (a, b, α, β, γ) contains, for
example, $\int_0^T \frac{X_t}{(1-\varrho^2)\sigma_2^2 Y_t + \sigma_3^2}\,dt$, and the question of the asymptotic behavior of this integral as T → ∞
is still open in the critical and supercritical cases.
2 The affine two-factor model
Let N, Z+ , R, R+ , R++ , R− , R−− and C denote the sets of positive integers, non-negative
integers, real numbers, non-negative real numbers, positive real numbers, non-positive real numbers,
negative real numbers and complex numbers, respectively. For x, y ∈ R, we will use the notations
x∧y := min(x, y) and x∨y := max(x, y). By Cc2 (R+ ×R, R), we denote the set of twice continuously
differentiable real-valued functions on R+ × R with compact support. Let (Ω, F, P) be a probability
space equipped with the augmented filtration (Ft )t∈R+ corresponding to (Wt , Bt , Lt )t∈R+ and a given
initial value (η0 , ξ0 ) being independent of (Wt , Bt , Lt )t∈R+ such that P(η0 ∈ R+ ) = 1, constructed
as in Karatzas and Shreve [24, Section 5.2]. Note that (Ft )t∈R+ satisfies the usual conditions, i.e.,
the filtration (Ft )t∈R+ is right-continuous and F0 contains all the P-null sets in F. We will denote
convergence in distribution, convergence in probability, almost sure convergence and equality in
distribution by $\stackrel{\mathcal D}{\longrightarrow}$, $\stackrel{\mathsf P}{\longrightarrow}$, $\stackrel{\mathrm{a.s.}}{\longrightarrow}$ and $\stackrel{\mathcal D}{=}$, respectively. By ‖x‖ and ‖A‖, we denote the Euclidean
norm of a vector $x \in \mathbb R^d$ and the induced matrix norm of a matrix $A \in \mathbb R^{d\times d}$, respectively. By
$I_d \in \mathbb R^{d\times d}$, we denote the d × d unit matrix. For square matrices $A_1, \ldots, A_k$, $\operatorname{diag}(A_1, \ldots, A_k)$
will denote the square block matrix containing the matrices $A_1, \ldots, A_k$ on its diagonal.
The next proposition is about the existence and uniqueness of a strong solution of the SDE (1.1),
see Bolyog and Pap [11, Proposition 2.2].
2.1 Proposition. Let (η0 , ξ0 ) be a random vector independent of the process (Wt , Bt , Lt )t∈R+ satis-
fying P(η0 ∈ R+ ) = 1. Then for all a ∈ R+ , b, α, β, γ ∈ R, σ1 , σ2 , σ3 ∈ R+ , ̺ ∈ [−1, 1], there is a
(pathwise) unique strong solution (Yt , Xt )t∈R+ of the SDE (1.1) such that P((Y0 , X0 ) = (η0 , ξ0 )) = 1
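and P(Yt ∈ R+ for all t ∈ R+ ) = 1. Further, for all s, t ∈ R+ with s ≤ t, we have
\[
Y_t = e^{-b(t-s)} Y_s + a \int_s^t e^{-b(t-u)}\,du + \sigma_1 \int_s^t e^{-b(t-u)} \sqrt{Y_u}\,dW_u \tag{2.1}
\]
and
\[
\begin{aligned}
X_t = {} & e^{-\gamma(t-s)} X_s + \int_s^t e^{-\gamma(t-u)} (\alpha - \beta Y_u)\,du \\
& + \sigma_2 \int_s^t e^{-\gamma(t-u)} \sqrt{Y_u}\,\bigl(\varrho\,dW_u + \sqrt{1-\varrho^2}\,dB_u\bigr) + \sigma_3 \int_s^t e^{-\gamma(t-u)}\,dL_u .
\end{aligned} \tag{2.2}
\]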
Moreover, (Yt , Xt )t∈R+ is a two-factor affine process with infinitesimal generator
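\[
\begin{aligned}
(\mathcal A_{(Y,X)} f)(y,x) = {} & (a - by)\, f_1'(y,x) + (\alpha - \beta y - \gamma x)\, f_2'(y,x) \\
& + \frac12\, y \bigl[ \sigma_1^2 f_{1,1}''(y,x) + 2\varrho\sigma_1\sigma_2 f_{1,2}''(y,x) + \sigma_2^2 f_{2,2}''(y,x) \bigr] + \frac12\, \sigma_3^2 f_{2,2}''(y,x),
\end{aligned} \tag{2.3}
\]
where $(y,x) \in \mathbb R_+ \times \mathbb R$, $f \in C_c^2(\mathbb R_+ \times \mathbb R, \mathbb R)$, and $f_i'$, i ∈ {1, 2}, and $f_{i,j}''$, i, j ∈ {1, 2}, denote the
first and second order partial derivatives of f with respect to its i-th, and its i-th and j-th variables, respectively.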
Conversely, every two-factor affine diffusion process is a (pathwise) unique strong solution of a
SDE (1.1) with suitable parameters a ∈ R+ , b, α, β, γ ∈ R, σ1 , σ2 , σ3 ∈ R+ and ̺ ∈ [−1, 1].
The next proposition gives the asymptotic behavior of the first moment of the process (Yt , Xt )t∈R+
as t → ∞, see Bolyog and Pap [11, Proposition 2.3].
2.2 Proposition. Let us consider the two-factor affine diffusion model (1.1) with a ∈ R+ , b, α, β, γ ∈
R, σ1 , σ2 , σ3 ∈ R+ , ̺ ∈ [−1, 1]. Suppose that E(Y0 |X0 |) < ∞. In case of b ∈ R++ we have
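$\mathsf E(Y_t) = \frac{a}{b} + O(e^{-bt})$ and
\[
\mathsf E(X_t) =
\begin{cases}
\dfrac{\alpha}{\gamma} - \dfrac{a\beta}{b\gamma} + O\bigl(e^{-(b\wedge\gamma)t}\bigr), & \gamma \in \mathbb R_{++}, \\[2mm]
\Bigl(\alpha - \dfrac{a\beta}{b}\Bigr) t + O(1), & \gamma = 0, \\[2mm]
\Bigl(\dfrac{\beta}{\gamma-b}\,\mathsf E(Y_0) + \mathsf E(X_0) - \dfrac{\alpha}{\gamma} + \dfrac{a\beta}{b\gamma} - \dfrac{a\beta}{(\gamma-b)b}\Bigr) e^{-\gamma t} + O(1), & \gamma \in \mathbb R_{--}.
\end{cases}
\]
In case of b = 0 we have $\mathsf E(Y_t) = at + O(1)$ and
\[
\mathsf E(X_t) =
\begin{cases}
-\dfrac{a\beta}{\gamma}\, t + O(1), & \gamma \in \mathbb R_{++}, \\[2mm]
-\dfrac12\, a\beta t^2 + O(t), & \gamma = 0, \\[2mm]
\Bigl(\dfrac{\beta}{\gamma}\,\mathsf E(Y_0) + \mathsf E(X_0) - \dfrac{\alpha}{\gamma} - \dfrac{a\beta}{\gamma^2}\Bigr) e^{-\gamma t} + O(t), & \gamma \in \mathbb R_{--}.
\end{cases}
\]
In case of $b \in \mathbb R_{--}$ we have $\mathsf E(Y_t) = \bigl(\mathsf E(Y_0) - \frac{a}{b}\bigr) e^{-bt} + O(1)$ and
\[
\mathsf E(X_t) =
\begin{cases}
\Bigl(-\dfrac{\beta}{\gamma-b}\,\mathsf E(Y_0) + \dfrac{a\beta}{(\gamma-b)b}\Bigr) e^{-bt} + O(1), & \gamma \in \mathbb R_{++}, \\[2mm]
\Bigl(\dfrac{\beta}{b}\,\mathsf E(Y_0) + \mathsf E(X_0) - \dfrac{\beta a}{b^2}\Bigr) e^{-bt} + O(t), & \gamma = 0, \\[2mm]
\Bigl(-\dfrac{\beta}{\gamma-b}\,\mathsf E(Y_0) + \dfrac{a\beta}{(\gamma-b)b}\Bigr) e^{-bt} + O(e^{-\gamma t}), & \gamma \in (b, 0), \\[2mm]
\Bigl(-\beta\,\mathsf E(Y_0) + \dfrac{a\beta}{b}\Bigr) t e^{-bt} + O(e^{-\gamma t}), & \gamma = b, \\[2mm]
\Bigl(\dfrac{\beta}{\gamma-b}\,\mathsf E(Y_0) + \mathsf E(X_0) - \dfrac{\alpha}{\gamma} + \dfrac{a\beta}{b\gamma} - \dfrac{a\beta}{b(\gamma-b)}\Bigr) e^{-\gamma t} + O(e^{-bt}), & \gamma \in (-\infty, b).
\end{cases}
\]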
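The asymptotics in Proposition 2.2 can be checked numerically against closed-form first moments. The sketch below is an assumption-laden illustration: the formulas are obtained by taking expectations in (2.1) and (2.2) with s = 0 (the stochastic integrals are assumed to have zero mean), and they are written only for the generic case b ≠ 0, γ ≠ 0, γ ≠ b. The function name `first_moments` is hypothetical.

```python
import math

def first_moments(t, a, b, alpha, beta, gamma, ey0, ex0):
    """E(Y_t) and E(X_t), assuming b != 0, gamma != 0 and gamma != b.

    ey0 and ex0 stand for E(Y_0) and E(X_0).
    """
    eb, eg = math.exp(-b * t), math.exp(-gamma * t)
    # E(Y_t) = e^{-bt} E(Y_0) + a * int_0^t e^{-bu} du
    ey = eb * ey0 + a * (1.0 - eb) / b
    # E(X_t) = e^{-gamma t} E(X_0) + int_0^t e^{-gamma (t-u)} (alpha - beta E(Y_u)) du
    ex = (eg * ex0
          + alpha * (1.0 - eg) / gamma
          - beta * ey0 * (eb - eg) / (gamma - b)
          - (a * beta / b) * ((1.0 - eg) / gamma - (eb - eg) / (gamma - b)))
    return ey, ex
```

In the subcritical regime, evaluating the function at a large t should reproduce the limits a/b and α/γ − aβ/(bγ) stated in the proposition.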
Based on the asymptotic behavior of the first moment of the process (Yt , Xt )t∈R+ as t → ∞, we
can classify two-factor affine diffusions in the following way.
2.3 Definition. Let (Yt , Xt )t∈R+ be the unique strong solution of the SDE (1.1) satisfying P(Y0 ∈
R+ ) = 1. We call (Yt , Xt )t∈R+ subcritical, critical or supercritical if b ∧ γ ∈ R++ , b ∧ γ = 0 or
b ∧ γ ∈ R−− , respectively.
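Definition 2.3 depends only on the sign of b ∧ γ, which a short helper makes explicit. The function name `classify` is an illustrative assumption.

```python
def classify(b, gamma):
    """Classify a two-factor affine diffusion by the sign of min(b, gamma)."""
    m = min(b, gamma)
    if m > 0:
        return "subcritical"
    if m == 0:
        return "critical"
    return "supercritical"
```

For instance, b = 1 and γ = −0.5 give a supercritical process even though the first coordinate Y alone would be subcritical.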
3 CLSE based on continuous time observations
Overbeck and Rydén [27] investigated the CIR process Y , and for each T ∈ R++ , they defined a
CLSE $(\hat a_T, \hat b_T)$ of (a, b) based on continuous time observations (Yt )t∈[0,T ] as the limit in probability
of the CLSE $(\hat a_{T,n}, \hat b_{T,n})$ of (a, b) based on discrete time observations $(Y_{iT/n})_{i\in\{0,1,\ldots,n\}}$ as n → ∞.
We consider a two-factor affine diffusion process (Yt , Xt )t∈R+ given in (1.1) with known σ1 ∈ R++ ,
σ2 , σ3 ∈ R+ and ̺ ∈ [−1, 1], and with a random initial value (η0 , ζ0 ) independent of (Wt , Bt , Lt )t∈R+
satisfying P(η0 ∈ R+ ) = 1, and we will consider θ = (a, b, α, β, γ)⊤ ∈ R+ × R4 as a parameter. The
aim of the following discussion is to construct a CLSE of θ based on continuous time observations
(Yt , Xt )t∈[0,T ] with some T ∈ R++ .
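Let us recall the CLSE $\hat\theta_{T,n}$ of θ based on discrete time observations $(Y_{i/n}, X_{i/n})_{i\in\{0,1,\ldots,\lfloor nT\rfloor\}}$
with some n ∈ N, which can be obtained by solving the extremum problem
\[
\hat\theta_{T,n} := \operatorname*{arg\,min}_{\theta\in\mathbb R^5} \sum_{i=1}^{\lfloor nT\rfloor} \Bigl[ \bigl( Y_{\frac in} - \mathsf E\bigl( Y_{\frac in} \mid \mathcal F_{\frac{i-1}n} \bigr) \bigr)^2 + \bigl( X_{\frac in} - \mathsf E\bigl( X_{\frac in} \mid \mathcal F_{\frac{i-1}n} \bigr) \bigr)^2 \Bigr].
\]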
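By (2.1) and (2.2), together with Proposition 3.2.10 in Karatzas and Shreve [24], for all s, t ∈ R+
with s ≤ t, we obtain
\[
\mathsf E(Y_t \mid \mathcal F_s) = e^{-b(t-s)} Y_s + a \int_s^t e^{-b(t-u)}\,du, \tag{3.1}
\]
\[
\begin{aligned}
\mathsf E(X_t \mid \mathcal F_s) = {} & e^{-\gamma(t-s)} X_s + \alpha \int_s^t e^{-\gamma(t-u)}\,du - \beta Y_s \int_s^t e^{-\gamma(t-u) - b(u-s)}\,du \\
& - a\beta \int_s^t e^{-\gamma(t-u)} \int_s^u e^{-b(u-v)}\,dv\,du .
\end{aligned} \tag{3.2}
\]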
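Thus, for all i ∈ N, we have
\[
\mathsf E\bigl( Y_{\frac in} \mid \mathcal F_{\frac{i-1}n} \bigr) = e^{-\frac bn}\, Y_{\frac{i-1}n} + a \int_0^{\frac1n} e^{-bw}\,dw
\]
and
\[
\begin{aligned}
\mathsf E\bigl( X_{\frac in} \mid \mathcal F_{\frac{i-1}n} \bigr) = {} & e^{-\frac\gamma n}\, X_{\frac{i-1}n} + \alpha \int_0^{\frac1n} e^{-\gamma w}\,dw - \beta Y_{\frac{i-1}n} \int_0^{\frac1n} e^{(\gamma-b)w - \frac\gamma n}\,dw \\
& - a\beta \int_0^{\frac1n} e^{\gamma w - \frac\gamma n} \int_0^w e^{-b(w-v)}\,dv\,dw .
\end{aligned}
\]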
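Consequently,
\[
\begin{aligned}
\hat\theta_{T,n} = \operatorname*{arg\,min}_{(a,b,\alpha,\beta,\gamma)^\top\in\mathbb R^5} \sum_{i=1}^{\lfloor nT\rfloor} \Bigl[ & \bigl( Y_{\frac in} - Y_{\frac{i-1}n} - \bigl( c - d\, Y_{\frac{i-1}n} \bigr) \bigr)^2 \\
& + \bigl( X_{\frac in} - X_{\frac{i-1}n} - \bigl( \delta - \varepsilon Y_{\frac{i-1}n} - \zeta X_{\frac{i-1}n} \bigr) \bigr)^2 \Bigr],
\end{aligned} \tag{3.3}
\]
where
\[
(c, d, \delta, \varepsilon, \zeta) := \bigl(c_n(a,b),\, d_n(b),\, \delta_n(a,b,\alpha,\beta,\gamma),\, \varepsilon_n(b,\beta,\gamma),\, \zeta_n(\gamma)\bigr) := g_n(a,b,\alpha,\beta,\gamma) \tag{3.4}
\]
with
\[
c := c_n(a,b) := a \int_0^{\frac1n} e^{-bw}\,dw, \qquad d := d_n(b) := 1 - e^{-\frac bn},
\]
\[
\delta := \delta_n(a,b,\alpha,\beta,\gamma) := \alpha \int_0^{\frac1n} e^{-\gamma w}\,dw - a\beta \int_0^{\frac1n} e^{\gamma w - \frac\gamma n} \int_0^w e^{-b(w-v)}\,dv\,dw,
\]
\[
\varepsilon := \varepsilon_n(b,\beta,\gamma) := \beta \int_0^{\frac1n} e^{(\gamma-b)w - \frac\gamma n}\,dw, \qquad \zeta := \zeta_n(\gamma) := 1 - e^{-\frac\gamma n}.
\]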
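The function $g_n : \mathbb R^5 \to \mathbb R \times (-\infty, 1) \times \mathbb R^2 \times (-\infty, 1)$ is bijective, so first we determine the CLSE
$(\hat c_{T,n}, \hat d_{T,n}, \hat\delta_{T,n}, \hat\varepsilon_{T,n}, \hat\zeta_{T,n})$ of the transformed parameters (c, d, δ, ε, ζ) by minimizing the sum on the
right-hand side of (3.3) with respect to (c, d, δ, ε, ζ). We have
\[
\bigl( \hat c_{T,n}, \hat d_{T,n} \bigr) = \operatorname*{arg\,min}_{(c,d)^\top\in\mathbb R^2} \sum_{i=1}^{\lfloor nT\rfloor} \bigl( Y_{\frac in} - Y_{\frac{i-1}n} - \bigl( c - d\, Y_{\frac{i-1}n} \bigr) \bigr)^2,
\]
\[
\bigl( \hat\delta_{T,n}, \hat\varepsilon_{T,n}, \hat\zeta_{T,n} \bigr) = \operatorname*{arg\,min}_{(\delta,\varepsilon,\zeta)^\top\in\mathbb R^3} \sum_{i=1}^{\lfloor nT\rfloor} \bigl( X_{\frac in} - X_{\frac{i-1}n} - \bigl( \delta - \varepsilon Y_{\frac{i-1}n} - \zeta X_{\frac{i-1}n} \bigr) \bigr)^2,
\]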
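hence, similarly as on page 675 in Barczy et al. [5], we get
\[
\begin{bmatrix} \hat c_{T,n} \\ \hat d_{T,n} \end{bmatrix} = \bigl( \Gamma^{(1)}_{T,n} \bigr)^{-1} \varphi^{(1)}_{T,n}, \qquad
\begin{bmatrix} \hat\delta_{T,n} \\ \hat\varepsilon_{T,n} \\ \hat\zeta_{T,n} \end{bmatrix} = \bigl( \Gamma^{(2)}_{T,n} \bigr)^{-1} \varphi^{(2)}_{T,n} \tag{3.5}
\]
with
\[
\Gamma^{(1)}_{T,n} := \begin{bmatrix} \lfloor nT\rfloor & -\sum_{i=1}^{\lfloor nT\rfloor} Y_{\frac{i-1}n} \\ -\sum_{i=1}^{\lfloor nT\rfloor} Y_{\frac{i-1}n} & \sum_{i=1}^{\lfloor nT\rfloor} Y_{\frac{i-1}n}^2 \end{bmatrix}, \qquad
\varphi^{(1)}_{T,n} := \begin{bmatrix} Y_{\frac{\lfloor nT\rfloor}n} - Y_0 \\ -\sum_{i=1}^{\lfloor nT\rfloor} \bigl( Y_{\frac in} - Y_{\frac{i-1}n} \bigr) Y_{\frac{i-1}n} \end{bmatrix},
\]
\[
\Gamma^{(2)}_{T,n} := \begin{bmatrix}
\lfloor nT\rfloor & -\sum_{i=1}^{\lfloor nT\rfloor} Y_{\frac{i-1}n} & -\sum_{i=1}^{\lfloor nT\rfloor} X_{\frac{i-1}n} \\
-\sum_{i=1}^{\lfloor nT\rfloor} Y_{\frac{i-1}n} & \sum_{i=1}^{\lfloor nT\rfloor} Y_{\frac{i-1}n}^2 & \sum_{i=1}^{\lfloor nT\rfloor} Y_{\frac{i-1}n} X_{\frac{i-1}n} \\
-\sum_{i=1}^{\lfloor nT\rfloor} X_{\frac{i-1}n} & \sum_{i=1}^{\lfloor nT\rfloor} X_{\frac{i-1}n} Y_{\frac{i-1}n} & \sum_{i=1}^{\lfloor nT\rfloor} X_{\frac{i-1}n}^2
\end{bmatrix}, \qquad
\varphi^{(2)}_{T,n} := \begin{bmatrix}
X_{\frac{\lfloor nT\rfloor}n} - X_0 \\
-\sum_{i=1}^{\lfloor nT\rfloor} \bigl( X_{\frac in} - X_{\frac{i-1}n} \bigr) Y_{\frac{i-1}n} \\
-\sum_{i=1}^{\lfloor nT\rfloor} \bigl( X_{\frac in} - X_{\frac{i-1}n} \bigr) X_{\frac{i-1}n}
\end{bmatrix}
\]
on the event where the random matrices $\Gamma^{(1)}_{T,n}$ and $\Gamma^{(2)}_{T,n}$ are invertible.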
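The normal equations (3.5) translate directly into a few array operations. The following sketch assumes the observations are stored in NumPy arrays `Y` and `X` indexed by i = 0, …, ⌊nT⌋; the function name `clse_transformed` is illustrative and not taken from the paper.

```python
import numpy as np

def clse_transformed(Y, X):
    """CLSE of (c, d, delta, eps, zeta) from equally spaced samples, via (3.5)."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    Yp, Xp = Y[:-1], X[:-1]            # lagged values Y_{(i-1)/n}, X_{(i-1)/n}
    dY, dX = np.diff(Y), np.diff(X)    # increments over one sampling step
    N = len(Yp)                        # number of increments, i.e. floor(nT)
    Gamma1 = np.array([[N, -Yp.sum()],
                       [-Yp.sum(), (Yp**2).sum()]])
    phi1 = np.array([dY.sum(), -(dY * Yp).sum()])
    Gamma2 = np.array([[N, -Yp.sum(), -Xp.sum()],
                       [-Yp.sum(), (Yp**2).sum(), (Yp * Xp).sum()],
                       [-Xp.sum(), (Yp * Xp).sum(), (Xp**2).sum()]])
    phi2 = np.array([dX.sum(), -(dX * Yp).sum(), -(dX * Xp).sum()])
    # Solving the linear systems avoids forming explicit matrix inverses.
    c, d = np.linalg.solve(Gamma1, phi1)
    delta, eps, zeta = np.linalg.solve(Gamma2, phi2)
    return c, d, delta, eps, zeta
```

On a noise-free path generated by the recursion $Y_{i/n} = Y_{(i-1)/n} + c - dY_{(i-1)/n}$ (and its analogue for X), the function returns the generating coefficients up to rounding, which is a convenient sanity check.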
3.1 Lemma. Let us consider the two-factor affine diffusion model (1.1) with a ∈ R+ , b, α, β, γ ∈ R,
σ1 ∈ R++ , σ2 , σ3 ∈ R+ and ̺ ∈ [−1, 1] with a random initial value (η0 , ζ0 ) independent of
(Wt , Bt , Lt )t∈R+ satisfying P(η0 ∈ R+ ) = 1. Suppose that $(1-\varrho^2)\sigma_2^2 + \sigma_3^2 > 0$. Then for each
T ∈ R++ and n ∈ N, the random matrices $\Gamma^{(1)}_{T,n}$ and $\Gamma^{(2)}_{T,n}$ are invertible almost surely, and hence
there exists a unique CLSE $(\hat c_{T,n}, \hat d_{T,n}, \hat\delta_{T,n}, \hat\varepsilon_{T,n}, \hat\zeta_{T,n})$ of (c, d, δ, ε, ζ) taking the form given in (3.5).
Proof. The aim of the following discussion is to show that the random matrix $\Gamma^{(1)}_{T,n}$ is almost surely
strictly positive definite, by checking that for all $x \in \mathbb R^2 \setminus \{0\}$, we have $x^\top \Gamma^{(1)}_{T,n} x > 0$ almost surely.
Indeed, for all $x = (x_1, x_2)^\top \in \mathbb R^2$,
\[
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^\top \Gamma^{(1)}_{T,n} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \sum_{i=1}^{\lfloor nT\rfloor} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^\top \begin{bmatrix} 1 \\ -Y_{\frac{i-1}n} \end{bmatrix} \begin{bmatrix} 1 \\ -Y_{\frac{i-1}n} \end{bmatrix}^\top \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \sum_{i=1}^{\lfloor nT\rfloor} \bigl( x_1 - x_2 Y_{\frac{i-1}n} \bigr)^2 \geq 0,
\]
and $x^\top \Gamma^{(1)}_{T,n} x = 0$ if and only if $x_1 - x_2 Y_{\frac{i-1}n} = 0$ for all i ∈ {1, . . . , ⌊nT ⌋}, which happens with
probability 0, since $x = (x_1, x_2)^\top \ne 0$ and, for each i ∈ {1, . . . , ⌊nT ⌋}, the distribution of $Y_{\frac{i-1}n}$ is
absolutely continuous, since the conditional distribution of $Y_{\frac{i-1}n}$ given $Y_0$ is absolutely continuous,
see, e.g., Ben Alaya and Kebaier [10, Proof of Proposition 2] and Ikeda and Watanabe [22, page 222].
In a similar way, the random matrix $\Gamma^{(2)}_{T,n}$ is almost surely strictly positive definite, since for
all $x \in \mathbb R^3 \setminus \{0\}$, we have $x^\top \Gamma^{(2)}_{T,n} x > 0$ almost surely. Indeed, for all $x = (x_1, x_2, x_3)^\top \in \mathbb R^3$,
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}^\top \Gamma^{(2)}_{T,n} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \sum_{i=1}^{\lfloor nT\rfloor} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}^\top \begin{bmatrix} 1 \\ -Y_{\frac{i-1}n} \\ -X_{\frac{i-1}n} \end{bmatrix} \begin{bmatrix} 1 \\ -Y_{\frac{i-1}n} \\ -X_{\frac{i-1}n} \end{bmatrix}^\top \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \sum_{i=1}^{\lfloor nT\rfloor} \bigl( x_1 - x_2 Y_{\frac{i-1}n} - x_3 X_{\frac{i-1}n} \bigr)^2 \geq 0,
\]
and $x^\top \Gamma^{(2)}_{T,n} x = 0$ if and only if $x_1 - x_2 Y_{\frac{i-1}n} - x_3 X_{\frac{i-1}n} = 0$ for all i ∈ {1, . . . , ⌊nT ⌋}, which
happens with probability 0, since $x = (x_1, x_2, x_3)^\top \ne 0$ and, for each i ∈ {1, . . . , ⌊nT ⌋}, the
distribution of $(Y_{\frac{i-1}n}, X_{\frac{i-1}n})$ is absolutely continuous, because, as in part (b) of the proof
of Theorem A.1 in Bolyog and Pap [11], the conditional distribution of $(Y_{\frac{i-1}n}, X_{\frac{i-1}n})$ given $(Y_0, X_0)$
is absolutely continuous. ✷
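3.2 Remark. The first order Taylor approximation of $g_n(a, b, \alpha, \beta, \gamma)$ at (0, 0, 0, 0, 0) is
$\frac1n (a, b, \alpha, \beta, \gamma)$, hence we obtain the first order Taylor approximations
\[
Y_{\frac in} - \mathsf E\bigl( Y_{\frac in} \mid \mathcal F_{\frac{i-1}n} \bigr) \approx Y_{\frac in} - Y_{\frac{i-1}n} - \frac1n \bigl( a - b Y_{\frac{i-1}n} \bigr),
\]
\[
X_{\frac in} - \mathsf E\bigl( X_{\frac in} \mid \mathcal F_{\frac{i-1}n} \bigr) \approx X_{\frac in} - X_{\frac{i-1}n} - \frac1n \bigl( \alpha - \beta Y_{\frac{i-1}n} - \gamma X_{\frac{i-1}n} \bigr).
\]
Using these approximations, one can define an approximate CLSE $\hat\theta^{\mathrm{approx}}_{T,n}$ of θ based on discrete
time observations $(Y_{\frac in}, X_{\frac in})_{i\in\{0,1,\ldots,\lfloor nT\rfloor\}}$, n ∈ N, by solving the extremum problem
\[
\begin{aligned}
\hat\theta^{\mathrm{approx}}_{T,n} := \operatorname*{arg\,min}_{(a,b,\alpha,\beta,\gamma)^\top\in\mathbb R^5} \sum_{i=1}^{\lfloor nT\rfloor} \Bigl[ & \Bigl( Y_{\frac in} - Y_{\frac{i-1}n} - \frac1n \bigl( a - b Y_{\frac{i-1}n} \bigr) \Bigr)^2 \\
& + \Bigl( X_{\frac in} - X_{\frac{i-1}n} - \frac1n \bigl( \alpha - \beta Y_{\frac{i-1}n} - \gamma X_{\frac{i-1}n} \bigr) \Bigr)^2 \Bigr],
\end{aligned}
\]
hence $\hat\theta^{\mathrm{approx}}_{T,n} = n \bigl( \hat c_{T,n}, \hat d_{T,n}, \hat\delta_{T,n}, \hat\varepsilon_{T,n}, \hat\zeta_{T,n} \bigr)^\top$. This definition of approximate CLSE can be considered
as the definition of LSE given in Hu and Long [20, formula (1.2)] for generalized Ornstein–Uhlenbeck
processes driven by α-stable motions, see also Hu and Long [21, formula (3.1)]. For a heuristic
motivation of the estimator $\hat\theta^{\mathrm{approx}}_{T,n}$ based on discrete observations, see, e.g., Hu and Long [19, page
178] (formulated for Langevin equations). ✷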
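We have
\[
\frac1n \Gamma^{(1)}_{T,n} \stackrel{\mathrm{a.s.}}{\longrightarrow} \begin{bmatrix} T & -\int_0^T Y_s\,ds \\ -\int_0^T Y_s\,ds & \int_0^T Y_s^2\,ds \end{bmatrix} =: G^{(1)}_T,
\]
\[
\frac1n \Gamma^{(2)}_{T,n} \stackrel{\mathrm{a.s.}}{\longrightarrow} \begin{bmatrix} T & -\int_0^T Y_s\,ds & -\int_0^T X_s\,ds \\ -\int_0^T Y_s\,ds & \int_0^T Y_s^2\,ds & \int_0^T X_s Y_s\,ds \\ -\int_0^T X_s\,ds & \int_0^T X_s Y_s\,ds & \int_0^T X_s^2\,ds \end{bmatrix} =: G^{(2)}_T
\]
as n → ∞, since (Yt , Xt )t∈R+ is almost surely continuous. By Proposition I.4.44 in Jacod and
Shiryaev [23] with the Riemann sequence of deterministic subdivisions $\bigl( \frac in \wedge T \bigr)_{i\in\mathbb N}$, n ∈ N, we
obtain
\[
\varphi^{(1)}_{T,n} \stackrel{\mathsf P}{\longrightarrow} \begin{bmatrix} Y_T - Y_0 \\ -\int_0^T Y_s\,dY_s \end{bmatrix} =: f^{(1)}_T, \qquad
\varphi^{(2)}_{T,n} \stackrel{\mathsf P}{\longrightarrow} \begin{bmatrix} X_T - X_0 \\ -\int_0^T Y_s\,dX_s \\ -\int_0^T X_s\,dX_s \end{bmatrix} =: f^{(2)}_T,
\]
as n → ∞. By Slutsky's lemma, using also Lemma 3.1, we conclude
\[
\hat\theta^{\mathrm{approx}}_{T,n} = n \begin{bmatrix} \hat c_{T,n} \\ \hat d_{T,n} \\ \hat\delta_{T,n} \\ \hat\varepsilon_{T,n} \\ \hat\zeta_{T,n} \end{bmatrix}
\stackrel{\mathsf P}{\longrightarrow} \begin{bmatrix} (G^{(1)}_T)^{-1} f^{(1)}_T \\ (G^{(2)}_T)^{-1} f^{(2)}_T \end{bmatrix}
=: \begin{bmatrix} \hat a_T \\ \hat b_T \\ \hat\alpha_T \\ \hat\beta_T \\ \hat\gamma_T \end{bmatrix} =: \hat\theta_T \qquad \text{as } n \to \infty, \tag{3.6}
\]
whenever the random matrices $G^{(1)}_T$ and $G^{(2)}_T$ are invertible.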
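In practice a continuous path is only available on a fine grid, so the integrals in (3.6) have to be approximated. The sketch below uses left-endpoint Riemann sums for the ds-integrals and Itô-type sums for the stochastic integrals; the function name `clse_continuous` and the mesh parameter `dt` are illustrative assumptions. Note that no diffusion coefficient enters the computation.

```python
import numpy as np

def clse_continuous(Y, X, dt):
    """Approximate the CLSE (3.6) from a path sampled on a grid with mesh dt."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    Yl, Xl = Y[:-1], X[:-1]            # left endpoints of the grid cells
    dY, dX = np.diff(Y), np.diff(X)
    T = dt * len(Yl)
    iY, iX = Yl.sum() * dt, Xl.sum() * dt
    iYY, iXY, iXX = (Yl**2).sum() * dt, (Xl * Yl).sum() * dt, (Xl**2).sum() * dt
    G1 = np.array([[T, -iY], [-iY, iYY]])
    G2 = np.array([[T, -iY, -iX], [-iY, iYY, iXY], [-iX, iXY, iXX]])
    # Ito-type sums: the integrand is evaluated at the left endpoint.
    f1 = np.array([Y[-1] - Y[0], -(Yl * dY).sum()])
    f2 = np.array([X[-1] - X[0], -(Yl * dX).sum(), -(Xl * dX).sum()])
    a_hat, b_hat = np.linalg.solve(G1, f1)
    alpha_hat, beta_hat, gamma_hat = np.linalg.solve(G2, f2)
    return a_hat, b_hat, alpha_hat, beta_hat, gamma_hat
```

With mesh dt = 1/n this coincides with n times the solution of (3.5), since the Riemann sums equal $\frac1n \Gamma^{(j)}_{T,n}$ and the Itô-type sums equal $\varphi^{(j)}_{T,n}$.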
3.3 Lemma. Let us consider the two-factor affine diffusion model (1.1) with a ∈ R+ , b, α, β, γ ∈ R,
σ1 ∈ R++ , σ2 , σ3 ∈ R+ and ̺ ∈ [−1, 1] with a random initial value (η0 , ζ0 ) independent of
(Wt , Bt , Lt )t∈R+ satisfying P(η0 ∈ R+ ) = 1. Suppose that $(1-\varrho^2)\sigma_2^2 + \sigma_3^2 > 0$. Then for each
T ∈ R++ , the random matrices $G^{(1)}_T$ and $G^{(2)}_T$ are invertible almost surely, and hence $\hat\theta_T$ given
in (3.6) exists almost surely. Moreover, $\hat\theta_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat\theta_T$ as n → ∞.
Proof. The aim of the following discussion is to show that the random matrix $G^{(1)}_T$ is almost surely
strictly positive definite, by checking that for all $x \in \mathbb R^2 \setminus \{0\}$, we have $x^\top G^{(1)}_T x > 0$ almost surely.
Indeed, for all $x = (x_1, x_2)^\top \in \mathbb R^2$,
\[
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^\top G^{(1)}_T \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \int_0^T \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^\top \begin{bmatrix} 1 \\ -Y_s \end{bmatrix} \begin{bmatrix} 1 \\ -Y_s \end{bmatrix}^\top \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} ds
= \int_0^T (x_1 - x_2 Y_s)^2\,ds \geq 0,
\]
and $x^\top G^{(1)}_T x = 0$ if and only if $x_1 - x_2 Y_s = 0$ for all s ∈ [0, T ], which happens with probability 0,
since $x = (x_1, x_2)^\top \ne 0$ and, for each s ∈ (0, T ], the distribution of $Y_s$ is absolutely continuous,
since the conditional distribution of $Y_s$ given $Y_0$ is absolutely continuous, see, e.g., Ben Alaya and
Kebaier [10, Proof of Proposition 2] and Ikeda and Watanabe [22, page 222].
In a similar way, the random matrix $G^{(2)}_T$ is almost surely strictly positive definite, since for
all $x \in \mathbb R^3 \setminus \{0\}$, we have $x^\top G^{(2)}_T x > 0$ almost surely. Indeed, for all $x = (x_1, x_2, x_3)^\top \in \mathbb R^3$,
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}^\top G^{(2)}_T \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \int_0^T \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}^\top \begin{bmatrix} 1 \\ -Y_s \\ -X_s \end{bmatrix} \begin{bmatrix} 1 \\ -Y_s \\ -X_s \end{bmatrix}^\top \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} ds
= \int_0^T (x_1 - x_2 Y_s - x_3 X_s)^2\,ds \geq 0,
\]
and $x^\top G^{(2)}_T x = 0$ if and only if $x_1 - x_2 Y_s - x_3 X_s = 0$ for all s ∈ [0, T ], which happens with
probability 0, since $x = (x_1, x_2, x_3)^\top \ne 0$ and, for each s ∈ (0, T ], the distribution of $(Y_s, X_s)$ is
absolutely continuous, because, as in part (b) of the proof of Theorem A.1 in Bolyog and
Pap [11], the conditional distribution of $(Y_s, X_s)$ given $(Y_0, X_0)$ is absolutely continuous.
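Next we are going to show $\hat\theta_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat\theta_T$ as n → ∞. The function $g_n$ introduced in (3.4) admits
an inverse $g_n^{-1} : \mathbb R \times (-\infty, 1) \times \mathbb R^2 \times (-\infty, 1) \to \mathbb R^5$ satisfying
\[
g_n^{-1}(c, d, \delta, \varepsilon, \zeta) = (a, b, \alpha, \beta, \gamma)
\]
with
\[
b = -n \log(1 - d), \qquad a = \frac{c}{\int_0^{\frac1n} e^{-bw}\,dw}, \qquad \gamma = -n \log(1 - \zeta),
\]
\[
\beta = \frac{\varepsilon}{\int_0^{\frac1n} e^{(\gamma-b)w - \frac\gamma n}\,dw}, \qquad
\alpha = \frac{\delta + a\beta \int_0^{\frac1n} e^{\gamma w - \frac\gamma n} \int_0^w e^{-b(w-v)}\,dv\,dw}{\int_0^{\frac1n} e^{-\gamma w}\,dw}.
\]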
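A round-trip check makes the relation between $g_n$ in (3.4) and its inverse concrete. The sketch below evaluates the three integrals in closed form, which is an assumption restricted to the generic case b ≠ 0, γ ≠ 0, γ ≠ b; the names `_integrals`, `g_n` and `g_n_inverse` are illustrative.

```python
import math

def _integrals(b, gamma, n):
    """Closed forms of the integrals in (3.4); assumes b, gamma, gamma - b != 0.

    J1 = int_0^{1/n} e^{-gamma w} dw
    J2 = int_0^{1/n} e^{(gamma - b) w - gamma/n} dw
    J3 = int_0^{1/n} e^{gamma w - gamma/n} int_0^w e^{-b (w - v)} dv dw
    """
    eb, eg = math.exp(-b / n), math.exp(-gamma / n)
    J1 = (1.0 - eg) / gamma
    J2 = (eb - eg) / (gamma - b)
    J3 = (J1 - J2) / b
    return J1, J2, J3

def g_n(theta, n):
    """Map (a, b, alpha, beta, gamma) to (c, d, delta, eps, zeta)."""
    a, b, alpha, beta, gamma = theta
    J1, J2, J3 = _integrals(b, gamma, n)
    d = 1.0 - math.exp(-b / n)
    c = a * d / b                      # a * int_0^{1/n} e^{-b w} dw
    delta = alpha * J1 - a * beta * J3
    return c, d, delta, beta * J2, 1.0 - math.exp(-gamma / n)

def g_n_inverse(params, n):
    """Map (c, d, delta, eps, zeta) back to (a, b, alpha, beta, gamma)."""
    c, d, delta, eps, zeta = params
    b = -n * math.log(1.0 - d)
    gamma = -n * math.log(1.0 - zeta)
    J1, J2, J3 = _integrals(b, gamma, n)
    a = c * b / d
    beta = eps / J2
    alpha = (delta + a * beta * J3) / J1
    return a, b, alpha, beta, gamma
```

For large n, multiplying the output of `g_n` by n returns values close to (a, b, α, β, γ), in line with the first order Taylor approximation of $g_n$.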
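Convergence (3.6) yields $(\hat c_{T,n}, \hat d_{T,n}, \hat\delta_{T,n}, \hat\varepsilon_{T,n}, \hat\zeta_{T,n}) \stackrel{\mathsf P}{\longrightarrow} 0$ as n → ∞, hence $\hat d_{T,n} \in (-\infty, 1)$
and $\hat\zeta_{T,n} \in (-\infty, 1)$ with probability tending to one as n → ∞. Consequently,
$g_n^{-1}(\hat c_{T,n}, \hat d_{T,n}, \hat\delta_{T,n}, \hat\varepsilon_{T,n}, \hat\zeta_{T,n}) = \hat\theta_{T,n}$ with probability tending to one as n → ∞. We have
\[
\hat b_{T,n} = -n \log(1 - \hat d_{T,n}) = n \hat d_{T,n}\, h_1(\hat d_{T,n})
\]
with probability tending to one as n → ∞, where the continuous function $h_1 : (-\infty, 1) \to \mathbb R$ is
given by
\[
h_1(x) := \begin{cases} -\frac1x \log(1 - x) & \text{if } x \ne 0, \\ 1 & \text{if } x = 0. \end{cases}
\]
By (3.6), we have $n \hat d_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat b_T$ and $\hat d_{T,n} \stackrel{\mathsf P}{\longrightarrow} 0$, thus we obtain $h_1(\hat d_{T,n}) \stackrel{\mathsf P}{\longrightarrow} h_1(0) = 1$, and hence
$\hat b_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat b_T$ as n → ∞.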
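Moreover,
\[
\hat a_{T,n} = \frac{\hat c_{T,n}}{\int_0^{\frac1n} e^{-\hat b_{T,n} w}\,dw}
= \frac{n \hat c_{T,n}}{n \int_0^{\frac1n} e^{-\hat b_{T,n} w}\,dw}
= \frac{n \hat c_{T,n}}{\int_0^1 \exp\bigl\{ -n^{-1} \hat b_{T,n} v \bigr\}\,dv}
= \frac{n \hat c_{T,n}}{h_2(n^{-1} \hat b_{T,n})}
\]
with probability tending to one as n → ∞, where the continuous function $h_2 : \mathbb R \to \mathbb R$ is given by
\[
h_2(x) := \int_0^1 e^{-xv}\,dv = \begin{cases} \frac{1 - e^{-x}}{x} & \text{if } x \ne 0, \\ 1 & \text{if } x = 0. \end{cases}
\]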
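We have already shown $\hat b_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat b_T$, yielding $n^{-1} \hat b_{T,n} \stackrel{\mathsf P}{\longrightarrow} 0$, and hence $h_2(n^{-1} \hat b_{T,n}) \stackrel{\mathsf P}{\longrightarrow} h_2(0) = 1$
as n → ∞. By (3.6), we have $n \hat c_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat a_T$, thus we obtain $\hat a_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat a_T$ as n → ∞.
In a similar way,
\[
\hat\gamma_{T,n} = -n \log(1 - \hat\zeta_{T,n}) = n \hat\zeta_{T,n}\, h_1(\hat\zeta_{T,n})
\]
with probability tending to one as n → ∞. By (3.6), we have $n \hat\zeta_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat\gamma_T$ and $\hat\zeta_{T,n} \stackrel{\mathsf P}{\longrightarrow} 0$, thus
we obtain $h_1(\hat\zeta_{T,n}) \stackrel{\mathsf P}{\longrightarrow} h_1(0) = 1$, and hence $\hat\gamma_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat\gamma_T$ as n → ∞.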
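Further,
\[
\hat\beta_{T,n} = \frac{\hat\varepsilon_{T,n}}{\int_0^{\frac1n} e^{(\hat\gamma_{T,n} - \hat b_{T,n}) w - \frac{\hat\gamma_{T,n}}n}\,dw}
= \frac{n \hat\varepsilon_{T,n}\, e^{\frac{\hat\gamma_{T,n}}n}}{h_2\bigl( n^{-1} (\hat b_{T,n} - \hat\gamma_{T,n}) \bigr)}
\]
with probability tending to one as n → ∞. We have already shown $\hat b_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat b_T$ and $\hat\gamma_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat\gamma_T$,
yielding $n^{-1} \hat b_{T,n} \stackrel{\mathsf P}{\longrightarrow} 0$ and $n^{-1} \hat\gamma_{T,n} \stackrel{\mathsf P}{\longrightarrow} 0$, and hence $e^{\frac{\hat\gamma_{T,n}}n} \stackrel{\mathsf P}{\longrightarrow} 1$ and $h_2\bigl( n^{-1} (\hat b_{T,n} - \hat\gamma_{T,n}) \bigr) \stackrel{\mathsf P}{\longrightarrow} h_2(0) = 1$
as n → ∞. By (3.6), we have $n \hat\varepsilon_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat\beta_T$, thus we obtain $\hat\beta_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat\beta_T$ as n → ∞.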
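Finally,
\[
\hat\alpha_{T,n} = \frac{\hat\delta_{T,n} + \hat a_{T,n} \hat\beta_{T,n} \int_0^{\frac1n} e^{\hat\gamma_{T,n} w - \frac{\hat\gamma_{T,n}}n} \int_0^w e^{-\hat b_{T,n}(w-v)}\,dv\,dw}{\int_0^{\frac1n} e^{-\hat\gamma_{T,n} w}\,dw}
= \frac{n \hat\delta_{T,n} + \hat a_{T,n} \hat\beta_{T,n}\, e^{-\frac{\hat\gamma_{T,n}}n}\, I_{T,n}}{h_2(n^{-1} \hat\gamma_{T,n})}
\]
with probability tending to one as n → ∞, where
\[
I_{T,n} := n \int_0^{\frac1n} e^{\hat\gamma_{T,n} w} \int_0^w e^{-\hat b_{T,n}(w-v)}\,dv\,dw \leq \frac1n\, e^{\frac{|\hat\gamma_{T,n}|}n}\, e^{\frac{|\hat b_{T,n}|}n}.
\]
We have already shown $\hat a_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat a_T$, $\hat b_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat b_T$, $\hat\beta_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat\beta_T$ and $\hat\gamma_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat\gamma_T$, yielding
$n^{-1} \hat b_{T,n} \stackrel{\mathsf P}{\longrightarrow} 0$ and $n^{-1} \hat\gamma_{T,n} \stackrel{\mathsf P}{\longrightarrow} 0$, and hence $h_2(n^{-1} \hat\gamma_{T,n}) \stackrel{\mathsf P}{\longrightarrow} h_2(0) = 1$, $e^{-\frac{\hat\gamma_{T,n}}n} \stackrel{\mathsf P}{\longrightarrow} 1$, $e^{\frac{|\hat\gamma_{T,n}|}n} \stackrel{\mathsf P}{\longrightarrow} 1$
and $e^{\frac{|\hat b_{T,n}|}n} \stackrel{\mathsf P}{\longrightarrow} 1$, implying $I_{T,n} \stackrel{\mathsf P}{\longrightarrow} 0$ as n → ∞. By (3.6), we have $n \hat\delta_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat\alpha_T$, thus we
obtain $\hat\alpha_{T,n} \stackrel{\mathsf P}{\longrightarrow} \hat\alpha_T$ as n → ∞. ✷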
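Using the SDE (1.1) and Corollary 3.2.20 in Karatzas and Shreve [24], one can check that
\[
\hat\theta_T - \theta = \begin{bmatrix} \hat a_T - a \\ \hat b_T - b \\ \hat\alpha_T - \alpha \\ \hat\beta_T - \beta \\ \hat\gamma_T - \gamma \end{bmatrix}
= \begin{bmatrix} (G^{(1)}_T)^{-1} h^{(1)}_T \\ (G^{(2)}_T)^{-1} h^{(2)}_T \end{bmatrix} = G_T^{-1} h_T \tag{3.7}
\]
on the event where the random matrices $G^{(1)}_T$ and $G^{(2)}_T$ are invertible, where
\[
G_T := \begin{bmatrix} G^{(1)}_T & 0 \\ 0 & G^{(2)}_T \end{bmatrix}, \qquad h_T := \begin{bmatrix} h^{(1)}_T \\ h^{(2)}_T \end{bmatrix},
\]
with
\[
h^{(1)}_T := \sigma_1 \int_0^T \begin{bmatrix} 1 \\ -Y_s \end{bmatrix} \sqrt{Y_s}\,dW_s, \qquad
h^{(2)}_T := \int_0^T \begin{bmatrix} 1 \\ -Y_s \\ -X_s \end{bmatrix} \bigl( \sigma_2 \sqrt{Y_s}\,d\widetilde W_s + \sigma_3\,dL_s \bigr),
\]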
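where
\[
\widetilde W_s := \varrho W_s + \sqrt{1 - \varrho^2}\, B_s, \qquad s \in \mathbb R_+, \tag{3.8}
\]
is a standard Wiener process, independent of L. Indeed,
\[
\hat\theta_T = G_T^{-1} f_T \qquad \text{with} \qquad f_T := \begin{bmatrix} f^{(1)}_T \\ f^{(2)}_T \end{bmatrix},
\]
hence $\hat\theta_T - \theta = G_T^{-1} (f_T - G_T \theta)$, where
\[
f_T - G_T \theta = \begin{bmatrix} f^{(1)}_T - G^{(1)}_T \begin{bmatrix} a \\ b \end{bmatrix} \\[3mm] f^{(2)}_T - G^{(2)}_T \begin{bmatrix} \alpha \\ \beta \\ \gamma \end{bmatrix} \end{bmatrix}.
\]
Using the first equation of SDE (1.1) and Corollary 3.2.20 in Karatzas and Shreve [24], we obtain
\[
\begin{aligned}
f^{(1)}_T - G^{(1)}_T \begin{bmatrix} a \\ b \end{bmatrix}
&= \int_0^T \begin{bmatrix} 1 \\ -Y_s \end{bmatrix} dY_s - \left( \int_0^T \begin{bmatrix} 1 \\ -Y_s \end{bmatrix} \begin{bmatrix} 1 \\ -Y_s \end{bmatrix}^\top ds \right) \begin{bmatrix} a \\ b \end{bmatrix} \\
&= \int_0^T \begin{bmatrix} 1 \\ -Y_s \end{bmatrix} \left( dY_s - \begin{bmatrix} 1 \\ -Y_s \end{bmatrix}^\top \begin{bmatrix} a \\ b \end{bmatrix} ds \right)
= \int_0^T \begin{bmatrix} 1 \\ -Y_s \end{bmatrix} \bigl( dY_s - (a - bY_s)\,ds \bigr) \\
&= \sigma_1 \int_0^T \begin{bmatrix} 1 \\ -Y_s \end{bmatrix} \sqrt{Y_s}\,dW_s .
\end{aligned}
\]
In a similar way,
\[
\begin{aligned}
f^{(2)}_T - G^{(2)}_T \begin{bmatrix} \alpha \\ \beta \\ \gamma \end{bmatrix}
&= \int_0^T \begin{bmatrix} 1 \\ -Y_s \\ -X_s \end{bmatrix} dX_s - \left( \int_0^T \begin{bmatrix} 1 \\ -Y_s \\ -X_s \end{bmatrix} \begin{bmatrix} 1 \\ -Y_s \\ -X_s \end{bmatrix}^\top ds \right) \begin{bmatrix} \alpha \\ \beta \\ \gamma \end{bmatrix} \\
&= \int_0^T \begin{bmatrix} 1 \\ -Y_s \\ -X_s \end{bmatrix} \left( dX_s - \begin{bmatrix} 1 \\ -Y_s \\ -X_s \end{bmatrix}^\top \begin{bmatrix} \alpha \\ \beta \\ \gamma \end{bmatrix} ds \right)
= \int_0^T \begin{bmatrix} 1 \\ -Y_s \\ -X_s \end{bmatrix} \bigl( dX_s - (\alpha - \beta Y_s - \gamma X_s)\,ds \bigr) \\
&= \int_0^T \begin{bmatrix} 1 \\ -Y_s \\ -X_s \end{bmatrix} \Bigl( \sigma_2 \sqrt{Y_s}\,\bigl[ \varrho\,dW_s + \sqrt{1 - \varrho^2}\,dB_s \bigr] + \sigma_3\,dL_s \Bigr).
\end{aligned}
\]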
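The decomposition (3.7) has an exact discrete analogue when the path itself is generated by an Euler scheme, which gives a cheap consistency check. The sketch below does this for the first block only (the CIR coordinate Y); the parameter values, the seed and the truncation `max(Y, 0)` inside the square root are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, sigma1 = 1.0, 0.5, 0.3
dt, N = 0.01, 2000

# Euler path of the first coordinate of (1.1).
dW = rng.normal(0.0, np.sqrt(dt), N)
Y = np.empty(N + 1)
Y[0] = 2.0
for k in range(N):
    Y[k + 1] = Y[k] + (a - b * Y[k]) * dt + sigma1 * np.sqrt(max(Y[k], 0.0)) * dW[k]

Yl, dY = Y[:-1], np.diff(Y)
T = N * dt
# Discretized G_T^{(1)} and f_T^{(1)} (Riemann and Ito-type sums).
G1 = np.array([[T, -Yl.sum() * dt],
               [-Yl.sum() * dt, (Yl**2).sum() * dt]])
f1 = np.array([dY.sum(), -(Yl * dY).sum()])
# Discretized h_T^{(1)}: the martingale part of the increments.
noise = sigma1 * np.sqrt(np.maximum(Yl, 0.0)) * dW
h1 = np.array([noise.sum(), -(Yl * noise).sum()])

estimation_error = np.linalg.solve(G1, f1) - np.array([a, b])
```

Up to rounding, `f1 - G1 @ [a, b]` equals `h1`, so `estimation_error` equals `np.linalg.solve(G1, h1)`, mirroring the first block of (3.7).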
4 Consistency of CLSE
First we consider the subcritical case, i.e., when b ∈ R++ and γ ∈ R++ .
4.1 Theorem. Let us consider the two-factor affine diffusion model (1.1) with a, b ∈ R++ , α, β ∈ R,
γ ∈ R++ , σ1 ∈ R++ , σ2 , σ3 ∈ R+ and ̺ ∈ [−1, 1] with a random initial value (η0 , ζ0 ) independent
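of (Wt , Bt , Lt )t∈R+ satisfying P(η0 ∈ R+ ) = 1. Suppose that $(1-\varrho^2)\sigma_2^2 + \sigma_3^2 > 0$. Then the CLSE
of $\theta = (a, b, \alpha, \beta, \gamma)^\top$ is strongly consistent, i.e., $\hat\theta_T = \bigl( \hat a_T, \hat b_T, \hat\alpha_T, \hat\beta_T, \hat\gamma_T \bigr)^\top \stackrel{\mathrm{a.s.}}{\longrightarrow} \theta = (a, b, \alpha, \beta, \gamma)^\top$
as T → ∞.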
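Proof. By (3.7), we have
\[
\hat\theta_T - \theta = (T^{-1} G_T)^{-1} (T^{-1} h_T) \tag{4.1}
\]
on the event where the random matrix $G_T$ is invertible, which has probability 1, see Lemma 3.3.
By Theorem A.2, we obtain
\[
T^{-1} G_T \stackrel{\mathrm{a.s.}}{\longrightarrow} \mathsf E(G_\infty) \qquad \text{as } T \to \infty, \tag{4.2}
\]
where
\[
G_\infty := \begin{bmatrix} G^{(1)}_\infty & 0 \\ 0 & G^{(2)}_\infty \end{bmatrix} \tag{4.3}
\]
with
\[
G^{(1)}_\infty := \begin{bmatrix} 1 & -Y_\infty \\ -Y_\infty & Y_\infty^2 \end{bmatrix}, \qquad
G^{(2)}_\infty := \begin{bmatrix} 1 & -Y_\infty & -X_\infty \\ -Y_\infty & Y_\infty^2 & Y_\infty X_\infty \\ -X_\infty & Y_\infty X_\infty & X_\infty^2 \end{bmatrix},
\]
where the random vector $(Y_\infty, X_\infty)$ is given by Theorem A.1, since, by Theorem B.2, the entries of
$\mathsf E(G_\infty)$ exist and are finite.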
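The matrix $\mathsf E(G^{(1)}_\infty)$ is strictly positive definite, since for all $x \in \mathbb R^2 \setminus \{0\}$, we have $x^\top \mathsf E(G^{(1)}_\infty) x > 0$.
Indeed, for all $x = (x_1, x_2)^\top \in \mathbb R^2 \setminus \{0\}$,
\[
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^\top \mathsf E(G^{(1)}_\infty) \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \mathsf E\bigl[ (x_1 - x_2 Y_\infty)^2 \bigr] > 0,
\]
since, by Theorem A.2, the distribution of $Y_\infty$ is absolutely continuous, hence $x_1 - x_2 Y_\infty \ne 0$
with probability 1. In a similar way, the matrix $\mathsf E(G^{(2)}_\infty)$ is strictly positive definite, since for all
$x \in \mathbb R^3 \setminus \{0\}$, we have $x^\top \mathsf E(G^{(2)}_\infty) x > 0$. Indeed, for all $x = (x_1, x_2, x_3)^\top \in \mathbb R^3 \setminus \{0\}$,
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}^\top \mathsf E(G^{(2)}_\infty) \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \mathsf E\bigl[ (x_1 - x_2 Y_\infty - x_3 X_\infty)^2 \bigr] > 0,
\]
since, by Theorem A.2, the distribution of $(Y_\infty, X_\infty)$ is absolutely continuous, hence $x_1 - x_2 Y_\infty - x_3 X_\infty \ne 0$
with probability 1. Thus the matrices $\mathsf E(G^{(1)}_\infty)$ and $\mathsf E(G^{(2)}_\infty)$ are invertible, whence we
conclude
\[
(T^{-1} G_T)^{-1} \stackrel{\mathrm{a.s.}}{\longrightarrow} \begin{bmatrix} [\mathsf E(G^{(1)}_\infty)]^{-1} & 0 \\ 0 & [\mathsf E(G^{(2)}_\infty)]^{-1} \end{bmatrix} = [\mathsf E(G_\infty)]^{-1} \qquad \text{as } T \to \infty. \tag{4.4}
\]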
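The aim of the next discussion is to show convergence
\[
T^{-1} h_T \stackrel{\mathrm{a.s.}}{\longrightarrow} 0 \qquad \text{as } T \to \infty. \tag{4.5}
\]
We have
\[
\frac1T \int_0^T \sqrt{Y_s}\,dW_s = \frac1T \int_0^T Y_s\,ds \cdot \frac{\int_0^T \sqrt{Y_s}\,dW_s}{\int_0^T Y_s\,ds} \stackrel{\mathrm{a.s.}}{\longrightarrow} 0 \qquad \text{as } T \to \infty.
\]
Indeed, we have already proved
\[
\frac1T \int_0^T Y_s\,ds \stackrel{\mathrm{a.s.}}{\longrightarrow} \mathsf E(Y_\infty) = \frac ab \in \mathbb R_{++} \qquad \text{as } T \to \infty,
\]
and the strong law of large numbers for continuous local martingales (see, e.g., Theorem D.1) implies
\[
\frac{\int_0^T \sqrt{Y_s}\,dW_s}{\int_0^T Y_s\,ds} \stackrel{\mathrm{a.s.}}{\longrightarrow} 0 \qquad \text{as } T \to \infty,
\]
since we have
\[
\int_0^T Y_s\,ds = T \cdot \frac1T \int_0^T Y_s\,ds \stackrel{\mathrm{a.s.}}{\longrightarrow} \infty \qquad \text{as } T \to \infty.
\]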
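Further,
\[
\frac1T \int_0^T \bigl( \sigma_2 \sqrt{Y_s}\,d\widetilde W_s + \sigma_3\,dL_s \bigr)
= \frac1T \int_0^T (\sigma_2^2 Y_s + \sigma_3^2)\,ds \cdot \frac{\int_0^T \bigl( \sigma_2 \sqrt{Y_s}\,d\widetilde W_s + \sigma_3\,dL_s \bigr)}{\int_0^T (\sigma_2^2 Y_s + \sigma_3^2)\,ds} \stackrel{\mathrm{a.s.}}{\longrightarrow} 0
\]
as T → ∞. Indeed, we have already proved
\[
\frac1T \int_0^T (\sigma_2^2 Y_s + \sigma_3^2)\,ds \stackrel{\mathrm{a.s.}}{\longrightarrow} \mathsf E(\sigma_2^2 Y_\infty + \sigma_3^2) = \sigma_2^2\, \frac ab + \sigma_3^2 \in \mathbb R_{++} \qquad \text{as } T \to \infty,
\]
and the strong law of large numbers for continuous local martingales (see, e.g., Theorem D.1) implies
\[
\frac{\int_0^T \bigl( \sigma_2 \sqrt{Y_s}\,d\widetilde W_s + \sigma_3\,dL_s \bigr)}{\int_0^T (\sigma_2^2 Y_s + \sigma_3^2)\,ds} \stackrel{\mathrm{a.s.}}{\longrightarrow} 0 \qquad \text{as } T \to \infty,
\]
since we have
\[
\int_0^T (\sigma_2^2 Y_s + \sigma_3^2)\,ds = T \cdot \frac1T \int_0^T (\sigma_2^2 Y_s + \sigma_3^2)\,ds \stackrel{\mathrm{a.s.}}{\longrightarrow} \infty \qquad \text{as } T \to \infty.
\]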
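One can check
\[
\frac1T \int_0^T Y_s \sqrt{Y_s}\,dW_s \stackrel{\mathrm{a.s.}}{\longrightarrow} 0,
\]
\[
\frac1T \int_0^T Y_s \bigl( \sigma_2 \sqrt{Y_s}\,d\widetilde W_s + \sigma_3\,dL_s \bigr) \stackrel{\mathrm{a.s.}}{\longrightarrow} 0, \qquad
\frac1T \int_0^T X_s \bigl( \sigma_2 \sqrt{Y_s}\,d\widetilde W_s + \sigma_3\,dL_s \bigr) \stackrel{\mathrm{a.s.}}{\longrightarrow} 0
\]
as T → ∞ in the same way, since
\[
\frac1T \int_0^T Y_s^3\,ds \stackrel{\mathrm{a.s.}}{\longrightarrow} \mathsf E(Y_\infty^3) \in \mathbb R_{++},
\]
\[
\frac1T \int_0^T Y_s^2 (\sigma_2^2 Y_s + \sigma_3^2)\,ds \stackrel{\mathrm{a.s.}}{\longrightarrow} \mathsf E\bigl[ Y_\infty^2 (\sigma_2^2 Y_\infty + \sigma_3^2) \bigr] \in \mathbb R_{++},
\]
\[
\frac1T \int_0^T X_s^2 (\sigma_2^2 Y_s + \sigma_3^2)\,ds \stackrel{\mathrm{a.s.}}{\longrightarrow} \mathsf E\bigl[ X_\infty^2 (\sigma_2^2 Y_\infty + \sigma_3^2) \bigr] \in \mathbb R_{++}
\]
as T → ∞. Consequently, we conclude (4.5). Finally, by (4.4) and (4.5), we obtain the statement. ✷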
In order to handle supercritical two-factor affine diffusion models when b ∈ R−− , we need the
following integral version of the Toeplitz Lemma, due to Dietz and Kutoyants [14].
4.2 Lemma. Let {ϕT : T ∈ R+ } be a family of probability measures on R+ such that ϕT ([0, T ]) = 1
for all T ∈ R+ , and $\lim_{T\to\infty} \varphi_T([0, K]) = 0$ for all K ∈ R++ . Then for every bounded and
measurable function f : R+ → R for which the limit $f(\infty) := \lim_{t\to\infty} f(t)$ exists, we have
\[
\lim_{T\to\infty} \int_0^\infty f(t)\,\varphi_T(dt) = f(\infty).
\]
As a special case, we have the following integral version of the Kronecker Lemma, see Küchler and
Sørensen [25, Lemma B.3.2].
4.3 Lemma. Let a : R+ → R+ be a measurable function. Put $b(T) := \int_0^T a(t)\,dt$, T ∈ R+ .
Suppose that $\lim_{T\to\infty} b(T) = \infty$. Then for every bounded and measurable function f : R+ → R for
which the limit $f(\infty) := \lim_{t\to\infty} f(t)$ exists, we have
\[
\lim_{T\to\infty} \frac{1}{b(T)} \int_0^T a(t) f(t)\,dt = f(\infty).
\]
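The integral Kronecker lemma can be illustrated numerically with an exponentially growing weight, which is the kind of weight needed in the supercritical case. The sketch below uses a hand-written trapezoidal rule; the function name `weighted_average`, the grid size and the test functions are illustrative assumptions.

```python
import numpy as np

def weighted_average(f, a, T, num=200_001):
    """Approximate (1 / b(T)) * int_0^T a(t) f(t) dt with b(T) = int_0^T a(t) dt."""
    t = np.linspace(0.0, T, num)
    w = a(t)

    def trapezoid(g):
        # Composite trapezoidal rule on the grid t.
        return np.sum((g[1:] + g[:-1]) * np.diff(t)) / 2.0

    return trapezoid(w * f(t)) / trapezoid(w)
```

For example, with a(t) = e^t and f(t) = 2 + 1/(1 + t), the weighted average moves towards f(∞) = 2 as T grows, because the weight concentrates near the right end of [0, T].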
Next we present an auxiliary lemma in the supercritical case about the asymptotic behavior of Yt
as t → ∞.
4.4 Lemma. Let us consider the two-factor affine diffusion model (1.1) with a ∈ R+ , b ∈ R−− ,
α, β, γ ∈ R, σ1 , σ2 , σ3 ∈ R+ and ̺ ∈ [−1, 1] with a random initial value (η0 , ζ0 ) independent of
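(Wt , Bt , Lt )t∈R+ satisfying P(η0 ∈ R+ ) = 1. Then there exists a random variable $V_Y$ such that
\[
e^{bt} Y_t \stackrel{\mathrm{a.s.}}{\longrightarrow} V_Y \qquad \text{as } t \to \infty \tag{4.6}
\]
with $\mathsf P(V_Y \ne 0) = 1$, and, for each k ∈ N,
\[
e^{kbt} \int_0^t Y_u^k\,du \stackrel{\mathrm{a.s.}}{\longrightarrow} -\frac{V_Y^k}{kb} \qquad \text{as } t \to \infty. \tag{4.7}
\]
Proof. By (2.1),
\[
\mathsf E(Y_t \mid \mathcal F_s) = \mathsf E(Y_t \mid Y_s) = e^{-b(t-s)} Y_s + a \int_s^t e^{-b(t-u)}\,du
\]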
for all s, t ∈ R+ with 0 ≤ s ≤ t. Thus
\[
\mathsf E(e^{bt} Y_t \mid \mathcal F^Y_s) = e^{bs} Y_s + a \int_s^t e^{bu}\,du \geq e^{bs} Y_s
\]
for all s, t ∈ R+ with 0 ≤ s ≤ t, consequently, the process $(e^{bt} Y_t)_{t\in\mathbb R_+}$ is a non-negative
submartingale with respect to the filtration $(\mathcal F^Y_t)_{t\in\mathbb R_+}$. Moreover, b ∈ R−− implies
\[
\mathsf E(e^{bt} Y_t) = y_0 + a \int_0^t e^{bu}\,du \leq y_0 + a \int_0^\infty e^{bu}\,du = y_0 - \frac ab < \infty, \qquad t \in \mathbb R_+,
\]
hence, by the submartingale convergence theorem, there exists a non-negative random variable $V_Y$
such that (4.6) holds.
The distribution of $V_Y$ coincides with the distribution of $\widetilde Y_{-1/b}$, where $(\widetilde Y_t)_{t\in\mathbb R_+}$ is a CIR
process given by the SDE
\[
d\widetilde Y_t = a\,dt + \sigma_1 \sqrt{\widetilde Y_t}\,dW_t, \qquad t \in \mathbb R_+,
\]
with initial value $\widetilde Y_0 = y_0$, where (Wt )t∈R+ is a standard Wiener process, see Ben Alaya and Kebaier
[9, Proposition 3]. Consequently, $\mathsf P(V_Y \in \mathbb R_{++}) = 1$, since $\widetilde Y_t$, t ∈ R++ , are absolutely continuous
random variables.
If ω ∈ Ω is such that $\mathbb R_+ \ni t \mapsto Y_t(\omega)$ is continuous and $e^{bt} Y_t(\omega) \to V_Y(\omega)$ as t → ∞, then, by
the integral Kronecker Lemma 4.3 with $f(t) = e^{kbt} Y_t(\omega)^k$ and $a(t) = e^{-kbt}$, t ∈ R+ , we have
\[
\frac{1}{\int_0^t e^{-kbu}\,du} \int_0^t e^{-kbu} \bigl( e^{kbu} Y_u(\omega)^k \bigr)\,du \to V_Y(\omega)^k \qquad \text{as } t \to \infty.
\]
Here $\int_0^t e^{-kbu}\,du = -\frac{e^{-kbt} - 1}{kb}$, t ∈ R+ , thus we conclude the convergence (4.7). ✷
The next theorem states strong consistency of the CLSE of b in the supercritical case.
4.5 Theorem. Let us consider the two-factor affine diffusion model (1.1) with a ∈ R+ , b ∈ R−− ,
α, β, γ ∈ R, σ1 ∈ R++ , σ2 , σ3 ∈ R+ and ̺ ∈ [−1, 1] with a random initial value (η0 , ζ0 ) independent
of (Wt , Bt , Lt )t∈R+ satisfying P(η0 ∈ R+ ) = 1. Then the CLSE of b is strongly consistent, i.e.,
a.s.
bbT −→ b as T → ∞.
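Proof. By Lemma 3.3, there exists a unique CLSE $\widehat{b}_T$ of $b$ for all $T \in \mathbb{R}_{++}$ which has the form
given in (3.6). By Itô's formula,
\[ \int_0^T Y_s \, dY_s = \frac{1}{2} (Y_T^2 - Y_0^2) - \frac{1}{2} \sigma_1^2 \int_0^T Y_s \, ds, \qquad T \in \mathbb{R}_+, \]
hence, by (4.6) and (4.7), we have
\begin{align*}
\widehat{b}_T
&= \frac{(Y_T - Y_0) \int_0^T Y_s \, ds - T \int_0^T Y_s \, dY_s}{T \int_0^T Y_s^2 \, ds - \bigl( \int_0^T Y_s \, ds \bigr)^2}
 = \frac{(Y_T - Y_0) \int_0^T Y_s \, ds - \frac{T}{2} (Y_T^2 - Y_0^2) + \frac{T}{2} \sigma_1^2 \int_0^T Y_s \, ds}{T \int_0^T Y_s^2 \, ds - \bigl( \int_0^T Y_s \, ds \bigr)^2} \\
&= \frac{\frac{1}{T} \bigl( e^{bT} Y_T - e^{bT} Y_0 \bigr) e^{bT} \int_0^T Y_s \, ds - \frac{1}{2} \bigl( e^{2bT} Y_T^2 - e^{2bT} Y_0^2 \bigr) + \frac{1}{2} \sigma_1^2 e^{bT} e^{bT} \int_0^T Y_s \, ds}{e^{2bT} \int_0^T Y_s^2 \, ds - \frac{1}{T} \bigl( e^{bT} \int_0^T Y_s \, ds \bigr)^2} \\
&\xrightarrow{\text{a.s.}} \frac{0 \cdot (V_Y - 0) \bigl( -\frac{V_Y}{b} \bigr) - \frac{1}{2} (V_Y^2 - 0) + \frac{1}{2} \sigma_1^2 \cdot 0 \cdot \bigl( -\frac{V_Y}{b} \bigr)}{-\frac{V_Y^2}{2b} - 0 \cdot \bigl( -\frac{V_Y}{b} \bigr)^2} = b
\end{align*}
as $T \to \infty$. ✷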
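The ratio for $\widehat{b}_T$ used in the proof can be tried out numerically. The following is a minimal sketch, assuming equidistant observations and left-endpoint sums in place of the continuous-time integrals (the paper itself works with continuous observations); the function names `clse_b` and `euler_path` are hypothetical and not from the paper.

```python
def clse_b(ys, dt):
    """Discretised analogue of the CLSE of b from the proof of Theorem 4.5.

    ys: values Y_0, Y_dt, ..., Y_T on an equidistant grid (an assumption);
    the integrals are replaced by left-endpoint Riemann and Ito sums.
    """
    n = len(ys) - 1
    T = n * dt
    int_y = sum(y * dt for y in ys[:-1])        # approximates int_0^T Y_s ds
    int_y2 = sum(y * y * dt for y in ys[:-1])   # approximates int_0^T Y_s^2 ds
    int_y_dy = sum(ys[i] * (ys[i + 1] - ys[i]) for i in range(n))  # int Y dY
    numerator = (ys[-1] - ys[0]) * int_y - T * int_y_dy
    denominator = T * int_y2 - int_y ** 2
    return numerator / denominator


def euler_path(y0, a, b, dt, n):
    """Noise-free Euler recursion Y_{i+1} = Y_i + (a - b Y_i) dt (sigma_1 = 0)."""
    ys = [y0]
    for _ in range(n):
        ys.append(ys[-1] + (a - b * ys[-1]) * dt)
    return ys
```

On a noise-free Euler path the sums satisfy the same algebraic identities as the integrals, so `clse_b` recovers `b` up to rounding; with noise, one would feed in a simulated or observed path instead.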
4.6 Remark. For critical two-factor affine diffusion models, it will turn out that the CLSE of a and
α are not even weakly consistent, but the CLSE of b, β and γ are weakly consistent, see Theorem
6.2. ✷
4.7 Remark. For supercritical two-factor affine diffusion models, it will turn out that the CLSE of
a and α are not even weakly consistent, but the CLSE of β and γ are weakly consistent, see
Theorem 7.3. ✷
5 Asymptotic behavior of CLSE: subcritical case
5.1 Theorem. Let us consider the two-factor affine diffusion model (1.1) with a, b ∈ R++ , α, β ∈ R,
of (Wt , Bt , Lt )t∈R+ satisfying P(η0 ∈ R+ ) = 1. Suppose that (1 − ̺2 )σ22 + σ32 > 0. Then the CLSE
of θ = (a, b, α, β, γ)⊤ is asymptotically normal, namely,
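\[ T^{1/2} (\widehat{\theta}_T - \theta) \xrightarrow{\mathcal{D}} \mathcal{N}_5 \bigl( 0, [\mathbb{E}(G_\infty)]^{-1} \mathbb{E}(\widetilde{G}_\infty) [\mathbb{E}(G_\infty)]^{-1} \bigr) \quad \text{as } T \to \infty, \tag{5.1} \]
where $G_\infty$ is given in (4.3) and $\widetilde{G}_\infty$ has the form
\[
\begin{pmatrix}
\sigma_1^2 Y_\infty & -\sigma_1^2 Y_\infty^2 & \varrho\sigma_1\sigma_2 Y_\infty & -\varrho\sigma_1\sigma_2 Y_\infty^2 & -\varrho\sigma_1\sigma_2 Y_\infty X_\infty \\
-\sigma_1^2 Y_\infty^2 & \sigma_1^2 Y_\infty^3 & -\varrho\sigma_1\sigma_2 Y_\infty^2 & \varrho\sigma_1\sigma_2 Y_\infty^3 & \varrho\sigma_1\sigma_2 Y_\infty^2 X_\infty \\
\varrho\sigma_1\sigma_2 Y_\infty & -\varrho\sigma_1\sigma_2 Y_\infty^2 & \sigma_2^2 Y_\infty + \sigma_3^2 & -(\sigma_2^2 Y_\infty + \sigma_3^2) Y_\infty & -(\sigma_2^2 Y_\infty + \sigma_3^2) X_\infty \\
-\varrho\sigma_1\sigma_2 Y_\infty^2 & \varrho\sigma_1\sigma_2 Y_\infty^3 & -(\sigma_2^2 Y_\infty + \sigma_3^2) Y_\infty & (\sigma_2^2 Y_\infty + \sigma_3^2) Y_\infty^2 & (\sigma_2^2 Y_\infty + \sigma_3^2) Y_\infty X_\infty \\
-\varrho\sigma_1\sigma_2 Y_\infty X_\infty & \varrho\sigma_1\sigma_2 Y_\infty^2 X_\infty & -(\sigma_2^2 Y_\infty + \sigma_3^2) X_\infty & (\sigma_2^2 Y_\infty + \sigma_3^2) Y_\infty X_\infty & (\sigma_2^2 Y_\infty + \sigma_3^2) X_\infty^2
\end{pmatrix},
\]
where the random vector $(Y_\infty, X_\infty)$ is given by Theorem A.1.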

Proof. We have
\[ T^{1/2} (\widehat{\theta}_T - \theta) = (T^{-1} G_T)^{-1} (T^{-1/2} h_T) \tag{5.2} \]
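on the event where $G_T$ is invertible, which holds almost surely, see Lemma 3.3. By (4.4), we have
$(T^{-1} G_T)^{-1} \xrightarrow{\text{a.s.}} [\mathbb{E}(G_\infty)]^{-1}$ as $T \to \infty$. The process $(h_t)_{t \in \mathbb{R}_+}$ is a 5-dimensional continuous local
martingale with quadratic variation process $\langle h \rangle_t = \widetilde{G}_t$, $t \in \mathbb{R}_+$, where
\[
\widetilde{G}_t := \int_0^t
\begin{pmatrix}
\sigma_1^2 Y_s & -\sigma_1^2 Y_s^2 & \varrho\sigma_1\sigma_2 Y_s & -\varrho\sigma_1\sigma_2 Y_s^2 & -\varrho\sigma_1\sigma_2 Y_s X_s \\
-\sigma_1^2 Y_s^2 & \sigma_1^2 Y_s^3 & -\varrho\sigma_1\sigma_2 Y_s^2 & \varrho\sigma_1\sigma_2 Y_s^3 & \varrho\sigma_1\sigma_2 Y_s^2 X_s \\
\varrho\sigma_1\sigma_2 Y_s & -\varrho\sigma_1\sigma_2 Y_s^2 & \sigma_2^2 Y_s + \sigma_3^2 & -(\sigma_2^2 Y_s + \sigma_3^2) Y_s & -(\sigma_2^2 Y_s + \sigma_3^2) X_s \\
-\varrho\sigma_1\sigma_2 Y_s^2 & \varrho\sigma_1\sigma_2 Y_s^3 & -(\sigma_2^2 Y_s + \sigma_3^2) Y_s & (\sigma_2^2 Y_s + \sigma_3^2) Y_s^2 & (\sigma_2^2 Y_s + \sigma_3^2) Y_s X_s \\
-\varrho\sigma_1\sigma_2 Y_s X_s & \varrho\sigma_1\sigma_2 Y_s^2 X_s & -(\sigma_2^2 Y_s + \sigma_3^2) X_s & (\sigma_2^2 Y_s + \sigma_3^2) Y_s X_s & (\sigma_2^2 Y_s + \sigma_3^2) X_s^2
\end{pmatrix} ds.
\]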
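By Theorem A.2, we obtain
\[ T^{-1} \widetilde{G}_T \xrightarrow{\text{a.s.}} \mathbb{E}(\widetilde{G}_\infty) \quad \text{as } T \to \infty, \tag{5.3} \]
since, by Theorem B.2, the entries of $\mathbb{E}(\widetilde{G}_\infty)$ exist and are finite. Using (5.3), Theorem D.2 yields
$T^{-1/2} h_T \xrightarrow{\mathcal{D}} \mathcal{N}_5(0, \mathbb{E}(\widetilde{G}_\infty))$ as $T \to \infty$. Hence, by (5.2) and by Slutsky's lemma,
\[ T^{1/2} (\widehat{\theta}_T - \theta) \xrightarrow{\mathcal{D}} [\mathbb{E}(G_\infty)]^{-1} \mathcal{N}_5 \bigl( 0, \mathbb{E}(\widetilde{G}_\infty) \bigr) = \mathcal{N}_5 \Bigl( 0, [\mathbb{E}(G_\infty)]^{-1} \mathbb{E}(\widetilde{G}_\infty) \bigl( [\mathbb{E}(G_\infty)]^{-1} \bigr)^{\top} \Bigr) \]
as $T \to \infty$. ✷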
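The final equality in the proof is the usual transformation rule for Gaussian vectors. As a brief worked step (the symbols $Z$, $A$ and $\Sigma$ are shorthand introduced only for this illustration, not notation from the paper):

```latex
\[
Z \sim \mathcal{N}_5(0,\Sigma),\quad A \in \mathbb{R}^{5\times 5}\ \text{deterministic}
\;\Longrightarrow\;
AZ \sim \mathcal{N}_5\bigl(0,\; A\,\Sigma\,A^{\top}\bigr),
\qquad\text{applied with}\quad
A = [\mathbb{E}(G_\infty)]^{-1},\quad \Sigma = \mathbb{E}(\widetilde{G}_\infty).
\]
```

If $\mathbb{E}(G_\infty)$ is symmetric, as one would expect for a matrix of Gram type, the transpose can be dropped and the covariance takes the form stated in (5.1).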
6 Asymptotic behavior of CLSE: critical case
First we present an auxiliary lemma.

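6.1 Lemma. If $(Y_t, X_t)_{t \in \mathbb{R}_+}$ and $(\widetilde{Y}_t, \widetilde{X}_t)_{t \in \mathbb{R}_+}$ are continuous semimartingales with
$(Y_t, X_t)_{t \in \mathbb{R}_+} \overset{\mathcal{D}}{=} (\widetilde{Y}_t, \widetilde{X}_t)_{t \in \mathbb{R}_+}$, then
\begin{align*}
&\Bigl( Y_1, X_1, \int_0^1 X_s \, dY_s, \int_0^1 Y_s^k X_s^\ell \, ds : k, \ell \in \mathbb{Z}_+, \; k + \ell \leq n \Bigr) \\
&\qquad \overset{\mathcal{D}}{=} \Bigl( \widetilde{Y}_1, \widetilde{X}_1, \int_0^1 \widetilde{X}_s \, d\widetilde{Y}_s, \int_0^1 \widetilde{Y}_s^k \widetilde{X}_s^\ell \, ds : k, \ell \in \mathbb{Z}_+, \; k + \ell \leq n \Bigr)
\end{align*}
for each $n \in \mathbb{N}$.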
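Proof. By Proposition I.4.44 in Jacod and Shiryaev [23] with the Riemann sequence of deterministic
subdivisions $\bigl( \frac{i}{n} \wedge T \bigr)_{i \in \mathbb{N}}$, $n \in \mathbb{N}$, we have
\[ \sum_{i=1}^n X_{\frac{i-1}{n}} \bigl( Y_{\frac{i}{n}} - Y_{\frac{i-1}{n}} \bigr) \xrightarrow{\mathbb{P}} \int_0^1 X_s \, dY_s, \qquad \frac{1}{n} \sum_{i=1}^n Y_{\frac{i}{n}}^k X_{\frac{i}{n}}^\ell \xrightarrow{\mathbb{P}} \int_0^1 Y_s^k X_s^\ell \, ds \]
as $n \to \infty$ for each $k, \ell \in \mathbb{Z}_+$, and similar convergences hold for $(\widetilde{Y}_t, \widetilde{X}_t)_{t \in \mathbb{R}_+}$. The assumption
implies
\[ \sum_{i=1}^n X_{\frac{i-1}{n}} \bigl( Y_{\frac{i}{n}} - Y_{\frac{i-1}{n}} \bigr) \overset{\mathcal{D}}{=} \sum_{i=1}^n \widetilde{X}_{\frac{i-1}{n}} \bigl( \widetilde{Y}_{\frac{i}{n}} - \widetilde{Y}_{\frac{i-1}{n}} \bigr), \qquad \frac{1}{n} \sum_{i=1}^n Y_{\frac{i}{n}}^k X_{\frac{i}{n}}^\ell \overset{\mathcal{D}}{=} \frac{1}{n} \sum_{i=1}^n \widetilde{Y}_{\frac{i}{n}}^k \widetilde{X}_{\frac{i}{n}}^\ell \]
for each $n \in \mathbb{N}$ and $k, \ell \in \mathbb{Z}_+$, hence we obtain the statement. ✷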
6.2 Theorem. Let us consider the two-factor affine diffusion model (1.1) with a ∈ R+ , b = 0,
α ∈ R, β = 0, γ = 0, σ1 , σ2 , σ3 ∈ R+ and ̺ ∈ [−1, 1] with a random initial value (η0 , ζ0 )
independent of (Wt , Bt , Lt )t∈R+ satisfying P(η0 ∈ R+ ) = 1. Suppose that (1 − ̺2 )σ22 + σ32 > 0.
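Then
\[
\begin{pmatrix} \widehat{a}_T - a \\ T \widehat{b}_T \\ \widehat{\alpha}_T - \alpha \\ T \widehat{\beta}_T \\ T \widehat{\gamma}_T \end{pmatrix}
\xrightarrow{\mathcal{D}}
\begin{pmatrix}
\left( \displaystyle\int_0^1 \begin{pmatrix} 1 \\ -\mathcal{Y}_s \end{pmatrix} \begin{pmatrix} 1 \\ -\mathcal{Y}_s \end{pmatrix}^{\!\top} ds \right)^{-1}
\begin{pmatrix} \mathcal{Y}_1 - a \\ -\frac{1}{2} \mathcal{Y}_1^2 + \bigl( a + \frac{\sigma_1^2}{2} \bigr) \int_0^1 \mathcal{Y}_s \, ds \end{pmatrix} \\[3ex]
\left( \displaystyle\int_0^1 \begin{pmatrix} 1 \\ -\mathcal{Y}_s \\ -\mathcal{X}_s \end{pmatrix} \begin{pmatrix} 1 \\ -\mathcal{Y}_s \\ -\mathcal{X}_s \end{pmatrix}^{\!\top} ds \right)^{-1}
\begin{pmatrix} \mathcal{X}_1 - \alpha \\ -\mathcal{Y}_1 \mathcal{X}_1 + (\alpha + \varrho\sigma_1\sigma_2) \int_0^1 \mathcal{Y}_s \, ds + \int_0^1 \mathcal{X}_s \, d\mathcal{Y}_s \\ -\frac{1}{2} \mathcal{X}_1^2 + \alpha \int_0^1 \mathcal{X}_s \, ds + \frac{\sigma_2^2}{2} \int_0^1 \mathcal{Y}_s \, ds + \frac{\sigma_3^2}{2} \end{pmatrix}
\end{pmatrix}
\tag{6.1}
\]
as $T \to \infty$, where $(\mathcal{Y}_t, \mathcal{X}_t)_{t \in \mathbb{R}_+}$ is the unique strong solution of the SDE
\[
\begin{cases}
d\mathcal{Y}_t = a \, dt + \sigma_1 \sqrt{\mathcal{Y}_t} \, dW_t, \\
d\mathcal{X}_t = \alpha \, dt + \sigma_2 \sqrt{\mathcal{Y}_t} \, \bigl( \varrho \, dW_t + \sqrt{1 - \varrho^2} \, dB_t \bigr),
\end{cases}
\qquad t \in [0, \infty), \tag{6.2}
\]
with initial value $(\mathcal{Y}_0, \mathcal{X}_0) = (0, 0)$.

Proof. We have
\[
\begin{pmatrix} \widehat{a}_T - a \\ \widehat{b}_T \end{pmatrix}
= \begin{pmatrix} \widehat{a}_T - a \\ \widehat{b}_T - b \end{pmatrix}
= \begin{pmatrix} T & -\int_0^T Y_s \, ds \\ -\int_0^T Y_s \, ds & \int_0^T Y_s^2 \, ds \end{pmatrix}^{-1}
\begin{pmatrix} \sigma_1 \int_0^T Y_s^{1/2} \, dW_s \\ -\sigma_1 \int_0^T Y_s^{3/2} \, dW_s \end{pmatrix}.
\]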
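We can write
\[
\begin{pmatrix} T & -\int_0^T Y_s \, ds \\ -\int_0^T Y_s \, ds & \int_0^T Y_s^2 \, ds \end{pmatrix}
= \begin{pmatrix} T^{1/2} & 0 \\ 0 & T^{3/2} \end{pmatrix}
\begin{pmatrix} 1 & -\frac{1}{T^2} \int_0^T Y_s \, ds \\ -\frac{1}{T^2} \int_0^T Y_s \, ds & \frac{1}{T^3} \int_0^T Y_s^2 \, ds \end{pmatrix}
\begin{pmatrix} T^{1/2} & 0 \\ 0 & T^{3/2} \end{pmatrix}
\]
and
\[
\begin{pmatrix} \sigma_1 \int_0^T Y_s^{1/2} \, dW_s \\ -\sigma_1 \int_0^T Y_s^{3/2} \, dW_s \end{pmatrix}
= \begin{pmatrix} T & 0 \\ 0 & T^2 \end{pmatrix}
\begin{pmatrix} \frac{\sigma_1}{T} \int_0^T Y_s^{1/2} \, dW_s \\ -\frac{\sigma_1}{T^2} \int_0^T Y_s^{3/2} \, dW_s \end{pmatrix}.
\]
Consequently,
\[
\begin{pmatrix} \widehat{a}_T - a \\ T \widehat{b}_T \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & T \end{pmatrix} \begin{pmatrix} \widehat{a}_T - a \\ \widehat{b}_T \end{pmatrix}
= \begin{pmatrix} 1 & -\frac{1}{T^2} \int_0^T Y_s \, ds \\ -\frac{1}{T^2} \int_0^T Y_s \, ds & \frac{1}{T^3} \int_0^T Y_s^2 \, ds \end{pmatrix}^{-1}
\begin{pmatrix} \frac{\sigma_1}{T} \int_0^T Y_s^{1/2} \, dW_s \\ -\frac{\sigma_1}{T^2} \int_0^T Y_s^{3/2} \, dW_s \end{pmatrix}.
\]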
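In a similar way,
\begin{align*}
\begin{pmatrix} \widehat{\alpha}_T - \alpha \\ T \widehat{\beta}_T \\ T \widehat{\gamma}_T \end{pmatrix}
= {} & \begin{pmatrix}
1 & -\frac{1}{T^2} \int_0^T Y_s \, ds & -\frac{1}{T^2} \int_0^T X_s \, ds \\
-\frac{1}{T^2} \int_0^T Y_s \, ds & \frac{1}{T^3} \int_0^T Y_s^2 \, ds & \frac{1}{T^3} \int_0^T Y_s X_s \, ds \\
-\frac{1}{T^2} \int_0^T X_s \, ds & \frac{1}{T^3} \int_0^T Y_s X_s \, ds & \frac{1}{T^3} \int_0^T X_s^2 \, ds
\end{pmatrix}^{-1} \\
& \times \begin{pmatrix}
\frac{\sigma_2}{T} \int_0^T Y_s^{1/2} \, d\widetilde{W}_s + \frac{\sigma_3}{T} L_T \\
-\frac{\sigma_2}{T^2} \int_0^T Y_s^{3/2} \, d\widetilde{W}_s - \frac{\sigma_3}{T^2} \int_0^T Y_s \, dL_s \\
-\frac{\sigma_2}{T^2} \int_0^T Y_s^{1/2} X_s \, d\widetilde{W}_s - \frac{\sigma_3}{T^2} \int_0^T X_s \, dL_s
\end{pmatrix}.
\end{align*}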
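The aim of the following discussion is to prove
\begin{align*}
&\Bigl( \frac{1}{T} Y_T, \frac{1}{T} X_T, \frac{1}{T^2} \int_0^T X_s \, dY_s, \frac{1}{T^{k+\ell+1}} \int_0^T Y_s^k X_s^\ell \, ds : k, \ell \in \mathbb{Z}_+, \; k + \ell \leq 2 \Bigr) \\
&\qquad \xrightarrow{\mathcal{D}} \Bigl( \mathcal{Y}_1, \mathcal{X}_1, \int_0^1 \mathcal{X}_s \, d\mathcal{Y}_s, \int_0^1 \mathcal{Y}_s^k \mathcal{X}_s^\ell \, ds : k, \ell \in \mathbb{Z}_+, \; k + \ell \leq 2 \Bigr) \tag{6.3}
\end{align*}
as $T \to \infty$. By part (ii) of Remark 2.7 in Barczy et al. [5], we have
\[ \bigl( \widetilde{\mathcal{Y}}_t^{(T)}, \widetilde{\mathcal{X}}_t^{(T)} \bigr)_{t \in \mathbb{R}_+} := \Bigl( \frac{1}{T} \mathcal{Y}_{Tt}, \frac{1}{T} \mathcal{X}_{Tt} \Bigr)_{t \in \mathbb{R}_+} \overset{\mathcal{D}}{=} (\mathcal{Y}_t, \mathcal{X}_t)_{t \in \mathbb{R}_+} \qquad \text{for all } T \in \mathbb{R}_{++}, \]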
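since, by Proposition 2.1, $(\mathcal{Y}_t, \mathcal{X}_t)_{t \in \mathbb{R}_+}$ is an affine process with infinitesimal generator
\[ (\mathcal{A}_{(\mathcal{Y}, \mathcal{X})} f)(y, x) = a f_1'(y, x) + \alpha f_2'(y, x) + \frac{1}{2} y \bigl[ \sigma_1^2 f_{1,1}''(y, x) + 2 \varrho \sigma_1 \sigma_2 f_{1,2}''(y, x) + \sigma_2^2 f_{2,2}''(y, x) \bigr]. \]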
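Hence, by Lemma 6.1, we obtain
\begin{align*}
&\Bigl( \mathcal{Y}_1, \mathcal{X}_1, \int_0^1 \mathcal{X}_s \, d\mathcal{Y}_s, \int_0^1 \mathcal{Y}_s^k \mathcal{X}_s^\ell \, ds : k, \ell \in \mathbb{Z}_+, \; k + \ell \leq 2 \Bigr) \\
&\qquad \overset{\mathcal{D}}{=} \Bigl( \widetilde{\mathcal{Y}}_1^{(T)}, \widetilde{\mathcal{X}}_1^{(T)}, \int_0^1 \widetilde{\mathcal{X}}_s^{(T)} \, d\widetilde{\mathcal{Y}}_s^{(T)}, \int_0^1 \bigl( \widetilde{\mathcal{Y}}_s^{(T)} \bigr)^k \bigl( \widetilde{\mathcal{X}}_s^{(T)} \bigr)^\ell \, ds : k, \ell \in \mathbb{Z}_+, \; k + \ell \leq 2 \Bigr) \\
&\qquad = \Bigl( \frac{1}{T} \mathcal{Y}_T, \frac{1}{T} \mathcal{X}_T, \frac{1}{T^2} \int_0^T \mathcal{X}_s \, d\mathcal{Y}_s, \frac{1}{T^{k+\ell+1}} \int_0^T \mathcal{Y}_s^k \mathcal{X}_s^\ell \, ds : k, \ell \in \mathbb{Z}_+, \; k + \ell \leq 2 \Bigr)
\end{align*}
for all $T \in \mathbb{R}_{++}$. Then, by Slutsky's lemma, in order to prove (6.3), it suffices to show the convergences
\[ \frac{1}{T} (Y_T - \mathcal{Y}_T) \xrightarrow{\mathbb{P}} 0, \qquad \frac{1}{T} (X_T - \mathcal{X}_T) \xrightarrow{\mathbb{P}} 0, \tag{6.4} \]
\[ \frac{1}{T^2} \Bigl( \int_0^T X_s \, dY_s - \int_0^T \mathcal{X}_s \, d\mathcal{Y}_s \Bigr) \xrightarrow{\mathbb{P}} 0, \qquad \frac{1}{T^{k+\ell+1}} \int_0^T \bigl( Y_s^k X_s^\ell - \mathcal{Y}_s^k \mathcal{X}_s^\ell \bigr) \, ds \xrightarrow{\mathbb{P}} 0 \tag{6.5} \]
as $T \to \infty$ for all $k, \ell \in \mathbb{Z}_+$ with $k + \ell \leq 2$. By (3.21) in Barczy et al. [5], we have
\[ \mathbb{E}(|Y_s - \mathcal{Y}_s|) \leq \mathbb{E}(Y_0), \qquad s \in \mathbb{R}_+, \tag{6.6} \]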
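hence
\begin{align*}
\mathbb{E} \Bigl( \Bigl| \frac{1}{T} (Y_T - \mathcal{Y}_T) \Bigr| \Bigr) &\leq \frac{1}{T} \mathbb{E}(Y_0) \to 0, \\
\mathbb{E} \Bigl( \Bigl| \frac{1}{T^2} \int_0^T (Y_s - \mathcal{Y}_s) \, ds \Bigr| \Bigr) &\leq \frac{1}{T^2} \int_0^T \mathbb{E}(|Y_s - \mathcal{Y}_s|) \, ds \leq \frac{1}{T} \mathbb{E}(Y_0) \to 0,
\end{align*}
as $T \to \infty$, implying $\frac{1}{T} (Y_T - \mathcal{Y}_T) \xrightarrow{\mathbb{P}} 0$ and $\frac{1}{T^2} \int_0^T (Y_s - \mathcal{Y}_s) \, ds \xrightarrow{\mathbb{P}} 0$ as $T \to \infty$, i.e., the first
convergence in (6.4) and the second convergence in (6.5) for $(k, \ell) = (1, 0)$.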
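As in (3.23) in Barczy et al. [5], we have $\mathbb{E}(|X_s - \mathcal{X}_s|) \leq \mathbb{E}(|X_0|) + \sqrt{(\sigma_2^2 \mathbb{E}(Y_0) + \sigma_3^2) s}$ for all
$s \in \mathbb{R}_+$, hence
\[ \sup_{s \in [0, T]} \mathbb{E}(|X_s - \mathcal{X}_s|) = O(T^{1/2}) \quad \text{as } T \to \infty, \tag{6.7} \]
thus
\begin{align*}
\mathbb{E} \Bigl( \Bigl| \frac{1}{T} (X_T - \mathcal{X}_T) \Bigr| \Bigr) &= \frac{1}{T} O(T^{1/2}) \to 0, \\
\mathbb{E} \Bigl( \Bigl| \frac{1}{T^2} \int_0^T (X_s - \mathcal{X}_s) \, ds \Bigr| \Bigr) &\leq \frac{1}{T^2} \int_0^T \mathbb{E}(|X_s - \mathcal{X}_s|) \, ds = \frac{1}{T^2} \int_0^T O(T^{1/2}) \, ds = \frac{1}{T^2} O(T^{3/2}) \to 0,
\end{align*}
as $T \to \infty$, implying $\frac{1}{T} (X_T - \mathcal{X}_T) \xrightarrow{\mathbb{P}} 0$ and $\frac{1}{T^2} \int_0^T (X_s - \mathcal{X}_s) \, ds \xrightarrow{\mathbb{P}} 0$ as $T \to \infty$, i.e., the second
convergence in (6.4) and the second convergence in (6.5) for $(k, \ell) = (0, 1)$.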
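As in (3.25) in Barczy et al. [5], we have $\mathbb{E}[(Y_s - \mathcal{Y}_s)^2] \leq 2 \mathbb{E}(Y_0^2) + 2 s \sigma_1^2 \mathbb{E}(Y_0)$ for all $s \in \mathbb{R}_+$,
hence
\[ \sup_{s \in [0, T]} \mathbb{E}[(Y_s - \mathcal{Y}_s)^2] = O(T) \quad \text{as } T \to \infty. \tag{6.8} \]
By Proposition B.1, $\mathbb{E}(Y_s^2) = \mathbb{E}(Y_0^2) + (2a + \sigma_1^2) \bigl( \mathbb{E}(Y_0) s + a \frac{s^2}{2} \bigr)$ for all $s \in \mathbb{R}_+$, hence
\[ \sup_{s \in [0, T]} \mathbb{E}(Y_s^2) = O(T^2) \quad \text{as } T \to \infty, \tag{6.9} \]
and $\sup_{s \in [0, T]} \mathbb{E}(\mathcal{Y}_s^2) = O(T^2)$ as $T \to \infty$. We have
\begin{align*}
\mathbb{E}(|Y_s^2 - \mathcal{Y}_s^2|) = \mathbb{E}(|(Y_s - \mathcal{Y}_s)(Y_s + \mathcal{Y}_s)|) &\leq \sqrt{\mathbb{E}[(Y_s - \mathcal{Y}_s)^2] \, \mathbb{E}[(Y_s + \mathcal{Y}_s)^2]} \\
&\leq \sqrt{2 \, \mathbb{E}[(Y_s - \mathcal{Y}_s)^2] \bigl( \mathbb{E}(Y_s^2) + \mathbb{E}(\mathcal{Y}_s^2) \bigr)},
\end{align*}
yielding
\[ \sup_{s \in [0, T]} \mathbb{E}(|Y_s^2 - \mathcal{Y}_s^2|) = \sqrt{2 \, O(T) \bigl( O(T^2) + O(T^2) \bigr)} = O(T^{3/2}) \quad \text{as } T \to \infty, \]
thus
\[ \mathbb{E} \Bigl( \Bigl| \frac{1}{T^3} \int_0^T (Y_s^2 - \mathcal{Y}_s^2) \, ds \Bigr| \Bigr) \leq \frac{1}{T^3} \int_0^T \mathbb{E}(|Y_s^2 - \mathcal{Y}_s^2|) \, ds = \frac{1}{T^3} \int_0^T O(T^{3/2}) \, ds = \frac{1}{T^3} O(T^{5/2}) \to 0, \]
as $T \to \infty$, implying $\frac{1}{T^3} \int_0^T (Y_s^2 - \mathcal{Y}_s^2) \, ds \xrightarrow{\mathbb{P}} 0$ as $T \to \infty$, i.e., the second convergence in (6.5)
for $(k, \ell) = (2, 0)$.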
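In a similar way, $\mathbb{E}[(X_s - \mathcal{X}_s)^2] \leq 2 \mathbb{E}(X_0^2) + 2 s (\sigma_2^2 \mathbb{E}(Y_0) + \sigma_3^2)$ for all $s \in \mathbb{R}_+$, hence
\[ \sup_{s \in [0, T]} \mathbb{E}[(X_s - \mathcal{X}_s)^2] = O(T) \quad \text{as } T \to \infty. \tag{6.10} \]
By Proposition B.1, $\mathbb{E}(X_s^2) = \mathbb{E}(X_0^2) + 2\alpha \bigl( s \, \mathbb{E}(X_0) + \alpha \frac{s^2}{2} \bigr) + \sigma_2^2 \bigl( s \, \mathbb{E}(Y_0) + a \frac{s^2}{2} \bigr) + \sigma_3^2 s$, thus
$\sup_{s \in [0, T]} \mathbb{E}(X_s^2) = O(T^2)$ and $\sup_{s \in [0, T]} \mathbb{E}(\mathcal{X}_s^2) = O(T^2)$ as $T \to \infty$. We have
\[ \mathbb{E}(|X_s^2 - \mathcal{X}_s^2|) \leq \sqrt{2 \, \mathbb{E}[(X_s - \mathcal{X}_s)^2] \bigl( \mathbb{E}(X_s^2) + \mathbb{E}(\mathcal{X}_s^2) \bigr)}, \]
yielding
\[ \sup_{s \in [0, T]} \mathbb{E}(|X_s^2 - \mathcal{X}_s^2|) = \sqrt{2 \, O(T) \bigl( O(T^2) + O(T^2) \bigr)} = O(T^{3/2}) \quad \text{as } T \to \infty, \]
thus
\[ \mathbb{E} \Bigl( \Bigl| \frac{1}{T^3} \int_0^T (X_s^2 - \mathcal{X}_s^2) \, ds \Bigr| \Bigr) \leq \frac{1}{T^3} \int_0^T \mathbb{E}(|X_s^2 - \mathcal{X}_s^2|) \, ds = \frac{1}{T^3} \int_0^T O(T^{3/2}) \, ds = \frac{1}{T^3} O(T^{5/2}) \to 0, \]
as $T \to \infty$, implying $\frac{1}{T^3} \int_0^T (X_s^2 - \mathcal{X}_s^2) \, ds \xrightarrow{\mathbb{P}} 0$ as $T \to \infty$, i.e., the second convergence in (6.5)
for $(k, \ell) = (0, 2)$.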
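Further,
\begin{align*}
\mathbb{E}(|Y_s X_s - \mathcal{Y}_s \mathcal{X}_s|) &\leq \mathbb{E}(|Y_s - \mathcal{Y}_s| |X_s|) + \mathbb{E}(\mathcal{Y}_s |X_s - \mathcal{X}_s|) \\
&\leq \sqrt{\mathbb{E}[(Y_s - \mathcal{Y}_s)^2] \, \mathbb{E}(X_s^2)} + \sqrt{\mathbb{E}(\mathcal{Y}_s^2) \, \mathbb{E}[(X_s - \mathcal{X}_s)^2]}
\end{align*}
yields
\[ \sup_{s \in [0, T]} \mathbb{E}(|Y_s X_s - \mathcal{Y}_s \mathcal{X}_s|) = \sqrt{O(T) \, O(T^2)} + \sqrt{O(T^2) \, O(T)} = O(T^{3/2}) \quad \text{as } T \to \infty, \]
thus
\[ \mathbb{E} \Bigl( \Bigl| \frac{1}{T^3} \int_0^T (Y_s X_s - \mathcal{Y}_s \mathcal{X}_s) \, ds \Bigr| \Bigr) \leq \frac{1}{T^3} \int_0^T \mathbb{E}(|Y_s X_s - \mathcal{Y}_s \mathcal{X}_s|) \, ds = \frac{1}{T^3} \int_0^T O(T^{3/2}) \, ds = \frac{1}{T^3} O(T^{5/2}) \to 0, \]
as $T \to \infty$, implying $\frac{1}{T^3} \int_0^T (Y_s X_s - \mathcal{Y}_s \mathcal{X}_s) \, ds \xrightarrow{\mathbb{P}} 0$ as $T \to \infty$, i.e., the second convergence in
(6.5) for $(k, \ell) = (1, 1)$.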
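Using the Cauchy–Schwarz inequality, we obtain
\begin{align*}
\mathbb{E} \Bigl( \Bigl| \int_0^T X_s \, dY_s - \int_0^T \mathcal{X}_s \, d\mathcal{Y}_s \Bigr| \Bigr)
&\leq \mathbb{E} \Bigl( \Bigl| \int_0^T (X_s - \mathcal{X}_s) \, dY_s \Bigr| \Bigr) + \mathbb{E} \Bigl( \Bigl| \int_0^T \mathcal{X}_s \, d(Y_s - \mathcal{Y}_s) \Bigr| \Bigr) \\
&\leq \sqrt{E_1(T)} + \sqrt{E_2(T)}
\end{align*}
with
\[ E_1(T) := \mathbb{E} \Bigl( \Bigl( \int_0^T (X_s - \mathcal{X}_s) \, dY_s \Bigr)^2 \Bigr), \qquad E_2(T) := \mathbb{E} \Bigl( \Bigl( \int_0^T \mathcal{X}_s \, d(Y_s - \mathcal{Y}_s) \Bigr)^2 \Bigr). \]
Using $dY_s = a \, ds + \sigma_1 \sqrt{Y_s} \, dW_s$, we have
\[ E_1(T) = \mathbb{E} \Bigl( \Bigl( a \int_0^T (X_s - \mathcal{X}_s) \, ds + \sigma_1 \int_0^T (X_s - \mathcal{X}_s) \sqrt{Y_s} \, dW_s \Bigr)^2 \Bigr) \leq 2 a^2 E_{1,1}(T) + 2 \sigma_1^2 E_{1,2}(T) \]
with
\[ E_{1,1}(T) := \mathbb{E} \Bigl( \Bigl( \int_0^T (X_s - \mathcal{X}_s) \, ds \Bigr)^2 \Bigr), \qquad E_{1,2}(T) := \mathbb{E} \Bigl( \Bigl( \int_0^T (X_s - \mathcal{X}_s) \sqrt{Y_s} \, dW_s \Bigr)^2 \Bigr). \]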
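Applying (6.10), we obtain
\begin{align*}
E_{1,1}(T) &= \mathbb{E} \Bigl( \int_0^T \!\! \int_0^T (X_s - \mathcal{X}_s)(X_u - \mathcal{X}_u) \, ds \, du \Bigr) = \int_0^T \!\! \int_0^T \mathbb{E}[(X_s - \mathcal{X}_s)(X_u - \mathcal{X}_u)] \, ds \, du \\
&\leq \int_0^T \!\! \int_0^T \sqrt{\mathbb{E}[(X_s - \mathcal{X}_s)^2] \, \mathbb{E}[(X_u - \mathcal{X}_u)^2]} \, ds \, du = \int_0^T \!\! \int_0^T \sqrt{O(T) \, O(T)} \, ds \, du = O(T^3).
\end{align*}
Again by the Cauchy–Schwarz inequality, we obtain
\[ E_{1,2}(T) = \mathbb{E} \Bigl( \int_0^T (X_s - \mathcal{X}_s)^2 Y_s \, ds \Bigr) = \int_0^T \mathbb{E}[(X_s - \mathcal{X}_s)^2 Y_s] \, ds \leq \int_0^T \sqrt{\mathbb{E}[(X_s - \mathcal{X}_s)^4] \, \mathbb{E}(Y_s^2)} \, ds. \]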
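Using $X_t = X_0 + \alpha t + \sigma_2 \int_0^t \sqrt{Y_s} \, d\widetilde{W}_s + \sigma_3 L_t$ and $\mathcal{X}_t = \alpha t + \sigma_2 \int_0^t \sqrt{\mathcal{Y}_s} \, d\widetilde{W}_s$, we get
$X_t - \mathcal{X}_t = X_0 + \sigma_2 \int_0^t (\sqrt{Y_s} - \sqrt{\mathcal{Y}_s}) \, d\widetilde{W}_s + \sigma_3 L_t$, and, applying Minkowski inequality and a martingale
moment inequality in Karatzas and Shreve [24, 3.3.25], we obtain
\begin{align*}
\bigl( \mathbb{E}[(X_t - \mathcal{X}_t)^4] \bigr)^{1/4}
&\leq [\mathbb{E}(X_0^4)]^{1/4} + \sigma_2 \Bigl( \mathbb{E} \Bigl[ \Bigl( \int_0^t (\sqrt{Y_s} - \sqrt{\mathcal{Y}_s}) \, d\widetilde{W}_s \Bigr)^4 \Bigr] \Bigr)^{1/4} + \sigma_3 [\mathbb{E}(L_t^4)]^{1/4} \\
&\leq [\mathbb{E}(X_0^4)]^{1/4} + \sigma_2 \Bigl( (2 \cdot 3)^2 \, t \, \mathbb{E} \Bigl[ \int_0^t (\sqrt{Y_s} - \sqrt{\mathcal{Y}_s})^4 \, ds \Bigr] \Bigr)^{1/4} + \sigma_3 \sqrt[4]{3} \sqrt{t} \\
&\leq [\mathbb{E}(X_0^4)]^{1/4} + \sigma_2 \Bigl( 36 \, t \int_0^t \mathbb{E}[(Y_s - \mathcal{Y}_s)^2] \, ds \Bigr)^{1/4} + \sigma_3 \sqrt[4]{3} \sqrt{t}.
\end{align*}
Applying (6.8), we get
\[ \sup_{t \in [0, T]} \mathbb{E}[(X_t - \mathcal{X}_t)^4] = O(T^3) \quad \text{as } T \to \infty, \tag{6.11} \]
which, by (6.9), implies $E_{1,2}(T) = \int_0^T \sqrt{O(T^3) \, O(T^2)} \, ds = O(T^{7/2})$ as $T \to \infty$. Using $E_{1,1}(T) =
O(T^3)$ as $T \to \infty$, we conclude $E_1(T) = O(T^3) + O(T^{7/2}) = O(T^{7/2})$ as $T \to \infty$.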
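Using $dY_s = a \, ds + \sigma_1 \sqrt{Y_s} \, dW_s$ and $d\mathcal{Y}_s = a \, ds + \sigma_1 \sqrt{\mathcal{Y}_s} \, dW_s$, we obtain $d(Y_t - \mathcal{Y}_t) =
\sigma_1 (\sqrt{Y_t} - \sqrt{\mathcal{Y}_t}) \, dW_t$, thus
\begin{align*}
E_2(T) = \sigma_1^2 \, \mathbb{E} \Bigl( \int_0^T \mathcal{X}_s^2 (\sqrt{Y_s} - \sqrt{\mathcal{Y}_s})^2 \, ds \Bigr) &\leq \sigma_1^2 \int_0^T \mathbb{E}[\mathcal{X}_s^2 |Y_s - \mathcal{Y}_s|] \, ds \\
&\leq \sigma_1^2 \int_0^T \sqrt{\mathbb{E}(\mathcal{X}_s^4) \, \mathbb{E}[(Y_s - \mathcal{Y}_s)^2]} \, ds.
\end{align*}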
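Using $\mathcal{X}_t = \alpha t + \sigma_2 \int_0^t \sqrt{\mathcal{Y}_s} \, d\widetilde{W}_s$, we obtain
\begin{align*}
[\mathbb{E}(\mathcal{X}_t^4)]^{1/4}
&\leq |\alpha| t + \sigma_2 \Bigl( \mathbb{E} \Bigl[ \Bigl( \int_0^t \sqrt{\mathcal{Y}_s} \, d\widetilde{W}_s \Bigr)^4 \Bigr] \Bigr)^{1/4}
\leq |\alpha| t + \sigma_2 \Bigl( (2 \cdot 3)^2 \, t \, \mathbb{E} \Bigl[ \int_0^t \mathcal{Y}_s^2 \, ds \Bigr] \Bigr)^{1/4} \\
&= |\alpha| t + \sigma_2 \Bigl( 36 \, t \int_0^t a \Bigl( a + \frac{\sigma_1^2}{2} \Bigr) s^2 \, ds \Bigr)^{1/4}
= \Bigl( |\alpha| + \sigma_2 \sqrt[4]{6 a (2a + \sigma_1^2)} \Bigr) t,
\end{align*}
hence we conclude
\[ \sup_{s \in [0, T]} \mathbb{E}(\mathcal{X}_s^4) = O(T^4) \quad \text{as } T \to \infty. \tag{6.12} \]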
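Using (6.8), we obtain $E_2(T) = \int_0^T \sqrt{O(T^4) \, O(T)} \, ds = O(T^{7/2})$ as $T \to \infty$. Hence
\[ \mathbb{E} \Bigl( \Bigl| \frac{1}{T^2} \Bigl( \int_0^T X_s \, dY_s - \int_0^T \mathcal{X}_s \, d\mathcal{Y}_s \Bigr) \Bigr| \Bigr) \leq \frac{1}{T^2} \Bigl( \sqrt{E_1(T)} + \sqrt{E_2(T)} \Bigr) = \frac{1}{T^2} O(T^{7/4}) \to 0 \]
as $T \to \infty$, implying $\frac{1}{T^2} \bigl( \int_0^T X_s \, dY_s - \int_0^T \mathcal{X}_s \, d\mathcal{Y}_s \bigr) \xrightarrow{\mathbb{P}} 0$ as $T \to \infty$, i.e., the first convergence in
(6.5). Thus we conclude convergence (6.3).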
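Applying the first equation of (1.1) and using $b = 0$, we obtain
\[ \frac{\sigma_1}{T} \int_0^T Y_s^{1/2} \, dW_s = \frac{Y_T - Y_0 - aT}{T} \xrightarrow{\mathcal{D}} \mathcal{Y}_1 - a \quad \text{as } T \to \infty. \]
By Itô's formula and using $b = 0$,
\[ d(Y_t^2) = 2 Y_t \, dY_t + \sigma_1^2 Y_t \, dt = 2 Y_t \bigl( a \, dt + \sigma_1 Y_t^{1/2} \, dW_t \bigr) + \sigma_1^2 Y_t \, dt = (2a + \sigma_1^2) Y_t \, dt + 2 \sigma_1 Y_t^{3/2} \, dW_t, \]
hence
\[ Y_T^2 = Y_0^2 + (2a + \sigma_1^2) \int_0^T Y_s \, ds + 2 \sigma_1 \int_0^T Y_s^{3/2} \, dW_s. \]
Consequently,
\[ -\frac{\sigma_1}{T^2} \int_0^T Y_s^{3/2} \, dW_s = -\frac{Y_T^2 - Y_0^2 - (2a + \sigma_1^2) \int_0^T Y_s \, ds}{2 T^2} \xrightarrow{\mathcal{D}} -\frac{\mathcal{Y}_1^2 - (2a + \sigma_1^2) \int_0^1 \mathcal{Y}_s \, ds}{2} \]
as $T \to \infty$. In a similar way, applying the second equation of (1.1) and using $\beta = 0$ and $\gamma = 0$,
we obtain
\[ \frac{\sigma_2}{T} \int_0^T Y_s^{1/2} \, d\widetilde{W}_s + \frac{\sigma_3}{T} L_T = \frac{X_T - X_0 - \alpha T}{T} \xrightarrow{\mathcal{D}} \mathcal{X}_1 - \alpha \quad \text{as } T \to \infty. \]
By Itô's formula and using $\beta = 0$ and $\gamma = 0$,
\begin{align*}
d(Y_t X_t) &= Y_t \, dX_t + X_t \, dY_t + \varrho \sigma_1 \sigma_2 Y_t \, dt = Y_t \bigl( \alpha \, dt + \sigma_2 Y_t^{1/2} \, d\widetilde{W}_t + \sigma_3 \, dL_t \bigr) + X_t \, dY_t + \varrho \sigma_1 \sigma_2 Y_t \, dt \\
&= (\alpha + \varrho \sigma_1 \sigma_2) Y_t \, dt + \sigma_2 Y_t^{3/2} \, d\widetilde{W}_t + X_t \, dY_t + \sigma_3 Y_t \, dL_t,
\end{align*}
hence
\[ Y_T X_T = Y_0 X_0 + (\alpha + \varrho \sigma_1 \sigma_2) \int_0^T Y_s \, ds + \sigma_2 \int_0^T Y_s^{3/2} \, d\widetilde{W}_s + \int_0^T X_s \, dY_s + \sigma_3 \int_0^T Y_s \, dL_s. \]
Consequently,
\begin{align*}
-\frac{\sigma_2}{T^2} \int_0^T Y_s^{3/2} \, d\widetilde{W}_s - \frac{\sigma_3}{T^2} \int_0^T Y_s \, dL_s
&= -\frac{Y_T X_T - Y_0 X_0 - (\alpha + \varrho \sigma_1 \sigma_2) \int_0^T Y_s \, ds - \int_0^T X_s \, dY_s}{T^2} \\
&\xrightarrow{\mathcal{D}} -\mathcal{Y}_1 \mathcal{X}_1 + (\alpha + \varrho \sigma_1 \sigma_2) \int_0^1 \mathcal{Y}_s \, ds + \int_0^1 \mathcal{X}_s \, d\mathcal{Y}_s
\end{align*}
as $T \to \infty$. Again by Itô's formula and using $\beta = 0$ and $\gamma = 0$,
\[ d(X_t^2) = 2 X_t \, dX_t + (\sigma_2^2 Y_t + \sigma_3^2) \, dt = 2 X_t \bigl( \alpha \, dt + \sigma_2 Y_t^{1/2} \, d\widetilde{W}_t + \sigma_3 \, dL_t \bigr) + (\sigma_2^2 Y_t + \sigma_3^2) \, dt, \]
hence
\[ X_T^2 = X_0^2 + \int_0^T (2 \alpha X_s + \sigma_2^2 Y_s + \sigma_3^2) \, ds + 2 \sigma_2 \int_0^T Y_s^{1/2} X_s \, d\widetilde{W}_s + 2 \sigma_3 \int_0^T X_s \, dL_s. \]
Consequently,
\begin{align*}
-\frac{\sigma_2}{T^2} \int_0^T Y_s^{1/2} X_s \, d\widetilde{W}_s - \frac{\sigma_3}{T^2} \int_0^T X_s \, dL_s
&= -\frac{X_T^2 - X_0^2 - \int_0^T (2 \alpha X_s + \sigma_2^2 Y_s + \sigma_3^2) \, ds}{2 T^2} \\
&\xrightarrow{\mathcal{D}} -\frac{\mathcal{X}_1^2 - \int_0^1 (2 \alpha \mathcal{X}_s + \sigma_2^2 \mathcal{Y}_s + \sigma_3^2) \, ds}{2}
\end{align*}
as $T \to \infty$, and we conclude (6.1). ✷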
7 Asymptotic behavior of CLSE: supercritical case
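First we present an auxiliary lemma about the asymptotic behavior of $\mathbb{E}(X_t^2)$ as $t \to \infty$.
7.1 Lemma. Let us consider the two-factor affine diffusion model (1.1) with $a \in \mathbb{R}_+$, $b \in \mathbb{R}_{--}$,
$\alpha, \beta \in \mathbb{R}$, $\gamma \in (-\infty, b)$, $\sigma_1 \in \mathbb{R}_{++}$, $\sigma_2, \sigma_3 \in \mathbb{R}_+$ and $\varrho \in [-1, 1]$ with a random initial value $(\eta_0, \zeta_0)$
independent of $(W_t, B_t, L_t)_{t \in \mathbb{R}_+}$ satisfying $\mathbb{P}(\eta_0 \in \mathbb{R}_+) = 1$. Then $\sup_{t \in \mathbb{R}_+} e^{2\gamma t} \, \mathbb{E}(X_t^2) < \infty$.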
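Proof. By Proposition B.1,
\[ \sup_{t \in \mathbb{R}_+} e^{bt} \, \mathbb{E}(Y_t) = \sup_{t \in \mathbb{R}_+} \Bigl( \mathbb{E}(Y_0) + a \int_0^t e^{bu} \, du \Bigr) = \mathbb{E}(Y_0) + a \int_0^\infty e^{bu} \, du < \infty, \]
since $b < 0$. Moreover,
\begin{align*}
\sup_{t \in \mathbb{R}_+} e^{\gamma t} |\mathbb{E}(X_t)|
&= \sup_{t \in \mathbb{R}_+} \Bigl| \mathbb{E}(X_0) + \alpha \int_0^t e^{\gamma u} \, du - \beta \int_0^t e^{\gamma u} \, \mathbb{E}(Y_u) \, du \Bigr| \\
&\leq |\mathbb{E}(X_0)| + |\alpha| \int_0^\infty e^{\gamma u} \, du + |\beta| \sup_{u \in \mathbb{R}_+} e^{bu} \, \mathbb{E}(Y_u) \int_0^\infty e^{(\gamma - b) u} \, du < \infty,
\end{align*}
using $\gamma < 0$ and $\gamma - b < 0$. Again by Proposition B.1,
\begin{align*}
\sup_{t \in \mathbb{R}_+} e^{2bt} \, \mathbb{E}(Y_t^2)
&= \sup_{t \in \mathbb{R}_+} \Bigl( \mathbb{E}(Y_0^2) + (2a + \sigma_1^2) \int_0^t e^{2bu} \, \mathbb{E}(Y_u) \, du \Bigr) \\
&\leq \mathbb{E}(Y_0^2) + (2a + \sigma_1^2) \sup_{u \in \mathbb{R}_+} e^{bu} \, \mathbb{E}(Y_u) \int_0^\infty e^{bu} \, du < \infty,
\end{align*}
using $b < 0$. Hence
\begin{align*}
\sup_{t \in \mathbb{R}_+} e^{(b+\gamma)t} |\mathbb{E}(Y_t X_t)|
&= \sup_{t \in \mathbb{R}_+} \Bigl| \mathbb{E}(Y_0 X_0) + a \int_0^t e^{(b+\gamma)u} \, \mathbb{E}(X_u) \, du \\
&\qquad\qquad + (\alpha + \varrho \sigma_1 \sigma_2) \int_0^t e^{(b+\gamma)u} \, \mathbb{E}(Y_u) \, du - \beta \int_0^t e^{(b+\gamma)u} \, \mathbb{E}(Y_u^2) \, du \Bigr| \\
&\leq |\mathbb{E}(Y_0 X_0)| + a \sup_{u \in \mathbb{R}_+} e^{\gamma u} |\mathbb{E}(X_u)| \int_0^\infty e^{bu} \, du + (|\alpha| + |\varrho| \sigma_1 \sigma_2) \sup_{u \in \mathbb{R}_+} e^{bu} \, \mathbb{E}(Y_u) \int_0^\infty e^{\gamma u} \, du \\
&\qquad + |\beta| \sup_{u \in \mathbb{R}_+} e^{2bu} \, \mathbb{E}(Y_u^2) \int_0^\infty e^{(\gamma - b)u} \, du < \infty,
\end{align*}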
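using $b < 0$, $\gamma < 0$ and $\gamma - b < 0$. Consequently,
\begin{align*}
\sup_{t \in \mathbb{R}_+} e^{2\gamma t} \, \mathbb{E}(X_t^2)
&= \sup_{t \in \mathbb{R}_+} \Bigl( \mathbb{E}(X_0^2) + 2\alpha \int_0^t e^{2\gamma u} \, \mathbb{E}(X_u) \, du - 2\beta \int_0^t e^{2\gamma u} \, \mathbb{E}(Y_u X_u) \, du \\
&\qquad\qquad + \sigma_2^2 \int_0^t e^{2\gamma u} \, \mathbb{E}(Y_u) \, du + \sigma_3^2 \int_0^t e^{2\gamma u} \, du \Bigr) \\
&\leq \mathbb{E}(X_0^2) + 2|\alpha| \sup_{u \in \mathbb{R}_+} e^{\gamma u} |\mathbb{E}(X_u)| \int_0^\infty e^{\gamma u} \, du + 2|\beta| \sup_{u \in \mathbb{R}_+} e^{(b+\gamma)u} |\mathbb{E}(Y_u X_u)| \int_0^\infty e^{(\gamma - b)u} \, du \\
&\qquad + \sigma_2^2 \sup_{u \in \mathbb{R}_+} e^{bu} \, \mathbb{E}(Y_u) \int_0^\infty e^{(2\gamma - b)u} \, du + \sigma_3^2 \int_0^\infty e^{2\gamma u} \, du < \infty
\end{align*}
using $\gamma < 0$, $\gamma - b < 0$ and $2\gamma - b < 0$. ✷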

Next we present an auxiliary lemma about the asymptotic behavior of Xt as t → ∞.
independent of (Wt , Bt , Lt )t∈R+ satisfying P(η0 ∈ R+ ) = 1. Suppose that αβ ∈ R− . Then there
exists a random variable VX such that
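\[ e^{\gamma t} X_t \xrightarrow{\text{a.s.}} V_X \quad \text{as } t \to \infty \tag{7.1} \]
and, for each $k, \ell \in \mathbb{Z}_+$ with $k + \ell > 0$,
\[ e^{(kb + \ell\gamma)t} \int_0^t Y_u^k X_u^\ell \, du \xrightarrow{\text{a.s.}} -\frac{V_Y^k V_X^\ell}{kb + \ell\gamma} \quad \text{as } t \to \infty, \tag{7.2} \]
where $V_Y$ is given in (4.6). If, in addition, $\sigma_3 \in \mathbb{R}_{++}$ or $\bigl( a - \frac{\sigma_1^2}{2} \bigr) (1 - \varrho^2) \sigma_2^2 \in \mathbb{R}_{++}$, then the
distribution of the random variable $V_X$ is absolutely continuous. Particularly, $\mathbb{P}(V_X \neq 0) = 1$.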
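Proof. By (2.2),
\[ \mathbb{E}(X_t \mid \mathcal{F}_s) = \mathbb{E}(X_t \mid Y_s, X_s) = e^{-\gamma(t-s)} X_s + \int_s^t e^{-\gamma(t-u)} (\alpha - \beta Y_u) \, du \]
for all $s, t \in \mathbb{R}_+$ with $0 \leq s \leq t$. If $\alpha \in \mathbb{R}_+$ and $\beta \in \mathbb{R}_-$, then
\[ \mathbb{E}(e^{\gamma t} X_t \mid \mathcal{F}_s^{Y,X}) = e^{\gamma s} X_s + \int_s^t e^{\gamma u} (\alpha - \beta Y_u) \, du \geq e^{\gamma s} X_s \]
for all $s, t \in \mathbb{R}_+$ with $0 \leq s \leq t$, consequently, the process $(e^{\gamma t} X_t)_{t \in \mathbb{R}_+}$ is a submartingale with
respect to the filtration $(\mathcal{F}_t^{Y,X})_{t \in \mathbb{R}_+}$. If $\alpha \in \mathbb{R}_-$ and $\beta \in \mathbb{R}_+$, then
\[ \mathbb{E}(e^{\gamma t} X_t \mid \mathcal{F}_s^{Y,X}) = e^{\gamma s} X_s + \int_s^t e^{\gamma u} (\alpha - \beta Y_u) \, du \leq e^{\gamma s} X_s \]
for all $s, t \in \mathbb{R}_+$ with $0 \leq s \leq t$, consequently, the process $(e^{\gamma t} X_t)_{t \in \mathbb{R}_+}$ is a supermartingale with
respect to the filtration $(\mathcal{F}_t^{Y,X})_{t \in \mathbb{R}_+}$, hence the process $(-e^{\gamma t} X_t)_{t \in \mathbb{R}_+}$ is a submartingale with respect
to the filtration $(\mathcal{F}_t^{Y,X})_{t \in \mathbb{R}_+}$. In both cases, $\sup_{t \in \mathbb{R}_+} \mathbb{E}(|e^{\gamma t} X_t|^2) < \infty$, see Lemma 7.1. Hence, by
the submartingale convergence theorem, there exists a random variable $V_X$ such that (7.1) holds.
If $\omega \in \Omega$ is such that $\mathbb{R}_+ \ni t \mapsto (Y_t(\omega), X_t(\omega))$ is continuous and $(e^{bt} Y_t(\omega), e^{\gamma t} X_t(\omega)) \to
(V_Y(\omega), V_X(\omega))$ as $t \to \infty$, then, by the integral Kronecker Lemma 4.3 with $f(t) =
e^{(kb + \ell\gamma)t} Y_t(\omega)^k X_t(\omega)^\ell$ and $a(t) = e^{-(kb + \ell\gamma)t}$, $t \in \mathbb{R}_+$, we have
\[ \frac{1}{\int_0^t e^{-(kb + \ell\gamma)u} \, du} \int_0^t e^{-(kb + \ell\gamma)u} \bigl( e^{(kb + \ell\gamma)u} Y_u(\omega)^k X_u(\omega)^\ell \bigr) \, du \to V_Y(\omega)^k V_X(\omega)^\ell \quad \text{as } t \to \infty. \]
Here $\int_0^t e^{-(kb + \ell\gamma)u} \, du = -\frac{e^{-(kb + \ell\gamma)t} - 1}{kb + \ell\gamma}$, $t \in \mathbb{R}_+$, thus we conclude (7.2).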
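Now suppose that $\sigma_3 \in \mathbb{R}_{++}$ or $\bigl( a - \frac{\sigma_1^2}{2} \bigr) (1 - \varrho^2) \sigma_2^2 \in \mathbb{R}_{++}$. We are going to show that the
random variable $V_X$ is absolutely continuous. Put $Z_t := X_t - r Y_t$, $t \in \mathbb{R}_+$, with $r := \frac{\sigma_2 \varrho}{\sigma_1}$. Then
the process $(Y_t, Z_t)_{t \in \mathbb{R}_+}$ is an affine process satisfying
\[
\begin{cases}
dY_t = (a - b Y_t) \, dt + \sigma_1 \sqrt{Y_t} \, dW_t, \\
dZ_t = (A - B Y_t - \gamma Z_t) \, dt + \Sigma_2 \sqrt{Y_t} \, dB_t + \sigma_3 \, dL_t,
\end{cases}
\qquad t \in \mathbb{R}_+,
\]
where $A := \alpha - ra$, $B := \beta - r(b - \gamma)$ and $\Sigma_2 := \sigma_2 \sqrt{1 - \varrho^2}$, see Bolyog and Pap [11, Proposition
2.5]. We have
\[ e^{\gamma t} X_t = r e^{\gamma t} Y_t + e^{\gamma t} Z_t = r e^{\gamma t} Y_t + Z_0 + \int_0^t e^{\gamma u} (A - B Y_u) \, du + \Sigma_2 \int_0^t e^{\gamma u} \sqrt{Y_u} \, dB_u + \sigma_3 \int_0^t e^{\gamma u} \, dL_u, \]
where we used (2.2) with $s = 0$ and multiplied both sides by $e^{\gamma t}$.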
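The role of the constant $r$ can be checked directly from (1.1). The following worked step is only an illustration of why this choice removes the $dW$ term; it uses the definitions of $r$, $A$, $B$ and $\Sigma_2$ given above and nothing else:

```latex
\begin{align*}
dZ_t = dX_t - r\,dY_t
 &= \bigl[\alpha - ra - (\beta - rb)Y_t - \gamma X_t\bigr]\,dt
    + (\sigma_2\varrho - r\sigma_1)\sqrt{Y_t}\,dW_t
    + \sigma_2\sqrt{1-\varrho^2}\,\sqrt{Y_t}\,dB_t + \sigma_3\,dL_t \\
 &= \bigl[A - (\beta - rb + r\gamma)Y_t - \gamma Z_t\bigr]\,dt
    + \Sigma_2\sqrt{Y_t}\,dB_t + \sigma_3\,dL_t ,
\end{align*}
```

since $\sigma_2 \varrho - r \sigma_1 = 0$ for $r = \sigma_2 \varrho / \sigma_1$ and $X_t = Z_t + r Y_t$; the coefficient $\beta - rb + r\gamma$ equals $B = \beta - r(b - \gamma)$.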
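Thus the conditional distribution of
$e^{\gamma t} X_t$ given $(Y_u)_{u \in [0,t]}$ and $X_0$ is a normal distribution with mean $r e^{\gamma t} Y_t + Z_0 + \int_0^t e^{\gamma u} (A - B Y_u) \, du$
and with variance $\Sigma_2^2 \int_0^t e^{2\gamma u} Y_u \, du + \sigma_3^2 \int_0^t e^{2\gamma u} \, du$. Hence
\begin{align*}
&\mathbb{E} \bigl( e^{i \lambda e^{\gamma t} X_t} \bigm| (Y_u)_{u \in [0,t]}, X_0 \bigr) \\
&\quad = \exp \Bigl\{ i \lambda \Bigl( r e^{\gamma t} Y_t + Z_0 + \int_0^t e^{\gamma u} (A - B Y_u) \, du \Bigr) - \frac{\lambda^2}{2} \Bigl( \Sigma_2^2 \int_0^t e^{2\gamma u} Y_u \, du + \sigma_3^2 \int_0^t e^{2\gamma u} \, du \Bigr) \Bigr\}.
\end{align*}
Consequently,
\begin{align*}
\bigl| \mathbb{E} \bigl( e^{i \lambda e^{\gamma t} X_t} \bigr) \bigr|
&= \Bigl| \mathbb{E} \Bigl( \mathbb{E} \bigl( e^{i \lambda e^{\gamma t} X_t} \bigm| (Y_u)_{u \in [0,t]}, X_0 \bigr) \Bigr) \Bigr| \\
&= \Bigl| \mathbb{E} \Bigl( \exp \Bigl\{ i \lambda \Bigl( r e^{\gamma t} Y_t + Z_0 + \int_0^t e^{\gamma u} (A - B Y_u) \, du \Bigr) - \frac{\lambda^2}{2} \Bigl( \Sigma_2^2 \int_0^t e^{2\gamma u} Y_u \, du + \sigma_3^2 \int_0^t e^{2\gamma u} \, du \Bigr) \Bigr\} \Bigr) \Bigr| \\
&\leq \mathbb{E} \Bigl( \Bigl| \exp \Bigl\{ i \lambda \Bigl( r e^{\gamma t} Y_t + Z_0 + \int_0^t e^{\gamma u} (A - B Y_u) \, du \Bigr) - \frac{\lambda^2}{2} \Bigl( \Sigma_2^2 \int_0^t e^{2\gamma u} Y_u \, du + \sigma_3^2 \int_0^t e^{2\gamma u} \, du \Bigr) \Bigr\} \Bigr| \Bigr) \\
&= \mathbb{E} \Bigl( \exp \Bigl\{ -\frac{\lambda^2}{2} \Bigl( \Sigma_2^2 \int_0^t e^{2\gamma u} Y_u \, du + \sigma_3^2 \int_0^t e^{2\gamma u} \, du \Bigr) \Bigr\} \Bigr).
\end{align*}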
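Convergence (7.1) implies $e^{\gamma t} X_t \xrightarrow{\mathcal{D}} V_X$ as $t \to \infty$, hence, by the continuity theorem and by the
monotone convergence theorem,
\begin{align*}
\bigl| \mathbb{E} \bigl( e^{i \lambda V_X} \bigr) \bigr| = \lim_{t \to \infty} \bigl| \mathbb{E} \bigl( e^{i \lambda e^{\gamma t} X_t} \bigr) \bigr|
&\leq \lim_{t \to \infty} \mathbb{E} \Bigl( \exp \Bigl\{ -\frac{\lambda^2}{2} \Bigl( \Sigma_2^2 \int_0^t e^{2\gamma u} Y_u \, du + \sigma_3^2 \int_0^t e^{2\gamma u} \, du \Bigr) \Bigr\} \Bigr) \\
&= \mathbb{E} \Bigl( \exp \Bigl\{ -\frac{\lambda^2}{2} \Bigl( \Sigma_2^2 \int_0^\infty e^{2\gamma u} Y_u \, du + \sigma_3^2 \int_0^\infty e^{2\gamma u} \, du \Bigr) \Bigr\} \Bigr)
\end{align*}
for all $\lambda \in \mathbb{R}$. If $\sigma_3 \in \mathbb{R}_{++}$, then we have
\[ \bigl| \mathbb{E} \bigl( e^{i \lambda V_X} \bigr) \bigr| \leq \exp \Bigl\{ -\frac{\sigma_3^2}{4(-\gamma)} \lambda^2 \Bigr\} \]
for all $\lambda \in \mathbb{R}$, hence $\int_{-\infty}^\infty \bigl| \mathbb{E} \bigl( e^{i \lambda V_X} \bigr) \bigr| \, d\lambda < \infty$, implying absolute continuity of the distribution of
$V_X$.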
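If $\bigl( a - \frac{\sigma_1^2}{2} \bigr) (1 - \varrho^2) \sigma_2^2 \in \mathbb{R}_{++}$, then we have
\[ \bigl| \mathbb{E} \bigl( e^{i \lambda V_X} \bigr) \bigr| \leq \mathbb{E} \Bigl( \exp \Bigl\{ -\frac{\Sigma_2^2}{2} \lambda^2 \int_0^\infty e^{2\gamma u} Y_u \, du \Bigr\} \Bigr) \leq \mathbb{E} \Bigl( \exp \Bigl\{ -\frac{\Sigma_2^2 e^{4\gamma}}{2} \lambda^2 \int_1^2 Y_u \, du \Bigr\} \Bigr) \]
for all $\lambda \in \mathbb{R}$. Applying the comparison theorem (see, e.g., Karatzas and Shreve [24, 5.2.18]), we
obtain $\mathbb{P}(\mathcal{Y}_t \leq Y_t \text{ for all } t \in \mathbb{R}_+) = 1$, where $(\mathcal{Y}_t)_{t \in \mathbb{R}_+}$ is the unique strong solution of the SDE
\[ d\mathcal{Y}_t = (a - b \mathcal{Y}_t) \, dt + \sigma_1 \sqrt{\mathcal{Y}_t} \, dW_t, \qquad t \in [0, \infty), \]
with initial value $\mathcal{Y}_0 = 0$. Consequently, taking into account $\Sigma_2 = \sigma_2 \sqrt{1 - \varrho^2} > 0$, we obtain
\begin{align*}
\int_{-\infty}^\infty \bigl| \mathbb{E} \bigl( e^{i \lambda V_X} \bigr) \bigr| \, d\lambda
&\leq \int_{-\infty}^\infty \mathbb{E} \Bigl( \exp \Bigl\{ -\frac{\Sigma_2^2 e^{4\gamma}}{2} \lambda^2 \int_1^2 \mathcal{Y}_u \, du \Bigr\} \Bigr) \, d\lambda \\
&= \mathbb{E} \Bigl( \int_{-\infty}^\infty \exp \Bigl\{ -\frac{\Sigma_2^2 e^{4\gamma}}{2} \lambda^2 \int_1^2 \mathcal{Y}_u \, du \Bigr\} \, d\lambda \Bigr)
= \mathbb{E} \Biggl( \frac{\sqrt{2\pi}}{\Sigma_2 e^{2\gamma} \sqrt{\int_1^2 \mathcal{Y}_u \, du}} \Biggr)
= \frac{\sqrt{2\pi}}{\Sigma_2 e^{2\gamma}} \, \mathbb{E} \Biggl( \frac{1}{\sqrt{\int_1^2 \mathcal{Y}_u \, du}} \Biggr) < \infty,
\end{align*}
whenever
\[ \mathbb{E} \Biggl( \frac{1}{\sqrt{\int_1^2 \mathcal{Y}_u \, du}} \Biggr) < \infty. \tag{7.3} \]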
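By the Cauchy–Schwarz inequality, we have
\[ 1 = \Bigl( \int_1^2 \sqrt{\mathcal{Y}_u} \cdot \frac{1}{\sqrt{\mathcal{Y}_u}} \, du \Bigr)^2 \leq \int_1^2 \mathcal{Y}_u \, du \int_1^2 \frac{1}{\mathcal{Y}_u} \, du, \]
hence
\[ \mathbb{E} \Biggl( \frac{1}{\sqrt{\int_1^2 \mathcal{Y}_u \, du}} \Biggr) \leq \mathbb{E} \Biggl( \sqrt{\int_1^2 \frac{1}{\mathcal{Y}_u} \, du} \Biggr) \leq \sqrt{\mathbb{E} \Bigl( \int_1^2 \frac{1}{\mathcal{Y}_u} \, du \Bigr)} = \sqrt{\int_1^2 \mathbb{E} \Bigl( \frac{1}{\mathcal{Y}_u} \Bigr) \, du}. \]
For each $u \in \mathbb{R}_{++}$, we have $\mathcal{Y}_u \overset{\mathcal{D}}{=} c(u) \xi$, where $\xi$ has a chi-square distribution
with degrees of freedom $\frac{4a}{\sigma_1^2}$ and $c(u) := \frac{\sigma_1^2}{4} \int_0^u e^{-bv} \, dv = \frac{\sigma_1^2 (e^{-bu} - 1)}{4(-b)}$, see Proposition B.1. Hence
\[ \mathbb{E} \Bigl( \frac{1}{\mathcal{Y}_u} \Bigr) = \frac{1}{c(u)} \, \mathbb{E} \Bigl( \frac{1}{\xi} \Bigr), \]
where $\mathbb{E} \bigl( \frac{1}{\xi} \bigr) < \infty$, since the density of $\xi$ has the form
\[ \mathbb{R} \ni x \mapsto \frac{1}{2^{2a/\sigma_1^2} \, \Gamma \bigl( \frac{2a}{\sigma_1^2} \bigr)} \, x^{\frac{2a}{\sigma_1^2} - 1} e^{-\frac{x}{2}} \mathbb{1}_{\mathbb{R}_{++}}(x) \]
and the assumption $a - \frac{\sigma_1^2}{2} > 0$ yields $\frac{2a}{\sigma_1^2} - 1 > 0$. Consequently,
\[ \int_1^2 \mathbb{E} \Bigl( \frac{1}{\mathcal{Y}_u} \Bigr) \, du = \mathbb{E} \Bigl( \frac{1}{\xi} \Bigr) \int_1^2 \frac{1}{c(u)} \, du = \mathbb{E} \Bigl( \frac{1}{\xi} \Bigr) \int_1^2 \frac{4(-b)}{\sigma_1^2 (e^{-bu} - 1)} \, du < \infty, \]
thus we obtain (7.3), and hence $\int_{-\infty}^\infty \bigl| \mathbb{E} \bigl( e^{i \lambda V_X} \bigr) \bigr| \, d\lambda < \infty$, and we conclude absolute continuity of the
distribution of $V_X$. ✷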
7.3 Theorem. Let us consider the two-factor affine diffusion model (1.1) with a ∈ R+ , b ∈ R−− ,
independent of (Wt , Bt , Lt )t∈R+ satisfying P(η0 ∈ R+ ) = 1. Suppose that αβ ∈ R− . Suppose that
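$\sigma_3 \in \mathbb{R}_{++}$ or $\bigl( a - \frac{\sigma_1^2}{2} \bigr) (1 - \varrho^2) \sigma_2^2 \in \mathbb{R}_{++}$. Then
\[
\begin{pmatrix}
T e^{\frac{bT}{2}} (\widehat{a}_T - a) \\
e^{-\frac{bT}{2}} (\widehat{b}_T - b) \\
T e^{\frac{bT}{2}} (\widehat{\alpha}_T - \alpha) \\
e^{-\frac{bT}{2}} (\widehat{\beta}_T - \beta) \\
e^{\frac{(b - 2\gamma)T}{2}} (\widehat{\gamma}_T - \gamma)
\end{pmatrix}
\xrightarrow{\mathcal{D}} V^{-1} \eta \xi \tag{7.4}
\]
as $T \to \infty$ with
\[
V := \begin{pmatrix}
1 & \frac{V_Y}{b} & 0 & 0 & 0 \\
0 & -\frac{V_Y^2}{2b} & 0 & 0 & 0 \\
0 & 0 & 1 & \frac{V_Y}{b} & \frac{V_X}{\gamma} \\
0 & 0 & 0 & -\frac{V_Y^2}{2b} & -\frac{V_Y V_X}{b + \gamma} \\
0 & 0 & 0 & -\frac{V_Y V_X}{b + \gamma} & -\frac{V_X^2}{2\gamma}
\end{pmatrix},
\]
where $V_Y$ and $V_X$ are given in (4.6) and (7.1), respectively, $\eta$ is a $5 \times 5$ random matrix such that
\[
\eta \eta^\top = \begin{pmatrix}
-\frac{\sigma_1^2 V_Y}{b} & \frac{\sigma_1^2 V_Y^2}{2b} & -\frac{\varrho\sigma_1\sigma_2 V_Y}{b} & \frac{\varrho\sigma_1\sigma_2 V_Y^2}{2b} & \frac{\varrho\sigma_1\sigma_2 V_Y V_X}{b + \gamma} \\
\frac{\sigma_1^2 V_Y^2}{2b} & -\frac{\sigma_1^2 V_Y^3}{3b} & \frac{\varrho\sigma_1\sigma_2 V_Y^2}{2b} & -\frac{\varrho\sigma_1\sigma_2 V_Y^3}{3b} & -\frac{\varrho\sigma_1\sigma_2 V_Y^2 V_X}{2b + \gamma} \\
-\frac{\varrho\sigma_1\sigma_2 V_Y}{b} & \frac{\varrho\sigma_1\sigma_2 V_Y^2}{2b} & -\frac{\sigma_2^2 V_Y}{b} & \frac{\sigma_2^2 V_Y^2}{2b} & \frac{\sigma_2^2 V_Y V_X}{b + \gamma} \\
\frac{\varrho\sigma_1\sigma_2 V_Y^2}{2b} & -\frac{\varrho\sigma_1\sigma_2 V_Y^3}{3b} & \frac{\sigma_2^2 V_Y^2}{2b} & -\frac{\sigma_2^2 V_Y^3}{3b} & -\frac{\sigma_2^2 V_Y^2 V_X}{2b + \gamma} \\
\frac{\varrho\sigma_1\sigma_2 V_Y V_X}{b + \gamma} & -\frac{\varrho\sigma_1\sigma_2 V_Y^2 V_X}{2b + \gamma} & \frac{\sigma_2^2 V_Y V_X}{b + \gamma} & -\frac{\sigma_2^2 V_Y^2 V_X}{2b + \gamma} & -\frac{\sigma_2^2 V_Y V_X^2}{b + 2\gamma}
\end{pmatrix},
\]
and $\xi$ is a 5-dimensional standard normally distributed random vector independent of $(V_Y, V_X)$.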
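Proof. We have
\[
\begin{pmatrix}
T e^{\frac{bT}{2}} (\widehat{a}_T - a) \\
e^{-\frac{bT}{2}} (\widehat{b}_T - b) \\
T e^{\frac{bT}{2}} (\widehat{\alpha}_T - \alpha) \\
e^{-\frac{bT}{2}} (\widehat{\beta}_T - \beta) \\
e^{\frac{(b - 2\gamma)T}{2}} (\widehat{\gamma}_T - \gamma)
\end{pmatrix}
= \operatorname{diag} \Bigl( T e^{\frac{bT}{2}}, e^{-\frac{bT}{2}}, T e^{\frac{bT}{2}}, e^{-\frac{bT}{2}}, e^{\frac{(b - 2\gamma)T}{2}} \Bigr) (\widehat{\theta}_T - \theta),
\]
where, by (3.7),
\[
\widehat{\theta}_T - \theta = G_T^{-1} h_T = \begin{pmatrix} G_T^{(1)} & 0 \\ 0 & G_T^{(2)} \end{pmatrix}^{-1} \begin{pmatrix} h_T^{(1)} \\ h_T^{(2)} \end{pmatrix}.
\]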
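We are going to apply Theorem D.2 for the continuous local martingale $(h_T)_{T \in \mathbb{R}_+}$ with quadratic
variation process $\langle h \rangle_T = \widetilde{G}_T$, $T \in \mathbb{R}_+$ (introduced in the proof of Theorem 5.1). With scaling
matrices
\[ Q(T) := \operatorname{diag} \Bigl( e^{\frac{bT}{2}}, e^{\frac{3bT}{2}}, e^{\frac{bT}{2}}, e^{\frac{3bT}{2}}, e^{\frac{(b + 2\gamma)T}{2}} \Bigr), \qquad T \in \mathbb{R}_{++}, \]
by (7.2), we have
\[
Q(T) \langle h \rangle_T Q(T)^\top \xrightarrow{\text{a.s.}}
\begin{pmatrix}
-\frac{\sigma_1^2 V_Y}{b} & \frac{\sigma_1^2 V_Y^2}{2b} & -\frac{\varrho\sigma_1\sigma_2 V_Y}{b} & \frac{\varrho\sigma_1\sigma_2 V_Y^2}{2b} & \frac{\varrho\sigma_1\sigma_2 V_Y V_X}{b + \gamma} \\
\frac{\sigma_1^2 V_Y^2}{2b} & -\frac{\sigma_1^2 V_Y^3}{3b} & \frac{\varrho\sigma_1\sigma_2 V_Y^2}{2b} & -\frac{\varrho\sigma_1\sigma_2 V_Y^3}{3b} & -\frac{\varrho\sigma_1\sigma_2 V_Y^2 V_X}{2b + \gamma} \\
-\frac{\varrho\sigma_1\sigma_2 V_Y}{b} & \frac{\varrho\sigma_1\sigma_2 V_Y^2}{2b} & -\frac{\sigma_2^2 V_Y}{b} & \frac{\sigma_2^2 V_Y^2}{2b} & \frac{\sigma_2^2 V_Y V_X}{b + \gamma} \\
\frac{\varrho\sigma_1\sigma_2 V_Y^2}{2b} & -\frac{\varrho\sigma_1\sigma_2 V_Y^3}{3b} & \frac{\sigma_2^2 V_Y^2}{2b} & -\frac{\sigma_2^2 V_Y^3}{3b} & -\frac{\sigma_2^2 V_Y^2 V_X}{2b + \gamma} \\
\frac{\varrho\sigma_1\sigma_2 V_Y V_X}{b + \gamma} & -\frac{\varrho\sigma_1\sigma_2 V_Y^2 V_X}{2b + \gamma} & \frac{\sigma_2^2 V_Y V_X}{b + \gamma} & -\frac{\sigma_2^2 V_Y^2 V_X}{2b + \gamma} & -\frac{\sigma_2^2 V_Y V_X^2}{b + 2\gamma}
\end{pmatrix}
= \eta \eta^\top
\]
as $T \to \infty$. Hence by Theorem D.2, for each random matrix $A$ defined on $(\Omega, \mathcal{F}, \mathbb{P})$, we obtain
\[ (Q(T) h_T, A) \xrightarrow{\mathcal{D}} (\eta \xi, A) \quad \text{as } T \to \infty, \tag{7.5} \]
where $\xi$ is a 5-dimensional standard normally distributed random vector independent of $(\eta, A)$.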
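The aim of the following discussion is to include appropriate scaling matrices for $G_T$. The matrices
$G_T^{(1)}$ and $G_T^{(2)}$ can be written in the form
\[
G_T^{(1)} = \operatorname{diag} \bigl( T^{\frac{1}{2}}, e^{-bT} \bigr)
\begin{pmatrix}
1 & -\frac{e^{bT}}{\sqrt{T}} \int_0^T Y_s \, ds \\
-\frac{e^{bT}}{\sqrt{T}} \int_0^T Y_s \, ds & e^{2bT} \int_0^T Y_s^2 \, ds
\end{pmatrix}
\operatorname{diag} \bigl( T^{\frac{1}{2}}, e^{-bT} \bigr)
\]
and
\[
G_T^{(2)} = \operatorname{diag} \bigl( T^{\frac{1}{2}}, e^{-bT}, e^{-\gamma T} \bigr)
\begin{pmatrix}
1 & -\frac{e^{bT}}{\sqrt{T}} \int_0^T Y_s \, ds & -\frac{e^{\gamma T}}{\sqrt{T}} \int_0^T X_s \, ds \\
-\frac{e^{bT}}{\sqrt{T}} \int_0^T Y_s \, ds & e^{2bT} \int_0^T Y_s^2 \, ds & e^{(b + \gamma)T} \int_0^T Y_s X_s \, ds \\
-\frac{e^{\gamma T}}{\sqrt{T}} \int_0^T X_s \, ds & e^{(b + \gamma)T} \int_0^T Y_s X_s \, ds & e^{2\gamma T} \int_0^T X_s^2 \, ds
\end{pmatrix}
\operatorname{diag} \bigl( T^{\frac{1}{2}}, e^{-bT}, e^{-\gamma T} \bigr),
\]
hence the matrices $(G_T^{(1)})^{-1}$ and $(G_T^{(2)})^{-1}$ can be written in the form
\[
(G_T^{(1)})^{-1} = \operatorname{diag} \bigl( T^{-\frac{1}{2}}, e^{bT} \bigr)
\begin{pmatrix}
1 & -\frac{e^{bT}}{\sqrt{T}} \int_0^T Y_s \, ds \\
-\frac{e^{bT}}{\sqrt{T}} \int_0^T Y_s \, ds & e^{2bT} \int_0^T Y_s^2 \, ds
\end{pmatrix}^{-1}
\operatorname{diag} \bigl( T^{-\frac{1}{2}}, e^{bT} \bigr)
\]
and
\[
(G_T^{(2)})^{-1} = \operatorname{diag} \bigl( T^{-\frac{1}{2}}, e^{bT}, e^{\gamma T} \bigr)
\begin{pmatrix}
1 & -\frac{e^{bT}}{\sqrt{T}} \int_0^T Y_s \, ds & -\frac{e^{\gamma T}}{\sqrt{T}} \int_0^T X_s \, ds \\
-\frac{e^{bT}}{\sqrt{T}} \int_0^T Y_s \, ds & e^{2bT} \int_0^T Y_s^2 \, ds & e^{(b + \gamma)T} \int_0^T Y_s X_s \, ds \\
-\frac{e^{\gamma T}}{\sqrt{T}} \int_0^T X_s \, ds & e^{(b + \gamma)T} \int_0^T Y_s X_s \, ds & e^{2\gamma T} \int_0^T X_s^2 \, ds
\end{pmatrix}^{-1}
\operatorname{diag} \bigl( T^{-\frac{1}{2}}, e^{bT}, e^{\gamma T} \bigr).
\]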
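We have
\begin{align*}
&\operatorname{diag} \Bigl( T e^{\frac{bT}{2}}, e^{-\frac{bT}{2}}, T e^{\frac{bT}{2}}, e^{-\frac{bT}{2}}, e^{\frac{(b - 2\gamma)T}{2}} \Bigr) \operatorname{diag} \bigl( T^{-\frac{1}{2}}, e^{bT}, T^{-\frac{1}{2}}, e^{bT}, e^{\gamma T} \bigr) \\
&\qquad = \operatorname{diag} \Bigl( T^{\frac{1}{2}} e^{\frac{bT}{2}}, e^{\frac{bT}{2}}, T^{\frac{1}{2}} e^{\frac{bT}{2}}, e^{\frac{bT}{2}}, e^{\frac{bT}{2}} \Bigr)
\end{align*}
and
\begin{align*}
&\operatorname{diag} \bigl( T^{-\frac{1}{2}}, e^{bT}, T^{-\frac{1}{2}}, e^{bT}, e^{\gamma T} \bigr) Q(T)^{-1} \\
&\qquad = \operatorname{diag} \bigl( T^{-\frac{1}{2}}, e^{bT}, T^{-\frac{1}{2}}, e^{bT}, e^{\gamma T} \bigr) \operatorname{diag} \Bigl( e^{-\frac{bT}{2}}, e^{-\frac{3bT}{2}}, e^{-\frac{bT}{2}}, e^{-\frac{3bT}{2}}, e^{-\frac{(b + 2\gamma)T}{2}} \Bigr) \\
&\qquad = \operatorname{diag} \Bigl( T^{-\frac{1}{2}} e^{-\frac{bT}{2}}, e^{-\frac{bT}{2}}, T^{-\frac{1}{2}} e^{-\frac{bT}{2}}, e^{-\frac{bT}{2}}, e^{-\frac{bT}{2}} \Bigr).
\end{align*}
Moreover,
\begin{align*}
&\operatorname{diag} \Bigl( T^{\frac{1}{2}} e^{\frac{bT}{2}}, e^{\frac{bT}{2}} \Bigr)
\begin{pmatrix}
1 & -\frac{e^{bT}}{\sqrt{T}} \int_0^T Y_s \, ds \\
-\frac{e^{bT}}{\sqrt{T}} \int_0^T Y_s \, ds & e^{2bT} \int_0^T Y_s^2 \, ds
\end{pmatrix}
\operatorname{diag} \Bigl( T^{-\frac{1}{2}} e^{-\frac{bT}{2}}, e^{-\frac{bT}{2}} \Bigr) \\
&\qquad = \begin{pmatrix}
1 & -e^{bT} \int_0^T Y_s \, ds \\
-\frac{e^{bT}}{T} \int_0^T Y_s \, ds & e^{2bT} \int_0^T Y_s^2 \, ds
\end{pmatrix} =: J_T^{(1)}
\end{align*}
and
\begin{align*}
&\operatorname{diag} \Bigl( T^{\frac{1}{2}} e^{\frac{bT}{2}}, e^{\frac{bT}{2}}, e^{\frac{bT}{2}} \Bigr)
\begin{pmatrix}
1 & -\frac{e^{bT}}{\sqrt{T}} \int_0^T Y_s \, ds & -\frac{e^{\gamma T}}{\sqrt{T}} \int_0^T X_s \, ds \\
-\frac{e^{bT}}{\sqrt{T}} \int_0^T Y_s \, ds & e^{2bT} \int_0^T Y_s^2 \, ds & e^{(b + \gamma)T} \int_0^T Y_s X_s \, ds \\
-\frac{e^{\gamma T}}{\sqrt{T}} \int_0^T X_s \, ds & e^{(b + \gamma)T} \int_0^T Y_s X_s \, ds & e^{2\gamma T} \int_0^T X_s^2 \, ds
\end{pmatrix} \\
&\qquad\qquad \times \operatorname{diag} \Bigl( T^{-\frac{1}{2}} e^{-\frac{bT}{2}}, e^{-\frac{bT}{2}}, e^{-\frac{bT}{2}} \Bigr) \\
&\qquad = \begin{pmatrix}
1 & -e^{bT} \int_0^T Y_s \, ds & -e^{\gamma T} \int_0^T X_s \, ds \\
-\frac{e^{bT}}{T} \int_0^T Y_s \, ds & e^{2bT} \int_0^T Y_s^2 \, ds & e^{(b + \gamma)T} \int_0^T Y_s X_s \, ds \\
-\frac{e^{\gamma T}}{T} \int_0^T X_s \, ds & e^{(b + \gamma)T} \int_0^T Y_s X_s \, ds & e^{2\gamma T} \int_0^T X_s^2 \, ds
\end{pmatrix} =: J_T^{(2)}.
\end{align*}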
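Consequently,
\[
\begin{pmatrix}
T e^{\frac{bT}{2}} (\widehat{a}_T - a) \\
e^{-\frac{bT}{2}} (\widehat{b}_T - b) \\
T e^{\frac{bT}{2}} (\widehat{\alpha}_T - \alpha) \\
e^{-\frac{bT}{2}} (\widehat{\beta}_T - \beta) \\
e^{\frac{(b - 2\gamma)T}{2}} (\widehat{\gamma}_T - \gamma)
\end{pmatrix}
= \Bigl( \operatorname{diag} \bigl( J_T^{(1)}, J_T^{(2)} \bigr) \Bigr)^{-1} Q(T) h_T,
\]
where, by Lemma 7.2,
\[ \operatorname{diag} \bigl( J_T^{(1)}, J_T^{(2)} \bigr) \xrightarrow{\mathbb{P}} V \quad \text{as } T \to \infty. \tag{7.6} \]
By (7.5) with $A = V$, by (7.6) and by Theorem 2.7 (iv) of van der Vaart [28], we obtain
\[ \Bigl( Q(T) h_T, \operatorname{diag} \bigl( J_T^{(1)}, J_T^{(2)} \bigr) \Bigr) \xrightarrow{\mathcal{D}} (\eta \xi, V) \quad \text{as } T \to \infty. \]
The random matrix $V$ is invertible almost surely, since
\[ \det(V) = -\frac{(b - \gamma)^2 V_Y^4 V_X^2}{8 (b + \gamma)^2 b^2 \gamma} > 0 \]
almost surely by Lemma 7.2. Consequently, $\bigl( \operatorname{diag} \bigl( J_T^{(1)}, J_T^{(2)} \bigr) \bigr)^{-1} Q(T) h_T \xrightarrow{\mathcal{D}} V^{-1} \eta \xi$ as $T \to \infty$. ✷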
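The value of $\det(V)$ can be verified by a short expansion. The following worked computation is only a check of the stated formula, using the block-triangular structure of $V$ from Theorem 7.3:

```latex
\begin{align*}
\det(V)
 &= 1\cdot\Bigl(-\frac{V_Y^2}{2b}\Bigr)\cdot 1\cdot
    \det\begin{pmatrix}
      -\frac{V_Y^2}{2b} & -\frac{V_YV_X}{b+\gamma}\\[0.5ex]
      -\frac{V_YV_X}{b+\gamma} & -\frac{V_X^2}{2\gamma}
    \end{pmatrix}
  = -\frac{V_Y^2}{2b}\,V_Y^2V_X^2\Bigl(\frac{1}{4b\gamma}-\frac{1}{(b+\gamma)^2}\Bigr)\\
 &= -\frac{V_Y^4V_X^2}{2b}\cdot\frac{(b+\gamma)^2-4b\gamma}{4b\gamma\,(b+\gamma)^2}
  = -\frac{(b-\gamma)^2\,V_Y^4\,V_X^2}{8\,(b+\gamma)^2\,b^2\,\gamma}.
\end{align*}
```

Since $\gamma < b < 0$, the factor $-1/\gamma$ is positive and $b \neq \gamma$, so the determinant is strictly positive whenever $V_Y \neq 0$ and $V_X \neq 0$, which holds almost surely by (4.6) and Lemma 7.2.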
Appendix
A Stationarity and exponential ergodicity
The following result states the existence of a unique stationary distribution of the affine diffusion
process given by the SDE (1.1), see Bolyog and Pap [11, Theorem 3.1]. Let C− := {z ∈ C : Re(z) 6 0}.
A.1 Theorem. Let us consider the two-factor affine diffusion model (1.1) with a ∈ R+ , b ∈ R++ ,
α, β ∈ R, γ ∈ R++ , σ1 , σ2 , σ3 ∈ R+ , ̺ ∈ [−1, 1], and with a random initial value (η0 , ζ0 )
independent of (Wt , Bt , Lt )t∈R+ satisfying P(η0 ∈ R+ ) = 1. Then
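(i) $(Y_t, X_t) \xrightarrow{\mathcal{D}} (Y_\infty, X_\infty)$ as $t \to \infty$, and we have
\[ \mathbb{E} \bigl( e^{u_1 Y_\infty + i \lambda_2 X_\infty} \bigr) = \exp \Bigl\{ a \int_0^\infty \kappa_s(u_1, \lambda_2) \, ds + i \frac{\alpha}{\gamma} \lambda_2 - \frac{\sigma_3^2}{4\gamma} \lambda_2^2 \Bigr\} \tag{A.1} \]
for $(u_1, \lambda_2) \in \mathbb{C}_- \times \mathbb{R}$, where $\kappa_t(u_1, \lambda_2)$, $t \in \mathbb{R}_+$, is the unique solution of the (deterministic)
differential equation
\[
\begin{cases}
\dfrac{\partial \kappa_t}{\partial t}(u_1, \lambda_2) = -b \kappa_t(u_1, \lambda_2) - i \beta e^{-\gamma t} \lambda_2 + \frac{1}{2} \sigma_1^2 \kappa_t(u_1, \lambda_2)^2 \\
\phantom{\dfrac{\partial \kappa_t}{\partial t}(u_1, \lambda_2) = {}} + i \varrho \sigma_1 \sigma_2 e^{-\gamma t} \lambda_2 \kappa_t(u_1, \lambda_2) - \frac{1}{2} \sigma_2^2 e^{-2\gamma t} \lambda_2^2, \\
\kappa_0(u_1, \lambda_2) = u_1;
\end{cases}
\tag{A.2}
\]
(ii) supposing that the random initial value $(\eta_0, \zeta_0)$ has the same distribution as $(Y_\infty, X_\infty)$ given
in part (i), $(Y_t, X_t)_{t \in \mathbb{R}_+}$ is strictly stationary.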
In the subcritical case, the following result states the exponential ergodicity and a strong law of
large numbers for the process (Yt , Xt )t∈R+ , see Bolyog and Pap [11, Theorem 4.1].
A.2 Theorem. Let us consider the two-factor affine diffusion model (1.1) with a, b ∈ R++ , α, β ∈ R,
of (Wt , Bt , Lt )t∈R+ satisfying P(η0 ∈ R+ ) = 1. Suppose that (1 − ̺2 )σ22 + σ32 > 0. Then the process
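$(Y_t, X_t)_{t \in \mathbb{R}_+}$ is exponentially ergodic, namely, there exist $\delta \in \mathbb{R}_{++}$, $B \in \mathbb{R}_{++}$ and $\kappa \in \mathbb{R}_{++}$, such
that
\[ \sup_{|g| \leq V + 1} \Bigl| \mathbb{E} \bigl( g(Y_t, X_t) \bigm| (Y_0, X_0) = (y_0, x_0) \bigr) - \mathbb{E}(g(Y_\infty, X_\infty)) \Bigr| \leq B (V(y_0, x_0) + 1) e^{-\delta t} \tag{A.3} \]
for all $t \in \mathbb{R}_+$ and $(y_0, x_0) \in \mathbb{R}_+ \times \mathbb{R}$, where the supremum is running for Borel measurable functions
$g : \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}$,
\[ V(y, x) := y^2 + \kappa x^2, \qquad (y, x) \in \mathbb{R}_+ \times \mathbb{R}, \tag{A.4} \]
and the distribution of $(Y_\infty, X_\infty)$ is given by (A.1) and (A.2). Moreover, for all Borel measurable
functions $f : \mathbb{R}^2 \to \mathbb{R}$ with $\mathbb{E}(|f(Y_\infty, X_\infty)|) < \infty$, we have
\[ \mathbb{P} \Bigl( \lim_{T \to \infty} \frac{1}{T} \int_0^T f(Y_s, X_s) \, ds = \mathbb{E}(f(Y_\infty, X_\infty)) \Bigr) = 1. \tag{A.5} \]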
B Moments
The next proposition gives a recursive formula for the moments of the process (Yt , Xt )t∈R+ .
B.1 Proposition. Let us consider the two-factor affine diffusion model (1.1) with a ∈ R+ ,
b, α, β, γ ∈ R, σ1 , σ2 , σ3 ∈ R+ , ̺ ∈ [−1, 1]. Suppose that E(Y0n |X0 |p ) < ∞ for some n, p ∈ Z+ .
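Then for each $t \in \mathbb{R}_+$, we have $\mathbb{E}(Y_t^k |X_t|^\ell) < \infty$ for all $k \in \{0, \ldots, n\}$ and $\ell \in \{0, \ldots, p\}$, and
the recursion
\begin{align*}
\mathbb{E}(Y_t^k X_t^\ell) = {} & e^{-(kb + \ell\gamma)t} \, \mathbb{E}(Y_0^k X_0^\ell) + \Bigl( ka + \frac{1}{2} k(k - 1) \sigma_1^2 \Bigr) \int_0^t e^{-(kb + \ell\gamma)(t - u)} \, \mathbb{E}(Y_u^{k-1} X_u^\ell) \, du \\
& + \ell (\alpha + k \varrho \sigma_1 \sigma_2) \int_0^t e^{-(kb + \ell\gamma)(t - u)} \, \mathbb{E}(Y_u^k X_u^{\ell - 1}) \, du - \ell \beta \int_0^t e^{-(kb + \ell\gamma)(t - u)} \, \mathbb{E}(Y_u^{k+1} X_u^{\ell - 1}) \, du \\
& + \frac{1}{2} \ell (\ell - 1) \sigma_2^2 \int_0^t e^{-(kb + \ell\gamma)(t - u)} \, \mathbb{E}(Y_u^{k+1} X_u^{\ell - 2}) \, du \\
& + \frac{1}{2} \ell (\ell - 1) \sigma_3^2 \int_0^t e^{-(kb + \ell\gamma)(t - u)} \, \mathbb{E}(Y_u^k X_u^{\ell - 2}) \, du
\end{align*}
holds for all $t \in \mathbb{R}_+$, where $\mathbb{E}(Y_t^i X_t^j) := 0$ if $i, j \in \mathbb{Z}$ with $i < 0$ or $j < 0$. Especially,
\begin{align*}
\mathbb{E}(Y_t) &= e^{-bt} \, \mathbb{E}(Y_0) + a \int_0^t e^{-b(t - u)} \, du, \\
\mathbb{E}(X_t) &= e^{-\gamma t} \, \mathbb{E}(X_0) + \alpha \int_0^t e^{-\gamma(t - u)} \, du - \beta \int_0^t e^{-\gamma(t - u)} \, \mathbb{E}(Y_u) \, du, \\
\mathbb{E}(Y_t^2) &= e^{-2bt} \, \mathbb{E}(Y_0^2) + (2a + \sigma_1^2) \int_0^t e^{-2b(t - u)} \, \mathbb{E}(Y_u) \, du, \\
\mathbb{E}(Y_t X_t) &= e^{-(b + \gamma)t} \, \mathbb{E}(Y_0 X_0) + a \int_0^t e^{-(b + \gamma)(t - u)} \, \mathbb{E}(X_u) \, du \\
&\quad + (\alpha + \varrho \sigma_1 \sigma_2) \int_0^t e^{-(b + \gamma)(t - u)} \, \mathbb{E}(Y_u) \, du - \beta \int_0^t e^{-(b + \gamma)(t - u)} \, \mathbb{E}(Y_u^2) \, du, \\
\mathbb{E}(X_t^2) &= e^{-2\gamma t} \, \mathbb{E}(X_0^2) + 2\alpha \int_0^t e^{-2\gamma(t - u)} \, \mathbb{E}(X_u) \, du - 2\beta \int_0^t e^{-2\gamma(t - u)} \, \mathbb{E}(Y_u X_u) \, du \\
&\quad + \sigma_2^2 \int_0^t e^{-2\gamma(t - u)} \, \mathbb{E}(Y_u) \, du + \sigma_3^2 \int_0^t e^{-2\gamma(t - u)} \, du.
\end{align*}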
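If $\sigma_1 > 0$ and $Y_0 = y_0$, then the Laplace transform of $Y_t$, $t \in \mathbb{R}_{++}$, takes the form
\[ \mathbb{E}(e^{-\lambda Y_t}) = \Bigl( 1 + \frac{\sigma_1^2}{2} \lambda \int_0^t e^{-bu} \, du \Bigr)^{-2a/\sigma_1^2} \exp \Biggl\{ -\frac{\lambda e^{-bt} y_0}{1 + \frac{\sigma_1^2}{2} \lambda \int_0^t e^{-bu} \, du} \Biggr\}, \qquad \lambda \in \mathbb{R}_+, \tag{B.1} \]
i.e., $Y_t$ has a non-central chi-square distribution up to a multiplicative constant $\frac{\sigma_1^2}{4} \int_0^t e^{-bu} \, du$, with
degrees of freedom $\frac{4a}{\sigma_1^2}$ and with non-centrality parameter $\frac{4 e^{-bt} y_0}{\sigma_1^2 \int_0^t e^{-bu} \, du}$.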
If σ1 > 0 and (1 − ̺2 )σ22 + σ32 > 0, then for each t ∈ R++ , the distribution of (Yt , Xt ) is
absolutely continuous.
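Proof. It is sufficient to prove the recursion in the case when $(Y_0, X_0) = (y_0, x_0)$ with an arbitrary
$(y_0, x_0) \in \mathbb{R}_{++} \times \mathbb{R}$, since then, for arbitrary initial values with $\mathbb{E}(Y_0^n |X_0|^p) < \infty$, the recursion
follows by the law of total expectation. One can show that
\[ \int_0^t \mathbb{E}(Y_u^k X_u^{2\ell}) \, du < \infty \qquad \text{for all } t \in \mathbb{R}_+ \text{ and } k, \ell \in \mathbb{Z}_+, \tag{B.2} \]
see Bolyog and Pap [11, proof of Theorem 5.1]. For all $k, \ell \in \mathbb{Z}_+$, using the independence of $W$, $B$
and $L$, by Itô's formula, we have
\begin{align*}
d \bigl( e^{(kb + \ell\gamma)t} Y_t^k X_t^\ell \bigr)
= {} & (kb + \ell\gamma) e^{(kb + \ell\gamma)t} Y_t^k X_t^\ell \, dt + k e^{(kb + \ell\gamma)t} Y_t^{k-1} X_t^\ell \bigl[ (a - b Y_t) \, dt + \sigma_1 \sqrt{Y_t} \, dW_t \bigr] \\
& + \ell e^{(kb + \ell\gamma)t} Y_t^k X_t^{\ell - 1} \bigl[ (\alpha - \beta Y_t - \gamma X_t) \, dt + \sigma_2 \sqrt{Y_t} \bigl( \varrho \, dW_t + \sqrt{1 - \varrho^2} \, dB_t \bigr) + \sigma_3 \, dL_t \bigr] \\
& + \frac{1}{2} k(k - 1) e^{(kb + \ell\gamma)t} Y_t^{k-2} X_t^\ell \sigma_1^2 Y_t \, dt + k \ell e^{(kb + \ell\gamma)t} Y_t^{k-1} X_t^{\ell - 1} \varrho \sigma_1 \sigma_2 Y_t \, dt \\
& + \frac{1}{2} \ell (\ell - 1) e^{(kb + \ell\gamma)t} Y_t^k X_t^{\ell - 2} (\sigma_2^2 Y_t + \sigma_3^2) \, dt \\
= {} & k e^{(kb + \ell\gamma)t} Y_t^{k-1} X_t^\ell \bigl( a \, dt + \sigma_1 \sqrt{Y_t} \, dW_t \bigr) \\
& + \ell e^{(kb + \ell\gamma)t} Y_t^k X_t^{\ell - 1} \bigl[ (\alpha - \beta Y_t) \, dt + \sigma_2 \sqrt{Y_t} \bigl( \varrho \, dW_t + \sqrt{1 - \varrho^2} \, dB_t \bigr) + \sigma_3 \, dL_t \bigr] \\
& + \frac{1}{2} k(k - 1) \sigma_1^2 e^{(kb + \ell\gamma)t} Y_t^{k-1} X_t^\ell \, dt + k \ell \varrho \sigma_1 \sigma_2 e^{(kb + \ell\gamma)t} Y_t^k X_t^{\ell - 1} \, dt \\
& + \frac{1}{2} \ell (\ell - 1) \sigma_2^2 e^{(kb + \ell\gamma)t} Y_t^{k+1} X_t^{\ell - 2} \, dt + \frac{1}{2} \ell (\ell - 1) \sigma_3^2 e^{(kb + \ell\gamma)t} Y_t^k X_t^{\ell - 2} \, dt.
\end{align*}
Writing this in an integrated form and taking the expectation of both sides, we obtain the recursive
formulas for $\mathbb{E}(Y_t^k X_t^\ell)$ for all $k \in \{0, \ldots, n\}$ and $\ell \in \{0, \ldots, p\}$.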
Formula (B.1) can be found in Ikeda and Watanabe [22, Example 8.2].
If σ1 > 0 and (1 − ̺2 )σ22 + σ32 > 0, then for each t ∈ R++ , the conditional distribution of
(Yt , Xt ) given (Y0 , X0 ) is absolutely continuous, see the proof of part (b) in the proof of Theorem
3.1 in Bolyog and Pap [11]. This clearly implies that the (unconditional) distribution of (Yt , Xt ) is
absolutely continuous. ✷
The next theorem gives a recursive formula for the moments of the stationary distribution of the
process (Yt , Xt )t∈R+ in the subcritical case, see Bolyog and Pap [11, Theorem 5.1].
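B.2 Theorem. Let us consider the two-factor affine diffusion model (1.1) with $a \in \mathbb{R}_+$, $b \in \mathbb{R}_{++}$,
$\alpha, \beta \in \mathbb{R}$, $\gamma \in \mathbb{R}_{++}$, $\sigma_1, \sigma_2, \sigma_3 \in \mathbb{R}_+$, $\varrho \in [-1, 1]$, and the random vector $(Y_\infty, X_\infty)$ given by
Theorem A.1. Then all the (mixed) moments of $(Y_\infty, X_\infty)$ of any order are finite, i.e., we have
$\operatorname{E}(Y_\infty^n |X_\infty|^p) < \infty$ for all $n, p \in \mathbb{Z}_+$, and the recursion
\[
\begin{aligned}
\operatorname{E}(Y_\infty^n X_\infty^p) = \frac{1}{nb + p\gamma} \Bigl[
 &\Bigl(na + \tfrac{1}{2} n(n-1)\sigma_1^2\Bigr) \operatorname{E}(Y_\infty^{n-1} X_\infty^p)
  - p\beta \operatorname{E}(Y_\infty^{n+1} X_\infty^{p-1}) \\
 &+ p(\alpha + n\varrho\sigma_1\sigma_2) \operatorname{E}(Y_\infty^n X_\infty^{p-1})
  + \tfrac{1}{2} p(p-1)\sigma_2^2 \operatorname{E}(Y_\infty^{n+1} X_\infty^{p-2}) \\
 &+ \tfrac{1}{2} p(p-1)\sigma_3^2 \operatorname{E}(Y_\infty^n X_\infty^{p-2}) \Bigr]
\end{aligned}
\]
holds for all $n, p \in \mathbb{Z}_+$ with $n + p \geq 1$, where $\operatorname{E}(Y_\infty^k X_\infty^\ell) := 0$ for $k, \ell \in \mathbb{Z}$ with $k < 0$ or $\ell < 0$.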
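Especially,
\[
\operatorname{E}(Y_\infty) = \frac{a}{b}, \qquad
\operatorname{E}(Y_\infty^2) = \frac{a(2a + \sigma_1^2)}{2b^2}, \qquad
\operatorname{E}(Y_\infty^3) = \frac{a(a + \sigma_1^2)(2a + \sigma_1^2)}{2b^3},
\]
\[
\operatorname{E}(X_\infty) = \frac{b\alpha - a\beta}{b\gamma}, \qquad
\operatorname{E}(Y_\infty X_\infty) = \frac{a \operatorname{E}(X_\infty) - \beta \operatorname{E}(Y_\infty^2) + (\alpha + \varrho\sigma_1\sigma_2) \operatorname{E}(Y_\infty)}{b + \gamma},
\]
\[
\operatorname{E}(X_\infty^2) = \frac{-2\beta \operatorname{E}(Y_\infty X_\infty) + 2\alpha \operatorname{E}(X_\infty) + \sigma_2^2 \operatorname{E}(Y_\infty) + \sigma_3^2}{2\gamma},
\]
\[
\operatorname{E}(Y_\infty^2 X_\infty) = \frac{(2a + \sigma_1^2) \operatorname{E}(Y_\infty X_\infty) - \beta \operatorname{E}(Y_\infty^3) + (\alpha + 2\varrho\sigma_1\sigma_2) \operatorname{E}(Y_\infty^2)}{2b + \gamma},
\]
\[
\operatorname{E}(Y_\infty X_\infty^2) = \frac{a \operatorname{E}(X_\infty^2) - 2\beta \operatorname{E}(Y_\infty^2 X_\infty) + 2(\alpha + \varrho\sigma_1\sigma_2) \operatorname{E}(Y_\infty X_\infty) + \sigma_2^2 \operatorname{E}(Y_\infty^2) + \sigma_3^2 \operatorname{E}(Y_\infty)}{b + 2\gamma}.
\]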
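If $\sigma_1 > 0$, then the Laplace transform of $Y_\infty$ takes the form
\[
\operatorname{E}(e^{-\lambda Y_\infty}) = \Bigl(1 + \frac{\sigma_1^2}{2b}\lambda\Bigr)^{-2a/\sigma_1^2}, \qquad \lambda \in \mathbb{R}_+, \tag{B.3}
\]
i.e., $Y_\infty$ has gamma distribution with parameters $2a/\sigma_1^2$ and $2b/\sigma_1^2$, hence
\[
\operatorname{E}(Y_\infty^\kappa) = \frac{\Gamma\bigl(\frac{2a}{\sigma_1^2} + \kappa\bigr)}{\bigl(\frac{2b}{\sigma_1^2}\bigr)^{\kappa}\, \Gamma\bigl(\frac{2a}{\sigma_1^2}\bigr)}, \qquad \kappa \in \Bigl(-\frac{2a}{\sigma_1^2}, \infty\Bigr).
\]
If $\sigma_1 > 0$ and $(1 - \varrho^2)\sigma_2^2 + \sigma_3^2 > 0$, then the distribution of $(Y_\infty, X_\infty)$ is absolutely continuous.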
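The recursion of Theorem B.2 can be evaluated mechanically. The following is a minimal sketch, not part of the original paper: the function name `stationary_moments` and its argument order are hypothetical, and exact rational arithmetic via `fractions.Fraction` is an illustrative choice. It assumes $b > 0$ and $\gamma > 0$, as in the theorem.

```python
from fractions import Fraction
from functools import lru_cache


def stationary_moments(a, b, alpha, beta, gamma, s1, s2, s3, rho):
    """Return a function E(n, p) computing E(Y_inf^n X_inf^p) by the recursion
    of Theorem B.2 (hypothetical helper; assumes b > 0 and gamma > 0)."""

    @lru_cache(maxsize=None)
    def E(n, p):
        if n < 0 or p < 0:
            return Fraction(0)  # convention stated in Theorem B.2
        if n == 0 and p == 0:
            return Fraction(1)  # zeroth moment
        half = Fraction(1, 2)
        bracket = (
            (n * a + half * n * (n - 1) * s1**2) * E(n - 1, p)
            - p * beta * E(n + 1, p - 1)
            + p * (alpha + n * rho * s1 * s2) * E(n, p - 1)
            + half * p * (p - 1) * s2**2 * E(n + 1, p - 2)
            + half * p * (p - 1) * s3**2 * E(n, p - 2)
        )
        return bracket / (n * b + p * gamma)

    return E


if __name__ == "__main__":
    # Arbitrary illustrative parameter values (not taken from the paper)
    E = stationary_moments(
        Fraction(2), Fraction(3), Fraction(1), Fraction(1, 2), Fraction(4),
        Fraction(1), Fraction(1, 2), Fraction(1, 3), Fraction(1, 4),
    )
    print(E(1, 0), E(0, 1), E(1, 1))
```

Each call only refers to moments with a smaller power of $X_\infty$, or the same power of $X_\infty$ and a smaller power of $Y_\infty$, so the recursion terminates; memoisation avoids recomputing shared terms.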
C Statistics for diffusion coefficients
Next, for any $T > 0$, we give a statistic for $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ and $\varrho$ using continuous time observations
$(Y_t, X_t)_{t \in [0,T]}$. Due to this result, we do not consider the estimation of the parameters $\sigma_1$, $\sigma_2$, $\sigma_3$
and $\varrho$; they are supposed to be known.
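Let us consider the two-factor affine diffusion model (1.1) with $a \in \mathbb{R}_+$, $b, \alpha, \beta, \gamma \in \mathbb{R}$, $\sigma_1 \in \mathbb{R}_{++}$,
$\sigma_2, \sigma_3 \in \mathbb{R}_+$, $\varrho \in [-1, 1]$. Suppose that we have $\operatorname{P}(Y_0 \in \mathbb{R}_{++}) = 1$ or $a \in \mathbb{R}_{++}$. Then for all $T \in \mathbb{R}_{++}$,
we have
\[
\operatorname{P}\Bigl(\int_0^T Y_u \,\mathrm{d}u \in \mathbb{R}_{++}\Bigr) = 1. \tag{C.1}
\]
Indeed, if $\omega \in \Omega$ is such that $[0, t] \ni u \mapsto Y_u(\omega)$ is continuous and $Y_v(\omega) \in \mathbb{R}_+$ for all $v \in \mathbb{R}_+$,
then we have $\int_0^t Y_s(\omega)\,\mathrm{d}s = 0$ if and only if $Y_s(\omega) = 0$ for all $s \in [0, t]$. Using the method of the
proof of Theorem 3.1 in Barczy et al. [5], we get (C.1). The (predictable) quadratic variation processes
of $Y$ and $X$, and the (predictable) quadratic covariation process of $Y$ and $X$ are
\[
\langle Y \rangle_t = \sigma_1^2 \int_0^t Y_u \,\mathrm{d}u, \qquad
\langle X \rangle_t = \sigma_2^2 \int_0^t Y_u \,\mathrm{d}u + \sigma_3^2 t, \qquad
\langle Y, X \rangle_t = \varrho\sigma_1\sigma_2 \int_0^t Y_u \,\mathrm{d}u, \qquad t \in \mathbb{R}_+.
\]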
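If, in addition, $a \in (\sigma_1^2, \infty)$, then for each $T \in \mathbb{R}_{++}$, we have
\[
\begin{bmatrix} \sigma_2^2 \\ \sigma_3^2 \end{bmatrix}
= \begin{bmatrix} \int_0^T Y_u \,\mathrm{d}u & T \\ \int_0^{T/2} Y_u \,\mathrm{d}u & T/2 \end{bmatrix}^{-1}
  \begin{bmatrix} \langle X \rangle_T \\ \langle X \rangle_{T/2} \end{bmatrix}
=: \begin{bmatrix} \hat\sigma_2^2(T) \\ \hat\sigma_3^2(T) \end{bmatrix},
\]
\[
\sigma_1^2 = \frac{\langle Y \rangle_T}{\int_0^T Y_u \,\mathrm{d}u} =: \hat\sigma_1^2(T), \qquad
\varrho = \frac{\langle Y, X \rangle_T}{\sigma_1\sigma_2 \int_0^T Y_u \,\mathrm{d}u}
        = \frac{\langle Y, X \rangle_T}{\bigl(\hat\sigma_1^2(T)\,\hat\sigma_2^2(T)\bigr)^{1/2} \int_0^T Y_u \,\mathrm{d}u} =: \hat\varrho(T),
\]
since the matrix
\[
\begin{bmatrix} \int_0^T Y_u \,\mathrm{d}u & T \\ \int_0^{T/2} Y_u \,\mathrm{d}u & T/2 \end{bmatrix}
\]
is invertible almost surely. Indeed,
\[
\begin{aligned}
\operatorname{P}\Bigl(\frac{T}{2} \int_0^T Y_u \,\mathrm{d}u - T \int_0^{T/2} Y_u \,\mathrm{d}u = 0\Bigr)
 &= \operatorname{P}\Bigl(\int_{T/2}^T Y_u \,\mathrm{d}u = \int_0^{T/2} Y_u \,\mathrm{d}u\Bigr) \\
 &= \operatorname{E}\Bigl[\operatorname{P}\Bigl(\int_{T/2}^T Y_u \,\mathrm{d}u = \int_0^{T/2} Y_u \,\mathrm{d}u \,\Big|\, (Y_u)_{u \in [0,T/2]}\Bigr)\Bigr] \\
 &= \operatorname{E}\Bigl[\operatorname{P}\Bigl(\int_{T/2}^T Y_u \,\mathrm{d}u = I \,\Big|\, (Y_u)_{u \in [0,T/2]}\Bigr)\Big|_{I = \int_0^{T/2} Y_u \,\mathrm{d}u}\Bigr] \\
 &= \operatorname{E}\Bigl[\operatorname{P}\Bigl(\int_{T/2}^T Y_u \,\mathrm{d}u = I \,\Big|\, Y_{T/2} = y\Bigr)\Big|_{I = \int_0^{T/2} Y_u \,\mathrm{d}u,\; y = Y_{T/2}}\Bigr] \\
 &= \operatorname{E}\Bigl[\operatorname{P}\Bigl(\int_0^{T/2} Y_u \,\mathrm{d}u = I \,\Big|\, Y_0 = y\Bigr)\Big|_{I = \int_0^{T/2} Y_u \,\mathrm{d}u,\; y = Y_{T/2}}\Bigr] = 0,
\end{aligned}
\]
since
\[
\operatorname{P}\Bigl(\int_0^{T/2} Y_u \,\mathrm{d}u = I \,\Big|\, Y_0 = y\Bigr) = 0
\]
for each $I \in \mathbb{R}_+$ and $y \in \mathbb{R}_+$, because the additional condition $a \in (\sigma_1^2, \infty)$ yields that the
distribution of $\int_0^{T/2} Y_u \,\mathrm{d}u$ is absolutely continuous, see Filipović et al. [18, Theorem 4.3]. (The
absolute continuity of $\int_0^{T/2} Y_u \,\mathrm{d}u$ might hold without the additional condition $a \in (\sigma_1^2, \infty)$.)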
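The formulas for the four statistics amount to one 2×2 linear solve and two ratios. The following is a minimal sketch under stated assumptions: the function name `diffusion_statistics` and its argument names are hypothetical, and the integrals and (co)variations are taken as already computed inputs. The value for $\varrho$ additionally assumes that the computed values for $\sigma_1^2$ and $\sigma_2^2$ are strictly positive.

```python
import math


def diffusion_statistics(int_Y_T, int_Y_half, T, qv_X_T, qv_X_half, qv_Y_T, cov_YX_T):
    """Evaluate (sigma1^2, sigma2^2, sigma3^2, rho) from the formulas of Appendix C.

    int_Y_T, int_Y_half : integrals of Y over [0, T] and [0, T/2]
    qv_X_T, qv_X_half   : <X>_T and <X>_{T/2}
    qv_Y_T, cov_YX_T    : <Y>_T and <Y, X>_T
    (hypothetical helper, not from the paper)
    """
    # Determinant of [[int_Y_T, T], [int_Y_half, T/2]]
    det = int_Y_T * (T / 2) - T * int_Y_half
    if det == 0:
        raise ValueError("the 2x2 matrix is singular")
    # Cramer's rule for the linear system in (sigma2^2, sigma3^2)
    sigma2_sq = (qv_X_T * (T / 2) - T * qv_X_half) / det
    sigma3_sq = (int_Y_T * qv_X_half - int_Y_half * qv_X_T) / det
    sigma1_sq = qv_Y_T / int_Y_T
    # Requires sigma1_sq * sigma2_sq > 0
    rho = cov_YX_T / (math.sqrt(sigma1_sq * sigma2_sq) * int_Y_T)
    return sigma1_sq, sigma2_sq, sigma3_sq, rho
```

Feeding in quadratic variations built from known parameter values returns those values up to rounding, which is a quick consistency check of the displayed identities.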
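We note that $(\hat\sigma_1^2(T), \hat\sigma_2^2(T), \hat\sigma_3^2(T), \hat\varrho(T))$ is a statistic, i.e., there exists a measurable function
$\Xi : D([0, T], \mathbb{R}^2) \to \mathbb{R}^4$ such that $(\hat\sigma_1^2(T), \hat\sigma_2^2(T), \hat\sigma_3^2(T), \hat\varrho(T)) = \Xi((Y_u, X_u)_{u \in [0,T]})$, where $D([0, T], \mathbb{R}^2)$
denotes the space of $\mathbb{R}^2$-valued càdlàg functions defined on $[0, T]$, since
\[
\begin{bmatrix}
 \frac{1}{n} \sum_{i=1}^{\lfloor nT \rfloor} Y_{\frac{i-1}{n}} & T \\[1mm]
 \frac{1}{n} \sum_{i=1}^{\lfloor nT \rfloor/2} Y_{\frac{i-1}{n}} & T/2
\end{bmatrix}^{-1}
\begin{bmatrix}
 \sum_{i=1}^{\lfloor nT \rfloor} \bigl(X_{\frac{i}{n}} - X_{\frac{i-1}{n}}\bigr)^2 \\[1mm]
 \sum_{i=1}^{\lfloor nT \rfloor/2} \bigl(X_{\frac{i}{n}} - X_{\frac{i-1}{n}}\bigr)^2
\end{bmatrix}
\stackrel{\operatorname{P}}{\longrightarrow}
\begin{bmatrix} \hat\sigma_2^2(T) \\ \hat\sigma_3^2(T) \end{bmatrix}, \tag{C.2}
\]
\[
\frac{\sum_{i=1}^{\lfloor nT \rfloor} \bigl(Y_{\frac{i}{n}} - Y_{\frac{i-1}{n}}\bigr)^2}{\frac{1}{n} \sum_{i=1}^{\lfloor nT \rfloor} Y_{\frac{i-1}{n}}}
\stackrel{\operatorname{P}}{\longrightarrow} \hat\sigma_1^2(T), \qquad
\frac{\sum_{i=1}^{\lfloor nT \rfloor} \bigl(Y_{\frac{i}{n}} - Y_{\frac{i-1}{n}}\bigr)\bigl(X_{\frac{i}{n}} - X_{\frac{i-1}{n}}\bigr)}{\frac{1}{n} \sum_{i=1}^{\lfloor nT \rfloor} Y_{\frac{i-1}{n}}}
\stackrel{\operatorname{P}}{\longrightarrow} \hat\varrho(T) \bigl(\hat\sigma_1^2(T)\,\hat\sigma_2^2(T)\bigr)^{1/2} \tag{C.3}
\]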
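as $n \to \infty$, where the convergences in (C.2) and (C.3) hold almost surely along a suitable subsequence,
the members of the sequences in (C.2) and (C.3) are measurable functions of $(Y_u, X_u)_{u \in [0,T]}$, and
one can use Theorems 4.2.2 and 4.2.8 in Dudley [15]. Next we prove (C.2) and (C.3). By Theorems
I.4.47 a) and I.4.52 in Jacod and Shiryaev [23],
\[
\sum_{i=1}^{\lfloor nT \rfloor} \bigl(X_{\frac{i}{n}} - X_{\frac{i-1}{n}}\bigr)^2 \stackrel{\operatorname{P}}{\longrightarrow} [X]_T = \langle X \rangle_T, \qquad
\sum_{i=1}^{\lfloor nT \rfloor/2} \bigl(X_{\frac{i}{n}} - X_{\frac{i-1}{n}}\bigr)^2 \stackrel{\operatorname{P}}{\longrightarrow} [X]_{T/2} = \langle X \rangle_{T/2},
\]
\[
\sum_{i=1}^{\lfloor nT \rfloor} \bigl(Y_{\frac{i}{n}} - Y_{\frac{i-1}{n}}\bigr)^2 \stackrel{\operatorname{P}}{\longrightarrow} [Y]_T = \langle Y \rangle_T, \qquad
\sum_{i=1}^{\lfloor nT \rfloor} \bigl(Y_{\frac{i}{n}} - Y_{\frac{i-1}{n}}\bigr)\bigl(X_{\frac{i}{n}} - X_{\frac{i-1}{n}}\bigr) \stackrel{\operatorname{P}}{\longrightarrow} [Y, X]_T = \langle Y, X \rangle_T
\]
as $n \to \infty$. Moreover, for all $T \in \mathbb{R}_+$, we have
\[
\frac{1}{n} \sum_{i=1}^{\lfloor nT \rfloor} Y_{\frac{i-1}{n}} \stackrel{\operatorname{P}}{\longrightarrow} \int_0^T Y_u \,\mathrm{d}u, \qquad
\frac{1}{n} \sum_{i=1}^{\lfloor nT \rfloor/2} Y_{\frac{i-1}{n}} \stackrel{\operatorname{P}}{\longrightarrow} \int_0^{T/2} Y_u \,\mathrm{d}u
\]
as $n \to \infty$, see Proposition I.4.44 in Jacod and Shiryaev [23]. Hence (C.2) and (C.3) follow by
Slutsky's lemma.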
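The discrete sums entering (C.2) and (C.3) are straightforward to compute from a sampled path. The following is a minimal sketch, assuming the path is observed on the grid $0, 1/n, 2/n, \ldots$; the function name `realized_quantities` and the dictionary keys are hypothetical, chosen only for illustration.

```python
def realized_quantities(Y, X, n):
    """Left-point Riemann sum of Y and realized (co)variations of (Y, X).

    Y, X : sequences of samples at times 0, 1/n, 2/n, ..., N/n (equal length N + 1)
    n    : number of samples per unit time
    (hypothetical helper, not from the paper)
    """
    if len(Y) != len(X) or len(Y) < 2:
        raise ValueError("Y and X must have the same length, at least 2")
    # Increments over consecutive grid points
    dY = [Y[i] - Y[i - 1] for i in range(1, len(Y))]
    dX = [X[i] - X[i - 1] for i in range(1, len(X))]
    return {
        "riemann_Y": sum(Y[:-1]) / n,                    # approximates the integral of Y
        "qv_Y": sum(d * d for d in dY),                  # approximates <Y>
        "qv_X": sum(d * d for d in dX),                  # approximates <X>
        "cov_YX": sum(u * v for u, v in zip(dY, dX)),    # approximates <Y, X>
    }
```

Calling the function on the samples up to time $T$ and again on the samples up to time $T/2$ gives all the ingredients of the left-hand sides of (C.2) and (C.3).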
D Limit theorems for continuous local martingales
In what follows we recall some limit theorems for continuous local martingales. We use these limit
theorems for studying the asymptotic behaviour of the CLSE of $\theta = (a, b, \alpha, \beta, \gamma)^\top$. First we recall a
strong law of large numbers for continuous local martingales.

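D.1 Theorem. (Liptser and Shiryaev [26, Lemma 17.4]) Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \mathbb{R}_+}, \operatorname{P})$ be a filtered
probability space satisfying the usual conditions. Let $(M_t)_{t \in \mathbb{R}_+}$ be a square-integrable continuous local
martingale with respect to the filtration $(\mathcal{F}_t)_{t \in \mathbb{R}_+}$ such that $\operatorname{P}(M_0 = 0) = 1$. Let $(\xi_t)_{t \in \mathbb{R}_+}$ be a
progressively measurable process such that
\[
\operatorname{P}\Bigl(\int_0^t \xi_u^2 \,\mathrm{d}\langle M \rangle_u < \infty\Bigr) = 1, \qquad t \in \mathbb{R}_+,
\]
and
\[
\int_0^t \xi_u^2 \,\mathrm{d}\langle M \rangle_u \stackrel{\text{a.s.}}{\longrightarrow} \infty \qquad \text{as } t \to \infty, \tag{D.1}
\]
where $(\langle M \rangle_t)_{t \in \mathbb{R}_+}$ denotes the quadratic variation process of $M$. Then
\[
\frac{\int_0^t \xi_u \,\mathrm{d}M_u}{\int_0^t \xi_u^2 \,\mathrm{d}\langle M \rangle_u} \stackrel{\text{a.s.}}{\longrightarrow} 0 \qquad \text{as } t \to \infty. \tag{D.2}
\]
If $(M_t)_{t \in \mathbb{R}_+}$ is a standard Wiener process, the progressive measurability of $(\xi_t)_{t \in \mathbb{R}_+}$ can be relaxed
to measurability and adaptedness to the filtration $(\mathcal{F}_t)_{t \in \mathbb{R}_+}$.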
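In the simplest special case $M = W$ (a standard Wiener process) and $\xi \equiv 1$, the ratio in (D.2) is $W_t / t$. The following simulation is a sketch for illustration only and is not part of the paper: the function name `wiener_ratio`, the step size, and the seed are all assumptions, and a single simulated path is of course no proof of almost sure convergence.

```python
import random


def wiener_ratio(T, steps_per_unit=1, seed=0):
    """Simulate W_T / T, i.e. the ratio in (D.2) with M = W and xi = 1.

    (hypothetical helper; W_T is built from independent Gaussian increments)
    """
    rng = random.Random(seed)
    dt = 1.0 / steps_per_unit
    w = 0.0
    for _ in range(int(T * steps_per_unit)):
        w += rng.gauss(0.0, dt ** 0.5)  # increment with variance dt
    return w / T


if __name__ == "__main__":
    for horizon in (10, 1000, 100000):
        print(horizon, wiener_ratio(horizon))
```

Since $W_T / T$ has variance $1/T$, the printed values should typically shrink as the horizon grows.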
The next theorem is about the asymptotic behaviour of continuous multivariate local martingales,
see van Zanten [29, Theorem 4.1].

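D.2 Theorem. (van Zanten [29, Theorem 4.1]) Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \mathbb{R}_+}, \operatorname{P})$ be a filtered probability
space satisfying the usual conditions. Let $(\boldsymbol{M}_t)_{t \in \mathbb{R}_+}$ be a $d$-dimensional square-integrable continuous
local martingale with respect to the filtration $(\mathcal{F}_t)_{t \in \mathbb{R}_+}$ such that $\operatorname{P}(\boldsymbol{M}_0 = \boldsymbol{0}) = 1$. Suppose that
there exists a function $Q : [t_0, \infty) \to \mathbb{R}^{d \times d}$ with some $t_0 \in \mathbb{R}_+$ such that $Q(t)$ is an invertible
(non-random) matrix for all $t \in [t_0, \infty)$, $\lim_{t \to \infty} \|Q(t)\| = 0$ and
\[
Q(t) \langle \boldsymbol{M} \rangle_t Q(t)^\top \stackrel{\operatorname{P}}{\longrightarrow} \eta\eta^\top \qquad \text{as } t \to \infty,
\]
where $\eta$ is a $d \times d$ random matrix. Then, for each $\mathbb{R}^{k \times \ell}$-valued random matrix $A$ defined on
$(\Omega, \mathcal{F}, \operatorname{P})$, we have
\[
(Q(t) \boldsymbol{M}_t, A) \stackrel{\mathcal{D}}{\longrightarrow} (\eta \boldsymbol{Z}, A) \qquad \text{as } t \to \infty,
\]
where $\boldsymbol{Z}$ is a $d$-dimensional standard normally distributed random vector independent of $(\eta, A)$.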
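As a sanity check of the hypotheses, consider the simplest instance of Theorem D.2. This example is an illustration added here, not taken from the paper: $d = 1$, $M = W$ a standard Wiener process, and the scaling $Q(t) = t^{-1/2}$ on $[t_0, \infty)$ with $t_0 = 1$.

```latex
% d = 1, M = W, Q(t) = t^{-1/2}, t_0 = 1
\[
  Q(t)\,\langle W \rangle_t\, Q(t)^\top = t^{-1/2} \cdot t \cdot t^{-1/2} = 1 = \eta\eta^\top
  \qquad \text{with } \eta = 1,
\]
% all hypotheses hold (Q(t) is invertible and tends to 0), so the theorem gives
\[
  \bigl(t^{-1/2} W_t,\, A\bigr) \stackrel{\mathcal{D}}{\longrightarrow} (Z, A)
  \qquad \text{as } t \to \infty,
\]
% where Z is standard normal and independent of A.
```

Here $t^{-1/2} W_t$ is standard normal for every $t > 0$, so the content of the statement lies in the joint convergence with an arbitrary random matrix $A$.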
References
[1] Alfonsi, A. (2015). Affine Diffusions and Related Processes: Simulation, Theory and Applications. Springer, Cham; Bocconi University Press, Milan.
[2] Baldeaux, J. and Platen, E. (2013). Functionals of Multidimensional Diffusions with Applications to Finance. Springer, Cham; Bocconi University Press, Milan.
[3] Barczy, M., Ben Alaya, M., Kebaier, A. and Pap, G. (2016). Asymptotic behavior of maximum likelihood estimators for a jump-type Heston model. Available on arXiv: http://arxiv.org/abs/1509.08869
[4] Barczy, M., Ben Alaya, M., Kebaier, A. and Pap, G. (2017+). Asymptotic properties of maximum likelihood estimator for the growth rate for a jump-type CIR process based on continuous time observations. To appear in Stochastic Processes and their Applications. Available on arXiv: http://arxiv.org/abs/1609.05865
[5] Barczy, M., Döring, L., Li, Z. and Pap, G. (2013). On parameter estimation for critical affine processes. Electronic Journal of Statistics 7 647–696.
[6] Barczy, M., Döring, L., Li, Z. and Pap, G. (2014). Parameter estimation for a subcritical affine two factor model. Journal of Statistical Planning and Inference 151-152 37–59.
[7] Barczy, M., Nyul, B. and Pap, G. (2016). Least squares estimation for the subcritical Heston model based on continuous time observations. Available on arXiv: http://arxiv.org/abs/1511.05948
[8] Barczy, M. and Pap, G. (2016). Asymptotic properties of maximum-likelihood estimators for Heston models based on continuous time observations. Statistics 50(2) 389–417.
[9] Ben Alaya, M. and Kebaier, A. (2012). Parameter estimation for the square root diffusions: ergodic and nonergodic cases. Stochastic Models 28(4) 609–634.
[10] Ben Alaya, M. and Kebaier, A. (2013). Asymptotic behavior of the maximum likelihood estimator for ergodic and nonergodic square-root diffusions. Stochastic Analysis and Applications 31(4) 552–573.
[11] Bolyog, B. and Pap, G. (2016). Conditions for stationarity and ergodicity of two-factor affine diffusions. Communications on Stochastic Analysis 10(4) 587–610.
[12] Cox, J. C., Ingersoll, J. E. and Ross, S. A. (1985). A theory of the term structure of interest rates. Econometrica 53(2) 385–407.
[13] Dawson, D. A. and Li, Z. (2006). Skew convolution semigroups and affine Markov processes. Annals of Probability 34(3) 1103–1142.
[14] Dietz, H. M. and Kutoyants, Yu. A. (1997). A class of minimum-distance estimators for diffusion processes with ergodic properties. Statistics and Decisions 15 211–227.
[15] Dudley, R. M. (1989). Real Analysis and Probability. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, California.
[16] Duffie, D., Filipović, D. and Schachermayer, W. (2003). Affine processes and applications in finance. Annals of Applied Probability 13 984–1053.
[17] Filipović, D. (2009). Term-Structure Models. Springer-Verlag, Berlin.
[18] Filipović, D., Mayerhofer, E. and Schneider, P. (2013). Density approximations for multivariate affine jump-diffusion processes. Journal of Econometrics 176 93–111.
[19] Hu, Y. and Long, H. (2007). Parameter estimation for Ornstein–Uhlenbeck processes driven by α-stable Lévy motions. Communications on Stochastic Analysis 1(2) 175–192.
[20] Hu, Y. and Long, H. (2009). Least squares estimator for Ornstein–Uhlenbeck processes driven by α-stable motions. Stochastic Processes and their Applications 119(8) 2465–2480.
[21] Hu, Y. and Long, H. (2009). On the singularity of least squares estimator for mean-reverting α-stable motions. Acta Mathematica Scientia 29B(3) 599–608.
[22] Ikeda, N. and Watanabe, S. (1981). Stochastic Differential Equations and Diffusion Processes. North-Holland Publishing Company.
[23] Jacod, J. and Shiryaev, A. N. (2003). Limit Theorems for Stochastic Processes, 2nd ed. Springer-Verlag, Berlin.
[24] Karatzas, I. and Shreve, S. E. (1991). Brownian Motion and Stochastic Calculus, 2nd ed. Springer-Verlag, New York.
[25] Küchler, U. and Sørensen, M. (1997). Exponential Families of Stochastic Processes. Springer-Verlag, New York.
[26] Liptser, R. S. and Shiryaev, A. N. (2001). Statistics of Random Processes II. Applications, 2nd ed. Springer-Verlag, Berlin, Heidelberg.
[27] Overbeck, L. and Rydén, T. (1997). Estimation in the Cox-Ingersoll-Ross model. Econometric Theory 13(3) 430–461.
[28] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press.
[29] van Zanten, H. (2000). A multivariate central limit theorem for continuous local martingales. Statistics & Probability Letters 50(3) 229–235.