Lecture 1 Quant

Topics in Probability:
Quantitative Investments Strategies
Marco Avellaneda
G63.2936.001
Spring Semester 2009

Syllabus
1. Risk Models, Factor Analysis and Correlation Structures
-- Statistical models of stock returns
-- The classics: CAPM, APT
-- Factor analysis
-- Dynamic PCA of correlation matrices
-- Economic significance of eigenvectors & eigenportfolios
-- Exchange-traded Funds (ETFs)
-- Factor analysis via ETFs
-- Random matrix theory
-- Examples: US equities, NASDAQ, EM bonds, Brazil, China,
European stocks
-- Risk-functions and dynamic risk-management of equity portfolios
Syllabus
2. Statistical arbitrage for cash equities
-- Long-short market-neutral investment portfolios
-- Leverage & setting ex-ante performance targets
-- Performance measures
-- Back-testing concepts: in-sample/out-of-sample performance,
survivorship biases
-- Time-series analysis of stock residuals
-- PCA-based residuals
-- ETF-based residuals
-- Extracting information from trading volume (subordination)
Syllabus
3. Statistical arbitrage in options markets
-- Option markets revisited
-- Volatility and options trading
-- Data issues with option markets, implied dividend
-- Modeling stock-ETF dynamics and ETF-stock dynamics
-- Weighted Monte-Carlo technique for model calibration
-- Relative-value analysis: options on single stocks
-- Relative-value analysis: options on indices and ETFs
-- Construction of risk-functions for option portfolios
-- Market-neutral option portfolios
-- Dispersion trading
-- Back-testing option portfolio strategies
Course Requirements
-- Three projects, or assignments, associated with the different parts
of the course. Projects will be approved by instructor.
-- Projects will deal with real data. They will involve programming
and quantitative financial analysis as well as your
contribution to and interpretation of the theory presented.
-- Programming will involve the management of large (real) datasets, the
use of Matlab but also other programming languages and software
needed to ``get the job done’’.
-- The grade will be based on the three projects and on class participation.
-- Pre-requisites: knowledge of applied statistics, proficiency in at least one
programming environment, knowledge of basic finance concepts
(e.g., interest rates, present value, stocks, Markowitz, Black-Scholes).
-- Books and notes: provided after each lecture.

Statistical Models of Stock Returns
Consider a stock (e.g., IBM). The return R over a specified period is the
change in price, plus dividend payments, divided by the initial price.
$$R_t = \frac{S_{t+\Delta t} - S_t + D_{t,\,t+\Delta t}}{S_t}$$
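As a quick worked instance of the return definition, the sketch below evaluates it for one period. The function name `simple_return` and the prices are hypothetical, chosen only for illustration.

```python
def simple_return(s_start, s_end, dividends=0.0):
    """Return over one period: (S_{t+dt} - S_t + D_{t,t+dt}) / S_t."""
    if s_start <= 0:
        raise ValueError("initial price must be positive")
    return (s_end - s_start + dividends) / s_start

# Hypothetical numbers: price moves from 100 to 103 and a 1.00 dividend is paid.
r = simple_return(100.0, 103.0, dividends=1.0)
print(f"{r:.4f}")
```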
How can we explain or predict stock returns?
-- Fundamental analysis (earnings, balance sheet, business analysis);
this will not be considered in this course!
-- ``Trends’’ in the prices. (Not very effective)
-- Explanation of the returns/prices based on statistical factors

Factor models
$$R = \sum_{j=1}^{N_f} \beta_j F_j + \epsilon$$
$F_j$, $j = 1, \dots, N_f$: explanatory factors
$\beta_j$, $j = 1, \dots, N_f$: factor loadings
$\sum_{j=1}^{N_f} \beta_j F_j$: explained, or systematic portion
$\epsilon$: residual, or idiosyncratic portion

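One way to make the decomposition concrete is to estimate the loadings by least squares and split a return series into its systematic and idiosyncratic parts. The sketch below uses synthetic data; the factor count, the "true" loadings and the noise levels are assumptions made for illustration, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_factors = 500, 3

# Synthetic data (assumed for illustration): factor returns and one stock.
F = rng.normal(0.0, 0.01, size=(T, n_factors))
true_beta = np.array([1.2, -0.4, 0.7])
eps = rng.normal(0.0, 0.005, size=T)
R = F @ true_beta + eps

# Least-squares estimate of the loadings beta_j (no intercept, as in the model).
beta_hat, *_ = np.linalg.lstsq(F, R, rcond=None)

systematic = F @ beta_hat      # explained, or systematic portion
residual = R - systematic      # residual, or idiosyncratic portion

print("estimated loadings:", np.round(beta_hat, 2))
print("share of variance explained:", round(1 - residual.var() / R.var(), 3))
```

By construction the least-squares residual is orthogonal to every factor series in the sample, which is the sample counterpart of the zero-correlation condition imposed on the residual.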

CAPM: a `minimalist’ approach
Single explanatory factor: the ``market’’, or ``market portfolio’’
$$R = \beta F + \epsilon, \qquad \mathrm{Cov}(F, \epsilon) = 0$$
F = usually taken to be the returns of a broad-market index (e.g., S&P 500)

Normative statement: $\langle \epsilon \rangle = 0$, or $\langle R \rangle = \beta \langle F \rangle$
Argument: if the market is ``efficient’’, or in ``equilibrium’’, investors cannot
make money (systematically) by picking individual stocks and shorting the
index or vice-versa (assuming uncorrelated residuals). (Lintner, Sharpe, 1964)
Counter-arguments: (i) the market is not ``efficient’’, (ii) residuals may be
correlated (additional factors are needed).
Multi-factor models (APT)
$$R = \sum_{j=1}^{N_f} \beta_j F_j + \epsilon, \qquad \mathrm{Corr}(F_j, \epsilon) = 0$$
Factors represent industry returns (think sub-indices in different sectors, size,
financial statement variables, etc.).
Normative statement (APT): $\langle \epsilon \rangle = 0$, or $\langle R \rangle = \sum_{j=1}^{N_f} \beta_j \langle F_j \rangle$
Argument: Generalization of CAPM, based again on no-arbitrage. (Ross, 1976)
Counter-arguments: (i) How do we actually define the factors? (ii) Is the number
of factors known? (iii) The structure of the stock market and risk-premia
vary strongly (think pre & post WWW). (iv) The issue of correlation of residuals
is intimately related to the number of factors.
Factor decomposition in practice
-- Putting aside normative theories (how stocks should behave), factor
analysis can be quite useful in practice.
-- In risk-management: used to measure exposure of a portfolio to
a particular industry or market feature.
-- Dimension-reduction technique for the study of a system with a large number
of degrees of freedom.
-- Makes Portfolio Theory viable in practice. (Markowitz to Sharpe to Ross!)
-- Useful to analyze stock investments in a relative fashion (buy ABC, sell
XYZ to eliminate exposure to an industry sector, for example).
-- New investment techniques arise from factor analysis. The technique is
called defactoring (Pole, 2007; Avellaneda and Lee, 2008).
Principal Components Analysis of
Correlation Data
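Consider a time window t = 0, 1, 2, …, T (days) and a universe of N stocks. The returns
data is represented by a T by N matrix $(R_{it})$.
$$\bar{R}_i = \frac{1}{T} \sum_{t=1}^{T} R_{it}, \qquad \sigma_i^2 = \frac{1}{T-1} \sum_{t=1}^{T} \left( R_{it} - \bar{R}_i \right)^2$$
$$Y_{it} = \frac{R_{it}}{\sigma_i}$$
$$\Gamma_{ij} = \frac{1}{T-1} \sum_{t=1}^{T} Y_{it} Y_{jt}$$
Clearly, $\mathrm{Rank}(\Gamma) \le \min(N, T)$.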
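The rank bound is easy to see numerically. The sketch below builds the matrix from synthetic returns with fewer days than stocks; the sizes and the return distribution are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 60, 200                                  # fewer days than stocks
R = rng.normal(0.0, 0.02, size=(T, N))          # synthetic T-by-N return matrix

R_bar = R.mean(axis=0)                                        # mean return per stock
sigma = np.sqrt(((R - R_bar) ** 2).sum(axis=0) / (T - 1))     # sample volatility
Y = R / sigma                                                 # standardized returns
Gamma = Y.T @ Y / (T - 1)                                     # N-by-N matrix Gamma_ij

rank = np.linalg.matrix_rank(Gamma)
print(Gamma.shape, rank, min(N, T))
```

Since Gamma is a product of a T-by-N matrix with its transpose, it cannot have more than T independent directions, however many stocks are in the universe.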

Regularized correlation matrix
$$C_{ij} = \frac{1}{T-1} \sum_{t=1}^{T} \left( R_{it} - \bar{R}_i \right) \left( R_{jt} - \bar{R}_j \right) + \gamma \delta_{ij}, \qquad \gamma = 10^{-9}$$
$$\Gamma^{\mathrm{reg}}_{ij} = \frac{C_{ij}}{\sqrt{C_{ii} C_{jj}}}$$
This matrix is a correlation matrix and is positive definite. It is equivalent for
all practical purposes to the original one but is numerically stable for inversion
and eigenvector analysis (e.g. with Matlab).
Note: this is especially useful when T<<N.
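A minimal sketch of the regularization in NumPy, assuming synthetic returns with T much smaller than N. The function name `regularized_correlation` is hypothetical; the value of gamma is the one quoted on the slide.

```python
import numpy as np

def regularized_correlation(R, gamma=1e-9):
    """Correlation matrix built from the covariance with gamma added to its diagonal."""
    T, N = R.shape
    X = R - R.mean(axis=0)                       # demeaned returns
    C = X.T @ X / (T - 1) + gamma * np.eye(N)    # regularized covariance C_ij
    d = np.sqrt(np.diag(C))
    return C / np.outer(d, d)                    # C_ij / sqrt(C_ii C_jj)

rng = np.random.default_rng(2)
R = rng.normal(0.0, 0.01, size=(60, 200))        # synthetic returns, T << N
G = regularized_correlation(R)

print(np.allclose(np.diag(G), 1.0))              # unit diagonal
print(np.linalg.eigvalsh(G).min() > 0)           # strictly positive spectrum
```

Without the gamma term the sample matrix in this example would have many zero eigenvalues, so an inversion or a Cholesky factorization could fail.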

Eigenvalues, Eigenvectors and
Eigenportfolios
$$\lambda_1 > \lambda_2 \ge \dots \ge \lambda_N > 0 \qquad \text{eigenvalues}$$
$$V^{(j)} = \left( V_1^{(j)}, V_2^{(j)}, \dots, V_N^{(j)} \right), \quad j = 1, 2, \dots, N \qquad \text{eigenvectors}$$
$$F_{jt} = \sum_{i=1}^{N} V_i^{(j)} Y_{it} = \sum_{i=1}^{N} \left( \frac{V_i^{(j)}}{\sigma_i} \right) R_{it} \qquad \text{returns of ``eigenportfolios''}$$
We use the coefficients of the eigenvectors and the volatilities of
the stocks to build ``portfolio weights’’. These random variables span the
same linear space as the original returns.
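The construction of eigenportfolio returns can be sketched as follows. The data are synthetic (one common "market" component plus noise), which is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 252, 20
common = rng.normal(0.0, 0.01, size=(T, 1))            # synthetic common component
R = common + rng.normal(0.0, 0.01, size=(T, N))        # synthetic stock returns

sigma = R.std(axis=0, ddof=1)
Y = R / sigma
Gamma = Y.T @ Y / (T - 1)

lam, V = np.linalg.eigh(Gamma)        # eigh returns eigenvalues in ascending order
order = np.argsort(lam)[::-1]         # reorder so that lambda_1 is the largest
lam, V = lam[order], V[:, order]

weights = V / sigma[:, None]          # column j holds V_i^(j) / sigma_i
F = R @ weights                       # F[t, j] = return of eigenportfolio j on day t

# Eigenportfolio returns are mutually uncorrelated, with second moments lambda_j.
F_cov = F.T @ F / (T - 1)
print(np.allclose(F_cov, np.diag(lam)))
```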
[Figure: 50 largest eigenvalues using the ~1400 US stocks with market cap above 1BB (Jan 2007); N ~ 1400 stocks, T = 252 days.]
[Figure: Top 50 eigenvalues for S&P 500 index components, May 1 2007, T = 252. Vertical axis: 0% to 30%; horizontal axis: eigenvalue rank, 1 to 50.]
Model Selection Problem:
How many EV are significant?
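Need to estimate the significant eigenportfolios which can be used as factors.
Assuming that the correlation matrix C is invertible (regularize if necessary):
$$\frac{\langle R_i R_j \rangle}{\sigma_i \sigma_j} = C_{ij} = \sum_{k=1}^{N} \lambda_k V_i^{(k)} V_j^{(k)}$$
$$F_k \equiv \sum_{i=1}^{N} \frac{V_i^{(k)}}{\sigma_i} R_i, \qquad \tilde{F}_k \equiv \frac{1}{\sqrt{\lambda_k}} \sum_{i=1}^{N} \frac{V_i^{(k)}}{\sigma_i} R_i$$
$$\langle F_k^2 \rangle = \lambda_k, \qquad \langle \tilde{F}_k^2 \rangle = 1, \qquad \langle \tilde{F}_k \tilde{F}_{k'} \rangle = \delta_{kk'}$$
$$R_i = \sum_k \beta_{ik} \tilde{F}_k \;\Rightarrow\; \beta_{ik} = \sigma_i \sqrt{\lambda_k}\, V_i^{(k)}$$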
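The loadings formula can be checked numerically: with unit-variance factors, the loadings $\sigma_i \sqrt{\lambda_k} V_i^{(k)}$ should rebuild the returns exactly when all N factors are kept. The sketch below uses synthetic data and assumes T > N so that every eigenvalue is positive.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 300, 10
R = rng.normal(0.0, 0.01, size=(T, N)) + rng.normal(0.0, 0.01, size=(T, 1))  # synthetic

sigma = R.std(axis=0, ddof=1)
Y = R / sigma
Gamma = Y.T @ Y / (T - 1)
lam, V = np.linalg.eigh(Gamma)

F_tilde = (Y @ V) / np.sqrt(lam)               # normalized factors, one column per k
beta = sigma[:, None] * np.sqrt(lam) * V       # beta[i, k] = sigma_i sqrt(lambda_k) V_i^(k)

print(np.allclose(F_tilde.T @ F_tilde / (T - 1), np.eye(N)))   # orthonormal factors
print(np.allclose(F_tilde @ beta.T, R))                        # returns are recovered
```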
Karhunen-Loeve Decomposition
R = vector of random variables with finite second moment, $\langle \cdot , \cdot \rangle$ = correlation
$C = \langle R \otimes R \rangle = \langle R R' \rangle$    Covariance matrix
$\Omega = C^{1/2}$    Symmetric square root of C
$F = \Omega^{-1} R, \quad R = \Omega F$    F has uncorrelated components
$B = \Omega = C^{1/2}$    Loadings = components of the square root of C
Since many eigenvalues vanish or are very small in a real system, the modeling
consists in defining a small number of factors and attributing the rest to ``noise’’.
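A small numerical sketch of the decomposition, assuming a synthetic positive-definite covariance matrix. The helper name `symmetric_sqrt` is hypothetical; it computes the square root through the eigendecomposition.

```python
import numpy as np

def symmetric_sqrt(C):
    """Symmetric square root of a symmetric positive-definite matrix."""
    lam, V = np.linalg.eigh(C)
    return (V * np.sqrt(lam)) @ V.T

rng = np.random.default_rng(5)
A = rng.normal(size=(5, 5))
C = A @ A.T + 0.1 * np.eye(5)          # synthetic positive-definite covariance

Omega = symmetric_sqrt(C)
Omega_inv = np.linalg.inv(Omega)

# Covariance of F = Omega^{-1} R is Omega^{-1} C Omega^{-1}', which should be the identity.
cov_F = Omega_inv @ C @ Omega_inv.T

print(np.allclose(Omega @ Omega, C))
print(np.allclose(cov_F, np.eye(5)))
```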
Bai and Ng 2002, Econometrica
$$I(m) = \min_{\beta} \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( R_{it} - \sum_{k=1}^{m} \beta_{ik} F_{kt} \right)^2$$
$$m^* = \arg\min_m \left( I(m) + m \cdot g(N,T) \right)$$
$$\lim_{N,T \to \infty} g(N,T) = 0, \qquad \lim_{N,T \to \infty} \min(N,T)\, g(N,T) = \infty$$
Under reasonable assumptions on the underlying model, Bai and Ng prove
that under PCA estimation, m* converges in probability to the true
number of factors as N, T → ∞.
Connection with eigenvalues of
correlation matrix
$$J(m) \equiv \min_{\beta} \frac{1}{NT} \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \sum_{t=1}^{T} \left( R_{it} - \sum_{k=1}^{m} \beta_{ik} F_{kt} \right)^2$$
$$J(m) = \sum_{k=m+1}^{N} \lambda_k, \qquad \text{also} \qquad I(m) = \sum_{k=m+1}^{N} \lambda_k \left( \sum_{i=1}^{N} \sigma_i^2 \left( V_i^{(k)} \right)^2 \right)$$
$$m^* = \arg\min_m \left( \sum_{k=m+1}^{N} \lambda_k + m\, g(N,T) \right) \qquad \text{(linear penalty function)}$$
For finite samples, we need to adjust the slope g(N,T).

Apparently, Bai and Ng (2002) tend to underestimate the number of factors in
Nasdaq stocks considerably. (2 factors, T=60 monthly returns, N=8000 stocks)
Useful quantities
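$$\frac{1}{N} \sum_{k=1}^{m} \lambda_k = \text{Explained variance by first } m \text{ eigenvectors}$$
$$\frac{1}{N} \sum_{k=m+1}^{N} \lambda_k = \text{Tail}$$
$$\frac{1}{N} \sum_{k=m+1}^{N} \lambda_k + g\, \frac{m}{N} = \text{Objective function} = U(m, g)$$
$$\text{Convexity} = \frac{\partial^2 U(m^*(g), g)}{\partial g^2}$$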
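The quantities above can be sketched in a few lines: compute U(m, g) for every m and pick the minimizer. The function name `optimal_cutoff` and the ten-eigenvalue spectrum are hypothetical, chosen only so that the eigenvalues sum to N as they would for a correlation matrix.

```python
import numpy as np

def optimal_cutoff(lam, g):
    """Return (m*, U(m*, g)) for U(m, g) = (1/N) sum_{k>m} lam_k + g m / N."""
    lam = np.sort(np.asarray(lam, dtype=float))[::-1]     # descending order
    N = lam.size
    # tail[m] = (1/N) * (sum of the eigenvalues ranked below the m largest), m = 0..N
    tail = np.concatenate(([lam.sum()], lam.sum() - np.cumsum(lam))) / N
    U = tail + g * np.arange(N + 1) / N
    m_star = int(np.argmin(U))
    return m_star, float(U[m_star])

# Hypothetical spectrum (sums to N = 10).
lam = [4.0, 2.5, 1.5, 0.6, 0.4, 0.3, 0.25, 0.2, 0.15, 0.1]
for g in (0.5, 1.0, 2.0):
    print(g, optimal_cutoff(lam, g))
```

Adding the m-th factor changes U by (g - lambda_m)/N, so with a linear penalty the minimizer keeps exactly the eigenvalues that exceed g.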
[Figure: Objective function U(m,g). Surface plot over m and g; vertical axis U, 0% to 150%.]
[Figure: Optimal value of U(m,g) for different g. Horizontal axis g, 1 to 13; vertical axis U(m*(g),g), 0.3 to 0.8.]
Implementation of Bai & Ng
on SP500 Data
g    m*    Lambda_m*   Explained Variance   Tail     Objective Function   Convexity
1    117   0.20%       87.88%               12.12%   0.355                -
2    59    0.39%       71.44%               28.56%   0.522                -0.085085
3    29    0.59%       57.11%               42.89%   0.603                -0.041266
4    16    0.76%       48.51%               51.49%   0.643                -0.018110
5    10    0.96%       43.52%               56.48%   0.665                -0.007000
6    7     1.18%       40.43%               59.57%   0.680                -0.003096
7    6     1.22%       39.25%               60.75%   0.691                -0.004872
8    4     1.56%       36.56%               63.44%   0.698                0.001069
9    4     1.56%       36.56%               63.44%   0.706                0.000000
10   4     1.56%       36.56%               63.44%   0.714                0.000000
11   4     1.56%       36.56%               63.44%   0.722                0.000000
12   4     1.56%       36.56%               63.44%   0.730                0.000000
13   4     1.56%       36.56%               63.44%   0.738                -
If we choose the cutoff m* as the one for which the sensitivity to g is zero, then
m*~5 seems appropriate.
This would lead to the conclusion that the S&P 500 corresponds to a 5-factor model.
The number is small in relation to industry sectors and to the amount of variance
explained by industry factors.
The density of states: a useful formalism
Spectral theory as seen by physicists – origins in Quantum Mechanics and
High Energy Physics.
$$F(E) \equiv \frac{\#\{k : \lambda_k / N \le E\}}{N}, \qquad F(E) \text{ is increasing}, \quad F(1) = 1$$
$$f(E) = F'(E), \qquad f(E) = \frac{1}{N} \sum_k \delta\!\left( E - \frac{\lambda_k}{N} \right) \qquad \text{(D.O.E.)}$$
One way to think about the DOE is as exchanging the x-axis and the y-axis, i.e.,
counting the number of eigenvalues in a neighborhood of any E, 0<E<1.
Intuition: if N is large, the eigenvalues of the insignificant portion of the
spectrum will ``bunch up’’ into a continuous distribution f(E).
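A minimal sketch of the integrated DOE and a histogram estimate of the density, using the spectrum of a correlation matrix built from synthetic returns. The sizes, the common-factor strength and the bin count are assumptions for illustration; `integrated_doe` is a hypothetical helper name.

```python
import numpy as np

def integrated_doe(lam, E):
    """F(E) = #{k : lam_k / N <= E} / N, evaluated at each point of E."""
    lam = np.sort(np.asarray(lam, dtype=float))
    N = lam.size
    return np.searchsorted(lam / N, E, side="right") / N

rng = np.random.default_rng(6)
T, N = 500, 100
R = rng.normal(size=(T, N)) + 0.5 * rng.normal(size=(T, 1))   # synthetic returns
lam = np.linalg.eigvalsh(np.corrcoef(R, rowvar=False))        # spectrum of correlations

E = np.linspace(0.0, 1.0, 201)
F = integrated_doe(lam, E)

# Histogram estimate of the density f(E) on [0, 1].
f_hist, edges = np.histogram(lam / N, bins=50, range=(0.0, 1.0), density=True)

print(F[0], F[-1])
```

Because the eigenvalues of a correlation matrix are nonnegative and sum to N, every ratio lambda_k / N lies in [0, 1], which is why F reaches 1 at E = 1.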
[Figure: Integrated DOE. Horizontal axis E, 0.00 to 1.00; vertical axis F(E), 0 to 1.2.]
In the DOE language…
$$\frac{1}{N} \sum_{k=m+1}^{N} \lambda_k = \int_0^{\lambda_m} E\, f(E)\, dE, \qquad \frac{m}{N} = 1 - F(\lambda_m)$$
$$U(E, g) = \int_0^{E} x f(x)\, dx + g \left( 1 - F(E) \right)$$
$$\frac{\partial U(E, g)}{\partial E} = E f(E) - g f(E) = (E - g) f(E)$$
If $f(g) \ne 0$, then $E^*(g) = g$.
Dependence of the problem on g
$$V(g) = U\left( E^*(g), g \right) = \int_0^{g} x f(x)\, dx + g \left( 1 - F(g) \right)$$
$$= g F(g) - \int_0^{g} F(x)\, dx + g - g F(g)$$
$$= g - \int_0^{g} F(x)\, dx$$
$$V'(g) = 1 - F(g)$$
$$V''(g) = -f(g)$$
According to this calculation, the best cutoff is the level E
where the DOE vanishes (or nearly vanishes) coming from the right.
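The identity V'(g) = 1 - F(g) can be checked on a discrete spectrum by finite differences. In the sketch below levels are measured in eigenvalue units, so that the cutoff sits at E*(g) = g; the spectrum is hypothetical and the function names `V` and `F` simply mirror the symbols in the derivation.

```python
import numpy as np

def V(lam, g):
    """Discrete analogue of V(g): eigenvalues at or below g form the tail, the rest cost g each."""
    lam = np.asarray(lam, dtype=float)
    return (lam[lam <= g].sum() + g * np.count_nonzero(lam > g)) / lam.size

def F(lam, g):
    """Fraction of eigenvalues at or below the level g."""
    lam = np.asarray(lam, dtype=float)
    return np.count_nonzero(lam <= g) / lam.size

lam = np.array([4.0, 2.5, 1.5, 0.6, 0.4, 0.3, 0.25, 0.2, 0.15, 0.1])  # hypothetical spectrum
h = 1e-6
for g in (0.5, 1.0, 2.0):            # levels chosen away from the eigenvalues themselves
    slope = (V(lam, g + h) - V(lam, g - h)) / (2 * h)
    print(g, round(slope, 6), 1 - F(lam, g))
```

Between consecutive eigenvalues V is linear in g, so its second derivative is zero there; this is the discrete counterpart of the DOE vanishing.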
A closer look at equities
-- There is information in equities markets related to different activities
of listed companies
-- Industry sectors
-- Market capitalization
-- Regressions on industry sector indexes often explain no more than
50% of returns
-- Since there exist at least 15 distinct sectors that we can identify in
US/G7 economies, we conclude that we probably require at least
15 factors to explain asset returns.
-- Temporal market fluctuations are important as well. In order for factor
models to be useful, they need to adapt to economic cycles.
Stocks of more than 1BB cap in January 2007
Market cap unit: USD 1M

Sector                   ETF   Num of Stocks   Average   Max       Min
Internet                 HHH   22              10,350    104,500   1,047
Real Estate              IYR   87              4,789     47,030    1,059
Transportation           IYT   46              4,575     49,910    1,089
Oil Exploration          OIH   42              7,059     71,660    1,010
Regional Banks           RKH   69              23,080    271,500   1,037
Retail                   RTH   60              13,290    198,200   1,022
Semiconductors           SMH   55              7,303     117,300   1,033
Utilities                UTH   75              7,320     41,890    1,049
Energy                   XLE   75              17,800    432,200   1,035
Financial                XLF   210             9,960     187,600   1,000
Industrial               XLI   141             10,770    391,400   1,034
Technology               XLK   158             12,750    293,500   1,008
Consumer Staples         XLP   61              17,730    204,500   1,016
Healthcare               XLV   109             14,390    192,500   1,025
Consumer Discretionary   XLY   207             8,204     104,500   1,007
Total                          1417            11,291    432,200   1,000
