3-Basic Stats

Basic Statistical Concepts
Learning Objectives:
1. Probability Density Function
2. Normal Distribution
3. Correlation and Covariance
4. Various methods of computing Volatility
5. Vasicek Model
Probability Density Function - Basic Concepts
❑ Let X be a continuous random variable. A probability density function (pdf) of X is a function
f(x) such that, for any two numbers a and b with a <= b,
P(a <= X <= b) = $\int_a^b f(x)\,dx$
❑ That is, the probability that X takes on a value in the interval [a, b] is the area above this interval
and under the graph of the density function. The graph of f(x) is often referred to as the density
curve
❑ For f(x) to be a legitimate pdf, it must satisfy the following two conditions:
✓ f(x) >= 0 for all x
✓ $\int_{-\infty}^{\infty} f(x)\,dx = 1$, i.e. the total area under the curve is 1
[Figure: density curve f(x) plotted against x, with the interval from a to b marked]
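The two conditions and the area interpretation can be checked numerically. The sketch below is only an illustration: it uses the standard normal density as a sample pdf and a simple trapezoid rule, and the helper names `normal_pdf` and `area_under` are assumptions made for this example.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution, used here only as a sample pdf."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under(pdf, a, b, steps=10_000):
    """Approximate the integral of pdf over [a, b] with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for k in range(1, steps):
        total += pdf(a + k * h)
    return total * h

# [-10, 10] stands in for the whole real line; the tails beyond it are negligible.
total_area = area_under(normal_pdf, -10.0, 10.0)
# P(-1 <= X <= 1) as the area above [-1, 1] and under the curve.
p_interval = area_under(normal_pdf, -1.0, 1.0)
print(round(total_area, 4), round(p_interval, 4))
```

Any other non-negative function that integrates to 1 could be passed in place of `normal_pdf`.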
Cumulative Distribution Function - Basic Concepts
❑ The cumulative distribution function (CDF) F(x) of a discrete random variable X gives, for every
number x, the probability P(X <= x)
❑ It is obtained by summing the probability mass function over all possible values y with y <= x
❑ The cumulative distribution function F(x) of a continuous rv X is defined for every number x by
F(x) = P(X <= x) = $\int_{-\infty}^{x} f(y)\,dy$
[Figure: plot of the CDF]
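For the discrete case, the CDF is just a running sum of the probabilities. The following sketch assumes an illustrative five-point distribution; the lists `values` and `pmf` and the helper `F` are names chosen for this example.

```python
from itertools import accumulate

# Illustrative discrete distribution (an assumption for this example).
values = [1, 2, 3, 4, 5]
pmf = [0.10, 0.20, 0.35, 0.20, 0.15]

# Running sum of the probabilities gives F at each support point.
cdf = list(accumulate(pmf))

def F(x):
    """P(X <= x): sum the probabilities of all values y with y <= x."""
    return sum(p for v, p in zip(values, pmf) if v <= x)

for v, c in zip(values, cdf):
    print(v, round(c, 2))
```

`F` is a step function: it is 0 below the smallest value, jumps at each support point, and reaches 1 at the largest value.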
Normal Distribution
❑ Of all the distribution functions, the Normal Distribution is amongst the most important
❑ Numerous economic and physical measures and indicators are approximately normally distributed
❑ A continuous rv X is said to have a normal distribution with parameters μ and σ if the pdf of X is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
❑ Each density curve is symmetric about μ and bell-shaped, so the center of the bell (point of
symmetry) is both the mean of the distribution and the median.
❑ For μ = 0 and σ = 1, the distribution is called the Standard Normal Distribution
❑ The Standard Normal Distribution is a reference distribution from which information on other
normal distributions can be obtained using z = (x - μ)/σ. z is negative for values to the left of the mean
❑ For any z value, the area under the curve to the left of z can be found from standard normal tables
Important Z Values and Percentiles
Critical Z Values and Percentiles
Percentile   95      99      99.5    99.9
Z Value      1.645   2.33    2.58    3.09
✓ About 68% of the population lies within 1 SD of the mean
✓ About 95% of the population lies within 2 SD of the mean
✓ About 99.7% of the population lies within 3 SD of the mean
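The 68 / 95 / 99.7 figures can be reproduced from the standard normal CDF. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `coverage` is an assumption for this example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k):
    """Probability that a normal variable lies within k SD of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {coverage(k):.4f}")
```

The exact values are slightly different from the rounded rule of thumb, for example about 95.4% rather than 95% for 2 SD.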
Normal Probability Distributions
Let p = 0.99, Mean µ = 0, Std Dev σ = 1
✓ z value for a given p: z = F^-1(p) = 2.33, from =NORM.INV(p, mean, SD)
✓ p value for a given z: P(z) = 0.99, from =NORM.DIST(z, mean, SD, TRUE)
X        pdf     CDF
-4.00    0.00    0.00
-3.50    0.00    0.00
-3.00    0.00    0.00
-2.50    0.02    0.01
-2.00    0.05    0.02
-1.50    0.13    0.07
-1.00    0.24    0.16
-0.50    0.35    0.31
 0.00    0.40    0.50
 0.50    0.35    0.69
 1.00    0.24    0.84
 1.50    0.13    0.93
 2.00    0.05    0.98
 2.50    0.02    0.99
 3.00    0.00    1.00
 3.50    0.00    1.00
 4.00    0.00    1.00
[Figure: PDF and CDF of the standard normal distribution plotted against X]
Inverse Normal Function
✓ Enter a p value (say 0.8)
✓ Use the NORM.INV function to compute the z value
✓ This gives the z value corresponding to an area of 80% to its left
✓ In other words, we are finding z using the inverse function F^-1(p)
✓ We are thus inverting the CDF
[Figure: CDF (top panel) and pdf (bottom panel) of the standard normal distribution]
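The inversion described above can be sketched in Python with `NormalDist.inv_cdf`, which plays the role of NORM.INV for the standard normal. The variable names are assumptions for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Invert the CDF: find z such that the area to the left of z is p.
p = 0.80
z = std_normal.inv_cdf(p)

# Applying the CDF to z recovers p (round trip).
p_back = std_normal.cdf(z)

# The same call gives the usual critical values.
critical = {q: std_normal.inv_cdf(q) for q in (0.95, 0.99, 0.995, 0.999)}

print(round(z, 4), round(p_back, 4))
for q, value in critical.items():
    print(q, round(value, 3))
```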
Typical Problem
❑ Consider a portfolio with a mean return of 15% and an SD of 25%.
✓ What is the probability that the portfolio return will be between 10% and 20%, assuming a normal distribution?
✓ What is the probability that the return will be negative?
                    Between 10% and 20%    Negative return
Mean                15.00%                 15.00%
SD                  25.00%                 25.00%
Lower Bound         10%                    -Infinity
Upper Bound         20%                    0%
Z Value-LB          -0.20                  NA
Z Value-UB          0.20                   -0.60
CDF Lower Bound     42%                    0%
CDF Upper Bound     58%                    27%
PROBABILITY         15.9%                  27.4%
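The two probabilities can be reproduced directly from the normal CDF. A minimal sketch, assuming returns are expressed as decimals; the variable names are chosen for this example.

```python
from statistics import NormalDist

mean, sd = 0.15, 0.25
returns = NormalDist(mu=mean, sigma=sd)

# z values for the bounds, as in the z = (x - mean) / sd step.
z_lower = (0.10 - mean) / sd
z_upper = (0.20 - mean) / sd

# P(10% <= R <= 20%) is the difference of the two CDF values.
p_between = returns.cdf(0.20) - returns.cdf(0.10)

# P(R < 0): the lower bound is minus infinity, so only the upper CDF is needed.
p_negative = returns.cdf(0.0)

print(round(z_lower, 2), round(z_upper, 2))
print(f"{p_between:.1%}", f"{p_negative:.1%}")
```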

Question (Hull 1.17)
✓ A bank estimates that its profit next year is normally distributed with a mean of 0.8% of assets and
a standard deviation of 2% of assets. How much equity (as a percentage of assets) does the
company need to be (a) 99% sure that it will have positive equity at the end of the year and (b)
99.9% sure that it will have positive equity at the end of the year? Ignore taxes.
                                        (a) 99%     (b) 99.9%
Mean                                    0.80%       0.80%
SD                                      2.00%       2.00%
Z value corresponding to probability    2.326       3.090
(Be careful about sign: the loss lies in the left tail, so z enters as a negative number)
Using z = (X-μ)/σ, we get X as          -3.85%      -5.38%
✓ The bank therefore needs equity of 3.85% of assets for (a) and 5.38% of assets for (b)
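The calculation can be sketched as follows: find the profit level that is exceeded with the required confidence, and hold enough equity to absorb it if it is a loss. The function name `required_equity` is an assumption for this example.

```python
from statistics import NormalDist

mean, sd = 0.008, 0.02  # profit as a fraction of assets

def required_equity(confidence):
    """Equity (fraction of assets) that covers the loss at the given confidence."""
    # Left-tail z value, e.g. about -2.326 for 99% confidence.
    z = NormalDist().inv_cdf(1.0 - confidence)
    worst_profit = mean + z * sd  # X from z = (X - mean) / sd
    return -worst_profit          # a loss of X needs equity of -X

for confidence in (0.99, 0.999):
    print(confidence, f"{required_equity(confidence):.2%}")
```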
Two other properties: Skew and Kurtosis
r-th central moment = $E[(X-\mu)^r]$
Skew = $\frac{E[(X-\mu)^3]}{\sigma^3}$
Kurtosis = $\frac{E[(X-\mu)^4]}{\sigma^4}$
Skew
✓ In case of Positive Skew, the right tail is longer: very HIGH returns are MORE likely than very LOW returns
✓ In case of Negative Skew, the left tail is longer: very LOW returns are MORE likely than very HIGH returns
Kurtosis
✓ Usually 3 is subtracted from the figure arrived at, giving the excess kurtosis
✓ 3: Normal Distribution
✓ Fat Tailed: Leptokurtosis (> 3)
✓ Light Tailed: Platykurtosis (< 3)
✓ Excess Kurtosis leads to a situation where VERY HIGH and VERY LOW returns are
MORE likely than under the NORMAL distribution
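The moment definitions translate directly into code. This sketch uses population moments (dividing by n) on two small made-up data sets; the function names and the sample data are assumptions for illustration only.

```python
def central_moment(data, r):
    """r-th central moment: average of (x - mean) ** r."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** r for x in data) / len(data)

def skew(data):
    """Third central moment divided by sigma cubed."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Fourth central moment divided by sigma to the fourth power."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

def excess_kurtosis(data):
    """Kurtosis relative to the normal distribution's value of 3."""
    return kurtosis(data) - 3.0

symmetric = [-2, -1, 0, 1, 2]       # hypothetical data with no skew
right_tailed = [-1, -1, 0, 0, 10]   # hypothetical data with one large positive value

print(skew(symmetric), kurtosis(symmetric))
print(skew(right_tailed), excess_kurtosis(right_tailed))
```

The single large positive observation in `right_tailed` produces a positive skew, matching the long right tail described above.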
Computation of Population Variance
X        f(x)     X f(x)    X-μ       (X-μ)^2    f(x) × (X-μ)^2
1        10%      0.100     -2.100    4.410      0.441
2        20%      0.400     -1.100    1.210      0.242
3        35%      1.050     -0.100    0.010      0.003
4        20%      0.800     0.900     0.810      0.162
5        15%      0.750     1.900     3.610      0.542
Total    100%     3.100                          1.39
Mean (μ)                  3.1
Variance (σ^2)            1.39
Standard Deviation (σ)    1.18
✓ Variance is a measure of dispersion
✓ It can also be calculated as E(X^2) - [E(X)]^2
✓ Note that Population Variance and Sample Variance differ
✓ In the case of Sample Variance we divide by (N-1)
✓ This adjusts for the degrees of freedom and is a bit more conservative
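The table can be reproduced in a few lines, including the shortcut E(X^2) - [E(X)]^2. A minimal sketch; the variable names are chosen for this example.

```python
values = [1, 2, 3, 4, 5]
probs = [0.10, 0.20, 0.35, 0.20, 0.15]

# Mean: probability-weighted sum of the values.
mean = sum(x * p for x, p in zip(values, probs))

# Population variance: probability-weighted squared deviations from the mean.
variance = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))

# Shortcut form: E(X^2) - [E(X)]^2 gives the same number.
shortcut = sum(p * x * x for x, p in zip(values, probs)) - mean ** 2

std_dev = variance ** 0.5
print(round(mean, 2), round(variance, 2), round(shortcut, 2), round(std_dev, 2))
```

For a sample of N raw observations rather than a full distribution, `statistics.variance` divides by N-1 and `statistics.pvariance` divides by N.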
Covariance and Correlation
❑ When two random variables X and Y are not independent, it is frequently of interest to assess how
strongly they are related to one another.
❑ The covariance is given by $\mathrm{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]$; for a set of N equally likely paired
observations this is the average of the products (X-μx)(Y-μy)
❑ For a strong positive relationship the products will mostly be positive; for a strong negative relationship
the products will mostly be negative
❑ If the variables are not correlated, the negatives and positives will mostly cancel out, giving an average
close to zero
❑ The correlation coefficient of X and Y, denoted by Corr(X, Y) or just ρ, is defined by
$\rho = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$
Illustrating COVAR and CORREL
A      B      A-μA     B-μB     Product
4      1      -4.5     -3.8     17.1
5      4      -3.5     -0.8     2.8
6      4      -2.5     -0.8     2.0
7      3      -1.5     -1.8     2.7
8      2      -0.5     -2.8     1.4
9      5      0.5      0.2      0.1
10     8      1.5      3.2      4.8
11     7      2.5      2.2      5.5
12     6      3.5      1.2      4.2
13     8      4.5      3.2      14.4
Sum of products                 55
μA         8.5
μB         4.8
COVAR      5.50
Sigma a    2.87
Sigma b    2.32
CORREL     0.83
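The figures in this illustration follow from the population formulas (dividing by N = 10). A minimal sketch reproducing them; the variable names are chosen for this example.

```python
import math

a = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
b = [1, 4, 4, 3, 2, 5, 8, 7, 6, 8]
n = len(a)

mu_a = sum(a) / n
mu_b = sum(b) / n

# Population covariance: average product of deviations from the means.
cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n

# Population standard deviations.
sigma_a = math.sqrt(sum((x - mu_a) ** 2 for x in a) / n)
sigma_b = math.sqrt(sum((y - mu_b) ** 2 for y in b) / n)

# Correlation: covariance scaled by both standard deviations.
corr = cov / (sigma_a * sigma_b)

print(mu_a, mu_b, round(cov, 2), round(sigma_a, 2), round(sigma_b, 2), round(corr, 2))
```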
Volatility
❑ Volatility is defined as the standard deviation of the return provided per unit of time
✓ Suppose that S_i is the value of a variable on day i. The volatility per day is the standard
deviation of the returns ln(S_i / S_{i-1})
✓ Continuously compounded returns are used for volatility computation
❑ Days when markets are closed are normally ignored in volatility calculations
❑ The volatility per year is √252 times the daily volatility, assuming 252 trading days in a year
❑ Variance rate is the square of volatility
❑ Of the variables needed to price an option, the one that cannot be observed directly is volatility
❑ We can therefore imply volatilities from market prices and vice versa
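Because it is the variance rate, not the volatility, that grows in proportion to time, daily volatility is scaled to an annual figure with the square root of the number of trading days. A small sketch, assuming 252 trading days and a hypothetical daily volatility of 1%:

```python
import math

TRADING_DAYS = 252  # assumed number of trading days per year

def annualize(daily_vol, days=TRADING_DAYS):
    """Scale a daily volatility to an annual one with the square-root-of-time rule."""
    return daily_vol * math.sqrt(days)

daily_vol = 0.01                    # hypothetical: 1% per day
annual_vol = annualize(daily_vol)   # roughly 15.9% per year
annual_variance = annual_vol ** 2   # variance rate scales linearly with time

print(f"{annual_vol:.2%}")
```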
Computing Volatility - Daily Standard Deviation
Price relative: $S_i / S_{i-1}$
Return: $u_i = \ln\left(\frac{S_i}{S_{i-1}}\right) \approx \frac{S_i - S_{i-1}}{S_{i-1}}$
Daily standard deviation: $\sigma_n^2 = \frac{1}{m-1}\sum_{i=1}^{m}\left(u_{n-i} - \bar{u}\right)^2$
Simplified form: $\sigma_n^2 = \frac{1}{m}\sum_{i=1}^{m} u_{n-i}^2$
The above simplification is usually made in Risk Management
Closing Price    Price Relative    ln Return    Squared Return
20.00
20.10            1.00500           0.00499      0.00002
19.90            0.99005           -0.01000     0.00010
20.00            1.00503           0.00501      0.00003
20.90            1.00966           0.00962      0.00009
20.40            0.97608           -0.02421     0.00059
20.50            1.00490           0.00489      0.00002
20.60            1.00488           0.00487      0.00002
20.30            0.98544           -0.01467     0.00022
SUM                                0.01489      0.00424
Average                            0.00074      0.00021
Daily Volatility 1.456%
(The SUM and Average rows do not match the rows printed here, so they appear to cover a longer series of which only some rows are shown.)
Problem: Hull 10.18
Suppose that observations on a stock price (in dollars) at the end of each of 15 consecutive days are
as follows: 30.2, 32.0, 31.1, 30.1, 30.2, 30.3, 30.6, 30.9, 30.5, 31.1, 31.3, 30.8, 30.3, 29.9, 29.8
Estimate the daily volatility using both approaches

1st Method: columns (a) to (c). 2nd Method: columns (d) and (e).
(a) ln(Si/Si-1)   (b) ln(Si/Si-1) - ave   (c) [ln(Si/Si-1) - ave]^2   (d) (Si - Si-1)/Si-1   (e) [(Si - Si-1)/Si-1]^2
i      Si       (a)        (b)        (c)       (d)        (e)
0      30.20
1      32.00    0.0579     0.0588     0.0035    0.0596     0.0036
2      31.10    -0.0285    -0.0276    0.0008    -0.0281    0.0008
3      30.10    -0.0327    -0.0317    0.0010    -0.0322    0.0010
4      30.20    0.0033     0.0043     0.0000    0.0033     0.0000
5      30.30    0.0033     0.0043     0.0000    0.0033     0.0000
6      30.60    0.0099     0.0108     0.0001    0.0099     0.0001
7      30.90    0.0098     0.0107     0.0001    0.0098     0.0001
8      30.50    -0.0130    -0.0121    0.0001    -0.0129    0.0002
9      31.10    0.0195     0.0204     0.0004    0.0197     0.0004
10     31.30    0.0064     0.0074     0.0001    0.0064     0.0000
11     30.80    -0.0161    -0.0152    0.0002    -0.0160    0.0003
12     30.30    -0.0164    -0.0154    0.0002    -0.0162    0.0003
13     29.90    -0.0133    -0.0123    0.0002    -0.0132    0.0002
14     29.80    -0.0034    -0.0024    0.0000    -0.0033    0.0000
SUM                                   0.0067               0.0069
SD                                    0.0228               0.0222
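Both approaches can be reproduced from the 15 prices. In this sketch the first method uses log returns, subtracts their mean and divides by m-1; the second uses simple percentage returns, assumes a zero mean and divides by m. The variable names are chosen for this example.

```python
import math

prices = [30.2, 32.0, 31.1, 30.1, 30.2, 30.3, 30.6, 30.9,
          30.5, 31.1, 31.3, 30.8, 30.3, 29.9, 29.8]

# Method 1: log returns, mean removed, divisor m - 1.
log_returns = [math.log(cur / prev) for prev, cur in zip(prices, prices[1:])]
m = len(log_returns)  # 14 returns from 15 prices
mean_u = sum(log_returns) / m
sd_method1 = math.sqrt(sum((u - mean_u) ** 2 for u in log_returns) / (m - 1))

# Method 2: simple returns, mean assumed zero, divisor m.
simple_returns = [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]
sd_method2 = math.sqrt(sum(r * r for r in simple_returns) / m)

print(f"{sd_method1:.4f}", f"{sd_method2:.4f}")
```

The two estimates are close because daily returns are small, so the log return and the percentage return nearly coincide and the mean return is close to zero.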
Computing Volatility - EWMA
✓ Historical SD assigns the same weight to all observations
✓ It is thus an equally weighted approach
✓ Recent spikes are diluted and may not be captured
✓ Exponentially Weighted Moving Average (EWMA) assigns greater weight to recent returns
✓ It does this with a single parameter λ (80% in this example)
✓ Each successive weight is the previous weight multiplied by λ
Simplified formula: $\sigma_n^2 = \lambda\sigma_{n-1}^2 + (1-\lambda)u_{n-1}^2$
Day    Closing Price    Price Ratio    Daily Return    Squared Return    Weight    Weighted Squared Return
-      20.00
1      19.80            1.010101       1.01%           0.0001010         20%       0.0000202
2      20.13            0.9838055      -1.63%          0.0002666         16%       0.0000427
3      20.15            0.9988103      -0.12%          0.0000014         13%       0.0000002
4      20.18            0.9985084      -0.15%          0.0000022         10%       0.0000002
5      20.17            1.000386       0.04%           0.0000001         8%        0.0000000
...
55     20.00            1.0008524      0.09%           0.0000007         0%        0.0000000
56     20.00            1.0001241      0.01%           0.0000000         0%        0.0000000
57     20.04            0.9981933      -0.18%          0.0000033         0%        0.0000000
58     20.04            0.9999751      0.00%           0.0000000         0%        0.0000000
59     20.04            1.000166       0.02%           0.0000000         0%        0.0000000
60     19.98            1.0025724      0.26%           0.0000066         0%        0.0000000
Sum: 0.000495 (squared returns), 100.0% (weights)
Avg: 0.0000135, 0.00083%, 0.00642%
Daily volatility: 0.28730% (equally weighted), 0.80127% (EWMA)
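The weights 20%, 16%, 13%, ... in the table correspond to (1-λ)λ^(i-1) with λ = 0.8, and the recursive formula gives the same result as the explicit weighted sum. The sketch below checks this on the first five daily returns from the table; truncating to five observations is purely for illustration (the weights then do not sum to 1), and the function names are assumptions.

```python
def ewma_update(prev_variance, last_return, lam):
    """One step of the EWMA recursion for the variance rate."""
    return lam * prev_variance + (1.0 - lam) * last_return ** 2

def ewma_weights(lam, n):
    """Weights (1 - lam) * lam**i for the i-th most recent observation, i = 0..n-1."""
    return [(1.0 - lam) * lam ** i for i in range(n)]

lam = 0.80
# First five daily returns from the table, most recent first.
returns = [0.0101, -0.0163, -0.0012, -0.0015, 0.0004]

weights = ewma_weights(lam, len(returns))
variance_weighted = sum(w * u * u for w, u in zip(weights, returns))

# Same number via the recursion, fed oldest observation first, starting from zero.
variance_recursive = 0.0
for u in reversed(returns):
    variance_recursive = ewma_update(variance_recursive, u, lam)

print([round(w, 4) for w in weights])
print(variance_weighted, variance_recursive)
```

The recursion is what makes EWMA cheap to maintain: only the previous variance estimate and the latest return are needed at each step.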
Advantages of EWMA
❑ Relatively little data needs to be stored
❑ We need only remember the current estimate of the variance rate and the most recent observation
on the market variable
❑ Tracks volatility changes
❑ λ = 0.94 has been found to be a good choice across a wide range of market variables
✓ RiskMetrics, a database created by J.P. Morgan, uses λ = 0.94 for updating volatility estimates across a
range of market variables
Estimating Volatilities - GARCH
❑ In GARCH(1,1) we also assign some weight to the long-run average variance rate V_L
$\sigma_n^2 = \gamma V_L + \alpha u_{n-1}^2 + \beta\sigma_{n-1}^2$
❑ The largest weight, typically 80% or more, is assigned to β
❑ The balance is distributed between α and γ
❑ The weights must sum to 1:
$\gamma + \alpha + \beta = 1$
Computing Volatility - GARCH
Simplified formula: $\sigma_n^2 = \omega + \alpha u_{n-1}^2 + \beta\sigma_{n-1}^2$
where $\omega = \gamma V_L$, V_L is the long-run average variance rate, and $\alpha + \beta + \gamma = 1$
Key difference: we are assuming a long-run volatility / SD and assigning it a certain weight
Assumed Parameters
α          10%
β          80%
γ          10%
σ2(LR)     0.00010
ω          0.000010
Day    Price    Price Relative    Daily Return u_i    Squared Return u_i^2    Weight αβ^(i-1)    Weighted Squared Return
0      20.00
1      19.80    1.01010           1.01%               0.00010                 10%                0.000010
2      20.13    1.01646           1.63%               0.00027                 8.00%              0.000021
3      20.15    1.00119           0.12%               0.00000                 6.40%              0.000000
4      20.18    1.00149           0.15%               0.00000                 5.12%              0.000000
5      20.17    0.99961           -0.04%              0.00000                 4.10%              0.000000
...
53     20.00    0.99732           -0.27%              0.00001                 0.00%              0.000000
54     20.02    1.00090           0.09%               0.00000                 0.00%              0.000000
55     20.00    0.99915           -0.09%              0.00000                 0.00%              0.000000
56     20.00    0.99988           -0.01%              0.00000                 0.00%              0.000000
57     20.04    1.00181           0.18%               0.00000                 0.00%              0.000000
58     20.04    1.00002           0.00%               0.00000                 0.00%              0.000000
59     20.04    0.99983           -0.02%              0.00000                 0.00%              0.000000
60     19.98    0.99743           -0.26%              0.00001                 0.00%              0.000000
Sum of weighted squared returns: 0.000032
σ2(MA)    0.000825%        σ2n    0.008210%
σ(MA)     0.2873%          σn     0.9061%
(The σ2n figure is consistent with ω/(1-β) plus the sum of weighted squared returns: 0.00005 + 0.000032 ≈ 0.000082.)
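The weight column follows from substituting the GARCH(1,1) formula into itself repeatedly: the i-th most recent squared return receives weight αβ^(i-1). The sketch below uses the assumed parameters from this slide (α = 10%, β = 80%, γ = 10%, long-run variance 0.0001); the function name `garch_update` is an assumption for this example.

```python
import math

alpha, beta, gamma = 0.10, 0.80, 0.10
long_run_variance = 0.0001            # V_L, the long-run variance from the slide
omega = gamma * long_run_variance     # 0.00001

def garch_update(prev_variance, last_return):
    """One step of GARCH(1,1): omega + alpha * u^2 + beta * sigma^2."""
    return omega + alpha * last_return ** 2 + beta * prev_variance

# Weights on past squared returns after unrolling the recursion.
weights = [alpha * beta ** i for i in range(5)]   # 10%, 8%, 6.4%, 5.12%, 4.096%

# Constant term after unrolling: omega * (1 + beta + beta^2 + ...) = omega / (1 - beta).
unrolled_constant = omega / (1.0 - beta)

# If both the squared return and the variance sit at V_L, the update stays at V_L.
steady = garch_update(long_run_variance, math.sqrt(long_run_variance))

print([round(w, 5) for w in weights], unrolled_constant, steady)
```

The last check illustrates why V_L acts as the level the variance is pulled towards.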
Suppose that the price of an asset at close of trading yesterday was $300 and its volatility was
estimated as 1.3% per day. The price at the close of trading today is $298. Update the volatility
estimate using (a) the EWMA model with λ = 0.94 and (b) the GARCH(1,1) model with ω =
0.000002, α = 0.04, and β = 0.94.
a) Use $\sigma_n^2 = \lambda\sigma_{n-1}^2 + (1-\lambda)u_{n-1}^2$
where $u_{n-1} = -2/300 = -0.00667$ and $\sigma_{n-1} = 0.013$
b) Use $\sigma_n^2 = \omega + \alpha u_{n-1}^2 + \beta\sigma_{n-1}^2$ with the same $u_{n-1}$ and $\sigma_{n-1}$
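Plugging the numbers into the two formulas gives the updated estimates. A minimal sketch; the variable names are chosen for this example.

```python
import math

prev_price, price = 300.0, 298.0
prev_vol = 0.013                               # 1.3% per day
u = (price - prev_price) / prev_price          # latest return, about -0.00667

# (a) EWMA with lambda = 0.94
lam = 0.94
ewma_variance = lam * prev_vol ** 2 + (1.0 - lam) * u ** 2
ewma_vol = math.sqrt(ewma_variance)

# (b) GARCH(1,1) with omega = 0.000002, alpha = 0.04, beta = 0.94
omega, alpha, beta = 0.000002, 0.04, 0.94
garch_variance = omega + alpha * u ** 2 + beta * prev_vol ** 2
garch_vol = math.sqrt(garch_variance)

print(f"EWMA: {ewma_vol:.3%}", f"GARCH(1,1): {garch_vol:.3%}")
```

Both updated volatilities come out a little below 1.3% (about 1.27% per day), because the latest return of 0.67% is smaller than the previous volatility estimate.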
Comparing the three approaches
Method: Volatility / Standard Deviation (Conventional Way)
  Formula: $\sigma_n^2 = \frac{1}{m}\sum_{i=1}^{m} u_{n-i}^2$
  Assigns equal weight to each day
Method: EWMA
  Formula: $\sigma_n^2 = \lambda\sigma_{n-1}^2 + (1-\lambda)u_{n-1}^2$
  Assigns higher weight to the latest days. λ, typically 90% and above, decides the weight
Method: GARCH
  Formula: $\sigma_n^2 = \omega + \alpha u_{n-1}^2 + \beta\sigma_{n-1}^2$
  Weight is given to the long-run average variance V_L (through ω = γV_L), towards which the variance is
  ultimately pulled. Can be used to FORECAST
Forecasting Volatility using GARCH
Key formulae we have learnt:
$\sigma_n^2 = \omega + \alpha u_{n-1}^2 + \beta\sigma_{n-1}^2$
$\sigma_n^2 = \gamma V_L + \alpha u_{n-1}^2 + \beta\sigma_{n-1}^2$
From the above it can be derived that
$E[\sigma_{n+t}^2] = V_L + (\alpha+\beta)^t\left(\sigma_n^2 - V_L\right)$
❑ This equation forecasts the variance rate t days forward using the information available
at the end of day n-1; the volatility forecast is its square root
❑ The variance rate exhibits mean reversion with a reversion level of V_L and a reversion
rate of 1-α-β
❑ γ = 1-α-β is effectively the rate at which the variance rate mean reverts
Suppose that the parameters in a GARCH(1,1) model are α = 0.03, β = 0.95 and ω = 0.000002.
(a) What is the long-run average volatility?
(b) If the current volatility is 1.5% per day, what is your estimate of the volatility in 20, 40, and 60 days?
(c) Suppose that there is an event that increases the volatility from 1.5% per day to 2% per day. Estimate the effect on
the volatility in 20, 40, and 60 days.
• γV_L = ω, therefore V_L = 0.000002 / (1-0.03-0.95) = 0.0001, so the long-run average volatility is SQRT(0.0001) = 1% per day
$E[\sigma_{n+t}^2] = V_L + (\alpha+\beta)^t\left(\sigma_n^2 - V_L\right)$
Therefore, expected variance rate in 20 days = 0.0001 + (0.98)^20 * (1.5%^2 - 0.0001) = 0.000183
Volatility = SQRT(0.000183) = 1.35%
With 2%, the volatility in 20 days will be 1.73%
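The same formula gives the 40-day and 60-day forecasts. A minimal sketch of the calculation; the function name `forecast_vol` is an assumption for this example.

```python
import math

alpha, beta, omega = 0.03, 0.95, 0.000002
long_run_variance = omega / (1.0 - alpha - beta)   # V_L = 0.0001
long_run_vol = math.sqrt(long_run_variance)        # 1% per day

def forecast_vol(current_vol, t):
    """Expected volatility t days ahead: sqrt(V_L + (alpha + beta)^t * (sigma^2 - V_L))."""
    variance = long_run_variance + (alpha + beta) ** t * (current_vol ** 2 - long_run_variance)
    return math.sqrt(variance)

for current in (0.015, 0.02):
    forecasts = [forecast_vol(current, t) for t in (20, 40, 60)]
    print(f"{current:.1%}:", ", ".join(f"{v:.2%}" for v in forecasts))
```

In both cases the forecasts decline towards the long-run level of 1% per day as the horizon lengthens, which is the mean reversion described above.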
Vasicek Model
❑ Very important for determining the credit risk capital for a portfolio of loans
❑ For a large portfolio of loans, each of which has a probability PD of defaulting by time T, the
Worst Case Default Rate (WCDR) that will not be exceeded at the X% confidence level is
$\mathrm{WCDR} = N\left(\frac{N^{-1}(PD) + \sqrt{\rho}\,N^{-1}(X)}{\sqrt{1-\rho}}\right)$
where ρ is the Gaussian copula correlation and N is the standard normal CDF
❑ Assumes that the default probability is the same for every loan, and so is the default correlation between loans
❑ Credit VaR at the X% confidence level is WCDR × EAD × LGD less Expected Loss (PD × EAD × LGD)
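The WCDR formula and the Credit VaR definition can be sketched as below. The inputs (PD = 1%, ρ = 0.5, X = 99.9%, EAD = 100, LGD = 60%) are hypothetical values chosen for illustration, and the function names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal: N.cdf is N(.), N.inv_cdf is N^-1(.)

def wcdr(prob_default, rho, confidence):
    """Worst case default rate under the one-factor Gaussian copula (Vasicek) model."""
    numerator = N.inv_cdf(prob_default) + sqrt(rho) * N.inv_cdf(confidence)
    return N.cdf(numerator / sqrt(1.0 - rho))

def credit_var(prob_default, rho, confidence, ead, lgd):
    """Worst-case loss minus expected loss: (WCDR - PD) * EAD * LGD."""
    worst_case_loss = wcdr(prob_default, rho, confidence) * ead * lgd
    expected_loss = prob_default * ead * lgd
    return worst_case_loss - expected_loss

# Hypothetical inputs for illustration.
rate = wcdr(0.01, 0.5, 0.999)
capital = credit_var(0.01, 0.5, 0.999, ead=100.0, lgd=0.6)
print(f"{rate:.1%}", round(capital, 2))
```

With zero correlation the worst-case default rate collapses to PD itself, so Credit VaR is zero in that limit.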
Vasicek Model
❑ The result from the Vasicek model is used to determine the Credit VaR
[Figure: probability distribution of loss over 1 year, marking the Expected Loss and the loss corresponding to the 99.9% WCDR; the distance between them is Credit VaR = Capital Requirement for Unexpected Loss]
❑ Incidentally, for PD = 1% and ρ = 0.5, WCDR at the 99.9% confidence level works out to 42%. If correlation = 0, WCDR = PD
$\mathrm{WCDR} = N\left(\frac{N^{-1}(PD) + \sqrt{\rho}\,N^{-1}(X)}{\sqrt{1-\rho}}\right)$
Suppose that a bank has made a large number of loans of a certain type. The one-year
probability of default on each loan is 1.2%. The bank uses a Gaussian copula for time to
default. It is interested in estimating a "99.97% worst case" for the percentage of loans that
default on the portfolio. Show how this varies with the copula correlation.
PD                  1.2%
Confidence Level    99.97%
Correlation    N^-1(PD)    N^-1(X)    Value given by formula    N(Value)
0              -2.257      3.432      -2.257                    1.2%
0.2            -2.257      3.432      -0.808                    21.0%
0.4            -2.257      3.432      -0.112                    45.5%
0.6            -2.257      3.432      0.634                     73.7%
0.9            -2.257      3.432      3.157                     99.9%
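The table can be reproduced by evaluating the WCDR formula for each correlation. A minimal sketch, with helper names chosen for this example:

```python
from math import sqrt
from statistics import NormalDist

N = NormalDist()
prob_default = 0.012
confidence = 0.9997

def formula_value(rho):
    """Argument of N(.) in the WCDR formula for a given copula correlation."""
    return (N.inv_cdf(prob_default) + sqrt(rho) * N.inv_cdf(confidence)) / sqrt(1.0 - rho)

def wcdr(rho):
    """Worst case default rate at the chosen confidence level."""
    return N.cdf(formula_value(rho))

for rho in (0.0, 0.2, 0.4, 0.6, 0.9):
    print(rho, round(formula_value(rho), 3), f"{wcdr(rho):.1%}")
```

The worst-case default rate rises steeply with correlation: with independent loans it equals PD, while at high correlation nearly the whole portfolio defaults in the worst case.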
