1B40 Practical Skills: Properties of The Gaussian Distribution



Properties of the Gaussian distribution


The (Gaussian) normal distribution has the functional form
$$
p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(x-\mu)^2}{2\sigma^2} \right].
$$

For simplicity we will set µ = 0.
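As a quick numerical check of the functional form, the sketch below evaluates the density and confirms that the area under it is 1. The function names `gaussian_pdf` and `integrate`, the trapezium-rule step count and the value σ = 5 are illustrative assumptions, not part of the original notes.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density p(x) with mean mu and standard deviation sigma."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def integrate(f, a, b, n=10_000):
    """Trapezium-rule estimate of the integral of f from a to b."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

sigma = 5.0  # illustrative value, an assumption for this sketch
area = integrate(lambda x: gaussian_pdf(x, 0.0, sigma), -10 * sigma, 10 * sigma)
print(f"peak height p(0) = {gaussian_pdf(0.0, 0.0, sigma):.4f}")
print(f"area under curve = {area:.6f}")
```

The integration range of ±10σ is wide enough that the neglected tails are far smaller than the printed precision.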

[Figure: f(x) plotted against x for X = 0, σ = 5.]

What fraction of the results lie between −x and +x?


The probability density function p(x) is defined so that p(x) dx is the fraction of results that lie between x and x + dx.
Define φ(x) such that:
$$
\phi(x) = \int_{-x}^{+x} p(x')\, dx',
$$

$$
\phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-x}^{+x} \exp\left[ -\frac{1}{2}\left(\frac{x'}{\sigma}\right)^2 \right] dx'.
$$
Let t = x'/σ and z = x/σ then φ(x) becomes
$$
\phi(z) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-z}^{+z} \exp\left[ -\frac{t^2}{2} \right] \sigma\, dt.
$$
This is symmetric about t = 0 and can be written as
$$
\phi(z) = \frac{2}{\sqrt{2\pi}} \int_{0}^{+z} \exp\left[ -\frac{t^2}{2} \right] dt.
$$
The function erf(z) is referred to as the Error Function and is defined by
$$
\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_{0}^{z} \exp\left[ -t^2 \right] dt
$$

and is tabulated in mathematical tables. Thus we have

$$
\phi(z) = \operatorname{erf}\left( \frac{z}{\sqrt{2}} \right).
$$
It represents the fraction of the area under p(x), and hence the fraction of results, that lies within ±z standard deviations of the mean of the distribution.
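The step from the integral over t to the error function is a change of variable. As a short worked check (the symbol u is introduced here only for the substitution), put u = t/√2, so that dt = √2 du and the upper limit z becomes z/√2:

$$
\phi(z) = \frac{2}{\sqrt{2\pi}} \int_{0}^{z} \exp\left[ -\frac{t^2}{2} \right] dt
        = \frac{2\sqrt{2}}{\sqrt{2\pi}} \int_{0}^{z/\sqrt{2}} \exp\left[ -u^2 \right] du
        = \frac{2}{\sqrt{\pi}} \int_{0}^{z/\sqrt{2}} \exp\left[ -u^2 \right] du
        = \operatorname{erf}\left( \frac{z}{\sqrt{2}} \right).
$$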

z = x/σ    φ(z)       Approximate fraction of readings outside ±z
0          0          1
1          0.683      1/3
2          0.9545     1/20
3          0.9973     1/400
4          0.99994    1/16000
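The tabulated values need not be looked up: Python's standard library provides `math.erf`. The sketch below evaluates φ(z) = erf(z/√2) for the same values of z. The helper name `fraction_within` is an assumption made for this example.

```python
import math

def fraction_within(z):
    """phi(z): fraction of a normal distribution within +/- z standard deviations."""
    return math.erf(z / math.sqrt(2.0))

print(" z    phi(z)     fraction outside")
for z in range(5):
    inside = fraction_within(z)
    outside = 1.0 - inside
    print(f"{z:2d}   {inside:.5f}    {outside:.2e}")
```

The fractions outside ±z come out close to, but not exactly equal to, the round numbers quoted as approximations.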

The erf function is plotted below.


[Figure: erf(z) plotted against z = x/σ for z from 0 to 2.]

Important points
• About two out of three observations lie within ±σ (one in three outside).
• About one in twenty observations lies outside ±2σ.
• About one in 400 observations lies outside ±3σ.
• About one in 16,000 observations lies outside ±4σ, i.e. there is one chance in 16,000 that
the true value lies outside this range.

You can use these results to check that σ has been estimated correctly. Roughly two thirds of
the readings should lie within x̄ ± σ. When you quote a result as x̄ ± σm you imply that the
probability that the true value lies in the quoted range is roughly two thirds.
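The two-thirds check can be automated. The following is a minimal sketch that uses simulated readings; the mean of 10.0, the spread of 0.5, the seed and the function name are all made up for illustration. With real data you would pass in your own list of readings.

```python
import random
import statistics

def fraction_within_one_sigma(readings):
    """Fraction of readings lying within one standard deviation of their mean."""
    mean = statistics.fmean(readings)
    # pstdev divides by N; this is an assumption, statistics.stdev divides by N - 1
    sigma = statistics.pstdev(readings)
    inside = sum(1 for r in readings if abs(r - mean) <= sigma)
    return inside / len(readings)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
simulated = [rng.gauss(10.0, 0.5) for _ in range(10_000)]  # made-up mean and sigma
frac = fraction_within_one_sigma(simulated)
print(f"fraction within one sigma: {frac:.3f}")
```

A fraction far from two thirds suggests either that σ has been misestimated or that the readings are not normally distributed. With only a handful of readings the fraction fluctuates a good deal, so treat it as a rough guide.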

Treatment of suspect results
The discussion above on the properties of the normal distribution can help guide us when we
have “suspect” results. Consider the following set of experimental data:

Time t (s)    Squared residual (t_i − t̄)² (10⁻⁴ s²)
5.38          25
5.42          1
5.48          25
5.30          169
5.34          81
5.29          196
5.97          2916
5.32          121
5.40          9
mean = 5.43 s    sum = 3543

σ = (3543 × 10⁻⁴ s² / 9)^(1/2) ≈ 2 × 10⁻¹ s
σm = σ/√9 = σ/3 ≈ 7 × 10⁻² s

The result is T = (5.43 ± 0.07) s. The results are well clustered about the mean value except for
the 5.97 value. If you recalculate the mean omitting this result you get T = (5.37 ± 0.02) s.
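The arithmetic for this data set can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the helper name `summarise` is invented for the example, the standard deviation divides by N (as the hand calculation does) rather than by N − 1, and the residuals are taken about the unrounded mean, so the last digits may differ slightly from the hand calculation.

```python
import math

times = [5.38, 5.42, 5.48, 5.30, 5.34, 5.29, 5.97, 5.32, 5.40]  # seconds

def summarise(values):
    """Return (mean, standard deviation, standard error on the mean)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    sigma = math.sqrt(sum_sq / n)    # divides by N (assumption: matches the hand calculation)
    sigma_m = sigma / math.sqrt(n)   # standard error on the mean
    return mean, sigma, sigma_m

mean_all, sigma_all, sem_all = summarise(times)
mean_cut, sigma_cut, sem_cut = summarise([t for t in times if t != 5.97])

print(f"all readings:   T = {mean_all:.2f} +/- {sem_all:.2f} s (sigma = {sigma_all:.2f} s)")
print(f"without 5.97 s: T = {mean_cut:.2f} +/- {sem_cut:.2f} s (sigma = {sigma_cut:.2f} s)")
```

Running both cases side by side shows how strongly one outlying reading inflates σ and hence the quoted error.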

Are you justified in neglecting this one reading? In general you should never ignore the result of
an experiment in this way. However, you can calculate the probability that the suspect result is
valid. The standard deviation of the distribution is ±0.20 seconds. (Note that in the final result
the error quoted is the standard error on the mean, not the standard deviation of the sample.) The
suspect reading is about 3 standard deviations away from the mean; the probability of its being a
valid part of the distribution is about one part in 400, i.e. not impossible, but unlikely!
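The probability estimate can be made explicit with the error function. The sketch below takes the mean, standard deviation and suspect reading quoted above and evaluates 1 − erf(z/√2), both for the unrounded z and for z rounded to 3. The function name `prob_outside` is an assumption made for this example.

```python
import math

def prob_outside(z):
    """Probability that a normally distributed reading lies more than z sigma from the mean."""
    return 1.0 - math.erf(abs(z) / math.sqrt(2.0))

mean, sigma, suspect = 5.43, 0.20, 5.97  # seconds, values quoted above
z = (suspect - mean) / sigma

for label, value in (("unrounded z", z), ("rounded to 3", 3.0)):
    p = prob_outside(value)
    print(f"{label}: z = {value:.2f}, probability = {p:.4f} (about 1 in {1 / p:.0f})")
```

The unrounded z of about 2.7 gives a somewhat larger probability than the rounded value of 3; either way the reading is unlikely but not impossible.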

You may be justified in ignoring a particularly unusual result BUT ONLY IF YOU CAN
SPOT THE SYSTEMATIC ERROR WHICH HAS LED TO THE ODD RESULT.

You can usually do this as you are doing the experiment. If, for example, you are determining
the value of some well-known constant, you may sometimes see (with hindsight) that a single
unusual measurement has led you to an inaccurate value for the result. Your alternatives are:
1. Repeat the entire experiment taking care to avoid what happened before (Best).
2. Comment on what has happened, noting the likely source of error in your final result
(Next best).
3. Remove the single odd measurement and recompute the answer (OK sometimes but
least preferable – could be dishonest!).
