
INTRODUCTION TO

ADJUSTMENT
CALCULUS

P. VANICEK

September 1973

TECHNICAL REPORT
LECTURE NOTES
NO. 35
INTRODUCTION TO
ADJUSTMENT CALCULUS
(Third Corrected Edition)

Petr Vaníček

Department of Geodesy & Geomatics Engineering


University of New Brunswick
P.O. Box 4400
Fredericton, N.B.
Canada
E3B 5A3

February, 1980
Latest Reprinting October 1995
PREFACE

In order to make our extensive series of lecture notes more readily available, we have
scanned the old master copies and produced electronic versions in Portable Document
Format. The quality of the images varies depending on the quality of the originals. The
images have not been converted to searchable text.
FOREWORD

It has long been the author's conviction that most of the existing courses tend to slide over the fundamentals and treat the adjustment purely as a technique, without giving the student a deeper insight and without answering a good many questions beginning with "why".

This course is a result of a humble attempt to present the adjustment

as a discipline in its own right, with a firm basis and internal

structure; simply as an adjustment calculus. Evidently, when one tries

to take an unconventional approach, one is only too liable to make

mistakes. It is hoped that the student will hence display some patience

and understanding.

These notes have evolved from the first rushed edition - termed as preliminary - of the Introduction to Adjustment Calculus, written for

course SE 3101 in 1971. Many people have kindly communicated their

comments and remarks to the author. To all these, the author is heavily

indebted. In particular, Dr. L. Hradílek, Professor at the Charles University in Prague, and Dr. B. Lund, Assistant Professor at the Mathematics Dept., UNB, made very extensive reviews that helped in clarifying

many points. Mr. M. Nassar, a Ph.D. student in this department, carried


most of the burden connected with rewriting the notes on his shoulders.

Many of the improvements in formulations as well as most of the examples

and exercise problems contained herein originated from him.

None of the contributors should, however, be held responsible for any errors and misconceptions still present. Any comment or criticism communicated to the author will be highly appreciated.

P. Vaníček
October 7, 1974
CONTENTS

Introduction . . . 1

1. Fundamentals of the Intuitive Theory of Sets
   1.1  Sets, Elements and Subsets . . . 6
   1.2  Progression and Definition Set . . . 7
   1.3  Cartesian Product of Sets . . . 8
   1.4  Intersection of Sets . . . 9
   1.5  Union of Sets . . . 10
   1.6  Mapping of Sets . . . 12
   1.7  Exercise 1 . . . 13

2. Fundamentals of the Mathematical Theory of Probability
   2.1  Probability Space, Probability Function and Probability . . . 15
   2.2  Conditional Probability . . . 16
   2.3  Combined Probability . . . 16
   2.4  Exercise 2 . . . 18

3. Fundamentals of Statistics
   3.1  Statistics of an Actual Sample
        3.1.1  Definition of a Random Sample . . . 20
        3.1.2  Actual (Experimental) PDF and CDF . . . 22
        3.1.3  Mean of a Sample . . . 26
        3.1.4  Variance of a Sample . . . 29
        3.1.5  Other Characteristics of a Sample . . . 32
        3.1.6  Histograms and Polygons . . . 34
   3.2  Statistics of a Random Variable
        3.2.1  Random Function and Random Variable . . . 47
        3.2.2  PDF and CDF of a Random Variable . . . 47b
        3.2.3  Mean and Variance of a Random Variable . . . 51
        3.2.4  Basic Postulate (Hypothesis) of Statistics, Testing . . . 55
        3.2.5  Two Examples of a Random Variable . . . 56
   3.3  Random Multivariate
        3.3.1  Multivariate, its PDF and CDF . . . 66
        3.3.2  Statistical Dependence and Independence . . . 69
        3.3.3  Mean and Variance of a Multivariate . . . 70
        3.3.4  Covariance and Variance-Covariance Matrix . . . 72
        3.3.5  Random Multisample, its PDF and CDF . . . 76
        3.3.6  Mean and Variance-Covariance Matrix of a Multisample . . . 76
        3.3.7  Correlation . . . 81

4. Fundamentals of the Theory of Errors
   4.1  Basic Definitions . . . 89
   4.2  Random (Accidental) Errors . . . 91
   4.3  Gaussian PDF, Gauss Law of Errors . . . 92
   4.4  Mean and Variance of the Gaussian PDF . . . 94
   4.5  Generalized or Normal Gaussian PDF . . . 97
   4.6  Standard Normal PDF . . . 98
   4.7  Basic Hypothesis (Postulate) of the Theory of Errors, Testing . . . 106
   4.8  Residuals, Corrections and Discrepancies . . . 109
   4.9  Other Possibilities Regarding the Postulated PDF . . . 112
   4.10 Other Measures of Dispersion . . . 113
   4.11 Exercise 4 . . . 118

5. Least-Squares Principle
   5.1  The Sample Mean as "The Least-Squares Estimator" . . . 123
   5.2  The Sample Mean as "The Maximum Probability Estimator" . . . 125
   5.3  Least-Squares Principle . . . 128
   5.4  Least-Squares Principle for Random Multivariate . . . 130
   5.5  Exercise 5 . . . 132

6. Fundamentals of Adjustment Calculus
   6.1  Primary and Derived Random Samples . . . 133
   6.2  Statistical Transformation, Mathematical Model . . . 133
   6.3  Propagation of Errors
        6.3.1  Propagation of Variance-Covariance Matrix, Covariance Law . . . 137
        6.3.2  Propagation of Errors, Uncorrelated Case . . . 145
        6.3.3  Propagation of Non-Random Errors, Propagation of Total Errors . . . 153
        6.3.4  Truncation and Rounding . . . 157
        6.3.5  Tolerance Limits, Specifications and Preanalysis . . . 162
   6.4  Problem of Adjustment
        6.4.1  Formulation of the Problem . . . 166
        6.4.2  Mean of a Sample as an Instructive Adjustment Problem, Weights . . . 167
        6.4.3  Variance of the Sample Mean . . . 170
        6.4.4  Variance-Covariance Matrix of the Mean of a Multisample . . . 174
        6.4.5  The Method of Least-Squares, Weight Matrix . . . 176
        6.4.6  Parametric Adjustment . . . 179
        6.4.7  Variance-Covariance Matrix of the Parametric Adjustment Solution Vector, Variance Factor and Weight Coefficient Matrix . . . 193
        6.4.8  Some Properties of the Parametric Adjustment Solution Vector . . . 201
        6.4.9  Relative Weights, Statistical Significance of a Priori and a Posteriori Variance Factors . . . 202
        6.4.10 Conditional Adjustment . . . 204
        6.4.11 Variance-Covariance Matrix of the Conditional Adjustment Solution . . . 213
   6.5  Exercise 6 . . . 220

Appendix I   Assumptions for and Derivation of the Gaussian PDF . . . 233
Appendix II  Tables . . . 238
Bibliography . . . 241
INTRODUCTION

In technical practice, as well as in all experimental sciences,

one is faced with the following problem: evaluate quantitatively parameters describing properties, features, relations or behaviour of various

objects around us. The parameters can be usually evaluated only on the

basis of the results of some measurements or observations. We may, for

example, be faced with the problem of evaluating the length of a string.

This can be measured directly. Here the only parameter we are trying to

determine is the observed quantity itself and the problem is fairly

simple. A more complicated proposal would be, for instance, to determine the coefficient of expansion of a rod. Then the parameter--the

coefficient of expansion--cannot be measured directly, as in the previous

case, and we have to deduce its value from the results of observations of

length, by performing some computations using the mathematical relationship

connecting the observed quantities and the wanted parameters. The more

complicated the problems get, of course, the more complex is the system

whose parameters we are trying to determine. Obviously, the determination

of the orbital parameters of a satellite from various angles observed on

the surface of the earth would be an example of one such still more

sophisticated task.

The adjustment is a discipline that tries to categorise those

problems and attempts to deal with them systematically. In order to be

able to deal with such problems systematically the adjustment has to use a

language suitable for this purpose, the obvious choice being mathematics.


Hence, the problem to be treated has to be first "translated" into the

language of mathematics, i.e., the problem has to be first mathematically

formulated. The mathematical formulation of the problem would really be

the mathematical formulation of the relation between the observed quan-

tities (observables) and the wanted quantities (parameters). This relation-

ship is called the mathematical model. Denoting the observables by L

(L stands for one, two, or n quantities) and the parameters by X (X stands

for one, two or m quantities) the most general form of the mathematical

model can be written as

    F(X, L) = 0 .

The above equation merely states that there is an (implicit) relation

between the observables and the parameters. The formulation of an actual

mathematical model has to be done taking into account all the physical

and geometrical laws--simply using the accumulated experience. The com-

plexity of the mathematical model reflects the complexity of the problem

itself. Thus the mathematical model of our first problem is practically

trivial:

X = L

where X is the wanted length and L is the observed length.

The mathematical model for the coefficient of expansion α of the rod is more complicated, namely, for instance

    ℓ = ℓ₀ (1 + αt)

where α = X, the observed length ℓ and the observed difference in temperature t create L, and ℓ₀ is another parameter (length of the rod at a fixed temperature) which we happen to know. The mathematical model for

the satellite orbital elements would be more complicated still.

Once the mathematical model has been formulated it can become

a subject of rigorous mathematical treatment, a subject of adjustment

calculus. Hence, the formulation of the mathematical model itself is to

be considered as being beyond the scope of adjustment calculus and only

the various kinds of mathematical models alone constitute the subject of

interest.

There is one particular class of models that are very often encountered in practice and that can be termed overdetermined. By an

overdetermined model we understand a model which does not have a unique

solution for X because there are "unnecessarily many" observations supplied.

This can be the case, say, with our first example, if the length is measured

several times. The model in this case would be formulated as

    X = ℓ₁
    X = ℓ₂
    ⋮
    X = ℓₙ

where ℓ₁, ℓ₂, ..., ℓₙ are all encompassed by the symbol L. Or, in the

second example, we may have

    ℓ₁ = ℓ₀ (1 + αt₁)
    ℓ₂ = ℓ₀ (1 + αt₂)
    ⋮
    ℓₙ = ℓ₀ (1 + αtₙ)

where (ℓ₁, t₁, ℓ₂, t₂, ..., ℓₙ, tₙ) = L.
As we can easily see, these overdetermined models may or may

not have a unique solution. They usually do not. Therefore, in order to produce a unique solution of some kind, we have to assume that the

observations were not quite correct, that there were errors in their

determinations.
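To make the idea concrete, the following minimal Python sketch (the measurement values are invented for illustration) shows that the overdetermined model X = ℓᵢ, written for several measurements of the same length, generally has no exact solution, and that a unique estimate only appears once the observations are allowed to carry errors:

    # Hypothetical repeated measurements of the same length (metres);
    # the values are invented for illustration only.
    lengths = [15.42, 15.44, 15.41, 15.43]

    # The overdetermined model X = l_i cannot be satisfied exactly:
    # no single X reproduces all observations at once.
    exact_solution_exists = all(l == lengths[0] for l in lengths)
    print("exact solution exists:", exact_solution_exists)   # False

    # Admitting observational errors, one candidate for a unique estimate
    # is the value that makes the errors "smallest" in some agreed sense,
    # e.g. the arithmetic mean (anticipating the least-squares principle of section 5).
    x_hat = sum(lengths) / len(lengths)
    print("estimate of X:", x_hat)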

This leads us into the area of the theory of errors with its

prerequisites--the theory of probability and statistics. With the help

of these disciplines we are able to define the most probable unique

solution (if it exists) for the parameters of the mathematical models.

We also are usually able to establish the degree of reliability of the

solution.

The notes are divided into six sections: Fundamentals of the

Intuitive Theory of Sets, Fundamentals of the Mathematical Theory of

Probability, Fundamentals of Statistics, Fundamentals of the Theory of

Errors, Least-Squares Principle, Fundamentals of the Adjustment Calculus.

The first four sections describe the relevant parts of the individual

fields that are necessary to understand what adjustment is all about. They,

by no means, claim any completeness and it is envisaged that an interested

student will supplement his reading from other sources, such as those

listed at the end of these notes.

A separate section (5) is devoted to the philosophical basis

of the adjustment calculus. Although not very extensive, it should be regarded as important, giving the reasons why the least-squares technique

is used in adjustment.

Finally, the last section deals with the basics of the adjust-

ment proper. Here again, only the introductory parts of the adjustment

calculus could be treated with the understanding that only the subsequent

courses will develop the full picture.

Throughout the course emphasis is placed on the parallel develop-

ment of concepts of "discrete statistics", i.e. statistics of random

samples, and "continuous statistics", i.e. statistics of random variables.

While random samples are the quantities we deal with in every-day practice,

the mathematical tools used are predominantly from the continuous domain.

Good understanding of the interplay of the two concepts is indispensable

for anyone who wants to be able to use the adjustment calculus properly.

The bibliography given at the end of these notes lists some

of the useful books dealing with statistics and adjustments. The interested reader is recommended to complement the reading of these notes by turning

to at least some of the listed sources.


1. FUNDAMENTALS OF THE INTUITIVE THEORY OF SETS

1.1. Sets, Elements and Subsets

A set is an ensemble of objects (elements) that can be distin-

guished one from another. The set is defined when all its elements are

defined.

Example 1.1:   A1 ≡ {▽, ❄, 4.18} ,

               A2 ≡ {1, 8, 15, ☆, 0, ❄, 4} ,

               A3 ≡ {0, 1} ,

               A4 ≡ {all the left feet} ,

               A5 ≡ {all the cities with more than one million inhabitants in New Brunswick} ,

               R ≡ {all the real numbers} , and

               I ≡ {all the positive integers} , are all sets.

The text within the brackets { ... } is known as the list of the

set. If an element a exists in the list of a set A, we say that the element

a belongs to the set A, and this is denoted by

a ∈ A

which is read as "a belongs to A". On the other hand, if an element a

does not belong to a set A, we write

a ∉ A

which is read as "a does not belong to A".

Example 1.2: Referring to Example 1.1, we see that:


\I"~
-;0.:-
,, \
e: -A-1 2 t A1 , and a right foot t A4 .


A part of a set G is called a subset of G whether it contains one

or several elements. The fact that a set H is contained in G is hence

written as

H ⊂ G

If H is not contained in G, i.e. if not all the elements of H are at the

same time elements of G, we write

H ⊄ G

Example 1.3: Referring to Example 1.1, we see that:

               {2, 35, 118} ⊂ I ,  {3, 6.2} ⊂ R ,  and  {▽, ❄} ⊂ A1 .

A set which does not contain any element is known as the empty

(void or null) set, and is denoted by ∅ .

Example 1.4: The set A6 = {all people taller than 10 feet}

contains no elements, i.e. A6 = ∅ . Also from Example 1.1, we find that A5 = ∅ .
The sets are called equal if they contain the same and none but

the same elements.

Example 1.5: All the following sets are equal

               {1, 2, 3} ,  {3, 1, 2} ,  {2, 3, 1} ,  ...

1.2. Progression and Definition Set

A progression ξ is an ordered (by ordered we mean that ξ is arranged such that each of its elements has a specific position) ensemble of objects (elements) that may not all be distinguishable one from another. The definition set D of a progression ξ is the set composed from all the distinguishable elements of ξ. In such a case, we shall also say that D belongs to ξ.

Example 1.6:   ξ = (1, 2, ☂, 2, 1, 8, ♦) is a progression, and its definition set D is given by

               D ≡ {1, 2, ☂, 8, ♦} .

At this point, the difference between a progression and a set should be clear in mind. For instance, the progression (♦, 8, 2, 1, 1, 2, ☂) represents a different progression than the one given in Example 1.6. However, the sets {☂, 8, 2, 1, ♦} , {2, 1, 8, ♦, ☂} , ... are all the same as the definition set D in Example 1.6.

1.3. Cartesian Product of Sets


The Cartesian product of two sets A and B is a set, called the

product set and denoted by A × B (reads A cross B), whose elements are all the ordered two-tuples of elements of the component sets A and B. Hence, if a ∈ A and b ∈ B, then the two-tuple (a, b) ∈ A × B. However, if b ∉ A or a ∉ B, the two-tuple (b, a) ∉ A × B.
The above definition can be extended to more than two sets, say

n sets. In such a case, the elements of the product set will be all the

ordered n-tuples of elements of the component sets. Accordingly, we can

define the Cartesian n-power Aⁿ [or An if no danger of confusion with indexed set exists] of a set A as the Cartesian product of the same set A with itself n times.
Example 1.7:   If A ≡ {3, 1, 5} and B ≡ {2, 4} , then the product set A × B is

               A × B ≡ {(3, 2), (3, 4), (1, 2), (1, 4), (5, 2), (5, 4)} .

               Referring to Example 1.1, we can easily see that:

               (▽, ❄) ∈ A1 × A2 ,  (4.18, ☆) ∈ A1 × A2 ,

               (1, ▽) ∉ A1 × A2  but  (1, ▽) ∈ A2 × A1 ,

               (1, 2, 15, 1, 8) ∈ I⁵ ,  and  (5.16, 3.26, 1, 0, 1) ∈ R⁵ .
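As a computational aside, the product set of Example 1.7 can be generated directly; the sketch below (Python, for illustration only) uses itertools.product:

    from itertools import product

    A = {3, 1, 5}
    B = {2, 4}

    # Product set A x B: all ordered two-tuples (a, b) with a in A, b in B.
    AxB = set(product(A, B))
    print(AxB)            # {(3, 2), (3, 4), (1, 2), (1, 4), (5, 2), (5, 4)}

    # Cartesian n-power: the product of a set with itself n times,
    # e.g. the second power B^2 of B.
    B2 = set(product(B, repeat=2))
    print(B2)             # {(2, 2), (2, 4), (4, 2), (4, 4)}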

1.4. Intersection of Sets

The intersection of two sets A and B, denoted by A ∩ B, is a

set which is a subset of both A and B and does not contain any elements

other than the common elements to A and B. The intersection of two sets

can be represented by the shaded area in Figure 1.1. Diagrams of this kind are called "Venn diagrams".

Figure 1.1

From the above diagram we can easily see that

Example 1.8:   Referring to Example 1.1, we find that:

               A1 ∩ A2 = {❄} ,  and  R ∩ I = I .

Note that we can define a subset A of B as such a set whose

intersection with B is A itself. In other words, if A ⊂ B then A ∩ B = A,

or vice versa (see Figure 1.2).



Figure 1.2

If A ∩ B = ∅, then the sets A and B are said to be disjoint sets.

The intersection of n sets A1, A2, ..., An is usually denoted by ∩ᵢ₌₁ⁿ Aᵢ , where

    ∩ᵢ₌₁ⁿ Aᵢ ≡ A1 ∩ A2 ∩ A3 ∩ ... ∩ An .

This is illustrated in Figure 1.3 by the common area to all sets.

Figure 1.3

1.5. Union of Sets

The union of two sets A and B, denoted by A ∪ B, is a set that contains all the elements of A and B and none else. Similar to the intersection, the union of the two sets is represented by the shaded area in

Figure 1.4

Figure 1.4

The union of n sets A1, A2, ..., An is denoted by ∪ᵢ₌₁ⁿ Aᵢ , where

    ∪ᵢ₌₁ⁿ Aᵢ ≡ A1 ∪ A2 ∪ ... ∪ An .

Example 1.9: Referring to Example 1.1, we obtain

               A1 ∪ A2 = {▽, ❄, 4.18, 1, 8, 15, ☆, 0, 4} ,  and  I ∪ R = R .

Thinking of the union as the addition of sets, the subtraction

of two sets is known as the complement of one into the other. Referring

to Figure 1.5, and considering the two sets A ⊂ B, the set of all the

elements contained in B and not contained in A is called the complement

of A in B, and is denoted by B - A.
Figure 1.5

Example 1.10: Referring to Example 1.1, we get:

               The complement of A3 in A2 is

               A2 - A3 = {8, 15, ☆, ❄, 4} ,  and

               R - I = {all real numbers that are not integers} .
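The set operations of the last three sections map directly onto Python's built-in set type; the following minimal sketch (illustrative only, with hypothetical string labels standing in for the pictorial elements of Example 1.1) reproduces the results of Examples 1.8 - 1.10:

    # Stand-ins for the pictorial elements of Example 1.1 (hypothetical labels).
    SNOW, STAR, TRI = "snowflake", "star", "triangle"

    A1 = {TRI, SNOW, 4.18}
    A2 = {1, 8, 15, STAR, 0, SNOW, 4}
    A3 = {0, 1}

    print(A1 & A2)        # intersection, Example 1.8: {'snowflake'}
    print(A1 | A2)        # union, Example 1.9
    print(A2 - A3)        # complement of A3 in A2, Example 1.10
    print(A3 <= A2)       # subset test: A3 is contained in A2 -> True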

1.6. Mapping of Sets

f is called a mapping of A into B if it relates one and only one

element from B to each element of A. This means that for each element

a ∈ A there will be only one corresponding image b ∈ B (see Figure 1.6).

Figure 1.6

Note here that the one-to-one relationship (i.e. each b ∈ B has got one and only one argument a ∈ A) is not required. We shall denote any such mapping

by

    f ∈ {A → B}

and read it as "f is an element of the set of all the mappings of A into B",

or simply "f is a. mapping of A into B", or "f maps A into B".

If the elements of B are all images of the elements of A, then

f is called an onto mapping, or simply we say that "f maps A onto B".

If A and B are numerical sets, then f is called a function

(which gives the mathematical relationship between each a ∈ A and its corresponding image b ∈ B). In this case, the image b of a will be nothing else but the functional value f(a).

Example 1.11:  Given the set A = {a₁, a₂, a₃} = {2, -1, 3} and the mapping f ∈ {A → B} , where f(aᵢ) = aᵢ³ for each aᵢ ∈ A, i = 1, 2, 3, then the images bᵢ ∈ B are computed as the functional values of the corresponding elements aᵢ ∈ A, i.e. bᵢ = f(aᵢ) = aᵢ³, which give

               b₁ = (2)³ = 8 ,  b₂ = (-1)³ = -1 ,  and  b₃ = (3)³ = 27 .

               Generally, f is an into function, hence we write {8, -1, 27} ⊂ B. However, if f is an onto function, then the image set B of this example is given as

               B ≡ {8, -1, 27} .
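A sketch of Example 1.11 in Python (for illustration; the cube mapping is the one used in the example above) shows the distinction between the image set and a codomain that f maps into rather than onto:

    A = {2, -1, 3}

    def f(a):
        # The mapping of Example 1.11: each element is sent to its cube.
        return a ** 3

    # Image set: the set of all functional values f(a), a in A.
    image = {f(a) for a in A}
    print(image)                      # {8, -1, 27}

    # f maps A into any superset of the image (an "into" mapping),
    # and onto B only when B equals the image set itself.
    B_into = {8, -1, 27, 100}
    print(image <= B_into)            # True  (into, not onto)
    print(image == {8, -1, 27})       # True  (onto this particular B)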

1.7. Exercise 1

1. Which of the following sets are equal?


   {t, r, s}, {s, r, t}, {r, s, t}, {t, s, r} .
2. Let A ≡ {d}, B ≡ {c, d}, C ≡ {a, b, c}, D ≡ {a, b} and H ≡ {a, b, d};

   (i)   is B ⊂ D ?            (ii)   is C ⊃ B ?
   (iii) is D ⊂ C ?            (iv)   is B ≠ H ?
   (v)   is A ⊂ H ?            (vi)   is (A ∪ D) ⊂ H ?
   (vii) is (A ∩ B) ⊄ C ?      (viii) is (H ∩ C) = D ?



3. Let U ≡ {1, 2, 3, ..., 8, 9}, A ≡ {1, 2, 3, 4}, B ≡ {2, 4, 6, 8},
   C ≡ {3, 4, 5, 6} and D ≡ {1, 3, 5, 7, 9}; then find the following:

   (i)   B ∪ D ;     (ii) A ∩ C ;
   (iii) A ∪ B ;     (iv) U - A ;
   (v)   a set H, which is a subset of all the sets U, A and D.

4. Considering the following Venn diagram with the sets A, B, C, D and H,
   indicate by shading the suitable areas on separate diagrams, the
   following sets:

   (i)   D ∪ H ;     (ii)  H ∩ C ;
   (iii) C ∩ B ;     (iv)  A - C ;
   (v)   B ∪ C ;     (vi)  (A - B) ∪ (B ∩ C) ;
   (viii) A - (C ∪ B) .

5. Considering the two sets:

   A ≡ {3, 4, 0, -1} and B ≡ {-2, 5}, find the Cartesian products A × B
   and B × A. Also find the second power B² of the set B.

6. Given the set X ≡ {-2, -1, 0, 1, 2}, with f ∈ {X → Y}. If for each
   x ∈ X, f(x) = x² + 1, find the image set Y considering that f is an
   onto function.
2. FUNDAMENTALS OF THE MATHEMATICAL THEORY OF PROBABILITY

2.1 Probability Space, Probability Function and Probability

Let us have a set D ≠ ∅ and let us assume that it can be partitioned into mutually disjoint subsets Dⱼ ⊂ D such that D = ∪ⱼ Dⱼ (by mutually disjoint subsets we mean such subsets that Dᵢ ∩ Dⱼ = ∅ for any pair Dᵢ, Dⱼ, i ≠ j). Such a set D we shall call probability space.

Any mapping P of D onto [0, 1] (that is the set of all positive real numbers "b" satisfying the inequalities 0 ≤ b ≤ 1) that has the following two properties:

(1) If D' ⊂ D, then P(D') = 1 - P(D - D') (note that D - D' is the complement of D' in D; see section 1.5), and

(2) If D1, D2, ..., Dn ⊂ D are mutually disjoint, then P(∪ᵢ₌₁ⁿ Dᵢ) = Σᵢ₌₁ⁿ P(Dᵢ),

is called a probability function. The value P(D') of the probability function P (takes any value from [0, 1]) is called the probability. Note that the difference between the function and the functional value has been mentioned in section 1.6.

The above two properties of the probability function have the

following consequences:

(1) P(D) = 1,

(2) P(∅) = 0,

(3) If D' ⊂ D, then P(D') ≤ 1,

(4) If D'' ⊂ D', then P(D'') ≤ P(D'), and

(5) If A, B ⊂ D, and A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B).


If D is a point set, i.e. its elements can be represented by points, it is always decomposable.

The value Σᵢ P(Dᵢ) ∈ [0, 1] is sometimes called the total or accumulative probability of ∪ᵢ Dᵢ.

2.2 Conditional Probability

If A, B ⊂ D, then the ratio P(A ∩ B)/P(B) = P(A/B) is called the conditional probability. The right hand side, that is P(A/B), is read as "probability of A given B". In other words, the conditional probability P(A/B) can be interpreted as the probability of occurrence of A under the condition that B occurred.

From the above definition of the conditional probability, we

notice that:

(1) If P(B) = 0, then P(A/B) is not defined,

(2) If B ⊂ A, then A ∩ B = B (see section 1.4), and then P(A/B) = 1,

(3) If A ∩ B = ∅, i.e. A and B are disjoint sets, then P(A/B) = 0.

2.3 Combined Probability

If the conditional probability P(A/B) equals P(A), then it is clear that the occurrence of A does not depend on the occurrence of B. In such a case we say that A and B are independent. Using the definition of the conditional probability from the previous section, we can write:

    P(A ∩ B) = P(A) · P(B) .

This can be understood as the probability of simultaneous occurrence of A and B, which is usually denoted by P(A, B) and read as probability of A

and B, and known as the combined (compound) probability of A and B, that is

P(A, B) = P(A) · P(B).

Similarly, we define the combined probability of occurrence of the independent D1, D2, ..., Dn ⊂ D as the product of their individual probabilities, i.e.

    P(Dᵢ, Dⱼ) = P(Dᵢ) P(Dⱼ) ,   i ≠ j ,

    P(Dᵢ, Dⱼ, Dₖ) = P(Dᵢ) P(Dⱼ) P(Dₖ) ,   i ≠ j, j ≠ k, i ≠ k ,

    P(D1, D2, ..., Dn) = Πᵢ₌₁ⁿ P(Dᵢ) .

Example 2.1:   Suppose we have decomposed the probability space D into seven mutually disjoint subsets D1, D2, ..., D7 as shown in Figure 2.1 such that:

               D = ∪ᵢ₌₁⁷ Dᵢ .

Figure 2.1

               Assuming that the probabilities P(Dᵢ) of the individual subsets Dᵢ are found to be:

               P(D1) = 1/28, P(D2) = 2/28, P(D3) = 3/28, P(D4) = 4/28,
               P(D5) = 5/28, P(D6) = 6/28, and P(D7) = 7/28; then we get:

               Total probability of Dᵢ, i = 1, 2, ..., 7, is
               P(D) = P(∪ᵢ Dᵢ) = Σᵢ₌₁⁷ P(Dᵢ) = (1+2+3+4+5+6+7)/28 = 28/28 = 1.0 .

               Combined probability of all Dᵢ = Πᵢ₌₁⁷ P(Dᵢ) ≅ 0 .

Example 2.2:   In this example we assume that our probability space D is decomposed into five elements dⱼ ∈ D, j = 1, 2, ..., 5. If the probabilities P(dⱼ), as represented by the ordinates in Figure 2.2, are given as:

Figure 2.2

               P(d1) = 0.2, P(d2) = 0.3, P(d3) = 0.1, P(d4) = 0.1,
               and P(d5) = 0.3; then we get:

               Total probability P(D) = P(∪ⱼ dⱼ) = Σⱼ₌₁⁵ P(dⱼ) = 0.2+0.3+0.1+0.1+0.3 = 1.0 .

               Combined probability of d1 and d2 (for example) = P(d1, d2)
               = Πⱼ₌₁² P(dⱼ) = 0.2 · 0.3 = 0.06 .

               This combined probability has to be understood as the probability of simultaneous occurrence of d1 and d2 under the assumption of their independence.
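A minimal numerical check of Example 2.2 (a Python sketch, illustrative only):

    # Probabilities of the five elements of D from Example 2.2.
    p = {1: 0.2, 2: 0.3, 3: 0.1, 4: 0.1, 5: 0.3}

    # Total (accumulative) probability of D: the sum over all elements.
    print(sum(p.values()))             # 1.0

    # Combined probability of d1 and d2, assuming independence:
    # the product of the individual probabilities.
    print(p[1] * p[2])                 # 0.06 (up to floating-point rounding)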

2.4. Exercise 2.

We have determined that every number of a die has the probability of appearing, when the die is tossed, proportional to the number itself.

Let us denote: A = {even numbers}, B = {prime numbers}, and C = {odd numbers}; all subsets of the set of numbers appearing on the die.

Required: (1) Construct the probability space D.

(2) Find the probability of each individual element dᵢ ∈ D.

(3) Find P(A), P(B) and P(C).

(4) Find the probability that:


(i) an even or prime number occurs,

(ii) an odd prime number occurs,

(iii) A but not B occurs.


3. FUNDAMENTALS OF STATISTICS

3.1 Statistics of an Actual Sample

3.1.1 Definition of a Random Sample

Any finite (i.e. containing only a finite number n of elements) ordered progression of elements (see section 1.2) ξ = (ξ₁, ξ₂, ..., ξₙ) such that:

(i) its definition set D (see section 1.2) can be declared a probability space (see section 2.1); and

(ii) it has the probability function P defined for every dᵢ ∈ D in such a way that P(dᵢ) = cᵢ/n, where cᵢ is the count (frequency) of the element dᵢ in ξ,

may be called a random sample. The ratio cᵢ/n is known as the relative frequency.
Example 3.1:   Consider the following progression ξ

                    ξ₁  ξ₂  ξ₃  ξ₄  ξ₅  ξ₆  ξ₇
               ξ ≡ ( 1,  ◇,  ☂,  1,  1,  ☆,  ◇ )

               which has seven elements (i.e. n = 7).

               The definition set D of ξ will be

                     d1  d2  d3  d4
               D ≡ { 1,  ◇,  ☂,  ☆ } , which

               consists of four elements (i.e. m = 4), the counts of which are:

               c1 = 3, c2 = 2, c3 = 1, and c4 = 1. The

corresponding probabilities (relative frequencies)

are:

               P(d1) = P(1) = 3/7,  P(d2) = P(◇) = 2/7,
               P(d3) = P(☂) = 1/7,  and  P(d4) = P(☆) = 1/7.

Note here that really both properties required from P to be a

probability function (section 2.1) are satisfied. In particular we have

(from the above example): the total probability

    P(D) = P(∪ᵢ₌₁ᵐ dᵢ) = Σᵢ₌₁⁴ P(dᵢ) = 3/7 + 2/7 + 1/7 + 1/7 = 1 .

Accordingly, any finite ordered progression of elements may be

declared a random sample. This is a very important discovery and has to

be borne in mind throughout the following development. As a result, it

is always possible to construct the probability space and the associated

probabilities "belonging" to the sample (i.e. the probability associated

with each element in the definition set of the sample).

From now on we shall deal with D ⊂ R (recall that R is the set of all real numbers), i.e. with numerical sets and progressions only. Also,

D will be considered ordered in either ascending or descending sense;

usually the former is used.

It has to be noted here that our definition of a random sample

is not standard in the sense that it admits a much larger family of objects to

be called random samples than the standard definition. More will be said

about it in 3.2.4.

Example 3.2: A die is tossed 100 times. The following

table lists the six numbers and the

frequency (count) with which each number

appeared:

               number dᵢ :   1    2    3    4    5    6

               count  cᵢ :  14   17   20   18   15   16

Find the probability that:

(i) a 3 appears ;

(ii) a 5 appears;
(iii) an even number appears;

(iv) a prime number appears.

Solution:
               (i)   P(3) = 20/100 = 0.20 ;
               (ii)  P(5) = 15/100 = 0.15 ;
               (iii) P(2,4,6) = P(2) + P(4) + P(6)
                              = 17/100 + 18/100 + 16/100 = 51/100 = 0.51 ;
               (iv)  P(2,3,5) = P(2) + P(3) + P(5)
                              = 17/100 + 20/100 + 15/100 = 52/100 = 0.52 .
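The arithmetic of Example 3.2 is easily mechanised; the sketch below (Python, illustrative only) derives the experimental probabilities from the counts:

    # Counts observed in 100 tosses of the die (Example 3.2).
    counts = {1: 14, 2: 17, 3: 20, 4: 18, 5: 15, 6: 16}
    n = sum(counts.values())                      # 100

    # Experimental probability of each number = relative frequency.
    P = {d: c / n for d, c in counts.items()}

    print(P[3])                                   # 0.20
    print(P[2] + P[4] + P[6])                     # 0.51, an even number
    print(P[2] + P[3] + P[5])                     # 0.52, a prime number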

3.1.2 Actual (Experimental) Probability Distribution Function (PDF)

and Cumulative Distribution Function (CDF)

If the random sample ξ is a progression of numbers only (and, of course, its definition set D is a numerical set), which we shall from now on always assume, then P is a discrete function mapping D into [0,1].

This function is usually called experimental (actual) probability

distribution function (or experimental frequency function, etc.) of the

sample ξ, and abbreviated by PDF. The values P(dᵢ), dᵢ ∈ D, are known as experimental probabilities of dᵢ, which are equal to the corresponding relative frequencies.

Example 3.3:   Assume that a certain experiment gave us the following random sample:

               ξ ≡ (1, 2, 4, 1, 1, 2, 1, 1, 2) ,  n = 9.

               Then its definition set is:

               D ≡ {1, 2, 4} = {dᵢ, i = 1, 2, 3} ,  m = 3.

               Therefore, the frequencies cᵢ of dᵢ are:

               c1 = 5 , c2 = 3 and c3 = 1.

               The corresponding experimental probabilities are:
               P(1) = 5/9, P(2) = 3/9 and P(4) = 1/9.

               As a check, Σᵢ₌₁³ P(dᵢ) = (1/9)(5+3+1) = 1.

               The discrete PDF of the given ξ in this example is depicted in Figure 3.1 (which is sometimes called a bar diagram), in which the abscissas represent dᵢ and the ordinates represent the corresponding P(dᵢ).

Figure 3.1

Since we are using numerical sets only (and therefore ordered), it makes sense to ask, for instance, what is the actual probability of d being within an interval D' ⊂ D, where D' ≡ [dₖ, dⱼ]. Such probability is denoted by P(D') or P(dₖ ≤ d ≤ dⱼ). To answer this question, we use the actual PDF and get

    P(dₖ ≤ d ≤ dⱼ) = Σᵢ₌ₖʲ P(dᵢ) .          (3.1)

The above expression (equation 3.1) must be understood as giving the actual probability of d ∈ D' ≡ {dₖ, ..., dⱼ} ⊂ D rather than d ∈ [dₖ, dⱼ] (i.e. the probability that d will acquire a specific discrete value equal to dₖ, dₖ₊₁, ..., dⱼ₋₁, dⱼ rather than the probability that d will be anywhere in the continuous interval [dₖ, dⱼ]). This is not always properly understood in practice.

The function C of dᵢ ∈ D given by

    C(dᵢ) = Σ_{j ≤ i} P(dⱼ) ∈ [0,1]          (3.2)

is called experimental (actual) cumulative distribution function (or summation density function, etc.) of the sample ξ, and usually abbreviated by CDF.

Example 3.4:   Using the data and results from example 3.3, we can compute the CDF of the given sample ξ by computing each C(dᵢ) as follows:

               C(d1) = P(d1) = 5/9,

               C(d2) = P(d1) + P(d2) = 5/9 + 3/9 = 8/9, and

               C(d3) = (P(d1) + P(d2)) + P(d3) = C(d2) + P(d3) = 8/9 + 1/9 = 9/9 = 1.


               Figure 3.2 illustrates the discrete CDF of the given sample ξ.

Figure 3.2

From Figure 3.2, we notice the following properties of the

CDF:

(i) the value (ordinate) of the CDF is always positive,

(ii) the CDF is a never decreasing function,

(iii) the cumulative probability C(dₘ), where dₘ is the largest dᵢ ∈ D, is always equal to 1.

Example 3.5: Using the data from example 3.2, we

can construct the CDF of the die tossing

experiment as follows:

               C(1) = P(1) = 0.14,

               C(2) = C(1) + P(2) = 0.14 + 0.17 = 0.31,

               C(3) = C(2) + P(3) = 0.31 + 0.20 = 0.51,

               C(4) = C(3) + P(4) = 0.51 + 0.18 = 0.69,

               C(5) = C(4) + P(5) = 0.69 + 0.15 = 0.84, and

               C(6) = C(5) + P(6) = 0.84 + 0.16 = 1.00.

Note again that the maximum value of the

CDF is one. The graphical representation

of the above CDF can be constructed similar

to Figure 3.2.
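The experimental PDF and CDF of a sample can be generated mechanically; a minimal Python sketch (illustrative only), applied to the sample of Example 3.3:

    from collections import Counter

    sample = [1, 2, 4, 1, 1, 2, 1, 1, 2]          # Example 3.3
    n = len(sample)

    # Experimental PDF: relative frequency of each element of the
    # definition set D, kept in ascending order of d_i.
    counts = Counter(sample)
    D = sorted(counts)
    pdf = {d: counts[d] / n for d in D}

    # Experimental CDF: running sum, C(d_i) = sum over j <= i of P(d_j).
    cdf, running = {}, 0.0
    for d in D:
        running += pdf[d]
        cdf[d] = running

    print(pdf)    # {1: 0.555..., 2: 0.333..., 4: 0.111...}
    print(cdf)    # {1: 0.555..., 2: 0.888..., 4: 1.0}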

3.1.3 Mean of a Sample

Consider the sample ξ ≡ (ξ₁, ξ₂, ..., ξₙ) with its definition set D ≡ {d1, d2, ..., dm}. The real number M defined as:

    M = (1/n) Σᵢ₌₁ⁿ ξᵢ  ∈ [d1, dm] ,          (3.3)

is called the mean (average) of the actual sample.

We can show that M equals also to:

    M = Σᵢ₌₁ᵐ dᵢ P(dᵢ) .          (3.4)

The proof of (3.4) reads as follows:

    R.H.S. = Σᵢ₌₁ᵐ dᵢ (cᵢ/n) = (1/n) Σᵢ₌₁ᵐ dᵢcᵢ = (1/n) Σᵢ₌₁ⁿ ξᵢ = M .

The mean M of a sample can be interpreted as the outcome of applying the summation operator Σ divided by n on ξ, and is often written as:

    M = E(ξ) = mean (ξ) = ave (ξ) = ξ̄ ,          (3.5)

where the symbol E (an abbreviation for the "mathematical Expectation") must be understood as another name for the summation operator Σ operating on P(dᵢ)dᵢ (and not on ξᵢ!).

properties (where k is a constant and ~ is a random sample) :

(i) E(k) = k,

(ii) E (k~) = kE (~),

(iii) E(~+k) = E(~) + k,

(iv) E(E ~j) = E E(~j), where ~j, j = 1, 2, ••. , s, ares random samples
j j
with the same number of elements m in their corresponding definition

sets Dj (Do not confuse~. with ~j~ the former is an element in the
J
latter. In other words, ~. is a single element in a sample, but
J
~j is one sample in a class of samples) ,
(v) If ~ = (~ 1 ), then E~) = ~l'
(vi) E (E(~)) = E (~) •
28

Example 3.6:   Using the random sample ξ given in example 3.3, let us compute its mean from equation (3.3) as follows:

               M = E(ξ) = (1/n) Σᵢ₌₁ⁿ ξᵢ = (1/9)(1+2+4+1+1+2+1+1+2) = 15/9 = 1 2/3 .

               Also, we can use equation (3.4), from which we get:

               M = E(ξ) = Σⱼ₌₁ᵐ dⱼ P(dⱼ) = 1 · (5/9) + 2 · (3/9) + 4 · (1/9)
                        = (1/9)(5+6+4) = 15/9 = 1 2/3 .

               Obviously, both formulae (3.3) and (3.4) give identical answers.
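Both routes to the mean are easy to verify numerically; a small Python sketch (illustrative only) for the sample of Example 3.3:

    from collections import Counter
    from fractions import Fraction

    sample = [1, 2, 4, 1, 1, 2, 1, 1, 2]
    n = len(sample)

    # Equation (3.3): M = (1/n) * sum of the sample elements.
    M_direct = Fraction(sum(sample), n)

    # Equation (3.4): M = sum over D of d_i * P(d_i), the "weighted mean".
    counts = Counter(sample)
    M_weighted = sum(Fraction(c, n) * d for d, c in counts.items())

    print(M_direct, M_weighted)        # 5/3 5/3 - both give 1 2/3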

It is interesting to note that computing the mean of a sample

using equation (3.4) is analogous to computing the centre of balance

in mechanics. This can be simply seen by considering the probabilities

P(dᵢ) or the counts cᵢ as weights, and then taking the Σ moments = 0 about any point, e.g. the origin 0 (see Figure 3.3, which uses the data from example 3.3).

Figure 3.3

The resulting distance of the centre of balance from the point is

nothing else but the sample mean M.

It is worthwhile mentioning here that, based on the above analogy

with mechanics, the mean M computed from equation {3.4) is also called the

weighted mean, in which each element dᵢ ∈ D is weighted (the concept of weights is to be discussed later in detail) by its probability P(dᵢ).

3.1.4 Variance of a Sample

Let us have again an actual sample ξ = (ξ₁, ξ₂, ..., ξₙ) with a mean M. Then, the real number S² defined as

    S² = (1/n) Σᵢ₌₁ⁿ (ξᵢ - M)² ,          (3.6)

is called the variance (dispersion) of the actual sample. The square root of the variance S², i.e. S, is known as the standard deviation of the sample.

Keeping in mind the relationship between the random sample ξ and its definition set D, we can write:

    (1/n) Σᵢ₌₁ⁿ ξᵢ² = Σⱼ₌₁ᵐ dⱼ² P(dⱼ) ,

which will provide another expression for S², namely:

    S² = Σⱼ₌₁ᵐ P(dⱼ)(dⱼ - M)² .          (3.7)

S² can be also interpreted as the outcome of the application of the operator E on (ξ - E(ξ))², meaning really P(dⱼ)(dⱼ - M)², and is often written as

    S² = E((ξ - E(ξ))²) = var (ξ) .          (3.8a)

Carrying out the prescribed operation, we get

    S² = E(ξ² - 2ξE(ξ) + E²(ξ)) .

Applying the calculus with the E operator (as summarized in section 3.1.3), we obtain:

    S² = E(ξ²) - 2E(ξ)E(ξ) + E²(ξ) = E(ξ²) - E²(ξ) .

From equation (3.5) we have E(ξ) = M; then by substituting for E(ξ) we get

    S² = E(ξ²) - M² .          (3.8b)

Consequently, the corresponding expression to equation (3.7) will be:

    S² = Σⱼ₌₁ᵐ dⱼ² P(dⱼ) - M² .          (3.9)

It is worth mentioning that, given the analogy with mechanics (as discussed in the previous section), we can regard the variance of the sample (equation (3.7)) as the moment of inertia of the system of corresponding mass points with respect to M.



Example 3.7:   Let us compute the variance S² of the sample ξ given in example 3.3, by using equation (3.8b).

               First, we compute the first term E(ξ²) as follows:

               E(ξ²) = (1/n) Σᵢ₌₁ⁿ ξᵢ² = (1/9)(1+4+16+1+1+4+1+1+4) = 33/9 .

               Substituting in equation (3.8b), and knowing that M = 1 2/3 from example 3.6, we get:

               S² = var (ξ) = 33/9 - (15/9)² = 297/81 - 225/81 = 72/81 = 8/9 ≅ 0.89 .

               Taking the square root of the computed variance, we obtain the standard deviation of the sample as:

               S = √(8/9) = (2√2)/3 = 2.828/3 ≅ 0.943 .

               The same result is obtained if we use equation (3.9); firstly we have

               Σⱼ₌₁ᵐ dⱼ² P(dⱼ) = 1 · (5/9) + 4 · (3/9) + 16 · (1/9) = (1/9)(5+12+16) = 33/9 ,

               and since M = 1 2/3, we obtain

               S² = 33/9 - (15/9)² = 8/9 ≅ 0.89 .

It should be noted here that the same value for the sample

variance can be obtained from equations (3.6) and (3.7). The verification is left to the student (e.g. using the data from the above example).

However, equation (3.9) is advantageous from the computational point of

view, especially for large samples. A similar statement holds for

computing the sample mean M using equation (3.4).
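A quick numerical confirmation of Example 3.7 (Python sketch, illustrative only):

    from collections import Counter
    from fractions import Fraction

    sample = [1, 2, 4, 1, 1, 2, 1, 1, 2]
    n = len(sample)
    M = Fraction(sum(sample), n)                       # 5/3

    # Equation (3.6): S^2 = (1/n) * sum of (xi_i - M)^2.
    S2_direct = sum((x - M) ** 2 for x in sample) / n

    # Equation (3.9): S^2 = sum of d_j^2 P(d_j) - M^2.
    counts = Counter(sample)
    S2_short = sum(Fraction(c, n) * d * d for d, c in counts.items()) - M ** 2

    print(S2_direct, S2_short, float(S2_short) ** 0.5)   # 8/9 8/9 0.9428...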

3.1.5 Other "Characteristics" of a Sample: Median and Range

The median, Med(ξ), of the sample ξ = (ξ₁, ξ₂, ..., ξₙ) is defined differently for n odd and for n even. For n odd, Med(ξ) equals the ξ that is in the middle of the ordered progression ξ, that is

    Med(ξ) = ξ_{(n+1)/2} .          (3.10)

For n even, Med(ξ) is the mean of ξ_{n/2} and ξ_{n/2+1}, that is:

    Med(ξ) = (1/2) (ξ_{n/2} + ξ_{n/2+1}) .          (3.11)

Example 3.8:   Consider the sample ξ ≡ (5, 3, 6, 4, 1, 2).

               To obtain Med(ξ), we first arrange the sample in either ascending or descending order, for instance: ξ ≡ (1, 2, 3, 4, 5, 6), n = 6. Since we have n even, we get:

               Med(ξ) = (1/2)(ξ₃ + ξ₄) = (1/2)(3 + 4) = 3.5 .

               Similarly, the ascending progression of the sample ξ given in example 3.3 is:

               ξ ≡ (1, 1, 1, 1, 1, 2, 2, 2, 4), n = 9.

               In this case n is odd, and we get:

               Med(ξ) = ξ₅ = 1 .

The range, Ra(ξ), of the sample ξ ≡ (ξᵢ, i = 1, 2, ..., n) is defined as the difference between the largest (ξ_ℓ) and the smallest (ξ_s) elements of ξ, that is:

    Ra(ξ) = ξ_ℓ - ξ_s .          (3.12)

Consequently, for an ascendingly ordered sample ξ, we get

    Ra(ξ) = ξ_n - ξ_1 .          (3.12a)

Note that the range of the sample can be also determined from its definition set D ≡ {dⱼ, j = 1, 2, ..., m}. The corresponding expressions to (3.12) and (3.12a), respectively, are:

    Ra(ξ) = Ra(D) = d_ℓ - d_s ,          (3.13)

and Ra(ξ) = Ra(D) = d_m - d_1 .          (3.13a)

Example 3.9:   From example 3.8, we have the ascendingly ordered sample: (1, 1, 1, 1, 1, 2, 2, 2, 4), n = 9, whose definition set is D ≡ {1, 2, 4}, m = 3.

               To obtain the range, we use either equation (3.12a), i.e.

               Ra(ξ) = ξ_n - ξ_1 = 4 - 1 = 3 ,

               or we use equation (3.13a), i.e.

               Ra(ξ) = Ra(D) = d_m - d_1 = 4 - 1 = 3 .

At this point, we can summarize the different characteristics of the sample ξ originated in example 3.3, as computed in the last three sections, namely: M = 1.6̄ , S² = 0.8̄ , S ≅ 0.94 , Med(ξ) = 1 and Ra(ξ) = 3.

(Note that the "bar" above the last digit means that it is a periodic number.)
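The median and range are one-liners to check; a small Python sketch (illustrative only) using the samples of Examples 3.8 and 3.9:

    from statistics import median

    xi_even = [5, 3, 6, 4, 1, 2]                 # n = 6 (even)
    xi_odd  = [1, 2, 4, 1, 1, 2, 1, 1, 2]        # n = 9 (odd), Example 3.3

    # Median: middle element for odd n, mean of the two middle ones for even n.
    print(median(xi_even))                       # 3.5
    print(median(xi_odd))                        # 1

    # Range: largest element minus smallest element (equation 3.12).
    print(max(xi_odd) - min(xi_odd))             # 3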

3.1.6 Histograms and Polygons

From now on, the number of elements n of a sample ξ will be called the size of the sample. A sample with large size n is often divided into classes (categories). Each class is a group of nᵢ individual elements (nᵢ < n). To achieve this, we usually determine the range of ξ (see section 3.1.5), and then divide the range into k intervals* by (k+1) class-boundaries (class-limits). It is usual to make the intervals equidistant. The difference between the upper and lower boundaries of a class is called the class-width. The number c of elements in each class is called the class-count (class-frequency). This process in statistics is called classification of the sample. The "box" (or rectangular) graphical representation of the classified sample is called the histogram of the sample.

Example 3.10:  Let us have the following random sample ξ:

               ξ ≡ (17, 3, 2, 8, 1, 5, 2, 4, 6, 15, 8, 9, 2, 3, 10, 9,
                    11, 12, 4, 5, 8, 6, 7, 4, 5),  n = 25.

* The interval from a to b is either:
      open,         denoted by  (a, b) = {x : a < x < b} ,
      closed,           "   "   [a, b] = {x : a ≤ x ≤ b} ,
      open-closed,      "   "   (a, b] = {x : a < x ≤ b} ,
      closed-open,      "   "   [a, b) = {x : a ≤ x < b} .
To reconcile this known notation with the terminology of the theory of sets, it has to be understood that any such interval can be regarded as a set. To distinguish such a set from a point set, we shall call it a compact set.

               First, we compute the range of ξ using equation (3.12), i.e.

               Ra(ξ) = ξ_ℓ - ξ_s = 17 - 1 = 16.

               Let us use four intervals:

               [1,5], (5,9], (9,13] and (13,17].

               Hence, the class-counts will be:

               c1([1,5]) = 12,  c2((5,9]) = 8,
               c3((9,13]) = 3  and  c4((13,17]) = 2.

               The histogram of the given sample in this example is shown in Figure 3.4, in which the horizontal axis represents the class boundaries and the ordinates represent the class-frequencies cᵢ (see the left-hand scale).

Figure 3.4

Note in the above figure that a rectangle is drawn over each interval with constant height equal to the corresponding class-count.
It is usually required that the area of, or under, the histogram has to be equal to one. Assume that we have k classes with corresponding class-counts cᵢ such that Σᵢ₌₁ᵏ cᵢ = n. Let us denote the class-width, assumed to be constant, by Δ. Hence, the area a of the histogram is given by:

    a = Δ(c1 + c2 + ... + cₖ) = Δ Σᵢ₌₁ᵏ cᵢ = Δn.

This means that the area under the histogram equals the class-width multiplied by the size of the sample.

Therefore, to make the area of the histogram equal to one, we simply have to divide each ordinate cᵢ by the quantity nΔ. The new (transformed) ordinate c̃ᵢ is also called the relative count (compare this to the relative count mentioned in section 3.1.2, which represents the experimental probability of an individual element; however, here we are dealing with counts in an interval).

Example 3.11:  Using the data from example 3.10, we have:
               n = 25 and Δ = 4. The quantity nΔ = 25 · 4 = 100.

               Hence, to compute the relative counts c̃ᵢ of the classified sample ξ, we divide each ordinate cᵢ (obtained in example 3.10) by 100. This gives us:

               c̃1 = 12/100 = 0.12,  c̃2 = 8/100 = 0.08,
               c̃3 = 3/100 = 0.03  and  c̃4 = 2/100 = 0.02.

               The histogram of the sample in this case will be the same as in example 3.10, with the only difference that the ordinate scale is going to be changed (see Figure 3.4, the right-hand scale).

               Using the relative counts c̃ᵢ, the area "a" under the histogram equals to one, as we can see from the following computation (using Figure 3.4):

               a = 4 · 0.12 + 4 · 0.08 + 4 · 0.03 + 4 · 0.02
                 = 0.48 + 0.32 + 0.12 + 0.08 = 1.0,

               which may be used as a check on the correctness of computing c̃ᵢ.

Let us denote the largest and the smallest abscissas of a histogram by ℓ and s, respectively (e.g. in Figure 3.4, ℓ = 17 and s = 1). Notice that for any subinterval D' = [a, b] of the interval [s, ℓ], we can compute the area a(D') under the histogram. This a(D') will be given as a real number from [0, 1]. Hence, a can be regarded as a function mapping any subinterval of [s, ℓ] onto [0, 1]. Therefore, it is easy to see that a can be considered as a probability function (see section 2.1), more specifically one of the possible probability distribution functions (PDF's) of the sample. Obviously, such a PDF (i.e. a) depends on the particular accepted classification of the sample.

From the above discussion, we find that the probability of any subinterval of [s, ℓ] is represented by the corresponding area under the histogram. On the other hand, the ordinates of the histogram do not represent probabilities (again compare the histogram with the bar diagram given in section 3.1.2).

Example 3.12:  Referring to Figure 3.4, we may ask: what is the probability of D' = [6, 10]; or, what is the probability that the sample element, say x, lies between 6 and 10. This can be written as:

               P(6 ≤ x ≤ 10) = ?

               The answer will be given by the area under the histogram between 6 and 10 (which is shaded in Figure 3.4), i.e.

               P(6 ≤ x ≤ 10) = P(6 ≤ x ≤ 9) + P(9 < x ≤ 10)
                             = (9-6) · 0.08 + (10-9) · 0.03
                             = 3 · 0.08 + 1 · 0.03 = 0.24 + 0.03
                             = 0.27 .

               On the other hand, by inspecting the actual sample ξ originated in example 3.10, we find out that the actual number of elements in the interval [6, 10] is nine. This number represents (9/25) · 100% = 36% of the sample. Or, we say that the actual probability P(6 ≤ x ≤ 10) = 0.36, which does not agree precisely with the result obtained when using the corresponding histogram (i.e. 0.27).
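The classification of Example 3.10 and the interval probability of Example 3.12 can be reproduced with a few lines of Python (a sketch for illustration; the class boundaries are the ones chosen in the text):

    sample = [17, 3, 2, 8, 1, 5, 2, 4, 6, 15, 8, 9, 2, 3, 10, 9,
              11, 12, 4, 5, 8, 6, 7, 4, 5]
    n = len(sample)
    bounds = [1, 5, 9, 13, 17]          # four equidistant classes
    width = 4                           # class-width (Delta)

    # Class-counts: first class closed [1,5], the rest open-closed (lo, hi].
    counts = []
    for i, (lo, hi) in enumerate(zip(bounds[:-1], bounds[1:])):
        if i == 0:
            counts.append(sum(lo <= x <= hi for x in sample))
        else:
            counts.append(sum(lo < x <= hi for x in sample))
    print(counts)                                     # [12, 8, 3, 2]

    # Relative counts c_i / (n * width); the histogram area is then 1.
    rel = [c / (n * width) for c in counts]
    print(rel)                                        # [0.12, 0.08, 0.03, 0.02]
    print(sum(width * r for r in rel))                # 1.0

    # Probability of [6, 10] from the histogram, as in Example 3.12.
    print(3 * rel[1] + 1 * rel[2])                    # about 0.27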



The difference between the actual probability and the computed

probability using the histogram, as experienced in example 3.12, is

largely dependent on the chosen classification of the sample (selection

of the class-intervals). Usually, one gets a smaller difference (better

agreement) by selecting the class-boundaries so as not to coincide with

any of the elements of the given sample. The construction of histograms

can be considered a subject in its own right. We are not going to

venture into this subject any deeper.

Example 3.13:  If we, for instance, use the following classification (for the sample ξ given in example 3.10): [0.7, 4.8], [4.8, 8.9], [8.9, 13] and [13, 17.1], i.e. we have again four equal intervals, for which Δ = 4.1, then we get the class-counts as c1 = 9, c2 = 9, c3 = 5 and c4 = 2. The quantity nΔ = 25 · 4.1 = 102.5. Hence, the relative counts are:

               c̃1 = c̃2 = 9/102.5 ≅ 0.0878,
               c̃3 = 5/102.5 ≅ 0.0488, and
               c̃4 = 2/102.5 ≅ 0.0195.

               In this case, the new histogram of the sample ξ is shown in Figure 3.5.

Figure 3.5

               The probability P(6 ≤ x ≤ 10) is computed as follows (shaded area in Figure 3.5):

               P(6 ≤ x ≤ 10) = 2.9 · 0.0878 + 1.1 · 0.0488
                             ≅ 0.2546 + 0.0537
                             = 0.3083 ≅ 0.31,

               which gives a better agreement with the actual probability than the classification used in example 3.11.

The graphical representation of a histogram, which uses the central point of each box (class-midpoint) and its ordinate (the corresponding relative class-count), is called a polygon.

In order to make the total area under the polygon equal to one we have to add one more class interval on each side (tail) of the corresponding histogram. The midpoints s' and ℓ' of these two, lower and upper, tail intervals are used to close the polygon.

Therefore, it can be easily seen that the area a' under the polygon has again the properties of probability. This means that a' is one of the possible PDF's of the sample. Hence a' can be used for determining the probability of any D' = [a, b] ⊂ [s', ℓ']. Note also here that the ordinates of the polygon do not represent probabilities.

Example 3.14:  The polygon corresponding to the histogram of Figure 3.4 is illustrated in Figure 3.6.

Figure 3.6

Similar to the histogram, the area"a"under

the polygon should be equal to one. To show

that this is the case, we compute"a"using

Figure 3.6 as:

a = '4' (1.2
' • 0.12 + ~
1 ( 0.12 + 0.08 ) ~

+1. (0.08 + 0.03) +


2
~ (0.03 + 0.02) +
1
+ 2 • 0.02}

= 2(0.12 + 0.20 + 0.11 + 0.05 + 0.02)

= 2 (0.50) = 1.00.
Let us compute the probability P(6~x~l0)

using the polygon (the required probability

is represented by the shaded area in Figure

3.6}. To achieve this, we first have to

interpolate the ordinates corresponding to 6

and 10, which are found to be 0. 090 and

0.0425, respectively. Therefore, the

required probability is:

P(6~x~l0) = P(6~x~7) + P(7~x~l0)


1 1
= 1.2(0.09+0.08)+3.~0.08+0.0425)
= 1.0.085 + 3.0.06125
= 0.085 + 0.184; 0.27,
which is the same as the value

obtained when using the corresponding histogram.


43

So far, we have constructed the histogram and the polygon

corresponding to the PDF of a sample. Completely analogously, we may

construct the histogram and the polygon corresponding to the CDF of the

sample which will be respectively called the cumulative histogram and the

cumulative polygon. In this case, we will use a modified form of equation

(3.2), namely

    C(a) = P(x ≤ a) = Σ_{xᵢ ≤ a} P(xᵢ₋₁ < x ≤ xᵢ) .          (3.14)

Example 3.15:  Let us plot the cumulative histogram and cumulative polygon of the sample ξ used in the examples of this section.

               For the cumulative histogram, we get the following by using Figure 3.4:

               C(1) = P(1) = 0 (remember that the probability of individual elements from the histogram or polygon is always zero),

               C(5) = P[1,5] = 4 · 0.12 = 0.48,

               C(9) = C(5) + P(5,9] = 0.48 + 4 · 0.08 = 0.48 + 0.32 = 0.80,

               C(13) = C(9) + P(9,13] = 0.80 + 4 · 0.03 = 0.80 + 0.12 = 0.92,

               C(17) = C(13) + P(13,17] = 0.92 + 4 · 0.02 = 0.92 + 0.08 = 1.00.

               Figure 3.7 is a plot of the above computed cumulative histogram.

Figure 3.7

               For the cumulative polygon, we get the following by using Figure 3.6:

               C(-1) = 0,

               C(3) = P[-1,3] = (1/2)(4 · 0.12) = (1/2)(0.48) = 0.24,

               C(7) = C(3) + P(3,7] = 0.24 + (1/2) · 4(0.12 + 0.08) = 0.24 + 0.40 = 0.64,

               C(11) = C(7) + P(7,11] = 0.64 + (1/2) · 4(0.08 + 0.03) = 0.64 + 0.22 = 0.86,

               C(15) = C(11) + P(11,15] = 0.86 + (1/2) · 4(0.03 + 0.02) = 0.86 + 0.10 = 0.96,

               C(19) = C(15) + P(15,19] = 0.96 + (1/2) · 4 · 0.02 = 0.96 + 0.04 = 1.00.

               Figure 3.8 is a plot of the above computed cumulative polygon (note here, as well as in Figure 3.7, the properties of the CDF mentioned in example 3.4).

Figure 3.8

By examining Figures 3.7 and 3.8, we can see that the cumulative

polygon uses the central point of each class-interval along with its

ordinate from the corresponding cumulative histogram. Therefore, the

relationship between the cumulative polygon and its corresponding

cumulative histogram is exactly the same as the relationship between the

polygon and its corresponding histogram.

Because of the nature of the CDF, we can see that the cumulative

probability - represented by an area under the PDF extending to the left-

most point - is represented just by an ordinate of the cumulative histogram

or the cumulative polygon. Hence the cumulative histogram or the cumulative

polygon can be used to determine the probability P[a,b], a<b, simply by

subtracting the ordinate corresponding to a from the one corresponding to b.



Example 3.16: Let us compute the probability P[6,10] by

using:

(i) the cumulative histogram of Figure 3.7,

(ii) the cumulative polygon of Figure 3.8.

First, we get the following by using Figure 3.7:

The interpolated ordinates corresponding to 6

and 10 are found to be 0.56 and 0.83, respect-

ively. Therefore, P[6,10] = P(6 ≤ x ≤ 10) = 0.83 - 0.56 = 0.27, which is the same value as the one obtained when using the histogram (example 3.12).

Second, we get the following by using Figure 3.8:

The interpolated ordinates corresponding to 6

and 10 are found to be 0.54 and 0.805,

respectively. Therefore:

P[6,10] = P(6 ≤ x ≤ 10) = 0.805 - 0.54 ≅ 0.27 ,

which is again the same value as the one

obtained when using the polygon (example 3.14).
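The subtraction rule of Example 3.16 is straightforward to script; a Python sketch (illustrative only), using the cumulative-histogram ordinates computed in Example 3.15 and linear interpolation between the class boundaries:

    # Cumulative histogram of Example 3.15: C at the class boundaries.
    bounds = [1, 5, 9, 13, 17]
    C = [0.0, 0.48, 0.80, 0.92, 1.00]

    def C_interp(x):
        # Linear interpolation of the cumulative histogram between boundaries.
        for lo, hi, c_lo, c_hi in zip(bounds[:-1], bounds[1:], C[:-1], C[1:]):
            if lo <= x <= hi:
                return c_lo + (x - lo) / (hi - lo) * (c_hi - c_lo)
        raise ValueError("x outside [1, 17]")

    # P[6, 10] = C(10) - C(6), as in Example 3.16.
    print(round(C_interp(6), 2), round(C_interp(10), 2))   # 0.56 0.83
    print(round(C_interp(10) - C_interp(6), 2))            # 0.27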

To close this section, we should point out that both the histograms and the polygons (non-cumulative as well as cumulative) can be refined by refining the classification of the sample. Note that this

refinement makes the diagrams look smoother.



3.2 Statistics of a Random Variable

3.2.1 Random (Stochastic) Function and Random (Stochastic) Variable

In order to be able to solve the problems connected with inter-

val probabilities (see the histograms and polygons of section 3.1.6) more

easily and readily, the science of statistics has developed a more con-

venient approach. This approach is based on the replacement of the

troublesome numerical functions defined on the discrete definition set

of a random sample, by more suitable functions. To do so, we first

define two idealizations of the real world: the random (stochastic)

function and the random (stochastic) variable.

A random or stochastic function is defined as a function x mapping an unknown set U* into R, that is

    x ∈ {U → R} .

(Later on, concepts of multi-valued x ∈ {U → Rᵐ} (where Rᵐ is the Cartesian m-power of R, see section 1.3) are developed.)

This statement is to be understood as follows: For any value of the argument u ∈ U, the stochastic function x assumes a value x(u) ∈ R. But, because the set U is considered unknown, there is no way any formula for x can be written and we have to resort to the following "abstract experiment" to show that the concept of random functions can be used.

* Note that in experimental sciences the set U may be fully or at least partly known. The science of statistics, however, assumes that it is either not known, or works with the unknown part of it only.
Suppose that the function x is realised by a device or a process (see the sketch) that produces a functional value x(u) every time we trigger it.

    u  →  [ x ]  →  x(u)

Knowing nothing about the inner workings of the process, all we can do is to record the outcomes x(u). When a large enough number of values x(u) have been recorded, we can plot a histogram showing the relative count of the x(u) values within any interval [x₀, x₁]. In this abstraction we can imagine that we have collected enough values to be able to compute the relative counts for any arbitrarily small interval dx and thus obtain a "smooth histogram". Denoting the limit of the relative count divided by the width dx of the interval [x, x + dx], for dx going to zero, by φ(x), we end up with a function φ that maps x ∈ R into R.

Going now back to the realm of mathematics, we see that the outcome of the stochastic function can be viewed as a pair (x(u), φ(x)). This pair is known as the random (stochastic) variable. It is usual in literature to refer just to the values x(u) as random variable with the tacit understanding that the function φ is also known.

We note that the function φ is thus defined over the whole set of real numbers R and has either positive or zero values, i.e. φ is non-negative on all R. Further, we shall restrict ourselves to only such φ that are integrable on R in the Riemannian sense, i.e. are at least piece-wise continuous on R.

3.2.2 PDF and CDF of a Random Variable

The function φ described in 3.2.1, belonging to the random variable x, is called the probability distribution function (PDF) of the random variable. It can be regarded as equivalent to the experimental

PDF (see 3.1.2) of a random sample. From our abstract experiment it

can be seen that

    ∫_{-∞}^{∞} φ(x) dx = 1          (3.15)

since the area under the "smooth histogram" must again equal to 1 (see 3.1.6). This is the third property of a PDF, the integrability and non-negativeness being the first two. We note that eq. 3.15 is also the necessary condition for φ(x) dx to be called probability (see 2.1).

Figure 3.9 shows an example of one such PDF, i.e. φ, in which the integral (3.15) is illustrated by the shaded area under the φ.

Figure 3.9

The definite integral of the PDF, φ, over an interval D' ⊂ D is called the probability of D'. So, we have in particular:

    ∫_{-∞}^{x₀} φ(x) dx = P(x ≤ x₀) ∈ [0, 1] ,          (3.16a)

    ∫_{x₀}^{∞} φ(x) dx = P(x ≥ x₀) ∈ [0, 1] ,          (3.16b)

    ∫_{x₁}^{x₂} φ(x) dx = P(x₁ ≤ x ≤ x₂) ∈ [0, 1] .          (3.16c)

Consequently,

    P(x ≥ x₀) = 1 - P(x ≤ x₀) .          (3.17)
The integrals (3.16a), (3.16b) and (3.16c) are represented by

the corresponding shaded areas in Figure 3.10: a, b, and c, respectively.

Figure 3.10

At this point, the difference between discrete and compact probability spaces should be again borne in mind. In the discrete space, the value of the PDF at any point, which is an element of the discrete definition set of the sample, can be interpreted as a probability (section 3.1.2). However, in the compact space, it is only the area under the PDF that has got the properties of probability.* We have already met this problem when dealing with histograms.

* The whole development for the discrete and the compact spaces could be made identical using either Dirac's functions or a more general definition of the integral.
Note further that:

    P(x = x₀) = ∫_{x₀}^{x₀} φ(x) dx = 0 *) .

Analogous to section 3.1.2, the function Ψ defined as

    Ψ(x) = ∫_{−∞}^{x} φ(y) dy   ∈ {R → [0, 1]}                 (3.18)

where y ∈ R is a dummy variable in the integration, is called a CDF
provided that φ is a PDF. Ψ is again a non-negative, never decreasing
function, and determines the probability P(x ≤ x₀) (compare this
with section 3.1.2); namely:

    Ψ(x₀) = P(x ≤ x₀) ∈ [0, 1] .                               (3.19)

Figure 3.11 shows how the CDF Ψ (corresponding to the PDF in
Figure 3.9) would look.

[Figure 3.11: the CDF Ψ(x), rising from 0 towards Ψ(x) = 1 as x → +∞.]

* This may not be the case for a more general definition of the integral,
or for φ being the Dirac function.

If φ is symmetrical, Ψ will be "inversely symmetrical" around the
axis Ψ(x) = 1/2. Figure 3.12 is an example of such a case.

[Figure 3.12: a symmetrical (bell-shaped) PDF φ(x) and its CDF Ψ(x), which approaches Ψ(x) = 1.]

Note that Ψ is the primitive function of φ, since we can write:

    φ(x) = dΨ(x)/dx .

In addition, we can see that φ(x) has to vanish in the infinities in order
to satisfy the basic condition:

    ∫_{−∞}^{+∞} φ(x) dx = 1 .

Hence, we have:

    lim_{x→−∞} Ψ(x) = 0 ,      lim_{x→+∞} Ψ(x) = 1 .
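To make the relation between φ and Ψ concrete, the following is a minimal
numerical sketch: a PDF is integrated cumulatively to obtain its CDF, and a
probability of the type (3.16c) is read off it. The Gaussian shape of φ and the
numerical values are assumed purely for illustration.

```python
import numpy as np

def phi(x, mu=0.0, sigma=1.0):
    # an assumed symmetrical, bell-shaped PDF used only as an illustration
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-8, 8, 20001)
pdf = phi(x)
cdf = np.cumsum(pdf) * (x[1] - x[0])        # Psi(x): running integral of phi

print(cdf[-1])                              # ~1, the condition (3.15)
# P(-1 <= x <= 1) = Psi(1) - Psi(-1), cf. (3.16c) and (3.19)
print(np.interp(1.0, x, cdf) - np.interp(-1.0, x, cdf))
```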

3.2.3 Mean and Variance of a Random Variable

It is conceivable that the concept of a random variable is useless

if we do not know (or assume) its PDF. On the other hand, we do not have the

one-to-one relation between the random variable and its PDF as we had with
52

'
the random samples (section 3.1.1 and 3.1.2). The random variable acts

only as an argument for the PDF.

The random variable can be thus regarded as an argument of the

function called PDF, that runs from minus infinity to plus infinity.

Therefore, strictly speaking, we cannot talk about the "mean" and the

"variance" of a random variable, in the same sense as we have talked

about the "mean" and the "var-iance" of a random sample. On the other

hand, we can talk about the value of the argument of the centre of gravity

of the area under the PDF. Similarly, we can define the variance related

to the PDF. It has to be stated, however, that it is a common practice

to talk about the mean and the variance of the random variable; and

this is what we shall do here as well.

The mean μ of the random variable x is defined as:

    μ = ∫_{−∞}^{+∞} x φ(x) dx .                                (3.20)

Note the analogy of (3.20) with equation (3.4), section 3.1.3.

μ is often written again in terms of an operator E*†; usually
we write

    E*(x) = μ = ∫_{−∞}^{+∞} x φ(x) dx .                        (3.21)

† E* is again an abbreviation for the mathematical expectation, similar
to the operator E mentioned in section 3.1.3. However, we use the
"asterisk" here to distinguish between both summation procedures, namely:
E implies the summation using Σ; and E* implies the summation using ∫.
53

We can see that the argument in the operator E* is x·φ(x) rather than x,
x being just a dummy variable in the integration. However, we shall
again use the customary notation to conform with the existing literature.

We have again the following properties of E*, where k is a
constant:

    (i)   E*(kx) = k E*(x);

    (ii)  E*(x¹ + x² + ... + xʳ) = E*(x¹) + E*(x²) + ... + E*(xʳ), where x¹, x², ..., xʳ
          are r different "random variables", i.e., r random
          variables with appropriate PDF's;

    (iii) and we also define:

          E*(E*(x)) = E*(x) = μ ††).


The variance σ² of a random variable x with mean μ is defined as:

    σ² = ∫_{−∞}^{+∞} (x − μ)² φ(x) dx .                        (3.22)

Note the analogy of (3.22) with equation (3.8), section 3.1.4. The
square root of σ², i.e. σ, is again called the standard deviation of the
random variable.

Carrying out the operation prescribed in (3.22) we get:

    σ² = ∫_{−∞}^{+∞} [x² φ(x) − 2xμ φ(x) + μ² φ(x)] dx

       = ∫_{−∞}^{+∞} x² φ(x) dx − 2μ ∫_{−∞}^{+∞} x φ(x) dx + μ² ∫_{−∞}^{+∞} φ(x) dx .

†† In order to prove this equation, one has to again use the Dirac
function as the PDF of E*(x).
54

In the above equation, we know that the integral in the second
term equals μ (equation (3.20)), and the integral in the last term equals
one (equation (3.15)). Therefore, by substituting we get:

    σ² = ∫_{−∞}^{+∞} x² φ(x) dx − μ² .                         (3.23)

Note the similarity of the first term in equation (3.23) with
E(ξ²) = Σ_{j=1}^{m} d_j² P(d_j) (section 3.1.4). This gives rise to an often used
notation:

    σ² = E*(x²) − μ² .                                         (3.23a)

We shall again accept this notation as used in the literature, bearing in
mind that E* is not operating on the argument, but on the product of the
argument with its PDF.

The expression

    m_r = ∫_{−∞}^{+∞} x^r φ(x) dx                              (3.24)

is usually called the r-th moment of the PDF (random variable); more
precisely, the r-th moment of the PDF about zero. On the other hand,
the r-th central moment of the PDF is given by:

    m'_r = ∫_{−∞}^{+∞} (x − μ)^r φ(x) dx .                     (3.25)

By inspecting the above expressions for m_r and m'_r along with
equations (3.20) and (3.22), we can see that:

    μ = m₁                                                     (3.26a)

and

    σ² = m'₂ = m₂ − μ² = m₂ − m₁² .                            (3.26b)

Compare the above results (3.26a, b) with the analogy to mechanics men-
tioned in sections 3.1.3 and 3.1.4.
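The moments (3.20), (3.22) and (3.24) lend themselves directly to numerical
integration. The following is a minimal sketch, with an assumed Gaussian-shaped
PDF and assumed μ and σ values, verifying (3.26a) and (3.26b).

```python
from scipy.integrate import quad
import numpy as np

def phi(x, mu=2.0, sigma=0.5):           # assumed illustrative PDF
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

moment = lambda r: quad(lambda x: x**r * phi(x), -np.inf, np.inf)[0]

m1 = moment(1)                           # mu = m_1            (3.26a)
m2 = moment(2)
print(m1, m2 - m1**2)                    # sigma^2 = m_2 - m_1^2  (3.26b)
```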

3.2.4 Basic Postulate (Hypothesis) of Statistics, Testing

The basic postulate of statistics is that "any random sample has

got a parent random variable". This parent random variable x~R is usually

called population and is considered to be infinite. It is common in stat-

istics to postulate the PDF of the population for any random sample, and

call it the postulated, or the underlying PDF. Such a postulate may be

hence tested for statistical validity.

In order to be able to test the statistical validity we have to

assume that the sample can be regarded as having been picked out, or drawn

from the . population, each element of the sample independently from the rest.

~his additional property of a sample is required by the standard definition

of a random sample as used in statistical literature. However, since the

present Introduction does not deal with statistical testing we shall keep

using our original, more general definition.

There are infinitely many families of PDF's. Every such family is

defined by one or more independent parameters, whose values characterize the

shape of its PDF. The individual members of a family vary according to the

values of these parameters. It is common to use, if possible, the mean and the
standard deviation as the PDF's parameters. The fewer parameters the family of
PDF's contains, the better: the easier it is to work with.

The usual technique is that we first select the "appropriate"
family of PDF's on the basis of experience and then try to find such values
of its parameters that would fit the actual random sample the best. In
other words, the shape of the postulated φ(x) is chosen first; then, its
parameters are computed using some of the known techniques.

Since we shall be dealing with the samples and the random var-

iables (populations) at the same time, we shall use, throughout these notes,

the latin letters for the sample characteristics, and the corresponding greek

letters for the population characteristics as we have done so far.

3.2.5 Two Examples of a Random Variable

Example 3.17: As the first example, let us investigate a random variable x
with a rectangular (uniform) PDF which is symmetrical
around a value x = k. Let the probability
of x < k−q and x > k+q be zero. Obviously, this PDF has the
following analytical form (see Figure 3.13):

[Figure 3.13: the rectangular PDF, with its axis of symmetry at x = k and
P(k−q ≤ x ≤ k+q) = 1.]

    φ(x) =  h, for (k − q < x < k + q)
            0, for (x < k − q) and (x > k + q).

This can be written in an abbreviated form as:

    φ(x) =  h, for (|x − k| ≤ q)
            0, for (|x − k| > q).

The above φ contains apparently three parameters k, q and h.
However, only two are independent, since one can be eliminated
from the condition (3.15), i.e.:

    ∫_{−∞}^{+∞} φ(x) dx = 1

that must be satisfied for any φ to be a PDF. Let us
eliminate for instance the parameter h. We can write:

    ∫_{−∞}^{+∞} φ(x) dx = ∫_{−∞}^{k−q} φ(x) dx + ∫_{k−q}^{k+q} φ(x) dx + ∫_{k+q}^{+∞} φ(x) dx

                        = 0 + ∫_{k−q}^{k+q} h dx + 0

                        = h ∫_{k−q}^{k+q} dx = h [x]_{k−q}^{k+q} = 2hq = 1.

This means that h = 1/(2q), and therefore:

    φ(x) =  1/(2q), for (|x − k| ≤ q)
            0,      for (|x − k| > q).

The corresponding CDF to the above φ is:

    Ψ(x) =  0, for (x ≤ k − q)

            ∫_{−∞}^{x} φ(x) dx = (1/(2q)) ∫_{k−q}^{x} dx = (1/(2q))(x − k + q), for (|x − k| ≤ q)

            1, for (x ≥ k + q),

and is shown in Figure 3.14.

[Figure 3.14: the CDF Ψ(x) of the rectangular PDF, rising linearly from 0 at
x = k − q through 0.5 at x = k to 1.0 at x = k + q.]

From the above figure we see that the function Ψ is linear in the
interval over which φ ≠ 0, and is constant everywhere else. Note
that:

    φ(x) = dΨ(x)/dx .

The mean of the given PDF is computed from equation (3.20) as
follows:

    μ = ∫_{−∞}^{+∞} x φ(x) dx = (1/(2q)) ∫_{k−q}^{k+q} x dx = (1/(2q)) [x²/2]_{k−q}^{k+q}

      = (1/(4q)) (k² + 2kq + q² − k² + 2kq − q²)

      = 4kq/(4q) = k.
This result satisfies our expectation for a symmetrical function
around k. The variance of the given PDF can be obtained from
equation (3.22), yielding

    σ² = ∫_{−∞}^{+∞} (x − k)² φ(x) dx = (1/(2q)) ∫_{k−q}^{k+q} (x − k)² dx

       = (1/(2q)) ∫_{k−q}^{k+q} x² dx − (2k/(2q)) ∫_{k−q}^{k+q} x dx + (k²/(2q)) ∫_{k−q}^{k+q} dx

       = (1/(2q)) [x³/3]_{k−q}^{k+q} − 2k² + k²

       = k² + q²/3 − k² = q²/3 .

Since k = μ and q = √3 σ, then h = 1/(2q) = 1/(2√3 σ), and we can
express the given rectangular PDF, which we will denote by
R, in terms of its mean μ and its standard deviation σ as
follows:

    R(μ, σ; x) = φ(x) =  1/(2√3 σ), for (|x − μ| ≤ √3 σ)
                         0,         for (|x − μ| > √3 σ).

Similarly, we can express its corresponding CDF, which we will
denote by R_c, in terms of μ and σ, as follows:

    R_c(μ, σ; x) =  0, for (x ≤ μ − √3 σ)

                    (1/(2√3 σ))(x − μ + √3 σ), for (|x − μ| ≤ √3 σ)

                    1, for (x ≥ μ + √3 σ).
Assume that we would like to compute the probability of
x ∈ [μ−σ, μ+σ], where x has the rectangular PDF.
This can be done by using equation (3.16c) and Figure 3.15, as
follows:

[Figure 3.15: the rectangular PDF between (μ − √3σ) and (μ + √3σ), with the
area between (μ − σ) and (μ + σ) shaded.]

    P(μ−σ ≤ x ≤ μ+σ) = ∫_{μ−σ}^{μ+σ} φ(x) dx

                     = ∫_{μ−σ}^{μ+σ} 1/(2√3 σ) dx = (1/(2√3 σ)) [2σ]

                     = 2σ/(2√3 σ) = 1/√3 ≈ 0.577 ≈ 0.58 .

The above probability is given by the shaded area in Figure
3.15.

Similarly, for this particular uniform PDF, we find that:

    P(μ−2σ ≤ x ≤ μ+2σ) = P(μ−3σ ≤ x ≤ μ+3σ) = 1.0 .

In statistical testing, we often need to compute the moments
of the PDF (see section 3.2.3). Let us, for instance,
compute the third moment m₃ about zero of the rectangular
PDF. We will use equation (3.24), i.e.

    m₃ = ∫_{−∞}^{+∞} x³ φ(x) dx = (1/(2√3 σ)) ∫_{μ−√3σ}^{μ+√3σ} x³ dx

       = (1/(2√3 σ)) (2√3 σ μ³ + 6√3 σ³ μ)

       = μ³ + 3σ²μ .
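The results of this example are easy to verify numerically. The following is a
minimal sketch, with assumed values of μ and σ, checking the mean, the variance,
P(μ−σ ≤ x ≤ μ+σ) and the third moment of the rectangular PDF.

```python
from scipy.integrate import quad
import numpy as np

mu, sigma = 2.0, 0.5                      # assumed illustrative values
q = np.sqrt(3.0) * sigma                  # half-width of R(mu, sigma; x)

R = lambda x: 1.0 / (2 * q) if abs(x - mu) <= q else 0.0

print(quad(lambda x: x * R(x), mu - q, mu + q)[0])              # ~mu
print(quad(lambda x: (x - mu)**2 * R(x), mu - q, mu + q)[0])    # ~q**2/3 = sigma**2
print(quad(R, mu - sigma, mu + sigma)[0])                       # ~1/sqrt(3) = 0.577
print(quad(lambda x: x**3 * R(x), mu - q, mu + q)[0],
      mu**3 + 3 * sigma**2 * mu)                                # m3 both ways
```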

Example 3.18: As a second example, let us investigate a random variable with
a triangular PDF, which is symmetrical around x = k. Let us
assume that the probability of x < k − q and x > k + q be
zero. We may write (see Figure 3.16):

[Figure 3.16: the triangular PDF, rising linearly from 0 at x = k − q to its
maximum h at x = k and falling back to 0 at x = k + q.]

    φ(x) =  0,                  for (x < k − q)
            (h/q)(x − k + q),   for (k − q ≤ x ≤ k)
            (h/q)(k − x + q),   for (k ≤ x ≤ k + q)
            0,                  for (x > k + q).

This can be rewritten in the following abbreviated form as:

    φ(x) =  (h/q)(q − |x − k|), for (|x − k| < q)
            0,                  for (|x − k| ≥ q).

From the above, we can see that the triangular PDF has the
same parameters (k, q, h) as the uniform PDF of example 3.17.
Let us again eliminate the parameter h from the condition
∫_{−∞}^{+∞} φ(x) dx = 1. This integral is nothing else but the area
of the triangle, so that we can write: (1/2) · 2q · h = qh = 1.
This gives us: h = 1/q, and hence,

    φ(x) =  1/q − |x − k|/q²,  for (|x − k| < q)
            0,                 for (|x − k| ≥ q).

The computations of the mean and the variance of the triangular
PDF can be performed by following the same procedure as we have
done for the rectangular PDF in example 3.17. We state here
the results without proof, and the verification is left to the
student.

The mean μ of the given triangular PDF equals k, and the
variance σ² comes out as q²/6.


Since k = μ and q = √6 σ, we can again express the tri-
angular PDF, which we will denote by T, in terms of its mean
μ and its standard deviation σ, as follows:

    T(μ, σ; x) = φ(x) =  1/(√6 σ) − |x − μ|/(6σ²),  for (|x − μ| ≤ √6 σ)
                         0,                          for (|x − μ| > √6 σ).

The corresponding CDF is given by:

    Ψ(x) =  0, for (x ≤ μ − q)

            ∫_{μ−q}^{x} (1/q − |x − μ|/q²) dx, for (|x − μ| ≤ q)

            1, for (x ≥ μ + q),

and is shown in Figure 3.17.

[Figure 3.17: the CDF Ψ(x) of the triangular PDF, with Ψ(μ) = 0.5 and Ψ(x) → 1
at x = μ + q.]

The integral in the above equation can be rewritten as:

    ∫_{μ−q}^{x} (1/q − |x − μ|/q²) dx =

        ∫_{μ−q}^{x} (q + x − μ)/q² dx,                                for (x ≤ μ)

        ∫_{μ−q}^{μ} (q + x − μ)/q² dx + ∫_{μ}^{x} (q − x + μ)/q² dx,  for (x ≥ μ),

and we get:

    ∫_{μ−q}^{x} (q + x − μ)/q² dx = (1/q²) { (1/2)(x² − μ² + 2μq − q²) + (q − μ)(x − μ + q) }

                                  = (1/(2q²)) { x² − 2μx + μ² + 2q(x − μ) + q² }

                                  = (x − μ)²/(2q²) + (x − μ)/q + 1/2 .

Similarly,

    ∫_{μ}^{x} (q − x + μ)/q² dx = −(x − μ)²/(2q²) + (x − μ)/q ,

and

    ∫_{μ−q}^{μ} (q + x − μ)/q² dx = 1/2 .

Finally, we can express the CDF, which we are going
to denote by T_c, in terms of the mean μ and the
standard deviation σ, as follows:
    T_c(μ, σ; x) =  0,  for (x ≤ μ − √6 σ)

                    (x − μ)²/(12σ²) + (x − μ)/(√6 σ) + 1/2,   for (μ − √6 σ ≤ x ≤ μ)

                    −(x − μ)²/(12σ²) + (x − μ)/(√6 σ) + 1/2,  for (μ ≤ x ≤ μ + √6 σ)

                    1,  for (x ≥ μ + √6 σ).

By following the same procedure as in example 3.17, we
can compute the probabilities P(μ−σ ≤ x ≤ μ+σ), P(μ−2σ ≤ x ≤ μ+2σ)
and P(μ−3σ ≤ x ≤ μ+3σ), as well as the third moment m₃ about zero, for
the triangular PDF. Again, we give here the results, and the verification
is left to the student:

    P(μ−σ ≤ x ≤ μ+σ) ≈ 0.65 ,

    P(μ−2σ ≤ x ≤ μ+2σ) ≈ 0.97 ,

    P(μ−3σ ≤ x ≤ μ+3σ) = 1 , and

    m₃ = μ³ + 3σ²μ .
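The verification left to the student can also be sketched numerically; the
following uses assumed values of μ and σ and checks the stated probabilities and
the third moment of the triangular PDF.

```python
from scipy.integrate import quad
import numpy as np

mu, sigma = 1.0, 0.4                      # assumed illustrative values
q = np.sqrt(6.0) * sigma                  # half-width of T(mu, sigma; x)

T = lambda x: max(0.0, (1.0 - abs(x - mu) / q) / q)

for n in (1, 2, 3):                       # P(mu - n*sigma <= x <= mu + n*sigma)
    print(n, quad(T, mu - n * sigma, mu + n * sigma)[0])   # ~0.65, ~0.97, 1.0
print(quad(lambda x: x**3 * T(x), mu - q, mu + q)[0],
      mu**3 + 3 * sigma**2 * mu)          # m3 numerically and from the formula
```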

3.3 Random Multivariate

3.3.1 Multivariate, its PDF and CDF

Analogously to the ideas of stochastic function and stochastic
variable given in section 3.2.1, we introduce the concept of a multi-
valued stochastic function

    X ∈ {U → R^s}

in the s-dimensional space.

We note that X is a vector function, i.e., X(u) can be written
as:

    X(u) = (x¹(u), x²(u), ..., x^s(u)) ∈ R^s,   u ∈ U.

The individual components x^j(u) ∈ R, j = 1, 2, ..., s are called components
or constituents of X(u). We also note that each component x^j of the stoch-
astic function X can be regarded as a random variable (univariate) of its
own. One particular value of x^j may be denoted by x^j_i *) and similarly a
particular value of X may be denoted by

    X_i = (x^1_i, ..., x^s_i) .

Note that a specific value of X is a sequence of real numbers (not a
set), or a numerical vector.

The pair (X(u), φ(X)), where

    φ(X) = φ(x¹, x², ..., x^s)                                 (3-27)

is a non-negative, integrable function on R^s, is called a random multi-
variate or simply a multivariate.

* The superscripts and subscripts here are found very useful to distinguish
between the components x^j, j = 1, 2, ..., s of the multivariate X, and
the elements x^j_i, i = 1, 2, ..., n_j of the univariate (random variable) x^j.
We can speak of a probability of X ∈ [X₀, X₁] ⊂ R^s, and define it
as follows:

    P(X₀ ≤ X ≤ X₁) = ∫_{X₀}^{X₁} φ(X) dX ∈ [0, 1] .            (3-28)

Here the integral sign stands for the s-dimensional integration, dX for
an s-dimensional differential, i.e. dX = (dx¹, dx², ..., dx^s), and
X₀ = (x₀¹, x₀², ..., x₀^s), X₁ = (x₁¹, x₁², ..., x₁^s) are assumed to satisfy
the following inequalities:

    x₀^j ≤ x^j ≤ x₁^j ,   j = 1, 2, ..., s .

Note that in order to be able to call the function φ a PDF, the following
condition has to be satisfied:

    ∫_{R^s} φ(X) dX = 1 .                                      (3-29)

A complete analogy to the one-dimensional, or univariate, case
(section 3.2.2) is the definition of the multivariate CDF. It is defined
as follows:

    Ψ(X) = ∫_{−∞}^{X} φ(Y) dY ∈ {R^s → [0, 1]}                 (3-30)

where Y is an s-dimensional dummy variable in the integration.

Example 3.19: Consider the univariate PDF shown in Figure 3.12. This
bell-shaped PDF is known as the normal or Gaussian PDF (to be
discussed later in more detail), and is usually denoted by N;
in terms of its μ and σ we have

    φ(x) = N(μ, σ; x) .

Then the multivariate normal PDF in two-dimensional space, i.e.
φ(X) = φ(x¹, x²), would appear as illustrated in Figure 3-18.

[Figure 3-18: the bell-shaped surface of the bivariate normal PDF over the
(x¹, x²) plane.]

In the two-dimensional space, φ(X) is called a bivariate PDF,
and the bivariate normal PDF illustrated above can be expressed as

    φ(X) = N(μ₁, μ₂, σ₁, σ₂; X)

         = (1/(2π σ₁ σ₂)) exp [ −(1/2) ( ((x¹ − μ₁)/σ₁)² + ((x² − μ₂)/σ₂)² ) ] .
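A minimal sketch of this bivariate PDF, with assumed means and standard
deviations, is given below; it also shows that the joint PDF equals the product
of the two marginal normal PDFs, which anticipates the next section.

```python
import numpy as np

def bivariate_normal(x1, x2, mu1, mu2, s1, s2):
    z = ((x1 - mu1) / s1) ** 2 + ((x2 - mu2) / s2) ** 2
    return np.exp(-0.5 * z) / (2.0 * np.pi * s1 * s2)

# marginal normal PDF
n = lambda x, mu, s: np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# assumed values: mu1 = mu2 = 0, sigma1 = 1, sigma2 = 2
print(bivariate_normal(0.3, -1.0, 0.0, 0.0, 1.0, 2.0))
print(n(0.3, 0.0, 1.0) * n(-1.0, 0.0, 2.0))      # the same value
```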

3.3.2 Statistical Dependence and Independence

The PDF φ of the multivariate X may have a special form,
namely

    φ(X) = φ₁(x¹) · φ₂(x²) · ... · φ_s(x^s) .

In this case, the integral in equation (3-28) can be rewritten as:

    P(X₀ ≤ X ≤ X₁) = ∏_{j=1}^{s} ∫_{x₀^j}^{x₁^j} φ_j(x^j) dx^j .        (3-31)

Remembering that each component x^j of the multivariate X can be regarded
as a univariate, and regarding φ_j as the PDFs of the corresponding
univariates, we can rewrite equation (3-31) as:

    ∏_{j=1}^{s} ∫_{x₀^j}^{x₁^j} φ_j(x^j) dx^j = ∏_{j=1}^{s} P(x₀^j ≤ x^j ≤ x₁^j) .

Comparing this result with equation (3-28), we get the relationship
between the probabilities

    P(X₀ ≤ X ≤ X₁) = ∏_{j=1}^{s} P(x₀^j ≤ x^j ≤ x₁^j) .        (3-32)

This relation can be read as follows: "The combined probability of all
the components satisfying the condition x₀^j ≤ x^j ≤ x₁^j equals the
product of the probabilities for the individual components", and
obviously satisfies the definition of the combined probability of



independent events (section 2.3). Hence, the components x^j of such a
multivariate X are called statistically independent. The PDF from
example 3.19 is statistically independent.

If the PDF of a multivariate cannot be written as a product
of the PDF's of its constituents, then these constituents are known as
statistically dependent. In this case, the probability P(X₀ ≤ X ≤ X₁)
is not equal to the product of the individual probabilities.

It can be shown that for statistically independent components
we have

    ∫_R φ_j(x^j) dx^j = 1 ,   j = 1, 2, ..., s.

3.3.3 Mean and Variance of a Multivariate

The sequence

    μ = (μ₁, μ₂, ..., μ_s) = E*(X) ,                           (3-33)

where

    μ_j = ∫_{R^s} x^j φ(X) dX = E*(x^j) ∈ R ,   j = 1, 2, ..., s ,        (3-34)

is called the mean of the multivariate X. The argument of the operator E*
(i.e. of the s-dimensional integral) is X·φ(X) = (x¹, x², ..., x^s) · φ(x¹, x², ..., x^s).

Similarly, the variance of the multivariate X is given by

    σ² = (σ₁², σ₂², ..., σ_s²) ,                               (3-35)

where

    σ_j² = ∫_{R^s} (x^j − μ_j)² φ(X) dX = E*((x^j − μ_j)²) ∈ R ,   j = 1, 2, ..., s.        (3-36)

Note that we can write again

    E*((X − μ)²) = E*((X − E*(X))²) = E*(X²) − μ² .            (3-37)

The variance of the multivariate does not express the statis-
tical properties in the multi-dimensional space as fully as the variance
of the univariate does in the one-dimensional space. For this reason,
we extend the statistical characteristics of the random multivariate
further and introduce the so-called variance-covariance matrix (see
section 3.3.4).

Let us now turn our attention to what the mean and the variance
of a "statistically independent" multivariate look like. For the stat-
istically independent components x^j, j = 1, 2, ..., s of a multivariate X,
we obtain

    μ_j = ∫_{R^s} [ x^j φ_j(x^j) ( ∏_{ℓ=1, ℓ≠j}^{s} φ_ℓ(x^ℓ) dx^ℓ ) dx^j ] .        (3-39)

Here, according to section 3.3.2, all the integrals in equation (3-39)
after the ∏-sign are equal to one, and thus we have

    μ_j = ∫_R x^j φ_j(x^j) dx^j .                              (3-40)

Similarly,

    σ_j² = ∫_R (x^j − μ_j)² φ_j(x^j) dx^j .                    (3-41)

Thus for the statistically independent X, we can compute the mean and
the variance of each component x^j separately, as we have computed
σ² of the PDF from example 3.19.

3.3.4 Covariance and variance-covariance Matrix

Before we start describing the variance-covariance matrix,
let us define another statistical quantity needed for this matrix. This
quantity is called covariance, and it is defined for any two components
x^j and x^k of a multivariate X as

    σ_jk = cov(x^j, x^k) = ∫_{R^s} (x^j − μ_j)(x^k − μ_k) φ(X) dX = E*((x^j − μ_j)(x^k − μ_k)) .        (3-42)

We note three things in equation (3-42). First, if j = k,
we see that the expressions for the covariances become identical with
those for the variances, namely:

    σ_jj = σ_j² ,   j = 1, 2, ..., s.

Secondly, if the components of the multivariate are statistically
independent, the covariances (j ≠ k) are all equal to zero. To show this,
let us write the covariance in terms of the operator E*. Noting that for
a pair of components of a statistically independent multivariate we have

    σ_jk = ∫_R ∫_R (x^j − μ_j)(x^k − μ_k) φ_j(x^j) φ_k(x^k) dx^j dx^k ,        (3-43)

we can write:

    σ_jk = E*(x^j x^k − x^j μ_k − μ_j x^k + μ_j μ_k)

         = E*(x^j x^k) − μ_k E*(x^j) − μ_j E*(x^k) + μ_j μ_k

         = E*(x^j x^k) − μ_j μ_k = E*(x^j x^k) − E*(x^j) E*(x^k) = 0 .

Hence, for statistically independent components x^j and x^k, we get

    E*(x^j x^k) = E*(x^j) E*(x^k) ,                            (3-44)

or more generally, for r independent components we get

    E*( ∏_{t=1}^{r} x^t ) = ∏_{t=1}^{r} E*(x^t) .              (3-45)

Equation (3-45) completes the list of properties of the E* operator
stated in section 3.2.3.

As we stated in section 3.3.3, the variance (σ²) of a multi-
variate is not enough to fully characterize the statistical properties
of the multivariate on the level of second moments. To get the same
amount of statistical information as given by the variance alone (in the
univariate case), we have to take into account also the covariances.

The variances and covariances can be assembled into one matrix

called the variance-covariance matrix or just the covariance matrix.



The variance-covariance matrix of a multivariate X is usually denoted by
Σ*_X and looks as follows:

    Σ*_X =  [ σ₁²   σ₁₂   σ₁₃  ...  σ₁s ]
            [ σ₂₁   σ₂²   σ₂₃  ...  σ₂s ]                      (3-46)
            [  .     .     .          . ]
            [ σs₁   σs₂   σs₃  ...  σs² ]

It is not difficult to see that the variance-covariance matrix
can also be written in terms of the mathematical expectation as follows:

    Σ*_X = E*[ (X − E*(X)) (X − E*(X))^T ] ,                   (3-47)

which is the expectation of a dyadic product of two vectors. Note
that the superscript T in the above formula stands for the transposition
in matrix operation. The proof of equation (3-47) is left to the
student.

Note that the variance-covariance matrix is always symmetrical,

the diagonal elements are the variances of the components and the off-

diagonal elements are the covariances between the different pairs of

components. The necessary and sufficient condition for the variance-

covariance matrix to be diagonal, i.e. all the covariances to be zeros,

is the statistical independence of the multivariate. The variance-

covariance matrix is one of the most fundamental quantities used in

adjustment calculus. It is positive - definite (with diagonal elements

always positive) and the inverse exists if and only if there is no absolute

correlation between components.
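A minimal numerical sketch of (3-42) and (3-46), with assumed means and standard
deviations, is the following: the elements of Σ*_X are obtained by direct
integration of (x^j − μ_j)(x^k − μ_k) φ(X) for the independent bivariate normal
PDF of Example 3.19, so that the covariances come out (numerically) as zero.

```python
import numpy as np
from scipy.integrate import dblquad

mu, sig = np.array([1.0, -2.0]), np.array([0.5, 2.0])     # assumed values

def phi(x1, x2):                       # independent bivariate normal PDF
    z = ((x1 - mu[0]) / sig[0])**2 + ((x2 - mu[1]) / sig[1])**2
    return np.exp(-0.5 * z) / (2 * np.pi * sig[0] * sig[1])

def element(j, k):                     # sigma_jk = E*((x^j - mu_j)(x^k - mu_k))
    f = lambda x2, x1: ((x1, x2)[j] - mu[j]) * ((x1, x2)[k] - mu[k]) * phi(x1, x2)
    return dblquad(f, -15, 15, -15, 15)[0]

Sigma = np.array([[element(j, k) for k in range(2)] for j in range(2)])
print(Sigma)        # ~diag(sig**2): off-diagonal covariances vanish here
```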



3.3.5 Random Multisample, its PDF and CDF

Like in the univariate case, we can also define here a quan-
tity η corresponding to the random sample ξ defined in section 3.1.1,
as follows:

    η = ( ξ^1 = (ξ^1_1, ξ^1_2, ξ^1_3, ..., ξ^1_{n₁}) ∈ R^{n₁} ,
          ξ^2 = (ξ^2_1, ξ^2_2, ξ^2_3, ..., ξ^2_{n₂}) ∈ R^{n₂} ,        (3-48)
          ...
          ξ^s = (ξ^s_1, ξ^s_2, ξ^s_3, ..., ξ^s_{n_s}) ∈ R^{n_s} ) ,

which is a straightforward generalization of a random sample, and will
be called a random multisample. From the above definition, it is obvious
that η has s components (constituents) ξ^j, each of which is a
random sample on its own. The number of elements n_j in each component
ξ^j may or may not be the same.

We can also define the definition set as well as the actual

(experimental) PDF and CDF of a multisample in very much the same way as

we have done for a random sample. Also, the distribution and cumulative

distribution histograms and polygons can be used for two-dimensional multi-

samples. The development of these concepts, however, is left to the

student.

3.3.6 Mean and Variance-Covariance Matrix of a Multisample

The mean of a multisample (3.48) is defined as

    M̄ = (M₁, M₂, ..., M_s) = Ē(η) ,                            (3-49)

where from equation (3-3) we get

    M_j = (1/n_j) Σ_{i=1}^{n_j} ξ^j_i = E(ξ^j) ∈ R ,   j = 1, 2, ..., s .        (3-50)

Here, the operator Ē is defined as a vector of operators E, which is
obvious from a comparison of (3-49) with (3-50). Similarly,

    S̄² = (S₁², S₂², S₃², ..., S_s²) = Ē((η − M̄)²) ∈ R^s ,      (3-51)

where from equation (3-6), we get

    S_j² = (1/n_j) Σ_{i=1}^{n_j} (ξ^j_i − M_j)² = E((ξ^j − M_j)²) ∈ R ,   j = 1, 2, ..., s.        (3-52)
We can also define the standard deviation S̄ of the multisample η as

    S̄ = (S₁, S₂, ..., S_s) .                                   (3-53)

Example 3.20: Let us determine the mean M̄, the variance S̄² and the
standard deviation S̄ of a multisample η = (ξ¹, ξ², ξ³),
where

    ξ¹ = (2, 3, 4, 7, 4) ,
    ξ² = (6, 4, 0, 3, 2) and
    ξ³ = (5, 2, 5, 5, 8) .

Here we have n₁ = n₂ = n₃ = 5. The mean M̄ is given from
equation (3-49) as

    M̄ = (M₁, M₂, M₃) .

The members M_j, j = 1, 2, 3 are computed from equation (3-50)
as follows:

    M₁ = (1/n₁) Σ_{i=1}^{n₁} ξ¹_i = (1/5) Σ_{i=1}^{5} ξ¹_i

       = (1/5)(2 + 3 + 4 + 7 + 4) = 20/5 = 4 ,

    M₂ = (1/5)(6 + 4 + 0 + 3 + 2) = 15/5 = 3 ,

    M₃ = (1/5)(5 + 2 + 5 + 5 + 8) = 25/5 = 5 ,

and we get

    M̄ = (4, 3, 5) .

The variance S̄² is given from equation (3-51) as

    S̄² = (S₁², S₂², S₃²) .

The members S_j², j = 1, 2, 3 are computed from equation (3-52)
as follows:

    S₁² = (1/n₁) Σ_{i=1}^{n₁} (ξ¹_i − M₁)² = (1/5) Σ_{i=1}^{5} (ξ¹_i − 4)²

        = (1/5)[4 + 1 + 0 + 9 + 0] = 14/5 = 2.8 ,

    S₂² = (1/5)[(3)² + (1)² + (−3)² + (0)² + (−1)²]

        = (1/5)[9 + 1 + 9 + 0 + 1] = 20/5 = 4.0 ,

    S₃² = (1/5)[(0)² + (−3)² + (0)² + (0)² + (3)²]

        = (1/5)[0 + 9 + 0 + 0 + 9] = 18/5 = 3.6 ,

and we get

    S̄² = (2.8, 4.0, 3.6) .

Taking the square root of the individual members S_j², j = 1, 2, 3,
we obtain the standard deviation S̄ as

    S̄ = (S₁, S₂, S₃) = (1.67, 2.0, 1.9) .

If the j-th and k-th components of a multisample have the same
number of elements, say n, we can write the covariance S_jk between these
two components ξ^j and ξ^k as:

    S_jk = (1/n) Σ_{i=1}^{n} (ξ^j_i − M_j)(ξ^k_i − M_k) ,      (3-54)

which can be rewritten as:

    S_jk = (1/n) Σ_{i=1}^{n} ξ^j_i ξ^k_i − M_j M_k .

Note that the covariance S_jk, as defined above, depends on the ordering of
the elements in both components ξ^j and ξ^k, whereas the means M_j and M_k and
the variances S_j² and S_k² do not. Therefore, to obtain a meaningful covariance
S_jk, each of the components ξ^j and ξ^k should be in the same order as it
was acquired. This can be visualized from the following example. Assume
that the elements of ξ^j are observations of one vertical angle, and the
elements of ξ^k are the corresponding times of the observations. Clearly,
to study the relationship (covariance) between the observation time and
the value of the observed vertical angle, the matched pairs must be
respected.

Example 3.21: Let us determine the covariances between the different
pairs of components of the multisample η given in example 3.20.
The covariances S_jk are computed from equation (3-54) as follows:

    S₁₂ = S₂₁ = (1/5) Σ_{i=1}^{5} [(ξ¹_i − 4)(ξ²_i − 3)]

              = (1/5)[(−2)(3) + (−1)(1) + (0)(−3) + (3)(0) + (0)(−1)]

              = (1/5)[−6 − 1 + 0 + 0 + 0] = −7/5 = −1.4 ,

    S₁₃ = S₃₁ = (1/5)[(−2)(0) + (−1)(−3) + (0)(0) + (3)(0) + (0)(3)]

              = (1/5)[0 + 3 + 0 + 0 + 0] = 3/5 = 0.6 , and

    S₂₃ = S₃₂ = (1/5)[(3)(0) + (1)(−3) + (−3)(0) + (0)(0) + (−1)(3)]

              = (1/5)[0 − 3 + 0 + 0 − 3] = −6/5 = −1.2 .

Finally, we can assemble the variance-covariance matrix Σ_η of
the multisample η:

    Σ_η =  [ S₁²   S₁₂   S₁₃  ...  S₁s ]
           [ S₂₁   S₂²   S₂₃  ...  S₂s ]                       (3-54)
           [  .     .     .         . ]
           [ Ss₁   Ss₂   Ss₃  ...  Ss² ]

Having defined the mean and the variance-covariance matrix of
a multisample, let us stop and reflect for a while. We have stated in
3.3.3 that the expansion from one to s dimensions defied a straight-
forward generalisation of the one-dimensional variance. We had to introduce
the variance-covariance matrix to describe the statistical properties

of a multivariate on the second-moment level. Turning to the relationship
sample - univariate, we discover that this is not paralleled in the multi-
dimensional case either. While the formulae for the mean and the variance
of a sample and a univariate were equivalent, those for a multisample
and a multivariate are not. While formulae equivalent to (3-34), (3-35)
and (3-42) can be devised for the multisample, the ones used mostly in
practice ((3-49), (3-51) and (3-54)) correspond really to (3-40), (3-41),
and (3-43), valid only for a statistically independent multivariate.

This, together with the difficulty with the computation of
multisample covariances, i.e., the necessity to have the same number
of elements in any two components, leads often in practice to the
adoption of an assumed variance-covariance matrix. Decisions connected
with the determination of the multisample variance-covariance matrix
are among the trickiest in adjustment calculus.

Example 3.22: Let us determine the variance-covariance matrix of the
multisample η introduced in example 3.20. In this case, we
have the variances computed in example 3.20; the results
were:

    S₁² = 2.8 ,   S₂² = 4.0   and   S₃² = 3.6 .

Also, we have the covariances computed in example 3.21; the
results were:

    S₁₂ = S₂₁ = −1.4 ,   S₁₃ = S₃₁ = 0.6   and   S₂₃ = S₃₂ = −1.2 .

Therefore, the required variance-covariance matrix will be:

    Σ_η =  [  2.8   −1.4    0.6 ]
           [ −1.4    4.0   −1.2 ]
           [  0.6   −1.2    3.6 ]
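A minimal numerical cross-check of Examples 3.20-3.22 is the following sketch;
it uses the 1/n convention of equations (3-50), (3-52) and (3-54).

```python
import numpy as np

eta = np.array([[2, 3, 4, 7, 4],        # xi^1
                [6, 4, 0, 3, 2],        # xi^2
                [5, 2, 5, 5, 8]],       # xi^3
               dtype=float)

M = eta.mean(axis=1)                    # (4, 3, 5)
Sigma = np.cov(eta, bias=True)          # bias=True divides by n, as in (3-54)
print(M)
print(Sigma)    # [[2.8 -1.4 0.6] [-1.4 4.0 -1.2] [0.6 -1.2 3.6]]
```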

3.3.7 Correlation

Although the covariances of a multisample do not play the same

role as the covariances of a multivariate, they still can serve as a

certain measure of statistical dependence. We say that they show

the degree of correlation between the appropriate pairs of components.

The degree of correlation, as a measure of statistical dependence,
may, of course, vary. We can see that the covariance S_jk ∈ R may attain
any value. Hence it is not a very useful measure, because we cannot
predetermine the value of the covariance corresponding to the maximum
or complete correlation. For this reason, we use another measure, the
correlation coefficient, which is usually denoted by ρ, and is
defined as

    ρ_jk = S_jk / (S_j S_k) .                                  (3-57)

It can be shown that ρ_jk varies from −1 to +1.

Based on the use of the correlation coefficient is the correlation
calculus, a separate branch of statistics. It will suffice here to say
that we call two components ξ^j and ξ^k of a multisample η:

    (i)   totally uncorrelated, if ρ_jk = 0 ,
    (ii)  correlated, if |ρ_jk| < 1 ,
    (iii) totally positively correlated, if ρ_jk = 1 ,
    (iv)  totally negatively correlated, if ρ_jk = −1 .

Note that for the multivariate, the expression for ρ_jk is written completely
analogously to equation (3-57).



Example 3.23: Let us discuss the degree of correlation between the
different pairs of components of the multisample η which is
used in examples 3.20 to 3.22 inclusive, and whose variance-
covariance matrix is given in example 3.22.

The correlation coefficients ρ_jk are computed from equation (3-57)
as follows:

    ρ₁₂ = −1.4 / (1.67 · 2.0) = −0.42 ,

    ρ₁₃ = 0.6 / (1.67 · 1.9) = 0.19 ,

    ρ₂₃ = −1.2 / (2.0 · 1.9) = −0.31 .

Note that, since

    |ρ_jk| < 1 ,   j, k = 1, 2, 3 ,   j ≠ k ,

the components ξ¹, ξ² and ξ³ of the given multisample η
are all correlated.
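A short sketch applying (3-57) to the same multisample is given below; the
off-diagonal entries reproduce the coefficients above (up to the rounding of the
standard deviations).

```python
import numpy as np

eta = np.array([[2, 3, 4, 7, 4],
                [6, 4, 0, 3, 2],
                [5, 2, 5, 5, 8]], dtype=float)

Sigma = np.cov(eta, bias=True)
S = np.sqrt(np.diag(Sigma))
rho = Sigma / np.outer(S, S)            # equation (3-57) applied element-wise
print(np.round(rho, 2))                 # off-diagonals ~ -0.42, 0.19, -0.32
```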

Example 3.24: Let us discuss the degree of correlation between the
components ξ¹ and ξ², and between ξ¹ and ξ³, of the multi-
sample η = (ξ¹, ξ², ξ³), where:

    ξ¹ = (2, 1, 3, 5, 4) ,
    ξ² = (4, 2, 6, 10, 8) ,
    ξ³ = (−4, −2, −6, −10, −8) .

By computing the means and variances of ξ^j, j = 1, 2, 3,
similarly to example 3.20, and the covariances S₁₂ and S₁₃
similarly to example 3.21, we get the following results:

    M₁ = 3 ,   M₂ = 6   and   M₃ = −6 ,

    S₁² = 2 ,   S₂² = 8   and   S₃² = 8 ,

    S₁ = √2 ,   S₂ = S₃ = 2√2 ,

    S₁₂ = 4   and   S₁₃ = −4 .

Hence

    ρ₁₂ = S₁₂ / (S₁ · S₂) = 4 / (√2 · 2√2) = +1 ,

which means that ξ¹ and ξ² are totally positively correlated,
and

    ρ₁₃ = S₁₃ / (S₁ · S₃) = −4 / (√2 · 2√2) = −1 ,

which means that ξ¹ and ξ³ are totally negatively correlated.

At this point it is worthwhile mentioning that the computa-
tions of the means, variances, covariances and correlation coefficients
of the constituents of a multisample are always preferably performed in a
tabular form for easier checking. The following table is an example of
such an arrangement using the two constituents ξ¹ and ξ² of the multi-
sample introduced in example 3.20.

    ξ¹_i   (ξ¹_i − M₁)   (ξ¹_i − M₁)²   ξ²_i   (ξ²_i − M₂)   (ξ²_i − M₂)²   (ξ¹_i − M₁)(ξ²_i − M₂)
     2         −2             4          6          3             9                 −6
     3         −1             1          4          1             1                 −1
     4          0             0          0         −3             9                  0
     7          3             9          3          0             0                  0
     4          0             0          2         −1             1                  0
    -------------------------------------------------------------------------------------------
    Σ   20                   14         15                       20                 −7

    M₁ = (1/5)(20) = 4 ,        M₂ = (1/5)(15) = 3 ,

    S₁² = (1/5)(14) = 2.8 ,     S₂² = (1/5)(20) = 4 ,

    S₁ = √2.8 = 1.67 ,          S₂ = √4 = 2 ,

    S₁₂ = (1/5)(−7) = −1.4 ,

and

    ρ₁₂ = −1.4 / (1.67 · 2) = −0.42 .

,61, 70, 102, 107, 113, 114) 117,'119, 120, 126, 120, 129, 129, 13:2, !37,
i3V, 130~ 130, 142, 143, 146, 1~6, 1·17, 1471 148, 149, 149, 150, 150, 153.
153, 156, 157, 158, 150, 159, 159, 159, 162, 162, 164, 166, 166, 166, 167,
16!>, 169, 169, 169, 170, 170, 171, 17~; 172, 172, 173, 173, 175, 175, 176,
176, 1.76, 17.7, 177, 178, 179, 180, 180, 181, 181, 181, 182, 183, 184, 184,
l85, 186, 187, 188, 188, 190. 192, 102, 193, 194, 194, 194:, 195, 195, 195,
· ::JO, 10~I., 10°.
··1nn "'' 1n~ 900, •01
iT1..1 1 - -
~01 , -9 01 , -9 0"'"":t
, -
9 •o•
9 9- 0....-, •O·'·!:': -•o~u 1 -~os 1 9..,.. 09.
... ·-:> 1 - • ,

90!)
-
"09 21.~~ 216 <:11!)
I .- • J J~ 919 I 219I -•)•)1
I • - J
2?9 9"'3 2"'7
IW-1 ,.,_, l .- t -
"33.. J --'*'J
·~-.t. 2'-'6
0
93~, I
t .-.

240, 247, 254, 262, 270

Required: (i) Classify this sample according to your own choice, and
then draw its: distribution histogram, distribution polygon, cumulative
histogram, cumulative polygon.

(ii) Determine the mean, standard deviation, median and range
of the sample; then plot these quantities on your histograms and
polygons.

(iii) Determine the probability of the height being in between
121 and 174 cm, by using your distribution histogram, your distribution
polygon, the cumulative histogram, the cumulative polygon, and the actual
sample. Then compare the results.

(3) Verify the results given in Example 3.18 for the mean, the variance

and the third moment about zero of the triangular PDF.


3.4 Exercise 3

(l) The following table gives the weights as recorded to the nearest

pound for a random sample of 20 high-school students:

    138   150   146   158   150
    146   164   138   164   164
    150   146   158   173   150
    158   130   146   150   164

Required: (i) Compute the mean, the standard deviation, the median

and the range of this random sample using both the original sample

and its definition set.

(ii) Compute the experimental probabilities of the

individual elements and then construct the corresponding discrete

PDF and CDF of the sample.

(iii) Compute the probability that the weight of a high-

school student is less than or equal to 150 pounds.

(iv) Compute the probability of the student weight to be

in the interval [158, 173].

(2) The following table gives the observed heights in em of a random

sample of 125 nine years old pine trees.



(4) Verify the results given in Examples 3.17 and 3.18 for the
probabilities P(μ−σ ≤ x ≤ μ+σ), P(μ−2σ ≤ x ≤ μ+2σ) and
P(μ−3σ ≤ x ≤ μ+3σ), using the rectangular and the triangular CDF's
respectively, rather than the corresponding PDF's.

(5) Let x be a random variable whose PDF is given by:

    φ(x) =  h, for (−3 ≤ x ≤ 7)
            0, everywhere else.

Required: (i) Determine h.

(ii) Compute the mean and the standard deviation of x.

(iii) Construct the CDF of x.

(iv) Use both the PDF and CDF to determine the following
probabilities: P(x ≤ 1.5), P(x ≥ 2.5), P(−1 ≤ x ≤ 4), P(μ−2σ ≤ x ≤ μ+2σ).

(v) Compute the 3-rd and 4-th moments of the PDF about
zero.

(6) Let x be a random variable having the following PDF:

    φ(x) =  k·x , for (0 < x < 2)
            0 ,   everywhere else.

Required: (i) Determine the mean, the variance and the standard
deviation of x.

(ii) Compute the probability P(1 ≤ x ≤ 1.5).


88

(7) Let x be a random variable whose PDF is given as:

    φ(x) =  k + (1/50)x − 3/50 ,  for (3 ≤ x ≤ 8)
            k − (1/50)x + 13/50 , for (8 ≤ x ≤ 13)
            0 ,                   everywhere else.

Required: (i) Determine the mean and the standard deviation of x.

(ii) Compute the probabilities: P(5.5 ≤ x ≤ 10.5), P(x ≥ 9),
P(x ≤ 7), P(μ − σ ≤ x ≤ μ + σ).

(8) Given a multisample η = (ξ¹, ξ², ξ³), where ξ¹ = (4.2, 3.7, 4.1),
ξ² = (26.7, 26.3, 26.6), and ξ³ = (−17.5, −17.0, −18.0).

Required: (i) Compute the mean of η.

(ii) Compute the variance-covariance matrix of η.

(iii) Compute all the correlation coefficients between
the different pairs of components of η.

(9) Given a bivariate X = (x¹, x²) with PDF

    φ(X) =  1/(6√2 st) − |x¹ − q|/(12√3 s²t) , for (|x¹ − q| < s√6 and |x² − r| < t√3)
            0 ,                                everywhere else,

where q, r are some real numbers and s, t are some positive real
numbers.

Required: (i) Compute the mean of X.

(ii) Compute the variance-covariance matrix of X.



4. FUNDAMENTALS OF THE THEORY OF ERRORS

4.1 Basic D:efinitions

In practice we work with observations which are nothing else

but numerical representation of some physical quantities,.e.g. lengths,

angles, weights, etc. These observations are obtained through

measurements of some kind by comparison to predefined standards. In many

cases we obtain several observations for the~ physical quantity, which

are usually postulated to represent this quantity.

There is a different school of thoughts claiming that no

quantity can be measured twice. They say that if a quantity is measured

for the second time, it becomes a different quantity. Philosophically,

the two approaches are very different, however, in practice they coincide.

They vary in assuming different things (hypotheses), but they lead to

the same results.

The observations representing the same quantity may or may not

have some spread or dispersion (by spread we mean that not all the

observations are identical). For instance, when we measure the length of

the side of a rectangle using a graduated ruler, we will have two possi-

bilities (see Figure 4.la, b).

a) b)

Figure 4·.1
90

First, if the length of that side is exactly equivalent to an

integer number of graduations (divisions) on the ruler, the measurement

of it will not produce any spread. This is simply because the beginning

of the side will be at a graduation line of the ruler, and at the same

time the end of the side will be at another graduation line, and hence

we get always the same result. On the other hand, if the end of the

side is located between two division lines on the ruler, there will be

a fraction of the smallest division on the ruler to be estimated. The

estimates(observatiom) will differ, s~ due to different observers, and

hence we shall get a spread.

Usually, the spread and its presence depend on many other

things like: the design of the :experiment, measuring equipment, precision

required, atmospheric conditions, etc. If we know the causes that

influence the spread, we can try to account for them in one way or the

other. In other words, we will apply certain corrections to eliminate

such unwanted influences which are usually called systematic errors.

Examples of systematic errors are numerous like: variation of the length

of a tape with temperature, variation of atmospheric conditions with

time, etc.

In practice, this is possible if we can express such corrections

mathematically as functions of some measurable physical quantities. In

some cases, the systematic errors remain constant in both magnitude and

sign during the time of observations , e.g. most of the instrumental

systematic errors.. In such cases, we can eliminate these systematic

errors by following certain · teclmiques in making the observations. For

example, the error in the rod reading due to the inclination of the line
91

of sight of the level, with respect to the bubble axis, can be eliminated

by taking the backsight and the foresight at equal distances from the

level.

Further, we shall assume that there are no blunders (mistakes)

in the observations. These blunders are usually gross errors due to the

carelessness of the observer and/or the recorder. The elimination of

blunders has to be carried out before starting to work with the observations.

The ways for intercepting blunders are numerous and are as different as

the experiments may be. We are not going to venture into this here.

4.2 Random (Accidental) Errors

Even after eliminating the blunders and applying the appropriate

corrections to eliminate the systematic errors, the observations repre-

senting a single physical quantity usually still have a remaining spread,

i.e. are still not identical, and we begin to blame some unknwon or

partly unknown reasons for it. Such remaining spread is practically

inevitable and we say that the observations contain random or accidental

errors.

The above statement should be understood as follows: given a
finite sequence L of observations of the same physical quantity ℓ', i.e.

    L = (ℓ_i) ,   i = 1, 2, ..., n ,                           (4-1)

we assume that the individual elements ℓ_i, i = 1, 2, ..., n represent the
same quantity, where ℓ' is the unknown value, and can be written as:

    ℓ_i = ℓ' + ε_i ,   i = 1, 2, ..., n .                      (4-2)

The quantities ε_i are the so-called random (accidental) errors*.
The sequence

    ε = (ε_i) ,   i = 1, 2, ..., n                             (4-3)

(or the sequence L, equation (4-l), for this matter) is declared a random

sample as defined earlier in section 3.1.1. This random sample has a

parent random variable, as defined in section 3.1.2.

It should be noted that the term "random error" is used rather

freely in practice.

4_.3 Gaussian PDF. Gauss Law of Errors

The histograms (polygons) of the random samples representing

observations encountered in practice generally show a tendency towards

being bell-shaped, as shown in Figure 4.2 a,b.

Figure 4.2

* It may happen, and as a matter of fact often does happen, that we are
able to spot some depen~ence of e (for whatever this means) on one or
more parameters,.e.g. temperature, pressure, time, etc., that had not
been suspected and eliminated before. Then we s~ that the e's change
systematically or predictably with the parameter in question, or we say
that there is a correlation between the e 's and the parameter. Here, we
may say that the observations still contain systematic errors. In such a
case we may try to eliminate them again, after establishing the law
governing their behaviour.
93

Various people throughout the history have thus tried to

explain this phenomenon and establish a theory describing it. The

commonly accepted explanation is due to Gauss and Laplace independently.

This explanation leads to the derivation of·the well known model - the

Gaussian PDF. The assumptions,,.due to Hagen, necessary to be taken into

account, along with the derivation of the law, due to de Moivre, are

given in Appendix I. Here we state only the result.

The Gaussian PDF G(C; ε) is found to be (equation (I-11),
Appendix I):

    G(C; ε) = √(2/(Cπ)) exp (−2ε²/C) ,                         (4-4)

where its argument ε is the random error, i.e. a special type of random
variable with mean equal to zero, and C is the only parameter of the dis-
tribution. The Gaussian PDF is continuous and is shown in Figure 4.3.

[Figure 4.3: the bell-shaped Gaussian PDF G(C; ε), symmetrical about ε = 0.]

From the above Figure we note the following characteristics of
the Gaussian PDF:

(i) G is symmetrical around 0.

(ii) The maximum ordinate of G is at ε = 0, and equals √(2/(Cπ)), which
varies with the parameter C, see Figure 4.2b.

(iii) G approaches the ε axis asymptotically as ε goes to ± ∞.

(iv) G has two points of inflexion at ε = ± √C/2.

The shape of G reflects what is known as the "Gauss law of a
large sample of errors", which states that:

(i) smaller errors are more probable than the larger errors,

(ii) positive and negative errors have the same probability.*

Note that since G is a PDF it satisfies the following con-
dition:

    ∫_{−∞}^{+∞} G(C; ε) dε = 1 .                               (4.5)

4.4 Mean and Variance of the Gaussian PDF

Since G is symmetrical around zero, it is obvious that its
mean μ_ε equals zero (see section 3.2.5).

The variance σ_ε² of G is again obtained from

    σ_ε² = E*((ε − μ_ε)²) = ∫_{−∞}^{+∞} ε² G(C; ε) dε

         = √(2/(Cπ)) ∫_{−∞}^{+∞} ε² exp (−2ε²/C) dε .          (4.6)

Recalling that

    ∫_{0}^{∞} t² exp (−a²t²) dt = √π/(4a³) ,   (a > 0) ,       (4.7)

we get from equations (4.6) and (4.7)

    σ_ε² = √(2/(Cπ)) · 2 · √π/(4a³) = √(2/(Cπ)) · √π/(2a³) ,

where

    a = √(2/C) .

Hence,

    σ_ε² = (√2 · C√C)/(2√C · 2√2) = C/4 ,

and we get

    C = 4σ_ε² .                                                (4-8)

Consequently, the variance σ_ε², or rather the standard deviation σ_ε, can
be considered the only parameter of G. Substituting equation (4-8) into
equation (4-4) we get:

    G(σ_ε; ε) = (1/(σ_ε √(2π))) exp (−ε²/(2σ_ε²)) .            (4-9)

Note from equation (4-8) that σ_ε = √C/2, which equals the abscissas
of the two points of inflexion of G.

* The same result can be obtained using slightly weaker (more general)
assumptions through the "central limit theorem".
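A minimal numerical sketch of (4-4), (4-8) and (4-9), with an assumed value of C,
is the following: the Gaussian PDF written with the parameter C and with
σ_ε = √C/2 is the same curve, and its variance indeed comes out as C/4.

```python
import numpy as np
from scipy.integrate import quad

C = 1.6                                              # assumed value
G_C     = lambda e: np.sqrt(2.0 / (C * np.pi)) * np.exp(-2.0 * e**2 / C)
sigma   = np.sqrt(C) / 2.0
G_sigma = lambda e: np.exp(-0.5 * (e / sigma)**2) / (sigma * np.sqrt(2 * np.pi))

print(G_C(0.3), G_sigma(0.3))                        # identical ordinates
print(quad(lambda e: e**2 * G_C(e), -np.inf, np.inf)[0], C / 4.0)   # variance = C/4
```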

Example 4.1: Let us compute, approximately, the probability P(−σ_ε ≤ ε ≤ σ_ε),
assuming that ε has a Gaussian PDF. We first expand the function
exp (−ε²/(2σ_ε²)) to be able to integrate equation (4-9). Recall
that:

    exp (y) = e^y = 1 + y + y²/2! + y³/3! + ...

Hence

    exp (−ε²/(2σ_ε²)) = 1 − ε²/(2σ_ε²) + ε⁴/(8σ_ε⁴) − ε⁶/(48σ_ε⁶) + ...

and

    P(−σ_ε ≤ ε ≤ σ_ε) = (1/(σ_ε √(2π))) [ 2σ_ε − (1/(2σ_ε²)) · (2σ_ε³/3) + (1/(8σ_ε⁴)) · (2σ_ε⁵/5)

                        − (1/(48σ_ε⁶)) · (2σ_ε⁷/7) + ... ]

                      = √(2/π) [1 − 0.167 + 0.025 − 0.003]

                      = √(2/π) [0.855] ≈ 0.683 .

Thus:

    P(−σ_ε ≤ ε ≤ σ_ε) ≈ 0.683 .

By following the same procedure, we can find that:

    P(−2σ_ε ≤ ε ≤ 2σ_ε) ≈ 0.954 ,

    P(−3σ_ε ≤ ε ≤ 3σ_ε) ≈ 0.997 .
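The same probabilities can be sketched numerically, both by integrating the
series term by term and by using the library error function; the values below are
assumed to be for the standardized error t = ε/σ_ε.

```python
import math

def p_series(n, terms=20):
    # term-by-term integration of the series of exp(-t**2/2) over [-n, n]
    return (2.0 / math.sqrt(2 * math.pi)) * sum(
        (-0.5) ** k / math.factorial(k) * n ** (2 * k + 1) / (2 * k + 1)
        for k in range(terms))

for n in (1, 2, 3):
    exact = math.erf(n / math.sqrt(2.0))          # P(|eps| <= n*sigma)
    print(n, round(p_series(n), 3), round(exact, 3))   # ~0.683, 0.954, 0.997
```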


97

4.5 Generalized or Normal Gaussian PDF

The Gaussian PDF (equation (4.9)) can be generalized to have an
arbitrary mean μ_y. This is achieved by the transformation

    y = ε + μ_y                                                (4-10)

in equation (4-9), where y is the argument of the new PDF - the generalized
Gaussian. Such a generalized Gaussian PDF is usually called the normal PDF and
is denoted by N, where:

    N(μ_y, σ_y; y) = (1/(σ_y √(2π))) exp (−(y − μ_y)²/(2σ_y²)) .        (4-11)

The name "normal" reflects the trust which people have, or

used to have, in the power of the Gaussian law (also called the "normal

law") which is mentioned in section 4. 3 . If the errors behave according

to this law and display a histogram conforming to the normal PDF, they

are normal. On the other hand, if they do not, they are regarded as

abnormal and strange things are suspected to have happened.

The normal PDF contains only two parameters - the mean μ_y and
the standard deviation σ_y. Hence, it is well suited for computations.

Note here that the family of G(σ; ε) is a subset of the family
of N(μ, σ; y). Also note that the following condition has to be satis-
fied by N:

    ∫_{−∞}^{+∞} N(μ_y, σ_y; y) dy = 1 .

The formula for the normal CDF corresponding to N is given as:

    Ψ_N(y) = (1/(σ_y √(2π))) ∫_{−∞}^{y} exp (−(x − μ_y)²/(2σ_y²)) dx ,        (4-12)

where x is a dummy variable in the integration.
98

For the generalized (normal) Gaussian PDF, it can be again
shown that:

    P(μ_y − σ_y ≤ y ≤ μ_y + σ_y) ≈ 0.683 ,

    P(μ_y − 2σ_y ≤ y ≤ μ_y + 2σ_y) ≈ 0.954 ,

and

    P(μ_y − 3σ_y ≤ y ≤ μ_y + 3σ_y) ≈ 0.997 .

(Compare the values to the corresponding results of the triangular PDF

in example 3.18).

4.6 Standard Normal PDF

The outcome t of the following linear transformation

    t = (x − μ_x)/σ_x                                          (4-13)

is often called the standardized random variable, where x is a random
variable with mean μ_x and standard deviation σ_x. Note that the above
standardization process does not require any specific distribution
for x.

The transformation of the normal variable y (equation (4-10))
to a standardized normal variable t = (y − μ_y)/σ_y results in a new PDF

    (1/√(2π)) exp (−t²/2) = N(0, 1; t) = N(t) ,                (4-14)

whose mean J..lt is zero and whose standard deviation at is one. This

PDF is called the standard normal PDF, a particular member of the family

of all normal distributions.


99

Since both the parameters μ_t = 0 and σ_t = 1 are determined once
for all, the standard normal PDF is particularly suitable for tabulation,
due to the fact that it is a function of t only. An example of such
tabulation is given in Appendix II-A, which gives the ordinates of the
standard normal PDF for different values of t. Note again that

    ∫_{−∞}^{+∞} N(t) dt = 1 .

The CDF corresponding to N(t) is given by

    Ψ_N(t) = (1/√(2π)) ∫_{−∞}^{t} exp (−x²/2) dx ,             (4-15)

or

    Ψ_N(t) = 1/2 + (1/√(2π)) ∫_{0}^{t} exp (−x²/2) dx ,        (4-16)

where x is a dummy variable in the integration. Again, the CDF of
the standard normal PDF is tabulated to facilitate its use in probability
computations. Appendix II-B is an example of tabulated Ψ_N(t) using
equation (4-15), which gives the accumulated areas (probabilities)
under the standard normal PDF for different positive* values of t.
Appendix II-C contains a similar table, but it gives the values of
the second term in equation (4-16) only, for different values of t.
Hence, care must be taken when using different tables for computations.

* For negative values of the argument t the cumulative probability
P(t ≤ −t₀) = Ψ_N(−t₀) is computed from Ψ_N(t₀) through the condition:

    Ψ_N(−t₀) = 1 − Ψ_N(t₀) .

The second term in equation (4-16) is usually known as (1/2) erf (t),
i.e.

    Ψ_N(t) = 1/2 + (1/2) erf (t) ,                             (4-17)

where erf (t) is known as the error function, and is obviously given by

    erf (t) = √(2/π) ∫_{0}^{t} exp (−x²/2) dx .                (4-18)

This erf (t) is also tabulated*.

In order to be able to use the tables of the standard normal
PDF and CDF for computations concerning a given normal random variable
x, we first have to standardize x, i.e. to transform x to t using
equation (4-13), then enter these tables with t. Thus, if we want,
for instance, to determine the probability P(x ≤ x₀) we have to write:

    P(x ≤ x₀) = P( (x − μ_x)/σ_x ≤ (x₀ − μ_x)/σ_x ) .          (4-19)

This is identical to the probability P(t ≤ t₀) that can be obtained
from the standard normal tables.
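The table look-up can be sketched numerically as well; the following uses the
library error function in place of the tables, with the μ_x, σ_x and x₀ values
assumed here to be those of Example 4.2 below.

```python
import math

def psi_N(t):                       # standard normal CDF, cf. (4-15)-(4-17)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

mu_x, sigma_x, x0 = 66.0, 5.0, 68.0     # assumed values (cf. Example 4.2)
t0 = (x0 - mu_x) / sigma_x              # standardization (4-13)
print(t0, psi_N(t0))                    # 0.4, ~0.6554
```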

Example 4.2: Suppose that the height h of a student
is a normally distributed random
variable with mean μ_h = 66 inches and
standard deviation σ_h = 5 inches. Find
the approximate number K out of 1000
students h inches tall:

    (i) h ≤ 68 inches (Figure 4.4-i);

    (ii) h ≤ 61 inches (Figure 4.4-ii);

    (iii) h ≥ 74.6 inches (Figure 4.4-iii);

    (iv) 64.3 ≤ h ≤ 70 inches (Figure 4.4-iv).

[Figures 4.4-i to 4.4-iv: the standard normal PDF N(t) with the corresponding
areas shaded.]

* In most of the computer languages, this error function, erf (t), is a
built-in function. Hence it can be called as any library subroutine
and evaluated more precisely than by using the corresponding tables.

Solution: We are going to use the
table in Appendix II-B.

    (i) P(h ≤ 68) = P(t ≤ (68 − 66)/5)

                  = P(t ≤ 0.40) = 0.6554 .

        Hence, K₁ = (0.6554)(1000) ≈ 655 students.

    (ii) P(h ≤ 61) = P(t ≤ (61 − 66)/5)

                   = P(t ≤ −1) = 1 − P(t ≤ 1)

                   = 1 − 0.8413 = 0.1587 .

        Hence, K₂ = (0.1587)(1000) ≈ 159 students.

    (iii) P(h ≥ 74.6) = P(t ≥ (74.6 − 66)/5)

                      = P(t ≥ 1.72) = 1 − P(t ≤ 1.72)

                      = 1 − 0.9573 = 0.0427 .

        Hence, K₃ = (0.0427)(1000) ≈ 43 students.

    (iv) P(64.3 ≤ h ≤ 70) = P((64.3 − 66)/5 ≤ t ≤ (70 − 66)/5)

                          = P(−0.34 ≤ t ≤ 0.80)

                          = P(t ≤ 0.80) − P(t ≤ −0.34)

                          = P(t ≤ 0.80) − (1 − P(t ≤ 0.34))

                          = 0.7881 − [1 − 0.6331]

                          = 0.7881 − 0.3669 = 0.4212 .

        Hence, K₄ = (0.4212)(1000) ≈ 421 students.
102

For the normal random variable h

given in example 4.2, determine the

student's height H such that:

( i) P(h < H ) = 0.6554 (Figure ir.• 5- i) ) •


- l
(ii) p (h 2. H2 ) = 0.25 (Figure 4. 5-ii) •
. j

(iii) p (h..:_ H3 ) ::: 0.20 (Figure 4·5-iii)·I

(iv) P(H4 _:. h..:_ H 5 ) = 0.95,


where H4 = f.lh -K and H5 = f.lh +K. (Figure 4, 5-iv) •

Solution: Again in this example, we

are going to use the standard normal

CDF table given in Appendix II-B.

( i) P ( h ..:_ H1 ) = P(t ..:_ t l) == 0. 655 4 .

From the above mentioned table, we


N(t)
0.6'554
get t = 0 .l+, that corresponds to
l
probability P = 0.6554. But we know
Hl-f.lh
that t = •
l crh

From example ir.-2 we have f.lh = 66 inches


figUf_e 4. 5-i and. ah =5 inches. Hence,
-66
H1<•.
tl = 5 = 0.4
from which we get

H1 - 66 = 5(0.4) = 2,
i.e.
H1 = 66 + 2 = 68 inches,

which is identical to the first case

in example 4. 2; however, what we are

doing here is nothing else but the

inverse solution.
103

But:

and we get

By interpolation in the above mentioned

table we get t 2 ' 0.675 which

Figure L~. 5-ii


i.e. H2 = 66 + 5 (0.675)

= 66 + 3.375 = 69.375 inches

By examining the above mentioned table

we discover that the smallest probabil-

ity reading is 0.50, since it considers


N(t.) only the positive values oft. Therefore

we have to write:

and we get

P(t <-t )
- 3
=1 - 0.20 = 0.80.
By interpolation in the above mentioned
Figure 4.5-iii
table we get: (-t 3 ) = 0.842, which

corresponds to P = 0.80. Then we have:


H -66
t3 = 35 = -0.842

and, H3 = 66-5(0.842)
= 66-4.210 = 61.79 inches.
101~

(iv) P(H 4 < h < H )


- - 5

N(t)

= P(-t 0 -< t < t


- 0
) = 0.95,
K K
where t =-=-.
-t 0 o crh 5

F,igure 4 • 5-i ~ The above statement means that:

P(t -< t 0 ) - P(t -<-t 0 ) = 0.95.


However, from the symmetry of the

normal PDF we get:

= l. - 0.95 = 0.025
2
and we get:

P(t -< t 0 ) = 0.95 + 0.025 = 0.975,


or P(t -<
-
t 0 ) = 1. - 0.025 = 0.975.

From the above mentioned table we get:

t
0
= 1.96, which corresponds to

P = 0.975, and we have:

t 0 = 5K = -
1.96 ,

i.e. K = 5(1.96) = 9.80. Consequently:

and
105

Example 4.4: Let us solve example 4.1· again by

using the standard normal CDF tables.

Recall that it was required to compute

P(-cr < e: <a ), where e: has a


E: - - E:

Gaussian PDF (i.e. its ~e: = 0). We

can write:

P(-cr < e: < a )


E: - - E:

-0' ...() 0' - 0


= P( ~ .::_ t < e: ))
0' 0'
E: E:

Figure 4.6 = P(-1 .::_ t .::_ 1), see Figure 4.6.

Further we can write:

P(-1 .::_ t .::_ 1) = 2P(O .::_ t .::_ 1).

From the table given in Appendix II-C,

we get:

P(O .::_ t .::_ 1) = 0.3413.


Hence,

= 0.6826 = 0.683,
which is the same result as obtained

in example 4.1.
106

4.7 Basic Hypothesis (Postulate) of the Theory of Errors,


Testing

We have left the random sample of observations L behind in

section 4.2 while we developed the analytic formulae for the PDF's

mostly used in the error theory. Let us get back to it and state the

following basic postulate of the error-theory. A finite sequence of

observations L representing the same physical quantity is declared a

random sample with parent random variable distributed according to the

normal PDF N(~t' crt; t). Other PDF's are used rather seldom. The

validity of this hypothesis may or may not be tested, on which topic

we shall not elaborate here.

The mean ML of the sample L is said to approximate (the word

estimate is often used in this context) the mean ~t of the parent PDF,

Also, the variance s12 of the sample L is said to

estimate the variance cri of the parent PDF.

Considering the original sample

L = ( t . ) = ( t'+e . ) , i = l , 2 , ••• , n ,
~ ~

we get:
l n l n l l n
M1 =-f. £. =- .2:1 (£1+e.) =- (n£') +- .E 1 e.; = £1 + M (4-20)
n i=l ~ n ~= ~ n n ~= .... e

SLnce the random errors c.'s are postulated to have a parent Gaussian

PDF N(O, cre; e),which implies that ~e = 0, then we should expect that

M + 0 and we can write equation (4-20) as:


£

(4-21)
107

keeping in mind that by the unknown value £'we mean the unknown mean

~£ of the parent PDF of £. We say that the mean ~ of the sample L

approximates (estimates) the value of the mean ~£ of the parent PDF of £.

Similarly, we get

2 1 n n m 2
(.ti-~)2 1 2 1 s 2
SL =n i~l =- i~l[ti-(.t'+ ME))
n
= - k .( c:. - ME) =
m.i=l ~ £
(4.22)

The above result indicates that the variance s12 of the sample L is

identical to the variance s 2£ of its corresponding sample of random errors

c:. This is actually why si is sometimes called the mean sguare error of

the sample, and is abbreviated by MSE. Also, s1 is known as the root

~ean square error of the sample, and is abbreviated by RMS. According

to the basic hypothesis of the error-theory we can write equation (4-22)

as:

( 4-23)

which states that si estimates the variance o~ of the parent PDF of


.. . ' £
n
) •

:Sxample 4.5 Assume that the sample L: (2, 7, 6, 4,

2, 7, 4, 8, 6, 4) is postulated to be

normally distributed. Let us tr~~sform

this sample in such a way that the transformed

sample will have:

(i) Gaussian distribution

(ii) Standard normal distribution.


108

Solution: First we compute the mean M_L and the variance S_L² of the given
          sample as follows:

              M_L = (1/10) Σ_{i=1}^{10} ℓ_i = (1/10)(50) = 5 ,

              S_L² = (1/10) Σ_{i=1}^{10} (ℓ_i - M_L)² = 4 ,   i.e.  S_L = 2 .

          According to the basic postulate of the error-theory we can say that:

              μ_ℓ = M_L = 5   and   σ_ℓ = S_L = 2 ,

          where μ_ℓ and σ_ℓ are respectively the mean and the standard deviation
          of the parent normal PDF N(μ_ℓ, σ_ℓ; ℓ) assumed for the given sample.
          The parameters μ_ℓ and σ_ℓ will be used for the required transformations
          as follows:

          (i) The Gaussian distribution G(σ_ε; ε), where σ_ε = σ_ℓ = 2, has an
              argument ε obtained from equation (4-10) as:

                  ε_i = ℓ_i - μ_ℓ = ℓ_i - 5 ,   i = 1, 2, ..., 10 .

              Hence the transformed sample that has a Gaussian PDF is
              E = (ε_i), i = 1, 2, ..., 10, i.e.:

                  E ≐ (-3, 2, 1, -1, -3, 2, -1, 3, 1, -1) .



          (ii) The standard normal distribution N(t) has an argument t obtained
               from equation (4-13) as:

                   t_i = (ℓ_i - μ_ℓ)/σ_ℓ = (ℓ_i - 5)/2 ,   i = 1, 2, ..., 10 .

               Hence, the transformed sample that has a standard normal PDF is
               T = (t_i), i = 1, 2, ..., 10, i.e.:

                   T ≐ (-1.5, 1, 0.5, -0.5, -1.5, 1, -0.5, 1.5, 0.5, -0.5) .
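The two transformations of example 4.5 can be verified in a few lines of code. The
sketch below (again only an illustration, with variable names of our own choosing)
computes M_L and S_L directly from the sample and applies equations (4-10) and (4-13).

    sample = [2, 7, 6, 4, 2, 7, 4, 8, 6, 4]
    n = len(sample)

    mean = sum(sample) / n                              # M_L = 5
    var = sum((x - mean) ** 2 for x in sample) / n      # S_L^2 = 4
    std = var ** 0.5                                    # S_L = 2

    gaussian = [x - mean for x in sample]               # epsilon_i = l_i - mu_l
    standard = [(x - mean) / std for x in sample]       # t_i = (l_i - mu_l)/sigma_l

    print(gaussian)   # [-3.0, 2.0, 1.0, -1.0, -3.0, 2.0, -1.0, 3.0, 1.0, -1.0]
    print(standard)   # [-1.5, 1.0, 0.5, -0.5, -1.5, 1.0, -0.5, 1.5, 0.5, -0.5]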

4.8 Residuals, Corrections and Discrepancies

As we have seen, we are not able to compute the unknown
value ℓ' or μ_ℓ. All we can get is an estimate ℓ̄ for it from the
following equation

    ℓ̄ = M_L = ℓ' + M_ε = ℓ' + ε̄ *)                                           (4-24)

and hope that ε̄, in accordance with the basic postulate of the error-
theory, will really go to zero.

The residual r_i is defined as the difference between the
observation ℓ_i and the sample mean ℓ̄, i.e.

    r_i = ℓ_i - ℓ̄ .                                                          (4-25)

*) From now on, we shall use the symbol ℓ̄ for the mean M_L of the sample
   L. The "bar" above the symbol will indicate the sample mean to make
   the notation simpler.

Residuals with inverted signs are usually called corrections. It should be noted
that a residual, as defined above, is a uniquely determined value and not
a variable. The observed value ℓ_i is fixed and so is the mean ℓ̄ for the
particular sample. In other words, for a given sample, the residuals can
be computed in one way only. Note that the differences (ℓ_i - ℓ̄) = r_i are
called residuals and not errors, because errors are defined as ε_i = (ℓ_i - μ_ℓ)
and μ_ℓ may be different from ℓ̄.

In practice, one often hears talk about "minimized residuals",
"variable residuals", etc., which is not strictly correct. If one wants
to regard the "residuals as variables" the problem has to be stated differ-
ently. The difference v_i between the observed value ℓ_i and any arbitrarily
assumed (or computed) value ℓ⁰, i.e.

    v_i = ℓ_i - ℓ⁰ ,                                                         (4.26)

should be called discrepancy, or misclosure, to distinguish it from the
residual. These discrepancies are obviously linear functions of ℓ⁰; their
values vary with the choice of ℓ⁰. Hence one can talk about "minimization
of discrepancies", "variation of discrepancies", etc. Evidently, residuals
and discrepancies are very often mixed up in practice.

At this point it is worthwhile to mention yet another pair of
formulae for computing the sample mean ℓ̄ and the sample variance S_L². Such
simplified formulae facilitate the computations, especially for large samples

whose elements have large numerical values. The development of these
formulae is done analogically to the formulation of equations (4.20),
(4.22), (4.25) and (4.26). Here we state only the results, and the
elaboration is left to the student.

    ℓ̄ = ℓ⁰ + v̄ ,                                                             (4.27)
and
    S_L² = (1/n) Σ_{i=1}^{n} r_i² ,                                          (4.28)

where ℓ⁰ is an arbitrarily chosen value, usually close to ℓ̄;

    v̄ = (1/n) Σ_{i=1}^{n} v_i ,                                              (4.29)
and
    r_i = ℓ_i - ℓ̄ = v_i - v̄ .                                                (4.30)

Example 4.6: The second column of the following table is a sample of 10
             observations of the same distance. It is required to compute
             the sample mean and variance using the simplified formulae
             given in this section.

             We take ℓ⁰ = 972.00 m,

                 v̄ = (1/10) Σ_{i=1}^{10} v_i = (1/10)(10.50) = 1.05 m ,

                 ℓ̄ = ℓ⁰ + v̄ = 972.00 + 1.05 = 973.05 m ,

                 MSE = S_L² = (1/10) Σ_{i=1}^{10} r_i² = (1/10)(0.5730) = 0.0573 m² ,

             and RMS = S_L = 0.24 m.


One of the checks on the computations is that Σ_{i=1}^{n} r_i = 0,
see the fourth column of the given table.

  No.    ℓ_i (m)    v_i = ℓ_i - ℓ⁰ (m)    r_i = ℓ_i - ℓ̄ = v_i - v̄ (m)    r_i² · 10⁴ (m²)
  -----------------------------------------------------------------------------------
   1     972.89          0.89                     -0.16                        256
   2     973.46          1.46                      0.41                       1681
   3     973.04          1.04                     -0.01                          1
   4     972.73          0.73                     -0.32                       1024
   5     972.63          0.63                     -0.42                       1764
   6     973.01          1.01                     -0.04                         16
   7     973.22          1.22                      0.17                        289
   8     973.10          1.10                      0.05                         25
   9     973.30          1.30                      0.25                        625
  10     973.12          1.12                      0.07                         49
  -----------------------------------------------------------------------------------
   Σ                    10.50              -0.95 + 0.95 = 0.00                 5730
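The computation of example 4.6 with the simplified formulae (4.27)-(4.30) can be
scripted directly; the following Python sketch (added for illustration only)
reproduces the tabulated values.

    observations = [972.89, 973.46, 973.04, 972.73, 972.63,
                    973.01, 973.22, 973.10, 973.30, 973.12]      # metres
    l0 = 972.00                                                  # arbitrarily chosen value

    v = [li - l0 for li in observations]                         # discrepancies v_i
    v_bar = sum(v) / len(v)                                      # 1.05 m   (eq. 4.29)
    l_bar = l0 + v_bar                                           # 973.05 m (eq. 4.27)

    r = [vi - v_bar for vi in v]                                 # residuals r_i (eq. 4.30)
    mse = sum(ri ** 2 for ri in r) / len(r)                      # 0.0573 m^2 (eq. 4.28)
    rms = mse ** 0.5                                             # 0.24 m

    print(round(l_bar, 2), round(mse, 4), round(rms, 2))
    print(round(sum(r), 10))                                     # check: ~0 (sum of residuals)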

4.9 Other Possibilities Regarding the Postulated PDF

The normal PDF (or its relatives) are by no means the only bell-

shaped PDF's that can be postulated. Under different assumptions, one can

derive a whole multitude of bell-shaped curves. Generally, they would

contain more than two parameters, which is an advantage from the point of

view of fitting them to any experimental PDF. In other words the additional

parameters provide more flexibility. On the other hand, the computations


113

with such PDF's are more troublesome. In this context let us just mention

that some recent attempts have been made to design a family of PDF's that

are more peaked than the normal PDF in the middle. Such PDF's are called

"Leptokurtic". This more pronounced peakedness is a feature that quite

a few scholars claim to have spotted in the majority of observational

samples. We shall have to wait for any definite word in this domain for

some time.

Hence, the normal is still the most popular PDF and likely to remain

so because it is relatively simple and contains the least possible number

of parameters - the mean and the standard deviation.

4.10 Other Measures of Dispersion

So far, we have dealt with two measures of dispersion of a sample

namely: the root mean square error (RMS) mentioned in section 4.7, and
the range (Ra) mentioned in section 3.1.5. Besides the RMS and the range

of a sample the following measures of dispersion (spread) are often used.

The average or mean error a_e of the sample L is defined as

    a_e = (1/n) Σ_{i=1}^{n} |ℓ_i - ℓ̄| = (1/n) Σ_{i=1}^{n} |r_i| ,            (4.31)

which is the mean of the absolute values of the residuals.

The most probable error p_e, of the sample L, is defined as the
error for which:

    P(|r| ≤ p_e) = P(|r| > p_e) = 0.50 ,                                     (4.32)

which means that there is 50% probability that the residual is smaller and
50% probability that the residual is larger than p_e.

The most probable error of a random sample can be computed by constructing
the CDF of the corresponding absolute values of the sample residuals, and
taking the value of r which corresponds to the CDF = 0.5 as the value of p_e.

Both a_e and p_e can be defined for the continuous distributions as
well. For instance, by considering the normal PDF, N(μ_x, σ_x; x), we can
write:

    a_e = ∫_{-∞}^{∞} |x - μ_x| φ(x) dx

        = [1/(σ_x √(2π))] ∫_{-∞}^{∞} |x - μ_x| exp(-(x - μ_x)²/(2σ_x²)) dx . (4.33)

Similarly for p_e, by taking the symmetry of the normal curve into account,
we can write:

    P(x ≤ μ_x - p_e) = Ψ_N(μ_x - p_e)

        = [1/(σ_x √(2π))] ∫_{-∞}^{μ_x - p_e} exp(-(x - μ_x)²/(2σ_x²)) dx = 0.25   (4.34)

and

    P(x ≤ μ_x + p_e) = Ψ_N(μ_x + p_e) = 0.75 ,                               (4.35)

where Ψ_N is the normal CDF.

It can be shown for the normal PDF N(μ_x, σ_x; x) that σ_x, a_e and p_e
are related to each other by the following approximate relations:

    a_e ≐ 0.80 σ_x ,   p_e ≐ 0.67 σ_x ,
or
    σ_x : a_e : p_e ≐ 1.0 : 0.80 : 0.67 .                                    (4.36)

The relative or proportional error r_e, of the sample L, is defined
as the ratio between the sample RMS and the sample mean, i.e.

    r_e = S_L / ℓ̄ .                                                          (4.37)

In practice, the relative error is usually used to describe the uncertainty
of the result, i.e. the sample mean. In that case, the relative error is
defined as:

    r_e = S_ℓ̄ / ℓ̄ ,                                                          (4.38)

where S_ℓ̄ is the standard deviation of the mean ℓ̄ and will be derived later
in Chapter 6. In this respect, one often hears expressions like "propor-
tional accuracy 3 ppm (parts per million)", which simply means that the
relative error is 3/10⁶ = 3 · 10⁻⁶. It should be noted that unlike the
other measures of dispersion, the relative error is unitless.

The idea of the confidence intervals is based on the assumption
of normality of the sample, i.e. the postulated parent normal PDF
N(ℓ̄, S_L; ℓ) for the random sample L. It is very common to represent the
sample L by its mean ℓ̄ and its standard deviation S_L as

    [ℓ̄ - S_L ≤ ℓ ≤ ℓ̄ + S_L]   or   [ℓ̄ ± S_L]

and refer to it as the "68% confidence interval" of L. This is based on
the fact that the probability P(μ_ℓ - σ_ℓ ≤ ℓ ≤ μ_ℓ + σ_ℓ) is approximately 0.68 for

the normal PDF (see section 4.5).

Similarly, one can talk about the "95% confidence interval",
the "99% confidence interval", etc. In general, the confidence interval
of ℓ is expressed as:

    [ℓ̄ - K S_L ≤ ℓ ≤ ℓ̄ + K S_L] ,                                            (4.40)

where K is determined in such a way as to make

    P(μ_ℓ - Kσ_ℓ ≤ ℓ ≤ μ_ℓ + Kσ_ℓ)   equal to 0.95, 0.99, etc.

The values (ℓ̄ - K S_L) and (ℓ̄ + K S_L) are called the lower and the
upper confidence limits.

Example 4.7: Let us compute the average error, the relative error and the
             95% confidence interval for the sample of observations L
             given in example 4.6.

             The average error is computed using equation (4.31) and the
             fourth column of the given table in example 4.6 as:

                 a_e = (1/10) Σ_{i=1}^{10} |r_i| = (1/10)(1.90) = 0.19 m .

             The relative error of the sample is computed from equation
             (4.37) and the results obtained in example 4.6 as:

                 r_e = S_L / ℓ̄ = 0.24 / 973.05 ≐ 247 ppm .

             The 95% confidence interval of ℓ is

                 [ℓ̄ - K S_L ≤ ℓ ≤ ℓ̄ + K S_L] ,

             where the number K is computed so that

                 P(μ_ℓ - Kσ_ℓ ≤ ℓ ≤ μ_ℓ + Kσ_ℓ) = 0.95 .

             This is identical to the probability P(-K ≤ t ≤ K) obtained
             from the standard normal tables (see example 4.3, the last
             case). Hence we can write:

                 P(-K ≤ t ≤ K) = P(t ≤ K) - P(t ≤ -K) = 0.95 ,

             from which we get

                 P(t ≤ K) = 0.975 .

             Using the table for the standard normal variable of Appendix
             II-B we get:

                 K = 1.96 .

             (In practice K = 2 is usually used for the 95% confidence
             interval.) The 95% confidence interval of ℓ then becomes

                 [973.05 - 1.96(0.24) ≤ ℓ ≤ 973.05 + 1.96(0.24)] ,

             that is:

                 [972.58 ≤ ℓ ≤ 973.52] m

             or

                 [973.05 ± 0.47] m .
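A short continuation of the earlier sketch (again purely illustrative) reproduces the
three measures of dispersion computed in example 4.7; the factor 1.96 is taken as given.

    observations = [972.89, 973.46, 973.04, 972.73, 972.63,
                    973.01, 973.22, 973.10, 973.30, 973.12]      # metres
    n = len(observations)
    l_bar = sum(observations) / n
    residuals = [li - l_bar for li in observations]
    rms = (sum(ri ** 2 for ri in residuals) / n) ** 0.5

    a_e = sum(abs(ri) for ri in residuals) / n                   # average error, eq. (4.31)
    r_e = rms / l_bar                                            # relative error, eq. (4.37)
    K = 1.96                                                     # 95% quantile factor
    ci = (l_bar - K * rms, l_bar + K * rms)                      # eq. (4.40), S_L for sigma

    print(round(a_e, 2), round(r_e * 1e6), "ppm")   # 0.19 m, ~246 ppm (247 with S_L = 0.24)
    print(tuple(round(c, 2) for c in ci))           # (972.58, 973.52) m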

Example 4.8: Given a random variable x assumed to have a normal distri-
             bution N(35, 4; x), compute the most probable error.

             From the assumed PDF we have: μ_x = 35 and σ_x = 4.

             The most probable error p_e is computed so that

                 P(μ_x - p_e ≤ x ≤ μ_x + p_e) = P(-t_p ≤ t ≤ t_p) = 0.50   (Figure 4.7a),

             where t_p = p_e / σ_x.

             The above probability statement can be rewritten as (equation
             (4.35)):

                 P(x ≤ μ_x + p_e) = P(t ≤ t_p) = 0.75   (Figure 4.7b).

             From the table in Appendix II-B, we obtain t_p = 0.675
             corresponding to P = 0.75. Hence,

                 p_e = σ_x t_p = 4(0.675) = 2.7 .

             Note that in the second case of example 4.3, the value 3.375
             is nothing else but the most probable error of the given random
             variable h.
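The tabulated factor 0.675 can be checked numerically; the sketch below (ours,
illustrative only) locates the 0.75 quantile of the standard normal CDF by bisection
and scales it by σ_x, reproducing the results of examples 4.8 and 4.3.

    import math

    def std_normal_cdf(t):
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

    def most_probable_error(sigma):
        """p_e such that P(|x - mu| <= p_e) = 0.5, i.e. sigma * t_p with CDF(t_p) = 0.75."""
        lo, hi = 0.0, 3.0
        for _ in range(60):                    # bisection for the 0.75 quantile
            mid = 0.5 * (lo + hi)
            if std_normal_cdf(mid) < 0.75:
                lo = mid
            else:
                hi = mid
        return sigma * 0.5 * (lo + hi)

    print(round(most_probable_error(4), 2))    # 2.7   (example 4.8)
    print(round(most_probable_error(5), 3))    # 3.372 ~ 3.375 with the tabulated 0.675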

4.11 Exercise 4

1. Prove that the Gaussian PDF given by equation (4.4) has two points of
   inflection at abscissas ± √C/2.

2. For the Gaussian PDF given by equation (4.8), determine approximately
   the probabilities P(-2σ_ε < ε < 2σ_ε) and P(-3σ_ε < ε < 3σ_ε)
   by integrating the PDF, then check your results by using the standard
   normal tables.

3. Prove by direct evaluation that the standard normal PDF has a standard
   deviation equal to one.

4. Show that the standard deviation σ, the average error a_e and the most
   probable error p_e of the normal PDF satisfy the following approximate
   relations:

       σ : a_e : p_e ≐ 1.0 : 0.80 : 0.67 .

5. Determine the average error, the most probable error, the relative
   error and the 90% confidence interval of the random sample given in
   the second problem of exercise 3, section 3.4.

6. Assume that the sample H = (-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5) is
   hypothesized (postulated) to have a Gaussian distribution. Transform
   this sample so that the transformed (new) sample will have:

   (i)  Normal distribution with mean equal to 10.
   (ii) Standard normal distribution.

7. Given a random variable x distributed as N(25, 10; x), determine the
   following probabilities:

   (i)   P(x ≤ 28.5),          (ii)  P(x ≥ 22.5),
   (iii) P(x ≤ 21.5),          (iv)  P(16.75 < x < 23.82),
   (v)   P(|x - 25| < 1.25).

8. For the random variable in the previous problem, determine the values
   Z_i such that

   (i)   P(x ≤ Z_1) = 0.65,    (ii)  P(x ≤ Z_2) = 0.025,
   (iii) P(x ≤ Z_3) = 0.33,    (iv)  P(|x - 25| ≤ Z_4) = 0.33,
   (v)   P(|x - 25| ≥ Z_5) = 0.50.
9.  [Figure: tower CD of height h observed from the horizontal baseline AB]

The above figure shows a surveying technique to determine the height h
of a tower CD, which cannot be measured directly. The observed quantities
are:

    ℓ     = the horizontal distance AB,
    α, β  = the horizontal angles at A and B,
    θ     = the vertical angle of D at B.
The field results of these observations are given in the following table:
                          Field Observations

      ℓ (m)        α              β              θ
      45.63    65° 32' 03"    37° 13' 08"    42° 53' 15"
        .55        32  04         13  11         52  30
        .59        31  59         13  10         53  00
        .65        32  01         13  13         51  00
        .58        31  58         13  06         52  15
                                  13  12         52  45
                                                  51  15
                                                  53  00
                                                  51  45
                                                  52  15

Average temperature during the observation time was T = 20° F.

The following information was given to the observer:

(i)  The micrometer of the vertical circle of the used theodolite was not
     adjusted to read 00' 00" when the corresponding bubble axis is
     horizontal; it reads -(00' 30").

(ii) The nominal length of the used tape is 20 m at the calibration temper-
     ature T_0 = 60° F, and the coefficient of expansion of the tape material
     is γ = 5 · 10⁻⁵ per 1° F.
Required:

(i)   Compute the estimated values for the quantities ℓ, α, β and θ.

(ii)  For each of the above observed quantities compute its standard de-
      viation and its average error.

(iii) Compare the precision of these observed quantities (by comparing
      the respective relative errors).

(iv)  Assume that each of these observed quantities has a postulated normal
      parent PDF, and construct the 95% confidence interval for each quantity.

(v)   Compute the estimated value of the tower's height h to the nearest
      centimetre.

5. LEAST-SQUARES PRINCIPLE

5.1 The Sample Mean as

"The Least 89-uares Estimator"

One may now ask oneself a hypothetical question: given the
sample L = (ℓ_i), i = 1, 2, ..., n, what is the value ℓ⁰ that makes the
summation of the squares of the discrepancies

    v_i = ℓ_i - ℓ⁰ ,   i = 1, 2, ..., n,                                     (5-1)

the smallest (i.e. minimum)?

The above question may be stated more precisely as follows:
Defining a "new variance" S*² as

    S*² = (1/n) Σ_{i=1}^{n} (ℓ_i - ℓ⁰)² = (1/n) Σ_{i=1}^{n} v_i² ,           (5-2)

find the value ℓ⁰ that is going to give us the smallest (minimum)
value of S*².

Obviously, such a question can be answered mathematically.
From equation (5-2), we notice that S*² is a function of ℓ⁰, which is
the only free variable here, and can be written as

    S*² = f(ℓ⁰) .                                                            (5-3)

We know that a necessary condition for a minimum of f(ℓ⁰) is that its
derivative with respect to ℓ⁰ vanishes. Hence, by differentiating
equation (5-2) with respect to ℓ⁰ and equating it to zero, we get:

    ∂S*²/∂ℓ⁰ = (1/n) Σ_{i=1}^{n} [2(ℓ_i - ℓ⁰)(-1)] = -(2/n) Σ_{i=1}^{n} (ℓ_i - ℓ⁰) = 0 ,

that is:

    Σ_{i=1}^{n} (ℓ_i - ℓ⁰) = 0 .

The above equation can be rewritten as:

    Σ_{i=1}^{n} ℓ_i = Σ_{i=1}^{n} ℓ⁰ = nℓ⁰ ,

which yields

    ℓ⁰ = (1/n) Σ_{i=1}^{n} ℓ_i = ℓ̄ .                                         (5-4)

The result (5-4) is nothing else but the "sample mean" ℓ̄ again. In
other words, the mean of the sample is the value that minimizes the
sum of the squares of the discrepancies, making them equal to the
residuals (see section 4.8).

This is the reason why the mean ℓ̄ is sometimes called the
least-squares estimate (estimator) of ℓ, i.e. of μ_ℓ; the name being
derived from the process of minimization of the squares of the discre-
pancies. We also notice that ℓ̄ minimizes the variance of the sample if
we want to regard the variance as a function of the mean.

Note that the above property of the mean is completely indep-
endent of the PDF of the sample. This means that the sample mean ℓ̄ is
always "the minimum variance estimator of ℓ" whatever the PDF may be.

5.2 The Sample Mean as
    "The Maximum Probability Estimator"

Let us take our sample L again, and let us postulate an underlying
parent PDF to be normal (see section 4.5) with a mean μ_ℓ = ℓ⁰ and a
variance σ_ℓ² given by:

    σ_ℓ² = S*² = (1/n) Σ_{i=1}^{n} (ℓ_i - ℓ⁰)² .                             (5-5)

We say that the normal PDF, N(μ_ℓ, σ_ℓ; ℓ) ≡ N(ℓ⁰, S*; ℓ), is the most
probable underlying PDF for our sample L (L = (ℓ_i), i = 1, 2, ..., n)
if the combined probability of simultaneous occurrence of n elements,
that have the normal distribution N(ℓ⁰, S*; ℓ), at the same places as L is
maximum. In other words, we ask that:

    P[(ℓ_i ≤ ℓ ≤ ℓ_i + δℓ_i), i = 1, 2, ..., n] = Π_{i=1}^{n} N(ℓ⁰, S*; ℓ_i) δℓ_i   (5.6)

be maximum with respect to the existing free parameters. By examining equation
(5.6), we find that the only free parameter is ℓ⁰ (note that S* is a function
of ℓ⁰), and hence we can write the above combined probability as a function
of ℓ⁰ as follows:

    P[(ℓ_i ≤ ℓ ≤ ℓ_i + δℓ_i), i = 1, 2, ..., n] = λ(ℓ⁰) .                    (5.7)

Note that the δℓ's are some values depending on L and therefore are determined
uniquely by L.

We shall show that the value of ℓ⁰ satisfying the above condition
is (for the postulated normal PDF) again the value rendering the smallest
value of S*. We can write:

    max_{ℓ⁰∈R} [λ(ℓ⁰)] = max_{ℓ⁰∈R} [Π_{i=1}^{n} N(ℓ⁰, S*; ℓ_i) δℓ_i]

                       = max_{ℓ⁰∈R} [Π_{i=1}^{n} (1/(S*√(2π))) exp(-(ℓ_i - ℓ⁰)²/(2S*²)) δℓ_i]

                       = max_{ℓ⁰∈R} [(1/(S*√(2π)))ⁿ Π_{i=1}^{n} exp(-(ℓ_i - ℓ⁰)²/(2S*²)) δℓ_i] .   (5.8)

Here Π_{i=1}^{n} δℓ_i is determined by L, and hence does not lend itself to maximiza-
tion. It thus can be regarded as a constant, i.e.

    max_{ℓ⁰∈R} [λ(ℓ⁰)] = max_{ℓ⁰∈R} [(1/(S*√(2π)))ⁿ Π_{i=1}^{n} exp(-(ℓ_i - ℓ⁰)²/(2S*²))] .        (5.9)

Let us denote the second term in the RHS of equation (5.9) by Q, which can
be expressed as:

    Q = Π_{i=1}^{n} exp(-x_i) ,   where   x_i = (ℓ_i - ℓ⁰)²/(2S*²) .         (5.10)

This implies that:

    ln Q = ln [Π_{i=1}^{n} exp(-x_i)] = Σ_{i=1}^{n} ln(exp(-x_i)) ,
or
    Q = exp(Σ_{i=1}^{n} (-x_i)) .                                            (5.11)

From equations (5-9), (5-10) and (5-11) we get:

    Π_{i=1}^{n} exp(-(ℓ_i - ℓ⁰)²/(2S*²)) = exp[-(1/(2S*²)) Σ_{i=1}^{n} (ℓ_i - ℓ⁰)²] .   (5-12)

The condition (5-9) can then be rewritten as:

    max_{ℓ⁰∈R} [λ(ℓ⁰)] = max_{ℓ⁰∈R} [(1/(S*√(2π)))ⁿ exp(-(1/(2S*²)) Σ_{i=1}^{n} (ℓ_i - ℓ⁰)²)] .   (5-13)

From equation (5-5), we have:

    Σ_{i=1}^{n} (ℓ_i - ℓ⁰)² = n S*² .

Hence by substituting this value into equation (5-13) we get:

    max_{ℓ⁰∈R} [λ(ℓ⁰)] = max_{ℓ⁰∈R} [(1/(S*√(2π)))ⁿ exp(-n/2)] .             (5-14)

Since the only quantity in equation (5-14) that depends on ℓ⁰ is S*, we
can write:

    max_{ℓ⁰∈R} [λ(ℓ⁰)] = max_{ℓ⁰∈R} [(1/S*)ⁿ] = min_{ℓ⁰∈R} [(S*)ⁿ] .         (5-15)

Because S* is a non-negative (quadratic) function of ℓ⁰, the minimum of
(S*)ⁿ will be attained for the same argument as the minimum of S* (see
Figure 5-1).

[Figure 5-1]

Finally, our original condition (equation (5-9)) can be restated as:

    min_{ℓ⁰∈R} [S*] ,                                                        (5-16)

which implies that

    ∂S*/∂ℓ⁰ = 0   or equivalently   ∂S*²/∂ℓ⁰ = 0 ,

that is:

    (∂/∂ℓ⁰) Σ_{i=1}^{n} v_i² = 0 .                                           (5-17)

Obviously, the condition (5-17) is the same condition as that of the
"minimum variance" discussed in the previous section, and again we have
ℓ⁰ = ℓ̄.

We have thus shown that under the postulate for the underlying
PDF, the mean ℓ̄ of the sample L is the maximum probability estimator for
μ_ℓ. As a matter of fact, we would find that the requirement of maximum
probability leads to the condition

    Σ_{i=1}^{n} v_i² = minimum                                               (5.18)

for quite a large family of PDF's, in particular the symmetrical PDF's.
If one assumes the additional properties of the random sample as
mentioned in 3.2.4, then additional features of the sample mean can
be shown. This again is considered beyond the scope of this course.
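A numerical illustration of this section (added by us, not part of the derivation):
evaluating λ(ℓ⁰) of equation (5-8) for trial values of ℓ⁰, with S* recomputed from
equation (5-5) each time, shows the maximum at the sample mean.

    import math

    sample = [2, 7, 6, 4, 2, 7, 4, 8, 6, 4]
    n = len(sample)

    def lam(l0):
        s_star2 = sum((li - l0) ** 2 for li in sample) / n           # eq. (5-5)
        s_star = math.sqrt(s_star2)
        q = math.exp(-sum((li - l0) ** 2 for li in sample) / (2 * s_star2))
        return (1.0 / (s_star * math.sqrt(2 * math.pi))) ** n * q    # eq. (5-8), delta_l dropped

    candidates = [3 + k * 0.25 for k in range(17)]                    # trial values 3.0 ... 7.0
    best = max(candidates, key=lam)
    print(best, sum(sample) / n)                                      # 5.0  5.0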

5.3 Least-Squares Principle


We have shown that the sample mean renders always the minimum sum
of squares of discrepancies and that this property is required, for a large
family of postulated PDF's, to yield the maximum probability for the underlying

PDF. Hence the sample mean ℓ̄, which automatically satisfies the condition
of the least sum of squares of discrepancies, is at the same time the
most probable value of the mean μ_ℓ of the underlying PDF under the con-
dition that the underlying PDF is symmetrical. This is the necessary
and sufficient condition for the sample mean to be both the least squares
and the maximum probability estimator, i.e. for both estimators to be
equivalent.

The whole development we have gone through does not say any-
thing about the most probable value of the standard deviation σ_ℓ of the
underlying PDF*). σ_ℓ has to be postulated according to equation (4.23).

The idea of minimizing the sum of squares of the discrepancies
is known as the least-squares principle, and has got a fundamental
importance in the adjustment calculus. We shall show later how the
same principle is used for all kinds of estimates (not only the mean of
a sample) and how it is developed into the least-squares method. However,
the basic limitations of the least-squares principle should be borne in
mind, namely:

(i)  A normal PDF (or some other symmetrical PDF) is postulated.

(ii) The least-squares principle does not tell anything about the
     best estimator of σ_ℓ with respect to the mean μ_ℓ of the
     postulated PDF.

*) Some properties of the standard deviation S can be revealed if the
   additional properties of the random sample are assumed (see 3.2.4).

5.4 Least-Squares Principle for Random Multivariate

So far, we have shown that the least-squares principle spells
out the equivalence between the sample mean ℓ̄ and the estimate for the
parent population mean μ_ℓ determined from the condition that the sum of
squares of the discrepancies be minimum. We have also shown that ℓ̄ is the
most probable estimate for μ_ℓ providing the parent population is postulated
to have a normal or any other symmetrical PDF. We shall show now that the
same principle is valid even for a random multisample if we postulate the
underlying PDF to be statistically independent (see Section 3.3.2).

Denoting the multisample by L and its components by Lʲ, j = 1,
2, ..., s, and remembering that each Lʲ is a sample on its own, we can
write:

    L = (L¹, L², ..., Lˢ) ,
                                                                             (5-19)
    Lʲ = (ℓ_1ʲ, ℓ_2ʲ, ..., ℓ_{n_j}ʲ) .

Assuming a particular value L_0 for the multisample L, where

    L_0 = (ℓ_0¹, ℓ_0², ..., ℓ_0ˢ) ∈ Rˢ                                       (5-20)

is a numerical vector (sequence of real numbers), the associated dis-
crepancies V, which can be regarded as a multisample as well, are:

    V = (V¹, V², ..., Vˢ) .                                                  (5-21)

Here, each Vʲ, j = 1, 2, ..., s, is a sample of discrepancies on its own,
i.e.

    Vʲ = (v_1ʲ, v_2ʲ, ..., v_{n_j}ʲ) .                                       (5-22)

Making use of formula (3-52), we can write analogically to (5.2):

    S*_j² = (1/n_j) Σ_{k=1}^{n_j} (v_kʲ)² ,   j = 1, 2, ..., s.              (5-23)

The minimization of the variances, i.e. minimization of each E[(Vʲ)²],
is equivalent to the minimization of each S*_j², or as we usually write:

    (min_{L_0∈Rˢ} [E((Vʲ)²)], j = 1, 2, ..., s) ≡ min_{L_0∈Rˢ} [trace Σ_L *)],   (5-24)

where Σ_L is the variance-covariance matrix of the multisample L (see
section 3.3.6). By carrying out this operation, similar to section 5.1,
we will find that the vector

    L_0 = (ℓ̄¹, ℓ̄², ..., ℓ̄ˢ)                                                   (5-25)

satisfies the condition (5-24). On the other hand, the result (5-25)
is nothing else but the mean L̄ of the multisample, i.e.:

    L_0 = L̄ ∈ Rˢ .                                                           (5-26)

Postulating a normal PDF, N(ℓ_0ʲ, S*_j; ℓʲ), for each component Lʲ
of the multisample L, the multivariate PDF of the parent population can
be written as:

    φ(ℓ) = Π_{j=1}^{s} N(ℓ_0ʲ, S*_j; ℓʲ)

         = Π_{j=1}^{s} [1/(S*_j √(2π))] exp[-(ℓʲ - ℓ_0ʲ)²/(2S*_j²)] ,        (5-27)

where ℓʲ is the random variable having mean ℓ_0ʲ and standard deviation S*_j.
Following a similar procedure as in section 5.2, we end up again with
the discovery that the vector

    L_0 = L̄                                                                  (5-28)

*) Trace of a matrix is the sum of its diagonal elements.



maximizes the probability that the members of the parent population will
occur at the same places as the members of the multisample L.

Hence L̄ ∈ Rˢ is, under the above conditions,*) the
maximum probability estimator for the mean μ of the postulated parent
multivariate PDF, where

    μ = (μ_ℓ¹, μ_ℓ², ..., μ_ℓˢ) .                                            (5-29)

5.5 Exercise 5

1. Prove that the mean μ of a continuous PDF, φ(x), defined as:

       μ = ∫_{-∞}^{∞} x φ(x) dx ,

   minimizes the PDF variance σ², defined as:

       σ² = ∫_{-∞}^{∞} (x - μ)² φ(x) dx .

2. Prove that ∂S*/∂ℓ⁰ = 0 is the necessary and sufficient condition for the
   rectangular (uniform) PDF, R(ℓ⁰, S*; ℓ), to be the most probable
   underlying PDF for a sample L with mean ℓ̄ and variance S_L². Note
   that the analytic expression for the uniform PDF is given in example
   3.17, section 3.2.5.

3. Prove that the same holds for the triangular
   PDF, T(ℓ⁰, S*; ℓ), using its analytic expression given in example
   3.18, section 3.2.5.

*) It can be shown that L̄ is the maximum probability estimator of μ even when
   we postulate a statistically dependent multidimensional PDF from a certain
   family of PDF's.

6. FUNDAMENTALS OF ADJUSTMENT CALCULUS

6.1 Primary and Derived Random Samples

So far, we have been dealing with random samples (multisamples)

that had been obtained through some measurement or through any other data

collecting process. These samples may all be regarded as primary or original

random samples (multisamples).

In practice, we are often interested in other samples that would

be derived from the primary samples by means of a computation of some kind.

Such samples may be called derived random samples (multisamples).

From the philosophical point of view, there is not much difference

between these two, since even the "primary" samples may be regarded as

derived from the samples of physical influences or physical happenings.

However, it is necessary to distinguish between them to be able to speak

about the transition from one to the other.

6.2 Statistical Transformation, Mathematical Model

The transition from a primary to a derived sample (multisample),
along with the associated variances and covariances, will be called statistical
transformation. We have already met two examples of such transformation,
although applied to random variables rather than samples (see sections 4.5 and
4.6), namely the transformation of the Gaussian PDF to the normal and to
the standard normal PDF's, respectively.

Such statistical transformation may not always be as simple as

in the above two cases. As a matter of fact, it may not be even possible

to derive the sample at all from the primary sample which is usually

the case with multisamples. In other words, it might not be possible to

express the derived sample explicitly in terms of the primary sample.

Let us consider a primary multisample L = (Li), i = 1, 2J . . . ,s,

that has s constituents. Each constituent Li = (~ki), k = 1, 2,

is a random sample on its own and represents a distinct physical quantity


i
~i (i.e. the observations ~k , k = 1, 2, . . • ,ni are all representing the

same physical quantity t. ). Now, we may be interested in deriving a multi-


~

sample X having n constituents, ie.

X : : : (Xj) ' J. =1 ' 2 >. • • ,n'


from the original multisample L; noting again that each constituent Xj

represents a distinct physical quantity xj, j = 1, 2, . . . ,n. The formulae

(relationships) relating the physical quantities ~ and

x, where

and
(6.1)*
X = (xl ' x2 ' . • . 'xn )
are called the mathematical model for the statistical transformation; and

is usually expressed as:

l F ( .Q. , x) =0 J (6.2)

where F denotes the vector of functions f., i


~
= 1, 2, ... ' r(having r
components)that can be established between ~ and x.

To be able to derive x from ~, the mathematical model (6.2) should

be formulated as:

x =F (t), (6.3)

* Note that ~ and x are nothing else but the multivariates corresponding
to the multisamples L and X respectively.

which gives x as an explicit function of ℓ.

Example 6.1: After having measured the two perpendicular edges a and b
             of a rectangular desk (see Figure 6.1), suppose that we
             are interested in learning something about the length of
             the diagonal d, and about the surface area α of this desk.
             In this case, the mathematical model will be written as:

                 x = F(ℓ) ,   where

                 x = (x_1, x_2) = (d, α)   and   ℓ = (ℓ_1, ℓ_2) = (a, b) .

             To derive the components of x from ℓ we write:

                 d = f_1(a, b) = √(a² + b²) ,
                 α = f_2(a, b) = ab .

             In vector notation, we can write:

                 x = [d, α]ᵀ = [√(a² + b²), ab]ᵀ = F(ℓ) .

             [Figure 6.1: the desk with edges a, b and diagonal d]
The possibility of carrying out the statistical transformation depends
basically on three factors:

(i)   complexity of the mathematical model, i.e., the possibility of expressing
      x explicitly in terms of ℓ (x = F(ℓ));

(ii)  "completeness" of the primary multisample L, i.e. whether all its con-
      stituents have the same number of elements in order to deduce the variance-
      covariance matrix Σ_L;

(iii) our willingness to match the individual s-tuples of elements from the
      primary multisample L with the n-tuples of elements from the derived
      multisample X, which creates much of a problem.



Particularly the last two factors are so troublesome that we
usually do not even try to carry out the transformation and put up with
some statistical estimates, i.e. representative values E(X) and Σ_X, for
the derived multisample instead. To do so, we first evaluate E(L) and
Σ_L for the primary multisample L, from which we then compute the statistical
estimates E(X) and Σ_X for X.

According to the basic postulate of the error theory, and to make
the subsequent development easier, we generally postulate at this stage the
PDF of the parent multivariate to the multisample L and assume

    E(L) = E*(ℓ),  Σ_L = Σ*_ℓ,  E(X) = E*(x),  and  Σ_X = Σ*_x ,             (6.4)

in very much the same way as we postulated

    ℓ̄ ≐ μ_ℓ   and   S_L ≐ σ_ℓ

for the univariate case as discussed in section 4.7. This postulate allows
us to work with continuous variables in the mathematical model and write it
as:

    F(L, X) = 0 ,                                                            (6.5)

understanding tacitly that each value X has its counterpart L.

From now on, we shall write L̄ for E(L), and X̂ for the statistical
estimate of X. Hence the mathematical model (6.5) becomes

    F(L̄, X̂) = 0 ,                                                            (6.6)

which consists of r functional relationships between L̄ and X̂.

From the point of view of the mathematical model F(L̄, X̂) = 0,
the statistical transformation can be either solvable (if s ≥ n) or unsolvable
(if s < n). If it is solvable then we may still have two distinctly different
cases:

(i)  either the model yields only one solution X̂ (when r = s = n) by
     using the usual mathematical tools, i.e., X̂ is uniquely derived from L̄;

(ii) or the mathematical model is overdetermined (when r, s > n) and cannot
     be resolved for X̂ at all by using the ordinary mathematical tools,
     since an infinite number of different solutions for X̂ can be found.

The first case we have met in example (6.1), where the determina-
tion of X̂ from L̄ does not present any problem from the statistical point of
view. The only problem is to obtain Σ_X from L̄ and Σ_L. This problem, known
as propagation of errors, will be the topic of the next section.

If the model is overdetermined, or as we often say, if there are
redundancies (redundant or surplus observations), then the problem of trans-
forming (L̄, Σ_L) → (X̂, Σ_X) constitutes the proper problem of adjustment.*)

6.3 Propagation of Errors

6.3.1 Propagation of Variance-Covariance Matrix, Covariance Law

The relationship between Σ_X and Σ_L for a mathematical model

    F(L̄, X̂) = 0

is known as the propagation of variance-covariance matrix. Such relationship
can be deduced explicitly only for explicit relations

    X̂ = F(L̄) .

To make things easier, let us deduce it first for one particular explicit
relation, namely the linear relation between X̂ and L̄, i.e.
* It has to be mentioned here that in practice we are in both cases working
  with Σ_L̄ and Σ_X̂, the variance-covariance matrices of L̄ and X̂, rather than
  Σ_L, Σ_X belonging to the samples L and X. The expressions for Σ_L̄, Σ_X̂ are
  derived in 6.4.4.

    X = B L + C ,                                                            (6.7)

where B is indeed an n by s matrix composed of known elements*). Note
that X is determined uniquely, as required. We want to establish the
transition

    Σ_L = E((L - L̄)(L - L̄)ᵀ)  →  Σ_X ,   where L̄ = E(L).                     (6.8)

We can write:

    Σ_X = E((X - E(X))(X - E(X))ᵀ) .                                         (6.9)

Here X = BL + C, and according to the postulate introduced in section 6.2
we can write:

    E(X) = E(BL + C) = B E(L) + C = B L̄ + C.

Hence

    Σ_X = E((BL + C - B L̄ - C)(BL + C - B L̄ - C)ᵀ)

        = E(B(L - L̄)(B(L - L̄))ᵀ)

        = B E((L - L̄)(L - L̄)ᵀ) Bᵀ = B Σ_L Bᵀ ,

i.e.
    Σ_X = B Σ_L Bᵀ .                                                         (6.10)

This formula (6.10) is known as the law of propagation of variance-
covariance matrix, or simply the covariance law.

*) This matrix B, which determines the linear relationship between X and
   L, is sometimes called the "design matrix", "the matrix of the coefficients"
   of the constituents of L in the linearized model, or simply the "coef-
   ficients matrix".

Example 6.2: Assume that the variance-covariance matrix of a given
             multisample L = (ℓ_1, ℓ_2, ℓ_3) was found to be

                 Σ_L = | 3  2  0 |
                       | 2  3  1 |
                       | 0  1  4 | .

             If a multisample X = (x_1, x_2) is to be derived from L
             according to the following relationships:

                 x_1 = ℓ_1 - 3ℓ_3 ,
                 x_2 = 2ℓ_1 + ℓ_2 ,

             determine the variance-covariance matrix Σ_X of X.

             It can be seen that the above relationships between the
             components of X and L are linear, and our mathematical
             model can be expressed as:

                   X   =   B    L
                 (2,1)   (2,3)(3,1)

             i.e.

                 | x_1 |   | 1  0  -3 |  | ℓ_1 |
                 |     | = |          |  | ℓ_2 |
                 | x_2 |   | 2  1   0 |  | ℓ_3 | .

             This indicates that the coefficients matrix B is given by:

                 B = | 1  0  -3 |
                     | 2  1   0 | .

             The variance-covariance matrix Σ_X of X is given by equation
             (6.10), i.e., in our case:

                  Σ_X  =   B   Σ_L   Bᵀ
                 (2,2)   (2,3)(3,3)(3,2)

                      = | 1  0  -3 |  | 3  2  0 |  |  1  2 |
                        | 2  1   0 |  | 2  3  1 |  |  0  1 |
                                      | 0  1  4 |  | -3  0 |

                      = | 3  -1  -12 |  |  1  2 |
                        | 8   7    1 |  |  0  1 |
                                        | -3  0 |

                      = | 39   5 |
                        |  5  23 | ,

             i.e.

                 Σ_X = | 39   5 |
                       |  5  23 | .
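The matrix products of example 6.2 can be verified with a few lines of code; the
sketch below (ours, with small helper functions) applies the covariance law (6.10)
directly.

    def mat_mult(A, B):
        """Plain nested-list matrix product."""
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def transpose(A):
        return [list(row) for row in zip(*A)]

    Sigma_L = [[3, 2, 0],
               [2, 3, 1],
               [0, 1, 4]]
    B = [[1, 0, -3],
         [2, 1,  0]]

    Sigma_X = mat_mult(mat_mult(B, Sigma_L), transpose(B))   # covariance law (6.10)
    print(Sigma_X)                                           # [[39, 5], [5, 23]]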

Now we shall show that the propagation of variance-covariance
matrix can be deduced even for a more general case, namely the non-linear
relation between X̂ and L̄, i.e.

    X̂ = F(L̄) ,                                                               (6.11)

when F is a function with at least the first order derivative. Here we
have to adopt yet another approximation. We have to linearize the relation
(6.11) using, for instance, Taylor's series expansion around an approximate
value L⁰ for L:

    X = F(L⁰) + (dF/dL)|_{L = L⁰} (L - L⁰) + higher order terms,

where

    (dF/dL)|_{L = L⁰} (L - L⁰) = Σ_{i=1}^{s} (∂F/∂ℓ_i)|_{ℓ_i = ℓ_i⁰} (ℓ_i - ℓ_i⁰) .

Taking the first two terms only, which is permissible when the values of
the elements in Σ_L are much smaller than the values of the ℓ_i, we can write:

    X ≐ F(L⁰) + B (L - L⁰) ,                                                 (6.12)

where B is again an n by s matrix, but this time composed from all the partial
derivatives (∂x_i/∂ℓ_j)|_{ℓ_j = ℓ_j⁰} *). Applying the expectation operator,
and realizing that E(F(L⁰)) = F(L⁰) and E(L⁰) = L⁰ (because L⁰ is a
selected vector of constant values), we obtain:

    E(X) ≐ F(L⁰) + B (E(L) - L⁰) .                                           (6.13)

Subtracting (6.13) from (6.12) we get:

    X - E(X) ≐ B (L - E(L)) = B (L - L̄) ,                                    (6.14)

and we end up again with

    Σ_X = B Σ_L Bᵀ ,                                                         (6.15)

realizing that Σ_X = E((X - E(X))(X - E(X))ᵀ).

* Explicitly, if we have

      x_1 = x_1(ℓ_1, ℓ_2, ..., ℓ_s)
      x_2 = x_2(ℓ_1, ℓ_2, ..., ℓ_s)
      ...
      x_n = x_n(ℓ_1, ℓ_2, ..., ℓ_s) ,

  then the matrix B will take the form:

            | ∂x_1/∂ℓ_1   ∂x_1/∂ℓ_2   ...   ∂x_1/∂ℓ_s |
      B   = | ∂x_2/∂ℓ_1   ∂x_2/∂ℓ_2   ...   ∂x_2/∂ℓ_s |
    (n,s)   |    ...          ...               ...   |
            | ∂x_n/∂ℓ_1   ∂x_n/∂ℓ_2   ...   ∂x_n/∂ℓ_s | .

Hence the linear case may be regarded as one particular instance (special
case) of the more general explicit relation, yielding therefore the same
law for the propagation of variance-covariance matrix, i.e., the same
covariance law. It should be noted that the physical units of the individual
elements of both matrices B and Σ_L must be considered and selected in
such a way as to give the required units of the matrix Σ_X.

Example 6.3: Let us take again the example 6.1 and form the variance-
             covariance matrix Σ_X for the diagonal d and the area α of
             the desk in question. We have:

                 Σ_L = | S_a²   S_ab |
                       | S_ab   S_b² |

             and the model is non-linear, although explicit, i.e.

                 X = F(L) ,   or   (d, α) = F(a, b) .

             We have to linearize it as follows:

                 X = (d, α) = (d⁰, α⁰) + B [(a, b) - (a⁰, b⁰)] ,

             where (d⁰, α⁰) = F(a⁰, b⁰), and

                 B = | ∂d/∂a   ∂d/∂b |
                     | ∂α/∂a   ∂α/∂b | .

             Hence, the matrix B in this case takes the form:

                   B   = | a/d   b/d |
                 (2,2)   |  b     a  |

             and by applying the covariance law (equation (6.15)) we get:

                 Σ_X = | S_d²   S_dα |  =  B Σ_L Bᵀ
                       | S_αd   S_α² |

                     = | a/d   b/d |  | S_a²   S_ab |  | a/d   b |
                       |  b     a  |  | S_ab   S_b² |  | b/d   a | .

Example 6.4: Let us assume that the primary multisample L = (a, b) which
             we have dealt with in Examples 6.1 and 6.3 is given by:

                 L = {a, b} = {(128.1, 128.1, 128.2, 128.0, 128.1),
                               (62.5, 62.7, 62.6, 62.6, 62.5)} ,   in centimetres.

             Accordingly, the statistical estimate of the derived
             quantities will be

                 X̂ = | d̂ |  =  | [(ā)² + (b̄)²]^(1/2) |
                     | α̂ |     |        ā b̄          | ,

             where ā and b̄ are the estimates (means) of the two measured
             sides of the desk. From the given data we get

                 ā = 128.1 cm   and   b̄ = 62.58 cm.

             Hence

                 X̂ = |  142.57 cm  |
                     | 8016.50 cm² | .

             After computing the variance-covariance matrix Σ_L we get

                 Σ_L = | 0.004    0     |
                       |   0    0.0056  |  cm² ,

             which indicates that the constituents a and b are being taken
             as statistically independent.

             Evaluating the elements of the B matrix (as given in Example
             6.3) we get:

                 B = | ā/d̂   b̄/d̂ |  =  | 0.898   0.439 |
                     |  b̄     ā  |     | 62.58   128.1 | ,

             in which the elements of the first row are unitless, and
             of the second row are in cm.

             Finally Σ_X is computed as follows:

                 Σ_X = B Σ_L Bᵀ

                     = | 0.898   0.439 |  | 0.004    0     |  | 0.898   62.58 |
                       | 62.58   128.1 |  |   0    0.0056  |  | 0.439   128.1 |

                     = | 0.0043    0.5397 |                | cm²   cm³ |
                       | 0.5397  107.5627 | ,  with units  | cm³   cm⁴ | .

             Furthermore

                 S_d = √(0.0043) = 0.066 cm ,

                 S_α = √(107.5627) = 10.37 cm² .

6.3.2 Propagation of Errors, Uncorrelated Case

If X contains one component only, i.e. x, the matrix B in the formulae
(6.10) or (6.15) degenerates into a 1 by s matrix, i.e. into a row vector

    B = [B_1, B_2, ..., B_s] ,

and

    Σ_X = B Σ_L Bᵀ

becomes a quadratic form which has dimensions 1 by 1. Then

    S_X² = B Σ_L Bᵀ .                                                        (6.16)

If, moreover, L is assumed uncorrelated, we have

    Σ_L = diag (S_{ℓ_1}², S_{ℓ_2}², ..., S_{ℓ_s}²) ,                         (6.17)

which is a diagonal matrix, and we can write

    S_X² = Σ_{i=1}^{s} B_i² S_{ℓ_i}² .                                       (6.18)

This formula is known as the law of propagation of MSE's, or simply the
law of propagation of errors. The law of propagation of errors is hence
nothing else but a special case of the propagation of variance-covariance
matrix.

The law of propagation of errors has many applications in
surveying practice as well as in many other experimental sciences.

Example 6.5: In Figure 6.2, we assume a plane triangle in which
             the angles α and β, whose estimated values are:

                 ᾱ = 32° 15' 20",   with S_α = 4" ,
                 β̄ = 75° 43' 32",   with S_β = 3" ,   are observed.

             Also, assume that α and β are independent, i.e. S_αβ = 0.
             Let us estimate the third angle γ, along with its standard
             error S_γ, as follows:

             [Figure 6.2: plane triangle with angles α, β, γ]

                 γ̄ = 180° - (ᾱ + β̄) = 72° 01' 08" ,

                 S_γ² = (∂γ/∂α)² S_α² + (∂γ/∂β)² S_β²
                      = (-1)²(4)² + (-1)²(3)² = 16 + 9 = 25 ,

             that is:  S_γ = 5" .
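In code, the law of propagation of errors (6.18) applied to example 6.5 reads as
follows (an illustrative sketch only).

    # gamma = 180 deg - (alpha + beta); both partial derivatives equal -1
    s_alpha = 4.0                       # arc seconds
    s_beta = 3.0                        # arc seconds
    partials = [-1.0, -1.0]

    s_gamma2 = sum(b ** 2 * s ** 2 for b, s in zip(partials, [s_alpha, s_beta]))
    print(s_gamma2, s_gamma2 ** 0.5)    # 25.0  5.0 arc seconds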

Example 6.6: Figure 6.3 shows a levelling line between two bench marks
             A, C, with observed level differences h_i of the individual
             sections with lengths ℓ_i, i = 1, 2, ..., s. Assume that
             all the h_i's are uncorrelated and the MSE of h_i is propor-
             tional to ℓ_i, i.e. S_{h_i}² = k ℓ_i, where k is a constant.

             Let us deduce the expression for the MSE of the overall level
             difference ΔH between A and C, where:

                 ΔH = H_C - H_A = Σ_{i=1}^{s} h_i .

             [Figure 6.3: levelling line from A to C]

             The mathematical model in this case is

                 ΔH = h_1 + h_2 + ... + h_s .

             Hence:

                 S_{ΔH}² = (∂ΔH/∂h_1)² S_{h_1}² + (∂ΔH/∂h_2)² S_{h_2}² + ...

                         = (1)²(kℓ_1) + (1)²(kℓ_2) + ...

                         = Σ_{i=1}^{s} k ℓ_i = k Σ_{i=1}^{s} ℓ_i ,

             which means that the MSE of ΔH equals the constant of propor-
             tionality k multiplied by the total (overall) length of the
             levelling line A - C.

Let us consider the example 6.3 and assume that the errors in
a, b are uncorrelated, i.e. S_ab = 0, as we did in Example 6.4. Then we can
treat d and α separately (if we are interested in their individual MSE's
alone) and we get, by applying the law of propagation of errors:

    S_d² = (∂d/∂a)² S_a² + (∂d/∂b)² S_b² = (1/d²)(a² S_a² + b² S_b²) ,

    S_α² = (∂α/∂a)² S_a² + (∂α/∂b)² S_b² = b² S_a² + a² S_b² .

Note that the same results can be obtained from Example 6.3 immediately by
putting S_ab = 0.

On the other hand, if we are interested in the covariance S_dα
between the two derived quantities d and α, we have to apply the covariance
law (equation 6.15) and we will end up with

    S_dα = (ab/d)(S_a² + S_b²) ,

that is S_dα ≠ 0, and Σ_X (X = (d, α)) is not a diagonal matrix, even though
the Σ_L of the primary multisample is diagonal, i.e. S_ab = 0 (see the
results obtained in Example 6.4). This is a very important discovery and
should be taken into consideration when using the derived multisample
X = (d, α) for any further treatment, in which case we cannot assume that
d and α are uncorrelated any more and we must take the entire Σ_X into
account.

Example 6.7: Let us solve Example 6.2 again, but this time we will
             consider the primary multisample L = (ℓ_1, ℓ_2, ℓ_3) as
             uncorrelated and its Σ_L is:

"~L = rl3~ 300 0041 = diag (3, 3, 4).

From example 6. 2 we have:

=[~ ~3] X= [=~l


0
B l =d

Hence:

I
X = Bl: LBT )

L:x =[:
0

l
-:] 3

0
0

3
0

0
l

0
2

0 0 4 -3 0

=[3: l:l = r xl
82

8
x2xl
xh]
8

82
x2

which again verifies the fact that even when ~L is diagonal

the !'X is not.

On the other hand we can treat x 1 and x 2 separately by using

the law of propagation of errors (since L is uncorrelated)

to get 82 and 8 2 separately; for instance ,


xl x2
axl 2 2 ax ax
2 + (_1)2 82 + (_1)2 82
8
xl
= (a;-) 8 "R; a.Q.2 ,11_2 at 3 ,11_3
l l

= (1) 2 (3) + (o) 2 (3) + (-3) 2 (4)


=3 + 0 + 36 = 39,
which is the same value as we got by applying the covariance

law above.

Example 6.8: To determine the two sides AC = z and BC = y of the plane
             triangle shown in Figure 6.4, the length AB = x along with
             the two horizontal angles α and β were observed and their
             estimates were found to be:

                 x̄ = 10 m,   with S_x = 3 cm,
                 ᾱ = 90°,    with S_α = 2",
                 β̄ = 45°,    with S_β = 4",
                 S_αβ = -1 arc sec²,   and   S_xα = S_xβ = 0.

             [Figure 6.4: plane triangle ABC]

             It is required to compute the statistical estimates for
             y and z along with their associated variance-covariance
             matrix Σ_X in cm², where X = (y, z).

             First, we establish the mathematical model which relates the
             primary and derived samples, i.e.,

                 X = F(L) ,   where   L = (α, β, x) .

             From the sine law of the given triangle we get:

                 y / sin α = z / sin β = x / sin γ ;

             however, the angle γ is not observed, i.e. it is not an
             element of the primary sample; therefore we have to sub-
             stitute for it in terms of the observed quantities, say
             α and β, by putting

                 sin γ = sin (α + β) ,

             and we get:

                 y = x sin α / sin (α + β) ,

                 z = x sin β / sin (α + β) .

             By substituting for ᾱ, β̄ and x̄, we get

                 ŷ = 10 √2 = 10(1.414) = 14.14 m ,   ẑ = 10 m.

             Our mathematical model then can be written as:

                 X = | y(α, β, x) |  =  | x sin α / sin (α + β) |
                     | z(α, β, x) |     | x sin β / sin (α + β) | .

             To compute Σ_X = B Σ_L Bᵀ, we have to evaluate the matrix
             B, which is of the form

                 B = | ∂y/∂α   ∂y/∂β   ∂y/∂x |
                     | ∂z/∂α   ∂z/∂β   ∂z/∂x |

                   = |  z/sin(α + β)    -y/tan(α + β)    y/x |
                     | -z/tan(α + β)     y/sin(α + β)    z/x | .

             From the given data, the matrix Σ_L takes the form

                 Σ_L = | S_α²   S_αβ   S_αx |     |  4   -1   0 |
                       | S_βα   S_β²   S_βx |  =  | -1   16   0 |
                       | S_xα   S_xβ   S_x² |     |  0    0   9 | .

             (It is very important to maintain the same sequence of the
             elements of the primary sample in both matrices B and Σ_L
             to give a meaningful Σ_X.)

             Now matching the units of the individual elements of B and
             Σ_L, keeping in mind that Σ_X is required in cm², results in
             scaling the B matrix to

                 B = |  z(100)/(ρ" sin(α + β))    -y(100)/(ρ" tan(α + β))    y/x |
                     | -z(100)/(ρ" tan(α + β))     y(100)/(ρ" sin(α + β))    z/x | ,

             where ρ" = 206265 ≐ 2 · 10⁵ arc sec.

             Evaluating the elements of the above B matrix we get:

                 B = | 0.007   0.007   1.414 |
                     | 0.005   0.010   1.000 |

             and consequently

                 Σ_X = | 0.007   0.007   1.414 |  |  4  -1  0 |  | 0.007   0.005 |
                       | 0.005   0.010   1.000 |  | -1  16  0 |  | 0.007   0.010 |
                                                  |  0   0  9 |  | 1.414   1.000 |

                     = | 18.0009   12.7272 |  ≐  | 18   13 |  cm² ,
                       | 12.7272    9.0016 |     | 13    9 |

             and

                 S_y = √18 = 4.2 cm ,

                 S_z = √9  = 3 cm.

The results of the above example show that the high precision

in measuring the angles α and β has insignificant effect on the estimated

standard errors of the derived y and z lengths as compared to the effect of

the precision of the measured length x. Hence, one can use the error

propagation to detect the main deciding factors in the primary sample on

the accuracy of the derived quantities and decide on the needed accuracy

of the observations. This process is usually known as pre-analysis which

is done before taking any actual measurements by using very approximate

values for the observed quantities. This results in accepting specifications

concerning the observations techniques to achieve the required accuracy.

Some more details about it are given in section 6.3.5.
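The numbers of example 6.8 can be reproduced by evaluating the scaled B matrix and
applying the covariance law; the Python sketch below (ours) assumes angles in arc
seconds and distances in centimetres.

    import math

    x, alpha, beta = 1000.0, math.radians(90.0), math.radians(45.0)   # x in cm
    rho = 206265.0                                                     # arc sec per radian
    s = math.sin(alpha + beta)
    t = math.tan(alpha + beta)
    y = x * math.sin(alpha) / s
    z = x * math.sin(beta) / s

    B = [[ z / (rho * s), -y / (rho * t), y / x],      # dy/d(alpha, beta, x)
         [-z / (rho * t),  y / (rho * s), z / x]]      # dz/d(alpha, beta, x)
    Sigma_L = [[4.0, -1.0, 0.0],
               [-1.0, 16.0, 0.0],
               [0.0, 0.0, 9.0]]                        # arc sec^2 and cm^2

    BS = [[sum(B[i][k] * Sigma_L[k][j] for k in range(3)) for j in range(3)] for i in range(2)]
    Sigma_X = [[sum(BS[i][k] * B[j][k] for k in range(3)) for j in range(2)] for i in range(2)]
    print([[round(v, 2) for v in row] for row in Sigma_X])
    # [[18.0, 12.73], [12.73, 9.0]] cm^2, cf. 18.0009, 12.7272, 9.0016 above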



6.3.3 Propagation of Non-Random Errors, Propagation of Total Errors

The idea of being able to foretell the expected magnitude of

the MSE (as a measure of random errors) of a function of observations -

this is essentially what the law of propagation of errors is all about -

is often extended to non-random errors. These non-random errors are

sometimes called systematic errors, for which the law governing their

behaviour is not known. Hence, the values of such non-random errors

used in the subsequent development are rather hypothesized (postulated)

for the analysis and specification purposes.

The problem may be now stated as follows: let us have an

explicit mathematical model

    x = f(L) ,                                                               (6-19)

in which x is a single quantity, f is a single-valued function and
L = (ℓ_1, ℓ_2, ..., ℓ_s) is the vector of the different observed quantities
that are assumed to be uncorrelated. We are seeking to determine the
influence of small, non-random errors δℓ_i in each observation ℓ_i on the
result x. This influence will be denoted by δ_x.


The problem is readily solved using again the truncated
Taylor's series expansion, around the approximate values L⁰ = (ℓ_1⁰, ℓ_2⁰, ..., ℓ_s⁰),
from which we get:

    x = f(L⁰) + (∂f/∂L)|_{L = L⁰} (L - L⁰)

      = x⁰ + Σ_{i=1}^{s} (∂f/∂ℓ_i)|_{ℓ_i = ℓ_i⁰} (ℓ_i - ℓ_i⁰) .              (6-20)

By substituting δℓ_i for (ℓ_i - ℓ_i⁰) and δ_x for (x - x⁰) in equation (6-20)
we get:

    δ_x = Σ_{i=1}^{s} (∂f/∂ℓ_i) δℓ_i *) ,                                    (6-21)

which is the formula for the propagation of non-random errors.

Note that in formula (6-21), the signs of both the partial
derivatives (∂f/∂ℓ_i) and the non-random errors δℓ_i have to be considered.
(Compare this to formula (6-18).)

We may also ask what incertitude we can expect in x if the
observations ℓ_i are burdened with both random and non-random errors.
In such a case we define the total error as:

    T = √(δ² + S²) ,                                                         (6-22)

with δ being the non-random error and S being the MSE. Combining
the two errors in x as given above and using equations (6-18) and (6-21)
we get:

    T_x = √[ (Σ_{i=1}^{s} (∂f/∂ℓ_i) δℓ_i)² + Σ_{i=1}^{s} (∂f/∂ℓ_i)² S_{ℓ_i}² ]

        = √[ Σ_{i=1}^{s} (∂f/∂ℓ_i)² (δℓ_i² + S_{ℓ_i}²) + q ]

or

    T_x = √[ Σ_{i=1}^{s} (∂f/∂ℓ_i)² T_i² + q ] ,                             (6-23)

where q may be regarded as a kind of "covariance" between individual non-
random errors, and T_i is the total error in the observation ℓ_i.

*) For the validity of the Taylor's series expansion, we can see that
   the requirement of δℓ_i being small in comparison to ℓ_i is obviously
   essential.

As we mentioned in section 4.2, the non-random (systematic)
errors may be known or assumed functions of some parameters. In this
case their influence δ_x on x can be also expressed as a function of the
same parameters.

Example 6.9: Let us solve Example 6.2 again considering the primary
             multisample L = (ℓ_1, ℓ_2, ℓ_3) to be uncorrelated with variance-
             covariance matrix:

                 Σ_L = diag (3, 3, 4) ,

             and having also non-random (systematic) errors given as:

                 δℓ_1 = -1.5 ,   δℓ_2 = 2 ,   δℓ_3 = 0.5 ,

             in the same units as the given standard errors.

             It is required to compute the total error in the derived
             quantities x_1 and x_2 according to the mathematical model
             given in Example 6.2.

             The total errors are given by equation (6-22) as:

                 T_{x_1} = √(δ_{x_1}² + S_{x_1}²) ,   T_{x_2} = √(δ_{x_2}² + S_{x_2}²) .

             We have (cf. Example 6.7):

                 S_{x_1}² = Σ_{i=1}^{3} (∂x_1/∂ℓ_i)² S_{ℓ_i}² = 39 ,

                 S_{x_2}² = Σ_{i=1}^{3} (∂x_2/∂ℓ_i)² S_{ℓ_i}² = 15 .

             The influences δ_{x_1} and δ_{x_2} due to the given non-random errors
             in L are computed from equation (6-21) as follows:

                 δ_{x_1} = Σ_{i=1}^{3} (∂x_1/∂ℓ_i) δℓ_i
                         = (1)(-1.5) + (0)(2) + (-3)(0.5)
                         = -1.5 + 0 - 1.5 = -3 ,

                 δ_{x_2} = Σ_{i=1}^{3} (∂x_2/∂ℓ_i) δℓ_i
                         = (2)(-1.5) + (1)(2) + (0)(0.5)
                         = -3 + 2 + 0 = -1 .

             Hence, the required total errors will be:

                 T_{x_1} = √[(-3)² + 39] = √48 ≐ 6.9 ,

                 T_{x_2} = √[(-1)² + 15] = √16 = 4 .
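The random, non-random and total errors of example 6.9 follow from equations (6-18),
(6-21) and (6-22); the sketch below (ours, for illustration) computes all three.

    B = [[1, 0, -3],
         [2, 1,  0]]                  # partial derivatives, from example 6.2
    var_L = [3, 3, 4]                 # uncorrelated variances of l1, l2, l3
    delta_L = [-1.5, 2.0, 0.5]        # postulated non-random errors

    for row in B:
        s2 = sum(b * b * v for b, v in zip(row, var_L))       # random part, eq. (6-18)
        delta = sum(b * d for b, d in zip(row, delta_L))      # non-random part, eq. (6-21)
        total = (delta ** 2 + s2) ** 0.5                      # total error, eq. (6-22)
        print(s2, delta, round(total, 2))
    # 39  -3.0  6.93
    # 15  -1.0  4.0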

Example 6.10: Consider again Example 6.6. In addition to the given
              information, assume that each height difference h_i has got
              a non-random (systematic) error expressed as δ_{h_i} = k'h_i,
              where k' is another constant, a constant of proportionality
              between h_i and δ_{h_i}. Determine the total error in ΔH, where

                  ΔH = H_C - H_A = Σ_{i=1}^{s} h_i = h_1 + h_2 + ... + h_s .

              The total error in ΔH is given by:

                  T_{ΔH} = √(δ_{ΔH}² + S_{ΔH}²) .

              In Example 6.6, we found that:

                  S_{ΔH}² = k ℓ_AC ,

              where k was a constant and ℓ_AC = Σ_{i=1}^{s} ℓ_i is the entire length of
              the levelling line AC.

              We can now compute δ_{ΔH} as follows:

                  δ_{ΔH} = Σ_{i=1}^{s} (∂ΔH/∂h_i) δ_{h_i} ,

              where

                  ∂ΔH/∂h_1 = ∂ΔH/∂h_2 = ... = ∂ΔH/∂h_s = 1 ,
              and
                  δ_{h_i} = k'h_i .

              Then we get

                  δ_{ΔH} = Σ_{i=1}^{s} k'h_i = k' Σ_{i=1}^{s} h_i = k' ΔH .

              Finally, the expression for the total error in ΔH will be:

                  T_{ΔH} = √[(k' ΔH)² + k ℓ_AC] .

6.3.4 Truncation and Rounding

In any computation we have to represent the numbers we work with,
which may be either irrational like π, e, √2, or rational with very
many decimal places like 1/3, 5/11, etc., by rational numbers with a
fixed number of figures.

The representation can be made in basically two different
ways. We either truncate the original number after the required number

of figures or we round off the original number to the required length.
The first process can be mathematically described as:

    a ≐ a_T = Int (a·10ⁿ)/10ⁿ ,                                              (6-24)

where a is the original number assumed normalized*), n is the required
number of decimal places and Int stands for the integer value.

Example 6.11: π = 3.141592..., n = 3 and we get:

                  π ≐ π_T = Int (π·10³) · 10⁻³
                          = Int (3141.592...) · 10⁻³
                          = 3141 · 10⁻³
                          = 3.141 .

The second process, i.e. the rounding-off, can be described
by the formula:

    a ≐ a_R = Int (a·10ⁿ + 0.5)/10ⁿ ,                                        (6-25)

in which all terms are as described above.

Example 6.12: π, n = 3 and we get:

                  π ≐ π_R = Int (π·10³ + 0.5) · 10⁻³
                          = Int (3141.592 + 0.5) · 10⁻³
                          = Int (3142.092...) · 10⁻³
                          = 3142 · 10⁻³
                          = 3.142 .

It can be seen that the errors involved in the above two
alternative processes differ. Denoting the error in "a" due to

*) To normalize the number, say 3456.21, we write it in the form 3.45621 · 10³.

truncation by δ_{a_T} and the error due to rounding by δ_{a_R}, we get:

    δ_{a_T} = a - a_T ∈ [0, 10⁻ⁿ) ,

    δ_{a_R} = a - a_R ∈ [-0.5·10⁻ⁿ, 0.5·10⁻ⁿ) ,

and we may postulate that δ_{a_T} has a parent random variable distributed
according to the rectangular (uniform) PDF (see section 3.2.5):

    R(0.5·10⁻ⁿ, σ; δ_{a_T}) ,                                                (6-26)

while δ_{a_R} has parent PDF:

    R(0, σ; δ_{a_R}) ,                                                       (6-27)

as shown in Figure 6.5.

[Figure 6.5: PDF of rounding errors (centred at 0) and PDF of truncation
 errors (centred at 0.5·10⁻ⁿ)]

From example 3.17, section 3.2.5 we know that σ = q/√3, where q equals
half of the width of the R. In our case, obviously q = 0.5·10⁻ⁿ so
that σ = 0.289·10⁻ⁿ.

Because of their different means, the error in truncation
propagates according to the "total error law" and the errors in rounding
propagate according to the "random error law". Hence, if we have a
number x:

    x = f(L) ,                                                               (6-28)

where

    L = (ℓ_i) ,   i = 1, 2, ..., s,

is a set of s numbers to be either truncated or rounded off individually,
we can write the formulae for the errors in x due to truncation and
rounding errors in the individual ℓ_i's as follows:

    δ_{x_T} = √[ (Σ_{i=1}^{s} (∂f/∂ℓ_i)·0.5·10⁻ⁿ)² + Σ_{i=1}^{s} (∂f/∂ℓ_i)²·(1/12)·10⁻²ⁿ ] ,   (6-29)

    δ_{x_R} = √[ Σ_{i=1}^{s} (∂f/∂ℓ_i)²·(1/12)·10⁻²ⁿ ] .                                       (6-30)

This indicates clearly that the error in x due to the rounding process
is less than the corresponding error due to truncation; and this is
why we always prefer to work with rounding rather than truncation.

Example 6.13: Let us determine the expected error in the sum x of
              a thousand numbers a_i, x = Σ_{i=1}^{1000} a_i, if

              (i)  the individual values a_i were truncated to five decimal
                   places;

              (ii) the individual values a_i were rounded off to five
                   decimal places.

              Solution:

              (i) The error δ_{x_T} due to the truncation of the individual a_i is
                  computed from equation (6-29) as follows:

                      δ_{x_T} = √{[Σ_{i=1}^{1000} (∂x/∂a_i)(0.5·10⁻⁵)]²
                                  + Σ_{i=1}^{1000} (∂x/∂a_i)²·(1/12)·10⁻¹⁰}

                              = √{[0.5·10⁻⁵·10³]² + (1/12)·10⁻¹⁰·10³}

                              = √{(1/4)·10⁻⁴ + (1/12)·10⁻⁷}

                              = √{10⁻⁸ (2500 + 0.833)}

                              ≐ 0.005001 ≐ 0.005 .

              (ii) The error δ_{x_R} due to the rounding of the individual a_i is
                   computed from equation (6-30) as follows:

                      δ_{x_R} = √{Σ_{i=1}^{1000} (∂x/∂a_i)²·(1/12)·10⁻¹⁰}

                              = √{(1000)·(1/12)·10⁻¹⁰}

                              = √{10⁻⁸ (0.833)}

                              ≐ 0.000091 ,

              which is much smaller than the corresponding δ_{x_T}.
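The difference between the two processes can also be seen empirically; the following
sketch (ours, using arbitrarily generated numbers) truncates and rounds a thousand
values to five decimals and compares the resulting errors in the sum with the
predictions of example 6.13.

    import math, random

    random.seed(1)
    n_dec, N = 5, 1000
    a = [random.uniform(0, 1) for _ in range(N)]
    exact = sum(a)

    trunc = sum(math.floor(ai * 10**n_dec) / 10**n_dec for ai in a)          # eq. (6-24)
    rnd   = sum(math.floor(ai * 10**n_dec + 0.5) / 10**n_dec for ai in a)    # eq. (6-25)

    print(round(exact - trunc, 6))   # ~0.005   (predicted delta_xT)
    print(round(exact - rnd, 6))     # typically of the order of 0.0001 (predicted ~0.00009)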

6.3.5 Tolerance Limits, Specifications and Pre-analysis

Another important application of the propagation laws for

errors is the determination of specifications for a certain experiment

when the maximum tolerable errors of the results, which are usually

called tolerance limits, are known beforehand. Such process is known

as pre-analysis. The set-up of the specifications should therefore

result in the proper design of the experiment, i.e. the choice of

observation techniques, instrumentation, etc., to meet the permissible

tolerance limits.

The specifications for the elementary processes should account

for both the random and the inevitable non-random (systematic) errors.

This is, unfortunately, seldom the case in practice. It is usual to

require that the specifications are prescribed in such a way as to meet

the tolerance limits with the probability of approximately 0.99. If

we hence expect the random errors to have the parent Gaussian PDF,

the actual results should not have the total error, composed of the non-
random error δ and 2.5 to 3 times the RMS (which corresponds to a
probability of 99%), larger than the prescribed tolerance limits, i.e.

    T ≐ √(δ² + (3σ)²) .                                                      (6-31)

Example 6.14: Assume that we want to measure a distance D = 1000 m,
              with a relative error (see 4.10) not worse than 10⁻⁴, using a
              20 m tape which had been compared to the "standard" with a
              precision not better than 3σ ≤ 1 mm, i.e. tolerance limits
              of the comparison were ± 1 mm. Assume also that the whole
              length D is divided into 50 segments d_i, i = 1, 2, ..., 50,
              each of which is approximately 20 m. Providing that each
              segment d_i will be measured only twice, forward F_i and
              backward B_i, what differences can we tolerate (accept or
              permit) between the back and forth measurements of each
              segment?

Solution:

The tolerance limit in D, i.e. the permissible total error in D, is given by

    T_D = 1000 m · 10⁻⁴ = 0.10 m = 10 cm.

This total error T_D is given by

    T_D = √(δ_D² + (3σ_D)²) ,

where δ_D is the non-random (systematic) error in D, σ_D is
the random error in D and the factor 3 is used to get probability
> 99% according to the assumed Gaussian PDF. Knowing that

    D = Σ_{i=1}^{50} d_i ,   where   d_i = ½(F_i + B_i) ,

we get:

    δ_D = Σ_{i=1}^{50} (∂D/∂d_i) δ_{d_i} = Σ_{i=1}^{50} δ_{d_i} ,

where δ_{d_i} ≤ 1 mm is the error of each tape length from the comparison. Hence,

    δ_D ≤ Σ_{i=1}^{50} 1 mm = 50 mm = 5 cm.

Thus, we must require that:

    (3σ_D)² ≤ T_D² - δ_D² = (10)² - (5)² = 75 cm² ,
or
    σ_D² ≤ 75/9 = 8.33 cm²

in order to meet the specifications.

Denoting the MSE in the individual segments d_i by
σ_{d_i} = σ_d (all assumed equal) we get

    σ_D² = Σ_{i=1}^{50} σ_{d_i}² = 50 σ_d² ,

from which we obtain

    σ_d² ≤ 8.33/50 = 0.167 cm² .

Remembering that each segment d_i is given by:

    d_i = ½(F_i + B_i) ,

and denoting the MSE in either F_i or B_i (both assumed equal)
by σ we get:

    σ_{d_i}² = σ_d² = (∂d_i/∂F_i)² σ_F² + (∂d_i/∂B_i)² σ_B²

             = (½)² σ² + (½)² σ² = σ²/2 ,

and

    σ² ≤ 2σ_d² = 2(0.167) = 0.33 cm² .

Recalling that we want to know what differences between the
forth and back measurements we can tolerate, and denoting
such differences by Δ_i, we can write:

    Δ_i = F_i - B_i .

Then:

    σ_{Δ_i}² = σ_Δ² = (∂Δ_i/∂F_i)² σ_F² + (∂Δ_i/∂B_i)² σ_B²

             = (1)²σ² + (-1)²σ² = 2σ² .

Thus, we end up with the condition:

    σ_Δ² ≤ 2σ² = 2(0.33) = 0.667 cm²
or
    σ_Δ ≤ 0.816 ≐ 0.8 cm.

This means that if we postulate a parent Gaussian PDF for
the differences Δ, the above σ_Δ is required to be smaller than
or equal to the RMS of the underlying PDF. Consequently, the
specifications will be as follows: we should get 68% of the
differences Δ within ± σ_Δ, i.e. within ± 0.8 cm, and 95% of Δ
within ± 2σ_Δ, i.e. within ± 1.6 cm. These specifications
are looser than a man with experience in practice would expect.
It illustrates the fact that in practice the specifications are
very often unnecessarily too stringent.
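The chain of tolerances in example 6.14 is short enough to script; the sketch below
(ours, for illustration) carries the tolerance limit T_D down to the permissible
forward-backward difference per 20 m segment.

    D_tol_cm = 1000 * 100 * 1e-4        # T_D = 10 cm
    segments = 50
    delta_D = segments * 0.1            # 50 x 1 mm systematic tape error, in cm

    sigma_D2 = (D_tol_cm ** 2 - delta_D ** 2) / 9.0   # (3 sigma_D)^2 <= T_D^2 - delta_D^2
    sigma_d2 = sigma_D2 / segments                    # per-segment MSE
    sigma2 = 2 * sigma_d2                             # since d_i = (F_i + B_i)/2
    sigma_delta = (2 * sigma2) ** 0.5                 # since Delta_i = F_i - B_i

    print(round(sigma_D2, 2), round(sigma_d2, 3), round(sigma_delta, 2))
    # 8.33  0.167  0.82  -> tolerate ~0.8 cm (68%) and ~1.6 cm (95%)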



6.4 Problem of Adjustment

6.4.1 Formulation of the Problem

Let us resume now at the end of section 6.2 where we have

defined the proper problem of adjustment as the transition


"
(E, E1 ) + (X, EX) ( 6. 32)

for an overdetermined mathematical model

F(L, X) =0 • (6.33)

By "overdetermined" we mean that the known E contains too many components

to generally f.it the abbve model for whatever X we choose, i.e. yielding

infinite number of solutions X • The only way to satisfy the model ,

i.e. the prescribed relations,is to allow some of or all the E to change

slightly while solving for X. In other words, we have to regard E as


an approximate value of some other value L which yields a unique solution X

and seek the final value L together with X.

Denoting

L - E =v (6.34)
we may reformulate our mathematical model (6.33) as:

"
F(L, X) = F(L + V, X) =0 (6.35)
where V is called the vector of discrepancies.

Note that V plays here very much the same role as the v's

have played in section 4.8. From the mathematical point of view, there

is not much difference between V and v. However, from the philosophical

viewpoint, there is, because V represents a vector of discrepancies of


s different physical quantities (see also section 5.4), while v was a vector

of discrepancies of n observations of the same physical quantity. To

show the mathematical equivalence of these two we shall, in the next

section, treat the computation of a sample mean as an instructive adjust-

ment problem.

6.4.2 Mean of a Sample as an Instructive Adjustment Problem, Weights

Let us regard a random sample L = (ℓ_1, ℓ_2, ..., ℓ_n) of n observations
representing one physical quantity as an uncorrelated estimate of its mean.
Further, we shall denote by L̄ = (ℓ̄_1, ℓ̄_2, ..., ℓ̄_m) the definition set of L,
consisting of only the m distinctly different values of the ℓ's.  Let us seek
an estimate x̂ satisfying the mathematical model

    x = ℓ ,                                                            (6.36)

representing the identity transformation.  Evidently, the model is
overdetermined because the individual ℓ̄_j, j = 1, 2, ..., m, are different
from each other and cannot therefore all satisfy the model.  So, we
reformulate the model as

    ℓ̄_j + v_j = x ,    j = 1, 2, ..., m,                               (6.37)

where the v's are the discrepancies.  We have to point out that, although we
seek now the same result as we sought in section 4.7, the formulation here is
slightly different, to enable us to use analogies later on.  While we took all
n observations into account in section 4.7, we shall now work only with the m
distinctly different values ℓ̄_j, j = 1, 2, ..., m, that constitute the
sample L̄.*


Thus we shall have to compute the mean x̄ from the second formula introduced in
section 3.1.3 (equation (3.4)), i.e.

    x̄ = Σ_{j=1}^m ℓ̄_j P(ℓ̄_j) = Σ_{j=1}^m ℓ̄_j P_j ,                     (6.38)

rather than the first (equation (3.3)) as used in section 4.7.  Here,
according to section 3.1.3, P_j = c_j/n, with c_j being the count of the same
value ℓ̄_j in the original sample L containing all n observations.  Hence the
P_j are the experimental (actual) probabilities.  In other words, if we wish
x̂ to equal x̄, the model (6.37) yields the following solution:

    x̂ = Σ_{j=1}^m P_j ℓ̄_j ,                                            (6.39)

or

    x̂ = P^T L̄                                                          (6.40)

in vector notation.

The coefficients P_j are called weight coefficients, or simply weights, and x̂
is called the weighted mean, an analogy borrowed from mechanics (see section
3.1.3).  Note that, with the weights being nothing else but the experimental
probabilities, we put "more weight" on the values of which we are more
"certain", i.e. which are repeated more often in the sample, which is
intuitively pleasing.

* L̄ = (ℓ̄_1, ℓ̄_2, ..., ℓ̄_m) can be regarded in this context as a sample of
"grouped" observations, i.e. each constituent ℓ̄_j, j = 1, 2, ..., m, has a
count (frequency) c_j associated with it in the original sample L.

In our slightly different notation even the least-squares principle, as
formulated in section 5.3, sustains a minor change.  While we were seeking
such ℓ° as to make

    (1/n) Σ_{i=1}^n v_i² = (1/n) Σ_{i=1}^n (ℓ_i − ℓ°)²                 (6.41)

a minimum, we now have to write the condition of minimum variance as

    min_{ℓ°∈R} [ Σ_{j=1}^m P_j v_j² ] ,                                (6.42)

where v_j = ℓ̄_j − ℓ°.  In matrix notation, (6.42) becomes

    min_{ℓ°∈R} (V^T P V) ,                                             (6.43)

where P is a diagonal matrix, i.e.

    P = diag (P_1, P_2, ..., P_m) .                                    (6.44)

The latter formulation, i.e. equations (6.42) and (6.43), is more general,
since we can regard the former formulation, i.e. equation (6.41), as a special
case of (6.42) and not vice versa.  We have

    (1/n) Σ_{i=1}^n v_i² = Σ_{i=1}^n P_i v_i² ,

which implies that P_i = 1/n, for i = 1, 2, ..., n, are the equal weights of
all the observations ℓ_i.  Hence we shall use (6.43) exclusively from now on.
The same holds true even for the two formulae for x̄, of which we shall use
equation (6.40).

Note that once we apply the condition (6.43), the discrepancies cease to be
variable quantities and become residuals (see 4.8).  We shall denote these
residuals by v̂.

Equation (3.7) can now obviously be written as

    S²_L = Σ_{j=1}^m P_j v̂_j²                                          (6.45)

or, in matrix notation,

    S²_L = V̂^T P V̂ .                                                   (6.46)

Consequently, we shall restate the least-squares principle as follows: the
value x̂ that makes the value of the quadratic form V^T P V the least ensures
automatically the minimum variance of the sample L.  This property does not
depend on any specific underlying PDF.  If L has got a normal parent PDF (or
any symmetric distribution), x̂ is the most probable estimate of x, which is
sometimes called the maximum likelihood estimate of x.
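The grouped-observations view of equations (6.39), (6.40) and (6.45) can be
illustrated with a short Python sketch (added here for illustration only and
not part of the original notes; the sample values are hypothetical):

```python
from collections import Counter

sample = [5.32, 5.36, 5.32, 5.35, 5.36, 5.32]      # hypothetical observations

n = len(sample)
counts = Counter(sample)                            # counts c_j of the distinct values
values = list(counts)                               # the grouped sample L-bar
weights = [counts[v] / n for v in values]           # P_j = c_j / n, summing to 1

x_hat = sum(p * v for p, v in zip(weights, values))              # (6.39)/(6.40)
s2_L = sum(p * (v - x_hat)**2 for p, v in zip(weights, values))  # (6.45)

print(x_hat, s2_L)   # identical to the ordinary mean and (1/n)-variance of `sample`
```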

6.4.3 Variance of the Sample Mean

We have shown that the simple problem of finding the mean of a sample can be
regarded as a trivial adjustment problem.  Hence we are entitled to ask the
question: what will be the variance-covariance matrix of the result, as
derived from the variance-covariance matrix of the original sample?  In other
words, we may ask what value of variance can be associated with the result,
i.e. with the mean of the sample.

The question is easily answered using the covariance law (section 6.3.1).  We
have established that (equation (6.40))

    x̂ = P^T L̄ .

Hence, by applying the covariance law (equation (6.15)) we obtain

    Σ_x̂ = B Σ_L̄ B^T = P^T Σ_L̄ P = S²_x̂ ,

i.e.

    S²_x̂ = P^T Σ_L̄ P .                                                 (6.47)

Here Σ_L̄ is not yet defined.  All we know is that L̄ = (ℓ̄_1, ℓ̄_2, ..., ℓ̄_m) is
a sample of "grouped" observations ℓ̄_j with different weights (observed
probabilities) P_j associated with them.  Let us hence assume these
observations uncorrelated, and let us also assume that there can be some
"variances" S²_ℓ̄j attributed to these observations.  In such a case, the
variance-covariance matrix of L̄ can be expressed as

    Σ_L̄ = diag (S²_ℓ̄1, S²_ℓ̄2, ..., S²_ℓ̄m) .                            (6.48)

Substituting (6.48) into (6.47), we get

    S²_x̂ = Σ_{j=1}^m P_j² S²_ℓ̄j .*                                     (6.49)

On the other hand, the value of x̂ (i.e. the sample mean) can be computed using
the original sample of observations, L = (ℓ_1, ℓ_2, ..., ℓ_n), i.e. the
ungrouped observations ℓ_i, i = 1, 2, ..., n, which all have equal
experimental probabilities (equal weights) of 1/n, yielding

    x̂ = (1/n) Σ_{i=1}^n ℓ_i .                                          (6.50)

Hence we can compute the variance of the mean, S²_x̂, again by applying the law
of propagation of errors on (6.50), and we get

    S²_x̂ = Σ_{i=1}^n (1/n)² S²_ℓi ,                                    (6.51)

* It should be noted here that since L̄ = (ℓ̄_1, ℓ̄_2, ..., ℓ̄_m) is a sample of
grouped observations, for which a different weight P_j (experimental
probability) is associated with each element ℓ̄_j, j = 1, 2, ..., m, the
individual variances S²_ℓ̄j assigned to the ℓ̄_j are, in general, different from
each other, i.e. they vary with the groups of observations.

in which all the variances S²_ℓi are again assumed to have the same value,
equal to the sample variance S²_L given by

    S²_L = (1/n) Σ_{i=1}^n (ℓ_i − x̂)² .                                (6.52)

Equation (6.51) then gives

    S²_x̂ = S²_L / n ,                                                  (6.53)

which indicates that the variance of the sample mean equals the variance of
the sample, computed from equation (6.52), divided by the total number of
elements of the sample.*

We have thus ended up with two different formulae, (6.49) and (6.53), for the
same value S²_x̂.  In the first approach, we have regarded the individual
observations (really groups of observations having the same value) as having
different variances S²_ℓ̄j associated with them.  The second approach assumes
that all the observations belong to the same sample with variance S²_L.
Numerically, we should get the same value of S²_x̂ from both formulae, hence

    Σ_{j=1}^m (P_j² S²_ℓ̄j) = S²_L / n .                                (6.54)

Let us write the left-hand side of (6.54) in the form

    Σ_{j=1}^m (P_j² S²_ℓ̄j) = Σ_{j=1}^m [P_j (P_j S²_ℓ̄j)]

and the right-hand side in the form

    S²_L / n = Σ_{i=1}^n (1/n)(S²_L / n) .

Using the same manipulation as in section 6.4.2 when dealing with the v's and
ℓ's, and also earlier, in section 3.1.4 when proving equation (3.4), the
right-hand side can be rewritten as

    Σ_{i=1}^n (1/n)(S²_L / n) = Σ_{j=1}^m [P_j (S²_L / n)] ,

in which P_j has the same meaning as in (6.49).  Now, the condition (6.54)
becomes

    Σ_{j=1}^m [P_j (P_j S²_ℓ̄j)] = Σ_{j=1}^m [P_j (S²_L / n)] ,          (6.55)

which can be satisfied by requiring that

    P_j S²_ℓ̄j = K ,    j = 1, 2, ..., m,                               (6.56)

where K is a constant value for a specific sample that equals the variance of
the sample mean.  From (6.56) we get

    S²_ℓ̄j = K / P_j ,    j = 1, 2, ..., m,                             (6.57)

which shows that, in order to get the correct result from (6.49), we have to
assume in the first approach that the individual observations have variances
inversely proportional to their weights.

This result is usually expressed in the form of the following principle: the
weight of an observation is inversely proportional to its variance, i.e.

    P_j = K / S²_ℓ̄j .                                                  (6.58)

Using equation (6.57) we can also write

    P_j S²_ℓ̄j = 1 · S_0² = K ,                                         (6.59)

where S_0², constant for a specific sample, is known as the variance of unit
weight.  It can be interpreted as the variance of an imaginary observation
whose weight equals one.  In the case of the sample mean x̄, S_0² equals S²_x̂.
From equations (6.46) and (6.53) we can write

    S²_x̂ = (1/n) V̂^T P V̂ .                                             (6.60)

This result will often be referred to in the subsequent development.

We have to point out that the whole argument in this section hinges on the
acceptance of the "variances" S²_ℓ̄j and S²_ℓi.  They have been introduced
solely for the purpose of deriving the formulae (6.53) and (6.58) that are
consistent with the rest of the adjustment calculus.  The more rigorous
alternative is to accept the two formulae by definition.

* In terms of our previous notation, we can write the variance of the sample
mean as S²_x̄ = S²_L / n .
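The consistency requirement (6.54), and the resulting rule (6.57) that the
assumed variances must be inversely proportional to the weights, can be
checked numerically with a small sketch (illustrative only; the data are
hypothetical):

```python
from collections import Counter

sample = [10.1, 10.3, 10.1, 10.2, 10.3, 10.1, 10.2, 10.2]   # hypothetical data
n = len(sample)

x_bar = sum(sample) / n
s2_L = sum((x - x_bar)**2 for x in sample) / n     # (6.52), divisor n
s2_mean = s2_L / n                                  # (6.53)

P = {v: c / n for v, c in Counter(sample).items()}  # weights P_j = c_j / n
K = s2_mean                                         # variance of unit weight (6.59)
s2_group = {v: K / P[v] for v in P}                 # (6.57): variances ~ 1/weights

# (6.49): variance of the mean computed from the grouped observations
s2_mean_grouped = sum(P[v]**2 * s2_group[v] for v in P)

print(abs(s2_mean - s2_mean_grouped) < 1e-12)       # True
```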

6.4.4 Variance Covariance Matrix of the Mean of a Multisample

We have seen in section 6.4.3 that the mean x̄ of a sample L has a standard
deviation S_x̄ associated with it.  This standard deviation is √n times smaller
than the standard deviation S_L of the sample itself and can be interpreted as
a measure of the confidence we have in the correctness of the mean x̄.
Evidently, our confidence increases with the number of observations.

We can now ask ourselves the following question: does the mean L̄ of a
multisample L also have a variance-covariance matrix associated with it?  The
answer is that there is nothing to prevent us from defining it by generalising
the discovery from the last section.  We get

          | S²_ℓ̄1     S_ℓ̄1ℓ̄2   ...   S_ℓ̄1ℓ̄s |
    Σ_L̄ = | S_ℓ̄2ℓ̄1   S²_ℓ̄2    ...   S_ℓ̄2ℓ̄s |                           (6.61)
          |   ...                     ...   |
          | S_ℓ̄sℓ̄1   S_ℓ̄sℓ̄2   ...   S²_ℓ̄s  |

where

    S²_ℓ̄i = (1/n_i) S²_ℓi

and

    S_ℓ̄iℓ̄j = (1/n_i) S_ℓiℓj .

Here we have to require again that n_i = n_j, i.e. that both components of the
multisample have the same number of elements (see section 3.3.5).  Obviously,
if this requirement is satisfied for all the pairs of components, we have

    n_1 = n_2 = ... = n_s = n

and

    Σ_L̄ = (1/n) Σ_L .                                                  (6.62)

By analogy, the variance-covariance matrix obtained via the covariance law
(see section 6.3.1) from the variance-covariance matrix of the mean of the
multisample is associated with the mean of the derived multisample, or
statistical estimate X̂.  We say that

    Σ_X̂ = B Σ_L̄ B^T                                                    (6.63)

is the variance-covariance matrix of the statistical estimate X̂, i.e. of the
solution of the uniquely determined mathematical model

    X = F(L) .

Similar statements can be made for the other laws of propagation of errors.
The development of these is left to the student, who should also compare the
results of this section with the solution of Example 6.14.

Example 6.15: Let us take again the experiment described in Examples 6.1, 6.3
and 6.4.  This time we shall be interested in deriving the variance-covariance
matrix Σ_X̂ of the solution vector X̂.

Solution:  First we evaluate Σ_L̄ from equation (6.61).  We obtain

    S²_ā = (1/5) S²_a = 0.004/5 cm²  = 0.0008 cm² ,
    S²_b̄ = (1/5) S²_b = 0.0056/5 cm² = 0.0011 cm² .

Since S_ab = 0 we get

    Σ_L̄ = | 0.0008     0    | cm²  =  (1/5) Σ_L .
          |   0      0.0011 |

Now Σ_X̂ can be evaluated from equation (6.63), and we have

    Σ_X̂ = B Σ_L̄ B^T ,

or

    Σ_X̂ = | 0.00081 cm²    0.01079 cm³  |
          | 0.01079 cm³   21.51254 cm⁴  | .

Thus the standard deviations of the estimates d̂ and â are given by

    √(0.00081 cm²) ≈ 0.028 cm ,
    √(21.51254 cm⁴) ≈ 4.64 cm² .

6.4.5 The Method of Least-Squares, Weight Matrix

The least-squares principle, as applied to the trivial identity
transformation, i.e. the sample mean, can be generalized to other mathematical
models.  Taking the general formulation of the problem of adjustment as
described in section 6.4.1, i.e.

    F(L̄ + V, X) = 0 ,

we can again ask for such X as would make the value of the quadratic form of
the weighted discrepancies, V^T P V, a minimum, i.e.

    min_{X∈R^u} (V^T P V) .                                            (6.64)

The condition (6.64) is, for the majority of mathematical models, enough to
specify such X = X̂ uniquely.  The approach to adjustment using this condition
became known as the method of least-squares.

The question remains how to choose the matrix P.  In the case of the sample
mean we have used

    P = diag (K/S²_ℓ̄1, K/S²_ℓ̄2, ..., K/S²_ℓ̄m) ,

that is

    P = K diag (1/S²_ℓ̄1, 1/S²_ℓ̄2, ..., 1/S²_ℓ̄m) .

Using the notation developed for the multisample, this can be rewritten as

    P = K Σ_L̄⁻¹ ,                                                      (6.65)

which indicates that the matrix P is obtained by multiplying the inverse of
the variance-covariance matrix of the means of the observations by the
constant K (here equal to S²_x̂).  In our case this is a diagonal matrix, since
we have postulated the sample L to be uncorrelated.

We again notice that, mathematically, there is not much difference between a
sample and a multisample; they can hence be treated in much the same way.
Thus, there is no basic difference between the apparently trivial adjustment
of the sample mean and the general problem of adjustment.  The only difference
is that in the first case X is a vector of one component, while generally it
may have many components.

This gives rise to the question of what the role of K would be in the
least-squares method, where X has several constituents (K having been a scalar
equal to S²_x̂ in the adjustment of the mean of a sample).  Let us just say at
this time that we usually compute the weight matrix P, as it is called in the
method of least-squares, as

    P = K Σ_L̄⁻¹ ,                                                      (6.66)

where K is an arbitrarily chosen constant, the meaning of which will be shown
later.  This can be done because, as will also be shown later, the solution X̂
is independent of K, since K does not change the ratios between the weights or
variances of the individual observations.
In this course we shall be dealing with only two particular

mathematical models which are the most frequently encountered in practice.

In these models, we shall use the following notation:

n for the number of constituents of the primary or original multisample L;

u for the number of constituents of the derived, or unknown (to be derived),
multisample X;

r for the number of independent equations (relationships) that can be
formulated between the constituents of L and X.

Moreover, we shall consider these models to be linear.

The first model is

    A X = L ,                                                          (6.67)

in which A is an n by u matrix, X is a u by 1 vector and L is an n by 1 vector
(n = r > u).  The adjustment of this model is usually called parametric
adjustment, adjustment of observation equations, or adjustment of indirect
observations, etc.

The second model is

    B L = C ,                                                          (6.68)

in which B is an r by n matrix, L is an n by 1 vector and C is an r by 1
vector (r < n).

The adjustment of this model is known as conditional adjustment, adjustment

of condition equations, etc.

The two mathematical models are evidently quite special since

they are both linear. Fortunately many problems in practice, although

non-linear by nature, can be linearized. This is the reason why the two

treated models are important.

6.4.6 Parametric Adjustment

In this section we are going to deal with the adjustment of the linear model
(6.67), i.e.

    A X + C = L ,    (n > u),                                          (6.69)

which, for the adjustment, will be reformulated as

    A X − (L̄ + V) = 0 ,

or

    V = A X − L̄ .*                                                     (6.70)

Here A is called the design matrix, X is the vector of unknown parameters, L̄
is the vector of observations (L̄ = L* − C, where L* is the mean of the
observed multisample), and V is the vector of discrepancies, which is also
unknown.  The formulation (6.70) is known as a set of observation equations.

------------------------------------------------------------------
* If we have a non-linear model L = F(X), it can be easily linearized by a
Taylor series expansion, i.e.

    L = F(X°) + (∂F/∂X)|_{X=X°} (X − X°) + ··· ,

in which we neglect the higher order terms.  Putting ΔX for X − X°, ΔL for
L − F(X°), and A (a matrix) for (∂F/∂X)|_{X=X°}, we get

    ΔL ≈ A ΔX .

This is essentially the same form as equation (6.69).  However, in this case
we are solving for the corrections ΔX to the approximate value X° of the
vector X, instead of solving for X itself.
------------------------------------------------------------------

We wish to get such X = X̂ as would minimize the quadratic form V^T P V, in
which P is the assumed weight matrix of the observations L̄ (see the previous
section).  This quadratic form, which is sometimes called the quadratic form
of weighted discrepancies, can be rewritten using the observation equations
(6.70) as

    V^T P V = (A X − L̄)^T P (A X − L̄)
            = ((A X)^T − L̄^T) (P A X − P L̄)
            = X^T A^T P A X − L̄^T P A X − X^T A^T P L̄ + L̄^T P L̄ .      (6.71)

From equation (6.66) we have P = K Σ_L̄⁻¹, where K is a constant scalar and Σ_L̄
is the variance-covariance matrix of L̄.  Since Σ_L̄ is symmetric, the weight
matrix P is symmetric as well, and P^T = P.  We can thus write

    X^T A^T P L̄ = L̄^T P A X ,                                          (6.72)

since it is a scalar quantity.  Substituting (6.72) into (6.71) we get

    V^T P V = X^T A^T P A X − 2 L̄^T P A X + L̄^T P L̄ .                  (6.73)

The quadratic function (6.73), sometimes called the variations function, is to
be minimized with respect to X.  This is accomplished by equating all the
partial derivatives to zero, i.e.

    ∂(V^T P V)/∂X_i = 0 ,    i = 1, 2, ..., u,                         (6.74)

and we obtain, writing ∂/∂X for the whole vector of partial derivatives
∂/∂X_i,*

    2 X^T A^T P A − 2 L̄^T P A = 0 ,

which can be rewritten as

    X^T (A^T P A) = L̄^T P A ,

or, by taking the transpose of both sides,

    (A^T P A) X = A^T P L̄ .+                                           (6.75)

This system of linear equations is called the system of normal equations,
which can be written, as often used in the literature, in the following
abbreviated form:

    N X̂ = U ,                                                          (6.76)

where N = A^T P A is known as the matrix of coefficients of the normal
equations, or simply the normal equation matrix, and U = A^T P L̄ is the vector
of absolute terms of the normal equations.

The system of normal equations (6.76) has a solution X̂ given by

    X̂ = N⁻¹ U = (A^T P A)⁻¹ (A^T P L̄) ,                                (6.77)

if the normal equation matrix N = A^T P A has an inverse.  Note that N is a
symmetric, positive-definite matrix.**

------------------------------------------------------------------
* From matrix algebra we know that if A is a symmetric matrix and X is a
vector, then ∂(X^T A X)/∂X = 2 X^T A .

+ Note that the normal equations can be obtained directly from the
mathematical model by pre-multiplying it by A^T P .

** A matrix, say N, is positive definite if the value of the quadratic form
Y^T N Y is positive for any non-zero vector Y (of the appropriate dimension).
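The whole chain (6.70) through (6.77) can be condensed into a few lines of
numpy; the following routine is a sketch added for illustration (the function
name and arguments are ours, not part of the notes):

```python
import numpy as np

def parametric_adjustment(A, L_bar, Sigma_L, K=1.0):
    """Parametric least squares: returns X_hat, residuals V_hat, adjusted L_hat."""
    P = K * np.linalg.inv(Sigma_L)     # weight matrix (6.66)
    N = A.T @ P @ A                    # normal equation matrix (6.76)
    U = A.T @ P @ L_bar                # vector of absolute terms
    X_hat = np.linalg.solve(N, U)      # solution (6.77)
    V_hat = A @ X_hat - L_bar          # residuals, from (6.70)
    L_hat = L_bar + V_hat              # adjusted observations
    return X_hat, V_hat, L_hat
```

Because P enters both N and U, any rescaling of K cancels out of X_hat, in
agreement with equation (6.79) below.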

To discuss the influence of the weight matrix P on the solution vector X̂, let
us use a different weight matrix, say P', such that

    P' = γ P ,                                                         (6.78)

where γ is an arbitrary constant.  Substituting (6.78) into (6.77) we get

    X̂' = (A^T P' A)⁻¹ (A^T P' L̄)
       = (A^T γ P A)⁻¹ (A^T γ P L̄)                                     (6.79)
       = (1/γ) (A^T P A)⁻¹ γ (A^T P L̄)
       = X̂ .

This result indicates that the factor K in equation (6.66) for computing the
weight matrix P from Σ_L̄ can be chosen arbitrarily without any influence on X̂,
which verifies the statement we made earlier, in section 6.4.5.

It should be noted that the vector of discrepancies V, as defined in (6.70),
becomes after the minimization the vector of residuals (see 4.8) of the
observed quantities.  As such, it should again be denoted by a different
symbol, say R, to show that it is no longer a vector of variables (a function
of X) but a vector of fixed quantities.  Some authors use V̂ for this purpose,
and this is the convention we are going to use (see also 6.4.2).  The values
v̂_i are computed directly from equation (6.70), in the same units as those of
the vector L̄.  The adjusted observations are then given by

    L̂ = L̄ + V̂ .

We should keep in mind that one of the main features of the parametric method
of adjustment is that the estimate of the vector of unknown parameters, i.e.
X̂, is a direct result of this adjustment, as given by equation (6.77).

At this stage, it is worthwhile going back to the trivial problem of
adjustment, the sample mean.  According to equation (6.79), we can choose the
weights of the individual observations to be inversely proportional to their
respective variances, with an arbitrary constant K of proportionality.  This
indicates that the weights do not have to equal the experimental
probabilities, for which Σ_{i=1}^n P_i = 1, as we required in sections 6.4.2
and 6.4.3.  In this case, the observation equations will be

    x̂ = ℓ_1 + v_1 ,    with weight P_1 ,
    x̂ = ℓ_2 + v_2 ,    with weight P_2 ,
     ...
    x̂ = ℓ_n + v_n ,    with weight P_n ,

or, in matrix form,

    A x̂ = L̄ + V ,

where

    A = (1, 1, ..., 1)^T ,    L̄ = (ℓ_1, ℓ_2, ..., ℓ_n)^T ,

with weight matrix P = diag (P_1, P_2, ..., P_n).  Substituting in equation
(6.77) we get the solution, i.e. the weighted mean of the sample, as

    x̂ = ( Σ_{i=1}^n P_i ℓ_i ) / ( Σ_{i=1}^n P_i ) ,                    (6.80)

which agrees with the result in section 6.4.2 when Σ_{i=1}^n P_i equals one.
Formula (6.80) is the general formula used to compute the weighted mean of a
sample of weighted observations.
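As a quick check of (6.80), the parametric_adjustment sketch given after
equation (6.77) can be applied to this trivial model (the numbers below are
hypothetical):

```python
import numpy as np

obs = np.array([12.34, 12.36, 12.33, 12.35])    # hypothetical observations
p   = np.array([1.0, 2.0, 1.0, 4.0])            # relative weights, not summing to 1

A = np.ones((len(obs), 1))                      # design matrix of the model x = l
Sigma_L = np.diag(1.0 / p)                      # variances ~ 1/weights (K = 1)

X_hat, V_hat, L_hat = parametric_adjustment(A, obs, Sigma_L)
print(X_hat.item())                             # equals sum(p*obs)/sum(p), eq. (6.80)
```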

Example 6.16: Let us have a levelling line connecting two junction points, G
and J, the elevations of which, H_G and H_J, are known.  The levelling line is
divided into three sections, d_1, d_2 and d_3 long (Figure 6.6).  Each level
difference h_1, h_2 and h_3 was observed, with results h̄_1, h̄_2 and h̄_3.  The
observations h̄_i are considered uncorrelated, with variances proportional to
the corresponding lengths d_i, i = 1, 2, 3.  It is required to determine the
adjusted values of the elevations of points 1 and 2, i.e. H_1 and H_2
respectively, using the parametric adjustment.

Figure 6.6 (levelling line G - 1 - 2 - J)

Solution:  From the given data we have: number of observations n = 3; number
of unknowns u = 2.  Therefore we have one redundant observation.  The
independent relationships between the observations and the unknowns are
written as follows (each relation corresponds to one observation):

    H_1 = h_1 + H_G ,
    −H_1 + H_2 = h_2 ,
    −H_2 = h_3 − H_J .

The above relations can be rewritten in the general form used in the previous
development,

    A X = L ,    (A is 3 by 2, X is 2 by 1, L is 3 by 1),

where X = (H_1, H_2)^T and

    L_1 = h_1 + H_G ,    L_2 = h_2 ,    L_3 = h_3 − H_J .

Putting this in matrix form, we get

    |  1   0 |  | H_1 |     | L_1 |
    | −1   1 |  | H_2 |  =  | L_2 |
    |  0  −1 |              | L_3 | .

The corresponding set of observation equations is

     H_1         =  H_G + (h̄_1 + v_1) ,
    −H_1 + H_2   =        (h̄_2 + v_2) ,
    −H_2         = −H_J + (h̄_3 + v_3) .

These observation equations can be written in matrix form as

    V = A X − L̄ ,

where

    V = (v_1, v_2, v_3)^T ,    X = (H_1, H_2)^T ,
    L̄ = (h̄_1 + H_G,  h̄_2,  h̄_3 − H_J)^T ,

and the design matrix A is given by

        |  1   0 |
    A = | −1   1 |
        |  0  −1 | .

We assumed that the observed values h̄_1, h̄_2 and h̄_3 are uncorrelated.  We
will also assume that H_G and H_J are errorless.  Hence

    Σ_L̄ = diag (S²_h̄1, S²_h̄2, S²_h̄3) .

But S²_h̄i is proportional to d_i, i = 1, 2, 3; thus

    Σ_L̄ = diag (d_1, d_2, d_3) .

Further, we choose K = 1 and we get

    P = K Σ_L̄⁻¹ = diag (1/d_1, 1/d_2, 1/d_3) .

Applying the method of least-squares, the normal equations are

    N X̂ = U ,    (N is 2 by 2, X̂ and U are 2 by 1),

where

    N = A^T P A = | 1/d_1 + 1/d_2       −1/d_2      |
                  |     −1/d_2      1/d_2 + 1/d_3   |

and

    U = A^T P L̄ = | (h̄_1 + H_G)/d_1 − h̄_2/d_2   |
                  | h̄_2/d_2 − (h̄_3 − H_J)/d_3   | .

The solution X̂ is given by

    X̂ = N⁻¹ U ,

where

    N⁻¹ = (d_1 d_2 d_3)/(d_1 + d_2 + d_3) · | 1/d_2 + 1/d_3       1/d_2      |
                                            |     1/d_2       1/d_1 + 1/d_2  | .

Performing the multiplication N⁻¹ U and denoting ΔH = H_J − H_G, we obtain

    Ĥ_1 = H_G + h̄_1 + (d_1 / Σ_i d_i) (ΔH − Σ_i h̄_i) ,
    Ĥ_2 = H_J − h̄_3 − (d_3 / Σ_i d_i) (ΔH − Σ_i h̄_i) .

Now we compute the residuals v̂_i from the equation V̂ = A X̂ − L̄ and find

    v̂_i = (d_i / Σ_j d_j) (ΔH − Σ_j h̄_j) ,    i = 1, 2, 3.

Finally, we compute the adjusted observations from

    L̂ = L̄ + V̂ .

Remembering that H_G and H_J are assumed errorless, we get

    ĥ_i = h̄_i + v̂_i ,    i = 1, 2, 3.

Example 6.17: A local levelling network composed of 6 sections, shown in
Figure 6.7, was observed.  Note that the arrow heads indicate the direction of
increasing elevation.  The following table summarizes the observed differences
in heights h̄_i along with the corresponding length ℓ_i of each section.

    Section      Stations         h̄_i       length ℓ_i
      No.      from     to        (m)          (km)
       1        a        c        6.16           4
       2        a        d       12.57           2
       3        c        d        6.41           2
       4        a        b        1.09           4
       5        b        d       11.58           2
       6        b        c        5.07           4

Figure 6.7 (levelling network a, b, c, d)

Assume that the variances S²_h̄i, i = 1, 2, ..., 6, are proportional to the
corresponding lengths ℓ_i.  The elevation H_a of station a is considered to be
0 metres.  It is required to adjust this levelling net by the parametric
method of adjustment and to deduce the least-squares estimates Ĥ_b, Ĥ_c and
Ĥ_d for the elevations H_b, H_c and H_d of the points b, c and d.

Solution:  From the given data we have: number of independent observations
n = 6, number of unknowns u = 3.  Hence we have 3 redundant observations, i.e.
3 degrees of freedom.

Our mathematical model in this case is linear, i.e.

    A X = L ,    (A is 6 by 3, X is 3 by 1, L is 6 by 1),

where X = (H_b, H_c, H_d)^T.  The 6 independent observation equations will be
(one equation for each observed quantity):

    h̄_1 + v_1 = H_c − H_a = H_c − 0.0 = H_c ,
    h̄_2 + v_2 = H_d − H_a = H_d − 0.0 = H_d ,
    h̄_3 + v_3 = H_d − H_c ,
    h̄_4 + v_4 = H_b − H_a = H_b − 0.0 = H_b ,
    h̄_5 + v_5 = H_d − H_b ,
    h̄_6 + v_6 = H_c − H_b .

The above set of equations can be rewritten in the following form, after
substituting the values of h̄_i:

    v_1 =         H_c        −  6.16 ,
    v_2 =                H_d − 12.57 ,
    v_3 =       − H_c +  H_d −  6.41 ,
    v_4 =  H_b               −  1.09 ,
    v_5 = −H_b        +  H_d − 11.58 ,
    v_6 = −H_b +  H_c        −  5.07 .

In matrix form we can write

    V = A X − L̄ ,

where

    V = (v_1, ..., v_6)^T ,    X = (H_b, H_c, H_d)^T ,
    L̄ = (6.16, 12.57, 6.41, 1.09, 11.58, 5.07)^T m ,

and the design matrix A is

        |  0   1   0 |
        |  0   0   1 |
    A = |  0  −1   1 |
        |  1   0   0 |
        | −1   0   1 |
        | −1   1   0 | .

Since we have no information about the correlation between the h̄_i, we will
treat them as uncorrelated.  Hence the variance-covariance matrix Σ_L̄ of the
observed quantities will be

    Σ_L̄ = diag (4, 2, 2, 4, 2, 4) ,

understanding that the constant factor K is assumed to be one.  The
corresponding weight matrix is given as

    P = diag (0.25, 0.5, 0.5, 0.25, 0.5, 0.25) .

The normal equations are

    N X̂ = U ,

yielding the solution

    X̂ = N⁻¹ U ,

where

    N = A^T P A = |  1.00   −0.25   −0.50 |
                  | −0.25    1.00   −0.50 |
                  | −0.50   −0.50    1.50 | .

Note that N is a symmetric, positive-definite matrix.  Hence

    N⁻¹ = | 1.6   0.8   0.8 |
          | 0.8   1.6   0.8 |
          | 0.8   0.8   1.2 | .

Computing U = A^T P L̄, we get

    U = ( −6.7850,  −0.3975,  15.2800 )^T .

Performing the multiplication N⁻¹ U, we get X̂ as

    X̂ = N⁻¹ U = ( 1.05,  6.16,  12.59 )^T m .

Therefore, we have obtained the following estimates:

    Ĥ_b = 1.05 m ,    Ĥ_c = 6.16 m ,    Ĥ_d = 12.59 m .

By substituting the values of X̂ we get the residual vector V̂ for the observed
h̄_i from the equation

    V̂ = A X̂ − L̄ ,

namely

    V̂ = ( 0.00,  0.02,  0.02,  −0.04,  −0.04,  0.04 )^T m .

The adjusted observations ĥ are computed from

    ĥ_i = h̄_i + v̂_i ,    i = 1, 2, ..., 6,

and we get

    ĥ = ( 6.16,  12.59,  6.43,  1.05,  11.54,  5.11 )^T m .

The computations can be checked by deriving the heights of points b, c and d
from H_a using the adjusted ĥ_i.  The resulting values must not differ from
the adjusted values Ĥ_b, Ĥ_c and Ĥ_d.
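The numbers of this example (and of Example 6.19 below) can be reproduced with
a short numpy sketch, added here for illustration only:

```python
import numpy as np

A = np.array([[0, 1, 0], [0, 0, 1], [0, -1, 1],
              [1, 0, 0], [-1, 0, 1], [-1, 1, 0]], float)   # design matrix
L = np.array([6.16, 12.57, 6.41, 1.09, 11.58, 5.07])        # observed h-bar [m]
P = np.diag(1.0 / np.array([4, 2, 2, 4, 2, 4], float))      # weights = 1/length

N = A.T @ P @ A
U = A.T @ P @ L
X = np.linalg.solve(N, U)       # -> [ 1.05  6.16 12.59 ]  (Hb, Hc, Hd)
V = A @ X - L                   # -> [ 0.00  0.02  0.02 -0.04 -0.04  0.04 ]

df = A.shape[0] - A.shape[1]                     # degrees of freedom = 3
sigma0_sq = (V @ P @ V) / df                     # ~ 0.00067, cf. (6.98)
Sigma_X = sigma0_sq * np.linalg.inv(N)           # estimated covariance, cf. (6.99)
print(X.round(2), V.round(2), round(sigma0_sq, 5))
```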

6.4.7 Variance-Covariance Matrix of the Parametric Adjustment Solution

Vector, Variance Factor and Weight Coefficient Matrix

The parametric adjustment solution vector X̂ is given by equation (6.77), i.e.

    X̂ = (A^T P A)⁻¹ (A^T P L̄) .

This can be written as

    X̂ = B L̄ ,                                                          (6.81)

where

    B = (A^T P A)⁻¹ A^T P = N⁻¹ A^T P .                                (6.82)

The variance-covariance matrix Σ_X̂ of the solution vector X̂ can be easily
deduced by applying the covariance law (equation (6.15)) to (6.81); we get

    Σ_X̂ = B Σ_L̄ B^T .                                                  (6.83)

From equation (6.66) we have

    P = K Σ_L̄⁻¹ ,

and, inverting both sides, we obtain

    Σ_L̄ = K P⁻¹ .                                                      (6.84)

Substituting (6.82) and (6.84) into (6.83) we get

    Σ_X̂ = (N⁻¹ A^T P) K P⁻¹ (N⁻¹ A^T P)^T .                            (6.85)

Both P and N are symmetric matrices, so that we can write P^T = P, N^T = N and
(N⁻¹)^T = N⁻¹.  Substituting this into (6.85) we get

    Σ_X̂ = K N⁻¹ A^T P P⁻¹ P A N⁻¹ = K N⁻¹ N N⁻¹ ,

that is

    Σ_X̂ = K N⁻¹ = K (A^T P A)⁻¹ .                                      (6.86)

On the other hand, by putting P = K Σ_L̄⁻¹ in (6.86) we get

    Σ_X̂ = (1/K) K (A^T Σ_L̄⁻¹ A)⁻¹ = (A^T Σ_L̄⁻¹ A)⁻¹ ,                  (6.87)

which shows that Σ_X̂ does not depend on the choice of the factor K.  In fact,
this statement is valid only if we know the correct values of the elements of
Σ_L̄.  Unfortunately, however, Σ_L̄ is often known only up to a scale factor,
i.e. we know only the relative variances and covariances of the observations.
This means that we have to work with the weight matrix K Σ_L̄⁻¹ without knowing
the actual value of the factor K.  Therefore Σ_X̂ cannot be computed from
equation (6.87).

If we develop the quadratic form V̂^T P V̂,* considering the observations L̄ to
be influenced by random errors only, we get an estimate K̂ for the assumed
factor K, given by

    V̂^T P V̂ = (n − u) K̂ .                                              (6.88)

The multiplier on the right-hand side is nothing else but the difference
between the number of independent observations and the number of unknown
parameters, i.e. the number of redundant observations, which is sometimes
denoted by df and called the number of degrees of freedom, i.e.

    df = n − u .                                                       (6.89)

df must be greater than zero in order to be able to perform a least-squares
adjustment.  Hence equation (6.88) becomes

    K̂ = V̂^T P V̂ / df .                                                 (6.90)

Usually, in the literature, K is known as the a priori variance factor and K̂
is called the least-squares estimate of the variance factor, or simply the
estimated or a posteriori variance factor.  The estimated variance factor can
now be used instead of the a priori one, yielding an estimate of Σ_X̂:

    Σ̂_X̂ = K̂ N⁻¹ = K̂ (A^T P A)⁻¹ = (V̂^T P V̂ / df) (A^T P A)⁻¹ ,          (6.91)

which is known as the estimated variance-covariance matrix of X̂.

------------------------------------------------------------------
* Here, the vector V̂ is the vector of residuals from the least-squares
adjustment.


To discuss the influence of the chosen variance factor K in
-1 "
the weight matrix P =K Z:L on Z:x, as defined by (6.91), we take another

factor, say K ' • We obtian P' = K'Z:-L -1 = yP. Substituting in equation

(6.91) we get:

"T "
~~ = y (V PV) ~ ( Tp )-1 = ~"
6x df Y A A 6x
The above result indicates that fx given by equation (6.91) is independent

of the choice of the a priori variance factor K. We recall that the same
"
holds true for the estimated solution vector X (equation 6.79).

It often happens in the adjustment calculus that we have to use the estimated
parameters X̂ in subsequent adjustments as "observations".  Then we have to
take into account their respective weights.  We know that the weight matrix of
an observation vector must be proportional to the inverse of its
variance-covariance matrix (equation (6.66)).  Thus we can see that the matrix
of normal equations, N, can be immediately used as the weight matrix of the
vector X̂, since the inverse N⁻¹ is proportional to the variance-covariance
matrix Σ_X̂.  Accordingly, the matrix N⁻¹ is also known as the weight
coefficient matrix, and the square roots of its diagonal elements are called
(Hansen's) weight coefficients.

Note that X̂ is called uncorrelated when N⁻¹ is diagonal, i.e. when N is
diagonal.  In such a case we can solve the normal equations separately for
each component of X̂, which satisfies our intuition.  The correlation of X̂ is
only remotely related to the correlation of L̄: X̂ will be uncorrelated if L̄ is
uncorrelated, i.e. P is diagonal, and if the design matrix A is orthogonal.
On the other hand, N may be diagonal even for some other, general matrices P
and A.

Let us now turn once more back to the "adjustment" of the sample mean (see
6.4.3).  It is left to the student to show that the normal equations
degenerate into a single equation, namely equation (6.40).  On the other hand,
using equation (6.91), we obtain the estimated variance of the mean x̄ as

    Ŝ²_x̄ = V̂^T P V̂ / (n − 1) .                                         (6.92)

Evidently the estimated variance of x̄ differs from the variance S²_x̄ (see
equation (6.60)) in the denominator.  By analogy, we define a new statistical
quantity, the estimated variance Ŝ²_L of a sample L,

    Ŝ²_L = (1/(n−1)) Σ_{i=1}^n (ℓ_i − x̄)²                              (6.93)

(compare with equation (3.6)), which is used in statistics whenever the mean x̄
of the sample L is also being determined.  It is again left to the student to
show that, using the estimated variances for the grouped observations (see
6.4.2), the formula (6.92) (instead of (6.60)) can be derived using the
argumentation of 6.4.2 and 6.4.3.

The estimated variances of the sample L and of its mean x̄ can also be computed
using non-normalized weights, i.e. weights p_i for which Σ_{i=1}^n p_i ≠ 1
(see 6.4.6).  It can be shown that the appropriate formulae are

    Ŝ²_L = (1/(n−1)) Σ_{i=1}^n p_i (ℓ_i − x̄)²                          (6.94)

and

    Ŝ²_x̄ = (1 / ((n−1) Σ_{i=1}^n p_i)) Σ_{i=1}^n p_i (ℓ_i − x̄)² .      (6.95)

To conclude this section, let us try to interpret the meaning of the variance
factor K, introduced for the first time in 6.4.5.  Let us take, for
simplicity, an experiment yielding a unit matrix of normal equations, i.e.
N = I.  What would be the variance-covariance matrix of the solution vector X̂?
It will be a diagonal matrix

    Σ_X̂ = K N⁻¹ = K I .                                                (6.96)

This implies that all the variances S²_x̂i of the components of X̂ equal K.
Since the square roots of the diagonal elements of N (all equal to 1) can be
considered as the weights P_i of the components x̂_i of X̂, we can also write

    P_1 S_1² = P_2 S_2² = ··· = P_n S_n² = K .                         (6.97)

Comparison with equation (6.59) gives some insight into the role the variance
factor K plays.  It can be regarded as the variance of unit weight (see 6.4.3)
and is accordingly usually denoted by either S_0² or σ_0² (in the case of
postulated variances).  This is again intuitively pleasing, since it ties
together formulae (6.66) and (6.65), where K can also be equated to S_0².
Analogically, we denote K̂ by either Ŝ_0² or σ̂_0².

By adopting the notation σ̂_0² for K̂, and further by denoting the weight
coefficient matrix of the estimated parameters X̂, i.e. N⁻¹, by Q, the
equations (6.90) and (6.91) become

    σ̂_0² = V̂^T P V̂ / df                                                (6.98)

and

    Σ̂_X̂ = σ̂_0² Q .                                                     (6.99)

Example 6.18: Let us compute the estimated variance-covariance matrix Σ̂_X̂ of
the adjusted parameters X̂ in Example 6.16.  The Σ̂_X̂ matrix is computed from
equation (6.99).  First, from the above-mentioned example we have

    V̂^T = ((H_J − H_G − Σ_i h̄_i) / Σ_i d_i) · (d_1, d_2, d_3) ,

    P = diag (1/d_1, 1/d_2, 1/d_3)

and

    df = n − u = 3 − 2 = 1 .

Hence

    V̂^T P V̂ = (H_J − H_G − Σ_i h̄_i)² / Σ_i d_i ,

and

    σ̂_0² = V̂^T P V̂ / df = (H_J − H_G − Σ_i h̄_i)² / Σ_i d_i .

As we have seen, N⁻¹ = Q is given by

    Q = N⁻¹ = (d_1 d_2 d_3)/(Σ_i d_i) · | (d_2 + d_3)/(d_2 d_3)        1/d_2         |
                                        |       1/d_2            (d_1 + d_2)/(d_1 d_2) | .

We thus finally obtain

    Σ̂_X̂ = σ̂_0² Q = ((H_J − H_G − Σ_i h̄_i)² / Σ_i d_i) · Q .

Example 6.19: Let us compute the estimated variance-covariance matrix Σ̂_X̂ of
the adjusted parameters X̂ in Example 6.17, using equations (6.98) and (6.99).
First, from the above-mentioned example we have

    V̂^T = (0.00, 0.02, 0.02, −0.04, −0.04, 0.04)   in metres,

    P = diag (0.25, 0.5, 0.5, 0.25, 0.5, 0.25)   in m⁻²,

and

    df = n − u = 6 − 3 = 3 .

Hence

    V̂^T P V̂ = 0.002   (unitless)

and

    σ̂_0² = 0.002 / 3 ≈ 0.00067   (unitless).

Also, from Example 6.17, we have

    Q = N⁻¹ = | 1.6   0.8   0.8 |
              | 0.8   1.6   0.8 |   in m².
              | 0.8   0.8   1.2 |

Finally,

    Σ̂_X̂ = σ̂_0² Q = 10⁻⁴ · | 10.67    5.33    5.33 |
                           |  5.33   10.67    5.33 |   in m²,
                           |  5.33    5.33    8.00 |

or

    Σ̂_X̂ = | 10.67    5.33    5.33 |
           |  5.33   10.67    5.33 |   in cm².
           |  5.33    5.33    8.00 |

6.4.8 Some Properties of the Parametric Adjustment Solution Vector

It can be shown that the choice of the weight matrix P of the observations L̄
(proportional to the inverse of the variance-covariance matrix Σ_L̄) and the
choice of the least-squares method (minimization of V^T P V) to get the
solution X = X̂ ensures that the resulting estimate X̂ has the smallest possible
trace of its variance-covariance matrix Σ_X̂.  In other words, taking
P = σ_0² Σ_L̄⁻¹ and seeking min_{X∈R^u} V^T P V provides such a solution X̂ as
satisfies at the same time the condition

    min_{X∈R^u} trace (Σ_X̂) .                                          (6.100)

This is a result similar to the consequence of the least-squares principle
applied to a random multivariate (section 5.4), and we are not going to prove
it here.

Similarly, it can be shown that for an uncorrelated multisample of
observations L = (L_1, L_2, ..., L_n), which are assumed to be normally
distributed with PDF given by

    φ(L°; S; L) = Π_{i=1}^n [1/(S_i √(2π))] exp [−(ℓ_i − ℓ°_i)²/(2 S_i²)] ,   (6.101)

we get the most probable estimate of L° if the condition min_{X∈R^u} V^T P V
is satisfied.  This can be verified by writing

    φ(L°; S; L) = [1/((2π)^{n/2} Π_{i=1}^n S_i)] exp [−½ Σ_{i=1}^n (ℓ_i − ℓ°_i)²/S_i²]

                = [1/((2π)^{n/2} Π_{i=1}^n S_i)] exp [−½ V^T P V] ,

which is maximum if both V^T P V and trace (Σ_X̂) are minimum.  This is valid
for any fixed K.

6.4.9 Relative Weights, Statistical Significance of A Priori and A Posteriori Variance Factors

We have seen in section 6.4.6 that the choice of the a priori variance factor
σ_0², or K, does not influence the estimated solution vector X̂.  Also, in
section 6.4.7 we have seen that the same holds true even for the estimated
variance-covariance matrix Σ̂_X̂.  Hence, for the purpose of getting the
solution vector X̂ along with its Σ̂_X̂, we can assume any relative weights, i.e.
P = σ_0² Σ_L̄⁻¹ with σ_0² chosen arbitrarily.  On the other hand, the matrix of
normal equations, N = A^T P A, and the estimated variance factor,
σ̂_0² = V̂^T P V̂ / df, are influenced by the selection of σ_0².

These features of σ_0² are used in practice for two different purposes.  The
first is to render the magnitude of the elements of the normal equation matrix
N such as to make the numerical process of its inversion the most precise.
This is accomplished by choosing the value of σ_0² so as to make the average
of the elements of N close to one.

The second purpose is to test the consistency of the mathematical model with
the observations and to test the correctness of the assumed
variance-covariance matrix Σ_L̄.  Usually, if we do not have any idea about the
value of the variance factor σ_0², we assume σ_0² = 1; after performing the
least-squares adjustment, we get σ̂_0² as an estimate of the assumed σ_0².  The
ratio σ̂_0²/σ_0² provides some testimony about the correctness of Σ_L̄ and the
consistency of the model.  This ratio should approach 1.  By assuming, in
particular, σ_0² = 1, we should end up with σ̂_0² ≈ 1 as well.  If this is not
satisfied, we start looking into the assumed Σ_L̄ and use the σ̂_0² obtained
from the adjustment instead of σ_0² in computing the weights.  If the
resulting new variances and covariances of the observations are beyond the
expected range known from experience, we have to start examining the
consistency of the mathematical model with the observations, i.e. whether it
really represents the correct relationship between the observed and the
unknown quantities.

This approach is also used to help detect existing "systematic errors" in the
observations L̄, which manifest themselves as deviations from the mathematical
model.  These deviations cause an "overflow" into the value of the quadratic
form V̂^T P V̂ and, consequently, into σ̂_0².

The theoretical relation between the a priori and a posteriori variance
factors allows us to test statistically the validity of our hypothesis.
However, this particular topic is going to be dealt with elsewhere.  Let us
just comment here on the results of the adjustment of the levelling network
discussed in Examples 6.17 and 6.19.  In computing the weight matrix P, we
assumed σ_0² = 1.  After the adjustment we obtained σ̂_0² ≈ 0.00067.  Thus the
ratio σ_0²/σ̂_0² equals about 1500, which is considerably different from 1.
This suggests that the variance-covariance matrix Σ_L̄ was postulated too
"pessimistically" and that the actual variances of the observations are much
lower.



6.4.10 Conditional Adjustment

In this section we are going to deal with the adjustment of the linear model
(6.68), i.e.

    B L = C ,    (r < n),                                              (6.102)

which represents a set of r independent linear conditions between the n
observations L.  Note that C is an r by 1 vector of constant values arising
from the conditions.

For the adjustment, the above model is reformulated as

    B (L̄ + V) − C = 0 ,

or, as we usually write it,

    B V + W = 0 ,*                                                     (6.103)

where

    W = B L̄ − C .                                                      (6.104)

The system of equations (6.103) is known as the condition equations, in which
B is the coefficient matrix, V is the vector of discrepancies and W is the
vector of constant terms (the misclosures).  We recall that n is the number of
observations and r is the number of independent conditions.  It should also be
noted that no unknown parameters, i.e. no vector X, appear in the condition
equations.  The discrepancies V are the only unknowns.

------------------------------------------------------------------
* If we have a non-linear model F(L) = 0, it can again be linearized by a
Taylor series expansion, yielding

    F(L̂) = F(L°) + (∂F/∂L)|_{L=L°} (L̂ − L°) + ··· ,

in which we again neglect the higher order terms.  Putting V = (L̂ − L°), B for
(∂F/∂L)|_{L=L°} and W = F(L°), we end up with the linearized condition
equations of the form B V + W = 0, which is the same as (6.103).
------------------------------------------------------------------

We wish again to get such an estimate V̂ of V as would minimize the quadratic
form V^T P V, where P = σ_0² Σ_L̄⁻¹ is the assumed weight matrix of the
observations L̄.  The formulation of this condition, i.e. min_{V∈R^n} V^T P V,
is not as straightforward as it is in the parametric case (section 6.4.6).
This is due to the fact that V in equation (6.103) cannot be easily expressed
as an explicit function of B and W.  However, the problem can be solved by
introducing the vector K of r unknowns, called Lagrange's multipliers or
correlates.*  We can write

    min_{V∈R^n} V^T P V = min_{V∈R^n} [V^T P V + 2 K^T (B V + W)] ,    (6.105)

since the second term on the right-hand side equals zero.  Let us denote the
bracketed function by

    φ(V) = V^T P V + 2 K^T (B V + W) .

To minimize the above function, we differentiate it with respect to V and
equate the derivatives to zero.  We get

    ∂φ/∂V = 2 V^T P + 2 K^T B = 0 ,

which, when transposed, gives P V + B^T K = 0.  The last equation can be
solved for V̂, and we obtain

    V̂ = −P⁻¹ B^T K .                                                   (6.106)

This system of equations is known as the correlate equations.

Substituting equation (6.106) into (6.103), we eliminate V:

    B (−P⁻¹ B^T K) + W = 0 ,

or

    (B P⁻¹ B^T) K = W .                                                (6.107)

This is the system of normal equations for the conditional adjustment.  It is
usually written in the following abbreviated form:

    M K = W ,                                                          (6.108)

where

    M = B P⁻¹ B^T .                                                    (6.109)

The solution of the above system of normal equations for K yields

    K = M⁻¹ W = (B P⁻¹ B^T)⁻¹ W .                                      (6.110)

Once we get the correlates K, we can compute the estimated residual vector V̂
from the correlate equations (6.106).  Finally, the adjusted observations L̂
are computed from

    L̂ = L̄ + V̂ .                                                        (6.111)

In fact, if we are not interested in the intermediate steps, the formula for
the adjusted observations L̂ can be written in terms of the original matrices B
and P and the vectors L̄ and C.  We get

    L̂ = L̄ + V̂
      = L̄ − P⁻¹ B^T K
      = L̄ − P⁻¹ B^T (B P⁻¹ B^T)⁻¹ (B L̄ − C) .                          (6.112)

It can also be written in the following form:

    L̂ = (I − T) L̄ + H C ,                                              (6.113)

where I is the identity matrix,

    T = P⁻¹ B^T (B P⁻¹ B^T)⁻¹ B    and    H = P⁻¹ B^T (B P⁻¹ B^T)⁻¹ .  (6.114)

------------------------------------------------------------------
* This is why the conditional adjustment is sometimes called adjustment by
correlates.
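The steps (6.104) through (6.111) translate directly into a short numpy
routine; the sketch below is added for illustration only (the a priori
variance factor is called K0 here to avoid a clash with the correlate vector K
of the text):

```python
import numpy as np

def conditional_adjustment(B, C, L_bar, Sigma_L, K0=1.0):
    """Conditional least squares: returns correlates K, residuals V_hat, adjusted L_hat."""
    P_inv = (1.0 / K0) * Sigma_L           # inverse weight matrix, P = K0 * inv(Sigma_L)
    W = B @ L_bar - C                      # misclosure vector (6.104)
    K = np.linalg.solve(B @ P_inv @ B.T, W)   # correlates (6.107)-(6.110)
    V_hat = -P_inv @ B.T @ K               # correlate equations (6.106)
    L_hat = L_bar + V_hat                  # adjusted observations (6.111)
    return K, V_hat, L_hat
```

A convenient check is that the adjusted observations satisfy the conditions
exactly, i.e. B @ L_hat equals C up to rounding, for any choice of K0.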

Example 6.20: Let us solve Example 6.16 again, using this time the conditional
method of adjustment.  We have only one condition equation between the
observed height differences h̄_i, i = 1, 2, 3, and we thus note that the number
of degrees of freedom is the same as in Example 6.16.  Denoting H_J − H_G by
ΔH, the existing condition can be written as

    Σ_i ĥ_i = ΔH .

After reformulation we get

    v_1 + v_2 + v_3 + (h̄_1 + h̄_2 + h̄_3 − ΔH) = 0 ,

which can easily be written in the matrix form

    B V + W = 0 ,

where

    B = (1, 1, 1) ,    V = (v_1, v_2, v_3)^T

and

    W = h̄_1 + h̄_2 + h̄_3 − ΔH = Σ_i h̄_i − ΔH .

The weight matrix of the observations is given by (see Example 6.16)

    P = diag (1/d_1, 1/d_2, 1/d_3)

and

    P⁻¹ = diag (d_1, d_2, d_3) .

The system of normal equations for the correlates K is given by equation
(6.108) as

    M K = W ,

where

    M = B P⁻¹ B^T = Σ_i d_i .

The solution for K is

    K = M⁻¹ W = (Σ_i h̄_i − ΔH) / Σ_i d_i .

The estimated residuals are then computed from equation (6.106) as

    V̂ = −P⁻¹ B^T K = −(d_1, d_2, d_3)^T (Σ_i h̄_i − ΔH) / Σ_i d_i ,

and we get

    v̂_i = (ΔH − Σ_j h̄_j) d_i / Σ_j d_j ,    i = 1, 2, 3.

This is the same result as obtained in Example 6.16 when using the parametric
adjustment (note that ΔH = H_J − H_G).

In this particular problem, we notice that the adjustment distributes the
misclosure, i.e. (ΔH − Σ_i h̄_i), among the individual observed height
differences proportionally to the corresponding lengths of the levelling
sections, i.e. inversely proportionally to the individual weights.  The
adjusted observations are given by equation (6.111), i.e.

    L̂ = L̄ + V̂ ,

or

    ĥ_i = h̄_i + v̂_i ,    i = 1, 2, 3.

This yields

    ĥ_i = h̄_i + ((ΔH − Σ_j h̄_j) / Σ_j d_j) d_i .

Finally, the estimates of the unknown parameters, i.e. X̂ = (Ĥ_1, Ĥ_2), are
computed from the known elevations and the adjusted observations ĥ_i as
follows:

    Ĥ_1 = H_G + ĥ_1 = H_G + h̄_1 + (d_1 / Σ_i d_i) (ΔH − Σ_i h̄_i)

and

    Ĥ_2 = H_J − ĥ_3 = H_J − h̄_3 − (d_3 / Σ_i d_i) (ΔH − Σ_i h̄_i) .

The results are again identical to the ones obtained from the parametric
adjustment.

Example 6.21: Let us solve Example 6.17 again, but this time using the
conditional adjustment.  The configuration of the levelling network in
question is illustrated again in Figure 6.8 for convenience.

Figure 6.8 (levelling network a, b, c, d)

From the above-mentioned example we have: number of observations n = 6, number
of unknown parameters u = 3.  Then df = 6 − 3 = 3, and we shall see that we
can again formulate only 3 independent condition equations between the given
observations.

By examining Figure 6.8, we see that there are 4 closed loops, namely
(a - c - d - a), (a - d - b - a), (b - c - d - b) and (a - c - b - a).  This
means that we can write 4 condition equations, one for each closed loop.
However, one of them can be deduced from the other 3; e.g. the last mentioned
loop is the summation of the other three loops.  Let us, for instance, choose
the following three loops:

    loop I   = a - c - b - a ,
    loop II  = a - c - d - a ,
    loop III = a - d - b - a .

These loops give the condition equations as follows:

    (h̄_1 + v_1) − (h̄_6 + v_6) − (h̄_4 + v_4) = 0 ,
    (h̄_1 + v_1) + (h̄_3 + v_3) − (h̄_2 + v_2) = 0 ,
    (h̄_2 + v_2) − (h̄_5 + v_5) − (h̄_4 + v_4) = 0 .

Then we get

    v_1 − v_4 − v_6 + (h̄_1 − h̄_4 − h̄_6) = 0 ,
    v_1 − v_2 + v_3 + (h̄_1 − h̄_2 + h̄_3) = 0 ,
    v_2 − v_4 − v_5 + (h̄_2 − h̄_4 − h̄_5) = 0 .

The above set of condition equations can be written in matrix notation as

    B V + W = 0 ,    (B is 3 by 6, V is 6 by 1, W is 3 by 1),

where

        | 1   0   0  −1   0  −1 |
    B = | 1  −1   1   0   0   0 | ,
        | 0   1   0  −1  −1   0 |

    V^T = (v_1, v_2, v_3, v_4, v_5, v_6)

and

        | h̄_1 − h̄_4 − h̄_6 |
    W = | h̄_1 − h̄_2 + h̄_3 | .
        | h̄_2 − h̄_4 − h̄_5 |

Substituting the observed quantities h̄_i, i = 1, 2, ..., 6, into the above
vector we get

    W = (0.0,  0.0,  −0.1)^T   in metres.

The weight matrix P of the observations is formulated as (see Example 6.17)

    P = diag (0.25, 0.5, 0.5, 0.25, 0.5, 0.25)

and

    P⁻¹ = diag (4, 2, 2, 4, 2, 4) .

The normal equations for the correlates K are

    M K = W ,

where

    M = B P⁻¹ B^T = | 12    4    4 |
                    |  4    8   −2 |
                    |  4   −2    8 | .

By inverting M we get

    M⁻¹ = |  0.15   −0.1   −0.1 |
          | −0.1     0.2    0.1 |
          | −0.1     0.1    0.2 | .

The solution for K is given by

    K = M⁻¹ W = ( 0.01,  −0.01,  −0.02 )^T .

The estimated residuals are computed from equation (6.106):

    V̂ = −P⁻¹ B^T K = ( 0.00,  0.02,  0.02,  −0.04,  −0.04,  0.04 )^T m ,

and are again identical with the results of Example 6.17.  The adjusted
observations will be

    L̂ = L̄ + V̂ ,

i.e.

    ĥ = ( 6.16,  12.59,  6.43,  1.05,  11.54,  5.11 )^T   in metres.

Finally, to compute the estimated elevations of points b, c and d, i.e. Ĥ_b,
Ĥ_c and Ĥ_d, we use the given elevation H_a and the adjusted observations ĥ_i.
For instance:

    Ĥ_b = H_a + ĥ_4 = 0.0 + 1.05  =  1.05 m ,
    Ĥ_c = H_a + ĥ_1 = 0.0 + 6.16  =  6.16 m ,
    Ĥ_d = H_a + ĥ_2 = 0.0 + 12.59 = 12.59 m .

These are obviously identical with the corresponding results of the parametric
adjustment.  Note again that when computing the estimates of the unknown
parameters from the adjusted observations we can follow any route; they all
lead to the same answer.
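Numerically, the example can be re-run with the conditional_adjustment sketch
given after equation (6.114) (illustrative only, not part of the notes); the
last lines also evaluate the Σ̂_L̂ of equation (6.121) used in Example 6.23:

```python
import numpy as np

B = np.array([[1, 0, 0, -1, 0, -1],
              [1, -1, 1, 0, 0, 0],
              [0, 1, 0, -1, -1, 0]], float)
C = np.zeros(3)                                     # loop closures must vanish
L = np.array([6.16, 12.57, 6.41, 1.09, 11.58, 5.07])
Sigma_L = np.diag([4.0, 2.0, 2.0, 4.0, 2.0, 4.0])   # variances ~ section lengths

K, V, L_hat = conditional_adjustment(B, C, L, Sigma_L)
print(K.round(2))       # -> [ 0.01 -0.01 -0.02 ]
print(V.round(2))       # -> [ 0.    0.02  0.02 -0.04 -0.04  0.04 ]

P = np.linalg.inv(Sigma_L)
sigma0_sq = (V @ P @ V) / B.shape[0]                # ~ 0.00067
T = Sigma_L @ B.T @ np.linalg.inv(B @ Sigma_L @ B.T) @ B
Sigma_L_hat = sigma0_sq * (np.eye(6) - T) @ Sigma_L   # equation (6.121), with P_inv = Sigma_L
```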

6.4.11 Variance-Covariance Matrix of the Conditional Adjustment Solution

The formula for the variance-covariance matrix Σ_L̂ of the adjusted
observations (the "result" of the conditional least-squares adjustment) can be
developed by applying the law of propagation of the variance-covariance matrix
(equation (6.15)) to equation (6.113).  In this equation, the matrices I, T
and H are obviously fixed.  Similarly, the vector C is considered a vector of
theoretically deduced, and therefore errorless, values, so that Σ_C is zero.
Hence we get

    Σ_L̂ = (∂L̂/∂L̄) Σ_L̄ (∂L̂/∂L̄)^T = (I − T) Σ_L̄ (I − T)^T
        = Σ_L̄ − T Σ_L̄ − Σ_L̄ T^T + T Σ_L̄ T^T .                          (6.115)

It is not difficult to see that both (Σ_L̄ T^T) and (T Σ_L̄) are square
symmetric matrices, hence we can write

    Σ_L̂ = Σ_L̄ − T Σ_L̄ (2 I − T^T) .                                    (6.116)

Recalling, from equation (6.114), that T = P⁻¹ B^T (B P⁻¹ B^T)⁻¹ B, and
knowing that P = σ_0² Σ_L̄⁻¹, i.e. Σ_L̄ = σ_0² P⁻¹, then by substituting these
quantities into equation (6.116) we get

    Σ_L̂ = σ_0² [P⁻¹ − P⁻¹ B^T (B P⁻¹ B^T)⁻¹ B P⁻¹ (2 I − T^T)] .        (6.117)

Noting that

    T Σ_L̄ T^T = σ_0² P⁻¹ B^T (B P⁻¹ B^T)⁻¹ B P⁻¹ = T Σ_L̄ ,

we finally get

    Σ_L̂ = σ_0² P⁻¹ (I − B^T (B P⁻¹ B^T)⁻¹ B P⁻¹) .                      (6.118)

Here, similarly to the parametric adjustment, to obtain the estimated
variance-covariance matrix Σ̂_L̂ we use σ̂_0² instead of σ_0², where

    σ̂_0² = V̂^T P V̂ / r ,                                               (6.119)

and we end up with

    Σ̂_L̂ = σ̂_0² (P⁻¹ − P⁻¹ B^T (B P⁻¹ B^T)⁻¹ B P⁻¹) ,                   (6.120)

or, in an abbreviated form,

    Σ̂_L̂ = σ̂_0² (I − T) P⁻¹ .*                                          (6.121)

Analogously to the parametric adjustment, it can also be shown that the
estimate L̂ ensures the minimum trace of its variance-covariance matrix Σ_L̂.
Under the same assumptions as stated in section 6.4.8, the estimate L̂ is also
the most probable estimate of L.

Regarding the correlation between the adjusted observations L̂, we can see that
L̂ will be uncorrelated if (i) L̄ is uncorrelated and (ii) the coefficient
matrix B is orthogonal.  If these two conditions are satisfied, then T and P⁻¹
will be diagonal matrices.  On the other hand, we may encounter an
uncorrelated L̂ even for some other, general T and P.

Finally, we note that again the choice of the a priori variance factor σ_0²
does not influence the estimated Σ̂_L̂ defined by equation (6.121).

------------------------------------------------------------------
* It can be shown that, similarly, Σ̂_V̂ = σ̂_0² T P⁻¹ .

Example 6.22: Let us determine the variance-covariance matrix Σ̂_L̂ for the
conditional adjustment formulated in Example 6.20.  We have

    v̂_i = (ΔH − Σ_j h̄_j) d_i / Σ_j d_j ,    i = 1, 2, 3,

and

    P = diag (1/d_1, 1/d_2, 1/d_3) .

Thus we get

    σ̂_0² = V̂^T P V̂ / r = (ΔH − Σ_i h̄_i)² / Σ_i d_i .

The required variance-covariance matrix is given by equation (6.121), i.e.

    Σ̂_L̂ = σ̂_0² (I − T) P⁻¹ .

First we compute T = P⁻¹ B^T M⁻¹ B.  We recall from Example 6.20 that
M⁻¹ = 1/Σ_i d_i and B = (1, 1, 1).  Hence

        | d_1 |                        | d_1   d_1   d_1 |
    T = | d_2 | (1/Σ_i d_i) (1,1,1) =  | d_2   d_2   d_2 | / Σ_i d_i .
        | d_3 |                        | d_3   d_3   d_3 |

Further we get

    [(I − T) P⁻¹]_ij = d_j δ_ij − d_i d_j / Σ_k d_k ,    i, j = 1, 2, 3,

i.e. a matrix with diagonal elements d_i (1 − d_i/Σ_k d_k) and off-diagonal
elements −d_i d_j / Σ_k d_k.  Finally we get

    Σ̂_L̂ = σ̂_0² (I − T) P⁻¹
        = ((ΔH − Σ_i h̄_i)² / Σ_i d_i) · [ d_j δ_ij − d_i d_j / Σ_k d_k ] .

Example 6.23: Let us determine the variance-covariance matrix Σ̂_L̂ for the
conditionally adjusted levelling net of Example 6.21.  We have

    V̂^T = (0.00, 0.02, 0.02, −0.04, −0.04, 0.04)   in metres,

    P = diag (0.25, 0.5, 0.5, 0.25, 0.5, 0.25)

and

    r = df = n − u = 6 − 3 = 3 .

Hence

    V̂^T P V̂ = 0.002 ,

    σ̂_0² = V̂^T P V̂ / r = 0.002 / 3 ≈ 0.00067 .

The required Σ̂_L̂ matrix is computed again from equation (6.121) as

    Σ̂_L̂ = σ̂_0² (I − T) P⁻¹ ,

where

    P⁻¹ = diag (4, 2, 2, 4, 2, 4)

and T is computed from

    T = P⁻¹ (B^T M⁻¹ B) .

From Example 6.21 we have

    M⁻¹ = (B P⁻¹ B^T)⁻¹ = |  0.15   −0.1   −0.1 |
                          | −0.1     0.2    0.1 |
                          | −0.1     0.1    0.2 |

and

        | 1   0   0  −1   0  −1 |
    B = | 1  −1   1   0   0   0 | .
        | 0   1   0  −1  −1   0 |

Hence

                 |  0.15  −0.10   0.10  −0.05   0.00  −0.05 |
                 | −0.10   0.20  −0.10  −0.10  −0.10   0.00 |
    B^T M⁻¹ B =  |  0.10  −0.10   0.20   0.00  −0.10   0.10 |
                 | −0.05  −0.10   0.00   0.15   0.10   0.05 |
                 |  0.00  −0.10  −0.10   0.10   0.20  −0.10 |
                 | −0.05   0.00   0.10   0.05  −0.10   0.15 |

and

                           |  0.6  −0.4   0.4  −0.2   0.0  −0.2 |
                           | −0.2   0.4  −0.2  −0.2  −0.2   0.0 |
    T = P⁻¹ (B^T M⁻¹ B) =  |  0.2  −0.2   0.4   0.0  −0.2   0.2 |
                           | −0.2  −0.4   0.0   0.6   0.4   0.2 |
                           |  0.0  −0.2  −0.2   0.2   0.4  −0.2 |
                           | −0.2   0.0   0.4   0.2  −0.4   0.6 | .

Hence

                   |  1.6   0.8  −0.8   0.8   0.0   0.8 |
                   |  0.8   1.2   0.4   0.8   0.4   0.0 |
    (I − T) P⁻¹ =  | −0.8   0.4   1.2   0.0   0.4  −0.8 |
                   |  0.8   0.8   0.0   1.6  −0.8  −0.8 |
                   |  0.0   0.4   0.4  −0.8   1.2   0.8 |
                   |  0.8   0.0  −0.8  −0.8   0.8   1.6 |

and finally

    Σ̂_L̂ = σ̂_0² (I − T) P⁻¹

                  | 10.72   5.36  −5.36   5.36   0.00   5.36 |
                  |  5.36   8.04   2.68   5.36   2.68   0.00 |
         = 10⁻⁴ · | −5.36   2.68   8.04   0.00   2.68  −5.36 |   in metres squared.
                  |  5.36   5.36   0.00  10.72  −5.36  −5.36 |
                  |  0.00   2.68   2.68  −5.36   8.04   5.36 |
                  |  5.36   0.00  −5.36  −5.36   5.36  10.72 |

By dropping the scalar 10⁻⁴ we get the results in cm².

The comments made at the end of section 6.4.9 regarding the value of σ̂_0²
versus the assumed value of 1.0 for σ_0² hold true here as well.



6.5 Exercise 6

1. To determine the height h of a wall shown in the Figure, the horizontal
   distance ℓ and the vertical angle θ were observed and found to be:

       ℓ = 85.34 m ,        with S_ℓ = 2 cm ,
       θ = 12° 37' 30" ,    with S_θ = 10" .

   Required: Compute the statistical estimate for h along with its RMS.

2. To determine the distance P_1P_2 = c, which cannot be measured directly due
   to the existence of an obstacle, as shown in the Figure, the following
   measurements were taken:

       a ,                     with S_a = 3 cm ,
       P_1P_3 = b = 40 m ,     with S_b = 4 cm ,
       γ = 60° ,               with S_γ = 25" .

   Required: Compute the distance P_1P_2 and its standard error to the nearest
   mm.
3. Determine the standard error of the estimated hedght h of the tower

given in Problem number 9, Exercise 4, s~ction 4.11. Consider all the

measured ~uantities, namely t, a, S and 8 to be uncorrelated.

4. From a point P 0 in the x-y coordinate system shown below, a distance

d = 5637.8 mandan azimuth T = 49.9873 grads (100 grads= 90 degrees)

to a second point P1 were measured. The relative error of d is

1.2 · 10- 4 . The RMS ofT is 0.08 centigrads (1 grad= 100 centigrads).
221

Required: Compute the following:


" ·"
( i) The coordinate difference_s ( l:::.x, l:::.y)

between points P0 and P1 .

(ii) The variance-covariance matrix EX,


" " 2
where X = (t::.x, !:::.y), in m ,

(iii) The RMS of l:::.x and !:::.y respectively.


- - ....
b.)l.
(iv) The correlation coefficient

between l:::.x and l:::.y.


~----------------~~x

5. The shown traverse consists of two legs P_0P_1 and P_1P_2.  The coordinates
   (x_0, y_0) of the initial point P_0, as well as the (x, y) coordinates of
   the reference mark R, are considered to be error-free (errorless), i.e.
   fixed quantities.  The measured quantities are the horizontal angles β_1
   and β_2 and the horizontal distances d_1 and d_2 respectively.  The
   available data are:

       x_R = 100.0 m ,    y_R = 200.0 m ,
       x_0 = 150.0 m ,    y_0 = 150.0 m ,
       β_1 = 75° ,        with S_β1 = 3" ,
       β_2 = 210° ,       with S_β2 = 2" ,
       d_1 = 100 m   and   d_2 = 200 m .

   The standard error of an observed distance is to be calculated according to
   the formula

       S_d (cm) = 1.0 (cm) + d (m) · 10⁻² .

   Required: Compute the following:

   (i)   The estimated coordinates (x̂_1, ŷ_1) of point P_1, along with their
         associated variance-covariance matrix Σ_(x̂1, ŷ1).
   (ii)  The estimated coordinates (x̂_2, ŷ_2) of point P_2 and their
         variance-covariance matrix Σ_(x̂2, ŷ2).  Note that the coordinates are
         required to the nearest mm and the variances and covariances in cm².
   (iii) Discuss the correlation among the estimated coordinates x̂_1, ŷ_1, x̂_2
         and ŷ_2.

6. In an intersection problem, as shown in the Figure, the two horizontal
   angles β and α are observed from the two known stations P_1(x_1, y_1) and
   P_2(x_2, y_2) in order to determine the (x, y) coordinates of an unknown
   station P.

   Given the following data:

       x_1 = 200.0 m ,    y_1 = 500.0 m ,    Σ_(x1, y1) = | ·     ·   | cm² ,
                                                          | ·     ·   |

       x_2 = 546.4 m ,    y_2 = 300.0 m ,    Σ_(x2, y2) = |  ·   −0.5 | cm² ,
                                                          | −0.5   3  |

       α = 90° ,    S_α = 3" ,
       β = 60° ,    S_β = 2" ,    and S_αβ = 0 .

   Required: Compute the estimated (x̂, ŷ) coordinates of the unknown station
   P, in metres, and their associated variance-covariance matrix Σ_(x̂, ŷ) in
   cm².

7. Consider problem number 1 of this exercise.  Assume that the observed
   quantities ℓ and θ have also got non-random (systematic) errors of −1 cm
   and 5" respectively.  Compute the expected total error in the derived
   height h in centimetres.

8. Determine the expected error in the sum of a hundred numbers in the


following two cases:

(i) each individual number is to be rounded off to three decimal

places.

(ii) each individual number is to be truncated to three decimal

places.

Then compare and comment on the results.
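A small Monte-Carlo sketch of the two cases, assuming the dropped digits are uniformly distributed (the trial count and the simulation itself are illustrative, not part of the original notes):

```python
import random

# A rounding error of one number lies in [-0.0005, +0.0005] (zero mean),
# while a truncation error lies in [-0.001, 0] (mean -0.0005).
N, TRIALS = 100, 20000

def sum_error(truncate):
    err = 0.0
    for _ in range(N):
        e = random.uniform(-0.0005, 0.0005)
        if truncate:
            e -= 0.0005          # truncation = rounding shifted by half a unit
        err += e
    return err

for label, trunc in (("rounded", False), ("truncated", True)):
    errs = [sum_error(trunc) for _ in range(TRIALS)]
    mean = sum(errs) / TRIALS
    rms = (sum(e * e for e in errs) / TRIALS) ** 0.5
    print(f"{label:9s}: mean error = {mean:+.4f}, RMS = {rms:.4f}")
```

Under these assumptions the rounded sum has zero expected error with an RMS of about 10 · 0.0005/√3 ≈ 0.003, whereas the truncated sum carries a systematic error of about 100 · (-0.0005) = -0.05 in addition to a comparable random part.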

9. To determine the height h of a tower, the configuration shown in the
   Figure was proposed, in which α, β, θ and a are the quantities to be
   measured. The approximate values of these quantities were obtained from
   a preliminary investigation and found to be:

        a = 100 m.

   Providing that the horizontal angles α and β are to be measured with a
   precision of 2" (i.e. S_α = S_β = 2"), what are the required precisions
   in measuring both the horizontal distance a and the vertical angle θ
   (i.e. S_a and S_θ) such that their contributions to the standard error
   S_h of the derived height h, which is specified not to be worse than
   2.45 cm, will be the same?



10. Assume that the horizontal angles in a triangulation network are to be
    measured using two theodolites, "I" and "II", of different quality.
    These two theodolites were tested by measuring one particular angle
    several times, from which it was found that the standard deviation of
    one observation (i.e. the standard deviation of the sample) observed
    with theodolite "I" was s_I = 1".5 and with theodolite "II" was
    s_II = 2".5. If it is specified that all the angles of the network are
    required to have a standard deviation of the mean not worse than 0".5,
    how many times should we measure each angle when using theodolite "I"
    and when using theodolite "II"?
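The computation behind this problem is the relation S = s/√n between the standard deviation of one observation and that of the mean; a tiny illustrative sketch:

```python
import math

# Standard deviation of the mean of n observations is S = s / sqrt(n),
# so the required number of repetitions is n = (s / S)^2.
def repetitions(s_one_obs, s_mean_required):
    return math.ceil((s_one_obs / s_mean_required) ** 2)

print("theodolite I :", repetitions(1.5, 0.5))   # -> 9 measurements
print("theodolite II:", repetitions(2.5, 0.5))   # -> 25 measurements
```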

11. The following observations of the length of an iron bar in metres

are made on a comparator:

3.284, 3.302, 3.253, 3.273, 3.310, 3.321, 3.304,

3.295, 3.263, 3.270.


    Required: (i) the length of the bar (i.e. the mean),
              (ii) the RMS of one observation,
              (iii) the standard deviation of the mean.
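A short illustrative sketch of the three required quantities (the divisor n - 1 is assumed for the sample variance; if the definition with divisor n is preferred, adjust the denominator accordingly):

```python
import math

# Sample mean, RMS of one observation and standard deviation of the mean
# for the ten comparator readings.
obs = [3.284, 3.302, 3.253, 3.273, 3.310, 3.321, 3.304, 3.295, 3.263, 3.270]
n = len(obs)

mean = sum(obs) / n
s_one = math.sqrt(sum((x - mean) ** 2 for x in obs) / (n - 1))
s_mean = s_one / math.sqrt(n)

print(f"mean = {mean:.4f} m, s(one obs.) = {s_one*1000:.1f} mm, "
      f"s(mean) = {s_mean*1000:.1f} mm")
```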

12. The following table shows the means ℓ̄_i of the daily measurements of
    the same distance ℓ during a five day period, along with their
    respective standard errors S_ℓ̄i.

        Day          Mon.     Tues.    Wed.     Thurs.   Fri.
        ℓ̄_i (m)     101.01   100.00   99.97    99.96    100.02
        S_ℓ̄i (cm)    2        1        4        5        3

    Required: Compute the weighted mean of the distance ℓ, say ℓ̄, along
    with its associated RMS, i.e. S_ℓ̄.
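A short sketch of the weighted-mean computation, with the weights taken as p_i = 1/S_i² (the conventional choice; any constant factor in the weights cancels out of the mean):

```python
# Daily means and their standard errors (converted to metres)
means = [101.01, 100.00, 99.97, 99.96, 100.02]   # m
S     = [0.02, 0.01, 0.04, 0.05, 0.03]           # m

p = [1.0 / s**2 for s in S]
weighted_mean = sum(pi * m for pi, m in zip(p, means)) / sum(p)
# RMS of the weighted mean from the a priori standard errors
S_mean = (1.0 / sum(p)) ** 0.5

print(f"weighted mean = {weighted_mean:.4f} m,  RMS = {S_mean*100:.2f} cm")
```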

13. Given a gravimetric network, as shown in the figure below, determine
    the gravity values g1 and g2 at points P1 and P2 respectively, along
    with their variance-covariance matrix. The gravity g0 = 979832.12 mgal
    at the initial point P0 is known.

    The following table gives the observed gravity differences with their
    signs, along with the time needed for each observation.

        Station               Δg (mgal)    ΔT (hr)
        From       To
        P0         P1           -9.82        2.5
        P2         P0          -27.78        1.5
        P1         P2          +38.42        2.0

    Assume that the observed differences Δg are uncorrelated, and that
    their variances are proportional to the corresponding time intervals
    ΔT.
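A minimal parametric least-squares sketch for this loop, taking the weights as p_i = 1/ΔT_i (the notes only state proportionality; any constant factor cancels in the estimates and only scales the covariance matrix):

```python
# Unknowns x = (g1, g2); observation equations written as A x = b + v.
g0 = 979832.12                        # known gravity at P0 [mgal]
dg = [-9.82, -27.78, +38.42]          # observed differences [mgal]
dT = [2.5, 1.5, 2.0]                  # observation times [hr]

A = [[ 1.0, 0.0],                     # P0 -> P1:  g1      = g0 + dg1
     [ 0.0, 1.0],                     # P2 -> P0:       g2 = g0 - dg2
     [-1.0, 1.0]]                     # P1 -> P2:  g2 - g1 = dg3
b = [g0 + dg[0], g0 - dg[1], dg[2]]
p = [1.0 / t for t in dT]

# Normal equations N x = u
N = [[sum(p[k] * A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]
u = [sum(p[k] * A[k][i] * b[k] for k in range(3)) for i in range(2)]

det = N[0][0] * N[1][1] - N[0][1] * N[1][0]
g1 = (N[1][1] * u[0] - N[0][1] * u[1]) / det
g2 = (N[0][0] * u[1] - N[1][0] * u[0]) / det
print(f"g1 = {g1:.2f} mgal, g2 = {g2:.2f} mgal")
```

The variance-covariance matrix of (g1, g2) then follows from the estimated variance factor times the inverse of N.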

14. Given a levelling net as shown in the Figure, the elevations HA, HB of
    points A and B are considered as known and errorless:

        HA = 300.000 m,
        HB = 302.245 m.

    The following table gives the observed height differences h_i along
    with the length t_i of each section.

        Section    Station             h_i (m)    t_i (km)
        No.        From       To
        1          P1         B          1.245       1.0
        2          A          P1         0.990       0.5
        3          P1         P2         0.500       1.0
        4          P2         B          0.751       1.0
        5          P3         B          1.486       0.5
        6          P3         P2         0.740       1.5

    Note that the arrows in the given figure indicate the directions of
    increasing elevations. The above observations are considered
    uncorrelated, with the variance of each h_i proportional to the
    corresponding length t_i.
    Required: Perform a parametric least squares adjustment of the above
    levelling net and find out the following:

    (i) The estimated elevations Ĥ1, Ĥ2 and Ĥ3 of points P1, P2 and P3.

    (ii) The adjusted values of the given six height differences.

    (iii) The estimated variance factor ŝ0² and compare it with the assumed
          a priori variance factor s0²; comment on the results.

    (iv) The estimated variance-covariance matrix Σ_X̂ of X̂ = (Ĥ1, Ĥ2, Ĥ3).
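A minimal parametric least-squares sketch for this net (illustrative only; each observation From -> To is modelled as H_To - H_From, and the weights are taken as p_i = 1/t_i):

```python
# Fixed elevations and observations (from, to, h [m], t [km])
HA, HB = 300.000, 302.245
obs = [("P1", "B", 1.245, 1.0), ("A", "P1", 0.990, 0.5),
       ("P1", "P2", 0.500, 1.0), ("P2", "B", 0.751, 1.0),
       ("P3", "B", 1.486, 0.5), ("P3", "P2", 0.740, 1.5)]
fixed = {"A": HA, "B": HB}
idx = {"P1": 0, "P2": 1, "P3": 2}

A, b, p = [], [], []
for frm, to, h, t in obs:
    row, rhs = [0.0, 0.0, 0.0], h
    if to in idx:  row[idx[to]] += 1.0
    else:          rhs -= fixed[to]        # move fixed elevation to the right-hand side
    if frm in idx: row[idx[frm]] -= 1.0
    else:          rhs += fixed[frm]
    A.append(row); b.append(rhs); p.append(1.0 / t)

m, n = len(obs), 3
N = [[sum(p[k]*A[k][i]*A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
u = [sum(p[k]*A[k][i]*b[k] for k in range(m)) for i in range(n)]

# Solve N x = u by Gaussian elimination (N is small and well conditioned)
M = [N[i][:] + [u[i]] for i in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        f = M[j][i] / M[i][i]
        for k in range(i, n + 1):
            M[j][k] -= f * M[i][k]
x = [0.0] * n
for i in range(n - 1, -1, -1):
    x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]

print("H1, H2, H3 =", [round(v, 3) for v in x])
```

From the same quantities the residuals v = Ax̂ - b give the adjusted height differences, ŝ0² = vᵀPv/(6 - 3) estimates the variance factor, and Σ_X̂ = ŝ0²·N⁻¹ is the required covariance matrix.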


15. Adjust the levelling net given in problem no. 14 again by using the
    conditional method of adjustment. Replace requirement no. (iv) by
    computing the estimated variance-covariance matrix Σ_L̂ of the
adjusted height differences. Compare the results of the other three

requirements with the corresponding results from the parametric

adjustment.

16. Two physical quantities Z and Y are assumed to satisfy the following
    linear model

        Z = αY + β,

    where α and β are constants to be determined. The observations Y_i and
    Z_i obtained from an experiment are given in the following table.

        Y    1   3   4   6   8   9   11   14
        Z    1   2   4   4   5   7    8    9

    Assume that the Y's are errorless, and the Z's were observed with equal
    precision.

    Required: Determine α̂ and β̂ which provide the best fitting line
    between Z and Y, in the least squares sense.
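A short sketch of the normal-equation solution for the fitted line (the usual least-squares regression of Z on Y; illustrative only):

```python
# Observations of Problem 16
Y = [1, 3, 4, 6, 8, 9, 11, 14]
Z = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(Y)

Sy, Sz = sum(Y), sum(Z)
Syy = sum(y * y for y in Y)
Syz = sum(y * z for y, z in zip(Y, Z))

# Normal equations for Z = alpha*Y + beta with equal weights
alpha = (n * Syz - Sy * Sz) / (n * Syy - Sy ** 2)
beta = (Sz - alpha * Sy) / n
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```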


17. Solve problem No. 16 again, but this time consider the Z's errorless,
    and the Y's with equal variances. Compare the results for α̂ and β̂
    with the corresponding results from problem No. 16.


18. The given figure shows a triangulation network with fixed base
    AB = 2 km. The eight numbered angles in the Figure are all measured,
    each with a different number n_i of observations, as shown in the
    following table:

        Angle No.   n_i    Mean value of the angle
        1           2       82° 07' 09".50
        2           2       28  22  17.70
        3           5      110  29  25.02
        4           3      125  53  33.67
        5           2       25  44  09.30
        6           2       29  19  17.50
        7           5       55  03  29.32
        8           3       68  33  32.33

    Assume that all the measurements were done with the same instrument and
    under similar circumstances. (Note: the weight of each angle will be
    proportional to the corresponding number of repetitions n_i.)

    Required: (i) Neglecting the spherical excess in this network, compute
    the distance ĈD using the adjusted values of the observed angles.
    (ii) Considering the fixed base AB to be errorless, find the estimated
    relative error of the estimated length ĈD.

19. The given figure shows a braced quadrilateral ABCD in a triangulation
    network, in which all the directions marked by arrowheads were measured
    with the same precision. The base AB = 25 km is assumed errorless. The
    spherical excess ε in the four formed triangles is computed
    approximately and given by:

        ΔABC,  ε = 3".126
        ΔABD,  ε = 1".556
        ΔBCD,  ε = 3".085
        ΔACD,  ε = 1".515

    The results of the direction observations are summarized in the
    following table.

        Occupied    Target     Direction    Observed Direction
        Station     Station    No.
        A           B           7            00° 00' 00".00
                    C           8            91  30  30.35
                    D           9           125  53  33.91
        B           C          30            00  00  00.00
                    D          36            28  22  17.26
                    A          38           110  29  27.13
        C           D          26            00  00  00.00
                    A          27            29  19  17.52
                    B          28            35  03  26.80
        D           A          11            00  00  00.00
                    B          12            35  07  29.00
                    C          13            68  33  32.60

    Required: (i) Perform a conditional adjustment and find out the
    adjusted values of the observed directions, along with their estimated
    variance-covariance matrix Σ_L̂.

    (ii) Using Legendre's theorem for the spherical triangle, i.e. by
    subtracting one third of the spherical excess from each adjusted angle
    and then solving the triangle as if it were a plane triangle, compute
    the side ĈD from the known base AB and the adjusted directions. Then
    check the computed ĈD by following another route in its computation.

    (iii) Compute the estimated relative error of the estimated length ĈD.

20. The given figure shows a triangulation network with two fixed
    (errorless) stations A and E whose x and y coordinates are:

                   x           y
        A:         0.0 m,      0.0 m,
        E:       200.0 m,      0.0 m.

    The 14 marked directions (with arrowheads) were observed with the same
    precision. The results of the observations are tabulated below.

        Occupied    Target     Direction    Observed Direction
        Station     Station    No.
        A           B           1            00° 00' 00".0
                    C           2            60  00  10.0
        B           D           3            00  00  00.0
                    C           4            60  00  05.0
                    A           5           119  59  50.0
        C           A           6            00  00  00.0
                    B           7            59  59  55.0
                    D           8           120  00  00.0
                    E           9           180  00  05.0
        D           E          10            00  00  00.0
                    C          11            59  59  45.0
                    B          12           119  59  55.0
        E           A          13            00  00  00.0
                    D          14            60  00  15.0

    Required: Prepare the input for a computer program performing a
    parametric least squares adjustment using the directions (not the
    angles) to estimate the unknown coordinates of points B, C and D, by
    providing the following:

    (i) The number of unknown parameters and the number of degrees of
        freedom.

    (ii) The non-linear mathematical model.

    (iii) The approximate values for the x, y coordinates of points B, C, D.

    (iv) The linearized form of the mathematical model, i.e. V = A ΔX - ΔL,
         giving the symbolic elements of the vectors V and ΔX and the
         numerical values of the elements of the design matrix A and the
         vector ΔL.

    (v) Construct the variance-covariance matrix Σ_L of the observations,
        assuming the standard errors in directions to equal 2".

APPENDIX I

Assumptions for and Derivation of the

Gaussian PDF

The derivation of the Gaussian PDF presented here is due to G.H.L. Hagen
(1837). The first formulation of the normal law, however, originates with
De Moivre (1733).

(i) Let us assume that m independent physical causes are influencing the
measurement. Let each cause contribute an elementary error of either +Δ or
-Δ towards the overall error ε. Any value of ε can thus be expressed as a
combination (there are n = 2^m such combinations) of m elementary errors ±Δ.
We note first that the span of ε is <-mΔ, mΔ>. Further, we realize that ε
can attain only a value of an integral multiple of Δ. It is not difficult
to see that any two adjacent values of ε differ by 2Δ, since one is obtained
from the other by replacing -Δ by +Δ and vice versa. Dividing the range of ε

        Ra(ε) = mΔ - (-mΔ) = 2mΔ

by the step of ε, i.e. 2Δ, we discover that ε can attain any of the
following m + 1 values

        ε_i = (2i - m)Δ,     i = 0, 1, ..., m,                          (I-1)

corresponding to particular distinguishable combinations of the m elementary
errors.

(ii) Let us regard now the set D of all permissible values of ε,

        D = {ε_0, ε_1, ..., ε_m},

as the probability space of the random sample consisting of all the 2^m
combinations ε. Obviously, many of the 2^m combinations have the same
value, because there are only m + 1 different values available. The counts
c_i of the individual values ε_i (see section 3.1.1) can be computed using
the combined probability (see section 2.3):

        c_i = (m choose i) = [m(m-1)(m-2) ... (m-i+1)] / [i(i-1)(i-2) ... 1]
            = Π_{j=m-i+1}^{m} j / Π_{j=1}^{i} j.                        (I-2)

The actual probability of any value ε_i is then given by

        P(ε_i) = c_i/n = (m choose i) / 2^m                             (I-3)

(see section 3.1.2).

(iii) The above formula describes the actual PDF of our sample ε in the
discrete probability space D. Since our ultimate aim is to derive the
analytic expression for the "corresponding" (we shall see later what is
meant by corresponding) continuous PDF, we want to be able to express P as
a function of ε_i rather than i. The easiest way to do it is to use finite
differences.

Let us define

        δP(ε_i) = P(ε_i) - P(ε_{i-1}),

and from equation (I-3) we get P(ε_{i-1})/P(ε_i) = i/(m-i+1). Obviously,
the ratio δP(ε_i)/P(ε_i) is then given by

        δP(ε_i)/P(ε_i) = 1 - i/(m-i+1).                                 (I-4)

On the other hand, i can be expressed as a function of ε_i from equation
(I-1):

        2i - m = ε_i/Δ,    or    i = ε_i/(2Δ) + m/2.

Denoting δε = 2Δ and substituting for i in equation (I-4) we obtain

        δP(ε)/P(ε) = 1 - (ε/δε + m/2) / (m - ε/δε - m/2 + 1)
                   = (1 + m/2 - ε/δε - ε/δε - m/2) / (1 + m/2 - ε/δε)
                   = (1 - 2ε/δε) / (1 + m/2 - ε/δε)
                   = -(2ε - δε) / [(1 + m/2)δε - ε].

(iv) The next step in the development is to convert the discrete PDF, P(ε),
to a continuous PDF, i.e. to derive the "corresponding" continuous PDF.
The "corresponding" PDF is assumed to be the PDF of such a variable ε which
is defined in the same way as the discrete ε in (i), with the exception
that m is let to grow beyond all limits, i.e. m → ∞. By letting m grow we
would obtain infinitely large values of ε (see equation (I-1)). This would
contradict our experience, which teaches us that the errors are always
finite in value. Hence, we have to adopt another assumption, namely that
as m grows to infinity the absolute value of the elementary error Δ shrinks
to zero, making the product mΔ in equation (I-1) always finite.

Accepting these two assumptions, we can write the finite difference
equation as

        lim         δP(ε)/P(ε) = - lim         (2ε - δε) / [(1 + m/2)δε - ε],   (I-5)
        m→∞, δε→0                  m→∞, δε→0

which is nothing else but an ordinary differential equation for the
continuous PDF P(ε). It can thus be rewritten as

        dP(ε)/P(ε) = -(2ε - dε) / [(1 + m/2)dε - ε].
To simplify the solution of this differential equation, let us multiply
both the numerator and the denominator of the right hand side by dε and
assume that m dε² is constant, equal to C say. We further assume that the
terms vanishing with dε (dε² in the numerator, dε² and ε dε in the
denominator, as compared with m dε²/2 = C/2) may be neglected. Then we can
write

        dP/P = - 2ε dε / (C/2) = - (4/C) ε dε.                          (I-6)

(v) We can now finally solve the differential equation. It is solvable by
direct integration and we get

        ∫ dP/P = - (4/C) ∫ ε dε + const.,

i.e.

        ln P = - (2/C) ε² + const.

Denoting the integration constant by ln K we finally obtain

        P = K exp(-2ε²/C).                                              (I-7)

The question now arises whether we are free to regard both K and C as two
independent parameters of the above PDF. We know that the basic equation
for a PDF, i.e.

        ∫_{-∞}^{∞} P(ε) dε = 1,                                         (I-8)

has to be satisfied. Substituting for P into the basic equation we get

        ∫_{-∞}^{∞} P(ε) dε = ∫_{-∞}^{∞} K exp(-2ε²/C) dε
                           = K ∫_{-∞}^{∞} exp(-2ε²/C) dε = 1,

and

        K = 1 / ∫_{-∞}^{∞} exp(-2ε²/C) dε.

Hence the answer is that K must not be regarded as an independent parameter.
It is a function of C and can be evaluated by solving the integral above.
We obtain

        ∫_{-∞}^{∞} exp(-2ε²/C) dε = √(Cπ/2)                             (I-9)

and

        K = √(2/(Cπ)).                                                  (I-10)

The Gaussian PDF can then be written as

        P(ε) = G(C; ε) = √(2/(Cπ)) exp(-2ε²/C)                          (I-11)

and we can see that it is a one-parametric PDF.
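A numerical illustration (not part of the original derivation): for a large m, with Δ shrinking so that C = m(2Δ)² stays fixed, the discrete PDF (I-3) indeed approaches the Gaussian (I-11). A short sketch with the illustrative choices C = 1 and m = 400:

```python
import math

# Compare the discrete PDF (I-3), converted to a density by dividing by the
# step d_eps, with the limiting Gaussian (I-11).
C, m = 1.0, 400
d_eps = math.sqrt(C / m)                       # step of epsilon (= 2*Delta)

print("  eps    binomial/d_eps    G(C; eps)")
for i in range(m // 2, m // 2 + 51, 10):
    eps = (2 * i - m) * d_eps / 2.0            # epsilon_i = (2i - m)*Delta
    discrete = math.comb(m, i) / 2.0 ** m / d_eps
    gauss = math.sqrt(2.0 / (C * math.pi)) * math.exp(-2.0 * eps ** 2 / C)
    print(f"{eps:6.3f}   {discrete:12.6f}    {gauss:12.6f}")
```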


APPENDIX II - A

ORDINATES OF THE
STANDARD NORMAL CURVE

        y = (1/√(2π)) exp(-t²/2)
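The entries of this table, and of the two area tables that follow, can be reproduced numerically; a short illustrative sketch using only the Python standard library:

```python
import math

# Ordinate of the standard normal curve (this table) and the areas of
# Appendices II-B (from -infinity to t) and II-C (from 0 to t).
def ordinate(t):
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def area_from_minus_inf(t):          # Appendix II-B
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def area_from_zero(t):               # Appendix II-C
    return area_from_minus_inf(t) - 0.5

for t in (0.00, 1.00, 1.96, 3.00):
    print(f"t = {t:4.2f}:  y = {ordinate(t):.4f},  "
          f"area(-inf,t) = {area_from_minus_inf(t):.4f},  "
          f"area(0,t) = {area_from_zero(t):.4f}")
```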

t    0     1     2     3     4     5     6     7     8     9

0.0 .3989 .3989 .3989 .3988 .3986 .3984 .3982 .3980 .3977 .3973
0.1 .3970 .3965 .3961 .3956 .3951 .3945 .3939 .3932 .3925 .3918
0.2 .3910 .3902 .3894 .3885 .3876 .3867 .3857 .3847 .3836 .3825
0.3 .3814 .3802 .3790 .3778 .3765 .3752 .3739 .3725 .3712 .3697
0.4 .3683 .3668 .3653 .3637 .3621 .3605 .3589 .3572 .3555 .3538

0.5 .3521 .3503 .3485 .3467 .3448 .3429 .3410 .3391 .3372 .3352
0.6 .3332 .3312 .3292 .3271 .3251 .3230 .3209 .3187 .3166 .3144
0.7 .3123 .3101 .3079 .3056 .3034 .3011 .2989 .2966 .2943 .2920
0.8 .2897 .2874 .2850 .2827 .2803 .2780 .2756 .2732 .2709 .2685
0.9 .2661 .2637 .2613 .2589 .2565 .2541 .2516 .2492 .2468 .2444

1.0 .2420 .2396 .2371 .2347 .2323 .2299 .2275 .2251 .2227 .2203
1.1 .2179 .2155 .2131 .2107 .2083 .2059 .2036 .2012 .1989 .1965
1.2 .1942 .1919 .1895 .1872 .1849 .1826 .1804 .1781 .1758 .1736
1.3 .1714 .1691 .1669 .1647 .1626 .1604 .1582 .1561 .1539 .1518
1.4 .1497 .1476 .1456 .1435 .1415 .1394 .1374 .1354 .1334 .1315

1.5 .1295 .1276 .1257 .1238 .1219 .1200 .1182 .1163 .1145 .1127
1.6 .1109 .1092 .1074 .1057 .1040 .1023 .1006 .0989 .0973 .0957
1.7 .0940 .0925 .0909 .0893 .0878 .0863 .0848 .0833 .0818 .0804
1.8 .0790 .0775 .0761 .0748 .0734 .0721 .0707 .0694 .0681 .0669
1.9 .0656 .0644 .0632 .0620 .0608 .0596 .0584 .0573 .0562 .0551

2.0 .0540 .0529 .0519 .0508 .0498 .0488 .0478 .0468 .0459 .0449
2.1 .0440 .0431 .0422 .0413 .0404 .0396 .0387 .0379 .0371 .0363
2.2 .0355 .0347 .0339 .0332 .0325 .0317 .0310 .0303 .0297 .0290
2.3 .0283 .0277 .0270 .0264 .0258 .0252 .0246 .0241 .0235 .0229
2.4 .0224 .0219 .0213 .0208 .0203 .0198 .0194 .0189 .0184 .0180

2.5 .0175 .0171 .0167 .0163 .0158 .0154 .0151 .0147 .0143 .0139
2.6 .0136 .0132 .0129 .0126 .0122 .0119 .0116 .0113 .0110 .0107
2.7 .0104 .0101 .0099 .0096 .0093 .0091 .0088 .0086 .0084 .0081
2.8 .0079 .0077 .0075 .0073 .0071 .0069 .0067 .0065 .0063 .0061
2.9 .0060 .0058 .0056 .0055 .0053 .0051 .0050 .0048 .0047 .0046

3.0 .0044 .0043 .0042 .0040 .0039 .0038 .0037 .0036 .0035 .0034
3.1 .0033 .0032 .0031 .0030 .0029 .0028 .0027 .0026 .0025 .0025
3.2 .0024 .0023 .0022 .0022 .0021 .0020 .0020 .0019 .0018 .0018
3.3 .0017 .0017 .0016 .0016 .0015 .0015 .0014 .0014 .0013 .0013
3.4 .0012 .0012 .0012 .0011 .0011 .0010 .0010 .0010 .0009 .0009

3.5 .0009 .0008 .0008 .0008 .0008 .0007 .0007 .0007 .0007 .0006
3.6 .0006 .0006 .0006 .0005 .0005 .0005 .0005 .0005 .0005 .0004
3.7 .0004 .0004 .0004 .0004 .0004 .0004 .0003 .0003 .0003 .0003
3.8 .0003 .0003 .0003 .0003 .0003 .0002 .0002 .0002 .0002 .0002
3.9 .0002 .0002 .0002 .0002 .0002 .0002 .0002 .0002 .0001 .0001
APPENDIX II - B

AREAS UNDER THE
STANDARD NORMAL CURVE
from -∞ to t
t 0 1 2 3 4 5 6 7 8 9

0.0 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
0.1 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5754
0.2 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
0.3 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517
0.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879

0.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
0.6 .7258 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7518 .7549
0.7 .7580 .7612 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
0.8 .7881 .7910 .7939 .7967 .7996 .8023 .8051 .8078 .8106 .8133
0.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389

1.0 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621
1.1 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830
1.2 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015
1.3 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177
1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319

1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441
1.6 .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535 .9545
1.7 .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625 .9633
1.8 .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699 .9706
1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767

2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857
2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890
2.3 .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916
2.4 .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936

2.5 .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951 .9952
2.6 .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964
2.7 .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974
2.8 .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981
2.9 .9981 .9982 .9982 .9983 .9984 .9984 .9985 .9985 .9986 .9986

3.0 .9987 .9987 .9987 .9988 .9988 .9989 .9989 .9989 .9990 .9990
3.1 .9990 .9991 .9991 .9991 .9992 .9992 .9992 .9992 .9993 .9993
3.2 .9993 .9993 .9994 .9994 .9994 .9994 .9994 .9995 .9995 .9995
3.3 .9995 .9995 .9995 .9996 .9996 .9996 .9996 .9996 .9996 .9997
3.4 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9998

3.5 .9998 .9998 .9998 .9998 .9998 .9998 .9998 .9998 .9998 .9998
3.6 .9998 .9998 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999
3.7 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999
3.8 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999
3.9 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
APPENDIX II - C

AREAS UNDER THE
STANDARD NORMAL CURVE
from 0 to t

t 0 1 2 3 4 5 6 7 8 9

0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0754
0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879

0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
0.6 .2258 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2518 .2549
0.7 .2580 .2612 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852
0.8 .2881 .2910 .2939 .2967 .2996 .3023 .3051 .3078 .3106 .3133
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389

1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319

1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767

2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936

2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986
3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990
3.1 .4990 .4991 .4991 .4991 .4992 .4992 .4992 .4992 .4993 .4993
3.2 .4993 .4993 .4994 .4994 .4994 .4994 .4994 .4995 .4995 .4995
3.3 .4995 .4995 .4995 .4996 .4996 .4996 .4996 .4996 .4996 .4997
3.4 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4998

3.5 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998
3.6 .4998 .4998 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.7 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.8 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.9 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000
