Basic Concepts
of Data and
Error Analysis
With Introductions to
Probability and Statistics and
to Computer Methods
Panayiotis Nicos Kaloyerou
Department of Physics,
School of Natural Sciences
University of Zambia
Lusaka, Zambia
and
Wolfson College
Oxford, UK
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
For my wife Gill and daughter Rose
Preface
This book began as a set of lecture notes for my laboratory sessions at the
University of Swaziland many years ago and continued to be used as I moved to
other universities. Over the years the lecture notes were refined and added to,
eventually expanding into a short book. I also noticed that many books on practical
physics were either advanced or very detailed or both. I felt, therefore, that there
was a need for a book that focused on and highlighted the essential concepts and
methods of data and error analysis that are of immediate relevance for students to
properly write up and present their experiments. I felt that my book filled this gap
and so I decided it might be useful to publish it.
The original lecture notes comprised chapters one to four of the present book.
However, the publisher suggested that the book should be extended, so it was
decided to add chapter five, which is an introduction to probability and statistics,
and chapter six, which introduces computer methods.
For students to get started with their experimental write-ups, only chapters one to
four are needed. Though I have attempted to keep these chapters as simple as
possible, slightly more advanced derivations of important formulae have been
included. Such derivations have been presented in separate sections so that they can
be left out on a first reading and returned to at a later time.
Chapter five aims to provide the theoretical background needed for a deeper
understanding of the concepts and formulae of data and error analysis. It is a
stand-alone chapter introducing the basic concepts of probability and statistics.
More generally, an understanding of the basic concepts of probability and statistics
is essential for any science or engineering student. Though this chapter is a bit more
advanced, I have tried to present it as simply and concisely as possible with the
focus being on understanding the basics rather than on comprehensive detail. I have
always felt that it helps understanding, and is, in any case, intrinsically interesting,
to learn of the origins and the originators of the science and mathematics concepts
that exist today. For this reason, I have included a brief history of the development
of probability and statistics.
Chapter six introduces the computer methods needed to perform data and error
calculations, as well as the means to represent data graphically. This is done by
using four different computer software packages to solve two examples. However,
it is hoped that the methods of this chapter will be used only after chapters one to
four have been mastered.
I have added a number of tables and lists of mathematical formulae in the
appendices for reference, as well as some biographies of scientists and
mathematicians who have contributed to the development of probability and statistics.
It is hoped that this book will serve both as a reference to be carried to each
laboratory session and also as a useful introduction to probability and statistics.
The author particularly thanks Dr. H. V. Mweene (Dean of Natural Sciences and
member of the Physics Department, the University of Zambia) for meticulous
proofreading of the manuscript and for numerous suggestions for improving the
manuscript. Thanks are also due to Mr. E. M. Ngonga for careful proofreading.
Chapter 1
Units of Measurement
When we give the results of a measurement, we need to give two pieces of informa-
tion: one is the UNIT which we are using and the other is the NUMBER which gives
the size, or ‘magnitude’, of the quantity when expressed in this unit.
For example, if you ask me how much money I have, I should say ‘eight euros’; I
mean that the amount of money I have is eight times the value of a one euro note. If I
merely say ‘eight’ I might mean eight cents, or eight dollars, or eight cowrie-shells;
the statement means nothing. We need the UNIT as well as the NUMBER.
Each quantity has its own proper unit. We measure length in metres, energy in
joules and time in seconds.
Before 1795, each country, or each part of a country, had its own units for length,
mass and time. The metric system (metre-kilogram) was adopted in France in 1795.
In this system, each unit is related to other units for the same quantity by multiples
of ten, with Greek prefixes adopted for these multiples. This system has gradually
been adopted all over the world, first for scientific work and later for trade and home
use. In 1799 in France, the metre and kilogram were declared legal standards for all
measurements.
In 1960 the 11th General Conference on Weights and Measures adopted the Système
International d'Unités (metre-kilogram-second), or SI units for short. Before
1960, the SI system was called the MKS system. The SI system has been almost
universally adopted throughout the world. The SI system consists of the following
basic units: length - metre (m), mass - kilogram (kg), time - second (s), electric
current - ampere (A), temperature - kelvin (K),1 amount of substance - mole (mol),
and luminous intensity - candela (cd). The SI basic units, also called base SI units, are given
in Table 1.1, and defined in Table 1.2. All other units are derived from these seven
basic units (see Table 1.3).
In printed material, slanting (italic) type is used for the ‘symbol for quantity’, and
erect (Roman) type is used for the ‘symbol for unit’. By ‘symbol for quantity’ we
mean the symbol usually (but not necessarily always) used to represent a quantity;
for example, energy is usually represented by the capital letter E, while the symbol
for its unit, the joule, is J.
1 The kelvin unit was named after Lord Kelvin, originally William Thomson (1824–1907). Lord
Kelvin was born in Belfast, County Antrim, Ireland (now in Northern Ireland). He was an engineer,
mathematician and physicist who made contributions mainly to thermodynamics and electricity.
He obtained degrees from the universities of Glasgow and Cambridge and subsequently became
Professor of Natural Philosophy (later named physics) at the University of Glasgow, a position he
held for 53 years until he retired.
1.4 Names and Symbols for SI Base Units 3
Note that in the SI system we do not use capitals when writing the names of units,
even when the unit name happens to be the name of a person. We write kelvin for
the unit of temperature even though the unit is named after Lord Kelvin.
There is no base unit for quantities such as speed. We have to build the unit for speed
from two of the base units. A unit built in this way is called a ‘derived unit’. Each
derived unit is related in a very direct way to one or more base units.
Some derived units are given in Table 1.3. A more complete list is given in
Appendix B.4. Notice that some, but not all of them, have special names of their
own. Some units are named after the scientist who contributed most to the develop-
4 1 Units of Measurement
ment of the concept to which the unit refers, e.g., the newton (N), named after Sir
Isaac Newton2 and the joule (J) named after James Prescott Joule.3
1.6 Prefixes of 10
For many purposes, the SI base or derived unit is too small or too large. In the SI
system we therefore use prefixes which make the unit larger or smaller. For example,
the prefix ‘kilo’ means ‘one thousand times’, so a kilometre is one thousand times
as long as a metre. The abbreviation for ‘kilo’ is ‘k’, so we can say 1 km = 1000
m. The prefixes with their abbreviations and meanings are tabulated in Table 1.4. A
more complete list is given in Appendix B.5.
2 Sir Isaac Newton (1642–1727) was an English mathematician-natural philosopher (natural phi-
losophy is now called physics) born in Lincolnshire, England. As well as mechanics, he made
contributions in optics where he discovered that white light is composed of the colours of the rain-
bow by passing white light through a prism. His most significant contributions were to mechanics,
where, building on fledgling concepts of motion, he greatly advanced the subject using calculus for
the first time, developed by himself and, later and independently, by Leibniz. He abstracted three aspects
of motion and the production of motion as being fundamental and stated them as three laws, now
called Newton’s laws of motion. Based on these laws and on the introduction of his universal law of
gravitation he performed many calculations among which was a quantitative description of the ebb
and flow of tides, a description of many aspects of the motion of planets and of special features of
the motion of the Moon and the Earth. He published this work in 1687 in a remarkable book,
Philosophiae Naturalis Principia Mathematica (Mathematical Principles of Natural Philosophy).
Another of his major works was his 1704 book, Opticks. Newton studied at the University of
Cambridge and later became Lucasian professor of mathematics at the University of Cambridge.
3 James Prescott Joule (1818–1889) was an English physicist born in Salford, England. He is most
noted for demonstrating that heat is a form of energy, developing four methods of increasing accuracy
to determine the mechanical equivalent of heat. He studied at the University of Manchester.
1.7 Dimensional Analysis
The main reason to analyse formulae in terms of dimension is to check that they are
correct. This can best be explained by some examples.
Consider, for example, the formula

s = kat²,

for the distance s travelled from rest in a time t under a constant acceleration a,
where k is a dimensionless constant. For this formula to be correct it must have the
same dimensions on both sides. The dimension of s is L, that of a is L T⁻² and that
of t² is T². Replacing each quantity by its dimensions gives

L = [L T⁻²][T²]
L = L,
which is clearly correct, since both sides of the formula are the same. We conclude
that the formula is correct. Notice that dimensionless constants are ignored. Where
physical constants have units, these are included in the dimensional analysis.
As a second example, consider checking the formula

V = Pπa³/(8ηl)

for the volume of liquid flowing per second through a pipe of radius a and length l
under a pressure P, where η is the viscosity of the liquid. The dimensions of the
quantities in the formula are:

V = volume of liquid flowing per second = volume/time = L³/T = L³ T⁻¹
P = pressure = force/area = (mass × acceleration)/area = M L T⁻²/L² = M L⁻¹ T⁻²
π = 3.142 = dimensionless constant, hence ignored
8 = dimensionless constant, hence ignored
a = radius = L
η = viscosity = (force × time)/area = (M L T⁻² × T)/L² = M L⁻¹ T⁻¹
l = length = L

Replacing each quantity by its dimensions gives

L³ T⁻¹ = [M L⁻¹ T⁻²][L³] / ([M L⁻¹ T⁻¹][L])
L³ T⁻¹ = L² T⁻¹.

We conclude that the formula is incorrect, since the two sides of the formula are
different. In fact, the correct formula is
V = Pπa⁴/(8ηl).

Replacing each quantity by its dimensions now gives

L³ T⁻¹ = [M L⁻¹ T⁻²][L⁴] / ([M L⁻¹ T⁻¹][L])
L³ T⁻¹ = L³ T⁻¹.
Since both sides are the same, we conclude that the formula is dimensionally cor-
rect. Note that this method can tell you when a formula is definitely wrong, but it
cannot tell that a formula is definitely correct; it can only tell you that a formula is
dimensionally correct.
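The dimensional bookkeeping above is easy to automate. The following short Python sketch (purely illustrative; the function and variable names are my own, not from this book) represents a dimension as a tuple of exponents of (M, L, T) and checks the two versions of the flow formula:

def dims(M=0, L=0, T=0):
    # A dimension is stored as a tuple of exponents of (M, L, T).
    return (M, L, T)

def mul(a, b):
    return tuple(x + y for x, y in zip(a, b))

def div(a, b):
    return tuple(x - y for x, y in zip(a, b))

def power(a, n):
    return tuple(x * n for x in a)

# Quantities from the example above; dimensionless constants are ignored.
V   = dims(L=3, T=-1)        # volume of liquid flowing per second
P   = dims(M=1, L=-1, T=-2)  # pressure
a   = dims(L=1)              # radius
eta = dims(M=1, L=-1, T=-1)  # viscosity
l   = dims(L=1)              # length

print(div(mul(P, power(a, 4)), mul(eta, l)) == V)  # True: the a^4 version is dimensionally correct
print(div(mul(P, power(a, 3)), mul(eta, l)) == V)  # False: the a^3 version fails the check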
1.8 Instruments for Length Measurement

Below are shown photographs of some widely used instruments for measuring
dimensions of objects (outside dimensions, inside dimensions of hollow objects
and depths). These are the Vernier caliper (Fig. 1.1), the micrometer screw gauge
(Fig. 1.2) and the travelling microscope (Fig. 1.3). The Vernier caliper and the
travelling microscope use an ingenious method for the accurate measurement of length
dimensions, called the Vernier scale, invented by Pierre Vernier4 for the accurate
measurement of fractions of millimetres. The micrometer screw gauge uses another
ingenious method for the accurate measurement of length dimensions, based on the
rotation of a screw system. Here, we will only describe the Vernier caliper and the
micrometer screw gauge, since the travelling microscope uses the Vernier system for
the accurate measurement of lengths.
The guide bar of the Vernier caliper shown in Figs. 1.1, 1.4, and 1.5 has two scales.
The lower scale reads centimetres and millimetres, while the upper scale reads inches.
Corresponding to this there are two scales on the Vernier, the lower one for measuring
4 Pierre Vernier (1580–1637) was a French mathematician and government official born in Ornans,
France. He was taught by his father, a scientist, and developed an early interest in measuring
instruments. He described his new measuring instrument, now called the Vernier caliper, in his
1631 book La Construction, l’usage, et les Propriétés du Quadrant Nouveau de Mathématiques
(The Construction, Uses, and Properties of a New Mathematical Quadrant).
fractions of millimetres, and the top one for measuring fractions of an inch. The two
larger jaws are used to measure outer dimensions of an object, while the two smaller
jaws are used to measure internal dimensions of an object. By sitting the guide bar
on the bottom of an object and using the movable smaller jaw, depths can also be
measured. We will first discuss the millimetre/centimetre scale.
The essence of understanding how the Vernier scale measures fractions of mil-
limetres is to note that 20 divisions on the Vernier scale cover 39 mm of the main
scale as shown in Fig. 1.4. The length of each division of the Vernier scale is therefore
39 mm/20 = 1.95 mm. As a result, with the zeros of the two scales aligned as in
Fig. 1.4, the first division of the Vernier scale is short of the 2 mm division of the main
scale by 0.05 mm. If the Vernier scale is moved so that its first division is aligned
with the 2 mm division of the main scale, the distance moved by the Vernier scale is
0.05 mm, and the main scale reads 0.05 mm. The reading on the main scale is indi-
cated by the zero division of the Vernier scale. If the second division of the Vernier
scale is aligned with the 4 mm division of the main scale the distance moved from
the zero of the main scale is 2 × 0.05 = 0.1 mm, so that the reading on the main scale
is 0.1 mm. When the nth (where 0 ≤ n ≤ 20) division of the Vernier scale is aligned
Fig. 1.4 The figure shows 20 divisions of the Vernier scale covering 39 mm of the lower main
scale, so that the divisions of the Vernier scale are each 1.95 mm long
with the n × 2 mm division of the main scale, the Vernier has moved a distance of
n × 0.05 mm, corresponding to a reading of n × 0.05 mm on the main scale. Clearly,
for n = 20, the Vernier scale has moved a distance of 20 × 0.05 = 1 mm. We have
taken the zero of the main scale as the reference for measuring the distance moved
by the Vernier scale. However, it is clear that we can use any mm division of the
main scale as a reference from which to take the distance moved by the Vernier scale.
For example, take the 7 cm = 70 mm division of the main scale as the reference. If,
say, the 9th division of the Vernier scale is aligned with a division on the main scale,
then from our considerations earlier, we can say that the Vernier scale has moved a
distance of 9 × 0.05 = 0.45 mm from the 70 mm division, and the corresponding
reading on the main scale is 70 + 0.45 = 70.45 mm = 7.045 cm. This then is how
Fig. 1.5 Reading Vernier calipers. The figure shows that the largest whole number of mm before
the zero of the Vernier scale is 50 mm. The 7th division of the Vernier scale is aligned with a division
of the main scale. The reading is therefore 50 + (7 × 0.05) = 50.35 mm or 5.035 cm
the Vernier scale measures fractions of millimetres. With this in mind, a reading on
the Vernier scale is taken as follows:
The whole number of millimetres in a reading is taken as the largest number of
millimetres on the main scale before the zero of the Vernier scale. The fraction of a
millimetre is given by the division of the Vernier that is aligned with a division of the
main scale. As an example, let us take the reading of the scales of the Vernier caliper
shown in Fig. 1.5. The figure shows that the largest whole number of millimetres
before the zero of the Vernier scale is 5 cm = 50 mm. The 7th division of the Vernier
scale is aligned with a division of the main scale (which particular main scale division
has no relevance). Thus, the reading is 50 + (7 × 0.05) = 50.35 mm. A simpler way
to get the same result, is to read the Vernier scale as 0.35 mm, immediately giving
a total reading of 50 + 0.35 = 50.35 mm, as before. This simplified way of reading
the Vernier scale is achieved by numbering every two divisions of the Vernier scale
so that the 20 divisions are numbered from 0 to 10. The 0–10 numbering of the 20
Vernier scale divisions is shown in Figs. 1.1, 1.4, and 1.5. We may express the above
length reading as a formula:

reading = MS + VS,    (1.1)

where MS is the main scale reading and VS = n × 0.05 mm is the Vernier scale
reading, n being the Vernier division aligned with a main scale division.
Some authors express Vernier scale readings in terms of the least count (LC) of an
instrument, defined as the smallest measurement that can be taken with the
instrument, which is given by the formula

LC = (smallest main scale division)/(number of divisions on the Vernier scale).

For the millimetre scale of the Vernier caliper, LC = 1 mm/20 = 0.05 mm, in
agreement with the discussion above.
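As a check on hand calculations, the reading rule and least count are easily coded. This is a minimal Python sketch; the function names are my own, not part of the book:

def least_count(main_division, n_vernier_divisions):
    # Smallest measurable fraction: one main-scale division / number of Vernier divisions.
    return main_division / n_vernier_divisions

def caliper_reading(main_scale, aligned_division, lc):
    # Total reading = main-scale reading + aligned Vernier division x least count.
    return main_scale + aligned_division * lc

lc = least_count(1.0, 20)            # 0.05 mm for the millimetre scale
print(caliper_reading(50.0, 7, lc))  # 50.35 mm, the reading of Fig. 1.5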
Fig. 1.6 Negative zero correction. Note that the zero of both the top (inches) and lower (millime-
tres/centimetres) Vernier scale is to the left of the zeros of the main scale. Since the 8th division
on the Vernier scale is aligned with a division on the main scale, the negative zero correction to be
added to the Vernier caliper reading is (8 × 0.05) = 0.4 mm
A look at the top of the guide bar in Figs. 1.1, 1.4, and 1.5 shows a second, inch
scale with a smallest division of 1/16 in. To read fractions of 1/16 in, the Vernier
has a second scale at its top. The principle is exactly the same as above. Here, 8
divisions of the top Vernier scale cover 7/16 in of the top main scale, so that each
division of the top Vernier scale has a length (7/16)/8 = 7/128 in. With the zeros
of the two scales aligned, the first top Vernier division is short of the 1/16 in
division of the main scale by 1/128 in. The least count of the inch scale is
LC = (1/16)/8 = 1/128 in. As an example, let us take the reading from the top scales
shown in Fig. 1.5. The main scale reading is taken to be the largest number of
1/16 in before the Vernier zero, here MS = 31/16 in = 1 15/16 in. To find the
fractional part of 1/16 in, we can still use formula (1.1), but with 0.05 mm replaced
by 1/128 in. Noting that the Vernier division most closely aligned with a main scale
division is the 5th Vernier division, and using formula (1.1) with the 1/128 in
replacement, our length reading becomes 1 15/16 + 5 × 1/128 = 1 125/128 in.
Finally, we need to consider the zero correction. When the caliper jaws are closed
the reading should be zero. If it is not, readings must be corrected. When the zero
of the Vernier is to the left of the zero of the main scale, as shown in Fig. 1.6, the
zero correction is said to be negative. It is clear that all readings will be slightly low.
Hence a negative zero correction must be added to the observed reading:

corrected reading = observed reading + zero correction.
When the zero of the Vernier is to the right of the zero of the main scale, as shown in
Fig. 1.7, the zero correction is said to be positive. In this case, readings will be slightly
high, so that a positive zero correction must be subtracted from the observed reading:

corrected reading = observed reading − zero correction.
Fig. 1.7 Positive zero correction. Note that the zero of both the top (inches) and lower (millime-
tres/centimetres) Vernier scale is to the right of the zeros of the main scale. Since the 8th division
on the Vernier scale is aligned with a division on the main scale, the positive zero correction to be
subtracted from the Vernier caliper reading is (8 × 0.05) = 0.4 mm
The principle of the micrometer is based on the fact that as a screw is turned by one
revolution, it advances a distance equal to its pitch. The pitch of the screw is the dis-
tance between adjacent threads. When facing the ratchet, a clockwise rotation closes
the micrometer, while an anticlockwise rotation opens it. Typically, a micrometer has
a screw of pitch 0.5 mm and a thimble with 50 divisions. Thus, turning the thimble 1
division anticlockwise (1/50th of a rotation) advances (opens) the micrometer spindle
by 0.5/50 = 0.01 mm. The sleeve scale is marked in millimetres and half divisions of
millimetres. Therefore, the sleeve division closest to the edge of the thimble gives
either an integer millimetre reading or an integer plus a half integer millimetre read-
ing, depending on whether the division is a millimetre or half millimetre division.
Fractions of a half millimetre are read from the thimble. The thimble reading is the
division on the thimble aligned with the line running along the sleeve, which we
will call the reference line. For example, let us take the micrometer reading shown
in Fig. 1.8. The nearest sleeve division before the edge of the thimble is the 7 mm
division. The thimble division most closely aligned with the reference line is the 37th
division. The micrometer reading is therefore 7+(37×0.01) = 7.37 mm. As another
example, consider reading the micrometer shown in Fig. 1.9 which involves a half
millimetre sleeve division. Here, the nearest sleeve division before the thimble edge
is the 6.5 mm division. The thimble division lies almost in the middle between the
23rd and 24th divisions, so we may take 23.5 as the closest division. The micrometer
reading is therefore 6.5 + (23.5 × 0.01) = 6.735 mm.
As with the Vernier scale, we must consider the zero correction for the micrometer.
For a fully closed micrometer the edge of the thimble should be exactly aligned with
the zero of the sleeve when the zero of the thimble is aligned with the reference line.
If the zero of the thimble is above the reference line, as shown in Fig. 1.10, the
zero correction is negative so that observed readings are too low. Again, a negative
zero correction must be added to the observed reading:

corrected reading = observed reading + zero correction.
Fig. 1.8 Reading a micrometer (I). The nearest sleeve division to the edge of the thimble is the
7 mm division. The thimble division most closely aligned with the reference line is the 37th division.
The micrometer reading is therefore 7 + (37 × 0.01) = 7.37 mm
Fig. 1.9 Reading a micrometer (II). The nearest sleeve division to the edge of the thimble is the
6.5 mm division. The thimble division lies almost in the middle between the 23rd and 24th divisions,
so we may take 23.5 as the closest division. The micrometer reading is therefore 6.5+(23.5×0.01) =
6.735 mm
If the zero of the thimble is below the reference line, as shown in Fig. 1.11, the
zero correction is positive and the observed readings are too high. Again, a positive
zero correction must be subtracted from the observed reading:

corrected reading = observed reading − zero correction.
For Vernier calipers it is easy to see why a zero correction is negative or positive,
but for a micrometer a little more thought is needed. The two following examples
address this issue.
Example 1.8.3 (Negative zero correction. The zero of the thimble is above the ref-
erence line.)
To see why a micrometer with the thimble zero above the reference line (negative
zero correction) gives too low a reading, consider a micrometer with the zero above
the reference line by 3 divisions, i.e., by 3 × 0.01 = 0.03 mm, as shown in Fig. 1.10.
Also suppose that the length of an object has been measured and the observed reading
is 6.1 mm. For the zero of the thimble to reach the reference line the spindle moves a
14 1 Units of Measurement
distance of 0.03 mm. To reach the observed reading of 6.1 mm, the spindle moves a
further distance of 6.1 mm. The correct length measurement of the object is the total
distance moved by the spindle, which is 0.03+6.1 = 6.13 mm, clearly longer than the
observed reading. We see, then, why a micrometer with a negative zero correction
gives too low an observed reading, and, hence, why we must add a negative zero
correction: the distance travelled by the spindle for the thimble zero to reach the
reference line is not included in the observed reading.
Example 1.8.4 (Positive zero correction. The zero of the thimble is below the refer-
ence line.)
To see why a micrometer with the thimble zero below the reference line (positive
zero correction) gives too high a reading, consider a micrometer with the zero below
the reference line by 3 divisions, i.e., by 3 × 0.01 = 0.03 mm, as shown in Fig. 1.11.
Also suppose that the length of an object has been measured and the observed reading
is 6.1 mm. The zero of the thimble has a ‘head start’ of 3 divisions (or 0.03 mm) so
that to reach the reading of 6.1 mm the spindle only travels a distance of 6.1 − 0.03 =
6.07 mm. Since the correct length measurement of the object is the distance moved
by the spindle, as stated above, the correct length of the object is 6.07 mm, which
is shorter than the observed reading. We see then why a micrometer with a positive
zero correction gives too high an observed reading and hence why we must subtract a
positive zero correction: The 3rd thimble division below the reference line indicates
a distance of 0.03 mm which the spindle has not moved, and this ‘false’ distance is
included in the observed reading, giving a value which is too high.
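The micrometer reading rule, together with the zero correction of Examples 1.8.3 and 1.8.4, can be summarised in a few lines of Python (an illustrative sketch; the names and the sign convention are mine):

def micrometer_reading(sleeve, thimble_division, zero_divisions=0, per_division=0.01):
    # zero_divisions > 0: thimble zero ABOVE the reference line (negative correction, added);
    # zero_divisions < 0: thimble zero BELOW the reference line (positive correction, subtracted).
    return sleeve + thimble_division * per_division + zero_divisions * per_division

print(micrometer_reading(6.5, 23.5))                  # 6.735 mm (Fig. 1.9, no zero error)
print(micrometer_reading(6.0, 10, zero_divisions=3))  # observed 6.1 mm -> corrected 6.13 mm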
Chapter 2
Scientific Calculations, Significant
Figures and Graphs
When the unit in which a measurement is expressed is changed, the number of significant
figures remains unchanged. For example, we can write the length of the block alternatively
as 97.6 mm = 9.76 cm = 0.0976 m, but all lengths are given to three significant
figures.
A good guide for the error in reading an instrument is as follows: take the instrument
to be accurate to half of its smallest scale division, and write the reading error as
± half of this accuracy, rounded down to the lowest single digit.
The requirement to round down to the lowest single digit is for consistency with
rule 2.4.2 of Sect. 2.4 for writing errors, namely, that an error should be given to 1
digit and be of the same power of 10 as the last (uncertain) significant figure of
a measurement. For example, in the ruler measurement above the result is given
as 97.6 mm. According to our rule, a ruler is accurate to half its smallest division,
i.e., 0.5 mm, so that we should write the error in the answer as ± a half of 0.5 mm,
i.e., 0.25 mm. Since there are two digits in the error 0.25 we must round down to the
lowest single digit, namely, 0.2. Thus, the measurement together with error is written
as 97.6 ± 0.2 mm.
In Sect. 2.1 above, we have used the phrase ‘significant figure’. What is a significant
figure? When are figures not significant?
Here are a few simple rules on significance, should any doubt ever arise.
1. All digits 1–9, and all zeros standing between such digits, are significant. So, for
example, both these quantities are expressed to four significant figures:
4328 m
5.002 mm
2. Zeros written to the LEFT of the first non-zero digit and after a decimal point are
   used to express negative powers of ten, and are not considered significant as they can
   always be removed by putting the quantity into standard form or by changing the
   unit. For example, 0.4328 m, 0.04328 m and 0.004328 m are all expressed to four
   significant figures.
3. In a decimal number, that is, a number which includes a decimal point, all zeros
   written to the RIGHT of a non-zero digit are significant. For example, 25.10 m is
   given to four significant figures and 25.100 m to five.
4. In a non-decimal number, that is, a number which does not include a decimal
point, zeros written to the RIGHT of a non-zero digit are used to express powers
of ten, and may or may not be significant. In such cases it is necessary to explicitly
state the number of significant figures. For example, the number 5,359,783 stated
to four significant figures reads 5,360,000. The four zeros are essential as they
indicate powers of ten. Just by looking at this number we cannot tell the number
of significant figures it represents. In such cases, it is essential to explicitly state
the number of significant figures; for example, we should write 5,360,000 (4 sf),
where the abbreviation ‘sf’ stands for ‘significant figures’. Contrast this with the
case of a decimal number (item 3 above) where the zeros to the right of the last
non-zero digit indicate the number of significant figures.
5. In any case of doubt, it is good practice to state the number of significant figures
   being used. It is usual to abbreviate the statement like this: 5,360,000 m (4 sf).
The difference between significant figures and decimal places must be clearly under-
stood. For this reason and for completeness we consider what is meant by the number
of decimal places. The abbreviation for ‘decimal places’ is ‘dc. pl.’
1. The number of decimal places is the number of digits to the right of the decimal
   point. Consider the following examples, where both the number of decimal places
   and the number of significant figures are given: 3.142 (3 dc. pl., 4 sf) and
   0.0072 (4 dc. pl., 2 sf).
2. Zeros between the decimal point and the last non-zero digit are counted. Zeros
   to the right of the last non-zero digit are only counted when the answer to a
   calculation or measurement is required to be specified to a given number of decimal
   places, otherwise they are ignored. In the following examples, the numbers are
   to be rounded to three decimal places: 2.3456 → 2.346 and 1.0004 → 1.000.
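The distinction between significant figures and decimal places can be made concrete in code. Below is a small Python sketch; the helper round_sf is my own, not a standard library function:

import math

def round_sf(x, sf):
    # Round x to sf significant figures (as opposed to decimal places).
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, sf - 1 - exponent)

print(round_sf(5359783, 4))  # 5360000 -> written 5,360,000 (4 sf)
print(round(2.3456, 3))      # 2.346   -> rounded to three decimal places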
The number of significant figures used for a measured quantity must not exceed the
accuracy of the instrument used for measurement. Therefore, in measurement, the
last (uncertain) significant figure should be to the same power of 10 as the error in
the instrument. In Sect. 2.1 we saw that the error in measuring a length using a metre
rule is ±0.2 mm. Therefore, if we measure a length using a ruler and find 17.3 mm,
we should write this measurement and its error as 17.3 ± 0.2 mm or 1.73 ± 0.02 cm.
It would be pointless to try to estimate the length measurement to a greater accuracy,
e.g., 17.28 mm, and it would be wrong to write the answer as 17.28 ± 0.2 mm.
Now, consider the case where measuring the thickness of a pencil using a microm-
eter yields 6.735 mm. The smallest division of a micrometer is 0.01 mm. We would
therefore write the error in the measurement, according to the instrument error rule
of Sect. 2.1, as ±0.002 mm. The pencil thickness measurement is therefore written
as 6.735 ± 0.002 mm.
We may abstract from the above examples the following two rules for the number
of significant figures to be included in a measurement and the error in the
measurement:

1. The last (uncertain) significant figure of a measurement should be of the same
   power of 10 as the error in the instrument.
2. The error should be given to a single digit, rounded down if necessary.
When performing calculations with measured quantities certain definite rules should
be followed. Suppose, for example, that the volume of a block is calculated from
measured dimensions, one of which is a thickness of 5.6 mm, by multiplying the
dimensions on a calculator. The result appears to be correct to EIGHT significant
figures! Is this really so? No, it is not! The thickness is known to only two
significant figures, and the second figure
is not! The thickness is known to only two significant figures, and the second figure
is doubtful; we might have chosen 5.7 mm, or 5.5 mm, because the last figure is an
estimate. Calculate the volume again using these different thicknesses; you will find
that only the FIRST figure remains unchanged! All that we can say for certain is
the first figure of the result, and even then the second figure is only an estimate.
We see therefore that the answer
after multiplication is only as accurate as the least accurate of the data.
Many calculations not only involve multiplications and divisions, but mixtures
of additions, subtractions, multiplications and divisions in various combinations.
Calculations may also include trigonometric, hyperbolic, logarithmic, exponential
and other functions. A similar analysis to the multiplication example above will show
that an answer to a calculation should be given to the same number of significant
figures as the least accurate of the data. We may state this as a general rule:

GENERAL RULE - An answer to a calculation should be given to the same number of
significant figures as the least accurate of the data.
2.6 Graph Drawing and Graph Gradients

This section provides general information on drawing graphs. Later, in Sect. 3.7
‘Graphs’, the points-in-pairs method for drawing the best straight line through a set
of data points will be explained. The points-in-pairs method has the advantage that it
provides a convenient method for finding the error in the slope of a straight-line graph.
We often present the data from a scientific experiment in the form of a GRAPH and
then deduce a final result from the graph. Why do we do this? Because it makes it
possible to see general trends in the data and because our final result then comes from
all the data taken together, instead of from only part of it. Accuracy can therefore be
much higher. Here is a set of rules for drawing and using graphs.
1. Use a sharp pencil.
2. Look at the data to be plotted. Notice whether quantities are increasing or decreas-
ing in a reasonable manner. If any particular result seems wildly wrong, check
the arithmetic; if this is correct, check the observation itself.
3. Sometimes the instructions for the experiment indicate which of your two vari-
ables is to be plotted on the horizontal x-axis and which is to be plotted on the
vertical y-axis. The instruction ‘plot A against B’ means take B as the x variable
and A as the y variable. If no instruction is given, plot the independent variable
on the horizontal axis and the dependent variable on the vertical axis.
4. Leave wide margins all round your graph paper. Write a full title at the top of
the graph. A typical title would be ‘Velocity-Time Graph for a Trolley of Mass
0.750 kg When Accelerated by a Force of 0.30 N’.
5. Label each axis clearly with the quantity plotted and its unit.
6. Look at the RANGE of your data. If there are no negative values, only the
positive portion of the corresponding axis need be drawn. Also, note whether
the origin or axes intercepts need to be included. If these are not needed, as is
the case when only the slope needs to be calculated, the axes scales need not be
started at zero, allowing the axes scales to be chosen so that the data points are
well spread over the graph and not bunched in a small portion of the graph. For
example, if the spread of data points is far from the origin, then including the
origin will result in the data points being bunched in a small portion of the graph
(see Fig. 3.8). We will return to this point at the end of Sect. 3.7.2.
7. Keeping the points made in item 6 in mind, choose your axis scales so that the
data points are spread over the whole graph and that, if the graph is a straight
line, the slope is not too close to either the x- or y-axis (see Fig. 2.1). Avoid using
1 cm to represent 3 or 7 or 1/3 or any other difficult or inconvenient number.
8. Mark your experimental points clearly. The following are examples of symbols
which clearly mark experimental points:
×, ⊗, or ◦
A small square can also be used. It is important to show the data points clearly,
so avoid using dots to mark them. These may be difficult to see and may even
be obscured when the curve or line is drawn.
9. Errors in the dependent variable (the measured value) can be indicated by drawing
   a vertical bar through the data point, where the length of the vertical line
   indicates the size of the error in the dependent variable. Sometimes, a cross formed
   of a vertical and a horizontal bar is used to indicate errors in both the dependent
   and independent variables.
Fig. 2.1 Right and wrong choices of scale and connecting line
14. A straight-line graph gives far more information than a curve does. If you know
the relation between the variables then you can use the table in Sect. 3.7.4 to find
what to plot to get a straight line.
The slope of a straight line is also called the gradient. A very steep line, nearly
vertical, has a large slope. A gently-sloping line on the same axes has a small slope.
A line which slopes upwards as we move from left to right has a positive slope; a
line which slopes downward as we move from left to right has a negative slope.
The slope of a line is ALWAYS calculated ‘in the units of the axes’. Thus, in the
example below, the slope is in ‘metres per second’ or ‘m/s’ or ‘m s⁻¹’ and NOT in
any other unit.
(a) The Slope of a Straight Line

Choose two convenient points A and B on the line, far apart. They are chosen so that
lines AC and BC, drawn parallel to the two axes, are as far as possible convenient
whole-number lines and not fractional ones. The slope of the line is given by

slope = BC/AC,

with

AC = (x₂ − x₁) = 35 s

and

BC = (y₂ − y₁) = (100 − 30) m = 70 m.

So the slope is

slope = 70/35 m s⁻¹ = 2 m s⁻¹.
Because this is a displacement-time graph, the slope is in units of velocity. If a slope
is correctly calculated, then by carrying the units through the calculation, the correct
units for the answer follow automatically.
When performing these slope calculations, always CHECK your values for the
length of AC and BC by looking at the lines again; are your lengths correct? Students
frequently make large errors in reading scales and co-ordinates.
(b) The Slope of a Curve
A curve, by definition, has a slope which varies along its length. We therefore have
to find the slope of a curve at each point separately. For example, what is the slope
of the curve at point P in Fig. 2.3? To find out, we draw a TANGENT at P and then
determine the slope of the tangent. The slope at P is positive. If we repeat the process
at Q, we shall find a negative slope. Notice that at points R and S the slope is zero.
Slope at Point P

At point P, the slope of the curve can be found by determining the slope of the
tangent AB:

slope of AB = BC/AC = (y₂ − y₁)/(x₂ − x₁) = (55 − 15)/(24 − 4) = 2 m s⁻¹.
Slope at Point Q

At point Q, the slope of the curve can be obtained by finding the slope of the
tangent LN:

slope of LN = LM/NM = (y₂ − y₁)/(x₂ − x₁) = (25 − 86)/(46 − 27.5) = −3.3 m s⁻¹.
This slope is negative; the line slopes downwards to the right. At points R and S,
any tangent drawn would be parallel to the x-axis and any triangle drawn would have
zero height. The slope at these points is therefore zero.
Chapter 3
Error Analysis
3.1 Random Errors

REPEATED READINGS - Never be satisfied with a single reading; repeat the
measurement. This procedure will improve the precision of results; it can also show
up careless mistakes.

RANDOM ERRORS - All measurements are subject to random errors, and these spread
the readings about the true value. Sometimes a reading is too high, sometimes too
low. With repeated readings, random errors tend to cancel out.
If n readings x₁, x₂, . . . , xₙ are taken, then the best estimate is the mean (or average):

x̄ = (x₁ + x₂ + · · · + xₙ)/n.
Example 3.1.1 (Calculation of the mean) Suppose you are measuring the volume
of water flowing through a tube in a given time. Five readings of this quantity may
yield the values:
436.5, 437.3, 435.9, 436.4, 436.9 cm3
(If one reading differs too much from the others, it may be due to a mistake. Since a
bad reading will influence the average, it may be better to neglect it.)
The mean is given by

x̄ = (436.5 + 437.3 + 435.9 + 436.4 + 436.9)/5 cm³ = 436.6 cm³.
The more readings you take, the more reliable your results.
Fig. 3.1 Results from two different experiments showing different spreads
The results of experiment 2 are regarded as more reliable because of the smaller
spread.
RESIDUALS -

d₁ = x₁ − x̄
d₂ = x₂ − x̄
 . . .
dₙ = xₙ − x̄

STANDARD DEVIATION -

σ = [(d₁² + d₂² + d₃² + · · · + dₙ²)/n]^(1/2)    (3.1)
The above formula for standard deviation is precise when all possible values of a given
quantity are available. The entirety of possible values is called the POPULATION
and we may call the above formula the POPULATION STANDARD DEVIATION.
Often, as in nearly all experiments, only a portion of all possible values, called
a SAMPLE, is available. When only a sample is available, the best estimate of
the population standard deviation is the SAMPLE STANDARD DEVIATION, σs ,
given by
SAMPLE STANDARD DEVIATION -

σs = [(d₁² + d₂² + d₃² + · · · + dₙ²)/(n − 1)]^(1/2)    (3.2)
In what follows, we will use the term ‘standard deviation’ to mean ‘population stan-
dard deviation’, but we will use the full name when we want to refer to the ‘sample
standard deviation’.
As an example, let us calculate the standard deviation of the five readings of
Example 3.1.1.

Solution

x̄ = 436.6 cm³ (from Example 3.1.1)

σ = [(d₁² + · · · + d₅²)/n]^(1/2) = [(0.01 + 0.49 + 0.49 + 0.04 + 0.09)/5]^(1/2)
σ = (1.12/5)^(1/2) = (0.224)^(1/2)
σ = 0.47 cm³
σ ≈ 0.5 cm³
STANDARD ERROR IN THE MEAN -

s_m = σ/(n − 1)^(1/2)    (3.3)
It is such that the true value has a 68% chance of lying within ±s_m of the mean
value and a 95% chance of lying within ±2s_m, etc. Thus, s_m is the required measure
of how close the mean value x̄ of the given sample is to the unknown true value.
Fig. 3.3 Standard error is a measure of the closeness of the true value to the mean value
We may represent the chance of the true value being found near the mean graph-
ically. This is shown in Fig. 3.3.
Example 3.1.4 (Calculation of standard error) Calculate the standard error for the
five readings of Example 3.1.1.
Solution
mean = x̄ = 436.6 cm³ (from Example 3.1.1)

s_m = σ/(n − 1)^(1/2) = 0.47/(5 − 1)^(1/2) = 0.47/2 = 0.235 cm³ ≈ 0.2 cm³
Notice that sm depends both on the standard deviation σ and on the number n of
readings.
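The calculations of Examples 3.1.1 to 3.1.4 can be reproduced with a few lines of Python (a sketch; the variable names are mine):

import math

readings = [436.5, 437.3, 435.9, 436.4, 436.9]   # cm^3, from Example 3.1.1
n = len(readings)

mean = sum(readings) / n                               # 436.6 cm^3
residuals = [x - mean for x in readings]
sigma = math.sqrt(sum(d * d for d in residuals) / n)   # Eq. (3.1): 0.47 cm^3
s_m = sigma / math.sqrt(n - 1)                         # Eq. (3.3): 0.235 -> ~0.2 cm^3

print(mean, round(sigma, 2), round(s_m, 3))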
To derive this formula, imagine that a large number N of sets of measurements of a
quantity are taken, each set consisting of n readings

x₁, x₂, . . . , xₙ,

and let X denote the (unknown) true value of the quantity. The error in the i-th
reading of a set is its difference from the true value,

eᵢ = xᵢ − X.    (3.5)

The error Eₖ in the mean x̄ of the k-th set of n readings is

Eₖ = x̄ − X = (1/n) Σᵢ xᵢ − X = (1/n) Σᵢ (xᵢ − X) = (1/n) Σᵢ eᵢ,    (3.6)

where the sums run over i = 1 to n. From Eq. (3.6) we get

Eₖ² = (1/n²) Σᵢ Σⱼ eᵢeⱼ = (1/n²) Σᵢ eᵢ² + (1/n²) Σᵢ Σ_{j≠i} eᵢeⱼ
    = (1/n)⟨e²⟩ₖ + (1/n²) Σᵢ Σ_{j≠i} eᵢeⱼ,    (3.7)

where ⟨e²⟩ₖ = (1/n) Σᵢ eᵢ² denotes the mean square error of the k-th set.
The variance of the errors in the k-th set is just this mean square error,

σₖ² = (1/n) Σᵢ eᵢ² = ⟨e²⟩ₖ.    (3.8)

The square of the standard deviation (variance) σₜ² of the errors in each measurement
of the grand distribution formed by lumping together the measurements of the N sets
is given by

σₜ² = (1/N) Σₖ σₖ² = (1/N) Σₖ ⟨e²⟩ₖ = ⟨e²⟩,    (3.9)

where ⟨e²⟩ is the mean square error taken over all nN measurements and the sums run
over k = 1 to N.
The error in the mean of the k-th set of n measurements is given by Eq. (3.6).
Now, the standard deviation s_m of the errors in the means of the N sets of n
measurements is the standard error in the mean that we are seeking. Its square, the
variance of the mean, is given by

s_m² = (1/N) Σₖ (x̄ₖ − X)² = (1/N) Σₖ Eₖ² = ⟨E²⟩.    (3.10)
Next, we want to establish a relation between ⟨E²⟩ and ⟨e²⟩. Substituting Eq. (3.7)
for each Eₖ² in Eq. (3.10) gives

s_m² = (1/N) Σₖ [(1/n)⟨e²⟩ₖ + (1/n²) Σᵢ Σ_{j≠i} eᵢeⱼ]ₖ
     = (1/n)(1/N) Σₖ ⟨e²⟩ₖ + (1/N) Σₖ [(1/n²) Σᵢ Σ_{j≠i} eᵢeⱼ]ₖ.

Since eᵢ and eⱼ are independent, negative terms cancel positive terms in the double
sum, so that the double sum term equals zero, giving

s_m² = (1/n)(1/N) Σₖ ⟨e²⟩ₖ = (1/n)⟨e²⟩.    (3.11)
Comparing Eq. (3.11) with Eq. (3.10), we may also write this as

⟨E²⟩ = (1/n)⟨e²⟩.    (3.12)

Substituting Eqs. (3.9) and (3.10) into Eq. (3.12) and taking the square root gives

s_m = σₜ/√n.    (3.13)
We do not yet have a formula for the standard error in the mean since σt is an
unknown quantity. We therefore need a way to approximate this term. We can do this
as follows: From Eqs. (3.5) and (3.6) we get
xᵢ − x̄ = eᵢ − Eₖ.    (3.14)

The square of the standard deviation of the k-th set of measurements about its mean
is then

σ² = (1/n) Σᵢ (xᵢ − x̄)² = (1/n) Σᵢ (eᵢ − Eₖ)²
   = (1/n) Σᵢ eᵢ² − 2Eₖ (1/n) Σᵢ eᵢ + Eₖ²
   = ⟨e²⟩ₖ − 2Eₖ² + Eₖ²,

where we have used Eq. (3.6) to replace (1/n) Σᵢ eᵢ by Eₖ. Hence

σ² = ⟨e²⟩ₖ − Eₖ².
This relation is for one of the N sets of n measurements. Averaging over all N sets
we get

⟨σ²⟩ = (1/N) Σₖ σₖ² = (1/N) Σₖ ⟨e²⟩ₖ − (1/N) Σₖ Eₖ² = ⟨e²⟩ − ⟨E²⟩.
Substituting Eqs. (3.9) and (3.10) into the above equation gives

⟨σ²⟩ = σₜ² − s_m².
Now ⟨σ²⟩ is still an unknown quantity, but a very good estimate of it is the square
of the standard deviation σₖ² of the single set of n actual measurements, i.e.,
⟨σ²⟩ ≈ σₖ². Substituting σₜ² = n s_m² from Eq. (3.13) then gives
σₖ² ≈ n s_m² − s_m² = (n − 1) s_m², so that

s_m = σ/(n − 1)^(1/2),

where we have dropped the k subscript. This completes the derivation of the formula
for the standard error in the mean.
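The result s_m = σ/(n − 1)^(1/2) can also be checked numerically. The following Python sketch (my own construction, with an arbitrary seed and invented numbers) simulates N sets of n readings scattered about a known true value and compares the actual spread of the set means with the estimate computed from each set's standard deviation:

import math
import random

random.seed(1)
X, spread, n, N = 436.6, 0.5, 5, 100_000   # true value, true scatter, readings per set, sets

means, estimates = [], []
for _ in range(N):
    readings = [random.gauss(X, spread) for _ in range(n)]
    m = sum(readings) / n
    means.append(m)
    sigma = math.sqrt(sum((x - m) ** 2 for x in readings) / n)   # Eq. (3.1)
    estimates.append(sigma / math.sqrt(n - 1))                   # estimate of s_m

true_sm = math.sqrt(sum((m - X) ** 2 for m in means) / N)        # actual spread of the means
print(true_sm, sum(estimates) / N)   # close agreement; a small residual bias remains
                                     # because the average of sigma lies slightly below
                                     # the square root of the average of sigma**2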
3.2 Systematic Errors

A random error spreads results about the true value, and from the equation for s_m it
is clear that by taking a large enough number of readings n, the random error can
be made arbitrarily small. However, there are other errors, called systematic errors.
With systematic errors the readings are not spread about the true value, but about some
displaced value. Systematic errors will therefore cause the mean to be shifted away
from the true value. In this case, simply repeating the measurements will not reduce
the systematic errors.
It is customary to distinguish between an ACCURATE and a PRECISE measure-
ment. An ACCURATE measurement is one in which systematic errors are small; a
PRECISE measurement is one in which the random errors are small. Some examples
of systematic errors are:
1. Parallax errors (these can occur when reading a pointer on a scale).
2. Zero errors (these occur when an instrument is not properly set at zero).
3. Inaccurate instrument scales (these are errors inherent in instrument scales, such
as those of rulers or micrometers).
Example 3.2.1 (Instrument error rule) For a length of, say, 15.6 mm measured by an
ordinary ruler, the error estimate would be ±0.2 mm (see Sect. 2.1) and we would
write the measurement as 15.6 ± 0.2 mm.
Consider again Example 3.1.1 where five readings of the volume of water flowing
through a tube are taken. So far we have found x = 436.6 cm3 and sm = 0.2 cm3 .
Now consider what systematic errors may have occurred in taking the readings:
1. The person timing may have had a tendency to start or stop the stop-watch too
soon or too late. Let us suppose the person had a tendency to start the stop-watch
too soon or too late by 1/5 s, so that time readings are either longer or shorter
than the true values. This is a common type of systematic error and arises because
individuals respond to aural and visual stimuli in different ways. Now, suppose
further that the volume of the flowing water was measured in a time interval of 4
min = 240 s, then
volume flowing in one second = 436.6/240 cm³

volume flowing in 1/5 s = (1/5) × (436.6/240) cm³ = 0.36 ≈ 0.4 cm³.

Since the stop-watch was started too late or too soon by 1/5 s, the systematic error
in measuring the volume flowing in 240 s is the volume flowing in 1/5 s, i.e., 0.4
cm³. Thus, the error in the volume flowing in 240 s is ±0.4 cm³.
2. Because the surface of a liquid is curved, systematic parallax errors may occur in
reading the volume of the water. To account for this, we estimate a further error
of 0.2 cm3 in measuring the volume of the water (Fig. 3.4).
We now have three errors in measuring the volume: The standard error sm = 0.2 cm3
(calculated in Example 3.1.4), and the two systematic errors of 0.4 cm3 and 0.2 cm3
calculated above. The question is how to combine these errors to get the total error.
We could simply add them to get a total error of 0.2 + 0.4 + 0.2 = 0.8 cm³.
This would be correct if all the errors tended to push the value in the same direction
- either up or down. In some cases this may occur. Generally, however, summing
the errors would be an over-estimate. Since some errors may push the value down,
whilst others push the value up, they have a tendency to cancel each other, reducing
the overall error.
From detailed statistical arguments, it is found that the best estimate for combining
random and systematic errors is given by

E = [(e₁)² + (e₂)² + (e₃)² + · · · + (eₙ)²]^(1/2),

where e₁, e₂, . . . , eₙ are the individual errors. For our example,
E = [(0.2)² + (0.4)² + (0.2)²]^(1/2) ≈ 0.5 cm³.
3.4 Errors in Formulae

Consider first a formula that depends on a single measured quantity, say Z = kr².
The direct approach to estimating the error ΔZ in Z due to an error Δr in r is to
substitute r + Δr into the formula:

Z + ΔZ = k(r + Δr)² = kr² + 2krΔr + k(Δr)²,

so that ΔZ = 2krΔr + k(Δr)². For a good experiment Δr is small, so that (Δr)² can be
neglected to a good accuracy. Even so, the expression is still cumbersome. Moreover,
more detailed statistical arguments show that it is too large an error estimate.
Next consider the volume of a rectangular block,

V = lwh.

Substituting l + Δl, w + Δw and h + Δh directly into the formula gives
V + ΔV = (l + Δl)(w + Δw)(h + Δh), which, when expanded, contains products of pairs
and of all three of the errors. This is even more cumbersome than the previous
example, and, again, more detailed statistical arguments show that this approach
gives too large an error estimate.
In the examples above, we saw that an error estimate ΔZ obtained by the direct
approach, i.e., by substituting A + ΔA, B + ΔB, C + ΔC, . . . directly into the formula
Z = Z(A, B, C, . . .), leads to a cumbersome expression. We also stated that detailed
statistical arguments show the error to be too large. A much better approach, leading
to ‘neater’ error formulae giving better error estimates in most cases, is to use
calculus, specifically the derivative (for Z = Z(A)) and the total differential (for
Z = Z(A, B, C, . . .)).
Consider first Z = Z(A), where Z depends on only one measured quantity A. Then

dZ/dA = Z′(A).

Rearrangement gives

dZ = Z′(A) dA.

Substituting the errors ΔZ and ΔA for the differentials immediately gives the
required error formula:

ERROR IN Z(A) -  ΔZ = (dZ/dA) ΔA
In some cases, relative errors are preferred. A general formula for relative errors for
one-quantity formulae is immediately obtained by dividing the above formula by Z
to get

ΔZ/Z = (1/Z(A)) (dZ/dA) ΔA.
When Z depends on two measured quantities, Z = Z(A, B), we need to use the total
differential

dZ = (∂Z/∂A) dA + (∂Z/∂B) dB.

Again, substituting the errors ΔZ, ΔA and ΔB for the differentials immediately gives
the required error formula:

ERROR IN Z(A, B) -  ΔZ = (∂Z/∂A) ΔA + (∂Z/∂B) ΔB

Similarly, relative errors are preferred in most cases. Dividing the above formula by
Z we get the required formula

ΔZ/Z = (1/Z(A, B)) (∂Z/∂A) ΔA + (1/Z(A, B)) (∂Z/∂B) ΔB.

In the subsections that follow we use either

ΔZ = (dZ(A)/dA) ΔA    (3.16)

or

ΔZ/Z = (1/Z(A)) (dZ(A)/dA) ΔA    (3.17)

to derive error and/or relative error formulae for various specific one-quantity
formulae, whichever is appropriate. Note that the error ΔA may be a standard error
in the mean or a systematic error.
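Formula (3.16) can also be applied numerically when differentiating by hand is inconvenient. A minimal Python sketch (the function propagate_error is my own invention, not from the book):

import math

def propagate_error(f, A, dA, h=1e-6):
    # dZ = |dZ/dA| * dA, with the derivative taken by central differences.
    dfdA = (f(A + h) - f(A - h)) / (2 * h)
    return abs(dfdA) * dA

# Example: Z = A**3 with A = 2.0 +/- 0.1; the power rule of the next
# subsection gives dZ = 3*A**2*dA = 1.2, in agreement.
print(propagate_error(lambda a: a ** 3, 2.0, 0.1))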
Proportionalities and Inverse Proportionalities

For the proportionality

Z(A) = kA,

formula (3.16) gives

ΔZ = (d(kA)/dA) ΔA = kΔA.

The fractional error becomes

ΔZ/Z = kΔA/(kA) = ΔA/A.

For the inverse proportionality

Z(A) = k/A,

formula (3.16) gives

ΔZ = (d(kA⁻¹)/dA) ΔA = −(k/A²) ΔA.

The fractional error becomes

ΔZ/Z = (kΔA/A²)/(k/A) = ΔA/A,

where the minus sign has been dropped since the error in an answer is given as
±ΔZ. We conclude that the required formula for both proportionalities and inverse
proportionalities is:

ERROR FORMULA FOR PROPORTIONALITIES AND INVERSE PROPORTIONALITIES -
ΔZ/Z = ΔA/A    (3.18)
Powers

Now suppose Z is given by

Z(A) = kAⁿ.    (3.19)

Differentiating gives

dZ/dA = knAⁿ⁻¹.

Substituting this derivative and Eq. (3.19) into Eq. (3.17) we get

ΔZ/Z = (knAⁿ⁻¹/(kAⁿ)) ΔA = n ΔA/A.

The required formula is

ERROR FORMULA FOR POWERS -  ΔZ/Z = n ΔA/A    (3.20)

This formula applies even when n is negative. Though a negative error results, only
the positive magnitude is taken since the final result is written Z ± ΔZ.
Exponentials

Consider the following formula containing an exponential:

Z = k e^(nA).    (3.22)

Differentiating and substituting into Eq. (3.17) gives

ΔZ/Z = (nk e^(nA)/(k e^(nA))) ΔA = n ΔA.

Hence, the required formula is

ERROR FORMULA FOR EXPONENTIALS -  ΔZ/Z = n ΔA    (3.23)
Natural Logarithms
Consider the following formula containing natural logarithms:
Z = ln A.    (3.24)

Differentiating gives

dZ/dA = 1/A.    (3.25)

Substituting Eq. (3.25) into Eq. (3.16) we get the required formula:

ERROR FORMULA FOR NATURAL LOGARITHMS -  ΔZ = ΔA/A    (3.26)
Trigonometric Formulae

Consider the trigonometric formulae

Z_s(θ) = sin θ,
Z_c(θ) = cos θ,
Z_t(θ) = tan θ.

Differentiating we get

dZ_s(θ)/dθ = cos θ,
dZ_c(θ)/dθ = − sin θ,
dZ_t(θ)/dθ = sec² θ.

For trigonometric formulae the error and relative error formulae take similar forms,
so we will give both forms. Substituting the derivatives into Eqs. (3.16) and (3.17),
and noting that Δθ must be expressed in radians, we get

ERROR FORMULA FOR SINES -  ΔZ = cos θ Δθ,  ΔZ/Z = cot θ Δθ    (3.27)

ERROR FORMULA FOR COSINES -  ΔZ = − sin θ Δθ,  ΔZ/Z = − tan θ Δθ    (3.28)

ERROR FORMULA FOR TANGENTS -  ΔZ = sec² θ Δθ,  ΔZ/Z = (sec² θ/tan θ) Δθ    (3.29)
For the derivation of specific multi-quantity error formulae we use the general
formula

ΔZ = (∂Z/∂A) ΔA + (∂Z/∂B) ΔB,    (3.30)

while for specific relative error formulae we use the general formula

ΔZ/Z = (1/Z(A, B)) (∂Z/∂A) ΔA + (1/Z(A, B)) (∂Z/∂B) ΔB.    (3.31)
It is important to note that in the derivation of the error formulae the errors in
the measured quantities should be interpreted as differences from the true values1
A₀ and B₀, i.e., ΔA = A − A₀ and ΔB = B − B₀, and not as standard errors in
the mean or as systematic errors. Once the formulae are derived, they can be
freely interpreted as standard errors in the mean, systematic errors or a combination
of each.
Sums and Differences
Consider the sum of two measured quantities:

Z = kA + lB.    (3.32)

Then ∂Z/∂A = k and ∂Z/∂B = l, so that Eq. (3.30) gives

ΔZ = kΔA + lΔB.    (3.33)

But this formula gives too high an error estimate, since, as we mentioned above,
some errors push the value down, whilst others push the value up, so that they have
a tendency to cancel each other, reducing the overall error. A better error estimate is
obtained by first squaring Eq. (3.33) to get

(ΔZ)² = k²(ΔA)² + l²(ΔB)² + 2kl ΔA ΔB.    (3.34)

Next, interpret each term on the right hand side of Eq. (3.34) as the average error
over many measured values. In this case, the first two terms are averages of positive
quantities and are therefore nonzero. On the other hand, the third term is an average
of an equal number of positive and negative terms and is therefore zero, i.e.,
2kl ΔA ΔB = 0. With these observations, Eq. (3.34) becomes

(ΔZ)² = k²(ΔA)² + l²(ΔB)².
Taking the square root of both sides gives the formula for a better error estimate:
1 We recall that it is impossible to know the true value of a measured quantity. We can think of the
true value as given by the mean of an infinite number of measurements. This assumption suggests
that by taking enough readings we can approach the true value as closely as we please. Aside from
the fact that taking an infinite number of measurements is impossible, the assumption that the mean
of the infinite measurements gives the true value remains unproven. However, it is a good working
hypothesis that the mean of many ideal measurements (measurements free of systematic error) is
close to the true value.
ΔZ = [k²(ΔA)² + l²(ΔB)²]^(1/2).    (3.35)
If instead Z is the difference of two measured quantities,

Z = kA − lB,

then

∂Z/∂A = k,  ∂Z/∂B = −l,

so that substituting into Eq. (3.30) and squaring gives

(ΔZ)² = k²(ΔA)² + l²(ΔB)² − 2kl ΔA ΔB.

The third term is now negative, but since, for the same reason as for the sum, it is
zero, taking the square root gives the same error formula (3.35) as for the sum. In
general, when n quantities A₁, A₂, . . . , Aₙ, with constant coefficients
k₁, k₂, . . . , kₙ, are added or subtracted to find a quantity Z, the error ΔZ in Z
is given by

ERROR FORMULA FOR SUMS AND DIFFERENCES -
ΔZ = [k₁²(ΔA₁)² + k₂²(ΔA₂)² + · · · + kₙ²(ΔAₙ)²]^(1/2)    (3.36)
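This quadrature sum is a one-line computation. A Python sketch (the function name is mine):

import math

def sum_difference_error(terms):
    # terms: (coefficient, error) pairs for Z = k1*A1 +/- k2*A2 +/- ...
    return math.sqrt(sum((k * dA) ** 2 for k, dA in terms))

print(sum_difference_error([(1, 10), (1, 3)]))  # 10.44...; errors of 10 and 3
                                                # combine to barely more than 10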
Products and Quotients

Next, consider a multi-quantity product formula,

Z(A, B) = kAB.

Differentiating gives

∂Z/∂A = kB,  ∂Z/∂B = kA.
For products (and quotients), the relative error is preferred since it leads to a neater
formula. Hence, we substitute the derivatives together with Z(A, B) = kAB into
Eq. (3.31) to get

ΔZ/Z = (1/(kAB)) kB ΔA + (1/(kAB)) kA ΔB = ΔA/A + ΔB/B.

Once again, and for the same reason as for sums and differences, this formula gives
an overly large error estimate. Following similar steps and argument as for sums and
differences we get

(ΔZ/Z)² = (ΔA/A)² + (ΔB/B)² + 2(ΔA/A)(ΔB/B),

with 2(ΔA/A)(ΔB/B) = 0. Taking the square root of both sides, we get a better relative
error estimate formula:

ΔZ/Z = [(ΔA/A)² + (ΔB/B)²]^(1/2).    (3.37)
Next, consider a multi-quantity quotient formula, and again aim for a formula for
the relative error:

Z(A, B) = kA/B.

Differentiating gives

∂Z/∂A = k/B,  ∂Z/∂B = −kA/B².

Substituting into Eq. (3.31) gives

ΔZ/Z = (B/(kA)) (k/B) ΔA − (B/(kA)) (kA/B²) ΔB = ΔA/A − ΔB/B.

Squaring gives

(ΔZ/Z)² = (ΔA/A)² + (ΔB/B)² − 2(ΔA/A)(ΔB/B).

With the equation interpreted as an average over many measurements, the third term
is again zero. Taking the square root of both sides, we get the desired better estimate
formula:

ΔZ/Z = [(ΔA/A)² + (ΔB/B)²]^(1/2).

It is identical to the relative error formula (3.37) for products. We may therefore write
a general formula for n products or quotients of n measured quantities labelled
A₁, A₂, . . . , Aₙ as

ERROR FORMULA FOR PRODUCTS, QUOTIENTS, OR A MIXTURE OF THE TWO -
ΔZ/Z = [(ΔA₁/A₁)² + (ΔA₂/A₂)² + · · · + (ΔAₙ/Aₙ)²]^(1/2)    (3.38)
Powers
Consider the formula for a product of quantities raised to powers,

Z = kA^p B^q.

We follow similar steps as above and note that a relative error formula is preferred:

∂Z/∂A = kpA^(p−1) B^q,  ∂Z/∂B = kqA^p B^(q−1),

ΔZ/Z = (1/(kA^p B^q)) kpA^(p−1) B^q ΔA + (1/(kA^p B^q)) kqA^p B^(q−1) ΔB
     = p ΔA/A + q ΔB/B.

The resulting relative error formula is similar to that for products and quotients. By
similar reasoning and mathematical steps, a better relative error formula is found
to be

ΔZ/Z = [p²(ΔA/A)² + q²(ΔB/B)²]^(1/2).

In general, for n measured quantities raised to powers m₁, m₂, . . . , mₙ:

ERROR FORMULA FOR POWERS -
ΔZ/Z = [m₁²(ΔA₁/A₁)² + m₂²(ΔA₂/A₂)² + · · · + mₙ²(ΔAₙ/Aₙ)²]^(1/2)    (3.39)
As an example, suppose the viscosity η of a liquid is given by a formula of the form

η = ρπr⁴/(8lQ).

We want to find the error Δη in η due to the errors Δρ, Δr, Δl and ΔQ in the
measured quantities ρ, r, l and Q. We can use the formula for products and quotients
by first setting r⁴ = B, so that

η = ρπB/(8lQ).

Notice that the constants 8 and π make no contribution to the error. Then find ΔB/B
in terms of Δr using the error formula for powers:

B = r⁴,  ΔB/B = 4 Δr/r.

Hence,

Δη/η = [(Δρ/ρ)² + (Δl/l)² + (ΔQ/Q)² + (4Δr/r)²]^(1/2).
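Formulae (3.38) and (3.39) are also easily coded. The sketch below (the function name and the numerical values are my own, purely for illustration) combines relative errors in quadrature, each weighted by its power:

import math

def relative_error(quantities):
    # quantities: (value, error, power) triples, combined as in Eq. (3.39).
    return math.sqrt(sum((m * dA / A) ** 2 for A, dA, m in quantities))

# Viscosity example: eta = rho*pi*r**4/(8*l*Q); r enters with power 4.
rel = relative_error([(1.05, 0.01, 1),    # rho  (illustrative values)
                      (0.60, 0.005, 4),   # r
                      (120.0, 0.5, -1),   # l
                      (2.50, 0.02, -1)])  # Q
print(rel)   # relative error in eta; dominated by the 4*dr/r term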
Sometimes you may find that when you measure a number of quantities the error in
one of them is much smaller than the error in the other quantities. When squares are
taken the difference is increased still further.
Consider the following example:
Z = A + B,

with

A = 100 ± 10 and B = 100 ± 3.

Then

ΔA = 10 and ΔB = 3,

so that

ΔZ = [(ΔA)² + (ΔB)²]^(1/2)
ΔZ = [(10)² + (3)²]^(1/2)
ΔZ = (100 + 9)^(1/2) = 109^(1/2)
ΔZ = 10.4.

Neglecting the error ΔB altogether gives instead

ΔZ = [(ΔA)²]^(1/2)
ΔZ = ΔA
ΔZ = 10.
We see that the error ΔB = 3 contributes only 0.4 to the error ΔZ = 10.4 in Z,
i.e., makes only a 3.8% contribution. It would be a reasonable approximation
therefore to neglect the contribution of ΔB in calculating the error in Z.
We conclude that when a number of quantities are measured, we may neglect the
contribution to the total error of those measured quantities which have a small error
(with care!).
3.6 Proportionalities
A great deal of experimental work is concerned with how one quantity changes as a
result of another. There are several types of such dependence:
1. When

   x = kt,

   with k constant, we say that x is directly proportional to t and write

   x ∝ t.

   Similarly, when

   s = kt²,

   we say that s is proportional to t² and write

   s ∝ t².

2. When

   ρ = k(1/v),

   with k constant, we say that ρ is inversely proportional to v.

3. When

   ρ = kμ/v,

   k constant, we say that ρ is proportional to μ and inversely proportional to v.
3.7 Graphs
Graphs are used because they make it easy to see how one quantity is related to
another. For example, a graph can show whether or not two quantities are propor-
tional, or if a point does not fit a general pattern (which indicates a bad measurement).
The most important graph is that of a straight line, with equation

y = mx + c,

where m and c are constants: m is the slope of the straight line and c is where the line
cuts the y-axis (Fig. 3.5).
NOTE 1. If c = 0 then

y = mx.

The line cuts the x-axis when y = 0, i.e., at x = 0. The line therefore passes through
the origin.

NOTE 2. More generally, the line cuts the x-axis where y = 0, i.e., where

0 = mx + c,

giving

x = −c/m.
Example 3.7.1 (To test whether or not the graph to be plotted is a straight line graph)
Suppose we want to plot ρ against μ (ρ v. μ), where
ρ = kμ, k = const.
Comparing with
y = mx + c
we see that m = k and c = 0 and we should expect a straight line of slope m passing
through the origin.
Now, suppose you have plotted 12 points on an xy-graph and the points are equally
spaced along the x-axis, as shown in Fig. 3.6. What is the best line through the points?
One method you can use to draw the best straight line, which also provides a method
for finding the error in the slope, is the POINTS-IN-PAIRS method. Another common
and important method, but one which involves much more work, is the METHOD OF
LEAST SQUARES. We describe both below.
Divide the 12 points into pairs, (1, 7), (2, 8), . . . , (6, 12), and calculate the slope of
each pair,

m1 = (y7 − y1)/Dx1,  m2 = (y8 − y2)/Dx2,  . . . ,  m6 = (y12 − y6)/Dx6,

where Dxi is the x-separation of the ith pair.

EQUAL x-SPACING: Each pair has the same separation Dxi = Dx, so the mean slope is

MEAN SLOPE m = D̄y/Dx,

where D̄y is the mean of the y-differences y7 − y1, y8 − y2, . . . , y12 − y6.

UNEQUAL x-SPACING: The mean slope is

m = (m1 + m2 + · · · + m6)/6.

Note: as far as possible, the variation in the distance between the points on the x-axis
should not be too great (since the slopes obtained from the more widely spaced points
would give better estimates than those based on closer points).

To estimate the error in the slope, use the mean slope m already calculated for the
appropriate case: m = D̄y/Dx for equal x-axis spacing, or m = (m1 + · · · + m6)/6 for
unequal x-axis spacing.
FROM NOW ON THE PROCEDURE IS THE SAME FOR EITHER CASE
CALCULATE RESIDUALS:
d1 = m1 − m
d2 = m2 − m
⋮
d6 = m6 − m

σ = [(d1² + d2² + d3² + · · · + d6²)/6]^{1/2}
Then,

STANDARD ERROR IN THE SLOPE: sm = σ/(n − 1)^{1/2}
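The points-in-pairs recipe is easy to automate. Here is a minimal Python sketch (with invented data) that pairs the points, averages the slopes, and applies the residuals procedure above; it also computes the intercept from the means, as discussed below:

import math

def points_in_pairs(x, y):
    """Points-in-pairs slope estimate for an even number of points.
    Pairs point i with point i + n/2, computes the slope of each pair,
    and returns the mean slope, its standard error, and the intercept."""
    n = len(x)
    half = n // 2
    slopes = [(y[i + half] - y[i]) / (x[i + half] - x[i]) for i in range(half)]
    m = sum(slopes) / half
    sigma = math.sqrt(sum((mi - m) ** 2 for mi in slopes) / half)
    s_m = sigma / math.sqrt(half - 1)
    c = sum(y) / n - m * sum(x) / n   # intercept from the means
    return m, s_m, c

# Illustrative data: 12 points scattered about y = 2x + 1
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [3.1, 4.9, 7.2, 9.0, 11.1, 12.8, 15.2, 17.1, 18.8, 21.2, 23.0, 24.9]
m, s_m, c = points_in_pairs(x, y)
print(f"m = {m:.3f} +/- {s_m:.3f}, c = {c:.3f}")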
We need to add some additional comments for lines passing through the origin and
those which do not.
GRAPH y = mx, (c = 0)
From the equation of a straight line we know that the line must pass through the origin.
Nevertheless, the ‘points-in-pairs’ method must be used as usual, even though the
‘best-line’ may not pass through the origin. One reason is that the amount by which
the line misses the origin may contain information about a possible systematic error
in the apparatus.
GRAPH y = mx + c
Finding the point c where the ‘best-line’ cuts the y-axis is straightforward when you
can plot the graph to include the origin. You can simply read-off c from the graph
(Fig. 3.7).
Sometimes, however, plotting a graph which includes the origin requires a choice
of scale which results in the plotted points being too close together (see Fig. 3.8). This
makes plotting an accurate line difficult and it is much better to choose a different
scale which spreads the points along the x-axis (see Fig. 3.9). The problem now is
that c cannot be simply read off the graph. In this case, c can be easily calculated by
using the slope m of the ‘best-line’ and the means x̄ and ȳ:

c = ȳ − m x̄.
Consider the relation

y = kA^x,    (3.40)

where k and A are constants. Plotting y against x will not give a straight line. Since
dealing with straight lines is easier, it is desirable to express Eq. (3.40) in such a way
that a straight-line graph can be given. This can be done using logarithms.
Recall the following rules for logarithms:

log(ab) = log a + log b,  log(a^x) = x log a.
log y = log(kA^x)
log y = log A^x + log k
log y = x log A + log k
log y = (log A)x + log k
Comparing this with the equation of a straight line, y = mx + c, we see that plotting
log y against x results in a straight line with slope m = log A which cuts the y-axis
at c = log k.
Though we can always look up the logs of our y-readings and plot the graph on
ordinary graph paper, it is much more convenient to use special logarithmic graph
paper, where the scale along the y-axis is logarithmic.
Sometimes, to get a straight line you must plot log y against log x. Again there is
logarithmic paper where both the x and y-axes are logarithmic.
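As a sketch of this linearisation (our own illustration, not a prescription from the text), the following Python lines fit log10 y against x for data generated from y = kA^x and recover A and k:

import math

# If y = k * A**x, then log10(y) = (log10 A) * x + log10 k is a straight
# line in x.  Illustrative data generated from k = 2, A = 3:
xs = [0, 1, 2, 3, 4]
ys = [2 * 3 ** x for x in xs]
logy = [math.log10(v) for v in ys]

# Least-squares slope and intercept of log10(y) against x
n = len(xs)
xbar = sum(xs) / n
ybar = sum(logy) / n
m = sum((x - xbar) * ly for x, ly in zip(xs, logy)) / sum((x - xbar) ** 2 for x in xs)
c = ybar - m * xbar
print(f"A = {10 ** m:.3f}, k = {10 ** c:.3f}")  # recovers A = 3, k = 2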
3.8 Percentage Errors

The errors we have been calculating can be easily converted to percentage errors. If
ΔZ is the error in a quantity Z, then the percentage error is given by

PERCENTAGE ERROR: e_p = (ΔZ/Z) × 100.

In other words, the percentage error is just the relative error multiplied by 100.
3.9 Problems
1. The diameter of a wire is measured repeatedly in different places along its length.
The measurements in millimetres are
Chapter 4
The Method of Least Squares

The method of least squares is a standard statistical method for drawing the best
straight line through a set of points.
Consider n pairs of measurements (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ). We assume the
errors are entirely in the y value. The analysis when there are errors in both the x and
y values is much more complicated, yet with little gain in accuracy. Hence, we shall
confine ourselves to the former case, which covers practically all cases.
We will also assume that each pair has equal weight. We see from Fig. 4.1 that
the di given by
di = yi − Yi
are the deviations of the measured values yi from the value Yi given by the best line
(as yet unknown) through the data.
Now, the Yi are given by
Yi = mxi + c
where m is the slope of the unknown best line and c is where the unknown best line
cuts the y-axis.
The best values of m and c are taken to be those for which the sum of the deviations
squared,
S = Σ_{i=1}^{n} di² = Σ_{i=1}^{n} (yi − mxi − c)²,    (4.1)
is a minimum - hence the name “method of least squares”. This method was suggested
by Gauss1 in 1801 and Legendre2 in 1806.
1 Carl Friedrich Gauss (1777–1855), originally named Johann Friedrich Carl Gauss, was a German
mathematician regarded as one of the great mathematicians of his time, making contributions to
number theory, geometry, probability theory, cartography, terrestrial magnetism, orbital astronomy,
theory of functions and potential theory (a branch of mathematics arising from electromagnetism
and gravitation). In relation to his work on cartography, he invented the heliotrope (an instrument
that focuses sunlight into a beam that can be seen from several miles away) and in his work on
terrestrial magnetism he invented the magnetometer to measure the Earth’s magnetic field. With
his Göttingen colleague, Wilhelm Weber, he made the first electric telegraph. Over many years, he
gave four derivations of the fundamental theorem of algebra. His two most important contributions
were published in 1801. The first publication was Disquisitiones Arithmeticae (Algebraic Number
Theory) and the second concerned the rediscovery of the asteroid Ceres. The importance of the
work on the Ceres asteroid has to do with the development of an ingenious method for dealing
with errors in observations, the method of least squares. Some years later, Legendre produced a
comprehensive exposition of this method (see footnote 2), and both Gauss and Legendre came to
share credit for the method.
2 Adrien-Marie Legendre (1752–1833) was born in Paris, France. He was a French mathematician
who made contributions in many areas of mathematics including number theory, geometry, mechan-
ics, orbital astronomy, and elliptic integrals. Legendre was appointed, with Cassini and Mechain,
to a special committee to develop the metric system and, in particular, to make measurements to
As usual, the minimum is found by setting the derivative equal to zero. Differen-
tiating Eq. (4.1) with respect to m and equating the result to zero gives
∂S/∂m = −2 Σ_{i=1}^{n} xi(yi − mxi − c) = 0

= −2 [ Σ_{i=1}^{n} xi yi − m Σ_{i=1}^{n} xi² − c Σ_{i=1}^{n} xi ] = 0.    (4.2)
Also, differentiating Eq. (4.1) with respect to c and equating the result to zero gives
∂S/∂c = −2 Σ_{i=1}^{n} (yi − mxi − c) = 0

= −2 [ Σ_{i=1}^{n} yi − m Σ_{i=1}^{n} xi − nc ] = 0.
Dividing by n we get

Σ_{i=1}^{n} yi/n = m Σ_{i=1}^{n} xi/n + c.    (4.3)

With

ȳ = Σ_{i=1}^{n} yi/n,  x̄ = Σ_{i=1}^{n} xi/n

we get

ȳ = m x̄ + c.    (4.4)
determine the standard metre. He also worked on projects to produce logarithmic and trigonomet-
ric tables. His work on elliptic integrals, regarded as his most important contribution, was published
in his treatise Traité des fonctions elliptiques (Treatise on Elliptic Functions), 1825–37. The first
comprehensive treatment of the method of least squares is contained in Legendre’s 1806 book Nou-
velles méthodes pour la détermination des orbites des comètes (New Methods for the Determination
of Comet Orbits) and shares credit for its discovery with his German rival Carl Friedrich Gauss.
The best straight line passes through the point (x̄, ȳ), called the CENTROID. Solving
Eqs. (4.2) and (4.3) simultaneously, we obtain (see Sect. 4.1 for the proof)

SLOPE OF THE BEST STRAIGHT LINE:
m = Σ_{i=1}^{n} (xi − x̄)yi / Σ_{i=1}^{n} (xi − x̄)²    (4.5)
Divide by n:

c Σ_{i=1}^{n} xi/n = Σ_{i=1}^{n} xi yi/n − m Σ_{i=1}^{n} xi²/n.

Substitute Σ_{i=1}^{n} xi/n = x̄ and rearrange:

c = (1/x̄) [ Σ_{i=1}^{n} xi yi/n − m Σ_{i=1}^{n} xi²/n ].    (4.7)
Σ_{i=1}^{n} xi² − x̄ Σ_{i=1}^{n} xi = Σ_{i=1}^{n} xi² − 2nx̄² + nx̄²
= Σ_{i=1}^{n} xi² − 2x̄ Σ_{i=1}^{n} xi + nx̄²
= Σ_{i=1}^{n} (xi² − 2x̄xi + x̄²)
= Σ_{i=1}^{n} (xi − x̄)².
Let Δm be the standard error in the slope, and Δc be the standard error in c. We state
without proof that estimates of the standard errors are given by

STANDARD ERROR IN THE SLOPE: Δm = [ Σ_{i=1}^{n} di² / (D(n − 2)) ]^{1/2}    (4.10)

STANDARD ERROR IN c: Δc = [ (1/n + x̄²/D) Σ_{i=1}^{n} di²/(n − 2) ]^{1/2}    (4.11)
where

D = Σ_{i=1}^{n} (xi − x̄)²  (the sum of the squares of the residuals of the x-values)    (4.12)

and

di = yi − Yi = yi − mxi − c  (the difference between yi and Yi = mxi + c, where m and c are the best values).    (4.13)
Proofs of Eqs. (4.10) and (4.11) can be found in Appendix C of G.L. Squires, Practical
Physics, fourth edition, (Cambridge University Press, Cambridge, 2001).
If we require the best straight line to pass through the origin, i.e., c = 0, the best
value of m is given by setting c = 0 in Eq. (4.2),

m = Σ_{i=1}^{n} xi yi / Σ_{i=1}^{n} xi²    (4.14)

with standard error

Δm = [ (1/(n − 1)) Σ_{i=1}^{n} di² / Σ_{i=1}^{n} xi² ]^{1/2}.    (4.15)
Even though we know that the line passes through the origin, it can be useful to use
Eq. (4.5) to determine the slope m of the best straight line, and Eq. (4.6) to determine
the constant c where it cuts the y−axis. In this case the distance by which the best
line misses the origin may give a visual indication of the error.
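The formulas of this chapter translate directly into code. The following Python sketch (our own illustration, with made-up data) implements Eqs. (4.5)-(4.13), and optionally Eqs. (4.14) and (4.15) for a line forced through the origin:

import math

def least_squares(x, y, through_origin=False):
    """Best straight line y = m*x + c by least squares, with the
    standard-error estimates quoted in the text."""
    n = len(x)
    if through_origin:
        # Eqs. (4.14) and (4.15): line forced through the origin
        sxx = sum(xi * xi for xi in x)
        m = sum(xi * yi for xi, yi in zip(x, y)) / sxx
        d2 = sum((yi - m * xi) ** 2 for xi, yi in zip(x, y))
        dm = math.sqrt(d2 / ((n - 1) * sxx))
        return m, dm, 0.0, 0.0
    xbar = sum(x) / n
    ybar = sum(y) / n
    D = sum((xi - xbar) ** 2 for xi in x)                     # Eq. (4.12)
    m = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / D     # Eq. (4.5)
    c = ybar - m * xbar                                       # line through the centroid
    d2 = sum((yi - m * xi - c) ** 2 for xi, yi in zip(x, y))  # Eq. (4.13)
    dm = math.sqrt(d2 / (D * (n - 2)))                        # Eq. (4.10)
    dc = math.sqrt((1 / n + xbar ** 2 / D) * d2 / (n - 2))    # Eq. (4.11)
    return m, dm, c, dc

# Illustrative data
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
m, dm, c, dc = least_squares(x, y)
print(f"m = {m:.3f} +/- {dm:.3f}, c = {c:.3f} +/- {dc:.3f}")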
4.4 Summary
1. MEAN:

x̄ = (x1 + x2 + · · · + xn)/n,  n = number of data points

2. RESIDUALS:

d1 = x1 − x̄
d2 = x2 − x̄
⋮
dn = xn − x̄

3. STANDARD DEVIATION:

σ = [(d1² + d2² + d3² + · · · + dn²)/n]^{1/2}

4. STANDARD ERROR IN THE MEAN:

sm = σ/(n − 1)^{1/2}
6. ERROR IN POWERS:

Z = kA^n
ΔZ/Z = n·ΔA/A

7. ERROR IN LOGARITHMS:

Z = ln A
ΔZ = ΔA/A

8. ERROR IN EXPONENTIALS:

Z(A) = ke^{nA}
ΔZ/Z = nΔA

ERROR IN SUMS AND DIFFERENCES:

Z = A1 + A2 + · · · + An
ΔZ = [(ΔA1)² + (ΔA2)² + · · · + (ΔAn)²]^{1/2}

ERROR IN PRODUCTS AND QUOTIENTS:

ΔZ/Z = [(ΔA1/A1)² + (ΔA2/A2)² + · · · + (ΔAn/An)²
+ (ΔB1/B1)² + (ΔB2/B2)² + · · · + (ΔBm/Bm)²]^{1/2}
STANDARD ERROR IN THE SLOPE: Δm = [ Σ_{i=1}^{n} di² / (D(n − 2)) ]^{1/2}

STANDARD ERROR IN c: Δc = [ (1/n + x̄²/D) Σ_{i=1}^{n} di²/(n − 2) ]^{1/2}
where

D = Σ_{i=1}^{n} (xi − x̄)²  (the sum of the squares of the residuals of the x-values)

PERCENTAGE ERROR: e_p = (ΔZ/Z) × 100
Chapter 5
Theoretical Background - Probability
and Statistics
So far we have presented the essential elements of statistics needed for data and
error analysis. For a deeper understanding of the concepts and formulae presented
in the first four chapters it is necessary to consider essential elements of probability
theory as well as a more complete treatment of statistics. This then is the purpose
of this chapter. To achieve a coherent self-contained treatment there may be a little
repetition of earlier material. This chapter can be viewed as a stand-alone chapter and
can be read independently of the other chapters. To begin writing up experiments,
only chapters one to four are needed. Chapter 6 introduces computer methods and
can also be read independently after chapters one to four have been mastered. Indeed,
it is highly recommended that students learn to do calculations with a calculator and
draw graphs by hand before moving on to computers.
Below we give a brief history of the emergence of probability, but we will first
start with interpretations of probability. The reason for this ordering is that there are
two classes of interpretation of probability which permeate the development of prob-
ability and the originators of probability swayed from one interpretation to another,
even where they stated a preference for only one of the interpretations. The two
classes of interpretations in question are the objective interpretation and the subjec-
tive interpretation. Knowing these two interpretations will help in understanding the
essential elements of probability and statistics, which it is the purpose of this chapter
to present.
5.1 Introduction
Probability and statistics arise in almost every human endeavour: the sciences,
medicine, sociology, politics, insurance, games, gambling and so on. Many con-
clusions in medicine on what is good or bad for us result from statistical studies
where a large sample of people is studied under specific conditions (a particular diet
But, there is a crucial, inescapable feature that emerges from this philosophical dis-
course, namely, the dichotomy of two broad categories of interpretation. These are the
objectivist or objective interpretation of probability and the subjectivist or subjective
interpretation of probability.
Some workers favour one interpretation over the other, while others, the pluralists,
take the view that the objective interpretation is more suited to some applications,
while the subjective interpretation is more suited to others. I fall in the category of
the pluralists for two reasons. First, because the areas to which probability is applied
are diverse: compare the simple case of the probability of obtaining heads in the toss
of a coin with the probability of deciding which horse will win a race. The first
involves a few well defined factors, while the second not only involves many factors,
but the factors themselves are not well defined. Deciding the winning horse depends
on personal knowledge, belief and even prejudice. Second, even founding fathers of
probability swayed from one interpretation to the other. Thus, adopting the simplest
versions of both interpretations provides us with an intuitive working interpretation
of probability which will be a great help in understanding the essential elements of
probability and statistics.
Within the two categories there are numerous subtle and not so subtle variations.
We will adopt and give the simplest definitions for each:
An objective interpretation refers to properties or behaviour that belong to a
physical entity and are totally mind-independent (they do not depend on the knowledge,
belief, prejudice or ignorance of any person). Objective probability is defined as the
relative frequency of an event occurring (desired outcome) when very many trials
are performed. Relative frequency is defined as the number of times the event of
interest occurs divided by the number of trials. As an example, consider tossing
a coin (we call each toss of the coin a trial) and ask for the probability of heads
up. The probability of obtaining heads is found by tossing a coin, say, 1000 times.
Suppose we get 506 heads. Then the probability of getting heads = relative frequency
= 1000
506
= 0.506. We know from the many trials that have been performed that the
probability of getting heads-up approaches 0.5, so our example gives a result close
to this value. But, suppose we repeat the 1000 tosses and obtain 356 heads, giving a
probability of 0.356, far away from 0.5. Repeating the 1000 throws numerous times
shows that the outcome of 0.356 is far less likely than that of getting a result close
to 0.5. As we increase the number of trials, the chance of getting a result far from
0.5 becomes even more unlikely. Therefore, we can approach the true probability of
0.5 as closely as we wish by taking a sufficient number of trials.1 Thus, the simplest
objective interpretation of probability is in terms of relative frequency. The objective
interpretation is an a posteriori2 interpretation of probability, since the probability is
assigned after the trials are completed.
1 A criticism of this definition is that however large the number of trials, a very skewed series of
outcomes is still possible. Though, for a sufficiently large number of trials, the possibility of a skewed
result becomes vanishingly small (such that, in the lifetime of the universe it never occurs), it is
still possible. For all practical purposes this does not present a problem, but for a theoretical purist
it remains an issue.
2 Latin, meaning ‘from what comes after’.
further information, PIR is applied and a probability of 1/3 that a particular ship has
sunk is assigned. But, suppose somebody knows that one ship is badly maintained
or the captain is incompetent. With this extra knowledge, a rational person assigns
a higher probability that that particular ship has sunk. There is no straightforward
way, however, of assigning a numerical value to the increased probability.
The Principle of Insufficient Reason states that if there is no reason to favour one
or more of a total set of mutually exclusive outcomes, each outcome is assigned
the same probability.
Probability as we know it today was introduced around 1655–1660 and from the
beginning was dual in having to do with degrees of knowledge (belief, even prejudice)
on the one hand, and tendency (frequency) on the other. In other words, the dichotomy
of the objective interpretation versus the subjective interpretation was present at the
earliest origins of probability. Before 1655–1660 there was little formal development
of probability, but some rough notions of probability and statistics existed from very
early times. Sentences may be found in Aristotle’s writing which translate as ‘the
probable is usually what happens’. Sextus Empiricus (A.D. 200) commented on
‘signs’ (e.g., natural signs like ‘smoke’ indicate fire, patients show ‘signs’ indicating
disease). Signs, which may be natural or otherwise, were associated with probability
and their use dates back to Aristotle. Gambling is ancient, possibly primeval. Dicing,
for example, is one of the oldest human pastimes. A predecessor of dice is the
astragalus or talus. The talus is the knuckle bone or heel of a running animal such as
a deer, horse, ox, sheep or hartebeest. The talus has the feature that when it is thrown
it can come to rest in only four ways, hence its use for gaming. It is therefore surprising
that an explicit formal development of probability did not take place much earlier.
But, this does not mean that related, even detailed, statistical notions did not exist
before 1655–1660. The idea of ‘long observation’, e.g., many throws of a dice, was
known long before this period. It appears in the first book about games of chance
written by Cardano around 1550 but not published until 1663. Galileo Galilei
(1564–1642) had a good sense of chance. He was also aware of the value of ‘long
observation’, and even had a notion of relative frequency. The idea of experiment
was emerging as an important tool of science, advocated by the philosopher Francis
Bacon (1561–1626), and put into practice by Galileo. Galileo was perhaps the first
to start taking averages of observations and he had a sophisticated appreciation of
dealing with discrepant observations.
While gaming was the main initial driving force in the development of probability
and statistics, economic and scientific interest proved a powerful incentive for fur-
ther development. On the economic side, the need to calculate annuities (an annual
payment from an insurance or investment) played an important role in driving the
early development of probability and statistics and we will cite some of the main
contributions below. On the scientific side, the need to calculate the macroscopic
thermodynamic behaviour of gases in terms of the random movement of molecules
or atoms drove the development of statistical mechanics. Again, we will cite the main
contributions below.
In 1654 Pascal solved two problems of probability (one, to do with dice and one to
do with dividing the stake among gamblers if the game is interrupted) and sent them
to Fermat. The problems had been around for a long time and the chief clue to their
solution, the arithmetic triangle, was known from about a century before. Despite
this, Pascal’s 1654 solutions are generally taken as the beginning of probability,
sparking interest and setting the probability field rolling.
In 1657 Huygens wrote the first published probability textbook. It first came out
in Latin in 1657 under the title De ratiociniis in aleae ludo (Calculating in Games of
Chance). The Dutch version only appeared in 1660 as Van Rekiningh in Spelen van
Geluck. Huygens’ book is entirely about games of chance, with hardly any epistemic
references. It is notable, if not ironic, that the word ‘probability’ does not appear
in his book. About the same time Pascal made the first application of probability to
problems other than games of chance, thereby inventing decision theory. Also at this
time, the German law student Leibnitz, whilst still a teenager, applied probability
ideas to legal problems. In 1665 Leibnitz proposed to measure degrees of proof
and of right in law on a scale of 0 to 1, subject to a crude calculation of what he
5.1 Introduction 77
called probability. He also embarked on writing the first monograph on the theory of
combinations.
Among the first contributions to the development of probability and statistics,
motivated by the calculation of annuities were those in the late 1660s when John
Hudde and John de Witt began determining annuities, based on the analysis of sta-
tistical data. A little earlier, in 1662, John Graunt published the first extensive set of
statistical inferences drawn from mortality records.
The 1662 book Port Royal Logic was the first book to mention the numerical
measurement of something actually called probability. There are conflicting reports
on the authorship, but it is thought that Antoine Arnauld (1612–1694) was the main
contributor, in particular writing Book IV which contains the chapters on probability.
Arnauld was considered a brilliant theologian and philosopher. He was a member of
the Jansenist4 enclave at Port Royal. Pierre Nicole (1625–1695), who also contributed
to Logic, and Blaise Pascal (1623–1662) were also members of the enclave. Pascal
did not contribute to Logic.
The Principle of Insufficient Reason, under the name of ‘equipossibility’, origi-
nated with Leibnitz in 1678. He used it to define probability as the ratio of favourable
cases to the total number of equally probable cases. Laplace, much later, also defined
probability in this way. It is a definition that was prominent from the early beginnings
of probability and is still in use today despite, as mentioned above, heavy criticism
from numerous eminent critics.
Jacques Bernoulli’s 1713 book Ars conjectandi (The Art of Conjecturing) repre-
sents a major contribution to the development of probability. The chief mathematical
contribution was the proof of the First Limit Theorem. One of the founders of mod-
ern probability, A. N. Kolmogorov, commented that the proof was made with ‘full
analytical rigour’. The proof was given in 1692 but Bernoulli was not satisfied with
it and the book was not published. Bernoulli died in 1705 and the book was passed
on to the printer by his nephew Nicholas. It was published posthumously in 1713 in
Basle.
J. Bernoulli was a subjectivist. Indeed, the version of the subjective interpreta-
tion we gave earlier largely originated with him. Though Bernoulli believed that a
probability assignment to a given event may differ from person to person, depending
on what information they possess, he also believed that rational persons in posses-
sion of the same information will assign the same probability to a given event; at
least, there is no indication that he believed otherwise. In the latter view, he differed
from modern subjectivists, like Bruno de Finetti, who interpret probability as being
a measure of a rational degree of belief (which may depend on taste, prejudice, etc.).
All subjectivists, however, require that probability assignments satisfy the axioms of
probability.
J. Bernoulli recognised the difference between a priori and a posteriori methods of
assigning probabilities. Just as Leibniz and Huygens used an early version of PIR, so
also did Bernoulli, and he based the a priori method on it. But, Bernoulli recognised
that PIR could not be applied completely generally and wanted to supplement it by the
considered important for an objective interpretation, since for ergodic systems time-
averages can be equated to ensemble-averages. The reason this is important is that
time-averages are objective and thus favoured over subjective ensembles, but time-
averages are notoriously difficult to calculate, whereas calculations with ensembles
are much easier. Thus, an objectivist would like to develop a probability interpre-
tation in terms of time-averages, but use ensembles only for calculation, hence the
importance of the equivalence of these averages. This is the case for D. A. Lavis
who has refined and defends an objective interpretation of statistical mechanics. His
basic idea is precisely to identify probabilities with time-averages. When a system
is ergodic time-averages are well defined and probabilities can be easily defined.
When systems are not ergodic the definition is more challenging. Lavis uses ergodic
decomposition and other more recent notions to define probabilities for this case. In
either case, the definitions are mind-independent.
It is appropriate, acknowledging numerous omissions, that we conclude with
Kolmogorov (Andrey Nikolayevich Kolmogorov, 1903–1987), a brilliant Russian
mathematician and a founding father of modern probability. His motivation for the
axiomatic approach is that despite the practical value of both the objective and sub-
jective interpretations of probability, attempts to place these interpretations on a
solid theoretical footing proved to be very difficult. In an early paper, General The-
ory of Measure and Probability Theory, he aimed to develop a rigorous axiomatic
foundation for probability. He expanded this paper into the very influential 1933
monograph Grundbegriffe der Wahrscheinlichkeitsrechnung. It was translated into
English and published in 1950 as Foundations of the Theory of Probability. He also
made profound contributions to stochastic processes, especially Markov processes.
5.2 Basics
In the next section we begin our exposition of the essential elements of probability
and statistics. In the rest of this chapter, unless otherwise stated, the outcomes of
trials will be assumed equally probable (by the Principle of Insufficient Reason).
We begin with the definition of a trial, also called a random experiment. We recall
its definition:

A trial (also called a random experiment) is an action or series
of actions leading to a number of possible outcomes.
The set of all possible outcomes of a trial is called the sample space, and each
individual outcome is a sample point. Often, the outcomes of trials may be described
by more than one sample space.
Example 5.2.2 (More than one sample space for the same trial)
Give two examples of a trial with more than one sample space and define at least two
different sample spaces for each example.
Solution
In throwing a dice, the outcomes {1,2,3,4,5,6} form one possible sample space.
The outcomes {even, odd} form another possible sample space. The outcomes from
picking a card from a deck of cards can be represented by three different sample
spaces: {52 cards}, {red, black} and {hearts, spades, clubs, diamonds}.
The sample space can be finite and hence called a finite sample space, or it can be
infinite. If it is infinite, it may be countable (having as many points as natural numbers)
and called a countable infinite sample space, or it may be noncountable as is the case
for a number n defined in some interval 0 ≤ n ≤ 1 and called a noncountable infinite
sample space. Finite or countable sample spaces are also called discrete sample
spaces, while a noncountable infinite sample space is also called a nondiscrete or
continuous sample space. The sample spaces consisting of the outcomes of throwing
a dice are discrete, while the sample space consisting of heights of students in a class
is continuous. Detection counts of radioactive particles are an example of an infinite
discrete sample space. The counts are discrete, but the full distribution is given by
counting for an infinite length of time.
A group of sample points is called an event, while a single sample point is called a
simple event. An example of an event is getting an odd number {1,3,5} when a dice
is thrown. Simple events are {1}, {2}…{6}. When an event consists of all sample
points, it is certain to occur; if it consists of none of the sample points, it cannot
occur.
An event is a group of sample points.
A single sample point is called a simple event.
We note that both formulae look the same, but we should keep in mind that they differ
in interpretation.
performance of many trials to establish that the probability of getting a head should
be 1/2.
In many of the examples that follow we will use simple trials such as tossing a
coin, throwing a dice, drawing a card from a deck of 52 cards, or choosing coloured
balls from a bag. For such simple cases, it is convenient to use PIR to assign equal
probabilities to the outcomes. After noting this, in the remainder of this chapter, we
will not continue to specify which interpretation we are using. We will assume that
coins, dice etc. are fair. A dice should be symmetrical and homogeneous, a deck of
cards should not contain 6 aces, a coin should not have two heads and so on. We
assume all of this in the examples that follow in the remainder of this chapter.
Though we have only briefly alluded to difficulties with both the objective and sub-
jective interpretations, mainly because the criticisms are technically advanced, the
difficulties are viewed as severe. For this reason, modern probability is based on an
axiomatic approach pioneered by Kolmogorov, and we list the basic axioms below.
The objective and subjective interpretations are still valuable since they provide a
more intuitive understanding of probability and we will keep these interpretations in
mind in what follows. Before stating the axioms, we first introduce some commonly
used notation:
∪ = union,  ∩ = intersection,  ⊂ = subset.
Let S be a sample space, which may be discrete or continuous. For a discrete sample
space all subsets can be taken to be events, but for a continuous sample space only
subsets satisfying certain mathematical conditions (which are too technical for us to
state) can be considered as events. We saw earlier that the outcomes of a particular
trial can be represented by different sample spaces (see Example 5.2.2). We therefore
need to differentiate sets of events belonging to different sample spaces, and we do
this by placing the sets of events into classes. Thus, we will consider event A of a
given trial and sample space as belonging to class C, and allocate to it a number
P(A) between 0 and 1. Then P(A) can be interpreted as the probability of the event
occurring if the following axioms are satisfied:
Axiom 1 states that for every event A in class C, P(A) ≥ 0.

Axiom 2 states that the certain event S (the whole sample space) has probability
P(S) = 1.

Axiom 3 states that for mutually exclusive events A1, A2, . . . in class C, the probability
of occurrence of any one of them is equal to the sum of the probabilities of each of
them occurring:

P(A1 ∪ A2 ∪ · · ·) = P(A1) + P(A2) + · · ·
When these three axioms are satisfied P(A) is called a probability function. There
are a number of theorems which we need to note.
desired outcomes, the number of ways of getting A2 but not A1 is 6, giving a probability
P(A2) − P(A1) of getting A2 but not A1 of 6/52, confirming the result from Theorem
5.2.1.
Theorem 5.2.2 The probability of not getting an outcome from the trial is 0. Let ∅
represent no event (or empty set), then

P(∅) = 0.
Theorem 5.2.3 The probability P(A′) of an event A not occurring is equal to 1 minus
the probability P(A) of A occurring. This follows since, as is perhaps obvious, the
probability of A occurring plus the probability of A not occurring covers all possible
outcomes, i.e., P(A) + P(A′) = 1:

P(A′) = 1 − P(A).
P(A) = P(A1) + P(A2) + P(A3) + P(A4) = 1/52 + 1/52 + 1/52 + 1/52 = 4/52,

which confirms Theorem 5.2.4.
Theorem 5.2.5 For any two events A or B, not necessarily exclusive, the probability
P(A ∪ B) is given by

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

The probability of drawing a heart from ace to 6 is P(A ∩ B) = 6/52. Then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 13/52 + 24/52 − 6/52 = 31/52.

The probability of drawing a heart or a card from ace to 6 by counting desired
outcomes is P(A ∪ B) = 31/52, where hearts ace to 6 are only counted once, confirming
the answer using Theorem 5.2.5.
Theorem 5.2.6 For any two events, mutually exclusive or not, the probability P(A ∩
B) that A and B occur together plus the probability P(A ∩ B′) that A and any element
not in B occur together is equal to the probability that A occurs:

P(A ∩ B) + P(A ∩ B′) = P(A).

P(A) = P(A ∩ A1) + P(A ∩ A2) + P(A ∩ A3) = 5/52 + 4/52 + 4/52 = 13/52

in agreement with Theorem 5.2.7.
The notation P(B|A) denotes the probability that B occurs given that A has already
occurred.
Consider two events A and B which in general will not be mutually exclusive. Further
suppose that A and B belong to a sample space P consisting of events {A, B, C, . . .}
with M total events (noting that each event can appear more than once). Suppose
that event A occurs m times in P, so that P(A) = m/M. We consider two trials. The
first results in one event of the sample space P occurring, and the second also results
in one of the events of P occurring. This defines a sample space consisting of all
combinations of events A, B, C, . . . in pairs. We call this the original sample space and
Now divide the numerator and denominator by N to get an important formula for
conditional probabilities.
We have not proved that n/N = m/M, but state that it is necessarily true and justify
our claim by the example to follow. Hopefully, this example will also help to clarify
our argument leading to the important formula above, which we highlight as:

Formula for Conditional Probability: P(B|A) = P(A ∩ B)/P(A)    (5.1)
sample space consists of all pairs of balls, with the first ball replaced after being
picked:

Original sample space:
RR RR RR RB RB
RR RR RR RB RB
RR RR RR RB RB
BR BR BR BB BB
BR BR BR BB BB
The new reduced sample space consists of all pairs of balls in which a red ball is
picked first:

New reduced sample space:
RR RR RR RB RB
RR RR RR RB RB
RR RR RR RB RB

P(B|A) is found using the new sample space. The number of occurrences of both A
and B (i.e., the pairs RR) in the new sample space is 9. The total number of sample
points in the new sample space is 15, so that the required probability P(B|A) of picking
a red ball given that one has already been picked is

P(B|A) = 9/15.    (5.3)
To illustrate our argument leading to formula (5.1), and to offer some justification
for equating n/N to m/M, we first calculate P(A ∩ B) using the original sample space,
which has 25 elements. In the original sample space A and B both occur (i.e., RR) 9
times, so that P(A ∩ B) = 9/25. Next, dividing the numerator and denominator of Eq.
(5.3) by 25, we get

P(B|A) = (9/25)/(15/25) = P(A ∩ B)/P(A).

We see that 15/25 = n/N = 3/5 = m/M.
The conditional probability P(B|A) is more easily calculated by counting desired
outcomes. When the red ball is replaced there are once again 3 red balls and 2
black balls in the bag, so that the probability of picking a second red ball is simply
P(B) = 3/5 = P(B|A), in agreement with our above result. This result follows because
the events A and B are statistically independent (see the next section). We may
conclude that for sampling with replacement, events are statistically independent.
Solution
From Example 5.2.10 or by counting desired outcomes, P(B|A) = 3/5 and P(A) = 3/5,
so that

P(A ∩ B) = P(B|A)P(A) = (3/5) · (3/5) = 9/25,

in agreement with the value calculated in Example 5.2.10 by counting desired out-
comes in the reduced sample space.
New reduced sample space:
RR RR RB RB
RR RR RB RB
RR RR RB RB

By counting the number of times A and B both occur (RR occurrences) in the reduced
sample space, and noting that the number of sample points in the reduced sample
space is 12, we immediately get

P(B|A) = 6/12 = 1/2.

We can also get the same result more simply by directly counting the balls, since
once a red ball is picked, 2 black balls and 2 red balls remain in the bag, giving

P(B|A) = 2/4 = 1/2,

in agreement with the above.
As may have been noticed from the above examples, P(A) and P(B|A) are easily
obtained since they can be calculated directly from the sample spaces of single trials,
while calculating P(A ∩ B) is more tedious because it requires a consideration of
the sample space of pairs of outcomes from two trials. Therefore, the conditional
probability formula in the form of the multiplication rule Eq. (5.2) allows P(A ∩ B)
to be calculated more easily.
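The reduced-sample-space argument can also be checked by brute-force enumeration. The following Python sketch (our own illustration) builds the 25-point sample space for the bag of 3 red and 2 black balls, sampled with replacement, and recovers P(B|A) from Eq. (5.1):

from itertools import product
from fractions import Fraction

balls = ["R", "R", "R", "B", "B"]   # 3 red, 2 black, sampling WITH replacement

pairs = list(product(balls, repeat=2))        # the 25-point sample space
n_A = sum(1 for a, b in pairs if a == "R")    # first ball red: 15 pairs
n_AB = sum(1 for a, b in pairs if a == "R" and b == "R")   # both red: 9 pairs

P_A = Fraction(n_A, len(pairs))     # 15/25 = 3/5
P_AB = Fraction(n_AB, len(pairs))   # 9/25
print("P(B|A) =", P_AB / P_A)       # 3/5, as found in the text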
Solution
The probability P(A ∩ B) is the same whichever card is picked first. In this case, we
suppose that the queen is picked first.
With replacement: Let A = drawing a queen and B = drawing a king. Since there
are four queens in a deck of 52 cards, the probability of picking a queen first is
P(A) = 4/52. Similarly, after replacing the queen, the probability of drawing a king
given that a queen is drawn first is P(B|A) = 4/52. By the multiplication rule, the
probability P(A ∩ B) of choosing a queen followed by a king is

P(A ∩ B) = P(B|A)P(A) = (4/52) · (4/52) = 1/169 = 0.0059.

Without replacement: This time the king is drawn from the remaining 51 cards, so
that P(B|A) = 4/51 and

P(A ∩ B) = P(B|A)P(A) = (4/51) · (4/52) = 4/663 = 0.0060.
We see that the probability is higher without replacement. This is expected since the
probability of picking a king is slightly higher.
Solution
With replacement: Let the four events A1, A2, A3, A4 each represent drawing an ace.
We need to use the multiplication rule generalised to four events.
Since the aces are replaced, all four probabilities are the same, i.e., P(A1) =
P(A2|A1) = P(A3|A1 ∩ A2) = P(A4|A1 ∩ A2 ∩ A3) = 4/52 = 1/13. Then the probability
P(A1 ∩ A2 ∩ A3 ∩ A4) of drawing four aces is

P(A1 ∩ A2 ∩ A3 ∩ A4) = (1/13) · (1/13) · (1/13) · (1/13) = 1/28,561 ≈ 3.5 × 10⁻⁵.
Thus, a poker player’s dream hand is not going to be realised very often.
Without replacement: This time, the conditional probabilities are different. As
before P(A1) = 4/52, but since the first ace is not replaced only 3 aces and 51 cards
remain, so that P(A2|A1) = 3/51. Similarly P(A3|A1 ∩ A2) = 2/50 and P(A4|A1 ∩ A2 ∩
A3) = 1/49. The probability of drawing four aces without replacement is thus given by

P(A1 ∩ A2 ∩ A3 ∩ A4) = (4/52) · (3/51) · (2/50) · (1/49) = 1/270,725 ≈ 3.7 × 10⁻⁶.
As we might have guessed, the probability is very much smaller without replacement.
Theorem 5.2.9

P(B|A) = P(B),

which means that the probability of B is not affected by the probability of A. In this
case A and B are said to be independent events. For this important special case of
independent events the multiplication rule reduces to:

P(A ∩ B) = P(B)P(A).
A useful formula for more advanced applications is Bayes’ rule, also called Bayes’
theorem: Let A1, A2, …, An be mutually exclusive events covering the whole sample
space, so that one of these events must occur. Let A be another event of the same
sample space. Then, Bayes’ theorem states:

P(Ai|A) = P(Ai)P(A|Ai) / Σ_{i=1}^{n} P(Ai)P(A|Ai).
It is perhaps easier to see the meaning of the theorem after rearranging it:

P(A|Ai) = P(Ai|A) Σ_{i=1}^{n} P(Ai)P(A|Ai) / P(Ai).
This gives us the probability of A occurring given that Ai has occurred. That one of
the events Ai must occur, means that A can only occur if one of the Ai ’s has occurred.
Hence, in a sense, events Ai can be thought of as causing A to occur. For this reason
Bayes’ theorem is sometimes viewed as a theorem on probable causes.
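As a small illustration of Bayes' theorem (the numbers here are invented for the example), suppose items are produced by two machines and we ask which machine a defective item came from:

from fractions import Fraction

# Hypothetical example: A1, A2 = "item made by machine 1 (or 2)",
# mutually exclusive and exhaustive; A = "item is defective".
P_A1, P_A2 = Fraction(3, 5), Fraction(2, 5)   # assumed output shares
P_A_given_A1 = Fraction(1, 100)               # assumed defect rates
P_A_given_A2 = Fraction(4, 100)

# Bayes' theorem: probability a defective item came from machine 1
denom = P_A1 * P_A_given_A1 + P_A2 * P_A_given_A2
P_A1_given_A = P_A1 * P_A_given_A1 / denom
print(P_A1_given_A)   # 3/11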
5.2.4 Permutations
The number of permutations of n objects taken r at a time (i.e., the number of ordered
arrangements of r objects chosen from n) is

nPr = n(n − 1)(n − 2) · · · (n − r + 1) = n!/(n − r)!

For r = n we get

nPn = n(n − 1)(n − 2) · · · (1) = n!

We arrive at the important result that the number of ways of arranging n objects is
n!. It is worth highlighting this result:

Number of ways of arranging n objects = n!

For example, the number of permutations of 12 objects taken 4 at a time is

nPr = n!/(n − r)! = 12!/(12 − 4)! = 12!/8! = (12 · 11 · 10 · 9 · 8!)/8! = 11,880.
5.2.5 Combinations
With permutations the order of the objects matters so, for example, the arrangements
of letters pqrs and sprq are different permutations. For combinations, the order of
objects doesn’t matter, so that the letter arrangements pqrs and sprq are considered
to be the same combination.
Since r objects can be arranged in r! ways, the number of combinations of n objects
taken r at a time can be found by replacing these r! arrangements by a single
arrangement in the set of permutations of n objects taken r at a time. This can be done
by dividing the number of permutations nPr by r!. We thus obtain the formula for the
number of combinations nCr by dividing the formula for nPr by r!:

Number of combinations of n objects taken r at a time:
nCr = nPr/r! = n!/(r!(n − r)!)    (5.9)
Solution
Since there are 12 letters combined four at a time, r = 4 and n = 12. The number of
combinations is

nCr = n!/(r!(n − r)!) = 12!/(4!(12 − 4)!) = 12!/(4! · 8!) = (12 · 11 · 10 · 9)/(4 · 3 · 2 · 1) = 495.
The numbers given in Eq. (5.10) are called binomial coefficients since they appear
as the coefficients in the binomial expansion of (a + b)ⁿ.
Next, we need to determine the number of desired outcomes, namely, picking 1 red
ball and 2 blue balls when the order matters. Since order matters we need to consider
the number of ways picking the balls in the order RBB, where RBB = 1st picked
ball is red, 2nd picked ball is blue and 3rd picked ball is blue. This number can be
found by noting that the number of ways of picking a red ball first is 7. For each of
the red balls picked there are 5 ways of picking the second ball blue, making a total
of 7 · 5 = 35 ways. To each of these 35 ways of picking a red ball first and a blue
ball second, there are 4 ways of picking a third ball blue, which makes a total of
7 · 5 · 4 = 140 ways of picking three balls in the order RBB. By the same reasoning,
we find that the orders BRB and BBR can also each be chosen 140 ways, making the
total number of desired outcomes 140 + 140 + 140 = 420.
Rather than calculating by hand, we can use the formula for permutations to
calculate the number of ways of picking the orders RBB, BRB and BBR when the
order matters. We illustrate this method for RBB:

7P1 · 5P2 = 7 · (5 · 4) = 140.

The probability P(1R, 2B) of picking 1 red ball and 2 blue balls is thus

P(1R, 2B) = (140 + 140 + 140)/12P3 = 420/1320 = 7/22.
Number of ways of picking 1 red ball and two blue balls in any order
= (number of ways of picking 1 red ball from 7 red balls in any order)
· (number of ways of picking 2 blue balls from 5 blue balls in any order)
= 7C1 · 5C2 = 7 · 10 = 70.
The probability P(1R, 2B) of picking 1 red ball and 2 blue balls is now given by

P(1R, 2B) = (7C1 · 5C2)/12C3 = 70/220 = 7/22,

which, of course, agrees with the answer above calculated using permutations.
Note: As indicated by the definition of our shorthand notation, the probability asked
for above is not the same as the probability P(R ∩ B ∩ B). The latter asks for the
probability that the balls are picked in a particular order, whereas the problem we
solved above asks for the probability that the balls are picked in any order. We
can check this by calculating P(R ∩ B ∩ B) using permutations and by the use of
Theorem 5.2.8:

P(R ∩ B ∩ B) = (7P1 · 5P2)/12P3 = 140/1320 = 7/66.

By Theorem 5.2.8,

P(R ∩ B ∩ B) = P(R)P(B|R)P(B|R ∩ B) = (7/12) · (5/11) · (4/10) = 140/1320 = 7/66.

Contrast this result with the probability calculated above, noting that P(R ∩ B ∩ B) =
P(B ∩ R ∩ B) = P(B ∩ B ∩ R):

P(1R, 2B) = P(R ∩ B ∩ B) + P(B ∩ R ∩ B) + P(B ∩ B ∩ R) = 7/66 + 7/66 + 7/66 = 7/22.
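Both routes to the answer are easily checked with exact arithmetic:

from math import comb, perm
from fractions import Fraction

# 7 red and 5 blue balls; pick 3.  Probability of exactly 1 red and 2 blue:
p_any_order = Fraction(comb(7, 1) * comb(5, 2), comb(12, 3))
print(p_any_order)            # 7/22

# Probability of the particular order R, B, B:
p_RBB = Fraction(perm(7, 1) * perm(5, 2), perm(12, 3))
print(p_RBB, 3 * p_RBB)       # 7/66, and 3 * 7/66 = 7/22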
Random or stochastic variables are either numbers forming the sample space of a
random experiment (e.g. heights of students in a class) or else they are numbers
assigned to each sample point according to some rule. As an example of the latter,
consider the sample space {HH , HT , TH , TT } resulting from tossing a coin twice.
The rule number of heads in each outcome assigns the numbers {2, 1, 1, 0} to the
sample space {HH , HT , TH , TT }. The rule therefore defines a random variable X
having values {0, 1, 2}. Note that random variables are usually denoted by a capital
letter. Since the values of X are discrete, we call X a discrete random variable.
Where random experiments yield continuous numbers or where continuous numbers
are associated with sample points, we call X a continuous random variable.
or as

P(X = xi) = p(x).    (5.12)

The names are often shortened to probability density or probability function. For
a function to be a probability function it must satisfy the following conditions:
Conditions for a discrete probability density:
1. 0 ≤ p(x) ≤ 1
2. Σ_x p(x) = 1    (5.13)
A discrete probability density can be represented by a bar chart or histogram. Fig. 5.1
shows a bar chart of a probability density and a line graph of a cumulative distribution
function (see next section).
Fig. 5.1 Bar chart of the probability density p(x) and a line graph of the distribution function F(x)
of Example 5.3.21
Instead of asking for the probability of a particular value xi , we can ask for the
probability that X has a value less than some x, where x is continuous with values
from −∞ to +∞. This probability is given by the cumulative distribution function,
usually shortened to distribution function, defined by
For a discrete random variable the probability is easily obtained by simply adding
the probabilities for all values of xi less than x, i.e.,
F(x) = Σ_i p(xi),  xi < x.
When X can take only values x1 to xn, the distribution function is defined as follows:

F(x) = 0 for −∞ < x < x1
F(x) = p(x1) for x1 ≤ x < x2
F(x) = p(x1) + p(x2) for x2 ≤ x < x3
⋮
F(x) = p(x1) + · · · + p(xn) for xn ≤ x < ∞.
We note that since the values of the distribution function F(x) with increasing x
are obtained by adding positive or zero probabilities it either increases or remains
the same. The distribution function is therefore a monotonically increasing function.
Solution
For each of the 6 numbers on the first dice, there correspond 6 numbers on the second
dice, so that the total number of pairs of numbers, the sample space, is 6 · 6 = 36. The
probability density is given by the probability that the sum of the two numbers is equal
to 2, 3, ..., 12. The number of ways of getting 2 is 1, since the only combination is
1+1=2; the number of ways of getting 3 is 2, since the allowed combinations are
1+2=3 or 2+1=3; and the number of ways of getting 4 is 3, since the allowed
combinations are 1+3, 3+1 and 2+2; and so on. Thus, the probabilities are P(X = 2) =
1/36, P(X = 3) = 2/36, P(X = 4) = 3/36, etc. The probability F(x) of getting X = 2 or
less is F(2) = 1/36, of getting X = 3 or less it is F(3) = 3/36, of getting X = 4 or less
it is F(4) = 6/36, and so on. The table below shows the complete definitions of p(x)
and F(x):
x 2 3 4 5 6 7 8 9 10 11 12
p(x) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
F(x) 1/36 3/36 6/36 10/36 15/36 21/36 26/36 30/36 33/36 35/36 36/36
A bar chart of p(x) is shown in Fig. 5.1. It has a typical bell shape reminiscent of the
Gaussian distribution (see subsection 5.6.4), which describes the distributions of many
commonly occurring random variables. A line graph of F(x) is also shown in Fig. 5.1.
The graph of F(x) has the staircase or step-function shape typical of discrete
distribution functions. Notice that the jumps correspond to the probabilities p(x).
This is easily seen by noting that the jumps are the differences between adjacent
values of F(x) and that these differences are the probabilities p(x). For example,

P(X = 6) = F(X ≤ 6) − F(X ≤ 5) = 15/36 − 10/36 = 5/36.
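The whole table can be generated by enumerating the 36 outcomes, for example:

from fractions import Fraction
from itertools import product

# p(x) and F(x) for the sum of two dice, built from all 36 pairs
counts = {}
for a, b in product(range(1, 7), repeat=2):
    counts[a + b] = counts.get(a + b, 0) + 1

F = Fraction(0)
for s in range(2, 13):
    p = Fraction(counts[s], 36)
    F += p
    print(s, p, F)   # reproduces the p(x) and F(x) table above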
If we ask for the probability that a continuous random variable X with values ranging
from x = −∞ to x = +∞ has a specific value X = x, we will get P(X = x) = 0, since a
single value is only one of an infinity of possible values. This is clearly not meaningful,
so instead, for a continuous random variable, we ask for the probability that it lies in
some interval a ≤ x ≤ b. This procedure allows the definition of a continuous
probability density p(x). A continuous probability density is defined by the requirement
that it satisfies the following conditions, analogues of the discrete case:
Conditions for a continuous probability density function:
1. p(x) ≥ 0
2. ∫_{−∞}^{∞} p(x) dx = 1    (5.14)
The first condition is necessary because a negative probability has no obvious mean-
ing, while the second condition reflects the certainty that the value of a continuous
random variable must lie between −∞ and +∞.
Asking for the probability that a banana picked at random from a banana plantation
has a length between 14 cm and 16 cm is an example of a trial or random experiment
which yields a continuous random variable: the length of a banana. Another example
is selecting a student from a class and asking for the probability that her height lies
between 1.2 m and 1.5 m.
The probability that a random variable X lies in the interval a ≤ x ≤ b is given by

P(a ≤ X ≤ b) = ∫_a^b p(x) dx.    (5.15)
We can ask for the probability that a continuous random variable X has a value
less than x. This probability is given by a continuous distribution function F(x), the
analogue of the discrete case, defined by

Continuous distribution function:
F(x) = P(X ≤ x) = P(−∞ ≤ X ≤ x) = ∫_{−∞}^{x} p(v) dv    (5.16)
Because the probability of a specific value x is zero, and as long as p(x) is continuous,
‘≤’ in the above definition is interchangeable with ‘<’. The probability that X lies
in the interval a ≤ X ≤ b is given by the difference F(b) − F(a), i.e.,

P(a ≤ X ≤ b) = F(b) − F(a) = ∫_a^b p(x) dx.    (5.17)
Fig. 5.2 Plot of the continuous probability density p(x) = (1/√π) e^{−x²} and its distribution
function F(x) = (1/2)[1 + erf(x)]. The shaded area in the probability density graph shows P(X ≤ 1)
In Fig. 5.2 the shaded area of the probability density graph gives the probability
that X ≤ 1. The second graph shows the corresponding distribution function.
Example Consider the probability density p(x) = 0.3(2 − x²) for −1 ≤ x ≤ 1, with
p(x) = 0 otherwise. Find (a) the distribution function, and (b) the probability
P(−0.5 ≤ X ≤ 0.5).
Solution
(a) To find F(x), apply Eq. (5.16) to each of the three intervals in which p(x) is
defined:

F(x) = ∫_{−∞}^{x} 0 dv = 0 for x < −1

F(x) = 0.3 ∫_{−1}^{x} (2 − v²) dv = 0.6x − 0.1x³ + 0.5 for −1 ≤ x ≤ 1

F(x) = 1 for x > 1.
(b) The probability P(−0.5 ≤ X ≤ 0.5) is found from the F(x) found in part (a):

P(−0.5 ≤ X ≤ 0.5) = F(0.5) − F(−0.5) = (0.3 − 0.0125 + 0.5) − (−0.3 + 0.0125 + 0.5)
= 0.7875 − 0.2125 = 0.575.
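As a numerical cross-check (a sketch using SciPy's quad routine; any numerical integrator would do):

from scipy.integrate import quad

p = lambda x: 0.3 * (2 - x**2)    # the density on [-1, 1]

area, _ = quad(p, -1, 1)
prob, _ = quad(p, -0.5, 0.5)
print(area)   # 1.0   (the density is normalised)
print(prob)   # 0.575 (agrees with F(0.5) - F(-0.5))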
The last step follows by the definition of an integral as the area under the curve. Here,
the area under the curve with projection on the x-axis of length dx is p(x)dx. The
required result follows immediately from Eq. (5.18):

dF(x)/dx = p(x).
We come now to consider two random variables, either both discrete or both con-
tinuous. Generalisation to more than two variables or to mixtures (some random
variables discrete, some continuous) is straightforward.
Discrete Joint Distributions
Let X and Y be two discrete random variables which can take discrete values
x1, x2, x3, . . . and y1, y2, y3, . . . respectively. The probability P(X = xi, Y = yj) that
X = xi and Y = yj can be written as

P(X = xi, Y = yj) = p(xi, yj),    (5.19)

or as

P(X = xi, Y = yj) = p(x, y),    (5.20)

if we define p(x, y), called a joint discrete probability density function or simply a
joint discrete probability density, as follows:

Joint discrete probability density:
p(x, y) = p(xi, yj) for x = xi, y = yj, and p(x, y) = 0 otherwise.    (5.21)
Conditions for a joint discrete probability density:
1. 0 ≤ p(x, y) ≤ 1
2. Σ_x Σ_y p(x, y) = 1
The probability that X = xi irrespective of the value of Y is denoted by P(X = xi) =
px(xi). Supposing X and Y have values X = x1, x2, . . . , xn and Y = y1, y2, . . . , ym,
the probability P(X = xi) is given by

P(X = xi) = px(xi) = Σ_{j=1}^{m} p(xi, yj).    (5.22)

Similarly,

P(Y = yj) = py(yj) = Σ_{i=1}^{n} p(xi, yj).    (5.23)
The functions P(X = xi ) = px (xi ) and P(Y = yj ) = py (yj ) are called marginal prob-
ability density functions (or marginal probability densities for short) and, together
with the probability density p(xi , yj ), can be represented by a joint probability table
(Table 5.1):
We see from Table 5.1 that summing columns and rows gives the marginal
probability densities in the margins of the table, hence the name ‘marginal’. Take,
for example, the column labeled y2. We notice that each entry for p contains y2.
Adding these entries gives the marginal probability function py(y2), i.e.,

py(y2) = p(x1, y2) + p(x2, y2) + · · · + p(xn, y2).
Notice also that the bottom right hand corner of Table 5.1 gives the sum of all the
probabilities:

Σ_{i=1}^{n} Σ_{j=1}^{m} p(xi, yj) = 1.    (5.24)

Likewise,

Σ_{i=1}^{n} px(xi) = 1 and Σ_{j=1}^{m} py(yj) = 1.
Example A bag contains 4 green, 3 yellow and 2 red balls, and 3 balls are picked at
random; let X be the number of red balls and Y the number of yellow balls picked.
The number of sample points is the number of ways of picking 3 balls from 9, i.e.,
9C3 = 84. Then

p(0, 0) = (number of ways X = 0, Y = 0)/(number of sample points)
= (number of ways of picking 3 green balls from 4 green balls)/(number of sample points)
= 4C3 ÷ 9C3 = 4/84.
p(0, 1) = (number of ways X = 0, Y = 1)/(number of sample points)
= (no. of ways to pick 2 green balls from 4)(no. of ways to pick 1 yellow ball from 3)
/(number of sample points)
= (4C2 · 3C1) ÷ 9C3 = 18/84.
px(0) = p(0, 0) + p(0, 1) + p(0, 2) + p(0, 3) = 4/84 + 18/84 + 12/84 + 1/84 = 35/84
px(1) = p(1, 0) + p(1, 1) + p(1, 2) = 12/84 + 24/84 + 6/84 = 42/84
px(2) = p(2, 0) + p(2, 1) = 4/84 + 3/84 = 7/84,
while the marginal probability distribution py(y) is given by Eq. (5.23):

py(0) = p(0, 0) + p(1, 0) + p(2, 0) = 4/84 + 12/84 + 4/84 = 20/84
py(1) = p(0, 1) + p(1, 1) + p(2, 1) = 18/84 + 24/84 + 3/84 = 45/84
py(2) = p(0, 2) + p(1, 2) = 12/84 + 6/84 = 18/84
py(3) = p(0, 3) = 1/84.
(c) The table of the marginal probabilities and the total probability is given in
Table 5.2.
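The joint and marginal densities can be generated programmatically; the following Python sketch (our reading of the example: 4 green, 3 yellow and 2 red balls, three picked) reproduces the table entries as exact fractions:

from math import comb
from fractions import Fraction

# X = number of red balls picked (0..2), Y = number of yellow (0..3)
total = comb(9, 3)   # 84 sample points
p = {}
for x in range(3):
    for yv in range(4):
        g = 3 - x - yv                  # the rest of the 3 picks are green
        if 0 <= g <= 4:
            p[(x, yv)] = Fraction(comb(2, x) * comb(3, yv) * comb(4, g), total)

px = {x: sum(p.get((x, yv), 0) for yv in range(4)) for x in range(3)}
py = {yv: sum(p.get((x, yv), 0) for x in range(3)) for yv in range(4)}
print(px)   # 35/84, 42/84, 7/84 (printed in lowest terms: 5/12, 1/2, 1/12)
print(py)   # 20/84, 45/84, 18/84, 1/84 in lowest terms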
Conditions for a joint continuous probability density:
1. p(x, y) ≥ 0
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x, y) dx dy = 1    (5.25)
With these conditions satisfied, the probability P(a < X < b, c < Y < d) that X has
a value in the interval a < X < b and Y has a value in the interval c < Y < d is

P(a < X < b, c < Y < d) = ∫_{x=a}^{b} ∫_{y=c}^{d} p(x, y) dx dy.
Following similar steps as in subsection 5.3.5, we can show that the joint probability
density is the second partial derivative of the joint distribution function:

p(x, y) = ∂²F/∂x∂y.    (5.26)

Proof of Eq. (5.26): Let X lie in the infinitesimal interval x ≤ X ≤ x + Δx, and Y in
the infinitesimal interval y ≤ Y ≤ y + Δy. Then

P(x ≤ X ≤ x + Δx, y ≤ Y ≤ y + Δy) = ΔF(x, y) = ∫_{u=x}^{x+Δx} ∫_{v=y}^{y+Δy} p(u, v) du dv.

The integral gives the volume under the surface p(x, y) with area projection on the
xy-plane dxdy at the point (x, y). Since the area dxdy is infinitesimal, this volume is
given by p(x, y)dxdy, hence

dF = p(x, y) dx dy.

Rearrangement gives

∂²F/∂x∂y = p(x, y),

as required.
Example Let the joint probability density of X and Y be p(x, y) = e^{−2x} e^{−y/2} for
x ≥ 0, y ≥ 0, and p(x, y) = 0 otherwise. Determine (a) P(X ≥ 1, Y ≤ 1), (b) P(X ≤ Y)
and (c) P(X ≤ a).
Solution
(a)

P(X ≥ 1, Y ≤ 1) = ∫_{y=0}^{1} ∫_{x=1}^{∞} e^{−2x} e^{−y/2} dx dy
= ∫_{0}^{1} e^{−y/2} [−(1/2) e^{−2x}]_{1}^{∞} dy
= ∫_{0}^{1} e^{−y/2} (1/2) e^{−2} dy
= (1/2) e^{−2} [−2 e^{−y/2}]_{0}^{1}
= (1/2) e^{−2} (−2e^{−1/2} + 2)
= e^{−2} − e^{−5/2}.
(b)

P(X ≤ Y) = ∫_{y=0}^{∞} ∫_{x=0}^{y} e^{−2x} e^{−y/2} dx dy
= ∫_{0}^{∞} e^{−y/2} [−(1/2) e^{−2x}]_{0}^{y} dy
= ∫_{0}^{∞} e^{−y/2} (1/2)(1 − e^{−2y}) dy
= ∫_{0}^{∞} [(1/2) e^{−y/2} − (1/2) e^{−5y/2}] dy
= [−e^{−y/2} + (1/5) e^{−5y/2}]_{0}^{∞}
= −(1/5) + 1
= 4/5.
5.3 Probability Distributions 113
(c)

P(X ≤ a) = ∫_{y=0}^{∞} ∫_{x=0}^{a} e^{−2x} e^{−y/2} dx dy
= ∫_{0}^{a} e^{−2x} [−2 e^{−y/2}]_{0}^{∞} dx
= ∫_{0}^{a} 2 e^{−2x} dx
= 2 [−(1/2) e^{−2x}]_{0}^{a}
= 2 (−(1/2) e^{−2a} + 1/2)
= 1 − e^{−2a}.
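These integrals can be cross-checked numerically, for instance with SciPy's dblquad (a sketch, not part of the text's development):

import numpy as np
from scipy.integrate import dblquad

p = lambda x, y: np.exp(-2 * x) * np.exp(-y / 2)   # density for x, y >= 0

# dblquad integrates f(y, x) over x in [a, b] and y in [gfun(x), hfun(x)]
# (note the argument order in SciPy's convention).
a, _ = dblquad(lambda y, x: p(x, y), 1, np.inf, lambda x: 0, lambda x: 1)
b, _ = dblquad(lambda y, x: p(x, y), 0, np.inf, lambda x: x, lambda x: np.inf)
print(a, np.exp(-2) - np.exp(-2.5))   # (a): both ~ 0.053
print(b)                              # (b): ~ 0.8 = 4/5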
P(Y = y | X = x) = p(y|x) = p(x, y)/px(x).    (5.30)

P(X = x | Y = y) = p(x|y) = p(x, y)/py(y).    (5.31)
The functions p(y|x) and p(x|y) are called conditional probability density functions.
Using Eq. (5.31) with p(x, y) a continuous probability density, the probability
P(a ≤ X ≤ b | y ≤ Y ≤ y + dy) that X lies in the interval a ≤ X ≤ b given that Y
lies in the interval y ≤ Y ≤ y + dy is given by

P(a ≤ X ≤ b | y ≤ Y ≤ y + dy) = ∫_a^b p(x|y) dx.
Since dy is infinitesimal, we can write the formula more simply and interpret it as
the probability that a ≤ X ≤ b given Y = y:

P(a ≤ X ≤ b | Y = y) = ∫_a^b p(x|y) dx.

Note that this is a ‘working’ formula to find P(a ≤ X ≤ b) given Y = y, since the
probability of the single value Y = y is itself 0.
Example: Consider a discrete joint probability distribution for the random variables $X$ and $Y$. Calculate the conditional probability density $p(x|y)$ given that $Y = y = 1$.
Solution
To find p(x|1) we use formula (5.31):
$$p(x|y) = \frac{p(x, y)}{p_y(y)}, \qquad p(x|1) = \frac{p(x, 1)}{p_y(1)}.$$
Example: Consider the continuous joint probability density

$$p(x, y) = \begin{cases} \frac{4}{15}(x-2)(y-3) & \text{for } 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{5.32}$$

for the random variables $X$ and $Y$. Calculate the conditional probability density $p(x|y)$.

Solution
To find $p(x|y)$ we use formula (5.31), i.e.,
$$p(x|y) = \frac{p(x, y)}{p_y(y)}. \tag{5.33}$$
First find the marginal probability density $p_y(y)$ using formula (5.27):

$$p_y(y) = \int_{x=-\infty}^{\infty} p(x, y)\, dx = \frac{4}{15} \int_{x=0}^{1} (x-2)(y-3)\, dx$$

$$= \frac{4}{15} \int_{x=0}^{1} (xy - 3x - 2y + 6)\, dx = \frac{4}{15} \left[ \frac{x^2 y}{2} - \frac{3x^2}{2} - 2yx + 6x \right]_{0}^{1}$$

$$= \frac{4}{15} \left( \frac{y}{2} - \frac{3}{2} - 2y + 6 \right) = \frac{2}{15}\,(9 - 3y). \tag{5.34}$$
Substituting Eqs. (5.32) and (5.34) into Eq. (5.33), we get the required conditional probability density $p(x|y)$:

$$p(x|y) = \begin{cases} \dfrac{2(x-2)(y-3)}{9-3y} & \text{for } 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise.} \end{cases} \tag{5.35}$$
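Readers who wish to verify such manipulations by computer algebra can do so in a few lines. The sketch below uses the Python package sympy (an arbitrary choice; the Maple or Mathematica packages of Chap. 6 would serve equally well) to reproduce Eqs. (5.34) and (5.35). Note that sympy cancels the common factor $(y-3)$, so Eq. (5.35) appears in the reduced form $(4 - 2x)/3$, to which it is algebraically equal.

import sympy as sp

x, y = sp.symbols('x y')
p_xy = sp.Rational(4, 15) * (x - 2) * (y - 3)   # joint density on the unit square

# Marginal density p_y(y): integrate the joint density over x, Eq. (5.27)
p_y = sp.integrate(p_xy, (x, 0, 1))
print(sp.factor(p_y))                    # -2*(y - 3)/5, equal to (2/15)(9 - 3y)

# Conditional density p(x|y) = p(x, y)/p_y(y), Eq. (5.33)
p_cond = sp.simplify(p_xy / p_y)
print(p_cond)                            # 4/3 - 2*x/3, the reduced form of Eq. (5.35)
print(sp.integrate(p_cond, (x, 0, 1)))   # 1, as a density over x must give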
We saw in earlier chapters the importance of the mean in the analysis of experimental data. It is the best estimate of the true value of a measured quantity (a value that can never be known). More generally, the mean plays a fundamental role in all sorts of statistical applications, such as determining the life expectancy of a population for annuity calculations, or determining the average weights or heights of a population. Knowing the mean alone is not enough; the scatter or spread of the data is also important. For example, an experimental measurement of a quantity producing a narrow spread of measured values is regarded as more precise than one that produces a wide spread (though not necessarily more accurate, since the average of the data may not be close to the true value in the presence of systematic errors). The variance, and its square root, the standard deviation, are important measures of the spread or scatter of data.
The mean, also called the average or expectation value, is commonly denoted either with angle brackets $\langle X \rangle$ or with an over-line $\bar{X}$. Another common notation, especially when the mean appears in a formula, is to represent the mean by the Greek letter $\mu$. In this chapter we shall mostly use angle brackets or $\mu$ to represent the mean.
For a discrete data set of $n$ quantities $X = x_1, x_2, \ldots, x_n$, the mean is defined as the sum of the quantities divided by the number $n$ of quantities, that is,

$$\langle x \rangle = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}. \tag{5.36}$$
A quantity may occur more than once. Let $f(x_i)$, called the frequency, be the number of times the quantity $x_i$ occurs. For example, in

$$\langle x \rangle = \frac{x_1 + 4x_2 + 2x_3 + \cdots + x_n}{n}$$

$x_2$ occurs 4 times, so its frequency is $f(x_2) = 4$, while $x_3$ occurs twice, so its frequency is $f(x_3) = 2$. Using this notation, the formula for the mean is better written as
$$\langle X \rangle = \frac{f(x_1)x_1 + f(x_2)x_2 + f(x_3)x_3 + \cdots + f(x_n)x_n}{n} = \frac{\sum_{i=1}^{n} f(x_i)x_i}{n}. \tag{5.37}$$

Noting that the frequency $f(x_i)$ divided by $n$ is just the probability $p(x_i) = f(x_i)/n$ of occurrence of the quantity $x_i$, we can also write the formula for the mean as

$$\langle X \rangle = p(x_1)x_1 + p(x_2)x_2 + p(x_3)x_3 + \cdots + p(x_n)x_n = \sum_{i=1}^{n} p(x_i)x_i. \tag{5.38}$$

Mean of a discrete quantity:

$$\langle X \rangle = \frac{\sum_{i=1}^{n} f(x_i)x_i}{n} = \sum_{i=1}^{n} p(x_i)x_i \tag{5.39}$$
Example: Four coins are tossed and the random variable $X$ = the number of heads obtained. Find the mean of $X$.

Solution
The mean $\langle X \rangle$ is given by

$$\langle X \rangle = \sum_{i=1}^{n} p(x_i)x_i = \sum_{i=1}^{5} P(X = x_i)\, x_i$$

$$= 0\,P(X=0) + 1\,P(X=1) + 2\,P(X=2) + 3\,P(X=3) + 4\,P(X=4)$$

$$= 0\cdot\frac{1}{16} + 1\cdot\frac{4}{16} + 2\cdot\frac{6}{16} + 3\cdot\frac{4}{16} + 4\cdot\frac{1}{16} = 2.$$
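A minimal Python sketch of formula (5.38) applied to this probability function (that of the number of heads in four tosses of a fair coin):

# Mean of a discrete random variable from its probability function, Eq. (5.38)
probs = {0: 1/16, 1: 4/16, 2: 6/16, 3: 4/16, 4: 1/16}
mean = sum(p * x for x, p in probs.items())
print(mean)   # 2.0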
For a continuous random variable $X$ described by a probability density $p(x)$, the quantity $p(x_i)\,\Delta x_i$ plays a role analogous to the discrete probability $p(x_i)$. In this case, the continuous mean can be approximated using the discrete mean formula:

$$\langle X \rangle = \sum_{-\infty}^{\infty} \left[ p(x_i)\,\Delta x_i \right] x_i. \tag{5.41}$$

In the limit $\Delta x_i \to dx$, $\sum_{-\infty}^{\infty} \to \int_{-\infty}^{\infty}$, and Eq. (5.41) reduces to formula (5.40):

$$\langle X \rangle = \int_{-\infty}^{\infty} x\, p(x)\, dx. \tag{5.42}$$
The generalisation of the definition of the mean to two or more random variables is
straightforward. For two random variables X and Y the generalisations for both the
discrete and continuous cases are:
Solution
As in Example 5.4.28, X is continuous but the measured values are discrete so we
can use the discrete mean formula. Using formula (5.39) the mean of X is
As we mentioned earlier, the mean alone, though important, is not enough. The mean
alone gives no indication of the precision of an experiment. For this we need an idea
of the spread or scatter of the measured values. The need to measure the spread or
scatter, in addition to the mean, is true for any set of data values. Such a measure is
the variance, or its square root, the standard deviation.
We defined the standard deviation $\sigma$ in Chap. 3 and we repeat the definition here. We will also adopt the commonly used notation $\mu$ to represent the mean. Using this notation, the variance for both discrete and continuous random variables is defined by

$$\sigma^2 = \langle (x - \mu)^2 \rangle.$$

In words, the variance is found by taking the difference of each data value from the mean, $(x_i - \mu)$, squaring each difference, and then taking the mean of the squared differences. We may recall that in Chap. 3 we called the differences $r_i = (x_i - \mu)$ residuals. This general definition can be written more specifically for discrete and continuous random variables as follows:
Variance of a discrete quantity:

$$\sigma^2 = \frac{\sum_{i=1}^{n} f(x_i)(x_i - \mu)^2}{n} = \sum_{i=1}^{n} p(x_i)(x_i - \mu)^2 \tag{5.44}$$

Variance of a continuous quantity:

$$\sigma^2 = \int_{-\infty}^{\infty} p(x)(x - \mu)^2\, dx \tag{5.45}$$
The standard deviation formulae follow by taking the square roots of formulae (5.44) and (5.45):

Standard deviation of a discrete quantity:

$$\sigma = \left[ \frac{\sum_{i=1}^{n} f(x_i)(x_i - \mu)^2}{n} \right]^{1/2} = \left[ \sum_{i=1}^{n} p(x_i)(x_i - \mu)^2 \right]^{1/2} \tag{5.46}$$

Standard deviation of a continuous quantity:

$$\sigma = \left[ \int_{-\infty}^{\infty} p(x)(x - \mu)^2\, dx \right]^{1/2} \tag{5.47}$$
In Sect. 3.1 we distinguished between the population standard deviation and the
sample standard deviation. We mentioned there that formula (3.2) for the sample
standard deviation is a better estimate for the real (impossible to know) standard
deviation than the population standard deviation, Eq. (3.1), when only a portion
(sample) of the population is known. This situation arises in cases where the popula-
tion is discrete but infinite (as is the case of counting particles emitted by radioactive
sources in finite time intervals, when the full population requires counting for an
infinite time interval), or when the population is continuous (as in measurements
of continuous quantities). Actually, even for a finite population, there are numerous
situations where the entire population cannot be accessed. For example, in finding
the average height of women in a given country, the population is too large to be
fully accessed. The important example, mentioned above, that is most relevant to
us here, is experimental measurement (length, speed etc.). Quantities such as length
are continuous and require an infinite number of measurements to obtain the true
(impossible to know) mean. Obviously, this is impossible and all measurements of
continuous quantities are finite both in the number of measurements taken and in the
number of decimal places of each measurement. Thus, in all such cases where only a portion of the population is available, the best estimate of the true standard deviation is given by a slight generalisation of Eq. (3.1), namely,

$$\sigma = \left[ \frac{\sum_{i=1}^{n} f(x_i)(x_i - \mu)^2}{n - 1} \right]^{1/2}, \tag{5.48}$$
which is the sample standard deviation also defined earlier in Eq. (3.2). The situation is identical for the variance, with the sample variance given by

$$\sigma^2 = \frac{\sum_{i=1}^{n} f(x_i)(x_i - \mu)^2}{n - 1}. \tag{5.49}$$

5 Note that many spreadsheet and mathematical computer packages have inbuilt formulae for the mean, variance and standard deviation. However, it is not always stated whether the formulae are for the population or for the sample variance or standard deviation. Where not stated, the formulae are invariably for the sample variance and sample standard deviation.
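The warning in footnote 5 is easy to check numerically. In the Python package numpy, for instance, the ddof argument selects the divisor: ddof=0 divides by $n$ (the population formulae) while ddof=1 divides by $n-1$ (the sample formulae of Eqs. (5.48) and (5.49)). A minimal sketch with made-up data:

import numpy as np

data = np.array([33.48, 33.50, 33.47, 33.52, 33.49])   # illustrative sample

print(np.std(data, ddof=0))   # population standard deviation, divisor n
print(np.std(data, ddof=1))   # sample standard deviation, divisor n - 1 (slightly larger)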
Since all real measurements (even of continuous variables) are discrete, why use con-
tinuous probability distributions p(x)? The first answer is that continuous probability
distributions are much easier to handle mathematically. Even for discrete populations,
when they are very large, it is convenient to approximate them with a continuous
probability distribution. A second more important reason is that continuous proba-
bility distributions can be viewed as the infinite limits of probability distributions
resulting from the finite measurements of continuous quantities. As such, it is argued
that statistical quantities such as the mean or the variance calculated from these
distributions are the best estimates of these quantities.
An important formula for the variance in the discrete case is derived as follows:

$$\sigma^2 = \langle (x_i - \mu)^2 \rangle = \sum_{i=1}^{n} (x_i - \mu)^2 p(x_i) = \sum_{i=1}^{n} (x_i^2 - 2x_i\mu + \mu^2)\, p(x_i)$$

$$= \sum_{i=1}^{n} x_i^2 p(x_i) - 2\mu \sum_{i=1}^{n} x_i p(x_i) + \mu^2 \sum_{i=1}^{n} p(x_i) = \langle x^2 \rangle - 2\mu^2 + \mu^2$$

$$= \langle x^2 \rangle - \mu^2 = \langle x^2 \rangle - \langle x \rangle^2,$$

where we have used the definition of the mean and the requirement that probabilities must add to 1:

$$\sum_{i=1}^{n} x_i p(x_i) = \mu \quad\text{and}\quad \sum_{i=1}^{n} p(x_i) = 1.$$

This gives us an important formula for the variance for both the discrete and continuous case, noting that $x$ can be either discrete or continuous:

$$\sigma^2 = \langle x^2 \rangle - \langle x \rangle^2. \tag{5.50}$$
Figure 5.3 shows that a large standard deviation indicates a large spread of data
values, while a small standard deviation shows a small spread of data values.
Here are some useful properties of the variance, which apply both to discrete and to continuous random variables $X$ and $Y$:

3. For independent random variables $X$, $Y$: $\quad \sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y$

Notation: $\sigma^2_{kX}$ = variance of the product $kX$ ($kX = \{k x_i\}$ for the discrete case, $kX = kx$ for the continuous case) and $\sigma^2_{X \pm Y}$ = variance of $X \pm Y$. These properties can be generalised to more than two random variables in obvious ways.
Example: Find the variance of the number obtained in a single throw of a fair die, for which

$$p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = \frac{1}{6}.$$
$$\sigma^2 = \frac{1}{6}(1-3.5)^2 + \frac{1}{6}(2-3.5)^2 + \frac{1}{6}(3-3.5)^2 + \frac{1}{6}(4-3.5)^2 + \frac{1}{6}(5-3.5)^2 + \frac{1}{6}(6-3.5)^2$$

$$= \frac{35}{12} = 2.917. \tag{5.51}$$

The variance may also be found from formula (5.50). First find $\langle X^2 \rangle$:

$$\langle X^2 \rangle = \frac{1}{6}(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2) = \frac{91}{6}.$$

Then $\sigma^2 = \langle X^2 \rangle - \langle X \rangle^2 = \frac{91}{6} - (3.5)^2 = \frac{35}{12}$, as before.
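Both routes to the variance are easily reproduced by computer; a short Python sketch using exact fractions:

from fractions import Fraction

p = Fraction(1, 6)                                    # fair die
faces = range(1, 7)

mu = sum(p * x for x in faces)                        # 7/2
var_direct = sum(p * (x - mu)**2 for x in faces)      # Eq. (5.44)
var_shortcut = sum(p * x**2 for x in faces) - mu**2   # Eq. (5.50)
print(mu, var_direct, var_shortcut)                   # 7/2 35/12 35/12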
Example: Find the variance and standard deviation of the random variable $X$ described by the probability density $p(x) = 3x^2$ for $0 \le x \le 1$ (and zero otherwise).

Solution
First find the mean using formula (5.40):

$$\langle X \rangle = \int_{0}^{1} 3x^2 \cdot x\, dx = 0.75.$$
Standardised random variable $Z$ of $X$: $\quad Z = \dfrac{X - \mu}{\sigma}$

Mean of $Z$: $\quad \mu_Z = 0$

Standard deviation of $Z$: $\quad \sigma_Z = 1$
Standardised random variables are dimensionless and are useful for comparing dif-
ferent distributions.
We mention, in passing, that the concept of variance can be generalised by taking the $r$th power of the residuals. The quantity so obtained, $\langle (x - \mu)^r \rangle$, is called the $r$th moment (about the mean). Also in passing, we give the definition of the moment generating function $M(t)$, given by the mean of the function $e^{tX}$ of the random variable $X$, with $t$ a parameter:

$$M(t) = \langle e^{tX} \rangle = \begin{cases} \sum_{i}^{n} e^{t x_i}\, p(x_i), & X \text{ discrete with probability function } p(x_i) \\ \int_{-\infty}^{\infty} e^{tx}\, p(x)\, dx, & X \text{ continuous with probability density } p(x). \end{cases} \tag{5.52}$$
For joint distributions, the means and variances of $X$ and $Y$ are defined analogously:

Mean of discrete $X$: $\quad \mu_X = \sum_{i=1}^{n}\sum_{j=1}^{m} x_i\, p(x_i, y_j)$

Variance of discrete $X$: $\quad \sigma_X^2 = \langle (x - \mu_X)^2 \rangle = \sum_{i=1}^{n}\sum_{j=1}^{m} (x_i - \mu_X)^2\, p(x_i, y_j)$

Mean of discrete $Y$: $\quad \mu_Y = \sum_{i=1}^{n}\sum_{j=1}^{m} y_j\, p(x_i, y_j)$

Variance of discrete $Y$: $\quad \sigma_Y^2 = \langle (y - \mu_Y)^2 \rangle = \sum_{i=1}^{n}\sum_{j=1}^{m} (y_j - \mu_Y)^2\, p(x_i, y_j)$

(5.53)

Mean of continuous $X$: $\quad \mu_X = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\, p(x, y)\, dx\, dy$

Variance of continuous $X$: $\quad \sigma_X^2 = \langle (x - \mu_X)^2 \rangle = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)^2\, p(x, y)\, dx\, dy$

Mean of continuous $Y$: $\quad \mu_Y = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\, p(x, y)\, dx\, dy$

Variance of continuous $Y$: $\quad \sigma_Y^2 = \langle (y - \mu_Y)^2 \rangle = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (y - \mu_Y)^2\, p(x, y)\, dx\, dy$

(5.54)
With joint distributions, another important quantity arises, namely, the covariance. It is denoted by $\sigma_{XY}$ and has the general definition

$$\sigma_{XY} = \langle (X - \mu_X)(Y - \mu_Y) \rangle.$$

This definition may also be written more specifically for the discrete and continuous cases as follows:
Covariance of discrete $X$, $Y$:

$$\sigma_{XY} = \sum_{i=1}^{n}\sum_{j=1}^{m} (x_i - \mu_X)(y_j - \mu_Y)\, p(x_i, y_j) \tag{5.55}$$

Covariance of continuous $X$, $Y$:

$$\sigma_{XY} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, p(x, y)\, dx\, dy \tag{5.56}$$
The importance of covariance is that it indicates the extent to which the random
variables X and Y depend on each other. This dependence is made more precise by
the correlation coefficient defined in the next section.
Some properties of the covariance:
1. $\sigma_{XY} = \mu_{XY} - \mu_X \mu_Y$, where $\mu_{XY} = \langle XY \rangle$
2. For independent random variables $X$, $Y$: $\quad \sigma_{XY} = 0$
4. $|\sigma_{XY}| \le \sigma_X \sigma_Y$
Correlation Coefficient
In joint distributions, the variables may be completely independent, in which case $\sigma_{XY} = 0$, or completely dependent, for example when $X = Y$, in which case $\sigma_{XY} = \sigma_X \sigma_Y$. But in many cases $X$ and $Y$ are partially dependent on each other, i.e., they are correlated to some extent. We need a way to measure the degree of correlation. Such a measure is the correlation coefficient, denoted by $\rho$ and defined by

Definition of the Correlation Coefficient:

$$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \tag{5.57}$$
$$p(0,0) = \frac{4}{84},\quad p(0,1) = \frac{18}{84},\quad p(0,2) = \frac{12}{84},\quad p(0,3) = \frac{1}{84},\quad p(1,0) = \frac{12}{84},$$

$$p(1,1) = \frac{24}{84},\quad p(1,2) = \frac{6}{84},\quad p(2,0) = \frac{4}{84},\quad p(2,1) = \frac{3}{84}. \tag{5.58}$$
Find the mean, variance and standard deviation of X and Y . Also find the covariance
and the correlation coefficient.
Solution
The means are found using the formulae (5.53) for the mean, and the variances and covariance follow from Eqs. (5.53) and (5.55). The correlation coefficient is then

$$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{-0.1667}{(0.6236)(0.7071)} = -0.3780.$$
Solution
The means are calculated using the formulae for the mean in Eq. (5.54):
$$\mu_X = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\, e^{-2x} e^{-y/2}\, dy\, dx = \int_{0}^{\infty}\int_{0}^{\infty} x\, e^{-2x} e^{-y/2}\, dy\, dx$$

$$= \int_{0}^{\infty} x\, e^{-2x} \left[ -2 e^{-y/2} \right]_{0}^{\infty} dx = 2\int_{0}^{\infty} x\, e^{-2x}\, dx = 2\left[ e^{-2x}\left( -0.25 - 0.5x \right) \right]_{0}^{\infty} = 0.5.$$

$$\mu_Y = \int_{0}^{\infty}\int_{0}^{\infty} y\, e^{-2x} e^{-y/2}\, dy\, dx = \int_{0}^{\infty} y\, e^{-y/2} \left[ -\frac{1}{2} e^{-2x} \right]_{0}^{\infty} dy = \frac{1}{2} \int_{0}^{\infty} y\, e^{-y/2}\, dy$$

$$= \frac{1}{2} \left[ -e^{-y/2}(2y + 4) \right]_{0}^{\infty} = \frac{1}{2}(4) = 2.$$
The variances are calculated using the formulae for the variance in Eq. (5.54):

$$\sigma_X^2 = \int_{0}^{\infty}\int_{0}^{\infty} (x - 0.5)^2\, e^{-2x} e^{-y/2}\, dy\, dx = 2\int_{0}^{\infty} (x - 0.5)^2\, e^{-2x}\, dx = 0.25,$$

$$\sigma_Y^2 = \int_{0}^{\infty}\int_{0}^{\infty} (y - 2)^2\, e^{-2x} e^{-y/2}\, dy\, dx = \frac{1}{2}\int_{0}^{\infty} (y - 2)^2\, e^{-y/2}\, dy = 4,$$

so that $\sigma_X = 0.5$ and $\sigma_Y = 2$.
Since the covariance $\sigma_{XY}$ is 0, the correlation coefficient is 0. This shows that the random variables $X$ and $Y$ are uncorrelated (statistically independent). This is to be expected since, as mentioned earlier in this section, whenever a joint probability density can be written as a product of functions, each depending on only one random variable, the random variables are statistically independent.
Conditional mean of $Y$: $\quad \mu_{Y|X} = \int_{-\infty}^{\infty} y\, p(y|x)\, dy$

Conditional mean of $X$: $\quad \mu_{X|Y} = \int_{-\infty}^{\infty} x\, p(x|y)\, dx$
The formulae apply both to the continuous and the discrete case, but for the contin-
uous case, X = x should be interpreted as x < X ≤ x + dx, similarly for Y = y.
Other Statistical Measures
Aside from the mean, the most relevant for our purposes, there are other measures of central tendency, and aside from the variance, there are other measures of dispersion (spread). As a measure of central tendency the mode or the median is sometimes used. The mode is that value of $x$, call it $x_m$, that occurs most often; it follows that the probability $p(x_m)$ is a maximum. Some distributions may have more than one mode. The median is the value of $x$ for which $P(X \le x) = P(X \ge x) = \frac{1}{2}$. Another measure of dispersion is the range, defined as the difference between the largest and smallest values of a data set.
The simple random walk concerns finding the probability $P(m)$ that, after taking a total of $N = n_1 + n_2$ steps along a straight line, with $n_1$ steps to the right and $n_2$ steps to the left, a particle (or person) ends up $m = n_1 - n_2$ steps from the origin. Let $p$ = probability of a step to the right and $q = 1 - p$ = probability of a step to the left. It is assumed that each step is independent of every other. We first ask for the probability of a specific sequence of $n_1$ steps to the right and $n_2$ steps to the left. To find this probability we use the generalised multiplication rule for the case of independent events, Eq. (5.6). In words, formula (5.6) states that the probability that events $A_1, A_2, \ldots, A_N$ occur together is the product of their probabilities. Using this, the probability of a specific sequence of $n_1$ steps to the right and $n_2$ steps to the left is

$$\underbrace{pp\cdots p}_{n_1}\ \underbrace{qq\cdots q}_{n_2} = p^{n_1} q^{n_2}. \tag{5.59}$$
This specific sequence can occur in a number of different ways, and this number is equal to the number of ways of arranging $N$ objects $n_1$ at a time in any order, i.e., the number of ways of combining $N$ objects $n_1$ at a time (or, equivalently, $N$ objects $n_2$ at a time). We saw in subsection 5.2.5 that this number of combinations is

$${}^{N}C_{n_1} = \binom{N}{n_1} = \frac{N!}{(N - n_1)!\, n_1!} = \frac{N!}{n_1!\, n_2!}.$$

The total probability $P(n_1, n_2)$ of taking $n_1$ steps to the right and $n_2$ steps to the left in any sequence is just the probability of one sequence summed $^{N}C_{n_1}$ times, i.e.,

$$P(n_1, n_2) = \binom{N}{n_1} p^{n_1} q^{n_2} = \frac{N!}{n_1!\, n_2!}\, p^{n_1} q^{n_2}.$$
The distribution of data depends on numerous underlying factors and different ran-
dom experiments produce a great variety of different distributions. But, commonly
encountered random experiments of interest produce data distributions which can
be described by a small set of distributions. We shall consider the most important
of these: the binomial, Poisson, hypergeometric and Gaussian distributions. The
binomial and hypergeometric distributions are discrete finite distributions, while the
Poisson distribution is a discrete infinite distribution. The Gaussian distribution is
continuous, and is perhaps the most important, not least because of its mathematical
simplicity. It is also the most relevant for distributions of measured values.
The binomial distribution or Bernoulli distribution, named after the Swiss mathe-
matician Jacques Bernoulli,6 is a discrete distribution that occurs in games of chance
(e.g., tossing a coin), quality inspection (e.g., counting the number of defective items),
6 The Bernoulli distribution is so named because J. Bernoulli was the first to study problems leading to this distribution. In his 1713 book Ars Conjectandi (The Art of Conjecturing) he included a treatment of the problem of independent trials having two equally probable outcomes. He tried to show that
opinion polls (e.g., the number of people who like a particular brand of coffee), medicine (e.g., the number of people who respond favourably to a drug under test), etc.
Generally, the binomial distribution applies to any set of independent trials, each with only two possible outcomes. The outcomes are invariably labelled 'success' or 'failure' and are associated with the random variable $X$, which has two values: $X = 1$ (a success) and $X = 0$ (a failure). All successful outcomes have the same probability $p(1) = p$, so that failures must have the probability $p(0) = 1 - p = q$. The interest is to find the probability $p(i)$ of $i$ successes in $n$ trials. This probability is given by

Binomial distribution or Bernoulli distribution:

$$P(X = i) = p(i) = \binom{n}{i} p^i q^{n-i}, \qquad i = 0, 1, 2, \ldots, n. \tag{5.60}$$
Since the distribution contains the binomial coefficients, we see why it is called the binomial distribution. It is not difficult to show how this formula is obtained. The proof follows:

Consider the following sequence of $i$ successes and $n - i$ failures:

$$\underbrace{sss\ldots s}_{i}\ \underbrace{fff\ldots f}_{n-i}$$

Since the trials are independent, the probability that all of these successes and failures occur is just the product of the probabilities of each outcome, as given in Eq. (5.6):

$$p^i (1 - p)^{n-i}. \tag{5.61}$$
But there are a number of different sequences with $i$ successes and $n - i$ failures in different positions in the sequence. Note that for each different sequence the order of the $s$'s does not matter, i.e., interchanging $s$'s among themselves does not change the sequence; similarly, the order of the $f$'s does not matter. In this case the number of different sequences is given by the formula for combinations, Eq. (5.9):

$$\binom{n}{i}. \tag{5.62}$$

It follows that the total probability $p(i)$ of $i$ successes in $n$ trials is the product of Eqs. (5.62) and (5.61):

$$p(i) = \binom{n}{i} p^i (1 - p)^{n-i} = \binom{n}{i} p^i q^{n-i}, \tag{5.63}$$
for a large enough number of trials the relative frequency of successful outcomes would approach
the probability for a successful outcome, but failed.
which completes the proof. By using the binomial theorem we can easily show that the binomial distribution satisfies the fundamental requirement of a probability function, condition 2 of Eq. (5.13):

$$\sum_{i=0}^{n} p(i) = \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i} = (p + q)^n = [p + (1 - p)]^n = 1.$$
Mean: $\quad \mu = np$

Variance: $\quad \sigma^2 = npq$

Standard deviation: $\quad \sigma = \sqrt{npq}$
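These properties are easy to confirm numerically from Eq. (5.60); a short Python sketch (the values n = 10, p = 0.3 are arbitrary):

from math import comb

def binom_pmf(i, n, p):
    # Eq. (5.60)
    return comb(n, i) * p**i * (1 - p)**(n - i)

n, p = 10, 0.3
pmf = [binom_pmf(i, n, p) for i in range(n + 1)]
mean = sum(i * q for i, q in enumerate(pmf))
var = sum(i * i * q for i, q in enumerate(pmf)) - mean**2
print(sum(pmf))               # 1.0  (normalisation)
print(mean, n * p)            # both 3.0 (mu = np), up to rounding
print(var, n * p * (1 - p))   # both 2.1 (sigma^2 = npq), up to rounding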
Proof that the mean $\mu = np$: By substituting the binomial distribution for $p(x_i)$ in the definition of the mean, Eq. (5.39), we obtain an expression for the moments $\langle X^k \rangle = \langle i^k \rangle$ of a Bernoulli random variable $X$:

$$\langle X^k \rangle = \langle i^k \rangle = \sum_{i=0}^{n} i^k \binom{n}{i} p^i q^{n-i} = \sum_{i=1}^{n} i^k \binom{n}{i} p^i q^{n-i}. \tag{5.64}$$

Using the identity $i\binom{n}{i} = n\binom{n-1}{i-1}$, we get

$$\langle X^k \rangle = \sum_{i=1}^{n} i^{k-1}\, n \binom{n-1}{i-1} p^i q^{n-i} = np \sum_{i=1}^{n} i^{k-1} \binom{n-1}{i-1} p^{i-1} q^{n-i}$$

$$= np \sum_{j=0}^{n-1} (j+1)^{k-1} \binom{n-1}{j} p^j q^{(n-1)-j}, \quad \text{by setting } j = i - 1.$$

Setting $k = 1$ makes the sum the normalisation sum of a binomial distribution with parameters $(n-1, p)$, which equals 1, so that

$$\langle X \rangle = \mu = np. \tag{5.65}$$
Proof that the variance $\sigma^2 = npq$: This time, setting $k = 2$ in Eq. (5.64) gives

$$\langle X^2 \rangle = np \sum_{j=0}^{n-1} (j+1) \binom{n-1}{j} p^j q^{(n-1)-j} = np\, \langle Y + 1 \rangle = np\,[(n-1)p + 1], \tag{5.66}$$

where the result $\langle Y \rangle = (n-1)p$ follows from Eq. (5.65), since $Y$ is a Bernoulli random variable with parameters $(n-1, p)$. Substituting Eqs. (5.65) and (5.66) into the formula for the variance, Eq. (5.50), gives

$$\sigma^2 = \langle X^2 \rangle - \langle X \rangle^2 = np[(n-1)p + 1] - n^2 p^2 = np(1 - p) = npq.$$
Fig. 5.4 The four plots show that the binomial distribution characterised by (n, p) approaches the
Gaussian distribution (see Sect. 5.6.4 ) as n increases
These, of course, are the values also obtained by counting desired outcomes and
dividing by the total number of points, 16, of the sample space.
The total probability of a packet not being returned is P(X = 0) + P(X = 1), from
which it follows that the probability P(returned) of a packet being returned is
As the name implies, the Poisson distribution (or Poisson probability density) was
first derived in 1837 by the French mathematician Siméon Denis Poisson.7 The Pois-
son distribution is the limit of the binomial distribution as p → 0 and n → ∞, while
μ = np remains finite. It follows that the Poisson and binomial distributions are
closely related. Indeed, for large n and small p, the Poisson distribution, preferred
for calculation, serves as a good approximation to the binomial distribution. Essen-
tially, the Poisson distribution is the generalisation of the binomial distribution to
a discrete countable infinity of trials. As for a binomial distribution, the trials are
ideally independent, but also like the binomial distribution, for a large population,
the difference between results from independent or dependent trials is not too large.
In this case, the Poisson distribution can serve as a reasonable approximation for
trials that are not independent.
The Poisson distribution has a very wide area of application since it serves as
an approximation to the binomial distribution when the probability p of successes
is small while the number of trials n is large. Under these conditions the use of the
Poisson distribution is preferred since calculations with the Poisson distribution are
much easier. Further, the Poisson distribution best describes distributions which arise
from natural processes where values may change at any instant of time. An important
example from nuclear physics concerns particle counts in a fixed time interval (e.g.,
α or β- particles) or radiation (e.g., γ or X-rays) emitted by a radioactive substance
(e.g., nickel oxide).
The Poisson distribution (Fig. 5.5) is given by

Poisson distribution:

$$P(X = i) = p(i) = \frac{\mu^i}{i!}\, e^{-\mu}, \qquad i = 0, 1, 2, \ldots \tag{5.67}$$
7 The Poisson distribution was first presented in Poisson's 1837 book Recherches sur la probabilité des jugements en matière criminelle et en matière civile (Investigations into the Probability of Verdicts in Criminal and Civil Matters).
Fig. 5.5 The plots show the Poisson distribution for four values of the mean μ = 2, 5, 10 and 15
In Eq. (5.67), $\mu$ is the mean and $i$ represents the allowed values of the random variable $X$. That the distribution $p(i)$ satisfies the fundamental condition for a probability density, condition 2 of Eq. (5.13), is easily shown by summing $p(i)$ from $i = 0$ to $\infty$:

$$\sum_{i=0}^{\infty} p(i) = e^{-\mu} \sum_{i=0}^{\infty} \frac{\mu^i}{i!} = e^{-\mu} e^{\mu} = 1,$$

since the series is just the power series of the exponential function $e^{\mu}$.
The Poisson distribution, Eq. (5.67), can be derived from the binomial distribution in the following way. Begin with the binomial distribution, Eq. (5.60), and set $p = \lambda/n$:

$$p(i) = \binom{n}{i} p^i q^{n-i} = \frac{n!}{(n-i)!\, i!} \left( \frac{\lambda}{n} \right)^i \left( 1 - \frac{\lambda}{n} \right)^{n-i}. \tag{5.68}$$

As mentioned above, the Poisson distribution follows by taking the limit $p \to 0$ and $n \to \infty$ with $\lambda = np$ held fixed. Taking this limit, we use

$$\lim_{n \to \infty} \left( 1 - \frac{\lambda}{n} \right)^n = e^{-\lambda}, \qquad \lim_{n \to \infty} \frac{n(n-1)(n-2)\cdots(n-i+1)}{n^i} = 1, \qquad \lim_{n \to \infty} \left( 1 - \frac{\lambda}{n} \right)^{-i} = 1.$$

Substituting these results into Eq. (5.68) gives the Poisson distribution

$$P(X = i) = p(i) = \frac{\lambda^i}{i!}\, e^{-\lambda} = \frac{\mu^i}{i!}\, e^{-\mu},$$

where we have set $\lambda = \mu$ to obtain the last term. This completes the derivation. The Poisson distribution has the following important properties:
Mean: $\quad \mu$

Variance: $\quad \sigma^2 = \mu$

Standard deviation: $\quad \sigma = \sqrt{\mu}$
Proof that the mean $= \mu$: Substituting the Poisson distribution, Eq. (5.67), into the formula (5.39) for the mean gives

$$\text{Mean} = \langle X \rangle = \langle i \rangle = \sum_{i=1}^{\infty} i\, p(i) = \sum_{i=1}^{\infty} i\, \frac{\mu^i}{i!}\, e^{-\mu}$$

$$= \mu e^{-\mu} \sum_{i=1}^{\infty} \frac{\mu^{i-1}}{(i-1)!}, \quad \text{since the } i = 0 \text{ term is } 0$$

$$= \mu e^{-\mu} \sum_{j=0}^{\infty} \frac{\mu^j}{j!}, \quad \text{by setting } j = i - 1$$

$$= \mu e^{-\mu} e^{\mu}, \quad \text{since } \sum_{j=0}^{\infty} \frac{\mu^j}{j!} = e^{\mu}$$

$$= \mu. \tag{5.69}$$
Substituting Eqs. (5.69) and (5.70) into the formula for the variance, Eq. (5.50), gives

$$\text{Variance} = \sigma^2 = \langle X^2 \rangle - \langle X \rangle^2 = \mu^2 + \mu - \mu^2 = \mu,$$
$$P(0) = \frac{3.4^0}{0!}\, e^{-3.4} = 0.0333733$$

$$P(1) = \frac{3.4^1}{1!}\, e^{-3.4} = 0.113469$$

$$P(2) = \frac{3.4^2}{2!}\, e^{-3.4} = 0.192898.$$

Then

$$P(\le 2) = P(0) + P(1) + P(2) = 0.33974.$$
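A minimal Python transcription of this calculation:

from math import exp, factorial

def poisson_pmf(i, mu):
    # Eq. (5.67)
    return mu**i / factorial(i) * exp(-mu)

mu = 3.4
probs = [poisson_pmf(i, mu) for i in range(3)]
print(probs)        # approximately [0.033373, 0.113469, 0.192898]
print(sum(probs))   # P(<= 2) = 0.33974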
Drawing objects from a set of things (e.g., cards from a deck of cards) and replacing
each object before the next object is picked guarantees independence of the trials.
In this case, the binomial distribution can be used whenever there are two outcomes. Suppose a bag contains $N$ balls, $M$ of which are red and $N - M$ blue; then

$$\text{probability of a red ball} = p = \frac{M}{N},$$

$$\text{probability of a blue ball} = q = 1 - p = 1 - \frac{M}{N},$$

and Eq. (5.60) gives the probability $p(i)$ of picking $i$ red balls in $n$ trials:

$$p(i) = \binom{n}{i} p^i q^{n-i} = \binom{n}{i} \left( \frac{M}{N} \right)^i \left( 1 - \frac{M}{N} \right)^{n-i}, \qquad i = 0, 1, 2, \ldots$$
Suppose now the balls are picked without replacement. We again ask for the probability $p(i)$ of picking $i$ red balls in $n$ trials. The trials are no longer independent, so we cannot use the binomial distribution. Instead, we determine this probability by counting the number of ways of getting $i$ red balls in $n$ trials. As usual, we are greatly aided in counting these ways and the sample points by the formula for combinations. The number of ways of choosing $i$ red balls from the $M$ red balls in the bag is, noting that $i \le M$,

$$\text{Number of ways of choosing } i \text{ red balls from } M \text{ red balls} = \binom{M}{i}.$$

It follows that the total number of ways of choosing $i$ red balls (and hence $n - i$ blue balls) in $n$ trials is the product of the combinations:

$$\text{Total number of ways of choosing } i \text{ red balls in } n \text{ trials} = \binom{M}{i} \binom{N - M}{n - i}.$$

The number of sample points is given by the number of ways of choosing the $n$ picked balls from the total number of balls $N$, noting that $n \le N$:

$$\text{Number of sample points} = \binom{N}{n}.$$
The previous probability distribution functions we considered are all discrete. The
Gaussian distribution, also called the normal distribution or Gaussian probability
density, on the other hand, is continuous. It is, perhaps, the most important dis-
tribution since it describes the distributions resulting from a very wide variety of
random processes or random experiments. Further, under suitable conditions, the
Gaussian distribution approximates non-Gaussian distributions. For example, for a
large number n of trials, the binomial distribution approaches the Gaussian distri-
bution (Fig. 5.4), so that, in such cases, the often mathematically simpler Gaussian
distribution serves as a good approximation.
The Gaussian distribution was actually introduced by the French mathematician
Abraham de Moivre in 1733, who developed it as an approximation to the binomial
distribution for a large number of trials. De Moivre was concerned with calculating
probabilities in games of chance. It was brought to prominence in 1809 by the German mathematician Carl Friedrich Gauss, who applied the distribution to astronomical problems. Since this time it has become known as the Gaussian distribution. The
Gaussian distribution described so many data sets that in the latter part of the 19th
century the British statistician Karl Pearson coined the name ‘normal distribution’,
a name which, like the name ‘Gaussian distribution,’ has stuck to this day.
8 It is so called because its moment generating function (defined in Eq. (5.52)) can be expressed as a hypergeometric function.
Gaussian distribution:

$$P(X = x) = p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty \tag{5.72}$$
It is straightforward to show that the probability density $p(x)$ satisfies the fundamental condition for a probability function, condition 2 of (5.14). To do this we first introduce the standardised random variable $Z$ corresponding to the random variable $X$:

$$Z = \frac{X - \mu}{\sigma}. \tag{5.73}$$

Changing the variable of integration to $z = (x - \mu)/\sigma$ gives

$$\frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\, dz. \tag{5.74}$$

From tables,

$$\int_{-\infty}^{\infty} e^{-az^2}\, dz = 2\int_{0}^{\infty} e^{-az^2}\, dz = \sqrt{\frac{\pi}{a}}.$$

Substituting $a = \frac{1}{2}$ gives

$$\int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\, dz = \sqrt{2\pi}. \tag{5.75}$$
Hence

$$\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx = \frac{\sqrt{2\pi}}{\sqrt{2\pi}} = 1,$$

as required. The corresponding Gaussian distribution function is

$$F(x) = P(X \le x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(v-\mu)^2}{2\sigma^2}}\, dv. \tag{5.76}$$
Substituting the standardised random variable $Z = z$ given in Eq. (5.73) into Eq. (5.72) gives the standard Gaussian probability distribution or standard Gaussian probability density,

$$p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}, \tag{5.77}$$

while substituting it into Eq. (5.76) gives the standard Gaussian distribution function

$$F(z) = P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{u^2}{2}}\, du = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \int_{0}^{z} e^{-\frac{u^2}{2}}\, du. \tag{5.78}$$

The distribution function $F(z)$ is closely related to the error function $\mathrm{erf}(z)$, defined by

$$\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_{0}^{z} e^{-u^2}\, du,$$

so that $F(z) = \frac{1}{2}\left[ 1 + \mathrm{erf}\!\left( z/\sqrt{2} \right) \right]$. From the symmetry of $p(z)$ it also follows that

$$F(-z) = 1 - F(z).$$
The following relations, which follow from the above definitions, are useful in calculations of probabilities such as $P(X \le a)$ or $P(a \le X \le b)$:

$$F(a) = P(X \le a) = P\!\left( \frac{X - \mu}{\sigma} \le \frac{a - \mu}{\sigma} \right) = F\!\left( \frac{a - \mu}{\sigma} \right) = F(A) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{A} e^{-\frac{z^2}{2}}\, dz, \tag{5.79}$$

where $A = (a - \mu)/\sigma$.
Fig. 5.6 The plots show the Gaussian probability density (Gaussian distribution) for mean μ = 0
and four values of the standard deviation, σ = 0.14, 0.21, 0.35 and 0.64
$$P(a \le X \le b) = P\!\left( \frac{a - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{b - \mu}{\sigma} \right) = F\!\left( \frac{b - \mu}{\sigma} \right) - F\!\left( \frac{a - \mu}{\sigma} \right)$$

$$= F(B) - F(A) = \frac{1}{\sqrt{2\pi}} \int_{A}^{B} e^{-\frac{z^2}{2}}\, dz \tag{5.80}$$
Mean: $\quad \mu$

Variance: $\quad \sigma^2$

Standard deviation: $\quad \sigma$
Proof that the mean $= \mu$: Beginning with the definition of the mean, we have

$$\langle X \rangle = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx.$$

Changing the variable of integration to the standardised variable $z = (x - \mu)/\sigma$, so that $x = \mu + z\sigma$ and $dx = \sigma\, dz$, gives

$$\langle X \rangle = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (\mu + z\sigma)\, e^{-\frac{z^2}{2}}\, dz = \frac{\mu}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\, dz + \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z\, e^{-\frac{z^2}{2}}\, dz. \tag{5.81}$$

The first integral is the same as in Eq. (5.75) and is equal to $\sqrt{2\pi}$. The second integral can be evaluated by noting that

$$z\, e^{-\frac{z^2}{2}} = -\frac{d}{dz}\, e^{-\frac{z^2}{2}}.$$

Using this result, the second integral becomes

$$\int_{-\infty}^{\infty} \left( -\frac{d}{dz}\, e^{-\frac{z^2}{2}} \right) dz = \left[ -e^{-\frac{z^2}{2}} \right]_{-\infty}^{\infty} = 0,$$

so that

$$\langle X \rangle = \mu. \tag{5.82}$$
For the variance we also need

$$\langle X^2 \rangle = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x^2\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx.$$

The same change of variable, $x = \mu + z\sigma$, splits this into three integrals (Eq. (5.83)), in $z^2 e^{-z^2/2}$, $z\, e^{-z^2/2}$ and $e^{-z^2/2}$, whose values are $\sqrt{2\pi}$, $0$ and $\sqrt{2\pi}$, respectively. Substitution of these values of the three integrals into Eq. (5.83) gives

$$\langle X^2 \rangle = \frac{\sigma^2}{\sqrt{2\pi}}\sqrt{2\pi} + 0 + \frac{\mu^2}{\sqrt{2\pi}}\sqrt{2\pi} = \sigma^2 + \mu^2. \tag{5.84}$$
Substituting Eqs. (5.82) and (5.84) into the formula for the variance, Eq. (5.50), gives

$$\text{variance} = \langle X^2 \rangle - \langle X \rangle^2 = \sigma^2 + \mu^2 - \mu^2 = \sigma^2,$$
Example: Let $X$ be Gaussian with mean $\mu = 4$ and standard deviation $\sigma = 2$. Then

$$P(3 \le X \le 8) = \frac{1}{2\sqrt{2\pi}} \int_{3}^{8} e^{-\frac{(x-4)^2}{2(4)}}\, dx = 0.668712,$$
From this theorem it follows that for large $n$ (with neither $p$ nor $q$ too small) a standardised Gaussian distribution is a good approximation to a standardised binomial distribution.

Because we are approximating a discrete distribution with a continuous one, a correction called the continuity correction has to be made. Thus, to apply the approximation, we first make the correction

$$P(X = i) = P\!\left( i - \frac{1}{2} \le X \le i + \frac{1}{2} \right).$$
We see that there are two approximations to the binomial distribution: the Poisson approximation, which is good when $n$ is large and $p$ is small, and the Gaussian approximation, which is good if $n$ is large and neither $p$ nor $q$ is too small. In practice, the Gaussian approximation is very good if both $np$ and $nq$ are greater than 5.
$$P(5 \le X \le 8) = P(X=5) + P(X=6) + P(X=7) + P(X=8) = \frac{627}{1024} = 0.6123.$$

Standardising (with $\mu = np = 5$ and $\sigma^2 = npq = 2.5$) gives

$$P(4.5 \le X \le 8.5) = P\!\left( \frac{4.5 - 5}{\sqrt{2.5}} \le \frac{X - 5}{\sqrt{2.5}} \le \frac{8.5 - 5}{\sqrt{2.5}} \right) = P(-0.3162 \le Z \le 2.2135) \approx 0.6106.$$

Comparing with the binomial result $P(5 \le X \le 8) = 0.6123$, we see that the Gaussian approximation is quite good. Indeed, for our case $np = nq = 5$ satisfies the criterion for the Gaussian distribution to be a good approximation to the binomial distribution (Fig. 5.7).
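A short Python sketch reproduces both numbers, applying the continuity correction of the previous subsection:

from math import comb, erf, sqrt

n, p = 10, 0.5    # 10 tosses of a fair coin
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(5, 9))
print(exact)      # 627/1024 = 0.6123...

mu, sigma = n * p, sqrt(n * p * (1 - p))
F = lambda z: 0.5 * (1 + erf(z / sqrt(2)))   # standard Gaussian distribution function
approx = F((8.5 - mu) / sigma) - F((4.5 - mu) / sigma)   # continuity correction
print(approx)     # 0.6106..., close to the exact binomial value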
Fig. 5.7 A Gaussian probability density approximation to a binomial distribution for the number
of heads obtained in 10 tosses of a coin
Since the binomial distribution approaches both the Gaussian and the Poisson distributions in appropriate limits, the Gaussian distribution should be related to the Poisson distribution. This is indeed the case: for large $\mu$, the Poisson distribution is well approximated by a Gaussian distribution with mean $\mu$ and variance $\sigma^2 = \mu$.
We conclude with three important theorems that have played an important role in
probability theory.
The Weak Law of Large Numbers was first derived by the Swiss mathematician Jacques Bernoulli in his 1713 book Ars Conjectandi (The Art of Conjecturing) for the special case of Bernoulli or binomial random variables (i.e., random variables produced by independent trials having two outcomes). It is stated as follows:
An early version of the central limit theorem was proved by De Moivre around 1733 for the special case of Bernoulli random variables with $p = \frac{1}{2}$. Laplace also presented
a special case of the central limit theorem in his 1812 book Théorie analytique des
probabilités (Analytic Theory of Probability). Later, Laplace extended the theorem
to arbitrary p. Laplace showed that the distribution of errors in large data samples
gathered from astronomical observations was approximately Gaussian. Since error
analysis is fundamental to all scientific experiment, Laplace’s central limit theorem
is regarded as a very important contribution to science.
The central limit theorem states that the sum of a large number of independent
random variables is approximated by a value found from a Gaussian distribution. A
more precise statement follows:
The Strong Law of Large Numbers was derived in 1909 for the special case of
Bernoulli random variables by the French mathematician Emile Borel using the
newly introduced measure theory. A general derivation was given later by A. N.
Kolmogorov. The Strong Law states that the average of a sequence of independent
random variables having the same distribution is certain (probability 1) to converge
to the mean of the distribution. More precisely, the theorem may be stated thus:
5.8 Problems
6. An example of the use of Theorem 5.2.6. Consider again events A and B and the trial of question 5. By calculating the three probabilities $P(A)$, $P(A \cap B)$ and $P(A \cap B')$ by direct counting, confirm Theorem 5.2.6.
7. An example of the use of Theorem 5.2.7. Let A = the hearts suit of a deck of 52 cards. It consists of the mutually exclusive events: A1 = hearts ace to 3, A2 = hearts 4 to 7, A3 = hearts 8 to 9 and A4 = 10, jack, queen, king of hearts. The trial consists of drawing a card. By determining the probability P(A) that a heart is drawn and the probabilities of A occurring with A1, A2, A3 and A4, confirm Theorem 5.2.7.
8. Conditional probability. Sampling with replacement. Consider a bag containing 3 blue balls and 4 yellow balls. The trial is the picking of a ball, and it is repeated twice. Determine the probability of picking a yellow ball, given that a yellow ball has already been picked.
9. The multiplication rule (I). Sampling with replacement. Consider the ball-picking
trials of question 8. Calculate the probability P(A ∩ B) of picking two yellow
balls, with replacement, using the multiplication rule P(A ∩ B) = P(B|A)P(A),
Eq. (5.2).
10. Conditional probability. Sampling without replacement. Consider again question 8, but this time with the first picked ball not replaced. Again, calculate the probability of picking a yellow ball given that a yellow ball has already been picked.
11. The multiplication rule (II). Sampling without replacement. Again, consider
question 8 and ask for the probability P(A ∩ B) of picking 2 yellow balls from
the bag using the multiplication rule P(A ∩ B) = P(B|A)P(A), Eq. (5.2), but this
time without replacing the first picked ball.
12. Multiplication Rule (III). Drawing cards (I). What is the probability of drawing
an 8 and a 9 from a deck of 52 cards with and without replacing the first chosen
card?
13. Multiplication Rule (IV). Drawing cards (II). What is the probability of drawing
three kings from a deck of 52 cards with and without replacing the first chosen
card?
14. Permutations. How many permutations of 6 letters can be made from the letters
of the word ‘hippopotamus’?
15. Combinations. Consider again the arrangement of the letters of the word ‘hip-
popotamus’ in groups of 6 letters, but this time the different order of the same 6
letters does not matter. In other words, how many combinations of 6 letters can
be made from the word ‘hippopotamus’?
16. Use of combinations and permutations in the calculation of probabilities.
Four balls are picked, without replacement, from a bag containing 6 yellow balls
and 4 green balls. Determine the probability of picking 2 yellow balls and 2
green balls using (a) permutations and (b) combinations.
17. Probability function. Consider the sample space
produced by tossing a coin three times. The random variable X = number of heads
can take on values X = 0, 1, 2, 3. Write down the probability function for X .
18. Distribution function. Write down the distribution function corresponding to the
probability function of problem 17.
19. Probability density and distribution function. Let the discrete random variable
X = product of the two numbers when two dice are thrown. Determine the
probability density and distribution function for X .
20. Continuous probability density. The random variable X is described by the probability density

$$p(x) = \begin{cases} 0 & \text{for } x < -1 \\ \dfrac{e^{-x^2}}{\sqrt{\pi}\,\mathrm{erf}(1)} & \text{for } -1 \le x \le 1 \\ 0 & \text{for } x > 1. \end{cases}$$
Find (a) the distribution function, and (b) the probability P(−0.5 ≤ X ≤ 0.5)
using Eq. (5.15) and F(b) − F(a) of Eq. (5.17).
21. Discrete joint probabilities. 3 balls are picked from a bag containing 3 orange
balls, 4 yellow balls and 2 green balls. Let X = number of orange balls chosen, and
Y = number of yellow balls chosen. (a) Define the joint probability density P(X =
x, Y = y) = p(x, y), (b) Determine the marginal probability density functions
Px (X = x) = px (x) and Py (Y = y) = py (y), and (c) Draw a table to determine
the various marginal probabilities and check the answer by finding the total
probability.
22. Continuous joint probability density. The continuous joint probability density for two random variables X and Y is defined by

$$p(x, y) = \begin{cases} e^{-2x} e^{-y/2} & \text{for } 0 \le x < \infty,\ 0 \le y < \infty \\ 0 & \text{otherwise.} \end{cases}$$
Determine (a) P(X ≥ 1, Y ≤ 1), (b) P(X ≤ Y ) and (c) P(X ≤ a).
23. Discrete conditional probability density. Consider the discrete joint probability
distribution
for the random variables X and Y . Calculate the conditional probability density
p(x|y) given that Y = y = 1.
24. Continuous conditional probability density. Consider the continuous joint probability density

$$p(x, y) = \begin{cases} \dfrac{2}{21}(2x - 4)(3y - 5) & \text{for } 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise.} \end{cases}$$
$$p(x) = \frac{2}{\sqrt{\pi}}\, e^{-(2x-3)^2}.$$
28. Mean of a function of a random variable. The following are 6 measured values
of one side of a cubical water tank in units of metres: 0.6986, 0.7634, 0.7286,
0.6629, 0.7041 and 0.6629. Let the random variable X = the measured length of
one side of the water tank, and the function of the random variable g(X ) = the
volume of the tank = X 3 . Find the means of X and g(X ).
29. Variance and standard deviation of a set of discrete quantities. Consider only the diamond suit of a deck of cards. Let the random variable X = drawing an even-numbered diamond. Find the variance and standard deviation of X. Note that ace, jack, queen and king are counted as 1, 11, 12 and 13, respectively.
30. Variance and standard deviation of a continuous quantity. Find the variance and
standard deviation of the random variable X described by the probability density
$$p(x) = \begin{cases} 4x^3 & \text{for } 0 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases}$$
31. Variance and standard deviation of a set of discrete joint distributions. Consider
again problem 21 in which 3 balls are picked from a bag containing 3 orange
balls, 4 yellow balls and 2 green balls. Find the mean, variance and standard
deviation of X and Y . Also find the covariance and the correlation coefficient.
32. Variance and standard deviation of a set of continuous joint distributions. Deter-
mine the mean, variance, standard deviation, covariance and the correlation coef-
ficient of the following joint probability density:
$$p(x, y) = \begin{cases} \frac{3}{5}(2x^2 + 4xy) & \text{for } 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise.} \end{cases}$$
33. Coin tossing. Application of the binomial distribution I. Three coins are tossed.
Determine the probability density for the number of heads obtained using the
binomial distribution.
34. Defective fuses. Application of the binomial distribution II. Electric fuses are sold in packets of 30. All fuses have an equal probability p = 0.005 of being defective. The probability of one fuse being defective is independent of the probability of another fuse being defective. A money-back guarantee is offered if more than one fuse in a packet is defective. What percentage of fuse packets are refunded?
35. Faulty fuses. Application of the Poisson distribution I. Consider problem 34. This time, calculate the probability that 0, 1, 2 and 5 fuses out of every 200 manufactured are faulty using the Poisson distribution with μ = np = 200(0.005) = 1. Also calculate these probabilities using the binomial distribution for comparison.
36. Radioactive counting experiment. Application of the Poisson distribution II. Consider an experiment in which the number of alpha particles emitted per second by 2 g of a radioactive substance is counted. Given that the average number of alpha-particle counts per second is 5.7, calculate the probability P(≤ 2) that no more than 2 alpha particles are counted in a 1-second interval.
37. Calculation of probabilities of a Gaussian distributed random variable. Let X be
a random variable described by a Gaussian probability density with mean μ = 5
and standard deviation σ = 3. Calculate the probability P(2 ≤ X ≤ 11) that X
has a value in the interval 2 ≤ X ≤ 11 using (a) Eq. (5.15) with p(x) given by
Eq. (5.72), and (b) Eq. (5.80).
38. Gaussian approximation to the binomial distribution. A coin is tossed 20 times.
The random variable X = number of heads. Calculate the probability P(14 ≤
X ≤ 17) that either 14, 15, 16 or 17 heads are thrown using (a) a binomial
distribution, and (b) a Gaussian approximation.
Chapter 6
Use of Computers
In this chapter we want to show how to use computer software packages to calculate
quantities from measured values in experiments, to calculate standard errors in the
mean, to represent data graphically and to draw graphs of the best straight line.
Experimental results are most usefully represented by linear graphs. For this, we will
use the method of least squares to find the best straight line and associated errors. To
represent distributions of frequencies and relative frequencies of measured values,
bar charts or histograms are very useful, so we will show how to produce these. In
addition we will show how to fit curves to these distributions.
Specifically we will consider four software packages: ©Microsoft Excel, ©Maple,
©Mathematica and ©Matlab.1 Of course, many other software packages are avail-
able, with a number specific to plotting graphs. Once this chapter is mastered, it
should not be difficult to transfer the methods shown here to other packages. We
have chosen Excel, a spreadsheet package, because it is commonly available and
its table orientated format makes calculations very easy. We have chosen Maple,
Mathematica and Matlab because of their tremendous mathematical and graphical
power.
Excel produces charts and line graphs fairly automatically and simply. Plotting
functions in Excel is a little limited. The mathematical packages, Maple, Mathe-
matica and Matlab, offer much more control and flexibility in producing charts and
graphs. Some might prefer to use Excel for calculation and then use the mathematical
packages to plot the results.
Computers are great time savers, but it is important to first master calculations with
a calculator and graph plotting by hand before adopting their use. Computers alone
cannot produce a well-thought out presentation of results. This can only be achieved
by a thorough understanding of the mathematical concepts and a good understanding
of how to construct a graph (as described in earlier chapters) or chart.
1 Excel, Maple, Mathematica and Matlab are registered trademarks of the Microsoft Corporation, Waterloo Maple Inc, Wolfram Research Inc, and The MathWorks Inc, respectively.
It is our view that the best way to learn how to apply software packages to the solu-
tion of problems is by example. To this end we solve two examples using each of the
software packages in turn. The first example involves plotting bar charts/histograms,
frequency curves and relative frequency curves to represent the frequencies and rel-
ative frequencies of measured building heights. Here we will present a number of
graphical options which control the final appearance of a chart or graph. The example
also involves the calculation of important statistical quantities, namely, the mean, the
variance, the standard deviation and the standard error in the mean.
The second example is an experiment to determine the acceleration due to gravity
by measuring the times for an object to fall through various heights. This example
demonstrates how to use the method of least squares to calculate the slope of the
best straight line and where it cuts the y-axis, together with their errors, and then to
use the slope of the best straight line to calculate the acceleration due to gravity. The
graph of the best straight together with the data points will be given. Since this is a
case of a line through the origin, there are two ways to draw the best straight line.
One way, and the easiest, is to draw the line with the best slope through the origin.
The other is to use both the slope of the best straight line and where it cuts the y-axis
to plot the line.
For each of the mathematical packages we will write a separate program for each
example. For Excel we will solve each example as a spread sheet. We will present
each program in as close a format as possible to the package’s format. For Maple
and Mathematica we will suppress all but essential output, including charts or graphs
included only to exemplify various alternative graphical options. The charts or graphs
directly relevant to the examples will be exported from the program and incorporated
into the chapter text. Hence, charts and graphs may not appear in the same posi-
tion as when the programs are run within Maple or Mathematica. Both Maple and
Mathematica have interactive interfaces with the output from the command line pro-
duced in the line immediately following the command unless output is suppressed.
Matlab offers both a Command Window and an Editor Window. The Command
Window is an interactive window similar to that of Maple and Mathematica. The
Editor Window allows a program to be written much like a Fortran program. After
running the program, the output appears in the Command Window, while the graph-
ical output appears in a separate graphics window.
The mathematical packages require a knowledge of various commands and
options for calculations and for producing charts and graphs. Though the commands
of each of the mathematical packages have similarities there are crucial differences
in syntax. We will explain each command or option within the program itself either
just before use or, mostly, just after use. This should serve to explain the meaning
of each command and option in the command line. The command line itself will
serve to show the syntax of each command and option. Where necessary, we will
add a more detailed explanation of a command or option. Where more information
on a command or option is desired, or to look up more options, reference can be
made to the package’s help. Matlab needs a bit more explanation, so we will give
an introduction to the use of Matlab in the chapter text before presenting the Matlab
program. For Excel, we will present the spreadsheet. The results of calculation cells
are produced by hidden formulae. The address of each cell is specified by a row
number and a column letter, e.g., C3 specifies column C, row 3. By reference to the
cell's address we will show the hidden formula in the body of the chapter text.
For each example, we will comment on the results after the solutions of each
of the four packages have been presented, though some comments will be included
either in the program itself or in the program section of the chapter text.
For the mean, variance and standard deviation we will use formulae (5.39), (5.44) and (5.46):

$$\langle X \rangle = \frac{\sum_{i=1}^{n} f(x_i)x_i}{n} = \sum_{i=1}^{n} p(x_i)x_i$$

$$\sigma^2 = \frac{\sum_{i=1}^{n} f(x_i)(x_i - \mu)^2}{n}$$

$$\sigma = \left[ \frac{\sum_{i=1}^{n} f(x_i)(x_i - \mu)^2}{n} \right]^{1/2}$$
The formulae we will need for the method of least squares are (4.5), (4.6) and (4.10) to (4.13):

$$m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\, y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$c = \bar{y} - m\bar{x}$$

$$\Delta m = \left[ \frac{\sum_{i=1}^{n} d_i^2}{D(n-2)} \right]^{1/2}$$

$$\Delta c = \left[ \left( \frac{1}{n} + \frac{\bar{x}^2}{D} \right) \frac{\sum_{i=1}^{n} d_i^2}{n-2} \right]^{1/2}$$

$$D = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

$$d_i = y_i - Y_i = y_i - m x_i - c$$
For lines through the origin we set $c = 0$ and use the formulae (4.14) to (4.16):

$$m = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$$

$$\Delta m = \left[ \frac{1}{n-1}\, \frac{\sum_{i=1}^{n} d_i^2}{\sum_{i=1}^{n} x_i^2} \right]^{1/2}$$

$$d_i = y_i - Y_i = y_i - m x_i$$
As we mentioned in Sect. 4.3, even when we know that the line passes through the
origin it can still be useful to calculate c since the amount by which the best line
misses the origin gives a visual indication of errors, particularly systematic errors.
In our second example, the line passes through the origin. We will use both the full
formula and the ‘through the origin’ formulae to calculate the best line.
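As a language-neutral reference for the package-specific programs that follow, here is a minimal Python/numpy transcription of the least-squares formulae above (the free-fall numbers are made up for illustration; they are not the data of Example 2):

import numpy as np

def least_squares(x, y):
    # Best straight line y = m x + c with standard errors in m and c,
    # following Eqs. (4.5), (4.6) and (4.10) to (4.13)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    D = np.sum((x - xbar)**2)
    m = np.sum((x - xbar) * y) / D
    c = ybar - m * xbar
    d = y - m * x - c                                          # residuals
    dm = np.sqrt(np.sum(d**2) / (D * (n - 2)))
    dc = np.sqrt((1 / n + xbar**2 / D) * np.sum(d**2) / (n - 2))
    return m, c, dm, dc

# Hypothetical free-fall data, h = (g/2) t^2, plotted as h against t^2:
t_sq = [0.041, 0.082, 0.122, 0.163, 0.204]   # t^2 in s^2
h = [0.20, 0.40, 0.60, 0.80, 1.00]           # height in m
m, c, dm, dc = least_squares(t_sq, h)
print(m, c, dm, dc)                          # the slope m estimates g/2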
All four software packages have inbuilt formulae for the mean, variance and
standard deviation. Note though, that the formulae refer to the sample variance and
sample standard deviation. In what follows, we prefer to enter the above formulae
by hand.
Before proceeding, it is perhaps worth noting that a histogram is a chart consisting
of contiguous columns with widths proportional to the class interval (continuous data
is divided into intervals, such that all data in an interval is regarded as a class, and
the number of data in the interval is counted as the frequency of that class) and with
areas proportional to the relative frequencies of the data. A bar chart is a histogram
in which the class intervals are equal for continuous data, or, in the case of discrete
data, each bar represents the frequency of each value of the discrete data.
For Example 1 we will show how to fit a curve to the charts and/or data points. The
fitted curve will not necessarily be that good given that we are considering only a small
number of data points, but we include it by way of example. Curve fitting methods rely
on selecting a suitable equation, then finding the coefficients that produce the curve
that most closely fits the data. The least squares method extended to include curves
as well as straight lines uses polynomials of various degrees. Maple, Mathematica
and Matlab provide both commands and a graphical interface for curve fitting. In
Maple we have used the graphical interface, while for Mathematica and Matlab we
used their curve fitting commands: the Fit command for Mathematica and the polyfit
command for Matlab. Their use is described in the respective programs.
(iii) Use Maple's graphical curve-fitting interface to produce a frequency curve fitted to a frequency histogram.
(iv) Use Mathematica's Fit command to produce a frequency curve fitted to a frequency bar chart.
(v) Use Matlab's polyfit command to produce a frequency curve fitted to the frequency data points.
We will solve Example 1 using each of the four software packages. Once all four
solutions are obtained some comments on the results will follow.
The aim of Example 1, as indicated in the example, is to show how to use the four
packages (i) to calculate the mean, variance, standard deviation and standard error
in the mean, (ii) to produce bar charts and histograms, (iii) to plot data points and
(iv) for curve fitting.
The calculation of the mean height together with the standard error in the mean is
done in the spreadsheet shown in Fig. 6.1. The frequency and relative frequency bar
charts and line graphs are shown in Fig. 6.2.
Refer to Fig. 6.1 of the Excel spreadsheet solution. The first (blue) row labels
the columns, while the first (blue) column labels the rows. The measured building
heights are entered in column B, rows 2 to 17 or, more conveniently stated, cells B2
to B17. In cell B18 we entered a formula to calculate the mean, while cell B19 is a
text cell in which we have rounded the mean by hand.
To enter a formula in a cell, activate the cell (by double clicking the left mouse
button) and type an equal sign followed by the formula. The formula appears in one
of the header text bars, while the result of the formula appears in the cell. The formula
in cell B18 looks like this:

=SUM(B2:B17)/16
The formula adds the contents of cells B2 to (the colon acts as ‘to’ ) B17 and divides
the answer by 16. Typically, and conveniently, formulae are written in terms of
cells rather than numbers. Below, we give the formulae for each cell that contains a
formula. With the mean calculated, the residuals can be calculated and this is done
in column C, rows C2 to C17. The formula for the residual in cell C2 is
=B2-33.48875,
where 33.48875 is the mean calculated in cell B18. Instead of laboriously typing the
formula in the rest of the cells in the column C, cell C2 can be highlighted, copied
and its contents pasted to the remaining cells C3 to C17. Excel will adjust the row
Fig. 6.1 Excel spreadsheet to calculate the height of a building and the standard error in the mean
Fig. 6.2 Excel bar charts of the frequencies and relative frequencies of the building heights, together
with joined data point plots of the frequencies and relative frequencies of the building heights
numbers accordingly. The formula for the square of the residual in cell D2 has the form

=C2*C2

and is repeated in cells D3 to D17 with cell addresses adjusted accordingly. The variance is given in cell D18:

=SUM(D2:D17)/16

The standard deviation, in cell F20, and the standard error in the mean, in cell F21, are

=SQRT(D18)

=F20/SQRT(16-1)

Cell F22 rounds off the standard error in the mean to 2 dec. pl.:

=ROUND(F21,2)
The various plots are made from column G, rows 2 to 10, containing the frequencies, and from column H, rows 2 to 10, containing the relative frequencies. Column H is obtained by dividing column G by the total number of measurements (16) as, for example, in cell H2:

=G2/16

In cell G11 the frequencies are summed to check that they sum to the total number of measurements (16):

=SUM(G2:G10)

In cell H11 the relative frequencies are summed to check they sum to 1:

=SUM(H2:H10)
From Fig. 6.1 we get the mean building height and its standard error in the mean:
The original measurements were made to 4 significant figures (sf) so that the calcu-
lated mean cannot be more accurate than this. Hence, the standard error in the mean
must be rounded to 2 decimal places so that it corresponds to the 4th sf of the mean
height. The answer is therefore given to 4 sf with the error of the 4th sf indicated.
The Excel inbuilt functions AVERAGE and STDEV can be used to find the mean and sample standard deviation, respectively. We have preferred to enter the formulae
by hand, first, because it is more instructive, and second, because it is the standard
deviation that is needed to calculate the standard error in the mean, not the sample
standard deviation.
To produce a chart, follow these steps:
1. Highlight the column or row containing the data to be plotted. If more than one
row or column is highlighted each set of data is plotted on the same axes.
2. Select (by left clicking the mouse button) the top menu bar option Insert. A
sub-menu bar appears containing the group Charts. The Charts group offers
the following choice of charts: Column, Line, Pie, Bar, Area, Scatter and Other
Charts. Note that reference is not made to histograms. What are called Column
charts are bar charts, while what are called ‘bar charts’ are horizontal bar charts.
3. Select the type of chart desired. For example, select Column. A further menu
appears offering choices such as 2-D Column, 3-D Column etc.
4. Select 2-D Column. The chart is automatically produced.
5. Once the chart is produced, a number of format options appear in the top menu
bar. These include: colour of the columns, title bar, axes labels, a legend, grid, and
labelling of the columns. A variety of positions and combinations are offered.
The frequency charts of Fig. 6.2 were produced by highlighting (selecting) cells
G2 to G10 containing the frequencies and selecting Column for the first frequency
chart and Line for the second frequency chart. The relative frequency chart and line
chart of Fig. 6.2 are similarly produced by highlighting cells H2 to H10 containing
the relative frequencies.
[
[ Histograms
[ Histograms are produced with the Histogram command, which plots the frequencies
of a list of data. In our case the list is bheights. The frequencies of the data are
automatically counted to produce the histogram. Because of this automation, Histogram
will not be used to represent relative frequencies. To plot the relative frequencies,
pointplot (see below) will be used.
[
where some options are [number of grid lines, linestyle = dot, color =
blue, thickness = 2]. Note also that gridlines = default assigns the default
values to both axes.
[ The horizontal and vertical grid lines can be given individual options
using axis[dir] = [gridlines = [number of grid lines, color = colour name],
thickness = t], where dir = 1 denotes the horizontal axis
and dir = 2 the vertical axis. An example is:
[> PP2 := Histogram(bheights, frequencyscale = absolute,
view = [33.43..33.54, 0..4], axesfont = [Calibri, roman, 12],
titlefont = [Calibri, roman, 18], labelfont = [Calibri, roman, 14],
title = “Frequency of Building Heights", axes = frame,
labels = [“Building height (m)", “Frequency"],
labeldirections = [horizontal, vertical], color = “Orange",
axis[1] = [gridlines = [10, color = red], thickness = 0],
axis[2] = [gridlines = [10, color = blue], thickness = 0]) :
[> display(PP2)
[ Graph PP2 is shown in Fig. 6.4.
[ thickness = t, where t is a non-negative integer; t = 0 gives the thinnest line.
[ labels = [“x-axis label", “y-axis label"] - again, notice the speech marks,
they indicate a character string.
[ labeldirections = [horizontal, vertical] - specifies that the x-axis label should
be horizontal, while the y-axis label should be vertical.
[ style = polygon - gives a different appearance to the bars.
[ color = “name of the colour" - specifies the colour of the bars. Notice that
the colour names must be enclosed in speech marks. Examples of colour
names are “Red", “Blue", “Orange" among many others. Notice also that
the colour names must be capitalised.
[
[ AS AN ALTERNATIVE TO CONTROLLING THE APPEARANCE OF A CHART
OR GRAPH BY COMMAND LINE OPTIONS AS ABOVE, MAPLE OFFERS
A GRAPHICS MENU BAR. THE GRAPHICS MENU BAR IS INVOKED
BY RIGHT CLICKING THE MOUSE BUTTON ON THE GRAPHIC.
[
[ Data plot, with data points joined by a line, of the frequencies of the
heights in bheights
[> PP3 := pointplot(bheightspnt, style = pointline,
symbol = circle, view = [33.43..33.54, 0..4], axesfont = [Calibri,
roman, 12],
titlefont = [Calibri, roman, 18], labelfont = [Calibri, roman, 14],
title = “Frequency of Building Heights", axes = frame,
labels = [“Building heights (m)", “Frequency"],
labeldirections = [horizontal, vertical], color = “Orange") :
[> display(PP3)
[ The relative frequencies are entered as the list of coordinates bheightspntrf:
[> bheightspntrf := [[33.45, 1/16], [33.46, 1/16], [33.47, 3/16],
[33.48, 3/16], [33.49, 2/16], [33.50, 2/16], [33.51, 2/16],
[33.52, 1/16], [33.54, 1/16]]:
[ Data plot, with data points joined by a line, of the relative frequencies of
the heights in bheights
[> PP4 := pointplot(bheightspntrf, style = pointline,
symbol = circle, view = [33.43..33.54, 0..3/16], axesfont = [Calibri,
roman, 12],
titlefont = [Calibri, roman, 14], labelfont = [Calibri, roman, 14],
title = “Relative Frequency of Building Heights", axes = frame,
labels = [“Building heights (m)", “Relative frequency"],
labeldirections = [horizontal, vertical], color = “Orange") :
[
[
[ Curve Fitting
[ By way of example, we will try to fit a curve to the frequency data points
listed in bheightspnt.
[ We use Maple’s Interactive command which invokes a graphical inter-
face to enable curve fitting. Information on how to use the graphical
curve fitting interface can be found under curve fitting in Maple’s excel-
lent Help. There are two ways of using the Interactive command. First,
the data points are included in the command. Second, the data points
are left out, in which case the first graphic that appears is a table to enter
the data points. Here, we will invoke the graphical interface using the
Interactive command with the data points included. Better results are
obtained if the values of the heights are replaced by their position in the list
bheights, with repeated values counted as being in the same position.
Thus, the list of coordinates bheightspnt, [[33.45,1], [33.46,1], [33.47,3],
[33.48,3], [33.49,2], [33.50,2], [33.51,2], [33.52,1], [33.54,1]], is mapped
onto the coordinates [[1,1], [2,1], [3,3], [4,3], [5,2], [6,2], [7,2],
[8,1], [9,1]]. A number of curve fitting options are offered. We found
that the least squares option with a polynomial containing terms up to x^8
gave the best curve. There is a choice of returning a graph or the coefficients
of the polynomial. We preferred to have the values of the ‘best fit’ coefficients
returned.
[ The following command invokes the graphical interface
[> Interactive([[1,1], [2,1], [3,3], [4,3], [5,2], [6,2], [7,2], [8,1], [9,1]], x)
[ By selecting the interpolant as the ‘on Done’ return option, the
interpolant function is placed in the worksheet.
[ Definition of the interpolant function using the coefficients of the inter-
polant returned by Interactive:
[> INTERPBH := (x) → a0 + a1·x + a2·x^2 + · · · + a8·x^8
[ Here a0, …, a8 stand for the long rational coefficients of the interpolant
returned by Interactive.
[ Note the syntax for defining a function in Maple.
[ Plot of fitted curve
[> PP5 := plot(INTERPBH(x), x = 0..10, view = [0..10, 0..3.5],
axesfont = [Calibri, roman, 12], titlefont = [Calibri, roman, 18],
labelfont = [Calibri, roman, 14],
title = “Frequency of Building Heights", axes = frame,
labels = [“Building heights (m)", “Frequency"],
labeldirections = [horizontal, vertical], color = “blue",
axis[1] = [thickness = 1.5], axis[2] = [thickness = 1.5]):
[ We have suppressed the display of the plot as it is more interesting to
combine the fitted curve PP5 with the frequency histogram. To do this
the x- and y-axis scales should match. This can be achieved by first
creating a new list, bheightsp, by replacing each height by its position
in the list bheights, writing each position a number of times equal to its
frequency, and then plotting a histogram of bheightsp:
[> bheightsp := [1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 7, 8, 9] :
[
[ Plot of a histogram of the frequencies of the heights in bheightsp
[> PP6 := Histogram(bheightsp, frequencyscale = absolute,
view = [0..10, 0..3.5], axesfont = [Calibri, roman, 12],
titlefont = [Calibri, roman, 18], labelfont = [Calibri, roman, 14],
title = “Frequency of Building Heights", axes = frame,
labels = [“Heights position label", “Frequency"],
labeldirections = [horizontal, vertical], color = “Orange",
axis[1] = [thickness = 1.5], axis[2] = [thickness = 1.5]):
[ Again we have not displayed graphic PP6, but will instead display the
combined plot of PP5 and PP6:
[> display(PP5, PP6)
[
[ The combined PP5 and PP6 plot is shown in Fig. 6.7.
[ As can be seen, the curve fitting is reasonably good, though the end part
of the curve deviates from the Gaussian shape expected for measure-
ments of this type. This is due to the small number of measurements
made.
Mathematica offers a graphics menu palette named Drawing Tools, which is invoked
by right clicking the mouse button on the chart or plot. This brings up an options box.
Selecting (by left clicking the mouse button) the option Drawing Tools brings up the
Drawing Tools options box, which offers some options that control the appearance of
the chart or plot. (We will repeat these instructions within the Mathematica program
that follows.) In Mathematica, text lines are in regular type while input lines use a
typewriter font; the output of a command is in italic blue type. After the program is
run, input and output lines are numbered and begin with ‘In[i]:=’ and ‘Out[i]:=’,
respectively. Text lines are not numbered. Mathematica indicates cells and cell groups
by nested square brackets placed at the right of the program page. Though, in the
presentation of the program that follows, we have tried to follow the format of the
program within Mathematica, we will not include the nested square brackets. We will
suppress the output of most commands, including those for charts or graphs included
only to illustrate the use of graphical options. Only command line output relevant to
the solution of Example 1 will be shown.
In[1]:= clear;
The data points, the measured building heights, are entered as
a list labelled bheights
In[2]:= bheightsi ={33.45, 33.46, 33.47, 33.50, 33.49,
33.51, 33.48, 33.52, 33.47, 33.48, 33.49,
33.47, 33.51, 33.50, 33.48, 33.54};
Note that terminating a command line with a semicolon sup-
presses output, while leaving a command line open shows out-
put.
The Sort command is used to place the measurements in
ascending order
In[3]:= bheights = Sort[bheightsi]
Out[3]:= {33.45, 33.46, 33.47, 33.47, 33.47, 33.48,
33.48, 33.48, 33.49, 33.49, 33.5, 33.5, 33.51,
33.51, 33.52, 33.54}
The Sum command is used to sum the measurements
In[4]:= bheightssum = Sum[bheightsi[[i]], {i, 1,
16}];
The syntax for the Sum command is: Sum[f, {i, imin, imax}], where f is the summand.
bheights[[i]] refers to the ith item of the list, e.g.,
In[5]:= bheights[[3]]
Out[5]:= 33.47
The standard error in the mean is
0.006
The standard error in the mean to 2 dc. pl. is
0.01
Answer. Height of the building = 33.49 ± 0.01 m
The original measurements were made to 4 significant figures
(sf) so that the calculated mean cannot be more accurate than
this. Hence, the standard error in the mean must be rounded to
2 decimal places so that it corresponds to the 4th sf of the mean
height. The answer is therefore given to 4 sf with the error of
the 4th sf indicated.
Mathematica has inbuilt functions for the mean, variance and
standard deviation, namely, Mean[], Variance[] and
StandardDeviation[], respectively. But it should be noted that
the latter two functions deliver the sample variance and sample
standard deviation. We have preferred to enter the formulae by
hand, first, because it is more instructive, and second, because
it is the standard deviation that is needed to calculate the
standard error in the mean, not the sample standard deviation.
The Charts
There are a variety of options for controlling the appearance
of charts or graphics. In Mathematica the meaning and use of
graphics options is obvious from their use in the command
line. Similarly, the syntax of the options is fairly clear from
the command line. More values of the options used here, and
indeed, many more options, can be found in Mathematica’s
excellent Help. Hence, we will only add a few extra explana-
tory comments here and there. With some exceptions, the
options apply to all types of charts and graphs. The options we
present are, of course, by no means comprehensive. Note that
Mathematica has separate commands for histograms and bar
charts. Although height is a continuous quantity, the measured
values are necessarily discrete. Bar charts are preferred in the
case of discrete values because each column is labelled by a
discrete value and the frequency refers to that value. Histograms,
by contrast, count data points in intervals, which may group
distinct values together. Another advantage of bar charts is that
plots of relative frequencies can be produced in a straightforward
way. We will, however, include histograms by way of example.
Histograms
Histograms are produced with the Histogram command which
plots the frequencies of a list of data. In our case the list is
bheights. The frequency of data is automatically counted to pro-
duce the histogram.
Histogram of frequencies of the heights in bheights
In[15]:= QQ1=Histogram[bheights, 10, ChartLabels →
{33.45, 33.46,
33.47, 33.48, 33.49, 33.5, 33.51, 33.52,
33.54},
AxesLabel → {HoldForm["Height (m)"],
HoldForm["Frequency"]},
PlotLabel → HoldForm["Frequency of Building
Heights"],
LabelStyle → {FontFamily → "Calibri", 14,
GrayLevel[0]},
ChartStyle → "Pastel"];
In[16]:= Show[QQ1]
Histogram QQ1 is shown in Fig. 6.8.
The number 10 specifies the number of bars; obviously, this
number must be an integer. HoldForm["text or
expression"] prints text (indicated by enclosing the text in speech
marks) as is, and prints expressions (indicated by leaving out
the speech marks) unevaluated. ChartStyle selects colours or
colour schemes for the columns in a bar chart or histogram.
ChartStyle → {Red, Green, Orange, ...} selects a colour for each
column. The 14 in LabelStyle is the point size for the axes
labels and axes numbering. BaseStyle → {FontFamily → "Cal-
ibri", FontSize → 16} controls all elements of the plot. The point
size for axes labels and plot labels can also be chosen from the
Format menu offered in the top menu bar of the Mathematica
window. The point size chosen from the menu bar overrides the
value given in LabelStyle. For the histogram in Fig. 6.8, 18 pt was
chosen for the plot label from the Format menu. GrayLevel[d]
specifies how dark or light the objects it refers to are: d runs from
0 to 1, with 0 = black and 1 = white. In our case, GrayLevel[0]
specifies that text should be black.
MATHEMATICA OFFERS A GRAPHICS MENU PALETTE NAMED
Drawing Tools. AS THE NAME IMPLIES, IT OFFERS NUMER-
OUS DRAWING TOOLS, SUCH AS ADDING
ARROWS, TEXT, RECTANGLES ETC. Drawing Tools IS
INVOKED AS FOLLOWS: RIGHT CLICK THE MOUSE BUT-
TON ON THE CHART OR PLOT. THIS BRINGS UP AN
OPTIONS BOX. SELECTING (BY LEFT CLICKING THE
MOUSE BUTTON) THE OPTION Drawing Tools BRINGS UP
THE Drawing Tools PALETTE.
Bar Charts
Bar chart of the frequencies of the heights in bheights
In[17]:= QQ2 := BarChart[{1, 1, 3, 3, 2, 2, 2, 1, 1}, ChartLabels → {33.45,
33.46, 33.47, 33.48, 33.49, 33.5, 33.51, 33.52, 33.54}, ChartEle-
mentFunction → "GlassRectangle", ChartStyle → "Pastel",
AxesLabel → {HoldForm["Height (m)"], HoldForm
["Frequency"]}, PlotLabel → HoldForm["Frequency of Building
Heights"], LabelStyle → {FontFamily → "Calibri", 12, GrayLevel
[0]}, AxesStyle → Thick, TicksStyle → Directive[Thin]]
In[18]:= Show[QQ2]
Bar chart QQ2 is shown in Fig. 6.9.
Curve Fitting
The Mathematica curve fitting command is Fit, with syntax:
Fit[data, fit function, fit function variable], where data = the data
to fit, fit function = polynomial chosen for the fit, and fit function
variable = the variable in which the polynomial is expressed.
The Fit command is based on the least-squares method which
uses polynomials as the fit-functions. For example, a linear fit
uses ax + b, a quadratic fit uses ax^2 + bx + c, while a cubic
fit uses ax^3 + bx^2 + cx + d. The fit polynomial is entered as
linear = {1, x}, quadratic = {1, x, x^2}, cubic = {1, x, x^2, x^3}, etc.,
as shown in the following Fit command, which fits a curve to
the frequency data points in the list bheightsfreq.
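For orientation, the same kind of least-squares polynomial fit can be reproduced in Matlab with polyfit; a minimal sketch (our own variable names), fitting a quartic to the frequency data points:

x = 1:9;                         % positions of the nine distinct heights
y = [1, 1, 3, 3, 2, 2, 2, 1, 1]; % their frequencies
p = polyfit(x, y, 4);            % least-squares coefficients, highest degree first
yfit = polyval(p, x);            % fitted values at the data points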
Matlab takes a bit more learning than Maple or Mathematica, but the extra effort is
well worth it. Each of the three software packages has its advantages. For Matlab,
the advantage is that it allows a variety of mathematical operations to be carried out
on arrays, and is hence ideal for matrix operations. We will assume some familiarity
with Matlab features and offer only a brief overview.
When Matlab is opened, four windows appear together with a top menu bar that
leads to a number of sub-menu bars offering comprehensive choices.
Fig. 6.9 Mathematica bar chart of the frequencies of the building heights
Fig. 6.10 Mathematica bar chart of the relative frequencies of the building heights
Fig. 6.12 Mathematica line and data plot of relative frequencies of the building heights
Fig. 6.13 Mathematica bar chart of the frequencies of the building heights together with a fitted
frequency curve
The four windows are: Current Folder, Workspace, Editor and Command Window. A window
is invoked by left-clicking the mouse button on the window of interest. Commands
placed in the Command W indow are executed immediately by pressing the keyboard
return key. This is fine for a few simple commands. But, when a large number of
commands are involved, and especially to keep a permanent record of the command
sequences, writing the commands in the editor window then saving them to a Matlab
.m file is much preferred. The sequence of commands is, of course, a program. To run
the program, choose the Editor sub-menu bar from the top menu bar and left-click
the Run icon. The output appears in the command window. Plots are displayed in a
separate graphics window which opens after the program is run.
Help on any Matlab command or option is obtained by using the help command
in the Command Window. For example, help errorbar gives help on the errorbar
command used, as the name implies, to produce error bars on a graph.
In Matlab there are two sets of operators for multiplication, division and expo-
nentiation. The ordinary symbols ∗ , / and ∧ are reserved for matrix operations.
The ‘dot’ versions .∗ , ./ and .∧ are reserved for element-by-element operations
between arrays and for ordinary multiplication of functions. Since numbers can be
viewed as scalar matrices both forms of the operations can be used with numbers.
For addition and subtraction, no such distinction is necessary and so the usual + and
− symbols are used for all cases. Some examples are:
[1, 2, 3].*3 = [3, 6, 9], [12, 9, 15]./3 = [4, 3, 5], [1, 2, 3].^2 = [1, 4, 9],
3.*4 = 12, or 3*4 = 12; both forms are equally valid
sin(x).*cos(x), the correct way to multiply functions
sin(x)*cos(x), the wrong way to multiply functions; for an array x this attempts
a matrix multiplication and typically results in an error message
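A short script makes the distinction concrete; this is a sketch with our own variable names:

v = [1, 2, 3].*3;            % element-by-element: [3, 6, 9]
w = [1, 2, 3].^2;            % element-by-element squares: [1, 4, 9]
x = linspace(0, pi, 5);      % five sample points between 0 and pi
y = sin(x).*cos(x)           % correct element-wise product of the functions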
There are two types of Matlab .m files: the script .m file and the function .m file.
As its name implies, the latter is used to define functions. To write a program, the
script option is selected in the editor window by left-clicking the New icon in the
Editor sub-menu, then left-clicking the script option. Once the program is written
in the Editor window it is saved as a script .m file when the Save icon or option is
selected. The Function definition is selected in the same way except that the function
option is selected instead. In this case, a function definition template appears in the
Editor window. This is a great help since functions must be defined with a specific
format and it is this format that distinguishes a function .m file from a script .m file.
We will come across function .m files in the solution program.
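For concreteness, a function .m file might look as follows; this is a sketch only, using the best-line slope and intercept quoted later in the Example 2 solution (the file would be saved as TSQbestmfn.m):

function f = TSQbestmfn(h)
% Best straight line of Example 2: slope m and intercept c from the text
f = 0.203981*h + (-0.000277982);
end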
There are two ways to control the appearance of graphics. The first is to use
command line options. The second is to use Matlab’s graphical interface. We will
use the command line options to produce the charts and plots that follow as this is
more instructive. But, the graphical interface is very useful and so we will describe
how to invoke it.
When a program containing a chart and/or plot is run, a new window opens to
display the chart or plot. The new window also contains a menu bar which allows a
variety of options that control the appearance of the chart or plot to be chosen. The
steps to invoke the graphical interface are as follows:
(1) In the new graphics window displaying the plot or chart, left-click on the Tools
option in the top menu bar to open the Tools menu .
(2) Left-click on the Edit plot option.
(3) Double left-clicking on the background of the plot brings up the Property
Editor window specific to the background. Here, a title and axis labels can be
entered, a number of grid configurations can be chosen, a background colour can
be selected and so on. On the other hand, double clicking on the bars of a chart,
the points of a data plot, or the curve of a line plot brings up the Property Editor
window specific to these graphic elements.
Curve fitting can be done either with Matlab’s polyfit command or with a graphical
interface. The polyfit command produces a best fit polynomial function which can
then be plotted. An explanation for doing this is given in the solution program.
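A sketch of the command-line route (our own names; it assumes x and y1 already hold the height positions and frequencies used in the solution program):

p = polyfit(x, y1, 4);           % best-fit coefficients of a quartic
FREQFIT = @(x) polyval(p, x);    % wrap them as a plottable function
plot(x, FREQFIT(x), x, y1, 'p')  % fitted curve together with the data points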
Invoking the graphical curve fitting interface is achieved by following the same initial
steps as for graphics, except that Basic Fitting is selected from the Tools menu. A
menu window appears offering a choice for the degree of the polynomial to be used
for the curve fitting. Once the choice is made the curve is plotted on the data points
graph or on the chart. The numerical values of the coefficients of the polynomial
used for the curve fitting are returned in the same options window.
clear
format long
% ‘Format long’ specifies a scaled fixed point format with 15 digits for double
% precision (and 7 digits when single precision is specified with, for example, ‘Y
% = single(X)’ ). Other format options can be found by typing ‘help format’ in the
% ‘Command Window’
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% Matlab is framed in terms of arrays, mostly numerical arrays (matrices). In view
% of this, what we have been calling lists in Maple or Mathematica are better referred
% to as one-dimensional matrices, i.e., vectors. Therefore, we refer to our sets of
% measured values as vectors rather than lists.
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% The percent sign % indicates a text line. Commands can be temporarily suppressed
% by converting them to text using the % sign.
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% One of the advantages of Matlab is that it allows many mathematical operations
% to be applied to the whole array. In the above command each element of the array
% is divided by 16.
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% two functions deliver the sample variance and sample standard deviation. We have
% preferred to enter the formulae by hand, first, because it is more instructive, and
% second, because it is the standard deviation that is needed to calculate the standard
% error in the mean, not the sample standard deviation.
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% xxxxxxxxxxxxxxxxxxxxxxxxxx THE CHARTS xxxxxxxxxxxxxxxxxxxxxxxxxx
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% There are a variety of options for controlling the appearance of charts and graphs.
% In Matlab, the meaning and use of graphical options is usually obvious from their
% use in the command line. Similarly, the syntax of the options is fairly clear from
% their use in the command line. Hence, we will only add a few extra explanatory
% comments here and there. More options and values of the options can be found
% by using Matlab’s ‘help’ command as described in the introduction of the main
% text of this section.
% Bar chart of the relative frequencies of the measured building heights labelled by
% their position in ‘bheights’ (with equal values counted in the same position)
subplot(2,2,2)
bar(y2,‘m’),title(‘Relative Frequency of Building Heights’),
xlabel(‘Height position’), ylabel(‘Relative frequency’),
axis([0,10,0,0.25])
% We will use this form of ‘plot’ to combine curves and data points on the same axes.
% Plot of joined frequency data points of the measured building heights labelled
% by their position in ‘bheights’ (with equal values counted in the same position)
subplot(2,2,3)
plot(x,y1,‘p’,x,y1),title(‘Frequency of Building Heights’),
xlabel(‘Height position’), ylabel(‘Frequency’), grid, axis([0,10,0,3.5])
% The indicator ‘p’ in the first occurrence of x, y1 results in the data points being
% represented by a pentagram symbol. The second occurrence of the same data
% points x,y1 results in a straight line. The indicator ‘-’ for a solid line is not needed
% since, as mentioned above, it is the default value.
% Plot of joined relative frequency data points of the measured building heights
% labelled by their position in ‘bheights’ (with equal values counted in the same
% position)
subplot(2,2,4)
plot(x,y2,‘o’,x,y2), title(‘Relative Frequency of Building Heights’),
xlabel(‘Height position’), ylabel(‘Relative frequency’), grid,
axis([0,10,0,0.25])
polyfit(x,y1,4)
% A polynomial fit function can be defined using the results from ‘polyfit’, and then plotted.
% The plot of the function FREQFIT against the positions of the building heights
% in ‘bheights’ (with equal values assigned separate positions)
plot(x,FREQFIT(x),x,y1,‘p’), axis([0,10,0,3.5]),
title(‘Building Heights’), xlabel(‘Position in bheights’),...
ylabel(‘Frequency’)
% The fitted curve and the data points are shown in Fig. 6.15.
% There are variations in the syntax of the ‘errorbar’ command which can be seen
% by typing ‘help errorbar’ in the command line of the command window.
% Definition of the z vector and error bar vector e: The two vectors must have the
% same dimensions. The error vector contains the error in each reading as given by
% the standard error in the mean. In our case, the standard error is 0.01 and is the
% same for each measurement.
% First define the list z of building heights (i.e. relabel ‘bheights’ by z to simplify
% the look of the ‘errorbar’ command), and define a list e of errors of the same length
% as ‘bheights’
z=bheights
e=[0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, ...
0.01, 0.01, 0.01, 0.01, 0.01]
% Plot of the building heights versus their position in ‘bheights’ (with equal values
% assigned separate positions) together with error bars, i.e., plot z versus x1 together
% with error bars
errorbar(z, e), title(‘Building Heights’),...
xlabel(‘Position in bheights’), ylabel(‘Height (m)’)
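% Equivalently (a small sketch, not part of the program above), the constant error
% vector can be built without typing sixteen entries:
e = 0.01*ones(1, numel(z));   % the same 16-element error vector
errorbar(z, e)                % heights versus position, with error bars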
% Note that Matlab overwrites earlier graphics when more than one graphic or
% graphic array is included in a program. The easiest way to see the plot or plot array
% of current interest is to temporarily suppress other plots or plot arrays using the
% percent sign %.
The fit function used in the plot command to plot the fitted curve with the data points
is defined in the following Matlab .m function file named FREQFIT.m. Note that the
syntax of the top and bottom lines defines the file to be a function file as opposed to
a program file:
Fig. 6.14 Matlab plots. TOP: Bar charts of the frequencies and relative frequencies of the building
heights labelled by their position in bheights. BOTTOM: Joined data points of the frequencies and
relative frequencies of the building heights, also labelled by their position in bheights
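By way of illustration, a FREQFIT-style function file can be sketched as follows; the five coefficients below are hypothetical placeholders standing in for the values returned by polyfit(x, y1, 4):

function f = FREQFIT(x)
% Hypothetical placeholder coefficients; substitute the polyfit results
p = [-0.05, 0.9, -5.8, 13.1, -6.9];
f = polyval(p, x);   % evaluate the degree-4 fit polynomial at x
end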
Fig. 6.15 The graph shows the fit curve produced by the polynomial ax4 + bx3 + cx2 + dx + e
using the values of the coefficients produced by Matlab’s polyfit command
Fig. 6.16 Matlab plot of the measured building heights labelled by their position in ‘bheights’
together with error bars
The result shows reasonable precision. Since we judge the systematic error to be
negligible compared to the random error, we may also conclude that the measurement
is reasonably accurate.
The charts and joined data plots of the frequencies approach a Gaussian distri-
bution as is expected for this kind of measurement, though the last quarter of the
curve shows a marked deviation from a Gaussian curve. This is probably because
the measurements were too few.
The spreadsheet format of Excel is convenient for calculations of the type car-
ried out above. In Excel, the plotting of bar charts and data line graphs is automated,
producing quality results very simply. Various display options are offered. The math-
ematical packages require more work, but offer a large range of display options for
charts and graphs. For plotting functions, such as the best straight line, the mathe-
matical packages are much preferred.
h = (1/2)gt².    (6.1)
Here, we want to find g graphically. Casting the equation in the form
2/g = t²/h    (6.2)
shows that g can be determined from the slope of a linear graph (invariably preferred
to nonlinear graphs) of t² versus h, i.e.,
2/g = slope,
so that
g = 2/slope.    (6.3)
Since, as seen from Eq. (6.1), h = 0 when t = 0, the line passes through the origin.
The method of least squares offers two approaches for finding the best straight line
for this case:
Approach 1: Ignore this information and determine both the best slope m and the best
c and their errors using Eqns. (4.5), (4.6) and (4.10)–(4.13). This approach has the
advantage of visually indicating the size of an error by the distance by which the line
misses the origin, though for small errors this distance may be too small
to be visible.
Approach 2: Use the information and use formulae (4.14) to (4.16) for lines through
the origin to determine the best slope m and its error, then draw the line with this
slope through the origin.
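To make Eq. (6.3) concrete, here is a one-line numerical check (a sketch of our own) using the approach-1 slope found later in this section:

slope = 0.203981;   % best slope from the least-squares fit (approach 1)
g = 2/slope         % returns 9.8048..., as in the solution programs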
We want to do the following:
1. Use approach 1 to determine the slope m of the best line and where the best line
cuts the y-axis, together with their errors.
2. Use the best value for the slope m and its error Δm to calculate the acceleration
due to gravity g and its error.
3. Plot the equation of the line with the best slope m cutting the y-axis at c, and add the
data points defined by the heights and the mean times squared for each height.
4. Use approach 2 to determine the slope m of the best line through the origin together
with its error.
5. Define the equation of the line with best slope m and c = 0, and hence plot the
best straight line through the origin.
6. Repeat the calculation of 2 for the m of the line through the origin.
7. Combine the plots of the best straight lines from the two approaches and the data
points in various combinations.
The aim of Example 2 is to show how to use the four packages to perform calcu-
lations using the method of least squares and then to plot the best straight lines and
data points.
Here we assume that the Excel spreadsheet solution of Example 1 has been studied,
so we won’t repeat the explanations given there. The Excel spreadsheet solution of
Example 2 follows the same pattern as for Example 1; only the formulae entered
in the formula cells are different. We will therefore give the hidden formulae in the
formula cells.
Since times were taken to 3 sf, calculations in the spreadsheet were carried out to
4 sf.
The squared mean times, <t>_i^2, for each height will be used to plot
the graph. Therefore, in formulae (4.5), (4.6), (4.10)–(4.13) and (4.14)–(4.16) the
heights h_i correspond to x_i, and the <t>_i^2 correspond to y_i.
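Before turning to the spreadsheet, the approach-1 formulae can be sketched directly in Matlab; the eleven data pairs below are the ones plotted later in the Mathematica solution, and the variable names are our own:

x = 0.1:0.1:1.1;   % the heights h_i
y = [0.0206, 0.0406, 0.0624, 0.0787, 0.1028, 0.1217, ...
     0.1411, 0.1624, 0.1844, 0.2052, 0.2234];   % the <t>_i^2 values
m = sum((x - mean(x)).*y)/sum((x - mean(x)).^2)  % best slope, formula (4.5)
c = mean(y) - m*mean(x)                          % intercept, formula (4.6)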
FIRST. Determination of the best m and c and their errors using formulae (4.5),
(4.6) and (4.10) to (4.13)
Fig. 6.17 Excel spreadsheet to find g by first determining the best m and c and their errors using
formulae (4.5), (4.6) and (4.10) to (4.13)
For what follows, refer to Fig. 6.17. Cells A3 to A13 count the data. The heights
hi are entered in column B. The five times measured for each height are placed in
columns C to G. Aside from text cells the remaining cells contain formulae. The
formulae are hidden; only the results of the formula are shown in the formulae cells.
We will therefore list the formulae cells together with the hidden formula they contain
by first giving the cell address followed by the hidden formula it contains (enclosed
in square brackets). We will also add a description of the formula. For columns
containing cells with the same formula referring to different rows we will give the
formula in the first cell of the column and indicate that the formula is repeated in
the remaining cells with the abbreviation FRF (formula repeated from) followed by
‘cell i → cell f’.
• B14: [=SUM(B3:B13)/11], < h >
• H3: [=SUM(C3:G3)], sum of the five times for h_1, FRF H4 → H13
• I3: [=H3/5], < t >_1, FRF I4 → I13
• J3: [=I3*I3], < t >_1^2, FRF J4 → J13
• J14: [=SUM(J3:J13)/11], mean of the < t >_i^2
• K3: [=B3-0.6], residual of h_1, FRF K4 → K13
• L3: [=K3*K3], square of the residual of h_1, FRF L4 → L13
• L14: [=SUM(L3:L13)], D
• N3: [=(J3-0.204*B3-(-0.000278))^2], d_1^2, FRF N4 → N13
• N16: [=SUM(N3:N13)], sum of the d_i^2
Fig. 6.18 Excel spreadsheet to find g by first determining the best m and its error using formulae
(4.14) to (4.16) for a line through the origin
The answer from the Excel spreadsheet in Fig. 6.18, obtained by approach 2, is:
Acceleration due to gravity is g = 9.82 ± 0.03 ms−2
Since the 1st (rounded) significant figure of the standard error in g corresponds to
the 3rd significant figure of g, the calculated value of g is given to 3 sf.
We will comment on the apparent extra accuracy using approach 2 in the overall
comment section, Sect. 6.2.5, for Example 2.
The Maple program for the solution of Example 2 follows. For introductory aspects
on the Maple language, see subsection 6.1.2 in which Maple is used for Example 1.
Additional commands that are needed for Example 2 are explained in the solution
program that follows.
[
[ The mean times are squared to obtain < t >_i^2. These values are entered
into the array timesmeansq using a for loop
[> best_slope := htsqsum/DD
best_slope := 0.2039810000    (3)
[
[ Calculation of c, where the best line cuts the y-axis:
c = < y > − m < x > = mean of < t >_i^2 − m < h >
[> c := timesmeansq_mean − best_slope ∗ gheightsmean
c := −0.0002779818 (4)
[ The acceleration g due to gravity is given by g = 2h/t² = 2/best_slope
[> g := 2/best_slope
g := 9.804834764    (5)
[
[ The standard error in the slope is found from formula (4.10), after first
finding di given by formula (4.13)
[> d_i := [timesmeansq[1] − 0.1 · best_slope − c, timesmeansq[2] − 0.2 ·
best_slope − c,
timesmeansq[3] − 0.3 · best_slope − c, timesmeansq[4] − 0.4 ·
best_slope − c,
timesmeansq[5] − 0.5 · best_slope − c, timesmeansq[6] − 0.6 ·
best_slope − c,
timesmeansq[7] − 0.7 · best_slope − c, timesmeansq[8] − 0.8 ·
best_slope − c,
timesmeansq[9] − 0.9 · best_slope − c, timesmeansq[10] − 1.0 ·
best_slope − c,
timesmeansq[11] − 1.1 · best_slope − c]:
[
[ Calculation of d_i^2:
[> d_i_sq := [d_i[1]^2, d_i[2]^2, d_i[3]^2, d_i[4]^2, d_i[5]^2, d_i[6]^2, d_i[7]^2,
d_i[8]^2, d_i[9]^2, d_i[10]^2, d_i[11]^2] :
[
[ Determination of the sum of d_i^2:
[> d_i_sq_sum := sum(d_i_sq[h], h = 1..11)
[
[ The standard error in the slope is given by
[> St_Error_in_m := ((1/(11 − 2)) · (d_i_sq_sum/DD))^0.5
St_Error_in_m := 0.001299147886    (6)
[
[ The standard error in c is
[> Dc := ((1/11 + gheightsmean^2/DD) · (d_i_sq_sum/(11 − 2)))^0.5
Dc := 0.0008811249659    (7)
[
[ The standard error in g is found from the standard error in the slope m
using formula (3.19) for errors in inverse proportionalities
[> sterrg := g · St_Error_in_m/best_slope
sterrg := 0.06244665121    (8)
[
[ Answer. The acceleration due to gravity is g =
9.80 ± 0.06 ms−2
[ Since the 1st (rounded) significant figure of the standard error in g corre-
sponds to the 3rd significant figure of g, the calculated value of g is given
to 3 sf.
[
[ APPROACH 2. Determination of the best m and
its error using formulae (4.14) to (4.16) for a line
through the origin
[
[ Calculation of the sum of products of the heights in gheights and the mean
times squared in timesmeansq, i.e., calculation of x_i · y_i = h_i · < t >_i^2:
[> htsqsum_origin := sum(gheights[i] · timesmeansq[i], i = 1..11):
[
[ Calculation of the sum of the squares of the heights in gheights
[> gheightssqsum := sum(gheights[i]^2, i = 1..11):
[
[ Calculation of di given by formula (4.16):
[> dd _i := [timesmeansq[1] − 0.1 · best_slope, timesmeansq[2] − 0.2 ·
best_slope,
timesmeansq[3] − 0.3 · best_slope, timesmeansq[4] − 0.4 · best_slope,
timesmeansq[5] − 0.5 · best_slope, timesmeansq[6] − 0.6 · best_slope,
timesmeansq[7] − 0.7 · best_slope, timesmeansq[8] − 0.8 · best_slope,
timesmeansq[9] − 0.9 · best_slope, timesmeansq[10] − 1.0 · best_slope,
timesmeansq[11] − 1.1 · best_slope]:
[
[ Calculation of d_i^2:
[> dd_i_sq := [dd_i[1]^2, dd_i[2]^2, dd_i[3]^2, dd_i[4]^2, dd_i[5]^2, dd_i[6]^2,
dd_i[7]^2, dd_i[8]^2, dd_i[9]^2, dd_i[10]^2, dd_i[11]^2] :
[
[ Determination of the sum of d_i^2:
[> dd_i_sq_sum := sum(dd_i_sq[h], h = 1..11):
[
[ The slope of the best line through the origin is found from formula (4.14):
[> best_slope_origin := htsqsum_origin/gheightssqsum
[
[ Plot of the data points:
[> PG1 := pointplot(…, title = “Time Squared versus Height",
axes = frame, labels = [“Height (m)", “Time squared (seconds sq.)"],
labeldirections = [horizontal, vertical], color = “Orange",
symbolsize = 20, legendstyle = [location = right], legend = “Data
points"):
[ The plot is suppressed because it will be shown below combined with
other plots.
[
[ The Graph for Approach 1
[ The following defines the line with the best slope given by best_slope,
which cuts the y-axis at c = −0.00028:
[> TSQ := (h) → best_slope · h + c:
TSQ := h → best_slope h + c (13)
[
[ Plot of the line TSQ:
[> PG2 := plot(TSQ(h), h = 0..1.2, axesfont = [Calibri, roman, 12],
titlefont = [Calibri, roman, 18], labelfont = [Calibri, roman, 14],
title = “Time Squared versus Height", axes = frame,
labels = [“Height (m)", “Time squared (seconds sq.)"],
labeldirections = [horizontal, vertical], color = “Red",
legendstyle = [location = right], legend = “Best Line");
[ The plot is suppressed because it will be shown below combined with
other plots.
[ Combined plot of TSQ and the data points
[> display(PG1, PG2):
[
[ The Graph for Approach 2
[ Since the line passes through the origin the equation of the best line has
slope given by best_slope_origin and c = 0;
[> TSQorigin := (h) → best_slope_origin · h:
[
[ Plot of the line TSQorigin through the origin
[> PG3 := plot(TSQorigin(h), h = 0..1.2, axesfont = [Calibri,
roman, 12],
titlefont = [Calibri, roman, 18], labelfont = [Calibri, roman, 14],
title = “Time Squared versus Height", axes = frame,
labels = [“Height (m)", “Time squared (seconds sq.)"],
labeldirections = [horizontal, vertical], color = “Blue",
legendstyle = [location = right], legend = “Best Line through the
origin",
linestyle = [dot]):
[ Combined plot of TSQ, TSQorigin and the data points
[> display(PG1, PG2, PG3):
Fig. 6.19 Maple plot of the line with the best m and best c, the line with the best m passing through
the origin and the data points
The Mathematica program for the solution of Example 2 follows. For introductory
aspects on the Mathematica language, see subsection 6.1.3 in which Mathematica
is used for Example 1. Additional commands that are needed for Example 2 are
explained in the solution program that follows.
The contents of the array timesmean can be shown using a Print com-
mand and a For loop. The syntax is shown in the following command;
output is suppressed.
In[22]:= For[i = 1, i < 12, i++, Print[timesmean[i]]];
The mean times are squared to obtain <t>_i^2. These values are
entered into the array timesmeansq using a For loop.
In[23]:= For[i = 1, i < 12, i++, timesmeansq[i] = timesmean
[i]^2];
To calculate the slope m of the best straight line and where it cuts the
y-axis, formulae (4.5) and (4.6) are used with x_i = h_i and y_i =
<t>_i^2, where h_i are the heights contained in the list gheights, and
where <t>_i^2 are the mean squared times in the array timesmeansq
The standard error in the slope is found from formula (4.10), after first
finding d _i given by formula (4.13)
In[32]:= di = { timesmeansq[1] - bestslope*0.1 - c,
timesmeansq[2] - bestslope*0.2 - c,
timesmeansq[3] - bestslope*0.3 - c,
timesmeansq[4] - bestslope*0.4 - c,
timesmeansq[5] - bestslope*0.5 - c,
timesmeansq[6] - bestslope*0.6 - c,
timesmeansq[7] - bestslope*0.7 - c,
timesmeansq[8] - bestslope*0.8 - c,
timesmeansq[9] - bestslope*0.9 - c,
timesmeansq[10] - bestslope*1.0 - c,
timesmeansq[11] - bestslope*1.1 - c};
Calculation of d_i^2
In[33]:= disq = di^2;
Determination of the sum of d_i^2
In[34]:= disqsum = Sum[disq[[i]], {i, 1, 11}];
The standard error in the slope m is given by
In[35]:= StErrm = ( 1/(11 - 2) *(disqsum/DD) )^0.5
Out[35]:= 0.00129915
The standard error in c is found from Eq. (4.11)
In[36]:= Dc = ( (1/11 + gheightsmean^2/DD)*(disqsum/
(11 - 2)) )^0.5
Out[36]:= 0.000881125
The standard error in g is found from the standard error in the slope
m using formula (3.19) for errors in inverse proportionalities
In[37]:= sterrg = g*StErrm/bestslope
Out[37]:= 0.0624467
The standard error in g is found from the standard error in the slope
m using formula (3.19) for errors in inverse proportionalities
In[48]:= sterrgorigin = g*StErrmorigin/bestslopeorigin
Out[48]:= 0.0283661
The Graphs
Plot of Data Points
The following is a plot of the data points, i.e., the squares of the mean
times for each height contained in the array timesmeansq versus the
heights contained in the list gheights.
In[49]:= PG1 = ListPlot[ { {0, 0}, {0.1, 0.0206}, {0.2,
0.0406},
{0.3, 0.0624},{0.4, 0.0787}, {0.5, 0.1028}, {0.6,
0.1217},
{0.7, 0.1411}, {0.8, 0.1624}, {0.9, 0.1844}, {1.0,
0.2052},
{1.1, 0.2234} }, PlotLegends → Placed[{"Data
points"}, Right],
PlotStyle → {Black, PointSize[0.03]}, AspectRatio
→ 5/4,
LabelStyle → {FontFamily → "Calibri", 14,
GrayLevel[0]},
AxesStyle → Thick, TicksStyle → Directive[Thin]];
Right],
AspectRatio → 5/4, AxesStyle → Thick,
TicksStyle → Directive[Thin]];
A combined plot of TSQ, TSQorigin and the data points
In[54]:= Show[PG1, PG2, PG3, PlotLabel → HoldForm["Time
Squared versus Height"], BaseStyle → {FontFamily
→ "Calibri", FontSize → 16} ]
We see that the difference between the best line and the best line
through the origin is too small to be visible graphically. As a result the
error indicated by the distance the best line misses the origin is also
too small to be seen visually. Note that the axes labels in the original
plots do not appear in the combined plot.
The Matlab program for the solution of Example 2 follows. For introductory aspects
on the Matlab language, see subsection 6.1.4 in which Matlab is used for Example
1. Additional commands that are needed for Example 2 are explained in the solution
program that follows.
% Matlab Program for the Solution of Example 2. To determine the acceleration due
% to gravity by measuring the time t for an object to fall through various heights h
% and by using the method of least squares.
% Recall that the dot versions of the usual arithmetic operators, i.e., .*, ./ and .^,
% apply to each element of an array.
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% APPROACH 1. Determination of the best m and c and their errors using formulae
% (4.5), (4.6) and (4.10) to (4.13)
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
clear
format long
% The mean of ‘gheights’ is found using Matlab’s ‘sum’ command, then stored in
% the variable ‘gheightsmean’
gheightsmean = sum(gheights)/11
% The 11 sets of five times for each height are entered into 11 arrays labelled ‘times’
% The mean times are squared to obtain <t>_i^2. These values are entered into
% the array ‘timesmeansq’
timesmeansq = timesmean.^2
% To calculate the slope m of the best straight line and where it cuts the y-axis,
% formulae (4.5) and (4.6) are used with x_i = h_i and y_i = <t>_i^2, where
% h_i are the heights contained in the list ‘gheights’, and where <t>_i^2 are the
% squared mean times in the array ‘timesmeansq’
% Calculation of (x_i − <x>)y_i = (h_i − <h>)<t>_i^2,
% where < h > = mean of ‘gheights’
htsq = (gheights-gheightsmean).*timesmeansq
% The ability to operate element by element on arrays and matrices by using the dot
% versions of arithmetic operators, as mentioned earlier, is one of the big advantages
% of Matlab. Thus, using .* in the above command line we were able to first subtract
% ‘gheightsmean’ from each element of the ‘gheights’ array then multiply each result
% by ‘timesmeansq’, making the calculation much easier.
% Calculation of the sum of (x_i − <x>)y_i = (h_i − <h>)<t>_i^2
htsqsum = sum(htsq)
% ‘htsqsum’ and ‘DD’ are substituted into formula (4.5) to get the slope m of the
% best straight line
bestslope = htsqsum/DD
% Calculation of c, where the best line cuts the y-axis, using formula (4.6)
% c = <y> − m<x> = mean of <t>_i^2 − m<h>
c = timesmeansqmean - bestslope*gheightsmean
% The standard error in the slope is found from formula (4.10), after first finding
% d_i^2 using formula (4.13)
disq = (timesmeansq - gheights.*bestslope - c).^2
% The standard error in g is found from the standard error in the slope m using
% formula (3.19) for errors in inverse proportionalities
Sterrg = g*StErrm/bestslope
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% APPROACH 2. Determination of the best m and its error using
% formulae (4.14) to (4.16) for lines through the origin.
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% Calculation of the sum of products of heights in ‘gheights’ and the mean times
% The acceleration g due to gravity is calculated from the slope of the line through
% the origin
gorigin = 2/bestslopeorigin
% The standard error in the slope for a line through the origin is found from
% formula (4.15)
StErrorigin = sqrt( (1/(11-1))*ddisqsum/gheightssqsum )
% The standard error in g is found from the standard error in the slope m using
% formula (3.19) for errors in inverse proportionalities
Sterrgorigin = gorigin*StErrorigin/bestslopeorigin
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% xxxxxxxxxxxxxxxxxxxxxxxxxxTHE GRAPHSxxxxxxxxxxxxxxxxxxxxxxxxxx
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% The plot command plot(x,y,r) allows various line colours and line types to be chosen
% through choices of r, where r is a character string of options. Two examples follow:
% E.g. 1. plot(x,y,‘rd:’) plots a red dotted line with a diamond at each data point.
% E.g. 2. plot(x,y,‘bo–’) plots a dashed blue line with a circle at each data point.
%xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
%xxxxxxxx PLOT OF BEST LINE AND DATA POINTS xxxxxxxxxxxxxxxxxxxx
%xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% The line with best slope m = bestslope = 0.203981, which cuts the y-axis at
% c = -0.000277982, is plotted together with the data points (h_i, <t>_i^2)
% contained in the arrays ‘gheights’ and ‘timesmeansq’.
% To do this, the equation TSQbestmfn(h) of the best straight line is first defined as a
% separate Matlab function file, then saved in an .m file named TSQbestmfn.m.
% Plot of the best straight line and (h_i, <t>_i^2) data points
subplot(2,2,1)
plot (h,fb,‘b’,x1,y1,‘ro’), title (‘Time Squared versus Height’),
xlabel (‘Height (m)’), ylabel (‘Time squared (s^2)’), grid on,
legend(‘Best line’, ‘Data points’),axis([0,1.2,0,0.5])
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% xxxxxxxxxxxx PLOT OF THE BEST LINE THROUGH THE ORIGIN xxxxxxxx
% xxxxxxxxxxxxxxxxxxxxx AND DATA POINTS xxxxxxxxxxxxxxxxxxxxxxxxxx
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% The equation for the best line through the origin has m = 0.203618 and c = 0.
% Using this m and c the equation of the best straight line through the origin is
% defined in a Matlab .m function file, then saved in an .m file named TSQoriginmfn.m.
% Plot of the best straight line through the origin and (h_i, <t>_i^2) data points
subplot(2,2,2)
plot (h,fo,‘r’,x1,y1,‘o’), title (‘Time Squared versus Height’),
xlabel (‘Height (m)’), ylabel(‘Time squared (s^2)’), grid on,
legend (‘Line through the origin’, ‘Data points’), axis ([0,1.2,0,0.5])
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% xxxxxxxxxxxxxxxxx COMBINED PLOTS xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
% There are two main ways of combining plots. The first uses the ‘hold on’ command.
% The ‘hold on’ command keeps the current plot and adds to it all subsequent plots
% until a ‘hold off’ or ‘hold’ (hold toggles between on/off states) command is issued.
% The second, and preferred way, is to use the ‘plot’ command with multiple
% arguments.
% Combined plot of the best line and the best line through the origin
subplot(2,2,3)
plot(h,fb,‘b’,h,fo,‘y–’), title(‘Time Squared versus Height’),
xlabel(‘Height (m)’), ylabel(‘Time squared (s^2)’), grid on,
legend(‘Best line’, ‘Line through the origin’), axis([0,1.2,0,0.5])
% Combined plot of the best line, the best line through the origin and (h_i, <t>_i^2)
% data points
subplot (2,2,4)
plot(h,fb,‘b’,h,fo,‘y–’,x1,y1,‘ko’), title(‘Time Squared versus Height’),
xlabel(‘Height (m)’), ylabel(‘Time squared (s^2)’), grid on,
legend(‘Best line’,‘Line through the origin’,‘Data points’),
axis ([0,1.2,0,0.5])
The Matlab .m function files TSQbestmfn.m and TSQoriginmfn.m used in the above pro-
gram are defined as follows:
Fig. 6.21 Matlab plots. Top left: Plot of the line with the best m and best c together with the data
points. Top right: Plot of the line with the best m passing through the origin together with the data
points. Bottom left: Plot of both lines. As can be seen, the difference between them is too small to
be seen graphically. Bottom right: Combination of both lines and the data points
Appendix B
Tables
This appendix and all appendices that follow have been compiled from the CRC
Handbook of Chemistry and Physics.1
Physical Quantity Symbol for Quantity SI Base unit Symbol for unit
length l metre m
mass m kilogram kg
time t second s
electric current I ampere A
temperature T kelvin K
amount of substance n mole mol
luminous intensity Iv candela cd
1 Editor-in-chief: W. M. Haynes, CRC Handbook of Chemistry and Physics, 97th edition (CRC
Press, New York, 2017).
Physical Quantity Symbol for Quantity SI Base unit Symbol for unit
plane angle θ radian rad
solid angle Ω steradian sr
Note that not all derived units have special names. For such units, under the ‘Name’
column we have given their definition in words. Units which are named usually take
the name of their originator, e.g., the newton or the joule. Note that despite being
named after a person, such units are always written in lowercase, i.e., the first letter
is not capitalised. However, symbols for many such units are capitalised, e.g., the
symbol for the newton is N, and the symbol for the joule is J.
Length
1 in = 2.54 cm (exact)
1 m = 39.37 in = 3.281 ft
1 ft (foot) = 12 in = 0.3048 m
1 yd (yard) = 3 ft = 0.9144 m
1 km = 0.6214 mi
1 mi (mile) = 5280 ft = 1.609 km
1 Å (ångstrom) = 10^−10 m
1 lightyear = 9.461 × 10^15 m
Volume
1 l (litre) = 1000 ml = 1000 cm^3
1 ml = 1 cm^3
1 gal (UK) = 4.546 l
1 l = 0.2200 gal (UK)
1 gal (US) = 3.785 l
1 l = 0.2642 gal (US)
Mass
1 t (metric ton or tonne) = 1000 kg
1 t (UK ton) = 1016 kg
1 slug = 14.59 kg
1 u = 1.661 × 10^−27 kg = 931.5 MeV·c^−2
Force
1 N = 0.2248 lb
1 lb = 4.448 N
Pressure
1 Pa = 1 N·m^−2
1 bar = 10^5 Pa
1 atm = 760 mm Hg = 76.0 cm Hg
1 atm = 1.013 × 10^5 N·m^−2
1 torr = 1 mm Hg = 133.3 Pa
Time
1 yr = 365.25 days = 3.156 × 10^7 s
1 day = 24 h = 1.44 × 10^3 min = 8.64 × 10^4 s
Energy
1 cal = 4.186 J
1 J = 0.2389 cal
1 Btu = 252 cal = 1054 J
1 eV = 1.602 × 10^−19 J
1 kWh = 3.600 × 10^6 J
Power
1 hp = 746 W
1 W = 1 J·s^−1
1 Btu·h^−1 = 0.293 W
Temperature
1 °C = 1 K (as a temperature interval)
T_F = (9/5)T_C + 32 °F
T_C = (5/9)(T_F − 32) °C
Other Constants
a + (b + c) = (a + b) + c, a(bc) = (ab)c,
a/b ± c/d = (ad ± bc)/(bd), (a/b) × (c/d) = ac/(bd), (a/b) ÷ (c/d) = (a/b) × (d/c) = ad/(bc).
(3) Laws of exponents:
a^m × a^n = a^(m+n), (ab)^n = a^n b^n, (a^m)^n = a^(mn),
a^0 = 1 if a ≠ 0, a^(−n) = 1/a^n, a^m ÷ a^n = a^(m−n).
(5) Laws of base 10 logarithms:
log a = x ⇒ a = 10^x,
log a + log b = log ab, log a − log b = log(a/b),
log a^n = n log a, log a^(−n) = log(1/a^n) = −n log a,
log a^(1/n) = (1/n) log a.
(6) Laws of natural logarithms, i.e., base e logarithms:
ln a = x ⇒ a = e^x,
ln a + ln b = ln ab, ln a − ln b = ln(a/b),
ln a^n = n ln a, ln a^(−n) = ln(1/a^n) = −n ln a,
ln a^(1/n) = (1/n) ln a.
(7) Binomial theorem (n a positive integer):
(a + b)^n = a^n + nC1 a^(n−1)b + nC2 a^(n−2)b^2 + · · · + nC(n−1) ab^(n−1) + b^n
= Σ_{i=0}^{n} nCi a^(n−i) b^i.
(9) Permutations. The number of ways of arranging r objects chosen from n
objects when the order matters is
nPr = n!/(n − r)!.
(10) Combinations are the number of ways of arranging n objects when the order
does not matter. In this case the 6 ways of permuting three objects a, b and c
are viewed as a single combination. The number of ways of arranging r objects
chosen from n objects when the order of the objects does not matter is
nCr = n!/((n − r)! r!) = nPr/r!.
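For example (our own quick check): choosing r = 2 objects from n = 3 gives 3C2 = 3!/(1! 2!) = 3 combinations, namely {a, b}, {a, c} and {b, c}, whereas counting ordered arrangements gives 3P2 = 3!/1! = 6 permutations.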
(11) Geometry
(i) Area A and perimeter P of a rectangle of length a and width b: A = ab,
P = 2a + 2b,
(ii) Area A and perimeter P of a parallelogram with acute angle θ, long sides
a, short sides b and height h = b sin θ: A = ah = ab sin θ, P = 2a + 2b,
(iii) Area A and perimeter P of a triangle of base a, sides b and c, and height
h = b sin θ, where θ is the angle between b and a: A = (1/2)ah = (1/2)ab sin θ,
P = a + b + c,
(iv) Area A and perimeter P of a trapezoid of height h, parallel sides a and b,
and acute angles θ and φ: A = (1/2)h(a + b), P = a + b + h(1/sin θ + 1/sin φ),
(v) Area A and circumference C of a circle of radius r: A = πr^2, C = 2πr,
(vi) Area A of an ellipse with short radius (semi-minor axis) b and long radius
(semi-major axis) a: A = πab,
(vii) Volume V and surface area A of a sphere of radius r: V = (4/3)πr^3, A = 4πr^2,
(viii) Volume V and surface area A of a cylinder of radius r and height h: V =
πr^2 h, A = 2πrh + 2πr^2,
(ix) Volume V and surface area A of a rectangular box of length a, height b and
width c: V = abc, A = 2(ab + ac + bc),
(x) Volume V of a parallelepiped with the rectangular face of area A = ac, and
the parallelogram face with parallel sides a and b, acute angle θ and height
(distance between parallel sides a) h = b sin θ: V = Ah = acb sin θ,
(xi) Volume V and surface area A of a right circular cone of base radius r and
height h: V = (1/3)πr^2 h, A = πr^2 + πr√(r^2 + h^2),
(xii) Volume V of a pyramid of base area A and height h: V = (1/3)Ah.
(12) Definition of a radian. A radian is defined as the angle subtended at the center
of a circle by an arc equal to its radius. It is abbreviated as “rad”:
1 rad = 360°/2π = 57.30° (to four significant figures).
(13) Trigonometric functions: sine (sin), cosine (cos) and tangent (tan):
(i) Consider a right-angled triangle with hypotenuse H, adjacent A and opposite
O with respect to angle θ. Then, the trigonometric functions are defined as
follows:
cos θ = A/H, secant: sec θ = 1/cos θ, inverse of cos θ: arccos θ = cos^(−1) θ,
sin θ = O/H, cosecant: csc θ = 1/sin θ, inverse of sin θ: arcsin θ = sin^(−1) θ,
tan θ = O/A, cotangent: cot θ = 1/tan θ, inverse of tan θ: arctan θ = tan^(−1) θ,
tan θ = sin θ/cos θ, sin^2 θ + cos^2 θ = 1, sec^2 θ − tan^2 θ = 1, csc^2 θ − cot^2 θ = 1,
cos(−θ) = cos θ, sin(−θ) = −sin θ, tan(−θ) = −tan θ.
(16) Series
e^x = 1 + x + x^2/2! + x^3/3! + · · ·, −∞ < x < ∞,
e^(kx) = 1 + kx + k^2 x^2/2! + k^3 x^3/3! + · · ·, −∞ < x < ∞,
sin x = x − x^3/3! + x^5/5! − x^7/7! + · · ·, −∞ < x < ∞,
cos x = 1 − x^2/2! + x^4/4! − x^6/6! + · · ·, −∞ < x < ∞,
tan x = x + x^3/3 + 2x^5/15 + 17x^7/315 + · · ·, |x| < π/2,
sinh x = x + x^3/3! + x^5/5! + x^7/7! + · · ·, −∞ < x < ∞,
cosh x = 1 + x^2/2! + x^4/4! + x^6/6! + · · ·, −∞ < x < ∞,
tanh x = x − x^3/3 + 2x^5/15 − 17x^7/315 + · · ·, |x| < π/2.
(17) Definition of the derivative of a function u(x):
du/dx = lim_{Δx→0} [u(x + Δx) − u(x)]/Δx.
(18) Product rule. If f = f (x) and g = g(x) then the product rule gives
$\frac{d(fg)}{dx} = f\frac{dg}{dx} + g\frac{df}{dx}.$
(19) Chain rule:
(a) The chain rule for the derivative of f = f [u(x)] is
$\frac{df}{dx} = \frac{df}{du}\frac{du}{dx}.$

(b) The chain rule for the derivative of $f = f[u(x), v(x)]$ is

$\frac{df}{dx} = \frac{df}{du}\frac{du}{dx} + \frac{df}{dv}\frac{dv}{dx}.$
(20) Derivatives of common functions. Here $c$ and $n$ are constants, while $u = u(x)$ is a function of $x$:

$\frac{dc}{dx} = 0, \qquad \frac{d(cu)}{dx} = c\frac{du}{dx},$

$\frac{dx^n}{dx} = nx^{n-1}, \qquad \frac{du^n}{dx} = nu^{n-1}\frac{du}{dx},$

$\frac{de^x}{dx} = e^x, \qquad \frac{de^u}{dx} = e^u\frac{du}{dx},$

$\frac{d\log_a x}{dx} = \frac{\log_a e}{x}, \qquad \frac{d\log_a u}{dx} = \frac{\log_a e}{u}\frac{du}{dx}, \qquad (a \neq 0, 1),$

$\frac{d\ln x}{dx} = \frac{1}{x}, \qquad \frac{d\ln u}{dx} = \frac{1}{u}\frac{du}{dx},$

$\frac{d\sin x}{dx} = \cos x, \qquad \frac{d\sin u}{dx} = \cos u\,\frac{du}{dx},$

$\frac{d\cos x}{dx} = -\sin x, \qquad \frac{d\cos u}{dx} = -\sin u\,\frac{du}{dx},$

$\frac{d\tan x}{dx} = \sec^2 x, \qquad \frac{d\tan u}{dx} = \sec^2 u\,\frac{du}{dx},$

$\frac{d\arcsin x}{dx} = \frac{1}{\sqrt{1 - x^2}}, \qquad -\frac{\pi}{2} < \arcsin x < \frac{\pi}{2}.$
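These rules can be verified symbolically; the sketch below uses the third-party sympy package, which is an assumed tool for illustration, not one prescribed here:

```python
import sympy as sp

x = sp.symbols('x')

# Chain rule: d/dx sin(x^2) = 2x cos(x^2)
print(sp.diff(sp.sin(x**2), x))                   # 2*x*cos(x**2)

# d/dx arcsin x = 1/sqrt(1 - x^2)
print(sp.diff(sp.asin(x), x))                     # 1/sqrt(1 - x**2)

# Product rule: d/dx (x^2 e^x) = x^2 e^x + 2x e^x
print(sp.expand(sp.diff(x**2 * sp.exp(x), x)))    # x**2*exp(x) + 2*x*exp(x)
```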
(21) Indefinite integrals. In this section $c$ and $n$ are constants, while $u = u(x)$ and $v = v(x)$ are functions of $x$.

$\int 0\,dx = 0, \qquad \int dx = x + c, \qquad \int cu\,dx = c\int u\,dx,$

$\int x^n\,dx = \frac{x^{n+1}}{n+1} + c \quad (n \neq -1), \qquad \int e^x\,dx = e^x + c,$

$\int \frac{1}{x}\,dx = \ln x + c \quad (x > 0), \qquad \int \frac{1}{x}\,dx = \ln(-x) + c \quad (x < 0),$

$\int a^x\,dx = \int e^{x\ln a}\,dx = \frac{e^{x\ln a}}{\ln a} + c = \frac{a^x}{\ln a} + c, \qquad a > 0,\; a \neq 1,$

$\int \sin x\,dx = -\cos x + c, \qquad \int \cos x\,dx = \sin x + c,$

$\int \tan x\,dx = \ln \sec x + c = -\ln \cos x + c,$

$\int \sinh x\,dx = \cosh x + c, \qquad \int \cosh x\,dx = \sinh x + c,$

$\int \tanh x\,dx = \ln \cosh x + c.$
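The same symbolic approach checks the integrals (again a sketch assuming sympy; note that sympy omits the arbitrary constant $c$):

```python
import sympy as sp

x = sp.symbols('x', positive=True)

print(sp.integrate(sp.tan(x), x))    # -log(cos(x)), i.e. ln(sec x) up to a constant
print(sp.integrate(sp.tanh(x), x))   # log(cosh(x))
print(sp.integrate(x**3, x))         # x**4/4
```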
(25) The gamma function, $\Gamma(x)$, for $x$ a real positive number is defined by

$\Gamma(x) = \int_0^\infty e^{-t} t^{x-1}\,dt, \qquad x > 0,$

$\Gamma(x) = \lim_{n\to\infty} \frac{n!\,n^{x-1}}{x(x+1)(x+2)\cdots(x+n-1)}, \qquad x \text{ arbitrary}.$

For negative real $x$ the gamma function has the following values:

$\Gamma(x) = \begin{cases} \pm\,\text{finite real value} & \text{for negative } x \neq -1, -2, \ldots \\ \pm\infty & \text{for } x = -1, -2, \ldots \end{cases}$
(26) The factorial function, $n!$, for positive integer $n$ is defined by

$n! = n(n-1)(n-2)\cdots 2 \cdot 1.$

Using the formula $n! = \Gamma(n + 1)$, the definition of the factorial function can be extended to 0 and negative integer $n$ as follows:

$\Gamma(1) = 0! = 1, \qquad n! = \Gamma(n+1) = \pm\infty, \quad n = \text{negative integer}.$
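Numerically, the relation $n! = \Gamma(n+1)$ is available directly in Python's standard library (a minimal illustration):

```python
import math

# Gamma reproduces the factorial at positive integers: Gamma(n + 1) = n!
for n in range(6):
    assert math.gamma(n + 1) == math.factorial(n)

# Gamma also extends the factorial to non-integer arguments: Gamma(1/2) = sqrt(pi)
print(math.gamma(0.5), math.sqrt(math.pi))   # 1.7724538509055159 (twice)

# At the poles (0, -1, -2, ...) math.gamma raises an error rather than returning infinity
try:
    math.gamma(-1)
except ValueError as err:
    print(err)   # math domain error
```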
Some useful constants:

$\pi = 3.14159265358979323846264338327950288419716939937511,$

Euler's constant $\gamma = 0.57721566490153286061,$

$e = 2.71828182845904523536028747135266249775724709369996.$
Appendix D
Some Biographies
• Bernoulli, Jacob, 1655–1705, Swiss mathematician. Among his contributions are the Bernoulli numbers, with which he derived the exponential series, and early concepts of probability.
• Boltzmann, Ludwig Eduard, 1844–1906, Austrian physicist. Boltzmann was
born in Vienna, Austria. He obtained his doctorate from the University of Vienna
in 1866. He held professorial positions in mathematics and physics at Vienna,
Graz, Munich, and Leipzig. His major contributions were in statistical mechanics.
Among his contributions to statistical mechanics is the derivation of the princi-
ple of equipartition of energy and what is now called the Maxwell–Boltzmann
distribution law.
• Fermat, Pierre de, 1601–1665, French mathematician. He was born in Beaumont-
de-Lomagne, France, and educated at a Franciscan school, studying law in later
years. Little is known of his early life. Fermat received the baccalaureate in law
from the University of Orléans in 1631 and went on to serve as a councillor in the
local parliament of Toulouse in 1634. Fermat made contributions to number theory,
analytic geometry and probability. He is regarded as the inventor of differential
calculus through his method of finding tangents and maxima and minima. Some
30 years later, Sir Isaac Newton introduced his calculus. Recognition of Fermat’s
work was scanty, perhaps because he used an older, clumsy notation rendered obsolete by Descartes's 1637 work Géométrie.
In his work reconstructing the long-lost Plane Loci of the 3rd century BC Greek geometer Apollonius, Fermat found that the study of geometry is facilitated by the introduction of a coordinate system. Fermat's Introduction to Loci was published much later, after his death, in 1679. Descartes introduced a similar idea in his 1637 Géométrie. Since then, the study of geometry using a coordinate system has become known as Cartesian geometry, after René Descartes, whose Latinized name is Renatus Cartesius.
Fermat disagreed with Descartes's attempt to explain the sine law of refraction (at a surface separating materials of different densities, the ratio of the sines of the angles of incidence and refraction is constant) by supposing that light travels faster in the denser medium. Fermat, on the other hand, influenced by the
Aristotelian view that nature always chooses the shortest path, supposed that light
travels faster in the less dense medium. Fermat was proved correct by the later
wave theory of Huygens and by the 1849 experiment of Fizeau.
• Galileo Galilei, 1564–1642, Italian natural philosopher, astronomer and mathe-
matician. He was born in Pisa, Tuscany, Italy, the oldest son of Vincenzo Galilei,
a musician who made important contributions to the theory and practice of music.
The family moved to Florence, where Galileo attended the monastery school at
Vallombrosa. He enrolled at the University of Pisa to study medicine, but instead,
was captivated by mathematics and philosophy and switched to these subjects
against his father’s wishes. However, in 1585, he left the university without a
degree. Despite not having a degree, his scientific and mathematical work gained
recognition and in 1589 he was awarded the chair of mathematics at the University
of Pisa. While at Pisa he performed his famous experiment in which he dropped
objects of different weights from the top of the Leaning Tower of Pisa, demonstrat-
ing that objects of different weights fall at the same rate, contrary to Aristotle’s
claim. His studies on the motion of bodies led him away from Aristotelian notions
about motion, preferring instead the Archimedean approach.
• Huygens, Christiaan, 1629–1695, Dutch mathematician, astronomer and physicist. Huygens was born in The Hague, Netherlands. His father, Constantijn Huygens, was a diplomat and dabbled in poetry. From an early
age, Huygens showed a talent for drawing and mathematics. He became a student
at the University of Leiden, where he studied mathematics and law.
Huygens improved the construction of telescopes using his new method of grind-
ing and polishing lenses. This allowed him, in 1654, to identify that the puzzling shape of Saturn observed by Galileo was actually a system of rings circling the planet.
His interest in astronomy required an accurate way to measure time and this led
him to invent the pendulum clock. Huygens also made contributions to dynam-
ics: he derived the formula for the time of oscillation of a simple pendulum, studied the oscillation of a body about a stationary axis, and formulated the laws of centrifugal force for uniform circular motion (now described in terms of centripetal force); in 1656 he obtained solutions for the collision of elastic bodies (published later, in 1669).
Huygens, however, remains most famous as the founder of the wave theory of
light, presented in his book Traité de la Lumière (Treatise on Light). Though
largely completed by 1678, it was only published in 1690. As made clear in his
book, Huygens held the view that natural phenomena such as light and gravity
are mechanical in nature and hence should be described by a mechanical model.
This view led him to criticise Newton’s theory of gravity. In describing light as
waves, he was also in strong opposition to Newton's corpuscular view of light (i.e., that light is made up of particles). Though it required an underlying mechanical medium to produce light waves, Huygens gave a beautiful description of reflection and refraction based on what is now called Huygens' principle of secondary wave fronts, which is itself a completely non-mechanical description.
• Joule, James Prescott - see footnote 3, Chap. 1.
• Kelvin, Lord - see footnote 1, Chap. 1.
• Laplace, Pierre Simon Marquis de, 1749–1827, French mathematician, as-
tronomer and physicist. Laplace was born in Beaumont-en-Auge, Normandy,
France, the son of a peasant farmer. At the military academy at Beaumont, he
quickly showed his mathematical ability. In 1766 Laplace began studies at the
University of Caen, but later left for Paris, apparently before completing his degree. He took with him a letter of recommendation which he presented to Jean
d’Alembert, who helped him find employment as a professor at the École Mili-
taire, where he taught from 1769 to 1776. In later life, Laplace became president
of the Board of Longitude, contributed to the development of the metric system
and served for six weeks as minister of the interior under Napoleon. For his con-
tributions, Laplace was eventually created a marquis (a nobleman ranking above
a count but below a duke). He survived the French Revolution when many high-
ranking individuals were executed.
Among Laplace's most important contributions to mathematics is the development of what are now called the Laplace equation and
Laplace transforms that find application in many areas of physics and engineering.
In probability theory, he developed numerous tools for calculating probabilities
and showed that large amounts of astronomical data can be approximated by a
Gaussian distribution.
time, he is considered among the creators of geology. He suggested that the earth
was once molten.
• Maxwell, James Clerk, 1831–1879, Scottish physicist. Maxwell was born in Ed-
inburgh to a well-off middle-class family. The original family name was Clerk,
but his father, a lawyer, added the name Maxwell when he inherited Middlebie es-
tate from Maxwell ancestors. At school age, he attended the Edinburgh Academy
and when 16 years of age entered the University of Edinburgh. He later studied
at the University of Cambridge, where he excelled. In 1856 he was appointed to
the professorship of natural philosophy at Marischal College, Aberdeen. When,
in 1860, Marischal College merged with King's College to form the University of Aberdeen, Maxwell was made redundant. After his application to the University of Edinburgh was rejected, he managed to secure a position as professor of natural
philosophy at King’s College, London.
The five years following his appointment at King's College in 1860 were, per-
haps, his most prolific. During this period he published two classic papers on
the electromagnetic field, supervised the experimental determination of electrical
units, and confirmed experimentally the speed of light predicted by his theory.
In 1865 Maxwell resigned his professorship at King's College and retired to the family estate in Glenlair, where he devoted most of his time to writing his famous Treatise on Electricity and Magnetism, published in 1873. By unifying the experimentally established laws of electricity and magnetism (which include Faraday's law of induction), he established his enormously powerful electromagnetic theory based on the four equations now known as
Maxwell’s equations. From these equations he derived a wave equation describing
light, which established light as an electromagnetic wave. Electromagnetic waves
have a broader spectrum than visible light and the connection of all such waves
with electricity and magnetism suggested that these waves could be produced in
the laboratory. This was confirmed experimentally eight years after Maxwell’s
death by Heinrich Hertz in 1887, when he succeeded in producing radio waves (giv-
ing rise to the radio industry).
But this was not his only achievement. He made significant contributions to ther-
modynamics (the Maxwell relations) and statistical mechanics where, to mention
one contribution, he derived a distribution law for molecular velocities, now called
the Maxwell-Boltzmann distribution law. Maxwell, though a great theoretician,
also possessed experimental skills and designed a number of experiments to inves-
tigate colour. This led him to suggest that a colour photograph could be produced
using filters of the three primary colours (red, blue and green). He confirmed his
proposal in an 1861 lecture to the Royal Institution of Great Britain, where he
projected, through filters, a colour photograph of a tartan ribbon that he had made by
his method. His introduction of a hypothetical super being, now called Maxwell’s
Demon, played an important role in conceptual discussions of statistical ideas. His
work on geometric optics led to the development of the fish-eye lens. His contri-
butions to heat were published in his book Theory of Heat published in 1872.
• Moivre, Abraham de, 1667–1754, French mathematician who spent most of his life in England. In 1730, de Moivre introduced what is now called the Gaussian or normal distribu-
tion, which plays a central role in probability and statistics. The true significance
of the distribution was realised much later when Gauss used it as a central part of
his method for locating astronomical objects. As a result, it came to be known as
the Gaussian distribution. So many sets of data satisfied the Gaussian distribution
that it came to be thought of as normal for a data set to satisfy the Gaussian curve.
Following the British statistician Karl Pearson, the Gaussian distribution began to
be referred to as the normal distribution.
We may note that Stirling’s formula for n!, incorrectly attributed to the Scottish
mathematician James Stirling, was actually introduced by de Moivre. De Moivre
was also the first to use complex numbers in trigonometry.
• Newton, Isaac - see footnote 2, Chap. 1.
• Pascal, Blaise, 1623–1662, French mathematician, physicist and religious philoso-
pher. Pascal was born in Clermont-Ferrand, France, the son of Étienne Pascal, a
presiding judge of the tax court at Clermont-Ferrand and a respected mathemati-
cian. In 1631, after his mother’s death, the family moved to Paris where Étienne
Pascal devoted himself to the education of his children. Both Pascal and his younger
sister Jacqueline were viewed as child prodigies; his sister in writing and Pascal
in mathematics. As a young man, between 1642 and 1644, he conceived and con-
structed a calculating machine to help his father in tax computations following his
appointment in 1639 as intendant (local administrator) at Rouen.
Pascal came from a religious Catholic background and maintained strong religious
beliefs throughout his life and by the end of 1653 turned his attention more to
religion than to science. On November 23, 1654, Pascal experienced a profound mystical conversion which he believed marked the beginning of a new life. In January 1655, he entered the Port-Royal convent (a Jansenist convent; Jansenism is an austere form of Catholicism). Though he never became a full member, from then on he wrote only at the request of the Port-Royal Jansenists and never again published under his own name.
• Poisson, Siméon-Denis, 1781–1840, French mathematician. Poisson was born in
Pithiviers, France. His family wanted him to study medicine, but Poisson's interest was in mathematics, and in 1798 he enrolled at the École Polytechnique in Paris.
His teachers were Pierre-Simon Laplace and Joseph-Louis Lagrange, with whom
he became lifelong friends. In 1802 he was appointed a professor at the École
Polytechnique, a post he left in 1808 to take up a position as an astronomer at the
Bureau of Longitudes. When the Faculty of Sciences was created in 1809 he was
offered the position of professor of pure mathematics.
Appendix E
Solutions

Chapter 3
1. (a) $\sigma = 0.0283$ mm, (b) $s_m = 0.00756$ mm
2. $m = 1.45$, $(\bar{x}, \bar{y}) = (7.00, 9.48)$, $c = -0.667$, $s_m = 0.03$
3. (a) (150.0 ± 0.5) g (b) (50.0 ± 0.5) g
4. (350.0 ± 0.7) g
5. (200 ± 3) cm3
6. (a) Relative error = 0.0687 (3 sf) (b) Percentage error = 6.87% (3 sf)
7. (670 ± 46) cm3
Chapter 5
1. {numbers 1 to 36} and {red, black}
2. $\frac{3}{52}$
3. By direct counting, the probabilities are $P(A_1) = \frac{13}{52}$ and $P(A_2) = \frac{6}{52}$. Then, clearly, $P(A_1) \geq P(A_2)$. Also $P(A_1 - A_2) = P(A_1) - P(A_2) = \frac{7}{52}$.
4. By direct counting,
$P(A) = P(A_1) + P(A_2) + P(A_3) + P(A_4) = \frac{1}{52} + \frac{1}{52} + \frac{1}{52} + \frac{1}{52} = \frac{4}{52},$
in agreement with Theorem 5.2.4.
5. The probability of choosing a club is $P(A) = \frac{13}{52}$, while the probability of drawing a card numbered 5 to 10 is $P(B) = \frac{24}{52}$. The probability of drawing a club numbered 5 to 10 is $P(A \cap B) = \frac{6}{52}$. Then
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{13}{52} + \frac{24}{52} - \frac{6}{52} = \frac{31}{52}.$
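The result can also be checked by brute-force enumeration of the deck; a minimal sketch in Python (the rank and suit labels are illustrative only):

```python
from fractions import Fraction
from itertools import product

ranks = list(range(2, 11)) + ['J', 'Q', 'K', 'A']
suits = ['clubs', 'diamonds', 'hearts', 'spades']
deck = [(r, s) for r, s in product(ranks, suits)]        # 52 cards

A = {card for card in deck if card[1] == 'clubs'}        # event: a club
B = {card for card in deck if card[0] in range(5, 11)}   # event: numbered 5 to 10

print(Fraction(len(A | B), len(deck)))   # 31/52
```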
6. $P(A) = P(A \cap B) + P(A \cap B') = \frac{6}{52} + \frac{7}{52} = \frac{13}{52},$
in agreement with Theorem 5.2.6.
7. By counting desired outcomes we get $P(A) = \frac{13}{52}$, $P(A_1 \cap A) = \frac{3}{52}$, $P(A_2 \cap A) = \frac{4}{52}$, $P(A_3 \cap A) = \frac{2}{52}$ and $P(A_4 \cap A) = \frac{4}{52}$; then
$P(A) = P(A \cap A_1) + P(A \cap A_2) + P(A \cap A_3) + P(A \cap A_4) = \frac{3}{52} + \frac{4}{52} + \frac{2}{52} + \frac{4}{52} = \frac{13}{52}.$
$P(A_1 \cap A_2 \cap A_3) = \frac{1}{5525}.$
19.

x      1     2     3     4     5     6     8     9     10
p(x)   1/36  2/36  2/36  3/36  2/36  4/36  2/36  1/36  2/36
F(x)   1/36  3/36  5/36  8/36  10/36 14/36 16/36 17/36 19/36

x      12    15    16    18    20    24    25    30    36
p(x)   4/36  2/36  1/36  2/36  2/36  2/36  1/36  2/36  1/36
F(x)   23/36 25/36 26/36 28/36 30/36 32/36 33/36 35/36 36/36
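This table can be reproduced by enumerating all 36 equally likely outcomes; the sketch below assumes (as the listed values suggest) that the random variable is the product of the numbers shown on two fair dice:

```python
from itertools import product
from collections import Counter

# Count how many of the 36 outcomes give each product
counts = Counter(a * b for a, b in product(range(1, 7), repeat=2))

cumulative = 0
for x in sorted(counts):
    cumulative += counts[x]
    print(f"x = {x:2d}:  p(x) = {counts[x]}/36,  F(x) = {cumulative}/36")
```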
20. (a)

$F(x) = \begin{cases} \int_{-\infty}^{x} 0\,dv = 0 & \text{for } x < -1, \\[4pt] \dfrac{1}{\sqrt{\pi}\,\mathrm{Erf}(1)} \displaystyle\int_{-1}^{x} e^{-v^2}\,dv = \dfrac{\mathrm{Erf}(1) + \mathrm{Erf}(x)}{2\,\mathrm{Erf}(1)} & \text{for } -1 \leq x \leq 1, \\[4pt] 1 & \text{for } x > 1. \end{cases}$
(b) $p_y(0) = \frac{10}{84}, \quad p_y(1) = \frac{40}{84}, \quad p_y(2) = \frac{30}{84}, \quad p_y(3) = \frac{4}{84}.$
(c)

X\Y         y1 = 0           y2 = 1           y3 = 2           y4 = 3          px(xi)
x1 = 0      p(0,0) = 0       p(0,1) = 4/84    p(0,2) = 12/84   p(0,3) = 4/84   px(0) = 20/84
x2 = 1      p(1,0) = 3/84    p(1,1) = 24/84   p(1,2) = 18/84   —               px(1) = 45/84
x3 = 2      p(2,0) = 6/84    p(2,1) = 12/84   —                —               px(2) = 18/84
x4 = 3      p(3,0) = 1/84    —                —                —               px(3) = 1/84
py(yj)      py(0) = 10/84    py(1) = 40/84    py(2) = 30/84    py(3) = 4/84    1
24.

$p(x|y) = \begin{cases} \dfrac{(2x-4)(3y-5)}{15-9y} & \text{for } 0 \leq x \leq 1,\; 0 \leq y \leq 1 \\[4pt] 0 & \text{otherwise} \end{cases}$
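As a consistency check, a conditional density must integrate to 1 over $x$ for every admissible $y$; a sketch assuming the third-party sympy package:

```python
import sympy as sp

x, y = sp.symbols('x y')

p = (2*x - 4)*(3*y - 5)/(15 - 9*y)

# Integrate out x over [0, 1]; the y-dependence cancels exactly
print(sp.simplify(sp.integrate(p, (x, 0, 1))))   # 1, for any y in [0, 1]
```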
25. $\bar{X} = 1.5$
26. $\bar{d} = 0.571$ mm
27. $\bar{X} = 23$
28. $\bar{X} = 0.7034$ m, $g(\bar{X}) = 0.3507$ m³.
29. $\sigma^2 = 14$, $\sigma = \sqrt{14}$
34. 0.99%
35. $P_B(0) = 0.3670$, $P_P(0) = 0.3679$, $P_B(1) = 0.3688$, $P_P(1) = 0.3679$, …
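The closeness of the binomial and Poisson values in solution 35 can be reproduced directly. The printed numbers are consistent with $n = 200$ and $p = 0.005$ (so that $\lambda = np = 1$), although these parameters are an assumption read off from the answers, not taken from the exercise itself:

```python
import math

n, p = 200, 0.005        # assumed parameters giving lambda = n*p = 1
lam = n * p

def binomial_pmf(k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    return math.exp(-lam) * lam**k / math.factorial(k)

for k in (0, 1):
    print(f"k = {k}:  binomial = {binomial_pmf(k):.4f},  Poisson = {poisson_pmf(k):.4f}")
# k = 0:  binomial = 0.3670,  Poisson = 0.3679
# k = 1:  binomial = 0.3688,  Poisson = 0.3679
```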
Index

C
Candela (cd), 2, 3
Cardano, G, 76
Centimetre-gram-second (CGS), 2
Central Limit Theorem, 152
Combinations, 96
Combinatorial analysis, 95
Conditional probability, 88
Conditional probability density, continuous, 113

F
Fahrenheit (F), 2
Foot (ft), 2
Frequency, 81
Frequency curve, 30
Frequency graphs, 30

G
Galilei, Galileo, 76

H
Halley, E, 78
Heisenberg, W, 72
Huygens, C, 76
Hypergeometric distribution, 141

I
Inch (in), 2
International Bureau of Weights and Measures, 3
International System of Units (SI), 2

J
Joule (J), 2, 4
Joule, J. P., 4

K
Kelvin (K), 2, 3
Kelvin, Lord, 2
Kilogram (kg), 2, 3
Kolmogorov, A. N., 80

L
Laplace, P-S, 77, 78
Legendre, A-M, 62
Leibnitz, G. W. F. von, 76

M
Marginal probability function, discrete, 107
Maxwell, J. C., 78
Mean (average), 27
Mean, continuous, 117
Mean, discrete, 116
Mean, joint, continuous, 126
Mean, joint, discrete, 125
Median, 131
Method of least squares, 51, 61
Metre-kilogram-second (MKS), 2
Metre (m), 2, 3
Metric system, 1
Micrometer, 7, 12
Microscope, 7

N
Newton (N), 4
Newton, Sir Isaac, 4
Normal distribution, 143

O
Objective interpretation, 73
Objective probability, 82

P
Parallax errors, 35
Pascal, B, 76, 77
Permutations, 95
Points-in-pairs method, 51
Poisson distribution, 137
Population standard deviation, 29
Pound (lb), 2
Precise measurement, 35
Principle of Insufficient Reason (PIR), 74
Probability density, continuous, 103
Probability density, joint, continuous, 110
Probability distribution, discrete, 100
Probability function, discrete, 100
Probability function, joint, discrete, 106
Proportionalities, 49

R
Random errors, 27
Random Experiment, 80
Random variables, 100
Random variables, independent, 111
Random variable, standardised, 125
Relative frequency, 81
Residuals, 28
Rounding off, 20

S
Sample space, 81
Sample standard deviation, 29
Schrödinger, E, 72
Second (s), 2, 3
Significant figures, 15, 16

T
Tangent, 25
Trial, 80

U
United States Customary System, 2

Y
Yard (yd), 2

Z
Zero errors, 35