David J. Smith - Basic Statistical Techniques For Medical and Other Professionals - A Course in Statistics To Assist in Interpreting Numerical Data-Productivity Press (2021)
Typeset in Garamond
by MPS Limited, Dehradun
Contents
Foreword
Preface
Acknowledgements
About the Author
Introduction
Sam Samuel
Preface
Acknowledgements
About the Author
Introduction
DOI: 10.4324/9781003220138-1
2 ▪ Statistical Techniques for Medical & Other Professionals
[Figure: BLOOD SUGAR in mmol/L (axis marks at 0, 6 and 10) plotted for Day 1 to Day 10]
Spread of a Variable
Correlation
Taking Samples
DOI: 10.4324/9781003220138-2
Combining Probabilities
$$P_{ab} = P_a \times P_b = 0.8 \times 0.7 = 0.56$$
$$P_{an} = P_a \times P_b \times P_c \times \dots \times P_n$$
$$P(a \text{ or } b \text{ or both}) = P_a + P_b - (P_a \times P_b)$$
(More easily remembered as SUM − PRODUCT)
$$1 - (1 - P_a)(1 - P_b)(1 - P_c) \dots (1 - P_n)$$
$$P_a + P_b$$
$$P_a + P_b + P_c + P_d \text{ etc.}$$
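The combination rules above can be checked numerically. The following is a minimal Python sketch, not the book's own method: the function names are hypothetical, the events are assumed to be independent for the product rules, and the plain sum is assumed to apply to mutually exclusive events.

```python
from math import prod


def p_all(probs):
    """Probability that every independent event occurs (multiplication rule)."""
    return prod(probs)


def p_any(probs):
    """Probability that at least one independent event occurs:
    1 - (1 - Pa)(1 - Pb)...(1 - Pn)."""
    return 1 - prod(1 - p for p in probs)


def p_any_exclusive(probs):
    """Probability of one of several events, ASSUMED mutually exclusive: a plain sum."""
    return sum(probs)


pa, pb = 0.8, 0.7                      # the values used in the text
both = p_all([pa, pb])                 # Pa x Pb
either = pa + pb - pa * pb             # SUM - PRODUCT
print(round(both, 2), round(either, 2), round(p_any([pa, pb]), 2))
```

For two events the SUM − PRODUCT form and the 1 − (1 − Pa)(1 − Pb) form give the same answer; the second form extends more conveniently to n events.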
Conditional Probabilities
Metrics
DOI: 10.4324/9781003220138-3
The Mean
$$\bar{X} = \frac{\sum X_i}{N}$$
$$\sqrt{\frac{\sum (\bar{X} - X_i)^2}{N}}$$
Without the square root it is called the variance, in other
words:
$$\frac{\sum (\bar{X} - X_i)^2}{N}$$
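As an illustration, the mean, the variance and its square root (the standard deviation) can be computed directly from these definitions. This is a minimal Python sketch: the readings are hypothetical, and the divisor N follows the formulae above (many spreadsheet functions divide by N − 1 instead).

```python
from math import sqrt


def mean(xs):
    """Arithmetic mean: sum of the values divided by N."""
    return sum(xs) / len(xs)


def variance(xs):
    """Mean of the squared differences from the mean (divisor N)."""
    m = mean(xs)
    return sum((m - x) ** 2 for x in xs) / len(xs)


def std_dev(xs):
    """Standard deviation: the square root of the variance."""
    return sqrt(variance(xs))


readings = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data, not from the book
print(mean(readings), variance(readings), std_dev(readings))
```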
Grouped Data
be seen, not only has the sugar level tended to fall but so
also has the COV, indicating an improvement in consistency.
The latter is as important as the trend in the Mean.
Figure 3.5 Skewed distributions from left to right: The first
example, an unbiased (symmetrical) distribution, and the second
example
Geometric Mean
a. Geometric Growth
When the growth of some variable is geometric rather than
linear (e.g., population), the arithmetic mean
would be misleading. Thus, if the population of
Noddytown is 10,000 in 2010, and 20,000 in 2020, then
the likely number in 2015 would be better given as:
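A minimal Python sketch of the Noddytown estimate follows. It assumes the mid-period figure is taken as the geometric mean of the two populations (the square root of their product); the variable names are illustrative only.

```python
from math import sqrt

pop_2010 = 10_000
pop_2020 = 20_000

# Geometric mean of two values: the square root of their product.
geometric = sqrt(pop_2010 * pop_2020)

# Arithmetic mean, shown only for comparison.
arithmetic = (pop_2010 + pop_2020) / 2

print(round(geometric), round(arithmetic))
```

Under steady geometric growth the 2015 figure is about 14,142, noticeably below the 15,000 suggested by the arithmetic mean.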
b. A wide range
Comparing Variables
In Chapter 3, we acquired some familiarity with the normal
(Gaussian) distribution and its mean and standard deviation.
The next step is to think about comparing two
distributions (of the same variable) with a view to deciding if
they are significantly different or whether they might both
represent the same population.
Figure 4.1 illustrates the idea graphically. The two
questions are
a. Are the standard deviations significantly different?
b. Are the means significantly different?
$$F = \frac{\mathrm{StdDev}_1^2}{\mathrm{StdDev}_2^2}$$
Appendix 4 provides tables of the F distribution. In order
to use them, it is necessary to make use of a number
DOI: 10.4324/9781003220138-4
$$t = \frac{\mathrm{Mean}_1 - \mathrm{Mean}_2}{\sqrt{\mathrm{StdDev}_1^2 / n_1 + \mathrm{StdDev}_2^2 / n_2}}$$
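The two test statistics can be sketched in Python as follows. The function names and the sample figures are hypothetical. Two conventions are assumptions rather than statements from the text: the larger variance is placed on top of the F ratio (matching the "larger std. dev." heading of the Appendix 4 tables), and the absolute difference of the means is used for t.

```python
from math import sqrt


def f_ratio(sd1, sd2):
    """Ratio of the two variances, larger over smaller (assumed convention)."""
    v1, v2 = sd1 ** 2, sd2 ** 2
    return max(v1, v2) / min(v1, v2)


def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Difference of means divided by sqrt(sd1^2/n1 + sd2^2/n2)."""
    return abs(mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Hypothetical samples: mean, standard deviation and sample size for each.
f_value = f_ratio(2.0, 3.0)
t_value = t_statistic(10.0, 2.0, 16, 8.0, 3.0, 9)
print(round(f_value, 2), round(t_value, 2))
```

Each value would then be compared with the relevant table entry for the appropriate degrees of freedom.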
DOI: 10.4324/9781003220138-5
Graphs
Some Pitfalls
Suppression
Extrapolation
Logarithmic Scales
Figure 5.13 Moving average of deaths on a logarithmic basis (logarithmic vertical axis from 1.0 to 1,000.0; dates from 01/02/2021 to 17/05/2021)
Moving Averages
Control Charts
[Figure: control chart of SYSTOLIC readings (vertical axis 0 to 250) for samples 1 to 16, with ACTION LINE and WARNING LINE marked]
Binomial
DOI: 10.4324/9781003220138-6
Note that P + Q = 1
From the multiplication rule in Chapter 2
■ Probability of 2 hearts $P^2$
■ Probability of 1 heart $2PQ$
■ Probability of 0 hearts $Q^2$
■ Probability of 3 hearts $P^3$
■ Probability of 2 hearts $3P^2Q$
■ Probability of 1 heart $3PQ^2$
■ Probability of 0 hearts $Q^3$
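These terms are the expansion of (P + Q) raised to the number of draws, and can be generated for any number of draws. The Python sketch below is illustrative only: the function name is hypothetical, and P = 0.25 for drawing a heart is an assumption (13 hearts in a 52-card pack, with the card replaced after each draw).

```python
from math import comb


def binomial_terms(n, p):
    """Return {k: probability of exactly k successes in n trials}, highest k first."""
    q = 1 - p
    return {k: comb(n, k) * p ** k * q ** (n - k) for k in range(n, -1, -1)}


p_heart = 0.25                          # ASSUMED probability of a heart

two_draws = binomial_terms(2, p_heart)      # P^2, 2PQ, Q^2
three_draws = binomial_terms(3, p_heart)    # P^3, 3P^2Q, 3PQ^2, Q^3
print(two_draws)
print(three_draws)
```

Because P + Q = 1, the terms for any number of draws sum to 1.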
Poisson
the specified attribute. This does not imply that every sample
will contain exactly n × P items; indeed some
will contain 0, others 1, others 2 and so on. If the expectation
is given the symbol m such that m = nP then the terms of the
Poisson Distribution are:
$$e^{-m},\quad m e^{-m},\quad \frac{m^2 e^{-m}}{2!},\quad \dots,\quad \frac{m^n e^{-m}}{n!}$$
for 0, 1, 2, … n items respectively.
0 deaths is 8%
1 or less deaths is 29%
2 or less deaths is 54%
3 or less deaths is 76%
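The Poisson terms and their running totals can be sketched in Python. The expectation m = 2.5 is an assumption, not taken from the text: it is chosen here because it reproduces the percentages quoted above. The function names are hypothetical.

```python
from math import exp, factorial


def poisson_term(k, m):
    """Probability of exactly k items when the expectation is m."""
    return m ** k * exp(-m) / factorial(k)


def poisson_cumulative(k, m):
    """Probability of k or fewer items: the sum of the terms 0..k."""
    return sum(poisson_term(i, m) for i in range(k + 1))


m = 2.5                                 # ASSUMED expectation
percentages = [round(100 * poisson_cumulative(k, m)) for k in range(4)]
print(percentages)
```

The probability of exactly k items is the difference between successive cumulative values.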
0 cases?
1 case?
2 cases?
3 cases?
4 cases?
5 cases?
Chapter 7
Testing for Significance (Attributes)
We are frequently confronted with data relating to some
attribute where measurements have been taken under two
different regimes or circumstances. The following technique
applies to the numbers of occurrences of an attribute
and is used to compare the difference between
outcomes in terms of whether that difference is significant
or is merely due to chance.
It is best explained by means of the following example.
Assume that we wish to test if the observed frequency of
seeing each face of a die is random or whether that particular
die is biased.
We calculate the value of what is known as χ², called
Chi-squared (χ is the Greek letter Chi, pronounced Kai, as
in the word kite). Some books refer to it as Chi-square. It
is calculated by comparing the observed frequencies (O)
with the frequencies which could be anticipated if the
outcome were assumed to be unbiased (A). The formula is
DOI: 10.4324/9781003220138-7
$$\chi^2 = \sum \frac{(O - A)^2}{A}$$
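The Chi-squared calculation for the die example can be sketched in Python. The observed counts below are hypothetical (60 throws), since the book's own figures are not reproduced here; the anticipated count for an unbiased die is the total divided equally among the six faces.

```python
def chi_squared(observed, anticipated):
    """Sum of (O - A)^2 / A over all categories."""
    return sum((o - a) ** 2 / a for o, a in zip(observed, anticipated))


observed = [8, 12, 9, 11, 10, 10]               # HYPOTHETICAL counts for faces 1..6
anticipated = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_squared(observed, anticipated)
degrees_of_freedom = len(observed) - 1          # six faces, so 5 degrees of freedom
print(chi2, degrees_of_freedom)
```

With these made-up counts χ² is 1.0, well below the 9.24 listed in Appendix 7 for 5 degrees of freedom in the 0.1 column, so there would be no grounds for calling the die biased.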
Correlation and Regression
Relating Two Variables
DOI: 10.4324/9781003220138-8
Y = mX + c
Exercise 8 Correlation
Construct the spreadsheet shown in Figure 8.2 using
the formulae indicated. Cell D32 (Figure 8.2) contains
the equation for r provided above the figure. Enter the
following data and determine the coefficient, r.
Systolic Diastolic
147 73
156 78
160 73
189 80
173 68
157 70
139 80
157 79
165 60
156 82
175 78
158 70
161 80
180 74
145 78
162 69
147 70
168 79
131 68
138 75
126 58
147 73
145 70
145 69
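As a cross-check on the spreadsheet, the coefficient can also be computed in Python. This is a sketch under one assumption: r is taken as the sum of the products of the deviations divided by the square root of the product of the two sums of squared deviations, which is the form of the spreadsheet formula quoted in the answers (Appendix 8). The function name is hypothetical.

```python
from math import sqrt

systolic = [147, 156, 160, 189, 173, 157, 139, 157, 165, 156, 175, 158,
            161, 180, 145, 162, 147, 168, 131, 138, 126, 147, 145, 145]
diastolic = [73, 78, 73, 80, 68, 70, 80, 79, 60, 82, 78, 70,
             80, 74, 78, 69, 70, 79, 68, 75, 58, 73, 70, 69]


def correlation(xs, ys):
    """Correlation coefficient r from deviations about the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


r = correlation(systolic, diastolic)
print(round(r, 2))
```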
False Correlation
Rank | Median Rank | No of STDEV away from Mean ("NORMSINV") | Ranked Data
1 0.0275591 -1.917944016 1
2 0.0669291 -1.499059227 2
3 0.1062992 -1.246452206 2
4 0.1456693 -1.055189681 3
5 0.1850394 -0.896325881 3
6 0.2244094 -0.757385572 3
7 0.2637795 -0.631736528 4
8 0.3031496 -0.515363243 4
9 0.3425197 -0.405596134 4
10 0.3818898 -0.300521332 4
11 0.4212598 -0.198671544 5
12 0.4606299 -0.098846884 5
13 0.5 0 5
14 0.5393701 0.098846884 5
15 0.5787402 0.198671544 5
16 0.6181102 0.300521332 6
17 0.6574803 0.405596134 6
18 0.6968504 0.515363243 6
19 0.7362205 0.631736528 6
20 0.7755906 0.757385572 7
21 0.8149606 0.896325881 7
22 0.8543307 1.055189681 7
23 0.8937008 1.246452206 8
24 0.9330709 1.499059227 8
25 0.9724409 1.917944016 9
Mean 5
Std Dev 2.041241
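The columns of this table can be reproduced with a short Python sketch. The median rank formula (rank − 0.3)/(n + 0.4) is an assumption inferred from the tabulated values rather than stated in the text, and `NormalDist().inv_cdf` from the standard library is used in place of the spreadsheet function NORMSINV.

```python
from statistics import NormalDist, mean, stdev

ranked_data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5,
               5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 9]
n = len(ranked_data)
standard_normal = NormalDist()          # mean 0, standard deviation 1

rows = []
for rank, value in enumerate(ranked_data, start=1):
    median_rank = (rank - 0.3) / (n + 0.4)          # ASSUMED formula
    z = standard_normal.inv_cdf(median_rank)        # standard deviations from the mean
    rows.append((rank, median_rank, z, value))

for row in rows[:3]:
    print(row)
print(mean(ranked_data), stdev(ranked_data))
```

Note that `stdev` divides by n − 1, which is what reproduces the tabulated 2.041241; dividing by n would give 2.0.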
Handling Numbers (Large and Small)
If the reader is not already familiar with manipulating
numbers expressed in the “POWERS OF TEN” format, then
it is important to study this chapter carefully and to
practice Exercise 9.
Many of the numbers involved in this subject are either
very large or very small (in other words, a number
multiplied or divided by a large number, like a million).
If we use expressions such as “1 in 1,000,000” or “1 in
100,000” it becomes very cumbersome. It is, therefore,
important to become familiar with the concept of
“POWERS OF TEN.”
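As a brief illustration, the same quantities can be written in powers-of-ten form in Python, where `1e-06` is the language's way of writing 10 to the power −6. This is a sketch only; the variable names are illustrative.

```python
from math import isclose

one_in_a_million = 1 / 1_000_000            # "1 in 1,000,000"
one_in_a_hundred_thousand = 1 / 100_000     # "1 in 100,000"

# Scientific notation expresses each value as a power of ten.
print(f"{one_in_a_million:.0e}")            # prints 1e-06
print(f"{one_in_a_hundred_thousand:.0e}")   # prints 1e-05

# Multiplying powers of ten adds the exponents: 10^3 x 10^-6 = 10^-3.
product = 10 ** 3 * 10 ** -6
print(isclose(product, 10 ** -3))
```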
Big Numbers
DOI: 10.4324/9781003220138-9
Small Numbers
Some Examples
Remember:
An Introduction to Risk
The purpose of this chapter is to provide a wider
perspective by introducing the risk of fatality from scenarios
other than those arising from medical issues. It provides a
comparison between the background occupational and
leisure-related risks, to which we are exposed, and those
relating to health. The following rates and probabilities
are, by their nature, approximate estimates. References
3 and 4 (Appendix 9) also deal with this aspect.
DOI: 10.4324/9781003220138-10
Notes
∗Notice the inference of a probability (or less), stated at some confidence
level. This makes further use of the Chi-squared technique. The
“probability or less” is calculated as χ²/2T where T is the aggregate
number of exposures to the risk and χ² is obtained from n = 2(k + 1)
degrees of freedom (for k occurrences) and at a probability of
(1 − confidence). In the above example, χ² is found in Appendix 7, from
n = 2(0 + 1) = 2 and probability 0.1, namely 4.61. Thus, 4.61/(2 × 70% ×
12 × 10⁶) = <3 × 10⁻⁷. The same formula may be used to infer a rate, in
which case T becomes the aggregate time of exposure. The technique
is fully explained in Reference 4 (Appendix 9).
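The arithmetic in this note can be reproduced with a short Python sketch. The function names are hypothetical; the χ² value is read from Appendix 7 rather than computed, and the figures 70% and 12 × 10⁶ are taken from the note as given.

```python
def degrees_of_freedom(occurrences):
    """n = 2(k + 1) for k occurrences."""
    return 2 * (occurrences + 1)


def probability_upper_bound(chi2_value, total_exposures):
    """The 'probability or less': chi-squared divided by 2T."""
    return chi2_value / (2 * total_exposures)


n = degrees_of_freedom(0)               # no occurrences, so n = 2
chi2_value = 4.61                       # Appendix 7: 2 degrees of freedom, probability 0.1
total_exposures = 0.70 * 12e6           # 70% x 12 x 10^6, as in the note

upper = probability_upper_bound(chi2_value, total_exposures)
print(n, f"{upper:.2e}")
```

The result is about 2.7 × 10⁻⁷, which the note rounds to "less than 3 × 10⁻⁷".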
A Final Word
I have tried to show, in this book, some fairly simple
techniques for drawing conclusions from quantified physical
measurements. However, let us beware of the short-sightedness
which leads to a blinkered treatment of
numerical data. There is a danger in running away with
the conclusions as if they were the only factors which
impinge on the situation in question. In medical applications,
this would be the patient’s overall health. I cannot
stress, too strongly, the need for a holistic approach to
health care. The role of the specialist is paramount but,
nevertheless, it is equally important to take a wide view of
the parameters involved and to seek a balance between
different treatment options.
I believe the same might be said of all scientific and
engineering disciplines.
If you have worked through the book, and attempted
the exercises, I hope you will have gained a fair grasp of
the basic principles of statistical sampling and inference. I
have attempted to explain the techniques with little more
than simple arithmetic. They can all be applied using the
tables and curves provided.
DOI: 10.4324/9781003220138-11
Arithmetic Functions
86 ▪ Manipulating numbers in Spreadsheets
Copy–Paste
Copy–Paste Special
Regression/Correlation
Appendix 2 ▪ 91
Appendix 4a: 0.5% & 1% Points of the F Distribution
0.5% Probability
Degrees of freedom (larger std. dev.)
1 2 3 4 5 6 10 24 ∞
1% Probability
Degrees of freedom (larger std. dev.)
1 2 3 4 5 6 10 24 ∞
Degrees of freedom (smaller std. dev.)
5% Probability
Degrees of freedom (larger std. dev.)
1 2 3 4 5 6 10 24 ∞
Degrees of freedom (smaller std. dev.)
Appendix 4c: 10% & 25% Points of the F Distribution
25% Probability
Degrees of freedom (larger std. dev.)
1 2 3 4 5 6 10 24 ∞
1 5.83 7.50 8.20 8.58 8.82 8.98 9.32 9.63 9.85
Degrees of freedom (smaller std. dev.)
Appendix 5 ▪ 99
120 0.672 0.845 1.289 1.658 1.980 2.270 2.617 2.860 3.373
∞ 0.675 0.842 1.282 1.645 1.960 2.241 2.576 2.807 3.291
Appendix 7: Percentage Points of the Chi Squared Distribution
Degrees of freedom 0.999 0.99 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.01
1 <0.001 <0.001 0.02 0.06 0.15 0.28 0.46 0.70 1.10 1.64 2.71 6.63
2 0.00 0.02 0.21 0.45 0.71 1.02 1.39 1.83 2.41 3.22 4.61 9.21
3 0.02 0.12 0.58 1.00 1.42 1.87 2.37 2.95 3.67 4.64 6.25 11.30
4 0.09 0.30 1.06 1.65 2.19 2.75 3.36 4.04 4.88 5.99 7.78 13.30
5 0.21 0.55 1.61 2.34 3.00 3.66 4.35 5.13 6.06 7.29 9.24 15.10
6 0.38 0.87 2.20 3.07 3.83 4.57 5.35 6.21 7.23 8.56 10.60 16.80
7 0.60 1.24 2.83 3.82 4.67 5.49 6.35 7.28 8.38 9.80 12.00 18.50
8 0.86 1.65 3.49 4.59 5.53 6.42 7.34 8.35 9.52 11.00 13.40 20.10
9 1.15 2.09 4.17 5.38 6.39 7.36 8.34 9.41 10.70 12.20 14.70 21.70
10 1.48 2.56 4.87 6.18 7.27 8.30 9.34 10.50 11.80 13.40 16.00 23.20
15 3.48 5.23 8.55 10.30 11.70 13.00 14.30 15.70 17.30 19.30 22.30 30.60
20 5.92 8.26 12.40 14.60 16.30 17.80 19.30 21.00 22.80 25.00 28.40 37.60
25 8.65 11.50 16.50 18.90 20.90 22.60 24.30 26.10 28.20 30.70 34.40 44.30
30 11.60 15.00 20.60 23.40 25.50 27.40 29.30 31.30 33.50 36.30 40.30 50.90
35 14.70 18.50 24.80 27.80 30.20 32.30 34.30 36.50 38.90 41.80 46.10 57.30
40 17.90 22.20 29.10 32.30 34.90 37.10 39.30 41.60 44.20 47.30 51.80 63.70
45 21.30 25.90 33.40 36.90 39.60 42.00 44.30 46.80 49.50 52.70 57.50 70.00
50 24.70 29.70 37.70 41.40 44.30 46.90 49.30 51.90 54.70 58.20 63.20 76.20
75 42.80 49.50 59.80 64.50 68.10 71.30 74.30 77.50 80.90 85.10 91.10 106.40
100 61.90 70.10 82.40 87.90 92.10 95.80 99.30 103.00 107.00 112.00 118.50 135.80
Appendix 8: Answers to Exercises
0 cases = 3%
1 or less case = 15% thus 1 case 15% − 3% = 12%
2 or less cases = 33% thus 2 cases 33% − 15% = 18%
3 or less cases = 55% thus 3 cases 55% − 33% = 22%
4 or less cases = 74% thus 4 cases 74% − 55% = 19%
5 or less cases = 87% thus 5 cases 87% − 74% = 13%
Exercise 7 Chi-Squared
Exercise 8 Correlation
If the spreadsheet has been properly constructed, then
r = 0.32 should have been obtained. The formula for
cell D33 is shown below and represents the formula in
Chapter 8:
=F28/(G28*H28)^0.5
Cells B28–H28 are the sums of the rows 2–25.
Cells B30–C30 are the averages of the rows 2–25.
£300/(0.67 × 0.1 × 2) = £2,240