OUP CORRECTED PROOF – FINAL, 6/5/2019, SPi
Bayesian Statistics
for Beginners
A Step-by-Step Approach
THERESE M. DONOVAN
RUTH M. MICKEY
Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Ruth M. Mickey 2019
The moral rights of the author have been asserted
First Edition published in 2019
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2019934655
ISBN 978–0–19–884129–6 (hbk.)
ISBN 978–0–19–884130–2 (pbk.)
DOI: 10.1093/oso/9780198841296.001.0001
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
To our parents, Thomas and Earline Donovan and Ray and Jean Mickey,
for inspiring a love of learning.
To our mentors, some of whom we’ve met only by their written words,
for teaching us ways of knowing.
Preface
Greetings. This book is our attempt at gaining membership to the Bayesian Conspiracy.
You may ask, “What is the Bayesian Conspiracy?” The answer is provided by Eliezer
Yudkowsky (http://yudkowsky.net/rational/bayes): “The Bayesian Conspiracy is a multinational,
interdisciplinary, and shadowy group of scientists that controls publication,
grants, tenure, and the illicit traffic in grad students. The best way to be accepted into the
Bayesian Conspiracy is to join the Campus Crusade for Bayes in high school or college, and
gradually work your way up to the inner circles. It is rumored that at the upper levels of the
Bayesian Conspiracy exist nine silent figures known only as the Bayes Council.”
Ha ha! Bayes’ Theorem, also called Bayes’ Rule, was published posthumously in 1763 in
the Philosophical Transactions of the Royal Society. In The Theory That Would Not Die,
author Sharon Bertsch McGrayne aptly describes how “Bayes’ rule cracked the enigma code,
hunted down Russian submarines, and emerged triumphant from two centuries of controversy.”
In short, Bayes’ Rule has vast application, and the number of papers and books that
employ it is growing exponentially.
Inspired by the knowledge that a Bayes Council actually exists, we began our journey by
enrolling in a 5-day ‘introductory’ workshop on Bayesian statistics a few years ago. On Day
1, we were introduced to a variety of Bayesian models, and, on Day 2, we sheepishly had to
inquire what Bayes’ Theorem was and what it had to do with MCMC. In other words, the
material was way over our heads. With tails between our legs, we slunk back home and
began trying to sort out the many different uses of Bayes’ Theorem.
As we read more and more about Bayes’ Theorem, we started noting our own questions as
they arose and began narrating the answers as they became more clear. The result is this
strange book, cast as a series of questions and answers between reader and author. In this
prose, we make heavy use of online resources such as the Oxford Dictionary of Statistics
(Upton and Cook, 2014), Wolfram Mathematics, and the Online Statistics Education: An
Interactive Multimedia Course for Study (Rice University, University of Houston Clear
Lake, and Tufts University). We also provide friendly links to online encyclopedias such
as Wikipedia and Encyclopedia Britannica. Although these should not be considered
definitive, original works, we have included the links to provide readers with a readily
accessible source of information and are grateful to the many authors who have contributed
entries.
We are not experts in Bayesian statistics and make no claim as such. Therese Donovan is a
biologist for the U.S. Geological Survey Vermont Cooperative Fish and Wildlife Research
Unit, and Ruth Mickey is a statistician in the Department of Mathematics and Statistics at
the University of Vermont. We were raised on a healthy dose of “frequentist” and maximum
likelihood methods but have begun only recently to explore Bayesian methods. We
have intentionally avoided controversial topics and comparisons between Bayesian and
frequentist approaches and encourage the reader to dig deeper—much deeper—than we
have here. Fortunately, a great number of experts have paved the way, and we relied heavily
on the following books while writing our own:
• N. T. Hobbs and M. B. Hooten. Bayesian Models: A Statistical Primer for Ecologists.
Princeton University Press, 2015.
• J. Kruschke. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan.
Elsevier, 2015.
• A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman &
Hall, 2004.
• J. V. Stone. Bayes’ Rule: a Tutorial Introduction to Bayesian Analysis. Sebtel Press, 2014.
• H. Raiffa and R. Schlaifer. Applied Statistical Decision Theory. Division of Research,
Graduate School of Business Administration, Harvard University, 1961.
• P. Goodwin and G. Wright. Decision Analysis for Management Judgment. John Wiley &
Sons, 2014.
Although we relied on these sources, any mistakes of interpretation are our own.
Our hope is that Bayesian Statistics for Beginners is a “quick read” for the uninitiated and
that, in one week or less, we could find a reader happily ensconced in a book written by one
of the experts. Our goal in writing the book was to keep Bayes’ Theorem front and center in
each chapter for a beginning audience. As a result, Bayes’ Theorem makes an appearance in
every chapter. We frequently bring back past examples and explain what we did “back
then,” allowing the reader to slowly broaden their understanding and sort out what has
been learned in order to relate it to new material. For the most part, our reviewers liked this
approach. However, if this is annoying to you, you can skim over the repeated portions.
If this book is useful to you, it is due in no small part to a team of stellar reviewers. We owe
a great deal of gratitude to George Allez, Cathleen Balantic, Barry Hall, Mevin Hooten, Peter
Jones, Clint Moore, Ben Staton, Sheila Weaver, and Robin White. Their enthusiasm,
questions, and comments have improved the narrative immensely. We offer a heartfelt
thank you to Gary Bishop, Renee Westland, Melissa Murphy, Kevin Roark, John Bell, and
Stuart Geman for providing pictures for this book.
Therese Donovan
Ruth Mickey
October 2018
Burlington, VT
Contents
1 Introduction to Probability
2 Joint, Marginal, and Conditional Probability
3 Bayes’ Theorem
4 Bayesian Inference
5 The Author Problem: Bayesian Inference with Two Hypotheses
6 The Birthday Problem: Bayesian Inference with Multiple Discrete Hypotheses
13 The Shark Attack Problem Revisited: MCMC with the Metropolis Algorithm
14 MCMC Diagnostic Approaches
15 The White House Problem Revisited: MCMC with the Metropolis–Hastings Algorithm
16 The Maple Syrup Problem Revisited: MCMC with Gibbs Sampling
SECTION 6 Applications
Appendices
Bibliography
Hyperlinks
Name Index
Subject Index
SECTION 1
Basics of Probability
Overview
And so we begin. This first section deals with basic concepts in probability theory, and
consists of two chapters.
CHAPTER 1
Introduction to Probability
In this chapter, we’ll introduce some basic terms used in the study of probability. By the end
of this chapter, you will be able to define the following:
• Sample space
• Outcome
• Discrete outcome
• Event
• Probability
• Probability distribution
• Uniform distribution
• Trial
• Empirical distribution
• Law of Large Numbers
To begin, let’s answer a few questions . . .
What is probability?
Answer: The best way to introduce probability is to discuss an example. Imagine you’re a
gambler, and you can win $1,000,000 if a single roll of a die turns up four. You get only one
roll, and the entry fee to play this game is $10,000. If you win, you’re a millionaire. If you
lose, you’re out ten grand.
Bayesian Statistics for Beginners: A Step-by-Step Approach. Therese M. Donovan and Ruth M. Mickey,
Oxford University Press (2019). © Ruth M. Mickey 2019.
DOI: 10.1093/oso/9780198841296.001.0001
Let’s call the number of possible outcomes N, so N = 6. These outcomes are discrete,
because each result can take on only one of these values. Formally, the word “discrete” is
defined as “individually separate and distinct.” In addition, in this example, there is a
finite number of possible outcomes, which means “there are limits or bounds.” In other
words, the number of possible outcomes is not infinite.
If we believe that each and every outcome is just as likely to result as every other outcome
(i.e., the die is fair), then the probability of rolling a four is 1/N or 1/6. We can then say, “In
6 rolls of the die, we would expect 1 roll to result in a four,” and write that as Pr(four) = 1/6.
Here, the notation Pr means “Probability,” and we will use this notation throughout
this book.
How can we get a good estimate of Pr(four) for this particular die?
Table 1.1
In this table, the column called Frequency gives the number of times each outcome was
observed. The column called Probability is the frequency divided by the sum of the
frequencies over all possible outcomes, a proportion. The probability of an event of four is
the frequency of the observed number of occurrences of four (which is 0) divided by the
total throws (which is 1). We can write this as:
$$\Pr(\text{four}) = \frac{|\text{number of fours}|}{|\text{total trials}|} = \frac{0}{1} = 0. \tag{1.1}$$
This can be read as, “The probability of a four is the number of “four” events divided by the
number of total trials.” Here, the vertical bars indicate a number rather than an absolute
value. This is a frequentist notion of probability, because we estimate Pr(four) by asking
“How frequently did we observe the outcome that interests us out of the total?”
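The frequentist calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `estimate_probability` and the example rolls are our own assumptions, not data from Table 1.1.

```python
from collections import Counter


def estimate_probability(rolls, outcome):
    """Frequentist estimate: occurrences of `outcome` divided by total trials."""
    if not rolls:
        raise ValueError("need at least one trial")
    counts = Counter(rolls)          # frequency of each observed outcome
    return counts[outcome] / len(rolls)


# Hypothetical single trial that did not come up four (the value 3 is made up).
one_roll = [3]
print(estimate_probability(one_roll, 4))   # 0 fours out of 1 trial

# Hypothetical set of ten rolls containing two fours.
ten_rolls = [4, 1, 6, 2, 4, 3, 5, 1, 6, 2]
print(estimate_probability(ten_rolls, 4))  # 2 fours out of 10 trials
```

With a single trial the estimate can only be 0 or 1, which is why one roll tells us so little.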
[Figure: bar chart of Probability (vertical axis, 0.0 to 1.0) by Outcome (One through Six).]
Answer: No.
By now, you should realize that one roll will never give us a good estimate of Pr(four).
(Sometimes, though, that’s all we have . . . we have only one Planet Earth, for example).
Next, you roll the die 9 more times, and summarize the results of the 10 total rolls (i.e., 10
trials or 10 experiments) in Table 1.2. The number of fours is 2, which allows us to estimate
Pr(four) as 2/10, or 0.20 (see Figure 1.2).
Table 1.2
[Figure 1.2: bar chart of Probability (vertical axis, 0.0 to 1.0) by Outcome (One through Six).]
What can you conclude from these results? The estimate of Pr(four) = 0.2 seems to
indicate that the die may be in your favor! (Remember, you expect Pr(four) = 0.1667 if
the die were fair.) But $10,000 is a lot of money, and you decide that you should keep test
rolling until the gamemaster shouts “Enough!” Amazingly, you are able to squeeze in 500
rolls, and you obtain the results shown in Table 1.3.
Table 1.3
The plot of the frequency results in Figure 1.3 is called a frequency histogram. Notice
that frequency, not probability, is on the y-axis. We see that a four was rolled 41 times.
Notice also that the sum of the frequencies is 500. The frequency distribution is an example
of an empirical distribution: It is constructed from raw data.
[Figure 1.3: frequency histogram of Frequency (vertical axis, 0 to 100) by Outcome (One through Six).]
We can now estimate Pr(four) as 41/500 = 0.082. We can calculate the probability estimates
for the other outcomes as well and then plot them as the probability distribution in Figure 1.4.
[Figure 1.4: bar chart of Probability (vertical axis, 0.0 to 1.0) by Outcome (One through Six).]
yields an estimate that is closer and closer to the true probability as the number of trials
increases. In other words, your estimate of Pr(four) gets closer and closer to or approaches
the true probability when you use more trials (rolls) in your calculations.
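A small simulation makes this convergence visible. The sketch below assumes a simulated fair die (so the true Pr(four) is 1/6); it does not reproduce the die or the counts in Tables 1.1 to 1.3, and the function name and seed are illustrative choices.

```python
import random


def simulate_estimates(n_rolls, seed=0, target=4):
    """Roll a simulated fair die n_rolls times; return the running estimates of Pr(target)."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    hits = 0
    estimates = []
    for trial in range(1, n_rolls + 1):
        if rng.randint(1, 6) == target:
            hits += 1
        estimates.append(hits / trial)   # estimate after `trial` rolls
    return estimates


estimates = simulate_estimates(100_000)
for n in (10, 500, 100_000):
    print(n, round(estimates[n - 1], 4))
```

Early estimates bounce around, while the estimate after many rolls settles near 1/6 ≈ 0.167.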
Table 1.4 lists the six possible outcomes, and the probability of each event (1/6 = 0.167).
Notice that the sum of the probabilities across the events is 1.0.
Table 1.4
Outcome Probability
One 0.167
Two 0.167
Three 0.167
Four 0.167
Five 0.167
Six 0.167
Sum 1
Figure 1.5 shows exactly the same information as Table 1.4; both are examples of a
probability distribution. On the horizontal axis, we list each of the possible outcomes. On
the vertical axis is the probability. The height of each bar provides the probability of
observing each outcome. Since each outcome has an equal chance of being rolled, the
heights of the bars are all the same and show as 0.167, which is 1/N. Note that this is not an
empirical distribution, because we did not generate it from an experiment. Rather, it was
based on the assumption that all outcomes are equally likely.
[Figure 1.5: bar chart of Probability (vertical axis, 0.0 to 1.0) by Outcome (One through Six).]
How would you change the table and probability distribution if the
die were loaded in favor of a four?
Table 1.5
Outcome Probability
One 0.12
Two 0.12
Three 0.12
Four 0.4
Five 0.12
Six 0.12
Sum 1
The probabilities listed in the table sum to 1.0 just as before, as do the heights of the
corresponding blue bars in Figure 1.6.
[Figure 1.6: bar chart of Probability (vertical axis, 0.0 to 1.0) by Outcome (One through Six).]
Answer: The bet is that if you roll a four, you win $1,000,000, and if you don’t roll a
four, you lose $10,000. It would be useful to group our 6 possible outcomes into one of
two events. As the Oxford Dictionary of Statistics explains, “An event is a particular
collection of outcomes, and is a subset of the sample space.” Probabilities are assigned
to events.
Our two events are E1 = {four} and E2 = {one, two, three, five, six}. The brackets { }
indicate the set of outcomes that belong in each event. The first event contains one
outcome, while the second event consists of five possible outcomes. Thus, in probability
theory, outcomes can be grouped into new events at will. We started out by considering six
outcomes, and now we have collapsed those into two events.
Now we assign a probability to each event. We know that Pr(four) = 0.4. What is the
probability of NOT rolling a four? That is the probability of rolling one OR two OR three OR
five OR six. Note that these events cannot occur simultaneously. Thus, we can write
Pr(∼four) as the SUM of the probabilities of events one, two, three, five, and six (which is
0.12 + 0.12 + 0.12 + 0.12 + 0.12 = 0.6). Incidentally, the ∼ sign means “complement of.” If
A is an event, ∼A is its complement (i.e., everything but A). This is sometimes written as A^c.
The word OR is a tip that you ADD the individual probabilities together to get your answer
as long as the events are mutually exclusive (i.e., cannot occur at the same time).
This is an example of a fundamental rule in probability theory: if two or more events are
mutually exclusive, then the probability of any occurring is the sum of the probabilities of
each occurring. Because the different outcomes of each roll (i.e., rolling a one, two, three,
five, or six) are mutually exclusive, the probability of getting any outcome other than four is
the sum of the probability of each one occurring (see Table 1.6).
Table 1.6
Event Probability
Four 0.4
Not Four 0.6
Sum 1
Note that the probabilities of these two possible events sum to 1.0. Because of that, if we
know that Pr(four) is 0.4, we can quickly compute Pr(∼four) as 1 − 0.4 = 0.6 and save a
few mental calculations.
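Both routes to the probability of not rolling a four (adding up the mutually exclusive outcomes, or taking the complement) can be checked with a short sketch. The probabilities are copied from Table 1.5; the dictionary layout and the helper name `pr_event` are our own illustrative choices.

```python
import math

# Probabilities for the loaded die, copied from Table 1.5.
loaded_die = {"one": 0.12, "two": 0.12, "three": 0.12,
              "four": 0.4, "five": 0.12, "six": 0.12}


def pr_event(distribution, outcomes):
    """Probability of an event: the sum over its mutually exclusive outcomes."""
    return sum(distribution[o] for o in outcomes)


e1 = {"four"}                 # event E1
e2 = set(loaded_die) - e1     # event E2, the complement of E1

pr_four = pr_event(loaded_die, e1)
pr_not_four_by_sum = pr_event(loaded_die, e2)     # 0.12 added five times
pr_not_four_by_complement = 1 - pr_four           # 1 minus Pr(four)

print(round(pr_not_four_by_sum, 2), round(pr_not_four_by_complement, 2))
```

The results are rounded before printing because floating-point addition can leave tiny rounding errors in the last digits.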
The probability distribution looks like the one shown in Figure 1.7:
[Figure 1.7: bar chart of Probability (vertical axis, 0.0 to 1.0) by Event (Four, Not Four).]
Remember: The SUM of the probabilities across all the different outcomes MUST EQUAL
1! In the discrete probability distributions above, this means that the heights of the bars
summed across all discrete outcomes (i.e., all bars) totals 1.
Do you still want to play? At the end of the book, we will introduce decision trees, an
analytical framework that employs Bayes’ Theorem to aid in decision-making. But that
chapter is a long way away.
Answer: We’re a few chapters away from hitting on this very important topic. You’ll see
that Bayesians think of probability in a way that allows the testing of theories and hypotheses.
But you have to walk before you run. What you need now is to continue learning the
basic vocabulary associated with probability theory.
What’s next?
Answer: In Chapter 2, we’ll expand our discussion of probability. See you there.
CHAPTER 2
Joint, Marginal, and Conditional Probability
Now that you’ve had a short introduction to probability, it’s time to build on our
probability vocabulary. By the end of this chapter, you will understand the
following terms:
• Venn diagram
• Marginal probability
• Joint probability
• Independent events
• Dependent events
• Conditional probability
Let’s start with a few questions.
A gala that celebrates the sense of vision? Nope. The eyeball event in this chapter
refers to whether a person is right-eyed dominant or left-eyed dominant. You already
know if you are left- or right-handed, but did you know that you are also left- or right-
eyed? Here’s how to tell (http://www.wikihow.com/Determine-Your-Dominant-Eye; see
Figure 2.1):
1. Stretch your arms out in front of you and create a hole with your hands by joining your
finger tips to make a triangular opening, as shown.
2. Find a small object nearby and align your hands with it so that you can see it in the
triangular hole. Make sure you are looking straight at the object through your hands—
cocking your head to either side, even slightly, can affect your results. Be sure to keep
both eyes open!
3. Slowly move your hands toward your face to draw your viewing window toward you. As
you do so, keep your head perfectly still, but keep the object lined up in the hole between
your hands. Don’t lose sight of it.
4. Draw your hands in until they touch your face—your hands should end up in front of
your dominant eye. For example, if you find that your hands end up so you are looking
through with your right eye, that eye is dominant.
The eyeball characteristic has two discrete outcomes: lefty (for left-eyed dominant people)
or righty (for right-eyed dominant people). Because there are only two outcomes, we can
call them events if we want.
Let us suppose that you ask 100 people if they are “lefties” or “righties.” In this case,
the 100 people represent our “universe” of interest, which we designate with the letter
U. The total number of elements (individuals) in U is written |U|. (Once again, the
vertical bars here simply indicate that U is a number; it doesn’t mean the absolute value
of U.)
Here, there are only two possible events: “lefty” and “righty.” Together, they make up a
set of possible outcomes. Let A be the event “left-eye dominant,” and ∼A be the event
“right-eye dominant.” Here, the tilde means “complement of,” and here it can be interpreted
as “everything but A.” Notice that these two events are mutually exclusive: you
cannot be both a “lefty” and a “righty.” The events are also “exhaustive” because you must
be either a lefty or righty.
Suppose that 70 of 100 people are lefties. These people are a subset of the
larger population. The number of people in event A can be written |A|, and in this
example |A| = 70. Note that |A| must be less than or equal to |U|, which is 100.
Remember that we use the vertical bars here to highlight that we are talking about a
number.
Since there are only two possibilities for eye dominance type, this means that 100 − 70 = 30
people are righties. The number of people in event ∼A can be written |∼A|, and in this
example |∼A| = 30. Note that |∼A| must be less than or equal to |U|.
Our universe can be summarized as shown in Table 2.1.
Table 2.1
Event          Frequency
Lefty (A)      |A| = 70
Righty (∼A)    |∼A| = 30
Universe (U)   |U| = 100
We can illustrate this example in a diagrammatic form, as shown in Figure 2.2. Here, our
universe of 100 people is captured inside a box.
This is a Venn diagram, which shows A and ∼A. The universe U is 100 people and is
represented by the entire box. We then allocate those 100 individuals into A and ∼A. The blue
circle represents A; lefties stand inside this circle; righties stand outside the circle, but inside
the box. You can see that A consists of 70 elements, and ∼A consists of 30 elements.
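[Figure 2.2: Venn diagram. The box is the universe U; the circle Lefty (A) holds 70 people, and the 30 people in Righty (∼A) fall outside the circle.]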
Answer: Venn diagrams are named for John Venn (see Figure 2.3), who wrote his seminal
article in 1880.
According to the MacTutor History of Mathematics Archive, Venn’s son described him as
“of spare build, he was throughout his life a fine walker and mountain climber, a keen
botanist, and an excellent talker and linguist.”
Answer: We write this as Pr(A). Remember that Pr stands for probability, so Pr(A) means
the probability that a person is in group A and therefore is a lefty. We can determine the
probability that a person is in group A as:
$$\Pr(A) = \frac{|A|}{|U|} = \frac{70}{100} = 0.7. \tag{2.1}$$
Probability is determined as the number of persons in group A out of the total. The
probability that the randomly selected person is in group A is 0.7.
Answer: There are 30 of them (100 − 70 = 30), and they are righties.
$$\Pr({\sim}A) = \frac{|{\sim}A|}{|U|} = \frac{30}{100} = 0.3. \tag{2.2}$$
With only two outcomes for the eyeball event, our notation focuses on the probability
of being in a given group (A = lefties) and the probability of not being in the given group
(∼A = righties).
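Python's built-in sets mirror this notation closely. In the sketch below the people are labeled 0 to 99 and, purely as an assumption for illustration, the first 70 labels are treated as the lefties; the names `U`, `A`, `not_A`, and `pr` are ours.

```python
# Hypothetical universe: label the 100 people 0..99 and, for illustration only,
# let the first 70 labels be the lefties.
U = set(range(100))
A = set(range(70))     # event A: left-eye dominant
not_A = U - A          # complement of A: right-eye dominant


def pr(event, universe):
    """Probability of an event as |event| / |universe|."""
    return len(event) / len(universe)


print(len(A), len(not_A), len(U))   # the three frequencies in Table 2.1
print(pr(A, U), pr(not_A, U))       # Pr(A) and Pr(~A)
```

Here `len()` plays the role of the vertical bars, and set difference `U - A` plays the role of the tilde.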
Answer: Yes, of course. Let’s probe these same 100 people and find out other details about
their anatomy. Suppose we are curious about the presence or absence of Morton’s toe.
People with “Morton’s toe” have a large second metatarsal, longer in fact than the first
metatarsal (which is also known as the big toe or hallux toe). Wikipedia articles suggest that
this is a normal variation of foot shape in humans and that less than 20% of the human
population have this condition. Now we are considering a second characteristic for our
population, namely toe type.
Let’s let B designate the event “Morton’s toe.” Let the number of people with Morton’s
toe be written as |B|. Let ∼B designate the event “common toe.” Suppose 15 of the 100
people have Morton’s toe. This means |B| = 15, and |∼B| = 85. The data are shown in
Table 2.2, and the Venn diagram is shown in Figure 2.4.
Table 2.2
Event              Frequency
Morton’s toe (B)   |B| = 15
Common toe (∼B)    |∼B| = 85
Universe (U)       |U| = 100
These events can be represented in a Venn diagram, where a box holds our universe of
100 people.
Note the size of this red circle is smaller than the previous example because the number of
individuals with Morton’s toe is much smaller.
Figure 2.4
Answer: You bet. With two characteristics, each with two outcomes, we have four possible
combinations:
1. Lefty AND Morton’s toe, which we write as A ∩ B.
2. Lefty AND common toe, which we write as A ∩ ∼B.
3. Righty AND Morton’s toe, which we write as ∼A ∩ B.
4. Righty AND common toe, which we write as ∼A ∩ ∼B.
The symbol ∩ (an upside-down ∪) is the mathematical symbol for intersection. Here, you can read it as
“BOTH” or “AND.”
The number of individuals in A ∩ B can be written |A ∩ B|, where the bars indicate a
number (not absolute value). Let’s suppose we record the frequency of individuals in each
of the four combinations (see Table 2.3).
Table 2.3
blue) sums the number of people that are lefties and righties. The lower right (white)
quadrant gives the grand total, or |U|.
Now let’s plot the results in the same Venn diagram (see Figure 2.5).
[Figure 2.5: Venn diagram showing the Lefty (A) circle.]
The updated Venn diagram shows the blue circle with 70 lefties, the red circle with
15 people with Morton’s toe, and no overlap between the two. This means |A ∩ B| = 0,
|A| = |A ∩ ∼B| = 70, and |B| = |∼A ∩ B| = 15. By subtraction, we know the number of
individuals that are not in A OR B, written |∼A ∩ ∼B|, is 15 because we need to account for all 100
individuals somewhere in the diagram.
Answer: For our universe, no. If you are a person with Morton’s toe, you are standing in
the red circle and cannot also be standing in the blue circle. So these two events
(Morton’s toe and lefties) are mutually exclusive because they do not occur at the
same time.
Answer: You bet. All 70 lefties do not have Morton’s toe. These two events are non-
mutually exclusive.
Answer: In this case, we need to adjust the Venn diagram to show that five of the people
that are lefties also have Morton’s toe. These individuals are represented as the intersection
between the two events (see Figure 2.6). Note that the total number of individuals is still
100; we need to account for everyone!
[Figure 2.6: Venn diagram showing the Lefty (A) circle.]
We’ll run with this example for the rest of the chapter.
This Venn diagram is pretty accurate (except for the size of the box overall). There are
70 people in A, 15 people in B, and 5 people in A ∩ B. The blue circle contains 70 elements
in total, and 5 of those elements also occur in B. The red circle contains 15 elements (so is a
lot smaller than the blue circle), and 5 of these are also in A. So 5/15 (33%) of the red circle
overlaps with the blue circle, and 5/70 (7%) of the blue circle overlaps with the red circle.
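The overlap arithmetic can be reproduced with set intersection. As before, the numeric labels are hypothetical; they are chosen only so that the set sizes match the 70, 15, and 5 given in the text.

```python
U = set(range(100))       # universe of 100 people, labeled 0..99
A = set(range(70))        # 70 lefties (labels 0..69, hypothetical)
B = set(range(65, 80))    # 15 people with Morton's toe; labels 65..69 also lie in A

both = A & B              # intersection: lefties with Morton's toe

print(len(both))                        # |A ∩ B|
print(round(len(both) / len(B), 2))     # share of the red circle inside the blue one
print(round(len(both) / len(A), 2))     # share of the blue circle inside the red one
print(len(U - (A | B)))                 # people in neither circle
```

The last line accounts for everyone else: whoever is in neither circle is a righty with a common toe.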
Of the four events (A, ∼A, B, and ∼B), which are not
mutually exclusive?
Answer:
• A and B are not mutually exclusive (a lefty can have Morton’s toe).
• A and ∼B are not mutually exclusive (a lefty can have a common toe).
• ∼A and B are not mutually exclusive (a righty can have Morton’s toe).
• ∼A and ∼B are not mutually exclusive (a righty can have a common toe).
In Venn diagrams, if two events overlap, they are not mutually exclusive.
Answer: If we focus on each circle, A and ∼A are mutually exclusive (a person cannot be a
lefty and righty). B and ∼B are mutually exclusive (a person cannot have Morton’s toe and a
common toe). Apologies for the trick question!
If you were one of the lucky 100 people included in the universe,
where would you fall in this diagram?
Table 2.4
This is an important table to study. Once again, notice that there are four quadrants or
sections in this table. In the upper left quadrant, the first two columns represent the two
possible events for eyeball dominance: lefty and righty. The first two rows represent the two
possible events for toe type: Morton’s toe and common toe.
The upper left entry indicates that 5 people are members of both A and B.
• Look for the entry that indicates that 20 people are ∼A ∩ ∼B, that is, both not A and not B.
• Look for the entry that indicates that 65 people are A ∩ ∼B.
• Look for the entry that indicates that 10 people are ∼A ∩ B.
The lower left and upper right quadrants of our table are called the margins of the table.
They are shaded light blue. Note that the total number of individuals in A (regardless of B)
is 70, and the total number of individuals in ∼A is 30. The total number of individuals in B
(regardless of A) is 15 and the total number of individuals in ∼B is 85. Any way you slice it, the
grand total must equal 100 (the lower right quadrant).
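The four cell counts and the margins described above follow from just |U|, |A|, |B|, and |A ∩ B|. The sketch below shows one way to rebuild them; the function name `two_by_two` and the dictionary keys are illustrative assumptions, not notation from the book.

```python
def two_by_two(n_total, n_a, n_b, n_ab):
    """Return (cells, margins) of a 2x2 frequency table from |U|, |A|, |B|, |A ∩ B|."""
    cells = {
        ("A", "B"): n_ab,                                # lefty, Morton's toe
        ("not A", "B"): n_b - n_ab,                      # righty, Morton's toe
        ("A", "not B"): n_a - n_ab,                      # lefty, common toe
        ("not A", "not B"): n_total - n_a - n_b + n_ab,  # righty, common toe
    }
    if min(cells.values()) < 0:
        raise ValueError("inconsistent counts")
    margins = {"A": n_a, "not A": n_total - n_a,
               "B": n_b, "not B": n_total - n_b}
    return cells, margins


cells, margins = two_by_two(100, 70, 15, 5)
for key, count in cells.items():
    print(key, count)
print(margins)
```

The last cell uses inclusion-exclusion: subtracting |A| and |B| from the total removes the overlap twice, so it is added back once.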
Answer: Well, if you are interested in determining the probability that an individual
belongs to any of these four groups, you could use your universe of 100 individuals to do
the calculation. Do you remember the frequentist way to calculate probability? We
learned about that in Chapter 1.
Our total universe in this case is the 100 individuals. To get the probability that a person
selected at random would belong to a particular event, we simply divide the entire table
above by our total, which is 100, and we get the results shown in Table 2.5.
Table 2.5
We’ve just converted the raw numbers to probabilities by dividing the frequency table by
the grand total.
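Dividing every count by the grand total is a one-line operation in code. The counts below are the four cell frequencies stated in the text for Table 2.4; the dictionary layout is our own assumption.

```python
import math

# Cell frequencies as described for Table 2.4 (keys are illustrative labels).
counts = {
    ("A", "B"): 5,            # lefty, Morton's toe
    ("not A", "B"): 10,       # righty, Morton's toe
    ("A", "not B"): 65,       # lefty, common toe
    ("not A", "not B"): 20,   # righty, common toe
}

grand_total = sum(counts.values())

# Convert each frequency to a probability by dividing by the grand total.
joint = {cell: n / grand_total for cell, n in counts.items()}

for cell, p in joint.items():
    print(cell, p)
print(math.isclose(sum(joint.values()), 1.0))   # the probabilities sum to 1
```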
Note that this differs from our die rolling exercise in Chapter 1, where you were unsure
what the probability was and had to repeatedly roll a die to estimate it. By the Law of Large
Numbers, the more trials you have, the more you zero in on the actual probability. In this
case, however, we are given the number of people in each category, so the calculation is
straightforward. These 100 people are the only people of interest. They represent our
universe of interest; we are not using them to sample a larger group. If you didn’t know the
make-up of the universe, you could randomly select one person out of the universe over
and over again to get the probabilities, where all persons are equally likely to be selected.
Let’s walk through one calculation. Suppose we want to know the probability that an
individual is a lefty AND has Morton’s toe. The number of individuals in A and B is
written |A ∩ B| and the probability that an individual is a lefty with Morton’s toe is
written:
$$\Pr(A \cap B) = \frac{|A \cap B|}{|U|} = \frac{5}{100} = 0.05. \tag{2.4}$$
This is officially called the joint probability and is the upper left entry in our table. The
Oxford Dictionary of Statistics states that the “joint probability of a set of events is the
probability that all occur simultaneously.” Joint probabilities are also called conjoint
probabilities. Incidentally, a table that lists joint probabilities such as the one above is
sometimes referred to as a conjoint table.
When you hear the word joint, you should think of the word AND and realize that you
are considering (and quantifying) more than one characteristic of the population. In this
case, it indicates that someone is in A AND B. This is written as Pr(A ∩ B).
Answer: This is equivalent to asking, what is the joint probability that a person is
right-eye dominant AND has Morton’s toe? See if you can find this entry in Table 2.5. The
answer is 0.1.
In addition to the joint probabilities, the table also provides the marginal probabilities,
which look at the probability of A or ∼A (regardless of B) and the probability of B or ∼B
(regardless of A).
Answer: The word marginal in the dictionary is defined as “pertaining to the margins; or
situated on the border or edge.” In our table, the marginal probabilities are just the
probabilities for one characteristic of interest (e.g., A and ∼A) regardless of other
characteristics that might be listed in the table.
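One way to see "regardless of the other characteristic" in action is to sum the joint probabilities that share an event. The sketch below assumes the joint probabilities of our example; the dictionary keys and the helper name `marginal` are hypothetical.

```python
# Joint probabilities from the conjoint table (A = lefty, B = Morton's toe).
joint = {
    ("A", "B"): 0.05,
    ("not A", "B"): 0.10,
    ("A", "not B"): 0.65,
    ("not A", "not B"): 0.20,
}


def marginal(event):
    """Sum every joint probability whose key contains `event` exactly."""
    return sum(p for key, p in joint.items() if event in key)


for event in ("A", "not A", "B", "not B"):
    print(f"Pr({event}) = {marginal(event):.2f}")
```

Each marginal adds up two joint cells, which is why the marginals sit on the border of the table.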
Let’s now label each cell in our conjoint table by its probability type (see Table 2.6).
Table 2.6
Suppose you know only the following facts: the marginal probability of being a lefty is
0.7, the marginal probability of having Morton’s toe is 0.15, and the joint probability of
being a lefty with Morton’s toe is 0.05. Also suppose that you haven’t looked at
Table 2.6!
Take out some scratch paper and a pencil. You can do it! Here are some hints:
• the lower right hand quadrant must equal 1.00;
• for any given characteristic, the sum of the two marginal probabilities must equal 1.00.
Table 2.7
Answer: Because the marginal probabilities for eyeballs must sum to 1.00, and the
marginal probabilities for toes must sum to 1.00 (because they deal with mutually exclusive
events), we can fill in the missing marginal probabilities.
The marginal probability of a lefty, Pr(A), is 0.7, so the marginal of a righty, Pr(∼A), must
be 1.00 − 0.7 = 0.3.
The marginal probability of having Morton’s toe, Pr(B), is 0.15, so the marginal of Pr(∼B)
must be 1.00 − 0.15 = 0.85.
So far, so good. Once we know the marginals, we can calculate the joint probabilities in
the upper left quadrant (see Table 2.8). For example:
• if the marginal Pr(A) = 0.7, then we know that Pr(A ∩ ∼B) = 0.7 − 0.05 = 0.65;
• if the marginal Pr(B) = 0.15, then we know that Pr(∼A ∩ B) = 0.15 − 0.05 = 0.1;
• if the marginal Pr(∼A) = 0.3, then we know that Pr(∼A ∩ ∼B) = 0.3 − 0.1 = 0.2.
Table 2.8
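The fill-in steps can be retraced in a few lines of Python. This is a sketch under the assumption that only the three stated facts are known; all variable names are illustrative.

```python
# The three known facts (A = lefty, B = Morton's toe).
pr_a = 0.70        # marginal probability of a lefty
pr_b = 0.15        # marginal probability of Morton's toe
pr_a_and_b = 0.05  # joint probability of a lefty with Morton's toe

# Marginals of the complements: each pair of marginals sums to 1.00.
pr_not_a = 1.00 - pr_a
pr_not_b = 1.00 - pr_b

# Remaining joint cells: each marginal is the sum of its two joint cells.
pr_a_and_not_b = pr_a - pr_a_and_b
pr_not_a_and_b = pr_b - pr_a_and_b
pr_not_a_and_not_b = pr_not_a - pr_not_a_and_b

rows = [
    ("Morton's toe", pr_a_and_b, pr_not_a_and_b, pr_b),
    ("common toe", pr_a_and_not_b, pr_not_a_and_not_b, pr_not_b),
    ("sum", pr_a, pr_not_a, 1.00),
]
print(f"{'':14}{'lefty':>8}{'righty':>8}{'sum':>8}")
for label, left, right, total in rows:
    print(f"{label:14}{left:8.2f}{right:8.2f}{total:8.2f}")
```

The order matters: the complements' marginals come first, and each joint cell is then found by subtracting a known cell from its marginal.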
Answer: Don’t cheat now . . . try to express Pr(B) as the sum of joint probabilities before
reading on! This step is essential for understanding Bayesian inference in future chapters!
How did you do?
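Hint 1: We can decompose the total, 0.15, into its two pieces: 0.05 + 0.1.
The probability that a lefty has Morton’s toe can be written Pr(A ∩ B), and the probability
that a righty has Morton’s toe can be written Pr(∼A ∩ B).
If we put these two terms together, we can express the marginal probability of having
Morton’s toe as:
$$\Pr(B) = \Pr(A \cap B) + \Pr({\sim}A \cap B) = 0.05 + 0.1 = 0.15.$$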
Can we look at this problem from the Venn diagram perspective again?
Figure 2.7 Venn diagram of Lefty and Morton’s toe: 65 lefties only, 5 in the overlap, and 10 with Morton’s toe only.
Answer: To answer this question, we must introduce the very important concept of
conditional probability.
Answer: Conditional probability is the probability of an event given that another event
has occurred.
Conditional probability is written as:
• Pr(A|B), which is read “the probability of A, given that B occurs”; in our context, Pr(A|B)
is Pr(lefty | Morton’s toe);
• Pr(A|∼B), which is read “the probability of A, given that ∼B occurs”; in our context,
Pr(A|∼B) is Pr(lefty | common toe);
• Pr(B|∼A), which is read “the probability of B, given that ∼A occurs”; in our context,
Pr(B|∼A) is Pr(Morton’s toe | righty);
• etc.
The vertical bar means “given.”
Answer: You use the following equation, which is a standard equation in probability
theory:
$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}. \tag{2.11}$$
It’s essential that you understand conditional probability, so let’s look at this equation from
a few different angles and, in the words of Kalid Azad, “let’s build some intuition” about
what it means.
Angle 1: The Venn diagram zoom
We already know that the numerator
$$\Pr(A \cap B) \tag{2.12}$$
is the intersection in the Venn diagram where A and B overlap (the probability of a lefty and
Morton’s toe). This can be written as
$$\Pr(B \cap A) \tag{2.13}$$
as well. The intersection of A and B is the intersection, no matter how you write it:
$$\Pr(A \cap B) = \Pr(B \cap A). \tag{2.14}$$
And we know that the denominator Pr(B) is the probability of Morton’s toe.
In the Venn diagram, we can focus on the area of B and then look to see what fraction of
the total B is occupied by A. In this example, we restrict our attention to the 15 people with
Morton’s toe, and note that 5 of them are lefties. Therefore, about 5/15 or 1/3 of the red
circle is overlapped by the blue circle.
Figure 2.8 Zoomed view of the Morton’s toe circle (B): A ∩ B contains 5 individuals and ∼A ∩ B contains 10.
For the numbers given, we can see that Pr(A | B) = 5/15 = 1/3 = 0.333 = 33.3%. A general
rule can help with the visualization: zoom to the denominator space B, then determine
what fraction of this space is occupied by A. Similarly, Pr(∼A | B) = 10/15 = 2/3 = 0.667 =
66.7%. Note that these probabilities sum to 1.
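The "zoom" rule translates directly into counting. The following sketch uses Python's `fractions.Fraction` so that the results stay exact; the variable names are assumptions made for the example.

```python
from fractions import Fraction

# Zoom to B: only the 15 people with Morton's toe are in view.
n_a_and_b = 5       # lefties with Morton's toe
n_not_a_and_b = 10  # righties with Morton's toe
n_b = n_a_and_b + n_not_a_and_b

# Fraction of the zoomed-in space occupied by A, and by not-A.
pr_a_given_b = Fraction(n_a_and_b, n_b)
pr_not_a_given_b = Fraction(n_not_a_and_b, n_b)

print(pr_a_given_b, pr_not_a_given_b)   # prints: 1/3 2/3
print(pr_a_given_b + pr_not_a_given_b)  # prints: 1
```

Once we zoom to B, everyone in view is either a lefty or a righty, so the two conditional probabilities must sum to 1.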
Angle 2: The table approach
We can also tackle this problem using the raw data (see Table 2.9).
Table 2.9
$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}. \tag{2.15}$$
$$\Pr(A \cap B) = \frac{|A \cap B|}{|U|} = \frac{5}{100}. \tag{2.16}$$
$$\Pr(B) = \frac{|B|}{|U|} = \frac{15}{100}. \tag{2.17}$$
Answer: If you have Morton’s toe, the probability of being a lefty is 0.33. If you don’t have
Morton’s toe, the probability of being a lefty is 65/85 = 0.76 (you can confirm this too).
If Morton’s toe does not matter, these conditional probabilities should be equal to the
marginal probability, which is 0.7. This is clearly not the case here.
Answer: In other words, is the probability of a lefty, given Morton’s toe, the same thing as
the probability of Morton’s toe, given a lefty? Let’s try it!
$$\Pr(A \mid B) = \frac{|A \cap B| / |U|}{|B| / |U|} = \frac{|A \cap B|}{|B|} = \frac{5}{15} = 0.333 \tag{2.19}$$
$$\Pr(B \mid A) = \frac{|A \cap B| / |U|}{|A| / |U|} = \frac{|A \cap B|}{|A|} = \frac{5}{70} = 0.071. \tag{2.20}$$
So the answer is No! These two probabilities are very different things. The first asks what is
the probability of A given that event B happens (with a result of 0.333), while the second
asks what is the probability of B given that A happens (with a result of 0.071).
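To see the two directions side by side, here is a short sketch. The helper `conditional` is a hypothetical name introduced only for this illustration; it divides the shared count by the size of the group we zoom to.

```python
def conditional(n_joint, n_given):
    """Pr(X | Y) from counts: the fraction of the Y group that is also in X."""
    if n_given == 0:
        raise ValueError("cannot condition on an empty group")
    return n_joint / n_given


n_a_and_b = 5  # lefties with Morton's toe
n_a = 70       # all lefties
n_b = 15       # everyone with Morton's toe

pr_a_given_b = conditional(n_a_and_b, n_b)  # zoom to B, look for A
pr_b_given_a = conditional(n_a_and_b, n_a)  # zoom to A, look for B

print(f"Pr(A | B) = {pr_a_given_b:.3f}")
print(f"Pr(B | A) = {pr_b_given_a:.3f}")
```

The numerator is the same in both cases; only the denominator (the group we zoom to) changes, and that is what makes the two results differ.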
Answer: Yes . . . see if you can find it before looking at Table 2.10!
Table 2.10
Remember, when dealing with conditional probabilities, the key word is “zoom.” Let’s
start with Pr(A | B):
$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}. \tag{2.21}$$
If B happens, we zoom to row 1 (Morton’s toe), and then ask what fraction of the people
with Morton’s toe are lefties:
$$\Pr(A \mid B) = \frac{0.05}{0.15} = 0.333. \tag{2.22}$$
Answer: If A happens, we zoom to the first column (lefties) and then ask what fraction of
the lefties have Morton’s toe:
$$\Pr(B \mid A) = \frac{0.05}{0.7} = 0.071. \tag{2.23}$$
$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}. \tag{2.24}$$
You can rearrange this to your heart’s content. For this book, the most important
rearrangement is:
$$\Pr(A \cap B) = \Pr(A \mid B)\,\Pr(B).$$
This formula can be used to calculate joint probability, Pr(A ∩ B). Take some time to make
sure this equation sinks in and makes full sense to you.
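As a quick worked check with the numbers from our example, multiplying a conditional probability by the probability of the event we zoomed to should recover the joint probabilities in the Morton’s toe row of the table:

$$
\begin{aligned}
\Pr(A \cap B) &= \Pr(A \mid B)\,\Pr(B) = \tfrac{1}{3} \times 0.15 = 0.05,\\
\Pr({\sim}A \cap B) &= \Pr({\sim}A \mid B)\,\Pr(B) = \tfrac{2}{3} \times 0.15 = 0.10.
\end{aligned}
$$

Both values match the joint probabilities found earlier, and together they sum to the marginal Pr(B) = 0.15.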
As an aside, if the occurrence of one event does not change the probability of the
other occurring, the two events are said to be independent. This means that
Pr(A | B) = Pr(A | ∼B) = Pr(A).
So, when A and B are independent:
$$\Pr(A \cap B) = \Pr(A)\,\Pr(B).$$
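To check independence numerically for our example, a small Python sketch can compare each conditional probability with the marginal Pr(A), and the joint probability with the product Pr(A) × Pr(B). The function name `looks_independent` and the tolerance are assumptions made for this illustration.

```python
import math

# Probabilities from our example (A = lefty, B = Morton's toe).
pr_a = 0.70
pr_b = 0.15
pr_a_and_b = 0.05
pr_a_and_not_b = 0.65

pr_a_given_b = pr_a_and_b / pr_b                  # zoom to B
pr_a_given_not_b = pr_a_and_not_b / (1.0 - pr_b)  # zoom to not-B


def looks_independent(tol=1e-9):
    """True if both conditionals match Pr(A) and the joint matches the product."""
    return (
        math.isclose(pr_a_given_b, pr_a, abs_tol=tol)
        and math.isclose(pr_a_given_not_b, pr_a, abs_tol=tol)
        and math.isclose(pr_a_and_b, pr_a * pr_b, abs_tol=tol)
    )


print(f"Pr(A | B)     = {pr_a_given_b:.3f}")
print(f"Pr(A | not B) = {pr_a_given_not_b:.3f}")
print(f"Pr(A)         = {pr_a:.3f}")
print(f"Pr(A) * Pr(B) = {pr_a * pr_b:.3f} versus Pr(A and B) = {pr_a_and_b:.3f}")
print("independent?", looks_independent())
```

For these numbers the check fails: the product of the marginals is 0.105, while the joint probability is 0.05, so eye dominance and Morton's toe are not independent in this universe.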
Answer: That, dear reader, is the subject of our next chapter, where we will derive Bayes’
Theorem. See you there!
SECTION 2
Overview
• In Chapter 3, Bayes’ Theorem is introduced. The chapter shows its derivation and
describes two ways to think about it. First, Bayes’ Theorem describes the relationship
between two inverse conditional probabilities, P(A|B) and P(B|A). Second, Bayes’
Theorem can be used to express how a degree of belief for a given hypothesis can be updated
in light of new evidence. This chapter focuses on the first interpretation.
• Chapter 4 introduces the concept of Bayesian inference. The chapter discusses the
scientific method, and illustrates how Bayes’ Theorem can be used for scientific inference.
Bayesian Inference is the use of Bayes’ Theorem to draw conclusions about a set of
mutually exclusive and exhaustive alternative hypotheses by linking prior knowledge
about each hypothesis with new data. The result is updated probabilities for each
hypothesis of interest. The ideas of prior probabilities, likelihood, and posterior probabilities
are introduced.
• Chapter 5, the “Author Problem,” provides a concrete example of Bayesian inference.
This chapter draws on work by Frederick Mosteller and David Wallace, who used Bayesian
inference to assign authorship for unsigned Federalist Papers. The Federalist Papers were a
collection of papers known to be written during the American Revolution. However,
some papers were unsigned by the author, resulting in disputed authorship. The chapter
provides a very basic Bayesian analysis of the unsigned “Paper 54,” which was written by
Alexander Hamilton or James Madison. The example illustrates the principles of Bayesian
inference for two competing hypotheses.
• Chapter 6, the “Birthday Problem,” is intended to highlight the decisions the analyst
(you!) must make in setting the prior distribution. The “Birthday Problem” expands
consideration from two hypotheses to multiple, discrete hypotheses. In this chapter,
interest is in determining the posterior probability that a woman named Mary was born
By the end of this section, you will have a good understanding of how Bayes’ Theorem is
related to the scientific method.
CHAPTER 3
Bayes’ Theorem
In this chapter, we’re going to build on the content in Section 1 and derive Bayes’ Theorem.
This is what you’ve been waiting for!
By the end of this chapter, you will be able to derive Bayes’ Theorem and explain the
relationship between Pr(A | B) and Pr(B | A).
Let’s begin with a few questions.
Answer: It could be, but nobody is really sure! We’ll revisit this question in a future chapter.
Bayesian Statistics for Beginners: A Step-by-Step Approach. Therese M. Donovan and Ruth M. Mickey,
Oxford University Press (2019). © Ruth M. Mickey 2019.
DOI: 10.1093/oso/9780198841296.001.0001