OUP CORRECTED PROOF – FINAL, 6/5/2019, SPi


Bayesian Statistics
for Beginners
A Step-by-Step Approach

THERESE M. DONOVAN
RUTH M. MICKEY

Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Ruth M. Mickey 2019
The moral rights of the author have been asserted
First Edition published in 2019
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2019934655
ISBN 978–0–19–884129–6 (hbk.)
ISBN 978–0–19–884130–2 (pbk.)
DOI: 10.1093/oso/9780198841296.001.0001
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.

To our parents, Thomas and Earline Donovan and Ray and Jean Mickey,
for inspiring a love of learning.

To our mentors, some of whom we’ve met only by their written words,
for teaching us ways of knowing.

To Peter, Evan, and Ana—for everything.



Preface

Greetings. This book is our attempt at gaining membership to the Bayesian Conspiracy.
You may ask, “What is the Bayesian Conspiracy?” The answer is provided by Eliezer
Yudkowsky (http://yudkowsky.net/rational/bayes): “The Bayesian Conspiracy is a
multinational, interdisciplinary, and shadowy group of scientists that controls publication,
grants, tenure, and the illicit traffic in grad students. The best way to be accepted into the
Bayesian Conspiracy is to join the Campus Crusade for Bayes in high school or college, and
gradually work your way up to the inner circles. It is rumored that at the upper levels of the
Bayesian Conspiracy exist nine silent figures known only as the Bayes Council.”
Ha ha! Bayes’ Theorem, also called Bayes’ Rule, was published posthumously in 1763 in
the Philosophical Transactions of the Royal Society. In The Theory That Would Not Die,
author Sharon Bertsch McGrayne aptly describes how “Bayes’ rule cracked the Enigma code,
hunted down Russian submarines, and emerged triumphant from two centuries of controversy.”
In short, Bayes’ Rule has vast application, and the number of papers and books that
employ it is growing exponentially.
Inspired by the knowledge that a Bayes Council actually exists, we began our journey by
enrolling in a 5-day ‘introductory’ workshop on Bayesian statistics a few years ago. On Day
1, we were introduced to a variety of Bayesian models, and, on Day 2, we sheepishly had to
inquire what Bayes’ Theorem was and what it had to do with MCMC. In other words, the
material was way over our heads. With tails between our legs, we slunk back home and
began trying to sort out the many different uses of Bayes’ Theorem.
As we read more and more about Bayes’ Theorem, we started noting our own questions as
they arose and began narrating the answers as they became more clear. The result is this
strange book, cast as a series of questions and answers between reader and author. In this
prose, we make heavy use of online resources such as the Oxford Dictionary of Statistics
(Upton and Cook, 2014), Wolfram Mathematics, and the Online Statistics Education: An
Interactive Multimedia Course for Study (Rice University, University of Houston Clear
Lake, and Tufts University). We also provide friendly links to online encyclopedias such
as Wikipedia and Encyclopedia Britannica. Although these should not be considered
definitive, original works, we have included the links to provide readers with a readily
accessible source of information and are grateful to the many authors who have contributed
entries.
We are not experts in Bayesian statistics and make no claim as such. Therese Donovan is a
biologist for the U.S. Geological Survey Vermont Cooperative Fish and Wildlife Research
Unit, and Ruth Mickey is a statistician in the Department of Mathematics and Statistics at
the University of Vermont. We were raised on a healthy dose of “frequentist” and maximum
likelihood methods but have begun only recently to explore Bayesian methods. We
have intentionally avoided controversial topics and comparisons between Bayesian and
frequentist approaches and encourage the reader to dig deeper—much deeper—than we
have here. Fortunately, a great number of experts have paved the way, and we relied heavily
on the following books while writing our own:
• N. T. Hobbs and M. B. Hooten. Bayesian Models: A Statistical Primer for Ecologists.
Princeton University Press, 2015.
• J. Kruschke. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan.
Elsevier, 2015.
• A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman &
Hall, 2004.
• J. V. Stone. Bayes’ Rule: a Tutorial Introduction to Bayesian Analysis. Sebtel Press, 2014.
• H. Raiffa and R. Schlaifer. Applied Statistical Decision Theory. Division of Research,
Graduate School of Business Administration, Harvard University, 1961.
• P. Goodwin and G. Wright. Decision Analysis for Management Judgment. John Wiley &
Sons, 2014.

Although we relied on these sources, any mistakes of interpretation are our own.
Our hope is that Bayesian Statistics for Beginners is a “quick read” for the uninitiated and
that, in one week or less, we could find a reader happily ensconced in a book written by one
of the experts. Our goal in writing the book was to keep Bayes’ Theorem front and center in
each chapter for a beginning audience. As a result, Bayes’ Theorem makes an appearance in
every chapter. We frequently bring back past examples and explain what we did “back
then,” allowing the reader to slowly broaden their understanding and sort out what has
been learned in order to relate it to new material. For the most part, our reviewers liked this
approach. However, if this is annoying to you, you can skim over the repeated portions.
If this book is useful to you, it is due in no small part to a team of stellar reviewers. We owe
a great deal of gratitude to George Allez, Cathleen Balantic, Barry Hall, Mevin Hooten, Peter
Jones, Clint Moore, Ben Staton, Sheila Weaver, and Robin White. Their enthusiasm,
questions, and comments have improved the narrative immensely. We offer a heartfelt
thank you to Gary Bishop, Renee Westland, Melissa Murphy, Kevin Roark, John Bell, and
Stuart Geman for providing pictures for this book.
Therese Donovan
Ruth Mickey
October 2018
Burlington, VT

Contents

SECTION 1 Basics of Probability

1 Introduction to Probability
2 Joint, Marginal, and Conditional Probability

SECTION 2 Bayes’ Theorem and Bayesian Inference

3 Bayes’ Theorem
4 Bayesian Inference
5 The Author Problem: Bayesian Inference with Two Hypotheses
6 The Birthday Problem: Bayesian Inference with Multiple Discrete Hypotheses
7 The Portrait Problem: Bayesian Inference with Joint Likelihood

SECTION 3 Probability Functions

8 Probability Mass Functions
9 Probability Density Functions

SECTION 4 Bayesian Conjugates

10 The White House Problem: The Beta-Binomial Conjugate
11 The Shark Attack Problem: The Gamma-Poisson Conjugate
12 The Maple Syrup Problem: The Normal-Normal Conjugate

SECTION 5 Markov Chain Monte Carlo

13 The Shark Attack Problem Revisited: MCMC with the Metropolis Algorithm
14 MCMC Diagnostic Approaches
15 The White House Problem Revisited: MCMC with the Metropolis–Hastings Algorithm
16 The Maple Syrup Problem Revisited: MCMC with Gibbs Sampling

SECTION 6 Applications

17 The Survivor Problem: Simple Linear Regression with MCMC
18 The Survivor Problem Continued: Introduction to Bayesian Model Selection
19 The Lorax Problem: Introduction to Bayesian Networks
20 The Once-ler Problem: Introduction to Decision Trees

Appendices

A.1 The Beta-Binomial Conjugate Solution
A.2 The Gamma-Poisson Conjugate Solution
A.3 The Normal-Normal Conjugate Solution
A.4 Conjugate Solutions for Simple Linear Regression
A.5 The Standardization of Regression Data

Bibliography
Hyperlinks
Name Index
Subject Index

SECTION 1

Basics of Probability

Overview

And so we begin. This first section deals with basic concepts in probability theory, and
consists of two chapters.

• In Chapter 1, the concept of probability is introduced. Using an example, the chapter
focuses on a single characteristic and introduces basic vocabulary associated with
probability.
• Chapter 2 introduces additional terms and concepts used in the study of probability. The
chapter focuses on two characteristics observed at the same time, and introduces the
important concepts of joint probability, marginal probability, and conditional
probability.
After covering these basics, your Bayesian journey will begin.

CHAPTER 1

Introduction to Probability

In this chapter, we’ll introduce some basic terms used in the study of probability. By the end
of this chapter, you will be able to define the following:
• Sample space
• Outcome
• Discrete outcome
• Event
• Probability
• Probability distribution
• Uniform distribution
• Trial
• Empirical distribution
• Law of Large Numbers
To begin, let’s answer a few questions . . .

What is probability?

Answer: The best way to introduce probability is to discuss an example. Imagine you’re a
gambler, and you can win $1,000,000 if a single roll of a die turns up four. You get only one
roll, and the entry fee to play this game is $10,000. If you win, you’re a millionaire. If you
lose, you’re out ten grand.

Should you play?

Answer: It’s up to you!


If the roll always comes up a four, you should play! If it never comes up four, you’d be
foolish to play! Thus, it’s helpful to know something about the die. Is it fair? That is, is each
face equally likely to turn up? How likely are you to roll a four?
This type of question was considered by premier mathematicians Gerolamo Cardano
(1501–1576), Pierre de Fermat (1601–1665), Blaise Pascal (1623–1662), and others. These
brilliant minds created a branch of mathematics known as probability theory.
The rolling of a die is an example of a random process: the face that comes up is subject
to chance. In probability, our goal is to quantify a random process, such as rolling a die.
That is, we want to assign a number to it. If we roll a die, there are 6 possible outcomes
(possible results), namely, one, two, three, four, five, or six. The set of all possible outcomes
is called the sample space.

Bayesian Statistics for Beginners: A Step-by-Step Approach. Therese M. Donovan and Ruth M. Mickey,
Oxford University Press (2019). © Ruth M. Mickey 2019.
DOI: 10.1093/oso/9780198841296.001.0001

Let’s call the number of possible outcomes N, so N = 6. These outcomes are discrete,
because each result can take on only one of these values. Formally, the word “discrete” is
defined as “individually separate and distinct.” In addition, in this example, there is a
finite number of possible outcomes, which means “there are limits or bounds.” In other
words, the number of possible outcomes is not infinite.
If we believe that each and every outcome is just as likely to result as every other outcome
(i.e., the die is fair), then the probability of rolling a four is 1/N or 1/6. We can then say, “In
6 rolls of the die, we would expect 1 roll to result in a four,” and write that as Pr(four) = 1/6.

Here, the notation Pr means “Probability,” and we will use this notation throughout
this book.

How can we get a good estimate of Pr(four) for this particular die?

Answer: You collect some data.


Before you hand over the $10,000 entry, you ask the gamemaster if you could run an
“experiment” and roll the die a few times before you make the decision to play. In this
experiment, which consists of tossing a die many times, your goal is to get a rough estimate
of the probability of rolling a four compared to what you expect. To your amazement, the
gamemaster complies.
You start with one roll, which represents a single “trial” of an experiment. Suppose you roll
a three and give the gamemaster a sneer. Table 1.1 shows you rolled 1 three, and 0 for the rest.

Table 1.1

Outcome    Frequency    Probability
One        0            0
Two        0            0
Three      1            1
Four       0            0
Five       0            0
Six        0            0
Sum        1            1

In this table, the column called Frequency gives the number of times each outcome was
observed. The column called Probability is the frequency divided by the sum of the
frequencies over all possible outcomes, a proportion. The probability of an event of four is
the frequency of the observed number of occurrences of four (which is 0) divided by the
total throws (which is 1). We can write this as:

$$\Pr(\text{four}) = \frac{|\text{number of fours}|}{|\text{total trials}|} = \frac{0}{1} = 0. \tag{1.1}$$

This can be read as, “The probability of a four is the number of “four” events divided by the
number of total trials.” Here, the vertical bars indicate a number rather than an absolute
value. This is a frequentist notion of probability, because we estimate Pr(four) by asking
“How frequently did we observe the outcome that interests us out of the total?”
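
The counting behind Table 1.1 and Equation 1.1 can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the names `OUTCOMES`, `empirical_probabilities`, and `probs` are hypothetical, and the single roll of three is the one described above.

```python
from collections import Counter

OUTCOMES = ["one", "two", "three", "four", "five", "six"]  # the sample space

def empirical_probabilities(rolls):
    """Return {outcome: frequency / total trials} for a list of observed rolls."""
    counts = Counter(rolls)   # outcomes that were never rolled count as 0
    total = len(rolls)        # |total trials|
    return {outcome: counts[outcome] / total for outcome in OUTCOMES}

# One trial, as in Table 1.1: a single roll that came up three.
probs = empirical_probabilities(["three"])
print(probs["four"])   # 0.0
print(probs["three"])  # 1.0
```

Passing a longer list of rolls to the same function gives the empirical distribution for any number of trials.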

A probability distribution that is based on raw data is called an empirical probability
distribution. Our empirical probability distribution so far looks like the one in Figure 1.1:

Figure 1.1 Empirical probability distribution for die outcomes, given 1 roll. (Bar chart; x-axis: Outcome, One through Six; y-axis: Probability, 0.0 to 1.0.)

Is one roll good enough?

Answer: No.
By now, you should realize that one roll will never give us a good estimate of Pr(four).
(Sometimes, though, that’s all we have . . . we have only one Planet Earth, for example).
Next, you roll the die 9 more times, and summarize the results of the 10 total rolls (i.e., 10
trials or 10 experiments) in Table 1.2. The number of fours is 2, which allows us to estimate
Pr(four) as 2/10, or 0.20 (see Figure 1.2).

Table 1.2

Outcome    Frequency    Probability
One        0            0.0
Two        2            0.2
Three      5            0.5
Four       2            0.2
Five       0            0.0
Six        1            0.1
Sum        10           1.0

Figure 1.2 Empirical probability distribution for die outcomes, given 10 rolls. (Bar chart; x-axis: Outcome, One through Six; y-axis: Probability, 0.0 to 1.0.)

What can you conclude from these results? The estimate of Pr(four) = 0.2 seems to
indicate that the die may be in your favor! (Remember, you expect Pr(four) = 0.1667 if
the die were fair.) But $10,000 is a lot of money, and you decide that you should keep test
rolling until the gamemaster shouts “Enough!” Amazingly, you are able to squeeze in 500
rolls, and you obtain the results shown in Table 1.3.

Table 1.3

Outcome    Frequency    Probability
One        88           0.176
Two        91           0.182
Three      94           0.188
Four       41           0.082
Five       99           0.198
Six        87           0.174
Sum        500          1.000

The plot of the frequency results in Figure 1.3 is called a frequency histogram. Notice
that frequency, not probability, is on the y-axis. We see that a four was rolled 41 times.
Notice also that the sum of the frequencies is 500. The frequency distribution is an example
of an empirical distribution: It is constructed from raw data.

Figure 1.3 Frequency distribution of 500 rolls. (Bar chart; x-axis: Outcome, One through Six; y-axis: Frequency, 0 to 100.)

We can now estimate Pr(four) as 41/500 = 0.082. We can calculate the probability estimates
for the other outcomes as well and then plot them as the probability distribution in Figure 1.4.
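
As a rough sketch, the conversion from the frequencies in Table 1.3 to probability estimates looks like this in Python. The variable names are our own, and the comparison against 1/6 is added only to show how far each estimate sits from what a fair die would give.

```python
# Frequencies from Table 1.3 (500 test rolls).
frequencies = {"one": 88, "two": 91, "three": 94,
               "four": 41, "five": 99, "six": 87}

total_trials = sum(frequencies.values())
probabilities = {outcome: count / total_trials
                 for outcome, count in frequencies.items()}

fair = 1 / 6  # probability of each face if the die were fair
for outcome, p in probabilities.items():
    print(f"{outcome:>5}: estimate {p:.3f}, difference from fair {p - fair:+.3f}")
```

The estimate for four is the only one that falls well below 1/6, which is the pattern the table shows.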

Figure 1.4 Empirical probability distribution for 500 rolls. (Bar chart; x-axis: Outcome, One through Six; y-axis: Probability, 0.0 to 1.0.)

According to the Law of Large Numbers in probability theory, the formula:

$$\text{probability} = \frac{|\text{number of observed outcomes of interest}|}{|\text{total trials}|} \tag{1.2}$$

yields an estimate that is closer and closer to the true probability as the number of trials
increases. In other words, your estimate of Pr(four) gets closer and closer to or approaches
the true probability when you use more trials (rolls) in your calculations.
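
A small simulation can make the Law of Large Numbers concrete. The sketch below assumes a fair die simulated with Python's standard `random` module, which is not the gamemaster's die; the function name `estimate_pr_four` and the seed are arbitrary choices of ours.

```python
import random

def estimate_pr_four(n_trials, rng):
    """Roll a simulated fair die n_trials times and return the share of fours."""
    fours = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 4)
    return fours / n_trials

rng = random.Random(42)  # fixed seed so the run is repeatable (arbitrary choice)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(estimate_pr_four(n, rng), 4))
```

With few trials the estimate can be far from 1/6 (about 0.1667). As the number of trials grows, the estimates should settle near that value.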

What would we expect if the die were fair?

Table 1.4 lists the six possible outcomes, and the probability of each event (1/6 ≈ 0.167).
Notice that the sum of the probabilities across the events is 1.0.

Table 1.4

Outcome    Probability
One        0.167
Two        0.167
Three      0.167
Four       0.167
Five       0.167
Six        0.167
Sum        1

Figure 1.5 shows exactly the same information as Table 1.4; both are examples of a
probability distribution. On the horizontal axis, we list each of the possible outcomes. On
the vertical axis is the probability. The height of each bar provides the probability of
observing each outcome. Since each outcome has an equal chance of being rolled, the
heights of the bars are all the same and show as 0.167, which is 1/N. Note that this is not an
empirical distribution, because we did not generate it from an experiment. Rather, it was
based on the assumption that all outcomes are equally likely.
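
The fair-die distribution can also be written down directly from the assumption of equally likely outcomes. The sketch below uses exact fractions, because the rounded values in Table 1.4 (six times 0.167) add up to 1.002 rather than exactly 1. The names `outcomes` and `fair_die` are our own.

```python
from fractions import Fraction

outcomes = ["one", "two", "three", "four", "five", "six"]
N = len(outcomes)

# Discrete uniform distribution: every outcome gets probability 1/N.
fair_die = {outcome: Fraction(1, N) for outcome in outcomes}

print(fair_die["four"])         # 1/6
print(float(fair_die["four"]))  # 0.16666666666666666
print(sum(fair_die.values()))   # 1
```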

Figure 1.5 Probability distribution for rolling a fair die. (Bar chart; x-axis: Outcome, One through Six; y-axis: Probability, 0.0 to 1.0.)

Again, this is an example of a discrete uniform probability distribution: discrete
because there are a discrete number of separate and distinct outcomes; uniform because
each and every event has the same probability.

How would you change the table and probability distribution if the
die were loaded in favor of a four?

Answer: Your answer here!


There are many ways to do this. Suppose the probability of rolling a four is 0.4. Since all
the probabilities have to add up to 1.0, this leaves 0.6 to distribute among the remaining
five outcomes, or 0.12 for each (assuming these five are equally likely to turn up). If you roll
this die, it’s not a sure thing that you’ll end up with a four, but getting an outcome of four is
more likely than, say, getting a three (see Table 1.5).
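
The arithmetic for one such loaded die can be sketched as follows. The helper `loaded_die` is hypothetical, and splitting the leftover probability equally among the other faces is the same simplifying assumption made in the text.

```python
def loaded_die(favored, p_favored, outcomes):
    """Give `favored` probability p_favored and split the rest equally."""
    others = [o for o in outcomes if o != favored]
    p_other = (1 - p_favored) / len(others)  # (1 - 0.4) / 5 in the example
    dist = {o: p_other for o in others}
    dist[favored] = p_favored
    return dist

outcomes = ["one", "two", "three", "four", "five", "six"]
dist = loaded_die("four", 0.4, outcomes)
print(round(dist["three"], 2))        # 0.12
print(round(sum(dist.values()), 10))  # 1.0
```

Changing `p_favored` gives other loaded dice; the probabilities still sum to 1 by construction.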

Table 1.5

Outcome    Probability
One        0.12
Two        0.12
Three      0.12
Four       0.4
Five       0.12
Six        0.12
Sum        1

The probabilities listed in the table sum to 1.0 just as before, as do the heights of the
corresponding blue bars in Figure 1.6.

Figure 1.6 Probability distribution for rolling a loaded die. (Bar chart; x-axis: Outcome, One through Six; y-axis: Probability, 0.0 to 1.0.)

What would the probability distribution be for the bet?

Answer: The bet is that if you roll a four, you win $1,000,000, and if you don’t roll a
four, you lose $10,000. It would be useful to group our 6 possible outcomes into one of
two events. As the Oxford Dictionary of Statistics explains, “An event is a particular
collection of outcomes, and is a subset of the sample space.” Probabilities are assigned
to events.
Our two events are E1 = {four} and E2 = {one, two, three, five, six}. The brackets { }
indicate the set of outcomes that belong in each event. The first event contains one
outcome, while the second event consists of five possible outcomes. Thus, in probability
theory, outcomes can be grouped into new events at will. We started out by considering six
outcomes, and now we have collapsed those into two events.
Now we assign a probability to each event. We know that Pr(four) = 0.4. What is the
probability of NOT rolling a four? That is the probability of rolling one OR two OR three OR
five OR six. Note that these events cannot occur simultaneously. Thus, we can write
Pr(∼four) as the SUM of the probabilities of events one, two, three, five, and six (which is
0.12 + 0.12 + 0.12 + 0.12 + 0.12 = 0.6). Incidentally, the ∼ sign means “complement of.” If
A is an event, ∼A is its complement (i.e., everything but A). This is sometimes written as A^c.
The word OR is a tip that you ADD the individual probabilities together to get your answer
as long as the events are mutually exclusive (i.e., cannot occur at the same time).
This is an example of a fundamental rule in probability theory: if two or more events are
mutually exclusive, then the probability of any occurring is the sum of the probabilities of
each occurring. Because the different outcomes of each roll (i.e., rolling a one, two, three,
five, or six) are mutually exclusive, the probability of getting any outcome other than four is
the sum of the probability of each one occurring (see Table 1.6).
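
One way to see the addition rule at work is to treat each event as a set of outcomes and sum the outcome probabilities. This is a minimal sketch using the loaded-die values from Table 1.5; the names `pr_outcome` and `pr_event` are our own.

```python
# Loaded-die probabilities from Table 1.5.
pr_outcome = {"one": 0.12, "two": 0.12, "three": 0.12,
              "four": 0.4, "five": 0.12, "six": 0.12}

def pr_event(event):
    """Addition rule: sum the probabilities of the mutually exclusive outcomes."""
    return sum(pr_outcome[outcome] for outcome in event)

E1 = {"four"}
E2 = set(pr_outcome) - E1  # complement of E1: every outcome except four

print(round(pr_event(E1), 2))  # 0.4
print(round(pr_event(E2), 2))  # 0.6
```

The set difference builds the complement directly, so the two event probabilities together account for the whole sample space.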

Table 1.6

Event       Probability
Four        0.4
Not Four    0.6
Sum         1

Note that the probabilities of these two possible events sum to 1.0. Because of that, if we
know that Pr(four) is 0.4, we can quickly compute Pr(∼four) as 1 − 0.4 = 0.6 and save a
few mental calculations.
The probability distribution looks like the one shown in Figure 1.7:

Figure 1.7 Probability distribution for rolling a four or not. (Bar chart; x-axis: Event, Four and Not Four; y-axis: Probability, 0.0 to 1.0.)

Remember: The SUM of the probabilities across all the different outcomes MUST EQUAL
1! In the discrete probability distributions above, this means that the heights of the bars,
summed across all discrete outcomes (i.e., all bars), total 1.

Do you still want to play? At the end of the book, we will introduce decision trees, an
analytical framework that employs Bayes’ Theorem to aid in decision-making. But that
chapter is a long way away.

Do Bayesians think of probability as long-run averages?

Answer: We’re a few chapters away from hitting on this very important topic. You’ll see
that Bayesians think of probability in a way that allows the testing of theories and hypotheses.
But you have to walk before you run. What you need now is to continue learning the
basic vocabulary associated with probability theory.

What’s next?

Answer: In Chapter 2, we’ll expand our discussion of probability. See you there.

CHAPTER 2

Joint, Marginal, and Conditional Probability

Now that you’ve had a short introduction to probability, it’s time to build on our
probability vocabulary. By the end of this chapter, you will understand the
following terms:
• Venn diagram
• Marginal probability
• Joint probability
• Independent events
• Dependent events
• Conditional probability
Let’s start with a few questions.

What is an eyeball event?

A gala that celebrates the sense of vision? Nope. The eyeball event in this chapter
refers to whether a person is right-eyed dominant or left-eyed dominant. You already
know if you are left- or right-handed, but did you know that you are also left- or right-
eyed? Here’s how to tell (http://www.wikihow.com/Determine-Your-Dominant-Eye; see
Figure 2.1):

Figure 2.1 Determining your dominant eye.


1. Stretch your arms out in front of you and create a hole with your hands by joining your
fingertips to make a triangular opening, as shown.
2. Find a small object nearby and align your hands with it so that you can see it in the
triangular hole. Make sure you are looking straight at the object through your hands—
cocking your head to either side, even slightly, can affect your results. Be sure to keep
both eyes open!
3. Slowly move your hands toward your face to draw your viewing window toward you. As
you do so, keep your head perfectly still, but keep the object lined up in the hole between
your hands. Don’t lose sight of it.
4. Draw your hands in until they touch your face—your hands should end up in front of
your dominant eye. For example, if you find that your hands end up so you are looking
through with your right eye, that eye is dominant.

The eyeball characteristic has two discrete outcomes: lefty (for left-eyed dominant people)
or righty (for right-eyed dominant people). Because there are only two outcomes, we can
call them events if we want.
Let us suppose that you ask 100 people if they are “lefties” or “righties.” In this case,
the 100 people represent our “universe” of interest, which we designate with the letter
U. The total number of elements (individuals) in U is written |U|. (Once again, the
vertical bars here simply indicate that U is a number; it doesn’t mean the absolute value
of U.)
Here, there are only two possible events: “lefty” and “righty.” Together, they make up a
set of possible outcomes. Let A be the event “left-eye dominant,” and ∼A be the event
“right-eye dominant.” Here, the tilde means “complement of,” and here it can be interpreted
as “everything but A.” Notice that these two events are mutually exclusive: you
cannot be both a “lefty” and a “righty.” The events are also “exhaustive” because you must
be either a lefty or righty.
Suppose that 70 of 100 people are lefties. These people are a subset of the
larger population. The number of people in event A can be written |A|, and in this
example |A| = 70. Note that |A| must be less than or equal to |U|, which is 100.
Remember that we use the vertical bars here to highlight that we are talking about a
number.
Since there are only two possibilities for eye dominance type, this means that 100 − 70 = 30
people are righties. The number of people in event ∼A can be written |∼A|, and in this
example |∼A| = 30. Note that |∼A| must be less than or equal to |U|.
Our universe can be summarized as shown in Table 2.1.

Table 2.1

Event           Frequency
Lefty (A)       |A| = 70
Righty (∼A)     |∼A| = 30
Universe (U)    |U| = 100

We can illustrate this example in a diagrammatic form, as shown in Figure 2.2. Here, our
universe of 100 people is captured inside a box.

This is a Venn diagram, which shows A and ∼A. The universe U is 100 people and is
represented by the entire box. We then allocate those 100 individuals into A and ∼A. The blue
circle represents A; lefties stand inside this circle; righties stand outside the circle, but inside
the box. You can see that A consists of 70 elements, and ∼A consists of 30 elements.

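Figure 2.2 (Venn diagram: a circle labeled Lefty (A) containing 70, inside a box whose remaining area is labeled Righty (∼A) and contains 30.)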

Why is it called a Venn diagram?

Answer: Venn diagrams are named for John Venn (see Figure 2.3), who wrote his seminal
article in 1880.

Figure 2.3 John Venn.



According to the MacTutor History of Mathematics Archive, Venn’s son described him as
“of spare build, he was throughout his life a fine walker and mountain climber, a keen
botanist, and an excellent talker and linguist.”

What is the probability that a person in universe U is in group A?

Answer: We write this as Pr(A). Remember that Pr stands for probability, so Pr(A) means
the probability that a person is in group A and therefore is a lefty. We can determine the
probability that a person is in group A as:

$$\Pr(A) = \frac{|A|}{|U|} = \frac{70}{100} = 0.7. \tag{2.1}$$

Probability is determined as the number of persons in group A out of the total. The
probability that the randomly selected person is in group A is 0.7.

What about people who are not in group A?

Answer: There are 30 of them (100 − 70 = 30), and they are righties.

$$\Pr({\sim}A) = \frac{|{\sim}A|}{|U|} = \frac{30}{100} = 0.3. \tag{2.2}$$

With only two outcomes for the eyeball event, our notation focuses on the probability
of being in a given group (A = lefties) and the probability of not being in the given group
(∼A = righties).
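
Equations 2.1 and 2.2 amount to two divisions, which can be checked with a short sketch. The variable names below are our own; the counts are the ones from Table 2.1.

```python
from fractions import Fraction

size_U = 100                  # |U|: everyone in the universe
size_A = 70                   # |A|: lefties
size_not_A = size_U - size_A  # |~A|: righties

pr_A = Fraction(size_A, size_U)
pr_not_A = Fraction(size_not_A, size_U)

print(float(pr_A))      # 0.7
print(float(pr_not_A))  # 0.3
print(pr_A + pr_not_A)  # 1
```

Because A and its complement are mutually exclusive and exhaustive, their probabilities add to exactly 1.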

I’m sick of eyeballs. Can we consider another characteristic?

Answer: Yes, of course. Let’s probe these same 100 people and find out other details about
their anatomy. Suppose we are curious about the presence or absence of Morton’s toe.
People with “Morton’s toe” have a large second metatarsal, longer in fact than the first
metatarsal (which is also known as the big toe or hallux toe). Wikipedia articles suggest that
this is a normal variation of foot shape in humans and that less than 20% of the human
population have this condition. Now we are considering a second characteristic for our
population, namely toe type.
Let’s let B designate the event “Morton’s toe.” Let the number of people with Morton’s
toe be written as |B|. Let ∼B designate the event “common toe.” Suppose 15 of the 100
people have Morton’s toe. This means |B| = 15, and |∼B| = 85. The data are shown in
Table 2.2, and the Venn diagram is shown in Figure 2.4.

Table 2.2

Event               Frequency
Morton’s toe (B)    |B| = 15
Common toe (∼B)     |∼B| = 85
Universe (U)        |U| = 100

These events can be represented in a Venn diagram, where a box holds our universe of
100 people.
Note the size of this red circle is smaller than the previous example because the number of
individuals with Morton’s toe is much smaller.

Figure 2.4 Venn diagram: Morton’s toe (B), 15, inside the circle; Common toe (∼B), 85, outside the circle.

Can we look at both characteristics simultaneously?

Answer: You bet. With two characteristics, each with two outcomes, we have four possible
combinations:
1. Lefty AND Morton’s toe, which we write as A ∩ B.
2. Lefty AND common toe, which we write as A ∩ ∼B.
3. Righty AND Morton’s toe, which we write as ∼A ∩ B.
4. Righty AND common toe, which we write as ∼A ∩ ∼B.
The upside-down U, ∩, is the mathematical symbol for intersection. Here, you can read it as
“BOTH” or “AND.”
The number of individuals in A ∩ B can be written |A ∩ B|, where the bars indicate a
number (not absolute value). Let’s suppose we record the frequency of individuals in each
of the four combinations (see Table 2.3).

Table 2.3

                    Lefty (A)   Righty (∼A)   Sum
Morton’s toe (B)        0           15         15
Common toe (∼B)        70           15         85
Sum                    70           30        100

Let’s study this table carefully.


Notice that this table has four “quadrants,” so to speak. Our actual values are stored in the
upper left quadrant, shaded dark blue. The upper right quadrant (shaded light blue) sums
the number of people with and without Morton’s toe. The lower left quadrant (shaded light
blue) sums the number of people that are lefties and righties. The lower right (white)
quadrant gives the grand total, or |U|.
Now let’s plot the results in the same Venn diagram (see Figure 2.5).

Figure 2.5 Venn diagram: Lefty (A), 70; Morton’s toe (B), 15; no overlap between the circles; Righty with common toe (∼A and ∼B), 15.

The updated Venn diagram shows the blue circle with 70 lefties, the red circle with
15 people with Morton’s toe, and no overlap between the two. This means |A ∩ B| = 0,
|A| = |A ∩ ∼B| = 70, and |B| = |∼A ∩ B| = 15. By subtraction, we know the number of
individuals that are not in A OR B, |∼A ∩ ∼B|, is 15 because we need to account for all 100
individuals somewhere in the diagram.

Is it possible to have Morton’s toe AND be a lefty?

Answer: For our universe, no. If you are a person with Morton’s toe, you are standing in
the red circle and cannot also be standing in the blue circle. So these two events
(Morton’s toe and lefties) are mutually exclusive because they do not occur at the
same time.

Is it possible NOT to have Morton’s toe if you are a lefty?

Answer: You bet. None of the 70 lefties has Morton’s toe, so every lefty has a common toe.
These two events (lefty and common toe) are not mutually exclusive.

What if five lefties also have Morton’s toe?

Answer: In this case, we need to adjust the Venn diagram to show that five of the people
that are lefties also have Morton’s toe. These individuals are represented as the intersection
between the two events (see Figure 2.6). Note that the total number of individuals is still
100; we need to account for everyone!

Figure 2.6 Venn diagram: Lefty only (A ∩ ∼B), 65; overlap (A ∩ B), 5; Morton’s toe only (∼A ∩ B), 10; Righty with common toe (∼A and ∼B), 20.

We’ll run with this example for the rest of the chapter.
This Venn diagram is pretty accurate (except for the size of the box overall). There are
70 people in A, 15 people in B, and 5 people in A ∩ B. The blue circle contains 70 elements
in total, and 5 of those elements also occur in B. The red circle contains 15 elements (so is a
lot smaller than the blue circle), and 5 of these are also in A. So 5/15 (33%) of the red circle
overlaps with the blue circle, and 5/70 (7%) of the blue circle overlaps with the red circle.
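
The bookkeeping behind Figures 2.5 and 2.6 is just subtraction, and it can be sketched in a few lines of Python. The function name `four_cells` and the dictionary labels are hypothetical choices made for this illustration.

```python
def four_cells(n_universe, n_a, n_b, n_both):
    """Split a universe into the four A/B combinations by subtraction."""
    a_only = n_a - n_both                                # |A and ~B|
    b_only = n_b - n_both                                # |~A and B|
    neither = n_universe - (a_only + b_only + n_both)    # |~A and ~B|
    return {
        "A and B": n_both,
        "A and not B": a_only,
        "not A and B": b_only,
        "not A and not B": neither,
    }

no_overlap = four_cells(100, 70, 15, 0)     # the situation in Figure 2.5
five_overlap = four_cells(100, 70, 15, 5)   # the situation in Figure 2.6

print(no_overlap)
print(five_overlap)
print(sum(five_overlap.values()))  # 100, everyone is accounted for
```

Changing only the size of the overlap changes three of the four cells, while the totals for A, B, and U stay fixed.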

Of the four events (A, ∼A, B, and ∼B), which are not
mutually exclusive?

Answer:
• A and B are not mutually exclusive (a lefty can have Morton’s toe).
• A and ∼B are not mutually exclusive (a lefty can have a common toe).
• ∼A and B are not mutually exclusive (a righty can have Morton’s toe).
• ∼A and ∼B are not mutually exclusive (a righty can have a common toe).
In Venn diagrams, if two events overlap, they are not mutually exclusive.

Are any events mutually exclusive?

Answer: If we focus on each circle, A and ∼A are mutually exclusive (a person cannot be a
lefty and righty). B and ∼B are mutually exclusive (a person cannot have Morton’s toe and a
common toe). Apologies for the trick question!
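
One way to make "overlap" concrete is to treat each event as a set of people and ask whether two sets share any members. The sketch below assumes hypothetical person IDs 0 to 99, arranged so that the counts match Figure 2.6; the IDs themselves carry no meaning.

```python
# Hypothetical person IDs 0..99 stand in for the 100 people in the universe.
universe = set(range(100))
lefty = set(range(0, 70))       # A: 70 people
morton = set(range(65, 80))     # B: 15 people, 5 of whom are also in A
righty = universe - lefty       # ~A
common = universe - morton      # ~B

def mutually_exclusive(x, y):
    """Two events are mutually exclusive when nobody belongs to both."""
    return x.isdisjoint(y)

print(mutually_exclusive(lefty, morton))    # False: 5 people are in both
print(mutually_exclusive(lefty, righty))    # True
print(mutually_exclusive(morton, common))   # True

# Sizes of the four intersections.
print(len(lefty & morton), len(lefty & common),
      len(righty & morton), len(righty & common))  # 5 65 10 20
```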

If you were one of the lucky 100 people included in the universe,
where would you fall in this diagram?

Answer: Your answer here!


It’s handy to look at the numbers in a table format too. Table 2.4 shows the same values as
Figure 2.6.

Table 2.4

                    Lefty (A)   Righty (∼A)   Sum
Morton’s toe (B)        5           10         15
Common toe (∼B)        65           20         85
Sum                    70           30        100

This is an important table to study. Once again, notice that there are four quadrants or
sections in this table. In the upper left quadrant, the first two columns represent the two
possible events for eyeball dominance: lefty and righty. The first two rows represent the two
possible events for toe type: Morton’s toe and common toe.
The upper left entry indicates that 5 people are members of both A and B.
• Look for the entry that indicates that 20 people are ∼A ∩ ∼B, that is, both not A and not B.
• Look for the entry that indicates that 65 people are A ∩ ∼B.
• Look for the entry that indicates that 10 people are ∼A ∩ B.
The lower left and upper right quadrants of our table are called the margins of the table.
They are shaded light blue. Note that the total number of individuals in A (regardless of B)
is 70, and the total number of individuals in ∼A is 30. The total number of individuals in B
(regardless of A) is 15 and the total number of individuals in ∼B is 85. Any way you slice it, the
grand total must equal 100 (the lower right quadrant).

What does this have to do with probability?

Answer: Well, if you are interested in determining the probability that an individual
belongs to any of these four groups, you could use your universe of 100 individuals to do
the calculation. Do you remember the frequentist way to calculate probability? We
learned about that in Chapter 1.

$$\Pr = \frac{|\text{number of observed outcomes of interest}|}{|U|}. \tag{2.3}$$

Our total universe in this case is the 100 individuals. To get the probability that a person
selected at random would belong to a particular event, we simply divide the entire table
above by our total, which is 100, and we get the results shown in Table 2.5.

Table 2.5

                    Lefty (A)   Righty (∼A)   Sum
Morton’s toe (B)      0.05         0.1        0.15
Common toe (∼B)       0.65         0.2        0.85
Sum                   0.7          0.3        1

We’ve just converted the raw numbers to probabilities by dividing the frequency table by
the grand total.
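
Dividing a frequency table by its grand total is easy to sketch in Python. The nested dictionary below is one assumed way of storing the four counts; the row and column labels are illustrative.

```python
# Frequency table: rows are toe type, columns are eye dominance.
counts = {
    "Morton's toe": {"Lefty": 5, "Righty": 10},
    "Common toe": {"Lefty": 65, "Righty": 20},
}

# Grand total |U|, found by adding up every cell.
total = sum(n for row in counts.values() for n in row.values())

# Divide every cell by the grand total to get the joint probabilities.
joint = {
    toe: {eye: n / total for eye, n in row.items()}
    for toe, row in counts.items()
}

for toe, row in joint.items():
    print(toe, row)
```

The four resulting probabilities add up to 1, just as the four counts add up to 100.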

Note that this differs from our die rolling exercise in Chapter 1, where you were unsure
what the probability was and had to repeatedly roll a die to estimate it. By the Law of Large
Numbers, the more trials you have, the more you zero in on the actual probability. In this
case, however, we are given the number of people in each category, so the calculation is
straightforward. These 100 people are the only people of interest. They represent our
universe of interest; we are not using them to sample a larger group. If you didn’t know the
make-up of the universe, you could randomly select one person out of the universe over
and over again to get the probabilities, where all persons are equally likely to be selected.

Let’s walk through one calculation. Suppose we want to know the probability that an
individual is a lefty AND has Morton’s toe. The number of individuals in A and B is
written |A ∩ B| and the probability that an individual is a lefty with Morton’s toe is
written:

$$\Pr(A \cap B) = \frac{|A \cap B|}{|U|} = \frac{5}{100} = 0.05. \tag{2.4}$$

This is officially called the joint probability and is the upper left entry in our table. The
Oxford Dictionary of Statistics states that the “joint probability of a set of events is the
probability that all occur simultaneously.” Joint probabilities are also called conjoint
probabilities. Incidentally, a table that lists joint probabilities such as the one above is
sometimes referred to as a conjoint table.
When you hear the word joint, you should think of the word AND and realize that you
are considering (and quantifying) more than one characteristic of the population. In this
case, it indicates that someone is in A AND B. This is written as:

$$\Pr(A \cap B) \quad \text{or, equivalently,} \quad \Pr(B \cap A). \tag{2.5}$$

What is the probability that a person selected at random is a righty and has Morton’s toe?

Answer: This is equivalent to asking, what is the joint probability that a person is
right-eye dominant AND has Morton’s toe? See if you can find this entry in Table 2.5. The
answer is 0.1.
In addition to the joint probabilities, the table also provides the marginal probabilities,
which look at the probability of A or ∼A (regardless of B) and the probability of B or ∼B
(regardless of A).

What does the word “marginal” mean?

Answer: The word marginal in the dictionary is defined as “pertaining to the margins; or
situated on the border or edge.” In our table, the marginal probabilities are just the
probabilities for one characteristic of interest (e.g., A and ∼A) regardless of other
characteristics that might be listed in the table.
Let’s now label each cell in our conjoint table by its probability type (see Table 2.6).

Table 2.6

                    Lefty (A)   Righty (∼A)   Sum
Morton’s toe (B)    Joint       Joint         Marginal
Common toe (∼B)     Joint       Joint         Marginal
Sum                 Marginal    Marginal      Total

Suppose you know only the following facts: the marginal probability of being a lefty is
0.7, the marginal probability of having Morton’s toe is 0.15, and the joint probability of
being a lefty with Morton’s toe is 0.05. Also suppose that you haven’t looked at
Table 2.6!

Can you fill in the empty cells in Table 2.7?

Take out some scratch paper and a pencil. You can do it! Here are some hints:
• the lower right hand quadrant must equal 1.00;
• for any given characteristic, the sum of the two marginal probabilities must equal 1.00.

Table 2.7

                    Lefty (A)   Righty (∼A)   Sum
Morton’s toe (B)      0.05         ?          0.15
Common toe (∼B)        ?           ?           ?
Sum                   0.7          ?           ?

Answer: Because the marginal probabilities for eyeballs must sum to 1.00, and the marginal
probabilities for toes must sum to 1.00 (because they deal with mutually exclusive
events), we can fill in the missing marginal probabilities.
The marginal probability of a lefty, Pr(A), is 0.7, so the marginal of a righty, Pr(∼A), must
be 1.00 − 0.7 = 0.3.
The marginal probability of having Morton’s toe, Pr(B), is 0.15, so the marginal of Pr(∼B)
must be 1.00 − 0.15 = 0.85.
So far, so good. Once we know the marginals, we can calculate the joint probabilities in
the upper left quadrant (see Table 2.8). For example:

• if the marginal Pr(A) = 0.7, then we know that Pr(A ∩ ∼B) = 0.7 − 0.05 = 0.65;
• if the marginal Pr(B) = 0.15, then we know that Pr(∼A ∩ B) = 0.15 − 0.05 = 0.1;
• if the marginal Pr(∼A) = 0.3, then we know that Pr(∼A ∩ ∼B) = 0.3 − 0.1 = 0.2.

Table 2.8

                    Lefty (A)   Righty (∼A)   Sum
Morton’s toe (B)      0.05         0.1        0.15
Common toe (∼B)       0.65         0.2        0.85
Sum                   0.7          0.3        1
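
The fill-in-the-blanks exercise can be sketched in Python, starting from only the three known facts. The variable names are assumptions made for this illustration, and exact fractions are used so the subtractions come out clean.

```python
from fractions import Fraction

# The three known facts.
pr_a = Fraction(70, 100)        # marginal Pr(A), lefty
pr_b = Fraction(15, 100)        # marginal Pr(B), Morton's toe
pr_a_and_b = Fraction(5, 100)   # joint Pr(A and B)

# The two marginals for one characteristic must sum to 1.
pr_not_a = 1 - pr_a
pr_not_b = 1 - pr_b

# Each marginal is the sum of the joints in its row or column, so subtract.
pr_a_and_not_b = pr_a - pr_a_and_b               # lefty, common toe
pr_not_a_and_b = pr_b - pr_a_and_b               # righty, Morton's toe
pr_not_a_and_not_b = pr_not_a - pr_not_a_and_b   # righty, common toe

print(float(pr_not_a), float(pr_not_b))          # 0.3 0.85
print(float(pr_a_and_not_b), float(pr_not_a_and_b),
      float(pr_not_a_and_not_b))                 # 0.65 0.1 0.2
```

Three numbers are enough to recover the whole table because each row, each column, and the grand total impose a constraint.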

Quickly: What is the marginal probability of having Morton’s toe with this conjoint table?

Answer: The marginal probability of having Morton’s toe is written:

$$\Pr(B) = 0.15 \tag{2.6}$$

Can you express the marginal probability of having Morton’s toe as the sum of joint probabilities?

Answer: Don’t cheat now . . . try to express Pr(B) as the sum of joint probabilities before
reading on! This step is essential for understanding Bayesian inference in future chapters!
How did you do?
Hint 1: We can decompose the total, 0.15, into its two pieces: 0.05 + 0.1.
The joint probability that a person is a lefty with Morton’s toe can be written:

$$\Pr(A \cap B). \tag{2.7}$$

The joint probability that a person is a righty with Morton’s toe can be written:

$$\Pr(\sim A \cap B). \tag{2.8}$$

If we put these two terms together, we can express the marginal probability of having
Morton’s toe as:

$$\Pr(B) = \Pr(A \cap B) + \Pr(\sim A \cap B) \tag{2.9}$$

$$\Pr(B) = 0.05 + 0.1 = 0.15. \tag{2.10}$$
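
The same decomposition works for any marginal, which a short Python sketch can show. The helper `marginal` and the `(eye, toe)` keys are hypothetical names chosen for this illustration.

```python
from fractions import Fraction

# Joint probabilities from the conjoint table, keyed by (eye, toe).
joint = {
    ("A", "B"): Fraction(5, 100),
    ("not A", "B"): Fraction(10, 100),
    ("A", "not B"): Fraction(65, 100),
    ("not A", "not B"): Fraction(20, 100),
}

def marginal(joint, position, label):
    """Add up every joint probability whose key has `label` at `position`."""
    return sum(p for key, p in joint.items() if key[position] == label)

pr_b = marginal(joint, 1, "B")   # Pr(A and B) + Pr(~A and B)
pr_a = marginal(joint, 0, "A")   # Pr(A and B) + Pr(A and ~B)

print(float(pr_b), float(pr_a))  # 0.15 0.7
```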

Can we look at this problem from the Venn diagram perspective again?

Of course! Here it is in Figure 2.7.

Figure 2.7 Venn diagram: Lefty only, 65; Lefty with Morton’s toe, 5; Morton’s toe only, 10; Righty with common toe (∼A and ∼B), 20.

We are now poised to ask some very interesting questions.



If you have Morton’s toe, does that influence your probability of being a lefty?

Answer: To answer this question, we must introduce the very important concept of
conditional probability.

What is conditional probability?

Answer: Conditional probability is the probability of an event given that another event
has occurred.
Conditional probability is written as:

• Pr(A|B), which is read “the probability of A, given that B occurs”; in our context, Pr(A|B)
is Pr(lefty | Morton’s toe);
• Pr(A|∼B), which is read “the probability of A, given that ∼B occurs”; in our context,
Pr(A|∼B) is Pr(lefty | common toe);
• Pr(B|∼A), which is read “the probability of B, given that ∼A occurs”; in our context,
Pr(B|∼A) is Pr(Morton’s toe | righty);
• etc.
The vertical bar means “given.”

How exactly do you calculate the probability that a person is a lefty, given the person has Morton’s toe?

Answer: You use the following equation, which is a standard equation in probability
theory:

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}. \tag{2.11}$$

It’s essential that you understand conditional probability, so let’s look at this equation from
a few different angles and, in the words of Kalid Azad, “let’s build some intuition” about
what it means.
Angle 1: The Venn diagram zoom
We already know that the numerator

$$\Pr(A \cap B) \tag{2.12}$$

is the intersection in the Venn diagram where A and B overlap (the probability of a lefty and
Morton’s toe). This can be written as

$$\Pr(B \cap A) \tag{2.13}$$

as well. The intersection of A and B is the intersection, no matter how you write it:

$$\Pr(A \cap B) = \Pr(B \cap A). \tag{2.14}$$

And we know that the denominator Pr(B) is the probability of Morton’s toe.

In the Venn diagram, we can focus on the area of B and then look to see what fraction of
the total B is occupied by A. In this example, we restrict our attention to the 15 people with
Morton’s toe, and note that 5 of them are lefties. Therefore, about 5/15 or 1/3 of the red
circle is overlapped by the blue circle.

Figure 2.8 Zoom on Morton’s toe (B): A ∩ B, 5; ∼A ∩ B, 10.

For the numbers given, we can see that Pr(A | B) = 5/15 = 1/3 = 0.333 = 33.3%. A general
rule can help with the visualization: zoom to the denominator space B, then determine
what fraction of this space is occupied by A. Similarly, Pr(∼A | B) = 10/15 = 2/3 = 0.667 =
66.7%. Note that these probabilities sum to 1.
Angle 2: The table approach
We can also tackle this problem using the raw data (see Table 2.9).

Table 2.9

                    Lefty (A)   Righty (∼A)   Sum
Morton’s toe (B)        5           10         15
Common toe (∼B)        65           20         85
Sum                    70           30        100

Here’s the key equation again:

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}. \tag{2.15}$$

In words, this equation says, what fraction of B consists of A ∩ B?

From our table, we calculated Pr(A ∩ B) as:

$$\Pr(A \cap B) = \frac{|A \cap B|}{|U|} = \frac{5}{100}. \tag{2.16}$$

And we know that Pr(B) is:

$$\Pr(B) = \frac{|B|}{|U|} = \frac{15}{100}. \tag{2.17}$$

Now we can calculate the probability of A given B as:

$$\Pr(A \mid B) = \frac{|A \cap B| / |U|}{|B| / |U|} = \frac{|A \cap B|}{|B|} = \frac{5}{15} = 0.333. \tag{2.18}$$
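
The cancellation of |U| in the calculation above can be checked with a short Python sketch. The variable names are illustrative assumptions; exact fractions make the equality test reliable.

```python
from fractions import Fraction

n_universe = 100    # |U|
n_b = 15            # |B|, people with Morton's toe
n_a_and_b = 5       # |A and B|, lefties with Morton's toe

# Joint and marginal probabilities, each computed against the whole universe.
pr_a_and_b = Fraction(n_a_and_b, n_universe)
pr_b = Fraction(n_b, n_universe)

# Conditional probability: zoom to B and ask what fraction is also in A.
pr_a_given_b = pr_a_and_b / pr_b

# |U| cancels, so the same answer comes straight from the raw counts.
print(pr_a_given_b == Fraction(n_a_and_b, n_b))       # True
print(pr_a_given_b, round(float(pr_a_given_b), 3))    # 1/3 0.333
```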

So if you have Morton’s toe, does that influence your probability of being a lefty?

Answer: If you have Morton’s toe, the probability of being a lefty is 0.33. If you don’t have
Morton’s toe, the probability of being a lefty is 65/85 = 0.76 (you can confirm this too).
If Morton’s toe does not matter, these conditional probabilities should be equal to the
marginal probability, which is 0.7. This is clearly not the case here.
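
To confirm the comparison yourself, here is a minimal Python sketch that puts the two conditional probabilities next to the marginal. The variable names are assumptions for illustration only.

```python
from fractions import Fraction

pr_a = Fraction(70, 100)             # marginal Pr(A), lefty
pr_a_given_b = Fraction(5, 15)       # Pr(A | B): lefties among the 15 with Morton's toe
pr_a_given_not_b = Fraction(65, 85)  # Pr(A | ~B): lefties among the 85 with a common toe

print(round(float(pr_a_given_b), 2),
      round(float(pr_a_given_not_b), 2),
      float(pr_a))                   # 0.33 0.76 0.7

# If toe type did not matter, all three numbers would be the same.
toe_type_matters = not (pr_a_given_b == pr_a_given_not_b == pr_a)
print(toe_type_matters)              # True
```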

Does Pr(A | B) = Pr(B | A)?

Answer: In other words, is the probability of a lefty, given Morton’s toe, the same thing as
the probability of Morton’s toe, given a lefty? Let’s try it!
$$\Pr(A \mid B) = \frac{|A \cap B| / |U|}{|B| / |U|} = \frac{|A \cap B|}{|B|} = \frac{5}{15} = 0.333 \tag{2.19}$$

$$\Pr(B \mid A) = \frac{|A \cap B| / |U|}{|A| / |U|} = \frac{|A \cap B|}{|A|} = \frac{5}{70} = 0.071. \tag{2.20}$$

So the answer is No! These two probabilities are very different things. The first asks what is
the probability of A given that event B happens (with a result of 0.333), while the second
asks what is the probability of B given that A happens (with a result of 0.071).
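
A tiny Python sketch makes the asymmetry easy to see: the numerator is the same in both cases, and only the group we zoom to changes. The function name `conditional` is a hypothetical choice for this illustration. Note that 5/70 is 0.0714..., which rounds to 0.071 at three decimals.

```python
def conditional(n_both, n_given):
    """Fraction of the 'given' group that also belongs to the other event."""
    return n_both / n_given

n_a, n_b, n_a_and_b = 70, 15, 5

pr_a_given_b = conditional(n_a_and_b, n_b)   # zoom to the 15 people with Morton's toe
pr_b_given_a = conditional(n_a_and_b, n_a)   # zoom to the 70 lefties

print(round(pr_a_given_b, 3), round(pr_b_given_a, 3))  # 0.333 0.071
```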

Can you calculate the conditional probability of being a lefty, given you have Morton’s toe, from our conjoint table instead of the raw numbers?

Answer: Yes . . . see if you can find it before looking at Table 2.10!

Table 2.10

                    Lefty (A)   Righty (∼A)   Sum
Morton’s toe (B)      0.05         0.1        0.15
Common toe (∼B)       0.65         0.2        0.85
Sum                   0.7          0.3        1

Remember, when dealing with conditional probabilities, the key word is “zoom.” Let’s
start with Pr(A | B):

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}. \tag{2.21}$$

If B happens, we zoom to row 1 (Morton’s toe), and then ask what fraction of the people
with Morton’s toe are lefties:

$$\Pr(A \mid B) = \frac{0.05}{0.15} = 0.333. \tag{2.22}$$

Can you calculate the conditional probability of having Morton’s toe, given you are a lefty, from our conjoint table?

Answer: If A happens, we zoom to the first column (lefties) and then ask what fraction of
the lefties have Morton’s toe:

$$\Pr(B \mid A) = \frac{0.05}{0.7} = 0.071. \tag{2.23}$$

If we know the conditional and marginal probabilities, can we calculate the joint probabilities?

Yes! Don’t forget this fundamental equation:

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}. \tag{2.24}$$

You can rearrange this to your heart’s content. For this book, the most important
rearrangement is:

$$\Pr(A \cap B) = \Pr(A \mid B) * \Pr(B). \tag{2.25}$$

This formula can be used to calculate joint probability, Pr(A ∩ B). Take some time to make
sure this equation sinks in and makes full sense to you.
As an aside, if the occurrence of one event does not change the probability of the
other occurring, the two events are said to be independent. This means that
Pr(A | B) = Pr(A | ∼B) = Pr(A).
So, when A and B are independent:

$$\Pr(A \cap B) = \Pr(A) * \Pr(B). \tag{2.26}$$
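
Both formulas can be tried on our lefty and Morton's toe numbers with a short Python sketch. The variable names are illustrative assumptions. The first calculation recovers the joint probability from the conditional and the marginal; the second shows what the joint probability would have been had A and B been independent.

```python
from fractions import Fraction

pr_a = Fraction(70, 100)         # marginal Pr(A), lefty
pr_b = Fraction(15, 100)         # marginal Pr(B), Morton's toe
pr_a_given_b = Fraction(5, 15)   # conditional Pr(A | B)

# Joint probability from the conditional and the marginal.
pr_a_and_b = pr_a_given_b * pr_b
print(float(pr_a_and_b))                 # 0.05

# Under independence the joint would equal the product of the marginals.
product_of_marginals = pr_a * pr_b
print(float(product_of_marginals))       # 0.105
print(pr_a_and_b == product_of_marginals)  # False: A and B are not independent here
```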

Are Pr(A|B) and Pr(B|A) related in some way?

Answer: That, dear reader, is the subject of our next chapter, where we will derive Bayes’
Theorem. See you there!

SECTION 2

Bayes’ Theorem and Bayesian Inference

Overview

Welcome to Section 2! This section introduces Bayesian inference and provides three
(hopefully) fun examples to get your feet wet.
This section consists of five chapters.

• In Chapter 3, Bayes’ Theorem is introduced. The chapter shows its derivation and
describes two ways to think about it. First, Bayes’ Theorem describes the relationship
between two inverse conditional probabilities, P(A|B) and P(B|A). Second, Bayes’ Theorem
can be used to express how a degree of belief for a given hypothesis can be updated
in light of new evidence. This chapter focuses on the first interpretation.
• Chapter 4 introduces the concept of Bayesian inference. The chapter discusses the
scientific method, and illustrates how Bayes’ Theorem can be used for scientific inference.
Bayesian Inference is the use of Bayes’ Theorem to draw conclusions about a set of
mutually exclusive and exhaustive alternative hypotheses by linking prior knowledge
about each hypothesis with new data. The result is updated probabilities for each
hypothesis of interest. The ideas of prior probabilities, likelihood, and posterior probabilities
are introduced.
• Chapter 5, the “Author Problem,” provides a concrete example of Bayesian inference.
This chapter draws on work by Frederick Mosteller and David Wallace, who used Bayesian
inference to assign authorship for unsigned Federalist Papers. The Federalist Papers were a
collection of papers known to be written during the American Revolution. However,
some papers were unsigned by the author, resulting in disputed authorship. The chapter
provides a very basic Bayesian analysis of the unsigned “Paper 54,” which was written by
Alexander Hamilton or James Madison. The example illustrates the principles of Bayesian
inference for two competing hypotheses.
• Chapter 6, the “Birthday Problem,” is intended to highlight the decisions the analyst
(you!) must make in setting the prior distribution. The “Birthday Problem” expands
consideration from two hypotheses to multiple, discrete hypotheses. In this chapter,
interest is in determining the posterior probability that a woman named Mary was born
in a given month; there are 12 alternative hypotheses. Furthermore, consideration is
given to assigning prior probabilities. The priors represent a priori probabilities that each
alternative hypothesis is correct, where a priori means “prior to data collection,” and can
be “informative” or “non-informative.” A Bayesian analysis cannot be conducted without
using a prior distribution. The concept of likelihood is explored more deeply.
• Chapter 7, the “Portrait Problem,” highlights the fact that multiple pieces of information
can be used in a Bayesian analysis. A key concept in this chapter is that multiple sources
of data can be combined in a Bayesian inference framework. The main take home point is
that Bayesian analysis can be very, very flexible. A Bayesian analysis is possible as long as
the likelihood of observing the data under each hypothesis can be computed.

By the end of this section, you will have a good understanding of how Bayes’ Theorem is
related to the scientific method.

CHAPTER 3

Bayes’ Theorem

In this chapter, we’re going to build on the content in Section 1 and derive Bayes’ Theorem.
This is what you’ve been waiting for!
By the end of this chapter, you will be able to derive Bayes’ Theorem and explain the
relationship between Pr(A | B) and Pr(B | A).
Let’s begin with a few questions.

First, who is Bayes?

Answer: Thomas Bayes (1701–1761) was an English mathematician and Presbyterian
minister, known for having formulated a specific case of the theorem that bears his
name, Bayes’ Theorem.

Is that really a picture of Thomas Bayes in Figure 3.1?

Answer: It could be, but nobody is really sure! We’ll revisit this question in a future chapter.

Figure 3.1 “Thomas Bayes” (Photocopied from Terrence O’Donnell)

Bayesian Statistics for Beginners: A Step-by-Step Approach. Therese M. Donovan and Ruth M. Mickey,
Oxford University Press (2019). © Ruth M. Mickey 2019.
DOI: 10.1093/oso/9780198841296.001.0001
Another random document with
no related content on Scribd:
Transcriber’s Note
Page 67: “wen trooping by” changed to “went trooping by”
*** END OF THE PROJECT GUTENBERG EBOOK THE COAT
WITHOUT A SEAM, AND OTHER POEMS ***

Updated editions will replace the previous one—the old editions


will be renamed.

Creating the works from print editions not protected by U.S.


copyright law means that no one owns a United States copyright
in these works, so the Foundation (and you!) can copy and
distribute it in the United States without permission and without
paying copyright royalties. Special rules, set forth in the General
Terms of Use part of this license, apply to copying and
distributing Project Gutenberg™ electronic works to protect the
PROJECT GUTENBERG™ concept and trademark. Project
Gutenberg is a registered trademark, and may not be used if
you charge for an eBook, except by following the terms of the
trademark license, including paying royalties for use of the
Project Gutenberg trademark. If you do not charge anything for
copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such
as creation of derivative works, reports, performances and
research. Project Gutenberg eBooks may be modified and
printed and given away—you may do practically ANYTHING in
the United States with eBooks not protected by U.S. copyright
law. Redistribution is subject to the trademark license, especially
commercial redistribution.

START: FULL LICENSE


THE FULL PROJECT GUTENBERG LICENSE
PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK

To protect the Project Gutenberg™ mission of promoting the


free distribution of electronic works, by using or distributing this
work (or any other work associated in any way with the phrase
“Project Gutenberg”), you agree to comply with all the terms of
the Full Project Gutenberg™ License available with this file or
online at www.gutenberg.org/license.

Section 1. General Terms of Use and


Redistributing Project Gutenberg™
electronic works
1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand,
agree to and accept all the terms of this license and intellectual
property (trademark/copyright) agreement. If you do not agree to
abide by all the terms of this agreement, you must cease using
and return or destroy all copies of Project Gutenberg™
electronic works in your possession. If you paid a fee for
obtaining a copy of or access to a Project Gutenberg™
electronic work and you do not agree to be bound by the terms
of this agreement, you may obtain a refund from the person or
entity to whom you paid the fee as set forth in paragraph 1.E.8.

1.B. “Project Gutenberg” is a registered trademark. It may only


be used on or associated in any way with an electronic work by
people who agree to be bound by the terms of this agreement.
There are a few things that you can do with most Project
Gutenberg™ electronic works even without complying with the
full terms of this agreement. See paragraph 1.C below. There
are a lot of things you can do with Project Gutenberg™
electronic works if you follow the terms of this agreement and
help preserve free future access to Project Gutenberg™
electronic works. See paragraph 1.E below.
1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright
law in the United States and you are located in the United
States, we do not claim a right to prevent you from copying,
distributing, performing, displaying or creating derivative works
based on the work as long as all references to Project
Gutenberg are removed. Of course, we hope that you will
support the Project Gutenberg™ mission of promoting free
access to electronic works by freely sharing Project
Gutenberg™ works in compliance with the terms of this
agreement for keeping the Project Gutenberg™ name
associated with the work. You can easily comply with the terms
of this agreement by keeping this work in the same format with
its attached full Project Gutenberg™ License when you share it
without charge with others.

1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside
the United States, check the laws of your country in addition to
the terms of this agreement before downloading, copying,
displaying, performing, distributing or creating derivative works
based on this work or any other Project Gutenberg™ work. The
Foundation makes no representations concerning the copyright
status of any work in any country other than the United States.

1.E. Unless you have removed all references to Project


Gutenberg:

1.E.1. The following sentence, with active links to, or other


immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project
Gutenberg™ work (any work on which the phrase “Project
Gutenberg” appears, or with which the phrase “Project
Gutenberg” is associated) is accessed, displayed, performed,
viewed, copied or distributed:

This eBook is for the use of anyone anywhere in the United


States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it
away or re-use it under the terms of the Project Gutenberg
License included with this eBook or online at
www.gutenberg.org. If you are not located in the United
States, you will have to check the laws of the country where
you are located before using this eBook.

1.E.2. If an individual Project Gutenberg™ electronic work is


derived from texts not protected by U.S. copyright law (does not
contain a notice indicating that it is posted with permission of the
copyright holder), the work can be copied and distributed to
anyone in the United States without paying any fees or charges.
If you are redistributing or providing access to a work with the
phrase “Project Gutenberg” associated with or appearing on the
work, you must comply either with the requirements of
paragraphs 1.E.1 through 1.E.7 or obtain permission for the use
of the work and the Project Gutenberg™ trademark as set forth
in paragraphs 1.E.8 or 1.E.9.

1.E.3. If an individual Project Gutenberg™ electronic work is


posted with the permission of the copyright holder, your use and
distribution must comply with both paragraphs 1.E.1 through
1.E.7 and any additional terms imposed by the copyright holder.
Additional terms will be linked to the Project Gutenberg™
License for all works posted with the permission of the copyright
holder found at the beginning of this work.

1.E.4. Do not unlink or detach or remove the full Project


Gutenberg™ License terms from this work, or any files
containing a part of this work or any other work associated with
Project Gutenberg™.
1.E.5. Do not copy, display, perform, distribute or redistribute
this electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the
Project Gutenberg™ License.

1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if
you provide access to or distribute copies of a Project
Gutenberg™ work in a format other than “Plain Vanilla ASCII” or
other format used in the official version posted on the official
Project Gutenberg™ website (www.gutenberg.org), you must, at
no additional cost, fee or expense to the user, provide a copy, a
means of exporting a copy, or a means of obtaining a copy upon
request, of the work in its original “Plain Vanilla ASCII” or other
form. Any alternate format must include the full Project
Gutenberg™ License as specified in paragraph 1.E.1.

1.E.7. Do not charge a fee for access to, viewing, displaying,
performing, copying or distributing any Project Gutenberg™
works unless you comply with paragraph 1.E.8 or 1.E.9.

1.E.8. You may charge a reasonable fee for copies of or
providing access to or distributing Project Gutenberg™
electronic works provided that:

• You pay a royalty fee of 20% of the gross profits you derive from
the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”

• You provide a full refund of any money paid by a user who
notifies you in writing (or by e-mail) within 30 days of receipt that
s/he does not agree to the terms of the full Project Gutenberg™
License. You must require such a user to return or destroy all
copies of the works possessed in a physical medium and
discontinue all use of and all access to other copies of Project
Gutenberg™ works.

• You provide, in accordance with paragraph 1.F.3, a full refund of
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.

• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.

1.E.9. If you wish to charge a fee or distribute a Project
Gutenberg™ electronic work or group of works on different
terms than are set forth in this agreement, you must obtain
permission in writing from the Project Gutenberg Literary
Archive Foundation, the manager of the Project Gutenberg™
trademark. Contact the Foundation as set forth in Section 3
below.

1.F.

1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on,
transcribe and proofread works not protected by U.S. copyright
law in creating the Project Gutenberg™ collection. Despite
these efforts, Project Gutenberg™ electronic works, and the
medium on which they may be stored, may contain “Defects,”
such as, but not limited to, incomplete, inaccurate or corrupt
data, transcription errors, a copyright or other intellectual
property infringement, a defective or damaged disk or other
medium, a computer virus, or computer codes that damage or
cannot be read by your equipment.

1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES -
Except for the “Right of Replacement or Refund” described in
paragraph 1.F.3, the Project Gutenberg Literary Archive
Foundation, the owner of the Project Gutenberg™ trademark,
and any other party distributing a Project Gutenberg™ electronic
work under this agreement, disclaim all liability to you for
damages, costs and expenses, including legal fees. YOU
AGREE THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE,
STRICT LIABILITY, BREACH OF WARRANTY OR BREACH
OF CONTRACT EXCEPT THOSE PROVIDED IN PARAGRAPH
1.F.3. YOU AGREE THAT THE FOUNDATION, THE
TRADEMARK OWNER, AND ANY DISTRIBUTOR UNDER
THIS AGREEMENT WILL NOT BE LIABLE TO YOU FOR
ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE
OR INCIDENTAL DAMAGES EVEN IF YOU GIVE NOTICE OF
THE POSSIBILITY OF SUCH DAMAGE.

1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If
you discover a defect in this electronic work within 90 days of
receiving it, you can receive a refund of the money (if any) you
paid for it by sending a written explanation to the person you
received the work from. If you received the work on a physical
medium, you must return the medium with your written
explanation. The person or entity that provided you with the
defective work may elect to provide a replacement copy in lieu
of a refund. If you received the work electronically, the person or
entity providing it to you may choose to give you a second
opportunity to receive the work electronically in lieu of a refund.
If the second copy is also defective, you may demand a refund
in writing without further opportunities to fix the problem.

1.F.4. Except for the limited right of replacement or refund set
forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’,
WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR
ANY PURPOSE.

1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of
damages. If any disclaimer or limitation set forth in this
agreement violates the law of the state applicable to this
agreement, the agreement shall be interpreted to make the
maximum disclaimer or limitation permitted by the applicable
state law. The invalidity or unenforceability of any provision of
this agreement shall not void the remaining provisions.

1.F.6. INDEMNITY - You agree to indemnify and hold the
Foundation, the trademark owner, any agent or employee of the
Foundation, anyone providing copies of Project Gutenberg™
electronic works in accordance with this agreement, and any
volunteers associated with the production, promotion and
distribution of Project Gutenberg™ electronic works, harmless
from all liability, costs and expenses, including legal fees, that
arise directly or indirectly from any of the following which you do
or cause to occur: (a) distribution of this or any Project
Gutenberg™ work, (b) alteration, modification, or additions or
deletions to any Project Gutenberg™ work, and (c) any Defect
you cause.

Section 2. Information about the Mission of Project Gutenberg™

Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new
computers. It exists because of the efforts of hundreds of
volunteers and donations from people in all walks of life.

Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project
Gutenberg™’s goals and ensuring that the Project Gutenberg™
collection will remain freely available for generations to come. In
2001, the Project Gutenberg Literary Archive Foundation was
created to provide a secure and permanent future for Project
Gutenberg™ and future generations. To learn more about the
Project Gutenberg Literary Archive Foundation and how your
efforts and donations can help, see Sections 3 and 4 and the
Foundation information page at www.gutenberg.org.

Section 3. Information about the Project Gutenberg Literary Archive Foundation

The Project Gutenberg Literary Archive Foundation is a non-profit
501(c)(3) educational corporation organized under the
laws of the state of Mississippi and granted tax exempt status by
the Internal Revenue Service. The Foundation’s EIN or federal
tax identification number is 64-6221541. Contributions to the
Project Gutenberg Literary Archive Foundation are tax
deductible to the full extent permitted by U.S. federal laws and
your state’s laws.

The Foundation’s business office is located at 809 North 1500
West, Salt Lake City, UT 84116, (801) 596-1887. Email contact
links and up to date contact information can be found at the
Foundation’s website and official page at
www.gutenberg.org/contact

Section 4. Information about Donations to the Project Gutenberg Literary Archive Foundation

Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission
of increasing the number of public domain and licensed works
that can be freely distributed in machine-readable form
accessible by the widest array of equipment including outdated
equipment. Many small donations ($1 to $5,000) are particularly
important to maintaining tax exempt status with the IRS.

The Foundation is committed to complying with the laws
regulating charities and charitable donations in all 50 states of
the United States. Compliance requirements are not uniform
and it takes a considerable effort, much paperwork and many
fees to meet and keep up with these requirements. We do not
solicit donations in locations where we have not received written
confirmation of compliance. To SEND DONATIONS or
determine the status of compliance for any particular state visit
www.gutenberg.org/donate.

While we cannot and do not solicit contributions from states
where we have not met the solicitation requirements, we know
of no prohibition against accepting unsolicited donations from
donors in such states who approach us with offers to donate.

International donations are gratefully accepted, but we cannot
make any statements concerning tax treatment of donations
received from outside the United States. U.S. laws alone swamp
our small staff.

Please check the Project Gutenberg web pages for current
donation methods and addresses. Donations are accepted in a
number of other ways including checks, online payments and
credit card donations. To donate, please visit:
www.gutenberg.org/donate.

Section 5. General Information About Project Gutenberg™ electronic works

Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could
be freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose
network of volunteer support.

Project Gutenberg™ eBooks are often created from several
printed editions, all of which are confirmed as not protected by
copyright in the U.S. unless a copyright notice is included. Thus,
we do not necessarily keep eBooks in compliance with any
particular paper edition.

Most people start at our website which has the main PG search
facility: www.gutenberg.org.

This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg
Literary Archive Foundation, how to help produce our new
eBooks, and how to subscribe to our email newsletter to hear
about new eBooks.
