Notebook
↪aec88b30af87fad8d45da7e774223f91dad09e88/lh_data.csv"
lefthanded_data = pd.read_csv(data_url_1)
ax.plot('Age', 'Female', data = lefthanded_data, marker = 'o') # plot "Female" vs. "Age"
[128]: Text(0,0.5,'Age')
fig, ax = plt.subplots()
ax.plot('Birth_year', 'Mean_lh', data = lefthanded_data) # plot 'Mean_lh' vs. 'Birth_year'
[130]: Text(0,0.5,'Mean_lh')
$$P(A \mid LH) = \frac{P(LH \mid A)\,P(A)}{P(LH)}$$
P(LH | A) is the probability that you are left-handed given that you died at age A. P(A) is the
overall probability of dying at age A, and P(LH) is the overall probability of being left-handed. We
will now calculate each of these three quantities, beginning with P(LH | A).
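To make the roles of the three quantities concrete, here is a minimal sketch that plugs single numbers into Bayes' rule. The function name `bayes_posterior` and all three values are hypothetical and chosen only for illustration; they are not taken from the data sets used in this notebook.

```python
def bayes_posterior(p_lh_given_a, p_a, p_lh):
    """Return P(A | LH) = P(LH | A) * P(A) / P(LH)."""
    return p_lh_given_a * p_a / p_lh

# Hypothetical values, for illustration only
p_lh_given_a = 0.06  # P(LH | A): chance of being left-handed given death at age A
p_a = 0.02           # P(A): overall chance of dying at age A
p_lh = 0.08          # P(LH): overall chance of being left-handed

posterior = bayes_posterior(p_lh_given_a, p_a, p_lh)
print(posterior)
```

In this toy case P(LH | A) is lower than P(LH), so conditioning on left-handedness makes age A less likely than its baseline P(A).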
To calculate P(LH | A) for ages that might fall outside the original data, we will need to extrapolate
the data to earlier and later years. Since the rates flatten out in the early 1900s and late 1900s,
we’ll use a few points at each end and take the mean to extrapolate the rates on each end. The
number of points used for this is arbitrary, but we’ll pick 10 since the data looks flat-ish until about
1910.
[132]: # import library
# ... YOUR CODE FOR TASK 3 ...
import numpy as np

# create a function for P(LH | A)
def P_lh_given_A(ages_of_death, study_year = 1990):
    """ P(Left-handed | ages of death), calculated based on the reported rates of left-handedness. """
    # Use the mean of the 10 last and 10 first points for left-handedness rates before and after the start
    early_1900s_rate = lefthanded_data['Mean_lh'][-10:].mean()
    late_1900s_rate = lefthanded_data['Mean_lh'][:10].mean()
    middle_rates = lefthanded_data.loc[lefthanded_data['Birth_year'].isin(study_year - ages_of_death)]['Mean_lh']
    # ...
    return P_return
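One possible way to combine the in-range rates with the flat estimates on each side is sketched below. This is not the notebook's own implementation: the function name `p_lh_given_age_sketch`, the toy dataframe, and the choice to take the range limits from the data's minimum and maximum birth year are all assumptions. It also assumes that `Mean_lh` is stored as a percentage and that rows run from the most recent birth year to the earliest, as the slicing above suggests.

```python
import numpy as np
import pandas as pd

# Toy stand-in for lefthanded_data (made-up values, Mean_lh in percent),
# ordered from the most recent birth year to the earliest.
toy = pd.DataFrame({
    "Birth_year": np.arange(1980, 1899, -1),
    "Mean_lh": np.linspace(13.0, 4.0, 81),
})

def p_lh_given_age_sketch(ages_of_death, data, study_year=1990, n_points=10):
    """Look up the left-handedness rate by birth year, falling back to a
    flat mean of n_points values outside the range covered by data."""
    ages = np.asarray(ages_of_death)
    birth_years = study_year - ages

    early_rate = data["Mean_lh"].iloc[-n_points:].mean()  # earliest birth years
    late_rate = data["Mean_lh"].iloc[:n_points].mean()    # most recent birth years

    # Map each birth year to its rate; years missing from the data become NaN
    rates = data.set_index("Birth_year")["Mean_lh"].reindex(birth_years).to_numpy()

    # Replace out-of-range entries with the flat estimate for that side
    rates = np.where(birth_years < data["Birth_year"].min(), early_rate, rates)
    rates = np.where(birth_years > data["Birth_year"].max(), late_rate, rates)
    return rates / 100  # percent -> probability

probs = p_lh_given_age_sketch(np.array([5, 40, 95]), toy)
print(probs)
```

With a study year of 1990, the three ages correspond to birth years 1985 (after the toy data ends), 1950 (inside it), and 1895 (before it starts), so the sketch exercises all three branches.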
[134]: # Death distribution data for the United States in 1999
data_url_2 = "https://gist.githubusercontent.com/mbonsma/2f4076aab6820ca1807f4e29f75f18ec/raw/62f3ec07514c7e31f5979beeca86f19991540796/cdc_vs00199_table310.tsv"
ax.set_xlabel('Age')
ax.set_ylabel('Both Sexes')
0.5 5. The overall probability of left-handedness
In the previous code block we loaded data to give us P(A), and now we need P(LH). P(LH) is
the probability that a person who died in our particular study year is left-handed, assuming we
know nothing else about them. This is the average left-handedness in the population of deceased
people. We can calculate it by summing the left-handedness probabilities for each age, weighted
by the number of people who died at that age, and then dividing by the total number of deceased
people to get a probability. In equation form, this is what we’re calculating, where N(A) is the
number of people who died at age A (given by the dataframe death_distribution_data):
$$P(LH) = \frac{\sum_A P(LH \mid A)\,N(A)}{\sum_A N(A)}$$
[136]: def P_lh(death_distribution_data, study_year = 1990): # sum over P_lh for each age group
    """ Overall probability of being left-handed if you died in the study year
    Input: dataframe of death distribution data, study year
    Output: P(LH), a single floating point number """
    # multiply the number of deaths at each age by P(LH | A) for that age
    p_list = death_distribution_data['Both Sexes'] * P_lh_given_A(death_distribution_data['Age'], study_year)
    p = np.sum(p_list) # weighted sum over all ages
    return p / np.sum(death_distribution_data['Both Sexes']) # normalize by the total number of deaths
print(P_lh(death_distribution_data))
0.07766387615350638
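The weighted average behind this number can be checked by hand on a tiny example. The sketch below uses three hypothetical ages with made-up death counts and made-up values of P(LH | A); none of these numbers come from the CDC table or the left-handedness survey.

```python
import numpy as np

# Hypothetical inputs for three ages
n_deaths = np.array([100.0, 300.0, 600.0])   # N(A): deaths at each age
p_lh_given_a = np.array([0.12, 0.10, 0.06])  # P(LH | A) for each age

# Weighted sum of P(LH | A) over ages, divided by the total number of deaths
p_lh_overall = np.sum(p_lh_given_a * n_deaths) / np.sum(n_deaths)

# np.average with weights computes the same quantity
p_lh_check = np.average(p_lh_given_a, weights=n_deaths)
print(p_lh_overall, p_lh_check)
```

Because most deaths in the toy data fall at the age with the lowest left-handedness rate, the overall value sits closer to 0.06 than to 0.12.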
$$P(A \mid LH) = \frac{P(LH \mid A)\,P(A)}{P(LH)}$$
    P_lh_A = P_lh_given_A(ages_of_death, study_year) # use P_lh_given_A to get probability of left-handedness for a certain age
    return P_lh_A*P_A/P_left
0.9 9. Moment of truth: age of left and right-handers at death
Finally, let’s compare our results with the original study that found that left-handed people were
nine years younger at death on average. We can do this by calculating the mean of these probability
distributions in the same way we calculated P(LH) earlier, weighting the probability distribution
by age and summing over the result.
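The mean of a discrete probability distribution over ages is the sum of each age multiplied by its probability. The sketch below illustrates this with two made-up, bell-shaped distributions standing in for P(A | LH) and P(A | RH). The age range, the shapes, and the variable names are assumptions for illustration, so the printed difference says nothing about the real data.

```python
import numpy as np

ages = np.arange(6, 115)  # hypothetical range of ages at death

# Made-up, unnormalized weights standing in for P(A | LH) and P(A | RH)
lh_weights = np.exp(-((ages - 68) / 15.0) ** 2)
rh_weights = np.exp(-((ages - 72) / 15.0) ** 2)

# Normalize so that each distribution sums to 1
p_a_given_lh = lh_weights / lh_weights.sum()
p_a_given_rh = rh_weights / rh_weights.sum()

# Mean age = sum over ages of (age * probability of that age)
mean_lh_age = np.nansum(ages * p_a_given_lh)
mean_rh_age = np.nansum(ages * p_a_given_rh)
print(round(mean_rh_age - mean_lh_age, 1))
```

`np.nansum` is used so that any missing probabilities would be skipped rather than turning the whole sum into NaN.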
print("The difference in average ages is " + str(round(average_rh_age - average_lh_age, 1)) + " years.")
The difference in average ages is 2.3 years.