Using BioInteractive Resources to Teach Mathematics and Statistics in Biology
Paul Strode, PhD
Fairview High School
Boulder, Colorado
Ann Brokaw
Rocky River High School
Rocky River, Ohio
Version: October 2015
This guide is not meant to be a textbook on statistics; it only covers topics most relevant to high school biology,
focusing on methods and examples rather than theory. It is organized in four parts:
• Part 1 covers descriptive statistics, methods used to organize, summarize, and describe quantifiable
data. The methods include ways to describe the typical or average value of the data and the spread of
the data.
• Part 2 covers statistical methods used to draw inferences about populations on the basis of
observations made on smaller samples or groups of the population—a branch of statistics known as
inferential statistics.
• Part 3 describes other mathematical methods commonly taught in high school biology, including
frequency and rate calculations, Hardy-Weinberg calculations, probability, and standard curves.
• Part 4 provides a chart of activities on the BioInteractive website that use math and statistics methods.
A first draft of the guide was published in July 2014. It has been revised based on user feedback and expert
review, and this version was published in October 2015. The guide will continue to be updated with new
content and based on ongoing feedback and review.
For a more comprehensive discussion of statistical methods and additional classroom examples, refer to
John McDonald’s Handbook of Biological Statistics, http://www.biostathandbook.com, and the College Board’s
AP Biology Quantitative Skills: A Guide for Teachers,
http://apcentral.collegeboard.com/apc/public/repository/AP_Bio_Quantitative_Skills_Guide-2012.pdf.
Listed below are the universal statistical symbols and equations used in this guide. The calculations can all be
done using scientific calculators or the formula function in spreadsheet programs.
N: Total number of individuals in a population (e.g., the total number of butterflies of a particular species)
n: Total number of individuals in a sample of a population (e.g., the number of butterflies in a net)
df: Degrees of freedom; the number of measurements in a sample that are free to vary once the sample mean has been calculated; in a single sample, df = n − 1
Σ: Summation
x̄: Sample mean; x̄ = Σxᵢ / n
s²: Sample variance; s² = Σ(xᵢ − x̄)² / (n − 1)
95% CI: 95% confidence interval; 95% CI ≈ 2s / √n (an approximation of 1.96·s/√n)
t-test: t_obs = |x̄₁ − x̄₂| / √( s₁²/n₁ + s₂²/n₂ )
Chi-square test (χ²): χ² = Σ (o − e)² / e
Correlation coefficient (linear regression test): r = Σ[ ((xᵢ − x̄)/s_x) · ((yᵢ − ȳ)/s_y) ] / (n − 1)
Table 1. Beak Depth Measurements in a Sample of Medium Ground Finches from Daphne Major
Note: “Band” refers to an individual’s identity—more specifically, the number on the metal leg band it
was given. Fifty individuals died in 1977, the year of the drought (nonsurvivors), and 50 survived
beyond 1977 (survivors).
One of the first steps in analyzing a small data set like the one shown in Table 1 is to graph the data and
examine the distribution. Figure 1 shows two graphs of beak measurements. The graph on the top shows beak
measurements of finches that died during the drought. The graph on the bottom shows beak measurements of
finches that survived the drought.
Figure 1. Distributions of Beak Depth Measurements in Two Groups of Medium Ground Finches
Notice that the measurements tend to be more or less symmetrically distributed across a range, with most
measurements around the center of the distribution. This is a characteristic of a normal distribution. Most
statistical methods covered in this guide apply to data that are normally distributed, like the beak
measurements above; other types of distributions require either different kinds of statistics or transforming
data to make them normally distributed.
How would you describe these two graphs? How are they the same or different? Descriptive statistics allows
you to describe and quantify these differences. The rest of Part 1 of this guide provides step-by-step
instructions for calculating mean, standard deviation, standard error, and other descriptive statistics.
Mean
You calculate the sample mean (also referred to as the average or arithmetic mean) by summing all the data
points in a data set (Σxᵢ) and then dividing this number by the total number of data points (n):

x̄ = Σxᵢ / n

What we ultimately want to know is the mean of the entire population, which is represented by µ. We use the
sample mean, represented by x̄, as an estimate of µ.
Application in Biology
Students in a biology class planted eight bean seeds in separate plastic cups and placed them under a bank of
fluorescent lights. Fourteen days later, the students measured the height of the bean plants that grew from
those seeds and recorded their results in Table 2.
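The arithmetic can be sketched in a few lines of Python. Table 2's measurements are not reproduced in this excerpt, so the heights below are hypothetical stand-ins:

```python
# Hypothetical 14-day bean plant heights (mm) for eight seedlings;
# substitute the actual Table 2 values when working the example.
heights = [110, 95, 102, 118, 84, 107, 99, 113]

n = len(heights)             # number of data points, n
mean = sum(heights) / n      # sample mean, x-bar = (sum of x_i) / n
print(round(mean, 1))
```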
Median
When the data are ordered from the largest to the smallest, the median is the midpoint of the data. Unlike the
mean, it is not distorted by extreme values or by a distribution that is not normal. For this reason, the median
may be the more useful descriptive statistic for a sample of data in which some of the measurements are
extremely large or extremely small.
Application in Biology
A researcher studying mouse behavior recorded in Table 3 the time (in seconds) it took 13 different mice to
locate food in a maze.
Mode
The mode is another measure of the average. It is the value that appears most often in a sample of data. In the
example shown in Table 3, the mode is 33 seconds.
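Python's standard library computes all three measures of central tendency directly. The maze times below are hypothetical (Table 3 is not reproduced in this excerpt), but they are chosen so the mode matches the value quoted above, and they include one slow outlier to show why the median resists distortion:

```python
import statistics

# Hypothetical times (seconds) for 13 mice to locate food in a maze
times = [18, 21, 22, 25, 28, 30, 31, 33, 33, 33, 45, 57, 120]

print(statistics.median(times))          # middle value; unaffected by the 120 s outlier
print(statistics.mode(times))            # most frequent value
print(round(statistics.mean(times), 1))  # pulled upward by the outlier
```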
The mode is not typically used as a measure of central tendency in biological research, but it can be useful in
describing some distributions. For example, Figure 2 shows a distribution of body lengths with two peaks, or
modes—called a bimodal distribution. Describing these data with a measure of central tendency like the mean
or median would obscure this fact.
Figure 2. Graph of Body Lengths of Weaver Ant Workers (Reproduced from
http://en.wikipedia.org/wiki/File:BimodalAnts.png.)
Range
The simplest measure of variability in a sample of normally distributed data is the range, which is the
difference between the largest and smallest values in a set of data.
Application in Biology
Students in a biology class measured the width in centimeters of eight leaves from eight different maple trees
and recorded their results in Table 4.
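The range takes one line to compute. Table 4's widths are not reproduced in this excerpt, so the values below are hypothetical:

```python
# Hypothetical maple leaf widths (cm) for eight leaves
widths = [7.5, 10.1, 8.3, 9.8, 5.7, 10.3, 9.2, 8.6]

value_range = max(widths) - min(widths)  # largest value minus smallest value
print(round(value_range, 1))
```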
s = √( Σ(xᵢ − x̄)² / (n − 1) )
Calculation Steps
1. Calculate the mean (x̄) of the sample.
2. Find the difference between each measurement (xᵢ) in the data set and the mean (x̄) of the entire set:
(xᵢ − x̄)
3. Square each difference to remove any negative values:
(xᵢ − x̄)²
4. Add up (sum, Σ) all the squared differences:
Σ(xᵢ − x̄)²
5. Divide by the degrees of freedom (df), which is 1 less than the sample size (n − 1):
Σ(xᵢ − x̄)² / (n − 1)
Note that the number calculated at this step provides a statistic called variance (s2). Variance is a measure of
variability that is used in many statistical methods. It is the square of the standard deviation.
6. Take the square root to calculate the standard deviation (s) for the sample.
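The six steps above can be sketched as one small Python function (a from-scratch version; the standard library's statistics.stdev gives the same result):

```python
import math

def sample_sd(data):
    """Sample standard deviation, following the six steps above."""
    n = len(data)
    mean = sum(data) / n                             # step 1: the mean
    squared_diffs = [(x - mean) ** 2 for x in data]  # steps 2-3: squared differences
    variance = sum(squared_diffs) / (n - 1)          # steps 4-5: s^2, the variance
    return math.sqrt(variance)                       # step 6: s

print(round(sample_sd([5, 7, 8, 9, 11]), 2))
```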
Application in Biology
You are interested in knowing how tall bean plants (Phaseolus vulgaris) grow in two weeks after planting. You
plant a sample of 20 seeds (n = 20) in separate pots and give them equal amounts of water and light. After two
weeks, 17 of the seeds have germinated and have grown into small seedlings (now n = 17). You measure each
plant from the tips of the roots to the top of the tallest stem. You record the measurements in Table 5, along
with the steps for calculating the standard deviation.
The mean height of the bean plants in this sample is 103 millimeters ±11.7 millimeters. What does this tell us?
In a data set with a large number of measurements that are normally distributed, 68.3% (or roughly two-
thirds) of the measurements are expected to fall within 1 standard deviation of the mean and 95.4% of all
data points lie within 2 standard deviations of the mean on either side (Figure 3). Thus, in this example, if you
assume that this sample of 17 observations is drawn from a population of measurements that are normally
distributed, 68.3% of the measurements in the population should fall between 91.3 and 114.7 millimeters and
95.4% of the measurements should fall between 79.6 millimeters and 126.4 millimeters. (Note: you might get
slightly different values if you use a spreadsheet to
–2 SD +2 SD
make the calculations because in this example the
–1 SD +1 SD mean and standard deviation were rounded.)
Number of Plants
Mean
Figure 3. Theoretical Distribution of Plant Heights.
13.6% 34.1% 34.1% 13.6%
For normally distributed data, 68.3% of data points
lie between ±1 standard deviation of the mean and
Plant Height (mm) 95.4% of data points lie between ±2 standard
deviations of the mean.
We can graph the mean and the standard deviation of this sample of bean plants using a bar graph with error
bars (Figure 4). Standard deviation bars summarize the variation in the data—the more spread out the
individual measurements are, the larger the standard deviation. On the other hand, error bars based on the
standard error of the mean or the 95% confidence interval reveal the uncertainty in the sample mean. They
depend on how spread out the measurements are and on the sample size. (These statistics are discussed
further in “Measures of Confidence: Standard Error of the Mean and 95% Confidence Interval”.)
Figure 4. Mean Plant Height of a Sample of Bean Plants and an
Error Bar Representing ±1 Standard Deviation. Roughly two-thirds
of the measurements in this population would be expected to fall
in the range indicated by the bar.
A common misconception is that standard deviation decreases with increasing sample size. As you increase
the sample size, standard deviation can either increase or decrease depending on the measurements in the
sample. However, with a larger sample size, standard deviation will become a more accurate estimate of the
standard deviation of the population.
To illustrate what this number means, consider the following example. Biologists are interested in the variation
in leg sizes among grasshoppers. They catch five grasshoppers (n = 5) in a net and prepare to measure the left
legs. As the scientists pull grasshoppers one at a time from the net, they have no way of knowing the leg
lengths until they measure them all. In other words, all five leg lengths are “free” to vary within some general
range for this particular species. The scientists measure all five leg lengths and then calculate the mean to be
x̄ = 10 millimeters. They then place the grasshoppers back in the net and decide to pull them out one at a time
to measure them again. This time, since the biologists already know the mean to be 10, only the first four
measurements are free to vary within a given range. If the first four measurements are 8, 9, 10, and 12
millimeters, then there is no freedom for the fifth measurement to vary; it has to be 11. Thus, once they know
the sample mean, the number of degrees of freedom is 1 less than the sample size, df = 4.
The sample mean is not necessarily identical to the mean of the entire population. In fact, every time you take
a sample and calculate a sample mean, you would expect a slightly different value. In other words, the sample
means themselves have variability. This variability can be expressed by calculating the standard error of the
mean (abbreviated as SEx̄ or SEM).
To illustrate this point, assume that there is a population of a species of anole lizards living on an island of the
Caribbean. If you were able to measure the length of the hind limbs of every individual in this population and
then calculate the mean, you would know the value of the population mean. However, there are thousands of
individuals, so you take a sample of 10 anoles and calculate the mean hind limb length for that sample.
Another researcher working on that island might catch another sample of 10 anoles and calculate the mean
hind limb length for this sample, and so on. The sample means of many different samples would be normally
distributed. The standard error of the mean represents the standard deviation of such a distribution and
estimates how close the sample mean is to the population mean.
The greater each sample size (i.e., 50 rather than 10 anoles), the more closely the sample mean will estimate
the population mean, and therefore the standard error of the mean becomes smaller.
To calculate SEx̄ (or SEM), divide the standard deviation by the square root of the sample size:

s = √( Σ(xᵢ − x̄)² / (n − 1) )

SEx̄ = s / √n
What the standard error of the mean tells you is that about two-thirds (68.3%) of the sample means would
be within ±1 standard error of the population mean and 95.4% would be within ±2 standard errors.
Another, more precise measure of the uncertainty in the mean is the 95% confidence interval (95% CI). For
large sample sizes, the 95% CI can be calculated using the formula 1.96·s/√n, in which 1.96 is typically rounded
to 2 for ease of calculation. In other words, the 95% CI is about twice the standard error of the mean.
The actual formula for calculating 95% CI uses a statistic called the t-value for a significance level of 0.05, which
is explained in Table 8 in Part 2. For large sample sizes, this t-value is 1.96. Since t-values are not typically
covered in high school biology, in this guide we estimate the 95% CI by using 2 x SEM, but note that this is just
an approximation.
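Both measures of confidence follow directly from the standard deviation and sample size; a minimal sketch, using the bean plant numbers from Table 5 (s = 11.7 mm, n = 17):

```python
import math

def sem(s, n):
    """Standard error of the mean: SEM = s / sqrt(n)."""
    return s / math.sqrt(n)

def ci95(s, n):
    """The approximation used in this guide: 95% CI = 2 * SEM
    (the large-sample t-value 1.96 rounded to 2)."""
    return 2 * sem(s, n)

print(round(sem(11.7, 17), 2))   # uncertainty in the sample mean
print(round(ci95(11.7, 17), 2))
```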
Note about Error Bars: Many bar graphs include error bars, which may represent standard deviation, SEM, or
95% CI. When the bars represent SEM, you know that if you took many samples, only about two-thirds of the
error bars would include the population mean. This is very different from standard deviation bars, which show
how much variation there is among individual observations in a sample. When the error bars represent 95%
confidence intervals in a graph, you know that in about 95% of cases the error bars would include the
population mean.
Application in Biology—Example 1
Seeds of many weed species germinate best in recently disturbed soil that lacks a light-blocking canopy of
vegetation. Students in a biology class hypothesized that weed seeds germinate best when exposed to light. To
test this hypothesis, the students placed a seed from crofton weed (Ageratina adenophora, an invasive species
on several continents) in each of 20 petri dishes and covered the seeds with distilled water. They placed half
the petri dishes in the dark and half in the light. After one week, the students measured the combined lengths
in millimeters of the radicles and shoots extending from the seeds in each dish. Table 6 shows the data and
calculations of variance, standard deviation, standard error of the mean, and 95% confidence interval. The
students plotted the data as two bar graphs showing the standard error of the mean and 95% confidence
interval (Figure 5).
Table 6. Combined Lengths of Crofton Weed Radicles and Shoots after One Week in the Dark and the Light
Petri Dishes   Dark, x (mm)   Light, x (mm)   Dark, (xᵢ − x̄₁)² (mm²)   Light, (xᵢ − x̄₂)² (mm²)
1 and 2        12             18              (12 − 9.6)² = 5.76        (18 − 18.4)² = 0.16
3 and 4        8              22              (8 − 9.6)² = 2.56         (22 − 18.4)² = 12.96
5 and 6        15             17              (15 − 9.6)² = 29.16       (17 − 18.4)² = 1.96
7 and 8        13             23              (13 − 9.6)² = 11.56       (23 − 18.4)² = 21.16
9 and 10       6              16              (6 − 9.6)² = 12.96        (16 − 18.4)² = 5.76
11 and 12      4              18              (4 − 9.6)² = 31.36        (18 − 18.4)² = 0.16
13 and 14      13             22              (13 − 9.6)² = 11.56       (22 − 18.4)² = 12.96
15 and 16      14             12              (14 − 9.6)² = 19.36       (12 − 18.4)² = 40.96
17 and 18      5              19              (5 − 9.6)² = 21.16        (19 − 18.4)² = 0.36
19 and 20      6              17              (6 − 9.6)² = 12.96        (17 − 18.4)² = 1.96

Mean (x̄)      x̄₁ = 9.6 mm (10)   x̄₂ = 18.4 mm (18)   Σ(xᵢ − x̄₁)² = 158.4   Σ(xᵢ − x̄₂)² = 98.4
Variance, s² = Σ(xᵢ − x̄)²/(n − 1)                     158.4/9 = 17.6         98.4/9 = 10.93
Standard Deviation, s = √s²                            s = 4.20 mm            s = 3.31 mm
Standard Error of the Mean, SEx̄ = s/√n                SEx̄ = 4.20/√10 = 1.33  SEx̄ = 3.31/√10 = 1.05
95% CI = 2·SEx̄                                        95% CI = 2(4.20)/√10 = 2.7   95% CI = 2(3.31)/√10 = 2.1
Note: The number of replicates (i.e., sample size, n) = 10. Means in parentheses, that is, (10) and (18),
are to the nearest millimeter.
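Every statistic in Table 6 can be reproduced with a short Python sketch; the two lists below are the table's Dark and Light columns:

```python
import math

dark = [12, 8, 15, 13, 6, 4, 13, 14, 5, 6]        # combined lengths (mm), dark treatment
light = [18, 22, 17, 23, 16, 18, 22, 12, 19, 17]  # combined lengths (mm), light treatment

def describe(data):
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)  # sum of squared differences
    s = math.sqrt(ss / (n - 1))              # sample standard deviation
    se = s / math.sqrt(n)                    # standard error of the mean
    return mean, ss, s, se, 2 * se           # 95% CI approximated as 2 * SEM

for label, data in (("dark", dark), ("light", light)):
    mean, ss, s, se, ci = describe(data)
    print(label, mean, round(s, 2), round(se, 2), round(ci, 1))
```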
Figure 5. Mean Length of Crofton Seedlings after One Week in the Dark or in the Light. The standard error of the mean
graph shows the SEM as error bars, and the 95% confidence interval graph shows the 95% CI as error bars. (Note that in
these calculations we approximated 95% CI as about twice the SEM.)
The calculations in Table 6 show that although the students don’t know the actual mean combined radicle and
shoot length of the entire population of crofton plants grown in the dark, it is likely to be a number around the
sample mean of 9.6 millimeters ±1.33 millimeters (the SEM). For the light treatment, it is likely to be around
18.4 millimeters ±1.05 millimeters. The students can be even more confident that the population mean lies
within 9.6 millimeters ±2.7 millimeters for the dark treatment and within 18.4 millimeters ±2.1 millimeters
for the light treatment (the 95% CI).
Note: By looking at the bar graphs, you can see that the means for the light and dark treatments are different.
Because the 95% confidence interval error bars do not overlap, this suggests that the true population means
are also different. However, in order to determine whether this difference is significant, you will need to
conduct another statistical test, the Student’s t-test, which is covered in ”Comparing Averages” in Part 2 of this
guide.
Application in Biology—Example 2
A teacher had five students write their names on the board, first with their dominant hands and then with
their nondominant hands. The rest of the class observed that the students wrote more slowly and with less
precision with the nondominant hand than with the dominant hand. The teacher then asked the class to
explain their observations by developing testable hypotheses. They hypothesized that the dominant hand was
better at performing fine motor movements than the nondominant hand. The class tested this hypothesis by
timing (in seconds) how long it took each student to break 20 toothpicks with each hand. The results of the
experiment and the calculations of variance, standard deviation, standard error of the mean, and 95%
confidence interval are presented in Table 7. The students then illustrated the data and uncertainty with two
bar graphs, one showing the standard error of the mean and the other showing the 95% confidence interval
(Figure 6).
95% CI = 2s/√n: for the two groups, the 95% CIs are 3.5 seconds and 5.1 seconds (n = 14).
Figure 6. Mean Number of Seconds for Students to Break 20 Toothpicks with Their Nondominant Hands (ND) and
Dominant Hands (D). The standard error of the mean graph shows the SEM as error bars, and the 95% confidence interval
graph shows the 95% CI as error bars.
This ends the part on descriptive statistics. Going back to the finch data set in Table 1 and Figure 1 of Part 1,
how would you calculate the sample means for beak sizes of the survivors and nonsurvivors? Is there more
variability among survivors or nonsurvivors? What is the uncertainty in your sample mean estimates? To
find the answers to these questions, see the “Evolution in Action: Data Analysis” activities at
http://www.hhmi.org/biointeractive/evolution-action-data-analysis.
Statistical tests evaluate statistical hypotheses. The statistical null hypothesis (symbolized by H0 and
pronounced H-naught) is a statement that you want to test. For example, if you grow 10 plants with nitrogen
and 10 without, the null hypothesis is that there is no difference in the mean heights of the two groups and
that any observed difference between the two groups occurred purely by chance. The alternative
hypothesis to H0 is symbolized by H1 and usually simply states that there is a difference between the
populations.
The statistical null and alternative hypotheses are statements about the data that should follow from the
experimental hypothesis.
The significance level is the probability of getting a test statistic rare enough that you are comfortable
rejecting the null hypothesis (H0). (See the “Probability” section of Part 3 for further discussion of probability.)
The widely accepted significance level in biology is 0.05. If the probability (p) value is less than 0.05, you reject
the null hypothesis; if p is greater than or equal to 0.05, you don’t reject the null hypothesis.
The t-test assesses the probability of getting a result at least as different as the observed result (i.e., the values
you calculated for the means shown in Figure 1) if the null statistical hypothesis (H0) is true. Typically, the null
statistical hypothesis in a t-test is that the mean of the population from which sample 1 came (i.e., the mean
beak size of survivors) is equal to the mean of the population from which sample 2 came (i.e., the mean beak
size of the nonsurvivors), or µ₁ = µ₂. Rejecting H0 supports the alternative hypothesis, H1, that the means are
significantly different (µ₁ ≠ µ₂). In the finch example, the t-test determines whether any observed difference
between the means of the two groups of finches (9.67 millimeters versus 9.11 millimeters) is statistically
significant or has likely occurred simply by chance.
A t-test calculates a single statistic, t, or t_obs, which is compared to a critical t-statistic (t_crit):

t_obs = |x̄₁ − x̄₂| / SE

To calculate the standard error (SE) specific to the t-test, we calculate the sample mean and the variance (s²)
of each of the two samples being compared—the sample size (n) of each sample must also be known:

SE = √( s₁²/n₁ + s₂²/n₂ )
Calculation Steps
1. Calculate the mean of each sample population and subtract one from the other. Take the absolute
value of this difference.
2. Calculate the standard error, SE. To compute it, calculate the variance of each sample (s²) and divide it
by the number of measured values in that sample (n, the sample size). Add these two values and then
take the square root.
3. Divide the difference between the means by the standard error to get a value for t. Compare the
calculated value to the appropriate critical t-value in Table 8. Table 8 shows t_crit for different degrees of
freedom at a significance level of 0.05. The degrees of freedom equals the total number of data points
in the two groups combined, minus 2 (df = n₁ + n₂ − 2). Note that you do not have to have the same
number of data points in each group.
4. If the calculated t-value is greater than the appropriate critical t-value, this indicates that you have
enough evidence to support the hypothesis that the means of the two samples are significantly
different at the probability value listed (in this case, 0.05). If the calculated t is smaller, then you
cannot reject the null hypothesis that there is no significant difference.
Application in Biology
After a small population of crayfish was accidentally released into a shallow pond, biologists noticed that the
crayfish had consumed nearly all of the underwater plant population; aquatic invertebrates, such as the water
flea (Daphnia sp.), had also declined. The biologists knew that the main predator of Daphnia is the goldfish,
and they hypothesized that the underwater plants protected the Daphnia from the goldfish by providing hiding
places. The Daphnia lost their protection as the underwater plants disappeared. The biologists designed an
experiment to test the following:
Experimental hypothesis: The underwater plants protect Daphnia from goldfish by providing hiding places.
Experimental prediction: By placing Daphnia and goldfish in tanks with and without plants, you should see a
difference in the survival of Daphnia in the two tanks.
Statistical null hypothesis: There is no difference in the number of Daphnia in tanks with plants compared to
tanks without plants: any difference between the two groups occurs simply by chance.
Statistical alternative hypothesis: There is a difference in the number of Daphnia in tanks with plants
compared to tanks without plants.
Table 9. Number of Daphnia Eaten by Goldfish in 30 Minutes in Tanks with or without Underwater Plants
Tanks       Plants (sample 1)   No Plants (sample 2)   (xᵢ − x̄₁)²           (xᵢ − x̄₂)²
1 and 2     13                  14                     (13 − 9.6)² = 11.56   (14 − 14.4)² = 0.16
3 and 4     9                   12                     (9 − 9.6)² = 0.36     (12 − 14.4)² = 5.76
5 and 6     10                  15                     (10 − 9.6)² = 0.16    (15 − 14.4)² = 0.36
7 and 8     10                  14                     (10 − 9.6)² = 0.16    (14 − 14.4)² = 0.16
9 and 10    7                   17                     (7 − 9.6)² = 6.76     (17 − 14.4)² = 6.76
11 and 12   5                   10                     (5 − 9.6)² = 21.16    (10 − 14.4)² = 19.36
13 and 14   10                  15                     (10 − 9.6)² = 0.16    (15 − 14.4)² = 0.36
15 and 16   14                  15                     (14 − 9.6)² = 19.36   (15 − 14.4)² = 0.36
17 and 18   9                   18                     (9 − 9.6)² = 0.36     (18 − 14.4)² = 12.96
19 and 20   9                   14                     (9 − 9.6)² = 0.36     (14 − 14.4)² = 0.16

Mean, x̄    x̄₁ = 9.6            x̄₂ = 14.4             Σ(xᵢ − x̄₁)² = 60.4   Σ(xᵢ − x̄₂)² = 46.4
Variance, s² = Σ(xᵢ − x̄)²/(n − 1)                     60.4/9 = 6.71         46.4/9 = 5.16
To determine whether the difference between the two groups was significant, the biologists calculated a t-test
statistic, as shown below:
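The computation can be sketched in Python using the Table 9 data (a from-scratch version of the guide's formula; the critical value 2.101 for df = 18 comes from a standard t-table like Table 8):

```python
import math

plants = [13, 9, 10, 10, 7, 5, 10, 14, 9, 9]          # Daphnia eaten, tanks with plants
no_plants = [14, 12, 15, 14, 17, 10, 15, 15, 18, 14]  # Daphnia eaten, tanks without plants

def variance(data):
    """Sample variance: s^2 = sum((x_i - mean)^2) / (n - 1)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / (len(data) - 1)

mean1 = sum(plants) / len(plants)        # 9.6
mean2 = sum(no_plants) / len(no_plants)  # 14.4
se = math.sqrt(variance(plants) / len(plants) + variance(no_plants) / len(no_plants))
t_obs = abs(mean1 - mean2) / se
df = len(plants) + len(no_plants) - 2    # 10 + 10 - 2 = 18

print(round(t_obs, 2))  # well above t_crit = 2.101 for df = 18, so reject H0
```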
For example, you decide to flip a coin 50 times. You expect a proportion of 50% heads and 50% tails. Based on
a 50:50 probability, you predict 25 heads and 25 tails. These are the expected values. You would rarely get
exactly 25 and 25, but how far off can these numbers be without the results being significantly different from
what you expected? After you conduct your experiment, you get 21 heads and 29 tails (the observed values). Is
the difference between observed and expected results purely due to chance? Or could it be due to something
else, such as a problem with the coin? The chi-square test can help you answer this question.
The statistical null hypothesis is that the observed counts will be equal to those expected, and the alternative
hypothesis is that the observed numbers will be different from the expected.
Note that this test must be used on raw categorical data. Values need to be simple counts, not percentages or
proportions. The size of the sample is an important aspect of the chi-square test—it is more difficult to detect
a statistically significant difference between observed and expected results in a small sample than in a
large sample. Two common applications of this test in biology are analyzing the outcomes of a genetic cross
and the distribution of organisms in response to an environmental factor of interest.
To calculate the chi-square test statistic (χ²), you use the equation

χ² = Σ (o − e)² / e

where
o = observed values
e = expected values
χ² = chi-square value
Σ = summation
Calculation Steps
1. Calculate the chi-square value. The columns in Table 10 outline the steps required to calculate the chi-
square value and test the null hypothesis, using the coin-flipping example discussed above. The
equations for calculating a chi-square value are provided in each column heading.
Table 10. Coin-Toss Chi-Square Value Calculations
Side of Coin   Observed (o)   Expected (e)   (o − e)   (o − e)²   (o − e)²/e
Heads          21             25             −4        16         0.64
Tails          29             25             4         16         0.64

χ² = Σ (o − e)²/e → χ² = 1.28
2. Determine the degrees of freedom value as follows:
df = number of categories − 1
In the example above, there are two categories (heads and tails):
df = (2 − 1) = 1
If the χ²-value were 3.1, you still could not reject the null hypothesis: the difference between observed and
expected data may be accidental and is not statistically significant.
Significance testing in biology typically uses a p-value of 0.05, which is also referred to as the alpha value (see
“Significance Testing: The α (Alpha) Level” in Part 2). A result with a p-value of 0.05 or lower is deemed
statistically significant.
To use the critical values table (Table 11), locate the calculated χ2-value in the row corresponding to the
appropriate number of degrees of freedom. For the coin-flipping example, locate the calculated χ2-value in the
df = 1 row. The χ²-value obtained was 1.28, which falls between 0.455 and 2.706 and is smaller than 3.841 (the
χ²-value at the p = 0.05 cutoff); in other words, a result this extreme would be expected to happen between
10% and 50% of the time by chance alone. Therefore, you cannot reject the null hypothesis that the results
occurred simply by chance, at an acceptable significance level.
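The whole test reduces to a one-line sum; a minimal sketch for the coin-toss example:

```python
def chi_square(observed, expected):
    """Chi-square statistic: the sum of (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 21 heads and 29 tails observed; 25 and 25 expected for a fair coin
chi2 = chi_square([21, 29], [25, 25])
print(chi2)  # 1.28 is below 3.841 (df = 1, p = 0.05), so H0 cannot be rejected
```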
Table 11. Critical Values Table for Different Significance Levels and Degrees of Freedom
Application in Biology—Example 1
Students just learned in their biology class that pill bugs use gill-like structures to breathe oxygen. The students
hypothesized that the pill bugs’ gills require them to live in wet environments for their survival. To test the
hypothesis, they wanted to determine whether pill bugs show a preference for living in wet or dry
environments.
The students placed 15 pill bugs on the dry side of a two-sided choice chamber, and 15 pill bugs on the wet
side of the chamber. Fifteen minutes later, 26 pill bugs were on the wet side and 4 on the dry side. The data
are shown in Table 12.
Note that an alternative hypothesis is never proven true by any statistical test like the chi-square test. This
statistical test only tells you whether the null hypothesis can or cannot be rejected. There is always a chance,
however small, that the observed difference could have occurred by chance even if the null hypothesis is true.
Likewise, failing to reject the null hypothesis does not necessarily mean that it is true. There might be a
difference between the observed and expected data that was too small to detect with the sample size of the
experiment.
Application in Biology—Example 2
One common application for the chi-square test is a genetic cross. In this case, the statistical null hypothesis is
that the observed results from the cross are the same as those expected, for example, the 3:1 ratio or 1:2:1
ratio for a Mendelian trait.
Dr. William Cresko, a researcher at the University of Oregon, conducted several crosses between marine
stickleback fish and freshwater stickleback fish. All marine stickleback fish have spines that protrude from the
pelvis, which presumably serve as protection from larger predatory fish. Many freshwater stickleback
populations lack pelvic spines. Dr. Cresko wanted to find out whether the presence or absence of pelvic spines
behaves like a Mendelian trait, meaning that it is likely to be controlled mainly by a single gene.
In one cross, marine stickleback with spines were crossed with stickleback from Bear Paw Lake, which don’t
have pelvic spines. All the progeny fish from this cross, the so-called F1 generation, had pelvic spines. Dr.
Cresko then took the F1 offspring and conducted several crosses between them to produce the F2 generation.
The results of the F2 crosses are shown in Table 14.
If the presence of pelvic spines is controlled by a single gene and the presence of pelvic spines is the dominant
trait as suggested by the F1 results, you would expect a ratio of 3:1 for fish with pelvic spines to fish without
pelvic spines in the F2 generation. For a total of 408 fish, the expected results would be 306:102. The results
from Dr. Cresko’s crosses are 320:88.
The null hypothesis is that there is no real difference between the expected results and the observed results,
and that the difference that we see occurred purely by chance. The statistical alternative hypothesis is that
there is a real difference between observed and expected results.
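Applying the same chi-square calculation to Dr. Cresko's F2 counts (320 spined and 88 spineless fish, against the 3:1 expectation of 306:102):

```python
def chi_square(observed, expected):
    """Chi-square statistic: the sum of (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2 = chi_square([320, 88], [306, 102])
print(round(chi2, 2))  # below the df = 1 critical value of 3.841, so H0 stands
```

Failing to reject H0 here is consistent with pelvic spine presence behaving as a single-gene Mendelian trait.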
The chi-square example above is provided in the BioInteractive activity “Using Genetic Crosses to Analyze a
Stickleback Trait,” http://www.hhmi.org/biointeractive/using-genetic-crosses-analyze-stickleback-trait.
Another application of chi-square to genetics is available in the activity “Mapping Genes to Traits in Dogs
Using SNPs,” http://www.hhmi.org/biointeractive/mapping-traits-in-dogs.
Correlation
Correlation analysis measures the strength and direction of the association between two variables. For example, if you plot the width of an oak (Quercus sp.) leaf (Y) on an xy scatter plot as a function of the leaf's length (X), the correlation coefficient (r) indicates how much width depends on length. An r-value equal to +1 would indicate a perfect positive correlation between width and length. In other words, the longer an oak leaf, the wider it is. An r-value of −1 would indicate a perfect negative correlation: the longer an oak leaf, the narrower it is. If there is no correlation between the two variables, the r-value equals 0, which would mean that there is no relationship between oak leaf length and width. The null hypothesis (H0) for a correlation is that there is no correlation and r = 0.
Calculating r involves determining the sample mean of the predictor variable (x̄) and its standard deviation (s_x), the sample mean of the response variable (ȳ) and its standard deviation (s_y), and the number of pairs (X, Y) of individuals in the sample (n):

r = [Σᵢ₌₁ⁿ ((Xᵢ − x̄)/s_x)((Yᵢ − ȳ)/s_y)] / (n − 1)
Another statistic, called the coefficient of determination, is the square of r. The r²-value tells us the strength of the relationship between X and Y.
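The formula above can be sketched in Python. The oak leaf measurements below are hypothetical values made up for illustration.

```python
# Pearson correlation coefficient computed directly from the formula:
# r = [sum of ((Xi - xbar)/sx)((Yi - ybar)/sy)] / (n - 1)
import statistics

lengths = [10.0, 12.0, 14.0, 16.0, 18.0]  # X: leaf length (cm), hypothetical
widths = [5.1, 5.9, 7.2, 7.8, 9.0]        # Y: leaf width (cm), hypothetical

n = len(lengths)
x_bar = statistics.mean(lengths)
y_bar = statistics.mean(widths)
s_x = statistics.stdev(lengths)  # sample standard deviation (n - 1 form)
s_y = statistics.stdev(widths)

# Sum of standardized cross-products, divided by (n - 1)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(lengths, widths)) / (n - 1)
r_squared = r ** 2

print(round(r, 3))          # 0.995: a strong positive correlation
print(round(r_squared, 2))  # 0.99
```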
When calculating correlations, it is important for you to remember that correlation does not imply causation.
For example, Figure 7 shows that there is a strong negative correlation between the mean temperature of
Earth over the last 190 years and the number of pirates in the Caribbean. Clearly, a decrease in the number of
pirates is not the cause of global warming.
Figure 7. Mean Global Temperature (°C) as a
Function of the Approximate Number of Pirates
in the Caribbean, 1820–2000. The line is the linear
regression. Statistics are the correlation coefficient (r) and the coefficient of determination (r²).
1. Calculate x̄, s_x, ȳ, and s_y as shown in Table 16.
2. Determine (Xᵢ − x̄)/s_x and (Yᵢ − ȳ)/s_y, multiply the two for each tomato sample, and then sum the results as shown in Table 16:
Σᵢ₌₁ⁿ ((Xᵢ − x̄)/s_x)((Yᵢ − ȳ)/s_y) = 8.136
3. Calculate r = 8.136/9 = 0.904.
4. Compare the calculated r-value with the critical value of r at α (the H0 rejection level) = 0.05. See Table 17 for critical r-values. The degrees of freedom is the total number of data pairs minus 2.
In Table 17, the critical value r_crit is ±0.632 for 8 degrees of freedom (10 pairs of observations − 2). The calculated r-value of 0.904 exceeds 0.632, which means that the probability of getting a value as extreme as 0.904 purely by chance if H0 is true (r = 0) is less than 0.05.
Therefore, students can reject H0 and conclude that there is a statistically significant association between the
mass of the students’ tomatoes and the number of seeds each tomato has.
Figure 8. Number of Seeds Counted in Tomatoes
as a Function of Tomato Mass. Statistics are
correlation coefficient (r) and the coefficient of determination (r²).
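The decision rule in step 4 can be sketched in Python; the r-value and critical value are those from the tomato example above.

```python
# Significance check for a correlation: compare |r| to the critical value
# from Table 17 for the appropriate degrees of freedom.

n_pairs = 10            # number of (mass, seed count) data pairs
df = n_pairs - 2        # degrees of freedom for a correlation
r = 0.904               # calculated correlation (tomato mass vs. seed count)
r_crit = 0.632          # critical r at alpha = 0.05 for df = 8 (Table 17)

significant = abs(r) > r_crit  # True means reject H0 (r = 0)
print(df)           # 8
print(significant)  # True
```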
The coefficient of determination is a measure of how well the regression line represents the data. If the regression line passed exactly through every point on the scatter plot, it would explain all of the
variation. The further the line is away from the points, the less it is able to explain. In Figure 8, 81.7% of the
variation in y can be explained by the relationship between x and y.
Note: Many biological traits (e.g., animal behavior or physical appearance) vary greatly among individuals in a
population. Thus, a coefficient of determination of 0.817 between two biological variables is considered very
high. However, for a standard curve, as described in “Standard Curves” in Part 3, it would be considered low.
Application in Biology—Example 1
Students were curious to learn whether there is an association between amounts of algae in pond water and
the water’s clarity. They collected water samples from seven local ponds that seemed to differ in water clarity.
To quantify the clarity of the water, they cut out a small disk from white poster board, divided the disk into
four equal parts, and colored two of the opposite parts black; they then placed the disk in the bottom of a 100-
milliliter graduated cylinder. For each sample, the students slowly poured pond water into the cylinder until
the disk was no longer visible from above. In Table 18 they recorded the volume of water necessary to obscure
the disk—the more water necessary to obscure the disk, the clearer the water. As a proxy for algae
concentration, they extracted chlorophyll from the water samples and used a spectrophotometer to
determine chlorophyll concentration (Table 18).
Note: Water clarity is given as the volume of water in milliliters (mL) required to obscure a black-and-
white disk at the bottom of a 100-milliliter graduated cylinder. A greater volume indicates clearer
water.
When students calculate r, they get a value of −0.965, which is a strong negative correlation (with −1 being a perfect negative correlation). They can confirm this by checking the critical r-value in Table 17, which is ±0.754 for 5 degrees of freedom (7 − 2) at the 0.05 significance level. Since the r of −0.965 is closer to −1 than the r_crit of −0.754, they can conclude that the probability of getting a value as extreme as −0.965 purely by chance is less than 0.05. Therefore, they can reject H0 and conclude that chlorophyll concentration and water clarity are significantly associated.
Moreover, the coefficient of determination (r²) of 0.932 is close to 1. As you can see from the graph, most of
the points are close to the line.
Application in Biology—Example 2
Students noticed that some ponderosa pine trees (Pinus ponderosa) on a street had more ovulate cones
(female pinecones) than other ponderosa pine trees. They hypothesized that the number of pinecones was a
function of the age of the tree and predicted that taller trees would have more cones than younger, shorter
trees. To determine the height of a tree, they used the “old logger” method. A student held a stick the same
length as the student’s arm at a 90° angle to the arm and backed up until the tip of the stick “touched” the top
of the tree. The distance the student was from the tree equaled the height of the tree. Using this method, the
students measured the heights of 10 trees. Then, using binoculars, they counted the number of ovulate cones
on each tree and recorded the data in Table 19.
Mean: x̄ = 6.8 m; ȳ = 45.4 cones
Standard deviation: s_x = 3.019; s_y = 28.625
r = [Σᵢ₌₁ⁿ ((Xᵢ − x̄)/s_x)((Yᵢ − ȳ)/s_y)] / (n − 1) = 2.043/9 = 0.227
Figure 10. Number of Ovulate Cones on Ponderosa Pine Trees as a Function of Tree Height (m). Statistics are the correlation coefficient (r) and the coefficient of determination (r²).
Just looking at the data points in Figure 10, it is hard to know whether there is a correlation or not. If there is a
correlation, it is not very strong. Drawing the line of best fit suggests a positive correlation. This is clearly a case
in which calculating A will help determine whether the correlation is statistically significant.
In Table 17, r_crit is ±0.632 for 8 degrees of freedom (10 − 2). The calculated r-value is 0.227, which is smaller in magnitude than the critical value of 0.632, so the probability of getting a value of 0.227 purely by chance is greater than 0.05 (p > 0.05). Therefore, students cannot reject H0 and can conclude that there is not a statistically significant association between the numbers of ovulate cones on ponderosa pine trees and the heights of the trees.
Frequency Calculations
Allele and genotype frequencies are commonly calculated by population geneticists. For instance, in a
population of 350 pea plants, suppose 112 are homozygous for the dominant yellow pea seed allele (YY), 139
are heterozygous (Yy), and 99 are homozygous for the recessive green pea seed allele (yy).
To determine the relative frequency (and percentage) of plants in this population that are homozygous for the dominant yellow pea seed allele, divide the number of plants that are homozygous for the yellow pea seed allele by the total number of plants:

relative frequency of the YY genotype = 112/350 = 0.32
percentage = 0.32 × 100 = 32% of the plants are homozygous for the yellow pea seed allele

To determine the relative frequency of the recessive allele (y) in the gene pool, count one y allele for each heterozygous plant and two for each homozygous recessive plant, then divide by the total number of alleles (two per plant):

relative frequency of the recessive allele (y) = [(139 × 1) + (99 × 2)]/(350 × 2) = 337/700 = 0.48
percentage = 0.48 × 100 = 48% of the gene pool is the recessive green pea seed allele
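These frequency calculations can be sketched in Python using the pea plant counts from the example.

```python
# Genotype and allele frequencies for a population of 350 pea plants:
# 112 YY (homozygous dominant), 139 Yy (heterozygous), 99 yy (homozygous recessive).

n_YY, n_Yy, n_yy = 112, 139, 99
total_plants = n_YY + n_Yy + n_yy
total_alleles = 2 * total_plants  # each diploid plant carries two alleles

freq_YY = n_YY / total_plants                   # genotype frequency of YY
freq_y = (n_Yy * 1 + n_yy * 2) / total_alleles  # frequency of recessive allele y
freq_Y = 1 - freq_y                             # frequency of dominant allele Y

print(round(freq_YY, 2))  # 0.32, i.e., 32% of the plants
print(round(freq_y, 2))   # 0.48, i.e., 48% of the gene pool
```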
Probability
You learned in the “Significance Testing” section of Part 2 that a probability of 0.05 means that there is a 5%
chance for an event to happen—for example, a 5% chance of obtaining a particular test statistic by chance.
This section provides more information about probability and how to calculate it for different scenarios.
Probability allows scientists to predict the likelihood of the outcome of random events. Probability (p) values lie between 0 (the event certainly will not happen) and 1 (the event is certain to happen). The probabilities of all other events have fractional values. For example, the probability of throwing a 2 on a six-sided die is 1 out
of 6 (p = 1/6), since the number 2 appears on only one of the six sides. By contrast, the probability of throwing
a 7 on a normal six-sided die is 0.
Rule of Addition
The probability of either of two mutually exclusive events occurring is equal to the sum of their individual probabilities. For example, the probability of rolling either a 2 or a 4 on a single roll of a six-sided die is 1/6 + 1/6 = 2/6, or 1/3.
Rule of Multiplication
The probability of two independent events both occurring is the product of their individual probabilities.
Example
Given a normal six-sided die, what is the probability of you rolling a 2 and then a 4 on two consecutive
rolls? These events are independent of one another because they have no effect on each other’s
occurrence—that is, if you roll a six-sided die twice, rolling a 2 on the first roll has no effect on whether
you will roll a 4 on the second roll.
On the first roll, there is a 1/6 chance of rolling a 2.
On the second roll, there is a 1/6 chance of rolling a 4.
The probability of rolling a 2 first and a 4 second follows:
p = 1/6 × 1/6 = 1/36
There is 1 chance in 36 of rolling a 2 and then a 4 on two consecutive rolls of the die.
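Both probability rules can be sketched with Python's `fractions` module, which keeps the answers as exact fractions rather than decimals.

```python
# The two probability rules applied to a fair six-sided die.
from fractions import Fraction

p_face = Fraction(1, 6)  # probability of any single face

# Rule of addition (mutually exclusive events): P(2 or 4) on one roll.
p_2_or_4 = p_face + p_face

# Rule of multiplication (independent events): P(2 then 4) on two rolls.
p_2_then_4 = p_face * p_face

print(p_2_or_4)    # 1/3
print(p_2_then_4)  # 1/36
```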
Application in Biology—Example 1
What is the probability that two parents who are heterozygous for the sickle cell allele would have three
children in a row who are homozygous for the sickle cell allele and have sickle cell anemia?
The probability of two parents who are heterozygous for an allele having a child who is homozygous for that allele is 1 in 4. Because each birth is an independent event, the rule of multiplication applies:

p = 1/4 × 1/4 × 1/4 = 1/64
Two pea plants that are heterozygous for the round (R) and yellow (Y) alleles (RrYy) are crossed and produce
only a single seed. What is the probability of a seed from this cross having the genotype RRYy or RRYY?
The probability of getting a seed with the RRYy genotype is 1/4 × 1/2 = 1/8 = 2/16.
The probability of getting a seed with the RRYY genotype is 1/4 × 1/4 = 1/16.
Because the two genotypes are mutually exclusive, the rule of addition applies: the probability of getting either genotype is 2/16 + 1/16 = 3/16.
Rate Calculations
Rate is used to express one measured quantity (y) in relation to another measured quantity (x).
In biology, rates are often calculated to indicate the change in a property of a system over time. For example,
the rate of an enzyme-catalyzed reaction is frequently expressed as the amount of product produced by the
enzyme in a given amount of time. When you use data plotted on a graph, you calculate the rate in the same
way as you calculate the slope:
rate = Δy/Δx
Students in an advanced biology class studied the reaction catalyzed by the catalase enzyme. Catalase
degrades hydrogen peroxide (H2O2) to water (H2O) and oxygen gas (O2). The students set up an experiment to
measure the amount of O2 produced by catalase over 5 minutes when it is added to H2O2. Table 20 contains
the data collected by a group of students, and Figure 11 shows the corresponding graph. From these data,
rates of catalase activity can be calculated over various intervals of time.
Table 20. Volume of Oxygen Produced from the Catalysis of Hydrogen Peroxide by the Enzyme Catalase
Time (min.) Volume of Oxygen Produced (mL)
0 0
1 12
2 25
3 33
4 39
5 42
Figure 11. Oxygen Produced in a Catalase-Catalyzed Reaction as a Function of Time
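The rate calculation (rate = Δy/Δx) applied to the data in Table 20 can be sketched in Python.

```python
# Rates of catalase activity computed from Table 20.

times = [0, 1, 2, 3, 4, 5]         # time (min)
volumes = [0, 12, 25, 33, 39, 42]  # volume of O2 produced (mL)

# Overall rate over the full 0-5 minute interval
overall_rate = (volumes[-1] - volumes[0]) / (times[-1] - times[0])

# Rate over each one-minute interval
interval_rates = [(volumes[i + 1] - volumes[i]) / (times[i + 1] - times[i])
                  for i in range(len(times) - 1)]

print(overall_rate)    # 8.4 mL/min
print(interval_rates)  # [12.0, 13.0, 8.0, 6.0, 3.0]
```

The per-minute rates show the reaction slowing over time as the hydrogen peroxide substrate is used up.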
Hardy-Weinberg
Allele frequencies in a population are in equilibrium (do not change) when all the following conditions are met: the population is very large, mating is random, and there is no mutation, no migration, and no natural selection. The Hardy-Weinberg principle predicts that when a population is at equilibrium, the frequencies of the dominant allele (p) and the recessive allele (q) satisfy p + q = 1.0, and the genotype frequencies satisfy p² + 2pq + q² = 1.0.
If the observed allele frequencies in a population differ from the frequencies predicted by the Hardy-Weinberg
principle, then the population is not at equilibrium and evolution may be occurring.
Application in Biology
In a hypothetical population of 100 rock pocket mice (Chaetodipus intermedius), 81 individuals have light,
sandy-colored fur and a dd genotype. The remaining 19 individuals are dark colored and therefore have either
the DD genotype or the Dd genotype. Scientists assumed that this population is at equilibrium; they used the
Hardy-Weinberg equations to find p and q for this population and calculated the frequency of heterozygous
genotypes.
Scientists knew that 81 mice have the dd genotype: q2 = 81/100 = 0.81, or 81%
Next, they calculated q:
q = √0.81 = 0.9
Then, they calculated p using the equation p + q = 1:
p + (0.9) = 1
p = 0.1
To calculate the frequency of the heterozygous genotype, they calculated 2pq:
2pq = 2(0.1)(0.9) = 2(0.09)
2pq = 0.18
Based on the calculations, the estimated frequency of the recessive allele is 0.9 and the frequency of the
dominant allele is 0.1.
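The scientists' calculation can be sketched in Python.

```python
# Hardy-Weinberg calculation for the rock pocket mouse example:
# 81 of 100 mice are light colored (dd), so q^2 = 0.81.
import math

n_dd, n_total = 81, 100

q_squared = n_dd / n_total      # frequency of the dd genotype
q = math.sqrt(q_squared)        # frequency of the recessive allele d
p = 1 - q                       # frequency of the dominant allele D

freq_heterozygotes = 2 * p * q  # expected frequency of the Dd genotype

print(round(q, 2))                   # 0.9
print(round(p, 2))                   # 0.1
print(round(freq_heterozygotes, 2))  # 0.18
```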
If the scientists had a way to distinguish mice that are heterozygotes from those that are homozygous
dominant for the dark-colored fur, then they would have a way of determining whether the population is or is
not at equilibrium and could apply a statistical test like the chi-square test to see if there is a difference.
Standard Curves
A standard curve is a method of quantitative data analysis in which measurements of samples with known
properties are plotted on a graph and then the graph is analyzed to determine the properties of unknown
samples. Analysis of the graph is performed by drawing a line of best fit through the plotted points of the
known samples and then determining the equation of this line (in the form y = mx + b) or by interpreting the
values of unknown samples directly from the drawn line. The samples with known properties are the
standards, and the graph is the standard curve. Two common uses of standard curves in biology are to
determine protein concentrations and to analyze DNA fragment length.
The Bradford protein assay is a colorimetric assay that determines the protein concentration of a solution by
measuring how much light of a certain wavelength it absorbs. The light absorbance of several samples with
known protein concentrations is measured using a spectrophotometer and then plotted on a graph as a
function of protein concentration. Using this graph, or linear regression analysis, scientists can determine the protein concentration of an unknown sample once its absorbance has been measured.
Table 21. Absorbance Measured at 595 Nanometers of Various Known Protein Concentrations
[Figure 12 plots the absorbance of the standards (y-axis) against protein concentration, 0 to 25 μg/mL (x-axis), with the regression line.]
Figure 12. Absorbance as a Function of Protein Concentration
In Figure 12, the absorbance values in Table 21 were plotted as a function of protein concentration for the known samples (standards). The calculated coefficient of determination (r²) and the equation of the regression line are included on the graph. The closer the r² value is to 1, the better the data fit the line; that is, the more closely the data points follow the equation y = mx + b.
y = 0.0699x + 0.3722
The absorbance of the unknown protein solution was measured with a spectrophotometer as 0.921 (y = 0.921), so the scientists used the equation of the best-fit line to determine the protein concentration (x):

0.921 = 0.0699x + 0.3722
x = (0.921 − 0.3722)/0.0699 = 7.85 μg/mL
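Solving the standard-curve equation for x can be sketched in Python.

```python
# Solving y = 0.0699x + 0.3722 for the unknown protein concentration,
# given a measured absorbance of 0.921.

slope, intercept = 0.0699, 0.3722  # from the regression line in Figure 12
absorbance = 0.921                 # measured absorbance of the unknown (y)

concentration = (absorbance - intercept) / slope  # x, in micrograms per mL

print(round(concentration, 2))  # 7.85 ug/mL
```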
In RFLP (restriction fragment length polymorphism) analysis, the fragment sizes of unknown DNA samples can
be determined from the standard curve of DNA markers of known fragment lengths. First, scientists measured the distance traveled by each of the marker fragments in the gel and plotted each fragment's length as a function of the distance it migrated. This provides a standard for comparison to interpolate the sizes of the unknown fragments (Table 22).
[Figure 13 plots marker fragment length in base pairs on a logarithmic y-axis (100 to 10,000 bp) against migration distance on the x-axis (5 to 30 mm).]
Figure 13. Marker Fragment Length as a Function of Distance Migrated
Note: The standard curve in Figure 13 was plotted on a logarithmic y-axis scale, because the relationship
between fragment length and distance migrated is exponential, or nonlinear.
Scientists estimated the length of an unknown DNA fragment that migrated 20 millimeters by using the graph
in Figure 13, as illustrated by the red lines. To estimate the unknown length, they located the distance of 20 millimeters on the x-axis and traced a vertical line up to the line of best fit. A horizontal line from the point of intersection crosses the y-axis at the corresponding fragment length. In this case, they estimated the fragment length to be 3,100 base pairs.
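The interpolation can be sketched in Python. Because the relationship is roughly log-linear, the regression is fitted to log10 of fragment length. The marker lengths and distances below are hypothetical, so the estimate differs from the 3,100 bp in the text.

```python
# Estimating an unknown DNA fragment length from a standard curve.
# The marker data are hypothetical values for illustration.
import math

distances = [5, 10, 15, 20, 25]            # mm migrated (hypothetical markers)
lengths = [23130, 9416, 4361, 2322, 1000]  # fragment lengths in bp (hypothetical)

log_lengths = [math.log10(length) for length in lengths]

# Least-squares fit of log10(length) = m * distance + b
n = len(distances)
mean_x = sum(distances) / n
mean_y = sum(log_lengths) / n
m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, log_lengths))
     / sum((x - mean_x) ** 2 for x in distances))
b = mean_y - m * mean_x

# Interpolate the length of an unknown fragment that migrated 20 mm
unknown_distance = 20
estimated_length = 10 ** (m * unknown_distance + b)
print(round(estimated_length))  # roughly 2,200 bp with these markers
```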
Authors:
Written by Paul Strode, PhD, Fairview High School, CO, and Ann Brokaw, Rocky River High School, OH
Edited by Laura Bonetta, PhD, HHMI
Reviewed by Brad Williamson, The University of Kansas; John McDonald, PhD, University of Delaware; Sandra
Blumenrath, PhD, and Satoshi Amagai, PhD, HHMI
Copyedited by Barbara Resch
Graphics by Heather McDonald, PhD, and Bill Pietsch
Part 4: BioInteractive Resources Chart
The chart below lists BioInteractive resources (hyperlinked titles) along with the math and statistics methods each one uses: mean, mode, median, range, standard deviation, frequency, rate, probability, Student t-test, chi-square analysis, Hardy-Weinberg, and standard curves.
Diet and the Evolution of Salivary Amylase
Students analyze data obtained from two different research studies to draw conclusions about the relationship between AMY1 gene copy number and amylase production, and between AMY1 gene copy number and dietary starch consumption. The activity involves graphing, analyzing research data using statistics, making claims, and supporting the claims with scientific reasoning.
Evolution in Action: Graphing and Statistics
Students analyze frequency distributions of beak depth data from Peter and
Rosemary Grant’s Galapagos finch study and suggest hypotheses to explain
the trends illustrated in the graphs. Students then investigate the effect of
sample size on descriptive statistics and notice that the means and standard deviations vary for each subsample. Finally, students use wing length and
body mass data to construct bar graphs and are asked to propose
explanations for how and why some characteristics are more adaptive than
others in given environments.
Evolution in Action: Statistical Analysis
Students calculate descriptive statistics (mean, standard deviation, and 95%
confidence intervals) for eight sets of data from Peter and Rosemary Grant’s
Galapagos finch study. Students construct bar graphs with 95% confidence
intervals and analyze the means of finch body measurements with t-Tests.
Students also graph two of the finch measurements against each other to
investigate a possible association.
Lizard Evolution Virtual Lab
The virtual lab includes four modules that investigate different concepts in
evolutionary biology, including adaptation, convergent evolution,
phylogenetic analysis, reproductive isolation, and speciation. Each module involves data collection, calculations, analysis, and answering questions. The
“Educators” tab includes lists of key concepts and learning objectives and
detailed suggestions for incorporating the lab in your instruction.
The Virtual Stickleback Evolution Lab
This virtual lab is appropriate for the high school biology classroom as an
excellent companion to an evolution unit. Because the trait under study is fish
pelvic morphology, the lab can be used for lessons on vertebrate form and
function. In an ecology unit, the lab can be used to illustrate predator-prey
relationships and environmental selection pressures. The sections on
graphing, data analysis, and statistical significance make the lab a good fit for
addressing the "science as a process" or "nature of science" aspects of the
curriculum.
Battling Beetles
This series of activities complements the HHMI DVD Evolution: Constant
Change and Common Threads, and requires simple materials such as M&Ms,
food storage bags, colored pencils, and paper cups. An extension of this
activity allows students to model Hardy-Weinberg and selection using an Excel
spreadsheet. The overall goal of Battling Beetles is to engage students in
thinking about the mechanism of natural selection through data collection,
analysis and pattern recognition.
Allele and Phenotype Frequencies in Rock Pocket Mouse Populations
A lesson that uses real rock pocket mouse data collected by Dr. Michael Nachman and his colleagues to illustrate the Hardy-Weinberg principle.
Population Genetics, Selection, and Evolution
This hands-on activity, used in conjunction with the film The Making of the
Fittest: Natural Selection in Humans, teaches students about population
genetics, the Hardy-Weinberg principle, and how natural selection alters the
frequency distribution of heritable traits. It uses simple simulations to
illustrate these complex concepts and includes exercises such as calculating
allele and genotype frequencies, graphing and interpretation of data, and
designing experiments to reinforce key concepts in population genetics.
Mendelian Genetics, Pedigrees, and Chi-Square Statistics
This lesson requires students to work through a series of questions pertaining
to the genetics of sickle cell disease and its relationship to malaria. These
questions will probe students' understanding of Mendelian genetics,
probability, pedigree analysis, and chi-square statistics.
Using Genetic Crosses to Analyze a Stickleback Trait
This hands-on activity involves students applying the principles of Mendelian
genetics to analyze the results of genetic crosses between stickleback fish with
different traits. Students use photos of actual research specimens (the F1 and
F2 cards) to obtain their data; they then analyze the data they collected along
with additional data from the scientific literature. In the extension activity,
students use chi-square analysis to determine the significance of genetic data.
Pedigrees and the Inheritance of Lactose Intolerance
In this classroom activity, students analyze the same Finnish family pedigrees
that researchers studied to understand the pattern of inheritance of lactose
tolerance/intolerance. They also examine portions of DNA sequence near the
lactase gene to identify specific mutations associated with lactose tolerance.
Beaks as Tools: Selective Advantage in Changing Environments
In their study of the medium ground finches, evolutionary biologists Peter and
Rosemary Grant were able to track the evolution of beak size twice in an
amazingly short period of time due to two major droughts that occurred in the
1970s and 1980s. This activity simulates the food availability during these
droughts and demonstrates how rapidly natural selection can act when the
environment changes. Students collect and analyze data and draw conclusions
about traits that offer a selective advantage under different environmental
conditions. They have the option of using an Excel spreadsheet to calculate
different descriptive statistics and interpret graphs.
Look Who’s Coming for Dinner: Selection by Predation
This hands-on classroom activity is based on real measurements from a year-
long field study on predation, in which Dr. Jonathan Losos and colleagues
introduced a large predator lizard to small islands that were inhabited
by Anolis sagrei. The activity illustrates the role of predation as an agent of
natural selection. Students are asked to formulate a hypothesis and analyze a
set of sample research data from actual field experiments. They then use
drawings of island habitats to collect data on anole survival and habitat use.
The quantitative analysis includes calculating and interpreting simple
descriptive statistics and plotting the results as line graphs.
Mapping Genes to Traits in Dogs Using SNPs
In this hands-on genetic mapping activity, students identify single nucleotide polymorphisms (SNPs) correlated with different traits in dogs. The
quantitative analysis section includes chi-square analysis.
Spreadsheet Data Analysis Tutorials
These tutorials are designed to show essential data analysis techniques using
a spreadsheet program such as Excel. Follow the tutorials in sequence to learn
the fundamentals of using a spreadsheet program to organize data; taking
advantage of formulae and functions to calculate statistical values including
mean, standard deviation, standard error of the mean, and 95% confidence
intervals; and plotting graphs with error bars.
Schooling Behavior of Stickleback Fish from Different Habitats
(Data Point) A team of scientists studied the schooling behavior of threespine
stickleback fish by experimentally testing how individual fish responded to an
artificial fish school model.
Effects of Natural Selection on Finch Beak Size
(Data Point) Rosemary and Peter Grant studied the change in beak depths of
finches on the island of Daphne Major in the Galápagos Islands after a
drought.