Stats Notes
Basic Concepts:
Statistics is a collection of methods for planning experiments, obtaining data, and then
organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions
based on the data.
A statistic is a value, usually a numerical value, that describes a sample. A statistic may
be obtained from a single measurement, or it may be derived from a set of
measurements from the sample.
Inferential statistics consist of techniques that allow us to study samples and then make
generalizations about the populations from which they were selected.
Sampling error is the discrepancy, or amount of error, that exists between a sample
statistic and the corresponding population parameter.
Margin of error is an estimate of the extent to which sample results are likely to deviate
from the population value.
Qualitative (or categorical or attribute) data can be separated into different categories
that are distinguished by some nonnumeric characteristics.
A constant is a characteristic or condition that does not vary, but is the same for every
individual.
The dependent variable is the one that is observed for changes in order to assess the
effect of the treatment.
For a continuous variable, there are an infinite number of possible values that fall
between any two observed values. A continuous variable is divisible into an infinite
number of fractional parts.
Scales of Measurement
S. S. Stevens devised a system that provides a logical approach to measurement. He
defined four types of scales: nominal, ordinal, interval, and ratio. These scales are
distinguished on the basis of the relationships assumed to exist between objects having
different scale values.
Frequency Distributions
A frequency distribution is an organized tabulation showing the number of individuals
located in each category on the scale of measurement. Thus, a frequency distribution
presents a picture of how the individual scores are
distributed on the measurement scale – hence the name frequency distribution.
Proportion measures the fraction of the total group that is associated with each score.
In general, the proportion associated with each score is
"
proportion = #=
!
Because proportions describe the frequency (f) in relation to the total number (N), they
often are called relative frequencies.
In addition to using frequencies (f) and (p), researchers often describe a distribution of
scores with percentages.
"
percentage = #$"##! = $"##!
!
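As a quick illustration, both formulas can be checked on a small made-up set of scores using nothing more than the Python standard library; the data below are purely hypothetical.

    from collections import Counter

    # Hypothetical quiz scores (made-up data for illustration)
    scores = [8, 9, 8, 10, 7, 9, 8, 10, 9, 8]

    freq = Counter(scores)      # frequency f of each distinct score
    N = len(scores)             # total number of scores

    for x in sorted(freq):
        f = freq[x]
        p = f / N               # proportion (relative frequency) = f / N
        print(f"X = {x}: f = {f}, proportion = {p:.2f}, percentage = {p * 100:.0f}%")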
An ungrouped or simple frequency distribution is used when there is a small number
of observations. It is merely an arrangement of the data, usually from the highest to the
lowest, that shows the frequency of occurrence of the different values of the variable.
The steps in grouping a large mass of data into a frequency distribution are as follows (a
short code sketch follows the list):
1. Find the range between the highest and the lowest scores.
2. Determine the interval size by dividing the range by the desired number of classes
which is normally not less than 10 and not more than 20.
3. Determine the class limits of the class intervals. Tabulation is facilitated if the lower
class limits of the class intervals are multiples of the class size. The bottom interval
must include the lowest score.
4. Tally the frequencies for each class interval and get the sum of the frequency
column.
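A rough sketch of these four steps in Python; the raw scores, the desired number of classes, and the rounding of the interval size are all arbitrary choices for illustration.

    import math

    # Hypothetical raw scores (made-up data)
    scores = [23, 45, 37, 52, 41, 36, 48, 29, 55, 44, 33, 40, 47, 51, 38, 26]

    # Step 1: find the range between the highest and the lowest scores
    low, high = min(scores), max(scores)

    # Step 2: interval size = range / desired number of classes (here 10)
    num_classes = 10
    i = math.ceil((high - low) / num_classes)

    # Step 3: make the lower class limits multiples of the class size;
    #         the bottom interval must include the lowest score
    lower = (low // i) * i

    # Step 4: tally the frequencies and sum the frequency column
    total = 0
    while lower <= high:
        upper = lower + i - 1
        f = sum(lower <= x <= upper for x in scores)
        total += f
        print(f"{lower:>3}-{upper:<3}  {f}")
        lower += i
    print("N =", total)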
Frequency Distribution Graphs
For a histogram, vertical bars are drawn above each score so that the height of each bar
corresponds to the frequency of that score and the width of each bar extends to the real
limits of the score.
A histogram is used when the data are measured on an interval or a ratio scale.
When data have been grouped into class intervals, you can construct a frequency
distribution histogram by drawing a bar above each interval so that the width of the bar
extends to the real limits of the interval.
Bar graphs. When you are presenting the frequency distribution for data from a nominal
or an ordinal scale, the graph is constructed so that there is some space between the bars.
In the case of a nominal scale, the separate bars emphasize that the scale consists of
separate, distinct categories. For ordinal scales, the bar graph is used because
differences between ranks do not provide information about interval size on the X-axis.
For a bar graph, a vertical bar is drawn above each score (or category) so that the height
of the bar corresponds to the frequency and adjacent bars are separated by a space.
A bar graph is used when the data are measured on a nominal or an ordinal scale.
In a frequency distribution polygon, a single dot is drawn above each score so that the dot
is centered above the score and its height corresponds to the frequency.
A continuous line is then drawn connecting these dots. The graph is completed by
drawing a line down to the X-axis (zero frequency) at each end of the range of scores.
As with the histogram, the frequency distribution polygon is intended for use with interval or
ratio scales. A polygon also can be used with data that have been grouped into class
intervals. In this case, you position each dot directly above the midpoint of a class
interval. The midpoint can be found by averaging the apparent limits of the interval or by
averaging the real limits of the interval.
The raw frequencies may be converted to percentages such that the cumulative
frequencies will total 100 percent instead of the total number of frequencies. If these
cumulative frequencies are graphed, a cumulative percentage polygon or ogive is
obtained.
In a skewed distribution, the scores tend to pile up toward one end of the scale and taper
off gradually at the other end.
A skewed distribution with the tail on the right-hand side is said to be positively skewed
because the tail points toward the positive (above-zero) end of the X-axis. If the tail
points to the left, the distribution is said to be negatively skewed.
The percentile system is widely used in educational measurement to report the standing
of an individual relative to the performance of a known group. It is based on the
cumulative percentage distribution. A percentile point is a point on the measurement
scale below which a specified percentage of the cases in the distribution falls. It is
often called a percentile. A percentile rank is the percentage of cases falling below a
given point on the measurement scale.
Do not confuse percentile and percentile ranks: percentile ranks may take values only
between zero and 100, whereas a percentile (point) may have any value that scores may
have.
To determine a particular percentile when the distribution is ungrouped, the scores must
first be arranged according to size. Then, the position p of the score or point which
defines the pth percentile in a distribution consisting of n observations is:
"(! + !)
!""
If the data are given in a grouped distribution, the pth percentile is obtained by a
procedure similar to that used in finding the median:

P_p = L + ((pN/100 − cf) / f) i

where L is the lower real limit of the interval containing the percentile, cf is the cumulative
frequency below that interval, f is the frequency of the interval, and i is the interval size.
Mean. The arithmetic mean or simple mean (popularly called the average) is the sum
of the separate scores or measures divided by the number of scores.
1. The sum of deviations of all the measurements in a distribution from the mean is 0.
2. In many statistical cases, the squares of the deviations from the mean are used in
statistical computations. A second useful property of the mean involves the sum of
squares of deviations from the mean. This second property of the mean states that
the sum of the squared deviations of scores from their arithmetic mean is less than
the sum of the squared deviations around any point other than the mean.
3. If a constant c is added to each score in a distribution, the mean of the distribution
will be increased by the same constant.
4. If each score in a distribution is multiplied by a constant, the mean of the original
scores is also multiplied by the same constant.
5. The mean may not be a value that exists in the distribution.
6. All the values of the variable under investigation are incorporated in the computation
of the mean.
The mean is preferred in the following situations:
1. When the distribution consists of interval or ratio data which have no extreme values
(too high or too low in comparison with the other scores in the set).
2. When other statistics (like standard deviation, coefficient of correlation, etc.) are
subsequently to be computed.
3. When the distribution is normal or is not greatly skewed, the mean is usually
preferred to either the median or the mode. In such cases, it provides a better
estimate of the corresponding population parameter than either the median or the
mode.
!"
"=
!
When some scores occur several times, the mean is computed with the formula:
X̄ = ΣfX / N,   where N = Σf
When only a grouped frequency distribution is available, the mean is approximated by the
formula:
X̄ = AM + (Σfd / N) i

where AM is the assumed mean (the midpoint of a chosen interval), d is the deviation of
each interval from that interval in class-interval units, f is the interval frequency, and i is
the interval size.
Median. The median is that point on the scale of measurement that divides a series of
ranked observations into halves, such that half of the observations fall above it and the
other half fall below it.
1. The median is the point below which half of the scores in a distribution lie and above
which the other half of the scores lie.
2. The median is an ordinal statistic because its calculation is based on the ordinal
properties of the data being analyzed.
3. When the distribution is grossly asymmetrical or skewed or when a series contains
either a few extremely high or a few extremely low scores compared with the rest of
the scores, the median is the most representative average. This is because the
values of the different scores have nothing to do with the computation of the median.
4. In an open-ended distribution, the median is the most reliable measure of central
tendency that can be computed.
5. Unlike the mean, the medians of separate or different distributions cannot be
combined to give the median of the resulting combined distribution.
6. The median is less reliable or less dependable than the mean. If different samples
are randomly selected from a given population, the medians of these samples are
likely to vary or fluctuate more from each other and from the median of the given
population than the means of the same samples.
For ungrouped data, the calculation of the median is based on the following formula:
# +"
$%& = !"
!
The median is calculated from grouped data using the formula below:
Mdn = L + ((N/2 − cf) / f) i

where L is the lower real limit of the median class, cf is the cumulative frequency below
the median class, f is the frequency of the median class, and i is the interval size.
Mode. The mode is the point on the measurement scale with the maximum frequency in
the given distribution. In an ungrouped distribution, it is the measurement which occurs
most frequently.
The mode does not always exist. In a rectangular distribution where all the frequencies
are equal, there is no mode. On the other hand, for some sets of data there may be two
or more scores with the same highest frequency.
The mode has the following properties:
1. The mode is a nominal statistic which means that it is used for nominal data. Its
computation does not depend on the values of the variable or on their order, but
merely on their frequency or occurrence. It is rarely used with interval, ratio, and
ordinal variables, where means and medians can be calculated.
2. It is usually employed as a simple, inspectional measure which indicates roughly the
center of concentration of distribution. As such, there is no need to calculate it as
exactly as the median or the mean.
3. The mode is a very unstable value. It can change radically if the method of rounding
the data is changed.
4. The mode is the appropriate measure of central tendency if the distribution is bimodal
with the modes at the extreme ends of the distribution.
VARIABILITY
The Range. The range is the distance between the largest score (Xmax) and the
smallest score (Xmin) in the distribution. The problem with using the range as a measure
of variability is that it is completely determined by the two extreme values and ignores the
other scores in the distribution.
Because the range does not consider all the scores in the distribution, it often does not
give an accurate description of the variability for the entire distribution. For this reason,
the range is considered to be a crude and unreliable measure of variability.
A distribution can be divided into four equal parts using quartiles. By definition, the first
quartile (Q1) is the score that separates the lowest 25% of the distribution from the rest.
The second quartile (Q2) is the score that has exactly two quarters, or 50%, of the
distribution below. Notice that the second quartile and the median are the same. Finally,
the third quartile (Q3) is the score that divides the bottom three-fourths of the distribution
from the top quarter.
The interquartile range is the distance between the first quartile and the third quartile:
interquartile range = Q3 – Q1
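Assuming numpy is available, the quartiles and the interquartile range can be computed directly; note that numpy supports several interpolation rules for percentiles, so results on small samples may differ slightly from hand calculations. The scores below are made up.

    import numpy as np

    scores = np.array([2, 4, 5, 7, 8, 9, 11, 12, 14, 15, 18, 20])  # hypothetical data

    q1, q3 = np.percentile(scores, [25, 75])
    print("Q1 =", q1, "Q3 =", q3, "IQR =", q3 - q1)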
The standard deviation is the most commonly used and the most important measure of
variability. Standard deviation uses the mean of the distribution as a reference point and
measures variability by considering the distance between each score and the mean. It
determines whether the scores are generally near or far from the mean. That is, are the
scores clustered together, or are they scattered? In simple terms, the standard deviation
approximates the average distance from the mean.
Variance is the mean of the squared deviation scores. Standard deviation is the
square root of the variance.
#( " " µ )
!
!=
!
Formula of the sample standard deviation:
s = √( Σ(X − X̄)² / (n − 1) )
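Assuming numpy, the population and sample forms differ only in the divisor (N versus n − 1), controlled by the ddof argument; the scores are made up.

    import numpy as np

    scores = np.array([4, 8, 6, 5, 9, 7, 6, 7])   # hypothetical data

    sigma = scores.std(ddof=0)   # population SD: divide Σ(X − µ)² by N
    s = scores.std(ddof=1)       # sample SD: divide Σ(X − X̄)² by (n − 1)
    print(sigma, s)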
One theoretical distribution that has proved to be extremely valuable is the normal
distribution (or normal curve), a distribution that, among other things, describes how
chance operates. It is a bell-shaped, theoretical distribution that predicts the frequency of
occurrence of chance events.
The mathematical representation of this curve was first studied by the famous German
mathematician Carl Gauss. That is why the curve is often referred to as the Gaussian
distribution.
Standard Normal Distribution. The simplest of the family of normal distributions is the
standard normal distribution, also called z distribution. It is a distribution of a normal
random variable with a mean equal to zero (µ = 0) and a standard deviation equal to one
(σ = 1).
The standard normal distribution has the following characteristics:
1. It is symmetrical about the vertical line drawn through z = 0. This means that the
shape of the distribution at the right is a mirror image of the left.
2. The highest point in the curve is y = 0.3989.
3. The curve is asymptotic to the x-axis. This means that both positive and negative
ends approach the horizontal axis but do not touch it.
4. For all practical purposes, the area under the curve from z = -3 to z = +3 equals 1,
hence the term unit normal curve.
5. The three measures of central tendency (mean, median, and mode) coincide with
each other.
Skewness
When a distribution has many more observations on the right side of the curve, we say
that the curve is negatively skewed.
When a distribution has more observations on the left side of the curve, we say that the
curve is positively skewed.
The area under the unit normal curve may represent several things like the probability of
an event, the percentile rank of a score, or the percentage distribution of the whole
population.
• About 68% of all scores fall within 1 standard deviation of the mean
• About 95% of all scores fall within 2 standard deviations of the mean
• About 99.7% of all scores fall within 3 standard deviations of the mean
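These benchmark areas can be verified numerically with the cumulative distribution function of the standard normal curve, assuming scipy is available.

    from scipy.stats import norm

    for z in (1, 2, 3):
        area = norm.cdf(z) - norm.cdf(-z)   # area under the unit normal curve within ±z
        print(f"within ±{z} SD: {area:.4f}")
    # prints approximately 0.6827, 0.9545, 0.9973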
HYPOTHESIS TESTING
1. The null hypothesis (denoted by H0) is a statement that the value of a population
parameter (such as the mean) is equal to some claimed value. We test the null
hypothesis directly in the sense that the conclusion will be either a rejection of H0
or a failure to reject H0.
2. The alternative hypothesis (denoted by H1) is the statement that must be true if
the null hypothesis is false. For the mean, the alternative hypothesis will be
stated in only one of the three possible forms:
a. H1: µ ≠ some value
b. H1: µ > some value
c. H1: µ < some value
Mathematically, a nondirectional hypothesis makes use of the “not equal to” (≠)
sign, while a directional hypothesis involves one of the order relations, “less than”
(<) or “greater than” (>).
Very Important Note 1: Depending on the original wording of the problem, the
original claim will sometimes be the null hypothesis H0, and at other times it will
be the alternative hypothesis H1. Regardless of whether the original claim
corresponds to H0 or H1, the null hypothesis H0 must always contain equality.
Very Important Note 2: Even though we sometimes express H0 with the symbol
≤ or ≥ as in H0: µ ≤ some value or H0: µ ≥ some value, we conduct the test by
assuming that H0: µ = some value is true. We must have a fixed and specific
value for µ so that we can work with a single distribution having a specific mean.
Very Important Note 3: If we are making our own claims, we should arrange the
null and alternative hypotheses so that the most serious error would be the
rejection of a true null hypothesis. Ideally, all claims would be made so that they
would all be null hypotheses. Unfortunately, our real world is not ideal. There is
poverty, war, crime, and people who make claims that are actually alternative
hypotheses.
3. Type I error: The mistake of rejecting the null hypothesis when it is true. The
probability of rejecting the null hypothesis when it is true is called the
significance level; that is, the significance level is the probability of a type I
error. The symbol α (alpha) is used to represent the significance level. The
values of α = 0.05 and α = 0.01 are commonly used.
Type I Error
The rejection of a true null hypothesis is labeled a Type I error. A Type I error,
symbolized with a Greek alpha (α), is a “false alarm” – the investigator thinks he
or she has something when there is nothing there.
Significance Level
The actual probability figure that you obtain from the data is referred to as the
significance level. Thus, p ≤ .001 is an expression of the level of significance of
the difference. In some statistical reports, an α level is not specified; only
significance levels are given. Thus, in the same report, some differences may be
reported as significant at the .001 level, some at the .01, and some at the .05
level. Regardless of how the results are reported, however, researchers view .05
as an important cutoff (Nelson, Rosenthal, and Rosnow, 1986). When .10 or .20
is used as an α level, a justification should be given.
The p in p<.05 is the probability of getting the sample statistic if H0 is true. This is
a simple definition that is easy to memorize. Nevertheless, the meaning of p is
commonly misinterpreted. Everitt and Hay (1992) report that among 70
academic psychologists, only 3 scored 100 percent in a six-item test on the
meaning of p. Here is what p is not:
Sampling distributions that are used to determine probabilities are always ones
that assume the null hypothesis is true. Thus, the probabilities these sampling
distributions give for a particular statistic are accurate when H0 is true. Thus, p is the
probability of getting the sample statistic if H0 is true.
4. Type II error: The mistake of failing to reject the null hypothesis when it is false.
The symbol β (beta) is used to represent the probability of a type II error.
Type II Error
The retention of a false null hypothesis is labeled a Type II error. A type II error,
symbolized with a Greek beta (β), is a “miss” – the investigator concludes that
there is nothing when there really is something.
Type I errors typically lead to changes that are unwarranted. Type II errors
typically lead to a maintenance of the status quo when a change is warranted.
The consequences of a Type I error are generally considered more serious than
the consequences of a Type II error, although there are certainly exceptions.
5. Controlling Type I and Type II Errors. The following practical considerations may
be relevant in controlling type I and type II errors:
a. For any fixed α, an increase in the sample size n will cause a decrease in β.
That is, a larger sample will lessen the chance that you will fail to reject a
false null hypothesis.
b. For any fixed sample size n, a decrease in α will cause an increase in β.
Conversely, an increase in α will cause a decrease in β.
c. To decrease both α and β, increase the sample size.
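One way to see these trade-offs concretely is a small simulation: draw many random samples, run a one-sample t test at a chosen α, and count how often H0 is rejected when it is actually true (Type I) and how often it is retained when it is actually false (Type II). The population values, sample sizes, and α below are arbitrary, and numpy and scipy are assumed to be available.

    import numpy as np
    from scipy.stats import ttest_1samp

    rng = np.random.default_rng(0)
    alpha, n_sims = 0.05, 5000

    def rejection_rate(true_mean, n, hypothesized_mean=100, sd=15):
        rejections = 0
        for _ in range(n_sims):
            sample = rng.normal(true_mean, sd, size=n)
            _, p = ttest_1samp(sample, hypothesized_mean)
            rejections += p < alpha
        return rejections / n_sims

    # H0 true (µ really is 100): rejection rate ≈ α, the Type I error rate
    print("Type I rate:", rejection_rate(true_mean=100, n=25))

    # H0 false (µ is actually 105): 1 − rejection rate ≈ β, the Type II error rate
    print("Type II rate, n = 25:", 1 - rejection_rate(true_mean=105, n=25))
    print("Type II rate, n = 100:", 1 - rejection_rate(true_mean=105, n=100))  # larger n, smaller β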
6. The following terms are associated with key components in the hypothesis-
testing procedure.
7. Degrees of Freedom
Walker (1940) summarizes this reasoning by stating: “A universal rule holds: The
number of degrees of freedom is always equal to the number of observations
minus the number of necessary relations obtaining among these observations.”
The conclusion of failing to reject the null hypothesis or rejecting it is fine for
those of us with the wisdom to take a statistics course, but then it’s usually
necessary to use simple, nontechnical terms in stating what the conclusion
suggests. Students often have difficulty formulating this final nontechnical
statement, which describes the practical consequence of the data and
computations. It’s important to be precise in the language used; the implications
of words such as “support” and “fail to reject” are very different. If you want to
justify some claim, state it in such a way that it becomes the alternative
hypothesis and then hope that the null hypothesis gets rejected. This claim
(alternative hypothesis) will be supported if you reject the null hypothesis. If, on
the other hand, your claim is stated in the null form, you will either reject or fail to
reject the claim; in either case you will not support the original claim.
Some texts say “accept the null hypothesis,” instead of “fail to reject the null
hypothesis.” Whether we use the term accept or fail to reject, we should
recognize that we are not proving the null hypothesis; we are merely saying that
the sample evidence is not strong enough to warrant rejection of the null
hypothesis. The term accept is somewhat misleading because it seems to
incorrectly imply that the null hypothesis has been proved. The phrase fail to
reject says more correctly that the available evidence isn’t strong enough to
warrant rejection of the null hypothesis. So, we will use the conclusion fail to
reject the null hypothesis, instead of accept the null hypothesis.
Level of measurement
Each of these approaches assumes that the dependent variable is measured at the
interval or ratio level, that is, using a continuous scale rather than discrete
categories. Whenever possible when designing your study, try to make use of
continuous, rather than categorical, measures of your dependent variable. This
gives you a wider range of possible techniques to use when analyzing your data.
Random sampling
The techniques assume that the scores are obtained using a random sample from
the population.
Independence of observations
The observations that make up your data must be independent of one another. That
is, each observation or measurement must not be influenced by any other
observation or measurement. Violation of this assumption, according to Stevens
(1996), is very serious.
Normal distribution
It is assumed that the populations from which the samples are taken are normally
distributed. In a lot of research (particularly in the social sciences), scores on the
dependent variable are not nicely normally distributed. Fortunately, most of the
techniques are reasonably ‘robust’ or tolerant of violations of this assumption.
Homogeneity of variance
Techniques in this section make the assumption that samples are obtained from
populations of equal variances. This means that the variability of scores for each of
the groups is similar.
Level of measurement
The scale of measurement for the variables should be interval or ratio (continuous).
The exception to this is if you have one dichotomous independent variable (with
only two values: e.g., gender) and one continuous variable. You should, however,
have roughly the same number of people or cases in each category of the
dichotomous variable.
Related pairs
Each subject must provide a score on both variable X and Y (related pairs). Both
pieces of information must be from the same subject.
Independence of observations
The observations that make up your data must be independent of one another.
That is, each observation or measurement must not be influenced by any other
observation or measurement.
Normality
Scores on each variable should be normally distributed.
Linearity
The relationship between the two variables should be linear. This means that when
you look at a scatterplot of scores you should see a straight line (roughly).
Homoscedasticity
The variability in scores for variable X should be similar at all values of variable Y.
Check the scatterplot. It should show a fairly even cigar shape along its length.
Formulas of the different statistical tests:
#" ! # !
$=
"! "!
+
!" ! !
!" + ! ! ! !
#" ! # ! #" ! # !
$= =
"% "%
!
"" "
!
(" ")
"
#" = !
! !!
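Assuming scipy is available, both tests can be run directly: ttest_ind handles independent samples (pooling the variances by default) and ttest_rel handles dependent samples. The scores below are made up.

    from scipy.stats import ttest_ind, ttest_rel

    group1 = [85, 90, 78, 92, 88, 76, 81, 95]   # hypothetical data
    group2 = [79, 84, 72, 88, 80, 70, 77, 86]
    t, p = ttest_ind(group1, group2)            # independent samples
    print("independent:", t, p)

    pre  = [12, 15, 11, 18, 14, 16, 13, 17]     # hypothetical paired data
    post = [14, 18, 12, 20, 15, 19, 15, 18]
    t, p = ttest_rel(post, pre)                 # dependent (correlated) samples
    print("dependent:", t, p)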
3. One-Way Analysis of Variance
df_B = k − 1,   df_W = N − k
MS_B = SS_B / df_B
MS_W = SS_W / df_W
F = MS_B / MS_W

where k is the number of groups, N is the total number of scores, and SS_B and SS_W
are the between-groups and within-groups sums of squares.
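A quick way to obtain the F ratio, assuming scipy is available, is f_oneway; the three groups are hypothetical.

    from scipy.stats import f_oneway

    group_a = [24, 27, 21, 30, 26]   # hypothetical data
    group_b = [31, 35, 29, 33, 34]
    group_c = [22, 25, 20, 23, 24]

    F, p = f_oneway(group_a, group_b, group_c)
    print(f"F = {F:.2f}, p = {p:.4f}")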
" ! #! " ! # ! !
$=
[" ! # !
" (! # )
!
][" ! ! !
" (! ! )
!
]
5. Spearman rho
$" " %
# ="!
! #! % ! "!
6. Chi-Square Goodness-of-Fit Test

χ² = Σ ( (O − E)² / E )

where O is the observed frequency and E is the expected frequency for each category.
7. Chi-Square Test of Association

χ² = Σ ( (O − E)² / E )
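Assuming scipy, chisquare handles the goodness-of-fit case and chi2_contingency handles the test of association (it also computes the expected frequencies from the table). The frequencies below are hypothetical.

    from scipy.stats import chisquare, chi2_contingency

    # Goodness of fit: observed vs. expected category frequencies
    observed = [18, 22, 20, 40]
    expected = [25, 25, 25, 25]
    chi2, p = chisquare(observed, f_exp=expected)
    print("goodness of fit:", chi2, p)

    # Test of association: 2 x 3 table of observed counts
    table = [[10, 20, 30],
             [15, 25, 20]]
    chi2, p, df, expected_freqs = chi2_contingency(table)
    print("association:", chi2, p, df)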
8. Mann-Whitney U Test

U1 = n1n2 + n1(n1 + 1)/2 − R1
U2 = n1n2 + n2(n2 + 1)/2 − R2

where R1 and R2 are the sums of the ranks assigned to the first and second samples.

When both samples are relatively large, around 20, the following normal approximation
is used:

z = (U − µU) / σU = ( U − n1n2/2 ) / √( n1n2(n1 + n2 + 1) / 12 )
9. Wilcoxon Signed-Ranks Test

z = (T − µT) / σT = ( T − n(n + 1)/4 ) / √( n(n + 1)(2n + 1) / 24 )

where T is the smaller of the sums of the positive and negative ranks and n is the number
of nonzero differences.
10. Kruskal-Wallis H Test

H = ( 12 / (N(N + 1)) ) Σ ( Rk² / nk ) − 3(N + 1)

where N is the total number of observations, nk is the number of observations in the kth
group, and Rk is the sum of the ranks in the kth group.