
STATISTICS FOR
Better Business Decisions

Dr. Gordon W. McClung


Statistics Overview

In this section we address a general question: "How do we know?" or, phrased differently, "Can we ever be 100% certain?" The general concept of the value of information is discussed, and the sections of the text are outlined.
How do we know?

BEST GUESS?
1. How do we know?
2. Empirical approach
3. Scientific inquiry
4. Scientific method
5. Value of Information

One of the befuddling aspects of statistics is that we never really "KNOW", at least not in the sense of being 100% certain, with no possible way this can be wrong. By the very nature of probability we can approach absolute certainty, but we can never reach a state of absolute certainty. This concept of probability without certainty is one that causes many to question the use of statistics. Yet in a practical sense we all use statistics. It may not be in the formal sense we will be using in this text, but we do use our beliefs about the probability of an event or object to help us make decisions every day.

For example, when you drive past a gas station, what do the prices on the sign signify and how do you interpret the information? In its raw form the prices are simply data points. Every day we may drive past multiple gas stations. Each station has its price per gallon displayed on a large sign for us to see. We collect these data points and consciously calculate what we believe is the going price for a gallon of gas. In essence, we have taken data, organized the information, and generated a descriptive statistic (the average price for gas). Albeit an average based only on our own observations, it is a statistic (derived from data) that we use as representative of the "going" (average) price of gas (for the population of gas stations) to help us make decisions.
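As a rough sketch of this everyday calculation, the following Python snippet computes the descriptive statistic described above; the prices are made up for illustration.

```python
# Hypothetical gas prices (dollars per gallon) observed while driving past stations.
observed_prices = [3.49, 3.55, 3.42, 3.59, 3.51]

# The "going" price is the sample mean, a descriptive statistic derived from the data.
going_price = sum(observed_prices) / len(observed_prices)
print(f"Estimated going price: ${going_price:.2f} per gallon")
```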

How do you use this average gas price (a statistic)? Let's say that the next day you need to purchase gas for the car. You are driving past a gas station where the price of gas is posted as $0.10 below what you believe the average price of gas to be (based on recent observations). Do you stop and get gas at this station? Maybe.

Why maybe? This brings us to that nagging thing called probability. To help understand probability, think of how different your reaction to a price drop would be if prices had been staying the same for the past six months versus fluctuating. For this example, let's assume you have been observing fluctuations in gas prices: some days the price is rising, other days you observe the price is dropping. How confident are you that this $0.10 difference is below what your regular gas station will be charging? Will the price be $0.15 lower at the next station?

This is where you use your built-in statistical calculator to make a decision. You estimate the probability of the price difference and the chance that all stations have lowered gas prices versus this one station being outside the normal range of price variations you have been observing. If you come to the conclusion that the price of gas at this station is below what you can expect from other stations, you are likely to purchase from this location. This is the essence of statistical inference. We take data from a sample to represent the general population. We transform the data from its raw form by sorting, describing, and analyzing it. This allows us to make inferences about the world at large (the population represented by our sample).
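One informal way to frame this judgment is to compare the posted discount with the variation you have been observing. The sketch below expresses that idea as a simple standardized difference (z-score); the prices, and the z-score framing itself, are an illustrative assumption rather than a method prescribed by the text.

```python
import statistics

# Hypothetical prices observed over recent weeks at nearby stations.
recent_prices = [3.49, 3.55, 3.42, 3.59, 3.51, 3.46, 3.53]

mean_price = statistics.mean(recent_prices)
std_price = statistics.stdev(recent_prices)

# A station advertising $0.10 below your current estimate of the going price.
posted_price = mean_price - 0.10

# How many standard deviations below the typical price is the posted price?
z = (posted_price - mean_price) / std_price
print(f"mean={mean_price:.2f}, sd={std_price:.2f}, z={z:.2f}")
# A large negative z suggests the discount is unusual rather than normal fluctuation.
```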
An Empirical Approach to Business Decisions

Managers need information in order to introduce products and services that create value in the mind of the customer. But the perception of value is a subjective one, and what customers value this year may be quite different from what they value next year. As such, the attributes that create value cannot simply be deduced from common knowledge. Rather, data must be collected and analyzed. The goal of research is to provide the facts and direction that managers need to make their more important decisions. The value of the information provided is dependent on how well one effectively and efficiently executes the research process.

"SWAG", "shooting from the hip", "flying by the seat of your pants", and having a "gut feeling" are all expressions that characterized business practice during its early stages of development. Now, however, business managers increasingly turn to more scientific procedures in every phase of operation: planning, forecasting, budgeting, investment decisions, market analysis, advertising and promotion, production planning and control, inventory control, finance and accounting, and personnel selection and training, to mention some major areas.

Scientific procedures involve the use of models to describe, analyze, and make predictions. A model can be a well-defined set of descriptions and procedures, like the Product Life Cycle in marketing, or it can be a scaled-down analog of the real thing, such as an engineer's scale model of a car.

Models that are useful and valuable in managing business operations are broadly termed "statistical models". A large number of these have been developed to assist researchers in a variety of fields, such as agriculture, psychology, education, communication, and military tactics, as well as in business. Only the professional statistician would be expected to understand the full range of such statistical models.

The question then is, "How can the average business manager be expected to understand and use such models?" First, the average business manager does not need to understand and use all the models. There are certain models that have more frequent business application than others. The statistical methods presented in this text have been chosen to cover a variety of problems generally encountered in business. The reader is encouraged to seek out additional readings for detailed coverage of each model and additional statistical methods beyond the scope of this text.

Second, the approach to learning models is to emphasize the similarity in logical structure among various models, and to stress understanding of the assumptions inherent in each model. This makes it possible to make sense of a seeming quagmire of statistical methods, and to clearly and logically determine when it is appropriate to use a specific statistical model. As a manager reading research reports, it is then possible to determine whether the correct techniques were utilized.

Third, the average business manager cannot be expected to use statistical models if he or she does not understand what can go wrong or what the limitations of the techniques are. Consequently, we will address the limitations of the models, emphasizing what the techniques do not say, rather than simply how to properly interpret conclusions from statistical tests.

Finally, the average business manager is not expected to be a statistician. The objective is to train a manager who can properly understand and interpret results from statistical models, a manager who will ask the right questions of the researcher or statistician in order to evaluate and apply results.

Empiricism, the scientific method, refers to using direct observation to obtain knowledge. Thus, the empirical approach to acquiring knowledge is based on making observations of individuals or objects of interest. As illustrated by the gas price example, everyday observation is an application of the scientific approach. In general, your decision about which station to purchase gas from when you observe the price is a microcosm of the processes we use to draw statistical inferences. Unfortunately, generalizations based on everyday observations are often misleading. In the context of making business decisions we will require a precise estimate of the probability that we are drawing a correct conclusion from the sample of data we have available. A major distinction between research and everyday observation is that research is planned in advance. Based on a theory or hunch, researchers develop research questions and then plan what, when, where, and how to observe in order to answer the questions.

What (or whom) to observe: the population. When a population is large, researchers often plan to observe only a sample (i.e., a subset of a population). Planning how to draw an adequate sample is, of course, critical in conducting valid research.

When the observations will be made (morning, night, ...). Researchers realize that the timing of their observations may affect the results of their investigations.

Where to make the observations. For instance, will the observations be made in a quiet room or in a busy shopping mall?

How to observe. Use an existing questionnaire or develop a new survey instrument. For instance, researchers might build or adopt existing interviews, questionnaires, personality scales, etc., to use in making observations.

The observations that researchers make result in data. The data might be the brands participants plan to purchase, or the data might be respondent scores on a scale that measures preference. In this context, variables are things that we measure, control, or manipulate in research. The participants (respondents) together with the variables represent our data. Think of the data file as a spreadsheet in Excel with each respondent represented by a row of data and each variable represented by a column.

General Characteristics of Scientific Inquiry:

1. Use objectivity – freedom from bias

2. Obtain total evidence – all relevant facts

3. Seek general patterns – laws of nature

4. Use theories involving general laws – understanding and prediction

5. Require empirical verification – predicted results

6. Disclose all methods and assumptions

7. State proficiency of methods – degree of success
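As a rough illustration of the data file described above (respondents as rows, variables as columns), the following sketch uses the pandas library with hypothetical variables; the column names and values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data file: each row is a respondent, each column a variable.
data = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4],
    "brand_planned": ["A", "B", "A", "C"],   # nominal variable
    "preference_score": [7, 4, 6, 9],        # scale score measuring preference
    "age": [34, 52, 28, 45],
})

# Quick descriptive summary of each variable (column).
print(data.describe(include="all"))
```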

Objective of Science is to Predict with Understanding:

Why is understanding necessary if prediction is okay without it? Theories in business are generally not very highly refined and represent a great deal of abstraction from the real-world situation. The use of statistics does not overcome these deficiencies; it may even fool some people by making them think they understand more than they do. It is essential to be constantly aware of what is known and what is not known.

Two Logics Used in Scientific Method:

Deductive

• All S is G & W

• X is S

• Therefore, X is G & W

Closely reasoned: if (a) and (b) are true, (c) must be true. This logic is used in making mathematical proofs.

Inductive

• Almost all S is G & W

• X is S

• Therefore, X is almost certain to be G & W

X can be explained relative to several sets of empirical data. Thus, one must either include total evidence or have only a "potential explanation." Most applications of statistical reasoning in business situations are only potential explanations. It is generally impossible or too expensive to obtain total evidence.

Objectives of Scientific Procedure:

1. To abstract from many factors those causally related to the outcome or problem.

2. To define a relational structure among factors that can be expressed mathematically.

3. To specify conditions wherein the relational structure will adequately represent the real situation, or can be related to it.

4. To use the relational structure to predict with understanding or test the observed outcome.

To maximize the benefit of research, those who use it need to understand the research process and its limitations. One of the most difficult aspects of business decision-making is clearly defining the problem. The decision problem faced by management must be translated into a research problem in the form of questions that define the information that is required to make the decision and how this information can be obtained.

The decision problem is translated into a research problem. For example, a decision problem may be whether to launch a new product. The corresponding research problem might be to assess whether the market would accept the new product.

The objective of the research should be defined clearly. To ensure that the true decision problem is addressed, it is useful for the researcher to outline possible scenarios of the research results and then for the decision maker to formulate plans of action under each scenario.

The Value of Information

Information can be useful, but what determines its real value to the organization? In general, the value of information is determined by:

1. The validity and reliability of the information.

2. The level of indecisiveness that would exist without the information.

3. The cost of the information in terms of time and money.

4. The ability and willingness to act on the information.

To ascertain the probability that we are drawing a correct conclusion by providing accurate information, we need to start at the beginning. We must answer several basic questions about the data we are using and the techniques we have employed to analyze the data and generate information. These areas of inquiry lead us to the organization of this text.

First, we will review the general concept of statistics. This will include:

• General concepts and assumptions.

• Distinguishing between a sample and the population.

• Descriptive versus inferential statistics.

• Levels of measurement.

Once we have reviewed the fundamental concepts we will focus on obtaining information from data. We will discuss the pros and cons of various sampling procedures.

The second section, descriptive statistics, is dedicated to describing the data we have collected, including:

• Frequencies and proportions.

• Tables and cross tabulations.

• Graphical representations.

• Mean, median and mode.

• Range, variance, and standard deviation.

• Measures of association.

• Tests of normality.

The third section of the text outlines statistical procedures used to draw inferences about the population we are studying, including:

• Hypothesis testing, including Type I and Type II error.

• Confidence intervals and the normal distribution.

• Testing against a hypothesized value.

In the fourth section we apply our understanding of hypothesis testing to situations where we are interested in comparing means. We often want to compare the means from two or more groups. In this section we review the use of the t-test for two means and ANOVA for comparing two or more means.

• Comparing two means.

• Comparing three or more means (ANOVA).

The fifth section focuses on the development of linear equations to help in making better decisions and predictions. We initially develop basic bivariate equations and then expand on our foundation to develop multiple regression models.

• Linear regression.

• Multiple regression.

The final section of the text revisits the question "How do we know?" by posing the question, "What do we know?" The section provides a review of key sections of a research report coupled with a summary of the techniques and procedures covered. A general guide on when to use each technique is provided as a reference.

Statistics is about this whole process used to answer questions and make decisions. Effective decision-making involves correctly designing studies, collecting unbiased data, describing the data with numbers and graphs, analyzing the data to draw inferences, and reaching conclusions based on the transformation of data into information. The next section outlines the several elements in research and statistics. It is imperative that the research process and methodologies be clearly articulated in the research report to aid the decision maker in understanding the findings and recommendations.

Section 2

Introduction to Research and Statistics

RESEARCH & STATISTICS
1. Value of Information
2. Research Process
3. Research Report

Managers need information in order to introduce products and services that create value in the mind of the customer. But the perception of value is a subjective one, and what customers value this year may be quite different from what they value next year. As such, the attributes that create value cannot simply be deduced from common knowledge. Rather, data must be collected and analyzed. The goal of research is to provide the facts and direction that managers need to make their more important management decisions.

To maximize the benefit of research, those who use it need to understand the research process and its limitations.

The Value of Information

Information can be useful, but what determines its real value to the organization? In general, the value
of information is determined by:

• The ability and willingness to act on the information.

• The accuracy of the information.

• The level of indecisiveness that would exist without the information.

• The amount of variation in the possible results.

• The level of risk aversion.

• The reaction of competitors to any decision improved by the information.

• The cost of the information in terms of time and money.

The Research Process

Once the need for research has been established, most research projects involve these steps:

1. Define the problem

2. Determine research design

3. Identify data types and sources

4. Design data collection forms and questionnaires

5. Determine sample plan and size

6. Collect the data

7. Analyze and interpret the data

8. Prepare the research report

Problem Definition

The decision problem faced by management must be translated into a research problem in the form of questions that define the information that is required to make the decision and how this information can be obtained. Thus, the decision problem is translated into a research problem. For example, a decision problem may be whether to launch a new product. The corresponding research problem might be to assess whether the market would accept the new product.

The objective of the research should be defined clearly. To ensure that the true decision problem is addressed, it is useful for the researcher to outline possible scenarios of the research results and then for the decision maker to formulate plans of action under each scenario. The use of such scenarios can ensure that the purpose of the research is agreed upon before it commences.

Research Design

Research can be classified in one of three categories:

• Exploratory research

• Descriptive research

• Causal research

These classifications are made according to the objective of the research. In some cases the research will fall into one of these categories, but in other cases different phases of the same research project will fall into different categories.

Exploratory research has the goal of formulating problems more precisely, clarifying concepts, gathering explanations, gaining insight, eliminating impractical ideas, and forming hypotheses. Exploratory research can be performed using a literature search, surveying certain people about their experiences, focus groups, and case studies. When surveying people, exploratory research studies would not try to acquire a representative sample, but rather seek to interview those who are knowledgeable and who might be able to provide insight concerning the relationship among variables. Case studies can include contrasting situations or benchmarking against an organization known for its excellence. Exploratory research may develop hypotheses, but it does not seek to test them. Exploratory research is characterized by its flexibility.

Descriptive research is more rigid than exploratory research and seeks to describe users of a product, determine the proportion of the population that uses a product, or predict future demand for a product. As opposed to exploratory research, descriptive research should define the questions, the people surveyed, and the method of analysis prior to beginning data collection. In other words, the who, what, where, when, why, and how aspects of the research should be defined.

Such preparation allows one the opportunity to make any required changes before the costly process of data collection has begun.

There are two basic types of descriptive research: longitudinal studies and cross-sectional studies. Longitudinal studies are time series analyses that make repeated measurements of the same individuals, thus allowing one to monitor behavior such as brand switching. However, longitudinal studies are not necessarily representative since many people may refuse to participate because of the commitment required. Cross-sectional studies sample the population to make measurements at a specific point in time. A special type of cross-sectional analysis is a cohort analysis, which tracks an aggregate of individuals who experience the same event within the same time interval over time. Cohort analyses are useful for long-term forecasting of product demand.

Causal research seeks to find cause-and-effect relationships between variables. It accomplishes this goal through laboratory and field experiments.

Data Types and Sources

Secondary Data

Before going through the time and expense of collecting primary data, one should check for secondary data that previously may have been collected for other purposes but that can be used in the immediate study. Secondary data may be internal to the firm, such as sales invoices and warranty cards, or may be external to the firm, such as published data or commercially available data. The government census is a valuable source of secondary data.

Secondary data has the advantage of saving time and reducing data gathering costs. The disadvantages are that the data may not fit the problem perfectly and that the accuracy may be more difficult to verify for secondary data than for primary data.

Some secondary data is republished by organizations other than the original source. Because errors can occur and important explanations may be missing in republished data, one should obtain secondary data directly from its source. One also should consider who the source is and whether the results may be biased.

There are several criteria that one should use to evaluate secondary data:

• Whether the data is useful in the research study.

• How current the data is and whether it applies to the time period of interest.

• Errors and accuracy - whether the data is dependable and can be verified.

• Presence of bias in the data.

• Specifications and methodologies used, including data collection method, response rate, quality and analysis of the data, sample size and sampling technique, and questionnaire design.

• Objective of the original data collection.

• Nature of the data, including definition of variables, units of measure, categories used, and relationships examined.

Primary Data

Often, secondary data must be supplemented by primary data originated specifically for the study at hand. Some common types of primary data are:

• demographic and socioeconomic characteristics

• psychological and lifestyle characteristics

• attitudes and opinions

• awareness and knowledge

• intentions - for example, purchase intentions. While useful, intentions are not a reliable indication of actual future behavior.

• motivation - a person's motives are more stable than his/her behavior, so motive is a better predictor of future behavior than is past behavior.

• behavior

• quality measurements

• performance data

• arrival times

Primary data can be obtained by experimentation or by observation. A common method is communication, which involves questioning respondents either verbally or in writing. This method is versatile, since one needs only to ask for the information; however, the response may not be accurate. Direct communication usually is quicker and cheaper than observation. Observation involves the recording of actions and is performed by either a person or some mechanical or electronic device. Observation is less versatile than communication since some attributes of a person may not be readily observable, such as attitudes, awareness, knowledge, intentions, and motivation. Observation also might take longer since observers may have to wait for appropriate events to occur, though observation using scanner data might be quicker and more cost effective. Observation typically is more accurate than communication.

Personal interviews have an interviewer bias that mail-in questionnaires do not have. For example, in a personal interview the respondent's perception of the interviewer may affect the responses.

Questionnaire or Experimental Design

The questionnaire and the experimental design are very important tools for gathering primary data. Poorly constructed questions or designs can result in large errors and invalidate the research data, so significant effort should be put into the design. A questionnaire should be tested thoroughly prior to conducting the survey. Scenarios from the experimental design should be evaluated prior to conducting the study.

Measurement Scales

Attributes can be measured on nominal, ordinal, interval, and ratio scales:

• Nominal numbers are simply identifiers, with the only permissible mathematical use being for counting. Example: social security numbers.

• Ordinal scales are used for ranking. The interval between the numbers conveys no meaning. Median and mode calculations can be performed on ordinal numbers. Example: class ranking.

• Interval scales maintain an equal interval between numbers. These scales can be used for ranking and for measuring the interval between two numbers. Since the zero point is arbitrary, ratios cannot be taken between numbers on an interval scale; however, mean, median, and mode are all valid. Example: temperature scale.

• Ratio scales are referenced to an absolute zero value, so ratios between numbers on the scale are meaningful. In addition to mean, median, and mode, geometric averages also are valid. Example: weight.

The scale of measurement has implications for the appropriateness of various statistical techniques and models in data analysis.
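As an informal illustration of how the scale of measurement limits the summaries that make sense, the following sketch uses small, hypothetical values for each scale type; the data and variable names are assumptions made for the example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical values measured on each type of scale.
zip_codes   = ["26505", "26501", "26505"]   # nominal: identifiers, counting only
class_ranks = [1, 2, 3, 4, 5]               # ordinal: order matters; median/mode are meaningful
temps_f     = [68.0, 72.5, 75.0]            # interval: arbitrary zero; mean is valid, ratios are not
weights_kg  = [12.0, 24.0, 36.0]            # ratio: absolute zero; ratios are meaningful

print(Counter(zip_codes))                   # counts are the appropriate summary for nominal data
print(median(class_ranks))                  # 3, a valid summary for ordinal data
print(mean(temps_f))                        # mean is valid for interval data
print(weights_kg[1] / weights_kg[0])        # 2.0 -- "twice as heavy" only makes sense on a ratio scale
```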

Validity and Reliability

The validity of a test is the extent to which differences in scores reflect differences in the measured characteristic. Predictive validity is a measure of the usefulness of a measuring instrument as a predictor. Proof of predictive validity is determined by the correlation between results and actual behavior. Construct validity is the extent to which a measuring instrument measures what it intends to measure.

Reliability is the extent to which a measurement is repeatable with the same results. A measurement may be reliable and not valid. However, if a measurement is valid, then it also is reliable, and if it is not reliable, then it cannot be valid. One way to show reliability is to show stability by repeating the test with the same results.

Sampling Plan

The sampling frame is the pool from which the interviewees are chosen. The telephone book often is used as a sampling frame, but it has some shortcomings. Telephone books exclude those households that do not have telephones and those households with unlisted numbers. Since a certain percentage of the numbers listed in a phone book are out of service, there are many people who have just moved who are not sampled. Such sampling biases can be overcome by using random digit dialing. Mall intercepts represent another sampling frame, though there are many people who do not shop at malls, and those who shop more often will be over-represented unless their answers are weighted in inverse proportion to their frequency of mall shopping.

In designing the research study, one should consider the potential errors. Two sources of errors are random sampling error and non-sampling error. Sampling errors are those due to the fact that there is a non-zero confidence interval around the results because the sample size is less than the population being studied. Non-sampling errors are those caused by faulty coding, untruthful responses, respondent fatigue, etc.

There is a tradeoff between sample size and cost. The larger the sample size, the smaller the sampling error but the higher the cost. After a certain point the smaller sampling error cannot be justified by the additional cost.

While a larger sample size may reduce sampling error, it actually may increase the total error. There are two reasons for this effect. First, a larger sample size may reduce the ability to follow up on non-responses. Second, even if there is a sufficient number of interviewers for follow-ups, a larger number of interviewers may result in a less uniform interview process.

Data Collection

In addition to sampling error, the actual data collection process will introduce additional errors. These errors are called non-sampling errors. Some non-sampling errors may be intentional on the part of the interviewer, who may introduce a bias by leading the respondent to provide a certain response. The interviewer also may introduce unintentional errors, for example, due to not having a clear understanding of the interview process or due to fatigue.

Respondents also may introduce errors. A respondent may introduce intentional errors by lying or simply by not responding to a question. A respondent may introduce unintentional errors by not understanding the question, guessing, not paying close attention, and being fatigued or distracted.

The research study should be designed to minimize non-sampling errors.

Data Analysis - Preliminary Steps

Before analysis can be performed, raw data must be transformed into the right format. First, it must be edited so that errors can be corrected or omitted. The data must then be coded; this procedure converts the edited raw data into numbers or symbols. A codebook is created to document how the data was coded. Finally, the data is tabulated to count the number of samples falling into various categories.

Simple tabulations count the occurrences of each variable independently of the other variables. Cross tabulations, also known as contingency tables or cross tabs, treat two or more variables simultaneously. However, since the variables are displayed in a two-dimensional table, cross tabbing more than two variables is difficult to visualize, since more than two dimensions would be required. Cross tabulation can be performed for nominal and ordinal variables.

Cross tabulation is the most commonly utilized data analysis method in research. Many studies take the analysis no further than cross tabulation. This technique divides the sample into sub-groups to show how the dependent variable varies from one subgroup to another. A third variable can be introduced to uncover a relationship that initially was not evident.
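A minimal sketch of a cross tabulation, using the pandas crosstab function and hypothetical survey data (the region and brand values are assumptions), might look like this:

```python
import pandas as pd

# Hypothetical respondent data: region and preferred brand, both nominal variables.
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South", "East"],
    "brand":  ["A",     "B",     "A",     "A",     "B",     "B"],
})

# Cross tabulation (contingency table) of region by brand.
table = pd.crosstab(df["region"], df["brand"])
print(table)
```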
Hypothesis Testing

A basic fact about testing hypotheses is that a hypothesis may be rejected but that the hypothesis never can be unconditionally accepted until all possible evidence is evaluated. In the case of sampled data, the information set cannot be complete. So if a test using such data does not reject a hypothesis, the conclusion is not necessarily that the hypothesis should be accepted.

The null hypothesis in an experiment is the hypothesis that the independent variable has no effect on the dependent variable. The null hypothesis is expressed as H0. This hypothesis is assumed to be true unless proven otherwise. The alternative to the null hypothesis is the hypothesis that the independent variable does have an effect on the dependent variable. This hypothesis is known as the alternative, research, or experimental hypothesis and is expressed as H1. This alternative hypothesis states that the relationship observed between the variables cannot be explained by chance alone.

There are two types of errors in evaluating a hypothesis:

• Type I error: occurs when one rejects the null hypothesis and accepts the alternative, when in fact the null hypothesis is true.

• Type II error: occurs when one accepts the null hypothesis when in fact the null hypothesis is false.

Because their names are not very descriptive, these types of errors sometimes are confused. Some people jokingly define a Type III error to occur when one confuses Type I and Type II. To illustrate the difference, it is useful to consider a trial by jury in which the null hypothesis is that the defendant is innocent. If the jury convicts a truly innocent defendant, a Type I error has occurred. If, on the other hand, the jury declares a truly guilty defendant to be innocent, a Type II error has occurred.

Hypothesis testing involves the following steps:

• Formulate the null and alternative hypotheses.

• Choose the appropriate test.

• Choose a level of significance (alpha) - determine the rejection region.

• Gather the data and calculate the test statistic.

• Determine the probability of the observed value of the test statistic under the null hypothesis, given the sampling distribution that applies to the chosen test.

• Compare the value of the test statistic to the rejection threshold.

• Based on the comparison, reject or do not reject the null hypothesis.

• Make the research conclusion.

In order to analyze whether research results are statistically significant or simply due to chance, a test of statistical significance can be run.
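As a hedged illustration of these steps, the sketch below tests a sample against a hypothesized value using a one-sample t-test from SciPy. The gas-price data, the hypothesized mean, and the alpha level are all hypothetical; the text itself does not prescribe this particular dataset.

```python
from scipy import stats

# Hypothetical sample: posted prices (dollars per gallon) at randomly chosen stations.
prices = [3.55, 3.42, 3.61, 3.48, 3.57, 3.44, 3.52, 3.59]

alpha = 0.05                 # chosen level of significance
hypothesized_mean = 3.50     # H0: the population mean price is $3.50; H1: it is not

# Calculate the test statistic and its probability under the null hypothesis.
t_stat, p_value = stats.ttest_1samp(prices, popmean=hypothesized_mean)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

# Compare to the rejection threshold and state the research conclusion.
if p_value < alpha:
    print("Reject H0: the mean price appears to differ from $3.50.")
else:
    print("Do not reject H0: the difference could be explained by chance.")
```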

Tests of Statistical Significance

Chi-square

The chi-square (χ2) goodness-of-fit test is used to determine whether a set of proportions have specified numerical values. We will review chi-square as a technique and see how it is used to analyze bivariate cross-tabulated data.
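A sketch of a goodness-of-fit test along these lines, using SciPy and hypothetical brand-choice counts, could look like the following; the specified proportions here are simply equal shares, an assumption made for the example.

```python
from scipy import stats

# Hypothetical brand-choice counts from 200 respondents across four brands.
observed = [70, 60, 40, 30]

# H0: the brands are chosen in the specified proportions (equal shares of 200 here).
expected = [50, 50, 50, 50]

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
# A small p-value suggests the observed proportions differ from the specified values.
```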
Student's t-Test

Another test of statistical significance is the t-test. Many instances occur in business where it is desirable to test whether the difference between two sample outcomes is statistically significant, or is just a chance occurrence due to sampling error. For example, an owner of a fleet of trucks buys two brands of tires and needs to know if they provide equal mileage or equal tread wear. It is possible to test the difference between two means using the t distribution, which is the best test to use when its assumptions can be met.
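As a sketch of the tire example, assuming hypothetical tread-wear measurements for the two brands, a two-sample t-test with SciPy might look like this:

```python
from scipy import stats

# Hypothetical tread wear (thousands of miles) for two brands of tires in the fleet.
brand_a = [42.1, 39.5, 41.0, 43.2, 40.8, 38.9, 41.7]
brand_b = [44.0, 42.8, 45.1, 43.5, 44.6, 42.2, 43.9]

# H0: the two brands have equal mean tread wear.
t_stat, p_value = stats.ttest_ind(brand_a, brand_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# A small p-value indicates the difference between the two sample means
# is unlikely to be a chance occurrence due to sampling error.
```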
ANOVA

Another test of significance is the Analysis of Variance (ANOVA) test. The primary purpose of ANOVA is to test for differences between multiple means. Whereas the t-test can be used to compare two means, ANOVA is needed to compare three or more means. If multiple t-tests were applied, the probability of a Type I error (rejecting a true null hypothesis) would increase as the number of comparisons increases.

ANOVA is efficient for analyzing data using relatively few observations and can be used with categorical variables. Regression can perform a similar analysis to that of ANOVA.

Regression

The general purpose of regression is to learn more about the relationship between an independent or predictor variable and a dependent or criterion variable. Regression procedures are widely used in research. In general, multiple regression allows the researcher to ask (and hopefully answer) the general question "What is the best predictor of Y?" The regression line expresses the best prediction of the dependent variable (Y), given the independent variables (X). However, nature is rarely perfectly predictable, and usually there is substantial variation of the observed points around the fitted regression line.
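A minimal sketch of fitting a bivariate regression line is shown below using SciPy's linregress; the advertising and sales figures are hypothetical values chosen only to illustrate the fit.

```python
from scipy import stats

# Hypothetical data: advertising spend (X, $000) and sales (Y, $000).
ad_spend = [10, 15, 20, 25, 30, 35, 40]
sales    = [120, 135, 155, 160, 178, 190, 205]

# Fit the line that best predicts Y from X.
result = stats.linregress(ad_spend, sales)
print(f"sales = {result.intercept:.1f} + {result.slope:.2f} * ad_spend")
print(f"r-squared = {result.rvalue**2:.3f}")  # variation explained; the rest is scatter around the line
```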
Research Report

The format of the research report varies with the needs of the organization. The report often contains the following sections:

• Purpose and background research

• Table of contents

• Executive summary

• Research objectives

• Methodology

• Results

• Limitations

• Conclusions and recommendations

• Appendices containing copies of the questionnaires, etc.

Research by itself does not arrive at business decisions, nor does it guarantee that the organization will be successful in marketing or manufacturing its products. However, when conducted in a systematic, analytical, and objective manner, research can reduce the uncertainty in the decision-making process and increase the probability and magnitude of success.

Chapter 1

General Statistical Concepts

This chapter offers an overview of general statistics. This is a review only. The reader is encouraged to do additional research and readings in specific areas of interest.
Section 1

General Statistical Concepts

STATISTICAL CONCEPTS
1. Experiment
2. Observation
3. Probability

In this section we will review elementary statistical concepts that provide the foundation for an understanding of any area of statistical data analysis. The topics covered are considered necessary for one to understand the "quantitative nature" of reality. Statistics is the practice of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. Statistics provides a measure of confidence in any conclusions.

Experiment vs Observation

When designing a research project, a distinction is drawn between experimental and observational research. In experimental research, we manipulate some variables and measure the effects of this manipulation on other variables. In observational research we try not to influence any variables but instead measure them and look for correlations between some sets of variables.

Experiments

An experiment is a study in which treatments are given to see how the participants respond to them.
We all conduct informal experiments in our everyday lives. For example:

• We might try a new dry cleaner (the treatment) to see if our clothes are cleaner (the response)
than when we used our old service.

• A teacher might bring to class a new video (the treatment) to see if students enjoy it (the
response).

In an experiment, the treatments are called the independent variable, and the responses are called the dependent variable. The treatments (independent variables) are administered so researchers can observe possible changes in response (dependent variables).

Clearly, the purpose of experiments is to identify cause-and-effect relationships, in which the independent variable is the possible cause and the dependent variable demonstrates the possible effect.

Unfortunately, informal experiments can be misleading. For instance, suppose a waiter notices that when he is friendlier, he gets larger tips than when he is less friendly. Did the increased friendliness cause the increase in the tips? The answer is not clear. Perhaps, by chance, the evening that the waiter tried being more friendly, he happened to have been more efficient. The possible alternative explanations are almost endless unless an experiment is planned in advance to eliminate them.

To aid in our understanding of the cause and effect we need to have an appropriate control condition. For instance, we could have the waiter be friendlier to every alternate party of customers. These customers would constitute the experimental group. The remaining customers, who receive the normal amount of friendliness, would be referred to as the control group. Then, statistics could be used to compare the average tips earned under the more friendly condition with those earned under the less friendly condition.

Observation

An observational study is one in which data is collected on individuals in a way that doesn't affect them. The most common nonexperimental study is the survey. Surveys are questionnaires that are presented to individuals who have been selected from a population of interest. Surveys take on many different forms: paper surveys sent through the mail; Web sites; call-in polls conducted by TV networks; and phone surveys.

A nonexperimental study, or descriptive study, is defined as a study in which observations are made to determine the status of what exists at a given point in time without the administration of treatments. An example is a survey conducted to determine participants' attitudes. In such a study, researchers strive not to change the participants' attitudes. A researcher can obtain solid data on the attitudes held by participants with a proper sample and appropriate questions.

Surveys can be very useful tools for getting information. However, if not conducted properly, surveys can result in bogus or misleading information. Some problems include improper wording of questions, misleading or leading questions, the number of people who do not respond, or excluding a group in the population who had no chance of responding. These potential problems mean an observational study has to be well thought out before it is administered.

One cautionary note: an observational study (survey) of attitudes will not gather data on how to change attitudes. To do this, one would need to conduct an experiment. Data from an observational study can only be interpreted in causal terms based on a theory that we have; correlational data cannot prove causality.

Probability

At the core of statistics is the concept of probability. The following is a brief introduction to this core concept.

Discrete or Continuous Variables

All probability distributions can be classified as discrete probability distributions or as continuous probability distributions, depending on whether they define probabilities associated with discrete variables or continuous variables. In the discrete case, one can easily assign a probability to each possible value: when throwing a die, each of the six values 1 to 6 has the probability 1/6. In contrast, when a random variable takes values from a continuum, probabilities are nonzero only if they refer to finite intervals: in quality control one might demand that the probability of a "16 oz" package containing between 15.5 oz and 16.5 oz should be no less than 98%.

A continuous random variable can take a continuous range of values, as opposed to a discrete distribution, where the set of possible values for the random variable is at most countable. For a discrete distribution an event with probability zero is impossible (e.g., rolling 3½ on a standard die is impossible, and has probability zero); this is not so in the case of a continuous random variable.

If a variable can take on any value between two specified values, it is called a continuous variable; otherwise, it is called a discrete variable. Some examples will clarify the difference between discrete and continuous variables.

1. Suppose our customer specifies that a component contain between 9 and 10% magnesium. The percentage of magnesium would be an example of a continuous variable, since a part could take on any value between 9 and 10%.

2. Suppose we flip a coin and count the number of heads. The number of heads could be any integer value between 0 and plus infinity. However, it could not be just any number between 0 and plus infinity; we could not, for example, get 2.5 heads. Therefore, the number of heads must be a discrete variable.

Event Probability

What is the probability that a card drawn at random from a deck of cards will be an ace? Of the 52 cards in the deck, 4 are aces, so the probability is 4/52. In general, the probability of an event is the number of favorable outcomes divided by the total number of possible outcomes. (This assumes the outcomes are all equally likely.) In this case there are four favorable outcomes: (1) the ace of spades, (2) the ace of hearts, (3) the ace of diamonds, and (4) the ace of clubs. Since each of the 52 cards in the deck represents a possible outcome, there are 52 possible outcomes.

The same principle can be applied to the problem of determining the probability of obtaining different totals from a pair of dice. There are 36 possible outcomes when a pair of dice is thrown.
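These 36 equally likely outcomes can be enumerated directly. The following sketch counts favorable outcomes to compute the probability of particular totals; it is an illustration, not part of the text's own presentation.

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes when a pair of dice is thrown.
outcomes = list(product(range(1, 7), repeat=2))
print(len(outcomes))  # 36

def prob(event):
    """Probability = favorable outcomes / total outcomes, for equally likely outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

print(prob(lambda o: sum(o) == 5))   # 1/9
print(prob(lambda o: sum(o) == 12))  # 1/36
```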

To calculate the probability that the sum of the two dice will equal 5, calculate the number of outcomes that sum to 5 and divide by the total number of outcomes (36). Since four of the outcomes have a total of 5 (1,4; 2,3; 3,2; 4,1), the probability of the two dice adding up to 5 is 4/36 = 1/9. In like manner, the probability of obtaining a sum of 12 is computed by dividing the number of favorable outcomes (there is only one) by the total number of outcomes (36). The probability is therefore 1/36.

Conditional Probability

A conditional probability is the probability of an event given that another event has occurred. For example, what is the probability that the total of two dice will be greater than 8 given that the first die is a 6? This can be computed by considering only outcomes for which the first die is a 6, and then determining the proportion of these outcomes that total more than 8.

There are 6 outcomes for which the first die is a 6, and of these, there are four that total more than 8 (6,3; 6,4; 6,5; 6,6). The probability of a total greater than 8 given that the first die is 6 is therefore 4/6 = 2/3.

More formally, this probability can be written as:

p(total > 8 | Die 1 = 6) = 2/3.

In this equation, the expression to the left of the vertical bar represents the event and the expression to the right of the vertical bar represents the condition. Thus it would be read as "The probability that the total is greater than 8, given that Die 1 is 6, is 2/3." In more abstract form, p(A|B) is the probability of event A given that event B occurred.

Probability of A and B

Independent Events

A and B are two events. If A and B are independent, then the probability that events A and B both occur is:

p(A and B) = p(A) x p(B).

In other words, the probability of A and B both occurring is the product of the probability of A and the probability of B.

What is the probability that a fair coin will come up heads twice in a row? Two events must occur: a head on the first toss and a head on the second toss. Since the probability of each event is 1/2, the probability of both events is 1/2 x 1/2 = 1/4.
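The conditional probability worked out above can be checked by enumeration, restricting attention to the outcomes where the first die is a 6; this is a sketch for illustration only.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 rolls of two dice

# Condition on the outcomes where the first die shows a 6.
given = [o for o in outcomes if o[0] == 6]
favorable = [o for o in given if sum(o) > 8]

p_conditional = Fraction(len(favorable), len(given))
print(p_conditional)  # 2/3, i.e. p(total > 8 | Die 1 = 6)
```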

Now consider a similar problem: someone draws a card at random out of a deck, replaces it, and then draws another card at random. What is the probability that the first card is the ace of clubs and the second card is a club (any club)? Since there is only one ace of clubs in the deck, the probability of the first event is 1/52. Since 13/52 = 1/4 of the deck is composed of clubs, the probability of the second event is 1/4. Therefore, the probability of both events is 1/52 x 1/4 = 1/208.

Dependent Events

If A and B are not independent, then the probability of A and B is:

p(A and B) = p(A) x p(B|A)

where p(B|A) is the conditional probability of B given A.

If someone draws a card at random from a deck and then, without replacing the first card, draws a second card, what is the probability that both cards will be aces? Event A is that the first card is an ace. Since 4 of the 52 cards are aces, p(A) = 4/52 = 1/13. Given that the first card is an ace, what is the probability that the second card will be an ace as well? Of the 51 remaining cards, 3 are aces. Therefore, p(B|A) = 3/51 = 1/17 and the probability of A and B is 1/13 x 1/17 = 1/221.

Mutually Exclusive

If events A and B are mutually exclusive, then the probability of A or B is simply:

p(A or B) = p(A) + p(B).

Two events are mutually exclusive if it is not possible for both of them to occur. For example, if a die is rolled, the event "getting a 1" and the event "getting a 2" are mutually exclusive since it is not possible for the die to be both a one and a two on the same roll. The occurrence of one event "excludes" the possibility of the other event.

What is the probability of rolling a die and getting either a 1 or a 6? Since it is impossible to get both a 1 and a 6, these two events are mutually exclusive. Therefore, p(1 or 6) = p(1) + p(6) = 1/6 + 1/6 = 1/3.

If the events A and B are not mutually exclusive, then:

p(A or B) = p(A) + p(B) - p(A and B).

The logic behind this formula is that when p(A) and p(B) are added, the occasions on which A and B both occur are counted twice. To adjust for this, p(A and B) is subtracted.

What is the probability that a card selected from a deck will be either an ace or a spade? The relevant probabilities are:

p(ace) = 4/52

p(spade) = 13/52

The only way in which an ace and a spade can both be drawn is to draw the ace of spades. There is only one ace of spades, so p(ace and spade) = 1/52.

The probability of an ace or a spade can be computed as:

p(ace) + p(spade) - p(ace and spade) = 4/52 + 13/52 - 1/52 = 16/52 = 4/13.

Consider the probability of rolling a die twice and getting a 6 on at least one of the rolls. The events are defined in the following way:

Event A: 6 on the first roll: p(A) = 1/6

Event B: 6 on the second roll: p(B) = 1/6

p(A and B) = 1/6 x 1/6

p(A or B) = 1/6 + 1/6 - 1/6 x 1/6 = 11/36
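The addition rule for events that are not mutually exclusive can be verified by enumerating the two rolls directly; the sketch below confirms the 11/36 figure.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # two rolls of a die

def p(event_outcomes):
    return Fraction(len(event_outcomes), len(outcomes))

a = [o for o in outcomes if o[0] == 6]                       # 6 on the first roll
b = [o for o in outcomes if o[1] == 6]                       # 6 on the second roll
a_or_b = [o for o in outcomes if o[0] == 6 or o[1] == 6]     # 6 on at least one roll

print(p(a), p(b), p(a) * p(b))       # 1/6, 1/6, 1/36
print(p(a) + p(b) - p(a) * p(b))     # 11/36, by the addition rule
print(p(a_or_b))                     # 11/36, confirmed by direct counting
```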


When using statistics to make business decisions we are relying on an unbiased estimate of the probability of an event or observation. We want to distinguish a rare occurrence from one we could expect by pure chance. This simplified presentation of probability serves to illustrate the core concept. The reader is encouraged to review detailed discussions of probability and the underlying theory.

Assumptions Underlying Statistical Models

There are a number of assumptions associated with various statistical techniques. The implications relative to each assumption are discussed along with the procedures and techniques employed in statistical modeling. There are six general assumptions concerning:

1. level of measurement

2. random sampling

3. shape of the population distribution

4. equal variance

5. independent samples

6. number of samples

The computational procedures associated with several tests are discussed in later sections.

Section 2

Population vs Sample

STATISTICAL CONCEPTS
1. Population
2. Sampling from a population
3. Random sampling and sampling procedures

A population consists of all members of a group in which a researcher has an interest. It may be small, such as all doctors affiliated with a particular hospital, or it may be large, such as all college seniors in a state. When populations are large, researchers usually sample. A sample is a subset of a population. For instance, we might be interested in the attitudes of all registered voters in the state toward the economy. The registered voters would constitute the population. If we administered an attitude scale to all these voters, we would be studying the population, and the summarized results (such as averages) would be referred to as parameters. If we studied only a sample of the voters, the summarized results would be referred to as statistics.

No matter how a sample is drawn, it is always possible that the statistics obtained by studying the sample do not accurately reflect the population parameters that would have been obtained if the entire population had been studied. In fact, researchers almost always expect some amount of error as a result of sampling. If sampling creates errors, why do researchers sample? First, for economic and physical reasons it is not always possible to study an entire population. Second, with proper sampling, highly reliable results can be obtained. Furthermore, with proper sampling, the amount of error to allow for in the interpretation of the resulting data can be estimated with inferential statistics, which are covered in this book. It is the role of the decision maker to design the research study to yield results within an acceptable level of error.

Freedom from bias is the most important characteristic of a good sample. A bias exists whenever some members of a population have a greater chance of being selected for inclusion in a sample than other members of the population. Here are some examples of biased samples:

• A professor wishes to study the attitudes of all sophomores at a college (the population) but asks only those enrolled in an introductory class (the sample) to participate in the study. Note that only those in the class have a chance of being selected; other sophomores have no chance.

• An individual wants to predict the results of a statewide election (the population) but asks the intentions of only those voters whom he encounters in a large shopping mall (the sample). Note that only those in the mall have a chance of being selected; other voters have no chance.

• A magazine editor wants to determine the opinions of all rifle owners (the population) on a gun-control measure but mails questionnaires only to those who subscribe to the magazine (the sample). Note that only magazine subscribers have a chance to respond; other rifle owners have no chance.

In these three examples, samples of convenience were used, increasing the odds that some members of a population will be selected while reducing the odds that other members will be selected. Any studies that use this type of sampling are suspect and should be looked upon with skepticism. In addition to the obvious bias in the examples, there is an additional problem. Even those who do have a chance of being included in the samples may refuse to participate. This problem is often referred to as the problem of volunteerism (also called self-selection bias). Volunteerism is presumed to create an additional source of bias because those who decide not to participate have no chance of being included. Furthermore, many studies comparing participants (i.e., volunteers) with non-participants suggest that participants tend to be more highly educated and tend to come from higher socioeconomic status (SES) groups than their counterparts. Efforts to reduce the effects of volunteerism include offering rewards; stressing to potential participants the importance of the study; and making it easy for individuals to respond.

To eliminate bias in the selection of individuals for a study, some type of random sampling is needed. A classic type of random sampling is simple random sampling. Random sampling gives each member of a population an equal chance of being selected. After the sample has been selected, efforts must be made to encourage all those selected to participate. If some refuse, as often happens, a biased sample is obtained even though all members of the population had an equal chance to have their names selected.

Suppose that a researcher is fortunate and obtains the cooperation of everyone selected. The researcher has obtained an unbiased sample. Can the researcher be certain that the results obtained from the sample accurately reflect the results that would have been obtained by studying the entire population? Definitely not; the possibility of random errors still exists. Random errors (created by random selection) are called sampling errors. At random (i.e., by chance), the researcher may have selected a disproportionately large number of Democrats, males, low SES group members, and so on. Such errors make the sample unrepresentative and therefore may lead to incorrect results.

If both biased and unbiased sampling are subject to error, why do researchers prefer unbiased random sampling? They prefer it for two reasons: (1) inferential statistics enable researchers to estimate the amount of error to allow for when analyzing the results from unbiased samples, and (2) the amount of sampling error obtained from unbiased samples tends to be small when large samples are used.

While using large samples helps to limit the amount of random error, it is important to note that selecting a large sample does not correct for errors due to bias. If an individual who is trying to predict the results of an election is very persistent and spends weeks at the shopping mall asking shoppers how they intend to vote, the individual will obtain a very large sample of people who may differ from the population of voters in various ways, such as being more affluent, having more time to spend shopping, being better educated, and so on. Thus, increasing the size of a biased sample does not reduce the amount of error due to bias.
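A minimal sketch of drawing a simple random sample, using Python's random module and a hypothetical sampling frame of voter ID numbers, might look like this; the population size and sample size are assumptions for illustration.

```python
import random

random.seed(7)  # fixed seed so the illustration is reproducible

# Hypothetical sampling frame: registered voters identified by ID number.
population = list(range(1, 1001))   # 1,000 members of the population

# Simple random sampling: every member has an equal chance of selection.
sample = random.sample(population, k=50)
print(sample[:10])
```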

Yet, there are many situations in which researchers have no choice but to use biased samples. For instance, for ethical and legal reasons, much medical research is conducted using volunteers who are willing to risk taking a new medication or undergoing a new surgical procedure. If promising results are obtained in initial studies, larger studies with better (but usually still biased) samples are undertaken. At some point, despite the possible role of bias, decisions such as Food and Drug Administration approval of a new drug need to be made on the basis of data obtained with biased samples. Little progress would be made in most fields if the results of all studies with biased samples were summarily dismissed.

It is important to note that the statistical remedies for errors due to biased samples are extremely limited. When biased samples are used, the results of statistical analyses of the data should be viewed with caution.

Random Sample and Sampling Procedure

The basic assumption in every application of statistical inference is that the sample is a random sample. The term "random sample" actually refers to a procedure by which the sample is drawn. The resulting sample is also called a random sample.

Definition

A random sample from a finite population is a sample that has been selected by a procedure with the following properties:

• The procedure assigns a known probability to each element in the population.

• If a given element has been selected, then the probability of selecting the remaining items is uniformly affected. This means that the selection of one item does not affect the selection of any other particular items; they are in no way "tied together."

Stated differently, this means:

• Events are independent.

• Underlying probabilities remain unchanged in drawing the sample.

The second condition cannot be strictly true when sampling from a finite population because the selection of one item increases the probability of the selection of any remaining items (relative to the first item chosen). This creates no great problem as long as the sample is not a large percentage of the population. In that case, the probabilities will have been altered sufficiently to need corrective measures. The finite correction factor provides the necessary correction.

Finite Correction Factor

Whenever the sample from a finite population equals or exceeds 10% of the total population, i.e., n ≥ 10% of N, the following correction factor is used to compensate for the inherent changes in the underlying probabilities during the sampling process. The estimate of the standard deviation of the sampling distribution is multiplied by

sqrt((N - n) / (N - 1))

where:

N = number in population and

n = number in sample.

24
n = N, the is reduced to 0, Recall, when we are able to collect data from
Procedure for Drawing a Random Sample:

everyone in the population we have a census and inferential statistics do not Identify the population you plan to study. You then must decide on the most
apply. In this case of (sample n = population N) the measure of standard deviation appropriate way to draw a representative sample from the population. If the
is the true population parameter. population consists of individuals, what kind of individuals: what income,
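A small numerical sketch (not from the text; the population size, sample size, and standard deviation below are hypothetical) shows how the correction factor shrinks the estimated standard error of the mean once the sample is a sizable share of the population:

import math

def standard_error(s, n, N=None):
    """Estimated standard error of the sample mean.
    s: sample standard deviation, n: sample size, N: population size.
    When N is given and n is at least 10% of N, the finite correction
    factor sqrt((N - n) / (N - 1)) is applied."""
    se = s / math.sqrt(n)
    if N is not None and n >= 0.10 * N:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Hypothetical figures: s = 12.0, n = 200 drawn from a population of N = 1,000.
print(round(standard_error(12.0, 200), 4))        # 0.8485 (uncorrected)
print(round(standard_error(12.0, 200, 1000), 4))  # 0.7593, roughly 10% smaller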
Distinction Between Random and Simple Random Sample:

Random Sample

• The probability of selecting each element is known, but not necessarily equal.

• Events are independent.

• Underlying probabilities remain unchanged in the sampling process.

Simple Random Sample

• The probability of selecting each element is known and equal.

• Events are independent.

• Underlying probabilities remain unchanged in the sampling process.

Procedure for Drawing a Random Sample:

Identify the population you plan to study. You then must decide on the most appropriate way to draw a representative sample from the population. If the population consists of individuals, what kind of individuals: what income, geographic area, etc.? At times it is sufficient to draw a simple random sample. In other instances you may choose to stratify or cluster the population to get your sample.

For example, suppose we want to draw a random sample of machine owners (construction equipment) within a dealer territory. The population definition must be sharpened: what is a "machine owner?" Does it include owners of certain brands, non-owners of certain brands, particular machine usage or production, certain markets, leased equipment, or not?

The definition of the population must be determined by the purpose of the research. In a recent study of the feasibility of locating a mini branch store, it was determined that the relevant machine owner population included: any project that used machines within a ten-mile radius of the proposed location, offices of firms that owned machines that might be presently located elsewhere, and all brands of machines.

Are there significant differences among definable groups in the population to justify stratified sampling as opposed to simple random sampling? In surveying the relevant machine owner population, a simple random sample (by chance) may not reflect proportionately the needs of owners with one or two machines and those owning more than two. If the product mix in the store needs to be significantly different to meet the needs of different segments of the population, a stratified sample (later weighted according to the relative size of each stratum) would likely produce a more accurate measure of potential demand than a simple random sample of the same size.

Once the population and strata have been appropriately defined, a list of members (in each stratum) should be prepared, if possible. Then, selection from the list (in each stratum) should be done using a random number table to select the sample members.
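As a minimal sketch of the selection step, with a hypothetical list of owner IDs standing in for the prepared list and Python's random module standing in for a random number table:

import random

# Hypothetical sampling frame: a prepared list of machine-owner IDs.
owners = [f"OWNER-{i:04d}" for i in range(1, 501)]

random.seed(42)                       # fixed seed so the draw can be repeated
sample = random.sample(owners, k=25)  # 25 owners drawn without replacement,
print(sample[:5])                     # each with an equal chance of selection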

If a single list cannot be prepared, sampling may be done in two or more stages. For example, the first list (stage) might be of UCC1 filings, the second of brands, and the third of dealers. A telephone book might be sampled by listing the numbers of pages and then the number of names on each page. The first random number chooses the page; a second random number chooses the person on the page.

Other approaches to sampling include clustering and stratification. If a complete enumeration of the population is not possible, "cluster" sampling may be needed. In this case only blocks of a city or areas are listed and then selected by means of a random number table. All members in the block or area selected are interviewed as a "cluster."

For statistical precision, cluster sampling is generally not as desirable as complete enumeration of the population, but it may be much more cost effective because interviews are geographically grouped. Simple random sampling of a city population would likely require considerable travel throughout the city to reach the persons chosen by the random sampling process.

A stratified sample is one obtained by separating the population into homogeneous, non-overlapping groups called strata, and then obtaining a simple random sample from each. A researcher may use a stratified sample to help ensure a more representative sample from the population. For example, we may want to be certain that the proportion of male and female respondents is maintained, so the sample is not drawn at random from the whole population but separately from a number of disjoint strata (gender) of the population in order to ensure a more representative sample.

Once you have identified the population you plan to study, you will choose the most efficient method to draw a representative random sample. Your choices include a simple random sample, a systematic sample, stratification, and clustering. The appropriate method depends upon the definition of the population of interest.

It's clear that asking one person in a telephone poll isn't enough to get an idea about the general views in the United States. What isn't quite as clear is the question of how many people we should pick for the sample to get a representative sample. Remember that the bigger the sample, the more it will cost to interview or to study. This means that statisticians must weigh the need for a larger sample against the costs of acquiring one, always remembering that one of the costs of a too-small sample is that it has a greater chance of being unrepresentative. More generally, samples in statistics must be of a certain size to be meaningful representations of populations. How large depends partly on the population we are looking at. If it is very diverse, we need a larger sample to capture that diversity. The size of the sample also depends on the precision we are seeking and on the kinds of questions we are asking.
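A minimal sketch of a stratified draw with proportional allocation, assuming a hypothetical frame split into owners of one or two machines and owners of more than two (the strata and their sizes are illustrative, not from the text):

import random

random.seed(7)

# Hypothetical strata for the machine-owner population.
strata = {
    "one_or_two_machines": [f"S-{i}" for i in range(1, 401)],   # 400 owners
    "more_than_two":       [f"L-{i}" for i in range(1, 101)],   # 100 owners
}

total = sum(len(frame) for frame in strata.values())
sample_size = 50

stratified_sample = {}
for name, frame in strata.items():
    # Proportional allocation: each stratum contributes according to its share.
    k = round(sample_size * len(frame) / total)
    stratified_sample[name] = random.sample(frame, k)

print({name: len(ids) for name, ids in stratified_sample.items()})
# {'one_or_two_machines': 40, 'more_than_two': 10}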
Section 3

Descriptive or Inferential

STATISTICAL CONCEPTS
1. Descriptive Statistics
2. Inferential Statistics
3. Parametric or Non-parametric

Descriptive or Inferential Techniques

The two most general classifications of statistical techniques are descriptive and inferential. The general assumptions underlying each class of technique are described below.

Descriptive Statistics

Descriptive statistics are used to describe a set of data. Descriptions are in the form of a midpoint (mean, median, mode), a dispersion (range, variance, quartiles), and a shape (normal, skewed, rectangular). For instance, suppose you have the scores on a standardized test for 500 participants. One way to summarize the data is to calculate an average score, which indicates how the typical individual scored. You might also determine the range of scores from the highest to the lowest score, which would indicate how much the scores vary. The only major assumption underlying descriptive models is the level of measure (or scale) used to represent the data.

Correlational statistics are a special subgroup of descriptive statistics, which are described separately. The purpose of correlational statistics is to describe the relationship between two or more variables for one group of participants. For instance, suppose a researcher is interested in the predictive validity of a college admissions test. The researcher could collect the admissions scores and the freshman GPAs for a group of college students. To determine the validity of the test for predicting GPAs, a statistic known as a correlation coefficient could be computed. Correlation coefficients range in value from −1.00 (a perfect inverse correlation) through 0.00 (no correlation between the variables) to +1.00 (a perfect direct correlation).

Level of Measure Assumption:

Measurement can be defined as the assignment of numerals to objects or events according to rules. There are four basic levels of measure: nominal, ordinal, interval, and ratio. The importance of the level
of measure is realized in the operations used to produce the scale and in the mathematical operations that are permissible with each level. The mathematical operations possible and examples of each level are detailed in the table at the end of the section.

Note that the only mathematical operation possible with the nominal level of measure is equivalence, that is, = or ≠. With the ordinal (ranking) level, the additional operations of > and < are added to = and ≠. Thus, with ordinal data, the middle of a distribution could be determined with the median. This cannot be done with the nominal level. With an ordinal level of measure, however, we cannot compute the mean because the operations (of addition and division) are not possible. If we had interval or ratio level we could compute the mean and median as well. We could also compute the variance.

For most statistical models, distinction needs to be made among only three levels: nominal, ordinal, and interval or ratio. The level of measure is a necessary condition for all statistical models, descriptive or inferential. It is very important in distinguishing among different models involving statistical inference, as discussed in the next section.

Inference Models

Inferential statistics are tools that tell us how much confidence we can have when generalizing from a sample to a population. Consider national opinion polls in which carefully drawn samples of only about 1,500 adults are used to estimate the opinions of the entire adult population of the United States. The pollster first calculates descriptive statistics, such as the percentage of respondents who are in favor of capital punishment and the percentage who are opposed.

Most statistical models of interest in business problem solving involve making an inference concerning a characteristic of a population from data in a sample. All such models are based on probability theory and therefore require that a random sample be used in making such inferences. A discussion of the random sampling process was included in earlier sections and will be revisited in subsequent sections when the requirements for specific techniques are addressed.

Having sampled, a researcher knows that the results may not be accurate because the sample may not be representative. In fact, the researcher knows that there is a high probability that the results are off by at least a small amount. This is why researchers often mention a margin of error, which is an inferential statistic. It is reported as a warning to readers of research that random sampling may have produced errors, which should be considered when interpreting results. For instance, a weekly news magazine recently reported that 52% of the respondents in a national poll believed that the economy was improving. A footnote in the report indicated that the margin of error was ±2.3. This means that the researcher was confident that the true percentage for the whole population was within 2.3 percentage points of 52% (i.e., 49.7% to 54.3%).
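A margin of error like the one in the footnote can be approximated with the usual large-sample formula for a proportion. The sketch below is illustrative only: it assumes simple random sampling and a 95% confidence level, and because the poll's sample size is not given in the text, 1,500 respondents is assumed.

import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a sample proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.52    # 52% of respondents said the economy was improving
n = 1500    # assumed sample size (not reported in the text)

moe = margin_of_error(p, n)
print(f"margin of error: {moe * 100:.1f} percentage points")   # about 2.5
print(f"interval: {(p - moe) * 100:.1f}% to {(p + moe) * 100:.1f}%")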
You may recall that a population is any group in which a researcher is interested. It may be large, such as all adults age 18 and over who reside in the United States, or it might be small, such as all employees of a specific company. A study in which all members of a population are included is called a census. A census is often feasible and desirable when studying small populations (e.g., an algebra teacher may choose to pretest all students at the beginning of a course). When a population is large, it is more economical to study only a sample of the population. With modern sampling techniques, highly accurate information can be obtained using relatively small samples.

Inferential statistics are not needed when analyzing the results of a census because there is no sampling error. The use of inferential statistics for evaluating results when sampling is covered in chapter 3.

Shape of Population Distribution Assumption:

Among the models used for statistical inference are two general groups: Parametric and Non-Parametric. The parametric models, such as Z or t, and the Pearson product moment correlation require that the sample be drawn from a
population with a prescribed "shape," usually a normal curve. The non-parametric models, such as the Spearman rank correlation and the Mann-Whitney U test, are termed "distribution-free" models because they do not require any prescribed population shape.

Equal Variance Assumption:

Parametric statistical models such as Z or t, which compare two means, may also have a built-in assumption that the two samples were drawn from populations with equal variance. The parametric model used in association analysis, called "regression analysis" or Pearson product moment correlation analysis, includes an assumption that the variance of x is the same for all values of y and that the variance of y is the same for all values of x. This property is called homoscedasticity and is described later in the text. (Non-parametric models do not require any assumption concerning variances.)

Independent Samples Assumption:

Depending on the design of the test, either a parametric or non-parametric test may require that two or more samples be independent of each other. If matched pairs of data are being evaluated, such as with the non-parametric Wilcoxon Matched-Pairs Signed-Ranks test, then the samples are not independent; the data consist of matched pairs of observations.

Numbers of Samples:

Depending on the test, certain parametric or non-parametric tests are designed for use with two or more samples. Thus, when two or more samples are involved in the design, only certain tests will be applicable. This selection criterion is especially appropriate when more than two samples are involved. The parametric "analysis of variance" model or the non-parametric Chi Square model may be used, depending on other assumptions.

The particular assumptions and requirements for various models are presented in the sections that describe the models. A summary of decision rules concerning the choice of statistical model is given in the section of the text entitled "Decision Rules for Choosing Among Statistical Models."

Non-Parametric Statistics

We have focused on understanding a population based on a distribution of observations or outputs. The distribution is assumed to be normal or to approach normal as sample size increases. We use statistics as the unbiased estimators for unknown population parameters.

Non-parametric statistics provide techniques for developing and testing distribution-free statistical models. As we progress to testing hypotheses and drawing inferences we will be reviewing several distribution-free procedures. Interested readers who deal with very small sample sizes or ordinal levels of measure are encouraged to review the multitude of distribution-free modeling options available.
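To make the parametric versus distribution-free distinction concrete, the sketch below runs a parametric test and its non-parametric counterpart on the same two hypothetical samples. It uses the SciPy library rather than the SYSTAT procedures discussed in the text.

from scipy import stats

# Hypothetical measurements for two independent groups.
group_a = [23, 27, 31, 25, 29, 30, 26, 28]
group_b = [22, 24, 21, 26, 23, 25, 24, 22]

# Parametric: assumes roughly normal populations with equal variances.
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=True)

# Non-parametric (distribution-free): based on the ranks of the values.
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t test:       t = {t_stat:.2f}, p = {t_p:.4f}")
print(f"Mann-Whitney: U = {u_stat:.1f}, p = {u_p:.4f}")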
Section 4

Level of Measurement

STATISTICAL CONCEPTS
1. Nominal
2. Ordinal
3. Interval
4. Ratio

Levels of Measurement

Levels of measurement help researchers determine what type of statistical analysis is appropriate for a given set of data. It is important to master the material in this section of the book because it is frequently referenced in the discussion of descriptive and inferential statistics that follows.

Measurement can be defined as the assignment of numerals to objects or events according to rules. There are four basic levels of measure: nominal, ordinal, interval, and ratio. The importance of the level of measure is realized in the operations used to produce the scale and in the mathematical operations that are permissible with each level. The mathematical operations possible and examples of each level are shown in the level of measure table at the end of this section.

For most statistical models, distinction needs to be made among only three measurement levels: nominal, ordinal, and interval or ratio. The level of measure is a necessary condition for all statistical techniques, descriptive or inferential. It is very important in distinguishing among different techniques involving statistical inference.

The lowest level of measurement is nominal (also known as categorical). It is helpful to think of this level as the naming level because names (i.e., words) are used instead of numbers. Here are three examples:

• Participants name their political affiliation.

• Participants name their religious affiliation.

• Participants name their gender.
Notice that the categories named do not put the participants in any particular order. There is no basis on which we could all agree for saying that Democrats are either higher or lower than Republicans. The same is true for religious affiliation or gender. Note that the only mathematical operation possible with the nominal level of measure is equivalence, that is, = or ≠.

The next level of measurement is ordinal. Ordinal measurement puts participants in rank order from high to low, but it does not indicate how much higher or lower one participant is in relation to another. To understand this level, consider these examples:

• Participants are ranked according to their height; the tallest participant is given a rank of 1, the next tallest is given a rank of 2, and so on.

• College students report their class rank in terms of freshman, sophomore, junior, or senior.

In the examples above, the measurements indicate the relative standings of participants but do not indicate the amount of difference among participants. For instance, we know that a participant with a rank of one is taller than a participant with a rank of two, but we do not know by how much. The first participant may be only one-quarter of an inch taller or may be two feet taller than the second. With the ordinal (ranking) level, the additional operations of > and < are added to = and ≠. Thus, with ordinal data, the middle of a distribution could be determined with the median. This cannot be done with the nominal level. With an ordinal level of measure, however, we cannot compute the mean because the operations (of addition and division) are not possible.

The next two levels, interval and ratio, tell us by how much participants differ. For example:

• The height of each participant is measured to the nearest inch.

• College students report the number of credit hours completed.

Notice that if one participant is 5'6" tall and another is 5'8" tall, we know not only the order of the participants, but we also know by how much the participants differ from each other (i.e., two inches). Both interval and ratio scales have equal intervals. For instance, the difference between three inches and four inches is the same as the difference between five inches and six inches.

In most statistical analyses, interval and ratio measurements are analyzed in the same way. However, there is a difference between these two levels. An interval scale does not have an absolute zero. For instance, if we measure intelligence, we do not know exactly what constitutes absolutely zero intelligence and thus cannot measure the zero point. In contrast, a ratio scale has an absolute zero point on its scale. For instance, we know where the zero point is when we measure height.
Levels of Measure

Level: Nominal (classifications)
Empirical characteristics: = or ≠; classes are either equal or not equal
Permissible transformations and statistics: Number of cases; Mode
Examples: Gender; auto license numbers; jersey numbers; Yes or No answers

Level: Ordinal (ranking)
Empirical characteristics: =, ≠, >, or <; assumes a continuous scale
Permissible transformations and statistics: Median; Percentiles; Rank-order correlation
Examples: Street numbers; Sgt., Cpl., Pvt.; class rank; differential scales

Level: Interval (arithmetic)
Empirical characteristics: =, ≠, >, <, ÷, ×; all mathematical operations possible up to a linear transformation, i.e., multiplication by a constant (known differences between numbers are multiples of each other; the zero point and unit of measure are arbitrary)
Permissible transformations and statistics: Mean; Standard deviation; Product moment correlation
Examples: Temperature in °F or °C; calendar time

Level: Ratio (true zero)
Empirical characteristics: =, ≠, >, <, ÷, ×; same as interval with the additional feature of an absolute or true zero point; values and intervals on the scale may be expressed as multiples of each other
Permissible transformations and statistics: Coefficient of variation; Logarithmic transformation
Examples: Length; weight; density; force; time interval (minutes)
Chapter 2

Descriptive
Statistics

In this section we focus on describing data. We will explore graphical, tabular, and numeric methods of describing what we have found.
Section 1

Descriptive Statistics

DESCRIPTIVE STATISTICS
1. Tables and Crosstabulations
2. Types of Frequency Tables
3. Graphical Representations

Descriptive Statistics

The gas price example from the introduction illustrated a basic application of descriptive statistics. When one is driving past a gas station, the prices on the sign are simply data points. Every day we may drive past multiple gas stations. Each station has its price per gallon displayed on a large sign for us to see. We collect these data points and consciously calculate what we believe is the going price for a gallon of gas. In essence, we have taken data, organized the information, and generated a descriptive statistic (the average price for gas). Albeit this is an average based on our observations, it is a statistic (derived from data) that we use as representative of the "going" (average) price of gas (for the population of gas stations) to help us make decisions.

How do you use this average gas price (a statistic)? Let's say that the next day you need to purchase gas for the car. You are driving past a gas station where the price of gas is posted as $.10 below what you believe the average price of gas to be (based on recent observations). Do you stop and get gas at this station? Maybe.

Why maybe? This brings us to that nagging thing called probability. To help understand probability, think of how different your reaction to a price drop would be if prices had been the same for the past 6 months versus fluctuating. For this example, let's assume you have been observing fluctuations in gas prices: some days the price is rising, other days you observe the price is dropping. How confident are you that this $.10 difference is below what your regular gas station will be charging? Will the price be $.15 lower at the next station?

In addition to the average price of gas we have calculated several descriptive statistics regarding the variability in price. We have an estimate for the range in prices we expect, from a low to a high price, and
we have a sense of how much variation we have observed. These descriptive statistics help us in making decisions.

This chapter is dedicated to describing the data we have collected, including:

Section 1 - Descriptives

• Frequencies and proportions.
• Tables and cross tabulations.
• Plots and graphical depictions.

Section 2 - Distribution

• Shapes of distributions
• Skewness

Section 3 - Location

• Mean, median and mode.

Section 4 - Spread

• Range, variance, and standard deviation.

Section 5 - Association

• Correlation.

Section 6 - Normality

• Tests of normality

We will focus on drawing conclusions from the information we have drawn out of the data in later sections of the text.

Methods for Summarizing Data

Tables and Crosstabulations

When variables are categorical, frequency tables (crosstabulations) provide useful summaries. For a report, you may need only the number or percentage of cases falling in specified categories or cross-classifications. At times, you may require a test of independence or a measure of association between two categorical variables.

Statistical procedures are designed to make, analyze, and save frequency tables that are formed by categorical variables (or table factors). The values of the factors can be character or numeric. Both procedures form tables using data read from a cases-by-variables rectangular file or recorded as frequencies (for example, from a table in a report) with cell indices. You can request percentages of row totals, column totals, or the total sample size.
Tables report results as counts or the number of cases falling in specific categories or cross-classifications. Categories may be unordered (Democrat, Republican, and Independent), ordered (low, medium, and high), or formed by defining intervals on a continuous variable like AGE (child, teen, adult, and elderly).

There are many formats for displaying tabular data. Let us examine several basic layouts for counts and percentages.

Four Types of Frequency Tables

• One-Way: Frequency counts, percentages, and confidence intervals on cell proportions for single table factors or categorical variables.

• Two-Way: Frequency counts, percentages, tests, and measures of association for the crosstabulation of two factors.

• Multiway (Tabulate): Frequency counts, percentages, tests, and measures of association for a series of two-way tables and standardized tables stratified by all combinations of values of a third, fourth, etc., table factor.

• Multiway (Standardize): Standardized frequency counts and partial measures of association for the crosstabulation of two factors, controlling for the effect of test factor(s). Resampling procedures are available in this SYSTAT feature.
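The examples that follow build these tables with SYSTAT's menus. As a rough equivalent, the sketch below uses the pandas library on a small hypothetical data set coded the same way as the survey example (lease: 1 = current lease holder, 2 = non-lease owner; repair cost: 1 = expensive, 2 = high, 3 = average):

import pandas as pd

# Hypothetical survey responses, coded as in the dealership example.
df = pd.DataFrame({
    "lease":  [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "repair": [1, 1, 1, 3, 3, 2, 1, 1, 1, 1, 2, 3],
})

# One-way table for current lease holders only: counts and percents.
lease_only = df[df["lease"] == 1]
counts = lease_only["repair"].value_counts().sort_index()
print(counts)
print((counts / counts.sum() * 100).round(1))

# Two-way table: lease status (rows) by repair-cost rating (columns).
print(pd.crosstab(df["lease"], df["repair"], margins=True))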
DATA FOR ANALYSIS I

One-Way Frequency Table

If we enter the data into SYSTAT we can request a basic frequency table for just current lease holders by using the data select function. Click on Data and Select Cases.

In this example we have data from a survey of current lease holders and their perception of repair cost at a dealership. The lease variable is coded as 1 for current lease holders and 2 for non-lease owners. Repair cost is coded as 1 = expensive, 2 = high, and 3 = average.
Once you click on Select Cases the following window will open. You will click on LEASE to have it entered as a selection criterion and set the operator to = 1 (since 1 represents current lease holders in the data file). Click OK and you are ready to run the basic one-way analysis of the data.

You will then click on Analysis, Tables, and One-Way to open the window for selecting the repair cost variable for analysis.
When you click on One-Way the window for defining the tables will open. Select repair cost as the variable of interest and click on counts and percents.

Once you click OK the output window will open and you will see the following tables from your one-way frequency analysis of repair cost for current lease holders.

From the analysis we see that half of the current lease holders view the dealer repair cost to be expensive and 42% view the cost as average.
Two-Way Frequency Table

Extending the current example, the dealership is interested in the opinions of


current lease holders when compared with non-lease holders. We use the data
we entered into SYSTAT and turn off the case select function. Click on Data and
Select Cases. Once you open the select window click on the turn off button.

To start the analysis we will select two-way tables (Lease by Repair Cost).

We will have Repair Cost as the column and Lease as the Row variable. Click on Counts and Percents for the output. When you are finished click OK.
This will generate the analysis and open the output window.

From the analysis it does appear that owners without a lease view the dealership as an expensive repair option at a higher rate than the current lease holders.

We can extend our analysis by using multiway tables when we wish to examine the interaction between more than two variables.

Graphical Representations

Generating graphical representations of the data is fairly straightforward in SYSTAT. We may choose between multiple representations using the Graph function.

Under the Graph function you can select Summary Charts, including Bar, Dot, Line, and Pie charts.

Clicking on Density Displays you can generate Histograms, Box Plots, Dot Density, and Density Functions.
Additional graphical representations are provided for multivariate data.

Clicking on Plots opens a menu for selecting Scatterplots and Probability Plots.

The other option available under the Graphics routine is to open the Graph Gallery and select the style of representation you wish to use by clicking from the visual menu of types.

Example: Pie Chart from Repair Cost Data

Using the data from our two-way table analysis we can generate a pie chart to get a view of how lease holders and non-lease holders view dealership repair cost.

As we found in the two-way table analysis, a higher percentage of non-lease holders view the dealership as an expensive repair option (red area of the pie chart).
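A comparable chart can be drawn outside SYSTAT as well. The sketch below uses matplotlib with hypothetical counts for the non-lease group (the actual counts come from the two-way table output, which is not reproduced here):

import matplotlib.pyplot as plt

# Hypothetical counts of repair-cost ratings among non-lease owners.
labels = ["Expensive", "High", "Average"]
counts = [8, 3, 4]

plt.pie(counts, labels=labels, autopct="%1.0f%%", startangle=90)
plt.title("Repair cost ratings: non-lease owners")
plt.show()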

Section 2

Distribution's Shape

DESCRIPTIVE STATISTICS
1. Stem-Leaf Plot
2. Frequency Polygon
3. Normal Distribution
4. Skewed Distributions
5. Bi-modal Distributions

Shapes of Distributions

Stem-Leaf Plot

The Stem procedure creates a stem-and-leaf plot for one or more variables. The plot shows the distribution of a variable graphically. In a stem-and-leaf plot, the digits of each number are separated into a stem and a leaf. The stems are listed as a column on the left, and the leaves for each stem are in a row on the right. Stem-and-leaf plots also list the minimum, lower-hinge, median, upper-hinge, and maximum values of the sample. Unlike histograms, stem-and-leaf plots show actual numeric values to the precision of the leaves.

The stem-and-leaf plot is useful for assessing distributional shape and identifying outliers. Values that are markedly different from the others in the sample are labeled as outside values; that is, the value is more than 1.5 hspreads outside its hinge (the hspread is the distance between the lower and upper hinges, or quartiles). Under normality, this translates into roughly 2.7 standard deviations from the mean.

The following must be specified to obtain a stem-and-leaf plot:

• Selected variable(s). A separate stem-and-leaf plot is created for each selected variable.

• Number of lines. You can indicate how many lines (stems) to include in the plot.

The shape of a distribution of a set of scores can be seen by examining a frequency distribution, which is a table that shows how many participants have each score. Consider the frequency distribution in Table 1. The frequency (i.e., f, which is the number of participants) associated with each score (X) is shown. Examination indicates that most of the participants are near the middle of the distribution (i.e., near a score of 19) and that
the participants are spread out on both sides of the middle with the frequencies tapering off.

Table 1. Distribution of Scores

X     f
22    1
21    3
20    4
19    8
18    5
17    2
16    0
15    1
      N = 24
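A frequency distribution like Table 1 can be tallied directly from raw scores. The short sketch below rebuilds the table's counts (the raw scores are reconstructed from the table itself, with each score repeated f times):

from collections import Counter

# Each score repeated according to its frequency in Table 1.
scores = [22]*1 + [21]*3 + [20]*4 + [19]*8 + [18]*5 + [17]*2 + [15]*1

freq = Counter(scores)
for x in sorted(freq, reverse=True):
    print(x, freq[x])              # 16 does not print because its frequency is 0
print("N =", sum(freq.values()))   # 24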

The shape of a distribution is even clearer when examining a frequency polygon, which is a figure (i.e., a drawing) that shows how many participants have each score. The same data shown in the table are shown in the frequency polygon on the next page. For instance, the frequency distribution shows that 3 participants had a score of 21; this same information is displayed in the frequency polygon. The high point in the polygon shows where most of the participants are clustered (in this case, near a score of 19). The tapering off around 19 illustrates how spread out the participants are around the middle.

Frequency Polygon

When there are many participants, the shape of a polygon becomes smoother and is referred to as a curve. The most important shape is that of the normal curve, which is often called the bell-shaped curve. This curve is illustrated below.

The normal curve is important for two reasons. First, it is a shape very often found in nature. For instance, the heights of women in large populations are normally distributed. There are small numbers of very short women, which is why the curve
is low on the left; many women are of about average height, which is why the curve is high in the middle; and there are small numbers of very tall women. Here is another example: The average annual rainfall in Pittsburgh over the past 100 years has been approximately normal. There have been a very small number of years in which there was extremely little rainfall, many years with about average rainfall, and a very small number of years with a great deal of rainfall. Another reason the normal curve is important is that it is used as the basis for a number of inferential statistics, which are covered in this text.

Some distributions are skewed. For instance, if you plot the distribution of income for a large population, in all likelihood you will find that it has a positive skew (i.e., is skewed to the right). Skewed right indicates that there are large numbers of people with relatively low incomes; thus, the curve is high on the left. The curve drops off dramatically to the right, forming a long tail pointing to the right. This long tail is created by the small numbers of individuals with very high incomes. Skewed distributions are named for their long tails. On a number line, positive numbers are to the right; hence, the term positive skew is used to describe a skewed distribution in which there is a long tail pointing to the right (but no long tail pointing to the left).

A distribution skewed to the right (positive skew).

When the long tail is pointing to the left, a distribution is said to have a negative skew (i.e., skewed to the left). A negative skew would be found if a large population of individuals was tested on skills in which they have been thoroughly trained. For instance, if a researcher tested a very large population of recent nursing school graduates on very basic nursing skills, a distribution with a negative skew should emerge. There should be large numbers of graduates with high scores, but there should be a long tail pointing to the left, showing that a small number of nurses, for one reason or another (such as being physically ill on the day the test was administered), did not perform well on the test.

A distribution skewed to the left (negative skew).

Bimodal distributions have two high points. Such a curve is called bimodal even though the two high points are not exactly equal in height. Such a curve is most likely to emerge when human intervention or a rare event has changed the composition of a population. For instance, if a civil war in a country cost the lives of many young adults, the distribution of age after the war might be bimodal, with a dip in the middle. Bimodal distributions are much less frequently found in research than the other types of curves discussed earlier in this section.
Just like variables, probability distributions can be classified as discrete or continuous.

Discrete Probability Distributions

If a random variable is a discrete variable, its probability distribution is called a discrete probability distribution. Suppose you flip a coin two times. This simple statistical experiment can have four possible outcomes: HH, HT, TH, and TT. Now, let the variable X represent the number of heads that result from this experiment. The variable X can only take on the values 0, 1, or 2, so it is a discrete random variable.

# Heads    Probability
0          0.25
1          0.50
2          0.25

The table represents a discrete probability distribution because it relates each value of a discrete random variable with its probability of occurrence. With a discrete probability distribution, each possible value of the discrete random variable can be associated with a non-zero probability. Thus, a discrete probability distribution can always be presented in tabular form.

Continuous Probability Distributions

If a random variable is a continuous variable, its probability distribution is called a continuous probability distribution. A continuous random variable can take on an infinite number of values. The probability that it will equal any one specific value is always zero. The equation used to describe a continuous probability distribution is called a probability density function. For a continuous probability distribution:

• The graph of the density function is continuous over the range of the variable.

• The area bounded by the curve of the density function and the x-axis is equal to 1, when computed over the domain of the variable.

• The probability that the random variable assumes a value between a and b is equal to the area under the density function bounded by a and b.

The shape of a distribution has important implications for determining which average to compute. Graphical displays and specific statistical tests are used to determine the appropriateness of assuming a normal distribution for specific data. Tests of normality are covered in Section 6 of this chapter.
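The coin-flip table can be reproduced by enumerating the four equally likely outcomes, as in this brief sketch:

from itertools import product
from collections import Counter
from fractions import Fraction

# All equally likely outcomes of two coin flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# X = number of heads in each outcome.
heads = Counter(outcome.count("H") for outcome in outcomes)

for x in sorted(heads):
    # 0 -> 1/4, 1 -> 1/2, 2 -> 1/4, matching the table above.
    print(x, Fraction(heads[x], len(outcomes)))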
Section 3

Location

DESCRIPTIVE STATISTICS
1. Mean
2. Median
3. Mode

Location

There are many ways to describe data, although not all descriptors are appropriate for a given sample. Means and standard deviations are useful for data that follow a normal distribution, but are poor descriptors when the distribution is highly skewed or has outliers, subgroups, or other anomalies. Some statistics, such as the mean and median, describe the center of a distribution. These estimates are called measures of location. Others, such as the standard deviation, describe the spread of the distribution.

Before deciding what you want to describe (location, spread, and so on), you should consider what type of variables are present. Are the values of a variable unordered categories, ordered categories, counts, or measurements?

For many statistical purposes, counts are treated as measured variables. From the discussion on levels of measurement we know that such variables are called quantitative if one can do arithmetic on their values.

The Mean: An Average

The mean is the most frequently used average. It is so widely used that it is sometimes simply called the average. However, the term "average" is ambiguous because several different types of averages are used in statistics. In this section, the mean will be considered. The mean is a measure of location, a measure of central tendency.

Computation of the mean is easy: sum (i.e., add up) the scores and divide by the number of scores. Here is an example:

Scores: 5, 6, 7, 10, 12, 15
Sum of scores: 55

Number of scores: 6

Computation of the mean: 55/6 = 9.166 = 9.17

Notice in the example above that the answer was computed to three decimal places and rounded to two. In research reports, the mean is usually reported to two decimal places.

There are several symbols for the mean. Commonly used symbols for the mean are X̄, M, and m. The symbol X̄ is pronounced "X-bar." It is used frequently in statistics textbooks and research reports in business.

The mean is defined as "the balance point in a distribution of scores." Specifically, it is the point around which all the deviations sum to zero.

For example, if the sum of the scores for 5 numbers is 60, dividing this by the number of scores (5) yields a mean of 12.00. By subtracting the mean from each score, the deviations from the mean are obtained. If the first score is 7, the score (7) minus the mean (12) yields a deviation of −5. Thus, for a score of 7, the deviation is −5. The deviations of all 5 scores will sum to zero. (The negatives cancel out the positives when summing, yielding zero.)

If you substitute any other number for the mean and perform the calculations of deviations, you will not get a sum of zero. Only the mean will produce this sum. Thus, saying "the mean equals 12.0" is a shorthand way of saying "the value around which the deviations sum to zero is 12.0."

A major drawback of the mean is that it is drawn in the direction of extreme scores. This is a problem if there are either some extremely high scores that pull the mean up or some extremely low scores that pull it down. The following is an example of the contributions given to charity by two groups of children, expressed in cents:

Group A: 1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8, 10, 10, 10, 11

Mean for Group A = 5.52

Group B: 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 9, 10, 10, 150, 200

Mean for Group B = 21.24

Notice that overall the two distributions are quite similar. Yet the mean for Group B is much higher than the mean for Group A because just two students in Group B gave extremely high contributions of 150 cents and 200 cents. If only the means for the two groups were reported without reporting all the individual contributions, it would suggest that the average student in Group B gave about 21 cents when in fact none of the students made a contribution of this amount. Recall from the earlier discussion that a distribution with some extreme scores at one end but not the other is called a skewed distribution. The mean is almost always inappropriate for describing the average of a highly skewed distribution. Another limitation of the mean is that it is appropriate only for use with interval and ratio scales of measurement.

Median and Mode

How do we describe the center, or central location, of the distribution on a scale? If the data are not normally distributed, for example when there are extreme high or low scores, then the mean is not the best way to describe the center of the data. When there are extreme values or outliers present in the data, the arithmetic mean (AM) will be affected by the extreme observations and thus will not be a suitable measure of central tendency. An alternative measure of location is the value above which one half of the data values fall and, by implication, below which the other half of the data values fall. This measure is called the median. The median is computed based only on the central one or two values and does not depend on the values of other observations.
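Both properties of the mean described above, that the deviations around it sum to zero and that it is pulled toward extreme scores, can be checked directly with the charity-contribution data from the example:

group_a = [1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 8, 10, 10, 10, 11]
group_b = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 9, 10, 10, 150, 200]

def mean(values):
    return sum(values) / len(values)

print(round(mean(group_a), 2))   # 5.52
print(round(mean(group_b), 2))   # 21.24 -- pulled up by the two extreme gifts

# The deviations from the mean sum to zero (apart from tiny rounding error).
deviations = [x - mean(group_a) for x in group_a]
print(round(sum(deviations), 9))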
The alternative to describing the center of skewed data with the mean is the median. The median is the value in a distribution that has 50% of the cases above it and 50% of the cases below it. Thus, it is defined as the middle point in a distribution. In the following example there are 11 scores. The middle score, with 50% on each side, is 81, which is the median. Thus, 81 is the value of the median for the set of scores. Note: there are five scores above 81 and five scores below 81.

Scores (arranged in order from low to high):

61, 61, 72, 77, 80, 81, 82, 85, 89, 90, 92

In the next example there are 6 scores. Because there is an even number of scores, the median is halfway between the two middle scores. To find the halfway point, sum the two middle scores (7 + 10 = 17) and divide by 2 (17/2 = 8.5). Thus, 8.5 is the value of the median of the set of scores.

Scores (arranged in order from low to high):

3, 3, 7, 10, 12, 15

An advantage of the median is that it is insensitive to extreme scores. Taking the same set of data and replacing the 15 with an extremely high score of 319 has no effect on the value of the median. The median is 8.5, which is the same value as in the previous example, despite the one extremely high score. Thus, the median is insensitive to the skew in a skewed distribution. Put another way, the median is an appropriate average for describing the typical participant in a highly skewed distribution.

Scores (arranged in order from low to high):

3, 3, 7, 10, 12, 319

The mode is another average. It is defined as the most frequently occurring score. The following data have a mode of 7 because it occurs more often than any other score.

Scores (arranged in order from low to high):

2, 2, 4, 6, 7, 7, 7, 9, 10, 12

A disadvantage of the mode is that there may be more than one mode for a given distribution. This is the case for the following observations, in which both 20 and 23 are modes.

Scores (arranged in order from low to high):

17, 19, 20, 20, 22, 23, 23, 28

Choosing Among the Three Averages

Other things being equal, choose the mean because more powerful statistical tests described later in this book can be applied to it than to the other averages. However,

• the mean is not appropriate for describing highly skewed distributions, and

• the mean is not appropriate for describing nominal and ordinal data.

Choose the median when the mean is inappropriate. The exception to this guideline is when describing nominal data. Nominal data are naming data, such as political affiliation, ethnicity, and so on. There is no natural order to these data; therefore, they cannot be put in order, which is required in order to calculate the median.

Choose the mode when an average is needed to describe nominal data. Note that when describing nominal data, it is often not necessary to use an average because percentages can be used as an alternative. For instance, if there are more registered Democrats than Republicans in a community, the best way to describe this is to report the percentage of people registered in each party. To state only the mode is much less informative than reporting percentages.

Note that in a perfectly symmetrical distribution such as the normal distribution, the mean, median, and mode all have the same value.
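The medians and modes in the examples above can be verified with Python's statistics module (multimode requires Python 3.8 or later):

import statistics as st

odd_set  = [61, 61, 72, 77, 80, 81, 82, 85, 89, 90, 92]
even_set = [3, 3, 7, 10, 12, 15]
outlier  = [3, 3, 7, 10, 12, 319]
one_mode = [2, 2, 4, 6, 7, 7, 7, 9, 10, 12]
bimodal  = [17, 19, 20, 20, 22, 23, 23, 28]

print(st.median(odd_set))      # 81
print(st.median(even_set))     # 8.5
print(st.median(outlier))      # 8.5 -- unaffected by the extreme score
print(st.mode(one_mode))       # 7
print(st.multimode(bimodal))   # [20, 23]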
In skewed distributions, their values are different. In a distribution with a positive
skew, the mean has the highest value because it is pulled in the direction of the
extremely high scores. In a distribution with a negative skew, the mean has the
lowest value because it is pulled in the direction of the extremely low scores. As
noted earlier, the mean should not be used when a distribution is highly skewed.

Section 4

Spread

DESCRIPTIVE STATISTICS
1. Range
2. Interquartile Range
3. Standard Deviation

Spread

To describe the data we often are interested in the variability or spread of the scores. One way to measure spread is to take the difference between the largest and smallest values in the data. This is called the range. Another measure, called the interquartile range or midspread, is the difference between the values at the limits of the middle 50% of the data. Using the statistics at the top of the stem-and-leaf display, subtract the lower hinge from the upper hinge. Still another way to measure would be to compute the average variability in the values. The standard deviation is the square root of the average squared deviation of values from the mean. We will examine each of these measures of variability in the following sections.

Range and Interquartile Range

Variability (spread) refers to differences among the scores of participants. For instance, if all the participants who take a test earn the same score, there is no variability. In practice, of course, some variability (and often quite a large amount of variability) is usually found among participants in research studies. Two measures of variability (i.e., the range and interquartile range) are designed to concisely describe the amount of variability in a set of data.

A simple statistic that describes variability is the range, which is the difference between the highest score and the lowest score. For the following scores the range is 18 (20 minus 2). A researcher could report 18 as the range or simply state that the scores range from 2 to 20.

Scores: 2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20

A weakness of the range is that it is based on only the two most extreme scores, which may not accurately reflect the variability in the entire group. Consider the following data where the range is also
18. However, there is much less variability among the participants in the following set of scores.

Scores: 2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 20

Notice that except for the one participant with a score of 20, all participants have scores in the narrow range from 2 to 6. Yet, the one participant with a score of 20 has pulled the range up to a value of 18, making it unrepresentative of the variability of the scores of the majority of the group.

In this case, scores such as the score of 20 are known as outliers. They lie far outside the range of the majority of other scores and increase the size of the range. As a general rule, the range is inappropriate for describing a distribution of scores with outliers.

A better measure of variability is the interquartile range (IQR). It is defined as the range of the middle 50% of the participants. By using only the middle 50%, the range of the majority of the participants is being described, and at the same time, outliers that could have an undue influence on the ordinary range are stripped of their influence.

Using the same set of data illustrates the value and meaning of the interquartile range. Notice that the scores are in order from low to high. The interquartile range separates the lowest 25% from the middle 50%, and separates the highest 25% from the middle 50%. It turns out that the range for the middle 50% is 3 points. When 3.0 is reported as the IQR, readers know that the range of the middle 50% of participants is only 3 points, indicating little variability for the majority of the participants. Note that the undue influence of the outlier of 20 has been overcome by using the interquartile range.

Scores: 2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 20

When the median is reported as the average for a set of scores, it is customary to also report the interquartile range as the measure of variability. It is customary to report the value of the average (such as the value of the median) first, followed by the value of a measure of variability (such as the interquartile range).

Standard Deviation

The standard deviation is the most frequently used measure of variability. In the previous section, you learned that the term variability refers to the differences among participants. Synonyms for variability are spread and dispersion.

The standard deviation is a statistic that provides an overall measurement of how much participants' scores differ from the mean score of their group. It is a special type of average of the deviations of the scores from their mean.

The more spread out participants are around their mean, the larger the standard deviation. Comparison of the following two examples illustrates this principle. Note that S is the symbol for the standard deviation. Notice, too, that the mean is the same for both groups (i.e., Mean = 10.00 for each group), but Group A, with the greater variability among the scores (S = 7.45), has a larger standard deviation than Group B (S = 1.49).

Example Group A:

Scores for Group A: 0, 0, 5, 5, 10, 15, 15, 20, 20

Mean = 10.00, S = 7.45

Example Group B:

Scores for Group B: 8, 8, 9, 9, 10, 11, 11, 12, 12

Mean = 10.00, S = 1.49

Now consider the scores of Group C in the next example. All participants have the same score; therefore, there is no variability. When this is the case, the standard
deviation equals zero, which indicates the complete lack of variability. Thus, S = 0.00.

Example Group C:

Scores for Group C: 10, 10, 10, 10, 10, 10, 10, 10, 10, 10

Mean = 10.00, S = 0.00

Considering the three previous examples, it is clear that the more participants differ from the mean of their group, the larger the standard deviation. Conversely, the less participants differ from the mean of their group, the smaller the standard deviation.

In review, even though the three groups (A, B, and C) have the same mean, the following is true:

• Group A has more variability than Groups B and C.

• Group B has more variability than Group C.

• Group C has no variability.

Thus, if you were reading a research report on the three groups, you would obtain important information about how the groups differ by considering their standard deviations.

The standard deviation takes on a special meaning when considered in relation to the normal curve because the standard deviation was designed expressly to describe this curve. Here is a basic rule to remember: About two-thirds of the cases (68%) lie within one standard deviation unit of the mean in a normal distribution. (Note that "within one standard deviation unit" means one unit on both sides of the mean.)

Consider this example: Suppose that the mean of a set of normally distributed scores equals 70 and the standard deviation equals 10. Then, about two-thirds of the cases lie within 10 points of the mean. More precisely, 68% (a little more than two-thirds) of the cases lie within +/− one standard deviation (10 points) of the mean.

As you can see, 34% of the cases will lie between a score of 60 and the mean of 70 (one standard deviation below the mean), while another 34% of the cases will lie between the mean of 70 and a score of 80 (one standard deviation above the mean). In all, 68% of the cases lie between scores of 60 and 80.

The 68% rule applies to all normal curves. In fact, this is a property of the normal curve: 68% of the cases lie in the "middle area" bounded by one standard deviation on each side. Suppose, for instance, that for another group the mean of their normal distribution also equals 70, but the group has less variability, with a standard deviation of only 5. Since the standard deviation is only 5 points, 68% of the cases lie between scores of 65 and 75 for this group of participants.
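The standard deviations quoted for Groups A, B, and C and the 68% property can be checked numerically. The sketch below uses the population form of the standard deviation (dividing by the number of scores), which reproduces the values in the text, and then simulates normally distributed scores with a mean of 70 and a standard deviation of 10:

import math
import random

def pop_std(values):
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

group_a = [0, 0, 5, 5, 10, 15, 15, 20, 20]
group_b = [8, 8, 9, 9, 10, 11, 11, 12, 12]
group_c = [10] * 10

print(round(pop_std(group_a), 2))   # 7.45
print(round(pop_std(group_b), 2))   # 1.49
print(round(pop_std(group_c), 2))   # 0.0

# 68% rule: the share of simulated normal scores within one S of the mean.
random.seed(1)
scores = [random.gauss(70, 10) for _ in range(100_000)]
share = sum(60 <= x <= 80 for x in scores) / len(scores)
print(round(share, 3))              # close to 0.68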
The 68% guideline (sometimes called the two-thirds rule of thumb) strictly applies only to perfectly normal distributions. The less normal a distribution is, the less accurate the guideline.

By examining the calculation of the standard deviation you can see that it is based on the differences between the mean and each of the scores in a distribution. When researchers report the mean (the most frequently used average), they also report the standard deviation.

Sample statement reporting means and standard deviations:

"Group A has a higher mean (Mean = 67.89, S = 8.77) than Group B (Mean = 60.23, S = 8.54)."

EXAMPLE

Entering the data from the earlier example of range and interquartile comparisons into SYSTAT, we are able to run an analysis using descriptives to derive the following output.

Scores: 2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 20

As discussed earlier, when the data contain outliers it is better to report the median and the interquartile range so the reader is not misled by extreme scores when interpreting the spread of the responses.
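A similar descriptive summary can be produced in Python. Note that quartile conventions differ between packages, so the interquartile range below does not exactly match the hinge-based value of 3.0 used earlier in the chapter:

import statistics as st

scores = [2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 20]

q1, q2, q3 = st.quantiles(scores, n=4)   # quartiles (default "exclusive" method)

print("range:", max(scores) - min(scores))        # 18
print("IQR:", q3 - q1)                            # 3.5 with this method
print("mean:", round(st.mean(scores), 2))
print("median:", q2)                              # 4.5
print("std dev (sample):", round(st.stdev(scores), 2))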
Section 5

Measures of Association

DESCRIPTIVE STATISTICS
1. Correlational Statistics
2. Pearson Product Moment Correlation
3. Chi-square Distribution

Measures of Association

Correlational Statistics

Correlation refers to the extent to which two variables are related across a group of participants. Consider scores on the College Entrance Examination Board's Scholastic Aptitude Test (SAT) and first-year GPA in college. Because the SAT is widely used in selecting college students, there should be a correlation between these scores and GPAs earned in college. Consider Example 1, in which SAT-V refers to the verbal portion of the SAT. Notice that there is one group of students with two scores for each student. Is there a relationship between the two variables?

Student   SAT-V   GPA
Mitt        333   1.0
Janice      756   3.8
Thomas      444   1.9
Scot        629   3.2
Diana       501   2.3
Hillary     245   0.4
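The strength of this relationship can be summarized with the Pearson product moment correlation coefficient. The sketch below computes it from first principles for the six students in the table:

import math

sat = [333, 756, 444, 629, 501, 245]
gpa = [1.0, 3.8, 1.9, 3.2, 2.3, 0.4]

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)

print(round(pearson_r(sat, gpa), 3))   # close to +1: a strong direct relationship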
Indeed there is. Notice that students who scored high on the SAT-V such as Janice and Scot had the highest GPAs. Also, those who scored low on the SAT-V such as Hillary and Mitt had the lowest GPAs. This type of relationship is called a direct relationship (also called a positive relationship). In a direct relationship, those who score high on one variable tend to score high on the other, and those who score low on one variable tend to score low on the other.

In the next example the scores are on two variables for one group of participants. The first variable is self-concept, which was measured with 12 true-false items containing statements such as "I feel good about myself when I am in public." Participants earned one point for each statement that they marked as being true of them. Thus, the self-concept scores could range from zero (marking all statements as false) to 12 (marking all statements as true). Obviously, the higher a participant's score, the higher the self-concept. The second variable is depression, measured with a standardized depression scale with possible scores from 20 to 80. Higher scores indicate more depression.

Participant Self-Concept Depression

Sally 12 25
Jose 12 29
Sarah 10 38
Dick 7 50
Matt 8 61
Joan 4 72

A key question for the research is "Does the data indicate that there is a relationship between self-concept and depression?" Close examination indicates that there is a relationship. Notice that participants with high self-concept scores such as Sally and Jose (both with the highest possible self-concept score of 12) had relatively low depression scores of 25 and 29 (on a scale from 20 to 80). At the same time, participants with low self-concept scores such as Matt and Joan have high depression scores. In other words, those with high self-concepts tend to have low depression scores, while those with low self-concepts tend to have high depression scores. Such a relationship is called an inverse relationship (also called a negative relationship). In an inverse relationship, those who score high on one variable tend to score low on the other.

It is important to note that just because a correlation between two variables is observed, it does not necessarily indicate that there is a causal relationship between the variables. For instance, our data does not establish whether (a) having a low self-concept causes depression or (b) being depressed causes an individual to have a low self-concept. In fact, there might not be any causal relationship at all between the two variables because a host of other variables (such as life circumstances, genetic predispositions, and so on) might account for the relationship between self-concept and depression. For instance, having a disruptive home life might cause some individuals to have a low self-concept and at the same time cause these same individuals to become depressed.

In order to study cause and effect, a controlled experiment is needed in which different treatments are administered to the participants. For instance, to examine a possible causal link between self-concept and depression, a researcher could give an experimental group a treatment designed to improve self-concept and then compare the average level of depression of the experimental group with the average level of a control group.

Although it is generally inappropriate to infer causality from a correlational study, such studies can still be of great value. For instance, the College Board is interested in how well the SAT works in predicting success in college. This can be revealed by examining the correlation between SAT scores and college GPAs. It is not necessary for the College Board to examine what causes high GPAs in an experiment for the purposes of determining the predictive validity of its test.

In addition to validating tests, correlations are of interest in developing theories. Often, a postulate of a theory may indicate that X should be related to Y. If a correlation is found in a correlational study, the finding helps to support the theory. If it is not found, it calls the theory into question.

Up to this point, only clear-cut examples have been considered. In practice, however, correlational data almost always include individuals who are exceptions to the overall trend, making the degree of correlation less obvious. Consider the following, which has the students from our first example and two others: Joe and Patricia.

Student SAT-V GPA

Mitt 333 1
Janice 756 3.8
Thomas 444 1.9
Scot 629 3.2
Diana 501 2.3
Hillary 245 0.4
Joe 630 0.9
Patricia 404 3.1

Joe has a high SAT-V score but a very low GPA. Thus, Joe is an exception to the rule that high values on one variable are associated with high values on the other. There may be a variety of explanations for this exception: Joe may have had a family crisis during his first year in college, or he may have abandoned his good work habits to make time for TV viewing and campus parties as soon as he moved away from home to college. Patricia is another exception: perhaps she made an extra effort to apply herself to college work, which could not be predicted by the SAT. When studying hundreds of participants, there will be many exceptions, some large and some small. To make sense of such data, statistical techniques are required.

The Pearson Correlation Coefficient

Assume that we are interested in studying acceleration and braking for new cars. We want a single number that summarizes how well we could predict acceleration from braking. Later on we will use linear regression when we discuss how we calculate such a line, but it is enough here to know that we are interested in drawing a line through the area covered by the points in the scatterplot such that the acceleration of a car could be predicted rather well by the value on the line corresponding to its braking. The closer the points cluster around this line, the better the prediction would be.

We also want this number to represent how well we can predict braking from acceleration using a similar line. This symmetry we seek is fundamental to all the measures available in correlation. It means that, whatever the scales on which we measure our variables, the coefficient of association we compute will be the same for either prediction. If this symmetry makes no sense for a certain data set, then you probably should not be using correlation.

The most common measure of association is the Pearson correlation coefficient, which varies between -1 and +1. A Pearson correlation of 0 indicates that neither of two variables can be predicted from the other by using a linear equation. A Pearson correlation of +1 indicates that one variable can be predicted
perfectly by a positive linear function of the other, and vice versa. And a value of
-1 indicates the same, except that the function has a negative sign for the slope of
the line.

By plotting a scatter diagram of x and y, one can get a good idea of whether there is a relationship between the two variables. Refer to the figure on the following page to see that the values of Y (the dependent variable) can take on a number of forms when plotted against the values of X (the independent variable). The previous unit dealt with the construction of a line through the scatter plot in order to assess the nature of the relationship. Now a way to measure the degree or strength of the relationship, and a means of testing that strength for statistical significance, is needed. The primary means of measuring the strength of the relationship are the correlation coefficient (r) and the coefficient of determination (r²).

There are three commonly used methods to test the statistical significance of the
strength of the relationship as measured by (r):

• Testing r, itself, for statistical significance using Student’s t-test.

• Testing the b coefficient in the linear regression equation for statistical significance using the Student’s t-test.

• Using the F test to evaluate the ratio of the explained variations (from the
regression line) to the unexplained variations.

All three tests are equivalent methods of determining whether the strength of the relationship between x and y is statistically significant, versus whether the observed relationship could have occurred by chance. They are equivalent, however, only for bivariate analysis. Primary attention will be given to the first two tests. The third (the F test) is included here for completeness because computer programs like SYSTAT use the F test for correlation analysis.

Keep in mind that the Pearson correlation measures linear predictability. Do not assume that a Pearson correlation near 0 implies no relationship between variables. Many nonlinear associations (U- and S-shaped curves, for example) can have Pearson correlations of 0.
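For reference, the first of these tests uses the standard t statistic for a correlation coefficient (this formula appeared only as a figure in the original, but it is the usual form):

\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad df = n - 2 \]

With the eight students from the earlier example, r = .644 gives t of about 2.06 on 6 degrees of freedom, which matches the SYSTAT output shown in the example that follows.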

Example Analysis Using SYSTAT:

After entering the data on SAT-V and GPA for the eight students from our earlier example, we can request Analysis, Correlations, and Simple. The following window opens and you add SAT and GPA as Selected variables. Click OK.

The output window will include Pearson’s correlation coefficient and a scatterplot of the data. As we had observed earlier, the correlation coefficient appears to be strong at .644, and the pattern of scores in the scatterplot reflects a positive relationship between SAT scores and GPA. A test of the hypothesis of no relationship reveals that we cannot reject the null hypothesis at a 95% confidence level (t = 2.063, p = .085).
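For readers working outside SYSTAT, a minimal sketch in Python (scipy) reproduces the same coefficient and significance test for these eight students; the variable names are illustrative:

```python
from scipy import stats

sat_v = [333, 756, 444, 629, 501, 245, 630, 404]
gpa   = [1.0, 3.8, 1.9, 3.2, 2.3, 0.4, 0.9, 3.1]

# Pearson correlation and its two-tailed p-value (t-based test, df = n - 2)
r, p = stats.pearsonr(sat_v, gpa)
print(f"r = {r:.3f}, p = {p:.3f}")   # should give roughly r = 0.644, p = 0.085
```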

Chi Square Distribution

The most familiar test available for two-way tables is the Pearson chi-square test (χ²) for independence of table rows and columns. When the table has only two rows or two columns, the chi-square test is also a test for equality of proportions. The concept of interaction in a two-way frequency table is similar to the one in analysis of variance. It is easiest to see in an example. An advertising agency was interested in the potential effect on sales of two different campaigns. The campaigns were run in two major cities (NY and LA). The results, in unit sales (000s), are as follows:

NY LA
AD A 8 9
AD B 6 7

Notice in the table that the sales (in 000 units) are similar for NY and LA. We are interpreting these numbers relatively, so we should compute row percentages to understand the differences better. Here is the same table standardized by rows:

NY LA
AD A 47.1 52.9
AD B 46.2 53.8

Now we can see that the percentages are similar in the two rows of the table. A simple graph reveals these similarities.

There is almost complete overlap in the plot. This indicates there is no interaction between the AD campaigns (A or B) and sales in NY or LA.

Now let’s extend the example and assume the agency has two new campaigns (C and D) and has run each campaign in NY and LA. We will once again use unit sales in thousands as our outcome measure. The results are as follows:

NY LA
AD C 5 12
AD D 9 4

Notice in the table that the sales (in 000 units) are dissimilar for NY and LA. We are interpreting these numbers relatively, so we should compute row percentages to understand the differences better.

Here is the same table standardized by rows:

NY LA
AD C 29.4 70.6
AD D 69.2 30.8

Now we can see that the percentages are dissimilar in the two rows of the table. A simple graph reveals these dissimilarities.

There is almost no overlap in the plots. This indicates there is an interaction between the AD campaigns (C and D) and sales in NY or LA. Campaign C was effective in LA but less effective in NY, while campaign D was effective in NY and less effective in LA.

We can test an analogous hypothesis in this context: that each of the four cells contains the expected percent (frequency) of the unit sales. It is possible to use the χ² test with as few as two comparisons, or for many comparisons. The number of comparisons possible depends on the sample size. To establish a “comparison point” takes about five items in a sample. Thus, in testing whether a sample came from a normal population with n = 50, comparisons could be made at about 10 points along the normal distribution. The method of making these comparisons will be described below. The degrees of freedom associated with the χ² distribution reflect the number of points of comparison.

In every case, the basic form of the null hypothesis is that there is “no difference” between the two distributions being compared, with the alternate hypothesis being that there is a difference. Since hypotheses are always statements about a population, the “no difference” statement must refer to the populations involved. The following examples show a progression from simple to more complex applications, indicating the statements of hypotheses associated with each. The only assumptions required of the model are that: (1) at least a nominal level of measure is achieved (thus any level is acceptable), and (2) a random sample is used. If more than one sample is involved, then (3) the samples must be independent of each other.

One Sample Case – Uniform Distribution

Caterpillar Inc. completed a survey of wheel loader/aggregate national accounts without Caterpillar equipment to determine whether these end users feel that parts prices are generally higher on the East Coast than on the West Coast. The following results were obtained from a random sample of 24 end users:

Are prices higher in the East?
Yes No Don't Know TOTAL
15 5 4 24

Caterpillar Inc. wants to know if the response pattern is different enough to conclude that the population does have a definite opinion (yes). In this case there are three categories of response and the level of measure is nominal. If there had been only two responses, yes or no, the binomial model or the normal approximation to the binomial could have been used.

If this had been a binomial problem, in order to test whether there is a definite yes or no response, the null hypothesis would have been H0: P = 0.50. A 50-50 split is used because it represents no definite opinion (a balance) one way or the other. In the present case, with three categories of response, the null hypothesis would represent equal responses in all three categories.

Using the Chi Squared model, this is analogous to comparing the sample data distribution to a uniform population distribution with flat, constant height as shown below. The null hypothesis states that the population distribution is uniform in shape, or that the sample distribution came from a population distribution that is uniform in shape.

The hypothesis is tested by comparing the height of the two distributions at each of the three bars or points. If the two distributions differ too much, as measured by the χ² statistic, the null hypothesis would be rejected and it would be concluded that the sample did not come from a population where the opinions were equally distributed. This would mean that there is a definite opinion one way or the other.

Equally distributed opinions in this case would be equivalent to no opinion one way or the other, like 50-50 when testing a proportion. Thus, if the two distributions do not differ “too” much as measured by the χ² statistic, the null hypothesis would be accepted and the conclusion would be that there is no strong difference of opinion one way or the other.

Calculation of χ²

To test the null hypothesis requires calculating a value of χ², which, like an F statistic, is then compared to a critical value for χ².

Step 1: Set up the table showing frequencies observed (fo) in the sample and frequencies expected (fe) under the null hypothesis:

YES NO DK TOTAL
Observed 15 5 4 24
Expected ? ? ? 24

Step 2: Check to see that no more than 20% of the cells have fe less than 5. If more than 20% of the cells have fe less than 5, the cells would have to be regrouped or the test discontinued. In this case, all three cells have fe > 5, thus the criterion is met. This criterion reflects the statement made above that it takes at least 5 elements of a sample to properly establish a “point” on a distribution where a comparison is to be made.
Step 3: Calculate χ² = Σ (fo − fe)² / fe.

Step 4: Determine degrees of freedom:

General Procedure: df = the number of cells in which fe must be determined before the totals in rows or columns fix the fe in the remaining cell.

YES NO DK TOTAL
Observed 15 5 4 24
Expected ? ? ? 24

After the fe (expected value) is determined in the first two cells, the fe in the third cell is fixed by the total n. Thus there are 2 degrees of freedom in this case.

Step 5: Compare χ² observed to the critical value of χ².

Although this is a two-tail or non-directional test, the values for α are shown as areas under one tail, as is the case with the F distribution.

OUTPUT FROM SYSTAT:

Step 6: State the conclusion: Reject H0; the observed value of χ² is too great to conclude that the pattern of responses in the sample came from a population with a uniform distribution of end user opinions about prices in the East. Thus, the difference observed is statistically significant. The population of end users does have a definite (yes) opinion about whether prices are higher in the East.
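As a check on the hand calculation (the formula and worked numbers appeared only as figures in the original), here is a minimal sketch in Python (scipy). With expected frequencies of 8 per cell (24 responses spread over 3 categories), it should give χ² = 9.25 on 2 df, which exceeds the 5.991 critical value at α = 0.05 and therefore leads to the same rejection of H0:

```python
from scipy import stats

observed = [15, 5, 4]      # Yes, No, Don't Know
expected = [8, 8, 8]       # uniform: 24 responses / 3 categories

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")   # about 9.25, p near 0.01
```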

Two (or more) Independent Samples

Perhaps the most widely used application of the χ² test is to test whether two or more groups differ with respect to some opinion or characteristic. The test evaluates whether two or more sample distributions (patterns) could reasonably have come from the same population distribution (pattern).

An extension of the one-sample example of the survey of end users concerning prices in the East will illustrate this application. In addition to the survey of 24 end users without Caterpillar equipment, a survey was also made of 60 end users with Caterpillar equipment. The results from the random samples were as follows:

Are prices higher in the East?

Yes No Don't Know Total
No Caterpillar Equipment 15 5 4 24
Caterpillar Equipment 30 5 25 60
Total 45 10 29 84

Now the relevant question is, “Do the end users with Caterpillar equipment have the same opinions as end users without Caterpillar equipment?” Converted to a statistical hypothesis, the parallel question is whether sample 1 and sample 2 are from the same common population. Since the χ² test compares one distribution with another, the null hypothesis might read, “The pattern of responses from end users without Caterpillar equipment is from the same population pattern as the responses from end users with Caterpillar equipment.”

Often the statements of these hypotheses are shortened to

H0: the patterns of responses are the same; and

H1: the patterns of responses are not the same.

In interpreting these statements it must be realized that “the same” refers to “from the same population distribution.” A statistical hypothesis must be a statement about a population. The shortened versions of H0 and H1 appear to be statements about two samples unless correctly interpreted as explained above.

The correct understanding of the statistical hypothesis in this example sheds light on the way the expected frequencies (fe) are determined in order to calculate the value of χ² observed.

The logic is this: the column totals represent a combined estimate of the population distribution using both samples:

Yes No DK Total
45 10 29 84

The object is then to compare each sample pattern with this common estimate of the population distribution. Since the sample sizes are less than the total, and unequal in this case, the population pattern can be “scaled down” for each sample by multiplying the column totals by the ratio of each sample size to the combined total of both samples.

For example, to calculate the expected frequencies for end users without Caterpillar equipment, the column totals are multiplied by the ratio of that sample size to the total size. The results are as follows:

End users without Cat equipment (n = 24):
  Yes: fe = 45 × (24/84) = 12.86
  No:  fe = 10 × (24/84) = 2.86
  DK:  fe = 29 × (24/84) = 8.29
  Total = 24

Likewise the expected frequencies for end users with Caterpillar equipment would be calculated:

End users with Cat equipment (n = 60):
  Yes: fe = 45 × (60/84) = 32.14
  No:  fe = 10 × (60/84) = 7.14
  DK:  fe = 29 × (60/84) = 20.71
  Total = 60

Calculation of χ²

χ² = Σ (fo − fe)² / fe = 5.85

NOTE: In calculating (fe) it would have been necessary to calculate fe for only two cells, (A) and (B), before the remaining values were predetermined by the row and column totals. Thus, there are 2 degrees of freedom with a 2 x 3 contingency table.

Compare χ² with χ² critical, 2 df, α = 0.05 = 5.991

Statistical conclusion: Accept H0; the two sample distributions do not differ enough (from a common population) to conclude that they come from different population distributions.

Conclusion in terms of the problem statement: End users with Caterpillar equipment have the same pattern of opinions as end users without Caterpillar equipment regarding the question of prices in the East.

EXAMPLE: Analysis using SYSTAT

We have the same results as used in the illustration of how chi-square is calculated.

Are prices higher in the East?

Yes No Don't Know Total
No Caterpillar Equipment 15 5 4 24
Caterpillar Equipment 30 5 25 60
Total 45 10 29 84
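For readers without SYSTAT, a minimal sketch of the same test in Python (scipy) computes the expected frequencies and the Pearson χ² for this 2 x 3 table; it should report a χ² of about 5.85 on 2 df with a p-value just above 0.05, matching the decision to retain H0:

```python
import numpy as np
from scipy import stats

# Rows: No Caterpillar equipment, Caterpillar equipment
# Columns: Yes, No, Don't Know
observed = np.array([[15, 5, 4],
                     [30, 5, 25]])

chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
print("Expected frequencies:")
print(expected.round(2))
```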

Once the data is entered into SYSTAT you can run the Two-Way table analysis by clicking on Analysis, Tables and Two-Way. A window will open where you will select the variables you wish to include in the analysis and any additional tests you may wish to run during the analysis. In this example, ownership of Caterpillar Equipment will be our row variable and the perception of cost in the East our column variable. We are requesting a separate table of counts and percentages.

When we are finished with our selections we click OK and the results appear in an output window.

Extension of Concept to More Than Two Independent Samples:

Instead of making the test to see whether two sample distributions (end users without and with Caterpillar equipment) could have come from the same population distribution, the concept can be extended to more than two samples. The survey might have included end users without Caterpillar equipment, end users with Caterpillar equipment, and end users leasing machines. In this case the contingency table would have been a 3 x 3 table. (This is still considered a two-way table since we are examining equipment ownership and perception of cost in the East.)

Yes No Don't Know Total
No Caterpillar Equipment 15 5 4 24
Caterpillar Equipment 30 5 25 60
Lease Caterpillar Equipment 18 10 2 30
Total 63 20 31 114

The expected frequencies would be calculated in the same way as in the 2 x 3 matrix.

The degrees of freedom in the 3 x 3 matrix would be greater than in the 2 x 3 case. Note that the fe in four cells must be computed before the remaining values are determined by the row and column totals. The 3 x 3 matrix thus has four degrees of freedom.
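The cell-counting argument above corresponds to the standard contingency-table formula (a general result, not specific to SYSTAT):

\[ df = (r - 1)(c - 1) \]

so a 2 x 3 table has (2 − 1)(3 − 1) = 2 degrees of freedom and a 3 x 3 table has (3 − 1)(3 − 1) = 4.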

Once the data is entered into SYSTAT you can run the Two-Way table analysis by clicking on Analysis, Tables and Two-Way. You will select the variables you want to analyze and any additional statistical tests you may wish to have run as part of the analysis. In this example, ownership of Caterpillar Equipment will be our row variable and the perception of cost in the East our column variable. We are requesting a separate table of counts and percentages.

Click OK for the statistical analysis to run and you will receive the results under the output tab.

Interpretation of the Test for Independence

The two or more independent sample case described above is often called a “test for independence.” The interpretation of the meaning of “independence” is often difficult for someone first learning the Chi Square technique. The procedure is the same, but be careful how you interpret the meaning of the result. In the 3 x 3 table above, three independent random samples of (1) end users without Caterpillar equipment, (2) end users with Caterpillar equipment, and (3) end users with leased machines responded “yes”, “no”, or “don’t know” concerning whether prices were generally higher in the East than in the West. This problem is usually posed as determining whether the response (yes, no, or don’t know) to the question is “independent” of end user category. This is meant to test whether ownership makes any difference, or exerts any influence, on the response to the question.

The structure of the problem is exactly the same as the 2 x 3 table above, with the exception that now there are three rows instead of two, and the expected frequencies are calculated as described above. The null hypothesis is again that there is “no difference” statistically between the observed pattern of responses and the expected pattern that would occur if responses were proportioned within the cells (weighted) according to the totals observed in the columns and rows.

If the observed and expected values are very similar there will be a low value for χ², and H0, which states that there is no difference between the observed and expected distributions, would be accepted. Again, H1 would state that there is a difference. Now, the interpretation of the result is the tricky part. If the category of ownership does exert some influence on the response to the question, “clustering” of answers would be expected. For example, end users without Caterpillar equipment (group 1) might answer “yes” more often to the question, whereas end users with Caterpillar equipment (group 2) might respond “don’t know” more often. If this were the case, there would be large cell differences between the observed values and the expected values. In this case expect a high
value of χ² and reject the null hypothesis that there is “no difference”; a statistically significant difference would be indicated.

The interpretation of the results is the key point. The clustering of data which results in the large differences, and the high value of the χ² statistic, is evidence that ownership category is influencing the responses to the question. Thus the pattern of responses does depend on ownership category. Therefore there is evidence of dependence, or lack of independence, between ownership category and response to the question.

On the other hand, if the observed pattern was very similar to the expected “equally proportioned” pattern, then χ² would have a low value and H0 would be accepted. This would be interpreted as meaning that the responses do not cluster, and thus indicate that ownership category does not influence the response to the question concerning prices. In this case, there is no dependency and thus we conclude the effects are independent although (and because) the distribution patterns are similar.

Expressing the statement of the research hypothesis in terms such as “effect A is independent of effect B” tends to obscure the meaning of the statistical hypothesis, which is always a statement of “null” or “no difference” statistically. And it is always the statistical hypothesis that is actually tested. The failure to keep the two types of hypotheses separate is asking for confusing results.

It is therefore recommended that the null hypothesis be stated in terms of “similar patterns” as discussed earlier, and that the research hypothesis be stated more directly, such as “ownership category affects attitudes toward prices,” to avoid possible misinterpretation of results.

Rank Correlation

There are two main techniques for calculating correlation coefficients based on ranks: the Spearman Rank Correlation Coefficient and the Kendall Rank Correlation Coefficient. For purposes of simple correlation, they produce practically identical results. This section will be limited to the Spearman Rank Correlation Coefficient, labeled rs.

Rank correlation is a non-parametric alternative to the parametric technique for calculating (r), called the Pearson (r) or “product-moment” correlation, associated earlier with the least squares regression analysis. The rank correlation model requires only

1. that a random sample is taken and

2. that at least an ordinal level of measure is achieved on any of the variables.

The product-moment correlation model requires (1) a random sample as well, but in addition, (2) x and y must be normally distributed, and (3) the level of measure must be interval.

Thus, rank correlation is useful in two situations:

1. Where an ordinal level of measure is possible (ranked data) but interval measure is not.

2. Where an interval level of measure is obtained, but it is questionable whether x and y are normally distributed (income data is often not normally distributed).

In addition, rank correlation is easier to calculate and can be used any time the product-moment correlation assumptions are met. Rank correlation will produce very similar results, but with lower power-efficiency. In comparison with the Pearson (r), the Spearman (rs) has a power-efficiency of about 91%.

EXAMPLE

The quality control group measured the percent of steel in the final product and the breaking point. The data has been transformed to rank order for the eight samples. Once the data has been entered into SYSTAT we can request Analysis, Correlation and Simple. Once we click on Simple, the following window for variable and procedure selection will open.

We select the two variables of interest, Percent_rank (percentage of steel content) and Break_rank (the breaking point). Under the test we identify our data as rank order and use Spearman. Click OK.

The scatterplot reveals a strong positive relationship between the two rank order variables. The Spearman Correlation reflects this relationship at .952.

We test the significance of the Spearman Correlation Coefficient using SYSTAT.
Under Analysis, Hypothesis Testing and Correlation we select Zero Correlation.

We will discuss the confidence level and alternate hypothesis in later sections of
the text. The results of the analysis give a p-value of 0.000.

We conclude there is a significant correlation between percent of steel and


breaking point.
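The same kind of analysis can be run outside SYSTAT; a minimal sketch in Python (scipy) is shown below. The rank values here are illustrative placeholders, since the original data table is not reproduced in this text, and the variable names simply mirror the SYSTAT example:

```python
from scipy import stats

# Illustrative ranks for eight samples (placeholders, not the original data)
percent_rank = [1, 2, 3, 4, 5, 6, 7, 8]
break_rank   = [2, 1, 3, 4, 6, 5, 7, 8]

# Spearman rank correlation and the p-value for the test of zero correlation
rho, p = stats.spearmanr(percent_rank, break_rank)
print(f"Spearman rho = {rho:.3f}, p = {p:.4f}")
```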

Section 6

Tests of Normality

DESCRIPTIVE STATISTICS: Location and Spread Summary

1. Skewness
2. Kurtosis
3. Tests of Normality
4. Multivariate Normality Assessment

All of these measures of location and spread have their advantages and disadvantages, but the mean and standard deviation are especially useful for describing data that follow a normal distribution. The normal distribution is a mathematical curve with only two parameters in its equation: the mean and standard deviation. As you recall, a parameter defines a family of mathematical functions, all of which have the same general shape. Thus, if data come from a normal distribution, we can describe them completely (except for random variation) with only a mean and standard deviation.

If the fit of the curve to the data looks excellent, examine the fit in more detail. For a normal distribution, we would expect 68% of the observations to fall between one standard deviation below the mean and one standard deviation above the mean. By counting values in the stem-and-leaf diagram, we find the number of cases is on target. This is not to say that every number follows a normal distribution exactly, however.

Skewness and Kurtosis

You’ve learned numerical measures of location, spread, and outliers, but what about measures of
shape? The histogram gives you a general idea of the shape, but two numerical measures of shape
give a more precise evaluation: skewness tells you the amount and direction of skew (departure from
horizontal symmetry), and kurtosis tells you how tall and sharp the central peak is, relative to a
standard bell curve.

Many statistical inferences require that a distribution be normal or nearly normal. A normal distribution
has skewness and excess kurtosis of 0, so if your distribution is close to those values then it is
probably close to normal.

Among the descriptive statistics are skewness and kurtosis. Abnormally skewed and peaked distributions may be signs of trouble, and problems may then arise in applying testing statistics. So the key question is, what are the acceptable ranges for these two statistics, and how will they affect the testing statistics if they are outside those limits?

A commonly used "number crunching" software program is SYSTAT, which is compatible with PCs and Macs. The program provides an analysis of the dependent variables and puts a number of useful descriptive statistics into the output, including all of the following: mean, standard error of the mean, median, mode, standard deviation, variance, kurtosis, skewness, range, minimum, maximum, sum, count, largest, smallest, and confidence level. Our question focuses on the skew and kurtosis statistics.

Skewness

Let us begin by talking about skewness. Skewness is a function that returns the skewness of a distribution. Skewness characterizes the degree of asymmetry of a distribution around its mean. Positive skewness indicates a distribution with an asymmetric tail extending towards more positive values. Negative skewness indicates a distribution with an asymmetric tail extending towards more negative values. While that definition is accurate, it isn't 100 percent helpful because it doesn't explain what the resulting number actually means.

The skewness statistic is sometimes also called the skewedness statistic. Normal distributions produce a skewness statistic of about zero. (I say "about" because small variations can occur by chance alone.) So a skewness statistic of -0.0201 would be an acceptable skewness value for a normally distributed set of test scores because it is very close to zero and is probably just a chance fluctuation from zero. As the skewness statistic departs further from zero, a positive value indicates the possibility of a positively skewed distribution (that is, with scores bunched up on the low end of the score scale) and a negative value indicates the possibility of a negatively skewed distribution (that is, with scores bunched up on the high end of the scale). Values of 2 standard errors of skewness (ses) or more (regardless of sign) probably indicate skew to a significant degree.

The first thing you usually notice about a distribution's shape is whether it has one mode (peak) or more than one. If it's unimodal (has just one peak), like most data sets, the next thing you notice is whether it's symmetric or skewed to one side. If the bulk of the data is at the left and the right tail is longer, we say that the distribution is skewed right or positively skewed; if the peak is toward the right and the left tail is longer, we say that the distribution is skewed left or negatively skewed.

Look at the following two graphs. They both have μ = 0.1 and σ = 0.26, but their shapes are different.

When the distribution has a positive skew the mean is pulled to the right of the mode and median.

Interpreting

If skewness is positive, the data are positively skewed or skewed right, meaning that the right tail of the distribution is longer than the left. If skewness is negative, the data are negatively skewed or skewed left, meaning that the left tail is longer.

If skewness = 0, the data are perfectly symmetrical. But a skewness of exactly zero is quite unlikely for real-world data, so how can you interpret the skewness number? Rule of thumb:

• If skewness is less than −1 or greater than +1, the distribution is highly skewed.

• If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately skewed.

• If skewness is between −½ and +½, the distribution is approximately symmetric.

For example, let's say you are using SYSTAT and calculate a skewness statistic of -.9814 for a particular test administered to 30 students. The ses for the sample is given in the output as .4472. Since two times the standard error of the skewness is .8944 and the absolute value of the skewness statistic, .9814, is greater than .8944, you can assume that the distribution is significantly skewed. The skewness statistic is more than two standard errors from zero (-.9814/.4472 = -2.19). Since the sign of the skewness statistic is negative, you know that the distribution is negatively skewed. Alternatively, if the skewness statistic had been positive, you would have known that the distribution was positively skewed. Yet another alternative would be that the skew statistic might fall within the range between -.8944 and +.8944, in which case you would have to assume that the skewness was within the expected range of chance fluctuations in that statistic, which would indicate a distribution with no significant skewness problem.

Remember that the mean and standard deviation have the same units as the original data, and the variance has the square of those units. However, the skewness has no units: it's a pure number, like a t-score. With a skewness of −0.1098, the sample data for student heights are approximately symmetric.

Caution: This is an interpretation of the data you actually have. When you have data for the whole population, that's fine. But when you have a sample, the sample skewness doesn't necessarily apply to the whole population. In that case the question is, from the sample skewness, can you conclude anything about the population skewness?

Inferring

Your data set is just one sample drawn from a population. Maybe, from ordinary sample variability, your sample is skewed even though the population is symmetric. But if the sample is skewed too much for random chance to be the explanation, then you can conclude that there is skewness in the population.
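The SYSTAT example above can be reproduced with a short calculation. Here is a minimal sketch in Python; the sqrt(6/n) standard error is the approximation used in that example, and statistical packages may report a slightly different exact value:

```python
import math

skewness = -0.9814        # sample skewness reported by the software
n = 30                    # sample size
ses = math.sqrt(6 / n)    # approximate standard error of skewness (about 0.4472 here)

z = skewness / ses        # about -2.19
if abs(z) > 2:
    print(f"z = {z:.2f}: significant skew (negative sign means negatively skewed)")
else:
    print(f"z = {z:.2f}: within chance fluctuation; no significant skew")
```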

But what is “too much for random chance to be the explanation”? Divide the sample skewness by the standard error of skewness (SES) to get a test statistic that measures how many standard errors separate the sample skewness from zero.

The critical test value is approximately 2. (This is a two-tailed test of skewness ≠ 0 at roughly the 0.05 significance level.)

• At less than −2, the population is very likely skewed negatively (though you don’t know by how much).

• Between −2 and +2, you can’t reach any conclusion about the skewness of the population: it might be symmetric, or it might be skewed in either direction.

• At greater than +2, the population is very likely skewed positively (though you don’t know by how much).

Don’t mix up the meanings of this test statistic and the amount of skewness. The amount of skewness tells you how highly skewed your sample is: the bigger the number, the bigger the skew. The test statistic tells you whether the whole population is probably skewed, but not by how much: the bigger the number, the higher the probability.

The existence of positively or negatively skewed distributions as indicated by the skewness statistic is important for you to recognize because skewing, one way or the other, will tend to reduce the reliability of the results. Perhaps more importantly, from a decision making point of view, if the scores are scrunched up around any of your cut-points, making a decision will be difficult because many observations will be near that cut-point. Skewed distributions will also create problems insofar as they indicate violations of the assumption of normality that underlies many of the other statistics like correlation coefficients and t-tests.

However, a skewed distribution may actually be a desirable outcome on a criterion-referenced test. For example, a negatively skewed distribution with students all scoring very high on an achievement test at the end of a course may simply indicate that the teaching, materials, and student learning are all functioning very well. This would be especially true if the students had previously scored poorly in a positively skewed distribution (with students generally scoring very low) at the beginning of the course on the same or a similar test. In fact, the difference between the positively skewed distribution at the beginning of the course and the negatively skewed distribution at the end of the course would be an indication of how much the students had learned while the course was going on.

You should also note that, when reporting central tendency for skewed distributions, it is a good idea to report the median in addition to the mean. A few very skewed scores (representing only a few students) can dramatically affect the mean, but will have less effect on the median. This is why we rarely read about the average family income (or mean salary) in the United States. Just a few billionaires would make the average "family income" very high, higher than most people actually make. Median income is reported and makes a lot more sense to most people. The same is true of any skewed distribution of scores as well. So reporting the median along with the mean in skewed distributions is a generally good idea.

Kurtosis

Kurtosis characterizes the relative peakedness or flatness of a distribution compared to the normal distribution. Positive kurtosis indicates a relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution. And, once again, that definition doesn't really help us understand the meaning of the numbers resulting from this statistic.

Normal distributions produce a kurtosis statistic (the excess kurtosis, G2, reported by SYSTAT) of about zero (small variations can occur by chance alone). So a kurtosis statistic of 0.9581 could still be an acceptable value for a mesokurtic (that is, normally high) distribution if it falls within about two standard errors of zero. As the kurtosis statistic departs further from zero, a positive value
indicates the possibility of a leptokurtic distribution (that is, too tall) and a negative value indicates the possibility of a platykurtic distribution (that is, too flat, or even concave if the value is large enough). Values of 2 standard errors of kurtosis (sek) or more (regardless of sign) probably differ from mesokurtic to a significant degree.

You may remember that the mean and standard deviation have the same units as the original data, and the variance has the square of those units. However, the kurtosis has no units: it's a pure number, like a t-score.

The reference standard is a normal distribution, which has a kurtosis of 3, or an excess kurtosis of 0; the "kurtosis" (G2) reported by SYSTAT is the excess kurtosis.

• A normal distribution has excess kurtosis 0. Any distribution with excess kurtosis near 0 is called mesokurtic.

• A distribution with excess kurtosis less than 0 is called platykurtic. Compared to a normal distribution, its central peak is lower and broader, and its tails are shorter and thinner.

• A distribution with excess kurtosis greater than 0 is called leptokurtic. Compared to a normal distribution, its central peak is higher and sharper, and its tails are longer and fatter.

The smallest possible kurtosis is 1 (an excess kurtosis of −2) and the largest is ∞. Just as with variance, standard deviation, and skewness, the computation of kurtosis is complete if you have data for the whole population. But if you have data for only a sample, you have to compute the sample kurtosis and the standard error for the sample kurtosis.

Your data set is just one sample drawn from a population. You divide the sample excess kurtosis by the standard error of kurtosis (SEK) to get the test statistic, which tells you how many standard errors the sample excess kurtosis is from zero.

The critical test value is approximately 2. (This is a two-tailed test of excess kurtosis ≠ 0 at approximately the 0.05 significance level.)

• At < −2, the population very likely has negative excess kurtosis (kurtosis < 3, platykurtic), though you don't know how much.

• Between −2 and +2, you can't reach any conclusion about the kurtosis: excess kurtosis might be positive, negative, or zero.

• At > +2, the population very likely has positive excess kurtosis (kurtosis > 3, leptokurtic), though you don't know how much.

For example, let's say you are using SYSTAT and calculate a kurtosis statistic of +1.9142 for a particular study with a standard error of kurtosis (sek) of .8944. Since 1.9142/.8944 = 2.14, more than 2 standard errors of the kurtosis, you can assume that the distribution has a significant kurtosis problem. Since the sign of the kurtosis statistic is positive, you know that the distribution is leptokurtic (too tall). Alternatively, if the kurtosis statistic had been negative, you would have known that the distribution was platykurtic (too flat). Yet another alternative would be that the kurtosis statistic might fall within two standard errors of zero (between −1.7888 and +1.7888), in which case you would have to assume that the kurtosis was within the expected range of chance fluctuations in that statistic.

The existence of flat or peaked distributions as indicated by the kurtosis statistic is important to you as a researcher insofar as it indicates violations of the assumption of normality that underlies many of the other statistics like correlation coefficients and t-tests.

Geometric and Harmonic Means

When the data is multiplicative or the quantities are rates, the more appropriate measure of central tendency is a geometric or harmonic mean, respectively.

The geometric mean (GM) is a suitable measure of central tendency when the quantities involved are multiplicative in nature, such as rate of population growth, interest rate, etc. For example, suppose an investment earns an interest of 5% in the first year, 15% in the second, and 25% in the third. Then the investor may be
interested in the 'average' annual interest percentage. Evidently, we want the answer to be such a number y that, if the annual interest rate of y applies uniformly over the three years, then the final return is the same as that given by the differential interest rates mentioned. That number is the geometric mean of the growth factors 1.05, 1.15, and 1.25, which works out to about 1.1471, or an average annual rate of about 14.71%.

The harmonic mean (HM) is a suitable measure of central tendency when the quantities involved are rates. For example, a person drove a car for 100 miles, maintaining a speed of 50 miles/hr for the first 25 miles, 40 miles/hr for the next 25 miles, 45 miles/hr for the next 25 miles, and 55 miles/hr for the last 25 miles. He has then spent 25(1/50 + 1/40 + 1/45 + 1/55) hours on a 100 mile journey, making the average speed 4/(1/50 + 1/40 + 1/45 + 1/55) = 46.836, which is the harmonic mean of the four speeds.

Test for Normality

There are many ways to assess normality, and unfortunately none of them are without problems. Graphical methods are a good start, such as plotting a histogram and making a quantile plot.

We have reviewed measures of the shape of the distribution:

• Kurtosis: a measure of the "peakedness" or "flatness" of a distribution. A kurtosis value near zero indicates a shape close to normal. A negative value indicates a distribution that is flatter than normal, and a positive kurtosis indicates a shape more peaked than normal. An extreme positive kurtosis indicates a distribution where more of the values are located in the tails of the distribution rather than around the mean. A kurtosis value of +/-1 is considered very good for most statistical uses, but +/-2 is also usually acceptable.

• Skewness: the extent to which a distribution of values deviates from symmetry around the mean. A value of zero means the distribution is symmetric, while a positive skewness indicates a greater number of smaller values, and a negative value indicates a greater number of larger values. Values for acceptability for statistical purposes (+/-1 to +/-2) are the same as with kurtosis.

The skewness and kurtosis statistics, like all the descriptive statistics, are designed to help us think about the distributions of scores that our study creates. Interpreting the results depends heavily on the type and purpose of the data being analyzed. Keep in mind that all statistics must be interpreted in terms of the types and purposes of your study.

A formal way of finding out if the normal distribution describes the data well is to carry out a statistical test of hypothesis. The Shapiro-Wilk test is a standard test for normality used when the sample size is between 3 and 5000. The p-value given by this test is an indication of how good the fit is: the smaller the p-value, the worse the fit. Generally, p-values of the order of 0.05 or 0.01 are considered small enough to declare the fit poor.

The Anderson-Darling test is a standard goodness-of-fit test. It can be used to test whether the given data arise from a normal distribution. It is based on Fn(x), the proportion of sample points less than or equal to x in a sample of size n, and it gives greater importance to the observations in the tails than to those at the center. Note that there are algorithms to determine the Anderson-Darling p-value reasonably precisely in the range 0.01 to 0.15, but beyond 0.15 it is difficult to compute it with sufficient precision.

Multivariate Normality Assessment

Mardia's skewness and kurtosis coefficients, and tests of significance of these coefficients using asymptotic distributions, are useful for multivariate normality assessment. Also, one may use the Henze-Zirkler test statistic and its associated p-value using the lognormal distribution.
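Returning briefly to the geometric and harmonic means discussed above, a minimal sketch in Python (scipy) reproduces both examples; the interest rates are converted to growth factors before averaging:

```python
from scipy import stats

# Geometric mean of the growth factors for 5%, 15%, and 25% annual interest
growth = [1.05, 1.15, 1.25]
gm = stats.gmean(growth)
print(f"Average annual rate: {(gm - 1) * 100:.2f}%")      # about 14.71% per year

# Harmonic mean of the four speeds driven over equal 25-mile segments
speeds = [50, 40, 45, 55]
print(f"Average speed: {stats.hmean(speeds):.2f} miles/hr")  # about 46.84
```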

Non-Normal Shape

Before you compute means and standard deviations on everything in sight, however, note that means and standard deviations are not good descriptors for non-normal data. In these cases, you have two alternatives: either transform your data to look normal, or find other descriptive statistics that characterize the data. You may find that if you log the values of a variable, for example, the histogram looks quite normal.

If a transformation does not work, then you may be looking at data that come from a different mathematical distribution. You should turn to distribution-free summary statistics to characterize your data: the median, range, minimum, maximum, midrange, quartiles, and percentiles.

EXAMPLE: Normal Distribution

SYSTAT provides a number of tests for evaluating whether a sample can be assumed to come from a population distribution of a specified type. Since many parametric statistical models assume that the sample came from a normal population, an example of that type will be illustrated.

A random sample of 37 sales reps was selected and their number of annual sales invoices recorded. The null hypothesis will be simply H0: the population is normally distributed; and H1: the population is not normally distributed. Another way of stating H0 would be to say that the sample came from a population with a normal distribution. In both cases, the null hypothesis has the same general form, “there is no difference between the sample distribution and a normal distribution.” The alternative hypothesis will be that there is a difference.

To test whether the sample could have come from a normal distribution, it is necessary to “match up” the sample distribution with a theoretical normal distribution and compare the two distributions. To “match up” the observed sample distribution and the theoretical normal distribution, first estimate the mean and standard deviation from the sample. Then assume that the theoretical normal curve has the same mean and standard deviation as that estimated from the sample. This information is used to “match up” the two distributions for comparison.

Sample:

Sample Invoices Sales    Sample Invoices Sales
1 160 100     21 155 155
2 220 105     22 203 155
3 128 118     23 150 160
4 160 120     24 155 160
5 135 123     25 195 160
6 156 125     26 140 160
7 160 125     27 172 160
8 130 128     28 160 172
9 150 130     29 183 175
10 120 130    30 155 180
11 150 135    31 175 180
12 160 135    32 100 183
13 105 140    33 155 185
14 125 145    34 125 195
15 180 150    35 145 203
16 235 150    36 180 220
17 135 150    37 123 235
18 130 155
19 185 155
20 118 155

Once we have the data loaded into the SYSTAT program we can run the analysis using the Descriptives routine. Click on Analysis and then Basic Statistics. The following window will open for you to select the type of output you would like to review.

You can add to the default selection by clicking on the median, mode, skewness, SE of skewness, kurtosis, and SE of kurtosis. In addition, you may wish to click on the Normality test tab on the left menu. You can select either or both univariate tests of normality. For this example both tests will be run. Once you have made all of the selections for the analysis, click OK. The results of the analysis are provided in the output window.

Descriptive Output from SYSTAT

INVOICES SALES
N of Cases 37 37
Minimum 100.000 100.000
Maximum 235.000 235.000
Interquartile Range 42.750 42.750
Median 155.000 155.000
Arithmetic Mean 154.405 154.378
Mode 160.000 .
Standard Deviation 29.994 29.993
Skewness (G1) 0.620 0.623
Standard Error of Skewness 0.388 0.388
Kurtosis (G2) 0.527 0.530
Standard Error of Kurtosis 0.759 0.759
Shapiro-Wilk Statistic 0.966 0.966
Shapiro-Wilk p-Value 0.313 0.306
Anderson-Darling Statistic 0.434 0.439
Adjusted Anderson-Darling Statistic 0.443 0.449

p-Value >0.15* >0.15*

Conclusion: Accept H0, the population is a normal distribution; or, the sample could reasonably have come from a normal distribution; or, there is very little difference between the observed sample distribution and the theoretical normal distribution. Therefore, the sample data support the hypothesis that the population is a normal distribution.
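For readers without SYSTAT, a minimal sketch in Python (scipy) runs the same checks on the invoice data above. The Shapiro-Wilk result should come out very close to the values in the output table (W about 0.966, p about 0.31), supporting the same conclusion; the Anderson-Darling statistic is compared against its 5% critical value rather than reported as a p-value:

```python
import numpy as np
from scipy import stats

invoices = np.array([160, 220, 128, 160, 135, 156, 160, 130, 150, 120,
                     150, 160, 105, 125, 180, 235, 135, 130, 185, 118,
                     155, 203, 150, 155, 195, 140, 172, 160, 183, 155,
                     175, 100, 155, 125, 145, 180, 123])

print("n =", invoices.size)
print("mean =", round(invoices.mean(), 3))        # about 154.405
print("sd   =", round(invoices.std(ddof=1), 3))   # about 29.994

w, p_sw = stats.shapiro(invoices)
print(f"Shapiro-Wilk W = {w:.3f}, p = {p_sw:.3f}")

ad = stats.anderson(invoices, dist='norm')
print("Anderson-Darling statistic =", round(ad.statistic, 3))
print("5% critical value =", ad.critical_values[2])   # reject normality only if the statistic exceeds this
```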

Section 7

Summary of Procedures

STATISTICAL PROCEDURES: Summary

1. Descriptive Statistics
2. Correlational Statistics
3. Tests of Normality
4. Cronbach’s Alpha

Means and standard deviations are appropriate for quantitative variables that follow a normal distribution. Often, however, real data do not meet this assumption of normality. A descriptive statistic is called robust if the calculations are insensitive to violations of the assumption of normality. Robust measures include the median, quartiles, frequency counts, and percentages.

If you want to minimize the influence of extreme observations (outliers) on your descriptors, you may need to trim the data, whereby a specified proportion of data on one or both extremes is not considered for computing the descriptors; hence the term Trimmed Mean. The trimmed mean is not as efficient for normally distributed data, but if the distribution is skewed it is less sensitive to sampling fluctuations. Another descriptor you may use to eliminate the effect of extreme observations is the Winsorized Mean, where an extreme observation is replaced by its nearest included observation for computing the mean.
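A minimal sketch in Python (scipy) of these two robust descriptors, applied to the outlier example from the earlier descriptives section; the 10% proportion is illustrative:

```python
from scipy import stats
from scipy.stats import mstats

scores = [2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 20]

# Two-sided trimmed mean: drop 10% of the observations from each end
print("Trimmed mean (10%):", stats.trim_mean(scores, proportiontocut=0.10))

# Winsorized mean: replace the most extreme 10% on each side with the nearest remaining value
winsorized = mstats.winsorize(scores, limits=(0.10, 0.10))
print("Winsorized mean (10%):", round(float(winsorized.mean()), 3))
```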

Before requesting descriptive statistics, first scan graphical displays to see if the shape of the distribution
is symmetric, if there are outliers, and if the sample has subpopulations. If the latter is true, then the
sample is not homogeneous, and the statistics should be calculated for each subgroup separately.

Generally, data are presented in a format with columns representing variables and rows representing
cases (respondents/participants). Almost always, descriptive statistics are needed for the variables and
such statistics are called column statistics. Occasionally, descriptive statistics are needed for cases or
rows. For instance, if your data set consists of scores in a number of similar tests (columns) on a list of
students (cases) and if you wish to find the average score and the variation of each student, you would
want row statistics.

The Descriptive Statistics procedure in SYSTAT provides basic statistics and stem-and-leaf plots for columns as well as rows. The basic statistics are number of observations (N), minimum, maximum, arithmetic mean (AM), geometric mean, harmonic mean, sum, standard deviation, variance, coefficient of variation (CV), range, interquartile range, median, mode, standard error of AM, etc.

Besides the above descriptive statistics, the trimmed mean, Winsorized mean, and their standard error and confidence interval can also be computed for columns and rows. For the trimmed mean, you can specify whether left-sided (lower), right-sided (upper), or two-sided trimming is required and the proportion p of data to be removed. For the Winsorized mean, you can specify a proportion p for two-sided Winsorization.

A confidence interval for the mean (based on the normal distribution, with a default confidence coefficient of 0.95), and skewness and kurtosis measures with their standard errors (SES, SEK), can also be opted for. Along with all the above options, Shapiro-Wilk and Anderson-Darling tests for normality can also be performed. For multivariate data, Mardia's skewness and kurtosis coefficients and asymptotic tests of significance on them, and the Henze-Zirkler test for multinormality, are available. N-tiles and P-tiles are also available with seven different algorithms, and an associated transformation of the data to an N-tile class can be requested.

A stem-and-leaf plot is available for assessing distributional shape and identifying outliers. Moreover, Descriptive Statistics provides stratified analyses - that is, you can request results separately for each level of a grouping variable (such as gender) or for each combination of levels of two or more grouping variables.

Resampling procedures are available with this feature. Under Basic Statistics, if you choose any of the resampling options, then SYSTAT gives a summarization based on resampling. You can opt for the following: mean, median, variance, standard deviation, skewness, and kurtosis. You can get resampling estimates along with their bias and standard error. Under bootstrap, you will also get confidence intervals for the corresponding parameters using two popular methods, the Percentile method and the Bias corrected accelerated method.

Descriptive statistics are numerical summaries of our data. Inevitably, these summaries mask details of the data. Without them, however, we would be lost. There are many ways to describe a set of data. Not all are appropriate for every data set, however.

Descriptive Statistics in Statistical Software

Basic Statistics: The following statistics are available:

All options. Calculate all available statistics except Trimmed and Winsorized Means, Normality tests, and N-tiles and P-tiles.

N. Computes the number of non-missing values for the variable.

Minimum. Computes the smallest non-missing value.

Maximum. Computes the largest non-missing value.

Sum. The total of all non-missing values of a variable.

Arithmetic mean (AM). Computes the arithmetic mean of a variable - the sum of the values divided by the number of (non-missing) values.

SE of AM. The standard error of the mean is the standard deviation divided by the square root of the sample size. It is the estimation error, or the average deviation of sample means from the expected value of a variable.

CI of AM. Endpoints for the confidence interval of the mean. You can specify a confidence level for the confidence interval of the mean. Enter a value between 0 and 1. (0.95 (default) and 0.99 are typical values.) If the value is bigger than 1, it is treated as a percentage.
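As a rough illustration of the bootstrap summarization described above, the sketch below (in Python with NumPy rather than SYSTAT) resamples a small made-up data set with replacement and reports the resampling estimate, its bias and standard error, and a 95% Percentile-method interval. The data values, seed, and number of bootstrap samples are illustrative assumptions only.

    import numpy as np

    rng = np.random.default_rng(seed=1)
    data = np.array([12.1, 9.8, 11.4, 10.2, 13.5, 9.9, 10.8, 12.7, 11.1, 10.4])  # made-up values

    n_samples = 5000                       # number of bootstrap samples (assumed)
    boot_means = np.empty(n_samples)
    for i in range(n_samples):
        resample = rng.choice(data, size=data.size, replace=True)   # sample with replacement
        boot_means[i] = resample.mean()

    estimate = boot_means.mean()                      # resampling estimate of the mean
    bias = estimate - data.mean()                     # bootstrap estimate of bias
    std_error = boot_means.std(ddof=1)                # bootstrap standard error
    lo, hi = np.percentile(boot_means, [2.5, 97.5])   # 95% Percentile-method interval

    print(f"mean={data.mean():.2f} bias={bias:.3f} SE={std_error:.3f} CI=({lo:.2f}, {hi:.2f})")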

Median. The median estimates the center of a distribution. If the data are sorted in increasing order, the median is the value above which half of the values fall.

Mode. Computes the variable value which occurs most frequently.

Geometric mean (GM). Computes the geometric mean for positive values. It is the nth root of the product of all non-missing w-entries.

Harmonic mean (HM). Calculates the harmonic mean for positive values. It is the number of elements to be averaged divided by the sum of the reciprocals of the elements.

SD. Standard deviation, a measure of spread, is the square root of the sum of the squared deviations of the values from the mean divided by (n-1).

CV. The coefficient of variation is the standard deviation divided by the sample mean.

Variance. The mean of the squared deviations of values from the mean. (Variance is the standard deviation squared.)

Range. The difference between the minimum and the maximum values.

Interquartile range. The difference between the 1st and 3rd quartiles. The quartiles (corresponding percentiles) are calculated using the CLEVELAND method.

Skewness. A measure of the symmetry of a distribution about its mean. If the skewness is significantly nonzero, the distribution is asymmetric. A significant positive value indicates a long right tail; a negative value, a long left tail. A skewness coefficient is considered significant if the absolute value of SKEWNESS / SES is greater than 2.

SE of skewness. Computes the standard error of skewness (SQR(6/w)).

Kurtosis. A value of kurtosis significantly greater than 0 indicates that the variable has longer tails than those for a normal distribution; less than 0 indicates that the distribution is flatter than a normal distribution. A kurtosis coefficient is considered significant if the absolute value of KURTOSIS / SEK is greater than 2.

SE of kurtosis. Computes the standard error of kurtosis (SQR(24/w)).

Trimmed mean (TM). Computes the mean after trimming out the extreme observations. For two-sided trimming (default) enter a value between 0 and 0.5, and for lower or upper trimming enter a value between 0 and 1. The default value for all cases is 0.10. Beware that for two-sided trimming, each side is trimmed by the given proportion.

SE of TM. Computes the standard error of the two-sided trimmed mean.

CI of TM. Computes the confidence interval of the two-sided trimmed mean. Enter a value between 0 and 1. (0.95 (default) and 0.99 are typical values.) If the value is bigger than 1, it is treated as a percentage.

Winsorized mean (WM). Computes the mean after replacing a specified proportion of the extreme observations with the nearest observation. Enter a value between 0 and 0.5 for two-sided Winsorizing. The default value is 0.10. Beware that each side is Winsorized by the given proportion.

SE of WM. Computes the standard error of the two-sided Winsorized mean.

CI of WM. Computes the confidence interval for the two-sided Winsorized mean. Enter a value between 0 and 1 (0.95 (default) and 0.99 are typical values.) If the value is bigger than 1, it is treated as a percentage.

CORRELATION MEASURES

Measures for Continuous Data

The following measures are available for continuous data:

Pearson. Produces a matrix of Pearson product-moment correlation coefficients. Pearson correlations vary between -1 and +1. A value of 0 indicates that neither of two variables can be predicted from the other by using a linear equation. A Pearson correlation of +1 or -1 indicates that one variable can be predicted perfectly by a linear function of the other.

Covariance. Produces a covariance matrix.

SSCP. Produces a sum of cross-products matrix. If the Pairwise option is chosen, sums are weighted by N/n, where n is the count for a pair, and N is the number of cases.

The Pearson, Covariance, and SSCP measures are related. The entries in an SSCP matrix are sums of squares of deviations (from the mean) and sums of cross-products of deviations. If you divide each entry by (n-1), variances result from the sums of squares and covariances from the sums of cross-products. Divide each covariance by the product of the standard deviations (of the two variables) and the result is a correlation.

TESTS OF NORMALITY

Univariate tests. The following tests of normality are available:

Shapiro-Wilk. Computes the Shapiro-Wilk test statistic along with its p-value.

Anderson-Darling. Computes the Anderson-Darling test statistic along with its p-value.

Multivariate tests. The following measures and tests of multivariate normality are available:

Mardia skewness. Computes Mardia's skewness coefficient and tests its significance using an asymptotic distribution.

Mardia kurtosis. Computes Mardia's kurtosis coefficient and tests its significance using an asymptotic distribution.

Henze-Zirkler. Computes the Henze-Zirkler test statistic and its associated p-value using a lognormal distribution.

RESAMPLING

Perform resampling. Generates samples of cases and uses data thereof to carry out the same analysis on each sample.

Method. Three sampling methods are available:

Bootstrap. Generates bootstrap samples. This is the default method.

Without replacement. Generates subsamples without replacement.

Jackknife. Generates jackknife samples.
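The relationship among the SSCP, covariance, and Pearson measures described above can be checked directly: dividing the sum of cross-products of deviations by (n-1) gives the covariance, and dividing the covariance by the product of the two standard deviations gives the correlation. A minimal Python sketch with two made-up variables (the values and seed are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(seed=2)
    x = rng.normal(size=50)
    y = 0.6 * x + rng.normal(size=50)            # illustrative variables

    n = x.size
    dx, dy = x - x.mean(), y - y.mean()          # deviations from the means
    sscp_xy = np.sum(dx * dy)                    # sum of cross-products of deviations
    cov_xy = sscp_xy / (n - 1)                   # covariance
    r_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))   # Pearson correlation

    print(np.isclose(cov_xy, np.cov(x, y)[0, 1]))     # agrees with NumPy's covariance
    print(np.isclose(r_xy, np.corrcoef(x, y)[0, 1]))  # agrees with NumPy's correlation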

Number of samples. Specify the number of samples to be generated. These samples are analyzed using the chosen method of sampling. The default is 1.

Sample size. Specify the size of each sample to be generated while resampling. The default sample size is the number of cases in the data file in use.

Random seed. Specify a random seed to be used while resampling. The default random seed is generated by the system.

Confidence. Specify a confidence level for the bootstrap-based confidence interval. Enter any value between 0 and 1. The default value is 0.95.

Estimates. Specify the parameters for which you desire resampling estimates.

CRONBACH'S ALPHA

Cronbach's alpha is a lower bound for test reliability and ranges in value from 0 to 1 (negative values can occur when items are negatively correlated). Alpha can be viewed as the correlation between the items (variables) selected and all other possible tests or scales (with the same number of items) constructed to measure the characteristic of interest. Note that alpha depends on both the number of items and the correlations among them. Even when the average correlation is small, the reliability coefficient can be large if the number of items is large.

The following must be specified to obtain a Cronbach's alpha:

Selected variable(s). To obtain Cronbach's alpha, at least two variables must be selected.

There are many ways to assess normality, and unfortunately none of them are without problems. Graphical methods are a good start, such as plotting a histogram and making a quantile plot.

We have reviewed measures of the shape of the distribution:

• Kurtosis: a measure of the "peakedness" or "flatness" of a distribution. A kurtosis value near zero indicates a shape close to normal. A positive value indicates a distribution which is more peaked (longer-tailed) than normal, and a negative kurtosis indicates a shape flatter than normal.

• Skewness: the extent to which a distribution of values deviates from symmetry around the mean. A value of zero means the distribution is symmetric, while a positive skewness indicates a greater number of smaller values, and a negative value indicates a greater number of larger values.

Descriptive statistics are designed to help us think about the distributions of scores that our study creates. Interpreting the results depends heavily on the type and purpose of the data being analyzed. All statistics must be interpreted in terms of the types and purposes of your study.

Remember, means and standard deviations are not good descriptors for non-normal data. In a case of non-normal data, either transform your data to look normal, or find other descriptive statistics that characterize the data.

If a transformation does not work, then you may be looking at data that come from a different mathematical distribution. You should turn to distribution-free summary statistics (Non-parametric Statistics) to characterize your data: the median, range, minimum, maximum, midrange, quartiles, and percentiles.
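As a minimal sketch of the Cronbach's alpha described earlier in this section, the Python function below assumes the common variance-based formula, alpha = k/(k-1) × (1 − Σ item variances / variance of the total score). The function name and the respondent scores are made up for illustration.

    import numpy as np

    def cronbach_alpha(items):
        """items: rows are respondents (cases), columns are items (variables)."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]                            # number of items
        item_vars = items.var(axis=0, ddof=1)         # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)     # variance of the summed scale score
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    scores = [[4, 5, 4, 4],
              [2, 3, 2, 3],
              [5, 5, 4, 5],
              [3, 3, 3, 2],
              [4, 4, 5, 4],
              [1, 2, 2, 1]]          # 6 respondents answering 4 related items (illustrative)
    print(round(cronbach_alpha(scores), 3))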

Chapter 3

Inferential
Statistics

This section of the text outlines statistical


procedures used to draw inferences about the
population we are studying.
Section 1

Introduction to Inferential Statistics

INFERENTIAL STATISTICS

1. Introduction

2. What is a Population?

3. Sampling

4. Constructing the Model

5. Estimating the Model

6. Confidence Interval

7. Hypothesis Testing

Introduction to Inferential Statistics

The text started with an example illustrating the basic concept of inferential statistics. As discussed, when one is driving past a gas station the prices on the sign are simply data points. Every day we may drive past multiple gas stations. We collect these data points and consciously calculate what we believe is the going price for a gallon of gas. In essence, we have taken data, organized the information and generated a descriptive statistic (the average price for gas). Albeit, this is an average based on our observations, but it is a statistic (derived from data) that we use as representative of the "going" (average) price of gas (for the population of gas stations) to help us make decisions.

You are driving past a gas station where the price of gas is posted as $ .10 below what you believe the average price of gas to be (based on recent observations). Do you stop and get gas at this station? Maybe.

Why maybe? This brings us to that nagging thing called probability. To help understand probability, think of how different your reaction to a price drop would be if prices had been staying the same for the past 6 months versus fluctuating prices. This is where you use your built-in statistical calculator to make a decision. You estimate the probability of the price difference and the chance that all stations will have lowered gas prices versus this one station being outside of the normal range of price variations you have been observing. If you come to the conclusion that the price of gas at this station is below what you can expect from other stations, you are likely to purchase from this location. This is the essence of statistical inference. We take data from a sample to represent the general population. We transform the data from its raw form by sorting, describing and analyzing. This allows us to make inferences about the world at large (the population represented by our sample).

This chapter outlines statistical procedures used to draw inferences about the population we are studying, including:

• Confidence intervals and the normal distribution.

• Hypothesis testing including Type I error and Type II error.

• Testing against a hypothesized value.

Once we have established an understanding of how we state and test hypotheses, we will move into tests where we have two or more means and situations where we want to develop a predictive equation to help in making better decisions. Chapter 4 will focus on:

• Comparing two means.

• Comparing three or more means (ANOVA).

Chapter 5 will expand on the relationship between variables by examining how we develop and test linear equations (regression and multiple regression).

• Linear regression.

• Multivariate regression.

As discussed earlier, we often want to do more than describe a particular sample. In order to generalize, formulate a policy, or test a hypothesis, we need to make an inference. Making an inference implies that we think a model describes a more general population from which our data have been randomly sampled. A population can be "all possible voters," "all possible replications of this experiment," or "all possible moviegoers." When you make inferences, you should have a population in mind.

What is a Population?

We are going to use inferential methods to estimate the mean age of the population contained in a recent edition of Who's Who in America. We could enter all 70,000 plus ages into a file and compute the mean age exactly. This is not practical. A sampling estimate can be more accurate than an entire census. For example, biases are introduced into large censuses from refusals to comply, keypunch or coding errors, and other sources. In these cases, a carefully constructed random sample can yield less-biased information about the population.

This is an unusual population because it is contained in a list and is therefore finite. We are not about to estimate the mean age of the rich and famous. After all, Spy magazine used to have a regular feature listing all of the famous people who are not in Who's Who. And bogus listings may escape the careful fact checking of the Who's Who research staff. When we get our estimate, we might be tempted to generalize beyond the book, but we would be wrong to do so. For example, if a psychologist measures opinions in a random sample from a class of college sophomores, his or her conclusions should begin with the statement, "College sophomores at my university think..." If the word "people" is substituted for "college sophomores," it is the researcher's responsibility to make clear that the sample is representative of the larger group on all attributes that might affect the results.

Picking a Simple Random Sample

That our population is finite should cause us no problems as long as our sample is much smaller than the population. Otherwise, we would have to use special techniques to adjust for the bias it would cause. How do we choose a simple random sample from a population? We use a method that ensures that every possible sample of a given size has an equal chance of being chosen. The following methods are not random:

• Pick the first name on every tenth page (some names have no chance of being chosen).

• Close your eyes, flip the pages of the book, and point to a name (Tversky and others have done research that shows that humans cannot behave randomly).

• Randomly pick the first letter of the last name and randomly choose from the names beginning with that letter (there are more names beginning with C, for example, than with I).

The way to pick randomly from a book, file, or any finite population is to assign a number to each name or case and then pick a sample of numbers randomly.

Construct a Model

To make an inference about age, we need to construct a model for our population:

a = μ + ε

This model says that the age (a) of someone we pick from the book can be described by an overall mean age (μ) plus an amount of error (ε) specific to that person and due to random factors that are too numerous and insignificant to describe systematically. Notice that we use Greek letters to denote things that we cannot observe directly and Roman letters for those that we do observe. Of the unobservables in the model, μ is called a parameter, and ε a random variable. A parameter is a constant that helps to describe a population. Parameters indicate how a model is an instance of a family of models for similar populations. A random variable varies like the tossing of a coin.

There are two more parameters associated with the random variable ε but not appearing in the model equation. One is its mean (με), which we have rigged to be 0, and the other is its standard deviation (σε or simply σ). Because a is simply the sum of μ (a constant) and ε (a random variable), its standard deviation is also σ.

In specifying this model, we assume the following:

• The model is true for every member of the population.

• The error, plus or minus, that helps determine one population member's age is independent of (not predictable from) the error for other members.

• The errors in predicting all of the ages come from the same random distribution with a mean of 0 and a standard deviation of σ.

Estimating the Model

Because we have not sampled the entire population, we cannot compute the parameter values directly from the data. We have only a small sample from a much larger population, so we can estimate the parameter values only by using some statistical method on our sample data. When our three assumptions are appropriate, the sample mean will be a good estimate of the population mean. Without going into all of the details, the sample estimate will be, on average, close to the value of the mean in the population.

We can use various methods to estimate the mean. This chapter outlines statistical procedures used to draw inferences about the population we are studying, including:

• Confidence intervals and the normal distribution.

• Hypothesis testing including Type I error and Type II error.

• Testing against a hypothesized value.

Confidence Interval

Our estimate will not be exactly correct. If we took more samples of the same size and computed estimates, how much would we expect them to vary? First, it should be plain without any mathematics to see that the larger our sample, the

closer will be our sample estimate to the true value of μ in the population. After all, if we could sample the entire population, the estimates would be the true values. Even so, the variation in sample estimates is a function only of the sample size and the variation of the ages in the population. It does not depend on the size of the population. The standard deviation of the sample mean is the standard deviation of the population divided by the square root of the sample size (discussed in the next section). On average, we would expect our sample estimates of the mean age to vary by plus or minus a little more than one standard deviation of the sample mean.

If we knew the shape of the sampling distribution of mean age, we would be able to complete our description of the accuracy of our estimate. There is an approximation that works quite well, however. If the sample size is reasonably large (say, greater than 25), then the mean of a simple random sample is approximately normally distributed. This is true even if the population distribution is not normal, provided the sample size is large.

We now have enough information from our sample to construct a normal approximation of the distribution of our sample mean.

From this normal approximation, we can build a 95% symmetric confidence interval that gives us a specific idea of the variability of our estimate. If we did this entire procedure again—sample names, compute the mean and its standard error, and construct a 95% confidence interval using the normal approximation—then we would expect that 95 intervals out of a hundred so constructed would cover the real population mean age. Remember, the population mean age is not necessarily at the center of the interval that we just constructed, but we do expect the interval to be close to it.

Hypothesis Testing

From the sample mean and its standard error, we can construct hypothesis tests on the mean. Suppose we believed that the average age of those listed in Who's Who is 62 years. After all, we might have picked an unusual sample just through the luck of the draw. Let us say, for argument, that the population mean age is 62 and the standard deviation is 11.5. How likely would it be to find a sample mean age of 56.7? If it is very unlikely, then we would reject this null hypothesis that the population mean is 62. Otherwise, we would fail to reject it.

There are several ways to represent an alternative hypothesis against this null hypothesis. We could make a simple alternative value of 56.7 years. Usually, however, we make the alternative composite—that is, it represents a range of possibilities that do not include the value 62. Here is how it would look:

H0: μ = 62 (null hypothesis)

H1: μ ≠ 62 (alternative hypothesis)

We would reject the null hypothesis if our sample value for the mean were outside of a set of values that a population value of 62 could plausibly generate. In this context, "plausible" means more probable than a conventionally agreed upon critical level for our test. This value is usually 0.05. A result that would be expected to occur fewer than five times in a hundred samples is considered significant and would be a basis for rejecting our null hypothesis.
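A small sketch of the normal-approximation interval and test just described, using the sample mean of 56.7 and standard deviation of 11.5 from the example; the text does not give the sample size, so n = 100 is assumed purely for illustration.

    import math
    from scipy import stats

    xbar, sd, n = 56.7, 11.5, 100          # mean and sd from the text; n = 100 is an assumption
    se = sd / math.sqrt(n)                 # standard error of the mean

    lo, hi = xbar - 1.96 * se, xbar + 1.96 * se      # 95% confidence interval

    z = (xbar - 62) / se                   # test of H0: mu = 62 against H1: mu not equal to 62
    p_value = 2 * stats.norm.sf(abs(z))    # area in both tails beyond |z|

    print(f"95% CI = ({lo:.1f}, {hi:.1f}), z = {z:.2f}, p = {p_value:.4f}")

With these assumed numbers the sample mean lies well outside the range of values a population mean of 62 could plausibly generate, so the null hypothesis would be rejected; with a much smaller sample size the same difference might not be significant.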

Constructing this hypothesis test is mathematically equivalent to sliding the normal distribution to center over 62. We then look at the sample value 56.7 to see if it is outside of the middle 95% of the area under the curve. If so, we reject the null hypothesis.

Statistical Methods

Once we have established an understanding of how we state and test hypotheses, we will move into tests where we have two or more means and situations where we want to develop a predictive equation to help in making better decisions. Chapter 4 will focus on:

• Comparing two means.

• Comparing three or more means (ANOVA).

Chapter 5 will expand on the relationship between variables by examining how we develop and test linear equations (regression and multiple regression).

• Linear regression.

• Multivariate regression.

Structuring of hypotheses for testing is covered in Sections 3 and 4.

Section 2

Confidence Interval Estimates

CONFIDENCE INTERVAL

1. Interval Estimates

2. Population Unknown

3. Using t-Statistics

4. 95% Confidence?

Confidence Interval Estimates

To make an inference about a population we will start by constructing and estimating a model to estimate a confidence interval using the normal distribution, and then show the circumstances that require modifying the procedure and using the (t) distribution. The procedure for making a 95% confidence interval estimate of (µ) is as follows:

1. Assume a normal population with (σ) known.

2. Take a random sample of size n from the population and compute its mean (x̄).

3. Select as the lower (upper) confidence limit a value µ1 (µ2) which, if it were the true population mean, would make the probability of obtaining the given sample mean (x̄) or a larger (smaller) sample mean just equal to 0.025.

Thus, the lower confidence limit µ1 is selected to fall 1.96 standard errors (σx̄ = σ/√n, approximately 2 standard errors) below the sample mean (x̄), and the upper confidence limit (µ2) to fall 1.96 standard errors above the sample mean (x̄). That is: µ1 = x̄ − 1.96σx̄ and µ2 = x̄ + 1.96σx̄.

Note: It is necessary to assume that the population is normally distributed.

And if sampling from a normal population, the sampling distribution of means will be normal regardless of the size of the sample (n).

If the above conditions are met, use the statistic t = (x̄ − µ)/σx̄ to compute the 95% confidence interval because the statistic:

1. Is normally distributed.

2. Has mean = 0.

3. A value of t ± 2 would encompass 95% of the area under a standard normal curve.

The 95% confidence interval with the standard normal statistic (t) is therefore approximately x̄ ± 2σx̄. The exact test and probability is reported by the statistics program.

Population assumed normal (or large) with mean (µ) unknown but standard deviation (σ) known.

σ Unknown, Population Normal or Large

In many cases there is no way of knowing the population standard deviation (σ). In such cases estimate σ from the sample in order to, in turn, estimate σx̄. If the estimated standard deviation of the sampling distribution (ôx̄) is used (also called the estimated standard error of the mean), the (t) statistic can be computed. For small sample sizes (n) the distribution of the (t) statistic departs from the normal distribution. Thus when:

1. ô must be determined from the sample, and

2. the sample size is small,

the (t) statistic should be used to determine the width of the confidence interval (or to test hypotheses).

The values of the (t) statistic depend on the sample size, whereas the values of (Z) are the same for all values of sample size. For sample sizes greater than 121, the value of the (t) statistic is essentially the same as the (Z) statistic. Therefore, the (t) distribution is tabled only for values of (n) from 1 to 121.

Some Typical Values for (Z) and (t)

Confidence     Z         t (n=121)    t (n=30)    t (n=10)    t (n=5)
Coefficient    (All N)   (df=120)     (df=29)     (df=9)      (df=4)
0.90           1.645     1.658        1.699       1.833       2.132
0.95           1.960     1.980        2.045       2.262       2.776
0.99           2.580     2.617        2.756       3.250       4.604

Since the value of Z is the same for all values of (n) there is only one curve for the standard normal statistic. For the (t) distribution, however, we have a family of curves, each slightly different depending on the sample size (n). The (df) under (n) in each column for the (t) distribution stands for "degrees of freedom". Degrees of freedom equals (n-1) for the case where only one population parameter is estimated from sample data.

Later, when more than one population parameter is estimated from the sample data, the degrees of freedom will be [(n) – (number of parameters estimated)]. For example, in testing the difference between two means, the degrees of freedom are (n1 + n2 – 2).

Note in the above table that the values for (t) when n = 30 are somewhat, but not greatly, different from the (Z) values; this is the basis for substituting (Z) for (t) when n ≥ 30. It is always more accurate, however, to use the (t) values for n < 121 and when ô has been estimated from the sample.
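The Z and t values in the table above are two-sided critical values and can be reproduced with any statistics package; a quick check in Python with SciPy is sketched below.

    from scipy import stats

    for conf in (0.90, 0.95, 0.99):
        upper_tail = (1 + conf) / 2                    # e.g., 0.975 for 95% confidence
        z = stats.norm.ppf(upper_tail)                 # standard normal critical value
        t_vals = [stats.t.ppf(upper_tail, df) for df in (120, 29, 9, 4)]
        print(conf, round(z, 3), [round(t, 3) for t in t_vals])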

Summary of When to Use the (t) Distribution

• Whenever (ô) must be estimated from the sample.

• Sample size (n) is small.

Do not confuse these criteria with the incorrect assumption that the (t) distribution is used whenever (n) is small.

95% Confidence Interval with (t) Statistic and n = 10

Population assumed normal (or large) with mean and standard deviation σ both unknown.

Example

Take a random sample of n = 10 invoices from a monthly total of 165 invoices and make a 95% confidence interval estimate of the average number of service hours invoiced.

x̄ = 706/10 = 70.6

ô = 2.37

ôx̄ = ô/√n = 2.37/√10 = 0.75

Is it appropriate to use the t distribution?

1. ô is estimated from the sample.

2. n is small (even less than 30).

3. We assume the population of service hours is normally distributed.

Because of (1) and (2), use the (t) distribution. The (t) value for n = 10, df = 9, 95% confidence, is 2.262. Thus the confidence interval would be:

Lower limit: µ1 = 70.6 – (2.262)(.75) = 70.6 – 1.70 = 68.9 hours

Upper limit: µ2 = 70.6 + (2.262)(.75) = 70.6 + 1.70 = 72.3 hours

Thus, the 95% confidence interval estimate of the population mean service hours is 68.9 hours to 72.3 hours.
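A minimal sketch of the same interval computed in Python; the sample size, mean, and estimated standard deviation are taken from the example above.

    from scipy import stats

    n, xbar, s = 10, 70.6, 2.37
    se = s / n ** 0.5                               # estimated standard error, about 0.75

    t_crit = stats.t.ppf(0.975, df=n - 1)           # 2.262 for 9 degrees of freedom
    lower, upper = xbar - t_crit * se, xbar + t_crit * se
    print(f"t = {t_crit:.3f}, 95% CI = ({lower:.1f}, {upper:.1f}) hours")   # about 68.9 to 72.3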

What Does the “95% Confidence” Mean?

If all possible samples of size 10 were taken from this population of 165 invoices and an interval was constructed from each sample (as done above), then 95% of the intervals constructed would contain the true population mean service hours and 5% of the intervals would NOT include the true population mean service hours.

Comparison of (Z) and (t) 95% Confidence Intervals

                          µ1       x̄       µ2       Width
Normal (Z) (incorrect)    69.13    70.6    72.07    2.94 hours
t (correct)               68.90    70.6    72.30    3.40 hours

The correct use of the (t) statistic produces a wider confidence interval (3.40
hours) than the interval constructed with the normal (Z) statistic (2.94 hours). The
relatively wider interval (and thus more conservative estimate) associated with the
(t) distribution reflects the fact that less information is assumed available to make
the estimate (ô must be estimated from the sample) with the (t) distribution than
with the normal (Z) distribution. As discussed above, it is (theoretically) assumed
with the use of the normal distribution that (σ) is either known or that the sample
used to estimate (σ) is large enough (greater than 121, or at least 30 as an approximation), and that no adjustment need be
made for the paucity of information used in estimating (σ). Of course, it is also
assumed that the population is normal, or large, as previously explained.

Section 3

Hypothesis Testing

HYPOTHESIS TESTING

1. Key Points from Sampling

2. Concept of Hypothesis Testing

3. Null and Alternate Hypothesis

4. Type I and Type II Error

Hypothesis Testing

Hypotheses come in many shapes and forms. Some are fairly general statements or assumptions about phenomena, such as the hypothesis: Dilbert's Service Department is going to have a rough time completing work promised today. That hypothesis can be tested by spending the day in the service shop, or simply by waiting until the next morning and reading the computer printout of yesterday's activities.

Key Points from Sampling

Statistics such as x̄ are random variables since their value varies from sample to sample. As such, they have probability distributions associated with them. The sampling distribution of a statistic is a probability distribution for all possible values of the statistic computed from a sample of size n.

The sampling distribution of the sample mean x̄ is the probability distribution of all possible values of the random variable x̄ computed from a sample of size n from a population with mean μ and standard deviation σ.

The mean of the sampling distribution is equal to the mean of the parent population, and the standard deviation of the sampling distribution of the sample mean is σx̄ = σ/√n regardless of the sample size.

The Central Limit Theorem: the shape of the distribution of the sample mean becomes approximately normal as the sample size n increases, regardless of the shape of the population.

A hypothesis is a statement regarding a characteristic of one or more populations. We test these types of statements using sample data because it is usually impossible or impractical to gain access to the entire population. If population data are available, there is no need for inferential statistics.

Statistical Hypothesis – A Narrow Concept

Hypothesis testing is a procedure, based on sample evidence and probability, used to test statements regarding a characteristic of one or more populations.

A statistical hypothesis is an assumption or statement made about a population (mean, standard deviation, shape as normal or not) that is tested using information contained in samples together with probability ideas. We either accept or reject it on the basis of a pre-chosen probability level. Note: You never prove or disprove a statistical hypothesis, only accept or reject it.

The pre-chosen probability level is called the level of significance, or the alpha (α) error, or Type I error. It is the pre-chosen probability of rejecting the hypothesis when it is in fact true. The most commonly used error level is α = 0.05; however, α = 0.10 is also commonly used in business, and occasionally α = 0.01.

The Null and Alternate Hypothesis

Statistical tests always involve two hypotheses:

1. The null hypothesis H0.

2. The alternative hypothesis H1.

The null hypothesis is that there is "no difference" between two things. The "no difference" really means no statistically significant difference between the observed sample statistic, for example a sample mean x̄, and the assumed value of the population mean, µ. Thus the statistical notation H0: µ = 70 inches means that there is no statistically significant difference between a sample mean (x̄ = some value) and 70 inches.

Stated another way, it means that any actual difference between a sample value (for example, x̄ = 72 inches) and the assumed value µ = 70 inches can be explained or accounted for by chance variation alone, and does not require some other (outside) influence to explain the difference.

More technically, the null or "no statistically significant difference" hypothesis means that the value of the sample mean actually observed (x̄ = 72) could reasonably be a member of a sampling distribution whose mean is equal to the assumed population mean µ = 70 and whose standard deviation is σx̄ = σ/√n.

The alternative hypothesis, H1, can be one of three variations:

1. H1: µ ≠ 70 (two tail test)

2. H1: µ > 70 (one tail test)

3. H1: µ < 70 (one tail test)

The alternative hypotheses of the second and third types are "directional" hypotheses, and are used only when the expected direction of the deviation from the null hypothesis is reasonably founded.

Whenever the directional hypotheses are used, the null hypothesis can be interpreted as ≤ or ≥, depending on the direction of the alternative hypothesis. For example:

H0: µ = 70
H1: µ > 70

actually becomes:

H0: µ ≤ 70
H1: µ > 70

and:

H0: µ = 70
H1: µ < 70

actually becomes:

H0: µ ≥ 70
H1: µ < 70

Whenever the null hypothesis is tested, the alternative hypothesis must be stated because it determines whether a one tail or two tail test is to be made.

Whenever the null hypothesis is accepted, the alternative hypothesis is rejected. Likewise, whenever the null hypothesis is rejected, the alternative hypothesis must be accepted.

Another way of viewing statistical tests of hypotheses is to ask, "Which of the hypotheses, H0 or H1, is most consistent with the sample data (e.g., mean x̄ = 72)?"

If the difference between the assumed value under H0, µ = 70, and the observed value, x̄ = 72, is "large", it is probably more reasonable to conclude that the sample did not come from a population whose mean is 70; rather, it came from one whose mean is greater or less than 70, as the case may be under H1: µ > 70 or µ < 70.

Just how far the sample value x̄ = 72 can be from H0: µ = 70 before one concludes that it supports H1 rather than H0 depends on the level of significance, or α error, and the standard deviation of the sampling distribution (which can be reduced by increasing the sample size n).

Three Computational Procedures

There are three computational procedures which can be used in testing hypotheses. All give the same results, but the first method described below is less likely to lead to error in application situations.

1. Calculate an acceptance region in terms of the data in the problem statement for the chosen α level. If α = 0.05, this determines a 95% acceptance region and a 5% rejection region. The boundaries of the acceptance region are called "critical values" and are labeled CV1 and CV2 for the lower and upper boundaries in a two tail test. If the sample statistic, x̄, falls in the acceptance region, accept the null hypothesis; if it falls outside, reject it.

2. Calculate an acceptance region in terms of the test statistic (such as t) for the chosen α level, such as α = 0.05. Again, this determines a 95% acceptance region and a 5% rejection region stated in terms of t values. The boundaries are called critical values of t and are labeled tcrit1 and tcrit2 for a two tail test. The value of the sample statistic, x̄, is converted into a t statistic value. This value is called tobserved (or tobs). If the value of tobs is within the acceptance limits established by tcrit1 and tcrit2, the null hypothesis is accepted, and if outside the limits, the null hypothesis is rejected.

3. Calculate the probability that a certain sample statistic will differ from the chosen population parameter, µ, by more than a specified amount. If the probability is smaller than a minimum level, such as α = 0.05, reject the null hypothesis; if not, accept it.

Example of Three Computational Procedures for Testing Hypotheses

Data from service hour sample of 165 invoices:

N = 165     ô = 2.37
n = 10      ôx̄ = 0.75
x̄ = 70.6

To test the null hypothesis that the mean hours of the population of service invoices for transmission overhauls in the building construction market is µ = 72 hours:

H0: µ = 72 (two tail test, α = 0.05)

H1: µ ≠ 72

1. Computational procedure in terms of data in the problem:

• Compute CV1 and CV2:

CV1 = 72 − t(ôx̄) = 72 − 2.262(.75) = 70.3
CV2 = 72 + t(ôx̄) = 72 + 2.262(.75) = 73.7

• Check to see whether x̄ = 70.6 falls in the acceptance or rejection region for H0.

• Conclusion: x̄ = 70.6 falls in the acceptance region, therefore accept H0 and reject H1.

2. Computational procedure in terms of the test statistic (in this case, t):

• Determine tcrit1 and tcrit2: t 9df, α = 0.05 = 2.262.

• Compute tobserved for the x̄ value: tobs = −1.867 (|tobs| = 1.867).

• Check to see whether tobs falls within the acceptance region established by tcrit1 and tcrit2.

• Conclusion: tobs is within the acceptance region, therefore accept H0.

3. Computational procedure in terms of probability levels:

From the original hypothesis and confidence level of .95 we get alpha = 0.05 for a two tail test. In other words, we have a probability of .025 in each tail of the reject-H0 region.

• Compute the probability of obtaining a value of x̄ = 70.6, or one even further from µ, from a sampling distribution when µ = 72 and ôx̄ = 0.75.

• Compute the value of t:

t = (x̄ − µ)/ôx̄ = (70.6 − 72)/0.75 = −1.867

• Interpret the probability associated with t = 1.867, two tail, 9 degrees of freedom: area = .096.

• To get the area in the left tail to compare with the critical level of α/2 = 0.025: area = .096/2 = 0.048.

• Compare the critical level of α/2 = 0.025 with the observed level of .048.

• Conclusion: The probability that x̄ will differ from µ by this much is observed to be larger (0.096, two tail) than the minimum (critical) level (α = 0.05, two tail); thus, the null hypothesis is accepted. Note: This means that it is not "unreasonable" to obtain a sample value such as x̄ = 70.6 from a population whose µ = 72 and ôx̄ = 0.75. Thus, the difference (70.6 vs. 72) is not significant. The observed difference (70.6 vs. 72) can be explained by sampling variation alone, and does not need any (outside) influence to explain the difference.
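The probability used in this third procedure can be obtained from the t-distribution directly rather than from a table; a short Python sketch using the numbers from the example above:

    from scipy import stats

    xbar, mu0, se, df = 70.6, 72.0, 0.75, 9        # values from the example above

    t_obs = (xbar - mu0) / se                      # about -1.867
    p_two_tail = 2 * stats.t.sf(abs(t_obs), df)    # two-tailed probability of a t at least this extreme
    p_one_tail = p_two_tail / 2                    # area in one tail

    print(f"t = {t_obs:.3f}, two-tail p = {p_two_tail:.3f}, one-tail p = {p_one_tail:.3f}")
    # The two-tailed probability exceeds alpha = 0.05, so H0: mu = 72 is not rejected.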
Assumptions Underlying Hypothesis Tests Regarding One Mean

The preceding examples of different computational procedures for testing hypotheses concerning one mean have three basic underlying assumptions:

1. The sample chosen must be a random sample.

2. The level of measure achieved must be at least interval level.

3. The samples must have been drawn from normal populations or from large populations (so that the Central Limit Theorem holds), as discussed previously.

Summary

The null hypothesis, denoted H0, is a statement to be tested. The null hypothesis is a statement of no change, no effect or no difference. The null hypothesis is assumed true until evidence indicates otherwise. In this chapter, it will be a statement regarding the value of a population parameter.

The alternative hypothesis, denoted H1, is a statement that we are trying to find evidence to support. In this chapter, it too is a statement regarding the value of a population parameter.

There are three ways to set up the null and alternative hypotheses:

Equal versus not equal hypothesis (two-tailed test)

• H0: parameter = some value

• H1: parameter ≠ some value
Equal versus less than (left-tailed test)

• H0: parameter = some value

• H1: parameter < some value

Equal versus greater than (right-tailed test)

• H0: parameter = some value

• H1: parameter > some value

The null hypothesis is a statement of “status quo” or “no difference” and always
contains a statement of equality. The null hypothesis is assumed to be true until
we have evidence to the contrary. The claim that we are trying to gather evidence
for determines the alternative hypothesis.

Type I versus Type II Errors

Suppose we reject the null hypothesis when the null hypothesis is in fact true. This decision would be incorrect. This type of error is called a Type I error.

The probability of making a Type I error, α, is chosen by the researcher before the sample data are collected. The level of significance, α, is the probability of making a Type I error.

Suppose we do not reject the null hypothesis when the alternative hypothesis is true. This decision would also be incorrect. This type of error is called a Type II error.

As the probability of a Type I error increases, the probability of a Type II error decreases, and vice-versa.

Section 4

Hypothesis Test

HYPOTHESIS TEST

1. Test of Hypothesis

2. Two-tailed Test

3. One-tailed Test

4. Testing in Scientific Research

5. Testing in Quality Control

6. Testing Using Student's t-Distribution

Hypothesis Test

The sampling distribution (of means) is used in a slightly different manner in a confidence interval estimate as compared to a test of hypotheses. An example is shown using the following data. There are two equivalent methods of calculating ô from a sample:

Method 1: ô = √[ Σ(x − x̄)² / (n − 1) ]    or    Method 2: ô = √[ (Σx² − (Σx)²/n) / (n − 1) ]

Example:

A local dealership is interested in the average number of service hours invested in a specific repair. They have drawn the following sample.

n=10     x      (x − x̄)    (x − x̄)²     x²
 1       73      +0.9        0.81      5329
 2       73      +0.9        0.81      5329
 3       70      −2.1        4.41      4900
 4       69      −3.1        9.61      4761
 5       71      −1.1        1.21      5041
 6       70      −2.1        4.41      4900
 7       78      +5.9       34.81      6084
 8       70      −2.1        4.41      4900
 9       71      −1.1        1.21      5041
10       76      +3.9       15.21      5776

The following illustrates the general hypothesis.

Test of Hypothesis: Service Hours

H0: µ = 70    α = 0.05
H1: µ ≠ 70

Our hypothesis states that we expect the population mean to be equal to 70 service hours. We will reject the hypothesis if we find a value that is more than approximately 2 standard deviations of the sampling distribution above or below 70 service hours. The use of approximately 2 standard deviations comes from the alpha of .05 and the normal curve.

Using SYSTAT we enter the data from our sample and generate the following statistics by using the Analysis and Descriptives (Basic) options.

The standard deviation for the sampling distribution is then the standard deviation divided by the square root of the number of cases. For the sampling distribution of the means we set the mean at the hypothesized 70 service hours and calculate the standard deviation as ôx̄ = ô/√n = 0.92.

Based on our analysis we will accept H0 of service hours equal to 70 if the value from our sample falls within ±2.262(0.92) ≈ ±2.08 of 70 (2.262 is the t value for 9 degrees of freedom). We calculate the acceptance range as being 67.92 to 72.08. If the sample mean is less than 67.92 we will reject H0. If the sample mean is greater than 72.08 we will reject H0.

x̄ = 72.1, Reject H0

The results of our study lead us to reject our hypothesis of service hours equal to 70.

Extending the example we can use SYSTAT to test our general hypothesis of service hours equal to 70. Once we have entered the data into SYSTAT we can request Analysis, Hypothesis Testing, Mean, and One-Sample t-test.

When we click on One-Sample t-Test the following window opens. We click on the variable named Service_Hrs and click the Add button. For the Mean we enter the hypothesized value for service hours of 70. The default test is for a hypothesis of equality and the alternate of not equal. The default confidence level is 0.95 (alpha = 0.05). Click OK.

The first window of output you will see is the graph for the t-test showing the distribution of values and the box plot from the sample.

The results of the t-test reveal that the sample mean of 72.1 service hours is more than 2 standard errors above the hypothesized mean of 70 (t = 2.272), and the probability of this occurring by chance is .049, or less than 5% (p-value). With our alpha of 0.05 we reject the null hypothesis at a 95% confidence level.
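The same one-sample test can be reproduced outside SYSTAT; a minimal Python sketch using the ten observations from the table earlier in this section:

    from scipy import stats

    service_hrs = [73, 73, 70, 69, 71, 70, 78, 70, 71, 76]   # sample from the earlier table

    result = stats.ttest_1samp(service_hrs, popmean=70)       # two-sided test of H0: mu = 70
    print(f"t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
    # t is about 2.27 with p about 0.049, matching the SYSTAT output, so H0 is rejected.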

Guidelines for Hypothesis Construction

In single-tailed hypothesis testing problems, it is sometimes difficult to determine which hypothesis should be the null and which should be the alternative. There are two general areas of problem solving in which we will employ hypothesis testing techniques. The first is the area of scientific research and the second is the area of quality control.

Hypothesis Testing in Scientific Research

In scientific problem solving we are trying to determine if a research hypothesis is supported by the available evidence. A research hypothesis is a statement or prediction based upon the theory that is under investigation.

Examples of research hypotheses:

• The quantity demanded of good (Y) will increase as its price is decreased.

• The profits of monopolistic firms will exceed those of purely competitive firms.

• Attitude toward a brand and purchase of that brand are positively related.

We test our research hypothesis by converting it into a statistical hypothesis. A statistical hypothesis is an assumption about a population or a population parameter that may be accepted or rejected using the techniques of hypothesis testing.

Examples of statistical hypotheses:

1. H0: µ1 = µ2    H1: µ1 > µ2    (one tail)

2. H0: µ = 36     H1: µ ≠ 36     (two tail)

In scientific research the alternative hypothesis (H1) is the operational statement of the research hypothesis. This means that the null hypothesis (H0) is a "straw man", that is, it is formulated for the express purpose of being rejected. In scientific research the hypothesis is not accepted unless the overwhelming weight of the evidence is in its favor.

Example: Assume a theory indicates that the true value of a population mean in the current time period should exceed its value in the previous time period (the research hypothesis). If the mean equaled 20 in the previous period, the hypotheses would be:

H0: µ = 20 (straw man)

H1: µ > 20 (operational statement of the research hypothesis)

If, in the above example, we test these hypotheses at the alpha = 0.05 level, there is a 0.05 probability of rejecting H0, and thereby accepting H1, when H0 is true, i.e., when our research hypothesis is incorrect. By setting up our hypotheses in this manner, we make it difficult to accept our research hypothesis unless the weight of the evidence is strongly in its favor.

As illustrated earlier, we are able to draw these conclusions based on the probability of an observation occurring by pure chance. As an observation moves further away from our hypothesized value we know that the probability of this occurring by chance reduces.

In this example we are interested in testing in one direction. We are looking to find if our sample results are significantly greater than the hypothesized population value of 20.

If our hypothesis had been stated as less than, such as:

H0: µ = 50 (straw man)

H1: µ < 50 (operational statement of the research hypothesis)

then our test would be in the opposite direction. We are looking to find if our sample results are significantly less than the hypothesized population value of 50. Remember that as the researcher (decision maker) you set the confidence level (CL) required (the alpha value for the statistical test is simply 1-CL).

Using SYSTAT we can test our hypothesis by selecting greater than or less than in the alternate hypothesis portion of the Analysis window for the t-test.

Hypothesis Testing In Quality Control

In acceptance sampling a sample of items is taken from a production process in order to determine if that process is in control, i.e., whether items are produced to specifications. If the sample indicates that the process is not "in control", stop production and find out what is wrong.

1. Since the decision is based on the test of a statistical hypothesis, there is always the risk of making an error.

2. The risk (or probability) of shutting down production to look for a problem that does not exist is the alpha or Type I error.

3. The alternative error, producing products which do not meet specifications, is the β or Type II error.

4. By convention, quality control engineers have considered the alpha risk to be the more serious (or costly) of the two (note that this is a production orientation).

In order for the risk of shutting down production when no problem exists to be given by alpha, the null hypothesis must be that the process is in control.

Example: Assume that Rastafar Equipment requires that not more than 5% of weld/bore service jobs are redone. The acceptance sampling hypotheses would be:

H0: µ ≤ 5% (production in control)

H1: µ > 5% (production not in control)

Thus, at the alpha level of 0.01, there is a very small chance of shutting down the production line to correct a problem that does not exist.

As we move further away from the mean score we find the probability of this result gets increasingly smaller. By design we do not want to shut down the production line unless the probability of the results from our sample is so rare that it is highly likely that the number of reworked jobs is greater than 5%.

Student's t-distribution

To test hypotheses regarding the population mean assuming the population standard deviation is unknown, we use the t-distribution. When we replace σ with s, the statistic

t = (x̄ − µ) / (s/√n)

follows Student's t-distribution with n-1 degrees of freedom.

Characteristics of Student's t-distribution

• The t-distribution is different for different degrees of freedom.

• The t-distribution is centered at 0 and is symmetric about 0.

• The area under the curve is 1. Because of the symmetry, the area under the curve to the right of 0 and the area to the left of 0 each equal 1/2.

• As t increases (or decreases) without bound, the graph approaches, but never equals, 0.

• The area in the tails of the t-distribution is a little greater than the area in the tails of the standard normal distribution because using s as an estimate of σ introduces more variability to the t-statistic.

• As the sample size n increases, the density curve of t gets closer to the standard normal density curve. This result occurs because as the sample size increases, the values of s get closer to the values of σ by the Law of Large Numbers.

Testing Hypotheses with the t-distribution

To test hypotheses regarding the population mean with σ unknown, we use the following steps, provided that:

• The sample is obtained using simple random sampling.

• The sample has no outliers, and the population from which the sample is drawn is normally distributed or the sample size is large (n ≥ 30).

Step 1: Determine the null and alternative hypotheses.

Step 2: Select a level of significance, α, based on the seriousness of making a Type I error.

Step 3: Compute the test statistic, t = (x̄ − µ)/(s/√n), where µ is the hypothesized mean. SYSTAT provides the test statistic and probability as part of the Analysis routine.

Step 4: Determine the probability (p-value) using n-1 degrees of freedom. (Provided in SYSTAT output.)

Step 5: Compare the observed p-value to the critical alpha value. If the p-value < α, reject the null hypothesis.

Step 6: State your conclusion.

Remember, we never "accept" the null hypothesis because without having access to the entire population, we don't know the exact value of the parameter stated in the null. Rather, we say that we do not reject the null hypothesis.

Hypothesis Testing or Confidence Intervals

Confidence intervals and hypothesis testing give the same results, so which method is more useful? The answer is that it depends on the context. Scientific journals usually follow a hypothesis testing model because their null hypothesis value for an experiment is usually 0 and the scientist is attempting to reject the hypothesis that nothing happened in the experiment. Those involved in making decisions—epidemiologists, business people, engineers—are often more interested in confidence intervals. They focus on the size and credibility of an effect and care less whether it can be distinguished from 0.
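To illustrate the point that the two approaches agree, the sketch below computes both the two-sided p-value and the 95% confidence interval for the service-hours sample used earlier; the interval excludes the hypothesized value of 70 exactly when the p-value falls below 0.05.

    import numpy as np
    from scipy import stats

    data = np.array([73, 73, 70, 69, 71, 70, 78, 70, 71, 76])   # service-hours sample used earlier
    n, xbar, s = data.size, data.mean(), data.std(ddof=1)
    se = s / np.sqrt(n)

    t_obs = (xbar - 70) / se
    p_value = 2 * stats.t.sf(abs(t_obs), df=n - 1)               # two-sided p-value

    t_crit = stats.t.ppf(0.975, df=n - 1)
    ci = (xbar - t_crit * se, xbar + t_crit * se)                # 95% confidence interval

    print(f"p = {p_value:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
    # p is just under 0.05 and, consistently, the interval (about 70.0 to 74.2) just excludes 70.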

Chapter 4

Testing Two
or More
Means
We often want to compare the means from two or more groups. In this section we will review the use of the t-test for two means and ANOVA for comparing two or more means.
Section 1

Two Independent Samples

INFERENCES TWO SAMPLES

1. Two Independent Samples

2. Testing Hypothesis

3. SYSTAT Two Samples Analysis

Inferences on Two Independent Samples

A sampling method is independent when the individuals selected for one sample do not dictate which individuals are to be in a second sample.

Suppose that a simple random sample of size n1 is taken from a population with unknown mean μ1 and unknown standard deviation σ1. In addition, a simple random sample of size n2 is taken from a population with unknown mean μ2 and unknown standard deviation σ2. If the two populations are normally distributed or the sample sizes are sufficiently large (n1 ≥ 30, n2 ≥ 30), then

t = [(x̄1 − x̄2) − (μ1 − μ2)] / √(s1²/n1 + s2²/n2)

approximately follows Student's t-distribution with the smaller of n1-1 or n2-1 degrees of freedom, where x̄i is the sample mean and si is the sample standard deviation from population i.

Testing Hypotheses Regarding Two Independent Samples

To test hypotheses regarding two population means, μ1 and μ2, with unknown population standard deviations, we can use the following steps, provided that:

• the samples are obtained using simple random sampling;

• the samples are independent;

• the populations from which the samples are drawn are normally distributed or the sample sizes are large (n1 ≥ 30, n2 ≥ 30).

Hypothesis Test of Difference Between Two Means

Many instances occur in business where it is desirable to test whether the difference between two sample outcomes is statistically significant, or is just a chance occurrence due to sampling error. For example, a maintenance supervisor buys light bulbs from two suppliers and needs to determine whether the burning life of the two are equivalent; an owner of a fleet of trucks buys two brands of tires and needs to know if they provide equal mileage or equal tread wear; a marketing manager experimenting with two different point of sale displays needs to know if the difference in sales produced by each is statistically significant; an accountant auditing a firm's accounts receivable samples the accounts in April and again in June to determine if the average age of account is the same in June as in April or has changed; the personnel manager of a large dealership measures the IQ among a sample of service workers and among similar service workers at another dealership to test the hypothesis that the greater productivity at one plant is due to the higher average IQ of the service employees there.

All of these examples are possible applications of the parametric test of difference between two means using the Z or t distribution, which is the best test to use when its assumptions can be met. The five assumptions of this test are among the most restrictive of all statistical models, yet they can often be fulfilled. The test requires (1) that interval level of measure be attained (2) with two independent (3) random samples from (4) normal populations that have (5) equal variances.

The basic statistical concept underlying the test is to sample from two normal populations, which in theory are actually assumed to be one, take two independent random samples and compute their means (mean 1 and mean 2). Then, if all possible such samples of sizes n1 and n2 are taken from the population(s) and the differences between means for such samples are determined, the distribution of all these differences between means forms a sampling distribution which itself has a mean µ1 − µ2 = 0 and is normally distributed with standard deviation:

σ(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2)

Not surprisingly, this sampling distribution is called the "Sampling Distribution of Differences Between Means". This sampling distribution is used to test hypotheses of differences between two means.

Hypothesis test of difference between two means
Population (1) and Population (2)
H0: µ1 - µ2 = 0   Thus, µ1 = µ2
H1: µ1 - µ2 ≠ 0 (two tail)
or H1: µ1 - µ2 > 0; or µ1 - µ2 < 0 (one tail)

Two Tailed        Left Tail        Right Tail
H0: µd = 0        H0: µd = 0       H0: µd = 0
H1: µd ≠ 0        H1: µd < 0       H1: µd > 0

Sampling Distribution of Mean Differences:

Made up of the differences between means from all possible pairs of independent samples of size n1 and n2 which can be taken from populations (1) and (2).

Std. Dev.: σ(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2), estimated from the samples as ô(x̄1 − x̄2) = √(ô1²/n1 + ô2²/n2).

Two Independent Samples

Sample (1): Mean = x̄1, Std. Dev. = ô1 (est. of pop.)
Sample (2): Mean = x̄2, Std. Dev. = ô2 (est. of pop.)

Difference between sample means = x̄1 − x̄2.

The standard deviation of the sampling distribution of differences between means is derived from a combination of the standard deviations of the two samples. Some authors use the term "ô-pooled" to denote this combination, where:

ô-pooled = √[ ((n1 − 1)ô1² + (n2 − 1)ô2²) / (n1 + n2 − 2) ]

Then ô(x̄1 − x̄2) = ô-pooled × √(1/n1 + 1/n2), which is the same as √(ô1²/n1 + ô2²/n2) when the sample sizes are equal.

If the samples did not come from normal populations with equal variance, the computation of ô-pooled will produce erroneous conclusions for the test. The equal variance assumption can be verified with the F test presented in the next unit. Note the similarity between the above formula for ô(x̄1 − x̄2) and the formula used in testing one mean:

ôx̄ = ô/√n (one mean)    or    ô(x̄1 − x̄2) = √(ô1²/n1 + ô2²/n2) (two means)

In computing ô(x̄1 − x̄2) from the sample data, it is usually more efficient to use the formula ô(x̄1 − x̄2) = √(ô1²/n1 + ô2²/n2) rather than compute ô-pooled first. Either method will give the same answer if the n's are equal.

Example: Hypotheses Test, Difference Between Two Means

Rastafar Equipment can purchase caliper brake pads from two manufacturers. The pads have the same shape and surface configuration but are made of different rubber compounds. Shipments of 1,000 pads have been purchased from each vendor, and a random sample of 15 pads is chosen for an engineering test. A special test fixture determines the number of hours the brake pad can be pressed against a backhoe loader wheel with a given force before wearing away
117
1/8 inch of material. Assume the following are typical results:

Brake Pad A: n = 15, X̄1 = 573 hours, σ̂1 = 40 hours
Brake Pad B: n = 15, X̄2 = 620 hours, σ̂2 = 60 hours

Step 1: State H0 and H1: H0: μ1 = μ2; H1: μ1 ≠ μ2; α = 0.05. Since no evidence is available as to which brake pad should last longer, use a two-tail test.

Step 2: Calculate X̄1 and X̄2 from the sample data; in this case, given as X̄1 = 573, X̄2 = 620.

Step 3: Calculate σ̂1 and σ̂2 from the data; in this case, given as σ̂1 = 40, σ̂2 = 60.

Step 4: Use the F test to determine whether the samples came from populations with equal variance.

Step 5: Calculate the standard error of the difference:

σ̂(X̄1 - X̄2) = sqrt{ σ̂1²/n1 + σ̂2²/n2 } = sqrt{ 40²/15 + 60²/15 } = 18.6 hours

Step 6: Establish the critical values, CV1 and CV2, for the acceptance and rejection regions for H0. Since H0: μ1 = μ2 is the same as μ1 - μ2 = 0, the sampling distribution of differences between means X̄1 - X̄2 centers on zero. In this case, since n1 + n2 - 2 = 15 + 15 - 2 = 28 is less than 30, and σ̂1 and σ̂2 are estimated from the samples, it is appropriate to use Students' t distribution instead of the normal Z distribution. The degrees of freedom for two means tests are n1 + n2 - 2 = 28 degrees of freedom in this case: t(28 df, .05) = 2.048. Thus, for a two-tail test with α = 0.05:

CV1 = 0 - (2.048)(18.6) = -38.09
CV2 = 0 + (2.048)(18.6) = 38.09
Step 7: Check to see whether X̄1 - X̄2 falls within or outside the acceptance region for H0: X̄1 - X̄2 = 573 - 620 = -47.

Conclusion: Reject H0; X̄1 - X̄2 = -47 is outside the acceptance region for H0. The brake pads do not last for an equal number of hours, or they do not wear at the same rate.

Using the t-test we calculate the observed t statistic = -2.526. We compare the
observed t statistic to the critical t statistic of -2.048. The observed t statistic is
outside the acceptance region for Ho. We reject Ho.
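The same result can be checked outside SYSTAT; the following is a minimal sketch using Python's SciPy (which is not part of the text's workflow), computing the pooled two-sample t test directly from the summary statistics above:

# Pooled two-sample t test for the brake pad example (a sketch, using SciPy)
from scipy import stats

# Summary statistics taken from the example above
n1, mean1, sd1 = 15, 573, 40   # Brake Pad A
n2, mean2, sd2 = 15, 620, 60   # Brake Pad B

# Equal-variance (pooled) t test from summary statistics
t_stat, p_value = stats.ttest_ind_from_stats(mean1, sd1, n1,
                                             mean2, sd2, n2,
                                             equal_var=True)
print(t_stat, p_value)   # t is about -2.53, p is about 0.017, below alpha = 0.05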

SYSTAT Two Independent Samples Analysis

When we have two independent samples we can use SYSTAT to test our hypothesis regarding the population.

EXAMPLE

A company is interested in marketing educational software. The company collected data from the Northeast and West on math and verbal skills.

The company believes the Northeast and West coast have similar math skills. The hypothesis is:

Ho: NE math = W math
H1: NE math ≠ W math

Once the data have been entered into SYSTAT we can request Analysis, Hypothesis Testing, Mean, and Two-Sample t-Test.

We are interested in math skills as measured by a math proficiency exam for two REGIONS, the NE (coded 1) and the WEST (coded 4). We click OK and SYSTAT generates the following output.

The plot of scores for the northeast and west gives a clear indication of the differences in both the distribution of scores and the mean scores. Scores in the northeast are more tightly clustered (lower variance) than the scores in the west.

The results indicate that the average math scores in the West (REGION 4) are higher than math scores in the Northeast (REGION 1) at 505 versus 470, respectively. The test of differences at a 95% confidence level (alpha = .05) is significant. We come to this conclusion by comparing the observed p-value of .000 to our critical value of .05. Since the observed p-value is less than our critical value we reject Ho (no difference in scores).
Section 2

Two Dependent Samples

INFERENCES TWO OR MORE SAMPLES

1. Two Dependent Samples
2. Testing Hypothesis
3. SYSTAT Two Dependent Samples

Inferences on Two Dependent Samples

A sampling method is dependent when the individuals selected to be in one sample are used to determine the individuals to be in the second sample. Dependent samples are often referred to as matched-pairs samples.

In other words, statistical inference methods on matched-pairs data use the same methods as inference on a single population mean, except that the differences are analyzed.

Testing Hypotheses Regarding Two Dependent Samples

To test hypotheses regarding the mean difference of matched-pairs data, the following must be satisfied:

1. the sample is obtained using simple random sampling,
2. the sample data are matched pairs,
3. the differences are normally distributed with no outliers or the sample size, n, is large (n > 30).
Step 1: Determine the null and alternative hypotheses. The hypotheses can be structured in one of three ways, where μd is the population mean difference of the matched-pairs data.

Two-Tailed: H0: μd = 0; H1: μd ≠ 0
Left-Tailed: H0: μd = 0; H1: μd < 0
Right-Tailed: H0: μd = 0; H1: μd > 0

Step 2: Select a level of significance for alpha based on the seriousness of making a Type I error.

Step 3: Compute the test statistic

t0 = d̄ / ( sd / sqrt{n} )

which approximately follows Student's t-distribution with n-1 degrees of freedom. The values of d̄ and sd are the mean and standard deviation of the differenced data.

Classical Approach

Step 4: Determine the critical value using n-1 degrees of freedom.

Step 5: Compare the critical value with the test statistic.

Step 6: State the conclusion.

P-Value Approach

Step 4: Determine the P-value using n-1 degrees of freedom. For a two-tailed test, the sum of the area in the tails is the P-value; for a left-tailed test, the area left of t0 is the P-value; for a right-tailed test, the area right of t0 is the P-value.

Step 5: If P-value < α, reject the null hypothesis.

Step 6: State the conclusion.

These procedures are robust, which means that minor departures from normality will not adversely affect the results. However, if the data have outliers, the procedure should not be used.

Confidence Interval for Matched-Pairs Data

A (1 - α)·100% confidence interval for μd is given by

Lower bound: d̄ - t(α/2) * sd / sqrt{n}
Upper bound: d̄ + t(α/2) * sd / sqrt{n}

The critical value t(α/2) is determined using n-1 degrees of freedom.

Note: The interval is exact when the population is normally distributed and approximately correct for non-normal populations, provided that n is large.

Inference about Two Means: Independent Samples

Suppose that a simple random sample of size n1 is taken from a population with unknown mean μ1 and unknown standard deviation σ1. In addition, a simple random sample of size n2 is taken from a population with unknown mean μ2 and unknown standard deviation σ2. If the two populations are normally distributed or the sample sizes are sufficiently large (n1 > 30, n2 > 30), then

t = [ (X̄1 - X̄2) - (μ1 - μ2) ] / sqrt{ s1²/n1 + s2²/n2 }

approximately follows Student's t-distribution with the smaller of n1 - 1 or n2 - 1 degrees of freedom, where X̄i and si are the sample mean and sample standard deviation of sample i.

Testing Hypotheses Regarding the Difference of Two Means

To test hypotheses regarding two population means, μ1 and μ2, with unknown population standard deviations, we can use the following steps, provided that:

1. the samples are obtained using simple random sampling;
2. the samples are independent;
3. the populations from which the samples are drawn are normally distributed or the sample sizes are large (n1 > 30, n2 > 30).
Step 1: Determine the null and alternative hypotheses. The hypotheses are structured in one of three ways:

Two-Tailed: H0: μ1 = μ2; H1: μ1 ≠ μ2
Left-Tailed: H0: μ1 = μ2; H1: μ1 < μ2
Right-Tailed: H0: μ1 = μ2; H1: μ1 > μ2

Note: μ1 is the population mean for population 1, and μ2 is the population mean for population 2.

Step 2: Select a level of significance, α, based on the seriousness of making a Type I error.

Step 3: Compute the test statistic, which approximately follows Student's t-distribution.

Step 4: Determine the critical value using the smaller of n1 - 1 or n2 - 1 degrees of freedom.

Step 5: Compare the critical value with the test statistic.

Step 6: State the conclusion.
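The steps above can be sketched numerically in Python; the data below are hypothetical and SciPy is assumed, neither being part of the text, but the test statistic and the conservative degrees-of-freedom rule follow the procedure just described:

# Independent two-means test with unknown standard deviations (a sketch)
import math
from scipy import stats

x1 = [98, 102, 95, 101, 99, 97, 103, 100]   # sample 1 (hypothetical values)
x2 = [105, 108, 101, 110, 104, 107, 103]    # sample 2 (hypothetical values)

n1, n2 = len(x1), len(x2)
m1, m2 = sum(x1) / n1, sum(x2) / n2
s1 = math.sqrt(sum((v - m1) ** 2 for v in x1) / (n1 - 1))
s2 = math.sqrt(sum((v - m2) ** 2 for v in x2) / (n2 - 1))

# Step 3: test statistic
t0 = (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Step 4: critical value using the smaller of n1 - 1 or n2 - 1 degrees of freedom
alpha = 0.05
df = min(n1, n2) - 1
t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-tailed critical value

# Steps 5 and 6: compare and state the conclusion
print(t0, t_crit, abs(t0) > t_crit)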

SYSTAT Two Dependent Samples Analysis

When we have two dependent samples we can use SYSTAT to test our hypothesis regarding the population.

EXAMPLE

A company is interested in marketing educational software. The company collected data on math and verbal skills across the US.

The company believes that Verbal and Math skills should be the same. The general hypothesis is that the population profile on verbal and math is equal for each state. If the state received high verbal skill scores then they received high math skill scores. This is a matched pairs test with scores for verbal and math by state.

Once the data are entered into SYSTAT we request Analysis, Hypothesis Tests, Mean, and Paired t-Test.
The following window opens for variable selection. We are interested in the VERBAL and MATH variables and set the confidence level at 95% (.95) for the hypothesis test. Click OK.

The output from our analysis indicates that VERBAL and MATH scores are not equal. We reject the null hypothesis.

The pattern of verbal and math score differences is illustrated in the following graph.
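A comparable matched-pairs check can be sketched in Python with SciPy (an assumption, not part of the text); the verbal and math scores below are hypothetical stand-ins, since the SYSTAT data set itself is not reproduced here:

# Paired (dependent samples) t test on the differences (a sketch)
from scipy import stats

verbal = [430, 445, 452, 470, 438, 461, 449, 455, 442, 466]   # hypothetical scores
math_  = [448, 460, 455, 481, 450, 470, 462, 468, 451, 479]   # hypothetical scores

t_stat, p_value = stats.ttest_rel(verbal, math_)
print(t_stat, p_value)   # reject Ho of equal means when p_value < 0.05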
Section 3

Proportions

INFERENCES TWO OR MORE SAMPLES

1. Difference Between Proportions
2. State the Hypotheses
3. Analyze Sample Data
4. Interpret the Results
5. SYSTAT Test of Proportions

Hypothesis Test for Difference Between Proportions

How to conduct a hypothesis test to determine whether the difference between two proportions is significant. The test procedure, called the two-proportion z-test, is appropriate when the following conditions are met:

The sampling method for each population is simple random sampling.

The samples are independent.

Each sample includes at least 10 successes and 10 failures. (Some texts say that 5 successes and 5 failures are enough.)

Each population is at least 10 times as big as its sample.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3)
analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis. The
table below shows three sets of hypotheses. Each makes a statement about the difference d between
two population proportions, P1 and P2. (In the table, the symbol ≠ means " not equal to ".)

Set   Null hypothesis   Alternative hypothesis   Number of tails
1     P1 - P2 = 0       P1 - P2 ≠ 0              2
2     P1 - P2 ≥ 0       P1 - P2 < 0              1
3     P1 - P2 ≤ 0       P1 - P2 > 0              1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis.

When the null hypothesis states that there is no difference between the two population proportions (i.e., d = 0), the null and alternative hypothesis for a two-tailed test are often stated in the following form.

Ho: P1 = P2
H1: P1 ≠ P2

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should specify the following elements.

• Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.

• Test method. Use the two-proportion z-test (described in the next section) to determine whether the hypothesized difference between population proportions differs significantly from the observed sample difference.

Analyze Sample Data

Using sample data, complete the following computations to find the test statistic and its associated P-value.

• Pooled sample proportion. Since the null hypothesis states that P1 = P2, we use a pooled sample proportion (p) to compute the standard error of the sampling distribution. p = (p1 * n1 + p2 * n2) / (n1 + n2) where p1 is the sample proportion from population 1, p2 is the sample proportion from population 2, n1 is the size of sample 1, and n2 is the size of sample 2.

• Standard error. Compute the standard error (SE) of the sampling distribution of the difference between two proportions. SE = sqrt{ p * ( 1 - p ) * [ (1/n1) + (1/n2) ] } where p is the pooled sample proportion, n1 is the size of sample 1, and n2 is the size of sample 2.

• Test statistic. The test statistic is a z-score defined by the following equation. z = (p1 - p2) / SE where p1 is the proportion from sample 1, p2 is the proportion from sample 2, and SE is the standard error of the sampling distribution.

• P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a z-score, use the probability associated with the z-score.

The analysis described above is a two-proportion z-test.

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.

Test the Difference between Two Population Proportions

To test hypotheses regarding two population proportions, p1 and p2, we can use the steps that follow:

Step 1: Determine the null and alternative hypotheses. The hypotheses can be structured in one of three ways:

Two-Tailed: H0: P1 = P2; H1: P1 ≠ P2
Left-Tailed: H0: P1 = P2; H1: P1 < P2
Right-Tailed: H0: P1 = P2; H1: P1 > P2

Step 2: Select a level of significance, based on the seriousness of making a Type I error.

Step 3: Compute the test statistic (the z-score defined above).

Step 4: Determine the critical value.

Step 5: If P-value < α, reject the null hypothesis.

Step 6: State the conclusion.

SYSTAT Two Proportions Analysis

SYSTAT provides Hypothesis Testing for either a Single Proportion or the equality of Two Proportions.

Use the test for a single proportion for a situation involving one group of subjects whose members can be classified into one of two categories of a dichotomous response variable, such as successes and failures. For instance, in a public opinion poll, we could ask people if they approve or disapprove of the current political administration.
If sentiment was evenly split, 0.50 of the respondents should respond in each category. However, we hypothesize that recent events will sway opinions to be more favorable, leading to a 0.60 approval rating.

Proportion. Enter the hypothesized value of the proportion according to the alternative hypothesis. This value must lie in the interval (0,1).

Test against (Null). Enter the hypothesized value of the proportion according to the null hypothesis. This value must lie in the interval (0,1) and differ from the value for Proportion.

Alternative type. Specify the alternative (greater than or less than or not equal) under which the power or sample size is to be calculated. The default is 'not equal'.

Level of test. Specify the probability of a Type I error, commonly referred to as the alpha (α) level. By default the confidence level is set at 95% (alpha = .05).

The test for the equality of two proportions applies when dealing with two independent groups whose members can be classified into one of two categories of a dichotomous response variable. For example, suppose we desire to compare the effectiveness of two different teaching methods, large lectures versus smaller laboratory sessions. We will divide the student population into two groups, assigning a teaching method to each. At the end of the semester, we will record the number of students passing and the number failing using a common exam. The null hypothesis asserts that the proportion passing will be the same in the two groups.

Proportion 1. The hypothesized proportion in the first group. This value must lie in the interval (0,1).

Proportion 2. The hypothesized proportion in the second group. This value must lie in the interval (0,1) and cannot equal Proportion 1.
Alternative type. Specify the alternative (greater than or less than or not equal) under which the power or sample size is to be calculated. The default is 'not equal'.

Sample sizes. You must identify how the total number of cases is distributed across the two groups:

Equal. The number of cases in the first group equals the number of cases in the second group.

Unequal. The number of cases differs between the two groups. If selecting this option, enter the ratio of the group 2 sample size to the group 1 sample size. A value between 0 and 1 indicates that the second group contains fewer cases than the first group. Values above 1 correspond to the situation in which the second group is larger.

Level of test. Specify the probability of a Type I error, commonly referred to as the alpha (α) level. By default the confidence level is set at 95% (alpha = .05).

EXAMPLE

We conducted a study of taste preferences for a Dark coffee blend. The product group hypothesized that the proportion of liking the dark roast would be 50%.

The results of the analysis indicate that we reject Ho: Proportion = 0.50 and accept the alternate H1: Proportion ≠ 0.50.

In addition to an interest in the acceptability of dark roast for Male respondents, the product group was interested in the response of Female participants in the taste test. The general hypothesis is that Males and Females are equal in their preference for a dark roast.
We cannot reject Ho: Proportion Male = Proportion Female at a 95% confidence level. The p-value of 0.097 exceeds our critical value of alpha = 0.05.
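The same pooled two-proportion computation described earlier in this section can be sketched directly in Python (SciPy is assumed, and the counts below are hypothetical, since the SYSTAT taste-test data are not reproduced in the text):

# Two-proportion z test with a pooled standard error (a sketch)
import math
from scipy import stats

x1, n1 = 62, 150    # successes and sample size, group 1 (hypothetical counts)
x2, n2 = 45, 140    # successes and sample size, group 2 (hypothetical counts)

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)                  # same as (p1*n1 + p2*n2)/(n1 + n2)
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

p_value = 2 * (1 - stats.norm.cdf(abs(z)))        # two-tailed P-value
print(z, p_value)                                 # reject Ho when p_value < alpha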

Section 4

ANOVA

INFERENCES TWO OR MORE SAMPLES

1. ANOVA
2. One-way ANOVA
3. Two-way ANOVA
4. Post-hoc Test
5. Comparison of Bonferroni Method with Scheffé and Tukey Methods
6. SYSTAT ANOVA

ANALYSIS OF VARIANCE (ANOVA)

The procedure known as the Analysis of Variance or ANOVA is used to test hypotheses concerning means when we have several populations.

The Analysis of Variance (ANOVA)

ANOVA is a general technique that can be used to test the hypothesis that the means among two or more groups are equal, under the assumption that the sampled populations are normally distributed. The ANOVA procedure is one of the most powerful statistical techniques.

A couple of questions come immediately to mind: what means? and why analyze variances in order to derive conclusions about the means?

Both questions will be answered as we delve further into the subject.

To begin, let us study the effect of temperature on a passive component such as a resistor. We select
three different temperatures and observe their effect on the resistors. This experiment can be
conducted by measuring all the participating resistors before placing n resistors each in three different
ovens.

Each oven is heated to a selected temperature. Then we measure the resistors again after, say, 24
hours and analyze the responses, which are the differences between before and after being subjected
to the temperatures. The temperature is called a factor. The different temperature settings are called
levels. In this example there are three levels or settings of the factor Temperature.

What is a factor?

A factor is an independent treatment variable whose settings (values) are controlled and varied by the experimenter. The intensity setting of a factor is the level. Levels may be quantitative numbers or, in many cases, simply "present" or "not present" ("0" or "1").

In this experiment there is only one factor, temperature, and the analysis of variance that we will be using to analyze the effect of temperature is called a one-way or one-factor ANOVA.

We could have opted to also study the effect of positions in the oven. In this case there would be two factors, temperature and oven position. Here we speak of a two-way or two-factor ANOVA. Furthermore, we may be interested in a third factor, the effect of time. Now we deal with a three-way or three-factor ANOVA. In each of these ANOVAs we test a variety of hypotheses of equality of means (or average responses when the factors are varied).

First consider the possible hypotheses for one-way ANOVA.

1. The null hypothesis is: there is no difference in the population means of the different levels of factor A (the only factor).

2. The alternative hypothesis is: the means are not the same.

For the 2-way ANOVA, the possible null hypotheses are:

1. There is no difference in the means of factor A

2. There is no difference in means of factor B

3. There is no interaction between factors A and B

The alternative hypothesis for cases 1 and 2 is: the means are not equal.

The alternative hypothesis for case 3 is: there is an interaction between A and B.

For the 3-way ANOVA: The main effects are factors A, B and C. The 2-factor interactions are: AB, AC, and BC. There is also a three-factor interaction: ABC.

For each of the seven cases the null hypothesis is the same: there is no difference in means, and the alternative hypothesis is the means are not equal.

In general, for n factors, the number of main effects and interactions (plus the overall mean) can be found by the following expression:

1 + C(n,1) + C(n,2) + ... + C(n,n) = 2^n

The first term is for the overall mean, and is always 1. The second term is for the number of main effects. The third term is for the number of 2-factor interactions, and so on. The last term is for the n-factor interaction and is always 1.

We will be focusing on 1-way and 2-way ANOVA analysis and interpretation.
The goal in this procedure is to split the total variation in the data into a portion due to random error and portions due to changes in the values of the independent variable(s).

The variance of n measurements is given by

s² = Σ (yi - ȳ)² / (n - 1)

Sums of squares and degrees of freedom

The numerator part is called the sum of squares of deviations from the mean, and the denominator is called the degrees of freedom.

The variance, after some algebra, can be rewritten as:

s² = [ Σ yi² - (Σ yi)² / n ] / (n - 1)

The first term in the numerator is called the "raw sum of squares" and the second term is called the "correction term for the mean". Another name for the numerator is the "corrected sum of squares", and this is usually abbreviated by Total SS or SS(Total).

The SS in a 1-way ANOVA can be split into two components, called the "sum of squares of treatments" and the "sum of squares of error", abbreviated as SST and SSE, respectively.

The guiding principle behind ANOVA is the decomposition of the sums of squares, or Total SS. Algebraically, this is expressed by

SS(Total) = SST + SSE, that is,
Σi Σj (yij - ȳ..)² = Σi ni (ȳi. - ȳ..)² + Σi Σj (yij - ȳi.)²

where k is the number of treatments and the bar over the y.. denotes the "grand" or "overall" mean. Each ni is the number of observations for treatment i. The total number of observations is N (the sum of the ni).

Concept of "Treatment"

We introduced the concept of treatment. The definition is: A treatment is a specific combination of factor levels whose effect is to be compared with other treatments.

The mathematical model that describes the relationship between the response and treatment for the one-way ANOVA is given by

Yij = μ + τi + εij

where Yij represents the j-th observation (j = 1, 2, ..., ni) on the i-th treatment (i = 1, 2, ..., k levels). So, Y23 represents the third observation using level 2 of the factor. μ is the common effect for the whole experiment, τi represents the i-th treatment effect and εij represents the random error present in the j-th observation on the i-th treatment.

The errors εij are assumed to be normally and independently (NID) distributed, with mean zero and variance σ². μ is always a fixed parameter, and the τi are considered to be fixed parameters if the levels of the treatment are fixed, and not a random sample from a population of possible levels. It is also assumed that μ is chosen so that

Σ τi = 0 (summing over i = 1, ..., k)
holds. This is the fixed effects model.

If the k levels of treatment are chosen at random, the model equation remains the same. However, now the τi's are random variables assumed to be NID. This is the random effects model.

Whether the levels are fixed or random depends on how these levels are chosen in a given experiment.

The sums of squares SST and SSE previously computed for the one-way ANOVA are used to form two mean squares, one for treatments and the second for error. These mean squares are denoted by MST and MSE, respectively. These are typically displayed in a tabular form, known as an ANOVA Table. The ANOVA table also shows the statistics used to test hypotheses about the population means.

When the null hypothesis of equal means is true, the two mean squares estimate the same quantity (error variance), and should be of approximately equal magnitude. In other words, their ratio should be close to 1. If the null hypothesis is false, MST should be larger than MSE.

The mean squares are formed by dividing the sum of squares by the associated degrees of freedom.

Let N = Σ ni. Then, the degrees of freedom for treatment, DFT = k - 1, and the degrees of freedom for error, DFE = N - k.

The corresponding mean squares are:

MST = SST / DFT

MSE = SSE / DFE

F Statistic

The test statistic, used in testing the equality of treatment means, is:

F = MST / MSE.

The critical value is the tabular value of the F distribution, based on the chosen alpha level and the degrees of freedom DFT and DFE.

The calculations are displayed in an ANOVA table as output from statistical software:

Source               SS     DF     MS             F
Treatments           SST    k-1    SST / (k-1)    MST/MSE
Error                SSE    N-k    SSE / (N-k)
Total (corrected)    SS     N-1

The word "source" stands for source of variation. Some researchers prefer to use "between" and "within" instead of "treatments" and "error", respectively.

EXAMPLE

The data below resulted from measuring the difference in resistance resulting from subjecting identical resistors to three different temperatures for a period of 24 hours. The sample size of each group was 5. In the language of Design of Experiments, we have an experiment in which each of three treatments was replicated 5 times.
         Level 1    Level 2    Level 3
         6.9        8.3        8.0
         5.4        6.8        10.5
         5.8        7.8        8.1
         4.6        9.2        6.9
         4.0        6.5        9.3
Means    5.34       7.72       8.56

ANOVA TABLE OUTPUT

Source               SS         DF    MS        F
Treatments           27.897     2     13.949    9.59
Error                17.452     12    1.454
Total (corrected)    45.349     14
Correction Factor    779.041    1

INTERPRETATION

The test statistic is the F value of 9.59. Using an α of .05, we have that F(.05; 2, 12) = 3.89 (compare to the critical F value). Since the test statistic is much larger than the critical value, we reject the null hypothesis of equal population means and conclude that there is a (statistically) significant difference among the population means. The p-value for 9.59 is .00325, so the test statistic is significant at that level.

The populations here are resistor readings while operating under the three different temperatures. What we do not know at this point is whether the three means are all different or which of the three means is different from the other two, and by how much.

FURTHER ANALYSIS

There are several techniques we might use to further analyze the differences. These are:

• constructing confidence intervals around the difference of two means.

• estimating combinations of factor levels with confidence bounds.

• multiple comparisons of combinations of factor levels tested simultaneously.

CALCULATIONS

SYSTAT and other statistical programs do ANOVA calculations. This section describes how to calculate the various entries in an ANOVA table. Remember, the goal is to produce two variances (of treatments and error) and their ratio. The various computational formulas will be shown and applied to the data from the previous example.

STEP 1 Compute CM, the correction for the mean.

CM = (sum of all observations)² / N = (108.1)² / 15 = 779.041

STEP 2 Compute the total SS.

The total SS = sum of squares of all observations - CM.
In this example the sum of squares of all observations is 824.390, so the total SS = 824.390 - 779.041 = 45.349. The 824.390 SS is called the "raw" or "uncorrected" sum of squares.

STEP 3 Compute SST, the treatment sum of squares.

First we compute the total (sum) for each treatment.

T1 = (6.9) + (5.4) + ... + (4.0) = 26.7

T2 = (8.3) + (6.8) + ... + (6.5) = 38.6

T3 = (8.0) + (10.5) + ... + (9.3) = 42.8

Then

SST = (T1² + T2² + T3²)/5 - CM = (26.7² + 38.6² + 42.8²)/5 - 779.041 = 27.897

STEP 4 Compute SSE, the error sum of squares.

Here we utilize the property that the treatment sum of squares plus the error sum of squares equals the total sum of squares.

Hence, SSE = SS Total - SST = 45.349 - 27.897 = 17.45.

STEP 5 Compute MST, MSE and their ratio, F.

MST is the mean square of treatments, MSE is the mean square of error (MSE is also frequently denoted by σ̂²).

MST = SST / (k-1) = 27.897 / 2 = 13.949

MSE = SSE / (N-k) = 17.452 / 12 = 1.454

where N is the total number of observations and k is the number of treatments. Finally, compute F as

F = MST / MSE = 9.59

That is it. These numbers are the quantities that are in the ANOVA table that was shown previously.

ANOVA RESULTS:

Source               SS         DF    MS        F
Treatments           27.897     2     13.949    9.59
Error                17.452     12    1.454
Total (corrected)    45.349     14
Correction Factor    779.041    1
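The hand calculations above can be reproduced in a few lines of Python; this is a sketch (SciPy is assumed, and is not part of the text's SYSTAT workflow):

# One-way ANOVA calculations for the resistor data (a sketch)
from scipy import stats

level1 = [6.9, 5.4, 5.8, 4.6, 4.0]
level2 = [8.3, 6.8, 7.8, 9.2, 6.5]
level3 = [8.0, 10.5, 8.1, 6.9, 9.3]
groups = [level1, level2, level3]

N = sum(len(g) for g in groups)
k = len(groups)
grand_total = sum(sum(g) for g in groups)

cm = grand_total ** 2 / N                                 # STEP 1: correction for the mean
ss_total = sum(v ** 2 for g in groups for v in g) - cm    # STEP 2: total (corrected) SS
sst = sum(sum(g) ** 2 / len(g) for g in groups) - cm      # STEP 3: treatment SS
sse = ss_total - sst                                      # STEP 4: error SS
mst, mse = sst / (k - 1), sse / (N - k)                   # STEP 5: mean squares
print(cm, ss_total, sst, sse, mst / mse)                  # F is about 9.59

print(stats.f_oneway(level1, level2, level3))             # same F ratio, plus the p-value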
How do we construct a confidence interval?

For the one-way ANOVA the formula for a confidence interval for the difference between two treatment means is:

(ȳi. - ȳj.) ± t(1-α/2; N-k) * sqrt{ s² (1/ni + 1/nj) }

where s² = MSE.

Substituting values from our example we have t(.975; 12) = 2.179, thus

(8.56 - 5.34) ± 2.179 (0.763) or 3.22 ± 1.616

That is, the confidence interval is from 1.604 to 4.836.

A 95% confidence interval for μ3 - μ2 is: from -1.787 to 3.467.

A 95% confidence interval for μ2 - μ1 is: from -0.247 to 5.007.

Estimation of a Factor Level Mean With Confidence Bounds

An unbiased estimator of the factor level mean μi in the 1-way ANOVA model is given by the treatment mean ȳi., which has variance σ²/ni, estimated by MSE/ni. The t statistic can be shown to be

t = (ȳi. - μi) / sqrt{ MSE / ni }

with a t distribution of (N - k) degrees of freedom for the ANOVA model under consideration, where N is the total number of observations and k is the number of factor levels or groups. The degrees of freedom are the same as were used to calculate the MSE in the ANOVA table. That is: dfe (degrees of freedom for error) = N - k. From this we can calculate (1-α)100% confidence limits for each μi. These are given by:

ȳi. ± t(1-α/2; N-k) * sqrt{ MSE / ni }
EXAMPLE

The data in the accompanying table resulted from an experiment run in a completely randomized design in which each of four treatments was replicated five times.

                                               Total     Mean
Group 1       6.9   5.4    5.8   4.6   4.0     26.70     5.34
Group 2       8.3   6.8    7.8   9.2   6.5     38.60     7.72
Group 3       8.0   10.5   8.1   6.9   9.3     42.80     8.56
Group 4       5.8   3.8    6.1   5.6   6.2     27.50     5.50
All Groups                                     135.60    6.78

ANOVA OUTPUT:

Source               SS         DF    MS        F
Treatments           38.820     3     12.940    9.724
Error                21.292     16    1.331
Total (Corrected)    60.112     19
Mean                 919.368    1
Total (Raw)          979.480    20

Since the confidence interval is two-sided, the entry (1 - α/2) value for the t table is (1 - 0.05/2) = 0.975, and the associated degrees of freedom is N - 4, or 20 - 4 = 16.

From the t table, we obtain t(0.975; 16) = 2.120.

Next we need the standard error of the mean for group 1:

sqrt{ MSE / n1 } = sqrt{ 1.331 / 5 } = 0.5159

Hence, we obtain confidence limits 5.34 ± 2.120 (0.5159) and the confidence interval is

4.246 ≤ μ1 ≤ 6.434

Definition and Estimation of Contrasts

Definitions

A contrast is a linear combination of 2 or more factor level means with coefficients that sum to zero.

Two contrasts are orthogonal if the sum of the products of corresponding coefficients (i.e., coefficients for the same means) adds to zero.

Formally, the definition of a contrast is expressed below, using the notation μi for the i-th treatment mean:

C = c1 μ1 + c2 μ2 + ... + ck μk   where   c1 + c2 + ... + ck = Σ ci = 0
ORTHOGONAL CONTRASTS

As an example of orthogonal contrasts, note the three contrasts defined by the table below, where the rows denote coefficients for the column treatment means.

       μ1    μ2    μ3    μ4
c1     +1     0     0    -1
c2      0    +1    -1     0
c3     +1    -1    -1    +1

PROPERTIES OF ORTHOGONAL CONTRASTS:

The following is true:

1. The sum of the coefficients for each contrast is zero.

2. The sum of the products of coefficients of each pair of contrasts is also 0 (orthogonality property).

3. The first two contrasts are simply pairwise comparisons, the third one involves all the treatments.

Contrasts are estimated by taking the same linear combination of treatment mean estimators. In other words:

Ĉ = c1 ȳ1. + c2 ȳ2. + ... + ck ȳk.

These formulas hold for any linear combination of treatment means, not just for contrasts.

CONFIDENCE INTERVAL FOR CONTRASTS

An unbiased estimator for a contrast C is given by the estimate Ĉ above, with estimated variance

s²(Ĉ) = MSE * Σ ( ci² / ni )

The 1-α confidence limits of C are:

Ĉ ± t(1-α/2; N-k) * sqrt{ s²(Ĉ) }

ESTIMATING CONTRASTS

We wish to estimate the following contrast:

C = (μ1 + μ2)/2 - (μ3 + μ4)/2
and construct a 95 % confidence interval for C.

POINT ESTIMATE

The point estimate is:

Ĉ = (ȳ1. + ȳ2.)/2 - (ȳ3. + ȳ4.)/2 = (5.34 + 7.72)/2 - (8.56 + 5.50)/2 = -0.5

and the estimated variance is

s²(Ĉ) = MSE * Σ ( ci² / ni ) = 0.2661

and the standard error is sqrt{0.2661} = 0.5159.

CONFIDENCE INTERVAL

For a confidence coefficient of 95% and df = 20 - 4 = 16, t(0.975; 16) = 2.12. Therefore, the desired 95% confidence interval is -0.5 ± 2.12 (0.5159) or (-1.594, 0.594).

LINEAR COMBINATIONS

Sometimes we are interested in a linear combination of the factor-level means that is not a contrast. Assume that in our sample experiment certain costs are associated with each group. For example, there might be costs associated with each factor:

Factor    Cost in $
1         3
2         5
3         2
4         1

The following linear combination may be of interest:

C = 3 μ1 + 5 μ2 + 2 μ3 + 1 μ4

This resembles a contrast, but the coefficients ci do not sum to zero. A linear combination is given by the definition:

C = c1 μ1 + c2 μ2 + ... + ck μk

with no restrictions on the coefficients ci.

Confidence limits for a linear combination C are obtained in precisely the same way as those for a contrast, using the same calculation for the point estimator and estimated variance.
TWO WAY ANOVA

The 2-way ANOVA is probably the most popular layout in experimental design. An experiment that utilizes every combination of factor levels as treatments is called a factorial experiment.

Factorial Model

In a factorial experiment with factor A at a levels and factor B at b levels, the model for the general layout can be written as

Yijk = μ + αi + βj + (αβ)ij + εijk

where μ is the overall mean response, αi is the effect due to the i-th level of factor A, βj is the effect due to the j-th level of factor B, and (αβ)ij is the effect due to any interaction between the i-th level of A and the j-th level of B.

At this point, consider the levels of factor A and of factor B chosen for the experiment to be the only levels of interest to the experimenter, such as predetermined levels for temperature settings or the length of time for a process step. The factors A and B are said to be fixed factors and the model is a fixed-effects model. Random factors will be discussed later.

When an a x b factorial experiment is conducted with an equal number of observations per treatment combination, the total (corrected) sum of squares is partitioned as:

SS(total) = SS(A) + SS(B) + SS(AB) + SSE

where AB represents the interaction between A and B.

The ANOVA output for the A by B factorial design

Source               SS           df              MS
Factor A             SS(A)        (a - 1)         MS(A) = SS(A)/(a-1)
Factor B             SS(B)        (b - 1)         MS(B) = SS(B)/(b-1)
Interaction AB       SS(AB)       (a-1)(b-1)      MS(AB) = SS(AB)/[(a-1)(b-1)]
Error                SSE          (N - ab)        MSE = SSE/(N - ab)
Total (Corrected)    SS(Total)    (N - 1)

The various hypotheses that can be tested using this ANOVA table concern whether the different levels of Factor A, or Factor B, really make a difference in the response, and whether the AB interaction is significant.

Recall that the possible null hypotheses are:

1. There is no difference in the means of factor A

2. There is no difference in means of factor B

3. There is no interaction between factors A and B
BASIC MODEL

Factor A has 1, 2, ..., a levels. Factor B has 1, 2, ..., b levels. There are a by b treatment combinations (or cells) in a complete factorial layout. Assume that each treatment cell has r independent observations (known as replications). When each cell has the same number of replications, the design is a balanced factorial.

Calculation of the Sum of Squares

The SYSTAT statistical program will calculate the sums of squares needed for the ANOVA table.

• Let Ai be the sum of all observations of level i of factor A, i = 1, ..., a. The Ai are the row sums.

• Let Bj be the sum of all observations of level j of factor B, j = 1, ..., b. The Bj are the column sums.

• Let (AB)ij be the sum of all observations of level i of A and level j of B. These are cell sums.

• Let r be the number of replicates in the experiment; that is: the number of times each factorial treatment combination appears in the experiment.

Then the total number of observations for each level of factor A is rb, the total number of observations for each level of factor B is ra, and the total number of observations for each interaction is r. For reference, the formulas for the sums of squares are:

SS(A) = Σ Ai² / (rb) - CM
SS(B) = Σ Bj² / (ra) - CM
SS(AB) = Σ Σ (AB)ij² / r - SS(A) - SS(B) - CM
SSE = SS(Total) - SS(A) - SS(B) - SS(AB)

where CM, the correction for the mean, is the squared grand total divided by the total number of observations.

EXAMPLE

An evaluation of a new coating applied to 3 different materials was conducted at 2 different laboratories. Each laboratory tested 3 samples from each of the treated materials. The results are given in the following table:

              Materials (B)
LABS (A)      1      2      3
              4.1    3.1    3.5
1             3.9    2.8    3.2
              4.3    3.3    3.6

              2.7    1.9    2.7
2             3.1    2.2    2.3
              2.6    2.3    2.5

PRELIMINARY ANALYSIS

The preliminary part of the analysis yields a table of row and column sums

              Material (B)
Lab (A)       1       2       3       Total (Ai)
1             12.3    9.2     10.3    31.8
2             8.4     6.4     7.5     22.3
Total (Bj)    20.7    15.6    17.8    54.1
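The same factorial ANOVA can be sketched in Python; the snippet below assumes the pandas and statsmodels libraries (neither is referenced in the text) and reproduces the F ratios and p-values reported in the ANOVA results table that follows:

# Two-way (A x B) factorial ANOVA for the coating data (a sketch)
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

values = [4.1, 3.9, 4.3, 3.1, 2.8, 3.3, 3.5, 3.2, 3.6,    # Lab 1: materials 1, 2, 3
          2.7, 3.1, 2.6, 1.9, 2.2, 2.3, 2.7, 2.3, 2.5]    # Lab 2: materials 1, 2, 3
labs = [1] * 9 + [2] * 9
materials = ([1] * 3 + [2] * 3 + [3] * 3) * 2

data = pd.DataFrame({"response": values, "lab": labs, "material": materials})

# Fixed-effects model with main effects A (lab), B (material) and the AB interaction
model = smf.ols("response ~ C(lab) * C(material)", data=data).fit()
print(anova_lm(model, typ=2))   # sums of squares, df, F and p-values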
ANOVA RESULTS

Source          SS        df    MS        F         p-value
A               5.0139    1     5.0139    100.28    0
B               2.1811    2     1.0906    21.81     0.0001
AB              0.1344    2     0.0672    1.34      0.298
Error           0.6000    12    0.0500
Total (Corr)    7.9294    17

INTERPRETATION

From the results we see that the interaction of AB is not significant at p = 0.298. The Lab and Material Coating effects were each highly significant (p ≤ .0001).

RANDOM MODELS

For random factors, such as operators, days, lots or batches, where the levels in the experiment might have been chosen at random from a large number of possible levels, the model is called a random model and inferences are to be extended to all levels of the population.

In a random model the experimenter is often interested in estimating components of variance. Let us run an example that analyzes and interprets a component of variance or random model.

Components of Variance Example for Random Factors

A company supplies a customer with a large number of batches of raw materials. The customer makes three sample determinations from each of 5 randomly selected batches to control the quality of the incoming material. The model is

Yij = μ + τi + εij

and the k levels (e.g., the batches) are chosen at random from a population with variance σ²τ. The data are as follows:

Batch    1     2     3     4     5
         74    68    75    72    79
         76    71    77    74    81
         75    72    77    73    79

From the analysis the test statistic from the ANOVA table is F = 36.94 / 1.80 = 20.5.

If we had chosen an α value of .01, then the F value for a df of 4 in the numerator and 10 in the denominator is 5.99.

Since the test statistic is larger than the critical value, we reject the hypothesis of equal means. Since these batches were chosen via a random selection process, it may be of interest to find out how much of the variance in the experiment might be attributed to batch differences and how much to random error. In order to answer these questions, we can use the EMS (expected mean square). From this analysis we see that 11.71/13.51 = 86.7 percent of the total variance is attributable to batch differences and 13.3 percent to error variability within the batches.

HOW TO MAKE MULTIPLE COMPARISONS

What to do after equality of means is rejected. Post-hoc tests facilitate our understanding of the form of the inequality. When processes are compared and the null hypothesis of equality (or homogeneity) is rejected, all we know at that point is that there is no equality amongst them. But we do not know the form of the inequality.
Questions concerning the reason for the rejection of the null hypothesis arise in the form of:

• "Which mean(s) or proportion(s) differ from a standard or from each other?"

• "Does the mean of treatment 1 differ from that of treatment 2?"

• "Does the average of treatments 1 and 2 differ from the average of treatments 3 and 4?"

One popular way to investigate the cause of rejection of the null hypothesis is a Multiple Comparison Procedure. These are methods which examine or compare more than one pair of means or proportions at the same time. Doing pairwise comparison procedures over and over again for all possible pairs will not, in general, work. This is because the overall significance level is not as specified for a single pair comparison.

The ANOVA uses the F test to determine whether there exists a significant difference among treatment means or interactions. In this sense it is a preliminary test that informs us if we should continue the investigation of the data at hand.

If the null hypothesis (no difference among treatments or interactions) is accepted, there is an implication that no relation exists between the factor levels and the response. There is not much we can learn, and we are finished with the analysis.

When the F test rejects the null hypothesis, we usually want to undertake a thorough analysis of the nature of the factor-level effects.

Previously, we discussed several procedures for examining particular factor-level effects. These were

• Estimation of the Difference Between Two Factor Means.

• Estimation of Factor Level Effects.

• Confidence Intervals For A Contrast.

These types of investigations should be done on combinations of factors that were determined in advance of observing the experimental results, or else the confidence levels are not as specified by the procedure. Also, doing several comparisons might change the overall confidence level. This can be avoided by carefully selecting contrasts to investigate in advance and making sure that:

• the number of such contrasts does not exceed the number of degrees of freedom between the treatments.

• only orthogonal contrasts are chosen.

However, there are also several powerful multiple comparison procedures we can use after observing the experimental results.

Tests on Means after Experimentation

If the decision on what comparisons to make is withheld until after the data are examined, the following procedures can be used:

• Tukey's Method to test all possible pairwise differences of means to determine if at least one difference is significantly different from 0.

• Scheffé's Method to test all possible contrasts at the same time, to see if at least one is significantly different from 0.

• Bonferroni Method to test, or put simultaneous confidence intervals around, a pre-selected group of contrasts.

Multiple Comparisons Between Proportions

When we are dealing with population proportion defective data, the Marascuilo procedure can be used to simultaneously examine comparisons between all groups after the data have been collected.
TUKEY METHOD

The Tukey method applies simultaneously to the set of all pairwise comparisons. The confidence coefficient for the set, when all sample sizes are equal, is exactly 1-α. For unequal sample sizes, the confidence coefficient is greater than 1-α. In other words, the Tukey method is conservative when there are unequal sample sizes.

Studentized Range Distribution

The Tukey method uses the studentized range distribution. Suppose we have r independent observations y1, ..., yr from a normal distribution with mean μ and variance σ². Let w be the range for this set, i.e., the maximum minus the minimum. Now suppose that we have an estimate s² of the variance σ² that is based on ν degrees of freedom and is independent of the yi. The studentized range is defined as

q = w / s

The distribution of q has been formulated and is provided as part of the analysis function in SYSTAT.

As an example, let r = 5 and ν = 10. The 95th percentile is q(.05; 5, 10) = 4.65. This means that if we have five observations from a normal distribution, the probability is .95 that their range is not more than 4.65 times as great as an independent sample standard deviation estimate for which the estimator has 10 degrees of freedom.

Tukey's Method

The Tukey confidence limits for all pairwise comparisons with confidence coefficient of at least 1-α are:

(ȳi. - ȳj.) ± ( q(α; k, N-k) / sqrt{2} ) * sqrt{ MSE (1/ni + 1/nj) }

Notice that the point estimator and the estimated variance are the same as those for a single pairwise comparison that was illustrated previously. The only difference between the confidence limits for simultaneous comparisons and those for a single comparison is the multiple of the estimated standard deviation.

Example

Using the data from a previous example and setting a confidence coefficient of 95 percent, we find that the simultaneous pairwise comparisons indicate that the differences μ1 - μ4 and μ2 - μ3 are not significantly different from 0 (their confidence intervals include 0), and all the other pairs are significantly different. (We will do a full analysis using SYSTAT at the end of this section.)

It is possible to work with unequal sample sizes. In this case, one has to calculate the estimated standard deviation for each pairwise comparison. The Tukey procedure for unequal sample sizes is sometimes referred to as the Tukey-Kramer Method.
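For the four-group example above, the Tukey comparisons can also be sketched in Python with the statsmodels library (an assumption; the text itself performs the equivalent analysis in SYSTAT):

# Tukey pairwise comparisons for the four-group data (a sketch)
from statsmodels.stats.multicomp import pairwise_tukeyhsd

values = [6.9, 5.4, 5.8, 4.6, 4.0,      # Group 1
          8.3, 6.8, 7.8, 9.2, 6.5,      # Group 2
          8.0, 10.5, 8.1, 6.9, 9.3,     # Group 3
          5.8, 3.8, 6.1, 5.6, 6.2]      # Group 4
groups = [1] * 5 + [2] * 5 + [3] * 5 + [4] * 5

result = pairwise_tukeyhsd(values, groups, alpha=0.05)
print(result)   # mu1 - mu4 and mu2 - mu3 are not significant; the other pairs are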
SCHEFFE'S METHOD

Scheffé's method applies to the set of estimates of all possible contrasts among the factor level means, not just the pairwise differences considered by Tukey's method.

An arbitrary contrast is defined by

C = c1 μ1 + c2 μ2 + ... + ck μk   where   Σ ci = 0

Technically there are an infinite number of contrasts. The simultaneous confidence coefficient is exactly 1-α, whether the factor level sample sizes are equal or unequal.

We estimate C by:

Ĉ = c1 ȳ1. + c2 ȳ2. + ... + ck ȳk.

for which the estimated variance is:

s²(Ĉ) = MSE * Σ ( ci² / ni )

It can be shown that the probability is 1 - α that all confidence limits of the type

Ĉ ± sqrt{ (k - 1) F(α; k-1, N-k) } * sqrt{ s²(Ĉ) }

are correct simultaneously.

Scheffe method example

We wish to estimate, in our experiment, the following contrasts

C1 = (μ1 + μ2)/2 - (μ3 + μ4)/2
C2 = (μ1 + μ3)/2 - (μ2 + μ4)/2

and construct 95 percent confidence intervals for them.

The point estimates are:

Ĉ1 = (5.34 + 7.72)/2 - (8.56 + 5.50)/2 = -0.5
Ĉ2 = (5.34 + 8.56)/2 - (7.72 + 5.50)/2 = 0.34

Applying the formulas above we obtain in both cases

Σ ( ci² / ni ) = 0.2

and

s²(Ĉ) = 0.2661
where MSE = 1.331 was computed in our previous example.

The standard error = .5158 (the square root of .2661).

For a confidence coefficient of 95 percent and degrees of freedom in the numerator of r - 1 = 4 - 1 = 3, and in the denominator of 20 - 4 = 16, we have:

sqrt{ (k - 1) F(.05; 3, 16) } = sqrt{ 3 (3.24) } = 3.12

The confidence limits for C1 are -.5 ± 3.12 (.5158) = -.5 ± 1.608, and for C2 they are .34 ± 1.608.

The desired simultaneous 95 percent confidence intervals are

-2.108 ≤ C1 ≤ 1.108
-1.268 ≤ C2 ≤ 1.948

Recall that when we constructed a confidence interval for a single contrast, we found the 95 percent confidence interval:

-1.594 ≤ C ≤ 0.594

As expected, the Scheffé confidence interval procedure that generates simultaneous intervals for all contrasts is considerably wider.

Comparison of Scheffé's Method with Tukey's Method

If only pairwise comparisons are to be made, the Tukey method will result in a narrower confidence limit, which is preferable.

Consider for example the comparison between μ3 and μ1.

Tukey:    1.13 < μ3 - μ1 < 5.31

Scheffé:  0.95 < μ3 - μ1 < 5.49

which gives Tukey's method the edge.

The normalized contrast, using sums, for the Scheffé method is 4.413, which is close to the maximum contrast.

In the general case when many or all contrasts might be of interest, the Scheffé method tends to give narrower confidence limits and is therefore the preferred method.

BONFERRONI METHOD

The Bonferroni method is a simple method that allows many comparison statements to be made (or confidence intervals to be constructed) while still assuring that an overall confidence coefficient is maintained.

This method applies to an ANOVA situation when the analyst has picked out a particular set of pairwise comparisons or contrasts or linear combinations in advance. This set is not infinite, as in the Scheffé case, but may exceed the set of pairwise comparisons specified in the Tukey procedure.

The Bonferroni method is valid for equal and unequal sample sizes. We restrict ourselves to only linear combinations or comparisons of treatment level means
(pairwise comparisons and contrasts are special cases of linear combinations). We denote the number of statements or comparisons in the finite set by g.

Formally, the Bonferroni general inequality is presented by:

P( A1 and A2 and ... and Ag ) ≥ 1 - Σ P( not Ai )

where Ai and its complement (not Ai) are any events.

In particular, if each Ai is the event that a calculated confidence interval for a particular linear combination of treatments includes the true value of that combination, then the left-hand side of the inequality is the probability that all the confidence intervals simultaneously cover their respective true values. The right-hand side is one minus the sum of the probabilities of each of the intervals missing their true values. Therefore, if simultaneous multiple interval estimates are desired with an overall confidence coefficient 1-α, one can construct each interval with confidence coefficient (1 - α/g), and the Bonferroni inequality insures that the overall confidence coefficient is at least 1-α.

In summary, the Bonferroni method states that the confidence coefficient is at least 1-α that simultaneously all the following confidence limits for the g linear combinations Ci are "correct" (or capture their respective true values):

Ĉi ± t(1 - α/(2g); N-k) * sqrt{ s²(Ĉi) }

Example using Bonferroni method

We wish to estimate, as we did using the Scheffe method, the following linear combinations (contrasts):

C1 = (μ1 + μ2)/2 - (μ3 + μ4)/2
C2 = (μ1 + μ3)/2 - (μ2 + μ4)/2

and construct 95% confidence intervals around the estimates.

The point estimates are:

Ĉ1 = -0.5 and Ĉ2 = 0.34

As before, for both contrasts, we have

s²(Ĉ) = 0.2661

where MSE = 1.331. The standard error is .5158 (the square root of .2661).

For a 95% overall confidence coefficient using the Bonferroni method, the t value is t(1 - 0.05/(2*2); 16) = t(0.9875; 16) = 2.473. Now we can calculate the confidence intervals for the two contrasts. For C1 we have confidence limits -0.5 ± 2.473 (.5158) and for C2 we have confidence limits 0.34 ± 2.473 (0.5158).
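As a quick numerical check, the Bonferroni critical value and the resulting limits can be reproduced with Python's SciPy (an assumption, not part of the text's SYSTAT workflow):

# Bonferroni critical value and simultaneous intervals (a sketch)
from scipy import stats

alpha, g, df = 0.05, 2, 16
t_bonf = stats.t.ppf(1 - alpha / (2 * g), df)      # about 2.473
se = 0.5158

for name, estimate in [("C1", -0.5), ("C2", 0.34)]:
    lower, upper = estimate - t_bonf * se, estimate + t_bonf * se
    print(name, round(lower, 3), round(upper, 3))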
Thus, the confidence intervals are:

-1.776 ≤ C1 ≤ 0.776
-0.936 ≤ C2 ≤ 1.616

Notice that the Scheffé interval for C1 is:

-2.108 ≤ C1 ≤ 1.108

wider and therefore less attractive.

Comparison of Bonferroni Method with Scheffé and Tukey Methods

1. If all pairwise comparisons are of interest, Tukey has the edge. If only a subset of pairwise comparisons is required, Bonferroni may sometimes be better.

2. When the number of contrasts to be estimated is small (about as many as there are factors), Bonferroni is better than Scheffé. Actually, unless the number of desired contrasts is at least twice the number of factors, Scheffé will always show wider confidence bands than Bonferroni.

3. Many computer packages include all three methods. So, study the output and select the method with the smallest confidence band.

4. No single method of multiple comparisons is uniformly best among all the methods.

SYSTAT ANOVA Analysis

We can use SYSTAT to test our hypothesis for two or more means and do post hoc analysis using Tukey's method.

EXAMPLE

Data was collected on the poverty level for the lower 48 states. Each state was classified regionally as northeast, midwest, south or west. As a researcher we want to determine if there is a significant difference in poverty levels by region.

Our test hypothesis can be stated as "poverty levels do not vary by region." The alternate is that "poverty levels are not equal across all regions of the US."

Once the data have been entered into SYSTAT we can request Analysis using ANOVA.

Once you click on estimate model the following window will open.
As stated in our hypothesis we are interested in the rate of poverty by region (US).

We set up the model by adding POVRTY91 (our poverty variable) to the dependent(s) list and REGION to the Factor(s) list. Click OK.

Once you click OK the analysis will generate the model and multiple components of output. We are interested in the section of the output focusing on the test of our hypothesis.

We find that the F-Ratio of 4.550 is significant at 0.007 (p-value). Given a confidence level of 95% we reject the hypothesis of equality of poverty across US regions.

We are now interested in where significant differences exist by region. We request a pairwise comparison using Tukey's test. We click on Analysis, ANOVA and Pairwise Comparisons.

This will open a window to specify the specific test(s) we would like to run.
We have only one available effect from our model (REGION). We add REGION to the groups and click on Tukey. We have the option of changing the confidence level. By default the level is set at 95%. Once we are finished click OK.

Tukey's pairwise comparisons are in the output window. Recall that our general hypothesis is that all means are equal (no differences by region). Or stated differently:

Ho: NE Poverty = Midwest Poverty = South Poverty = West Poverty

H1: Not Equal

Post Hoc analysis reveals the significant difference in poverty is between the Northeast and the South. The other regional contrasts are not significant at alpha of .05. Our findings indicate that the significant difference we found in our ANOVA analysis is the result of the difference between the northeast and south.

A plot of means is provided as part of the SYSTAT output. The plot provides a clear representation of the higher rate of poverty in the south region relative to the northeast.
Section 5

Method to Use

INFERENCES TWO OR MORE SAMPLES

1. Method to Use
2. Criteria for Selecting the Best Method

Putting It Together: Which Method do I Use?

A sampling method is independent when the individuals selected for one sample do not dictate which individuals are to be in a second sample. A sampling method is dependent when the individuals selected to be in one sample are used to determine the individuals to be in the second sample.

Dependent samples are often referred to as matched-pairs samples. We began by focusing on the
analysis of independent samples then examined the procedures for testing hypothesis for matched
pairs. We concluded this section with an introduction to the analysis of two or more means using
ANOVA.

The statistical inference methods on matched pairs data use the same methods as inference on a sin-

gle population mean, except that the differences are analyzed.

Selecting the appropriate method:

Step 1: Was the sample(s) drawn according to the requirements for a random sample? If YES: Go to
Step 2. If NO: Cannot use any statistical model which is based on probability theory. However, one
can describe data, i.e., its mean, standard deviation, quartiles, etc. One might also be able to fit a line
to a set of data by method of least squares, but could not construct a meaningful confidence interval.

Step 2: What level of measure was attained?

Interval or Ratio:
1. Normal (Z)
2. Students' (t)
3. F distribution
4. Pearson's Product Moment Correlation
5. Normal approx. to Binomial

Nominal:
1. Chi Square
2. Binomial (itself)

Ordinal:
1. Mann-Whitney
2. Wilcoxon
3. Spearman Rank Correlation

The above are the minimum levels of measure necessary for each test. Thus, the tests associated with interval level of measure may not be used with nominal or ordinal levels of measure. The Chi Square test, which needs only nominal level of measure, can be used on all levels of measure: ordinal, interval, and ratio. The Mann-Whitney test, which requires ordinal level of measure, can be used on interval or ratio data as well. In fact, the Mann-Whitney test is a good alternative to the two means Z or t test, especially if homogeneity of variance is questionable. Likewise, the Wilcoxon and Spearman tests may be used on interval or ratio data when other conditions for the tests have been met.

Step 3: What parameter is addressed in the hypothesis?

• Proportion, p

• Mean, μ

Step 4: What are the number of samples involved?

One Random Sample:
1. Normal (Z)
2. Students' (t)
3. Binomial (itself)
4. Normal approx. to Binomial
5. Pearson's Product Moment Correlation
6. Chi Square
7. Spearman Rank Correlation

One Random Sample Consisting of Matched Pairs (Before-After):
1. Wilcoxon
2. Special Case Students' (t)

Two Independent Random Samples:
1. Normal (Z)
2. Students' (t)
3. Normal approx. to Binomial
4. F distribution
5. Chi Square
6. Mann-Whitney
Proportion:

Is the sampling Dependent or Independent?

Dependent samples: Provided the samples are obtained randomly and the total number of observations where the outcomes differ is at least 10, use the normal distribution.

Independent samples: Provided n p̂ (1 - p̂) ≥ 10 for each sample and the sample size is no more than 5% of the population size, use the normal distribution with

z = (p̂1 - p̂2) / sqrt{ p̂ (1 - p̂) (1/n1 + 1/n2) }

where p̂ is the pooled sample proportion.

Mean:

Is the sampling Dependent or Independent?

Dependent samples: Provided each sample size is greater than 30 or the differences come from a population that is normally distributed, use Student's t-distribution with n-1 degrees of freedom with

t0 = d̄ / ( sd / sqrt{n} )

Independent samples: Provided each sample size is greater than 30 or each population is normally distributed, use Student's t-distribution:

t = (X̄1 - X̄2) / sqrt{ s1²/n1 + s2²/n2 }

More Than Two Independent Random Samples:

1. Chi Square

2. Analysis of Variance

Step 5: Is the normal population assumption required?

Yes for parametric tests: Z, t, F, and Pearson product moment correlation (variables X and Y). Samples must have been drawn from a normal or large
population. Note: Normal Approximate to binomial requires np and nq ≥ 5, in (All cells should have fe >5 if possible.)
order to approximate binomial distribution with a normal distribution.
• For Mann-Whitney test and Spearman Rank Correlation, ties in ranks
No for non-parametric tests: Chi Square, Mann-Whitney, Wilcoxon, Spearman receive the average of the tied ranks.
rank correlation, and Binomial, itself.
• In Wilcoxon matched pairs signed ranks test, difference of zero in (response
Step 6: Is equal variance assumption required? 2) minus (response 1) are dropped from the analysis and n reduced by one.

Yes for Z and t two means tests and for Pearson Product Moment Correlation Step 8: Lastly, if a random sample has been drawn, but none of the above tests
(homoscedasticity, i.e., variance of Yi from the regression line is the same for all meet the circumstances of the problem, refer to an advanced text on Statistical
values of X). Methods.

No for Two-proportions test, Chi Square, Mann-Whitney, Wilcoxon, and Spearman Testing the Model
Rank Correlation.
As we have discovered, confidence intervals and hypothesis testing give the
Step 7: Check other assumptions required for different models (tests) to have same results. The approach to use will depend on the context. Scientific research
valid application: usually follow a hypothesis testing model because their null hypothesis value for
an experiment is usually 0 and the scientist is attempting to reject the hypothesis
• (Z) versus (t)
that nothing happened in the experiment. Those involved in making decisions—

a. Use z when σ is unknown, or n ≥ 30. epidemiologists, business people, engineers—are often more interested in
confidence intervals. They focus on the size and credibility of an effect and care
b. Use t when ô is unknown and n ≤ 30. less whether it can be distinguished from 0.

• Normal approximation to binomial

a. One sample case, n and n ≥ 5.

b. Two sample case, (n1 + n2) p and (n1 + n2) q ≥ 5

• Chi Square test

a. No less than 20% of cells may have fe < 5, and none < 1

Chapter 5

Regression

This section focuses on the development of linear


equations to help in making better decisions and
prediction. We initially develop basic bi-variate
equations and then expand on our foundation to
develop multiple regression models.
Section 1

Linear Regression

REGRESSION
Linear Regression

1. Purpose

2. Least Squares

3. The Regression Equation

4. Bi-variate Linear Regression

5. SYSTAT output

General Purpose

The general purpose of regression is to learn more about the relationship between an independent or predictor variable and a dependent or criterion variable.

Regression procedures are very widely used in research. In general, multiple regression allows the researcher to ask (and hopefully answer) the general question "what is the best predictor of ...". For example, educational researchers might want to learn what are the best predictors of success in high-school. Psychologists may want to determine which personality variable best predicts social adjustment. Sociologists may want to find out which of the multiple social indicators best predict whether or not a new immigrant group will adapt and be absorbed into society.

Computational Approach

The general computational problem that needs to be solved in regression analysis is to fit a straight line
to a number of points.

In the simplest case - one dependent and one independent variable - you can
visualize this in a scatterplot.
A scatter plot reveals different possible relationships between the explanatory variable and the response variable. The relationships in (a) and (b) are both linear, with (a) showing a strong positive relationship and (b) a negative relationship. In essence, as the explanatory variable increases in value, the response increases for (a) but decreases for (b). We may also find nonlinear relationships, as in (c) and (d), or no relationship, as in (e).

Least Squares

In the scatterplot, we have an independent or X variable, and a dependent or Y variable. These variables might, for example, represent state population and violent crime, respectively. Each point in the plot represents one state, that is, the respective state’s population and violent crime. The goal of linear regression procedures is to fit a line through the points. Specifically, the program will compute a line so that the squared deviations of the observed points from that line are minimized. Thus, this general procedure is sometimes also referred to as least squares estimation.

The Regression Equation

A line in a two dimensional or two-variable space is defined by the equation:

Y = a + b*X

The Y variable can be expressed in terms of a constant (a) and a slope (b) times the X variable. The constant is also referred to as the intercept, and the slope as the regression coefficient or B coefficient. For example, Violent Crime may best be predicted by Population. Thus, knowing a state’s population would lead us to predict the Violent Crime rate.

For example, the graph below shows a two dimensional regression equation plotted with a 95% confidence interval.

In the multivariate case, when there is more than one independent variable, the regression line cannot be visualized in the two dimensional space, but it can be computed just as easily.

Association Analysis: Bivariate Linear Regression

Association Analysis is a method for examining the relationship between two or more variables. The following two sections deal only with relationships between two variables and the third section deals with the more complex relationships between more than two variables.

Association Analysis is composed of two parts: (1) the nature of the relationship among variables, usually referred to as regression analysis, and (2) the degree or strength of the relationship, usually referred to as correlation analysis. In most applications, regression and correlation are used as supplementary techniques.

In many instances in business and elsewhere, two variables are thought to be


related. Examples might be the price of a house and the square feet in the house,
or the sales made per month from a point of sale display and the size of the
display in square feet, or the amount of sales a firm in a certain industry has and
the amount of accounts receivable it has.

In the first case, the objective might be to estimate or predict the market price of
certain houses knowing the size in square feet of each. (This is used in real estate
appraising and in tax assessing). In the second case, a drug store chain may
want to decide how many square feet of in-store display space to allot to a certain
type of product in order to provide a predicted amount of sales. In the third case,
a financial analyst for a hardware distribution chain may be estimating or
predicting the working capital requirements to cover accounts receivable of a new
operation.

In these types of situations, the value of one variable is predicted based on a


previously known value of the other. The variable whose value is being predicted
is called the “dependent” variable because its value depends on the value of the

second variable. The second variable is called the “independent” variable, because its value is assumed known, or does not depend on knowing the value of some other variable.

Regression Analysis – The Specific Case of Bivariate Linear Regression

The statistical procedure of “bivariate linear regression” is a lot simpler than it appears if taken one step at a time. The word bivariate refers to the fact that two variables are under study. The term linear refers to working with straight, as opposed to curved, lines. Regression refers to the nature of the relationship between the two variables. Bivariate linear regression is referred to simply as regression analysis in this chapter, but it is important to remember that the “bivariate linear” is assumed. The concept of regression is most easily introduced by an example such as the one that follows:

In an effort to improve the allocation of direct sales effort, the sales manager of Bubba Equipment had a random sample of his product support sales people keep a record of the minutes spent with the buyers of maintenance and repair items on each sales call. The orders taken on each call were also recorded so the sales manager could determine the relationship between minutes spent with the buyers and sales dollars produced by the call.

Some typical data are shown below. (Only six data points are included for simplicity in calculation.)

Sales Call    Time Spent With Buyer (In Minutes)    Dollar Amount of Sales Order
1             30                                    250
2             20                                    200
3             25                                    175
4             15                                    125
5             10                                    100
6             15                                    175

A plot of the data is shown below:

Scatter diagram of sales orders taken versus time spent with buyer

As the plot depicts, there is generally a pattern to the scatter plot of a data set. Several alternatives can be used to describe such a set of data. One alternative is to simply draw a line through the data points that appears to be representative of the set of data. This is sometimes called “eyeballing” a line. The problem with such an approach is that it lacks precision, i.e., it probably is not the mathematically best fitting line for that data set. A commonly used technique for fitting such a line is called the “method of least squares”.

Method of Least Squares

Recall from basic algebra that the equation for a straight line is:

Y = a + bX

where: Y = dependent variable (sales in the example above)

a = y-intercept (point at which a line crosses the y axis)

b = slope of the line

X = independent variable (time spent with buyer in the example above)

In statistical jargon the equation for a straight line is often referred to as the basis for the general linear model. More detail concerning the general linear model is contained in a later section.

The task of fitting a line to the set of data in the example above is accomplished with the method of least squares. Developed by the German mathematician Carl Friedrich Gauss (1777 – 1855), the method of least squares did not achieve popularity until the early part of the twentieth century.

The name “least squares” is quite descriptive of how the regression line is estimated. It is an attempt to fit a line to a scatter plot such that the line best describes the data composing that scatter plot. By arbitrarily drawing a line through the data (eyeball method) there is no assurance that the line is the best line for describing that data. Least squares provides a rigorous mathematical procedure for specifying the best possible line. A line is fitted such that the sum of the deviations (from the line) squared is a minimum. The deviations from the line are illustrated in terms of the above example:

Illustration of Deviations From Regression Line

The distance between the line and a given observation is called “error”. For example, the initial observation has a $250 order as the result of 30 minutes spent with the buyer (from Table 2). Note that the line constructed in Figure 5 does not pass directly through the point (30, 250), but rather is a short distance below it. The difference between the regression line and the point is the error in predicting that point. Note that the line measures the distance or error between the actual observation and the regression line perpendicular to the x-axis rather than perpendicular to the regression line.

The error term is incorporated into the general linear model as follows:

Yi = a + bXi + ei

where:

ŷ = the predicted value of y for a particular value of x

Xi = a particular value of x

a = the y intercept, the value of ŷ when X = 0

b = the slope of the line

ei = the deviation of the actual Y observation from the line

The value of Yi (the actual Y observation) depends on the value of Xi, plus an error term because of the deviation of Yi from the line. Then to minimize the error terms (ei) squared over all (6) values of Xi, the equation solved for ei and squared becomes:

Σei² = Σ(Yi − a − bXi)²

With the help of some simple calculus, it is possible to determine the computational formulas for a and b of the general linear model. In order to minimize Σei², take partial derivatives with respect to a and b and set the equations equal to 0.

This process results in two simultaneous equations called “normal equations”:

ΣY = na + bΣX

ΣXY = aΣX + bΣX²

These normal equations can be solved for the coefficients a and b in the linear equation, Y = a + bX:

b = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²)

a = (ΣY − bΣX) / n

The terms needed to determine the equation that provides the minimum of deviations squared from the line are calculated from the data:

Table 3: Calculation of Terms Used in Regression Equations

Sales Call    Time with Buyer (X)    Sales Orders (Y)    X²       Y²        XY
1             30                     250                 900      62,500    7,500
2             20                     200                 400      40,000    4,000
3             25                     175                 625      30,625    4,375
4             15                     125                 225      15,625    1,875
5             10                     100                 100      10,000    1,000
6             15                     175                 225      30,625    2,625
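As an illustration, the normal equations can be solved directly in a few lines of Python for the Bubba Equipment data. This is only a verification sketch; it reproduces the slope of 6.38 and an intercept of about 48.46 (the hand calculation in the text rounds this to 48.55, while the SYSTAT output reported later shows 48.462).

```python
# Least squares fit for the Bubba Equipment data, solving the two normal equations
# ΣY = na + bΣX and ΣXY = aΣX + bΣX² for a and b.
X = [30, 20, 25, 15, 10, 15]                 # minutes with buyer
Y = [250, 200, 175, 125, 100, 175]           # sales order dollars

n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(round(a, 2), round(b, 2))              # roughly 48.46 and 6.38
```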

The regression equation (derived from the data) is denoted as:

ŷ = a + bX = 48.55 + 6.38X

To position the line on the plot of the data, it is necessary to compute several (at least two) values of ŷ for given values of X. When correctly positioned, the line will always intercept x̄ and ȳ. Therefore if the value of x̄ = 19.17 is substituted into the equation, it should give the value of ȳ = 170.83 (or very close to it); e.g., ŷ = 48.55 + 6.38(19.17) = 170.86.

If another value of X is chosen near the low end of the range of the data, for example Xi = 12, and a value near the high end, for example Xi = 28, and the corresponding value of ŷ is determined from the equation, the line should pass through the points when it is drawn through the two points on the scatter diagram as shown in Figure 6 below.

Substituting into the equation ŷ = a + bX:

For Xi = 12, ŷ = a + b(12) = 48.55 + 6.38(12) = 48.55 + 76.56 = 125.11

For Xi = 28, ŷ = a + b(28) = 48.55 + 6.38(28) = 48.55 + 178.64 = 227.19

SYSTAT provides a plot of the regression line for bivariate analysis.
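For readers working outside SYSTAT, a rough equivalent of that plot can be produced with Python's matplotlib (assumed to be installed); the coefficients are simply taken from the text.

```python
# Scatter of the example data with the fitted regression line drawn through two points.
import matplotlib.pyplot as plt

X = [30, 20, 25, 15, 10, 15]
Y = [250, 200, 175, 125, 100, 175]
a, b = 48.55, 6.38                            # coefficients from the text

xs = [min(X), max(X)]
plt.scatter(X, Y)
plt.plot(xs, [a + b * x for x in xs])         # the least squares line
plt.xlabel("Time spent with buyer (minutes)")
plt.ylabel("Sales order ($)")
plt.show()
```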
SYSTAT Graphics Output: Plot of a Regression Line

Assumptions Underlying Bivariate Linear Regression

The regression model illustrated has a number of underlying assumptions. The model is a parametric model, and has assumptions analogous to those presented earlier for the two mean tests. The assumptions underlying the regression model are:

1. A random sample is chosen.

2. Interval level of measure is achieved on both X and Y.

3. The deviations of Yi from the regression line (Yi − ŷ) are normally distributed.

4. The variance of the deviations of Yi from the regression line is the same over all values of X (homoscedasticity).

5. The relationship between X and Y is linear.

6. Each Yi is independent of each other Yi (no autocorrelation).

In addition to these assumptions, some texts argue that X must be a fixed variable, i.e., known without error, rather than a random variable. This assumption is seldom restrictive, however.

Interpretation of the Regression Line

The regression line represents a series of point estimates of the average value(s) of Y for given values of X. It is analogous to x̄ as a point estimate for µ. The difference is that we have a different point estimate ŷ along the regression line for each value of X. For example, if the sales manager wanted to make a point estimate of the average amount of sales orders that could be expected if salespeople spent 26 minutes with the buyers of large department stores, that point estimate would be:

ŷ = a + b(x) = a + b(26) = 48.55 + 6.38(26) = $214.43

Similar point estimates could be made for other values of X. Point estimates ŷ are also used to construct interval estimates of the conditional mean, similar to the way x̄ is used in the construction of a confidence interval estimate of µ.

With regard to the assumptions outlined above, it is very important to note that the relationship between X and Y is assumed to be linear. The relationship may be linear over a portion of the range of X, and non-linear over another portion. In the above example, the relationship between sales orders taken and time spent with the buyer becomes non-linear as the time spent with the buyer increases above approximately 25 minutes. A “diminishing returns” phenomenon sets in, and more time spent with the buyer does not produce proportionally more sales orders.

The model gives only the best linear equation fit to the data. The relationship between X and Y may be better represented by a non-linear regression equation. In such cases, the usual procedure is to perform a mathematical transformation on the data of one or both variables to “linearize” the relationship and then proceed with the linear regression model on the transformed data. A logarithmic transform is frequently useful in connection with application of this model in business.

In addition, if the assumptions of any statistical model are violated, it is necessary to explicitly note the consequences. When violation occurs, it is possible to have completely spurious results. If the consequent limitations of a violation are not explicit, the business results could well be disastrous.

It is also worth noting that regression analysis has primarily two uses: (1) prediction and (2) explanation. Many times these two purposes overlap, as in the example used in this unit. In the example used above, the objective was stated as the prediction of sales orders based on information about time spent by the salespeople with the buyer. The resultant equation allowed prediction of sales, but it was also logically and theoretically consistent. That is to say, it is logical that time spent with a buyer will at least partially explain why a sale was made. Had an attempt to explain sales orders been made by using the number of games won by the Chicago Bears, there could still be a high degree of predictive validity but there would be no theoretical foundation. Such an equation would be called a spurious relationship and would be dangerous to use for long term forecasting.

The above point also emphasizes that a regression equation does not prove a causal relationship between the independent and dependent variables. One can only assume that time spent with a buyer causes more sales orders, but it cannot be proven with regression analysis.

The existence of an apparent relationship between two variables, even substantiated by a test of significance of the correlation coefficient as discussed in the next section, does not prove a causal relationship. Causality must be explained on some other theoretical basis outside the realm of the regression model, such as the theory of electromagnetic radiation or economic theory.

Another point needing emphasis is that predictions should not be made beyond the range of data for the independent variables. In the above example, the model should be used to predict sales orders only when time spent with the buyer is between 10 and 30 minutes. As a rule of thumb, predictions can be made for X values up to 15% below the minimum X value (8.5 for this example) and up to 15% above the maximum X value (34.5 for this example). As noted previously, any prediction beyond these numbers can be inappropriate because the regression line might not be linear beyond these points.

As a corollary to the rule of thumb noted above, the a value (Y intercept) is usually not a meaningfully interpretable number. The reason it is usually meaningless is that an X value of zero is usually outside the relevant range of X values.

The b value in the regression equation is mathematically equal to the slope of the line as previously noted. It can be interpreted as the change in Y for each unit change in X. Recall that the above equation was ŷ = 48.55 + 6.38X. In other words, as X changes by one unit, Y will change by 6.38 units. A business interpretation would be that for every minute spent with the buyer, sales will change by 6.38 dollars.

The sign of b, either positive or negative, indicates whether Y will change in the same or opposite direction with respect to X. If the sign of b is positive, a one unit change upwards or downwards in X will cause a change of 6.38 units upwards/downwards in Y. Conversely, if the sign of b is negative, a one unit change upwards/downwards in X will cause a 6.38 unit change downwards/upwards in Y.

Also important is the fact that b will not be a very reliable estimate unless there are at least 10 observations for each of the independent variables. In bivariate linear regression there is only one independent variable and therefore at least 10 observations are needed. Only six are presented in the above example for ease of computation.
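A small, illustrative Python helper shows how the rule of thumb above might be applied in practice; the function name and the hard-coded coefficients are ours, not part of the text.

```python
# Prediction helper that applies the rule of thumb above: only predict for X values
# within 15% below the minimum and 15% above the maximum observed X.
def predict(x, a=48.55, b=6.38, x_min=10, x_max=30):
    low, high = x_min * 0.85, x_max * 1.15    # 8.5 and 34.5 for this example
    if not (low <= x <= high):
        raise ValueError(f"x={x} is outside the supported range {low}-{high}")
    return a + b * x

print(predict(26))        # point estimate of about $214.43
# predict(45)             # would raise: outside the range where linearity was observed
```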

One final comment on the interpretation of a regression equation is in order. It is important to realize that the interpretation of a regression analysis depends on whether the data used are cross-sectional or time series. Cross-sectional data is collected at one point in time, census data being a good example. The majority of the census is collected during the month of April on every tenth year. Time-series data, on the other hand, observes a particular phenomenon over successive time periods. An example of time-series data would be collecting monthly sales and advertising data for a particular firm or industry for a given length of time, say three years. This would provide the possibility of 36 observations for construction of a regression line. It is important to realize that a time-series interpretation pertains to the way Y changes over time as X changes over time, while a cross-sectional interpretation refers to the amount that Y values differ as there is a simultaneous unit change in X.

THE INFERENTIAL ASPECTS OF REGRESSION

At this point it is useful to note how regression analysis relates to the other inferential statistics studied. Recall the general linear model:

Y = a + bX + e

This conforms to the usual notation of Latin letters for statistics and Greek letters for parameters. In terms of the parameters, the equation would be expressed as:

Y = α + βX + ε

Note that the three parameters (α, β, ε) correspond to the three statistics (a, b, e). Both Y and X are variables and therefore do not have a corresponding parameter.

Inferring from a sample statistic to a population parameter in this case is completely analogous to previously noted inferences as to µ. The inferential aspects of regression analysis are explained in greater detail in the following section on confidence intervals.

Confidence Interval Estimate of the Conditional Mean

A confidence interval estimate of µ was shown to be constructed by positioning the sampling distribution of means on either side of the sample mean x̄. The upper and lower limits of the confidence interval were:

Upper limit = x̄ + (Z or t) σ̂x̄ and lower limit = x̄ − (Z or t) σ̂x̄

where σ̂x̄ is the standard error of the mean.

In a completely analogous fashion, the confidence interval estimate of the conditional mean is constructed by positioning the sampling distribution of the conditional mean vertically above and below the regression line. The width of the confidence interval of the conditional mean is:

ŷ + (Z or t) σ̂ŷ and ŷ − (Z or t) σ̂ŷ

where σ̂ŷ is the standard error of the conditional mean, analogous to σ̂x̄.

One major difference between σ̂ŷ and σ̂x̄ is that for the small sample case (defined as n ≤ 100 for regression analysis) the value of σ̂ŷ varies at different points along the regression line, whereas σ̂x̄ is always the same for a given sample. For the large sample case (defined as n > 100 for regression analysis) σ̂ŷ can be considered to have the same value for all points along the regression line for a given set of data, similar to σ̂x̄.

Concepts and Computational Procedures

In constructing a confidence interval estimate of the mean, the sample data were used to compute x̄, σ̂, and σ̂x̄.

In regression analysis, ŷ, which is analogous to x̄, is determined by the equation which is fitted to the data.

In regression analysis, σ̂y·x is the concept similar to σ̂. It is called “the estimate of the population standard deviation conditional on x”. Also in regression analysis σ̂ŷ is the concept similar to σ̂x̄. It is called “the standard error of the conditional mean,” as mentioned above. The standard error of the conditional mean is also called “the standard error of the regression,” and sometimes is simply called the “standard error of the estimate.”

The conceptual similarity between σ̂x and σ̂y is shown by comparing the two formulas (A and B) below. The subscript (x or y) on σ̂ indicates the variable to which reference is being made.

(A) σ̂x = √[ Σ(x − x̄)² / (n − 1) ]

Estimate of the population standard deviation based on the sample (of x’s), where the (x − x̄)’s are deviations from the sample mean.

(B) σ̂y = √[ Σ(y − ȳ)² / (n − 1) ]

Estimate of the population standard deviation (of y) based on the sample (of x and y pairs), where the (y − ȳ)’s are the deviations from the sample mean ȳ, and the values of x are disregarded.

(C) σ̂y·x = √[ Σ(y − ŷ)² / (n − 2) ]

Estimate of the population standard deviation (of y) conditional on x. The values of x are taken into account. The deviations (y − ŷ) are deviations in y from the regression line at points defined by the x’s.

This formula (C) is conceptually correct, but it is not computationally convenient, because it would require:

1) Calculation of each value of ŷ for each Xi

2) Determining the difference (Yi − ŷ) for each Yi corresponding to each Xi.

Formulas have been developed for calculating σ̂y·x directly without computing the deviations from the line. The formulas are similar to

σ̂x = √[ (Σx² − (Σx)²/n) / (n − 1) ]

which is the alternative formula for calculating σ̂x without determining the deviations (x − x̄) for use in formula (A) above.

Computational Formulas for σ̂y·x

There are two commonly used formulas for σ̂y·x. The first, shown below, uses only terms derived directly from the data. The second, simpler formula uses the coefficients a and b of the regression equation. If the coefficients a and b have already been calculated, the second formula could be used. They both produce the same result, given usual rounding errors.

(1) Using terms from the data to compute σ̂y·x:

σ̂y·x = √{ [ (Σy² − (Σy)²/n) − (Σxy − (Σx)(Σy)/n)² / (Σx² − (Σx)²/n) ] / (n − 2) }

At first glance, this formula appears formidable until you realize that there are three basic terms involved:

A = Σy² − (Σy)²/n,  B = Σxy − (Σx)(Σy)/n,  C = Σx² − (Σx)²/n

These three terms A, B, and C are also used in other formulas (for σ̂ŷ and for the coefficient of correlation, r). It is computationally convenient to record them for later use to avoid recalculation.

(2) Using the regression coefficients a and b in computing σ̂y·x:

After a and b have been determined, the following equation can be used to compute σ̂y·x:

σ̂y·x = √[ (ΣY² − aΣY − bΣXY) / (n − 2) ]

In using this formula, be aware that if any error has been made in computing a and b, the value for σ̂y·x will be incorrect.

Calculating the Standard Error of the Conditional Mean

Once the value has been calculated for the estimate of the standard deviation of the population of y conditional on x, σ̂y·x, then σ̂ŷ can be calculated. There are two situations to be dealt with: the large sample and the small sample case.

(1) Large Sample Case: When the sample size is ≥ 100, the standard error of the conditional mean, σ̂ŷ, can be determined simply as:

σ̂ŷ = σ̂y·x / √n

This is exactly the same as σ̂x̄ = σ̂ / √n for the confidence interval estimate of a mean.
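The two computational routes to σ̂y·x can be checked numerically for the example data. The short Python sketch below is only a verification aid; it gives a value near 28.4 either way, and dividing by √n gives roughly 11.6, the standard error used at the mean of x in the confidence intervals that follow.

```python
# Standard error of the estimate for the example data, computed two ways:
# from the deviations (y - ŷ) and from the coefficients a and b.
from math import sqrt

X = [30, 20, 25, 15, 10, 15]
Y = [250, 200, 175, 125, 100, 175]
n, a, b = len(X), 48.55, 6.38

resid_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))
se_from_deviations = sqrt(resid_ss / (n - 2))

sum_y, sum_y2 = sum(Y), sum(y * y for y in Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
se_from_coefficients = sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

print(round(se_from_deviations, 2), round(se_from_coefficients, 2))  # both near 28.4
print(round(se_from_coefficients / sqrt(n), 2))                      # about 11.6
```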

(2) Small Sample Case: When the sample size is < 100, a second consideration must be made in computing σ̂ŷ. This consideration is that, because of the small sample size, there may be error in positioning the regression line. Instead of the line being correctly positioned, the line may be incorrectly slanted upward or incorrectly slanted downward.

Possible Errors in Positioning Regression Lines

To compensate for the possible error in positioning the regression line when the sample size is small, a second term is added to the formula for σ̂ŷ as shown below. Correct positioning here refers to the line being a good estimate of the parameter line, Y = α + βX + ε.

σ̂ŷ = σ̂y·x √[ 1/n + (xi − x̄)² / (Σx² − (Σx)²/n) ]

The second term compensates for error in positioning of the regression line. Note that the denominator in the second term under the radical, Σx² − (Σx)²/n, is the term “C” of the three basic terms mentioned above in computing σ̂y·x.

The numerator of the second term reflects the difference between a given value of x and the mean, x̄. Thus the value of the term becomes larger as the square of the distance that xi is from the mean becomes larger. The result is to make σ̂ŷ larger when xi is farther from the mean, and to make σ̂ŷ smaller when xi is close to the mean. And when xi is at the mean, xi = x̄, then (xi − x̄)² = 0, the second term under the radical is zero, and the formula becomes the same as the large sample case.

In the confidence interval construction procedure, the net effect of the second term is to make the confidence interval estimate of Y become larger the farther xi is from the mean. This gives a confidence interval whose width varies along the regression line for the small sample case. For the large sample case, it is assumed that the regression line is positioned correctly, thus the width of the confidence interval is the same all along the line.

Confidence Interval Estimates for Large and Small Sample Case

To continue the example of the relationship between sales orders taken and time spent with the buyer introduced above, the regression line

ŷ = 48.55 + 6.38(x)

was computed. It was pointed out that the values of ŷ represent a string of point estimates of the average sales orders taken conditional on x, the time spent with the buyer. A confidence interval estimate is a vertical interval above and below the regression line.

Since σ̂y·x is estimated from the sample and n = 6, the t distribution will be used with n – 2 = 4 degrees of freedom; t4df, 95% confidence = 2.776.

(a) Confidence interval at x̄ = 19.17:

ŷ ± t σ̂ŷ; ŷ = 48.55 + 6.38 (19.17) = 170.86

= 170.86 ± (2.776) (11.61)

= 170.86 ± 32.23

= $138.63 to $203.09

(b) Confidence Interval at 28:

ŷ ± t σ̂ŷ; ŷ = 48.55 + 6.38 (28) = 227.19

= 227.19 ± (2.776) (19.17)

= 227.19 ± 53.19

= $173.97 to $280.41

(c) Confidence Interval at 12:

ŷ ± t σ̂ŷ; ŷ = 48.55 + 6.38 (12) = 125.11

= 125.11 ± (2.776) (16.98)

= 125.11 ± 47.19

= $77.97 to $172.25

95% Confidence Interval Estimate of Mean Sales Orders Taken Conditional on Time Spent with the Buyer

If the sales manager wishes to have an estimate of the average sales orders to be expected when 28 minutes are spent with the buyer, the point estimate would be ŷ = $227.19, and the 95% confidence interval estimate would be from $173.97 to $280.41. Likewise, if 12 minutes are spent with the buyer, the point estimate of sales orders would be ŷ = $125.11, and the 95% confidence interval estimate would be from $77.97 to $172.25.

Other factors affect sales orders taken besides time with the buyer. Additional variables can be evaluated with a multiple regression model.

The Coefficient of Determination, r², and Coefficient of Correlation, r

Both r² and r are measures of the strength of the relationship between two variables. Definitions and calculations are often first made in terms of r², but tests of significance are made in terms of r.

In using a linear regression line for predicting a value of ŷ for a given value of xi, we can consider that the line is a way of “explaining” some of the variation in Y as depending on x.

Since there is still some variation in yi around the regression line, the values of x have not “explained” all of the variation in values of y. The total variation in y can be expressed and represented by appropriate equations in terms of populations as:

Total variation in y = Explained Variation in y + Unexplained Variation in y

The population coefficient of determination, rho squared, ρ², is defined as the proportion of the total variation in y which is explained by x:

ρ² = proportion of variation in y explained by x

ρ² may take on only values from zero to one.
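The three confidence intervals above can also be reproduced with a short Python sketch; small rounding differences from the hand calculations are expected, since the rounded values 48.55, 6.38, and 28.45 are carried through.

```python
# Numerical check of the three 95% confidence intervals computed above.
from math import sqrt

X = [30, 20, 25, 15, 10, 15]
n, a, b, t4 = len(X), 48.55, 6.38, 2.776        # t for 4 df, 95% confidence
s_yx = 28.45                                    # standard error of the estimate (see earlier sketch)
x_bar = sum(X) / n
ss_x = sum(x * x for x in X) - sum(X) ** 2 / n  # Σx² - (Σx)²/n

for x in (19.17, 28, 12):
    y_hat = a + b * x
    se = s_yx * sqrt(1 / n + (x - x_bar) ** 2 / ss_x)   # small-sample standard error
    print(x, round(y_hat, 2), round(y_hat - t4 * se, 2), round(y_hat + t4 * se, 2))
```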

The sample coefficient of determination, r², is defined in terms of (biased) estimates of the above terms as:

r² = Σ(ŷ − ȳ)² / Σ(y − ȳ)²

This equation is often labeled as follows:

r² = Explained variation in y / Total variation in y

Although r² may take on only values between zero and one, r may be positive or negative; thus values of r may be 0 to +1 or 0 to −1. The only significance of a negative value for r is descriptive: it signifies that the slope of the regression line is downward rather than upward. If the b coefficient of the regression equation is positive, the correlation coefficient r is considered positive, and if the b coefficient is negative, the correlation coefficient r is considered negative. The (+) or (−) designation on r has nothing to do with the strength of the relationship between x and y. It only signifies the direction of the relationship. Therefore in testing r for significance, we test whether it is large enough to be considered not zero.

If r were zero, or nearly so, then knowing x is of little or no value in predicting y. Thus the proportion of the total variation in y explained by x will be nil; there is a small or zero value for r², and a small or zero value for r.

The test of hypothesis for the correlation coefficient is therefore stated as follows:

H0: ρ = 0

H1: ρ ≠ 0

Once we have entered the data in SYSTAT we can request Analysis using Regression and Two-stage Least Square.

The following window opens to specify the independent (explanatory) variable and the dependent (response) variable for our analysis.

Click OK and the following output is generated from the analysis.

Testing the Coefficient b in the Regression Equation

A second way to test the strength of the correlation between x and y is to test b, the coefficient in the linear regression equation. This test is equivalent to the direct test of r described above, although it is b which is actually tested.

The logic of the test is this: if the slope of the regression line were absolutely flat, then b in the equation would be equal to zero. The equation would then be ŷ = a, where (a) is a constant, the y intercept. Thus knowing the value of x is of no value in predicting the value of y. To test b, the statistical hypothesis is:

Ho: β = 0 and

H1: β ≠ 0, where β is the coefficient in the population regression equation.

From the SYSTAT output we see that the t value for Time Spent (beta) is 3.697 and is significant at a p-Value of 0.021. If we are testing at a CL of 95% our critical alpha would be equal to 0.05 and we would reject the null hypothesis of β = 0.

As before the equation is: Y = 48.462 + 6.385 (X)

Step 5 State statistical conclusion: Reject Ho, Accept H1

Step 6 State conclusion in terms of problem setting: the correlation coefficient is too large to be attributed to chance alone, thus the variables x and y are correlated in the population. The observed correlation coefficient is statistically significant. This is generally the conclusion desired; the r is too large to have occurred by chance alone.
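If SYSTAT is not available, the same slope test can be reproduced with scipy (assumed to be installed); the output matches the t value of 3.697 and p-value of 0.021 reported above.

```python
# Reproducing the slope test for the example data with scipy.stats.linregress.
from scipy import stats

X = [30, 20, 25, 15, 10, 15]
Y = [250, 200, 175, 125, 100, 175]

fit = stats.linregress(X, Y)
t_value = fit.slope / fit.stderr
print(round(fit.intercept, 3), round(fit.slope, 3))   # about 48.462 and 6.385
print(round(t_value, 3), round(fit.pvalue, 3))        # about 3.697 and 0.021
```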

Section 2

Predicted Scores

REGRESSION
Predicted and Residual Scores

1. Predicted and Residual Scores

2. R-square

3. Residual Diagnostics

The regression line expresses the best prediction of the dependent variable (Y), given the independent variables (X). However, nature is rarely (if ever) perfectly predictable, and usually there is substantial variation of the observed points around the fitted regression line. The deviation of a particular point from the regression line (its predicted value) is called the residual value.

Residual Variance and R-square

In a scatterplot, we have an independent or X variable, and a dependent or Y variable. Each point in


the plot represents one observation or participant, that is, the respective case and variable. The goal of
linear regression procedures is to fit a line through the points. Specifically, the program will compute a
line so that the squared deviations of the observed points from that line are minimized. Thus, this
general procedure is sometimes also referred to as least squares estimation.

For example:

The relationships in (a) and (b) are both linear, with (a) showing a strong positive relationship and (b) a negative relationship. To interpret the direction of the relationship between variables, look at the signs (plus or minus) of the regression or B coefficients. If a B coefficient is positive, then the relationship of this variable with the dependent variable is positive (e.g., the greater the independent variable the higher the dependent variable); if the B coefficient is negative then the relationship is negative.

The tight cluster of observations about the regression line estimated for either (a) or (b) results in small residual values as compared to the residual values associated with the following scatter plot.

The smaller the variability of the residual values (plots a and b) around the regression line relative to the overall variability, the better is our prediction. For example, if there is no relationship between the X and Y variables, then the ratio of the residual variability of the Y variable to the original variance is equal to 1.0. In essence, no variation in Y can be explained by X. If X and Y are perfectly related then there is no residual variance and the ratio of variance would be 0.0. All variation in Y is explained by X. If we know the value of X we can perfectly predict the value of Y.

In most cases, the ratio would fall somewhere between these extremes, that is, between 0.0 and 1.0. One minus this ratio is referred to as R-square or the coefficient of determination. This value is immediately interpretable in the following manner. If we have an R-square of 0.4 then we know that the variability of the Y values around the regression line is 1 − 0.4 times the original variance; in other words we have explained 40% of the original variability, and are left with 60% residual variability. Ideally, we would like to explain most if not all of the original variability. The R-square value is an indicator of how well the model fits the data (e.g., an R-square close to 1.0 indicates that we have accounted for almost all of the variability with the variables specified in the model).

Given irregularities in data, the regression line drawn is a compromise. How do we find a best fitting line? A reasonable method is to place a line through the points so that the vertical deviations between the points and the line (errors in prediction) are as small as possible. In other words, these deviations (absolute discrepancies, or residuals) should be small, on the average, for a good-fitting line.

As discussed in the section on linear regression, the procedure of fitting a line or curve to data such that residuals on the dependent variable are minimized in some way is called regression. Because we are minimizing vertical deviations, the regression line often appears to be more horizontal than we might place it by eye, especially when the points are fairly scattered. The regression line is not intended to pass through as many points as possible. It is for predicting the dependent variable as accurately as possible, given each value of the independent variable.

There are several ways to draw the line to minimize the deviations. One method is to minimize the sum (or mean) of the squared residuals. Using squared instead of absolute residuals gives more influence to points whose y value is farther from the

average of all y values. This makes the mathematics simpler. This method is ordinary least squares.

Residual Diagnostics

You do not need to understand the mathematics of how a line is fitted in order to use regression. You can fit a line to any x-y data. The computer doesn't care where the numbers come from. To have a model and estimates that mean something, however, you should be sure the assumptions are reasonable.

The sample of the errors in the model are the residuals—the differences between the observed and predicted values of the dependent variable. There are many diagnostics you can perform on the residuals. Here are several important ones:

The errors are normally distributed. Draw a normal probability plot (PPLOT) of the residuals.

(Figure: plot of RESIDUAL against ESTIMATE, with one large residual labeled.)

The residuals should fall approximately on a diagonal straight line in this plot. When the sample size is small the line may be quite jagged. It is difficult to tell by any method whether a small sample is from a normal population. You can also plot a histogram or stem-and-leaf diagram of the residuals to see if they are lumpy in the middle with thin, symmetric tails. SYSTAT offers tests to check normality: the Shapiro-Wilk test and the Anderson-Darling test.

The errors have constant variance. Plot the residuals against the estimated values. The following plot shows studentized residuals (STUDENT) against estimated values (ESTIMATE). Use these statistics to identify outliers in the dependent variable space. Under normal regression assumptions, they have a t distribution with (N − p − 1) degrees of freedom, where N is the total sample size and (p) is the number of predictors (including the constant). Large values (greater than 2 or 3 in absolute magnitude) indicate possible problems.

Our residuals should be arranged in a horizontal band within two or three units around 0 in this plot. Again, since there are so few observations, it is difficult to tell whether they violate this assumption in this case. There is only one particularly large residual, and it is toward the middle of the values.

The errors are independent. Several plots can be done. Examine the plot of
residuals against estimated values. Make sure that the residuals are randomly
scattered above and below the 0 horizontal and that they do not track in a snaky
way across the plot. If they look as if they track or snake across the plot rather than scattering randomly, then they may not be independent of each other. You may also want to plot residuals against
other variables, such as time, orientation, or other ways that might influence the
variability of your dependent measure. ACF PLOT in SERIES measures whether
the residuals are serially correlated. Here is an autocorrelation plot:

Autocorrelation Plot

All the bars should be within the confidence bands if each residual is not
predictable from the one preceding it, and the one preceding that, and the one
preceding that, and so on.
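As a sketch of how these residual checks might be scripted outside SYSTAT, the following Python code (scipy assumed) computes the residuals for the six-observation example and runs a Shapiro-Wilk normality test; with so few observations the checks are, as noted above, hard to read.

```python
# Minimal residual diagnostics for the example regression.
from scipy import stats

X = [30, 20, 25, 15, 10, 15]
Y = [250, 200, 175, 125, 100, 175]
fit = stats.linregress(X, Y)
residuals = [y - (fit.intercept + fit.slope * x) for x, y in zip(X, Y)]

# Normality of the errors (Shapiro-Wilk is one of the tests SYSTAT offers)
print(stats.shapiro(residuals))

# Constant variance / independence: inspect residuals against the predictor values
for x, r in sorted(zip(X, residuals)):
    print(x, round(r, 1))
```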

Testing assumptions graphically is critically important. You should never


present regression results until you have examined these plots.

Section 3

Correlation Coefficient

REGRESSION
Interpreting the Correlation Coefficient R

1. Interpreting the Correlation Coefficient

2. Assumptions

3. Limitations

4. Residual Analysis

Customarily, the degree to which two or more predictors (independent or X variables) are related to the dependent (Y) variable is expressed in the correlation coefficient R, which is the square root of R-square. In multiple regression, R can assume values between 0 and 1. To interpret the direction of the relationship between variables, look at the signs (plus or minus) of the regression or B coefficients. If a B coefficient is positive, then the relationship of this variable with the dependent variable is positive (e.g., the greater the IQ the better the grade point average); if the B coefficient is negative then the relationship is negative (e.g., the lower the class size the better the average test scores). Of course, if the B coefficient is equal to 0 then there is no relationship between the variables.

Assumptions, Limitations, Practical Considerations

• Assumption of Linearity
• Normality Assumption
• Limitations
• Multicollinearity and matrix ill-conditioning
• The importance of residual analysis
• Choice of the number of variables

Assumption of Linearity

First of all, as is evident in the name multiple linear regression, it is assumed that the relationship
between variables is linear. In practice this assumption can virtually never be confirmed; fortunately,
multiple regression procedures are not greatly affected by minor deviations from this assumption.
However, as a rule it is prudent to always look at bivariate scatterplot of the variables of interest. If

curvature in the relationships is evident, you may consider either transforming the variables, or explicitly allowing for nonlinear components.

Other methods include Exploratory Data Analysis and Data Mining Techniques, the General Stepwise Regression, and the General Linear Models.

Normality Assumption

It is assumed in multiple regression that the residuals (predicted minus observed values) are distributed normally (i.e., follow the normal distribution). Again, even though most tests (specifically the F-test) are quite robust with regard to violations of this assumption, it is always a good idea, before drawing final conclusions, to review the distributions of the major variables of interest. You can produce histograms for the residuals as well as normal probability plots, in order to inspect the distribution of the residual values.

Limitations

The major conceptual limitation of all regression techniques is that you can only ascertain relationships, but never be sure about the underlying causal mechanism. For example, you would find a strong positive relationship (correlation) between the damage that a fire does and the number of firemen involved in fighting the blaze. Do we conclude that the firemen cause the damage? Of course, the most likely explanation of this correlation is that the size of the fire (an external variable that we forgot to include in our study) caused the damage as well as the involvement of a certain number of firemen (i.e., the bigger the fire, the more firemen are called to fight the blaze). Even though this example is fairly obvious, in real correlation research, alternative causal explanations are often not considered.

Multicollinearity and Matrix Ill-Conditioning

This is a common problem in many correlation analyses. Imagine that you have two predictors (X variables) of a person's height: (1) weight in pounds and (2) weight in ounces. Obviously, our two predictors are completely redundant; weight is one and the same variable, regardless of whether it is measured in pounds or ounces. Trying to decide which one of the two measures is a better predictor of height would be rather silly; however, this is exactly what you would try to do if you were to perform a multiple regression analysis with height as the dependent (Y) variable and the two measures of weight as the independent (X) variables. When there are very many variables involved, it is often not immediately apparent that this problem exists, and it may only manifest itself after several variables have already been entered into the regression equation. Nevertheless, when this problem occurs it means that at least one of the predictor variables is (practically) completely redundant with other predictors.

The Importance of Residual Analysis

Even though most assumptions of multiple regression cannot be tested explicitly, gross violations can be detected and should be dealt with appropriately. In particular outliers (i.e., extreme cases) can seriously bias the results by "pulling" or "pushing" the regression line in a particular direction, thereby leading to biased regression coefficients. Often, excluding just a single extreme case can yield a completely different set of results.
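A small simulation makes the pounds/ounces redundancy visible. The numbers below are fabricated for illustration only; the point is that the two predictors are almost perfectly correlated, so the individual coefficients become an arbitrary split of the same effect.

```python
# Illustration of multicollinearity: two nearly redundant predictors make the
# individual coefficient estimates unstable (numpy assumed).
import numpy as np

rng = np.random.default_rng(0)
pounds = rng.normal(150, 20, 30)
ounces = pounds * 16 + rng.normal(0, 0.1, 30)      # essentially the same variable
height = 40 + 0.18 * pounds + rng.normal(0, 2, 30)

X = np.column_stack([np.ones(30), pounds, ounces])
coef, *_ = np.linalg.lstsq(X, height, rcond=None)
print(coef)                                        # pounds/ounces split is arbitrary

print(round(np.corrcoef(pounds, ounces)[0, 1], 4)) # correlation near 1 signals redundancy
```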

Section 4

Multiple Regression

REGRESSION
Multiple Regression and Multiple Correlation

1. Multiple Regression

2. Partial Correlation

3. Purposes

4. Inferences

5. Considerations

6. Curvilinearity

Multiple regression (the term was first used by Pearson, 1908) is more technically referred to as multiple linear regression. Multiple refers to the fact that there is more than one independent variable, as compared to the bivariate or simple model that contains only one independent variable. Linear refers to the relationship between the Y and X's as being a first order equation. A linear equation is an equation of the first order. In other words, an equation containing no exponent greater than 1. Examples are y = 3x and y = 29 + 6x – 0.3z. Other functions are referred to as non-linear or curvilinear. Examples are equations of second order, e.g., y = x² and y = 21 – 0.6x²; and equations of third order, e.g., y = x³ and y = 2 + 3x + 4x³, etc. The linearity assumption is discussed later under curvilinear models.
The general purpose of multiple regression is to learn more about the relationship between several
independent or predictor variables and a dependent or criterion variable. For example, a real estate
agent might record for each listing the size of the house (in square feet), the number of bedrooms, the
average income in the respective neighborhood according to census data, and a subjective rating of
appeal of the house. Once this information has been compiled for various houses it would be
interesting to see whether and how these measures relate to the price for which a house is sold. For
example, you might learn that the number of bedrooms is a better predictor of the price for which a
house sells in a particular neighborhood than how "pretty" the house is (subjective rating). You may
also detect "outliers," that is, houses that should really sell for more, given their location and
characteristics.

Personnel professionals customarily use multiple regression procedures to determine equitable


compensation. You can determine a number of factors or dimensions such as "amount of
responsibility" (Resp) or "number of people to supervise" (No_Super) that you believe to contribute to
the value of a job. The personnel analyst then usually conducts a salary survey among comparable

companies in the market, recording the salaries and respective characteristics (i.e., values on dimensions) for different positions. This information can be used in a multiple regression analysis to build a regression equation of the form:

Salary = .5*Resp + .8*No_Super

Once this so-called regression line has been determined, the analyst can now easily construct a graph of the expected (predicted) salaries and the actual salaries of job incumbents in his or her company. Thus, the analyst is able to determine which position is underpaid (below the regression line) or overpaid (above the regression line), or paid equitably.

The multiple regression model is a logical extension of the bivariate model and can be expressed as:

Y = a + b1*X1 + b2*X2 + ... + bp*Xp

where:

• Y = dependent variable

• a = y intercept

• b1 thru bp = regression coefficients for each of the respective independent variables

• X1 thru Xp = independent variables

To relate the above model in more concrete terms, consider the equation ŷ = a + b1*X1 + b2*X2, where ŷ is sales in units, X1 is price in dollars, and X2 is advertising in dollars. Such an equation could allow analysis of the relationship between the dependent variable, sales, and the two independent variables, price and advertising.

Another example would be an expansion of the bivariate equation from earlier:

Y = a + b*X

• Y = GPA

• X = IQ

If in addition to IQ we had additional predictors of achievement (e.g., Motivation, Self-discipline) we could construct a linear equation containing all those variables. In general then, multiple regression procedures will estimate a linear equation of the form:

Y = a + b1*X1 + b2*X2 + ... + bp*Xp

The mathematics involved in estimating the parameters, i.e., a, b1, b2 as estimates for α, β1, and β2, of a multiple regression become extremely time consuming and complex when the regression equation includes more than two variables. There are several methods of estimating such equations, the most frequently employed method being matrix algebra. The least squares logic, as presented under bivariate regression, still applies. Statistical software programs are available to solve the more complex regression problem.

Unique Prediction and Partial Correlation

Note that in the equation, regression coefficients (or B coefficients) represent the independent contributions of each independent variable to the prediction of the dependent variable. Another way to express this fact is to say that, for example, variable X1 is correlated with the Y variable, after controlling for all other independent variables. This type of correlation is also referred to as a partial correlation (this term was first used by Yule, 1907). Perhaps the following example will clarify this issue. You would probably find a significant negative correlation between hair length and height in the population (i.e., short people have longer hair). At first this may seem odd; however, if we were to add the variable Gender into the multiple regression equation, this correlation would probably disappear. This is because women, on the average, have longer hair than men; they also are shorter on the average than men. Thus, after we remove this gender difference by entering Gender into the equation, the relationship between hair length and height disappears because hair length does not make any unique

contribution to the prediction of height, above and beyond what it shares in the prediction with variable Gender. Put another way, after controlling for the variable Gender, the partial correlation between hair length and height is zero.

Three Purposes of Multiple Regression

1. To estimate the regression coefficients (b's) for each of the independent variables. The coefficients can then be employed for prediction of the dependent variable.

2. To estimate the error involved in using the regression equation for estimation.

3. To estimate the proportion of variance in the dependent variable that is "explained" by the independent variables, in other words, how comprehensive the equation is.

The first purpose, estimation of the regression coefficients, is accomplished by using the previously explained method of least squares. The second purpose, estimation of the standard error of the regression, is accomplished in much the same manner as for bivariate regression. Again, the estimate of error is simply the amount that the estimates from the regression equation differ from the actual observations in terms of standard deviations. The third purpose, estimating the proportion of explained variance, is accomplished by computing the multiple coefficient of determination. The multiple coefficient of determination is denoted in upper case (R²) versus the lower case (r²) for bivariate analysis.

As an example, return to sales as a function of price and advertising. For illustrative purposes, only 10 observations will be used, but in an actual study a minimum of 20 observations are required to accurately estimate the parameters of the model.

A random sample of data from the regional divisions of company records:

We will use our statistical program to calculate this equation. Based on least squares calculations the estimated model is:

Y = 1818.90 – 63.84X1 + 0.56X2

Note that the parameters have been estimated to be:

a = 1818.90
b1 = -63.84
b2 = 0.56

The standard error of the regression, in this case being useful only for specifying the confidence interval at the means of the predictors because of the small sample size, is computed to be 159.19.

To complete the three pieces of information sought, the multiple coefficient of determination is computed to be: R² = 0.8296 = 82.96%

Multiple correlation is an extremely useful concept and the following section explains it in greater detail.
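The same kind of two-predictor model can be estimated with ordinary least squares in numpy. The price and advertising figures below are invented stand-ins (the company data table above is shown as an image and is not reproduced here), so the coefficients will not match 1818.90, -63.84, and 0.56; the sketch only shows the mechanics of estimating a, b1, b2 and R².

```python
# Two-predictor ordinary least squares of the form Y = a + b1*X1 + b2*X2 (numpy assumed).
import numpy as np

price = np.array([10, 11, 12, 10, 13, 14, 12, 11, 13, 15], dtype=float)                  # X1 (made up)
advertising = np.array([500, 650, 600, 700, 550, 800, 720, 640, 760, 690], dtype=float)  # X2 (made up)
sales = 1800 - 60 * price + 0.5 * advertising + np.random.default_rng(1).normal(0, 40, 10)

X = np.column_stack([np.ones(len(price)), price, advertising])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)   # coef = [a, b1, b2]

fitted = X @ coef
r2 = 1 - np.sum((sales - fitted) ** 2) / np.sum((sales - sales.mean()) ** 2)
print(np.round(coef, 2), round(float(r2), 3))
```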

Multiple Correlation

Multiple correlation is the degree or strength of the relationship between more than two variables. In the previous unit simple (or bivariate, or zero order) correlation was examined, i.e., the correlation between x and y. Simple correlation is usually denoted r although a more satisfactory notation would be rxy. The subscripts tell which variables pertain to the coefficient.

In the case of multiple correlation, the subscripts become more important than for the bivariate case. In the example above, the multiple correlation coefficient would be Ry·12 = 0.9108. For purposes here, the dot in the subscripts can be thought of as separating the dependent and independent variables. The y subscript stands for the dependent variable and the (12) part of the subscript indicates that the correlation involves both of the independent variables. As with bivariate correlation, the R² is easier to interpret than the R value itself. The R² value is the portion of the variance in the dependent variable that is "explained" by the independent variables.

In the following figure, R² would be the shaded area.

Multiple correlation with Venn Diagram

The unshaded part of Y is unrelated to either X1 or X2. The unshaded area is equal to 1 − R² and is often denoted k². k² is appropriately named the coefficient of non-determination. In this case k² = 1 − 0.8296 = 0.1704.

It is appropriate at this time to introduce some of the formulas for correlation and regression, and very similar formulas are useful in studying analysis of variance. Recall from the discussion of bivariate correlation that SSt = SSreg + SSres, where:

SSt = Total sum of squares

SSreg = Sum of squares due to the regression

SSres = Sum of squares unexplained by the regression

The proportion or percentage interpretation of R² becomes clearer if a sum of squares formula is used:

R² = SSreg / SSt

It can also be shown that the previously computed F statistic has an alternative formula in terms of R²:

F obs = [R² / (k − 1)] / [(1 − R²) / (N − k)]

where N and k are the sample size and number of coefficients in the equation respectively. Computer programs use the F and t statistics in very specific ways and this is the topic of the next section.

183
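The sum-of-squares identity and the F formula above translate directly into a few lines of code. This sketch assumes the observed values, the fitted values from the regression, and the number of coefficients are already in hand (for example, from the sketch in the previous section); the function name and inputs are illustrative only.

    import numpy as np

    def r_squared_and_f(y, y_hat, k):
        """R2 and the observed F from SSt = SSreg + SSres.
        k is the number of coefficients in the equation, including the intercept."""
        n = len(y)
        ss_t   = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
        ss_res = np.sum((y - y_hat) ** 2)        # unexplained by the regression
        ss_reg = ss_t - ss_res                   # explained by the regression
        r2 = ss_reg / ss_t
        f_obs = (r2 / (k - 1)) / ((1 - r2) / (n - k))
        return r2, f_obs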
Inferences in Multiple Regression

The inferential aspects of multiple regression are directly analogous to inferences in the bivariate case, and the same is true for multiple and simple correlation. That is to say, r approximates ρ, a approximates α, and the b coefficients approximate the population β coefficients.

Significance tests are performed just as in the bivariate case. Computer programs operate in slightly different fashions, however. Each program is somewhat different from the next, and it is always wise to consult the specific manual for each program. In general, computer programs use the t statistic to test the significance of the regression coefficients (b) and the correlation coefficient (r). The F statistic is generally, but not always, reserved for testing the significance of the regression equation as a whole. That is to say, do the X variables explain a statistically significant portion of the variance in Y?

With respect to the regression coefficient (b), the available literature does not explain very well the statistic known as beta (β). This β may not refer to the population regression weight estimated by the statistic, but rather to a standardized form of b. Standardized, or normalized, refers to the transformation of the raw data to standard scores before the coefficients are computed. The reason for using beta weights rather than the more common regression weights is to get an idea of the relative importance of the independent variables in explaining the dependent variable. When the independent variables are in different units, e.g., X1 might be in dollars and X2 in tons, the regression coefficients will not reflect relative importance. When the raw data are standardized, however, relative importance is reflected by the size of the beta coefficients. β's can be translated to b's with the formula:

bi = βi (sy / sxi)

where:

sy = standard deviation of Y
sxi = standard deviation of the specific independent variable under consideration.

Considerations in Multiple Regression

There are several considerations in multiple regression that can affect interpretation.

The first consideration is the correctness of specification of the equation. The equation must make good common sense and fit with available theory. Good common sense refers to the use of a priori logic in specifying independent variables that are logically related to the dependent variable. Theory can often help in specifying the proper variables for an equation. For instance, using consumption data for the dependent variable and income data as an independent variable, the resultant regression coefficient should have a positive sign. If the sign were negative, the assumption is that the data were not accurate or the sample was somehow biased. Specification error could also take the form of excluding a variable that is relevant to the regression or including an irrelevant variable. Again, theory provides the underpinnings of these decisions. A related and important point is that when R2 is high, i.e., greater than 0.80, there is usually limited usefulness in adding new variables to an equation. Usually any new variable is redundant with an already included variable. This redundancy is further discussed below under multicollinearity. As a rule of thumb, predictive equations should contain 5 or fewer variables.
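To make the beta-weight idea concrete, the sketch below fits the regression on standardized scores to obtain the betas and then converts them back to raw-score b's with the formula given above (bi = βi · sy / sxi). It is only an illustration; the helper name and inputs are assumptions, not part of the text.

    import numpy as np

    def beta_weights(y, xs):
        """Fit the regression on standardized variables and return the beta weights.
        xs is a list of 1-D arrays, one per independent variable."""
        zy = (y - y.mean()) / y.std(ddof=1)
        zx = [(x - x.mean()) / x.std(ddof=1) for x in xs]
        Z = np.column_stack([np.ones_like(zy)] + zx)
        coef, _, _, _ = np.linalg.lstsq(Z, zy, rcond=None)
        betas = coef[1:]                                   # drop the (near-zero) intercept
        # Convert each beta back to a raw-score b:  b_i = beta_i * (s_y / s_xi)
        bs = [beta * (y.std(ddof=1) / x.std(ddof=1)) for beta, x in zip(betas, xs)]
        return betas, bs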
The second consideration is multicollinearity. Multicollinearity refers to the inter-correlation of the independent variables. Suppose there are two independent variables, X1 and X2, and the simple correlation between these variables is 0.60. As a Venn diagram, they would appear as:

The shaded area in the above diagram indicates the overlap or correlation between the two X variables. Using these two variables as regressors on Y, the resultant multiple correlation coefficient is 0.05. This situation is depicted in the Venn diagram below:

The question now is how to assign the beta weights (or regression weights) to the independent variables. In other words, how should the overlap of X1 and X2 be proportioned when explaining Y?

There is not one satisfactory answer to this question that will apply in all situations. It is also difficult to set a cutoff point for deciding when two variables are collinear. One practical check is to reduce (or increase) the number of observations by about 10 percent and rerun the equation. If the beta weights (or regression weights) change significantly or reverse their sign, collinearity is present. The solution then is to eliminate the variable(s) causing the problem. Which variable to eliminate is a subjective choice. Further symptoms of, and treatments for, collinearity are found in the many available regression texts.

A third important consideration is the equality of variance, or homoscedasticity, assumption. This assumption was discussed under the bivariate case and there is little difference for the multivariate case. One important difference is that the F test is not appropriate: the F test is appropriate for testing the equality of only two variances, so a regression with three variables would render the F test inappropriate.

In a multiple regression, heteroscedasticity is detected when the error term (e) is correlated with an independent variable(s). Most computer programs offer a graphic option for plotting the error term against each of the independent variables. The resultant graph should be random rather than showing an increasing or decreasing pattern.

To detect heteroscedasticity, it is also useful to list the error term and each independent variable from the smallest value to the largest value. If e increases as Xi increases, then heteroscedasticity is most likely a problem, although autocorrelation, as defined below, can also be the culprit.

The cure for heteroscedasticity is to divide each variable in the equation by the X variable that is causing the problem. This has the effect of standardizing the variance and hence eliminating the problem.

The fourth consideration is a frequent problem in multiple regression studies known as autocorrelation. Actually, the technical name for this problem is first-order autoregression, but it is commonly referred to as autocorrelation or serial correlation. It occurs only in time series data, and is defined as a systematic correlation between successive observations. Serial correlation is detected by examining scatter plots of the error terms. If the error terms appear as a linear function (rather than a random scatter), autocorrelation is most likely a problem. First-order autocorrelation can also be detected with the Durbin-Watson statistic or the Von-Neumann K ratio.
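Both kinds of screening described here, examining the error terms against each X for heteroscedasticity and checking successive errors for serial correlation, can be scripted in a few lines. The sketch below assumes the residuals from a fitted equation are available in time order; it is a rough screen, not a formal test.

    import numpy as np

    def heteroscedasticity_screen(resid, x):
        """Rough screen: correlation between the spread of the errors and an X variable.
        Values far from zero suggest the error variance changes with X."""
        return np.corrcoef(np.abs(resid), x)[0, 1]

    def durbin_watson(resid):
        """Durbin-Watson statistic for first-order autocorrelation in time-ordered errors.
        Values near 2 suggest no serial correlation; values near 0 or 4 suggest trouble."""
        diff = np.diff(resid)
        return np.sum(diff**2) / np.sum(resid**2)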
It would be unnecessarily burdensome to discuss these statistics here, as it is quite effective to simply examine a scatter plot of the error term.

The problem of autocorrelation is usually caused by a variable omitted from the equation. It can also be caused by erroneous specification of the functional form of the equation. The problem of specifying functional form is discussed briefly in the next section.

Curvilinearity

Thus far the assumptions outlined for bivariate regression have sufficed, with the exception of multicollinearity, which cannot exist in the bivariate case. One important assumption that can be relaxed is that of linearity. Surprisingly, most models conform well to the linearity assumption. On the other hand, a linear model is often used when there is little available evidence of the theoretical form of the relationship, or when a linear approximation is a sufficiently precise estimate of a complex form. The form or combinatorial rule applied to an equation is usually referred to as functional form. There are several ways of relating dependent and independent variables to make a function linear. Examples would be logarithmic, semi-logarithmic, and multiplicative models. The important element here is to recognize that a curvilinear model sometimes produces a better fit to the data than a linear model. A combination of theory, logic, and data analysis will indicate when such models are appropriate.

CAVEAT

Multiple regression is a seductive technique: "plug in" as many predictor variables as you can think of and usually at least a few of them will come out significant. This is because you are capitalizing on chance when you simply include as many variables as you can think of as predictors of some other variable of interest. This problem is compounded when, in addition, the number of observations is relatively low. Intuitively, it is clear that you can hardly draw conclusions from an analysis of 100 questionnaire items based on 10 respondents. Most authors recommend that you should have at least 10 to 20 times as many observations (cases, respondents) as you have variables; otherwise the estimates of the regression line are probably very unstable and unlikely to replicate if you were to conduct the study again.
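As an illustration of the functional-form point in the Curvilinearity discussion, a multiplicative model can be made linear by taking logarithms and then fitted with the same least-squares machinery; comparing its R2 with that of the straight linear model is one simple way to let theory, logic, and the data decide. The variable names below are hypothetical.

    import numpy as np

    # Multiplicative model Y = c * X1^b1 * X2^b2 becomes linear in logs:
    #   ln(Y) = ln(c) + b1*ln(X1) + b2*ln(X2)
    def fit_log_log(y, x1, x2):
        X = np.column_stack([np.ones_like(x1), np.log(x1), np.log(x2)])
        coef, _, _, _ = np.linalg.lstsq(X, np.log(y), rcond=None)
        fitted = X @ coef
        ss_res = np.sum((np.log(y) - fitted) ** 2)
        ss_tot = np.sum((np.log(y) - np.log(y).mean()) ** 2)
        return coef, 1 - ss_res / ss_tot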
Chapter 6

What do we know?

Review of the sections in a research report coupled with the techniques and procedures covered in the text. Formulas for the procedures covered are provided for reference.
Section 1

What do we know?

WHAT DO WE KNOW

1. Research Process
2. Value of Information

What do we know?

In the introduction we started with a basic question: "How do we know?" One of the befuddling aspects of statistics is that we never really "KNOW", at least not in the sense of being 100% certain, with no possible way this can be wrong. As we have learned, by the very nature of probability we can approach absolute certainty but we can never reach a state of absolute certainty. The concept of probability without certainty is one that causes many to question the use of statistics. Yet in a practical sense we all use statistics. It may not be in the formal sense we have been using in this text, but we do use our beliefs about the probability of an event or object to help us make decisions every day.

Every day we use our built-in statistical calculator to make decisions. We estimate the probability of a price/object/offer difference and the chance that this one price/object/offer is outside the normal range of variations we have observed. This is the essence of statistical inference. We take data from a sample to represent the general population. We transform the data from its raw form by sorting, describing and analyzing it. This allows us to make inferences about the world at large (the population represented by our sample).

Review of the Research Process

Managers need information in order to introduce products and services that create value in the mind of
the customer. But the perception of value is a subjective one, and what customers value this year may
be quite different from what they value next year. As such, the attributes that create value cannot simply
be deduced from common knowledge. Rather, data must be collected and analyzed. The goal of
research is to provide the facts and direction that managers need to make their most important
decisions. The value of the information provided is dependent on how well one effectively and
efficiently executes the research process.

188
Scientific procedures involve the use of models to describe, analyze, and make predictions. A model can be a well-defined set of descriptions and procedures like the Product Life Cycle in marketing, or it can be a scaled down analog of the real thing, such as an engineer's scale model of a car.

Models that are useful and valuable in managing business operations are broadly termed "statistical models". A large number of these have been developed to assist researchers in a variety of fields, such as agriculture, psychology, education, communication, and military tactics, as well as in business. Only the professional statistician would be expected to understand the full range of such statistical models.

The statistical methods presented in this text have been chosen to cover a variety of problems generally encountered in business. The reader is encouraged to seek out additional readings for detailed coverage of each model and additional statistical methods beyond the scope of this text.

The approach to learning models is to emphasize the similarity in logical structure among various models, and to stress understanding of the assumptions inherent in each model. This makes it possible to make sense of a seeming quagmire of statistical methods, and to determine clearly and logically when it is appropriate to use a specific statistical model. As a manager reading research reports, it is then possible to determine whether the correct techniques were utilized.

A business manager needs to understand and address the limitations of statistical models, emphasizing what the techniques do not say, rather than simply how to properly interpret conclusions from statistical tests.

Finally, a business manager is not expected to be a statistician. The objective is to properly understand and interpret results from statistical models: a manager who can ask the right questions of the researcher or statistician in order to evaluate and apply results.

A major distinction between research and everyday observation is that research is planned in advance. Based on a theory or hunch, researchers develop research questions and then plan what, when, where and how to observe in order to answer the questions.

What (or whom) to observe: the population. When a population is large, researchers often plan to observe only a sample (i.e., a subset of the population). Planning how to draw an adequate sample is, of course, critical in conducting valid research.

When the observations will be made (morning, night, ...). Researchers realize that the timing of their observations may affect the results of their investigations.

Where to make the observations. For instance, will the observations be made in a quiet room or in a busy shopping mall?

How to observe. Use an existing questionnaire or develop a new survey instrument. For instance, researchers might build or adopt existing interviews, questionnaires, personality scales, etc., to use in making observations.

The observations that researchers make result in data. The data might be the brands participants plan to purchase, or the data might be respondent scores on a scale that measures preference. In this context, variables are things that we measure, control, or manipulate in research. The participants (respondents) together with the variables represent our data.

Why is understanding necessary if prediction is okay without it? Theories in business are generally not very highly refined and represent a great deal of abstraction from the real world situation. The use of statistics does not overcome these deficiencies; it may even fool some people by making them think they understand more than they do. It is essential to be constantly aware of what is known and what is not known.
The Value of Information

Information can be useful, but what determines its real value to the organization?
In general, the value of information is determined by:

1. The validity and reliability of the information.

2. The level of indecisiveness that would exist without the information.

3. The cost of the information in terms of time and money.

4. The ability and willingness to act on the information.

To maximize the value of information obtained, those who use information need to
understand the research process and its limitations.

190
Section 2

Research Report Format

REPORT FORMAT

1. Executive Summary
2. Purpose
3. Objectives
4. Methodology
5. Findings
6. Conclusions and Recommendations

FORMAT FOR A RESEARCH REPORT

Title Page

The title page should include the title of the report along with the name(s) of the client or organization for whom the report is written. Also included on the title page should be the name(s) of the author(s) of the report along with all pertinent information about them.

Table of Contents

The table of contents lists the information contained in the report in the order in which it will be found. All major topics of interest should be listed.

Executive Summary

The executive summary should be a one to two page overview of the information contained in the research report. It should give the reader an easy reference, in very brief form, to the important information contained in the report and explained in more detail in the body of the report. People attending a presentation of research or reading the report will use this section as a reference during presentations and as a synopsis of the research done.

A one to two page summary of the entire report, focusing on Findings, Conclusions, Recommendations.
Background and Purpose of Research

• Problem/opportunity

• Background & context (summary of background & exploratory research)

The introduction should contain a brief overview of the problem being addressed and the background information needed for the reader to understand the work being done and the reasoning behind it. After reading the introduction, the reader should know exactly what the report is about, why the research was conducted, and how this research adds to the knowledge that the reader may have about the topic.

Research Objectives

• Research objectives (short sentences &/or bullet points)

Research Methodology

• Sampling

• Method of data collection

• Dates

• Estimate of random sampling error

• Research limitations

Secondary Research:

This section will contain all of the information that was collected through review of existing information. The importance of the secondary information as it pertains to the problem being researched must be made clear to the reader. Conclusions should be drawn in a logical fashion, and insight into how these conclusions will be used throughout the rest of the research agenda should be provided.

Qualitative Research (if used):

This section should contain all information regarding any interviews or focus groups that were conducted as part of the research project. This section should begin with an explanation of why this research is needed or beneficial. Other information provided should include:

• An overview of the issues that were included in this research

• Why these issues were salient

• How the discussion guide was developed

• A description (not identification) of the participants

• Discussion of the information collected (using quotes to highlight important points)

• Conclusions based on the collected information

• Clear explanation of how the conclusions are based on the research done

• How these conclusions will contribute to the rest of the research project

Experimentation (if used):

There are many things that must be considered in order for an experiment to be a useful part of any research agenda. Once again, the discussion should begin with why this research is deemed to be important to the overall research agenda being followed. The following topics must be included if an experiment was used:

• Identification and description of the variables included in the experiment

• Clear statement of the hypothesized relationships between or among the variables
• Explanation of how the variables were measured

• Discussion of reliability and validity of the measurements

• Clear explanation of the treatment being used

• Conditions under which the experiment was conducted

• Description (not identification) of the subjects

• Description of data collection

• Analysis of data, including details of procedures used and statistical significance

• Conclusions clearly based on data analysis

• How these conclusions will contribute to the rest of the research project

Observation (if used):

If observation was a part of the research project, you will need to explain several things to the reader or attendee at your presentation, starting with why this method is appropriate for your research goals. In addition, the following topics should all be part of the final report:

• Explanation of why observation was appropriate

• Location and conditions under which observation was conducted

• Description of the population observed

• The recording methods used

• Methods used to interpret observed behaviors

• Conclusions drawn from observation

• Explanation of bases for those conclusions

• How these conclusions will contribute to the rest of the research project

Survey Research

This is the section that should pull together all of the other issues that were identified in the research steps conducted previously. The connections to the issues and constructs identified earlier should be made again here so that the reader can easily see the foundations that are being used. Many issues will have to be addressed in this section regarding how the survey was developed and how it was administered. Topics discussed in this section should include:

• Identification of all issues included on the survey

• Explanation of the importance of the selected issues to the project

• Development of the survey questions and wording

• Sources of survey questions (existing scales or newly created)

• Description of population of interest

• Explanation of target population appropriateness

• Determination of sample size needed

• Sampling procedures (random or convenience)

• Determination of the sample population

• Method of survey distribution

Data Analysis

In this section, the reader should find a brief overview of the methods that were utilized in the research, the reasons that those methods were appropriate for the research problem, and an explanation of how the outcomes of those methods can be understood and interpreted.
It is important to remember that the people reading your report or listening to your presentation may not be familiar with the analysis methods being used. You must present the methods in such a way that anyone interested in your research will be able to understand what was done and why it was done. This section should include the following:

• Overview of analysis methods used

• Justification for methods chosen

• Outcomes of analysis

• Significance of results (statistical and otherwise)

Detailed Findings

• Results for each question

• Tables

• Graphs

• Cross-tabulations

• Text summary of findings

Findings

The findings are the actual results of your research. Your findings should consist of a detailed presentation of your interpretation of the statistics found relating to the study itself and analysis of the resulting data collection. The judicious use of figures, tables and graphs is encouraged when it is helpful to allow the reader to more easily understand the work being presented. The findings section should include the following:

• Findings based only on results of the research, not speculation

• In-depth explanation of all major findings

• Clear presentation of support for the findings

Limitations:

Recognize that even the best marketing research work is not perfect and is open to questioning. In this section, briefly discuss the factors that may have influenced your findings but were outside of your control. Some of the limitations may be time constraints, budget constraints, market changes, certain procedural errors, and other events. Admit that your research is not perfect, but discuss the degree of accuracy with which your results can be accepted. In this section, suggestions can be offered to correct these limitations in future research.

Conclusions and Recommendations

• Summary of conclusions

• Recommendations (i.e., solve the problem/opportunity)

• Future research suggestions (if any)

Conclusions are broad generalizations that focus on addressing the research questions for which the project was conducted. Recommendations are your choices for strategies or tactics based on the conclusions that you have drawn. Quite often authors are tempted to speculate on outcomes that cannot be supported by the research findings. Do not draw any conclusions or make any recommendations that your research cannot clearly support.

References

This section should be a listing of all existing information sources used in the research project. It is important to allow the reader to see all of the sources used
and enable the reader to further explore those sources to verify the information
presented.

Appendices

• Copy of questionnaire (if used)

• Supporting material for background & exploratory research

• Anything else which is relevant, but not appropriate for the main body of
the report

This section should include all supporting information from the research project
that was not included in the body of the report. You should include surveys,
complex statistical calculations, certain detailed tables and other such information
in an appendix. The information presented in this section is important to support
the work presented in the body of the report but would make it more difficult to
read and understand if presented within the body of the report.

195
Section 3

Techniques and Procedures

STATISTICAL TECHNIQUES

1. Decision Rules
2. Methods for Summarizing

Statistical Techniques and Procedures

This section of the text provides a summary of the techniques and procedures covered. A general guide on when to use each technique is provided as a reference.

Statistics is about this whole process of using the scientific method to answer questions and make decisions. Effective decision-making involves correctly designing studies, collecting unbiased data, describing the data with numbers and graphs, analyzing the data to draw inferences, and reaching conclusions based on the transformation of data into information.
Decision Rules: Choosing Among Statistical Techniques

Step 1: Was the sample(s) drawn according to the requirements for a random sample? If YES: go to Step 2. If NO: you cannot use any statistical model that is based on probability theory. However, one can describe the data, i.e., its mean, standard deviation, quartiles, etc. One might also be able to fit a line to a set of data by the method of least squares, but could not construct a meaningful confidence interval.

Step 2: What level of measure was attained?

Interval or Ratio:
1. Normal (Z)
2. Student's (t)
3. F distribution
4. Pearson's Product Moment Correlation
5. Normal approx. to Binomial

Nominal:
1. Chi Square
2. Binomial (itself)

Ordinal:
1. Mann-Whitney
2. Wilcoxon
3. Spearman Rank Correlation

The above are the minimum levels of measure necessary for each test. Thus, the tests associated with the interval level of measure may not be used with nominal or ordinal levels of measure. The Chi Square test, which needs only the nominal level of measure, can be used on all levels of measure: ordinal, interval, and ratio. The Mann-Whitney test, which requires the ordinal level of measure, can be used on interval or ratio data as well. In fact, the Mann-Whitney test is a good alternative to the two-means Z or t test, especially if homogeneity of variance is questionable. Likewise, the Wilcoxon and Spearman tests may be used on interval or ratio data when other conditions for the tests have been met.

Step 3: What is the number of samples involved?

One Random Sample:
1. Normal (Z)
2. Student's (t)
3. Binomial (itself)
4. Normal approx. to Binomial
5. Pearson's Product Moment Correlation
6. Chi Square
7. Spearman Rank Correlation

One Random Sample Consisting of Matched Pairs (Before-After):
1. Wilcoxon
2. Special Case Student's (t)

Two Independent Random Samples:
1. Normal (Z)
2. Student's (t)
3. Normal approx. to Binomial
4. F distribution
5. Chi Square
6. Mann-Whitney

More Than Two Independent Random Samples:
1. Chi Square
2. Analysis of Variance

(The Step 3 groupings are restated as a short code checklist below.)
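For readers who like a programmatic checklist, the Step 1 gate and the Step 3 groupings can be captured in a small lookup. This is only a convenience sketch in Python; the names are arbitrary and the lists simply restate the groupings above.

    # Candidate techniques from Step 3, grouped by the number of samples involved.
    TESTS_BY_SAMPLES = {
        "one random sample": [
            "Normal (Z)", "Student's t", "Binomial (itself)",
            "Normal approx. to Binomial", "Pearson's Product Moment Correlation",
            "Chi Square", "Spearman Rank Correlation",
        ],
        "matched pairs (before-after)": ["Wilcoxon", "Special Case Student's t"],
        "two independent random samples": [
            "Normal (Z)", "Student's t", "Normal approx. to Binomial",
            "F distribution", "Chi Square", "Mann-Whitney",
        ],
        "more than two independent random samples": ["Chi Square", "Analysis of Variance"],
    }

    def candidate_tests(samples, random_sample=True):
        """Step 1 gate plus the Step 3 groupings: without a random sample,
        only descriptive summaries are appropriate."""
        if not random_sample:
            return []
        return TESTS_BY_SAMPLES[samples]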
Step 4: Is the normal population assumption required?

Yes for the parametric tests: Z, t, F, and Pearson product moment correlation (variables X and Y). Samples must have been drawn from a normal population or be large. Note: the normal approximation to the binomial requires np and nq ≥ 5 in order to approximate the binomial distribution with a normal distribution.

No for the non-parametric tests: Chi Square, Mann-Whitney, Wilcoxon, Spearman rank correlation, and the Binomial itself.

Step 5: Is the equal variance assumption required?

Yes for the Z and t two-means tests and for the Pearson Product Moment Correlation (homoscedasticity, i.e., the variance of Yi about the regression line is the same for all values of X).

No for the two-proportions test, Chi Square, Mann-Whitney, Wilcoxon, and Spearman Rank Correlation.

Step 6: Check the other assumptions required for the different models (tests) to have valid application:

1. (Z) versus (t)

   a. Use Z when σ is known, or when n ≥ 30.

   b. Use t when σ is unknown and n < 30.

2. Normal approximation to the binomial

   a. One sample case: np and nq ≥ 5.

   b. Two sample case: (n1 + n2)p and (n1 + n2)q ≥ 5.

3. Chi Square test

   a. No more than 20% of cells may have fe < 5, and none may have fe < 1. (All cells should have fe > 5 if possible.)

4. For the Mann-Whitney test and Spearman Rank Correlation, ties in ranks receive the average of the tied ranks.

5. In the Wilcoxon matched pairs signed ranks test, differences of zero in (response 2) minus (response 1) are dropped from the analysis and n is reduced by one for each such pair.

Step 7: Lastly, if a random sample has been drawn but none of the above tests meets the circumstances of the problem, refer to an advanced text on statistical methods.

(Two of the numeric checks above are illustrated in the code sketch that follows.)
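Two of the numeric conditions above, the np/nq rule for the normal approximation (Steps 4 and 6) and the expected-frequency rule for the Chi Square test, are easy to script as quick checks; this sketch covers only those two.

    def normal_approx_ok(n, p, two_sample=False, n2=0):
        """Normal approximation to the binomial requires np >= 5 and nq >= 5
        (n1 + n2 is used in place of n for the two-sample case)."""
        total = n + n2 if two_sample else n
        return total * p >= 5 and total * (1 - p) >= 5

    def chi_square_cells_ok(expected):
        """No more than 20% of cells may have an expected frequency below 5, and none below 1."""
        small = sum(1 for fe in expected if fe < 5)
        return small <= 0.2 * len(expected) and all(fe >= 1 for fe in expected)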
Methods for Summarizing Data

199
Section 4

Statistical Formulas

Numerical Descriptive Techniques

Population mean: μ = (Σ xi) / N

Sample mean: x̄ = (Σ xi) / n

Range: Largest observation - Smallest observation

Population variance: σ² = Σ (xi - μ)² / N

Sample variance: s² = Σ (xi - x̄)² / (n - 1)

Population standard deviation: σ = √σ²

Sample standard deviation: s = √s²

Population covariance: σxy = Σ (xi - μx)(yi - μy) / N

Sample covariance: sxy = Σ (xi - x̄)(yi - ȳ) / (n - 1)
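These descriptive formulas translate one-for-one into code. The sketch below uses numpy's ddof argument to switch between the population versions (divide by N) and the sample versions (divide by n - 1); the two small arrays are made-up illustrations.

    import numpy as np

    x = np.array([4.0, 7.0, 6.0, 9.0, 5.0])
    y = np.array([2.0, 5.0, 4.0, 8.0, 3.0])

    mean          = x.mean()                      # same formula for mu and x-bar
    pop_variance  = x.var(ddof=0)                 # divide by N
    samp_variance = x.var(ddof=1)                 # divide by n - 1
    pop_std       = x.std(ddof=0)
    samp_std      = x.std(ddof=1)
    data_range    = x.max() - x.min()
    pop_cov       = np.cov(x, y, ddof=0)[0, 1]    # population covariance
    samp_cov      = np.cov(x, y, ddof=1)[0, 1]    # sample covariance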
Population coefficient of correlation: ρ = σxy / (σx σy)

Sample coefficient of correlation: r = sxy / (sx sy)

Coefficient of determination: R2 = r2

Slope coefficient: b1 = sxy / sx²

y-intercept: b0 = ȳ - b1 x̄

Probability

Conditional probability: P(A|B) = P(A and B) / P(B)

Complement rule: P(not A) = 1 - P(A)

Multiplication rule: P(A and B) = P(A|B) P(B)

Addition rule: P(A or B) = P(A) + P(B) - P(A and B)

Bayes' Law Formula: P(Bi|A) = P(Bi) P(A|Bi) / [P(B1) P(A|B1) + P(B2) P(A|B2) + ... + P(Bk) P(A|Bk)]

Random Variables and Discrete Probability Distributions

Expected value (mean): E(X) = μ = Σ x P(x)

Variance: V(X) = σ² = Σ (x - μ)² P(x)

Standard deviation: σ = √V(X)

Covariance: COV(X, Y) = σxy = Σ (x - μx)(y - μy) P(x, y)
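The probability rules, Bayes' Law, and the discrete-distribution formulas above can be verified with simple arithmetic; every number in the sketch below is invented purely for illustration.

    # Conditional probability, multiplication, addition and complement rules.
    p_a, p_b, p_a_and_b = 0.30, 0.50, 0.15
    p_a_given_b = p_a_and_b / p_b                     # P(A|B)
    p_a_or_b    = p_a + p_b - p_a_and_b               # addition rule
    p_not_a     = 1 - p_a                             # complement rule

    # Bayes' Law with two mutually exclusive, exhaustive events B1 and B2.
    p_b1, p_b2 = 0.40, 0.60
    p_a_given_b1, p_a_given_b2 = 0.10, 0.05
    p_b1_given_a = (p_a_given_b1 * p_b1) / (p_a_given_b1 * p_b1 + p_a_given_b2 * p_b2)

    # Expected value and variance of a discrete random variable.
    values = [0, 1, 2, 3]
    probs  = [0.1, 0.3, 0.4, 0.2]
    ev  = sum(v * p for v, p in zip(values, probs))
    var = sum((v - ev) ** 2 * p for v, p in zip(values, probs))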
Coefficient of Correlation: ρ = COV(X, Y) / (σx σy)

Laws of expected value

1. E(c) = c
2. E(X + c) = E(X) + c
3. E(cX) = cE(X)

Laws of variance

1. V(c) = 0
2. V(X + c) = V(X)
3. V(cX) = c²V(X)

Laws of expected value and variance of the sum of two variables

1. E(X + Y) = E(X) + E(Y)
2. V(X + Y) = V(X) + V(Y) + 2COV(X, Y)

Laws of expected value and variance for the sum of more than two variables

1. E(X1 + X2 + ... + Xk) = E(X1) + E(X2) + ... + E(Xk)
2. V(X1 + X2 + ... + Xk) = V(X1) + V(X2) + ... + V(Xk) if the variables are independent

Mean and variance of a portfolio of two stocks

E(Rp) = w1E(R1) + w2E(R2)

V(Rp) = w1²V(R1) + w2²V(R2) + 2w1w2COV(R1, R2)

Mean and variance of a portfolio of k stocks

E(Rp) = Σ wiE(Ri)

V(Rp) = Σ wi²σi² + 2 ΣΣ (i < j) wiwjCOV(Ri, Rj)

Binomial probability

P(X = x) = [n! / (x!(n - x)!)] pˣ (1 - p)ⁿ⁻ˣ, for x = 0, 1, ..., n

Poisson probability

P(X = x) = e⁻ᵘ μˣ / x!, for x = 0, 1, 2, ...
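Binomial and Poisson probabilities, and the two-stock portfolio results, follow directly from these definitions. The sketch below is illustrative; the weights, returns, and parameters are assumptions, not examples from the text.

    from math import comb, exp, factorial

    def binomial_prob(n, p, x):
        """P(X = x) for a binomial random variable."""
        return comb(n, x) * p**x * (1 - p)**(n - x)

    def poisson_prob(mu, x):
        """P(X = x) for a Poisson random variable with mean mu."""
        return exp(-mu) * mu**x / factorial(x)

    def portfolio_two_stocks(w1, w2, e1, e2, v1, v2, cov12):
        """Mean and variance of a two-stock portfolio."""
        expected = w1 * e1 + w2 * e2
        variance = w1**2 * v1 + w2**2 * v2 + 2 * w1 * w2 * cov12
        return expected, variance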
Continuous Probability Distributions

Standard normal random variable: Z = (X - μ) / σ

Exponential distribution: f(x) = λe⁻ᵏˣ with λ in place of k, x ≥ 0 (i.e., f(x) = λ e^(-λx))

F distribution: F = (χ²1 / ν1) / (χ²2 / ν2), the ratio of two independent chi-squared variables, each divided by its degrees of freedom

Sampling Distributions

Expected value of the sample mean: E(x̄) = μ

Variance of the sample mean: σ²x̄ = σ² / n

Standard error of the sample mean: σx̄ = σ / √n

Standardizing the sample mean: Z = (x̄ - μ) / (σ / √n)

Expected value of the sample proportion: E(p̂) = p

Variance of the sample proportion: σ²p̂ = p(1 - p) / n

Standard error of the sample proportion: σp̂ = √(p(1 - p) / n)

Standardizing the sample proportion: Z = (p̂ - p) / √(p(1 - p) / n)
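The sampling-distribution formulas lend themselves to a quick numerical check; the population values and sample results below are invented for illustration.

    from math import sqrt

    # Standard error and z-score for a sample mean (illustrative numbers).
    mu, sigma, n, x_bar = 100.0, 15.0, 36, 104.0
    se_mean = sigma / sqrt(n)                 # sigma_x-bar = sigma / sqrt(n)
    z_mean = (x_bar - mu) / se_mean           # standardizing the sample mean

    # Standard error and z-score for a sample proportion.
    p, m, p_hat = 0.40, 200, 0.46
    se_prop = sqrt(p * (1 - p) / m)           # sigma_p-hat = sqrt(p(1-p)/n)
    z_prop = (p_hat - p) / se_prop            # standardizing the sample proportion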
Expected value of the difference between two means Introduction to Hypothesis Testing

Test statistic for

Variance of the difference between two means

Standard error of the difference between two means Inference about One Population

Test statistic for


Standardizing the difference between two sample means

Confidence interval estimator of


Introduction to Estimation
Test statistic for

Confidence interval estimator of


Confidence interval Estimator of

Sample size to estimate


LCL =

UCL =

204
Test statistic for p Confidence interval estimator of the total in a small population

Confidence interval estimator of p Confidence interval estimator of p when the population is small

Sample size to estimate p


Confidence interval estimator of the total number of successes in a small
population


Confidence interval estimator of the total of a large finite population

Inference About Two Populations

Confidence interval estimator of the total number of successes in a large finite


Equal-variances t-test of
population

Confidence interval estimator of when the population is small


Equal-variances interval estimator of

205
Unequal-variances t-test of F-Estimator of

LCL =

UCL =
Unequal-variances interval estimator of

z-Test and estimator of


Case 1:

t-Test of

Case 2:

t-Estimator of
z-estimator of

F-test of

F= and

206
Analysis of Variance MST =

One-way analysis of variance


MSB =
SST =

MSE =

SSE =
F=

MST =
F=

MSE = Two-factor experiment

F= SS(Total) =

Two-way analysis of variance (randomized block design of experiment)


SS(A) =

SS(Total) =

SS(B) =

SST =

SS(AB) =

SSB =

SSE =

SSE =

207
F= Simple Linear Regression

Sample slope
F=

F=
Sample y-intercept

Least Significant Difference Comparison Method

LSD = Sum of squares for error

SSE =
Tukey’s multiple comparison method

Standard error of estimate

Chi-Squared Tests Test statistic for the slope

Test statistic for all procedures

Standard error of

208
Coefficient of determination Multiple Regression

Standard Error of Estimate


Prediction interval

Test statistic for


Confidence interval estimator of the expected value of y

Coefficient of Determination

Sample coefficient of correlation

Adjusted Coefficient of Determination


Adjusted

Test statistic for testing =0

Mean Square for Error

MSE = SSE / (n - k - 1)

Mean Square for Regression

MSR = SSR / k

F-statistic

F = MSR / MSE

209
Durbin-Watson statistic

210
Section 5

Video Links

VIDEO LINKS

1. Introduction
2. Central Tendency
3. Sampling Distribution
4. Hypothesis Testing
5. Statistical Inference
6. Regression

The following video links are provided to aid in the review of the many concepts presented throughout the text.

Introduction to Quantitative Research, Measurement Levels, Frequency Tables, and Graphics:

Basic Statistics:

http://www.youtube.com/watch?v=yq6lEH9bt-g&feature=related

Basic Terms and Concepts of Statistics:

http://www.youtube.com/watch?v=jhVdiIBQQHE&playnext=1&list=PLB815A3C15C587645

Data & Scales/Level of Measurement:

http://www.youtube.com/watch?v=5fDBOq5x5nk

Descriptive & Inferential Statistics:

http://www.youtube.com/watch?v=Nol6wS9Wj4M
Sampling Distribution of a Sample Proportion (Central Limit Theorem) Probability and Events:

http://www.youtube.com/watch?v=aE11B6deuNQ http://www.youtube.com/watch?v=BolCgB4YGMw

Sampling Distribution of the Sample Mean: Calculating the Probability of Simple Events:

http://www.youtube.com/watch?v=FXZ2O1Lv-KE&feature=related http://www.youtube.com/watch?v=BAjOEsU_mpE

Sampling and Nonresponse Bias: Probability Basics:

http://www.youtube.com/watch?v=qudsqhWBApA http://www.youtube.com/watch?v=Kuz3ZHLVj_k

Measures of Central Tendency, Measures of Dispersion, Correlation, Probability Basics with Excel:
Regression, Reliability, Validity, Probability
http://www.youtube.com/watch?v=88FCRYjyySc
Central Tendency:

http://www.youtube.com/watch?v=HP6ip-dDKxE
Basic Rules of Probability:
http://www.youtube.com/watch?v=7sg8bo_BGeM
http://www.youtube.com/view_play_list?p=2B567EA871FF4171
http://www.youtube.com/watch?v=81zcjULlh58

Correlation (Bivariate Data):


Dispersion:
http://www.youtube.com/watch?v=UPAmwBW65bY
http://www.youtube.com/watch?v=fl5KvqmhDA8

212
What is Correlation?: Sampling Distribution of Sample Mean & Central Limit Theorem:

http://www.youtube.com/watch?v=Ypgo4qUBt5o http://www.dailymotion.com/video/xd9fc5_how-to-graph-the-normal-
distributio_school

Regression Analysis Using Microsoft Excel 2007:


Probabilities for sampling distribution of sample mean:
http://www.youtube.com/watch?v=O0X8jAfApbk
http://www.youtube.com/watch?v=tgJFyzN0eAE

Properties of Distributions and the Sampling Distribution


How to do business statistics in excel:
A-level statistics: normal distribution p(x less than x)
http://www.youtube.com/watch?v=Ee7t5_ibAG4
http://www.youtube.com/watch?v=l8PzNJ5N_mU

Statistical Inference:
Applications of the Normal Distribution:
http://www.youtube.com/watch?v=zRKDDADMDO8
http://www.youtube.com/watch?v=bYnIIZbeFes

Population and Sampling Distribution:


The Normal Distribution:
http://www.youtube.com/watch?v=zRKDDADMDO8
http://www.studybeat.com/math/statistics/distributions/Statistics:%20The
%20Normal%20Distribution/1255640765/

Sampling Distribution of the Sample Mean:

How To Graph the Normal Distribution PDF in Excel: http://www.youtube.com/watch?


v=LGzuYlhfEO0&feature=pyv&ad=4006917158&kw=statistics%20sampling
http://www.dailymotion.com/video/xd9fc5_how-to-graph-the-normal- %20distribution
distributio_school

213
Hypothesis Testing, Sampling, Types of Errors, Statistical Significance, Effect Statistical Inference, Hypothesis Testing, Comparing Two Means, Comparing
Size, Power Two Proportions

Hypothesis Testing:

Hypothesis Test for a Mean: http://www.youtube.com/watch?v=abjHpJ36pIE

http://www.youtube.com/watch?v=tpdmnFWcSn0

Post hoc tests:

Hypothesis Testing: http://www.youtube.com/watch?v=8DHFTd-KjjI

http://www.youtube.com/watch?v=rHAxhlmbRPU http://www.youtube.com/watch?v=rZuYwJupGus

http://www.youtube.com/watch?v=kMxDtJL3RFY&feature=relmfu

Comparing means:

Statistics: Confidence Intervals (Difference in Means): http://www.youtube.com/watch?v=pBgLemMUohk

http://www.youtube.com/watch?v=9je5FMppLlU

Multiple Regression Analysis and Forecasting:

Excel Statistics: Confidence Intervals for Proportions: http://www.youtube.com/watch?v=t6KiRSSmQg4

http://www.youtube.com/watch?
v=2iclH6eysgY&playnext=1&list=PLC54F478AEBCE4EE4
Lean Sigma Search: Leveraging Lean & Six Sigma in Executive Search:

http://www.youtube.com/watch?v=Mr_4JtI2LvI
Statistical and Practical Significance:

http://www.youtube.com/watch?v=rOyK_K0SOaU

214
Six Sigma for Managers: Intro to SSFM

http://www.youtube.com/watch?v=vtspuAPsOl0

Non-parametric tolerance intervals:

http://www.youtube.com/watch?v=tlDO4KFkxeI

Simple and Multiple Regression/Predictions; and Non-parametric statistics

Simple regression:

http://www.youtube.com/watch?v=GCEyZxS6vn8

Simple Regression Model to Predict a Variable:

http://www.youtube.com/watch?v=wOmqP9auN0Y

Bivariate or simple regression:

http://www.youtube.com/watch?v=CSYTZWFnVpg

Non-parametric:

http://www.youtube.com/watch?v=W6SBqH_nlV4

215
References & References and Resources

Resources Afifi, A. A., May, S., and Clark, V. (2012). Practical multivariate analysis, 5th ed New York: Chapman &
Hall

The following references and resources are provided as a starting Akaike, H. (1973). Information theory as an extension of the maximum likelihood principle. in B. N.
point for the reader interested in learning more about any of the topics
covered in this text. No attempt has been made to provide an Petrov, and F. Csaki, (eds.) Second International Symposium on Information Theory. Budapest:
exhaustive list.
Akademiai Kiado, pp. 267-281.
There are many great books on statistics that focus on specific
techniques and procedures.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic
Control AC 19, 716-723.

Anderson, T.W. (1971). The Statistical Analysis of Time Series. Wiley, New York

Anderson, T. W. and Darling, D. A. (1952). Asymptotic theory of certain 'goodness-of-nt' criteria


based on stochastic processes. Annals of Mathematical Statistics, 23, 193-212.

Anderson, T. W. and Darling, D. A. (1954). A test of goodness of fit. Journal of American Statistical
Association, 49, 765-769.

Ansneld, F. and Klotz, J. (1977). A phase in study comparing the clinical utility of four regimens of 5-
fluorouracil. Cancer, 39, 34-40.

Aitchison, J. and Dunsmore, I.R..(1975) Statistical Prediction Analysis. Cambridge 



University Press, Cambridge

Bailey, B. J. R. (1980). Large sample simultaneous confidence intervals for the multinomial
probabilities based on transformations of the cell frequencies.Technometrics, 22, 583-589

Bartlett, M. S. (1947). Multivariate analysis. Journal of the Royal Statistical Society, Series B, 9,
176-197.

Belsley, D. A., Kuh, E., and Welsch, R. E. (1980). Regression diagnostics: Identifying influential data
and sources of collinearity. New York: John Wiley & Sons

ccxvi
Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975). Discrete multivariate Dixon, W. J. (1992). BMDP statistical software manual. Berkeley: University of
analysis: Theory and practice. Cambridge, Mass.: McGraw-Hill. California Press

Block, J. (1960). On the number of significant findings to be expected by chance. Dixon, W. J. and Tukey, J. W. (1968). Approximate behavior of the distribution of
Psychometrika, 25, 369-380. winsorized t. Technometrics, 10, 83-98.

Bock, R. D. (1975). Multivariate statistical methods in behavioral research. New York: Efron, B. (1982). The jackknife, the bootstrap and other resampling plans. Vol. 38 of
McGraw-Hill CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia, Penn.:
SIAM.
Boos, D. D. and Brownie, C. (1989). Bootstrap methods for testing homogeneity of
variances. Technometrics, Vol. 31, No. 1, 69-82. Efron, B. and LePage, R. (1992). Introduction to bootstrap. In R. LePage and L. Billard
(eds.), Exploring the Limits of Bootstrap. New York: John Wiley & Sons.
Box, G. E. P., and Tiao, G. C. (1973). Bayesian inference in statistical analysis.
Reading, Mass.: Addison-Wesley . Efron, B. and Tibshirani, R. J. (1993). An introduction to the bootstrap. New York:
Chapman & Hall.
Brillinger, D.R. (1975) . Time Series:Data Analysis and Theory. Holt, Rinehart and
Winston, New York Faith, D. P., Minchin, P., and Belbin, L. (1987). Compositional dissimilarity as a robust
measure of ecological distance. Vegetatio, 69, 57-68.
Burnham, K. P. and Anderson, D. R. (2002). Model selection and multimodel
inference: A practical information-theoretic approach. New York: Springer-Verlag . Feingold, M. and Korsog, P. E. (1986). The correlation and dependence between two F
statistics with the same denominator. The American Statistician, 40, 218-220.
Chambers, J. M. (1977). Computational methods for data analysis. New York: John
Wiley & Sons. Fisher, R. A. (1935). The design of experiments. London:Ohver & Boyd.

Cochran, W G. and Cox, G. M. (1957). Experimental designs, 2nd ed. New York: John Fleiss, J. L., Levin, B., and Paik, M. C. (2003). Statistical methods for rates and
Wiley & Sons. proportions, 3rd ed. New York: John Wiley & Sons.

Daniel, C. (1960). Locating outliers in factorial experiments. Technometrics, 2, Flury, B. (1997). A first course in multivariate analysis. New York: Springer-Verlag
149-156.
Goodman, L. A. and Kruskal, W. H. (1954). Measures of association for cross-
Dempster, A. P. (1969). Elements of continuous multivariate analysis. San Francisco: classification. Journal of the American Statistical Association, 49, 732-764.
Addison-Wesley.
Gower, J. C. (1985). Measures of similarity, dissimilarity, and distance. In Kotz, S. and
Davis, D. J. (1977). An analysis of some failure data. Journal of the American Johnson, N. L. Encyclopedia of Statistical Sciences, vol. 5. New York: John Wiley &
Statistical Association, 72, 113-150. Sons.

ccxvii
Hadi, A. S. (1994). A modification of a method for the detection of outliers in Kline,R. B. (2005). Principles and practices of structural equation modeling (2nd ed.)
multivariate samples. Journal of the Royal Statistical Society, Series (B), 56, 393-396. New York: Guilford Press.

Hand, D. J., Daly, E, Lunn, A. D., Mc Conway K. J., and Ostrowski, E. (1996). A Kutner, M. H. (1974). Hypothesis testing in Hnear models (Eisenhart Model I). The
handbook of small data sets. New York: Chapman & Hall. American Statistician, 28, 98-100..

Henze, N. and Zirkler, B (1990). A class of invariant consistent tests for multivariate Kutner, M. H., Nachtshiem, C. J., Neter, J. and Li, W. (2004). Applied Hnear statistical
normality. Communications in Statistics; Theory and Methods, 19, 3595-3618. models, 5th ed. Irwin: McGraw-Hill.

Hill, M. A. and Engelman, L. (1992). Graphical aids for nonlinear regression and Levene, H. (1960). Robust tests for equaHty of variances. I. Olkin, ed., Contributions to
discriminant analysis. Computational Statistics, Vol. 2, Y. Dodge and J. Whittaker, eds. Probability and Statistics. Palo Alto, CaHf: Stanford University Press, 278-292.
Proceedings of the 10th Symposium on Computational Statistics Physica-Verlag,
111-126 Little, R. J. A. and Rubin, D. B. (2002). Statistical analyses with missing data. New York:
John Wiley & Sons.
Hocking, R. R. (1983). Developments in linear regression methodology: 1959-82.
Technometrics, 25, 219-230 Longley, J. W. (1967). An appraisal of least-squares for the electronic computer from
the point of view of the user. Journal of the American Statistical Association, 62,
Hoyle, R. H. (1995). Structural equation modeling: Concepts, issues, and applications. 819-841
Thousand Oaks, CA
Mardia, K. V (1970). Measures of multivariate skewness and kurtosis with
Huff, D. (1993). How to Lie With Statistics. W. W. Norton & Company. applications. Biometrika, 58, 519-530.

James, A. D (1984). Extending Rosenberg's Technique for Standardizing Percentage Mardia, K. V, Kent, J. T, and Bibby, J. M. (1979). Multivariate analysis. London:
Tables. Social Forces, 62, 3, 679-708. Academic Press.

John, P. W. M. (1971). Statistical design and analysis of experiments. New York: Mendenhall, W., Beaver, R. J., and Beaver B. M. (2002). A brief introduction to
MacMillan . probability and statistics. Pacific Grove, CA: Duxbury Press.

Judge, G. G., Griffiths, W. E., Lutkepohl, H., Hill, R. C, and Lee, T. C. (1988). Miller, R. (1985). Multiple comparisons. Kotz, S. and Johnson, N. L., eds.,
Introduction to the theory and practice of econometrics, 2nd ed. New York: John Encyclopedia of Statistical Sciences, vol. 5. New York: John Wiley & Sons, 679-689.
Wiley & Sons, pp. 275-318, pp. 453-454.
MilHken, G. A. and Johnson, D. E. (1984). Analysis of messy data, Vol. 1: Designed
Kaplan, D. (2000). Structural equation modeling: Foundations and extensions. Experiments. New York: Van Nostrand Reinhold CompanyKendall, M. G., Stuart, A.,
Thousand Oaks, CA: Sage Ord, J. K., and Arnold, S. (1999). Kendall's advanced theory of statistics, Volume 24,
London: Hodder Arnold.

ccxviii
Montgomery, D. C. (2005). Introduction to statistical quality control, 5th ed. New York: Santner, T.J. and Duffy E. D. (1989) Statistical Analysis of Discrete Data. Springer,
John Wiley & Sons.. New York .

Montgomery, D. C, Peck, E. A., and Vining, G. G. (2001). Introduction to Linear Schachter, S. (1959). The psychology of affiliation: Experimental studies of the
Regression Analysis, 3rd ed. New York: John Wiley sources of gregrariousness. Stanford, CA: Stanford University Press.

Morrison, D. F. (2004). Multivariate statistical methods. 4th ed. Pacific Grove CA: Scheffe, H. (1959). The analysis of variance. New York: John Wiley & Sons.
Duxbury Press.
Shapiro, S. S. and Wilk, M. B. (1965). An analysis of variance test for normality
Morrison, A. S., Black, M. M., Lowe, C. R., MacMahon, B., and Yuasa, S. Y. (1990). (complex samples). Biometrika, 52, 591-611.
Some international differences in histology and survival in breast cancer. International
Journal of Cancer, 11, 261-267 Shye, S.[Ed]. (1978). Theory construction and data analysis in the behavioral sciences.
San Francisco: Jossey-Bass.
Nelson, L. S (1998). The Anderson-Darling test for normality. Journal of Quality
Technology, 30-3, 298-299. Snedecor, G W. and Cochran, W. G (1989). Statistical methods, 8th ed. Ames: Iowa
State University Press.
Noreen, E. W. (1989). Computer intensive methods for testing hypotheses: An
introduction. New York: John Wiley & Sons Spicer, C. C. (1972). Calculation of power sums of deviations about the mean. Applied
Statistics, 21, 226-227.
Rosenberg, M. (1962). Test Factor Standardization as a Method of Interpretation.
SocialForces, 53-61. Stephens, M. A. (1982). Anderson-Darling test of goodness of fit. Encyclopedia of
Statistical Sciences: Volumel (Edited by Kotz, S. and Johnson, N.L). New York: John
Ott, L. R. and Longnecker, M. (2001). Statistical methods and data analysis, 5th ed. Wiley & Sons, 81-85.
Pacific Grove, CA: Duxbury Press.
Timm, N. H. (2002). Applied multivariate analysis. New York: Springer-Verlag .
Press, S. J. (1989). Bayesian statistics: principles, models and applications. New York:
John Wiley & Sons Trader, R. L. (1986). Bayesian regression. In Johnson, N. L. and Kotz, S. (eds.)
Encyclopedia of Statistical Sciences New York: John Wiley & Sons, 7, 677-683.
Rao, C. R. (1973). Linear statistical inference and its applications, 2nd ed. New York:
John Wiley & Sons. (note; paperback reprint edition 2002) Tukey, J. and McLaughlin, D. (1963). Less vulnerable confidence and significance
procedures for location based on a single sample: trimming/winsorization. Sankhya,
Salkind, N.J. (2007). Statistics for People Who (Think They) Hate Statistics. Sage A 25, 331-352.
Publications
Tukey, J. W. (1958). Bias and confidence in not quite large samples. Annals of
Mathematical Statistics, 29, 614.

ccxix
Vogt, W.P (2005). Dictionary of Statistics & Methodology: A Nontechnical Guide for
the Social Sciences

Weaver, A. (2005). Good-Natured Statistics. Bookman Publishing

Weisberg, S. (2005). Applied linear regression. 3rd ed. Hoboken, N. J.: Wiley-
Interscience.

Wilkinson, L. and Dallal, G. E. (1977). Accuracy of sample moments calculations


among widely used statistical programs. The American Statistician, 31,128-131.

Wilkinson, J. H. and Reinsch, C. (Eds.). (1971). Linear Algebra, Vol. 2, Handbook for
automatic computation. New York: Springer-Verlag.

Winer, B. J., Brown, D. R. and Michels, K. M. (1991). Statistical principles in


experimental design, 3rd ed. New York: McGraw-Hill.

Wludyka, P. S. and Nelson, P. R. (1997). An analysis-of-means-type test for variances


from normal populations. Technometrics, 39:3, 274-285.

Zellner, A. (1971). An introduction to Bayesian inference in econometrics. New York:


John Wiley & Sons.

ccxx
Statistics for
Better
Business
Decisions
by

Dr. Gordon W. McClung

ISBN-10: 0988216000

ISBN-13: 978-0-9882160-0-6

© 2012 Rhumb Line Publishing, Morgantown, WV 26501

ccxxi
Alpha

Alpha is the complement of the confidence level: alpha = 1 - confidence level. Alpha is the significance level against which the observed p-value is compared. At a 95% confidence level, alpha is equal to 5%, or .05.

Related Glossary Terms


Confidence Interval

Index Find Term


Alternate Hypothesis

Whenever the null hypothesis is tested, the alternative hypotheses must be stated because it determines whether
a one tail or two tail test is to be made.

Whenever the null hypothesis is accepted, the alternative hypothesis is rejected. Likewise, whenever the null
hypothesis is rejected, the alternative hypothesis must be accepted.

Related Glossary Terms


Confidence Interval, Null Hypothesis, Probability

Index Find Term


ANOVA

A collection of statistical models, and their associated procedures, in which the observed variance in a particular
variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA
provides a statistical test of whether or not the means of several groups are all equal, and therefore generalizes t-
test to more than two groups. Doing multiple two-sample t-tests would result in an increased chance of
committing a type I error. For this reason, ANOVAs are useful in comparing two, three, or more means.

Related Glossary Terms


Variance

Index Find Term


Average

The result obtained by adding several quantities together and then dividing this total by the number of quantities;
the mean.

Related Glossary Terms


Central Limit Theorem, Central Tendency, Mean, Median, Mode

Index Find Term


Beta

In statistics, the beta is the unit-contribution of a variable to the value of the outcome variable. Weighted
regression is perhaps the easiest form of multiple regression analysis, a method in which two or more variables are
used to predict the value of an outcome.

At a conceptual level, we use multiple regression so several variables can be considered at the same time for their
effect on an outcome of interest.

Related Glossary Terms


Linear Equation, Multiple Regression, Regression

Index Find Term


Between Groups Variation

The variation due to the interaction between the samples is denoted SS(B) for Sum of Squares Between groups. If
the sample means are close to each other (and therefore the Grand Mean) this will be small. There are k samples
involved with one data value for each sample (the sample mean), so there are k-1 degrees of freedom.

The variance due to the interaction between the samples is denoted MS(B) for Mean Square Between groups. This
is the between group variation divided by its degrees of freedom.

Related Glossary Terms


Mean Squared, Mean Squared Error, Mean Squared Treatment, Total Variation, Within
Group Variation

Index Find Term


Bias

A statistic is biased if it is calculated in such a way that is systematically different from the population parameter of
interest. Characteristics of an experimental or sampling design, or the mathematical treatment of data, that
systematically affects the results of a study so as to produce incorrect, unjustified, or inappropriate inferences or
conclusions.

Related Glossary Terms


Unbiased

Index Find Term


Carat Symbol

The carat or “hat” is used over a letter to represent an estimate of that letter. For example, σ̂ (sigma-hat) means an estimate of σ. A subscript letter used with another letter is somewhat like an adjective that further describes the letter. For example, σx̄ (sigma x-bar) is read “standard deviation of x-bar” or, more generally, “standard deviation of means.”

Related Glossary Terms


Sampling Distribution Estimates Based on Sample

Index Find Term


Categorical Variables

When variables are categorical, frequency tables (crosstabulations) provide useful summaries.

Related Glossary Terms


Nominal

Index Find Term


Central Limit Theorem

The central limit theorem states that the mean of a sufficiently large number of independent random variables,
each with finite mean and variance, will be approximately normally distributed.

Essentially, the Central Limit Theorem, tells us that if we take the mean of the samples (n) and plot the frequencies
of their mean, we get a normal distribution. And as the sample size (n) increases, approaches infinity, we find a
normal distribution.

See: Law of Large Numbers

Related Glossary Terms


Law of Large Numbers

Index Find Term


Central Tendency

Statistical measures of central tendency include the mean, median and mode.

Related Glossary Terms


Central Limit Theorem, Mean, Median, Mode

Index Find Term


Chi-square

A statistical method assessing the goodness of fit between observed values and those expected theoretically.

Related Glossary Terms


Level of Measurement, Non-parametric Models

Index Find Term


Cluster Sampling

Cluster sampling is a technique used when relatively homogeneous groupings are evident in a statistical
population. The total population is divided into groups (or clusters) and a simple random sample of the groups is
selected. Then the required information is collected from a simple random sample of the elements within each
selected group.

Related Glossary Terms


Drag related terms here

Index Find Term


Coefficient of Determination

The coefficient of determination (R2) is used in the context of statistical models whose main purpose is the
prediction of future outcomes on the basis of other related information. It is the proportion of variability in a data
set that is accounted for by the statistical model. It provides a measure of how well future outcomes are likely to
be predicted by the model.

Related Glossary Terms


Correlation Coefficient R, R, R-square

Index Find Term


Confidence Interval

A confidence interval (CI) is an interval estimate of a population parameter and is used to indicate the reliability of
an estimate. It is an observed interval (i.e. it is calculated from the observations), in principle different from sample
to sample, that frequently includes the parameter of interest, if the experiment is repeated. How frequently the
observed interval contains the parameter is determined by the confidence level or confidence coefficient.

From a normal approximation, we can build a 95% symmetric confidence interval that gives us a specific idea of
the variability of our estimate. We would expect that 95 intervals out of a hundred constructed would cover the real
population mean age. Remember, population mean age is not necessarily at the center of the interval that we just
constructed, but we do expect the interval to be close to it.
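A minimal sketch of the normal-approximation interval described above (hypothetical ages, NumPy assumed; 1.96 is the z value associated with 95% confidence):

import numpy as np

ages = np.array([34, 41, 29, 38, 45, 31, 36, 40, 33, 39,
                 28, 44, 37, 35, 42, 30, 38, 36, 41, 32])

mean = ages.mean()
std_err = ages.std(ddof=1) / np.sqrt(len(ages))   # estimated standard error of the mean
z = 1.96                                          # 95% confidence, normal approximation

lower, upper = mean - z * std_err, mean + z * std_err
print(f"95% CI for the mean age: ({lower:.2f}, {upper:.2f})")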

Related Glossary Terms


Alpha, Alternate Hypothesis, Null Hypothesis, Probability

Index Find Term


Continuous Probability Distribution

If a random variable is a continuous variable, its probability distribution is called a continuous probability
distribution. The equation used to describe a continuous probability distribution is called a probability density
function. The probability that a random variable assumes a value between a and b is equal to the area under the
density function bounded by a and b.

Related Glossary Terms


Discrete Probability Distribution

Index Find Term


Contrast

A contrast is a linear combination of 2 or more factor level means with coefficients that sum to zero.

Two contrasts are orthogonal if the sum of the products of corresponding coefficients (i.e., coefficients for the
same means) adds to zero.


Index Find Term


Correlation

Correlation is a measure of the interdependence of variable quantities; it describes the extent of interdependence between measures. The Pearson product-moment correlation coefficient (typically denoted by r) is a measure of the correlation (linear dependence) between two variables X and Y, giving a value between +1 and −1 inclusive. It is widely used in the sciences as a measure of the strength of linear dependence between two variables.
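As a small sketch (hypothetical advertising and sales figures, NumPy assumed), Pearson's r can be read off the correlation matrix:

import numpy as np

ad_spend = np.array([10, 12, 15, 17, 20, 22, 25])   # X
sales    = np.array([40, 44, 50, 53, 60, 61, 68])   # Y

r = np.corrcoef(ad_spend, sales)[0, 1]   # off-diagonal element is Pearson's r
print("Pearson r:", round(r, 3))         # near +1: strong positive linear relationship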

Related Glossary Terms


Correlation Coefficient R, Correlational Statistics

Index Find Term


Correlation Coefficient R

The Pearson product-moment correlation coefficient, also known as r, R, or Pearson's r, is a measure of the strength and direction of the linear relationship between two variables.

Related Glossary Terms


Coefficient of Determination, Correlational Statistics

Index Find Term


Correlational Statistics

Correlational statistics are a special subgroup of descriptive statistics, which are described separately. The
purpose of correlational statistics is to describe the relationship between two or more variables for one group of
participants.

Related Glossary Terms


Chi-square, Correlation, Correlation Coefficient R

Index Find Term


Cronbach’s Alpha

Cronbach's alpha is a lower bound for test reliability and ranges in value from 0 to 1 (negative values can occur when
items are negatively correlated). Alpha can be viewed as the correlation between the items (variables) selected and all
other possible tests or scales (with the same number of items) constructed to measure the characteristic of interest.
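A minimal sketch of the computation (hypothetical scores on a 5-item scale for 6 respondents, NumPy assumed), using the usual formula alpha = k/(k − 1) × (1 − Σ item variances / variance of total score):

import numpy as np

# Rows = respondents, columns = items on the scale (hypothetical data)
items = np.array([
    [4, 5, 4, 4, 5],
    [3, 3, 4, 3, 3],
    [5, 5, 5, 4, 5],
    [2, 3, 2, 2, 3],
    [4, 4, 5, 4, 4],
    [3, 4, 3, 3, 4],
], dtype=float)

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)        # variance of each item
total_var = items.sum(axis=1).var(ddof=1)    # variance of respondents' total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print("Cronbach's alpha:", round(alpha, 3))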


Index Find Term


Data

The observations that researchers make result in data. The data might be the brands participants plan to purchase
or the data might be respondent scores on a scale that measures preference. In this context, variables are things
that we measure, control, or manipulate in research. The participants (respondents) with the variables represent
our data. Think of the data file as a spreadsheet in Excel with each respondent represented by a row of data and
each variable represented by a column.


Index Find Term


Deductive

Closely reasoned. If (a) and (b) are true, (c) must be true. This logic is used in making mathematical proofs.

Related Glossary Terms


Inductive

Index Find Term


Dependent Samples

A sampling method is dependent when the individuals selected to be in one sample are used to determine the individuals to be in the second sample. Dependent samples are often referred to as matched-pairs samples. In other words, statistical inference methods on matched-pairs data use the same methods as inference on a single population mean, except that the differences are analyzed.

Related Glossary Terms


Independent Samples

Index Find Term


Descriptive Statistics

Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data.
Descriptive statistics are distinguished from inferential statistics (or inductive statistics), in that descriptive
statistics aim to summarize a data set, rather than use the data to learn about the population that the data are
thought to represent. This generally means that descriptive statistics, unlike inferential statistics, are not developed
on the basis of probability theory. Even when a data analysis draws its main conclusions using inferential statistics,
descriptive statistics are generally also presented.

Related Glossary Terms


Level of Measurement

Index Find Term


Discrete Probability Distribution

With a discrete probability distribution, each possible value of the discrete random variable
can be associated with a non-zero probability. Thus, a discrete probability distribution can
always be presented in tabular form.

Related Glossary Terms


Continuous Probability Distribution

Index Find Term


Dispersion

Statistical analyses also commonly use measures of dispersion (spread), such as the range, interquartile range, or
standard deviation.


Index Find Term


Efficient

A statistic is efficient if the spread of the sampling distribution around the population parameter being estimated is
small. Or, in comparison of one statistic to another statistic, the one whose sampling distribution spreads less
around the parameter, is more efficient.


Index Find Term


Empiricism

Empiricism, the scientific method, refers to using direct observation to obtain knowledge. Thus, the empirical
approach to acquiring knowledge is based on making observations of individuals or objects of interest. As
illustrated by the gas price example, everyday observation is an application of the scientific approach.


Index Find Term


Experiment

An experiment is a study in which treatments are given to see how the participants respond to them. We all
conduct informal experiments in our everyday lives.

Related Glossary Terms


Treatment

Index Find Term


Finite Correction Factor

Whenever the sample from a finite population equals or exceeds 10% of the total population, i.e., n ≥ 10% of N, the finite population correction factor √((N − n)/(N − 1)) is used to compensate for the inherent changes in the underlying probabilities during the sampling process.
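For instance (a sketch with hypothetical numbers; Python's standard library only), the factor shrinks the standard error of the mean when the sample is a large share of the population:

import math

N, n = 500, 100          # population and sample sizes (n is 20% of N)
sigma = 12.0             # population standard deviation (hypothetical)

fpc = math.sqrt((N - n) / (N - 1))         # finite population correction factor
std_error = (sigma / math.sqrt(n)) * fpc   # corrected standard error of the mean

print("correction factor:", round(fpc, 4))
print("corrected standard error:", round(std_error, 4))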


Index Find Term


Formula

There are many equivalent formulas used in statistical calculations. The many variations have been derived largely
as an aid in the process of computation.


Index Find Term


Frequency

The rate at which something occurs or is repeated in a given sample.


Index Find Term


Frequency Tables

When variables are categorical, frequency tables (crosstabulations) provide useful summaries.


Index Find Term


Generalizability

Generalizability refers to the appropriateness of applying findings from a study to a larger population.
Generalizability requires random selection. If participants in a study are randomly selected from a larger
population, it is appropriate to generalize study results to the larger population; if not, it is not appropriate to
generalize.

Related Glossary Terms


Population, Random Sample, Random Sampling, Sample

Index Find Term


Greek Letters

Greek letters (like µ, π, σ) are generally used when referring to populations, while common letters like (s, p) are
used to refer to samples.


Index Find Term


Hypothesis

A supposition or proposed explanation made on the basis of limited evidence as a starting point for further
investigation.


Index Find Term


Hypothesis Setting

There are three ways to set up the null and alternative hypotheses:

Equal versus not equal hypothesis (two-tailed test)

• H0: parameter = some value

• H1: parameter ≠ some value

Equal versus less than (left-tailed test)

• H0: parameter = some value

• H1: parameter < some value

Equal versus greater than (right-tailed test)

• H0: parameter = some value

• H1: parameter > some value


Index Find Term


Independent Samples

A sampling method is independent when the individuals selected for one sample do not dictate which individuals
are to be in a second sample.

Related Glossary Terms


Dependent Samples

Index Find Term


Inductive

In inductive reasoning, an outcome X can be explained relative to several sets of empirical data. Thus, one must either include the total evidence or settle for a “potential explanation.” Most applications of statistical reasoning in business situations are only potential explanations, because it is generally impossible or too expensive to obtain total evidence.

Related Glossary Terms


Deductive

Index Find Term


Inferential Statistics

Statistical inference is the process of drawing conclusions from data subject to random variation. Inferential
statistics (or inductive statistics) aim to use the data to learn about the population that the data are thought to
represent.

Inferential statistics are tools that tell us how much confidence we can have when generalizing from a sample to a
population.


Index Find Term


Interquartile Range

Statistical analyses also commonly use measures of dispersion, such as the range, interquartile range, or standard
deviation. The interquartile range (IQR) is a measure of statistical dispersion, being equal to the difference between
the upper and lower quartiles, IQR = Q3 −  Q1
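A small sketch (hypothetical data, NumPy assumed):

import numpy as np

data = np.array([12, 15, 18, 21, 22, 25, 27, 30, 34, 41])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)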


Index Find Term


Interval

In most statistical analyses, interval and ratio measurements are analyzed in the same way. However, there is a
difference between these two levels. An interval scale does not have an absolute zero. For instance, if we measure
intelligence, we do not know exactly what constitutes absolutely zero intelligence and thus cannot measure the
zero point. In contrast, a ratio scale has an absolute zero point on its scale. For instance, we know where the zero
point is when we measure height.

Related Glossary Terms


Level of Measurement

Index Find Term


Kurtosis

Kurtosis is a measure of the "peakedness" of the probability distribution of a real-valued random variable; it is a descriptor of the shape of a probability distribution.


Index Find Term


Law of Large Numbers

The Law of Large Numbers tells us that if we take a sample of n observations of a random variable and average them (the sample mean), the average approaches the expected value E(X) of the random variable as n increases.

See: The Central Limit Theorem

Related Glossary Terms


Central Limit Theorem

Index Find Term


Level of Measurement

Measurement can be defined as the assignment of numerals to objects or events according to rules. There are
four basic levels of measure: nominal, ordinal, interval, and ratio. The importance of the level of measure is
realized in the operations to produce the scale and in the mathematical operations that are permissible with each
level.

Related Glossary Terms


Interval, Nominal, Ordinal, Ratio

Index Find Term


Linear Equation

A linear equation is an algebraic equation in which each term is either a constant or the product of a constant and
(the first power of) a single variable.

Y= a + bX is a form of linear equation.
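A minimal sketch (hypothetical data, NumPy assumed) of fitting the constants a and b in Y = a + bX:

import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([3.1, 5.0, 6.9, 9.1, 11.0])

b, a = np.polyfit(X, Y, deg=1)   # slope b and intercept a of the best-fit line
print(f"Y = {a:.2f} + {b:.2f}X")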

Related Glossary Terms


Beta, Multiple Regression, Regression

Index Find Term


Matched-Pairs

A sampling method is dependent when the individuals selected to be in one sample are used to determine the
individuals to be in the second sample. Dependent samples are often referred to as matched-pairs samples. In
other words, statistical inference methods on matched pairs data use the same methods as inference on a single
population mean, except that the differences are analyzed.

Related Glossary Terms


Dependent Samples, Independent Samples

Index Find Term


Mean

The result obtained by adding several quantities together and then dividing this total by the number of quantities; also called the average.

Related Glossary Terms


Average, Central Limit Theorem, Central Tendency, Median, Mode

Index Find Term


Mean Squared

In general, the mean square of a set of values is the arithmetic mean of the squares of their differences from some
given value, namely their second moment about that value.

When the mean square is regarded as an estimator of certain parental variance components, the sum of squares about the observed mean is divided by the number of degrees of freedom, not the number of observations.

Related Glossary Terms


Mean Squared Error, Sum of Squares

Index Find Term


Mean Squared Error

Mean squared error (MSE) equals the sum of the variance and the squared bias of the estimator. An estimator is
used to infer the value of an unknown parameter in a statistical model. Bias is the difference between this
estimator's expected value and the true value of the parameter being estimated. The MSE provides a means of
choosing the best estimator: a minimal MSE often, but not always, indicates minimal variance, and thus a good
estimator. Like variance, mean squared error has the disadvantage of heavily weighting outliers.

Related Glossary Terms


ANOVA, Between Groups Variation, Mean Squared, Sum of Squares, Total Variation, Within
Group Variation

Index Find Term


Mean Squared Treatment

The variation due to the interaction between the samples is denoted SS(B) for Sum of Squares Between groups. If
the sample means are close to each other (and therefore the Grand Mean) this will be small. There are k samples
involved with one data value for each sample (the sample mean), so there are k-1 degrees of freedom.

The variance due to the interaction between the samples is denoted MS(B) for Mean Square Between groups. This
is the between group variation divided by its degrees of freedom.

Related Glossary Terms


Between Groups Variation, Within Group Variation

Index Find Term


Median

A measure of central tendency, the median is a numerical value separating the higher half of a sample, a
population, or a probability distribution, from the lower half.


Index Find Term


Mode

A measure of central tendency, the mode is the most frequently occurring score. The mode and median should be
used when the data is skewed.


Index Find Term


Multiple Regression

In statistics, linear regression is an approach to modeling the relationship between a dependent variable y and one or more explanatory variables denoted X. The case of one explanatory variable is called simple regression; the case of more than one explanatory variable is called multiple regression.

Related Glossary Terms


Linear Equation, R, R-square, Regression

Index Find Term


Nominal

The lowest level of measurement is nominal (also known as categorical). It is helpful to think of this level as the naming level
because names (i.e., words) are used instead of numbers.

Related Glossary Terms


Level of Measurement

Index Find Term


Non-parametric Models

Among the models used for statistical inference are two general groups: Parametric and Non-Parametric. The
parametric models, such as Z or t, and Pearson product moment correlation require that the sample be drawn
from a population with prescribed “shape”, usually a normal curve. The non-parametric models such as
Spearman rank correlation and the Mann-Whitney U test are termed “distribution-free” models because they do
not require any prescribed population shape.

Related Glossary Terms


Parametric Models

Index Find Term


Non-probability Samples

With non-probability sampling methods, we do not know the probability that each population element will be
chosen, and/or we cannot be sure that each population element has a non-zero chance of being chosen.

Non-probability sampling methods offer two potential advantages - convenience and cost. The main disadvantage
is that non-probability sampling methods do not allow you to estimate the extent to which sample statistics are
likely to differ from population parameters. Only probability sampling methods permit that kind of analysis.

Related Glossary Terms


Random Sampling

Index Find Term


Normal Distribution

In probability theory, the normal distribution is a continuous probability distribution that has a bell-shaped probability density function, known informally as the bell curve.

The bell curve is a function that represents the distribution of random variables as a symmetrical bell-shaped
graph.

Related Glossary Terms


Central Tendency, Kurtosis, Law of Large Numbers, Skewness, Spread

Index Find Term


Null Hypothesis

The hypothesis that there is no significant difference between specified populations, any observed difference
being due to sampling or experimental error. The two tailed test holds when the hypothesis is of equality. When
the hypothesis is directional, either greater than or less than, we use a one tailed test.

Whenever the null hypothesis is tested, the alternative hypotheses must be stated because it determines whether
a one tail or two tail test is to be made.

Whenever the null hypothesis is accepted, the alternative hypothesis is rejected. Likewise, whenever the null
hypothesis is rejected, the alternative hypothesis must be accepted.

Related Glossary Terms


Alternate Hypothesis, Confidence Interval

Index Find Term


Observational Study

An observational study is one in which data are collected on individuals in a way that doesn't affect them. The
most common nonexperimental study is the observational survey. Surveys are questionnaires that are presented
to individuals who have been selected from a population of interest. Surveys take on many different forms: paper
surveys sent through the mail; Web sites; call-in polls conducted by TV networks; and phone surveys.

Related Glossary Terms


Observations

Index Find Term


Observations

The observations that researchers make result in data. The data might be the brands participants plan to purchase
or the data might be respondent scores on a scale that measures preference. In this context, variables are things
that we measure, control, or manipulate in research. The participants (respondents) with the variables represent
our data. Think of the data file as a spreadsheet in Excel with each respondent represented by a row of data and
each variable represented by a column.

Related Glossary Terms


Observational Study

Index Find Term


Ogive

A cumulative frequency graph.


Index Find Term


Ordinal

Ordinal measurement puts participants in rank order from high to low, but it does not indicate how much higher or lower
one participant is in relation to another.

Related Glossary Terms


Level of Measurement

Index Find Term


Orthogonal Contrast

Two contrasts are orthogonal if the sum of the products of corresponding coefficients (i.e., coefficients for the
same means) adds to zero.


Index Find Term


Parameter

Measure of the population. No inference is required.


Index Find Term


Parametric Models

Among the models used for statistical inference are two general groups: Parametric and Non-Parametric. The
parametric models, such as Z or t, and Pearson product moment correlation require that the sample be drawn
from a population with prescribed “shape”, usually a normal curve. The non-parametric models such as
Spearman rank correlation and the Mann-Whitney U test are termed “distribution-free” models because they do
not require any prescribed population shape.

Related Glossary Terms


Non-parametric Models

Index Find Term


Pearson Product-Moment Correlation

In statistics, the Pearson product-moment correlation coefficient (is typically denoted by r) is a measure of the
correlation (linear dependence) between two variables X and Y, giving a value between +1 and −1 inclusive. It is
widely used in the sciences as a measure of the strength of linear dependence between two variables.

Related Glossary Terms


Correlation, Correlation Coefficient R, Correlational Statistics, Parametric Models

Index Find Term


Point Estimate

We cannot make a probability statement about a point estimate. The parameter is either equal to the point or not.
But we can describe the “goodness” of a point estimate as:

1. Efficient – the spread around the parameter value is small.

2. Unbiased – not a consistently higher or lower estimate than the parameter value.


Index Find Term


Population

A population is all the objects that belong to the same group; that is, a well-defined collection of objects, items, numbers, etc., with mean µ and standard deviation σ.

Size of population = N

A population consists of all members of a group in which a researcher has an interest. It may be small, such as all
doctors affiliated with a particular hospital, or it may be large, such as all college seniors in a state.

Related Glossary Terms


Parameter, Statistic

Index Find Term


Population Estimates Based on Sample

σ̂ (sigma-hat) is an estimate of the population standard deviation based on sample data and corrected for bias. The standard deviation of the sample itself, s, is often used as an estimate of the population standard deviation, although it is a biased estimate. The amount of bias (error) diminishes as n increases and is usually negligible for n > 30.

σ̂² is the estimate of the population variance based on a sample.

A pooled estimate of the population standard deviation can be formed from two or more samples.

σ̂1² / σ̂2² is the ratio of the estimates of two population variances based on two samples.


Index Find Term


Population Mean

Average of scores for the population. The population mean equals µ = ΣX / N, where the number of observations = N.


Index Find Term


Population Sampling Distribution (of Means) and Sample (s)

The sampling distribution of means has a mean (µx̄) equal to the population mean (µ) and a standard deviation (the standard error) equal to σx̄ = σ/√n.

The distribution is made up of multiple samples of size n drawn from the population, with the mean of each sample denoted x̄ and the standard deviation of each sample denoted s.

The relationship of the population to the sampling distribution of means makes it possible to state inferences about populations in terms of probabilities associated with the appropriate sampling distribution, all starting with actual information from one sample.


Index Find Term


Population Standard Deviation

The population standard deviation is a measure of the spread (variability in scores) in the population, expressed in the units of measure.


Index Find Term


Population Variance

Variance is a measure of variability in the population expressed as units of measure squared.


Index Find Term


Probability

Probability is the branch of mathematics that studies the possible outcomes of given events together with the
outcomes' relative likelihoods and distributions.

Probability is the chance that a particular event will occur expressed on a linear scale from 0 (impossibility) to 1
(certainty), also expressed as a percentage between 0 and 100%.

Related Glossary Terms


Statistics

Index Find Term


Proportion

A measure of a part, share, or number considered in comparative relation to a whole.


Index Find Term


R

The Pearson product-moment correlation coefficient, also known as r, R, or Pearson's r, is a measure of the strength and direction of the linear relationship between two variables.


Index Find Term


R-square

In statistics, the coefficient of determination R² is used in the context of statistical models whose main purpose is the prediction of future outcomes on the basis of other related information. It is the proportion of variability in a data set that is accounted for by the statistical model, and it provides a measure of how well future outcomes are likely to be predicted by the model.


Index Find Term


Random Sample

A random sample from a finite population is a sample that has been selected by a procedure with the following
properties:

• The procedure assigns a known probability to each element in the population.

• If a given element has been selected, then the probability of selecting the remaining items is uniformly
affected.

• This means that the selection of one item does not affect the selection of any other particular items; they are in
no way “tied together.”

Stated differently, this means:

• Events are independent

• Underlying probabilities remained unchanged in drawing the sample.

Related Glossary Terms


Generalizability

Index Find Term


Random Sampling

Sampling is concerned with the selection of a subset of individuals from within a population to estimate
characteristics of the whole population.

Related Glossary Terms


Generalizability, Non-probability Samples

Index Find Term


Range

Statistical analyses also commonly use measures of dispersion, such as the range, interquartile range, or standard
deviation. The range is a measure of the difference between the high and low measures from a series of
observations.


Index Find Term


Ratio

In most statistical analyses, interval and ratio measurements are analyzed in the same way. However, there is a
difference between these two levels. An interval scale does not have an absolute zero. For instance, if we measure
intelligence, we do not know exactly what constitutes absolutely zero intelligence and thus cannot measure the
zero point. In contrast, a ratio scale has an absolute zero point on its scale. For instance, we know where the zero
point is when we measure height.

Related Glossary Terms


Level of Measurement

Index Find Term


Regression

Regression analysis is a statistical technique for estimating the relationships among variables. There are several
types of regression:

◦ Linear regression model

◦ Simple linear regression

◦ Logistic regression

◦ Nonlinear regression

◦ Nonparametric regression

◦ Robust regression

◦ Stepwise regression

Related Glossary Terms


Linear Equation, Multiple Regression, R, R-square

Index Find Term


Relationship between Population, Sampling Distribution &
Sample

Relationship between Population, Sampling Distribution & Sample:

A. If the population is normally distributed: If random samples of size (n) are taken from a normal population with mean (µ) and standard deviation (σ), then the sampling distribution of the sample means (x̄) is also a normal distribution, with mean µx̄ = µ and standard deviation σx̄ = σ/√n.

B. If the population is large, but not necessarily normally distributed, the Central Limit Theorem applies. If random samples of size (n) are taken from a large population with mean (µ) and standard deviation (σ), then the sampling distribution of the sample means (x̄) is approximately normal, with mean approximately equal to µ and standard deviation approximately equal to σ/√n, PROVIDED that the sample size is large (where large means n ≥ 30).
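These relationships can be checked by simulation. The sketch below (a minimal illustration assuming Python with NumPy; the uniform population and n = 36 are arbitrary choices) compares the mean and standard deviation of many sample means with µ and σ/√n:

import numpy as np

rng = np.random.default_rng(seed=7)

# A large, non-normal population: uniform on [0, 100]
population = rng.uniform(0, 100, size=200_000)
mu, sigma = population.mean(), population.std()

n = 36
sample_means = np.array([rng.choice(population, size=n).mean() for _ in range(3_000)])

print("mu:", round(mu, 2), " mean of sample means:", round(sample_means.mean(), 2))
print("sigma/sqrt(n):", round(sigma / np.sqrt(n), 2),
      " std of sample means:", round(sample_means.std(ddof=1), 2))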


Index Find Term


Reliability

Consistency; the ability of a person or system to perform and maintain its functions consistently.

Related Glossary Terms


Validity

Index Find Term


Research

A major distinction between research and everyday observation is that research is planned in advance. Based on a
theory or hunch, researchers develop research questions and then plan what, when, where and how to observe in
order to answer the questions.

Related Glossary Terms


Empiricism, Experiment, Observational Study, Observations, Scientific Method

Index Find Term


Sample

A sample is one subset of all possible subsets which may be selected from a population. When populations are
large, researchers usually sample. A sample is a subset of a population. For instance, we might be interested in
the attitudes of all registered voters in California toward the economy. The registered voters would constitute the
population. If we administered an attitude scale to all these voters, we would be studying the population, and the
summarized results (such as averages) would be referred to as parameters. If we studied only a sample of the
voters, the summarized results would be referred to as statistics.

Related Glossary Terms


Generalizability

Index Find Term


Sample Mean

In statistics the mean may be:

• the arithmetic mean of a sample,

• the expected value of a random variable, or

• the mean of a probability distribution.

There are other statistical measures of central tendency that should not be confused with means - including the
'median' and 'mode'. Statistical analyses also commonly use measures of dispersion, such as the range,
interquartile range, or standard deviation.

A measure of central tendency, the mean of the sample, is the average of observations expressed in the units of
measure.


Index Find Term


Sample Standard Deviation

Calculation of the measure of variability in a sample expressed in units of measure.
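A small sketch (hypothetical data, NumPy assumed); note the divisor n − 1 (ddof=1), which distinguishes the sample standard deviation used to estimate σ from the population formula that divides by n:

import numpy as np

sample = np.array([23, 27, 31, 25, 29, 33, 26])

s = sample.std(ddof=1)           # sample standard deviation (divides by n - 1)
sigma_like = sample.std(ddof=0)  # population formula (divides by n)
print("s:", round(s, 3), "  population-formula value:", round(sigma_like, 3))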

Related Glossary Terms


Sample Variance, Spread, Standard Deviation

Index Find Term


Sample Variance

Sample variance is a measure of variability in the observed values for the sample expressed in units of measure
squared.

Related Glossary Terms


Sample Standard Deviation, Spread, Standard Deviation

Index Find Term


Sampling Distribution

A sampling distribution is the set (collection) of all possible subsets of a given size (n) which may be selected from a population; each subset (sample) is different.

The number of possible samples of size n which can be taken from a finite population of size N is C(N, n) = N! / (n!(N − n)!).


Index Find Term


Sampling Distribution Estimates Based on Sample

σ̂x̄ = Estimate of the standard deviation (error) of the mean based on a sample.

σ̂x̄1−x̄2 = Estimate of the standard deviation (error) of the difference between means based on two samples.

σ̂p = Estimate of the standard deviation (error) of the sampling distribution of the proportion based on a sample proportion.

σ̂p1−p2 = Estimate of the standard deviation (error) of the sampling distribution of the difference between proportions based on proportions from two samples.

Related Glossary Terms


Carat Symbol

Index Find Term


Sampling Distribution Mean

The mean of the sampling distribution of means is the average of the sample means and is theoretically equal to the population mean.


Index Find Term


Sampling Distribution Standard Deviation

Sampling distribution standard deviation is a measure of variability in the sampling distribution of means
expressed in units of measure.


Index Find Term


Scientific Method

Empiricism, the scientific method, refers to using direct observation to obtain knowledge. Thus, the empirical
approach to acquiring knowledge is based on making observations of individuals or objects of interest. As
illustrated by the gas price example, everyday observation is an application of the scientific approach.

Related Glossary Terms


Empiricism, Research

Index Find Term


Scientific Procedures

Scientific procedures involve the use of models to describe, analyze, and make predictions. A model can be a
well-defined set of descriptions and procedures like the Product Life Cycle in marketing, or it can be a scaled
down analog of the real thing, such as an engineer’s scale model of a car.


Index Find Term


Simple Random Sample

A simple random sample is a subset of individuals (a sample) chosen from a larger set (a population). Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage during the sampling process, and each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals. This process and technique should not be confused with systematic random sampling.


Index Find Term


Skewness

Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable. The
skewness value can be positive or negative, or even undefined. Qualitatively, a negative skew indicates that the tail
on the left side of the probability density function is longer than the right side and the bulk of the values (possibly
including the median) lie to the right of the mean. A positive skew indicates that the tail on the right side is longer
than the left side and the bulk of the values lie to the left of the mean. A zero value indicates that the values are
relatively evenly distributed on both sides of the mean, typically but not necessarily implying a symmetric
distribution.


Index Find Term


Spread

Statistical analyses also commonly use measures of dispersion, such as the range, interquartile range, or standard
deviation.

Related Glossary Terms


Interquartile Range, Range, Standard Deviation

Index Find Term


Standard Deviation

Statistical analyses also commonly use measures of dispersion, such as the range, interquartile range, or standard
deviation.

For the population, the standard deviation equals σ = √( Σ(X − µ)² / N ), where the number of observations = N.

Related Glossary Terms


Sample Standard Deviation

Index Find Term


Statistic

A sample is a subset of a population. For instance, we might be interested in the attitudes of all registered voters
in Pennsylvania. The registered voters would constitute the population. If we administered an attitude scale to a
sample of these voters the summarized results would be referred to as statistics.

Related Glossary Terms


Parameter

Index Find Term


Statistical Inference

Statistical Inference is the process of making inferences concerning a population on the basis of information
contained in samples (from the population) and is based on the relationships between the population, the
sampling distribution, and the sample.


Index Find Term


Statistical Models

Models that are useful and valuable in managing business operations are broadly termed “statistical models”. A
large number of these have been developed to assist researchers in a variety of fields, such as agriculture,
psychology, education, communication, and military tactics, as well as in business. Only the professional
statistician would be expected to understand the full range of such statistical models.


Index Find Term


Statistics

Statistics is about this whole process of using the scientific method to answer questions and make decisions.
Effective decision-making involves correctly designing studies, collecting unbiased data, describing the data with
numbers and graphs, analyzing the data to draw inferences and reaching conclusions based on the transformation
of data into information.

The analysis of events governed by probability is called statistics.

Related Glossary Terms


Probability

Index Find Term


Stem and Leaf Plot

The Stem procedure creates a stem-and-leaf plot for one or more variables. The plot shows the distribution of a
variable graphically. In a stem-and-leaf plot, the digits of each number are separated into a stem and a leaf. The
stems are listed as a column on the left, and the leaves for each stem are in a row on the right.


Index Find Term


Stratified Sampling

When subpopulations within an overall population vary, it is advantageous to sample each subpopulation (stratum)
independently. Stratification is the process of dividing members of the population into homogeneous subgroups
before sampling. The strata should be mutually exclusive: every element in the population must be assigned to
only one stratum. The strata should also be collectively exhaustive: no population element can be excluded. Then
simple random sampling or systematic sampling is applied within each stratum. This often improves the
representativeness of the sample by reducing sampling error.
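A minimal sketch of proportional stratified sampling (hypothetical customer records grouped by region; Python's standard library only):

import random

random.seed(3)

# Hypothetical population grouped into strata (regions)
strata = {
    "north": list(range(0, 600)),     # 600 customers
    "south": list(range(600, 1000)),  # 400 customers
}

sample_fraction = 0.05
sample = []
for name, members in strata.items():
    k = round(sample_fraction * len(members))   # proportional allocation
    sample.extend(random.sample(members, k))    # simple random sample within the stratum

print("sample size per stratum:",
      {name: round(sample_fraction * len(members)) for name, members in strata.items()})
print("total sample size:", len(sample))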


Index Find Term


Student’s t-Test

A test for statistical significance that uses a statistical distribution called Student's t-distribution, which is that of a
fraction ( t) whose numerator is drawn from a normal distribution with a mean of zero, and whose denominator is
the root mean square of k terms drawn from the same normal distribution (where k is the number of degrees of
freedom). The t-distribution approaches normal as sample size increases.
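A minimal sketch of a one-sample t test (hypothetical delivery times, SciPy assumed), testing H0: µ = 30 against a two-sided alternative:

from scipy import stats

delivery_times = [31.2, 29.5, 33.1, 30.8, 32.4, 28.9, 31.7, 30.2]

t_stat, p_value = stats.ttest_1samp(delivery_times, popmean=30)
print("t:", round(t_stat, 3), "p-value:", round(p_value, 3))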

Related Glossary Terms


Alpha, Confidence Interval, T-Distribution

Index Find Term


Sum of Squares

A statistical technique used in ANOVA and regression analysis. Regression analysis is a tool used to determine
how well a statistical model fits a set of data. The sum of squares technique helps determine what estimator(s)
provide the best fit.

Related Glossary Terms


ANOVA, Mean Squared Error, Total Variation

Index Find Term


Survey

Surveys are questionnaires that are presented to individuals who have been selected from a population of interest.
Surveys take on many different forms: paper surveys sent through the mail; Web sites; call-in polls conducted by
TV networks; and phone surveys.


Index Find Term


Symbols

Greek letters (like µ, π, σ) are generally used when referring to populations, while common letters (like x̄, s, p) are used to refer to samples.

Related Glossary Terms


Symbols for Populations, Symbols for Samples, Symbols from Sampling Distribution

Index Find Term


Symbols for Populations

µ = Population mean

σ² = Population variance

σ = Population standard deviation

π = Population proportion

µ1 − µ2 = Difference between two population means

π1 − π2 = Difference between two population proportions

Related Glossary Terms


Symbols, Symbols for Samples, Symbols from Sampling Distribution

Index Find Term


Symbols for Samples

X = One member of a sample (one data point)

x̄ = Sample mean

s² = Sample variance

s = Sample standard deviation

x̄1 − x̄2 = Difference between two sample means

p1 − p2 = Difference between two sample proportions

p̄ = Weighted average proportion from two or more samples

Related Glossary Terms


Symbols, Symbols for Populations, Symbols from Sampling Distribution

Index Find Term


Symbols from Sampling Distribution

µx̄ = Mean of the sampling distribution of (sample) means

σx̄ = Standard deviation of the sampling distribution of means (often called the “standard error” of the sampling distribution of (sample) means)

σx̄1−x̄2 = Standard deviation of the sampling distribution of differences between (sample) means (also called the “standard error” of the sampling distribution of differences between sample means)

σp = Standard deviation (error) of the sampling distribution of the proportion based on an assumed value for the population proportion, π

Related Glossary Terms


Symbols, Symbols for Populations, Symbols for Samples

Index Find Term


T-Distribution

Characteristics of Student’s t-distribution

• The t-distribution is different for different degrees of freedom.

• The t-distribution is centered at 0 and is symmetric about 0.

• The area under the curve is 1. Because of the symmetry, the area under the curve to the right of 0 equals
the area under the curve to the left of 0 equals 1/2.

• As t increases (or decreases) without bound, the graph approaches, but never equals, 0.

• The area in the tails of the t-distribution is a little greater than the area in the tails of the standard normal
distribution because using s as an estimate of σ introduces more variability to the t-statistic.

• As the sample size n increases, the density curve of t gets closer to the standard normal density
curve. This result occurs because as the sample size increases, the values of s get closer to the values of
σ by the Law of Large Numbers.

Related Glossary Terms


Alpha, Confidence Interval, Student’s t-Test

Index Find Term


Total Variation

The total variation comprises the sum of the squares of the differences of each observation from the grand mean.

It is made up of the between group variation and the within group variation. The whole idea behind the analysis of variance is to compare the ratio of between group variance to within group variance. If the variance caused by the interaction between the samples is much larger than the variance that appears within each group, it is because the means aren't the same.
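The decomposition can be verified numerically. The sketch below (three hypothetical treatment groups, NumPy assumed) computes SS(B) and SS(W), checks that they add up to the total sum of squares, and forms the ANOVA F statistic as MS(B)/MS(W):

import numpy as np

groups = [np.array([12., 14., 11., 13.]),
          np.array([15., 17., 16., 18.]),
          np.array([10., 9., 11., 10.])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()
k, N = len(groups), all_obs.size

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # SS(B)
ss_within  = sum(((g - g.mean()) ** 2).sum() for g in groups)            # SS(W)
ss_total   = ((all_obs - grand_mean) ** 2).sum()                         # SS(T)

ms_between = ss_between / (k - 1)   # MS(B)
ms_within  = ss_within / (N - k)    # MS(W)

print("SS(B) + SS(W) =", round(ss_between + ss_within, 3), "  SS(T) =", round(ss_total, 3))
print("F =", round(ms_between / ms_within, 3))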

Related Glossary Terms


Between Groups Variation, Sum of Squares

Index Find Term


Treatment

A treatment is a specific combination of factor levels whose effect is to be compared with other treatments.

The mathematical model that describes the relationship between the response and treatment for the one-way ANOVA is given by

Yij = µ + τi + εij

where Yij represents the j-th observation (j = 1, 2, ..., ni) on the i-th treatment (i = 1, 2, ..., k levels), µ is the common (grand mean) effect, τi is the effect of the i-th treatment, and εij is the random error.

Related Glossary Terms


Experiment

Index Find Term


Tukey’s Test

Tukey's test, also known as Tukey's HSD (honestly significant difference) test is generally used in conjunction with
an ANOVA to find which means are significantly different from one another. The test compares the means of every
treatment to the means of every other treatment; that is, it applies simultaneously to the set of all pairwise
comparisons and identifies where the difference between two means is greater than the standard error would be
expected to allow.


Index Find Term


Type I Error

We reject the null hypothesis when the null hypothesis is true. This decision would be incorrect. This type of
error is called a Type I error.

As the probability of a Type I error increases, the probability of a Type II error decreases, and vice-versa.

Related Glossary Terms


Alpha, Confidence Interval, Type II Error

Index Find Term


Type II Error

We do not reject the null hypothesis when the alternative hypothesis is true. This decision would be incorrect.
This type of error is called a Type II error.

As the probability of a Type II error increases, the probability of a Type I error decreases, and vice-versa.

Related Glossary Terms


Confidence Interval, Type I Error

Index Find Term


Unbiased

A statistic is said to be an unbiased estimate of a population parameter if the mean of the sampling distribution of
the statistic is (theoretically) equal to the population parameter.

For example, the “hat” on σ (σ̂, read “sigma-hat”) denotes the estimate of the population standard deviation based on a sample and corrected for bias.

Related Glossary Terms


Bias

Index Find Term


Validity

In logic, an argument is valid if and only if its conclusion is entailed by its premises and each step in the argument
is valid. A formula is valid if and only if it is true under every interpretation, and an argument form (or schema) is
valid if and only if every argument of that logical form is valid.

Related Glossary Terms


Reliability

Index Find Term


Variance

The quality of being different. In statistics it is a measure of spread or variability in the observations (data). The
variance is a measure of how far a set of numbers is spread out. It is one of several descriptors of a probability
distribution, describing how far the numbers lie from the mean (expected value).

Related Glossary Terms


Sample Standard Deviation, Sample Variance, Spread, Standard Deviation

Index Find Term


Within Group Variation

The variation due to differences within individual samples, denoted SS(W) for Sum of Squares Within groups. Each
sample is considered independently, no interaction between samples is involved. The degrees of freedom is equal
to the sum of the individual degrees of freedom for each sample. Since each sample has degrees of freedom equal
to one less than their sample sizes, and there are k samples, the total degrees of freedom is k less than the total
sample size: df = N - k.

The variance due to the differences within individual samples is denoted MS(W) for Mean Square Within groups.
This is the within group variation divided by its degrees of freedom.

Related Glossary Terms


Mean Squared Treatment

Index Find Term
