(8 17) Stats Midterms


RANDOM SAMPLING & PROBABILITY

Inferential Statistics
The basic aim of inferential statistics is to use sample scores to make a statement about a characteristic of the population.
There are two kinds of statement: hypothesis testing and parameter estimation.
In hypothesis testing, the experimenter is collecting data in an experiment on a sample set of subjects in an attempt to validate some
hypothesis involving a population.
“The improvement in final exam scores was due to the new teaching method and not chance factors. Furthermore, the improvement does not
apply just to the particular sample tested. Rather, the improvement would be found in the whole population of third graders if they were taught
by the new method.”
In parameter estimation experiments, the experimenter is interested in determining the magnitude of a population characteristic.
“The probability is 0.95 that the interval of $300-$400 contains the population mean.”
Random sampling and probability are central to the methodology of inferential statistics.
Sampling is the process of drawing a sample from the population.
The sampling frame is the list of all individuals from which the sample will be taken.
RANDOM SAMPLING
Both in hypothesis testing and in parameter estimation experiments, the sample cannot be just any subset of the population. Rather, it is crucial
that the sample is a random sample.
A random sample is defined as a sample selected from the population by a process that ensures that (1) each possible sample of a given size has an equal chance of being selected and (2) all the members of the population have an equal chance of being included in the sample.
The sample should be a random sample for two reasons:
● First, to generalize from a sample to a population, it is necessary to apply the laws of probability to the sample.
● Second, to generalize from a sample to a population, it is necessary that the sample be representative of the population.
Techniques for Random Sampling
● Fishbowl
● Lottery
● Table of Random Numbers / random numbers using a calculator
● shift, dec/ran#, ×100 (sample of 100), =
● mode (fix), 0, =
● 40+, shift, dec/ran#, ×60 (range 40-60)
● fix; shift, mode, 2, =, = (reset)
Sampling with replacement is defined as a method of sampling in which each member of the population selected for the sample is returned to the population before the next member is selected. When the sample size is small relative to the population size, the differences between the two methods are negligible, and with-replacement techniques are much easier to use, providing the mathematical basis for inference.
Sampling without replacement is defined as a method of sampling in which the members of the sample are not returned to the population before subsequent members are selected. It is often used in experiments.
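The distinction above can be sketched with Python's standard library, where `random.choices` draws with replacement and `random.sample` draws without (the population of 130 IDs here is hypothetical):

```python
import random

population = list(range(1, 131))  # a hypothetical population of 130 member IDs
random.seed(42)  # for reproducibility

# Sampling WITH replacement: a member may be drawn more than once,
# so each draw is independent of the draws before it.
with_replacement = random.choices(population, k=10)

# Sampling WITHOUT replacement: each member can appear at most once.
without_replacement = random.sample(population, k=10)

print(with_replacement)
print(without_replacement)
print(len(set(without_replacement)))  # always 10: no duplicates possible
```

Because the population (130) is large relative to the sample (10), the two methods give nearly identical sampling behavior, as the paragraph above notes.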
Sampling

It is a method that allows researchers to infer information about a population based on results from a subset of the population, without having to
investigate every individual.
Sampling reduces cost and workload, and may make it easier to obtain high-quality information.
Whatever method is chosen, it is important that the individuals selected are representative of the whole population.
Sampling can be subdivided into two groups: probability sampling and non-probability sampling
Probability Sampling
● Probability sampling refers to sampling techniques for which a person’s (or event’s) likelihood of being selected for membership in the
sample is known.
● Researchers who use probability sampling techniques are aiming to identify a representative sample from which to collect data.
● You start with a complete sampling frame of all eligible individuals.
● Random sampling ensures that every individual in the population has a chance of being chosen for the sample, making you more able to generalize the results of your study. Generalizability refers to the idea that a study’s results will tell us something about a group larger than the sample from which the findings were generated.
● More time consuming and expensive
Non-probability Sampling
● Do not start with a complete sampling frame, so some of the individuals have no chance of being selected.
● Consequently, you cannot estimate the effect of sampling error, and there is a significant risk of ending up with a non-representative sample that produces non-generalizable results.
● Cheaper and more convenient
● Useful for exploratory research and hypothesis generation
Probability Sampling Methods
1. Simple random sampling - each individual is chosen entirely by chance and each member of the population has an equal
chance, or probability, of being selected.
-Using table of random numbers
-Fishbowl method
A straightforward method.
Disadvantage: you may not select enough individuals with your characteristic of interest, especially if that characteristic is uncommon.
2. Systematic Sampling - individuals are selected at regular intervals from the sampling frame.
-Sampling interval = N/n; e.g., 1000/100 = every 10th individual
-More convenient than simple random sampling
3. Stratified Sampling - the population is first divided into subgroups (or strata) that all share a similar characteristic.
-It is used when we might reasonably expect the measurement of interest to vary between the different subgroups, and we want to ensure
representation from all the subgroups.
-It may also be appropriate to choose non-equal sample sizes from each stratum
-It improves the accuracy and representativeness of the results by reducing sampling bias;
however, it can be difficult to decide which characteristics to stratify by.
4. Clustered Sampling - subgroups of the population are used as the sampling unit, rather than individuals. The population is divided into subgroups, known as clusters.
-Single-stage cluster sampling
-Two-stage cluster sampling
-Can be more efficient than simple random sampling, especially where a study takes place over a wide geographical region.
-Disadvantages include an increased risk of bias
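As a rough sketch, the first three probability methods above can be expressed in Python. The frame of 1,000 IDs, the sample size of 100, and the "urban"/"rural" strata are all hypothetical choices for illustration:

```python
import random

random.seed(0)
frame = list(range(1, 1001))  # hypothetical sampling frame of N = 1000 IDs
n = 100                       # desired sample size

# 1. Simple random sampling: every individual has an equal chance.
simple = random.sample(frame, n)

# 2. Systematic sampling: every k-th individual after a random start,
#    where k = N/n is the sampling interval (here 1000/100 = 10).
k = len(frame) // n
start = random.randrange(k)
systematic = frame[start::k]

# 3. Stratified sampling: sample proportionally within each subgroup.
strata = {"urban": list(range(1, 701)), "rural": list(range(701, 1001))}
stratified = []
for name, members in strata.items():
    share = round(n * len(members) / len(frame))  # proportional allocation
    stratified.extend(random.sample(members, share))

print(len(simple), len(systematic), len(stratified))  # 100 100 100
```

Cluster sampling would instead apply `random.sample` to a list of clusters (e.g., cities) first, then sample within the chosen clusters.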
Non-probability Sampling Method
1. Convenience Sampling - participants are selected based on availability and willingness to take part.
-Prone to significant bias, because those who volunteer to take part may be different from those who choose not to (volunteer bias), and the
sample may not be representative of other characteristics.
-Researcher gathers data from whatever cases happen to be convenient.
2. Quota Sampling - often used by market researchers. Interviewers are given a quota of subjects of a specified type to attempt to recruit.
-Advantage of being straightforward and potentially representative
-Researcher selects cases from within several different subgroups.
3. Purposive (Judgment Sampling) - also known as selective or subjective sampling.
-This sampling relies on the judgment of the researcher when choosing who to ask to participate
-This approach is often used by the media when canvassing the public for opinions and in qualitative research.
-Advantage of being time- and cost-effective to perform while yielding a range of responses.
-Researcher seeks out elements that meet specific criteria.
4. Snowball Sampling - commonly used in social sciences when investigating hard-to-reach groups.
-Existing subjects are asked to nominate further subjects known to them, so the sample increases in size like a rolling snowball.
-Advantage, it can be effective when sampling frame is difficult to identify.
-Researcher relies on participant referrals to recruit new participants.
Bias in sampling
There are five important potential sources of bias that should be considered when selecting a sample, irrespective of the method
used. Sampling bias may be introduced when:
1. Any pre-agreed sampling rules are deviated from
2. People in hard-to-reach groups are omitted
3. Selected individuals are replaced with others, for example if they are difficult to contact
4. There are low response rates
5. An out-of-date list is used as the sample frame (for example, if it excludes people who have recently moved to an area)
Essential Concepts and Steps in Sampling
1. Determine the population of individuals, items, or cases where the data needed can be found. (Ex.: You want to get the perceptions of the clergy from certain churches about issues of national development in the Philippines. The target population or universe is all members of the clergy of the selected churches in the Philippines.)
2. Determine the kind of sample you want to have. (ex. a priest is the basic element of the total target population of priests)
3. Find out what is the appropriate size of the sample.
3.1. Compute the sample size. A sample of at least 30 is needed for the statistical laws of probability to operate. Generally, a "large sample" (a large fraction of the population) is needed when the population is small, and a "small sample" (a small fraction) suffices for a large population.
3.2. Apply the formula for sample size determination.
Example: Find n from an N of 1000.
Gay (1976) offers some minimum acceptable sizes depending on the types of research as follows:
a. descriptive research: 10 percent of the population (for a smaller population, a minimum of 20 percent may be required)
b. correlational research: 30 subjects
c. experimental research: 15 subjects per group. Some authorities believe that 30 subjects per group should be considered the minimum
d. ex-post-facto or causal research: 15 subjects per group
4. Having the desired sample size, get the samples from the sampling frame, based on the sampling method that you want to use
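The steps above reference "the formula for sample determination" without showing it. One formula commonly taught in this context (an assumption on my part, not stated in the source) is Slovin's formula, n = N / (1 + Ne²), where N is the population size and e the margin of error:

```python
import math

def slovin(N: int, e: float = 0.05) -> int:
    """Sample size n for population N at margin of error e (Slovin's formula).

    Assumed formula: n = N / (1 + N * e^2), rounded up to a whole subject.
    """
    return math.ceil(N / (1 + N * e ** 2))

# Worked example from the notes: find n from an N of 1000
# (at the conventional e = 0.05).
print(slovin(1000))  # 286
```

At e = 0.05 this gives n = 1000 / (1 + 1000 × 0.0025) = 1000 / 3.5 ≈ 286, comfortably above Gay's minimums listed above.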

Two Main Types of Sampling Procedures/Designs


1. Probability:
-each of the units in the target population has the same chance of being included in the sample
-greater possibility of representative sample of the population
-conclusion derived from data gathered can be generalized for the whole population
2. Non-Probability:
-No way that each of the units in the target population has the same chance of being included in the sample
-No assurance that every unit has some chance of being included
-Conclusion derived from data gathered is limited only to the sample itself

Types of Probability Sampling


1. Simple Random Sampling.
1.1. The Lottery Method
1.2. The use of a Table of Random Numbers

2. Systematic Sampling.
2.1. Get the list of the total universe or population
2.2. Divide the total population by the desired sample size to get the sampling interval
2.3. Proceed with the identification of the samples

3. Stratified Sampling
3.1. Get a list of the universe
3.2. Decide on the sampling size or the actual percentage of the universe that should be considered as sample
3.3. Get a proportion of sample from each group
3.4. Select the respondents either by simple random sampling or systematic sampling
4. Cluster Sampling. This is used in large-scale surveys.
4.1. The researcher arrives at the set of sampling units to be included in the sample by first sampling larger groupings, called clusters
4.2. The cluster is selected by simple or stratified sampling
4.3. If not all the sampling units in the clusters are to be included in the sample, the final selection from within the clusters is also carried out by a simple random or stratified sampling procedure. (Ex.: a survey of urban households may need a sample of cities; within each city that is selected, a sample of districts; and within each selected district, a sample of households.)

5. Multi-stage Sampling. Usually used for national, regional, provincial or country level studies.
5.1. Decide on the level of analysis that should be studied, such as national, regional, provincial, city or municipality levels
5.2. Determine the sample size per level-stage
5.3. Obtain the samples per level-stage by random sampling or any of the other previously discussed methods.

Types of Non-Probability Sampling


1. Accidental or Convenience Sampling. (ex. you have decided on a sample size of 100. You can interview the first 100 people that you meet)
2. Purposive Sampling. The sampling units are selected subjectively by the researcher, who attempts to obtain a sample that appears to be representative of the population.
3. Quota Sampling. In quota sampling, the researchers have an assignment of “quota” or a certain number that must be covered by the
research. It may also be specified how many will be included according to some criteria such as gender, age, and social class, among others.
4. Snowball Sampling. This type of sampling starts with known sources of information, who or which will in turn give other sources of information. Used when there is inadequate information for making the sampling frame.
5. Network Sampling. Used to find socially devalued urban populations such as addicts, alcoholics, child abusers and criminals, because they
are usually “hidden from outsiders”

PROBABILITY
May be approached in two ways:
1. From a priori/classical viewpoint
2. From an a posteriori/empirical viewpoint
A priori means that which can be deduced from reason alone, without experience (without recourse to any data collection.)

1. From the a priori/classical viewpoint, probability is defined as:

p(A) = (number of events classifiable as A) / (total number of possible events)

p(A) is read as “the probability of occurrence of event A.”


A posteriori means “after the fact,” and in the context of probability, it means after some data have been collected.

2. From the a posteriori/empirical viewpoint, probability is defined as:

p(A) = (number of times A has occurred) / (total number of occurrences)
To determine the probability of a 2 in one roll of one die by using the empirical approach, we would have to take the actual die, roll it many times, and count the number of times a 2 has occurred. The more times we roll the die, the better.
A. Note that, with this approach, it is necessary to have the actual die and to collect some data before determining the probability. The interesting thing is that if the die is evenly balanced (all numbers are equally likely), then when we roll the die many, many times, the a posteriori probability approaches the a priori probability. If we could roll an infinite number of times, the two probabilities would equal each other.
B. Also, if the die is loaded (weighted so that one side comes up more often than the others), then the a posteriori determination will differ from the a priori determination. For example, if the die is heavily weighted for a 6 to come up, a 2 might never appear.
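A short simulation illustrates point A: with a fair die, the empirical (a posteriori) probability of rolling a 2 approaches the classical (a priori) value of 1/6 as the number of rolls grows. The roll count here is an arbitrary choice:

```python
import random
from fractions import Fraction

# A priori (classical): reasoned out with no data collection needed.
p_classical = Fraction(1, 6)  # one face out of six equally likely faces

# A posteriori (empirical): "roll the actual die" many times and count.
random.seed(1)
rolls = 200_000
twos = sum(1 for _ in range(rolls) if random.randint(1, 6) == 2)
p_empirical = twos / rolls

print(float(p_classical))  # 0.1666...
print(p_empirical)         # close to 1/6, and closer as rolls grows
```

Weighting the simulated die (e.g., making 6 more likely) would make the empirical estimate diverge from 1/6, illustrating point B.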
We can now see that the a priori equation assumes each possible outcome has an equal chance of occurrence.
Basic Points to Probability
Probability is fundamentally a proportion; it ranges in value from 0.00 to 1.00.
If the probability of an event occurring equals 1.00, then the event is certain to occur (the probability that a number from 1 to 6 will occur equals 1.00; it is certain that one of those numbers will occur).
If the probability equals 0.00, then the event is certain not to occur (for example, an ordinary die does not have a side with 7 dots on it, so rolling a 7 is certain not to occur).
The probability of occurrence of an event is expressed as a fraction or a decimal number (the answer may be left as a fraction but is usually converted to a decimal).
Sometimes probability is expressed as “chances in 100.” For example, someone might say the probability that event A will occur is 5 chances in 100. What this really means is p(A) = 0.05.
Occasionally, probability is also expressed as the odds for or against an event occurring. For example, a betting person might say the odds are 3 to 1 in favor of Mich winning the race. In probability terms, p(Mich's winning) = 3/4 = 0.75. If the odds were 3 to 1 against Mich's winning, then p(Mich's winning) = 1/4 = 0.25.
Two Major Probability Rules
1. Addition Rule
2. Multiplication Rule
The Addition Rule
It is concerned with determining the probability of occurrence of any one of several possible events.
Let's assume there are only two possible events, A and B. When there are two events, the addition rule states the following:
The probability of occurrence of A or B equals the probability of occurrence of A plus the probability of occurrence of B minus the probability of occurrence of both A and B.
Addition rule for two events - general equation:
p(A or B) = p(A) + p(B) - p(A and B)
First method: there are 16 ways to get an ace or a club, so the probability of getting an ace or a club = 16/52 = 0.3077.
Second method: use the addition rule. The probability of getting an ace = 4/52, and the probability of getting a club = 13/52. The probability of getting both an ace and a club = 1/52. By the addition rule, the probability of getting an ace or a club = 4/52 + 13/52 - 1/52 = 16/52 = 0.3077.
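Both methods can be checked by enumerating a 52-card deck; this is just a sketch, with rank and suit labels chosen for illustration:

```python
from fractions import Fraction
from itertools import product

# Build a 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))

# First method: directly count the 16 cards that are an ace or a club.
favorable = [card for card in deck if card[0] == "A" or card[1] == "clubs"]
p_direct = Fraction(len(favorable), len(deck))

# Second method: addition rule p(A or B) = p(A) + p(B) - p(A and B).
p_ace, p_club, p_both = Fraction(4, 52), Fraction(13, 52), Fraction(1, 52)
p_rule = p_ace + p_club - p_both

assert p_direct == p_rule == Fraction(16, 52)
print(float(p_rule))  # 0.3077 (rounded)
```

Subtracting p(A and B) = 1/52 prevents the ace of clubs from being counted twice, which is exactly why the rule has the minus term.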

Conditions:
-Events are mutually exclusive (two or more)
-Events are exhaustive and mutually exclusive (two or more)
Two events are mutually exclusive if both cannot occur together. Another way of saying this is that two events are mutually exclusive if the occurrence of one precludes the occurrence of the other.
The events of rolling a 1 and of rolling a 2 in one roll of a die are mutually exclusive. If the roll ends in a 1, it cannot also be a 2. The events of picking an ace and a king in one draw from a deck of ordinary playing cards are mutually exclusive. If the card is an ace, that precludes the card also being a king. (An ace and a club are not mutually exclusive, because there is an ace of clubs.)
When the events are mutually exclusive, the probability of both events occurring together is zero. Thus, p(A and B) = 0 when A and B are mutually exclusive.
To simplify: p(A or B) = p(A) + p(B)
Suppose you are going to random sample 1 individual from a population of 130 people. In the population, there are 40 children younger than 12, 60 teenagers, and 30 adults.
What is the probability the individual you select will be a teenager or an adult?
p(teenager or adult)= p(teenager)+p(adult)
60/130 + 30/130 = 90/130 = 0.6923
What is the probability of randomly picking a 10 or a 4 in one draw from a deck of ordinary playing cards? (There are four 10s and four 4s.)
p(10 or 4) = p(10) + p(4)
p(10 or 4) = 4/52 + 4/52 = 8/52 = 0.1538
Addition rule with more than two mutually exclusive events. p(A or B or C... or Z) = p(A)+p(B)+p(C)+...+p(Z)
A set of events is exhaustive if the set includes all of the possible events.
For example, in rolling a die once, the set of events of getting a 1, 2, 3, 4, 5, or 6 is exhaustive because the set includes all of the possible events.
When a set of events is both exhaustive and mutually exclusive, a very useful relationship exists.
p(A)+p(B)+p(C)+...+p(Z) = 1.00
A,B, C... Z (last event) = the events
Since the events are exhaustive and mutually exclusive, the sum of their probabilities must equal 1.00. Thus: p(1)+p(2)+p(3)+p(4)+p(5)+p(6) = 1.00
1/6+1/6+1/6+1/6+1/6+1/6=1.00
When there are only two events and the events are mutually exclusive, it is customary to assign the symbol P to the probability of occurrence of one event and Q to the probability of occurrence of the other event.
For example, if we were flipping a penny and only allowed it to come up heads or tails, this would be a situation in which there are only two possible events with each flip (a head or a tail), and the events are mutually exclusive (if it is a head, it can't be a tail, and vice versa). Let P be the head and Q be the tail.
In flipping coins, fair vs biased coins must be distinguished.
A fair or unbiased coin is one for which, when flipped once, the probability of a head = the probability of a tail = 1/2.
If the coin is biased, the probability of a head does not equal the probability of a tail, and neither equals 1/2.
Thus, if we are flipping a coin, letting P equal the probability of a head and Q the probability of a tail, and the coin is fair, then P = 1/2 = 0.50 and Q = 1/2 = 0.50.
Since the two events of getting a head or a tail in a single flip of a coin are exhaustive and mutually exclusive, their probabilities must sum to 1.00.
Thus: P + Q = 0.50 + 0.50 = 1.00
The Multiplication Rule
It is concerned with the joint or successive occurrence of several events.
It deals with what happens on more than one roll or draw, while addition rule covers just one roll or draw.
It states the following: the probability of occurrence of both A and B equals the probability of occurrence of A times the probability of occurrence of B, given A has occurred.
Multiplication rule with two events - equation form:
p(A and B) = p(A) p(B|A)

p(B|A) = probability of occurrence of B given A has occurred (this does not mean B divided by A)
The multiplication rule is concerned with the occurrence of both A and B (the addition rule applies to the occurrence of either A or B).
3 Conditions:
-Events are mutually exclusive
-Events are independent
-Events are dependent
Multiplication Rule: Mutually Exclusive Events
● If A and B are mutually exclusive, then p(A and B) = 0
● Because when events are mutually exclusive, the occurrence of one precludes the occurrence of the other, the probability of their joint occurrence is zero.
Multiplication Rule: Independent Events
Two events are independent if the occurrence of one has no effect on the probability of occurrence of the other.
Sampling with replacement illustrates this condition well. For example, suppose we are going to draw two cards, one at a time, with
replacement, from a deck of ordinary playing cards.
We can let A be the card drawn first and B be the card drawn second. Since A is replaced before drawing B, the occurrence of A on the first draw has no effect on the probability of occurrence of B.
If A and B are independent, then the probability of B occurring is unaffected by A. Therefore p(B|A) = p(B), and the multiplication rule becomes:

p(A and B) = p(A) p(B)
Suppose we are going to randomly draw two cards, one at a time, with replacement, from a deck of ordinary playing cards. What is the
probability both cards will be aces?
Since the problem requires an ace on the first draw and an ace on the second draw, the multiplication rule is appropriate. We can let A be an ace on the first draw and B be an ace on the second draw. Since sampling is with replacement, A and B are independent.
Thus, p (an ace on first draw and an ace on second draw) = p (an ace on first draw) p (an ace on second draw).
There are four aces possible on the first draw, four aces possible on the second draw (sampling is with replacement), and 52 cards in the deck, so
p(an ace on first draw) = 4/52
p(an ace on second draw) = 4/52
Thus, p(an ace on first draw and an ace on second draw) = (4/52)(4/52) = 16/2704 = 0.0059

The multiplication rule with independent events also applies to situations in which there are more than two events. In such cases, the probability of the joint occurrence of the events is equal to the product of the individual probabilities of each event.
p(A and B and C and ... Z) = p(A) p(B) p(C) ... p(Z) - multiplication rule with more than two independent events.

Multiplication Rule: Dependent Events


When A and B are dependent, the probability of occurrence of B is affected by the occurrence of A, so we must use the multiplication rule in its original form.
Thus, if A and B are dependent: p(A and B) = p(A) p(B|A)
Sampling without replacement provides a good illustration for dependent events.

Suppose you are going to draw two cards, one at a time, without replacement. What is the probability both cards will be aces?
We can let A be an ace on the first draw and B be an ace on the second draw. Since sampling is without replacement (whatever card is picked the first time is kept out of the deck), the occurrence of A does affect the probability of B. A and B are dependent.
Since the problem asks for an ace on the first draw and an ace on the second draw, and these events are dependent, the multiplication rule with dependent events is appropriate.
Thus, p(an ace on first draw and an ace on second draw) = p(an ace on first draw) p(an ace on second draw, given an ace was obtained on first draw), where p(an ace on first draw) = 4/52.
Since sampling is without replacement, p(an ace on second draw given an ace on first draw) = 3/51.
Thus, p(an ace on first draw and an ace on second draw) = (4/52)(3/51) = 12/2652 = 0.0045

Like the multiplication rule with independent events, the multiplication rule with dependent events also applies to situations in which there are more than two events. In such cases, the equation becomes:
p(A and B and C and ... Z) = p(A)p(B|A)p(C|AB)... p(Z|ABC) - multiplication rule with more than two dependent events.
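The two versions of the multiplication rule can be compared directly on the two-ace examples, using exact fractions:

```python
from fractions import Fraction

# Independent events (sampling WITH replacement):
# p(A and B) = p(A) * p(B); all 4 aces are available on both draws.
p_two_aces_with = Fraction(4, 52) * Fraction(4, 52)
print(float(p_two_aces_with))  # 16/2704, about 0.0059

# Dependent events (sampling WITHOUT replacement):
# p(A and B) = p(A) * p(B|A); only 3 aces remain among 51 cards.
p_two_aces_without = Fraction(4, 52) * Fraction(3, 51)
print(float(p_two_aces_without))  # 12/2652, about 0.0045
```

The without-replacement probability is smaller because the conditional term p(B|A) = 3/51 is smaller than the unconditional p(B) = 4/52.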

Combination of Multiplication and Addition Rule


Probability and Continuous Variables
So far, the variables have been discrete, such as sampling from a deck of cards or rolling a pair of dice.
However, many dependent variables evaluated in experiments are continuous, not discrete.
Probability of A with a continuous variable - equation form:
p(A) = (area under the curve corresponding to A) / (total area under the curve)
Suppose we have measured the weights of all the freshman women at this university. Let's assume this is a population set of scores that is normally distributed, with a mean of 120 pounds and a standard deviation of 8 pounds. If we randomly sampled one score from the population, what is the probability it would be equal to or greater than a score of 134?
The scores are normally distributed, so we can find this proportion by converting the raw score to its z-transformed value and then looking up
the area in the z table.

The main difference is that the problem has been cast in terms of probability rather than asking for the proportion or percentage of scores, as was done previously.
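The calculation just described can be sketched with Python's `statistics.NormalDist`, using the mean and standard deviation from the example:

```python
from statistics import NormalDist

dist = NormalDist(mu=120, sigma=8)  # population parameters from the example

# z-transform of the raw score of 134.
z = (134 - dist.mean) / dist.stdev

# Probability of a score equal to or greater than 134: the area in the
# upper tail, equivalent to looking up the area beyond z in a z table.
p = 1 - dist.cdf(134)

print(z)            # 1.75
print(round(p, 4))  # 0.0401
```

This matches the z-table area beyond z = 1.75, illustrating that with a continuous variable, probability is a proportion of area under the normal curve.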
The addition rules:
probability of A or B
p(A or B)=p(A)+p(B)-p(A and B)
*If events are mutually exclusive
p(A or B)=p(A)+p(B)
*If events are mutually exclusive and exhaustive
p(A)+p(B) = 1.00

Experiment: Marijuana and the Treatment of AIDS Patients


These data could be analyzed with several different statistical inference tests such as the sign test,
Wilcoxon matched-pairs signed rank test, and Student's t test for correlated groups.
The choice of which test to use in an actual experiment is an important one. It depends on the sensitivity of the test and on whether the data of the experiment meet the assumptions of the test.

Introduction to Hypothesis Testing (using the Sign Test)


Scientific methodology depends on this application of inferential statistics.
The heart of scientific methodology is the experiment: an experiment is designed to test a hypothesis, and the resulting data must be analyzed.

The sign test is chosen because (1) it is easy to understand and (2) all of the major concepts concerning hypothesis testing can be illustrated clearly and simply.
It ignores the magnitude of the difference scores and considers only their direction or sign.
Repeated Measures Design
The experimental design used here is called a repeated measures, replicated measures, or correlated groups design. The essential features are that there are paired scores in the conditions and that the differences between the paired scores are analyzed.
In the marijuana (IV) experiment, the same subjects were used in each condition, so their scores were paired and the differences between these pairs were analyzed.
Instead of the same subjects, we could have used identical twins or subjects who were matched in some other way.
In animal experimentation, littermates have often been used for pairing.
The most basic form of this design employs just two conditions: an experimental and a control condition. The two conditions are kept as identical as possible except for the values of the independent variable.
HYPOTHESIS
Guess/Hunch
Educated Guess
Intellectual Guess
Alternative Hypothesis (H1)
Also called the research hypothesis, it is the one that claims the difference in results between conditions is due to the independent variable.
“Marijuana affects appetite.”
It can be directional (marijuana increases appetite) or nondirectional (marijuana affects appetite - fact-finding).
It is legitimate to use a directional hypothesis when there is a good theoretical basis and good supporting evidence in the literature.
Null Hypothesis (H0)
It is the logical counterpart of the alternative hypothesis, such that if the null hypothesis is false, the alternative hypothesis must be true. Therefore, the two hypotheses must be mutually exclusive and exhaustive.
Nondirectional counterpart (IV has no effect on DV): “Marijuana does not affect appetite.”
Directional counterpart (no effect in the direction specified): “Marijuana either has no effect on appetite, or it decreases appetite.”
We first evaluate the null hypothesis by assuming that chance alone is responsible for the differences in scores between conditions.
Decision Rule (alpha level)
The null hypothesis is the one always evaluated; the reason is that we can calculate the probability of chance events, but there is no way to calculate the probability of the alternative hypothesis.
We assume that the null hypothesis is true and test the reasonableness of this assumption by calculating the probability of getting the results if chance alone is operating.
If the obtained probability turns out to be equal to or less than a critical probability level, the alpha level, we reject the null hypothesis. (Obtained p ≤ alpha → reject H0.)
Rejecting the null hypothesis allows us to indirectly accept the alternative hypothesis. When we reject H0, the results are called significant or reliable.
If the obtained probability is greater than the alpha level, we fail to reject H0. (Obtained p > alpha → fail to reject H0, i.e., retain H0.)
Since the experiment does not allow rejection of H0, we retain H0 as a reasonable explanation of the data. The expressions “failure to reject H0” and “retain H0” are used interchangeably.

The decision rule states (comparing the obtained probability, or p-value, with alpha):


1. If the obtained probability ≤ alpha, reject H0 and accept H1.
2. If the obtained probability > alpha, fail to reject H0 (retain H0).

Commonly used alpha levels are alpha = 0.05 and alpha = 0.01
Assume alpha = 0.05 for the marijuana data. Thus, to evaluate the results of the marijuana experiment, we need to (1) determine the probability
of getting 9 out of 10 pluses if chance alone is responsible and (2) compare this probability with alpha.
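Step (1) can be sketched as a binomial tail calculation: under chance alone, each subject is equally likely to show a plus or a minus, so the probability of getting the obtained result or anything more extreme (9 or 10 pluses out of 10) is a sum of binomial terms:

```python
from math import comb

# Under H0 (chance alone), each of the 10 subjects is a plus with p = 0.5.
n, p = 10, 0.5

# Tail probability: p(9 pluses) + p(10 pluses).
p_tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(9, n + 1))

print(p_tail)          # 11/1024, about 0.0107
print(p_tail <= 0.05)  # True: reject H0 at alpha = 0.05 (one-tailed)
```

Since 0.0107 ≤ 0.05, the decision rule leads to rejecting H0 for these data (one-tailed evaluation; a two-tailed evaluation would double the tail probability).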

2 ERRORS when making decisions regarding H0


Type I error - is defined as a decision to reject the null hypothesis when the null hypothesis is true. (False positive)
-“You thought there was something, when in fact there was nothing!”
Type II error - is defined as a decision to retain the null hypothesis when the null hypothesis is false. (False negative)
-“You thought there was nothing, when in fact there was something!”

From the preceding analysis, we know there are only two such possibilities: a Type I error or a Type II error. Knowing these are possible, we can design experiments before conducting them to help minimize the probability of making a Type I or a Type II error.
By minimizing the probability of making these errors, we maximize the probability of concluding correctly, regardless of whether the null
hypothesis is true or false.
Alpha limits the probability of making a Type I error. Therefore, by controlling the alpha level we can minimize the probability of making a Type I
error.
Beta is defined as the probability of making a Type II error. When alpha is made more stringent, beta increases.

ALPHA LEVEL AND THE DECISION PROCESS


The alpha level that the scientist sets at the beginning of the experiment is the level to which he or she wishes to limit the probability of making a Type I error.
Commonly used alpha levels are 0.05 (typical in the social sciences) and 0.01 (common in clinical trials and other scientific research).
When a scientist sets alpha = 0.05, he is in effect saying that when he collects the data, he will reject the null hypothesis if, under the assumption that chance alone is responsible, the obtained probability is equal to or less than 5 times in 100. In so doing, he is saying that he is willing to limit the probability of rejecting the null hypothesis when it is true to 5 times in 100. Thus, he limits the probability of making a Type I error to 0.05.

To determine a reasonable alpha level for an experiment, we must consider the consequences of making an error.
Why not set an even more stringent criterion, such as alpha = 0.001? Unfortunately, when alpha is made more stringent, the probability of making a Type II error increases.

Suppose we do an experiment and set alpha = 0.05. We evaluate chance and get an obtained probability of 0.02. We reject H0. If H0 is true, we have made a Type I error. Suppose, however, that alpha had been set at 0.01 instead of 0.05. In this case, we would retain H0 and would no longer be making a Type I error. Thus, the more stringent the alpha level, the lower the probability of making a Type I error.
On the other hand, what happens if H0 is really false? With alpha = 0.05 and the obtained probability = 0.02, we would reject H0 and
thereby make a correct decision. However, if we changed alpha to 0.01, we could retain H0 and we would make a Type II error. Thus, making
alpha more stringent decreases the probability of making a Type I error but increases the probability of making a type II error.
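The worked example above, with the same obtained probability evaluated against two alpha levels, can be sketched as follows (an illustration added to these notes, not part of the original text):

```python
def decide(p_obtained: float, alpha: float) -> str:
    """Reject H0 when the obtained probability is at or below alpha."""
    return "reject H0" if p_obtained <= alpha else "retain H0"

p_obtained = 0.02  # the same data, evaluated against two alpha levels

print(decide(p_obtained, alpha=0.05))  # reject H0 (a Type I error if H0 is true)
print(decide(p_obtained, alpha=0.01))  # retain H0 (a Type II error if H0 is false)
```

The same evidence leads to opposite decisions, which is exactly the alpha-beta trade-off described above.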
Because of this interaction between alpha and beta, the alpha level chosen for an experiment depends on the intended use of the experimental
results.
If the results are to communicate a new fact to the scientific community, the consequences of a Type I error are great, and therefore stringent
alpha levels are used (0.05 and 0.01).
If the experiment is exploratory in nature and the results are to guide the researcher in deciding whether to do a full-fledged experiment, it
would be foolish to use stringent levels. In such cases, alpha levels as high as 0.10 to 0.20 are often used.
Before an “alleged fact” is accepted into the body of scientific knowledge, it must be demonstrated independently in several laboratories. The
probability of making a Type I error decreases greatly with independent replication.

EVALUATING THE TAIL OF THE DISTRIBUTION


We must determine the probability of getting the obtained outcome or any outcome even more extreme.
We evaluate the tail of the distribution, beginning with the obtained result, rather than just the obtained result itself.
If the alternative hypothesis is nondirectional, we evaluate the obtained result or any result even more extreme in both directions (both tails).
If the alternative hypothesis is directional, we evaluate only the tail of the distribution that is in the direction specified by H1.
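For the binomial (sign-test) situations used in this handout, the tail evaluation can be sketched in Python using only the standard library (the function name and the example numbers are illustrative, not from the original notes):

```python
from math import comb

def binom_tail(n: int, k: int, one_tailed: bool = True) -> float:
    """Probability of k or more P events for a binomial with P = 0.50.

    For a two-tailed evaluation the upper-tail probability is doubled,
    which is valid here because the P = 0.50 distribution is symmetrical.
    """
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return upper if one_tailed else 2 * upper

# Hypothetical example: 9 pluses out of 10 difference scores, two-tailed
print(round(binom_tail(10, 9, one_tailed=False), 4))  # → 0.0215
```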

ONE- AND TWO- TAILED PROBABILITY EVALUATIONS


When setting the alpha level, we must decide whether the probability evaluation should be one- or two-tailed.
When making this decision, use the following rule: the evaluation should always be two-tailed unless the experimenter will retain H0 when
results are extreme in the direction opposite to the predicted direction.
When answering a problem, use the direction specified by H1 or the alpha level given in the problem to determine whether the evaluation is to be one-
tailed or two-tailed.

In following this rule, there are two situations commonly encountered that warrant directional hypotheses.
First, when it makes no practical difference if the results turn out to be in the opposite direction, it is legitimate to use a directional hypothesis
and a one-tailed evaluation.
Another situation in which it seems permissible to use a one-tailed evaluation is when there are good theoretical reasons, as well as strong
supporting data, to justify the predicted direction.
In situations in which the experimenter will reject H0 if the results of the experiment are extreme in the direction opposite to the predicted
direction, a two-tailed evaluation should be used.

PROBLEM 1
Assume we have conducted an experiment to test the hypothesis that marijuana affects the appetites of AIDS patients. The
procedure and population are the same as we described previously, except this time we have sampled 12 AIDS patients. The results
are shown here (the scores are in calories):

A. What is the nondirectional alternative hypothesis?


B. What is the null hypothesis?
C. Using alpha = 0.05, two-tailed, what do you conclude?
D. What error might you be making by your conclusion in part c?
E. To what population does your conclusion apply?

SOLUTION
A. Nondirectional alternative hypothesis (H1): Marijuana affects the appetites of AIDS patients who are being treated at your hospital.
B. Null hypothesis (H0): Marijuana has no effect on the appetites of AIDS patients who are being treated at your hospital.
C. Conclusion, using alpha = 0.05, two-tailed:
Step 1: Calculate the number of pluses and minuses. Subtract the “placebo” scores from the corresponding “THC” scores (the reverse could also have been
done). Result: 10 pluses and 2 minuses.
Step 2: Evaluate the number of pluses and minuses. We must determine the probability of getting this outcome or any even more extreme in
both directions, because this is a two-tailed evaluation. The binomial distribution is appropriate for this determination. N = the number of
difference scores (pluses and minuses) = 12. P = the probability of a plus with any subject; if there is no effect, P = 0.50. The obtained result was 10
pluses and 2 minuses, i.e., 10 P events.
The probability of getting an outcome as extreme as or more extreme than 10 pluses (two-tailed) equals the probability of 0, 1, 2, 10, 11, or 12
pluses. Since the distribution is symmetrical, p(0, 1, 2, 10, 11, or 12 pluses) equals p(10, 11, or 12 pluses) x 2. Thus,
p(0, 1, 2, 10, 11, or 12 pluses) = p(10, 11, or 12 pluses) x 2 = (0.0161 + 0.0029 + 0.0002) x 2 = 0.0384
Since 0.0384 < 0.05, reject H0 and accept H1.

D. Possible error: By rejecting the null hypothesis, you might be making a Type I error. In reality, the null hypothesis may be true and you
have rejected it.
E. Population: These results apply to the population of AIDS patients from which the sample was taken.
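As a quick check of the Problem 1 calculation, here is a short Python sketch (not part of the original solution). Computing the binomial terms exactly gives 0.0386 rather than the 0.0384 obtained above from rounded table entries; the conclusion (reject H0, since the probability is below 0.05) is unchanged:

```python
from math import comb

# Sign test for Problem 1: N = 12 difference scores, P = 0.50 under H0,
# obtained result = 10 pluses, two-tailed evaluation.
n = 12
upper = sum(comb(n, k) for k in (10, 11, 12)) / 2 ** n  # p(10, 11, or 12 pluses)
p_two_tailed = 2 * upper  # double the tail: the P = 0.50 distribution is symmetrical

print(round(p_two_tailed, 4))  # → 0.0386
```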

PROBLEM 2.
You have good reason to believe a particular TV program is causing increased violence in teenagers. To test this hypothesis, you conduct an
experiment in which 15 individuals are randomly sampled from teenagers attending your neighborhood high school. Each subject is run in an
experimental and a control condition. In the experimental condition, the teenagers watch the TV program for 3 months, during which you record
the number of violent acts committed. The control condition also lasts for 3 months, but the teenagers are not allowed to watch the program
during this period. At the end of each 3-month period, you total the number of violent acts committed. The results are given:

A. What is the directional alternative hypothesis?


B. What is the null hypothesis?
C. Using alpha = 0.01, one-tailed, what do you conclude?
D. What error might you be making by your conclusion in part c?
E. To what population does your conclusion apply?
A. Directional alternative hypothesis (H1): Watching the TV program causes increased violence in teenagers.
B. Null hypothesis (H0): Watching the TV program does not increase violence in teenagers.
C. Conclusion, using alpha = 0.01, one-tailed:
Step 1: Calculate the number of pluses and minuses. Subtract the “not viewing” scores from the corresponding “viewing” scores. Result: 12 pluses and 3
minuses.
Step 2: The binomial distribution is appropriate for this determination. N = the number of difference scores (pluses and minuses) = 15. P = the
probability of a plus with any subject; we can evaluate the null hypothesis by assuming chance alone accounts for whether any subject scores a
plus or a minus. Therefore, P = 0.50. The obtained result was 12 pluses and 3 minuses, i.e., 12 P events. Since the evaluation is one-tailed, the probability of 12
pluses or more equals the probability of 12, 13, 14, or 15 pluses.

p(12, 13, 14, or 15 pluses) = p(12) + p(13) + p(14) + p(15) = 0.0139 + 0.0032 + 0.0005 + 0.0000 = 0.0176
Since 0.0176 > 0.01, retain H0 (fail to reject H0).
D. Possible error: By retaining the null hypothesis, you might be making a Type II error. The TV program may actually cause increased
violence in teenagers.
E. Population: These results apply to the population of teenagers attending your neighborhood high school, from which the sample was
taken.
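The one-tailed probability in Problem 2 can be checked with the same stdlib approach (a sketch added to these notes, not part of the original solution):

```python
from math import comb

# Sign test for Problem 2: N = 15 difference scores, P = 0.50 under H0,
# obtained result = 12 pluses, one-tailed evaluation in the predicted direction.
n = 15
p_one_tailed = sum(comb(n, k) for k in (12, 13, 14, 15)) / 2 ** n

print(round(p_one_tailed, 4))  # → 0.0176
```

Since 0.0176 exceeds the alpha level of 0.01, H0 is retained, matching the solution above.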

● If we are able to reject the null hypothesis the results are significant - statistically significant. That is, the results are probably not due
to chance, the independent variable has had a real effect, and if we repeat the experiment, we would again get results that would allow
us to reject the null hypothesis
● “Reliable” is the better term to use; however, the usage of “significant” is well established.
● The point is that we must not confuse statistically significant with practically or theoretically important. A statistically significant effect
says little about whether the effect is an important one.
● The importance of an effect generally depends on the size of the effect. The larger the effect, the more likely it is to be an important
effect.
INTRO TO SPSS
- STATISTICAL PACKAGE FOR THE SOCIAL SCIENCES
It is a statistical software package that runs on PCs and Macs for editing and analyzing all sorts of data. It was first launched in 1968 and has
been owned by IBM since 2009.
It is widely used within psychology and is available in universities for scientific research and complex statistical data analysis.
It can open all file formats that are commonly used for structured data, such as MS Excel or OpenOffice spreadsheets.
SPSS DATA VIEW
After opening data, SPSS displays them in a spreadsheet-like fashion, as shown in the screenshot below.
This sheet, called Data View, always displays our data values.

For instance, our first record seems to contain a male respondent from 1979, and so on.

SPSS VARIABLE VIEW


An SPSS data file always has a second sheet called Variable View. It shows the metadata associated with the data. Metadata is information
about the meaning of variables and data values. This is generally known as the “codebook,” but in SPSS it is called the dictionary.
For non-SPSS users, the look and feel of SPSS’ Data Editor window probably comes closest to an Excel workbook containing two different but
strongly related sheets.
DATA ANALYSIS
How do you analyze your data in SPSS? One option is using SPSS’ elaborate menu options.
For instance, if our data contain a variable holding respondents’ incomes over 2010, we can compute the average income by navigating to
Descriptive Statistics as shown below.
Doing so opens a dialog box in which we select one or more variables and one or several statistics we’d like to inspect.
SPSS Output Window
After clicking OK, a new window opens up: SPSS’ Output Viewer window. It holds a table with all statistics on all variables we chose. The
screenshot below shows what it looks like.
As we see, the Output Viewer window has a different layout and structure than the Data Editor window we saw earlier. Creating output in
SPSS does not change our data in any way; unlike Excel, SPSS uses different windows for data and for research outcomes based on those data.

For non-SPSS users, the look and feel of SPSS’ Output Viewer window probably comes closest to a PowerPoint slide holding items such as
blocks of text, tables, and charts.
SPSS Reporting
SPSS output items, typically tables and charts, are easily copy-pasted into other programs. For instance, many SPSS users use a word
processor such as MS Word, OpenOffice, or Google Docs for reporting. Tables are usually copied in rich text format, which means they’ll
retain their styling such as fonts and borders. The screenshot below illustrates the result.
SPSS Syntax Editor Window
The output table we showed was created by running Descriptive Statistics from SPSS’ menu. Now, SPSS has a second option for running this
(or any other) command: we can open a third window, known as the Syntax Editor window. Here we can type and run SPSS code known as
SPSS syntax. For instance, running
descriptives income_2010.
has the exact same result as running this command from SPSS’ menu like we did earlier.
The basic point is that syntax can be saved, corrected, rerun, and shared between projects or users. Your syntax makes your SPSS work
replicable. If anybody raises any doubts regarding your outcomes, you can show exactly what you did and, if needed, correct and rerun it in
seconds.
For non-SPSS users, the look and feel of SPSS’ Syntax Editor window probably comes closest to Notepad: a single window basically just
containing plain text.

Opening Data Files


SPSS has its own data file format. Other file formats it easily deals with include MS Excel, plain text files, SQL, Stata, and SAS.

Editing Data
In real-world research, raw data usually need some editing before they can be properly analyzed. Typical examples are creating means or
sums as new variables, restructuring data, or detecting and removing unlikely observations. SPSS performs such tasks, and more complex
ones, with amazing efficiency.
For getting things done fast, SPSS contains many numeric functions, string functions, date functions, and other handy routines.

Tables and Charts


All basic tables and charts can be created easily and fast in SPSS. Typical examples are demonstrated under Data Analysis. A real weakness
of SPSS is that its charts tend to be ugly and often have a clumsy layout. A great way to overcome this problem is developing and applying
SPSS chart templates. Doing so, however, requires a fair amount of effort and expertise.

SPSS IMPORTANCE (TOP 3)
