Statistical Hypothesis Test
Hypothesis testing
One use of hypothesis testing is deciding whether experimental results contain enough
information to cast doubt on conventional wisdom.
Statistical hypothesis testing is a key technique of frequentist statistical inference. It is widely
used, but also much criticized. The Bayesian approach to hypothesis testing, itself controversial,
bases rejection of the hypothesis on the posterior probability. Other approaches to reaching a
decision based on data are available via decision theory and optimal decisions.
The critical region of a hypothesis test is the set of all outcomes which, if they occur, lead
us to decide that there is a difference; that is, they cause the null hypothesis to be rejected in
favor of the alternative hypothesis. The critical region is usually denoted by C.
1. The first step in any hypothesis test is to state the relevant null and
alternative hypotheses to be tested. This is important as mis-stating the hypotheses
will muddy the rest of the process.
2. The second step is to consider the statistical assumptions being made about the
sample in doing the test; for example, assumptions about the statistical independence or
about the form of the distributions of the observations. This is equally important as
invalid assumptions will mean that the results of the test are invalid.
3. Decide which test is appropriate, and state the relevant test statistic T.
4. Derive the distribution of the test statistic under the null hypothesis from the
assumptions. In standard cases this will be a well-known result. For example, the test
statistic may follow a Student's t distribution or a normal distribution.
5. The distribution of the test statistic partitions the possible values of T into those
for which the null hypothesis is rejected, the so-called critical region, and those for
which it is not.
6. Compute from the observations the observed value t_obs of the test statistic T.
7. Decide to either fail to reject the null hypothesis or reject it in favor of the
alternative. The decision rule is to reject the null hypothesis H0 if the observed
value t_obs is in the critical region, and to accept or "fail to reject" the hypothesis
otherwise.
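The steps above can be sketched for one simple case. The following is a minimal illustration, not a general implementation: it assumes independent success/failure trials, a null hypothesis H0: p = p0 tested against the one-sided alternative p > p0, and the number of successes as the test statistic T. The function names, the data and the values p0 = 0.5 and alpha = 0.05 are hypothetical choices made for the example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): the distribution of T under H0 (step 4)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_critical_region(n, p0, alpha):
    """Step 5: largest upper tail {c, ..., n} whose probability under H0 is <= alpha."""
    tail = 0.0
    c = n + 1  # empty region if even the single outcome n is too likely
    for k in range(n, -1, -1):
        tail += binomial_pmf(k, n, p0)
        if tail > alpha:
            break
        c = k
    return set(range(c, n + 1))

def run_test(observations, p0, alpha):
    """Steps 6-7: compute t_obs and apply the decision rule."""
    n = len(observations)
    region = upper_critical_region(n, p0, alpha)
    t_obs = sum(observations)  # test statistic T = number of successes
    decision = "reject H0" if t_obs in region else "fail to reject H0"
    return t_obs, decision

# Hypothetical data: 20 independent trials, 1 = success, 0 = failure.
data = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1]
t_obs, decision = run_test(data, p0=0.5, alpha=0.05)
print(t_obs, decision)
```

Note that the critical region is fixed from the null distribution and the chosen error rate before the observed value is examined, which mirrors the order of steps 5 and 6.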
It is important to note the philosophical difference between accepting the null hypothesis and
simply failing to reject it. The "fail to reject" terminology highlights the fact that the null
hypothesis is assumed to be true from the start of the test; if there is a lack of evidence against it,
it simply continues to be assumed true. The phrase "accept the null hypothesis" may suggest it
has been proved simply because it has not been disproved, a logical fallacy known as
the argument from ignorance. Unless a test with particularly high power is used, the idea of
"accepting" the null hypothesis may be dangerous. Nonetheless the terminology is prevalent
throughout statistics, where its meaning is well understood.
Reject Null Hypothesis: wrong decision (Type I Error) if the null hypothesis is true; right decision if the alternative hypothesis is true.
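If the null hypothesis is rejected only when all 25 cards are called correctly
(critical value c = 25), the probability of a false positive is (1/4)^25 ≈ 10^-15,
and hence very small: it is the probability of randomly guessing correctly all 25 times.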
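From all the numbers c for which the probability of a false positive does not
exceed the chosen error rate, we choose the smallest, in order to minimize the
probability of a Type II error, a false negative. For the above example with a 1%
error rate, we select c = 13: the probability of at least 13 correct guesses is
about 0.0034, while that of at least 12 is about 0.0107, which exceeds 1%.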
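The choice of the critical value can be checked numerically. The sketch below assumes the setting of the example as far as it survives in the text: 25 cards, a chance of 1/4 of a correct call under pure guessing, and a 1% error rate. The function names are hypothetical.

```python
from math import comb

def upper_tail(c, n=25, p=0.25):
    """P(X >= c) for X ~ Binomial(n, p): chance of at least c hits by pure guessing."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def smallest_critical_value(alpha, n=25, p=0.25):
    """Smallest c whose false positive probability P(X >= c | H0) is at most alpha."""
    for c in range(n + 2):  # c = n + 1 gives an empty tail with probability 0
        if upper_tail(c, n, p) <= alpha:
            return c

# False positive probability for a few candidate critical values.
for c in (10, 12, 13, 25):
    print(c, upper_tail(c))

print(smallest_critical_value(0.01))  # 13 under these assumptions
```

The search relies on the tail probability shrinking as c grows, so the first c that meets the bound is also the smallest.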
But what if the subject did not guess any card correctly? Having zero correct
answers is clearly an oddity too. The probability of guessing incorrectly once is
p' = (1 - p) = 3/4. By the same approach, the probability of randomly calling all
25 cards wrong is (3/4)^25 ≈ 0.00075.
This is highly unlikely (less than a 1 in 1000 chance). While the subject
can't guess the cards correctly, dismissing H0 in favour of H1 would be an error.
In fact, the result would suggest a trait on the subject's part of avoiding calling the
correct card. A test of this could be formulated: for a selected 1% error rate the
subject would have to answer correctly at least twice, for us to believe that card
calling is based purely on guessing.
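The lower-tail figures in this paragraph can be reproduced with a short calculation. This is a sketch under the same assumptions as the example (25 cards, chance 1/4 of a correct call by guessing); the names are hypothetical.

```python
from math import comb

n, p = 25, 0.25  # 25 cards, chance 1/4 of a correct call by pure guessing

def lower_tail(c):
    """P(X <= c) for X ~ Binomial(n, p): chance of at most c correct calls."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

p_all_wrong = (1 - p) ** n  # probability of calling all 25 cards wrong
print(p_all_wrong)          # below 1 in 1000
print(lower_tail(1))        # at most one correct call: below 1%
print(lower_tail(2))        # at most two correct calls: above 1%
```

Since the probability of at most one correct call is below 1% and that of at most two is above it, a 1% lower critical region consists of zero or one correct answer, which agrees with the requirement of at least two correct answers stated above.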
1. The null hypothesis was that the Lady had no such ability.
2. The test statistic was a simple count of the number of
successes in 8 trials.
3. The distribution associated with the null hypothesis was
the binomial distribution familiar from coin flipping
experiments.
4. The critical region was the single case of 8 successes in
8 trials based on a conventional probability criterion
(< 5%).
5. Fisher asserted that no alternative hypothesis was (ever)
required.
If and only if the 8 trials produced 8 successes was Fisher willing to reject
the null hypothesis – effectively acknowledging the Lady's ability with > 98%
confidence (but without quantifying her ability). Fisher later discussed the
benefits of more trials and repeated tests.