Statistics Project
Binf II-C
Abstract
This project deals with inferences about means. It shows in detail how to perform statistical
tests and draw conclusions by using hypothesis testing to compare the means of two different
populations, or of the same population measured twice. Based on sample data, we conduct the
test and reach conclusions supported by the evidence collected, although never with full
certainty. Whenever the population standard deviations are unknown and must be estimated from
the samples, we base our hypothesis testing on the t-distribution.
This project has four parts. The first part is the abstract itself, which describes the whole
project and what it consists of. The second part introduces the cases of inference about means:
the hypothesis tests conducted on them and the interval estimation of the difference between
two means, together with the important concepts and formulas. The third part presents the
problem in question, its analysis, and its solution both by the regular hypothesis testing
steps and in Excel. The last part is a conclusion that states the observed results and sums up
the project.
The problem we chose as an example is a two-tailed test on two matched samples whose
population parameters are unknown. We perform the test using the t-distribution because we do
not know the population means and standard deviations and therefore rely on the given samples
to estimate them.
When there is a relationship between the samples, they are referred to as dependent samples.
The data consist of matched pairs drawn from random samples.
When the values chosen for one sample determine the values in the second sample, the sampling
method is said to be dependent.
A common example of dependent samples is a before-and-after measurement on the same subjects.
Each object in the sample is measured twice: once at one point in time, and then again at a
later point in time.
Dependency can also emerge when objects are connected.
We use the differences between the paired values in our analysis. For each pair, we subtract
the values:
$d = x_1 - x_2$
This creates a new random variable d (the differences), and it is important to keep the sign,
whether positive or negative. We can compute d̄, the sample mean of the differences, and s_d,
the sample standard deviation of the differences, as follows:
$\bar{d} = \frac{\sum d}{n}, \qquad s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}}$
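As a minimal sketch of this step, the differences and their summary statistics can be computed with the Python standard library. The function names and the before/after values are hypothetical, invented only for illustration; they are not the data of the project's problem.

```python
from math import sqrt


def paired_differences(first, second):
    """Return the signed differences d = first - second for each matched pair."""
    if len(first) != len(second):
        raise ValueError("matched samples must have the same length")
    return [a - b for a, b in zip(first, second)]


def mean_and_sd(differences):
    """Return (d_bar, s_d): sample mean and sample standard deviation of d."""
    n = len(differences)
    if n < 2:
        raise ValueError("need at least two matched pairs")
    d_bar = sum(differences) / n
    # Sample standard deviation uses n - 1 in the denominator.
    s_d = sqrt(sum((d - d_bar) ** 2 for d in differences) / (n - 1))
    return d_bar, s_d


# Hypothetical before/after measurements (illustration only).
before = [10, 12, 9, 11]
after = [8, 13, 7, 10]

d = paired_differences(before, after)   # signs are kept: [2, -1, 2, 1]
d_bar, s_d = mean_and_sd(d)
print(d, d_bar, s_d)
```

Note that the second pair gives a negative difference; dropping the sign would change both the mean and the standard deviation.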
We use the same three pairs of null and alternative hypotheses (two-tailed, left-tailed, and
right-tailed), now stated in terms of the mean difference μd.
The critical value comes from the Student's t-distribution table with n – 1 degrees of
freedom, where n is the number of matched pairs. The test statistic follows the Student's
t-distribution:
$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$
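A short sketch of the matched-pairs test statistic and the critical-value decision, assuming the standard form t = (d̄ − μd) / (s_d / √n). The function names are hypothetical, and the critical value is passed in by hand because it is read from a t-table; the inputs below are illustrative numbers, not the project's data.

```python
from math import sqrt


def paired_t_statistic(d_bar, s_d, n, mu_d0=0.0):
    """Return (t, df) for a matched-pairs test of H0: mu_d = mu_d0."""
    if n < 2 or s_d <= 0:
        raise ValueError("need n >= 2 pairs and a positive standard deviation")
    t = (d_bar - mu_d0) / (s_d / sqrt(n))
    return t, n - 1  # degrees of freedom = number of pairs - 1


def reject_two_tailed(t, critical_value):
    """Two-tailed rule: reject H0 when |t| exceeds the table critical value."""
    return abs(t) > critical_value


# Illustrative inputs: d_bar = 1.0, s_d = sqrt(2), n = 4 matched pairs.
t, df = paired_t_statistic(1.0, sqrt(2), 4)
# 3.182 is the usual table value for a two-tailed test at alpha = 0.05, df = 3.
print(t, df, reject_two_tailed(t, 3.182))
```

With these inputs |t| is well below the critical value, so the null hypothesis would not be rejected.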
Using independent samples means that there is no relationship between the groups. The values
in one sample have no association with the values in the other sample.
With a two-sample t-test, we compare the population means to each other and again look at
the difference. We expect that x̄1 – x̄2 would be close to μ1 – μ2. The test statistic uses
both sample means, sample standard deviations, and sample sizes.
Both samples must be independent random samples. The populations must be normally
distributed, or both sample sizes must be large enough (n1 ≥ 30 and n2 ≥ 30). We again use the
same three pairs of null and alternative hypotheses.
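The test statistic is Welch's approximation, which applies under the assumption that the
population variances of the two independent samples are not equal:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
This test statistic follows the Student's t-distribution with degrees of freedom given by the
formula below:
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$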
When working a problem by hand, a simpler way to find the degrees of freedom is to use the
smaller of n1 – 1 and n2 – 1.
This strategy yields fewer degrees of freedom and, as a result, a larger critical value.
This makes the test more conservative, as rejecting the null hypothesis requires more evidence.
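The sketch below computes Welch's statistic together with both choices of degrees of freedom, assuming the usual Welch-Satterthwaite formula for the approximate value. The function name and the summary statistics are hypothetical and serve only to show that the by-hand rule never gives more degrees of freedom than the formula.

```python
from math import sqrt


def welch_t(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Return (t, df_welch, df_conservative) for unequal population variances."""
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    t = ((mean1 - mean2) - hypothesized_diff) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (usually not a whole number).
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # By-hand shortcut: the smaller of n1 - 1 and n2 - 1.
    df_conservative = min(n1 - 1, n2 - 1)
    return t, df_welch, df_conservative


# Hypothetical summary statistics (illustration only).
t, df_welch, df_conservative = welch_t(20.0, 4.0, 16, 17.0, 3.0, 9)
print(round(t, 4), round(df_welch, 2), df_conservative)
```

Because the shortcut value is smaller, the table critical value is larger and the test rejects less readily.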
Welch's t-test does not assume that the two population variances are equal, so it can be used
whether or not they are.
The pooled t-test is a statistical test that assumes equal population variances.
Pooling means computing a weighted average of the two independent sample variances, and the
pooled test statistic uses this weighted average.
The advantage of this test statistic is that, when its assumptions hold, it exactly follows the
Student's t-distribution with n1 + n2 – 2 degrees of freedom.
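A minimal sketch of pooling, assuming the standard weights n1 – 1 and n2 – 1 for the two sample variances. The function name and the input values are hypothetical.

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Return (t, df, pooled_variance) assuming equal population variances."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances (weights n1 - 1 and n2 - 1).
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - hypothesized_diff) / standard_error
    return t, df, pooled_variance


# Hypothetical summary statistics (illustration only).
t, df, sp2 = pooled_t(20.0, 4.0, 16, 17.0, 3.0, 9)
print(round(t, 4), df, round(sp2, 4))
```

The pooled variance always lies between the two sample variances, closer to the one from the larger sample.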
On the basis of sample data, it may be difficult to establish that two population variances are
equal.
The F-test is a popular way to compare variances, but it is not very robust.
Even small departures from normality have a significant impact on the outcome, making the
F-test results untrustworthy.
It can be difficult to tell whether a significant F-test result is due to non-normality or to
a real difference in variances.
As a result, when comparing two means, many researchers use Welch's t-test.
P-value approach
There is another way to decide whether or not to reject the null hypothesis. After the test
statistic is computed, we calculate the p-value: the probability, assuming the null hypothesis
is true, of obtaining a test statistic at least as extreme as the one observed in the sample.
In other words, it measures how incompatible the sample data are with the null hypothesis. If
the p-value is less than or equal to the level of significance, we reject the null hypothesis.
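To illustrate the decision rule without a statistics package, the sketch below approximates the two-tailed p-value by integrating the Student's t density numerically with Simpson's rule. This is one possible approach, not necessarily what Excel does internally; the function names and the step count are assumptions made for the example.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def two_tailed_p_value(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # By symmetry, both tails together are 1 minus twice this area.
    return max(0.0, 1.0 - 2.0 * area_0_to_t)


def reject_null(p_value, alpha=0.05):
    """Reject H0 when the p-value is at most the significance level."""
    return p_value <= alpha


p = two_tailed_p_value(1.0, 1)  # with df = 1 and t = 1 this is about 0.5
print(p, reject_null(p))
```

In practice a spreadsheet or statistics library reports this value directly; the point of the sketch is only that the p-value is a tail area under the t curve.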
A hypothesis test answers the question about the difference of the means. However, we can
answer the same question by constructing a confidence interval for the difference of the
means. The process is just like the one for the confidence intervals from Chapter 2:
1. Find the critical value.
2. Compute the margin of error.
3. Compute the point estimate ± margin of error.
Because we are working with two samples, we must modify the components of the confidence
interval to incorporate the information from the two populations.
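Critical value: $t_{\alpha/2}$
Margin of error: $E = t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
The confidence interval takes the form of the point estimate plus or minus the margin of
error:
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
OR
$(\bar{x}_1 - \bar{x}_2) \pm E$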
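The three steps can be sketched as follows, assuming the unequal-variance standard error for the difference of two sample means. The function name and the summary statistics are hypothetical, and the critical value is supplied by hand from a t-table.

```python
from math import sqrt


def difference_confidence_interval(mean1, s1, n1, mean2, s2, n2, critical_value):
    """Return (lower, upper) for mu1 - mu2 using point estimate +/- margin."""
    point_estimate = mean1 - mean2
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    margin_of_error = critical_value * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error


# Hypothetical summary statistics. 2.306 is the usual table value for 95%
# confidence with df = 8, i.e. the smaller of n1 - 1 and n2 - 1.
lower, upper = difference_confidence_interval(20.0, 4.0, 16, 17.0, 3.0, 9, 2.306)
print(round(lower, 4), round(upper, 4))
```

If the interval contains 0, as it does for these illustrative numbers, the two-tailed test at the matching significance level would not reject equality of the means.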
When the standard deviations of the populations are known, we use the z-statistic instead to
compute the interval estimate.
Problem Analysis
As we can see, this is a case of matched samples in which the population size, mean, and
standard deviation are unknown. The key to solving this problem is the mean of the sample
differences, around which the whole problem is built. The hand calculations rely on
approximate values, since the t-distribution tables provided do not give exact p-values; we
use the table only to find a range for the p-value. Computer software helps us determine the
exact p-value.
H0: μd = 0
Ha: μd ≠ 0
α = 0.05
P-value approach
Excel spreadsheet
References
Essentials of Statistics for Business and Economics.
Lesson slides
courses.lumenlearning.com