Video Questions
VANGUARDIA PROGRAM
The degrees of freedom are the number of independent values that are free to
vary when we test our hypothesis. They represent the amount of independent
information available to estimate the distribution of the data that we want to
analyze.
A certainty level of 95% means that we can assume that the results are equal or
close to the estimated results 95% of the time: if the sampling were repeated,
about 95% of the intervals built this way would contain the true value. On the
other hand, a p-level of 5% or less means that, if the null hypothesis were
true, a result at least this extreme would occur with a probability of 5% or
less.
• How is the level of significance for testing determined? Who sets this
level of significance?
The level of significance is set by the person who is doing the research,
before the test is carried out. It depends on the type of study and the topic
to be studied, and on how significant or representative the analysis needs to
be. It is the risk that you accept of rejecting the null hypothesis when it is
actually correct.
• Explain in your own words what is the ‘Central Limit Theorem’ and what
are the useful implications of that theorem for studying sample means or
sample percentages when taking a random sample from a population.
• In the context of the Law of Large Numbers, explain in words (or in maths)
the relationship between the size of the random sample drawn from a
population and the precision with which inferences can be made about
the characteristics of the population from which the sample is drawn.
The relationship between the size of the random sample and the precision with
which inferences can be made is that, as the number of observations increases,
the standard error of the sample mean decreases in proportion to 1/√n, so the
sample mean tends to the population value as the sample size tends to infinity.
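The relationship can be illustrated with a small sketch. It assumes a
hypothetical population standard deviation of 2.0 (not a value from the data
set) and shows that quadrupling the sample size halves the standard error of
the mean.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean for a sample of size n."""
    return sigma / math.sqrt(n)

sigma = 2.0                      # hypothetical population standard deviation
sizes = [25, 100, 400, 1600]     # each size is four times the previous one
errors = [standard_error(sigma, n) for n in sizes]

for n, se in zip(sizes, errors):
    print(f"n = {n:5d}  standard error = {se:.3f}")
```

Each step multiplies n by 4 and divides the standard error by 2, which is the
1/√n behaviour.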
The main difference between these two concepts is that a correlation coefficient
measures the strength and direction of the relation between two quantitative
variables in a single number, whereas a contingency table cross-tabulates the
frequencies of two or more categorical variables. The table can help to get a
brief summary of the situation of the data.
• What are the problems that you may encounter with the quality of your
data? How do you analyze a data matrix for possible data problems? How
do you deal with such data problems?
Some of the problems with the quality of the data and their solutions are:
Missing data.
Missing data are either accepted as missing, reducing your effective number of
observations, or replaced by ‘acceptable’ values. Missing data are best
reported with a specific code. Accepting them as missing reduces your effective
number of observations in one of two ways:
- Listwise deletion: eliminates the complete row from your data set.
- Pairwise deletion: leaves out only the missing datum, retaining the other data
  in the row for the calculations that do not involve it.
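The two deletion strategies can be sketched as follows. The rows, the variable
names a, b, c and the use of None as the missing-data code are all hypothetical
and only serve to show how many observations each strategy keeps.

```python
MISSING = None  # hypothetical code used to mark a missing value

# Hypothetical data matrix: four observations, three variables.
rows = [
    {"a": 1.0, "b": 2.0, "c": 3.0},
    {"a": 2.0, "b": MISSING, "c": 5.0},
    {"a": 3.0, "b": 4.0, "c": MISSING},
    {"a": 4.0, "b": 6.0, "c": 8.0},
]

def listwise(rows):
    """Keep only the rows that have no missing value in any variable."""
    return [r for r in rows if all(v is not MISSING for v in r.values())]

def pairwise(rows, x, y):
    """Keep the (x, y) pairs for which both of these two variables are present."""
    return [(r[x], r[y]) for r in rows
            if r[x] is not MISSING and r[y] is not MISSING]

complete_rows = listwise(rows)        # rows usable for every analysis
pairs_a_c = pairwise(rows, "a", "c")  # rows usable for an a-versus-c analysis

print(len(complete_rows), len(pairs_a_c))
```

Listwise deletion leaves two complete rows here, while pairwise deletion still
has three usable pairs for the variables a and c.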
• Assume that you have made 500 observations, of which 150 belong to a
first category, 100 to a second, 160 to a third category and 90 to a fourth
category. Test the (null) hypothesis that the four categories actually may
have equal probability.
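One way this test could be carried out is a chi-square goodness-of-fit test
with an expected frequency of 500/4 = 125 per category. The sketch below is an
illustration, not the original answer; the critical value 7.815 is the
tabulated chi-square value for 3 degrees of freedom at the 5% level.

```python
observed = [150, 100, 160, 90]          # frequencies given in the question
total = sum(observed)                   # 500 observations
expected = total / len(observed)        # 125 per category under the null

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_square = sum((o - expected) ** 2 / expected for o in observed)
degrees_of_freedom = len(observed) - 1  # 3

CRITICAL_5PCT_DF3 = 7.815               # tabulated value, alpha = 0.05, df = 3
reject_null = chi_square > CRITICAL_5PCT_DF3

print(chi_square, degrees_of_freedom, reject_null)
```

The statistic is far above the critical value, so under these assumptions the
hypothesis of equal probabilities would be rejected.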
• Take variable x4 in the example data set. Test the hypothesis that the
numbers of the (ordered) response categories follow a Normal
distribution. Explain how you would carry out the test and carry out the
test.
To carry out the test, the data are first distributed into categories; the test
can then be carried out numerically or graphically.
Numerical solution
Columns: Category | Category mean | Standard normal value of the category border | Area under the normal curve | Observed frequencies | Expected frequencies | Difference | Squared difference | Squared difference / expected frequency
Graphical Solution
• Compute a new variable x3+x4+x6 in the data set. For that variable
find the median, the interquartile range and the range between the 10th
and 90th percentile value. Compare the median to the mean and the mode.
X3+X4+X6
Median 10.50
Mean 11.07
Mode 9.00
Interquartile range 4.00
10th percentile 8.10
90th percentile 15.00
min 6.00
max 17.00
Standard deviation 2.37
• Verify that the mean of the new variable is 0 and that the standard
deviation is 1.0
X3+X4+X6 (values)
Mean 11.07
Standard deviation 2.65
min 6.00
max 17.00
Ho: mean 10.00
Certainty level 95%
Student's t (critical value) 2.05
t 2.20
Conclusion Reject Ho
The t value (2.20) exceeds the critical Student's t value (2.05) for a
certainty level of 95% and 29 degrees of freedom, so the null hypothesis should
be rejected.
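The arithmetic behind this conclusion can be sketched as a one-sample t test.
The sample size n = 30 is an assumption inferred from the 29 degrees of
freedom, and 2.045 is the tabulated two-tailed critical value for 29 degrees of
freedom. With the rounded mean and standard deviation the statistic comes out
at about 2.2.

```python
import math

sample_mean = 11.07
sample_sd = 2.65
n = 30               # assumed: 29 degrees of freedom + 1
null_mean = 10.0     # Ho: mean = 10
T_CRITICAL = 2.045   # tabulated two-tailed Student's t, 95%, 29 df

# t = (sample mean - hypothesised mean) / standard error of the mean
standard_error = sample_sd / math.sqrt(n)
t_value = (sample_mean - null_mean) / standard_error
reject_null = abs(t_value) > T_CRITICAL

print(round(t_value, 2), reject_null)
```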
• Variable x1 has a sample mean of 50%. Draw a 95% confidence interval (2-
tailed) around this value of the mean.
• In the example above (n° 3), with a sample mean of 0.50, how large a
sample size do you need in order to obtain a 95% confidence interval of
+/- .03?
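The sample size needed to obtain a 95% confidence interval of +/- 0.03 is found
by treating the half-width 0.03 as twice the standard error of the proportion,
which gives the following expressions:

$$\sigma = \frac{0.03}{2} = 0.015, \qquad \mu = 50\% = 0.5$$

$$0.03 = 2\sqrt{\frac{0.5\,(1 - 0.5)}{n}}$$

$$0.015^{2} = \frac{0.5\,(1 - 0.5)}{n}$$

$$n = \frac{0.25}{0.015^{2}} \approx 1111 \quad \text{(roughly 1100)}$$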
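The same calculation as a small sketch: with p = 0.5 and a standard error of
0.03/2 = 0.015, n = p(1 − p)/SE² comes out at about 1111, in line with the
rounded figure of roughly 1100. The function name sample_size and the optional
multiplier z are illustrative; z = 2 is the rule of thumb used here, and 1.96
is the more exact normal value.

```python
import math

def sample_size(p, half_width, z=2.0):
    """Sample size so that z standard errors of a proportion equal half_width."""
    standard_error = half_width / z
    return p * (1 - p) / standard_error ** 2

n_exact = sample_size(0.5, 0.03)          # using 2 standard errors
n_needed = math.ceil(n_exact)             # round up to a whole observation
n_with_196 = math.ceil(sample_size(0.5, 0.03, z=1.96))

print(n_exact, n_needed, n_with_196)
```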
• If the sample % is 50%, as in the data set for variable x1, then how likely is
it that the population percentage of x1 is actually less than 40%? (one-
tailed test)
Using the x1 data, the standard deviation is 𝜎 = 0.51. With this value, the
probability that the population percentage is less than 40% is about 15%, that
is, a one-tailed probability of 0.15.
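A sketch of how a figure of this size can be obtained with a normal
approximation. The sample size n = 30 is an assumption (it is not stated in
this passage), so the result is only indicative; under that assumption the
one-tailed probability is close to the 15% quoted.

```python
import math
from statistics import NormalDist

sample_p = 0.50    # sample percentage of x1
threshold = 0.40   # population percentage to test against
sigma = 0.51       # standard deviation of x1 quoted in the text
n = 30             # assumption: sample size not stated in this passage

# Standard error of the sample proportion, then the z value of the threshold
standard_error = sigma / math.sqrt(n)
z = (threshold - sample_p) / standard_error

# One-tailed probability of a value below the threshold
probability = NormalDist().cdf(z)

print(round(z, 2), round(probability, 2))
```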
The null hypothesis, x1-x2=0, cannot be rejected because the probability of the
test (50%) is greater than the 5% significance level that corresponds to a
certainty level of 95%.
• Now carry out the same test using a regression approach, i.e. regressing
variable x4 on a dummy variable to represent membership of group 2;
verify that the results of this approach are identical to those of the first
approach.
Obs.  X4
1     5.00
2     2.00
3     3.00
4     3.00
5     3.00
6     2.00
7     4.00
8     5.00
9     5.00
10    1.00
11    3.00
12    2.00
13    3.00
14    4.00
15    4.00
16    1.73
17    -0.27
18    0.73
19    -0.27
20    -1.27
21    -2.27
22    -2.27
23    -2.27
24    -0.27
25    -1.27
26    0.73
27    0.73
28    1.73
29    1.73
30    -2.27

Variable                  Mean    Standard deviation   Degrees of freedom
x1 (observations 1-15)    3.27    1.22                 14
x2 (observations 16-30)   -0.33   1.53                 14

Both groups: Standard deviation 2.76 | Degrees of freedom 28 | t -0.66 | Probability 0.51
The test with the dummy variable gives a probability of 51%, which is close to
the 50% obtained in the previous analysis.
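Why the two approaches agree can be checked with a short sketch: the t value
of the slope in a regression on a 0/1 dummy is the same as the pooled
two-sample t value. The two small groups below are hypothetical illustration
data, not the x4 values, and the function names are illustrative.

```python
import math

def pooled_t(group_1, group_2):
    """Two-sample t value (group 2 minus group 1) with pooled variance."""
    n1, n2 = len(group_1), len(group_2)
    m1, m2 = sum(group_1) / n1, sum(group_2) / n2
    ss1 = sum((v - m1) ** 2 for v in group_1)
    ss2 = sum((v - m2) ** 2 for v in group_2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def dummy_regression_t(group_1, group_2):
    """t value of the slope when y is regressed on a dummy (1 = group 2)."""
    y = list(group_1) + list(group_2)
    d = [0] * len(group_1) + [1] * len(group_2)
    n = len(y)
    d_mean, y_mean = sum(d) / n, sum(y) / n
    sxx = sum((di - d_mean) ** 2 for di in d)
    sxy = sum((di - d_mean) * (yi - y_mean) for di, yi in zip(d, y))
    slope = sxy / sxx                       # equals the difference in means
    intercept = y_mean - slope * d_mean     # equals the mean of group 1
    rss = sum((yi - intercept - slope * di) ** 2 for di, yi in zip(d, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope / se_slope

group_1 = [5, 2, 3, 3, 3, 2]   # hypothetical values
group_2 = [1, 3, 4, 2, 2, 4]   # hypothetical values

t_two_sample = pooled_t(group_1, group_2)
t_regression = dummy_regression_t(group_1, group_2)
print(t_two_sample, t_regression)
```

Both functions use n − 2 degrees of freedom, so the associated probabilities
are identical as well.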
• Carry out the problem for n° 1 above, assuming that the 15 observations
are paired (dependent). Compare the significance level of this test with
that of the test for problem n° 1.
Obs.  X4
1     5.00
2     2.00
3     3.00
4     3.00
5     3.00
6     2.00
7     4.00
8     5.00
9     5.00
10    1.00
11    3.00
12    2.00
13    3.00
14    4.00
15    4.00
16    1.73
17    -0.27
18    0.73
19    -0.27
20    -1.27
21    -2.27
22    -2.27
23    -2.27
24    -0.27
25    -1.27
26    0.73
27    0.73
28    1.73
29    1.73
30    -2.27

Variable                  Mean    Standard deviation   Degrees of freedom
x1 (observations 1-15)    3.27    1.18                 14
x2 (observations 16-30)   -0.33   1.48                 14

Both groups: Standard deviation 1.48 | Degrees of freedom 28 | t 1.23 | Probability 0.23