Slides by Spiros Velianitis, CSUS
Slide 1
Review Objective
Stat 1 Review Summary Slide
Statistic vs. Parameter
In order for managers to make good decisions, they need information. Information is
derived by summarizing data (often obtained via samples) since data, in its original
form, is hard to interpret. This is where statistics come into play: a statistic is nothing
more than a quantitative value calculated from a sample.
There are many different statistics that can be calculated from a sample. Since we want
to use statistics to make decisions, only a few of them are usually of interest. These
useful statistics estimate characteristics of the population, which, when quantified,
are called parameters.
Some of the most common parameters are μ (population mean), σ² (population
variance), σ (population standard deviation), and P (population proportion).
The key point here is that managers must make decisions based upon their perceived
values of parameters. Usually the values of the parameters are unknown. Thus,
managers must rely on data from a sample of the population, summarized as
statistics, in order to estimate the parameters. The statistics used to estimate the
parameters listed above (μ, σ², σ, and P) are x̄ (sample mean), s² (sample
variance), s (sample standard deviation), and p̂ (sample proportion).
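As a minimal sketch, the four sample statistics can be computed with Python's standard `statistics` module. The sample values and the condition used for the proportion (value greater than 11) are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import statistics

# Hypothetical sample of 8 observations (illustrative values only)
sample = [12.0, 15.0, 9.0, 14.0, 10.0, 13.0, 11.0, 12.0]

x_bar = statistics.mean(sample)      # sample mean, estimates mu
s2 = statistics.variance(sample)     # sample variance (n - 1 divisor), estimates sigma^2
s = statistics.stdev(sample)         # sample standard deviation, estimates sigma

# Sample proportion: share of observations meeting a hypothetical condition (value > 11)
p_hat = sum(1 for v in sample if v > 11) / len(sample)

print(f"x_bar = {x_bar:.3f}, s^2 = {s2:.3f}, s = {s:.3f}, p_hat = {p_hat:.3f}")
```

Note that `statistics.variance` and `statistics.stdev` divide by n − 1, which is the sample version; `statistics.pvariance` and `statistics.pstdev` divide by n and correspond to σ² and σ when the whole population is available.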
Mean and Variance
Two very important parameters that managers focus on frequently are the mean
and the variance.
The mean, which is frequently referred to as “the average,” provides a measure of
central location.
The variance describes the amount of dispersion within the population. The
square root of the variance is called the standard deviation. In finance, the standard
deviation of stock returns is called volatility.
For example, consider a portfolio of stocks. Because the rate of return from such a
portfolio varies from one time period to the next, one may wish to know both the
average rate of return (mean) and how much variation there is in the returns. The
rate of return for a period is calculated as follows:
return = (New Price - Old Price) / Old Price
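The return formula, the mean return, and volatility can be sketched in a few lines of Python. The price series is hypothetical, and the volatility computed here is the per-period sample standard deviation of returns (no annualization is assumed).

```python
import statistics

def simple_returns(prices):
    """Period-by-period rate of return: (new price - old price) / old price."""
    return [(new - old) / old for old, new in zip(prices, prices[1:])]

# Hypothetical end-of-period portfolio values (illustrative only)
prices = [100.0, 105.0, 102.9, 108.045]

returns = simple_returns(prices)
mean_return = statistics.mean(returns)    # average rate of return per period
volatility = statistics.stdev(returns)    # sample standard deviation of the returns

print([round(r, 4) for r in returns])
print(round(mean_return, 4), round(volatility, 4))
```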
The median is another measure of central location and is the value in the middle
when the data are arranged in ascending order (with an even number of values, it is
the average of the two middle values).
The mode is a third measure of central location and is the value that occurs the
most often in the data.
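The three measures of central location can be compared on one small hypothetical data set using the standard `statistics` module. With eight values, `statistics.median` averages the two middle values of the sorted data.

```python
import statistics

# Hypothetical data set (illustrative only); sorted it is 2, 3, 4, 5, 7, 7, 7, 9
data = [3, 7, 7, 2, 9, 7, 4, 5]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle of the sorted data (average of 5 and 7 here)
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```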
Exercises
Sampling Distribution
In order to understand statistics and not just “plug” numbers into formulas,
one needs to understand the concept of a sampling distribution. In
particular, one needs to know that every statistic has a sampling
distribution, which shows every possible value the statistic can take on and
the corresponding probability of occurrence.
What does this mean in simple terms? Consider a situation where you wish
to estimate the mean age of all students at CSUS. If you take a random
sample of size 25, you will get one value for the sample mean (average).
If you take another random sample of size 25, will you get the same
sample mean? Probably not. Now suppose you take many samples, each of
size 25, and graph the distribution of the sample means. What would such
a graph show? It shows the sampling distribution of the sample mean,
from which probabilistic statements about the population mean can be
made.
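The thought experiment above can be simulated. This is a sketch under stated assumptions: the "population" is 5,000 synthetic ages drawn uniformly from 18 to 45 (not real CSUS data), and 2,000 samples of size 25 are drawn from it. The sample means cluster around the population mean, with a spread close to σ/√n.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: 5,000 synthetic student ages (not real data)
population = [rng.randint(18, 45) for _ in range(5000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

# Draw many random samples of size 25 and record each sample mean
n, num_samples = 25, 2000
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(num_samples)]

print(f"population mean      = {mu:.2f}")
print(f"mean of sample means = {statistics.mean(sample_means):.2f}")
print(f"sd of sample means   = {statistics.stdev(sample_means):.2f}"
      f"  (sigma / sqrt(n) = {sigma / n ** 0.5:.2f})")
```

A histogram of `sample_means` would show the roughly bell-shaped sampling distribution discussed next.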
Normal Distribution
For the situation described in the previous slide, the distribution of the sample mean
will follow a normal distribution, at least approximately when the sample size is large
enough. What is a normal distribution? Suppose the random variable X follows a
normal distribution. Then the distribution has the following attributes:
1. It is bell-shaped.
2. It is symmetrical about the mean.
3. It depends on two parameters: the mean (μ) and the variance (σ²).
From a manager’s perspective, it is very important to know that for a normal
distribution approximately:
68% of all observations fall within 1 standard deviation of the mean:
$P(\mu - \sigma \le X \le \mu + \sigma) \approx 0.68$.
95% of all observations fall within 2 standard deviations of the mean:
$P(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 0.95$.
99.7% of all observations fall within 3 standard deviations of the mean:
$P(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 0.997$.
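When μ = 0 and σ = 1, we have the so-called standard normal distribution; a standard
normal random variable is usually denoted by Z. Any value x from a normal distribution
can be standardized as $z = (x - \mu)/\sigma$, which is called its Z-score.
For a table of standard normal probabilities, see http://www.statsoft.com/textbook/sttable.html#z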
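The 68/95/99.7 figures and Z-scores can be checked without a table. This sketch writes the standard normal CDF in terms of `math.erf` (a standard identity); the value 130 with μ = 100 and σ = 15 is a hypothetical example.

```python
import math

def std_normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal X (same as P(-k <= Z <= k))."""
    return std_normal_cdf(k) - std_normal_cdf(-k)

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))

# Hypothetical example: 130 from a normal population with mu = 100, sigma = 15
print(z_score(130, 100, 15))
```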
Central Limit Theorem
Chi-Square (χ²) Distribution and the F Distribution
The sample variance
How to Look up the Chi Square Distribution table
The Chi-square distribution's shape is determined by its degrees of freedom.
As shown in the illustration below, the values inside this table are critical
values of the Chi-square distribution with the corresponding degrees of
freedom. To find the value from a Chi-square distribution (with given
degrees of freedom) that has a given area to its right, go to the column for that
area and the row for the desired degrees of freedom. For example, the .25
critical value for a Chi-square distribution with 4 degrees of freedom is 5.38527.
This means that the area to the right of 5.38527 in a Chi-square distribution with
4 degrees of freedom is .25.
Right tail areas for the Chi-square Distribution
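The table entry quoted above (5.38527 for area .25 and 4 degrees of freedom) can be reproduced numerically. This is a minimal sketch that only handles even degrees of freedom, where the right-tail area has the closed form e^(−x/2) Σ (x/2)^i / i!; the critical value is then found by bisection. For odd degrees of freedom, a library routine such as SciPy's `scipy.stats.chi2.isf(area, df)` would be the usual choice.

```python
import math

def chi2_right_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an EVEN number of degrees of freedom.

    Closed form for even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

def chi2_critical_value_even_df(area, df, hi=1000.0):
    """Find x whose right-tail area equals `area`, by bisection (the tail shrinks as x grows)."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_right_tail_even_df(mid, df) > area:
            lo = mid   # too much area to the right of mid: the answer is larger
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(chi2_right_tail_even_df(5.38527, 4), 5))
print(round(chi2_critical_value_even_df(0.25, 4), 5))
```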
Student’s t-Distribution
How to Look up the t Distribution table
Confidence Intervals
Constructing a confidence interval estimate of the unknown
value of a population parameter is one of the most common
statistical inference procedures.
A confidence interval is an interval of values computed from
sample data that is likely to include the true population value.
The confidence level is the long-run proportion of intervals, constructed this
way from repeated samples, that actually contain the true population value.
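The meaning of the confidence level can be illustrated by simulation. This sketch assumes a hypothetical normal population with known μ = 50 and σ = 10 (so the simpler z-interval x̄ ± 1.96σ/√n applies), builds 5,000 intervals from samples of size 30, and counts how many contain μ. The share should come out close to 0.95.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population: normal with known mean and standard deviation
MU, SIGMA = 50.0, 10.0
n, trials, z = 30, 5000, 1.96   # 1.96 is the z value for 95% confidence

covered = 0
for _ in range(trials):
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    x_bar = statistics.mean(sample)
    half_width = z * SIGMA / n ** 0.5     # sigma treated as known in this sketch
    if x_bar - half_width <= MU <= x_bar + half_width:
        covered += 1

coverage = covered / trials
print(f"share of intervals containing mu: {coverage:.3f}")
```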
Confidence Interval Example
Suppose you wish to make an inference about the average income for all students
at Sacramento State (population mean μ, a parameter). From a sample of 45
Sacramento State students, one can come up with a point estimate (a sample
statistic used to estimate a population parameter), such as $24,000. But what does
this mean? A point estimate does not take into account the accuracy of the
calculated statistic. We also need to know the variation of our estimate. We are not
absolutely certain that the mean income for Sacramento State students is $24,000
since this sample mean is only an estimate of the population mean. If we collect
another sample of 45 Sacramento State students, we would have another estimate
of the mean. Thus, different samples yield different estimates of the mean for the
same population. How close these sample means are to one another determines the
variation of the estimate of the population mean.
A statistic that measures the variation of our estimate is the standard error of the
mean, estimated by $s/\sqrt{n}$. It differs from the sample standard deviation (s): the
sample standard deviation reveals the variation in our data, while
the standard error of the mean reveals the variation of our sample mean. The
standard error of the mean is a measure of how much error we can expect when
we use the sample mean to estimate the population mean. The smaller the standard
error, the more precise our sample estimate.
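A short sketch of the distinction: the standard deviation describes the spread of the data, while the standard error s/√n describes the spread of the sample mean and shrinks as the sample grows. The income figures are hypothetical, and `standard_error` is a helper defined here for illustration.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical incomes in dollars (illustrative values, not survey data)
small = [20000, 22000, 24000, 26000, 28000]
large = small * 4   # same values repeated: similar spread, four times the sample size

for sample in (small, large):
    print(len(sample), round(statistics.stdev(sample)), round(standard_error(sample)))
```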
Confidence Interval Example (cont)
To convey how precise a point estimate is, one needs to provide a
confidence interval.
A confidence interval is a range of values that one believes to contain the
population parameter of interest and places an upper and lower bound
around a sample statistic.
To construct a confidence interval, we need to choose a confidence level. A
95% confidence interval (95% = 1 - 5%, where 5% is the level of significance,
α) is often used to assess the variability of the sample mean. A 95%
confidence interval for the mean student income means we are 95%
confident that the interval contains the mean income of Sacramento State
students. We want to be as confident as possible. However, if we increase
the confidence level, the confidence interval becomes wider, and as the
interval widens, it becomes less useful. What is the difference between
the following 95% confidence intervals for the population mean?
[23000, 24500] and [12000, 36000]
Confidence Interval Hands-On Example (Pg 10)
The following is a sample of regular gasoline prices (dollars per gallon) in
Sacramento on May 22, 2006, found at www.automotive.com:
3.299 3.189 3.269 3.279 3.299 3.249 3.319 3.239 3.219
3.249 3.299 3.239 3.319 3.359 3.169 3.299 3.299 3.239
Find the 95% confidence interval for the population mean.
Given the small sample size of 18, the t-distribution should be used. To find
the 95% confidence interval for the population mean using this sample, you
need $\bar{x}$, $s$, $n$, and $t_{\alpha/2}$.
Here α = 0.05 (from 1 - 0.95), n = 18, $\bar{x}$ = 3.268, s = 0.0486, degrees of
freedom = 18 - 1 = 17, and $t_{0.05/2}$ = 2.11. Plugging these values into the formula
$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$ gives a margin of error of
$2.11 \times \frac{0.0486}{\sqrt{18}} \approx 0.0242$.
Thus, we are 95% confident that the true mean price of regular gasoline in
Sacramento is between 3.244 and 3.293. The formal interpretation is that, in
repeated sampling, intervals constructed this way will contain the true mean of
the population from which the data come 95% of the time.
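The hands-on calculation can be reproduced from the 18 prices listed above. The t critical value 2.11 is taken from the text (t table, 17 degrees of freedom) rather than computed, since Python's standard library has no t-distribution.

```python
import math
import statistics

# Regular gasoline prices from the sample above (Sacramento, May 22, 2006)
prices = [3.299, 3.189, 3.269, 3.279, 3.299, 3.249, 3.319, 3.239, 3.219,
          3.249, 3.299, 3.239, 3.319, 3.359, 3.169, 3.299, 3.299, 3.239]

n = len(prices)
x_bar = statistics.mean(prices)
s = statistics.stdev(prices)       # sample standard deviation (n - 1 divisor)
t_crit = 2.11                      # t_{0.025} with n - 1 = 17 degrees of freedom, from the table

margin = t_crit * s / math.sqrt(n) # t_{alpha/2} * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.3f}, s = {s:.4f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```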
Hypothesis Testing
Consider the following scenario:
I invite you to play a game where I pull out a coin and toss it. If it comes up
heads, you pay me $1. Would you be willing to play? To decide whether to
play, many people would like to know if the coin is fair. To decide whether
the coin is fair (the null hypothesis) or not (the alternative hypothesis), you
might take the coin and toss it a number of times, recording the outcomes
(data collection). Suppose you observe the following sequence of outcomes,
where H represents a head and T represents a tail:
HHHHHHHHTHHHHHHTHHHHHH
What would be your conclusion? Why?
Most people look at the observations, notice the large number of heads
(statistic), and conclude that the coin is not fair, because if the coin were fair,
the probability of getting 20 or more heads in 22 tosses would be very small
(sampling distribution). Yet it did happen; hence one rejects the idea of a
fair coin and consequently does not wish to participate in the game.
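To put a number on "very small", assume the number of heads X in 22 independent tosses of a fair coin follows a Binomial(22, 0.5) distribution. The probability of a result at least as extreme as 20 heads is then:

```latex
P(X \ge 20) = \frac{\binom{22}{20} + \binom{22}{21} + \binom{22}{22}}{2^{22}}
            = \frac{231 + 22 + 1}{4{,}194{,}304}
            \approx 0.0000606
```

Exactly 20 heads alone has probability $231/2^{22} \approx 0.0000551$.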
Hypothesis Testing Steps
1. State hypothesis
2. Collect data
3. Calculate test statistic
4. Determine likelihood of outcome, if null hypothesis is true
5. If the likelihood is small, then reject the null hypothesis
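The five steps can be sketched for the coin example. The helper `fair_coin_test` is hypothetical, the alternative is assumed to be one-sided (the coin favors heads, since heads costs you money), and the cutoff for "small" is assumed to be 0.05, anticipating the significance level α discussed below.

```python
from math import comb

def fair_coin_test(tosses, alpha=0.05):
    """Hypothetical helper following the five steps for H0: the coin is fair.

    Step 1 (hypotheses): H0: P(heads) = 0.5 vs. H1: the coin favors heads (one-sided).
    Step 2 (data): `tosses` is a string of 'H' and 'T' outcomes.
    """
    n = len(tosses)
    heads = tosses.count("H")                                        # Step 3: test statistic
    # Step 4: likelihood of an outcome at least this extreme if H0 is true
    likelihood = sum(comb(n, i) for i in range(heads, n + 1)) / 2 ** n
    reject = likelihood < alpha                                      # Step 5: reject H0 if small
    return heads, likelihood, reject

print(fair_coin_test("HHHHHHHHTHHHHHHTHHHHHH"))
```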
The one question that needs to be answered is “what is small?” To quantify
what small is, one needs to understand the concept of a Type I error. As you
may recall from your Stat 1 course, there are two hypotheses, the null (H₀) and
the alternative (H₁), and exactly one of them is true. Ideally, our test procedure
should lead us to accept H₀ when H₀ is true and to reject H₀ when H₁ is true.
However, this is not always the case, and errors can be made. A Type I error is
made if a true H₀ is rejected. A Type II error is made if a false H₀ is accepted.
This is summarized below:
P-Values
In order to simplify the decision-making process for hypothesis
testing, p-values are frequently reported when the analysis is
performed on the computer. In particular, a p-value indicates
where in the sampling distribution the test statistic lies.
Hence the decision rules managers can use are:
• If the p-value is < α, then reject H₀.
• If the p-value is ≥ α, then do not reject H₀.
The p-value may be defined as the probability of obtaining a test
statistic equal to or more extreme than the result obtained from the
sample data, given that the null hypothesis H₀ is true.
Can you explain this concept with an example?
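One way to see the definition is to simulate it. This sketch assumes a hypothetical, less lopsided result of 15 heads in 22 tosses, a one-sided alternative (more heads than a fair coin would give), and α = 0.05. It simulates 100,000 experiments of 22 fair tosses, counts how often the number of heads is at least 15, and compares that share with the exact binomial p-value before applying the decision rule.

```python
import random
from math import comb

rng = random.Random(2024)  # fixed seed for reproducibility

n, observed_heads = 22, 15   # hypothetical result: 15 heads in 22 tosses
alpha = 0.05
trials = 100_000

# Simulate experiments of 22 fair tosses; each random bit is one toss (1 = heads)
extreme = 0
for _ in range(trials):
    heads = bin(rng.getrandbits(n)).count("1")
    if heads >= observed_heads:   # "equal to or more extreme" (one-sided: more heads)
        extreme += 1

p_sim = extreme / trials
p_exact = sum(comb(n, i) for i in range(observed_heads, n + 1)) / 2 ** n

print(f"simulated p-value = {p_sim:.4f}, exact p-value = {p_exact:.4f}")
print("reject H0" if p_exact < alpha else "do not reject H0")
```

Here the p-value is a little above 0.05, so H₀ is not rejected, whereas the 20-heads sequence above gives a p-value far below α.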
P-Values Example
Exercise 1
The interest rates (in percent) for the 15 sampled cards are:
15.6, 17.8, 14.6, 17.3, 18.7, 15.3, 16.4, 18.4, 17.6, 14.0, 19.2, 15.8,
18.1, 16.6, 17.0
Exercise 2
Stat 1 Review Summary Slide