What Is Statistics
Population
A population is the group of all items of interest to a
statistics practitioner.
Frequently very large; sometimes infinite.
E.g. all 5 million Florida voters.
Sample
A sample is a set of data drawn from the population.
Potentially very large, but less than the population.
E.g. a sample of 765 voters exit polled on election day.
Parameter
A descriptive measure of a population.
Statistic
A descriptive measure of a sample.
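The parameter/statistic distinction can be sketched in a few lines of Python. The population below is simulated and entirely hypothetical (10,000 made-up voters with an assumed 52% support rate); only the sample size of 765 echoes the exit-poll example.

```python
import random
import statistics

# Hypothetical population: 1 = votes for candidate A, 0 = otherwise.
# The 52% support rate and the population size are made-up assumptions.
random.seed(0)
population = [1 if random.random() < 0.52 else 0 for _ in range(10_000)]

# Parameter: a descriptive measure of the whole population.
population_share = statistics.mean(population)

# Statistic: the same measure computed on a sample drawn from the population.
sample = random.sample(population, 765)
sample_share = statistics.mean(sample)

print(f"parameter (population share): {population_share:.4f}")
print(f"statistic (sample share):     {sample_share:.4f}")
```

The two numbers are close but generally not equal, which is why the statistic is treated as an estimate of the parameter.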
Descriptive Statistics. . .
. . . are methods of organizing, summarizing, and
presenting data in a convenient and informative
way. These methods include:
Graphical Techniques, and
Numerical Techniques.
The actual method used depends on what
information we would like to extract.
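As a rough illustration of both kinds of technique, the sketch below summarizes a small data set numerically and prints a crude text histogram. The `daily_sales` values are hypothetical and chosen only for the example.

```python
import statistics
from collections import Counter

# Hypothetical data: units sold on seven consecutive days.
daily_sales = [12, 15, 11, 19, 15, 22, 14]

# Numerical techniques: summarize the data with a few numbers.
mean_sales = statistics.mean(daily_sales)
median_sales = statistics.median(daily_sales)
stdev_sales = statistics.stdev(daily_sales)  # sample standard deviation

print(f"mean={mean_sales:.2f} median={median_sales} stdev={stdev_sales:.2f}")

# Graphical technique (very crude): a text histogram of the values.
frequency = Counter(daily_sales)
for value, count in sorted(frequency.items()):
    print(f"{value:>3} {'#' * count}")
```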
Inferential Statistics. . .
Descriptive statistics describe the data set that's
being analyzed, but don't allow us to draw any
conclusions or make any inferences about the
population the data came from. Hence we need another
branch of statistics: inferential statistics, which draws
conclusions and makes estimates about a population
from sample data.
However:
Such conclusions and estimates are not always going to be
correct. For this reason, we build into the statistical
inference measures of reliability, namely confidence level
and significance level.
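One way a confidence level enters an estimate is through a confidence interval. The sketch below uses the normal-approximation interval for a proportion, which is one common choice and not necessarily the method these slides have in mind. The count of 400 is a hypothetical figure; only the sample size of 765 comes from the exit-poll example.

```python
import math
from statistics import NormalDist

n = 765            # sample size from the exit-poll example
successes = 400    # hypothetical number of sampled voters backing candidate A
confidence = 0.95  # chosen confidence level

p_hat = successes / n  # sample proportion (a statistic)

# Critical value of the standard normal for a two-sided interval.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Normal-approximation margin of error for a proportion.
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

print(f"{confidence:.0%} interval for the population share: ({low:.4f}, {high:.4f})")
```

Raising the confidence level widens the interval; a larger sample narrows it.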
Random Variables. . .
Variability is omnipresent in the
business world. To model variability
probabilistically, we need the
concept of a random variable.
A random variable is a numerically
valued variable which takes on
different values with given
probabilities.
Examples:
The return on an investment in a one-year
period
The price of an equity
The number of customers entering a store
The sales volume of a store on a
particular day
The turnover rate at your organization
next year
Types of Random Variables. . .
Discrete Random Variable:
one that takes on a countable
number of possible values, e.g.,
total of roll of two dice: 2, 3, . . . , 12
number of desktops sold: 0, 1, . . .
customer count: 0, 1, . . .
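The first example can be worked out exactly by enumeration. This sketch assumes two fair six-sided dice, so each of the 36 ordered outcomes has probability 1/36.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair dice (an assumption)
# and count how often each total occurs.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability of each total, kept as exact fractions.
pmf = {total: Fraction(count, 36) for total, count in sorted(counts.items())}

for total, probability in pmf.items():
    print(f"P(total = {total:>2}) = {probability}")
```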
Probability Distributions. . .
Random variables have values that
are determined by chance events.
The future price of a share of stock is
a random variable because its value
is determined by chance factors such
as market conditions, the
accomplishment of revenue targets
by the company, interest rates, and
so on.
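Population Variance. . .
The population variance is calculated similarly. It is the
weighted average of the squared deviations from the
mean. Formally,
$\sigma^2 = \sum_{x} (x - \mu)^2 \, P(x)$
where μ is the population mean and P(x) is the probability of the value x.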
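As a worked instance of "weighted average of the squared deviations from the mean", the sketch below computes the mean and variance of the total of two dice. The dice are assumed fair, and the probabilities serve as the weights.

```python
from fractions import Fraction
from itertools import product

# Probability distribution of the total of two fair dice (assumed fair).
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, 0) + Fraction(1, 36)

# Mean: probability-weighted average of the values.
mean = sum(x * p for x, p in pmf.items())

# Variance: probability-weighted average of squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(f"mean = {mean}, variance = {variance}")  # mean = 7, variance = 35/6
```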
Poisson distribution
λ is the parameter which indicates the average number of events in the given time interval.
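For reference, the standard Poisson probability of observing exactly k events is λ^k e^(-λ) / k!. The sketch below evaluates it for a hypothetical rate of 3 events per interval; the function name `poisson_pmf` is illustrative only.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the average count is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # hypothetical average number of events per interval

for k in range(6):
    print(f"P(X = {k}) = {poisson_pmf(k, lam):.4f}")

# The probability-weighted average of k recovers lam (up to truncation).
approx_mean = sum(k * poisson_pmf(k, lam) for k in range(60))
print(f"mean = {approx_mean:.4f}")
```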
Hypergeometric Distribution
The probability distribution of a hypergeometric random variable is
called a hypergeometric distribution.
The following notation is helpful when we talk about
hypergeometric distributions and hypergeometric probability.
N: The number of items in the population.
k: The number of items in the population that are classified as
successes.
n: The number of items in the sample.
x: The number of items in the sample that are classified as
successes.
kCx: The number of combinations of k things, taken x at a time.
h(x;N,n,k): hypergeometric probability - the probability that
an n-trial hypergeometric experiment results
in exactly x successes, when the population consists
of N items, k of which are classified as successes.
Hypergeometric Experiments
Hypergeometric Distribution
A hypergeometric random variable is the number of
successes that result from a hypergeometric experiment.
The probability distribution of a hypergeometric random
variable is called a hypergeometric distribution.
Given x, N, n, and k, we can compute the hypergeometric
probability based on the following formula:
Hypergeometric Formula. Suppose a population
consists of N items, k of which are successes. And a
random sample drawn from that population consists
of n items, x of which are successes. Then the
hypergeometric probability is:
h(x;N,n,k) = [kCx] [N-kCn-x] / [NCn]
Example 1
Suppose we randomly select 5 cards without replacement from an
ordinary deck of playing cards. What is the probability of getting
exactly 2 red cards (i.e., hearts or diamonds)?
Solution: This is a hypergeometric experiment in which we know
the following:
N = 52; since there are 52 cards in a deck.
k = 26; since there are 26 red cards in a deck.
n = 5; since we randomly select 5 cards from the deck.
x = 2; since 2 of the cards we select are red.
We plug these values into the hypergeometric formula as follows:
h(x;N,n,k) = [kCx] [N-kCn-x] / [NCn]
h(2;52,5,26) = [26C2] [26C3] / [52C5]
h(2;52,5,26) = [ 325 ] [ 2600 ] / [ 2,598,960 ] = 0.32513
Thus, the probability of randomly selecting exactly 2 red cards is 0.32513.
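The calculation in Example 1 can be checked with a short sketch. The function name `hypergeom_pmf` is illustrative; it uses Python's `math.comb` for the combinations kCx.

```python
from math import comb

def hypergeom_pmf(x: int, N: int, n: int, k: int) -> float:
    """h(x; N, n, k) = [kCx] [N-kCn-x] / [NCn]."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Example 1: 5 cards from a 52-card deck with 26 red cards, exactly 2 red.
print(comb(26, 2), comb(26, 3), comb(52, 5))  # 325 2600 2598960
probability = hypergeom_pmf(x=2, N=52, n=5, k=26)
print(round(probability, 5))  # 0.32513
```

Summing `hypergeom_pmf(x, 52, 5, 26)` over x = 0, ..., 5 gives 1, as it should for a probability distribution.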
Multinomial
The Binomial distribution was based on having a
series of events that could take on only two states:
success/failure, sick/well, heads/tails, et cetera.
But what if there are several possible events, like
left/right/center, or Africa/Eurasia/Australia/Americas?
The Multinomial distribution extends the Binomial
distribution for such cases.
The Binomial case could be expressed with one
parameter, p, which indicated success with probability
p and failure with probability 1 - p. The Multinomial
case requires k variables, p1, . . . , pk, such that
$p_1 + p_2 + \cdots + p_k = 1$
With three possible outcomes, the probability of a given set of counts is
$p = \frac{n!}{n_1! \, n_2! \, n_3!} \, p_1^{n_1} \, p_2^{n_2} \, p_3^{n_3}$
where
p is the probability,
n is the total number of events,
n1 is the number of times Outcome 1 occurs,
n2 is the number of times Outcome 2 occurs,
n3 is the number of times Outcome 3 occurs,
p1 is the probability of Outcome 1,
p2 is the probability of Outcome 2, and
p3 is the probability of Outcome 3.
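A minimal sketch of the multinomial probability, using the standard formula n! / (n1! n2! ... ) times the product of each outcome probability raised to its count. The left/right/center probabilities below are hypothetical.

```python
import math

def multinomial_pmf(counts, probs):
    """Probability of seeing counts[i] occurrences of outcome i."""
    if len(counts) != len(probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("need one probability per outcome, summing to 1")
    n = sum(counts)  # total number of events
    coefficient = math.factorial(n)
    for c in counts:
        coefficient //= math.factorial(c)  # exact integer division
    return coefficient * math.prod(p ** c for p, c in zip(probs, counts))

# Hypothetical: 4 events with P(left)=0.5, P(right)=0.3, P(center)=0.2.
probability = multinomial_pmf(counts=(2, 1, 1), probs=(0.5, 0.3, 0.2))
print(f"{probability:.4f}")

# With only two outcomes the formula reduces to the Binomial case.
print(multinomial_pmf(counts=(2, 1), probs=(0.5, 0.5)))
```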
Continuous Distributions
Normal Distribution
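Lognormal Distribution Probability Density Function
$f(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{\ln x - \mu}{\sigma} \right)^{2} \right)$, for x > 0
$f(x) = 0$, for x ≤ 0
If
X ~ LN(μ, σ²),
then
Y = ln(X) ~ N(μ, σ²)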
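A short sketch of the lognormal density and of the relation between X and ln(X). The parameter values are hypothetical. Here `mu` and `sigma` are the mean and standard deviation of ln(X), which matches the convention of Python's `random.lognormvariate`.

```python
import math
import random
import statistics

def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Lognormal density; zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

mu, sigma = 0.5, 0.25  # hypothetical parameters of ln(X)

# Draw lognormal values and take logs: the logs should look normal
# with mean close to mu and standard deviation close to sigma.
random.seed(1)
draws = [random.lognormvariate(mu, sigma) for _ in range(20_000)]
logs = [math.log(x) for x in draws]

print(f"mean of ln(X)  = {statistics.mean(logs):.3f}")
print(f"stdev of ln(X) = {statistics.stdev(logs):.3f}")
print(f"f(1.5)         = {lognormal_pdf(1.5, mu, sigma):.4f}")
```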
Negative binomial
Gamma distribution
A better name in the statistical context would be
Negative Poisson, because it relates to the
Poisson distribution in the same way the Negative
binomial relates to the Binomial.
If the timing of events follows a Poisson
distribution, meaning that events come by at the
rate of λ per period, then this distribution tells us
how long we would have to wait until the nth
event occurs.
The form of the Gamma distribution is typically
expressed in terms of a scale parameter 1/λ,
where λ is the Poisson parameter.
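The waiting-time interpretation can be checked by simulation. The sketch below assumes a hypothetical rate of 2 events per period and waits for the 3rd event. It builds the wait as a sum of exponential gaps between events, then compares it with direct Gamma draws from `random.gammavariate`, which takes the shape (here n) and the scale (here 1/λ).

```python
import random

random.seed(42)
lam = 2.0        # hypothetical Poisson rate: events per period
n = 3            # wait until the 3rd event
trials = 20_000

def wait_for_nth_event(n: int, lam: float) -> float:
    """Sum of n exponential gaps between consecutive Poisson events."""
    return sum(random.expovariate(lam) for _ in range(n))

simulated = [wait_for_nth_event(n, lam) for _ in range(trials)]
gamma_draws = [random.gammavariate(n, 1 / lam) for _ in range(trials)]

mean_simulated = sum(simulated) / trials
mean_gamma = sum(gamma_draws) / trials

print(f"theoretical mean wait n/lam = {n / lam:.3f}")
print(f"simulated sum of gaps       = {mean_simulated:.3f}")
print(f"direct Gamma draws          = {mean_gamma:.3f}")
```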
Bivariate Distributions. . .
Up to now, we have looked at univariate
distributions, i.e., probability distributions in one
variable.
Bivariate distributions, also called joint
distributions, are probabilities of combinations of
two variables.
For discrete variables X and Y, the joint
probability distribution or joint probability mass
function of X and Y is defined as:
P(x, y) = P(X = x and Y = y)
for all pairs of values x and y.
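A small illustrative joint distribution: toss a fair coin twice, let X be the number of heads on the first toss and Y the total number of heads. The fair coin and the choice of X and Y are assumptions made for this example. Summing the joint probabilities over one variable gives the marginal probabilities of the other.

```python
from fractions import Fraction
from itertools import product

# Joint probability mass function P(x, y) built by enumerating the four
# equally likely outcomes of two fair coin tosses (1 = heads, 0 = tails).
joint = {}
for first, second in product((0, 1), repeat=2):
    x = first            # heads on the first toss
    y = first + second   # total heads
    joint[(x, y)] = joint.get((x, y), 0) + Fraction(1, 4)

# Marginal probabilities: sum the joint probabilities over the other variable.
marginal_x, marginal_y = {}, {}
for (x, y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0) + p
    marginal_y[y] = marginal_y.get(y, 0) + p

for (x, y), p in sorted(joint.items()):
    print(f"P(X={x}, Y={y}) = {p}")
print("marginal of X:", marginal_x)
print("marginal of Y:", marginal_y)
```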
Marginal Probabilities
Correlation
Correlation also tells you the degree to which the
variables tend to move together.
Covariance is expressed in the units of the variables
being measured, which often differ. Using covariance,
you can determine whether the variables tend to
increase or decrease together, but you cannot
measure the degree to which they
move together, because covariance does not use
one standard unit of measurement. To measure
the degree to which variables move together, you
must use correlation.
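The unit dependence of covariance is easy to see numerically. In the sketch below the data are hypothetical, and the sample (n - 1) versions of covariance and standard deviation are used as an assumption. Rescaling one variable from dollars to cents multiplies the covariance by 100 and leaves the correlation unchanged.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: advertising spend in dollars and units sold.
spend_dollars = [1, 2, 3, 4, 5]
units_sold = [2, 4, 5, 4, 5]
spend_cents = [100 * d for d in spend_dollars]  # same data, different unit

cov_dollars = covariance(spend_dollars, units_sold)
cov_cents = covariance(spend_cents, units_sold)
corr_dollars = correlation(spend_dollars, units_sold)
corr_cents = correlation(spend_cents, units_sold)

print(f"covariance:  {cov_dollars:.2f} (dollars) vs {cov_cents:.2f} (cents)")
print(f"correlation: {corr_dollars:.4f} (dollars) vs {corr_cents:.4f} (cents)")
```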