Probability Distributions-Sarin B


Probability Distributions

Statistics:

• Statistics is the science of data. It involves collecting, classifying, summarizing, organizing, analyzing and interpreting numerical information.

• Most research studies result in a large volume of raw data, which must be suitably reduced so that it can be read easily and used for further analysis.
Key Terms
• Data
Facts or information relevant to a decision maker

• Population
The totality of objects under consideration

• Sample
A portion of the population that is selected for analysis

• Parameter
A summary measure (e.g., the mean) that is computed to describe a characteristic of the population
Variables
Independent variable: a variable that can be controlled or manipulated.

Dependent variable: a variable that cannot be controlled or manipulated directly. Its values are predicted from the independent variable.

Discrete variables are measured in units that cannot be subdivided. Example: Number of children

Continuous variables are measured in units that can be subdivided infinitely. Example: Height
Types of Statistical Applications

• There are two major areas of statistics, viz. descriptive statistics and inferential statistics.

Descriptive statistics uses numerical and graphical methods to summarize the information revealed in a data set and to present that information in a convenient form.

Descriptive statistics do not allow us to draw conclusions beyond the data we have analysed or to reach conclusions regarding any hypotheses we might have made.
• Examples:

1) Measures of central location: mean, median, mode and midrange

2) Measures of variation: variance, standard deviation, z-scores
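A minimal sketch of how these descriptive measures could be computed with Python's standard `statistics` module. The data set (number of children in ten households) is hypothetical and chosen only for illustration; the variance and standard deviation shown are the sample versions (divisor n - 1), which is an assumption, since the slides do not say which version is meant.

```python
import statistics

# Hypothetical sample: number of children in ten households (illustrative only).
data = [0, 1, 1, 2, 2, 2, 3, 3, 4, 6]

# Measures of central location.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
midrange = (min(data) + max(data)) / 2

# Measures of variation: sample variance and standard deviation (divisor n - 1).
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / std_dev for x in data]

print("mean:", mean, "median:", median, "mode:", mode, "midrange:", midrange)
print("variance:", round(variance, 3), "SD:", round(std_dev, 3))
```

The z-scores always sum to zero, because they are deviations from the mean rescaled by the standard deviation.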
• Inferential statistics utilizes sample data to make estimates, decisions, predictions or other generalizations about a larger set of data.

• Inferential statistics are techniques that allow us to use samples to make generalizations about the populations from which the samples were drawn.

• “Sampled” data are incomplete but can still be representative of the population.
• Inference permits generalizations about the population from the sampled data.
• Probability theory is a major tool used to analyze sampled data.
Probability Distributions
• A Probability Distribution is a table or an equation that associates each
outcome of a statistical experiment with its probability of occurrence.

• Probability Distributions are a fundamental concept in statistics and are used at both theoretical and practical levels.

• E.g., they are used to compute confidence intervals for parameters and to calculate critical regions of hypothesis tests.
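As an illustration of a probability distribution in tabular form, the sketch below enumerates a simple experiment and lists each value of the random variable with its probability. The experiment (two tosses of a fair coin, with X = number of heads) is an assumed example, not one taken from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Statistical experiment (assumed for illustration): toss a fair coin twice.
outcomes = list(product("HT", repeat=2))

# Random variable X = number of heads; all four outcomes are equally likely.
counts = Counter(outcome.count("H") for outcome in outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

# Print the distribution as a table of values and probabilities.
print("x  P(X = x)")
for x, p in distribution.items():
    print(f"{x}  {p}")

# The probabilities of all possible values must sum to 1.
total = sum(distribution.values())
print("total:", total)
```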
Probability distributions are generally divided into two classes:
1. Discrete probability distribution

• If a random variable is a discrete variable, its probability distribution is called a discrete probability distribution.
• In the case of a discrete probability distribution, each possible value of the discrete random variable can be associated with a non-zero probability.
• Examples of discrete probability distributions are
• a) Binomial Distribution
• b) Poisson Distribution
• c) Bernoulli Distribution
2. Continuous probability distribution

• If a random variable is a continuous variable, its probability distribution is called a continuous probability distribution.

• Examples of continuous probability distributions are
• a) Normal Distribution (Gaussian)
• b) Chi-square distribution
• c) F-distribution
• d) Student’s t-distribution

• The probability that a continuous random variable will assume a particular value is zero. Therefore, a continuous probability distribution cannot be expressed in tabular form. It can be expressed as an equation or formula, and probabilities are obtained for intervals of values.
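The sketch below illustrates this point with `statistics.NormalDist` from the Python standard library: the probability of one exact value is zero, while the probability of an interval is the area under the density between its limits. The model (heights in cm, normal with mean 170 and SD 10) is an assumption made purely for the example.

```python
from statistics import NormalDist

# Assumed example: heights in cm, modelled as normal with mean 170 and SD 10.
height = NormalDist(mu=170, sigma=10)

# P(X = 170) corresponds to an interval of zero width, so it is zero.
p_exact = height.cdf(170) - height.cdf(170)

# P(160 <= X <= 180) is the area under the density between the two limits.
p_interval = height.cdf(180) - height.cdf(160)

print("P(X = 170):", p_exact)
print("P(160 <= X <= 180):", round(p_interval, 4))
```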
• Binomial Distribution:

• The Binomial Distribution describes discrete, not continuous, data resulting from an experiment known as a Bernoulli process.

• It is the frequency distribution of the possible number of successful outcomes in a given number of trials, in each of which there is the same probability of success.

• Binomial distribution is a discrete probability distribution, defined by the probability function
• $f(x) = {}^{n}C_{x}\, p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n$
• Where
• x = number of successes
• n = number of trials
• p = probability of success
• q = (1-p) = probability of failure
• Four conditions need to be satisfied for a binomial experiment:
1. A fixed number, n, of trials is carried out.
2. The outcome of a given trial is either a “success” or a “failure”.
3. The probability of success (p) remains constant from trial to trial.
4. The trials are independent: the outcome of a trial is not affected by the outcome of any other trial.
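A short sketch of the binomial probability function, using `math.comb` for the nCx term. The function name `binomial_pmf` and the example values (n = 5 trials, p = 0.5) are illustrative assumptions, not part of the original slides.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return f(x) = nCx * p^x * q^(n - x), where q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)


# Assumed example: n = 5 independent trials, each with success probability 0.5.
n, p = 5, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    print(f"P(X = {x}) = {prob}")

# The probabilities over x = 0, 1, ..., n sum to 1.
print("total:", sum(pmf))
```

With p = 0.5 the distribution is symmetric, so P(X = x) equals P(X = n - x).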
Poisson Distribution

• Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space, if these events occur with a known constant mean rate and independently of the time since the last event.

• The Poisson distribution is a one-parameter discrete distribution that takes non-negative integer values.
• $f(x) = \dfrac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$
• Where,

• λ = parameter of the distribution.

• λ is a positive real number, equal to the expected number of occurrences during the given interval.
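A minimal sketch of the Poisson probability function using only the standard library. The function name `poisson_pmf` and the rate λ = 2 events per interval are assumptions chosen for illustration.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Return f(x) = e^(-lambda) * lambda^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)


# Assumed example: on average lam = 2 events occur per interval.
lam = 2.0
for x in range(6):
    print(f"P(X = {x}) = {poisson_pmf(x, lam):.4f}")

# Summing x * f(x) over enough terms recovers the expected value, lambda.
expected_value = sum(x * poisson_pmf(x, lam) for x in range(50))
print("expected value:", round(expected_value, 6))
```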
• Normal Distribution:

• The central limit theorem states that when a large number of random
variables are independently and identically distributed with finite
variance, their sum is approximately normally distributed.

• If the sample size is large enough, the distribution of the means will
follow a Gaussian distribution even if the population is not Gaussian.

• Measures of variation such as the SD describe how data are spread around a central value. A normal distribution is completely specified by its mean and standard deviation: the mean fixes the centre of the curve, and the standard deviation fixes the spread of observations around the mean.
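The central limit theorem described above can be checked with a small simulation. The sketch below is only illustrative: the population (exponential with rate 1, which is strongly skewed), the sample size of 50 and the number of samples are all assumptions. If the theorem holds, the sample means should centre on the population mean (1), have an SD close to 1/√50, and about 68% of them should fall within one SD of their mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50    # observations per sample (assumed)
NUM_SAMPLES = 5000  # number of samples drawn (assumed)

# Population: exponential with rate 1 (mean 1, SD 1), which is not Gaussian.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Share of sample means lying within one SD of their own mean.
within_one_sd = (
    sum(abs(m - mean_of_means) <= sd_of_means for m in sample_means) / NUM_SAMPLES
)

print("mean of sample means:", round(mean_of_means, 3))
print("SD of sample means:", round(sd_of_means, 3))
print("share within one SD:", round(within_one_sd, 3))
```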
• Characteristics of a Normal Distribution:

• The normal curve is bell-shaped and has a single peak at the exact
center of the distribution.
• The arithmetic mean, median, and mode of the distribution are equal
and located at the peak.
• Half the area under the curve lies to the right of the peak, and the other half lies to the left of it.
• The normal distribution is symmetrical about its mean.

• It is often called the bell-shaped curve because the graph of its probability density resembles a bell.
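Bell Curve
• The curve is asymptotic to the x-axis.

• The total area under the curve and above the x-axis is 1, or 100%.

• The normal distribution is defined by the probability density function
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \quad -\infty < x < \infty$
where µ is the mean and σ is the standard deviation (SD).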


• In a normal distribution, about 68% of all observations lie within one SD on either side of the mean µ;

• roughly 95% of the observations lie within two SD;

• and about 99.7% lie within three SD.

• This is known as the 68-95-99.7 rule, or empirical rule. It applies only to normal distributions.
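The three percentages can be checked numerically from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` with the standard normal (µ = 0, σ = 1); because the rule is stated in units of SD, the same coverages hold for any normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# P(mu - k*sigma <= X <= mu + k*sigma) for k = 1, 2, 3.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)
}

for k, prob in coverage.items():
    print(f"within {k} SD: {prob:.4f}")
```

The computed coverages are approximately 0.683, 0.954 and 0.997, matching the rounded figures in the rule.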
