Basic Statistics in Fluid Mechanics
• Probability distributions
• Discrete probability distributions
• Continuous probability distributions
• Binomial distribution
• Normal distribution
This work is licensed under a Creative Commons Attribution 4.0 International License.
Random Variables
Definition:
• A random variable, usually written X, is a variable whose
possible values are numerical outcomes of a random
phenomenon. These values can be associated with
probabilities. There are two types of random variables,
discrete and continuous.
Probability Functions/Distribution
Note:
• p(x) is a number between 0 and 1
• Area under a probability function is always 1
Distributions
Mean and Variance
• If we understand the underlying probability distribution
of a certain phenomenon, we know how x is expected to
behave on average
• The expected value E[X] is the weighted average or
mean (µ) of random variable X
• If a random variable X takes the values x1 with probability p1,
x2 with probability p2, … and xn with probability pn, the expected
value or mean is given by
$$E[X] = \mu = \sum_{i=1}^{n} x_i p_i$$
Mean and Variance
• The variance describes how far the values of a random
variable deviate from the mean
• The variance Var[X] of a random variable X with expected
value µ = E[X] is given by
$$\mathrm{Var}[X] = E\left[(X - \mu)^2\right]$$
Questions:
• How can you relate the concepts of accuracy and precision to
the measure of variance?
• What about reproducibility?
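The expected value and variance defined above can be computed directly from a list of values and their probabilities. The following is a minimal sketch; the function name `mean_and_variance` and the three-valued example distribution are assumptions chosen for illustration and do not come from the slides.

```python
from fractions import Fraction


def mean_and_variance(values, probs):
    """Return (E[X], Var[X]) for a discrete random variable.

    values: the possible outcomes x_1, ..., x_n
    probs:  the matching probabilities p_1, ..., p_n
    """
    if abs(sum(probs) - 1) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    # E[X] = sum_i x_i * p_i
    mu = sum(x * p for x, p in zip(values, probs))
    # Var[X] = E[(X - mu)^2] = sum_i (x_i - mu)^2 * p_i
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, var


# Hypothetical example distribution (not from the slides).
values = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
mu, var = mean_and_variance(values, probs)
print("mean:", mu, "variance:", var)
```

Exact fractions are used here only to avoid rounding noise; floating-point probabilities work the same way.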
Discrete Example: Roll of a Die
• There are six possible outcomes for a die roll: numbers from one
through six
• Assume the die is fair, i.e. all numbers have the same probability
of showing
• If all outcomes are equally likely, then the probabilities are equal
as well – and since the sum over all probabilities has to be one,
they are all 1/6
• The histogram below shows the probabilities for each number
showing for every single roll of the die
[Figure: histogram of p(x) versus x; each outcome x = 1, 2, 3, 4, 5, 6 has probability 1/6]
Image source: http://s522.photobucket.com/user/poka-dot-pocky/media/Gaia/Decorated%20images/dice_zps0d0b23cc.png.html
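The claim that each face of a fair die shows with probability 1/6 can be checked empirically. The sketch below simulates rolls with Python's standard `random` module; the function name, the number of rolls, and the seed are illustrative assumptions.

```python
import random
from collections import Counter


def empirical_die_frequencies(num_rolls, seed=0):
    """Simulate num_rolls rolls of a fair die and return relative frequencies."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(num_rolls))
    return {face: counts[face] / num_rolls for face in range(1, 7)}


freqs = empirical_die_frequencies(60_000)
for face, freq in freqs.items():
    # Each relative frequency should be close to 1/6 (about 0.167).
    print(face, round(freq, 3))
```

With more rolls the relative frequencies approach 1/6 more closely, which is what the flat histogram expresses.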
Discrete Example: Roll of a Die
Definition:
The cumulative distribution function (CDF), or just
distribution function, describes the probability that a random
variable X with a given probability distribution takes
a value less than or equal to x.
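For the fair die, the CDF is obtained by summing the probabilities of all faces that do not exceed x. A minimal sketch follows; the names `faces`, `pmf`, and `cdf` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import accumulate

faces = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6  # fair die: every face has probability 1/6

# Running sums give F(1), F(2), ..., F(6).
cdf_values = list(accumulate(pmf))


def cdf(x):
    """Return P(X <= x) for a fair die; x may be any real number."""
    return sum((p for face, p in zip(faces, pmf) if face <= x), Fraction(0))


for face, value in zip(faces, cdf_values):
    print(face, value)
```

The CDF of a discrete variable is a step function: `cdf(3)` and `cdf(3.7)` are equal because no outcome lies between 3 and 3.7.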
Important Discrete Distributions
• N marbles in a jar
• r black and N-r white
• What is the probability
of obtaining k black marbles
if n are drawn with
replacement?
Binomial Distribution
• The binomial distribution B(k; n, p) describes the probability for an
n-trial binomial experiment to result in exactly k successes:
$$B(k; n, p) = \binom{n}{k} p^k q^{n-k}$$
where
• k: the number of successes that result from the binomial experiment
• n: the number of trials in the binomial experiment
• p: the probability of success in an individual trial
• q: the probability of failure (q = 1 - p)
• $\binom{n}{k}$: the binomial coefficient (read: “n choose k”), the number of different ways to
choose k things out of n things
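The binomial probability can be evaluated with the standard library, assuming the usual form "n choose k" times p^k times q^(n-k). This is a sketch; the function name `binomial_pmf` and the values n = 5, p = 0.3 are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials.

    p is the success probability of a single trial; q = 1 - p.
    """
    if not 0 <= k <= n:
        return 0.0  # impossible number of successes
    q = 1 - p
    return comb(n, k) * p**k * q ** (n - k)


# Hypothetical parameters, chosen only for illustration.
n, p = 5, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(pmf):
    print(k, round(prob, 5))
```

Summing the list over all k from 0 to n gives 1, as required of a probability distribution.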
Example: Drawing Marbles
Experiment: Draw two marbles from a jar containing 10
white and 10 black marbles (with replacement)
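• The probability of drawing k black marbles (n = 2, p = 10/20 = 1/2) is:
$$P(X = k) = \binom{2}{k} \left(\tfrac{1}{2}\right)^{k} \left(\tfrac{1}{2}\right)^{2-k} = \tfrac{1}{4} \binom{2}{k}$$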
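The marble experiment is small enough to check by brute force: with replacement, every ordered pair of marbles is equally likely, so counting pairs gives the probabilities directly. The sketch below enumerates all pairs; the variable names are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Jar from the experiment: 10 black and 10 white marbles.
jar = ["black"] * 10 + ["white"] * 10
draws = 2

# With replacement, each draw can be any of the 20 marbles,
# so there are 20**2 equally likely ordered outcomes.
outcomes = Counter(seq.count("black") for seq in product(jar, repeat=draws))
total = len(jar) ** draws

prob_black = {k: Fraction(count, total) for k, count in sorted(outcomes.items())}
for k, prob in prob_black.items():
    print(k, "black:", prob)
```

The counts agree with the binomial distribution for n = 2 and p = 1/2.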
Example: Throwing a Coin
Experiment: Throw a fair coin ten times
[Figure: probability versus number of heads]
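The distribution of the number of heads in ten throws of a fair coin can be tabulated and drawn as a rough text histogram. This is a sketch assuming the binomial model with p = 1/2; the bar scaling is an arbitrary choice.

```python
from math import comb

n = 10  # number of throws

# For a fair coin, P(k heads) = C(n, k) / 2**n.
pmf = [comb(n, k) / 2**n for k in range(n + 1)]

for k, prob in enumerate(pmf):
    bar = "#" * round(prob * 100)  # one '#' per percentage point
    print(f"{k:2d} {prob:.4f} {bar}")
```

The distribution is symmetric around five heads, the most likely outcome.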
Binomial Approximation of the
Poisson Distribution
• Let P(X = k) denote the binomial distribution,
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$$
and let p = λ/n
Binomial Approximation of the
Poisson Distribution
We thus obtain, in the limit n → ∞ with λ = np held fixed,
$$P(X = k) \to \frac{\lambda^k}{k!} e^{-\lambda}$$
[Figure: two panels, n = 1,000 with p = 0.009 and n = 5,000 with p = 0.0009]
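The quality of the approximation can be checked numerically by comparing the binomial probabilities with the Poisson probabilities for λ = np. The sketch below uses the two parameter pairs from the panels above; pairing them with λ = np, the function names, and the cutoff `k_max` are assumptions made for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of observing k events when the mean is lam."""
    return lam**k * exp(-lam) / factorial(k)


def max_abs_difference(n, p, k_max=30):
    """Largest pointwise gap between the two distributions for k = 0..k_max."""
    lam = n * p
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(k_max + 1)
    )


for n, p in [(1_000, 0.009), (5_000, 0.0009)]:
    print(f"n={n}, p={p}, lambda={n * p:.2f}, max diff={max_abs_difference(n, p):.2e}")
```

The cutoff keeps the binomial coefficients within floating-point range; probabilities beyond k = 30 are negligible for these values of λ.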
Poisson Distribution
[Figure: two panels, n = 1000 with λ = 4.5 and n = 1000 with λ = 2.5]
Continuous Random Variables
• The probability density function f_X of a continuous random
variable X is a non-negative, continuous function that
integrates to 1:
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$$
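For a continuous random variable, probabilities are areas under the density, so they can be approximated by numerical integration. The sketch below uses a midpoint rule on a simple example density, f(x) = 2x on [0, 1]; this density and the helper names are assumptions chosen for illustration and do not come from the slides.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def density(x):
    """Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


total = integrate(density, 0.0, 1.0)  # area under the whole density
prob = integrate(density, 0.0, 0.5)   # P(0 <= X <= 0.5)
print("total area:", round(total, 6))
print("P(X <= 0.5):", round(prob, 6))
```

Unlike the discrete case, the density value at a single point is not a probability; only integrals over intervals are.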
Gaussian Distribution
• The probability density function is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
• By definition, the density is normalized:
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
Gaussian Mean and Variance
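• The expectation value is calculated as follows:
$$E[X] = \int_{-\infty}^{\infty} x\, f(x)\,dx = \mu$$
• Furthermore, the variance is
$$\mathrm{Var}[X] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \sigma^2$$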
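That the Gaussian density has mean µ and variance σ² can be verified numerically. The sketch below integrates the standard Gaussian density with a midpoint rule; the values µ = 1.5 and σ = 0.5, the integration range of ten standard deviations, and the helper names are illustrative assumptions.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mu, sigma):
    """Gaussian (normal) probability density with mean mu and std. dev. sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


def integrate(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


mu, sigma = 1.5, 0.5  # hypothetical parameters
a, b = mu - 10 * sigma, mu + 10 * sigma  # tails beyond this are negligible

norm = integrate(lambda x: gaussian_pdf(x, mu, sigma), a, b)
mean = integrate(lambda x: x * gaussian_pdf(x, mu, sigma), a, b)
var = integrate(lambda x: (x - mean) ** 2 * gaussian_pdf(x, mu, sigma), a, b)

print("normalization:", round(norm, 6))
print("mean:", round(mean, 6), "variance:", round(var, 6))
```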
Standard Normal Distribution
• The standard normal distribution corresponds to the
general form of the Gaussian distribution with µ = 0 and
σ² = 1 (centered, unit variance)
• An arbitrary normal distribution can be converted to a
standard normal distribution via the Z-transformation
$$Z = \frac{X - \mu}{\sigma}$$
resulting in
$$f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$$
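One practical use of the Z-transformation is that probabilities for any normal distribution reduce to the standard normal CDF, which can be written with the error function as Φ(z) = (1 + erf(z/√2))/2. The sketch below uses `math.erf` from the standard library; the function names and the values µ = 10, σ = 2 are illustrative assumptions.

```python
from math import erf, sqrt


def z_score(x, mu, sigma):
    """Z-transformation: distance of x from the mean in units of sigma."""
    return (x - mu) / sigma


def standard_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and std. dev. sigma."""
    return standard_normal_cdf(z_score(x, mu, sigma))


mu, sigma = 10.0, 2.0  # hypothetical parameters
within_one_sigma = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)
print("P(|X - mu| <= sigma):", round(within_one_sigma, 4))
```

The result is the same for every choice of µ and σ, which is why tables only list the standard normal distribution.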
Gaussian Distribution
Error Function
p-Value
• Widely used as a measure of statistical significance
• In the measurement context, the p-value is defined as
the probability of observing an incorrect event with a
given score or better
• Hence, a low p-value implies a low probability that the
observed measurement is incorrect
• The p-value can be derived from the false positive rate
(FPR), the fraction of incorrect measurements among a
set of measurements (i.e., all measurements above a
given threshold)
• Problems associated with p-value calculations:
  • The FPR is usually unknown
  • p-values should be corrected for multiple hypothesis testing
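The slides do not say which multiple-testing correction is meant. As one common possibility, the sketch below applies the Bonferroni correction, which multiplies each p-value by the number of tests and caps the result at 1. The p-values and the level α = 0.05 are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Bonferroni correction: scale each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


# Hypothetical p-values from four tests (not from the slides).
p_values = [0.001, 0.02, 0.04, 0.30]
alpha = 0.05

adjusted = bonferroni_adjust(p_values)
significant = [p_adj < alpha for p_adj in adjusted]

for p, p_adj, flag in zip(p_values, adjusted, significant):
    print(f"p={p:.3f}  adjusted={p_adj:.3f}  significant={flag}")
```

Three of the four raw p-values are below 0.05, but only one remains significant after correction. Bonferroni is conservative; less strict procedures exist.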
p-Values in Statistical Testing
• p-values are used to judge the significance of a test for the
null-hypothesis
• Null hypothesis: the default position, e.g.,
random-chance peptide identification, or that the mean values of two
independent measurements are not different
• Alternative hypothesis: the opposite position, e.g.,
non-random peptide identification
• Usually, the null hypothesis cannot be formally proven;
statistical testing can only reject it or fail to reject it
• The null hypothesis is rejected if the p-value is less than a
significance level α (e.g., 0.05 or 0.01)
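The decision rule can be illustrated with the coin experiment: take "the coin is fair" as the null hypothesis and ask how probable a result at least as extreme as the observed one is. The sketch below computes an exact two-sided p-value from the binomial distribution; the function name, the observed count of nine heads, and α = 0.05 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def fair_coin_p_value(heads, throws):
    """Exact two-sided p-value under the null hypothesis of a fair coin.

    Sums the probabilities of all outcomes that are no more likely
    than the observed number of heads.
    """
    pmf = [Fraction(comb(throws, k), 2**throws) for k in range(throws + 1)]
    observed = pmf[heads]
    return sum(p for p in pmf if p <= observed)


alpha = 0.05  # significance level
p_value = fair_coin_p_value(heads=9, throws=10)  # hypothetical observation
reject_null = p_value < alpha

print("p-value:", float(p_value))
print("reject null hypothesis:", reject_null)
```

With the stricter level α = 0.01 the same observation would not lead to rejection, which shows that the conclusion depends on the chosen significance level.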
What is False?
A Note on Weibull
Understanding Poisson