Gentle Introduction to Inferential Statistics!
Hi Guys!
In this article, we will build a deeper intuition for inferential statistics.
What is Inferential Statistics?
Inferential statistics is built on probability theory!
What is probability?
- Probability is a numerical description of how likely an event is to occur or how likely a proposition is to be true.
- Probability is a number between 0 and 1.
- 0 — It shows the impossibility
- 1 — It indicates certainty.
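- For a continuous variable, the probability of any single exact value is zero, so you compute the probability of a range of values instead (in effect, you convert it into discrete events).
- The probabilities of all mutually exclusive outcomes in a set must sum to 1.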
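To make the last two points concrete, here is a minimal sketch that turns continuous values into discrete events by binning them. The data are simulated, and the heights, cut points, and variable names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical continuous measurements (assumed: heights in cm, normally distributed)
heights = rng.normal(loc=170, scale=10, size=10_000)

# Cut points that turn the continuous values into four discrete events:
# below 160, 160 to 170, 170 to 180, and 180 or above
cut_points = [160, 170, 180]
bin_index = np.digitize(heights, cut_points)  # integer labels 0..3

# Count how many values fall in each interval and convert to probabilities
counts = np.bincount(bin_index, minlength=4)
probs = counts / counts.sum()

print(probs)        # estimated probability of each interval
print(probs.sum())  # the four probabilities add up to 1 (up to rounding)
```

Each interval now has a probability between 0 and 1, and together they cover every possible value, which is why they sum to 1.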
It’s used in:
- Weather Prediction.
- Insurance.
- Medicine.
- Physics, chemistry, and all applied science.
- Deep Learning / AI.
When do we need probability?
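We need probability when there is uncertainty about the outcome of an event. Compare these cases:
1. Speed of light vs. speed of sound: there is no uncertainty, so probability is not needed.
2. Height of men vs. height of women: heights vary from person to person, so probability is needed.
3. Whether a test for a disease shows the disease: a test can be wrong, so probability is needed.
4. Whether a potato will fall to the ground if you drop it: the outcome is certain, so probability is not needed.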
Probability vs Proportion:
Probability: It tells how likely it is that an event will occur or that a statement is true.
Proportion: A fraction of the whole!
A proportion can be written in several exactly equivalent ways: as a fraction, a decimal, or a percentage.
('::'): we use this symbol for "exactly equivalent".
Ex: 2/4 :: 0.50 :: 50%, 0.05 :: 5%.
Compute Proportion
- Discrete, ordinal, and nominal data are the data types suitable for computing proportions.
Proportion in action!
# import libraries
import matplotlib.pyplot as plt
import numpy as np

## the basic formula
# counts of the different events
c = np.array([ 1, 2, 4, 3 ])
# convert counts to proportions, expressed as percentages (%)
prob = 100*c / np.sum(c)
print(prob)  # [10. 20. 40. 30.]
Probability and Odds
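- Odds compare the probability of an event not occurring with the probability of it occurring.
- As defined here, the odds are the ratio (1 - p) / p, which is the odds against the event. The odds in favor of the event are the inverse, p / (1 - p).
1 - p: The probability of an event not occurring.
p: The probability of an event occurring.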
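As a small illustration, the sketch below converts between probability and odds. The definition above, (1 - p) / p, gives the odds against an event; many texts instead quote the odds in favor, p / (1 - p). The function names are hypothetical helpers written for this example, not part of any library.

```python
def odds_against(p):
    """Odds against an event: P(not occurring) / P(occurring)."""
    return (1 - p) / p


def odds_in_favor(p):
    """Odds in favor of an event: P(occurring) / P(not occurring)."""
    return p / (1 - p)


def prob_from_odds_in_favor(odds):
    """Convert odds in favor back to a probability."""
    return odds / (1 + odds)


p = 0.2
print(odds_against(p))    # about 4, i.e. "4 to 1 against"
print(odds_in_favor(p))   # about 0.25, i.e. "1 to 4 in favor"
print(prob_from_odds_in_favor(odds_in_favor(p)))  # recovers about 0.2
```

Note that both helpers divide by zero at the extremes (p = 0 or p = 1), so this sketch assumes 0 < p < 1.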
Probability Mass Function:
- A function that gives the probability of each outcome in a set of mutually exclusive, discrete events.
- Picking cards is a discrete event (e.g. I got a queen and a joker).
- Picking a ball from a jug is a discrete event (e.g. I have taken 3 balls).
- Tossing a coin is also a discrete event (e.g. I got tails).
- Discrete events are easy to plot as a histogram.
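Here is a minimal sketch of two probability mass functions: a fair six-sided die and the number of tails in 10 fair coin tosses. Both examples are assumptions added for illustration; the second uses `scipy.stats.binom.pmf`.

```python
import numpy as np
import scipy.stats as stats

# PMF of a fair six-sided die: every face has probability 1/6
faces = np.arange(1, 7)
pmf_die = np.full(6, 1 / 6)

# PMF of the number of tails in 10 fair coin tosses (binomial distribution)
n_tosses = 10
k = np.arange(0, n_tosses + 1)            # possible counts: 0, 1, ..., 10
pmf_tails = stats.binom.pmf(k, n_tosses, 0.5)

for count, prob in zip(k, pmf_tails):
    print(count, round(prob, 4))

# A PMF assigns a probability to every discrete outcome, and they sum to 1
print(pmf_die.sum(), pmf_tails.sum())
```

Plotting `pmf_tails` against `k` as a bar chart gives the histogram-like picture described above.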
Probability Density Function
- A function that describes the probability of a set of exclusive, continuous events.
- It is a smooth curve. The area under the curve (the integral over all values) must be 1.
- Averaging a measurement of the penguins in Antarctica (their weight, say) gives a continuous number.
- The average outcome of tossing a coin 100 times is treated as a continuous number.
- The distance from India to the Netherlands is a continuous number.
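The sketch below checks these properties numerically for a standard normal density, chosen here only as a convenient example. It approximates the area under the curve with a simple sum of heights times the grid spacing, and it gets the probability of an interval from the difference of two CDF values.

```python
import numpy as np
import scipy.stats as stats

# Evaluate the standard normal density on a fine grid
x = np.linspace(-5, 5, 10001)
dx = x[1] - x[0]
pdf = stats.norm.pdf(x)

# Area under the curve, approximated as sum(height * width)
area = np.sum(pdf) * dx
print(area)  # very close to 1

# For a continuous variable we ask about intervals, not single values:
# P(-1 < X < 1) is the difference of the CDF at the two endpoints
p_interval = stats.norm.cdf(1) - stats.norm.cdf(-1)
print(p_interval)  # roughly 0.68
```

The density values themselves are not probabilities; only areas under the curve are.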
Cumulative Distribution Function (CDF):
- A CDF is the cumulative sum (or integral) of the probability distribution (or density).
- The y-axis value at each x-value is the sum of all probabilities to the left of that x-value.
- A CDF starts at 0 and increases monotonically to 1. Because each value is a running total of the values before it, the values of the CDF themselves add up to more than 1.
- It reveals useful properties of the distribution, such as the probability of a value at or below x and percentiles like the median.
- Let's see it in code!
Import libraries
# import libraries
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
Creating Data
## example using log-normal distribution
# variable to evaluate the functions on
x = np.linspace(0,5,1001)
# note the function call pattern...
p1 = stats.lognorm.pdf(x,1)
c1 = stats.lognorm.cdf(x,1)
p2 = stats.lognorm.pdf(x,.1)
c2 = stats.lognorm.cdf(x,.1)
Plotting the pdfs and CDFs
# draw the pdfs
fig,ax = plt.subplots(2,1,figsize=(4,7))
# question: why divide by sum here?
# (dividing by the sum rescales the sampled pdf values so that they add up to 1)
ax[0].plot(x,p1/sum(p1), x,p2/sum(p2))
# draw the cdfs
ax[1].plot(x,c1, x,c2)
plt.show()
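To connect the pdf and the CDF directly, the sketch below rebuilds the CDF as a cumulative sum of the pdf and compares it with `stats.lognorm.cdf`. It reuses the same grid and shape parameter as above; the names `dx`, `c1_approx`, and `max_err` are introduced here for illustration.

```python
import numpy as np
import scipy.stats as stats

# Same grid and log-normal shape parameter as in the example above
x = np.linspace(0, 5, 1001)
dx = x[1] - x[0]
p1 = stats.lognorm.pdf(x, 1)
c1 = stats.lognorm.cdf(x, 1)

# Approximate the CDF as the running sum of pdf heights times the grid spacing
c1_approx = np.cumsum(p1) * dx

# Largest difference between the approximation and the exact CDF
max_err = np.max(np.abs(c1_approx - c1))
print(max_err)  # small, and it shrinks as the grid gets finer
```

The approximation never decreases because every pdf value is non-negative, which is the monotonic behavior described in the CDF section.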
Sampling Distribution
- A sampling distribution is the distribution of a statistic (such as the mean) computed from many random samples drawn from the population.
- Analyzing the whole population is usually too difficult because it contains too many values and takes too much time. So, we take samples of the data and apply statistical methods to them.
- There are many techniques for selecting a sample from the population, such as random sampling, stratified sampling, and snowball sampling, but here we look at sampling in general and what it means!
- To build a sampling distribution, we follow these steps:
Step 1: We have data for giraffes. Our ultimate aim is to find the sampling distribution of the data.
Step 2: We take a random group of giraffes as the first sample, another random group as the second sample, and so on up to N samples.
Step 3: Every sample has a mean. We create a separate dataset that contains the mean value of every sample.
Step 4: Then we plot these means on a graph. That is the sampling distribution.
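The four steps can be sketched in a few lines of NumPy. The giraffe heights below are simulated, so the population mean, spread, sample size, and number of samples are all assumptions made for illustration rather than real data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: a hypothetical population of giraffe heights in meters (simulated)
population = rng.normal(loc=5.0, scale=0.5, size=10_000)

# Step 2 and Step 3: draw many random samples and store the mean of each one
n_samples = 2000    # how many samples we draw
sample_size = 30    # how many giraffes are in each sample
sample_means = np.array([
    rng.choice(population, size=sample_size, replace=False).mean()
    for _ in range(n_samples)
])

# Step 4: summarize the distribution of the sample means
# (plt.hist(sample_means, bins=40) would draw it with matplotlib)
counts, edges = np.histogram(sample_means, bins=40)

print(population.mean(), sample_means.mean())  # the two means are close
print(population.std(), sample_means.std())    # the sample means vary much less
```

The sample means cluster tightly around the population mean, which is why a modest number of small samples can stand in for the full population.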
Monte Carlo Sampling
- Monte Carlo sampling means randomly sampling from a population to estimate an unknown population parameter.
It has many variants.
- In this context, Monte Carlo sampling is another name for random sampling.
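As a minimal sketch of the idea, the code below uses random sampling to estimate the probability that two fair dice sum to 7. This example is an assumption added for illustration; the exact answer (6/36) is known here, which makes it easy to check how close the estimate is.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate many rolls of two fair six-sided dice
n_trials = 100_000
dice = rng.integers(1, 7, size=(n_trials, 2))  # upper bound 7 is exclusive

# Monte Carlo estimate: the fraction of trials in which the sum equals 7
estimate = np.mean(dice.sum(axis=1) == 7)

# Exact value for comparison: 6 of the 36 equally likely outcomes sum to 7
exact = 6 / 36

print(estimate, exact)
```

With more trials the estimate typically moves closer to the exact value, which is the same logic used when the true parameter is unknown.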
Thank you!
Name: R. Aravindan
Position: Content Writer.
Company: Artificial Neurons.AI