Gentle Introduction to Inferential Statistics!
Hi Guys!
In this article, we will build a deeper intuition for inferential statistics.
What is Inferential Statistics?
Inferential statistics is all about probability theory!
What is probability?
- Probability is a numerical description of how likely an event is to occur or how likely a proposition is to be true.
- Probability is a number between 0 and 1.
- 0 means the event is impossible.
- 1 means the event is certain.
- You can't find the probability of a single exact value of a continuous variable (it is zero). Instead, you convert it to discrete events, for example by asking for the probability of a range of values.
- The probabilities of all mutually exclusive events in a set must sum to 1.
Probability is used in:
- Weather Prediction.
- Insurance.
- Medicine.
- Physics, chemistry, and all applied science.
- Deep Learning / AI.
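To make the earlier point about continuous events concrete, here is a minimal sketch. It assumes, purely for illustration, that heights follow a normal distribution with a mean of 170 cm and a standard deviation of 10 cm (made-up numbers, not data from this article). It shows that a range of values has a usable probability, while an ever narrower range shrinks towards zero.

```python
import scipy.stats as stats

# Assumption for illustration: heights are normally distributed
# with mean 170 cm and standard deviation 10 cm (made-up numbers).
height = stats.norm(loc=170, scale=10)

# Probability of a range: P(165 <= X <= 175) = CDF(175) - CDF(165)
p_range = height.cdf(175) - height.cdf(165)

# Shrinking the range towards a single value drives the probability to 0
p_narrow = height.cdf(170.001) - height.cdf(169.999)

print(f"P(165 <= height <= 175)         = {p_range:.4f}")
print(f"P(169.999 <= height <= 170.001) = {p_narrow:.6f}")
```

Both results stay between 0 and 1, as any probability must.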
When do we need probability?
We need probability when there is uncertainty about the outcome of an event.
Consider which of these need probability (items 2 and 3 do; items 1 and 4 have a certain outcome):
1. Speed of light vs. speed of sound.
2. Height of men vs. height of women.
3. Whether a test for a disease shows the disease.
4. Whether a potato will fall to the ground if you drop it.
Probability vs Proportion:
Probability: It tells how likely it is that an event will occur or that a statement is true.
Proportion: A fraction of the whole!
A proportion can be written as a fraction, a decimal, or a percentage, and these forms are exactly equivalent.
('::') is the symbol we use for "exactly equivalent".
Ex: 2/4 :: 0.50 :: 50%, 0.05 :: 5%.
Compute Proportion
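- Discrete, ordinal, and nominal data are suitable data types for computing proportions.
Formula: $p_i = 100 \times c_i / \sum_j c_j$, where $c_i$ is the count of event $i$ and the result is a percentage.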
Proportion in action!
# import libraries
import matplotlib.pyplot as plt
import numpy as np
## the basic formula
# counts of the different events
c = np.array([ 1, 2, 4, 3 ])
# convert to probability (%)
prob = 100*c / np.sum(c)
print(prob)
Probability and Odds
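- Odds, as used here, are the ratio of the probability of an event not occurring to the probability of the event occurring: odds = (1-p) / p. This form is often called "odds against"; the reciprocal, p / (1-p), is the other common convention.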
1-p: The probability of an event not occurring.
p: The probability of an event occurring.
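Here is a small sketch of the odds calculation, reusing the same counts as in the proportion example. The variable names `odds_against` and `odds_in_favor` are my own labels for illustration; the first follows the definition given above, the second is its reciprocal.

```python
import numpy as np

# counts of the different events (same counts as the proportion example)
c = np.array([1, 2, 4, 3])

# p: probability of each event occurring
p = c / np.sum(c)

# odds as defined above: (1 - p) / p
odds_against = (1 - p) / p

# the reciprocal convention: p / (1 - p)
odds_in_favor = p / (1 - p)

print("p:            ", p)
print("odds against: ", odds_against)
print("odds in favor:", odds_in_favor)
```

Note that unlike probability, odds are not limited to the range 0 to 1.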
Probability Mass Function:
- A function that describes the probability of each event in a set of mutually exclusive, discrete events.
- Picking cards is a discrete event (e.g., I got a queen and a joker).
- Picking a ball from a jug is a discrete event (e.g., I have taken 3 balls).
- Tossing a coin is also a discrete event (e.g., I got tails).
- We can easily make histograms with discrete events.
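As a rough illustration of a probability mass function, the sketch below assumes 10 tosses of a fair coin (my own example setup) and uses `scipy.stats.binom.pmf` to get the probability of each exact number of heads. Every outcome is a separate discrete event, and the probabilities sum to 1.

```python
import numpy as np
import scipy.stats as stats

# Assumption for illustration: 10 tosses of a fair coin
n_tosses, p_heads = 10, 0.5

# possible outcomes: 0, 1, ..., 10 heads
k = np.arange(n_tosses + 1)

# PMF: probability of each exact number of heads
pmf = stats.binom.pmf(k, n_tosses, p_heads)

for heads, prob in zip(k, pmf):
    print(f"P({heads:2d} heads) = {prob:.4f}")

print("sum of PMF:", pmf.sum())
```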
Probability Density Function
- A function that describes the probability of a set of mutually exclusive, continuous events.
- It is a smooth curve. The area under the curve (the integral over all values) must be 1.
- The average of a measurement (say, the weight) of penguins in Antarctica is a continuous number.
- The average outcome of tossing a coin 100 times is treated as a continuous number.
- The distance from India to the Netherlands is a continuous number.
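The sketch below is one way to check the "area must be 1" property numerically. It assumes a standard normal density evaluated on a fine grid (my choice of distribution and grid, for illustration only) and approximates the area with a simple sum of density times bin width.

```python
import numpy as np
import scipy.stats as stats

# evaluate a standard normal density on a fine grid
x = np.linspace(-6, 6, 12001)
dx = x[1] - x[0]
pdf = stats.norm.pdf(x)

# density values are not probabilities: their plain sum is far from 1
print("sum of density values:", pdf.sum())

# the area under the curve (density * width) is what equals 1
area = np.sum(pdf * dx)
print("area under the curve: ", area)

# probability of a range = area over that range, e.g. -1 <= x <= 1
in_range = (x >= -1) & (x <= 1)
p_range = np.sum(pdf[in_range] * dx)
print("P(-1 <= X <= 1):      ", p_range)
```

This is also why sampled density values are sometimes divided by their sum before plotting: it rescales them so that they add up to 1 on the chosen grid.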
Cumulative Distribution Function (CDF):
- A CDF is the cumulative sum (or integral) of the probability distribution (or density).
- The y-axis value at each x-value is the sum of all probabilities to the left of that x-value.
- A CDF starts at 0 and increases monotonically to 1, because each value adds the probabilities of all previous values. Note that the CDF values themselves sum to more than 1; only the final value equals 1.
- It helps to read off insights about the distribution, such as the probability of a value at or below x, or the median (the x where the CDF reaches 0.5).
- Let's see it in code!
Import libraries
# import libraries
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
Creating Data
## example using log-normal distribution
# variable to evaluate the functions on
x = np.linspace(0,5,1001)
# note the function call pattern...
p1 = stats.lognorm.pdf(x,1)
c1 = stats.lognorm.cdf(x,1)
p2 = stats.lognorm.pdf(x,.1)
c2 = stats.lognorm.cdf(x,.1)
Plotting the PDFs and CDFs
# draw the pdfs
fig,ax = plt.subplots(2,1,figsize=(4,7))
# question: why divide by sum here?
# answer: it rescales the sampled density values so that they sum to 1
ax[0].plot(x,p1/np.sum(p1), x,p2/np.sum(p2))
ax[0].set_ylabel('probability')
ax[0].set_title('pdf(x)')
# draw the cdfs
ax[1].plot(x,c1, x,c2)
ax[1].set_ylabel('probability')
ax[1].set_title('cdf(x)')
plt.show()
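Since a CDF is the cumulative sum of the density, we can also build one ourselves with `np.cumsum` and compare it with the one scipy gives us. This is only a sketch on the same grid and log-normal distribution as above; the names `cdf_approx` and `cdf_exact` are mine.

```python
import numpy as np
import scipy.stats as stats

# same grid and distribution as in the example above
x = np.linspace(0, 5, 1001)
dx = x[1] - x[0]
pdf = stats.lognorm.pdf(x, 1)

# approximate CDF: running sum of (density * bin width)
cdf_approx = np.cumsum(pdf) * dx

# exact CDF from scipy for comparison
cdf_exact = stats.lognorm.cdf(x, 1)

print("largest difference:", np.max(np.abs(cdf_approx - cdf_exact)))

# the value at x = 5 is below 1 because some probability lies beyond x = 5
print("CDF at x = 5:", cdf_exact[-1])
```

The two curves agree closely, and the match gets better as the grid gets finer.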
Sample Distribution
- A sample is a randomly selected subset of the population data. If we draw many samples and record a statistic (such as the mean) of each one, the distribution of those values is called the sampling distribution.
- Analyzing the whole population is often too difficult because it has a lot of values and takes too much time. So, we take a sample of the data and apply the statistical methods to it.
- There are many techniques for selecting a sample from the population, like random sampling, stratified sampling, and snowball sampling. Here we look at sampling in general and what it means!
- The steps are:
Step 1: We have data for giraffes. Our ultimate aim is to find the sampling distribution of the data.
Step 2: We take a random group of giraffes as the first sample, another random group as the second sample, and so on up to N samples.
Step 3: Every sample has a mean. We create a separate dataset that contains the mean value of every sample.
Step 4: Then we plot these means as a histogram. That is the sampling distribution.
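The four steps can be sketched in a few lines. The giraffe heights here are simulated with made-up numbers (mean 4.5 m, standard deviation 0.5 m), and the sample size and the number of samples are also my own assumptions, so treat this as an illustration rather than real data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Step 1: a hypothetical population of giraffe heights in metres
# (made-up numbers: mean 4.5 m, standard deviation 0.5 m)
population = rng.normal(loc=4.5, scale=0.5, size=100_000)

# Step 2: draw N random samples of giraffes
n_samples = 2_000     # N: how many samples we draw
sample_size = 30      # giraffes per sample

# Step 3: store the mean of every sample
sample_means = np.array([
    rng.choice(population, size=sample_size, replace=False).mean()
    for _ in range(n_samples)
])

# Step 4: summarize the distribution of the means as histogram counts
counts, bin_edges = np.histogram(sample_means, bins=20)

print("population mean:     ", population.mean())
print("mean of sample means:", sample_means.mean())
print("std of sample means: ", sample_means.std())
```

Passing `sample_means` to `plt.hist` would draw the histogram from Step 4. The sample means cluster tightly around the population mean, with a much smaller spread than the individual heights.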
Monte Carlo Sampling
- Monte Carlo sampling is another name for random sampling: we randomly sample from a population to estimate an unknown population parameter.
- It has many variants.
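As a toy example of the idea, the sketch below pretends we do not know the probability that two fair dice sum to 7 (it is exactly 1/6) and estimates it by random sampling. The dice setup is my own choice for illustration; any population with a known answer would work for checking the estimate.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Simulated "population": one roll of two fair dice.
# The parameter we pretend not to know is P(sum == 7), which is exactly 1/6.
true_p = 1 / 6

for n_draws in (100, 10_000, 1_000_000):
    # each row is one random roll of two dice, values 1..6
    dice = rng.integers(1, 7, size=(n_draws, 2))
    # fraction of rolls whose sum is 7
    estimate = np.mean(dice.sum(axis=1) == 7)
    print(f"{n_draws:>9} draws: estimate = {estimate:.4f}, "
          f"error = {abs(estimate - true_p):.4f}")
```

With more random draws, the estimate tends to get closer to the true value.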
Thank you!
Name: R. Aravindan
Position: Content Writer.
Company: Artificial Neurons.AI