Gentle Introduction to Inferential Statistics!

Artificial Neurons.AI
5 min read · Dec 20, 2021

Hi Guys!

In this article, we will build a deeper intuition for inferential statistics.

What is Inferential Statistics?

Inferential statistics is all about probability theory! It uses probability to draw conclusions about a whole population from a sample of data.

What is probability?

  • Probability is a numerical description of how likely it is that an event will occur or that a proposition is true.
  • Probability is a number between 0 and 1.
  • 0: the event is impossible.
  • 1: the event is certain.
  • For a continuous variable, the probability of any single exact value is zero, so you cannot compute it directly. Instead, you convert to discrete events (for example, ranges of values) and compute their probabilities.
  • The probabilities of all mutually exclusive outcomes in a set must sum to 1.

It’s used in:

  1. Weather Prediction.
  2. Insurance.
  3. Medicine.
  4. Physics, chemistry, and all applied science.
  5. Deep Learning / AI.

When do we need probability?

We need probability when there is uncertainty about the outcome of an event.

Consider which of these involve uncertainty:

1. Speed of light vs. speed of sound.

2. Height of men vs. height of women.

3. Whether a test for a disease shows the disease.

4. Whether a potato will fall to the ground if you drop it.

Probability vs Proportion:

Probability: how likely it is that an event will occur or that a statement is true.

Proportion: a fraction of the whole!

A proportion can be written in several exactly equivalent forms.

('::') is the symbol we use for "exactly equivalent".

Ex: 2/4 :: 0.50, 0.50 :: 50%, 0.05 :: 5%.

Compute Proportion

  • Discrete, ordinal, and nominal data are suitable data types for computing proportions.

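Formula:

$$\text{proportion}_i = \frac{c_i}{\sum_j c_j}, \qquad \text{percentage}_i = 100 \times \frac{c_i}{\sum_j c_j}$$

Here $c_i$ is the number of times event $i$ occurred, and the sum in the denominator runs over all events.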

Proportion in action!

# import libraries
import numpy as np
## the basic formula
# counts of the different events
c = np.array([ 1, 2, 4, 3 ])
# convert counts to proportions, expressed as percentages (%)
prob = 100*c / np.sum(c)
print(prob)  # [10. 20. 40. 30.]
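The same idea works when you start from raw nominal data instead of ready-made counts. The sketch below is only an illustration: the `animals` array is made-up data, not something from a real dataset. It counts each category with `np.unique` and divides by the total.

```python
import numpy as np

# hypothetical nominal observations (invented for this example)
animals = np.array(["cat", "dog", "dog", "bird", "dog",
                    "cat", "bird", "dog", "cat", "dog"])

# count how often each category occurs (labels come back sorted)
labels, counts = np.unique(animals, return_counts=True)

# proportion = count of each category / total number of observations
proportions = counts / counts.sum()

# show each proportion next to its equivalent percentage
for label, prop in zip(labels, proportions):
    print(f"{label}: {prop:.2f} :: {100 * prop:.0f}%")
```

Because every observation falls into exactly one category, the proportions add up to 1.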

Probability and Odds

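  • Odds are the ratio of the probability of an event not occurring to the probability of the event occurring: odds = (1 - p) / p. This form is often called the odds against the event; many texts instead quote the odds in favor, which is the inverse, p / (1 - p).

1 - p: The probability of the event not occurring.

p: The probability of the event occurring.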
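Here is a minimal sketch of that ratio in Python. The function names `odds_against` and `odds_in_favor` are my own, chosen for illustration; the second one is included only so you can compare the two common conventions.

```python
def odds_against(p):
    """Odds as defined above: P(event does not occur) / P(event occurs)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1] so that we never divide by zero")
    return (1 - p) / p


def odds_in_favor(p):
    """The inverse convention: P(event occurs) / P(event does not occur)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1) so that we never divide by zero")
    return p / (1 - p)


p = 0.2  # example probability of the event occurring
print(odds_against(p))   # about 4, i.e. "4 to 1 against"
print(odds_in_favor(p))  # about 0.25, i.e. "1 to 4 in favor"
```

The two conventions are reciprocals of each other, so always check which one a text or library is using.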

Probability Mass Function:

  • A function that gives the probability of each event in a set of mutually exclusive, discrete events.
  • Picking cards is a discrete event (e.g., I got a queen and a joker).
  • Picking balls from a jug is a discrete event (e.g., I have taken 3 balls).
  • Tossing a coin is also a discrete event (e.g., I got tails).
  • We can easily make histograms with discrete events.
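To make this concrete, here is a small sketch of two probability mass functions: a fair six-sided die and the number of tails in 10 fair coin tosses. Both the die and the choice of 10 tosses are assumptions picked for illustration; `stats.binom.pmf` is SciPy's binomial PMF.

```python
import numpy as np
import scipy.stats as stats

# PMF of a fair six-sided die: each face has probability 1/6
faces = np.arange(1, 7)
die_pmf = np.full(faces.size, 1 / 6)

# PMF of the number of tails in 10 tosses of a fair coin
n_tosses = 10
k = np.arange(n_tosses + 1)                    # possible counts: 0, 1, ..., 10
tails_pmf = stats.binom.pmf(k, n_tosses, 0.5)  # P(exactly k tails)

# each PMF covers all mutually exclusive outcomes, so it sums to 1
print(die_pmf.sum(), tails_pmf.sum())

# probability of getting exactly 5 tails out of 10
print(tails_pmf[5])
```

Each value of a PMF is a genuine probability, which is why a bar chart or histogram is the natural way to draw it.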

Probability Density Function

  • A function that describes the relative likelihood of a set of exclusive, continuous events.
  • It is a smooth curve. The total area under the curve (its integral) must be 1.
  • Measuring penguins in Antarctica and taking the average gives continuous numbers.
  • The average result of tossing a coin 100 times is treated as a continuous number.
  • The distance from India to the Netherlands is a continuous number.
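A quick numerical check of these ideas, using the standard normal distribution as an assumed example (any continuous distribution would do). The area under the density is approximated with a simple Riemann sum, and the probability of a range is read from the CDF.

```python
import numpy as np
import scipy.stats as stats

# fine grid over a range that covers almost all of the standard normal
x = np.linspace(-6, 6, 12001)
dx = x[1] - x[0]
pdf = stats.norm.pdf(x)

# area under the curve, approximated as sum(height * width)
area = np.sum(pdf) * dx
print(area)  # very close to 1

# probability that a value lands between -1 and 1
p_interval = stats.norm.cdf(1) - stats.norm.cdf(-1)
print(p_interval)  # roughly 0.68
```

Note that the height of the curve at a single point is a density, not a probability. You only get a probability by taking the area over a range of values.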

Cumulative Distribution Function (CDF):

  • A CDF is the cumulative sum (or integral) of the probability distribution (or density).
  • The y-axis value at each x-value is the sum of all probabilities to the left of that x-value.
  • A CDF starts at 0 and increases monotonically to 1. Its values are running totals, so adding up the CDF values themselves gives more than 1; only the final value equals 1.
  • It helps you find various insights about the distribution, such as the probability of observing a value at or below x, and it can be used to derive summaries such as quantiles and the expected value.
  • Let's see it in code!

Import libraries

# import libraries
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

Creating Data

## example using log-normal distribution
# variable to evaluate the functions on
x = np.linspace(0,5,1001)
# note the function call pattern: the second argument is the shape
# parameter s, the standard deviation of log(x)
p1 = stats.lognorm.pdf(x,1)
c1 = stats.lognorm.cdf(x,1)
p2 = stats.lognorm.pdf(x,.1)
c2 = stats.lognorm.cdf(x,.1)

Plotting the PDFs and CDFs

# draw the pdfs
fig,ax = plt.subplots(2,1,figsize=(4,7))
# why divide by the sum? it rescales the sampled pdf values so they add up
# to 1, turning each point into the approximate probability of its x-bin
ax[0].plot(x,p1/np.sum(p1), x,p2/np.sum(p2))
ax[0].set_ylabel('probability')
ax[0].set_title('pdf(x)')
# draw the cdfs
ax[1].plot(x,c1, x,c2)
ax[1].set_ylabel('probability')
ax[1].set_title('cdf(x)')
plt.show()
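Since a CDF is the cumulative sum of the density, you can approximate it yourself with `np.cumsum`. The sketch below rebuilds the log-normal CDF (shape parameter 1) on the same grid and compares it with SciPy's exact `cdf`. The variable names are my own, and the result is an approximation whose accuracy depends on how fine the grid is.

```python
import numpy as np
import scipy.stats as stats

# same grid as above
x = np.linspace(0, 5, 1001)
dx = x[1] - x[0]

pdf = stats.lognorm.pdf(x, 1)
cdf_exact = stats.lognorm.cdf(x, 1)

# running total of (density * bin width) approximates the integral up to x
cdf_approx = np.cumsum(pdf) * dx

# largest gap between the approximation and the exact CDF
max_error = np.max(np.abs(cdf_approx - cdf_exact))
print(max_error)
```

The curve does not reach 1 at x = 5 because part of the log-normal distribution lies beyond the end of the grid.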

Sample Distribution

  • A sample is a randomly selected subset of the population data. The sample distribution described here (more commonly called the sampling distribution) is the distribution of a statistic, such as the mean, computed over many such samples.
  • Analyzing the whole population is usually too difficult: it has a lot of values and it takes too much time. So we take samples of the data and apply statistical methods to them.
  • There are many techniques for selecting a sample from the population, such as random sampling, stratified sampling, and snowball sampling, but here we look at general sampling and what it means!
  • To build the distribution, we follow these steps:

Step 1: We have data for giraffes. Our ultimate aim is to find the sample distribution of the data.

Step 2: We randomly select some giraffes as the first sample, then randomly select some other giraffes as the second sample, and so on, up to N samples.

Step 3: Every sample has a mean. We create a separate dataset that contains the mean value of every sample.

Step 4: Then we plot these means on a graph (a histogram). That is the sample distribution.
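The four steps can be sketched in a few lines of NumPy. Everything about the data here is an assumption made for illustration: the giraffe heights are simulated from a normal distribution, and the sample size of 30 and the 2,000 repetitions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable

# Step 1: hypothetical population of giraffe heights in metres (simulated)
population = rng.normal(loc=5.0, scale=0.5, size=100_000)

n_samples = 2_000   # how many samples we draw (N)
sample_size = 30    # how many giraffes in each sample

# Steps 2 and 3: draw each random sample and store its mean
sample_means = np.array([
    rng.choice(population, size=sample_size, replace=False).mean()
    for _ in range(n_samples)
])

# Step 4: bin the means; plt.hist(sample_means, bins=30) would draw the same thing
counts, bin_edges = np.histogram(sample_means, bins=30)

print(population.mean(), sample_means.mean())  # the two should be close
print(population.std(), sample_means.std())    # the means vary much less
```

The sample means cluster tightly around the population mean, which is why a modest sample can tell you a lot about the whole population.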

Monte Carlo Sampling

  • Monte Carlo sampling means randomly sampling from a population in order to estimate an unknown population parameter.
  • It has many subgroups (variants).
  • In this sense, Monte Carlo sampling is another name for random sampling.
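As a rough sketch, here is a Monte Carlo estimate of a population mean. The population is assumed to be log-normal (mu = 0, sigma = 1) purely because its true mean, exp(0.5), is known, so we can see how close the estimate gets. The sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# true mean of a log-normal population with mu=0, sigma=1 is exp(mu + sigma^2 / 2)
true_mean = np.exp(0.5)

# estimate the mean from random samples of increasing size
estimates = {}
for n in (100, 10_000, 1_000_000):
    sample = rng.lognormal(mean=0.0, sigma=1.0, size=n)
    estimates[n] = sample.mean()
    print(f"n={n:>9,}  estimate={estimates[n]:.4f}  true={true_mean:.4f}")
```

In a real problem the true value is unknown, so larger samples are how you buy a more trustworthy estimate.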

Thank you!

Name: R. Aravindan

Position: Content Writer.

Company: Artificial Neurons.AI
