Sampling distribution Using Python

Last Updated : 30 Jul, 2024

Statistics covers many distributions, such as the normal (Gaussian), exponential, and binomial distributions. This article looks at one more: the sampling distribution.

Suppose we have some data. We draw a sample of a fixed, finite number of data points from it, compute a statistic of that sample (for example, its mean), and repeat the process many times. The distribution of the resulting sample statistics is known as the sampling distribution of that statistic.
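The procedure described above can be sketched in a few lines. This is only an illustration: the population, the sample size, the number of repetitions, and the seed are all assumptions chosen for the example, and the statistic used is the mean.

```python
import numpy as np

# Assumed setup for illustration: a small population and arbitrary sizes.
rng = np.random.default_rng(seed=0)
population = np.arange(10)   # the values 0..9
sample_size = 5              # data points drawn per sample
n_repeats = 1000             # how many samples we draw

# Draw a sample (with replacement), compute its mean, and repeat.
sample_means = np.array([
    rng.choice(population, size=sample_size, replace=True).mean()
    for _ in range(n_repeats)
])

# The collection of sample means approximates the sampling distribution
# of the mean; its centre should sit near the population mean.
print("Population mean:", population.mean())
print("Mean of sample means:", sample_means.mean())
```

Any other statistic (median, maximum, variance) could be swapped in for `.mean()` to obtain the sampling distribution of that statistic instead.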


An important result about sampling distributions is the Central Limit Theorem. It says that if we repeatedly draw independent samples from a population with finite variance (no matter how the population itself is distributed), the distribution of the sample means approaches a normal distribution as the sample size grows.
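A rough way to see this effect is to start from a clearly non-normal population and watch the sample means become more symmetric as the sample size increases. The sketch below assumes an exponential population and uses skewness as a simple symmetry check; the helper names `skewness` and `sample_means`, the sample sizes, and the seed are illustrative choices, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def skewness(values):
    """Sample skewness: 0 for a symmetric distribution such as the normal."""
    centered = values - values.mean()
    return (centered ** 3).mean() / values.std() ** 3

def sample_means(sample_size, n_repeats=20000):
    """Means of n_repeats samples drawn from an exponential population."""
    samples = rng.exponential(scale=1.0, size=(n_repeats, sample_size))
    return samples.mean(axis=1)

# The exponential population is strongly right-skewed.
skew_small = skewness(sample_means(2))    # means of 2 values each
skew_large = skewness(sample_means(50))   # means of 50 values each

print("Skewness of sample means, sample size 2: ", skew_small)
print("Skewness of sample means, sample size 50:", skew_large)
```

The skewness for the larger sample size should be noticeably closer to zero, which is what the theorem leads us to expect.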

Let’s understand it by using an example:

Let’s take the numbers from 0 to 9 and use them as our primary data.

import numpy as np
num = np.arange(10)
num

Output:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Now, let’s form every possible pair (with repetition) from the first four values of the data and take the average of each pair. We also maintain a dictionary that maps each sample mean to the number of times it appears.

sample_freq = {}

for i in range(4):
    for j in range(4):
        # Select each possible pair of the first
        # four values, with repetition
        # (float() keeps the dictionary keys as plain Python floats)
        mean_of_two = float(num[i] + num[j]) / 2

        if mean_of_two in sample_freq:
            # Update the count if this mean
            # has already been seen
            sample_freq[mean_of_two] += 1
        else:
            # Add a new key to the dictionary
            # if it is not there yet
            sample_freq[mean_of_two] = 1

sample_freq

Output:

{0.0: 1, 0.5: 2, 1.0: 3, 1.5: 4, 2.0: 3, 2.5: 2, 3.0: 1}
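The same kind of tally can be written more compactly with the standard library. The sketch below is one possible alternative, not the article's method: `mean_frequencies` is a hypothetical helper that enumerates every sample of a given size (with repetition) using `itertools.product` and counts the means with `collections.Counter`.

```python
from collections import Counter
from itertools import product

def mean_frequencies(values, sample_size):
    """Count how often each mean occurs over all samples drawn with repetition."""
    counts = Counter(
        sum(sample) / sample_size
        for sample in product(values, repeat=sample_size)
    )
    # Sort by mean so the result reads like a frequency table.
    return dict(sorted(counts.items()))

# All 16 ordered pairs drawn from the values 0, 1, 2, 3.
pair_means = mean_frequencies([0, 1, 2, 3], sample_size=2)
print(pair_means)
```

Because the sample size is a parameter, the helper can also enumerate triples or larger samples, although the number of samples grows quickly with the sample size.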

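Now, let’s plot the sample means against their frequencies to visualize their distribution.

import matplotlib.pyplot as plt

# Convert the dictionary views to lists before plotting
plt.scatter(list(sample_freq.keys()), list(sample_freq.values()))
plt.xlabel('Sample mean')
plt.ylabel('Frequency')
plt.show()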

Output:

Distribution of the sample statistic

From the above graph, we can observe that the distribution of the sample means is symmetric and peaks at the centre. With only two values per sample the shape is triangular; as the number of values averaged in each sample grows, the distribution approaches a normal (Gaussian) distribution.
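How quickly the bell shape emerges can be checked exactly for a small case. The sketch below assumes a population of four equally likely values (0 to 3) and builds the exact distribution of the sum of several independent draws by repeated convolution; the mean of the draws has the same shape, only rescaled. The helpers `sum_pmf` and `max_gap_to_normal` are hypothetical names introduced for this illustration.

```python
import numpy as np

def sum_pmf(n_values, sample_size):
    """Exact distribution of the sum of sample_size independent draws from 0..n_values-1."""
    single = np.full(n_values, 1.0 / n_values)   # one draw: uniform
    pmf = single
    for _ in range(sample_size - 1):
        pmf = np.convolve(pmf, single)           # add one more independent draw
    return pmf

def max_gap_to_normal(n_values, sample_size):
    """Largest absolute gap between the exact probabilities and a matching normal density."""
    pmf = sum_pmf(n_values, sample_size)
    support = np.arange(len(pmf))
    mean = (pmf * support).sum()
    var = (pmf * (support - mean) ** 2).sum()
    normal = np.exp(-(support - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return np.abs(pmf - normal).max()

for k in (2, 4, 8):
    print(f"sample size {k}: max gap to normal = {max_gap_to_normal(4, k):.4f}")
```

For a sample size of 2, the probabilities are proportional to 1, 2, 3, 4, 3, 2, 1, the triangular shape seen in the plot. The printed gap should shrink as the sample size increases.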

One error measure related to sampling distributions is the standard error:

Standard Error

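The standard error (SE) of a statistic is the standard deviation of its sampling distribution. For the sample mean, when the values in each sample are drawn independently, it equals the standard deviation of the population divided by the square root of the sample size.

$SE = \frac{\sigma}{\sqrt n}$

$\sigma \rightarrow \;\text{Standard deviation of the population.}$

$n \rightarrow \;\text{Number of data points in each sample.}$

In our example the population is the four values 0 to 3 and each sample holds two values, so we can compute the SE both directly from the sample means and from the formula.

means = []

# Expand the frequency table into a flat list of sample means,
# repeating each mean as many times as it occurs
for key, count in sample_freq.items():
    means.extend([key] * count)

# Empirical SE: standard deviation of the sampling distribution
se_empirical = np.std(means)

# Formula: population standard deviation over
# the square root of the sample size
population = num[:4]   # the four values the pairs were drawn from
sample_size = 2        # each sample holds two values
se_formula = np.std(population) / np.sqrt(sample_size)

print(f'Standard error of the sample mean: {se_empirical:.4f} (empirical), {se_formula:.4f} (formula)')

Output:

Standard error of the sample mean: 0.7906 (empirical), 0.7906 (formula)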
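In practice the population standard deviation is usually unknown and only a single sample is available. A common estimate of the standard error of the mean then uses the sample standard deviation in place of the population value. The sketch below assumes made-up data (400 draws from a normal distribution with standard deviation 10), so the true standard error is 10 / sqrt(400) = 0.5; the data, seed, and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Hypothetical data: one sample of 400 measurements.
sample = rng.normal(loc=50.0, scale=10.0, size=400)

# ddof=1 gives the sample standard deviation (divides by n - 1),
# the usual choice when estimating from a single sample.
sample_std = sample.std(ddof=1)

# Estimated standard error of the mean for this sample.
se_estimate = sample_std / np.sqrt(len(sample))

print(f"Sample mean: {sample.mean():.3f}")
print(f"Estimated standard error of the mean: {se_estimate:.3f}")
```

The estimate should land close to 0.5, and it would shrink further if more measurements were collected.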

abhishekm482g
