Working with Missing Data in Pandas

Last Updated : 11 Mar, 2025

In Pandas, missing values are usually represented by NaN (Not a Number) or None. They typically appear when data was not collected or an entry is incomplete. Let’s explore how to detect, fill, and drop missing values in a DataFrame so that they do not distort your analysis.
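To see how the two markers behave, here is a minimal sketch. It assumes only pandas and NumPy. In a numeric column, pandas stores None as NaN, and the detection functions treat both markers as missing.

```python
import numpy as np
import pandas as pd

# In a numeric column, pandas converts None to NaN and uses float64
numeric = pd.Series([1, None, np.nan])
print(numeric.dtype)  # float64

# A text column containing both kinds of missing marker
text = pd.Series(["a", None, np.nan])

# isnull() reports both None and NaN as missing
print(numeric.isnull().tolist())  # [False, True, True]
print(text.isnull().tolist())     # [False, True, True]
```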

Checking for Missing Values in Pandas DataFrame

To identify missing values, Pandas provides two functions: isnull() and notnull(), which are also available under the aliases isna() and notna(). They report whether each value is missing, which makes it easier to clean and preprocess data in a DataFrame or Series.

1. Checking for Missing Values Using isnull()

isnull() returns a DataFrame of Boolean values, where True represents missing data (NaN). This is useful when you want to locate and address missing data within a dataset.

Example 1: Detecting Missing Values in a DataFrame

# Importing pandas and numpy
import pandas as pd
import numpy as np

# Sample DataFrame with missing values
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}

df = pd.DataFrame(data)

# Checking for missing values using isnull()
missing_values = df.isnull()

print(missing_values)

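Output:

   First Score  Second Score  Third Score
0        False         False         True
1        False         False        False
2         True         False        False
3        False          True        False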
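The Boolean mask is often summarised rather than printed in full. A minimal sketch, reusing the same sample data: since True counts as 1, summing the mask gives the number of missing values per column, and summing again gives the total.

```python
import numpy as np
import pandas as pd

data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# True counts as 1, so summing the Boolean mask counts missing values per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Summing the per-column counts gives the total number of missing cells
total_missing = int(missing_per_column.sum())
print(total_missing)  # 3
```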

Example 2: Filtering Data Based on Missing Values

In this case, pd.isnull() is applied to the “Gender” column to build a Boolean mask, which is then used to select and display the rows with missing gender information.

import pandas as pd

data = pd.read_csv("employees.csv")
bool_series = pd.isnull(data["Gender"])
missing_gender_data = data[bool_series]
print(missing_gender_data)

Output:
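Since employees.csv is not reproduced here, the sketch below applies the same mask-and-filter pattern to a small hypothetical DataFrame. The "First Name" and "Gender" values are invented for illustration and only stand in for the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employees.csv (all values are made up)
data = pd.DataFrame({
    "First Name": ["Ana", "Ben", "Cai", "Dee"],
    "Gender": ["Female", np.nan, "Male", None],
})

# Boolean mask: True where Gender is missing (NaN or None)
bool_series = data["Gender"].isnull()

# Indexing with the mask keeps only the rows where it is True
missing_gender_data = data[bool_series]
print(missing_gender_data)
```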

2. Checking for Non-Missing Values Using notnull()

notnull() returns a DataFrame of Boolean values, where True indicates non-missing data. This function can be useful when you want to focus on the rows that contain valid, non-missing data.

Example 3: Detecting Non-Missing Values in a DataFrame

# Importing pandas and numpy
import pandas as pd
import numpy as np

# Sample DataFrame with missing values
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}

df = pd.DataFrame(data)

# Checking for non-missing values using notnull()
non_missing_values = df.notnull()

print(non_missing_values)

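Output:

   First Score  Second Score  Third Score
0         True          True        False
1         True          True         True
2        False          True         True
3         True         False         True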
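Because notnull() is the element-wise inverse of isnull(), it combines naturally with all(). A minimal sketch on the same sample data: all(axis=1) is True only for rows in which every column has a value, so it selects the complete rows.

```python
import numpy as np
import pandas as pd

data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# notnull() gives the same result as negating isnull()
print(df.notnull().equals(~df.isnull()))  # True

# Keep only the rows in which every column has a value
complete_rows = df[df.notnull().all(axis=1)]
print(complete_rows)
```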

Example 4: Filtering Data with Non-Missing Values

This code snippet uses the notnull() function to keep only the rows where the “Gender” column is not missing.

# Importing pandas
import pandas as pd

# Reading data from a CSV file
data = pd.read_csv("employees.csv")

# Identifying non-missing values in the 'Gender' column
non_missing_gender = pd.notnull(data["Gender"])

# Filtering rows where 'Gender' is not missing
non_missing_gender_data = data[non_missing_gender]

# print() works in any script; display() is only available in Jupyter/IPython
print(non_missing_gender_data)

Output:
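Filtering with a notnull() mask on one column gives the same rows as dropna() restricted to that column. The sketch below checks this on a small hypothetical stand-in for employees.csv, with invented values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employees.csv (all values are made up)
data = pd.DataFrame({
    "First Name": ["Ana", "Ben", "Cai", "Dee"],
    "Gender": ["Female", np.nan, "Male", None],
})

# Boolean-mask approach used in the example above
non_missing_gender_data = data[data["Gender"].notnull()]

# Equivalent shortcut: drop rows whose Gender value is missing
via_dropna = data.dropna(subset=["Gender"])

print(non_missing_gender_data.equals(via_dropna))  # True
```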

Filling Missing Values in Pandas Using fillna(), replace(), and interpolate()

When working with missing data in Pandas, the fillna(), replace(), and interpolate() functions are commonly used to fill NaN values. These functions allow you to replace missing values with a specific value or use interpolation techniques.

1. Filling Missing Values with a Specific Value Using fillna()

The fillna() function is used to replace missing values (NaN) with a specified value. For example, you can fill missing values with 0.

Example: Fill Missing Values with Zero

import pandas as pd
import numpy as np

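# Use a name other than 'dict' so the built-in dict type is not shadowed
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}

df = pd.DataFrame(data)

# Filling missing values with 0 (fillna returns a new DataFrame; df is unchanged)
print(df.fillna(0))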

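Output:

   First Score  Second Score  Third Score
0        100.0          30.0          0.0
1         90.0          45.0         40.0
2          0.0          56.0         80.0
3         95.0           0.0         98.0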
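A constant is not the only option for fillna(). This sketch on the same sample data shows two common variants. Passing df.mean() fills each column with its own mean, since NaN is skipped when the mean is computed. Passing a dict gives each listed column its own fill value and leaves other columns untouched. The value 50 is arbitrary and used only for illustration.

```python
import numpy as np
import pandas as pd

data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# Fill each column's NaN with that column's mean (NaN is skipped in the mean)
filled_mean = df.fillna(df.mean())
print(filled_mean)

# Give selected columns their own fill value; 'Third Score' is left as is
filled_dict = df.fillna({'First Score': 0, 'Second Score': 50})
print(filled_dict)
```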

2. Filling Missing Values with the Previous/Next Value

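Use ffill() (forward fill, the replacement for the deprecated fillna(method='pad')) to fill each missing value with the previous valid value in its column, or bfill() (backward fill) to fill it with the next valid value. The examples below reuse the DataFrame from the previous example.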

Example: Fill with Previous Value (Forward Fill)

# Forward fill; fillna(method='pad') is deprecated since pandas 2.1
print(df.ffill())

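Output:

   First Score  Second Score  Third Score
0        100.0          30.0          NaN
1         90.0          45.0         40.0
2         90.0          56.0         80.0
3         95.0          56.0         98.0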

Example: Fill with Next Value (Backward Fill)

# Backward fill; fillna(method='bfill') is deprecated since pandas 2.1
print(df.bfill())

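Output:

   First Score  Second Score  Third Score
0        100.0          30.0         40.0
1         90.0          45.0         40.0
2         95.0          56.0         80.0
3         95.0           NaN         98.0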
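Neither direction fills everything on its own. A forward fill cannot fill a leading NaN, because no value precedes it, and a backward fill cannot fill a trailing one. A minimal sketch of chaining the two on the same sample data:

```python
import numpy as np
import pandas as pd

data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# ffill() handles interior and trailing gaps; the following bfill()
# fills any leading NaN that ffill() had nothing to copy from
filled = df.ffill().bfill()
print(filled)
```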

Example: Fill NaN Values with ‘No Gender’ using fillna()

Download the csv file from here.

import pandas as pd
import numpy as np

data = pd.read_csv("employees.csv")
# Show the rows at positions 10 to 24 (the slice end, 25, is excluded)
print(data[10:25])

Output:

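Now fill all the null values in the Gender column with “No Gender”. The result is assigned back to the column instead of calling fillna(inplace=True) on data["Gender"]. Recent pandas versions warn about that chained-assignment pattern, and with Copy-on-Write enabled it does not update data at all.

# Fill missing values in the Gender column and assign the result back
data["Gender"] = data["Gender"].fillna("No Gender")
print(data[10:25])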

Output:
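The same fill can also be written by passing a dict to DataFrame.fillna(), which fills only the listed columns. The sketch below uses a small hypothetical stand-in for employees.csv with invented values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employees.csv (all values are made up)
data = pd.DataFrame({
    "First Name": ["Ana", "Ben", "Cai", "Dee"],
    "Gender": ["Female", np.nan, "Male", None],
})

# Only the columns named in the dict are filled; other columns are unchanged
data = data.fillna({"Gender": "No Gender"})
print(data)
```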

3. Replacing Missing Values Using replace()

Use replace() to replace NaN values with a specific value like -99.

Example: Replace NaN with -99

import pandas as pd
import numpy as np

data = pd.read_csv("employees.csv")
data[10:25]

Output:

Now replace every NaN value in the DataFrame with -99.

data.replace(to_replace=np.nan, value=-99)

Output:
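For a single replacement value, replace() and fillna() produce the same result, and both return a new DataFrame rather than modifying the original. A minimal sketch on a small in-memory DataFrame, used here because employees.csv is not included:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan]})

# Two ways to turn every NaN into -99
via_replace = df.replace(to_replace=np.nan, value=-99)
via_fillna = df.fillna(-99)
print(via_replace.equals(via_fillna))  # True

# The original DataFrame still has its two missing values
print(int(df.isnull().sum().sum()))  # 2
```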

4. Filling Missing Values Using interpolate()

The interpolate() function fills missing values using interpolation techniques, such as the linear method.

Example: Linear Interpolation

# Importing pandas as pd
import pandas as pd

# Creating the DataFrame (None becomes NaN in these numeric columns)
df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 54, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})

# Print the DataFrame
print(df)

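Output:

      A     B     C     D
0  12.0   NaN  20.0  14.0
1   4.0   2.0  16.0   3.0
2   5.0  54.0   NaN   NaN
3   NaN   3.0   3.0   NaN
4   1.0   NaN   8.0   6.0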

Let’s interpolate the missing values using the linear method. Note that the linear method ignores the index and treats the values as equally spaced.

# Interpolate the missing values linearly, filling in the forward direction
print(df.interpolate(method='linear', limit_direction='forward'))

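Output:

      A     B     C     D
0  12.0   NaN  20.0  14.0
1   4.0   2.0  16.0   3.0
2   5.0  54.0   9.5   4.0
3   3.0   3.0   3.0   5.0
4   1.0   3.0   8.0   6.0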

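Each gap is filled with evenly spaced values between its neighbours: for example, the two gaps in column D between 3 and 6 become 4.0 and 5.0. With limit_direction='forward', the leading NaN in column B stays NaN because no earlier value exists to interpolate from, while the trailing NaN in column B is filled with the last valid value, 3.0.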
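If a leading NaN should be filled as well, limit_direction='both' allows filling in both directions. A minimal sketch using column B from the example: the leading gap takes the first valid value (2.0), and the trailing gap takes the last valid value (3.0).

```python
import pandas as pd

# Column B from the example above: NaN at both ends
df = pd.DataFrame({"B": [None, 2, 54, 3, None]})

forward = df.interpolate(method='linear', limit_direction='forward')
both = df.interpolate(method='linear', limit_direction='both')

print(forward["B"].tolist())  # [nan, 2.0, 54.0, 3.0, 3.0]
print(both["B"].tolist())     # [2.0, 2.0, 54.0, 3.0, 3.0]
```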

Dropping Missing Values in Pandas Using dropna()

The dropna() function in Pandas removes rows or columns that contain NaN values. Its parameters control which axis is dropped and how many missing values trigger the removal.

1. Dropping Rows with At Least One Null Value

Use dropna() to remove rows that contain at least one missing value.

Example: Drop Rows with At Least One NaN

import pandas as pd
import numpy as np

# Use a name other than 'dict' so the built-in dict type is not shadowed
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}

df = pd.DataFrame(data)

# Drop rows with at least one missing value
df.dropna()

Output:

   First Score  Second Score  Third Score  Fourth Score
3         95.0          56.0           98          65.0
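Dropping every row with any NaN can discard a lot of data. The subset parameter restricts the check to specific columns. A minimal sketch on the same data: only NaN in 'First Score' or 'Second Score' causes a row to be dropped, so the NaN values in 'Fourth Score' are ignored.

```python
import numpy as np
import pandas as pd

data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(data)

# Only missing values in the listed columns cause a row to be dropped
subset_dropped = df.dropna(subset=['First Score', 'Second Score'])
print(subset_dropped)
```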

2. Dropping Rows with All Null Values

You can drop only the rows in which every value is missing using dropna(how='all').

Example: Drop Rows with All NaN Values

data = {'First Score': [100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, np.nan, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}

df = pd.DataFrame(data)

# Drop rows where all values are missing
df.dropna(how='all')

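Output:

   First Score  Second Score  Third Score  Fourth Score
0        100.0          30.0         52.0           NaN
2          NaN          45.0         80.0           NaN
3         95.0          56.0         98.0          65.0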
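Between how='any' and how='all' there is the thresh parameter, which keeps a row only if it has at least the given number of non-missing values. A minimal sketch on the same data with thresh=3, a threshold chosen only for illustration:

```python
import numpy as np
import pandas as pd

data = {'First Score': [100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, np.nan, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(data)

# Keep rows with at least 3 non-missing values
# (row 0 has 3, row 1 has 0, row 2 has 2, row 3 has 4)
kept = df.dropna(thresh=3)
print(kept)
```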

3. Dropping Columns with At Least One Null Value

To remove columns that contain at least one missing value, use dropna(axis=1).

Example: Drop Columns with At Least One NaN

data = {'First Score': [100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, np.nan, 80, 98],
        'Fourth Score': [60, 67, 68, 65]}

df = pd.DataFrame(data)

# Drop columns with at least one missing value
df.dropna(axis=1)

Output:

   Fourth Score
0            60
1            67
2            68
3            65

4. Dropping Rows with Missing Values in CSV Files

When working with data from CSV files, you can drop rows with missing values using dropna().

Example: Drop Rows with NaN in a CSV File

import pandas as pd

data = pd.read_csv("employees.csv")

# Drop rows with any missing value
new_data = data.dropna(axis=0, how='any')

# Compare lengths of original and new dataframes
print("Old data frame length:", len(data))
print("New data frame length:", len(new_data))
print("Rows with at least one missing value:", (len(data) - len(new_data)))

Output:

Old data frame length: 1000
New data frame length: 764
Rows with at least one missing value: 236

The difference of 236 rows is exactly the number of rows that had at least one null value in any column.
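You can also count these rows directly, without building a second DataFrame, by combining isnull() with any(axis=1). The sketch below uses a small in-memory DataFrame instead of employees.csv and checks that both counts agree.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, np.nan, 45, 56],
                   'Third Score': [52, 40, 80, 98]})

# any(axis=1) is True for each row that contains at least one NaN
rows_with_missing = int(df.isnull().any(axis=1).sum())
print(rows_with_missing)  # 2

# Same count as the length difference used in the CSV example
print(len(df) - len(df.dropna()))  # 2
```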