Python Pandas - Boolean Indexing



Boolean indexing is a technique for filtering data based on specific conditions. It builds a mask of boolean values (True or False) and uses that mask to select the matching elements from a NumPy array, a Pandas Series, or a DataFrame. Plain Python lists do not support this kind of indexing.

Instead of manually iterating through data to find values that meet a condition, Boolean indexing simplifies the process by applying logical expressions.
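As a minimal sketch of that difference, the snippet below selects values above 100 twice: once with an explicit loop and once with a boolean mask. The Series name prices and its values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
prices = pd.Series([120, 80, 150, 60])

# Manual iteration: test each value in turn
looped = [p for p in prices if p > 100]

# Boolean indexing: one vectorized comparison builds the mask
masked = prices[prices > 100]

print(looped)
print(masked.tolist())   # [120, 150]
```

Both approaches select the same values, but the masked version also keeps the original index labels (0 and 2), which the loop discards.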

What is Boolean Indexing in Pandas?

In Pandas, Boolean indexing filters the rows or columns of a DataFrame or Series based on conditional statements. The condition produces a boolean mask, which is an array of True and False values with one entry per element. True marks data to be selected, and False marks data to be left out.
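A small sketch of a mask in action is shown here. The column names name and age and their values are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'name': ['Ann', 'Bob', 'Cy'], 'age': [31, 17, 45]})

# The comparison yields a boolean Series aligned with the row index
mask = df['age'] >= 18
print(mask.tolist())   # [True, False, True]

# Passing the mask in square brackets keeps only the True rows
adults = df[mask]
print(adults)
```

The mask has one entry per row, so the row for Bob (False) is dropped and the other two rows are kept.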

In this tutorial, we will learn how to access data in a Pandas DataFrame using Boolean indexing with conditional expressions and the .loc[] and .iloc[] indexers. We will also combine conditions with logical operators for more advanced filtering.

Creating a Boolean Index

A boolean index is created by applying a conditional statement to a DataFrame or Series object. For example, checking whether the values in a column are greater than a specific number returns a Series of True and False values. Applying the same check to a whole DataFrame returns a DataFrame of booleans with the same shape. Either result can serve as a Boolean index.

Example: Creating a Boolean Index

The following example demonstrates how to create a boolean index based on a condition.

import pandas as pd

# Create a Pandas DataFrame
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])

# Display the DataFrame
print("Input DataFrame:\n", df)

# Create Boolean Index
result = df > 2
print('Boolean Index:\n', result)

Following is the output of the above code −

Input DataFrame:
    A  B
0  1  2
1  3  4
2  5  6
Boolean Index:
        A      B
0  False  False
1   True   True
2   True   True

Filtering Data Using Boolean Indexing

Once a boolean index is created, you can use it to filter rows or columns in the DataFrame. This is done by using .loc[] for label-based indexing and .iloc[] for position-based indexing.
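Filtering columns works the same way as filtering rows, except that the mask needs one boolean per column. The following sketch assumes a made-up rule, keeping columns whose mean exceeds 3, purely for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])

# One boolean per column: True where the column mean exceeds 3
col_mask = df.mean() > 3      # A has mean 3.0, B has mean 4.0

# ':' keeps every row; the mask picks the columns
wide = df.loc[:, col_mask]
print(wide.columns.tolist())  # ['B']
```

Here the mask sits in the second position of .loc[], which addresses columns, while the first position addresses rows.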

Example: Filtering Data using the Boolean Index with .loc

The following example filters data using boolean indexing with the .loc indexer. The boolean index selects the rows, and the column is specified by its label.

import pandas as pd

# Create a Pandas DataFrame
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])

# Display the DataFrame
print("Input DataFrame:\n", df)

# Create Boolean Index
s = (df['A'] > 2)

# Filter DataFrame using the Boolean Index with .loc
print('Output Filtered DataFrame:\n', df.loc[s, 'B'])

Following is the output of the above code −

Input DataFrame:
    A  B
0  1  2
1  3  4
2  5  6
Output Filtered DataFrame:
1    4
2    6
Name: B, dtype: int64

Filtering Data using the Boolean Index with .iloc

Similar to the above approach, the .iloc indexer can apply a boolean index, but it addresses rows and columns by integer position rather than by label.

Example: Using .iloc with a Boolean Index

This example uses the .iloc indexer for positional indexing. Because .iloc does not accept a boolean Series directly, the boolean index is first converted to a NumPy array with the .values attribute. The result then matches the .loc example.

import pandas as pd

# Create a Pandas DataFrame
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])

# Display the DataFrame
print("Input DataFrame:\n", df)

# Create Boolean Index
s = (df['A'] > 2)

# Filter data using .iloc and the Boolean Index
print('Output Filtered Data:\n', df.iloc[s.values, 1])

Following is the output of the above code −

Input DataFrame:
    A  B
0  1  2
1  3  4
2  5  6
Output Filtered Data:
1    4
2    6
Name: B, dtype: int64
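The sketch below illustrates why the conversion to an array matters. It passes the boolean Series straight to .iloc, which is expected to fail, and then passes a NumPy array instead. The exact exception type is not assumed, so two likely candidates are caught. The to_numpy() call is used here as an alternative to .values, and the flag series_accepted is a hypothetical name for this illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])
s = df['A'] > 2

# A boolean Series is rejected by .iloc; the exact exception type
# is not assumed here, so both likely candidates are caught
try:
    df.iloc[s, 1]
    series_accepted = True
except (ValueError, NotImplementedError):
    series_accepted = False

# A plain NumPy boolean array works
picked = df.iloc[s.to_numpy(), 1]

print(series_accepted)
print(picked.tolist())   # [4, 6]
```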

Advanced Boolean Indexing with Multiple Conditions

Pandas supports more complex boolean indexing by combining multiple conditions with the operators & (and), | (or), and ~ (not). Wrap each condition in parentheses, because these operators bind more tightly than comparisons such as > and <. The conditions can refer to different columns, which makes highly specific filters possible.

Example: Using Multiple Conditions Across Columns

The following example demonstrates how to apply boolean indexing with multiple conditions across columns.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 3, 5, 7],
                   'B': [5, 2, 8, 4],
                   'C': ['x', 'y', 'x', 'z']})

# Display the DataFrame
print("Input DataFrame:\n", df)

# Apply multiple conditions using boolean indexing
result = df.loc[(df['A'] > 2) & (df['B'] < 5), 'A':'C']
print('Output Filtered DataFrame:\n', result)

Following is the output of the above code −

Input DataFrame:
    A  B  C
0  1  5  x
1  3  2  y
2  5  8  x
3  7  4  z
Output Filtered DataFrame:
    A  B  C
1  3  2  y
3  7  4  z
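The example above uses only the & operator. As a complementary sketch on the same data, the snippet below applies | and ~. The thresholds are arbitrary choices made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7],
                   'B': [5, 2, 8, 4],
                   'C': ['x', 'y', 'x', 'z']})

# OR: rows where A is below 2 or B is above 7
either = df[(df['A'] < 2) | (df['B'] > 7)]
print(either.index.tolist())   # [0, 2]

# NOT: rows where C is anything except 'x'
not_x = df[~(df['C'] == 'x')]
print(not_x.index.tolist())    # [1, 3]
```

Each condition is wrapped in parentheses so that the comparison is evaluated before the logical operator is applied.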