Python Pandas - Boolean Masking



A boolean mask is an array of boolean values (True or False) used to filter data. It is created by applying conditional expressions to the dataset, which evaluates each element and returns True for matching conditions and False otherwise.

Boolean Masking in Pandas

Boolean masking in Pandas filters data based on specific conditions. A boolean mask holds one True or False value for each element of a DataFrame or Series. When you apply the mask to that DataFrame or Series, Pandas keeps only the rows or columns where the mask is True and drops the rest.

Why Use Boolean Masks?

Boolean masks provide an efficient way to filter and manipulate data in Pandas without using loops. They are useful for −

  • Selecting data based on specific conditions.

  • Performing conditional operations on DataFrames.

  • Filtering data based on index and column values.

In this tutorial, we will learn how to create a boolean mask and apply it to a Pandas DataFrame or Series to filter data based on index and column values.
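As a rough illustration of the "conditional operations" use listed above, the sketch below uses a mask with .loc to change only the matching rows. The DataFrame and its column names (score, result) are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Hypothetical data, used only for this illustration
df = pd.DataFrame({'score': [35, 72, 58, 90]})

# Boolean mask: True where the score is below 60
low = df['score'] < 60

# Start with a default label, then overwrite only the masked rows
df['result'] = 'pass'
df.loc[low, 'result'] = 'fail'

print(df)
```

No loop is needed here: the mask selects every matching row at once, and the assignment is applied to all of them in a single step.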

Creating a Boolean Mask

Creating a boolean mask is done by applying a conditional statement to a DataFrame or Series object. For example, if you specify a condition to check whether values in a series are greater than a specific number, then Pandas will return a series of True or False values, which results in a Boolean mask.

Example

The following example demonstrates how to create a boolean mask for a Series object in Pandas.

import pandas as pd

# Create a Pandas Series
s = pd.Series([1, 5, 2, 8, 4], index=['A', 'B', 'C', 'D', 'E'])

# Display the Series
print("Input Series:")
print(s)

# Create Boolean mask
result = s > 2

print('\nBoolean Mask:')
print(result)

Following is the output of the above code −

Input Series:
A    1
B    5
C    2
D    8
E    4
dtype: int64

Boolean Mask:
A    False
B     True
C    False
D     True
E     True
dtype: bool          
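The mask above only marks which entries match. A minimal sketch of the next step, assuming the same Series as in the example, is to index the Series with the mask so that only the True entries are kept. The variable names mask and selected are chosen here for illustration.

```python
import pandas as pd

# Same Series as in the example above
s = pd.Series([1, 5, 2, 8, 4], index=['A', 'B', 'C', 'D', 'E'])
mask = s > 2

# Indexing with the mask keeps only the entries where it is True
selected = s[mask]
print(selected)

# True counts as 1, so summing the mask counts the matching entries
print(mask.sum())
```

Here selected holds the entries labelled B, D and E, and the sum reports that three values are greater than 2.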

Selecting Data with Boolean Mask

Selecting or filtering data in a DataFrame is done by creating a boolean mask that defines the conditions for selecting rows.

Example

The following example demonstrates how to filter data using boolean masking.

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'Col1': [1, 3, 5, 7, 9], 'Col2': ['A', 'B', 'A', 'C', 'A']})

# Display the input DataFrame
print('Original DataFrame:')
print(df)

# Create a boolean mask: Col2 equals 'A' AND Col1 is greater than 4
mask = (df['Col2'] == 'A') & (df['Col1'] > 4)

# Apply the mask to the DataFrame
filtered_data = df[mask]

print('Filtered Data:')
print(filtered_data)

Following is the output of the above code −

Original DataFrame:
   Col1 Col2
0     1    A
1     3    B
2     5    A
3     7    C
4     9    A
Filtered Data:
   Col1 Col2
2     5    A
4     9    A
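The example above combines two conditions with &. As a hedged sketch of the other element-wise operators, the code below uses | (OR) and ~ (NOT) on the same data, and passes a mask to .loc to pick a column at the same time. Each condition is wrapped in parentheses because & and | bind more tightly than comparison operators, and Python's own and / or keywords raise a ValueError when applied to a whole Series. The variable names are chosen for illustration.

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'Col1': [1, 3, 5, 7, 9],
                   'Col2': ['A', 'B', 'A', 'C', 'A']})

# OR: rows where Col2 is 'B' or Col1 is greater than 6
either = df[(df['Col2'] == 'B') | (df['Col1'] > 6)]
print(either)

# NOT: rows where Col2 is anything other than 'A'
not_a = df[~(df['Col2'] == 'A')]
print(not_a)

# .loc takes the mask for the rows and a label for the column
col1_only = df.loc[df['Col2'] == 'A', 'Col1']
print(col1_only)
```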

Masking Data Based on Index Value

You can also filter rows by their index labels. Build the mask from the DataFrame's index instead of from a column, then apply it like any other boolean mask.

Example

This example uses the df.index.isin() method to create a boolean mask based on the index labels.

import pandas as pd

# Create a DataFrame with a custom index
df = pd.DataFrame({'A1': [10, 20, 30, 40, 50], 'A2': [9, 3, 5, 3, 2]},
                  index=['a', 'b', 'c', 'd', 'e'])

# Display the input DataFrame
print('Original DataFrame:')
print(df)

# Define a mask based on the index labels
mask = df.index.isin(['b', 'd'])

# Apply the mask
filtered_data = df[mask]

print('Filtered Data:')
print(filtered_data)

Following is the output of the above code −

Original DataFrame:
   A1  A2
a  10   9
b  20   3
c  30   5
d  40   3
e  50   2
Filtered Data:
   A1  A2
b  20   3
d  40   3
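Index-based masks are not limited to isin(). The following sketch, which assumes the same DataFrame as above, inverts the mask with ~ to exclude labels and builds a second mask from a plain comparison on the index. The names excluded and after_b are hypothetical.

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'A1': [10, 20, 30, 40, 50],
                   'A2': [9, 3, 5, 3, 2]},
                  index=['a', 'b', 'c', 'd', 'e'])

# Invert the isin() mask to drop rows 'b' and 'd' instead of keeping them
excluded = df[~df.index.isin(['b', 'd'])]
print(excluded)

# A comparison on the index also yields a boolean mask
# (string labels are compared alphabetically)
after_b = df[df.index > 'b']
print(after_b)
```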

Masking Data Based on Column Value

In addition to filtering on index values, you can filter data on specific column values using boolean masks. Calling isin() on a column (a Series) checks whether each value in that column appears in a given list of values.

Example

The following example demonstrates how to create and apply a boolean mask to select data based on DataFrame column values.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})

# Display the input DataFrame
print('Original DataFrame:')
print(df)

# Define a mask: column 'A' is 1 or 3, OR column 'B' is 'a'
mask = df['A'].isin([1, 3]) | df['B'].isin(['a'])

# Apply the mask using boolean indexing
filtered_data = df[mask]

print('Filtered Data:')
print(filtered_data)

Following is the output of the above code −

Original DataFrame:
   A  B
0  1  a
1  2  b
2  3  f
Filtered Data:
   A  B
0  1  a
2  3  f
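The same filter can also be written with DataFrame.isin(), which checks every cell and returns a DataFrame of booleans. This is a minimal sketch rather than the method used in the example above: it assumes a dictionary that maps each column name to its allowed values, and it uses any(axis=1) to reduce the result to one boolean per row. The names cell_mask and row_mask are chosen for illustration.

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})

# With a dict, each column is checked against its own list of values
cell_mask = df.isin({'A': [1, 3], 'B': ['a']})
print(cell_mask)

# Collapse to one boolean per row: True if any cell in the row matched
row_mask = cell_mask.any(axis=1)

filtered = df[row_mask]
print(filtered)
```

Using all(axis=1) instead of any(axis=1) would keep only the rows where every listed column matched, which corresponds to combining the column masks with & rather than |.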