
Python Pandas - Boolean Masking
A boolean mask is an array of boolean values (True or False) used to filter data. It is created by applying a conditional expression to the data: each element is evaluated, yielding True where the condition holds and False otherwise.
Boolean Masking in Pandas
Boolean masking in Pandas is a technique for filtering data based on specific conditions. It works by creating a boolean mask in which each element of a DataFrame or Series is represented as either True or False. When you apply this mask to a DataFrame or Series, only the rows or columns where the mask is True are selected.
Why Use Boolean Masks?
Boolean masks provide an efficient way to filter and manipulate data in Pandas without using loops. They are useful for −
- Selecting data based on specific conditions.
- Performing conditional operations on DataFrames.
- Filtering data based on index and column values.
In this tutorial we will learn how to create a Boolean mask and apply it to a Pandas DataFrame or Series for filtering data based on index and column values.
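As a rough illustration of the "conditional operations" use case listed above, the sketch below uses a mask with .loc to overwrite values only in the rows that match. The column names and data are hypothetical and chosen purely for demonstration.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({"score": [35, 80, 55, 90]})

# Boolean mask: True where the score is below 50
low = df["score"] < 50

# Conditional operation: fill a new column, then overwrite only the masked rows
df["status"] = "pass"
df.loc[low, "status"] = "fail"

print(df)
```

No explicit loop is needed here: the comparison and the assignment are both applied to the whole column at once.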
Creating a Boolean Mask
You create a boolean mask by applying a conditional expression to a DataFrame or Series object. For example, if you check whether the values in a Series are greater than a specific number, Pandas returns a Series of True and False values with the same index as the original. That Series is the boolean mask.
Example
The following example demonstrates how to create a boolean mask for a Series object in Pandas.
    import pandas as pd

    # Create a Pandas Series
    s = pd.Series([1, 5, 2, 8, 4], index=['A', 'B', 'C', 'D', 'E'])

    # Display the Series
    print("Input Series:")
    print(s)

    # Create Boolean mask
    result = s > 2
    print('\nBoolean Mask:')
    print(result)
Following is the output of the above code −
    Input Series:
    A    1
    B    5
    C    2
    D    8
    E    4
    dtype: int64

    Boolean Mask:
    A    False
    B     True
    C    False
    D     True
    E     True
    dtype: bool
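A mask can also be inspected or transformed before it is used for selection. The following is a minimal sketch, reusing the Series from the example above, that counts the matches, inverts the mask with the ~ operator, and builds an element-wise mask on a small DataFrame. The DataFrame and its column names x and y are assumptions made up for this illustration.

```python
import pandas as pd

s = pd.Series([1, 5, 2, 8, 4], index=["A", "B", "C", "D", "E"])
mask = s > 2

# True counts as 1, so sum() gives the number of matching elements
n_matches = int(mask.sum())

# The ~ operator inverts a mask (True becomes False and vice versa)
inverted = ~mask

# A comparison on a DataFrame yields a boolean DataFrame of the same shape
df = pd.DataFrame({"x": [1, 5], "y": [8, 2]})
df_mask = df > 2

print(n_matches)
print(inverted)
print(df_mask)
```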
Selecting Data with Boolean Mask
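To select or filter rows of a DataFrame, create a boolean mask that defines the selection conditions and pass it to the DataFrame in square brackets. Several conditions can be combined with & (and) and | (or). Wrap each condition in parentheses, because these operators bind more tightly than comparison operators such as == and >.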
Example
The following example demonstrates how to filter data using boolean masking.
    import pandas as pd

    # Create a sample DataFrame
    df = pd.DataFrame({'Col1': [1, 3, 5, 7, 9],
                       'Col2': ['A', 'B', 'A', 'C', 'A']})

    # Display the Input DataFrame
    print('Original DataFrame:\n', df)

    # Create a boolean mask
    mask = (df['Col2'] == 'A') & (df['Col1'] > 4)

    # Apply the mask to the DataFrame
    filtered_data = df[mask]
    print('Filtered Data:\n', filtered_data)
Following is the output of the above code −
Original DataFrame:

| | Col1 | Col2 |
|---|---|---|
| 0 | 1 | A |
| 1 | 3 | B |
| 2 | 5 | A |
| 3 | 7 | C |
| 4 | 9 | A |

Filtered Data:

| | Col1 | Col2 |
|---|---|---|
| 2 | 5 | A |
| 4 | 9 | A |
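The same kind of mask can be passed to .loc to pick rows and a column in one step. The sketch below reuses the sample DataFrame; the particular condition (not 'A', or Col1 below 2) is an assumption chosen only to show ~ and | together.

```python
import pandas as pd

df = pd.DataFrame({"Col1": [1, 3, 5, 7, 9],
                   "Col2": ["A", "B", "A", "C", "A"]})

# Rows where Col2 is not 'A' OR Col1 is below 2
mask = ~(df["Col2"] == "A") | (df["Col1"] < 2)

# .loc takes the mask for the rows and a label for the column
selected = df.loc[mask, "Col1"]

print(selected)
```

Using .loc with a mask is also the usual way to assign to the selected rows, since it addresses the original DataFrame directly.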
Masking Data Based on Index Value
You can also filter a DataFrame by its index values. To do so, create the mask from the index itself, so that rows are selected by their index labels.
Example
This example uses the df.index.isin() method to create a boolean mask based on the index labels.
    import pandas as pd

    # Create a DataFrame with a custom index
    df = pd.DataFrame({'A1': [10, 20, 30, 40, 50],
                       'A2': [9, 3, 5, 3, 2]},
                      index=['a', 'b', 'c', 'd', 'e'])

    # Display the Input DataFrame
    print('Original DataFrame:\n', df)

    # Define a mask based on the index
    mask = df.index.isin(['b', 'd'])

    # Apply the mask
    filtered_data = df[mask]
    print('Filtered Data:\n', filtered_data)
Following is the output of the above code −
Original DataFrame:

| | A1 | A2 |
|---|---|---|
| a | 10 | 9 |
| b | 20 | 3 |
| c | 30 | 5 |
| d | 40 | 3 |
| e | 50 | 2 |

Filtered Data:

| | A1 | A2 |
|---|---|---|
| b | 20 | 3 |
| d | 40 | 3 |
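An index mask behaves like any other mask, so it can be inverted or combined with a column condition. The following sketch reuses the DataFrame from the example above; the threshold of 20 is an arbitrary value picked for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A1": [10, 20, 30, 40, 50], "A2": [9, 3, 5, 3, 2]},
                  index=["a", "b", "c", "d", "e"])

# Index.isin returns a NumPy boolean array rather than a Series
index_mask = df.index.isin(["b", "d"])

# ~ inverts it: keep every row whose label is NOT 'b' or 'd'
others = df[~index_mask]

# Combine a column condition with the index mask
combined = df[(df["A1"] > 20) & index_mask]

print(others)
print(combined)
```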
Masking Data Based on Column Value
In addition to filtering based on index values, you can also filter data based on specific column values using boolean masks. The isin() method of a column (a Series) checks whether each value in that column appears in a given list of values.
Example
The following example demonstrates how to create and apply a boolean mask to select data based on DataFrame column values.
    import pandas as pd

    # Create a DataFrame
    df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})

    # Display the Input DataFrame
    print('Original DataFrame:\n', df)

    # Define a mask for specific values in columns 'A' and 'B'
    mask = df['A'].isin([1, 3]) | df['B'].isin(['a'])

    # Apply the mask using boolean indexing
    filtered_data = df[mask]
    print('Filtered Data:\n', filtered_data)
Following is the output of the above code −
Original DataFrame:

| | A | B |
|---|---|---|
| 0 | 1 | a |
| 1 | 2 | b |
| 2 | 3 | f |

Filtered Data:

| | A | B |
|---|---|---|
| 0 | 1 | a |
| 2 | 3 | f |
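For comparison, a similar row mask can be built with DataFrame.isin(), which accepts a dict mapping column names to allowed values and returns a boolean DataFrame. The sketch below is one possible alternative to the example above, not a replacement for it; reducing the result with any(axis=1) gives the same "match in either column" behaviour as the | operator.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "f"]})

# Series.isin on each column, combined with |
mask_series = df["A"].isin([1, 3]) | df["B"].isin(["a"])

# DataFrame.isin with a dict checks each column against its own list
cell_mask = df.isin({"A": [1, 3], "B": ["a"]})

# A row is kept if any of its cells matched
mask_frame = cell_mask.any(axis=1)

print(df[mask_frame])
```

Replacing any(axis=1) with all(axis=1) would instead require a match in every listed column.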