Drop rows from dataframe based on certain condition applied on a column – Pandas

Last Updated : 27 Nov, 2024

In this post, we discuss several ways to drop rows from a DataFrame based on a condition applied to a column. When we need to eliminate irrelevant or invalid data, the primary tool is boolean indexing: applying a condition to a DataFrame column produces a boolean Series (True or False values), which is then used to index the DataFrame and select only the rows where the condition is True. Below is a simple example to illustrate this:

Method 1. Using Boolean Indexing

For instance, the nba.csv dataset contains statistics about NBA players: player names, team affiliations, jersey numbers, positions, age, height, weight, college, and salary. The code below reads the CSV directly from its URL. Now, we want only those players whose age is greater than or equal to 25 years.

import pandas as pd
df = pd.read_csv('https://media.geeksforgeeks.org/wp-content/uploads/nba.csv')

# Filter rows where the player's age is 25 or older
filtered_df = df[df['Age'] >= 25]
print(filtered_df.head(15))

Output:

As we can see in the output, the returned DataFrame retains only those players whose age is greater than or equal to 25 years.

Boolean indexing lets you specify conditions directly with comparison operators (>, <, ==, etc.), making it ideal for quick filtering tasks.
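To phrase the task as "drop the rows that match" rather than "keep the rows that match", the boolean mask can be negated with ~. The snippet below is a minimal sketch on a small made-up DataFrame (the values are hypothetical stand-ins, not taken from nba.csv); conditions are combined with & and each one is wrapped in parentheses.

```python
import pandas as pd

# Hypothetical sample data standing in for nba.csv
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [22, 25, 31, 28],
    "Position": ["SG", "PG", "C", "SG"],
})

# Mask of the rows we want to drop: shooting guards younger than 30
to_drop = (df["Age"] < 30) & (df["Position"] == "SG")

# Negating the mask keeps every row that does NOT match
kept = df[~to_drop]
print(kept)
```

The original df is left untouched; kept is a new DataFrame holding only the surviving rows.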

Method 2. `DataFrame.query()` for Complex Conditions

The query() method filters rows using a string-based query expression. It supports logical operators (and, or) and allows referencing Python variables using @. It is useful for complex conditions, and column names that are not valid Python identifiers (for example, names containing spaces) can be referenced by wrapping them in backticks.

import pandas as pd
df = pd.read_csv('https://media.geeksforgeeks.org/wp-content/uploads/nba.csv')

# Filter rows where the player's age is 30 or younger
df_filtered = df.query('Age <= 30')
print(df_filtered.head())

Output

            Name            Team  Number  ... Weight            College     Salary
0  Avery Bradley  Boston Celtics     0.0  ...  180.0              Texas  7730337.0
1    Jae Crowder  Boston Celtics  ...
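The @ syntax and backtick quoting described above can be sketched as follows. The DataFrame, the "Jersey Number" column name, and the max_age variable are all assumptions made up for illustration; they are not part of nba.csv.

```python
import pandas as pd

# Hypothetical sample data; "Jersey Number" contains a space on purpose
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [22, 25, 31, 28],
    "Jersey Number": [0, 99, 30, 8],
})

max_age = 28  # local Python variable, referenced inside the query with @

# Backticks let query() refer to a column whose name is not a valid identifier
young = df.query("Age <= @max_age and `Jersey Number` > 5")
print(young)
```

Because the condition is a plain string, it can also be assembled at runtime, which is one reason query() is convenient for longer conditions.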

Method 3. Using `DataFrame.drop()`

The drop() method in pandas is used to drop rows by specifying the index labels. drop() is useful when you need to remove rows by index or after filtering based on a condition. It gives you more control over row selection, especially when dealing with index-based operations. However, it is less intuitive for simple conditions compared to boolean indexing or query().

Example 1: Delete Rows Based on Multiple Conditions on a Column Using drop()

import pandas as pd
df = pd.read_csv('https://media.geeksforgeeks.org/wp-content/uploads/nba.csv')

# Delete all rows where 'Age' is between 20 and 25 (inclusive)
indexAge = df[(df['Age'] >= 20) & (df['Age'] <= 25)].index
df.drop(indexAge, inplace=True)
print(df.head(15))

Output:

As we can see in the output, after df.drop() the DataFrame contains only those players whose age is not between 20 and 25.
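The same pattern also works without inplace=True, since drop() returns a new DataFrame by default. The sketch below uses a small made-up DataFrame (hypothetical values) and Series.between(), which is inclusive on both ends by default, in place of the two chained comparisons.

```python
import pandas as pd

# Hypothetical sample data standing in for nba.csv
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [19, 22, 25, 31],
})

# True for ages from 20 to 25, both ends included
in_range = df["Age"].between(20, 25)

# drop() takes index labels and returns a new DataFrame; df is unchanged
trimmed = df.drop(df[in_range].index)

# Optional: renumber the rows so the index has no gaps
trimmed = trimmed.reset_index(drop=True)
print(trimmed)
```

One caveat: drop() works on index labels, so if the index contains duplicate labels, every row sharing a dropped label is removed, not just the rows that matched the condition.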

Example 2: Here, we drop all the rows where the Name is 'John Holland' or the Position is 'SG'

import pandas as pd
df = pd.read_csv('https://media.geeksforgeeks.org/wp-content/uploads/nba.csv')

# Delete rows where the 'Name' is 'John Holland' or the 'Position' is 'SG'
indexDrop = df[(df['Name'] == 'John Holland') | (df['Position'] == 'SG')].index
df.drop(indexDrop, inplace=True)
print(df.head(15))

Output:

Method 4. Using `DataFrame.loc[]`

loc[] is a label-based indexer that also accepts a boolean condition for filtering rows. You can apply conditions directly within loc[], similar to boolean indexing, but loc[] additionally lets you select specific columns in the same step. It is useful when you need to combine a row condition with a column selection in one operation.

import pandas as pd
df = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# Keep only the players who weigh 185 or more
df_filtered = df.loc[df['Weight'] >= 185]
print(df_filtered.head())

Output

            Name            Team  Number  ... Weight            College      Salary
1    Jae Crowder  Boston Celtics    99.0  ...  235.0          Marquette   6796117.0
2   John Holland  Boston Celtics...
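The column-selection side of loc[] mentioned above can be sketched like this. The data is a small hypothetical sample, not the real nba.csv; the second argument to loc[] lists the columns to keep.

```python
import pandas as pd

# Hypothetical sample data standing in for nba.csv
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Weight": [180.0, 235.0, 205.0, 185.0],
    "Salary": [7730337.0, 6796117.0, 1148640.0, 2000000.0],
})

# Row condition first, then the list of columns to return
heavy = df.loc[df["Weight"] >= 185, ["Name", "Weight"]]
print(heavy)
```

Rows that fail the condition are dropped and only the two requested columns are returned, so filtering and projection happen in a single expression.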

In this article, we covered four methods for dropping rows from a DataFrame based on a condition applied to a column. Here’s a quick recap:

Boolean Indexing: The simplest and most intuitive way to drop rows for basic conditions.
DataFrame.query(): A more flexible method for complex conditions or when working with special column names. Parsing the query string adds overhead, so it may be slower than boolean indexing on small DataFrames.
DataFrame.drop(): Ideal for removing rows based on known index labels. It offers more control but is less intuitive for conditions.
DataFrame.loc[]: A versatile method that allows filtering rows and selecting specific columns simultaneously. It’s powerful for more advanced filtering operations.
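As a quick side-by-side check, the four approaches in the recap can be applied to the same condition and compared. This is a minimal sketch on made-up data (hypothetical values) with no missing ages.

```python
import pandas as pd

# Hypothetical sample data with no missing values
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [22, 25, 31, 28],
})

by_mask = df[df["Age"] >= 25]                   # boolean indexing
by_query = df.query("Age >= 25")                # query()
by_drop = df.drop(df[df["Age"] < 25].index)     # drop() with the opposite condition
by_loc = df.loc[df["Age"] >= 25]                # loc[]

# Check that every approach kept the same rows
same = all(by_mask.equals(other) for other in (by_query, by_drop, by_loc))
print(same)
```

Note that the drop() variant uses the opposite condition (Age < 25). If the column contains missing values, comparisons against NaN evaluate to False, so the drop() variant would keep those rows while the other three would remove them.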

For more, you can refer to: How to drop rows or columns based on their labels

Shubham__Ranjan
