How to sort a Pandas DataFrame by multiple columns?
We are given a DataFrame and our task is to sort it based on multiple columns. This means ordering the rows by one column first and then, among rows that have the same value in that column, by another. For example, if we sort by 'Rank' in ascending order and then by 'Age' in descending order, rows are ordered by rank and ties in rank are broken by age. Rows with NaN in a sort column are placed at the end by default, which can be changed with the na_position argument.
Using nlargest()
The nlargest() method returns the n rows with the largest values in the given columns, ordered from largest to smallest. Because it does not have to sort the whole DataFrame, it is usually faster than sort_values() followed by head(n) when you need only the top few rows based on one or more columns.
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})
# Selecting top 3 rows with highest 'Rank'
res = df.nlargest(3, ['Rank'])
print(res)
Output
    Name  Age  Rank
3  Tilak   19   9.0
2  Sonum   21   8.0
4  Divya   17   4.0
Explanation: nlargest(n, columns) returns the n rows with the largest values in the specified columns, ordered from largest to smallest. Here, df.nlargest(3, ['Rank']) returns the three rows with the highest 'Rank'. The two rows whose 'Rank' is NaN are not among the largest values, so they do not appear in the result.
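Since nlargest() accepts a list of columns, later columns can break ties in earlier ones. The following is a minimal sketch using a small made-up DataFrame (the names players, top_two and same_rows are illustrative and not part of the example above). It also builds the same result with sort_values() followed by head(), which the pandas documentation describes as equivalent.

```python
import pandas as pd

# Hypothetical data with a tie in 'Rank' (two rows share the value 9)
players = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Rank': [9, 9, 8, 7],
    'Age': [19, 21, 30, 25],
})

# Top 2 rows by 'Rank'; the tie at Rank 9 is broken by the larger 'Age'
top_two = players.nlargest(2, ['Rank', 'Age'])
print(top_two)

# The same rows obtained by sorting everything and keeping the first 2
same_rows = players.sort_values(['Rank', 'Age'], ascending=False).head(2)
print(same_rows)
```

Both approaches should list row B before row A; nlargest() is generally the better choice when n is small relative to the number of rows.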
Using nsmallest()
The nsmallest() method works like nlargest() but returns the n rows with the smallest values, ordered from smallest to largest.
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})
# Selecting bottom 3 rows with lowest 'Rank'
res = df.nsmallest(3, ['Rank'])
print(res)
Output
    Name  Age  Rank
0    Raj   20   1.0
4  Divya   17   4.0
2  Sonum   21   8.0
Explanation: nsmallest(n, columns) returns the n rows with the smallest values in the specified columns, ordered from smallest to largest. Here, df.nsmallest(3, ['Rank']) returns the three rows with the lowest 'Rank'. The rows whose 'Rank' is NaN are not among the smallest values, so they do not appear in the result.
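When several rows tie for the n-th place, the keep argument of nsmallest() (and nlargest()) decides which of them are returned. The sketch below uses made-up data in which three rows share the lowest 'Rank'; the names scores, first_two and all_tied are illustrative only.

```python
import pandas as pd

# Hypothetical data: three rows share the lowest 'Rank'
scores = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Rank': [1, 1, 1, 4],
})

# keep='first' (the default) returns exactly n rows,
# taking tied rows in their order of appearance
first_two = scores.nsmallest(2, 'Rank', keep='first')
print(first_two)

# keep='all' returns every row tied with the n-th value,
# so more than n rows may come back
all_tied = scores.nsmallest(2, 'Rank', keep='all')
print(all_tied)
```

With keep='all', the request for 2 rows returns all three rows that have 'Rank' equal to 1, which is useful when dropping a tied row arbitrarily would be misleading.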
Using sort_values()
The sort_values() method is the most flexible and widely used way to sort a DataFrame by multiple columns. It accepts a list of columns, a separate ascending or descending flag for each column, and an na_position argument that controls whether missing values are placed first or last.
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})
# Sorting by 'Rank' in ascending order and 'Age' in descending order
res = df.sort_values(by=['Rank', 'Age'], ascending=[True, False], na_position='last')
print(res)
Output
    Name  Age  Rank
0    Raj   20   1.0
4  Divya   17   4.0
2  Sonum   21   8.0
3  Tilak   19   9.0
5  Megha   23   NaN
1  Akhil   22   NaN
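Explanation: sort_values(by, ascending, na_position) sorts a DataFrame by one or more columns. Here, the rows are sorted by 'Rank' in ascending order; rows with the same 'Rank' are then sorted by 'Age' in descending order, and rows with NaN in 'Rank' are placed at the end. In this data the only rows that tie are the two with NaN ranks, which is why Megha (23) comes before Akhil (22).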
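Because the data above has no repeated ranks apart from NaN, the effect of the second sort column is easy to miss. Here is a small sketch with made-up data that contains real ties (the names teams, ordered and renumbered are illustrative). It also shows the ignore_index option, which renumbers the result instead of keeping the original row labels.

```python
import pandas as pd

# Hypothetical data: ranks 1 and 2 each appear twice
teams = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Rank': [2, 1, 2, 1],
    'Age': [30, 25, 35, 20],
})

# 'Rank' ascending; within each rank, 'Age' descending
ordered = teams.sort_values(by=['Rank', 'Age'], ascending=[True, False])
print(ordered)

# Same sort, but the index is reset to 0..n-1 in the sorted order
renumbered = teams.sort_values(
    by=['Rank', 'Age'], ascending=[True, False], ignore_index=True
)
print(renumbered)
```

Within rank 1 the older row B precedes D, and within rank 2 the older row C precedes A.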
Using sort_index()
The sort_index() method sorts the DataFrame by its index labels rather than by its column values. It is useful when you want to reorder rows by their index, such as after setting a custom index.
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})
# Sorting the DataFrame by index in descending order
res = df.sort_index(ascending=False)
print(res)
Output
    Name  Age  Rank
5  Megha   23   NaN
4  Divya   17   4.0
3  Tilak   19   9.0
2  Sonum   21   8.0
1  Akhil   22   NaN
0    Raj   20   1.0
Explanation: sort_index(ascending) sorts a DataFrame based on its index. Here, df.sort_index(ascending=False) arranges the rows in descending order of their index values.
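The custom-index case mentioned above can be sketched as follows. This is a minimal illustration that reuses the 'Name' and 'Age' columns from the example and assumes we want the rows in alphabetical order of name; the variable by_name is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
    'Age': [20, 22, 21, 19, 17, 23],
})

# Use 'Name' as the index, then sort the rows alphabetically by that index
by_name = df.set_index('Name').sort_index()
print(by_name)
```

After set_index('Name'), the names are row labels rather than a column, so sort_index() is the method that orders by them.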
Using argsort()
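argsort() returns the integer positions that would sort an array. If you are already working with NumPy arrays and need to sort by a single column, you can call argsort() on that column's values and pass the resulting positions to iloc to reorder the DataFrame.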
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
    'Age': [20, 22, 21, 19, 17, 23],
    'Rank': [1, np.nan, 8, 9, 4, np.nan]
})
# Sorting DataFrame by 'Rank' using NumPy's argsort
sorted_idx = np.argsort(df['Rank'].values, kind='quicksort')
res = df.iloc[sorted_idx]
print(res)
Output
    Name  Age  Rank
0    Raj   20   1.0
4  Divya   17   4.0
2  Sonum   21   8.0
3  Tilak   19   9.0
1  Akhil   22   NaN
5  Megha   23   NaN
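Explanation: np.argsort(df['Rank'].values, kind='quicksort') returns the positions that would sort the 'Rank' column in ascending order. NumPy does not ignore NaN values; it places them at the end, so the two NaN rows come last. Passing these positions to .iloc[sorted_idx] reorders the DataFrame accordingly. Note that quicksort is not a stable sort, so the relative order of equal values is not guaranteed; kind='stable' preserves it.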
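argsort() orders by a single array, so on its own it does not cover the multi-column case. One NumPy option for that is np.lexsort(), sketched below on made-up data without NaN values (the names teams, rank, age and order are illustrative). It assumes numeric columns, since descending order is obtained here by negating the values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ranks 1 and 2 each appear twice
teams = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Rank': [2, 1, 2, 1],
    'Age': [30, 25, 35, 20],
})

rank = teams['Rank'].to_numpy()
age = teams['Age'].to_numpy()

# np.lexsort treats the LAST key as the primary sort key.
# Negating 'Age' turns its ascending order into descending order.
order = np.lexsort((-age, rank))

res = teams.iloc[order]
print(res)
```

This orders the rows by 'Rank' ascending and, within each rank, by 'Age' descending, which matches what sort_values(by=['Rank', 'Age'], ascending=[True, False]) gives on the same data.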