Python Pandas - Sorting



Sorting is a fundamental operation when working with data in Pandas, whether you're organizing rows, columns, or specific values. Sorting helps you arrange your data in a meaningful order, which makes it easier to understand and analyze.

Pandas provides efficient tools for sorting data either by labels or by actual values. In this tutorial, we'll explore various methods for sorting data in Pandas, from basic sorting by index or column labels to more advanced techniques like sorting by multiple columns and choosing specific sorting algorithms.

Types of Sorting in Pandas

There are two kinds of sorting available in Pandas. They are −

  • Sorting by Label − This involves sorting the data based on the index (row) labels or the column labels.

  • Sorting by Value − This involves sorting data based on the actual values in the DataFrame or Series.
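To make the difference concrete, here is a minimal sketch on a small hypothetical Series (the data and variable names are illustrative only). The same object is sorted once by its labels with sort_index() and once by its values with sort_values():

```python
import pandas as pd

# Hypothetical Series whose index labels and values are both out of order
s = pd.Series([30, 10, 20], index=['b', 'c', 'a'])

# Sorting by label reorders the elements by their index labels: a, b, c
by_label = s.sort_index()

# Sorting by value reorders the elements by the stored values: 10, 20, 30
by_value = s.sort_values()

print(by_label)
print(by_value)
```

In both cases each label stays attached to its value; only the order of the pairs changes.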

Sorting by Label

To sort by labels, use the sort_index() method. By passing the axis argument and the sort order, you can sort a DataFrame or Series by its labels. By default, this method sorts the DataFrame in ascending order based on the row labels.

Example

Let's take a basic example that demonstrates sorting a DataFrame using the sort_index() method.

import pandas as pd
import numpy as np

unsorted_df = pd.DataFrame(np.random.randn(10, 2),
                           index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
                           columns=['col2', 'col1'])
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame by its row labels (ascending by default)
sorted_df = unsorted_df.sort_index()
print("\nOutput Sorted DataFrame:\n", sorted_df)

Its output is as follows −

Original DataFrame:
        col2      col1
1  1.116188  1.631727
4  0.287900 -1.097359
6  0.058885 -0.642273
2 -2.070172  0.148255
3 -1.458229  1.298907
5 -0.723663  2.220048
9 -1.271494  2.001025
8 -0.412954 -0.808688
0  0.922697 -0.429393
7 -0.476054 -0.351621

Output Sorted DataFrame:
        col2      col1
0  0.922697 -0.429393
1  1.116188  1.631727
2 -2.070172  0.148255
3 -1.458229  1.298907
4  0.287900 -1.097359
5 -0.723663  2.220048
6  0.058885 -0.642273
7 -0.476054 -0.351621
8 -0.412954 -0.808688
9 -1.271494  2.001025
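The example above uses np.random.randn, so the numbers change on every run; only the order of the index labels is reproducible. As a small deterministic sketch (the Series below is hypothetical), sort_index() works the same way on a Series:

```python
import pandas as pd

# Hypothetical Series with unsorted integer index labels
s = pd.Series(['d', 'b', 'a', 'c'], index=[3, 1, 0, 2])

# sort_index() reorders the elements by their index labels: 0, 1, 2, 3
sorted_s = s.sort_index()
print(sorted_s)
```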

Example − Controlling the Order of Sorting

By passing a Boolean value to the ascending parameter, you can control the order of the sorting. Passing ascending=False sorts the labels in descending order. Let us consider the following example to understand the same.

import pandas as pd
import numpy as np

unsorted_df = pd.DataFrame(np.random.randn(10, 2),
                           index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
                           columns=['col2', 'col1'])
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame by its row labels in descending order
sorted_df = unsorted_df.sort_index(ascending=False)
print("\nOutput Sorted DataFrame:\n", sorted_df)

Its output is as follows −

Original DataFrame:
        col2      col1
1 -0.668366  0.576422
4  0.605218 -0.066065
6  1.140478  0.236687
2  0.137617  0.312423
3 -0.055631  0.774057
5  0.108002  1.038820
9 -0.929134 -0.982358
8 -0.207542 -1.283386
0 -0.210571 -0.656371
7 -0.106388  0.672418

Output Sorted DataFrame:
        col2      col1
9 -0.929134 -0.982358
8 -0.207542 -1.283386
7 -0.106388  0.672418
6  1.140478  0.236687
5  0.108002  1.038820
4  0.605218 -0.066065
3 -0.055631  0.774057
2  0.137617  0.312423
1 -0.668366  0.576422
0 -0.210571 -0.656371
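Because the data above is random, here is a reproducible sketch of the same descending sort using a small hypothetical DataFrame with fixed values:

```python
import pandas as pd

# Fixed (hypothetical) data instead of random numbers
df = pd.DataFrame({'value': [10, 40, 20, 30]}, index=[1, 4, 2, 3])

# ascending=False sorts the row labels from largest to smallest
desc = df.sort_index(ascending=False)
print(desc)
```

Each row keeps its value, so the labels 4, 3, 2, 1 appear with 40, 30, 20, 10.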

Example − Sort the Columns

The axis argument selects which labels are sorted: axis=0 (the default) sorts by the row labels, and axis=1 sorts by the column labels. Let us consider the following example to understand the same.

import pandas as pd
import numpy as np

unsorted_df = pd.DataFrame(np.random.randn(6, 4),
                           index=[1, 4, 2, 3, 5, 0],
                           columns=['col2', 'col1', 'col4', 'col3'])
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame by its column labels
sorted_df = unsorted_df.sort_index(axis=1)
print("\nOutput Sorted DataFrame:\n", sorted_df)

Its output is as follows −

Original DataFrame:
        col2      col1      col4      col3
1 -0.828951 -0.798286 -1.794752 -0.082656
4  0.440243 -0.693218 -0.218277 -0.790168
2  1.017670  1.443679 -1.939119 -1.887223
3 -0.992471 -1.425046  0.651336 -0.278247
5 -0.103537 -0.879433  0.471838  0.860885
0 -0.222297  1.094805  0.501531 -0.580382

Output Sorted DataFrame:
        col1      col2      col3      col4
1 -0.798286 -0.828951 -0.082656 -1.794752
4 -0.693218  0.440243 -0.790168 -0.218277
2  1.443679  1.017670 -1.887223 -1.939119
3 -1.425046 -0.992471 -0.278247  0.651336
5 -0.879433 -0.103537  0.860885  0.471838
0  1.094805 -0.222297 -0.580382  0.501531
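The axis and ascending arguments can be combined. The following sketch, using a hypothetical one-row DataFrame, sorts the column labels in descending order; note that the row labels are left untouched:

```python
import pandas as pd

# Hypothetical DataFrame with unsorted column labels
df = pd.DataFrame([[1, 2, 3]], columns=['b', 'c', 'a'])

# axis=1 sorts the column labels; ascending=False reverses the order
cols_desc = df.sort_index(axis=1, ascending=False)
print(cols_desc)
```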

Sorting by Actual Values

Similar to sorting by labels, sorting by actual values is done with the sort_values() method. On a DataFrame, this method can sort by one or more columns: its by argument takes the name (or a list of names) of the column(s) whose values determine the order. On a Series, which has only one set of values, no by argument is needed.

Example − Sorting a Series Values

The following example demonstrates how to sort a pandas Series object using the sort_values() method.

import pandas as pd

panda_series = pd.Series([18, 95, 66, 12, 55, 0])
print("Unsorted Pandas Series: \n", panda_series)

# Sort the Series by its values in ascending order
panda_series_sorted = panda_series.sort_values(ascending=True)
print("\nSorted Pandas Series: \n", panda_series_sorted)

On executing the above code you will get the following output −

Unsorted Pandas Series: 
 0    18
1    95
2    66
3    12
4    55
5     0
dtype: int64

Sorted Pandas Series: 
 5     0
3    12
0    18
4    55
2    66
1    95
dtype: int64
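The ascending parameter works on values just as it does on labels. As a short sketch on the same Series data, passing ascending=False puts the largest values first while each value keeps its original index label:

```python
import pandas as pd

panda_series = pd.Series([18, 95, 66, 12, 55, 0])

# ascending=False sorts the values from largest to smallest
desc = panda_series.sort_values(ascending=False)
print(desc)
```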

Example − Sorting a DataFrame Values

The following example demonstrates how the sort_values() method works on a DataFrame object.

import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 9, 5, 0], 'col2': [1, 3, 2, 4]})
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame by the values in col1
sorted_df = unsorted_df.sort_values(by='col1')
print("\nOutput Sorted DataFrame:\n", sorted_df)

Its output is as follows −

Original DataFrame:
    col1  col2
0     2     1
1     9     3
2     5     2
3     0     4

Output Sorted DataFrame:
    col1  col2
3     0     4
0     2     1
2     5     2
1     9     3

Observe that the col1 values are sorted, and each row's col2 value and index label move together with its col1 value. This is why col2 and the index no longer appear in order.
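If you would rather have the sorted rows relabeled 0, 1, 2, and so on, sort_values() accepts an ignore_index parameter (available in pandas 1.0 and later, so this sketch assumes a reasonably recent pandas). It uses the same data as the example above:

```python
import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 9, 5, 0], 'col2': [1, 3, 2, 4]})

# ignore_index=True relabels the sorted rows 0, 1, 2, ... instead of
# keeping the original index labels
sorted_df = unsorted_df.sort_values(by='col1', ignore_index=True)
print(sorted_df)
```

The rows are still kept intact (col2 still follows col1); only the index labels are replaced.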

Example − Sorting by the Values of Multiple Columns

You can also sort by multiple columns by passing a list of column names to the 'by' parameter.

import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 1, 0, 1], 'col2': [1, 3, 4, 2]})
print("Original DataFrame:\n", unsorted_df)

# Sort by col1 first; rows with equal col1 values are then ordered by col2
sorted_df = unsorted_df.sort_values(by=['col1', 'col2'])
print("\nOutput Sorted DataFrame:\n", sorted_df)

Its output is as follows −

Original DataFrame:
    col1  col2
0     2     1
1     1     3
2     0     4
3     1     2

Output Sorted DataFrame:
    col1  col2
2     0     4
3     1     2
1     1     3
0     2     1
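When sorting by several columns, ascending can also be a list with one Boolean per column in by. The sketch below reuses the same data but sorts col1 ascending and col2 descending, so the two rows with col1 equal to 1 swap places compared with the output above:

```python
import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 1, 0, 1], 'col2': [1, 3, 4, 2]})

# One ascending flag per column in `by`: col1 ascending, col2 descending
sorted_df = unsorted_df.sort_values(by=['col1', 'col2'], ascending=[True, False])
print(sorted_df)
```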

Choosing a Sorting Algorithm

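Pandas allows you to specify the sorting algorithm using the kind parameter of the sort_values() method. The accepted values are 'quicksort' (the default), 'mergesort', and 'heapsort'; recent versions also accept 'stable'. Of these, 'mergesort' and 'stable' are the only stable algorithms, meaning that rows with equal sort keys keep their original relative order. For a DataFrame, kind is only applied when sorting on a single column or label.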

Example

The following example sorts a DataFrame using the sort_values() method with a specific algorithm.

import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 5, 0, 1], 'col2': [1, 3, 0, 4]})
print("Original DataFrame:\n", unsorted_df)

# Sort by col1 using the (stable) merge sort algorithm
sorted_df = unsorted_df.sort_values(by='col1', kind='mergesort')
print("\nOutput Sorted DataFrame:\n", sorted_df)

Its output is as follows −

Original DataFrame:
    col1  col2
0     2     1
1     5     3
2     0     0
3     1     4

Output Sorted DataFrame:
    col1  col2
2     0     0
3     1     4
0     2     1
1     5     3
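The col1 values in the example above are all distinct, so stability makes no visible difference there. The following sketch uses hypothetical data with tied keys to show what a stable sort guarantees: rows that share a key stay in the order they had in the original DataFrame.

```python
import pandas as pd

# Rows 0 and 2 tie on 'key' (value 1); so do rows 1 and 3 (value 0)
df = pd.DataFrame({'key': [1, 0, 1, 0], 'label': ['a', 'b', 'c', 'd']})

# A stable sort keeps tied rows in their original relative order
stable = df.sort_values(by='key', kind='mergesort')
print(stable)
```

With kind='mergesort', row 1 is guaranteed to come before row 3 and row 0 before row 2. The default 'quicksort' makes no such guarantee, so use a stable algorithm whenever the order of tied rows matters.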