
Python Pandas - Sorting
Sorting is a fundamental operation when working with data in Pandas, whether you're organizing rows, columns, or specific values. Sorting can help you to arrange your data in a meaningful way for better understanding and easy analysis.
Pandas provides efficient tools for sorting data, either by labels or by the actual values. In this tutorial, we'll explore various methods for sorting data in Pandas, from basic sorting by index or column labels to more advanced techniques like sorting by multiple columns and choosing specific sorting algorithms.
Types of Sorting in Pandas
There are two kinds of sorting available in Pandas. They are −
Sorting by Label − This involves sorting the data based on the index labels.
Sorting by Value − This involves sorting data based on the actual values in the DataFrame or Series.
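The difference between the two kinds of sorting can be seen on a single small Series. The following is a minimal sketch; the data and the variable names (s, by_label, by_value) are made up for illustration.

```python
import pandas as pd

# Hypothetical data: labels and values are deliberately out of order.
s = pd.Series([30, 10, 20], index=["b", "c", "a"])

# Sorting by label orders the rows by their index: a, b, c.
by_label = s.sort_index()

# Sorting by value orders the rows by their data: 10, 20, 30.
by_value = s.sort_values()

print("Sorted by label:\n", by_label)
print("\nSorted by value:\n", by_value)
```

In both cases each value stays attached to its own label; only the row order changes.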
Sorting by Label
To sort by index labels, use the sort_index() method. The axis argument selects whether row labels or column labels are sorted, and the ascending argument sets the order. By default, this method sorts a DataFrame in ascending order of its row labels.
Example
The following basic example sorts a DataFrame by its row labels using the sort_index() method.
import pandas as pd
import numpy as np

unsorted_df = pd.DataFrame(np.random.randn(10, 2),
                           index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
                           columns=['col2', 'col1'])
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame by row labels
sorted_df = unsorted_df.sort_index()
print("\nOutput Sorted DataFrame:\n", sorted_df)
Its output is as follows (the values are random, so they differ from run to run) −
Original DataFrame:
       col2      col1
1  1.116188  1.631727
4  0.287900 -1.097359
6  0.058885 -0.642273
2 -2.070172  0.148255
3 -1.458229  1.298907
5 -0.723663  2.220048
9 -1.271494  2.001025
8 -0.412954 -0.808688
0  0.922697 -0.429393
7 -0.476054 -0.351621

Output Sorted DataFrame:
       col2      col1
0  0.922697 -0.429393
1  1.116188  1.631727
2 -2.070172  0.148255
3 -1.458229  1.298907
4  0.287900 -1.097359
5 -0.723663  2.220048
6  0.058885 -0.642273
7 -0.476054 -0.351621
8 -0.412954 -0.808688
9 -1.271494  2.001025
Example − Controlling the Order of Sorting
Passing a Boolean value to the ascending parameter controls the sort order; ascending=False sorts the labels in descending order. Consider the following example.
import pandas as pd
import numpy as np

unsorted_df = pd.DataFrame(np.random.randn(10, 2),
                           index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
                           columns=['col2', 'col1'])
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame by row labels in descending order
sorted_df = unsorted_df.sort_index(ascending=False)
print("\nOutput Sorted DataFrame:\n", sorted_df)
Its output is as follows −
Original DataFrame:
       col2      col1
1 -0.668366  0.576422
4  0.605218 -0.066065
6  1.140478  0.236687
2  0.137617  0.312423
3 -0.055631  0.774057
5  0.108002  1.038820
9 -0.929134 -0.982358
8 -0.207542 -1.283386
0 -0.210571 -0.656371
7 -0.106388  0.672418

Output Sorted DataFrame:
       col2      col1
9 -0.929134 -0.982358
8 -0.207542 -1.283386
7 -0.106388  0.672418
6  1.140478  0.236687
5  0.108002  1.038820
4  0.605218 -0.066065
3 -0.055631  0.774057
2  0.137617  0.312423
1 -0.668366  0.576422
0 -0.210571 -0.656371
Example − Sort the Columns
Passing axis=1 to sort_index() sorts the column labels instead of the row labels. The default, axis=0, sorts by row labels. Consider the following example.
import pandas as pd
import numpy as np

unsorted_df = pd.DataFrame(np.random.randn(6, 4),
                           index=[1, 4, 2, 3, 5, 0],
                           columns=['col2', 'col1', 'col4', 'col3'])
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame by column labels
sorted_df = unsorted_df.sort_index(axis=1)
print("\nOutput Sorted DataFrame:\n", sorted_df)
Its output is as follows −
Original DataFrame:
       col2      col1      col4      col3
1 -0.828951 -0.798286 -1.794752 -0.082656
4  0.440243 -0.693218 -0.218277 -0.790168
2  1.017670  1.443679 -1.939119 -1.887223
3 -0.992471 -1.425046  0.651336 -0.278247
5 -0.103537 -0.879433  0.471838  0.860885
0 -0.222297  1.094805  0.501531 -0.580382

Output Sorted DataFrame:
       col1      col2      col3      col4
1 -0.798286 -0.828951 -0.082656 -1.794752
4 -0.693218  0.440243 -0.790168 -0.218277
2  1.443679  1.017670 -1.887223 -1.939119
3 -1.425046 -0.992471 -0.278247  0.651336
5 -0.879433 -0.103537  0.860885  0.471838
0  1.094805 -0.222297 -0.580382  0.501531
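The axis and ascending arguments can also be combined. The following sketch uses a small fixed DataFrame (the values and the name cols_desc are illustrative assumptions) to sort the column labels in descending order while leaving the rows where they are.

```python
import pandas as pd

# Small fixed frame so the result is the same on every run.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=[1, 0],
    columns=["col2", "col3", "col1"],
)

# axis=1 targets the column labels; ascending=False reverses their order.
cols_desc = df.sort_index(axis=1, ascending=False)

print(cols_desc)
```

The row labels keep their original order here, because only axis=1 was sorted.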
Sorting by Actual Values
Sorting by actual values is done with the sort_values() method, which can sort by one or more columns. For a DataFrame, its 'by' argument takes the name of the column (or a list of column names) whose values determine the order.
Example − Sorting the Values of a Series
The following example demonstrates how to sort a pandas Series object using the sort_values() method.
import pandas as pd

panda_series = pd.Series([18, 95, 66, 12, 55, 0])
print("Unsorted Pandas Series: \n", panda_series)

panda_series_sorted = panda_series.sort_values(ascending=True)
print("\nSorted Pandas Series: \n", panda_series_sorted)
On executing the above code you will get the following output −
Unsorted Pandas Series:
0    18
1    95
2    66
3    12
4    55
5     0
dtype: int64

Sorted Pandas Series:
5     0
3    12
0    18
4    55
2    66
1    95
dtype: int64
Example − Sorting a DataFrame by Values
The following example demonstrates how the sort_values() method works on a DataFrame object.
import pandas as pd
import numpy as np

unsorted_df = pd.DataFrame({'col1': [2, 9, 5, 0], 'col2': [1, 3, 2, 4]})
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame by the values in col1
sorted_df = unsorted_df.sort_values(by='col1')
print("\nOutput Sorted DataFrame:\n", sorted_df)
Its output is as follows −
Original DataFrame:
   col1  col2
0     2     1
1     9     3
2     5     2
3     0     4

Output Sorted DataFrame:
   col1  col2
3     0     4
0     2     1
2     5     2
1     9     3
Observe that the rows are ordered by the col1 values. Each row moves as a unit, so its col2 value and its index label travel with it; this is why col2 and the index no longer appear in order.
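If the original index labels are not needed after sorting, one common option is to renumber the rows with reset_index(). The following is a minimal sketch on the same data as above; the name renumbered is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 9, 5, 0], "col2": [1, 3, 2, 4]})

# After sorting, every row still carries its original index label.
sorted_df = df.sort_values(by="col1")
print(sorted_df.index.tolist())

# drop=True discards the old labels instead of keeping them as a new column.
renumbered = sorted_df.reset_index(drop=True)
print(renumbered)
```

Whether to renumber depends on the use case: keeping the original labels makes it possible to trace each row back to its position in the unsorted data.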
Example − Sorting by Multiple Columns
You can also sort by multiple columns by passing a list of column names to the 'by' parameter.
import pandas as pd
import numpy as np

unsorted_df = pd.DataFrame({'col1': [2, 1, 0, 1], 'col2': [1, 3, 4, 2]})
print("Original DataFrame:\n", unsorted_df)

# Sort by col1 first, then by col2 to break ties in col1
sorted_df = unsorted_df.sort_values(by=['col1', 'col2'])
print("\nOutput Sorted DataFrame:\n", sorted_df)
Its output is as follows −
Original DataFrame:
   col1  col2
0     2     1
1     1     3
2     0     4
3     1     2

Output Sorted DataFrame:
   col1  col2
2     0     4
3     1     2
1     1     3
0     2     1
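When sorting by several columns, the ascending parameter also accepts a list with one Boolean per column, so each column can have its own direction. The following sketch reuses the data above; the name mixed is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 1, 0, 1], "col2": [1, 3, 4, 2]})

# col1 ascending; rows that tie on col1 are ordered by col2 descending.
mixed = df.sort_values(by=["col1", "col2"], ascending=[True, False])

print(mixed)
```

Compared with the previous output, only the two rows that share col1 == 1 swap places, because col2 is now used in descending order to break that tie.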
Choosing a Sorting Algorithm
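Pandas allows you to specify the sorting algorithm using the kind parameter in the sort_values() method. You can choose between 'quicksort' (the default), 'mergesort', and 'heapsort'. Of these three, only 'mergesort' is stable, meaning that rows with equal sort keys keep their original relative order; recent Pandas versions also accept 'stable'. For a DataFrame, kind is applied only when sorting by a single column or label.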
Example
The following example sorts a DataFrame using the sort_values() method with a specific algorithm.
import pandas as pd
import numpy as np

unsorted_df = pd.DataFrame({'col1': [2, 5, 0, 1], 'col2': [1, 3, 0, 4]})
print("Original DataFrame:\n", unsorted_df)

# Sort the DataFrame by col1 using merge sort
sorted_df = unsorted_df.sort_values(by='col1', kind='mergesort')
print("\nOutput Sorted DataFrame:\n", sorted_df)
Its output is as follows −
Original DataFrame:
   col1  col2
0     2     1
1     5     3
2     0     0
3     1     4

Output Sorted DataFrame:
   col1  col2
2     0     0
3     1     4
0     2     1
1     5     3
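The example above has no repeated values in col1, so the choice of algorithm does not change its result. The effect of a stable sort only shows up when the sort key contains ties. The following sketch uses made-up data with repeated keys (the tag column and the name stable are assumptions for illustration).

```python
import pandas as pd

# Hypothetical data: col1 contains ties, tag records the original order.
df = pd.DataFrame({"col1": [1, 0, 1, 0], "tag": ["a", "b", "c", "d"]})

# A stable sort keeps tied rows in their original relative order:
# "b" stays before "d" (both 0), and "a" stays before "c" (both 1).
stable = df.sort_values(by="col1", kind="mergesort")

print(stable)
```

With an unstable algorithm the tied rows may come out in a different relative order, so a stable kind is the safer choice when the existing order of equal keys matters.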