
Python Pandas - Indexing and Selecting Data
In pandas, indexing and selecting data means retrieving specific parts of a Series or DataFrame: single values, rows, columns, or arbitrary subsets. These operations underpin most data analysis, because they let you focus on the relevant data before applying transformations or performing calculations.
The axis labels (the index) of a pandas object serve several purposes. They identify the data, providing metadata that helps with analysis, visualization, and interactive display. They enable automatic alignment of data by label. And they make it simple to get and set subsets of the data.
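The automatic alignment mentioned above can be seen in a minimal sketch. The two Series and their labels are made up for illustration; the point is only that arithmetic matches values by index label rather than by position.

```python
import pandas as pd

# Two Series whose labels overlap only partly (illustrative data).
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Addition pairs values by label, not by position.
# Labels present in only one Series ('a' and 'd') produce NaN.
total = s1 + s2
print(total)
```

Only the shared labels b and c receive numeric results; the result is indexed by the union of both sets of labels.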
This tutorial explores the main ways to select, slice, and modify subsets of your data in pandas.
Types of Indexing in Pandas
Like Python and NumPy, pandas supports the indexing ([ ]) and attribute (.) operators for quick access to its data structures. However, because the type of the data being accessed is not known in advance, these standard operators have some optimization limits, so the dedicated pandas access methods are generally preferable in production code.
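As a rough illustration of the bracket and attribute operators mentioned above, the sketch below uses a small made-up DataFrame (the column names A and count are assumptions chosen for the example). It also shows one reason attribute access is less predictable: a column whose name matches an existing DataFrame method, here count, is reachable only through brackets.

```python
import pandas as pd

# Illustrative data; 'count' deliberately clashes with DataFrame.count().
df = pd.DataFrame({'A': [1, 2, 3], 'count': [4, 5, 6]})

# Bracket and attribute access return the same column when the name is safe.
col_by_bracket = df['A']
col_by_attr = df.A
same = col_by_bracket.equals(col_by_attr)
print(same)

# df.count is the DataFrame method, not the column named 'count'.
is_method = callable(df.count)
print(is_method)

# Brackets always reach the column.
count_col = df['count']
print(count_col.tolist())
```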
Pandas provides several methods for indexing and selecting data, such as −
Label-Based Indexing with .loc
Integer Position-Based Indexing with .iloc
Indexing with Brackets []
Label-Based Indexing with .loc
The .loc indexer is used for label-based indexing, which means you can access rows and columns by their labels. It also supports boolean arrays for conditional selection.
.loc is used with square brackets, not called like a function, and accepts several kinds of input:
Single Label: Selects a single row or column, e.g., df.loc['a'].
List of Labels: Selects multiple rows or columns, e.g., df.loc[['a', 'b']].
Label Slicing: Uses slices with labels, e.g., df.loc['a':'f'] (both start and end are included).
Boolean Arrays: Filters data based on conditions, e.g., df.loc[boolean_array].
.loc takes up to two selectors separated by a comma, each of which may be a single label, a list of labels, or a slice. The first selects rows and the second selects columns.
Example 1
Here is a basic example that selects all rows for a specific column using the loc indexer.
# Import the pandas library, aliased as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
                  columns=['A', 'B', 'C', 'D'])

print("Original DataFrame:\n", df)

# Select all rows for a specific column
print('\nResult:\n', df.loc[:, 'A'])
Its output is as follows −
Original DataFrame:
           A         B         C         D
a  0.962766 -0.195444  1.729083 -0.701897
b -0.552681  0.797465 -1.635212 -0.624931
c  0.581866 -0.404623 -2.124927 -0.190193
d -0.284274  0.019995 -0.589465  0.914940
e  0.697209 -0.629572 -0.347832  0.272185
f -0.181442 -0.000983  2.889981  0.104957
g  1.195847 -1.358104  0.110449 -0.341744
h -0.121682  0.744557  0.083820  0.355442

Result:
a    0.962766
b   -0.552681
c    0.581866
d   -0.284274
e    0.697209
f   -0.181442
g    1.195847
h   -0.121682
Name: A, dtype: float64
Note: The output generated will vary with each execution because the DataFrame is created using NumPy's random number generator.
Example 2
This example selects all rows for multiple columns.
# Import the pandas library, aliased as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
                  columns=['A', 'B', 'C', 'D'])

# Select all rows for multiple columns, passed as a list
print(df.loc[:, ['A', 'C']])
Its output is as follows −
          A         C
a  0.391548  0.745623
b -0.070649  1.620406
c -0.317212  1.448365
d -2.162406 -0.873557
e  2.202797  0.528067
f  0.613709  0.286414
g  1.050559  0.216526
h  1.122680 -1.621420
Example 3
This example selects specific rows and specific columns.
# Import the pandas library, aliased as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
                  columns=['A', 'B', 'C', 'D'])

# Select a few rows and a few columns, each passed as a list
print(df.loc[['a', 'b', 'f', 'h'], ['A', 'C']])
Its output is as follows −
          A         C
a  0.391548  0.745623
b -0.070649  1.620406
f  0.613709  0.286414
h  1.122680 -1.621420
Example 4
The following example selects a range of rows for all columns using the loc indexer.
# Import the pandas library, aliased as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
                  columns=['A', 'B', 'C', 'D'])

# Select a range of rows for all columns (both 'c' and 'e' are included)
print(df.loc['c':'e'])
Its output is as follows −
          A         B         C         D
c  0.044589  1.966278  0.894157  1.798397
d  0.451744  0.233724 -0.412644 -2.185069
e -0.865967 -1.090676 -0.931936  0.214358
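The boolean-array form of .loc listed earlier in this section is not covered by the examples above. The following is a minimal sketch of it, using a small fixed DataFrame (made-up values, chosen so the result is reproducible) rather than random data. It also shows that two scalar labels return a single value.

```python
import pandas as pd

# Small deterministic DataFrame (illustrative values).
df = pd.DataFrame(
    {'A': [1, -2, 3, -4], 'B': [10, 20, 30, 40]},
    index=['a', 'b', 'c', 'd'],
)

# Boolean Series: True where column A is positive.
mask = df['A'] > 0

# Keep the rows where the mask is True, and only column B.
positive_b = df.loc[mask, 'B']
print(positive_b)

# A scalar row label plus a scalar column label returns one value.
value = df.loc['c', 'A']
print(value)

# Label slices include both endpoints.
rows_b_to_c = df.loc['b':'c']
print(rows_b_to_c)
```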
Integer Position-Based Indexing with .iloc
The .iloc indexer is used for integer-based indexing, which allows you to select rows and columns by their numerical position. It follows standard Python and NumPy conventions: positions are 0-based, and the end of a slice is excluded. It accepts the following inputs:
Single Integer: Selects data by its position, e.g., df.iloc[0].
List of Integers: Selects multiple rows or columns by their positions, e.g., df.iloc[[0, 1, 2]].
Integer Slicing: Uses slices with integers, e.g., df.iloc[1:3].
Boolean Arrays: Selects the positions where the array is True; the array must have the same length as the axis being indexed.
Example 1
Here is a basic example that selects the first four rows for all columns using the iloc indexer.
# Import the pandas library, aliased as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])

print("Original DataFrame:\n", df)

# Select the first four rows, all columns
print('\nResult:\n', df.iloc[:4])
Its output is as follows −
Original DataFrame:
           A         B         C         D
0 -1.152267  2.206954 -0.603874  1.275639
1 -0.799114 -0.214075  0.283186  0.030256
2 -1.823776  1.109537  1.512704  0.831070
3 -0.788280  0.961695 -0.127322 -0.597121
4  0.764930 -1.310503  0.108259 -0.600038
5 -1.683649 -0.602324 -1.175043 -0.343795
6  0.323984 -2.314158  0.098935  0.065528
7  0.109998 -0.259021 -0.429467  0.224148

Result:
           A         B         C         D
0 -1.152267  2.206954 -0.603874  1.275639
1 -0.799114 -0.214075  0.283186  0.030256
2 -1.823776  1.109537  1.512704  0.831070
3 -0.788280  0.961695 -0.127322 -0.597121
Example 2
The following example selects rows and columns using integer slicing.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])

# Integer slicing: the first four rows, all columns
print(df.iloc[:4])

# Rows at positions 1 to 4 and columns at positions 2 to 3
print(df.iloc[1:5, 2:4])
Its output is as follows −
          A         B         C         D
0  0.699435  0.256239 -1.270702 -0.645195
1 -0.685354  0.890791 -0.813012  0.631615
2 -0.783192 -0.531378  0.025070  0.230806
3  0.539042 -1.284314  0.826977 -0.026251
          C         D
1 -0.813012  0.631615
2  0.025070  0.230806
3  0.826977 -0.026251
4  1.423332  1.130568
Example 3
This example selects rows and columns by passing lists of integer positions.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])

# Select rows at positions 1, 3, 5 and columns at positions 1, 3
print(df.iloc[[1, 3, 5], [1, 3]])
Its output is as follows −
          B         D
1  0.890791  0.631615
3 -1.284314 -0.026251
5 -0.512888 -0.518930
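The single-integer and boolean-array forms of .iloc are not shown in the examples above. The sketch below covers them with a small fixed DataFrame (made-up values), and contrasts the end-exclusive slicing of .iloc with the end-inclusive label slicing of .loc.

```python
import pandas as pd

# Small deterministic DataFrame (illustrative values).
df = pd.DataFrame(
    {'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]},
    index=['a', 'b', 'c', 'd'],
)

# Single integer: the first row, returned as a Series.
first_row = df.iloc[0]
print(first_row)

# Row position 2, column position 1: a single value.
cell = df.iloc[2, 1]
print(cell)

# Positional slice: positions 1 and 2 only (the end, 3, is excluded).
by_position = df.iloc[1:3]

# Label slice: 'b', 'c' and 'd' (the end label is included).
by_label = df.loc['b':'d']

# Boolean list with one entry per row: keeps the rows marked True.
picked = df.iloc[[True, False, True, False]]
print(picked)
```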
Direct Indexing with Brackets "[]"
Direct indexing with [] is a quick and intuitive way to access data, similar to indexing Python dictionaries and NumPy arrays. It is often used for basic operations −
Single Column: Access a single column by its name, e.g., df['A'].
Multiple Columns: Select multiple columns by passing a list of column names, e.g., df[['A', 'B']].
Row Slicing: Select rows by passing a slice, e.g., df[0:3].
Example 1
This example uses direct indexing with brackets to access a single column.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])

# Access a single column
print(df['A'])
Its output is as follows −
0   -0.850937
1   -1.588211
2   -1.125260
3    2.608681
4   -0.156749
5    0.154958
6    0.396192
7   -0.397918
Name: A, dtype: float64
Example 2
This example selects multiple columns using direct indexing.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])

# Access multiple columns by passing a list of names
print(df[['A', 'B']])
Its output is as follows −
          A         B
0  0.167211 -0.080335
1 -0.104173  1.352168
2 -0.979755 -0.869028
3  0.168335 -1.362229
4 -1.372569  0.360735
5  0.428583 -0.203561
6 -0.119982  1.228681
7 -1.645357  0.331438
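Row slicing, the third use of brackets listed above, has no example so far. Here is a minimal sketch with a small fixed DataFrame (made-up values, default integer index). Note that only a slice selects rows this way; a single value inside the brackets is interpreted as a column name.

```python
import pandas as pd

# Small deterministic DataFrame with the default 0..4 index (illustrative values).
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# A slice inside [] selects rows: positions 0, 1 and 2.
first_three = df[0:3]
print(first_three)

# Slices can carry a step: every second row.
every_other = df[::2]
print(every_other)
```

With this DataFrame, df[0] would not return the first row. It looks for a column named 0 and raises a KeyError, which is one reason .iloc is the clearer choice for selecting a single row by position.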