Python Pandas - Iteration

Iterating over pandas objects is a fundamental task in data manipulation, and the behavior of iteration depends on the type of object you're dealing with. This tutorial explains how iteration works in pandas, specifically focusing on Series and DataFrame objects.

The iteration behavior in pandas varies between Series and DataFrame objects −

Series: Iterating over a Series object yields the values directly, making it similar to an array-like structure.
DataFrame: Iterating over a DataFrame follows a dictionary-like convention, where the iteration produces the column labels (i.e., the keys).
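
As a minimal sketch of this difference, the snippet below loops over a small Series and a small DataFrame. The names s and df and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample objects, used only to illustrate the two behaviours.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Iterating over a Series yields its values.
series_items = [value for value in s]

# Iterating over a DataFrame yields its column labels.
frame_items = [label for label in df]

print(series_items)  # [10, 20, 30]
print(frame_items)   # ['col1', 'col2']
```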

Iterating Through a DataFrame

To iterate over the columns or rows of a DataFrame, we can use the following methods:

items(): iterates over the columns as (label, Series) pairs
iterrows(): iterates over the rows as (index, Series) pairs
itertuples(): iterates over the rows as namedtuples

Iterate Over Column Pairs

The items() method allows you to iterate over each column as a key-value pair, with the label as the key and the column values as a Series object. This method is consistent with the dictionary-like interface of a DataFrame.

Example

The following example iterates over the columns of a DataFrame using the items() method. Each column is visited separately as a (label, Series) pair.

import pandas as pd
import numpy as np
 
df = pd.DataFrame(np.random.randn(4,3),columns=['col1','col2','col3'])

print("Original DataFrame:\n", df)

# Iterate through the DataFrame columns
print("Iterated Output:")
for key,value in df.items():
   print(key,value)

Its output is as follows −

Original DataFrame:
        col1      col2      col3
0  0.422561  0.094621 -0.214307
1  0.430612 -0.334812 -0.010867
2  0.350962 -0.145470  0.988463
3  1.466426 -1.258297 -0.824569

Iterated Output:
col1 0    0.422561
1    0.430612
2    0.350962
3    1.466426
Name: col1, dtype: float64
col2 0    0.094621
1   -0.334812
2   -0.145470
3   -1.258297
Name: col2, dtype: float64
col3 0   -0.214307
1   -0.010867
2    0.988463
3   -0.824569
Name: col3, dtype: float64

Observe that each column is visited separately: key is the column name and value is the corresponding Series object.
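
Because each value is a full Series, any Series method can be applied inside the loop. The following is a small sketch, assuming a made-up two-column DataFrame, that collects the maximum of every column into a dictionary.

```python
import pandas as pd

# Hypothetical data, chosen only to keep the result easy to check by hand.
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4.0, 5.0, 6.0]})

# label is the column name; column is the Series holding that column's values.
maxima = {label: float(column.max()) for label, column in df.items()}

print(maxima)  # {'col1': 3.0, 'col2': 6.0}
```

For a simple reduction like this one, calling df.max() directly is shorter; the loop form is mainly useful when each column needs different handling.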

Iterate Over DataFrame as Series Pairs

The iterrows() method returns an iterator that yields index and row pairs, where each row is represented as a Series object, containing the data in each row.

Example

The following example iterates the DataFrame rows using the iterrows() method.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])

print("Original DataFrame:\n", df)

# Iterate Through DataFrame rows
print("Iterated Output:")
for row_index,row in df.iterrows():
   print(row_index,row)

Its output is as follows −

Original DataFrame:
        col1      col2      col3
0  0.468160 -0.634193 -0.603612
1  1.231840  0.090565 -0.449989
2 -1.645371  0.032578 -0.165950
3  1.956370 -0.261995  2.168167

Iterated Output:
0 col1    0.468160
col2   -0.634193
col3   -0.603612
Name: 0, dtype: float64
1 col1    1.231840
col2    0.090565
col3   -0.449989
Name: 1, dtype: float64
2 col1   -1.645371
col2    0.032578
col3   -0.165950
Name: 2, dtype: float64
3 col1    1.956370
col2   -0.261995
col3    2.168167
Name: 3, dtype: float64

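Note: Because iterrows() returns each row as a Series, it does not preserve dtypes across the row. The values in a row are converted to a single dtype that can hold every column (dtypes are preserved only down the columns of a DataFrame). In the output above, 0, 1, 2 and 3 are the row index labels, and col1, col2 and col3 are the column labels.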
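
The effect on dtypes is easiest to see with columns of different types. Here is a minimal sketch, assuming a hypothetical frame with one integer column (n) and one float column (ratio).

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column.
df = pd.DataFrame({'n': [1, 2], 'ratio': [0.5, 1.5]})

# iterrows() packs each row into a single Series, so the integer is upcast.
_, first_row = next(df.iterrows())

print(df['n'].dtype)    # an integer dtype
print(first_row.dtype)  # float64
print(first_row['n'])   # 1.0
```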

Iterate Over DataFrame as Namedtuples

The itertuples() method returns an iterator yielding a namedtuple for each row in the DataFrame. The first element of the tuple is the row's index value, and the remaining elements are the row values. This method is generally faster than iterrows() and preserves the data types of the row elements.

Example

The following example uses the itertuples() method to loop through the rows of a DataFrame as namedtuples.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])

print("Original DataFrame:\n", df)

# Iterate Through DataFrame rows
print("Iterated Output:")
for row in df.itertuples():
   print(row)

Its output is as follows −

Original DataFrame:
        col1      col2      col3
0  0.501238 -0.353269 -0.058190
1 -0.426044 -0.012733 -0.532594
2 -0.704042  2.201186 -1.960429
3  0.514151 -0.844160  0.508056

Iterated Output:
Pandas(Index=0, col1=0.5012381423628608, col2=-0.3532690739340918, col3=-0.058189913290578134)
Pandas(Index=1, col1=-0.42604395958954777, col2=-0.012733326002509393, col3=-0.5325942971498149)
Pandas(Index=2, col1=-0.7040424042099052, col2=2.201186165472291, col3=-1.9604285032438307)
Pandas(Index=3, col1=0.5141508750506754, col2=-0.8441600001815068, col3=0.5080555294913854)
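
The fields of each namedtuple can be read by attribute name, and the index and name parameters of itertuples() control the shape of what is returned. A short sketch follows, using a made-up frame with string row labels.

```python
import pandas as pd

# Hypothetical frame; the row labels 'r1' and 'r2' are illustrative only.
df = pd.DataFrame({'col1': [1, 2], 'col2': [3.0, 4.0]}, index=['r1', 'r2'])

# Fields are reachable by attribute name as well as by position.
totals = [row.col1 + row.col2 for row in df.itertuples()]

# index=False drops the leading Index field; name=None yields plain tuples.
plain = list(df.itertuples(index=False, name=None))

print(totals)  # [4.0, 6.0]
print(plain)   # [(1, 3.0), (2, 4.0)]
```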

Iterating Through DataFrame Columns

Iterating directly over a DataFrame simply returns the column names.

Example

Consider the following example, which iterates over the columns of a DataFrame.

import pandas as pd
import numpy as np
 
N = 5
df = pd.DataFrame({
   'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
   'x': np.linspace(0, stop=N-1, num=N),
   'y': np.random.rand(N),
   'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
   'D': np.random.normal(100, 10, size=N).tolist()
})

print("Original DataFrame:\n", df)

# Iterate Through DataFrame Columns
print("Output:")
for col in df:
   print(col)

Its output is as follows −

Original DataFrame:
            A    x         y     C           D
0 2016-01-01  0.0  0.990949   Low  114.143838
1 2016-01-02  1.0  0.314517  High   95.559640
2 2016-01-03  2.0  0.180237   Low  121.134817
3 2016-01-04  3.0  0.170095   Low   95.643132
4 2016-01-05  4.0  0.920718   Low   96.379692

Output:
A
x
y
C
D

Example

While iterating over a DataFrame, you should not modify the object you are iterating over. Iteration is meant for reading: depending on the data types, the iterator may return a copy rather than a view, in which case changes are not reflected in the original object. The following example demonstrates this.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4,3),columns = ['col1','col2','col3'])

for index, row in df.iterrows():
   row['a'] = 10
print(df)

Its output is as follows −

        col1       col2       col3
0  -1.739815   0.735595  -0.295589
1   0.635485   0.106803   1.527922
2  -0.939064   0.547095   0.038585
3  -1.016509  -0.116580  -0.523158

As you can see, no changes are reflected in the DataFrame: the assignment only affects the row object yielded by iterrows(), not the DataFrame itself.
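
When values do need to change, the write should go through the DataFrame rather than through the yielded row. The sketch below shows two common options on a made-up frame. It is an illustration under these assumptions, not the only way to do it, and the mixed column dtypes are chosen so that iterrows() yields a copy of each row.

```python
import pandas as pd

# Hypothetical frame with mixed dtypes, so each yielded row is a copy.
df = pd.DataFrame({'col1': [1, 2], 'col2': [3.0, 4.0]})

# Writing to the row yielded by iterrows() does not touch df here.
for index, row in df.iterrows():
    row['col1'] = 10
unchanged = df['col1'].tolist()

# Option 1: write through the DataFrame itself, addressed by label.
for index in df.index:
    df.at[index, 'col2'] = df.at[index, 'col2'] * 2

# Option 2 (usually preferable): a vectorized column assignment.
df['col1'] = 10.0

print(unchanged)             # [1, 2]
print(df['col2'].tolist())   # [6.0, 8.0]
print(df['col1'].tolist())   # [10.0, 10.0]
```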
