How to Combine Two DataFrames in Python – Pandas?

Last Updated : 15 Nov, 2024

Pandas offers two main functions for combining DataFrames: concat(), which stacks them along an axis, and merge(), which joins them on common columns or indices. Let’s start with a quick example:

import pandas as pd

# Create two sample DataFrames
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})

# Combine using concat()
combined_df = pd.concat([df1, df2])
print(combined_df)  # display() is only available inside IPython/Jupyter

Output:

      Name  Age
0    Alice   25
1      Bob   30
0  Charlie   35
1    David   40

Note that pd.concat() stacks rows by default, so the rows of df2 are appended below the rows of df1. Each DataFrame keeps its original index labels (here 0 and 1 each appear twice) unless you set ignore_index=True, which renumbers the result starting from 0. Let’s see this in action.

combined_df = pd.concat([df1, df2], ignore_index=True)
print(combined_df)

Output:

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   40

In this example, we combined two DataFrames by stacking them vertically (row-wise) using concat(). Now, let’s dive deeper into the different methods available in Pandas for combining DataFrames.

Combining DataFrames is essential when you need to merge data from different sources or files. In Pandas, there are two main ways to combine DataFrames:

  • concat(): Used for concatenating DataFrames along rows or columns.
  • merge(): Similar to SQL joins, this method allows merging based on common columns or indices.

Both methods are highly flexible and allow for various types of joins and concatenations.

1. Using concat() for Combining DataFrames

The concat() function is used to concatenate two or more DataFrames along a specified axis (either rows or columns). It can be thought of as stacking the DataFrames either vertically or horizontally.

  • axis=0: Stacks the DataFrames row-wise (default behavior).
  • axis=1: Stacks the DataFrames column-wise.
  • ignore_index=True: Discards the existing labels along the concatenation axis and renumbers them 0, 1, ..., n-1.

We have already seen an example of combining two DataFrames along the rows. Now let’s use pd.concat() to stack along the columns by setting axis=1. This places the columns of df2 to the right of the columns of df1, aligning rows by their index.

import pandas as pd
# Create two sample DataFrames
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'City': ['New York', 'Los Angeles'], 'Salary': [70000, 80000]})

# Combine using concat() along columns
combined_df = pd.concat([df1, df2], axis=1, ignore_index=False)
print(combined_df)

Output:

    Name  Age         City  Salary
0  Alice   25     New York   70000
1    Bob   30  Los Angeles   80000

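In this case, each row of df1 is paired with the row of df2 that has the same index label; this alignment happens whatever the value of ignore_index. With axis=1, ignore_index=False (the default) simply keeps the original column names, whereas ignore_index=True would replace them with 0, 1, 2, 3.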
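Index alignment matters most when the two DataFrames do not share the same row labels. The following is a minimal sketch, using hypothetical frames named left and right that are not part of the example above, of how axis=1 behaves in that situation:

```python
import pandas as pd

# Hypothetical frames whose row labels only partly overlap
left = pd.DataFrame({'Name': ['Alice', 'Bob']}, index=[0, 1])
right = pd.DataFrame({'City': ['Los Angeles', 'Chicago']}, index=[1, 2])

# Default join='outer': keep every row label, fill the gaps with NaN
outer = pd.concat([left, right], axis=1)

# join='inner': keep only the row labels present in both frames
inner = pd.concat([left, right], axis=1, join='inner')

# With axis=1, ignore_index=True renumbers the column labels, not the rows
relabeled = pd.concat([left, right], axis=1, ignore_index=True)

print(outer)
print(inner)
print(relabeled.columns.tolist())
```

Only label 1 exists in both frames, so the inner result has a single row, while the outer result keeps labels 0, 1 and 2 with NaN where a frame has no matching row.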

2. Using merge()

The merge() function is similar to SQL joins and is used when you want to combine DataFrames based on a common column or index. This method is more flexible than concat() because it allows for different types of joins like inner join, outer join, left join, and right join.

Syntax: pd.merge(left, right, how='inner', on=None)

where,

  • how: Specifies the type of join ('inner', 'outer', 'left', 'right').
  • on: Specifies the column(s) to join on.

import pandas as pd
# Merging on a common column
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

merged_df = pd.merge(df1, df2, on='Name')
print(merged_df)

Output:

    Name  Age  Salary
0  Alice   25   50000
1    Bob   30   60000

In this case, we merged two DataFrames using the common column 'Name'. The result contains only rows where 'Name' exists in both DataFrames (an inner join).

Types of Joins in merge():

  • Inner Join (default): Returns only matching rows.
  • Outer Join: Returns all rows from both DataFrames; missing values are filled with NaN.
  • Left Join: Returns all rows from the left DataFrame and matching rows from the right.
  • Right Join: Returns all rows from the right DataFrame and matching rows from the left.
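The four join types listed above can be compared side by side. This is an illustrative sketch only; the employees and salaries frames are made-up data chosen so that each side has one name the other lacks:

```python
import pandas as pd

# Hypothetical data: 'Alice' appears only on the left, 'David' only on the right
employees = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                          'Age': [25, 30, 35]})
salaries = pd.DataFrame({'Name': ['Bob', 'Charlie', 'David'],
                         'Salary': [60000, 70000, 80000]})

# Run the same merge once per join type
results = {
    how: pd.merge(employees, salaries, on='Name', how=how)
    for how in ['inner', 'outer', 'left', 'right']
}

for how, result in results.items():
    print(f"--- how='{how}' ---")
    print(result)
```

The inner join keeps only Bob and Charlie; the left join adds Alice with a NaN salary, the right join adds David with a NaN age, and the outer join keeps all four names.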

Refer to Joining two Dataframe Pandas using merge() for in-depth understanding and practical examples of each type of join.

Key Differences Between concat() and merge()

Feature | concat() | merge()
Purpose | Stack/concatenate along an axis | SQL-style joining based on columns
Axis | Can concatenate along rows or columns | Joins based on common columns or index
Join Types | Only outer (default) or inner alignment of the other axis, via join= | Supports inner, outer, left, right joins
Flexibility | Simple concatenation | More complex merging with conditions

  • concat() is best for stacking or appending DataFrames either row-wise or column-wise.
  • merge() is ideal for joining datasets based on shared columns or indices.
  • Both methods are highly customizable with parameters like axis, how, and on.
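The examples so far join on a shared column. Joining on the index is a small variation, sketched below with the left_index, right_index and left_on parameters of pd.merge(); the frame names and values are hypothetical:

```python
import pandas as pd

# Hypothetical frames keyed by name in the index, deliberately in different order
ages = pd.DataFrame({'Age': [25, 30]}, index=['Alice', 'Bob'])
salaries = pd.DataFrame({'Salary': [60000, 50000]}, index=['Bob', 'Alice'])

# Join index to index: rows are matched by label, not by position
by_index = pd.merge(ages, salaries, left_index=True, right_index=True)

# Join a regular column on the left to the index on the right
people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
mixed = pd.merge(people, salaries, left_on='Name', right_index=True)

print(by_index)
print(mixed)
```

Because matching is done by label, Alice is paired with 50000 and Bob with 60000 even though the rows of salaries are stored in the opposite order.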

FAQs on How to Combine Two Dataframes in Pandas

How to combine multiple DataFrames into one in Python?

You can use pd.concat() or pd.merge() to combine multiple DataFrames. concat() accepts a list of any number of DataFrames and stacks them vertically or horizontally, while merge() joins two DataFrames at a time based on common columns.
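As a rough sketch of both approaches with three DataFrames: pd.concat() takes the whole list at once, whereas pd.merge() can be chained, here with functools.reduce from the standard library. The frames and their contents are invented for illustration:

```python
from functools import reduce

import pandas as pd

# Stacking: three hypothetical frames with identical columns
part1 = pd.DataFrame({'Name': ['Alice'], 'Age': [25]})
part2 = pd.DataFrame({'Name': ['Bob'], 'Age': [30]})
part3 = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})
stacked = pd.concat([part1, part2, part3], ignore_index=True)

# Joining: three hypothetical frames that share the 'Name' column
ages = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
cities = pd.DataFrame({'Name': ['Alice', 'Bob'], 'City': ['New York', 'Los Angeles']})
salaries = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

# Merge pairwise from left to right: (ages + cities) + salaries
joined = reduce(lambda left, right: pd.merge(left, right, on='Name'),
                [ages, cities, salaries])

print(stacked)
print(joined)
```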

How do I add a DataFrame to a DataFrame in pandas?

You can add a DataFrame to another using pd.concat(), which appends rows or columns. The DataFrame.append() method was formerly used for this purpose, but it was deprecated in pandas 1.4 and removed in pandas 2.0, so use pd.concat() instead.

How to combine two datasets in Python?

Use pd.merge() for SQL-style joins on common columns or pd.concat() for stacking datasets vertically or horizontally, depending on your data structure and requirements.

How to combine two DataFrames in pandas vertically?

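Use pd.concat([df1, df2], axis=0) to stack two DataFrames vertically. Columns are aligned by name and the rows of the second DataFrame are appended below the first; a column missing from one DataFrame is filled with NaN for that DataFrame's rows.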
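A short sketch of that column alignment, assuming two small frames that share only the Name column (the data is illustrative):

```python
import pandas as pd

# Hypothetical frames: 'Age' exists only in df1, 'City' only in df2
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie'], 'City': ['Chicago']})

# Default join='outer': keep the union of columns, fill the gaps with NaN
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# join='inner': keep only the columns present in every frame
shared_only = pd.concat([df1, df2], axis=0, join='inner', ignore_index=True)

print(stacked)
print(shared_only)
```

Note that Age becomes a floating-point column in stacked, because NaN cannot be stored in an integer column.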


