How to combine two DataFrames in Python – Pandas?
Combining two DataFrames in Pandas involves the concat() and merge() functions, which stack DataFrames along an axis or join them on common columns or indices. Let’s understand with a quick example:
import pandas as pd
# Create two sample DataFrames
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})
# Combine using concat()
combined_df = pd.concat([df1, df2])
print(combined_df)  # display() is only available in Jupyter/IPython
Output:
Name Age
0 Alice 25
1 Bob 30
0 Charlie 35
1 David 40
Note that, by default, pd.concat() stacks rows: the rows of the second DataFrame are appended below the rows of the first. Each DataFrame keeps its own index values, so the labels 0 and 1 appear twice, unless you set ignore_index=True. Let’s see this in action.
combined_df = pd.concat([df1, df2], ignore_index=True)
print(combined_df)
Output:
Name Age
0 Alice 25
1 Bob 30
2 Charlie 35
3 David 40
In this example, we combined two DataFrames by stacking them vertically (row-wise) using concat(). Now, let’s dive deeper into the different methods available in Pandas for combining DataFrames.
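As a small sketch of why the repeated index labels can matter, the snippet below reuses the sample df1 and df2 and looks up label 0 in both results. The variable names stacked and renumbered are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})

stacked = pd.concat([df1, df2])                        # index: 0, 1, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)  # index: 0, 1, 2, 3

# With duplicate labels, .loc[0] matches two rows (Alice and Charlie)
print(stacked.loc[0])

# With a fresh index, .loc[0] matches exactly one row (Alice)
print(renumbered.loc[0])
```

If later code selects rows by label, resetting the index at concatenation time is usually the safer choice.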
Combining DataFrames is essential when you need to merge data from different sources or files. In Pandas, there are two main ways to combine DataFrames:
- concat(): Used for concatenating DataFrames along rows or columns.
- merge(): Similar to SQL joins, this function allows merging based on common columns or indices.
Both methods are highly flexible and allow for various types of joins and concatenations.
1. Using concat() for Combining DataFrames
The concat() function is used to concatenate two or more DataFrames along a specified axis (either rows or columns). It can be thought of as stacking the DataFrames either vertically or horizontally.
- axis=0: Stacks the DataFrames row-wise (default behavior).
- axis=1: Stacks the DataFrames column-wise.
- ignore_index=True: Replaces the labels along the concatenation axis with 0, 1, 2, ... in the result.
We have already seen an example of combining two DataFrames along the rows. Now let’s see an example using pd.concat() to stack along the columns by setting axis=1. This will place the columns from df2 to the right of the columns from df1, aligning rows by their index.
import pandas as pd
# Create two sample DataFrames
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'City': ['New York', 'Los Angeles'], 'Salary': [70000, 80000]})
# Combine using concat() along columns
combined_df = pd.concat([df1, df2], axis=1, ignore_index=False)
print(combined_df)
Output:
Name Age City Salary
0 Alice 25 New York 70000
1 Bob 30 Los Angeles 80000
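In this case, rows are aligned by their index labels, so each row from df1 pairs with the row from df2 that has the same index. With axis=1, ignore_index=False (the default) keeps the original column names; setting it to True would relabel the columns 0, 1, 2, 3 instead.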
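The alignment by index is easier to see when the two indexes do not match. The following is a minimal sketch with made-up frames (left, right, and their index values are assumptions for illustration), using the join parameter of pd.concat():

```python
import pandas as pd

# Two frames whose indexes only partly overlap (label 1 is shared)
left = pd.DataFrame({'Name': ['Alice', 'Bob']}, index=[0, 1])
right = pd.DataFrame({'City': ['New York', 'Los Angeles']}, index=[1, 2])

# Default join='outer': keep every index label, fill the gaps with NaN
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side)

# join='inner': keep only the index labels present in both frames
overlap = pd.concat([left, right], axis=1, join='inner')
print(overlap)
```

So column-wise concat() pairs rows by label, not by position. If the frames should be paired by position, resetting their indexes first (for example with reset_index(drop=True)) is one option.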
2. Using merge()
The merge() function is similar to SQL joins and is used when you want to combine DataFrames based on a common column or index. This method is more flexible than concat() because it allows for different types of joins like inner join, outer join, left join, and right join.
Syntax: pd.merge(left, right, how='inner', on=None)
where,
- how: Specifies the type of join ('inner', 'outer', 'left', 'right').
- on: Specifies the column(s) to join on.
import pandas as pd
# Merging on a common column
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})
merged_df = pd.merge(df1, df2, on='Name')
print(merged_df)
Output:
Name Age Salary
0 Alice 25 50000
1 Bob 30 60000
In this case, we merged two DataFrames using the common column 'Name'. The result contains only rows where 'Name' exists in both DataFrames (an inner join).
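The key does not have to be a column with the same name on both sides. As a rough sketch, with hypothetical frames people and pay, merge() can also match differently named columns via left_on/right_on, or match on the index via left_index/right_index:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
pay = pd.DataFrame({'Employee': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

# Key columns have different names in each frame; both are kept in the result
by_column = pd.merge(people, pay, left_on='Name', right_on='Employee')
print(by_column)

# Join on the index instead of on a column
by_index = pd.merge(
    people.set_index('Name'),
    pay.set_index('Employee'),
    left_index=True,
    right_index=True,
)
print(by_index)
```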
Types of Joins in merge():
- Inner Join (default): Returns only matching rows.
- Outer Join: Returns all rows from both DataFrames; missing values are filled with NaN.
- Left Join: Returns all rows from the left DataFrame and matching rows from the right.
- Right Join: Returns all rows from the right DataFrame and matching rows from the left.
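The four join types are easiest to compare on data where some keys have no match. The sketch below uses two made-up frames (left and right are illustrative names) that share only 'Bob' and 'Charlie':

```python
import pandas as pd

left = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
right = pd.DataFrame({'Name': ['Bob', 'Charlie', 'David'],
                      'Salary': [60000, 70000, 80000]})

row_counts = {}
for how in ['inner', 'outer', 'left', 'right']:
    result = pd.merge(left, right, on='Name', how=how)
    row_counts[how] = len(result)
    print(f"how='{how}'")
    print(result, end='\n\n')

# inner: 2 rows, outer: 4 rows, left: 3 rows, right: 3 rows
print(row_counts)
```

Rows without a partner on the other side show NaN in the columns that come from that side.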
Refer to Joining two Dataframe Pandas using merge() for in-depth understanding and practical examples of each type of join.
Key Differences Between concat() and merge()
Feature | concat() | merge() |
---|---|---|
Purpose | Stack/concatenate along an axis | SQL-style joining based on columns |
Axis | Can concatenate along rows or columns | Joins based on common columns or index |
Join Types | Inner or outer only, on the other axis (join parameter) | Supports inner, outer, left, right joins |
Flexibility | Simple concatenation | More complex merging with conditions |
- concat() is best for stacking or appending DataFrames either row-wise or column-wise.
- merge() is ideal for joining datasets based on shared columns or indices.
- Both are customizable: concat() through parameters like axis, and merge() through parameters like how and on.
FAQs on How to Combine Two Dataframes in Pandas
How to combine multiple DataFrames into one in Python?
You can use pd.concat() or pd.merge() to combine multiple DataFrames. For example, concat() stacks them vertically or horizontally, while merge() joins them based on common columns.
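For more than two DataFrames, one possible approach is sketched below. pd.concat() takes a list of any length, while pd.merge() works on two frames at a time, so here it is chained with functools.reduce. The frame names are made up for illustration.

```python
from functools import reduce

import pandas as pd

ages = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
cities = pd.DataFrame({'Name': ['Alice', 'Bob'], 'City': ['New York', 'Los Angeles']})
salaries = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})
more_ages = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})
even_more_ages = pd.DataFrame({'Name': ['David'], 'Age': [40]})

# concat() accepts a list with any number of DataFrames
stacked = pd.concat([ages, more_ages, even_more_ages], ignore_index=True)
print(stacked)

# merge() is pairwise, so fold it over the list on the shared 'Name' column
joined = reduce(lambda l, r: pd.merge(l, r, on='Name'), [ages, cities, salaries])
print(joined)
```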
How do I add a DataFrame to a DataFrame in pandas?
You can add a DataFrame to another using pd.concat(), which appends rows or columns. The append() method was also used for this purpose, but it was deprecated in pandas 1.4 and removed in pandas 2.0.
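As a minimal sketch of moving older append()-based code to pd.concat() (the frame new_rows is a made-up example):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
new_rows = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})

# Older code: df = df.append(new_rows, ignore_index=True)
# Equivalent with concat():
df = pd.concat([df, new_rows], ignore_index=True)
print(df)
```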
How to combine two datasets in Python?
Use pd.merge() for SQL-style joins on common columns or pd.concat() for stacking datasets vertically or horizontally, depending on your data structure and requirements.
How to combine two DataFrames in pandas vertically?
Use pd.concat([df1, df2], axis=0) to stack two DataFrames vertically, aligning the columns and appending rows from the second DataFrame.
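Aligning the columns means that columns are matched by name. A short sketch, assuming two small frames whose columns only partly overlap:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice'], 'Age': [25]})
df2 = pd.DataFrame({'Name': ['Bob'], 'City': ['Los Angeles']})

# Columns are matched by name; a column missing from one frame is filled with NaN
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)
print(stacked)
```

Here 'Name' is shared, while 'Age' and 'City' each hold NaN for the row that came from the frame without that column.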