How to add a column from another DataFrame in Pandas?
In this article, we explore how to add a column from another DataFrame in Pandas. Pandas is a data manipulation library for Python that offers versatile tools for handling and analyzing structured data.
Add column from another DataFrame in Pandas
There are several ways to add a column from another DataFrame in Pandas. The commonly used methods covered here are:
- Using join() Method
- Using insert() Method
- Using assign() Method
- Using merge() Method
Add column from another DataFrame using join()
In this approach, the column to be added is first extracted from the first DataFrame by name and assigned to a variable.
Syntax: dataframe1["name_of_the_column"]
After extraction, the column is added to the second DataFrame with the join() method.
Syntax: dataframe2.join(variable_name)
join() is called on the DataFrame that should receive the column, and the variable holding the extracted column is passed as the argument. It returns a new DataFrame in which the column is appended at the end, under the same name it had in the first DataFrame. Rows are matched by index label, so both DataFrames should share the same index.
Python3
import pandas as pd

# Create the first DataFrame
df1 = pd.DataFrame({
    "Col1": [1, 2, 3],
    "Col2": ["A", "B", "C"],
    "Col3": ["geeks", "for", "geeks"],
})

# Display the first DataFrame
print("First DataFrame:")
print(df1)

# Create the second DataFrame
df2 = pd.DataFrame({
    "C1": [4, 5, 6],
    "C2": ["D", "E", "F"],
})

# Display the second DataFrame
print("Second DataFrame:")
print(df2)

# Extract a column from the first DataFrame
extracted_col = df1["Col3"]

# Display the column to be added from the first DataFrame to the second DataFrame
print("Column to be added from the first DataFrame to the second DataFrame:")
print(extracted_col)

# Add the extracted column to the second DataFrame with join();
# join() returns a new DataFrame, so the result is assigned back to df2
df2 = df2.join(extracted_col)

# Display the second DataFrame after adding the column from the first DataFrame
print("Second DataFrame after adding the column from the first DataFrame:")
print(df2)
Output:
First DataFrame:
   Col1 Col2   Col3
0     1    A  geeks
1     2    B    for
2     3    C  geeks
Second DataFrame:
   C1 C2
0   4  D
1   5  E
2   6  F
Column to be added from the first DataFrame to the second DataFrame:
0    geeks
1      for
2    geeks
Name: Col3, dtype: object
Second DataFrame after adding the column from the first DataFrame:
   C1 C2   Col3
0   4  D  geeks
1   5  E    for
2   6  F  geeks
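Because join() matches rows by index label rather than by position, the result depends on how the two DataFrames are indexed. The following is a minimal sketch of that behaviour, using a hypothetical df1 whose index (10, 11, 12) was chosen only to differ from df2's default index; it also shows that pd.concat with axis=1 aligns in the same way.

```python
import pandas as pd

# Hypothetical data: df1 deliberately uses index labels that df2 does not have
df1 = pd.DataFrame({"Col3": ["geeks", "for", "geeks"]}, index=[10, 11, 12])
df2 = pd.DataFrame({"C1": [4, 5, 6], "C2": ["D", "E", "F"]})

# join() matches rows by index label; no labels are shared here,
# so every value in the new column is missing
misaligned = df2.join(df1["Col3"])

# Resetting the index of the extracted column restores positional matching
aligned = df2.join(df1["Col3"].reset_index(drop=True))

# pd.concat along axis=1 also aligns on the index
via_concat = pd.concat([df2, df1["Col3"].reset_index(drop=True)], axis=1)

print(misaligned)
print(aligned)
print(via_concat)
```

If the row order of the two DataFrames is what matters, resetting the index of the extracted column (as above) is one way to make the match positional.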
Add column from another DataFrame using insert()
The approach is the same as above: the column is first extracted and assigned to a variable, then added to the other DataFrame. The difference is that insert() lets you place the column at any position and give it a different name if needed. It modifies the DataFrame in place.
Syntax: dataframe2.insert(location, "new_name", extracted_column)
Here, location is the integer index at which the column should be inserted, new_name is the name the column will have in the second DataFrame, and extracted_column is the column taken from the first DataFrame.
Python3
import pandas as pd

# Create the first DataFrame
df1 = pd.DataFrame({
    "Col1": [1, 2, 3],
    "Col2": ["A", "B", "C"],
    "Col3": ["geeks", "for", "geeks"],
})

# Display the first DataFrame
print("First DataFrame:")
print(df1)

# Create the second DataFrame
df2 = pd.DataFrame({
    "C1": [4, 5, 6],
    "C2": ["D", "E", "F"],
})

# Display the second DataFrame
print("Second DataFrame:")
print(df2)

# Extract a column from the first DataFrame
extracted_col = df1["Col3"]

# Display the column to be added from the first DataFrame to the second DataFrame
print("Column to be added from the first DataFrame to the second DataFrame:")
print(extracted_col)

# Add the extracted column to the second DataFrame at position 1 with the name "C3"
# (insert() modifies df2 in place and returns None)
df2.insert(1, "C3", extracted_col)

# Display the second DataFrame after adding the column from the first DataFrame
print("Second DataFrame after adding the column from the first DataFrame:")
print(df2)
Output:
First DataFrame:
   Col1 Col2   Col3
0     1    A  geeks
1     2    B    for
2     3    C  geeks
Second DataFrame:
   C1 C2
0   4  D
1   5  E
2   6  F
Column to be added from the first DataFrame to the second DataFrame:
0    geeks
1      for
2    geeks
Name: Col3, dtype: object
Second DataFrame after adding the column from the first DataFrame:
   C1     C3 C2
0   4  geeks  D
1   5    for  E
2   6  geeks  F
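Two details of insert() are easy to miss: the location can be computed rather than hard-coded, and inserting a column name that already exists is rejected by default. The sketch below illustrates both with the same sample data; the variable duplicate_rejected is introduced here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Col3": ["geeks", "for", "geeks"]})
df2 = pd.DataFrame({"C1": [4, 5, 6], "C2": ["D", "E", "F"]})

# Using len(df2.columns) as the location appends the column at the end
df2.insert(len(df2.columns), "C3", df1["Col3"])

# Inserting a name that already exists raises ValueError
# unless allow_duplicates=True is passed
try:
    df2.insert(0, "C3", df1["Col3"])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(df2.columns.tolist())  # ['C1', 'C2', 'C3']
print(duplicate_rejected)    # True
```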
Add column from another DataFrame using assign()
The assign() method returns a new DataFrame with an additional column, leaving the original unchanged. The column is first extracted from the other DataFrame and then passed to assign() as a keyword argument, where the keyword becomes the name of the new column. The new column is always appended at the end; unlike insert(), assign() does not let you choose its position.
Syntax: dataframe1.assign(new_name=extracted_column)
Here, new_name is the name of the new column and extracted_column is the column taken from the other DataFrame. Passing the underlying values (extracted_column.values) copies them by position instead of matching on index labels.
Python3
import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'Name': ['Geek1', 'Geek2', 'Geek3']})
df2 = pd.DataFrame({'Salary': [50000, 60000, 70000]})

# Extract the 'Salary' column from df2
extracted_column = df2['Salary']

# Add the extracted column to df1 under a new name;
# assign() returns a new DataFrame with the column appended at the end
result = df1.assign(LocationSpecificColumn=extracted_column.values)

print(result)
Output:
   ID   Name  LocationSpecificColumn
0   1  Geek1                   50000
1   2  Geek2                   60000
2   3  Geek3                   70000
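Whether assign() receives a Series or a plain array changes how the values are matched to rows. The sketch below assumes a df2 whose index has been reversed purely to make the difference visible, and uses the hypothetical names by_label and by_position.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Geek1", "Geek2", "Geek3"]})
# Hypothetical data: df2's index is reversed relative to df1's
df2 = pd.DataFrame({"Salary": [50000, 60000, 70000]}, index=[2, 1, 0])

# Passing the Series itself: values are matched by index label,
# so row 0 of df1 receives the value stored under label 0 in df2
by_label = df1.assign(Salary=df2["Salary"])

# Passing the underlying array: values are matched by position
by_position = df1.assign(Salary=df2["Salary"].to_numpy())

print(by_label["Salary"].tolist())     # [70000, 60000, 50000]
print(by_position["Salary"].tolist())  # [50000, 60000, 70000]
print("Salary" in df1.columns)         # False, because assign() returned a new DataFrame
```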
Add column from another DataFrame using merge()
The merge approach combines two DataFrames on a common column. Rather than extracting a single column, you pass the second DataFrame (or just its key column and the column to be added) to merge(), and rows are matched on the values of the common column instead of on position or index. This is the appropriate choice when the two DataFrames share a key but are not guaranteed to be in the same row order.
Syntax: pd.merge(left_dataframe, right_dataframe, on="common_column", how="merge_type")
This call combines left_dataframe and right_dataframe on the column common_column. The how argument sets the merge type, such as 'left', 'right', 'outer', or 'inner' (the default).
Python3
import pandas as pd

# Create two DataFrames that share the 'ID' column
df1 = pd.DataFrame({'ID': [1, 2, 3],
                    'Name': ['Geek1', 'Geek2', 'Geek3']})
df2 = pd.DataFrame({'ID': [1, 2, 3],
                    'Age': [25, 30, 35]})

# Keep only the key and the column to be added from df2
columns_to_add = df2[['ID', 'Age']]

# Merge on the 'ID' column so that 'Age' is added to df1
result = pd.merge(df1, columns_to_add, on='ID')

print(result)
Output:
   ID   Name  Age
0   1  Geek1   25
1   2  Geek2   30
2   3  Geek3   35
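The how argument matters once the keys in the two DataFrames no longer match exactly. As a minimal sketch, assume a df2 that is missing ID 2 and lists its rows in a different order; with how="left", every row of df1 is kept and the unmatched row gets a missing value.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Geek1", "Geek2", "Geek3"]})
# Hypothetical data: ID 2 is absent and the rows are in a different order
df2 = pd.DataFrame({"ID": [3, 1], "Age": [35, 25]})

# how="left" keeps every row of df1; IDs missing from df2 get NaN in 'Age'
result = pd.merge(df1, df2[["ID", "Age"]], on="ID", how="left")

print(result)
```

With the default how="inner", the row for ID 2 would be dropped instead, so a left merge is usually the safer choice when the goal is only to add a column to df1.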