How to Use Python Pandas
to manipulate and analyze data efficiently
Pandas is a Python library for working with data sets. It includes functions for analyzing, cleaning, exploring, and modifying data. In this article, we will see how to use Python Pandas with the help of examples.
What is Python Pandas?
Pandas is a Python library created to analyze and manipulate many kinds of data, including time series and tabular data. It can process data sets in a variety of formats, including relational database tables, Excel files, XML files, comma-separated values (CSV) files, and JavaScript Object Notation (JSON) files.
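As a brief illustration of loading data from two of these formats, the sketch below reads a small CSV string and a small JSON string with pd.read_csv() and pd.read_json(). The sample data and variable names are made up for this example, and io.StringIO stands in for real files on disk.

```python
import io
import pandas as pd

# Hypothetical sample data; in practice these would usually be file paths
csv_text = "Name,Age\nAsha,25\nVikram,30\n"
json_text = '[{"Name": "Asha", "Age": 25}, {"Name": "Vikram", "Age": 30}]'

# read_csv and read_json accept file paths or file-like objects
csv_df = pd.read_csv(io.StringIO(csv_text))
json_df = pd.read_json(io.StringIO(json_text))

print(csv_df)
print(json_df)
```

Both calls return a DataFrame, so the rest of the analysis does not depend on which format the data came from.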
Wes McKinney began developing Pandas in 2008, and it was released as an open-source project in 2009 so that anybody could contribute to its development. McKinney built Pandas on top of NumPy, another Python library that provides features such as n-dimensional arrays.
Uses of Python Pandas
Below are some common uses of the Pandas library.
- Pandas finds applications across various domains of data analysis, from scientific research to the financial sector.
- It excels in organizing and transforming data into formats suitable for analysis, enhancing data analytics in diverse contexts.
- Pandas offers a comprehensive suite of functions for data manipulation, including grouping, cleaning, merging, sorting, and visualization.
- Additionally, it provides tools for computing descriptive statistics such as the mean, standard deviation, and quartiles, and it integrates with other Python libraries such as SciPy for inferential statistics, for example paired-sample t-tests and ANOVA.
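A few of the operations listed above can be sketched in a handful of lines. The sales data and the variable names below are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical sales records used only for illustration
sales = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'South'],
    'Amount': [100, 150, 200, 50],
})

# Grouping: total amount per region
totals = sales.groupby('Region')['Amount'].sum()

# Sorting: largest sale first
ranked = sales.sort_values('Amount', ascending=False)

# Descriptive statistics: count, mean, std, min, quartiles, max
summary = sales['Amount'].describe()

print(totals)
print(ranked)
print(summary)
```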
How to Use Python Pandas?
Using Python Pandas requires multiple steps to manipulate and analyze data efficiently. Here's a simple guide for using Pandas:
Install Pandas Library
Before using Pandas in our code, we need to install it on our system. To install the Pandas library, use the command below.
pip install pandas
Import Pandas Library to Python
If we want to use the pandas library's functions, we first need to import it into Python. We can achieve that using the Python syntax shown below:
import pandas as pd
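A quick way to confirm that the installation and import succeeded is to print the library version. The exact version string depends on your environment.

```python
import pandas as pd

# Print the installed version to confirm the import works
print(pd.__version__)
```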
Create DataFrame with Pandas Library in Python
The pandas library's ability to generate new DataFrame objects is a very important feature. For this, we can use the pd.DataFrame() function, as seen below:
import pandas as pd
# Define data as a dictionary
data = {
    'Name': ['Sangita', 'Rohan', 'Max'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}
# Create DataFrame
df = pd.DataFrame(data)
print(df)
Output
      Name  Age  Gender
0  Sangita   25  Female
1    Rohan   30    Male
2      Max   35    Male
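Once a DataFrame exists, it is often useful to inspect its structure before working with it. The following is a minimal sketch using the same sample data; shape, columns, dtypes, and head() are standard DataFrame attributes and methods.

```python
import pandas as pd

data = {
    'Name': ['Sangita', 'Rohan', 'Max'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}
df = pd.DataFrame(data)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
print(df.dtypes)         # data type of each column
print(df.head(2))        # first two rows
```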
Python Pandas Examples
Below are some examples that show how to use Python Pandas to add and remove rows and columns in a DataFrame:
Example 1: Add New Column to Pandas DataFrame
In this example, we import the Pandas library and create a DataFrame from dictionary data with columns for 'Name', 'Age', and 'Gender'. To add a new 'Location' column, assign a list of values to df['Location'], ensuring its length matches the DataFrame's rows. Finally, we print the DataFrame to observe the new 'Location' column.
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Rahul', 'Mahi', 'Ram'],
    'Age': [25, 30, 35],
    'Gender': ['Male', 'Female', 'Male']
}
df = pd.DataFrame(data)
# Add a new column 'Location'
df['Location'] = ['Delhi', 'Bangalore', 'Noida']
print(df)
Output
    Name  Age  Gender   Location
0  Rahul   25    Male      Delhi
1   Mahi   30  Female  Bangalore
2    Ram   35    Male      Noida
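A new column can also be computed from an existing one instead of being typed in as a list. The sketch below shows two ways to do this; the column names 'Is30Plus' and 'AgeNextYear' are hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rahul', 'Mahi', 'Ram'],
    'Age': [25, 30, 35]
})

# Hypothetical derived column: True where Age is 30 or more
df['Is30Plus'] = df['Age'] >= 30

# assign() returns a new DataFrame and leaves df unchanged
df2 = df.assign(AgeNextYear=df['Age'] + 1)

print(df2)
```

Direct assignment modifies df itself, while assign() is convenient when the original DataFrame should stay as it is.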
Example 2: Remove Column From Pandas DataFrame
In this example, we create a DataFrame df from dictionary data containing columns for 'Name', 'Age', and 'Gender'. To remove the 'Gender' column, we call the drop() method with columns=['Gender'] and inplace=True, which modifies the DataFrame in place instead of returning a new one. Finally, we print the DataFrame to observe the changes after removing the 'Gender' column.
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Rahul', 'Riya', 'Rohit'],
    'Age': [25, 30, 35],
    'Gender': ['Male', 'Female', 'Male']
}
df = pd.DataFrame(data)
# Remove the 'Gender' column
df.drop(columns=['Gender'], inplace=True)
print(df)
Output
    Name  Age
0  Rahul   25
1   Riya   30
2  Rohit   35
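If the original DataFrame should be kept, drop() can be called without inplace=True, in which case it returns a new DataFrame. The sketch below illustrates this; the variable name trimmed is an arbitrary choice for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rahul', 'Riya', 'Rohit'],
    'Age': [25, 30, 35],
    'Gender': ['Male', 'Female', 'Male']
})

# Without inplace=True, drop() returns a new DataFrame
trimmed = df.drop(columns=['Gender'])

print(list(df.columns))       # original still has 'Gender'
print(list(trimmed.columns))  # the returned copy does not
```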
Example 3: Add New Row to Pandas DataFrame
In this example, we add a new row to the bottom of a DataFrame. We begin by creating a DataFrame df with columns for 'Name', 'Age', and 'Gender' using dictionary data. To add a new row, we put its values in a dictionary and wrap that dictionary in a one-row DataFrame called new_row. We then use the pd.concat() function to append the new row to the DataFrame, specifying ignore_index=True so that the result is renumbered from 0.
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Rahul', 'Raksha', 'Mohit'],
    'Age': [25, 30, 35],
    'Gender': ['Male', 'Female', 'Male']
}
df = pd.DataFrame(data)
# Define data for the new row
new_row = pd.DataFrame([{'Name': 'Sakshi', 'Age': 28, 'Gender': 'Female'}])
# Append the new row to the DataFrame using pd.concat
df = pd.concat([df, new_row], ignore_index=True)
print(df)
Output
     Name  Age  Gender
0   Rahul   25    Male
1  Raksha   30  Female
2   Mohit   35    Male
3  Sakshi   28  Female
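For a single row, an alternative is to assign to a new index label with df.loc. This is a minimal sketch that assumes the DataFrame still has its default 0 to n-1 index, so that len(df) is the next unused label.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Rahul', 'Raksha', 'Mohit'],
    'Age': [25, 30, 35],
    'Gender': ['Male', 'Female', 'Male']
})

# Assumes the default 0..n-1 index, so len(df) is the next free label
df.loc[len(df)] = ['Sakshi', 28, 'Female']

print(df)
```

When many rows need to be added, collecting them first and calling pd.concat() once is generally preferable to adding rows one at a time.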
Example 4: Remove Row from Pandas DataFrame
This example demonstrates how to delete a row from a Pandas DataFrame in Python. We start by creating a DataFrame df with columns for 'Name', 'Age', and 'Gender' using dictionary data. To remove a row based on a condition, we use boolean indexing. In this case, df['Name'] != 'Mohit' selects all rows where the 'Name' column is not 'Mohit', which effectively removes the entry with the name 'Mohit' from the DataFrame. The remaining rows keep their original index labels.
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['Mohit', 'Sonal', 'Rishav'],
    'Age': [25, 30, 35],
    'Gender': ['Male', 'Female', 'Male']
}
df = pd.DataFrame(data)
# Remove the row where Name is 'Mohit'
df = df[df['Name'] != 'Mohit']
print(df)
Output
     Name  Age  Gender
1   Sonal   30  Female
2  Rishav   35    Male
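A row can also be removed by its index label rather than by a condition. The sketch below drops the row labelled 0 and then calls reset_index(drop=True) so that the remaining rows are renumbered from 0; whether to renumber is a design choice that depends on whether the original labels carry meaning.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Mohit', 'Sonal', 'Rishav'],
    'Age': [25, 30, 35],
    'Gender': ['Male', 'Female', 'Male']
})

# Drop the row labelled 0, then renumber the remaining rows from 0
df = df.drop(index=0).reset_index(drop=True)

print(df)
```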