Python - Basics of Pandas Using Iris Dataset - GeeksforGeeks


Python is one of the most popular programming languages, as it is more dynamic than many
others. Python is a simple, high-level, open-source language used for general-purpose
programming. It has many open-source libraries, and Pandas is one of them.
Pandas is a powerful, fast, flexible open-source library used for data analysis and
manipulation of data frames/datasets. Pandas can read and write datasets in different
formats such as CSV (comma-separated values), txt, xls (Microsoft Excel), etc.
In this post, you will learn about various features of Pandas in Python and how to use them
in practice.
Prerequisites: Basic knowledge of coding in Python.
Installation:
If you are new to Pandas, first install it on your system.
Open Command Prompt and run it as administrator. Make sure you are connected to the
internet so that the package can be downloaded and installed.
Then type “pip install pandas” and press the Enter key.
Download the Dataset “Iris.csv” from here

The Iris dataset is the “Hello World” of data science, so if you have started your career in
data science and machine learning, you will be practicing basic ML algorithms on this
famous dataset. The Iris dataset contains five data columns: sepal length, sepal width,
petal length, petal width and species type. The Iris.csv file used here also carries an Id
column, and the code below refers to columns by names such as Id, SepalLengthCm and Species.
Iris is a flowering plant; researchers measured various features of different iris flowers
and recorded them digitally.
Getting Started with Pandas:

Code: Importing pandas to use in our code as pd.
Python3
import pandas as pd
Code: Reading the dataset “Iris.csv”.
Python3
# Replace the placeholder with the path to your downloaded Iris.csv.
data = pd.read_csv("your downloaded dataset location")
Code: Displaying the top rows of the dataset with their columns.
The head() function displays the top rows of the dataset. Its default argument is 5, so it
shows the top 5 rows when no argument is given.
Python3
data.head()
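To try read_csv() and head() without a downloaded file, the following is a minimal sketch that reads CSV text from memory. The three sample rows and the petal column names are assumptions modelled on the Iris.csv layout used in this post, not the real file.

```python
import io
import pandas as pd

# Hypothetical sample rows laid out like Iris.csv (not the real file).
csv_text = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,7.0,3.2,4.7,1.4,Iris-versicolor
3,6.3,3.3,6.0,2.5,Iris-virginica
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows; with no argument it returns up to 5.
first_two = sample.head(2)
print(first_two)
```

Because the frame has only three rows, sample.head() with no argument returns all three.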
Code: Displaying rows at random.

The sample() function also displays as many rows as its argument specifies, but it picks
them at random.
Python3
data.sample(10)
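Because sample() picks rows at random, two runs usually show different rows. A minimal sketch, on a small made-up frame rather than the Iris data, of how the random_state argument makes the selection repeatable:

```python
import pandas as pd

# Small made-up frame standing in for the Iris data.
frame = pd.DataFrame({"Id": range(1, 11), "value": range(10, 110, 10)})

# sample(n) returns n randomly chosen rows without replacement.
picked = frame.sample(3, random_state=0)

# The same random_state gives the same rows again.
picked_again = frame.sample(3, random_state=0)
print(picked.equals(picked_again))
```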
Code: Displaying the number of columns and names of the columns.

The columns attribute returns the labels of all the columns of the dataset as an Index; its
length gives the number of columns.
Python3
data.columns
Code: Displaying the shape of the dataset.

The shape attribute returns a tuple holding the total number of rows (entries) followed by
the total number of columns (features) of the dataset.
Python3
# The first value is the number of rows and
# the second is the number of columns.
data.shape
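Both columns and shape are attributes, so they are read without parentheses. A minimal sketch on a made-up three-row frame (the values are illustrative, not taken from Iris.csv):

```python
import pandas as pd

frame = pd.DataFrame({
    "Id": [1, 2, 3],
    "SepalLengthCm": [5.1, 7.0, 6.3],
    "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# columns is an attribute (an Index of labels), not a method.
column_names = list(frame.columns)

# shape is a (rows, columns) tuple and unpacks directly.
n_rows, n_cols = frame.shape
print(column_names, n_rows, n_cols)
```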
Code: Display the whole dataset
Python3
print(data)
Code: Slicing the rows.

Slicing lets you print or work on a particular group of rows, for example the rows at
positions 10 to 20.
Python3
# data[start:end]
# start is inclusive whereas end is exclusive
print(data[10:21])
# it will print the rows at positions 10 to 20.
# you can also save it in a variable for further use in analysis

sliced_data=data[10:21]
print(sliced_data)
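The inclusive start and exclusive end can be checked on a small frame. This is a minimal sketch with a made-up 30-row frame; iloc is shown as an equivalent, more explicit spelling of the same positional slice.

```python
import pandas as pd

frame = pd.DataFrame({"Id": range(1, 31)})  # made-up 30-row frame

# Positions 10 through 20: the start is included, the end (21) is not.
sliced = frame[10:21]

# iloc spells out the same positional slice explicitly.
same_rows = frame.iloc[10:21]

print(len(sliced), sliced["Id"].iloc[0], sliced["Id"].iloc[-1])
```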
Code: Displaying only specific columns.

Often you need to work on only specific features or columns of a dataset, which can be done
with the following code.
Python3
# data[["column_name1","column_name2","column_name3"]]
# For the Iris dataset, save the selected columns in another
# variable named "specific_data".
specific_data=data[["Id","Species"]]
# Now print the first 10 rows of the specific_data dataframe.

print(specific_data.head(10))
Filtering: Displaying specific rows using “loc” and “iloc”.
The “loc” indexer selects rows by index label, or by a boolean condition on the rows.
The “iloc” indexer selects rows by integer position and returns the complete record for
that row.
Code:
Python3
#here we will use iloc
data.iloc[5]
#it will display records only with species "Iris-setosa".
data.loc[data["Species"] == "Iris-setosa"]
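With the default integer index, labels and positions coincide, which hides the difference between the two indexers. A minimal sketch, assuming a made-up frame with string index labels, that makes the difference visible:

```python
import pandas as pd

# Made-up frame with string index labels so labels and positions differ.
frame = pd.DataFrame(
    {"Species": ["Iris-setosa", "Iris-virginica", "Iris-setosa"],
     "SepalLengthCm": [5.1, 6.3, 4.9]},
    index=["a", "b", "c"],
)

by_position = frame.iloc[1]      # second row, whatever its label
by_label = frame.loc["b"]        # row whose index label is "b"

# loc also accepts a boolean mask to filter rows by a condition.
setosa = frame.loc[frame["Species"] == "Iris-setosa"]
print(setosa)
```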
Code: Counting the occurrences of unique values using “value_counts()”.

The value_counts() function counts the number of times each distinct value occurs.
Python3
# Here we work on the Species column and count how often each species occurs.
data["Species"].value_counts()
#it will display in descending order.
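A minimal sketch of value_counts() on a made-up Series of species labels. The normalize=True variant, which returns fractions instead of counts, is an addition not covered in the text above.

```python
import pandas as pd

species = pd.Series(
    ["Iris-setosa", "Iris-virginica", "Iris-setosa", "Iris-setosa", "Iris-virginica"]
)

counts = species.value_counts()                 # sorted, most frequent first
shares = species.value_counts(normalize=True)   # fractions instead of counts
print(counts)
```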
Code: Calculating the sum, mean and median of a particular column.

We can also calculate the sum, mean and median of any numeric column, as in the following
code.
Python3
# data["column_name"].sum()
sum_data = data["SepalLengthCm"].sum()
mean_data = data["SepalLengthCm"].mean()
median_data = data["SepalLengthCm"].median()
print("Sum:",sum_data, "\nMean:", mean_data, "\nMedian:",median_data)
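pandas also offers mode() for the most frequent value. A minimal sketch on made-up measurements; note that mode() returns a Series rather than a single number, because several values can tie.

```python
import pandas as pd

lengths = pd.Series([5.1, 4.9, 5.1, 6.3, 4.9, 5.8])  # made-up measurements

modes = lengths.mode()       # all most-frequent values, sorted ascending
median = lengths.median()    # middle value of the sorted data
print(modes.tolist(), median)
```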
Code: Extracting minimum and maximum from a column.

The minimum and maximum values of a particular column or row can also be identified in a
dataset.
Python3
min_data=data["SepalLengthCm"].min()
max_data=data["SepalLengthCm"].max()
print("Minimum:",min_data, "\nMaximum:", max_data)
Code: Adding a column to the dataset.

You may want to add a new column to the dataset to store the result of a calculation or
some information extracted from it. The following code does this for a case where the
numeric values of each row are added together.
Python3
# For example, add a column "total_values" that holds the sum of
# the numeric values of each row.
cols = data.columns
# Print the column names.
print(cols)
# Keep the columns at positions 1 to 4, which hold the numeric measurements.
cols = cols[1:5]
# Save those columns in a new dataframe variable.
data1 = data[cols]
# Add the new column "total_values" to the dataframe data.
data["total_values"]=data1[cols].sum(axis=1)
# here axis=1 means you are working in rows,

# whereas axis=0 means you are working in columns.
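The effect of the axis argument is easy to verify on a tiny frame. A minimal sketch with made-up values; selecting columns by dtype with select_dtypes() is an alternative to positional slicing and is an assumption of this example, not the approach used above.

```python
import pandas as pd

frame = pd.DataFrame({
    "Id": [1, 2],
    "SepalLengthCm": [5.0, 6.0],
    "SepalWidthCm": [3.0, 2.5],
    "Species": ["Iris-setosa", "Iris-virginica"],
})

# Pick the numeric measurement columns by dtype, leaving out Id by name.
numeric_cols = frame.select_dtypes(include="number").columns.drop("Id")

# axis=1 adds the values across the columns of each row.
frame["total_values"] = frame[numeric_cols].sum(axis=1)

# axis=0 (the default) adds the values down each column instead.
column_totals = frame[numeric_cols].sum(axis=0)
print(frame["total_values"].tolist())
```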
Code: Renaming the columns.

Columns can also be renamed in pandas. Here the rename() function is given a dictionary
“newcols” that maps the old column names to the new ones. The following code illustrates
that.
Python3
newcols={
"Id":"id",
"SepalLengthCm":"sepallength",
"SepalWidthCm":"sepalwidth"}
data.rename(columns=newcols,inplace=True)
print(data.head())
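If you prefer not to modify the dataframe in place, rename() can return a renamed copy instead. A minimal sketch on a made-up one-row frame:

```python
import pandas as pd

frame = pd.DataFrame({"Id": [1], "SepalLengthCm": [5.1]})

# Without inplace=True, rename returns a new DataFrame and leaves the original alone.
renamed = frame.rename(columns={"Id": "id", "SepalLengthCm": "sepallength"})
print(list(frame.columns), list(renamed.columns))
```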
Formatting and Styling:

Conditional formatting can be applied to your dataframe through the DataFrame.style
property. Styling helps to visualize your data, and the most convenient way of visualizing
a dataset is in tabular form.
Here we will highlight the maximum of each row and column.
Python3
# this is an example of rendering a dataframe
# that has no styles applied yet.
data.style
Now we will highlight the maximum column-wise, row-wise, and across the whole dataframe
using Styler.highlight_max. Like Styler.apply, it processes the dataframe column by column
or row by row depending on the keyword argument axis. For column-wise use axis=0, row-wise
use axis=1, and for the entire table at once use axis=None.
Python3
# Only the top 10 rows of the dataset are styled here;
# to see the result for the whole dataset, remove
# .head(10) from the code below.
data.head(10).style.highlight_max(color='lightgreen', axis=0)
data.head(10).style.highlight_max(color='lightgreen', axis=1)
data.head(10).style.highlight_max(color='lightgreen', axis=None)
Code: Cleaning and detecting missing values

In this dataset, we will now try to find the missing values, i.e. NaN, which can occur for
several reasons.
Python3
data.isnull()
# Each cell shows True if the value is missing, else False.
Code: Summarizing the missing values.

We will display how many missing values are present in each column.
Python3
data.isnull().sum()
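A minimal sketch of counting missing values, on a made-up frame that actually contains some. The dropna() call at the end is one simple cleaning option and is an addition to the text above.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "SepalLengthCm": [5.1, np.nan, 6.3],
    "Species": ["Iris-setosa", None, "Iris-virginica"],
})

# isnull() is a method, so it needs parentheses before chaining sum().
missing_per_column = frame.isnull().sum()

# dropna() is one simple way to discard rows that contain any missing value.
complete_rows = frame.dropna()
print(missing_per_column)
```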
Heatmap: Importing seaborn

A heatmap is a data visualisation technique that represents the values of a two-dimensional
table as colors. Here it is used to show the correlation between all numerical variables in
the dataset. heatmap() is a function of the Seaborn library.
Code:
Python3
import seaborn as sns
iris = sns.load_dataset("iris")
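# Correlate only the numeric columns; the species column holds strings.
sns.heatmap(iris.select_dtypes(include="number").corr(), cmap = "YlGnBu", linecolor = 'white', linewidths = 1)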
Code: Annotate each cell with its numeric value
Python3
sns.heatmap(iris.select_dtypes(include="number").corr(), cmap = "YlGnBu", linecolor = 'white', linewidths = 1, annot = True)
Pandas Dataframe Correlation:

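Pandas correlation is used to determine the pairwise correlation of all the columns of the
dataset. In dataframe.corr(), missing values are excluded. Non-numeric columns must be left
out as well; passing numeric_only=True does this, and recent pandas versions raise an error
for string columns otherwise.
Code:
Python3
data.corr(method='pearson', numeric_only=True)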
In the output dataframe, the value of each cell is the correlation between its row variable
and its column variable. The correlation of a variable with itself is 1, so all the
diagonal values are 1.00.
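The properties of the correlation matrix can be checked on a small frame. A minimal sketch with made-up values; select_dtypes() is used here to drop the string column before calling corr().

```python
import pandas as pd

frame = pd.DataFrame({
    "SepalLengthCm": [5.1, 7.0, 6.3, 4.9],
    "PetalLengthCm": [1.4, 4.7, 6.0, 1.5],
    "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = frame.select_dtypes(include="number").corr(method="pearson")

# The matrix is symmetric and every variable correlates perfectly with itself.
print(corr)
```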
Multivariate Analysis:
A pair plot is used to visualize the relationship between each pair of column variables. It
takes only one line of code, which is as follows:
Code:
Python3
g = sns.pairplot(data,hue="Species")
Last Updated : 10 Jan, 2023
K kashishlo…
Article Tags : Python-pandas , Machine Learning , Python
Practice Tags : Machine Learning, python
@GeeksforGeeks, Sanchhaya Education Private Limited, All rights reserved