
Lab3 - Python - Pandas DataFrame - GeeksforGeeks

The document discusses Pandas DataFrame, which is a two-dimensional tabular data structure with labeled rows and columns. It covers creating DataFrames from lists, dictionaries, and files. It also discusses selecting, adding, and deleting rows and columns, as well as indexing and selecting data using .loc, .iloc, and boolean indexing. The document also covers working with missing data by checking for null values, filling null values, and dropping rows/columns with null values.

Uploaded by sa00059
Copyright © All Rights Reserved
Python | Pandas DataFrame


Last Updated : 10 Jan, 2019


Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.
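The three components, and the "size-mutable" property, can be inspected directly on any DataFrame. A minimal sketch on a small hypothetical table (the names and ages are made up for illustration):

```python
import pandas as pd

# Small hypothetical table used only to show the three components.
df = pd.DataFrame({"Name": ["Tom", "nick"], "Age": [20, 21]})

print(df.to_numpy())        # the data, as a 2-D NumPy array
print(df.index.tolist())    # the row labels: [0, 1]
print(df.columns.tolist())  # the column labels: ['Name', 'Age']
print(df.shape)             # (2, 2): two rows, two columns

# Size-mutable: a column can be added after the DataFrame is created.
df["City"] = ["Delhi", "Kanpur"]
print(df.shape)             # (2, 3)
```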

We will get a brief insight into the following basic operations that can be performed on a Pandas DataFrame:

Creating a DataFrame

Dealing with Rows and Columns

Indexing and Selecting Data

Working with Missing Data

Iterating over rows and columns

Creating a Pandas DataFrame

In the real world, a Pandas DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from lists, a dictionary, a list of dictionaries, and so on. Here are some of the ways to create a DataFrame:
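Two of the sources mentioned above, a list of dictionaries and a CSV file, can be sketched as follows. This assumes a small in-memory CSV string stands in for a real file on disk; pd.read_csv accepts any file-like object, so io.StringIO works in place of a path. The names, ages, and city are made up.

```python
import io

import pandas as pd

# List of dictionaries: each dict is one row and its keys become columns.
# A key missing from some dict ("City" below) leaves NaN in that row.
records = [
    {"Name": "Tom", "Age": 20},
    {"Name": "nick", "Age": 21, "City": "Delhi"},
]
df_records = pd.DataFrame(records)
print(df_records)

# CSV "storage": an in-memory string stands in for a file on disk here.
csv_text = "Name,Age\nkrish,19\njack,18\n"
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```

With a real file, the same call would simply take a path, as in the pd.read_csv("nba.csv", ...) examples later in this article.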

Creating a DataFrame using a list: A DataFrame can be created from a single list or from a list of lists.

# import pandas
import pandas as pd

# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is',
       'portal', 'for', 'Geeks']

# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)

Output:
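The example above uses a single list, which yields one column. For the list-of-lists case, a minimal sketch (the column labels "Word" and "Value" and the numbers are hypothetical):

```python
import pandas as pd

# Each inner list becomes one row; `columns` supplies the column labels.
rows = [["Geeks", 10], ["For", 20], ["Geeks", 30]]
df = pd.DataFrame(rows, columns=["Word", "Value"])
print(df)
print(df.shape)  # (3, 2): three rows, two columns
```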

Creating DataFrame from dict of ndarray/lists: To create a DataFrame from a dict of ndarrays/lists, all the ndarrays must be of the same length. If an index is passed, then the length of the index should be equal to the length of the arrays. If no index is passed, then by default the index will be range(n), where n is the array length.

# Python code demonstrating how to create
# a DataFrame from a dict of ndarrays / lists.
# No index is passed, so the default index range(n) is used.

import pandas as pd

# initialise data of lists.
data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18]}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
print(df)

Output:

For more details, refer to Creating a Pandas DataFrame.
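The length rules described for a dict of lists can be checked directly. A minimal sketch, assuming the same Name/Age data as above; the index labels "rank1" to "rank4" are hypothetical:

```python
import pandas as pd

data = {"Name": ["Tom", "nick", "krish", "jack"],
        "Age": [20, 21, 19, 18]}

# An explicit index must have the same length as the lists (4 here).
df = pd.DataFrame(data, index=["rank1", "rank2", "rank3", "rank4"])
print(df)

# Without an index, pandas falls back to 0 .. n-1, like range(n).
default_index = pd.DataFrame(data).index.tolist()
print(default_index)  # [0, 1, 2, 3]

# Lists of different lengths are rejected with a ValueError.
raised = False
try:
    pd.DataFrame({"Name": ["Tom", "nick"], "Age": [20]})
except ValueError as err:
    raised = True
    print("ValueError:", err)
```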


Dealing with Rows and Columns
A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. We can perform basic operations on rows and columns, such as selecting, deleting, adding, and renaming.
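The examples in this section focus on selection. A minimal sketch of adding, renaming, and deleting, on a small hypothetical table (the values are made up):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav"],
                   "Age": [27, 24, 22]})

# Adding a column: assign a list (or Series) to a new label.
df["City"] = ["Delhi", "Kanpur", "Allahabad"]

# Renaming: rename() returns a new DataFrame, so reassign the result.
df = df.rename(columns={"City": "Address"})

# Deleting: drop() removes columns or rows by label.
df = df.drop(columns=["Age"])  # remove a column
df = df.drop(index=[1])        # remove the row whose index label is 1

print(df)
```

Reassigning the result (rather than modifying in place) keeps each step easy to read and undo.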

Column Selection: To select columns in a Pandas DataFrame, we access them by their column names.

# Import pandas package
import pandas as pd

# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# select two columns
print(df[['Name', 'Qualification']])

Output:
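The shape of the result depends on what goes inside the brackets. A short sketch on a two-row subset of the employee data:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi"],
                   "Qualification": ["Msc", "MA"]})

one = df["Name"]        # a single label returns a Series
several = df[["Name"]]  # a list of labels returns a DataFrame, even with one label
print(type(one).__name__, type(several).__name__)  # Series DataFrame
```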

Row Selection: Pandas provides a dedicated way to retrieve rows from a DataFrame: the DataFrame.loc[] indexer retrieves rows by label. Rows can also be selected by integer position with the iloc[] indexer.

Note: The examples below use the nba.csv file.


# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]

print(first, "\n\n\n", second)

Output:

As shown in the output image, two Series were returned, since a single row label was passed each time.
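Since nba.csv is not included here, the sketch below uses a hypothetical two-row stand-in (the player names are from the example above, but the Team and Age values are made up). It shows why a single label gives a Series, what a list of labels gives instead, and that iloc[] can reach the same row by position:

```python
import pandas as pd

# Hypothetical stand-in for nba.csv, indexed by "Name" like index_col="Name".
data = pd.DataFrame(
    {"Team": ["Team X", "Team Y"], "Age": [25, 22]},
    index=pd.Index(["Avery Bradley", "R.J. Hunter"], name="Name"),
)

# One label -> one row, returned as a Series indexed by the column names.
first = data.loc["Avery Bradley"]
print(type(first).__name__)  # Series

# A list of labels -> a DataFrame.
both = data.loc[["Avery Bradley", "R.J. Hunter"]]
print(type(both).__name__)  # DataFrame

# iloc[] reaches the same row by position instead of by label.
print(data.iloc[1].equals(data.loc["R.J. Hunter"]))  # True
```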

For more details, refer to Dealing with Rows and Columns.

Indexing and Selecting Data

Indexing in pandas simply means selecting particular rows and columns of data from a DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of each of the rows and columns. Indexing is also known as subset selection.

Indexing a DataFrame using the indexing operator [] :

The indexing operator refers to the square brackets following an object. The .loc and .iloc indexers also use the indexing operator to make selections. In this section, the indexing operator refers to df[].

Selecting a single column

In order to select a single column, we simply put the name of the column in between the brackets.

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# retrieving columns by indexing operator
first = data["Age"]

print(first)

Output:
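Besides column labels, the indexing operator also accepts a Boolean Series, in which case it selects rows instead of columns. A small sketch on a hypothetical stand-in for nba.csv (player names, teams, and ages are all made up):

```python
import pandas as pd

# Hypothetical stand-in for a few rows of nba.csv.
data = pd.DataFrame(
    {"Team": ["Team X", "Team X", "Team Y"],
     "Age": [25, 22, 27]},
    index=pd.Index(["Player A", "Player B", "Player C"], name="Name"),
)

# A Boolean Series inside [] keeps the rows where it is True.
older = data[data["Age"] > 24]
print(older.index.tolist())  # ['Player A', 'Player C']

# Conditions combine with & (and) or | (or); each needs its own parentheses.
older_x = data[(data["Age"] > 24) & (data["Team"] == "Team X")]
print(older_x.index.tolist())  # ['Player A']
```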

Indexing a DataFrame using .loc[ ] :

This indexer selects data by the labels of the rows and columns. The df.loc indexer selects data in a different way than the plain indexing operator: it can select subsets of rows or columns, and it can also select subsets of rows and columns simultaneously.

Selecting a single row

In order to select a single row using .loc[], we put a single row label inside .loc[].

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]

print(first, "\n\n\n", second)

Output:

As shown in the output image, two Series were returned, since a single row label was passed each time.
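The paragraph above notes that .loc[] can select subsets of rows and columns at the same time. A minimal sketch on a hypothetical table (labels A to D and all values are made up), which also shows that label slices in .loc[] include both endpoints:

```python
import pandas as pd

data = pd.DataFrame(
    {"Team": ["T1", "T2", "T3", "T4"],
     "Age": [25, 22, 27, 30],
     "Salary": [100, 200, 300, 400]},
    index=pd.Index(["A", "B", "C", "D"], name="Name"),
)

# Rows and columns at the same time: .loc[row_labels, column_labels]
cell_block = data.loc[["A", "C"], ["Team", "Salary"]]

# Label slices in .loc include BOTH endpoints.
b_to_d = data.loc["B":"D", "Age"]
print(b_to_d.tolist())  # [22, 27, 30]

# A Boolean Series also works as the row selector.
young = data.loc[data["Age"] < 26, "Team"]
print(young.tolist())  # ['T1', 'T2']
```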

Indexing a DataFrame using .iloc[ ] :

This indexer allows us to retrieve rows and columns by position. To do that, we need to specify the positions of the rows that we want, and the positions of the columns that we want as well. The df.iloc indexer is very similar to df.loc but uses only integer locations to make its selections.

Selecting a single row

In order to select a single row using .iloc[], we can pass a single integer to the .iloc[] indexer.

# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")

# retrieving a row by position with iloc;
# positions start at 0, so 3 is the fourth row
row = data.iloc[3]

print(row)
Output:
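Like .loc[], .iloc[] accepts a row part and a column part, but both are integer positions. A minimal sketch on a hypothetical table (labels and values are made up); note that, unlike label slices in .loc[], integer slices here exclude the end position, just like Python lists:

```python
import pandas as pd

data = pd.DataFrame(
    {"Team": ["T1", "T2", "T3", "T4"],
     "Age": [25, 22, 27, 30]},
    index=["A", "B", "C", "D"],
)

# .iloc[row_positions, column_positions]; positions start at 0.
block = data.iloc[[0, 2], [1]]  # rows 0 and 2, column 1 ("Age")

# Integer slices in .iloc EXCLUDE the end position.
first_two = data.iloc[0:2]
print(first_two.index.tolist())  # ['A', 'B']

# Negative positions count from the end.
last_row = data.iloc[-1]
print(last_row["Team"])  # T4
```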

For more details, refer to Indexing and Selecting Data with Pandas and Boolean Indexing in Pandas.

Working with Missing Data

Missing data can occur when no information is provided for one or more items or for a whole unit. Missing data is a very big problem in real-life scenarios. In pandas, missing data is also referred to as NA (Not Available) values.

Checking for missing values using isnull() and notnull() :

To check for missing values in a Pandas DataFrame, we use the isnull() and notnull() functions. Both functions check whether a value is NaN or not. They can also be used on a Pandas Series to find null values in that series.

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe from the dictionary
df = pd.DataFrame(data)

# using isnull() function
print(df.isnull())

Output:
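The Boolean result of isnull() is often summarised or used as a filter rather than read cell by cell. A short sketch on the same score data, counting missing values per column and using notnull() on a single column (a Series) to keep only the rows where that column has a value:

```python
import numpy as np
import pandas as pd

data = {"First Score": [100, 90, np.nan, 95],
        "Second Score": [30, 45, 56, np.nan],
        "Third Score": [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# Count missing values per column: each True counts as 1 when summed.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# notnull() on one column gives a Boolean Series usable as a row filter.
has_first = df[df["First Score"].notnull()]
print(has_first.index.tolist())  # [0, 1, 3]
```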

Filling missing values using fillna(), replace() and interpolate() :

To fill null values in a dataset, we use the fillna(), replace(), and interpolate() functions. These functions replace NaN values with some value of their own, and all of them help fill null values in a DataFrame. The interpolate() function also fills NA values, but instead of hard-coding a value it uses an interpolation technique to estimate the missing values.

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}

# creating a dataframe from dictionary
df = pd.DataFrame(data)

# filling missing value using fillna()
print(df.fillna(0))

Output:
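The example above only shows fillna(). A minimal sketch of the other two functions on the same data; the replacement value -99 is an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

data = {"First Score": [100, 90, np.nan, 95],
        "Second Score": [30, 45, 56, np.nan],
        "Third Score": [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# replace(): swap every NaN for a chosen value (here -99).
replaced = df.replace(np.nan, -99)
print(replaced)

# interpolate(): estimate each NaN from its neighbours (linear by default).
# The gap in "First Score" between 90 and 95 becomes 92.5. A NaN in the
# first row has no earlier value to start from, so with the default
# forward direction it stays NaN ("Third Score" below).
interpolated = df.interpolate()
print(interpolated)
```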

Dropping missing values using dropna() :

To drop null values from a DataFrame, we use the dropna() function. This function drops rows or columns of a dataset that contain null values, in different ways.

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(data)

print(df)

Now we drop rows with at least one NaN (null) value:

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(data)

# using dropna() function: rows containing any NaN are dropped
print(df.dropna())

Output:
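The "different ways" of dropping are controlled by dropna() parameters. A sketch of a few common ones on the same data, with the expected results worked out from the NaN positions:

```python
import numpy as np
import pandas as pd

data = {"First Score": [100, 90, np.nan, 95],
        "Second Score": [30, np.nan, 45, 56],
        "Third Score": [52, 40, 80, 98],
        "Fourth Score": [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(data)

# Default (how="any"): drop a row if ANY value is missing; only row 3 survives.
print(df.dropna().index.tolist())  # [3]

# how="all": drop a row only if EVERY value is missing; nothing is dropped here.
print(df.dropna(how="all").shape)  # (4, 4)

# axis=1: drop columns (instead of rows) that contain a NaN.
print(df.dropna(axis=1).columns.tolist())  # ['Third Score']

# thresh=3: keep rows with at least 3 non-missing values.
print(df.dropna(thresh=3).index.tolist())  # [0, 3]
```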
For more details, refer to Working with Missing Data in Pandas.

Iterating over rows and columns

Iteration is a general term for taking each item of something, one after another. A Pandas DataFrame consists of rows and columns, so to iterate over a DataFrame we iterate over it much like a dictionary.

Iterating over rows :

To iterate over rows, we can use the iterrows() and itertuples() functions. A related function, items() (named iteritems() in older pandas releases), iterates over columns rather than rows.

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe from a dictionary
df = pd.DataFrame(data)

print(df)

Now we apply the iterrows() function to get each row; it yields (index label, row as a Series) pairs.

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe from a dictionary
df = pd.DataFrame(data)

# iterating over rows using iterrows() function:
# i is the index label, j is the row as a Series
for i, j in df.iterrows():
    print(i, j)
    print()

Output:
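The other row iterator, itertuples(), yields one namedtuple per row, with the index as the first field and each column available as an attribute. It is generally faster than iterrows() and keeps each column's own type. A minimal sketch on the same student data; the "above 50" cut-off is an arbitrary example:

```python
import pandas as pd

df = pd.DataFrame({"name": ["aparna", "pankaj", "sudhir", "Geeku"],
                   "degree": ["MBA", "BCA", "M.Tech", "MBA"],
                   "score": [90, 40, 80, 98]})

# itertuples() yields one namedtuple per row; the index is the first field.
for row in df.itertuples():
    print(row.Index, row.name, row.score)

# Collecting values from the rows, e.g. names of students scoring above 50.
high_scorers = [row.name for row in df.itertuples() if row.score > 50]
print(high_scorers)  # ['aparna', 'sudhir', 'Geeku']
```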

Iterating over Columns :

To iterate over columns, we can create a list of the DataFrame's column labels and then iterate through that list to pull out each column.

# importing pandas as pd
import pandas as pd

# dictionary of lists
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe from a dictionary
df = pd.DataFrame(data)

print(df)

Now we iterate through the columns: we first create a list of the DataFrame's columns and then iterate through that list.

# creating a list of dataframe columns
columns = list(df)

for i in columns:
    # printing the third element of the column (position 2)
    print(df[i].iloc[2])

Output:
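Instead of building the column list by hand, items() yields (column label, column Series) pairs directly. A minimal sketch that collects the same "third element" of each column into a dict:

```python
import pandas as pd

df = pd.DataFrame({"name": ["aparna", "pankaj", "sudhir", "Geeku"],
                   "degree": ["MBA", "BCA", "M.Tech", "MBA"],
                   "score": [90, 40, 80, 98]})

# items() yields (column label, column Series) pairs, left to right.
third_values = {}
for label, column in df.items():
    third_values[label] = column.iloc[2]  # third element, by position

print(third_values)
```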

For more details, refer to Iterating over rows and columns in Pandas DataFrame.

DataFrame Methods and Attributes:

FUNCTION            DESCRIPTION
index               Attribute that returns the index (row labels) of the DataFrame
insert()            Inserts a column into a DataFrame at a given position
add()               Returns addition of the DataFrame and other, element-wise (binary operator add)
sub()               Returns subtraction of the DataFrame and other, element-wise (binary operator sub)
mul()               Returns multiplication of the DataFrame and other, element-wise (binary operator mul)
div()               Returns floating division of the DataFrame and other, element-wise (binary operator truediv)
unique()            Series method that extracts the unique values of a column
nunique()           Returns the count of unique values (per column for a DataFrame)
value_counts()      Counts the number of times each unique value occurs within a Series
columns             Attribute that returns the column labels of the DataFrame; assigning a new list of names to it renames the columns
axes                Attribute that returns a list representing the axes of the DataFrame
isnull()            Creates a Boolean mask of null values; applied to a column, the resulting Boolean Series extracts rows with null values
notnull()           Creates a Boolean mask of non-null values; applied to a column, the resulting Boolean Series extracts rows with non-null values
between()           Series method that checks whether each value falls between two bounds; used to extract rows where a column value lies in a predefined range
isin()              Checks membership in a predefined collection; used to extract rows where a column value exists in that collection
dtypes              Attribute that returns a Series with the data type of each column; the result's index is the original DataFrame's columns
astype()            Converts the data types of a Series or DataFrame
values              Attribute that returns a NumPy representation of the DataFrame, i.e. only the values, with the axis labels removed
sort_values()       Sorts a DataFrame in ascending or descending order of the passed column(s)
sort_index()        Sorts a DataFrame by its index labels instead of its values, e.g. when combining two or more DataFrames has left the index out of order
loc[]               Indexer that retrieves rows (and columns) by label
iloc[]              Indexer that retrieves rows (and columns) by integer position
ix[]                Former indexer that accepted either labels or positions; deprecated and removed in pandas 1.0, so use .loc[] or .iloc[] instead
rename()            Changes the names of index labels or column names
drop()              Deletes rows or columns from a DataFrame
pop()               Removes a column from a DataFrame and returns it as a Series
sample()            Pulls out a random sample of rows or columns from a DataFrame
nsmallest()         Pulls out the rows with the smallest values in a column
nlargest()          Pulls out the rows with the largest values in a column
shape               Attribute that returns a tuple representing the dimensionality of the DataFrame
ndim                Attribute that returns an int representing the number of axes: 1 for a Series, 2 for a DataFrame
dropna()            Drops rows or columns with null values in different ways
fillna()            Replaces NaN values with a value of the user's choosing
rank()              Ranks the values of a Series (or of each DataFrame column) in order
query()             Alternate string-based syntax for extracting a subset of a DataFrame
copy()              Creates an independent copy of a pandas object
duplicated()        Returns a Boolean Series marking duplicate rows, which can be used to extract them
drop_duplicates()   Removes duplicate rows directly, an alternative to identifying them with duplicated() and filtering
set_index()         Sets the DataFrame index (row labels) using one or more existing columns
reset_index()       Resets the index to the default integers 0 to len(df) - 1
where()             Checks a DataFrame against one or more conditions; by default, values that do not satisfy the condition are replaced with NaN
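A few of the methods in the table, exercised on the student score data from the iteration examples (trimmed to two columns). This is a sketch only; the pass mark of 50 is an arbitrary choice:

```python
import pandas as pd

df = pd.DataFrame({"name": ["aparna", "pankaj", "sudhir", "Geeku"],
                   "score": [90, 40, 80, 98]})

by_score = df.sort_values("score", ascending=False)  # highest score first
indexed = df.set_index("name")                        # names become row labels
top_two = df.nlargest(2, "score")                     # rows with the two largest scores
passed = df.query("score >= 50")                      # string-based filtering
masked = df["score"].where(df["score"] >= 50)         # failing scores become NaN

print(by_score["name"].tolist())       # ['Geeku', 'aparna', 'sudhir', 'pankaj']
print(indexed.loc["sudhir", "score"])  # 80
print(masked.tolist())                 # [90.0, nan, 80.0, 98.0]
```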
More on Pandas

1. Python | Pandas Series

2. Python | Pandas Working With Text Data

3. Python | Pandas Working with Dates and Times

4. Python | Pandas Merging, Joining, and Concatenating
