Lab3 - Python - Pandas DataFrame - GeeksforGeeks
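A Pandas DataFrame is a two-dimensional tabular data structure with labeled axes (rows and columns), i.e., data is aligned in a tabular fashion in rows and columns. A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.
This lab gives a brief look at the basic operations that can be performed on a Pandas DataFrame: creating a DataFrame, dealing with rows and columns, indexing and selecting data, working with missing data, and iterating over rows and columns.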
Creating a DataFrame
In the real world, a Pandas DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from lists, a dictionary, a list of dictionaries, etc. A DataFrame can be created in different ways; two of them follow.
Creating a DataFrame using a list: A DataFrame can be created from a single list or a list of lists.
# importing pandas as pd
import pandas as pd
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is',
       'portal', 'for', 'Geeks']
# calling the DataFrame constructor on the list
df = pd.DataFrame(lst)
print(df)
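A list of lists works the same way, with each inner list becoming one row. The sketch below is only illustrative: the names, ages, and column labels are assumed values, not data from the original lab.

```python
import pandas as pd

# Each inner list is one row (values are made up for illustration)
rows = [['Tom', 20], ['Nick', 21], ['Krish', 19]]

# Column labels are passed explicitly; without them pandas uses 0, 1, ...
df_rows = pd.DataFrame(rows, columns=['Name', 'Age'])
print(df_rows)
```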
Creating a DataFrame from a dict of ndarrays/lists: To create a DataFrame from a dict of ndarrays or lists, all the arrays must be of the same length. If an index is passed, its length must equal the length of the arrays. If no index is passed, the index defaults to range(n), where n is the array length.
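import pandas as pd
# Illustrative data: a dict of equal-length lists
data = {'Name': ['Tom', 'Nick', 'Krish', 'Jack'], 'Age': [20, 21, 19, 18]}
# Create DataFrame
df = pd.DataFrame(data)
print(df)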
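The index rules described above can be checked with a small sketch. The data and the index labels 'a', 'b', 'c' are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative data: every list has the same length (3)
data = {'Name': ['Tom', 'Nick', 'Krish'], 'Age': [20, 21, 19]}

# No index passed: pandas builds a default index 0..n-1
df_default = pd.DataFrame(data)
print(df_default.index.tolist())   # [0, 1, 2]

# Explicit index: its length must match the length of the lists
df_labeled = pd.DataFrame(data, index=['a', 'b', 'c'])
print(df_labeled)

# A mismatched index length raises ValueError
try:
    pd.DataFrame(data, index=['a', 'b'])
except ValueError as err:
    print('ValueError:', err)
```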
Dealing with Rows and Columns
A DataFrame arranges data in rows and columns. We can perform basic operations on rows and columns, such as selecting, deleting, adding, and renaming.
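As a rough sketch of adding, renaming, and deleting applied to columns (the column names and values here are assumed for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'Krish'], 'Age': [20, 21, 19]})

# Adding: assign a list (same length as the frame) to a new column name
df['City'] = ['Delhi', 'Pune', 'Agra']

# Renaming: map old column names to new ones; returns a new DataFrame
df = df.rename(columns={'Age': 'Years'})

# Deleting: drop a column by name; returns a new DataFrame
df = df.drop(columns=['City'])

print(df.columns.tolist())   # ['Name', 'Years']
```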
Column Selection: To select a column in a Pandas DataFrame, we can access it by its column name.
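A minimal sketch of column selection, assuming a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'Krish'],
                   'Age': [20, 21, 19],
                   'City': ['Delhi', 'Pune', 'Agra']})

# A single column name returns a Series
ages = df['Age']

# A list of column names returns a DataFrame with just those columns
subset = df[['Name', 'City']]

print(type(ages).__name__)   # Series
print(subset)
```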
Row Selection: Pandas provides a dedicated way to retrieve rows from a DataFrame: the DataFrame.loc[] indexer selects rows by label. Rows can also be selected by integer position with DataFrame.iloc[]. When only a single label or position is passed, the row is returned as a Series.
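The following sketch shows two single-label lookups, each returning a Series. The row labels and values are assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Age': [20, 21, 19], 'City': ['Delhi', 'Pune', 'Agra']},
                  index=['Tom', 'Nick', 'Krish'])

# One label per call, so each call returns a Series
first = df.loc['Tom']
second = df.loc['Krish']

# The same row can be fetched by integer position
also_first = df.iloc[0]

print(first)
print(second)
```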
Indexing and Selecting Data
Indexing in pandas simply means selecting particular rows and columns of data from a DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of each. Indexing is also known as subset selection.
The term indexing operator refers to the square brackets following an object. The .loc and .iloc indexers also use the indexing operator to make selections. Here, the indexing operator refers to df[].
first = df["Age"]  # select one column with df[]; "Age" is an assumed column name
print(first)
The .loc[] indexer selects data by the label of the rows and columns. It selects data differently from the plain indexing operator: it can select subsets of rows or columns, and it can also select subsets of rows and columns simultaneously.
To select a single row using .loc[], we put a single row label inside the brackets.
Passing a single row label returns that row as a Series.
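A short sketch of label-based selection with .loc[], assuming a made-up DataFrame indexed by name:

```python
import pandas as pd

df = pd.DataFrame({'Age': [20, 21, 19, 18],
                   'City': ['Delhi', 'Pune', 'Agra', 'Goa']},
                  index=['Tom', 'Nick', 'Krish', 'Jack'])

# Single row label -> Series
row = df.loc['Nick']

# List of row labels -> DataFrame
rows = df.loc[['Tom', 'Jack']]

# Row labels and column labels together -> subset of both
cell_block = df.loc[['Tom', 'Krish'], ['Age']]

# Label slices include BOTH endpoints
span = df.loc['Nick':'Jack', 'Age']

print(row)
print(cell_block)
print(span.tolist())   # [21, 19, 18]
```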
The .iloc[] indexer allows us to retrieve rows and columns by position. To do that, we specify the positions of the rows that we want, and the positions of the columns as well. The df.iloc indexer is very similar to df.loc but uses only integer locations to make its selections.
To select a single row using .iloc[], we pass a single integer to .iloc[].
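import pandas as pd
df = pd.DataFrame({'Name': ['Tom', 'Nick', 'Krish'], 'Age': [20, 21, 19]})  # illustrative data
row2 = df.iloc[1]  # row at integer position 1, i.e. the second row
print(row2)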
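Selecting by row and column positions together might look like the sketch below; the data is assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
                   'Age': [20, 21, 19, 18],
                   'City': ['Delhi', 'Pune', 'Agra', 'Goa']})

# Rows at positions 0 and 2, columns at positions 0 and 1
block = df.iloc[[0, 2], [0, 1]]

# Position slices exclude the stop position, like Python lists
first_two = df.iloc[0:2]

# A single (row, column) position pair returns a scalar
value = df.iloc[3, 1]

print(block)
print(len(first_two))   # 2
print(value)            # 18
```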
Working with Missing Data
Missing data can occur when no information is provided for one or more items or for a whole unit. Missing data is a very big problem in real-life scenarios. Missing data is also referred to as NA (Not Available) values in pandas.
To check for missing values in a Pandas DataFrame, we use the functions isnull() and notnull(). Both functions help in checking whether a value is NaN or not. These functions can also be used on a Pandas Series to find null values in the Series.
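# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists (named `data` to avoid shadowing the built-in dict)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
# creating a dataframe from the dictionary
df = pd.DataFrame(data)
# isnull() returns True wherever a value is NaN
print(df.isnull())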
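The same checks work on a single column (a Series). The sketch below reuses the score data from this section; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# notnull() on a single column (a Series) gives a Boolean Series
has_first = df['First Score'].notnull()

# The mask keeps only rows whose 'First Score' is present
present = df[has_first]

# isnull().sum() counts the missing values in each column
missing_per_column = df.isnull().sum()

print(present)
print(missing_per_column)
```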
To fill null values in a dataset, we use the fillna(), replace(), and interpolate() functions. fillna() and replace() substitute NaN values with a value of our choosing, while interpolate() fills NA values using an interpolation technique rather than hard-coding the value.
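# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists (named `data` to avoid shadowing the built-in dict)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
# creating a dataframe from the dictionary
df = pd.DataFrame(data)
# filling missing values with 0 using fillna()
print(df.fillna(0))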
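For comparison, replace() and interpolate() can be sketched on the same data. The sentinel value -99 and the choice of linear interpolation are assumptions made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# replace(): swap every NaN for a chosen value (-99 is an arbitrary sentinel)
replaced = df.replace(to_replace=np.nan, value=-99)

# interpolate(): linear interpolation down each column; a leading NaN
# has no earlier value to interpolate from, so it stays NaN
interpolated = df.interpolate(method='linear', limit_direction='forward')

print(replaced)
print(interpolated)
```

In the 'First Score' column, the missing value sits between 90 and 95, so linear interpolation fills it with 92.5.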
To drop null values from a DataFrame, we use the dropna() function. This function drops rows or columns that contain null values, in different ways.
# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists (named `data` to avoid shadowing the built-in dict)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}
# creating a dataframe from the dictionary
df = pd.DataFrame(data)
print(df)
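Now we drop rows with at least one NaN value (null value):
# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists (named `data` to avoid shadowing the built-in dict)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}
# creating a dataframe from the dictionary
df = pd.DataFrame(data)
# using dropna() to drop rows with at least one NaN value
print(df.dropna())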
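dropna() accepts parameters that change what gets dropped. The sketch below tries three of them (axis, how, thresh) on the same data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, np.nan, 45, 56],
                   'Third Score': [52, 40, 80, 98],
                   'Fourth Score': [np.nan, np.nan, np.nan, 65]})

# axis=1 drops COLUMNS that contain at least one NaN
complete_columns = df.dropna(axis=1)

# how='all' drops only rows in which every value is NaN (none here)
not_all_missing = df.dropna(how='all')

# thresh=3 keeps rows that have at least 3 non-NaN values
mostly_complete = df.dropna(thresh=3)

print(complete_columns.columns.tolist())   # ['Third Score']
print(len(not_all_missing))                # 4
print(mostly_complete.index.tolist())      # [0, 3]
```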
For more details, refer to Working with Missing Data in Pandas.
Iterating over Rows and Columns
Iteration is a general term for taking each item of something, one after another. A Pandas DataFrame consists of rows and columns, so to iterate over a DataFrame we iterate over it much as we would over a dictionary.
To iterate over rows, we can use iterrows() or itertuples(). A third function, items() (named iteritems() before pandas 2.0), iterates over columns rather than rows.
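# importing pandas as pd
import pandas as pd
# dictionary of lists (named `data` to avoid shadowing the built-in dict)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
# creating a dataframe from the dictionary
df = pd.DataFrame(data)
print(df)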
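Now we apply iterrows() to visit each row:
# importing pandas as pd
import pandas as pd
# dictionary of lists (named `data` to avoid shadowing the built-in dict)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
# creating a dataframe from the dictionary
df = pd.DataFrame(data)
# iterrows() yields (index label, row as a Series) pairs
for i, j in df.iterrows():
    print(i, j)
    print()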
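itertuples() is an alternative for row iteration. A minimal sketch on the same data follows; the score threshold of 80 is an arbitrary choice for the example.

```python
import pandas as pd

df = pd.DataFrame({'name': ["aparna", "pankaj", "sudhir", "Geeku"],
                   'degree': ["MBA", "BCA", "M.Tech", "MBA"],
                   'score': [90, 40, 80, 98]})

# itertuples() yields one namedtuple per row; fields are the column names,
# plus Index for the row label
high_scorers = []
for row in df.itertuples():
    if row.score >= 80:
        high_scorers.append(row.name)

print(high_scorers)   # ['aparna', 'sudhir', 'Geeku']
```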
To iterate over columns, we create a list of the DataFrame's columns and then iterate through that list to pull out each column.
# importing pandas as pd
import pandas as pd
# dictionary of lists (named `data` to avoid shadowing the built-in dict)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
# creating a dataframe from the dictionary
df = pd.DataFrame(data)
print(df)
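Now we iterate through the columns. To do so, we first create a list of the DataFrame's columns and then iterate through that list:
for i in list(df):  # list(df) gives the column names
    print(df[i].iloc[2])  # third element of each column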
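Columns can also be visited directly with items(), which avoids building the list of names first. A small sketch on the same data, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'name': ["aparna", "pankaj", "sudhir", "Geeku"],
                   'degree': ["MBA", "BCA", "M.Tech", "MBA"],
                   'score': [90, 40, 80, 98]})

# items() yields (column name, column as a Series) pairs
first_values = {}
for column_name, column in df.items():
    first_values[column_name] = column.iloc[0]
    print(column_name, column.tolist())
```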
For more details, refer to Iterating over rows and columns in Pandas DataFrame.
DataFrame Methods:
FUNCTION        DESCRIPTION
index           Attribute that returns the index (row labels) of the DataFrame
value_counts()  Method counts the number of times each unique value occurs within the Series
isnull()        Method creates a Boolean Series for extracting rows with null values
notnull()       Method creates a Boolean Series for extracting rows with non-null values
between()       Method extracts rows where a column value falls within a predefined range
isin()          Method extracts rows from a DataFrame where a column value exists in a predefined collection
dtypes          Attribute that returns a Series with the data type of each column. The result's index is the original DataFrame's columns
sort_index()    Method sorts a DataFrame by its index labels instead of its values; useful when a DataFrame built from two or more DataFrames needs its index put back in order
                index position. This method combines the best features of the .loc[] and .iloc[] methods
nsmallest()     Method pulls out the rows with the smallest values in a column
nlargest()      Method pulls out the rows with the largest values in a column
ndim            Attribute that returns an int representing the number of axes / array dimensions
dropna()        Method allows the user to analyze and drop rows/columns with null values in different ways
fillna()        Method lets the user replace NaN values with a value of their own
duplicated()    Method creates a Boolean Series and uses it to extract rows that have duplicate values
set_index()     Method sets the DataFrame index (row labels) using one or more existing columns
reset_index()   Method resets the index of a DataFrame, replacing it with integers ranging from 0 to the length of the data minus 1
where()         Method checks a DataFrame for one or more conditions and returns the result accordingly. By default, entries not satisfying the condition are filled with NaN
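A few of the entries in the table can be tried together in one sketch. It reuses the name/degree/score data from the iteration examples; the thresholds and variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['aparna', 'pankaj', 'sudhir', 'Geeku'],
                   'degree': ['MBA', 'BCA', 'M.Tech', 'MBA'],
                   'score': [90, 40, 80, 98]})

# value_counts(): how often each unique value occurs in a Series
degree_counts = df['degree'].value_counts()

# between() and isin() build Boolean Series used to filter rows
mid_scores = df[df['score'].between(50, 95)]
mba_or_bca = df[df['degree'].isin(['MBA', 'BCA'])]

# nlargest(): rows with the largest values in a column
top_two = df.nlargest(2, 'score')

# set_index() promotes a column to the index; reset_index() restores 0..n-1
by_name = df.set_index('name')
restored = by_name.reset_index()

# where(): values failing the condition become NaN
passed = df['score'].where(df['score'] >= 50)

# duplicated(): True for rows whose 'degree' repeats an earlier one
repeat_degree = df.duplicated(subset='degree')

print(degree_counts['MBA'])          # 2
print(mid_scores['name'].tolist())   # ['aparna', 'sudhir']
print(top_two['name'].tolist())      # ['Geeku', 'aparna']
print(df.ndim, by_name.index.name)   # 2 name
```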