Python Pandas - Home
Python Pandas - Introduction
Python Pandas - Environment Setup
Python Pandas - Basics
Python Pandas - Introduction to Data Structures
Python Pandas - Index Objects
Python Pandas - Panel
Python Pandas - Basic Functionality
Python Pandas - Indexing & Selecting Data
Python Pandas - Series
Python Pandas - Series
Python Pandas - Slicing a Series Object
Python Pandas - Attributes of a Series Object
Python Pandas - Arithmetic Operations on Series Object
Python Pandas - Converting Series to Other Objects
Python Pandas - DataFrame
Python Pandas - DataFrame
Python Pandas - Accessing DataFrame
Python Pandas - Slicing a DataFrame Object
Python Pandas - Modifying DataFrame
Python Pandas - Removing Rows from a DataFrame
Python Pandas - Arithmetic Operations on DataFrame
Python Pandas - IO Tools
Python Pandas - IO Tools
Python Pandas - Working with CSV Format
Python Pandas - Reading & Writing JSON Files
Python Pandas - Reading Data from an Excel File
Python Pandas - Writing Data to Excel Files
Python Pandas - Working with HTML Data
Python Pandas - Clipboard
Python Pandas - Working with HDF5 Format
Python Pandas - Comparison with SQL
Python Pandas - Data Handling
Python Pandas - Sorting
Python Pandas - Reindexing
Python Pandas - Iteration
Python Pandas - Concatenation
Python Pandas - Statistical Functions
Python Pandas - Descriptive Statistics
Python Pandas - Working with Text Data
Python Pandas - Function Application
Python Pandas - Options & Customization
Python Pandas - Window Functions
Python Pandas - Aggregations
Python Pandas - Merging/Joining
Python Pandas - MultiIndex
Python Pandas - Basics of MultiIndex
Python Pandas - Indexing with MultiIndex
Python Pandas - Advanced Reindexing with MultiIndex
Python Pandas - Renaming MultiIndex Labels
Python Pandas - Sorting a MultiIndex
Python Pandas - Binary Operations
Python Pandas - Binary Comparison Operations
Python Pandas - Boolean Indexing
Python Pandas - Boolean Masking
Python Pandas - Data Reshaping & Pivoting
Python Pandas - Pivoting
Python Pandas - Stacking & Unstacking
Python Pandas - Melting
Python Pandas - Computing Dummy Variables
Python Pandas - Categorical Data
Python Pandas - Categorical Data
Python Pandas - Ordering & Sorting Categorical Data
Python Pandas - Comparing Categorical Data
Python Pandas - Handling Missing Data
Python Pandas - Missing Data
Python Pandas - Filling Missing Data
Python Pandas - Interpolation of Missing Values
Python Pandas - Dropping Missing Data
Python Pandas - Calculations with Missing Data
Python Pandas - Handling Duplicates
Python Pandas - Duplicated Data
Python Pandas - Counting & Retrieving Unique Elements
Python Pandas - Duplicated Labels
Python Pandas - Grouping & Aggregation
Python Pandas - GroupBy
Python Pandas - Time-series Data
Python Pandas - Date Functionality
Python Pandas - Timedelta
Python Pandas - Sparse Data Structures
Python Pandas - Sparse Data
Python Pandas - Visualization
Python Pandas - Visualization
Python Pandas - Additional Concepts
Python Pandas - Caveats & Gotchas

Python Pandas - Introduction to Data Structures

Quiz

Python Pandas Data Structures

Data structures in Pandas are designed to handle data efficiently. They allow for the organization, storage, and modification of data in a way that optimizes memory usage and computational performance. Python Pandas library provides two primary data structures for handling and analyzing data −

Series
DataFrame

In general programming, the term "data structure" refers to the method of collecting, organizing, and storing data to enable efficient access and modification. Data structures are collections of data types that provide the best way of organizing items (values) in terms of memory usage.

Pandas is built on top of NumPy and integrates well within a scientific computing environment with many other third-party libraries. This tutorial will provide a detailed introduction to these data structures.

Dimension and Description of Pandas Data Structures

Data Structure	Dimensions	Description
Series	1	A one-dimensional labeled homogeneous array, sizeimmutable.
Data Frames	2	A two-dimensional labeled, size-mutable tabular structure with potentially heterogeneously typed columns.

Working with two or more dimensional arrays can be complex and time-consuming, as users need to carefully consider the data's orientation when writing functions. However, Pandas simplifies this process by reducing the mental effort required. For example, when dealing with tabular data (DataFrame), it's more easy to think in terms of rows and columns instead of axis 0 and axis 1.

Mutability of Pandas Data Structures

All Pandas data structures are value mutable, meaning their contents can be changed. However, their size mutability varies −

Series − Size immutable.
DataFrame − Size mutable.

Series

A Series is a one-dimensional labeled array that can hold any data type. It can store integers, strings, floating-point numbers, etc. Each value in a Series is associated with a label (index), which can be an integer or a string.

Name	Steve
Age	35
Gender	Male
Rating	3.5

Example

Consider the following Series which is a collection of different data types

Open Compiler

import pandas as pd

data = ['Steve', '35', 'Male', '3.5']
series = pd.Series(data, index=['Name', 'Age', 'Gender', 'Rating'])
print(series)

On executing the above program, you will get the following output −

Name      Steve
Age          35
Gender     Male
Rating      3.5
dtype: object

Key Points

Following are the key points related to the Pandas Series.

Homogeneous data
Size Immutable
Values of Data Mutable

DataFrame

A DataFrame is a two-dimensional labeled data structure with columns that can hold different data types. It is similar to a table in a database or a spreadsheet. Consider the following data representing the performance rating of a sales team −

Name	Age	Gender	Rating
Steve	32	Male	3.45
Lia	28	Female	4.6
Vin	45	Male	3.9
Katie	38	Female	2.78

Example

The above tabular data can be represented in a DataFrame as follows −

Open Compiler

import pandas as pd

# Data represented as a dictionary
data = {
    'Name': ['Steve', 'Lia', 'Vin', 'Katie'],
    'Age': [32, 28, 45, 38],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'Rating': [3.45, 4.6, 3.9, 2.78]
}

# Creating the DataFrame
df = pd.DataFrame(data)

# Display the DataFrame
print(df)

Output

On executing the above code you will get the following output −

    Name  Age  Gender  Rating
0  Steve   32    Male    3.45
1    Lia   28  Female    4.60
2    Vin   45    Male    3.90
3  Katie   38  Female    2.78

Key Points

Following are the key points related the Pandas DataFrame −

Heterogeneous data
Size Mutable
Data Mutable

Purpose of Using More Than One Data Structure

Pandas data structures are flexible containers for lower-dimensional data. For instance, a DataFrame is a container for Series, and a Series is a container for scalars. This flexibility allows for efficient data manipulation and storage.

Building and handling multi-dimensional arrays can be boring and require careful consideration of the data's orientation when writing functions. Pandas reduces this mental effort by providing intuitive data structures.

Example

Following example represents a Series within a DataFrame.

Open Compiler

import pandas as pd

# Data represented as a dictionary
data = {
    'Name': ['Steve', 'Lia', 'Vin', 'Katie'],
    'Age': [32, 28, 45, 38],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'Rating': [3.45, 4.6, 3.9, 2.78]
}

# Creating the DataFrame
df = pd.DataFrame(data)

# Display a Series within a DataFrame
print(df['Name'])

Output

On executing the above code you will get the following output −

0    Steve
1      Lia
2      Vin
3    Katie
Name: Name, dtype: object

Print Page