
Python Pandas - Calculations with Missing Data
When working with data, you will often come across missing values, which Pandas represents as NaN (Not a Number). Calculations involving missing values require extra attention, because NaN propagates through most arithmetic operations and can change the results.
Pandas offers flexible ways to manage missing data during calculations, allowing you to control how these values affect your results. In this tutorial, we will learn how Pandas handles missing data during calculations, including arithmetic operations, descriptive statistics, and cumulative operations.
Arithmetic Operations with Missing Data
When performing arithmetic operations between Pandas objects, missing values (NaN) are propagated by default. For example, when you add two Series that contain NaN values, the result is NaN at every position where either Series has a missing value.
Example
The following example demonstrates arithmetic between two Series objects that contain missing values.
    import pandas as pd
    import numpy as np

    # Create 2 input series objects
    ser1 = pd.Series([1, np.nan, np.nan, 2])
    ser2 = pd.Series([2, np.nan, 1, np.nan])

    # Display the series
    print("Input Series 1:\n", ser1)
    print("\nInput Series 2:\n", ser2)

    # Adding two series with NaN values
    result = ser1 + ser2
    print('\nResult After adding Two series:\n', result)
Following is the output of the above code −
    Input Series 1:
     0    1.0
    1    NaN
    2    NaN
    3    2.0
    dtype: float64

    Input Series 2:
     0    2.0
    1    NaN
    2    1.0
    3    NaN
    dtype: float64

    Result After adding Two series:
     0    3.0
    1    NaN
    2    NaN
    3    NaN
    dtype: float64
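If propagation is not what you want, one option is the method form of the operator. The sketch below reuses the two series from the example and assumes that a value missing on only one side should count as 0. The variable name filled_sum is illustrative only.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, np.nan, np.nan, 2])
ser2 = pd.Series([2, np.nan, 1, np.nan])

# fill_value=0 substitutes 0 for a value that is missing on one side only.
# Where both series are missing (index 1), the result is still NaN.
filled_sum = ser1.add(ser2, fill_value=0)
print(filled_sum)
```

Compared with ser1 + ser2, only index 1 remains NaN, because both inputs are missing there.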
Handling Missing Data in Descriptive Statistics
The Pandas library provides several methods for computing descriptive statistics, such as sum() and prod(), along with cumulative methods such as cumsum() and cumprod(). By default, these methods skip NaN values (skipna=True) rather than propagating them.
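As a small illustration of this default, the sketch below uses mean() and count() on a made-up three-element series (the data and variable names are assumptions for the example). Because NaN is skipped, the mean is taken over the non-missing values only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# count() reports how many values are not missing
non_missing = s.count()

# By default the mean is computed over the non-missing values: (1 + 3) / 2
mean_default = s.mean()

# With skipna=False a single NaN makes the whole result NaN
mean_strict = s.mean(skipna=False)

print(non_missing, mean_default, mean_strict)
```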
Example: Summing with Missing Values
When summing data with missing values, NaN values are excluded, so you can calculate meaningful totals even when some data is missing.
The following example sums a DataFrame column using the sum() method. By default, NaN values are skipped in the summation.
    import pandas as pd
    import numpy as np

    # Create a sample DataFrame
    data = {'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, 7, 8]}
    df = pd.DataFrame(data)

    # Display the input DataFrame
    print("Input DataFrame:\n", df)

    # Summing a column with NaN values
    result = df['A'].sum()
    print('\nResult After Summing the values of a column:\n', result)
Following is the output of the above code −
Input DataFrame:
| | A | B |
|---|---|---|
| 0 | NaN | 5 |
| 1 | 2.0 | 6 |
| 2 | NaN | 7 |
| 3 | 4.0 | 8 |
Result After Summing the values of a column:
6.0
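The default can be changed when a missing value should invalidate the total. The sketch below shows two parameters of sum(): skipna and min_count. The series contents and variable names are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

col = pd.Series([np.nan, 2, np.nan, 4])

# skipna=False: any NaN in the data makes the total NaN
strict_total = col.sum(skipna=False)

all_missing = pd.Series([np.nan, np.nan])

# With every value skipped, the sum of nothing is 0.0 ...
empty_total = all_missing.sum()

# ... unless min_count requires at least one valid value
guarded_total = all_missing.sum(min_count=1)

print(strict_total, empty_total, guarded_total)
```

Using min_count=1 is a way to tell "no data at all" apart from "data that adds up to zero".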
Example: Product Calculation with Missing Values
As with summing, missing values (NaN) are skipped when calculating a product, which is equivalent to treating them as 1. This ensures that missing values do not alter the final product.
The following example uses the DataFrame.prod() method to calculate the product of each column of a DataFrame.
    import pandas as pd
    import numpy as np

    # Create a sample DataFrame
    data = {'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]}
    df = pd.DataFrame(data)

    # Display the input DataFrame
    print("Input DataFrame:\n", df)

    # Product with NaN values
    result = df.prod()
    print('\nResult After Product the values of a DataFrame:\n', result)
Following is the output of the above code −
Input DataFrame:
| | A | B |
|---|---|---|
| 0 | NaN | 5.0 |
| 1 | 2.0 | 6.0 |
| 2 | NaN | NaN |
| 3 | 4.0 | NaN |
Result After Product the values of a DataFrame:
| A | 8.0 |
| B | 30.0 |
dtype: float64
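The "treated as 1" behaviour can be checked directly. The sketch below compares the default product with the product after explicitly filling NaN with 1. Column C, which is entirely missing, is an assumption added for illustration and is not part of the example above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [np.nan, 2, np.nan, 4],
    'B': [5, 6, np.nan, np.nan],
    'C': [np.nan, np.nan, np.nan, np.nan],  # hypothetical all-missing column
})

# Default behaviour: NaN values are skipped
skipped = df.prod()

# Explicitly replacing NaN with 1 gives the same column products
filled = df.fillna(1).prod()

print(skipped)
print(filled)
```

Note that the all-missing column yields a product of 1.0 rather than NaN, which is worth keeping in mind when a column may contain no valid data.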
Cumulative Operations with Missing Data
Pandas provides cumulative methods like cumsum() and cumprod() to generate running totals or products. By default, these methods skip missing values in the calculation but leave NaN at the corresponding positions in the output. To make NaN propagate through the calculation instead, set the skipna parameter to False.
Example: Cumulative Sum with Missing Values
The following example demonstrates calculating the cumulative sum of a DataFrame with missing values using the df.cumsum() method.
    import pandas as pd
    import numpy as np

    # Create a sample DataFrame
    data = {'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]}
    df = pd.DataFrame(data)

    # Display the input DataFrame
    print("Input DataFrame:\n", df)

    # Calculate cumulative sum by ignoring NaN
    print('Cumulative sum by ignoring NaN:\n', df.cumsum())
Following is the output of the above code −
Input DataFrame:
| | A | B |
|---|---|---|
| 0 | NaN | 5.0 |
| 1 | 2.0 | 6.0 |
| 2 | NaN | NaN |
| 3 | 4.0 | NaN |
Cumulative sum by ignoring NaN:
| | A | B |
|---|---|---|
| 0 | NaN | 5.0 |
| 1 | 2.0 | 11.0 |
| 2 | NaN | NaN |
| 3 | 6.0 | NaN |
The output shows that the missing values are skipped and the cumulative sum is computed from the available values.
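The cumprod() method follows the same rule. As a minimal sketch, the code below applies it to the same sample data; the variable name running_product is illustrative.

```python
import numpy as np
import pandas as pd

data = {'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]}
df = pd.DataFrame(data)

# NaN positions stay NaN in the output, but they do not
# interrupt the running product of the available values.
running_product = df.cumprod()
print(running_product)
```

In column A, the last row is 2 * 4 = 8.0 even though the row before it is NaN.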
Example: Including NaN in Cumulative Sum
This example shows how the cumulative sum is computed when missing values are included, by calling the df.cumsum() method with skipna=False.
    import pandas as pd
    import numpy as np

    # Create a sample DataFrame
    data = {'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]}
    df = pd.DataFrame(data)

    # Display the input DataFrame
    print("Input DataFrame:\n", df)

    # Calculate the cumulative sum by preserving NaN
    print('Cumulative sum by including NaN:\n', df.cumsum(skipna=False))
Following is the output of the above code −
Input DataFrame:
| | A | B |
|---|---|---|
| 0 | NaN | 5.0 |
| 1 | 2.0 | 6.0 |
| 2 | NaN | NaN |
| 3 | 4.0 | NaN |
Cumulative sum by including NaN:
| | A | B |
|---|---|---|
| 0 | NaN | 5.0 |
| 1 | NaN | 11.0 |
| 2 | NaN | NaN |
| 3 | NaN | NaN |
With skipna=False, once the cumulative sum encounters a NaN value in a column, that position and all subsequent values in the column become NaN.
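If you would rather have a running total with no gaps at all, one possible approach is to replace the missing values before accumulating. The sketch below assumes that treating a missing entry as 0 is acceptable for your data, which is a modelling choice rather than a Pandas default.

```python
import numpy as np
import pandas as pd

data = {'A': [np.nan, 2, np.nan, 4], 'B': [5, 6, np.nan, np.nan]}
df = pd.DataFrame(data)

# Assumption: a missing entry contributes 0 to the running total.
# After fillna(0) there is no NaN left, so the total carries forward.
gapless_total = df.fillna(0).cumsum()
print(gapless_total)
```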