Pandas DataFrame describe() Method

Last Updated : 29 Nov, 2024

The describe() method in Pandas generates descriptive statistics for the columns of a DataFrame (or for a single Series). It gives a quick summary of metrics such as count, mean, standard deviation, minimum, maximum, and percentiles. By default, describe() summarizes only numeric columns, but it can also summarize object (string) and categorical columns, for which it reports a different set of statistics.

Syntax: DataFrame.describe(percentiles=None, include=None, exclude=None)

Parameters:

percentiles: A list of numbers between 0 and 1 specifying which percentiles to return. The default is None, which returns the 25th, 50th, and 75th percentiles.
include: A list of data types to include in the summary, such as int, float, or object (for strings), or the string 'all' to summarize every column. The default is None, meaning only numeric columns are included.
exclude: A list of data types to exclude from the summary. The default is None, meaning no types are excluded.

Returns: A Series or DataFrame holding the summary statistics of the Series or DataFrame it was called on.
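To make the include and exclude parameters concrete, here is a minimal sketch on a small hand-made DataFrame. The column names and values are hypothetical and are not taken from the NBA dataset used later in this article.

```python
import pandas as pd

# Hypothetical toy data (not the nba.csv dataset)
df = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "age": [25, 31, 22, 28],
    "salary": [1.5, 7.2, 0.9, 3.4],
})

# Default call: only the numeric columns are summarized
numeric_summary = df.describe()

# exclude drops columns by dtype; leaving out numbers keeps the string column
text_summary = df.describe(exclude=["number"])

# include="all" summarizes every column, whatever its dtype
all_summary = df.describe(include="all")

print(numeric_summary)
print(text_summary)
print(all_summary)
```

In the combined table, statistics that do not apply to a column's data type show up as NaN.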

Using describe() method on a DataFrame

Let’s load an NBA dataset and use the describe() method to generate a statistical summary of its numeric columns.

Dataset Link: nba.csv

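import pandas as pd

# Read the CSV file (assumes nba.csv is in the working directory)
data = pd.read_csv('nba.csv')

# Show the first few rows of the dataset
# (print works in any environment; display() is only available in notebooks)
print("NBA Dataset:")
print(data.head())

# Summarize the numeric columns
print("\nSummary Table Generated by .describe() Method:")
print(data.describe())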

Output

Summary generated by .describe() method

Descriptive Statistics for Numerical Columns generated using .describe() Method

count: Total number of non-null values in the column.
mean: Average value of the column.
std: Sample standard deviation, showing how spread out the values are.
min: Minimum value in the column.
25%: 25th percentile (Q1).
50%: Median value (50th percentile).
75%: 75th percentile (Q3).
max: Maximum value in the column.
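The rows listed above can be reproduced one at a time with ordinary Series methods. The sketch below does so for a hypothetical toy column (not a column from nba.csv) to show what each row of the summary means.

```python
import pandas as pd

# Hypothetical toy column with one missing value
ages = pd.Series([25, 31, 22, 28, None], name="age")

summary = ages.describe()

# The same figures computed individually
manual = pd.Series({
    "count": ages.count(),        # non-null values only
    "mean": ages.mean(),
    "std": ages.std(),            # sample standard deviation (ddof=1)
    "min": ages.min(),
    "25%": ages.quantile(0.25),   # linear interpolation by default
    "50%": ages.quantile(0.50),   # same value as ages.median()
    "75%": ages.quantile(0.75),
    "max": ages.max(),
})

print(summary)
print(manual)
```

Note that the missing value is ignored throughout, which is why count can be smaller than the number of rows.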

Customizing `describe()` Method with Percentiles

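You can customize the describe() method to report specific percentiles by passing a list to the percentiles parameter. The example below also passes include so that object (string) columns are summarized alongside the numeric ones:

# Percentiles to report instead of the default quartiles
percentiles = [.20, .40, .60, .80]
# Data types to summarize: strings, floats, and integers
include = ['object', 'float', 'int']

desc = data.describe(percentiles=percentiles, include=include)

print(desc)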

Output

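In this output, the 20%, 40%, 60%, and 80% rows replace the default 25% and 75% rows. Because include lists object, the table also contains unique, top, and freq rows; a statistic that does not apply to a column's data type is shown as NaN.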
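For a version that runs without the CSV file, the following sketch applies custom percentiles to a hypothetical toy column. The column name and values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for a numeric column
scores = pd.DataFrame({"score": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

# Ask for the 10th and 90th percentiles instead of the default quartiles
desc = scores.describe(percentiles=[0.1, 0.9])

print(desc)

# Each percentile row matches a direct quantile() call on the column
print(scores["score"].quantile(0.1), scores["score"].quantile(0.9))
```

The row labels are derived from the requested fractions, so 0.1 becomes 10% and 0.9 becomes 90%.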

Describing Series of Strings (Object Data Type)

If you describe a column of string data (i.e., an object data type), describe() returns a different set of statistics. Here’s an example using the “Name” column from the dataset:

desc = data["Name"].describe()

print(desc)

Output

count               457
unique              457
top       Avery Bradley
freq                  1
Name: Name, dtype: object

For string data, the describe() method provides:

count: Total number of non-null values.
unique: The number of unique values.
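top: The most frequent value. If several values are tied, as in the output above where every name occurs once, one of them is reported.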
freq: The frequency of the most common value.
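The four statistics above are easier to see on a column with repeated values. The sketch below uses a hypothetical Series of team labels (the values are invented for illustration) and recomputes each figure directly.

```python
import pandas as pd

# Hypothetical team labels with repeats and one missing value
teams = pd.Series(
    ["Celtics", "Nets", "Celtics", None, "Knicks", "Celtics"],
    name="Team",
)

desc = teams.describe()
print(desc)

# The same four figures computed directly
counts = teams.value_counts()   # missing values are skipped by default
print(teams.count())            # non-null entries
print(teams.nunique())          # distinct non-null values
print(counts.index[0])          # most frequent value
print(counts.iloc[0])           # how often it occurs
```

Here the missing entry is left out of count, and top and freq identify the one label that clearly dominates.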

The describe() method in Pandas gives a quick overview of a DataFrame’s columns: count, mean, spread, and percentiles for numeric columns, and count, unique, top, and freq for object columns.


Kartikaybhutani
