
Pandas DataFrame describe() Method

Last Updated : 29 Nov, 2024

The describe() method in pandas generates descriptive statistics for the columns of a DataFrame (or for a Series). It gives a quick summary of key metrics such as the count, mean, standard deviation, minimum, maximum, and percentiles. By default, describe() summarizes only numeric columns. It can also describe object (string) and categorical data, and for those it reports the count, the number of unique values, the most frequent value, and that value's frequency.
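As a minimal sketch of this default behavior, the snippet below builds a small made-up DataFrame (the column names `points` and `team` and their values are purely illustrative) with one numeric and one categorical column. Calling describe() on the whole frame summarizes only the numeric column, while calling it on the categorical column returns count, unique, top and freq.

```python
import pandas as pd

# Hypothetical toy data: one numeric column and one categorical column
df = pd.DataFrame({
    "points": [10, 20, 30, 40],
    "team": pd.Categorical(["A", "B", "A", "A"]),
})

# Default call: only numeric columns appear in the summary
numeric_summary = df.describe()
print(numeric_summary)

# Describing the categorical column gives count, unique, top and freq
team_summary = df["team"].describe()
print(team_summary)
```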

Syntax: DataFrame.describe(percentiles=None, include=None, exclude=None)

Parameters:

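  • percentiles: A list of numbers between 0 and 1 specifying which percentiles to return. The default is None, which returns the 25th, 50th, and 75th percentiles. The 50th percentile (the median) is always included, even if you leave it out of the list.
  • include: The data types to include in the summary. Pass a list of dtypes such as 'number', 'object' (for strings), or 'category', or pass the string 'all' to summarize every column. The default is None, which summarizes all numeric columns (or, if the DataFrame has no numeric columns, all of its columns).
  • exclude: A list of data types to leave out of the summary. The default is None, meaning no types are excluded. Both include and exclude are ignored when describe() is called on a Series.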
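To see how include and exclude behave with mixed column types, here is a small sketch on an illustrative DataFrame (the column names and values are made up). The dtype string 'number' selects every numeric dtype at once, so int and float columns don't have to be listed separately. With include='all', cells where a statistic doesn't apply to a column's type (for example, the mean of a string column) come out as NaN.

```python
import pandas as pd

# Hypothetical toy data mixing numeric and string columns
df = pd.DataFrame({
    "age": [22, 25, 31, 40],
    "weight": [180.0, 200.5, 210.0, 225.5],
    "team": ["Celtics", "Knicks", "Celtics", "Nets"],
})

# include="all": every column, numeric and non-numeric
everything = df.describe(include="all")

# include=["number"]: numeric columns only (same as the default here)
numbers_only = df.describe(include=["number"])

# exclude=["number"]: every column except the numeric ones
non_numeric = df.describe(exclude=["number"])

print(everything)
```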

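The describe() method returns the summary as a DataFrame when called on a DataFrame and as a Series when called on a Series. In both cases the statistic names (count, mean, and so on) are the row labels.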
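Because the result is an ordinary DataFrame or Series, individual statistics can be picked out by label. The sketch below assumes a made-up `salary` column. It reads the median with .loc on the DataFrame summary and the maximum by label on the Series summary.

```python
import pandas as pd

# Hypothetical single-column DataFrame
df = pd.DataFrame({"salary": [1.0, 2.0, 3.0, 4.0, 5.0]})

frame_summary = df.describe()             # DataFrame: rows are statistics, columns are df's columns
series_summary = df["salary"].describe()  # Series indexed by statistic name

# Individual statistics can be looked up by label
median_salary = frame_summary.loc["50%", "salary"]
max_salary = series_summary["max"]

print(type(frame_summary).__name__, type(series_summary).__name__)
print(median_salary, max_salary)
```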

Using describe() method on a DataFrame

Let’s walk through an example using an NBA dataset and then use the describe() method to generate a statistical summary.

Dataset Link: nba.csv

import pandas as pd

# Read the CSV file into a DataFrame
data = pd.read_csv('nba.csv')

# Show the first five rows of the dataset
# (print() works in any Python session; display() is only available in Jupyter/IPython)
print("NBA Dataset:")
print(data.head())

print("\nSummary Table Generated by .describe() Method:")
print(data.describe())

Output

(Image: summary table generated by the .describe() method)

Descriptive Statistics for Numerical Columns generated using .describe() Method

  • count: Total number of non-null values in the column.
  • mean: Average value of the column.
  • std: Sample standard deviation (normalized by N-1), showing how spread out the values are.
  • min: Minimum value in the column.
  • 25%: 25th percentile (Q1).
  • 50%: Median value (50th percentile).
  • 75%: 75th percentile (Q3).
  • max: Maximum value in the column.
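One way to check what each row means is to recompute it with ordinary Series methods. The sketch below does this on a small made-up Series containing one missing value. It assumes the default settings: missing values are skipped, std uses ddof=1 (the sample standard deviation), and the percentiles come from Series.quantile() with its default linear interpolation.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
s = pd.Series([2.0, 4.0, 4.0, 5.0, np.nan, 7.0, 9.0])
summary = s.describe()

# Recompute each row by hand; every method below skips NaN
manual = {
    "count": s.count(),          # non-null values only
    "mean": s.mean(),
    "std": s.std(),              # sample standard deviation (ddof=1)
    "min": s.min(),
    "25%": s.quantile(0.25),     # linear interpolation between neighbors
    "50%": s.median(),
    "75%": s.quantile(0.75),
    "max": s.max(),
}

matches = {stat: bool(np.isclose(summary[stat], value)) for stat, value in manual.items()}
print(matches)

# The population standard deviation (ddof=0) is a different, smaller number
population_std = s.std(ddof=0)
```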

Customizing describe() Method with Percentiles

You can customize the describe() method to include specific percentiles by passing a list to the percentiles parameter. Here’s an example:

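# Request the 20th, 40th, 60th and 80th percentiles (the median is always added)
percentiles = [.20, .40, .60, .80]
# Summarize string (object) columns plus every numeric column;
# 'number' covers both int and float dtypes
include = ['object', 'number']

desc = data.describe(percentiles=percentiles, include=include)

print(desc)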

Output

(Image: summary table generated by .describe() with custom percentiles)

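In this output, the default quartile rows are replaced by 20%, 40%, 50%, 60% and 80% rows; pandas always adds the median (50%) even though it was not requested. Because object columns are included, the table also has unique, top and freq rows, and any cell where a statistic does not apply to that column's type is NaN.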
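The automatically added median is easier to see on a tiny made-up Series. With the integers 1 to 10 and the same percentile list as above, the sketch below yields five percentile rows. Each value follows pandas' default linear interpolation (for example, the 20th percentile falls 80% of the way from 2 to 3, giving 2.8).

```python
import pandas as pd

# Hypothetical data: the integers 1..10
s = pd.Series(range(1, 11))

desc = s.describe(percentiles=[.20, .40, .60, .80])
print(desc)

# Keep only the percentile rows; 50% appears even though it was not requested
percentile_rows = [label for label in desc.index if label.endswith("%")]
print(percentile_rows)
```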

Describing Series of Strings (Object Data Type)

If you want to describe a column with string data (i.e., an object data type), the output will be different. Here’s an example using the “Name” column from the dataset:

desc = data["Name"].describe()

print(desc)

Output

count               457
unique              457
top       Avery Bradley
freq                  1
Name: Name, dtype: object

For string data, the describe() method provides:

  • count: Total number of non-null values.
  • unique: The number of unique values.
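  • top: The most frequent value. If several values share the highest count, pandas picks one of them arbitrarily. In the Name column every value is unique, so top is simply one of the names.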
  • freq: The frequency of the most common value.
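These four values can be reproduced with count(), nunique() and value_counts(), which helps clarify what each one means. The sketch below uses a made-up Series of names with a repeated entry and one missing value. The data is chosen so that a single value is clearly the most frequent.

```python
import pandas as pd

# Hypothetical string data with a repeat and one missing entry
names = pd.Series(["Ann", "Bob", "Ann", None, "Cid", "Ann"], dtype="object")
summary = names.describe()

counts = names.value_counts()  # missing values dropped, sorted by count (descending)
manual = {
    "count": names.count(),    # non-null entries
    "unique": names.nunique(), # distinct non-null values
    "top": counts.index[0],    # most frequent value
    "freq": counts.iloc[0],    # how often it occurs
}

print(summary)
print(manual)
```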

The describe() method in Pandas is a powerful tool for quickly obtaining an overview of a DataFrame’s numeric and object columns.

