Label Encoding in Python

Last Updated : 12 Feb, 2025

When working with datasets, we often encounter categorical data, which must be converted into a numerical format before most machine learning algorithms can process it. For example, in a cars dataset, a column of car brands ("Toyota", "Honda", "Ford") or of colors ("Red", "Blue", "Green") holds categorical data. One common way to perform this conversion is Label Encoding.

In this article, we briefly explain the concept of label encoding and show how to implement it in Python.

Label Encoding

Label Encoding is a technique that converts categorical columns into numerical ones so that they can be used by machine learning models that accept only numerical input. It assigns a unique integer to each category in the data and is an important preprocessing step in many machine learning projects.
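To make the idea concrete, here is a minimal pure-Python sketch. The function name `label_encode` is hypothetical and not part of any library; it gives each distinct category the next unused integer in order of first appearance. Library implementations may number categories differently (scikit-learn's `LabelEncoder`, for instance, sorts them first).

```python
def label_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    codes = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)  # next unused integer
        codes.append(mapping[value])
    return codes, mapping


brands = ["Toyota", "Honda", "Ford", "Honda", "Toyota"]
codes, mapping = label_encode(brands)
print(codes)    # [0, 1, 2, 1, 0]
print(mapping)  # {'Toyota': 0, 'Honda': 1, 'Ford': 2}
```

The returned `mapping` is what lets you decode the integers later or apply the same encoding to new data.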

Example of Label Encoding

Suppose a dataset has a column Height whose values are Tall, Medium, and Short. Applying label encoding converts this categorical column into a numerical column with the values 0, 1, and 2, where 0 is the label for Tall, 1 for Medium, and 2 for Short.

Height (original)	Height (encoded)
Tall	0
Medium	1
Short	2
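The mapping in the table can be reproduced with a plain dictionary and pandas' `Series.map`. This is a sketch with made-up sample rows; the column name `Height_encoded` is hypothetical. Note that scikit-learn's `LabelEncoder` would instead number these classes alphabetically (Medium = 0, Short = 1, Tall = 2), so an explicit dictionary is one way to get exactly the mapping shown above.

```python
import pandas as pd

# Hypothetical sample rows for the Height column described above
df = pd.DataFrame({"Height": ["Tall", "Medium", "Short", "Medium", "Tall"]})

# Explicit category -> integer mapping matching the table above
height_codes = {"Tall": 0, "Medium": 1, "Short": 2}
df["Height_encoded"] = df["Height"].map(height_codes)

print(df["Height_encoded"].tolist())  # [0, 1, 2, 1, 0]
```

Any value missing from the dictionary would become `NaN`, which makes unexpected categories easy to spot.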

How to Perform Label Encoding in Python

We will apply Label Encoding to the target column, species, of the Iris dataset. It contains three species: Iris-setosa, Iris-versicolor, and Iris-virginica.

import pandas as pd

# Load the Iris data; this assumes the CSV has a lowercase 'species' column
df = pd.read_csv('../../data/Iris.csv')
df['species'].unique()

Output:

array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

Applying scikit-learn's LabelEncoder replaces each categorical value with an integer. The encoder sorts the class names before numbering them, so Iris-setosa becomes 0, Iris-versicolor 1, and Iris-virginica 2.

from sklearn import preprocessing

# Create the encoder, learn the species names, and replace them with integers
label_encoder = preprocessing.LabelEncoder()
df['species'] = label_encoder.fit_transform(df['species'])

df['species'].unique()

Output:

array([0, 1, 2], dtype=int64)
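Because the fitted encoder remembers its classes, the integers can be turned back into species names. The sketch below uses a small hand-made Series in place of the CSV so it runs on its own; `classes_` and `inverse_transform` are standard `LabelEncoder` members.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Small stand-in for the species column (the article reads it from Iris.csv)
species = pd.Series(["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"])

label_encoder = LabelEncoder()
encoded = label_encoder.fit_transform(species)

# classes_ holds the sorted class names; a class's position is its code
print(label_encoder.classes_.tolist())
print(encoded.tolist())  # [2, 0, 1, 0]

# inverse_transform maps the integers back to the original strings
decoded = label_encoder.inverse_transform(encoded)
print(decoded.tolist())
```

Keeping the fitted encoder around is useful for turning model predictions back into readable labels.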

Advantages of Label Encoding

1. Label Encoding is straightforward to use. It requires little preprocessing because it directly converts each unique category into a numeric value; we don’t need to create additional features or complex transformations.

For example, if you have the categories ["Red", "Green", "Blue"], Label Encoding simply maps them to integers such as [0, 1, 2] without extra steps.

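2. Label Encoding works well for ordinal data, where the order of categories is meaningful (e.g., Low, Medium, High). When the integers follow that order, the numerical representation preserves the ranking between categories.

Example: (Low = 0, Medium = 1, High = 2) helps the model understand their ranking or progression. Because it keeps a single integer column, the encoding is also compact and cheap to compute. Note that scikit-learn's LabelEncoder sorts categories alphabetically (High = 0, Low = 1, Medium = 2), so the intended order has to be specified explicitly.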
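To keep the Low < Medium < High ranking, the order can be stated explicitly. This sketch contrasts LabelEncoder's default numbering with scikit-learn's `OrdinalEncoder`, whose `categories` parameter fixes the order; the `priority` column is hypothetical sample data, and a plain dictionary with `Series.map` would work as well.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

levels = pd.DataFrame({"priority": ["Low", "High", "Medium", "Low"]})

# LabelEncoder sorts alphabetically: High=0, Low=1, Medium=2 (ranking lost)
alphabetical = LabelEncoder().fit_transform(levels["priority"])
print(alphabetical.tolist())  # [1, 0, 2, 1]

# OrdinalEncoder takes the intended order explicitly: Low=0, Medium=1, High=2
ordinal = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
ranked = ordinal.fit_transform(levels[["priority"]])  # expects 2-D input
print(ranked.ravel().tolist())  # [0.0, 2.0, 1.0, 0.0]
```

`OrdinalEncoder` returns floats by default and is designed for feature columns, while `LabelEncoder` is intended for target labels.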

Limitations of Label Encoding

If the encoded values imply a relationship that does not exist (e.g., Red = 0 and Blue = 2 might suggest Red < Blue), the model may incorrectly treat nominal data as ordinal. To address this, consider using One-Hot Encoding.
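The difference is easy to see on a small color column. In this sketch (hypothetical sample data), label encoding yields a single integer column, while pandas' `get_dummies` produces one 0/1 column per color so no ordering is implied; scikit-learn's `OneHotEncoder` is an alternative for pipelines.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

colors = pd.Series(["Red", "Green", "Blue", "Green"], name="color")

# Label encoding: a single integer column (Blue=0, Green=1, Red=2),
# which a model could misread as Blue < Green < Red
label_codes = LabelEncoder().fit_transform(colors)
print(label_codes.tolist())  # [2, 1, 0, 1]

# One-hot encoding: one 0/1 column per color, so no ordering is implied
one_hot = pd.get_dummies(colors, prefix="color", dtype=int)
print(one_hot)
```

The trade-off is width: one-hot encoding adds a column per category, which can grow large for high-cardinality features.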

Conclusion

Label Encoding is an essential technique for preprocessing categorical data in machine learning. It’s simple, efficient, and works well for ordinal data. However, be cautious of its limitations and use other encoding techniques like One-Hot Encoding when necessary.

aakarshachug