Data Visualization using Plotnine and ggplot2 in Python

Last Updated : 15 Jan, 2025

Plotnine is a Python library that implements the grammar of graphics, the same concept that underlies ggplot2 in R. It lets users build plots by defining data, aesthetics, and geometric objects, which gives a flexible and consistent way to create a wide range of visualizations.

In this article, we will discuss how to visualize data in Python using plotnine and the grammar of graphics principles it follows.

Installing Plotnine in Python

Plotnine is modeled on ggplot2 from the R programming language and brings the grammar of graphics to Python. To install plotnine, type the command below in the terminal.

pip install plotnine

Plotting with Plotnine in Python: Data, Aesthetics, and Geoms

Let’s look at the three main components required to create a plot; without them, plotnine cannot draw a graph. These are:

  • Data is the dataset to be plotted.
  • Aesthetics (aes) is the mapping between the data variables and the visual properties of the plot, such as x-axis, y-axis, color, fill, size, labels, alpha, shape, line width, and line type.
  • Geometric Objects (geoms) is the type of plot or geometric object that we want to draw, such as point, line, histogram, bar, boxplot, etc.

The basic structure of Plotnine is built around the ggplot() function and geometric objects (geoms). Here’s the general template:

from plotnine import ggplot, aes, geom_point

# ggplot framework
(ggplot(data, aes(x='x_variable', y='y_variable')) + geom_point())

Let’s use components and plot one by one:

1. Data: We will use the Iris dataset and will read it using Pandas.

import pandas as pd
from plotnine import ggplot

df = pd.read_csv("Iris.csv")

# passing the data to the ggplot 
# constructor
ggplot(df)

Output:

Specifying dataset for the ggplot

This will give us a blank output as we have not specified the other two main components.

2. Aesthetics: This step involves defining which variables from the dataset correspond to the x and y axes, colors, shapes, and other attributes. For instance, you may want to map the species of flowers to colors or map sepal length to the y-axis.

Example: Defining Aesthetics in Plotnine

import pandas as pd
from plotnine import ggplot, aes

df = pd.read_csv("Iris.csv")

ggplot(df) + aes(x="Species", y="SepalLengthCm")

Output:

Defining aesthetics of the plotnine and ggplot in Python

In the above example, Species is shown on the x-axis and sepal length on the y-axis, but nothing is drawn inside the plot yet. The figure itself is added using geometric objects.

3. Geometric Objects: After specifying the data and aesthetics, the final step is to define geoms (geometric objects). Whether you want scatter plots, bar charts, or histograms, Plotnine provides various geoms to display data effectively.

import pandas as pd
from plotnine import ggplot, aes, geom_col

df = pd.read_csv("Iris.csv")

ggplot(df) + aes(x="Species", y="SepalLengthCm") + geom_col()

Output:

Adding geometric objects to the plotnine and ggplot in Python

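In the above example, we have used the geom_col() geom, which draws a bar plot with the bars based on the x-axis. Because the dataset has many rows per species, the bars are stacked, so each bar’s height is the total of the sepal lengths for that species. We can change this to a different type of geom if it suits the plot better.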

Plotting Basic Charts with Plotnine in Python

Plotnine allows users to create complex plots using a declarative syntax, making it easier to build, customize, and manage plots. In this section, we will cover how to create basic charts using Plotnine, including histograms, scatter plots, box plots, and line charts.

Example 1: Plotting Histogram with Plotnine

import pandas as pd
from plotnine import ggplot, aes, geom_histogram

df = pd.read_csv("Iris.csv")

ggplot(df) + aes(x="SepalLengthCm") + geom_histogram()

Output:

Plotting Histogram with plotnine and ggplot in Python

Example 2: Plotting Scatter plot With Plotnine

import pandas as pd
from plotnine import ggplot, aes, geom_point

df = pd.read_csv("Iris.csv")

ggplot(df) + aes(x="Species", y="SepalLengthCm") + geom_point()

Output:

Plotting Scatter plot with plotnine and ggplot in Python

Example 3: Plotting Box plot with Plotnine

import pandas as pd
from plotnine import ggplot, aes, geom_boxplot

df = pd.read_csv("Iris.csv")

ggplot(df) + aes(x="Species", y="SepalLengthCm") + geom_boxplot()

Output:

Plotting Box plot with plotnine and ggplot in Python

Example 4: Plotting Line chart with Plotnine

import pandas as pd
from plotnine import ggplot, aes, geom_line

df = pd.read_csv("Iris.csv")

ggplot(df) + aes(x="Species", y="SepalLengthCm") + geom_line()

Output:

Plotting Line chart with plotnine and ggplot in Python

So far, we have learned how to create a basic chart using the grammar of graphics and its three main components. Now let’s learn how to customize these charts using the other, optional components.

Enhancing Data Visualizations Using Plotnine – Customizations

There are various optional components that can make the plot more meaningful and presentable. These are:

  • Facets split the data into subsets and plot each subset in its own panel.
  • Statistical transformations compute summaries of the data (such as bin counts) before plotting it.
  • Coordinates define how the positions of objects are mapped onto the 2D plane of the plot.
  • Themes define the presentation of the data such as font, color, etc.

1. Facets

Let’s consider the tips dataset, which contains information about people who had food at a restaurant: the total bill, whether or not they left a tip, their gender, the day, the time of the meal, and so on. To download the dataset used, click here.

Now let’s suppose we want to plot the total bill for each day, split by gender.

import pandas as pd
from plotnine import ggplot, aes, facet_grid, labs, geom_col

df = pd.read_csv("tips.csv")

(
    ggplot(df)
    + facet_grid(facets="~sex")
    + aes(x="day", y="total_bill")
    + labs(
        x="day",
        y="total_bill",
    )
    + geom_col()
)

Output:

Facets with plotnine and ggplot in Python

2. Statistical Transformations

Let’s return to the earlier histogram of the sepal length column and suppose we now want to divide the measurements into 15 bins. The geom_histogram() function of plotnine counts the values in each bin and plots the result automatically.

import pandas as pd
from plotnine import ggplot, aes, geom_histogram

df = pd.read_csv("Iris.csv")

ggplot(df) + aes(x="SepalLengthCm") + geom_histogram(bins=15)

Output:

Statistical transformations using plotnine and ggplot in Python
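
To make the idea of a statistical transformation concrete, the following is a simplified sketch of equal-width binning in plain Python. It is an illustration only: the function name `bin_counts` and the sample values are hypothetical, and plotnine’s own placement of bin edges may differ in detail from this version.

```python
def bin_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical; put everything in the first bin.
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical sepal-length-like values, for illustration only.
sample = [4.3, 4.9, 5.0, 5.1, 5.8, 6.3, 7.0, 7.9]
counts = bin_counts(sample, bins=3)
```

A histogram geom then draws one bar per bin with these counts as heights, which is why only the x aesthetic has to be supplied.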

3. Coordinates

Let’s take the histogram from the above example and plot it horizontally. We can do this simply by adding coord_flip().

import pandas as pd
from plotnine import ggplot, aes, geom_histogram, coord_flip

df = pd.read_csv("Iris.csv")

(
    ggplot(df)
    + aes(x="SepalLengthCm")
    + geom_histogram(bins=15)
    + coord_flip()
)

Output:

Coordinate system in plotnine and ggplot in Python

4. Themes

Plotnine includes many built-in themes. Let’s use the above example with facets and apply a theme to give the visualization a different look.

import pandas as pd
from plotnine import ggplot, aes, facet_grid, labs, geom_col, theme_xkcd

df = pd.read_csv("tips.csv")

(
    ggplot(df)
    + facet_grid(facets="~sex")
    + aes(x="day", y="total_bill")
    + labs(
        x="day",
        y="total_bill",
    )
    + geom_col()
    + theme_xkcd()
)

Output:

Themes in plotnine and ggplot in Python

We can also use color to add more information to this graph. For example, we can color the bars by the time variable using the fill parameter of the aes() function.

Plotting Multidimensional Data with Plotnine

So far, we have seen how to plot more than two variables by using facets. Now let’s suppose we want to plot data using four variables. Doing this with facets alone can get cumbersome, but by adding color we can show all four variables in the same plot. We can fill the color using the fill parameter of the aes() function.

Example: Adding Color to Plotnine and ggplot in Python

import pandas as pd
from plotnine import ggplot, aes, facet_grid, labs, geom_col, theme_xkcd

df = pd.read_csv("tips.csv")

(
    ggplot(df)
    + facet_grid(facets="~sex")
    + aes(x="day", y="total_bill", fill="time")
    + labs(
        x="day",
        y="total_bill",
    )
    + geom_col()
    + theme_xkcd()
)

Output:

Adding color to plotnine and ggplot in Python

Exporting Plots With Plotnine

We can save the plot using the save() method, which exports the plot as an image file.

import pandas as pd
from plotnine import ggplot, aes, facet_grid, labs, geom_col, theme_xkcd

df = pd.read_csv("tips.csv")

plot = (
    ggplot(df)
    + facet_grid(facets="~sex")
    + aes(x="day", y="total_bill", fill="time")
    + labs(
        x="day",
        y="total_bill",
    )
    + geom_col()
    + theme_xkcd()
)

plot.save("gfg plotnine tutorial.png")

Output:

Saving the plotnine and ggplot in Python

In conclusion, Plotnine is a versatile and powerful tool for data visualization in Python, offering a wide range of features to create professional and informative plots. Whether you are creating simple plots or complex multi-faceted visualizations, plotnine provides the flexibility and functionality needed to bring your data to life.


