DATA VISUALIZATION WITH MATPLOTLIB & SEABORN

With Expert Python Instructor Chris Bruehl

*Copyright Maven Analytics, LLC


COURSE STRUCTURE

This is a project-based course for students looking for a practical, hands-on approach to
learning data visualization with Python using the Matplotlib and Seaborn libraries

Additional resources include:

Downloadable PDF to serve as a helpful reference when you’re offline or on the go

Quizzes & Assignments to test and reinforce key concepts, with step-by-step solutions

Interactive demos to keep you engaged and apply your skills throughout the course


COURSE OUTLINE
1 Intro to Data Visualization: Cover key data visualization best practices for clear communication, with tips for choosing the right chart, formatting it effectively, and using it to tell a story

2 Matplotlib Fundamentals: Introduce the Matplotlib library and use it to build & customize several chart types, including line charts, bar charts, pie charts, scatterplots, and histograms

PROJECT: Visualizing Coffee Industry Data

3 Advanced Customization: Apply advanced customization techniques in Matplotlib, including multi-chart figures, custom layouts & colors, style sheets, and more

PROJECT: Consolidating Coffee Industry Data into a Report

4 Data Viz with Seaborn: Visualize data with Seaborn, another Python library that introduces new chart types and layouts, and interacts well with Matplotlib

PROJECT: Highlighting Insights from the Automotive Auction Industry


WELCOME TO MAVEN CONSULTING GROUP

THE SITUATION
You’ve just been hired as an Associate Consultant for Maven Consulting Group (MCG), a multinational firm that provides strategic advice to companies across different industries. Your new role will see you take on projects in the hotel, coffee, automotive, and diamond industries.

THE ASSIGNMENT
Your task is to effectively visualize data from these industries to deliver key insights to MCG’s clients. This will range from analyzing hotel customer demographics to understanding the major players in the global coffee industry.

THE OBJECTIVES
• Use Pandas to read & manipulate multiple datasets
• Use Matplotlib to visualize data & communicate insights, and then build reports to consolidate your findings
• Use Seaborn to conduct advanced exploratory analysis and aid the decision-making process


SETTING EXPECTATIONS

This course covers the core functionality for Matplotlib & Seaborn
• We’ll cover chart types, common customization options, and best practices for visualizing and analyzing data
• We’ll give you the tools to use the official documentation to apply any customization option not covered in the course

We’ll focus on creating static visuals & dashboards
• Interactive data visualization with Python will be covered in a separate course

We’ll use Jupyter Notebooks as our primary coding environment
• Jupyter Notebooks are free to use, and the industry standard for conducting data analysis with Python (we’ll introduce Google Colab as an alternative, cloud-based environment as well)

You do NOT need to be a Python expert to take this course
• It is strongly recommended that you complete our Python Foundations and Data Analysis with Pandas courses, or have a solid understanding of basic Python syntax and DataFrame manipulation with the Pandas library

INSTALLING ANACONDA (MAC)

1) Go to anaconda.com/products/distribution and click

2) Click X on the Anaconda Nucleus pop-up (no need to launch)

3) Launch the downloaded Anaconda pkg file

4) Follow the installation steps (default settings are OK)


INSTALLING ANACONDA (PC)

1) Go to anaconda.com/products/distribution and click

2) Click X on the Anaconda Nucleus pop-up (no need to launch)

3) Launch the downloaded Anaconda exe file

4) Follow the installation steps (default settings are OK)


LAUNCHING JUPYTER

1) Launch Anaconda Navigator

2) Find Jupyter Notebook and click


YOUR FIRST JUPYTER NOTEBOOK

1) Once inside the Jupyter interface, create a folder to store your notebooks for the course

NOTE: You can rename your folder by clicking “Rename” in the top left corner

2) Open your new coursework folder and launch your first Jupyter notebook!

NOTE: You can rename your notebook by clicking on the title at the top of the screen


THE NOTEBOOK SERVER

NOTE: When you launch a Jupyter notebook, a terminal window may pop up as well; this is called a notebook server, and it powers the notebook interface

If you close the server window, your notebooks will not run!

Depending on your OS and method of launching Jupyter, a server window may not open. As long as you can run your notebooks, don’t worry!


ALTERNATIVE: GOOGLE COLAB

Google Colab is Google’s cloud-based version of Jupyter Notebooks

To create a Colab notebook:
1. Log in to a Gmail account
2. Go to colab.research.google.com
3. Click “new notebook”

Colab is very similar to Jupyter Notebooks (they even share the same file extension); the main difference is that you are connecting to Google Drive rather than your machine, so files will be stored in Google’s cloud

DATA VISUALIZATION

In this section we’ll cover key data visualization best practices for clear communication,
with tips for choosing the right chart, formatting it effectively, and using it to tell a story

GOALS FOR THIS SECTION:
• Understand the purpose behind visualizing data
• Learn the common chart types and their use cases
• Apply data visualization best practices to create clear and compelling charts
• Address common errors and how to avoid them


WHY VISUALIZE DATA?

Data visualization allows you to bring your data to life
• The human brain tends to read raw data as meaningless numbers and noise
• We need clear patterns and visual cues to help us quickly make sense of complex information

Prefrontal Cortex
• Located in the frontal lobe
• Responsible for cognitive functioning & problem solving
• Helps us make sense of non-visual information (like raw data)
• Slow & conscious

Visual Cortex
• Located in the occipital lobe
• Responsible for visual perception & understanding
• Helps us make sense of colors, patterns, shapes, sizes, etc.
• Instantaneous & subconscious

Data visualization puts both our prefrontal and visual cortex to work, combining
the power of cognition (slow and conscious) and perception (instantaneous)


THE TEN SECOND RULE

In 10 seconds, what can you learn from the data below?



THE TEN SECOND RULE

What if you were given the averages?


THE TEN SECOND RULE

What if you visualize it?

This is a slight twist on Anscombe’s Quartet

Despite sharing nearly identical descriptive stats, each series tells a very different visual story
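
To see this effect with real numbers, here is a minimal sketch using two of the four series from the classic Anscombe's Quartet. This is the original published dataset, not the modified version shown in the course, and the variable names are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Series I and II from the classic Anscombe's Quartet (they share the same x-values)
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

quartet = pd.DataFrame({"x": x, "y1": y1, "y2": y2})

# The summary statistics are nearly identical...
print(quartet[["y1", "y2"]].agg(["mean", "std"]).round(2))

# ...but the scatterplots tell two different stories (roughly linear vs. curved)
fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
axes[0].scatter(quartet["x"], quartet["y1"])
axes[0].set_title("Series I")
axes[1].scatter(quartet["x"], quartet["y2"])
axes[1].set_title("Series II")
```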


THE 3 KEY QUESTIONS

The 3 key questions are a great way to help choose the right visual

What type of data are you working with?
• Time-series: Data that spans across continuous time periods
• Categorical: Data that can be split up into groups or categories
• Numeric: Data with quantitative values, either discrete or continuous
• Hierarchical: Data with natural groups and sub-groups

What do you want to communicate?
• Comparison: Compares values over time or across categories
• Composition: Breaks down the component parts of a whole
• Distribution: Shows the frequency of values within a series
• Relationship: Shows the correlation between multiple variables

Who is the end user and what do they need?
• Analyst: Likes to see details and understand what’s happening at a granular level
• Manager: Wants summarized information with clear, actionable insights
• Executive: Needs high-level, clear KPIs to track business health and performance
• General Public: Requires engaging visuals and a clear story to follow


ESSENTIAL VISUALS

• KPI CARD: Sometimes simple text works best
• PIE CHART: Sort the slices, keep them under ~5, and focus on one
• TABLE: Add a color scale to highlight patterns in the data
• LINE CHART: The dates must be continuous
• BAR CHART: Baseline must start at zero
• SCATTER PLOT: Remember that correlation does not imply causation
• AREA CHART / 100% STACKED: Comparison & composition
• HISTOGRAM: Avoid using too many bins!


CHART FORMATTING

Chart formatting should be used to eliminate noise & facilitate understanding

BEFORE: Cluttered chart

This is the right chart type… so why is it so hard to understand the visual?
× The chart border and gridlines are more distracting than useful
× The vertical axis labels are hard to read and lack context: they use scientific notation and the axis doesn’t start at 0
× Data labels can help add context, but they just add noise here
× It’s not clear what each line represents

PRO TIP: Be intentional about the formatting you apply – don’t just use the default settings!

AFTER: Clear chart


PRO TIPS:
✓ Remove the chart border & gridlines
✓ Format the axis labels clearly
✓ Add context with the chart title
✓ Create a visual order
✓ Make sure the story is clear

“Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away”
Antoine de Saint-Exupery


STORYTELLING

Descriptive titles and data labels can be used to tell a clear story within your visuals

AFTER: Compelling chart


PRO TIPS:
✓ Leverage the title to guide the audience toward specific insights
✓ Insert text & shapes directly inside the chart
✓ Use data labels and annotations to draw attention to the main data points
✓ Use color strategically


COMMON ERRORS

Choosing the wrong visual to represent the type of data

Using a line chart, which is meant for time series data, with categorical data gives the false sense of a trend

Bar charts are great for showing comparison with categorical data

While a tree map can work, comparisons and compositions are harder to make than with a bar or pie chart. It’s best to use tree maps with hierarchical data

PRO TIP: Don’t prioritize variety over effectiveness; use the right chart for the job!


COMMON ERRORS

Including too many series in a single visual

It’s hard to focus or extract any valuable information

Try highlighting the series you want, or aggregating other categories

You can also group the other categories into a single series


COMMON ERRORS

Providing little to no context with text and labels

What does each line represent?

What are these values?

What does each period represent?

When removing elements from a chart to reduce clutter and noise, remember to keep all the elements that add understanding


COMMON ERRORS

Using inconsistent colors between related visuals

Using different colors for the same series makes it difficult to associate them visually

Consistency gains more importance as the number of visuals increases, making it critical for dashboards

Using the same colors consistently makes them easier to understand, and in some cases allows you to remove the legend


KEY TAKEAWAYS

Always answer the 3 key questions to choose the right visual
• What type of data are you working with? What do you want to communicate? Who is the end user?

Do NOT prioritize variety over effectiveness
• Choose chart types based on how clearly they communicate the data underneath – you can customize later!

Eliminate noise and distractions to facilitate understanding
• “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away”

Tell a story with the data to guide the user to the insights
• Use titles, strategic labels, and callouts to create a clear narrative

INTRO TO MATPLOTLIB

In this section we’ll introduce the Matplotlib library and use it to build & customize several
chart types, including line charts, bar charts, pie charts, scatterplots, and histograms

GOALS FOR THIS SECTION:
• Understand the difference between the two primary Matplotlib plotting frameworks
• Identify the key components of an object-oriented plot
• Build different variations of line, bar and pie charts, as well as scatterplots and histograms
• Customize your charts by adding custom titles, labels, legends, annotations and much more!


MEET MATPLOTLIB

Matplotlib is an open-source Python library built for data visualization that lets you
produce a wide variety of highly customizable charts & graphs

‘plt’ is the standard alias for Matplotlib’s pyplot module (matplotlib.pyplot)

The plot() function creates a line chart by default, using the index as the x-values and the list elements as the y-values
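
As a minimal sketch of that default behavior (the values below are made up for illustration):

```python
import matplotlib.pyplot as plt

# Hypothetical values, used only for illustration
values = [10, 25, 15, 30]

# plot() draws a line chart: x = list positions (0, 1, 2, 3), y = the list elements
lines = plt.plot(values)
```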


COMPATIBLE DATA TYPES

Matplotlib can plot many data types, including base Python sequences, NumPy
Arrays, and Pandas Series & DataFrames

Python List Pandas Series Pandas DataFrame
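
The sketch below passes each of these container types to the same plot() call. All data and variable names are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data in four compatible containers
py_list = [1, 3, 2, 4]
np_array = np.array([2, 4, 3, 5])
pd_series = pd.Series([3, 5, 4, 6])
pd_frame = pd.DataFrame({"a": [4, 6, 5, 7], "b": [5, 7, 6, 8]})

fig, ax = plt.subplots()
ax.plot(py_list)     # one line
ax.plot(np_array)    # one line
ax.plot(pd_series)   # one line, x-values come from the Series index
ax.plot(pd_frame)    # one line per DataFrame column
```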


PLOTTING METHODS

Matplotlib has two plotting methods, or interfaces:

PyPlot: Charts are created with the plot() function, and modified with additional functions

Object-Oriented: Charts are created by defining a plot object, and modified using figure & axis methods
1. Create the figure object and assign it to the ‘fig’ variable
2. Add a chart, or axis, object to the figure and assign it to the ‘ax’ variable
3. Call the axis plot() method to draw the chart

We’ll mostly focus on the Object-Oriented approach, as it provides clearer control over customization
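
The two interfaces can be compared side by side. This is a sketch with made-up values, not the course's own example code:

```python
import matplotlib.pyplot as plt

y = [2, 4, 3, 5]  # hypothetical values

# PyPlot interface: functions act on the "current" figure and axes
plt.figure()
plt.plot(y)
plt.title("PyPlot interface")

# Object-Oriented interface: create the objects, then call their methods
fig = plt.figure()            # 1. create the figure
ax = fig.add_subplot(111)     # 2. add an axes (chart) to the figure
ax.plot(y)                    # 3. draw the chart on that axes
ax.set_title("Object-Oriented interface")
```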


OBJECT-ORIENTED PLOTTING

Object-Oriented plots are built by adding axes, or charts, to a figure


• The subplots() function lets you create the figure and axes in a single line of code
• You can then use figure & axis methods to customize the different elements in the plot

• Creates the figure and axis
• Plots “y”
• Adds a title to the figure and axis

We’ll start by adding a single subplot to each figure for now, but will dive deeper into subplots later in the course!
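
A minimal sketch of those steps, assuming a small made-up list named y:

```python
import matplotlib.pyplot as plt

y = [1, 4, 9, 16]  # hypothetical values

fig, ax = plt.subplots()               # creates the figure and axis
ax.plot(y)                             # plots "y"
fig.suptitle("Overall Figure Title")   # adds a title to the figure
ax.set_title("Chart Title")            # adds a title to the axis
```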


PLOTTING DATAFRAMES

When plotting DataFrames using the Object-Oriented interface, Matplotlib will use the index as the x-axis and plot each column as a separate series by default

Plotting each series independently allows for improved customization


• ax.plot(x-axis series, y-series values)
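
Both approaches are sketched below. The DataFrame, its column names, and its values are hypothetical stand-ins for the course data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly revenue data (column names are illustrative)
revenue = pd.DataFrame(
    {"lodging": [120, 135, 150, 160], "other": [30, 28, 35, 40]},
    index=pd.date_range("2022-01-01", periods=4, freq="MS"),
)

# Option 1: pass the whole DataFrame (index on the x-axis, one line per column)
fig1, ax1 = plt.subplots()
ax1.plot(revenue)

# Option 2: plot each series on its own, which allows per-series formatting
fig2, ax2 = plt.subplots()
ax2.plot(revenue.index, revenue["lodging"], label="Lodging")
ax2.plot(revenue.index, revenue["other"], label="Other", linestyle="--")
legend = ax2.legend()
```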


ASSIGNMENT: PLOTTING DATAFRAMES

Results Preview
NEW MESSAGE
August 29, 2022

From: Ian Intern (Summer Consultant)


Subject: Do you know Matplotlib?

Hi!

I need someone who knows Matplotlib for help with some client work.

Can you plot Lodging Revenue and Other Revenue over time
for our hotel client?

Thanks!

section02_assignments.ipynb


SOLUTION: PLOTTING DATAFRAMES

Solution Code
• Plot Each Series
• Plot The DataFrame

section02_solutions.ipynb


FORMATTING OPTIONS

Matplotlib has these formatting options for PyPlot and Object-Oriented plots:

(Figure: anatomy of a chart, labeling the figure title, axis title, axis labels, ticks, legend, text, vertical line, and spines)

Element             Object-Oriented        PyPlot
Figure Title        fig.suptitle()         plt.suptitle()
Chart Title         ax.set_title()         plt.title()
X-Axis Label        ax.set_xlabel()        plt.xlabel()
Y-Axis Label        ax.set_ylabel()        plt.ylabel()
Legend              ax.legend()            plt.legend()
X-Axis Limit        ax.set_xlim()          plt.xlim()
Y-Axis Limit        ax.set_ylim()          plt.ylim()
X-Axis Ticks        ax.set_xticks()        plt.xticks()
Y-Axis Ticks        ax.set_yticks()        plt.yticks()
Vertical Line       ax.axvline()           plt.axvline()
Horizontal Line     ax.axhline()           plt.axhline()
Text                ax.text()              plt.text()
Spines (borders)    ax.spines['side']      plt.gca().spines['side']


CHART TITLES

The set_title(), set_xlabel(), and set_ylabel() methods let you add chart titles and axis labels
• fig.suptitle() serves as an overall figure title


FONT SIZES

You can modify chart font sizes with the “fontsize” argument
• You can specify the size in points (10, 12, etc.) or relative size (“smaller”, “x-large”, etc.)
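
A short sketch combining titles, axis labels, and both kinds of font size. The text and data are placeholders.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [10, 20, 15])  # hypothetical values

fig.suptitle("Figure Title", fontsize="x-large")   # relative size
ax.set_title("Chart Title", fontsize=14)           # size in points
ax.set_xlabel("X-Axis Label", fontsize=12)
ax.set_ylabel("Y-Axis Label", fontsize="small")
```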


CHART LEGENDS

The legend() method lets you add a chart legend to identify each series
• The series labels are used by default, but custom values can also be passed through
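
In the sketch below the labels are set explicitly with the label argument, then overridden by passing custom values to legend(). The series names are hypothetical.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [3, 5, 4], label="Series A")
ax.plot([1, 2, 3], [2, 3, 6], label="Series B")

# Uses the label given to each series
default_legend = ax.legend()
default_text = [t.get_text() for t in default_legend.get_texts()]

# Passing a list replaces those labels, in plotting order
custom_legend = ax.legend(["Lodging", "Other"])
custom_text = [t.get_text() for t in custom_legend.get_texts()]
```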



LEGEND LOCATION

You can change the legend location with the “loc” or “bbox_to_anchor” arguments
• “loc” lets you set a predetermined location option
• “bbox_to_anchor” lets you set specific (x, y) coordinates

“loc” options: best (default), upper right, upper left, upper center, lower right, lower left, lower center, center right, center left, center

“bbox_to_anchor” coordinates run from 0 to 1 along each axis of the chart area
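
A sketch of both arguments, using made-up series. When the two are combined, "loc" names the corner of the legend box that is pinned to the bbox_to_anchor point.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [3, 5, 4], label="Series A")
ax.plot([1, 2, 3], [2, 3, 6], label="Series B")

# Option 1: a predetermined location inside the chart
inside = ax.legend(loc="lower right")

# Option 2: pin the legend's upper-left corner at (x, y) in axes coordinates;
# an x-value above 1 places it to the right of the chart area
outside = ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1))
```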

Setting coordinates beyond 1 will push the legend outside the chart area (useful when there is no whitespace!)


LINE STYLE

You can change the line style with the “linestyle”, “linewidth”, and “color” arguments
• Common line styles are “solid”, “dashed”, or “dotted” (you can also use “-”, “--”, or “:”)

We will dive into colors in depth later, including changing the default color palette and using hex color codes!
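
A sketch of the three arguments with made-up data, mixing the named styles and their shorthand:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]  # hypothetical values

fig, ax = plt.subplots()
ax.plot(x, [1, 2, 3, 4], linestyle="solid", linewidth=2, color="black")
ax.plot(x, [2, 3, 4, 5], linestyle="--", linewidth=1, color="orange")   # dashed
ax.plot(x, [3, 4, 5, 6], linestyle=":", linewidth=3, color="green")     # dotted
```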


AXIS LIMITS
The set_ylim() and set_xlim() methods let you modify the axis limits
• ax.set_xlim(lower limit, upper limit)

Your date x-axis ticks may change interval size!

PRO TIP: Keeping the base of the y-axis at 0 highlights the true magnitude of change across periods and the differences between series
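
A sketch with hypothetical yearly values, anchoring the y-axis at 0:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([2019, 2020, 2021, 2022], [120, 135, 128, 150])  # hypothetical values

ax.set_xlim(2019, 2022)   # lower limit, upper limit
ax.set_ylim(0, 200)       # start the y-axis at 0
```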


FIGURE SIZE
You can adjust the figure size with the “figsize” argument
• figsize=(width, height) – the default is 6.4 x 4.8 inches

PRO TIP: Increasing figure size lets you add whitespace to your visual, which can reduce clutter and add space to crowded axes
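
For instance, a wider and shorter figure than the default could be requested like this (the dimensions are arbitrary):

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot([1, 2, 3, 4], [3, 5, 4, 6])  # hypothetical values
```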


CUSTOM X-TICKS
You can apply custom x-ticks with the set_xticks() and xticks() functions
• ax.set_xticks(iterable)

This sets the xticks at every 2nd date from the index and rotates them by 45 degrees
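
One way to get that result is sketched below. The sales Series is hypothetical, and tick_params() is used for the rotation here, which may differ from the call used in the course.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly series with a datetime index
dates = pd.date_range("2022-01-01", periods=8, freq="MS")
sales = pd.Series([5, 7, 6, 8, 9, 8, 10, 12], index=dates)

fig, ax = plt.subplots()
ax.plot(sales.index, sales)

ax.set_xticks(sales.index[::2])               # a tick at every 2nd date
ax.tick_params(axis="x", labelrotation=45)    # rotate the tick labels
```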


ADDING VERTICAL LINES

You can add vertical lines to mark key points with the axvline() function

Set the coordinate (in this case days since Jan 1, 1970)
and an optional color and style
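
A sketch of marking a date on a time series. It uses matplotlib.dates.date2num() to get the numeric coordinate (days since the epoch, which is 1970-01-01 by default in recent Matplotlib versions); the data and the chosen date are made up.

```python
import datetime as dt

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.date_range("2022-01-01", periods=6, freq="MS")

fig, ax = plt.subplots()
ax.plot(dates, [5, 7, 6, 9, 8, 10])  # hypothetical values

# Convert the date of interest to Matplotlib's numeric date format
x_mark = mdates.date2num(dt.datetime(2022, 3, 1))
vline = ax.axvline(x_mark, color="black", linestyle="--")
```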


TEXT

You can add text at specific coordinates with the text() function
• ax.text(x-coordinate, y-coordinate, string, additional text formatting)
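
A sketch with made-up data; the coordinates are in the same units as the plotted values:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 12, 9, 14])  # hypothetical values

# x-coordinate, y-coordinate, string, then optional text formatting
label = ax.text(3, 9.3, "Dip in period 3", fontsize=10, color="gray")
```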


PRO TIP: ANNOTATIONS

Annotations are a great way to call out and label important data points
• ax.annotate(string, datapoint coordinate, text coordinate, arrow style dictionary, text formatting)

Annotations have many more options that we won’t cover in depth, but the documentation has great examples worth looking into!
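
A sketch of a simple annotation with an arrow. The data, text, and coordinates are illustrative.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [10, 12, 9, 14, 11])  # hypothetical values

note = ax.annotate(
    "Highest value",                                  # string
    xy=(4, 14),                                       # data point to point at
    xytext=(1.5, 13.5),                               # where the text sits
    arrowprops=dict(arrowstyle="->", color="gray"),   # arrow style dictionary
    fontsize=10,                                      # text formatting
)
```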

For more info on annotations, visit: https://matplotlib.org/stable/tutorials/text/annotations.html#sphx-glr-tutorials-text-annotations-py


REMOVING CHART BORDERS

You can remove specific chart borders with ax.spines[].set_visible(False)

This removes the right and top borders
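
A sketch that hides the right and top spines one at a time (the plotted values are made up):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3])  # hypothetical values

# Hide the right and top borders; the left and bottom ones stay visible
for side in ["right", "top"]:
    ax.spines[side].set_visible(False)
```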


ASSIGNMENT: CHART FORMATTING

Results Preview
NEW MESSAGE
August 30, 2022

From: Ian Intern (Summer Consultant)


Subject: RE: Final Charts for Client

Hi there!

The data you plotted earlier looks good, but can you clean up
the chart a little bit? I want it to look polished for our client.
This is my last day in my summer internship and I want to get
hired back!

Thanks!

section02_assignments.ipynb


SOLUTION: CHART FORMATTING

Solution Code

section02_solutions.ipynb


LINE CHARTS

Line charts are used for showing trends over time


• ax.plot(x-axis series, series values, formatting options)

Column for each series


Dates as the index

PRO TIPS
Pivot tabular data to turn each unique series into a DataFrame column, and set the datetime as the index

Divide your series by the appropriate units while plotting to simplify the y-axis scale


LINE CHARTS

EXAMPLE Available Housing Units by Week
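
The original example image is not available here, so the following is a sketch of how such a chart might be built. The housing DataFrame, its column names, and its values are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical tabular data: one row per (week, city) pair
housing = pd.DataFrame({
    "week": pd.to_datetime(
        ["2022-01-03", "2022-01-03", "2022-01-10",
         "2022-01-10", "2022-01-17", "2022-01-17"]
    ),
    "city": ["Boston", "Denver"] * 3,
    "units": [12500, 8300, 12900, 8100, 13400, 7900],
})

# Pivot so each city becomes a column and the dates become the index
weekly = housing.pivot(index="week", columns="city", values="units")

fig, ax = plt.subplots()
for city in weekly.columns:
    # Divide by 1,000 while plotting to simplify the y-axis scale
    ax.plot(weekly.index, weekly[city] / 1000, label=city)

ax.set_title("Available Housing Units by Week")
ax.set_ylabel("Units (thousands)")
ax.legend()
```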


STACKED LINE CHARTS

Use stackplot() to create a stacked line chart, which lets you visualize the overall
trend over time, as well as its composition by series

PRO TIP: Use the bottom series in the stacked line chart to draw focus to its individual trend – it’s the most visible!
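
A sketch of a stacked line chart with two made-up series. The first series passed to stackplot() sits at the bottom of the stack.

```python
import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4]       # hypothetical values
condos = [3, 4, 4, 5]
houses = [6, 6, 7, 8]

fig, ax = plt.subplots()
# "condos" is passed first, so it forms the bottom layer
layers = ax.stackplot(weeks, condos, houses, labels=["Condos", "Houses"])
ax.legend(loc="upper left")
```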


PRO TIP: DUAL AXIS CHARTS

Use twinx() to create a dual axis chart, which lets you plot series with values on
significantly different scales inside a single visual

The “Inventory” values are so small compared to “Price” that they appear to be 0 when plotted on the same y-axis

Create a second axis (ax2) with ax.twinx(), then create the desired plot on ax2

Note that using the figure-level legend picks up both series
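
A sketch of a dual axis chart with hypothetical "Price" and "Inventory" values on very different scales:

```python
import matplotlib.pyplot as plt

months = [1, 2, 3, 4, 5, 6]                                    # hypothetical values
price = [410000, 415000, 423000, 430000, 428000, 435000]
inventory = [1.8, 1.7, 1.5, 1.6, 1.9, 2.1]

fig, ax = plt.subplots()
ax.plot(months, price, color="tab:blue", label="Price")
ax.set_ylabel("Price")

# Second y-axis that shares the same x-axis
ax2 = ax.twinx()
ax2.plot(months, inventory, color="tab:orange", label="Inventory")
ax2.set_ylabel("Inventory")

# A figure-level legend collects the labels from both axes
legend = fig.legend()
```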


ASSIGNMENT: LINE CHARTS

Results Preview
NEW MESSAGE
August 30, 2022

From: Ian Intern (Summer Consultant)


Subject: Re: Re: Final Charts for Client

Hey again,

Great work on those charts!

Final request - we want to compare room nights booked vs. cancellations over time. We might need a dual axis chart to effectively do this. I’m totally checked out, so can you do this?
You’ll be put in contact with the client soon.

Thanks!

section02_assignments.ipynb


SOLUTION: LINE CHARTS

Solution Code

section02_solutions.ipynb


BAR CHARTS

Bar charts are used to compare values across different categories


• ax.bar(category labels, bar heights, formatting options)

Values in a single column


Categories as the index

PRO TIPS
Use .groupby() and .agg() to aggregate your data by category and push the labels into the index

Use Seaborn or the Pandas plot API for grouped bar charts


BAR CHARTS

EXAMPLE Median Home Price by City
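
The original example image is not available here, so this is a sketch of the pattern described above. The homes DataFrame and its values (in thousands) are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical listing-level data (prices in thousands)
homes = pd.DataFrame({
    "city": ["Austin", "Austin", "Boston", "Boston", "Denver", "Denver"],
    "price": [450, 470, 700, 720, 560, 600],
})

# Aggregate by category; the city labels move into the index
median_price = homes.groupby("city").agg({"price": "median"})

fig, ax = plt.subplots()
bars = ax.bar(median_price.index, median_price["price"])
ax.set_title("Median Home Price by City")
ax.set_ylabel("Median price (thousands)")
```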


PRO TIP: HORIZONTAL LINES

Use axhline() to add a horizontal line at a specified y-value on a bar chart


• This will typically be something to benchmark against, like a mean or target
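
A sketch that benchmarks each bar against the overall mean (the prices are made up):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical median prices (thousands) by city
prices = pd.Series([460, 710, 580], index=["Austin", "Boston", "Denver"])

fig, ax = plt.subplots()
ax.bar(prices.index, prices)

# Horizontal reference line at the mean of the plotted values
benchmark = prices.mean()
ax.axhline(benchmark, color="black", linestyle="--")
```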


HORIZONTAL BAR CHARTS

Use barh() to create a horizontal bar chart

Note that the series in a horizontal bar chart are plotted in the opposite order from a vertical bar chart (the first category is drawn at the bottom)
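
A sketch of a horizontal bar chart with made-up values. Because the first category is drawn at the bottom, sorting in ascending order puts the largest bar at the top.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical median prices (thousands), sorted ascending
prices = pd.Series(
    [460, 710, 580], index=["Austin", "Boston", "Denver"]
).sort_values()

fig, ax = plt.subplots()
bars = ax.barh(prices.index, prices)  # first item at the bottom, last at the top
```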


PRO TIP: HIGHLIGHTS

Use the “color” argument to highlight the series you’d like to focus on

Use a list to specify the color for each Series
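
A sketch that highlights one bar and mutes the rest. The city names, values, and color choices are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

prices = pd.Series([460, 710, 580], index=["Austin", "Boston", "Denver"])

# One color per bar: highlight Denver, mute everything else
colors = ["orange" if city == "Denver" else "lightgray" for city in prices.index]

fig, ax = plt.subplots()
ax.bar(prices.index, prices, color=colors)
```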


ASSIGNMENT: BAR CHARTS

Results Preview
NEW MESSAGE
September 1, 2022

From: Sarah Shark (Managing Director)


Subject: CHARTS NEEDED ASAP

Hello,

Our hotel client is concerned about our intern’s departure.

I need YOU to step up and make sure they’re happy with us.
Start by taking a quick look at room nights and lodging by
country for our top 10 countries by total nights booked.

I expect the results in my inbox by morning (more details in the notebook attached).

-S

section02_assignments.ipynb


SOLUTION: BAR CHARTS

Solution Code

section02_solutions.ipynb


STACKED BAR CHARTS

You can create a stacked bar chart by setting the “bottom” argument for the
second “stacked” series as the values from the bars below it
• This will use those values as the baseline for the stacked bars instead of the x-axis

The Oregon bars are plotted by using the California values as their “bottom”
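
A sketch of that stacking pattern, with hypothetical sales figures for the two states:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales by year and state
sales = pd.DataFrame(
    {"California": [50, 60, 55], "Oregon": [20, 25, 30]},
    index=["2020", "2021", "2022"],
)

fig, ax = plt.subplots()
ax.bar(sales.index, sales["California"], label="California")

# Start each Oregon bar where the California bar ends
oregon_bars = ax.bar(
    sales.index, sales["Oregon"], bottom=sales["California"], label="Oregon"
)
ax.legend()
```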


100% STACKED BAR CHARTS

To create a 100% stacked bar chart, convert your DataFrame to row-level percentages before plotting
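
A sketch of the conversion, dividing each row by its own total before stacking. The DataFrame is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

sales = pd.DataFrame(
    {"California": [50, 60, 55], "Oregon": [20, 25, 30]},
    index=["2020", "2021", "2022"],
)

# Divide every row by its row total so each row sums to 100
pct = sales.div(sales.sum(axis=1), axis=0) * 100

fig, ax = plt.subplots()
ax.bar(pct.index, pct["California"], label="California")
ax.bar(pct.index, pct["Oregon"], bottom=pct["California"], label="Oregon")
ax.set_ylabel("Share of sales (%)")
ax.legend()
```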


PRO TIP: GROUPED BAR CHARTS

You can create a grouped bar chart by reducing the width of each series and
shifting them evenly around their corresponding label

This shifts the bars to the left across the x-axis by half their width

This shifts these bars to the right

Grouped bar charts are much easier to create by using Seaborn or the Pandas plot API
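
A sketch of the manual approach, assuming two series and numeric x positions built with NumPy. The data and the 0.4 bar width are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

sales = pd.DataFrame(
    {"California": [50, 60, 55], "Oregon": [20, 25, 30]},
    index=["2020", "2021", "2022"],
)

x = np.arange(len(sales.index))   # numeric position for each label
width = 0.4                       # reduced bar width

fig, ax = plt.subplots()
ax.bar(x - width / 2, sales["California"], width=width, label="California")  # shift left
ax.bar(x + width / 2, sales["Oregon"], width=width, label="Oregon")          # shift right

# Put the category labels back at the center of each group
ax.set_xticks(x)
ax.set_xticklabels(sales.index)
ax.legend()
```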


PRO TIP: COMBO CHARTS

You can create a combo chart by specifying different chart types in a dual axis plot

PRO TIP: Use the “alpha” argument to modify the transparency of each plot (0 is invisible and 1 is solid)
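
A sketch of a combo chart: semi-transparent bars on one axis and a line on a twin axis. The series names and values are made up.

```python
import matplotlib.pyplot as plt

months = [1, 2, 3, 4]                          # hypothetical values
inventory = [1.8, 1.7, 1.5, 1.6]
price = [410000, 415000, 423000, 430000]

fig, ax = plt.subplots()
ax.bar(months, inventory, alpha=0.3, label="Inventory")   # faded bars
ax.set_ylabel("Inventory")

ax2 = ax.twinx()                                          # second y-axis
ax2.plot(months, price, color="black", label="Price")     # line on top
ax2.set_ylabel("Price")

fig.legend()
```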


ASSIGNMENT: ADVANCED BAR CHARTS

Results Preview
NEW MESSAGE
September 2, 2022

From: Sarah Shark (Managing Director)


Subject: RE: RE: CHARTS NEEDED ASAP

Hello,

Nice work…so far. I need some more detailed views on the breakdown of lodging revenue vs. other revenue by country.

Build a grouped bar chart with the lodging revenue and other
revenue for each country. Then, build a 100% stacked bar
chart showing how much each revenue category contributes
to overall country revenue. Add a reference line at 80% to
help illustrate which countries get less than 80% of their
revenue from lodging.

-S

section02_assignments.ipynb


SOLUTION: ADVANCED BAR CHARTS

Solution Code

section02_solutions.ipynb


PIE CHARTS

Pie charts are used to compare proportions totaling 100%


• ax.pie(series values, labels= , startangle= , autopct=, pctdistance=, explode=)

Values in a single column

Labels as the index

PRO TIPS
Keep the number of slices low (<7) to enhance readability – you can group “others” into a single slice

Use bar charts if you want to compare the categories – pies are for showing how they make up a whole
Donut charts make great KPI progress trackers


PIE CHARTS

EXAMPLE Homes Sold by City
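
The original example image is not available here, so this is a sketch using the arguments listed above. The homes_sold Series and its values are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts, with smaller cities already grouped into "Other"
homes_sold = pd.Series(
    [420, 310, 180, 90], index=["Austin", "Boston", "Denver", "Other"]
)

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    homes_sold,
    labels=list(homes_sold.index),
    startangle=90,            # rotate where the first slice starts
    autopct="%.0f%%",         # show each slice's percentage
    pctdistance=0.8,          # how far the percentages sit from the center
    explode=[0.05, 0, 0, 0],  # pull the first slice out slightly
)
ax.set_title("Homes Sold by City")
```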


PRO TIP: DONUT CHARTS

You can create a donut chart by adding a “hole” to a pie chart and shifting the labels

How does this code work?
• It pushes the data labels 85% of the way towards the edge of the pie chart
• Then it adds a white circle to the figure that covers the center of the pie chart
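
A sketch of that technique. The data is hypothetical, and the circle is added to the axes with add_patch(), which may differ from the exact call used in the course; the 0.70 radius is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Circle

homes_sold = pd.Series(
    [420, 310, 180, 90], index=["Austin", "Boston", "Denver", "Other"]
)

fig, ax = plt.subplots()
ax.pie(
    homes_sold,
    labels=list(homes_sold.index),
    autopct="%.0f%%",
    pctdistance=0.85,   # push the percentages toward the outer edge
)

# Cover the center of the pie with a white circle to create the "hole"
hole = Circle((0, 0), 0.70, facecolor="white")
ax.add_patch(hole)
```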


ASSIGNMENT: PIE & DONUT CHARTS

Results Preview
NEW MESSAGE
September 3, 2022

From: Sarah Shark (Managing Director)


Subject: UPDATED CHARTS

Hello,

Our hotel client is looking for a pie/donut chart to represent the share of revenue by country.

Create a pie chart with slices for the top 5 countries by revenue, and a single “other” slice for the rest of the countries.

Need it ASAP.

Thx

section02_assignments.ipynb


SOLUTION: PIE & DONUT CHARTS

Solution Code

section02_solutions.ipynb


SCATTERPLOTS

Scatterplots are used to visualize the relationship between numerical variables


• ax.scatter(x-axis series, y-axis series, s= , alpha=)

One row per point x-series y-series

PRO TIPS
Modify the alpha (transparency) level to make overlapping points more visible

Bubble charts can be useful in some cases, but they often add confusion rather than clarity



SCATTERPLOTS

EXAMPLE Months of Supply vs. Median List Price



BUBBLE CHARTS

To create a bubble chart, specify a third series in the “s” (marker size) argument of .scatter()
• You may need to apply some arithmetic to adjust the bubble sizes
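
A minimal sketch of a bubble chart follows, assuming randomly generated housing data (the variable names and the divide-by-10 scaling are illustrative choices). The third series is passed through the `s` argument, and `alpha` keeps overlapping bubbles visible.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data, generated at random for illustration only
rng = np.random.default_rng(42)
months_of_supply = rng.uniform(1, 6, 50)
median_price = 600_000 - 50_000 * months_of_supply + rng.normal(0, 30_000, 50)
homes_sold = rng.integers(100, 2_000, 50)

fig, ax = plt.subplots()
bubbles = ax.scatter(
    months_of_supply,
    median_price,
    s=homes_sold / 10,  # simple arithmetic to bring bubble areas into a readable range
    alpha=0.5,          # transparency so overlapping points stay visible
)
ax.set_xlabel("Months of Supply")
ax.set_ylabel("Median List Price")
```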



HISTOGRAMS

Histograms are used to visualize the distribution of a numeric variable


• ax.hist(series, density= , alpha=, bins=)

numerical series

PRO TIPS
Modify the alpha (transparency) level to plot multiple distributions on the same axis

Set density=True to normalize the histogram so the bar areas sum to 1 (a probability density on the y-axis instead of raw counts)
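
The sketch below overlays two distributions on one axis using alpha and density=True. The two samples are randomly generated stand-ins, not the course data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical growth rates for two markets, generated for illustration
rng = np.random.default_rng(0)
market_a = rng.normal(0.05, 0.02, 500)
market_b = rng.normal(0.08, 0.03, 500)

fig, ax = plt.subplots()
# density=True rescales each histogram so its bar areas sum to 1
dens_a, edges_a, _ = ax.hist(market_a, bins=20, density=True, alpha=0.5, label="Market A")
dens_b, edges_b, _ = ax.hist(market_b, bins=20, density=True, alpha=0.5, label="Market B")
ax.legend()
```

Because both histograms are normalized, the two samples can be compared even if they contain different numbers of observations.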



HISTOGRAMS

EXAMPLE Distribution of Y-o-Y Growth in Home Price for Calendar Weeks



ASSIGNMENT: SCATTERPLOTS & HISTOGRAMS

Results Preview
NEW MESSAGE
September 4, 2022

From: Sarah Shark (Managing Director)


Subject: Additional Customer Profiling

Not bad rookie – thanks for the quick turnaround.

I need two more charts to help finalize a marketing strategy targeting overseas guests:

1. A chart comparing average revenue per customer and average nights stayed, with average nightly revenue as the size of the bubbles (you’ll need to aggregate the data by country)
2. The distribution of customer ages in France & Germany

-sent from my yPhone

section02_assignments.ipynb



SOLUTION: SCATTERPLOTS & HISTOGRAMS

Solution Code

section02_solutions.ipynb



KEY TAKEAWAYS

Matplotlib has two methods for plotting data: PyPlot API & Object Oriented
• Both can visualize many data types (lists, DataFrames, etc.), but object-oriented plots are easier to fully customize

Object Oriented plots are built by adding axes to a figure
• You can layer on different elements to these objects to modify the chart formatting

You can create common chart types by using Matplotlib functions
• Each chart type can be customized further to create more advanced variations

Matplotlib's extreme customizability also adds complexity
• Understanding the anatomy of a Matplotlib figure helps pinpoint how to change every component in your chart

PROJECT DATA: COFFEE PRODUCTION



PROJECT DATA: COFFEE IMPORTS



PROJECT DATA: COFFEE PRICES



ASSIGNMENT: MID-COURSE PROJECT

Key Objectives
1. Read in data from multiple csv files
2. Reshape the data to prepare it for visualization
3. Build & customize charts to communicate the key insights to the client

NEW MESSAGE
September 7, 2022

From: Sarah Shark (Managing Director)

Subject: Coffee Industry Deep Dive

Hi there,

I’m starting to trust you… which is rare. We just got an inquiry from a major coffee trader looking to get an outside view on the coffee industry. They’re particularly interested in Brazil’s production relative to other nations.

We’ll also look at a comparison of importer volume vs the prices they pay to understand if we can unlock margin by diversifying into new markets.

Do well on this and you’ll be on promotion track.

section03_coffee_project_part1.ipynb

ADVANCED CUSTOMIZATION

In this section we’ll cover advanced customization techniques in Matplotlib, including multi-chart figures, custom layouts & colors, style sheets, and more

TOPICS WE’LL COVER: GOALS FOR THIS SECTION:

• Understand how to build multi-chart figures both with subplots and GridSpec layouts

• Learn how to customize chart colors by leveraging custom colormaps and creating your own!

• Take a look at pre-built style sheets, and dive into the settings behind them that allow for extreme chart customization



SUBPLOTS

Subplots let you create a grid of equally sized charts in a single figure
• fig, ax = plt.subplots(rows, columns) – this creates a grid with the specified rows & columns

         Column 0    Column 1
Row 0    (0, 0)      (0, 1)
Row 1    (1, 0)      (1, 1)

This creates a 2 row, 2 column grid that can be populated with individual charts




Specify ax[row][column] to create and modify individual subplots
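
As a small sketch of the indexing described above, the following builds a 2 x 2 grid and draws a different chart type in each cell. The sample numbers are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Hypothetical values used in all four charts
x = [1, 2, 3, 4]
y = [10, 12, 9, 15]

# ax is a 2 x 2 array of Axes objects, indexed as ax[row][column]
fig, ax = plt.subplots(2, 2, figsize=(8, 6))

ax[0][0].plot(x, y)
ax[0][0].set_title("Line (0, 0)")

ax[0][1].bar(x, y)
ax[0][1].set_title("Bar (0, 1)")

ax[1][0].scatter(x, y)
ax[1][0].set_title("Scatter (1, 0)")

ax[1][1].hist(y, bins=3)
ax[1][1].set_title("Histogram (1, 1)")

fig.tight_layout()  # keep titles and tick labels from overlapping
```

Note that with a single row or a single column, plt.subplots() returns a one-dimensional array, so a single index such as ax[0] is used instead.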



SUBPLOTS

Use the “sharex” & “sharey” arguments to set the same axis limits on all the plots
• This is set to False (“none”) by default, but can be set to True (“all”), “row”, or “col”
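
A short sketch of a shared y-axis, using two invented series on very different scales. With sharey="row", both charts in the row end up with identical y-axis limits.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# One row, two columns, with the y-axis shared across the row
fig, ax = plt.subplots(1, 2, sharey="row")

# Hypothetical series on very different scales
ax[0].plot([0, 1, 2], [0, 5, 10])
ax[1].plot([0, 1, 2], [0, 50, 100])

shared_limits = ax[0].get_ylim()
```

Sharing an axis makes magnitudes directly comparable across charts, at the cost of flattening the smaller series.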



SUBPLOTS

Subplots can be any chart type, and do not have to be the same type



ASSIGNMENT: SUBPLOTS

Results Preview
NEW MESSAGE
September 10, 2022

From: Wendy Whiz (Data Scientist)


Subject: Deeper Exploration

Hey there,

I want to get a quick read on the distribution of revenue by customer for our top 5 countries – I’m working on a model for a similar client and want to see if the distributions are similar.

Doesn’t need to be polished, just need the 5 histograms in a single figure.

Thanks, and looking forward to working with you more!

Wendy

section04_assignments.ipynb



SOLUTION: SUBPLOTS

Solution Code

section04_solutions.ipynb



GRIDSPEC

You can build layouts with charts of varying sizes by setting a gridspec object
• This creates a grid with a specified number of rows & columns

[Diagram: an empty grid with 8 rows (Row 0 to Row 7) and 4 columns (Column 0 to Column 3)]

• Each axis, or chart, can then occupy a group of squares in the grid

Use a slice to specify the ranges of rows and columns for each axis

[Diagram: on the 8-row, 4-column grid, ax1 and ax2 sit side by side in the upper rows and ax3 sits in the lower rows]
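
One way to sketch this kind of layout is shown below. The exact slices are assumptions made for illustration (ax1 and ax2 splitting the top four rows, ax3 filling the bottom four), since the slide's own code is not reproduced here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 8))

# An 8-row, 4-column grid; each Axes claims a block of cells via slices
gs = fig.add_gridspec(nrows=8, ncols=4)

ax1 = fig.add_subplot(gs[0:4, 0:2])  # rows 0-3, columns 0-1 (assumed placement)
ax2 = fig.add_subplot(gs[0:4, 2:4])  # rows 0-3, columns 2-3 (assumed placement)
ax3 = fig.add_subplot(gs[4:8, :])    # rows 4-7, all columns (assumed placement)

ax1.set_title("ax1")
ax2.set_title("ax2")
ax3.set_title("ax3")
```

As with Python list slicing, the end of each range is exclusive, so gs[0:4, 0:2] covers rows 0 through 3 and columns 0 through 1.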



ASSIGNMENT: GRIDSPEC

Results Preview
NEW MESSAGE
September 12, 2022

From: Sarah Shark (Managing Director)


Subject: Revenue Report Format

Hi there,

Big meeting with our hotel client coming up – we want to propose a report format that will help track their revenue, specifically with respect to their goal to get French customers to surpass German customers.

Can you create a figure with a line chart tracking revenue by category, a bar chart with revenue for the top 5 countries, and a chart indicating progress towards our French revenue goal?

Thanks!

section04_assignments.ipynb



SOLUTION: GRIDSPEC

Solution Code
GridSpec Layout (see notebook for chart code):

section04_solutions.ipynb



COLORS

You can pass colors to a plot by assigning them to a list

This assigns the colors in the list to the bars in the plot, in order



COLORS

You can also loop through a list of colors to pass them to separate series in a plot



COLORS

Hex codes can be used to specify exact colors

PRO TIP: Sites like Google have helpful hexadecimal color pickers
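
The three color techniques covered in this part (a list of colors for bars, looping through colors for separate series, and hex codes) can be sketched together as follows. The data and the specific colors are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

fig, ax = plt.subplots(1, 2, figsize=(10, 4))

# 1) A list of colors is applied to the bars in order (names and hex codes can be mixed)
bar_colors = ["orange", "grey", "#1f77b4"]
bars = ax[0].bar(["A", "B", "C"], [3, 5, 2], color=bar_colors)

# 2) Looping through a list of hex codes assigns one color per series
line_colors = ["#2a9d8f", "#e76f51"]
series = {"East": [1, 2, 3], "West": [2, 2, 4]}  # hypothetical series
for (name, values), color in zip(series.items(), line_colors):
    ax[1].plot(values, label=name, color=color)
ax[1].legend()
```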



PRO TIP: COLOR PALETTES

You can also modify the entire color palette for the series in a plot

Default Color Map:

The “Set2” color map is applied here

Series colors are applied in this sequential order (the default cycle has 10 colors, so it repeats from the 11th series)

rcParams are the underlying settings for Matplotlib charts and can be modified to gain a high level of customization (more on these soon!)
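
One possible way to swap the whole series palette is to point the "axes.prop_cycle" rcParam at the colors of a colormap such as "Set2". This is a sketch of that approach and may differ from the code shown in the course notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from cycler import cycler
from matplotlib.colors import to_hex

# "Set2" is a qualitative colormap with a fixed list of colors
set2_colors = plt.get_cmap("Set2").colors

# Replace the default color cycle for every chart created from here on
plt.rcParams["axes.prop_cycle"] = cycler(color=set2_colors)

fig, ax = plt.subplots()
# Hypothetical series: each new line takes the next color in the cycle
lines = [ax.plot([0, 1], [i, i + 1], label=f"Series {i}")[0] for i in range(3)]
ax.legend()
```

"Set2" holds 8 colors, so with this cycle the colors repeat from the 9th series onward.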

For more on color palettes, visit: https://matplotlib.org/3.5.0/tutorials/colors/colormaps.html

ASSIGNMENT: COLORS

Results Preview
NEW MESSAGE
September 13, 2022

From: Sarah Shark (Managing Director)


Subject: Re: Revenue Report Format

Hi again,

Love the layout, HATE the colors! Let’s show some polish by
getting away from the defaults.

Apply the “Set2” colormap to the line chart and look up the
national color hex codes for the top 5 countries to use them
for the rest of the charts.

Thanks,

Sarah

section04_assignments.ipynb



SOLUTION: COLORS

Solution Code
Apply Set2 (see notebook for chart code):

Country Colors:

Donut Chart

section04_solutions.ipynb



STYLE SHEETS

Matplotlib (and Seaborn) have style sheets that can be used instead of the default

The style is set in advance

The “fivethirtyeight” style has larger font sizing, and adds gridlines and a background color

• You can still customize individual formatting options after setting a style
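
A minimal sketch of setting a style in advance and then overriding individual options afterwards. The revenue figures are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# The style is set before the figure is created
plt.style.use("fivethirtyeight")

fig, ax = plt.subplots()
ax.plot([2019, 2020, 2021, 2022], [1.2, 0.9, 1.4, 1.8])  # hypothetical revenue

# Individual formatting can still be changed after the style is applied
ax.set_title("Revenue by Year", fontsize=14)
ax.grid(False)
```

The list of installed style names is available as plt.style.available.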

The Seaborn library has additional styles that can be used with Matplotlib charts, like “darkgrid”



ADDITIONAL STYLES

These are some of the additional styles available in both libraries:



ASSIGNMENT: STYLE SHEETS

Results Preview
NEW MESSAGE
September 14, 2022

From: Sarah Shark (Managing Director)


Subject: Re: Re: Revenue Report Format

Hi,

Layout and colors look great now, but can we spruce up the
chart styling?

Use a style sheet of your choice.

Once we’ve done that it should be ready to ship.

Thx

-S

section04_assignments.ipynb



SOLUTION: STYLE SHEETS

Solution Code
Style Setting Only (see notebook for chart code):

section04_solutions.ipynb



STYLE PARAMETERS

Viewing the parameters of a style sheet can help format charts properly and provide
inspiration for your own formatting changes
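
One way to view the parameters behind a style sheet is through the plt.style.library dictionary, which maps each style name to the rcParams it sets. The sketch below prints them for "fivethirtyeight"; the choice of style is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Each entry in plt.style.library holds the rcParams a style sheet overrides
style_params = plt.style.library["fivethirtyeight"]

for name in sorted(style_params):
    print(name, "=", style_params[name])
```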



PARAMETER GROUPS

There are 300+ parameters that can be modified, which fall into parameter groups:

Group         Description                 Example parameters
axes          Chart-level formatting      axes.spines.top = False, axes.titlesize = ‘large’
date          Date formatting options     date.autoformatter.month = ‘%Y-%m’
figure        Figure-level formatting     figure.figsize = (8.5, 11), figure.facecolor = ‘grey’
font          Font settings               font.size = 16, font.family = ‘Helvetica’, font.weight = ‘bold’
grid          Gridline settings           grid.linestyle = ‘:’, grid.linewidth = 2
legend        Legend settings             legend.loc = ‘lower right’, legend.frameon = False
savefig       Saved figure settings       savefig.dpi = 1000, savefig.format = ‘png’
text          Text settings               text.color = ‘grey’, text.usetex = True
xtick/ytick   X and Y tick settings       xtick.labelcolor = ‘green’, ytick.minor.visible = True
boxplot       Settings for boxplots       boxplot.whiskerprops.color = ‘orange’
hist          Settings for histograms     hist.bins = 20
lines         Settings for line charts    lines.linewidth = 2, lines.color = ‘red’
scatter       Settings for scatterplots   scatter.marker = ‘+’

For more on rcParams, visit: https://matplotlib.org/stable/api/matplotlib_configuration_api.html


MODIFYING PARAMETERS
There are two ways to modify parameters:
1. You can change individual parameters via assignment
2. You can change multiple parameters from the same group with the rc() function
• Turn off top and right spines
• Change default axes title size to 20
• Modify figure size to 8” x 6”

PRO TIP: Modify parameters to avoid having to repeat the same formatting options on each chart
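
Both ways of modifying parameters can be sketched as below, using the three changes named above (spines off, axes title size 20, an 8 x 6 inch figure). Any chart created afterwards picks up these defaults.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# 1) Change individual parameters via assignment
plt.rcParams["axes.spines.top"] = False
plt.rcParams["axes.spines.right"] = False

# 2) Change parameters within one group using rc()
plt.rc("axes", titlesize=20)
plt.rc("figure", figsize=(8, 6))

# New figures now use the modified defaults without any per-chart formatting
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3])
ax.set_title("Uses the new defaults")
```

Calling plt.rcdefaults() restores Matplotlib's built-in defaults if the changes need to be undone.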



SAVING FIGURES

The savefig() function will save figures as an image file


• Simply specify the desired filename and format

Screenshotting the images with your operating system’s snipping tool will often be sufficient for building plots into presentations like this course ;).

If no extension is specified in the filename, the file will be saved as a .png. Most systems support .jpg, .jpeg, .svg, and .pdf, among others. The default resolution is 100 dpi (dots per inch)
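
A short sketch of savefig() follows. The filenames and the temporary output folder are placeholders chosen for illustration; the file format is inferred from the extension.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["A", "B", "C"], [3, 5, 2])  # hypothetical data

# Write into a temporary folder so the sketch does not clutter the working directory
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "revenue_chart.png")
pdf_path = os.path.join(out_dir, "revenue_chart.pdf")

fig.savefig(png_path, dpi=200, bbox_inches="tight")  # higher resolution, trimmed margins
fig.savefig(pdf_path)                                # vector output, format taken from the extension
```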



KEY TAKEAWAYS

Subplots and GridSpec allow us to create multi-chart figures


• Subplots are equally sized grids; GridSpec allows for custom layouts

Colors can be set by specifying a colormap or by assigning colors to the data of interest
• Common color names and hex codes can be used to assign colors to your data

Set a style to spruce up the default aesthetics, or use rcParams to completely customize your charts
• Pre-built styles can add some nice aesthetic polish compared to the Matplotlib defaults
• Understanding how to modify rcParams will allow you full control over chart customization, and reduce the need for manual formatting

PROJECT DATA: OVERVIEW
Coffee Production



PROJECT DATA: OVERVIEW
Prices Paid To Growers



ASSIGNMENT: MID-COURSE PROJECT

Key Objectives
1. Read in data from multiple csv files
2. Reshape the data with Pandas to set up charts
3. Build and customize line charts, bar charts, histograms and more to communicate key insights to our client
4. Modify chart colors to represent national flags
5. Combine modified charts into a single report by leveraging GridSpec and subplots

NEW MESSAGE
September 18, 2022

From: Clarissa Café (Coffee Client)

Subject: Summary Report

Hi there,

Sarah told me to reach out directly to you – we loved the work you did on breaking down the industry, but we want to summarize your findings on Brazil into a single figure we can pass around.

Can you combine your findings into a single figure report? We’ll also want to modify colors. There are more details in the attached notebook.

Thanks!
Clarissa

section05_coffee_project_part2.ipynb

DATA VISUALIZATION WITH SEABORN

In this section we’ll cover data visualization with Seaborn, another Python library that
introduces new chart types and layouts, and interacts well with Matplotlib

TOPICS WE’LL COVER: GOALS FOR THIS SECTION:


• Introduce the basics of plotting data with Seaborn

• Build variations of Matplotlib charts like bar charts and histograms, as well as new visuals like boxplots, violin plots, and linear relationship plots

• Create FacetGrid layouts as an alternative to subplots

• Integrate Seaborn plots with Matplotlib objects to get the best of both worlds



MEET SEABORN

Seaborn is a Python library built for easily visualizing Pandas DataFrames, taking away some of the “drawing” required when using Matplotlib

‘sns’ is the standard alias for Seaborn

You simply need to specify a DataFrame as the “data” argument and set columns as the “x” and “y” axes

Seaborn will automatically aggregate the results!

You can change the aggregation method and suppress the confidence intervals



CHART FORMATTING

You can apply chart formatting to Seaborn plots using Matplotlib arguments
• These are passed to the Matplotlib object that Seaborn creates internally

We’ll cover integration with Matplotlib later, which is where you’ll be able to
leverage the chart formatting skills you’ve learned throughout the course



CHART FORMATTING

Seaborn still has some useful chart formatting functions like despine()



BAR CHARTS

Bar charts can be created in Seaborn with sns.barplot()


• Simply specify the desired category labels and series values as “x” & “y” arguments

Note that Seaborn automatically aggregates the data for the plot, using unique category values as the labels
for the bars, the mean of each category for the bar length, and the column headers as the axis labels

To create a horizontal bar chart, specify “x” as the data and “y” as the labels. ci=None will suppress error bars (in Seaborn 0.12 and later, use errorbar=None instead).



GROUPED BAR CHARTS

Grouped bar charts can be created by specifying a categorical column as “hue”

You can also sort the bars by one of the columns, and apply a different color map



HISTOGRAMS

Histograms can be created with sns.histplot() and a single “x” argument

• You can also specify the number of “bins” and add the kernel density (kde=True)

The default style for Seaborn plots can be nicer than their Matplotlib counterparts, and vice versa, so choose the library that works best for each chart!



ASSIGNMENT: BASIC CHARTS

Results Preview
NEW MESSAGE
September 20, 2022

From: Sarah Shark (Managing Director)


Subject: New Charts

Hi,

Need a few more views on the hotel data using Seaborn.

Can we look at the distribution of lodging revenue for each booking? Only plot customers with less than 1,500 dollars to weed out longer term stays.

Then, build a bar chart with the average room nights stayed
for our top 5 countries.

Thanks

section06_assignments.ipynb



SOLUTION: BASIC CHARTS

Solution Code

section06_solutions.ipynb



BOXPLOTS

Boxplots can be created with sns.boxplot()


• They visualize the distribution of a variable by plotting key statistics

Boxplot statistics:
• Median (50th percentile)
• 1st & 3rd Quartiles (Q1 & Q3, the 25th & 75th percentiles)
• Interquartile Range (IQR), the distance between Q1 and Q3
• Min & Max Values (the whiskers), capped at 1.5x the IQR below Q1 and above Q3
• Outliers, the points that fall beyond the whiskers
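
The statistics listed above can be computed by hand with NumPy, which may help when reading a boxplot. The ten values below are invented, and the 1.5 x IQR fences follow the rule described on this slide.

```python
import numpy as np

# Hypothetical sample with one unusually large value
values = np.array([12, 15, 17, 18, 20, 21, 22, 24, 25, 60])

# Quartiles and interquartile range
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences sit 1.5 x IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme values still inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything beyond the fences is drawn as an outlier
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(q1, median, q3, iqr, whisker_low, whisker_high, outliers)
```

Here the value 60 lies above the upper fence, so it would be plotted as an individual outlier point while the upper whisker stops at 25.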

Specify a categorical column on the second axis to create separate boxplots by category



VIOLIN PLOTS

Violin plots can be created with sns.violinplot()


• They are boxplots with symmetrical kernel densities along their sides
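
For comparison, the sketch below draws a boxplot and a violin plot of the same invented data side by side, split by a categorical column. Placing the two charts on a shared Matplotlib figure through the `ax` argument is a convenience for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical prices for two cities, generated at random for illustration
rng = np.random.default_rng(7)
homes = pd.DataFrame({
    "city": np.repeat(["Austin", "Denver"], 100),
    "price": np.concatenate([rng.normal(500, 60, 100), rng.normal(580, 90, 100)]),
})

fig, ax = plt.subplots(1, 2, figsize=(10, 4))

# One box per category: the numeric column on y, the categorical column on x
sns.boxplot(data=homes, x="city", y="price", ax=ax[0])

# Same data as violins, which add a mirrored kernel density around each box
sns.violinplot(data=homes, x="city", y="price", ax=ax[1])
```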



ASSIGNMENT: BOX & VIOLIN PLOTS

Results Preview
NEW MESSAGE
September 24, 2022

From: Sarah Shark (Managing Director)


Subject: Re: New Charts

Hi,

Let’s view the distribution of lodging revenue using a boxplot instead, once again capping the revenue at 1500.

Then filter the data to the top 5 countries and build a violin
plot of their lodging revenue, as well as their age distribution.

Sarah

section06_assignments.ipynb



SOLUTION: BOX & VIOLIN PLOTS

Solution Code

section06_solutions.ipynb



LINEAR RELATIONSHIP PLOTS

Seaborn has several plots to explore linear relationships:

sns.scatterplot(x, y, data)              Creates a scatterplot
sns.regplot(x, y, data)                  Creates a scatterplot with a fitted regression line
sns.lmplot(x, y, hue, row, col, data)    Creates a scatterplot with a fitted regression line, and can visualize multiple categories using color, or by splitting into rows & columns
sns.jointplot(x, y, kind, data)          Creates a scatterplot and adds the distribution for each variable
sns.pairplot(data[cols])                 Creates a matrix of scatterplots comparing multiple variables, and shows the distribution for each one



REGPLOT()

sns.regplot() creates a scatterplot with a fitted regression line



LMPLOT()

sns.lmplot() lets you explore the impact of other variables on the relationship

Specify the ‘hue’ to create a line for each category in the specified column and set a different color for each category

Specify the ‘row’ and ‘col’ arguments to create regression plots for each combination of variables

PRO TIP: This type of visual is great for exploring your data, but way too complex for a presentation!
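
A brief sketch of regplot() and lmplot() on invented housing data. The column names are assumptions; lmplot() here colors by one category and splits into columns by another.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data, generated at random for illustration
rng = np.random.default_rng(3)
n = 200
homes = pd.DataFrame({
    "sqft": rng.uniform(800, 3000, n),
    "city": np.tile(["Austin", "Denver"], n // 2),
    "home_type": np.repeat(["Condo", "House"], n // 2),
})
homes["price"] = 100 + 0.2 * homes["sqft"] + rng.normal(0, 40, n)

# regplot: one scatterplot with a single fitted regression line
fig, ax = plt.subplots()
sns.regplot(data=homes, x="sqft", y="price", ax=ax)

# lmplot: one regression line per hue category, one chart per column category
grid = sns.lmplot(data=homes, x="sqft", y="price", hue="home_type", col="city")
```

Unlike regplot(), lmplot() creates its own figure and returns a FacetGrid, so it cannot be placed on an existing Axes.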



JOINTPLOT()

sns.jointplot() creates a scatterplot and adds the distribution of each variable

The ‘kind’ argument has several options like ‘kde’, which plots the kernel densities, and ‘reg’, which plots the regression line



PAIRPLOT()

sns.pairplot() creates a matrix of scatterplots comparing multiple variables, and shows the distribution for each one along the diagonal

This lets you see the relationship between a diamond’s weight (carat) and its length (x), width (y), and depth (z)

You can see that the weight of the diamond has a positive relationship with length, width, and depth, with the relationships being VERY strong for width and depth



ASSIGNMENT: LINEAR RELATIONSHIP PLOTS

Results Preview
NEW MESSAGE
September 26, 2022

From: Wendy Whiz (Data Scientist)


Subject: More Exploration

Hi there,

Can you produce charts to explore the relationship between room nights and lodging revenue?

First for all the data and then for each top 5 country.

Can you also produce a pairplot comparing lodging revenue to several key variables? (more details in the notebook)

Best,

Wendy

section06_assignments.ipynb



SOLUTION: LINEAR RELATIONSHIP PLOTS

Solution Code

section06_solutions.ipynb



HEATMAPS

Create a heatmap to visualize a table of data with sns.heatmap()

PRO TIP: Pandas’ pivot_table method is a great way to set up the data needed for a heat map!
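
As a sketch of the tip above, the following pivots an invented bookings table into a country-by-segment grid of mean revenue and passes it to sns.heatmap(). The column names and the "Blues" colormap are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical bookings (values invented for illustration)
bookings = pd.DataFrame({
    "country": ["France", "France", "France", "Germany", "Germany", "Spain", "Spain"],
    "segment": ["Direct", "Direct", "Online", "Direct", "Online", "Direct", "Online"],
    "revenue": [100, 200, 300, 150, 250, 120, 180],
})

# Rows become countries, columns become segments, cells hold the mean revenue
pivot = bookings.pivot_table(index="country", columns="segment", values="revenue", aggfunc="mean")

# annot=True prints each cell's value on top of its color
ax = sns.heatmap(pivot, annot=True, fmt=".0f", cmap="Blues")
```

The same call works on a correlation matrix, for example the output of a DataFrame's corr() method.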

You can modify rcParams with sns.set(), but we’ll show the syntax for combining Matplotlib and Seaborn shortly!



ASSIGNMENT: HEATMAPS

Results Preview
NEW MESSAGE
September 26, 2022

From: Wendy Whiz (Data Scientist)


Subject: RE: More Exploration

Hi there,

Last piece to help me look at features for my modeling work.

Can you build a heatmap with countries as rows and market segment as columns with the mean lodging revenue for each?

Then build a heatmap for a correlation matrix.

Thanks,

Wendy

section06_assignments.ipynb



SOLUTION: HEATMAPS

Solution Code

section06_solutions.ipynb



FACETGRID

Seaborn’s FacetGrid is a convenient alternative to Matplotlib’s subplot grids


• sns.FacetGrid(DataFrame, col= , col_wrap= )

This creates 7 charts, one for each “color”, in a grid with 3 columns

This plots a histogram of “price” for each “color” in the DataFrame



MATPLOTLIB INTEGRATION

You can build Seaborn plots in Matplotlib objects, which lets you customize and
integrate Seaborn charts as if they were built using Matplotlib

This creates a Matplotlib figure and axis, sets a Seaborn style, creates a Seaborn bar chart, and then adds Matplotlib labels

This lets you specify which axes to plot the chart on



KEY TAKEAWAYS

Seaborn is a user-friendly extension of Matplotlib


• It has a simple interface, nice aesthetics, and works well with Pandas DataFrames

Seaborn adds new chart types that are useful in exploring data
• Boxplots, violin plots, and linear model plots help profile data and identify relationships between variables

Seaborn is very compatible with Matplotlib


• Seaborn charts are extensions of Matplotlib objects, so they can be placed in Matplotlib figures
• Matplotlib formatting arguments can be passed to corresponding Seaborn plotting functions

PROJECT DATA: USED CARS DATA



ASSIGNMENT: FINAL PROJECT

Key Objectives
1. Read in and manipulate data with Pandas
2. Build summary charts with Matplotlib and Seaborn
3. Leverage Seaborn’s advanced chart types to mine insights from the data and make a decision

NEW MESSAGE
October 10, 2022

From: Aaron Auto (VP of Fleet Management)

Subject: Optimal Fleet Truck Purchase

Hello,

We need an outside analysis on auto procurement for our fleet of service vehicles. We lease trucks to contractors and other businesses, but a recent spike in demand has meant we’re unable to get cars from traditional suppliers.

I want to see an overview of the automotive auction industry, before diving into where we can get Ford F150s for the most affordable price on the market (more details in the notebook).

Thanks

section07_final_project.ipynb
