Python Data Visualization
This is a project-based course for students looking for a practical, hands-on approach to
learning data visualization with Python using the Matplotlib and Seaborn libraries
Quizzes & Assignments to test and reinforce key concepts, with step-by-step solutions
Interactive demos to keep you engaged and apply your skills throughout the course
2 Matplotlib Fundamentals: Introduce the Matplotlib library and use it to build & customize several chart types, including line charts, bar charts, pie charts, scatterplots, and histograms
4 Data Viz with Seaborn: Visualize data with Seaborn, another Python library that introduces new chart types and layouts, and interacts well with Matplotlib
THE SITUATION: You’ve just been hired as an Associate Consultant for Maven Consulting Group (MCG), a multinational firm that provides strategic advice to companies across different industries. Your new role will see you take on projects in the hotel, coffee, automotive, and diamond industries.
THE ASSIGNMENT: Your task is to effectively visualize data from these industries to deliver key insights to MCG’s clients. This will range from analyzing hotel customer demographics to understanding the major players in the global coffee industry.
This course covers the core functionality for Matplotlib & Seaborn
• We’ll cover chart types, common customization options, and best practices for visualizing and analyzing data
• We’ll give you the tools to use the official documentation to apply any customization option not covered in the course
1) Once inside the Jupyter interface, create a folder to store your notebooks for the course
NOTE: You can rename your folder by clicking “Rename” in the top left corner
2) Open your new coursework folder and launch your first Jupyter notebook!
NOTE: You can rename your notebook by clicking on the title at the top of the screen
NOTE: When you launch a Jupyter notebook, a terminal window may pop up as
well; this is called a notebook server, and it powers the notebook interface
In this section we’ll cover key data visualization best practices for clear communication,
with tips for choosing the right chart, formatting it effectively, and using it to tell a story
Data visualization puts both our prefrontal and visual cortex to work, combining
the power of cognition (slow and conscious) and perception (instantaneous)
The 3 key questions are a great way to help choose the right visual
• What type of data are you working with?
• What do you want to communicate?
• Who is the end user and what do they need?
Comparison & composition
Avoid using too many bins!
PRO TIP: Be intentional about the formatting you apply – don’t just use the default settings!
“Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away”
Antoine de Saint-Exupery
Descriptive titles and data labels can be used to tell a clear story within your visuals
What are
these values?
Tell a story with the data to guide the user to the insights
• Use titles, strategic labels, and callouts to create a clear narrative
In this section we’ll introduce the Matplotlib library and use it to build & customize several
chart types, including line charts, bar charts, pie charts, scatterplots, and histograms
Matplotlib is an open-source Python library built for data visualization that lets you
produce a wide variety of highly customizable charts & graphs
Matplotlib can plot many data types, including base Python sequences, NumPy
Arrays, and Pandas Series & DataFrames
• PyPlot API: Charts are created with the plot() function, and modified with additional functions
• Object-Oriented: Charts are created by defining a plot object, and modified using figure & axis methods
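The two approaches can be compared side by side. This is a minimal sketch with made-up sample data (the `months` and `revenue` values are assumptions, not course data):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

months = [1, 2, 3, 4]        # made-up sample data
revenue = [10, 12, 9, 15]

# PyPlot API: plot() draws on an implicit "current" figure
plt.figure()
plt.plot(months, revenue)
plt.title("Revenue (PyPlot)")

# Object-oriented: create figure & axis objects, then call their methods
fig, ax = plt.subplots()
ax.plot(months, revenue)
ax.set_title("Revenue (Object-Oriented)")
```

Both charts look the same; the object-oriented version simply keeps explicit handles (`fig`, `ax`) that later formatting calls can target.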
Results Preview
NEW MESSAGE
August 29, 2022
Hi!
Can you plot Lodging Revenue and Other Revenue over time
for our hotel client?
Thanks!
section02_assignments.ipynb
Solution Code
• Plot Each Series
• Plot The DataFrame
NEW MESSAGE
August 29, 2022
From: Ian Intern (Summer Consultant)
Subject: Do you know Matplotlib?
Hi!
I need someone who knows Matplotlib for help with some client work.
Can you plot Lodging Revenue and Other Revenue over time for our hotel client?
Thanks!
section02_solutions.ipynb
Matplotlib has these formatting options for PyPlot and Object-Oriented plots:
• Figure Title: fig.suptitle() (Object-Oriented) or plt.suptitle() (PyPlot)
• Chart Title: ax.set_title() (Object-Oriented) or plt.title() (PyPlot)
• Other chart elements labeled in the figure: Axis Title, Y-axis Tick, Legend
The set_title(), set_xlabel(), and set_ylabel() methods let you add chart titles and axis labels
• fig.suptitle() serves as an overall figure title
You can modify chart font sizes with the “fontsize” argument
• You can specify the size in points (10, 12, etc.) or relative size (“smaller”, “x-large”, etc.)
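A short sketch of these title, label, and font size options on an object-oriented plot. The data and label text are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([2019, 2020, 2021], [1.2, 0.8, 1.5])  # made-up values

fig.suptitle("Hotel Revenue", fontsize=16)                 # overall figure title, size in points
ax.set_title("Lodging revenue by year", fontsize="x-large")  # chart title, relative size
ax.set_xlabel("Year", fontsize=12)
ax.set_ylabel("Revenue ($M)", fontsize="smaller")
```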
The legend() method lets you add a chart legend to identify each series
• The series labels are used by default, but custom values can also be passed through
You can change the legend location with the “loc” or “bbox_to_anchor” arguments
• “loc” lets you set a predetermined location option
• “bbox_to_anchor” lets you set specific (x, y) coordinates
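“loc” options: best (default), upper right, upper left, upper center, lower right, lower left, lower center, center right, center left, center
“bbox_to_anchor” coordinates run from (0, 0) at the lower left of the chart to (1, 1) at the upper right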
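The legend options can be tried out with a sketch like this one. The series values and label text are made up:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [3, 5, 4], label="Lodging")
ax.plot([1, 2, 3], [1, 2, 2], label="Other")

# Series labels are used by default; "loc" picks a preset position
legend = ax.legend(loc="upper left")

# Custom labels, with the legend's upper-left corner pinned at (1, 1),
# which places it just outside the right edge of the chart
legend = ax.legend(["Lodging Revenue", "Other Revenue"],
                   loc="upper left", bbox_to_anchor=(1.0, 1.0))
```

When both arguments are passed, “loc” decides which corner of the legend box is placed at the “bbox_to_anchor” coordinates.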
You can change the line style with the “linestyle”, “linewidth”, and “color” arguments
• Common line styles are “solid”, “dashed”, or “dotted” (you can also use “-”, “--”, or “:”)
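A brief sketch of the three line arguments, using made-up data and arbitrary colors:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
fig, ax = plt.subplots()
ax.plot(x, [1, 2, 3, 4], linestyle="solid", linewidth=2, color="black")
ax.plot(x, [2, 3, 4, 5], linestyle="--", linewidth=1, color="orange")  # same as "dashed"
ax.plot(x, [3, 4, 5, 6], linestyle=":", linewidth=3, color="green")    # same as "dotted"
```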
You can add vertical lines to mark key points with the axvline() function
Set the coordinate (in this case days since Jan 1, 1970)
and an optional color and style
You can add text at specific coordinates with the text() function
• ax.text(x-coordinate, y-coordinate, string, additional text formatting)
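A sketch combining a vertical reference line with a text label on a date axis. The dates, values, and "Key event" label are made up; `mdates.date2num()` is used here as one way to get the numeric date coordinate that Matplotlib uses internally:

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

dates = [datetime(2020, 1, 1), datetime(2020, 2, 1),
         datetime(2020, 3, 1), datetime(2020, 4, 1)]
values = [100, 110, 60, 40]  # made-up values

fig, ax = plt.subplots()
ax.plot(dates, values)

# Convert the date of interest to Matplotlib's numeric date coordinate
event = mdates.date2num(datetime(2020, 3, 1))
ax.axvline(event, color="black", linestyle="--")

# ax.text(x-coordinate, y-coordinate, string, additional text formatting)
ax.text(event, 105, "Key event", fontsize=10)
```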
For more info on annotations, visit: https://matplotlib.org/stable/tutorials/text/annotations.html#sphx-glr-tutorials-text-annotations-py
*Copyright Maven Analytics, LLC
REMOVING CHART BORDERS
Results Preview
NEW MESSAGE
August 30, 2022
Hi there!
The data you plotted earlier looks good, but can you clean up
the chart a little bit? I want it to look polished for our client.
This is my last day in my summer internship and I want to get
hired back!
Thanks!
section02_assignments.ipynb
Solution Code
section02_solutions.ipynb
PRO TIPS
Pivot tabular data to turn each unique series into a DataFrame column, and set the datetime as the index
Divide your series by the appropriate units while plotting to simplify the y-axis scale
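These two tips might look like this in practice. The DataFrame and its column names (`date`, `category`, `revenue`) are hypothetical, not the course dataset:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical tabular data: one row per date and category
sales = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-02-01", "2022-02-01"]),
    "category": ["Lodging", "Other", "Lodging", "Other"],
    "revenue": [2_000_000, 500_000, 2_400_000, 650_000],
})

# Pivot: each unique series becomes a column, with the datetime as the index
wide = sales.pivot_table(index="date", columns="category",
                         values="revenue", aggfunc="sum")

fig, ax = plt.subplots()
# Divide by 1,000,000 while plotting so the y-axis reads in millions
ax.plot(wide.index, wide["Lodging"] / 1_000_000, label="Lodging")
ax.plot(wide.index, wide["Other"] / 1_000_000, label="Other")
ax.set_ylabel("Revenue ($M)")
ax.legend()
```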
Use stackplot() to create a stacked line chart, which lets you visualize the overall
trend over time, as well as its composition by series
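A minimal stackplot() sketch with made-up monthly values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

months = [1, 2, 3, 4]
lodging = [20, 22, 25, 30]  # made-up values
other = [5, 6, 6, 8]

fig, ax = plt.subplots()
# Each series is stacked on top of the previous one
areas = ax.stackplot(months, lodging, other, labels=["Lodging", "Other"])
ax.legend(loc="upper left")
```

The top edge of the stack shows the overall trend, while the height of each band shows that series' contribution.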
Use twinx() to create a dual axis chart, which lets you plot series with values on
significantly different scales inside a single visual
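A dual axis sketch using twinx(). The revenue and occupancy figures are made up, and "Occupancy rate" is just an example of a series on a very different scale:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

months = [1, 2, 3, 4]
revenue = [1_000_000, 1_200_000, 900_000, 1_500_000]  # made-up values
occupancy = [0.61, 0.64, 0.58, 0.70]

fig, ax = plt.subplots()
ax.plot(months, revenue, color="tab:blue")
ax.set_ylabel("Revenue ($)")

# twinx() creates a second axis that shares the x-axis but has its own y-axis
ax2 = ax.twinx()
ax2.plot(months, occupancy, color="tab:orange")
ax2.set_ylabel("Occupancy rate")
```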
Results Preview
NEW MESSAGE
August 30, 2022
Hey again,
Thanks!
section02_assignments.ipynb
Solution Code
section02_solutions.ipynb
PRO TIPS
Use .groupby() and .agg() to aggregate your data by category and push the labels into the index
Use Seaborn or the Pandas plot API for grouped bar charts
Use the “color” argument to highlight the series you’d like to focus on
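A sketch of the aggregation and highlighting tips. The `bookings` DataFrame, its columns, and the choice of which bar to highlight are all hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

bookings = pd.DataFrame({
    "country": ["PRT", "FRA", "PRT", "DEU", "FRA", "PRT"],
    "nights": [3, 2, 5, 4, 1, 2],
})

# Aggregate by category; the category labels end up in the index
totals = (bookings.groupby("country")
          .agg({"nights": "sum"})
          .sort_values("nights", ascending=False))

# One color per bar: highlight a single category, mute the rest
colors = ["orange" if country == "FRA" else "grey" for country in totals.index]

fig, ax = plt.subplots()
ax.bar(totals.index, totals["nights"], color=colors)
```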
Results Preview
NEW MESSAGE
September 1, 2022
Hello,
I need YOU to step up and make sure they’re happy with us.
Start by taking a quick look at room nights and lodging by
country for our top 10 countries by total nights booked.
-S
section02_assignments.ipynb
Solution Code
section02_solutions.ipynb
You can create a stacked bar chart by setting the “bottom” argument for the
second “stacked” series as the values from the bars below it
• This will use those values as the baseline for the stacked bars instead of the x-axis
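A stacked bar sketch using the “bottom” argument, with made-up country values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

countries = ["PRT", "FRA", "DEU"]
lodging = [50, 40, 30]  # made-up values
other = [20, 10, 15]

fig, ax = plt.subplots()
ax.bar(countries, lodging, label="Lodging")
# The second series starts where the first one ends
stacked = ax.bar(countries, other, bottom=lodging, label="Other")
ax.legend()
```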
You can create a grouped bar chart by reducing the width of each series and
shifting them evenly around their corresponding label
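A grouped bar sketch for two series. The bar width of 0.4 and the data are arbitrary choices for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

countries = ["PRT", "FRA", "DEU"]
lodging = [50, 40, 30]  # made-up values
other = [20, 10, 15]

x = np.arange(len(countries))  # numeric position of each label
width = 0.4                    # narrower than the default so two bars fit per label

fig, ax = plt.subplots()
ax.bar(x - width / 2, lodging, width=width, label="Lodging")  # shifted left
ax.bar(x + width / 2, other, width=width, label="Other")      # shifted right
ax.set_xticks(x)
ax.set_xticklabels(countries)
ax.legend()
```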
You can create a combo chart by specifying different chart types in a dual axis plot
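A combo chart sketch: bars on the primary axis and a line on the secondary axis. The revenue and growth values are made up:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

months = [1, 2, 3, 4]
revenue = [100, 120, 90, 150]        # made-up values
growth = [0.00, 0.20, -0.25, 0.67]

fig, ax = plt.subplots()
ax.bar(months, revenue, color="lightgrey")
ax.set_ylabel("Revenue")

ax2 = ax.twinx()  # second y-axis for the line series
ax2.plot(months, growth, color="black", marker="o")
ax2.set_ylabel("Growth rate")
```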
Results Preview
NEW MESSAGE
September 2, 2022
Hello,
Build a grouped bar chart with the lodging revenue and other
revenue for each country. Then, build a 100% stacked bar
chart showing how much each revenue category contributes
to overall country revenue. Add a reference line at 80% to
help illustrate which countries get less than 80% of their
revenue from lodging.
-S
section02_assignments.ipynb
Solution Code
section02_solutions.ipynb
PRO TIPS
Keep the number of slices low (<7) to enhance readability – you can group “others” into a single slice
Use bar charts if you want to compare the categories – pies are for showing how they make up a whole
Donut charts make great KPI progress trackers
PIE CHARTS
You can create a donut chart by adding a “hole” to a pie chart and shifting the labels
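One possible way to build a donut chart is to draw a white circle over the center of a pie chart and move the percentage labels outward with “pctdistance”. This is a sketch under that assumption (the course notebook may do it differently), with made-up slice values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

shares = [45, 30, 25]  # made-up values
labels = ["A", "B", "C"]

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    shares, labels=labels,
    autopct="%.0f%%",
    pctdistance=0.85,  # shift the percentage labels onto the ring
)

# The "hole": a white circle drawn over the center of the pie
hole = Circle((0, 0), 0.7, color="white")
ax.add_artist(hole)
```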
Results Preview
NEW MESSAGE
September 3, 2022
Hello,
Need it ASAP.
Thx
section02_assignments.ipynb
Solution Code
section02_solutions.ipynb
PRO TIPS
Modify the alpha (transparency) level to make overlapping points more visible
Bubble charts can be useful in some cases, but they often add confusion rather than clarity
To create a bubble chart, specify a third series in the size (“s”) argument of .scatter()
• You may need to apply some arithmetic to adjust the bubble sizes
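A bubble chart sketch. The three series are made up, and dividing by 5 is an arbitrary scaling choice to keep the bubbles readable:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

price = [10, 20, 30]       # made-up values
rating = [3.5, 4.0, 4.5]
volume = [100, 400, 900]   # third series, shown as bubble size

# Scale the third series so the bubbles are a sensible size (in points^2)
sizes = [v / 5 for v in volume]

fig, ax = plt.subplots()
points = ax.scatter(price, rating, s=sizes, alpha=0.5)
```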
numerical series
PRO TIPS
Modify the alpha (transparency) level to plot multiple distributions on the same axis
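A sketch of two overlapping histograms on one axis. The data is randomly generated for illustration, and reusing the first histogram's bin edges for the second is a design choice so the bars line up:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
group_a = rng.normal(50, 10, 500)  # made-up distributions
group_b = rng.normal(60, 15, 500)

fig, ax = plt.subplots()
# alpha < 1 keeps both distributions visible where they overlap
counts_a, edges, _ = ax.hist(group_a, bins=20, alpha=0.5, label="Group A")
counts_b, _, _ = ax.hist(group_b, bins=edges, alpha=0.5, label="Group B")
ax.legend()
```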
Results Preview
NEW MESSAGE
September 4, 2022
section02_assignments.ipynb
Solution Code
section02_solutions.ipynb
Matplotlib has two methods for plotting data: PyPlot API & Object Oriented
• Both can visualize many data types (lists, DataFrames, etc.), but object-oriented plots are easier to fully customize
NEW MESSAGE
September 7, 2022
From: Sarah Shark (Managing Director)
Subject: Coffee Industry Deep Dive
Hi there,
Key Objectives
1. Read in data from multiple csv files
2. Reshape the data to prepare it for visualization
3. Build & customize charts to communicate the key insights to the client
section03_coffee_project_part1.ipynb
Subplots let you create a grid of equally sized charts in a single figure
• fig, ax = plt.subplots(rows, columns) – this creates a grid with the specified rows & columns
• Each chart in the grid is referenced by its (row, column) position: (0, 0), (0, 1), (1, 0), (1, 1)
Use the “sharex” & “sharey” arguments to set the same axis limits on all the plots
• This is set as “none” by default, but can be set to “all”, “row”, or “col”
Subplots can be any chart type, and do not have to be the same type
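A 2x2 subplot sketch that mixes chart types and shares the y-axis within each row. All values are made up:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# ax is a 2x2 array of axes, indexed by (row, column)
fig, ax = plt.subplots(2, 2, sharey="row")

ax[0, 0].plot([1, 2, 3], [1, 3, 2])
ax[0, 1].bar(["A", "B", "C"], [5, 2, 4])
ax[1, 0].scatter([1, 2, 3], [2, 1, 3])
ax[1, 1].hist([1, 1, 2, 3, 3, 3], bins=3)
```

With `sharey="row"`, the two charts in each row end up with identical y-axis limits.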
Results Preview
NEW MESSAGE
September 10, 2022
Hey there,
Wendy
Section04_assignments.ipynb
Solution Code
Section04_solutions.ipynb
You can build layouts with charts of varying sizes by setting a gridspec object
• This creates a grid with a specified number of rows & columns
• Each axis, or chart, can then occupy a group of squares in the grid
[Figure: a grid with eight rows (Row 0 to Row 7); ax1 and ax2 sit side by side in the upper rows and ax3 sits in the lower rows]
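A GridSpec sketch along these lines. The 8x8 grid size and the exact row and column slices are assumptions chosen for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 8))
gs = gridspec.GridSpec(nrows=8, ncols=8)  # the underlying grid of squares

# Each axis claims a block of squares using slice notation: gs[rows, columns]
ax1 = fig.add_subplot(gs[0:4, 0:4])  # upper left
ax2 = fig.add_subplot(gs[0:4, 4:8])  # upper right
ax3 = fig.add_subplot(gs[4:8, :])    # full width along the bottom

ax1.plot([1, 2, 3], [1, 2, 3])
ax2.bar(["A", "B"], [3, 5])
ax3.plot([1, 2, 3], [3, 1, 2])
```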
Results Preview
NEW MESSAGE
September 12, 2022
Hi there,
Thanks!
section04_assignments.ipynb
Solution Code
GridSpec Layout (see notebook for chart code):
NEW MESSAGE
September 12, 2022
From: Sarah Shark (Managing Director)
Subject: Revenue Report Format
Hi there,
Thanks!
section04_solutions.ipynb
You can also loop through a list of colors to pass them to separate series in a plot
You can also modify the entire color palette for the series in a plot
rcParams are the underlying settings for Matplotlib charts and can be
modified to gain a high level of customization (more on these soon!)
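A sketch of both color approaches. The hex codes and series are arbitrary examples, and `set_prop_cycle()` is used here as one way to swap the palette for a single axis (the same setting can also be changed globally through the “axes.prop_cycle” rcParam):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

series = {"A": [1, 2, 3], "B": [2, 3, 5], "C": [3, 3, 4]}  # made-up data
colors = ["#1b9e77", "#d95f02", "#7570b3"]                  # arbitrary hex codes

# Option 1: loop through a list of colors, one per series
fig, ax = plt.subplots()
for (name, values), color in zip(series.items(), colors):
    ax.plot(values, label=name, color=color)

# Option 2: change the whole palette so each new series picks the next color
fig2, ax2 = plt.subplots()
ax2.set_prop_cycle(color=plt.get_cmap("Set2").colors)
for name, values in series.items():
    ax2.plot(values, label=name)
```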
For more on color palettes, visit: https://matplotlib.org/3.5.0/tutorials/colors/colormaps.html
ASSIGNMENT: COLORS
Results Preview
NEW MESSAGE
September 13, 2022
Hi again,
Love the layout, HATE the colors! Let’s show some polish by
getting away from the defaults.
Apply the “Set2” colormap to the line chart and look up the
national color hex codes for the top 5 countries to use them
for the rest of the charts.
Thanks,
Sarah
section04_assignments.ipynb
Solution Code
Apply Set2 (see notebook for chart code):
Donut Chart
section04_solutions.ipynb
Matplotlib (and Seaborn) have style sheets that can be used instead of the default
• You can still customize individual formatting options after setting a style
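A style sheet sketch. “ggplot” is just one of the built-in styles, and `plt.style.context()` is used so the style only applies inside the `with` block (calling `plt.style.use()` instead would apply it to every chart that follows):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# plt.style.available lists the style sheet names that can be used
style_names = plt.style.available

with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 3])
    # Individual formatting options still work on top of the style
    ax.set_title("Styled chart", fontsize=14)
```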
Results Preview
NEW MESSAGE
September 14, 2022
Hi,
Layout and colors look great now, but can we spruce up the
chart styling?
Thx
-S
section04_assignments.ipynb
Solution Code
Style Setting Only (see notebook for chart code):
NEW MESSAGE
September 14, 2022
From: Sarah Shark (Managing Director)
Subject: Re: Re: Revenue Report Format
Hi,
Layout and colors look great now, but can we spruce up the chart styling?
Thx
-S
section04_solutions.ipynb
Viewing the parameters of a style sheet can help format charts properly and provide
inspiration for your own formatting changes
There are 300+ parameters that can be modified, which fall into parameter groups:
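A sketch for inspecting a style sheet and the rcParams groups. Grouping parameters by the text before the first dot is an assumption about how to summarize them, and the two spine settings are just example overrides:

```python
import matplotlib as mpl
mpl.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# The parameters a given style sheet sets (a dict-like object)
ggplot_params = plt.style.library["ggplot"]

# Parameter names look like "group.setting", e.g. "axes.titlesize"
groups = sorted({name.split(".")[0] for name in mpl.rcParams})

# Temporarily override a few parameters for the charts built in this block
with mpl.rc_context({"axes.spines.top": False, "axes.spines.right": False}):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [1, 4, 9])
```

Printing `ggplot_params` or `groups` in a notebook is a quick way to see which settings are available to change.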
NEW MESSAGE
September 18, 2022
From: Clarissa Café (Coffee Client)
Subject: Summary Report
Hi there,
Sarah told me to reach out directly to you – we loved the work you did on breaking down the industry, but we want to summarize your findings on Brazil into a single figure we can pass around.
Can you combine your findings into a single figure report? We’ll also want to modify colors. There are more details in the attached notebook.
Thanks!
Clarissa
Key Objectives
1. Read in data from multiple csv files
2. Reshape the data with Pandas to set up charts
3. Build and customize line charts, bar charts, histograms and more to communicate key insights to our client
4. Modify chart colors to represent national flags
5. Combine modified charts into a single report by leveraging meshgrid and subplots
section05_coffee_project_part2.ipynb
In this section we’ll cover data visualization with Seaborn, another Python library that
introduces new chart types and layouts, and interacts well with Matplotlib
Seaborn is a Python library built for easily visualizing Pandas DataFrames,
taking away some of the “drawing” required when using Matplotlib
You can apply chart formatting to Seaborn plots using Matplotlib arguments
• These are passed to the Matplotlib object that Seaborn creates internally
We’ll cover integration with Matplotlib later, which is where you’ll be able to
leverage the chart formatting skills you’ve learned throughout the course
Seaborn still has some useful chart formatting functions like despine()
Note that Seaborn automatically aggregates the data for the plot, using unique category values as the labels
for the bars, the mean of each category for the bar length, and the column headers as the axis labels
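A small sketch of this automatic aggregation. The DataFrame and its column names (`country`, `room_nights`) are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "country": ["PRT", "PRT", "FRA", "FRA"],
    "room_nights": [2, 4, 3, 5],
})

# No groupby needed: Seaborn averages room_nights for each country
ax = sns.barplot(data=df, x="country", y="room_nights")
```

Here the PRT bar is drawn at the mean of 2 and 4, the FRA bar at the mean of 3 and 5, and the axis labels come from the column headers.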
Results Preview
NEW MESSAGE
September 20, 2022
Hi,
Then, build a bar chart with the average room nights stayed
for our top 5 countries.
Thanks
section06_assignments.ipynb
Solution Code
section06_solutions.ipynb
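Boxplot statistics:
• Median (50th percentile)
• Q1 and Q3 (25th and 75th percentiles)
• IQR (interquartile range, Q3 - Q1)
• Min and Max (whisker ends; the upper whisker extends to at most Q3 + 1.5*IQR)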
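The boxplot statistics can be computed by hand to see where each element of the chart comes from. This sketch uses made-up values and NumPy's default percentile method, which may differ slightly from other quartile conventions:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # made-up data with one extreme value

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr   # points above this are drawn as outliers
lower_fence = q1 - 1.5 * iqr   # points below this are drawn as outliers
outliers = values[(values > upper_fence) | (values < lower_fence)]

fig, ax = plt.subplots()
ax.boxplot(values)
```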
Results Preview
NEW MESSAGE
September 24, 2022
Hi,
Then filter the data to the top 5 countries and build a violin
plot of their lodging revenue, as well as their age distribution.
Sarah
section06_assignments.ipynb
Solution Code
section06_solutions.ipynb
sns.jointplot(x, y, kind, data) creates a scatterplot and adds the distribution for each variable
sns.lmplot() lets you explore the impact of other variables on the relationship
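A sketch of sns.lmplot() with a third variable passed to “hue”, which fits a separate regression line per group. The DataFrame and its column names (`nights`, `revenue`, `segment`) are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "nights": [1, 2, 3, 4, 1, 2, 3, 4],
    "revenue": [100, 210, 290, 405, 80, 150, 240, 310],
    "segment": ["A"] * 4 + ["B"] * 4,
})

# One scatter + fitted line per segment, drawn on the same chart
grid = sns.lmplot(data=df, x="nights", y="revenue", hue="segment")
```

Note that lmplot() returns a Seaborn grid object that manages its own figure, rather than a single Matplotlib axis.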
Results Preview
NEW MESSAGE
September 26, 2022
Hi there,
First for all the data and then for each top 5 country.
Best,
Wendy
section06_assignments.ipynb
Solution Code
section06_solutions.ipynb
Results Preview
NEW MESSAGE
September 26, 2022
Hi there,
Thanks,
Wendy
section06_assignments.ipynb
Solution Code
section06_solutions.ipynb
You can build Seaborn plots in Matplotlib objects, which lets you customize and
integrate Seaborn charts as if they were built using Matplotlib
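A sketch of a Seaborn chart placed inside a Matplotlib subplot grid through the “ax” argument. The DataFrame and its column names are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "country": ["PRT", "PRT", "PRT", "FRA", "FRA", "FRA"],
    "revenue": [100, 150, 130, 90, 200, 160],
})

fig, ax = plt.subplots(1, 2, figsize=(10, 4))

# Seaborn draws into the Matplotlib axis passed through "ax"
sns.boxplot(data=df, x="country", y="revenue", ax=ax[0])
ax[0].set_title("Revenue by country")  # regular Matplotlib formatting

# A plain Matplotlib chart can sit right next to it
ax[1].hist(df["revenue"], bins=5)

sns.despine()  # removes the top and right borders from the current figure's charts
```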
Seaborn adds new chart types that are useful in exploring data
• Boxplots, violin plots, and linear model plots help profile data and identify relationships between variables
NEW MESSAGE
October 10, 2022
From: Aaron Auto (VP of Fleet Management)
Subject: Optimal Fleet Truck Purchase
Hello,
Thanks
Key Objectives
1. Read in and manipulate data with Pandas
2. Build summary charts with Matplotlib and Seaborn
3. Leverage Seaborn’s advanced chart types to mine insights from the data and make a decision
section07_final_project.ipynb