Python Dataviz
Data Visualization
Data Visualization in Python
You already know basic concepts of visualization, and there are many
courses that go in depth. Here we’ll learn how to manipulate the data
and parameters of the visualizations available in the SciPy stack.
- Matplotlib
    - Matlab-like plotting interface
    - The granddaddy of all scientific plotting in Python
    - Powerful, low-level
    - Built on NumPy arrays
- Seaborn
    - Higher-level API on top of Matplotlib
    - Integrates with Pandas DataFrames
- Bokeh or Plotly/Dash
    - Interactive visualizations like D3
Matplotlib
Standard import:
import matplotlib.pyplot as plt
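Three contexts:
- Python script: call plt.show() at the end of the script to open the
  figure window(s).
- IPython shell: after the %matplotlib magic, any plot command opens a
  figure window. Force a redraw with plt.draw().
- Jupyter Notebook, two options:
    - %matplotlib notebook
      Embed interactive plots in the notebook.
    - %matplotlib inline
      Embed static plot images in the notebook. We’ll usually use this option.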
Matlab-style Interface
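It’s easy to produce a simple plot using the stateful Matlab-style
interface:
import numpy as np
xs = np.linspace(0, 10, 100)  # 100 evenly spaced points on [0, 10]
plt.figure()                  # create a new figure and make it current
plt.plot(xs, np.sin(xs))      # draw on the current figure's axes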
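To make "stateful" concrete: pyplot keeps track of a current figure and current axes, and each plt call acts on them. The following is a minimal sketch, not part of the original slides. It assumes the non-interactive Agg backend so that it runs without a display, and the variable names same_figure and line_on_axes are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

xs = np.linspace(0, 10, 100)

fig = plt.figure()                # this figure becomes the "current figure"
lines = plt.plot(xs, np.sin(xs))  # draws on the current axes, creating them if needed

# The stateful calls and the object handles refer to the same objects.
ax = plt.gca()                    # the axes that plt.plot just drew on
same_figure = plt.gcf() is fig
line_on_axes = lines[0] in ax.lines
```

Holding on to fig and ax, as the later slides do, lets you switch to the object-oriented calls (ax.plot, ax.set_title) at any point.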
Figures
The commands above produce a figure like this:
[Figure: plot of sin(x) for x from 0 to 10]
plt.subplots
Matplotlib includes a convenience function that creates a figure and a
grid of subplots in one call.
In [20]: fig, axes = plt.subplots(2, 3)
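As a rough illustration of what the call returns: axes is a 2x3 NumPy array of Axes objects, which can be indexed like any array or iterated over with axes.flat. The sketch below assumes the Agg backend so it runs without a display. Plotting sin(kx) in each panel is only an illustrative choice, not something from the slides.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

xs = np.linspace(0, 10, 100)
fig, axes = plt.subplots(2, 3)  # axes is a 2x3 NumPy array of Axes objects

# Fill the panels row by row; k is just an illustrative frequency.
for k, ax in enumerate(axes.flat, start=1):
    ax.plot(xs, np.sin(k * xs))
    ax.set_title(f"sin({k}x)")

fig.tight_layout()  # reduce overlap between titles and tick labels
axes_shape = axes.shape
```

A single panel can also be addressed directly, for example axes[1, 2] for the bottom-right subplot.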
Ticks and Labels
Ticks and labels are set automatically but can be customized.
fig, ax = plt.subplots(1, 1)
ax.plot(xs, np.sin(xs), "-r")
# Fix the tick positions first so that each label lines up with a tick.
ax.set_xticks([0, 2, 4, 6, 8, 10])
ax.set_xticklabels(["wake up", "coffee kicks in", "afternoon class",
                    "afternoon espresso", "party time!", "sleepy time"],
                   rotation=45, fontsize="small")
ax.set_title("Student Biorhythm")
SPX Example
To start, download historical S&P 500 index data, spx.csv
(source: https://github.com/wesm/pydata-book).
import pandas as pd
# Use the first column (dates) as the index and parse it as datetimes.
spx = pd.read_csv('spx.csv', index_col=0, parse_dates=True)
fig, ax = plt.subplots(1, 1)
ax.plot(spx.index, spx['SPX'])
Annotations
Define annotation data (note use of collections.namedtuple).
import collections
import datetime as dt
Annotation = collections.namedtuple('Annotation', ['label', 'date'])
events = [Annotation(label="Peak bull market", date=dt.datetime(2007, 10, 11)),
          Annotation(label="Bear Stearns fails", date=dt.datetime(2008, 3, 12)),
          Annotation(label="Lehman bankruptcy", date=dt.datetime(2008, 9, 15))]
Zoom in on the period of interest and add annotations.
prices = spx['SPX']  # a Series, so asof() returns a single value
ax.set(xlim=[dt.datetime(2007, 1, 1), dt.datetime(2011, 1, 1)], ylim=[600, 1800])
for event in events:
    price = prices.asof(event.date)  # last available value on or before the date
    ax.annotate(event.label,
                xy=(event.date, price + 20),
                xytext=(event.date, price + 200),
                arrowprops=dict(facecolor="black",
                                headwidth=4, width=1, headlength=4),
                horizontalalignment="left", verticalalignment="top")
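The annotation code relies on asof to look up a price for a date that may not be in the index, such as a weekend or holiday. The sketch below shows that lookup feeding ax.annotate. It uses a synthetic, linearly falling series as a stand-in for the index data (these are not real prices), a made-up event label, and the Agg backend so it runs without a display.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the index data (not real prices): business days only.
dates = pd.date_range("2008-09-01", "2008-09-30", freq="B")
prices = pd.Series(np.linspace(1280.0, 1160.0, len(dates)), index=dates)

event_date = dt.datetime(2008, 9, 14)  # a Sunday, so it is absent from the index
price = prices.asof(event_date)        # falls back to Friday 2008-09-12

fig, ax = plt.subplots()
ax.plot(prices.index, prices.to_numpy())
ax.set_ylim(1100, 1600)  # leave room above the curve for the label
note = ax.annotate("hypothetical event",
                   xy=(event_date, price + 20),       # arrow tip, just above the curve
                   xytext=(event_date, price + 200),  # label position, further up
                   arrowprops=dict(facecolor="black",
                                   headwidth=4, width=1, headlength=4),
                   horizontalalignment="left", verticalalignment="top")
```

Here xy is the point the arrow points at and xytext is where the label is drawn, both in data coordinates.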
Saving Plots to Files
Save a figure with fig.savefig. The graphics encoding is inferred from the
file extension (the part after the last "."). You can list the supported
file types and their file name extensions on your system with:
In [77]: fig.canvas.get_supported_filetypes()
Out[77]:
{'eps': 'Encapsulated Postscript',
 'pdf': 'Portable Document Format',
 'pgf': 'PGF code for LaTeX',
 'png': 'Portable Network Graphics',
 'ps': 'Postscript',
 'raw': 'Raw RGBA bitmap',
 'rgba': 'Raw RGBA bitmap',
 'svg': 'Scalable Vector Graphics',
 'svgz': 'Scalable Vector Graphics'}
Note that there’s no need to show the figure before saving it.
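A minimal sketch of saving without ever showing the figure, assuming the Agg backend and a temporary directory. The file name sine.png is hypothetical. When writing to a file-like object there is no extension to inspect, so the format is passed explicitly.

```python
import io
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so nothing is displayed
import matplotlib.pyplot as plt
import numpy as np

xs = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(xs, np.sin(xs))

# Saving to a path: the ".png" extension selects the PNG encoder.
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "sine.png")  # hypothetical file name
fig.savefig(png_path, dpi=150)

# A file-like object has no extension, so pass format= explicitly.
buffer = io.BytesIO()
fig.savefig(buffer, format="svg")
svg_text = buffer.getvalue().decode("utf-8")
```

The dpi argument only matters for raster formats such as PNG. Vector formats such as SVG and PDF scale without loss.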
Source: https://vis.gatech.edu/