Python Data Vis
You will issue two plt.plot() commands to draw line plots of different colors on the
same set of axes. Here, year represents the x-axis, while physical_sciences and
computer_science are the y-values.
# Import matplotlib.pyplot
import matplotlib.pyplot as plt
Using axes()
Rather than overlaying line plots on common axes, you may prefer to plot different
line plots on distinct axes. The command plt.axes() is one way to do this (but it
requires specifying coordinates relative to the size of the figure).
Calling plt.axes([xlo, ylo, width, height]) creates a set of axes and makes it active,
with its lower-left corner at (xlo, ylo) and the specified width and height. These
coordinates can be passed to plt.axes() in the form of a list or a tuple. The
coordinates and lengths are values between 0 and 1 representing lengths relative to
the dimensions of the figure. After a call to plt.axes(), subsequent plots are drawn
in that set of axes.
# Create plot axes for the first line plot: blue for %women Phys Sci degree
plt.axes([.05,.05,.425,.9])
plt.plot(year, physical_sciences, color='blue')
# Create plot axes for the second line plot: red %women Comp-Sci
plt.axes([.525, .05, .425, .9])
plt.plot(year, computer_science, color='red')
# Create a figure with 1x2 subplot and make the left subplot active:
# blue for %women Physical-Sciences
plt.subplot(1,2,1)
plt.plot(year, physical_sciences, color='blue')
plt.title('Physical Sciences')
# Make the right subplot active in the current 1x2 subplot grid:
# red for %women Computer Science
plt.subplot(1,2,2)
plt.plot(year, computer_science, color='red')
plt.title('Computer Science')
# Create a figure with 2x2 grid, and make the top left subplot active:
# plot in blue the % of degrees awarded to women in the Physical Sciences
plt.subplot(2,2,1)
plt.plot(year, physical_sciences, color='blue')
plt.title('Physical Sciences')
# Plot the % of degrees awarded to women in Computer Science and the
# Physical Sciences
plt.plot(year,computer_science, color='red')
plt.plot(year, physical_sciences, color='blue')
Using axis()
plt.xlim() and plt.ylim() are useful for setting the axis limits individually. In
this exercise, you will see how you can pass a 4-tuple to plt.axis() to set limits
for both axes at once. For example, plt.axis((1980,1990,0,75)) would set the extent
of the x-axis to the period between 1980 and 1990, and would set the y-axis extent
from 0 to 75% of degrees awarded.
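A minimal sketch of the plt.axis() call described above. The arrays are synthetic
stand-ins (an assumption, not the course data), and the Agg backend is selected only
so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the course arrays (assumed values)
year = np.arange(1970, 2012)
physical_sciences = np.linspace(14, 42, year.size)
computer_science = np.linspace(13, 18, year.size)

plt.plot(year, physical_sciences, color='blue')
plt.plot(year, computer_science, color='red')

# One call sets (xmin, xmax, ymin, ymax) on the current axes
plt.axis((1980, 1990, 0, 75))

# Read the limits back to confirm what was set
xlim = plt.gca().get_xlim()
ylim = plt.gca().get_ylim()
```

The same limits could be set in two steps with plt.xlim((1980, 1990)) and
plt.ylim((0, 75)).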
Using legend()
Legends are useful for distinguishing between multiple datasets displayed on common
axes. The relevant data are created using specific line colors or markers in
various plot commands. Using the keyword argument label in the plotting function
associates a string to use in a legend.
For example, here, you will plot enrollment of women in the Physical Sciences and
in Computer Science over time. You can label each curve by passing a label argument
to the plotting call, and request a legend using plt.legend(). Specifying the
keyword argument loc determines where the legend will be placed.
Using annotate()
Plot enrollment of women in the Physical Sciences and Computer Science over time,
with a legend. Additionally, use plt.annotate() to mark the point at which enrollment
of women in Computer Science reached a peak and started declining.
To enable an arrow, set arrowprops=dict(facecolor='black'). The arrow will point to
the location given by xy and the text will appear at the location given by xytext.
Modifying styles
Matplotlib comes with a number of different stylesheets to customize the overall
look of different plots. To activate a particular stylesheet you can simply call
plt.style.use() with the name of the style sheet you want. To list all the
available style sheets you can execute: print(plt.style.available).
# Import matplotlib.pyplot
import matplotlib.pyplot as plt
# Add annotation
cs_max = computer_science.max()
yr_max = year[computer_science.argmax()]
plt.annotate('Maximum', xy=(yr_max, cs_max), xytext=(yr_max-1, cs_max-10),
arrowprops=dict(facecolor='black'))
Generating meshes
To visualize two-dimensional arrays of data, it is necessary to understand how to
generate and manipulate 2-D arrays. Many Matplotlib plots support arrays as input
and in particular, they support NumPy arrays. The NumPy library is the most widely
supported means of working with numeric arrays in Python.
Use the meshgrid function in NumPy to generate 2-D arrays, then visualize them using
plt.imshow(). The simplest way to generate a meshgrid is as follows:
import numpy as np
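X,Y = np.meshgrid(range(10),range(20))
This will create two arrays with a shape of (20,10): 20 rows along the Y-axis and
10 columns along the X-axis. np.meshgrid() returns its arrays in the same order as
its arguments, so X (built from range(10)) varies along the columns and Y (built
from range(20)) varies along the rows.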
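The shapes are easy to check directly. This sketch only inspects the arrays that
np.meshgrid() returns; the names first and second are illustrative and chosen to
stress that the outputs follow the order of the inputs.

```python
import numpy as np

# Outputs come back in the same order as the inputs
first, second = np.meshgrid(range(10), range(20))

shape_first = first.shape    # (20, 10): 20 rows, 10 columns
shape_second = second.shape  # same shape as first

# The array built from range(10) changes along each row (across columns)
top_row = first[0].tolist()

# The array built from range(20) changes down each column (across rows)
left_column = second[:, 0].tolist()
```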
import numpy as np
import matplotlib.pyplot as plt
Array orientation
[Figure: pseudocolor plot of the array A]
The commands
plt.pcolor(A, cmap='Blues')
plt.colorbar()
plt.show()
produce the pseudocolor plot above using a NumPy array A. Which of the commands
below could have generated A?
In this exercise, you will visualize a 2-D array repeatedly using both
plt.contour() and plt.contourf(). You will use plt.subplot() to display several
contour plots in a common figure, using the meshgrid X, Y as the axes. For example,
plt.contour(X, Y, Z) generates a default contour map of the array Z.
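A short sketch of both contour functions on a shared meshgrid. The function used for
Z and the grid resolution are arbitrary choices made for illustration; the Agg
backend is used only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Build the meshgrid that serves as the x and y coordinates
u = np.linspace(-2, 2, 41)
v = np.linspace(-1, 1, 21)
X, Y = np.meshgrid(u, v)

# Arbitrary example surface (an assumption, not the course array)
Z = np.sin(3 * np.sqrt(X**2 + Y**2))

fig = plt.figure()

# Top subplot: contour lines at 20 levels
plt.subplot(2, 1, 1)
lines = plt.contour(X, Y, Z, 20)

# Bottom subplot: filled contours at 20 levels
plt.subplot(2, 1, 2)
filled = plt.contourf(X, Y, Z, 20, cmap='viridis')

plt.tight_layout()
```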
Modifying colormaps
When displaying a 2-D array with plt.imshow() or plt.pcolor(), the values of the
array are mapped to a corresponding color. The set of colors used is determined by
a colormap which smoothly maps values to colors, making it easy to understand the
structure of the data at a glance.
It is often useful to change the colormap from matplotlib's default ('viridis' since
Matplotlib 2.0; earlier versions defaulted to 'jet'). A good colormap is visually
pleasing and conveys the structure of the data faithfully and in a way that makes
sense for the application.
matplotlib colormaps
The option cmap=<name> in most matplotlib functions changes the colormap of the
resulting plot. Colormap names fall into a few groups:
Unique names: 'jet', 'coolwarm', 'magma' and 'viridis'.
Overall color: 'Greens', 'Blues', 'Reds', and 'Purples'.
Seasons: 'summer', 'autumn', 'winter' and 'spring'.
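For instance, passing cmap to plt.imshow() might look like the sketch below. The
array Z is a made-up gradient used only to have something to display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Made-up 2-D array: a smooth gradient (assumption, for illustration only)
Z = np.outer(np.linspace(0, 1, 20), np.linspace(0, 1, 30))

# cmap selects the colormap by name
image = plt.imshow(Z, cmap='Blues')
plt.colorbar()

# The image object remembers which colormap it was given
cmap_name = image.get_cmap().name
```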
Using hist2d()
Given a set of ordered pairs describing data points, you can count the number of
points with similar values to construct a two-dimensional histogram. This is
similar to a one-dimensional histogram, but it describes the joint variation of two
random variables rather than just one.
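A small sketch of plt.hist2d(). The hp and mpg arrays here are randomly generated
stand-ins for the auto-mpg columns (an assumption), and the bin counts are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Randomly generated stand-ins for the auto-mpg columns (assumed values)
rng = np.random.default_rng(0)
hp = rng.uniform(50, 230, 500)
mpg = 45 - 0.15 * hp + rng.normal(0, 3, 500)

# bins=(nx, ny) gives the number of rectangular bins along each axis
counts, xedges, yedges, image = plt.hist2d(hp, mpg, bins=(20, 20))
plt.colorbar()
plt.xlabel('Horse power [hp]')
plt.ylabel('Miles per gallon [mpg]')

# Every point falls in exactly one bin when no range is given
total_points = counts.sum()
```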
Using hexbin()
The function plt.hist2d() uses rectangular bins to construct a two-dimensional
histogram. As an alternative, the function plt.hexbin() uses hexagonal bins. The
underlying algorithm (based on this article from 1987) constructs a hexagonal
tessellation of a planar region and aggregates points inside hexagonal bins.
The optional gridsize argument (default 100) gives the number of hexagons across
the x-direction used in the hexagonal tiling. If specified as a list or a tuple of
length two, gridsize fixes the number of hexagons in the x- and y-directions
respectively in the tiling.
The optional parameter extent=(xmin, xmax, ymin, ymax) specifies the rectangular
region covered by the hexagonal tiling. In that case, xmin and xmax are the
respective lower and upper limits for the variables on the x-axis and ymin and ymax
are the respective lower and upper limits for the variables on the y-axis.
In this exercise, you'll use the same auto-mpg data as in the last exercise (again
using arrays mpg and hp). This time, you'll use plt.hexbin() to visualize the
two-dimensional histogram.
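A sketch of plt.hexbin() with both gridsize and extent. The data are synthetic
stand-ins for hp and mpg (an assumption), and the extent is chosen by hand so that it
covers every generated point.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the auto-mpg columns (assumed values)
rng = np.random.default_rng(0)
hp = rng.uniform(50, 230, 500)
mpg = 45 - 0.15 * hp + rng.uniform(-3, 3, 500)

# gridsize=(nx, ny) fixes the hexagon counts; extent=(xmin, xmax, ymin, ymax)
hexes = plt.hexbin(hp, mpg, gridsize=(15, 12), extent=(40, 235, 5, 45))
plt.colorbar()
plt.xlabel('Horse power [hp]')
plt.ylabel('Miles per gallon [mpg]')

# The returned collection stores one count per drawn hexagon
hex_counts = hexes.get_array()
```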
To read an image from file, use plt.imread() by passing the path to a file, such as
a PNG or JPG file.
The color image can be plotted as usual using plt.imshow().
The loaded image is a NumPy array of three dimensions. The array typically has
dimensions M×N×3, where M×N are the dimensions of the image. The entries along the
third dimension are referred to as color channels (typically red, green, and blue).
The color channels can be extracted by NumPy array slicing.
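Channel extraction by slicing can be sketched with a small random array standing in
for a loaded image (an assumption; a real image would come from plt.imread()).

```python
import numpy as np

# Random stand-in for an M x N x 3 image (M=4, N=6)
rng = np.random.default_rng(0)
img = rng.random((4, 6, 3))

# Index the third dimension to pull out one channel at a time
red = img[:, :, 0]
green = img[:, :, 1]
blue = img[:, :, 2]

# Summing over axis 2 collapses the channels into one 2-D array
intensity = img.sum(axis=2)
```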
In this exercise, you will load & display an image of an astronaut (by NASA (Public
domain), via Wikimedia Commons). You will also examine its attributes to understand
how color images are represented.
In this exercise, you will perform a simple analysis using the image showing an
astronaut as viewed from space. Instead of simply displaying the image, you will
compute the total intensity across the red, green and blue channels. The result is
a single two dimensional array which you will display using plt.imshow() with the
'gray' colormap.
# Compute the sum of the red, green and blue channels: intensity
intensity = img.sum(axis=2)
# Add a colorbar
plt.colorbar()
# Specify the extent and aspect ratio of the top left subplot
plt.subplot(2,2,1)
plt.title('extent=(-1,1,-1,1),\naspect=0.5')
plt.xticks([-1,0,1])
plt.yticks([-1,0,1])
plt.imshow(img, extent=(-1,1,-1,1), aspect=0.5)
# Specify the extent and aspect ratio of the top right subplot
plt.subplot(2,2,2)
plt.title('extent=(-1,1,-1,1),\naspect=1')
plt.xticks([-1,0,1])
plt.yticks([-1,0,1])
plt.imshow(img, extent=(-1,1,-1,1), aspect=1)
# Specify the extent and aspect ratio of the bottom left subplot
plt.subplot(2,2,3)
plt.title('extent=(-1,1,-1,1),\naspect=2')
plt.xticks([-1,0,1])
plt.yticks([-1,0,1])
plt.imshow(img, extent=(-1,1,-1,1), aspect=2)
# Specify the extent and aspect ratio of the bottom right subplot
plt.subplot(2,2,4)
plt.title('extent=(-2,2,-1,1),\naspect=2')
plt.xticks([-2,-1,0,1,2])
plt.yticks([-1,0,1])
plt.imshow(img, extent=(-2,2,-1,1), aspect=2)
# Extract minimum and maximum values from the image: pmin, pmax
pmin, pmax = image.min(), image.max()
print("The smallest & largest pixel intensities are %d & %d." % (pmin, pmax))
# Generate a green residual plot of the regression between 'hp' and 'mpg'
sns.residplot(x='hp', y='mpg', data=auto, color='green')
plt.show()
Higher-order regressions
When there are more complex relationships between two variables, a simple first
order regression is often not sufficient to accurately capture the relationship
between the variables. Seaborn makes it simple to compute and visualize regressions
of varying orders.
Here, you will plot a second order regression between horsepower ('hp') and miles
per gallon ('mpg') using sns.regplot() (the function sns.lmplot() is a higher-level
interface to sns.regplot()). However, before plotting this relationship, compare how
the residual changes depending on the order of the regression. Does a second order
regression perform significantly better than a simple linear regression?
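The comparison can be approximated outside seaborn. The sketch below uses
np.polyfit() as a stand-in for the fits that sns.regplot() draws (a swapped-in
technique, not the course code) on synthetic data with a built-in curved
relationship, and compares the residual sum of squares for orders 1 and 2.

```python
import numpy as np

# Synthetic data with a deliberately curved relationship (assumed values)
rng = np.random.default_rng(0)
hp = rng.uniform(50, 230, 200)
mpg = 60 - 0.45 * hp + 0.0011 * hp**2 + rng.normal(0, 1.5, hp.size)


def residual_sum_of_squares(order):
    """Fit a polynomial of the given order and return its squared residuals."""
    coeffs = np.polyfit(hp, mpg, order)
    fitted = np.polyval(coeffs, hp)
    return float(np.sum((mpg - fitted) ** 2))


rss_linear = residual_sum_of_squares(1)
rss_quadratic = residual_sum_of_squares(2)
```

A much smaller rss_quadratic suggests that the second order fit captures curvature
that the straight line misses.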
In the automobile dataset - which has been pre-loaded here as auto - you can view
the relationship between weight ('weight') and horsepower ('hp') of the cars and
group them by their origin ('origin'), giving you a quick visual indication of how
the relationship differs by continent.
# linear regr between 'weight' and 'hp', hue of 'origin' and palette 'Set1'
sns.lmplot(x='weight', y='hp', data=auto, hue='origin', palette='Set1')
plt.show()
You'll use the automobile dataset again and, this time, you'll use the keyword
argument row to display the subplots organized in rows. That is, you'll produce
horsepower vs. weight regressions grouped by continent of origin in separate
subplots stacked vertically.
The strip plot is one way of visualizing this kind of data. It plots the
distribution of variables for each category as individual datapoints. For vertical
strip plots (the default), distributions of continuous values are laid out parallel
to the y-axis and the distinct categories are spaced out along the x-axis.
For example, sns.stripplot(x='type', y='length', data=df) produces a sequence of
vertical strip plots of length distributions grouped by type (assuming length is a
continuous column and type is a categorical column of the DataFrame df).
Overlapping points can be difficult to distinguish in strip plots. The argument
jitter=True helps spread out overlapping points.
Other matplotlib arguments can be passed to sns.stripplot(), e.g., marker, color,
size, etc.
# Make the strip plot again using jitter and a smaller point size
plt.subplot(2,1,2)
sns.stripplot(x='cyl', y='hp', data=auto, size=3, jitter=True)
plt.show()
# Generate the same violin plot again with a color of 'lightgray' and without
# inner annotations
plt.subplot(2,1,2)
sns.violinplot(x='cyl', y='hp', data=auto, inner=None, color='lightgray')
plt.show()
A scatter plot using the specified columns x and y from the DataFrame data.
A (univariate) histogram along the top of the scatter plot showing distribution of
the column x.
A (univariate) histogram along the right of the scatter plot showing distribution
of the column y.
# Plot the pairwise joint distributions grouped by 'origin' along with
# regression lines
sns.pairplot(auto, kind='reg', hue='origin')
plt.show()
In this exercise, you will view the correlation matrix (the covariance matrix
normalized by the variables' standard deviations) of the continuous variables in the
auto-mpg dataset. You do not have to know here how the matrix is computed; the
important point is that its diagonal entries are all 1s, and the off-diagonal entries
are between -1 and +1 (quantifying the degree to which variable pairs vary jointly).
It is also a symmetric matrix.
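A matrix with a unit diagonal and entries between -1 and +1 is what pandas'
DataFrame.corr() returns. The sketch below checks those properties on a small
synthetic frame (an assumption, not the auto-mpg data) and computes DataFrame.cov()
alongside it for contrast.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a few continuous auto-mpg columns (assumed values)
rng = np.random.default_rng(0)
hp = rng.uniform(50, 230, 100)
auto = pd.DataFrame({
    'hp': hp,
    'weight': 1500 + 12 * hp + rng.normal(0, 200, 100),
    'mpg': 45 - 0.15 * hp + rng.normal(0, 3, 100),
})

corr_matrix = auto.corr()  # normalized: diagonal is 1, entries in [-1, 1]
cov_matrix = auto.cov()    # unnormalized: diagonal holds the variances

diagonal = np.diag(corr_matrix.values)
is_symmetric = np.allclose(corr_matrix.values, corr_matrix.values.T)
```

Either matrix could then be passed to sns.heatmap() for display.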
# Plot the aapl time series in blue, ibm green, csco red, msft magenta
plt.plot(aapl, color='blue', label='AAPL')
plt.plot(ibm, color='green', label='IBM')
plt.plot(csco, color='red', label='CSCO')
plt.plot(msft, color='magenta', label='MSFT')
Unlike slicing from standard Python lists, tuples, and strings, when slicing time
series by labels (and other pandas Series & DataFrames by labels), the slice
includes the right endpoint.
You can use partial strings or datetime objects for indexing and slicing from time
series.
For this exercise, you will use time series slicing to plot the time series aapl
over its full 11-year range and also over a shorter 2-year range. You'll arrange
these plots in a 2×1 grid of subplots.
Partial string indexing works without slicing as well. For instance, using
my_time_series['1995'], my_time_series['1999-05'], and my_time_series['2000-11-04']
respectively extracts views of the time series my_time_series corresponding to the
entire year 1995, the entire month May 1999, and the entire day November 4, 2000.
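A sketch of partial string indexing and inclusive label slicing on a synthetic daily
series (the values are placeholders). It uses .loc explicitly, which is an assumption
about style; the bracket form shown above selects the same rows on a Series.

```python
import numpy as np
import pandas as pd

# Synthetic daily series covering 1995 through 2000 (placeholder values)
index = pd.date_range('1995-01-01', '2000-12-31', freq='D')
my_time_series = pd.Series(np.arange(index.size, dtype=float), index=index)

# Partial strings select every row in the matching year or month
year_1995 = my_time_series.loc['1995']
may_1999 = my_time_series.loc['1999-05']

# Label slices include the right endpoint: all of February 2000 is kept
winter = my_time_series.loc['1999-11':'2000-02']

# Position slices on a list exclude the right endpoint
list_slice = [10, 20, 30, 40][0:2]
```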
# Slice aapl from Nov. 2007 to Apr. 2008 inclusive: view_1
# Slice aapl for January 2008: view_2
view_1 = aapl['2007-11':'2008-04']
view_2 = aapl['2008-01']
# Plot the 30-day moving average in the top left subplot in green
plt.subplot(2,2,1)
plt.plot(mean_30, color='green')
plt.plot(aapl, 'k-.')
plt.xticks(rotation=60)
plt.title('30d averages')
# Plot the 75-day moving average in the top right subplot in red
plt.subplot(2,2,2)
plt.plot(mean_75, 'red')
plt.plot(aapl, 'k-.')
plt.xticks(rotation=60)
plt.title('75d averages')
# Plot the 125-day moving average in the bottom left subplot in magenta
plt.subplot(2, 2, 3)
plt.plot(mean_125, 'magenta')
plt.plot(aapl, 'k-.')
plt.xticks(rotation=60)
plt.title('125d averages')
# Plot the 250-day moving average in the bottom right subplot in cyan
plt.subplot(2,2,4)
plt.plot(mean_250, 'cyan')
plt.plot(aapl, 'k-.')
plt.xticks(rotation=60)
plt.title('250d averages')
plt.show()
The time series aapl is not plotted in this case; it is of a different length scale
than the standard deviations.
The time series std_30, std_75, std_125, & std_250 have been computed for you
(containing the windowed standard deviations of the series aapl computed over
windows of width 30 days, 75 days, 125 days, & 250 days respectively).
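The notes do not show how the windowed series were produced. One plausible way,
sketched here on a synthetic random walk standing in for aapl, is pandas' rolling()
with a fixed window size.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices standing in for aapl (assumed values)
rng = np.random.default_rng(0)
index = pd.date_range('2001-01-01', periods=400, freq='D')
aapl = pd.Series(100 + np.cumsum(rng.normal(0, 1, 400)), index=index)

# Windowed mean and standard deviation over 30 consecutive observations
mean_30 = aapl.rolling(window=30).mean()
std_30 = aapl.rolling(window=30).std()

# The first 29 entries are NaN because the window is not yet full
leading_nans = int(mean_30.isna().sum())
```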
For this exercise, you will load an unequalized low contrast image of Hawkes Bay,
New Zealand (originally by Phillip Capper, modified by User:Konstable, via
Wikimedia Commons, CC BY 2.0). You will plot the image and use the pixel intensity
values to plot a normalized histogram of pixel intensities.
Your task here is to plot the PDF and CDF of pixel intensities from a grayscale
image. You will use the grayscale image of Hawkes Bay, New Zealand (originally by
Phillip Capper, modified by User:Konstable, via Wikimedia Commons, CC BY 2.0).
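A sketch of a PDF and CDF of pixel intensities. The image is a randomly generated
low-contrast array standing in for the Hawkes Bay photo (an assumption), and the bin
count of 64 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Random low-contrast stand-in for a grayscale image (assumed values)
rng = np.random.default_rng(0)
image = rng.integers(100, 151, size=(64, 64)).astype(np.uint8)
pixels = image.flatten()

# Normalized histogram: an estimate of the PDF of pixel intensities
pdf, bin_edges = np.histogram(pixels, bins=64, range=(0, 256), density=True)

# CDF: running sum of the PDF times the bin widths
cdf = np.cumsum(pdf * np.diff(bin_edges))

# Draw the PDF as bars and the CDF on a second y-axis
plt.hist(pixels, bins=64, range=(0, 256), density=True, color='red', alpha=0.4)
plt.twinx()
plt.plot(bin_edges[1:], cdf, color='blue')
plt.xlim((0, 256))
```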
# Specify x-axis range, hide the grid, add title and display plot
plt.xlim((0,256))
plt.grid(False)
plt.title('PDF & CDF (original image)')
plt.show()
For this exercise, you will again work with the grayscale image of Hawkes Bay, New
Zealand (originally by Phillip Capper, modified by User:Konstable, via Wikimedia
Commons, CC BY 2.0). Notice the sample code produces the same plot as the previous
exercise. Your task is to modify the code from the previous exercise to plot the
new equalized image as well as its PDF and CDF.
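One common way to equalize an image, sketched here under the assumption that the
course follows the same idea, is to map each pixel through the scaled CDF with
np.interp(). The input is a synthetic low-contrast array, not the Hawkes Bay image.

```python
import numpy as np

# Random low-contrast stand-in for a grayscale image (assumed values)
rng = np.random.default_rng(0)
image = rng.integers(100, 151, size=(64, 64)).astype(np.uint8)
pixels = image.flatten()

# Cumulative distribution of intensities over 256 unit-wide bins
hist, bins = np.histogram(pixels, bins=256, range=(0, 256))
cdf = hist.cumsum() / hist.sum()

# Map every pixel to 255 times the CDF value at its intensity
new_pixels = np.interp(pixels, bins[:-1], cdf * 255)
new_image = new_pixels.reshape(image.shape)

old_range = int(image.max()) - int(image.min())
new_range = float(new_image.max() - new_image.min())
```

The equalized intensities spread over most of the 0 to 255 range, which is what
raises the contrast.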
For this final exercise, you will use the same color image of the Helix Nebula as
seen by the Hubble and the Cerro Tololo Inter-American Observatory. The separate
RGB (red-green-blue) channels will be extracted for you as one-dimensional arrays
red_pixels, green_pixels, & blue_pixels respectively.
#### VIZ BOKEH
Your job is to create a figure, assign x-axis and y-axis labels, and plot
female_literacy vs fertility using the circle glyph.
After you have created the figure, in this exercise and the ones to follow, play
around with it! Explore the different options available to you on the tab to the
right, such as "Pan", "Box Zoom", and "Wheel Zoom". You can click on the question
mark sign for more details on any of these tools.
Note: You may have to scroll down to view the lower portion of the figure.
# Call the output_file() function and specify the name of the file
output_file('fert_lit.html')
In this exercise, you will plot female literacy vs fertility for two different
regions, Africa and Latin America. Each set of x and y data has been loaded
separately for you as fertility_africa, female_literacy_africa,
fertility_latinamerica, and female_literacy_latinamerica. Plot the Latin America
data with the circle() glyph, and the Africa data with the x() glyph. figure has
already been imported for you from bokeh.plotting.
Lines
We can draw lines on Bokeh plots with the line() glyph function.
In this exercise, you'll plot the daily adjusted closing price of Apple Inc.'s
stock (AAPL) from 2000 to 2013.
The data points are provided for you as lists. date is a list of datetime objects
to plot on the x-axis and price is a list of prices to plot on the y-axis.
Since we are plotting dates on the x-axis, you must add x_axis_type='datetime' when
creating the figure object.
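A sketch of a datetime line plot in Bokeh. The dates and prices are invented
placeholders rather than real AAPL quotes, and the axis labels are illustrative.

```python
from datetime import datetime, timedelta

from bokeh.models import DatetimeAxis
from bokeh.plotting import figure

# Invented placeholder data: ten consecutive days (not real AAPL prices)
date = [datetime(2000, 3, 1) + timedelta(days=i) for i in range(10)]
price = [30.0 + 0.5 * i for i in range(10)]

# x_axis_type='datetime' makes the x-axis format its ticks as dates
p = figure(x_axis_type='datetime', x_axis_label='Date',
           y_axis_label='US Dollars')

# Draw the line glyph through the (date, price) points
renderer = p.line(date, price)

x_axis = p.xaxis[0]
```

Calling output_file() and show(p) afterwards would write the plot to an HTML file
and open it.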
# Import figure from bokeh.plotting
from bokeh.plotting import figure
# Plot date along the x axis and price along the y axis
p.line(date, price)
# Specify the name of the output file and show the result
output_file('line.html')
show(p)
# With date on the x-axis and price on the y-axis, add a white circle glyph
# of size 4
p.circle(date, price, fill_color='white', size=4)
# Specify the name of the output file and show the result
output_file('line.html')
show(p)
Patches
In Bokeh, extended geometrical shapes can be plotted by using the patches() glyph
function. The patches glyph takes as input a list-of-lists collection of numeric
values specifying the vertices in x and y directions of each distinct patch to
plot.
In this exercise, you will plot the state borders of Arizona, Colorado, New Mexico
and Utah. The latitude and longitude vertices for each state have been prepared as
lists. Your job is to plot longitude on the x-axis and latitude on the y-axis. The
figure object has been created for you as p.
# Specify the name of the output file and show the result
output_file('four_corners.html')
show(p)
In this exercise, you'll generate NumPy arrays using np.linspace() and np.cos() and
plot them using the circle glyph.
For more information on NumPy functions, you can refer to the NumPy User Guide and
NumPy Reference.
# Import numpy as np
import numpy as np
# Specify the name of the output file and show the result
output_file('numpy.html')
show(p)
The CSV file is provided for you as 'auto.csv'. Your job is to plot
miles-per-gallon (mpg) vs horsepower (hp) by passing Pandas column selections into
the p.circle() function. Additionally, each glyph will be colored according to
values in the color column.
# Import pandas as pd
import pandas as pd
# Read in the CSV file: df
df = pd.read_csv('auto.csv')
# Specify the name of the output file and show the result
output_file('auto-df.html')
show(p)
# Specify the name of the output file and show the result
output_file('sprint.html')
show(p)
You'll use the ColumnDataSource object of the Olympic Sprint dataset you made in
the last exercise. It is provided to you with the name source. After you have
created the figure, be sure to experiment with the Box Select tool you added! As in
previous exercises, you may have to scroll down to view the lower portion of the
figure.
# Add circle glyphs to the figure p with the selected and non-selected properties
p.circle('Year','Time', source=source, selection_color='red',
nonselection_alpha=.1)
# Specify the name of the output file and show the result
output_file('selection_glyph.html')
show(p)
Hover glyphs
Now let's practice using and customizing the hover tool.
In this exercise, you're going to plot the blood glucose levels for an unknown
patient. The blood glucose levels were recorded every 5 minutes on October 7th
starting at 3 minutes past midnight.
The date and time of each measurement are provided to you as x and the blood
glucose levels in mg/dL are provided as y. A bokeh figure is also provided in the
workspace as p. Your job is to add a circle glyph that will appear red when the
mouse is hovered near the data points. You will also add a customized hover tool
object to the plot.
# Specify the name of the output file and show the result
output_file('hover_glyph.html')
show(p)
Colormapping
The final glyph customization we'll practice is using the CategoricalColorMapper to
color each glyph by a categorical property. Here, you're going to use the
automobile dataset to plot miles-per-gallon vs weight and color each circle glyph
by the region where the automobile was manufactured.
The origin column will be used in the ColorMapper to color automobiles manufactured
in the US as blue, Europe as red and Asia as green. The automobile data set is
provided to you as a Pandas DataFrame called df. The figure is provided for you as
p.
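A sketch of the color mapping described above. The DataFrame is a four-row
placeholder (an assumption), and the glyphs are drawn with p.scatter() rather than
p.circle(), on the assumption that recent Bokeh releases prefer scatter() for sized
markers.

```python
import pandas as pd
from bokeh.models import CategoricalColorMapper, ColumnDataSource
from bokeh.plotting import figure

# Four-row placeholder for the automobile data (assumed values)
df = pd.DataFrame({
    'weight': [3500, 2200, 2100, 4000],
    'mpg': [18.0, 30.0, 32.0, 14.0],
    'origin': ['US', 'Europe', 'Asia', 'US'],
})
source = ColumnDataSource(df)

# Map each category in 'origin' to the color at the same position
color_mapper = CategoricalColorMapper(factors=['Europe', 'Asia', 'US'],
                                      palette=['red', 'green', 'blue'])

p = figure(x_axis_label='weight', y_axis_label='mpg')

# color is looked up per row by sending 'origin' through the mapper
renderer = p.scatter('weight', 'mpg', source=source, size=8,
                     color=dict(field='origin', transform=color_mapper))
```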
# Specify the name of the output file and show the result
output_file('colormap.html')
show(p)
In this exercise and the ones to follow, you may have to scroll down to view the
lower portion of the figure.
In this exercise, you'll make a 3-plot layout in two rows using the auto-mpg data
set.
Three plots have been created for you of average mpg vs year, mpg vs hp, and mpg vs
weight.
Your job is to use the column() and row() functions to make a two-row layout where
the first row will have only the average mpg vs year plot and the second row will
have mpg vs hp and mpg vs weight plots as columns.
By using the sizing_mode argument, you can scale the widths to fill the whole
figure.
# Make a row layout that will be used as the second row: row2
row2 = row([mpg_hp, mpg_weight], sizing_mode='scale_width')
# Make a column layout that stacks avg_mpg above row2: layout
layout = column([avg_mpg, row2], sizing_mode='scale_width')
In this example, you're going to display four plots of fertility vs female literacy
for four regions: Latin America, Africa, Asia and Europe.
Your job is to create a list-of-lists for the four Bokeh plots that have been
provided to you as p1, p2, p3 and p4. The list-of-lists defines the row and column
placement of each plot.
Linked axes
Linking axes between plots is achieved by sharing range objects.
In this exercise, you'll link four plots of female literacy vs fertility so that
when one plot is zoomed or dragged, one or more of the other plots will respond.
The four plots p1, p2, p3 and p4 along with the layout that you created in the last
section have been provided for you.
Your job is to link p1 with the three other plots by assignment of the .x_range and
.y_range attributes.
After you have linked the axes, explore the plots by clicking and dragging along
the x or y axes of any of the plots, and notice how the linked plots change
together.
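Sharing range objects can be sketched with two small figures. The data points are
placeholders, and only two plots are linked here to keep the example short.

```python
from bokeh.layouts import row
from bokeh.plotting import figure

# Two placeholder figures with made-up points
p1 = figure(title='Latin America')
p1.scatter([1.5, 2.0, 3.2], [95.0, 90.0, 80.0], size=8)

p2 = figure(title='Africa')
p2.scatter([4.5, 5.5, 6.8], [60.0, 45.0, 30.0], size=8)

# Assigning p1's range objects to p2 links panning and zooming
p2.x_range = p1.x_range
p2.y_range = p1.y_range

layout = row(p1, p2)
```

Because both figures now hold the same range objects, changing one plot's view
changes the other's.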
Linked brushing
By sharing the same ColumnDataSource object between multiple plots, selection tools
like BoxSelect and LassoSelect will highlight points in both plots that share a row
in the ColumnDataSource.
After you have built the figure, experiment with the Lasso Select and Box Select
tools. Use your mouse to drag a box or lasso around points in one figure, and
notice how points in the other figure that share a row in the ColumnDataSource also
get highlighted.
Before experimenting with the Lasso Select, however, click the Bokeh plot pop-out
icon to pop out the figure so that you can definitely see everything that you're
doing.
Two ColumnDataSources called latin_america and africa have been provided. Plot two
circle glyphs for these two objects. The figure p has been provided for you.
In this exercise, you'll adjust the background color and legend location of the
female literacy vs fertility plot from the previous exercise.
The figure object p has been created for you along with the circle glyphs.
In this exercise, you will create a HoverTool object and display the country for
each circle glyph in the figure that you created in the last exercise. This is done
by assigning the tooltips keyword argument to a list-of-tuples specifying the label
and the column of values from the ColumnDataSource using the @ operator.
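A sketch of a HoverTool with a tooltip bound to a data column. The three-row source
is a placeholder (an assumption), and p.scatter() is used to draw the markers.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Placeholder source with a 'Country' column (invented values)
source = ColumnDataSource(data={
    'fertility': [1.8, 4.9, 2.4],
    'female_literacy': [98.0, 55.0, 88.0],
    'Country': ['A', 'B', 'C'],
})

p = figure(x_axis_label='fertility',
           y_axis_label='female literacy (% population)')
p.scatter('fertility', 'female_literacy', source=source, size=10)

# Each tuple is (label shown, '@' followed by the column to read)
hover = HoverTool(tooltips=[('Country', '@Country')])
p.add_tools(hover)
```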
What sort of properties can the Bokeh server automatically keep in sync? Bokeh
server will automatically keep every property of any Bokeh object in sync.
In the video, Bryan described the process for running a Bokeh app using the bokeh
serve command line tool. In this chapter and the one that follows, the DataCamp
environment does this for you behind the scenes. Notice that your code is part of a
script.py file. When you hit 'Submit Answer', you'll see in the IPython Shell that
we call bokeh serve script.py for you.
Remember, as in the previous chapters, that there are different options available
for you to interact with your plots, and as before, you may have to scroll down to
view the lower portion of the plots.
Your job here is to create a single slider, use it to create a widgetbox layout,
and then add this layout to the current document.
The slider you create here cannot be used for much, but in the later exercises,
you'll use it to update your plots!
Your job in this exercise is to create two sliders, add them to a widgetbox layout,
and then add the layout into the current document.
After you are done, notice how in the figure you generate, the slider will not
actually update the plot, because a widget callback has not been defined. You'll
learn how to update the plot using widget callbacks in the next exercise.
All the necessary modules have been imported for you. The plot is available in the
workspace as plot, and the slider is available as slider.
Your job in this exercise is to use the slider's on_change() method to update the
plot's data from the previous example. NumPy's sin() function will be used to
update the y-axis data of the plot.
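A sketch of a slider callback. The data, the slider settings, and the formula
sin(scale / x) are illustrative assumptions. In a served app Bokeh invokes the
callback when the slider moves; here it is called by hand to show its effect.

```python
import numpy as np
from bokeh.models import ColumnDataSource, Slider

# Placeholder data for the plot's source (assumed values)
x = np.linspace(0.3, 10, 300)
source = ColumnDataSource(data={'x': x, 'y': np.sin(1 / x)})

slider = Slider(title='scale', start=1, end=10, step=1, value=1)


def callback(attr, old, new):
    """Recompute y from the slider's new value and replace the source data."""
    source.data = {'x': x, 'y': np.sin(new / x)}


# Register the callback for changes to the slider's 'value' property
slider.on_change('value', callback)

# Call the callback directly, as Bokeh would after the slider moves to 5
callback('value', 1, 5)
```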
Now that you have added a widget callback, notice how as you move the slider of
your app, the figure also updates!
The ColumnDataSource source has been created for you along with the plot. Your job
in this exercise is to add a drop down menu to update the plot's data.
Button widgets
It's time to practice adding buttons to your interactive visualizations. Your job
in this exercise is to create a button and use its on_click() method to update a
plot.
All necessary modules have been imported for you. In addition, the ColumnDataSource
with data x and y as well as the figure have been created for you and are available
in the workspace as source and plot.
Button styles
You can also get really creative with your Button widgets.
In this exercise, you'll practice using CheckboxGroup, RadioGroup, and Toggle to
add multiple Button widgets with different styles. curdoc and widgetbox have
already been imported for you.
It is always a good idea to begin with some Exploratory Data Analysis. Pandas has a
number of built-in methods that help with this. For example, data.head() displays
the first five rows/entries of data, while data.tail() displays the last five
rows/entries. data.shape gives you information about how many rows and columns
there are in the data set. Another particularly useful method is data.info(), which
provides a concise summary of data, including information about the number of
entries, columns, data type of each column, and number of non-null entries in each
column.
Use the IPython Shell and the pandas methods mentioned above to explore this data
set. How many entries and columns does this data set have?
data.shape
As in the previous chapter, the DataCamp environment executes the bokeh serve
command to run the app for you. When you hit 'Submit Answer', you'll see in the
IPython Shell that bokeh serve script.py gets called to run the app. This is
something to keep in mind when you are creating your own interactive visualizations
outside of the DataCamp environment.
# Save the minimum and maximum values of the fertility column: xmin, xmax
xmin, xmax = min(data.fertility), max(data.fertility)
# Save the minimum and maximum values of the life expectancy column: ymin, ymax
ymin, ymax = min(data.life), max(data.life)
Your job is to make a list of the unique regions from the data frame, prepare a
ColorMapper, and add it to the circle glyph.
# Make a list of the unique values from the region column: regions_list
regions_list = data.region.unique().tolist()
# Add the plot to the current document and add the title
curdoc().add_root(plot)
curdoc().title = 'Gapminder'
After you are done, you may have to scroll to the right to view the entire plot. As
you play around with the slider, notice that the title of the plot is not updated
along with the year. This is something you'll fix in the next exercise!
# Make a row layout of widgetbox(slider) and plot and add it to the current
# document
layout = row(widgetbox(slider), plot)
curdoc().add_root(layout)
In Python, you can format strings by specifying placeholders with the % operator.
For example, if you have a string company = 'DataCamp', you can use print('%s' %
company) to print DataCamp. Placeholders are useful when you are printing values
that are not static, such as the value of the year slider. You can specify a
placeholder for a number with %d. Here, when you're updating the plot title inside
your callback function, you should make use of a placeholder so that the year
displayed is in accordance with the value of the year slider.
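The two placeholders mentioned above behave as follows. The title text is a
hypothetical example, not taken from the course code.

```python
# %s inserts a string
company = 'DataCamp'
greeting = '%s' % company

# %d inserts an integer, such as the current value of a year slider
year = 1970
title = 'Gapminder data for %d' % year
```

Inside a callback, the formatted string could then be assigned to the plot's title
text.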
In addition to updating the plot title, you'll also create the callback function
and slider as you did in the previous exercise, so you get a chance to practice
these concepts further. All necessary modules have been imported for you, and as in
the previous exercise, you may have to scroll to the right to view the entire
figure.
# Make a row layout of widgetbox(slider) and plot and add it to the current
# document
layout = row(widgetbox(slider), plot)
curdoc().add_root(layout)
After you're done, experiment with the hover tool and see how it displays the name
of the country when your mouse hovers over a point! The figure and slider have been
created for you and are available in the workspace as plot and slider.
All necessary modules have been imported, and the previous code you wrote is taken
care of. In the provided sample code, the dropdown for selecting features on the
x-axis has been added for you. Using this as a reference, your job in this final
exercise is to add a dropdown menu for selecting features on the y-axis.
Take a moment, after you are done, to enjoy exploring the visualization by
experimenting with the hover tools, sliders, and dropdown menus that you have
learned how to implement in this course.