Data Visualization
Overview
Visualization Libraries And Modules
Version Overview
Visualization Plot Types
• Line Chart
Visualization Libraries And Modules
Columns: Libraries and Modules, Description, Doc Link, Installation Link, Import Statement

Pandas
    Description: pandas is an open source software library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
    Doc Link: Link
    Installation Link: Installation
    Import Statement: import pandas as pd

NumPy
    Description: NumPy is the fundamental package for scientific computing with Python, providing a multidimensional array object and various derived objects such as masked arrays and matrices.
    Doc Link: Link
    Installation Link: Installation
    Import Statement: import numpy as np

Plotly
    Description: Plotly's Python open source graphing library makes interactive graphs online.
    Doc Link: Link
    Installation Link: Installation
    Import Statement: import plotly.express as px
                      import plotly.graph_objects as go

Bokeh
    Description: Bokeh is a Python-based interactive visualization library that targets web browsers to present very large and streaming datasets.
    Doc Link: Link
    Installation Link: Installation
    Import Statement: from bokeh.plotting import figure, show
Version Overview
Version What’s New in Each Version?
2. matplotlib.ticker.FuncFormatter(func) : Uses a user-defined function for label formatting. For details click here.
3. ax.set_yticks(self, ticks, minor=False) : Sets the y ticks with a list of ticks. If the parameter minor is False, sets major ticks; if True, sets minor ticks. Default is False.
4. ax.set_yticklabels(self, labels, fontdict=None, minor=False, **kwargs) : Sets the y-tick labels with a list of string labels. For details click here.
5. ax.invert_yaxis(self) : Inverts the y-axis.
Horizontal Bar Chart : Example
Example : Displays the market cap of different technology industries. Data were collected from Yahoo Finance. See full code on github-horizontalbar.
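The functions listed above can be combined roughly as follows. This is a minimal sketch rather than the linked example: the category names and values are placeholders (not the Yahoo Finance figures), and the helper `billions` is a hypothetical formatter written for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Placeholder values for illustration only; NOT real market-cap data.
industries = ["Industry A", "Industry B", "Industry C"]
market_cap = [1.2e12, 8.5e11, 4.0e11]  # hypothetical values in USD

def billions(value, pos):
    """Format a tick value given in USD as billions, e.g. 5e11 -> '$500B'."""
    return f"${value / 1e9:,.0f}B"

fig, ax = plt.subplots()
positions = list(range(len(industries)))
ax.barh(positions, market_cap)           # horizontal bars
ax.set_yticks(positions)                 # one major tick per bar
ax.set_yticklabels(industries)           # label each tick with its category
ax.invert_yaxis()                        # first category appears at the top
ax.xaxis.set_major_formatter(FuncFormatter(billions))
ax.set_xlabel("Market cap")
fig.canvas.draw()  # render in memory; use fig.savefig(...) or plt.show() to view
```

Inverting the y-axis is a presentation choice: without it, the first item in the list is drawn at the bottom.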
Stacked Bar Chart
Functions for plotting a stacked bar chart:
1. matplotlib.pyplot.bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs)
Makes a bar plot; passing bottom stacks one series on top of another. For more details, click here.
Parameters:
x : Sequence of scalars; the bars are positioned at x with the given alignment.
height, width : Scalar or sequence of scalars or array-like; the dimensions of the bars are set by these parameters.
bottom : Scalar or array-like; the vertical baseline is bottom (default 0). In the given example, we have set bottom=revenue to plot a stacked bar chart.
align : Alignment of the bars to the x coordinates; {'center', 'edge'}, default 'center'.
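As a rough illustration of the bottom parameter, the sketch below stacks one series on another. The numbers are placeholders, not real financial data, and it uses Axes.bar, which takes the same parameters as matplotlib.pyplot.bar.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder numbers for illustration only; not real financial data.
years = np.array([2010, 2011, 2012])
revenue = np.array([60.0, 70.0, 75.0])
earnings = np.array([18.0, 23.0, 17.0])
width = 0.6

fig, ax = plt.subplots()
lower = ax.bar(years, revenue, width, label="Revenue")
# bottom=revenue moves the baseline of the second series to the top of the first
upper = ax.bar(years, earnings, width, bottom=revenue, label="Earnings")
ax.set_xticks(years)
ax.legend()
fig.canvas.draw()  # render in memory; use fig.savefig(...) or plt.show() to view
```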
Grouped Bar Chart
Parameters:
x : Sequence of scalars; the bars are positioned at x with the given alignment. In the given example, for bar1 we have set x = x - width/2 and for bar2 x = x + width/2 to plot a grouped bar chart.
height, width : Scalar or sequence of scalars or array-like; the dimensions of the bars are set by these parameters.
bottom : Scalar or array-like; the vertical baseline is bottom (default 0).
align : Alignment of the bars to the x coordinates; {'center', 'edge'}, default 'center'.
2. matplotlib.pyplot.xticks(ticks=None, labels=None, **kwargs) : Gets or sets the current tick locations and labels of the x-axis. For more details, click here.
3. Axes.yaxis.set_major_formatter(formatter) : Sets the formatter used for the major tick labels of the y-axis.
4. matplotlib.ticker.FuncFormatter(func) : Uses a user-defined function for label formatting. For details click here.
5. autolabel(bars) : A user-defined helper (not part of matplotlib) that attaches a text label above each bar, displaying its height.
Grouped Bar Chart: Example
Example : Comparison between Microsoft's revenue and earnings (in billions) for the years 2010-2015. Data were collected from Yahoo Finance. See full code on github-groupedbar.
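A grouped bar chart along these lines might be sketched as below. The values are placeholders rather than the Yahoo Finance data, and `autolabel` is written here as one possible version of the helper described above, not necessarily the one in the linked code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

# Placeholder numbers for illustration only; not the Yahoo Finance data.
years = ["2010", "2011", "2012"]
revenue = [60.0, 70.0, 75.0]
earnings = [18.0, 23.0, 17.0]

x = np.arange(len(years))
width = 0.35

fig, ax = plt.subplots()
# Shift each series by half a bar width so the two bars sit side by side
bars1 = ax.bar(x - width / 2, revenue, width, label="Revenue")
bars2 = ax.bar(x + width / 2, earnings, width, label="Earnings")

def autolabel(bars):
    """Attach a text label above each bar, displaying its height."""
    for bar in bars:
        height = bar.get_height()
        ax.annotate(f"{height:g}",
                    xy=(bar.get_x() + bar.get_width() / 2, height),
                    xytext=(0, 3), textcoords="offset points",
                    ha="center", va="bottom")

autolabel(bars1)
autolabel(bars2)
plt.xticks(x, years)  # tick locations and labels on the current axes
ax.yaxis.set_major_formatter(FuncFormatter(lambda v, pos: f"${v:g}B"))
ax.legend()
fig.canvas.draw()  # render in memory; use fig.savefig(...) or plt.show() to view
```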
Line Graph
Line Chart displays time-series relationships with continuous data.
Parameters:
x, y : array-like or scalar; the coordinates of the points or line nodes are given by x, y.
fmt : str, optional; a format string, e.g., 'ro' for red circles.
data : indexable object, optional; an object with labelled data.
**kwargs : Used to specify properties like line label (for auto legend), line width, antialiasing, marker face color.
2. plt.setp(obj, *args, **kwargs) : Sets a property on an artist object. For more details, click here.
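A short sketch of how these pieces might fit together, assuming the parameters above describe matplotlib's plot function (the first item of the list is missing from this section). The data series is a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10)
y = x ** 2  # placeholder series

fig, ax = plt.subplots()
# "ro-" is a format string: red colour, circle markers, solid line
lines = ax.plot(x, y, "ro-", label="y = x^2", linewidth=1.0)
# setp changes properties on artists that already exist
plt.setp(lines, linewidth=2.5, markerfacecolor="white")
ax.legend()
fig.canvas.draw()  # render in memory; use fig.savefig(...) or plt.show() to view
```

Using setp is convenient when the same property change should apply to several lines at once, since it accepts a list of artists.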
Pie Chart : Example
Example : Pie chart of total residential electricity usage of California counties (1990-2015). Source: California Electricity Consumption Database. Data were collected from data.ca.gov. All usage is expressed in millions of kWh (GWh). See full code on github-piechart.
Histogram
Histogram displays the underlying frequency distribution of a set of continuous data (univariate data).
Parameters:
x : (n,) array or sequence of (n,) arrays. Input values; this takes either a single array or a sequence of arrays, which are not required to be of the same length.
bins : int or sequence or str, optional. If an integer is given, bins + 1 bin edges are calculated and returned. If bins is a sequence, it gives the bin edges, including the left edge of the first bin and the right edge of the last bin.
density : bool, optional; if True, the area under the histogram will sum to 1.
Other parameters: weights, cumulative, bottom, histtype, align, orientation, rwidth, log, label, stacked, normed, data, **kwargs. Details here.
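The effect of bins and density can be checked with a small sketch like the one below. The sample is synthetic (drawn from a normal distribution), not the NBA data used in the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=220, scale=25, size=500)  # synthetic values, not real data

fig, ax = plt.subplots()
# An integer bins=10 yields 10 bars and 11 bin edges
counts, edges, patches = ax.hist(sample, bins=10, density=True)
# With density=True, bar heights times bin widths add up to 1
area = np.sum(counts * np.diff(edges))
fig.canvas.draw()  # render in memory; use fig.savefig(...) or plt.show() to view
```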
Histogram : Example
Example : Histogram of NBA players' weights. See full code on github-histogram.
Density Plot
Density Plot displays the univariate distribution of data.
Uses a kernel density estimate to show the probability density function (PDF) of the variable.
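To make the idea of a kernel density estimate concrete, here is a sketch that computes one by hand with NumPy instead of calling a plotting library's built-in density function. The Gaussian kernel, the bandwidth rule (Scott's rule of thumb), the synthetic sample, and the helper name `kde_estimate` are all assumptions of this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.0, scale=1.0, size=300)  # synthetic data

def kde_estimate(sample, grid, bandwidth):
    """Average of Gaussian bumps centred on each observation."""
    z = (grid[:, None] - sample[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Scott's rule of thumb for one-dimensional data (an assumption of this sketch)
bandwidth = sample.std(ddof=1) * len(sample) ** (-1 / 5)
grid = np.linspace(sample.min() - 4 * bandwidth,
                   sample.max() + 4 * bandwidth, 400)
density = kde_estimate(sample, grid, bandwidth)

fig, ax = plt.subplots()
ax.plot(grid, density)
ax.set_ylabel("Density")
fig.canvas.draw()  # render in memory; use fig.savefig(...) or plt.show() to view
```

A smaller bandwidth gives a bumpier curve that follows individual observations; a larger one gives a smoother curve that may hide structure.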
Scatter Plot
Parameters:
x : int or str; the column name or column position to be used as horizontal coordinates for each point.
y : int or str; the column name or column position to be used as vertical coordinates for each point.
s : Scalar or array-like, optional; the size of each point.
c : Scalar or array-like, optional; the color of each point.
**kwargs : Keyword arguments to pass on to DataFrame.plot().
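These parameters match pandas' DataFrame.plot.scatter, so a sketch under that assumption could look like this. The rows are made up and only borrow iris-style column names; they are not the UCI data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# A few made-up rows with iris-style column names (not the UCI data)
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.1],
    "petal_length": [1.4, 1.4, 6.0, 5.1, 5.9],
    "species_code": [0, 0, 1, 1, 1],  # numeric code used only for colouring
})

# x and y name the columns; s sets the marker size; c names the column mapped to colour
ax = df.plot.scatter(x="sepal_length", y="petal_length",
                     s=40, c="species_code", colormap="viridis")
ax.set_title("Petal vs. sepal length")
```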
Scatter Plot : Example
Example : Scatter plot using iris dataset. In the given example, the scatter plot shows petal and sepal distribution for each
species. Data were collected from uci-machine-learning-repository. See full code on github-scatter.
Boxplot
Box Plot displays the distribution of data through its five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
Displays outliers with their values.
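The quartiles and outliers can be illustrated with a small made-up sample. The sketch assumes matplotlib's default rule, where whiskers extend at most 1.5 times the interquartile range beyond Q1 and Q3 and anything further out is drawn as an outlier.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

values = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 30])  # made-up sample; 30 is an outlier

# Quartiles computed directly, for comparison with the drawn box
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr  # points above this are drawn as outliers

fig, ax = plt.subplots()
result = ax.boxplot(values)
outliers = result["fliers"][0].get_ydata()  # y-values of the outlier markers
fig.canvas.draw()  # render in memory; use fig.savefig(...) or plt.show() to view
```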
Histogram with KDE plot: seaborn.distplot(a, bins=None, hist=True, kde=True, rug=False, hist_kws=None, … )
The kind parameter in the function catplot() selects the underlying function to plot:
Categorical scatterplots: kind='strip' for stripplot(); kind='swarm' for swarmplot()
Categorical distribution plots: kind='box' for boxplot(); kind='violin' for violinplot(); kind='boxen' for boxenplot()
Categorical estimate plots: kind='point' for pointplot(); kind='bar' for barplot(); kind='count' for countplot()
Parameters:
x, y, hue : Names of variables in data; inputs for plotting long-form data.
data : DataFrame; long-form (tidy) dataset.
kind : String, optional; the kind of plot to draw (strip, swarm, box, violin, boxen, point, bar, count).
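A minimal catplot() call with these parameters might look like the sketch below. The DataFrame is a tiny made-up table with Titanic-style column names, not the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up long-form data: one row per passenger, one column per variable
df = pd.DataFrame({
    "class": ["First"] * 4 + ["Second"] * 4 + ["Third"] * 4,
    "sex": ["male", "female"] * 6,
    "survived": [0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1],
})

# kind="bar" delegates to barplot(), which shows the mean of y per category;
# for a 0/1 column that mean is the observed survival proportion
g = sns.catplot(x="class", y="survived", hue="sex", data=df, kind="bar")
g.set_axis_labels("Class", "Survival probability")
```

Changing only the kind argument (for example to "point") redraws the same variables with a different underlying function.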
Barplot using catplot(): Example
Example : Displays Titanic survival probability for class and sex. See full code on github-seaborn-catplot.
Boxplot and Boxen plot using catplot()
kind=‘box’ kind=‘boxen’
Strip plot and Swarm plot using catplot()
kind=‘strip’ kind=‘swarm’
Histogram, Density Plot, Rug Plot using distplot()
Pair Plot using pairplot()
Parameters:
data : DataFrame; tidy (long-form) dataframe where each column is a variable and each row is an observation.
hue : String (variable name), optional. Variable in data to map plot aspects to different colors.
hue_order : List of strings. Order for the levels of the hue variable in the palette.
palette : dict or seaborn color palette. Set of colors for mapping the hue variable. If a dict, keys should be values in the hue variable.
vars : List of variable names, optional. Variables within data to use; otherwise every column with a numeric datatype is used.
pairplot(): Example
Example : Displays pairplot using iris dataset. See full code on github-seaborn-pairplot.
Summary
Plot Types and API Calls

Categorical data visualization
    Bar plot, Strip plot, Swarm plot, Count plot, Boxplot, Boxen plot, Violin plot:
        seaborn.catplot(x=None, y=None, hue=None, data=None, kind='bar', … )

Distribution of data
    Histogram, Density plot, Rug plot:
        seaborn.distplot(a, bins=None, hist=True, kde=True, rug=False, hist_kws=None, kde_kws=None, rug_kws=None, … )
    Density plot:
        seaborn.kdeplot(data, data2=None, shade=False, vertical=False, kernel='gau', … )
    Pairplot:
        seaborn.pairplot(data, hue=None, hue_order=None, palette=None, vars=None, … )
    Jointplot:
        seaborn.jointplot(x, y, data=None, kind='scatter', stat_func=None, color=None, … )

Linear relationship with regression line
    Scatter plot:
        seaborn.regplot(x, y, data=None, x_estimator=None, x_bins=None, fit_reg=True, … )
        seaborn.lmplot(x, y, data, hue=None, col=None, … )