Data Visualization


Try this code:

import matplotlib.pyplot as plt


fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot([1, 2, 3, 4], [10, 20, 25, 30], color='lightblue', linewidth=3)
ax.scatter([0.3, 3.8, 1.2, 2.5], [11, 25, 9, 26], color='darkgreen', marker='^')
ax.set_xlim(0.5, 4.5)
plt.show()
Try this code:

import matplotlib.pyplot as plt

# Pie chart, where the slices will be ordered and plotted counter-clockwise:
labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'
sizes = [15, 30, 45, 10]
explode = (0, 0.1, 0, 0) # only "explode" the 2nd slice (i.e. 'Hogs')

fig1, ax1 = plt.subplots()


ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.

plt.show()
Try this code:
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame({'GeoArea':[83743,78438,22327,22429,21081,16579,10486],
                 'ForestCover':[67353,27692,17280,17321,19240,13464,8073]},
                index=['Arunachal Pradesh','Assam','Manipur','Meghalaya','Mizoram','Nagaland','Tripura'])
df.plot(kind='pie',y='ForestCover',title='Forest cover of North Eastern states',legend=False)
plt.show()
Plotting with PyPlot I –
Bar Graphs and Scatter Plots
Data visualization refers to the graphical or visual representation of information and data using visual elements such as charts, graphs and maps.

Data visualization is immensely useful in decision making: it reveals patterns, trends, correlations, etc. in data.
Using PyPlot of Matplotlib Library

Matplotlib is a Python library that provides many interfaces and functions for 2D graphics. It is a high-quality plotting library for Python.

PyPlot is a collection of methods within Matplotlib that allows users to construct 2D plots easily and interactively.
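Installing and importing Matplotlib Library

Matplotlib is not part of the Python standard library, so it must be installed for the Python version on your system (for example with pip install matplotlib, run with the pip that belongs to that Python) before it can be imported.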
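To confirm the installation, a short check like the following can be run with the same Python interpreter you will use for plotting. It simply imports Matplotlib and prints its version string; an ImportError here means Matplotlib is not installed for that interpreter.

```python
import sys

import matplotlib

# The interpreter running this script; Matplotlib must be installed for this one
print("Python:", sys.version.split()[0])
# __version__ is a version string made of numbers and dots
print("Matplotlib:", matplotlib.__version__)
```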
Importing PyPlot

import matplotlib.pyplot as pl

In the above statement, pl is set as the shorthand (alias) for matplotlib.pyplot, so PyPlot's methods can be invoked as follows:
pl.plot(x,y)
The alias plt is also widely used; both pl and plt appear in this document.
Working with PyPlot methods
The PyPlot interface provides many methods for 2D plotting of data. Matplotlib's PyPlot interface lets you plot data in multiple ways, such as line charts, bar charts, pie charts and scatter charts.

You can easily plot data available in the form of NumPy arrays, DataFrames, etc.
Try this code:
import numpy as np
import matplotlib.pyplot as pl
x=np.linspace(1,5,6)
y=np.log(x)
pl.plot(x,y)
pl.show()
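In this code, x=np.linspace(1,5,6) creates 6 evenly spaced numbers from 1 to 5, including both end points. The arguments can also be passed by name:
x=np.linspace(start=0, stop=100, num=5)

The NumPy linspace function is a tool for creating numeric sequences. It is similar to the NumPy arange function, in that it creates a sequence of evenly spaced numbers structured as a NumPy array. The difference is that linspace() takes the number of values you want (num) and includes the stop value by default, whereas arange() takes the step between values and excludes the stop value.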
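The difference between the two functions can be seen by generating similar sequences with each. A minimal sketch, using the values from the text above:

```python
import numpy as np

# linspace(start, stop, num): num evenly spaced values, stop included by default
x = np.linspace(1, 5, 6)                  # 1.0, 1.8, 2.6, 3.4, 4.2, 5.0
y = np.linspace(start=0, stop=100, num=5) # 0, 25, 50, 75, 100

# arange(start, stop, step): values spaced by step, stop excluded
z = np.arange(0, 100, 25)                 # 0, 25, 50, 75

print(x)
print(y)
print(z)
```

Note that arange() gives only four values here because the stop value 100 is excluded.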
Try these one by one in your code, in place of pl.plot(x,y):

pl.bar(x,y)

pl.scatter(x,y)
Basics of simple plotting
Data visualization means the graphical representation of compiled data, so graphs and charts are very effective tools for data visualization. You can create different types of graphs and charts using PyPlot.
Some commonly used chart types are line charts, scatter charts, bar charts and pie charts; each is covered below.
Creating Line charts and scatter charts

Line charts and scatter charts are similar. The only difference is whether a line connects the data points (line chart) or not (scatter chart).
Line chart using plot() function

A line chart is a type of chart that displays information as a series of data points, called markers, connected by straight line segments.
The PyPlot interface offers the plot() function for creating a line graph.
Example:
import matplotlib.pyplot as pl

a=[1,2,3,4]
b=[2,4,3,8]

pl.plot(a,b)   # plot the points a,b as x,y co-ordinates
pl.show()      # display the chart on the screen

How to put x axis and y axis labels in a Line chart

import matplotlib.pyplot as pl
a=[1,2,3,4]
b=[2,4,3,8]
pl.plot(a,b)                           # plot the points a,b as x,y co-ordinates
pl.ylabel("Number of points scored")   # y axis label
pl.xlabel("Test Numbers")              # x axis label
pl.show()                              # display the chart on the screen
Applying various settings in plot() function:

• Color (line color/marker color)
• Line style
• Marker type
• Marker size, etc.
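Example:
import matplotlib.pyplot as pl
a=[1,2,3,4]
b=[2,4,3,8]
pl.plot(a,b,color='red',          # line and marker color
        linestyle="dashed",       # dashed line
        marker='d',               # 'd' = thin diamond marker
        markersize=10,            # marker size in points
        markeredgecolor='black')  # outline color of the markers
pl.show()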
Try different line styles, color codes, marker types, line colors, etc. It will help you when developing Python projects.

Line styles (linestyle):
• 'solid' ('-')
• 'dashed' ('--')
• 'dotted' (':')
• 'dashdot' ('-.')
• 'None' (no line)

Color codes (color):
• 'b' = blue
• 'g' = green
• 'r' = red
• 'c' = cyan
• 'm' = magenta
• 'y' = yellow
• 'k' = black
• 'w' = white
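A small loop can draw one line for each of several styles and color codes listed above, so they can be compared side by side. This is only a sketch: the y values are the sample scores used earlier, shifted up by 1 for each line so the lines do not overlap.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
styles = ['solid', 'dashed', 'dotted', 'dashdot']
colors = ['b', 'g', 'r', 'm']

fig, ax = plt.subplots()
for i, (style, col) in enumerate(zip(styles, colors)):
    # shift each line up by i so the four lines stay apart
    y = [v + i for v in [2, 4, 3, 8]]
    ax.plot(x, y, linestyle=style, color=col, label=f"{style} / '{col}'")
ax.legend()
plt.show()
```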
Creating Scatter Charts:

Scatter charts can be created with two functions of the PyPlot library:
(i) plot()
(ii) scatter()
Use plot() to create a scatter chart:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y, 'o')
plt.show()
The third argument of plot() is a format string. When it contains only a marker character, such as 'o', plot() draws markers without a connecting line. Other marker characters include '.', ',', 'x', '+', 'v', '^', '<', '>', 's' and 'd'.
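The format string can combine a color character, a marker character and a line style. A sketch showing a few combinations (the vertical offsets are only there to keep the curves apart):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 30)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y, 'o')          # circles only, no line
ax.plot(x, y + 1, 'x')      # crosses only, no line
ax.plot(x, y + 2, 'r^--')   # red triangles joined by a dashed line
ax.plot(x, y + 3, 'sk')     # black squares, no line
plt.show()
```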
Use plot() to create a line chart:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y)
plt.show()
If only x and y are given, plot() draws a line graph.
Use scatter() to create a scatter chart:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.scatter(x,y)
plt.show()
scatter() uses the marker 'o' by default. To draw a scatter chart with a different marker, pass the marker argument to scatter() (for example marker='^'), or use plot() with a marker format string.
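As a sketch, the marker argument of scatter() accepts the same marker codes as plot(); here dark green triangles ('^') replace the default circles:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 30)
y = np.sin(x)

fig, ax = plt.subplots()
# marker='^' draws triangles instead of the default 'o'
pts = ax.scatter(x, y, marker='^', color='darkgreen')
plt.show()
```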
Setting various colors and various sizes in scatter chart:
import matplotlib.pyplot as pl
a=[1,2,3,4]
b=[2,4,3,8]
colarr=['r','b','m','g']   # one color per point
sarr=[20,60,100,45]        # one marker size (area in points squared) per point
pl.scatter(a,b,color=colarr,s=sarr)
pl.ylabel("Number of points scored")
pl.xlabel("Test Numbers")
pl.show()
Plotting with PyPlot I: Creating Bar Charts
import matplotlib.pyplot as pl
a=[1,2,3,4]
b=[2,4,3,8]
pl.bar(a,b)
pl.ylabel("Number of points scored")
pl.xlabel("Test Numbers")
pl.show()

The first sequence given to bar() sets the bar positions on the x axis, and the second sequence sets the bar heights, plotted on the y axis.
Changing widths of the bars in a bar chart:
import matplotlib.pyplot as pl
a=[1,2,3,4]
b=[2,4,3,8]
pl.bar(a,b,width=0.5)
pl.ylabel("Number of points scored")
pl.xlabel("Test Numbers")
pl.show()

The width argument sets the width of every bar (the default width is 0.8).
Changing widths of the bars in a bar chart:
import matplotlib.pyplot as pl
a=[1,2,3,4]
b=[2,4,3,8]
pl.bar(a,b,width=[0.5,0.6,0.7,0.8])
pl.ylabel("Number of points scored")
pl.xlabel("Test Numbers")
pl.show()

When width is a sequence, its first value specifies the width of the first bar, its second value the width of the second bar, and so on.
Changing colors of the bars in a bar chart:

import matplotlib.pyplot as pl
a=[1,2,3,4]
b=[2,4,3,8]
pl.bar(a,b,width=0.5,color='r')
pl.ylabel("Number of points scored")
pl.xlabel("Test Numbers")
pl.show()

A single color is applied to all the bars.
Changing colors of the bars in a bar chart:
import matplotlib.pyplot as pl
a=[1,2,3,4]
b=[2,4,3,8]
pl.bar(a,b,width=0.5,color=['r','b','g','black'])
pl.ylabel("Number of points scored")
pl.xlabel("Test Numbers")
pl.show()

The color sequence should contain one color for each bar.
Creating a horizontal bar chart:
import matplotlib.pyplot as pl
import numpy as np
a=np.arange(4)
b=[5,25,45,20]
pl.barh(a,b,color='g')
pl.xlabel("Number of points scored")
pl.ylabel("Test Numbers")
pl.show()

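Use the barh() (bar horizontal) function in place of bar(). The bars are now drawn along the y axis, so the label you gave to the x axis with bar() goes on the y axis with barh(), and vice versa. In barh(), the thickness of the bars is set with height instead of width.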
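A sketch of the horizontal bar chart with thinner bars and named ticks on the y axis. The labels T1 to T4 are illustrative, following the tick labels used later in this section:

```python
import numpy as np
import matplotlib.pyplot as plt

a = np.arange(4)
b = [5, 25, 45, 20]

fig, ax = plt.subplots()
# In barh(), height is the bar thickness (bar() uses width for this)
bars = ax.barh(a, b, height=0.5, color='g')
ax.set_yticks(a)
ax.set_yticklabels(['T1', 'T2', 'T3', 'T4'])
ax.set_xlabel("Number of points scored")
ax.set_ylabel("Test Numbers")
plt.show()
```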
Customizing the plot:
Anatomy of a chart
• Figure – the whole area (canvas) on which the chart is drawn
• Axes – the plotting area containing the x axis and y axis, each with its:
  - Axis label
  - Limits
  - Tick marks
• Title – text on top of the chart
• Legend – the colors or marks that identify different sets of data
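The chart parts listed above map directly onto methods of a Matplotlib Axes object. The sketch below uses the object-oriented interface (fig, ax), as in the first example of this document; the pl.title(), pl.xlabel() and pl.xlim() functions used elsewhere do the same thing on the current Axes.

```python
import matplotlib.pyplot as plt

a = [1, 2, 3, 4]
b = [2, 4, 3, 8]

fig, ax = plt.subplots()              # fig: the Figure; ax: one Axes inside it
ax.plot(a, b, marker='o', label="Scores")
ax.set_title("Tests Score Analysis")  # title
ax.set_xlabel("Test Numbers")         # axis labels
ax.set_ylabel("Number of points scored")
ax.set_xlim(0, 5)                     # limits
ax.set_ylim(0, 10)
ax.set_xticks(a)                      # tick marks
ax.legend()                           # legend built from the label above
plt.show()
```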
Adding a title:
pl.title("Tests Score Analysis")

Adding X axis and Y axis labels:
pl.ylabel("Number of points scored")
pl.xlabel("Test Numbers")
Example:
import matplotlib.pyplot as pl
a=[1,2,3,4]
b=[2,4,3,8]
pl.bar(a,b,width=[0.5,0.6,0.7,0.8])
pl.title("Tests Score Analysis")
pl.ylabel("Number of points scored")
pl.xlabel("Test Numbers")
pl.show()

Setting Ticks for axes:
By default, PyPlot automatically decides where the ticks go on each axis, but you can set the tick positions (and, optionally, their labels) yourself with:
Syntax:
xticks(<sequence containing tick data points>,
       [<optional sequence containing tick labels>])

Similarly, yticks() sets the ticks on the Y axis.

Example (ticks at the data points only):
import matplotlib.pyplot as pl
a=[1,2,3,4]
b=[2,4,3,8]
pl.bar(a,b,width=[0.5,0.6,0.7,0.8])
pl.title("Tests Score Analysis")
pl.ylabel("Number of points scored")
pl.xlabel("Test Numbers")
pl.xticks(a)
pl.show()
Example (ticks with labels):
import matplotlib.pyplot as pl
a=[1,2,3,4]
b=[2,4,3,8]
pl.bar(a,b,width=[0.5,0.6,0.7,0.8])
pl.title("Tests Score Analysis")
pl.ylabel("Number of points scored")
pl.xlabel("Test Numbers")
pl.xticks(a,['T1','T2','T3','T4'])
pl.show()
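yticks() works the same way on the Y axis. A sketch extending the example above; the tick positions and the word labels on the Y axis are illustrative choices:

```python
import matplotlib.pyplot as pl

a = [1, 2, 3, 4]
b = [2, 4, 3, 8]
pl.bar(a, b, width=0.5)
pl.xticks(a, ['T1', 'T2', 'T3', 'T4'])
# Ticks on the Y axis at 0, 2, 4, 6, 8 with custom labels
pl.yticks([0, 2, 4, 6, 8], ['0', 'Two', 'Four', 'Six', 'Eight'])
pl.title("Tests Score Analysis")
pl.show()
```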
Question: CBSE Sample Paper 2019-20
(The question was shown as an image and is not reproduced here.)
Question: Solution
Steps to follow:

1. Import the necessary modules.
2. Create a list of x points (use arange()).
3. Create the percentage list.
4. Plot the bar chart with a suitable color and width.
5. Put the xlabel.
6. Put the ylabel.
Question: Solution
import matplotlib.pyplot as plt
import numpy as np
n_points=np.arange(4)
percentage=[82,83,85,90]
plt.bar(n_points, percentage, width=0.5, color='Blue')
plt.ylabel("Pass Percentage")
plt.xlabel('Years')
plt.show()
Solution with the years shown on the x axis:
import matplotlib.pyplot as plt
import numpy as np
years=['2015', '2016', '2017', '2018']
n_points=np.arange(len(years))
percentage=[82,83,85,90]
plt.bar(n_points, percentage, width=0.5, color='Blue')
plt.xticks(n_points,years)
plt.ylabel("Pass Percentage")
plt.xlabel('Years')
plt.show()
Steps to follow:
1. Import the necessary modules.
2. Create a list of x points (use arange()).
3. Create the percentage list.
4. Plot the bar chart with a suitable color and width.
5. Put the years on the x axis (xticks()).
6. Put the xlabel.
7. Put the ylabel.
Adding limits for X and Y axis data (xlim() and ylim()):
When you plot data, PyPlot automatically chooses the best-fitting range for the X axis and Y axis based on the data being plotted. Sometimes, however, you may need to set these limits yourself; for this, use the xlim() and ylim() functions.
Syntax:
pl.xlim(<xmin>,<xmax>) & pl.ylim(<ymin>,<ymax>)
Eg: pl.xlim(-2.0,4.0)
Only the data that falls within the X and Y limits is visible; the rest of the data lies outside the displayed area.
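A short sketch of xlim() and ylim() using the sample scores from earlier; the y range 0 to 6 is chosen deliberately so that the last point (4, 8) falls outside it and is not visible:

```python
import matplotlib.pyplot as pl

a = [1, 2, 3, 4]
b = [2, 4, 3, 8]
pl.plot(a, b, marker='o')
# Show x from -2.0 to 4.0 and y from 0 to 6 only
pl.xlim(-2.0, 4.0)
pl.ylim(0, 6)
pl.show()
```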
Creating a multiple bar chart:
Steps:
1. Decide the number of X points.
2. Decide the Y points to plot as a list of lists. (The number of inner lists equals the number of bars in a group; the number of elements in each inner list equals the number of bars of the same color.)
3. Decide the width of each bar and shift the X points of each data range by that width.
4. Give a different color to each data range.
5. Keep the width argument the same for all the data ranges being plotted.
6. Plot each data range separately using bar().
Creating a multiple bar chart:
Write code to plot the following multiple bar graph. (The graph was shown as an image and is not reproduced here.)
Solution:
import matplotlib.pyplot as pl
import numpy as np
# No. of elements in a gives the number of X points
a=np.arange(4)
b=[[5,25,45,20],[4,23,49,17],[6,22,47,19]]
# Each bar is 0.25 wide, so each data range is shifted by 0.25;
# use the same width and a different color for every range
pl.bar(a,b[0],width=0.25,color='b')
pl.bar(a+0.25,b[1],width=0.25,color='r')
pl.bar(a+0.5,b[2],width=0.25,color='g')
pl.ylabel("Number of points scored")
pl.xlabel("Test Numbers")
pl.show()
A multi bar chart plotting three
different data ranges.
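When there are more data ranges, the shifting in the solution above can be done in a loop. This sketch computes each offset as i * width and centres every tick under the middle bar of its group; the tick labels T1 to T4 are illustrative:

```python
import numpy as np
import matplotlib.pyplot as pl

a = np.arange(4)
b = [[5, 25, 45, 20], [4, 23, 49, 17], [6, 22, 47, 19]]
colors = ['b', 'r', 'g']
width = 0.25

for i, row in enumerate(b):
    # shift data range i to the right by i bar widths
    pl.bar(a + i * width, row, width=width, color=colors[i])

# centre of each group = first bar position + half the group's extra width
pl.xticks(a + width * (len(b) - 1) / 2, ['T1', 'T2', 'T3', 'T4'])
pl.ylabel("Number of points scored")
pl.xlabel("Test Numbers")
pl.show()
```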
Adding Legends:
A legend is a color or mark linked to a specific data range plotted. To add a legend, do the following:
(i) In plotting functions such as plot() and bar(), give each data range a label using the label argument.
(ii) Add the legend to the plot using legend(), in the format:
pl.legend(loc=<position number or string>)
The loc argument can take the values 1, 2, 3, 4, signifying the position strings 'upper right', 'upper left', 'lower left', 'lower right' respectively. If loc is not given, the default is 'best' (value 0), which places the legend where it overlaps the plotted data least.
Example:
import matplotlib.pyplot as pl
import numpy as np
val=[[5.,25.,45.,20.],[4.,23.,49.,17.],[6.,22.,47.,19.]]
x=np.arange(4)
#step1: specify a label for each range being plotted using the label argument
pl.bar(x+0.00,val[0],color='b',width=0.25,label="Range 1")
pl.bar(x+0.25,val[1],color='g',width=0.25,label="Range 2")
pl.bar(x+0.50,val[2],color='r',width=0.25,label="Range 3")
#step2: add legend
pl.legend(loc='upper left')
#Other chart formatting
pl.title("Multiple Bar Chart")
pl.xlabel("X axis")
pl.ylabel("Y axis")
pl.show()
Saving a figure:
savefig() saves the current figure to a file, taking the file format from the extension. Call it before show().

pl.savefig("C:\\Data\\mychart.pdf")
or
pl.savefig("C:\\Data\\mychart.png")
or
pl.savefig("C:\\Data\\mychart.eps")
Common file formats:
• Encapsulated PostScript (EPS) – a vector format used, for example, in Adobe Illustrator
• Portable Document Format (PDF) – developed by Adobe
• Portable Network Graphics (PNG) – a raster image format with lossless compression
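A self-contained sketch of saving a chart in two formats. A temporary folder (from Python's tempfile module) stands in for C:\Data so the code runs on any system; dpi and bbox_inches='tight' are optional savefig() arguments that set the resolution of raster output and trim surrounding whitespace.

```python
import os
import tempfile

import matplotlib.pyplot as pl

a = [1, 2, 3, 4]
b = [2, 4, 3, 8]
pl.bar(a, b)
pl.title("Tests Score Analysis")

# Temporary folder used in place of a fixed path such as C:\Data
folder = tempfile.mkdtemp()
png_path = os.path.join(folder, "mychart.png")
pdf_path = os.path.join(folder, "mychart.pdf")

# Save before show(); the extension decides the file format
pl.savefig(png_path, dpi=150, bbox_inches='tight')
pl.savefig(pdf_path)
print("Saved to", folder)
pl.show()
```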
Practical Question 1:
Consider the data given below:
Rainfall (in mm)
Zone      Jan   Feb   Mar   Apr   May
North     140   130   130   190   160
South     160   200   130   200   200
East      140   180   150   170   190
West      180   150   200   120   180
Central   110   160   130   110   120

(i) Create bar charts to see the distribution of rainfall from Jan to May for all the zones.
(ii) Create a line chart to observe any trends from Jan to May.
Practical Question: Sample Output (bar chart; image not reproduced)
Practical Question: Solution
import numpy as np
import matplotlib.pyplot as pl
jan=[140,160,140,180,110]
feb=[130,200,180,150,160]
mar=[130,130,150,200,130]
apr=[190,200,170,120,110]
may=[160,200,190,180,120]
zones=["North","South","East","West","Central"]
x=np.arange(1,20,4)
pl.xlim(0,20)
pl.ylim(50,300)
pl.bar(x,jan,width=0.5,color='b',label="January")
pl.bar(x+0.5,feb,width=0.5,color='r',label="February")
pl.bar(x+1,mar,width=0.5,color='g',label="March")
pl.bar(x+1.5,apr,width=0.5,color='black',label="April")
pl.bar(x+2,may,width=0.5,color='pink',label="May")
pl.xticks(x,zones)
pl.title("Rainfall from Jan to May")
pl.xlabel("Zones")
pl.legend(loc='upper right')
pl.ylabel("Rainfall in mm")
pl.show()
Practical Question: Sample Output – Line Chart (image not reproduced)
Practical Question: Solution – Line Chart
import numpy as np
import matplotlib.pyplot as pl
jan=[140,160,140,180,110]
feb=[130,200,180,150,160]
mar=[130,130,150,200,130]
apr=[190,200,170,120,110]
may=[160,200,190,180,120]
zones=["North","South","East","West","Central"]
x=np.arange(1,20,4)
pl.xlim(0,20)
pl.ylim(50,300)
pl.plot(x,jan,color='b',linestyle="dashed",marker="d",markersize=10,label="January")
pl.plot(x,feb,color='r',linestyle="dashed",marker="d",markersize=10,label="February")
pl.plot(x,mar,color='g',linestyle="dashed",marker="d",markersize=10,label="March")
pl.plot(x,apr,color='black',linestyle="dashed",marker="d",markersize=10,label="April")
pl.plot(x,may,color='pink',linestyle="dashed",marker="d",markersize=10,label="May")
pl.xticks(x,zones)
pl.title("Rainfall from Jan to May")
pl.xlabel("Zones")
pl.legend(loc='upper right')
pl.ylabel("Rainfall in mm")
pl.show()
Practical Question 2:
CBSE Result Analysis – 2018-19
Pie Chart
import matplotlib.pyplot as plt
items = ['Cookies', 'Jellybean', 'Milkshake', 'Cheesecake']
data = [38.4, 40.6, 20.7, 10.3]
cols = ['yellowgreen', 'gold', 'lightskyblue', 'lightcoral']
exp=[0,0.1,0,0]
plt.pie(data, labels=items,colors=cols, shadow=True, startangle=90, autopct='%.2f%%',
explode=exp)
plt.legend(loc="lower left")
plt.axis('equal') #sets the aspect ratio so that the data units are the same in every direction.
plt.tight_layout() #to fit plots within the figure cleanly.
plt.show()
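Note that the values in data add up to about 110, not 100. pie() divides each value by the total, so the percentages printed by autopct are not the numbers in data. A sketch that computes the same percentages by hand and compares them with the wedge labels:

```python
import matplotlib.pyplot as plt

items = ['Cookies', 'Jellybean', 'Milkshake', 'Cheesecake']
data = [38.4, 40.6, 20.7, 10.3]

total = sum(data)  # about 110, not 100
# Share of each item as a percentage of the total, as autopct shows it
shares = [100 * d / total for d in data]
for name, pct in zip(items, shares):
    print(f"{name}: {pct:.2f}%")

wedges, texts, autotexts = plt.pie(data, labels=items, autopct='%.2f%%')
plt.axis('equal')
plt.show()
```

With this data the wedges are labelled 34.91%, 36.91%, 18.82% and 9.36%.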
Practical Question 3:
Pie chart from data collected from data.gov.in
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("C:\\Data\\Agricultural Land.csv")
print(df)
data=df['Agricultural Land'].head(6)
states=df['States/UTs'].head(6)
cols=['m','c','g','b','r','gold']
exp=[0,0,0,0,0.1,0]
plt.pie(data, labels=states,colors=cols, shadow=True, startangle=90, autopct='%.2f%%', explode=exp)
plt.title("State wise Agricultural Land")
plt.legend(loc="lower left")
plt.axis('equal') #sets the aspect ratio so that the data units are the same in every direction.
plt.tight_layout() #to fit plots within the figure cleanly.
plt.show()
