Chapter 3

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 18

Data Science

CS300
By: Dr. Muhammad Khan Afridi
Visualizing Data
I believe that visualization is one of the most powerful
means of achieving personal goals.
—Harvey Mackay

A fundamental part of the data scientist’s toolkit is data


visualization. Although it is very easy to create
visualizations, it’s much harder to produce good ones.
Visualizing Data
There are two primary uses for data visualization:
1. To explore data
2. To communicate data
matplotlib
matplotlib works pretty well for simple bar charts, line
charts, and scatterplots.
We will be using the matplotlib.pyplot module. In its
simplest use, pyplot maintains an internal state in which you
build up a visualization step by step.
Line Charts
1. from matplotlib import pyplot as plt
2. years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
3. gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7,
14958.3]
4. # create a line chart, years on x-axis, gdp on y-axis
5. plt.plot(years, gdp, color='green', marker='o',
linestyle='solid')
6. # add a title
7. plt.title("Nominal GDP")
8. # add a label to the y-axis
9. plt.ylabel("Billions of $")
10. plt.show()
Example
Bar Charts
A bar chart is a good choice when you want to show how
some quantity varies among some discrete set of items.
1. movies = ["Annie Hall", "Ben-Hur", "Casablanca",
"Gandhi", "West Side Story"]
2. num_oscars = [5, 11, 3, 8, 10]
3. # bars are by default width 0.8, so we'll add 0.1 to the left
coordinates
Example
4. # so that each bar is centered
5. xs = [i + 0.1 for i, _ in enumerate(movies)]
6. # plot bars with left x-coordinates [xs], heights [num_oscars]
7. plt.bar(xs, num_oscars)
8. plt.ylabel("# of Academy Awards")
9. plt.title("My Favorite Movies")
10. # label x-axis with movie names at bar centers
11. plt.xticks([i + 0.5 for i, _ in enumerate(movies)], movies)
12. plt.show()
Example
Example 2
1. grades = [83,95,91,87,70,0,85,82,100,67,73,77,0]
2. decile = lambda grade: grade // 10 * 10
3. histogram = Counter(decile(grade) for grade in grades)
4. plt.bar([x - 4 for x in histogram.keys()], # shift each bar to
the left by 4
5. histogram.values(), # give each bar its correct height
6. 8) # give each bar a width of 8
7. plt.axis([-5, 105, 0, 5]) # x-axis from -5 to 105,
8. # y-axis from 0 to 5
9. plt.xticks([10 * i for i in range(11)]) # x-axis labels at 0,
10, ..., 100
Example 2
10. plt.xlabel("Decile")
11. plt.ylabel("# of Students")
12. plt.title("Distribution of Exam 1 Grades")
13. plt.show()
Example 2
Line Charts (Multiple Lines)
These are a good choice for showing trends.
1. variance = [1, 2, 4, 8, 16, 32, 64, 128, 256]
2. bias_squared = [256, 128, 64, 32, 16, 8, 4, 2, 1]
3. total_error = [x + y for x, y in zip(variance, bias_squared)]
4. xs = [i for i, _ in enumerate(variance)]
5. # we can make multiple calls to plt.plot
6. # to show multiple series on the same chart
7. plt.plot(xs, variance, 'g-', label='variance') # green solid line
8. plt.plot(xs, bias_squared, 'r-.', label='bias^2') # red dot-dashed
line
9. plt.plot(xs, total_error, 'b:', label='total error') # blue dotted line
Example
10. # because we've assigned labels to each series
11. # we can get a legend for free
12. # loc=9 means "top center"
13. plt.legend(loc=9)
14. plt.xlabel("model complexity")
15. plt.title("The Bias-Variance Tradeoff")
16. plt.show()
Example
Scatterplots
A scatterplot is the right choice for visualizing the
relationship between two paired sets of data.
1. friends = [ 70, 65, 72, 63, 71, 64, 60, 64, 67]
2. minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]
3. labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
4. plt.scatter(friends, minutes)
Example
5. # label each point
6. for label, friend_count, minute_count in zip(labels, friends,
minutes):
7. plt.annotate(label,
8. xy=(friend_count, minute_count), # put the label with its
point
9. xytext=(5, -5), # but slightly offset
10. textcoords='offset points')
11. plt.title("Daily Minutes vs. Number of Friends")
12. plt.xlabel("# of friends")
13. plt.ylabel("daily minutes spent on the site")
14. plt.show()
Example

You might also like