This document discusses different types of data visualizations that can be created using the matplotlib library in Python. It provides code examples for creating line charts, bar charts, histograms, scatterplots, and line charts with multiple lines. The goal of data visualization is to explore and communicate data in a visual format. Matplotlib is useful for simple visualizations like bar charts, line charts, and scatterplots.
Data Science
CS300
By: Dr. Muhammad Khan Afridi

Visualizing Data

I believe that visualization is one of the most powerful means of achieving personal goals.
- Harvey Mackay
A fundamental part of the data scientist’s toolkit is data visualization. Although it is very easy to create visualizations, it’s much harder to produce good ones.

There are two primary uses for data visualization:
1. To explore data
2. To communicate data

matplotlib

matplotlib works pretty well for simple bar charts, line charts, and scatterplots. We will be using the matplotlib.pyplot module. In its simplest use, pyplot maintains an internal state in which you build up a visualization step by step.

Line Charts

    from matplotlib import pyplot as plt

    years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
    gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

    # create a line chart, years on x-axis, gdp on y-axis
    plt.plot(years, gdp, color='green', marker='o', linestyle='solid')

    # add a title
    plt.title("Nominal GDP")

    # add a label to the y-axis
    plt.ylabel("Billions of $")
    plt.show()

Bar Charts

A bar chart is a good choice when you want to show how some quantity varies among some discrete set of items.

    movies = ["Annie Hall", "Ben-Hur", "Casablanca", "Gandhi", "West Side Story"]
    num_oscars = [5, 11, 3, 8, 10]

    # plt.bar centers each bar (default width 0.8) on its x-coordinate,
    # so the bar positions double as the tick positions
    xs = range(len(movies))

    # plot bars centered at xs with heights num_oscars
    plt.bar(xs, num_oscars)
    plt.ylabel("# of Academy Awards")
    plt.title("My Favorite Movies")

    # label x-axis with movie names at bar centers
    plt.xticks(xs, movies)
    plt.show()

Example 2

A bar chart can also show a histogram of bucketed values, here exam grades grouped by decile.

    from collections import Counter

    grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]

    # bucket a grade into its decile: 83 -> 80, 100 -> 100
    decile = lambda grade: grade // 10 * 10
    histogram = Counter(decile(grade) for grade in grades)

    plt.bar(list(histogram.keys()),    # center each bar on its decile
            list(histogram.values()),  # give each bar its correct height
            8)                         # give each bar a width of 8
    plt.axis([-5, 105, 0, 5])                # x-axis from -5 to 105, y-axis from 0 to 5
    plt.xticks([10 * i for i in range(11)])  # x-axis labels at 0, 10, ..., 100
    plt.xlabel("Decile")
    plt.ylabel("# of Students")
    plt.title("Distribution of Exam 1 Grades")
    plt.show()

Line Charts (Multiple Lines)

These are a good choice for showing trends.

    variance = [1, 2, 4, 8, 16, 32, 64, 128, 256]
    bias_squared = [256, 128, 64, 32, 16, 8, 4, 2, 1]
    total_error = [x + y for x, y in zip(variance, bias_squared)]
    xs = range(len(variance))

    # we can make multiple calls to plt.plot
    # to show multiple series on the same chart
    plt.plot(xs, variance, 'g-', label='variance')         # green solid line
    plt.plot(xs, bias_squared, 'r-.', label='bias^2')      # red dot-dashed line
    plt.plot(xs, total_error, 'b:', label='total error')   # blue dotted line

    # because we've assigned labels to each series
    # we can get a legend for free
    # loc=9 means "top center"
    plt.legend(loc=9)
    plt.xlabel("model complexity")
    plt.title("The Bias-Variance Tradeoff")
    plt.show()

Scatterplots

A scatterplot is the right choice for visualizing the relationship between two paired sets of data.

    friends = [70, 65, 72, 63, 71, 64, 60, 64, 67]
    minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]
    labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

    plt.scatter(friends, minutes)

    # label each point
    for label, friend_count, minute_count in zip(labels, friends, minutes):
        plt.annotate(label,
                     xy=(friend_count, minute_count),  # put the label with its point
                     xytext=(5, -5),                   # but slightly offset
                     textcoords='offset points')

    plt.title("Daily Minutes vs. Number of Friends")
    plt.xlabel("# of friends")
    plt.ylabel("daily minutes spent on the site")
    plt.show()
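
The scatterplot above, like the other examples, relies on the internal state that pyplot maintains between calls. As a rough sketch of an alternative, the same chart can be built on explicit Figure and Axes objects and written to a file instead of shown in a window. The Agg backend, the output filename, and the choice of the temporary directory are assumptions made for illustration; they are not part of the slides.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
from matplotlib import pyplot as plt

friends = [70, 65, 72, 63, 71, 64, 60, 64, 67]
minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

# plt.subplots() hands back the Figure and Axes explicitly,
# instead of leaving them in pyplot's hidden "current figure" state
fig, ax = plt.subplots()
ax.scatter(friends, minutes)

# label each point, slightly offset from its marker
for label, friend_count, minute_count in zip(labels, friends, minutes):
    ax.annotate(label,
                xy=(friend_count, minute_count),
                xytext=(5, -5),
                textcoords='offset points')

ax.set_title("Daily Minutes vs. Number of Friends")
ax.set_xlabel("# of friends")
ax.set_ylabel("daily minutes spent on the site")

# hypothetical output location: a PNG in the system temp directory
out_path = os.path.join(tempfile.gettempdir(), "friends_vs_minutes.png")
fig.savefig(out_path)
```

The calls map one to one: plt.title becomes ax.set_title, plt.xlabel becomes ax.set_xlabel, and so on. Holding the Axes explicitly tends to be easier to follow once a script draws more than one chart.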