WS#3 Python Data Science Toolbox - Nitro

APPLIED DATA SCIENCE
Name: Eliezer Nitro
WORKSHEET #3: PYTHON DATA SCIENCE TOOLBOX
Write code in Jupyter as required by each problem. Copy the code and output and paste them here. Use page breaks to start each new
number on a new page.
1 Date:
Create a list of lists. The individual lists should contain, in the correct order, the height (in inches), the weight (in pounds) and the
age of the baseball players.
Heights: 74 74 72 72 73 69 69 71 76 71 73 73 74 74 69 70 73 75 78 79
Weights: 180 215 210 210 188 176 209 200 231 180 188 180 185 160 180 185 189 185 219 230
Ages: 23 35 31 36 36 30 31 36 31 28 24 27 24 27 28 35 28 23 23 26
Convert the list of lists into a NumPy array named np_baseball. Using NumPy functionality, convert the unit of height to m
and that of weight to kg. Print the resulting array.
Code
import numpy as np
# Raw measurements: heights in inches, weights in pounds, ages in years
height_in = [74, 74, 72, 72, 73, 69, 69, 71, 76, 71, 73, 73, 74, 74, 69, 70, 73, 75, 78, 79]
weight_lb = [180, 215, 210, 210, 188, 176, 209, 200, 231, 180, 188, 180, 185, 160, 180, 185, 189, 185, 219, 230]
age_year = [23, 35, 31, 36, 36, 30, 31, 36, 31, 28, 24, 27, 24, 27, 28, 35, 28, 23, 23, 26]
# Convert the lists to NumPy arrays so that arithmetic applies element-wise
np_weight_lb = np.array(weight_lb)
np_height_in = np.array(height_in)
np_age_year = np.array(age_year)
# Pounds to kilograms
np_weight_kg = np_weight_lb * 0.453592
print(np_weight_kg)
# Inches to meters
np_height_m = np_height_in * 0.0254
print(np_height_m)
Output
[ 81.64656 97.52228 95.25432 95.25432 85.275296 79.832192
94.800728 90.7184 104.779752 81.64656 85.275296 81.64656
83.91452 72.57472 81.64656 83.91452 85.728888 83.91452
99.336648 104.32616 ]
[1.8796 1.8796 1.8288 1.8288 1.8542 1.7526 1.7526 1.8034 1.9304 1.8034
1.8542 1.8542 1.8796 1.8796 1.7526 1.778 1.8542 1.905 1.9812 2.0066]
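Problem 1 also asks for a list of lists converted into a single array named np_baseball. The following is a minimal sketch of that approach, assuming the same conversion factors as the code above. The names `baseball` and `conversion` are illustrative and not part of the original answer.

```python
import numpy as np

height_in = [74, 74, 72, 72, 73, 69, 69, 71, 76, 71, 73, 73, 74, 74, 69, 70, 73, 75, 78, 79]
weight_lb = [180, 215, 210, 210, 188, 176, 209, 200, 231, 180, 188, 180, 185, 160, 180, 185, 189, 185, 219, 230]
age_year = [23, 35, 31, 36, 36, 30, 31, 36, 31, 28, 24, 27, 24, 27, 28, 35, 28, 23, 23, 26]

# One inner list per player: [height (in), weight (lb), age (years)]
baseball = [[h, w, a] for h, w, a in zip(height_in, weight_lb, age_year)]
np_baseball = np.array(baseball, dtype=float)

# One factor per column: inches -> m, pounds -> kg, age unchanged.
# Broadcasting applies the three factors to every row.
conversion = np.array([0.0254, 0.453592, 1.0])
np_baseball = np_baseball * conversion

print(np_baseball)
```

Each row of the printed array then holds one player's height in m, weight in kg, and age in years.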
2 Date:
Refer to the code in #1. Write code that determines the age of the 8th player. The output should be in the following form:
The 8th player is <age> years old.
Code
# Reuses np_age_year from #1; index 7 is the 8th player because indexing starts at 0
print(f"The 8th player is {np_age_year[7]} years old.")
Output
The 8th player is 36 years old.
3 Date:
Refer to the code in #1. Print out the ages of the young players (those who are 25 years old and below).
Code
age_year = [23, 35, 31, 36, 36, 30, 31, 36, 31, 28, 24, 27, 24, 27, 28, 35, 28, 23, 23, 26]
import numpy as np
# np_age_year comes from #1; the boolean mask selects ages of 25 and below without hard-coded positions
print(*np_age_year[np_age_year <= 25])
Output
23 24 24 23 23
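The selection in #3 can be broken into steps to show where each young player sits in the array. This is a small sketch only; the name `young` is illustrative, and `np.flatnonzero` is used here just to list the positions where the condition holds.

```python
import numpy as np

np_age_year = np.array([23, 35, 31, 36, 36, 30, 31, 36, 31, 28,
                        24, 27, 24, 27, 28, 35, 28, 23, 23, 26])

# Element-wise comparison yields one True/False value per player
young = np_age_year <= 25

# Positions (zero-based) of the players who satisfy the condition
young_positions = np.flatnonzero(young)
print(young_positions)

# Indexing with the mask keeps only the matching ages
young_ages = np_age_year[young]
print(young_ages)
```

The positions come out as 0, 10, 12, 17 and 18, with the ages 23, 24, 24, 23 and 23.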
4 Date:
Visualize Child Mortality as a function of GDP per Capita for some Southeast Asian countries. Use population as an additional
argument. Do not forget to label the axes and to add a title.
Country      Fertility  Life Expectancy  Population  Child Mortality  GDP Per Capita
Philippines  3.151      68.207           93.2        31.9             5614
Thailand     1.443      73.852           69.1        14.5             12822
Singapore    1.261      81.788           50.9        2.8              72056
Vietnam      1.82       75.49            87.8        24.8             4486
Indonesia    2.434      70.185           239.9       33.1             8498
Malaysia     2.001      74.479           48.0        8.3              20398
Code
import pandas as pd
import matplotlib.pyplot as plt
# Load the table above; the first (unnamed) column holds the country names and becomes the index
df1 = pd.read_csv('ilikecsv.csv', index_col=0)
fig, ax = plt.subplots()
# Marker area (s) scales with population
scatter = ax.scatter(
    x=df1['GDP Per Capita'],
    y=df1['Child Mortality'],
    s=df1['Population'],
    c='blue',
    alpha=0.5)
plt.title("GDP Per Capita and Population vs Child Mortality")
plt.xlabel("GDP per Capita")
plt.ylabel("Child Mortality")
# Build a size legend from the scatter markers
h, l = scatter.legend_elements(prop="sizes", alpha=0.5, color='blue')
ax.legend(h, l, loc="upper right", title="Population")
plt.show()
Data file: ilikecsv.csv
Output
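The plot for #4 can also be reproduced without the CSV file by typing the table in directly. The sketch below assumes the values from the table in the problem statement; the name `df` and the country labels drawn next to each marker are illustrative additions, not part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Values copied from the table in problem 4
df = pd.DataFrame(
    {
        "Population": [93.2, 69.1, 50.9, 87.8, 239.9, 48.0],
        "Child Mortality": [31.9, 14.5, 2.8, 24.8, 33.1, 8.3],
        "GDP Per Capita": [5614, 12822, 72056, 4486, 8498, 20398],
    },
    index=["Philippines", "Thailand", "Singapore", "Vietnam", "Indonesia", "Malaysia"],
)

fig, ax = plt.subplots()
# Marker area scales with population
ax.scatter(df["GDP Per Capita"], df["Child Mortality"],
           s=df["Population"], c="blue", alpha=0.5)

# Illustrative addition: write each country's name next to its marker
for country, row in df.iterrows():
    ax.annotate(country, (row["GDP Per Capita"], row["Child Mortality"]))

ax.set_xlabel("GDP per Capita")
ax.set_ylabel("Child Mortality")
ax.set_title("Child Mortality vs GDP per Capita")
plt.show()
```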
5 Date:
Create a line plot of CO2 emissions per person in the Philippines as a function of year. Make sure to add labels and a title to your
plot.
CO2 Emissions per country per year (tons per person)
country      2004   2005   2006   2007   2008   2009   2010   2011   2012   2013   2014
Brunei       13.9   13.7   13.1   22.5   24     20.5   21.1   24.6   24.2   19.2   22.1
Cambodia     0.187  0.209  0.223  0.253  0.281  0.33   0.35   0.358  0.369  0.373  0.438
Indonesia    1.51   1.51   1.5    1.61   1.76   1.87   1.77   2.46   2.56   1.95   1.82
Lao          0.246  0.244  0.265  0.153  0.156  0.204  0.262  0.256  0.265  0.243  0.297
Malaysia     6.51   6.8    6.41   6.94   7.53   7.2    7.77   7.7    7.5    7.96   8.03
Myanmar      0.259  0.239  0.263  0.262  0.198  0.205  0.25   0.283  0.217  0.25   0.417
Philippines  0.875  0.867  0.771  0.808  0.869  0.841  0.905  0.897  0.942  0.996  1.06
Singapore    6.52   6.76   6.68   4.21   7.45   11.3   11     8.74   6.9    10.4   10.3
Thailand     3.74   3.78   3.83   3.81   3.79   4      4.19   4.12   4.37   4.4    4.62
Vietnam      1.08   1.16   1.21   1.22   1.36   1.47   1.61   1.7    1.57   1.61   1.8
Code
import matplotlib.pyplot as plt
# Philippines row of the table above, one value per year from 2004 to 2014
ph_co2_emissions = [0.875, 0.867, 0.771, 0.808, 0.869, 0.841, 0.905, 0.897, 0.942, 0.996, 1.06]
years = [2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014]
xlab = 'Year'
ylab = 'CO2 Emissions (in tons per person)'
title = 'CO2 Emissions in the Philippines per year'
plt.plot(years, ph_co2_emissions)
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
plt.show()
Output
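When the table is held in a DataFrame, the Philippines series for #5 can be selected by label instead of being typed out by hand. The sketch below is one possible way to do that, assuming three rows copied from the table; the names `co2` and `ph` are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

years = list(range(2004, 2015))

# Three rows copied from the CO2 table; the other countries are omitted for brevity
co2 = pd.DataFrame(
    [
        [6.51, 6.8, 6.41, 6.94, 7.53, 7.2, 7.77, 7.7, 7.5, 7.96, 8.03],
        [0.875, 0.867, 0.771, 0.808, 0.869, 0.841, 0.905, 0.897, 0.942, 0.996, 1.06],
        [3.74, 3.78, 3.83, 3.81, 3.79, 4, 4.19, 4.12, 4.37, 4.4, 4.62],
    ],
    index=["Malaysia", "Philippines", "Thailand"],
    columns=years,
)

# Select one country by its row label; the result is a Series indexed by year
ph = co2.loc["Philippines"]

fig, ax = plt.subplots()
ax.plot(ph.index, ph.values, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("CO2 Emissions (in tons per person)")
ax.set_title("CO2 Emissions in the Philippines per year")
plt.show()
```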
6 Date:
Which of the following conclusions can you derive from the plot?
A. The countries in blue, corresponding to Africa, have both low life expectancy and a low GDP per capita.
B. There is a negative correlation between GDP per capita and life expectancy.
C. China has both a lower GDP per capita and lower life expectancy compared to India.
Answer: A
7 Date:
Import cars.csv. Use the country abbreviations as index. Print the first three lines.
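Code
import pandas as pd
# index_col=0 uses the first column (the country abbreviations) as the row index
cars = pd.read_csv("cars.csv", index_col=0)
# First three rows: US, AUS, and JAP
print(cars.head(3))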
Output
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JAP           588          Japan         False
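To see what `index_col=0` does without the actual file, the three rows printed above can be read from an in-memory string. This is a sketch only: the layout of `csv_text` (an empty header for the first column) is an assumption about cars.csv, not something the worksheet shows.

```python
import io

import pandas as pd

# Assumed file layout: the first column has no header and holds the abbreviations
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JAP,588,Japan,False
"""

# io.StringIO lets read_csv treat the string as if it were a file
cars = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(cars.head(3))
print(cars.index.tolist())
```

With the abbreviations as the index, a single row can be looked up by label, for example `cars.loc["JAP"]`.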
8 Date:
Refer to the cars dataset. Write code that prints out the observations for the countries with few cars (cars per capita less than
500).
Code
import pandas as pd
cars = pd.read_csv("cars.csv", index_col=0)
# Boolean mask: True for countries with fewer than 500 cars per capita; cars[few_cars] selects those rows
few_cars = cars["cars_per_cap"] < 500
print(few_cars)
Output
US False
AUS False
JAP False
IN True
RU True
MOR True
EG True
Name: cars_per_cap, dtype: bool
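The output above is the True/False mask itself. Passing that mask back into the DataFrame returns the matching observations, which is what #8 asks for. The sketch below assumes the cars_per_cap values are listed in the same order as the index shown in the output; only that one column is reconstructed, because the remaining columns are not shown for every country.

```python
import pandas as pd

# Assumption: values and abbreviations are paired in file order
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588, 18, 200, 70, 45]},
    index=["US", "AUS", "JAP", "IN", "RU", "MOR", "EG"],
)

# Step 1: boolean mask, one True/False per country
is_few = cars["cars_per_cap"] < 500

# Step 2: index the DataFrame with the mask to keep only the True rows
few_cars = cars[is_few]
print(few_cars)
```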
9 Date:
Import weather_data_austin_2010.csv. Make sure to use a DatetimeIndex. Extract the Temperature column and
save the result to temp0. Extract data from temp0 for a single hour, from 9 pm to 10 pm on October 11, 2010. Assign
the data to temp1.
Code
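import pandas as pd
# parse_dates=True with index_col="Date" gives the frame a DatetimeIndex
weather = pd.read_csv("weather_data_austin_2010.csv", parse_dates=True, index_col="Date")
# temp0: the Temperature column; temp1: its readings from 9 pm to 10 pm on October 11, 2010
temp0 = weather["Temperature"]
temp1 = temp0.loc["2010-10-11 21:00:00":"2010-10-11 22:00:00"]
# Print every column for the same hour
print(weather.loc["2010-10-11 21:00:00":"2010-10-11 22:00:00"])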
Output
                     Temperature  DewPoint  Pressure
Date
2010-10-11 21:00:00         69.0      59.8       1.0
2010-10-11 22:00:00         67.7      59.9       1.0
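Slicing a DatetimeIndex by date strings includes both endpoints, which is why the 21:00 and 22:00 readings both appear above. The sketch below illustrates this on synthetic hourly data, not the Austin file: each value is simply the number of hours since the start, and the name `weather` is illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic hourly readings (NOT the Austin data): value = hours since 2010-10-11 00:00
index = pd.date_range("2010-10-11", periods=48, freq="h")
weather = pd.DataFrame({"Temperature": np.arange(48, dtype=float)}, index=index)

# Selecting one column gives a Series that keeps the DatetimeIndex
temp0 = weather["Temperature"]

# Label-based slicing with date strings; both endpoints are included
temp1 = temp0.loc["2010-10-11 21:00:00":"2010-10-11 22:00:00"]
print(temp1)
```

Here temp1 holds two rows, the readings stamped 21:00 and 22:00.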
10 Date:
Resample temp0 from #9 to every 6 hours frequency. Aggregate using mean.
Code
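import pandas as pd
temp0 = pd.read_csv("weather_data_austin_2010.csv", parse_dates=True, index_col="Date")
# ffill() reports the reading at each 6-hour mark, carrying the last earlier value forward;
# calling .mean() after resample("6H") instead would average the readings in each 6-hour bin
print(temp0.loc["20101011 21:00:00": "20101012 22:00:00"].resample("6H").ffill())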

Output
                     Temperature  DewPoint  Pressure
Date
2010-10-11 18:00:00          NaN       NaN       NaN
2010-10-12 00:00:00         66.1      59.4       1.0
2010-10-12 06:00:00         62.5      58.1       1.0
2010-10-12 12:00:00         77.9      59.0       1.0
2010-10-12 18:00:00         76.6      58.3       1.0
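Problem 10 asks for aggregation by mean, which averages every reading that falls inside each 6-hour bin. The sketch below shows that call on synthetic hourly data, not the Austin file, so the numbers are illustrative only; the name `temp_6h` is hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic hourly readings for one day (NOT the Austin data): values 0.0 to 23.0
index = pd.date_range("2010-10-11", periods=24, freq="h")
temp0 = pd.Series(np.arange(24, dtype=float), index=index, name="Temperature")

# Group the readings into 6-hour bins and average each bin
temp_6h = temp0.resample("6h").mean()
print(temp_6h)
```

The four bins start at 00:00, 06:00, 12:00 and 18:00, and their means are 2.5, 8.5, 14.5 and 20.5 for this synthetic series.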