WS#3 Python Data Science Toolbox - Nitro

The document contains a series of coding exercises involving data analysis and visualization in Python. In exercise 1, the student imports baseball player data and converts it to a NumPy array, changing the units of height and weight. Exercise 2 finds the age of the 8th baseball player. Exercise 3 prints the ages of players aged 25 and under. Exercise 4 visualizes the relationship between child mortality, GDP per capita, and population for several Southeast Asian countries.

APPLIED DATA SCIENCE

Name: Eliezer Nitro

WORKSHEET #3: PYTHON DATA SCIENCE TOOLBOX

Write code in Jupyter as required by each problem. Copy the code and its output and paste them here. Use page breaks to start each new number on a new page.

1 Date:
Create a list of lists. The individual lists should contain, in the correct order, the height (in inches), the weight (in pounds) and the
age of the baseball players.

Heights: 74 74 72 72 73 69 69 71 76 71 73 73 74 74 69 70 73 75 78 79
Weights: 180 215 210 210 188 176 209 200 231 180 188 180 185 160 180 185 189 185 219 230
Ages: 23 35 31 36 36 30 31 36 31 28 24 27 24 27 28 35 28 23 23 26

Convert the list of lists into a NumPy array named np_baseball. Using NumPy functionality, convert the unit of height to m
and that of weight to kg. Print the resulting array.
Code
import numpy as np

height_in = [74, 74, 72, 72, 73, 69, 69, 71, 76, 71, 73, 73, 74, 74, 69, 70, 73, 75, 78, 79]
weight_lb = [180, 215, 210, 210, 188, 176, 209, 200, 231, 180, 188, 180, 185, 160, 180, 185, 189, 185, 219, 230]
age_year = [23, 35, 31, 36, 36, 30, 31, 36, 31, 28, 24, 27, 24, 27, 28, 35, 28, 23, 23, 26]

# Convert the plain lists to NumPy arrays so arithmetic applies element-wise
np_weight_lb = np.array(weight_lb)
np_height_in = np.array(height_in)
np_age_year = np.array(age_year)

# Pounds to kilograms (1 lb = 0.453592 kg)
np_weight_kg = np_weight_lb * 0.453592
print(np_weight_kg)

# Inches to meters (1 in = 0.0254 m)
np_height_m = np_height_in * 0.0254
print(np_height_m)
Output
[ 81.64656 97.52228 95.25432 95.25432 85.275296 79.832192
94.800728 90.7184 104.779752 81.64656 85.275296 81.64656
83.91452 72.57472 81.64656 83.91452 85.728888 83.91452
99.336648 104.32616 ]

[1.8796 1.8796 1.8288 1.8288 1.8542 1.7526 1.7526 1.8034 1.9304 1.8034
1.8542 1.8542 1.8796 1.8796 1.7526 1.778 1.8542 1.905 1.9812 2.0066]
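
The task statement asks for a list of lists and a single array named np_baseball, while the code above converts three separate arrays. The following is a minimal sketch of that variant, assuming one inner list per player in the order height, weight, age, and the same conversion factors as above. The name players is illustrative and does not come from the worksheet.

```python
import numpy as np

height_in = [74, 74, 72, 72, 73, 69, 69, 71, 76, 71, 73, 73, 74, 74, 69, 70, 73, 75, 78, 79]
weight_lb = [180, 215, 210, 210, 188, 176, 209, 200, 231, 180, 188, 180, 185, 160, 180, 185, 189, 185, 219, 230]
age_year = [23, 35, 31, 36, 36, 30, 31, 36, 31, 28, 24, 27, 24, 27, 28, 35, 28, 23, 23, 26]

# One inner list per player: [height in inches, weight in pounds, age in years]
# ("players" is an illustrative name)
players = [list(row) for row in zip(height_in, weight_lb, age_year)]

# 2D array with 20 rows (players) and 3 columns (height, weight, age)
np_baseball = np.array(players, dtype=float)

# Broadcasting multiplies each column by its own factor:
# inches -> meters, pounds -> kilograms, age left unchanged
np_baseball = np_baseball * np.array([0.0254, 0.453592, 1.0])
print(np_baseball)
```

Each row of the printed array then holds one player's height in m, weight in kg, and age.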

2 Date:
Refer to the code in #1. Write code that determines the age of the 8th player. The output should be in the following form:
The 8th player is <age> years old.
Code
import numpy as np

age_year = [23, 35, 31, 36, 36, 30, 31, 36, 31, 28, 24, 27, 24, 27, 28, 35, 28, 23, 23, 26]
np_age_year = np.array(age_year)

# Index 7 is the 8th player because indexing starts at 0
print(f"The 8th player is {np_age_year[7]} years old.")
Output
The 8th player is 36 years old.

3 Date:
Refer to the code in #1. Print out the ages of the young players (those who are 25 years old and below).
Code
age_year = [23, 35, 31, 36, 36, 30, 31, 36, 31, 28, 24, 27, 24, 27, 28, 35, 28, 23, 23, 26]

import numpy as np
np_age_year = np.array(age_year)
# Boolean indexing keeps only the ages that are 25 or below;
# the * unpacks them so they print separated by spaces
print(*np_age_year[np_age_year <= 25])

Output
23 24 24 23 23

4 Date:
Visualize Child Mortality as a function of GDP per Capita for several Southeast Asian countries. Use population as an additional
argument. Do not forget to label the axes and to add a title.
Country      Fertility  Life Expectancy  Population  Child Mortality  GDP Per Capita
Philippines  3.151      68.207           93.2        31.9             5614
Thailand     1.443      73.852           69.1        14.5             12822
Singapore    1.261      81.788           50.9        2.8              72056
Vietnam      1.82       75.49            87.8        24.8             4486
Indonesia    2.434      70.185           239.9       33.1             8498
Malaysia     2.001      74.479           48.0        8.3              20398
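
The solution code for this exercise reads the table from a CSV file that is not included with the worksheet. As a sketch of an alternative, the same table can be typed in directly as a pandas DataFrame. The values are copied from the table above as given; the variable name sea is illustrative.

```python
import pandas as pd

# Values copied from the worksheet table; "sea" is an illustrative name
sea = pd.DataFrame(
    {
        "Fertility": [3.151, 1.443, 1.261, 1.82, 2.434, 2.001],
        "Life Expectancy": [68.207, 73.852, 81.788, 75.49, 70.185, 74.479],
        "Population": [93.2, 69.1, 50.9, 87.8, 239.9, 48.0],
        "Child Mortality": [31.9, 14.5, 2.8, 24.8, 33.1, 8.3],
        "GDP Per Capita": [5614, 12822, 72056, 4486, 8498, 20398],
    },
    index=["Philippines", "Thailand", "Singapore", "Vietnam", "Indonesia", "Malaysia"],
)
print(sea)
```

With the country names as the index, a column such as sea["GDP Per Capita"] can be passed straight to a scatter plot.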

Code
import pandas as pd
import matplotlib.pyplot as plt

# Load the table; the first (unnamed) CSV column holds the country names
df1 = pd.read_csv('ilikecsv.csv')
df1.index = df1['Unnamed: 0'].values
df1.drop('Unnamed: 0', axis=1, inplace=True)

# Scatter plot: GDP on x, child mortality on y, marker size taken from population
fig, ax = plt.subplots()
scatter = ax.scatter(
    x=df1['GDP Per Capita'],
    y=df1['Child Mortality'],
    s=df1['Population'],
    c='blue',
    alpha=0.5)

plt.title("GDP Per Capita and Population vs Child Mortality")
plt.xlabel("GDP per Capita")
plt.ylabel("Child Mortality")

# Build a legend that maps marker sizes back to population values
h, l = scatter.legend_elements(prop="sizes", alpha=0.5, color='blue')
legend2 = ax.legend(h, l, loc="upper right", title="Population")
plt.show()

Output

5 Date:
Create a line plot of CO2 emissions per person in the Philippines as a function of year. Make sure to add labels and a title to your
plot.
CO2 emissions per country per year (tons per person)
Country      2004   2005   2006   2007   2008   2009   2010   2011   2012   2013   2014
Brunei       13.9   13.7   13.1   22.5   24     20.5   21.1   24.6   24.2   19.2   22.1
Cambodia     0.187  0.209  0.223  0.253  0.281  0.33   0.35   0.358  0.369  0.373  0.438
Indonesia    1.51   1.51   1.5    1.61   1.76   1.87   1.77   2.46   2.56   1.95   1.82
Lao          0.246  0.244  0.265  0.153  0.156  0.204  0.262  0.256  0.265  0.243  0.297
Malaysia     6.51   6.8    6.41   6.94   7.53   7.2    7.77   7.7    7.5    7.96   8.03
Myanmar      0.259  0.239  0.263  0.262  0.198  0.205  0.25   0.283  0.217  0.25   0.417
Philippines  0.875  0.867  0.771  0.808  0.869  0.841  0.905  0.897  0.942  0.996  1.06
Singapore    6.52   6.76   6.68   4.21   7.45   11.3   11     8.74   6.9    10.4   10.3
Thailand     3.74   3.78   3.83   3.81   3.79   4      4.19   4.12   4.37   4.4    4.62
Vietnam      1.08   1.16   1.21   1.22   1.36   1.47   1.61   1.7    1.57   1.61   1.8

Code
import matplotlib.pyplot as plt

# Philippines row of the table: CO2 emissions per person, 2004 to 2014
ph_co2_emissions = [0.875, 0.867, 0.771, 0.808, 0.869, 0.841, 0.905, 0.897, 0.942, 0.996, 1.06]
# Years for the x-axis, in the same order as the emissions
ph_emissions_per_year = [2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014]

xlab = 'Year'
ylab = 'CO2 Emissions (in tons per person)'
title = 'CO2 Emissions in the Philippines per year'

plt.plot(ph_emissions_per_year, ph_co2_emissions)
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title(title)
plt.show()

Output

6 Date:

Which of the following conclusions can you derive from the plot?
Answer: A

A. The countries in blue, corresponding to Africa, have both low life expectancy and a low GDP per capita.
B. There is a negative correlation between GDP per capita and life expectancy.
C. China has both a lower GDP per capita and lower life expectancy compared to India.

7 Date:
Import cars.csv. Use the country abbreviations as the index. Print the first three rows.
Code
import pandas as pd
cars = pd.read_csv("cars.csv", index_col=0)
# The first three rows are US, AUS, and JAP
print(cars[0:3])
Output
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JAP           588          Japan         False

8 Date:
Refer to the cars dataset. Write code that prints out the observations for the countries with few cars (cars per capita less than
500).
Code
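import pandas as pd
cars = pd.read_csv("cars.csv", index_col=0)
cars_per_cap = [809, 731, 588, 18, 200, 70, 45]  # column values, listed for reference
# Boolean mask: True where a country has fewer than 500 cars per capita;
# cars[few_cars] would select the matching rows
few_cars = cars["cars_per_cap"] < 500
print(few_cars)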
Output
US     False
AUS    False
JAP    False
IN      True
RU      True
MOR     True
EG      True
Name: cars_per_cap, dtype: bool
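
The output above is the Boolean mask itself rather than the observations. To print the matching rows, the mask can be used to index the DataFrame. The following is a minimal sketch, assuming a stand-in DataFrame built only from the cars_per_cap values and index labels shown in this worksheet (the real cars.csv has more columns). The name few_cars is illustrative.

```python
import pandas as pd

# Stand-in for cars.csv: only the cars_per_cap column, values from the worksheet
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588, 18, 200, 70, 45]},
    index=["US", "AUS", "JAP", "IN", "RU", "MOR", "EG"],
)

# Boolean Series: True for countries with fewer than 500 cars per capita
few_cars = cars["cars_per_cap"] < 500

# Indexing with the mask keeps only the rows where it is True
print(cars[few_cars])
```

Applied to the full dataset, the same indexing would return every column for IN, RU, MOR, and EG.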

9 Date:
Import weather_data_austin_2010.csv. Make sure to use a DatetimeIndex. Extract the Temperature column and
save the result to temp0. Extract data from temp0 for a single hour, the hour from 9 pm to 10 pm on October 11, 2010. Assign
the data to temp1.
Code
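import pandas as pd
# Parse the Date column and use it as a DatetimeIndex
weather = pd.read_csv("weather_data_austin_2010.csv", parse_dates=True, index_col="Date")
temp0 = weather["Temperature"]  # Temperature column only
temp1 = temp0.loc["20101011 21:00:00":"20101011 22:00:00"]  # 9 pm to 10 pm on October 11, 2010
# Show all columns for the same hour
print(weather.loc["20101011 21:00:00":"20101011 22:00:00"])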

Output
                     Temperature  DewPoint  Pressure
Date
2010-10-11 21:00:00         69.0      59.8       1.0
2010-10-11 22:00:00         67.7      59.9       1.0
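
Slicing a DatetimeIndex with date strings includes both endpoints, which is why the output above has two rows. The following is a small sketch of that behavior on a stand-in Series. Only the 21:00 and 22:00 temperatures come from the output above; the other values and the five-hour range are made-up placeholders.

```python
import pandas as pd

# Hourly timestamps from 19:00 to 23:00 on October 11, 2010
index = pd.date_range("2010-10-11 19:00", periods=5, freq="h")

# 69.0 and 67.7 are from the worksheet output; the rest are placeholders
temp0 = pd.Series([72.0, 70.5, 69.0, 67.7, 66.9], index=index, name="Temperature")

# Label-based slicing with .loc keeps both the start and the end timestamp
temp1 = temp0.loc["2010-10-11 21:00:00":"2010-10-11 22:00:00"]
print(temp1)
```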

10 Date:
Resample temp0 from #9 to a 6-hour frequency. Aggregate using the mean.
Code
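import pandas as pd
temp0 = pd.read_csv("weather_data_austin_2010.csv", parse_dates=True, index_col="Date")

# Resample the selected window to 6-hour labels. ffill() takes the most recent
# observation at or before each label and does not average; aggregating by
# mean, as the task asks, would use .mean() in place of .ffill().
print(temp0.loc["20101011 21:00:00": "20101012 22:00:00"].resample("6H").ffill())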


Output
                     Temperature  DewPoint  Pressure
Date
2010-10-11 18:00:00          NaN       NaN       NaN
2010-10-12 00:00:00         66.1      59.4       1.0
2010-10-12 06:00:00         62.5      58.1       1.0
2010-10-12 12:00:00         77.9      59.0       1.0
2010-10-12 18:00:00         76.6      58.3       1.0
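
The output above comes from ffill(), so the first row is NaN: the 18:00 label falls before the first observation in the slice. A mean aggregation averages every observation inside each 6-hour bin instead. The following is a minimal sketch on made-up hourly data (the values 60 to 71 are placeholders, not taken from the Austin dataset); temps and six_hour_mean are illustrative names.

```python
import pandas as pd

# Twelve hourly timestamps starting at 18:00 on October 11, 2010
index = pd.date_range("2010-10-11 18:00", periods=12, freq="h")

# Placeholder temperatures 60.0, 61.0, ..., 71.0 (not real measurements)
temps = pd.Series(range(60, 72), index=index, name="Temperature", dtype=float)

# Group the hourly values into 6-hour bins and average each bin
six_hour_mean = temps.resample("6h").mean()
print(six_hour_mean)
```

Here the 18:00 bin averages the six values from 18:00 to 23:00, and the 00:00 bin averages the six values from 00:00 to 05:00.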
