Python Introduction
Kevin Sheppard
University of Oxford
www.kevinsheppard.com
September 2021
© 2021 Kevin Sheppard
Contents
Installing
1 Getting Started
2 Basic Python Types
3 Importing Modules
4 Series and DataFrames
5 Constructing DataFrames from Series
6 Methods and Functions
7 Custom Functions
8 Using DataFrames
9 Common DataFrame methods
10 Accessing Elements in DataFrames
11 Accessing Elements in NumPy Arrays
12 Numeric Indexing of DataFrames
13 for Loops
14 Logical Operators
15 Boolean Arrays
16 Boolean Selection
17 Conditional Statements
18 Logic and Loops
19 Importing Data
20 Saving and Exporting Data
21 Graphics: Line Plots
22 Graphics: Other Plots
Final Exam
Installing
Install Anaconda
2. Install the Python extension by clicking on Extensions and searching for “Python”
#%%
1. Download PyCharm Professional and install using the 30-day trial. You can get a free copy using your
academic email address if you want to continue after the first 30 days.
3. Open File > Setting and select Python Interpreter. Select the Anaconda interpreter if it is not already
selected.
print("Python has a steeper curve than MATLAB but more long-run upside")
Lesson 1
Getting Started
An Anaconda terminal allows Python to be run directly. It also gives access to other useful programs such as
pip, the Python package manager, which can install packages that are not available through Anaconda.
Windows
Open the terminal (instructions depend on your distribution). If you allowed conda to initialize, then you should
be ready to call Anaconda's python and supporting functions. If not, you should run
cd ~/anaconda3/bin
./conda init
1.2 Running IPython in a Terminal
1. Open a terminal.
2. Run IPython by entering ipython in the terminal window. You should see a window like the one below
with the iconic In [1] indicating that you are at the start of a new IPython session.
Jupyter Notebook
1. Open a text editor and enter the following lines. Save the file as lesson-2.py. Note that Python is
whitespace sensitive, and so these lines should not be indented.
from math import exp, log

x = exp(1)
y = log(x)
print(f"exp(1)={x}, log(exp(1))={y}")
2. Run the code in an IPython session using %run -i lesson-2.py. Note: you should create the python
file in the same directory as the notebook.
exp(1)=2.718281828459045, log(exp(1))=1.0
Visual Studio Code (or VS Code) is a lightweight IDE that supports adding features through extensions. The
key extension for working with notebooks is the Python extension for Visual Studio Code. With this extension
installed, VS Code provides native support for Jupyter notebooks.
2. Open the command palette and enter “create jupyter” and select the only available item.
See the screenshot below for an example of the experience of using Jupyter notebooks in VS Code.
VS Code Notebook
"""
# Cell Heading
Likeness darkness. That give brought creeping. Doesn't may. Fruit kind
midst seed. Creature, let under created void god to. Them day was Was
creature set it from. Fourth. Created don't man. Man. Light fourth
light given the he image first multiply after deep she'd great. Morning
likeness very have give also fowl third land beast from moving thing
creepeth herb creeping won't fifth. Us bring was our beast wherein our
void and green he fruit kind upon a given, saying fruit, moveth face
forth. His you it. Good beginning hath.
"""
# # Cell Heading
#
# Likeness darkness. That give brought creeping. Doesn't may. Fruit kind
# midst seed. Creature, let under created void god to. Them day was Was
# creature set it from. Fourth. Created don't man. Man. Light fourth
# light given the he image first multiply after deep she'd great. Morning
# likeness very have give also fowl third land beast from moving thing
# creepeth herb creeping won't fifth. Us bring was our beast wherein our
# void and green he fruit kind upon a given, saying fruit, moveth face
# forth. His you it. Good beginning hath.
The cells have a special button above them that allows the contents to be executed and the result to be displayed
in the interactive window. See the screenshot below for an example of the experience of using VS Code. There
is also an interactive console at the bottom left where commands can be directly executed.
VS Code Notebook
Importing an existing notebook into Magic Python
VS Code only understands Magic Python files as notebook-like documents, and so .ipynb files must be
converted before use. The process of importing is simple:
VS Code Export
To export a Magic Python file, open the command palette and enter “import jupyter”. Select the option to
import the notebook.
VS Code Import
PyCharm Professional is my recommended approach if you are going to use Python throughout the course. It
provides the best experience and can be acquired for free using the student program.
PyCharm Professional has deeply integrated Jupyter Notebooks. To create an IPython notebook:
PyCharm New Notebook
PyCharm uses a special syntax where cells look like code and so can be edited like text. This allows PyCharm
to use introspection and code completion on the code you have written, a highly useful set of features. PyCharm
stores the notebook in a Jupyter notebook file (.ipynb), which means that you can trivially open it in any other
Jupyter notebook aware app. This differs from VS Code, which stores the file as a plain Python file (.py) and
requires an explicit export to a Jupyter notebook file.
A code cell is demarcated using #%% and a markdown cell begins with #%% md. Below is a screenshot of this
notebook in PyCharm.
PyCharm Notebook
PyCharm supports Magic Python cell execution. To use Magic Python, you need to enable Scientific Mode in
the View menu. You can then use #%% to indicate the start and end of cells. Individual Cells can be executed in
the console by pressing CTRL+Enter.
1. In PyCharm, right-click on the root directory and select New > Python File. Give your file a mean-
ingful name.
2. Enter
#%%
print("This is the first cell")
#%%
print("This is not executed when the first cell is run")
4. Run the first cell by placing your mouse in the cell and pressing CTRL+Enter.
5. Run the second cell by clicking on the Play button (arrow) that appears in the gutter of the editor.
Note: Magic Python in PyCharm only supports Python code, and so it is not possible to mix Markdown text
and Python in the same file.
Lesson 2
Course Structure
Problems
Problems are explicitly explained and answered in the online course videos.
Exercises
Exercises are not covered, and are left as additional problems for you to attempt after watching a lesson and
going through the problems. Solutions for the exercises are available in the solutions folder.
Final Exam
When you have completed the course, or if you have a reasonable background in Python, you can attempt the
final exam. Ideally you should do this without looking at the solutions. If you can complete the final exam,
then you are prepared for the remainder of the course.
2. Create an f-string that prints The value of scalar_float is 3.1415 using the variable created in
the previous step.
3. Create two strings, the first containing String concatenation and the second containing is like
addition, and join the two using + to produce String concatenation is like addition.
1. Create a dictionary containing the key-value pairs "float" and 3.1415, "int" and 31415, and
"string" and "three-point-one-four-one-five".
Exercises
2. Add the elements 9, "Eight" and 7.0 (in order) to the list.
3. Extend the list with the list ["Six", 5, 4.0] using extend
4. Select first 4 elements of lst
5. Select last 3 elements of lst
Hint: You must use both types of quotes, for example, to access a dictionary value inside an f-string.
f"{other_dct['apple']}"
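The problems and exercises in this lesson can be sketched as follows. This is only an illustration: the variable names `dct` and `lst` and the choice to start from an empty list are assumptions, since the text does not fix them.

```python
# Illustrative sketch; dct, lst and the empty starting list are assumptions.
scalar_float = 3.1415

# An f-string evaluates the expression inside the braces.
message = f"The value of scalar_float is {scalar_float}"
print(message)

# + joins strings end to end; note the trailing space in the first string.
first = "String concatenation "
second = "is like addition"
joined = first + second
print(joined)

# A dictionary maps keys to values.
dct = {"float": 3.1415, "int": 31415, "string": "three-point-one-four-one-five"}
# Single quotes inside a double-quoted f-string avoid ending the string early.
print(f"{dct['int']}")

# append adds one element; extend adds every element of another list.
lst = []
for item in (9, "Eight", 7.0):
    lst.append(item)
lst.extend(["Six", 5, 4.0])

first_four = lst[:4]   # slice from the start up to, but excluding, position 4
last_three = lst[-3:]  # negative positions count from the end
print(first_four, last_three)
```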
Lesson 3
Importing Modules
• Module import
Python is a general-purpose programming language and is not specialized for numerical or statistical compu-
tation. The core modules that enable Python to store and access data efficiently and that provide statistical
algorithms are located in modules. The most important are:
• NumPy (numpy) - provides the basic array block used throughout numerical Python
• pandas (pandas) - provides DataFrames which are used to store data in an easy-to-use format
• SciPy (scipy) - Basic statistics and random number generators. The most important submodule is
scipy.stats
• matplotlib (matplotlib) - graphics. The most important submodule is matplotlib.pyplot.
• statsmodels (statsmodels) - statistical models such as OLS. The most important submodules are
statsmodels.api and statsmodels.tsa.api.
Use the as keyword to import the modules using their canonical names:
Module               Canonical Name
numpy                np
pandas               pd
scipy                sp
scipy.stats          stats
matplotlib.pyplot    plt
statsmodels.api      sm
statsmodels.tsa.api  tsa
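As a minimal sketch, the `as` pattern looks like this for NumPy and pandas; the other modules in the table follow the same form. The example also shows three equivalent ways of reaching a function inside a submodule, using `numpy.linalg.det` purely as an illustration.

```python
import numpy as np
import pandas as pd
from numpy import linalg
from numpy.linalg import det

# The alias is only a shorter name for the same module.
x = np.arange(4)
s = pd.Series(x)
print(s)

# Three ways to call the same function on a 2 by 2 identity matrix.
identity = np.eye(2)
d1 = np.linalg.det(identity)  # through the np alias
d2 = linalg.det(identity)     # through the imported submodule
d3 = det(identity)            # through the directly imported function
print(d1, d2, d3)
```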
Exercises
1. numpy
2. np
3. By importing linalg from numpy and accessing it from linalg
4. By directly importing the function
Lesson 4
Data
September 2018 prices (adjusted closing prices) for the S&P 500 ETF (SPY), Apple (AAPL) and
Google (GOOG) are listed below:
Create vectors for each of the days in the Table named sep_xx where xx is the numeric date. For example,
import pandas as pd
Use the pandas function pd.to_datetime to convert a list of string dates to a pandas DatetimeIndex, which
can be used to set dates in other arrays.
For example, the first two dates are
import pandas as pd
# ISO format (YYYY-MM-DD) avoids day/month ambiguity when parsing
dates_2 = pd.to_datetime(["2018-09-04", "2018-09-05"])
print(dates_2)
which produces
Create vectors for each of the ticker symbols in Table named spy, aapl and goog, respectively. Use the variable
dates that you created in the previous step as the index.
For example
Create a DataFrame named prices containing Table. Set the column names equal to the ticker and set the
index to dates.
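The date conversion and indexing steps above can be sketched as follows. The numbers are placeholders rather than the prices from the table, and the name `example` is hypothetical. The sketch also shows why the date format matters: by default pandas reads a string such as "4-9-2018" month first.

```python
import pandas as pd

# ISO-formatted strings (YYYY-MM-DD) are parsed unambiguously.
iso = pd.to_datetime(["2018-09-04", "2018-09-05"])

# Day-first strings need an explicit format; without it "4-9-2018"
# is read month first, as April 9.
day_first = pd.to_datetime(["4-9-2018", "5-9-2018"], format="%d-%m-%Y")

# Placeholder values, NOT the prices from the table.
example = pd.Series([100.0, 101.5], index=iso, name="EXAMPLE")
print(example)
```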
# Setup: Save prices, goog and sep_04 into a single file for use in other lessons
dates = pd.Series(dates)
variables = [
"sep_04",
"sep_05",
"sep_06",
"sep_07",
"sep_10",
"sep_11",
"sep_12",
"sep_13",
"sep_14",
"sep_17",
"sep_18",
"sep_19",
"spy",
"goog",
"aapl",
"prices",
"dates",
]
with pd.HDFStore("data/dataframes.h5", mode="w") as h5:
    for var in variables:
        h5.put(var, globals()[var])
Exercises
Turn the table below into a DataFrame where the first column is set as the index and the column names are used in
the DataFrame.
A Alcoa 3,428
B Berkshire 67,421
C Coca Cola 197.4
D Dannon -342.1
Lesson 5
import pandas as pd
hdf_file = "data/dataframes.h5"
Create a DataFrame named prices_row from the row vectors previously entered such that the results are
identical to prices. For example, the first two days' worth of data are:
dates_2 = pd.to_datetime(["2018-09-04", "2018-09-05"])
prices_row = pd.DataFrame([sep_04, sep_05])
# Set the index after building the DataFrame from the rows
prices_row.index = dates_2
Verify that the DataFrame is identical by printing the difference with prices
print(prices_row - prices)
Create a DataFrame named prices_col from the 3 column vectors entered such that the results are identical
to prices.
Note: .T transposes a 2-d array since DataFrame builds the array by rows.
Verify that the DataFrame is identical by printing the difference with prices
Create a DataFrame named prices_dict from the 3 column vectors entered such that the results are identical
to prices
Verify that the DataFrame is identical by printing the difference with prices
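The three constructions can be sketched on a small example. All values, labels and variable names below are placeholders chosen for illustration, not the data from the table.

```python
import pandas as pd

# Placeholder values and labels, NOT the prices from the table.
dates = pd.to_datetime(["2018-09-04", "2018-09-05"])
day_1 = pd.Series([1.0, 2.0, 3.0], index=["A", "B", "C"])
day_2 = pd.Series([1.5, 2.5, 3.5], index=["A", "B", "C"])
col_a = pd.Series([1.0, 1.5], index=dates)
col_b = pd.Series([2.0, 2.5], index=dates)
col_c = pd.Series([3.0, 3.5], index=dates)

# From rows: each Series becomes one row, so the index is set afterwards.
from_rows = pd.DataFrame([day_1, day_2])
from_rows.index = dates

# From columns: a list of Series is stacked by rows, so transpose with .T.
from_cols = pd.DataFrame([col_a, col_b, col_c]).T
from_cols.columns = ["A", "B", "C"]

# From a dictionary: keys become column names, values become columns.
from_dict = pd.DataFrame({"A": col_a, "B": col_b, "C": col_c})

# A difference of all zeros confirms that the constructions agree.
print(from_rows - from_dict)
print(from_cols - from_dict)
```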
Exercises
Use the three series populated below to create a DataFrame using each as a row.
Note: Notice what happens in the resulting DataFrame since one of the Series has 4 elements while the
others have 3.
# Setup: Data for the Exercises
import pandas as pd
Build a DataFrame from the three series where each is used as a column.
Lesson 6
Read the data in momentum.csv and create some variables. This cell uses some magic to automate repeated
typing.
# Setup: Load the momentum data
import pandas as pd
momentum = pd.read_csv("data/momentum.csv", index_col="date", parse_dates=True)
print(momentum.head())
mom_01 = momentum["mom_01"]
mom_10 = momentum["mom_10"]
This data set contains 2 years of data on the 10 momentum portfolios from 2016–2018. The variables are named
mom_XX where XX ranges from 01 (worst return over the past 12 months) to 10 (best return over the past 12
months).
Get used to calling methods by computing the mean, standard deviation, skewness, kurtosis, max, and min.
Use the Series methods mean, std, skew, kurt, min and max to print the values for mom_01.
In the second cell, call describe, a method that summarizes Series and DataFrames, on mom_01.
Use the NumPy functions mean, std, min, max and the SciPy stats functions skew and kurtosis to produce
the same output.
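The difference between calling a method and calling a function can be sketched on a short placeholder Series (not the momentum data). One detail worth knowing when comparing outputs: the pandas `std` method divides by n - 1 by default, while `np.std` divides by n unless `ddof=1` is passed.

```python
import numpy as np
import pandas as pd

# Placeholder returns, NOT the momentum data.
returns = pd.Series([0.5, -1.0, 2.0, 0.25, -0.75])

# Method call: the function is attached to the object.
method_mean = returns.mean()
method_std = returns.std()  # pandas divides by n - 1 by default

# Function call: the data are passed as an argument.
values = returns.to_numpy()
function_mean = np.mean(values)
function_std = np.std(values)         # NumPy divides by n by default
matched_std = np.std(values, ddof=1)  # ddof=1 matches the pandas default

print(method_mean, function_mean)
print(method_std, function_std, matched_std)
```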
Problem: Calling Functions with 2 Outputs
Some useful functions return 2 or more outputs. One example is np.linalg.slogdet, which computes the signed
log determinant of a square array. It returns two outputs, the sign and the log of the absolute determinant.
Use this function to compute the sign and log determinant of the 2 by 2 array:
1 2
2 9
Many functions take two or more inputs. Like outputs, the inputs are simply listed in order separated by
commas. Use np.linspace to produce a series of 11 points evenly spaced between 0 and 1.
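A minimal sketch of both calls follows. The determinant of the array is 1 × 9 − 2 × 2 = 5, so the sign should be positive and the log determinant should equal ln(5). The variable names are illustrative only.

```python
import numpy as np

# Two outputs are unpacked by listing two names on the left-hand side.
a = np.array([[1.0, 2.0], [2.0, 9.0]])
sign, log_det = np.linalg.slogdet(a)
print(sign, log_det)

# Several inputs are listed in order, separated by commas:
# start, stop, and the number of points.
grid = np.linspace(0, 1, 11)
print(grid)
```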
Many functions have optional arguments. You can see these in a docstring since optional arguments take the
form variable=default. For example, see the help for scipy.special.comb, which has the function
signature
This tells us that N and k are required and that the other 2 inputs can be omitted if you are happy with the
defaults. However, if we want to change some of the optional inputs, then we can directly use the input's name
in the function call.
Compute the number of distinct combinations of 5 objects from a set of 10.
Compute the total number of combinations allowing for repetition using the repetition=True keyword
argument.
Compute the number of combinations using the exact representation using only positional arguments for all 3
inputs. Repeat using the keyword argument for exact.
Explore the help available for functions using the ? operator. For example,
stats.kurtosis?
opens a help window that shows the inputs and output, while
help(stats.kurtosis)
Problem: Use help with a method
Use help to get the help for the kurt method attached to momentum.
Exercises
Use the info method on momentum to get information about this DataFrame.
Compute the day-by-day mean return of the portfolios in the momentum DataFrame using the axis keyword
argument. Use head and tail to show the first 5 rows and last 5 rows
Lesson 7
Custom Functions
Custom functions will play an important role later in the course when estimating parameters. Construct a
custom function that takes two arguments, mu and sigma2 and computes the likelihood function of a normal
random variable.
$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
Use def to start the function and compute the likelihood of:
x = 0, µ = 0, σ² = 1.
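One possible sketch of such a function is below. The name `normal_likelihood` is hypothetical, and the sketch assumes that the point x is passed as an argument alongside mu and sigma2. At x = 0, µ = 0, σ² = 1 the density reduces to 1/√(2π).

```python
import math


def normal_likelihood(x, mu, sigma2):
    """Density of a N(mu, sigma2) random variable evaluated at x."""
    scale = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return scale * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))


value = normal_likelihood(0.0, 0.0, 1.0)
print(value)
```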
Exercises
Write a function named summary_stats that will take a single input, x, a DataFrame and return a DataFrame
with 4 columns and as many rows as there were columns in the original data where the columns contain the
mean, standard deviation, skewness and kurtosis of x.
Check your function by running
summary_stats(momentum)
momentum = pd.read_csv("data/momentum.csv", index_col="date", parse_dates=True)
Test your function using the momentum data in the next cell.
Change your previous function to return 4 outputs, each a pandas Series for the mean, standard deviation,
skewness, and the kurtosis.
Returning multiple outputs uses the syntax
return w, x, y, z
Lesson 8
Using DataFrames
returns = prices.pct_change()
spy_returns = returns["SPY"]
import numpy as np
log_returns = np.log(prices).diff()
first difference of the natural log of the prices. Mathematically this is
$r_t = \ln(P_t) - \ln(P_{t-1}) = \ln\left(\frac{P_t}{P_{t-1}}\right) \approx \frac{P_t}{P_{t-1}} - 1$.
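The relationship between the two return definitions can be checked on a short placeholder price series (not the prices from the earlier lesson). Exponentiating a log return and subtracting 1 recovers the simple return exactly, and for small price changes the two are close.

```python
import numpy as np
import pandas as pd

# Placeholder prices, NOT the data from the earlier lesson.
prices = pd.Series([100.0, 101.0, 99.99, 100.5])

returns = prices.pct_change()        # P_t / P_{t-1} - 1
log_returns = np.log(prices).diff()  # ln(P_t) - ln(P_{t-1})

# The first element of each is NaN because there is no previous price.
print(returns)
print(log_returns)

# exp(log return) - 1 equals the simple return.
recovered = np.exp(log_returns) - 1
print(recovered - returns)
```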
8.1 Basic Mathematical Operations
Operation              Symbol  Precedence
Parentheses            ()      4
Exponentiation         **      3
Multiplication         *       2
Division               /       2
Floor division         //      2
Modulus                %       2
Matrix multiplication  @       2
Addition               +       1
Subtraction            -       1
Note: Higher precedence operators are evaluated first, and ties are evaluated left to right, except for repeated
exponentiation (**), which is evaluated right to left.
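A few short expressions illustrate how precedence changes a result. The variable names are arbitrary.

```python
# Multiplication is evaluated before addition.
no_parens = 2 + 3 * 4       # 2 + 12
with_parens = (2 + 3) * 4   # 5 * 4

# Exponentiation binds more tightly than a leading minus sign.
neg_power = -2 ** 2         # -(2 ** 2)

# Repeated exponentiation groups from the right.
tower = 2 ** 3 ** 2         # 2 ** (3 ** 2)

# Operators with equal precedence are evaluated left to right.
left_to_right = 7 // 2 * 2  # (7 // 2) * 2

print(no_parens, with_parens, neg_power, tower, left_to_right)
```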
Using only basic mathematical operations compute the correlation between the returns on AAPL and SPY.
Construct a DataFrame that only contains the SPY column from returns and add it to the returns DataFrame.
Problem: Constructing portfolio returns
Exercises
• a+b
• a+c
• b+c
• a+b+c
rs = np.random.RandomState(19991231)
Lesson 9
This lesson introduces the common DataFrame methods that we will repeatedly use in the course.
This first cell loads data for use in this lesson.
# Setup: Load prices
import pandas as pd
Compute the return of a portfolio with weight 1/3 in each security using multiplication (*) and .sum().
Note: You need to use the axis keyword for the sum.
Using the function mean, compute the mean of the three returns series one at a time. For example
goog_mean = goog_returns.mean()
retmean = returns.mean()
What is the relationship between these two? Repeat this exercise for the standard deviation (std()).
Problem: Summing all elements
Compute the sum of the columns of returns using .sum(). How is this related to the mean computed in the
previous step?
Compute the minimum and maximum values of the columns of returns using the min() and max() commands.
Rounding up is handled by ceil, rounding down is handled by floor and rounding to the closest integer is handled
by round. Try all of these commands on 100 times returns. For example,
rounded = (100*returns).round()
Exercises
Compute the 5%, 25%, 50%, 75% and 95% quantiles of momentum using the quantile method.
# Setup: Load data
import pandas as pd
Exercise: Sorting
Use sort_values to sort momentum by the column mom_10. Verify that the sort was successful by looking
at the minimum of a diff.
Use sort_values to sort momentum by the column mom_10 using a descending sort (see the help for
sort_values). Verify the sort worked by looking at the maximum of a diff.
Use the shape property to get the number of observations in momentum. Use it again to get the number of
columns.
Exercise: Use shift to Compute Returns
Compute the percentage change using only shift, division (/) and subtraction (-) on the Series mom_10.
Verify that your result matches what pct_change produces.
Lesson 10
Accessing elements in an array or a DataFrame is a common task. To begin this lesson, clear the workspace and set
up some vectors and a 5 × 5 array. These vectors and matrix will make it easy to determine which elements are
selected by a command.
Start by creating 2 DataFrames and 2 Series. Define x=np.arange(25).reshape(5,5) which is a 5 by 5
array and y=np.arange(5) which is a 5-element 1-d array. We need:
Select the 2nd and 4th rows and 1st and 3rd columns of x_named.
Select the 2nd and 4th rows and 1st and 3rd columns of x_df.
Problem: Series selection with the default index
Select the subseries of y_named and y_s containing the first, fourth and fifth elements.
Load the data in momentum.csv.
# Setup: Load the momentum data
import pandas as pd
momentum = pd.read_csv("data/momentum.csv", index_col="date", parse_dates=True)
Exercises
Select the data for May 2017 for momentum portfolios 1 and 10.
Using a slice of YYYY-MM, select the returns for momentum portfolio 5 in the final 6 months of 2016 as
a Series.
Lesson 11
Accessing elements in an array or a DataFrame is a common task. To begin this lesson, clear the workspace and set
up some vectors and a 5 × 5 array. These vectors and matrix will make it easy to determine which elements are
selected by a command.
Use arange and reshape to create 3 arrays:
Python indexing is 0 based so that the first element has position 0, the second has position 1 and so on until the
last element has position n-1 in an array that contains n elements in total.
Use a slice to select the 2nd row of x and the 2nd element of y and z.
Question: What are the dimension selections?
Problem: List selection of a single row
Use a list to select the 2nd row of x and the 2nd element of y and z.
Question: What are the dimension selections?
Select the 2nd column of x using a scalar integer, a slice and a list.
Question: What are the dimensions of the selected elements?
Select the 2nd and 4th rows of x using both a slice and a list.
Combine these to select the 2nd and 3rd columns and 2nd and 4th rows.
Use ix_ to select the 2nd and 4th rows and 1st and 3rd columns of x.
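The selection styles in these problems can be compared in a short sketch. It assumes that x is the 5 by 5 array np.arange(25).reshape(5, 5), which the text does not show explicitly. The shapes in the comments answer the dimension questions above.

```python
import numpy as np

# Assumed definition of x; the lesson's own arrays are not shown in the text.
x = np.arange(25).reshape(5, 5)

# 2nd row (position 1) three ways: the dimension depends on the selector.
row_scalar = x[1]    # 1-d, shape (5,)
row_slice = x[1:2]   # 2-d, shape (1, 5)
row_list = x[[1]]    # 2-d, shape (1, 5)

# 2nd and 4th rows with a stepped slice, combined with a column slice.
block = x[1:4:2, 1:3]  # rows 1 and 3, columns 1 and 2

# Two lists are paired element by element rather than forming a block.
paired = x[[1, 3], [0, 2]]  # elements (1, 0) and (3, 2)

# np.ix_ builds the full block of rows 1 and 3 with columns 0 and 2.
crossed = x[np.ix_([1, 3], [0, 2])]

print(row_scalar.shape, row_slice.shape, row_list.shape)
print(block)
print(paired)
print(crossed)
```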
Exercises
Select the second and third rows of a and the first and last column. Use at least three different methods including
all slices, np.ix_, and mixed slice-list selection.
# Setup: Data for Exercises
import numpy as np
rs = np.random.RandomState(20000214)
a = rs.randint(1, 10, (4, 3))
b = rs.randint(1, 10, (6, 4))
print(f"a = \n {a}")
print()
print(f"b = \n {b}")
x[0:2,0:3] = y[1:3,1:4]
Assign the block consisting of the first and third columns and the second and last rows of b to the last two rows
and last two columns of a
Lesson 12
Accessing elements in a DataFrame is a common task. To begin this lesson, clear the workspace and set up some
vectors and a 5 × 5 array. These vectors and matrix will make it easy to determine which elements are selected
by a command.
Begin by creating:
Using double index notation, select the (0,2) and the (2,0) element of x_named.
Select the 2nd row of x_named using the colon (:) operator.
1. Select the 2nd row of x_named using a slice so that the selection remains a DataFrame.
2. Repeat using a list of indices to retain the DataFrame.
Select the 2nd column of x_named using the colon (:) operator.
Problem: Selecting Single Columns as DataFrames
Select the 2nd column of x_named so that the selection remains a DataFrame.
Select the 2nd and 4th rows of x_named using a slice. Repeat the selection using a list of integers.
Combine the previous selections to the 2nd and 3rd columns and the 2nd and 4th rows of x_named.
Note: This is the only important difference with NumPy. Arbitrary row/column selection using
DataFrame.iloc is simpler but less flexible.
Select the columns c1 and c2 and the 1st, 3rd and 5th row.
Select the rows r1 and r2 and the 1st, 3rd and final column.
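The numeric selections in this lesson can be sketched with iloc. The construction of x_named below is an assumption: the text only implies row labels such as r1 and r2 and column labels such as c1 and c2, so the labels r0 to r4 and c0 to c4 are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed construction; the labels r0..r4 and c0..c4 are hypothetical.
x = np.arange(25).reshape(5, 5)
x_named = pd.DataFrame(
    x,
    index=[f"r{i}" for i in range(5)],
    columns=[f"c{i}" for i in range(5)],
)

elem = x_named.iloc[0, 2]      # a single element
row_series = x_named.iloc[1]   # a scalar position returns a Series
row_frame = x_named.iloc[1:2]  # a slice keeps the DataFrame
row_frame_list = x_named.iloc[[1]]  # so does a list

# Unlike NumPy, two lists in iloc select the full block of rows and columns.
block = x_named.iloc[[1, 3], [1, 2]]

# Mixing labels and positions: select by name first, then by position.
mixed_cols = x_named[["c1", "c2"]].iloc[[0, 2, 4]]
mixed_rows = x_named.loc[["r1", "r2"]].iloc[:, [0, 2, -1]]

print(block)
print(mixed_cols)
print(mixed_rows)
```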
Exercises
Compute the mean return of the momentum data in the first 66 observations and the last 66 observations.
# Setup: Load the momentum data
import pandas as pd
momentum = pd.read_csv("data/momentum.csv", index_col="date", parse_dates=True)
Compute the correlation of momentum portfolio 1, 5, and 10 in the first half of the sample and in the second
half.
Lesson 13
for Loops
• for loops
• Nested loops
Construct a for loop to sum the numbers between 1 and N for any N. A for loop that does nothing can be
written:
n = 10
for i in range(n):
    pass
The compound return on a bond that pays interest annually at rate r is given by $cr_t = \prod_{i=1}^{T} (1 + r) = (1 + r)^T$.
Use a for loop to compute the total return for £100 invested today for 1, 2, . . . , 10 years. Store this variable in a 10
by 1 vector cr.
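One way to structure such a loop is sketched below. The interest rate of 5% is a hypothetical value, since the text does not fix r.

```python
import numpy as np

r = 0.05            # hypothetical annual rate; the text does not fix one
investment = 100.0  # amount invested today
years = 10

# Pre-allocate the 10 by 1 vector, then fill one row per year.
cr = np.zeros((years, 1))
value = investment
for t in range(years):
    value = value * (1 + r)  # compound one more year
    cr[t, 0] = value

print(cr)
```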
%matplotlib inline
plt.plot(y)
Begin by loading momentum data used in an earlier lesson. Compute a 22-day moving-window standard
deviation for each of the columns. Store the value at the end of the window.
When finished, make sure that std_dev is a DataFrame and plot the annualized percentage standard deviations
using:
import pandas as pd
Exercises
Exercise
1. Simulate a 1000 by 10 matrix consisting of 10 standard random walks using both nested loops and
np.cumsum.
2. Plot the results.
Using the momentum data, compute the maximum drawdown over all 22-day consecutive periods defined as
the smallest cumulative product of the gross return (1+r) for 1, 2, .., 22 days.
Finally, compute the mean drawdown for each of the portfolios.
Lesson 14
Logical Operators
import numpy as np
rs = np.random.RandomState(20000101)
1. Check if z is 7
2. Check if z is not 5
3. Check if z is greater than or equal to 9
1. Determine if 2 ≤ z < 8
2. Determine if z < 2 ∪ z ≥ 8 using or
3. Rewrite 2 using not and your result from 1.
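These checks can be sketched with a fixed value. The value z = 7 is a hypothetical stand-in, since the text does not show how z is generated.

```python
z = 7  # hypothetical value; the lesson's own z is not shown in the text

is_seven = z == 7
not_five = z != 5
at_least_nine = z >= 9

# Python allows chained comparisons for interval checks.
inside = 2 <= z < 8
# or is True when either side is True.
outside = z < 2 or z >= 8
# not flips a boolean, so outside is the negation of inside.
outside_via_not = not inside

print(is_seven, not_five, at_least_nine, inside, outside, outside_via_not)
```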
Exercises
rs = np.random.RandomState(19991213)
Exercise
Exercise
Lesson 15
Boolean Arrays
import numpy as np
import pandas as pd
momentum = pd.read_csv("data/momentum.csv", index_col="date", parse_dates=True)
print(momentum.head())
mom_01 = momentum["mom_01"]
mom_10 = momentum["mom_10"]
mom_05 = momentum["mom_05"]
For portfolios 1 and 10, determine whether each return is < 0 (separately).
Count the number of times that the returns in both portfolio 1 and portfolio 10 are negative. Next count the
number of times that the returns in portfolios 1 and 10 are both greater, in absolute value, than 2 times their
respective standard deviations.
For portfolios 1 and 10, count the number of times either of the returns is < 0.
Problem: Count the frequency of negative returns
What percent of returns are negative for each of the 10 momentum portfolios?
Use any to determine if any of the 10 portfolios experienced a loss greater than 5% (a return below -5%).
Use all and negation to do the same check as any.
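The link between any, all and negation can be sketched on a small placeholder array (not the momentum data): "some return is below the threshold" is the same statement as "not all returns are at or above the threshold".

```python
import numpy as np

# Placeholder returns in percent (rows are days, columns are portfolios),
# NOT the momentum data.
returns = np.array([[1.0, -0.5], [-6.0, 2.0], [0.3, 0.1]])
threshold = -5.0

# any: is at least one element below the threshold?
any_large_loss = bool(np.any(returns < threshold))
# all with negation gives the same answer.
same_via_all = not np.all(returns >= threshold)

# The axis keyword applies the check column by column.
by_portfolio = (returns < threshold).any(axis=0)

# The mean of a boolean array is the fraction of True values.
frac_negative = (returns < 0).mean(axis=0)

print(any_large_loss, same_via_all)
print(by_portfolio, frac_negative)
```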
Exercises
Use all and sum to count the number of days where all of the portfolio returns were negative. Use any to
compute the number of days with at least 1 negative return and with no negative returns (Hint: use negation (~
or logical_not)).
Count the number of days where each of the portfolio returns is less than the 5% quantile for the portfolio. Also
report the fraction of days where all are in their lower 5% tail.
Lesson 16
Boolean Selection
• Boolean selection
• where
import numpy as np
import pandas as pd
momentum = pd.read_csv("data/momentum.csv", index_col="date", parse_dates=True)
print(momentum.head())
Select the rows in momentum where all returns on a day are negative.
Select the rows in momentum where 50% or more of the returns on a day are negative.
Select the columns in momentum that have the smallest and second smallest average returns.
Select the returns for the column with the single most negative return on days where all of the returns are
negative.
Problem: Selecting Elements using Logical Statements
For portfolio 1 and portfolio 10 compute the correlation when both returns are negative and when both are
positive.
# Setup: Reproducible random numbers
rs = np.random.RandomState(19991231)
x = rs.randint(1, 11, size=(10, 3))
x
Problem: Select the rows and column of x where both have means < E[x]
Use where to select the index of the elements in portfolio 5 that are negative. Next, use the where command
in its two output form to determine which elements of the portfolio return matrix are less than -2%.
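The two forms of where can be sketched with NumPy on a small placeholder array (not the momentum data). With a 1-d input np.where returns one array of positions; with a 2-d input it returns one array of row positions and one of column positions.

```python
import numpy as np

# Placeholder returns in percent, NOT the momentum data.
x = np.array([[0.5, -3.0, 1.0], [-2.5, 0.2, -0.1]])

# One-output form: positions of the negative elements in a 1-d array.
first_col = x[:, 0]
(idx,) = np.where(first_col < 0)

# Two-output form: row and column positions in a 2-d array.
rows, cols = np.where(x < -2)

# The two position arrays can be used together to pull out the elements.
selected = x[rows, cols]

print(idx)
print(rows, cols)
print(selected)
```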
Exercises
Select the column in momentum that has the highest standard deviation.
Select the columns that have kurtoses above the median kurtosis.
Exercise: Select
Select the rows where all of the returns in the row are less than the 25% quantile for their portfolio.
Note: Comparisons between DataFrames and Series works like mathematical operations (+, -, etc.).
Lesson 17
Conditional Statements
• if-elif-else blocks
Draw a standard normal value using np.random.standard_normal and print the value if it is negative.
Note: Rerun the cell a few times to see different output.
Draw a standard normal value and print “Positive” if it is positive and “Negative” if not.
Problem:
Draw a standard t random variable with 2 degrees of freedom using np.random.standard_t(2) and print
“Negative Outlier” if less than -2, “Positive Outlier” if larger than 2, and “Inlier” if between -2 and 2.
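The structure of an if-elif-else block for this kind of three-way classification can be sketched as below. Wrapping the logic in a function named `classify` is a choice made here for illustration and is not required by the problem.

```python
import numpy as np


def classify(x):
    """Label a value as an outlier or an inlier using the cutoffs -2 and 2."""
    if x < -2:
        return "Negative Outlier"
    elif x > 2:
        return "Positive Outlier"
    else:
        return "Inlier"


# Only the first branch whose condition is True is executed.
draw = np.random.standard_t(2)
label = classify(draw)
print(draw, label)
```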
Exercises
Generate two standard normal values x and y using two calls to rs.standard_normal(). Use an if-elif-
else clause to print the quadrant they are in. The four quadrants are upper right, upper left, lower left and
lower right.
Generate a uniform using u = rs.sample(). Using this value and an if-else clause, generate a contaminated
normal, which is a draw from a N(0, 1) (in N(µ, σ²) notation) if u < 0.95 or a draw from a N(0, 10) otherwise.
Use rs.normal to generate the normal variable. Note that rs.normal takes the standard deviation, not the
variance, as its scale argument.
Lesson 18
import pandas as pd
momentum = pd.read_csv("data/momentum.csv", index_col="date", parse_dates=True)
mom_01 = momentum.mom_01
print(momentum.head())
Use a for loop along with an if statement to simulate an asymmetric random walk of the form
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(y)
plt.plot(z)
plt.legend(["y", "z"])
Use boolean multiplication to simulate the same random walk without using an if-then statement.
# Setup: Plot the data
%matplotlib inline
For momentum portfolios 1 and 10, compute the length of the runs in the series. In pseudo code,
1. Compute the length of the longest run in the series and the index of the location of the longest run. Was it
positive or negative?
2. How many distinct runs lasted 5 or more days?
%matplotlib inline
Exercises
Simulate 100 observations of a time series with heteroskedasticity that follows a random walk of the form:
$y_t = y_{t-1} + \sigma_t \varepsilon_t$
where $\varepsilon_t \sim N(0, 1)$, $y_0 = 0$ and $\sigma_t$ is:
• When generating the first 3 values, treat $\varepsilon_{-1}$, $\varepsilon_{-2}$ and $\varepsilon_{-3}$ as 0 (non-negative).
• Re-run the simulation to see different paths.
Lesson 19
Importing Data
• Importing data
• Converting dates
Read in the files GS10.csv and GS10.xls which have both been downloaded from FRED.
Exercises
1. Load the data in data/fred-md.csv in the columns sasdate, RPI and INDPRO using the usecols
keyword.
2. Remove the first row by selecting the second to the end.
3. Convert sasdate to dates
4. Set sasdate as the index and remove it from the DataFrame.
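The four steps can be sketched without the actual file by reading from an in-memory string. The contents below are a made-up stand-in for data/fred-md.csv: the values, the extra column OTHER and the month/day/year date format are all assumptions.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/fred-md.csv; all values are made up.
csv_text = """sasdate,RPI,INDPRO,OTHER
Transform:,5,5,1
1/1/1959,100.0,20.0,1.0
2/1/1959,101.0,20.5,2.0
"""

# 1. usecols limits the import to the listed columns.
data = pd.read_csv(io.StringIO(csv_text), usecols=["sasdate", "RPI", "INDPRO"])

# 2. Drop the first row by selecting from the second row to the end.
data = data.iloc[1:].copy()

# 3. Convert the date strings; the format is an assumption about the file.
data["sasdate"] = pd.to_datetime(data["sasdate"], format="%m/%d/%Y")

# 4. set_index moves the column into the index and removes it from the data.
data = data.set_index("sasdate")
print(data)
```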
1. Load the data on the sheet “Long Mat” in the Excel file “data/exercise.xlsx”. These are 10 and 20 year
constant maturity yields.
2. Load the data on the sheet “Short Mat” in the Excel file “data/exercise.xlsx”. These are 1 and 3 year
constant maturity yields.
3. Combine the columns in the two DataFrames by creating a dictionary of the keys in each with the values
equal to the column names.
Lesson 20
This first block loads the data that was used in the previous lesson.
# Setup: Load the data to use later
import pandas as pd
Export both to a single HDF file (the closest thing to a “native” format in pandas).
Import the data saved as HDF and verify it is the same as the original data.
Exercises
• Parse the dates and set the index column to “sasdate”
• Remove first row labeled “Transform:” (Hint: Transpose, del and transpose back, or use drop)
• Re-parse the dates on the index
• Remove columns that have more than 10% missing values
• Save to “data/fred-md.h5” as HDF.
• Load the data into the variable reloaded and verify it is identical.
Export the columns RPI, INDPRO, and HWI from the FRED-MD data to "data/variablename.csv" so that,
e.g., RPI is exported to data/RPI.csv:
Note: You need to complete the previous exercise first (or at least the first 4 steps).
Lesson 21
• Basic plotting
• Subplots
• Histograms
• Scatter Plots
Plotting in notebooks requires using a magic command, which starts with %, to initialize the plotting backend.
# Setup
%matplotlib inline
import matplotlib.pyplot as plt
Begin by loading the data in hf.h5. This data set contains high-frequency price data for IBM and MSFT on a
single day stored as two Series. IBM is stored as "IBM" in the HDF file, and MSFT is stored as "MSFT".
Problem: Subplot
Create a 2 by 1 subplot with the price of IBM in the top subplot and the price of MSFT in the bottom subplot.
Use matplotlib to directly plot ibm against its index. This is a repeat of a previous plot but shows how to
use the plot command directly.
Exercises
Use the HLOC data to produce a plot of MSFT's 5 minute HLOC where there are no lines, high is demarcated
using a green triangle, low is demarcated using a red downward pointing triangle, open is demarcated
using a light grey leftward facing triangle and close is demarcated using a right facing triangle.
Note: Get the axes from the first plot and reuse them when plotting the other series.
# Setup: Load data and create values
import pandas as pd
Lesson 22
• Histograms
• Scatter Plots
Plotting in notebooks requires using a magic command, which starts with %, to initialize the plotting backend.
# Setup
%matplotlib inline
import matplotlib.pyplot as plt
Begin by loading the data in hf.h5. This data set contains high-frequency price data for IBM and MSFT on a
single day stored as two Series. IBM is stored as "IBM" in the HDF file, and MSFT is stored as "MSFT".
Problem: Histogram
Produce a histogram of MSFT 1-minute returns (Hint: you have to produce the 1-minute Microsoft returns first
using resample and pct_change).
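The hint can be illustrated with a small sketch that turns irregular intraday prices into 1-minute returns. The prices are made up, and taking the last price in each minute is an assumption of this sketch rather than something the exercise specifies.

```python
import pandas as pd

# Made-up prices observed every 20 seconds (three observations per minute).
index = pd.date_range("2021-01-04 09:30:00", periods=9, freq=pd.Timedelta(seconds=20))
prices = pd.Series(
    [100.0, 100.5, 100.0, 100.2, 100.9, 101.0, 101.5, 102.0, 102.01],
    index=index,
)

# Keep the last price observed in each 1-minute bin.
one_minute = prices.resample("1min").last()

# Percentage change between consecutive 1-minute prices; the first value is NaN.
returns = one_minute.pct_change().dropna()
print(returns)
```

A histogram of the resulting Series could then be drawn with returns.plot.hist(), which requires matplotlib.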
Scatter the 5-minute MSFT returns against the 5-minute IBM returns.
Hint: You will need to create both 5-minute return series, merge them, and then plot using the combined
DataFrame.
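One possible way to merge two return series before plotting is sketched below. The Series msft_ret and ibm_ret hold made-up values, and pd.concat followed by dropna is only one of several ways to combine them.

```python
import pandas as pd

# Made-up 5-minute returns; the IBM series is missing the first timestamp.
index = pd.date_range("2021-01-04 09:30", periods=4, freq="5min")
msft_ret = pd.Series([0.001, -0.002, 0.0005, 0.001], index=index, name="MSFT")
ibm_ret = pd.Series([0.0007, -0.001, 0.0002], index=index[1:], name="IBM")

# Align the two Series on their index, then keep rows where both are observed.
both = pd.concat([msft_ret, ibm_ret], axis=1).dropna()
print(both)
```

With the combined DataFrame in hand, both.plot.scatter(x="IBM", y="MSFT") would produce the scatter plot, since the columns take their names from the Series.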
Exercises
Produce a 2 by 1 subplot with a histogram of the 5-minute returns of IBM in the top panel and 10-minute
returns of IBM in the bottom. Set an appropriate title on each of the 2 plots.
Exercise: Export the result of the previous exercise to JPEG and PDF
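A minimal sketch of exporting a figure to both formats follows. It uses made-up values and hypothetical file names in a temporary folder, and it selects the Agg backend so that it runs outside a notebook. Writing JPEG files through savefig relies on the Pillow package, which is normally installed alongside matplotlib.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for running outside a notebook
import matplotlib.pyplot as plt

# A small 2 by 1 figure with made-up values in each histogram.
fig, axes = plt.subplots(2, 1)
axes[0].hist([0.1, -0.2, 0.05, 0.0, 0.15])
axes[0].set_title("Top panel")
axes[1].hist([0.3, -0.1, 0.2])
axes[1].set_title("Bottom panel")
fig.tight_layout()

# Hypothetical output locations in a temporary folder.
out_dir = tempfile.mkdtemp()
jpeg_path = os.path.join(out_dir, "histograms.jpg")
pdf_path = os.path.join(out_dir, "histograms.pdf")

# savefig infers the output format from the file extension.
fig.savefig(jpeg_path)
fig.savefig(pdf_path)
```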
Final Exam
This self-grading notebook serves as a final exam for the introductory course. If you have grasped the contents
of the course, you should be able to complete this exam.
It is essential that you answer each cell by assigning the solution to QUESTION_# where # is the question
number.
We will start with a warm-up question that is already answered.
Question 0
import numpy as np
QUESTION_0 = np.ones(3)
Question 1
Enter the matrix
1   0.2 0.5
0.2 1   0.8
0.5 0.8 1
as a NumPy array.
Question 2
Enter the matrix
1   0.2 0.5
0.2 1   0.8
0.5 0.8 1
as a DataFrame with columns and index both equal to ['A', 'B', 'C'].
Question 3
Load the momentum data in the CSV file momentum.csv, set the column date as the index, and ensure that
date is a DatetimeIndex.
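As a hedged illustration of the mechanics, the sketch below reads CSV text with a date column and uses it as a DatetimeIndex. The text is made up and read from memory through io.StringIO, so it does not reflect the actual contents of momentum.csv.

```python
import io

import pandas as pd

# Made-up CSV content; the real momentum.csv has different columns and values.
csv_text = """date,mom_01,mom_10
2016-03-01,0.5,-0.2
2016-03-02,-0.1,0.4
2016-04-01,0.3,0.1
"""

# index_col selects the column to use as the index;
# parse_dates=True asks pandas to parse that index as dates.
data = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)
print(type(data.index))
```

Passing a file name instead of the StringIO object reads from disk in the same way.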
Question 4
Construct a DataFrame using the data loaded in the previous question that contains the returns from momentum
portfolio 5 in March and April 2016.
Question 5
1, 3, 1, 2, 9, 4, 5, 6, 10, 4
Question 6
Compute the correlation matrix of momentum portfolios 1, 4, 6, and 10 as a DataFrame where the index and
columns are the portfolio names (e.g., ‘mom_01’) in the order listed above.
Question 7
Compute the percentage of returns of each of the 10 momentum portfolios that are outside of the interval
[µ̂ − σ̂ , µ̂ + σ̂ ]
where µ̂ is the mean and σ̂ is the standard deviation computed using 1 dof. The returned variable must be a
Series where the index is the portfolio names ordered from 1 to 10.
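The calculation can be sketched on a toy example. The DataFrame below has two made-up columns rather than the momentum portfolios, and ddof=1 is written out explicitly even though it is the pandas default.

```python
import pandas as pd

# Made-up returns for two hypothetical portfolios.
returns = pd.DataFrame({"a": [-2.0, 0.0, 0.0, 2.0], "b": [-1.0, -1.0, 1.0, 1.0]})

mu = returns.mean()
sigma = returns.std(ddof=1)  # 1 degree of freedom

# True where an observation falls outside [mu - sigma, mu + sigma], column by column.
outside = (returns < mu - sigma) | (returns > mu + sigma)

# The mean of a boolean column is the share of True values; scale to a percentage.
pct_outside = 100 * outside.mean()
print(pct_outside)
```

In this toy example half of the observations in column a and none in column b fall outside the interval.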
Question 8
Import the data in the sheet question 8 in final-exam.xlsx into a DataFrame where the index
contains the dates and each variable name is a column name.
Question 9
Enter the DataFrame in the table below and save it to HDF with the key ‘question9’. The answer to this problem
must be the full path to the HDF file. The values in index should be the DataFrame’s index.
index data
A 6.0
E 2.7
G 1.6
P 3.1
Note: If you want to get the full path to a file saved in the current directory, you can use
import os
file_name = 'my_file_name'
full_path = os.path.join(os.getcwd(), file_name)
Question 10
Compute the cumulative return on a portfolio that longs mom_10 and shorts mom_01. The first value should
be 1 + mom_10.iloc[0] - mom_01.iloc[0]. The second cumulative return should be the first return
times 1 + mom_10.iloc[1] - mom_01.iloc[1], and so on. The solution must be a Series with the name
‘momentum_factor’ and index equal to the index of the momentum DataFrame.
Note: The data in the momentum return file is in percentages, i.e., a return of 4.2% is recorded as 4.2.
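The compounding described above can be sketched with cumprod. The two Series hold made-up percentage returns, and dividing by 100 to convert them to decimals before compounding is an assumption based on the note about units.

```python
import pandas as pd

# Made-up returns in percent for a hypothetical long leg and short leg.
long_leg = pd.Series([2.0, -1.0, 3.0])
short_leg = pd.Series([1.0, 1.0, -1.0])

# Long-short return in decimal form: 0.01, -0.02, 0.04.
spread = (long_leg - short_leg) / 100

# Each value is the previous cumulative return times (1 + current spread).
cumulative = (1 + spread).cumprod()
cumulative.name = "long_short"
print(cumulative)
```

The first element equals 1 plus the first spread, and each later element multiplies the running product by 1 plus that period's spread.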
Question 11
Write a function named QUESTION_11 that takes 1 numerical input x and returns:
Question 12
Produce a scatter plot of the momentum returns of portfolios 1 (x-axis) and 10 using only data in 2016. Set the
x limits and y limits to be tight so that the lower bound is the smallest return plotted and the upper bound is the
largest return plotted. Use the ‘darkgrid’ theme from seaborn. Assign the figure handle to QUESTION_12.
Question 13
Compute the excess kurtosis of daily, weekly (using Friday as the end of the week) and monthly returns on the
10 momentum portfolios using the pandas function kurt. The solution must be a DataFrame with the portfolio
names as the index ordered from 1 to 10 and the sampling frequencies, ‘daily’, ‘weekly’, or ‘monthly’ as the
columns (in order). When computing weekly or monthly returns from daily data, use the sum of the daily
returns.
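A sketch of the aggregation step is given below, using simulated daily returns for two hypothetical portfolios named p1 and p2. Grouping by calendar month through to_period is one choice among several for forming monthly sums.

```python
import numpy as np
import pandas as pd

# Simulated daily returns on business days for two hypothetical portfolios.
rs = np.random.RandomState(0)
index = pd.bdate_range("2016-01-01", periods=250)
daily = pd.DataFrame(rs.standard_normal((250, 2)), index=index, columns=["p1", "p2"])

# Weekly returns: sum of daily returns in weeks that end on Friday.
weekly = daily.resample("W-FRI").sum()

# Monthly returns: sum of daily returns within each calendar month.
monthly = daily.groupby(daily.index.to_period("M")).sum()

# kurt reports excess kurtosis for each column; collect the three frequencies as columns.
kurt = pd.DataFrame(
    {"daily": daily.kurt(), "weekly": weekly.kurt(), "monthly": monthly.kurt()}
)
print(kurt)
```

Because each kurt call returns a Series indexed by portfolio name, building the DataFrame from a dictionary places the portfolios on the index and the frequencies on the columns.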
Question 14
Simulate a random walk using 100 normal observations from a NumPy RandomState initialized with a seed
of 19991231.
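For illustration, a random walk is the cumulative sum of its shocks. The sketch below uses an arbitrary seed and only 5 observations, so it differs from what the question asks for.

```python
import numpy as np

# RandomState with an arbitrary seed chosen only for this illustration.
rs = np.random.RandomState(12345)

# Standard normal shocks.
shocks = rs.standard_normal(5)

# The walk at time t is the sum of all shocks up to and including t.
walk = shocks.cumsum()
print(walk)
```

Re-creating the RandomState with the same seed reproduces the same shocks, which is what makes a seeded simulation checkable.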
Question 15
Defining
import numpy as np
compute the ratio of the high price to the low price in each month. The solution should be a DataFrame where
the index is the last date in each month and the columns are the variable names.
Question 16
where ε_i is a standard normal shock. Set y_0 = ε_0 and y_1 = ε_0 + ε_1. The solution should be a 1-d NumPy array
with 100 elements. Use a RandomState with a seed value of 19991231.
Question 17
What is the ratio of the largest eigenvalue to the smallest eigenvalue of the correlation matrix of the 10
momentum returns?
Note: This is called the condition number of a matrix and is a measure of how closely correlated the series are.
You can compute the eigenvalues from the correlation matrix using np.linalg.eig. See the help of this
function for more details.
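The eigenvalue ratio can be illustrated on a small made-up correlation matrix. The sketch uses np.linalg.eigvalsh, a NumPy routine for symmetric matrices that returns the eigenvalues in ascending order; that choice is an assumption of this sketch and other eigenvalue routines would also work.

```python
import numpy as np

# A made-up 2 by 2 correlation matrix with correlation 0.5.
corr = np.array([[1.0, 0.5], [0.5, 1.0]])

# Eigenvalues of a symmetric matrix, returned in ascending order.
eigenvalues = np.linalg.eigvalsh(corr)

# Ratio of the largest eigenvalue to the smallest eigenvalue.
ratio = eigenvalues[-1] / eigenvalues[0]
print(ratio)
```

For this matrix the eigenvalues are 0.5 and 1.5, so the ratio is 3. The ratio grows as the correlation approaches 1 and the smallest eigenvalue approaches 0.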
Question 18
Write a function that takes a single input ‘x’ and returns the string “The value of x is” and the value of x. For
example, if x is 3.14, then the returned value should be “The value of x is 3.14”. The function name must be
QUESTION_18.
Question 19
Compute the percentage of days where all 10 returns are positive and subtract the percentage of days where all
10 momentum returns are negative on the same day.
Question 20
Write the function QUESTION_20 that will take a single input s, which is a string and will return a Series that
counts the number of times each letter in s appears in s without regard to case. Do not include spaces. Ensure
the Series returned has its index sorted.
Hints:
• You can iterate across the letters of a string using
some_string = 'abcdefg'
for letter in some_string:
    # do something with letter...
    print(letter)