Python Programming for Economics and Finance
Contents

I Introduction to Python

1 About Python
3 An Introductory Example
4 Functions
5 Python Essentials
9 NumPy
10 Matplotlib
11 SciPy
12 Numba
13 Parallelization
14 Pandas
17 Debugging
Part I
Introduction to Python
Chapter 1
About Python
1.1 Contents
• Overview 1.2
• What’s Python? 1.3
• Scientific Programming 1.4
• Learn More 1.5
“Python has gotten sufficiently weapons grade that we don’t descend into R any-
more. Sorry, R people. I used to be one of you but we no longer descend into R.”
– Chris Wiggins
1.2 Overview
The following chart, produced using Stack Overflow Trends, shows one measure of the relative
popularity of Python
The figure indicates not only that Python is widely used but also that adoption of Python
has accelerated significantly since 2012.
We suspect this is driven at least in part by uptake in the scientific domain, particularly in
rapidly growing fields like data science.
For example, the popularity of pandas, a library for data analysis with Python, has exploded,
as seen here.
(The corresponding time path for MATLAB is shown for comparison)
Note that pandas takes off in 2012, which is the same year that we see Python’s popularity
begin to spike in the first figure.
Overall, it’s clear that
• Python is one of the most popular programming languages worldwide.
• Python is a major tool for scientific computing, accounting for a rapidly rising share of
scientific work around the globe.
1.3.3 Features
One nice feature of Python is its elegant syntax — we’ll see many examples later on.
Elegant code might sound superfluous but in fact it’s highly beneficial because it makes the
syntax easy to read and easy to remember.
Remembering how to read from files, sort dictionaries and other such routine tasks means
that you don’t need to break your flow in order to hunt down correct syntax.
Closely related to elegant syntax is an elegant design.
Features like iterators, generators, decorators and list comprehensions make Python highly
expressive, allowing you to get more done with less code.
Namespaces improve productivity by cutting down on bugs and syntax errors.
Fundamental matrix and array processing capabilities are provided by the excellent NumPy
library.
NumPy provides the basic array data type plus some simple processing operations.
For example, let’s build some arrays
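The cells that build the arrays are not reproduced above. Here is a sketch consistent with the essentially-zero inner product shown below, using the fact that cos and sin are orthogonal over a full period (the grid size is an assumption):

import numpy as np

a = np.linspace(-np.pi, np.pi, 100)  # Create an even grid from -π to π
b = np.cos(a)                        # Apply cosine to each element of a
c = np.sin(a)                        # Apply sin to each element of a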
In [2]: b @ c
Out[2]: 4.04891256782214e-16
The number you see here might vary slightly but it’s essentially zero.
(For older versions of Python and NumPy you need to use the np.dot function)
The SciPy library is built on top of NumPy and provides additional functionality.
For example, let's calculate \int_{-2}^{2} \phi(z) \, dz where \phi is the standard normal density.
In [3]: from scipy.stats import norm
from scipy.integrate import quad

ϕ = norm()
value, error = quad(ϕ.pdf, -2, 2)  # Integrate using Gaussian quadrature
value
Out[3]: 0.9544997361036417
1.4.2 Graphics
The most popular and comprehensive Python library for creating figures and graphs is Matplotlib, with functionality including

• high-quality 2D and 3D plots

• output in all the usual formats (PDF, PNG, etc.)

• LaTeX integration

• fine-grained control over all aspects of presentation

• animation, etc.
Example 3D plot
1.4.3 Symbolic Algebra

It's useful to be able to manipulate symbolic expressions, as in Mathematica or Maple. The SymPy library provides this functionality from within the Python shell. For example, we can add and expand symbolic expressions

Out[4]: 3x + y

Out[5]: x² + 2xy + y²

We can also solve polynomials
solve(x**2 + x + 2)
limit(1 / x, x, 0)
Out[7]: ∞
In [8]: limit(sin(x) / x, x, 0)
Out[8]: 1
In [9]: diff(sin(x), x)
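The SymPy cells that set up and produced the outputs above are not reproduced. Here is a sketch of that kind of session (the symbol names follow the outputs shown; the exact expressions are assumptions):

from sympy import Symbol, solve, limit, sin, diff

x, y = Symbol('x'), Symbol('y')   # Treat 'x' and 'y' as algebraic symbols

x + x + x + y                     # Simplifies to 3*x + y
expression = (x + y)**2
expression.expand()               # Expands to x**2 + 2*x*y + y**2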
The beauty of importing this functionality into Python is that we are working within a fully
fledged programming language.
We can easily create tables of derivatives, generate LaTeX output, add that output to figures
and so on.
1.4.4 Statistics
Python’s data manipulation and statistics libraries have improved rapidly over the last few
years.
Pandas
One of the most popular libraries for working with data is pandas.
Pandas is fast, efficient, flexible and well designed.
Here's a simple example, using some dummy data generated with NumPy's excellent random
functionality.
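The code that builds the DataFrame is not reproduced above. A sketch consistent with the table shown below (the seed and the date range are assumptions):

import numpy as np
import pandas as pd

np.random.seed(1234)

data = np.random.randn(5, 2)                      # 5 x 2 matrix of N(0, 1) random draws
dates = pd.date_range('2010-12-28', periods=5)

df = pd.DataFrame(data, columns=('price', 'weight'), index=dates)
print(df)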
price weight
2010-12-28 0.471435 -1.190976
2010-12-29 1.432707 -0.312652
2010-12-30 -0.720589 0.887163
2010-12-31 0.859588 -0.636524
2011-01-01 0.015696 -2.242685
In [11]: df.mean()
nodelist=list(p.keys()),
node_size=120, alpha=0.5,
node_color=list(p.values()),
cmap=plt.cm.jet_r)
plt.show()
/home/ubuntu/anaconda3/lib/python3.7/site-packages/networkx/drawing/nx_pylab.py:579:
MatplotlibDeprecationWarning: The iterable function was deprecated in Matplotlib 3.1
and will be removed in 3.3. Use np.iterable instead.
  if not cb.iterable(width):
Running your Python code on massive servers in the cloud is becoming easier and easier.
A nice example is Anaconda Enterprise.
See also
• Amazon Elastic Compute Cloud
• The Google App Engine (Python, Java, PHP or Go)
• Pythonanywhere
• Sagemath Cloud
Apart from the cloud computing options listed above, you might like to consider
There are many other interesting developments with scientific programming in Python.
Some representative examples include
• Jupyter — Python in your browser with interactive code cells, embedded images and
other useful features.
• Numba — Make Python run at the same speed as native machine code!
• Blaze — a generalization of NumPy.
• PyTables — manage large data sets.
• CVXPY — convex optimization in Python.
Chapter 2

Setting Up Your Python Environment

2.1 Contents
• Overview 2.2
• Anaconda 2.3
• Jupyter Notebooks 2.4
• Installing Libraries 2.5
• Working with Python Files 2.6
• Exercises 2.7
2.2 Overview
2.3 Anaconda
The core Python package is easy to install but not what you should choose for these lectures.
These lectures require the entire scientific programming ecosystem, which
• the core installation doesn’t provide
• is painful to install one piece at a time.
Hence the best approach for our purposes is to install a Python distribution that bundles the core language with compatible versions of the most popular scientific libraries — the best such distribution being Anaconda.
Anaconda supplies a tool called conda to manage and upgrade your Anaconda packages.
One conda command you should execute regularly is the one that updates the whole Ana-
conda distribution.
As a practice run, please execute the following

1. Open up a terminal

2. Type conda update anaconda
2.4 Jupyter Notebooks

Jupyter notebooks are one of the many possible ways to interact with Python and the scientific libraries.
They use a browser-based interface to Python with
• The ability to write and execute Python commands.
• Formatted output in the browser, including tables, figures, animation, etc.
• The option to mix in formatted text and mathematical expressions.
Because of these features, Jupyter is now a major player in the scientific computing ecosys-
tem.
Here’s an image showing execution of some code (borrowed from here) in a Jupyter notebook
While Jupyter isn’t the only way to code in Python, it’s great for when you wish to
• start coding in Python
• test new ideas or interact with small pieces of code
• share scientific ideas with students or colleagues, or collaborate with them
These lectures are designed for executing in Jupyter notebooks.
Once you have installed Anaconda, you can start the Jupyter notebook.
Either
• search for Jupyter in your applications menu, or
• open up a terminal and type jupyter notebook
– Windows users should substitute “Anaconda command prompt” for “terminal” in the previous line.
The notebook displays an active cell, into which you can type Python commands.
Let’s start with how to edit code and run simple programs.
Running Cells
Notice that, in the previous figure, the cell is surrounded by a green border.
This means that the cell is in edit mode.
In this mode, whatever you type will appear in the cell with the flashing cursor.
When you’re ready to execute the code in a cell, hit Shift-Enter instead of the usual En-
ter.
(Note: There are also menu and button options for running code in a cell that you can find
by exploring)
Modal Editing
The next thing to understand about the Jupyter notebook is that it uses a modal editing sys-
tem.
This means that the effect of typing at the keyboard depends on which mode you are in.
The two modes are
1. Edit mode
2. Command mode
• The green border is replaced by a grey (or grey and blue) border
• Keystrokes are interpreted as commands — for example, typing b adds a new cell below
the current one
To switch to
• command mode from edit mode, hit the Esc key or Ctrl-M
Python supports unicode, allowing the use of characters such as 𝛼 and 𝛽 as names in your
code.
In a code cell, try typing \alpha and then hitting the tab key on your keyboard.
A Test Program
ax = plt.subplot(111, projection='polar')
ax.bar(θ, radii, width=width, bottom=0.0, color=colors, alpha=0.5)
plt.show()
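The setup lines of this test program are not reproduced above. A complete sketch (the number of slices, the data and the colormap are assumptions):

import numpy as np
import matplotlib.pyplot as plt

# Compute "pie slices": angles, lengths, widths and colours
N = 20
θ = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 10 * np.random.rand(N)
width = np.pi / 4 * np.random.rand(N)
colors = plt.cm.viridis(radii / 10)

ax = plt.subplot(111, projection='polar')
ax.bar(θ, radii, width=width, bottom=0.0, color=colors, alpha=0.5)
plt.show()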
Don’t worry about the details for now — let’s just run it and see what happens.
The easiest way to run this code is to copy and paste it into a cell in the notebook.
Hopefully you will get a similar plot.
Tab Completion
On-Line Help
Clicking on the top right of the lower split closes the on-line help.
Other Content
In addition to executing code, the Jupyter notebook allows you to embed text, equations, fig-
ures and even videos in the page.
For example, here we enter a mixture of plain text and LaTeX instead of code
Next we Esc to enter command mode and then type m to indicate that we are writing Mark-
down, a mark-up language similar to (but simpler than) LaTeX.
(You can also use your mouse to select Markdown from the Code drop-down box just below
the list of menu items)
Now we Shift+Enter to produce this
Notebook files are just text files structured in JSON and typically ending with .ipynb.
You can share them in the usual way that you share files — or by using web services such as
nbviewer.
The notebooks you see on that site are static html representations.
To run one, download it as an ipynb file by clicking on the download icon at the top right.
Save it somewhere, navigate to it from the Jupyter dashboard and then run as discussed
above.
QuantEcon has its own site for sharing Jupyter notebooks related to economics – QuantEcon
Notes.
Notebooks submitted to QuantEcon Notes can be shared with a link, and are open to com-
ments and votes by the community.
2.5 Installing Libraries

Most of the libraries we need come with Anaconda. To add an extra package, you can run the relevant install command (e.g., a pip install) directly in a notebook cell.

Alternatively, you can type the same install command into a terminal
2.6 Working with Python Files

So far we've focused on executing Python code entered into a Jupyter notebook cell.
Traditionally most Python code has been run in a different way.
Code is first saved in a text file on a local machine
By convention, these text files have a .py extension.
We can create an example of such a file as follows:
%%file foo.py
print("foobar")
Writing foo.py
This writes the line print("foobar") into a file called foo.py in the local directory.
Here %%file is an example of a cell magic.
If you come across code saved in a *.py file, you’ll need to consider the following questions:
Option 1: JupyterLab
One can also edit files using a text editor and then run them from within Jupyter notebooks.
A text editor is an application that is specifically designed to work with text files — such as
Python programs.
Nothing beats the power and efficiency of a good text editor for working with program text.
A good text editor will provide
• efficient text editing commands (e.g., copy, paste, search and replace)
2.7 Exercises
2.7.1 Exercise 1
If Jupyter is still running, quit by using Ctrl-C at the terminal where you started it.
Now launch again, but this time using jupyter notebook --no-browser.
This should start the kernel without launching the browser.
Note also the startup message: It should give you a URL such as
http://localhost:8888 where the notebook is running.
Now

1. Start your browser, or open a new tab if it's already running.

2. Enter the URL from above (e.g. http://localhost:8888) in the address bar at the
top.
2.7.2 Exercise 2
1. Installing Git.
For example, if you've installed the command line version, open up a terminal and enter the clone command for the repository.
(This is just git clone in front of the URL for the repository)
As a second task,
1. Sign up to GitHub.
2. Look into ‘forking’ GitHub repositories (forking means making your own copy of a
GitHub repository, stored on GitHub).
3. Fork QuantEcon.py.
4. Clone your fork to some local directory, make edits, commit them, and push them back
up to your forked GitHub repo.
Chapter 3

An Introductory Example
3.1 Contents
• Overview 3.2
• The Task: Plotting a White Noise Process 3.3
• Version 1 3.4
• Alternative Implementations 3.5
• Another Application 3.6
• Exercises 3.7
• Solutions 3.8
3.2 Overview
Suppose we want to simulate and plot the white noise process 𝜖0 , 𝜖1 , … , 𝜖𝑇 , where each draw
𝜖𝑡 is independent standard normal.
In other words, we want to generate figures that look something like this:
3.4 Version 1
Here are a few lines of code that perform the task we set
import numpy as np
import matplotlib.pyplot as plt

ϵ_values = np.random.randn(100)
plt.plot(ϵ_values)
plt.show()
3.4.1 Imports
The first two lines of the program import functionality from external code libraries.
The first line imports NumPy, a favorite Python package for tasks like
• working with arrays (vectors and matrices)
• common mathematical functions like cos and sqrt
• generating random numbers
• linear algebra, etc.
After import numpy as np we have access to these attributes via the syntax
np.attribute.
Here are two more examples
In [3]: np.sqrt(4)
Out[3]: 2.0
In [4]: np.log(4)
Out[4]: 1.3862943611198906
In [5]: import numpy

numpy.sqrt(4)
Out[5]: 2.0
But the former method (using the short name np) is convenient and more standard.
Packages

A Python package is a directory that typically contains

1. files containing Python code — these are called modules in Python speak

2. possibly some compiled code that can be accessed by Python (e.g., functions compiled
from C or FORTRAN code)

3. a file called __init__.py that specifies what will be executed when we type import
package_name
In fact, you can find and explore the directory for NumPy on your computer easily enough if
you look around.
On this machine, it’s located in
anaconda3/lib/python3.7/site-packages/numpy
Subpackages
np.sqrt(4)
Out[6]: 2.0
from numpy import sqrt

sqrt(4)
Out[7]: 2.0
Returning to our program that plots white noise, the remaining three lines after the import
statements are
The first line generates 100 (quasi) independent standard normals and stores them in
ϵ_values.
The next two lines generate the plot.
We can and will look at various ways to configure and improve this plot below.
3.5 Alternative Implementations

Let's try writing some alternative versions of our first program, which plotted IID draws from the normal distribution.
The programs below are less efficient than the original one, and hence somewhat artificial.
But they do help us illustrate some important Python syntax and semantics in a familiar set-
ting.
ts_length = 100
ϵ_values = []   # empty list

for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)

plt.plot(ϵ_values)
plt.show()
In brief,
• The first line sets the desired length of the time series.
• The next line creates an empty list called ϵ_values that will store the 𝜖𝑡 values as we
generate them.
• The statement # empty list is a comment, and is ignored by Python’s interpreter.
• The next three lines are the for loop, which repeatedly draws a new random number 𝜖𝑡
and appends it to the end of the list ϵ_values.
• The last two lines generate the plot and display it to the user.
Let’s study some parts of this program in more detail.
3.5.2 Lists

In [10]: x = [10, 'foo', False]   # We can include heterogeneous data inside a list
type(x)

Out[10]: list
The first element of x is an integer, the next is a string, and the third is a Boolean value.
When adding a value to a list, we can use the syntax list_name.append(some_value)
In [11]: x
In [12]: x.append(2.5)
x
Here append() is what’s called a method, which is a function “attached to” an object—in
this case, the list x.
We’ll learn all about methods later on, but just to give you some idea,
• Python objects such as lists, strings, etc. all have methods that are used to manipulate
the data contained in the object.
• String objects have string methods, list objects have list methods, etc.
Another useful list method is pop()
In [13]: x
In [14]: x.pop()
Out[14]: 2.5
In [15]: x
Lists in Python are zero-based (as in C, Java or Go), so the first element is referenced by
x[0]
In [16]: x[0]   # First element of x

Out[16]: 10

In [17]: x[1]   # Second element of x

Out[17]: 'foo'
Now let's consider the for loop from the program above, which was

for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)
Python executes the two indented lines ts_length times before moving on.
These two lines are called a code block, since they comprise the “block” of code that we
are looping over.
Unlike most other languages, Python knows the extent of the code block only from indenta-
tion.
In our program, indentation decreases after line ϵ_values.append(e), telling Python that
this line marks the lower limit of the code block.
More on indentation below—for now, let’s look at another example of a for loop
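The example itself is not reproduced above. Here is a short loop in the same spirit (the list contents are illustrative):

animals = ['dog', 'cat', 'bird']
for animal in animals:
    print("The plural of " + animal + " is " + animal + "s")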
This example helps to clarify how the for loop works: when we execute a loop of the form

for variable_name in sequence:
    <code block>

the Python interpreter binds variable_name to each element of sequence in turn and executes the code block once for each such binding.
In discussing the for loop, we explained that the code blocks being looped over are delimited
by indentation.
In fact, in Python, all code blocks (i.e., those occurring inside loops, if clauses, function defi-
nitions, etc.) are delimited by indentation.
Thus, unlike most other languages, whitespace in Python code affects the output of the pro-
gram.
Once you get used to it, this is a good thing: It
• forces clean, consistent indentation, improving readability
• removes clutter, such as the brackets or end statements used in other languages
On the other hand, it takes a bit of care to get right, so please remember:
• The line before the start of a code block always ends in a colon
– for i in range(10):
– if x > y:
– while x < 100:
– etc., etc.
• All lines in a code block must have the same amount of indentation.
• The Python standard is 4 spaces, and that’s what you should use.
The for loop is the most common technique for iteration in Python.
But, for the purpose of illustration, let’s modify the program above to use a while loop in-
stead.
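The while-loop version is not reproduced above. Here is a sketch consistent with the notes that follow (in particular the statement i = i + 1):

ts_length = 100
ϵ_values = []
i = 0
while i < ts_length:
    e = np.random.randn()
    ϵ_values.append(e)
    i = i + 1

plt.plot(ϵ_values)
plt.show()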
Note that
• the code block for the while loop is again delimited only by indentation
• the statement i = i + 1 can be replaced by i += 1
3.6 Another Application

As one more application, consider a balance b_t that grows at a constant interest rate r, so that b_{t+1} = (1 + r) b_t. The core of the program is the loop

for t in range(T):
    b[t+1] = (1 + r) * b[t]
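The setup lines are not reproduced above. A fuller sketch, including the b = np.empty(T+1) allocation and the legend discussed below (the interest rate, horizon and initial balance are assumptions):

r = 0.025          # interest rate
T = 50             # end date
b = np.empty(T+1)  # an empty NumPy array, to store all b_t
b[0] = 10          # initial balance

for t in range(T):
    b[t+1] = (1 + r) * b[t]

plt.plot(b, label='bank balance')
plt.legend()
plt.show()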
The statement b = np.empty(T+1) allocates storage in memory for T+1 (floating point)
numbers.
These numbers are filled in by the for loop.
Allocating memory at the start is more efficient than using a Python list and append, since
the latter must repeatedly ask for storage space from the operating system.
Notice that we added a legend to the plot — a feature you will be asked to use in the exer-
cises.
3.7 Exercises
Now we turn to exercises. It is important that you complete them before continuing, since
they present new concepts we will need.
3.7.1 Exercise 1
Your first task is to simulate and plot the correlated time series

x_{t+1} = \alpha x_t + \epsilon_{t+1}, \qquad x_0 = 0, \quad t = 0, \ldots, T

where the shocks \epsilon_t are IID standard normal. (In the solutions below, T = 200 and \alpha = 0.9.)
3.7.2 Exercise 2
Starting with your solution to Exercise 1, plot three simulated time series, one for each of the
cases 𝛼 = 0, 𝛼 = 0.8 and 𝛼 = 0.98.
Use a for loop to step through the 𝛼 values.
If you can, add a legend, to help distinguish between the three time series.
Hints:
• If you call the plot() function multiple times before calling show(), all of the lines
you produce will end up on the same figure.
• For the legend, note that the expression 'foo' + str(42) evaluates to 'foo42'.
3.7.3 Exercise 3

Similar to the previous exercises, simulate and plot the time series x_{t+1} = \alpha |x_t| + \epsilon_{t+1}, with x_0 = 0, T = 200 and \alpha = 0.9.
3.7.4 Exercise 4
One important aspect of essentially all programming languages is branching and conditions.
In Python, conditions are usually implemented with if–else syntax.
Here's an example that prints -1 for each negative number in an array and 1 for each non-negative number
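The code itself is not reproduced above. A sketch consistent with the printed output (the numbers in the list are illustrative):

numbers = [-9, 2.3, -11, 0]

for x in numbers:
    if x < 0:
        print(-1)
    else:
        print(1)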
-1
1
-1
1
Now, write a new solution to Exercise 3 that does not use an existing function to compute
the absolute value.
Replace this existing function with an if–else condition.
3.7.5 Exercise 5

Use Monte Carlo simulation to estimate \pi: draw points uniformly at random from the unit square, count the fraction that falls inside the inscribed circle of radius 0.5, and use the fact that this fraction approximates the ratio of the circle's area to the square's area.
3.8 Solutions
3.8.1 Exercise 1
In [26]: α = 0.9
T = 200
x = np.empty(T+1)
x[0] = 0
for t in range(T):
    x[t+1] = α * x[t] + np.random.randn()
plt.plot(x)
plt.show()
3.8.2 Exercise 2
In [27]: α_values = [0.0, 0.8, 0.98]
T = 200
x = np.empty(T+1)

for α in α_values:
    x[0] = 0
    for t in range(T):
        x[t+1] = α * x[t] + np.random.randn()
    plt.plot(x, label=f'$\\alpha = {α}$')

plt.legend()
plt.show()
3.8.3 Exercise 3
In [28]: α = 0.9
T = 200
x = np.empty(T+1)
x[0] = 0
for t in range(T):
    x[t+1] = α * np.abs(x[t]) + np.random.randn()
plt.plot(x)
plt.show()
3.8.4 Exercise 4
In [29]: α = 0.9
T = 200
x = np.empty(T+1)
x[0] = 0
for t in range(T):
    if x[t] < 0:
        abs_x = - x[t]
    else:
        abs_x = x[t]
    x[t+1] = α * abs_x + np.random.randn()
plt.plot(x)
plt.show()
In [30]: α = 0.9
T = 200
x = np.empty(T+1)
x[0] = 0
for t in range(T):
    abs_x = - x[t] if x[t] < 0 else x[t]
    x[t+1] = α * abs_x + np.random.randn()
plt.plot(x)
plt.show()
3.8.5 Exercise 5
In [31]: n = 100000
count = 0
for i in range(n):
    u, v = np.random.uniform(), np.random.uniform()
    d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
    if d < 0.5:
        count += 1

area_estimate = count / n

print(area_estimate * 4)  # dividing by radius**2
3.12892
Chapter 4
Functions
4.1 Contents
• Overview 4.2
• Function Basics 4.3
• Defining Functions 4.4
• Applications 4.5
• Exercises 4.6
• Solutions 4.7
4.2 Overview
One construct that’s extremely useful and provided by almost all programming languages is
functions.
We have already met several functions, such as
• the sqrt() function from NumPy and
• the built-in print() function
In this lecture we’ll treat functions systematically and begin to learn just how useful and im-
portant they are.
One of the things we will learn to do is build our own user-defined functions
We will use the following imports.
4.3 Function Basics

Python has a number of built-in functions that are available without import.
We have already met some
Out[2]: 20
In [3]: print('foobar')
foobar
In [4]: str(22)
Out[4]: '22'
In [5]: type(22)
Out[5]: int
Out[6]: False
Out[7]: True
If the built-in functions don’t cover what we need, we either need to import functions or cre-
ate our own.
Examples of importing and using functions were given in the previous lecture
Here’s another one, which tests whether a given year is a leap year:
import calendar

calendar.isleap(2020)
Out[8]: True
4.4 Defining Functions

4.4.1 Syntax
Here's a very simple Python function that implements the mathematical function f(x) = 2x + 1

def f(x):
    return 2 * x + 1
Now that we’ve defined this function, let’s call it and check whether it does what we expect:
In [10]: f(1)
Out[10]: 3
In [11]: f(10)
Out[11]: 21
Here's a longer function that computes the absolute value of a given number.

(Such a function already exists as a built-in, but let's write our own for the exercise.)

def new_abs_function(x):

    if x < 0:
        abs_value = -x
    else:
        abs_value = x

    return abs_value
In [13]: print(new_abs_function(3))
print(new_abs_function(-3))
3
3
User-defined functions are important for improving the clarity of your code by
• separating different strands of logic
• facilitating code reuse
(Writing the same thing twice is almost always a bad idea)
We will say more about this later.
4.5 Applications
Consider again this code from the previous lecture

ts_length = 100
ϵ_values = []   # empty list

for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)

plt.plot(ϵ_values)
plt.show()
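We can break this program into two parts: a user-defined function that generates the draws, and code that calls it and plots the result. The function definition itself is not reproduced above; a sketch consistent with the call and the discussion below:

def generate_data(n):
    ϵ_values = []
    for i in range(n):
        e = np.random.randn()
        ϵ_values.append(e)
    return ϵ_values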
data = generate_data(100)
plt.plot(data)
plt.show()
When the interpreter gets to the expression generate_data(100), it executes the function
body with n set equal to 100.
The net result is that the name data is bound to the list ϵ_values returned by the func-
tion.
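The notes below refer to a variant that takes a second argument selecting the distribution. A sketch consistent with those notes (passing the string 'U' selects uniform draws):

def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        if generator_type == 'U':
            e = np.random.uniform(0, 1)
        else:
            e = np.random.randn()
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, 'U')
plt.plot(data)
plt.show()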
Hopefully, the syntax of the if/else clause is self-explanatory, with indentation again delimit-
ing the extent of the code blocks.
Notes
• We are passing the argument U as a string, which is why we write it as 'U'.
• Notice that equality is tested with the == syntax, not =.
– For example, the statement a = 10 assigns the name a to the value 10.
– The expression a == 10 evaluates to either True or False, depending on the
value of a.
Now, there are several ways that we can simplify the code above.
For example, we can get rid of the conditionals altogether by just passing the desired generator type as a function.
To understand this, consider the following version.
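The version being referred to is not reproduced above. A sketch in which the generator is passed in as a function (np.random.uniform is just one possible choice):

def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        e = generator_type()     # Call whatever function was passed in
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, np.random.uniform)
plt.plot(data)
plt.show()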
To see why this works, note that Python functions are objects like any other and can be bound to new names

In [18]: max(7, 2, 4)

Out[18]: 7
In [19]: m = max
m(7, 2, 4)
Out[19]: 7
Here we created another name for the built-in function max(), which could then be used in
identical ways.
In the context of our program, the ability to bind new names to functions means that there is
no problem passing a function as an argument to another function—as we did above.
4.6 Exercises
4.6.1 Exercise 1

Recall that n! = n × (n − 1) × ⋯ × 2 × 1. Write a function factorial such that factorial(n) returns n! for any positive integer n.
4.6.2 Exercise 2
The binomial random variable 𝑌 ∼ 𝐵𝑖𝑛(𝑛, 𝑝) represents the number of successes in 𝑛 binary
trials, where each trial succeeds with probability 𝑝.
Without any import besides from numpy.random import uniform, write a function
binomial_rv such that binomial_rv(n, p) generates one draw of 𝑌 .
Hint: If 𝑈 is uniform on (0, 1) and 𝑝 ∈ (0, 1), then the expression U < p evaluates to True
with probability 𝑝.
4.6.3 Exercise 3
First, write a function that returns one realization of the following random device: flip an unbiased coin 10 times; if, at any point in this sequence, a head occurs k or more times consecutively, pay one dollar; otherwise pay nothing.
Second, write another function that does the same task except that the second rule of the
above random device becomes
• If a head occurs k or more times within this sequence, pay one dollar.
Use no import besides from numpy.random import uniform.
4.7 Solutions
4.7.1 Exercise 1
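The solution code is only partially reproduced. A minimal factorial implementation consistent with the call and output below:

def factorial(n):
    k = 1
    for i in range(n):
        k = k * (i + 1)
    return k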
factorial(4)
Out[20]: 24
4.7.2 Exercise 2
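The solution code is not reproduced. A sketch following the exercise statement (one uniform draw per trial):

from numpy.random import uniform

def binomial_rv(n, p):
    count = 0
    for i in range(n):
        U = uniform()
        if U < p:
            count = count + 1    # Or count += 1
    return count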
binomial_rv(10, 0.5)
Out[21]: 6
4.7.3 Exercise 3
from numpy.random import uniform

def draw(k):  # pays if there are k consecutive successes within the sequence
    payoff = 0
    count = 0

    for i in range(10):
        U = uniform()
        count = count + 1 if U < 0.5 else 0
        print(count)    # print counts for clarity
        if count == k:
            payoff = 1

    return payoff

draw(3)
0
0
0
0
0
1
0
1
2
3
Out[22]: 1
def draw_new(k):  # pays if there are k successes (not necessarily consecutive)
    payoff = 0
    count = 0

    for i in range(10):
        U = uniform()
        count = count + ( 1 if U < 0.5 else 0 )
        print(count)
        if count == k:
            payoff = 1

    return payoff

draw_new(3)
0
0
1
1
1
1
2
3
4
4
Out[23]: 1
Chapter 5
Python Essentials
5.1 Contents
• Overview 5.2
• Data Types 5.3
• Input and Output 5.4
• Iterating 5.5
• Comparisons and Logical Operators 5.6
• More Functions 5.7
• Coding Style and PEP8 5.8
• Exercises 5.9
• Solutions 5.10
5.2 Overview
5.3 Data Types

Python provides numerous other built-in data types, some of which we've already met

• strings, lists, etc.
Let’s learn a bit more about them.
One simple data type is Boolean values, which can be either True or False
In [1]: x = True
x
Out[1]: True
We can check the type of any object in memory using the type() function.
In [2]: type(x)
Out[2]: bool
In the next line of code, the interpreter evaluates the expression on the right of = and binds y
to this value
In [3]: y = 100 < 10
y

Out[3]: False
In [4]: type(y)
Out[4]: bool
In [5]: x + y
Out[5]: 1
In [6]: x * y
Out[6]: 0
Out[7]: 2
In [8]: bools = [True, True, False, True]   # A list of Booleans (values chosen to match the output below)

sum(bools)
Out[8]: 3
In [9]: x = complex(1, 2)
y = complex(2, 1)
print(x * y)
type(x)
5j
Out[9]: complex
5.3.2 Containers
Python has several basic types for storing collections of (possibly heterogeneous) data.
We’ve already discussed lists.
A related data type is tuples, which are “immutable” lists
In [10]: x = ('a', 'b')   # Parentheses instead of the square brackets

In [11]: type(x)
Out[11]: tuple
In Python, an object is called immutable if, once created, the object cannot be changed.
Conversely, an object is mutable if it can still be altered after creation.
Python lists are mutable
In [12]: x = [1, 2]
x[0] = 10
x
Out[12]: [10, 2]
In [13]: x = (1, 2)
x[0] = 10
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-d1b2647f6c81> in <module>
      1 x = (1, 2)
----> 2 x[0] = 10

TypeError: 'tuple' object does not support item assignment
We’ll say more about the role of mutable and immutable data a bit later.
Tuples (and lists) can be “unpacked” as follows
In [14]: integers = (10, 20, 30)
x, y, z = integers
x

Out[14]: 10
In [15]: y
Out[15]: 20
Slice Notation
To access multiple elements of a list or tuple, you can use Python’s slice notation.
For example,
In [16]: a = [2, 4, 6, 8]
a[1:]
Out[16]: [4, 6, 8]
In [17]: a[1:3]
Out[17]: [4, 6]
In [18]: a[-2:]   # The last two elements of the list

Out[18]: [6, 8]
In [19]: s = 'foobar'
s[-3:] # Select the last three elements
Out[19]: 'bar'
Two other container types we should mention before moving on are sets and dictionaries.
Dictionaries are much like lists, except that the items are named instead of numbered
In [20]: d = {'name': 'Frodo', 'age': 33}
type(d)

Out[20]: dict
In [21]: d['age']
Out[21]: 33
In [22]: s1 = {'a', 'b'}
type(s1)

Out[22]: set

In [23]: s2 = {'b', 'c'}
s1.issubset(s2)

Out[23]: False
In [24]: s1.intersection(s2)
Out[24]: {'b'}
5.4 Input and Output

Let's briefly review reading and writing to text files, starting with writing
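The writing example itself is not reproduced above. A sketch consistent with the bullet points and with the file contents read back further below (the file name follows the Paths discussion):

f = open('newfile.txt', 'w')   # Open 'newfile.txt' for writing
f.write('Testing\n')           # Here '\n' means new line
f.write('Testing again')
f.close()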
Here
• The built-in function open() creates a file object for writing to.
• Both write() and close() are methods of file objects.
Where is this file that we’ve created?
Recall that Python maintains a concept of the present working directory (pwd) that can be located from within Jupyter or IPython via
In [27]: %pwd
Out[27]: '/home/ubuntu/repos/lecture-python-programming/_build/jupyterpdf/executed'
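The reading example is likewise not reproduced. A sketch consistent with the print(out) call and output below:

f = open('newfile.txt', 'r')
out = f.read()
out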
In [29]: print(out)
Testing
Testing again
5.4.1 Paths
Note that if newfile.txt is not in the present working directory then this call to open()
fails.
In this case, you can shift the file to the pwd or specify the full path to the file
f = open('insert_full_path_to_file/newfile.txt', 'r')
5.5 Iterating
One of the most important tasks in computing is stepping through a sequence of data and
performing a given action.
One of Python’s strengths is its simple, flexible interface to this kind of iteration via the for
loop.
Many Python objects are “iterable”, in the sense that they can be looped over.
To give an example, let’s write the file us_cities.txt, which lists US cities and their popula-
tion, to the present working directory.
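The cell that writes the file is not reproduced above. A sketch using the %%file cell magic (the city data are illustrative):

%%file us_cities.txt
new york: 8244910
los angeles: 3819702
chicago: 2707120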
Overwriting us_cities.txt
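The program that reads the file back and reformats each line is also not reproduced. A sketch consistent with the discussion below (format(), three string methods, and iteration over the file object on line 2):

data_file = open('us_cities.txt', 'r')
for line in data_file:                              # Line 2: loop directly over the file object
    city, population = line.split(':')              # Tuple unpacking
    city = city.title()                             # Capitalize city names
    population = '{0:,}'.format(int(population))    # Add commas to numbers
    print(city.ljust(15) + population)
data_file.close()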
Here format() is a string method used for inserting variables into strings.
The reformatting of each line is the result of three different string methods, the details of
which can be left till later.
The interesting part of this program for us is line 2, which shows that
1. The file object data_file is iterable, in the sense that it can be placed to the right of
in within a for loop.
One thing you might have noticed is that Python tends to favor looping without explicit in-
dexing.
For example,
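The snippet itself is not reproduced; a sketch with illustrative values, looping directly over the elements:

x_values = [1, 2, 3]
for x in x_values:
    print(x * x)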
1
4
9
is preferred to
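A sketch of the indexed alternative being compared:

for i in range(len(x_values)):
    print(x_values[i] * x_values[i])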
1
4
9
When you compare these two alternatives, you can see why the first one is preferred.
Python provides some facilities to simplify looping without indices.
One is zip(), which is used for stepping through pairs from two sequences.
For example, try running the following code
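The code is not reproduced above. A sketch of zip() stepping through two sequences in parallel (the example data are illustrative):

countries = ('Japan', 'Korea', 'China')
cities = ('Tokyo', 'Seoul', 'Beijing')

for country, city in zip(countries, cities):
    print(f'The capital of {country} is {city}')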
The zip() function is also useful for creating dictionaries — for example
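A sketch (names and values are illustrative):

names = ['Tom', 'John']
marks = ['E', 'F']
dict(zip(names, marks))   # {'Tom': 'E', 'John': 'F'}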
If we actually need the index from a list, one option is to use enumerate().
To understand what enumerate() does, consider the following example
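The example code is not reproduced; here is a sketch that produces the output shown below:

letter_list = ['a', 'b', 'c']
for index, letter in enumerate(letter_list):
    print(f"letter_list[{index}] = '{letter}'")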
letter_list[0] = 'a'
letter_list[1] = 'b'
letter_list[2] = 'c'
We can also simplify the code for generating the list of random draws considerably by using
something called a list comprehension.
List comprehensions are an elegant Python tool for creating lists.
Consider the following example, where the list comprehension is on the right-hand side of the
second line
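The example is not reproduced; a sketch with the comprehension on the right-hand side of the second line (the list contents are illustrative):

animals = ['dog', 'cat', 'bird']
plurals = [animal + 's' for animal in animals]
plurals   # ['dogs', 'cats', 'birds']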
In [38]: range(8)
Out[38]: range(0, 8)
5.6 Comparisons and Logical Operators

5.6.1 Comparisons
Many different kinds of expressions evaluate to one of the Boolean values (i.e., True or
False).
A common type is comparisons, such as
In [40]: x, y = 1, 2
x < y
Out[40]: True
In [41]: x > y
Out[41]: False
Out[42]: True
Out[43]: True
In [44]: x = 1 # Assignment
x == 2 # Comparison
Out[44]: False
In [45]: 1 != 2
Out[45]: True
Note that when testing conditions, we can use any valid Python expression
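The expressions that produced the outputs below are not reproduced. Sketches consistent with those outputs (a nonzero number is "truthy" while an empty list is "falsy"):

'yes' if 42 else 'no'   # Evaluates to 'yes'

'yes' if [] else 'no'   # Evaluates to 'no'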
Out[46]: 'yes'
Out[47]: 'no'
Out[48]: True
Out[49]: False
Out[50]: True
Out[51]: False
Out[52]: True
Remember
• P and Q is True if both are True, else False
• P or Q is False if both are False, else True
5.7 More Functions

Let's talk a bit more about functions, which are all-important for good programming style.
Functions without a return statement automatically return the special Python object None.
5.7.2 Docstrings
Python has a system for adding comments to functions, modules, etc. called docstrings.
The nice thing about docstrings is that they are available at run-time.
Try running this
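The function being inspected below is reconstructed from the f?? output shown further down:

def f(x):
    """
    This function squares its argument
    """
    return x**2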
In [55]: f?
Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Docstring: This function squares its argument
In [56]: f??
Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Source:
def f(x):
"""
This function squares its argument
"""
return x**2
With one question mark we bring up the docstring, and with two we get the source code as
well.
For example, to compute \int_0^2 x^3 \, dx we can pass an anonymous function, created with the lambda keyword, directly to SciPy's quad routine

from scipy.integrate import quad

quad(lambda x: x**3, 0, 2)
Here the function created by lambda is said to be anonymous because it was never given a
name.
In a previous lecture, you came across statements such as plot(x, 'b-', label="white noise").

In this call to Matplotlib's plot function, notice that the last argument is passed in name=argument syntax.
This is called a keyword argument, with label being the keyword.
Non-keyword arguments are called positional arguments, since their meaning is determined by
order
• plot(x, 'b-', label="white noise") is different from plot('b-', x,
label="white noise")
Keyword arguments are particularly useful when a function has a lot of arguments, in which
case it’s hard to remember the right order.
You can adopt keyword arguments in user-defined functions with no difficulty.
The next example illustrates the syntax
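The example function is not reproduced. A sketch consistent with the calls and outputs shown below (f(2) returning 3, and a call with a=4, b=5 returning 14):

def f(x, a=1, b=1):
    return a + b * x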
The keyword argument values we supplied in the definition of f become the default values
In [61]: f(2)
Out[61]: 3
Out[62]: 14
To learn more about the Python programming philosophy type import this at the
prompt.
Among other things, Python strongly favors consistency in programming style.
We’ve all heard the saying about consistency and little minds.
In programming, as in mathematics, the opposite is true
• A mathematical paper where the symbols ∪ and ∩ were reversed would be very hard to
read, even if the author told you so on the first page.
In Python, the standard style is set out in PEP8.
(Occasionally we’ll deviate from PEP8 in these lectures to better match mathematical nota-
tion)
5.9 Exercises
5.9.1 Exercise 1
Part 1: Given two numeric lists or tuples x_vals and y_vals of equal length, compute their
inner product using zip().
Part 2: In one line, count the number of even numbers in 0,…,99.
• Hint: x % 2 returns 0 if x is even, 1 otherwise.
Part 3: Given pairs = ((2, 5), (4, 2), (9, 8), (12, 10)), count the number of
pairs (a, b) such that both a and b are even.
5.9.2 Exercise 2
p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \sum_{i=0}^{n} a_i x^i \qquad (1)
Write a function p such that p(x, coeff) computes the value in (1) given a point x and a list of coefficients coeff.
Try to use enumerate() in your loop.
5.9.3 Exercise 3
Write a function that takes a string as an argument and returns the number of capital letters
in the string.
Hint: 'foo'.upper() returns 'FOO'.
5.9.4 Exercise 4
Write a function that takes two sequences seq_a and seq_b as arguments and returns True
if every element in seq_a is also an element of seq_b, else False.
• By “sequence” we mean a list, a tuple or a string.
• Do the exercise without using sets and set methods.
5.9.5 Exercise 5
When we cover the numerical libraries, we will see they include many alternatives for interpo-
lation and function approximation.
Nevertheless, let’s write our own function approximation routine as an exercise.
In particular, without using any imports, write a function linapprox that takes as argu-
ments
• A function f mapping some interval [𝑎, 𝑏] into ℝ.
• Two scalars a and b providing the limits of this interval.
• An integer n determining the number of grid points.
• A number x satisfying a <= x <= b.
and returns the piecewise linear interpolation of f at x, based on n evenly spaced grid points
a = point[0] < point[1] < ... < point[n-1] = b.
Aim for clarity, not efficiency.
5.9.6 Exercise 6
Using list comprehension syntax, we can simplify the loop in the following code.
n = 100
ϵ_values = []
for i in range(n):
    e = np.random.randn()
    ϵ_values.append(e)
5.10 Solutions
5.10.1 Exercise 1
Part 1 Solution:
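The solution cells are not reproduced. Sketches consistent with the two outputs of 6 (the test vectors are illustrative):

x_vals = [1, 2, 3]
y_vals = [1, 1, 1]

sum([x * y for x, y in zip(x_vals, y_vals)])   # With a list comprehension

sum(x * y for x, y in zip(x_vals, y_vals))     # Or with a generator expression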
Out[64]: 6
Out[65]: 6
Part 2 Solution:
One solution is
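A sketch consistent with the two outputs of 50:

sum([x % 2 == 0 for x in range(100)])          # True counts as 1 when summed

len([x for x in range(100) if x % 2 == 0])     # An equivalent alternative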
Out[66]: 50
Out[67]: 50
Some less natural alternatives that nonetheless help to illustrate the flexibility of list compre-
hensions are
Out[68]: 50
and
Out[69]: 50
Part 3 Solution
In [70]: pairs = ((2, 5), (4, 2), (9, 8), (12, 10))
sum([x % 2 == 0 and y % 2 == 0 for x, y in pairs])
Out[70]: 2
5.10.2 Exercise 2
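The solution code is not reproduced. A sketch using enumerate(), with a hypothetical test call consistent with the output of 6:

def p(x, coeff):
    return sum(a * x**i for i, a in enumerate(coeff))

p(1, (2, 4))   # 2 * 1**0 + 4 * 1**1 = 6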
Out[72]: 6
5.10.3 Exercise 3
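The solution code is not reproduced. A sketch counting capital letters, with a hypothetical test string consistent with the output of 3:

def f(string):
    count = 0
    for letter in string:
        if letter == letter.upper() and letter.isalpha():
            count += 1
    return count

f('The Rain in Spain')   # 'T', 'R' and 'S' are capitals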
Out[73]: 3
Out[74]: 3
5.10.4 Exercise 4
Here’s a solution:
def f(seq_a, seq_b):
    for a in seq_a:
        if a not in seq_b:
            return False
    return True

# == test == #

print(f([1, 2], [1, 2, 3]))
print(f([1, 2, 3], [1, 2]))

True
False
Of course, if we use the sets data type then the solution is easier
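A sketch of the set-based version:

def f(seq_a, seq_b):
    return set(seq_a).issubset(set(seq_b))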
5.10.5 Exercise 5
def linapprox(f, a, b, n, x):
    """
    Evaluates the piecewise linear interpolant of f at x on the interval
    [a, b], with n evenly spaced grid points.

    Parameters
    ==========
    f : function
        The function to approximate

    a, b : scalars
        The endpoints of the interval

    n : integer
        Number of grid points

    x : scalar
        The point at which to evaluate the approximation

    Returns
    =======
    A float. The interpolant evaluated at x
    """
    length_of_interval = b - a
    num_subintervals = n - 1
    step = length_of_interval / num_subintervals

    # === find the first grid point larger than x === #
    point = a
    while point <= x:
        point += step

    # === x must lie between the gridpoints (point - step) and point === #
    u, v = point - step, point

    return f(u) + (x - u) * (f(v) - f(u)) / (v - u)
5.10.6 Exercise 6
In [78]: n = 100
ϵ_values = [np.random.randn() for i in range(n)]
Chapter 6

OOP I: Introduction to Object Oriented Programming
6.1 Contents
• Overview 6.2
• Objects 6.3
• Summary 6.4
6.2 Overview
Python is a pragmatic language that blends object-oriented and procedural styles, rather than
taking a purist approach.
However, at a foundational level, Python is object-oriented.
In particular, in Python, everything is an object.
In this lecture, we explain what that statement means and why it matters.
6.3 Objects
In Python, an object is a collection of data and instructions held in computer memory that
consists of
1. a type
2. a unique identity

3. data (i.e., its content)

4. methods
6.3.1 Type
Python provides for different types of objects, to accommodate different categories of data.
For example
In [1]: s = 'This is a string'
type(s)

Out[1]: str

In [2]: x = 42   # Now let's create an integer
type(x)

Out[2]: int

In [3]: '300' + 'cc'   # Adding two strings works

Out[3]: '300cc'

In [4]: 300 + 400      # Adding two integers works

Out[4]: 700
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-263a89d2d982> in <module>
----> 1 '300' + 400

TypeError: can only concatenate str (not "int") to str
Here we are mixing types, and it’s unclear to Python whether the user wants to
• convert '300' to an integer and then add it to 400, or
• convert 400 to string and then concatenate it with '300'
Some languages might try to guess but Python is strongly typed
• Type is important, and implicit type conversion is rare.
• Python will respond instead by raising a TypeError.
To avoid the error, you need to clarify by changing the relevant type.
For example,
In [6]: int('300') + 400   # Convert the string to an integer first

Out[6]: 700
6.3.2 Identity
In Python, each object has a unique identifier, which helps Python (and us) keep track of the
object.
The identity of an object can be obtained via the id() function
In [7]: y = 2.5
z = 2.5
id(y)
Out[7]: 140187764307240
In [8]: id(z)
Out[8]: 140187764307216
In this example, y and z happen to have the same value (i.e., 2.5), but they are not the
same object.
The identity of an object is in fact just the address of the object in memory.
If we set x = 42 then we create an object of type int that contains the data 42.
In fact, it contains more, as the following example shows
In [9]: x = 42
x
Out[9]: 42
In [10]: x.imag
Out[10]: 0
In [11]: x.__class__
Out[11]: int
When Python creates this integer object, it stores with it various auxiliary information, such
as the imaginary part, and the type.
Any name following a dot is called an attribute of the object to the left of the dot.
• e.g.,imag and __class__ are attributes of x.
We see from this example that objects have attributes that contain auxiliary information.
They also have attributes that act like functions, called methods.
These attributes are important, so let’s discuss them in-depth.
6.3.4 Methods

Methods are attributes of objects that are callable — i.e., attributes that can be called as functions

In [12]: x = ['foo', 'bar']
callable(x.append)

Out[12]: True
In [13]: callable(x.__doc__)
Out[13]: False
Methods typically act on the data contained in the object they belong to, or combine that
data with other data
In [15]: s.lower()
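The square bracket assignment being discussed next is not reproduced; a sketch:

x = ['a', 'b']
x[0] = 'aa'   # Item assignment using square bracket notation
x             # ['aa', 'b']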
It doesn’t look like there are any methods used here, but in fact the square bracket assign-
ment notation is just a convenient interface to a method call.
What actually happens is that Python calls the __setitem__ method, as follows
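A sketch of the equivalent call:

x = ['a', 'b']
x.__setitem__(0, 'aa')   # Equivalent to x[0] = 'aa'
x                        # ['aa', 'b']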
(If you wanted to you could modify the __setitem__ method, so that square bracket as-
signment does something totally different)
6.4 Summary
In [19]: def f(x): return x**2   # A simple example function

In [20]: type(f)
Out[20]: function
In [21]: id(f)
Out[21]: 140187764206856
In [22]: f.__name__
Out[22]: 'f'
We can see that f has type, identity, attributes and so on—just like any other object.
It also has methods.
One example is the __call__ method, which just evaluates the function
In [23]: f.__call__(3)
Out[23]: 9
In [24]: import math

id(math)
Out[24]: 140187861625016
This uniform treatment of data in Python (everything is an object) helps keep the language
simple and consistent.
Chapter 7

OOP II: Building Classes
7.1 Contents
• Overview 7.2
• OOP Review 7.3
• Defining Your Own Classes 7.4
• Special Methods 7.5
• Exercises 7.6
• Solutions 7.7
7.2 Overview
7.3 OOP Review

As discussed in an earlier lecture, in the OOP paradigm, data and functions are bundled together into "objects".
An example is a Python list, which not only stores data but also knows how to sort itself, etc.
In [2]: x = [1, 5, 4]
x.sort()
x
Out[2]: [1, 4, 5]
As we now know, sort is a function that is “part of” the list object — and hence called a
method.
7.4 Defining Your Own Classes

If we want to make our own types of objects we need to use class definitions.
A class definition is a blueprint for a particular class of objects (e.g., lists, strings or complex
numbers).
It describes
• What kind of data the class stores
• What methods it has for acting on these data
An object or instance is a realization of the class, created from the blueprint
• Each instance has its own unique data.
• Methods set out in the class definition act on this (and other) data.
In Python, the data and methods of an object are collectively referred to as attributes.
Attributes are accessed via “dotted attribute notation”
• object_name.data
• object_name.method_name()
In the example
In [3]: x = [1, 5, 4]
x.sort()
x.__class__
Out[3]: list
• x is an object or instance, created from the definition for Python lists, but with its own
particular data.
• x.sort() and x.__class__ are two attributes of x.
• dir(x) can be used to view all the attributes of x.
OOP is useful for the same reason that abstraction is useful: for recognizing and exploiting
the common structure.
For example,
• a Markov chain consists of a set of states and a collection of transition probabilities for
moving across states
• a general equilibrium theory consists of a commodity space, preferences, technologies,
and an equilibrium definition
• a game consists of a list of players, lists of actions available to each player, player pay-
offs as functions of all players’ actions, and a timing protocol
These are all abstractions that collect together “objects” of the same “type”.
Recognizing common structure allows us to employ common tools.
In economic theory, this might be a proposition that applies to all games of a certain type.
In Python, this might be a method that’s useful for all Markov chains (e.g., simulate).
When we use OOP, the simulate method is conveniently bundled together with the Markov
chain object.
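The Consumer class itself is not reproduced above. Here is a sketch consistent with the usage below and with the later discussion of __init__, earn and spend (the parameter name w is an assumption):

class Consumer:

    def __init__(self, w):
        "Initialize consumer with w dollars of wealth"
        self.wealth = w

    def earn(self, y):
        "The consumer earns y dollars"
        self.wealth += y

    def spend(self, x):
        "The consumer spends x dollars if feasible"
        new_wealth = self.wealth - x
        if new_wealth < 0:
            print("Insufficent funds")
        else:
            self.wealth = new_wealth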
Usage
In [5]: c1 = Consumer(10)   # Create an instance with initial wealth 10
c1.spend(5)
c1.wealth

Out[5]: 5
In [6]: c1.earn(15)
c1.spend(100)
Insufficent funds
We can of course create multiple instances each with its own data
In [7]: c1 = Consumer(10)
c2 = Consumer(12)
c2.spend(4)
c2.wealth
Out[7]: 8
In [8]: c1.wealth
Out[8]: 10
In [9]: c1.__dict__

Out[9]: {'wealth': 10}
In [10]: c2.__dict__
Out[10]: {'wealth': 8}
When we access or set attributes we’re actually just modifying the dictionary maintained by
the instance.
Self
If you look at the Consumer class definition again you’ll see the word self throughout the
code.
The rules with self are that
• Any instance data should be prepended with self
– e.g., the earn method references self.wealth rather than just wealth
• Any method defined within the class should have self as its first argument
– e.g., def earn(self, y) rather than just def earn(y)
• Any method referenced within the class should be called as self.method_name
There are no examples of the last rule in the preceding code but we will see some shortly.
Details
In this section, we look at some more formal details related to classes and self
• You might wish to skip to the next section on first pass of this lecture.
• You can return to these details after you’ve familiarized yourself with more examples.
Methods actually live inside a class object formed when the interpreter reads the class defini-
tion
Note how the three methods __init__, earn and spend are stored in the class object.
Consider the following code
In [12]: c1 = Consumer(10)
c1.earn(10)
c1.wealth
Out[12]: 20
When you call earn via c1.earn(10) the interpreter passes the instance c1 and the argu-
ment 10 to Consumer.earn.
In fact, the following are equivalent
• c1.earn(10)
• Consumer.earn(c1, 10)
In the function call Consumer.earn(c1, 10) note that c1 is the first argument.
Recall that in the definition of the earn method, self is the first parameter
The end result is that self is bound to the instance c1 inside the function call.
That’s why the statement self.wealth += y inside earn ends up modifying c1.wealth.
For our next example, let’s write a simple class to implement the Solow growth model.
The Solow growth model is a neoclassical growth model where the amount of capital stock
per capita 𝑘𝑡 evolves according to the rule
k_{t+1} = \frac{s z k_t^{\alpha} + (1 - \delta) k_t}{1 + n} \qquad (1)
Here
• 𝑠 is an exogenously given savings rate
• 𝑧 is a productivity parameter
• 𝛼 is capital’s share of income
• 𝑛 is the population growth rate
• 𝛿 is the depreciation rate
The steady state of the model is the 𝑘 that solves (1) when 𝑘𝑡+1 = 𝑘𝑡 = 𝑘.
Here’s a class that implements this model.
Some points of interest in the code are
• An instance maintains a record of its current capital stock in the variable self.k.
• The h method implements the right-hand side of (1).
• The update method uses h to update capital as per (1).
– Notice how inside update the reference to the local method h is self.h.
The methods steady_state and generate_sequence are fairly self-explanatory
"""
def __init__(self, n=0.05, # population growth rate
s=0.25, # savings rate
δ=0.1, # depreciation rate
α=0.3, # share of labor
z=2.0, # productivity
k=1.0): # current capital stock
def h(self):
"Evaluate the h function"
# Unpack parameters (get rid of self to simplify notation)
n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
# Apply the update rule
return (s * z * self.k**α + (1 - δ) * self.k) / (1 + n)
def update(self):
"Update the current state (i.e., the capital stock)."
self.k = self.h()
def steady_state(self):
"Compute the steady state value of capital."
# Unpack parameters (get rid of self to simplify notation)
n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
# Compute and return steady state
return ((s * z) / (n + δ))**(1 / (1 - α))
Here’s a little program that uses the class to compute time series from two different initial
conditions.
The common steady state is also plotted for comparison
In [15]: s1 = Solow()
s2 = Solow(k=8.0)
T = 60
fig, ax = plt.subplots(figsize=(9, 6))
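# The plotting loop itself is not reproduced above; a sketch using the
# generate_sequence and steady_state methods (line styles are assumptions):
ax.plot([s1.steady_state()] * T, 'k-', label='steady state')

for s in s1, s2:
    lb = f'capital series from initial state {s.k}'
    ax.plot(s.generate_sequence(T), 'o-', lw=2, alpha=0.6, label=lb)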
ax.set_xlabel('$k_{t+1}$', fontsize=14)
ax.set_ylabel('$k_t$', fontsize=14)
ax.legend()
plt.show()
Next, let’s write a class for a simple one good market where agents are price takers.
The market consists of the following objects:
• A linear demand curve 𝑄 = 𝑎𝑑 − 𝑏𝑑 𝑝
• A linear supply curve 𝑄 = 𝑎𝑧 + 𝑏𝑧 (𝑝 − 𝑡)
Here
• 𝑝 is price paid by the consumer, 𝑄 is quantity and 𝑡 is a per-unit tax.
• Other symbols are demand and supply parameters.
The class provides methods to compute various values of interest, including competitive equi-
librium price and quantity, tax revenue raised, consumer surplus and producer surplus.
class Market:

    def __init__(self, ad, bd, az, bz, tax):
        """
        Set up market parameters.  All parameters are scalars.
        """
        self.ad, self.bd, self.az, self.bz, self.tax = ad, bd, az, bz, tax
        if ad < az:
            raise ValueError('Insufficient demand.')

    def price(self):
        "Return equilibrium price"
        return (self.ad - self.az + self.bz * self.tax) / (self.bd + self.bz)

    def quantity(self):
        "Compute equilibrium quantity"
        return self.ad - self.bd * self.price()

    def consumer_surp(self):
        "Compute consumer surplus"
        # == Compute area under inverse demand function == #
        integrand = lambda x: (self.ad / self.bd) - (1 / self.bd) * x
        area, error = quad(integrand, 0, self.quantity())
        return area - self.price() * self.quantity()

    def producer_surp(self):
        "Compute producer surplus"
        # == Compute area above inverse supply curve, excluding tax == #
        integrand = lambda x: -(self.az / self.bz) + (1 / self.bz) * x
        area, error = quad(integrand, 0, self.quantity())
        return (self.price() - self.tax) * self.quantity() - area

    def taxrev(self):
        "Compute tax revenue"
        return self.tax * self.quantity()

    # The three inverse_* methods below are reconstructed from their use
    # in the plotting code that follows.
    def inverse_demand(self, x):
        "Compute inverse demand"
        return self.ad / self.bd - (1 / self.bd) * x

    def inverse_supply(self, x):
        "Compute inverse supply curve (with tax)"
        return -(self.az / self.bz) + (1 / self.bz) * x + self.tax

    def inverse_supply_no_tax(self, x):
        "Compute inverse supply curve without tax"
        return -(self.az / self.bz) + (1 / self.bz) * x
Here’s a short program that uses this class to plot an inverse demand curve together with in-
verse supply curves with and without taxes
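The parameter values used in the original program are not reproduced; the instantiation below uses hypothetical values:

# Hypothetical baseline parameters: ad, bd, az, bz, tax
baseline_params = 15, .5, -2, .5, 3
m = Market(*baseline_params)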
q_max = m.quantity() * 2
q_grid = np.linspace(0.0, q_max, 100)
pd = m.inverse_demand(q_grid)
ps = m.inverse_supply(q_grid)
psno = m.inverse_supply_no_tax(q_grid)
fig, ax = plt.subplots()
ax.plot(q_grid, pd, lw=2, alpha=0.6, label='demand')
ax.plot(q_grid, ps, lw=2, alpha=0.6, label='supply')
ax.plot(q_grid, psno, '--k', lw=2, alpha=0.6, label='supply without tax')
ax.set_xlabel('quantity', fontsize=14)
ax.set_xlim(0, q_max)
ax.set_ylabel('price', fontsize=14)
ax.legend(loc='lower right', frameon=False, fontsize=14)
plt.show()
Out[21]: 1.125
Let’s look at one more example, related to chaotic dynamics in nonlinear systems.
One simple transition rule that can generate complex dynamics is the logistic map

x_{t+1} = r x_t (1 - x_t), \qquad x_0 \in [0, 1], \; r \in [0, 4]
Let’s write a class for generating time series from this model.
Here’s one implementation
class Chaos:
    """
    Models the dynamical system with x_{t+1} = r x_t (1 - x_t).
    (The class header, constructor and generate_sequence method are
    reconstructed from their use in the plotting code below.)
    """
    def __init__(self, x0, r):
        "Initialize with state x0 and parameter r."
        self.x, self.r = x0, r

    def update(self):
        "Apply the map to update state."
        self.x = self.r * self.x * (1 - self.x)

    def generate_sequence(self, n):
        "Generate and return a sequence of length n."
        path = []
        for i in range(n):
            path.append(self.x)
            self.update()
        return path
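The code creating the instance used below is not reproduced; a sketch (the initial condition and parameter value are assumptions):

ts_length = 250
ch = Chaos(0.1, 4.0)   # Initial state x0 = 0.1, parameter r = 4.0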
fig, ax = plt.subplots()
ax.set_xlabel('$t$', fontsize=14)
ax.set_ylabel('$x_t$', fontsize=14)
x = ch.generate_sequence(ts_length)
ax.plot(range(ts_length), x, 'bo-', alpha=0.5, lw=2, label='$x_t$')
plt.show()
ax.set_xlabel('$r$', fontsize=16)
ax.set_ylabel('$x_t$', fontsize=16)
plt.show()
7.5 Special Methods

Python provides special methods with which some neat tricks can be performed.
For example, recall that lists and tuples have a notion of length and that this length can be
queried via the len function
Out[26]: 2
If you want to provide a return value for the len function when applied to your user-defined
object, use the __len__ special method
class Foo:

    def __len__(self):
        return 42
Now we get
In [28]: f = Foo()
len(f)
Out[28]: 42
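The class definition for the next example is not reproduced; a sketch consistent with f(8) returning 50:

class Foo:

    def __call__(self, x):
        return x + 42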
In [30]: f = Foo()
f(8) # Exactly equivalent to f.__call__(8)
Out[30]: 50
7.6 Exercises
7.6.1 Exercise 1
The empirical cumulative distribution function (ecdf) corresponding to a sample {𝑋𝑖 }𝑛𝑖=1 is
defined as
F_n(x) := \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\} \qquad (x \in \mathbb{R}) \qquad (3)
Here 1{𝑋𝑖 ≤ 𝑥} is an indicator function (one if 𝑋𝑖 ≤ 𝑥 and zero otherwise) and hence 𝐹𝑛 (𝑥)
is the fraction of the sample that falls below 𝑥.
The Glivenko–Cantelli Theorem states that, provided that the sample is IID, the ecdf 𝐹𝑛 con-
verges to the true distribution function 𝐹 .
Implement 𝐹𝑛 as a class called ECDF, where
• A given sample {𝑋𝑖 }𝑛𝑖=1 are the instance data, stored as self.observations.
• The class implements a __call__ method that returns 𝐹𝑛 (𝑥) for any 𝑥.
Your code should work as follows (modulo randomness)
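The usage example is not reproduced. A sketch of the kind of session intended (sample sizes are assumptions):

from random import uniform

samples = [uniform(0, 1) for i in range(10)]
F = ECDF(samples)

F(0.5)  # Evaluate ecdf at x = 0.5

F.observations = [uniform(0, 1) for i in range(1000)]
F(0.5)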
7.6.2 Exercise 2
p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N = \sum_{n=0}^{N} a_n x^n \qquad (x \in \mathbb{R}) \qquad (4)
The instance data for the class Polynomial will be the coefficients (in the case of (4), the
numbers 𝑎0 , … , 𝑎𝑁 ).
Provide methods that

1. Evaluate the polynomial (4), returning p(x) for any x.

2. Differentiate the polynomial, replacing the original coefficients with those of its derivative p′.
7.7 Solutions
7.7.1 Exercise 1
class ECDF:

    def __init__(self, observations):
        self.observations = observations

    def __call__(self, x):
        counter = 0.0
        for obs in self.observations:
            if obs <= x:
                counter += 1
        return counter / len(self.observations)
In [32]: # == test == #

from random import uniform

X = [uniform(0, 1) for i in range(10)]        # Sample sizes are illustrative
F = ECDF(X)
print(F(0.5))

X = [uniform(0, 1) for i in range(1000)]
F = ECDF(X)
print(F(0.5))
0.6
0.482
7.7.2 Exercise 2
class Polynomial:

    def __init__(self, coefficients):
        "Create an instance with p(x) = sum_n coefficients[n] * x**n."
        self.coefficients = coefficients

    def __call__(self, x):
        "Evaluate the polynomial at x."
        return sum(a * x**i for i, a in enumerate(self.coefficients))

    def differentiate(self):
        "Reset self.coefficients to those of p' instead of p."
        new_coefficients = []
        for i, a in enumerate(self.coefficients):
            new_coefficients.append(i * a)
        # Remove the first element, which is zero
        del new_coefficients[0]
        # And reset coefficients data to new values
        self.coefficients = new_coefficients
        return new_coefficients
Part II
Chapter 8

Python for Scientific Computing
8.1 Contents
• Overview 8.2
• Scientific Libraries 8.3
• The Need for Speed 8.4
• Vectorization 8.5
• Beyond Vectorization 8.6
“We should forget about small efficiencies, say about 97% of the time: premature
optimization is the root of all evil.” – Donald Knuth
8.2 Overview
8.3 Scientific Libraries

Let's briefly review Python's scientific libraries, starting with why we need them.
One obvious reason we use scientific libraries is because they implement routines we want to
use.
For example, it’s almost always better to use an existing routine for root finding than to write
a new one from scratch.
(For standard algorithms, efficiency is maximized if the community can coordinate on a com-
mon set of implementations, written by experts and tuned by users to be as fast and robust
as possible.)
But this is not the only reason that we use Python’s scientific libraries.
Another is that pure Python, while flexible and elegant, is not fast.
So we need libraries that are designed to accelerate execution of Python code.
As we’ll see below, there are now Python libraries that can do this extremely well.
In terms of popularity, the big four in the world of scientific Python libraries are
• NumPy
• SciPy
• Matplotlib
• Pandas
For us, there’s another (relatively new) library that will also be essential for numerical com-
puting:
• Numba
Over the next few lectures we’ll see how to use these libraries.
But first, let’s quickly review how they fit together.
• NumPy forms the foundations by providing a basic array data type (think of vectors
and matrices) and functions for acting on these arrays (e.g., matrix multiplication).
• SciPy builds on NumPy by adding the kinds of numerical methods that are routinely
used in science (interpolation, optimization, root finding, etc.).
• Matplotlib is used to generate figures, with a focus on plotting data stored in NumPy
arrays.
• Pandas provides types and functions for empirical work (e.g., manipulating data).
• Numba accelerates execution via JIT compilation — we’ll learn about this soon.
8.4 The Need for Speed

Before we learn how to do this, let's try to understand why plain vanilla Python is slower than C or Fortran.

This will, in turn, help us figure out how to speed things up.
Dynamic Typing
In [2]: a, b = 10, 10
a + b
Out[2]: 20
Even for this simple operation, the Python interpreter has a fair bit of work to do.
For example, in the statement a + b, the interpreter has to know which operation to invoke.
If a and b are strings, then a + b requires string concatenation
Out[3]: 'foobar'
(We say that the operator + is overloaded — its action depends on the type of the objects on
which it acts)
As a result, Python must check the type of the objects and then call the correct operation.
This involves substantial overheads.
Static Types

Compiled languages avoid these overheads with explicit, static types. For example, consider the following C code, which sums the integers from 1 to 10
#include <stdio.h>
int main(void) {
int i;
int sum = 0;
for (i = 1; i <= 10; i++) {
sum = sum + i;
}
printf("sum = %d\n", sum);
return 0;
}
In C or Fortran, these integers would typically be stored in an array, which is a simple data
structure for storing homogeneous data.
Such an array is stored in a single contiguous block of memory
• In modern computers, memory addresses are allocated to each byte (one byte = 8 bits).
8.5 Vectorization
There is a clever method called vectorization that can be used to speed up high level lan-
guages in numerical applications.
The key idea is to send array processing operations in batch to pre-compiled and efficient na-
tive machine code.
The machine code itself is typically compiled from carefully optimized C or Fortran.
For example, when working in a high level language, the operation of inverting a large ma-
trix can be subcontracted to efficient machine code that is pre-compiled for this purpose and
supplied to users as part of a package.
This clever idea dates back to MATLAB, which uses vectorization extensively.
Vectorization can greatly accelerate many numerical computations (but not all, as we shall
see).
Let’s see how vectorization works in Python, using NumPy.
Next let’s try some non-vectorized code, which uses a native Python loop to generate, square
and then sum a large number of random variables:
In [6]: n = 1_000_000
In [7]: %%time
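# The body of this %%time cell is not reproduced above; a sketch consistent
# with the description (generate, square and sum n uniform draws in a loop):
y = 0      # Will accumulate and store sum
for i in range(n):
    x = np.random.uniform(0, 1)
    y += x**2
y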
CPU times: user 741 ms, sys: 708 µs, total: 741 ms
Wall time: 761 ms
In [8]: %%time
x = np.random.uniform(0, 1, n)
y = np.sum(x**2)
CPU times: user 14.9 ms, sys: 7.76 ms, total: 22.6 ms
Wall time: 23.6 ms
As you can see, the second code block runs much faster. Why?
The second code block breaks the loop down into three basic operations
1. draw n uniforms
2. square them
3. sum them
Many functions provided by NumPy are so-called universal functions — also called ufuncs.
This means that they map scalars into scalars and, when applied to arrays, act element-wise.
In [9]: np.cos(1.0)
Out[9]: 0.5403023058681398
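The same ufunc applied to an array acts element-wise; a minimal sketch (values shown in the comment are just the corresponding cosines):

x = np.linspace(0, 1, 3)
np.cos(x)    # array([1.        , 0.87758256, 0.54030231])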
𝑓(𝑥, 𝑦) = cos(𝑥² + 𝑦²) / (1 + 𝑥² + 𝑦²)    and    𝑎 = 3
Here’s a plot of 𝑓
In [13]: %%time
m = -np.inf
for x in grid:
for y in grid:
z = f(x, y)
if z > m:
m = z
In [14]: %%time
x, y = np.meshgrid(grid, grid)
np.max(f(x, y))
CPU times: user 38.3 ms, sys: 16.1 ms, total: 54.4 ms
Wall time: 53.8 ms
Out[14]: 0.9999819641085747
In the vectorized version, all the looping takes place in compiled code.
As you can see, the second version is much faster.
(We’ll make it even faster again later on, using more scientific programming tricks.)
Chapter 9
NumPy
9.1 Contents
• Overview 9.2
• NumPy Arrays 9.3
• Operations on Arrays 9.4
• Additional Functionality 9.5
• Exercises 9.6
• Solutions 9.7
“Let’s be clear: the work of science has nothing whatever to do with consensus.
Consensus is the business of politics. Science, on the contrary, requires only one
investigator who happens to be right, which means that he or she has results that
are verifiable by reference to the real world. In science consensus is irrelevant.
What is relevant is reproducible results.” – Michael Crichton
9.2 Overview
9.2.1 References
In [2]: a = np.zeros(3)
a
In [3]: type(a)
Out[3]: numpy.ndarray
NumPy arrays are somewhat like native Python lists, except that
• Data must be homogeneous (all elements of the same type).
• These types must be one of the data types (dtypes) provided by NumPy.
The most important of these dtypes are:
• float64: 64 bit floating-point number
• int64: 64 bit integer
• bool: 8 bit True or False
There are also dtypes to represent complex numbers, unsigned integers, etc.
On modern machines, the default dtype for arrays is float64
In [4]: a = np.zeros(3)
type(a[0])
Out[4]: numpy.float64
Out[5]: numpy.int64
In [6]: z = np.zeros(10)
Here z is a flat array with no dimension — neither row nor column vector.
The dimension is recorded in the shape attribute, which is a tuple
In [7]: z.shape
Out[7]: (10,)
Here the shape tuple has only one element, which is the length of the array (tuples with one
element end with a comma).
To give it dimension, we can change the shape attribute
In [8]: z.shape = (10, 1)
        z

Out[8]: array([[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.]])
In [9]: z = np.zeros(4)
z.shape = (2, 2)
z
In the last case, to make the 2 by 2 array, we could also pass a tuple to the zeros() func-
tion, as in z = np.zeros((2, 2)).
In [10]: z = np.empty(3)
z
In [12]: z = np.identity(2)
z
In addition, NumPy arrays can be created from Python lists, tuples, etc. using np.array
In [14]: type(z)
Out[14]: numpy.ndarray
See also np.asarray, which performs a similar function, but does not make a distinct copy
of data already in a NumPy array.
Out[17]: True
Out[18]: False
To read in the array data from a text file containing numeric data use np.loadtxt or
np.genfromtxt—see the documentation for details.
In [19]: z = np.linspace(1, 2, 5)
z
In [20]: z[0]
Out[20]: 1.0
In [22]: z[-1]
Out[22]: 2.0
In [24]: z[0, 0]
Out[24]: 1
In [25]: z[0, 1]
Out[25]: 2
And so on.
Note that indices are still zero-based, to maintain compatibility with Python sequences.
Columns and rows can be extracted as follows
In [26]: z[0, :]
In [27]: z[:, 1]
In [28]: z = np.linspace(2, 4, 5)
z
In [30]: z
In [32]: z[d]
Out[32]: array([2.5, 3. ])
In [33]: z = np.empty(3)
z
In [34]: z[:] = 42
z
Out[37]: 10
Out[38]: 2.5
Out[39]: 4
Out[40]: 3
Out[43]: 1.25
Out[44]: 1.118033988749895
In [46]: z = np.linspace(2, 4, 5)
z
In [47]: z.searchsorted(2.2)
Out[47]: 1
Many of the methods discussed above have equivalent functions in the NumPy namespace
In [49]: np.sum(a)
Out[49]: 10
In [50]: np.mean(a)
Out[50]: 2.5
In [52]: a * b
In [53]: a + 10
In [54]: a * 10
In [56]: A + 10
In [57]: A * B
In Python 3.5 and above, the @ symbol can be used for matrix multiplication, as follows:
(For older versions of Python and NumPy you need to use the np.dot function)
We can also use @ to take the inner product of two flat arrays
Out[59]: 50
In [61]: A @ (0, 1)
Mutability leads to the following behavior (which can be shocking to MATLAB program-
mers…)
In [64]: a = np.random.randn(3)
a
In [65]: b = a
b[0] = 0.0
a
Making Copies
In [66]: a = np.random.randn(3)
a
In [67]: b = np.copy(a)
b
In [68]: b[:] = 1
b
In [69]: a
NumPy provides versions of the standard functions log, exp, sin, etc. that act element-
wise on arrays
In [71]: n = len(z)
y = np.empty(n)
for i in range(n):
y[i] = np.sin(z[i])
Because they act element-wise on arrays, these functions are called vectorized functions.
In NumPy-speak, they are also called ufuncs, which stands for “universal functions”.
As we saw above, the usual arithmetic operations (+, *, etc.) also work element-wise, and
combining these with the ufuncs gives a very large set of fast element-wise functions.
In [72]: z
In [75]: x = np.random.randn(4)
x
In [77]: f = np.vectorize(f)
f(x) # Passing the same vector x as in the previous example
However, this approach doesn’t always obtain the same speed as a more carefully crafted vec-
torized function.
9.5.2 Comparisons
In [79]: y[0] = 5
z == y
In [80]: z != y
In [82]: z > 3
In [83]: b = z > 3
b
In [84]: z[b]
9.5.3 Sub-packages
NumPy provides some additional functionality related to scientific programming through its
sub-packages.
We’ve already seen how we can generate random variables using np.random
y.mean()
Out[86]: 5.03
Out[87]: -2.0000000000000004
Out[88]: array([[-2. , 1. ],
[ 1.5, -0.5]])
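The input cells that produced this linear algebra output are not shown in this excerpt; a sketch consistent with the results (the matrix is inferred from the output above) is:

A = np.array([[1, 2], [3, 4]])
np.linalg.det(A)    # determinant -> -2.0000000000000004
np.linalg.inv(A)    # inverse     -> array([[-2. ,  1. ], [ 1.5, -0.5]])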
Much of this functionality is also available in SciPy, a collection of modules that are built on
top of NumPy.
We’ll cover the SciPy versions in more detail soon.
For a comprehensive list of what’s available in NumPy see this documentation.
9.6 Exercises
9.6.1 Exercise 1
𝑝(𝑥) = 𝑎₀ + 𝑎₁𝑥 + 𝑎₂𝑥² + ⋯ + 𝑎_𝑁 𝑥^𝑁 = ∑_{𝑛=0}^{𝑁} 𝑎_𝑛 𝑥^𝑛    (1)
Earlier, you wrote a simple function p(x, coeff) to evaluate (1) without considering effi-
ciency.
Now write a new function that does the same job, but uses NumPy arrays and array opera-
tions for its computations, rather than any form of Python loop.
(Such functionality is already implemented as np.poly1d, but for the sake of the exercise
don’t use this class)
• Hint: Use np.cumprod()
9.6.2 Exercise 2
def sample(q):
a = 0.0
U = uniform(0, 1)
for i in range(len(q)):
if a < U <= a + q[i]:
return i
a = a + q[i]
If you can’t see how this works, try thinking through the flow for a simple example, such as q
= [0.25, 0.75] It helps to sketch the intervals on paper.
Your exercise is to speed it up using NumPy, avoiding explicit loops
• Hint: Use np.searchsorted and np.cumsum
If you can, implement the functionality as a class called DiscreteRV, where
• the data for an instance of the class is the vector of probabilities q
• the class has a draw() method, which returns one draw according to the algorithm de-
scribed above
If you can, write the method so that draw(k) returns k draws from q.
9.6.3 Exercise 3
2. Add a method that plots the ECDF over [𝑎, 𝑏], where 𝑎 and 𝑏 are method parameters.
9.7 Solutions
9.7.1 Exercise 1
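The solution function itself is not reproduced in this excerpt; one vectorized sketch along the lines of the hint (using np.cumprod, with no Python loop) is:

def p(x, coef):
    X = np.empty(len(coef))
    X[0] = 1
    X[1:] = x
    y = np.cumprod(X)   # y = [1, x, x**2, ..., x**N]
    return coef @ y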
Let’s test it
In [92]: x = 2
coef = np.linspace(2, 4, 3)
print(coef)
print(p(x, coef))
# For comparison
q = np.poly1d(np.flip(coef))
print(q(x))
[2. 3. 4.]
24.0
24.0
9.7.2 Exercise 2
class DiscreteRV:
    """
    Generates an array of draws from a discrete random variable with vector of
    probabilities given by q.
    """

    def __init__(self, q):
        """
        The argument q is a NumPy array, or array like, nonnegative and sums to 1.
        """
        self.q = q
        self.Q = np.cumsum(q)    # cumulative probabilities

    def draw(self, k=1):
        """
        Returns k draws from q. For each such draw, the value i is returned
        with probability q[i].
        """
        # assumes: from numpy.random import uniform
        return self.Q.searchsorted(uniform(0, 1, size=k))
The logic is not obvious, but if you take your time and read it slowly, you will understand.
There is a problem here, however.
Suppose that q is altered after an instance of DiscreteRV is created, for example by assigning a new array to its q attribute.
The problem is that Q does not change accordingly, and Q is the data used in the draw
method.
To deal with this, one option is to compute Q every time the draw method is called.
But this is inefficient relative to computing Q once-off.
A better option is to use descriptors.
A solution from the quantecon library using descriptors that behaves as we desire can be
found here.
9.7.3 Exercise 3
In [95]: """
Modifies ecdf.py from QuantEcon to add in a plot method
"""
class ECDF:
"""
One-dimensional empirical distribution function given a vector of
observations.
Parameters
----------
observations : array_like
An array of observations
Attributes
----------
observations : array_like
An array of observations
"""
self.observations = np.asarray(observations)
Parameters
----------
x : scalar(float)
The x at which the ecdf is evaluated
Returns
-------
scalar(float)
Fraction of the sample less than x
"""
return np.mean(self.observations <= x)
Parameters
----------
a : scalar(float), optional(default=None)
Lower endpoint of the plot interval
b : scalar(float), optional(default=None)
Upper endpoint of the plot interval
"""
Chapter 10
Matplotlib
10.1 Contents
• Overview 10.2
• The APIs 10.3
• More Features 10.4
• Further Reading 10.5
• Exercises 10.6
• Solutions 10.7
10.2 Overview
We’ve already generated quite a few figures in these lectures using Matplotlib.
Matplotlib is an outstanding graphics library, designed for scientific computing, with
• high-quality 2D and 3D plots
• output in all the usual formats (PDF, PNG, etc.)
• LaTeX integration
• fine-grained control over all aspects of presentation
• animation, etc.
Here’s the kind of easy example you might find in introductory treatments
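The example itself is not reproduced in this excerpt; a minimal sketch in the MATLAB-style (pyplot) interface would be something like:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
y = np.sin(x)
plt.plot(x, y, 'b-', linewidth=2)
plt.show()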
This is simple and convenient, but also somewhat limited and un-Pythonic.
For example, in the function calls, a lot of objects get created and passed around without
making themselves known to the programmer.
Python programmers tend to prefer a more explicit style of programming (run import this
in a code block and look at the second line).
This leads us to the alternative, object-oriented Matplotlib API.
Here’s the code corresponding to the preceding figure using the object-oriented API
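A sketch of the object-oriented version (reusing the x and y arrays from the pyplot sketch above) is:

fig, ax = plt.subplots()
ax.plot(x, y, 'b-', linewidth=2)
plt.show()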
10.3.3 Tweaks
We’ve also used alpha to make the line slightly transparent—which makes it look smoother.
The location of the legend can be changed by replacing ax.legend() with
ax.legend(loc='upper center').
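A sketch combining these tweaks (the label text is an assumption) is:

fig, ax = plt.subplots()
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend(loc='upper center')
plt.show()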
Matplotlib has a huge array of functions and features, which you can discover over time as
you have need for them.
We mention just a few.
fig, ax = plt.subplots()
x = np.linspace(-4, 4, 150)
for i in range(3):
m, s = uniform(-1, 1), uniform(1, 2)
y = norm.pdf(x, loc=m, scale=s)
current_label = f'$\mu = {m:.2}$'
ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
ax.legend()
plt.show()
10.4.3 3D Plots
ygrid = xgrid
x, y = np.meshgrid(xgrid, ygrid)
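Only a fragment of the 3D example appears above; a self-contained sketch of such a surface plot (the function f and the grid range are assumptions) is:

from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib import cm

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

xgrid = np.linspace(-3, 3, 50)
ygrid = xgrid
x, y = np.meshgrid(xgrid, ygrid)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, f(x, y), rstride=2, cstride=2, cmap=cm.jet, alpha=0.7)
plt.show()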
Perhaps you will find a set of customizations that you regularly use.
Suppose we usually prefer our axes to go through the origin, and to have a grid.
Here’s a nice example from Matthew Doty of how the object-oriented API can be used to
build a custom subplots function that implements these changes.
Read carefully through the code and see if you can follow what’s going on
fig, ax = plt.subplots()
ax.grid()
return fig, ax
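Since only part of the custom function survives above, here is a hedged sketch of what a full version might look like (the spine calls are standard Matplotlib, but the exact styling choices are assumptions):

def subplots():
    "Custom subplots with axes through the origin"
    fig, ax = plt.subplots()

    # Move the left and bottom spines to zero, hide the top and right spines
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')

    ax.grid()
    return fig, ax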
1. calls the standard plt.subplots function internally to generate the fig, ax pair,
2. makes the desired customizations to ax, and
3. passes the fig, ax pair back to the calling code.
10.6 Exercises
10.6.1 Exercise 1
10.7 Solutions
10.7.1 Exercise 1
for θ in θ_vals:
ax.plot(x, f(x, θ))
plt.show()
Chapter 11
SciPy
11.1 Contents
• Overview 11.2
• SciPy versus NumPy 11.3
• Statistics 11.4
• Roots and Fixed Points 11.5
• Optimization 11.6
• Integration 11.7
• Linear Algebra 11.8
• Exercises 11.9
• Solutions 11.10
11.2 Overview
SciPy builds on top of NumPy to provide common tools for scientific programming such as
• linear algebra
• numerical integration
• interpolation
• optimization
• distributions and random number generation
• signal processing
• etc., etc
Like NumPy, SciPy is stable, mature and widely used.
Many SciPy routines are thin wrappers around industry-standard Fortran libraries such as
LAPACK, BLAS, etc.
It’s not really necessary to “learn” SciPy as a whole.
A more common approach is to get some idea of what’s in the library and then look up docu-
mentation as required.
In this lecture, we aim only to highlight some useful parts of the package.
SciPy is a package that contains various tools that are built on top of NumPy, using its array
data type and related functionality.
In fact, when we import SciPy we also get NumPy, as can be seen from this excerpt of the SciPy initialization file:
However, it’s more common and better practice to use NumPy functionality explicitly
a = np.identity(3)
11.4 Statistics
np.random.beta(5, 5)

This generates a draw from the distribution with the density function below when a, b = 5, 5
𝑓(𝑥; 𝑎, 𝑏) = 𝑥^(𝑎−1) (1 − 𝑥)^(𝑏−1) / ∫₀¹ 𝑢^(𝑎−1) (1 − 𝑢)^(𝑏−1) 𝑑𝑢    (0 ≤ 𝑥 ≤ 1)    (1)
Sometimes we need access to the density itself, or the cdf, the quantiles, etc.
For this, we can use scipy.stats, which provides all of this functionality as well as random
number generation in a single consistent interface.
Here’s an example of usage
q = beta(5, 5)      # Beta(a, b) with a = b = 5; assumes: from scipy.stats import beta
obs = q.rvs(2000)   # 2000 observations
grid = np.linspace(0.01, 0.99, 100)

fig, ax = plt.subplots()
ax.hist(obs, bins=40, density=True)
ax.plot(grid, q.pdf(grid), 'k-', linewidth=2)
plt.show()
The object q that represents the distribution has additional useful methods, including
In [5]: q.cdf(0.4)      # Cumulative distribution function

Out[5]: 0.26656768000000003

In [6]: q.ppf(0.8)      # Quantile (inverse cdf) function

Out[6]: 0.6339134834642708
In [7]: q.mean()
Out[7]: 0.5
The general syntax for creating these objects that represent distributions (of type
rv_frozen) is
name = scipy.stats.distribution_name(shape_parameters,
loc=c, scale=d)
There is an alternative way of calling these methods, where the distribution parameters are passed directly in each function call:

obs = beta.rvs(5, 5, size=2000)

fig, ax = plt.subplots()
ax.hist(obs, bins=40, density=True)
ax.plot(grid, beta.pdf(grid, 5, 5), 'k-', linewidth=2)
plt.show()
from scipy.stats import linregress

x = np.random.randn(200)
y = 2 * x + 0.1 * np.random.randn(200)
gradient, intercept, r_value, p_value, std_err = linregress(x, y)
gradient, intercept
f = lambda x: np.sin(4 * (x - 1/4)) + x + x**20 - 1   # test function (its root matches the output below)
x = np.linspace(0, 1, 100)

fig, ax = plt.subplots()
ax.plot(x, f(x), label='$f(x)$')
ax.axhline(ls='--', c='k')
ax.set_xlabel('$x$', fontsize=12)
ax.set_ylabel('$f(x)$', fontsize=12)
ax.legend(fontsize=12)
plt.show()
11.5.1 Bisection
In [12]: from scipy.optimize import bisect

         bisect(f, 0, 1)
Out[12]: 0.408294677734375
bisect(f, 0, 1)
Out[13]: 0.4082935042806639
Out[14]: 0.40829350427935673
Out[15]: 0.7001700000000279
So-called hybrid methods typically
1. Attempt a fast method, such as Newton's iteration
2. Check diagnostics
3. Fall back on a slower but more robust method, such as bisection, if the fast method misbehaves
In scipy.optimize, the function brentq is such a hybrid method and a good default
In [16]: from scipy.optimize import brentq

         brentq(f, 0, 1)
Out[16]: 0.40829350427936706
Here the correct solution is found and the speed is better than bisection:
34 µs ± 518 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
135 µs ± 2.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Out[19]: array(1.)
If you don’t get good results, you can always switch back to the brentq root finder, since
the fixed point of a function 𝑓 is the root of 𝑔(𝑥) ∶= 𝑥 − 𝑓(𝑥).
11.6 Optimization
Out[20]: 0.0
11.7 Integration
In [21]: from scipy.integrate import quad

         integral, error = quad(lambda x: x**2, 0, 1)
         integral

Out[21]: 0.33333333333333337
In fact, quad is an interface to a very standard numerical integration routine in the Fortran
library QUADPACK.
It uses Clenshaw-Curtis quadrature, based on expansion in terms of Chebychev polynomials.
There are other options for univariate integration—a useful one is fixed_quad, which is fast
and hence works well inside for loops.
There are also functions for multivariate integration.
See the documentation for more details.
We saw that NumPy provides a module for linear algebra called linalg.
SciPy also provides a module for linear algebra with the same name.
The latter is not an exact superset of the former, but overall it has more functionality.
We leave you to investigate the set of available routines.
11.9 Exercises
11.9.1 Exercise 1
11.10 Solutions
11.10.1 Exercise 1
Out[23]: 0.408294677734375
Chapter 12
Numba
12.1 Contents
• Overview 12.2
• Compiling Functions 12.3
• Decorators and “nopython” Mode 12.4
• Compiling Classes 12.5
• Alternatives to Numba 12.6
• Summary and Comments 12.7
• Exercises 12.8
• Solutions 12.9
In addition to what’s in Anaconda, this lecture will need the following libraries:
Please also make sure that you have the latest version of Anaconda, since old versions are a
common source of errors.
Let’s start with some imports:
%matplotlib inline
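The import block itself is not reproduced here; judging from the code used later in this chapter it includes at least the following (the jitclass import path depends on the Numba version, so treat it as an assumption):

import numpy as np
import quantecon as qe
import matplotlib.pyplot as plt

from numba import jit, njit, float64
from numba.experimental import jitclass   # older Numba: from numba import jitclass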
12.2 Overview
In an earlier lecture we learned about vectorization, which is one method to improve speed
and efficiency in numerical work.
Vectorization involves sending array processing operations in batch to efficient low-level code.
However, as discussed previously, vectorization has several weaknesses.
One is that it is highly memory-intensive when working with large amounts of data.
Another is that the set of algorithms that can be entirely vectorized is not universal.
As stated above, Numba’s primary use is compiling functions to fast native machine code
during runtime.
12.3.1 An Example
Let’s consider a problem that is difficult to vectorize: generating the trajectory of a difference
equation given an initial condition.
We will take the difference equation to be the quadratic map
𝑥𝑡+1 = 𝛼𝑥𝑡 (1 − 𝑥𝑡 )
In [3]: α = 4.0
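The definition of qm is not visible at this point in the excerpt; based on the @jit-decorated version shown later in the chapter, the plain Python version is:

def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = α * x[t] * (1 - x[t])
    return x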
Here’s the plot of a typical trajectory, starting from 𝑥0 = 0.1, with 𝑡 on the x-axis
x = qm(0.1, 250)
fig, ax = plt.subplots()
ax.plot(x, 'b-', lw=2, alpha=0.8)
ax.set_xlabel('$t$', fontsize=12)
ax.set_ylabel('$x_{t}$', fontsize = 12)
plt.show()
qm_numba = jit(qm)
In [6]: n = 10_000_000
qe.tic()
qm(0.1, int(n))
time1 = qe.toc()
In [7]: qe.tic()
qm_numba(0.1, int(n))
time2 = qe.toc()
In [8]: qe.tic()
qm_numba(0.1, int(n))
time3 = qe.toc()
In [9]: time1 / time3   # Speed gain from compilation

Out[9]: 198.12424701855295
This kind of speed gain is huge relative to how simple and clear the implementation is.
Numba attempts to generate fast machine code using the infrastructure provided by the
LLVM Project.
It does this by inferring type information on the fly.
(See our earlier lecture on scientific computing for a discussion of types.)
The basic idea is this:
• Python is very flexible and hence we could call the function qm with many types.
– e.g., x0 could be a NumPy array or a list, n could be an integer or a float, etc.
• This makes it hard to pre-compile the function.
• However, when we do actually call the function, say by executing qm(0.5, 10), the
types of x0 and n become clear.
• Moreover, the types of other variables in qm can be inferred once the input is known.
• So the strategy of Numba and other JIT compilers is to wait until this moment, and
then compile the function.
That’s why it is called “just-in-time” compilation.
Note that, if you make the call qm(0.5, 10) and then follow it with qm(0.9, 20), compi-
lation only takes place on the first call.
The compiled code is then cached and recycled as required.
In the code above we created a JIT compiled version of qm via the call qm_numba = jit(qm).
(We will explain all about decorators in a later lecture but you can skip the details at this
stage.)
Let’s see how this is done.
To target a function for JIT compilation we can put @jit before the function definition.
Here’s what this looks like for qm
In [11]: @jit
def qm(x0, n):
x = np.empty(n+1)
x[0] = x0
for t in range(n):
x[t+1] = α * x[t] * (1 - x[t])
return x
A stricter decorator, @njit, forces nopython mode, in which Numba must compile the entire function without falling back to the Python interpreter:

@njit
def qm(x0, n):
x = np.empty(n+1)
x[0] = x0
for t in range(n):
x[t+1] = 4 * x[t] * (1 - x[t])
return x
In [15]: solow_data = [
('n', float64),
('s', float64),
('δ', float64),
('α', float64),
('z', float64),
('k', float64)
]
@jitclass(solow_data)
class Solow:
r"""
Implements the Solow growth model with the update rule
"""
def __init__(self, n=0.05, # population growth rate
s=0.25, # savings rate
δ=0.1, # depreciation rate
α=0.3, # share of labor
z=2.0, # productivity
                 k=1.0): # current capital stock

        self.n, self.s, self.δ, self.α, self.z, self.k = n, s, δ, α, z, k
def h(self):
"Evaluate the h function"
# Unpack parameters (get rid of self to simplify notation)
n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
# Apply the update rule
return (s * z * self.k**α + (1 - δ) * self.k) / (1 + n)
def update(self):
"Update the current state (i.e., the capital stock)."
self.k = self.h()
def steady_state(self):
"Compute the steady state value of capital."
# Unpack parameters (get rid of self to simplify notation)
n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
# Compute and return steady state
return ((s * z) / (n + δ))**(1 / (1 - α))
First we specified the types of the instance data for the class in solow_data.
After that, targeting the class for JIT compilation only requires adding
@jitclass(solow_data) before the class definition.
When we call the methods in the class, the methods are compiled just like functions.
In [16]: s1 = Solow()
s2 = Solow(k=8.0)
T = 60
fig, ax = plt.subplots()
12.6.1 Cython
Like Numba, Cython provides an approach to generating fast compiled code that can be used
from Python.
As was the case with Numba, a key problem is the fact that Python is dynamically typed.
As you’ll recall, Numba solves this problem (where possible) by inferring type.
Cython’s approach is different — programmers add type definitions directly to their “Python”
code.
As such, the Cython language can be thought of as Python with type definitions.
In addition to a language specification, Cython is also a language translator, transforming
Cython code into optimized C and C++ code.
Cython also takes care of building language extensions — the wrapper code that interfaces
between the resulting compiled code and Python.
While Cython has certain advantages, we generally find it both slower and more cumbersome
than Numba.
12.6.2 Interfacing with Fortran via F2Py

If you are comfortable writing Fortran you will find it very easy to create extension modules
from Fortran code using F2Py.
F2Py is a Fortran-to-Python interface generator that is particularly simple to use.
Robert Johansson provides a nice introduction to F2Py, among other things.
Recently, a Jupyter cell magic for Fortran has been developed — you might want to give it a
try.
12.7.1 Limitations
As we’ve seen, Numba needs to infer type information on all variables to generate fast
machine-level instructions.
For simple routines, Numba infers types very well.
For larger ones, or for routines using external libraries, it can easily fail.
Hence, it’s prudent when using Numba to focus on speeding up small, time-critical snippets of
code.
This will give you much better performance than blanketing your Python programs with
@jit statements.
In [17]: a = 1
@jit
def add_a(x):
return a + x
print(add_a(10))
11
In [18]: a = 2
print(add_a(10))
11
Notice that changing the global had no effect on the value returned by the function.
When Numba compiles machine code for functions, it treats global variables as constants to
ensure type stability.
12.8 Exercises
12.8.1 Exercise 1

Write a JIT-compiled function using Numba that estimates π by Monte Carlo: draw points uniformly at random from the unit square and compute the fraction that fall inside the circle inscribed in it.
12.8.2 Exercise 2
In the Introduction to Quantitative Economics with Python lecture series you can learn all
about finite-state Markov chains.
For now, let’s just concentrate on simulating a very simple example of such a chain.
Suppose that the volatility of returns on an asset can be in one of two regimes — high or low.
The transition probabilities across states are as follows (a transition diagram appears here in the original; the probability of leaving the low state is 0.1 and the probability of leaving the high state is 0.2)
For example, let the period length be one day, and suppose the current state is high.
We see from the graph that the state tomorrow will be
• high with probability 0.8
• low with probability 0.2
Your task is to simulate a sequence of daily volatility states according to this rule.
Set the length of the sequence to n = 1_000_000 and start in the high state.
Implement a pure Python version and a Numba version, and compare speeds.
To test your code, evaluate the fraction of time that the chain spends in the low state.
If your code is correct, it should be about 2/3.
Hints:
• Represent the low state as 0 and the high state as 1.
• If you want to store integers in a NumPy array and then apply JIT compilation, use x
= np.empty(n, dtype=np.int_).
12.9 Solutions
12.9.1 Exercise 1
from numpy.random import uniform

@njit
def calculate_pi(n=1_000_000):
count = 0
for i in range(n):
u, v = uniform(0, 1), uniform(0, 1)
d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
if d < 0.5:
count += 1
area_estimate = count / n
return area_estimate * 4 # dividing by radius**2
Out[20]: 3.141656
Out[21]: 3.142244
If we switch off JIT compilation by removing @njit, the code takes around 150 times as long on our machine.
So we get a speed gain of two orders of magnitude, which is huge, simply by adding one decorator.
12.9.2 Exercise 2
We let
• 0 represent “low”
• 1 represent “high”
In [22]: p, q = 0.1, 0.2 # Prob of leaving low and high state respectively
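The pure Python version of compute_series is not reproduced in this excerpt; a sketch consistent with the usage and timings below is:

def compute_series(n):
    x = np.empty(n, dtype=np.int_)
    x[0] = 1                              # start in the high state
    U = np.random.uniform(0, 1, size=n)
    for t in range(1, n):
        current_x = x[t-1]
        if current_x == 0:
            x[t] = U[t] < p               # leave the low state with probability p
        else:
            x[t] = U[t] > q               # stay in the high state with probability 1 - q
    return x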
Let’s run this code and check that the fraction of time spent in the low state is about 0.666
In [24]: n = 1_000_000
x = compute_series(n)
print(np.mean(x == 0)) # Fraction of time x is in state 0
0.665906
In [25]: qe.tic()
compute_series(n)
qe.toc()
Out[25]: 1.4737977981567383
compute_series_numba = jit(compute_series)
In [27]: x = compute_series_numba(n)
print(np.mean(x == 0))
0.665506
In [28]: qe.tic()
compute_series_numba(n)
qe.toc()
Out[28]: 0.01894402503967285
Chapter 13
Parallelization
13.1 Contents
• Overview 13.2
• Types of Parallelization 13.3
• Implicit Multithreading in NumPy 13.4
• Multithreaded Loops in Numba 13.5
• Exercises 13.6
• Solutions 13.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
13.2 Overview
The growth of CPU clock speed (i.e., the speed at which a single chain of logic can be run)
has slowed dramatically in recent years.
This is unlikely to change in the near future, due to inherent physical limitations on the con-
struction of chips and circuit boards.
Chip designers and computer programmers have responded to the slowdown by seeking a dif-
ferent path to fast execution: parallelization.
Hardware makers have increased the number of cores (physical CPUs) embedded in each ma-
chine.
For programmers, the challenge has been to exploit these multiple CPUs by running many
processes in parallel (i.e., simultaneously).
This is particularly important in scientific programming, which requires handling
• large amounts of data and
• CPU intensive simulations and other calculations.
In this lecture we discuss parallelization for scientific computing, with a focus on
%matplotlib inline
Large textbooks have been written on different approaches to parallelization but we will keep
a tight focus on what’s most useful to us.
We will briefly review the two main kinds of parallelization commonly used in scientific com-
puting and discuss their pros and cons.
13.3.1 Multiprocessing
Multiprocessing means concurrent execution of multiple processes using more than one pro-
cessor.
In this context, a process is a chain of instructions (i.e., a program).
Multiprocessing can be carried out on one machine with multiple CPUs or on a collection of
machines connected by a network.
In the latter case, the collection of machines is usually called a cluster.
With multiprocessing, each process has its own memory space, although the physical memory
chip might be shared.
13.3.2 Multithreading
Multithreading is similar to multiprocessing, except that, during execution, the threads all
share the same memory space.
Native Python struggles to implement multithreading due to some legacy design features.
But this is not a restriction for scientific libraries like NumPy and Numba.
Functions imported from these libraries and JIT-compiled code run in low level execution en-
vironments where Python’s legacy restrictions don’t apply.
Multithreading is more lightweight because most system and memory resources are shared by
the threads.
In addition, the fact that multiple threads all access a shared pool of memory is extremely
convenient for numerical programming.
On the other hand, multiprocessing is more flexible and can be distributed across clusters.
For the great majority of what we do in these lectures, multithreading will suffice.
Actually, you have already been using multithreading in your Python code, although you
might not have realized it.
(We are, as usual, assuming that you are running the latest version of Anaconda Python.)
This is because NumPy cleverly implements multithreading in a lot of its compiled code.
Let’s look at some examples to see this in action.
The next piece of code computes the eigenvalues of a large number of randomly generated
matrices.
It takes a few seconds to run.
In [3]: n = 20
m = 1000
for i in range(n):
X = np.random.randn(m, m)
λ = np.linalg.eigvals(X)
Now, let’s look at the output of the htop system monitor on our machine while this code is
running:
Over the last few years, NumPy has managed to push this kind of multithreading out to more
and more operations.
For example, let’s return to a maximization problem discussed previously:
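The code cell itself is omitted from this excerpt; a sketch of the vectorized maximization (the grid size is an assumption and may differ from the original) is:

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

grid = np.linspace(-3, 3, 5000)
x, y = np.meshgrid(grid, grid)
np.max(f(x, y))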
1.27 s ± 60.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If you have a system monitor such as htop (Linux/Mac) or perfmon (Windows), then try run-
ning this and then observing the load on your CPUs.
(You will probably need to bump up the grid size to see large effects.)
At least on our machine, the output shows that the operation is successfully distributed
across multiple threads.
This is one of the reasons why the vectorized code above is fast.
To get some basis for comparison for the last example, let’s try the same thing with Numba.
In fact there is an easy way to do this, since Numba can also be used to create custom ufuncs
with the [@vectorize](http://numba.pydata.org/numba-doc/dev/user/vectorize.html) decora-
tor.
from numba import vectorize

@vectorize
def f_vec(x, y):
return np.cos(x**2 + y**2) / (1 + x**2 + y**2)
Out[6]: 0.9999992797121728
813 ms ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
At least on our machine, the difference in the speed between the Numba version and the vec-
torized NumPy version shown above is not large.
But there’s quite a bit going on here so let’s try to break down what is happening.
Both Numba and NumPy use efficient machine code that’s specialized to these floating point
operations.
However, the code NumPy uses is, in some ways, less efficient.
The reason is that, in NumPy, the operation np.cos(x**2 + y**2) / (1 + x**2 +
y**2) generates several intermediate arrays.
For example, a new array is created when x**2 is calculated.
The same is true when y**2 is calculated, and then x**2 + y**2 and so on.
Numba avoids creating all these intermediate arrays by compiling one function that is special-
ized to the entire operation.
But if this is true, then why isn’t the Numba code faster?
The reason is that NumPy makes up for its disadvantages with implicit multithreading, as
we’ve just discussed.
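The code that produced the output below is not shown in this excerpt; a sketch of the multithreaded version, using Numba's @vectorize with an explicit signature and the parallel target (and reusing the x, y grid from above), is:

from numba import vectorize

@vectorize('float64(float64, float64)', target='parallel')
def f_vec(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

np.max(f_vec(x, y))   # same maximum as before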
Out[8]: 0.9999992797121728
573 ms ± 54.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Now our code runs significantly faster than the NumPy version.
We just saw one approach to parallelization in Numba, using the parallel flag in
@vectorize.
This is neat but, it turns out, not well suited to many problems we consider.
Fortunately, Numba provides another approach to multithreading that will work for us almost
everywhere parallelization is possible.
To illustrate, let’s look first at a simple, single-threaded (i.e., non-parallelized) piece of code.
The code simulates updating the wealth 𝑤ₜ of a household via the rule

    w_{t+1} = R_{t+1} s w_t + y_{t+1}

Here
• 𝑅 is the gross rate of return on assets
• 𝑠 is the savings rate of the household and
• 𝑦 is labor income.
We model both 𝑅 and 𝑦 as independent draws from a lognormal distribution.
Here’s the code:
from numpy.random import randn

@njit
def h(w, r=0.1, s=0.3, v1=0.1, v2=1.0):
"""
Updates household wealth.
"""
# Draw shocks
R = np.exp(v1 * randn()) * (1 + r)
y = np.exp(v2 * randn())
# Update wealth
w = R * s * w + y
return w
T = 100
w = np.empty(T)
w[0] = 5
for t in range(T-1):
w[t+1] = h(w[t])
fig, ax = plt.subplots()
ax.plot(w)
ax.set_xlabel('$t$', fontsize=12)
ax.set_ylabel('$w_{t}$', fontsize=12)
plt.show()
Now let’s suppose that we have a large population of households and we want to know what
median wealth will be.
This is not easy to solve with pencil and paper, so we will use simulation instead.
In particular, we will simulate a large number of households and then calculate median wealth
for this group.
Suppose we are interested in the long-run average of this median over time.
It turns out that, for the specification that we’ve chosen above, we can calculate this by tak-
ing a one-period snapshot of what has happened to median wealth of the group at the end of
a long simulation.
Moreover, provided the simulation period is long enough, initial conditions don’t matter.
• This is due to something called ergodicity, which we will discuss later on.
So, in summary, we are going to simulate 50,000 households by setting each household's initial wealth to 1, simulating forward for 1,000 periods, recording final wealth, and then taking the median of the recorded values.
In [12]: @njit
def compute_long_run_median(w0=1, T=1000, num_reps=50_000):
obs = np.empty(num_reps)
for i in range(num_reps):
w = w0
for t in range(T):
w = h(w)
obs[i] = w
return np.median(obs)
In [13]: %%time
compute_long_run_median()
Out[13]: 1.8251695416638585
To parallelize the outer loop, we replace range with Numba's prange and pass parallel=True to the decorator (this assumes from numba import prange has been executed):

@njit(parallel=True)
def compute_long_run_median_parallel(w0=1, T=1000, num_reps=50_000):
obs = np.empty(num_reps)
for i in prange(num_reps):
w = w0
for t in range(T):
w = h(w)
obs[i] = w
return np.median(obs)
In [15]: %%time
compute_long_run_median_parallel()
Out[15]: 1.8409026225668061
13.5.1 A Warning
Parallelization works well in the outer loop of the last example because the individual tasks
inside the loop are independent of each other.
13.6 Exercises
13.6.1 Exercise 1

Take the Monte Carlo estimate of π from the Numba lecture and parallelize its main loop.
13.7 Solutions
13.7.1 Exercise 1
from numba import njit, prange
from numpy.random import uniform

@njit(parallel=True)
def calculate_pi(n=1_000_000):
count = 0
for i in prange(n):
u, v = uniform(0, 1), uniform(0, 1)
d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
if d < 0.5:
count += 1
area_estimate = count / n
return area_estimate * 4 # dividing by radius**2
Out[17]: 3.139408
Out[18]: 3.142336
By switching parallelization on and off (selecting True or False in the @njit annotation),
we can test the speed gain that multithreading provides on top of JIT compilation.
On our workstation, we find that parallelization increases execution speed by a factor of 2 or
3.
(If you are executing locally, you will get different numbers, depending mainly on the number
of CPUs on your machine.)
Chapter 14
Pandas
14.1 Contents
• Overview 14.2
• Series 14.3
• DataFrames 14.4
• On-Line Data Sources 14.5
• Exercises 14.6
• Solutions 14.7
In addition to what’s in Anaconda, this lecture will need the following libraries:
14.2 Overview
Just as NumPy provides the basic array data type plus core array operations, pandas defines fundamental structures for working with data and endows them with methods that facilitate operations such as
• reading in data
• adjusting indices
• working with dates and time series
• sorting, grouping, re-ordering and general data munging [1]
• dealing with missing values, etc., etc.
More sophisticated statistical functionality is left to other packages, such as statsmodels and
scikit-learn, which are built on top of pandas.
This lecture will provide a basic introduction to pandas.
Throughout the lecture, we will assume that the following imports have taken place
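The import block is not reproduced in this excerpt; judging from the code used later in the chapter, it includes at least:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests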
14.3 Series
Two important data types defined by pandas are Series and DataFrame.
You can think of a Series as a “column” of data, such as a collection of observations on a
single variable.
A DataFrame is an object for storing related columns of data.
Let’s start with Series
In [3]: s = pd.Series(np.random.randn(4), name='daily returns')
        s

Out[3]: 0 0.073715
1 0.362161
2 -1.512099
3 -1.174992
Name: daily returns, dtype: float64
Here you can imagine the indices 0, 1, 2, 3 as indexing four listed companies, and the
values being daily returns on their shares.
Pandas Series are built on top of NumPy arrays and support many similar operations
In [4]: s * 100
Out[4]: 0 7.371542
1 36.216113
2 -151.209885
3 -117.499217
Name: daily returns, dtype: float64
In [5]: np.abs(s)
Out[5]: 0 0.073715
1 0.362161
2 1.512099
3 1.174992
Name: daily returns, dtype: float64
In [6]: s.describe()
Viewed in this way, Series are like fast, efficient Python dictionaries (with the restriction
that the items in the dictionary all have the same type—in this case, floats).
In fact, you can use much of the same syntax as Python dictionaries
In [8]: s['AMZN']
Out[8]: 0.07371541772276423
In [9]: s['AMZN'] = 0
s
In [10]: 'AAPL' in s
Out[10]: True
14.4 DataFrames
While a Series is a single column of data, a DataFrame is several columns, one for each
variable.
In essence, a DataFrame in pandas is analogous to a (highly optimized) Excel spreadsheet.
Thus, it is a powerful tool for representing and analyzing data that are naturally organized
into rows and columns, often with descriptive indexes for individual rows and individual
columns.
Let’s look at an example that reads data from the CSV file pandas/data/test_pwt.csv
that can be downloaded here.
Here’s the content of test_pwt.csv
"country","country isocode","year","POP","XRAT","tcgdp","cc","cg"
"Argentina","ARG","2000","37335.653","0.9995","295072.21869","75.716805379","5.5
"Australia","AUS","2000","19053.186","1.72483","541804.6521","67.759025993","6.7
"India","IND","2000","1006300.297","44.9416","1728144.3748","64.575551328","14.0
"Israel","ISR","2000","6114.57","4.07733","129253.89423","64.436450847","10.2666
"Malawi","MWI","2000","11801.505","59.543808333","5026.2217836","74.707624181","
"South Africa","ZAF","2000","45064.098","6.93983","227242.36949","72.718710427",
"United States","USA","2000","282171.957","1","9898700","72.347054303","6.032453
"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","
Supposing you have this data saved as test_pwt.csv in the present working directory
(type %pwd in Jupyter to see what this is), it can be read in as follows:
In [11]: df = pd.read_csv('https://raw.githubusercontent.com/QuantEcon/lecture-source-py/master/source/_static/lecture_specific/pandas/data/test_pwt.csv')
         type(df)
Out[11]: pandas.core.frame.DataFrame
In [12]: df
cc cg
0 75.716805 5.578804
1 67.759026 6.720098
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954
5 72.718710 5.726546
6 72.347054 6.032454
7 78.978740 5.108068
We can select particular rows using standard Python array slicing notation
In [13]: df[2:5]
cc cg
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954
To select columns, we can pass a list containing the names of the desired columns represented
as strings
In [14]: df[['country', 'tcgdp']]

3 Israel 1.292539e+05
4 Malawi 5.026222e+03
5 South Africa 2.272424e+05
6 United States 9.898700e+06
7 Uruguay 2.525596e+04
To select both rows and columns using integers, the iloc attribute should be used with the
format .iloc[rows, columns]
To select rows and columns using a mixture of integers and labels, the loc attribute can be
used in a similar way
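A minimal sketch of both selection styles (row and column choices here are just for illustration) is:

df.iloc[2:5, 0:4]                               # rows 2..4, first four columns, by integer position
df.loc[df.index[2:5], ['country', 'tcgdp']]     # same rows, two columns selected by label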
Let’s imagine that we’re only interested in population (POP) and total GDP (tcgdp).
One way to strip the data frame df down to only these variables is to overwrite the
dataframe using the selection method described above
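A one-line sketch of that overwrite is:

df = df[['country', 'POP', 'tcgdp']]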
Here the index 0, 1,..., 7 is redundant because we can use the country names as an in-
dex.
To do this, we set the index to be the country variable in the dataframe
In [18]: df = df.set_index('country')
df
Next, we’re going to add a column showing real GDP per capita, multiplying by 1,000,000 as
we go because total GDP is in millions
One of the nice things about pandas DataFrame and Series objects is that they have
methods for plotting and visualization that work through Matplotlib.
For example, we can easily generate a bar plot of GDP per capita
At the moment the data frame is ordered alphabetically on the countries—let’s change it to
GDP per capita
14.5 On-Line Data Sources

Suppose that we are interested in the US unemployment rate; monthly data can be downloaded directly from FRED in CSV form at
https://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv
One option is to use requests, a standard Python library for requesting data over the Inter-
net.
To begin, try the following code on your computer
In [25]: r = requests.get('http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv')

If the call succeeds there is no output. If you do get an error, the two most likely causes are:
1. You are not connected to the Internet — hopefully, this isn't the case.
2. Your machine is accessing the Internet through a proxy server, and Python isn’t aware
of this.
url = 'http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv'
source = requests.get(url).content.decode().split("\n")
source[0]
Out[26]: 'DATE,VALUE\r'
In [27]: source[1]
Out[27]: '1948-01-01,3.4\r'
In [28]: source[2]
Out[28]: '1948-02-01,3.8\r'
We could now write some additional code to parse this text and store it as an array.
But this is unnecessary — pandas’ read_csv function can handle the task for us.
We use parse_dates=True so that pandas recognizes our dates column, allowing for simple
date filtering
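The read_csv call is not shown in this excerpt; a sketch consistent with the usage below is:

data = pd.read_csv(url, index_col=0, parse_dates=True)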
The data has been read into a pandas DataFrame called data that we can now manipulate in
the usual way
In [30]: type(data)
Out[30]: pandas.core.frame.DataFrame
Out[31]: VALUE
DATE
1948-01-01 3.4
1948-02-01 3.8
1948-03-01 4.0
1948-04-01 3.9
1948-05-01 3.5
In [32]: pd.set_option('precision', 1)
data.describe() # Your output might differ slightly
Out[32]: VALUE
count 868.0
mean 5.7
std 1.7
min 2.5
25% 4.5
50% 5.5
75% 6.8
max 14.7
We can also plot the unemployment rate from 2006 to 2012 as follows
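The plotting cell is omitted here; a sketch (the title string is an assumption) is:

ax = data.loc['2006':'2012'].plot(title='US Unemployment Rate', legend=False)
ax.set_xlabel('year', fontsize=12)
ax.set_ylabel('%', fontsize=12)
plt.show()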
The maker of pandas has also authored a library called pandas_datareader that gives pro-
grammatic access to many data sources straight from the Jupyter notebook.
While some sources require an access key, many of the most important (e.g., FRED, OECD,
EUROSTAT and the World Bank) are free to use.
For now let’s work through one example of downloading and plotting data — this time from
the World Bank.
The World Bank collects and organizes data on a huge range of indicators.
For example, here’s some data on government debt as a ratio to GDP.
The next code example fetches the data for you and plots time series for the US and Aus-
tralia
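Only the tail of that code example appears below; a sketch of the fetching step, using pandas_datareader's World Bank interface (the indicator code for "central government debt, total (% of GDP)" and the date range are assumptions), is:

from pandas_datareader import wb

govt_debt = wb.download(indicator='GC.DOD.TOTL.GD.ZS',
                        country=['US', 'AU'],
                        start=2005, end=2016)
ax = govt_debt.unstack(level=0).plot(lw=2)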
ax.set_xlabel('year', fontsize=12)
plt.title("Government Debt to GDP (%)")
plt.show()
The documentation provides more details on how to access various data sources.
14.6 Exercises
14.6.1 Exercise 1
Write a program to calculate the percentage price change over 2019 for the following shares:
'GOOG': 'Google',
'SNE': 'Sony',
'PTR': 'PetroChina'}
return ticker
ticker = read_data(ticker_list)
Complete the program to plot the result as a bar graph like this one:
14.6.2 Exercise 2
Using the method read_data introduced in Exercise 1, write a program to obtain year-on-
year percentage change for the following indices:
Complete the program to show summary statistics and plot the result as a time series graph
like this one:
14.7 Solutions
14.7.1 Exercise 1
There are a few ways to approach this problem using Pandas to calculate the percentage
change.
First, you can extract the data and perform the calculation such as:
PTR -17.4
dtype: float64
Alternatively you can use an inbuilt method pct_change and configure it to perform the
correct calculation using periods argument.
In [41]: price_change.sort_values(inplace=True)
price_change = price_change.rename(index=ticker_list)
fig, ax = plt.subplots(figsize=(10,8))
ax.set_xlabel('stock', fontsize=12)
ax.set_ylabel('percentage change in price', fontsize=12)
price_change.plot(kind='bar', ax=ax)
plt.show()
14.7.2 Exercise 2
Following the work you did in Exercise 1, you can query the data using read_data by up-
dating the start and end dates accordingly.
Then, extract the first and last set of prices per year as DataFrames and calculate the yearly
returns such as:
yearly_returns
Next, you can obtain summary statistics by using the method describe.
In [44]: yearly_returns.describe()
plt.tight_layout()
Footnotes
[1] Wikipedia defines munging as cleaning data from one raw form into a structured, purged
one.
Part III
Chapter 15
Writing Good Code
15.1 Contents
• Overview 15.2
• An Example of Poor Code 15.3
• Good Coding Practice 15.4
• Revisiting the Example 15.5
• Exercises 15.6
• Solutions 15.7
15.2 Overview
When computer programs are small, poorly written code is not overly costly.
But more data, more sophisticated models, and more computer power are enabling us to take
on more challenging problems that involve writing longer programs.
For such programs, investment in good coding practices will pay high returns.
The main payoffs are higher productivity and faster code.
In this lecture, we review some elements of good coding practice.
We also touch on modern developments in scientific computing — such as just in time compi-
lation — and how they affect good program design.
k_{t+1} = s k_t^α + (1 − δ) k_t

Here
• 𝑘𝑡 is capital at time 𝑡 and
• 𝑠, 𝛼, 𝛿 are parameters (savings, a productivity parameter and depreciation)
1. sets 𝑘0 = 1,
2. iterates using the rule above to produce a sequence 𝑘0, 𝑘1, …, 𝑘49, and
3. plots the sequence, repeating the exercise in three subfigures for different parameter values (a sketch of the setup appears below).
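Only the plotting loops of the program are reproduced below; the setup that precedes them would be roughly the following sketch (the specific parameter values are assumptions, and before each block the parameter being varied is reassigned to a tuple of three values while the others stay scalar):

import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(3, 1, figsize=(8, 16))
k = np.empty(50)

α = (0.25, 0.33, 0.45)   # varied in the first block
s = 0.2
δ = 0.1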
for j in range(3):
k[0] = 1
for t in range(49):
k[t+1] = s * k[t]**α[j] + (1 - δ) * k[t]
        axes[0].plot(k, 'o-', label=rf"$\alpha = {α[j]},\; s = {s},\; \delta={δ}$")
axes[0].grid(lw=0.2)
axes[0].set_ylim(0, 18)
axes[0].set_xlabel('time')
axes[0].set_ylabel('capital')
axes[0].legend(loc='upper left', frameon=True)
for j in range(3):
k[0] = 1
for t in range(49):
k[t+1] = s[j] * k[t]**α + (1 - δ) * k[t]
        axes[1].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s[j]},\; \delta={δ}$")
axes[1].grid(lw=0.2)
axes[1].set_xlabel('time')
axes[1].set_ylabel('capital')
axes[1].set_ylim(0, 18)
axes[1].legend(loc='upper left', frameon=True)
for j in range(3):
k[0] = 1
for t in range(49):
k[t+1] = s * k[t]**α + (1 - δ[j]) * k[t]
        axes[2].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta={δ[j]}$")
axes[2].set_ylim(0, 18)
axes[2].set_xlabel('time')
axes[2].set_ylabel('capital')
axes[2].grid(lw=0.2)
axes[2].legend(loc='upper left', frameon=True)
plt.show()
There are usually many different ways to write a program that accomplishes a given task.
For small programs, like the one above, the way you write code doesn’t matter too much.
But if you are ambitious and want to produce useful things, you’ll write medium to large pro-
grams too.
In those settings, coding style matters a great deal.
Fortunately, lots of smart people have thought about the best way to write code.
Here are some basic precepts.
If you look at the code above, you’ll see numbers like 50 and 49 and 3 scattered through the
code.
These kinds of numeric literals in the body of your code are sometimes called “magic num-
bers”.
This is not a compliment.
While numeric literals are not all evil, the numbers shown in the program above should cer-
tainly be replaced by named constants.
For example, the code above could declare the variable time_series_length = 50.
Then in the loops, 49 should be replaced by time_series_length - 1.
The advantages are:
• the meaning is much clearer throughout
• to alter the time series length, you only need to change one value
More importantly, repeating the same logic in different places means that eventually one of
them will likely be wrong.
If you want to know more, read the excellent summary found on this page.
We’ll talk about how to avoid repetition below.
Sure, global variables (i.e., names assigned to values outside of any function or class) are con-
venient.
Rookie programmers typically use global variables with abandon — as we once did ourselves.
But global variables are dangerous, especially in medium to large size programs, since
• they can affect what happens in any part of your program
• they can be changed by any function
This makes it much harder to be certain about what some small part of a given piece of code
actually commands.
Here’s a useful discussion on the topic.
While the odd global in small scripts is no big deal, we recommend that you teach yourself to
avoid them.
(We’ll discuss how just below).
JIT Compilation
For scientific computing, there is another good reason to avoid global variables.
As we’ve seen in previous lectures, JIT compilation can generate excellent performance for
scripting languages like Python.
But the task of the compiler used for JIT compilation becomes harder when global variables
are present.
Put differently, the type inference required for JIT compilation is safer and more effective
when variables are sandboxed inside a function.
Fortunately, we can easily avoid the evils of global variables and WET code.
• WET stands for “we enjoy typing” and is the opposite of DRY.
We can do this by making frequent use of functions or classes.
In fact, functions and classes are designed specifically to help us avoid shaming ourselves by
repeating code or excessive use of global variables.
Both can be useful, and in fact they work well with each other.
Here’s some code that reproduces the plot above with better coding style.
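Only the tail of the refactored program survives in this excerpt. A sketch of the part that precedes it, under the assumption that the simulation loop is wrapped in a function taking an axis and lists of parameter values (names and defaults are assumptions), is:

from itertools import product

def plot_path(ax, αs, s_vals, δs, series_length=50):
    """
    Add a time series plot to the axes ax for every (α, s, δ) combination.
    """
    k = np.empty(series_length)
    for (α, s, δ) in product(αs, s_vals, δs):
        k[0] = 1
        for t in range(series_length - 1):
            k[t+1] = s * k[t]**α + (1 - δ) * k[t]
        ax.plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta = {δ}$")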
ax.set_xlabel('time')
ax.set_ylabel('capital')
ax.set_ylim(0, 18)
ax.legend(loc='upper left', frameon=True)
plt.show()
15.6 Exercises
15.6.1 Exercise 1
𝑞𝑠(𝑝) = exp(𝛼𝑝) − 𝛽
𝑞𝑑(𝑝) = 𝛾𝑝^(−𝛿)
Setting 𝑞𝑠(𝑝) = 𝑞𝑑(𝑝) yields the equilibrium price 𝑝*. From this we get the equilibrium quantity 𝑞* = 𝑞𝑠(𝑝*).
The parameter values will be
• 𝛼 = 0.1
• 𝛽=1
• 𝛾=1
• 𝛿=1
import numpy as np
from scipy.optimize import brentq
import matplotlib.pyplot as plt

# Compute equilibrium
def h(p):
    return p**(-1) - (np.exp(0.1 * p) - 1)  # demand - supply

p_star = brentq(h, 2, 4)
q_star = np.exp(0.1 * p_star) - 1

# Plot supply and demand around the equilibrium
grid = np.linspace(2, 4, 100)
fig, ax = plt.subplots()
qs = np.exp(0.1 * grid) - 1
qd = grid**(-1)
ax.plot(grid, qd, 'b-', lw=2, label='demand')
ax.plot(grid, qs, 'g-', lw=2, label='supply')
ax.set_xlabel('price')
ax.set_ylabel('quantity')
ax.legend(loc='upper center')
plt.show()
# Now consider a shift in demand to qd(p) = 1.25 * p**(-1)
def h(p):
    return 1.25 * p**(-1) - (np.exp(0.1 * p) - 1)  # demand - supply

p_star = brentq(h, 2, 4)

p_grid = np.linspace(2, 4, 100)
fig, ax = plt.subplots()
qs = np.exp(0.1 * p_grid) - 1
qd = 1.25 * p_grid**(-1)
ax.plot(p_grid, qd, 'b-', lw=2, label='demand')
ax.plot(p_grid, qs, 'g-', lw=2, label='supply')
ax.set_xlabel('price')
ax.set_ylabel('quantity')
ax.legend(loc='upper center')
plt.show()
Now we might consider supply shifts, but you already get the idea that there’s a lot of re-
peated code here.
Refactor and improve clarity in the code above using the principles discussed in this lecture.
15.7 Solutions
15.7.1 Exercise 1
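The top of the solution class is not reproduced in this excerpt; a sketch consistent with the methods shown below (parameter defaults taken from the exercise statement) is:

class Equilibrium:

    def __init__(self, α=0.1, β=1, γ=1, δ=1):
        self.α, self.β, self.γ, self.δ = α, β, γ, δ

    def qs(self, p):
        "Supply curve."
        return np.exp(self.α * p) - self.β

    def qd(self, p):
        "Demand curve."
        return self.γ * p**(-self.δ)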
    def compute_equilibrium(self):
        def h(p):
            return self.qd(p) - self.qs(p)
        p_star = brentq(h, 2, 4)
        q_star = np.exp(self.α * p_star) - self.β
        print(f'Equilibrium price is {p_star: .2f}')
        print(f'Equilibrium quantity is {q_star: .2f}')

    def plot_equilibrium(self):
        # Now plot
        grid = np.linspace(2, 4, 100)
        fig, ax = plt.subplots()
        ax.plot(grid, self.qd(grid), 'b-', lw=2, label='demand')
        ax.plot(grid, self.qs(grid), 'g-', lw=2, label='supply')
        ax.set_xlabel('price')
        ax.set_ylabel('quantity')
        ax.legend(loc='upper center')
        plt.show()
In [8]: eq = Equilibrium()
In [9]: eq.compute_equilibrium()
In [10]: eq.plot_equilibrium()
One of the nice things about our refactored code is that, when we change parameters, we
don’t need to repeat ourselves:
In [11]: eq.γ = 1.25

In [12]: eq.compute_equilibrium()
In [13]: eq.plot_equilibrium()
Chapter 16
More Language Features
16.1 Contents
• Overview 16.2
• Iterables and Iterators 16.3
• Names and Name Resolution 16.4
• Handling Errors 16.5
• Decorators and Descriptors 16.6
• Generators 16.7
• Recursive Function Calls 16.8
• Exercises 16.9
• Solutions 16.10
16.2 Overview
With this last lecture, our advice is to skip it on first pass, unless you have a burning de-
sire to read it.
It’s here
1. as a reference, so we can link back to it when required, and
2. for those who have worked through a number of applications, and now want to learn
more about the Python language
A variety of topics are treated in the lecture, including generators, exceptions and descriptors.
16.3.1 Iterators
Writing us_cities.txt
In [2]: f = open('us_cities.txt')
f.__next__()
In [3]: f.__next__()
We see that file objects do indeed have a __next__ method, and that calling this method
returns the next line in the file.
The next method can also be accessed via the builtin function next(), which directly calls
this method
In [4]: next(f)
In [6]: next(e)
Writing test_table.csv
from csv import reader

f = open('test_table.csv', 'r')
nikkei_data = reader(f)
next(nikkei_data)
In [9]: next(nikkei_data)
All iterators can be placed to the right of the in keyword in for loop statements.
In fact this is how the for loop works: If we write
for x in iterator:
<code block>
f = open('somefile.txt', 'r')
for line in f:
# do something
16.3.3 Iterables
You already know that we can put a Python list to the right of in in a for loop
In [10]: for i in ['spam', 'eggs']:
             print(i)

spam
eggs

In [11]: x = ['foo', 'bar']
         type(x)

Out[11]: list
In [12]: next(x)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-92de4e9f6b1e> in <module>
----> 1 next(x)

TypeError: 'list' object is not an iterator
Out[13]: list
In [14]: y = iter(x)
type(y)
Out[14]: list_iterator
In [15]: next(y)
Out[15]: 'foo'
In [16]: next(y)
Out[16]: 'bar'
In [17]: next(y)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-17-81b9d2f0f16a> in <module>
----> 1 next(y)
StopIteration:
In [18]: iter(42)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-ef50b48e4398> in <module>
----> 1 iter(42)

TypeError: 'int' object is not iterable
Some built-in functions that act on sequences also work with iterables
• max(), min(), sum(), all(), any()
For example
Out[19]: 10
In [20]: y = iter(x)
type(y)
Out[20]: list_iterator
In [21]: max(y)
Out[21]: 10
One thing to remember about iterators is that they are depleted by use
Out[22]: 10
In [23]: max(y)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-062424e6ec08> in <module>
----> 1 max(y)

ValueError: max() arg is an empty sequence
In [24]: x = 42
We now know that when this statement is executed, Python creates an object of type int in
your computer’s memory, containing
• the value 42
• some associated attributes
But what is x itself?
In Python, x is called a name, and the statement x = 42 binds the name x to the integer
object we have just discussed.
Under the hood, this process of binding names to objects is implemented as a dictionary—
more about this in a moment.
There is no problem binding two or more names to the one object, regardless of what that
object is
In [25]: def f(x):
             print(x)

         g = f
         id(g) == id(f)
Out[25]: True
In [26]: g('test')
test
In the first step, a function object is created, and the name f is bound to it.
After binding the name g to the same object, we can use it anywhere we would use f.
What happens when the number of names bound to an object goes to zero?
Here’s an example of this situation, where the name x is first bound to one object and then
rebound to another
In [27]: x = 'foo'
id(x)
Out[27]: 139893190370168
16.4.2 Namespaces
In [29]: x = 42
%%file math2.py
pi = 'foobar'

Writing math2.py
Next let’s import the math module from the standard library
In [33]: math.pi
Out[33]: 3.141592653589793
In [34]: math2.pi
Out[34]: 'foobar'
These two different bindings of pi exist in different namespaces, each one implemented as a
dictionary.
We can look at the dictionary directly, using module_name.__dict__
math.__dict__.items()
(a long dictionary listing the math module's attributes and functions; output abbreviated)
math2.__dict__.items()
(a long dictionary listing math2's attributes, including module metadata and builtins; output abbreviated)
As you know, we access elements of the namespace using the dotted attribute notation
In [37]: math.pi
Out[37]: 3.141592653589793
In [38]: math.pi == math.__dict__['pi']

Out[38]: True
In [39]: vars(math).items()
(the same dictionary of the math module's attributes; output abbreviated)
In [40]: dir(math)[0:10]
Out[40]: ['__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'acos',
'acosh',
'asin',
'asinh']
In [41]: print(math.__doc__)
In [42]: math.__name__
Out[42]: 'math'
In [43]: print(__name__)
__main__
When we run a script using IPython’s run command, the contents of the file are executed as
part of __main__ too.
To see this, let’s create a file mod.py that prints its own __name__ attribute
%%file mod.py
print(__name__)

Writing mod.py
Now let's run it in two different ways: first imported as a module, then executed directly as a script with %run

import mod

mod

%run mod.py

__main__
In the first case, the code runs inside the namespace of the module mod, so __name__ equals mod. In the second case, the code is executed as part of __main__, so __name__ is equal to __main__.
To see the contents of the namespace of __main__ we use vars() rather than vars(__main__).
If you do this in IPython, you will see a whole lot of variables that IPython needs, and has
initialized when you started up your session.
If you prefer to see only the variables you have initialized, use whos
In [47]: x = 2
y = 3
import numpy as np
%whos
We are now working in the module __main__, and hence the namespace for __main__ is
the global namespace.
Next, we import a module called amodule
import amodule
At this point, the interpreter creates a namespace for the module amodule and starts exe-
cuting commands in the module.
While this occurs, the namespace amodule.__dict__ is the global namespace.
Once execution of the module finishes, the interpreter returns to the module from where the
import statement was made.
In this case it’s __main__, so the namespace of __main__ again becomes the global names-
pace.
Important fact: When we call a function, the interpreter creates a local namespace for that
function, and registers the variables in that namespace.
The reason for this will be explained in just a moment.
Variables in the local namespace are called local variables.
After the function returns, the namespace is deallocated and lost.
While the function is executing, we can view the contents of the local namespace with
locals().
For example, consider
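The definition of f is assumed here; given the printed namespace and the return value below, it must bind a local variable a to 2 and return a * x:

def f(x):
    a = 2
    print(locals())   # display the function's local namespace
    return a * x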
In [49]: f(1)
{'x': 1, 'a': 2}
Out[49]: 2
We have been using various built-in functions, such as max(), dir(), str(), list(),
len(), range(), type(), etc.
How does access to these names work?
In [50]: dir()[0:10]
Out[50]: ['In', 'Out', '_', '_11', '_13', '_14', '_15', '_16', '_19', '_2']
In [51]: dir(__builtins__)[0:10]
Out[51]: ['ArithmeticError',
'AssertionError',
'AttributeError',
'BaseException',
'BlockingIOError',
'BrokenPipeError',
'BufferError',
'BytesWarning',
'ChildProcessError',
'ConnectionAbortedError']
In [52]: __builtins__.max
Out[52]: <built-in function max>
But __builtins__ is special, because we can always access them directly as well
In [53]: max
Out[53]: <built-in function max>

In [54]: __builtins__.max == max
Out[54]: True
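As an illustration of enclosing scope, consider a sketch like the following, where g is defined inside f and reads a from the enclosing function's namespace (the particular numbers are illustrative):

def f():
    a = 2
    def g():
        b = 4
        print(a * b)   # a comes from the enclosing namespace of f
    g()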
Here f is the enclosing function for g, and each function gets its own namespace.
Now we can give the rule for how namespace resolution works.
The order in which the interpreter searches for names is
1. the local namespace (if it exists)
2. the hierarchy of enclosing namespaces (if they exist)
3. the global namespace
4. the builtin namespace
If the name is not in any of these namespaces, the interpreter raises a NameError.
This is called the LEGB rule (local, enclosing, global, builtin).
Here's an example that helps to illustrate how this works.
Consider a script test.py that looks as follows
%%file test.py
def g(x):
    a = 1
    x = x + a
    return x

a = 0
y = g(10)
print("a = ", a, "y = ", y)
Writing test.py
Running it with IPython's run command gives
a = 0 y = 11
In [58]: x
Out[58]: 2
First,
• The global namespace {} is created.
• The function object is created, and g is bound to it within the global namespace.
• The name a is bound to 0, again in the global namespace.
Next g is called via y = g(10), leading to the following sequence of actions
• The local namespace for the function is created.
• Local names x and a are bound, so that the local namespace becomes {'x': 10,
'a': 1}.
• Statement x = x + a uses the local a and local x to compute x + a, and binds local
name x to the result.
• This value is returned, and y is bound to it in the global namespace.
• Local x and a are discarded (and the local namespace is deallocated).
Note that the global a was not affected by the local a.
This is a good time to say a little more about mutable vs immutable objects.
Consider the code segment

def f(x):
    x = x + 1
    return x

x = 1
print(f(x), x)
2 1
We now understand what will happen here: The code prints 2 as the value of f(x) and 1 as
the value of x.
First f and x are registered in the global namespace.
The call f(x) creates a local namespace and adds x to it, bound to 1.
Next, this local x is rebound to the new integer object 2, and this value is returned.
None of this affects the global x.
However, it’s a different story when we use a mutable data type such as a list
def f(x):
    x[0] = x[0] + 1
    return x

x = [1]
print(f(x), x)

[2] [2]

This time the global x is affected as well: the call modifies the list object in place, and since the global name x is bound to that same mutable object, both printed values show the updated list [2].
16.5 Handling Errors
Sometimes it's possible to anticipate errors as we're writing code.
For example, the unbiased sample variance of a sample y_1, ..., y_n is

$$ s^2 := \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad \bar{y} = \text{sample mean} $$

This formula only makes sense when the sample size n is larger than one.
16.5.1 Assertions
A relatively easy way to handle such checks is with the assert keyword.
For example, pretend for a moment that the np.var function doesn't exist and we need to write our own.
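Here's such a function, guarded by an assert statement (the body matches the definition visible in the traceback below; note that it expects y to behave like a NumPy array):

def var(y):
    n = len(y)
    assert n > 1, 'Sample size must be greater than one.'
    return np.sum((y - y.mean())**2) / float(n-1)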
If we run this with an array of length one, the program will terminate and print our error
message
In [62]: var([1])
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-62-8419b6ab38ec> in <module>
----> 1 var([1])

<ipython-input-61-e6ffb16a7098> in var(y)
      1 def var(y):
      2     n = len(y)
----> 3     assert n > 1, 'Sample size must be greater than one.'
      4     return np.sum((y - y.mean())**2) / float(n-1)

AssertionError: Sample size must be greater than one.
The approach used above is a bit limited, because it always leads to termination.
Sometimes we can handle errors more gracefully, by treating special cases.
Let’s look at how this is done.
Exceptions
Here's an example of a common error type: a syntax error
In [63]: def f:
  File "<ipython-input-63-...>", line 1
    def f:
         ^
SyntaxError: invalid syntax
Since illegal syntax cannot be executed, a syntax error terminates execution of the program.
Here’s a different kind of error, unrelated to syntax
In [64]: 1 / 0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-64-bc757c3fda29> in <module>
----> 1 1 / 0

ZeroDivisionError: division by zero
Here’s another
In [65]: x1 = y1
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-65-a7b8d65e9e45> in <module>
----> 1 x1 = y1

NameError: name 'y1' is not defined
And another
In [66]: 'foo' + 6
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-66-216809d6e6fe> in <module>
----> 1 'foo' + 6

TypeError: can only concatenate str (not "int") to str
And another
In [67]: X = []
x = X[0]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-67-082a18d7a0aa> in <module>
      1 X = []
----> 2 x = X[0]

IndexError: list index out of range
Catching Exceptions
We can catch and deal with exceptions using try – except blocks.
Here’s a simple example
def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print('Error: division by zero. Returned None')
    return None
In [69]: f(2)
Out[69]: 0.5
In [70]: f(0)
Error: division by zero. Returned None

In [71]: f(0.0)
Error: division by zero. Returned None
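The next calls use a version of f that also handles bad argument types, such as the string passed in below. A minimal sketch (the exact definition and printed messages are assumptions):

def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print('Error: division by zero. Returned None')
    except TypeError:
        print(f'Error: {x} cannot be interpreted as a number. Returned None')
    return None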
In [73]: f(2)
Out[73]: 0.5
In [74]: f(0)
In [75]: f('foo')
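Several exception types can also be caught in a single except clause; a sketch:

def f(x):
    try:
        return 1.0 / x
    except (TypeError, ZeroDivisionError):
        print(f'Error: {x} cannot be interpreted as a number. Returned None')
    return None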
In [77]: f(2)
Out[77]: 0.5
In [78]: f(0)
In [79]: f('foo')
16.6 Decorators and Descriptors
Let's look at some special syntax elements that are routinely used by Python developers.
You might not need the following concepts immediately, but you will see them in other peo-
ple’s code.
Hence you need to understand them at some stage of your Python education.
16.6.1 Decorators
Decorators are a bit of syntactic sugar that, while easily avoided, have turned out to be popu-
lar.
It’s very easy to say what decorators do.
On the other hand it takes a bit of effort to explain why you might use them.
An Example
Suppose we are working on a program that uses the two functions below in the calculations that follow

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)
Now suppose there’s a problem: occasionally negative numbers get fed to f and g in the cal-
culations that follow.
If you try it, you’ll see that when these functions are called with negative numbers they re-
turn a NumPy object called nan .
This stands for “not a number” (and indicates that you are trying to evaluate a mathematical
function at a point where it is not defined).
Perhaps this isn’t what we want, because it causes other problems that are hard to pick up
later on.
Suppose that instead we want the program to terminate whenever this happens, with a sensi-
ble error message.
This change is easy enough to implement
def f(x):
    assert x >= 0, "Argument must be nonnegative"
    return np.log(np.log(x))

def g(x):
    assert x >= 0, "Argument must be nonnegative"
    return np.sqrt(42 * x)
Notice however that there is some repetition here, in the form of two identical lines of code.
Repetition makes our code longer and harder to maintain, and hence is something we try
hard to avoid.
Here it’s not a big deal, but imagine now that instead of just f and g, we have 20 such func-
tions that we need to modify in exactly the same way.
This means we need to repeat the test logic (i.e., the assert line testing nonnegativity) 20
times.
The situation is still worse if the test logic is longer and more complicated.
In this kind of scenario the following approach would be neater
def check_nonneg(func):
    def safe_function(x):
        assert x >= 0, "Argument must be nonnegative"
        return func(x)
    return safe_function

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)
# Program continues with various calculations using f and g
Enter Decorators
The decorator syntax gives us a tidier way to write the previous step. In particular, we can replace the lines

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)
with
In [86]: @check_nonneg
def f(x):
    return np.log(np.log(x))

@check_nonneg
def g(x):
    return np.sqrt(42 * x)
16.6.2 Descriptors
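The running example in this section is a Car class that stores the same distance twice, as miles and as kms. A minimal sketch of such a class (the exact original definition is assumed; the 1000-mile default is pinned down by the output below):

class Car:

    def __init__(self, miles=1000):
        self.miles = miles
        self.kms = miles * 1.61

    # other functionality, details omitted

car = Car()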
One potential problem we might have here is that a user alters one of these variables but not
the other
In [88]: car.miles
Out[88]: 1000

In [89]: car.kms
Out[89]: 1610.0

In [90]: car.miles = 6000
car.kms
Out[90]: 1610.0
In the last two lines we see that miles and kms are out of sync.
What we really want is some mechanism whereby each time a user sets one of these variables,
the other is automatically updated.
A Solution
Here's one solution, which wraps getter and setter methods up as properties

class Car:

    def get_miles(self):
        return self._miles

    def get_kms(self):
        return self._kms
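    # The methods below are a sketch of the rest of the class, inferred from
    # the "How it Works" discussion: an __init__, the two setter methods,
    # and the property() calls that tie getters and setters together.

    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    def set_miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    def set_kms(self, value):
        self._kms = value
        self._miles = value / 1.61

    miles = property(get_miles, set_miles)
    kms = property(get_kms, set_kms)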
In [92]: car = Car()
car.miles
Out[92]: 1000

In [93]: car.miles = 6000
car.kms
Out[93]: 9660.0
How it Works
The names _miles and _kms are arbitrary names we are using to store the values of the
variables.
The objects miles and kms are properties, a common kind of descriptor.
The methods get_miles, set_miles, get_kms and set_kms define what happens when
you get (i.e. access) or set (bind) these variables
• So-called “getter” and “setter” methods.
The builtin Python function property takes getter and setter methods and creates a prop-
erty.
For example, after car is created as an instance of Car, the object car.miles is a property.
Being a property, when we set its value via car.miles = 6000 its setter method is trig-
gered — in this case set_miles.
These days it's very common to see the property function used via a decorator.
Here’s another version of our Car class that works as before but now uses decorators to set
up the properties
class Car:

    def __init__(self, miles=1000):   # constructor assumed to match the earlier version
        self._miles = miles
        self._kms = miles * 1.61

    @property
    def miles(self):
        return self._miles

    @property
    def kms(self):
        return self._kms

    @miles.setter
    def miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    @kms.setter
    def kms(self, value):
        self._kms = value
        self._miles = value / 1.61
16.7 Generators
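We will look at two ways of building generators: generator expressions and generator functions. A generator expression uses the same syntax as a list comprehension, but with round brackets. The input cells that produced the outputs below are sketched here (the variable name singular is an assumption; plural and the words are taken from the outputs):

singular = ('dog', 'cat', 'bird')                  # a tuple
plural = [string + 's' for string in singular]     # list comprehension: builds a list
plural = (string + 's' for string in singular)     # generator expression: builds a generator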
Out[95]: tuple
In [97]: type(plural)
Out[97]: list
Out[98]: generator
In [99]: next(plural)
Out[99]: 'dogs'
In [100]: next(plural)
Out[100]: 'cats'
In [101]: next(plural)
Out[101]: 'birds'
In [102]: sum((x * x for x in range(10)))
Out[102]: 285
The function sum() calls next() to get the items, adds successive terms.
In fact, we can omit the outer brackets in this case
In [103]: sum(x * x for x in range(10))
Out[103]: 285
The most flexible way to create generator objects is to use generator functions.
Let’s look at some examples.
Example 1
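Here's a very simple example of a generator function (the same definition is displayed again a little further on):

def f():
    yield 'start'
    yield 'middle'
    yield 'end'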
It looks like a function, but uses a keyword yield that we haven’t met before.
Let’s see how it works after running this code
In [105]: type(f)
Out[105]: function

In [106]: gen = f()
In [107]: next(gen)
Out[107]: 'start'
In [108]: next(gen)
Out[108]: 'middle'
In [109]: next(gen)
Out[109]: 'end'
In [110]: next(gen)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-110-6e72e47198db> in <module>
----> 1 next(gen)

StopIteration:
The generator function f() is used to create generator objects (in this case gen).
Generators are iterators, because they support a next method.
The first call to next(gen)
• Executes code in the body of f() until it meets a yield statement.
• Returns that value to the caller of next(gen).
The second call to next(gen) starts executing from the next line
In [ ]: def f():
    yield 'start'
    yield 'middle'  # This line!
    yield 'end'
Example 2
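The next example yields a growing sequence of values until a bound is passed (again, the same definition is repeated further below):

def g(x):
    while x < 100:
        yield x
        x = x * x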
In [112]: g
Out[112]: <function __main__.g(x)>

In [113]: gen = g(2)
type(gen)
Out[113]: generator
In [114]: next(gen)
Out[114]: 2
In [115]: next(gen)
Out[115]: 4
In [116]: next(gen)
Out[116]: 16
In [117]: next(gen)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-117-6e72e47198db> in <module>
----> 1 next(gen)

StopIteration:
In [ ]: def g(x):
    while x < 100:
        yield x
        x = x * x  # execution continues from here
Generators also help with memory: suppose we want to know how many of n = 10000000 uniform draws on (0, 1) fall below 0.5. One approach is to build the full list of draws and sum it
In [119]: import random
n = 10000000
draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
sum(draws)
Out[119]: 4998994
But we are creating two huge lists here, range(n) and draws.
This uses lots of memory and is very slow.
If we make n even bigger then this happens
In [120]: n = 100000000
draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
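(On most machines this either exhausts memory or takes a very long time.) We can avoid these problems by generating the draws one at a time. A minimal sketch of such a generator function, which is what f refers to in the next cell:

def f(n):
    i = 1
    while i <= n:
        yield random.uniform(0, 1) < 0.5   # one Boolean draw at a time
        i += 1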
In [122]: n = 10000000
draws = f(n)
draws
Out[122]: <generator object f at 0x...>
In [123]: sum(draws)
Out[123]: 5001057
In summary, iterables
• avoid the need to create big lists/tuples, and
• provide a uniform interface to iteration that can be used transparently in for loops
16.8 Recursive Function Calls
This is not something that you will use every day, but it is still useful — you should learn it at some stage.
Basically, a recursive function is a function that calls itself.
For example, consider the problem of computing x_t for some t when, say,

$$ x_{t+1} = 2 x_t, \qquad x_0 = 1 $$

so that x_t = 2^t.
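An iterative solution and a recursive solution might look like this (a minimal sketch; the recurrence above and the function names are illustrative):

def x_loop(t):
    "Iterative solution: start from x_0 = 1 and double t times."
    x = 1
    for i in range(t):
        x = 2 * x
    return x

def x(t):
    "Recursive solution: the function calls itself on t - 1."
    if t == 0:
        return 1
    else:
        return 2 * x(t-1)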
What happens here is that each successive call uses its own frame in the stack
• a frame is where the local variables of a given function call are held
• the stack is memory used to process function calls
– a First In Last Out (FILO) queue
This example is somewhat contrived, since the first (iterative) solution would usually be pre-
ferred to the recursive solution.
We’ll meet less contrived applications of recursion later on.
16.9 Exercises
16.9.1 Exercise 1
The Fibonacci numbers are defined recursively by

$$ x_{t+1} = x_t + x_{t-1}, \qquad x_0 = 0, \; x_1 = 1 $$

The first few numbers in the sequence are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55.
Write a function to recursively compute the 𝑡-th Fibonacci number for any 𝑡.
16.9.2 Exercise 2
Complete the following code, and test it using this csv file, which we assume that you’ve put
in your current working directory
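Judging from the call below, the skeleton to complete looks something like this:

def column_iterator(target_file, column_number):
    """A generator function for CSV files: yields the elements of
    column column_number in the file target_file, one per line."""
    # put your code here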
dates = column_iterator('test_table.csv', 1)
16.9.3 Exercise 3
Suppose we have a text file numbers.txt containing the following lines
prices
3
8
7
21
Using try – except, write a program to read in the contents of the file and sum the num-
bers, ignoring lines without numbers.
16.10 Solutions
16.10.1 Exercise 1
def x(t):
    if t == 0:
        return 0
    if t == 1:
        return 1
    else:
        return x(t-1) + x(t-2)
Let’s test it
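One quick check against the sequence listed in the exercise (the original test cell is a sketch here):

print([x(i) for i in range(10)])

which prints

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]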
16.10.2 Exercise 2
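One way to write column_iterator is as a generator function that reads the file line by line and yields the requested column (a sketch; the original solution may split lines differently):

def column_iterator(target_file, column_number):
    """A generator function for CSV files."""
    for line in open(target_file):
        yield line.split(',')[column_number - 1]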
dates = column_iterator('test_table.csv', 1)

i = 1
for date in dates:
    print(date)
    if i == 10:
        break
    i += 1
Date
2009-05-21
2009-05-20
2009-05-19
2009-05-18
2009-05-15
2009-05-14
2009-05-13
2009-05-12
2009-05-11
16.10.3 Exercise 3
First we create the file with the contents given in the exercise
%%file numbers.txt
prices
3
8
7
21

Writing numbers.txt
In [130]: f = open('numbers.txt')

total = 0.0
for line in f:
    try:
        total += float(line)
    except ValueError:
        pass
f.close()

print(total)
39.0
Chapter 17
Debugging
17.1 Contents
• Overview 17.2
• Debugging 17.3
• Other Useful Magics 17.4
“Debugging is twice as hard as writing the code in the first place. Therefore, if
you write the code as cleverly as possible, you are, by definition, not smart enough
to debug it.” – Brian Kernighan
17.2 Overview
Are you one of those programmers who fills their code with print statements when trying to
debug their programs?
Hey, we all used to do that.
(OK, sometimes we still do that…)
But once you start writing larger programs you’ll need a better system.
Debugging tools for Python vary across platforms, IDEs and editors.
Here we’ll focus on Jupyter and leave you to explore other settings.
We’ll need the following imports
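The code below uses the np and plt aliases, so the import cell presumably looked something like this:

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline   # assuming a Jupyter notebook environment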
17.3 Debugging
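Let's consider an example with a deliberate mistake: a function meant to plot the log function over [1, 2], but which asks plt.subplots for two axes (the body below is read directly off the traceback that follows):

def plot_log():
    fig, ax = plt.subplots(2, 1)
    x = np.linspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()  # Call the function, generate plot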
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-c32a2280f47b> in <module>
      5     plt.show()
      6
----> 7 plot_log()  # Call the function, generate plot

<ipython-input-2-c32a2280f47b> in plot_log()
      2     fig, ax = plt.subplots(2, 1)
      3     x = np.linspace(1, 2, 10)
----> 4     ax.plot(x, np.log(x))
      5     plt.show()
      6

AttributeError: 'numpy.ndarray' object has no attribute 'plot'
This code is intended to plot the log function over the interval [1, 2].
But there’s an error here: plt.subplots(2, 1) should be just plt.subplots().
(The call plt.subplots(2, 1) returns a NumPy array containing two axes objects, suit-
able for having two subplots on the same figure)
The traceback shows that the error occurs at the method call ax.plot(x, np.log(x)).
The error occurs because we have mistakenly made ax a NumPy array, and a NumPy array
has no plot method.
But let’s pretend that we don’t understand this for the moment.
We might suspect there’s something wrong with ax but when we try to investigate this ob-
ject, we get the following exception:
In [3]: ax
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-b00e77935981> in <module>
----> 1 ax

NameError: name 'ax' is not defined
The problem is that ax was defined inside plot_log(), and the name is lost once that func-
tion terminates.
Let’s try doing it a different way.
We run the first cell block again, generating the same error
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-c32a2280f47b> in <module>
      5     plt.show()
      6
----> 7 plot_log()  # Call the function, generate plot

<ipython-input-4-c32a2280f47b> in plot_log()
      2     fig, ax = plt.subplots(2, 1)
      3     x = np.linspace(1, 2, 10)
----> 4     ax.plot(x, np.log(x))
      5     plt.show()
      6

AttributeError: 'numpy.ndarray' object has no attribute 'plot'
But this time we type in the following command
%debug
You should be dropped into a new prompt that looks something like this
ipdb>
For example, here we simply type the name ax to see what’s happening with this object:
ipdb> ax
array([<matplotlib.axes.AxesSubplot object at 0x290f5d0>,
<matplotlib.axes.AxesSubplot object at 0x2930810>], dtype=object)
It’s now very clear that ax is an array, which clarifies the source of the problem.
To find out what else you can do from inside ipdb (or pdb), use the online help
ipdb> h
Documented commands (type help <topic>):
========================================
...

Undocumented commands:
======================
retval  rv
ipdb> h c
c(ont(inue))
Continue execution, only stop when a breakpoint is encountered.
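The %debug approach works when an exception is raised, but sometimes code runs without error and simply produces the wrong output. Here's a version of our function in which the original problem is fixed but a new mistake creeps in (the body matches the breakpoint version shown below, minus the breakpoint):

def plot_log():
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)   # should be np.linspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()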
plot_log()
Here the original problem is fixed, but we’ve accidentally written np.logspace(1, 2,
10) instead of np.linspace(1, 2, 10).
Now there won’t be any exception, but the plot won’t look right.
To investigate, it would be helpful if we could inspect variables like x during execution of the
function.
To this end, we add a “break point” by inserting breakpoint() inside the function code
block
def plot_log():
    breakpoint()
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()
plot_log()
Now let’s run the script, and investigate via the debugger
> <ipython-input-6-a188074383b7>(6)plot_log()
-> fig, ax = plt.subplots()
(Pdb) n
> <ipython-input-6-a188074383b7>(7)plot_log()
-> x = np.logspace(1, 2, 10)
(Pdb) n
> <ipython-input-6-a188074383b7>(8)plot_log()
-> ax.plot(x, np.log(x))
(Pdb) x
We used n twice to step forward through the code (one line at a time).
Then we printed the value of x to see what was happening with that variable.
To exit from the debugger, use q.