NumPy Essentials - Sample Chapter

Boost your scientific and analytic capabilities in no time at all
by discovering how to build real-world applications with NumPy

Leo (Liang-Huan) Chin
Tanmay Dutta

In today's world of science and technology, it's all about speed and flexibility. When it
comes to scientific computing, NumPy tops the list by giving you both speed and high
productivity.
This book will walk you through NumPy using clear, step-by-step examples and theory. We
will focus on the fundamentals of NumPy, including array objects, functions, and matrices
with practical examples.
You will then learn about different NumPy modules while performing operations such as
calculating the Fourier Transform; solving linear systems of equations, interpolation,
extrapolation, regression, and curve fitting; and evaluating integrals and derivatives. We will
introduce you to using Cython with NumPy arrays and writing extension modules for NumPy
code using the C API. This book will give you exposure to the vast NumPy library and help
you build efficient, high-speed programs.

Who this book is written for
If you are an experienced Python developer who intends to drive your numerical and
scientific applications with NumPy, this book is for you. Prior experience or knowledge of
working with the Python language is required.

What you will learn from this book
Manipulate the key attributes and universal functions of NumPy
Utilize matrix and mathematical computation using linear algebra modules
Implement regression and curve fitting for models
Perform time frequency / spectral density analysis using the Fourier Transform modules
Collate with the distutils and setuptools modules used by other Python libraries
Integrate Cython and NumPy
Write extension modules for NumPy code using the C API
Build sophisticated data structures using NumPy arrays with libraries such as Pandas and SciPy

Visit www.PacktPub.com for books, eBooks, code, downloads, and PacktLib.

In this package, you will find:

The authors' biographies
A preview chapter from the book, Chapter 1 'An Introduction to NumPy'
A synopsis of the book's content
More information on NumPy Essentials

About the Authors


Leo (Liang-Huan) Chin is a data engineer with more than five years of experience working
with Python. He works for Gogoro smart scooter, Taiwan, where his job entails discovering
new and interesting biking patterns. His previous work at ESRI in California, USA, focused
on spatial-temporal data mining. He loves data, analytics, and the stories behind data and
analytics. He received an MA degree in geography (GIS) from the State University of New
York, Buffalo. When Leo isn't glued to a computer screen, he spends time on photography,
traveling, and exploring some awesome restaurants across the world. You can reach Leo at
http://chinleock.github.io/portfolio/.

Tanmay Dutta is a seasoned programmer with expertise in programming languages such
as Python, Erlang, C++, Haskell, and F#. He has extensive experience in developing
numerical libraries and frameworks for investment banking businesses. He was also
instrumental in the design and development of a risk framework in Python (pandas,
NumPy, and Django) for a wealth fund in Singapore. Tanmay has a master's degree in
financial engineering from Nanyang Technological University, Singapore, and a
certification in computational finance from the Tepper School of Business, Carnegie Mellon
University.

Preface
Whether you are new to scientific/analytic programming, or a seasoned expert, this book
will provide you with the skills you need to successfully create, optimize, and distribute
your Python/NumPy analytical modules.
Starting from the beginning, this book will cover the key features of NumPy arrays and the
details of tuning the data format so that it best fits your analytical needs. You will then
get a walkthrough of the core and submodules that are common to various
multidimensional, data-typed analysis. Next, you will move on to key technical
implementations, such as linear algebra and Fourier analysis. Finally, you will learn about
extending your NumPy capabilities for both functionality and performance by using
Cython and the NumPy C API. The last chapter of this book also provides advanced
material to help you continue learning on your own.
This guide is an invaluable tutorial if you are planning to use NumPy in analytical projects.

What this book covers


Chapter 1, An Introduction to NumPy, is the getting-started chapter of this book, which
provides instructions to help you set up the environment. It starts by introducing the
Scientific Python module family (the SciPy stack) and explains the key role NumPy plays in
scientific computing with Python.
Chapter 2, The NumPy ndarray Object, covers the essential usage of the NumPy ndarray
object, including initialization, the fundamental attributes, data types, and memory layout.
It also covers the theory underlying these operations, which gives you a clear picture of
ndarray.
Chapter 3, Using NumPy Arrays, is an advanced chapter on NumPy ndarray usage that
continues from Chapter 2, The NumPy ndarray Object. It covers the universal functions in
NumPy and shows you tricks to speed up your code. It also covers shape manipulation and
broadcasting rules.
Chapter 4, NumPy Core and Libs Submodules, includes two sections. The first section
explains in detail the relationship between the way a NumPy ndarray allocates memory and
its interaction with the CPU cache. The second section covers the special NumPy array that
contains multiple data types (the structured/record array). This chapter also explores the
experimental datetime64 module in NumPy.
Chapter 5, Linear Algebra in NumPy, starts by utilizing matrix and mathematical
computation using linear algebra modules. It shows you multiple ways to solve a
mathematical problem: using matrices, vector decomposition, and polynomials. It also
provides concrete practice in curve fitting and regression.
Chapter 6, Fourier Analysis in NumPy, covers signal processing with the NumPy FFT
module and the application of Fourier analysis to amplifying signals and enlarging images
without distortion. It also covers the basic usage of the matplotlib package in Python.
Chapter 7, Building and Distributing NumPy Code, covers the basic details of packaging
and publishing code in Python. It provides a basic introduction to NumPy-specific setup
files and how to build extension modules.
Chapter 8, Speeding Up NumPy with Cython, introduces readers to the Cython
programming language and to techniques that can be used to speed up existing Python code.
Chapter 9, Introduction to the NumPy C-API, provides a basic introduction to the NumPy C
API and, in general, how to write wrappers around an existing C/C++ library. The chapter
aims to provide a gentle introduction while equipping readers with a basic knowledge of
how to create new wrappers and understand existing programs.
Chapter 10, Further Reading, is the last chapter of this book. It gives a summary of what
we've learned in the book and explores four SciPy stack Python modules that rely on NumPy
arrays, which gives you ideas about further scientific Python programming.

An Introduction to NumPy
I'd rather do math in a general-purpose language than try to do general-purpose
programming in a math language.
- John D. Cook
Python has become one of the most popular programming languages in scientific
computing over the last decade. The reasons for its success are numerous, and these will
gradually become apparent as you proceed with this book. Unlike mathematical languages
such as MATLAB, R, and Mathematica, Python is a general-purpose programming language.
As such, it provides a suitable framework to build scientific applications and extend them
further into any commercial or academic domain.
For example, consider a (somewhat) simple application that requires you to write a piece of
software that predicts the popularity of a blog post. Usually, these would be the steps that
you'd take to do this:
1. Generating a corpus of blog posts and their corresponding ratings (assuming that
the ratings here are suitably quantifiable).
2. Formulating a model that generates ratings based on content and other data
associated with the blog post.
3. Training a model on the basis of the data you found in step 1. Keep doing this
until you are confident of the reliability of the model.
4. Deploying the model as a web service.

Normally, as you move through these steps, you will find yourself jumping between
different software stacks. Step 1 requires a lot of web scraping. Web scraping is a very
common problem, and there are tools in almost every programming language to scrape the
Web (if you are already using Python, you would probably choose Beautiful Soup or
Scrapy). Steps 2 and 3 involve solving a machine learning problem and require the use of
sophisticated mathematical languages or frameworks, such as Weka or MATLAB, which
are only two of the vast variety of tools that provide machine learning functionality.
Similarly, step 4 can be implemented in many ways using many different tools. There isn't
one right answer. Since this is a problem that has been amply studied and solved (to a
reasonable extent) by a lot of scientists and software developers, getting a working solution
would not be difficult. However, there are issues, such as stability and scalability, that
might severely restrict your choice of programming languages, web frameworks, or
machine learning algorithms in each step of the problem.
This is where Python wins over most other programming languages. All the preceding steps
(and more) can be accomplished with only Python and a few third-party Python libraries.
This flexibility and ease of developing software in Python is precisely what makes it a
comfortable host for a scientific computing ecosystem. A very interesting interpretation of
Python's prowess as a mature application development language can be found in Python
Data Analysis, Ivan Idris, Packt Publishing. In short, Python is a language that is used for
rapid prototyping, and it is also used to build production-quality software because of the
vast scientific ecosystem it has acquired over time. The cornerstone of this ecosystem is
NumPy.
Numerical Python (NumPy) is a successor to the Numeric package. It was originally
written by Travis Oliphant to be the foundation of a scientific computing environment in
Python. It branched off from the much wider SciPy module in early 2005 and had its first
stable release in mid-2006. Since then, it has enjoyed growing popularity among Pythonistas
who work in the mathematics, science, and engineering fields. The goal of this book is to
make you conversant enough with NumPy that you can use it to build complex scientific
applications.

The scientific Python stack
Let's begin by taking a brief tour of the Scientific Python (SciPy) stack.
Note that SciPy can mean a number of things: the Python module named
scipy (http://www.scipy.org/scipylib), the entire SciPy stack
(http://www.scipy.org/about.html), or any of the three conferences on
scientific Python that take place all over the world.

Figure 1: The SciPy stack, standard, and extended libraries

Fernando Perez, the primary author of IPython, said in his keynote at PyCon Canada 2012:
Computing in science has evolved not only because software has evolved, but also because
we, as scientists, are doing much more than just floating point arithmetic.

This is precisely why the SciPy stack boasts such rich functionality. The evolution of most of
the SciPy stack is motivated by teams of scientists and engineers trying to solve scientific
and engineering problems in a general-purpose programming language. A one-line
explanation of why NumPy matters so much is that it provides the core multidimensional
array object that is necessary for most tasks in scientific computing. This is why it is at the
root of the SciPy stack and is a dependency for every other SciPy package. NumPy provides
an easy way to interface with legacy Fortran and C/C++ numerical code using time-tested
scientific libraries, which we know have been working well for decades. Companies and labs
across the world use Python to glue together legacy code that has been around for a long
time. In short, this means that NumPy allows us to stand on the shoulders of giants; we do
not have to reinvent the wheel. The NumPy ndarray object, which is the subject of the next
chapter, is essentially a Pythonic interface to data structures used by libraries written in
Fortran, C, and C++. In fact, the internal memory layouts used by NumPy ndarray objects
follow the C (row-major) and Fortran (column-major) layouts. This will be addressed in
detail in upcoming chapters.
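As a brief illustration of the two memory layouts, the following sketch stores the same 2 x 3 array in C (row-major) and Fortran (column-major) order and inspects its flags and strides. The array contents and variable names are illustrative assumptions, not taken from the book; the dtype is fixed to int64 so that the byte strides are the same on every platform.

```python
import numpy as np

# A 2 x 3 array in the default C (row-major) layout.
c_arr = np.arange(6, dtype=np.int64).reshape(2, 3)

# The same values stored in Fortran (column-major) layout.
f_arr = np.asfortranarray(c_arr)

# Each array reports which layout it is contiguous in.
print(c_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])

# Strides are the number of bytes to step along each axis.
print(c_arr.strides)   # (24, 8): next row is 3 items away, next column 1 item
print(f_arr.strides)   # (8, 16): next row is 1 item away, next column 2 items

# The layouts differ, but the values are identical.
print(np.array_equal(c_arr, f_arr))
```

Both arrays compare equal element by element; only the order in which the elements sit in memory differs.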
The next layer in the stack consists of SciPy, matplotlib, IPython (the interactive shell of
Python; we will use it for the examples throughout the book, and details of its installation
and usage will be provided in later sections), and SymPy modules. SciPy provides the bulk
of the scientific and numerical functionality that a major part of the ecosystem relies on.
Matplotlib is the de facto standard plotting and data visualization library in Python. IPython
is an increasingly popular interactive environment for scientific computing in Python. In
fact, the project has had such active development and enjoyed such popularity that it is no
longer limited to Python and extends its features to other scientific languages, particularly R
and Julia.
This layer in the stack can be thought of as a bridge between the core array-oriented
functionality of NumPy and the domain-specific abstractions provided by the higher layers
of the stack. These domain-specific tools are commonly called SciKits. Popular ones among
them are scikit-image (image processing), scikit-learn (machine learning), statsmodels
(statistics), pandas (advanced data analysis), and so on.
Listing every scientific package in Python would be nearly impossible since the scientific
Python community is very active, and there is always a lot of development happening for a
large number of scientific problems. The best way to keep track of projects is to get involved
in the community. It is immensely useful to join mailing lists, contribute to code, use the
software for your daily computational needs, and report bugs. One of the goals of this book
is to get you interested enough to actively involve yourself in the scientific Python
community.

The need for NumPy arrays
A fundamental question that beginners ask is: Why are arrays necessary for scientific
computing at all? Surely, one can perform complex mathematical operations on any abstract
data type, such as a list. The answer lies in the numerous properties of arrays that make
them significantly more useful. In this section, let's go over a few of these properties to
emphasize why something such as the NumPy ndarray object exists at all.

Representation of matrices and vectors
The abstract mathematical concepts of matrices and vectors are central to many scientific
problems. Arrays provide a direct semantic link to these concepts. Indeed, whenever a piece
of mathematical literature makes reference to a matrix, one can safely think of an array as
the software abstraction that represents the matrix. In scientific literature, an expression
such as A_ij is typically used to denote the element in the i-th row and j-th column of
array A. The corresponding expression in NumPy would simply be A[i,j]. For matrix
operations, NumPy arrays also support vectorization (details are addressed in Chapter 3,
Using NumPy Arrays), which speeds up execution greatly. Vectorization makes the code
more concise, easier to read, and much more akin to mathematical notation. Like matrices,
arrays can be multidimensional too. Every element of an array is addressable through a set
of integers called indices, and the process of accessing elements of an array with sets of
integers is called indexing. This functionality can indeed be implemented without using
arrays, but this would be cumbersome and quite unnecessary.
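To make the link between notation and code concrete, here is a minimal sketch of indexing and vectorization on a small matrix. The matrix A, the vector x, and the other variable names are made up for illustration. Note that NumPy indices start at 0, whereas mathematical notation usually starts counting rows and columns at 1.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([1.0, 1.0])

# Indexing: A[0, 1] is the element in the first row and second column.
element = A[0, 1]

# Vectorization: the expression is applied to every element of A
# without an explicit Python loop.
B = 2 * A + 1

# Matrix-vector product, written close to the mathematical form y = Ax.
y = A.dot(x)

print(element)   # 2.0
print(B)         # [[3. 5.] [7. 9.]]
print(y)         # [3. 7.]
```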

Efficiency
Efficiency can mean a number of things in software. The term may be used to refer to the
speed of execution of a program, its data retrieval and storage performance, its memory
overhead (the memory consumed when a program is executing), or its overall throughput.
NumPy arrays are better than most other data structures with respect to almost all of these
characteristics (with a few exceptions such as pandas DataFrames or SciPy's sparse
matrices, which we shall deal with in later chapters). Since NumPy arrays are statically
typed and homogeneous, fast mathematical operations can be implemented in compiled
languages (the default implementation uses C and Fortran). Efficiency (the availability of
fast algorithms working on homogeneous arrays) makes NumPy popular and important.
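The following sketch shows what "statically typed and homogeneous" means in practice: every element of an array shares one dtype and one fixed size, so the memory footprint is predictable and element-wise arithmetic can run in compiled code. The data and variable names are illustrative assumptions, and no timings are claimed here.

```python
import numpy as np

values = list(range(1000))                 # a plain Python list of ints
arr = np.array(values, dtype=np.float64)   # one dtype for every element

# Each element takes exactly itemsize bytes, so the data buffer
# takes size * itemsize bytes in total.
print(arr.dtype, arr.itemsize, arr.nbytes)   # float64 8 8000

# Squaring with a Python-level loop over the list ...
squares_list = [v * v for v in values]

# ... and the same computation as a single vectorized array expression.
squares_arr = arr * arr

print(np.allclose(squares_arr, squares_list))
```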

Ease of development
The NumPy module is a powerhouse of off-the-shelf functionality for mathematical tasks. It
adds greatly to Python's ease of development. A far more detailed treatment of the NumPy
module can be found in the definitive Guide to NumPy, Travis Oliphant. The NumPy API is
so flexible that it has been adopted extensively by the scientific Python community as the
standard API to build scientific applications. Examples of how this standard is applied
across scientific disciplines can be found in The NumPy Array: A Structure for Efficient
Numerical Computation, Van der Walt and others. The following is a brief summary of what
the module contains, most of which we shall explore in this book:
Submodule         Contents
numpy.core        Basic objects
numpy.lib         Additional utilities
numpy.linalg      Basic linear algebra
numpy.fft         Discrete Fourier transforms
numpy.random      Random number generators
numpy.distutils   Enhanced build and distribution
numpy.testing     Unit testing
numpy.f2py        Automatic wrapping of Fortran code
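
As a quick taste of a few of these submodules, the sketch below solves a small linear system with numpy.linalg, transforms a constant signal with numpy.fft, and draws random numbers with numpy.random. The equations, the signal, and the seed are made-up illustrations rather than examples from the book.

```python
import numpy as np

# numpy.linalg: solve the system 2x + y = 3, x + 3y = 5.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
solution = np.linalg.solve(A, b)
print(solution)                      # approximately [0.8 1.4]

# numpy.fft: a constant signal has all its energy in the zero-frequency bin.
spectrum = np.fft.fft(np.ones(4))
print(spectrum)                      # first entry is 4, the rest are 0

# numpy.random: seed the generator, then draw three uniform samples in [0, 1).
np.random.seed(0)
sample = np.random.rand(3)
print(sample.shape)
```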

NumPy in Academia and Industry
It is said that, if you stand at Times Square long enough, you will meet everyone in the
world. By now, you should be convinced that NumPy is the Times Square of SciPy. If
you are writing scientific applications in Python, there is not much you can do without
digging into NumPy. Figure 2 shows the scope of SciPy in scientific computing at varying
levels of abstraction. The red arrow denotes the various low-level functions that are
expected of scientific software, and the blue arrow denotes the different application
domains that exploit these functions. Python, armed with the SciPy stack, is at the forefront
of the languages that provide these capabilities.

A Google Scholar search for NumPy returns about 6,280 results. Some of these are papers
and articles about NumPy and the SciPy stack itself, and many more are about NumPy's
applications in a wide variety of research problems. Academics love Python, as shown by
the increasing popularity of the SciPy stack as the primary toolset for scientific
programming in countless universities and research labs all over the world. The
experiences of many scientists and software professionals have been published on the
Python website.

Figure 2: Python versus other languages

Code conventions used in the book


Now that the credibility of Python and NumPy has been established, let's get our hands
dirty.
The default environment used for all Python code in this book will be IPython. Instructions
on how to install IPython and other tools follow in the next section. Throughout the book,
you will only have to enter input in either the command window or the IPython prompt.
Unless otherwise specified, code will refer to Python code, and command will refer to bash
or DOS commands.
All Python input code will be formatted in snippets like these:
In [42]: print("Hello, World!")

The In [42]: in the preceding snippet indicates that this is the 42nd input to the IPython
session. Similarly, all input to the command line will be formatted as follows:
$ python hello_world.py

On Windows systems, the same command will look something like this:
C:\Users\JohnDoe> python hello_world.py

For the sake of consistency, the $ sign will be used to denote the command-line prompt,
regardless of OS. Prompts such as C:\Users\JohnDoe> will not appear in the book.
Although the $ sign conventionally indicates a bash prompt on Unix systems, the same
commands (typed without the dollar sign itself) can be used on Windows too. If you are
using Cygwin or Git Bash, you should also be able to use Bash commands on Windows.
Note that Git Bash is available by default if you install Git on Windows.

Installation requirements
Let's take a look at the various requirements we need to set up before we proceed.

Using Python distributions


The three most important Python modules you need for this book are NumPy, IPython, and
matplotlib. The code in this book is compatible with Python 3.4 and 2.7 and is based on
NumPy version 1.9 and matplotlib 1.4.3. The easiest way to install these requirements (and
more) is to install a complete Python distribution, such as Enthought Canopy, EPD,
Anaconda, or Python(x,y). Once you have installed any one of these, you can safely skip
the remainder of this section and should be ready to begin.

Note for Canopy users: You can use the Canopy GUI, which includes an
embedded IPython console, a text editor, and IPython notebook editors.
When working with the command line, for best results use the Canopy
Terminal found in Canopy's Tools menu.
Note for Windows OS users: Besides the Python distribution, you can also
install the prebuilt Windows Python extension packages from Christoph
Gohlke's website at http://www.lfd.uci.edu/~gohlke/pythonlibs/

Using Python package managers


You can also use Python package managers, such as enpkg, Conda, pip, or easy_install, to
install the requirements using one of the following commands; replace numpy with any
other package name you'd like to install, for example, ipython, matplotlib, and so on:
$ pip install numpy
$ easy_install numpy
$ enpkg numpy # for Canopy users
$ conda install numpy # for Anaconda users

Using native package managers


If the Python interpreter you want to use comes with the OS and is not a third-party
installation, you may prefer using OS-specific package managers such as aptitude, yum, or
Homebrew. The following table illustrates the package managers and the respective
commands used to install NumPy:
Package manager   Command
Aptitude          $ sudo apt-get install python-numpy
Yum               $ yum install python-numpy
Homebrew          $ brew install numpy

Note that, when installing NumPy (or any other Python modules) on OS X systems with
Homebrew, Python should have been originally installed with Homebrew.

Detailed installation instructions are available on the respective websites of NumPy,
IPython, and matplotlib. As a precaution, to check whether NumPy was installed properly,
open an IPython terminal and type the following commands:
In [1]: import numpy as np
In [2]: np.test()

If the first statement looks like it does nothing, this is a good sign. If it executes without any
output, this means that NumPy was installed and has been imported properly into your
Python session. The second statement runs the NumPy test suite. It is not critically
necessary, but one can never be too cautious. Ideally, it should run for a few minutes and
produce the test results. It may generate a few warnings, but these are no cause for alarm. If
you wish, you may run the test suites of IPython and matplotlib, too.
Note that the matplotlib test suite only runs reliably if matplotlib has been
installed from source. However, testing matplotlib is not strictly necessary.
If you can import matplotlib without any errors, it is ready for use.
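If you would like to check all three packages at once, one possible approach is sketched below. The helper name installed_versions is hypothetical (it is not part of NumPy, IPython, or matplotlib); it simply tries to import each package and reports the version string, if the package exposes one.

```python
import importlib


def installed_versions(names):
    """Map each package name to its version string, or None if it cannot be imported.

    Hypothetical helper for illustration only.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Fall back to "unknown" if the package defines no __version__.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


report = installed_versions(["numpy", "IPython", "matplotlib"])
for name, version in sorted(report.items()):
    print(name, version if version is not None else "is not installed")
```

A value of None for any entry means that the import failed and the package still needs to be installed.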
Congratulations! We are now ready to begin.

Summary
In this chapter, we introduced ourselves to the NumPy module. We took a look at how
NumPy is a useful software tool to have for those of you who are working in scientific
computing. We installed the software required to proceed through the rest of this book.
In the next chapter, we will get to the powerful NumPy ndarray object and show you how to
use it efficiently.

Where to buy this book


You can buy NumPy Essentials from the Packt Publishing website.
Alternatively, you can buy the book from Amazon, BN.com, Computer Manuals and most internet
book retailers.