NumPy Essentials - Sample Chapter
Tanmay Dutta
Boost your scientific and analytic capabilities in no time at all
by discovering how to build real-world applications with NumPy
Preface
Whether you are new to scientific/analytic programming, or a seasoned expert, this book
will provide you with the skills you need to successfully create, optimize, and distribute
your Python/NumPy analytical modules.
Starting from the beginning, this book will cover the key features of NumPy arrays and the
details of tuning the data format to best fit your analytical needs. You will then
get a walkthrough of the core and the submodules that are common to various kinds of
multidimensional, data-typed analysis. Next, you will move on to key technical
implementations, such as linear algebra and Fourier analysis. Finally, you will learn about
extending your NumPy capabilities for both functionality and performance by using
Cython and the NumPy C API. The last chapter of this book also provides advanced
materials to help you learn further by yourself.
This guide is an invaluable tutorial if you are planning to use NumPy in analytical projects.
Chapter 1, An Introduction to NumPy, provides the instructions to help you set up the
environment. It starts by introducing the
Scientific Python Module family (SciPy Stack) and explains the key role NumPy plays in
scientific computing with Python.
Chapter 2, The NumPy ndarray Object, covers the essential usage of the NumPy ndarray object,
including the initialization, the fundamental attributes, data types, and memory layout. It
also covers the theory underneath the operation, which gives you a clear picture of ndarray.
Chapter 3, Using Numpy Arrays, is an advanced chapter on NumPy ndarray usage, which
continues Chapter 2, The NumPy ndarray Object. It covers the universal functions in
NumPy and shows you the tricks to speed up your code. It also shows you the shape
manipulation and broadcasting rules.
Chapter 4, Numpy Core and Libs Submodules, includes two sections. The first section gives a
detailed explanation of the relationship between the way a NumPy ndarray allocates
memory and the way the CPU cache is used. The second section covers the special
NumPy array that contains multiple data types (the structured/record array). This chapter
also explores the experimental datetime64 module in NumPy.
Chapter 5 covers computation using linear algebra modules. It shows you multiple ways to solve a
mathematical problem: using matrices, vector decomposition, and polynomials. It also
provides concrete practice for curve fitting and regression.
Chapter 6, Fourier Analysis in NumPy, covers signal processing with NumPy FFT.
Chapter 7 covers packaging and publishing code in Python. It provides a basic introduction to
NumPy-specific setup files and how to build extension modules.
Chapter 8, Speeding Up NumPy with Cython, introduces the Cython
programming language and techniques that can be used to speed up
existing Python code.
Chapter 9, Introduction to the NumPy C-API, provides a basic introduction to the NumPy C
API and, more generally, to writing wrappers around existing C/C++ libraries. The chapter
aims to be a gentle introduction that equips readers with the basic
knowledge needed to create new wrappers and understand existing programs.
Chapter 10, Further Reading, is the last chapter of this book. It summarizes what
we've learned in the book and explores four SciPy stack Python modules that rely on NumPy
arrays, which will give you ideas for further scientific Python programming.
An Introduction to NumPy
I'd rather do math in a general-purpose language than try to do general-purpose
programming in a math language.
- John D Cook
Python has become one of the most popular programming languages in scientific
computing over the last decade. The reasons for its success are numerous, and these will
gradually become apparent as you proceed with this book. Unlike many other
mathematical languages, such as MATLAB, R, and Mathematica, Python is a general-purpose
programming language. As such, it provides a suitable framework to build
scientific applications and extend them further into any commercial or academic domain.
For example, consider a (somewhat) simple application: a piece of
software that predicts the popularity of a blog post. Usually, these would be the steps that
you'd take to build it:
1. Generating a corpus of blog posts and their corresponding ratings (assuming that
the ratings here are suitably quantifiable).
2. Formulating a model that generates ratings based on content and other data
associated with the blog post.
3. Training a model on the basis of the data you found in step 1. Keep doing this
until you are confident of the reliability of the model.
4. Deploying the model as a web service.
Normally, as you move through these steps, you will find yourself jumping between
different software stacks. Step 1 requires a lot of web scraping. Web scraping is a very
common problem, and there are tools in almost every programming language to scrape the
Web (if you are already using Python, you would probably choose Beautiful Soup or
Scrapy). Steps 2 and 3 involve solving a machine learning problem and require the use of
sophisticated mathematical languages or frameworks, such as Weka or MATLAB, which
are only a few of the vast variety of tools that provide machine learning functionality.
Similarly, step 4 can be implemented in many ways using many different tools. There isn't
one right answer. Since this is a problem that has been amply studied and solved (to a
reasonable extent) by a lot of scientists and software developers, getting a working solution
would not be difficult. However, there are issues, such as stability and scalability, that
might severely restrict your choice of programming languages, web frameworks, or
machine learning algorithms in each step of the problem. This is where Python wins over
most other programming languages. All the preceding steps (and more) can be
accomplished with only Python and a few third-party Python libraries. This flexibility and
ease of developing software in Python is precisely what makes it a comfortable host for a
scientific computing ecosystem. A very interesting interpretation of Python's prowess as a
mature application development language can be found in Python Data Analysis, Ivan Idris,
Packt Publishing. In short, Python is used both for rapid prototyping and for
building production-quality software, thanks to the vast scientific ecosystem it has
acquired over time. The cornerstone of this ecosystem is NumPy.
Numerical Python (NumPy) is a successor to the Numeric package. It was originally
written by Travis Oliphant to be the foundation of a scientific computing environment in
Python. It branched off from the much wider SciPy module in early 2005 and had its first
stable release in mid-2006. Since then, it has enjoyed growing popularity among Pythonistas
who work in the mathematics, science, and engineering fields. The goal of this book is to
make you conversant enough with NumPy so that you're able to use it and can build
complex scientific applications with it.
Fernando Perez, the primary author of IPython, said in his keynote at PyCon Canada 2012:
Computing in science has evolved not only because software has evolved, but also because
we, as scientists, are doing much more than just floating point arithmetic.
This is precisely why the SciPy stack boasts such rich functionality. The evolution of most of
the SciPy stack is motivated by teams of scientists and engineers trying to solve scientific
and engineering problems in a general-purpose programming language. A one-line
explanation of why NumPy matters so much is that it provides the core multidimensional
array object that is necessary for most tasks in scientific computing. This is why it is at the
root of the SciPy stack. NumPy provides an easy way to interface with legacy Fortran and
C/C++ numerical code using time-tested scientific libraries, which we know have been
working well for decades. Companies and labs across the world use Python to glue together
legacy code that has been around for a long time. In short, this means that NumPy allows us
to stand on the shoulders of giants; we do not have to reinvent the wheel. It is a dependency
for every other SciPy package. The NumPy ndarray object, which is the subject of the next
chapter, is essentially a Pythonic interface to data structures used by libraries written in
Fortran, C, and C++. In fact, the internal memory layouts used by NumPy ndarray objects
implement C and Fortran layouts. This will be addressed in detail in upcoming chapters.
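As a minimal sketch of what "C and Fortran layouts" means in practice, the following snippet stores the same values in both orders and inspects the strides (the number of bytes to step along each axis). The array contents and variable names are illustrative assumptions, not taken from the book.

```python
import numpy as np

# A 2x3 array of 64-bit integers; NumPy uses C (row-major) order by default.
a = np.arange(6, dtype=np.int64).reshape(2, 3)

# The same values, copied into Fortran (column-major) order.
f = np.asfortranarray(a)

# In C order the last axis is contiguous: 8 bytes between columns, 24 between rows.
print(a.flags['C_CONTIGUOUS'], a.strides)  # True (24, 8)

# In Fortran order the first axis is contiguous: 8 bytes between rows, 16 between columns.
print(f.flags['F_CONTIGUOUS'], f.strides)  # True (8, 16)

# Element by element the two arrays are identical; only the memory layout differs.
print(np.array_equal(a, f))  # True
```

Libraries written in Fortran generally expect column-major data, so being able to hold either layout lets an array be handed to such code without reordering its elements.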
The next layer in the stack consists of SciPy, matplotlib, IPython (the interactive shell of
Python; we will use it for the examples throughout the book, and details of its installation
and usage will be provided in later sections), and SymPy modules. SciPy provides the bulk
of the scientific and numerical functionality that a major part of the ecosystem relies on.
Matplotlib is the de facto plotting and data visualization library in Python. IPython is an
increasingly popular interactive environment for scientific computing in Python. In fact, the
project has had such active development and enjoyed such popularity that it is no longer
limited to Python and extends its features to other scientific languages, particularly R and
Julia. This layer in the stack can be thought of as a bridge between the core array-oriented
functionality of NumPy and the domain-specific abstractions provided by the higher layers
of the stack. These domain-specific tools are commonly called SciKits; popular ones among
them are scikit-image (image processing), scikit-learn (machine learning), statsmodels
(statistics), and pandas (advanced data analysis). Listing every scientific package in
Python would be nearly impossible since the scientific Python community is very active,
and there is always a lot of development happening for a large number of scientific
problems. The best way to keep track of projects is to get involved in the community. It is
immensely useful to join mailing lists, contribute to code, use the software for your daily
computational needs, and report bugs. One of the goals of this book is to get you interested
enough to actively involve yourself in the scientific Python community.
Efficiency
Efficiency can mean a number of things in software. The term may be used to refer to the
speed of execution of a program, its data retrieval and storage performance, its memory
overhead (the memory consumed when a program is executing), or its overall throughput.
NumPy arrays are better than most other data structures with respect to almost all of these
characteristics (with a few exceptions, such as pandas DataFrames or SciPy's sparse
matrices, which we shall deal with in later chapters). Since NumPy arrays are statically
typed and homogeneous, fast mathematical operations can be implemented in compiled
languages (the default implementation uses C and Fortran). This efficiency (the availability of
fast algorithms working on homogeneous arrays) makes NumPy popular and important.
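To make the point about homogeneous, statically typed arrays concrete, here is a small sketch that squares the same numbers with a pure-Python list comprehension and with a single vectorized NumPy expression. The array size and variable names are arbitrary choices for illustration.

```python
import numpy as np

n = 100_000
values = list(range(n))                # a plain Python list of int objects
arr = np.arange(n, dtype=np.float64)   # one contiguous block of 64-bit floats

# Every element of the array has the same type and the same size in bytes.
print(arr.dtype, arr.itemsize, arr.nbytes)  # float64 8 800000

# Pure Python: the interpreter handles one object at a time.
squares_list = [v * v for v in values]

# NumPy: one expression, with the loop running in compiled code.
squares_arr = arr * arr

# Both approaches produce the same numbers.
print(np.allclose(squares_arr, squares_list))  # True
```

In an IPython session, `%timeit [v * v for v in values]` and `%timeit arr * arr` can be used to compare the two; the exact speed-up depends on the machine and the NumPy build, so no figure is quoted here.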
Ease of development
The NumPy module is a powerhouse of off-the-shelf functionality for mathematical tasks. It
adds greatly to Python's ease of development. The following is a brief summary of what the
module contains, most of which we shall explore in this book. A far more detailed treatment
of the NumPy module is in the definitive Guide to NumPy, Travis Oliphant. The NumPy API is
so flexible that it has been adopted extensively by the scientific Python community as the
standard API to build scientific applications. Examples of how this standard is applied
across scientific disciplines can be found in The NumPy Array: a structure for efficient
numerical computation, Van Der Walt, and others:
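Submodule     Contents
numpy.core    Basic objects
lib           Additional utilities
linalg        Basic linear algebra
fft           Discrete Fourier transforms
random        Random number generators
distutils     Enhanced build and distribution utilities
testing       Unit testing
f2py          Automatic wrapping of Fortran code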
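The following sketch touches four of the submodules listed above (linalg, fft, random, and testing) in a few lines each. The numbers are made up for illustration, and `np.random.default_rng` assumes a reasonably recent NumPy (1.17 or later).

```python
import numpy as np

# numpy.linalg: solve the linear system 2x + y = 5, x + 3y = 10.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
x = np.linalg.solve(A, b)
print(x)

# numpy.testing: compare floating-point results within a tolerance.
np.testing.assert_allclose(x, [1.0, 3.0])

# numpy.fft: a sine wave with 4 cycles over 64 samples peaks at frequency bin 4.
signal = np.sin(2 * np.pi * 4 * np.arange(64) / 64)
spectrum = np.abs(np.fft.rfft(signal))
peak = int(np.argmax(spectrum))
print(peak)  # 4

# numpy.random: draw 1,000 samples from a standard normal distribution.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)
print(samples.shape)  # (1000,)
```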
A Google Scholar search for NumPy returns about 6,280 results. Some of these are papers
and articles about NumPy and the SciPy stack itself, and many more are about NumPy's
applications in a wide variety of research problems. Academics love Python, as
shown by the increasing adoption of the SciPy stack as the primary toolset for
scientific programming in countless universities and research labs all over the world. The
experiences of many scientists and software professionals have been published on the
Python website:
In [42]: in the preceding snippet indicates that this is the 42nd input to the IPython
session. Similarly, all input to the command line will be formatted as follows:
$ python hello_world.py
On Windows systems, the same command will look something like this:
C:\Users\JohnDoe> python hello_world.py
For the sake of consistency, the $ sign will be used to denote the command-line prompt,
regardless of OS. Prompts, such as C:\Users\JohnDoe>, will not appear in the book.
Although the $ sign conventionally indicates a Bash prompt on Unix systems, the same
commands (typed without the dollar sign or any other prompt character) can generally be used on
Windows as well. Bash-specific commands are also available on Windows if you are using
Cygwin or Git Bash.
Note that Git Bash is available by default if you install Git on Windows.
Installation requirements
Let's take a look at the various requirements we need to set up before we proceed.
Note for Canopy users: You can use the Canopy GUI, which includes an
embedded IPython console, a text editor, and IPython notebook editors.
When working with the command line, for best results use the Canopy
Terminal found in Canopy's Tools menu.
Note for Windows OS users: Besides the Python distribution, you can also
install the prebuilt Windows Python extension packages from Christoph
Gohlke's website at http://www.lfd.uci.edu/~gohlke/pythonlibs/
Yum
Homebrew
Note that, when installing NumPy (or any other Python modules) on OS X systems with
Homebrew, Python should have been originally installed with Homebrew.
If the first statement looks like it does nothing, this is a good sign. If it executes without any
output, this means that NumPy was installed and has been imported properly into your
Python session. The second statement runs the NumPy test suite. It is not strictly
necessary, but one can never be too cautious. Ideally, it should run for a few minutes and
produce the test results. It may generate a few warnings, but these are no cause for alarm. If
you wish, you may run the test suites of IPython and matplotlib, too.
Note that the matplotlib test suite only runs reliably if matplotlib has been
installed from source. However, testing matplotlib is not strictly necessary.
If you can import matplotlib without any errors, it indicates that it is ready
for use.
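The statements discussed above are not reproduced in this excerpt. As a minimal sketch of such a check, assuming they amount to importing NumPy and then running its test suite, something like the following can be typed into a Python or IPython session:

```python
import numpy as np

# A silent import means NumPy is installed and importable.
print(np.__version__)

# A tiny sanity check that array creation and arithmetic work.
a = np.array([1, 2, 3])
print(a.sum())  # 6

# Optional: run the full NumPy test suite. It is left commented out here
# because it can take several minutes, and recent NumPy versions need the
# pytest package (and possibly hypothesis) to be installed for it to run.
# np.test()
```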
Congratulations! We are now ready to begin.
Summary
In this chapter, we introduced ourselves to the NumPy module. We took a look at how
NumPy is a useful software tool to have for those of you who are working in scientific
computing. We installed the software required to proceed through the rest of this book.
In the next chapter, we will get to the powerful NumPy ndarray object and show you how to
use it efficiently.