Python Numpy Array Tutorial (Article) - DataCamp

This document provides a tutorial on Python NumPy arrays for beginners. It explains that NumPy arrays are a compact and efficient data structure for scientific computing. The tutorial covers how to install NumPy, create NumPy arrays from data in files, use broadcasting in array operations, subset, slice and index arrays, ask for NumPy help, manipulate arrays, and visualize arrays. It recommends taking a DataCamp course for more in-depth learning of NumPy and other data science tools.

Karlijn Willems
March 19th, 2019

Python Numpy Array Tutorial


A NumPy tutorial for beginners in which you'll learn how to create a
NumPy array, use broadcasting, access values, manipulate arrays, and
much more.

NumPy is, just like SciPy, Scikit-Learn, Pandas, etc., one of the packages that you just can’t
miss when you’re learning data science, mainly because this library provides you with an
array data structure that holds some benefits over Python lists: it is more compact, gives
faster access when reading and writing items, and is more convenient and more
efficient.

Today’s post will focus precisely on this. This NumPy tutorial will not only show you what
NumPy arrays actually are and how you can install NumPy, but you’ll also learn how to make
arrays (even when your data comes from files!), how broadcasting works, how you can ask for
help, how to manipulate your arrays and how to visualize them.

Content
What Is A Python Numpy Array?

How To Install Numpy

How To Make NumPy Arrays

How NumPy Broadcasting Works
How Do Array Mathematics Work?

How To Subset, Slice, And Index Arrays

How To Ask For Help

How To Manipulate Arrays

How To Visualize NumPy Arrays

Beyond Data Analysis with NumPy

If you want to know even more about NumPy arrays and the other data structures that you
will need in your data science journey, consider taking a look at DataCamp’s Intro to Python
for Data Science, which has a chapter on NumPy.

What Is A Python Numpy Array?


You already read in the introduction that NumPy arrays are a bit like Python lists, but still
very much different at the same time. For those of you who are new to the topic, let’s clarify
what it exactly is and what it’s good for.

As the name gives away, a NumPy array is a central data structure of the numpy library. The
library’s name is short for “Numeric Python” or “Numerical Python”.

This already gives an idea of what you’re dealing with, right?

In other words, NumPy is the core library for scientific computing in
Python. It contains a collection of tools and techniques that can be used to solve, on a
computer, mathematical models of problems in Science and Engineering. One of these tools
is a high-performance multidimensional array object that is a powerful data structure for
efficient computation of arrays and matrices. To work with these arrays, there’s a vast
number of high-level mathematical functions that operate on these matrices and arrays.

Then, what is an array?


When you look at the print of a couple of arrays, you could see it as a grid that contains
values of the same type:

# Print the array
print(my_array)

# Print the 2d array
print(my_2d_array)

# Print the 3d array
print(my_3d_array)

You see that, in the example above, the data are integers. The array holds and represents any
regular data in a structured way.
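The arrays in these examples are preloaded in the interactive session. If you want to follow along locally, a minimal sketch like the one below could define stand-ins. The values of `my_2d_array` match the output shown later in this tutorial; the values chosen for `my_array` and `my_3d_array` are assumptions made purely for illustration.

```python
import numpy as np

# Stand-in arrays (assumed values, for illustration only)
my_array = np.array([1, 2, 3, 4])
my_2d_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
my_3d_array = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]],
                        [[1, 2, 3, 4], [9, 10, 11, 12]]])

# Print the number of dimensions and the shape of each array
for arr in (my_array, my_2d_array, my_3d_array):
    print(arr.ndim, arr.shape)
```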

However, you should know that, on a structural level, an array is basically nothing but
pointers. It’s a combination of a memory address, a data type, a shape, and strides:

The data pointer indicates the memory address of the first byte in the array,

The data type or dtype pointer describes the kind of elements that are contained within
the array,

The shape indicates the shape of the array, and

The strides are the number of bytes that should be skipped in memory to go to the next
element. If your strides are (10,1), you need to proceed one byte to get to the next column
and 10 bytes to locate the next row.

Or, in other words, an array contains information about the raw data, how to locate an
element and how to interpret an element.

Enough of the theory. Let’s check this out ourselves:

You can easily test this by exploring the numpy array attributes:

# Print out memory address
print(my_2d_array.data)

# Print out the shape of `my_2d_array`
print(my_2d_array.shape)

# Print out the data type of `my_2d_array`
print(my_2d_array.dtype)

# Print out the strides of `my_2d_array`
print(my_2d_array.strides)

You see that now, you get a lot more information: for example, the data type that is printed
out is ‘int64’, the signed 64-bit integer type. This is a lot more detailed! That also means that
the array’s data takes up 64 bytes in memory (as each integer takes up 8 bytes and you have an
array of 8 integers). The strides of the array tell us that you have to skip 8 bytes (one value) to
move to the next column, but 32 bytes (4 values) to get to the same position in the next row.
As such, the strides for the array will be (32,8).

Note that if you set the data type to int32 , the strides tuple that you get back will be
(16, 4) , as you will still need to move one value to the next column and 4 values to get the
same position. The only thing that will have changed is the fact that each integer will take up
4 bytes instead of 8.
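To see the effect of the data type on the strides for yourself, you could run something like the sketch below. The names `a64` and `a32` are made up for this illustration, and the data type is set explicitly so that the result does not depend on your platform’s default integer type.

```python
import numpy as np

# A 2 x 4 array stored as 64-bit integers: 8 bytes per element
a64 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int64)
print(a64.strides)   # bytes to step to the next row, next column
print(a64.nbytes)    # total bytes used by the elements

# The same values stored as 32-bit integers: 4 bytes per element
a32 = a64.astype(np.int32)
print(a32.strides)
print(a32.nbytes)
```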

The array that you see above is, as its name already suggested, a 2-dimensional array: you
have rows and columns. The rows are indicated as the “axis 0”, while the columns are the
“axis 1”. The number of axes goes up with the number of dimensions: in 3-D arrays,
of which you have also seen an example in the previous code chunk, you’ll have an
additional “axis 2”. Note that axes beyond “axis 0” only exist for arrays that have at least 2
dimensions; a 1-D array has just a single axis.

These axes will come in handy later when you’re manipulating the shape of your NumPy
arrays.
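As a small preview of why axes matter, the sketch below sums a 2-D array along each axis. The array values are the ones used for `my_2d_array` elsewhere in this tutorial; the variable names `col_sums` and `row_sums` are just illustrative.

```python
import numpy as np

my_2d_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Summing along axis 0 collapses the rows: one result per column
col_sums = my_2d_array.sum(axis=0)
# Summing along axis 1 collapses the columns: one result per row
row_sums = my_2d_array.sum(axis=1)

print(col_sums)
print(row_sums)
```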

How To Install Numpy


Before you can start to try out these NumPy arrays for yourself, you first have to make sure
that you have it installed locally (assuming that you’re working on your pc). If you have the
Python library already available, go ahead and skip this section :)

If you still need to set up your environment, you must be aware that there are two major
ways of installing NumPy on your pc: with the help of Python wheels or the Anaconda
Python distribution.

… With Python Wheels

First, make sure that you have Python installed. You can go here if you still need to do this :)

If you’re working on Windows, make sure that you have added Python to the PATH
environment variable. Then, don’t forget to install a package manager, such as pip , which
will ensure that you’re able to use Python’s open-source libraries.

Note that recent versions of Python 3 come with pip, so double check if you have it and if
you do, upgrade it before you install NumPy:

pip install pip --upgrade


pip --version

Next, you can go here or here to get your NumPy wheel. After you have downloaded it,
navigate to the folder on your pc that stores it through the terminal and install it:

pip install "numpy-1.9.2rc1+mkl-cp34-none-win_amd64.whl"
import numpy
numpy.__version__

The last two lines, which you run in a Python session, allow you to verify that you have
installed NumPy and check the version of the package.

After these steps, you’re ready to start using NumPy!

… With The Anaconda Python Distribution

To get NumPy, you could also download the Anaconda Python distribution. This is easy and
will allow you to get started quickly! If you haven’t downloaded it already, go here to get it.
Follow the instructions to install, and you're ready to start!

Do you wonder why this might actually be easier?

The good thing about getting this Python distribution is the fact that you don’t need to
worry too much about separately installing NumPy or any of the major packages that you’ll
be using for your data analyses, such as pandas, scikit-learn, etc.

Especially if you’re very new to Python, programming or terminals, it can really
come as a relief that Anaconda already includes 100 of the most popular Python, R and Scala
packages for data science. But also for more seasoned data scientists, Anaconda is the way to
go if you want to get started quickly on tackling data science problems.

What’s more, Anaconda also includes several open source development environments such
as Jupyter and Spyder. If you’d like to start working with Jupyter Notebook after this tutorial,
go to this page.

In short, consider downloading Anaconda to get started on working with numpy and other
packages that are relevant to data science!

How To Make NumPy Arrays

So, now that you have set up your environment, it’s time for the real work. Admittedly, you
have already tried out some stuff with arrays in the above DataCamp Light chunks. However,
you haven’t really gotten any real hands-on practice with them, because you first needed to
install NumPy on your own pc. Now that you have done this, it’s time to see what you need to
do in order to run the above code chunks on your own.

Some exercises have been included below so that you can already practice how it’s done
before you start on your own!

To make a numpy array, you can just use the np.array() function. All you need to do is
pass a list to it, and optionally, you can also specify the data type of the data. If you want to
know more about the possible data types that you can pick, go here or consider taking a
brief look at DataCamp’s NumPy cheat sheet.

There’s no need to go and memorize these NumPy data types if you’re a new user, but you do
have to know and care what data you’re dealing with. The data types are there when you
need more control over how your data is stored in memory and on disk. Especially in cases
where you’re working with extensive data, it’s good that you know how to control the storage
type.

Don’t forget that, in order to work with the np.array() function, you need to make sure
that the numpy library is present in your environment. The NumPy library follows an import
convention: when you import this library, you import it as np . By
doing this, you’ll make sure that other Pythonistas understand your code more easily.

In the following example you’ll create the my_array array that you have already played
around with above:

# Import `numpy` as `np`
import numpy as np

# Make the array `my_array`
my_array = np.array([[1,2,3,4], [5,6,7,8]], dtype=np.int64)

# Print `my_array`
print(my_array)

If you would like to know more about how to make lists, go here.

However, sometimes you don’t know what data you want to put in your array, or you want to
import data into a numpy array from another source. In those cases, you’ll make use of initial
placeholders or functions to load data from text into arrays, respectively.

The following sections will show you how to do this.

How To Make An “Empty” NumPy Array

What people often mean when they say that they are creating “empty” arrays is that they
want to make use of initial placeholders, which you can fill up afterward. You can initialize
arrays with ones or zeros, but you can also create arrays that get filled up with evenly spaced
values, constant or random values.

However, you can still make a totally empty array, too.

Luckily for us, there are quite a lot of functions to make such arrays.

Try it all out below!

import numpy as np

# Create an array of ones
np.ones((3,4))

# Create an array of zeros
np.zeros((2,3,4),dtype=np.int16)

# Create an array with random values
np.random.random((2,2))

# Create an empty (uninitialized) array
np.empty((3,2))

# Create a full array
np.full((2,2),7)

# Create an array of evenly-spaced values (step of 5)
np.arange(10,25,5)

# Create an array of evenly-spaced values (9 samples)
np.linspace(0,2,9)

Tip: play around with the above functions so that you understand how they work!

For some, such as np.ones() , np.random.random() , np.empty() , np.full() or
np.zeros() , the only thing that you need to do in order to make an array
is pass the shape of the array that you want to make. As an option to np.ones() and
np.zeros() , you can also specify the data type. In the case of np.full() , you also have
to specify the constant value that you want to insert into the array.

With np.linspace() and np.arange() you can make arrays of evenly spaced values. The
difference between these two functions is that the last value of the three that are passed in
the code chunk above designates either the step value for np.arange() or the number of
samples for np.linspace() . With np.linspace(0,2,9) , you ask for an
array of 9 values that lie between 0 and 2. With np.arange(10,25,5) , you specify that you want an
array that starts at 10 and, in steps of 5, generates values up to, but not including, 25.
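The difference between the two functions is easy to check. The sketch below reuses the arguments from the code chunk above; the names `stepped` and `sampled` are made up for the illustration.

```python
import numpy as np

# np.arange(start, stop, step): the third argument is the step; stop is excluded
stepped = np.arange(10, 25, 5)
print(stepped)

# np.linspace(start, stop, num): the third argument is the number of samples;
# stop is included by default
sampled = np.linspace(0, 2, 9)
print(sampled)
```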

Remember that NumPy also allows you to create an identity array or matrix with np.eye()
and np.identity() . An identity matrix is a square matrix of which all elements in the
principal diagonal are ones, and all other elements are zeros. When you multiply a matrix
with an identity matrix, the given matrix is left unchanged.

In other words, if you multiply a matrix by an identity matrix, the resulting product will be
the same matrix again by the standard conventions of matrix multiplication.

Even though the focus of this tutorial is not on demonstrating how identity matrices work, it
suffices to say that identity matrices are useful when you’re starting to do matrix
calculations: they can simplify mathematical equations, which makes your computations
more efficient and robust.
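If you want to convince yourself of the identity property, a quick check could look like the sketch below. The matrix `A` holds arbitrary example values.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
I = np.eye(2, dtype=int)      # 2 x 2 identity matrix

# Matrix multiplication with the identity leaves A unchanged
product = A @ I
print(product)
print(np.array_equal(product, A))
```

Note that `@` performs matrix multiplication, while `*` would multiply the two arrays element by element.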

How To Load NumPy Arrays From Text

Creating arrays with the help of initial placeholders or with some example data is an
excellent way of getting started with numpy . But when you want to get started with data
analysis, you’ll need to load data from text files.

With what you have seen up until now, you won’t really be able to do much. Make use of
some specific functions to load data from your files, such as loadtxt() or genfromtxt() .

Let’s say you have the following text files with data:

# This is your data in the text file
# Value1 Value2 Value3
# 0.2536 0.1008 0.3857
# 0.4839 0.4536 0.3561
# 0.1292 0.6875 0.5929
# 0.1781 0.3049 0.8928
# 0.6253 0.3486 0.8791

# Import your data
x, y, z = np.loadtxt('data.txt',
                     skiprows=1,
                     unpack=True)

In the code above, you use loadtxt() to load the data in your environment. You see that
the first argument that the function takes is the text file data.txt . Next, there are some
specific arguments: you skip the first row with skiprows=1 , and you return the
columns as separate arrays with unpack=True . This means that the values in column
Value1 will be put in x , and so on.

Note that, in case you have comma-delimited data or if you want to specify the data type,
there are also the arguments delimiter and dtype that you can add to the loadtxt()
arguments.
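If you don’t have a data file at hand, you can still try loadtxt() out: the function also accepts file-like objects. The sketch below uses io.StringIO as an in-memory stand-in for data.txt and adds a small, made-up comma-delimited example to show the delimiter and dtype arguments.

```python
import io
import numpy as np

# In-memory stand-in for data.txt (np.loadtxt accepts file-like objects)
text = """Value1 Value2 Value3
0.2536 0.1008 0.3857
0.4839 0.4536 0.3561
0.1292 0.6875 0.5929
0.1781 0.3049 0.8928
0.6253 0.3486 0.8791
"""

# Skip the header row and unpack the three columns into separate arrays
x, y, z = np.loadtxt(io.StringIO(text), skiprows=1, unpack=True)
print(x)

# Comma-delimited data with an explicit data type (made-up values)
csv_text = "1,2,3\n4,5,6\n"
ints = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int)
print(ints)
```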

That’s easy and straightforward, right?

Let’s take a look at your second file with data:

# Your data in the text file
# Value1 Value2 Value3
# 0.4839 0.4536 0.3561
# 0.1292 0.6875 MISSING
# 0.1781 0.3049 0.8928
# MISSING 0.5801 0.2038
# 0.5993 0.4357 0.7410

my_array2 = np.genfromtxt('data2.txt',
                          skip_header=1,
                          filling_values=-999)

You see that here, you resort to genfromtxt() to load the data. In this case, you have to
handle some missing values that are indicated by the 'MISSING' strings. Since the
genfromtxt() function converts character strings in numeric columns to nan , you can
convert these values to other ones by specifying the filling_values argument. In this
case, you choose to set the value of these missing values to -999.

If by any chance, you have values that don’t get converted to nan by genfromtxt() , there’s
always the missing_values argument that allows you to specify what the missing values of
your data exactly are.
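The same in-memory trick works for genfromtxt() . The sketch below stands in for data2.txt and, as one possible variant of the call above, names the 'MISSING' marker explicitly through missing_values so that it is clear which strings get replaced by the filling value.

```python
import io
import numpy as np

# In-memory stand-in for data2.txt
text = """Value1 Value2 Value3
0.4839 0.4536 0.3561
0.1292 0.6875 MISSING
0.1781 0.3049 0.8928
MISSING 0.5801 0.2038
0.5993 0.4357 0.7410
"""

# Name the missing-value marker explicitly and replace it with -999
my_array2 = np.genfromtxt(io.StringIO(text),
                          skip_header=1,
                          missing_values="MISSING",
                          filling_values=-999)
print(my_array2)
```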

But this is not all.


Tip: check out this page to see what other arguments you can add to import your data
successfully.

You now might wonder what the difference between these two functions really is.

The examples indicated this maybe implicitly, but, in general, genfromtxt() gives you a
little bit more flexibility; it’s more robust than loadtxt() .

Let’s make this difference a little bit more practical: the latter, loadtxt() , only works when
each row in the text file has the same number of values, so when you want to handle missing
values easily, you’ll typically find it easier to use genfromtxt() .

But this is definitely not the only reason.

A brief look at the number of arguments that genfromtxt() has to offer will teach you that
there are really a lot more things that you can specify in your import, such as the maximum
number of rows to read or the option to automatically strip white spaces from variables.

How To Save NumPy Arrays

Once you have done everything that you need to do with your arrays, you can also save them
to a file. If you want to save the array to a text file, you can use the savetxt() function to
do this:

import numpy as np
x = np.arange(0.0,5.0,1.0)
np.savetxt('test.out', x, delimiter=',')

Remember that np.arange() creates a NumPy array of evenly-spaced values. The third
value that you pass to this function is the step value.

There are, of course, other ways to save your NumPy arrays to files. Check out the
functions in the table below if you want to get your data to binary files or archives:

save()                Save an array to a binary file in NumPy .npy format

savez()               Save several arrays into an uncompressed .npz archive

savez_compressed()    Save several arrays into a compressed .npz archive

For more information or examples of how you can use the above functions to save your data,
go here or make use of one of the help functions that NumPy has to offer to get to know
more instantly!
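As a rough sketch of how saving and reading back could look, the example below writes to a temporary directory so that it doesn’t leave files behind. The file names and the second array `y` are made up for the illustration, and np.load() is used to read the files back in.

```python
import os
import tempfile
import numpy as np

x = np.arange(0.0, 5.0, 1.0)
y = np.ones((2, 2))

with tempfile.TemporaryDirectory() as tmp:
    # Single array in binary .npy format
    npy_path = os.path.join(tmp, "x.npy")
    np.save(npy_path, x)
    x_loaded = np.load(npy_path)

    # Several arrays in one .npz archive, retrieved by keyword name
    npz_path = os.path.join(tmp, "arrays.npz")
    np.savez(npz_path, x=x, y=y)
    with np.load(npz_path) as archive:
        y_loaded = archive["y"]

print(np.array_equal(x, x_loaded))
print(np.array_equal(y, y_loaded))
```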

Are you not sure what these NumPy help functions are?

No worries! You’ll learn more about them in one of the next sections!

How To Inspect Your NumPy Arrays

Besides the array attributes that have been mentioned above, namely, data , shape , dtype
and strides , there are some more that you can use to easily get to know more about your
arrays. The ones that you might find interesting to use when you’re just starting out are the
following:

# Print the number of `my_array`'s dimensions
print(my_array.ndim)

# Print the number of `my_array`'s elements
print(my_array.size)

# Print information about `my_array`'s memory layout
print(my_array.flags)

# Print the length of one array element in bytes
print(my_array.itemsize)

# Print the total consumed bytes by `my_array`'s elements
print(my_array.nbytes)

These are almost all the attributes that an array can have.

Don’t worry if you don’t feel that all of them are useful for you at this point. This is fairly
normal because, just like you read in the previous section, you’ll only get to worry about
memory when you’re working with large data sets.

Also note that, besides the attributes, you also have some other ways of gaining more
information on and even tweaking your array slightly:

# Print the length of `my_array`
print(len(my_array))

# Change the data type of `my_array`
# (astype() returns a new array; `my_array` itself is left unchanged)
my_array.astype(float)

Now that you have made your array, either by making one yourself with np.array() or
one of the initial placeholder functions, or by loading in your data through the loadtxt()
or genfromtxt() functions, it’s time to look more closely into the second key element that
really defines the NumPy library: scientific computing.

How NumPy Broadcasting Works


Before you go deeper into scientific computing, it might be a good idea to first go over what
broadcasting exactly is: it’s a mechanism that allows NumPy to work with arrays of different
shapes when you’re performing arithmetic operations.

To put it in a more practical context, you often have an array that’s somewhat larger and
another one that’s slightly smaller. Ideally, you want to use the smaller array multiple times
to perform an operation (such as a sum, multiplication, etc.) on the larger array.

To do this, you use the broadcasting mechanism.

However, there are some rules if you want to use it. And, before you already sigh, you’ll see
that these “rules” are very simple and kind of straightforward!

First off, to make sure that the broadcasting is successful, the dimensions of your arrays
need to be compatible. Two dimensions are compatible when they are equal. Consider the
following example:

# Import `numpy` as `np`
import numpy as np

# Initialize `x`
x = np.ones((3,4))

# Check shape of `x`
print(x.shape)

# Initialize `y`
y = np.random.random((3,4))

# Check shape of `y`
print(y.shape)

# Add `x` and `y`
x + y

Two dimensions are also compatible when one of them is 1:

# Import `numpy` as `np`
import numpy as np

# Initialize `x`
x = np.ones((3,4))

# Check shape of `x`
print(x.shape)

# Initialize `y`
y = np.arange(4)

# Check shape of `y`
print(y.shape)

# Subtract `x` and `y`
x - y

Note that if the dimensions are not compatible, you will get a ValueError .

Tip: also test what the size of the resulting array is after you have done the computations!
You’ll see that the size is actually the maximum size along each dimension of the input
arrays.

In other words, you see that the result of x-y gives an array with shape (3,4) : y had a
shape of (4,) and x had a shape of (3,4) . The maximum size along each dimension of x
and y is taken to make up the shape of the new, resulting array.
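To see both the resulting shape and the error for incompatible dimensions, you could try a sketch like the following. The array `bad` is a made-up example of a shape that does not fit.

```python
import numpy as np

x = np.ones((3, 4))
y = np.arange(4)          # shape (4,)

# Shapes are compared from the trailing dimension backwards: 4 matches 4,
# and the leading dimension that y lacks is treated as size 1
result = x - y
print(result.shape)

# Trailing dimensions 4 and 3 are neither equal nor 1, so this fails
bad = np.arange(3)
try:
    x - bad
except ValueError as err:
    print("ValueError:", err)
```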

Lastly, the arrays can only be broadcast together if they are compatible in all dimensions.
Consider the following example:

# Import `numpy` as `np`
import numpy as np

# Initialize `x` and `y`
x = np.ones((3,4))
y = np.random.random((5,1,4))

# Add `x` and `y`
x + y

You see that, even though x and y seem to have somewhat different dimensions, the two
can be added together.

That is because they are compatible in all dimensions:

Array x has dimensions 3 X 4,

Array y has dimensions 5 X 1 X 4

Since you have seen above that dimensions are also compatible if one of them is equal to 1,
you see that these two arrays are indeed a good candidate for broadcasting!

What you will notice is that in the dimension where y has size 1, and the other array has a
size greater than 1 (that is, 3), y behaves as if it were copied along that
dimension. In the same way, x is treated as if it had a leading dimension of size 1 and is
reused 5 times along it.

Note that the shape of the resulting array will again be the maximum size along each
dimension of x and y : the shape of the result will be (5,3,4) .

In short, if you want to make use of broadcasting, you will rely a lot on the shape and
dimensions of the arrays with which you’re working.
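A quick way to confirm this behaviour is to inspect the result, as in the sketch below. The name `total` is made up for the illustration.

```python
import numpy as np

x = np.ones((3, 4))
y = np.random.random((5, 1, 4))

total = x + y
print(total.shape)

# y is reused for each of the 3 positions along the middle axis,
# so every position there holds the same values: 1 + y[:, 0, :]
print(np.allclose(total[:, 0, :], total[:, 2, :]))
```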

But what if the dimensions are not compatible?

What if they are not equal or if one of them is not equal to 1?

You’ll have to fix this by manipulating your array! You’ll see how to do this in one of the next
sections.

How Do Array Mathematics Work?


You’ve seen that broadcasting is handy when you’re doing arithmetic operations. In this
section, you’ll discover some of the functions that you can use to do mathematics with
arrays.

As such, it probably won’t surprise you that you can just use + , - , * , / or % to add,
subtract, multiply, divide or calculate the remainder of two (or more) arrays. However, a big
part of why NumPy is so handy, is because it also has functions to do this. The equivalent
functions of the operations that you have seen just now are, respectively, np.add() ,
np.subtract() , np.multiply() , np.divide() and np.remainder() .

You can also easily do exponentiation (e raised to each element) and take the square root of
your arrays with np.exp() and np.sqrt() , or calculate the sines or cosines of your array
with np.sin() and np.cos() . Lastly, it’s also useful to mention that there’s a way for you to
calculate the natural logarithm with np.log() or calculate the dot product by applying dot() to
your array.

Try it all out in the DataCamp Light chunk below.

Just a tip: make sure to first check out the arrays that have been loaded for this exercise!

# Add `x` and `y`
np.add(x,y)

# Subtract `x` and `y`
np.subtract(x,y)

# Multiply `x` and `y`
np.multiply(x,y)

# Divide `x` and `y`
np.divide(x,y)

# Calculate the remainder of `x` and `y`
np.remainder(x,y)

Remember how broadcasting works? Check out the dimensions and the shapes of both x
and y in your IPython shell. Are the rules of broadcasting respected?
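If you are working outside of the interactive session, you can define your own inputs. In the sketch below, the values of `x` and `y` are assumptions chosen so that `y` broadcasts over the rows of `x` ; they are not the arrays that were loaded for the exercise.

```python
import numpy as np

# Assumed example inputs: a (2, 3) array and a (3,) array that broadcasts over it
x = np.array([[2.0, 4.0, 6.0], [8.0, 10.0, 12.0]])
y = np.array([1.0, 2.0, 3.0])

# Each operator has an equivalent function
print(np.array_equal(x + y, np.add(x, y)))
print(np.array_equal(x - y, np.subtract(x, y)))
print(np.array_equal(x * y, np.multiply(x, y)))
print(np.divide(x, y))
print(np.remainder(x, y))
```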

But there is more.

Check out this small list of aggregate functions:

a.sum()              Array-wise sum

a.min()              Array-wise minimum value

b.max(axis=0)        Maximum value of each column (taken along axis 0)

b.cumsum(axis=1)     Cumulative sum of the elements along each row

a.mean()             Mean

np.median(b)         Median

np.corrcoef(a)       Correlation coefficient

np.std(b)            Standard deviation
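The sketch below tries these aggregate functions on two small arrays. The values of `a` and `b` are assumptions for the sake of illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(a.sum(), a.min(), a.mean())
print(b.max(axis=0))        # maximum of each column
print(b.cumsum(axis=1))     # running total along each row
print(np.median(b), np.std(b))
print(np.corrcoef(b))       # correlation between the rows of b
```

Note that the median and the correlation coefficient are only available as functions ( np.median() , np.corrcoef() ), not as array methods.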

Besides all of these functions, you might also find it useful to know that there are
mechanisms that allow you to compare array elements. For example, if you want to check
whether the elements of two arrays are the same, you might use the == operator. To check
whether the array elements are smaller or bigger, you use the < or > operators.

This all seems quite straightforward, yes?

However, you can also compare entire arrays with each other! In this case, you use the
np.array_equal() function. Just pass in the two arrays that you want to compare with each
other, and you’re done.

Note that, besides comparing, you can also perform logical operations on your arrays. You
can start with np.logical_or() , np.logical_not() and np.logical_and() . These
basically work like your typical OR, NOT and AND logical operations.

In the simplest example, OR returns True for a position when at least one of the two array
elements is 1; if both of them are 0, you get False . AND returns True only when both
elements are 1, and NOT works on a single array: it returns True where the element is 0.

Test this out in the code chunk below:

# `a` AND `b`
np.logical_and(a, b)

# `a` OR `b`
np.logical_or(a, b)

# NOT `a` (the function takes a single input array)
np.logical_not(a)
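With two small boolean arrays, the logical functions and the comparisons discussed above can be checked as in the sketch below. The values of `a` and `b` are assumptions that cover all four combinations of True and False.

```python
import numpy as np

a = np.array([1, 1, 0, 0], dtype=bool)
b = np.array([1, 0, 1, 0], dtype=bool)

print(np.logical_and(a, b))
print(np.logical_or(a, b))
print(np.logical_not(a))

# Element-wise comparison versus whole-array comparison
print(a == b)
print(np.array_equal(a, b))
```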

How To Subset, Slice, And Index Arrays


Besides mathematical operations, you might also consider taking just a part of the original
array (or the resulting array) or just some array elements to use in further analysis or other
operations. In such cases, you will need to subset, slice and/or index your arrays.

These operations are very similar to when you perform them on Python lists. If you want to
check out the similarities for yourself, or if you want a more elaborate explanation, you
might consider checking out DataCamp’s Python list tutorial.

If you have no clue at all on how these operations work, it suffices for now to know these
two basic things:

You use square brackets [] as the index operator, and

Generally, you pass integers to these square brackets, but you can also put a colon : or a
combination of the colon with integers in it to designate the elements/rows/columns you
want to select.

Besides these two points, the easiest way to see how this all fits together is by looking
at some examples of subsetting:

# Select the element at the 1st index
print(my_array[1])

# Select the element at row 1 column 2
print(my_2d_array[1][2])

# Select the element at row 1 column 2
print(my_2d_array[1,2])

# Select the element at index 1 of the first axis, row 1, column 2
print(my_3d_array[1,1,2])

Something a little bit more advanced than subsetting, if you will, is slicing. Here, you
consider not just particular values of your arrays, but you go to the level of rows and
columns. You’re basically working with “regions” of data instead of pure “locations”.

You can see what is meant with this analogy in these code examples:

# Select items at index 0 and 1
print(my_array[0:2])

# Select items at row 0 and 1, column 1
print(my_2d_array[0:2,1])

# Select items at row 1
# This is the same as saying `my_3d_array[1,:,:]`
print(my_3d_array[1,...])

You’ll see that, in essence, the following holds:

a[start:end] # items start through the end (but the end is not included!)
a[start:] # items start through the rest of the array
a[:end] # items from the beginning through the end (but the end is not included!)
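The slicing rules above can be tried out on a small array, as in the sketch below. The arrays `a` and `m` hold made-up values.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
print(a[1:3])    # items at index 1 and 2; index 3 is not included
print(a[3:])     # items from index 3 through the rest of the array
print(a[:2])     # items from the beginning up to, but not including, index 2

# The same notation works per axis on a 2-D array
m = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(m[0:2, 1])  # rows 0 and 1, column 1
```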

Lastly, there’s also indexing. When it comes to NumPy, there are boolean indexing and
advanced or “fancy” indexing.

(In case you’re wondering, this is true NumPy jargon, I didn’t make the last one up!)

First up is boolean indexing. Here, instead of selecting elements, rows or columns based on
index number, you select those values from your array that fulfill a certain condition.

Putting this into code can be pretty easy:

# Try out a simple example
print(my_array[my_array<2])

# Specify a condition
bigger_than_3 = (my_3d_array >= 3)

# Use the condition to index our 3d array
print(my_3d_array[bigger_than_3])

Note that, to specify a condition, you can also make use of the logical operators | (OR) and
& (AND). If you would want to rewrite the condition above in such a way (which would be
inefficient, but I demonstrate it here for educational purposes :)), you would get
bigger_than_3 = (my_3d_array > 3) | (my_3d_array == 3) .

With the arrays that have been loaded in, there aren’t too many possibilities, but with arrays
that contain for example, names or capitals, the possibilities could be endless!
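The sketch below shows boolean indexing on a small 2-D array. The array `arr` and its values are assumptions for the sake of illustration.

```python
import numpy as np

# Assumed example values
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

mask = arr >= 3                 # boolean array with the same shape as arr
print(arr[mask])                # 1-D array of the values where mask is True

# The same condition built from two parts combined with | (OR)
same_mask = (arr > 3) | (arr == 3)
print(np.array_equal(mask, same_mask))

# Combine conditions with & (AND); the parentheses are required
print(arr[(arr > 2) & (arr < 7)])
```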

When it comes to fancy indexing, what you basically do is the following: you pass
a list or an array of integers to specify the order of the subset of rows you want to select out
of the original array.

Does this sound a little bit abstract to you?

No worries, just try it out in the code chunk below:

# Select elements at (1,0), (0,1), (1,2) and (0,0)
print(my_2d_array[[1, 0, 1, 0],[0, 1, 2, 0]])

# Select a subset of the rows and columns
print(my_2d_array[[1, 0, 1, 0]][:,[0,1,2,0]])

Now, the second statement might seem to make less sense to you at first sight. This is
normal. It might make more sense if you break it down:

If you just execute my_2d_array[[1,0,1,0]] , the result is the following:

array([[5, 6, 7, 8],
       [1, 2, 3, 4],
       [5, 6, 7, 8],
       [1, 2, 3, 4]])

What the second part, namely [:,[0,1,2,0]], tells you is that you want to keep all the
rows of this result, but that you want to change the order of the columns around a bit. You
want to display the columns 0, 1, and 2 as they are right now, but you want to repeat
column 0 as the last column instead of displaying column number 3. This will give you the
following result:

array([[5, 6, 7, 5],
[1, 2, 3, 1],
[5, 6, 7, 5],
[1, 2, 3, 1]])

Advanced indexing clearly holds no secrets for you any more!
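
The difference between the two fancy-indexing statements can be checked side by side. This is a sketch that assumes `my_2d_array` holds the values implied by the output shown above; the call to `np.ix_` is an addition that is not used in the tutorial itself.

```python
import numpy as np

# Assumed to match the tutorial's `my_2d_array`, based on the output shown above
my_2d_array = np.array([[1, 2, 3, 4],
                        [5, 6, 7, 8]])

# Two index lists are paired element by element: (1,0), (0,1), (1,2), (0,0)
pairs = my_2d_array[[1, 0, 1, 0], [0, 1, 2, 0]]
print(pairs)

# Indexing rows first and columns second selects a full sub-grid instead
rows = my_2d_array[[1, 0, 1, 0]]
grid = rows[:, [0, 1, 2, 0]]
print(grid)

# np.ix_ builds the same sub-grid in a single indexing step
grid_ix = my_2d_array[np.ix_([1, 0, 1, 0], [0, 1, 2, 0])]
print(grid_ix)
```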

How To Ask For Help


As a short intermezzo, you should know that you can always ask for more information about
the modules, functions or classes that you’re working with, especially because NumPy can be
quite something when you first get started on working with it.

Asking for help is fairly easy.

You just make use of the specific help functions that NumPy offers to set you on your way:

Use lookfor() to do a keyword search on docstrings. This is especially handy if you’re
just starting out, as the ‘theory’ behind it all might fade in your memory. The one downside
is that you have to go through all of the search results if your query is not that specific, as
is the case in the code example below, which can make the output hard to scan. Note that
np.lookfor() was removed in NumPy 2.0, so it is only available in older versions.

Use info() for quick explanations and code examples of functions, classes, or modules. If
you’re a person that learns by doing, this is the way to go! The only downside of using
this function is that you need to know the module or class to which a certain
attribute or function belongs. If you don’t know immediately what is meant by that, check
out the code example below.

You see, both functions have their advantages and disadvantages, but you’ll see for yourself
why both of them can be useful: try them out for yourself in the DataCamp Light code chunk
below!

# Look up info on `mean` with `np.lookfor()`
# (it prints its results itself, so no `print()` is needed)
np.lookfor("mean")

# Get info on data types with `np.info()`
np.info(np.ndarray.dtype)

Note that you indeed need to know that dtype is an attribute of ndarray . Also, make sure
that you don’t forget to put np in front of the modules, classes or terms you’re asking
information about, otherwise you will get an error message like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'ndarray' is not defined
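
If you want to keep the help text around instead of only printing it, a minimal sketch is to pass a file-like object to the `output` parameter of `np.info()`. The variable names `buffer` and `summary` are made up for this illustration.

```python
import io
import numpy as np

# np.info() writes to standard output by default
np.info(np.ndarray.dtype)

# The `output` parameter accepts a file-like object, which captures the text
buffer = io.StringIO()
np.info(np.mean, output=buffer)
summary = buffer.getvalue()

# Show only the beginning of the captured help text
print(summary[:200])
```

Python’s built-in help() function works on NumPy objects as well, for example help(np.mean).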

You now know how to ask for help, and that’s a good thing. The next topic that this NumPy
tutorial covers is array manipulation.

You can certainly work through this topic on your own, but some of the functions might
raise questions. For example, what is the difference between resizing and reshaping?

And what is the difference between stacking your arrays horizontally and vertically?

The next section is all about answering these questions, but if you ever feel in doubt, feel free
to use the help functions that you have just seen to quickly get up to speed.

How To Manipulate Arrays


Performing mathematical operations on your arrays is one of the things that you’ll be doing,
but to make these operations and broadcasting work, it is probably most important to know how to
manipulate your arrays.

Below are some of the most common manipulations that you’ll be doing.

How To Transpose Your Arrays

Transposing an array permutes its dimensions. Or, in other
words, you switch around the shape of the array. Let’s take a small example to show you the
effect of transposition:

# Print `my_2d_array`
print(my_2d_array)

# Transpose `my_2d_array`
print(np.transpose(my_2d_array))

# Or use `T` to transpose `my_2d_array`
print(my_2d_array.T)

Tip: if the visual comparison between the array and its transposed version is not entirely
clear, inspect the shape of the two arrays to make sure that you understand why the
dimensions are permuted.

Note that there are two ways to transpose: the np.transpose() function and the T attribute.
Both give the same result for a plain transpose. T is more of a convenience shorthand,
while np.transpose() gives you more flexibility because it accepts an axes argument. That’s why it’s recommended
to make use of this function if you want to pass more arguments.
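
To illustrate the extra flexibility, here is a minimal sketch that passes the `axes` argument to np.transpose(). The 3-D array `a` is a made-up example, not one of the tutorial’s arrays.

```python
import numpy as np

# Hypothetical 3-D array with shape (2, 3, 4), used only for illustration
a = np.arange(24).reshape(2, 3, 4)

# Without arguments, both forms reverse the order of the axes
print(a.T.shape)
print(np.transpose(a).shape)

# With `axes`, you choose the permutation yourself: swap the first two axes
swapped = np.transpose(a, axes=(1, 0, 2))
print(swapped.shape)
```

After the swap, the element at position [j, i, k] of `swapped` is the element at [i, j, k] of `a`.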

All is well when you transpose arrays that are bigger than one dimension, but what happens
when you just have a 1-D array? Will there be any effect, you think?

Try it out for yourself in the code chunk below. Your 1-D array has already been loaded in:

# Print `my_array`
print(my_array)

# Transpose `my_array`
print(np.transpose(my_array))

# Or use `T` to transpose `my_array`
print(my_array.T)

You’re absolutely right! There is no effect when you transpose a 1-D array!
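
If you do need a column version of a 1-D array, one option is to add an axis first. This is a sketch with a made-up array `v`; using np.newaxis here is an addition to the tutorial’s material.

```python
import numpy as np

# Hypothetical 1-D array with shape (4,)
v = np.array([1, 2, 3, 4])

# Transposing a 1-D array leaves the shape unchanged
print(v.T.shape)

# Add an axis to get an explicit row or column
row = v[np.newaxis, :]
col = v[:, np.newaxis]
print(row.shape)
print(col.shape)

# A 2-D row does change when transposed
print(row.T.shape)
```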

Reshaping Versus Resizing Your Arrays


You might have read in the broadcasting section that the dimensions of your arrays need to
be compatible if you want them to be good candidates for arithmetic operations. But the
question of what you should do when that is not the case, was not answered yet.

Well, this is where you get the answer!

What you can do if the arrays don’t have the same dimensions is resize your array. The
np.resize() function returns a new array that has the shape that you passed to it. If
you pass your original array together with the new dimensions, and if that new array is larger
than the one that you originally had, the new array will be filled with copies of the original
array that are repeated as many times as is needed.

However, if you call the resize() method on the array itself, as in x.resize((6,4)), the
array is changed in place and the extra elements are filled with zeros.

Let’s try this out with an example:

# Print the shape of `x`
print(x.shape)

# Resize `x` to ((6,4))
np.resize(x, (6,4))

# Try out this as well
x.resize((6,4))

# Print out `x`
print(x)
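
The difference between the function and the method is easier to see with distinct values. The sketch below uses a small made-up array instead of the tutorial’s `x`. Passing `refcheck=False` is an assumption made for convenience, because the in-place method can otherwise raise a ValueError when other references to the array exist.

```python
import numpy as np

# Hypothetical array for illustration; .copy() ensures it owns its data
x = np.arange(1, 7).reshape(2, 3).copy()

# np.resize() returns a new array and repeats the data to fill it
repeated = np.resize(x, (3, 4))
print(repeated)

# The resize() method changes `x` in place and pads with zeros
x.resize((3, 4), refcheck=False)
print(x)
```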

Besides resizing, you can also reshape your array. This means that you give a new shape to
an array without changing its data. The key to reshaping is to make sure that the total size of
the new array is unchanged. If you take the example of array x that was used above, which
has a size of 3 X 4 or 12, you have to make sure that the new array also has a size of 12.
Psst… If you want to calculate the size of an array with code, make sure to use the size
attribute: x.size or x.reshape((2,6)).size :

# Print the size of `x` to see what's possible
print(x.size)

# Reshape `x` to (2,6)
print(x.reshape((2,6)))

# Flatten `x`
z = x.ravel()

# Print `z`
print(z)

If all else fails, you can also append an array to your original one or insert or delete array
elements to make sure that your dimensions fit with the other array that you want to use for
your computations.

Another operation that you might keep handy when you’re changing the shape of arrays is
ravel(). This function allows you to flatten your arrays. This means that if you ever have
2D, 3D or n-D arrays, you can just use this function to flatten it all out to a 1-D array.

Pretty handy, isn’t it?
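
The size rule for reshaping can be verified directly. This is a minimal sketch in which `x` is assumed to be a 3 x 4 array like the one used above; the use of -1 to let NumPy infer a dimension is an addition to the tutorial’s material.

```python
import numpy as np

# Assumed stand-in for the tutorial's `x`: 3 x 4, so 12 elements
x = np.ones((3, 4))

# Any shape whose product is 12 works; -1 lets NumPy infer one dimension
print(x.reshape((2, 6)).shape)
print(x.reshape((4, -1)).shape)

# A shape with a different total size raises a ValueError
reshape_failed = False
try:
    x.reshape((5, 3))
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)

# ravel() flattens the array to 1-D
flat = x.ravel()
print(flat.shape)
```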

How To Append Arrays

When you append arrays to your original array, they are “glued” to the end of that original
array. If you want to make sure that what you append does not come at the end of the array,
you might consider inserting it. Go to the next section if you want to know more.

Appending is a pretty easy thing to do thanks to the NumPy library; you can just make use of
the np.append() function.



Check how it’s done in the code chunk below. Don’t forget that you can always check which
arrays are loaded in by typing, for example, my_array in the IPython shell and pressing
ENTER.

# Append a 1D array to your `my_array`
new_array = np.append(my_array, [7, 8, 9, 10])

# Print `new_array`
print(new_array)

# Append an extra column to your `my_2d_array`
new_2d_array = np.append(my_2d_array, [[7], [8]], axis=1)

# Print `new_2d_array`
print(new_2d_array)

Note how, when you append an extra column to my_2d_array, the axis is specified.
Remember that axis 1 indicates the columns, while axis 0 indicates the rows in 2-D arrays.
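
The effect of the `axis` argument shows up clearly in the resulting shapes. The sketch below defines its own `my_2d_array` as an assumed stand-in for the array loaded in the tutorial.

```python
import numpy as np

# Assumed stand-in for the tutorial's 2-D array
my_2d_array = np.array([[1, 2, 3, 4],
                        [5, 6, 7, 8]])

# Without `axis`, both inputs are flattened before appending
flat = np.append(my_2d_array, [9, 10])
print(flat.shape)

# axis=0 adds a row; the new values need the same number of columns
with_row = np.append(my_2d_array, [[9, 10, 11, 12]], axis=0)
print(with_row.shape)

# axis=1 adds a column; the new values need the same number of rows
with_col = np.append(my_2d_array, [[7], [8]], axis=1)
print(with_col.shape)

# np.append() returns a new array; the original keeps its shape
print(my_2d_array.shape)
```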

How To Insert And Delete Array Elements

Next to appending, you can also insert and delete array elements. As you might have guessed
by now, the functions that will allow you to do these operations are np.insert() and
np.delete() :

# Insert `5` at index 1
np.insert(my_array, 1, 5)

# Delete the value at index 1
np.delete(my_array,[1])

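Both functions return a new array and leave the original untouched, which the following minimal sketch shows. The values in `my_array` are assumed for illustration.

```python
import numpy as np

# Assumed stand-in for the tutorial's 1-D array
my_array = np.array([1, 2, 3, 4])

# Insert the value 5 before index 1
inserted = np.insert(my_array, 1, 5)
print(inserted)

# Delete the value at index 1
deleted = np.delete(my_array, [1])
print(deleted)

# The original array is unchanged
print(my_array)
```
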
How To Join And Split Arrays

You can also ‘merge’ or join your arrays. There are a bunch of functions that you can use for
that purpose and most of them are listed below.

Try them out, but also make sure to test out what the shape of the arrays is in the IPython
shell. The arrays that have been loaded are x , my_array , my_resized_array and
my_2d_array .

# Concatenate `my_array` and `x`
print(np.concatenate((my_array,x)))

# Stack arrays row-wise
print(np.vstack((my_array, my_2d_array)))

# Stack arrays row-wise
print(np.r_[my_resized_array, my_2d_array])

# Stack arrays horizontally
print(np.hstack((my_resized_array, my_2d_array)))

# Stack arrays column-wise
print(np.column_stack((my_resized_array, my_2d_array)))

# Stack arrays column-wise
print(np.c_[my_resized_array, my_2d_array])

You’ll note a few things as you go through the functions:

The number of dimensions needs to be the same if you want to concatenate two arrays
with np.concatenate() . As such, if you want to concatenate an array with my_array ,
which is 1-D, you’ll need to make sure that the second array that you have, is also 1-D.

With np.vstack(), you effortlessly combine my_array with my_2d_array. You just have
to make sure that, as you’re stacking the arrays row-wise, the number of columns in
both arrays is the same. As such, you could also add an array with shape (2,4) or (3,4)
to my_2d_array, as long as the number of columns matches. Stated differently, the arrays
must have the same shape along all but the first axis. The same also holds when you
want to use np.r_[].

For np.hstack(), you have to make sure that the number of dimensions is the same and
that the number of rows in both arrays is the same. That means that you could stack arrays
such as (2,3) or (2,4) to my_2d_array, which itself has a shape of (2,4). Anything is
possible as long as you make sure that the number of rows matches. This function is still
supported by NumPy, but you should prefer np.concatenate() or np.stack().

With np.column_stack(), you have to make sure that the arrays that you input have the
same first dimension. In this case, both shapes are the same, but if my_resized_array
were to be (2,1) or (2,), the arrays still would have been stacked.

np.c_[] is another way to concatenate. Here also, the first dimension of both arrays
needs to match.
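
The shape rules listed above can be checked by printing the shape of each result. This is a sketch with two made-up arrays, `a` and `b`, standing in for the tutorial’s 1-D and 2-D arrays.

```python
import numpy as np

# Hypothetical stand-ins: a 1-D array and a 2-D array
a = np.array([1, 2, 3, 4])        # shape (4,)
b = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])      # shape (2, 4)

# Concatenating two 1-D arrays gives a longer 1-D array
print(np.concatenate((a, a)).shape)

# Row-wise stacking: the number of columns must match
print(np.vstack((a, b)).shape)
print(np.r_[b, b].shape)

# Column-wise stacking: the number of rows must match
print(np.hstack((b, b)).shape)
print(np.c_[b, b].shape)

# column_stack() also accepts a 1-D array and treats it as a column
print(np.column_stack((b, np.array([9, 10]))).shape)
```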

When you have joined arrays, you might also want to split them at some point. Just like you
can stack them horizontally or vertically, you can also split them horizontally or vertically. You use
np.hsplit() and np.vsplit(), respectively:

# Split `my_stacked_array` horizontally into 2 equal parts
print(np.hsplit(my_stacked_array, 2))

# Split `my_stacked_array` vertically into 2 equal parts
print(np.vsplit(my_stacked_array, 2))

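What you need to keep in mind when you’re using both of these split functions is
the shape of your array. Let’s take the above case as an example: my_stacked_array has a
shape of (2,8). If you pass an integer as the second argument, the array is split into that
many equal parts, so the integer has to divide the size of the axis evenly: here you get two
arrays of shape (2,4) with np.hsplit() and two arrays of shape (1,8) with np.vsplit(). If you
want to select the index at which the split occurs, pass a list such as [2] instead.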
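
The following minimal sketch contrasts splitting into equal parts with splitting at an index. The array `stacked` is a made-up (2,8) array standing in for `my_stacked_array`.

```python
import numpy as np

# Hypothetical (2, 8) array standing in for `my_stacked_array`
stacked = np.arange(16).reshape(2, 8)

# An integer splits the array into that many equal parts
left, right = np.hsplit(stacked, 2)
top, bottom = np.vsplit(stacked, 2)
print(left.shape, right.shape)
print(top.shape, bottom.shape)

# A list of indices splits the array at those positions
first, rest = np.hsplit(stacked, [2])
print(first.shape, rest.shape)

# An integer that does not divide the axis evenly raises a ValueError
uneven_failed = False
try:
    np.hsplit(stacked, 3)
except ValueError as err:
    uneven_failed = True
    print("Cannot split:", err)
```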

How To Visualize NumPy Arrays


Lastly, something that will definitely come in handy is to know how you can plot your arrays.
This can especially be handy in data exploration, but also in later stages of the data science
workflow, when you want to visualize your arrays.

With np.histogram()

Contrary to what the function name might suggest, the np.histogram() function doesn’t draw
the histogram; it computes the number of array elements that fall within each bin.
This will determine the area that each bar of your histogram takes up.

What you pass to the np.histogram() function then is first the input data or the array that
you’re working with. The array will be flattened when the histogram is computed.

# Import `numpy` as `np`
import numpy as np

# Initialize your array
my_3d_array = np.array([[[1,2,3,4], [5,6,7,8]], [[1,2,3,4], [9,10,11,12]]], dtype=np.int64)

# Pass the array to `np.histogram()`
print(np.histogram(my_3d_array))

# Specify the bin edges
print(np.histogram(my_3d_array, bins=range(0,13)))

You’ll see that as a result, the histogram will be computed: the first array lists the number
of elements that fall within each bin, while the second array lists the bin edges. If
you don’t specify any bins, NumPy uses 10 equal-width bins between the minimum and the
maximum of your data, so the edges are floats.
If you do specify the bins, as in the second call, the result of the computation will be different: the floats
will be gone and you’ll see all integers for the bin edges.

There are still some other arguments that you can specify that can influence the histogram
that is computed. You can find all of them in the documentation of np.histogram().
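
To make the two return values concrete, the sketch below unpacks them into separate names. The values in `data` are the same as in the tutorial’s array; the variable names are made up for this illustration.

```python
import numpy as np

# Same values as the tutorial's 3-D array
data = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]],
                 [[1, 2, 3, 4], [9, 10, 11, 12]]])

# Default: 10 equal-width bins, so there are 11 bin edges
counts, edges = np.histogram(data)
print(len(counts), len(edges))

# Explicit integer bin edges 0, 1, ..., 12 give 12 bins
counts_int, edges_int = np.histogram(data, bins=range(0, 13))
print(counts_int)
print(edges_int)
```

Every bin includes its left edge and excludes its right edge, except the last bin, which includes both. That is why the values 11 and 12 are counted together in the final bin of the second call.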

But what is the point of computing such a histogram if you can’t visualize it?

Visualization is a piece of cake with the help of Matplotlib, but you don’t need
np.histogram() to compute the histogram. plt.hist() does this by itself when you pass
it the (flattened) data and the bins:

# Import numpy and matplotlib
import numpy as np
import matplotlib.pyplot as plt

# Construct the histogram with a flattened 3d array and a range of bins
plt.hist(my_3d_array.ravel(), bins=range(0,13))

# Add a title to the plot
plt.title('Frequency of My 3D Array Elements')

# Show the plot
plt.show()

The above code will then give you the following (basic) histogram:



Using np.meshgrid()

Another way to (indirectly) visualize your array is by using np.meshgrid() . The problem
that you face with arrays is that you need 2-D arrays of x and y coordinate values. With the
above function, you can create a rectangular grid out of an array of x values and an array of y
values: the np.meshgrid() function takes two 1D arrays and produces two 2D matrices
corresponding to all pairs of (x, y) in the two arrays. Then, you can use these matrices to
make all sorts of plots.
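
To see what the two returned matrices look like, here is a minimal sketch with tiny inputs. The arrays `x` and `y` are made up for illustration.

```python
import numpy as np

# Hypothetical coordinate values: 3 x-values and 2 y-values
x = np.array([0, 1, 2])
y = np.array([10, 20])

# Both outputs have shape (len(y), len(x)), here (2, 3)
xs, ys = np.meshgrid(x, y)
print(xs)   # each row repeats the x-values
print(ys)   # each column repeats the y-values

# Element-wise operations now evaluate a function at every (x, y) pair
z = xs + ys
print(z)
```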

np.meshgrid() is particularly useful if you want to evaluate functions on a grid, as the code
below demonstrates:

# Import NumPy and Matplotlib
import numpy as np
import matplotlib.pyplot as plt

# Create an array
points = np.arange(-5, 5, 0.01)

# Make a meshgrid
xs, ys = np.meshgrid(points, points)
z = np.sqrt(xs ** 2 + ys ** 2)

# Display the image on the axes
plt.imshow(z, cmap=plt.cm.gray)

# Draw a color bar
plt.colorbar()

# Show the plot
plt.show()

The code above gives the following result:



Beyond Data Analysis with NumPy
Congratulations, you have reached the end of the NumPy tutorial!

You have covered a lot of ground, so now you have to make sure to retain the knowledge that
you have gained. Don’t forget to get your copy of DataCamp’s NumPy cheat sheet to support
you in doing this!

After all this theory, it’s also time to get some more practice with the concepts and
techniques that you have learned in this tutorial. One way to do this is to go back to the
scikit-learn tutorial and start experimenting further with the data arrays that are used
to build machine learning models.

If this is not your cup of tea, check again whether you have downloaded Anaconda. Then, get
started with NumPy arrays in Jupyter with this Definitive Guide to Jupyter Notebook. Also
make sure to check out this Jupyter Notebook, which also guides you through data analysis
in Python with NumPy and some other libraries in the interactive data science environment
of the Jupyter Notebook.

Lastly, consider checking out DataCamp’s courses on data manipulation and visualization.
Especially our latest courses in collaboration with Continuum Analytics will definitely
interest you! Take a look at the Manipulating DataFrames with Pandas or the Pandas
Foundations courses.

COMMENTS

nicegreatest7asan
04/03/2018 04:49 PM

there is a mix up when explaining np.linspace() and np.arange(), np.linspace takes number of
samples and np.arange() takes the step value

Joshua Brengel
09/05/2018 12:42 AM

Nice tutorial. In the "How NumPy Broadcasting Works" section, when talking about array "x"
having dimensions 3x4 and "y" dimensions 5x1x4, is that not the shape of the arrays as opposed
to the dimensions? x.ndim gives me 2 and y.ndim gives me 3. I'm trying to learn more about arrays
and I was a bit confused by this. Are shape and dimension sometimes used interchangeably?
Does this mean that if any of the arrays being broadcasted has a 1 in its shape (as opposed to
dimension), it is possible to broadcast without an error? Thanks!

Gabriel Shih
04/09/2018 12:44 PM

The comments on np.hsplit and np.vsplit are not very correct. As the second arguments are 2,
my_stacked_array will be split into two sections of equal sizes. It will be split at the 2nd index if the
second argument is [2].

Rufaro Sithole
05/10/2018 07:52 PM

I have not gone far with this tutorial but I have searched for the best, this is what I found

Kalyan Chatterjea
17/10/2018 07:00 PM

Thank you for this fantastic tutorial. With your permission, I would like to provide my students a
link to this page, so that they have a good understanding of Numpy Arrays.

Piyush Yadav
12/11/2018 06:02 PM

You can share this tutorial to anyone you want there's no need of taking permission regarding
this.

muaadh Abdo
01/11/2018 11:11 PM

thanks allot its very Nice tutorial

Haris Riaz
08/01/2019 06:15 AM

There is a bit of correction to be made in this tutorial in the part regarding splitting of arrays. In
both np.hsplit() and np.vsplit(), the second argument is in fact not the index at which to split the
array, but rather the equal number of parts in which the array must be split. This means we can't
just supply any arbitrary number as the second argument...it depends upon the shape of our array.
In this case my_stacked_array has shape (2,8) which means it can be split horizontally into two
arrays of equal shape (2,4) each or it can be split vertically into two arrays of equal shape (1,8)
each. This wasn't explained in the tutorial and I felt it important to highlight.

Mohammed Innat
20/03/2019 02:29 AM

Awesome overview on NumPy. Someone may also find this repository useful too for NumPy.

https://github.com/iphton/NumPy-Tutorials

Python Examples Org
15/05/2019 10:46 AM

Numpy is the most basic package one has to master. Be it image processing, deep-learning or
solving any mathematical problem. And the Python Examples provided here are very good to
learn Python Numpy. Thanks for this detailed examples.

Grant Wilson
06/06/2019 02:41 PM

Thanks for the comprehensive and highly informative tutorial. NumPy is a must have for any
scientific project. We used it for Machine Learning with Python: from Linear Models to Deep
Learning (projects by post graduates at Massachusetts Institute of Technology). Python NumPy
Tutorial: learn with examples or buy essay tips.
