NumPy I
● 2001 - “expansion of statistics beyond theory into technical areas”
○ Internet, Big Data
● 2003 - Journal of Data Science
● 2008 - Data Scientist
● See Wikipedia
Python
Big Data
● Horizontal scalability
● Commodity hardware
● 2004 - Google MapReduce
○ Functional programming
○ map and reduce
● 2008 - Yahoo! Hadoop
○ HDFS
○ MapReduce
● Apache projects (Spark)
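The map and reduce idea can be sketched on a single machine with Python's built-in `map` and `functools.reduce`. This is only a toy illustration, not how Hadoop or Spark are implemented; the `chunks` data and the helper names `count_words` and `merge_counts` are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical input: each string stands in for one chunk of a larger dataset.
chunks = ["big data big", "data science", "big science"]


def count_words(chunk):
    """Map step: count the words of one chunk, independently of the others."""
    return Counter(chunk.split())


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


# Each chunk is processed on its own, so this step could be spread
# across many machines (horizontal scalability).
partial_counts = map(count_words, chunks)

# The partial results are then merged pairwise into a single answer.
total = reduce(merge_counts, partial_counts, Counter())

print(total["big"], total["data"], total["science"])  # 3 2 2
```

Because `count_words` never looks at shared state, the chunks can be handled in any order, which is what makes the pattern easy to distribute.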
Python
● Multidisciplinary (general-purpose) language
○ Web development
○ Operating systems
○ Data Science
● Features
○ Object oriented
■ Hiding vs removing state
○ Dynamic
○ Imperative
○ REPL (Read Evaluate Print Loop)
○ “batteries included”, “it just works”
Python
● Wide variety of libraries and a diverse ecosystem
● Highly interactive (REPL)
○ Fits nicely with Data Science requirements
○ Data exploration and visualization (Notebooks)
Python
Summary:
● Multidisciplinary (general-purpose) language
● Active and highly innovative community (“hackers”)
● Highly interactive
● Not originally oriented toward Data Science
○ This changed when NumPy appeared
NumPy
● Python library for “array programming” (numerical computation)
○ n-dimensional arrays
● It is the foundation of the whole Python Data Science ecosystem
● It originates from the unification of Numeric and Numarray in 2005
○ Both were earlier projects aimed at creating a numerical library for Python
● Paper available in Nature: Harris et al., “Array programming with NumPy” (2020)
● Used in physics, chemistry, astronomy, geoscience, biology, psychology, materials science, engineering, finance and economics
○ Ex. climate variables (precipitation, temperature, wind speed, ...)
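A minimal sketch of what “array programming” on an n-dimensional array looks like. The temperature values are made up for illustration.

```python
import numpy as np

# Hypothetical data: temperature in degrees Celsius,
# 2 days (rows) x 3 weather stations (columns).
temperature = np.array([[21.5, 19.0, 23.1],
                        [22.0, 18.4, 24.3]])

print(temperature.ndim)   # 2
print(temperature.shape)  # (2, 3)

# Array programming: one expression is applied to every element,
# with no explicit Python loop.
fahrenheit = temperature * 9 / 5 + 32

# Reductions can run along a chosen axis; axis=0 averages over the days.
mean_per_station = temperature.mean(axis=0)
```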
NumPy
● Why do we need NumPy to perform numerical computation in Python? We could just use lists and nested lists
● How are Python lists implemented?
○ Python docs
○ Blog
● Big O notation (Quicksort?)
● NumPy defines a data structure that is contiguous in memory and whose layout (element type and size) is fixed beforehand
● Locality of reference
● __getitem__, __setitem__, __delitem__
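The contrast between lists and arrays can be sketched as follows. The variable names are illustrative, and the exact exception raised when deleting an array element is treated as an assumption, so the code catches more than one type.

```python
import numpy as np

values = [1, 2, 3, 4]
arr = np.array(values, dtype=np.int64)

# A list holds references to separately allocated Python objects,
# so it can mix types and the objects need not be adjacent in memory.
mixed = [1, "two", 3.0]

# An ndarray stores the raw values in one block of memory,
# all with the same dtype, which favours locality of reference.
print(arr.flags["C_CONTIGUOUS"])  # True
print(arr.itemsize)               # 8 (bytes per element)
print(arr.nbytes)                 # 32 (4 elements * 8 bytes)

# Indexing syntax is sugar for the special methods.
assert values[2] == values.__getitem__(2)
assert arr[2] == arr.__getitem__(2)

arr[0] = 10    # calls arr.__setitem__(0, 10)
del values[0]  # calls values.__delitem__(0); the list shrinks

# An ndarray has a fixed size, so deleting an element in place fails.
try:
    del arr[0]
except (ValueError, TypeError) as exc:
    deletion_error = type(exc).__name__
```

To drop elements from an array, a new array is built instead, for example with `np.delete`, which returns a copy.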
NumPy
Data types
● int8, uint8 - i1, u1 - Signed and unsigned 8-bit (1 byte) integer types
● int32, uint32 - i4, u4 - Signed and unsigned 32-bit (4 byte) integer types
● int64, uint64 - i8, u8 - Signed and unsigned 64-bit (8 byte) integer types
● float32, float64 - f4, f8 - Single and double precision floating point
● bool - ? - Boolean, True and False
● object - O - Python object type
● string_ (removed in NumPy 2.0; use bytes_) - S - Fixed-length byte string type (1 byte per character). Ex. S10 (10 bytes)
● unicode_ (removed in NumPy 2.0; use str_) - U - Fixed-length Unicode type (4 bytes per character). Ex. U10 (40 bytes)
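A short sketch of how the type names and their short codes relate. The sample values are arbitrary.

```python
import numpy as np

# The short codes are aliases for the named types.
assert np.dtype("i1") == np.int8
assert np.dtype("u4") == np.uint32
assert np.dtype("f8") == np.float64
assert np.dtype("?") == np.bool_

# The dtype fixes how many bytes each element occupies.
small = np.array([1, 2, 3], dtype="i1")
print(small.dtype, small.itemsize)  # int8 1

# Fixed-length byte strings: longer values are truncated to fit.
names = np.array(["numpy", "a much longer string"], dtype="S10")
print(names.dtype.itemsize)  # 10
print(names[1])              # b'a much lon'

# astype converts to another dtype and returns a new array.
as_float = small.astype("f4")
print(as_float.dtype)        # float32
```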
NumPy
● Properties
○ dtype
○ shape
○ strides
● Views and copies
● Vectorization
○ Element-wise vectorization
○ CPU vectorization
● Broadcasting
● Fancy indexing
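The concepts listed above can be tried out on one small array. This is a minimal sketch; the array contents and variable names are arbitrary.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)

# Properties
print(a.dtype)    # int64
print(a.shape)    # (3, 4)
print(a.strides)  # (32, 8): bytes to step to the next row / next column

# Views and copies: basic slicing returns a view onto the same memory.
view = a[:, 1]
view[0] = 100           # also changes a[0, 1]
copy = a[:, 1].copy()
copy[0] = -1            # a is left untouched

# Element-wise vectorization: the loop over elements runs in compiled
# code, which may in turn use the CPU's vector (SIMD) instructions.
doubled = a * 2

# Broadcasting: shape (3, 4) combined with shape (4,);
# the 1-D array is applied to every row.
row_offsets = np.array([0, 10, 20, 30])
shifted = a + row_offsets

# Fancy indexing: index with a list of integers or a boolean mask.
# Unlike basic slicing, the result is a copy.
picked_rows = a[[0, 2]]
evens = a[a % 2 == 0]
```

`np.shares_memory(view, a)` is one way to check whether two arrays are backed by the same buffer.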
NumPy
Exercises