SRM INSTITUTE OF SCIENCE AND TECHNOLOGY,
CHENNAI.
21CSS101J – Programming for Problem Solving
Unit 5
LEARNING RESOURCES
S. No TEXT BOOKS
3. https://www.tutorialspoint.com/python/index.htm
4. https://www.w3schools.com/python/
UNIT V
(TOPICS COVERED)
UNIT-5
NumPy
(Numerical Python)
NumPy
- Stands for Numerical Python
- Is the fundamental package for high-performance computing and data analysis in Python
- Is important for numerical computation in Python because it is designed for efficiency on large arrays of data
- It provides:
  - ndarray, for creating multidimensional arrays
  - Internal storage of data in a contiguous block of memory, independent of other built-in Python objects, so arrays use much less memory than built-in Python sequences
  - Standard math functions for fast operations on entire arrays of data without having to write loops
- NumPy arrays are important because they let you express batch operations on data without writing any for loops. This is called vectorization.
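As a minimal sketch of vectorization (the array values and variable names below are made up for illustration), the same element-wise computation can be written with an explicit loop or as a single array expression:

```python
import numpy as np

# Hypothetical data, chosen only for illustration
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# Loop version: multiply element by element in pure Python
totals_loop = [p * q for p, q in zip(prices, quantities)]

# Vectorized version: one expression operates on the whole arrays
totals_vec = prices * quantities

print(totals_vec)        # [10. 40. 90.]
print(totals_vec.sum())  # 140.0
```

Both versions compute the same numbers; the vectorized form moves the loop into NumPy's compiled code.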
NumPy ndarray vs list
One of the key features of NumPy is its N-dimensional array object, or ndarray,
which is a fast, flexible container for large datasets in Python.
Whenever you see “array,” “NumPy array,” or “ndarray” in the text, with few
exceptions they all refer to the same thing: the ndarray object.
NumPy-based algorithms are generally 10 to 100 times faster (or more) than their
pure Python counterparts and use significantly less memory.
import numpy as np
my_arr = np.arange(1000000)
my_list = list(range(1000000))
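One way to compare the two containers is to time the same operation on each. The sketch below assumes doubling every element as the benchmark and uses the standard timeit module; the actual timings depend on the machine, so no specific speed-up is claimed here.

```python
import timeit
import numpy as np

my_arr = np.arange(1000000)
my_list = list(range(1000000))

# Time doubling every element, 10 repetitions each
t_arr = timeit.timeit(lambda: my_arr * 2, number=10)
t_list = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"ndarray: {t_arr:.4f} s")
print(f"list:    {t_list:.4f} s")
```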
ndarray
ndarray is used for storage of homogeneous data
i.e., all elements the same type
Every array must have a shape and a dtype
Supports convenient slicing, indexing and efficient vectorized
computation
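import numpy as np
data1 = [6, 7.5, 8, 0, 1]   # a list mixing ints and floats
arr1 = np.array(data1)      # NumPy picks one common dtype for all elements
print(arr1)
print(arr1.dtype)           # float64
print(arr1.shape)           # (5,)
print(arr1.ndim)            # 1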
NumPy
- Numerical Python
- Fast Computation with n-dimensional arrays
- Based around one data structure
- ndarray
- n-dimensional array
- import with import numpy as np
- Usage is np.command(xxx)
ndarrays
Creating ndarrays
Multidimensional arrays
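A minimal sketch of a multidimensional array, assuming a nested list of equal-length rows as input (the values are illustrative):

```python
import numpy as np

# A list of equal-length lists becomes a 2-D array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)

print(arr2.ndim)   # 2
print(arr2.shape)  # (2, 4)
print(arr2[1, 2])  # 7  (row 1, column 2)
```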
Operations between arrays and scalars
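Arithmetic between an array and a scalar, or between two arrays of the same shape, is applied element by element. A short sketch with illustrative values:

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

doubled = arr * 2       # every element multiplied by 2
zeros = arr - arr       # element-wise subtraction gives all zeros
inverse = 1 / arr       # scalar divided by each element
roots = arr ** 0.5      # element-wise square root

print(doubled)
print(zeros)
print(inverse)
print(roots)
```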
Array creation functions
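A few commonly used array creation functions, sketched with arbitrary sizes chosen for illustration:

```python
import numpy as np

zeros = np.zeros(5)          # 1-D array of five 0.0 values
ones = np.ones((2, 3))       # 2x3 array filled with 1.0
seq = np.arange(0, 10, 2)    # like range(): 0, 2, 4, 6, 8
ident = np.eye(3)            # 3x3 identity matrix
grid = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1

for a in (zeros, ones, seq, ident, grid):
    print(a)
```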
astype
Astype – string to float
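A minimal sketch of astype, assuming an array of strings that all represent valid numbers (the values are illustrative). astype always returns a new array with the requested dtype:

```python
import numpy as np

# Strings that represent numbers
numeric_strings = np.array(["1.25", "-9.6", "42"])

# Convert string -> float; raises ValueError if a string is not numeric
floats = numeric_strings.astype(float)
print(floats.dtype)  # float64
print(floats)

# Convert int -> float
ints = np.array([1, 2, 3])
as_float = ints.astype(np.float64)
print(as_float.dtype)  # float64
```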
Basic indexing and slicing (broadcasting)
The original array has changed
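The point that the original array changes can be sketched as follows: a slice of an ndarray is a view on the same memory, not a copy, so assigning to the slice modifies the original. The array contents here are illustrative.

```python
import numpy as np

arr = np.arange(10)      # values 0 through 9
arr_slice = arr[5:8]     # a view on elements 5, 6, 7 (no data is copied)

arr_slice[:] = 12        # the scalar 12 is broadcast to the whole slice
print(arr)               # elements 5, 6, 7 of the original are now 12

safe = arr[5:8].copy()   # an explicit copy is independent of arr
safe[:] = 0
print(arr)               # arr is unchanged by edits to the copy
```

Use .copy() whenever a slice must be modified without affecting the source array.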
https://slideplayer.com/slide/13118328/
NumPy Indexing
Where,
N is the total number of elements (or the total frequency of the distribution).
Parameters (as accepted by numpy.var()):
- a: Array containing the data whose variance is computed
- axis: Axis or axes along which the variance is computed
- dtype: Type to use in computing the variance
- out: Alternative output array in which to place the result
- ddof: Delta Degrees of Freedom; the divisor used in the calculation is N - ddof
- keepdims: If this is set to True, the axes which are reduced are left in the result as dimensions with size one
Example:
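A minimal sketch, assuming the parameters above describe numpy.var() (the dtype entry refers to computing the variance) and using made-up data. With ddof=0 the divisor is N (population variance); with ddof=1 it is N - 1 (sample variance).

```python
import numpy as np

data = np.array([[1.0, 2.0], [3.0, 4.0]])

var_all = np.var(data)                 # variance of all four elements
var_sample = np.var(data, ddof=1)      # divisor is N - 1 instead of N
var_cols = np.var(data, axis=0)        # one variance per column
var_rows = np.var(data, axis=1, keepdims=True)  # reduced axis kept with size 1

print(var_all)         # 1.25
print(var_sample)
print(var_cols)        # [1. 1.]
print(var_rows.shape)  # (2, 1)
```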
Example:
import pandas
mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}
myvar = pandas.DataFrame(mydataset)
print(myvar)
Pandas as pd
import pandas as pd
mydataset = {
    'cars': ["BMW", "Volvo", "Ford"],
    'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
Output:
0    1
1    7
2    2
dtype: int64
Labels
If nothing else is specified, the values are labeled with their index
number. First value has index 0, second value has index 1 etc.
This label can be used to access a specified value.
Example
Return the first value of the Series:
print(myvar[0])
Create Labels
With the index argument, you can name your own labels.
Example
Create your own labels:
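import pandas as pd
a = [1, 7, 2]
# "x", "y", "z" are example labels chosen for illustration
myvar = pd.Series(a, index=["x", "y", "z"])
print(myvar)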
Key/Value Objects as Series
You can also use a key/value object, like a dictionary, when
creating a Series.
Example
Create a simple Pandas Series from a dictionary:
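import pandas as pd
# Example dictionary (values assumed for illustration); the keys become the labels
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)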
To select only some of the items in the dictionary, use the index
argument and specify only the items you want to include in the
Series.
Example
Create a Series using only data from "day1" and "day2":
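import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}  # example data (assumed)
myvar = pd.Series(calories, index=["day1", "day2"])
print(myvar)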
Pandas DataFrame
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Data is aligned in a tabular fashion in rows and columns.
A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.
Data Frame Objects
Data sets in Pandas are usually multi-dimensional tables, called
DataFrames.
A Series is like a column; a DataFrame is the whole table.
Example
Create a DataFrame from a dictionary of two lists (each list becomes a column, much like a Series):
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
myvar = pd.DataFrame(data)
print(myvar)
What is a DataFrame?
A Pandas DataFrame is a 2 dimensional data structure, like a 2
dimensional array, or a table with rows and columns.
Example
Create a simple Pandas DataFrame:
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
# load the data into a DataFrame object
df = pd.DataFrame(data)
print(df)
Locate Row
As you can see from the result above, the DataFrame is like a
table with rows and columns.
Example
Return row 0:
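A minimal sketch of returning row 0 with the loc attribute, reusing the calories/duration data from the earlier examples:

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)

# loc with a single label returns that row as a Series
row0 = df.loc[0]
print(row0)

# loc with a list of labels returns a DataFrame
first_two = df.loc[[0, 1]]
print(first_two)
```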
Example
Add a list of names to give each row a name:
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
# the index argument gives each row a name
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df)
Locate Named Indexes
Use the named index in the loc attribute to return the specified
row(s).
Example
Return "day2":
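A minimal sketch of looking up a named row, assuming the rows are labeled "day1", "day2" and "day3" and reusing the calories/duration data:

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# loc accepts the row name instead of a position
day2 = df.loc["day2"]
print(day2)
```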
Example
Load a comma separated file (CSV file) into a DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
Output:
Simple Operations with DataFrames
Basic operations that can be performed on a Pandas DataFrame:
- Creating a DataFrame
- Dealing with Rows and Columns
- Indexing and Selecting Data
- Working with Missing Data
- Iterating over rows and columns
Create a Pandas DataFrame from Lists
DataFrame can be created using a single list or a list of lists.
# importing pandas as pd
import pandas as pd
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is',
       'portal', 'for', 'Geeks']
# creating a single-column DataFrame from the list
df = pd.DataFrame(lst)
print(df)
Working with Missing Data
Checking for missing values using isnull() and notnull():
To check for missing values in a Pandas DataFrame, we use the functions isnull() and notnull(). Both functions help in checking whether a value is NaN or not. These functions can also be used on a Pandas Series to find null values in the series.
# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
# creating a DataFrame from the dictionary
df = pd.DataFrame(data)
# using isnull() function: True marks a missing value
print(df.isnull())
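The same checks work on a Series. A minimal sketch, assuming a small Series of scores with one missing value, that also shows notnull() used as a filter:

```python
import numpy as np
import pandas as pd

scores = pd.Series([100, 90, np.nan, 95])

print(scores.isnull())   # True only where the value is NaN
print(scores.notnull())  # the inverse mask

# Keep only the non-missing values
present = scores[scores.notnull()]
print(present)

# Count the missing values
missing_count = int(scores.isnull().sum())
print(missing_count)  # 1
```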
Querying from Data Frames
import pandas as pd
data = {
    "name": ["Sally", "Mary", "John"],
    "age": [50, 40, 30]
}
df = pd.DataFrame(data)
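A DataFrame like the one above can be queried with the DataFrame.query() method, which takes a condition written as a string. The sketch below assumes an age threshold of 35 purely for illustration and shows the equivalent boolean-mask form:

```python
import pandas as pd

data = {
    "name": ["Sally", "Mary", "John"],
    "age": [50, 40, 30]
}
df = pd.DataFrame(data)

# Rows where the age column is greater than 35
older = df.query("age > 35")
print(older)

# Equivalent selection using a boolean mask
same = df[df["age"] > 35]
print(same)
```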
Ref:
https://towardsdatascience.com/speed-testing-pandas-vs-numpy-ffbf80070ee7
Other Python Libraries