
Python End Term Python Code

The document is a Jupyter notebook that demonstrates data manipulation and analysis using Python libraries such as pandas, NumPy, and Matplotlib. It includes loading a food texture dataset, creating and accessing various data structures like lists, dictionaries, and NumPy arrays, and showcases the use of pandas for handling data frames. The notebook also covers basic data visualization and provides examples of data access and manipulation techniques.
Copyright © All Rights Reserved

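8/28/24, 2:51 AM End_Term_Webinar.ipynb - Colab
https://colab.research.google.com/drive/1AIORmRsezAP9eVcc_hTqONPDbEkTO1zK#scrollTo=20W0d4ruQjE4&printMode=true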

Load essential libraries

import pandas as pd # For data manipulation and analysis with DataFrames
import numpy as np # For numerical operations on arrays and matrices
import matplotlib.pyplot as plt # For creating visualizations and plots
import matplotlib.cm as cm # For handling colormaps in visualizations

# Use Matplotlib's bundled 'seaborn-v0_8' style (the renamed legacy 'seaborn' style;
# 'seaborn-v0_8-whitegrid' would give the white-grid variant)
plt.style.use('seaborn-v0_8')

# Ensure plots are displayed inline in Jupyter notebooks
%matplotlib inline

# Provides access to system-specific parameters; used below to detect Google Colab
import sys

Mount Google Drive folder if running in Google Colab

# Check if running in Google Colab
if 'google.colab' in sys.modules:
    from google.colab import drive # Import Google Drive integration module for Colab
    drive.mount('/content/drive', force_remount=True) # Mount Google Drive to the Colab environment
    # Set directory paths for data storage in Google Drive
    DIR = '/content/drive/MyDrive/Anand Programming'
    DATA_DIR = DIR + '/Data/'
else:
    # Set directory path for local environment
    DATA_DIR = 'Data/'

Mounted at /content/drive

Load the food texture dataset

# Load the data from a CSV file
FILE = DATA_DIR + 'food-texture.csv' # Define the file path by combining the directory path and the file name
df_food = pd.read_csv(FILE, index_col=0, header=0) # Read the CSV into a DataFrame, using the first column as the row index and the first row as the column names
df_food.head() # Display the first 5 rows of the DataFrame to get an overview of the data

       Oil  Density  Crispy  Fracture  Hardness
B110  16.5     2955      10        23        97
B136  17.7     2660      14         9       139
B171  16.2     2870      12        17       143
B192  16.7     2920      10        31        95
B225  16.3     2975      11        26       143
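To see what `index_col=0` and `header=0` do without the real `food-texture.csv`, the five rows above can be re-typed as CSV text and read from memory. This is a minimal sketch: `io.StringIO` stands in for the file, the name `df_sample` is hypothetical, and the empty first header cell is an assumption about how the original file labels its index column.

```python
import io

import pandas as pd

# The five rows shown above, re-typed as CSV text (assumed layout: an empty
# header cell above the sample IDs, then the five measurement columns).
csv_text = """,Oil,Density,Crispy,Fracture,Hardness
B110,16.5,2955,10,23,97
B136,17.7,2660,14,9,139
B171,16.2,2870,12,17,143
B192,16.7,2920,10,31,95
B225,16.3,2975,11,26,143
"""

# index_col=0: the first column (sample IDs) becomes the row index.
# header=0: the first line supplies the column names.
df_sample = pd.read_csv(io.StringIO(csv_text), index_col=0, header=0)

print(df_sample.index.tolist())      # ['B110', 'B136', 'B171', 'B192', 'B225']
print(df_sample.columns.tolist())    # ['Oil', 'Density', 'Crispy', 'Fracture', 'Hardness']
print(df_sample.loc['B136', 'Oil'])  # 17.7
```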

Data structures in Python

# Create a list containing an integer, a string, and two nested lists
mylist = [100, 'Sudarsan', ['a', 'b', 'c'], [9, 10, 11]]

# Create a dictionary with keys 'First' and 'Second', pointing to a string and a list, respectively
mydict = {'First': 'Sudarsan', 'Second': [1, 2, 3]}

# Create a 1D NumPy array with three integers
myarray1d = np.array([100, 200, 300])

# Create a 2D NumPy array (matrix) with two rows and three columns
myarray2d = np.array([[100, 200, 300], [400, 500, 600]])

# Create a tuple with four integers; tuples are immutable, meaning their elements cannot be changed
mytuple = (10, 20, 30, 40)

# Access and print various elements from the list and arrays
print(mylist[1]) # Print the second element of the list ('Sudarsan')
print(mylist[2]) # Print the third element of the list (['a', 'b', 'c'])
print(mylist[2][0]) # Print the first element of the nested list within the list ('a')
print(myarray1d[0]) # Print the first element of the 1D array (100)
print(myarray2d[0]) # Print the first row of the 2D array ([100, 200, 300])
print(myarray2d[0][0]) # Print the first element of the first row of the 2D array (100)
print(myarray2d[0, 0]) # Alternative way to print the first element of the first row of the 2D array (100)

# Access and print elements using negative indexing
print(mylist[-1]) # Print the last element of the list ([9, 10, 11])
print(myarray1d[-2]) # Print the second-to-last element of the 1D array (200)

# Access and print slices of the list and array
print(mylist[0:2]) # Print the first two elements of the list ([100, 'Sudarsan'])
print(myarray1d[0:2]) # Print the first two elements of the 1D array ([100, 200])

# Access and print slices using negative indexing (the stop index is excluded)
print(mylist[-2:-1]) # Print the second-to-last element as a one-item list ([['a', 'b', 'c']])
print(mylist[-3:-1]) # Print the third-to-last and second-to-last elements (['Sudarsan', ['a', 'b', 'c']])

Sudarsan
['a', 'b', 'c']
a
100
[100 200 300]
100
100
[9, 10, 11]
200
[100, 'Sudarsan']
[100 200]
[['a', 'b', 'c']]
['Sudarsan', ['a', 'b', 'c']]
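The slicing rule behind the last two outputs, and the list/tuple difference noted in the comments, can be checked directly. A small sketch reusing the notebook's `mylist` and `mytuple`; the reassigned value `200` and the flag `tuple_is_immutable` are illustrative only.

```python
mylist = [100, 'Sudarsan', ['a', 'b', 'c'], [9, 10, 11]]
mytuple = (10, 20, 30, 40)

# A slice [start:stop] includes start and excludes stop, also with negative
# indices: position -2 is ['a', 'b', 'c'], position -1 ([9, 10, 11]) is excluded.
print(mylist[-2:-1])  # [['a', 'b', 'c']]
print(mylist[-2:])    # [['a', 'b', 'c'], [9, 10, 11]]  (omitting stop runs to the end)

# Lists are mutable: an element can be reassigned in place.
mylist[0] = 200
print(mylist[0])      # 200

# Tuples are immutable: item assignment raises a TypeError.
tuple_is_immutable = False
try:
    mytuple[0] = 99
except TypeError as err:
    tuple_is_immutable = True
    print('TypeError:', err)  # TypeError: 'tuple' object does not support item assignment
```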

A typical question on the exam

patientinfo = ['Sudarsan', [76, 132, 37.5], 0] # A list containing a string, a nested list, and an integer
patientinfo[1][0] # Accesses the first element (76) of the nested list [76, 132, 37.5]

76
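Chained indexing like `patientinfo[1][0]` can also be written as unpacking, which names each part of the nested structure. A minimal sketch; the names `hr2`, `bp`, `temp` follow the HR/BP/Temp labels used in the next section, and `flag` is a hypothetical name because the notebook does not say what the trailing `0` means.

```python
patientinfo = ['Sudarsan', [76, 132, 37.5], 0]

# Chained indexing: [1] selects the nested list, [0] its first item.
hr = patientinfo[1][0]

# Unpacking the same structure into named variables
# ('flag' is a hypothetical name for the trailing 0).
name, (hr2, bp, temp), flag = patientinfo
print(hr, hr2, bp, temp)  # 76 76 132 37.5
```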

Pandas Series

a = [76, 132, 37.5] # A list containing three elements: Heart Rate (HR), Blood Pressure (BP), and Temperature (Temp)
type(a) # Check the type of the variable 'a' (list)
type(a[0]) # Check the type of the first element in the list 'a' (int)

# Create a pandas Series from the list 'a'
myseries = pd.Series(a)
print(myseries) # Print the Series; by default, indices will be 0, 1, 2
type(myseries) # Check the type of 'myseries' (pandas Series)

# Create a new pandas Series with custom indices ('HR', 'BP', 'Temp') for each corresponding value
mynewseries = pd.Series(a, index=['HR', 'BP', 'Temp'])
print(mynewseries) # Print the new Series with the custom indices

# Access specific elements of the Series (values are float64 because 37.5 forces a common float type)
print(mynewseries['HR']) # Access the element with the label 'HR' (returns 76.0)
print(mynewseries.iloc[0]) # Access the first element by position (returns 76.0); plain mynewseries[0] raises a FutureWarning because positional [] access is deprecated

0     76.0
1    132.0
2     37.5
dtype: float64
HR       76.0
BP      132.0
Temp     37.5
dtype: float64
76.0
76.0
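The explicit accessors avoid the deprecation warning and make the intent clear: `.loc` looks up labels, `.iloc` looks up positions. A short sketch; the second Series `s` with reversed integer labels is a made-up example showing where the two disagree.

```python
import pandas as pd

mynewseries = pd.Series([76, 132, 37.5], index=['HR', 'BP', 'Temp'])

print(mynewseries.loc['HR'])   # 76.0  (label-based)
print(mynewseries.iloc[0])     # 76.0  (position-based, no FutureWarning)
print(mynewseries.iloc[-1])    # 37.5  (negative positions work with .iloc)
print(mynewseries.loc[['HR', 'Temp']].tolist())  # [76.0, 37.5]

# Why explicit accessors matter: with integer labels that are not 0, 1, 2, ...
# label 0 and position 0 point at different elements.
s = pd.Series([10, 20, 30], index=[2, 1, 0])
print(s.loc[0], s.iloc[0])     # 30 10
```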

Dictionary

# Create a dictionary with two key-value pairs
mydict = {'First': 'Sudarsan', 'Second': [1, 2, 3]}

# mydict[0] # Commented out: it would raise a KeyError (dictionaries are accessed by key, not by position)

# Access the value associated with the key 'First' in the dictionary
mydict['First'] # Returns 'Sudarsan'

# Create a new dictionary with names as keys and lists of measurements as values
newdict = {'Sudarsan': [76, 124, 37.5], 'Priya': [78, 128, 37.2]}

# Access the list of measurements associated with the key 'Sudarsan'
newdict['Sudarsan'] # Returns [76, 124, 37.5]

# Loop through the new dictionary, printing each key and its corresponding value
for key, value in newdict.items():
    print(key) # Print the key (e.g., 'Sudarsan')
    print(value) # Print the value associated with the key (e.g., [76, 124, 37.5])

# Create another dictionary with more deeply nested data (a list of lists for each key)
anotherdict = {'Sudarsan': [[76, 124, 37.5], [80, 132, 38]], 'Priya': [[78, 128, 37.2], [82, 131, 38]]}

# Access the first measurement of the first list associated with the key 'Priya'
anotherdict['Priya'][0][0] # Returns 78

Sudarsan
[76, 124, 37.5]
Priya
[78, 128, 37.2]
78
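Because a missing key raises `KeyError`, lookups that might fail are often written with `.get()` or guarded with `in`. A minimal sketch using the notebook's `newdict`; the key `'Arun'` is a made-up name for illustration, and the empty list stored under it is a placeholder, not data from the notebook.

```python
newdict = {'Sudarsan': [76, 124, 37.5], 'Priya': [78, 128, 37.2]}

# Square brackets raise KeyError for a missing key; .get() returns a default.
# 'Arun' is a hypothetical key used only for illustration.
print(newdict.get('Arun'))      # None
print(newdict.get('Arun', []))  # []
print('Priya' in newdict)       # True (membership tests the keys)

# Assignment adds a new key (or overwrites an existing one);
# the empty list is a placeholder to be filled later.
newdict['Arun'] = []
print(list(newdict))            # ['Sudarsan', 'Priya', 'Arun']
```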

NumPy array

# Create a 2D NumPy array
X = np.array([[3, 5, 6, 3], [9, 7, 4, 1], [-1, 0, 2, -4]])
print(X) # Print the entire 2D array

# Access specific elements and slices of the 2D array
print(X[0, 0]) # Access the element at row 0, column 0 (3)
print(X[0][0]) # Alternative way to access the element at row 0, column 0 (3)
print(X[1, :]) # Access the entire second row ([9, 7, 4, 1])
print(X[1, -3:-2]) # Access the second element of the second row, using negative slicing ([7])
print(X[:, 2]) # Access the entire third column ([6, 4, 2])
print(X[0:2, 2]) # Access the elements in the third column of the first two rows ([6, 4])
print(X[0, 1:]) # Access all elements of the first row starting from the second element ([5, 6, 3])
print(X[0, :-1]) # Access all elements of the first row except the last one ([3, 5, 6])

# Create a 1D NumPy array
newarray1d = np.array([-10, 29, -45, 54, 98, -70])
print(newarray1d) # Print the 1D array

# Use a condition to filter elements in the array
print(newarray1d[newarray1d >= 0]) # Print elements that are greater than or equal to 0 ([29, 54, 98])

# Create a list of heart rates
listHR = [76, 76, 86, 74, 82]
threshold = 75

# listHR > threshold # Commented out: comparing a list with an int raises a TypeError, because lists do not support element-wise comparison

# Convert the list to a NumPy array for element-wise comparison
arrayHR = np.array(listHR)
print(arrayHR) # Print the array of heart rates

# Filter and print heart rates that are greater than the threshold (75)
print(arrayHR[arrayHR > threshold]) # Print elements of arrayHR that are greater than 75 ([76, 76, 86, 82])

[[ 3 5 6 3]
[ 9 7 4 1]
[-1 0 2 -4]]
3
3
[9 7 4 1]
[7]
[6 4 2]
[6 4]
[5 6 3]
[3 5 6]
[-10 29 -45 54 98 -70]
[29 54 98]
[76 76 86 74 82]
[76 76 86 82]
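Boolean masks can be combined element-wise, which extends the threshold filter above to ranges and labels. A sketch using the notebook's `arrayHR`; the upper bound 85 and the labels `'high'`/`'normal'` are illustrative choices, not thresholds from the notebook.

```python
import numpy as np

arrayHR = np.array([76, 76, 86, 74, 82])

# Combine element-wise conditions with & (and), | (or), ~ (not); each
# comparison needs its own parentheses because & binds tighter than > and <.
in_range = (arrayHR > 75) & (arrayHR < 85)
print(in_range)            # [ True  True False False  True]
print(arrayHR[in_range])   # [76 76 82]

# np.where builds a new array element by element from a condition
# (the labels 'high' and 'normal' are illustrative only).
labels = np.where(arrayHR > 75, 'high', 'normal')
print(labels)              # ['high' 'high' 'high' 'normal' 'high']

# Count and fraction of readings above the threshold (True counts as 1).
print(np.sum(arrayHR > 75), np.mean(arrayHR > 75))  # 4 0.8
```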

List and dictionary comprehension

for j in range(4): # Loop over the numbers 0 to 3
    print(j) # Print the current value of j

0
1
2
3

# Create a 5x4 matrix using a nested list comprehension (every row is [0, 1, 2, 3] because the outer variable i is not used)
mymatrix = np.array([[j for j in range(4)] for i in range(5)])
print(mymatrix) # Print the created matrix
type(mymatrix) # Check the type of 'mymatrix'

[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]
numpy.ndarray
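The heading also mentions dictionary comprehensions, which follow the same pattern with `{key: value for ...}`. A small sketch: the first part uses the outer loop variable so the rows differ, and the second pairs the HR/BP/Temp labels from the Series section with their values; the filter threshold of 100 is an arbitrary illustration.

```python
import numpy as np

# Using the outer variable i makes each row different.
rows = [[i * 4 + j for j in range(4)] for i in range(3)]
print(np.array(rows))
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# A dictionary comprehension builds key: value pairs in one expression.
names = ['HR', 'BP', 'Temp']
values = [76, 132, 37.5]
vitals = {name: value for name, value in zip(names, values)}
print(vitals)     # {'HR': 76, 'BP': 132, 'Temp': 37.5}

# A trailing if-clause filters entries while building the dictionary.
above_100 = {k: v for k, v in vitals.items() if v > 100}
print(above_100)  # {'BP': 132}
```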

DataFrames

# Display the first 3 rows of the DataFrame 'df_food'
df_food.head(3)

       Oil  Density  Crispy  Fracture  Hardness
B110  16.5     2955      10        23        97
B136  17.7     2660      14         9       139
B171  16.2     2870      12        17       143

# Attributes of a DataFrame

df_food.shape # Returns a tuple representing the dimensionality of the DataFrame (number of rows, number of columns)

df_food.columns # Returns an Index object containing the column labels of the DataFrame

df_food.columns[0] # Accesses the first column label in the DataFrame

df_food.columns[-1] # Accesses the last column label in the DataFrame

df_food.index # Returns the index (row labels) of the DataFrame

Index(['B110', 'B136', 'B171', 'B192', 'B225', 'B237', 'B261', 'B264', 'B353',
       'B360', 'B366', 'B377', 'B391', 'B397', 'B404', 'B437', 'B445', 'B462',
       'B485', 'B488', 'B502', 'B554', 'B556', 'B575', 'B576', 'B605', 'B612',
       'B615', 'B649', 'B665', 'B674', 'B692', 'B694', 'B719', 'B727', 'B758',
       'B776', 'B799', 'B836', 'B848', 'B861', 'B869', 'B876', 'B882', 'B889',
       'B907', 'B911', 'B923', 'B971', 'B998'],
      dtype='object')

# Access elements of a DataFrame

# Access the 'Oil' column as a Series
df_food['Oil']

# Access the values of the 'Oil' column as a NumPy array
df_food['Oil'].values

# Access all rows for the 'Oil' column using the .loc accessor (label-based)
df_food.loc[:, 'Oil']

# Access all rows of the first column (position 0) using the .iloc accessor (position-based)
df_food.iloc[:, 0]

# Access the 'Oil' column as a DataFrame (not a Series) by using double brackets
df_food[['Oil']]

# Access the first element of the 'Oil' column by position (plain df_food['Oil'][0] is deprecated and raises a FutureWarning)
df_food['Oil'].iloc[0]

# Subset the DataFrame to include only the 'Oil' and 'Hardness' columns
df_food[['Oil', 'Hardness']]

# The following line is commented out because pandas has no R (dplyr)-style .select() method for DataFrames
# df_food.select(['Oil', 'Hardness']) # R-style selection, not applicable in pandas

# Filter the DataFrame to include only the 'Oil' and 'Hardness' columns
df_food.filter(['Oil', 'Hardness'])

# Filter rows where 'Oil' is greater than or equal to 17 and then select 'Oil' and 'Hardness' columns
df_food[df_food['Oil'] >= 17].filter(['Oil', 'Hardness'])

# Calculate the percentage of rows where 'Oil' is greater than or equal to 17
np.mean(df_food['Oil'] >= 17) * 100

# Access the 'Crispy' column for rows where 'Oil' >= 17 and 'Hardness' <= 120
# (boolean Series combine directly with &; each condition needs its own parentheses)
df_food.loc[(df_food['Oil'] >= 17) & (df_food['Hardness'] <= 120), 'Crispy']
      Crispy
B261      13
B264      10
B445      14
B575       8
B665      12
B758      14
B776      13
B836      11
B889      12
B971      13

dtype: int64
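The same label-based selection with a combined mask can be tried on the five rows shown earlier. A sketch under one assumption: the Oil threshold is lowered to 16.5 (the notebook uses 17) because in this five-row sample the only row with Oil >= 17, B136, has Hardness 139 and would leave the result empty.

```python
import pandas as pd

# The first five rows of the food texture data shown earlier.
df = pd.DataFrame(
    {'Oil': [16.5, 17.7, 16.2, 16.7, 16.3],
     'Density': [2955, 2660, 2870, 2920, 2975],
     'Crispy': [10, 14, 12, 10, 11],
     'Fracture': [23, 9, 17, 31, 26],
     'Hardness': [97, 139, 143, 95, 143]},
    index=['B110', 'B136', 'B171', 'B192', 'B225'],
)

# Boolean Series combine directly with &; pandas aligns them on the index,
# so .values is not needed. 16.5 is an illustrative threshold for this subset.
mask = (df['Oil'] >= 16.5) & (df['Hardness'] <= 120)
print(df.loc[mask, ['Oil', 'Hardness']])

# Label-based and position-based access to the same cell.
print(df.loc['B110', 'Oil'], df.iloc[0, 0])  # 16.5 16.5
```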

# Create a figure and a single subplot
fig, ax = plt.subplots(1)

# Create a scatter plot of 'Oil' vs 'Density' with red-colored points
ax.scatter(df_food['Oil'], df_food['Density'], color='red')

# Set the label for the x-axis to 'Oil Percentage'
ax.set_xlabel('Oil Percentage')

# Set the label for the y-axis to 'Density'
ax.set_ylabel('Density')

Text(0, 0.5, 'Density')

# Create a NumPy array with values from 0 to 4
np.arange(5)

# Create a range object with values from 0 to 4
range(5)

# Iterate over each value in the array created by np.arange(5) and print it
for j in np.arange(5):
    print(j)

0
1
2
3
4
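`range` and `np.arange` look alike but differ in what they return and which steps they accept. A short sketch; the variable names `ints`, `floats`, `points` are illustrative, and `np.linspace` is shown as the usual alternative when a float endpoint must be included.

```python
import numpy as np

# range() yields Python ints lazily and accepts only integer steps.
ints = list(range(5))
print(ints)                   # [0, 1, 2, 3, 4]

# np.arange() returns a NumPy array and also accepts float steps;
# in both cases the stop value is excluded.
floats = np.arange(0.0, 1.0, 0.25)
print(floats)                 # [0.   0.25 0.5  0.75]

# np.linspace() fixes the number of points and includes the stop by default.
points = np.linspace(0.0, 1.0, 5)
print(points)                 # [0.   0.25 0.5  0.75 1.  ]

range_rejects_float = False
try:
    range(0, 1, 0.25)
except TypeError:
    range_rejects_float = True
    print('range() does not accept a float step')
```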

# Find the maximum value in the 'Oil' column of the df_food DataFrame
max(df_food['Oil'])

# Find the minimum value in the 'Oil' column of the df_food DataFrame
min(df_food['Oil'])

# Filter the DataFrame for rows where 'Crispy' is greater than or equal to 12,
# select the 'Oil' column, and calculate its mean (a single number)
df_food.loc[df_food['Crispy'] >= 12, 'Oil'].mean()

18.014814814814812
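Two idioms from this notebook, the percentage via `np.mean` of a boolean condition and the conditional mean, can be checked on the five rows shown earlier. A minimal sketch; the numbers below describe only this subset, not the full 50-row dataset.

```python
import pandas as pd

# Oil and Crispy for the first five rows shown earlier.
df = pd.DataFrame(
    {'Oil': [16.5, 17.7, 16.2, 16.7, 16.3], 'Crispy': [10, 14, 12, 10, 11]},
    index=['B110', 'B136', 'B171', 'B192', 'B225'],
)

# The mean of a boolean Series is the fraction of True values,
# so multiplying by 100 gives a percentage (only B136 qualifies here).
share_oily = (df['Oil'] >= 17).mean() * 100
print(share_oily)        # 20.0

# Conditional mean: select rows and column with .loc, then call .mean().
crispy_oil_mean = df.loc[df['Crispy'] >= 12, 'Oil'].mean()
print(crispy_oil_mean)   # mean of 17.7 (B136) and 16.2 (B171)
```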

# Create a figure and a single subplot
fig, ax = plt.subplots(1)

# Create a histogram of the 'Oil' values from the DataFrame
# - `df_food['Oil'].values` converts the 'Oil' column to a NumPy array.
# - `bins=np.arange(min(df_food['Oil']), max(df_food['Oil']) + 1, 1.0)` specifies the bin edges:
#   - `min(df_food['Oil'])`: the first bin starts at the minimum 'Oil' value.
#   - `max(df_food['Oil']) + 1`: np.arange excludes its stop value, so the +1 keeps the last edge at or above the maximum and the largest value is counted.
#   - `1.0`: each bin is 1.0 wide.
ax.hist(df_food['Oil'].values, bins=np.arange(min(df_food['Oil']), max(df_food['Oil']) + 1, 1.0))

# Display the histogram
plt.show()
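The effect of the `+ 1` in the bin edges can be checked numerically with `np.histogram`, which returns bin counts without drawing. A sketch using only the five Oil values shown earlier as an illustrative subset; the names `edges_short` and `edges_full` are hypothetical.

```python
import numpy as np

# Oil values from the five rows shown earlier (illustrative subset).
oil = np.array([16.5, 17.7, 16.2, 16.7, 16.3])

# Stopping at max(oil) can leave the largest value beyond the last edge,
# because np.arange excludes its stop value; max(oil) + 1 avoids that.
edges_short = np.arange(oil.min(), oil.max(), 1.0)
edges_full = np.arange(oil.min(), oil.max() + 1, 1.0)
print(edges_short)  # [16.2 17.2]
print(edges_full)   # [16.2 17.2 18.2]

# Count how many values fall inside each set of edges.
counts_short, _ = np.histogram(oil, bins=edges_short)
counts_full, _ = np.histogram(oil, bins=edges_full)
print(counts_short.sum(), counts_full.sum())  # 4 5  (17.7 is dropped with the short edges)
```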
