Aiml - 4351601

Diploma Engineering
Laboratory Manual
(Foundation of AI and ML)
(4351601)
[Information Technology, Semester-V]

Enrolment No
Name
Branch
Academic Term
Institute
Directorate Of Technical Education

Gandhinagar - Gujarat
Foundation of AI and ML(4351601)
DTE’s Vision:
● To provide globally competitive technical education;
● Remove geographical imbalances and inconsistencies;
● Develop student friendly resources with a special focus on girls’ education
and support to weaker sections;
● Develop programs relevant to industry and create a vibrant pool of technical
professionals.
DTE’s Mission:
Institute’s Vision:
Institute’s Mission:
Department’s Vision:
Department’s Mission:
Certificate
This is to certify that Mr./Ms ………………………………………………………………….

Enrolment No. ………….……………. of …….………. Semester of Diploma in Information Technology
of …………………. (GTU Code) has satisfactorily completed the term work in course Foundation of
AI and ML (4351601) for the academic year: …………………… Term: Odd/Even prescribed in the
GTU curriculum.
Place:…………..
Date: …………………..
Signature of Course Faculty Head of the Department

Preface
The primary aim of any laboratory, practical or field work is to enhance the required skills
and creative ability of students to solve real-time problems by developing relevant
competencies in the psychomotor domain. With this in view, GTU has designed the competency-focused,
outcome-based curriculum 2021 (COGC-2021) for Diploma engineering programmes, in which more
time is allotted to practical work than to theory. This underlines the importance of skill
enhancement and asks students, instructors and lecturers to use every second of the time allotted
for practicals to achieve relevant outcomes by performing experiments rather than by writing
study-type practice. This is essential for the effective implementation of the competency-focused,
outcome-based Green curriculum-2021. Every practical has been carefully designed as a tool to
develop and enhance the industry-relevant competencies of each and every student. These
psychomotor skills are very difficult to develop through the traditional chalk-and-board method of
content delivery in the classroom. Accordingly, this lab manual focuses on industry-defined
relevant outcomes rather than the old practice of conducting practicals to prove concepts and theory.
By using this lab manual, students can read the procedure one day before the actual
performance of a practical experiment, which generates interest and gives them an idea of the
magnitude of the results to expect. This in turn enhances the predetermined outcomes amongst
students. Each experiment/practical in this manual begins with the competency, industry-relevant
skills, course outcomes and practical outcomes, which play a key role in carrying out the practical.
Students will also get a clear idea of the safety measures and necessary precautions to be taken
while performing the experiment.
This manual also provides guidelines for lecturers to facilitate student-centred lab activities for
each practical/experiment by arranging and managing the necessary resources, so that students
follow the procedures with the required safety and necessary precautions to achieve the outcomes.
It also shows how students will be assessed, by providing rubrics.
The Foundation of AI and ML course helps students build core competencies in
understanding machine learning approaches, so that they can design and train machine
learning models for various use cases. The lab work of the course is designed to develop a crisp
understanding of the underpinning theory.
Although we have tried our level best in designing this lab manual, there is always scope for
improvement. We welcome any suggestions for improvement.
Programme Outcomes (POs):
1. Basic and Discipline specific knowledge: Apply knowledge of basic mathematics, science
and engineering fundamentals and engineering specialization to solve the engineering
problems.
2. Problem analysis: Identify and analyse well-defined engineering problems using codified
standard methods.
3. Design/ development of solutions: Design solutions for well-defined technical engineering
problems and assist with the design of systems components or processes to meet specified
needs.
4. Engineering Tools, Experimentation and Testing: Apply modern engineering tools and
appropriate technique to conduct standard tests and measurements.
5. Engineering practices for society, sustainability and environment: Apply appropriate
technology in context of society, sustainability, environment and ethical practices.
6. Project Management: Use engineering management principles individually, as a team
member or a leader to manage projects and effectively communicate about well-defined
engineering activities.
7. Life-long learning: Ability to analyze individual needs and engage in updating in the context
of technological changes in field of engineering.
Practical Outcome - Course Outcome matrix

Course Outcomes (COs):
CO1) Understand fundamental principles of Artificial Intelligence.

CO2) Compare types of machine learning.
CO3) Build a simple Neural Network model to solve real world problem.
CO4) Apply data preprocessing on text/paragraph using NLTK library.
CO5) Demonstrate word embedding techniques to develop real world NLP applications.
S. No. | Practical Outcome/Title of experiment | CO1 | CO2 | CO3 | CO4 | CO5
1. | Use Cancer Dataset for detecting whether the cancer cells in data are benign or malignant. The data contains 2 types of cancers: 1. benign cancer (B) and 2. malignant cancer (M). Use relevant ML techniques for the same. | | ✔ | | |
2. | Implement following activation functions using python to build simple neural network. a. ReLU b. Sigmoid c. Tanh | | | ✔ | |
3. | Implement following feed forward neural network using python programming: a. Single layer feed forward neural network. b. Multi-layer feed forward neural network. | | | ✔ | |
4. | Perform following data preprocessing on text/paragraph using NLTK library: a. Write a Python program to tokenize words, sentence wise. b. Write a python program that accepts the list of tokenized word and stems it into root word. c. Write a program in python to identify the part of speech for each word in the text. d. Write a Python NLTK program to remove stop words from a given text. e. Write a python program for identifying and correcting misspelled words in a given text, such as an essay or a letter. | | | | ✔ |
5. | Implement following Word embedding techniques in NLP. a. TFIDF- Term Frequency Inverse document Frequency b. BOW (Bag of Words) c. Word2Vec | | | | | ✔
6. | Implement Pre Trained word Embedding: GloVe Technique in NLP. | | | | | ✔
Industry Relevant Skills
The following industry relevant skills are expected to be developed in the students by
performance of experiments of this course.
a) Students will learn to automate a variety of tasks, making systems more efficient and
cost-effective.
b) Students will learn efficient handling of data, which supports better data analytics.
c) Students will learn to implement AI and ML approaches in varied fields of application, from
healthcare to e-commerce.
Guidelines to Course Faculty

1. Course faculty should demonstrate each experiment with all necessary implementation
strategies described in the curriculum.
2. Course faculty should explain the industrial relevance before starting each experiment.
3. Course faculty should involve all students and give them the opportunity for hands-on experience.
4. Course faculty should ensure that the mentioned skills are developed in the students by asking questions.
5. Utilise the 2 hours of lab time effectively and ensure completion of the write-up along with the quiz.
6. Encourage peer-to-peer learning by having fast learners help others perform the same experiment.
Instructions for Students

1. Organize the work in the group and make a record of all observations.
2. Students shall develop maintenance skills as expected by industries.
3. Students shall attempt to develop related hands-on skills and build confidence.
4. Students shall develop the habit of evolving more ideas, innovations, skills etc.
5. Students shall refer to technical magazines and data books.
6. Students should develop the habit of submitting the practical on the due date and time.
7. Students should be well prepared while submitting the write-up of an exercise.
Continuous Assessment Sheet

Enrolment No: Name:
Term:
Sr no | Practical Outcome/Title of experiment | Page | Date | Marks (25) | Sign
1 | Use Cancer Dataset for detecting whether the cancer cells in data are benign or malignant. The data contains 2 types of cancers: 1. benign cancer (B) and 2. malignant cancer (M). Use relevant ML techniques for the same. | | | |
2 | Implement following activation functions using python to build simple neural network. a. ReLU b. Sigmoid c. Tanh | | | |
3 | Implement following feed forward neural network using python programming: a. Single layer feed forward neural network. b. Multi-layer feed forward neural network. | | | |
4 | Perform following data preprocessing on text/paragraph using NLTK library: a. Write a Python program to tokenize words, sentence wise. b. Write a python program that accepts the list of tokenized word and stems it into root word. c. Write a program in python to identify the part of speech for each word in the text. d. Write a Python NLTK program to remove stop words from a given text. e. Write a python program for identifying and correcting misspelled words in a given text, such as an essay or a letter. | | | |
5 | Implement following Word embedding techniques in NLP. a. TFIDF- Term Frequency Inverse document Frequency b. BOW (Bag of Words) c. Word2Vec | | | |
6 | Implement Pre Trained word Embedding: GloVe Technique in NLP. | | | |
Date: ……………
Practical No.1: Use Cancer Dataset for detecting whether the cancer cells in data are
benign or malignant. The data contains 2 types of cancers: 1. benign
cancer (B) and 2. Malignant cancer (M). Use relevant ML techniques for
the same.
A. Objective: To develop a machine learning model to accurately classify cancer cells
as benign or malignant.
B. Expected Program Outcomes (POs):-PO1, PO2, PO3, PO4, PO5, PO6, PO7
C. Expected Skills to be developed based on competency:
● To develop an accurate machine learning model
● To implement various ML techniques
D. Expected Course Outcomes (COs) – CO2
E. Practical Outcome (PRo)
Build machine learning model that can effectively classify cancer cells as either
benign or malignant, providing a valuable tool for cancer diagnosis and treatment.
F. Expected Affective domain Outcome (ADos)
To foster empathy and motivation for positive change in participants regarding
cancer research and healthcare.
G. Prerequisite Theory:
Refer course Fundamentals of Machine Learning (4341603).

Few Technical Terms:
1. Supervised Learning: ML with labeled data to train a model for making
predictions.
2. Classification: Task of categorizing data into predefined classes (benign or
malignant).
3. Feature Vector: Numeric representation of data sample's characteristics.
4. Training Set: Labeled data used to train the model.
5. Testing Set: Held-out labeled data, not used for training, on which the model's
predictions are compared with the true labels to evaluate its performance.
6. Feature Selection: Choosing relevant features for model training.
7. Model Evaluation Metrics: Accuracy, Precision, Recall, F1 Score for assessing
model performance.
8. Model Selection: Choosing the best ML algorithm and hyperparameters.
9. Cross-Validation: Technique to robustly evaluate the model.
10. Hyperparameters: Parameters set before training, influencing model
behaviour.
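The terms above can be tied together in a short sketch. It is only one possible approach, not the prescribed solution: it assumes scikit-learn is installed, uses scikit-learn's bundled Wisconsin breast cancer data as a stand-in for the Kaggle CSV listed in the references, and picks logistic regression as an example classifier.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled dataset replaces the Kaggle CSV.
# In this dataset target 0 = malignant and target 1 = benign.
data = load_breast_cancer()
X, y = data.data, data.target          # feature vectors and labels

# Training set (80%) and held-out testing set (20%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale the features, then fit a logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Model evaluation metrics on the testing set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", round(accuracy, 3))
print(classification_report(y_test, y_pred, target_names=data.target_names))
```

Other classifiers (for example a decision tree or an SVM) can be swapped into the pipeline to compare results on the same split.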
H. Resources/Equipment Required
Sr.No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1
I. Safety and necessary Precautions followed
● Read the experiment thoroughly before starting and ensure that you
understand all the steps and concepts involved from the underpinning theory.
● Keep the workspace clean and organized, free from clutter and unnecessary
materials.
● Use the software according to its intended purpose and instructions.
● Ensure that all the necessary equipment and software are in good working
condition.
J. Procedure to be followed/Source code:
Student must use the space for writing source code. Understand and re-implement
different methods for handling data. (Exhaustive use of functions must be done)
Source Code & Output

K. Practical related Quiz

1. Which machine learning technique is commonly used for cancer cell
classification?
a) Decision trees
b) Linear Regression
c) Support Vector Machine (SVM)
d) K-Means clustering
2. Which evaluation metric is typically used to measure the performance of a cancer
cell classification model?
a) Accuracy
b) Mean Squared Error
c) Area under the ROC Curve (AUC-ROC)
d) Precision & Recall
L. References / Suggestions
1. https://www.kaggle.com/datasets/erdemtaha/cancer-data
2. https://towardsdatascience.com/building-a-simple-machine-learning-model-on-breast-cancer-data-eca4b3b99fa3
M. Assessment-Rubrics
Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)
Engagement | /5 | Performed practical him/herself | Performed practical with others' help | Watched other students performing practical but not tried him/herself | Present in practical session but not attentively participated in performance
Accuracy | /5 | Accurately done | 1-2 errors/mistakes found | 3-5 errors/mistakes identified | More than 5 errors/mistakes committed
Documentation | /5 | No errors, program is well executed and documented properly | Complete write-up and output tables but presentation is poor | Some of the commands missing with missing outputs | Poor write-up and diagram or missing content
Understanding & Explanation | /5 | Fully understood the performance & can explain perfectly | Understood the performance but cannot explain | Partially understood the performance & can give little explanation | Partially understood and cannot give explanation
Time | /5 | Completed the work within 1 week | Work is submitted later than 1 week but by the end of 2nd week | Work done after 2nd week but before the end of 3rd week | Work submitted after 3rd week time
Total Marks: /25 Signature with Date:

Date: ……………
Practical No.2: Implement following activation functions using python to build simple
neural network. a. ReLU b. Sigmoid c. Tanh
A. Objective: To understand and develop python code to implement activation
functions
B. Expected Program Outcomes (POs):- PO1, PO2, PO3, PO4, PO5, PO6, PO7
C. Expected Skills to be developed based on competency:
● To understand activation functions
● To implement activation functions using python
D. Expected Course Outcomes (COs) – CO3
E. Practical Outcome(PRo)
Implement activation functions to build neural network.
F. Expected Affective domain Outcome(ADos)
To encourage further learning, and foster exploration of activation function
properties.
G. Prerequisite Theory: Refer Unit – 3 for more detail
ReLU (Rectified Linear Unit) - It applies a simple thresholding operation in which all
negative values are set to zero, while positive values are left unchanged:
R(z) = max(0, z)
Sigmoid - The sigmoid activation function (also called the logistic function) takes any
real value as input and outputs a value in the range (0, 1). It is calculated as
S(x) = 1 / (1 + e^(-x)), where x is the input to the activation function, i.e. the
weighted sum computed by the neuron.
Tanh - The tanh (hyperbolic tangent) activation function is a non-linear function
commonly used in neural networks. It maps the input values to the range (-1, 1),
providing a smooth transition from negative to positive values:
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
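The three definitions above can be sketched with NumPy as follows. This is a minimal illustration rather than the expected write-up; it assumes NumPy is installed, and the inputs, weights and bias of the single neuron are made-up values.

```python
import numpy as np

def relu(z):
    """ReLU: negative inputs become 0, positive inputs pass through."""
    return np.maximum(0, z)

def sigmoid(z):
    """Sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Tanh: squashes any real input into the range (-1, 1)."""
    return np.tanh(z)

# Hypothetical one-neuron example: all numbers below are made up
x = np.array([0.5, -1.0, 2.0])   # input features
w = np.array([0.4, 0.3, -0.2])   # weights
b = 0.1                          # bias
z = np.dot(w, x) + b             # weighted sum fed to the activation

print("z       =", z)
print("ReLU    =", relu(z))
print("Sigmoid =", sigmoid(z))
print("Tanh    =", tanh(z))
```

Because the functions use NumPy operations, they also accept whole arrays, so the same code can activate every neuron of a layer at once.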
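H. Resources/Equipment Required
Sr.No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1
I. Safety and necessary Precautions followed
● Read the experiment thoroughly before starting and ensure that you
understand all the steps and concepts involved from the underpinning theory.
● Keep the workspace clean and organized, free from clutter and unnecessary
materials.
● Use the software according to its intended purpose and instructions.
● Ensure that all the necessary equipment and software are in good working
condition.
J. Procedure to be followed/Source code:
Student must use the space for writing source code. Understand and re-implement
different methods for handling data. (Exhaustive use of functions must be done)
Source Code & Output

K. Practical related Quiz
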
1. Which of the following statements is true regarding the ReLU activation function?
a) It sets all negative values to zero and leaves positive values unchanged.
b) It maps input values to a range between -1 and 1
c) It is a linear activation function
d) It is used for binary classification tasks.
2. What is the range of output values produced by the sigmoid activation function?
a) Between -1 and 1
b) Between 0 and 1
c) Between -infinity and +infinity
d) Between -π/2 and π/2
L. References / Suggestions
1. https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
2. https://www.baeldung.com/cs/sigmoid-vs-tanh-functions

M. Assessment-Rubrics
The assessment rubric is the same as in Practical No. 1: Engagement, Accuracy, Documentation,
Understanding & Explanation and Time, each marked out of 5.
Total Marks: /25 Signature with Date:

Date: ……………
Practical No.3: Implement following feed forward neural network using python
programming: a. Single layer feed forward neural network. b. Multi-layer
feed forward neural network.
A. Objective: To understand and develop python code to implement neural network
B. Expected Program Outcomes (POs):-PO1, PO2, PO3, PO4, PO5, PO6, PO7
C. Expected Skills to be developed based on competency:
● To implement neural network
D. Expected Course Outcomes (COs) - CO3
E. Practical Outcome (PRo)
Build single-layer and multi-layer feed forward neural networks using Python
programming.
F. Expected Affective domain Outcome (ADos)
To encourage further learning and apply it to various domains.
G. Prerequisite Theory:
● Single Layer Feed Forward Neural Network:
It consists of an input layer, an output layer, and no hidden layers. The
input layer connects directly to the output layer without any intermediate
processing units. The output is calculated as a linear combination of the
input features and their associated weights, usually passed through an
activation function. A bias term may also be included to shift the output.
● Multi-Layer Feed Forward Neural Network:
It consists of an input layer, one or more hidden layers, and an output
layer. Each layer contains multiple processing units, also known as neurons.
The information flows forward through the layers, with each neuron in a
layer connected to all neurons in the subsequent layer.
● Refer Unit – 3 for more detail
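The two architectures described above can be sketched as forward passes in NumPy. This is an illustrative, untrained example only: it assumes NumPy is installed, uses sigmoid as the activation, and the layer sizes, random weights and input values are all made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def single_layer_forward(x, W, b):
    """Input layer connected directly to the output layer."""
    return sigmoid(W @ x + b)

def multi_layer_forward(x, layers):
    """Pass x through every (W, b) pair in turn: hidden layers, then output."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)       # fixed seed so runs are repeatable
x = np.array([0.2, 0.7, -0.1])       # 3 made-up input features

# Single layer: 3 inputs -> 1 output
W_out = rng.normal(size=(1, 3))
b_out = np.zeros(1)
y_single = single_layer_forward(x, W_out, b_out)
print("Single layer output:", y_single)

# Multi-layer: 3 inputs -> 4 hidden neurons -> 1 output
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4)),
    (rng.normal(size=(1, 4)), np.zeros(1)),
]
y_multi = multi_layer_forward(x, layers)
print("Multi-layer output:", y_multi)
```

Only the forward flow of information is shown; training the weights (for example with backpropagation) is a separate step not covered by this sketch.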

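H. Resources/Equipment Required
Sr.No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1
I. Safety and necessary Precautions followed
● Read the experiment thoroughly before starting and ensure that you
understand all the steps and concepts involved from the underpinning theory.
● Keep the workspace clean and organized, free from clutter and unnecessary
materials.
● Use the software according to its intended purpose and instructions.
● Ensure that all the necessary equipment and software are in good working
condition.
J. Procedure to be followed/Source code:
Student must use the space for writing source code. Understand and re-implement
different methods for handling data. (Exhaustive use of functions must be done)
Source Code & Output

K. Practical related Quiz
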
1. A single-layer feed-forward neural network is also known as a?
a) Multi-layer perceptron b) Deep Neural network
c) Single layer perceptron d) Convolution Neural Network
2. A multi-layer feed-forward neural network consists of:
a) One Layer only b) Two layers only
c) Multiple hidden layers d) No hidden layers
L. References / Suggestions
1. https://www.cs.cmu.edu/afs/cs/academic/class/15381-s07/www/slides/042407neuralNetworks.pdf

M. Assessment-Rubrics
The assessment rubric is the same as in Practical No. 1: Engagement, Accuracy, Documentation,
Understanding & Explanation and Time, each marked out of 5.
Total Marks: /25 Signature with Date:

Date: ……………
Practical No.4: Perform following data preprocessing on text/paragraph using NLTK
library:
a. Write a Python program to tokenize words, sentence wise.
b. Write a python program that accepts the list of tokenized word and
stems it into root word.
c. Write a program in python to identify the part of speech for each word
in the text.
d. Write a Python NLTK program to remove stop words from a given text.
e. Write a python program for identifying and correcting misspelled
words in a given text, such as an essay or a letter.
A. Objective: Learn data pre-processing with the NLTK library by writing Python
programs.
B. Expected Program Outcomes (POs): PO1, PO2, PO3, PO4, PO5, PO6, PO7
C. Expected Skills to be developed based on competency:
● Able to apply data preprocessing on text/paragraph using NLTK library.
D. Expected Course Outcomes (COs)
CO4
E. Practical Outcome (PRo)
The program demonstrates the usage by tokenizing the example text and prints
each tokenized sentence.
F. Expected Affective domain Outcome (ADos)
Follow ethical practices
G. Prerequisite Theory:
NLTK (Natural Language Toolkit) is a widely used library for natural language
processing (NLP) in Python. It provides a wide range of functionalities and
resources for tasks such as tokenization, stemming, part-of-speech tagging,
syntactic and semantic analysis, and much more. Here is an overview of the key
components and capabilities of NLTK:
● Tokenization: NLTK offers tokenization functions to break down text into
individual words or sentences. It provides methods like word_tokenize() and
sent_tokenize() to split text accordingly.
● Stemming: Stemming is the process of reducing words to their base or root
form. NLTK includes several stemmers, such as the Porter Stemmer and the
Snowball Stemmer, which can be used to perform stemming operations on
words.
● Part-of-Speech (POS) Tagging: POS tagging assigns grammatical tags to words
based on their context and role in a sentence. NLTK provides the pos_tag()
function, which uses pre-trained models to identify and tag the part of speech
for each word in a given text.
● Stop Words Removal: Stop words are common words like "a," "the," "and,"
etc., that often carry little or no meaningful information. NLTK includes a
corpus of stop words for various languages. You can use this corpus to filter
out stop words from your text and focus on more relevant words.
● Named Entity Recognition (NER): NLTK offers NER capabilities to identify and
classify named entities in text, such as names of persons, organizations,
locations, and other specified categories.
● Syntax and Semantic Analysis: NLTK provides tools for syntactic and semantic
analysis, including parsing algorithms, semantic role labeling, and semantic
similarity calculations.
● WordNet: NLTK integrates WordNet, a large lexical database of English words,
which provides a rich resource for semantic relationships, synsets (groups of
synonymous words), and definitions. You can use WordNet to perform tasks
like word sense disambiguation or synonym expansion.
● Machine Learning Integration: NLTK facilitates the integration of machine
learning algorithms for various NLP tasks. It provides support for feature
extraction, classification, clustering, and other machine learning techniques.
● Corpora and Language Resources: NLTK includes numerous pre-processed
corpora and language resources for tasks like sentiment analysis, text
classification, language modeling, and more. These resources can be
leveraged to train models and perform evaluations.
NLTK is highly extensible and allows users to customize and extend its
functionalities as per their requirements. It is widely used by researchers,
students, and professionals in the NLP field due to its comprehensive set of tools,
extensive documentation, and active community support. Overall, NLTK serves as
a powerful toolkit for various NLP tasks and serves as a great starting point for
developing NLP applications in Python.
Explore more on the following link:

1. https://www.nltk.org/
2. https://realpython.com/nltk-nlp-python/
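As a small illustration of the stemming component described above, the sketch below runs the Porter and Snowball stemmers side by side. It assumes only that the nltk package is installed (the stemmers need no downloaded corpora), and the sample words are made up.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")

# Made-up sample words; stemmers work without any downloaded NLTK data
words = ["running", "studies", "happily", "connected", "generously"]

# Collect (word, Porter stem, Snowball stem) for each sample word
stems = [(word, porter.stem(word), snowball.stem(word)) for word in words]
for word, p_stem, s_stem in stems:
    print(f"{word:12} Porter: {p_stem:10} Snowball: {s_stem}")
```

Comparing the two columns shows that a stem is not always a dictionary word and that different stemmers may disagree, which is worth noting in the write-up for part b.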
H. Experimental set up/ Program Logic-Flow chart :
Here is program logic for a Python program that utilizes NLTK for various NLP tasks:
1. Import the necessary modules and libraries:
   ● nltk for NLP functionalities
   ● Specific modules like PorterStemmer or WordNetLemmatizer for word
     stemming or lemmatization
2. Define functions for each task:
   ● Tokenization:
     ▪ Use word_tokenize() to tokenize words
     ▪ Use sent_tokenize() to tokenize sentences
   ● Word Stemming:
     ▪ Initialize a stemmer object (e.g., PorterStemmer())
     ▪ Use the stemmer's stem() function to stem each word
   ● Part-of-Speech (POS) Tagging:
     ▪ Use pos_tag() to get POS tags for each word
   ● Stop Words Removal:
     ▪ Use stopwords.words() to get a list of stopwords for a specific
       language
     ▪ Filter out the stopwords from the tokenized words
   ● Misspelled Words Correction:
     ▪ Initialize a spell checker object (e.g., SpellChecker())
     ▪ Use the spell checker's correction() function to correct
       misspelled words
3. Get user input or load text from a file.
4. Perform the desired NLP tasks:
   ● Tokenize the text into words and sentences.
   ● Stem or lemmatize the words if required.
   ● Perform POS tagging on the words.
   ● Remove stop words from the text.
   ● Correct any misspelled words in the text.
5. Display the results or store them for further processing.
Here is an example program structure that incorporates these steps:
import nltk
from nltk import pos_tag
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize
from spellchecker import SpellChecker  # from the pyspellchecker package

# One-time download of the NLTK data used below. Recent NLTK releases may
# ask for "punkt_tab" and "averaged_perceptron_tagger_eng" instead.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("stopwords")

# Tokenization
def tokenize_words(text):
    return word_tokenize(text)

def tokenize_sentences(text):
    return sent_tokenize(text)

# Word Stemming
def stem_words(words):
    stemmer = PorterStemmer()
    return [stemmer.stem(word) for word in words]

# POS Tagging
def identify_pos(words):
    return pos_tag(words)

# Stop Words Removal
def remove_stop_words(words):
    stop_words = set(stopwords.words("english"))
    return [word for word in words if word.lower() not in stop_words]

# Misspelled Words Correction
def correct_spelling(words):
    spell = SpellChecker()
    # correction() may return None when it has no suggestion; keep the word then
    return [spell.correction(word) or word for word in words]

# Example usage
text = "This is an example sentence. And here's another one!"

# Tokenization
words = tokenize_words(text)
sentences = tokenize_sentences(text)

# Word Stemming
stemmed_words = stem_words(words)

# POS Tagging
pos_tags = identify_pos(words)

# Stop Words Removal
filtered_words = remove_stop_words(words)

# Misspelled Words Correction
corrected_words = correct_spelling(words)

# Display the results
print("Tokenized words:", words)
print("Tokenized sentences:", sentences)
print("Stemmed words:", stemmed_words)
print("POS tags:", pos_tags)
print("Filtered words:", filtered_words)
print("Corrected words:", corrected_words)
I. Resources/Equipment Required
Sr.No. | Instrument/Equipment/Components/Trainer kit | Specification
1 | Computer system with operating system | Windows 7 or higher Ver., macOS, and Linux, with 4GB or higher RAM, Python versions: 2.7.X, 3.6.X
2 | Python IDEs and Code Editors | jupyter, spyder, google colab, Open Source : Anaconda Navigator
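J. Safety and necessary Precautions followed
● Read the experiment thoroughly before starting and ensure that you
understand all the steps and concepts involved from the underpinning theory.
● Keep the workspace clean and organized, free from clutter and unnecessary
materials.
● Use the software according to its intended purpose and instructions.
● Ensure that all the necessary equipment and software are in good working
condition.
● Never eat or drink in the lab, as it can cause contamination and create safety
hazards.
● If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.
K. Procedure to be followed/Source code
Student must use the space for writing source code. Understand and re-implement
different methods for handling data. (Exhaustive use of functions must be done)
Source Code & Output
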
L. Practical related Quiz.

1. Which of the following is an example of a natural language generation task?
a) Identifying named entities in a text
b) Part-of-speech tagging
c) Machine translation
d) Generating new text based on input
2. Which of the following is an example of a pre-processing step in natural language
processing?
a) Creating a language model
b) Identifying named entities in a text
c) Tokenization
d) Text classification
M. References / Suggestions
1. https://www.geeksforgeeks.org/machine-learning/
2. https://www.geeksforgeeks.org/natural-language-processing-nlp-tutorial/
3. https://www.tutorialspoint.com/machine_learning_with_python/index.htm
N. Assessment-Rubrics
The assessment rubric is the same as in Practical No. 1: Engagement, Accuracy, Documentation,
Understanding & Explanation and Time, each marked out of 5.
Total Marks: /25 Signature with Date:

Date: ……………
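Practical No.5: Implement following Word embedding techniques in NLP.
a. TFIDF- Term Frequency Inverse document Frequency
b. BOW (Bag of Words)
c. Word2Vec
A. Objective: Learn Word embedding techniques in NLP
B. Expected Program Outcomes (POs): PO1, PO2, PO3, PO4, PO5, PO6, PO7
C. Expected Skills to be developed based on competency:
● Demonstrate word embedding techniques to develop real world NLP applications.
D. Expected Course Outcomes (COs)
CO5
E. Practical Outcome (PRo)
The program demonstrates the usage of word embedding techniques to develop real
world NLP applications.
F. Expected Affective domain Outcome (ADos)
Follow ethical practices
G. Prerequisite Theory: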
a. TF-IDF (Term Frequency-Inverse Document Frequency): TF-IDF is a numerical
statistic that represents the importance of a word in a document or a corpus. It
combines two factors: term frequency (TF), which measures how frequently a
word appears in a document, and inverse document frequency (IDF), which
measures the rarity of a word across the entire corpus. TF-IDF assigns higher
weights to words that appear frequently in a document but rarely across other
documents in the corpus. It is commonly used for information retrieval, text
mining, and document similarity analysis.
b. BOW (Bag of Words): The Bag of Words model represents text as an unordered
collection of words, disregarding grammar and word order. It builds a vocabulary
from all the unique words in the corpus and creates a vector representation for
each document, where each vector element records how many times (or simply
whether) the corresponding vocabulary word occurs in the document. BOW is a simple
yet effective technique that captures the frequency information of words in a
document but not their meaning or context. It is often used as a baseline approach
for various NLP tasks like text classification and sentiment analysis.
c. Word2Vec: Word2Vec is a popular word embedding technique that represents
words as dense vectors in a continuous vector space. It captures the semantic and
syntactic relationships between words by learning from large text corpora using
neural network architectures. Word2Vec offers two approaches: Continuous Bag
of Words (CBOW) and Skip-gram. CBOW predicts a target word from its
surrounding context words, while Skip-gram predicts the context words given a
target word. Word2Vec embeddings are known to capture meaningful
relationships between words and can be used for various NLP tasks like word
similarity, analogy completion, and language generation.
These word embedding techniques provide ways to represent words or
documents in a numerical form that can be easily processed by machine learning
algorithms. They facilitate capturing semantic relationships, similarity, and
contextual information from text data, enabling better understanding and analysis
of natural language. Each technique has its strengths and limitations, and the
choice depends on the specific task and requirements of the NLP application.
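To make the difference between CBOW and Skip-gram concrete, the sketch below builds the training examples each approach would learn from. It is only an illustration of how the examples are formed: the function names, the window size and the sample sentence are assumptions, and the neural network that Word2Vec trains on these examples (available in libraries such as gensim) is not shown.

```python
def skipgram_pairs(tokens, window=2):
    """(target, context) pairs: Skip-gram predicts each context word from the target."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    """(context words, target) pairs: CBOW predicts the target from its context."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, target))
    return pairs

# Made-up sample sentence, already lowercased and tokenized
tokens = "the cat sat on the mat".split()

print(skipgram_pairs(tokens, window=1)[:4])
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
print(cbow_pairs(tokens, window=1)[:3])
# [(['cat'], 'the'), (['the', 'sat'], 'cat'), (['cat', 'on'], 'sat')]
```

Skip-gram produces one example per (target, context) pair, while CBOW produces one example per target word with all of its context words grouped together.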
Also explore the following links
1. https://scikit-learn.org/stable/user_guide.html
2. https://www.geeksforgeeks.org/word-embeddings-in-nlp/
H. Experimental set up/ Program Logic :
TF-IDF is a numerical statistic that reflects the importance of a word in a

document within a collection or corpus. It combines two metrics: Term Frequency
(TF) and Inverse Document Frequency (IDF). TF measures the frequency of a word
in a document, while IDF measures the rarity of a word across the entire corpus.
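The two metrics can be written compactly as follows. The notation is introduced here only for illustration and is an assumption, not part of the syllabus: f(t,d) is the number of times term t occurs in document d, N is the number of documents in the corpus, and n_t is the number of documents that contain t. The natural logarithm is used, which is what Python's math.log computes.

```latex
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \ln\frac{N}{n_t}, \qquad
\text{tf-idf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
```

For example, a word that occurs once in a 6-word document and appears in 1 of 2 documents has tf = 1/6 ≈ 0.167, idf = ln 2 ≈ 0.693 and tf-idf ≈ 0.116. A word that appears in every document has idf = ln 1 = 0, so its tf-idf score is 0 regardless of how often it occurs.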
Here's a step-by-step implementation guide for TF-IDF:
1. Corpus Preparation:
● Preprocess your text data by removing punctuation, converting to
lowercase, and handling stopwords (common words like "and,"
"the," etc., that don't carry much meaning).
● Tokenize your text data into individual words or n-grams
(contiguous sequences of n words).
2. Term Frequency (TF) Calculation:
● Calculate the frequency of each word in a document. You can use
simple word counting or more advanced techniques like sublinear
TF scaling, which downweights the frequency of highly frequent
words.
Example (using word counting):

defcalculate_tf(document):
word_frequency = {}
total_words = len(document)
for word in document:

if word in word_frequency:
word_frequency[word] += 1
else:
word_frequency[word] = 1
tf = {word: count/total_words for word, count in word_frequency.items()}

return tf
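3. Inverse Document Frequency (IDF) Calculation:
● Calculate the IDF score for each word in the corpus. IDF is the logarithm of
the total number of documents divided by the number of documents containing the
word. The example that follows uses this plain ratio; every word counted from
the corpus occurs in at least one document, so the divisor is never zero. A
smoothing term is often added when words outside the corpus must be handled.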
    import math

    def calculate_idf(corpus):
        # corpus is a list of documents, each a list of tokens
        total_documents = len(corpus)
        word_document_count = {}
        for document in corpus:
            # Count each word at most once per document
            unique_words = set(document)
            for word in unique_words:
                if word in word_document_count:
                    word_document_count[word] += 1
                else:
                    word_document_count[word] = 1
        idf = {word: math.log(total_documents / count)
               for word, count in word_document_count.items()}
        return idf
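The description of IDF refers to smoothing. A minimal sketch of one common smoothed form is given here, assuming the formula ln((1 + N) / (1 + df)) + 1, which is the form scikit-learn documents for its default `smooth_idf=True` setting. The function name `calculate_smoothed_idf` and the sample corpus are assumptions for this example only.

```python
import math

def calculate_smoothed_idf(corpus):
    """corpus: list of documents, each a list of tokens."""
    total_documents = len(corpus)
    document_count = {}
    for document in corpus:
        for word in set(document):
            document_count[word] = document_count.get(word, 0) + 1
    # Adding 1 to numerator and denominator behaves like one extra document
    # that contains every word; the trailing +1 keeps words that occur in
    # every document from receiving a weight of exactly zero.
    return {
        word: math.log((1 + total_documents) / (1 + count)) + 1
        for word, count in document_count.items()
    }

corpus = [["cat", "sat"], ["cat", "ran"], ["dog", "ran"]]
idf = calculate_smoothed_idf(corpus)
print(round(idf["cat"], 3))  # 1.288, i.e. ln(4/3) + 1
print(round(idf["dog"], 3))  # 1.693, i.e. ln(4/2) + 1
```

The rarer word ("dog", found in one document) still receives the larger weight, as with the unsmoothed formula.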
4. TF-IDF Calculation:
● Multiply the TF score (step 2) by the IDF score (step 3) for each word in a document
to get the final TF-IDF score.

    def calculate_tfidf(tf, idf):
        # tf: scores for one document; idf: scores for the whole corpus
        tfidf = {word: tf[word] * idf[word] for word in tf}
        return tfidf
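The steps can be tied together on a small corpus. The sketch below redefines compact versions of the three functions (using `collections.Counter`) so that it runs on its own; the three-document corpus is made up for illustration.

```python
import math
from collections import Counter

def calculate_tf(document):
    counts = Counter(document)
    return {word: count / len(document) for word, count in counts.items()}

def calculate_idf(corpus):
    # Number of documents in which each word appears at least once
    document_count = Counter(word for document in corpus for word in set(document))
    return {word: math.log(len(corpus) / count)
            for word, count in document_count.items()}

def calculate_tfidf(tf, idf):
    return {word: tf[word] * idf[word] for word in tf}

# Already tokenized documents (assumed output of the preparation step)
corpus = [
    ["cat", "sat", "mat"],
    ["dog", "sat", "log"],
    ["cat", "chased", "dog"],
]

idf = calculate_idf(corpus)
scores = [calculate_tfidf(calculate_tf(document), idf) for document in corpus]

for document, document_scores in zip(corpus, scores):
    print(document, {w: round(s, 3) for w, s in document_scores.items()})
```

In the first document "mat" scores highest because it occurs in only one of the three documents, while "cat" and "sat" each occur in two.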
Note: Apply the same steps for BoW (Bag of Words) and Word2Vec.
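For Bag of Words, a minimal sketch is shown here: each document becomes a vector of raw word counts over a shared vocabulary. The helper names `build_vocabulary` and `bag_of_words` and the sample corpus are assumptions for this example. Word2Vec is not covered by this sketch, since it requires training a model rather than counting words.

```python
from collections import Counter

def build_vocabulary(corpus):
    """Sorted list of every distinct word in the corpus."""
    return sorted({word for document in corpus for word in document})

def bag_of_words(document, vocabulary):
    """Count vector for one document, in vocabulary order."""
    counts = Counter(document)
    return [counts[word] for word in vocabulary]

corpus = [["cat", "sat", "mat"], ["dog", "sat", "sat"]]
vocabulary = build_vocabulary(corpus)
vectors = [bag_of_words(document, vocabulary) for document in corpus]

print(vocabulary)  # ['cat', 'dog', 'mat', 'sat']
print(vectors)     # [[1, 0, 1, 1], [0, 1, 0, 2]]
```

Unlike TF-IDF, these vectors do not reduce the weight of words that are common across the whole corpus.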

I. Resources/Equipment Required
Specification
J. Safety and necessary Precautions followed

materials.
condition.
hazards.
K. Procedure to be followed/Source code:


L. Practical related Quiz.

1. Which of the following is an example of natural language processing?
a) Translating a document from English to Spanish
b) Extracting insights from customer review
c) Analyzing data in a spreadsheet
d) Playing a game of chess
2. What is the purpose of a corpus in natural language processing?
a) To represent a language model
b) To store and organize large amounts of text data
c) To measure the accuracy of a language model
d) To train a machine learning algorithm
3. What is the purpose of word embeddings in natural language processing?
a) To represent words as numerical vectors
b) To identify the tone or emotion expressed in a text
c) To identify and categorize named entities in a text
d) To generate new text based on input
4. Which of the following is an efficient representation of text data?
a) Bag of Words
b) TF-IDF
c) Word Vector
d) BERT

M. References / Suggestions ( lab manual designer should give)
3. https://www.tutorialspoint.com/machine_learning_with_python/index.ht
N. Assessment-Rubrics

Criteria
Present in practical session but not attentively participated in performance
Watched other
elf

mistakes
found identified

diagram or missing
content
Properly.

Partially
explanation
explain explanation
perfectly
Work is submitted
Work done after
later than 1 week
2nd week but
2nd week
of 3rd week
within
1 week

Date: ……………
Practical No.6: Implement Pre Trained word Embedding: GloVe Technique in NLP.
A. Objective: To understand and develop Python code to implement neural network
techniques.

Implement Pre Trained word Embedding: GloVe Technique in NLP.
CO5
Demonstrates the usage of Pre Trained word Embedding: GloVe Technique in NLP.

Pre-trained word vectors are word embeddings that have already been trained on a
large corpus of text and are made available for download for use in natural
language processing (NLP) tasks. These pre-trained word vectors can be used as a
starting point for training a model on a new dataset or as features in a machine
learning model.
GloVe stands for Global Vectors for word representation. It is an unsupervised
learning algorithm developed by researchers at Stanford University aiming to
generate word embeddings by aggregating global word co-occurrence matrices
from a given corpus.
The basic idea behind the GloVe word embedding is to derive the relationship
between the words from statistics. Unlike the occurrence matrix, the
co-occurrence matrix tells you how often a particular word pair occurs together.
Each value in the co-occurrence matrix is the number of times the corresponding
pair of words occurs together in the corpus.
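A co-occurrence matrix can be built by simple counting. The sketch below is only an illustration of the counting step, not the GloVe training procedure itself: it counts every pair of words that appear within a fixed window of each other, without any distance weighting. The function name `cooccurrence_counts`, the window size and the sample corpus are assumptions for this example.

```python
from collections import defaultdict

def cooccurrence_counts(corpus, window=2):
    """Count how often each pair of words appears within `window` positions."""
    counts = defaultdict(int)
    for document in corpus:
        for i, word in enumerate(document):
            # Look only at the words to the right, then record both orders
            # so that the resulting matrix is symmetric.
            for j in range(i + 1, min(i + 1 + window, len(document))):
                neighbour = document[j]
                counts[(word, neighbour)] += 1
                counts[(neighbour, word)] += 1
    return counts

corpus = [
    ["ice", "is", "cold"],
    ["steam", "is", "hot"],
    ["ice", "is", "solid"],
]
counts = cooccurrence_counts(corpus, window=2)

print(counts[("ice", "is")])    # 2
print(counts[("ice", "cold")])  # 1
print(counts[("ice", "hot")])   # 0
```

"ice" co-occurs with "cold" but never with "hot"; statistics of this kind are what the embedding is derived from.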
To download pre-trained word vectors for GloVe (short for Global Vectors), you
can visit the GloVe website and select the desired word vector model from the
available options. These options may include different sizes of word vectors
(e.g., 50, 100, 200, 300 dimensions) and different training corpora (e.g., Wikipedia,
Common Crawl).
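Once a vector file has been downloaded, it can be read with plain Python. The sketch below assumes the usual GloVe text format, in which each line holds a word followed by its space-separated vector values. The three 3-dimensional vectors are made up so that the example runs without a download, and the file name in the comment (`glove.6B.50d.txt`) is an assumption about which file you chose.

```python
import io
import math

def load_glove(lines):
    """Parse GloVe text format: a word, then its vector values, per line."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors standing in for a real downloaded file.
sample = io.StringIO(
    "king 0.8 0.6 0.1\n"
    "queen 0.7 0.7 0.1\n"
    "apple 0.1 0.2 0.9\n"
)
vectors = load_glove(sample)

# With a real file (assumed name):
# with open("glove.6B.50d.txt", encoding="utf-8") as f:
#     vectors = load_glove(f)

print(cosine_similarity(vectors["king"], vectors["queen"]))
print(cosine_similarity(vectors["king"], vectors["apple"]))
```

With these toy values the first similarity is the larger one; with real GloVe vectors, related words would be expected to show the same pattern.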
Also explore the following links

1. https://www.scaler.com/topics/nlp/glove-embeddings/
2. https://analyticsindiamag.com/hands-on-guide-to-word-embeddings-using-glove/
Specification

materials.
condition.
hazards.


K. Practical related Quiz.

1. The same word can have multiple word embeddings with ____________?
a) GloVe
b) Word2Vec
c) ELMo
d) NLTK
2. What is the difference between natural language processing and machine
learning?
a) Machine learning is a type of natural language processing
b) Machine learning is a type of natural language processing
c) Natural language processing is focused on language-specific tasks,
d) There is no difference between natural language processing and machine learning
L. References / Suggestions ( lab manual designer should give)

3. https://www.tutorialspoint.com/machine_learning_with_python/index.htm

Criteria
Present in practical session but not attentively participated in performance
Watched other
elf

mistakes
found identified

diagram or missing
content
Properly.

Partially
explanation
explain explanation
perfectly
Work is submitted
Work done after
later than 1 week
2nd week but
2nd week
of 3rd week
within
1 week

Foundation of AI and ML
4351601
Lab manuals are prepared by
Ms. Rikita D. Parekh
Lecturer
Government Polytechnic for Girls, Ahmedabad
Dr. Lataben J. Gadhavi

Lecturer
Government Polytechnic, Gandhinagar
Dr. Esan P. Panchal

Lecturer
Branch Coordinator
Shri. Nandu A. Fatak
Head of Department
Information Technology
Government Polytechnic for Girls, Ahmedabad
Committee Chairman
Shri. R. D. Raghani
Head of Department- EC
Principal (I/C)
