
NATURAL LANGUAGE PROCESSING (IPCC)

(Effective from the Academic Year 2023 - 2024)


VII SEMESTER
Course Code: 21AI62 | CIA Marks: 50
Number of Contact Hours/Week (L:T:P:S): 3:0:2:0 | SEE Marks: 50
Total Hours of Pedagogy: 40L + 20P | Exam Hours: 03
CREDITS - 4
COURSE PREREQUISITES:

● Fundamentals of Automata Theory and basic knowledge of English Grammar.


COURSE OBJECTIVES:
● Define natural language and analyze its importance.
● Analyze spelling error detection and correction methods and parsing techniques in NLP.
● Understand the applications of natural language processing.
● Illustrate the information retrieval models in natural language processing.
TEACHING - LEARNING STRATEGY:

Following are some sample strategies that can be incorporated for course delivery:
● Chalk and Talk Method/Blended Mode Method
● Power Point Presentation
● Expert Talk/Webinar/Seminar
● Video Streaming/Self-Study/Simulations
● Peer-to-Peer Activities
● Activity/Problem Based Learning
● Case Studies
● MOOC/NPTEL Courses
● Any other innovative initiatives with respect to the course contents
COURSE CONTENTS
MODULE - I

Overview and Language Modeling:
Overview: Origins and challenges of NLP, Language and Grammar, Processing Indian Languages, NLP Applications.
Language Modeling: Statistical Language Model, N-gram model (unigram, bigram), Paninian Framework, Karaka theory, Smoothing Techniques.
(8 Hours)

Textbook 1: Ch. 1, 2
Textbook 2: Part I - Ch 3 (Sections 3.1, 3.5)


MODULE - II

Word Level Analysis: Regular Expressions, Finite State Automata, Morphological Parsing, Spelling Error Detection and Correction, Words and Word Classes, Part-of-Speech Tagging.
Syntactic Analysis: Context-free Grammar, Constituency, Top-down and Bottom-up Parsing, CYK Parsing.
(8 Hours)

Textbook 1: Ch. 3, 4
MODULE - III
Naive Bayes and Sentiment Classification: Naive Bayes Classifiers, Training the Naive Bayes Classifier, Worked Example, Optimizing for Sentiment Analysis, Naive Bayes for Other Text Classification Tasks, Naive Bayes as a Language Model.
(8 Hours)

Textbook 2: Part I - Ch 4 (Sections 4.1 - 4.6)


MODULE - IV
Information Retrieval and Lexical Resources:
Information Retrieval: Design Features of Information Retrieval Systems; Classical, Non-classical, and Alternative Models of Information Retrieval (Cluster model, Fuzzy model, LSI model); Major Issues in Information Retrieval.
Lexical Resources: WordNet, FrameNet, Stemmers, POS Tagger, Research Corpora.
(8 Hours)

Textbook 1: Ch. 9, 12


MODULE - V

Machine Translation: Language Divergences and Typology, Machine Translation using Encoder-Decoder, Details of the Encoder-Decoder Model, Translating in Low-Resource Situations, MT Evaluation, Bias and Ethical Issues.
(8 Hours)

Textbook 2: Part II - Ch 13 (Sections 13.1 - 13.7)


COURSE OUTCOMES
Upon completion of this course, the students will be able to:

CO No. | Course Outcome Description | Bloom's Taxonomy Level
CO1 | Discuss the concepts of NLP and demonstrate statistical language models and smoothing techniques. | CL3
CO2 | Demonstrate morphological analysis and parsing using finite state transducers, spelling error detection and correction, part-of-speech tagging, context-free grammar, and different parsing approaches. | CL3
CO3 | Apply the Naive Bayes classifier and sentiment analysis to natural language problems and text classification. | CL3
CO4 | Illustrate the use of information retrieval in the context of NLP and understand lexical semantics, lexical dictionaries such as WordNet, lexical computational semantics, and distributional word similarity. | CL3
CO5 | Develop machine translation applications using the encoder-decoder model. | CL3
LABORATORY COMPONENTS
All experiments are at Bloom's Taxonomy Level CL3 and map to the course outcomes above.
Experiment 1:
Consider the following corpus of three sentences:
a) There is a big garden.
b) Children play in a garden
c) They play inside beautiful garden
Calculate P for the sentence "They play in a big garden" assuming a bigram language model.
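
A minimal Python sketch for this experiment, assuming sentences are lowercased and padded with <s>/</s> boundary markers (conventions the experiment leaves unspecified):

```python
from collections import Counter

# Corpus from the experiment, lowercased.
corpus = ["there is a big garden",
          "children play in a garden",
          "they play inside beautiful garden"]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = ["<s>"] + sent.split() + ["</s>"]
    unigrams.update(tokens[:-1])            # history-word counts (denominators)
    bigrams.update(zip(tokens, tokens[1:]))

def sentence_prob(sentence):
    """P(sentence) as a product of MLE bigram probabilities.
    Any unseen bigram drives the product to zero, which is what
    motivates the smoothing in Experiment 2."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for w1, w2 in zip(tokens, tokens[1:]):
        p *= bigrams[(w1, w2)] / unigrams[w1]
    return p

print(sentence_prob("They play in a big garden"))  # 1/3 * 1/2 * 1/2 = 0.0833...
```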

Experiment 2:
Find the bigram counts for the given corpus. Apply Laplace smoothing and find the bigram probabilities after add-one smoothing (up to 4 decimal places).
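
A sketch of add-one smoothing over the same corpus; note that the vocabulary size V here includes the assumed <s> and </s> markers (conventions for V vary between textbooks):

```python
from collections import Counter

corpus = ["there is a big garden",
          "children play in a garden",
          "they play inside beautiful garden"]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = ["<s>"] + sent.split() + ["</s>"]
    unigrams.update(tokens[:-1])
    bigrams.update(zip(tokens, tokens[1:]))

# Vocabulary: all corpus words plus the boundary markers.
V = len({w for s in corpus for w in s.split()} | {"<s>", "</s>"})

# Laplace (add-one) estimate: P(w2|w1) = (c(w1 w2) + 1) / (c(w1) + V)
for (w1, w2), c in sorted(bigrams.items()):
    p = (c + 1) / (unigrams[w1] + V)
    print(f"c({w1}, {w2}) = {c}    P({w2}|{w1}) = {p:.4f}")
```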
Experiment 3:
Implement a rule-based tagger and a stochastic tagger for the given corpus of sentences.
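
One possible NLTK realization: a regular-expression tagger as the rule-based tagger and a unigram tagger trained on the Penn Treebank sample as the stochastic tagger. The patterns, training corpus, and test sentence are illustrative assumptions:

```python
import nltk
from nltk.corpus import treebank

nltk.download("treebank", quiet=True)

# Rule-based tagger: suffix patterns tried in order, with NN as the default.
patterns = [(r".*ing$", "VBG"),   # gerunds
            (r".*ed$", "VBD"),    # simple past
            (r".*s$", "NNS"),     # plural nouns
            (r"^[0-9]+$", "CD"),  # cardinal numbers
            (r".*", "NN")]        # default: singular noun
rule_tagger = nltk.RegexpTagger(patterns)

# Stochastic tagger: unigram frequencies, backing off to the rule-based tagger.
train = treebank.tagged_sents()[:3000]
test = treebank.tagged_sents()[3000:3200]
stochastic_tagger = nltk.UnigramTagger(train, backoff=rule_tagger)

sentence = "children play in a big garden".split()
print(rule_tagger.tag(sentence))
print(stochastic_tagger.tag(sentence))
print("accuracy:", stochastic_tagger.accuracy(test))  # .evaluate() on older NLTK
```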
Experiment 4:
Implement top-down and bottom-up parsing using Python NLTK.
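
A sketch using NLTK's RecursiveDescentParser (top-down) and ShiftReduceParser (bottom-up); the toy grammar below is an assumption and would need to be extended to cover a real corpus:

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | N
VP -> V NP | V PP
PP -> P NP
Det -> 'a' | 'the'
N -> 'children' | 'garden'
V -> 'play'
P -> 'in'
""")

sentence = "children play in a garden".split()

# Top-down: expands from S and tries to match the input.
for tree in nltk.RecursiveDescentParser(grammar).parse(sentence):
    print(tree)

# Bottom-up: shifts words and reduces; may miss parses needing lookahead.
for tree in nltk.ShiftReduceParser(grammar).parse(sentence):
    print(tree)
```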
Experiment 5:
Given the following short movie reviews, each labeled with a genre, either comedy or action:
a) fun, couple, love, love : comedy
b) fast, furious, shoot : action
c) couple, fly, fast, fun, fun : comedy
d) furious, shoot, shoot, fun : action
e) fly, fast, shoot, love : action
and a new document D: fast, couple, shoot, fly
compute the most likely class for D. Assume a naive Bayes classifier and use add-1 smoothing for the likelihoods.
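
A minimal sketch of the multinomial Naive Bayes computation with add-1 smoothed likelihoods, worked in log space:

```python
import math
from collections import Counter

# Labeled training documents from the experiment.
train = [("comedy", "fun couple love love"),
         ("action", "fast furious shoot"),
         ("comedy", "couple fly fast fun fun"),
         ("action", "furious shoot shoot fun"),
         ("action", "fly fast shoot love")]
doc = "fast couple shoot fly".split()

vocab = {w for _, text in train for w in text.split()}

for c in sorted({cl for cl, _ in train}):
    class_docs = [text.split() for cl, text in train if cl == c]
    log_prior = math.log(len(class_docs) / len(train))   # P(c)
    counts = Counter(w for t in class_docs for w in t)
    total = sum(counts.values())
    # log P(c) + sum over doc words of log P(w|c), add-1 smoothed
    score = log_prior + sum(math.log((counts[w] + 1) / (total + len(vocab)))
                            for w in doc)
    print(f"{c}: log-score = {score:.4f}")
# The class with the higher log-score ('action' here) is the prediction.
```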
Experiment 6:
The dataset contains the following 5 documents:
D1: "Shipment of gold damaged in a fire"
D2: "Delivery of silver arrived in a silver truck"
D3: "Shipment of gold arrived in a truck"
D4: "Purchased silver and gold arrived in a wooden truck"
D5: "The arrival of gold and silver shipment is delayed."
Find the top two relevant documents for the query document with the content "gold silver truck" using the vector space model.
Use the following similarity measures and analyze the results:
a) Euclidean distance
b) Manhattan distance
c) Cosine similarity
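
A dependency-free sketch using raw term-count vectors; tokenization is plain lowercase splitting (stop-word removal, if required, is left out):

```python
import math
from collections import Counter

# Documents and query from the experiment.
docs = {"D1": "Shipment of gold damaged in a fire",
        "D2": "Delivery of silver arrived in a silver truck",
        "D3": "Shipment of gold arrived in a truck",
        "D4": "Purchased silver and gold arrived in a wooden truck",
        "D5": "The arrival of gold and silver shipment is delayed"}
query = "gold silver truck"

vocab = sorted({w for t in docs.values() for w in t.lower().split()})

def vec(text):
    """Raw term-frequency vector over the shared vocabulary."""
    c = Counter(text.lower().split())
    return [c[w] for w in vocab]

q = vec(query)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Distances rank ascending (smaller = closer); cosine ranks descending.
for name, fn, best_high in [("Euclidean", euclidean, False),
                            ("Manhattan", manhattan, False),
                            ("Cosine", cosine, True)]:
    ranked = sorted(docs, key=lambda d: fn(vec(docs[d]), q), reverse=best_high)
    print(f"{name}: top two = {ranked[:2]}")
```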
Experiment 7:
The dataset contains the following 4 documents:
D1: "It is going to rain today"
D2: "Today Rama is not going outside to watch rain"
D3: "I am going to watch the movie tomorrow with Rama"
D4: "Tomorrow Rama is going to watch the rain at sea shore"
Find the top two relevant documents for the query document with the content "Rama watching the rain" using the latent semantic space model.
Use the following similarity measures and show the result analysis using a bar chart:
a) Euclidean distance
b) Cosine similarity
c) Jaccard similarity
d) Dice similarity coefficient
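
A sketch using scikit-learn's TruncatedSVD for the latent semantic space and matplotlib for the bar chart. Jaccard and Dice are computed on token sets here, since set-overlap measures are not meaningful on dense latent vectors; that reading is an assumption. Note that "watching" in the query will not match "watch" unless a stemmer is added:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = ["It is going to rain today",
        "Today Rama is not going outside to watch rain",
        "I am going to watch the movie tomorrow with Rama",
        "Tomorrow Rama is going to watch the rain at sea shore"]
query = "Rama watching the rain"

# Latent semantic space: SVD over the term-document count matrix.
X = CountVectorizer().fit_transform(docs + [query])
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
D, q = Z[:-1], Z[-1]

cos = cosine_similarity(D, q.reshape(1, -1)).ravel()
euc = np.linalg.norm(D - q, axis=1)

# Set-based measures on raw tokens.
qset = set(query.lower().split())
sets = [set(d.lower().split()) for d in docs]
jac = np.array([len(s & qset) / len(s | qset) for s in sets])
dice = np.array([2 * len(s & qset) / (len(s) + len(qset)) for s in sets])

order = np.argsort(-cos)
print("Top two by LSA cosine:", [f"D{i + 1}" for i in order[:2]])

x = np.arange(len(docs))
for off, vals, label in [(-0.3, cos, "Cosine (LSA)"),
                         (-0.1, 1 / (1 + euc), "1/(1+Euclidean) (LSA)"),
                         (0.1, jac, "Jaccard"),
                         (0.3, dice, "Dice")]:
    plt.bar(x + off, vals, width=0.2, label=label)
plt.xticks(x, [f"D{i + 1}" for i in x])
plt.legend()
plt.show()
```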
Experiment 8:
Extract synonyms and antonyms for a given word using WordNet.
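
A sketch using NLTK's WordNet interface; the word "happy" is an arbitrary example:

```python
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)

word = "happy"
synonyms, antonyms = set(), set()
for syn in wordnet.synsets(word):
    for lemma in syn.lemmas():
        synonyms.add(lemma.name())          # lemma names across all synsets
        for ant in lemma.antonyms():        # antonyms are stored per lemma
            antonyms.add(ant.name())

print("Synonyms:", sorted(synonyms))
print("Antonyms:", sorted(antonyms))
```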
Experiment 9:
Implement a machine translator for 10 words using the encoder-decoder model for any two languages.
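
A minimal PyTorch GRU encoder-decoder sketch; the experiment prescribes neither a framework nor a language pair, so English-to-French word pairs are assumed. Each source word is a one-token sequence, and the decoder learns to emit the target word followed by an end marker:

```python
import torch
import torch.nn as nn

# Toy parallel lexicon of 10 words (English -> French), assumed for illustration.
pairs = [("water", "eau"), ("book", "livre"), ("dog", "chien"),
         ("cat", "chat"), ("house", "maison"), ("sun", "soleil"),
         ("moon", "lune"), ("tree", "arbre"), ("red", "rouge"),
         ("blue", "bleu")]

SOS, EOS = "<s>", "</s>"
src_vocab = {w: i for i, w in enumerate(sorted(s for s, _ in pairs))}
tgt_vocab = {w: i for i, w in enumerate([SOS, EOS] + sorted(t for _, t in pairs))}
inv_tgt = {i: w for w, i in tgt_vocab.items()}

class Seq2Seq(nn.Module):
    def __init__(self, n_src, n_tgt, d=32):
        super().__init__()
        self.src_emb = nn.Embedding(n_src, d)
        self.tgt_emb = nn.Embedding(n_tgt, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, n_tgt)

    def forward(self, src, tgt_in):
        _, h = self.encoder(self.src_emb(src))        # encode source into h
        y, _ = self.decoder(self.tgt_emb(tgt_in), h)  # decode conditioned on h
        return self.out(y)

src = torch.tensor([[src_vocab[s]] for s, _ in pairs])
tgt_in = torch.tensor([[tgt_vocab[SOS], tgt_vocab[t]] for _, t in pairs])
tgt_out = torch.tensor([[tgt_vocab[t], tgt_vocab[EOS]] for _, t in pairs])

model = Seq2Seq(len(src_vocab), len(tgt_vocab))
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for _ in range(300):  # tiny data, so plain full-batch training
    opt.zero_grad()
    logits = model(src, tgt_in)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
    loss.backward()
    opt.step()

# Greedy decoding: start from <s>, stop at </s>.
with torch.no_grad():
    for s, t in pairs:
        _, h = model.encoder(model.src_emb(torch.tensor([[src_vocab[s]]])))
        tok, words = torch.tensor([[tgt_vocab[SOS]]]), []
        for _ in range(3):
            y, h = model.decoder(model.tgt_emb(tok), h)
            tok = model.out(y).argmax(-1)
            if inv_tgt[tok.item()] == EOS:
                break
            words.append(inv_tgt[tok.item()])
        print(s, "->", " ".join(words), f"(expected: {t})")
```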
CO-PO-PSO MAPPING

CO No. | Programme Outcomes (PO 1-12) | Programme Specific Outcomes (PSO 1-2)
CO1 | 3 |
CO2 | 3 |
CO3 | 3 |
CO4 | 3 |
CO5 | 3 |

3: Substantial (High)    2: Moderate (Medium)    1: Poor (Low)
ASSESSMENT STRATEGY
Assessment will be both CIA and SEE. Students' learning will be assessed using direct and indirect methods:

Sl. No. | Assessment Description | Weightage (%) | Max. Marks
1 | Continuous Internal Assessment (CIA) | 100% | 50
  | - Continuous Internal Evaluation (CIE) | 60% | 30
  | - Practical Session (Laboratory Component) | 40% | 20
2 | Semester End Examination (SEE) | 100% | 50
ASSESSMENT DETAILS

Continuous Internal Assessment (CIA) (50%):
- Continuous Internal Evaluation (CIE) (60%), three internal tests:
  Test I: 40% syllabus coverage (Modules I and II)
  Test II: 30% syllabus coverage (Modules II and III)
  Test III: 30% syllabus coverage (Modules IV and V)
- Practical Sessions (40%): 100% syllabus coverage (Modules I - V)
Semester End Exam (SEE) (50%): 100% syllabus coverage (Modules I - V)
NOTE:
● Assessment will be both CIA and SEE.

● The practical sessions of the IPCC shall be for CIE only.

● The Theory component of the IPCC shall be for both CIA and SEE.

● The questions from the practical sessions shall be included in Theory SEE.
Note: For Examinations (both CIE and SEE), the question papers shall contain the questions mapped to the
appropriate Bloom’s Level. Any COs mapped with higher cognitive Bloom’s Level may also be assessed through the
assignments.
SEE QUESTION PAPER PATTERN:

1. The question paper will have TEN full questions from the FIVE modules.
2. There will be TWO full questions from each module; every question will carry a maximum of 20 marks.
3. Each full question may have a maximum of four sub-questions covering all the topics under a module.
4. The students will have to answer FIVE full questions, selecting one full question from each module.
TEXT BOOKS:
1. Tanveer Siddiqui, U.S. Tiwary, “Natural Language Processing and Information Retrieval”, Oxford
University Press, 2008.
2. D. Jurafsky, J. H. Martin, “Speech and Language Processing, An Introduction to Natural Language
Processing, Computational Linguistics, and Speech Recognition (3e)”, Pearson Education, 2023.
REFERENCE BOOKS:
1. Akshay Kulkarni, Adarsha Shivananda, “Natural Language Processing Recipes - Unlocking Text Data with Machine Learning and Deep Learning using Python”, Apress, 2019.
2. James Allen, “Natural Language Understanding”, 2nd Edition, Benjamin/Cummings Publishing Company, 1995.
3. Gerald J. Kowalski, Mark T. Maybury, “Information Storage and Retrieval Systems”, Kluwer Academic Publishers, 2000.
REFERENCE WEB LINKS AND VIDEO LECTURES (E - RESOURCES):
1. https://onlinecourses.nptel.ac.in/noc23_cs45/preview
2. https://www.coursera.org/specializations/natural-language-processing
