

SCHOOL OF ADVANCED SCIENCES

DEPARTMENT OF MATHEMATICS

M.Sc. Data Science


(MDT)

Curriculum & Syllabi


(2021–2022 Admitted Students)
VISION STATEMENT OF VELLORE INSTITUTE OF TECHNOLOGY

Transforming life through excellence in education and research.

MISSION STATEMENT OF VELLORE INSTITUTE OF TECHNOLOGY

• World class Education: Excellence in education, grounded in ethics and critical thinking, for improvement of life.
• Cutting edge Research: An innovation ecosystem to extend knowledge and solve critical problems.
• Impactful People: Happy, accountable, caring and effective workforce and students.
• Rewarding Co-creations: Active collaboration with national & international industries & universities for productivity and economic development.
• Service to Society: Service to the region and world through knowledge and compassion.

VISION STATEMENT OF SCHOOL OF ADVANCED SCIENCES

To be an internationally renowned science school in research and innovation by imparting futuristic education relevant to the society.

MISSION STATEMENT OF SCHOOL OF ADVANCED SCIENCES

• To nurture students from India and abroad by providing quality education and training to become scientists, technologists, entrepreneurs and global leaders with ethical values for a sustainable future.
• To enrich knowledge through innovative research in niche areas.
• To ignite passion for science and provide solutions for national and global challenges.

M.Sc. Data Science - Curriculum Page 2



PROGRAMME EDUCATIONAL OBJECTIVES (PEOs)

1. Graduates will be practitioners and leaders in their chosen field.

2. Graduates will function in their profession with social awareness and responsibility.

3. Graduates will interact with their peers in other disciplines in their work place and society and contribute to the economic growth of the country.

4. Graduates will be successful in pursuing higher studies in their chosen field.

5. Graduates will pursue career paths in teaching or research.


PROGRAMME OUTCOMES (POs)

PO_01: Having a clear understanding of the subject related concepts and of contemporary issues.

PO_02: Having problem solving ability to address social issues.

PO_03: Having a clear understanding of professional and ethical responsibility.

PO_04: Having cross cultural competency exhibited by working in teams.

PO_05: Having a good working knowledge of communicating in English.


PROGRAMME SPECIFIC OUTCOMES (PSOs)

On completion of the M.Sc. Data Science programme, graduates will be able to

PSO1: Become a skilled Data Scientist in industry, academia, or government

PSO2: Use specialist software tools for data storage, analysis and visualization

PSO3: Independently carry out research/investigation to solve practical problems


CREDIT STRUCTURE

Category-wise Credit distribution

Category                    Credits
University Core (UC)        29
University Elective (UE)    06
Programme Core (PC)         23
Programme Elective (PE)     22
Total Credits               80


DETAILED CURRICULUM

University Core (UC)


S. No.  Course Code  Course Title                                        L  T  P  J  C
1       MAT5010      Foundations of Data Science                         3  0  0  0  3
2       RES5001      Research Methodology                                2  0  0  0  2
3       SET5001      Science, Engineering and Technology Project – I     0  0  0  0  2
4       SET5002      Science, Engineering and Technology Project – II    0  0  0  0  2
5       SET5003      Science, Engineering and Technology Project – III   0  0  0  0  2
6       MDT6099      Master's Thesis                                     0  0  0  0  14
7       ENG5003/     English for Science and Technology/                 0  0  4  0  2
        FRE5001/     French/                                             2  0  0  0  2
        GER5001      German                                              2  0  0  0  2
8       STS4001      Essentials of Business Etiquettes-Soft Skills       3  0  0  0  1
9       STS4002      Preparing for Industry                              3  0  0  0  1


DETAILED CURRICULUM

Programme Core (PC)


S. No.  Course Code  Course Title                                 L  T  P  J  C
1       MAT5011      Matrix Theory and Linear Algebra             3  0  0  0  3
2       MAT5012      Probability Theory and Distributions         3  0  2  0  4
3       MAT5013      Statistical Inference                        3  0  2  0  4
4       MAT5016      Time series analysis and Forecasting         3  0  2  0  4
5       MAT5017      Multivariate Data Analysis                   3  0  2  0  4
6       MAT6002      Regression Analysis and Predictive Models    3  0  2  0  4


DETAILED CURRICULUM

Programme Elective (PE)


S. No.  Course Code  Course Title                                  L  T  P  J  C
1       MAT6003      Programming for Data Science                  0  0  4  0  2
2       MAT6004      Computational Statistics for Data Science     0  0  4  0  2
3       MAT6005      Machine learning for Data Science             3  0  2  0  4
4       MAT6007      Deep learning                                 2  0  2  0  3
5       MAT6008      Artificial intelligence for Data Science      2  0  2  0  3
6       MAT6009      Design and Analysis of Experiments            3  0  2  0  4
7       MAT6010      Optimization Techniques                       3  2  0  0  4
8       MAT6011      Statistical Quality Control                   3  0  2  0  4
9       MAT6012      Programming for Data Analysis                 2  0  4  0  4
10      MATXXXX      Bio-Statistics                                2  0  2  0  3
11      MATXXXX      Reliability and Survival Analysis             2  0  2  0  3
12      MATXXXX      Queuing Theory and Network Analysis           3  0  0  0  3
13      MATXXXX      Stochastic Process and Applications           3  0  0  0  3
14      MATXXXX      Statistical Computing for Data Analysis       0  0  4  0  2
15      MATXXXX      Statistics for Managers                       3  0  0  0  3
16      MATXXXX      Data Mining and Information Security          2  0  0  4  3
17      MATXXXX      Exploratory Data Analysis and Visualization   3  0  2  0  4
18      MATXXXX      Actuarial statistics                          2  2  0  0  3



University Core



Course Code Course Title L T P J C
MAT5010 Foundations of Data Science 3 0 0 0 3
Pre-Requisite Syllabus Version
1.1
Course Objectives:
The course is aimed at
• Building the fundamentals of data science.
• Imparting design thinking capability to build big-data.
• Developing design skills of models for big data problems.
• Gaining practical experience in programming tools for data sciences.
• Empowering students with tools and techniques used in data science.

Expected Course Outcome:
At the end of the course the student should be able to
• Apply data visualisation in big-data analytics.
• Utilise EDA, inference and regression techniques.
• Utilise matrix decomposition techniques to perform data analysis.
• Apply data pre-processing techniques.
• Apply basic machine learning algorithms.

Module: 1 Introduction 4 hours


Big Data and Data Science - Big Data Analytics, Business intelligence vs Big data, big data
frameworks, Current landscape of analytics, data visualisation techniques, visualisation software.

Module: 2 EDA 6 hours


Exploratory Data Analysis (EDA), statistical measures, Basic tools (plots, graphs and summary
statistics) of EDA, Data Analytics Lifecycle, Discovery.

Module: 3 Basic Statistical Inference 6 hours


Developing Initial Hypotheses, Identifying Potential Data Sources, EDA case study, testing
hypotheses on means, proportions and variances.
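
As an illustration of testing a hypothesis on a mean, the following is a minimal sketch of a one-sample t statistic using only the Python standard library. The function name `one_sample_t` and the sample values are hypothetical and are not part of the syllabus; the statistic would still need to be compared against a t-table or a statistics package to obtain a p-value.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical measurements, invented for illustration only.
data = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.5]
t_stat, dof = one_sample_t(data, mu0=5.0)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

A large absolute value of the statistic relative to the critical value for the given degrees of freedom is evidence against the null hypothesis.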

Module: 4 Regression models 6 hours


Regression models: Simple linear regression, least-squares principle, MLR, logistic regression,
Multiple correlation, Partial correlation.
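
The least-squares principle for simple linear regression can be sketched as follows. This is a minimal illustration under the usual model y = b0 + b1·x; the function name `fit_simple_linear_regression` and the data are assumptions made for the example, not course material.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = S_xy / S_xx, intercept = y_bar - slope * x_bar.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Multiple linear regression (MLR) generalises the same principle to several predictors and is normally solved in matrix form.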

Module: 5 Linear Algebra Basics 6 hours


Matrices to represent relations between data, Linear algebraic operations on matrices – Matrix
decomposition: Singular Value Decomposition (SVD) and Principal Component Analysis (PCA).
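
To make the idea of PCA concrete, here is a small sketch restricted to two-dimensional data, where the covariance matrix is 2×2 and its largest eigenvalue has a closed form. It uses an eigen-decomposition rather than an SVD routine, assumes the points are not all identical, and the name `pca_first_component_2d` is hypothetical. Real analyses would typically use a numerical linear algebra library.

```python
import math


def pca_first_component_2d(points):
    """Return (unit direction, explained variance ratio) of the first principal component."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if b != 0:
        vx, vy = b, lam - a  # satisfies (a - lam) * vx + b * vy == 0
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam / (a + c)


# Hypothetical points on the line y = x: all variance lies along (1, 1).
direction, ratio = pca_first_component_2d([(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)])
print(direction, ratio)
```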

Module: 6 Data Pre-processing and Feature Selection 7 hours


Data cleaning - Data integration - Data Reduction - Data Transformation and Data Discretization,
Feature Generation and Feature Selection, Feature Selection algorithms: Filters- Wrappers - Decision
Trees - Random Forests.

Module: 7 Basic Machine Learning Algorithms 8 hours
Classifiers - Decision tree - Naive Bayes - k-Nearest Neighbors (k-NN) - k-means - SVM - Association Rule mining - Ensemble methods.
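
Of the algorithms listed, k-NN is the simplest to sketch in a few lines. The following is a minimal, unoptimised illustration using Euclidean distance and majority voting; the function name `knn_predict` and the toy data are assumptions for the example only.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-cluster data set.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (6, 5)))
```

An odd value of k is a common choice for two-class problems because it avoids tied votes.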

Module: 8 Contemporary issues 2 hours


Lecture by Industry Experts

Total Lecture hours: 45 hours


Text Book(s)
• Mining of Massive Datasets, v2.1, Jure Leskovec, Anand Rajaraman and Jeffrey Ullman, Cambridge University Press, 2019. (free online)
• Big Data Analytics, paperback 2nd ed., Seema Acharya, Subhashini Chellappan, Wiley, 2019.
Reference Book(s)
• Doing Data Science: Straight Talk from the Frontline, Cathy O'Neil and Rachel Schutt, O'Reilly, 2014.
• Data Mining: Concepts and Techniques, Third Edition, Jiawei Han, Micheline Kamber and Jian Pei, ISBN 0123814790, 2011.
• Big Data and Business Analytics, Jay Liebowitz, CRC Press, 2013.
• Data Mining Methods, 2nd edition, C. Rajan, Narosa, 2016.
Mode of Evaluation: CAT / Assignment / Quiz / FAT / Project / Seminar
Recommended by Board of Studies 24.06.2020
Approved by Academic Council No. 59 Date 24.09.2020



Course Code Course Title L T P J C
ENG5003 English for Science and Technology 0 0 4 0 2
(for MCA & M.Sc., programmes)
Pre-Requisite Cleared EPT Syllabus Version
1.1
Course Objectives:
• To enable students to communicate effectively in social, academic and professional contexts, thereby enhancing their interpersonal, managerial, problem-solving, and presentation skills.
• To help students develop their listening competency and critically evaluate and review documentaries, talks and speeches.
• To assist students to read and comprehend news articles and scientific texts; effectively interpret tables and graphs; write and proof-read official correspondence.

Expected Course Outcomes (CO):
• Make effective presentations and display their interpersonal skills in academic and professional contexts.
• Emerge as good listeners and critically evaluate oral communication.
• Excel in reading, comprehending and interpreting technical reports, texts and data.
• Write effectively in English and display their proof-reading abilities.
• Face real interviews and handle personal and professional conflicts effectively.

Module:1 Career Goals 4 hours


Short term and long term career goals
Activity: SWOT Analysis/ Comprehending speeches

Module:2 Interpersonal Skills 4 hours


Interpersonal Communication in/with Groups (Corporate Etiquette: Journey from Campus to corporate)
Activity: Role Plays/Mime/Skit

Module:3 Listening Skills 4 hours


Listening to Documentary
Activity: Critically evaluate/Review a documentary/TED Talk

Module:4 Reading Skills 4 hours

Skimming, Scanning, Intensive & Extensive reading
Activity: Reading Newspapers/Magazines/Scientific Texts

Module:5 Report Writing 4 hours

Language and mechanics of writing report
Activity: Writing a Report/Mini Project

Module:6 Study Skills 4 hours


Summarizing the report
Activity: Abstract, Executive Summary, Digital Synopsis

Module:7 Interpreting skills 4 hours

Interpret data in tables and graphs
Activity: Transcoding

Module:8 Editing Skills 4 hours


Proof Reading Sequencing
Activity: Editing any given text

Module:9 Presentation Skills 4 hours


Oral Presentation using digital tools
Activity: Oral presentation on the given topic using appropriate non-verbal cues

Module:10 Group Discussion 4 hours


Intragroup interaction (avoid, accommodate, compete, compromise, collaborate)
Activity: Group discussion on a given topic

Module:11 Professional Skills 4 hours


Résumé Writing
Activity: Prepare an Electronic Résumé

Module:12 Skill-Gap Analysis 4 hours


Tailor your skills to suit the Job needs
Activity: Write a SoP for higher Studies/Purpose Statement for job

Module:13 Interview Skills 4 hours


Placement/Job Interview
Activity: Mock Interview

Module:14 Managerial Skills 4 hours


Official Meeting to organize events
Activity: Writing Agenda, Minutes of Meeting (video conferencing) and Organizing an event

Module:15 Problem Solving Skills 4 hours


Conflict Management & Decision Making
Activity: Case analysis of a challenging Scenario

Total Lecture hours: 60 hours


Text Book(s)
• Kuhnke, E. Communication Essentials For Dummies. (2015). First Edition. John Wiley & Sons.
• Hewings, M. Advanced Grammar in Use Book with Answers and CD-ROM: A Self-Study Reference and Practice Book for Advanced Learners of English. (2013). Third Edition. Cambridge University Press. UK.
Reference Book(s)
• Churches, R. Effective Classroom Communication Pocketbook. Management Pocketbooks. (2015). First Edition. USA.
• Wallwork, A. English for Writing Research Papers. (2016). Second Edition. Springer.
• Wood, J. T. Communication in Our Lives. (2016). Cengage Learning. Boston. USA.
• Anderson, C. TED Talks: The Official TED Guide to Public Speaking. (2016). First Edition. Houghton Mifflin. Boston, New York.
• Zinsser, William. On Writing Well. HarperCollins Publishers. 2016. Thirtieth Edition. New York.
• Tebeaux, Elizabeth, and Sam Dragga. The Essentials of Technical Communication. 2015. First Edition. Oxford University Press. USA.
Mode of Evaluation: Mini Project, Flipped Class Room, Lecture, PPTs, Role play, Assignments, Class/Virtual Presentations, Report and beyond the classroom activities
List of Challenging Experiments (Indicative)
1. Setting short term and long term goals  2 hours
2. Mime/Skit/Activities through VIT Community Radio  6 hours
3. Critically evaluate/review a documentary/Activities through VIT Community Radio  4 hours
4. Mini Project  10 hours
5. Digital Synopsis  4 hours
6. Case analysis of a challenging Scenario  4 hours
7. Intensive & Extensive reading of Scientific Texts  4 hours
8. Editing any given text  8 hours
9. Group discussion on a given topic/Activities through VIT Community Radio  8 hours
10. Prepare a video résumé along with your video introduction and then create a website (in Google Sites/Weebly/Wix) showcasing skills and achievements  10 hours
Total Laboratory Hours 60 hours
Mode of evaluation: Mini Project, Flipped Class Room, Lecture, PPTs, Role play, Assignments, Class/Virtual Presentations, Report and beyond the classroom activities
Recommended by Board of Studies 22-07-2017
Approved by Academic Council No. 47 Date 24.08.2017



Course Code Course Title L T P J C
FRE5001 Francais Fonctionnel 2 0 0 0 2
Pre-Requisite Nil Syllabus Version
1.0
Course Objectives:
The course gives students the necessary background to:
• demonstrate competence in reading, writing, and speaking basic French, including knowledge of vocabulary (related to profession, emotions, food, workplace, sports/hobbies, classroom and family).
• achieve proficiency in a French culture-oriented viewpoint.

Expected Course Outcome: Students will be able to
• Remember daily-life communicative situations via personal pronouns, emphatic pronouns, salutations, negations, interrogations etc.
• Communicate effectively in the French language using regular/irregular verbs.
• Demonstrate comprehension of the spoken/written language in translating simple sentences.
• Understand and demonstrate comprehension of a new range of unseen written materials.
• Demonstrate a clear understanding of the French culture through the language studied.

Module:1 Greeting, introducing oneself, making contact 3 hours

Greetings, numbers (1-100), days of the week, months of the year, subject pronouns, stressed (tonic) pronouns, conjugation of regular verbs, conjugation of irregular verbs: avoir / être / aller / venir / faire etc.

Module:2 Introducing someone, looking for a pen pal, asking for news of a person 3 hours

Conjugation of pronominal verbs, negation, questions with or without 'Est-ce que'.

Module:3 Locating an object or a place, asking questions 4 hours

Articles (definite/indefinite), prepositions (à/en/au/aux/sur/dans/avec etc.), contracted articles, telling the time in French, nationalities and countries, adjectives (colours, possessive adjectives, demonstrative adjectives, interrogative adjectives (quel/quels/quelle/quelles)), agreement of adjectives with the noun, questions with Comment / Combien / Où etc.

Module:4 Shopping, understanding a short text, asking for and giving directions 6 hours

Simple translation (French-English / English-French)

Module:5 Finding the questions, answering general questions in French 5 hours

Partitive articles, putting sentences into the plural, making a sentence with given words, expressing given sentences in the masculine or feminine, matching sentences.

Module:6 How to write a passage 3 hours
Describe:
Family / home / university / leisure activities / daily life etc.

Module:7 How to write a dialogue 4 hours

Dialogue:
a) Booking a train ticket
b) Between two friends meeting at a café
c) Among family members
d) Between the client and the doctor

Module:8 Invited Talk: Native speakers 2 hours

Total Lecture hours: 30 hours


Text Book(s)
• Echo-1, Méthode de français, J. Girardet, J. Pécheur, Publisher CLE International, Paris 2010.
• Echo-1, Cahier d’exercices, J. Girardet, J. Pécheur, Publisher CLE International, Paris 2010.
Reference Books
• CONNEXIONS 1, Méthode de français, Régine Mérieux, Yves Loiseau, Les Éditions Didier, 2004.
• CONNEXIONS 1, Le cahier d’exercices, Régine Mérieux, Yves Loiseau, Les Éditions Didier, 2004.
• ALTER EGO 1, Méthode de français, Annie Berthet, Catherine Hugot, Véronique M. Kizirian, Béatrix Sampsonis, Monique Waendendries, Hachette Livre 2006.
Mode of Evaluation: CAT / Assignment / Quiz / FAT
Recommended by Board of Studies 26.2.2016
Approved by Academic Council No 41 Date 17.6.2016



Course Code Course Title L T P J C
GER5001 Deutsch für Anfänger 2 0 0 0 2
Pre-Requisite NIL Syllabus Version
1.0
Course Objectives:
The course gives students the necessary background to:
• read and communicate in German in their day-to-day life
• become industry-ready
• understand the usage of grammar in the German language.

Expected Course Outcome: Students will be able to
• Use basic German in their day-to-day life.
• Understand the conjugation of different forms of regular/irregular verbs.
• Understand the rule to identify the gender of nouns and apply articles appropriately.
• Apply German language skills in writing letters, e-mails etc.
• Translate passages from English to German and vice versa, and frame simple dialogues based on given situations.

Module:1 3 hours
Introduction, forms of greeting, regional and cultural studies (Landeskunde), alphabet, personal pronouns, verb conjugation, numbers (1-100), W-questions, declarative sentences, nouns – singular and plural
Learning objective:
Elementary understanding of German, gender and articles

Module:2 3 hours
Conjugation of verbs (regular/irregular), the months, the days of the week, hobbies, professions, seasons, articles, numbers (one hundred to one million), yes/no questions, imperative with 'Sie'
Learning objective:
Writing sentences, talking about hobbies, talking about professions, etc.

Module:3 4 hours
Possessive pronouns, negation, cases – accusative and dative (definite and indefinite articles), separable verbs, modal verbs, adjectives, telling the time, prepositions, meals, food, drinks
Learning objective:
Sentences with modal verbs, use of articles, talking about countries and languages, describing an apartment.

Module:4 6 hours
Translations: (German – English / English – German)
Learning objective:
Grammar – vocabulary – practice

Module:5 5 hours
Reading comprehension, making mind maps, correspondence – letters, postcards, e-mail
Learning objective:
Vocabulary building and active use of the language

Module:6 3 hours
Essays:
My university, food, my friend, my family, a festival in Germany, etc.

Module:7 4 hours
Dialogues:
a) Conversations with family members, at the railway station,
b) Conversations while shopping; in a supermarket; in a bookshop;
c) in a hotel – at the reception; an appointment with the doctor; meeting at a café

Module:8 Contemporary issues 2 hours


Lecture by Industry Experts.
Total Lecture hours: 30 hours

Text Book(s)
1. Studio d A1 Deutsch als Fremdsprache, Hermann Funk, Christina Kuhn, Silke Demme, 2012.
Reference Books
1. Netzwerk Deutsch als Fremdsprache A1, Stefanie Dengler, Paul Rusch, Helen Schmitz, Tanja Sieber, 2013.
2. Lagune, Hartmut Aufderstrasse, Jutta Müller, Thomas Storz, 2012.
3. Deutsche Sprachlehre für Ausländer, Heinz Griesbach, Dora Schulz, 2011.
4. Themen Aktuell 1, Hartmut Aufderstrasse, Heiko Bock, Mechthild Gerdes, Jutta Müller und Helmut Müller, 2010.
www.goethe.de
wirtschaftsdeutsch.de
hueber.de, klett-sprachen.de
www.deutschtraning.org
Mode of Evaluation: CAT / Assignment / Quiz / FAT
Recommended by Board of Studies 04.03.2016
Approved by Academic Council No. 41 Date 17.06.2016



Course Code Course Title L T P J C
STS4001 Essentials of Business Etiquettes 3 0 0 0 1
Pre-Requisite Syllabus Version
2.0
Course Objectives:
• To develop the students’ logical thinking skills
• To learn the strategies of solving quantitative ability problems
• To enrich the verbal ability of the students
• To enhance critical thinking and innovative skills

Expected Course Outcome:
• Enabling students to use relevant aptitude and appropriate language to express themselves
• To communicate the message to the target audience clearly

Module:1 Business Etiquette: Social and Cultural Etiquette and Writing Company Blogs and Internal Communications and Planning and Writing press release and meeting notes 9 hours
Value, Manners, Customs, Language, Tradition, Building a blog, Developing brand message, FAQs, Assessing Competition, Open and objective Communication, Two-way dialogue, Understanding the audience, Identifying, Gathering Information, Analysis, Determining, Selecting plan, Progress check, Types of planning, Write a short, catchy headline, Get to the Point – summarize your subject in the first paragraph, Body – Make it relevant to your audience.

Module:2 Study skills – Time management skills 3 hours


Prioritization, Procrastination, Scheduling, Multitasking, Monitoring, Working under pressure and
adhering to deadlines

Module:3 Presentation skills – Preparing presentation and Organizing materials and Maintaining and preparing visual aids and Dealing with questions 7 hours
10 Tips to prepare PowerPoint presentation, Outlining the content, Passing the Elevator Test, Blue sky thinking, Introduction, body and conclusion, Use of Font, Use of Color, Strategic presentation, Importance and types of visual aids, Animation to captivate your audience, Design of posters, Setting out the ground rules, Dealing with interruptions, Staying in control of the questions, Handling difficult questions

Module:4 Quantitative Ability-L1 – Number properties and Averages and Progressions and Percentages and Ratios 11 hours
Number of factors, Factorials, Remainder Theorem, Unit digit position, Tens digit position, Averages, Weighted Average, Arithmetic Progression, Geometric Progression, Harmonic Progression, Increase & Decrease or successive increase, Types of ratios and proportions.

Module:5 Reasoning Ability-L1 – Analytical Reasoning 8 hours


Data Arrangement(Linear and circular & Cross Variable Relationship), Blood Relations,
Ordering/ranking/grouping, Puzzle test, Selection Decision table

Module:6 Verbal Ability-L1 – Vocabulary Building 7 hours

Synonyms & Antonyms, One-word substitutes, Word Pairs, Spellings, Idioms, Sentence completion, Analogies

Total Lecture hours: 45 hours

Reference Books
1. Kerry Patterson, Joseph Grenny, Ron McMillan, Al Switzler (2001) Crucial Conversations: Tools for Talking When Stakes are High. Bangalore. McGraw-Hill Contemporary
2. Dale Carnegie (1936) How to Win Friends and Influence People. New York. Gallery Books
3. Scott Peck, M. (1978) Road Less Travelled. New York City. M. Scott Peck.
4. FACE (2016) Aptipedia Aptitude Encyclopedia. Delhi. Wiley publications
5. ETHNUS (2013) Aptimithra. Bangalore. McGraw-Hill Education Pvt. Ltd.
Websites:
1. www.chalkstreet.com
2. www.skillsyouneed.com
3. www.mindtools.com
4. www.thebalance.com
5. www.eguru.ooo
Mode of Evaluation: FAT, Assignments, Projects, Case studies, Role plays,
3 Assessments with Term End FAT (Computer Based Test)
Recommended by Board of Studies 09.06.2017
Approved by Academic Council No. 45 Date 15.06.2017



Course Code Course Title L T P J C
STS4002 Preparing for Industry 3 0 0 0 1
Pre-Requisite Syllabus Version
2.0
Course Objectives:
• To develop the students’ logical thinking skills
• To learn the strategies of solving quantitative ability problems
• To enrich the verbal ability of the students
• To enhance critical thinking and innovative skills

Expected Course Outcome:
• Enabling students to simplify, evaluate, analyze and use functions and expressions to simulate real situations to be industry-ready.

Module:1 Interview skills – Types of interview and Techniques to face remote interviews and Mock Interview 3 hours
Structured and unstructured interview orientation, Closed questions and hypothetical questions, Interviewers’ perspective, Questions to ask/not ask during an interview, Video interview, Recorded feedback, Phone interview preparation, Tips to customize preparation for personal interview, Practice rounds

Module:2 Resume skills – Resume Template and Use of power verbs and Types of resume and Customizing resume 2 hours
Structure of a standard resume, Content, color, font, Introduction to Power verbs and Write up, Quiz on types of resume, Frequent mistakes in customizing resume, Layout - Understanding different company's requirement, Digitizing career portfolio

Module:3 Emotional Intelligence - L1 – Transactional Analysis and Brain storming and Psychometric Analysis and Rebus Puzzles/Problem Solving 12 hours
Introduction, Contracting, ego states, Life positions, Individual Brainstorming, Group Brainstorming, Stepladder Technique, Brain writing, Crawford's Slip writing approach, Reverse brainstorming, Star bursting, Charette procedure, Round robin brainstorming, Skill Test, Personality Test, More than one answer, Unique ways

Module:4 Quantitative Ability-L3 – Permutation-Combinations and Probability and Geometry and mensuration and Trigonometry and Logarithms and Functions and Quadratic Equations and Set Theory 14 hours
Counting, Grouping, Linear Arrangement, Circular Arrangements, Conditional Probability, Independent and Dependent Events, Properties of Polygon, 2D & 3D Figures, Area & Volumes, Heights and distances, Simple trigonometric functions, Introduction to logarithms, Basic rules of logarithms, Introduction to functions, Basic rules of functions, Understanding Quadratic Equations, Rules & probabilities of Quadratic Equations, Basic concepts of Venn Diagram.

Module:5 Reasoning ability-L3 – Logical reasoning and Data Analysis and Interpretation 7 hours
Syllogisms, Binary logic, Sequential output tracing, Crypto arithmetic, Data Sufficiency, Data interpretation-Advanced, Interpretation tables, pie charts & bar charts

Module:6 Verbal Ability-L3 – Comprehension and Logic 7 hours

Reading comprehension, Para Jumbles, Critical Reasoning (a) Premise and Conclusion, (b) Assumption & Inference, (c) Strengthening & Weakening an Argument
Total Lecture hours: 45 hours

Reference Books
1. Michael Farr and JIST Editors (2011) Quick Resume & Cover Letter Book: Write and Use an Effective Resume in Just One Day. Saint Paul, Minnesota. Jist Works
2. Daniel Flage Ph.D (2003) The Art of Questioning: An Introduction to Critical Thinking. London. Pearson
3. David Allen (2002) Getting Things Done: The Art of Stress-Free Productivity. New York City. Penguin Books.
4. FACE (2016) Aptipedia Aptitude Encyclopedia. Delhi. Wiley publications
5. ETHNUS (2013) Aptimithra. Bangalore. McGraw-Hill Education Pvt. Ltd.
Websites:
1. www.chalkstreet.com
2. www.skillsyouneed.com
3. www.mindtools.com
4. www.thebalance.com
5. www.eguru.ooo
Mode of Evaluation: FAT, Assignments, Projects, Case studies, Role plays,
3 Assessments with Term End FAT (Computer Based Test)
Recommended by Board of Studies 09.06.2017
Approved by Academic Council No. 45 Date 15.06.2017



Course Code Course Title L T P J C
SET5001 Science, Engineering and Technology Project– I 0 0 0 0 2
Pre-Requisite Syllabus Version
Anti-Requisite 1.10
Course Objectives:
• To provide an opportunity to engage in research related to science / engineering
• To inculcate research culture
• To enhance the rational and innovative thinking capabilities

Expected Course Outcome: Student will be able to
• Identify a research problem and carry out a literature survey
• Analyse the research gap and formulate the problem
• Interpret the data and synthesize research findings
• Report research findings in written and verbal forms
Modalities / Requirements
1. Individual or group projects can be taken up
2. Engage in a literature survey in the chosen field
3. Use Science/Engineering principles to solve identified issues
4. Adopt relevant and well-defined / innovative methodologies to fulfill the specified objective
5. Submission of a scientific report in a specified format (after plagiarism check)
Student Assessment : Periodical reviews, oral/poster presentation
Recommended by Board of Studies 17.08.2017
Approved by Academic Council No. 47 Date 05.10.2017



Course Code Course Title L T P J C
SET5002 Science, Engineering and Technology Project– II 0 0 0 0 2
Pre-Requisite Syllabus Version
Anti-Requisite 1.10
Course Objectives:
• To provide an opportunity to engage in research related to science/engineering
• To inculcate research culture
• To enhance the rational and innovative thinking capabilities

Expected Course Outcome: Student will be able to
• Identify a research problem and carry out a literature survey
• Analyse the research gap and formulate the problem
• Interpret the data and synthesize research findings
• Report research findings in written and verbal forms
Modalities / Requirements
1. Individual or group projects can be taken up
2. Engage in a literature survey in the chosen field
3. Use Science/Engineering principles to solve identified issues
4. Adopt relevant and well-defined / innovative methodologies to fulfill the specified objective
5. Submission of a scientific report in a specified format (after plagiarism check)
Student Assessment : Periodical reviews, oral/poster presentation
Recommended by Board of Studies 17.08.2017
Approved by Academic Council No. 47 Date 05.10.2017



Course Code Course Title L T P J C
SET5003 Science, Engineering and Technology Project– III 0 0 0 0 2
Pre-Requisite Syllabus Version
Anti-Requisite 1.10
Course Objectives:
• To provide an opportunity to engage in research related to science/engineering
• To inculcate research culture
• To enhance the rational and innovative thinking capabilities

Expected Course Outcome: Student will be able to
• Identify a research problem and carry out a literature survey
• Analyse the research gap and formulate the problem
• Interpret the data and synthesize research findings
• Report research findings in written and verbal forms
Modalities / Requirements
1. Individual or group projects can be taken up
2. Engage in a literature survey in the chosen field
3. Use Science/Engineering principles to solve identified issues
4. Adopt relevant and well-defined/innovative methodologies to fulfil the specified objective
5. Submission of a scientific report in a specified format (after plagiarism check)
Student Assessment: Periodical reviews, oral/poster presentation
Recommended by Board of Studies 17.08.2017
Approved by Academic Council No. 47 Date 05.10.2017



Course Code Course Title L T P J C
RES5001 Research Methodology 2 0 0 0 2
Pre-Requisite Nil Syllabus Version
1.0
Course Objectives:
 Impart skills to develop a research topic and design
 Define a purpose statement, a research question or hypothesis, and a research objective
 Analyze the data and arrive at a valid conclusion
 Compile and present research findings

Expected Course Outcome: student will be able to


 Explain the basic aspects of research and its ethics
 Outline research problems, their types and objectives
 Formulate good research designs and carry out statistically relevant sampling
 Collect, collate, analyze and interpret data systematically
 Experiment with animals ethically
 Make use of literature and other search engines judiciously for research purposes

Module:1 Introduction and Foundation of Research 2 hours


Meaning, Objectives, Motivation, Utility for research. Concept of theory, empiricism, deductive
and inductive theory. Characteristics of scientific method –Understanding the language of research.

Module:2 Problem identification and formulation 4 hours


Scientific Research: Problem, Definition, Objectives, Types, Purposes and components of
Research problem

Module:3 Research Design 4 hours


Concept and Importance in Research : Features of a good research design, Exploratory
Research Design and Descriptive Research Designs

Module:4 Sampling 6 hours


Sampling methods, Merits and Demerits. Observation methods, Sampling Errors (Type I and Type
II). Determining size of the sample. Experimental Design: Concept of Independent &
Dependent variables.

Module:5 Data analysis and Reporting 6 hours


Fundamentals of Statistical Analysis and Inference, Multivariate methods, Concepts of
Correlation and Regression; Research Reports: Structure, Components, Types and Layout of
Research report and articles, Writing and interpreting research results, Figures and Graphs

Module:6 Animal handling 2 hours


Guidelines-animal ethical committee, animal models, various routes of drug administrations,
LD50, ED50

Module:7 Use of encyclopedias and tools in research 4 hours


Research Guides, Handbook, Academic Databases for Biological Science Discipline. Methods to
search required information effectively.



Module:8 Contemporary issues: 2 hours
Lecture by Industry Experts
Total Lecture hours: 30 hours
Text Book(s)
 Catherine Dawson, Introduction to Research Methods: A Practical Guide for Anyone Undertaking a Research Project, Oxford: How To Books, Reprint 2010.
 Julius S. Bendat, Allan G. Piersol, Random Data: Analysis and Measurement Procedures, 4th Edition, ISBN: 978-1-118-21082-6, 640 pages, September 2011.
 Petter Laake, Haakon Benestad, Bjorn Olsen (Editors), Research in Medical and Biological Sciences: From Planning and Preparation to Grant Application and Publication, 1st Edition, ISBN: 9780128001547, Academic Press, March 2015.
Reference Book(s)
 John Creswell, Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, Fourth Edition, March 2013.
Mode of Evaluation: CAT / Assignment / Quiz / FAT / Project / Seminar
Recommended by Board of Studies 03.08.2017
Approved by Academic Council No. 46 Date 24.08.2017



Course Code Course Title L T P J C
MDT6099 Master’s Thesis 0 0 0 0 14
Pre-Requisite As per the Academic Regulations Syllabus Version
1.0
Course Objectives:
To provide sufficient hands-on learning experience related to the area of specialization with a
focus on research orientation.

Expected Course Outcome: Students will be able to


 Formulate specific problem statements for ill-defined real-life problems with
reasonable assumptions and constraints.
 Perform a literature search and/or patent search in the area of interest.
 Develop a suitable solution methodology for the problem
 Conduct experiments / Design & Analysis / solution iterations and document the
results
 Perform error analysis / benchmarking/costing
 Synthesise the results and arrive at scientific conclusions/products/solution
 Document the results in the form of technical report/presentation
Mode of Evaluation: Periodic reviews, Presentation, Final oral viva, Poster submission
Recommended by Board of Studies 10.09.2019
Approved by Academic Council No. 56 Date 24.09.2019



Programme Core



Course Code Course Title L T P J C
MAT 5011 Matrix theory and Linear Algebra 3 0 0 0 3
Pre-Requisite Syllabus Version
1.1
Course Objectives:
 Understand the basic concepts of matrix algebra and its applications.
 Solving computational problems of linear algebra.

Expected Course Outcomes:


At the end of the course students will be able to:
 Understand basic matrix properties like rank, determinant, inverse and special types of matrices
 Apply Gaussian / Gauss-Jordan elimination methods and the LU factorisation technique
 Use computational techniques for singular value decomposition (Computational and
Algebraic Skills).
 Understand the concepts of vector space and subspaces.
 Find the matrix representation of a linear transformation given bases of the relevant vector
spaces.
 Compute inner products on a real vector space and compute angle and orthogonality in
inner product spaces.
 Understand the use of linear algebra and matrices in several important, modern applications
of research and industrial problems involving statistics.

Module: 1 Matrix theory 6 hours


Algebra of Matrices, Trace and Rank of a Matrix and their properties, Determinants, Inverse,
Eigen values and Eigen vectors, symmetric, orthogonal and idempotent matrices and their
properties

Module:2 Matrix Factorization 6 hours


Gauss elimination, row canonical form, diagonal form, triangular form, Gauss-Jordan elimination, LU
decomposition, solving systems of linear equations.
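
As an illustration of the LU decomposition topic in this module, the following is a minimal sketch in plain Python. The function names lu_decompose and lu_solve are hypothetical, and the sketch assumes a square matrix whose pivots are all non-zero, since it performs no row exchanges.

```python
def lu_decompose(a):
    """Doolittle factorisation A = L U with unit lower-triangular L (no pivoting)."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Fill row i of U.
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        if upper[i][i] == 0.0:
            raise ZeroDivisionError("zero pivot: this sketch does no row exchanges")
        # Fill column i of L below the diagonal.
        for j in range(i + 1, n):
            partial = sum(lower[j][k] * upper[k][i] for k in range(i))
            lower[j][i] = (a[j][i] - partial) / upper[i][i]
    return lower, upper


def lu_solve(lower, upper, b):
    """Solve A x = b given A = L U, by forward then back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = b[i] - sum(lower[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: U x = y
        tail = sum(upper[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - tail) / upper[i][i]
    return x


# Illustrative system (made-up numbers) whose exact solution is x = (1, 1, 2).
A = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
L, U = lu_decompose(A)
solution = lu_solve(L, U, [5.0, -2.0, 9.0])
```

Once L and U are known, further right-hand sides can be solved with lu_solve alone, which is the usual reason for factorising instead of repeating the elimination.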

Module:3 Decomposition of Matrices 6 hours


Spectral decomposition, singular value decomposition, Quadratic forms, definiteness and related
results with proofs.

Module:4 Vector Spaces 6 hours


Vector Spaces, Subspaces, Basis and dimension of a vector space, linear dependence and linear
independence, spanning set.

Module:5 Linear transformation 6 hours


Linear transformation, kernel, range, Matrix Representation of a linear transformation, rank-
nullity theorem, change of basis and similar matrices.

Module:6 Inner product spaces 6 hours


Inner-product spaces, orthogonal sets and bases, Orthogonal Projection, Gram-Schmidt
orthogonalization process.
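
The Gram-Schmidt process listed above can be sketched in a few lines of plain Python. This is an illustrative sketch only: it uses the standard Euclidean inner product, applies the modified (residual-based) variant for numerical stability, and the names dot and gram_schmidt as well as the tolerance value are assumptions.

```python
import math


def dot(u, v):
    """Standard Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            # Subtract the projection of the current residual onto q.
            coeff = dot(w, q)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors that depend linearly on earlier ones
            basis.append([wi / norm for wi in w])
    return basis


orthonormal = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

Vectors that are linearly dependent on earlier ones leave a residual of (near) zero length and are dropped, so the number of returned vectors equals the dimension of the span.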



Module:7 Applications in Statistics 7 hours
Generalized inverses (g-inverses), Methods of constructing g-inverses, general solution to a
system of linear equations. Sparse matrices, Linear Discriminant Analysis and Canonical
Correlation Analysis.

Module:8 Contemporary issues 2 hours


Lecture by Industry Experts.

Total Lecture hours: 45 hours


Tutorial 15 hours
 A minimum of 5 problems to be worked out by students in every tutorial class.
 Another 5 problems per tutorial class to be given as homework.
Text Book(s)
 Gilbert Strang, Introduction to linear algebra, 5/e., Wellesley-Cambridge, 2016.
 David C. Lay, Linear Algebra and Its Applications, Pearson, 5/e 2019.
Reference Book(s)
 G. Allaire and S. M. Kaber. Numerical Linear Algebra, Texts in Applied
Mathematics, Springer, 2008.
 L. Hogben, Handbook of Linear Algebra, CRC Press/Taylor & Francis Group, 2014.
 Friedberg, S., Insel, A., and Spence, L., Linear Algebra, 5/e, Pearson, 2019.
 Nick Fieller, Basics of Matrix Algebra for Statistics with R, CRC Press, 2015.
Mode of Evaluation: CAT, Quiz, Assignment and FAT.
Recommended by Board of Studies 24.06.2020
Approved by Academic Council No. 59 Date 24.09.2020



Course Code Course Title L T P J C
MAT5012 Probability Theory and Distributions 3 0 2 0 4
Pre-Requisite Basic knowledge of sets, sample space, probability space, measure space, probability measure and calculus. Syllabus Version
1.1
Course Objectives:
 To incorporate the concepts of probability theory and its applications as the core material in
building theoretical ideas along with the practical notion.
 To integrate the intrinsic ideas of preliminary and advanced distributions to correlate with the
real-world scenarios.
Expected Course Outcome:
At the end of the course students will be able to:
 Develop problem-solving techniques needed to calculate probability and conditional probability.
 Formulate fundamental probability distribution and density functions, as well as functions of
random variables, derive the probability density function of transformations.
 Derive the expectation and conditional expectation, and describe their properties.
 Understand various types of generating functions used in statistics.
 Describe commonly used univariate discrete and continuous probability distributions.
 Apply sampling distributions to testing of hypotheses.
 Translate and correlate the statistical problems into Statistical analysis

Module:1 Probability and Random variables 8 hours


Introduction – Random Experiments, Empirical basis of probability, Algebra of events, laws of
probability; Conditional Probability, Independence, Bayes’ law; Application of probability to business
and economics. One-dimensional Random variable – Discrete and Continuous; Distribution function and
its properties; Bivariate Random Variables – Joint Probability functions, marginal distributions,
conditional distribution functions; Notion of Independence of Random variables
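
To make the Bayes' law topic concrete, here is a small sketch in plain Python. The function name posterior and the machine/defect numbers are hypothetical and chosen only for illustration.

```python
def posterior(prior, likelihoods):
    """Bayes' law for a finite partition H_1..H_n.

    prior[i] is P(H_i); likelihoods[i] is P(E | H_i).
    Returns the list of posterior probabilities P(H_i | E).
    """
    joint = [p * l for p, l in zip(prior, likelihoods)]
    evidence = sum(joint)  # law of total probability: P(E)
    if evidence == 0.0:
        raise ValueError("the event E has probability zero under every hypothesis")
    return [j / evidence for j in joint]


# Hypothetical example: machine 1 makes 60% of items with a 2% defect rate,
# machine 2 makes 40% with a 5% defect rate. Given a defective item,
# how likely is it to have come from each machine?
post = posterior([0.6, 0.4], [0.02, 0.05])
```

In this made-up case the posterior for machine 1 is 0.012 / 0.032 = 0.375, lower than its prior of 0.6 because its defect rate is the smaller of the two.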

Module:2 Functions of Random Variables 6 hours


Functions of random variables: introduction, distribution function technique, transformation
technique: one variable, transformation technique: several variables, theory and applications.

Module:3 Mathematical Expectation 6 hours


Expectation, Variance, and Co-variance of random variables; Conditional expectation and
conditional variance; Markov, Hölder, Jensen and Chebyshev inequalities; Weak Law of Large
numbers, Strong law of large numbers and Kolmogorov theorem; Central Limit Theorem.

Module:4 Generating Functions 5 hours


Probability generating function (p.g.f.), moment generating function (m.g.f.), characteristic function
(c.f.); Properties and Applications. Probability distributions of functions of random variables: one and
two dimensions.

Module:5 Discrete Distributions 7 hours


Bernoulli, Binomial, Poisson, Geometric, Hypergeometric, Negative Binomial, Multinomial and
Discrete Uniform distributions - definition, properties and applications with numerical
problems.



Module:6 Continuous Distributions 7 hours
Uniform, Normal distribution function, Exponential, Gamma, Beta distributions (First and Second
kind), Weibull, Cauchy and Laplace distributions, lognormal, logistic, Pareto and Rayleigh
distribution functions - definition, properties and applications; concept of truncated distributions.

Module:7 Sampling Distributions 4 hours


Introduction, The sampling distribution of the Mean: Finite Populations, Sampling distribution of the
proportion: Finite Populations, distribution of sample variance, the chi-square distribution, the t
distribution, the F distribution, order statistics: properties, and applications, procedure of hypothesis
testing.

Module:8 Contemporary issues 2 hours


Lecture by Industry Experts.
Total Lecture hours: 45 hours
Text Book(s)
 Sheldon Ross; A First Course in Probability, Pearson, 2014.
 Parimal Mukhopadhyay; An Introduction to the Theory of Probability, World scientific, 2012.
 Irwin Miller, Marylees Miller, John E. Freund’s; Mathematical Statistics, Pearson, 2017
Reference Book(s)
 Fetsje Bijma, Marianne Jonker and Aad van der Vaart; Introduction to Mathematical Statistics, Amsterdam University Press, 2018.
 Krishnamoorthy, K., Handbook of Statistical Distributions with Applications, Chapman & Hall/CRC, 2006.
 Rohatgi, V.K. and Ehsanes Saleh, A.K. Md., An Introduction to Probability and Statistics, 2nd Ed., John Wiley & Sons, 2002.
 Shanmugam, R., Chattamvelli, R. Statistics for scientists and engineers, John Wiley, 2015.
Mode of Evaluation: CAT, Quiz, Assignment and FAT.
List of Challenging Experiments (Indicative): using computational software such as MS-Excel/MS-Solver/R/Python/Minitab etc.
1. Introduction to computational procedure, import and export of data, data processing, tabulation and visualization of data and charts, Diagrammatical Presentation of data. 4 hours
2. Various plots and graphical Presentation of Statistical Data 4 hours
3. Computation of descriptive Statistics and summarizing the data 4 hours
4. Computational methods of discrete distributions and generating random numbers using standard distributions. 2 hours
5. Normal distribution: Calculation of probabilities, fitting of normal data and related applications 4 hours
6. Binomial distribution: Calculation of probabilities, fitting of binomial data and related applications on real time data. 4 hours
7. Poisson distribution: Calculation of probabilities, fitting of Poisson data and related applications on real time data. 2 hours
8. Exponential distribution: Calculation of probabilities, fitting of exponential data and related applications on real time data. 2 hours
9. Gamma distribution: Calculation of probabilities, fitting of Gamma data and related applications on real time data. 2 hours



10. Beta distribution: Calculation of probabilities, fitting of Beta data and related applications on real time data. 2 hours
Total Laboratory hours 30 hours
Mode of evaluation: Continuous assessment and FAT.
Recommended by Board of Studies 24.06.2020
Approved by Academic Council No. 59 Date 24.09.2020



Course Code Course Title L T P J C
MAT5013 Statistical Inference 3 0 2 0 4
Pre-requisite Nil Syllabus Version
1.1
Course Objectives:
 Understand the types of questions that the statistical method addresses for decision making.
 Apply statistical methods to hypotheses testing and inference problems.
 Interpret the results in a way that addresses the question of interest.
 Use data to make evidence-based decisions that are technically sound.
 Communicate the purposes of the analyses, the findings from the analysis, and the implications of
those findings.
Expected Course Outcomes:
At the end of the course students will be able to:
 Understand the notion of a parametric model, point estimation of the parameters of those
models and the properties of a good estimator.
 Learn the approaches to point estimation of parameters.
 Understand the concept of interval estimation and confidence intervals.
 Explain the basic concepts in tests of hypotheses.
 Understand and apply large-sample tests.
 Use small-sample tests of hypotheses.
 Discuss nonparametric tests of hypotheses.
 Translate and correlate the statistical analysis into statistical inference.

Module:1 Introduction 9 hours


Population, sample, parameter and statistic; characteristics of a good estimator; Consistency –
Invariance property of Consistent estimator, Sufficient condition for consistency; Unbiasedness;
Sufficiency – Factorization Theorem – Minimal sufficiency; Efficiency – Most efficient estimator,
likelihood equivalence, Uniformly minimum variance unbiased estimator, applications of Lehmann-
Scheffe’s Theorem, Rao - Blackwell Theorem and applications.

Module:2 Point Estimation 6 hours


Point Estimation – Estimator, Estimate, Methods of point estimation – Maximum likelihood method
(the asymptotic properties of ML estimators are not included), Large sample properties of ML
estimator (without proof) – applications, Method of moments, method of least squares, method of
minimum chi-square and modified minimum chi-square – Asymptotic Maximum Likelihood
Estimation and applications

Module:3 Interval Estimation 4 hours


Confidence limits and confidence coefficient; Duality between acceptance region of a test and a
confidence interval; Construction of confidence intervals for population proportion (small and large
samples) and between two population proportions (large samples); Confidence intervals for mean
and variance of a normal population; Difference between the means and ratio of the variances of two
normal populations.
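
As a worked illustration of a large-sample confidence interval for a population proportion, the sketch below uses only the Python standard library. The function name proportion_ci and the sample figures are hypothetical, and the interval shown is the simple normal-approximation (Wald) interval, which is one of several constructions and is only reasonable when the sample is large.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical sample: 40 successes out of 100 trials.
lower_limit, upper_limit = proportion_ci(40, 100)
```

For these made-up figures the interval is roughly 0.304 to 0.496. By the duality mentioned in this module, a two-sided large-sample test at the 5% level would not reject any hypothesised proportion inside that range.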



Module:4 Testing of hypotheses 6 hours
Types of errors, power of a test, most powerful tests; Neyman-Pearson Fundamental Lemma and its
applications; Notion of Uniformly most powerful tests; Likelihood Ratio tests: Description and
property of LR tests - Application to standard distributions.

Module:5 Large sample tests 4 hours


Large sample properties; Tests of significance (under normality assumption) – Test for a population
mean, proportion; Test for equality of two means, proportions; Test for variance, Test for correlation,
Test for Regression.

Module:6 Small sample tests 6 hours


Student’s t-test, test for a population mean, equality of two population means, paired t-test, F-test
for equality of two population variances; Chi-square test for goodness of fit and test for
independence of attributes, χ² test for testing variance of a normal distribution

Module:7 Non-parametric tests 8 hours


Sign test, Signed rank test, Median test, Mann-Whitney test, Run test, One sample Kolmogorov–Smirnov
test, Kruskal–Wallis H test (Description, properties and applications only).

Module:8 Contemporary issues 2 hours


Lecture by Industry Experts.
Total Lecture hours: 45 hours

Text Book(s)
 Manoj Kumar Srivastava and Namita Srivastava, Statistical Inference – Testing of Hypotheses, Prentice Hall of India, 2014.
 Robert V. Hogg, Elliot A. Tanis and Dale L. Zimmerman, Probability and Statistical Inference, 9th edition, Pearson, 2013.
Reference Book(s)
 Marc S. Paolella, Fundamental Statistical Inference: A Computational Approach, Wiley, 2018.
 B. K. Kale and K. Muralidharan, Parametric Inference, Narosa Publishing House, 2016.
 Miller, I. and Miller, M., John E. Freund's Mathematical Statistics with Applications, Pearson Education, 2002.
 Rao, C.R., Linear Statistical Inference and its Applications, 2nd Edition, Wiley Eastern, 1973.
 Gibbons, J.D., Non-Parametric Statistical Inference, 2/e, Marcel Dekker, 1985.
 Bansilal, Sanjay Arora and Sudha Arora, Introducing Probability and Statistics, 2/e, Satya Prakash Publications, 2006.
 George Casella and Roger L. Berger, Statistical Inference, 2nd edition, Duxbury, 2002.
Mode of Evaluation: CAT, Quiz, Assignment and FAT.
List of Experiments
1 Calculating Confidence intervals, p-value 2 hours
2 Large Sample Tests- Test for Population mean & Population proportions 4 hours
3 Small Sample Tests – t – test for population mean, Paired t test 4 hours
4 F- test for population variances 2 hours
5 Chi-square test for goodness of fit and Independence of Attributes 4 hours



6 Computation of consistent estimators, unbiased estimators and their variances. 2 hours
7 Computation of ML estimator by Iterative method/Method of scoring, computation of estimators for grouped data applying the ML method. 2 hours
8 Minimum χ² and modified minimum χ² 2 hours
9 Computation of least squares estimator - calculation of standard errors of estimators 2 hours
10 Test for correlation coefficient & Non-parametric Tests 6 hours
Total Laboratory hours 30 hours
Mode of evaluation: Continuous assessment and FAT.
Recommended by Board of Studies 24.06.2020
Approved by Academic Council No. 59 Date 24.09.2020



Course Code Course Title L T P J C
MAT5016 Time Series Analysis and Forecasting 3 0 2 0 4
Pre-Requisite NIL Syllabus Version
1.0
Course Objectives:
 To equip students with various forecasting techniques and familiarize them with modern statistical
methods for analyzing time-series data.
 To integrate the concepts of time series data so that they can be applied scientifically in field
projects.
 To link time-dependent analytical tools with model building using real-time data.
Expected Course Outcomes:
On completion of the course, students will be able to
 understand the fundamental advantages of forecasting and apply essential forecasting techniques
 apply an appropriate forecasting method in any given situation.
 apply non-stationary methods in real-time problems.
 forecast with better statistical models based on statistical data analysis
 learn and apply variance transformation techniques
 understand the application of frequency-domain time series analysis.

Module:1 Exploratory analysis of Time Series 4 hours


Graphical display, classical decomposition model, Components and various decompositions of Time
Series Models-Numerical description of Time Series: Stationarity, Autocovariance and
Autocorrelation functions - Data transformations - Methods of estimation –Trend, Seasonal and
exponential.
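
The sample autocorrelation function mentioned in this module can be computed directly from its definition. The sketch below is illustrative: the name sample_acf is hypothetical, the autocovariance uses the common divisor n (not n - k), and the series is assumed to be non-constant so that the lag-0 autocovariance is positive.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag, with autocovariances divided by n."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series)")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n  # lag-0 autocovariance (sample variance)
    if c0 == 0.0:
        raise ValueError("constant series: autocorrelation is undefined")
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf


# Tiny made-up series; the lag-0 value is always 1 by construction.
r = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
```

For this toy series the values are 1, 0.4 and -0.1 at lags 0, 1 and 2. For a stationary series the sample autocorrelations typically die out quickly as the lag grows, whereas a slow decay is a common visual hint of trend or non-stationarity.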

Module:2 Smoothing Techniques 6 hours


Moving Averages: Simple, centered, double and weighted moving averages; single and double
exponential smoothing – Holt’s and Winters’ methods - Exponential smoothing techniques for series
with trend and seasonality-Basic evaluation of exponential smoothing.
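
Single exponential smoothing, the simplest technique in this module, follows the recursion s_t = alpha * x_t + (1 - alpha) * s_(t-1). The sketch below is illustrative only: the function name is hypothetical, and initialising with the first observation is one common convention among several.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed values s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]  # assumption: start the recursion at the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Made-up series: with alpha = 0.5 the smoothed values are 10, 11 and 12.5.
levels = simple_exponential_smoothing([10.0, 12.0, 14.0], alpha=0.5)
```

The last smoothed value serves as the one-step-ahead forecast. A larger alpha reacts faster to recent changes, while a smaller alpha smooths more heavily. Series with trend or seasonality need the double and triple (Holt and Winters) extensions named above.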

Module:3 Stationary models 6 hours


Time series data, Trend, seasonality, cycles and residuals, Stationarity, White noise processes,
Autoregressive (AR), Moving Average (MA), Autoregressive and Moving Average (ARMA) and
Autoregressive Integrated Moving Average (ARIMA) processes, Choice of AR and MA periods.

Module:4 Non-stationary time series models 9 hours


Tests for Nonstationarity: Random walk –random walk with drift –Trend stationary –General Unit
Root Tests: Dickey Fuller Test, Augmented Dickey Fuller Test. ARIMA Models: Basic formulation
of the ARIMA Model and their statistical properties - Autocorrelation function (ACF), Partial
autocorrelation function (PACF) and their standard errors.

Module:5 Forecasting 6 hours


Nature of Forecasting – Forecasting methods- qualitative and quantitative methods – Steps involved
in stochastic model building – Forecasting model evaluation. Model selection techniques: AIC, BIC
and AICC – Forecasting model monitoring.

Module:6 Transfer function and Intervention analysis 6 hours


Transfer function models- Transfer function – noise models; Cross correlation function; Model
specification; Forecasting with Transfer function – noise models; Intervention analysis.

Module:7 Spectral analysis 6 hours


Spectral density function (s. d. f.) and its properties, s. d. f. of AR, MA and ARMA processes, Fourier
transformation and periodogram.

Module:8 Contemporary issues 2 hours


Lecture by Industry Experts.
Total Lecture hours: 45 hours
Text Book(s)
 Douglas C. Montgomery, Cheryl L. Jennings, Murat Kulahci, Introduction to Time Series Analysis and Forecasting, Second Ed., Wiley, 2016.
 George E. P. Box, Gwilym M. Jenkins, Gregory C. Reinsel, Greta M. Ljung, Time Series Analysis: Forecasting and Control, Fifth Ed., Wiley, 2016.
Reference Books
 Brockwell, P. J., & Davis, R. A., Introduction to Time Series and Forecasting, Third Ed., Springer, 2016.
 Terence C. Mills, Applied Time Series Analysis: A Practical Guide to Modeling and
Forecasting, Academic Press, 2019.
Mode of Evaluation: CAT, Quiz, Digital Assignment and FAT.
List of Challenging Experiments (Indicative)
1 Visualization of Stationary and Non-stationary time series 4 hours
2 Moving Average Time Series Model and Differencing 4 hours
3 Exponential smoothing technique (Single, double and triple) 4 hours
4 Auto-Regressive Model for Stationary Time Series 4 hours
5 Autoregressive Integrated Moving Average for Non- Stationary Time Series 4 hours
6 Forecasting With Univariate Models 4 hours
7 Transfer Functions and Autoregressive Distributed Lag Modeling 4 hours
8 Spectral density function 2 hours
Total Laboratory hours 30 hours
Mode of Evaluation: Continuous assessment and FAT.
Recommended by Board of Studies 10.09.2019
Approved by Academic Council No. 56 Date 24.09.2019



Course Code Course Title L T P J C
MAT5017 Multivariate Data Analysis 3 0 2 0 4
Pre-Requisite Knowledge of Fundamentals of Statistics, Matrices and Linear Algebra Syllabus Version
1.0
Course Objectives:
The objective of the course is to enable the student to:
 Understand the fundamental concepts of Multivariate Data Analysis / Multivariate Statistical
Analysis.
 Become conversant with various methods and techniques used in summarization and analysis of
multivariate data.
 Prepare for investigation of multivariate data and examine the possible diagnostics in multivariate
methods.
 Formulate real-time problems in the form of multivariate models.
 Develop feasible solutions to real-life problems using multivariate methods and techniques.
 Conduct research using multivariate data analysis techniques.
Expected Course Outcome:
At the end of the course students will be able to:
 Develop an in-depth understanding of multivariate models, methods and techniques.
 Demonstrate the knowledge and skill of multivariate normal distributions, related probability
distributions and their applications.
 Examine the relationships between dependent and independent variables of multivariate models,
estimate the parameters and fit a model.
 Perform discriminant function analysis and logistic regression.
 Apply principal component analysis, factor analysis and dimension reduction to sample data.
 Investigate clustering and multidimensional scaling structure present in sample data.
 Apply Structural Equation Modeling (SEM) to real-time observations.
 Research real-time problems from various disciplines using multivariate data analysis.

Module:1 Introduction to Multivariate Data Analysis 5 hours


Multivariate data and their diagrammatic representation. Exploratory multivariate data analysis, sample
mean vector, sample dispersion matrix, sample correlation matrix, graphical representation, means,
variances, co-variances, correlations of linear transforms, six step approach to multivariate model
building. Introduction to multivariate linear regression, logistic regression, principal component analysis,
factor analysis, cluster analysis, canonical analysis and canonical variables, structural equation modeling
(SEM).

Module:2 Multivariate Normal Distribution(MND) 8 hours


Introduction to multivariate normal distribution, probability density function and moment generating
function of multivariate normal distribution, singular and nonsingular normal distributions, distribution of
linear and quadratic form of normal variables, marginal and conditional distributions. Random sampling
from multivariate normal distributions. Goodness of fit of multivariate normal distribution. Wishart
matrix-its distribution and properties.



Module:3 Multivariate Linear Model and Analysis of Variance and Covariance 8 hours
Maximum likelihood estimation of parameters, tests of linear hypothesis, distribution of partial and
multiple correlation coefficients and regression coefficients. Multivariate linear regression, multivariate
analysis of variance of one and two way classification data (only LR test). Multivariate analysis of
covariance. Hotelling's T² and Mahalanobis D² applications in testing and confidence set construction.

Module:4 Multiple Discriminant Analysis and Logistic Regression 7 hours


Discriminant model and analysis: a two group discriminant analysis, a three group discriminant analysis,
the decision process of discriminant analysis (objective, research design, assumptions, estimation of the
model, assessing overall fit of a model, interpretation of the results, validation of the results). Logistic
Regression model and analysis: regression with a binary dependent variable, representation of the binary
dependent variable, estimating the logistic regression model, assessing the goodness of fit of the
estimation model, testing for significance of the coefficients, interpreting the coefficients.

Module:5 Principal Components and common Factor Analysis 5 hours


Population and sample principal components, their uses and applications, large sample inferences,
graphical representation of principal components, Biplots, the orthogonal factor model, dimension
reduction, estimation of factor loading and factor scores, interpretation of factor analysis.

Module:6 Cluster Analysis and Multidimensional Scaling 5 hours


Concepts of cluster analysis and multidimensional scaling, similarity measures, hierarchical clustering
methods, Ward’s hierarchical clustering method, nonhierarchical clustering methods, K-means
methods. Clustering based on statistical models, multidimensional scaling and correspondence analysis,
perceptual mapping.

Module:7 Structural Equation Modelling (SEM) 5 hours


Concept of structural equation modeling, Confirmatory factor analysis, canonical correlation analysis,
conjoint analysis.

Module:8 Contemporary issues 2 hours


Lecture by Industry Experts.

Total Lecture Hours: 45 hours


Text Book(s)
 Härdle W.K. and Simar L., Applied Multivariate Statistical Analysis, 4th Edition, Springer-Verlag, 2015.
 Richard A. Johnson and Dean W. Wichern, Applied Multivariate Statistical Analysis, Prentice Hall India, 7th Edition, 2019.
Reference Books
 Joseph F. Hair, Jr., William C. Black, Barry J. Babin, Rolph E. Anderson and Ronald L. Tatham, Multivariate Data Analysis, 7th Edition, Pearson Education India, 2014.
 Rao, C. R. and Rao, M. M., Multivariate Statistics and Probability, Elsevier & Academic Press, 2014.
 Kshirsagar, A. M., Multivariate Analysis, Marcel Dekker, 2006.
 Anderson T.W., An Introduction to Multivariate Statistical Analysis, John Wiley & Sons, 3rd Edition, 2009.
 Bhuyan, K. C., Multivariate Analysis and its Applications, New Central Book Agency Pvt. Ltd., 2005.
 Weisberg S., Applied Linear Regression, 4th Edition, Wiley, 2013.
 Kollo T. and von Rosen D., Advanced Multivariate Statistical Analysis with Matrices, Springer, New York, 2005.
Mode of Evaluation: CAT , Quiz , Assignment and FAT.
List of Challenging Experiments (Indicative) using packages, software and other scientific devices
1. MLE of mean vector and variance-covariance matrix from the normal population. Generating random numbers from a multivariate normal distribution. 4 hours
2. Hotelling's T² and Mahalanobis D² 4 hours
3. Computation of principal components and conducting factor analysis 4 hours
4. Fitting a multivariate linear regression model and its interpretation. 4 hours
5. Error analysis, outlier detection and related tests 2 hours
6. Estimation, fitting and validating a logistic regression model. 4 hours
7. Classification between two normal populations using discriminant analysis. 2 hours
8. Cluster analysis 2 hours
9. Computation of canonical variables and correlation 2 hours
10. Structural Equation Modeling and related computations 2 hours
Total Laboratory hours 30 hours
Mode of assessment: Continuous Assessment and FAT.
Recommended by Board of Studies 24.06.2020
Approved by Academic Council No. 59 Date 24.09.2020



Course Code Course Title L T P J C
MAT6002 Regression Analysis and Predictive Modelling 3 0 2 0 4
Pre-Requisite MAT5012 - Probability Theory and Distributions Syllabus Version
1.0
Course Objectives:
 Develop an understanding of regression analysis and model building.
 Provide the ability to develop relationship between variables
 Investigate possible diagnostics in regression techniques
 Formulate feasible solution using regression model for real-life problems.
Expected Course Outcome:
At the end of the course students will be able to:
 develop in-depth understanding of the linear and nonlinear regression model.
 demonstrate the knowledge of regression modeling and model selection techniques.
 examine the relationships between dependent and independent variables.
 estimate the parameters and fit a model.
 investigate possible diagnostics in regression modeling and analysis.
 validate the model using hypothesis testing and confidence interval approach.
 understand the generalizations of the linear model to binary and count data.

Module:1 Simple Regression Analysis 6 hours


Introduction to linear and nonlinear models. Ordinary least squares method. Simple linear
regression model; using simple regression to describe a linear relationship. Fitting a linear trend to
time series data. Validating a simple regression model using t and F tests and p-values. Developing
confidence intervals. Precautions in interpreting regression results.
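
As an illustration of the least squares fit and the t test named in this module, here is a minimal sketch in plain Python. The function name `simple_linear_regression` and the five data points are assumptions made for the example, not part of the syllabus.

```python
import math

def simple_linear_regression(x, y):
    """Fit y = b0 + b1*x by ordinary least squares and report basic diagnostics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx                      # slope estimate
    b0 = y_bar - b1 * x_bar               # intercept estimate
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = sse / (n - 2)                # residual variance estimate
    se_b1 = math.sqrt(sigma2 / s_xx)      # standard error of the slope
    return {
        "intercept": b0,
        "slope": b1,
        "r_squared": 1 - sse / sst,
        "se_slope": se_b1,
        "t_slope": b1 / se_b1,            # t statistic for H0: slope = 0
    }

# Illustrative data (made up for this sketch).
fit = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit)
```

In simple regression the F statistic for the model equals the square of `t_slope`, so the two tests lead to the same conclusion about the slope.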

Module:2 Multiple Regression Analysis 6 hours


Concept of the multiple regression model to describe a linear relationship, assessing the fit of the
regression model, inferences from multiple regression analysis, problem of overfitting a model,
comparing two regression models, prediction with the multiple regression equation.

Module:3 Fitting Curves and Model Adequacy Checking 6 hours


Introduction, fitting curvilinear relationships, residual analysis, PRESS statistic, detection and
treatment of outliers, lack of fit of the regression model, test of lack of fit, Problem of autocorrelation
and heteroscedasticity. Estimation of pure errors from near neighbors.
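
The PRESS statistic listed above can be sketched for a straight-line fit as follows. The shortcut uses the leverage of each point; a brute-force leave-one-out refit is included only to check it. The function names and data are assumptions for illustration.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    return y_bar - b1 * x_bar, b1

def press_statistic(x, y):
    """PRESS via leverages: sum of (e_i / (1 - h_ii))^2."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b0, b1 = ols_line(x, y)
    total = 0.0
    for xi, yi in zip(x, y):
        e = yi - (b0 + b1 * xi)                  # ordinary residual
        h = 1 / n + (xi - x_bar) ** 2 / s_xx     # leverage for a line fit
        total += (e / (1 - h)) ** 2
    return total

def press_by_refitting(x, y):
    """Same quantity computed by deleting each point and refitting."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = ols_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total

x = [1, 2, 3, 4, 5]          # illustrative data
y = [2, 4, 5, 4, 5]
print(press_statistic(x, y), press_by_refitting(x, y))
```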

Module:4 Transformation techniques 5 hours


Introduction, variance stabilizing transformations, transformations to linearize the model, Box-Cox
method, transformations on the regressor variables, Generalized and weighted least squares, Some
practical applications.
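
A small sketch of the Box-Cox power transformation mentioned above, assuming a positive response. The function name `box_cox` is hypothetical; choosing the parameter (for example by maximum likelihood) is not shown.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y with power parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires a positive response")
    if lam == 0:
        return math.log(y)            # limiting case as lam -> 0
    return (y ** lam - 1) / lam

print(box_cox(4.0, 0.5))   # square-root-type transformation
print(box_cox(4.0, 0))     # natural logarithm
```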

Module:5 Multicollinearity 7 hours


Introduction, sources of multicollinearity, effects of multicollinearity. Multicollinearity diagnostics:
examination of correlation matrix, variance inflation factors (VIF), eigensystem analysis of X′X.
Methods of dealing with multicollinearity: collecting additional data, model re-specification, and
ridge regression.
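
To make the VIF diagnostic concrete, the sketch below covers only the special case of two regressors, where the VIF of either one reduces to 1 / (1 - r²) with r their sample correlation. With more regressors each VIF needs a regression of one regressor on the others, which is not shown. Names and data are illustrative assumptions.

```python
import math

def pearson_r(u, v):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)

def vif_two_regressors(x1, x2):
    """VIF shared by both regressors in a two-regressor model."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

x1 = [1, 2, 3, 4, 5]   # illustrative regressors
x2 = [2, 1, 4, 3, 5]
print(pearson_r(x1, x2), vif_two_regressors(x1, x2))
```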



Module:6 Generalized Linear Models 7 hours
Generalized linear model: link functions and linear predictors, parameter estimation and inference
in the GLM, prediction and estimation with the GLM, residual analysis, and the concept of
overdispersion.
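
A minimal sketch of GLM fitting for count data, assuming a Poisson response with a log link and a single covariate. It uses iteratively reweighted least squares (IRLS) and a Pearson-based dispersion estimate as a rough overdispersion check. The function names and the toy data are assumptions for illustration.

```python
import math

def fit_poisson_glm(x, y, n_iter=25):
    """IRLS for the log-link model log(mu_i) = b0 + b1 * x_i."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi           # linear predictor
            mu = math.exp(eta)           # inverse link
            z = eta + (yi - mu) / mu     # working response
            # For the Poisson log-link model the IRLS weight equals mu.
            s_w += mu
            s_wx += mu * xi
            s_wxx += mu * xi * xi
            s_wz += mu * z
            s_wxz += mu * xi * z
        # Solve the 2x2 weighted normal equations in closed form.
        det = s_w * s_wxx - s_wx ** 2
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1

def pearson_dispersion(x, y, b0, b1):
    """Pearson chi-square divided by residual degrees of freedom (n - 2)."""
    chi2 = 0.0
    for xi, yi in zip(x, y):
        mu = math.exp(b0 + b1 * xi)
        chi2 += (yi - mu) ** 2 / mu
    return chi2 / (len(y) - 2)

x = [0, 0, 0, 1, 1, 1]     # a binary covariate (illustrative)
y = [1, 2, 3, 4, 6, 8]     # counts
b0, b1 = fit_poisson_glm(x, y)
print(math.exp(b0), math.exp(b0 + b1), pearson_dispersion(x, y, b0, b1))
```

With a binary covariate the fitted means equal the two group means, which gives a quick sanity check. A dispersion estimate well above 1 would suggest overdispersion.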

Module:7 Model building and Nonlinear Regression 6 hours


Variable selection, model building, model misspecification. Model validation techniques: Analysis
of model coefficients, and predicted values, data splitting method. Nonlinear regression model,
nonlinear least squares, transformation to linear model, parameter estimation in nonlinear system,
statistical inference in nonlinear regression.

Module:8 Contemporary issues 2 hours


Lecture by Industry Experts.
Total Lecture hours: 45 hours

Text Book(s)
 Douglas C. Montgomery, Elizabeth A. Peck, G. Geoffrey Vining, Introduction to Linear
Regression Analysis, Third Ed., Wiley India Pvt. Ltd., 2016.
 Norman R. Draper, Harry Smith; Applied Regression Analysis, WILEY India Pvt. Ltd.
New Delhi; Third Edition, 2015.
Reference Books
 Johnson, R. A., Wichern, D. W., Applied Multivariate Statistical Analysis, Sixth Ed., PHI
Learning Pvt. Ltd., 2013.
 Iain Pardoe, Applied Regression Modeling, John Wiley and Sons, Inc., 2012.
Mode of Evaluation: CAT, Quiz, Assignment and FAT
List of Challenging Experiments
1. Correlation analysis using scatter diagram, Karl Pearson's correlation coefficient and drawing inferences. 2 hours
2. Simple linear regression: model fitting, estimation of parameters, computing R² and adjusted R² and model interpretation. 4 hours
3. Residual analysis and forecast accuracy for a given data set. 2 hours
4. Validating simple linear regression using t and F tests and p-values. 4 hours
5. Developing confidence intervals and testing the model for simple and multiple regression. 4 hours
6. Multiple regression: estimation of parameters, fitting of the model, error analysis, model validation, variable selection and testing. 4 hours
7. Problem of multicollinearity and determination of VIF. 2 hours
8. Diagnostic measures and outlier detection, Durbin-Watson test, variable selection and model building. 4 hours
9. Autocorrelation, autoregressive model. 2 hours
10. Fitting of nonlinear regression model. 2 hours
Total Laboratory hours: 30 hours
Mode of assessment: Continuous Assessment and FAT
Recommended by Board of Studies 10.09.2019
Approved by Academic Council No. 56 Date 24.09.2019



Programme Elective



Course Code Course Title L T P J C
MAT6003 Programming for Data Science 0 0 4 0 2
Pre-Requisite MAT5012 – Probability Theory and Distributions Syllabus Version
1.0
Course Objectives:
 Formulate simple problems and code them in an appropriate high-level language for data science.
 Acquire knowledge of standard data visualization and formal inference procedures to
interpret the results.
 To develop complex statistical models to assess data and apply to real-world contexts.
Expected Course Outcome:
At the end of the course students will be able to:
 develop relevant programming techniques of moderate complexity and execute in data
science.
 demonstrate the proficiency in statistical data analysis of inferential methods and interpret
the results contextually.
 apply data science concepts and methods to solve problems in real-world contexts
 integrate data from disparate sources and transform in relational databases.

List of Challenging Experiments (Indicative)


1 Introduction to Python – Keywords, identifiers, I/O statements. 4 hours
2 Sequence and File operations. 4 hours
3 Functions, loops, Modules, errors and exceptions. 4 hours
4 Data Manipulation – Basic Functionalities, Merging, Concatenation of data objects, Exploring a Dataset and Analysing a dataset. 6 hours
5 Data visualization – Graphical and diagrammatical presentation 4 hours
6 Descriptive statistical analysis – evaluation, plotting and interpretation. 4 hours
7 Evaluation of probability using various distribution functions 4 hours
8 Correlation – Simple, Partial and Multiple Correlations for linear and non-linear data. 6 hours
9 Regression – Simple, Multiple Regression and linear models. 6 hours
10 Test for normality and homogeneity of variance – Inferential Statistics for Single through multiple samples. 6 hours
11 Experimental Design: One-way ANOVA, two-way ANOVA, Multiple comparison tests 6 hours
12 Time series analysis – White noise, AR, MA, ARMA, ARIMA, ACF and PACF. 6 hours
Total Laboratory hours: 60 hours
Text Book(s)
 Jake VanderPlas, Python Data Science Handbook - Essential Tools for Working with Data,
O’Reilly Media, 2017.
 Zhang, Y., An Introduction to Python and Computer Programming, Springer, 2016.
Reference Book(s)
 Nelli, F., Python Data Analytics: With Pandas, NumPy, and Matplotlib, 2nd Ed., Apress, 2018.
 Samir Madhavan, Mastering Python for Data Science, Packt Publishing Ltd., 2015.
Mode of Evaluation: Continuous assessment and FAT



Recommended by Board of Studies 10.09.2019
Approved by Academic Council No. 56 Date 24.09.2019

Course Code Course Title L T P J C


MAT6004 Computational Statistics for Data Science 0 0 4 0 2
Pre-Requisite MAT5013 - Statistical Inference Syllabus Version
1.0
Course Objectives:
 To use software packages to apply statistical theory in a computing environment.
 To reinforce the theoretical concepts and their application in the real-time domain.
Expected Course Outcomes:
Students will be able to
 use software tools for projects in data management.
 apply technical skills in statistical data analysis, moving from single-variable to
multi-variable problems.
 understand the statistical decision-making theory and interpretation.
 analyze and solve real-time problems

List of Challenging Experiments (Indicative)


1 Data Management – Handling Big data sets and variable selection 6 hours
2 Descriptive statistics and their interpretation 8 hours
3 Tabulation of Data and Cross Tabulation 6 hours
4 Correlation analysis 8 hours
5 Regression analysis 8 hours
6 Testing of hypotheses (Z, t, F and χ² tests) 8 hours
7 Non-parametric tests 8 hours
8 Design and analysis of experiments 8 hours
Total Laboratory hours: 60 hours
Text Book(s)
 McCormick, Keith; Salcedo, Jesus, SPSS Statistics for Data Analysis and Visualization, Wiley, 2017.
 K. V. S. Sarma, Statistics Made Simple Do It Yourself, 2nd Ed, Prentice-Hall, 2010.
Reference Book(s)
 Murtaza Haider, Getting Started with Data Science: Making Sense of Data with Analytics, IBM Press, 2015.
 J.P. Verma, Data Analysis in Management with SPSS Software, Springer, 2013.
Mode of Evaluation: Continuous Assessment and FAT.
Recommended by Board of Studies 10.09.2019
Approved by Academic Council No. 56 Date 24.09.2019



Course Code Course Title L T P J C
MAT6005 Machine Learning for Data Science 3 0 2 0 4
Pre-Requisite MAT5010 – Foundations of Data Science Syllabus Version
1.0
Course Objectives:
 Lay the foundation of machine learning and its practical applications and prepare students for
real-time problem-solving in data science.
 Develop self-learning algorithms using training data to classify or predict the outcome of
future datasets.
 Recognize overtraining (overfitting) and apply techniques to avoid it, such as cross-validation.
Expected Course Outcome:
At the end of the course students will be able to:
 understand the most popular machine learning algorithms
 analyze and perform an evaluation of learning algorithms and model selection.
 compare the strengths and weaknesses of many popular machine learning approaches
 appreciate the underlying mathematical relationships within and across machine learning
algorithms and the paradigms of supervised and unsupervised learning.
 design and implement various machine learning algorithms in a range of real-world
applications.

Module:1 Introduction to Machine Learning 2 hours


The origins of machine learning-How machines learn - Machine learning in practice- Exploring and
understanding state-of-the-art methods.

Module:2 Classification 6 hours


Learning Associations-Classification-Regression-Decision Trees-Reinforcement Learning-Probably
Approximately Correct (PAC) Learning-Noise-Learning Multiple Classes-Model Selection and
Generalization-Support Vector Machines.

Module:3 Parametric Methods 5 hours


Introduction to Parametric methods-Maximum Likelihood Estimation: Bernoulli, binomial, Poisson
distributions - Gaussian Density. Evaluating an Estimator: Bias and Variance-The Bayes Estimator-
Parametric Classification.
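
The maximum likelihood estimators named in this module have simple closed forms, sketched below. Function names and samples are illustrative assumptions. The Gaussian variance MLE divides by n and is therefore biased, which ties in with the bias and variance topic above.

```python
def bernoulli_mle(xs):
    """MLE of the success probability: the sample proportion."""
    return sum(xs) / len(xs)

def poisson_mle(xs):
    """MLE of the Poisson rate: the sample mean."""
    return sum(xs) / len(xs)

def gaussian_mle(xs):
    """MLE of (mean, variance); the variance divides by n, not n - 1."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

print(bernoulli_mle([1, 0, 1, 1]))
print(poisson_mle([2, 3, 4]))
print(gaussian_mle([1, 2, 3, 4]))
```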

Module:4 Nonparametric Methods 8 hours


Introduction-Nonparametric Density Estimation: Histogram Estimator-Kernel Estimator-K-Nearest
Neighbour Estimator-Generalization to Multivariate Data-Nonparametric classification-Distance
Based Classification-Outlier Detection.
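
A minimal sketch of the kernel estimator for a one-dimensional sample, assuming a Gaussian kernel and a fixed bandwidth `h`. The function name `gaussian_kde` and the sample values are assumptions for this example.

```python
import math

def gaussian_kde(sample, h):
    """Return a density function p(x) = (1 / (N h)) * sum K((x - x_i) / h)."""
    n = len(sample)
    norm = n * h * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / norm

    return density

p = gaussian_kde([0.0, 1.0, 2.0], h=0.5)   # illustrative sample and bandwidth
print(p(1.0))

# Crude Riemann-sum check that the estimate integrates to about 1.
area = sum(p(-10 + 0.01 * i) for i in range(2001)) * 0.01
print(area)
```

A smaller bandwidth gives a spikier estimate and a larger one a smoother estimate, which is the tuning trade-off for this family of estimators.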

Module:5 Multivariate Methods 8 hours


Multivariate Data-Parameter Estimation-Estimation of Missing Values- Expectation-Maximization
algorithm -Multivariate Normal Distribution- Multivariate Classification-Tuning Complexity-Discrete
Features.



Module:6 Dimensionality Reduction 8 hours
Introduction- Subset Selection-Principal Component Analysis, Feature Embedding-Factor Analysis-
Singular Value Decomposition-Multidimensional Scaling- Canonical Correlation Analysis.
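
As a sketch of principal component analysis without external libraries, the code below finds the first principal component by power iteration on the sample covariance matrix. This is one of several ways to compute it (an eigendecomposition or SVD is the usual route). The function names, starting vector and data are assumptions; the method assumes the starting vector is not orthogonal to the leading eigenvector and that the data are not constant.

```python
import math

def covariance_matrix(data):
    """Sample covariance matrix (divisor n - 1) of rows-as-observations data."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_principal_component(data, n_iter=200):
    """Leading eigenvector and eigenvalue of the covariance matrix."""
    cov = covariance_matrix(data)
    d = len(cov)
    v = [1.0 / (j + 1) for j in range(d)]        # arbitrary starting vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(vi * ci for vi, ci in zip(v, cv))   # Rayleigh quotient
    return v, eigenvalue

data = [[2, 0], [0, 1], [-2, 0], [0, -1]]   # illustrative 2-D data
direction, variance = first_principal_component(data)
print(direction, variance)
```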

Module:7 Supervised Learning and Unsupervised Learning 6 hours


Linear Discrimination: Introduction- Generalizing the Linear Model-Geometry of the Linear
Discriminant- Linear Discriminant Analysis- Pairwise Separation-Gradient Descent-Logistic
Discrimination. Clustering: Introduction, K-Means Clustering- Mixtures of Latent Variable Models-
Spectral Clustering-Hierarchical Clustering-Choosing the Number of Clusters.
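
K-means clustering from this module can be sketched as alternating assignment and update steps. To keep the example deterministic the initial centroids are passed in explicitly, which is an assumption of this sketch; in practice they are often chosen at random. Names and data are illustrative.

```python
def k_means(points, centroids, n_iter=100):
    """Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centroids[k])))
            for pt in points
        ]
        if new_labels == labels:
            break                         # assignments stable: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

points = [(0, 0), (0, 1), (10, 10), (10, 11)]       # illustrative data
centroids, labels = k_means(points, [(0, 0), (10, 10)])
print(centroids, labels)
```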

Module:8 Contemporary issues 2 hours


Lecture by Industry Experts.
Total Lecture hours: 45 hours
Text Book(s)
 E. Alpaydin, Introduction to Machine Learning, 3rd Edition, MIT Press, 2015.
 Pratap Dangeti, Statistics for Machine Learning, Packt Publishing, 2017.
Reference Book(s)
 C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2016.
 K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
Mode of Evaluation: CAT, Quiz, Digital Assignment and FAT
List of Challenging Experiments (Indicative)
1 Exploring and Understanding data and formats 2 hours
2 Classification techniques using Decision Trees 4 hours
3 Support Vector Machines 4 hours
4 Clustering Algorithms 4 hours
5 Computation of missing values and multivariate classification 4 hours
6 Dimensionality reduction: A factor analysis. 4 hours
7 Discriminant analysis 4 hours
8 Canonical Correlation analysis 4 hours
Total Laboratory hours: 30 hours
Mode of evaluation: Continuous Assessment and FAT.
Recommended by Board of Studies 10.09.2019
Approved by Academic Council No. 56 Date 24.09.2019



Course Code Course Title L T P J C
MAT6007 Deep Learning 2 0 2 0 3
Pre-Requisite NIL Syllabus Version
1.0
Course Objectives:
 To introduce the fundamentals of neural networks as well as some advanced topics such as
recurrent neural networks, long short-term memory (LSTM) cells and convolutional neural
networks.
 To introduce complex learning models and deep learning models
 To explore various learning models using different software packages
Expected Course Outcome:
On completion of the course, students will be able to
 understand the fundamentals of deep learning and build deep learning models
 Apply the most appropriate deep learning method in any given situation.
 Develop neural network models in data-intensive real-time problems.
 Develop efficient generative models
 Learn and apply convolutional and recurrent neural network techniques.

Module:1 Introduction 4 hours


What is a neural network, Biological Neuron, Idea of computational units, McCulloch–Pitts unit and
Thresholding logic, Linear Perceptron, Perceptron Learning Algorithm, Convergence theorem for
Perceptron Learning Algorithm, Linear separability, feed-forward networks, input, hidden and
output layers, organization and architecture of neural networks, linear and nonlinear networks
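
The perceptron learning algorithm listed above can be sketched on a linearly separable toy problem. The logical AND data, the function names, and the zero initial weights are assumptions chosen for illustration.

```python
def predict(w, b, x):
    """Threshold unit: output 1 when the weighted sum exceeds zero."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(samples, targets, lr=1.0, max_epochs=100):
    """Perceptron rule: adjust weights only on misclassified samples."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(samples, targets):
            delta = target - predict(w, b, x)
            if delta != 0:
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta
                errors += 1
        if errors == 0:
            break                      # every sample classified correctly
    return w, b

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]   # logical AND (illustrative)
targets = [0, 0, 0, 1]
w, b = train_perceptron(samples, targets)
print(w, b, [predict(w, b, x) for x in samples])
```

Because AND is linearly separable, the convergence theorem guarantees the loop stops after finitely many updates. On a non-separable problem such as XOR it would run until `max_epochs`.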

Module:2 Training algorithms for Feedforward networks 5 hours


Learning the weights, Cost functions, Back-propagation algorithms, gradient descent algorithm,
unit saturation, heuristics to avoid local optima, accelerated algorithms, Multilayer Perceptron,
Empirical Risk Minimization, regularization, autoencoders
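
A minimal sketch of back-propagation and gradient descent for the smallest possible network: one input, one tanh hidden unit, one linear output, and squared-error cost. The architecture, parameter values and function names are assumptions made for the example.

```python
import math

def forward(params, x):
    """Return hidden activation and network output."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    return h, w2 * h + b2

def loss(params, x, y):
    """Squared-error cost for a single example."""
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2

def backprop(params, x, y):
    """Gradient of the cost with respect to [w1, b1, w2, b2] via the chain rule."""
    w1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    d_out = y_hat - y              # dL/dy_hat
    d_w2 = d_out * h
    d_b2 = d_out
    d_h = d_out * w2               # propagate back through the output weight
    d_z = d_h * (1 - h * h)        # derivative of tanh
    return [d_z * x, d_z, d_w2, d_b2]

def gradient_descent(params, data, lr=0.1, epochs=500):
    """Plain stochastic gradient descent over the data set."""
    params = list(params)
    for _ in range(epochs):
        for x, y in data:
            grads = backprop(params, x, y)
            params = [p - lr * g for p, g in zip(params, grads)]
    return params

start = [0.5, -0.3, 0.8, 0.1]                # illustrative initial parameters
data = [(0.0, 0.0), (1.0, 1.0)]              # illustrative training pairs
trained = gradient_descent(start, data)
print(sum(loss(start, x, y) for x, y in data),
      sum(loss(trained, x, y) for x, y in data))
```

Comparing `backprop` against finite-difference gradients is a common way to check an implementation before training.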

Module:3 Deep Neural Networks 4 hours


Architectures, Properties of CNN representations: invertibility, stability, invariance, convolution,
pooling of layers, CNN and Tensorflow, Difficulty of training deep neural networks, Greedy layer-
wise training.

Module:4 Better Training of Neural Networks 4 hours


Newer optimization methods for neural networks (Adagrad, adadelta, rmsprop, adam, NAG),
second order methods for training, Saddle point problem in neural networks, Regularization
methods (dropout, drop connect, batch normalization).

Module:5 Recurrent neural networks 4 hours


LSTM, GRU, Encoder-decoder architectures, Auto-encoders (standard, de-noising, contractive,
etc.), Variational Autoencoders, Kohonen SOM, Back-propagation through time, Long Short-Term
Memory, Gated Recurrent Units, Bidirectional LSTMs, Bidirectional RNNs.



Module:6 Deep Generative learning 4 hours
Dynamic memory models, Reinforcement learning, Restricted Boltzmann Machines (RBMs),
Introduction to MCMC and Gibbs Sampling, gradient computations in RBMs, Deep Boltzmann
Machines, deep belief networks, convolutional networks, LeNet, AlexNet.

Module:7 Recent trends 3 hours


Variational Auto-encoders, Generative Adversarial Networks, Multi-task Deep Learning, Multi-
view Deep Learning

Module:8 Contemporary issues 2 hours


Lecture by Industry Experts.
Total Lecture hours: 30 hours
Text Book(s)
 Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, MIT Press, 2016.
Reference Book(s)
 Raúl Rojas, Neural Networks: A Systematic Introduction, 2nd edition, 1996.
 Bishop C., Neural Networks for Pattern Recognition, Oxford University Press, 2015.
Mode of Evaluation: CAT, Quiz, Assignment and FAT.
List of Challenging Experiments (Indicative)
1 Setting up a neural network in memory 6 hours
2 Backpropagation training experiment 6 hours
3 Recurrent NN 6 hours
4 Experiment: Object recognition 6 hours
5 Experiment: Highway sign recognition 6 hours
Total Laboratory hours: 30 hours
Mode of assessment: Continuous assessment and FAT
Recommended by Board of Studies 24.06.2020
Approved by Academic Council No. 59 Date 24.09.2020



Course Code Course Title L T P J C
MAT6008 Artificial Intelligence for Data Science 2 0 2 0 3
Pre-Requisite NIL Syllabus Version
1.0
Course Objectives:
 The main purpose of this course is to provide the most fundamental knowledge to the students
so that they can understand AI.
 To provide the foundations for AI problem-solving techniques and knowledge representation
formalisms.
Expected Course Outcome:
On completion of the course, students will be able to
 identify and formulate appropriate AI methods for solving a problem.
 implement AI algorithms.
 identify the type of AI problem (search, inference, decision making under
uncertainty, game theory, etc.).
 compare the difficulty of different versions of AI problems, in terms of
computational complexity and the efficiency of existing algorithms.

Module:1 Introduction 3 hours


The AI problems, AI technique, philosophy and development of Artificial intelligence.

Module:2 Problem Spaces and Search 4 hours


State-space search, Uninformed and informed search techniques: BFS, A*, variations of A*. Local
search and optimization: hill-climbing, simulated annealing.
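
As an example of informed search, here is a sketch of A* on a small grid with unit step costs and the Manhattan distance as heuristic. The grid layout, the 0/1 encoding of free cells and walls, and the function name `a_star` are assumptions for illustration.

```python
import heapq

def a_star(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) and 1 (wall) cells."""
    def h(cell):
        # Manhattan distance never overestimates the remaining cost here.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]     # entries are (f, g, cell)
    best_cost = {start: 0}
    parent = {start: None}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:        # walk back through parents
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue                       # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    parent[(nr, nc)] = cell
                    heapq.heappush(open_heap,
                                   (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return None                            # goal unreachable

grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
path = a_star(grid, (0, 0), (3, 0))
print(path)
```

Setting the heuristic to zero turns the same loop into uniform-cost search, which on unit-cost grids explores cells in the same order of distance as BFS.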

Module:3 Adversarial Search and Game Playing 4 hours


Minimax algorithm, alpha-beta pruning, stochastic games, Constraint- satisfaction problems.
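
The minimax algorithm and alpha-beta pruning can be sketched on a game tree written as nested lists, where a number is a leaf payoff for the maximizing player. This representation and the `visited` bookkeeping list are assumptions of the sketch.

```python
def minimax(node, maximizing):
    """Plain minimax value of a nested-list game tree."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def alpha_beta(node, maximizing, alpha=float("-inf"), beta=float("inf"),
               visited=None):
    """Minimax value with alpha-beta pruning; records evaluated leaves."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alpha_beta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                  # remaining children cannot matter
        return value
    value = float("inf")
    for child in node:
        value = min(value, alpha_beta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # illustrative two-ply tree
seen = []
print(minimax(tree, True), alpha_beta(tree, True, visited=seen), seen)
```

Pruning never changes the value at the root; it only skips leaves that cannot affect it. In this tree the leaves 4 and 6 are never evaluated.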

Module:4 Knowledge and Reasoning 5 hours


Logical agents, Propositional logic, First-order logic, Inference in FoL: forward chaining, backward
chaining, resolution, Knowledge representation: Frames, Ontologies, Semantic web and RDF.

Module:5 Introduction to PROLOG 4 hours


Facts and predicates, data types, goal finding, backtracking, simple object, compound objects, use
of cut and fail predicates, recursion, lists, simple input/output, dynamic database.

Module:6 Uncertain knowledge and reasoning 4 hours


Probabilistic reasoning, Bayesian networks, Fuzzy logic

Module:7 Natural Language Processing 4 hours


An Introduction to Natural language Understanding, Perception, Learning.

Module:8 Contemporary issues 2 hours



Lecture by Industry Experts.
Total Lecture hours: 30 hours

Text Book(s)
 Elaine Rich, Kevin Knight, Artificial Intelligence, 3/Ed., Tata McGraw Hill, 2017.
 Dan W. Patterson, Introduction to AI and ES, Pearson Education, 2015.
Reference Book(s)
 Deepak Khemani, Artificial Intelligence, Tata Mc Graw Hill Education, 2017.
 Stuart Russell, Peter Norvig, Artificial Intelligence, 3/Ed, Pearson, 2015.
 N. P. Padhy, Artificial Intelligence and Intelligent Systems, Oxford Higher Education,
Oxford University Press, 2005.
 Ivan Bratko, PROLOG Programming, 4/Ed. Pearson Education, 2020.
Mode of Evaluation: CAT, Quiz, Digital Assignment and FAT.
List of Challenging Experiments (Indicative)
1 Study of facts, objects, predicates and variables in PROLOG 2 hours
2 Study of Rules and Unification in PROLOG 2 hours
3 Study of “cut” and “fail” predicate in PROLOG 2 hours
4 Study of arithmetic operators, simple input/output and compound goals in PROLOG 4 hours
5 Study of recursion in PROLOG 4 hours
6 Study of Lists in PROLOG 2 hours
7 Study of dynamic database in PROLOG 2 hours
8 Study of string operations in PROLOG (Implement string operations like substring, string position, palindrome etc.) 4 hours
9 Write a PROLOG program to maintain family tree 4 hours
10 Write a PROLOG program to implement all set operations (Union, intersection, complement etc.) 4 hours
Total Laboratory hours 30 hours
Mode of Evaluation: Continuous assessment and FAT.
Recommended by Board of Studies 24.06.2020
Approved by Academic Council No. 59 Date 24.09.2020



Course Code Course Title L T P J C
MAT6009 Design and Analysis of Experiments 3 0 2 0 4
Pre-Requisite MAT5013 – Statistical Inference Syllabus Version
1.0
Course Objectives
 Describe how to design experiments, carry them out, and analyze the data they yield.
 Construct appropriate experimental designs for given problems: sample size determination,
choice of levels of variables, designs with restrictions on randomization, utility functions for
measuring design objectives, use of simulation to characterize properties of designs.
Expected Course Outcome
 Describe the purpose of robust construction and how it is applied in experimental design
 To formulate and validate the experimental designs in agricultural, medical, biomedical
projects
 Grasp the background concepts of model formulation and validation
 Apply the research-oriented statistical techniques required for experimental designs

Module:1 Basic Principles of Experimental design 2 hours


Strategy of Experimentation - Applications of Experimental Design – Basic Principles – Guidelines
for designing experiments.

Module:2 Simple Comparative Experiments 8 hours


Principles of scientific experimentation – Basic Designs: Completely Randomized Design (CRD),
Randomized Block Design (RBD) and Latin Square Design (LSD) – Analysis of RBD (with one
observation per cell, more than one but equal number of observations per cell).
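
The analysis of a completely randomized design reduces to one-way ANOVA. The sketch below computes the sums of squares and the F ratio; the p-value lookup from the F distribution is not shown. The function name and the three treatment groups are assumptions for illustration.

```python
def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and F for a CRD."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-treatment variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-treatment (error) variation around each group mean.
    ss_within = sum(sum((y - m) ** 2 for y in g)
                    for g, m in zip(groups, means))
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within,
            "df_between": df_between, "df_within": df_within, "F": f_ratio}

table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])   # illustrative data
print(table)
```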

Module:3 Analysis of Co-variance 6 hours


Multiple Comparisons – Multiple Range Tests - Analysis of Covariance – Construction of
Orthogonal Latin Square – Analysis of Graeco Latin Squares.

Module:4 Factorial experiments 8 hours


Factorial experiments - 2², 2³ and 3², 3³ experiments and their analysis - Fractional replication in
Factorial Experiments.
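
For the smallest factorial design in this module, two factors at two levels each, the main effects and the interaction are contrasts of the four treatment totals. The sketch uses the standard labels (1), a, b, ab for the treatment combinations; the function name and the response values are assumptions.

```python
def factorial_2x2_effects(one, a, b, ab, replicates=1):
    """Effects from treatment totals (1), a, b, ab with n replicates each."""
    divisor = 2 * replicates
    effect_a = (a + ab - b - one) / divisor      # average change due to A
    effect_b = (b + ab - a - one) / divisor      # average change due to B
    effect_ab = (ab + one - a - b) / divisor     # interaction AB
    return effect_a, effect_b, effect_ab

# Illustrative single-replicate responses.
print(factorial_2x2_effects(one=20, a=30, b=40, ab=52))
```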

Module:5 Confounding 6 hours


Necessity of confounding, Types of confounding, complete and partial confounding in 2ⁿ, 3² and 3³
factorial designs, Analysis of confounded factorial designs; Fractional Replication.

Module:6 Balanced Incomplete Block design 6 hours


Balanced Incomplete Block Design (BIBD)– Types of BIBD – Simple construction methods –
Concept of connectedness and balancing – Intra Block analysis of BIBD.

Module:7 Partially Balanced Incomplete Block design 6 hours


Partially Balanced Incomplete Block Design with two associate classes – intra block analysis - Split
plot and strip plot design and their analysis.



Module:8 Contemporary issues 2 hours
Lecture by Industry Experts.
Total Lecture hours 45 hours
Text Book(s)
 Douglas C. Montgomery, Design and Analysis of Experiments, 9th Edition, John Wiley and Sons, 2017.
 Angela Dean, Daniel Voss and Danel Draguljić, Design and Analysis of Experiments, 2nd
Edition, Springer International Publishing AG, 2017.
Reference Books
 Das M.N. and Giri N.C., Design and Analysis of Experiments, 3rd Edition, New Age International (P) Ltd., 2017.
 John Lawson, Design and Analysis of Experiments with R, 1st Edition, CRC Press, 2015.
Mode of Evaluation: CAT, Quiz, Digital Assignment and FAT
List of Challenging Experiments (Indicative)
1 One-way analysis of variance - CRD 2 hours
2 RBD & LSD analysis of one and two observations 4 hours
3 Analysis of Co-variance CRD & RBD 4 hours
4 Analysis of Graeco Latin Squares 4 hours
5 Factorial experiments 4 hours
6 Confounding 4 hours
7 BIBD and PBIBD 4 hours
8 Split plot design 4 hours
Total Laboratory hours 30 hours
Mode of Evaluation: Continuous assessment and FAT
Recommended by Board of Studies 24.06.2020
Approved by Academic Council No. 59 Date 24.09.2020



Course Code Course Title L T P J C
MAT6010 Optimization Techniques 3 2 0 0 4
Pre-Requisite NIL Syllabus Version
1.0
Course Objectives:
 To familiarize the students with some basic concepts of optimization techniques and
approaches.
 To formulate a real-world problem as a mathematical programming model.
 To develop model formulations and apply them in solving decision problems.
 To solve specialized linear programming problems like the transportation and assignment
problems.
Expected Course Outcome:
Student will be able to
 apply operations research techniques like linear programming problem in industrial
optimization problems.
 solve allocation problems using various OR methods.
 understand the characteristics of different types of decision-making environments and the
appropriate decision-making approaches and tools to be used in each type.
 recognize competitive forces in the marketplace and develop appropriate reactions based
on existing constraints and resources.

Module:1 Introduction to Operations Research 6 hours


Introduction-Mathematical models of Operations Research-Scope and applications of Operations
Research-Phases of an Operations Research study-Characteristics of Operations Research-Limitations
of Operations Research.

Module:2 Linear Programming 6 hours


Introduction –Properties of Linear Programming-Basic assumptions-Mathematical formulation of
Linear Programming-Limitations or constraints-Methods for the solution of LP Problem-Graphical
analysis of LP-Graphical LP Maximization problem-Graphical LP Minimization problem.
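
The graphical method for a two-variable maximization problem amounts to evaluating the objective at the corner points of the feasible region. The sketch below does this by intersecting every pair of constraint lines and keeping the feasible intersections. It assumes constraints of the form a·x + b·y ≤ rhs with x, y ≥ 0 and a bounded, non-empty feasible region; the function name and the example problem are illustrative.

```python
from itertools import combinations

def solve_lp_2d(objective, constraints):
    """Maximize c1*x + c2*y by enumerating feasible corner points."""
    # Non-negativity written in the same (a, b, rhs) form: -x <= 0, -y <= 0.
    rows = list(constraints) + [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue                          # parallel lines, no corner
        x = (r1 * b2 - r2 * b1) / det         # Cramer's rule
        y = (a1 * r2 - a2 * r1) / det
        if all(a * x + b * y <= r + 1e-9 for a, b, r in rows):
            value = objective[0] * x + objective[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best

# Illustrative problem: maximize 3x + 5y
# subject to x <= 4, 2y <= 12, 3x + 2y <= 18, x, y >= 0.
best_value, best_x, best_y = solve_lp_2d(
    (3, 5), [(1, 0, 4), (0, 2, 12), (3, 2, 18)])
print(best_value, best_x, best_y)
```

For a minimization problem the same enumeration applies with the comparison reversed, or with the objective coefficients negated.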

Module:3 Linear Programming Models 7 hours


Simplex Method-Basics of Simplex Method-Formulating the Simplex Method-Simplex Method
with two variables-Simplex Method with more than two variables-Big M Method.

Module:4 Dual Linear Programming 6 hours


Introduction- Primal and Dual problem -Dual problem properties-Solution techniques of Dual
problem-Dual Simplex method-Relations between direct and dual problem-Economic
interpretation of Duality.

Module:5 Transportation and Assignment Models 6 hours


Introduction: Transportation problem-Balanced-Unbalanced-Methods of basic feasible solution-
Optimal solution-MODI method. Assignment problem-Hungarian Method.
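
One of the standard ways to obtain an initial basic feasible solution for a balanced transportation problem is the north-west corner rule, sketched below. The syllabus does not name a specific method, so this choice, the function name, and the supply, demand and cost figures are assumptions. The result is only a starting point that a method such as MODI would then improve.

```python
def north_west_corner(supply, demand):
    """Initial allocation for a balanced transportation problem."""
    supply, demand = list(supply), list(demand)
    allocation = [[0] * len(demand) for _ in supply]
    i = j = 0
    while i < len(supply) and j < len(demand):
        qty = min(supply[i], demand[j])   # ship as much as possible
        allocation[i][j] = qty
        supply[i] -= qty
        demand[j] -= qty
        if supply[i] == 0:
            i += 1                        # source exhausted: move down
        else:
            j += 1                        # destination satisfied: move right
    return allocation

supply = [15, 25, 10]                     # illustrative balanced problem
demand = [5, 15, 15, 15]
costs = [[10, 2, 20, 11],
         [12, 7, 9, 20],
         [4, 14, 16, 18]]
plan = north_west_corner(supply, demand)
total_cost = sum(costs[i][j] * plan[i][j]
                 for i in range(len(supply)) for j in range(len(demand)))
print(plan, total_cost)
```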

Module:6 Network Analysis 6 hours


Basic concepts-Construction of Network-Rules and precautions-CPM and PERT Networks-
Obtaining of critical path. Probability and cost consideration. Advantages of Network.



Module:7 Theory of Games 6 hours
Introduction-Terminology-Two Person Zero-Sum game-Solution of games with saddle points and
without saddle points-2×2 games-dominance principle – m×2 and 2×n games-Graphical method.
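
Checking a two-person zero-sum game for a saddle point is a short computation: an entry qualifies when it is the minimum of its row and the maximum of its column. The payoff matrices (payoffs to the row player) and the function name are illustrative assumptions.

```python
def find_saddle_point(payoff):
    """Return (row, column, value) of a saddle point, or None if there is none."""
    row_minima = [min(row) for row in payoff]
    col_maxima = [max(col) for col in zip(*payoff)]
    for i, row in enumerate(payoff):
        for j, value in enumerate(row):
            if value == row_minima[i] and value == col_maxima[j]:
                return i, j, value
    return None

print(find_saddle_point([[3, 1, 4], [6, 5, 7], [2, 0, 8]]))
print(find_saddle_point([[1, -1], [-1, 1]]))
```

When no saddle point exists, as in the second matrix, the players need mixed strategies, which is where the 2×2 formulas and the graphical method come in.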

Module:8 Contemporary issues 2 hours


Lecture by Industry Experts.
Total Lecture hours: 45 hours
Tutorial: 15 hours
 A minimum of 5 problems to be worked out by students in every tutorial class.
 Another 5 problems per tutorial class to be given as homework.
Text Book(s)
 Hamdy Taha, Operations Research, 10th edition, Prentice Hall India, 2019.
 P. K. Gupta and D. S. Hira, Operations Research, S. Chand & Co., 2007.
Reference Books
 S.D. Sharma, Operations Research, Nath & Co., Meerut, 2000.
 Maurice Sasieni, Arthur Yaspan, Lawrence Friedman, OR Methods and Problems, New Age
International Edition, 2003.
 J K Sharma, Operations Research Theory & Applications, 3e, Macmillan India Ltd., 2007.
 P. Sankara Iyer, Operations Research, Tata McGraw-Hill, 2008.
 A. Ravindran, Don T. Phillips and James J. Solberg, Operations Research: Principles and
Practice, 2nd edition, John Wiley and Sons, 2007.
Mode of Evaluation: CAT, Quiz, Assignment and FAT.
Recommended by Board of Studies 24.06.2020
Approved by Academic Council No. 59 Date 24.09.2020



Course Code Course Title L T P J C
MAT6011 Statistical Quality Control 3 0 2 0 4
Pre-Requisite NIL Syllabus Version
1.0
Course Objectives:
 To understand different control charts for analyzing industrial quality experiments.
 To apply knowledge of quality characteristics scientifically in industrial
experiments.
 To link and analyse the various sampling schemes to find the plan for quality inspection.
Expected Course Outcome:
On completion of the course, students will be able to
 understand the fundamental advantages of control charts and apply their essentials
 apply appropriate Charts for the industrial experiments.
 apply some standard distributions for construction of sampling plans.
 construct the AOQL plans for normal inspection scheme.
 learn and apply variance transformation techniques.
 understand the difference between sampling plans for attributes and variables.

Module:1 Control Charts 4 hours


Introduction to Quality control; control charts for mean – CUSUM chart – technique of V-mask –
Weighted Moving average charts – multivariate control charts – Hotelling’s T² control charts and
Economic design of X-bar chart.
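
The CUSUM chart named above is often run in tabular form as an alternative to the V-mask. The sketch below accumulates deviations above and below the target and signals when either sum exceeds a decision interval. The reference value `k`, decision interval `h`, function name and observations are assumptions for illustration.

```python
def tabular_cusum(observations, target, k, h):
    """Return (C_plus, C_minus, signal) for each observation."""
    c_plus = c_minus = 0.0
    rows = []
    for x in observations:
        # Accumulate deviations beyond the slack k on each side of the target.
        c_plus = max(0.0, x - (target + k) + c_plus)
        c_minus = max(0.0, (target - k) - x + c_minus)
        rows.append((c_plus, c_minus, c_plus > h or c_minus > h))
    return rows

# Illustrative process readings drifting upward from a target of 10.
rows = tabular_cusum([10.0, 10.5, 11.0, 12.0, 12.5], target=10.0, k=0.5, h=2.0)
for row in rows:
    print(row)
```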

Module:2 Process Capability analysis 8 hours


Process Capability analysis: Meaning, Estimation technique for capability of a process –Capability
Indices: Process capability ratios Cp; Cpk, Cpm, Cmk, Cpc – Process capability analysis using a
control chart – Process capability analysis using design of experiments.

Module:3 Acceptance Sampling 6 hours


Acceptance sampling – Terminologies – Sampling plans by attributes – Single sampling plan
and Double sampling plan – OC, ASN, AOQ, AOQL and ATI curves – MIL-STD-105E Tables.
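
For a single sampling plan with sample size n and acceptance number c, points on the OC curve follow from the binomial distribution, and the AOQ under rectifying inspection follows from the acceptance probability. The sketch assumes the binomial model (a large lot relative to the sample) and uses hypothetical function names and plan parameters.

```python
from math import comb

def acceptance_probability(n, c, p):
    """P(accept lot) = P(at most c defectives in a sample of n)."""
    return sum(comb(n, d) * p ** d * (1 - p) ** (n - d) for d in range(c + 1))

def average_outgoing_quality(n, c, p, lot_size):
    """AOQ for rectifying inspection with defectives in the sample replaced."""
    return acceptance_probability(n, c, p) * p * (lot_size - n) / lot_size

# Illustrative plan: n = 50, c = 2, evaluated at several fraction defectives.
for p in (0.01, 0.02, 0.05, 0.10):
    print(p, acceptance_probability(50, 2, p),
          average_outgoing_quality(50, 2, p, lot_size=1000))
```

The AOQL is the maximum of the AOQ values over p, which can be approximated by scanning a fine grid of p values.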

Module:4 Acceptance sampling variables 6 hours


Acceptance sampling variables for process parameter – Sequential plans for process parameter (σ
known and unknown) – Sampling variables for proportion non-conforming – � method, K method.

Module:5 Double Sampling methods 6 hours


Double specification limits – M-method, Double sampling by variables – MIL-STD-414 Tables –
Continuous Sampling plan – CSP-1, CSP-2, CSP-3, Wald and Wolfowitz SP-A.

Module:6 Attribute Sampling plans 6 hours


Producer's risk, Consumer's risk, designing a single sampling plan for stipulated producer's and
consumer's risks, OC curves under normal, tightened and reduced inspection, single, double and
multiple sampling plans in AQL systems.

Module:7 Six-Sigma 7 hours



Concept of six sigma, methods of six sigma, DMAIC methodology, DFSS methodology, six-sigma
control chart, case studies.

Module:8 Contemporary issues 2 hours


Lecture by Industry Experts.
Total Lecture hours 45 hours
Text Book(s)
 Eugene L. Grant, Richard S. Leavenworth, Statistical Quality Control, 7th edition, McGraw Hill
Education, India, 2017.
 Douglas C. Montgomery, Introduction to Statistical Quality Control, Seventh Edition, John
Wiley and Sons, New York, 2013.
Reference Book(s)
 Edward G. Schilling, Dean V. Neubauer, Acceptance Sampling in Quality Control, Second
Edition, Taylor & Francis, 2009.
 Poornima M. Charantimath, Total Quality Management, 3/e, Pearson India Limited, 2017.
Mode of Evaluation: CAT, Quiz, Digital Assignment and FAT.
List of Challenging Experiments (Indicative)
1 Mean and Range charts: Experimental control charts for process control. 4 hours
2 Control chart for nonconformities. 4 hours
3 A control chart for nonconformities per unit with variable subgroup size. 4 hours
4 C chart used to control errors on forms. 2 hours
5 Acceptance decisions based on plotted frequency distributions. 4 hours
6 AOQL inspection to produce quality improvement. 4 hours
7 Construction of rectifying inspection using AOQL normal inspection plans 4 hours
8 Acceptance sampling under standard sampling plans. 4 hours
Total Laboratory hours 30 hours
Mode of Evaluation: Continuous assessment and FAT
Recommended by Board of Studies 24.06.2020
Approved by Academic Council No. 59 Date 24.09.2020



Course Code Course Title L T P J C
MAT6012 Programming for Data Analysis 2 0 4 0 4
Pre-Requisite NIL Syllabus Version 1.0
Course Objectives:
• To introduce the core programming basics required for data science using the Python language
• To read and write simple Python programs
• To develop Python programs with conditionals and loops
• To use Python data structures – lists, tuples, dictionaries
• To introduce the important data science modules NumPy, SciPy and Matplotlib
• To introduce file input/output in Python and statistical processing of data using
Pandas
Expected Course Outcome:
At the end of the course students will be able to:
• Read, write and execute simple Python programs
• Decompose a Python program into functions
• Manipulate 1-d, 2-d and multidimensional data using Python
• Read and write data from/to files in Python programs
• Develop algorithmic solutions to data science related problems
Module:1 Algorithmic Problem Solving 3 hours
Algorithms, building blocks of algorithms (statements, state, control flow, functions); algorithmic
problem solving; iteration, recursion. Illustrative problems: finding minimum in a list, guess an integer
number in a range, factorial of a number.
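The three illustrative problems can be sketched in Python as follows. This is one possible treatment: the function names are illustrative, and the halving strategy for the guessing problem is an assumption, since the syllabus does not prescribe a method.

```python
def find_minimum(values):
    """Iteration: scan the list, keeping the smallest element seen so far."""
    smallest = values[0]
    for v in values[1:]:
        if v < smallest:
            smallest = v
    return smallest

def factorial(n):
    """Recursion: n! = n * (n-1)!, with 0! = 1 as the base case."""
    return 1 if n == 0 else n * factorial(n - 1)

def guess_integer(secret, low, high):
    """Guess an integer in [low, high] by halving the range each time.

    Returns the guess and the number of guesses that were needed.
    """
    guesses = 0
    while low <= high:
        mid = (low + high) // 2
        guesses += 1
        if mid == secret:
            return mid, guesses
        if mid < secret:
            low = mid + 1
        else:
            high = mid - 1
    raise ValueError("secret is outside the given range")
```

For a range of 100 integers the halving strategy needs at most 7 guesses, whereas a linear scan may need 100.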

Module:2 Data, Expressions, Statements in Python 4 hours


Python Strengths and Weaknesses; Installing Python; IDLE - Spyder – Jupyter; Mutable and Immutable
Data Types, Naming Conventions; String Values; String Operations; String Slices; String Operators;
String functions – split, join, chr, ord; Numeric Data Types; Arithmetic Operators and Expressions;
Comments in the Program; Understanding Error Messages.
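A minimal sketch of the string and numeric topics listed above; the sample string and variable names are arbitrary.

```python
text = "data science with python"

words = text.split()                                # split on whitespace -> list of str
title = " ".join(w.capitalize() for w in words)     # join a list back into one string
first_word = text[:4]                               # slice: characters 0..3
reversed_text = text[::-1]                          # slice with a negative step

code_point = ord("A")                # character -> integer code point
next_letter = chr(code_point + 1)    # integer code point -> character

# Strings are immutable: item assignment raises TypeError.
try:
    text[0] = "D"
except TypeError as exc:
    immutable_error = str(exc)

# Lists are mutable: the same kind of assignment succeeds in place.
letters = list("abc")
letters[0] = "A"

# Arithmetic operators: true division, floor division, remainder, power.
quotient, floor_quotient, remainder, power = 7 / 2, 7 // 2, 7 % 2, 2 ** 10
```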

Module:3 Data Collection and Language Component of Python 4 hours


List; Tuples; Sets; Dictionaries; Sorting Dictionaries; Control Flow and Syntax; Indenting; The if
statement; Relational Operators; Logical Operators; Bit-wise Operators; The while Loop – break and
continue statements; The for Loop; List Comprehension.
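The collection types and control-flow statements of this module can be combined in one short sketch. The names and marks below are made-up sample data.

```python
marks = {"Asha": 82, "Ravi": 67, "Meena": 91, "Kiran": 74}   # made-up sample data

# Sorting a dictionary: by key, then by value (highest first).
by_name = dict(sorted(marks.items()))
by_mark = dict(sorted(marks.items(), key=lambda item: item[1], reverse=True))

# List comprehension with a condition.
passed = [name for name, mark in marks.items() if mark >= 70]

# while loop with break (stop at a limit) and continue (skip odd numbers).
evens = []
n = 0
while True:
    n += 1
    if n > 10:
        break
    if n % 2:
        continue
    evens.append(n)

# Sets drop duplicates; tuples are immutable sequences.
unique_grades = set(["A", "B", "A", "C"])
point = (3, 4)

# Bit-wise versus logical operators.
bitwise_and = 6 & 3                 # 0b110 & 0b011 = 0b010
logical_and = (6 > 3) and (3 > 1)
```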

Module:4 Functions and Modules in Python 4 hours


Functions - Introduction; Defining your own functions; parameters; local and global scope; passing
collections to a function; variable number of arguments; passing functions to a function; Lambda
function; map; filter; Modules: Introduction; Standard Modules – sys, math, time.
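The function topics above can be sketched briefly as follows; all function names are illustrative.

```python
import math

def mean(*numbers):
    """Variable number of positional arguments, collected into a tuple."""
    return sum(numbers) / len(numbers)

def apply_twice(func, value):
    """A function passed as an argument to another function."""
    return func(func(value))

def largest(collection):
    """A collection (list, tuple, set) passed to a function."""
    return max(collection)

counter = 0

def increment():
    """Rebinding a module-level name inside a function requires 'global'."""
    global counter
    counter += 1

square = lambda x: x * x                                    # anonymous function
squares = list(map(square, range(1, 6)))                    # apply to every item
odd_squares = list(filter(lambda x: x % 2 == 1, squares))   # keep matching items

increment()
hypotenuse = math.hypot(3, 4)       # a function from the standard math module
```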

Module:5 Python Modules for Data Science – I 5 hours


NumPy arrays – 1-d, multidimensional arrays and matrices; Mathematical operations with arrays;
Slicing and addressing arrays; Boolean masks; Difference between lists and arrays; SciPy – Scientific
Computing library of Python – Introduction, Basic functions, Special functions, scipy.integrate,
scipy.optimize, scipy.interpolate.
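A minimal NumPy sketch of slicing, Boolean masks and the difference between lists and arrays; the array contents are arbitrary.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)     # 2-d array with 3 rows and 4 columns

first_row = a[0]                    # addressing a row
last_column = a[:, -1]              # slicing a column
block = a[1:, :2]                   # rows 1..2, columns 0..1

mask = a % 2 == 0                   # Boolean mask with the same shape as a
even_values = a[mask]               # 1-d array of the selected elements

# Arithmetic on arrays is element-wise; on lists, * repeats the list.
doubled_array = a[0] * 2
doubled_list = [0, 1, 2, 3] * 2

product = a @ a.T                   # matrix product, shape (3, 3)
```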



Module:6 Python Modules for Data Science – II 5 hours
Python Plotting: PyPlot – Basic Plotting; Logarithmic Plots; Plots with multiple axes; Matplotlib –
interactive functions, 3d plotting; Pandas – Introduction, DataFrame, Reading and writing CSV, XLS
files, Working with missing data, categorical data, data visualization with pandas.

Module:7 Error Handling in Python 3 hours


Handling IO Exceptions, Metadata, Errors, Runtime Errors, Exception Model.
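A minimal sketch of handling I/O exceptions with try/except. The helper read_numbers and the file names are hypothetical and serve only as an illustration.

```python
import os
import tempfile

def read_numbers(path):
    """Read one number per line; report I/O and conversion problems separately."""
    try:
        with open(path) as handle:
            return [float(line) for line in handle if line.strip()]
    except FileNotFoundError:
        print(f"{path}: file does not exist")
    except OSError as exc:               # other I/O failures, e.g. permissions
        print(f"{path}: cannot be read ({exc})")
    except ValueError as exc:            # a line could not be converted to float
        print(f"{path}: bad data ({exc})")
    return []

# Usage with a temporary file and with a missing file.
with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "values.txt")
    with open(good, "w") as handle:
        handle.write("1.5\n2.5\n")
    values = read_numbers(good)
    missing = read_numbers(os.path.join(folder, "no_such_file.txt"))
```

FileNotFoundError is listed before OSError because it is a subclass of OSError; the more specific handler has to come first to be reached.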

Module:8 Contemporary issues 2 hours


Lecture by Industry Experts.
Total Lecture Hours 30 hours
Text Book(s)
1. David J. Pine, Introduction to Python for Science and Engineering, CRC Press, 2019.
2. Jake VanderPlas, Python Data Science Handbook – Essential Tools for Working with Data,
O’Reilly Media, 2017.
Reference Book(s)
1. Robert Johansson, Numerical Python – Scientific Computing and Data Science Applications
with NumPy, SciPy and Matplotlib, Apress, 2019.
2. Robert Sedgewick, Kevin Wayne, Robert Dondero, Introduction to Programming in Python: An
Interdisciplinary Approach, Pearson India Education Services Pvt. Ltd., 2016.
3. Nelli, F., Python Data Analytics: with Pandas, NumPy and Matplotlib, Apress, 2018.
Mode of Evaluation: CAT, Quiz, Digital Assignment and FAT.
List of Challenging Experiments (Indicative)
1. Python Program Environment, IDLE, Jupyter, Spyder environments 4 hours
First Basic Experiment(s): (i) “Hello World!” Program in IDLE, Jupyter, Spyder
Environments.
(ii) Program(s) to demonstrate the Python data types
2. Python Operators, Expressions and Flow Controls 4 hours
Simple Experiment(s): (i) Program to demonstrate the Python operators and their
order of precedence.
(ii) Program to add/multiply/divide two numbers
(iii) Program to verify whether a given number is even or odd
Perfection: Program to verify whether a given number is an Armstrong number or
not. A number is said to be an Armstrong number if the sum of the cubes of its
individual digits is equal to the number itself, e.g., 153 = 1^3 + 5^3 + 3^3
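A possible sketch of the even/odd and Armstrong-number checks of Experiment 2. It uses the sum-of-cubes definition stated above, which applies to three-digit numbers; the function names are illustrative.

```python
def is_even(number):
    """A number is even when the remainder after division by 2 is zero."""
    return number % 2 == 0

def is_armstrong(number):
    """True if the sum of the cubes of the digits equals the number itself."""
    return number == sum(int(digit) ** 3 for digit in str(number))

# All three-digit numbers that satisfy the definition.
three_digit_armstrong = [n for n in range(100, 1000) if is_armstrong(n)]
```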
3. Python Lists, Tuples, Dictionaries & Sets 6 hours
Simple Experiment: Write a Python program which demonstrates the use of Lists,
Tuples, Dictionaries and Sets. The program should accept elements into the
various types and perform operations such as append, copy, extend,
pop and remove.
4. Python Functions, Modules and Packages 4 hours
Simple Experiment(s): Write a function file which accepts a set of numbers and
displays the largest among them
Perfection: Write a function which accepts a number ‘n’ and lists the first ‘n’
Fibonacci numbers
Challenging: Create your own module in Python which includes functions such as
greeting(), which displays a welcome message to the user. This module should also
contain some variables and a function which finds the maximum of two
given numbers.
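A minimal sketch of the functions asked for in Experiment 4. Starting the Fibonacci sequence at 0, 1 is an assumption (some texts start at 1, 1), and the greeting text is made up. Saved in a file such as mymodule.py (a hypothetical name), the same definitions could be used from another program with "import mymodule".

```python
def largest(numbers):
    """Return the largest of a collection of numbers."""
    return max(numbers)

def fibonacci(n):
    """Return the first n Fibonacci numbers, assuming the sequence starts 0, 1."""
    sequence = []
    a, b = 0, 1
    for _ in range(n):
        sequence.append(a)
        a, b = b, a + b
    return sequence

def greeting(name):
    """Return a welcome message for the user."""
    return f"Welcome, {name}!"

def maximum(x, y):
    """Return the larger of two given numbers."""
    return x if x >= y else y
```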
5. Array and Matrix Manipulation in Python 4 hours
Simple Experiment: Write a Python program demonstrating the NumPy matrix
operations such as accepting two matrices, finding their dimensions and adding the two
matrices
Perfection: Write a Python program which accepts a matrix A of order m x p and
another matrix B of order p x n and checks whether the matrix multiplication is
possible or not. If it is possible, the program finds the matrix product and displays it to
the user.
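The conformability check of Experiment 5 can be sketched with NumPy as follows. The function name and the sample matrices are illustrative.

```python
import numpy as np

def multiply_if_conformable(A, B):
    """Return A @ B when the columns of A match the rows of B, otherwise None."""
    A, B = np.asarray(A), np.asarray(B)
    if A.ndim != 2 or B.ndim != 2 or A.shape[1] != B.shape[0]:
        return None
    return A @ B

A = np.array([[1, 2, 3], [4, 5, 6]])      # order 2 x 3
B = np.array([[1, 0], [0, 1], [1, 1]])    # order 3 x 2

dimensions = (A.shape, B.shape)
total = A + A                             # addition needs equal shapes
C = multiply_if_conformable(A, B)         # order 2 x 2
D = multiply_if_conformable(A, A)         # 2 x 3 times 2 x 3 is not possible
```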
6. Data Manipulation – SciPy Module 6 hours
Simple Experiment: Write a Python program to find the determinant, inverse, eigenvalues and
eigenvectors of a matrix using the corresponding SciPy module functions
Challenging: Create a data set consisting of time series observations of an
experiment. Using the interpolation techniques of the SciPy module, form an
interpolating polynomial and use it to estimate the experimental values at
intermediate time points.
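A minimal sketch of Experiment 6 using scipy.linalg and scipy.interpolate.lagrange. The matrix and the four observations are made-up values; Lagrange interpolation is one of several SciPy options and is only advisable for a small number of points.

```python
import numpy as np
from scipy import linalg
from scipy.interpolate import lagrange

M = np.array([[4.0, 1.0], [2.0, 3.0]])
determinant = linalg.det(M)
inverse = linalg.inv(M)
eigenvalues, eigenvectors = linalg.eig(M)   # eigenvalues come back as complex numbers

# Made-up observations of an experiment at four time points.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.7, 5.8, 6.6])

polynomial = lagrange(t, y)      # cubic polynomial through all four points
estimate = polynomial(1.5)       # estimated value at an intermediate time
```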
7. Data Visualization in Python – PyPlot Module 6 hours
Compare: Given the examination scores of students of three different classes for
the same subject taught by different professors, display them visually to aid
comparison of pass percentage, A grades etc.
8. Data Manipulation using Pandas – Exploring a Dataset and Analysing a Dataset 6 hours
Simple Experiments: Create a data frame consisting of five countries, their capitals,
areas and populations. The program should also print the description of
the data frame and finally save this data frame to a csv file.
Challenging: Write a Python program demonstrating the Pandas indexing
capabilities, identifying the null values in the dataset and either filling them or
dropping them from the dataset. Also demonstrate the merging, joining and
concatenating of data frames using Pandas.
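The Pandas operations named in Experiment 8 can be sketched as follows. The frame uses placeholder labels and made-up numbers rather than real country data, and the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Placeholder labels and made-up numbers, not real country data.
frame = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "capital": ["a", "b", "c", "d", "e"],
    "area": [120.0, 300.0, 75.0, 410.0, np.nan],
    "population": [10.0, 25.0, np.nan, 40.0, 8.0],
})

summary = frame.describe()                       # description of numeric columns
null_counts = frame.isnull().sum()               # null values per column
filled = frame.fillna({"area": frame["area"].mean()})   # fill one column with its mean
dropped = frame.dropna()                         # drop rows with any null value
row_c = frame.set_index("country").loc["C"]      # label-based indexing

extra = pd.DataFrame({"country": ["A", "B"], "continent": ["X", "Y"]})
merged = frame.merge(extra, on="country", how="left")                    # merge on a column
joined = frame.set_index("country").join(extra.set_index("country"))    # join on the index
stacked = pd.concat([frame, frame], ignore_index=True)                   # concatenate rows

csv_text = frame.to_csv(index=False)   # pass a file path instead to write a csv file
```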
9. Descriptive Statistical Analysis – Evaluation, Plotting and Interpretation 6 hours
Linear Regression: Read a data frame in csv/xls format containing weather
data such as pressure, min temp, max temp, humidity and rainfall. Using
Pandas, Matplotlib and SciPy, plot the scatter plots, develop a linear
regression of rainfall on each of the other parameters and evaluate the
statistical significance of the model.
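A minimal sketch of the regression step of Experiment 9 using scipy.stats.linregress. The humidity and rainfall values are synthetic, generated here only so that the example runs; in the experiment they would be columns of the data frame read from the csv/xls file.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
humidity = rng.uniform(40, 95, size=60)                        # synthetic predictor
rainfall = 0.8 * humidity - 20 + rng.normal(0, 5, size=60)     # synthetic response

fit = stats.linregress(humidity, rainfall)
r_squared = fit.rvalue ** 2
significant = fit.pvalue < 0.05     # test of the null hypothesis "slope = 0"

print(f"rainfall = {fit.slope:.3f} * humidity + {fit.intercept:.3f}")
print(f"R^2 = {r_squared:.3f}, p-value = {fit.pvalue:.3g}")
```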
10. Evaluation of Probability using various Distributions Functions 6 hours
Simple Experiments: Write Python programs to generate a normal distribution,
binomial distribution and Poisson distribution and visualize them.
Challenging: Write a Python program to check the normality of a dataset, an
important preliminary step in deciding whether to apply parametric tests
or nonparametric tests to the given data. The checks include the histogram,
quantile-quantile plot, Shapiro-Wilk test, D’Agostino’s K-squared test and
Anderson-Darling test
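The normality checks of Experiment 10 can be sketched with scipy.stats. The sample is synthetic, and the plots are replaced here by the numbers a histogram and a quantile-quantile plot would be drawn from.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=10, size=200)     # synthetic data set

counts, bin_edges = np.histogram(sample, bins=10)   # data behind a histogram

# Quantile-quantile data; r close to 1 suggests the points lie near a straight line.
(theoretical_q, ordered_values), (qq_slope, qq_intercept, qq_r) = stats.probplot(
    sample, dist="norm"
)

shapiro_stat, shapiro_p = stats.shapiro(sample)     # Shapiro-Wilk test
k2_stat, k2_p = stats.normaltest(sample)            # D'Agostino's K-squared test
anderson = stats.anderson(sample, dist="norm")      # Anderson-Darling test

print(f"Shapiro-Wilk p = {shapiro_p:.3f}, K-squared p = {k2_p:.3f}")
print(f"Anderson-Darling statistic = {anderson.statistic:.3f}")
```

A small p-value (for example below 0.05) is evidence against normality; the Anderson-Darling statistic is judged against critical values rather than a single p-value.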
11. Linear and Nonlinear Regression in Python 4 hours
Simple Linear Regression: Write a Python program to implement the Simple
Linear Regression model to predict the wine quality using the physicochemical
and sensory variables by using the Scikit-Learn module and estimate the statistical
significance of the model.
Nonlinear Regression: Write a Python program to predict the price of oil
(OIL) from indicators such as the West Texas Intermediate (WTI) price, Henry
Hub gas price (HH), and the Mont Belvieu (MB) propane spot price. Data are
available for OIL, WTI, HH, and MB from the years 2000 to 2016 at the link
https://apmonitor.com/me575/uploads/Main/oil_data.txt. OIL is related to
WTI, HH and MB nonlinearly as follows:
OIL = A (WTI^B) (HH^C) (MB^D)
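One way to estimate A, B, C and D in the power-law model above is to take logarithms, which turns it into an ordinary linear least-squares problem. The sketch below uses synthetic, noise-free stand-in data with assumed "true" parameters; it does not read the file named in the experiment.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data (assumption); the real series would be loaded from the data file.
WTI = rng.uniform(20, 120, size=100)
HH = rng.uniform(2, 12, size=100)
MB = rng.uniform(0.3, 1.8, size=100)
true_A, true_B, true_C, true_D = 1.5, 0.9, 0.1, 0.2
OIL = true_A * WTI**true_B * HH**true_C * MB**true_D

# log(OIL) = log(A) + B log(WTI) + C log(HH) + D log(MB) is linear in the unknowns.
X = np.column_stack([np.ones_like(WTI), np.log(WTI), np.log(HH), np.log(MB)])
coefficients, *_ = np.linalg.lstsq(X, np.log(OIL), rcond=None)

A = np.exp(coefficients[0])
B, C, D = coefficients[1:]
```

With noisy data the log-transformed fit minimizes relative rather than absolute errors; a direct nonlinear fit, for example with scipy.optimize.curve_fit, is an alternative.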
12. Decision Trees and Time Series Analysis in Python 4 hours
Programs to illustrate the use of decision trees in machine learning to model
decisions and their possible consequences. In this experiment we will use a
breast cancer dataset to predict the spread of breast cancer using decision
trees.
Total Laboratory Hours 60 hours
Mode of Evaluation: Continuous Assessment and FAT
Recommended by Board of Studies 24.06.2020
Approved by Academic Council No. 59 Date 24.09.2020

***************
