Data Mining Handout

Birla Institute of Technology and Science, Pilani

K.K. Birla Goa Campus

First Semester, 2018-19
Course Handout (Part-II)

Date: Aug 2, 2018

In addition to Part I (General Handout for all courses, appended to the time-table), this portion gives
further specific details regarding the course.

Course Title: Data Mining
Course No.: CS F415
Instructor-in-charge: Hemant Rathore
Chamber No.: L-108
E-mail: [email protected]

Course Objective:
To gain a comprehensive understanding of various data mining techniques (theoretical and practical
aspects) and the ability to compare their merits and demerits for solving real-world problems.

Course Description:
This course explores the concepts and techniques of data mining, a promising and flourishing frontier in
database systems. The scope of the course covers basic data mining tasks like data pre-processing,
exploratory data analysis, data quality measures, classification, clustering, and anomaly detection
techniques. This course is designed to provide students with a broad understanding of the design and
use of different data mining algorithms. The course also aims at providing a holistic view of data
mining. It will have database, statistical, algorithmic and application perspectives of data mining.
Furthermore, the course aims to give students hands-on experience with data mining algorithms.

Text Book:
T1 Pang-Ning Tan, Michael Steinbach, Vipin Kumar, “Introduction to Data Mining”, Pearson,

Reference Books:
R1 Han J & Kamber M, “Data Mining: Concepts and Techniques”, Morgan Kaufmann
Publishers, 2001
R2 Hand D, Mannila H, & Smyth P, “Principles of Data Mining”, MIT Press, 2001
R3 Pujari A K, “Data Mining Techniques”, University Press (India), 2001
R4 Kimball R, “The Data Warehouse Toolkit”, 2e, John Wiley, 2002

Learning Objectives
LO1 Students will gain an understanding of Data Mining as a whole and its components.
LO2 Students will know data pre-processing techniques, their issues and possible
conventional solutions: noise reduction, data reduction, handling of missing values, etc.
LO3 Students will have a detailed understanding of clustering and classification methods, their
limitations and applications.
LO4 Students will acquire knowledge about data warehousing, decision making, and association
rule mining algorithms.
LO5 After completing the course, students will be able to design and build real-world
applications using data mining algorithms.

Course Plan:

Lecture 1: Introduction, Motivation, Plan, Evaluation, Policies

Introduction to Data Mining
• What is Data Mining
• Motivation & challenges
• Data Mining Tasks

Lectures 4-5:
• Types of Data
• Data quality
• Data Preprocessing
• Measures of Similarity & Dissimilarity

Lectures 6-7: Exploratory Data Analysis

Cluster Analysis: Basic concepts and algorithms
• Overview
• K-Means
• Agglomerative and Divisive hierarchical clustering
• Cluster evaluation

Lectures 14-18: Cluster Analysis: Additional Issues and Algorithms
• Characteristics of Data, Clusters and Clustering Algorithms
• Prototype-based clustering
• Density-based Clustering
• Graph-based Clustering

Lectures 19-21:
• Basics
• General approach to solving a classification problem
• Decision Tree Introduction
• Model overfitting
• Evaluating the performance of a classifier
• Methods of comparing classifiers

Lecture 22: Course Pre-Summary for Mid-Semester Exam

Classification: Alternative Techniques
• Rule-based classifiers
• Nearest-neighbour classifiers
• Bayesian Classifiers
• Support vector machines
• Ensemble methods

Neural Networks
• Introduction and motivation
• Biological Neural Network
• Artificial Neural Network
• Learning and Training in NN
• Perceptron, backpropagation and its variants

Lectures 33-36: Adversarial Machine Learning
• Poisoning Attacks
• Evasion Attacks

Anomaly Detection
• Preliminaries
• Statistical Approaches
• Proximity-based outlier detection
• Density-based outlier detection
• Clustering-based Techniques

Lectures 41-42: Course Summary, Review for End-Semester exam

Component            Nature        Examination Schedule                      Weightage
Quiz – I/II/III      Closed Book   TBA (as per the timetable)                10%
Mid Semester         Closed Book   13/10/18, Saturday (4:00 PM - 5:30 PM)    30%
Lab/Assignment       Open Book     TBA (as per the timetable)                20%
Comprehensive Test   TBA           07/12/18, Friday                          40%
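
To illustrate how the weightages above combine, here is a minimal sketch that computes a weighted course total. It assumes each component score is first expressed as a percentage of that component's own maximum. The function name `course_total` and the example scores are hypothetical and used only for illustration.

```python
# Weightages (in percent) taken from the evaluation components above.
WEIGHTS = {
    "quizzes": 10,
    "mid_semester": 30,
    "lab_assignment": 20,
    "comprehensive": 40,
}


def course_total(percent_scores):
    """Combine per-component percentages (0-100) into a total out of 100."""
    return sum(WEIGHTS[name] * percent_scores[name] / 100 for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "quizzes": 80,
    "mid_semester": 70,
    "lab_assignment": 90,
    "comprehensive": 60,
}
print(course_total(example))
```

The four weightages sum to 100, so the result can be read directly as a percentage of the course total.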

Office Hours:
Hemant Rathore: Every Saturday 10:00am – 12:00pm

• All notices concerning this course will be displayed on the course page of the Photon server.
• Follow ID/ARC notices as well.

Make-up Policy:
• Quiz / Assignment: No make-up
• Mid-Semester/Comprehensive make-up:
o Only with prior permission (in writing)
o Given only on justifiable grounds
o Will not be granted for attending a marriage, function, etc.

Assignment Submission Format:

A zip file consisting of the following:
• Portable source code:
o Must contain all required packages/libraries.
o Paths to any required file(s) should not be local to your machine.
o The instructor should be able to run your code directly after download.
• Report in PDF format (max 2 pages, 11 pt, Times New Roman)
o Insights, inferences, results and conclusions drawn from the assignment.
o Proper references to the source code and figures.
• Figures (depends on the type of the assignment)
o Self-explanatory captions for the figures. 1.jpg, q1.jpg, abc.jpg
• README.txt
o Step-by-step instructions to run your code.
o Download package 1, download xyz.jar, install MySQL
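
One possible way to meet the requirement that file paths not be local to your machine is to resolve them relative to the script itself. The sketch below is in Python and assumes a hypothetical layout with a `data/` folder next to the script. The helper `data_path` and the file name `input.csv` are illustrative, not part of the course requirements.

```python
from pathlib import Path

# Directory containing this script. Falls back to the current working
# directory when __file__ is unavailable (e.g. in an interactive session).
try:
    BASE_DIR = Path(__file__).resolve().parent
except NameError:
    BASE_DIR = Path.cwd()


def data_path(filename):
    """Return the path of a file inside the (hypothetical) data/ folder."""
    return BASE_DIR / "data" / filename


# Avoid hard-coded, machine-specific paths such as /home/me/Desktop/input.csv
print(data_path("input.csv"))
```

Because the path is built from the script's own location, the code keeps working wherever the zip file is extracted.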

Assignment Submission Policy:

• Submission accepted through Photon only.
• No assignment will be accepted by email or after the deadline.

Plagiarism: Plagiarism will be checked for every submission with Turnitin.

• The rule is very simple:
• If (Plagiarism % from Turnitin Report) > 30
o The submission will be awarded “Component Maximum Marks * -1”
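
The rule above can be read as the following minimal sketch. The function name `component_marks` and its arguments are hypothetical. The handout does not say what happens at or below the threshold, so the sketch assumes the marks obtained are left unchanged in that case.

```python
PLAGIARISM_THRESHOLD = 30  # percent, from the rule above


def component_marks(marks_obtained, component_max, plagiarism_percent):
    """Apply the plagiarism rule to one evaluation component."""
    if plagiarism_percent > PLAGIARISM_THRESHOLD:
        # Penalty: the negative of the component's maximum marks.
        return -1 * component_max
    # Assumption: otherwise the marks obtained stand as they are.
    return marks_obtained


# Hypothetical values: a 20-mark assignment on which 15 marks were scored.
print(component_marks(15, 20, 45))
print(component_marks(15, 20, 10))
```

Note that the comparison is strict, so a report of exactly 30% does not trigger the penalty under this reading.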
Instructor in-charge
CS F415
