DIP Lab 13 DBSCAN Clustering


Department of Electrical Engineering

Faculty Member: Date:

Semester:

Digital Image Processing


Lab 13: DBSCAN

Name    Reg. No    Lab Report (10 Marks)    Quiz/viva (5 Marks)

Lab#13: DBSCAN
Objectives
This laboratory exercise focuses on DBSCAN clustering, a widely used
unsupervised learning technique. Clustering is applied to unlabeled data to look
for interesting groups and patterns.

Lab Instructions
 This lab activity comprises the following parts: Lab Exercises and a Post-Lab
Viva/Quiz session.
 The lab report shall be uploaded on LMS.
 Only tasks completed during the allocated lab time will be credited to the
students. Students are, however, encouraged to practice on their own in their
spare time to enhance their skills.
Lab Report Instructions
All questions should be answered precisely to get maximum credit. The lab report
must include the following items:
 Lab objectives
 Python codes
 Results (graphs/tables) duly commented and discussed
 Conclusion
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a
popular clustering algorithm in machine learning and data mining. It is
particularly useful for identifying clusters of data points in a dataset based on the
density of points in the feature space. Unlike k-means or hierarchical clustering,
DBSCAN doesn't require specifying the number of clusters beforehand and can
discover clusters of arbitrary shapes.
Here is the basic algorithm for DBSCAN:
1. Input:
 Dataset: D={x1,x2,...,xn}, where xi is a data point in the feature space.
 Parameters:
 Epsilon (ε): The maximum distance between two points for one to
be considered as in the neighborhood of the other.
 MinPts: The minimum number of points required to form a dense
region.

Two concepts, Density Reachability and Density Connectivity, help in
understanding these parameters.

Density Reachability defines a point as directly reachable from another if it
lies within a specific distance (epsilon) of it, provided that the other point
itself has at least MinPts points in its ε-neighborhood.

Density Connectivity, on the other hand, employs a transitivity-based chaining
approach to ascertain whether points belong to the same cluster. For instance,
points p and q may be connected if p->r->s->t->q, where a->b signifies that b is
in the neighborhood of a.

2. Algorithm:
 For each data point p in the dataset D:
 If p is not visited:
 Mark p as visited.
 Find all points in the ε-neighborhood of p (including p).
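 If the number of points in the neighborhood is less than
MinPts, mark p as noise (p may later be reassigned to a
cluster as a border point).
 Otherwise, create a new cluster and add p to the cluster.
 Expand the cluster by adding every point in the ε-neighborhood
of p and, for each added point that itself has at least MinPts
neighbors, the points in its ε-neighborhood as well.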
3. Output:
 The algorithm identifies clusters of data points and marks some points
as noise if they don't belong to any cluster.
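The steps above can be sketched in plain Python. This is a minimal, unoptimized illustration rather than a reference implementation: the names region_query, dbscan, NOISE and UNVISITED, and the toy points, are assumptions made for this example. It uses Euclidean distance and a brute-force neighborhood search.

```python
import math

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points that have not been processed yet


def region_query(points, i, eps):
    """Return the indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # provisional: may become a border point later
            continue
        cluster_id += 1        # i is a core point, so start a new cluster
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_neighbors)
    return labels


# Toy data: two chains of points on a line plus one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
          (10.0, 0.0), (11.0, 0.0), (12.0, 0.0), (50.0, 50.0)]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

In this toy run the first point is initially marked as noise because its ε-neighborhood holds only two points, and it is later absorbed into cluster 0 as a border point. The first and fourth points end up in the same cluster only through the chain of core points between them.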
In the algorithm, a point q is considered to be in the ε-neighborhood of p if the
distance between p and q is less than or equal to ε. The algorithm classifies points
into three categories:
 Core points: Points with at least MinPts points in their ε-neighborhood.
 Border points: Points that have fewer than MinPts points in their
ε-neighborhood but lie within the ε-neighborhood of a core point.
 Noise points: Points that are neither core nor border points.
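As an illustration of the three categories, the sketch below uses scikit-learn's DBSCAN, assuming that library is installed. Its eps and min_samples parameters play the roles of ε and MinPts (min_samples counts the point itself), the label -1 marks noise, and core_sample_indices_ lists the core points. The five sample points are made up for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Four points spaced 0.5 apart on a line, plus one far-away point.
X = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.5, 0.0], [5.0, 5.0]])

model = DBSCAN(eps=0.6, min_samples=3).fit(X)
labels = model.labels_                  # cluster id per point, -1 for noise

is_core = np.zeros(len(X), dtype=bool)
is_core[model.core_sample_indices_] = True
is_noise = labels == -1
is_border = ~is_core & ~is_noise        # in a cluster, but not a core point

print(np.flatnonzero(is_core).tolist())    # [1, 2]
print(np.flatnonzero(is_border).tolist())  # [0, 3]
print(np.flatnonzero(is_noise).tolist())   # [4]
```

The two end points of the line have only two points in their ε-neighborhood, so they are border points, while the two middle points have three and are core points.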
Figure 1: Credit: https://www.theaidream.com/post/dbscan-clustering-algorithm-in-machine-learning

DBSCAN has advantages such as being robust to outliers and capable of
discovering clusters of arbitrary shapes. However, it may struggle with datasets
of varying densities, and choosing appropriate values for ε and MinPts can be
challenging.
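One commonly used heuristic for choosing ε, which this lab does not prescribe and is shown here only as an assumption, is to sort the distance from every point to its k-th nearest neighbor (with k chosen near MinPts) and look for the value at which the sorted curve bends sharply upward. The helper name k_distances and the toy grid below are hypothetical; the sketch uses NumPy and a brute-force distance matrix, so it only suits small datasets.

```python
import numpy as np


def k_distances(X, k):
    """Sorted distance from each point to its k-th nearest neighbor (self excluded)."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # (n, n) pairwise Euclidean distances
    dists.sort(axis=1)                          # column 0 is each point's distance to itself
    return np.sort(dists[:, k])


# Toy data: a 3x3 unit grid (one dense group) and a single distant outlier.
grid = np.array([(x, y) for x in range(3) for y in range(3)], dtype=float)
X = np.vstack([grid, [[10.0, 10.0]]])

kd = k_distances(X, k=2)
print(kd)  # nine values equal to 1.0, then one much larger value for the outlier
```

Plotting kd against its index gives the so-called k-distance curve; an ε slightly above the flat part (here, just above 1.0) would keep the grid together and leave the outlier as noise.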
Figure 2: Credits: https://github.com/NSHipster/DBSCAN

Lab Task – Your Own Dataset

Download your own CSV dataset from the internet (e.g. Kaggle). Perform
DBSCAN clustering of your dataset and showcase the plots.
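A possible starting point for this task is sketched below, assuming pandas, scikit-learn and matplotlib are available. The function names cluster_csv and plot_clusters, the file name in the usage comment, and the default parameter values are all hypothetical. Standardizing the features is an assumption made here so that a single ε is meaningful across columns with different units.

```python
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def cluster_csv(path, eps=0.5, min_samples=5):
    """Load a CSV, keep its numeric columns, and return (DataFrame, labels)."""
    df = pd.read_csv(path).select_dtypes(include="number").dropna()
    X = StandardScaler().fit_transform(df)  # zero mean, unit variance per column
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    return df, labels


def plot_clusters(df, labels, x_col, y_col, out_path="dbscan_clusters.png"):
    """Save a scatter plot of two columns, colored by cluster label."""
    # Imported here so that clustering still works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(df[x_col], df[y_col], c=labels, cmap="tab10", s=15)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.set_title("DBSCAN clusters (label -1 = noise)")
    fig.savefig(out_path, dpi=150)
    plt.close(fig)


# Example usage (file and column names are placeholders for your own dataset):
# df, labels = cluster_csv("my_dataset.csv", eps=0.3, min_samples=5)
# plot_clusters(df, labels, x_col=df.columns[0], y_col=df.columns[1])
```

The values of eps and min_samples will need tuning for each dataset; the number of points labeled -1 is a quick indication of whether ε is too small.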
Lab Task 6 – Take home (optional)
Download your own CSV dataset from the internet (e.g. heatmap). Perform
Hierarchical clustering of your dataset and showcase the plots.
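For the take-home task, one option is agglomerative clustering with SciPy, sketched below under the assumption that SciPy is installed. The choice of Ward linkage, the cut into two clusters, and the six sample points are illustrative assumptions, not requirements of the task.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy data: two tight groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

Z = linkage(X, method="ward")                    # merge history, one row per merge
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into at most 2 clusters
print(labels)                                    # cluster ids start at 1, not 0
```

The merge history Z can be passed to scipy.cluster.hierarchy.dendrogram, which draws the tree with matplotlib, to produce the plot the task asks for.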
