Lesson 4.1 - Unsupervised Learning Partitioning Methods
Unsupervised Learning: Density-based Methods
Roadmap
Introduction
DBSCAN Algorithm
OPTICS Algorithm
DENCLUE Algorithm
Unsupervised Learning: Density-based Methods Introduction
The Principle
Major features
Discover clusters of arbitrary shape
Handle noise (regions of low density)
Require only one scan of the data
Require density parameters as a termination condition
Several interesting studies
DBSCAN: Ester et al. (KDD'96)
OPTICS: Ankerst et al. (SIGMOD'99)
DENCLUE: Hinneburg & Keim (KDD'98)
CLIQUE: Agrawal et al. (SIGMOD'98) (more grid-based)
Unsupervised Learning: Density-based Methods DBSCAN Algorithm
DBSCAN Algorithm
Stands for Density-Based Spatial Clustering of Applications with
Noise
It is a density-based clustering algorithm
The algorithm grows regions with sufficiently high density into
clusters and discovers clusters of arbitrary shape in spatial
databases with noise
It defines a cluster as a maximal set of density-connected points
Definitions
ε-neighborhood of an object
The neighborhood within a radius ε of a given object
Core object
An object whose ε-neighborhood contains at least a minimum number, MinPts, of objects
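The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names `eps_neighborhood` and `is_core`, the toy points, and the choice of Euclidean distance are assumptions, as is the convention that an object counts as a member of its own neighborhood.

```python
import math

def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i].

    Assumption: the point itself is counted as part of its neighborhood.
    """
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """points[i] is a core object if its eps-neighborhood holds at least min_pts points."""
    return len(eps_neighborhood(points, i, eps)) >= min_pts

# Toy data: three points close together and one isolated point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]
core_flags = [is_core(points, i, eps=1.0, min_pts=3) for i in range(len(points))]
print(core_flags)
```

With these toy values the three nearby points come out as core objects and the isolated point does not.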
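Directly density-reachable
An object q is directly density-reachable from p if q is within the ε-neighborhood of p and p is a core object
Density-reachable
An object q is density-reachable from p if there is a chain of objects from p to q in which each object is directly density-reachable from the previous one
Density-connected
Objects p and q are density-connected if both are density-reachable from some object o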
Example
q is density-reachable from p because q is directly
density-reachable from m and m is directly density-reachable from
p
p is not density-reachable from q because q is not a core object
p, q, and m are all density-connected
For a given ε, represented by the radius of the circles, let MinPts = 3.
1. Core objects
m, p, o, and r are core objects because each has an ε-neighborhood containing at least three points
2. Directly density-reachable objects
q is directly density-reachable from m
m is directly density-reachable from p and vice versa
3. Indirectly density-reachable objects
q is indirectly density-reachable from p because q is directly density-reachable from m and m is directly density-reachable from p
However, p is not indirectly density-reachable from q because q is not a core object
Similarly, r and s are indirectly density-reachable from o, and o is indirectly density-reachable from r
4. Indirectly density-connected objects
o, r, and s are all indirectly density-connected
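The relations in this example can be checked mechanically. The sketch below is illustrative only: the function names, the four collinear toy points (labelled a, p, m, q to echo the example, but not the coordinates of the original figure), and the use of Euclidean distance are all assumptions.

```python
import math
from collections import deque

def neighbors(points, i, eps):
    """Indices within distance eps of points[i] (the point itself included)."""
    return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

def directly_reachable(points, src, dst, eps, min_pts):
    """dst is directly density-reachable from src: src is core and dst is in its neighborhood."""
    nbrs = neighbors(points, src, eps)
    return len(nbrs) >= min_pts and dst in nbrs

def density_reachable(points, src, dst, eps, min_pts):
    """dst is density-reachable from src via a chain of directly density-reachable steps."""
    seen, queue = {src}, deque([src])
    while queue:
        cur = queue.popleft()
        nbrs = neighbors(points, cur, eps)
        if len(nbrs) < min_pts:
            continue  # only core objects can extend the chain
        for j in nbrs:
            if j == dst:
                return True
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return False

def density_connected(points, a, b, eps, min_pts):
    """a and b are density-connected if both are density-reachable from some object o."""
    return any(
        density_reachable(points, o, a, eps, min_pts)
        and density_reachable(points, o, b, eps, min_pts)
        for o in range(len(points))
    )

# Hypothetical layout: a and q are border points, p and m are core objects.
points = [(0.0, 0.0), (0.8, 0.0), (1.6, 0.0), (2.4, 0.0)]
a, p, m, q = 0, 1, 2, 3
EPS, MIN_PTS = 1.0, 3

print(directly_reachable(points, m, q, EPS, MIN_PTS))   # q from m
print(density_reachable(points, p, q, EPS, MIN_PTS))    # q from p, via m
print(density_reachable(points, q, p, EPS, MIN_PTS))    # p from q
print(density_connected(points, p, q, EPS, MIN_PTS))
```

The asymmetry is the point of the example: reachability starts only from core objects, so it does not hold in the reverse direction when the starting object is a border point, while density-connectedness is symmetric.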
Definitions
A density-based cluster
A set of density-connected objects that is maximal with respect to
density-reachability
Every object not contained in any cluster is considered to be noise
DBSCAN
Main steps:
1. Search for clusters by checking the ε-neighborhood of each point in the database
2. If the ε-neighborhood of a point p contains at least MinPts points, a new cluster with p as a core object is created
3. Iteratively collect directly density-reachable objects from these core objects, which may involve merging a few density-reachable clusters
4. Terminate the process when no new point can be added to any cluster
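The main steps can be sketched as a short pure-Python routine. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses Euclidean distance and a brute-force neighborhood search, and it grows each cluster outward from a seed core object instead of explicitly merging clusters, which yields the same density-connected sets. The names `dbscan`, `region`, and `NOISE` are illustrative.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    def region(i):
        # Brute-force eps-neighborhood query (the point itself is included).
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)      # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = region(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE          # may be relabeled later as a border point
            continue
        cluster += 1                   # i is a core object: start a new cluster
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:                   # stops when no new point can be added
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core object
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = region(j)
            if len(j_nbrs) >= min_pts:
                seeds.extend(j_nbrs)   # j is also core: keep growing the cluster
    return labels

# Two small square blobs and one far-away outlier (toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The brute-force `region` query makes this quadratic in the number of points; a spatial index would be the usual replacement for larger data.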
[Figure: DBSCAN walkthrough, Steps 1 to 8]
[Figure: DBSCAN Algorithm]
[Figure: Clusters]
[Figure: result for MinPts = 4, ε = 9.92]
Unsupervised Learning: Density-based Methods OPTICS Algorithm
Definitions
Core-distance of an object
The core-distance of an object p is the smallest ε′ that makes p a core object
If p is not a core object, the core-distance of p is undefined
Example (for a given ε, MinPts = 5)
ε′ is the core-distance of p
It is the distance between p and the fourth closest object
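A small sketch of the core-distance, assuming Euclidean distance and that an object counts toward its own MinPts (which is why, with MinPts = 5, the result is the distance to the fourth closest other object). The function name `core_distance` and the toy coordinates are hypothetical.

```python
import math

def core_distance(points, i, eps, min_pts):
    """Smallest radius that makes points[i] a core object, or None if undefined.

    The result is undefined (None) when points[i] is not a core object
    with respect to eps and min_pts.
    """
    # Sorted distances from points[i] to every point; dists[0] is 0 (itself).
    dists = sorted(math.dist(points[i], q) for q in points)
    if len(dists) < min_pts or dists[min_pts - 1] > eps:
        return None
    return dists[min_pts - 1]

# p sits at the origin; the other objects lie at distances 1, 2, 3, 4 and 7.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (7, 0)]
print(core_distance(points, 0, eps=6.0, min_pts=5))  # distance to the fourth closest object
print(core_distance(points, 0, eps=3.0, min_pts=5))  # p is not core for this eps
```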
Reachability-distance of an object
The reachability-distance of an object q with respect to another object p is:
reachability-distance(q, p) = max(core-distance(p), Euclidean(p, q))
Example
reachability-distance(q1, p) = core-distance(p) = ε′
reachability-distance(q2, p) = Euclidean(q2, p)
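The formula and the two cases of the example can be reproduced with a short sketch. It assumes Euclidean distance and returns `None` when p is not a core object; the function names and coordinates are illustrative and are not taken from the original figure.

```python
import math

def core_distance(points, p, eps, min_pts):
    """Distance from points[p] to its min_pts-th nearest point (itself included), or None."""
    dists = sorted(math.dist(points[p], x) for x in points)
    if len(dists) < min_pts or dists[min_pts - 1] > eps:
        return None
    return dists[min_pts - 1]

def reachability_distance(points, q, p, eps, min_pts):
    """max(core-distance(p), Euclidean(p, q)); None if p is not a core object."""
    cd = core_distance(points, p, eps, min_pts)
    if cd is None:
        return None
    return max(cd, math.dist(points[p], points[q]))

# p at the origin, with other objects at distances 1, 2, 3, 4 and 7.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (7, 0)]
p, q1, q2 = 0, 2, 5

# q1 lies inside the core-distance of p, so the core-distance dominates.
print(reachability_distance(points, q1, p, eps=8.0, min_pts=5))
# q2 lies outside it, so the plain Euclidean distance is used.
print(reachability_distance(points, q2, p, eps=8.0, min_pts=5))
```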
Unsupervised Learning: Density-based Methods DENCLUE Algorithm
DENCLUE Algorithm
Influence function
Let x and y be objects or points in $F^d$, a d-dimensional input space
The influence function of data object y on x is a function defined in terms of a basic influence function $f_B$:
$$f_B^{y}(x) = f_B(x, y)$$
σ is a threshold parameter
Density function
The density function at an object or point x is defined as the sum of
influence functions of all data points
That is, it is the total influence on x of all of the data points
Given n data objects $x_1, \ldots, x_n$, the density function at x is defined as
$$f_B^{D}(x) = \sum_{i=1}^{n} f_B^{x_i}(x) = f_B^{x_1}(x) + f_B^{x_2}(x) + \cdots + f_B^{x_n}(x)$$
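As a rough illustration, the density function is just a sum of influence values. The two basic influence functions below (a square wave with threshold σ and a Gaussian with width σ) are assumptions taken from the usual DENCLUE formulation; the text here names them only as "Square" and "Gaussian" and does not spell out their formulas. Function names and data are hypothetical.

```python
import math

def square_influence(x, y, sigma):
    """Assumed square-wave influence: 1 if y lies within distance sigma of x, else 0."""
    return 1.0 if math.dist(x, y) <= sigma else 0.0

def gaussian_influence(x, y, sigma):
    """Assumed Gaussian influence: exp(-d(x, y)^2 / (2 * sigma^2))."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))

def density(x, data, influence, sigma):
    """Density at x: the sum of the influences of all data points on x."""
    return sum(influence(x, xi, sigma) for xi in data)

data = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
x = (0.0, 0.0)
print(density(x, data, square_influence, sigma=1.0))
print(density(x, data, gaussian_influence, sigma=1.0))
```

The square-wave version counts the points within σ of x, while the Gaussian version gives a smooth density in which distant points still contribute a small amount.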
[Figure panels: Dataset, Square, Gaussian]
From the density function, we can define the density attractors, the
local maxima of the overall density function
A hill-climbing algorithm guided by the gradient can be used to
determine the density attractor of a set of data points
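The hill-climbing idea can be sketched as follows. This is one possible variant under stated assumptions, not the DENCLUE reference procedure: it assumes a Gaussian influence function, moves a fixed step along the normalized gradient, and stops as soon as the density no longer increases. The names `density_attractor`, `step`, and `max_iter` are hypothetical.

```python
import math

def gaussian_density(x, data, sigma):
    """Sum of (assumed) Gaussian influences of all data points on x."""
    return sum(math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2)) for xi in data)

def gradient(x, data, sigma):
    """Gradient direction of the Gaussian density at x (constant factor 1/sigma^2 omitted)."""
    g = [0.0] * len(x)
    for xi in data:
        w = math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2))
        for k in range(len(x)):
            g[k] += (xi[k] - x[k]) * w
    return g

def density_attractor(x, data, sigma, step=0.05, max_iter=1000):
    """Climb from x along the gradient until the density stops increasing."""
    x = list(x)
    for _ in range(max_iter):
        g = gradient(x, data, sigma)
        norm = math.sqrt(sum(c * c for c in g))
        if norm == 0.0:
            break                      # flat spot: nothing to climb
        candidate = [x[k] + step * g[k] / norm for k in range(len(x))]
        if gaussian_density(candidate, data, sigma) <= gaussian_density(x, data, sigma):
            break                      # no improvement: x approximates a local maximum
        x = candidate
    return tuple(x)

# Two well-separated groups of points (toy data).
data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
        (10.0, 10.0), (10.2, 10.0), (10.0, 10.2)]
attractor_a = density_attractor((1.0, 1.0), data, sigma=1.0)
attractor_b = density_attractor((9.0, 9.0), data, sigma=1.0)
print(attractor_a, attractor_b)
```

Starting points that climb to the same attractor would be placed in the same cluster; here the two starts end near the centers of the two groups. The fixed step size limits the accuracy of the returned attractor to roughly one step length.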