ML 10 Decision Trees

Example (from the textbook): deciding what to do in the evening.

[Figure: a decision tree built up over several slides.]

Work to Do?
  Yes -> Stay In
  No  -> Outlook?
           Sunny    -> Go to Beach
           Overcast -> Go Running
           Rainy    -> Go to Movies

The same decision can be expressed entirely with binary (Yes/No) tests:

Work to Do?
  Yes -> Stay In
  No  -> Sunny?
           Yes -> Go to Beach
           No  -> Overcast?
                    Yes -> Go Running
                    No  -> Go to Movies
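Read as code, the binarized tree is just nested Yes/No tests. A minimal sketch (the dict encoding and feature names are illustrative, not from the slides):

# Minimal sketch: the binarized "what to do" tree as nested dicts.
# Internal nodes hold a feature name and yes/no children; leaves are strings.
tree = {
    "feature": "work_to_do",
    "yes": "Stay In",
    "no": {
        "feature": "sunny",
        "yes": "Go to Beach",
        "no": {
            "feature": "overcast",
            "yes": "Go Running",
            "no": "Go to Movies",
        },
    },
}

def classify(node, sample):
    """Walk the tree until a leaf (a plain string) is reached."""
    while isinstance(node, dict):
        node = node["yes"] if sample[node["feature"]] else node["no"]
    return node

print(classify(tree, {"work_to_do": False, "sunny": False, "overcast": True}))
# -> "Go Running"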
Features map to internal nodes; categories map to leaves:

Features:   Work to Do?, Sunny?, Overcast?
Categories: Stay In, Go to Beach, Go Running, Go to Movies

[Figure: a generic binary decision tree. Each internal node (F2, F3, F4) tests a feature with Yes/No branches; each leaf assigns a class (C1, C2, C3, C4). Node = Feature; Leaf = Class.]
[Figure: a complete binary tree over four binary features: F1 at the root, two F2 nodes, four F3 nodes, eight F4 nodes, and sixteen leaf classes, one for each combination of feature values (1111, 1110, ..., 0000).]

[Figure: a minimal tree with a single split: feature F1 divides the samples into two regions, R1 and R2.]

[Figure: fragment of a New York City subway map (York St., 59th St., 72nd St., High St., Clark St.; A, C, F, 2, 3 trains).]
[Figure: the complete binary tree of depth d = 4 again, annotated with its depth and size.]

n binary features can be used to classify samples into at most 2^n classes.

depth = d
size = s = 2^(d+1) - 1

For d = 4: s = 2^(4+1) - 1 = 31.

depth = the length of the longest path from the root to a leaf
size = the number of nodes in the tree
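A quick numeric check of the size formula (a throwaway sketch, not part of the slides):

# A complete binary tree of depth d has 2^(d+1) - 1 nodes and 2^d leaves,
# so a chain of n binary tests can separate at most 2^n classes.
for d in range(1, 5):
    size = 2 ** (d + 1) - 1
    leaves = 2 ** d
    print(f"d = {d}: size = {size}, leaves = {leaves}")
# d = 4 prints size = 31, matching the slide.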
Decision stump

[Figure: a decision stump, i.e. a tree with a single split: feature F1 divides the samples into two regions, R1 and R2.]
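Fitting a stump means scanning candidate thresholds on one feature and keeping the best split. A minimal sketch using entropy as the impurity (the function names and data are mine, not the slides'):

import numpy as np

def fit_stump(x, y):
    """Scan midpoints between sorted feature values; return the threshold
    whose split minimizes the weighted entropy of the two regions."""
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_thr, best_cost = None, np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue
        thr = (xs[i] + xs[i - 1]) / 2
        left, right = ys[:i], ys[i:]
        cost = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        if cost < best_cost:
            best_thr, best_cost = thr, cost
    return best_thr

x = np.array([50.0, 55.0, 60.0, 70.0, 80.0, 90.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(fit_stump(x, y))  # -> 65.0, separating the two classes cleanly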
Information gain from splitting a parent dataset D_p on feature f into m child datasets D_j:

IG(D_p, f) = I(D_p) - sum_{j=1}^{m} (N_j / N_p) I(D_j)

where N_p and N_j are the numbers of samples in the parent node and in child j, and I is one of three impurity measures:

• I_G  Gini impurity:         I_G = sum_i p_i (1 - p_i) = 1 - sum_i p_i^2
• I_H  Entropy:               I_H = - sum_i p_i log2 p_i
• I_E  Classification error:  I_E = 1 - max_i p_i

p_i denotes the fraction of the elements in the set that are in class i.
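These formulas translate directly into code. A small sketch (the helper names are mine); applied to the root split shown later in the slides, [10,10] -> [9,1] / [1,9], it reproduces the child entropy of 0.469:

import numpy as np

def gini(p):            # I_G = 1 - sum_i p_i^2
    return 1.0 - np.sum(np.asarray(p) ** 2)

def entropy(p):         # I_H = -sum_i p_i log2 p_i  (0 log 0 := 0)
    p = np.asarray(p)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def class_error(p):     # I_E = 1 - max_i p_i
    return 1.0 - np.max(p)

def info_gain(parent_counts, child_counts, impurity=entropy):
    """IG(D_p, f) = I(D_p) - sum_j (N_j / N_p) I(D_j), given class counts."""
    parent = np.asarray(parent_counts, dtype=float)
    n_p = parent.sum()
    ig = impurity(parent / n_p)
    for child in child_counts:
        child = np.asarray(child, dtype=float)
        ig -= (child.sum() / n_p) * impurity(child / child.sum())
    return ig

# Root split: [10,10] -> [9,1] and [1,9]; each child has entropy 0.469.
print(info_gain([10, 10], [[9, 1], [1, 9]]))  # -> ~0.531 bits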
[Plot: impurity I as a function of the class fraction p1 = 1 - p2 for a two-class set; I is 0 for a pure set and largest at p1 = 0.5.]
Impurity measures for sets with two classes

[Plot: Gini impurity, scaled entropy (entropy/2), and classification error versus p1 = 1 - p2. All three are 0 for a pure set and peak at 0.5 when p1 = 0.5.]
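The figure can be reproduced with a few lines of matplotlib (a sketch; the 1/2 scaling of entropy is assumed so that its maximum matches the Gini curve):

import numpy as np
import matplotlib.pyplot as plt

p1 = np.linspace(0.001, 0.999, 200)   # class-1 fraction; p2 = 1 - p1
p2 = 1.0 - p1
gini = 1.0 - (p1**2 + p2**2)
entropy = -(p1 * np.log2(p1) + p2 * np.log2(p2))
error = 1.0 - np.maximum(p1, p2)

plt.plot(p1, gini, label="Gini impurity")
plt.plot(p1, entropy / 2, label="Scaled entropy (H/2)")
plt.plot(p1, error, label="Classification error")
plt.xlabel("$p_1 = 1 - p_2$")
plt.ylabel("I")
plt.legend()
plt.show()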
Examples of impurity measures

Example dataset: n = 10 samples in each of m = 2 classes, 20 samples in all (class counts [10,10]), with two features x1 and x2.

[Figures: a series of candidate splits and the resulting child class counts, each node annotated scikit-learn style (e.g. entropy = 0.918, samples = 3, value = [2,1]):
• x1 <= 65.75 splits [10,10] into [9,1] and [1,9] (entropy 1.0 at the parent, 0.469 at each child)
• x1 <= 57.5 splits [9,1] into [7,0] and [2,1]
• x2 <= 67.0 splits [1,9] into [1,2] and [0,7]
• x1 <= 58.75 splits [2,1] into pure leaves
• x1 <= 92.0 splits [1,2] into [1,0] and [0,2]]
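Annotations of the form entropy = 0.918, samples = 3, value = [2,1] are what scikit-learn produces for a fitted tree. A sketch of growing and printing such a tree; the toy data here is made up, not the slides' dataset:

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up 2-D data in the same spirit as the slides' example.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([55, 60], 5, (10, 2)),
               rng.normal([75, 80], 5, (10, 2))])
y = np.array([0] * 10 + [1] * 10)

clf = DecisionTreeClassifier(criterion="entropy", max_depth=3)
clf.fit(X, y)
print(export_text(clf, feature_names=["x1", "x2"]))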
Pruning algorithms
• Reduced error pruning
• Cost complexity pruning
A) |T| = 6, R(T) = 0.00

[Figure: the full tree T_A. Root X1 <= 65.75; depth 1: X1 <= 57.5 and X2 <= 67.0; depth 2: X1 <= 58.75 and X1 <= 92.0. All six leaves ([7,0], [2,0], [0,1], [1,0], [0,2], [0,7]) are pure, so the resubstitution error R(T) = 0.00.]
B) |T| = 5, R(T) = 0.05

[Figure: T_B, obtained from T_A by pruning the X1 <= 92.0 split. Root X1 <= 65.75 (entropy = 1.0, samples = 20, value = [10,10]; True branch left, False branch right); depth 1: X1 <= 57.5 and X2 <= 67.0 (each entropy = 0.469, samples = 10; values [9,1] and [1,9]); depth 2: leaf [7,0] (entropy = 0.0), split X1 <= 58.75 on [2,1] (entropy = 0.918), leaf [1,2] (entropy = 0.918), leaf [0,7] (entropy = 0.0). Leaves: [7,0], [2,0], [0,1], [1,2], [0,7]; one of 20 samples is misclassified, so R(T) = 0.05.]
C) |T| = 5, R(T) = 0.05

[Figure: T_C, obtained by pruning the X1 <= 58.75 split instead; X1 <= 92.0 is kept. Leaves: [7,0], [2,1], [1,0], [0,2], [0,7]; one sample is misclassified, so R(T) = 0.05.]
D) |T| = 4, R(T) = 0.10

[Figure: T_D, with both depth-2 splits pruned; only the root and the two depth-1 splits remain. Leaves: [7,0], [2,1], [1,2], [0,7]; two samples are misclassified, so R(T) = 0.10.]
E) |T| = 4, R(T) = 0.05

[Figure: T_E, with the right depth-1 subtree (X2 <= 67.0) pruned to the leaf [1,9]; X1 <= 57.5 and X1 <= 58.75 are kept. Leaves: [7,0], [2,0], [0,1], [1,9]; R(T) = 0.05.]
F) |T| = 4, R(T) = 0.05

[Figure: T_F, with the left depth-1 subtree (X1 <= 57.5) pruned to the leaf [9,1]; X2 <= 67.0 and X1 <= 92.0 are kept. Leaves: [9,1], [1,0], [0,2], [0,7]; R(T) = 0.05.]
G) |T| = 3, R(T) = 0.10

[Figure: T_G, keeping only the root and the left split X1 <= 57.5; the right child is the leaf [1,9]. Leaves: [7,0], [2,1], [1,9]; R(T) = 0.10.]
H) |T| = 3, R(T) = 0.10

[Figure: T_H, keeping only the root and the right split X2 <= 67.0; the left child is the leaf [9,1]. Leaves: [9,1], [1,2], [0,7]; R(T) = 0.10.]
I) |T| = 2, R(T) = 0.10

[Figure: T_I, the root split X1 <= 65.75 alone (entropy = 1.0, samples = 20, value = [10,10]). Leaves: [9,1], [1,9]; R(T) = 0.10.]
J) |T| = 1, R(T) = 0.50

[Figure: T_J, the root collapsed to the single leaf [10,10]; with no split, half the samples are misclassified, so R(T) = 0.50.]
All subtrees

The cost complexity of a subtree T is Cα(T) = R(T) + α|T|, where |T| is its number of leaves.

Subtree | Leaves |T| | R(T) | Depth | Cα(T)
A       | 6          | 0.00 | 3     | 6α
B       | 5          | 0.05 | 3     | 0.05 + 5α
C       | 5          | 0.05 | 3     | 0.05 + 5α
D       | 4          | 0.10 | 2     | 0.10 + 4α
E       | 4          | 0.05 | 3     | 0.05 + 4α
F       | 4          | 0.05 | 3     | 0.05 + 4α
G       | 3          | 0.10 | 2     | 0.10 + 3α
H       | 3          | 0.10 | 2     | 0.10 + 3α
I       | 2          | 0.10 | 1     | 0.10 + 2α
J       | 1          | 0.50 | 0     | 0.50 + α
[Plot: Cα(T) versus α (0 to 0.04) for each subtree, labeled A [6], B,C [5], D [4], E,F [4], G,H [3], I [2], J [1] (leaf counts in brackets); Cα ranges from 0.0 to 0.5.]
Cost Complexity Function

[Plot: the same Cα(T) lines with the lower envelope highlighted: A [6] is the minimizer for small α, and I [2] beyond the crossing at α = 0.025.]
Candidate subtrees

Subtree | Leaves |T| | R(T) | Depth | Cα(T)      | Optimal for α in
A       | 6          | 0.00 | 3     | 6α         | [0, 0.025)
B, C    | 5          | 0.05 | 3     | 0.05 + 5α  |
D       | 4          | 0.10 | 2     | 0.10 + 4α  |
E, F    | 4          | 0.05 | 3     | 0.05 + 4α  |
G, H    | 3          | 0.10 | 2     | 0.10 + 3α  |
I       | 2          | 0.10 | 1     | 0.10 + 2α  | [0.025, ∞)
J       | 1          | 0.50 | 0     | 0.50 + α   |

At α = 0.025, Cα(T_A) = Cα(T_E) = Cα(T_I) = 0.15; the tie is resolved in favor of the smaller tree, and T_A ⊃ T_I.
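The table can be checked numerically. A short sketch that evaluates Cα(T) for all ten subtrees and reports the minimizer, breaking ties toward fewer leaves:

# (leaves, R) for subtrees A..J, from the "All subtrees" table.
subtrees = {"A": (6, 0.00), "B": (5, 0.05), "C": (5, 0.05), "D": (4, 0.10),
            "E": (4, 0.05), "F": (4, 0.05), "G": (3, 0.10), "H": (3, 0.10),
            "I": (2, 0.10), "J": (1, 0.50)}

def best(alpha):
    # C_alpha(T) = R(T) + alpha * |T|; on ties, prefer fewer leaves.
    return min(subtrees, key=lambda t: (subtrees[t][1] + alpha * subtrees[t][0],
                                        subtrees[t][0]))

for alpha in (0.0, 0.02, 0.025, 0.03):
    print(alpha, best(alpha))
# A is optimal below alpha = 0.025; from 0.025 on, the smaller tree I wins.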
[Plot: number of pruned subtrees versus the number of leaves in the full tree (1 to 24); the count grows into the tens of thousands by about 24 leaves.]
Number of Pruned Subtrees

[Plot: ln(number of pruned subtrees) versus leaves in the full tree (1 to 24); the logarithm grows linearly with the leaf count, confirming exponential growth.]
Pruned Subtrees

The number of pruned subtrees of T is ≈ 1.5028369^|T|, where |T| is the number of leaves of T.
For our sample tree, |T| = 6, and thus 1.5028369^6 ≈ 12.
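The count can also be computed exactly with a simple recursion: a pruned subtree either collapses node t to a leaf or keeps t and combines any pruned versions of its children, so N(t) = 1 + N(left) * N(right) with N(leaf) = 1. A sketch (the nested-tuple encoding is mine, not the slides'):

def count_pruned(node):
    """Number of pruned subtrees rooted at node.
    Leaves are None; internal nodes are (left, right) tuples."""
    if node is None:               # a leaf admits exactly one subtree: itself
        return 1
    left, right = node
    return 1 + count_pruned(left) * count_pruned(right)  # collapse, or combine

# The sample tree A: root -> (X1<=57.5, X2<=67.0), each with one further split.
leaf = None
sample_tree = ((leaf, (leaf, leaf)), ((leaf, leaf), leaf))
print(count_pruned(sample_tree))   # -> 10: exactly the subtrees A..J above

The asymptotic formula gives ≈ 12 for 6 leaves; the exact count for this particular shape is 10.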
To find where a subtree stops being worth keeping, ask for each internal node t: at what α does collapsing t to a leaf cost the same as keeping the subtree T_t rooted at t?

R(T_t) + α|T_t| = R(t) + α|{t}| = R(t) + α

Solving for α we obtain the weakest-link function

g(t) = α = (R(t) - R(T_t)) / (|T_t| - 1).
Starting from k = 0 with T_0 the full tree:
1) For every internal node t of T_k, compute g_k(t)
2) α_{k+1} = min_t ( g_k(t) )
3) Visit the nodes in top-down order and prune whenever g_k(t) = α_{k+1} to obtain T_{k+1}
4) k += 1, and repeat until only the root remains
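scikit-learn implements this minimal cost-complexity pruning: cost_complexity_pruning_path returns the sequence of effective α values, and the ccp_alpha parameter selects a pruned tree. A sketch on the same made-up data as the earlier fitting example:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([55, 60], 5, (10, 2)),
               rng.normal([75, 80], 5, (10, 2))])
y = np.array([0] * 10 + [1] * 10)

clf = DecisionTreeClassifier(criterion="entropy")
path = clf.cost_complexity_pruning_path(X, y)   # the alpha_k sequence
print(path.ccp_alphas, path.impurities)

# Refit once per alpha to obtain the nested sequence T_0 > T_1 > ...
for a in path.ccp_alphas:
    pruned = DecisionTreeClassifier(criterion="entropy", ccp_alpha=a).fit(X, y)
    print(f"alpha={a:.4f}: {pruned.get_n_leaves()} leaves")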