Computer exercise 4
Tree-based classifiers
25.04.2006
Figure 1: (a) Classification in a basic decision tree proceeds from top to bottom. (b) A two-dimensional two-category example with decision regions marked R1 and R2; the decision boundaries are perpendicular to the feature axes.
Much of the work in designing trees focuses on deciding which property test or query
should be performed at each node. For nonmetric data one might well consider logical
combinations of properties, such as the query “Is (size = medium) AND (NOT(color = yellow))?”.
For numerical data there is a simple way to visualize the decision boundaries
produced by decision trees. Suppose that the query at each node has the form
“Is x_i > a?”, i.e. a single feature is compared with a threshold. This leads to hyperplane
decision boundaries that are perpendicular to the coordinate axes and to decision regions
of the form illustrated in Fig. 1. The fundamental principle underlying tree creation is
that of simplicity: we seek a property query T at each node N that makes the data
reaching the immediate descendant nodes as “pure” as possible. The most popular
measure is the entropy impurity:
i(N) = -\sum_j P(\omega_j) \log_2 P(\omega_j),    (1)
where P(ω_j) is the fraction of patterns at node N that are in category ω_j. By the property
of entropy, if all the patterns are of the same category the impurity is 0; otherwise it is
positive, with the greatest value occurring when the different classes are equally likely. An
obvious heuristic is to choose the query that decreases the impurity as much as possible.
The drop in impurity is defined by

\Delta i(N) = i(N) - P_L\, i(N_L) - (1 - P_L)\, i(N_R),    (2)

where N_L and N_R are the left and right descendant nodes and P_L is the fraction of the
patterns at N that are sent to N_L by the query.
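As a quick illustration of Eqs. (1) and (2), the MATLAB sketch below computes the entropy impurity of a node from the class labels of the patterns reaching it, and the impurity drop of one candidate split. The function and variable names (node_impurity, impurity_drop, go_left) are our own and are not part of the provided CART.m code.

    function i_N = node_impurity(labels)
    % Entropy impurity of a node, Eq. (1).
    % labels: vector with the class index of every pattern reaching the node.
    classes = unique(labels);
    P = zeros(size(classes));
    for k = 1:numel(classes)
        P(k) = sum(labels == classes(k)) / numel(labels);   % P(w_j) at this node
    end
    i_N = -sum(P .* log2(P));        % all P(k) > 0 here, so log2 is well defined
    end

    function di = impurity_drop(labels, go_left)
    % Impurity drop of a candidate split, Eq. (2).
    % go_left: logical vector, true for the patterns sent to the left descendant.
    PL = mean(go_left);              % fraction of patterns going left
    di = node_impurity(labels) ...
         - PL * node_impurity(labels(go_left)) ...
         - (1 - PL) * node_impurity(labels(~go_left));
    end

Saved in their own files, these can be used, for example, as impurity_drop(labels, X(:,1) > 0.5) to score the query “Is x1 > 0.5?”; a CART-style trainer simply picks, at every node, the feature/threshold pair with the largest drop.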
2) Train a binary CART tree using the entropy impurity. Use the CART.m function for
that (an outline for this and the next task is sketched after the task list).
3) Using the test_CART.m function and the obtained decision tree, build a decision surface
D over the range of the given data. Plot the decision boundary superimposed on the
data points. (Hint: use the contour command to plot the contour of the matrix D; set
the number of contour lines to 1 to get a single decision boundary.)
4) Draw the decision tree for the given data as a block diagram (similar to Fig. 1(a))
using the structured array tree returned by the CART.m function, and include the tree in
your report. You can access a field (or a substructure) of a particular structure by writing
the structure name followed by a dot and the field name (e.g. struct1.field1,
struct1.substruct1.subsubstruct1); a sketch of such a traversal is given after the task list.
5) In 2), the tree was grown fully (i.e. until no further splits were possible). This typically
leads to overfitting. Simplify the decision tree obtained in 4) by pruning all pairs of
neighbouring leaf nodes (linked to a common antecedent node, one level above) whose
elimination gives only a very small increase in impurity (for this particular example,
take values less than 10^-2).
6) Are there any other redundancies in the tree that might be simplified?
7) Repeat step 2) using the Gini and misclassification impurity measures (the three
measures are sketched after the task list). Compare the performance of the three
measures in terms of the complexity of the decision tree (number of nodes) and in
terms of the misclassification rate on the given dataset.
8) Consider the nonmetric data from the file letters.mat, sampled from three categories
and consisting of five features and twenty patterns (see the table below). Train a tree
for this data using the entropy impurity. Check its misclassification rate using the
test_CART_letters.m function.
9) Train a tree only with the patterns belonging to the first and second classes. Simplify it
and convert the information in your tree into a single logical expression that describes
the first category. Repeat the same for the second category. (Hint: use the char
command to convert integers into ASCII characters; a short example is given after the table.)
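For tasks 2) and 3), a possible outline is given below. The exact interfaces of the provided CART.m and test_CART.m are not reproduced here: the calls tree = CART(X, labels, 'entropy') and test_CART(tree, x) are only assumptions, as are the data names X (an N-by-2 pattern matrix) and labels (classes 1 and 2), so check the headers of the provided files and adapt the calls. The grid and contour part uses standard MATLAB commands.

    % Train the tree (task 2); the CART call below is an ASSUMED interface.
    tree = CART(X, labels, 'entropy');

    % Build a decision surface D over the range of the data (task 3).
    x1 = linspace(min(X(:,1)), max(X(:,1)), 100);
    x2 = linspace(min(X(:,2)), max(X(:,2)), 100);
    [G1, G2] = meshgrid(x1, x2);
    D = zeros(size(G1));
    for i = 1:numel(G1)
        D(i) = test_CART(tree, [G1(i) G2(i)]);   % ASSUMED interface of test_CART.m
    end

    % Plot the decision boundary on top of the data points.
    figure; hold on;
    plot(X(labels == 1, 1), X(labels == 1, 2), 'bo');
    plot(X(labels == 2, 1), X(labels == 2, 2), 'rx');
    contour(G1, G2, D, 1, 'k');                  % one contour line = the boundary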
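For tasks 4) and 5) you need to walk the structured array returned by CART.m. Its actual field names are specific to the provided code, so the recursive printout below is only a sketch under the assumption that every node carries the fields feature, threshold, left, right and, for leaves, label; replace these with the fields you actually find in the returned structure.

    function print_tree(node, indent)
    % Recursively print an assumed tree structure as indented text (task 4).
    % ASSUMED fields: node.label (leaves), node.feature, node.threshold,
    % node.left, node.right -- adapt to the structure returned by CART.m.
    if nargin < 2, indent = ''; end
    if isfield(node, 'label') && ~isempty(node.label)
        fprintf('%sleaf: class %d\n', indent, node.label);
    else
        fprintf('%sis x%d > %.3f ?\n', indent, node.feature, node.threshold);
        print_tree(node.left,  [indent '    ']);   % "yes" branch
        print_tree(node.right, [indent '    ']);   % "no" branch
    end
    end

For the pruning in task 5), compare the impurity of a node with the weighted impurity of its two leaf descendants (Eq. (2)); if eliminating the pair increases the impurity by less than 10^-2, replace them by a single leaf labelled with the majority class.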
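Task 7) compares three impurity measures. If CART.m does not accept them directly, they can be written as function handles over the vector P of class fractions at a node; the definitions below are the standard ones, and the handle names are ours.

    % P: vector of class fractions P(w_j) at a node (nonzero entries only).
    entropy_imp = @(P) -sum(P .* log2(P));   % entropy impurity, Eq. (1)
    gini_imp    = @(P) 1 - sum(P.^2);        % Gini impurity (a constant factor
                                             % does not change the chosen splits)
    miscl_imp   = @(P) 1 - max(P);           % misclassification impurity

    % Example: a node with 3 patterns of class 1 and 1 pattern of class 2.
    P = [3 1] / 4;
    [entropy_imp(P), gini_imp(P), miscl_imp(P)]   % approx. [0.811  0.375  0.250]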
Sample  Category  A–D  E–G  H–J  K–L  M–N
1 ω1 A E H K M
2 ω1 B E I L M
3 ω1 A G I L N
4 ω1 B G H K M
5 ω1 A G I L M
6 ω2 B F I L M
7 ω2 B F J L N
8 ω2 B E I L N
9 ω2 C G J K N
10 ω2 C G J L M
11 ω2 D G J K M
12 ω2 B F I L M
13 ω3 D E H K N
14 ω3 A E H K N
15 ω3 D E H L N
16 ω3 D F J L N
17 ω3 A F H K N
18 ω3 D E J L M
19 ω3 C F J L M
20 ω3 D F H L M
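For task 9), the letters in letters.mat are presumably stored as integer character codes; the lines below only illustrate the char/double conversion mentioned in the hint and the kind of expression the task asks for (the feature name and the sample expression are made up, and the real one depends on the tree you obtain).

    double('A')    % ans = 65: the ASCII code of 'A'
    char(65:68)    % ans = 'ABCD': integer codes converted back to letters

Each internal node of the simplified tree contributes an equality test such as (feature1 = 'A') or its negation; the leaves assigned to ω1 then combine these tests with AND along each path and OR across paths, giving something of the form (feature1 = 'A') OR (NOT(feature1 = 'A') AND (feature3 = 'H')).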