Computer exercise 4
Tree-based classifiers
25.04.2006
Figure 1: (a) Classification in a basic decision tree proceeds from top to bottom. (b) A two-dimensional two-category example with decision regions marked R1 and R2; the decision boundaries are perpendicular to the feature axes.
Much of the work in designing trees focuses on deciding which property test or query
should be performed at each node. For nonmetric data one might well consider logical
combinations of properties, such as the query “Is (size = medium) AND (NOT(color = yellow))?”.
For numerical data there is a simple way to visualize the decision boundaries
produced by decision trees. Suppose that the query at each node has the form
“Is x_i > a?”, i.e. a single feature is compared with a threshold. This leads to hyperplane
decision boundaries that are perpendicular to the coordinate axes and to decision regions
of the form illustrated in Fig. 1. The fundamental principle underlying tree creation is
that of simplicity: we seek a property query T at each node N that makes the data
reaching the immediate descendant nodes as “pure” as possible. The most popular
measure is the entropy impurity:
i(N) = -\sum_j P(\omega_j) \log_2 P(\omega_j),    (1)
where P(ω_j) is the fraction of patterns at node N that are in category ω_j. By the property
of entropy, if all the patterns are of the same category the impurity is 0; otherwise it is
positive, with the greatest value occurring when the different classes are equally likely. An
obvious heuristic is to choose the query that decreases the impurity as much as possible.
The drop in impurity is defined by

\Delta i(N) = i(N) - P_L\, i(N_L) - (1 - P_L)\, i(N_R),    (2)

where N_L and N_R are the left and right descendant nodes and P_L is the fraction of the
patterns at N that are sent to N_L by the query.
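As a quick illustration of Eqs. (1) and (2), the MATLAB sketch below computes the entropy impurity of a node from the class labels of the patterns reaching it, and the impurity drop of one candidate split. The function and variable names (node_impurity, impurity_drop, go_left) are our own and are not part of the provided CART.m code.

    function i_N = node_impurity(labels)
    % Entropy impurity of a node, Eq. (1).
    % labels: vector with the class index of every pattern reaching the node.
    classes = unique(labels);
    P = zeros(size(classes));
    for k = 1:numel(classes)
        P(k) = sum(labels == classes(k)) / numel(labels);   % P(w_j) at this node
    end
    i_N = -sum(P .* log2(P));        % all P(k) > 0 here, so log2 is well defined
    end

    function di = impurity_drop(labels, go_left)
    % Impurity drop of a candidate split, Eq. (2).
    % go_left: logical vector, true for the patterns sent to the left descendant.
    PL = mean(go_left);              % fraction of patterns going left
    di = node_impurity(labels) ...
         - PL * node_impurity(labels(go_left)) ...
         - (1 - PL) * node_impurity(labels(~go_left));
    end

Saved in their own files, these can be used, for example, as impurity_drop(labels, X(:,1) > 0.5) to score the query “Is x1 > 0.5?”; a CART-style trainer simply picks, at every node, the feature/threshold pair with the largest drop.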
2) Train a binary CART tree using the entropy impurity. Use the CART.m function for
that (an outline for this and the next task is sketched after the task list).
3) Using the test_CART.m function and the obtained decision tree, build a decision surface
D over the range of the given data. Plot the decision boundary superimposed on the
data points. (Hint: use the contour command to plot the contour of the matrix D; set
the number of contour lines to 1 to get a single decision boundary.)
4) Draw the decision tree for the given data as a block diagram (similar to Fig. 1(a))
using the structured array tree returned by the CART.m function, and include the tree in
your report. You can access a field (or a substructure) of a particular structure by writing
the structure name followed by a dot and the field name (e.g. struct1.field1,
struct1.substruct1.subsubstruct1); a sketch of such a traversal is given after the task list.
5) In 2), the tree was grown fully (i.e. until no further splits were possible). This typically
leads to overfitting. Simplify the decision tree obtained in 4) by pruning all pairs of
neighbouring leaf nodes (linked to a common antecedent node, one level above) whose
elimination gives only a very small increase in impurity (for this particular example,
take values less than 10^-2).
6) Are there any other redundancies in the tree that might be simplified?
7) Repeat step 2) using the Gini and misclassification impurity measures (the three
measures are sketched after the task list). Compare the performance of the three
measures in terms of the complexity of the decision tree (number of nodes) and in
terms of the misclassification rate on the given dataset.
8) Consider the nonmetric data from the file letters.mat, sampled from three categories
and consisting of five features and twenty patterns (see the table below). Train a tree
for this data using the entropy impurity. Check its misclassification rate using the
test_CART_letters.m function.
9) Train a tree only with the patterns belonging to the first and second classes. Simplify it
and convert the information in your tree into a single logical expression that describes
the first category. Repeat the same for the second category. (Hint: use the char
command to convert integers into ASCII characters; a short example is given after the table.)
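For tasks 2) and 3), a possible outline is given below. The exact interfaces of the provided CART.m and test_CART.m are not reproduced here: the calls tree = CART(X, labels, 'entropy') and test_CART(tree, x) are only assumptions, as are the data names X (an N-by-2 pattern matrix) and labels (classes 1 and 2), so check the headers of the provided files and adapt the calls. The grid and contour part uses standard MATLAB commands.

    % Train the tree (task 2); the CART call below is an ASSUMED interface.
    tree = CART(X, labels, 'entropy');

    % Build a decision surface D over the range of the data (task 3).
    x1 = linspace(min(X(:,1)), max(X(:,1)), 100);
    x2 = linspace(min(X(:,2)), max(X(:,2)), 100);
    [G1, G2] = meshgrid(x1, x2);
    D = zeros(size(G1));
    for i = 1:numel(G1)
        D(i) = test_CART(tree, [G1(i) G2(i)]);   % ASSUMED interface of test_CART.m
    end

    % Plot the decision boundary on top of the data points.
    figure; hold on;
    plot(X(labels == 1, 1), X(labels == 1, 2), 'bo');
    plot(X(labels == 2, 1), X(labels == 2, 2), 'rx');
    contour(G1, G2, D, 1, 'k');                  % one contour line = the boundary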
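For tasks 4) and 5) you need to walk the structured array returned by CART.m. Its actual field names are specific to the provided code, so the recursive printout below is only a sketch under the assumption that every node carries the fields feature, threshold, left, right and, for leaves, label; replace these with the fields you actually find in the returned structure.

    function print_tree(node, indent)
    % Recursively print an assumed tree structure as indented text (task 4).
    % ASSUMED fields: node.label (leaves), node.feature, node.threshold,
    % node.left, node.right -- adapt to the structure returned by CART.m.
    if nargin < 2, indent = ''; end
    if isfield(node, 'label') && ~isempty(node.label)
        fprintf('%sleaf: class %d\n', indent, node.label);
    else
        fprintf('%sis x%d > %.3f ?\n', indent, node.feature, node.threshold);
        print_tree(node.left,  [indent '    ']);   % "yes" branch
        print_tree(node.right, [indent '    ']);   % "no" branch
    end
    end

For the pruning in task 5), compare the impurity of a node with the weighted impurity of its two leaf descendants (Eq. (2)); if eliminating the pair increases the impurity by less than 10^-2, replace them by a single leaf labelled with the majority class.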
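Task 7) compares three impurity measures. If CART.m does not accept them directly, they can be written as function handles over the vector P of class fractions at a node; the definitions below are the standard ones, and the handle names are ours.

    % P: vector of class fractions P(w_j) at a node (nonzero entries only).
    entropy_imp = @(P) -sum(P .* log2(P));   % entropy impurity, Eq. (1)
    gini_imp    = @(P) 1 - sum(P.^2);        % Gini impurity (a constant factor
                                             % does not change the chosen splits)
    miscl_imp   = @(P) 1 - max(P);           % misclassification impurity

    % Example: a node with 3 patterns of class 1 and 1 pattern of class 2.
    P = [3 1] / 4;
    [entropy_imp(P), gini_imp(P), miscl_imp(P)]   % approx. [0.811  0.375  0.250]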
Sample  Category  A–D  E–G  H–J  K–L  M–N
1 ω1 A E H K M
2 ω1 B E I L M
3 ω1 A G I L N
4 ω1 B G H K M
5 ω1 A G I L M
6 ω2 B F I L M
7 ω2 B F J L N
8 ω2 B E I L N
9 ω2 C G J K N
10 ω2 C G J L M
11 ω2 D G J K M
12 ω2 B F I L M
13 ω3 D E H K N
14 ω3 A E H K N
15 ω3 D E H L N
16 ω3 D F J L N
17 ω3 A F H K N
18 ω3 D E J L M
19 ω3 C F J L M
20 ω3 D F H L M
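For task 9), the letters in letters.mat are presumably stored as integer character codes; the lines below only illustrate the char/double conversion mentioned in the hint and the kind of expression the task asks for (the feature name and the sample expression are made up, and the real one depends on the tree you obtain).

    double('A')    % ans = 65: the ASCII code of 'A'
    char(65:68)    % ans = 'ABCD': integer codes converted back to letters

Each internal node of the simplified tree contributes an equality test such as (feature1 = 'A') or its negation; the leaves assigned to ω1 then combine these tests with AND along each path and OR across paths, giving something of the form (feature1 = 'A') OR (NOT(feature1 = 'A') AND (feature3 = 'H')).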