On Unit-4
Subset where Outlook = Sunny:

DAY   TEMPERATURE   HUMIDITY   WIND     PLAY TENNIS
D1    Hot           High       Weak     No
D2    Hot           High       Strong   No
D8    Mild          High       Weak     No
D9    Cool          Normal     Weak     Yes
D11   Mild          Normal     Strong   Yes
Attribute: TEMPERATURE
Values(TEMPERATURE) = Hot, Mild, Cool
First, the entropy of the entire (Sunny) subset:
S_sunny = [2+, 3-]
Entropy(S_sunny) = -2/5 log2(2/5) - 3/5 log2(3/5) = 0.97
S_hot  -> [0+, 2-] => Entropy = 0
S_mild -> [1+, 1-] => Entropy = 1
S_cool -> [1+, 0-] => Entropy = 0
Gain(S_sunny, Temperature) = 0.97 - (2/5)(0) - (2/5)(1) - (1/5)(0) = 0.57
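The entropy and gain arithmetic above can be checked with a short script. This is a sketch; the `entropy` helper is our own, not part of the notes:

```python
import math

def entropy(pos, neg):
    """Entropy of a set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:                        # 0 * log2(0) is taken as 0
            p = count / total
            e -= p * math.log2(p)
    return e

# Outlook = Sunny subset: [2+, 3-], candidate attribute Temperature
e_sunny = entropy(2, 3)                  # ~0.97, as in the notes
gain = (e_sunny
        - (2/5) * entropy(0, 2)          # Hot:  [0+, 2-]
        - (2/5) * entropy(1, 1)          # Mild: [1+, 1-]
        - (1/5) * entropy(1, 0))         # Cool: [1+, 0-]
```

The same helper can be reused for the Rain subset and any other attribute.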
[Tree diagram: Outlook = Overcast → Yes; the Sunny branch splits on Humidity: High → No, Normal → Yes]
Subset where Outlook = Rain:

DAY   TEMPERATURE   HUMIDITY   WIND     PLAY TENNIS
D4    Mild          High       Weak     Yes
D5    Cool          Normal     Weak     Yes
D6    Cool          Normal     Strong   No
D10   Mild          Normal     Weak     Yes
D14   Mild          High       Strong   No
Attribute: TEMPERATURE
Values(TEMPERATURE) = Hot, Mild, Cool
First, the entropy of the entire (Rain) subset:
S_rain = [3+, 2-]
Entropy(S_rain) = -3/5 log2(3/5) - 2/5 log2(2/5) = 0.97
S_hot  -> [0+, 0-] => Entropy = 0
S_mild -> [2+, 1-] => -2/3 log2(2/3) - 1/3 log2(1/3) = 0.9183
S_cool -> [1+, 1-] => Entropy = 1.0
Gain(S_rain, Temperature) = 0.97 - (3/5)(0.9183) - (2/5)(1.0) ≈ 0.02
[Tree diagram: the Rain branch splits on Wind: Strong → No, Weak → Yes]
Ensemble learning:
An ensemble method is a technique that combines the predictions from multiple machine learning algorithms to make more accurate predictions than any single model.
A model comprised of many models is called an ensemble model.
[Diagram: an ensemble built from base models such as kNN and a decision tree]
Many ensemble methods use the same type of learning algorithm for every base model; these are called "homogeneous ensembles".
There are also methods that combine different types of learning algorithms; these are called "heterogeneous ensembles".
Types of ensemble learning:
Bagging (bootstrap aggregation)
Boosting
Stacking
Cascading
Bagging:
Bagging is a general procedure that can be used to reduce the variance of an algorithm that has high variance.
Bagging runs each model independently and then aggregates the outputs at the end, without preference to any one model.
Example: random forest
In bagging, we take different random subsets of the data set and combine their models with the help of bootstrap sampling. In detail, given a training data set containing n training examples, each sample of m training examples is generated by sampling with replacement.
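Bootstrap sampling itself is a one-liner; a minimal sketch, using the 9-example toy set from the diagram below (the function name is ours):

```python
import random

def bootstrap_sample(data, m):
    """Draw m examples from `data` uniformly, with replacement."""
    return [random.choice(data) for _ in range(m)]

random.seed(0)                       # only for a reproducible illustration
data = list("ABCDEFGHI")             # the toy training set A..I
samples = [bootstrap_sample(data, 6) for _ in range(2)]
# Each sample may repeat some examples and omit others entirely.
```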
[Diagram: original data {A, B, C, D, E, F, G, H, I}; bootstrap samples such as {A, H, E, G, F, A} and {E, C, H, F, G, C}; the resulting models' outputs are combined by AVERAGE / VOTING]
Random forests
Random forest is a supervised learning algorithm which uses the ensemble learning method for classification and regression.
Random forest is a bagging technique. The trees in a random forest are run in parallel; there is no interaction between the trees while they are being built.
The basic idea behind random forest is that it combines multiple decision trees and merges their predictions together to get a more accurate and stable prediction.
The steps of the random forest algorithm are as follows:
Step 1: pick k data points at random from the training set
Step 2: build the decision tree associated with these k data points
Step 3: choose the number Ntree of trees you want to build and repeat steps 1 & 2
Step 4: for a new data point, make each of your Ntree trees predict the category to which the data point belongs, and assign the new data point to the category that wins the majority vote.
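The four steps can be sketched in plain Python. To keep the sketch short, the "tree" of Step 2 is only a depth-1 stump on a single numeric feature; the data set and all helper names here are illustrative, not from the notes:

```python
import random
from collections import Counter

def stump_fit(points):
    """Fit a depth-1 stump on (x, label) pairs: pick the threshold on x
    that misclassifies the fewest points (a stand-in for Step 2's tree)."""
    best = None
    for t in sorted({x for x, _ in points}):
        for left, right in (("A", "B"), ("B", "A")):
            errs = sum(1 for x, y in points
                       if (left if x <= t else right) != y)
            if best is None or errs < best[0]:
                best = (errs, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def random_forest_fit(data, n_trees, k):
    forest = []
    for _ in range(n_trees):            # Step 3: repeat for Ntree trees
        # Step 1: pick k data points at random (with replacement)
        sample = [random.choice(data) for _ in range(k)]
        # Step 2: build a tree (here: a stump) on those points
        forest.append(stump_fit(sample))
    return forest

def random_forest_predict(forest, x):
    # Step 4: every tree votes; the majority category wins
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

random.seed(1)
data = [(1, "A"), (2, "A"), (3, "A"), (7, "B"), (8, "B"), (9, "B")]
forest = random_forest_fit(data, n_trees=11, k=4)
prediction = random_forest_predict(forest, 1)
```

In practice one would use a library implementation (e.g. scikit-learn's `RandomForestClassifier`), which also samples a random subset of features at each split.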
[Diagram: the training set is resampled into training data 1 … n; decision tree 1 … n is built on each; their predictions are combined by voting (averaging) into the final prediction]
Construct a decision tree using the ID3
algorithm - example for practice

Major      Exp           Tie      Hired?
CS         Programming   Pretty   No
CS         Programming   Pretty   No
CS         Management    Pretty   Yes
CS         Management    Ugly     Yes
Business   Programming   Pretty   Yes
Business   Programming   Ugly     Yes
Business   Management    Pretty   No
Business   Management    Pretty   No
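One way to start the practice exercise is to compute the information gain of each attribute at the root; a sketch (the `entropy`/`gain` helpers and data layout are ours, not from the notes):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr):
    """Information gain of splitting `rows` (dicts) on `attr`."""
    total = entropy(labels)
    n = len(rows)
    for value in set(r[attr] for r in rows):
        subset = [y for r, y in zip(rows, labels) if r[attr] == value]
        total -= len(subset) / n * entropy(subset)
    return total

rows = [
    {"Major": "CS",       "Exp": "Programming", "Tie": "Pretty"},
    {"Major": "CS",       "Exp": "Programming", "Tie": "Pretty"},
    {"Major": "CS",       "Exp": "Management",  "Tie": "Pretty"},
    {"Major": "CS",       "Exp": "Management",  "Tie": "Ugly"},
    {"Major": "Business", "Exp": "Programming", "Tie": "Pretty"},
    {"Major": "Business", "Exp": "Programming", "Tie": "Ugly"},
    {"Major": "Business", "Exp": "Management",  "Tie": "Pretty"},
    {"Major": "Business", "Exp": "Management",  "Tie": "Pretty"},
]
hired = ["No", "No", "Yes", "Yes", "Yes", "Yes", "No", "No"]
gains = {a: round(gain(rows, hired, a), 4) for a in ("Major", "Exp", "Tie")}
# "Tie" has the highest gain here, so ID3 would test Tie at the root.
```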