Machine Learning Unit 5
DESIGN AND ANALYSIS OF MACHINE LEARNING EXPERIMENTS
Machine learning is a process that covers everything from source data identification to model development, model deployment, and model maintenance. The entire set of activities falls under two broad categories: ML model development and ML model operations.
The machine learning lifecycle has the following phases:
* Business goal identification
* ML problem framing
* Data collection and data processing (preprocessing, feature engineering)
* Model development (training, tuning, evaluation)
* Model deployment (inference, prediction)
* Model monitoring
Business goal:
An organization considering ML should have a clear idea of the problem and of the value to be gained by solving that problem. It should be able to measure business value.
Training an accurate ML model requires data processing to convert data into a usable format. Data processing steps include collecting data, preparing data, and feature engineering, which is the process of creating, transforming, extracting, and selecting variables from data.
MODEL DEPLOYMENT:
After a model is trained, tuned, evaluated, and validated, we can deploy the model into production. We can then make predictions and inferences against the model.
MODEL DEVELOPMENT:
Model development consists of model building, training, tuning, and evaluation. Model building includes creating a pipeline that automates the build, train, and release steps to staging and production environments.
[Figure: ML lifecycle. Recoverable labels: business goal, framing, data processing, monitoring. Monitoring provides early detection and mitigation of performance issues.]
GUIDELINES FOR MACHINE LEARNING EXPERIMENTS
AIM OF THE STUDY:
What are the objectives (e.g., assessing the expected error of an algorithm on a particular problem, etc.)?
SELECTION OF THE RESPONSE VARIABLE:
What should we use as the quality measure (e.g., error, precision and recall, complexity, etc.)?
CHOICE OF FACTORS AND LEVELS:
Frequently, conclusions point to the need for further experimentation. There is always a risk that our conclusions are wrong, especially if the data is small and noisy. When our expectations are not met, it is most helpful to investigate why they are not.
DATASET PREPARATION:
Machine learning is about learning some properties of a data set and applying them to new data. A common practice in machine learning to evaluate an algorithm is to split the data at hand into two sets: one that we call the training set (on which we learn data properties) and one that we call the testing set, on which we test these properties.
* Problem: the training error is not a good estimator for the test error.
Test Set:
* Used to assess the performance of a classifier.
* Never used during the training process.
* Provides an unbiased estimate of the generalization error.
Training Set:
* The data which we use to construct the classifier.
* In a dataset, a training set is used to build up a model, while a test (or validation) set is used to validate the model built.
* Data points in the training set are excluded from the test (validation) set.
Usually, a dataset is divided into a training set and a validation set (some people use "test set" instead) in each iteration, or into a training set, a validation set, and a test set in each iteration.
Machine learning basically aims to create a model that predicts the test data. So we use the training data to fit the model and the testing data to test it. The generated models are used to predict unknown results; the data on which they are evaluated is called the test set.
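The split described above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the function name `train_test_split`, the 25% test fraction, and the fixed seed are assumptions made for the example.

```python
import random

def train_test_split(points, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and split it into disjoint training and test lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled points form the test set; the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(range(20), test_fraction=0.25)
print(len(train), len(test))
```

Because the two lists come from one shuffled copy, no data point can appear in both the training set and the test set.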
CROSS VALIDATION (CV) AND RESAMPLING:
* Validation is a technique used in machine learning to get the error rate of the ML model, which can be considered close to the true error rate of the population.
* If the data volume is large enough to be representative of the population, you may not need the validation techniques.
* In machine learning, model validation refers to the process where a trained model is evaluated with a testing data set.
* The testing data set is a separate portion of the same data set from which the training set is derived.
* The main purpose of using the testing data set is to test the generalization ability of a trained model.
[Figure: data partitioning. Recoverable labels: data set, training, testing, holdout method, cross validation.]
Cross validation is a technique for evaluating ML models by training several ML models on subsets of the data and evaluating each on the data held out from its training. The average over the trials is then computed.
Leave-one-out cross validation is k-fold cross validation taken to its logical extreme, with k equal to N, the number of data points in the set. That means that, N separate times, the function approximator is trained on all the data except for one point, and a prediction is made for that point.
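Leave-one-out cross validation as described above can be sketched as follows. The `fit` and `predict` callables and the mean predictor used as the "function approximator" are hypothetical placeholders chosen to keep the example self-contained, and squared error is an assumed choice of loss.

```python
def loocv_mse(xs, ys, fit, predict):
    """Average squared error over N trials, each holding out exactly one point."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Train on every point except the i-th one.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        # Predict the held-out point and accumulate its squared error.
        total += (predict(model, xs[i]) - ys[i]) ** 2
    return total / n

def fit_mean(train_x, train_y):
    """Toy model: remember the mean of the training targets."""
    return sum(train_y) / len(train_y)

def predict_mean(model, x):
    """Toy prediction: always return the stored mean."""
    return model

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
loo_error = loocv_mse(xs, ys, fit_mean, predict_mean)
print(loo_error)
```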
K-FOLD CROSS VALIDATION:
K-fold CV is where a given data set is split into k sections (folds), and each fold is used as a testing set at some point.
* Let's take the scenario of 5-fold cross validation (k = 5). Here the data set is split into 5 folds.
* In the first iteration, the first fold is used to test the model and the rest are used to train the model.
* In the second iteration, the 2nd fold is used as the testing set while the rest serve as the training set. This process is repeated until each of the 5 folds has been used as the testing set.
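K-fold validation is performed as per the following steps:
* Partition the original training data set into k equal subsets. Each subset is called a fold. Let the folds be named f1, f2, ..., fk.
[Figure: the data set divided into folds (e.g., Fold 4, Fold 5), with the remaining folds forming the training set.]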
The advantage of this method is that it matters less how the data gets divided. Every data point gets to be in a test set exactly once, and gets to be in a training set k - 1 times. The variance of the resulting estimate is reduced as k is increased.
The disadvantage of this method is that the training algorithm has to be rerun from scratch k times, which means it takes k times as much computation to make an evaluation.
A variant of this method is to randomly divide the data into a test and training set k different times. The advantage of doing this is that you can independently choose how large each test set is and how many trials you average over.
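The fold bookkeeping for k-fold cross validation can be sketched as below. The generator name `k_fold_indices` is an assumption for illustration, and the folds are taken in order without shuffling; in practice the data is usually shuffled first.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds over n points."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    indices = list(range(n))
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

splits = list(k_fold_indices(10, 5))
for train_idx, test_idx in splits:
    print("test fold:", test_idx, "training size:", len(train_idx))
```

Each index lands in a test fold exactly once and in a training set k - 1 times, which is the property the advantage above relies on.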
BOOTSTRAPPING:
It is a method of sample reuse that is much more general than cross validation. The idea is to use the observed sample to estimate the population distribution.
* Then samples can be drawn from the estimated population, and the sampling distribution of any type of estimator can itself be estimated.
* The bootstrap is a flexible and powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator or statistical learning method.
* For example, it can provide an estimate of the standard error of a coefficient, or a confidence interval for that coefficient.
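As an example, consider investing a fraction $\alpha$ of our money in an asset with return X and the remaining $1 - \alpha$ in an asset with return Y. The value of $\alpha$ that minimizes the variance of the investment, $\mathrm{Var}(\alpha X + (1 - \alpha) Y)$, is
$$\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}$$
where $\sigma_X^2 = \mathrm{Var}(X)$, $\sigma_Y^2 = \mathrm{Var}(Y)$ and $\sigma_{XY} = \mathrm{Cov}(X, Y)$.
But the values of $\sigma_X^2$, $\sigma_Y^2$ and $\sigma_{XY}$ are unknown.
* We can compute estimates $\hat\sigma_X^2$, $\hat\sigma_Y^2$ and $\hat\sigma_{XY}$ using a data set that contains measurements for X and Y.
* We can then minimize the variance of our investment using
$$\hat\alpha = \frac{\hat\sigma_Y^2 - \hat\sigma_{XY}}{\hat\sigma_X^2 + \hat\sigma_Y^2 - 2\hat\sigma_{XY}}$$
In a simulation, the true value of $\alpha$ is 0.6. The mean over all 1,000 estimates for $\alpha$,
$$\bar\alpha = \frac{1}{1000} \sum_{r=1}^{1000} \hat\alpha_r,$$
is very close to $\alpha = 0.6$, and the standard deviation of the estimates is
$$\sqrt{\frac{1}{1000 - 1} \sum_{r=1}^{1000} (\hat\alpha_r - \bar\alpha)^2} = 0.083.$$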
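When only one data set is available, the standard deviation of the estimates can be approximated by resampling that data set with replacement. The sketch below assumes the usual minimum-variance estimator $\hat\alpha = (\hat\sigma_Y^2 - \hat\sigma_{XY}) / (\hat\sigma_X^2 + \hat\sigma_Y^2 - 2\hat\sigma_{XY})$; the synthetic Gaussian data, the function names, and the seeds are assumptions made for the example, so the printed numbers are not the ones quoted in the notes.

```python
import random
import statistics

def alpha_hat(xs, ys):
    """Plug-in estimate of the variance-minimizing fraction alpha."""
    var_x = statistics.variance(xs)
    var_y = statistics.variance(ys)
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return (var_y - cov_xy) / (var_x + var_y - 2 * cov_xy)

def bootstrap_alpha(xs, ys, n_boot=1000, seed=0):
    """Return (mean, standard deviation) of alpha_hat over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        # Draw n (x, y) pairs with replacement from the observed sample.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(alpha_hat([xs[i] for i in idx], [ys[i] for i in idx]))
    return statistics.fmean(estimates), statistics.stdev(estimates)

# Synthetic, independent returns with equal variance (so the true alpha is 0.5).
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
ys = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
mean_alpha, se_alpha = bootstrap_alpha(xs, ys)
print(mean_alpha, se_alpha)
```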
The resampling bootstrap can only reproduce the items that were in the original sample. The semiparametric bootstrap assumes that the population includes other items similar to the observed sample, by sampling from a smoothed version of the sample histogram. This can be done very simply by first taking a sample with replacement from the observed sample and then adding noise.
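The two steps just described (resample with replacement, then add noise) can be sketched as follows. Gaussian noise and the `noise_sd` parameter are assumptions for illustration; the notes do not say which noise distribution or scale to use.

```python
import random

def smoothed_bootstrap_sample(sample, noise_sd, rng):
    """Draw len(sample) values with replacement, then perturb each with Gaussian noise."""
    return [rng.choice(sample) + rng.gauss(0.0, noise_sd) for _ in sample]

observed = [2.1, 2.4, 3.0, 3.3, 4.8]
rng = random.Random(0)
resample = smoothed_bootstrap_sample(observed, noise_sd=0.1, rng=rng)
print(resample)
```

With `noise_sd=0` this reduces to the plain resampling bootstrap, which can only return values already present in the observed sample.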
Performance metrics for binary classification are designed to capture tradeoffs between four fundamental population quantities: true positives, false positives, true negatives, and false negatives.
* The evaluation measures in classification problems are defined from a matrix with the numbers of examples correctly and incorrectly classified for each class, named the confusion matrix.
The confusion matrix for a binary classification problem is shown below.
True class    Predicted negative    Predicted positive
Positive      False negative        True positive
Negative      True negative         False positive
The performance of such a classification system is evaluated using the data in this matrix. The confusion matrix is also called an error matrix.
False positives:
Examples predicted as positive which are from the negative class.
False negatives:
Examples predicted as negative whose true class is positive.
True positives:
Examples correctly predicted as pertaining to the positive class.
True negatives:
Examples correctly predicted as belonging to the negative class.
The evaluation measure most used in practice is the accuracy rate:
Accuracy rate = (True negatives + True positives) / (True negatives + True positives + False negatives + False positives)
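The four confusion-matrix counts and the accuracy rate can be computed directly from paired true and predicted labels. The function names and the small label lists below are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts

def accuracy_rate(counts):
    """(TN + TP) divided by the total number of examples."""
    return (counts["TN"] + counts["TP"]) / sum(counts.values())

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy_rate(counts))
```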
Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold.
The curve is monotonically non-decreasing. A concavity, i.e., two or more adjacent segments with increasing slopes, indicates a locally worse than random ranking. We would get better ranking performance by joining the segments involved in the concavity, thus creating a coarser classifier.
PRECISION AND RECALL:
Relevance is a subjective notion. Different users may differ about the relevance or non-relevance of particular documents to given questions.
In response to a query, an IR system searches its document collection and returns an ordered list of responses. This is called the retrieved set or ranked list.
A better search yields a better ranked list, and better ranked lists help the user fill their information need.
[Figure: the set of relevant items in the database, the irrelevant items, and the items retrieved.]
Precision:
A = number of relevant records retrieved
C = number of irrelevant records retrieved
Precision = A / (A + C)
The F-measure is a measure of a test's accuracy and is defined as the weighted harmonic mean of the precision and recall of the test.
where the terms are the number of relevant documents for a query, the number of queries, and P, the precision over the documents.
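Precision and the F-measure can be sketched with set operations on document identifiers. A and C follow the definitions above. The quantity B (relevant records not retrieved), the recall formula A / (A + B), and the weighting parameter `beta` are standard definitions assumed here, since they do not survive in the notes. The document ids are made up.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Return (precision, recall, F_beta) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    a = len(retrieved & relevant)   # A: relevant records retrieved
    c = len(retrieved - relevant)   # C: irrelevant records retrieved
    b = len(relevant - retrieved)   # B: relevant records not retrieved (assumed symbol)
    precision = a / (a + c) if a + c else 0.0
    recall = a / (a + b) if a + b else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    # Weighted harmonic mean of precision and recall; beta = 1 weights them equally.
    f_beta = (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
    return precision, recall, f_beta

p, r, f1 = precision_recall_f(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r, f1)
```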
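MULTICLASS CLASSIFICATION TECHNIQUES:
Training: each training point belongs to one of several different classes.
The goal is to construct a function which, given a new data point, will correctly predict the class to which the new point belongs.
* One approach classifies using a voting scheme: choose the class with the most votes.
* It requires just pairwise decisions, and it predicts a class label.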
Error Correcting Output Coding:
* Error correcting code approaches try to combine binary classifiers in a way that lets you exploit decorrelations and correct errors.
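One common way to combine the binary classifiers is to give every class a binary codeword and to decode a vector of binary predictions to the class whose codeword is nearest in Hamming distance. The codebook, class names, and decoding rule below are hypothetical choices for illustration, not taken from the notes.

```python
# Hypothetical 6-bit codewords; every pair differs in 4 positions,
# so a single wrong binary prediction can still be corrected.
CODEBOOK = {
    "cat":  (1, 1, 1, 0, 0, 0),
    "dog":  (0, 0, 1, 1, 1, 0),
    "bird": (1, 0, 0, 1, 0, 1),
}

def hamming_distance(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(bits, codebook=CODEBOOK):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(codebook, key=lambda label: hamming_distance(codebook[label], bits))

# Output of six binary classifiers, with the fifth bit wrong for "cat".
predicted_bits = (1, 1, 1, 0, 1, 0)
decoded = ecoc_decode(predicted_bits)
print(decoded)
```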
t-Test:
The t-test was developed entirely to deal with problems of small samples. But one should note that the methods and theory of small samples are applicable to large samples, while the converse is not true.
When the sample size is small, as is often the case in practice, the central limit theorem does not apply. One must then impose stricter assumptions on the population to give statistical validity to the test procedure. One common assumption is that the population from which the sample is taken has a normal probability distribution to begin with.
* Degrees of freedom (df): By degrees of freedom we mean the number of classes to which the value can be assigned arbitrarily or at will without violating the restrictions or limitations placed.
For example, we are asked to choose any four numbers whose total is 50. Clearly we are at freedom to choose any three numbers, say 10, 23, 7, but the fourth number is fixed since the total is 50: 50 - (10 + 23 + 7) = 10. Thus we are given one restriction, hence the degrees of freedom are 4 - 1 = 3.
* The area in the tails of the t-distribution is a little greater than the area in the tails of the standard normal distribution, because we are using s as an estimate of $\sigma$, thereby introducing further variability.
* The shape of the t-distribution is dependent on the sample size n.
* As the sample size increases, the t-distribution becomes approximately normal.
* The standard deviation is greater than 1.
* The mean, median, and mode of the t-distribution are equal to 0.
TEST FOR THE POPULATION CORRELATION COEFFICIENT $\rho$:
[Figure: t-distribution density curves for several degrees of freedom compared with the standard normal curve. Critical values for a tail area of 0.025: 1.960 (normal), then 1.962, 2.042, 2.131, 2.571 for t-distributions with decreasing degrees of freedom.]
As the sample size n increases, the critical t values get closer to the normal critical value, and the density curve of t gets closer to the normal curve.
The correlation coefficient describes the strength of the relationship between two variables.
The coefficient is the slope of the regression line between two variables when both variables have been standardized by subtracting their means and dividing by their standard deviations.
Correlation ranges between minus one and plus one.
It is a descriptive statistic when no special distributional assumptions are made about the variates from which it is calculated.
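Test for correlation coefficient:
Formula:
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$
with $n - 2$ degrees of freedom, where r is the sample correlation coefficient and n is the sample size.
* State the null and alternate hypotheses: $H_0: \rho = 0$, $H_a: \rho \neq 0$. Here $\rho$ is the population correlation coefficient.
* State the significance level.
* Compute the test statistic of the correlation coefficient with the above-defined formula.
* Make the decision using the critical value approach or the p-value approach.
* Finally, state the conclusion.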
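The test steps can be sketched in Python using the standard statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with $n - 2$ degrees of freedom. The data values are made up for illustration, and the critical value 3.182 (two-sided test, significance level 0.05, 3 degrees of freedom) is read from a standard t table rather than computed.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
t_stat = correlation_t_statistic(r, len(xs))

T_CRITICAL = 3.182  # two-sided, alpha = 0.05, df = 3 (from a t table)
reject_h0 = abs(t_stat) > T_CRITICAL
print(r, t_stat, reject_h0)
```

With only five points, even r = 0.8 does not exceed the critical value, so the critical value approach does not reject $H_0$ here.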
McNemar's Test:
McNemar's test is most often used on nominal data, with one nominal variable with two categories and one independent variable with two connected groups.
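Two classification algorithms are trained on the same K training sets and tested on the corresponding validation sets $V_i$, $i = 1, \dots, K$. The percentages of errors of the two classifiers on the validation sets are recorded as $p_i^1$ and $p_i^2$.
If the two classification algorithms have the same error rate, then we expect them to have the same mean or, equivalently, that the difference of their means is 0.
* The difference in error rates on fold i is $p_i = p_i^1 - p_i^2$. This is a paired test; that is, for each i, both algorithms see the same training and validation sets.
$$m = \frac{1}{K}\sum_{i=1}^{K} p_i, \qquad s^2 = \frac{\sum_{i=1}^{K} (p_i - m)^2}{K - 1}$$
* Under the null hypothesis that $\mu = 0$, we have a statistic that is t-distributed with $K - 1$ degrees of freedom:
$$\frac{\sqrt{K}\, m}{s} \sim t_{K-1}$$
In the t statistic, the variance may sometimes be underestimated and the mean poorly estimated, resulting in large t values.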
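The paired test on the K validation folds can be sketched as follows, using the statistic $\sqrt{K}\,m/s$ with $K - 1$ degrees of freedom. The per-fold error rates are invented for the example, and the critical value 2.776 (two-sided test, significance level 0.05, 4 degrees of freedom) is taken from a standard t table.

```python
import math

def paired_t_statistic(errors_1, errors_2):
    """t statistic of the per-fold error differences; K - 1 degrees of freedom."""
    k = len(errors_1)
    diffs = [e1 - e2 for e1, e2 in zip(errors_1, errors_2)]
    m = sum(diffs) / k                                  # mean difference
    s2 = sum((d - m) ** 2 for d in diffs) / (k - 1)     # sample variance
    if s2 == 0:
        raise ValueError("all differences are identical; the statistic is undefined")
    return math.sqrt(k) * m / math.sqrt(s2)

# Error rates of two classifiers on the same K = 5 validation folds (made up).
errors_1 = [0.20, 0.22, 0.19, 0.25, 0.21]
errors_2 = [0.18, 0.21, 0.15, 0.20, 0.20]
t_value = paired_t_statistic(errors_1, errors_2)

T_CRITICAL = 2.776  # two-sided, alpha = 0.05, df = K - 1 = 4 (from a t table)
same_error_rate_rejected = abs(t_value) > T_CRITICAL
print(t_value, same_error_rate_rejected)
```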