FDS Assignment
Machine learning is a subset of AI that enables machines to learn automatically
from data, improve performance with experience, and make predictions.
Machine learning comprises a set of algorithms that work on large amounts of data.
Data is fed to these algorithms to train them, and on the basis of this training, they
build a model and perform a specific task.
These ML algorithms help to solve different business problems such as regression,
classification, forecasting, clustering, and association.
Based on the methods and way of learning, machine learning is divided into mainly four
types, which are:
1. Supervised Learning
2. Unsupervised Learning
3. Semi-Supervised Learning
4. Reinforcement Learning
1. Supervised Learning
The main goal of the supervised learning technique is to map the input variable(x) with
the output variable(y). Some real-world applications of supervised learning are Risk
Assessment, Fraud Detection, Spam filtering, etc.
Supervised machine learning can be classified into two types of problems, which are
given below:
○ Classification
○ Regression
a) Classification
Classification algorithms are used to solve classification problems in which the
output variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc.
Classification algorithms predict the categories present in the dataset. Some
real-world examples of classification algorithms are spam detection, email filtering,
etc.
b) Regression
Regression algorithms are used to solve regression problems in which there is a
relationship between the input and output variables. They are used to predict continuous
output variables, such as market trends, weather, etc.
○ Lasso Regression
Unsupervised learning is different from the supervised learning technique; as its name
suggests, there is no need for supervision. This means that in unsupervised machine
learning, the machine is trained using an unlabeled dataset, and it predicts the output
without any supervision.
In unsupervised learning, the models are trained with the data that is neither classified
nor labelled, and the model acts on that data without any supervision.
The main aim of an unsupervised learning algorithm is to group or categorize the
unsorted dataset according to similarities, patterns, and differences. Machines are
instructed to find the hidden patterns in the input dataset.
The machine then discovers its own patterns and differences, such as differences in
colour or shape, and predicts the output when it is tested with the test dataset.
Unsupervised Learning can be further classified into two types, which are given below:
○ Clustering
○ Association
1) Clustering
The clustering technique is used when we want to find the inherent groups from the
data. It is a way to group the objects into a cluster such that the objects with the most
similarities remain in one group and have fewer or no similarities with the objects of
other groups. An example of the clustering algorithm is grouping the customers by their
purchasing behaviour.
○ Mean-shift algorithm
○ DBSCAN Algorithm
2) Association
Some popular algorithms of Association rule learning are the Apriori, Eclat, and
FP-growth algorithms.
3. Semi-Supervised Learning
4. Reinforcement Learning
In reinforcement learning, there is no labelled data as in supervised learning; agents
learn from their experiences only.
The reinforcement learning process is similar to that of a human being; for example, a
child learns various things through experience in day-to-day life. An example of
reinforcement learning is playing a game, where the game is the environment, the moves
of the agent at each step define states, and the goal of the agent is to get a high score.
The agent receives feedback in terms of rewards and punishments.
Due to the way it works, reinforcement learning is employed in different fields such as
game theory, operations research, information theory, and multi-agent systems.
Classification and Regression algorithms are Supervised Learning algorithms. Both
can be used for forecasting in machine learning and operate on labelled
datasets. The distinction between classification and regression lies in how they are
applied to particular machine learning problems.
Classification: The mapping function is used for assigning values to predefined groups.
Regression: The mapping function is used for assigning values to a continuous output.

Classification: The output element must be a discrete attribute.
Regression: The output element must be a continuous real value.

Classification: The role of the algorithm is to map the input value (x) to a discrete output variable (y).
Regression: The role of the algorithm is to map the input value (x) to a continuous output variable (y).

Classification: Classification algorithms are used for discrete data.
Regression: Regression algorithms are used for continuous data.
In machine learning, features are individual independent variables that act as inputs to a system.
While making predictions, models use these features. Using the feature engineering process, new
features can also be derived from old features. Features are very important in machine learning:
being the building blocks of datasets, the quality of the features in your dataset has a major impact
on the quality of the insights you will get when using the dataset for machine learning. However,
depending on the business problems in different industries, the relevant features need not be the
same, so you need to understand the business goal of your data science project well.
The training data is the biggest (in size) subset of the original dataset, used to train or fit the
machine learning model. The training data is first fed to the ML algorithms, which lets them learn
how to make predictions for the given task.
The training data varies depending on whether we are using Supervised Learning or Unsupervised
Learning Algorithms.
For Unsupervised learning, the training data contains unlabeled data points, i.e., inputs are not tagged
with the corresponding outputs. Models are required to find the patterns from the given training datasets
in order to make predictions.
On the other hand, for supervised learning, the training data contains labels in order to train the model
and make predictions.
The type of training data that we provide to the model is highly responsible for the model's accuracy and
prediction ability. It means that the better the quality of the training data, the better will be the
performance of the model. Training data is approximately more than or equal to 60% of the total data for
an ML project.
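The split described above can be sketched as follows. This is a minimal illustration, not a standard API; the 60/20/20 ratio and the helper name are illustrative choices:

```python
import random

def train_val_test_split(data, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle and split a dataset into train / validation / test subsets.
    The 60/20/20 ratio is one common convention, not a fixed rule."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_test = int(n * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```

In practice, a library helper such as scikit-learn's `train_test_split` would typically be used instead.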
The target vector is a central concept in supervised machine learning, representing the output variable
that a model aims to predict based on input features. In essence, it's the desired outcome or the variable
we seek to approximate accurately. The target vector guides the learning process, shaping the model's
understanding of the data. Its accuracy and relevance are vital for the model's ability to generalize to
new data, making it a crucial element in building effective machine learning models.
In supervised learning, we have a dataset containing both input features and corresponding target values.
The model uses this labeled data to learn patterns and relationships, enabling it to make predictions on
new, unseen data.
In regression, the target vector contains continuous numerical values. The model's goal is to predict a
continuous output, like estimating a house price based on features like size, location, etc.
In classification, the target vector comprises categorical labels representing different classes. The
model's task is to assign the correct class label to new data points, using patterns learned from the
training data.
Once we train the model with the training dataset, it's time to test the model with the test dataset.
This dataset evaluates the performance of the model and ensures that the model can generalize well to
new or unseen data. The test dataset is another subset of the original data, independent of the
training dataset. However, it has similar types of features and class probability distribution, and it
is used as a benchmark for model evaluation once model training is completed. Test data is a
well-organized dataset that contains data for each type of scenario the model would face when used in
the real world. Usually, the test dataset is approximately 20-25% of the total original data for an
ML project.
At this stage, we can also check and compare the testing accuracy with the training accuracy, that is,
how accurate our model is on the test dataset versus the training dataset. If the accuracy of the
model on the training data is greater than that on the testing data, the model is said to be overfitting.
As the number of dimensions or features increases, the amount of data needed to generalize the machine
learning model accurately increases exponentially. The curse of dimensionality is a fundamental
challenge that arises when dealing with high-dimensional data in machine learning. With each added
dimension, the amount of data required to represent the feature space accurately grows exponentially.
This sparsity of data points can lead to issues like overfitting, where models capture noise rather than
meaningful patterns.
Moreover, the curse of dimensionality leads to increased computational demands. Algorithms that
perform well in lower dimensions might become impractical in high-dimensional spaces due to
escalating processing requirements. Identifying relevant features also becomes intricate, as irrelevant or
redundant dimensions introduce noise that can degrade model performance.
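The exponential growth mentioned above can be made concrete with a back-of-the-envelope count: covering each feature axis with just 10 bins requires 10^d cells, so the data needed to populate the feature space explodes with the number of dimensions d (the bin count is an arbitrary illustrative choice):

```python
# Number of cells needed to cover a d-dimensional feature space
# when each axis is discretized into a fixed number of bins.
bins_per_axis = 10
for d in (1, 2, 5, 10):
    print(d, bins_per_axis ** d)  # grows exponentially in d
```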
A discriminant function that is a linear combination of the components of x can be
written as

g(x) = wᵀx + w₀    (9.1)

where w is the weight vector and w₀ the bias or threshold weight. Linear discriminant
functions are going to be studied for the two-category case, the multicategory case, and
the general case. For the general case there will be c such discriminant functions, one for
each of c categories.
Discriminant Functions are:
○ The Two-Category Case
○ The Multicategory Case
○ Generalized Linear Discriminant Functions

The Two-Category Case
For a discriminant function of the form of eq. 9.1, a two-category classifier
implements the following decision rule: Decide ω₁ if g(x) > 0 and ω₂ if g(x) < 0.
Thus, x is assigned to ω₁ if the inner product wᵀx exceeds the threshold −w₀ and
to ω₂ otherwise. If g(x) = 0, x can ordinarily be assigned to either class, or can be left
undefined. The equation g(x) = 0 defines the decision surface that separates points
assigned to ω₁ from points assigned to ω₂. When g(x) is linear, this decision surface
is a hyperplane. If x₁ and x₂ are both on the decision surface, then

wᵀx₁ + w₀ = wᵀx₂ + w₀

or

wᵀ(x₁ − x₂) = 0,

so w is normal to any vector lying in the hyperplane. Any x can be written as

x = xₚ + r (w / ||w||),

where xₚ is the normal projection of x onto H, and r is the desired algebraic distance,
which is positive if x is on the positive side and negative if x is on the negative side.
Then, because g(xₚ) = 0,

g(x) = wᵀx + w₀ = r ||w||,

or

r = g(x) / ||w||.

The linear decision boundary H separates the feature space into two half-spaces.
In particular, the distance from the origin to H is given by w₀ / ||w||. If w₀ > 0, the origin is on
the positive side of H, and if w₀ < 0, it is on the negative side. If w₀ = 0, then g(x) has the
homogeneous form wᵀx, and the hyperplane passes through the origin (Figure 9.2).

A linear discriminant function divides the feature space by a hyperplane decision
surface. The orientation of the surface is determined by the normal vector w, and the
location of the surface is determined by the bias w₀. The discriminant function g(x)
is proportional to the signed distance from x to the hyperplane, with g(x) > 0 when x is
on the positive side, and g(x) < 0 when x is on the negative side.
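The decision rule and the signed distance r = g(x)/||w|| can be sketched as follows; the 2-D weight vector w and bias w₀ are arbitrary illustrative values:

```python
import math

w = [3.0, 4.0]   # weight vector (normal to the hyperplane H)
w0 = -5.0        # bias / threshold weight

def g(x):
    """Linear discriminant g(x) = w.x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def signed_distance(x):
    """r = g(x) / ||w||: signed distance from x to the hyperplane g(x) = 0."""
    return g(x) / math.hypot(*w)

x = [2.0, 2.0]
print(g(x))                # 9.0 > 0, so decide omega_1 (positive side)
print(signed_distance(x))  # 9.0 / 5.0 = 1.8
```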
The Multicategory Case

There is more than one way to devise multicategory classifiers employing linear
discriminant functions. For example, we might reduce the problem to c two-class
problems, where the i-th problem is solved by a linear discriminant function that
separates points assigned to ωᵢ from those not assigned to ωᵢ. A more extravagant
approach would be to use c(c−1)/2 linear discriminants, one for every pair of classes.
As illustrated in Figure 9.3, both of these approaches can lead to regions in which the
classification is undefined. We shall avoid this problem by defining c linear
discriminant functions

gᵢ(x) = wᵢᵀx + wᵢ₀,  i = 1, …, c,

and assigning x to ωᵢ if gᵢ(x) > gⱼ(x) for all j ≠ i; in case of ties, the classification is left
undefined. The resulting classifier is called a linear machine. A linear machine
divides the feature space into c decision regions (Figure 9.4), with gᵢ(x) being the
largest discriminant if x is in region Rᵢ. If Rᵢ and Rⱼ are contiguous, the boundary
between them is a portion of the hyperplane Hᵢⱼ defined by

gᵢ(x) = gⱼ(x)

or

(wᵢ − wⱼ)ᵀx + (wᵢ₀ − wⱼ₀) = 0.

It follows at once that wᵢ − wⱼ is normal to Hᵢⱼ and the signed distance from x to Hᵢⱼ is
given by

(gᵢ(x) − gⱼ(x)) / ||wᵢ − wⱼ||.

Thus, with the linear machine it is not the weight vectors themselves but their
differences that are important. While there are c(c−1)/2 pairs of regions, they need
not all be contiguous, and the total number of hyperplane segments appearing in the
decision surfaces is often fewer than c(c−1)/2, as shown in Figure 9.4.
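A minimal sketch of such a linear machine, assigning x to the class with the largest discriminant; the weight vectors and biases below are arbitrary illustrative values:

```python
# c = 3 linear discriminants g_i(x) = w_i.x + w_i0;
# assign x to the class whose discriminant is largest.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # one weight vector per class
b = [0.0, 0.0, 0.5]                           # one bias per class

def classify(x):
    scores = [sum(wi * xi for wi, xi in zip(w_row, x)) + b_i
              for w_row, b_i in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__), scores

label, scores = classify([2.0, 1.0])
print(label)  # 0: g_0(x) = 2.0 is the largest score
```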
Linear decision boundaries for a four-class problem with undefined regions.

Decision boundaries defined by a linear machine.
Generalized Linear Discriminant Functions

The linear discriminant function g(x) can be written as

g(x) = w₀ + Σᵢ wᵢxᵢ,

where the coefficients wᵢ are the components of the weight vector w. By adding
additional terms involving the products of pairs of components of x, we obtain
the quadratic discriminant function

g(x) = w₀ + Σᵢ wᵢxᵢ + Σᵢ Σⱼ wᵢⱼxᵢxⱼ.    (9.9)

Because xᵢxⱼ = xⱼxᵢ, we can assume that wᵢⱼ = wⱼᵢ with no loss in generality. In a
two-dimensional feature space, eq. 9.9 evaluates as

g(x) = w₀ + w₁x₁ + w₂x₂ + w₁₁x₁² + 2w₁₂x₁x₂ + w₂₂x₂²,    (9.10)

where eq. 9.9 and 9.10 are equivalent. Thus, the quadratic discriminant function has
an additional d(d+1)/2 coefficients at its disposal with which to produce more
complicated separating surfaces. The separating surface defined by g(x) = 0 is a
second-degree or hyperquadric surface. If the symmetric matrix W = [wᵢⱼ], where the
elements wᵢⱼ are the weights of the last term in eq. 9.9, is nonsingular, the linear terms
in g(x) can be eliminated by translating the axes. The basic character of the
separating surface can be described in terms of the scaled matrix

W̄ = W / (wᵀW⁻¹w − 4w₀),

where w = (w₁, …, w_d)ᵀ and W = [wᵢⱼ].
Types of Quadratic Surfaces:

The types of quadratic separating surfaces that arise in the general multivariate
Gaussian case are as follows.

1. If W̄ is a positive multiple of the identity matrix, the separating surface is
a hypersphere, such that W̄ = kI, where k ≥ 0. Also note that a hypersphere is defined
as the set of all points at a fixed distance from a given centre.

2. If W̄ is positive definite, the separating surface is a hyperellipsoid whose axes
are in the directions of the eigenvectors of W̄.

3. If none of the above cases holds, that is, some of the eigenvalues of W̄ are
positive and others are negative, the surface is one of the varieties of types
of hyperhyperboloids.
By continuing to add terms such as wᵢⱼₖxᵢxⱼxₖ, we can obtain the class
of polynomial discriminant functions. These can be thought of as truncated series
expansions of some arbitrary g(x), and this in turn suggests the generalized linear
discriminant function

g(x) = Σᵢ aᵢyᵢ(x),

or

g(x) = aᵀy,

where a is now a d̂-dimensional weight vector and where the d̂ functions yᵢ(x) can
be arbitrary functions of x. Such functions might be computed by a feature-detecting
subsystem. By selecting these functions judiciously and letting d̂ be sufficiently
large, one can approximate any desired discriminant function by such an expansion.
The resulting discriminant function is not linear in x, but it is linear in y. The
functions yᵢ(x) merely map points in d-dimensional x-space to points in d̂-
dimensional y-space. The homogeneous discriminant aᵀy separates points in this
transformed space by a hyperplane passing through the origin. Thus, the mapping
from x to y reduces the problem to one of finding a homogeneous linear discriminant
function.
Some of the advantages and disadvantages of this approach can be clarified by
considering a simple example. Let the quadratic discriminant function be

g(x) = a₁ + a₂x + a₃x²,

so that the three-dimensional vector y is given by

y = (1, x, x²)ᵀ.

The mapping from x to y is illustrated in Figure 9.5. The data remain inherently one-
dimensional, because varying x causes y to trace out a curve in three dimensions.

The mapping y = (1, x, x²)ᵀ takes a line and transforms it to a parabola.
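The worked example from the text can be checked directly: with a = (−1, 1, 2)ᵀ and y = (1, x, x²)ᵀ, the discriminant g(x) = aᵀy = −1 + x + 2x² is positive for x < −1 or x > 0.5:

```python
# Quadratic discriminant from the example, written as a homogeneous
# linear discriminant g(x) = a.y with y = (1, x, x^2).
a = [-1.0, 1.0, 2.0]

def g(x):
    y = [1.0, x, x * x]
    return sum(ai * yi for ai, yi in zip(a, y))

print(g(-2.0))  # 5.0  -> positive side (x < -1)
print(g(0.0))   # -1.0 -> negative side (-1 < x < 0.5)
print(g(1.0))   # 2.0  -> positive side (x > 0.5)
```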
The plane Ĥ defined by aᵀy = 0 divides the y-space into two decision regions R̂₁ and
R̂₂. Figure 5.6 shows the separating plane corresponding to a = (−1, 1, 2)ᵀ, the decision
regions R̂₁ and R̂₂, and their corresponding decision regions R₁ and R₂ in the original
x-space. The quadratic discriminant function g(x) = −1 + x + 2x² is positive if x < −1 or
if x > 0.5. Even with relatively simple functions yᵢ(x), decision surfaces induced in an
x-space can be fairly complex.
While it may be hard to realize the potential benefits of a generalized linear
discriminant function, we can at least exploit the convenience of being able to write
g(x) in the homogeneous form aᵀy. In the particular case of the linear discriminant
function we have

g(x) = w₀ + Σᵢ wᵢxᵢ,

where we set x₀ = 1. Thus we can write

y = (1, x₁, …, x_d)ᵀ = (1, xᵀ)ᵀ,

and y is sometimes called an augmented feature vector. Likewise, an augmented
weight vector can be written as:

a = (w₀, w₁, …, w_d)ᵀ = (w₀, wᵀ)ᵀ.

This mapping from d-dimensional x-space to (d+1)-dimensional y-space is
mathematically trivial but nonetheless quite convenient. By using this mapping we
reduce the problem of finding a weight vector w and a threshold weight w₀ to the
problem of finding a single weight vector a.
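The augmentation trick can be verified numerically: with y = (1, x₁, …, x_d)ᵀ and a = (w₀, w₁, …, w_d)ᵀ, the inner product aᵀy equals wᵀx + w₀. The weights below are arbitrary illustrative values:

```python
w = [2.0, -1.0]   # weight vector
w0 = 0.5          # threshold weight

def g_plain(x):
    """g(x) = w.x + w0 with separate weight vector and bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

a = [w0] + w      # augmented weight vector a = (w0, w^T)^T

def g_aug(x):
    """Same discriminant written as a single inner product a.y."""
    y = [1.0] + list(x)   # augmented feature vector y = (1, x^T)^T
    return sum(ai * yi for ai, yi in zip(a, y))

x = [3.0, 4.0]
print(g_plain(x), g_aug(x))  # both 2.5
```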
Bayesian Decision Theory

Bayesian Decision Theory is a fundamental statistical approach to the problem of
pattern classification. It is considered the ideal pattern classifier and often used as
the benchmark for other algorithms because its decision rule automatically
minimizes its loss function. It might not make much sense right now, so hold on,
we'll unravel it all.

It makes the assumption that the decision problem is posed in probabilistic terms,
and that all the relevant probability values are known.
Bayes' Theorem

Derivation of Bayes' Theorem:

We know from conditional probability:

P(A|B) = P(A, B) / P(B)
=> P(A, B) = P(A|B) * P(B)   ...(i)

Similarly,

P(A, B) = P(B|A) * P(A)   ...(ii)

From equations (i) and (ii):

P(A|B) * P(B) = P(B|A) * P(A)
=> P(A|B) = [P(B|A) * P(A)] / P(B)

For the case of classification, let:

A ≡ ω (state of nature, or the class of an entry)
B ≡ x (input feature vector)

After substituting we get:

P(ω|x) = [P(x|ω) * P(ω)] / P(x)

which becomes:

P(ω|x) = [p(x|ω) * P(ω)] / p(x)

because:

* P(ω|x) ≡ called the posterior; it is the probability of the predicted class being ω for a
given entry of features (x). Analogous to P(O|θ), because the class is the desired
outcome to be predicted according to the data distribution (model). Capital 'P' because
ω is a discrete random variable.
* p(x|ω) ≡ the class-conditional probability density function for the feature. We call it
the likelihood of ω with respect to x, a term chosen to indicate that, other things being
equal, the category (or class) for which it is large is more "likely" to be the true
category. It is a function of parameters within the parametric space that describes the
probability of obtaining the observed data (x). Small 'p' because x is a continuous
random variable. We usually assume it to follow a Gaussian distribution.
* P(ω) ≡ the a priori probability (or simply prior) of class ω. It is usually
pre-determined and depends on external factors. It means how probable the occurrence
of class ω is out of all the classes.
* p(x) ≡ called the evidence; it is merely a scaling factor that guarantees that the
posterior probabilities sum to one. p(x) = Σ p(x|ω) * P(ω) over all the classes.

So finally we get the following equation to frame our decision rule:
Bayes' Formula for Classification

P(ωᵢ|x) = [p(x|ωᵢ) * P(ωᵢ)] / p(x)

Decision Rule

The above equation is the governing formula for our decision theory. The rule is as
follows: for each sample input, calculate its posterior and assign it to the class
corresponding to the maximum value of the posterior probability. Mathematically it
can be written as:

Decide ωᵢ if P(ωᵢ|x) > P(ωⱼ|x) for all j ≠ i   (Bayes' Decision Rule)
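The rule can be sketched for a two-class problem; the priors and likelihood values below are made-up numbers purely for illustration:

```python
# Bayes' rule with known priors P(w) and likelihoods p(x|w) at one observed x.
priors = {"w1": 0.6, "w2": 0.4}
likelihoods = {"w1": 0.2, "w2": 0.5}   # p(x|w) evaluated at the observed x

evidence = sum(likelihoods[w] * priors[w] for w in priors)   # p(x), the scaling factor
posteriors = {w: likelihoods[w] * priors[w] / evidence for w in priors}
decision = max(posteriors, key=posteriors.get)               # argmax of the posterior

print({w: round(p, 3) for w, p in posteriors.items()})  # {'w1': 0.375, 'w2': 0.625}
print(decision)                                         # w2
```

Note that even though ω₁ has the larger prior, the larger likelihood of ω₂ at this x flips the decision.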
Bayesian classification for normal distributions:

Naive Bayes classifiers are a collection of classification algorithms based on Bayes'
Theorem. It is not a single algorithm but a family of algorithms where all of them
share a common principle, i.e. every pair of features being classified is independent
of each other.

Gaussian Naive Bayes classifier
In Gaussian Naive Bayes, continuous values associated with each feature are
assumed to be distributed according to a Gaussian distribution. A Gaussian
distribution is also called a Normal distribution. When plotted, it gives a bell-shaped
curve which is symmetric about the mean of the feature values as shown below:

The likelihood of the features is assumed to be Gaussian; hence, the conditional
probability is given by:

P(xᵢ|y) = (1 / √(2πσ_y²)) · exp(−(xᵢ − µ_y)² / (2σ_y²))
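A minimal sketch of this Gaussian likelihood used in a Naive Bayes decision for a single feature; the per-class means, variances, priors, and class names are illustrative, not from a real dataset:

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Class-conditional likelihood p(x|y) under the Gaussian assumption."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Per-class feature statistics (mean, variance), as estimated from training data.
stats = {"spam": (5.0, 4.0), "ham": (1.0, 1.0)}
priors = {"spam": 0.3, "ham": 0.7}

x = 2.0
scores = {c: priors[c] * gaussian_pdf(x, mu, s2) for c, (mu, s2) in stats.items()}
print(max(scores, key=scores.get))  # ham
```

With several independent features, the per-feature likelihoods would simply be multiplied together (that is the "naive" independence assumption).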
The Gaussian (Normal) Density

The definition of the expected value of a scalar function f(x) defined for some
density p(x) is given by

E[f(x)] = ∫ f(x) p(x) dx.

If the values of the feature x are restricted to points in a discrete set D, we must sum
over all samples as

E[f(x)] = Σ_{x∈D} f(x) P(x),

where P(x) is the probability mass.

The continuous univariate normal density is given by

p(x) = (1 / (√(2π) σ)) · exp[−½ ((x − µ)/σ)²],

where the mean µ (expected value, average) is given by

µ = E[x] = ∫ x p(x) dx,

and the special expectation that is the variance (squared deviation) is given by

σ² = E[(x − µ)²] = ∫ (x − µ)² p(x) dx.

The univariate normal density is specified by its two parameters, its mean µ and its
variance σ². Samples from normal distributions tend to cluster about the mean, and
the extent to which they spread out depends on the variance (Figure 4.4).
Gaussian Discriminant Analysis in its general form assumes that p(x|t) is distributed
according to a multivariate normal (Gaussian) distribution. Multivariate Gaussian
distribution:

p(x|t = k) = (1 / ((2π)^{d/2} |Σₖ|^{1/2})) · exp(−½ (x − µₖ)ᵀ Σₖ⁻¹ (x − µₖ)),

where |Σₖ| denotes the determinant of the matrix, and d is the dimension of x.
Each class k has an associated mean vector µₖ and covariance matrix Σₖ.
Typically the classes share a single covariance matrix Σ ("share" means that they
have the same parameters; the covariance matrix in this case): Σ = Σ₁ = ··· = Σₖ.
Multivariate data:
○ Multiple measurements (sensors)
○ d inputs/features/attributes
○ N instances/observations/examples

Multivariate parameters:

Mean: E[x] = (µ₁, ···, µ_d)ᵀ

Covariance: Σ = Cov(x) = E[(x − µ)(x − µ)ᵀ]

Correlation: Corr(x) is the covariance divided by the product of the standard
deviations.

x ∼ N(µ, Σ), a Gaussian (or normal) distribution defined as:

p(x) = (1 / ((2π)^{d/2} |Σ|^{1/2})) · exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ)).

The Mahalanobis distance (x − µₖ)ᵀ Σ⁻¹ (x − µₖ) measures the distance from x to µ in
terms of Σ.
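The Mahalanobis distance can be sketched in 2-D, with the covariance inverse computed by the 2×2 closed form; µ and Σ below are illustrative values:

```python
mu = [0.0, 0.0]
Sigma = [[4.0, 0.0],    # large variance along the first axis
         [0.0, 1.0]]    # unit variance along the second axis

def mahalanobis_sq(x, mu, Sigma):
    """(x - mu)^T Sigma^{-1} (x - mu) for a 2x2 covariance matrix."""
    a, b = Sigma[0]
    c, d = Sigma[1]
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]   # 2x2 matrix inverse
    dx = [x[0] - mu[0], x[1] - mu[1]]
    tmp = [inv[0][0] * dx[0] + inv[0][1] * dx[1],
           inv[1][0] * dx[0] + inv[1][1] * dx[1]]
    return dx[0] * tmp[0] + dx[1] * tmp[1]

# The same Euclidean offset counts for less along the high-variance axis:
print(mahalanobis_sq([2.0, 0.0], mu, Sigma))  # 1.0
print(mahalanobis_sq([0.0, 2.0], mu, Sigma))  # 4.0
```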
Applications of Naive Bayes Algorithm

As you must've noticed, this algorithm offers plenty of advantages to its users.
That's why it has a lot of applications in various sectors too. Here are some
applications of the Naive Bayes algorithm:

○ As this algorithm is fast and efficient, you can use it to make real-time
predictions.
○ This algorithm is popular for multi-class predictions. You can find the
probability of multiple target classes easily by using this algorithm.
○ Email services (like Gmail) use this algorithm to figure out whether an email is
spam or not. This algorithm is excellent for spam filtering.
○ Its assumption of feature independence, and its effectiveness in solving multi-
class problems, makes it perfect for performing Sentiment Analysis.
Sentiment Analysis refers to the identification of positive or negative
sentiments of a target group (customers, audience, etc.).
○ Collaborative Filtering and the Naive Bayes algorithm work together to build
recommendation systems. These systems use data mining and machine
learning to predict whether the user would like a particular resource or not.
CONVOLUTIONAL NEURAL NETWORKS
Convolutional Neural Networks are a special type of feed-forward artificial neural network in
which the connectivity pattern between neurons is inspired by the visual cortex. The visual
cortex contains small regions of cells that are sensitive to specific parts of the visual field.
Individual neuronal cells in the brain fire only when certain oriented edges are present: some
neurons respond when exposed to vertical edges, while others respond to horizontal or diagonal
edges. This behavior is the motivation behind Convolutional Neural Networks.
Convolutional Neural Networks, also called ConvNets, are neural networks that share their
parameters. Suppose there is an image, represented as a cuboid with a length, a width, and a
depth, where the depth is given by the Red, Green, and Blue channels, as shown in the image
given below.
Now assume that we take a small patch of this image and run a small neural network on it,
with k outputs, represented vertically. When we slide this small neural network over the whole
image, the result is another image with a different width, height, and depth. Instead of just the
R, G, and B channels, we now have more channels, though with smaller width and height; this is
the concept of Convolution. If the patch size were the same as the image size, it would be a
regular neural network. Because of this small patch, we have fewer weights.
Working of CNN
Generally, a Convolutional Neural Network has three layers, which are as follows;
Convolution layer :
Convolution layer is the first layer to extract features from an input image. By learning image
features using a small square of input data, the convolutional layer preserves the relationship
between pixels. It is a mathematical operation which takes two inputs such as image matrix and a
kernel or filter.
o The dimension of the image matrix is h×w×d.
o The dimension of the filter is fh×fw×d.
o The dimension of the output is (h-fh+1)×(w-fw+1)×1.
Let's start by considering a 5×5 image whose pixel values are 0 and 1, and a 3×3 filter
matrix.
The convolution of the 5×5 image matrix with the 3×3 filter matrix is called a "Feature Map"
and is shown as an output.
Convolution of an image with different filters can perform operations such as blurring,
sharpening, and edge detection.
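A minimal sketch of this "valid" convolution in plain Python; the 5×5 binary image and the 3×3 filter below are illustrative values, and (as in most CNN libraries) the code actually computes cross-correlation:

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding.
    Output size is (h - fh + 1) x (w - fw + 1)."""
    h, w = len(image), len(image[0])
    fh, fw = len(kernel), len(kernel[0])
    out = []
    for i in range(h - fh + 1):
        row = []
        for j in range(w - fw + 1):
            # Element-wise product of the patch and the kernel, then sum.
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(fh) for dj in range(fw)))
        out.append(row)
    return out

image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]
fmap = conv2d_valid(image, kernel)
print(len(fmap), len(fmap[0]))  # 3 3 -> (5-3+1) x (5-3+1)
```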
Strides : The stride is the number of pixels by which the filter shifts over the input matrix.
When the stride is equal to 1, we move the filter 1 pixel at a time; similarly, when the stride
is equal to 2, we move the filter 2 pixels at a time. The following figure shows how the
convolution works with a stride of 2.
Padding : Padding plays a crucial role in building a convolutional neural network. Each
convolution shrinks the image, so a network with hundreds of layers would leave us with a
very small image after all the filtering. To overcome this, we introduce padding to the
image. "Padding is an additional layer of pixels added to the border of an image."
Pooling Layer
The pooling layer plays an important role in the pre-processing of an image. It reduces the
number of parameters when the images are too large. Pooling is a "downscaling" of the image
obtained from the previous layers; it can be compared to shrinking an image to reduce its pixel
density. Spatial pooling, also called downsampling or subsampling, reduces the
dimensionality of each feature map but retains the important information. There are the
following types of spatial pooling:
Max Pooling : Max pooling is a sample-based discretization process. Its main objective is to
downscale an input representation, reducing its dimensionality and allowing assumptions
to be made about the features contained in the binned sub-regions. Max pooling is done by
applying a max filter to non-overlapping sub-regions of the initial representation.
Average Pooling : Downscaling is performed through average pooling by dividing the input
into rectangular pooling regions and computing the average value of each region.
Sum Pooling : The sub-regions for sum pooling or mean pooling are set exactly the same as
for max pooling, but instead of using the max function we use the sum or mean.
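Max pooling over non-overlapping sub-regions can be sketched as follows; the 4×4 feature map is an illustrative example:

```python
def max_pool(feature_map, size=2):
    """size x size max pooling over non-overlapping sub-regions (stride = size)."""
    h, w = len(feature_map), len(feature_map[0])
    return [[max(feature_map[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, w - size + 1, size)]
            for i in range(0, h - size + 1, size)]

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 2],
        [7, 2, 9, 1],
        [3, 4, 0, 8]]
print(max_pool(fmap))  # [[6, 4], [7, 9]]: each 2x2 block keeps its maximum
```

Swapping `max` for `sum`, or dividing the sum by `size * size`, gives sum pooling and average pooling respectively.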
The fully connected layer is a layer in which the input from the other layers is flattened into
a vector and passed on. It transforms the output into the number of classes desired by the network.
In the above diagram, the feature map matrix will be converted into the vector such as x1, x2,
x3... xn with the help of fully connected layers. We will combine features to create a model and
apply the activation function such as softmax or sigmoid to classify the outputs as a car, dog,
truck, etc.
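The softmax step mentioned above can be sketched on its own: it exponentiates the fully connected layer's raw scores and normalizes them into class probabilities (the score values are illustrative):

```python
import math

def softmax(z):
    """Turn raw class scores into a probability distribution."""
    m = max(z)                          # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # e.g. scores for car, dog, truck
print(probs)        # highest probability for the first class
print(sum(probs))   # 1.0 (up to floating-point error)
```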
REGULARIZATION
Regularization refers to techniques that are used to calibrate machine learning models in order to
minimize the adjusted loss function and prevent overfitting or underfitting. Regularization is a technique
that helps prevent overfitting, which occurs when a neural network learns too much from the training data
and fails to generalize well to new data. Convolutional neural networks (CNNs) are a type of neural
network that are especially good at processing images, but they can also suffer from overfitting due to
their high complexity and large number of parameters.
How does regularization work in CNNs?
Regularization is a way of adding some constraints or penalties to the model, so that it does not overfit the
training data. There are different types of regularization methods, but they all aim to reduce the variance
of the model and increase its bias. Variance measures how sensitive the model is to small changes in the
data, while bias measures how far the model is from the true relationship. A good model should have low
variance and low bias, but there is usually a trade-off between them.
Regularization helps find a balance between them by shrinking or pruning the model parameters, adding
noise or dropout to the layers, or augmenting the data with transformations.
What are some common regularization methods for CNNs?
Regularization methods for CNNs are commonly used to reduce overfitting. L1 and L2 regularization,
also known as lasso and ridge regularization (L2 is often called weight decay), add a term to the
loss function that penalizes large weights in the model. L1 regularization tends to drive some
weights to exactly zero, while L2 regularization makes all weights smaller. Dropout is a technique
that randomly drops some units or neurons in the hidden layers during training, preventing
co-adaptation of features. Data augmentation artificially increases the size and diversity of the
training data by applying random transformations, such as flipping, rotating, cropping, scaling, or
adding noise to the images. This helps the model learn from different perspectives and variations of
the data.
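The L2 penalty can be sketched in isolation: the term lambda * Σ w² is added to the data loss, so larger weights make the total loss larger (the weights, loss, and lambda below are arbitrary illustrative numbers):

```python
# L2 (weight-decay) regularization: total loss = data loss + lambda * sum(w^2).
weights = [0.5, -2.0, 1.0]
data_loss = 0.8
lam = 0.1   # regularization strength

l2_penalty = lam * sum(w * w for w in weights)
total_loss = data_loss + l2_penalty
print(round(l2_penalty, 3), round(total_loss, 3))  # 0.525 1.325
```

For L1 regularization, the penalty would instead be `lam * sum(abs(w) for w in weights)`.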
INITIALIZATION
Initialization in the context of neural networks or CNN refers to the process of setting the initial values for
the parameters (weights and biases) of the network before training begins. Proper initialization is crucial
for achieving efficient and stable training, as it can help prevent issues like vanishing gradients, exploding
gradients, and slow convergence.
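One common initialization scheme can be sketched as follows. This is He (Kaiming) initialization, a popular choice for ReLU layers; the layer sizes are illustrative, and the function name is our own:

```python
import math
import random

def he_init(fan_in, n_out, seed=0):
    """He initialization sketch: weights drawn from a zero-mean Gaussian
    with standard deviation sqrt(2 / fan_in), commonly used with ReLU."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(n_out)]

W = he_init(fan_in=256, n_out=128)
print(len(W), len(W[0]))  # 128 256
```

Scaling the initial weights by the fan-in keeps the variance of activations roughly constant from layer to layer, which is exactly what helps against vanishing and exploding gradients.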
Various examples of Convolutional Neural Networks (CNNs) that have been influential in the field of
computer vision:
1. LeNet-5:
● LeNet-5 is one of the earliest CNN architectures developed by Yann LeCun for
handwritten digit recognition. It played a crucial role in demonstrating the effectiveness
of CNNs for image classification.
● It consists of two sets of convolutional and average pooling layers followed by fully
connected layers. It also introduced the concept of using non-linear activation functions
(sigmoid) in convolutional layers.
2. AlexNet:
● AlexNet, developed by Alex Krizhevsky, is a significant milestone in the resurgence of
neural networks. It won the ImageNet Large Scale Visual Recognition Challenge
(ILSVRC) in 2012.
● It features multiple convolutional and max-pooling layers, ReLU activation functions,
and dropout for regularization. It also popularized the use of GPUs for training deep
networks.
3. VGGNet:
● VGGNet, created by the Visual Geometry Group (VGG) at the University of Oxford, is
known for its simplicity and uniform architecture.
● It consists of multiple convolutional layers with small 3x3 filters, followed by
max-pooling layers. The repeated stacking of these layers results in a deep network.
4. GoogLeNet (Inception):
● The GoogLeNet architecture introduced the concept of "inception modules," which use
multiple filter sizes within the same layer to capture features at various scales.
● This design allows the network to efficiently extract both local and global features,
contributing to improved accuracy.
5. ResNet (Residual Network):
● ResNet is a breakthrough architecture that addresses the challenge of training very deep
networks by introducing residual connections.
● Residual connections allow the network to skip certain layers and retain information from
previous layers, combating the vanishing gradient problem.
6. DenseNet:
● DenseNet is designed to improve gradient flow and feature reuse by connecting each
layer to all subsequent layers in a feed-forward manner.
● DenseNet's densely connected blocks lead to more compact models and better feature
propagation.
7. MobileNet:
● MobileNet focuses on efficient model architectures for mobile and embedded devices. It
introduces depthwise separable convolutions to reduce the computational cost.
● Depthwise separable convolutions split the standard convolution into separate depthwise
and pointwise convolutions, significantly reducing the number of parameters.
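A quick parameter count makes the saving concrete (a sketch; bias terms are ignored for simplicity):

```python
def standard_conv_params(k, c_in, c_out):
    # a k x k convolution mapping c_in channels to c_out channels
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # depthwise (one k x k filter per input channel) + pointwise (1x1 across channels)
    return k * k * c_in + c_in * c_out

# e.g. a 3x3 convolution from 64 to 128 channels:
# standard: 3*3*64*128 = 73,728 parameters
# depthwise separable: 3*3*64 + 64*128 = 8,768 parameters
```

For typical layer sizes this is roughly an order-of-magnitude reduction, which is what makes MobileNet-style models practical on constrained hardware.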
8. YOLO (You Only Look Once):
● YOLO is an object detection architecture that performs real-time object detection in a
single pass. It divides the image into a grid and predicts bounding boxes and class
probabilities for each grid cell.
● YOLO's efficiency and speed make it suitable for real-time applications like video
analysis.
9. U-Net:
● U-Net is a CNN architecture designed for semantic segmentation tasks, such as
segmenting objects within an image.
● It has a U-shaped architecture with a contracting path (encoder) and an expanding path
(decoder), allowing it to capture both global and local context.
10. Transformer-based Vision Models (e.g., ViT, DeiT):
● Transformers, initially designed for natural language processing, have been adapted for
computer vision tasks with architectures like Vision Transformers (ViT) and
Data-efficient Image Transformer (DeiT).
● These models use attention mechanisms to capture relationships between different image
patches, eliminating the need for hand-designed convolutional architectures.
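The core attention operation behind these models can be sketched in NumPy (single head, no learned projections or masking, so this is a simplification of real ViT/DeiT code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # each query (e.g. an image patch) attends over all keys;
    # the softmax turns similarity scores into mixing weights
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

Each output row is a convex combination of the value vectors, which is how the model relates every patch to every other patch in a single operation.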
These are just a few examples of CNN architectures, each designed to tackle specific challenges in
computer vision. The field continues to evolve, and researchers are exploring novel architectures to
improve performance, efficiency, and versatility across various tasks.
Over-fitting
Overfitting is a problem in machine learning that occurs when a model learns the training data too
well and is unable to generalize to new data. This happens when the model is too complex and learns
the noise and patterns in the training data that are not relevant to the problem it is trying to solve.
A model that is overfit will perform very well on the training data, but it will not perform as well on
new data. This is because the model has learned the specific details of the training data, and it will
not be able to generalize to new data that has different features or patterns.
There are several ways to reduce the risk of overfitting:
● Use a simpler model. A simpler model is less likely to learn the noise in the training data.
● Use regularization. Regularization is a technique that penalizes the model for complexity.
This can help to prevent the model from learning the noise in the training data.
● Use cross-validation. Cross-validation is a technique for evaluating a model on data that it
has not seen before. This can help to identify if the model is overfitting the training data.
If you are concerned that your model is overfitting, you can try using some of these techniques to
prevent it.
If you are not sure if your model is overfitting, you can try using cross-validation to evaluate it. Cross-
validation will help you to identify if the model is performing well on the training data, but not on
new data. If this is the case, then your model is likely overfitting.
There are a number of techniques that can be used to prevent overfitting. Some of these techniques
include:
● Data augmentation: This involves artificially increasing the size of the training data by
creating new data points that are similar to the existing data points. This can help to prevent
the model from overfitting to the noise in the training data.
● Regularization: This involves adding a penalty to the model's loss function that discourages
the model from becoming too complex. This can help to prevent the model from overfitting
to the training data.
● Early stopping: This involves stopping the training process early, before the model has had a
chance to overfit the training data. This can be done by monitoring the model's performance
on a validation set.
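Early stopping, mentioned above, can be sketched as a loop around two hypothetical hooks supplied by the caller (`run_epoch` and `val_loss` are placeholders for a real training loop, not a library API):

```python
def train_with_early_stopping(run_epoch, val_loss, patience=3, max_epochs=100):
    # run_epoch() trains for one epoch; val_loss() returns the current
    # loss on the held-out validation set
    best, best_epoch, waited = float("inf"), -1, 0
    for epoch in range(max_epochs):
        run_epoch()
        loss = val_loss()
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has stopped improving
    return best_epoch, best
```

The `patience` parameter controls how many non-improving epochs are tolerated before training stops; the model from the best epoch is the one kept.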
Curse of Dimensionality
The curse of dimensionality arises from the fact that the volume of a high-dimensional space
increases exponentially with the number of dimensions. This means that there will be fewer data
points in each dimension, which can make it difficult to find patterns in the data. Additionally, the
distance between two points in high-dimensional space can be misleading, as two points that are far
apart in terms of Euclidean distance may actually be very close together in terms of other measures
of similarity.
The curse of dimensionality can have a number of negative implications for data science. For
example, it can make it difficult to:
● Identify patterns in the data: As the number of dimensions increases, the data becomes
more sparse, which can make it difficult to find patterns in the data.
● Build accurate models: Machine learning models are often trained on labeled data, which
means that they are given the correct labels for the data points. However, in high-dimensional
space, it is more likely that the data points will be mislabeled, which can lead to
inaccurate models.
● Interpret the results of models: The results of machine learning models can be difficult to
interpret in high-dimensional space, as the models may be making decisions based on
features that are not easily understandable.
There are a number of techniques that can be used to mitigate the problems caused by the curse of
dimensionality. Some of these techniques include:
● Dimensionality reduction: This involves reducing the number of dimensions in the data. This
can be done by using techniques such as principal component analysis (PCA) or linear
discriminant analysis (LDA).
● Feature selection: This involves selecting a subset of the features in the data that are most
relevant to the problem being solved. This can help to reduce the dimensionality of the data
and improve the performance of machine learning models.
● Regularization: This involves adding a penalty to the model's loss function that discourages
the model from becoming too complex. This can help to prevent the model from overfitting
to the training data.
● Ensemble learning: This involves training multiple models on different subsets of the data
and then combining the predictions of the models. This can help to improve the robustness
of the model to the curse of dimensionality.
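As a minimal sketch of dimensionality reduction, PCA can be written in a few lines of NumPy (illustrative; real projects typically use a library implementation):

```python
import numpy as np

def pca_reduce(X, k):
    # project X (n_samples, n_features) onto its top-k principal components
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # top-k directions
    return Xc @ top
```

When the data truly lies near a low-dimensional subspace, a small `k` retains almost all of the variance while discarding most of the dimensions.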
By using these techniques, it is possible to build machine learning models that can learn from
high-dimensional data and make accurate predictions.
The curse of dimensionality is an important concept to understand in data science, as it can have a
significant impact on the performance of machine learning models. By understanding the curse of
dimensionality and the techniques that can be used to mitigate its effects, data scientists can build
more accurate and reliable models.
BIAS VARIANCE TRADE-OFF
What is Bias?
Bias is the difference between the values predicted by the Machine Learning model and the
correct values. High bias gives a large error on both training and testing data. It is
recommended that an algorithm always be low-biased to avoid the problem of underfitting. With
high bias, the predicted data follows a straight-line format, thus not fitting the data in the
dataset accurately. Such fitting is known as underfitting of the data. This happens when
the hypothesis is too simple or linear in nature. Refer to the graph given below for
an example of such a situation.
What is Variance?
The variability of model prediction for a given data point which tells us the spread
of our data is called the variance of the model. The model with high variance has a
very complex fit to the training data and thus is not able to fit accurately on the data
which it hasn’t seen before. As a result, such models perform very well on training
data but have high error rates on test data. When a model has high variance, it is
said to overfit the data. Overfitting means fitting the training set accurately via a
complex curve and a high-order hypothesis, but it is not the solution, as the error on
unseen data is high. While training a model, variance should be kept low. High-variance
data looks as follows.
In such a problem, the hypothesis looks as follows.
The best fit will be given by the hypothesis at the trade-off point. The error-versus-complexity
graph showing the trade-off is given below.
This trade-off point is the best point chosen for training the algorithm, which
gives low error on training as well as testing data.
TRAINING SET
A training dataset is a collection of instances used in the learning process to fit the
parameters (e.g., weights) of a classifier.
A supervised learning method for classification tasks examines the training dataset to
discover, or learn, the best combinations of variables that will produce a strong
predictive model.
The goal is to create a fitted model that does a good job of generalizing to new, unknown
data. To estimate the model's accuracy in categorizing fresh data, "new" instances
from the held-out datasets are used to evaluate the fitted model. The examples in
the validation and test datasets should not be utilized to train the model to minimize the
danger of over-fitting.
Most approaches to finding empirical links in training data tend to overfit the data,
which means they can find and exploit apparent links in the training data that don’t
hold in general.
VALIDATION SET
While it is being trained, the model must be assessed regularly, which is exactly what the
validation set is for. We may determine how accurate a model is by computing the loss
it produces on the validation set at any given point. This is what training is all about.
If the most appropriate classifier for the problem is sought, the training data set is used
to train the various candidate classifiers, the validation data set is used
to compare their performances and choose which one to employ, and the test data set is
used to acquire performance characteristics such as F-measure, sensitivity, accuracy, or
specificity.
The validation data set is a hybrid: it is training data that is used for testing, but it is not
included in either the low-level training or the final testing. Early stopping is a
technique in which the candidate models are iterations of the same network, and
training stops when the error on the validation set begins to increase, choosing the previous
model, the one with the least error.
TEST SET
This refers to the model’s final evaluation when the training phase is done. This stage
is crucial for determining the model’s generalization. We can get the working accuracy
of our model by using this collection.
Validation vs. testing in machine learning: validation instructs the model to learn from its
errors, while testing draws conclusions about the model's final performance.
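A minimal sketch of carving a dataset into the three sets discussed above (the function name and default fractions are illustrative choices):

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.2, test_frac=0.2, seed=0):
    # shuffle once, then carve out held-out validation and test portions
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test, val, train = idx[:n_test], idx[n_test:n_test + n_val], idx[n_test + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```

Shuffling before splitting matters: if the data is ordered (e.g. by class), an unshuffled split would give the three sets very different distributions.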
MULTIVARIATE REGRESSION
Multivariate regression is a technique used to measure the degree to which various
independent variables and various dependent variables are linearly related to each
other. The relation is said to be linear due to the correlation between the variables. Once
the multivariate regression is applied to the dataset, this method is then used to predict
the behaviour of the response variable based on its corresponding predictor variables.
An agriculture expert decides to study the crops that were ruined in a certain region. He
collects data about recent climatic changes, water supply, irrigation methods,
pesticide usage, etc., to understand why the crops are turning black, yielding no
fruit, and drying out quickly.
In the above example, the expert decides to collect the mentioned data, which act as the
independent variables. These variables will affect the dependent variables which are
nothing but the conditions of the crops. In such a case, using single regression would be
a bad choice and multivariate regression might just do the trick.
Steps to achieve multivariate regression
The processes involved in multivariate regression analysis include feature selection,
feature engineering, feature normalization, selection of loss functions,
hypothesis analysis, and creating a regression model.
Regression analysis serves several purposes:
● Predictive modelling: Regression analysis can be used to predict future values of the
dependent variable based on changes in the independent variable.
● Causal analysis: Regression analysis can be used to determine whether changes in
the independent variable cause changes in the dependent variable.
● Forecasting: Regression analysis can be used to forecast trends and patterns in the
data, which can be useful for planning and decision-making.
● Control and optimization: Regression analysis can be used to optimize processes
and control systems by identifying the factors that have the greatest impact on the
outcome.
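A minimal multivariate regression can be fit with ordinary least squares in NumPy; here `Y` may contain several dependent variables at once (a sketch, with illustrative function names):

```python
import numpy as np

def fit_multivariate(X, Y):
    # ordinary least squares with an intercept column; when Y has several
    # columns, lstsq returns one coefficient vector per dependent variable
    Xb = np.hstack([np.ones((len(X), 1)), X])
    coef, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return coef

def predict(coef, X):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return Xb @ coef
```

In the crop example, the columns of `X` would hold the climate, water, and pesticide measurements, and the columns of `Y` the crop-condition variables.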
BIAS AND VARIANCE
Bias is the inability of a model to capture the true relationship in the data, because of
which there is some difference or error between the model's predicted value and the actual
value. These differences between actual or expected values and the predicted values
are known as bias error, or error due to bias. Bias is a systematic error that
occurs due to wrong assumptions in the machine learning process.
Let θ be the true value of a parameter and let θ̂ be an estimator of θ based on a sample of
data. Then, the bias of the estimator is given by:
Bias(θ̂) = E[θ̂] - θ
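The definition can be made concrete by simulation; a classic example is the sample-variance estimator, which is biased without Bessel's correction (a sketch; the exact numbers depend on the random seed):

```python
import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0          # variance of N(0, 2)
n, trials = 10, 20000

# plain (ddof=0) estimator is biased; Bessel's correction (ddof=1) removes the bias
biased = np.array([np.var(rng.normal(0, 2, n)) for _ in range(trials)])
unbiased = np.array([np.var(rng.normal(0, 2, n), ddof=1) for _ in range(trials)])

bias_plain = biased.mean() - true_var    # tends to -true_var / n = -0.4
bias_bessel = unbiased.mean() - true_var # tends to 0
```

Averaging the estimator over many samples approximates E[θ̂], so the difference from the true value approximates the bias.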
● Low Bias: A low bias value means fewer assumptions are made to build the
target function. In this case, the model will closely match the training
dataset.
● High Bias: A high bias value means more assumptions are made to build the
target function. In this case, the model will not match the training dataset
closely.
A high-bias model will not be able to capture the dataset's trend. It is considered an
underfitting model, which has a high error rate. This is due to a very simplified
algorithm.
For example, a linear regression model may have a high bias if the data has a non-
linear relationship.
Variance
Variance is the measure of spread in data from its mean position. In machine learning,
variance is the amount by which the performance of a predictive model changes when
it is trained on different subsets of the training data. More specifically, variance
measures how sensitive the model is to a different subset of the training dataset,
i.e., how much it adjusts to a new subset of the training data.
Let Y be the actual values of the target variable, and Ŷ be the predicted values. Then the
variance of a model is the expected value of the squared difference between the predicted
values and the expected value of the predicted values:
Variance = E[(Ŷ - E[Ŷ])²]
where E[Ŷ] is the expected value of the predicted values, averaged over all the training
data.
● Low variance: Low variance means that the model is less sensitive to
changes in the training data and can produce consistent estimates of the
target function with different subsets of data from the same distribution.
Combined with high bias, this is the case of underfitting, where the model
fails to generalize on both training and test data.
● High variance: High variance means that the model is very sensitive to
changes in the training data and can result in significant changes in the
estimate of the target function when trained on different subsets of data
from the same distribution. This is the case of overfitting, where the model
performs well on the training data but poorly on new, unseen test data. It
fits the training data so closely that it fails on new, unseen data.
● High Bias, Low Variance: A model with high bias and low variance is said
to be underfitting.
● High Variance, Low Bias: A model with high variance and low bias is said
to be overfitting.
● High Bias, High Variance: A model with both high bias and high variance
is not able to capture the underlying patterns in the data (high bias) and is
also too sensitive to changes in the training data (high variance). As a
result, the model will produce inconsistent and inaccurate predictions on
average.
● Low Bias, Low Variance: A model with low bias and low variance is able
to capture the underlying patterns in the data (low bias) and is not too
sensitive to changes in the training data (low variance). This is the ideal
scenario for a machine learning model, as it can generalize well to new,
unseen data and produce consistent, accurate predictions. In practice,
however, achieving both perfectly is rarely possible.
Bias Variance Trade-off
If the algorithm is too simple (hypothesis with linear equation) then it may be on high
bias and low variance condition and thus is error prone. If algorithms fit too complex
(hypothesis with high degree equation) then it may be on high variance and low bias.
In the latter condition, the new entries will not perform well. Well, there is something
between both conditions, known as a Trade-off or Bias Variance Trade-off. This trade-
off in complexity is why there is a trade-off between bias and variance. An algorithm
can’t be more complex and less complex at the same time. For the graph, the perfect
trade-off will be like this.
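The trade-off can be demonstrated with a small NumPy experiment that varies model complexity via polynomial degree (illustrative; the exact errors depend on the random seed):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.shape)   # noisy training samples

x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)                       # noiseless ground truth

errors = {}
for degree in (1, 5, 20):
    coeffs = np.polyfit(x, y, degree)       # degree controls model complexity
    pred = np.polyval(coeffs, x_test)
    errors[degree] = np.mean((pred - y_test) ** 2)
# degree 1 underfits (high bias); very high degrees tend to chase the
# noise (high variance); an intermediate degree sits near the trade-off point
```

Running this, the degree-1 line cannot represent the sine wave at all, while a moderate degree tracks it closely; the intermediate model corresponds to the trade-off point on the error-versus-complexity curve.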
Introduction to Deep Learning
Deep learning is a branch of machine learning which is based on artificial neural networks. It
is capable of learning complex patterns and relationships within data. In deep learning, we
don’t need to explicitly program everything. It has become increasingly popular in recent
years due to advances in processing power and the availability of large datasets. It is based
on artificial neural networks (ANNs), also known as deep neural networks (DNNs).
These neural networks are inspired by the structure and function of the human brain’s
biological neurons, and they are designed to learn from large amounts of data.
1. Deep Learning is a subfield of Machine Learning that involves the use of neural
networks to model and solve complex problems. Neural networks are modeled after
the structure and function of the human brain and consist of layers of interconnected
nodes that process and transform data.
2. The key characteristic of Deep Learning is the use of deep neural networks, which
have multiple layers of interconnected nodes. These networks can learn complex
representations of data by discovering hierarchical patterns and features in the data.
Deep Learning algorithms can automatically learn and improve from data without the
need for manual feature engineering.
3. Deep Learning has achieved significant success in various fields, including image
recognition, natural language processing, speech recognition, and recommendation
systems. Some of the popular Deep Learning architectures include Convolutional
Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Deep Belief
Networks (DBNs).
4. Training deep neural networks typically requires a large amount of data and
computational resources. However, the availability of cloud computing and the
development of specialized hardware, such as Graphics Processing Units (GPUs), has
made it easier to train deep neural networks.
In summary, Deep Learning is a subfield of Machine Learning that involves the use of deep
neural networks to model and solve complex problems. Deep Learning has achieved
significant success in various fields, and its use is expected to continue to grow as more data
becomes available, and more powerful computing resources become available.
In a fully connected Deep neural network, there is an input layer and one or more hidden
layers connected one after the other. Each neuron receives input from the previous layer
neurons or the input layer. The output of one neuron becomes the input to other neurons in
the next layer of the network, and this process continues until the final layer produces the
output of the network. The layers of the neural network transform the input data through a
series of nonlinear transformations, allowing the network to learn complex representations of
the input data.
Today Deep learning has become one of the most popular and visible areas of machine
learning, due to its success in a variety of applications, such as computer vision, natural
language processing, and Reinforcement learning.
Deep learning can be used for supervised, unsupervised, as well as reinforcement machine
learning, and it uses a variety of ways to process these.
Artificial neurons, also known as units, are found in artificial neural networks. The whole
Artificial Neural Network is composed of these artificial neurons, which are arranged in a
series of layers. The complexity of a neural network depends on the complexity of the
underlying patterns in the dataset, which determines whether a layer has a dozen units or
millions of units. Commonly, an Artificial Neural Network has an input layer, an output layer, as well as
hidden layers. The input layer receives data from the outside world which the neural network
needs to analyze or learn about.
In a fully connected artificial neural network, there is an input layer and one or more hidden
layers connected one after the other. Each neuron receives input from the previous layer
neurons or the input layer. The output of one neuron becomes the input to other neurons in
the next layer of the network, and this process continues until the final layer produces the
output of the network. Then, after passing through one or more hidden layers, this data is
transformed into valuable data for the output layer. Finally, the output layer provides an
output in the form of an artificial neural network’s response to the data that comes in.
Units are linked to one another from one layer to another in the bulk of neural networks. Each
of these links has weights that control how much one unit influences another. The neural
network learns more and more about the data as it moves from one unit to another, ultimately
producing an output from the output layer.
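The layer-by-layer flow described above can be sketched as a forward pass in NumPy (a minimal illustration with ReLU activations and no training logic):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers):
    # layers is a list of (W, b) pairs; each unit's output feeds the next
    # layer, weighted by the connections, until the final layer's output
    a = x
    for W, b in layers:
        a = relu(a @ W + b)
    return a
```

The weight matrices here play the role of the link weights in the description above: each entry controls how much one unit influences a unit in the next layer.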
Machine learning and deep learning are both subsets of artificial intelligence, and there are
many similarities and differences between them.
Deep Learning models are able to automatically learn features from the data, which makes
them well-suited for tasks such as image recognition, speech recognition, and natural
language processing. The most widely used architectures in deep learning are feedforward
neural networks, convolutional neural networks (CNNs), and recurrent neural networks
(RNNs).
Feedforward neural networks (FNNs) are the simplest type of ANN, with a linear flow of
information through the network. FNNs have been widely used for tasks such as image
classification, speech recognition, and natural language processing.
Convolutional Neural Networks (CNNs) are designed specifically for image and video recognition
tasks. CNNs are able to automatically learn features from the images, which makes them
well-suited for tasks such as image classification, object detection, and image segmentation.
Recurrent Neural Networks (RNNs) are a type of neural network that is able to process
sequential data, such as time series and natural language. RNNs are able to maintain an
internal state that captures information about the previous inputs, which makes them well-
suited for tasks such as speech recognition, natural language processing, and language
translation.
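The internal state that an RNN carries between inputs can be sketched as a single recurrent step plus a loop over a sequence (illustrative only; real RNNs are trained with backpropagation through time):

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, bh):
    # one recurrent step: the new hidden state mixes the current input
    # with the internal state carried over from previous inputs
    return np.tanh(x_t @ Wxh + h_prev @ Whh + bh)

def run_rnn(xs, Wxh, Whh, bh):
    h = np.zeros(Whh.shape[0])
    for x_t in xs:                 # process the sequence one element at a time
        h = rnn_step(x_t, h, Wxh, Whh, bh)
    return h
```

The final hidden state summarizes the whole sequence, which is what makes the same mechanism usable for speech, text, and time series.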
Computer vision
In computer vision, Deep learning models can enable machines to identify and understand
visual data. Some of the main applications of deep learning in computer vision include:
• Object detection and recognition: Deep learning model can be used to identify and
locate objects within images and videos, making it possible for machines to perform
tasks such as self-driving cars, surveillance, and robotics.
• Image classification: Deep learning models can be used to classify images into
categories such as animals, plants, and buildings. This is used in applications such as
medical imaging, quality control, and image retrieval.
• Image segmentation: Deep learning models can be used for image segmentation into
different regions, making it possible to identify specific features within images.
Natural language processing (NLP)
In NLP, deep learning models can enable machines to understand and generate human
language. Some of the main applications of deep learning in NLP include:
• Automatic Text Generation – Deep learning model can learn the corpus of text and
new text like summaries, essays can be automatically generated using these trained
models.
• Language translation: Deep learning models can translate text from one language to
another, making it possible to communicate with people from different linguistic
backgrounds.
• Sentiment analysis: Deep learning models can analyze the sentiment of a piece of
text, making it possible to determine whether the text is positive, negative, or neutral.
This is used in applications such as customer service, social media monitoring, and
political analysis.
• Speech recognition: Deep learning models can recognize and transcribe spoken
words, making it possible to perform tasks such as speech-to-text conversion, voice
search, and voice-controlled devices.
Challenges of deep learning:
1. Data availability: Deep learning requires large amounts of data to learn from, and
gathering enough data for training is a major concern.
2. Computational resources: Training a deep learning model is computationally
expensive because it requires specialized hardware like GPUs and TPUs.
3. Time-consuming: Depending on the computational resources, training (especially
on sequential data) can take a very long time, even days or months.
4. Interpretability: Deep learning models are complex and work like a black box,
making it very difficult to interpret their results.
5. Overfitting: When the model is trained again and again, it becomes too specialized
to the training data, leading to overfitting and poor performance on new data.