Deep Learning
From Wikipedia, the free encyclopedia
Deep learning (also known as deep structured learning, hierarchical learning or deep machine learning) is a class of machine learning algorithms that:[1](pp. 199-200)

use a cascade of many layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. The algorithms may be supervised or unsupervised and applications include pattern analysis (unsupervised) and classification (supervised).
are based on the (unsupervised) learning of multiple levels of features or representations of the data. Higher-level features are derived from lower-level features to form a hierarchical representation.
are part of the broader machine learning field of learning representations of data.
learn multiple levels of representations that correspond to different levels of abstraction; the levels form a hierarchy of concepts.

In a simple case, there might be two sets of neurons: one set that receives an input signal and one that sends an output signal. When the input layer receives an input it passes on a modified version of the input to the next layer. In a deep network, there are many layers between the input and the output (and the layers are not made of neurons but it can help to think of it that way), allowing the algorithm to use multiple processing layers, composed of multiple linear and nonlinear transformations.[2][1][3][4][5][6][7][8][9]
Deep learning is part of a broader family of machine learning methods based on learning representations of data. An observation (e.g., an image) can be represented in many ways such as a vector of intensity values per pixel, or in a more abstract way as a set of edges, regions of particular shape, etc. Some representations are better than others at simplifying the learning task (e.g., face recognition or facial expression recognition[10]). One of the promises of deep learning is replacing handcrafted features with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction.[11]

Research in this area attempts to make better representations and create models to learn these representations from large-scale unlabeled data. Some of the representations are inspired by advances in neuroscience and are loosely based on interpretation of information processing and communication patterns in a nervous system, such as neural coding which attempts to define a relationship between various stimuli and associated neuronal responses in the brain.[12]

Various deep learning architectures such as deep neural networks, convolutional deep neural networks, deep belief networks and recurrent neural networks have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition and bioinformatics where they have been shown to produce state-of-the-art results on various tasks.

Deep learning has been characterized as a buzzword, or a rebranding of neural networks.[13][14]
Contents
1 Introduction
1.1 Definitions
1.2 Fundamental concepts
2 Interpretations
2.1 Universal approximation theorem interpretation
2.2 Probabilistic interpretation
3 History
4 Artificial neural networks
5 Deep neural network architectures
5.1 Brief discussion of deep neural networks
5.1.1 Backpropagation
5.1.2 Problems with deep neural networks
5.2 First deep learning networks of 1965: GMDH
5.3 Convolutional neural networks
5.4 Neural history compressor
5.5 Recursive neural networks
5.6 Long short-term memory
5.7 Deep belief networks
5.8 Convolutional deep belief networks
5.9 Large memory storage and retrieval neural networks
5.10 Deep Boltzmann machines
5.11 Stacked (denoising) auto-encoders
5.12 Deep stacking networks
5.13 Tensor deep stacking networks
5.14 Spike-and-slab RBMs
5.15 Compound hierarchical-deep models
5.16 Deep coding networks
5.17 Deep Q-networks
5.18 Networks with separate memory structures
5.18.1 LSTM-related differentiable memory structures
5.18.2 Semantic hashing
5.18.3 Neural Turing machines
5.18.4 Memory networks
5.18.5 Pointer networks
5.18.6 Encoder-decoder networks
6 Other architectures
6.1 Multilayer kernel machine
7 Applications
7.1 Automatic speech recognition
7.2 Image recognition
7.3 Natural language processing
7.4 Drug discovery and toxicology
7.5 Customer relationship management
7.6 Recommendation systems
7.7 Biomedical informatics
8 Theories of the human brain
9 Commercial activities
10 Criticism and comment
11 Software libraries
12 See also
13 References
14 External links
Introduction

Definitions

Deep learning is a class of machine learning algorithms that[1](pp. 199-200)

use a cascade of many layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. The algorithms may be supervised or unsupervised and applications include pattern analysis (unsupervised) and classification (supervised).
are based on the (unsupervised) learning of multiple levels of features or representations of the data. Higher-level features are derived from lower-level features to form a hierarchical representation.
are part of the broader machine learning field of learning representations of data.
learn multiple levels of representations that correspond to different levels of abstraction; the levels form a hierarchy of concepts.

These definitions have in common (1) multiple layers of nonlinear processing units and (2) the supervised or unsupervised learning of feature representations in each layer, with the layers forming a hierarchy from low-level to high-level features.[1](p. 200) The composition of a layer of nonlinear processing units used in a deep learning algorithm depends on the problem to be solved. Layers that have been used in deep learning include hidden layers of an artificial neural network and sets of complicated propositional formulas.[3] They may also include latent variables organized layer-wise in deep generative models such as the nodes in Deep Belief Networks and Deep Boltzmann Machines.

Deep learning algorithms transform their inputs through more layers than shallow learning algorithms. At each layer, the signal is transformed by a processing unit, like an artificial neuron, whose parameters are 'learned' through training.[5](p. 6) A chain of transformations from input to output is a credit assignment path (CAP). CAPs describe potentially causal connections between input and output and may vary in length: for a feedforward neural network, the depth of the CAPs (thus of the network) is the number of hidden layers plus one (as the output layer is also parameterized), but for recurrent neural networks, in which a signal may propagate through a layer more than once, the CAP is potentially unlimited in length. There is no universally agreed-upon threshold of depth dividing shallow learning from deep learning, but most researchers in the field agree that deep learning has multiple nonlinear layers (CAP > 2) and Jürgen Schmidhuber considers CAP > 10 to be very deep learning.[5](p. 7)
Fundamental concepts

Deep learning algorithms are based on distributed representations. The underlying assumption behind distributed representations is that observed data are generated by the interactions of factors organized in layers. Deep learning adds the assumption that these layers of factors correspond to levels of abstraction or composition. Varying numbers of layers and layer sizes can be used to provide different amounts of abstraction.[4]

Deep learning exploits this idea of hierarchical explanatory factors, where higher-level, more abstract concepts are learned from the lower-level ones. These architectures are often constructed with a greedy layer-by-layer method. Deep learning helps to disentangle these abstractions and pick out which features are useful for learning.[4]

For supervised learning tasks, deep learning methods obviate feature engineering by translating the data into compact intermediate representations akin to principal components, and deriving layered structures that remove redundancy in the representation.[1]

Many deep learning algorithms are applied to unsupervised learning tasks. This is an important benefit because unlabeled data are usually more abundant than labeled data. Examples of deep structures that can be trained in an unsupervised manner are neural history compressors[15] and deep belief networks.[4][16]
Interpretations

Deep neural networks are generally interpreted in terms of the universal approximation theorem[17][18][19][20][21] or probabilistic inference.[1][3][4][5][16][22]

Universal approximation theorem interpretation

The universal approximation theorem concerns the capacity of feedforward neural networks with a single hidden layer of finite size to approximate continuous functions.[17][18][19][20][21] In 1989, the first proof was published by George Cybenko for sigmoid activation functions[18] and was generalised to feed-forward multi-layer architectures in 1991 by Kurt Hornik.[19]
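In Cybenko's setting, the theorem can be stated concretely as follows (a standard statement paraphrased here for reference; the article itself gives no formula): for any continuous function $f$ on $[0,1]^n$ and any $\varepsilon > 0$, there exist a width $N$ and parameters $v_i$, $w_i$, $b_i$ such that

$\left| \sum_{i=1}^{N} v_i \, \sigma(w_i^{\top} x + b_i) - f(x) \right| < \varepsilon \quad \text{for all } x \in [0,1]^n,$

where $\sigma$ is a fixed sigmoidal activation function.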
Probabilistic interpretation

The probabilistic interpretation[22] derives from the field of machine learning. It features inference,[1][3][4][5][16][22] as well as the optimization concepts of training and testing, related to fitting and generalization respectively. More specifically, the probabilistic interpretation considers the activation nonlinearity as a cumulative distribution function.[22] See Deep belief network. The probabilistic interpretation led to the introduction of dropout as regularizer in neural networks.[23]

The probabilistic interpretation was introduced and popularized by Geoff Hinton, Yoshua Bengio, Yann LeCun and Jürgen Schmidhuber.
History

The first general, working learning algorithm for supervised deep feedforward multilayer perceptrons was published by Ivakhnenko and Lapa in 1965.[24] A 1971 paper[25] described a deep network with 8 layers trained by the Group method of data handling algorithm which is still popular in the current millennium. These ideas were implemented in a computer identification system "Alpha", which demonstrated the learning process. Other Deep Learning working architectures, specifically those built from artificial neural networks (ANN), date back to the Neocognitron introduced by Kunihiko Fukushima in 1980.[26] The ANNs themselves date back even further. The challenge was how to train networks with multiple layers. In 1989, Yann LeCun et al. were able to apply the standard backpropagation algorithm, which had been around as the reverse mode of automatic differentiation since 1970,[27][28][29][30] to a deep neural network with the purpose of recognizing handwritten ZIP codes on mail. Despite the success of applying the algorithm, the time to train the network on this data set was approximately 3 days, making it impractical for general use.[31]

In 1993, Jürgen Schmidhuber's neural history compressor,[15] implemented as an unsupervised stack of recurrent neural networks (RNNs), solved a "Very Deep Learning" task[5] that requires more than 1,000 subsequent layers in an RNN unfolded in time.[32] In 1994, André C. P. L. F. de Carvalho, together with Mike C. Fairhurst and David Bisset, published a work with experimental results of a several-layers boolean neural network, also known as a weightless neural network, composed of two modules: a self-organising feature extraction neural network module followed by a classification neural network module, which were independently and sequentially trained.[33] In 1995, Brendan Frey demonstrated that it was possible to train a network containing six fully connected layers and several hundred hidden units using the wake-sleep algorithm, which was co-developed with Peter Dayan and Geoffrey Hinton.[34] However, training took two days. Many factors contribute to the slow speed, one being the vanishing gradient problem analyzed in 1991 by Sepp Hochreiter.[35][36]
While by 1991 such neural networks were used for recognizing isolated 2-D handwritten digits, recognizing 3-D objects was done by matching 2-D images with a handcrafted 3-D object model. Juyang Weng et al. suggested that a human brain does not use a monolithic 3-D object model, and in 1992 they published Cresceptron,[37][38][39] a method for performing 3-D object recognition directly from cluttered scenes. Cresceptron is a cascade of layers similar to Neocognitron. But while Neocognitron required a human programmer to hand-merge features, Cresceptron automatically learned an open number of unsupervised features in each layer, where each feature is represented by a convolution kernel. Cresceptron also segmented each learned object from a cluttered scene through back-analysis through the network. Max pooling, now often adopted by deep neural networks (e.g. ImageNet tests), was first used in Cresceptron to reduce the position resolution by a factor of (2x2) to 1 through the cascade for better generalization. Despite these advantages, simpler models that use task-specific handcrafted features such as Gabor filters and support vector machines (SVMs) were a popular choice in the 1990s and 2000s, because of the computational cost of ANNs at the time, and a great lack of understanding of how the brain autonomously wires its biological networks.
In the long history of speech recognition, both shallow and deep learning (e.g., recurrent nets) of artificial neural networks have been explored for many years.[40][41][42] But these methods never won over the non-uniform internal-handcrafting Gaussian mixture model/Hidden Markov model (GMM-HMM) technology based on generative models of speech trained discriminatively.[43] A number of key difficulties have been methodologically analyzed, including gradient diminishing[35] and weak temporal correlation structure in the neural predictive models.[44][45] Additional difficulties were the lack of big training data and weaker computing power in these early days. Thus, most speech recognition researchers who understood such barriers moved away from neural nets to pursue generative modeling. An exception was at SRI International in the late 1990s. Funded by the US government's NSA and DARPA, SRI conducted research on deep neural networks in speech and speaker recognition. The speaker recognition team, led by Larry Heck (https://www.linkedin.com/in/larryheck), achieved the first significant success with deep neural networks in speech processing as demonstrated in the 1998 NIST (National Institute of Standards and Technology) Speaker Recognition evaluation (http://www.nist.gov/itl/iad/mig/sre.cfm) and later published in the journal of Speech Communication.[46] While SRI established success with deep neural networks in speaker recognition, they were unsuccessful in demonstrating similar success in speech recognition. Hinton et al. and Deng et al. reviewed part of this recent history about how their collaboration with each other and then with colleagues across four groups (University of Toronto, Microsoft, Google, and IBM) ignited a renaissance of deep feedforward neural networks in speech recognition.[47][48][49][50]
Today, however, many aspects of speech recognition have been taken over by a deep learning method called Long short-term memory (LSTM), a recurrent neural network published by Sepp Hochreiter & Jürgen Schmidhuber in 1997.[51] LSTM RNNs avoid the vanishing gradient problem and can learn "Very Deep Learning" tasks[5] that require memories of events that happened thousands of discrete time steps ago, which is important for speech. In 2003, LSTM started to become competitive with traditional speech recognizers on certain tasks.[52] Later it was combined with CTC[53] in stacks of LSTM RNNs.[54] In 2015, Google's speech recognition reportedly experienced a dramatic performance jump of 49% through CTC-trained LSTM, which is now available through Google Voice Search to all smartphone users,[55] and has become a showcase of deep learning.

The use of the expression "Deep Learning" in the context of Artificial Neural Networks was introduced by Igor Aizenberg and colleagues in 2000.[56] A Google Ngram chart shows that the usage of the term has gained traction (actually has taken off) since 2000.[57] In 2006, a publication by Geoffrey Hinton and Ruslan Salakhutdinov drew additional attention by showing how many-layered feedforward neural networks could be effectively pre-trained one layer at a time, treating each layer in turn as an unsupervised restricted Boltzmann machine, then fine-tuning it using supervised backpropagation.[58] In 1992, Schmidhuber had already implemented a very similar idea for the more general case of unsupervised deep hierarchies of recurrent neural networks, and also experimentally shown its benefits for speeding up supervised learning.[15][59]
Since its resurgence, deep learning has become part of many state-of-the-art systems in various disciplines, particularly computer vision and automatic speech recognition (ASR). Results on commonly used evaluation sets such as TIMIT (ASR) and MNIST (image classification), as well as a range of large-vocabulary speech recognition tasks, are constantly being improved with new applications of deep learning.[47][60][61] Recently, it was shown that deep learning architectures in the form of convolutional neural networks have been nearly best performing;[62][63] however, these are more widely used in computer vision than in ASR, and modern large-scale speech recognition is typically based on CTC[53] for LSTM.[51][55][64][65][66]

The real impact of deep learning in industry apparently began in the early 2000s, when CNNs already processed an estimated 10% to 20% of all the checks written in the US, according to Yann LeCun.[67]
Industrial applications of deep learning to large-scale speech recognition started around 2010. In late 2009, Li Deng invited Geoffrey Hinton to work with him and colleagues at Microsoft Research in Redmond, Washington to apply deep learning to speech recognition. They co-organized the 2009 NIPS Workshop on Deep Learning for Speech Recognition. The workshop was motivated by the limitations of deep generative models of speech, and the possibility that the big-compute, big-data era warranted a serious try of deep neural nets (DNN). It was believed that pre-training DNNs using generative models of deep belief nets (DBN) would overcome the main difficulties of neural nets encountered in the 1990s.[49] However, early into this research at Microsoft, it was discovered that, without pre-training but using large amounts of training data, and especially DNNs designed with correspondingly large, context-dependent output layers, error rates emerged that were dramatically lower than the then state-of-the-art GMM-HMM and also lower than those of more advanced generative-model-based speech recognition systems. This finding was verified by several other major speech recognition research groups.[47][68] Further, the nature of recognition errors produced by the two types of systems was found to be characteristically different,[48][69] offering technical insights into how to integrate deep learning into the existing highly efficient, run-time speech decoding system deployed by all major players in the speech recognition industry. The history of this significant development in deep learning has been described and analyzed in recent books and articles.[1][70][71]
Advances in hardware have also been important in enabling the renewed interest in deep learning. In 2009, Nvidia was involved in what was called the "big bang" of deep learning, as deep-learning neural networks were combined with Nvidia graphics processing units (GPUs).[72] That year, Google Brain used Nvidia GPUs to create Deep Neural Networks capable of machine learning, where Andrew Ng determined that GPUs could increase the speed of deep-learning systems by about 100 times.[73] In particular, powerful graphics processing units (GPUs) are well-suited for the kind of number crunching, matrix/vector math involved in machine learning.[74][75] GPUs have been shown to speed up training algorithms by orders of magnitude, bringing running times of weeks back to days.[76][77] Specialized hardware and algorithm optimizations can also be used for efficient processing of DNNs.[78]
Artificial neural networks

Some of the most successful deep learning methods involve artificial neural networks. Artificial neural networks are inspired by the 1959 biological model proposed by Nobel laureates David H. Hubel & Torsten Wiesel, who found two types of cells in the primary visual cortex: simple cells and complex cells. Many artificial neural networks can be viewed as cascading models[37][38][39][79] of cell types inspired by these biological observations. Fukushima's Neocognitron introduced convolutional neural networks partially trained by unsupervised learning with human-directed features in the neural plane. Yann LeCun et al. (1989) applied supervised backpropagation to such architectures.[80] Weng et al. (1992) published convolutional neural networks Cresceptron[37][38][39] for 3-D object recognition from images of cluttered scenes and segmentation of such objects from images.
An obvious requirement for recognizing general 3-D objects is, at the least, shift invariance and tolerance to deformation. Max pooling appeared to be first proposed by Cresceptron[37][38] to enable the network to tolerate small-to-large deformation in a hierarchical way, while using convolution. Max pooling helps, but does not guarantee, shift-invariance at the pixel level.[39]
With the advent of the backpropagation algorithm based on automatic differentiation,[27][29][30][81][82][83][84][85][86][87] many researchers tried to train supervised deep artificial neural networks from scratch, initially with little success. Sepp Hochreiter's diploma thesis of 1991[35][36] formally identified the reason for this failure as the vanishing gradient problem, which affects many-layered feedforward networks and recurrent neural networks. Recurrent networks are trained by unfolding them into very deep feedforward networks, where a new layer is created for each time step of an input sequence processed by the network. As errors propagate from layer to layer, they shrink exponentially with the number of layers, impeding the tuning of neuron weights which is based on those errors.

To overcome this problem, several methods were proposed. One is Jürgen Schmidhuber's multi-level hierarchy of networks (1992), pre-trained one level at a time by unsupervised learning and fine-tuned by backpropagation.[15] Here each level learns a compressed representation of the observations that is fed to the next level.

Another method is the long short-term memory (LSTM) network of Hochreiter & Schmidhuber (1997).[51] In 2009, deep multidimensional LSTM networks won three ICDAR 2009 competitions in connected handwriting recognition, without any prior knowledge about the three languages to be learned.[88][89]
Sven Behnke in 2003 relied only on the sign of the gradient (Rprop) when training his Neural Abstraction Pyramid[90] to solve problems like image reconstruction and face localization.

Other methods also use unsupervised pre-training to structure a neural network, making it first learn generally useful feature detectors. Then the network is trained further by supervised backpropagation to classify labeled data. The deep model of Hinton et al. (2006) involves learning the distribution of a high-level representation using successive layers of binary or real-valued latent variables. It uses a restricted Boltzmann machine (Smolensky, 1986[91]) to model each new layer of higher-level features. Each new layer guarantees an increase on the lower bound of the log likelihood of the data, thus improving the model, if trained properly. Once sufficiently many layers have been learned, the deep architecture may be used as a generative model by reproducing the data when sampling down the model (an "ancestral pass") from the top-level feature activations.[92] Hinton reports that his models are effective feature extractors over high-dimensional, structured data.[93]

In 2012, the Google Brain team led by Andrew Ng and Jeff Dean created a neural network that learned to recognize higher-level concepts, such as cats, only from watching unlabeled images taken from YouTube videos.[94][95]
Other methods rely on the sheer processing power of modern computers, in particular, GPUs. In 2010, Dan Ciresan and colleagues[76] in Jürgen Schmidhuber's group at the Swiss AI Lab IDSIA showed that despite the above-mentioned "vanishing gradient problem," the superior processing power of GPUs makes plain backpropagation feasible for deep feedforward neural networks with many layers. The method outperformed all other machine learning techniques on the old, famous MNIST handwritten digits problem of Yann LeCun and colleagues at NYU.

At about the same time, in late 2009, deep learning feedforward networks made inroads into speech recognition, as marked by the NIPS Workshop on Deep Learning for Speech Recognition. Intensive collaborative work between Microsoft Research and University of Toronto researchers demonstrated by mid-2010 in Redmond that deep neural networks interfaced with a hidden Markov model, with context-dependent states that define the neural network output layer, can drastically reduce errors in large-vocabulary speech recognition tasks such as voice search. The same deep neural net model was shown to scale up to Switchboard tasks about one year later at Microsoft Research Asia. Even earlier, in 2007, LSTM[51] trained by CTC[53] started to get excellent results in certain applications.[54] This method is now widely used, for example, in Google's greatly improved speech recognition for all smartphone users.[55]
As of 2011, the state of the art in deep learning feedforward networks alternates convolutional layers and max-pooling layers,[96][97] topped by several fully connected or sparsely connected layers followed by a final classification layer. Training is usually done without any unsupervised pre-training. Since 2011, GPU-based implementations[96] of this approach won many pattern recognition contests, including the IJCNN 2011 Traffic Sign Recognition Competition,[98] the ISBI 2012 Segmentation of neuronal structures in EM stacks challenge,[99] the ImageNet Competition,[100] and others.

Such supervised deep learning methods also were the first artificial pattern recognizers to achieve human-competitive performance on certain tasks.[101]
To overcome the barriers of weak AI represented by deep learning, it is necessary to go beyond deep learning architectures, because biological brains use both shallow and deep circuits as reported by brain anatomy,[102] displaying a wide variety of invariance. Weng[103] argued that the brain self-wires largely according to signal statistics and, therefore, a serial cascade cannot catch all major statistical dependencies. ANNs were able to guarantee shift invariance to deal with small and large natural objects in large cluttered scenes only when invariance extended beyond shift to all ANN-learned concepts, such as location, type (object class label), scale, and lighting. This was realized in Developmental Networks (DNs),[104] whose embodiments are Where-What Networks, WWN-1 (2008)[105] through WWN-7 (2013).[106]
Deep neural network architectures

There are huge numbers of variants of deep architectures. Most of them branch from some original parent architecture. It is not always possible to compare the performance of multiple architectures together, because they are not all evaluated on the same data sets. Deep learning is a fast-growing field, and new architectures, variants, or algorithms appear every few weeks.
Brief discussion of deep neural networks

A deep neural network (DNN) is an artificial neural network (ANN) with multiple hidden layers of units between the input and output layers.[3][5] Similar to shallow ANNs, DNNs can model complex non-linear relationships. DNN architectures, e.g., for object detection and parsing, generate compositional models where the object is expressed as a layered composition of image primitives.[107] The extra layers enable composition of features from lower layers, giving the potential of modeling complex data with fewer units than a similarly performing shallow network.[3]

DNNs are typically designed as feedforward networks, but research has very successfully applied recurrent neural networks, especially LSTM,[51][108] for applications such as language modeling.[109][110][111][112][113] Convolutional deep neural networks (CNNs) are used in computer vision where their success is well-documented.[114] CNNs also have been applied to acoustic modeling for automatic speech recognition (ASR), where they have shown success over previous models.[63] For simplicity, a look at training DNNs is given here.
Backpropagation

A DNN can be discriminatively trained with the standard backpropagation algorithm. According to various sources,[5][8][87][115] basics of continuous backpropagation were derived in the context of control theory by Henry J. Kelley[82] in 1960 and by Arthur E. Bryson in 1961,[83] using principles of dynamic programming. In 1962, Stuart Dreyfus published a simpler derivation based only on the chain rule.[84] Vapnik cites reference[116] in his book on Support Vector Machines. Arthur E. Bryson and Yu-Chi Ho described it as a multi-stage dynamic system optimization method in 1969.[117][118] In 1970, Seppo Linnainmaa finally published the general method for automatic differentiation (AD) of discrete connected networks of nested differentiable functions.[27][119] This corresponds to the modern version of backpropagation which is efficient even when the networks are sparse.[5][8][28][81] In 1973, Stuart Dreyfus used backpropagation to adapt parameters of controllers in proportion to error gradients.[85] In 1974, Paul Werbos mentioned the possibility of applying this principle to artificial neural networks,[120] and in 1982, he applied Linnainmaa's AD method to neural networks in the way that is widely used today.[8][30] In 1986, David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams showed through computer experiments that this method can generate useful internal representations of incoming data in hidden layers of neural networks.[86] In 1993, Eric A. Wan was the first[5] to win an international pattern recognition contest through backpropagation.[121]
The weight updates of backpropagation can be done via stochastic gradient descent using the following equation:

$\Delta w_{ij}(t+1) = \Delta w_{ij}(t) + \eta \frac{\partial C}{\partial w_{ij}}$

Here, $\eta$ is the learning rate and $C$ is the cost function. The choice of the cost function depends on factors such as the learning type (supervised, unsupervised, reinforcement, etc.) and the activation function. For example, when performing supervised learning on a multiclass classification problem, common choices for the activation function and cost function are the softmax function and cross-entropy function, respectively. The softmax function is defined as $p_j = \frac{\exp(x_j)}{\sum_k \exp(x_k)}$, where $p_j$ represents the class probability (output of unit $j$) and $x_j$ and $x_k$ represent the total input to units $j$ and $k$ of the same level respectively. Cross-entropy is defined as $C = -\sum_j d_j \log(p_j)$, where $d_j$ represents the target probability for output unit $j$ and $p_j$ is the probability output for $j$ after applying the activation function.[122]
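As a minimal illustration of the update rule above (our own sketch, not code from the article; names such as learning_rate and the layer sizes are illustrative), a single stochastic-gradient step for a softmax output layer trained with cross-entropy can be written in NumPy as:

    # Minimal sketch: one SGD step for a softmax layer with cross-entropy.
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())                  # subtract max for numerical stability
        return e / e.sum()

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(10, 784))   # 10 classes, 784 inputs (illustrative)
    learning_rate = 0.1

    x = rng.random(784)                          # one input vector
    d = np.zeros(10); d[3] = 1.0                 # one-hot target probabilities d_j

    p = softmax(W @ x)                           # class probabilities p_j
    loss = -np.sum(d * np.log(p))                # cross-entropy C = -sum_j d_j log p_j
    W -= learning_rate * np.outer(p - d, x)      # dC/dW = (p - d) x^T for this pairing

The gradient simplifies to $(p_j - d_j)x$ precisely because softmax and cross-entropy are paired, which is one reason this combination is a common choice.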
DNNs can also be used to output object bounding boxes in the form of a binary mask, and for multi-scale regression to increase localization precision. DNN-based regression can learn features that capture geometric information in addition to serving as a good classifier. Such models remove the limitation of designing a model which captures parts and their relations explicitly, which helps to learn a wide variety of objects. The model consists of multiple layers, each of which has a rectified linear unit for its nonlinear transformation. Some layers are convolutional, while others are fully connected. Every convolutional layer has an additional max-pooling step. The network is trained to minimize the L2 error for predicting the mask over the entire training set containing bounding boxes represented as masks.
Problems with deep neural networks

As with ANNs, many issues can arise with DNNs if they are naively trained. Two common issues are overfitting and computation time.

DNNs are prone to overfitting because of the added layers of abstraction, which allow them to model rare dependencies in the training data. Regularization methods such as Ivakhnenko's unit pruning[25] or weight decay ($\ell_2$-regularization) or sparsity ($\ell_1$-regularization) can be applied during training to help combat overfitting.[123] A more recent regularization method applied to DNNs is dropout regularization. In dropout, some number of units are randomly omitted from the hidden layers during training. This helps to break the rare dependencies that can occur in the training data.[124]
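A minimal sketch of dropout (our illustration; the cited papers describe the method, not this code) shows how units are randomly omitted during training and how activations are rescaled so that expected values match at test time:

    # Minimal dropout sketch: drop each hidden unit with probability p.
    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(h, p=0.5, training=True):
        if not training:
            return h                        # all units are used at test time
        mask = rng.random(h.shape) >= p     # keep each unit with probability 1 - p
        return h * mask / (1.0 - p)         # "inverted" scaling preserves expectations

    h = rng.random(8)                       # activations of one hidden layer
    print(dropout(h))                       # roughly half the units are zeroed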
The dominant method for training these structures has been error-correction training (such as backpropagation with gradient descent) due to its ease of implementation and its tendency to converge to better local optima than other training methods. However, these methods can be computationally expensive, especially for DNNs. There are many training parameters to be considered with a DNN, such as the size (number of layers and number of units per layer), the learning rate and initial weights. Sweeping through the parameter space for optimal parameters may not be feasible due to the cost in time and computational resources. Various 'tricks' such as using mini-batching (computing the gradient on several training examples at once rather than individual examples)[125] have been shown to speed up computation. The large processing throughput of GPUs has produced significant speedups in training, due to the matrix and vector computations required being well-suited for GPUs.[5] Radical alternatives to backprop such as Extreme Learning Machines,[126] "No-prop" networks,[127] training without backtracking,[128] "weightless" networks,[129] and non-connectionist neural networks are gaining attention.
First deep learning networks of 1965: GMDH

According to a historic survey,[5] the first functional Deep Learning networks with many layers were published by Alexey Grigorevich Ivakhnenko and V. G. Lapa in 1965.[24][130] The learning algorithm was called the Group Method of Data Handling or GMDH.[131] GMDH features fully automatic structural and parametric optimization of models. The activation functions of the network nodes are Kolmogorov-Gabor polynomials that permit additions and multiplications. Ivakhnenko's 1971 paper[25] describes the learning of a deep feedforward multilayer perceptron with eight layers, already much deeper than many later networks. The supervised learning network is grown layer by layer, where each layer is trained by regression analysis. From time to time useless neurons are detected using a validation set, and pruned through regularization. The size and depth of the resulting network depends on the problem. Variants of this method are still being used today.[132]
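A rough sketch of the GMDH idea (our illustration only, not Ivakhnenko's algorithm verbatim): each layer forms quadratic polynomial units from pairs of inputs, fits them by least squares, and keeps only the units that perform best on a validation set:

    # Minimal GMDH-flavored sketch: grow one layer of quadratic units.
    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(0)
    X_train, X_val = rng.random((100, 4)), rng.random((50, 4))
    f = lambda X: X[:, 0] * X[:, 1] + np.sin(X[:, 2])     # unknown target function
    y_train, y_val = f(X_train), f(X_val)

    def fit_unit(a, b, y):
        # Kolmogorov-Gabor style quadratic in two variables, least-squares fit
        Phi = np.stack([np.ones_like(a), a, b, a * b, a**2, b**2], axis=1)
        coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return coef

    def apply_unit(coef, a, b):
        Phi = np.stack([np.ones_like(a), a, b, a * b, a**2, b**2], axis=1)
        return Phi @ coef

    units = []
    for i, j in combinations(range(X_train.shape[1]), 2):
        c = fit_unit(X_train[:, i], X_train[:, j], y_train)
        err = np.mean((apply_unit(c, X_val[:, i], X_val[:, j]) - y_val) ** 2)
        units.append((err, i, j, c))
    # Pruning by validation error stands in for the "useless neuron" removal;
    # surviving units become the inputs of the next layer.
    layer = sorted(units, key=lambda u: u[0])[:4]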
Convolutional neural networks

CNNs have become the method of choice for processing visual and other two-dimensional data.[31][67] A CNN is composed of one or more convolutional layers with fully connected layers (matching those in typical artificial neural networks) on top. It also uses tied weights and pooling layers. In particular, max-pooling[38] is often used in Fukushima's convolutional architecture.[26] This architecture allows CNNs to take advantage of the 2D structure of input data. In comparison with other deep architectures, convolutional neural networks have shown superior results in both image and speech applications. They can also be trained with standard backpropagation. CNNs are easier to train than other regular, deep, feed-forward neural networks and have many fewer parameters to estimate, making them a highly attractive architecture to use.[133] Examples of applications in Computer Vision include DeepDream.[134] See the main article on Convolutional neural networks for numerous additional references.
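As a rough sketch of the two building blocks named above, convolution with tied weights followed by max-pooling (illustrative NumPy only; real CNN libraries use far more efficient implementations):

    # Minimal sketch: a single convolution followed by 2x2 max-pooling.
    import numpy as np

    def conv2d(image, kernel):
        kh, kw = kernel.shape
        out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                # the same (tied) kernel weights are applied at every position
                out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
        return out

    def max_pool(x, size=2):
        h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
        return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

    img = np.random.default_rng(0).random((8, 8))
    edge = np.array([[1.0, 0.0, -1.0]] * 3)        # a simple 3x3 edge-like filter
    print(max_pool(conv2d(img, edge)).shape)       # -> (3, 3)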
Neural history compressor

The vanishing gradient problem[35] of automatic differentiation or backpropagation in neural networks was partially overcome in 1992 by an early generative model called the neural history compressor, implemented as an unsupervised stack of recurrent neural networks (RNNs).[15] The RNN at the input level learns to predict its next input from the previous input history. Only unpredictable inputs of some RNN in the hierarchy become inputs to the next higher-level RNN, which therefore recomputes its internal state only rarely. Each higher-level RNN thus learns a compressed representation of the information in the RNN below. This is done such that the input sequence can be precisely reconstructed from the sequence representation at the highest level. The system effectively minimises the description length or the negative logarithm of the probability of the data.[8] If there is a lot of learnable predictability in the incoming data sequence, then the highest-level RNN can use supervised learning to easily classify even deep sequences with very long time intervals between important events. In 1993, such a system already solved a "Very Deep Learning" task that requires more than 1000 subsequent layers in an RNN unfolded in time.[32]
It is also possible to distill the entire RNN hierarchy into only two RNNs called the "conscious" chunker (higher level) and the "subconscious" automatizer (lower level).[15] Once the chunker has learned to predict and compress inputs that are still unpredictable by the automatizer, the automatizer is forced in the next learning phase to predict or imitate, through special additional units, the hidden units of the more slowly changing chunker. This makes it easy for the automatizer to learn appropriate, rarely changing memories across very long time intervals. This in turn helps the automatizer to make many of its once unpredictable inputs predictable, such that the chunker can focus on the remaining still unpredictable events, to compress the data even further.[15]
Recursive neural networks

A recursive neural network[135] is created by applying the same set of weights recursively over a differentiable graph-like structure, by traversing the structure in topological order. Such networks are typically also trained by the reverse mode of automatic differentiation.[27][81] They were introduced to learn distributed representations of structure, such as logical terms. A special case of recursive neural networks is the RNN itself, whose structure corresponds to a linear chain. Recursive neural networks have been applied to natural language processing.[136] The Recursive Neural Tensor Network uses a tensor-based composition function for all nodes in the tree.[137]
Long short-term memory

Numerous researchers now use variants of a deep learning RNN called the Long short-term memory (LSTM) network published by Hochreiter & Schmidhuber in 1997.[51] It is a system that, unlike traditional RNNs, doesn't have the vanishing gradient problem. LSTM is normally augmented by recurrent gates called forget gates.[108] LSTM RNNs prevent backpropagated errors from vanishing or exploding.[35] Instead, errors can flow backwards through unlimited numbers of virtual layers in LSTM RNNs unfolded in space. That is, LSTM can learn "Very Deep Learning" tasks[5] that require memories of events that happened thousands or even millions of discrete time steps ago. Problem-specific LSTM-like topologies can be evolved.[138] LSTM works even when there are long delays, and it can handle signals that have a mix of low and high frequency components.
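For concreteness, one common formulation of the LSTM cell with forget gates (a standard textbook statement, not equations from this article) is:

$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$ (forget gate)
$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$ (input gate)
$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$ (output gate)
$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)$
$h_t = o_t \odot \tanh(c_t)$

When $f_t \approx 1$ and $i_t \approx 0$, the cell state $c_t$ is carried forward essentially unchanged, which is how error signals can survive across thousands of time steps.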
Today, many applications use stacks of LSTM RNNs[139] and train them by Connectionist Temporal Classification (CTC)[53] to find an RNN weight matrix that maximizes the probability of the label sequences in a training set, given the corresponding input sequences. CTC achieves both alignment and recognition. In 2009, CTC-trained LSTM was the first RNN to win pattern recognition contests, when it won several competitions in connected handwriting recognition.[5][88] Already in 2003, LSTM started to become competitive with traditional speech recognizers on certain tasks.[52] In 2007, the combination with CTC achieved first good results on speech data.[54] Since then, this approach has revolutionised speech recognition. In 2014, the Chinese search giant Baidu used CTC-trained RNNs to break the Switchboard Hub5'00 speech recognition benchmark, without using any traditional speech processing methods.[140] LSTM also improved large-vocabulary speech recognition,[64][65] text-to-speech synthesis,[141] also for Google Android,[8][66] and photo-real talking heads.[142] In 2015, Google's speech recognition reportedly experienced a dramatic performance jump of 49% through CTC-trained LSTM, which is now available through Google Voice to billions of smartphone users.[55]

LSTM has also become very popular in the field of Natural Language Processing. Unlike previous models based on HMMs and similar concepts, LSTM can learn to recognise context-sensitive languages.[109] LSTM improved machine translation,[110] Language modeling[111] and Multilingual Language Processing.[112] LSTM combined with Convolutional Neural Networks (CNNs) also improved automatic image captioning[143] and a plethora of other applications.
Deep belief networks

A deep belief network (DBN) is a probabilistic, generative model made up of multiple layers of hidden units. It can be considered a composition of simple learning modules that make up each layer.[16]

A DBN can be used to generatively pre-train a DNN by using the learned DBN weights as the initial DNN weights. Backpropagation or other discriminative algorithms can then be applied for fine-tuning of these weights. This is particularly helpful when limited training data are available, because poorly initialized weights can significantly hinder the learned model's performance. These pre-trained weights are in a region of the weight space that is closer to the optimal weights than are randomly chosen initial weights. This allows for both improved modeling and faster convergence of the fine-tuning phase.[144]
[Figure: A restricted Boltzmann machine (RBM) with fully connected visible and hidden units. Note there are no hidden-hidden or visible-visible connections.]

A DBN can be efficiently trained in an unsupervised, layer-by-layer manner, where the layers are typically made of restricted Boltzmann machines (RBM). An RBM is an undirected, generative energy-based model with a "visible" input layer and a hidden layer, and connections between the layers but not within layers. The training method for RBMs proposed by Geoffrey Hinton for use with training "Product of Experts" models is called contrastive divergence (CD).[145] CD provides an approximation to the maximum likelihood method that would ideally be applied for learning the weights of the RBM.[125][146] In training a single RBM, weight updates are performed with gradient ascent via the following equation:

$\Delta w_{ij}(t+1) = w_{ij}(t) + \eta \frac{\partial \log(p(v))}{\partial w_{ij}}$.

Here, $p(v)$ is the probability of a visible vector, which is given by $p(v) = \frac{1}{Z}\sum_{h} e^{-E(v,h)}$. $Z$ is the partition function (used for normalizing) and $E(v,h)$ is the energy function assigned to the state of the network. A lower energy indicates the network is in a more "desirable" configuration. The CD procedure works as follows:

1. Initialize the visible units to a training vector.
2. Update the hidden units in parallel given the visible units: $p(h_j = 1 \mid V) = \sigma(b_j + \sum_i v_i w_{ij})$. $\sigma$ is the sigmoid function and $b_j$ is the bias of $h_j$.
3. Update the visible units in parallel given the hidden units: $p(v_i = 1 \mid H) = \sigma(a_i + \sum_j h_j w_{ij})$. $a_i$ is the bias of $v_i$. This is called the "reconstruction" step.
4. Re-update the hidden units in parallel given the reconstructed visible units, using the same equation as in step 2.
5. Perform the weight update: $\Delta w_{ij} \propto \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{reconstruction}}$.
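Steps 1-5 can be sketched in a few lines of NumPy (our own illustrative CD-1 implementation; variable names and sizes are ours):

    # Minimal sketch of one contrastive-divergence (CD-1) update for an RBM.
    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    n_visible, n_hidden, eta = 6, 4, 0.1
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    a = np.zeros(n_visible)                            # visible biases
    b = np.zeros(n_hidden)                             # hidden biases

    v0 = rng.integers(0, 2, n_visible).astype(float)   # step 1: a training vector
    ph0 = sigmoid(b + v0 @ W)                          # step 2: p(h = 1 | v0)
    h0 = (rng.random(n_hidden) < ph0).astype(float)    # sample hidden states
    pv1 = sigmoid(a + h0 @ W.T)                        # step 3: reconstruction
    v1 = (rng.random(n_visible) < pv1).astype(float)
    ph1 = sigmoid(b + v1 @ W)                          # step 4: re-update hidden

    W += eta * (np.outer(v0, ph0) - np.outer(v1, ph1)) # step 5: weight update
    a += eta * (v0 - v1)
    b += eta * (ph0 - ph1)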
Once an RBM is trained, another RBM is "stacked" atop it, taking its input from the final already-trained layer. The new visible layer is initialized to a training vector, and values for the units in the already-trained layers are assigned using the current weights and biases. The new RBM is then trained with the procedure above. This whole process is repeated until some desired stopping criterion is met.[3]

Although the approximation of CD to maximum likelihood is very crude (it has been shown to not follow the gradient of any function), it has been empirically shown to be effective in training deep architectures.[125]
Convolutional deep belief networks

A recent achievement in deep learning is the use of convolutional deep belief networks (CDBN). CDBNs have a structure very similar to that of a convolutional neural network and are trained similarly to deep belief networks. Therefore, they exploit the 2D structure of images, like CNNs do, and make use of pre-training like deep belief networks. They provide a generic structure which can be used in many image and signal processing tasks. Recently, many benchmark results on standard image datasets like CIFAR[147] have been obtained using CDBNs.[148]
Large memory storage and retrieval neural networks

Large memory storage and retrieval neural networks (LAMSTAR)[149][150] are fast deep learning neural networks of many layers which can use many filters simultaneously. These filters may be nonlinear, stochastic, logic, non-stationary, or even non-analytical. They are biologically motivated and continuously learning.

A LAMSTAR neural network may serve as a dynamic neural network in the spatial or time domain or both. Its speed is provided by Hebbian link weights (Chapter 9 of D. Graupe, 2013[151]), which serve to integrate the various and usually different filters (preprocessing functions) into its many layers and to dynamically rank the significance of the various layers and functions relative to a given task for deep learning. This grossly imitates biological learning, which integrates the outputs of various preprocessors (cochlea, retina, etc.) and cortexes (auditory, visual, etc.) and their various regions. Its deep learning capability is further enhanced by using inhibition, correlation and by its ability to cope with incomplete data, or "lost" neurons or layers even in the midst of a task. Furthermore, it is fully transparent due to its link weights. The link weights also allow dynamic determination of innovation and redundancy, and facilitate the ranking of layers, of filters or of individual neurons relative to a task.
LAMSTAR has been applied to many medical[152][153][154] and financial predictions (see Graupe, 2013,[155] Section 9C), adaptive filtering of noisy speech in unknown noise,[156] still-image recognition[157] (Graupe, 2013,[158] Section 9D), video image recognition,[159] software security,[160] adaptive control of non-linear systems,[161] and others. LAMSTAR had a much faster computing speed and somewhat lower error than a convolutional neural network based on ReLU-function filters and max pooling, in 20 comparative studies.[162]

These applications demonstrate delving into aspects of the data that are hidden from shallow learning networks or even from the human senses (eye, ear), such as in the cases of predicting the onset of sleep apnea events,[153] of an electrocardiogram of a fetus as recorded from skin-surface electrodes placed on the mother's abdomen early in pregnancy,[154] of financial prediction (Section 9C in Graupe, 2013),[149] or in blind filtering of noisy speech.[156]

LAMSTAR was proposed in 1996 (U.S. Patent 5,920,852 A (https://www.google.com/patents/US5920852)) and was further developed by D. Graupe and H. Kordylewski 1997-2002.[163][164][165] A modified version, known as LAMSTAR 2, was developed by N. C. Schneider and D. Graupe in 2008.[166][167]
Deep Boltzmann machines

A deep Boltzmann machine (DBM) is a type of binary pairwise Markov random field (undirected probabilistic graphical model) with multiple layers of hidden random variables. It is a network of symmetrically coupled stochastic binary units. It comprises a set of visible units $\nu \in \{0,1\}^D$ and a series of layers of hidden units $h^{(1)} \in \{0,1\}^{F_1}, h^{(2)} \in \{0,1\}^{F_2}, \ldots, h^{(L)} \in \{0,1\}^{F_L}$. There is no connection between units of the same layer (like RBM). For a DBM with three hidden layers, the probability assigned to vector $\nu$ is

$p(\nu) = \frac{1}{Z}\sum_{h} \exp\left(\sum_{ij} W^{(1)}_{ij}\nu_i h^{(1)}_j + \sum_{jl} W^{(2)}_{jl} h^{(1)}_j h^{(2)}_l + \sum_{lm} W^{(3)}_{lm} h^{(2)}_l h^{(3)}_m\right),$

where $h = \{h^{(1)}, h^{(2)}, h^{(3)}\}$ are the hidden units and $W^{(1)}, W^{(2)}, W^{(3)}$ are the symmetric interaction weights.

Like DBNs, DBMs can learn complex and abstract internal representations of the input in tasks such as object or speech recognition, using limited labeled data to fine-tune the representations built using a large supply of unlabeled sensory input data. However, unlike DBNs and deep convolutional neural networks, they adopt the inference and training procedure in both directions, bottom-up and top-down passes, which allows the DBMs to better unveil the representations of ambiguous and complex input structures.[169][170]

However, the speed of DBMs limits their performance and functionality. Because exact maximum likelihood learning is intractable for DBMs, only approximate maximum likelihood learning may be performed. Another option is to use mean-field inference to estimate data-dependent expectations and to approximate the expected sufficient statistics of the model by using Markov chain Monte Carlo (MCMC).[168] This approximate inference, which must be done for each test input, is about 25 to 50 times slower than a single bottom-up pass in DBMs. This makes the joint optimization impractical for large data sets, and seriously restricts the use of DBMs for tasks such as feature representation.[171]
Stacked (denoising) auto-encoders

The auto encoder idea is motivated by the concept of a good representation. For example, for a classifier, a good representation can be defined as one that will yield a better performing classifier.

An encoder is a deterministic mapping $f_\theta$ that transforms an input vector x into a hidden representation y, where $\theta = \{W, b\}$, $W$ is the weight matrix and $b$ is an offset vector (bias): $y = f_\theta(x) = s(Wx + b)$. A decoder maps back the hidden representation y to the reconstructed input z via $g_\theta$: $z = g_{\theta'}(y) = s(W'y + b')$. The whole process of auto encoding is to compare this reconstructed input to the original and try to minimize the error to make the reconstructed value as close as possible to the original.
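A single encode-decode pass might look like this (a minimal sketch under the definitions above; the tied-weight choice $W' = W^T$ is one common option, not mandated by the text):

    # Minimal autoencoder sketch: encode x to y, decode to z, measure error.
    import numpy as np

    rng = np.random.default_rng(0)
    s = lambda t: 1.0 / (1.0 + np.exp(-t))    # elementwise sigmoid nonlinearity

    n_in, n_hidden = 20, 5
    W = rng.normal(scale=0.1, size=(n_hidden, n_in))
    b = np.zeros(n_hidden)                    # encoder bias
    b_prime = np.zeros(n_in)                  # decoder bias

    x = rng.random(n_in)                      # input vector
    y = s(W @ x + b)                          # encoder: y = f(x) = s(Wx + b)
    z = s(W.T @ y + b_prime)                  # decoder with tied weights W' = W^T
    error = np.sum((x - z) ** 2)              # reconstruction error to minimize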
In stacked denoising auto encoders, the partially corrupted output is cleaned (de-noised). This idea was introduced in 2010 by Vincent et al.[172] with a specific approach to good representation: a good representation is one that can be obtained robustly from a corrupted input and that will be useful for recovering the corresponding clean input. Implicit in this definition are the following ideas:

The higher-level representations are relatively stable and robust to input corruption
It is necessary to extract features that are useful for representation of the input distribution.

In order to make a deep architecture, auto encoders stack one on top of another.[173] Once the encoding function $f_\theta$ of the first denoising auto encoder is learned and used to uncorrupt the input (corrupted input), we can train the second level.[172]

Once the stacked auto encoder is trained, its output can be used as the input to a supervised learning algorithm such as a support vector machine classifier or a multi-class logistic regression.[172]
Deep stacking networks

One deep architecture based on a hierarchy of blocks of simplified neural network modules is a deep convex network, introduced in 2011.[174] Here, the weights-learning problem is formulated as a convex optimization problem with a closed-form solution. This architecture is also called a deep stacking network (DSN),[175] emphasizing the mechanism's similarity to stacked generalization.[176] Each DSN block is a simple module that is easy to train by itself in a supervised fashion, without backpropagation for the entire blocks.[177]
As designed by Deng and Dong,[174] each block consists of a simplified multi-layer perceptron (MLP) with a single hidden layer. The hidden layer h has logistic sigmoidal units, and the output layer has linear units. Connections between these layers are represented by weight matrix U; input-to-hidden-layer connections have weight matrix W. Target vectors t form the columns of matrix T, and the input data vectors x form the columns of matrix X. The matrix of hidden units is $H = \sigma(W^T X)$. Modules are trained in order, so lower-layer weights W are known at each stage. The function $\sigma$ performs the element-wise logistic sigmoid operation. Each block estimates the same final label class y, and its estimate is concatenated with the original input X to form the expanded input for the next block. Thus, the input to the first block contains the original data only, while downstream blocks' input also has the output of preceding blocks. Then learning the upper-layer weight matrix U given other weights in the network can be formulated as a convex optimization problem:

$\min_{U^T} f = \| U^T H - T \|_F^2,$

which has a closed-form solution.
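Since H is fixed once the lower-layer weights W are known, the minimization over U is ordinary least squares. A minimal sketch (ours; the tiny ridge term mu is added only for numerical stability and is not part of the formulation above):

    # Closed-form solution for the upper weights U of one DSN block.
    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

    X = rng.random((30, 100))                 # input vectors as columns of X
    T = rng.random((10, 100))                 # target vectors as columns of T
    W = rng.normal(size=(30, 50))             # lower-layer weights, already known

    H = sigmoid(W.T @ X)                      # hidden-unit matrix H = sigma(W^T X)
    mu = 1e-3                                 # small regularizer (our addition)
    U = np.linalg.solve(H @ H.T + mu * np.eye(50), H @ T.T)  # U = (H H^T)^-1 H T^T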
Unlike other deep architectures, such as DBNs, the goal is not to discover the transformed feature representation. The structure of the hierarchy of this kind of architecture makes parallel learning straightforward, as a batch-mode optimization problem. In purely discriminative tasks, DSNs perform better than conventional DBNs.[175]
Tensor deep stacking networks

This architecture is an extension of deep stacking networks (DSN). It improves on DSN in two important ways: it uses higher-order information from covariance statistics, and it transforms the non-convex problem of a lower layer to a convex sub-problem of an upper layer.[178] TDSNs use covariance statistics of the data by using a bilinear mapping from each of two distinct sets of hidden units in the same layer to predictions, via a third-order tensor.

While parallelization and scalability are not considered seriously in conventional DNNs,[179][180][181] all learning for DSNs and TDSNs is done in batch mode, to allow parallelization on a cluster of CPU or GPU nodes.[174][175] Parallelization allows scaling the design to larger (deeper) architectures and data sets.

The basic architecture is suitable for diverse tasks such as classification and regression.
Spike-and-slab RBMs

The need for deep learning with real-valued inputs, as in Gaussian restricted Boltzmann machines, motivates the spike-and-slab RBM (ssRBM), which models continuous-valued inputs with strictly binary latent variables.[182] Similar to basic RBMs and their variants, a spike-and-slab RBM is a bipartite graph, while, like GRBMs, the visible units (input) are real-valued. The difference is in the hidden layer, where each hidden unit has a binary spike variable and a real-valued slab variable. A spike is a discrete probability mass at zero, while a slab is a density over a continuous domain;[183] their mixture forms a prior. The terms come from the statistics literature.[184]

An extension of the ssRBM, called µ-ssRBM, provides extra modeling capacity using additional terms in the energy function. One of these terms enables the model to form a conditional distribution of the spike variables by marginalizing out the slab variables given an observation.
Compound hierarchical-deep models

Compound hierarchical-deep models compose deep networks with non-parametric Bayesian models. Features can be learned using deep architectures such as DBNs,[92] DBMs,[169] deep auto encoders,[185] convolutional variants,[186][187] ssRBMs,[183] deep coding networks,[188] DBNs with sparse feature learning,[189] recursive neural networks,[190] conditional DBNs,[191] and de-noising auto encoders.[192] This provides a better representation, allowing faster learning and more accurate classification with high-dimensional data. However, these architectures are poor at learning novel classes with few examples, because all network units are involved in representing the input (a distributed representation) and must be adjusted together (high degree of freedom). Limiting the degree of freedom reduces the number of parameters to learn, facilitating learning of new classes from few examples. Hierarchical Bayesian (HB) models allow learning from few examples, for example[193][194][195][196][197] for computer vision, statistics, and cognitive science.

Compound HD architectures aim to integrate characteristics of both HB and deep networks. The compound HDP-DBM architecture uses a hierarchical Dirichlet process (HDP) as a hierarchical model, incorporated with a DBM architecture. It is a full generative model, generalized from abstract concepts flowing through the layers of the model, which is able to synthesize new examples in novel classes that look reasonably natural. All the levels are learned jointly by maximizing a joint log-probability score.[198]
In a DBM with three hidden layers, the probability of a visible input $\nu$ is:

$p(\nu, \psi) = \frac{1}{Z}\sum_{h} \exp\left(\sum_{ij} W^{(1)}_{ij}\nu_i h^{(1)}_j + \sum_{jl} W^{(2)}_{jl} h^{(1)}_j h^{(2)}_l + \sum_{lm} W^{(3)}_{lm} h^{(2)}_l h^{(3)}_m\right),$

where $h = \{h^{(1)}, h^{(2)}, h^{(3)}\}$ is the set of hidden units and $\psi = \{W^{(1)}, W^{(2)}, W^{(3)}\}$ are the model parameters, representing visible-hidden and hidden-hidden symmetric interaction terms.

After a DBM model is learned, we have an undirected model that defines the joint distribution $P(\nu, h^{(1)}, h^{(2)}, h^{(3)})$. One way to express what has been learned is the conditional model $P(\nu, h^{(1)}, h^{(2)} \mid h^{(3)})$ and a prior term $P(h^{(3)})$. Here $P(\nu, h^{(1)}, h^{(2)} \mid h^{(3)})$ represents a conditional DBM model, which can be viewed as a two-layer DBM but with bias terms given by the states of $h^{(3)}$:

$P(\nu, h^{(1)}, h^{(2)} \mid h^{(3)}) = \frac{1}{Z(\psi, h^{(3)})}\exp\left(\sum_{ij} W^{(1)}_{ij}\nu_i h^{(1)}_j + \sum_{jl} W^{(2)}_{jl} h^{(1)}_j h^{(2)}_l + \sum_{lm} W^{(3)}_{lm} h^{(2)}_l h^{(3)}_m\right).$
Deep coding networks

There are advantages of a model which can actively update itself from the context in data. A deep coding network (DPCN) is a predictive coding scheme where top-down information is used to empirically adjust the priors needed for a bottom-up inference procedure, by means of a deep, locally connected generative model. This works by extracting sparse features from time-varying observations using a linear dynamical model. Then, a pooling strategy is used to learn invariant feature representations. These units compose to form a deep architecture and are trained by greedy layer-wise unsupervised learning. The layers constitute a kind of Markov chain such that the states at any layer only depend on the preceding and succeeding layers.

A deep predictive coding network (DPCN)[199] predicts the representation of the layer by using a top-down approach, using the information in the upper layer and temporal dependencies from the previous states. DPCNs can be extended to form a convolutional network.[199]
Deep Q-networks

A deep Q-network (DQN) is a type of deep learning model developed at Google DeepMind which combines a deep convolutional neural network with Q-learning, a form of reinforcement learning. Unlike earlier reinforcement learning agents, DQNs can learn directly from high-dimensional sensory inputs. Preliminary results were presented in 2014, with a paper published in February 2015 in Nature.[200] The application discussed in this paper is limited to Atari 2600 gaming, although it has implications for other applications. However, well before this work, there had been a number of reinforcement learning models that apply deep learning approaches (e.g.,[201]).
Networks with separate memory structures

Integrating external memory with artificial neural networks dates to early research in distributed representations[202] and Teuvo Kohonen's self-organizing maps. For example, in sparse distributed memory or hierarchical temporal memory, the patterns encoded by neural networks are used as addresses for content-addressable memory, with "neurons" essentially serving as address encoders and decoders. However, the early controllers of such memories were not differentiable.
LSTM-related differentiable memory structures

Apart from long short-term memory (LSTM), other approaches of the 1990s and 2000s also added differentiable memory to recurrent functions. For example:

Differentiable push and pop actions for alternative memory networks called neural stack machines[203][204]
Memory networks where the control network's external differentiable storage is in the fast weights of another network[205]
LSTM "forget gates"[206]
Self-referential recurrent neural networks (RNNs) with special output units for addressing and rapidly manipulating each of the RNN's own weights in differentiable fashion (internal storage)[207][208]
Learning to transduce with unbounded memory[209]
Semantic hashing

Approaches which represent previous experiences directly and use a similar experience to form a local model are often called nearest neighbour or k-nearest neighbors methods.[210] More recently, deep learning was shown to be useful in semantic hashing,[211] where a deep graphical model is fit to the word-count vectors[212] obtained from a large set of documents. Documents are mapped to memory addresses in such a way that semantically similar documents are located at nearby addresses. Documents similar to a query document can then be found by simply accessing all the addresses that differ by only a few bits from the address of the query document. Unlike sparse distributed memory, which operates on 1000-bit addresses, semantic hashing works on 32- or 64-bit addresses found in a conventional computer architecture.
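A toy version of the lookup step (our illustration; a real system would derive the binary codes from a deep model rather than hard-code them) retrieves documents by flipping a few bits of the query address:

    # Minimal semantic-hashing lookup sketch over 8-bit addresses.
    from itertools import combinations

    memory = {                                # binary address -> document ids
        0b10110010: ["doc_a"],
        0b10110011: ["doc_b"],
        0b00110010: ["doc_c"],
    }

    def neighbors(addr, bits=8, max_flips=1):
        yield addr                            # the address itself first
        for r in range(1, max_flips + 1):
            for idx in combinations(range(bits), r):
                flipped = addr
                for i in idx:
                    flipped ^= 1 << i         # flip bit i
                yield flipped

    query = 0b10110010
    hits = [d for a in neighbors(query) for d in memory.get(a, [])]
    print(hits)                               # ['doc_a', 'doc_b', 'doc_c']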
Neural Turing machines

Neural Turing machines,[213] developed by Google DeepMind, couple LSTM networks to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing machine but is differentiable end-to-end, allowing it to be efficiently trained by gradient descent. Preliminary results demonstrate that neural Turing machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.
Memory networks

Memory networks[214][215] are another extension to neural networks incorporating long-term memory, which was developed by the Facebook research team. The long-term memory can be read and written to, with the goal of using it for prediction. These models have been applied in the context of question answering (QA), where the long-term memory effectively acts as a (dynamic) knowledge base, and the output is a textual response.[216]
Pointer networks

Deep neural networks can be potentially improved if they get deeper and have fewer parameters, while maintaining trainability. While training extremely deep (e.g. 1-million-layer-deep) neural networks might not be practically feasible, CPU-like architectures such as pointer networks[217] and neural random-access machines[218] developed by Google Brain researchers overcome this limitation by using external random-access memory, as well as by adding other components that typically belong to a computer architecture such as registers, ALU and pointers. Such systems operate on probability distribution vectors stored in memory cells and registers. Thus, the model is fully differentiable and trains end-to-end. The key characteristic of these models is that their depth, the size of their short-term memory, and the number of parameters can be altered independently, unlike models like Long short-term memory, whose number of parameters grows quadratically with memory size.
Encoder-decoder networks

An encoder-decoder framework is a framework based on neural networks that aims to map highly structured input to highly structured output. It was proposed recently in the context of machine translation,[219][220][221] where the input and output are written sentences in two natural languages. In that work, an LSTM recurrent neural network (RNN) or convolutional neural network (CNN) was used as an encoder to summarize a source sentence, and the summary was decoded using a conditional recurrent neural network language model to produce the translation.[222] All these systems share the same building blocks: gated RNNs and CNNs, and trained attention mechanisms.
Other architectures

Multilayer kernel machine

Multilayer kernel machines (MKM) as introduced in[223] are a way of learning highly nonlinear functions by iterative application of weakly nonlinear kernels. They use kernel principal component analysis (KPCA), in[224], as a method for the unsupervised greedy layer-wise pre-training step of the deep learning architecture. To select the most informative of the $n_l$ features extracted by KPCA at each layer $l$, the following supervised strategy is used:

rank the $n_l$ features according to their mutual information with the class labels
for different values of K and $m_l \in \{1, \ldots, n_l\}$, compute the classification error rate of a K-nearest neighbor (K-NN) classifier using only the $m_l$ most informative features on a validation set
the value of $m_l$ with which the classifier has reached the lowest error rate determines the number of features to retain.
There are some drawbacks in using the KPCA method as the building cells of an MKM.

A more straightforward way to use kernel machines for deep learning was developed by Microsoft researchers for spoken language understanding.[225] The main idea is to use a kernel machine to approximate a shallow neural net with an infinite number of hidden units, then use stacking to splice the output of the kernel machine and the raw input in building the next, higher level of the kernel machine. The number of levels in the deep convex network is a hyper-parameter of the overall system, to be determined by cross validation.
Applications

Automatic speech recognition

Speech recognition has been revolutionised by deep learning, especially by Long short-term memory (LSTM), a recurrent neural network published by Sepp Hochreiter & Jürgen Schmidhuber in 1997.[51] LSTM RNNs circumvent the vanishing gradient problem and can learn "Very Deep Learning" tasks[5] that involve speech events separated by thousands of discrete time steps, where one time step corresponds to about 10 ms. In 2003, LSTM with forget gates[108] became competitive with traditional speech recognizers on certain tasks.[52] In 2007, LSTM trained by Connectionist Temporal Classification (CTC)[53] achieved excellent results in certain applications,[54] although computers were much slower than today. In 2015, Google's large-scale speech recognition suddenly almost doubled its performance through CTC-trained LSTM, now available to all smartphone users.[55]
The initial success of deep learning in speech recognition, however, was based on small-scale TIMIT tasks. The results shown in the table below are for automatic speech recognition on the popular TIMIT data set. This is a common data set used for initial evaluations of deep learning architectures. The entire set contains 630 speakers from eight major dialects of American English, where each speaker reads 10 sentences.[226] Its small size allows many configurations to be tried effectively. More importantly, the TIMIT task concerns phone-sequence recognition, which, unlike word-sequence recognition, allows very weak "language models", and thus the weaknesses in acoustic modeling aspects of speech recognition can be more easily analyzed. Such analysis on TIMIT by Li Deng and collaborators around 2009-2010, contrasting the GMM (and other generative models of speech) vs. DNN models, stimulated early industrial investment in deep learning for speech recognition from small to large scales,[48][69] eventually leading to pervasive and dominant use in that industry. That analysis was done with comparable performance (less than 1.5% in error rate) between discriminative DNNs and generative models. The error rates listed below, including these early results and measured as percent phone error rates (PER), have been summarized over a time span of the past 20 years:
Method                                        PER (%)
Randomly Initialized RNN                      26.1
Bayesian Triphone GMM-HMM                     25.6
Hidden Trajectory (Generative) Model          24.8
Monophone Randomly Initialized DNN            23.4
Monophone DBN-DNN                             22.4
Triphone GMM-HMM with BMMI Training           21.7
Monophone DBN-DNN on fbank                    20.7
Convolutional DNN[227]                        20.0
Convolutional DNN w. Heterogeneous Pooling    18.7
Ensemble DNN/CNN/RNN[228]                     18.2
Bidirectional LSTM                            17.9
In 2010, industrial researchers extended deep learning from TIMIT to large-vocabulary speech recognition, by adopting large output layers of the DNN based on context-dependent HMM states constructed by decision trees.[229][230][231] Comprehensive reviews of this development and of the state of the art as of October 2014 are provided in the recent Springer book from Microsoft Research.[70] An earlier article[232] reviewed the background of automatic speech recognition and the impact of various machine-learning paradigms, including deep learning.
One fundamental principle of deep learning is to do away with hand-crafted feature engineering and to use raw features. This principle was first explored successfully in the architecture of the deep autoencoder on the "raw" spectrogram or linear filter-bank features at SRI in the late 1990s,[46] and later at Microsoft,[233] showing its superiority over the Mel-Cepstral features, which contain a few stages of fixed transformation from spectrograms. The true "raw" features of speech, waveforms, have more recently been shown to produce excellent larger-scale speech recognition results.[234]
Since the initial successful debut of DNNs for speaker recognition in the late 1990s and for speech recognition around 2009-2011, and of LSTM around 2003-2007, substantial new progress has been made. Progress (and future directions) can be summarized into eight major areas:[1][50][70]
Scaling up/out and speeding up DNN training and decoding
Sequence-discriminative training of DNNs
Feature processing by deep models with solid understanding of the underlying mechanisms
Adaptation of DNNs and of related deep models
Multi-task and transfer learning by DNNs and related deep models
Convolutional neural networks and how to design them to best exploit domain knowledge of speech
Recurrent neural networks and their rich LSTM variants
Other types of deep models, including tensor-based models and integrated deep generative/discriminative models.
Large-scale automatic speech recognition is the first and most convincing successful case of deep learning in recent history, embraced by both industry and academia across the board. Between 2010 and 2014, the two major conferences on signal processing and speech recognition, IEEE ICASSP and Interspeech, saw a large increase in the number of accepted papers on the topic of deep learning for speech recognition. More importantly, all major commercial speech recognition systems (e.g., Microsoft Cortana, Xbox, Skype Translator, Amazon Alexa, Google Now, Apple Siri, Baidu and iFlyTek voice search, and a range of Nuance speech products, etc.) are based on deep learning methods.[1][235][236][237] See also the recent media interview with the CTO of Nuance Communications.[238]
Image recognition
A common evaluation set for image classification is the MNIST database. MNIST is composed of handwritten digits and includes 60,000 training examples and 10,000 test examples. As with TIMIT, its small size allows multiple configurations to be tested. A comprehensive list of results on this set is available.[239] The current best result on MNIST is an error rate of 0.23%, achieved by Ciresan et al. in 2012.[240]
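For orientation, a small convolutional network for MNIST can be written in a few lines of Keras; this is an illustrative baseline under assumed layer sizes, not the 0.23% system of Ciresan et al.

    # A minimal sketch, assuming Keras with its bundled MNIST loader; illustrative only.
    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
    from keras.utils import to_categorical

    (x_tr, y_tr), (x_te, y_te) = mnist.load_data()   # 60,000 train / 10,000 test images
    x_tr = x_tr.reshape(-1, 28, 28, 1) / 255.0
    x_te = x_te.reshape(-1, 28, 28, 1) / 255.0

    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D((2, 2)),    # max-pooling, as in the GPU CNNs discussed below
        Flatten(),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_tr, to_categorical(y_tr), epochs=1, batch_size=128,
              validation_data=(x_te, to_categorical(y_te)))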
According to LeCun,[67] in an industrial application CNNs already processed an estimated 10% to 20% of all the checks written in the US in the early 2000s. Significant additional impact of deep learning in image or object recognition was felt in the years 2011-2012. Although CNNs trained by backpropagation had been around for decades,[31] and GPU implementations of NNs for years,[74] including CNNs,[75] fast implementations of CNNs with max-pooling on GPUs in the style of Dan Ciresan and colleagues[96] were needed to make a dent in computer vision.[5] In 2011, this approach achieved for the first time superhuman performance in a visual pattern recognition contest.[98] Also in 2011, it won the ICDAR Chinese handwriting contest, and in May 2012, it won the ISBI image segmentation contest.[99] Until 2011, CNNs did not play a major role at computer vision conferences, but in June 2012, a paper by Dan Ciresan et al. at the leading conference CVPR[101] showed how max-pooling CNNs on GPU can dramatically improve many vision benchmark records, sometimes with human-competitive or even superhuman performance. In October 2012, a similar system by Alex Krizhevsky in the team of Geoff Hinton[100] won the large-scale ImageNet competition by a significant margin over shallow machine learning methods. In November 2012, Ciresan et al.'s system also won the ICPR contest on analysis of large medical images for cancer detection, and in the following year also the MICCAI Grand Challenge on the same topic.[241] In 2013 and 2014, the error rate on the ImageNet task using deep learning was further reduced quickly, following a similar trend in large-scale speech recognition. Releases like the Wolfram Image Identification project continue to bring improvements in the technology to the public eye.[242]
As in the ambitious moves from automatic speech recognition toward automatic speech translation and understanding, image classification has recently been extended to the more challenging task of automatic image captioning, in which deep learning (often as a combination of CNNs and LSTMs) is the essential underlying technology.[243][244][245][246]
One example application is a car computer said to be trained with deep learning, which may enable cars to interpret 360° camera views.[247] Another example is the technology known as Facial Dysmorphology Novel Analysis (FDNA), used to analyze cases of human malformation connected to a large database of genetic syndromes.
Natural language processing
Neural networks have been used for implementing language models since the early 2000s.[109][248] Recurrent neural networks, especially LSTM,[51] are most appropriate for sequential data such as language. LSTM helped to improve machine translation[110] and language modeling.[111][112] LSTM combined with CNNs also improved automatic image captioning[143] and a plethora of other applications.[5]
Other key techniques in this field are negative sampling[249] and word embedding. A word embedding, such as word2vec, can be thought of as a representational layer in a deep learning architecture that transforms an atomic word into a positional representation of the word relative to other words in the data set; the position is represented as a point in a vector space. Using word embedding as an input layer to a recursive neural network (RNN) allows the network to be trained to parse sentences and phrases using an effective compositional vector grammar. A compositional vector grammar can be thought of as a probabilistic context-free grammar (PCFG) implemented by a recursive neural network.[250] Recursive autoencoders built atop word embeddings have been trained to assess sentence similarity and detect paraphrasing.[250] Deep neural architectures have achieved state-of-the-art results in many natural language processing tasks such as constituency parsing,[251] sentiment analysis,[252] information retrieval,[253][254] spoken language understanding,[255] machine translation,[110][256] contextual entity linking,[257] writing style recognition[258] and others.[259]
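As a concrete example of word embedding, the following trains word2vec with Gensim (listed under Software libraries below). The toy corpus is an assumption made for illustration, and the parameter names follow the Gensim versions of this period.

    # A minimal sketch, assuming Gensim; illustrative only.
    from gensim.models import Word2Vec

    sentences = [["deep", "learning", "uses", "many", "layers"],
                 ["a", "word", "embedding", "maps", "words", "to", "vectors"]]

    # Skip-gram (sg=1) trained with negative sampling (negative=5).
    model = Word2Vec(sentences, size=50, sg=1, negative=5, min_count=1)

    vec = model.wv["learning"]               # the word's position in the vector space
    similar = model.wv.most_similar("word")  # nearby words in that space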
Drug discovery and toxicology
The pharmaceutical industry faces the problem that a large percentage of candidate drugs fail to reach the market. These failures of chemical compounds are caused by insufficient efficacy on the biomolecular target (on-target effect), undetected and undesired interactions with other biomolecules (off-target effects), or unanticipated toxic effects.[260][261] In 2012, a team led by George Dahl won the "Merck Molecular Activity Challenge" using multi-task deep neural networks to predict the biomolecular target of a compound.[262][263] In 2014, Sepp Hochreiter's group used deep learning to detect off-target and toxic effects of environmental chemicals in nutrients, household products and drugs, and won the "Tox21 Data Challenge" of NIH, FDA and NCATS.[264][265] These impressive successes show that deep learning may be superior to other virtual-screening methods.[266][267] Researchers from Google and Stanford enhanced deep learning for drug discovery by combining data from a variety of sources.[268] In 2015, Atomwise introduced AtomNet, the first deep learning neural network for structure-based rational drug design.[269] Subsequently, AtomNet was used to predict novel candidate biomolecules for several disease targets, most notably treatments for the Ebola virus[270] and multiple sclerosis.[271][272]
Customer relationship management
Success has recently been reported with the application of deep reinforcement learning in direct-marketing settings, illustrating the suitability of the method for CRM automation. A neural network was used to approximate the value of possible direct-marketing actions over the customer state space, defined in terms of RFM (recency, frequency, monetary value) variables. The estimated value function was shown to have a natural interpretation as customer lifetime value.[273]
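A value network of the kind described can be sketched in Keras; the state dimensions, action count, and example customer below are illustrative assumptions, and the reinforcement-learning update loop is omitted.

    # A minimal sketch, assuming Keras and NumPy; illustrative only.
    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    # Input: a (recency, frequency, monetary) customer state; output: one estimated
    # long-term value per candidate marketing action (3 actions assumed here).
    q_net = Sequential([Dense(32, activation='relu', input_shape=(3,)),
                        Dense(3)])
    q_net.compile(optimizer='adam', loss='mse')

    customer = np.array([[30.0, 5.0, 120.0]])   # days since purchase, #purchases, spend
    print(q_net.predict(customer))   # max over actions reads as estimated lifetime value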
Recommendation systems
Recommendation systems have used deep learning to extract meaningful features for a latent-factor model for content-based music recommendation.[274] Recently, a more general approach for learning user preferences from multiple domains using multi-view deep learning has been introduced.[275] The model uses a hybrid collaborative and content-based approach and enhances recommendations in multiple tasks.
Biomedical informatics
Recently, a deep learning approach based on an autoencoder artificial neural network has been used in bioinformatics to predict Gene Ontology annotations and gene-function relationships.[276]
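A generic autoencoder of this kind can be sketched in Keras; the input dimensionality and layer sizes are assumptions for illustration, not details of the cited model.

    # A minimal autoencoder sketch, assuming Keras; illustrative only.
    from keras.models import Model
    from keras.layers import Input, Dense

    n_features = 1000   # assumed input dimensionality (e.g., a row of an annotation matrix)
    x = Input(shape=(n_features,))
    h = Dense(64, activation='relu')(x)              # compressed hidden representation
    r = Dense(n_features, activation='sigmoid')(h)   # reconstruction of the input

    autoencoder = Model(x, r)
    autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
    # After training, reconstruction scores can be used to rank candidate annotations.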
In medical informatics, deep learning has also been used in the health domain, including the prediction of sleep quality based on wearable data[277] and predictions of health complications from electronic health record data.[278]
Theories of the human brain
Computational deep learning is closely related to a class of theories of brain development (specifically, neocortical development) proposed by cognitive neuroscientists in the early 1990s.[279] An approachable summary of this work is Elman et al.'s 1996 book "Rethinking Innateness"[280] (see also: Shrager and Johnson;[281] Quartz and Sejnowski[282]). As these developmental theories were also instantiated in computational models, they are technical predecessors of purely computationally motivated deep learning models. These developmental models share the interesting property that various proposed learning dynamics in the brain (e.g., a wave of nerve growth factor) conspire to support the self-organization of just the sort of inter-related neural networks utilized in the later, purely computational deep learning models; such computational neural networks seem analogous to a view of the brain's neocortex as a hierarchy of filters in which each layer captures some of the information in the operating environment, and then passes the remainder, as well as a modified base signal, to other layers further up the hierarchy. This process yields a self-organizing stack of transducers, well tuned to their operating environment. As described in The New York Times in 1995: "...the infant's brain seems to organize itself under the influence of waves of so-called trophic factors ... different regions of the brain become connected sequentially, with one layer of tissue maturing before another and so on until the whole brain is mature."[283]
The importance of deep learning with respect to the evolution and development of human cognition did not escape the attention of these researchers. One aspect of human development that distinguishes us from our nearest primate neighbors may be changes in the timing of development.[284] Among primates, the human brain remains relatively plastic until late in the post-natal period, whereas the brains of our closest relatives are more completely formed by birth. Thus, humans have greater access to the complex experiences afforded by being out in the world during the most formative period of brain development. This may enable us to "tune in" to rapidly changing features of the environment that other animals, more constrained by evolutionary structuring of their brains, are unable to take account of. To the extent that these changes are reflected in similar timing changes in the hypothesized wave of cortical development, they may also lead to changes in the extraction of information from the stimulus environment during the early self-organization of the brain. Of course, along with this flexibility comes an extended period of immaturity, during which we are dependent upon our caretakers and our community for both support and training. The theory of deep learning therefore sees the co-evolution of culture and cognition as a fundamental condition of human evolution.[285]
Commercial activities
Deep learning is often presented as a step towards realising strong AI,[286] and thus many organizations have become interested in its use for particular applications. In December 2013, Facebook hired Yann LeCun to head its new artificial intelligence (AI) lab, which was to have operations in California, London, and New York. The AI lab will develop deep learning techniques to help Facebook do tasks such as automatically tagging uploaded pictures with the names of the people in them.[287] Late in 2014, Facebook also hired Vladimir Vapnik, a main developer of the Vapnik-Chervonenkis theory of statistical learning and co-inventor of the support vector machine method.[288]
In 2014, Google also bought DeepMind Technologies, a British start-up that developed a system capable of learning how to play Atari video games using only raw pixels as data input. In 2015, they demonstrated the AlphaGo system, which achieved one of the long-standing "grand challenges" of AI by learning the game of Go well enough to beat a professional human Go player.[289][290][291]
In 2015, Blippar demonstrated a new mobile augmented reality application that makes use of deep learning to recognize objects in real time.[292]
Criticism and comment
Given the far-reaching implications of artificial intelligence, coupled with the realization that deep learning is emerging as one of its most powerful techniques, the subject is understandably attracting both criticism and comment, in some cases from outside the field of computer science itself.
A main criticism of deep learning concerns the lack of theory surrounding many of the methods. Learning in the most common deep architectures is implemented using gradient descent; while gradient descent has been understood for a while now, the theory surrounding other algorithms, such as contrastive divergence, is less clear (i.e., does it converge? If so, how fast? What is it approximating?). Deep learning methods are often looked at as a black box, with most confirmations done empirically rather than theoretically.
Others point out that deep learning should be looked at as a step towards realizing strong AI, not as an all-encompassing solution. Despite the power of deep learning methods, they still lack much of the functionality needed for realizing this goal entirely. Research psychologist Gary Marcus has noted that:
"Realistically,deeplearningisonlypartofthelargerchallengeofbuildingintelligentmachines.Such
techniqueslackwaysofrepresentingcausalrelationships(...)havenoobviouswaysofperforming
logicalinferences,andtheyarealsostillalongwayfromintegratingabstractknowledge,suchas
informationaboutwhatobjectsare,whattheyarefor,andhowtheyaretypicallyused.Themost
powerfulA.I.systems,likeWatson(...)usetechniqueslikedeeplearningasjustoneelementinavery
complicatedensembleoftechniques,rangingfromthestatisticaltechniqueofBayesianinferenceto
deductivereasoning."[293]
To the extent that such a viewpoint implies, without intending to, that deep learning will ultimately constitute nothing more than the primitive discriminatory levels of a comprehensive future machine intelligence, a recent pair of speculations regarding art and artificial intelligence[294] offers an alternative and more expansive outlook. The first such speculation is that it might be possible to train a machine-vision stack to perform the sophisticated task of discriminating between "old master" and amateur figure drawings; the second is that such a sensitivity might in fact represent the rudiments of a non-trivial machine empathy. It is suggested, moreover, that such an eventuality would be in line with anthropology, which identifies a concern with aesthetics as a key element of behavioral modernity.[295]
In further reference to the idea that a significant degree of artistic sensitivity might inhere within relatively low levels, whether biological or digital, of the cognitive hierarchy, a published series of graphic representations of the internal states of deep (20-30 layer) neural networks attempting to discern, within essentially random data, the images on which they were trained[296] seems to demonstrate a striking visual appeal, in light of the remarkable level of public attention which this work captured: the original research notice received well over 1,000 comments, and the coverage by The Guardian[297] was for a time the most frequently accessed article on that newspaper's website.
Some currently popular and successful deep learning architectures display certain problematic behaviors,[298] such as confidently classifying unrecognizable images as belonging to a familiar category of ordinary images[299] and misclassifying minuscule perturbations of correctly classified images.[300] The creator of OpenCog, Ben Goertzel, hypothesized[298] that these behaviors are due to limitations in the internal representations learned by these architectures, and that these limitations would inhibit integration of the architectures into heterogeneous multi-component AGI architectures. It is suggested that these issues can be worked around by developing deep learning architectures that internally form states homologous to image-grammar[301] decompositions of observed entities and events.[298] Learning a grammar (visual or linguistic) from training data would be equivalent to restricting the system to common-sense reasoning that operates on concepts in terms of the production rules of the grammar, and is a basic goal of both human language acquisition[302] and AI. (See also Grammar induction.[303])
Software libraries
Deeplearning4j - An open-source deep learning library written for Java/C++ with LSTMs and convolutional networks. It provides parallelization with Spark on CPUs and GPUs.
Gensim - A toolkit for natural language processing implemented in the Python programming language.
Keras - An open-source deep learning framework for the Python programming language.
Microsoft CNTK (Computational Network Toolkit) - Microsoft's open-source deep learning toolkit for Windows and Linux. It provides parallelization with CPUs and GPUs across multiple servers.
MXNet - An open-source deep learning framework that allows you to define, train, and deploy deep neural networks.
OpenNN - An open-source C++ library which implements deep neural networks and provides parallelization with CPUs.
PaddlePaddle (http://www.paddlepaddle.org) - An open-source C++/CUDA library with a Python API for a scalable deep learning platform with CPUs and GPUs, originally developed by Baidu.
TensorFlow - Google's open-source machine learning library in C++ and Python with APIs for both. It provides parallelization with CPUs and GPUs.
Theano - An open-source machine learning library for Python supported by the University of Montreal and Yoshua Bengio's team.
Torch - An open-source software library for machine learning based on the Lua programming language and used by Facebook.
Caffe (http://caffe.berkeleyvision.org) - A deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors.
DIANNE (http://dianne.intec.ugent.be) - A modular open-source deep learning framework in Java/OSGi developed at Ghent University, Belgium. It provides parallelization with CPUs and GPUs across multiple servers.
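As a minimal usage example for one of the libraries above, the following adds two constants with TensorFlow's graph-and-session API of this period; it is illustrative only.

    # A minimal sketch, assuming TensorFlow 1.x; illustrative only.
    import tensorflow as tf

    a = tf.constant(2.0)   # nodes are added to a computation graph...
    b = tf.constant(3.0)
    with tf.Session() as sess:   # ...and evaluated inside a session
        print(sess.run(a + b))   # prints 5.0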
See also
Applications of artificial intelligence
Artificial neural networks
Boltzmann machine
Compressed sensing
Connectionism
Echo state network
List of artificial intelligence projects
Liquid state machine
List of datasets for machine learning research
Reservoir computing
Sparse coding
References
1. Deng, L.; Yu, D. (2014). "Deep Learning: Methods and Applications" (PDF). Foundations and Trends in Signal Processing. 7 (3-4): 1-199. doi:10.1561/2000000039.
2. Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2016). Deep Learning. MIT Press. Online (http://www.deeplearningbook.org)
3. Bengio, Yoshua (2009). "Learning Deep Architectures for AI" (PDF). Foundations and Trends in Machine Learning. 2 (1): 1-127. doi:10.1561/2200000006.
4. Bengio, Y.; Courville, A.; Vincent, P. (2013). "Representation Learning: A Review and New Perspectives". IEEE Transactions on Pattern Analysis and Machine Intelligence. 35 (8): 1798-1828. arXiv:1206.5538. doi:10.1109/tpami.2013.50.
5. Schmidhuber, J. (2015). "Deep Learning in Neural Networks: An Overview". Neural Networks. 61: 85-117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003.
6. Bengio, Yoshua; LeCun, Yann; Hinton, Geoffrey (2015). "Deep Learning". Nature. 521: 436-444. doi:10.1038/nature14539. PMID 26017442.
7. Deep Machine Learning - A New Frontier in Artificial Intelligence Research - a survey paper by Itamar Arel, Derek C. Rose, and Thomas P. Karnowski. IEEE Computational Intelligence Magazine, 2013.
8. Schmidhuber, Jürgen (2015). "Deep Learning". Scholarpedia. 10 (11): 32832. doi:10.4249/scholarpedia.32832.
9. Carlos E. Perez. "A Pattern Language for Deep Learning".
10. Glauner, P. (2015). Deep Convolutional Neural Networks for Smile Recognition (MSc Thesis). Imperial College London, Department of Computing. arXiv:1508.06535.
11. Song, H.A.; Lee, S.Y. (2013). "Hierarchical Representation Using NMF". Neural Information Processing. Lecture Notes in Computer Science. 8226. Springer Berlin Heidelberg. pp. 466-473. doi:10.1007/978-3-642-42054-2_58. ISBN 978-3-642-42053-5.
12. Olshausen, B.A. (1996). "Emergence of simple-cell receptive field properties by learning a sparse code for natural images". Nature. 381 (6583): 607-609. doi:10.1038/381607a0. PMID 8637596.
13. Collobert, R. (April 2011). Deep Learning for Efficient Discriminative Parsing. VideoLectures.net. Event occurs at 7min 45s.
14. Gomes, L. (20 October 2014). "Machine-Learning Maestro Michael Jordan on the Delusions of Big Data and Other Huge Engineering Efforts". IEEE Spectrum.
15. J. Schmidhuber., "Learning complex, extended sequences using the principle of history compression," Neural Computation, 4, pp. 234-242, 1992.
16. Hinton, G.E. "Deep belief networks". Scholarpedia. 4 (5): 5947. doi:10.4249/scholarpedia.5947.
17. Balázs Csanád Csáji. Approximation with Artificial Neural Networks; Faculty of Sciences; Eötvös Loránd University, Hungary.
18. Cybenko (1989). "Approximations by superpositions of sigmoidal functions" (PDF). Mathematics of Control, Signals, and Systems. 2 (4): 303-314. doi:10.1007/bf02551274.
19. Hornik, Kurt (1991). "Approximation Capabilities of Multilayer Feedforward Networks". Neural Networks. 4 (2): 251-257. doi:10.1016/0893-6080(91)90009-t.
20. Haykin, Simon (1998). Neural Networks: A Comprehensive Foundation, Volume 2, Prentice Hall. ISBN 0-13-273350-1.
21. Hassoun, M. (1995) Fundamentals of Artificial Neural Networks. MIT Press, p. 48.
22. Murphy, K.P. (2012) Machine learning: a probabilistic perspective. MIT Press.
23. Hinton, G.E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R.R. (2012). "Improving neural networks by preventing co-adaptation of feature detectors". arXiv:1207.0580 [math.LG].
24. Ivakhnenko, Alexey (1965). Cybernetic Predicting Devices. Kiev: Naukova Dumka.
25. Ivakhnenko, Alexey (1971). "Polynomial theory of complex systems". IEEE Transactions on Systems, Man and Cybernetics (4): 364-378.
26. Fukushima, K. (1980). "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position". Biol. Cybern. 36: 193-202. doi:10.1007/bf00344251. PMID 7370364.
27. Seppo Linnainmaa (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 6-7.
28. Griewank, Andreas (2012). Who Invented the Reverse Mode of Differentiation?. Optimization Stories, Documenta Matematica, Extra Volume ISMP (2012), 389-400.
29. P. Werbos., "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences," PhD thesis, Harvard University, 1974.
30. Paul Werbos (1982). Applications of advances in nonlinear sensitivity analysis. In System modeling and optimization (pp. 762-770). Springer Berlin Heidelberg. Online (http://werbos.com/Neural/SensitivityIFIPSeptember1981.pdf)
31. LeCun et al., "Backpropagation Applied to Handwritten Zip Code Recognition," Neural Computation, 1, pp. 541-551, 1989.
32. Jürgen Schmidhuber (1993). Habilitation thesis, TUM, 1993. Page 150 ff demonstrates credit assignment across the equivalent of 1,200 layers in an unfolded RNN. Online (ftp://ftp.idsia.ch/pub/juergen/habilitation.pdf)
33. de Carvalho, Andre C.L.F.; Fairhurst, Mike C.; Bisset, David (1994-08-08). "An integrated Boolean neural network for pattern classification". Pattern Recognition Letters. 15 (8): 807-813. doi:10.1016/0167-8655(94)90009-4.
34. Hinton, Geoffrey E.; Dayan, Peter; Frey, Brendan J.; Neal, Radford (1995-05-26). "The wake-sleep algorithm for unsupervised neural networks". Science. 268 (5214): 1158-1161. doi:10.1126/science.7761831.
35. S. Hochreiter., "Untersuchungen zu dynamischen neuronalen Netzen (http://people.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf)," Diploma thesis. Institut f. Informatik, Technische Univ. Munich. Advisor: J. Schmidhuber, 1991.
36. S. Hochreiter et al., "Gradient flow in recurrent nets: the difficulty of learning long-term dependencies," In S.C. Kremer and J.F. Kolen, editors, A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press, 2001.
37. J. Weng, N. Ahuja and T.S. Huang, "Cresceptron: a self-organizing neural network which grows adaptively (http://www.cse.msu.edu/~weng/research/CresceptronIJCNN1992.pdf)," Proc. International Joint Conference on Neural Networks, Baltimore, Maryland, vol I, pp. 576-581, June, 1992.
38. J. Weng, N. Ahuja and T.S. Huang, "Learning recognition and segmentation of 3-D objects from 2-D images (http://www.cse.msu.edu/~weng/research/CresceptronICCV1993.pdf)," Proc. 4th International Conf. Computer Vision, Berlin, Germany, pp. 121-128, May, 1993.
39. J. Weng, N. Ahuja and T.S. Huang, "Learning recognition and segmentation using the Cresceptron (http://www.cse.msu.edu/~weng/research/CresceptronIJCV.pdf)," International Journal of Computer Vision, vol. 25, no. 2, pp. 105-139, Nov. 1997.
40. Morgan, Bourlard, Renals, Cohen, Franco (1993) "Hybrid neural network/hidden Markov model systems for continuous speech recognition. ICASSP/IJPRAI"
41. T. Robinson. (1992) A real-time recurrent error propagation network word recognition system, ICASSP.
42. Waibel, Hanazawa, Hinton, Shikano, Lang. (1989) "Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech and Signal Processing."
43. Baker, J.; Deng, Li; Glass, Jim; Khudanpur, S.; Lee, C.-H.; Morgan, N.; O'Shaughnessy, D. (2009). "Research Developments and Directions in Speech Recognition and Understanding, Part 1". IEEE Signal Processing Magazine. 26 (3): 75-80. doi:10.1109/msp.2009.932166.
44. Y. Bengio (1991). "Artificial Neural Networks and their Application to Speech/Sequence Recognition," Ph.D. thesis, McGill University, Canada.
45. Deng, L.; Hassanein, K.; Elmasry, M. (1994). "Analysis of correlation structure for a neural predictive model with applications to speech recognition". Neural Networks. 7 (2): 331-339. doi:10.1016/0893-6080(94)90027-2.
46. Heck, L.; Konig, Y.; Sonmez, M.; Weintraub, M. (2000). "Robustness to Telephone Handset Distortion in Speaker Recognition by Discriminative Feature Design". Speech Communication. 31 (2): 181-192. doi:10.1016/s0167-6393(99)00077-1.
47. Hinton, G.; Deng, L.; Yu, D.; Dahl, G.; Mohamed, A.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.; Kingsbury, B. (2012). "Deep Neural Networks for Acoustic Modeling in Speech Recognition - The shared views of four research groups". IEEE Signal Processing Magazine. 29 (6): 82-97. doi:10.1109/msp.2012.2205597.
48. Deng, L.; Hinton, G.; Kingsbury, B. (2013). "New types of deep neural network learning for speech recognition and related applications: An overview (ICASSP)".
49. Keynote talk: Recent Developments in Deep Neural Networks. ICASSP, 2013 (by Geoff Hinton).
50. Keynote talk: "Achievements and Challenges of Deep Learning - From Speech Analysis and Recognition To Language and Multimodal Processing," Interspeech, September 2014.
51. Hochreiter, Sepp; Schmidhuber, Jürgen. Long Short-Term Memory, Neural Computation, 9 (8): 1735-1780, 1997.
52. Alex Graves, Douglas Eck, Nicole Beringer, and Jürgen Schmidhuber (2003). Biologically Plausible Speech Recognition with LSTM Neural Nets. 1st Intl. Workshop on Biologically Inspired Approaches to Advanced Information Technology, Bio-ADIT 2004, Lausanne, Switzerland, pp. 175-184, 2004. Online (ftp://ftp.idsia.ch/pub/juergen/bioadit2004.pdf)
53. Alex Graves, Santiago Fernandez, Faustino Gomez, and Jürgen Schmidhuber (2006). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural nets. Proceedings of ICML'06, pp. 369-376.
54. Santiago Fernandez, Alex Graves, and Jürgen Schmidhuber (2007). An application of recurrent neural networks to discriminative keyword spotting. Proceedings of ICANN (2), pp. 220-229.
55. Haim Sak, Andrew Senior, Kanishka Rao, Françoise Beaufays and Johan Schalkwyk (September 2015): Google voice search: faster and more accurate. (http://googleresearch.blogspot.ch/2015/09/google-voice-search-faster-and-more.html)
56. Igor Aizenberg, Naum N. Aizenberg, Joos P.L. Vandewalle (2000). Multi-Valued and Universal Binary Neurons: Theory, Learning and Applications. Springer Science & Business Media.
57. Google Ngram chart of the usage of the expression "deep learning", posted by Jürgen Schmidhuber (2015). Online (https://plus.google.com/100849856540000067209/posts/7N6z251w2Wd?pid=6127540521703625346&oid=100849856540000067209)
58. G.E. Hinton., "Learning multiple layers of representation," Trends in Cognitive Sciences, 11, pp. 428-434, 2007.
59. J. Schmidhuber., "My First Deep Learning System of 1991 + Deep Learning Timeline 1962-2013." Online (http://people.idsia.ch/~juergen/firstdeeplearner.html)
60. Deng, Li; Hinton, Geoffrey; Kingsbury, Brian (1 May 2013). "New types of deep neural network learning for speech recognition and related applications: An overview" - via research.microsoft.com.
61. L. Deng et al. Recent Advances in Deep Learning for Speech Research at Microsoft, ICASSP, 2013.
62. L. Deng, O. Abdel-Hamid, and D. Yu, A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion, ICASSP, 2013.
63. T. Sainath et al., "Convolutional neural networks for LVCSR," ICASSP, 2013.
64. Hasim Sak, Andrew Senior and Francoise Beaufays (2014). Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling. Proceedings of Interspeech 2014.
65. Xiangang Li, Xihong Wu (2015). Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition. arXiv:1410.4281 (http://arxiv.org/abs/1410.4281)
66. Heiga Zen and Hasim Sak (2015). Unidirectional Long Short-Term Memory Recurrent Neural Network with Recurrent Output Layer for Low-Latency Speech Synthesis. In Proceedings of ICASSP, pp. 4470-4474.
67. Yann LeCun (2016). Slides on Deep Learning. Online (https://indico.cern.ch/event/510372/)
68. D. Yu, L. Deng, G. Li, and F. Seide (2011). "Discriminative pretraining of deep neural networks," U.S. Patent Filing.
69. NIPS Workshop: Deep Learning for Speech Recognition and Related Applications, Whistler, BC, Canada, Dec. 2009 (Organizers: Li Deng, Geoff Hinton, D. Yu).
70. Yu, D.; Deng, L. (2014). "Automatic Speech Recognition: A Deep Learning Approach (Publisher: Springer)".
71. IEEE (2015) http://blogs.technet.com/b/inside_microsoft_research/archive/2015/12/03/deng-receives-prestigious-ieee-technical-achievement-award.aspx
72. "Nvidia CEO bets big on deep learning and VR". Venture Beat. April 5, 2016.
73. "From not working to neural networking". The Economist.
74. Oh, K.-S.; Jung, K. (2004). "GPU implementation of neural networks". Pattern Recognition. 37 (6): 1311-1314. doi:10.1016/j.patcog.2004.01.013.
75. Chellapilla, K., Puri, S., and Simard, P. (2006). High performance convolutional neural networks for document processing. International Workshop on Frontiers in Handwriting Recognition.
76. D.C. Ciresan et al., "Deep Big Simple Neural Nets for Handwritten Digit Recognition," Neural Computation, 22, pp. 3207-3220, 2010.
77. R. Raina, A. Madhavan, A. Ng., "Large-scale Deep Unsupervised Learning using Graphics Processors," Proc. 26th Int. Conf. on Machine Learning, 2009.
78. Sze, Vivienne; Chen, Yu-Hsin; Yang, Tien-Ju; Emer, Joel (2017). "Efficient Processing of Deep Neural Networks: A Tutorial and Survey". arXiv:1703.09039.
79. Riesenhuber, M; Poggio, T (1999). "Hierarchical models of object recognition in cortex". Nature Neuroscience. 2 (11): 1019-1025. doi:10.1038/14819.
80. Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel. 1989 Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1 (4): 541-551.
81. Griewank, Andreas and Walther, A. Principles and Techniques of Algorithmic Differentiation, Second Edition. SIAM, 2008.
82. Kelley, Henry J. (1960). "Gradient theory of optimal flight paths". Ars Journal. 30 (10): 947-954. doi:10.2514/8.5282.
83. Arthur E. Bryson (1961, April). A gradient method for optimizing multi-stage allocation processes. In Proceedings of the Harvard Univ. Symposium on digital computers and their applications.
84. Dreyfus, Stuart (1962). "The numerical solution of variational problems". Journal of Mathematical Analysis and Applications. 5 (1): 30-45. doi:10.1016/0022-247x(62)90004-5.
85. Dreyfus, Stuart (1973). "The computational solution of optimal control problems with time lag". IEEE Transactions on Automatic Control. 18 (4): 383-385. doi:10.1109/tac.1973.1100330.
86. Rumelhart, D.E., Hinton, G.E. & Williams, R.J., "Learning representations by back-propagating errors," Nature, 1986.
87. Stuart Dreyfus (1990). Artificial Neural Networks, Back Propagation and the Kelley-Bryson Gradient Procedure. J. Guidance, Control and Dynamics, 1990.
88. Graves, Alex; Schmidhuber, Jürgen; Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks, in Bengio, Yoshua; Schuurmans, Dale; Lafferty, John; Williams, Chris K.I.; and Culotta, Aron (eds.), Advances in Neural Information Processing Systems 22 (NIPS'22), December 7th-10th, 2009, Vancouver, BC, Neural Information Processing Systems (NIPS) Foundation, 2009, pp. 545-552.
89. Graves, A.; Liwicki, M.; Fernandez, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. (2009). "A Novel Connectionist System for Improved Unconstrained Handwriting Recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (5): 855-868. doi:10.1109/tpami.2008.137.
90. Sven Behnke (2003). Hierarchical Neural Networks for Image Interpretation (PDF). Lecture Notes in Computer Science. 2766. Springer.
91. Smolensky, P. (1986). "Information processing in dynamical systems: Foundations of harmony theory". In D.E. Rumelhart, J.L. McClelland, & the PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. 1. pp. 194-281.
92. Hinton, G.E.; Osindero, S.; Teh, Y. (2006). "A fast learning algorithm for deep belief nets" (PDF). Neural Computation. 18 (7): 1527-1554. doi:10.1162/neco.2006.18.7.1527. PMID 16764513.
93. Hinton, G. (2009). "Deep belief networks". Scholarpedia. 4 (5): 5947. doi:10.4249/scholarpedia.5947.
94. John Markoff (25 June 2012). "How Many Computers to Identify a Cat? 16,000.". New York Times.
95. Ng, Andrew; Dean, Jeff (2012). "Building High-level Features Using Large Scale Unsupervised Learning". arXiv:1112.6209.
96. D.C. Ciresan, U. Meier, J. Masci, L.M. Gambardella, J. Schmidhuber. Flexible, High Performance Convolutional Neural Networks for Image Classification. International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011.
97. Martines, H.; Bengio, Y.; Yannakakis, G.N. (2013). "Learning Deep Physiological Models of Affect". IEEE Computational Intelligence. 8 (2): 20-33. doi:10.1109/mci.2013.2247823.
98. D.C. Ciresan, U. Meier, J. Masci, J. Schmidhuber. Multi-Column Deep Neural Network for Traffic Sign Classification. Neural Networks, 2012.
99. D. Ciresan, A. Giusti, L. Gambardella, J. Schmidhuber. Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images. In Advances in Neural Information Processing Systems (NIPS 2012), Lake Tahoe, 2012.
100. Krizhevsky, A., Sutskever, I. and Hinton, G.E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada.
101. D.C. Ciresan, U. Meier, J. Schmidhuber. Multi-column Deep Neural Networks for Image Classification. IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012.
102. D.J. Felleman and D.C. Van Essen, "Distributed hierarchical processing in the primate cerebral cortex (http://cercor.oxfordjournals.org/content/1/1/1.1.full.pdf+html)," Cerebral Cortex, 1, pp. 1-47, 1991.
103. J. Weng, "Natural and Artificial Intelligence: Introduction to Computational Brain-Mind (http://www.amazon.com/Natural-Artificial-Intelligence-Introduction-Computational/dp/0985875720)," BMI Press, ISBN 978-0-9858757-2-5, 2012.
104. J. Weng, "Why Have We Passed 'Neural Networks Do not Abstract Well'? (http://www.cse.msu.edu/~weng/research/WhyPassWengNI2011.pdf)," Natural Intelligence: the INNS Magazine, vol. 1, no. 1, pp. 13-22, 2011.
105. Z. Ji, J. Weng, and D. Prokhorov, "Where-What Network 1: Where and What Assist Each Other Through Top-down Connections (http://www.cse.msu.edu/~weng/research/ICDL08_0077.pdf)," Proc. 7th International Conference on Development and Learning (ICDL'08), Monterey, CA, Aug. 9-12, pp. 1-6, 2008.
106. X. Wu, G. Guo, and J. Weng, "Skull-closed Autonomous Development: WWN-7 Dealing with Scales (http://www.cse.msu.edu/~weng/research/WWN7WuICBM2013.pdf)," Proc. International Conference on Brain-Mind, July 27-28, East Lansing, Michigan, pp. 1-9, 2013.
107. Szegedy, Christian, Alexander Toshev, and Dumitru Erhan. "Deep neural networks for object detection." Advances in Neural Information Processing Systems. 2013.
108. Gers, Felix; Schraudolph, Nicholas; Schmidhuber, Jürgen (2002). "Learning precise timing with LSTM recurrent networks". Journal of Machine Learning Research. 3: 115-143.
109. Felix A. Gers and Jürgen Schmidhuber. LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages. IEEE TNN 12 (6): 1333-1340, 2001.
110. I. Sutskever, O. Vinyals, Q. Le (2014) "Sequence to Sequence Learning with Neural Networks," Proc. NIPS.
111. Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu (2016). Exploring the Limits of Language Modeling. arXiv (http://arxiv.org/abs/1602.02410)
112. Dan Gillick, Cliff Brunk, Oriol Vinyals, Amarnag Subramanya (2015). Multilingual Language Processing From Bytes. arXiv (http://arxiv.org/abs/1512.00103)
113. T. Mikolov et al., "Recurrent neural network based language model," Interspeech, 2010.
114. LeCun, Y. et al. "Gradient-based learning applied to document recognition". Proceedings of the IEEE. 86 (11): 2278-2324. doi:10.1109/5.726791.
115. Eiji Mizutani, Stuart Dreyfus, Kenichi Nishio (2000). On derivation of MLP backpropagation from the Kelley-Bryson optimal-control gradient formula and its application. Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2000), Como, Italy, July 2000. Online (http://queue.ieor.berkeley.edu/People/Faculty/dreyfuspubs/ijcnn2k.pdf)
116. Bryson, A.E.; W.F. Denham; S.E. Dreyfus. Optimal programming problems with inequality constraints. I: Necessary conditions for extremal solutions. AIAA J. 1, 11 (1963) 2544-2550.
117. Stuart Russell; Peter Norvig. Artificial Intelligence: A Modern Approach. p. 578. "The most popular method for learning in multilayer networks is called Back-propagation."
118. Arthur Earl Bryson, Yu-Chi Ho (1969). Applied optimal control: optimization, estimation, and control. Blaisdell Publishing Company or Xerox College Publishing. p. 481.
119. Seppo Linnainmaa (1976). Taylor expansion of the accumulated rounding error. BIT Numerical Mathematics, 16 (2), 146-160.
120. Paul Werbos (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University.
121. Eric A. Wan (1993). Time series prediction by using a connectionist network with internal delay lines. In Santa Fe Institute Studies in the Sciences of Complexity Proceedings (Vol. 15, pp. 195-195). Addison-Wesley Publishing Co.
122. G.E. Hinton et al., "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The shared views of four research groups," IEEE Signal Processing Magazine, pp. 82-97, November 2012.
123. Y. Bengio et al., "Advances in optimizing recurrent networks," ICASSP, 2013.
124. G. Dahl et al., "Improving DNNs for LVCSR using rectified linear units and dropout," ICASSP, 2013.
125. G.E. Hinton., "A Practical Guide to Training Restricted Boltzmann Machines," Tech. Rep. UTML TR 2010-003, Dept. CS., Univ. of Toronto, 2010.
126. Huang, Guang-Bin; Zhu, Qin-Yu; Siew, Chee-Kheong (2006). "Extreme learning machine: theory and applications". Neurocomputing. 70 (1): 489-501. doi:10.1016/j.neucom.2005.12.126.
127. Widrow, Bernard; et al. (2013). "The no-prop algorithm: A new learning algorithm for multilayer neural networks". Neural Networks. 37: 182-188. doi:10.1016/j.neunet.2012.09.020.
128. Ollivier, Yann; Charpiat, Guillaume (2015). "Training recurrent networks without backtracking". arXiv:1507.07680.
129. Aleksander, Igor, et al. "A brief introduction to Weightless Neural Systems." ESANN. 2009.
130. Alexey Grigorevich Ivakhnenko, V.G. Lapa and R.N. McDonough (1967). Cybernetics and forecasting techniques. American Elsevier, NY.
131. Alexey Grigorevich Ivakhnenko (1968). The group method of data handling - a rival of the method of stochastic approximation. Soviet Automatic Control, 13 (3): 43-55.
132. Kondo, T.; Ueno, J. (2008). "Multi-layered GMDH-type neural network self-selecting optimum neural network architecture and its application to 3-dimensional medical image recognition of blood vessels". International Journal of Innovative Computing, Information and Control. 4 (1): 175-187.
133. "Unsupervised Feature Learning and Deep Learning Tutorial".
134. Szegedy, Christian; Liu, Wei; Jia, Yangqing; Sermanet, Pierre; Reed, Scott; Anguelov, Dragomir; Erhan, Dumitru; Vanhoucke, Vincent; Rabinovich, Andrew (2014). "Going Deeper with Convolutions". Computing Research Repository. arXiv:1409.4842.
135. Goller, C.; Küchler, A. "Learning task-dependent distributed representations by backpropagation through structure". Neural Networks, 1996, IEEE. doi:10.1109/ICNN.1996.548916.
136. Socher, Richard; Lin, Cliff; Ng, Andrew Y.; Manning, Christopher D. "Parsing Natural Scenes and Natural Language with Recursive Neural Networks". The 28th International Conference on Machine Learning (ICML 2011).
137. Socher, Richard; Perelygin, Alex; Wu, Jean Y.; Chuang, Jason; Manning, Christopher D.; Ng, Andrew Y.; Potts, Christopher. "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank" (PDF). EMNLP 2013.
138. Justin Bayer, Daan Wierstra, Julian Togelius, and Jürgen Schmidhuber (2009). Evolving memory cell structures for sequence learning. Proceedings of ICANN (2), pp. 755-764.
139. Santiago Fernandez, Alex Graves, and Jürgen Schmidhuber (2007). Sequence labelling in structured domains with hierarchical recurrent neural networks. Proceedings of IJCAI.
140. Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, Andrew Ng (2014). DeepSpeech: Scaling up end-to-end speech recognition. arXiv:1412.5567 (http://arxiv.org/abs/1412.5567)
141. Fan, Y., Qian, Y., Xie, F., and Soong, F.K. (2014). TTS synthesis with bidirectional LSTM based recurrent neural networks. In Proceedings of Interspeech.
142. Bo Fan, Lijuan Wang, Frank K. Soong, and Lei Xie (2015). Photo-Real Talking Head with Deep Bidirectional LSTM. In Proceedings of ICASSP 2015.
143. Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan (2015). Show and Tell: A Neural Image Caption Generator. arXiv (http://arxiv.org/abs/1411.4555)
144. Larochelle, H. et al. "An empirical evaluation of deep architectures on problems with many factors of variation". Proc. 24th Int. Conf. Machine Learning. 2007: 473-480.
145. G.E. Hinton., "Training Product of Experts by Minimizing Contrastive Divergence," (http://www.cs.toronto.edu/~fritz/absps/nccd.pdf) Neural Computation, 14, pp. 1771-1800, 2002.
146. Fischer, A.; Igel, C. (2014). "Training Restricted Boltzmann Machines: An Introduction" (PDF). Pattern Recognition. 47: 25-39. doi:10.1016/j.patcog.2013.05.025.
147. Convolutional Deep Belief Networks on CIFAR-10 (http://www.cs.toronto.edu/~kriz/conv-cifar10-aug2010.pdf)
148. Lee, Honglak; Grosse, Roger; Ranganath, Rajesh; Ng, Andrew Y. (1 January 2009). "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations". ACM. pp. 609-616. doi:10.1145/1553374.1553453 - via ACM Digital Library.
149. D. Graupe, "Principles of Artificial Neural Networks. 3rd Edition", World Scientific Publishers, 2013.
150. D. Graupe, "Large memory storage and retrieval (LAMSTAR) network, US Patent 5920852A", April 1996.
151. D. Graupe, "Principles of Artificial Neural Networks. 3rd Edition", World Scientific Publishers, 2013, pp. 203-274.
152. V.P. Nigam, D. Graupe, (2004), "A neural-network-based detection of epilepsy", Neurological Research, 26 (1): 55-60.
153. Waxman, J.; Graupe, D.; Carley, C.W. (2010). "Automated prediction of apnea and hypopnea, using a LAMSTAR artificial neural network". American Journal of Respiratory and Critical Care Medicine. 171 (7): 727-733.
154. Graupe, D.; Graupe, M.H.; Zhong, Y.; Jackson, R.K. (2008). "Blind adaptive filtering for non-invasive extraction of the fetal electrocardiogram and its non-stationarities". Proc. Inst. Mech. Eng., UK, Part H: Journal of Engineering in Medicine. 222 (8): 1221-1234. doi:10.1243/09544119jeim417.
155. D. Graupe, "Principles of Artificial Neural Networks. 3rd Edition", World Scientific Publishers, 2013, pp. 240-253.
156. Graupe, D.; Abon, J. (2002). "A Neural Network for Blind Adaptive Filtering of Unknown Noise from Speech". Intelligent Engineering Systems Through Artificial Neural Networks. 12: 683-688.
157. Homayon, S. (2015). "Iris Recognition for Personal Identification Using LAMSTAR Neural Network". International Journal of Computer Science and Information Technology. 7 (1).
158. D. Graupe, "Principles of Artificial Neural Networks. 3rd Edition", World Scientific Publishers, 2013, pp. 253-274.
159. Girado, J.I.; Sandin, D.J.; DeFanti, T.A. (2003). "Real-time camera-based face detection using a modified LAMSTAR neural network system". Proc. SPIE 5015, Applications of Artificial Neural Networks in Image Processing VIII. doi:10.1117/12.477405.
160. Venkatachalam, V; Selvan, S. (2007). "Intrusion Detection using an Improved Competitive Learning Lamstar Network". International Journal of Computer Science and Network Security. 7 (2): 255-263.
161. D. Graupe, M. Smollack, (2007), "Control of unstable nonlinear and nonstationary systems using LAMSTAR neural networks", Proceedings of 10th IASTED on Intelligent Control, Sect. 592, 141-144.
162. D. Graupe, "Deep Learning Neural Networks. Design and Case Studies", World Scientific Publishers, 2016, pp. 57-110.
163. Graupe, D.; Kordylewski, H. (1996). "Network based on SOM (self-organizing map) modules combined with statistical decision tools". Proc. IEEE 39th Midwest Conf. on Circuits and Systems. 1: 471-475.
164. D. Graupe, H. Kordylewski, (1998), "A large memory storage and retrieval neural network for adaptive retrieval and diagnosis", International Journal of Software Engineering and Knowledge Engineering, 1998.
165. Kordylewski, H.; Graupe, D.; Liu, K. "A novel large-memory neural network as an aid in medical diagnosis applications". IEEE Transactions on Information Technology in Biomedicine. 5 (3): 202-209. doi:10.1109/4233.945291.
166. Schneider, N.C.; Graupe, D. (2008). "A modified LAMSTAR neural network and its applications". International Journal of Neural Systems. 18 (4): 331-337. doi:10.1142/s0129065708001634.
167. D. Graupe, "Principles of Artificial Neural Networks. 3rd Edition", World Scientific Publishers, 2013, p. 217.
168. Hinton, Geoffrey; Salakhutdinov, Ruslan (2012). "A better way to pretrain deep Boltzmann machines" (PDF). Advances in Neural. 3: 1-9.
169. Hinton, Geoffrey; Salakhutdinov, Ruslan (2009). "Efficient Learning of Deep Boltzmann Machines" (PDF). 3: 448-455.
170. Bengio, Yoshua; LeCun, Yann (2007). "Scaling Learning Algorithms towards AI" (PDF). 1: 1-41.
171. Larochelle, Hugo; Salakhutdinov, Ruslan (2010). "Efficient Learning of Deep Boltzmann Machines" (PDF): 693-700.
172. Vincent, Pascal; Larochelle, Hugo; Lajoie, Isabelle; Bengio, Yoshua; Manzagol, Pierre-Antoine (2010). "Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion". The Journal of Machine Learning Research. 11: 3371-3408.
173. Dana H. Ballard (1987). Modular learning in neural networks. Proceedings of AAAI, pages 279-284.
174. Deng, Li; Yu, Dong (2011). "Deep Convex Net: A Scalable Architecture for Speech Pattern Classification" (PDF). Proceedings of the Interspeech: 2285-2288.
175. Deng, Li; Yu, Dong; Platt, John (2012). "Scalable stacking and learning for building deep architectures" (PDF). 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): 2133-2136.
176. David, Wolpert (1992). "Stacked generalization". Neural Networks. 5 (2): 241-259. doi:10.1016/S0893-6080(05)80023-1.
177. Bengio, Yoshua (2009). "Learning deep architectures for AI". Foundations and Trends in Machine Learning. 2 (1): 1-127. doi:10.1561/2200000006.
178. Hutchinson, Brian; Deng, Li; Yu, Dong (2012). "Tensor deep stacking networks". IEEE Transactions on Pattern Analysis and Machine Intelligence. 1-15: 1944-1957. doi:10.1109/tpami.2012.268.
179. Hinton, Geoffrey; Salakhutdinov, Ruslan (2006). "Reducing the Dimensionality of Data with Neural Networks". Science. 313: 504-507. doi:10.1126/science.1127647. PMID 16873662.
180. Dahl, G.; Yu, D.; Deng, L.; Acero, A. (2012). "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition". IEEE Transactions on Audio, Speech, and Language Processing. 20 (1): 30-42. doi:10.1109/tasl.2011.2134090.
181. Mohamed, Abdel-rahman; Dahl, George; Hinton, Geoffrey (2012). "Acoustic Modeling Using Deep Belief Networks". IEEE Transactions on Audio, Speech, and Language Processing. 20 (1): 14-22. doi:10.1109/tasl.2011.2109382.
182. Courville, Aaron; Bergstra, James; Bengio, Yoshua (2011). "A Spike and Slab Restricted Boltzmann Machine" (PDF). JMLR: Workshop and Conference Proceeding. 15: 233-241.
183. Courville, Aaron; Bergstra, James; Bengio, Yoshua (2011). "Unsupervised Models of Images by Spike-and-Slab RBMs". Proceedings of the 28th International Conference on Machine Learning (PDF). 10. pp. 1-8.
184. Mitchell, T; Beauchamp, J (1988). "Bayesian Variable Selection in Linear Regression". Journal of the American Statistical Association. 83 (404): 1023-1032. doi:10.1080/01621459.1988.10478694.
185. Larochelle, Hugo; Bengio, Yoshua; Louradour, Jérôme; Lamblin, Pascal (2009). "Exploring Strategies for Training Deep Neural Networks". The Journal of Machine Learning Research. 10: 1-40.
186. Coates, Adam; Carpenter, Blake (2011). "Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning": 440-445.
187. Lee, Honglak; Grosse, Roger (2009). "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations". Proceedings of the 26th Annual International Conference on Machine Learning: 1-8.
188. Lin, Yuanqing; Zhang, Tong (2010). "Deep Coding Network" (PDF). Advances in Neural...: 1-9.
189. Ranzato, Marc Aurelio; Boureau, Y-Lan (2007). "Sparse Feature Learning for Deep Belief Networks" (PDF). Advances in Neural Information Processing Systems. 23: 1-8.
190. Socher, Richard; Lin, Clif (2011). "Parsing Natural Scenes and Natural Language with Recursive Neural Networks" (PDF). Proceedings of the 26th International Conference on Machine Learning.
191. Taylor, Graham; Hinton, Geoffrey (2006). "Modeling Human Motion Using Binary Latent Variables" (PDF). Advances in Neural Information Processing Systems.
192. Vincent, Pascal; Larochelle, Hugo (2008). "Extracting and composing robust features with denoising autoencoders". Proceedings of the 25th international conference on Machine learning - ICML '08: 1096-1103.
193. Kemp, Charles; Perfors, Amy; Tenenbaum, Joshua (2007). "Learning overhypotheses with hierarchical Bayesian models". Developmental Science. 10 (3): 307-21. doi:10.1111/j.1467-7687.2007.00585.x. PMID 17444972.
194. Xu, Fei; Tenenbaum, Joshua (2007). "Word learning as Bayesian inference". Psychol. Rev. 114 (2): 245-72. doi:10.1037/0033-295X.114.2.245. PMID 17500627.
195. Chen, Bo; Polatkan, Gungor (2011). "The Hierarchical Beta Process for Convolutional Factor Analysis and Deep Learning" (PDF). Machine Learning...
196. Fei-Fei, Li; Fergus, Rob (2006). "One-shot learning of object categories". IEEE Transactions on Pattern Analysis and Machine Intelligence. 28 (4): 594-611. doi:10.1109/TPAMI.2006.79. PMID 16566508.
197. Rodriguez, Abel; Dunson, David (2008). "The Nested Dirichlet Process". Journal of the American Statistical Association. 103 (483): 1131-1154. doi:10.1198/016214508000000553.
198. Ruslan, Salakhutdinov; Joshua, Tenenbaum (2012). "Learning with Hierarchical-Deep Models". IEEE Transactions on Pattern Analysis and Machine Intelligence. 35: 1958-71. doi:10.1109/TPAMI.2012.269.
199. Chalasani, Rakesh; Principe, Jose (2013). "Deep Predictive Coding Networks". arXiv:1301.3541.
200. Mnih, Volodymyr; et al. (2015). "Human-level control through deep reinforcement learning". Nature. 518: 529-533. doi:10.1038/nature14236. PMID 25719670.
201. R. Sun and C. Sessions, Self-segmentation of sequences: Automatic formation of hierarchies of sequential behaviors. IEEE Transactions on Systems, Man, and Cybernetics: Part B Cybernetics, Vol. 30, No. 3, pp. 403-418. 2000.
202. Hinton, Geoffrey E. "Distributed representations." (1984)
203. S. Das, C.L. Giles, G.Z. Sun, "Learning Context Free Grammars: Limitations of a Recurrent Neural Network with an External Stack Memory," Proc. 14th Annual Conf. of the Cog. Sci. Soc., p. 79, 1992.
204. Mozer, M.C., & Das, S. (1993). A connectionist symbol manipulator that discovers the structure of context-free languages. NIPS 5 (pp. 863-870).
205. Schmidhuber, J. (1992). "Learning to control fast-weight memories: An alternative to recurrent nets". Neural Computation. 4 (1): 131-139. doi:10.1162/neco.1992.4.1.131.
206. Gers, F.; Schraudolph, N.; Schmidhuber, J. (2002). "Learning precise timing with LSTM recurrent networks". JMLR. 3: 115-143.
207. Jürgen Schmidhuber (1993). "An introspective network that can learn to run its own weight change algorithm". In Proc. of the Intl. Conf. on Artificial Neural Networks, Brighton. IEE. pp. 191-195.
208. Hochreiter, Sepp; Younger, A. Steven; Conwell, Peter R. (2001). "Learning to Learn Using Gradient Descent". ICANN. 2130: 87-94.
209. Grefenstette, Edward, et al. "Learning to Transduce with Unbounded Memory." (http://arxiv.org/pdf/1506.02516.pdf) arXiv:1506.02516 (2015).
210. Atkeson, Christopher G.; Schaal, Stefan (1995). "Memory-based neural networks for robot learning". Neurocomputing. 9 (3): 243-269. doi:10.1016/0925-2312(95)00033-6.
211. Salakhutdinov, Ruslan, and Geoffrey Hinton. "Semantic hashing." (http://www.utstat.toronto.edu/~rsalakhu/papers/sdarticle.pdf) International Journal of Approximate Reasoning 50.7 (2009): 969-978.
212. Le, Quoc V.; Mikolov, Tomas (2014). "Distributed representations of sentences and documents". arXiv:1405.4053.
213. Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural Turing Machines." arXiv:1410.5401 (2014).
214. Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv:1410.3916 (2014).
215. Sukhbaatar, Sainbayar, et al. "End-To-End Memory Networks." arXiv:1503.08895 (2015).
216. Bordes, Antoine, et al. "Large-scale Simple Question Answering with Memory Networks." arXiv:1506.02075 (2015).
217. Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer networks." arXiv:1506.03134 (2015).
218. Kurach, Karol; Andrychowicz, Marcin; and Sutskever, Ilya. "Neural Random-Access Machines." arXiv:1511.06392 (2015).
219. N. Kalchbrenner and P. Blunsom, "Recurrent continuous translation models," in EMNLP'2013, 2013.
220. I. Sutskever, O. Vinyals, and Q.V. Le, "Sequence to sequence learning with neural networks," in NIPS'2014, 2014.
221. K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," in Proceedings of the Empiricial Methods in Natural Language Processing (EMNLP 2014), Oct. 2014.
222. Cho, Kyunghyun, Aaron Courville, and Yoshua Bengio. "Describing Multimedia Content using Attention-based Encoder-Decoder Networks." arXiv:1507.01053 (2015).
223. Cho, Youngmin (2012). "Kernel Methods for Deep Learning" (PDF): 1-9.
224. Scholkopf, B; Smola, Alexander (1998). "Nonlinear component analysis as a kernel eigenvalue problem". Neural Computation. (44): 1299-1319. doi:10.1162/089976698300017467.
225. L. Deng, G. Tur, X. He, and D. Hakkani-Tur. "Use of Kernel Deep Convex Networks and End-To-End Learning for Spoken Language Understanding," Proc. IEEE Workshop on Spoken Language Technologies, 2012.
226. TIMIT Acoustic-Phonetic Continuous Speech Corpus. Linguistic Data Consortium, Philadelphia.
227. Abdel-Hamid, O. et al. (2014). "Convolutional Neural Networks for Speech Recognition". IEEE/ACM Transactions on Audio, Speech, and Language Processing. 22 (10): 1533-1545. doi:10.1109/taslp.2014.2339736.
228. Deng, L.; Platt, J. (2014). "Ensemble Deep Learning for Speech Recognition". Proc. Interspeech.
229. Yu, D.; Deng, L. (2010). "Roles of Pre-Training and Fine-Tuning in Context-Dependent DBN-HMMs for Real-World Speech Recognition". NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
230. Seide, F., Li, G., Yu, D. Conversational speech transcription using context-dependent deep neural networks. Interspeech, 2011.
231. Deng L., Li, J., Huang, J., Yao, K., Yu, D., Seide, F. et al. Recent Advances in Deep Learning for Speech Research at Microsoft. ICASSP, 2013.
232. Deng, L.; Li, Xiao (2013). "Machine Learning Paradigms for Speech Recognition: An Overview". IEEE Transactions on Audio, Speech, and Language Processing. 21: 1060-1089. doi:10.1109/tasl.2013.2244083.
233. L. Deng, M. Seltzer, D. Yu, A. Acero, A. Mohamed, and G. Hinton (2010). Binary Coding of Speech Spectrograms Using a Deep Autoencoder. Interspeech.
234. Z. Tuske, P. Golik, R. Schlüter and H. Ney (2014). Acoustic Modeling with Deep Neural Networks Using Raw Time Signal for LVCSR. Interspeech.
235. McMillan, R. "How Skype Used AI to Build Its Amazing New Language Translator", Wire, Dec. 2014.
236. Hannun et al. (2014) "Deep Speech: Scaling up end-to-end speech recognition", arXiv:1412.5567.
237. "Plenary presentation at ICASSP-2016" (PDF).
238. Ron Schneiderman (2015) "Accuracy, Apps Advance Speech Recognition - Interviews with Vlad Sejnoha and Li Deng", IEEE Signal Processing Magazine, Jan, 2015.
239. http://yann.lecun.com/exdb/mnist/.
240. D. Ciresan, U. Meier, J. Schmidhuber., "Multi-column Deep Neural Networks for Image Classification," Technical Report No. IDSIA-04-12, 2012.
241. D. Ciresan, A. Giusti, L.M. Gambardella, J. Schmidhuber (2013). Mitosis Detection in Breast Cancer Histology Images using Deep Neural Networks. Proceedings MICCAI, 2013.
242. "The Wolfram Language Image Identification Project". www.imageidentify.com. Retrieved 2017-03-22.
243. Vinyals et al. (2014). "Show and Tell: A Neural Image Caption Generator," arXiv:1411.4555.
244. Fang et al. (2014). "From Captions to Visual Concepts and Back," arXiv:1411.4952.
245. Kiros et al. (2014). "Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models," arXiv:1411.2539.
246. Zhong, S.; Liu, Y.; Liu, Y. "Bilinear Deep Learning for Image Classification". Proceedings of the 19th ACM International Conference on Multimedia. 11: 343-352.
247. Nvidia Demos a Car Computer Trained with "Deep Learning" (http://www.technologyreview.com/news/533936/nvidia-demos-a-car-computer-trained-with-deep-learning/) (2015-01-06), David Talbot, MIT Technology Review.
248. Y. Bengio, R. Ducharme, P. Vincent, C. Jauvin., "A Neural Probabilistic Language Model," Journal of Machine Learning Research 3 (2003) 1137-1155, 2003.
249. Goldberg, Yoav; Levy, Omar. "word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method". arXiv:1402.3722.
250.Socher,RichardManning,Christopher."DeepLearningforNLP"(PDF).Retrieved26October2014.
251.Socher,RichardBauer,JohnManning,ChristopherNg,Andrew(2013)."ParsingWithCompositionalVector
Grammars"(PDF).ProceedingsoftheACL2013conference.
252.Socher,Richard(2013)."RecursiveDeepModelsforSemanticCompositionalityOveraSentimentTreebank"(PDF).
EMNLP2013.
253.Y.Shen,X.He,J.Gao,L.Deng,andG.Mesnil(2014)"ALatentSemanticModelwithConvolutionalPoolingStructure
forInformationRetrieval,"Proc.CIKM.
254.P.Huang,X.He,J.Gao,L.Deng,A.Acero,andL.Heck(2013)"LearningDeepStructuredSemanticModelsforWeb
SearchusingClickthroughData,"Proc.CIKM.
255.Mesnil,G.Dauphin,Y.Yao,K.Bengio,Y.Deng,L.HakkaniTur,D.He,X.Heck,L.Tur,G.Yu,D.Zweig,G.
(2015)."Usingrecurrentneuralnetworksforslotfillinginspokenlanguageunderstanding".IEEETransactionson
Audio,Speech,andLanguageProcessing.23(3):530539.doi:10.1109/taslp.2014.2383614.
256.J.Gao,X.He,W.Yih,andL.Deng(2014)"LearningContinuousPhraseRepresentationsforTranslationModeling,"
Proc.ACL.
257.J.Gao,P.Pantel,M.Gamon,X.He,L.Deng(2014)"ModelingInterestingnesswithDeepNeuralNetworks,"Proc.
EMNLP.
258.BrocardoML,TraoreI,WoungangI,ObaidatMS."Authorshipverificationusingdeepbeliefnetworksystems(http://onl
inelibrary.wiley.com/doi/10.1002/dac.3259/full)".IntJCommunSyst.2017.doi:10.1002/dac.3259
259.J.Gao,X.He,L.Deng(2014)"DeepLearningforNaturalLanguageProcessing:TheoryandPractice(Tutorial),"
CIKM.
260.Arrowsmith,JMiller,P(2013)."Trialwatch:PhaseIIandphaseIIIattritionrates20112012".NatureReviewsDrug
Discovery.12(8):569.doi:10.1038/nrd4090.PMID23903212.
261.Verbist,BKlambauer,GVervoort,LTalloen,WTheQstar,ConsortiumShkedy,ZThas,OBender,AGhlmann,
H.W.Hochreiter,S(2015)."Usingtranscriptomicstoguideleadoptimizationindrugdiscoveryprojects:Lessons
learnedfromtheQSTARproject".DrugDiscoveryToday.20:505513.doi:10.1016/j.drudis.2014.12.014.
PMID25582842.
262."AnnouncementofthewinnersoftheMerckMolecularActivityChallenge"
https://www.kaggle.com/c/MerckActivity/details/winners.
263.Dahl,G.E.Jaitly,N.&Salakhutdinov,R.(2014)"MultitaskNeuralNetworksforQSARPredictions,"ArXiv,2014.
264."Toxicologyinthe21stcenturyDataChallenge"https://tripod.nih.gov/tox21/challenge/leaderboard.jsp
265."NCATSAnnouncesTox21DataChallengeWinners"http://www.ncats.nih.gov/newsandevents/features/tox21
challengewinners.html
266. Unterthiner, T.; Mayr, A.; Klambauer, G.; Steijaert, M.; Ceulemans, H.; Wegner, J. K. & Hochreiter, S. (2014) "Deep Learning as an Opportunity in Virtual Screening" (http://www.bioinf.jku.at/publications/2014/NIPS2014a.pdf). Workshop on Deep Learning and Representation Learning (NIPS 2014).
267. Unterthiner, T.; Mayr, A.; Klambauer, G. & Hochreiter, S. (2015) "Toxicity Prediction using Deep Learning" (http://arxiv.org/pdf/1503.01445v1). ArXiv, 2015.
268. Ramsundar, B.; Kearnes, S.; Riley, P.; Webster, D.; Konerding, D. & Pande, V. (2015) "Massively Multitask Networks for Drug Discovery". ArXiv, 2015.
269. Wallach, Izhar; Dzamba, Michael; Heifets, Abraham (2015-10-09). "AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery". arXiv:1510.02855.
270. "Toronto startup has a faster way to discover effective medicines". The Globe and Mail. Retrieved 2015-11-09.
271. "Startup Harnesses Supercomputers to Seek Cures". KQED Future of You. Retrieved 2015-11-09.
272. "Toronto startup has a faster way to discover effective medicines".
273. Tkachenko, Yegor. Autonomous CRM Control via CLV Approximation with Deep Reinforcement Learning in Discrete and Continuous Action Space. (April 8, 2015). arXiv.org: http://arxiv.org/abs/1504.01840
274. Van den Oord, Aaron, Sander Dieleman, and Benjamin Schrauwen. "Deep content-based music recommendation." Advances in Neural Information Processing Systems. 2013.
275. Elkahky, Ali Mamdouh, Yang Song, and Xiaodong He. "A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems." Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2015.
276. Chicco, Davide; Sadowski, Peter; Baldi, Pierre (1 January 2014). "Deep Autoencoder Neural Networks for Gene Ontology Annotation Predictions". ACM. pp. 533–540. doi:10.1145/2649387.2649442 – via ACM Digital Library.
277. Sathyanarayana, Aarti (2016-01-01). "Sleep Quality Prediction From Wearable Data Using Deep Learning". JMIR mHealth and uHealth. 4 (4): e125. doi:10.2196/mhealth.6562.
278. Choi, Edward; Schuetz, Andy; Stewart, Walter F.; Sun, Jimeng (2016-08-13). "Using recurrent neural network models for early detection of heart failure onset". Journal of the American Medical Informatics Association: ocw112. doi:10.1093/jamia/ocw112. ISSN 1067-5027. PMID 27521897.
279. Utgoff, P. E.; Stracuzzi, D. J. (2002). "Many-layered learning". Neural Computation. 14: 2497–2529. doi:10.1162/08997660260293319.
280. J. Elman et al., "Rethinking Innateness," 1996.
281. Shrager, J.; Johnson, MH (1996). "Dynamic plasticity influences the emergence of function in a simple cortical array". Neural Networks. 9 (7): 1119–1129. doi:10.1016/0893-6080(96)00033-0.
282. Quartz, SR; Sejnowski, TJ (1997). "The neural basis of cognitive development: A constructivist manifesto". Behavioral and Brain Sciences. 20 (4): 537–556. doi:10.1017/s0140525x97001581.
283. S. Blakeslee, "In brain's early growth, timetable may be critical," The New York Times, Science Section, pp. B5–B6, 1995.
284. E. Bufill, J. Agusti, R. Blesa, "Human neoteny revisited: The case of synaptic plasticity," American Journal of Human Biology, 23 (6), pp. 729–739, 2011.
285. J. Shrager and M. H. Johnson, "Timing in the development of cortical function: A computational approach," in B. Julesz and I. Kovacs (Eds.), Maturational windows and adult cortical plasticity, 1995.
286. D. Hernandez, "The Man Behind the Google Brain: Andrew Ng and the Quest for the New AI," http://www.wired.com/wiredenterprise/2013/05/neuro-artificial-intelligence/all/. Wired, 10 May 2013.
287. C. Metz, "Facebook's 'Deep Learning' Guru Reveals the Future of AI," http://www.wired.com/wiredenterprise/2013/12/facebook-yann-lecun-qa/. Wired, 12 December 2013.
288. V. Vapnik, "research.facebook.com" (https://research.facebook.com/researchers/1566384816909948/vladimir-vapnik/).
289."GoogleAIalgorithmmastersancientgameofGo".NatureNews&Comment.Retrieved20160130.
290.Silver,DavidHuang,AjaMaddison,ChrisJ.Guez,ArthurSifre,LaurentvandenDriessche,GeorgeSchrittwieser,
JulianAntonoglou,IoannisPanneershelvam,Veda(20160128)."MasteringthegameofGowithdeepneuralnetworks
andtreesearch".Nature.529(7587):484489.doi:10.1038/nature16961.ISSN00280836.PMID26819042.
291."AGoogleDeepMindAlgorithmUsesDeepLearningandMoretoMastertheGameofGo|MITTechnologyReview".
MITTechnologyReview.Retrieved20160130.
292."BlipparDemonstratesNewRealTimeAugmentedRealityApp".TechCrunch.
293.G.Marcus.,"Is"DeepLearning"aRevolutioninArtificialIntelligence?"TheNewYorker,25November2012.
294.Smith,G.W.(March27,2015)."ArtandArtificialIntelligence".ArtEnt.RetrievedMarch27,2015.
295.Mellars,Paul(February1,2005)."TheImpossibleCoincidence:ASingleSpeciesModelfortheOriginsofModern
HumanBehaviorinEurope"(PDF).EvolutionaryAnthropology:Issues,News,andReviews.RetrievedApril5,2017.
296.AlexanderMordvintsevChristopherOlahMikeTyka(June17,2015)."Inceptionism:GoingDeeperintoNeural
Networks".GoogleResearchBlog.RetrievedJune20,2015.
297.AlexHern(June18,2015)."Yes,androidsdodreamofelectricsheep".TheGuardian.RetrievedJune20,2015.
298.BenGoertzel.ArethereDeepReasonsUnderlyingthePathologiesofToday'sDeepLearningAlgorithms?(2015)Url:
http://goertzel.org/DeepLearning_v1.pdf
299.Nguyen,Anh,JasonYosinski,andJeffClune."DeepNeuralNetworksareEasilyFooled:HighConfidencePredictions
forUnrecognizableImages."arXiv:1412.1897(2014).
300.Szegedy,Christian,etal."Intriguingpropertiesofneuralnetworks."arXiv:1312.6199(2013).
301.Zhu,S.C.Mumford,D."Astochasticgrammarofimages".Found.TrendsComput.Graph.Vis.2(4):259362.
doi:10.1561/0600000018.
302.Miller,G.A.,andN.Chomsky."Patternconception."PaperforConferenceonpatterndetection,UniversityofMichigan.
1957.
303. Jason Eisner, Deep Learning of Recursive Structure: Grammar Induction, http://techtalks.tv/talks/deep-learning-of-recursive-structure-grammar-induction/58089/
External links
Deep Learning Libraries by Language (http://www.teglor.com/b/deep-learning-libraries-language-cm569/)
Data Science: Data to Insights from MIT (deep learning) (https://mitprofessionalx.mit.edu/courses/course-v1:MITProfessionalX+DSx+2016_T1/about)