Interview


ILP Infy internal: http://ilpce/ce/Course/SCP/page/30
https://www.javatpoint.com/collections-in-java
https://beginnersbook.com/java-collections-tutorials/
[6/6/2018 6:26 PM] Rajesh Konda: https://www.youtube.com/watch?v=ttH7VLw33iY&list=PLjJPQJzDx4OrhZ4vnowqAS5QyYPgMWJ7I&index=3
 

1. How to import and process data into Hadoop

hdfs dfs -put - simple way to insert files from the local file system into HDFS
HDFS Java API
Sqoop - for bringing data to/from databases
Flume - streaming files, logs
Kafka - distributed queue, mostly for near-real-time stream processing
NiFi - Apache project for moving data into HDFS without making lots of changes

For processing data: there are many ways to do it
MapReduce
Spark
Hive
HBase
Pig
Mahout
MapR
Sqoop
Etc.

2. What are the different joins in Hive?

Join (inner join) - returns only the rows that match in both tables.
Left join - displays all rows from the left table. Shows NULL for non-matched records from the right table.
Right join - displays all rows from the right table. Shows NULL for non-matched records from the left table.
Skew join - used when a few join keys have a very large number of rows (skewed data). Hive joins the normal keys on the reducer side and handles the skewed keys in a follow-up map join.
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=500000;
set hive.skewjoin.mapjoin.map.tasks=10000;
set hive.skewjoin.mapjoin.min.split=33554432;
Left semi join - only the left-side columns are displayed, and only for rows that have a match in the right table.
Map join - the small data set is copied into memory and the data is processed there, so processing speed increases.
Used for fast query processing.
The join is performed on the join key at the mapper stage, in memory, so there is no need to go to the reduce operation.
The task succeeds after the mapper task finishes.
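
The join types above can be illustrated on two tiny in-memory tables. This is a minimal sketch in plain Python, not Hive code; the table contents and the function names are made up for illustration only.

```python
# Two small "tables" as lists of (key, value) tuples (illustrative data).
left = [(1, "Asha"), (2, "Ravi"), (3, "Meena")]      # (id, name)
right = [(1, "Sales"), (1, "Audit"), (3, "HR")]      # (id, dept)


def inner_join(left, right):
    """Only rows whose key appears on both sides."""
    return [(k, name, dept) for k, name in left for rk, dept in right if k == rk]


def left_join(left, right):
    """Every left row; None stands in for NULL when there is no match."""
    out = []
    for k, name in left:
        matches = [dept for rk, dept in right if rk == k]
        out.extend((k, name, dept) for dept in (matches or [None]))
    return out


def left_semi_join(left, right):
    """Left-side columns only, one row per left row that has a match."""
    right_keys = {rk for rk, _ in right}
    return [(k, name) for k, name in left if k in right_keys]


print(inner_join(left, right))
print(left_join(left, right))
print(left_semi_join(left, right))
```

Note that the semi join returns each matching left row once, even though key 1 has two matches on the right.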

3. Previous project experience and issues faced, particularly with Sqoop.
Used a free-form query to overcome special characters in the data while using sqoop import into Hive, since Hive will not accept the data if it has some special characters.

To reduce the cost of the operation on a huge data set we need to specify the min and max values using a boundary query.

The fetch size is 1000 by default; if we increase it to 1 lakh (100,000) we can improve the performance.

sqoop import --connect jdbc:mysql://localhost:3306/sqoop \
--username swagat \
--password password \
--table customers \
--target-dir /user/hdfs \
--columns id,name \
--split-by id \
--fields-terminated-by "," \
--hive-import \
--create-hive-table \
--hive-table schemaname.tablename

sqoop import --connect jdbc:mysql://localhost/geo \
--username root --password root \
--table purchase --target-dir /user/geouser/newpurchase_hive \
--boundary-query "select 1,2 from purchase" \
--columns productname,quantity,prize \
-m 1 --split-by purchaseorder_id;

4. Previous project experience and issues faced, particularly with Hive.
Workaround for updates and deletes:
SCD1, SCD2 and SCD3: it's a pretty costly operation in Hive.
5. Difference between RDD, DataFrame and Dataset?

Type-safety example:

DataFrame - if you try to access a column which does not exist in the table, the DataFrame API does not raise a compile-time error. It detects the attribute error only at runtime. RDDs and Datasets are type-safe, so they can detect it at compile time.

6. Using Sqoop, if there is no primary key, how do you read the data?

We can import by specifying one mapper (-m 1) or by using --split-by with a column name.

sqoop import \
--connect jdbc:mysql://localhost/test_db \
--username root \
--password **** \
--table <RDBMS-Table-name> \
--target-dir /user/root/user_data \
--hive-import \
--hive-table <hive-table-name> \
--create-hive-table \
-m 1 (or) --split-by <RDBMS-Column>

We can import data from an RDBMS using one mapper. If you specify more mappers, you need to use --split-by with a column name to divide the data for a parallel import.

7. Asked about the projects I worked on, overall and relevant experience.

8. How does Sqoop get connected to different databases?

jdbc:mysql://db.foo.com/corp

jdbc:netezza://nzhost:5480/sqoop

jdbc:oracle:thin:@OracleServer:OraclePort:OracleSID

jdbc:postgresql://pgsql.example.net:5432/sqooptest

jdbc:sap://connectionurl:30015/?currentschema=<schemaname> --driver com.sap.db.jdbc.Driver

9. Have you used any specific delimiter in Sqoop?
Fields terminated by "," or the "|" delimiter if you are importing into a Hive table.

10. Default mappers run in Sqoop
4
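11. How to import data if no primary key is specified in the table?
Use one mapper (-m 1).
12. What is split-by in Sqoop?
When the number of mappers is more than one, we need to specify the --split-by column to divide the data so that the mappers can run in parallel.
13. If I have 10 different dates and I split by date (--split-by date) and I specify 4 mappers (-m 4), how are these dates distributed across the 4 mappers (inner logic)?
Roughly 3, 3, 2, 2 when the dates are evenly spaced. Sqoop takes the minimum and maximum of the split column and divides that range into equal intervals, one per mapper, so unevenly spaced dates give an uneven distribution.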
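
The range-based splitting behind --split-by can be sketched as follows. This is a simplified model in plain Python, not Sqoop's actual splitter code: dates are represented as day numbers, and `split_ranges` and `rows_per_mapper` are hypothetical helper names. The exact boundaries Sqoop computes may differ by rounding.

```python
def split_ranges(lo, hi, num_mappers):
    """Cut [lo, hi] into num_mappers equal-width intervals."""
    span = hi - lo
    bounds = [lo + span * i / num_mappers for i in range(num_mappers)] + [hi]
    return list(zip(bounds[:-1], bounds[1:]))


def rows_per_mapper(values, num_mappers):
    """Count how many values fall into each mapper's interval."""
    ranges = split_ranges(min(values), max(values), num_mappers)
    counts = [0] * num_mappers
    for v in values:
        for idx, (start, end) in enumerate(ranges):
            is_last = idx == num_mappers - 1
            # Intervals are half-open; the last one also includes the maximum.
            if start <= v < end or (is_last and v == end):
                counts[idx] += 1
                break
    return counts


even_days = list(range(1, 11))     # 10 evenly spaced dates
skewed_days = [1] * 9 + [100]      # 9 rows on one date, 1 far away
print(rows_per_mapper(even_days, 4))
print(rows_per_mapper(skewed_days, 4))
```

The second call shows why a skewed split column is a poor choice: the range is still cut into equal intervals, so almost all rows land on one mapper.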
14. Difference between internal and external tables
On DROP TABLE:
Internal table - data and schema get deleted.
External table - schema gets deleted, not the actual data.
15. SerDes in Hive
SerDe is a combination of serializer and deserializer.
A SerDe converts string or binary data to Java objects and vice versa.
When reading, it deserializes the stored data into row objects; when writing, it serializes the row objects back to the storage format.

org.apache.hadoop.hive.serde2.avro.AvroSerDe
org.apache.hadoop.hive.ql.io.orc.OrcSerde
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
org.apache.hadoop.hive.ql.io.RCFileInputFormat
Sequence file SerDe
JSON SerDe
16. Syntax for the JSON SerDe, how to insert data into a JSON table from a normal table (managed/external), and how is the schema mapped?

First of all you have to validate your JSON file on http://jsonlint.com/. After that, make your file one row per line and remove the [ ]. The comma at the end of the line is mandatory.
[{"field1":"data1","field2":100,"field3":"more data1","field4":123.001},
{"field1":"data2","field2":200,"field3":"more data2","field4":123.002},
{"field1":"data3","field2":300,"field3":"more data3","field4":123.003},
{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}]
In my test I added hive-json-serde-0.2.jar from the Hadoop cluster; I think hive-json-serde-0.1.jar should be OK.

ADD JAR hive-json-serde-0.2.jar;
Create your table:

CREATE TABLE my_table (field1 string, field2 int, field3 string, field4 double) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
Load your JSON data file; here I load it from the Hadoop cluster, not from local.

Need to explore more.

LOAD DATA INPATH 'Test2.json' INTO TABLE my_table;
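
The "one row per line" preparation step can be scripted. The sketch below is an assumption-laden illustration: it converts a JSON array into plain newline-delimited JSON (one object per line, no brackets, no trailing commas), which is what many JSON SerDes expect; check the requirements of the specific SerDe jar you use. `to_json_lines` is a hypothetical helper name.

```python
import json


def to_json_lines(array_text):
    """Turn a JSON array of objects into one compact JSON object per line."""
    records = json.loads(array_text)   # also validates the input
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)


sample = """[{"field1": "data1", "field2": 100, "field3": "more data1", "field4": 123.001},
 {"field1": "data2", "field2": 200, "field3": "more data2", "field4": 123.002}]"""

print(to_json_lines(sample))
```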


17. Performance tuning in Hive (explained to him about the ORC format and Snappy compression)
Partitions
Bucketing
Map-side joins
ORC format
Parquet format
Avro format
Cost-based optimization enabled (hive.cbo.enable=true)
hive.compute.query.using.stats=true
Using compression techniques (Snappy, zlib)
The execution engine can be set to Tez or Spark instead of MapReduce (hive.execution.engine)
hive.exec.parallel=true
Vectorization - takes a batch of records at a time to process the data, instead of one record at a time

18. While issuing a query on an ORC table in Hive, will the data be deserialized at that time? Once the data is deserialized, how will it guarantee better performance?

Hive can store the data in a highly efficient manner in the Optimized Row Columnar (ORC) file format.

The ORC file format can reduce the storage size by up to 75% of the original data file.
It performs better than the other Hive file formats when Hive is reading, writing, and processing data.

Block-level indexing: a huge set of records gets divided into a number of blocks (stripes), and an index with statistics is kept for each block, so while reading the data Hive can go directly to the relevant block and get the data.
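
The idea of block-level indexing can be sketched in a few lines. This is a toy model in plain Python, not the ORC reader: each block keeps only min/max statistics, and a lookup skips every block whose range cannot contain the value. The names `build_stripes` and `scan_equal` are hypothetical.

```python
def build_stripes(values, stripe_size):
    """Split values into blocks and record min/max statistics per block."""
    stripes = []
    for start in range(0, len(values), stripe_size):
        rows = values[start:start + stripe_size]
        stripes.append({"min": min(rows), "max": max(rows), "rows": rows})
    return stripes


def scan_equal(stripes, target):
    """Read only the blocks whose [min, max] range may contain target."""
    hits, stripes_read = [], 0
    for stripe in stripes:
        if stripe["min"] <= target <= stripe["max"]:
            stripes_read += 1
            hits.extend(v for v in stripe["rows"] if v == target)
    return hits, stripes_read


stripes = build_stripes(list(range(1, 101)), stripe_size=25)
hits, stripes_read = scan_equal(stripes, 60)
print(hits, "found after reading", stripes_read, "of", len(stripes), "blocks")
```

The skipping works best when the data is sorted or clustered on the filtered column, as in this example.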

19. Are you able to see the content of ORC files?
No. Why? Because it is a compressed binary format, so it is not readable as plain text. (The content can be inspected with the hive --orcfiledump utility.)

20. Map-side joins:
The small data set is copied into memory and the data is processed there, so processing speed increases.
Used for fast query processing.
The join is performed on the join key at the mapper stage, in memory, so there is no need to go to the reduce operation.
The task succeeds after the mapper task finishes.
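
A map-side join is essentially a hash join: the small table becomes an in-memory lookup and the big table is streamed past it. The following is a minimal sketch in plain Python with made-up data; `map_side_join` is a hypothetical name and this is not how Hive is invoked.

```python
def map_side_join(big_rows, small_rows):
    """Inner join: load the small side into a dict, stream the big side."""
    lookup = {}
    for key, value in small_rows:
        lookup.setdefault(key, []).append(value)
    for key, big_value in big_rows:
        # One dictionary probe per big-table row; no shuffle or reduce step.
        for small_value in lookup.get(key, []):
            yield key, big_value, small_value


orders = [(10, "order-1"), (20, "order-2"), (10, "order-3"), (30, "order-4")]
customers = [(10, "Asha"), (20, "Ravi")]
print(list(map_side_join(orders, customers)))
```

The approach only pays off while the small table fits in memory on every mapper.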

21. How to filter out the data having special characters (like enter, tabs) in Pig, have you come across this situation?
We can replace the special characters using the regexp_replace function.
hive> select regexp_extract('word1/~word2/~word3','^(\\w.*)\\/~(\\w.*)$',2) as word3;
word3

hive> select regexp_extract('word1/~word2/~word3','^(?:([^/~]+)\\/~?){1}',1) as word1;
word1

hive> select regexp_extract('word1/~word2/~word3','^(?:([^/~]+)\\/~?){2}',1) as word2;
word2
regexp_replace("foobar","oo|ar","")
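
The same patterns can be tried out quickly with Python's `re` module before putting them into a query. This is only a sketch for experimenting: Python raw strings need a single backslash where the Hive string literals above use a doubled one, and the last line shows the newline/tab clean-up as an assumed use case.

```python
import re

text = "word1/~word2/~word3"

# Group 2 of the greedy two-part pattern is the last field.
word3 = re.match(r"^(\w.*)/~(\w.*)$", text).group(2)
# A repeated group keeps its last iteration, so {1} and {2} pick fields 1 and 2.
word1 = re.match(r"^(?:([^/~]+)/~?){1}", text).group(1)
word2 = re.match(r"^(?:([^/~]+)/~?){2}", text).group(1)

# Equivalent of regexp_replace("foobar", "oo|ar", "").
cleaned = re.sub(r"oo|ar", "", "foobar")

# Replace runs of newlines, carriage returns and tabs with one space.
one_line = re.sub(r"[\r\n\t]+", " ", "first\n\nsecond\tthird")

print(word1, word2, word3, cleaned, one_line)
```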

22. Given me a scenario that the user, while filling in the data, pressed 'ENTER' many times so that the data is spread across new lines. How to process this data in Pig so that this data comes in a single line? And no field should be deleted.

We can replace the special characters using the regexp_replace function.
Replace \n with a space.

23. Given me a scenario that they have tab-delimited data and this data contains tabs in between, so how to process this data?

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

24. What are topics and consumers in Kafka?
It's a distributed pub-sub messaging system.

Producer: generates the messages by connecting to the source. It sends a request to ZooKeeper, which internally searches for the available brokers (newer Kafka clients get this information from the brokers directly), and moves the messages to a particular broker. The broker saves these messages in topic partitions (blocks).

Consumer: subscribes to one or more topics and gets the messages from the topic partitions.
Topic: a topic is divided into multiple partitions. Each partition has its own replication. The messages are stored in the partitions.

Partition: partitions are nothing but the physical blocks.
Kafka brokers: each node in the Kafka cluster is called a Kafka broker.

25. How will the consumer know from which topic the data should be consumed?
By using the consumer's subscribe method.

26. Once the data is consumed, how to process the data?
It's your wish. You can write your custom logic based on your requirement.

27. How does Spark know from which consumer it should receive the data? Configuration?
We need to specify which consumer needs to be used, for example Kafka, Flume, etc.
28. How does Kafka know which is the most recent data, or which data is already consumed? (concept of offset in Kafka)
Using the topic offset value. In older versions it is fetched from ZooKeeper; newer versions store the committed offsets in an internal Kafka topic.
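
The offset concept can be pictured with a toy model of a single partition. This is a sketch in plain Python, not the Kafka client API: `PartitionLog`, `poll` and `commit` are hypothetical names, and a real broker persists the committed offsets per consumer group.

```python
class PartitionLog:
    """Toy model of one topic partition with per-group committed offsets."""

    def __init__(self):
        self.messages = []
        self.committed = {}   # consumer group -> next offset to read

    def append(self, message):
        """Producer side: the offset is the position in the log."""
        self.messages.append(message)
        return len(self.messages) - 1

    def poll(self, group, max_records=10):
        """Consumer side: read from the group's committed offset onward."""
        start = self.committed.get(group, 0)
        batch = self.messages[start:start + max_records]
        return list(enumerate(batch, start))

    def commit(self, group, next_offset):
        """Remember how far this group has processed."""
        self.committed[group] = next_offset


log = PartitionLog()
for msg in ["a", "b", "c"]:
    log.append(msg)

first = log.poll("group-1", max_records=2)
log.commit("group-1", first[-1][0] + 1)
print(first, log.poll("group-1"), log.poll("group-2"))
```

Messages are not deleted when consumed; each group simply tracks its own position, which is why a second group still sees everything.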

29. Any experience on Impala, HBase?
No
30. How to address small-file problems in Hadoop? If you get (1 KB data, 3 GB data, etc.) and you are not sure about the size of the data, how do you design your Hadoop system?
We can use the SequenceFile format to combine small files.

31. Hadoop limitations?
Cannot support a large number of small files. Intended for batch processing, not good for real-time processing. Good for OLAP (online analytical processing) but not for OLTP.
32. How to move data from DB2 to HDFS, and challenges?
Using Sqoop
33. Where did you use Pig in your project?
Not used
34. When will Hive not run MapReduce after running a Hive query?
select * from table; will not run MapReduce
35. Hive - external tables, partitions

36. Hive limitations
Record-level updates and deletions are not supported
High latency
Slow
Does not support real-time processing
OLTP not supported

37. HBase basic questions like how to create a table, alter a table

create '<table name>', '<column family>'
hbase(main):002:0> list
TABLE
emp
2 row(s) in 0.0340 seconds
hbase> alter 't1', READONLY(option)
hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'MAX_FILESIZE'
hbase> alter 'table name', 'delete' => 'column family'
hbase(main):007:0> alter 'employee', 'delete' => 'professional'
hbase> get 'table1', 'row1' --- this will show all column values for row1 of table1

hbase> get 'table1', 'row1', {COLUMN => 'col1'} --- this will show the value of column col1 of row1 of table1.
HDFS vs HBase

HDFS: a distributed file system suitable for storing large files.
HBase: a database built on top of HDFS.

HDFS: does not support fast individual record lookups.
HBase: provides fast lookups for larger tables.

HDFS: provides high-latency batch processing.
HBase: provides low-latency access to single rows from billions of records (random access).

HDFS: provides only sequential access to data.
HBase: internally uses hash tables and provides random access, and it stores the data in indexed HDFS files for faster lookups.

38. Storing data in HBase using Pig?
39. How to move XML files to HDFS using streaming?
Using Kafka
40. My existing project architecture?
41. What type of problems have you faced in your project and how did you handle them?
42. How to convert the XML file format to CSV and how to read XML code, some lines of code?

43. Preparing a CSV file in the mapper class.

44. SerDes in Hive
45. Can we write a customized SerDe? How to write the code for a customized SerDe?

46. What is Sqoop?
Tool to transfer data from an RDBMS to Hadoop and vice versa.
47. How many mappers in Sqoop?
4
48. What is Pig, and Pig functions?
49. What is ORC in Hive?

50. What are the advantages of ORC?

Column-oriented storage format.
Originally it was Hive's Row Columnar (RC) file. Now improved as Optimized RC (ORC).
The schema is stored with the data, as part of the footer.
Data is stored as row groups and stripes.
Each stripe maintains indexes and stats about the data it stores.

51. What is shuffling in MapReduce?

Moving the mapper output from one node to another (to the reducers) based on keys.

52. Did you use a UDF in your project and what type of data is handled?

How to write a UDF function in Hive?

1. Create a Java class for the User Defined Function which extends org.apache.hadoop.hive.ql.exec.UDF and implements one or more evaluate() methods. Put in your desired logic and you are almost there.
2. Package your Java class into a JAR file (I am using Maven).
3. Go to the Hive CLI, add your JAR, and verify your JAR is in the Hive CLI classpath.
4. CREATE TEMPORARY FUNCTION in Hive which points to your Java class.
5. Use it in Hive SQL and have fun!

There are better ways to do this, by writing your own GenericUDF to deal with non-primitive types like arrays and maps, but I am not going to cover it in this article.

I will go into detail for each one.

evaluate(Text str, String stripChars) - will trim the specified characters in stripChars from the first argument str.
evaluate(Text str) - will trim leading and trailing spaces.
package org.hardik.letsdobigdata;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class Strip extends UDF {

    // Reused across calls to avoid allocating a new Text per row.
    private Text result = new Text();

    // Strips any of the characters in stripChars from both ends of str.
    public Text evaluate(Text str, String stripChars) {
        if (str == null) {
            return null;
        }
        result.set(StringUtils.strip(str.toString(), stripChars));
        return result;
    }

    // Strips leading and trailing whitespace from str.
    public Text evaluate(Text str) {
        if (str == null) {
            return null;
        }
        result.set(StringUtils.strip(str.toString()));
        return result;
    }
}
hive> ADD JAR /home/cloudera/workspace/HiveUDFs/target/HiveUDFs-0.0.1-SNAPSHOT.jar;
Added [/home/cloudera/workspace/HiveUDFs/target/HiveUDFs-0.0.1-SNAPSHOT.jar] to class path
hive> list jars;
/usr/lib/hive/lib/hive-contrib.jar
/home/cloudera/workspace/HiveUDFs/target/HiveUDFs-0.0.1-SNAPSHOT.jar
hive> CREATE TEMPORARY FUNCTION STRIP AS 'org.hardik.letsdobigdata.Strip';
hive> select strip('hadoop','ha') from dummy;
OK
doop
Time taken: 0.131 seconds, Fetched: 1 row(s)
hive> select strip('hiveUDF') from dummy;
OK
hiveUDF
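
The behaviour of the two evaluate() overloads can be checked without a cluster. The sketch below mirrors the logic in plain Python, assuming Python's `str.strip` behaves like the `StringUtils.strip` call for these inputs; it is an illustration, not the UDF itself.

```python
def strip(text, strip_chars=None):
    """Mirror of the UDF: None in, None out; otherwise strip both ends.

    With strip_chars=None, whitespace is stripped; otherwise any of the
    given characters are removed from the start and end of the string.
    """
    if text is None:
        return None
    return text.strip(strip_chars)


print(strip("hadoop", "ha"))    # characters 'h' and 'a' removed from the ends
print(strip("  hiveUDF  "))
print(strip(None))
```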

53. Prerequisites for data import from a database to the Hive table environment?
1. Connectivity between source and destination
2. Permissions to databases
3. JDBC drivers to connect to the database
4. Permissions to Hive databases and tables (read and write)
5. HDFS permissions
6. Sqoop installation, Hive installation, and those services should be running.

54. HDFS architecture?

55. What are the JobTracker and TaskTracker and how do they communicate?

They communicate using RPC (Remote Procedure Call).

56. HDFS data move commands?
put
get
copyFromLocal
copyToLocal
moveFromLocal
moveToLocal
-mv

57. What is XPath in Java?

No.

58. What is JAXB?
No.

59. Difference between an abstract class and an interface?
60. What is an abstract method in Java?
61. Any experience on Impala, HBase?
No
62. Do you have any experience on streaming like Spark or Kafka?
Yes. I have knowledge of Kafka.

63. Introduce yourself along with your primary skill set and the latest project you worked on, and what are your roles and responsibilities.

64. How to load a JSON record into a Hive table?

Using the JSON SerDe we can load the data into Hive.

ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'

65. If you have a staging table or a temporary table, will you create it as external or internal?

Internal, because the storage will be in HDFS only.

When temp tables are used: only for the particular session; later these tables get deleted.

66. How to debug Hive queries?

hive --hiveconf hive.root.logger=DEBUG,console

Typical query execution flow (high level)

1. The UI calls the execute interface to the Driver.
2. The Driver creates a session handle for the query and sends the query to the compiler to generate an execution plan.
3. The compiler gets the necessary metadata from the metastore. This metadata is used to typecheck the expressions in the query tree as well as to prune partitions based on query predicates.
4. The plan generated by the compiler is a DAG of stages, with each stage being either a map/reduce job, a metadata operation, or an operation on HDFS. For map/reduce stages, the plan contains map operator trees (operator trees that are executed on the mappers) and a reduce operator tree (for operations that need reducers).
5. The execution engine submits these stages to the appropriate components. In each task (mapper/reducer) the deserializer associated with the table or intermediate outputs is used to read the rows from HDFS files, and these are passed through the associated operator tree. Once the output is generated, it is written to a temporary HDFS file through the serializer (this happens in the mapper in case the operation does not need a reduce). The temporary files are used to provide data to subsequent map/reduce stages of the plan. For DML operations the final temporary file is moved to the table's location. This scheme is used to ensure that dirty data is not read (file rename being an atomic operation in HDFS).
6. For queries, the contents of the temporary file are read by the execution engine directly from HDFS as part of the fetch call from the Driver.

67. How will you pass a parameter to a Hive query?

$ bin/hive --hiveconf a=b -e 'set a; set hiveconf:a; \
create table if not exists b (col int); describe ${hiveconf:a}'

hive --hiveconf table=employee --hiveconf year=2016 -f sample.hql

use ${hiveconf:database};

select * from ${hiveconf:table} where year=${hiveconf:year};
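
What the ${hiveconf:name} placeholders do can be sketched as a plain text substitution. This is a simplified model in Python, not Hive's own variable-substitution code; `substitute_hiveconf` is a hypothetical name, and raising an error for a missing variable is a choice made for this sketch.

```python
import re


def substitute_hiveconf(script, hiveconf):
    """Replace every ${hiveconf:name} with the value passed for name."""
    def replace(match):
        name = match.group(1)
        if name not in hiveconf:
            raise KeyError(f"no value passed for hiveconf variable {name!r}")
        return str(hiveconf[name])

    return re.sub(r"\$\{hiveconf:([^}]+)\}", replace, script)


hql = "select * from ${hiveconf:table} where year=${hiveconf:year};"
print(substitute_hiveconf(hql, {"table": "employee", "year": 2016}))
```

The substitution is purely textual, so string values still need their own quotes inside the query.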


68. How many file formats have you used in Hive?

Parquet, text file and ORC

69. Have you worked on Impala?

No

But I know the concept:

1. Install Impala on all nodes.

2. For every node there will be an impalad service.

3. This service will run on each and every data node.

4. One of the impalad services will act as coordinator for collecting the task results and submitting them to the client.

5. If one of the daemons fails, Impala won't retry and the task status will be failed.

6. That's it; it won't be a fault-tolerant service.

70. How to convert a text file into the ORC file format?

Create a temp table and load the text file into it, then create an ORC table and do an INSERT OVERWRITE into the ORC table.

71. Sqoop WHERE $CONDITIONS clause in a free-form query?

sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--query 'SELECT normcities.id, \
countries.country, \
normcities.city \
FROM normcities \
JOIN countries USING(country_id) \
WHERE $CONDITIONS' \
--split-by id \
--target-dir cities \
--boundary-query "select min(id), max(id) from normcities"
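
The role of $CONDITIONS can be sketched as follows: each mapper runs the same query with the placeholder replaced by its own range predicate on the split column. This is a simplified model in plain Python, not Sqoop's code; `mapper_queries` is a hypothetical name and the exact boundaries Sqoop generates may differ.

```python
def mapper_queries(query, split_column, lo, hi, num_mappers):
    """Build one query per mapper by replacing $CONDITIONS with a range."""
    # lo and hi play the role of the boundary-query result: min(id), max(id).
    bounds = [lo + (hi - lo) * i // num_mappers for i in range(num_mappers)] + [hi]
    queries = []
    for i in range(num_mappers):
        # The last range is closed so that the maximum value is included.
        upper_op = "<=" if i == num_mappers - 1 else "<"
        condition = (f"{split_column} >= {bounds[i]} "
                     f"AND {split_column} {upper_op} {bounds[i + 1]}")
        queries.append(query.replace("$CONDITIONS", condition))
    return queries


query = "SELECT id, city FROM normcities WHERE $CONDITIONS"
for q in mapper_queries(query, "id", 1, 100, 4):
    print(q)
```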

72. What is the difference between an Avro file and an ORC file?

1) AVRO:
- It is a row-major format.
- Its primary design goal was schema evolution.
- In the Avro format, we store the schema separately from the data. Generally an Avro schema file (.avsc) is maintained.
2) ORC
- Column-oriented storage format.
- Originally it was Hive's Row Columnar (RC) file. Now improved as Optimized RC (ORC).
- The schema is stored with the data, as part of the footer.
- Data is stored as row groups and stripes.
- Each stripe maintains indexes and stats about the data it stores.
3) Parquet
- Similar to ORC. Based on Google Dremel.
- Schema stored in the footer.
- Column-oriented storage format.
- Has integrated compression and indexes.

Around 10 GB of CSV data compressed to 1.1 GB of ORC with ZLIB compression, and the same data to 1.2 GB of Parquet with GZIP. Both file formats with SNAPPY compression used around 1.6 GB of space.

73. What is the Parquet file format?

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem.

When we are processing big data, the cost required to store such data is high (Hadoop stores data redundantly, i.e. 3 copies of each file, to achieve fault tolerance). Along with the storage cost, processing the data comes with CPU, network I/O and other costs. As the data increases, the cost for processing and storage increases. Parquet is a good choice for big data as it serves both needs: it is efficient and performant in both storage and processing.

Parquet stores binary data in a column-oriented way, where the values of each column are organized so that they are all adjacent, enabling better compression. It is especially good for queries which read particular columns from a "wide" (with many columns) table, since only the needed columns are read and I/O is minimized.

We cannot load a text file directly into a Parquet table. We should first create an alternate table to store the text file and then use the INSERT OVERWRITE command to write the data in Parquet format.

Organizing by column allows for better compression, as the data is more homogeneous. The space savings are very noticeable at the scale of a Hadoop cluster.

I/O will be reduced as we can efficiently scan only a subset of the columns while reading the data. Better compression also reduces the bandwidth required to read the input.
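
Why adjacent column values compress better can be shown with a toy run-length encoding. This is a sketch in plain Python with made-up rows; it is not the Parquet encoder, and `run_length_encode` is a hypothetical helper.

```python
from itertools import groupby

# Illustrative rows: (country, year, customer).
rows = [("IN", 2018, "A"), ("IN", 2018, "B"), ("IN", 2018, "C"), ("US", 2018, "D")]


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Row-oriented layout: fields of different columns are interleaved.
row_major = [field for row in rows for field in row]
# Column-oriented layout: all values of one column sit next to each other.
columns = list(zip(*rows))

print(len(run_length_encode(row_major)), "runs in row-major order")
print([run_length_encode(col) for col in columns])
```

Reading only `columns[1]` also illustrates column pruning: a query on the year touches one column instead of every row.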

74. What are the HDFS commands you have worked on?

hadoop fs [generic options]

hadoop fs -mkdir [-p] <path>

[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]

75. What is the fsck command used for? And what is the output you get while giving this command?

fsck checks the health of the files under an HDFS path and reports problems such as missing, corrupt or under-replicated blocks.

[devuser@dhb2b-dv-cedn0 ~]$ hdfs fsck /user/devuser

Connecting to namenode via http://dhb2b-dv-cmsn0.cloud.corp.telstra.com:50070

FSCK started by devuser (auth:KERBEROS_SSL) from /10.56.52.68 for path /user/devuser at Fri Feb 16 16:45:14 EST 2018

.....Status: HEALTHY

Total size: 2572152687 B
Total dirs: 206
Total files: 808
Total symlinks: 0
Total blocks (validated): 801 (avg. block size 3211176 B)
Minimally replicated blocks: 801 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.6666666
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1

FSCK ended at Fri Feb 16 16:45:14 EST 2018 in 30 milliseconds

The filesystem under path '/user/devuser' is HEALTHY

76. How to set properties in Hive?

You can set a property using the SET command, but only for that session.

If you are setting it for the whole cluster, you need to define it as a property in hive-site.xml:

<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
hive> SET hive.exec.dynamic.partition=true;
hive> SET hive.exec.dynamic.partition.mode=nonstrict;

77. How to view all properties set in Hive?

SET;

78. What are your strengths and opportunities?

79. Limitations of Hive

It does not offer real-time queries.
It does not offer row-level updates.
It provides acceptable (but not optimal) latency for interactive data browsing.
Sub-query support is limited in Hive.
Latency for Apache Hive queries is generally very high.
Not designed for Online Transaction Processing.
Supports overwriting or appending data, but not updates and deletes.
High latency
Slow
Impala, Apache HAWQ and Drill are some alternatives trying to overcome some of these limitations.

80. Limitations of Spark

No support for real-time processing

Problem with small files

No file management system

Expensive

Manual optimization
Iterative processing
Latency
Apache Spark has higher latency as compared to Apache Flink.
Window criteria
Spark does not support record-based window criteria. It only has time-based window criteria.

81. Limitations of HDFS

Although Hadoop is the most powerful tool of big data, there are various limitations of Hadoop: it is not suited for small files, it cannot reliably handle live data, its processing speed is slow, and it is not efficient for iterative processing or for caching.

Issue with small files
Slow processing speed
Support for batch processing only
No real-time data processing
No delta iteration
Latency
Not easy to use
Security

82. How to load a text file or JSON data and create a schema using Spark?

# Load a text file and convert each line to a Row.
from pyspark.sql import Row, SQLContext

lines = sc.textFile("/user/devuser/test/sample")
parts = lines.map(lambda l: l.split(","))
people = parts.map(lambda p: Row(id=int(p[0]), name=p[1], loc=p[2]))

# Create a DataFrame from the Rows (the schema is inferred) and register it as a view.
sqlContext = SQLContext(sc)
schemaPeople = sqlContext.createDataFrame(people)
schemaPeople.createOrReplaceTempView("people")

# The same steps for another file with different columns.
df1 = sc.textFile("/user/devuser/test/duplicate.txt")
df2 = df1.map(lambda x: x.split(","))
df3 = df2.map(lambda f: Row(name=f[0], location=f[1], company=f[2]))
schemaPeople = sqlContext.createDataFrame(df3)
schemaPeople.createOrReplaceTempView("people")

# JSON data can be read directly into a DataFrame.
df = spark.read.json("/home/sample.json")

The remaining steps are the same as for the DataFrame created from the text file.

83. What are repartitions in Spark?

repartition --- increasing or decreasing the number of partitions based on load and process

84. Coalesce use case in Spark?

Coalesce is used for decreasing the number of partitions.

85. Coalesce use case in Hive?

To return the first non-null value (if the first value is null, it checks the second value, and so on, and returns the first value that is not null).
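
The semantics of the SQL COALESCE function can be written out in a few lines. This is a plain-Python illustration with `None` standing in for NULL, not the Hive implementation.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are."""
    for value in values:
        if value is not None:
            return value
    return None


print(coalesce(None, "b", "c"))
print(coalesce(None, None))
print(coalesce(0, 5))
```

Only NULL is skipped: a zero or an empty string is a real value and is returned as-is.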

86. .withColumn function use case in Spark?

Adds new columns with new values, or updates existing columns.

df.withColumn("fieldname", <derivation logic as a column expression>)

87. Surrogate key usage and how to write the syntax in Spark?

SegKey  EmpName    Designation
1       Prashanth  TA
2       Prashanth  TL
1       Swagath    TA
2       Swagath    TL

df.groupBy("EmpName").max("SegKey").show()

88. How to remove duplicate records in Spark

.dropDuplicates()

.dropDuplicates("column names") - you can specify column names as well.

.distinct() will also remove duplicate records
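
The difference between de-duplicating on whole rows and on selected columns can be sketched on a list of dictionaries. This is a plain-Python model, not the Spark API; `drop_duplicates` is a hypothetical helper that keeps the first occurrence, whereas Spark does not promise which duplicate survives.

```python
def drop_duplicates(rows, subset=None):
    """Keep the first row for each distinct key.

    The key is the tuple of values in `subset`, or all columns when
    subset is None (which matches what distinct() does).
    """
    seen, kept = set(), []
    for row in rows:
        columns = subset if subset is not None else sorted(row)
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"name": "Prashanth", "designation": "TA"},
    {"name": "Prashanth", "designation": "TL"},
    {"name": "Prashanth", "designation": "TA"},
]
print(drop_duplicates(rows))
print(drop_duplicates(rows, subset=["name"]))
```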

89. Can you explain the Hadoop architecture?

Here the resource management work is divided into separate parts, unlike the single JobTracker in Hadoop 1:

ApplicationMaster: responsible for negotiating resources and their allocation, and for contacting the NodeManagers to run the tasks.

Scheduler: responsible only for scheduling jobs; it does not do any monitoring.

NodeManager: responsible for launching containers on its node and for monitoring the resources they use.

The ApplicationMaster is also responsible for monitoring the application's jobs and reporting their status.

Containers: the units of resources (memory, CPU) allocated on a node, in which the tasks requested by the ApplicationMaster run.

90. Explain SCD1 and SCD2?

SCD1 - to maintain only the updated records

SCD2 - to maintain history and the updated records

Explanation of SCD1: load the target table data into a temp table.

Now compare the source table and temp table data.

- DFTemp1 = select only the TargetDF columns from TargetDF left outer join SourceDF where source.primarykey is null;
- Insert overwrite DFTemp1 into targetDF.
- Insert SourceDF into targetDF.

select t.* from temptable t left join source s on (t.id = s.id) where s.id is null;

Now you will get the old (unchanged) records using the above query.

Insert overwrite into the target table (now only the unchanged old records remain).

Now you want to bring in the new and changed records, so simply load them into the target table:

Then insert the source table data into the target table (new records get added, changed records get updated).

Explanation of SCD2: same as SCD1, but instead of overwriting the changed records in the target table we need to keep them and insert the updated records into the target table.

- DFOldRecords = select only the TargetDF columns from TargetDF left outer join SourceDF where source.primarykey is null;
- DFUpdatedRecordsFromTarget = select only the TargetDF columns from TargetDF left outer join SourceDF where source.primarykey is not null;
- Insert overwrite DFOldRecords into targetDF.
- Insert into targetDF select DFUpdatedRecordsFromTarget.*, "CBD" (EndDate) from DFUpdatedRecordsFromTarget.
- Insert into targetDF select sourceDF.*, "CBD" (EffectiveStartDate), null (EndDate) from sourceDF.
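
The two strategies can be sketched on small in-memory tables. This is a minimal illustration in plain Python, assuming rows are keyed by an `id` and carry a single `value`; the function names, the column names and the dates are made up, and a real pipeline would do the same with joins on DataFrames or Hive tables.

```python
def scd1(target, source):
    """SCD1: keep untouched target rows, then take every source row as-is."""
    merged = {key: value for key, value in target.items() if key not in source}
    merged.update(source)   # changed keys are overwritten, new keys are added
    return merged


def scd2(target_history, source, load_date):
    """SCD2: close the open version of each changed key, then add new versions."""
    result = []
    for row in target_history:
        is_open = row["end_date"] is None
        if is_open and row["id"] in source:
            row = {**row, "end_date": load_date}   # history is kept, not overwritten
        result.append(row)
    for key, value in source.items():
        result.append({"id": key, "value": value,
                       "start_date": load_date, "end_date": None})
    return result


target = {1: "Pune", 2: "Delhi"}
source = {2: "Mumbai", 3: "Goa"}
print(scd1(target, source))

history = [
    {"id": 1, "value": "Pune", "start_date": "2018-01-01", "end_date": None},
    {"id": 2, "value": "Delhi", "start_date": "2018-01-01", "end_date": None},
]
for row in scd2(history, source, "2018-06-06"):
    print(row)
```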


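91. What is flatMap and map?

flatMap reads each and every record element by element: the function can return several elements per record and the results are flattened into one collection. map reads record by record and returns exactly one output per input record.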
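
The difference can be shown with a plain-Python analogy (this is not the Spark API; `itertools.chain` stands in for the flattening that flatMap does, and the sample lines are made up):

```python
from itertools import chain

lines = ["hello world", "hi"]

# map: exactly one output element per input record (here, a list of words per line).
mapped = list(map(lambda line: line.split(" "), lines))

# flatMap: each record may yield several elements, and the results are flattened.
flat_mapped = list(chain.from_iterable(line.split(" ") for line in lines))

print(mapped)
print(flat_mapped)
```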

92. lit function usage?

Used to pass a constant value as a column in Spark SQL.

val value = 10

lit(value)

df.where(col("a") === lit(value))   (keeps the rows of a dataframe df where column a equals the constant)

93. How do you debug in Spark?

Using the Spark tracking URL we can debug the issue and track the status of the job.

spark-submit --master yarn-cluster --num-executors 4 --executor-memory 8g --driver-memory 4g --executor-cores 2 --conf spark.yarn.executor.memoryOverhead=1024 s.py

94. Spark SQL logical plan?

95. sc.parallelize example?

96. MSCK REPAIR?

Used to bring the partition metadata in the metastore up to date when the number of partitions in the table directory has increased or decreased.

Syntax: MSCK REPAIR TABLE tablename

97. What is Flume?
Flume is a distributed service for collecting, aggregating, and moving large amounts of log data.

98. What is Flume NG
A real-time loader for streaming your data into Hadoop. It stores data in HDFS and HBase. You'll want to get started with Flume NG, which improves on the original Flume.

99. Python interview questions

100. Basic programs using while, for, if and else statements?

for i in range(1, 11):
    print(i)

i = 1
while i <= 10:
    print(i)
    i += 1

a = 10
b = 20
if a < b:
    print("a is less than b")
elif a == b:
    print("a is equal to b")
else:
    print("a is greater")

101. Print JPEG images in one file using Python code?

import os, glob

os.chdir("path of jpeg files")

for file in glob.glob("*.jpeg"):
    print(file)

102. Print fizzbuzz, fizz, buzz using the modulo operator in Python?

for num in range(1, 101):
    if num % 5 == 0 and num % 3 == 0:
        print("fizzbuzz")
    elif num % 3 == 0:
        print("fizz")
    elif num % 5 == 0:
        print("buzz")
    else:
        print(num)

103. Print the Fibonacci series using Python?

a, b = 0, 1
for i in range(1, 10):
    print(a)
    a, b = b, a + b    # (a = b, b = a + b, done in one step)

104. Lists, tuples, dictionaries, sets in Python?

my_list = [1, 2, 3, 45]
for i in my_list:
    print(i)

It's mutable and we can change it at runtime.

Tuples

my_tuple = (1, 2, 3, 4, 5)
for i in my_tuple:
    print(i)

It's immutable and can't be changed like a list.

A list inside a tuple can be changed, but a tuple inside a list cannot be changed.

Dictionaries:

my_dict = {'name': 'swagat', 'loc': 'hyd', 'comp': 'infosys'}
for key, val in my_dict.items():
    print("my {} is {}".format(key, val))

Sets:

my_set = {10, 12, 20, 10, 20}    # ---- a set removes the duplicates
for i in my_set:
    print(i)

105. How to print the squares of numbers?

numbers = [1, 2, 3, 4, 5, 6]
squares = [num * num for num in numbers]
print(squares)

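106. Generators and dictionaries diff?

xrange and iteritems (Python 2): xrange and range both do the same work, and so do iteritems and items.

But xrange yields each value one by one, whereas range builds the full data, keeps it in memory and then does the processing.

iteritems is the same: it hands out the next (key, value) pair each time it is asked, instead of building the full list of pairs like items.

So xrange and iteritems behave like a generator, a function whose values are produced by yield, each one derived from the previous state of the function.

In Python 3, range and dict.items() are already lazy, and xrange and iteritems no longer exist.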
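
A small Python 3 sketch of the same idea. The generator name `count_up_to` is made up for illustration; the size comparison relies on `sys.getsizeof`, which reports the memory of the container object itself.

```python
import sys


def count_up_to(limit):
    """Generator: yields one value at a time instead of building a list."""
    n = 1
    while n <= limit:
        yield n
        n += 1


lazy = count_up_to(5)
first = next(lazy)   # pulls only the first value
rest = list(lazy)    # pulls the remaining values

full_list = list(range(1, 100001))   # all values are held in memory
lazy_range = range(1, 100001)        # Python 3 range computes values on demand

print(first, rest)
print(sys.getsizeof(full_list) > sys.getsizeof(lazy_range))
```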

107. Spark 1 and Spark 2 diff?

The single point of entry in Spark 2 is SparkSession, whereas in Spark 1 SQLContext, HiveContext and SparkContext would basically create confusion while creating a context.

The major change in Spark 2 is that the Dataset and DataFrame APIs are merged.

To get more efficiency, the combination of Parquet and caching will improve the throughput.

108. Python: shallow copy and deep copy

Syntax: l2 = copy.copy(l1)

l4 = copy.deepcopy(l2)

Shallow copy: it creates a new instance of the same type but only copies the references to the nested values into the new instance, so nested objects are shared with the original.

Deep copy: it creates a new object and recursively copies the values into it, so the new object points to its own copies and is independent of the original.
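
A short example of the difference, using a nested list (the variable names are illustrative):

```python
import copy

l1 = [[1, 2], [3, 4]]
l2 = copy.copy(l1)       # shallow: new outer list, same inner lists
l3 = copy.deepcopy(l1)   # deep: the inner lists are copied too

l1[0].append(99)         # mutate a nested list through the original

print(l2[0])   # the shallow copy sees the change
print(l3[0])   # the deep copy does not
```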

109. Diff list and tuples:

A list is mutable and can be changed at runtime, but a tuple cannot; it is immutable.

my_list = [1, 2, 3, 4]

my_tuple = (1, 2, 3, 4)

You can change a list at any time, like my_list.append(5),

but a tuple is fixed; we cannot change it.

But a list inside a tuple can be changed:

t = ([1, 2, 3],)

t[0].append(4)

110. How to handle fixed length files in Hive and Pig?

In Hive, count the field lengths and pass them in input.regex with the SerDe properties:

CREATE TABLE employees_stg (emplid STRING, name STRING, age INT, salary DOUBLE, dept STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "(.{4})(.{35})(.{3})(.{11})(.{4})",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
LOCATION '/path/to/input/employees_stg';

In Pig we can load the data as whole lines and cut out the fields with the SUBSTRING function:

A = LOAD 'filename path' AS (line:chararray);

B = FOREACH A GENERATE SUBSTRING(line, 0, 4) AS id;

Just mention a SUBSTRING for all the columns and we are able to load the data.
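
To see what the fixed-width regex does, here is a small Python sketch that applies the same pattern with the standard `re` module. The helper name `parse_line` and the sample record are hypothetical; only the field widths (4, 35, 3, 11, 4) come from the table definition above.

```python
import re

# Same field widths as the input.regex above: 4, 35, 3, 11 and 4 characters.
FIXED_WIDTH = re.compile(r"(.{4})(.{35})(.{3})(.{11})(.{4})")


def parse_line(line):
    """Split one fixed-length record into stripped field values."""
    match = FIXED_WIDTH.match(line)
    if match is None:
        raise ValueError("line is shorter than the expected 57 characters")
    return [field.strip() for field in match.groups()]


# Build a hypothetical record by padding each field to its width.
record = "E001" + "John Smith".ljust(35) + "030" + "55000.00".rjust(11) + "IT".ljust(4)
fields = parse_line(record)
print(fields)
```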

111. zip function use case in Python?

t1 = (1, 2, 3, 4, 5)

t2 = ('a', 'b', 'c', 'v', 'b')

list(zip(t2, t1)) will result in

[('a', 1), ('b', 2), ... etc]

If we call dict(zip(t2, t1))

it creates key-value pairs like below (the key 'b' occurs twice, so its last value wins):

{'a': 1, 'b': 5, 'c': 3, 'v': 4}

112. How to create a shell script for replicating another database in Hive?

#!/bin/bash
tables=`hive -e "use db_ns; show tables;"`
for table in $tables
do
  hive -e "use db_ns_bkup; create table $table as select * from db_ns.${table};"
done

113. DDL means?

Data definition language: the statements that define or change the structure of tables and other objects. Examples: create, alter, drop, truncate.

114. DML means?

Data manipulation language: which is used for changing the existing data. Examples: insert, update, delete.

115. Spark best practice

Reading Data:

1. Read specific columns from the tables.

2. Avoid creating multiple dataframes for the same table.

Transformations:

3. Avoid using count() to validate if a dataframe is empty or not, instead use the isEmpty() function.

4. During updates/deletes consider partitions and update/delete specific partitions only.

5. Use the optimum number of transformations, try to merge transformations to the extent possible.

6. Persist or cache a dataframe if it is referenced multiple times in the workflow.

Joins:

7. In joins, use small datasets on the right side.

8. To improve join performance, broadcast small datasets.

9. Use alias names for broadcast datasets in joins.

10. Avoid subqueries and use joins instead.

Writing Data:

11. While writing to a table use the full column list in place of "*".

12. Avoid overwrite, instead use truncate and insert into table.

13. When writing multiple datasets to a table, avoid union and write the datasets separately.

Cache vs Broadcast:

https://stackoverflow.com/questions/38056774/spark-cache-vs-broadcast

Bucketing:

It divides each partition into multiple buckets (clustered by some value),

and while dividing into buckets it uses a hashing technique.

It uses modular division, e.g. hash % 2 if the table is clustered into 2 buckets:

the records with remainder 0 go to one file and the records with remainder 1 go to another file.

This optimization will be most useful while doing join operations.

Joins also use the hashing technique, so while performing the task the join can go directly to the bucket for the hash value and perform the task there.
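
The hash-and-modulo assignment can be sketched in a few lines of Python. This is an analogy, not Hive's implementation: `bucket_for` and `bucketize` are made-up names, and Python's built-in `hash` stands in for Hive's hash function.

```python
def bucket_for(key, num_buckets):
    """Pick a bucket with modular division on the key's hash."""
    # For small non-negative ints, Python's hash() is the value itself.
    return hash(key) % num_buckets


def bucketize(rows, key, num_buckets):
    """Group rows into num_buckets lists, one list per bucket file."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(row[key], num_buckets)].append(row)
    return buckets


rows = [{"id": i} for i in range(1, 7)]
buckets = bucketize(rows, "id", 2)
print(buckets[0])   # ids with remainder 0
print(buckets[1])   # ids with remainder 1
```

Because two tables bucketed the same way on the join key put equal keys in the same bucket number, a join only has to compare matching buckets.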

================================================================================

How to search a word in multiple files using shell scripting

hadoop fs -ls /apps/hdmi-technology/b_dps/real-time | awk '{print $8}' | \
while read f
do
  hadoop fs -cat $f | grep -q bcd4bc3e1380a56108f486a4fffbc8dc && echo $f
done

================================================================================

How to find the average of salaries in a billion records using a MapReduce program?

Suppose there is a CSV file which will be sent to the mapper, which emits key-value pairs for unique keys, like empname and salary.

These values get shuffled in the shuffling phase.

In the reducer phase the values get summed up and counted, and the average value of the salary is found by dividing the sum by the count.
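
The three phases can be imitated in plain Python to make the data flow concrete. This is a single-process sketch, not a Hadoop job; the function names and the sample CSV lines are hypothetical.

```python
from collections import defaultdict


def mapper(line):
    """Emit (empname, salary) for one CSV line."""
    name, salary = line.split(",")
    return name, float(salary)


def shuffle(pairs):
    """Group the values by key, as the shuffle phase does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Sum and count the salaries for a key, then divide."""
    return key, sum(values) / len(values)


lines = ["amit,100", "ravi,300", "amit,200"]
grouped = shuffle(mapper(line) for line in lines)
averages = dict(reducer(k, v) for k, v in grouped.items())
print(averages)
```

For one overall average instead of one per employee, the mapper could emit the same constant key for every record.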

================================================================================

Difference between cache and distributed cache

Cache is particular to one executor or one machine (one core), whereas distributed cache is something like broadcasting, used across the cluster.

================================================================================

Difference between cache and broadcasting

Cache is particular to one node, but whenever another task or executor requires it, it will be used.

But broadcasting works across the cluster and will be used by all the tasks.

================================================================================

import pandas as pd

from collections import OrderedDict

from datetime import date

# Row-oriented input: one dict per account
sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
         {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215},
         {'account': 'Blue Inc', 'Jan': 50, 'Feb': 90, 'Mar': 95}]

# Column-oriented input: one list per column
sales = {'account': ['Jones LLC', 'Alpha Co', 'Blue Inc'],
         'Jan': [150, 200, 50],
         'Feb': [200, 210, 90],
         'Mar': [140, 215, 95]}

df2 = pd.DataFrame(sales)

df4 = spark.createDataFrame(df2)

Scala imports for the same work:

import org.apache.spark.sql.types._

import org.apache.spark.sql.{Row, SparkSession}

SCALA:

Scala: it's a compiled language. Unlike Python and Ruby, it's not interpreted.

The source is converted to class files when compiled. Intermediate and temp classes are generated.

It has a REPL shell --- READ EVALUATE PRINT LOOP (REPL)

:help to see all the commands

How to declare a variable in Scala

val and var

val – immutable

var – mutable – can be changed

val x: Int = 10

It's immutable, you cannot change it.

If you try to change it, like x = 11,

it throws an error.

val x: String = "test"

val x: Boolean = true

The Boolean type (values true and false) will be taken automatically even if you don't declare it, like

Ex:

var x = true

It shows x as Boolean type and automatically assigns x as Boolean.

Even if it's a double or float value, it automatically takes the type with the in-built type inference.

Scala does not distinguish primitive types from wrapper classes --- yes, it does not; every value is an object.

val x = {val a = 100; val b = 200; a + b}

It prints x = 300

Whatever the last expression in the braces is, that will be the value of the block and is printed as output.
Lazy evaluation: lazy loading.

Unlike loading the full data into memory and then performing tasks,

if you declare a variable as lazy it loads the data only when there is a demand from any task.

Like

lazy val x = 100

It won't allocate any memory to the x variable until x is used.

For loop syntax in Scala

for (i <- 1 to 5) print(i)

It shows 12345

----- another ex: instead you can print 5 to 1 ---- you need to use by -1; it reduces the counter and prints the values 54321:

for (i <- 5 to 1 by -1) print(i)

If I want them on new lines I need to use println.

If I want to print the letters of a string,

like val str = "test":

str.foreach(print); for all letters on new lines use println ---

For loop with guard condition:

If you write an if condition inside the for header, it is called a for loop guard condition in Scala; the body runs only for the values that satisfy the condition.

yield builds a collection (a Vector for a range) which basically collects the values of a particular variable and stores them instead of printing.

How to write a function in Scala

def areaOfRect(x: Float, y: Float): Float = {x * y}

Call the function areaOfRect(10, 20); it prints 200.0

This is the typical syntax of Scala and it returns the value of x * y.

If you define it like

def areaOfRect(x: Float, y: Float): Unit = {println(x * y)} ----- Unit instead of Float

it won't return anything when you call the function

areaOfRect(10.45f, 20.33f)

It only prints the value.

How to check even or odd?

def isEvenNumber(number: Int): Boolean = {number % 2 == 0}

isEvenNumber(100)

A val variable declaration like val x = 10

You are not able to change the value of x, since x is a fixed reference to 10.

But if you are storing the values as an array,

you can change the values:

val arr = Array(1, 2, 3, 4, 5)

You can change arr(0) = 10

arr(1) = 20; you can change it like this because only the reference to the Array is fixed, not the individual values.

With an ArrayBuffer (from scala.collection.mutable), e.g. val buf = ArrayBuffer[Int](), you can also add and remove values:

buf += 10

This syntax is used to add a value to the previous ones; likewise you can remove a value from the previous ones with -=.

buf ++= ArrayBuffer(800, 300)

It adds 800 and 300; if you use --= it removes the values 800 and 300.

arr.foreach(println)

for (n <- arr if n % 2 == 0) yield n

Array, ArrayBuffer, Maps, Lists, Tuples

An Array is a collection of elements and the length of the array is fixed,

whereas for an ArrayBuffer you do not need to mention the length of the values; you can declare it as below

and add as many values as you want.

val arr = ArrayBuffer[Int]()

The example shows that there is no length.

Using pandas, loading dictionaries data

Dictionary: is nothing but unordered key-value pairs. A dictionary is an associative array.

>>> sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
...          {'account': 'Alpha Co', 'Jan': 200, 'Feb': 210, 'Mar': 215},
...          {'account': 'Blue Inc', 'Jan': 50, 'Feb': 90, 'Mar': 95}]
>>>
>>> df2 = pd.DataFrame(sales)
>>> print(df2)
   Feb  Jan  Mar    account
0  200  150  140  Jones LLC
1  210  200  215   Alpha Co
2   90   50   95   Blue Inc
>>> from pyspark.sql import SQLContext
>>> sqlCtx = SQLContext(sc)
>>> sqlCtx.createDataFrame(df2).show()
+---+---+---+---------+
|Feb|Jan|Mar|  account|
+---+---+---+---------+
|200|150|140|Jones LLC|
|210|200|215| Alpha Co|
| 90| 50| 95| Blue Inc|
+---+---+---+---------+
>>>

==================
>>> from pyspark.sql import Row
>>> list = [1, 2, 3, 4, 5]
>>> df2 = sc.parallelize(list)
>>> df2.collect()
[1, 2, 3, 4, 5]

>>> df3 = df2.map(lambda x: Row(id=int(x))).toDF()
>>> df3.show()
+---+
| id|
+---+
|  1|
|  2|
|  3|
|  4|
|  5|
+---+
Python: Class

Wrapping of data members and data methods in a single unit is called encapsulation.

We can achieve encapsulation using classes.

We can use the characteristics of a parent class in a child class using inheritance.

Regular expression
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are widely used in the UNIX world.

Multithreading: running multiple tasks at the same time is called multithreading.

import time
import threading

def calc_square(numbers):
    for n in numbers:
        time.sleep(0.2)
        print('square:', n * n)

def calc_cube(numbers):
    for n in numbers:
        time.sleep(0.2)
        print('cube:', n * n * n)

arr = [1, 2, 3, 4, 5]

t = time.time()

# pass the function itself as target (calc_square, not calc_square())
t1 = threading.Thread(target=calc_square, args=(arr,))

t2 = threading.Thread(target=calc_cube, args=(arr,))

>>> t1.start()
>>> square: 1
square: 4
square: 9
square: 16
square: 25

>>> t2.start()
>>> cube: 1
cube: 8
cube: 27
cube: 64
cube: 125

>>>

How can the ternary operators be used in Python?

Ans: The ternary operator is the operator that is used to show conditional statements. It consists of the true or false values with a statement that has to be evaluated for it.

Syntax:

The ternary operator will be given as:
[on_true] if [expression] else [on_false]

Example:

x, y = 25, 50
big = x if x < y else y

The expression gets evaluated like if x < y else y; in this case if x < y is true then the value is returned as big = x, and if it is incorrect then big = y will be sent as a result.

withColumn usage and reverse:

>>> df3 = df2.withColumn("rev", df2.name)
>>> df3.show()
+---------+---------+
|     name|      rev|
+---------+---------+
|malayalam|malayalam|
|      121|      121|
|   lsdkjf|   lsdkjf|
|    sdfjk|    sdfjk|
|    lsjdf|    lsjdf|
+---------+---------+

>>> from pyspark.sql import functions as fun
>>> df3 = df2.withColumn("rev", fun.reverse(df2.name))
>>> df3.show()
+---------+---------+
|     name|      rev|
+---------+---------+
|malayalam|malayalam|
|      121|      121|
|   lsdkjf|   fjkdsl|
|    sdfjk|    kjfds|
|    lsjdf|    fdjsl|
+---------+---------+

>>> df4 = df3.where(df3.name == df3.rev)
>>> df4.show()
+---------+---------+
|     name|      rev|
+---------+---------+
|malayalam|malayalam|
|      121|      121|
+---------+---------+

Sqoop:

· How Sqoop internally handles implicit data type conversion while loading data from other RDBMS to Hive.

Ans: Sqoop import to Hive
You need the same --map-column-java mentioned above.
By default, Sqoop supports these JDBC types and converts them into the corresponding Hive types:

INTEGER
SMALLINT
VARCHAR
CHAR
LONGVARCHAR
NVARCHAR
NCHAR
LONGNVARCHAR
DATE
TIME
TIMESTAMP
CLOB
NUMERIC
DECIMAL
FLOAT
DOUBLE
REAL
BIT
BOOLEAN
TINYINT
BIGINT
If your datatype is not in this list, you get an error like:

Hive does not support the SQL type for ...

Solution
You need to add --map-column-hive in your sqoop import command.
Syntax: --map-column-hive col-name=hive-type,...
For example,
--map-column-hive col1=string,col2='varchar(100)'

· Whether Sqoop can handle BLOB/CLOB data types.
Yes
sqoop import \
-Dmapred.job.queue.name=default \
--connect jdbc:oracle:thin:@hostname:port/port \
--username Xxxxxx \
--password XXXXXX \
--query "SELECT * FROM tablename WHERE \$CONDITIONS" \
--hive-drop-import-delims \
--map-column-java column1=String,column2=String \
-m 8 \
--hive-import \
--hive-table tablename \
--target-dir /user/hdfs/ \
--fields-terminated-by '01' \
--split-by id;
column1/2 --> any CLOB/BLOB column converted to a Java String value.

· How to handle incremental data load using a primary key and a timestamp column.
Let's take an example here: you have a customer table with two columns, cust_id and policy; cust id is your primary key and you just want to insert data from cust id 100 onward.

Scenario 1: append new data on the basis of the cust_id field

Phase 1:

The below 3 records were inserted recently into the customer table and we want to import them into HDFS:

| custid | Policy |
| 101 | 1 |
| 102 | 2 |
| 103 | 3 |

Here is the sqoop command for that:

sqoop import \
--connect jdbc:mysql://localhost:3306/db \
--username root -P \
--table customer \
--target-dir /user/hive/warehouse// \
--append \
--check-column custid \
--incremental append \
--last-value 100

Phase 2: the below 4 records were inserted recently into the customer table and we want to import them into HDFS:

| custid | Policy |
| 104 | 4 |
| 105 | 5 |
| 106 | 6 |
| 107 | 7 |

Here is the sqoop command for that:

sqoop import \
--connect jdbc:mysql://localhost:3306/db \
--username root -P \
--table customer \
--target-dir /user/hive/warehouse// \
--append \
--check-column custid \
--incremental append \
--last-value 103

So these are the four properties we have to consider for inserting new records:

--append \
--check-column <primary key> \
--incremental append \
--last-value <Last Value of primary key which sqoop job has inserted in last run>
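
The effect of --check-column, --incremental append and --last-value can be imitated in plain Python. This is a conceptual sketch only, not how Sqoop is implemented; the function name and the in-memory customer rows are hypothetical.

```python
def incremental_append(source_rows, check_column, last_value):
    """Return rows whose check column is greater than last_value, plus the new last value."""
    new_rows = [row for row in source_rows if row[check_column] > last_value]
    new_last = max((row[check_column] for row in new_rows), default=last_value)
    return new_rows, new_last


# Hypothetical customer table: custid 100..103
customer = [{"custid": i, "policy": i - 100} for i in range(100, 104)]

imported, last_value = incremental_append(customer, "custid", 100)
print([row["custid"] for row in imported], last_value)

# Four more rows arrive; the second run starts from the saved last value.
customer += [{"custid": i, "policy": i - 100} for i in range(104, 108)]
imported2, last_value2 = incremental_append(customer, "custid", last_value)
print([row["custid"] for row in imported2], last_value2)
```

The second run picks up exactly where the first one stopped, which is why the last value of the previous run has to be saved.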
Scenario 2: append new data + update existing data on the basis of the cust_id field

Below, 1 new record with cust id 108 has been inserted and cust ids 101 and 102 have been updated recently in the customer table, which we want to import into HDFS:

| custid | Policy |
| 108 | 8 |
| 101 | 11 |
| 102 | 12 |

sqoop import \
--connect jdbc:mysql://localhost:3306/db \
--username root -P \
--table customer \
--target-dir /user/hive/warehouse// \
--append \
--check-column custid \
--incremental lastmodified \
--last-value 107

So these are the four properties we have to consider for insert/update records in the same command:

--append \
--check-column <primary key> \
--incremental lastmodified \
--last-value <Last Value of primary key which sqoop job has inserted in last run>

Note that in lastmodified mode Sqoop expects the check column to be a date/timestamp column that is set on every update, and the last value to be a timestamp.

I am specifically mentioning the primary key, because if the table does not have a primary key then a few more properties need to be considered, which are:

By default multiple mappers perform the Sqoop job, so the mappers need the data to be split on the basis of some key. So

either we have to specifically define the -m 1 option to say that only one mapper will perform this operation,

or we have to specify another key (by using the Sqoop property --split-by) through which you can uniquely identify the data.

· How would you schedule Sqoop jobs.
Crontab, appworks, Airflow scheduler, Oozie, Azkaban

· How Sqoop internally handles it if the target table is locked by another process.

sqoop import ... --table custom_table -- --table-hints NOLOCK

· What's the default # of mappers invoked by a Sqoop job? If I want to change the default # of mappers, how would you deal with it?
Default: 4 mappers
We can change them in sqoop-site.xml, or per job with the -m (--num-mappers) option.

· What are boundary queries? What's the difference between boundary queries and the split-by option.

The split-by option is used to split the records based on a key column,
to run the mappers in parallel.

The boundary query is used to set the boundaries (min and max values) of the split column.

· What are the different types of format supported by Sqoop (like Avro, Parquet etc.) and the difference between those formats.
--as-avrodatafile    Imports data to Avro Data Files
--as-sequencefile    Imports data to SequenceFiles
--as-textfile        Imports data as plain text (default)
--as-parquetfile     Imports data to Parquet Files

· How Sqoop export used to work?
Same as import

· If my Sqoop export job executes, can I see the data in my target environment?
Yes
· If I want to see my data in the target environment during Sqoop export, what is the option you need to set to achieve it?

Spark

· How Spark handles the data load if it's running out of memory.
It will spill to disk and process the job.

· How Spark utilizes a dataframe to communicate with a Hive database.

Establishing Connection Between Hive and Spark SQL
Let us begin by connecting Hive to Spark SQL. We can execute this by following the steps below:

Step 1: Move hive-site.xml from $HIVE_HOME/conf/hive-site.xml to $SPARK_HOME/conf/hive-site.xml. Make an entry regarding metastore uris in this file. The entry will look like this:

Step 2: Extract all the dependencies for the required Spark components (in this case Spark SQL and Hive) in the build.sbt file.

Step 3: Start all Hadoop processes in the cluster. Verify the following:

Step 4: Start MySQL because Hive needs it to connect to the metastore and because Spark SQL will also need it when it connects to Hive.

Step 5: Run the Hive metastore process so that when Spark SQL runs, it can connect to metastore uris and take from it the hive-site.xml file mentioned in the first step.

The steps above are to configure Hive and Spark SQL so that they can work together. Now we shall discuss Spark SQL code to see how it connects to Hive.

· Is there any option to update your existing dataframe?
Yes, but not in place: dataframes are immutable, so an update creates a new dataframe (e.g. with withColumn).

· What is the role of YARN in your Spark environment.
Resource allocation

· How would you execute Spark jobs.
spark-submit --master yarn-cluster
· What are the different types of options available in spark-submit jobs.
https://spark.apache.org/docs/latest/submitting-applications.html
example:
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

· What is the role of Containers?
When running Spark on YARN, each Spark executor runs as a YARN container. Where MapReduce schedules a container and fires up a JVM for each task, Spark hosts multiple tasks within the same container.

Hive

116. External and Managed tables. Performance wise which is the better option.
Managed tables, because Hive has full control on it.

117. How bucketing will improve the performance of a MapReduce job.
Bucketed tables allow much more efficient sampling than non-bucketed tables. With sampling, we can try out queries on a section of data for testing and debugging purposes when the original datasets are very huge. Here, the user can fix the size of buckets according to the need.

The bucketing concept also provides the flexibility to keep the records in each bucket sorted by one or more columns. Since the data files are equal sized parts, map-side joins will be faster on the bucketed tables.

118. What is static and dynamic partitioning in Hive?
In static partitioning we need to specify the partition column value in each and every LOAD statement.

Suppose we have a partition on column country for table t1(userid, name, occupation, country), so each time we need to provide the country value:

hive> LOAD DATA INPATH '/hdfs path of the file' INTO TABLE t1 PARTITION(country="US")
hive> LOAD DATA INPATH '/hdfs path of the file' INTO TABLE t1 PARTITION(country="UK")
Dynamic partitioning allows us not to specify the partition column value each time. The approach we follow is as below:

Create a non-partitioned table t2 and insert data into it.
Now create a table t1 partitioned on the intended column (say country).
Load data into t1 from t2 as below:

hive> INSERT INTO TABLE t1 PARTITION(country) SELECT * FROM t2;
Make sure that the partition column is always the last one in the non-partitioned table (as we have the country column in t2).
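
What dynamic partitioning does with the last column can be sketched in plain Python. This is an analogy only: `dynamic_partition` and the sample rows are hypothetical, and the dict keys mimic the country=US style of partition directory names.

```python
from collections import defaultdict


def dynamic_partition(rows):
    """Route each row to a partition named after its last column's value."""
    partitions = defaultdict(list)
    for row in rows:
        *data, country = row   # the partition column is the last one
        partitions["country=" + country].append(tuple(data))
    return dict(partitions)


# Rows of the hypothetical non-partitioned table: (userid, name, occupation, country)
t2 = [
    (1, "asha", "engineer", "US"),
    (2, "ben", "doctor", "UK"),
    (3, "carl", "teacher", "US"),
]
t1 = dynamic_partition(t2)
print(sorted(t1))
```

This also shows why the partition column has to come last in the SELECT: its value is taken by position to decide the target partition.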

119. How would you enforce a map-side join.
MapJoin

1. By specifying the keyword /*+ MAPJOIN(b) */ in the join statement.

2. By setting the following property to true.

hive.auto.convert.join=true
For performing map-side joins, there should be two files, one is of larger size and the other is of smaller size. You can set the small file size by using the following property:

hive.mapjoin.smalltable.filesize=(default it will be 25MB)
Now, let us perform a map-side join and join the two datasets based on their IDs.

SELECT /*+ MAPJOIN(dataset2) */ dataset1.first_name, dataset1.eid, dataset2.eid
FROM dataset1 JOIN dataset2 ON dataset1.first_name = dataset2.first_name;
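
The idea behind a map-side join, keeping the small table in memory and probing it for every row of the large table, can be sketched in Python. The function name and the sample rows are hypothetical; the column names follow the query above.

```python
def map_side_join(large_rows, small_rows, key):
    """Join by loading the small table into an in-memory dict (hash table)."""
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)
    joined = []
    # Each large-table row is matched in the "mapper"; no shuffle or reduce step.
    for left in large_rows:
        for right in lookup.get(left[key], []):
            joined.append((left["first_name"], left["eid"], right["eid"]))
    return joined


dataset1 = [{"first_name": "amit", "eid": 1}, {"first_name": "ravi", "eid": 2}]
dataset2 = [{"first_name": "amit", "eid": 10}]
result = map_side_join(dataset1, dataset2, "first_name")
print(result)
```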

120. What is the role of combiners.
https://data-flair.training/blogs/hadoop-combiner-tutorial/
121. How shuffle and sort operation would degrade the performance. What are the necessary actions to tune the Hive queries.

122. What is cluster mode and client mode in Spark?

When the job gets executed in client mode, the driver program runs on the same node where the job is submitted.
If someone does a kill or ctrl+z, the session will stop; due to this the Spark job fails, because of the client failure.
In cluster mode the driver program runs on an available node inside the cluster; it doesn't matter where the job was submitted from.
123. Have you dealt with any complex data types like map, struct, array etc., in which scenario it will be used.

• arrays: ARRAY<data_type>
• maps: MAP<primitive_type, data_type>
• structs: STRUCT<col_name : data_type [COMMENT col_comment], …>

The first complex type is an array. It is nothing but a collection of items of similar data type, i.e. an array can contain one or more values of the same data type.

CREATE TABLE hive_array_table
(name String, sal int, age array<smallint>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' stored AS textfile;

Hive Map data type is one type of Hive complex data types. It is an unordered collection of key-value pairs. Keys must be of primitive types. Values can be of any type.

CREATE TABLE hive_map_table
(name String, sal int, age array<smallint>, feel map<string,boolean>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n' stored AS textfile;

If you observe the above query, in the map type the first element is string (that means a primitive data type) and the second element can be any type (here we used Boolean (Miscellaneous Data Type)). In the query we used the map keys terminated by syntax.

The struct type is a collection of elements of different types. We can use any data type to specify this struct data type. Elements in a STRUCT type are accessed using the DOT (.) notation.

CREATE TABLE hive_struct_table
(name String, sal int, address struct<city:String,state:String>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
collection items terminated BY ','
LINES TERMINATED BY '\n' stored AS textfile;

If you observe the above query, in the struct type the first element is string and the second element is also string (there are no restrictions; we can use any type of data types for a struct). In the query we used the collection items terminated by syntax.

Union data type
UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>

124. How to implement compression techniques in Hive tables. What are the different types of compression formats available in Hive and what's the difference between those.

125. What's the purpose of lateral view explode in Hive.

http://www.ericlin.me/2013/09/how-to-use-hive-lateral-view-in-your-query/

126. How to handle JSON/XML file formats.

127. What is SerDe. Whether SerDe will increase or degrade the performance of your map-reduce job?

128. How would you deal with performance bottlenecks in a Hive environment if your query is running for a long time.
129.
130. In which scenario you will prefer managed and external tables and what's the reason?
131.
132. How would you enforce that your partitioned tables are not selected/read without any where conditions?

133. How bucketing used to work inside a partitioned directory.

134. What's the difference between select * and select column_names in Hive? How Hive used to work between these scenarios.

135. Explanation of Ignite MapReduce?
136. Difference b/w MapReduce in memory and Spark in memory? How Spark gives more performance than MR?

137. How a partitioner works? Types of partitioners in Spark? How to write a custom partitioner?

https://acadgild.com/blog/partitioning-in-spark/

138. What is the output of collectAsMap() in Spark?

collectAsMap will return the results for a paired RDD as a Map collection. And since it is returning a Map collection you will only get pairs with unique keys, and pairs with duplicate keys will be removed.
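
The loss of duplicate keys is the same thing that happens when a Python dict is built from a list of pairs. The snippet below is a plain-Python analogy with made-up data, not a Spark call; in Spark, which of the duplicate values survives should not be relied on.

```python
pairs = [("a", 1), ("b", 2), ("a", 3)]

# Building a map from the pairs keeps one entry per key.
as_map = dict(pairs)

print(as_map)
print(len(pairs), len(as_map))
```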

139. How to see the status of a Sqoop job?

$ sqoop job --show myjob

140. How to load a Hive table from RDBMS in Parquet format using Sqoop?

sqoop import --connect jdbc:mysql://xx.xx.xx.xx/database \
--username username --password mypass \
--query 'SELECT page_id, user_id FROM pages_users WHERE $CONDITIONS' --split-by page_id \
--hive-import --hive-database default --hive-table pages_users3 \
--target-dir hive_pages_users --as-parquetfile

141. What is the role of the cluster manager in Spark job execution?

142. Is Sqoop creating a SQL query internally? If yes, then how is it getting created and executed for multiple mappers?
Absolutely, Sqoop is building a SQL query (actually one for each mapper) to the source table it is ingesting into HDFS from. The number of mappers (default is four, but you can override) leverage the split-by column and basically Sqoop tries to build an intelligent set of WHERE clauses so that each of the mappers have a logical "slice" of the target table. As an example, if we used three mappers and a split-by column that is an integer with ranges from 0 to 1,000,000 for the actual data (i.e. Sqoop can do a pretty easy min and max call to the DB on the split-by column), then Sqoop's first mapper would try to get values 0-333333, the second mapper would pull 333334-666666, and the last would grab 666667-1000000.
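
The slicing in this example can be reproduced with a few lines of Python. This is a simplified sketch of the idea, not Sqoop's actual splitter: the function names are made up, and the exact boundaries and clause text Sqoop generates may differ.

```python
def split_ranges(min_val, max_val, num_mappers):
    """Cut [min_val, max_val] into contiguous, non-overlapping inclusive ranges."""
    step = (max_val - min_val) // num_mappers
    ranges = []
    start = min_val
    for i in range(1, num_mappers + 1):
        # The last mapper always runs up to the max value.
        end = max_val if i == num_mappers else min_val + i * step
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clauses(column, ranges):
    """Build one WHERE condition per mapper from the ranges."""
    return ["{0} >= {1} AND {0} <= {2}".format(column, lo, hi) for lo, hi in ranges]


ranges = split_ranges(0, 1000000, 3)
clauses = where_clauses("id", ranges)
for clause in clauses:
    print(clause)
```

The min and max values passed in are what the boundary query (a MIN/MAX call on the split-by column) would return.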
143. Is Sqoop using any staging node to load the data? Or is Sqoop loading data directly into the data node? How does it behave for different mappers?
Nope, Sqoop is running a map-only job in which each mapper (3 in my example above) runs a query with a specific range to prevent any kind of overlap. The mapper then just drops the data in the target-dir HDFS directory with a file named part-m-00000 (well, the 2nd one ends with 00001 and the 3rd one ends with 00002). The composite export is represented by the target-dir HDFS directory (basically follows the MapReduce naming scheme of files).
144. How Sqoop runs in parallel for multiple mappers?
Ans: 118

145. Howdat
aisspl
i
tint
opar
tfi
l
esi
nsqoop?

Ther
ear
esequenceofst
epsi
nvol
vedi
nit
.

Sqoopj
obReadr
ecor
ds.v
iaDBRecor
dReader

or
g.apache.
sqoop.
mapr
educe.
db.
DBRecor
dReader

Twomet
hodswi
l
ldowor
kher
e.

method 1.

protected ResultSet executeQuery(String query) throws SQLException {

    Integer fetchSize = dbConf.getFetchSize();

    /* get fetchSize according to the split, which is calculated via the getSplits() method of
       org.apache.sqoop.mapreduce.db.DBInputFormat. And the no. of splits is calculated
       via (count from table / no. of mappers). */

Split calculation:

org.apache.sqoop.mapreduce.db.DBInputFormat

public List<InputSplit> getSplits(JobContext job) throws IOException {
    .......
    // here splits are calculated according to the count of the source table
    .......
    query.append("SELECT COUNT(*) FROM " + tableName);

method 2.

protected String getSelectQuery() {
    StringBuilder query = new StringBuilder();

    if (dbConf.getInputQuery() == null) {
        query.append("SELECT ");

        for (int i = 0; i < fieldNames.length; i++) {
            query.append(fieldNames[i]);
            if (i != fieldNames.length - 1) {
                query.append(", ");
            }
        }

        query.append(" FROM ").append(tableName);
        query.append(" AS ").append(tableName);

        if (conditions != null && conditions.length() > 0) {
            query.append(" WHERE (").append(conditions).append(")");
        }

        String orderBy = dbConf.getInputOrderBy();
        if (orderBy != null && orderBy.length() > 0) {
            query.append(" ORDER BY ").append(orderBy);
        }
    } else {
        // PREBUILT QUERY
        query.append(dbConf.getInputQuery());
    }

    try { // main logic to decide division of records between mappers.
        query.append(" LIMIT ").append(split.getLength());
        query.append(" OFFSET ").append(split.getStart());
    } catch (IOException ex) {
        // Ignore, will not throw.
    }

    return query.toString();
}

Check out the code section under the comment "main logic to decide division of records between mappers". Here records are divided according to LIMIT and OFFSET. This logic is implemented differently for every RDBMS: just look at org.apache.sqoop.mapreduce.db.OracleDBRecordReader, which has a slightly different implementation of the getSelectQuery() method.

146. Hive window function example?

http://dwgeek.com/hadoop-hive-analytic-functions-examples.html/

select pat_id,
       dept_id,
       count(*) over (partition by dept_id order by dept_id asc) as pat_cnt
from patient;

147. What is the default partition in Hive and how can we delete it?

__HIVE_DEFAULT_PARTITION__ gets created automatically when the partition column contains null values in the source data; all rows with a null partition value are written to this partition. To delete it, we can go directly to the folder, delete the partition directory and execute the command below:

MSCK REPAIR TABLE tablename;

Or, with ALTER statements: if the partition column type is int, change it to string, drop the partition as below, and revert back.

ALTER TABLE test PARTITION COLUMN (p1 string);

-- remove the default partition

ALTER TABLE test DROP PARTITION (p1 = '__HIVE_DEFAULT_PARTITION__');

-- then revert the column back to "int" type

ALTER TABLE test PARTITION COLUMN (p1 int);

148. What is Spark lineage?

Lineage is the logical execution plan: a DAG of the transformations that were applied to build an RDD. At run time the DAG is broken into stages and tasks.

Transformations are lazy and only extend the lineage; nothing is computed until an action is called. Transformations that have no dependency on each other can be computed independently.

If any one partition is lost and the RDD that was used to create the partition is not in memory any more, it has to be loaded again from disk and recomputed. Spark will have to go one step back and recompute the previous RDD.

If you have long lineage chains, the worst case scenario might mean long re-computation times. That's when you should consider using checkpointing, which stores intermediate results in reliable storage (like HDFS), so that Spark uses the checkpointed data instead of going all the way back to the original data source.
149. Reason for Spark overhead exceptions?

When we do spark-submit, resources get allocated and we mention the number of partitions, so the data is divided into chunks that are distributed across executors.

If any one partition takes more than about 2 GB of memory, the job can fail with a Spark memory overhead exception.

spark.yarn.executor.memoryOverhead

If this property is set too low and the Spark job crosses the limit, we get the memory overhead exception. Since it depends on the Java heap size and the executor memory properties, check how they are configured. To fix it, pass more executor memory (--executor-memory) or a larger memoryOverhead while doing spark-submit. Note that Spark does not allow setting the heap size through spark.executor.extraJavaOptions (for example '-Xmx24g'); use spark.executor.memory instead.

spark.yarn.am.memoryOverhead ---- AM memory * 0.10, with a minimum of 384 (MB)

150. Hive execution modes?

Local ---- metastore in one Hive server and storage will be in one Hive server

Remote ---- metastore runs in its own JVM and storage will be remote, like an external database

Embedded ---- metastore and storage embedded in one Hive server; only a single user at a time

Default Derby database ---- uses JDBC driver for external connection

151. Spark with Java fast or Python?

Java is faster than Python, since Spark itself runs on the JVM and Java code can call its built-in functions directly and efficiently.

With Python, data has to be converted into a Python-readable form and back, which takes time.

152. How to save Spark DataFrame data into a SQL server (external server)?

// jdbc mysql url - destination database is named "data"
val url = "jdbc:mysql://localhost:3306/data"

// destination database table
val table = "sample_data_table"

// write data from spark dataframe to database
// (prop is a java.util.Properties object holding the connection settings such as user and password)
df.write.mode("append").jdbc(url, table, prop)

153. Which is better, ORC or Parquet?

ORC with snappy gives better compression and read performance.

154. How to check the list of datanodes in command line?
sudo -u hdfs hdfs dfsadmin -report

155. Dfsadmin use case?

It's like root.

We can do all admin-related things using this.

156. Is it possible to overwrite the HDFS directory automatically in Sqoop instead of deleting it by hand?
sqoop import --connect jdbc:mysql://localhost/dbname --username username -P --table tablename --delete-target-dir --target-dir '/targetdirectorypath' -m 1
157. Word count using Hive?

SELECT owner_key, word, count(*)
FROM stackdata_updtd
LATERAL VIEW explode(split(lower(post), '\\W+')) t1 AS word
GROUP BY owner_key, word;

158. Diff b/w group by and sort by?

The ORDER BY clause's purpose is to sort the query result by specific columns.
The GROUP BY clause's purpose is to summarize unique combinations of column values.

159. Hive default SerDe?
LazySimpleSerDe

160. Hive default delimiter?

Ctrl-A (delimiter = <Ctrl+A> = '\001')
161. Group by and cogroup diff in Pig?

The COGROUP operator works more or less in the same way as the GROUP operator. The only difference between the two operators is that the GROUP operator is normally used with one relation, while the COGROUP operator is used in statements involving two or more relations.
162. Load a file using Pig.

part1 = LOAD '/user/devuser/sample.txt' USING PigStorage(',') AS (palan:chararray, id:int, loc:chararray);

To check the result on the console: dump part1

163. How to join in Pig?

a. Result = join customer1 by key, customer2 by key;

b. If it's left outer: result = join customer1 by key left outer, customer2 by key;

164. What is equi join?

An inner join on an equality condition is referred to as an equi join.

When there are matching rows in both tables based on the join key (for example the primary key), those rows are returned as the inner join result.

165. How to do union in Pig?

Relation_name3 = UNION Relation_name1, Relation_name2;

166. What is filter and how to use it in Pig?

The FILTER operator is used to select the required tuples from a relation based on a condition.

filter_data = FILTER student_details BY city == 'Chennai';

167. Foreach statement usage in Pig?

The FOREACH operator is used to generate specified data transformations based on the column data.

foreach_data = FOREACH student_details GENERATE id, age, city;

168. Order by operator in Pig?

The ORDER BY operator is used to display the contents of a relation in a sorted order based on one or more fields.

grunt> Relation_name2 = ORDER Relation_name1 BY field_name (ASC|DESC);

169. How to store in Pig?

STORE student INTO 'hdfs://localhost:9000/pig_Output/' USING PigStorage(',');

170. What is tokenizer?

The TOKENIZE() function of Pig Latin is used to split a string (which contains a group of words) in a single tuple and returns a bag which contains the output of the split operation.

student_name_tokenize = foreach student_details Generate TOKENIZE(name);

grunt> Dump student_name_tokenize;
({(Rajaiv),(Reddy)})
({(siddarth),(Battacharya)})
({(Rajesh),(Khanna)})
({(Preethi),(Agarwal)})
({(Trupthi),(Mohanthy)})
({(Archana),(Mishra)})
({(Komal),(Nayak)})
({(Bharathi),(Nambiayar)})

171. Flatten operator?
The FLATTEN operator is used to eliminate nesting.

172. Pig parallelism?

Generally there will be one reducer for every four mappers. If there are many mappers and only one reducer, we cannot achieve parallel operation within the stipulated time period.

By increasing the default number of reducers we can achieve better performance:
SET default_parallel 5;
173. Diff b/w --incremental append and --incremental lastmodified

• append: it is used when rows in a source table in the DB get inserted regularly. The table must have a numeric primary key; if not, a numeric --split-by column is used in the absence of a numeric primary key. For e.g.

$ sqoop import --connect jdbc:mysql://localhost/DB_name --username username --password password --table tablename --incremental append --check-column colname --last-value 100

• lastmodified: it is used when rows in a source table in the DB get updated regularly. The check column should hold a timestamp that changes whenever the row is updated. As with append, a numeric primary key or a numeric --split-by column is used to split the work. For e.g.

$ sqoop import --connect jdbc:mysql://localhost/DB_name --username username --password password --table tablename --incremental lastmodified --check-column colname --last-value "yyyy-mm-dd 00:00:00"

174. What is merge key in Sqoop?

There are 2 ways to use merge key:

Sqoop import with merge key

Sqoop merge with merge key

To update target table records we can use the merge key to update existing records and append new records.

175. Lateral view explode example?

To split an array list into multiple rows:
SELECT qId, cId, vId FROM answer
LATERAL VIEW explode(vIds) visitor AS vId
WHERE cId = 2


176. 2nd and 3rd records out of 10 using a SQL query?
SELECT *
FROM (SELECT p.*, ROW_NUMBER() OVER (ORDER BY p.ProductID) AS rownumber FROM Products p) t
WHERE rownumber IN (2, 3);

177. What is --direct in Sqoop?
To be short and precise, it's the mode for fast import: instead of reading rows through JDBC, Sqoop uses the database's native tools (for example mysqldump for MySQL).

sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --direct

178. How to make an array list?

concat_ws(',', collect_list(trim(pcms_product_code))) \
    as concatenated_pcms_codes

case when array_contains(split(t.concatenated_pcms_codes, ','), 'ECMN') \
     then 'MULTILINE' \
     else \
     case when array_contains(split(t.concatenated_pcms_codes, ','), 'ECMN1') \
          then 'MULTILINE' \
          else \
          case when array_contains(split(t.concatenated_pcms_codes, ','), 'ELS') \
               then 'MULTILINE' end end end, \

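178. Order by and sort by?

Order by:: sorts all the records in ascending or descending order on the given column (for example the primary key). It gives a total order, so the final sort runs through a single reducer.

Sort by:: works at the reducer phase; each reducer sorts only its own output, so the overall result is ordered per reducer, not globally.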

179. Multi-delimiter SerDe use case:

create table test1 (f1 int, f2 int, f3 int) row format serde
'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' with
serdeproperties ("field.delim"="[,\;]") stored as textfile;

180. Hive with ORC vs Spark with Parquet?

• Hive has a vectorized ORC reader but no vectorized Parquet reader.
• Spark has a vectorized Parquet reader and no vectorized ORC reader.
• Spark performs best with Parquet, Hive performs best with ORC.

181. Broadcast?

Broadcast variables are used to save a copy of data across all nodes. This variable is cached on all the machines instead of being shipped to the machines with every task. The following code block has the details of a Broadcast class for PySpark.

class pyspark.Broadcast(sc=None, value=None, pickle_registry=None, path=None)

A Broadcast variable has an attribute called value, which stores the data and is used to return a broadcasted value.

182. Accumulator?

Accumulator variables are used for aggregating information through associative and commutative operations. For example, you can use an accumulator for a sum operation or counters (as in MapReduce). The following code block has the details of an Accumulator class for PySpark.

class pyspark.Accumulator(aid, value, accum_param)

The following example shows how to use an Accumulator variable. An Accumulator variable has an attribute called value that is similar to what a broadcast variable has. It stores the data and is used to return the accumulator's value, but it is usable only in a driver program.

In this example, an accumulator variable is used by multiple workers and returns an accumulated value.
------------------------------ accumulator.py ------------------------------
from pyspark import SparkContext

sc = SparkContext("local", "Accumulator app")
num = sc.accumulator(10)          # accumulator starts at 10

def f(x):
    global num
    num += x                      # each element is added on the workers

rdd = sc.parallelize([20, 30, 40, 50])
rdd.foreach(f)
final = num.value                 # read the value back on the driver
print("Accumulated value is -> %i" % (final))
------------------------------ accumulator.py ------------------------------

Command: the command for an accumulator variable is as follows:
$SPARK_HOME/bin/spark-submit accumulator.py

Output: the output for the above command is given below.
Accumulated value is -> 150

184. Rank and dense rank difference?
with q as (
select 10 deptno, 'rrr' empname, 10000.00 sal from dual union all
select 11, 'nnn', 20000.00 from dual union all
select 11, 'mmm', 5000.00 from dual union all
select 12, 'kkk', 30000 from dual union all
select 10, 'fff', 40000 from dual union all
select 10, 'ddd', 40000 from dual union all
select 10, 'bbb', 50000 from dual union all
select 10, 'xxx', null from dual union all
select 10, 'ccc', 50000 from dual)
select empname, deptno, sal
, rank() over (partition by deptno order by sal nulls first) r
, dense_rank() over (partition by deptno order by sal nulls first) dr1
, dense_rank() over (partition by deptno order by sal nulls last) dr2
from q;

EMP   DEPTNO   SAL     R   DR1   DR2
---   ------   -----   -   ---   ---
xxx   10               1   1     4
rrr   10       10000   2   2     1
fff   10       40000   3   3     2
ddd   10       40000   3   3     2
ccc   10       50000   5   4     3
bbb   10       50000   5   4     3
mmm   11       5000    1   1     1
nnn   11       20000   2   2     2
kkk   12       30000   1   1     1

9 rows selected.
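
To see why R skips a number after a tie while DR1 does not, the two functions can be imitated in plain Python. This is a minimal sketch for a single partition with no nulls; the function name rank_and_dense_rank is made up for this example, and the sample values are the non-null salaries of department 10 from the query above.

```python
def rank_and_dense_rank(values):
    """Return (value, rank, dense_rank) tuples for values sorted ascending."""
    rows = []
    rank = 0
    dense = 0
    previous = object()  # sentinel that never equals a real value
    for position, value in enumerate(sorted(values), start=1):
        if value != previous:
            rank = position  # rank jumps to the row position, leaving gaps after ties
            dense += 1       # dense_rank only moves to the next integer
            previous = value
        rows.append((value, rank, dense))
    return rows


for row in rank_and_dense_rank([10000, 40000, 40000, 50000, 50000]):
    print(row)
```

The two 40000 rows share rank 2, and the next distinct salary gets rank 4 but dense rank 3. In the table above every number is one higher in R and DR1 because the null row sorts first.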

180. Reduce by key use case and explanation?
reduceByKey works only on RDDs and it's a transformation operation.
It only works with key-value pairs.
Based on the key, it shuffles the data so that all values of each unique key come together, then combines them (for example sums them up) and produces one result per key.
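
The behaviour can be imitated on a plain Python list to make the semantics concrete. This is a single-machine sketch only: reduce_by_key is a hypothetical helper written for this example, not a Spark API, and there is no shuffle involved.

```python
def reduce_by_key(pairs, func):
    """Combine the values of each key with func, like RDD.reduceByKey does per key."""
    result = {}
    for key, value in pairs:
        if key in result:
            result[key] = func(result[key], value)  # merge with the running value
        else:
            result[key] = value                     # first value seen for this key
    return result


pairs = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("b", 1), ("a", 1)]
print(reduce_by_key(pairs, lambda x, y: x + y))
```

Here "a" ends up with 3, "b" with 2 and "c" with 1. In Spark the same function is first applied inside each partition and then across partitions, which is why it has to be associative and commutative.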


181. Spark ecosystem?

182. Client mode vs cluster mode in Spark?
client mode -- the driver runs on the same node where we are submitting the job and uses worker node resources -- needs an external monitoring system to monitor it; there is no other option if the process dies
cluster mode -- the driver runs on one of the worker nodes. It can be monitored using the master node; the --supervise flag restarts it if the process dies

181. Pair RDD and aggregations?

Working with K,V pair input and output data in Spark
Applying aggregations on K,V pairs

Demo Steps:

In this demo, you will load the dataset, then compute and find the number of occurrences of each unique record in the dataset using Spark Scala.

1. Type in spark-shell to open the Scala shell of Spark
2. Type in the below code line by line in the shell

val lines = sc.textFile("routerLog.tsv")
val pairs = lines.map(s => (s, 1))
val counts = pairs.reduceByKey((a, b) => a + b)
counts.collect.foreach(println)
// save the RDD itself; collect returns a local Array, which has no saveAsTextFile
counts.saveAsTextFile("/hdfspath/myuniquedirectory")

You can find the output as (record, number of times of occurrence).

181. How to eliminate delimiters in between values?

Replace the delimiter with an escaped one, ex:: \t with \\t in files.

Then create the table as below.

182. Base path example?
DataFrame df = hiveContext.read().format("orc").option("basePath", "path/to/table/").load("path/to/table/entity=xyz")

This is used to load a specific data partition in Spark.

1. Register the jar from the hive console
hive> add jar /path/to/csv-serde-1.1.2-0.11.0-all.jar;
2. Create a table with the specified serde and custom properties
hive> create table test_table(c1 string, c2 string, c3 string, c4 string)
    > row format serde 'com.bizo.hive.serde.csv.CSVSerde'
    > with serdeproperties(
    > "separatorChar" = "\t",
    > "quoteChar" = "\"",
    > "escapeChar" = "\\"
    > )
    > stored as textfile;
3. Load your dataset into the table:
hive> load data inpath '/path/to/file/in/hdfs' into table test_table;
4. Do a select * from test_table to check the results

184. Avro vs Parquet difference?

Avro is row-oriented and is used for event-driven, write-heavy data (write).

Parquet is columnar and is for any kind of data that is mostly read and queried (read).

Avro supports schema evolution (we can change the schema dynamically).

For most other formats this is limited or not possible.

Avro supports multiple languages.

185. Horizontal scaling and vertical scaling?

Adding capacity (space, memory, CPU) to a machine is called vertical scaling.

Adding nodes to the cluster is called horizontal scaling.

186. Difference between hash-based partition and range-based partition?

185. Count of call duration?

table A

mobilenumber | call_duration

select mobilenumber, sum(call_duration) from tablea group by mobilenumber;

186. "," delimiter in values, how to handle?

https://community.hortonworks.com/questions/134846/hive-escaping-field-delimiter-in-column-value.html

create table movies (movieid int, title String, genre string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

190. How to increase Spark executor memory?

Increase the memory of the executors while doing spark-submit:

spark-submit --master yarn-cluster --num-executors 4 --executor-memory 8g --driver-memory 4g --executor-cores 2 --conf spark.yarn.executor.memoryOverhead=1024 create_fb_Normalized_layer_cpi.py
191. Merging part files in Hadoop to one single file?
Using Spark:
sc.textFile("hdfs://...../part*").coalesce(1).saveAsTextFile("hdfs://...../filename")
Using Hadoop jar:

$ hadoop jar /usr/hdp/2.3.2.0-2950/hadoop-mapreduce/hadoop-streaming-2.7.1.2.3.2.0-2950.jar \
    -Dmapred.reduce.tasks=1 \
    -input "/hdfs/input/dir" \
    -output "/hdfs/output/dir" \
    -mapper cat \
    -reducer cat

spark-submit --master yarn-cluster --num-executors 4 --executor-memory 8g --driver-memory 4g --executor-cores 2 --conf spark.yarn.executor.memoryOverhead=1024 sample.py

spark-submit --master yarn-cluster sample.py

Hive server1 and server2 difference and why it's fast:
https://hadoopabcd.wordpress.com/2016/10/26/hive-architecture/#more-1991

Java Interview Questions:

1. Java design patterns?

2. OOPs concepts?

3. Polymorphism?

There are two types:

Compile-time polymorphism

Runtime polymorphism

Compile time::: method overloading

Method overloading:::: several methods share the same name, and the compiler picks one based on the method signature (like the arguments)

Runtime::: method overriding

Polymorphism::: a superclass reference can refer to objects of different subclasses.

Polymorphism is the ability of an object to take on many forms.

4. Inheritance::: using one class's properties in another class by using the keyword called extends.
Different types of inheritance:
Single inheritance
Multilevel inheritance
Multiple inheritance
Hybrid inheritance
5. Encapsulation::: Wrapping data members and methods in a single unit.

We can achieve this concept using classes.
Class::: A class can be defined as a template which can hold methods and variables.

6. Interface:::: public static final variables, public abstract methods

• Its variables are public static final and its methods are public abstract

• It is used to achieve multiple inheritance

• With classes that is not possible

abstract methods::: An abstract method has no body; the body is provided by the class that implements the interface, and that implementation is called at runtime.
public abstract int myMethod(int n1, int n2);

interface (static keyword::: when you define a member as static):::: We can call static members using the interface name (public static final) without creating any objects inside another class.

7. Abstract class::

An abstract class has abstract methods and non-abstract methods.

A normal class has non-abstract methods only.

// abstract class
abstract class Sum {
    /* These two are abstract methods, the child class
     * must implement these methods
     */
    public abstract int sumOfTwo(int n1, int n2);
    public abstract int sumOfThree(int n1, int n2, int n3);

    // Regular method
    public void disp() {
        System.out.println("Method of class Sum");
    }
}

8. Design patterns in Java

Singleton design pattern:::

Only one object can be created in the singleton pattern.

The same object is returned even if we ask for an instance multiple times.

Example::

public class SingleObject {

    // create an object of SingleObject
    private static SingleObject instance = new SingleObject();

    // make the constructor private so that this class cannot be
    // instantiated
    private SingleObject() {}

    // Get the only object available
    public static SingleObject getInstance() {
        return instance;
    }

    public void showMessage() {
        System.out.println("Hello World!");
    }
}

9. Builder pattern::

The Builder pattern builds a complex object using simple objects and a step-by-step approach. This type of design pattern comes under the creational patterns, as this pattern provides one of the best ways to create an object.

A Builder class builds the final object step by step. This builder is independent of other objects.

10. MVC design pattern (Model View Controller)

Model:: is where the data gets stored; it can also have logic to update the controller if its data changes

https://www.tutorialspoint.com/design_pattern/mvc_pattern.htm

11. Factory design pattern::

The Factory pattern is one of the most used design patterns in Java. This type of design pattern comes under the creational patterns, as this pattern provides one of the best ways to create an object.

In the Factory pattern, we create an object without exposing the creation logic to the client and refer to the newly created object using a common interface.

Example::
https://www.tutorialspoint.com/design_pattern/factory_pattern.htm

12. Collections:::

Below are the types of collection interfaces.

• List

• Map

• Set

• Queue

List:: it's a collection of elements. It allows duplicate elements.

It maintains insertion order.

Set::: it's a collection of elements. It won't allow duplicate elements.

It does not guarantee insertion order (for example HashSet).

Map:: collection of key-value pairs.

ArrayList::

Need to look into all collection types.

13. Object:: it's an instance of a class

The Object class is the superclass of all classes.

for example:: Collection is the super-interface of List, Set and Queue (Map is a separate interface)

14. Exceptions?

15. Multithreading?

16. Packages?

17.

200. Spark context use?

It is the entry point of Spark functionality. The most important step of any Spark driver application is to generate a SparkContext. It allows your Spark application to access the Spark cluster with the help of a resource manager. The resource manager can be one of these three: Spark Standalone, YARN, Apache Mesos.

SparkContext: entry point to Spark Core

The Spark context sets up internal services and establishes a connection to a Spark execution environment.

Once a SparkContext is created you can use it to create RDDs, accumulators and broadcast variables, access Spark services and run jobs (until the SparkContext is stopped).
A Spark context is essentially a client of Spark's execution environment and acts as the master of your Spark application (don't get confused with the other meaning of Master in Spark, though).
210. What is broadcast join in Spark?

As with core Spark, if one of the tables is much smaller than the other you may want a broadcast hash join. You can hint to Spark SQL that a given DF should be broadcast for the join by calling broadcast on the DataFrame before joining it (e.g., df1.join(broadcast(df2), "key")).

211. Purge in Hive?

DROP TABLE tablename PURGE deletes the data immediately, without moving it to the trash.

212. HCatalog in Hive?

It's a table and storage management tool from Hadoop that lets other Hadoop applications connect to the same tables.

Ex:: reads and writes from Hive to Pig (an intermediate tool between ecosystem components)

213. Truncate in Hive?

To delete the table data, not the schema of the table.

214. Broadcast in Spark?

A copy of the variable is sent to the entire cluster, for example in order to join tables.

Core Java Questions.

1. OOPs concepts

• Polymorphism

• Overriding && overloading

Overloading::

public static int sum()

public static int sum(int x, int y)

Using the same method name and overloading the sum method by passing different parameters is called static polymorphism or compile-time polymorphism.

Overriding:: a superclass method can be overridden/modified in a subclass

class A {
    int a = 12;

    public void summa() {
        System.out.println("a = some" + a);
    }
}

class B extends A {
    // @Override
    public void summa() {
        int b = 12;
        A a = new B();
        System.out.println("");
    }
}

• Inheritance

Acquire the properties of a superclass in a subclass using the keyword called extends

Types: Single, multilevel, multiple (not supported with classes), hybrid

Single: just one class to another class

Multilevel inheritance: supported; a to b and b to c, so a to c

Multiple inheritance: not supported with classes; a to c and b to c

The same concept can be achieved by using interfaces

Hybrid inheritance: a to b and b to c and a to c

a to d and d to c, so a to c

• Encapsulation

Wrapping up data into a single unit. This is achieved using private members.

Medicine::

• Abstraction

Abstraction is a process of hiding the implementation details from the user. Only the functionality will be provided to the user.
It helps to avoid code duplication and improves code reusability.

Interface::

An interface is a reference type in Java. It is similar to a class. It is a collection of abstract methods. A class implements an interface, thereby inheriting the abstract methods of the interface. Along with abstract methods, an interface may also contain constants, default methods, static methods, and nested types.
It strictly allows only method definitions, whereas an abstract class can allow partial implementation in methods.

2. Collections

3. Design patterns

4. Strings

5. Serialization and deserialization

1. What is Namenode and Datanode?
2. What is the default block size of HDFS? What happens when we decrease/increase the block size?
3. Difference between the Internal and External table in Hive?
4. What is RDD in Spark?
5. Difference between Java and Python/Scala?
6. What is the use of explode in Hive?
7. What do you mean by closure in Spark?
8. Spark job execution process?
9. What is DAG?
10. Tell about some Hive queries?
11. Give any example for a Partition query in Hive?
12. Difference between Partition and Bucketing?
13. Sqoop performance tuning?
14. How to import/export data from RDBMS to HDFS?
15. How to check existing jobs in Sqoop?

16. Difference between combiner and partitioning in MapReduce
17. Why is $CONDITIONS used after the --query parameter in Sqoop
18. What is map-side join in Hive?
19. Difference between RDD, DataFrames and Datasets and where exactly each is used.
20. Difference between map & flatMap
21. Difference between groupByKey and reduceByKey
22. Few common Unix questions
23. Spark cluster computing formulas
24. Few questions on Java collections
25. Hive query to get the second max employee salary in the employee table
26. How you tested the Spark code in your project
27. How can you review the code and what are the steps that you follow while reviewing others' code?
28. If the file size is 1 TB then in how many blocks will it be stored in HDFS
29. Big Data projects which I have worked on, big data technologies used and their versions.
30. Current project's architecture, version of each product used;
31. Types of distribution(s) used in the project, like Cloudera, MapR.
32. How are we scheduling the Spark jobs (this question arose because we aren't using a workflow management tool from any distribution)
33. What is the production data's volume?
34. Why was Cassandra chosen as the underlying DB for Spark? Why did you recommend Cassandra and what was the client's requirement?
35. What were the performance tuning parameters in the Spark job?
36. What is dynamic resource allocation?
37. Production setup's hardware configuration for the Spark cluster. How many cores did Spark have? What was the cluster size?

38. What is HBase? What is the specialty of it? When will you opt for HBase.
39. How do you launch Spark jobs programmatically from any API? (Java or Scala)
40. What were the APIs used in Java for the Spark jobs?
41. Hive -> when do you opt for Hive?
42. Differences between Spark 1.6.2 and the newer version of Spark (2.1.0)?
43. How frequently are the Spark jobs scheduled?
44. What are the showstopper issues faced in Spark jobs or any big data projects?
45. Few basic questions from Core Java.
46. How do you achieve parallelism with executors & cores? What's their relation?
47. What will happen if 3 cores are given per executor? What difference does it make from allocating 1 core per executor and making 3 executors run?
48. What operations were performed with Spark (like DataFrame, DataSet, JavaPairRDD)?
49. What type of data can be ingested in Splunk?
50. In what scenarios to use MapReduce or Spark.
51. What are the processes involved in Data Lake creation?
52. How to validate the data, which includes: adherence to data types, adherence to data format, data quality check, filtering out the bad records
53. How to improve the performance of Sqoop?
54. How to unload skewed data from a database using Sqoop?
55. How to merge historical data and delta present on the data lake if a timestamp column is not present?
56. How to achieve the same performance standards of ETL done using Informatica/Ab Initio in Hadoop?
57. How to validate the output data with reference to input data in the Hadoop environment?
58. How to join two massive datasets in Hadoop?
59. How can the hard-coding of transformations be avoided in an ETL application?

60. How can the business rules be incorporated in the transformation phase of an application?
61. How do you write a function in Scala Spark?
62. Difference between MapReduce and Spark
63. How to write a MapReduce program (components)?
64. How to create a Hive UDF and how to use it?
65. What is Flume used for
66. Explain the Spark codes done.
67. What is in hive-site.xml
