Business Intelligence & Big Data Analytics - CSE3124Y
MAP REDUCE (PART 2)
LECTURE 6
Learning Outcomes
Recap
▪Elaborate on the functions of the JobTracker and TaskTracker.
▪Explain how MapReduce works
Learning Objectives:
▪Describe how splitting is done in MapReduce
▪Explain the main classes used in splitting and their main roles
▪Detail how Hadoop runs a MapReduce job
Map/Reduce tasks (1)
▪Local Execution
– Hadoop will attempt to execute each Map task on a node that holds its split's data locally
– If no local Map slot is available, the split's data is moved over the network to a node with a free Map slot
▪Number of Map Tasks
– It is possible to configure the number of Map and Reduce tasks (see the sketch after this list)
– If a file is not splittable (e.g. gzip-compressed input), there will be only a single Map task
▪Number of Reduce Tasks
– Normally there are fewer Reduce tasks than Map tasks
– Reduce output is written to HDFS (the first replica is stored on the local node)
– If you need a single output file, use one Reduce task
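A minimal sketch of how these counts are set through the Hadoop Java API (mapreduce.job.maps and setNumReduceTasks are the standard Hadoop 2+ names; the job name is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TaskCountConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Hint for the number of Map tasks; the actual number is
        // ultimately determined by the number of input splits.
        conf.setInt("mapreduce.job.maps", 4);

        Job job = Job.getInstance(conf, "task-count-demo");

        // The number of Reduce tasks is honoured exactly;
        // use 1 when a single output file is required.
        job.setNumReduceTasks(1);
    }
}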
Map/Reduce tasks (2)
▪Redundant Execution
– It is possible to configure redundant execution, i.e. two or more Map tasks are started for the same split (Hadoop calls this speculative execution; backup tasks are normally launched only for tasks that appear to be running slowly)
• The first Map task for a split that finishes wins; the others are killed
• In clusters with large numbers of cheap machines this may increase performance
• In clusters with fewer nodes or higher-quality hardware it can decrease overall performance
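A minimal sketch of how speculative execution can be switched on or off per job (the property names are the real Hadoop 2+ ones; the surrounding job setup is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculativeConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Launch backup copies of straggling Map tasks
        // (useful on large clusters of cheap machines).
        conf.setBoolean("mapreduce.map.speculative", true);

        // Disable backups for Reduce tasks, e.g. on a small,
        // reliable cluster where duplicates waste resources.
        conf.setBoolean("mapreduce.reduce.speculative", false);

        Job job = Job.getInstance(conf, "speculation-demo");
    }
}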
Splits
• Input files are stored in HDFS blocks (128 MB by default)
• MapReduce divides data into fragments or splits.
◦ One map task is executed on each split
• Most files have records with defined split points
◦ The most common is the end-of-line character
• The InputFormat class is responsible for taking an HDFS file and dividing it into splits (a configuration sketch follows this list)
◦ The aim is to process as much data as possible locally
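A hedged sketch of how split sizes can be tuned from the driver (setMinInputSplitSize and setMaxInputSplitSize are real FileInputFormat helpers; the input path and sizes are illustrative):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeConfig {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();

        FileInputFormat.addInputPath(job, new Path("/data/input")); // hypothetical path

        // Lower/upper bounds on split size, in bytes.
        // With both set to 128 MB, each split matches one HDFS block,
        // so each Map task can run on a node holding that block.
        FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024);
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
    }
}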
Classes
There are three main classes involved in reading data in MapReduce:
• InputFormat, which divides each input file into splits and creates a RecordReader for each split
◦ Splits normally correspond to the block size, but this also depends on the number of requested Map tasks, etc.
• InputSplit, which describes the chunk of data processed by a single Map task
• RecordReader, which takes a split and reads it as records, transforming each record into a <key, value> pair that is then forwarded to the Map task
◦ For example, one record per line (LineRecordReader)
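To make the division of labour concrete, here is a minimal driver sketch wiring these classes together (TextInputFormat and LineRecordReader are the standard Hadoop classes; the mapper logic and input path are illustrative):

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class ReadPathDemo {

    // TextInputFormat's LineRecordReader delivers one record per line:
    // the key is the line's byte offset in the file, the value its text.
    public static class LineMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(line, offset); // illustrative: emit <line, offset>
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJarByClass(ReadPathDemo.class);

        // TextInputFormat computes the splits (getSplits) and creates
        // a LineRecordReader for each split (createRecordReader).
        job.setInputFormatClass(TextInputFormat.class);
        job.setMapperClass(LineMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(job, new Path("/data/input")); // hypothetical path
    }
}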