Bda - Unit 2
OUTLINE:
• History of Hadoop
• Hadoop Environment
• Security in Hadoop
• Administering Hadoop
Problem 1: Data is too big to store on one machine.
HDFS: Store the data on multiple machines!
Problem 2: Very high-end machines are too expensive!
HDFS: Run on commodity hardware!
Problem 3: Commodity hardware will fail!
HDFS: Software is intelligent enough to handle hardware failure!
Problem 4: What happens to the data if the machine that stores the data fails?
HDFS: Replicate the data!
2. Using the output of Map, the Hadoop architecture applies sort
and shuffle. Sort and shuffle acts on the list of <key, value>
pairs and sends out each unique key together with the list of
values associated with it: <key, list(values)>.
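The grouping performed by sort and shuffle can be sketched in a few lines of Python. This is only an in-memory illustration, not Hadoop's implementation (which works across machines and spills to disk); the function name shuffle_and_sort and the sample pairs are assumptions made for the example.

```python
from collections import defaultdict


def shuffle_and_sort(map_output):
    """Group <key, value> pairs into <key, list(values)>, ordered by key.

    Hypothetical helper: a single-process stand-in for the shuffle
    and sort step that Hadoop runs between Map and Reduce.
    """
    grouped = defaultdict(list)
    for key, value in map_output:
        grouped[key].append(value)  # collect every value seen for this key
    # Sort by key only, so each unique key appears exactly once, in order.
    return sorted(grouped.items(), key=lambda item: item[0])


if __name__ == "__main__":
    pairs = [("hdfs", 1), ("hadoop", 1), ("hdfs", 1)]
    for key, values in shuffle_and_sort(pairs):
        print(key, values)
```

Each reducer would then receive one unique key at a time along with all the values collected for it.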
2. Map function
• The map function processes the incoming key-value pairs
and generates the corresponding output key-value pairs.
• The map input and output types may be different from each
other.
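As an illustration of the points above, here is a minimal word-count style map function in Python. It is a sketch under stated assumptions, not Hadoop's Mapper API: the name map_function and the choice of (byte offset, line of text) as the input pair are assumed for the example. Note that the input types (int, str) differ from the output types (str, int).

```python
def map_function(offset, line):
    """Turn one input pair (offset, line) into output pairs (word, 1).

    Hypothetical example: 'offset' is the position of the line in the
    input file and is ignored here; only the text is processed.
    """
    for word in line.split():
        yield (word.lower(), 1)  # one output pair per word occurrence


if __name__ == "__main__":
    for pair in map_function(0, "Hadoop stores Hadoop data"):
        print(pair)
```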
3. Partition function
• The partition function assigns the output of each Map
function to the appropriate reducer.
• The function is given the key and the value as input.
• It returns the index of the reducer that should receive the pair.
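A hash-based partition function is one common way to do this: hash the key, then take the result modulo the number of reducers. The sketch below follows that idea in Python. The name partition and the use of zlib.crc32 as the hash are assumptions for the example (Python's built-in hash() for strings changes between runs, so a stable checksum is used instead); this is not Hadoop's own code.

```python
import zlib


def partition(key, value, num_reducers):
    """Return the index of the reducer that should receive (key, value).

    Hypothetical sketch of hash partitioning. The value is accepted
    because the partition function is given both key and value, but
    only the key decides the reducer here.
    """
    digest = zlib.crc32(str(key).encode("utf-8"))  # stable, non-negative hash
    return digest % num_reducers  # always in the range 0 .. num_reducers - 1


if __name__ == "__main__":
    for word in ["hadoop", "hdfs", "mapreduce"]:
        print(word, "-> reducer", partition(word, 1, 4))
```

Because the index depends only on the key, every pair with the same key reaches the same reducer, which is what lets that reducer see the complete list of values for the key.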
4. Output writer
• Once the data has flowed through all the above phases, the
Output writer executes.
• The role of the Output writer is to write the Reduce output
to stable storage.
MapReduce Architecture
Job Tracker
• Job Tracker is the service to which client applications submit
MapReduce programs (jobs).
MapReduce Monitoring
• A MapReduce application is a collection of jobs (Map job,
Combiner, Partitioner, and Reduce job).
• It is mandatory to monitor and maintain the following:
• The configuration of the datanodes suitable for the
application.
• The number of datanodes and resources used per
application.
Maintenance
• Hadoop Admin Roles and Responsibilities include setting
up Hadoop clusters.