Business Intelligence & Big Data Analytics-CSE3124Y: Map Reduce (Part 2)

The document discusses how MapReduce jobs are split and executed in Hadoop. It explains that splits are created from HDFS blocks by the InputSplitter class, with one map task run per split. The number of map and reduce tasks can be configured. Map tasks attempt to run splits locally, but may be moved to other nodes if no local slots are available. The RecordReader and InputFormat classes then handle reading the split data and transforming it into key-value pairs for the map tasks.

Business Intelligence & Big Data Analytics - CSE3124Y
MAP REDUCE (PART 2)

LECTURE 6
Learning Outcomes
Recap
▪Elaborate on the functions of the JobTracker and TaskTracker.
▪Explain how MapReduce works.
Learning Objectives:
▪Describe how splitting is done in MapReduce.
▪Explain the main classes used in splitting and their main roles.
▪Detail how Hadoop runs a MapReduce job.
Map/Reduce tasks (1)

▪Local Execution
– Hadoop will attempt to run each Map task on a node that stores its split
– If no local Map slot is available, the Map task runs on another node and the split data is transferred to it over the network
▪Number of Map Tasks
– It is possible to configure the number of Map and Reduce tasks
– If a file is not splittable, there will be only a single Map task for it
▪Number of Reduce Tasks
– Normally there are fewer Reduce tasks than Map tasks
– Reduce output is written to HDFS, with the first replica stored on the local node
– If you need a single output file, use one Reduce task
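The local-execution rule above can be illustrated with a small toy scheduler. This is only a sketch under stated assumptions: the function name `assign_splits`, the node names, and the slot counts are hypothetical, and a real Hadoop scheduler also weighs rack locality and resource requests, which are ignored here.

```python
from typing import Dict, List, Tuple


def assign_splits(
    split_locations: Dict[str, List[str]],
    free_slots: Dict[str, int],
) -> Dict[str, Tuple[str, bool]]:
    """Assign each split to a node, preferring nodes that hold its data.

    Returns {split: (node, is_local)}. Toy model, not Hadoop's scheduler.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment: Dict[str, Tuple[str, bool]] = {}
    for split, holders in split_locations.items():
        # First choice: a node that already stores the block.
        local = next((n for n in holders if slots.get(n, 0) > 0), None)
        if local is not None:
            slots[local] -= 1
            assignment[split] = (local, True)
            continue
        # Fallback: any node with a free slot; the data is read remotely.
        remote = next((n for n, free in slots.items() if free > 0), None)
        if remote is None:
            raise RuntimeError(f"no free Map slot for {split}")
        slots[remote] -= 1
        assignment[split] = (remote, False)
    return assignment


# Hypothetical cluster: three nodes with one free Map slot each.
locations = {
    "split-0": ["node-a", "node-b"],
    "split-1": ["node-a"],
    "split-2": ["node-a"],
}
result = assign_splits(locations, {"node-a": 1, "node-b": 1, "node-c": 1})
for split, (node, is_local) in result.items():
    print(split, node, "local" if is_local else "remote read")
```

In this example only split-0 runs locally; split-1 and split-2 find node-a's slot taken and fall back to nodes that must fetch the data over the network. The greedy order is deliberately simple and is not guaranteed to maximise the number of local tasks.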
Map/Reduce tasks (2)
▪Redundant Execution
– It is possible to configure redundant execution (Hadoop calls this speculative execution), i.e. two or more Map tasks are started for each split
• The first Map task for a split that finishes wins; the remaining attempts are discarded.
• In systems with large numbers of cheap machines this may increase performance.
• In systems with a smaller number of nodes or high-quality hardware it can decrease overall performance.
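The "first attempt wins" rule can be made concrete with a small calculation. This is a minimal sketch, assuming all attempts for a split start at the same time and that free slots are unlimited; the durations and function names are hypothetical.

```python
from typing import Dict, List


def split_finish_time(attempt_durations: List[float]) -> float:
    """With redundant execution the first attempt to finish wins."""
    return min(attempt_durations)


def map_phase_time(attempts: Dict[str, List[float]]) -> float:
    """The map phase ends when the slowest split has a finished attempt."""
    return max(split_finish_time(d) for d in attempts.values())


# Hypothetical durations in seconds; the 95 s attempt models a slow machine.
single = {"split-0": [10.0], "split-1": [95.0], "split-2": [12.0]}
redundant = {
    "split-0": [10.0, 11.0],
    "split-1": [95.0, 13.0],
    "split-2": [12.0, 12.5],
}

print(map_phase_time(single))     # 95.0
print(map_phase_time(redundant))  # 13.0
```

The model shows only the benefit. On a small cluster the duplicate attempts compete with other tasks for the same slots, which this sketch does not capture, and that is why redundant execution can lower overall performance there.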
Splits
• Files in HDFS are stored in blocks (128 MB by default)
• MapReduce divides data into fragments or splits.
◦ One map task is executed on each split
• Most files have records with defined split points
◦ The most common is the end-of-line character
• The InputSplitter class is responsible for taking an HDFS file and transforming it into splits.
◦ The aim is to process as much data as possible locally
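How a file is cut into splits can be sketched as simple arithmetic on byte offsets. This is an illustration only, assuming one split per block-sized range; `compute_splits` and the 300 MB file size are hypothetical, and this is not Hadoop's own split calculation.

```python
from typing import List, Tuple

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size quoted above


def compute_splits(
    file_size: int,
    split_size: int = BLOCK_SIZE,
    splittable: bool = True,
) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs covering a file of file_size bytes."""
    if file_size == 0:
        return []
    if not splittable:
        # A non-splittable file becomes one split, hence one Map task.
        return [(0, file_size)]
    splits = []
    offset = 0
    while offset < file_size:
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits


size = 300 * 1024 * 1024  # hypothetical 300 MB file
for offset, length in compute_splits(size):
    print(offset, length)
print(compute_splits(size, splittable=False))
```

For the 300 MB file this yields three splits (128 MB, 128 MB and 44 MB), so three Map tasks, while the non-splittable case yields a single split covering the whole file. The offsets are purely byte positions; aligning them with record boundaries is the job of the record-reading step.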
Classes
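There are three main classes involved in reading data in MapReduce:
• InputSplitter, which divides a file into splits (in the Hadoop API, splits are InputSplit objects computed by InputFormat.getSplits())
◦ A split is normally the size of one block, but this depends on the number of requested Map tasks, etc.
• RecordReader, which takes a split and reads its contents into records
◦ For example, one record per line (LineRecordReader)
• InputFormat, which determines how each record is presented as a <key, value> pair that is then forwarded to the Map task (in the Hadoop API it supplies the RecordReader that produces these pairs)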
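The record-reading step can be sketched as follows. Splits are byte ranges, so a line may straddle two of them; the convention assumed here is that a reader skips the first line of any split that does not start at offset 0 and reads past its own end to finish its last line. The function `read_records` is hypothetical and only mimics the idea behind a line-based record reader; the (byte offset, line) pair shape matches what Hadoop's TextInputFormat passes to a Map task.

```python
from typing import Iterator, List, Tuple


def read_records(data: bytes, start: int, length: int) -> Iterator[Tuple[int, str]]:
    """Yield (byte offset, line) pairs for the split [start, start + length)."""
    end = start + length
    pos = start
    if start != 0:
        # Discard the first line: the previous split's reader owns it.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    # Read every line that begins at or before `end`, even if it
    # finishes inside the next split.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        yield pos, data[pos:stop].decode("utf-8")
        pos = stop + 1


data = b"alpha\nbravo\ncharlie\ndelta\n"
split_size = 10  # tiny split size so that lines straddle split boundaries
splits = [(o, min(split_size, len(data) - o)) for o in range(0, len(data), split_size)]

records: List[Tuple[int, str]] = []
for start, length in splits:
    for key, value in read_records(data, start, length):
        print(f"split@{start}: <{key}, {value}>")
        records.append((key, value))
```

Each line is emitted exactly once even though "bravo" and "charlie" cross split boundaries; the third split produces no records because its only line was already read by the second split's reader.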
