Big Data Syllabus for Theory and Lab


19ECS442: BIG DATA  

The course largely involves collecting data from different sources, managing it so that it becomes available for consumption by analysts, and finally delivering data products useful to the organization's business. The process of converting large amounts of unstructured raw data, retrieved from different sources, into a data product useful to organizations forms the core of Big Data Analytics.
Course objectives: 
1. To introduce an in-depth understanding of the concepts related to Big Data and its uses.
2. To provide insight into the underlying technologies for handling Big Data and the Hadoop ecosystem.
3. To explore the layers of the Big Data stack and YARN functionality.
4. To understand the architecture, benefits, and properties of Hive and Pig.
5. To provide learners with deep and systematic knowledge of Spark.
 
Module I: Getting an Overview of Big Data

Big Data definition, History of Data Management, Structuring Big Data, Elements of Big Data, Big Data Analytics.
 
Exploring the Use of Big Data in a Business Context: Use of Big Data in Social Networking, Use of Big Data in Preventing Fraudulent Activities in the Insurance Sector and in the Retail Industry.

Learning Outcomes:  
After completion of this unit, the student will be able to: 
 
1. Learn various sources of data and forms of data generation. (L2) 
2. Understand the evolution and elements of Big Data. (L2) 
3. Explore different opportunities available in the career path. (L3) 
4. Understand the role and importance of Big Data in various domains. (L2) 
 
Module II: Handling Big Data Number of hours (LTP) 6 0 6

Distributed and parallel computing for Big Data, Introducing Hadoop, Cloud computing and
Big Data, In-memory Computing Technology for Big Data. 

Understanding the Hadoop Ecosystem: Hadoop Ecosystem, Hadoop Distributed File System, MapReduce, Hadoop YARN, Introducing HBase, Combining HBase and HDFS, Hive, Pig and Pig Latin, Sqoop, ZooKeeper, Flume, Oozie.
 
Learning Outcomes:  
After completion of this unit, the student will be able to: 
 
1. Identify the difference between distributed and parallel computing. (L3) 
2. Learn the importance of Virtualization in Big Data. (L2) 
3. Learn the details of Hadoop and Cloud Computing. (L2) 
4. Learn the architecture and features of HDFS. (L2) 
 
Module III: Understanding Big Data Technology Foundations Number of hours (LTP) 6 0 6

The MapReduce Framework, Techniques to Optimize MapReduce Jobs, Uses of MapReduce, Role of HBase in Big Data Processing.
Exploring the Big Data Stack, Virtualization and Big Data, Virtualization approaches. 

Learning Outcomes:  
After completion of this unit, the student will be able to: 
1. Understand Hadoop Ecosystem, MapReduce and HBase. (L2) 
2. Apply the technique in optimizing MapReduce jobs. (L3) 
3. Explore the layers of Big Data Stack. (L2) 
4. Learn virtualization approaches in handling Big Data operations. (L2) 
 
Module IV: HIVE and PIG Number of hours (LTP) 6 0 6 

Exploring Hive: Introducing Hive, Getting Started with Hive, Hive Services, Data Types, Built-in Functions, Hive DDL, Data Manipulation, Data Retrieval Queries, Using Joins.
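
As a taste of the Hive workflow this unit covers, here is a minimal sketch of HiveQL DDL and query statements, issued through PySpark's spark.sql interface (one of several front ends; the Hive CLI or Beeline work equally well). The table and column names are illustrative, not part of the syllabus.

# A minimal Hive sketch run through PySpark; assumes Spark built with Hive support.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-sketch").enableHiveSupport().getOrCreate()

# DDL: create a managed table (illustrative schema).
spark.sql("CREATE TABLE IF NOT EXISTS employee (id INT, name STRING, salary INT)")

# Data manipulation: insert a few rows.
spark.sql("INSERT INTO employee VALUES (1, 'Asha', 52000), (2, 'Ravi', 61000)")

# Data retrieval: a simple query with a predicate.
spark.sql("SELECT name, salary FROM employee WHERE salary > 55000").show()

spark.stop()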

Analysing Data with Pig: Introducing Pig, Running Pig, Getting Started with Pig Latin, Working with Operators in Pig, Debugging Pig, Working with Functions in Pig, Error Handling in Pig.
 
Learning Outcomes:  
After completion of this unit, the student will be able to: 
1. Learn the working of Hive and query execution. (L2) 
2. Learn the importance of Pig. (L2) 
3. Choose appropriate operators in Pig. (L2)

Module V: SPARK Number of hours (LTP) 6 0 6 


Introduction, Spark Jobs and APIs, Spark 2.0 Architecture, Resilient Distributed Datasets: Internal Working, Creating RDDs, Transformations, Actions. DataFrames: Python to RDD Communications, Speeding up PySpark with DataFrames, Creating DataFrames and Simple DataFrame Queries, Interoperating with RDDs, Querying with DataFrames.
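
The core ideas of this module fit in a few lines of PySpark. The following hedged sketch (names and data are illustrative) shows an RDD with a lazy transformation and an action, then a DataFrame, whose schema lets Spark's optimizer close much of the Python-to-JVM efficiency gap:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sketch").getOrCreate()
sc = spark.sparkContext

# RDD: transformations are lazy; actions trigger the actual job.
rdd = sc.parallelize([1, 2, 3, 4, 5])
squares = rdd.map(lambda x: x * x)          # transformation (lazy)
print(squares.reduce(lambda a, b: a + b))   # action -> prints 55

# DataFrame: schema-aware queries.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
df.filter(df.id > 1).show()

# Interoperating with RDDs: a DataFrame exposes its rows as an RDD.
print(df.rdd.take(1))

spark.stop()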

Learning Outcomes:  
After completion of this unit, the student will be able to: 
 
1. Get an overview of Spark technology and the organization of Spark jobs. (L2)
2. Understand the schema-less data structure (the RDD) available in PySpark. (L3)
3. Get an overview of DataFrames, which bridge the efficiency gap between Scala and Python. (L2)
4. Handle a real-time Big Data application. (L4)
 
Textbook(s)
1. DT Editorial Services, Big Data (Black Book), Dreamtech Press, 2016.
2. Tomasz Drabas and Denny Lee, Learning PySpark, Packt Publishing, 2017.
3. Tom White, Hadoop: The Definitive Guide, 4/e, O'Reilly, 2015.

Reference Book(s) 
1. Bill Franks, Taming the Big Data Tidal Wave, 1/e, Wiley, 2012.
2. Frank J. Ohlhorst, Big Data Analytics, 1/e, Wiley, 2012.

Course Outcomes:
1. Demonstrate Big Data concepts for real-world data analysis. (L1)
2. Develop MapReduce concepts. (L2)
3. Learn how Pig Latin is used for programming in Hadoop. (L3)
4. Illustrate the Hadoop API for the MapReduce framework. (L4)
5. Develop basic programs for the MapReduce framework, particularly driver code, mapper code, and reducer code. (L5)
6. Learn Apache Spark fundamentals: RDDs and DataFrames. (L6)

Lab Experiments for Big Data


1. Installation of Hadoop Cluster:
a. Standalone Mode,  b. Pseudo-Distributed Mode,  c. Fully Distributed Mode
2. Perform file management tasks in Hadoop (a hedged Python sketch follows the sub-tasks):
a. Creating directory 
b. List the contents of a directory 
c. Upload and download a file 
d. See contents of a file 
e. Copy a file from source to destination 
f. Move file from source to destination. 
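
These tasks are normally run with the hdfs dfs shell; the following hedged Python wrapper simply invokes it via subprocess, so each flag maps one-to-one onto a sub-task above. Paths are illustrative and assume a running HDFS with a writable /user/student directory.

import subprocess

def hdfs(*args):
    # Run an `hdfs dfs` subcommand and raise if it fails.
    subprocess.run(["hdfs", "dfs", *args], check=True)

hdfs("-mkdir", "-p", "/user/student/demo")                    # a. create a directory
hdfs("-ls", "/user/student")                                  # b. list directory contents
hdfs("-put", "local.txt", "/user/student/demo/")              # c. upload a file
hdfs("-get", "/user/student/demo/local.txt", "download.txt")  # c. download a file
hdfs("-cat", "/user/student/demo/local.txt")                  # d. see file contents
hdfs("-cp", "/user/student/demo/local.txt", "/user/student/demo/copy.txt")  # e. copy
hdfs("-mv", "/user/student/demo/copy.txt", "/user/student/moved.txt")       # f. move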
3. MapReduce programming (a hedged Python sketch for part b follows):
a. Word count program using Java
b. Word count program using Python
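
For part b, a common route is Hadoop Streaming, which pipes HDFS data through stdin/stdout. A hedged sketch follows; file names and the streaming jar path vary by installation.

# mapper.py: emit (word, 1) for every word read from stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print(word + "\t1")

# reducer.py: sum counts; Streaming sorts by key, so equal words arrive contiguously.
import sys

current, count = None, 0
for line in sys.stdin:
    word, n = line.rsplit("\t", 1)
    if word != current:
        if current is not None:
            print(current + "\t" + str(count))
        current, count = word, 0
    count += int(n)
if current is not None:
    print(current + "\t" + str(count))

A typical (installation-dependent) invocation: hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper "python3 mapper.py" -reducer "python3 reducer.py" -input /input -output /output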
4. Databases, Tables, Views, Functions, and Indexes
5. Write a program to perform matrix multiplication in Hadoop with a matrix size of n×n, where n > 1000 (a hedged MapReduce sketch follows).
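
One well-known approach is the single-pass MapReduce matrix multiply, sketched here for Hadoop Streaming under the assumption that input lines look like "A,i,k,value" or "B,k,j,value" and that the dimension N is known up front; both are assumptions, not requirements of the exercise.

# mm_mapper.py: route each element to every output cell that needs it.
import sys

N = 1024  # assumed matrix dimension (n > 1000 per the exercise)

for line in sys.stdin:
    m, r, c, v = line.strip().split(",")
    if m == "A":                  # A[i][k] contributes to every cell (i, j)
        for j in range(N):
            print(f"{r},{j}\tA,{c},{v}")
    else:                         # B[k][j] contributes to every cell (i, j)
        for i in range(N):
            print(f"{i},{c}\tB,{r},{v}")

# mm_reducer.py: for each cell (i, j), sum A[i][k] * B[k][j] over k.
import sys

def emit(cell, a, b):
    print(cell + "\t" + str(sum(a[k] * b[k] for k in a if k in b)))

current, a, b = None, {}, {}
for line in sys.stdin:
    cell, val = line.strip().split("\t")
    if cell != current:
        if current is not None:
            emit(current, a, b)
        current, a, b = cell, {}, {}
    tag, k, v = val.split(",")
    (a if tag == "A" else b)[k] = float(v)
if current is not None:
    emit(current, a, b)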
7. Given the following table schemas:
Employee_table {ID: INT, Name: VARCHAR(10), Age: INT, Salary: INT}
Loan_table {LoanID: INT, ID: INT, Loan_applied: BOOLEAN, Loan_amt: INT}
a. Create a database and the above tables in Hive.
b. Insert records into the tables.
c. Write an SQL query to retrieve the details of employees who have applied for a loan.
8. Write a query to create a table that stores the records of employees working in the same department together in the same HDFS sub-directory (a hedged sketch follows). The schema for the table is given below: Emp_table {id, name, dept, yoj}
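
The grouping described here is exactly what Hive partitioning provides: each distinct value of the partition column becomes its own HDFS sub-directory. A hedged sketch of the DDL, again issued through spark.sql (the Hive shell works just as well); the sample row is illustrative.

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Partitioning by dept stores each department's rows under .../dept=<value>/
spark.sql("""
    CREATE TABLE IF NOT EXISTS emp_table (id INT, name STRING, yoj INT)
    PARTITIONED BY (dept STRING)
""")
spark.sql("INSERT INTO emp_table PARTITION (dept='sales') VALUES (1, 'Asha', 2019)")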
9. Given the following table schemas:

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+

+-----+---------------------+-------------+--------+
| OID | DATE                | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+

Create these tables in Hive and insert transaction records into them. Write an SQL query to find the details of customers who have placed an order.
10. Understanding Spark
 
