Big Data Syllabus For Theory and Lab
The course covers collecting data from different sources, managing it so that it is
available to analysts, and finally delivering data products that are useful to the
organization's business. Converting large amounts of unstructured raw data, retrieved
from different sources, into a data product useful to organizations forms the core of
Big Data Analytics.
Course objectives:
1. To introduce an in-depth understanding of the concepts related to Big Data and its
uses.
2. To provide insight into the underlying technologies for handling Big Data and the
Hadoop ecosystem.
3. To explore the layers of the Big Data stack and YARN functionality.
4. To understand the architecture, benefits, and properties of Hive and Pig.
5. To provide learners with deep and systematic knowledge of Spark.
Module I: Getting an Overview of Big Data
Big Data definition, History of Data Management, Structuring Big Data, Elements of
Big Data, Big Data Analytics.
Exploring the Use of Big Data in a Business Context: Use of Big Data in Social Networking,
Use of Big Data in Preventing Fraudulent Activities in the Insurance Sector and in the
Retail Industry.
Learning Outcomes:
After completion of this unit, the student will be able to:
1. Learn various sources of data and forms of data generation. (L2)
2. Understand the evolution and elements of Big Data. (L2)
3. Explore different opportunities available in the career path. (L3)
4. Understand the role and importance of Big Data in various domains. (L2)
Module II: Handling Big Data Number of hours (LTP) 6 0 6
Distributed and parallel computing for Big Data, Introducing Hadoop, Cloud computing and
Big Data, In-memory Computing Technology for Big Data.
The MapReduce Framework, Techniques to Optimize MapReduce Jobs, Uses of MapReduce,
Role of HBase in Big Data Processing.
Exploring the Big Data Stack, Virtualization and Big Data, Virtualization approaches.
Learning Outcomes:
After completion of this unit, the student will be able to:
1. Understand Hadoop Ecosystem, MapReduce and HBase. (L2)
2. Apply techniques to optimize MapReduce jobs. (L3)
3. Explore the layers of Big Data Stack. (L2)
4. Learn virtualization approaches in handling Big Data operations. (L2)
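As an illustration of the MapReduce topics in this module, the following is a minimal sketch of the map, shuffle, and reduce phases for a word count, written in plain Python on in-memory input. It is not the Hadoop API; the function names (mapper, shuffle, reducer, run_job) and the sample lines are assumptions chosen for the example.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: sum the counts collected for one word."""
    return key, sum(values)


def run_job(lines):
    """Driver: wire the three phases together (hypothetical helper)."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Sample input is made up for illustration only.
    print(run_job(["big data big ideas", "data at scale"]))
```

In a real Hadoop job the shuffle is performed by the framework across nodes; here it is a local dictionary so that the data flow between mapper and reducer stays visible.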
Module IV: HIVE and PIG Number of hours (LTP) 6 0 6
Exploring Hive: Introducing Hive, Getting Started with Hive, Hive Services, Data Types,
Built-in Functions, Hive-DDL, Data Manipulation, Data Retrieval Queries, Using Joins.
Analysing Data with Pig: Introducing Pig, Running Pig, Getting Started with Pig Latin,
Working with Operators in Pig, Debugging Pig, Working with Functions in Pig, Error Handling
in Pig.
Learning Outcomes:
After completion of this unit, the student will be able to:
1. Learn the working of Hive and query execution. (L2)
2. Learn the importance of Pig. (L2)
3. Choose the operators in Pig. (L2)
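To make the Pig operator topics more concrete, the sketch below mimics the data flow of three common Pig Latin relational operators (FILTER, GROUP, and FOREACH ... GENERATE) in plain Python. It is only an analogy, not Pig Latin and not a Pig API; the helper names and the sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample relation with fields (user, category, amount).
records = [
    ("ann", "books", 12.0),
    ("bob", "games", 30.0),
    ("ann", "games", 8.0),
    ("cid", "books", 5.0),
]


def filter_rows(rows, predicate):
    """Analogue of FILTER ... BY: keep rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def group_rows(rows, key):
    """Analogue of GROUP ... BY: collect rows into one bag per key."""
    bags = defaultdict(list)
    for row in rows:
        bags[key(row)].append(row)
    return dict(bags)


def foreach_generate(bags, fn):
    """Analogue of FOREACH ... GENERATE: derive one output tuple per group."""
    return [(key, fn(bag)) for key, bag in sorted(bags.items())]


# Keep purchases of at least 8.0, group them by user, then total per user.
large = filter_rows(records, lambda r: r[2] >= 8.0)
by_user = group_rows(large, key=lambda r: r[0])
totals = foreach_generate(by_user, lambda bag: sum(r[2] for r in bag))
```

Each step consumes one relation and produces another, which is the same pipeline style a Pig Latin script follows when it chains operators.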
Learning Outcomes:
After completion of this unit, the student will be able to:
1. Get an overview of Spark technology and the concept of job organization. (L2)
2. Understand the schema-less data structure available in PySpark. (L3)
3. Get an overview of DataFrames, which bridge the gap between Scala and Python in
terms of efficiency. (L2)
4. Handle a real-time Big Data application. (L4)
Textbook(s)
1. Big Data Black Book by Dt Editorial Services, Dreamtech Publications, 2016.
2. Learning PySpark by Tomasz Drabas, Denny Lee, Packt publishing, 2017.
3. Tom White, "Hadoop: The Definitive Guide", 3/e, 4/e, O'Reilly, 2015.
Reference Book(s)
1. Bill Franks, Taming the Big Data Tidal Wave, 1/e, Wiley, 2012.
2. Frank J. Ohlhorst, Big Data Analytics, 1/e, Wiley, 2012.
Course Outcomes:
1. Demonstrate Big Data concepts for real-world data analysis. (L1)
2. Develop MapReduce concepts. (L2)
3. Learn how Pig Latin is used for programming in Hadoop. (L3)
4. Illustrate the Hadoop API for the MapReduce framework. (L4)
5. Develop basic programs for the MapReduce framework, particularly driver code, mapper
code, and reducer code. (L5)
6. Learn Apache Spark fundamentals, RDD, and DataFrame. (L6)