

Unit - I

Data Analytics (KIT 601)

SWASTI SINGHAL
Department of CSIT
KIET, Ghaziabad

DA KIT 601 By Swasti Singhal


05/25/2024
EVOLUTION OF ANALYTIC SCALABILITY
• The amount of data organizations process continues to increase.
 Old methods for handling data no longer work efficiently.
• Important technologies for handling big data include:
 MPP (Massively Parallel Processing)
 The cloud
 Grid computing
 MapReduce
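MapReduce, for example, splits work into a map step that emits key-value pairs and a reduce step that aggregates them. A minimal single-process word-count sketch (illustrative only, no framework assumed):

```python
from collections import defaultdict

def map_phase(document):
    # Map step: emit a (word, 1) pair for each word in the document
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    # Reduce step: sum the counts emitted for each word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big analytics", "data analytics"]
pairs = [p for d in docs for p in map_phase(d)]
word_counts = reduce_phase(pairs)  # {"big": 2, "data": 2, "analytics": 2}
```

In a real cluster the map and reduce calls run in parallel across many nodes, which is what makes the pattern scale.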
MODERN DATABASE ARCHITECTURE
Massively Parallel Processing
What is cloud computing ?
Grid Computing
Map Reduce
Working process
Good & Bad
Technologies can integrate and work together
Evolution of Analytical Processes
Definition of an Analytical Framework
An Internal Configuration
An External Configuration
A Hybrid Configuration
Benefits
Definition of ADS
The data that is pulled together in order to create an analysis or model

• In the format required for the specific analysis at hand

• Generated by transforming, aggregating, and combining data

• Helps bridge the gap between efficient storage and ease of use
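The bullets above can be made concrete: an ADS is typically built by transforming and aggregating raw records into one analysis-ready row per entity. A minimal sketch using hypothetical transaction data:

```python
# Hypothetical raw transactions; the ADS aggregates them per customer
transactions = [
    {"customer": "A", "amount": 120.0},
    {"customer": "A", "amount": 80.0},
    {"customer": "B", "amount": 50.0},
]

def build_ads(rows):
    # Combine and aggregate raw rows into one summary record per customer
    ads = {}
    for r in rows:
        rec = ads.setdefault(r["customer"], {"total": 0.0, "n": 0})
        rec["total"] += r["amount"]
        rec["n"] += 1
    # Derive fields in the format the analysis at hand requires
    for rec in ads.values():
        rec["avg"] = rec["total"] / rec["n"]
    return ads

ads = build_ads(transactions)  # one analysis-ready row per customer
```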
Two Primary Kinds of Analytic Data Sets
Traditional Analytic Data Sets
Enterprise Analytic Data Set
EDA Set - Structure
Summary Table or View?
Embedded Scoring
Model and Score Management
• Model and score management procedures will need to be in place to
scale the use of models by an organization.
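One way such procedures are often implemented is a registry that tracks model versions and keeps an audit trail of the scores produced. A minimal sketch (the names, structure, and churn rule are illustrative assumptions):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class ModelRegistry:
    # Versioned models, keyed "name:vN", plus an audit trail of scores
    models: Dict[str, Callable] = field(default_factory=dict)
    scores: List[Tuple[str, float]] = field(default_factory=list)

    def register(self, name: str, version: int, fn: Callable) -> None:
        self.models[f"{name}:v{version}"] = fn

    def score(self, name: str, version: int, record: dict) -> float:
        key = f"{name}:v{version}"
        result = self.models[key](record)
        self.scores.append((key, result))  # record which model produced which score
        return result

registry = ModelRegistry()
# Hypothetical churn model: flag customers with many support calls
registry.register("churn", 1, lambda r: 1.0 if r["calls"] > 5 else 0.0)
score = registry.score("churn", 1, {"calls": 7})
```

Keeping the version in the key means old and new models can be scored side by side while the organization migrates.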
REPORTING Vs ANALYSIS
• Reporting: The process of organizing data into informational
summaries in order to monitor how different areas of a business are
performing.

• Users select the reports they want to run


• Get the reports executed
• View Results
• Analysis: The process of exploring data and reports in order to
extract meaningful insights, which can be used to better
understand and improve business performance.

• Identify the problem
• Find the data required
• Analyze the data
• Interpret the results
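The contrast can be illustrated in code: a report runs the same fixed summary every time, while an analysis poses an ad hoc question against the same data (toy data and hypothetical names):

```python
sales = [{"region": "N", "rev": 100}, {"region": "S", "rev": 60}, {"region": "N", "rev": 40}]

def monthly_report(rows):
    # Reporting: the same repeatable summary on every run (revenue per region)
    out = {}
    for r in rows:
        out[r["region"]] = out.get(r["region"], 0) + r["rev"]
    return out

def largest_north_sale(rows):
    # Analysis: an ad hoc question prompted by the report,
    # e.g. "is one large sale driving the North total?"
    return max(r["rev"] for r in rows if r["region"] == "N")

report = monthly_report(sales)          # {"N": 140, "S": 60}
insight = largest_north_sale(sales)     # 100
```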
Difference
Making Inference
• To produce a great analysis, it is necessary to infer potential actions
• Make initial inferences based on the analysis
• Visualization plays a vital role in understanding
• An effective visualization can surface many more inferences
• Today's visualization tools support multiple tabs and link graphs and charts together
• A newer idea for visualization is 3-D
Applications
• Open source software has been around for some time
• In many cases, open source products are outside the mainstream
• Many individuals contribute to improving the functionality
• Bugs can be patched quickly
Data Analytics Lifecycle
• Big Data analysis differs from traditional data analysis primarily
due to the volume, velocity, and variety characteristics of the
data being processed.
• To address the distinct requirements for performing analysis on Big
Data, a step-by-step methodology is needed to organize the activities
and tasks involved with acquiring, processing, analyzing and
repurposing data.
Key Roles for a Successful Analytics
Project
• Business User – understands the domain area
• Project Sponsor – provides requirements
• Project Manager – ensures meeting objectives
• Business Intelligence Analyst – provides business
domain expertise based on deep understanding of the
data
• Database Administrator (DBA) – creates DB
environment
• Data Engineer – provides technical skills, assists data
management and extraction, supports analytic sandbox
• Data Scientist – provides analytic techniques and
modeling
Data Analytics Lifecycle (cont.)
• The data analytic lifecycle is designed for Big Data problems and data
science projects
• The cycle is iterative to represent a real project
• Work can return to earlier phases as new information is uncovered
Data Analytics Lifecycle-Abstract View
Discovery
• In this phase, the data science team must learn and investigate the
problem, develop context and understanding, and learn about the data
sources needed and available for the project.
• In addition, the team formulates initial hypotheses that can later be
tested with data.
• The team should perform five main activities during the discovery
phase:
• Identify data sources: Make a list of data sources the team
may need to test the initial hypotheses outlined in this phase.
Make an inventory of the datasets currently available
and those that can be purchased or otherwise acquired
for the tests the team wants to perform.
• Capture aggregate data sources: This is for previewing the
data and providing high-level understanding.
It enables the team to gain a quick overview of the data
and perform further exploration on specific areas.
• Review the raw data: Begin understanding the
interdependencies among the data attributes.
Become familiar with the content of the data, its quality,
and its limitations
• Evaluate the data structures and tools needed: The data type
and structure dictate which tools the team can use to analyze
the data.
• Scope the sort of data infrastructure needed for this type of
problem: In addition to the tools needed, the data influences
the kind of infrastructure that's required, such as disk storage
and network capacity.
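The first two activities, inventorying data sources and previewing them in aggregate, can be sketched with a hypothetical inventory:

```python
# Hypothetical inventory of candidate data sources for the discovery phase
inventory = [
    {"name": "web_logs",     "available": True,  "rows": 1_200_000},
    {"name": "crm_export",   "available": True,  "rows": 45_000},
    {"name": "panel_survey", "available": False, "rows": None},  # would need to be purchased
]

# Aggregate preview: which sources are usable now, and how much data do they hold?
usable = [s for s in inventory if s["available"]]
total_rows = sum(s["rows"] for s in usable)
```

An aggregate view like this gives the team the quick overview the slide describes before committing to deeper exploration of specific sources.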
• Unlike many traditional stage-gate processes, in which the team
can advance only when specific criteria are met, the Data
Analytics Lifecycle is intended to accommodate more
ambiguity.
• For each phase of the process, it is recommended to pass
certain checkpoints as a way of gauging whether the team is
ready to move to the next phase of the Data Analytics
Lifecycle.
Data Preparation
• This phase includes steps to explore, preprocess, and condition data
prior to modeling and analysis.
• It requires the presence of an analytic sandbox (workspace), in
which the team can work with data and perform analytics for the
duration of the project.
 The team needs to execute Extract, Load, and Transform
(ELT) or Extract, Transform, and Load (ETL) to get data
into the sandbox.
 In ETL, users extract data from a datastore, perform data
transformations, and load the data back into the datastore.
 ELT and ETL are sometimes combined and abbreviated as ETLT. Data
should be transformed in the ETLT process so the team can work
with it and analyze it.
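A minimal ETL sketch of the steps above, assuming a simple in-memory source and a list standing in for the analytic sandbox:

```python
# Hypothetical source datastore and an empty sandbox target
source = [{"id": 1, "price": "10.5"}, {"id": 2, "price": "3.0"}]
sandbox = []

def extract(store):
    # Extract: pull raw rows out of the source datastore
    return list(store)

def transform(rows):
    # Transform: cast price to a number and add a derived field
    return [
        {**r, "price": float(r["price"]), "taxed": float(r["price"]) * 1.2}
        for r in rows
    ]

def load(rows, target):
    # Load: write the conditioned rows into the sandbox
    target.extend(rows)

load(transform(extract(source)), sandbox)
```

In ELT the transform step would instead run after loading, inside the sandbox, which is why the order matters for where the compute happens.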
Data Preparation (cont.)
Common Tools for the Data Preparation
Phase
Model Planning
Common Tools for the Model Planning
Phase
Model Building
Communicate Results
Operationalize
Common Tools for the Model Building
Phase
Key outputs for each of the main
stakeholders
