LS1.1 - V6 Generalized Architecture of Big Data Systems


Generalized Architecture of Big Data Systems

Pravin Y Pawar
Big data architecture style

• A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex
for traditional database systems.

Source : https://docs.microsoft.com/en-us/azure/architecture/guide/architecture-styles/big-data
Big Data Applications
Workloads
• Big data solutions typically involve one or more of the following types of workload:

 Batch processing of big data sources at rest
 Real-time processing of big data in motion
 Interactive exploration of big data
 Predictive analytics and machine learning
Big Data Systems Components
Components
• Most big data architectures include some or all of the following components (illustrative code sketches for several of these components follow this list):
 Data sources
All big data solutions start with one or more data sources, such as databases, files, and IoT devices.
 Data Storage
Data for batch processing operations is typically stored in a distributed file store that can hold high volumes of large files in various formats.  
 Batch processing
Because the data sets are so large, a big data solution must often process data files using long-running batch jobs to filter, aggregate, and
otherwise prepare the data for analysis. These jobs usually involve reading source files, processing them, and writing the output to new files (see the batch sketch after this list).
 Real-time message ingestion
 If the solution includes real-time sources, the architecture must include a way to capture and store real-time messages for stream
processing. 
 Stream processing
After capturing real-time messages, the solution must process them by filtering, aggregating, and otherwise preparing the data for analysis.
The processed stream data is then written to an output sink. 
 Analytical data store
Many big data solutions prepare data for analysis and then serve the processed data in a structured format that can be queried using
analytical tools. The analytical data store used to serve these queries can be a Kimball-style relational data warehouse, as seen in most
traditional business intelligence (BI) solutions.
 Analysis and reporting
The goal of most big data solutions is to provide insights into the data through analysis and reporting.  
 Orchestration
Most big data solutions consist of repeated data processing operations, encapsulated in workflows, that transform source data, move data
between multiple sources and sinks, load the processed data into an analytical data store, or push the results straight to a report or
dashboard. To automate these workflows, you can use an orchestration technology such as Azure Data Factory or Apache Oozie and Sqoop (see the orchestration sketch after this list).
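To make the batch processing component concrete, here is a minimal PySpark sketch. It is not part of the original material: the input path, the column names (event_type, customer_id, amount), and the output location are all hypothetical.

# Minimal batch-processing sketch (paths and columns are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-prep").getOrCreate()

# Read raw source files from the distributed file store.
raw = spark.read.option("header", True).csv("/data/raw/events/*.csv")

# Filter and aggregate to prepare the data for analysis.
prepared = (
    raw.filter(F.col("event_type") == "purchase")
       .groupBy("customer_id")
       .agg(F.sum("amount").alias("total_spent"))
)

# Write the output to new files for downstream analysis.
prepared.write.mode("overwrite").parquet("/data/prepared/purchases_by_customer")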
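For the real-time message ingestion component, a producer typically publishes events to a message broker. The following rough sketch uses the kafka-python client; the broker address, the "sensor-readings" topic name, and the message fields are assumptions, not part of the source.

# Real-time ingestion sketch using kafka-python (broker and topic are hypothetical).
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Each device reading is captured as one message for later stream processing.
producer.send("sensor-readings", {"device_id": "d-42", "temperature": 21.7})
producer.flush()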
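The stream processing component can then consume those messages, aggregate them, and write the results to an output sink. Below is a minimal sketch with Spark Structured Streaming, assuming the hypothetical "sensor-readings" topic above and a console sink in place of a real output store; the Spark Kafka connector package must be available to the cluster.

# Stream-processing sketch with Spark Structured Streaming (topic and schema are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("stream-prep").getOrCreate()

schema = (StructType()
          .add("device_id", StringType())
          .add("temperature", DoubleType()))

# Capture messages from the ingestion layer.
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "sensor-readings")
          .load())

# Parse the JSON payload and aggregate per device.
readings = stream.select(
    F.from_json(F.col("value").cast("string"), schema).alias("r")
).select("r.*")
avg_temp = readings.groupBy("device_id").agg(F.avg("temperature").alias("avg_temperature"))

# Write the processed stream to an output sink.
query = (avg_temp.writeStream
         .outputMode("complete")
         .format("console")   # a real solution would write to a durable sink
         .start())
query.awaitTermination()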
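Once prepared, data is served in a structured, queryable form. This sketch simply registers the batch output above as a Spark SQL view and queries it; a production solution might instead load the data into a Kimball-style relational data warehouse, as the slide notes. Table and column names carry over from the hypothetical batch sketch.

# Analytical-store sketch: expose prepared data for SQL queries (paths are hypothetical).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("analytics").getOrCreate()

spark.read.parquet("/data/prepared/purchases_by_customer") \
     .createOrReplaceTempView("purchases_by_customer")

# Analysis and reporting tools could issue queries like this one.
top_customers = spark.sql("""
    SELECT customer_id, total_spent
    FROM purchases_by_customer
    ORDER BY total_spent DESC
    LIMIT 10
""")
top_customers.show()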
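Finally, orchestration ties the repeated processing operations into a workflow. Real solutions would use Azure Data Factory or Apache Oozie as described above; the sketch below is only a hand-rolled illustration of the idea, and the script names are hypothetical.

# Orchestration sketch: run the workflow steps in order, stopping on failure.
import subprocess

workflow = [
    ["spark-submit", "batch_prep.py"],      # transform source data
    ["spark-submit", "load_warehouse.py"],  # load into the analytical data store
    ["python", "publish_report.py"],        # push results to a report or dashboard
]

for step in workflow:
    # A real orchestration engine would also handle scheduling, retries, and monitoring.
    subprocess.run(step, check=True)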
Big data architecture Usage
When to use this architecture
• Consider this architecture style when you need to:

 Store and process data in volumes too large for a traditional database
 Transform unstructured data for analysis and reporting
 Capture, process, and analyze unbounded streams of data in real time, or with low latency
Big data architecture Benefits
Advantages
• Technology choices
 A wide variety of technology options is available, both open source and from commercial vendors
• Performance through parallelism
 Big data solutions take advantage of parallelism, enabling high-performance solutions that scale to
large volumes of data.
• Elastic scale
 All of the components in the big data architecture support scale-out provisioning, so that you can adjust
your solution to small or large workloads, and pay only for the resources that you use.
• Interoperability with existing solutions
 The components of the big data architecture are also used for IoT processing and enterprise BI
solutions, enabling you to create an integrated solution across data workloads.
Big data architecture Challenges
Things to ponder upon
• Complexity
 Big data solutions can be extremely complex, with numerous components to handle data ingestion from
multiple data sources. It can be challenging to build, test, and troubleshoot big data processes.

• Skillset
 Many big data technologies are highly specialized and use frameworks and languages that are not typical of
more general application architectures. On the other hand, big data technologies are developing new APIs that
build on more established languages.

• Technology maturity
 Many of the technologies used in big data are evolving. While core Hadoop technologies such as Hive and
Pig have stabilized, emerging technologies such as Spark introduce extensive changes and enhancements
with each new release.
Thank You!
In our next session: Streaming Data Systems
