Kafka

Kafka is a publish-subscribe messaging system that allows data from many sources to be streamed in real time to a cluster for processing. Kafka servers store incoming messages from publishers in topics, and consumers subscribe to topics to receive the streaming data. Kafka scales horizontally by distributing its processes and storage across multiple servers. Consumers can also be distributed: messages are load balanced among the consumers within a group, while each consumer group receives its own copy of every message.

STREAMING WITH KAFKA
Publish/Subscribe Messaging with Kafka
What is streaming?

■ So far we’ve really just talked about processing historical, existing big data
– Sitting on HDFS
– Sitting in a database
■ But how does new data get into your cluster? Especially if it’s “Big data”?
– New log entries from your web servers
– New sensor data from your IoT system
– New stock trades
■ Streaming lets you publish this data, in real time, to your cluster.
– And you can even process it in real time as it comes in!
Two problems

■ How to get data from many different sources flowing into your cluster
■ Processing it when it gets there
■ First, let’s focus on the first problem
Enter Kafka

■ Kafka is a general-purpose publish/subscribe messaging system
■ Kafka servers store all incoming messages from publishers for some period of time, and publish them to a stream of data called a topic
■ Kafka consumers subscribe to one or more topics, and receive data as it’s published
■ A stream / topic can have many different consumers, each with its own position in the stream maintained
■ It’s not just for Hadoop
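The idea of one stored stream with a separate read position per consumer can be sketched in a few lines of Python. This is a toy in-memory model, not the Kafka client API: the `ToyTopic` class, its methods, and the consumer names are all hypothetical, and retention ("for some period of time") is left out for brevity.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a topic: an append-only log plus one read position per consumer."""

    def __init__(self):
        self.log = []                      # messages in publish order
        self.positions = defaultdict(int)  # consumer name -> index of the next message to read

    def publish(self, message):
        """Append a message to the end of the log."""
        self.log.append(message)

    def poll(self, consumer):
        """Return every message this consumer has not yet seen and advance its position."""
        start = self.positions[consumer]
        self.positions[consumer] = len(self.log)
        return self.log[start:]


topic = ToyTopic()
topic.publish("trade-1")
topic.publish("trade-2")

first_for_a = topic.poll("consumer_a")   # consumer_a reads both stored messages
topic.publish("trade-3")
second_for_a = topic.poll("consumer_a")  # consumer_a sees only the new message
first_for_b = topic.poll("consumer_b")   # consumer_b starts from the beginning
```

Because the messages are stored rather than handed off, `consumer_b` can join late and still read everything, while `consumer_a` continues from where it left off.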
Kafka architecture
[Diagram: a central Kafka Cluster surrounded by Producers (apps), Connectors (databases), Stream Processors (apps), and Consumers (apps)]
How Kafka scales

■ Kafka itself may be distributed among many processes on many servers
– Will distribute the storage of stream data as well
■ Consumers may also be distributed
– Consumers of the same group will have messages distributed amongst them
– Consumers of different groups will get their own copy of each message

Image: kafka.apache.org
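The two consumer-group rules above can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not how Kafka is implemented: the `deliver` function and the group and consumer names are hypothetical, and plain round-robin stands in for whatever mechanism the cluster actually uses to spread messages across a group.

```python
from itertools import cycle


def deliver(messages, groups):
    """Simulate delivery. `groups` maps a group name to its list of consumer names.

    Every group sees every message; within a group, each message goes to exactly
    one member (round-robin here). Consumer names are assumed unique across groups.
    """
    received = {name: [] for members in groups.values() for name in members}
    for members in groups.values():
        rotation = cycle(members)  # take turns among the members of this group
        for message in messages:
            received[next(rotation)].append(message)
    return received


groups = {"analytics": ["a1", "a2"], "archiver": ["b1"]}
received = deliver(["m0", "m1", "m2", "m3"], groups)
```

In this run the two `analytics` consumers split the four messages between them, while the single `archiver` consumer receives all four, which is the "own copy per group" behaviour described above.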
Let’s play

■ Start Kafka on our sandbox
■ Set up a topic
– Publish some data to it, and watch it get consumed
■ Set up a file connector
– Monitor a log file and publish additions to it
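What "monitor a log file and publish additions to it" amounts to can be sketched without a running broker. This is only an illustration of the idea, not Kafka's file connector: the `read_additions` helper, the `access.log` file name, and the `published` list standing in for a topic are all assumptions made for the example.

```python
import os
import tempfile


def read_additions(path, offset):
    """Return (new_lines, new_offset) for whatever was appended to `path` after byte `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode("utf-8").splitlines(), offset + len(data)


published = []  # stands in for the topic the lines would be published to

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "access.log")

    # The web server writes its first entry.
    with open(log_path, "w", encoding="utf-8") as f:
        f.write("GET /index.html\n")
    lines, offset = read_additions(log_path, 0)
    published.extend(lines)

    # Later, a new entry is appended; only the addition is picked up.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write("GET /about.html\n")
    lines, offset = read_additions(log_path, offset)
    published.extend(lines)
```

Remembering the byte offset between reads is what keeps already-published lines from being sent again. A real monitor would also need to handle partially written lines and log rotation, which this sketch ignores.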
