
Lecture24 PDF

This document discusses Apache Kafka and Flume for programming big data. It provides an overview of Kafka, describing it as a fast, scalable, fault-tolerant messaging system that enables communication between producers and consumers using message-based topics. It also discusses Kafka's architecture and use cases. The document then briefly introduces Apache Flume, describing its architecture and comparing it to Kafka.

Uploaded by

Mariam shahid
Copyright © All Rights Reserved
Programming for Big Data

Apache Kafka and Flume

Saeed Iqbal Khattak

Centre for Healthcare Modelling & Informatics


Faculty of Information Technology,
University of Central Punjab, Lahore

January ,
Outline Publish-Subscribe What is Kafka Messaging Systems in Kafka Kafka Architecture Kafka Use Cases

Outline

▶ Publish-Subscribe
▶ What is Apache Kafka?
▶ Messaging Systems in Apache Kafka
▶ Apache Kafka Architecture
▶ Apache Kafka Use Cases
▶ Apache Flume
▶ Apache Flume Architecture
▶ Apache Flume – Data Flow
▶ Apache Kafka vs Flume

What is Kafka?
▶ Apache Kafka is a fast, scalable, fault-tolerant messaging system that:
1. enables communication between producers and consumers using message-based topics;
2. provides a platform for high-end, new-generation distributed applications;
3. allows a large number of permanent or ad-hoc consumers;
4. is highly available, resilient to node failures, and supports automatic recovery.
▶ Apache Kafka is a distributed data store optimized for ingesting and processing streaming
data in real time.
▶ Streaming data is data that is continuously generated by thousands of data sources, which
typically send their data records simultaneously.
▶ Kafka provides three main functions to its users:
1. Publish and subscribe to streams of records.
2. Effectively store streams of records in the order in which the records were generated.
3. Process streams of records in real time.
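The storage and consumer properties listed above (records kept in order, many permanent or ad-hoc consumers) can be illustrated with a small toy model. This is a sketch in plain Python, not Kafka's client API; the names `Log`, `Reader`, `append`, `read_from` and `poll` are hypothetical. It shows that records stay in arrival order, are not deleted when read, and can be read independently by a long-running consumer and by one that joins later.

```python
# Toy model (NOT the Kafka API): an append-only log that keeps records in
# arrival order and lets each reader track its own read position.
from dataclasses import dataclass, field


@dataclass
class Log:
    records: list = field(default_factory=list)

    def append(self, record):
        """Store a record at the end of the log and return its position."""
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, position, max_records=10):
        """Return up to max_records records starting at position; nothing is deleted."""
        return self.records[position:position + max_records]


@dataclass
class Reader:
    log: Log
    position: int = 0  # each reader keeps its own position in the log

    def poll(self):
        batch = self.log.read_from(self.position)
        self.position += len(batch)
        return batch


log = Log()
for event in ["login", "view", "purchase"]:
    log.append(event)

permanent = Reader(log)
print(permanent.poll())   # ['login', 'view', 'purchase']

log.append("logout")
print(permanent.poll())   # ['logout']

# An ad-hoc reader that joins later can still replay the whole stream in order.
ad_hoc = Reader(log)
print(ad_hoc.poll())      # ['login', 'view', 'purchase', 'logout']
```

Because reading does not remove records, any number of consumers can share one stream, each at its own position. Real Kafka keeps this position as a committed offset per consumer group.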


▶ Before going deeper into Kafka, you must be aware of its main terminology:
Message: In Kafka, data is often treated as a set of messages. A message is simply an array of bytes, e.g. a
line from a CSV file.
Producer: A producer is an application that sends messages. It does not send messages directly to the
recipient; it sends them only to the Kafka server.
Consumer: A consumer is an application that reads messages from the Kafka server (i.e. consumers are the
recipients). Consumers must have permission to read the messages.
Broker: A broker is a Kafka server. Kafka essentially acts as a message broker between producers and
consumers, because producers and consumers never connect directly.
Cluster: Kafka is a distributed system that runs as a cluster, i.e. a group of computers sharing a
workload for a common purpose. Each instance runs a Kafka broker.
Topics: A stream of messages belonging to a particular category is called a topic. Data is stored in
topics.
Partitions: Kafka brokers store the messages of a topic. However, the volume of data can be enormous and
may not fit on a single computer, so each topic is divided into partitions that can be spread across brokers.
Offsets: An offset is a sequential id given to each message as it arrives at a partition. Once an offset is
assigned, it never changes.
ZooKeeper: ZooKeeper serves as the coordination interface between the Kafka brokers and consumers. (Newer
Kafka releases can instead use the built-in KRaft protocol for this coordination.)
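The roles defined above (message, producer, consumer, broker, topic, offset) can be sketched as a toy in-memory model. All class and method names here (`Broker`, `Producer`, `Consumer`, `send`, `fetch`, `poll`) are hypothetical illustrations, not the real Kafka client API, and each topic is simplified to a single partition. The key point it shows is that the producer and consumer only ever talk to the broker, and that each message keeps the offset it was given.

```python
# Toy model (NOT the real Kafka client API): producers and consumers only talk
# to a broker, which stores each topic as a list of byte-string messages.
class Broker:
    def __init__(self):
        self.topics = {}  # topic name -> list of messages; list index = offset

    def create_topic(self, name):
        self.topics.setdefault(name, [])

    def append(self, topic, message: bytes) -> int:
        log = self.topics[topic]
        log.append(message)
        return len(log) - 1  # offset is assigned once and never changes

    def fetch(self, topic, offset):
        """Return (offset, message) pairs from the given offset onwards."""
        return list(enumerate(self.topics[topic][offset:], start=offset))


class Producer:
    def __init__(self, broker):
        self.broker = broker  # knows the broker, never the consumers

    def send(self, topic, message: bytes) -> int:
        return self.broker.append(topic, message)


class Consumer:
    def __init__(self, broker, topic):
        self.broker, self.topic, self.next_offset = broker, topic, 0

    def poll(self):
        records = self.broker.fetch(self.topic, self.next_offset)
        self.next_offset += len(records)
        return records


broker = Broker()
broker.create_topic("orders")
producer = Producer(broker)
consumer = Consumer(broker, "orders")

producer.send("orders", b"order-1,42.50")
producer.send("orders", b"order-2,13.00")
print(consumer.poll())  # [(0, b'order-1,42.50'), (1, b'order-2,13.00')]
```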
Messaging Systems in Kafka
▶ The main task of a messaging system is to transfer data from one application to another.
▶ Messages are queued asynchronously between the messaging system and client
applications.
▶ There are two types of messaging patterns available:
1. Point-to-point messaging system
2. Publish-subscribe messaging system

Point-to-Point Messaging System

▶ More than one sender can produce and send messages to a queue. Senders can share a
connection or use different connections, but they can all access the same queue.
▶ More than one receiver can consume messages from a queue, but each message can be
consumed by only one receiver. Thus, Message 1, Message 2, and Message 3 are consumed
by different receivers. (This is an extension of the message-queue model.)
▶ Senders and receivers have no timing dependencies; the receiver can consume a message
whether or not it was running when the sender produced and sent the message.
▶ The PTP messaging model can be further categorized into two types:
1. Fire-and-forget model
2. Request/reply model
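A minimal sketch of point-to-point delivery, using Python's standard `queue.Queue` as a stand-in for a message queue (this is not Kafka). It assumes two senders and two receivers, named here only for illustration: all messages are enqueued before any receiver starts, and each message is removed by exactly one receiver.

```python
# Point-to-point sketch using Python's standard queue.Queue (not Kafka):
# several senders share one queue, and each message goes to exactly one receiver.
import queue
import threading

q = queue.Queue()

# Two senders enqueue messages before any receiver is running:
# there is no timing dependency between senders and receivers.
for sender in ("sender-A", "sender-B"):
    for i in range(3):
        q.put(f"{sender}:msg{i}")

received = {"receiver-1": [], "receiver-2": []}


def receive(name):
    while True:
        try:
            msg = q.get_nowait()  # removes the message, so no other receiver gets it
        except queue.Empty:
            return
        received[name].append(msg)  # each thread writes only to its own list


threads = [threading.Thread(target=receive, args=(name,)) for name in received]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_msgs = received["receiver-1"] + received["receiver-2"]
print(len(all_msgs), len(set(all_msgs)))  # 6 6: every message consumed exactly once
```

This corresponds to the fire-and-forget variant, since senders never wait for a reply. Which receiver gets which message depends on thread scheduling, so only the totals are fixed.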

Publish-Subscribe Messaging System

▶ A Pub/Sub messaging model is used when you need to broadcast an event or message to
many message consumers.
▶ In this messaging system, messages persist in a topic.
▶ Unlike the point-to-point messaging system, consumers can subscribe to more than one topic
and consume every message in that topic.
▶ Messages are shared through a channel called a topic. A topic is a centralized place where
producers publish messages and subscribers consume them.
▶ Each message is delivered to one or more message consumers, called subscribers.
▶ The publisher generally does not know which subscribers are receiving
the topic messages.
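The fan-out behaviour described above can be sketched in a few lines of plain Python. This is an illustration only; `PubSub`, `subscribe` and `publish` are hypothetical names, not a Kafka API. The publisher names only a topic, every subscriber of that topic receives the message, and one subscriber can listen to several topics.

```python
# Publish-subscribe sketch (in-memory, not Kafka): a topic fans each message out
# to every subscriber, and the publisher never sees who the subscribers are.
from collections import defaultdict


class PubSub:
    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher only names the topic; delivery goes to all subscribers.
        for callback in self._subscribers[topic]:
            callback(message)


bus = PubSub()
inbox_a, inbox_b = [], []
bus.subscribe("news", inbox_a.append)
bus.subscribe("news", inbox_b.append)
bus.subscribe("sports", inbox_b.append)  # one subscriber, several topics

bus.publish("news", "election results")
bus.publish("sports", "match report")

print(inbox_a)  # ['election results']
print(inbox_b)  # ['election results', 'match report']
```

Kafka differs in that consumers pull messages from a retained log instead of being called back immediately; this sketch illustrates only the one-to-many delivery semantics.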

Kafka Architecture

▶ Apache Kafka's architecture has four core APIs: Producer, Consumer, Streams, and
Connector.
Producer: The Producer API allows an application to publish a stream of records to one or more Kafka
topics.
Consumer: The Consumer API allows an application to subscribe to one or more topics.
Streams: The Streams API consumes an input stream from one or more topics and produces an output stream
to one or more output topics.
Connector: The Connector API is used to build and run reusable producers or consumers that connect Kafka
topics to existing applications or data systems.
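The Streams API itself is a Java library; the sketch below is not that API but a plain-Python illustration of the same idea, under the assumption that a topic can be modelled as a list. It reads records from an input topic, keeps some state (a running word count, the classic stream-processing example), and writes derived records to an output topic. The names `input_topic`, `output_topic` and `process` are hypothetical.

```python
# Stream-processing sketch in plain Python (NOT the Kafka Streams API, which is
# a Java library): read records from an input "topic", transform them, and
# append results to an output "topic". Topics are modelled as lists.
from collections import Counter

input_topic = ["kafka streams", "kafka connect", "flume"]
output_topic = []          # records of the form (word, running_count)
counts = Counter()         # state kept by the processor between records


def process(record):
    for word in record.split():
        counts[word] += 1
        output_topic.append((word, counts[word]))


for record in input_topic:   # a real stream processor runs this loop indefinitely
    process(record)

print(output_topic)
# [('kafka', 1), ('streams', 1), ('kafka', 2), ('connect', 1), ('flume', 1)]
```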

Kafka Architecture – Kafka Topics
▶ A topic is a logical channel to which producers publish messages and from which
consumers receive messages.
1. In Kafka, a topic defines a stream of a particular type or classification of data.
2. Messages are thus organized: a particular type of message is published on a particular
topic.
3. A producer first writes its messages to a topic; consumers then read those messages from
the topic.
4. In a Kafka cluster, a topic is identified by its name, which must be unique.
5. There can be any number of topics; there is no fixed limit.
6. Data cannot be changed or updated once it has been published.
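Two of the rules above, unique topic names and immutable published data, can be sketched in plain Python. This is a toy model, not Kafka; `Cluster`, `create_topic`, `publish` and `read` are hypothetical names. Creating a second topic with the same name fails, and readers receive a tuple, which cannot be modified in place.

```python
# Sketch of two topic rules, modelled in plain Python (not Kafka): topic names
# are unique within a cluster, and published records cannot be edited.
class Cluster:
    def __init__(self):
        self._topics = {}

    def create_topic(self, name):
        if name in self._topics:
            raise ValueError(f"topic {name!r} already exists")
        self._topics[name] = []

    def publish(self, name, message):
        self._topics[name].append(message)  # append only: no update or delete

    def read(self, name):
        # Return an immutable snapshot so callers cannot update published data.
        return tuple(self._topics[name])


cluster = Cluster()
cluster.create_topic("payments")
cluster.publish("payments", "txn-1")

try:
    cluster.create_topic("payments")
except ValueError as err:
    print(err)  # topic 'payments' already exists

snapshot = cluster.read("payments")
try:
    snapshot[0] = "tampered"
except TypeError:
    print("published records are read-only")
```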

Kafka Architecture – Kafka Partitions
▶ In a Kafka cluster, topics are split into partitions, which are also replicated across
brokers.
1. However, unless the producer specifies a key or a partition, there is no guarantee as to
which partition a published message will be written.
2. Within a partition, messages are stored in sequence.
3. In a partition, each message is assigned an incremental id, called an offset.
4. There can be any number of partitions; there is no fixed limit.
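The partition rules above can be sketched with a small plain-Python model (not the Kafka client; `PartitionedTopic` and `append` are hypothetical names). Two assumptions are made for illustration: messages without a key are spread round-robin, and keyed messages are placed with `zlib.crc32` as a stand-in hash, whereas Kafka's default partitioner uses murmur2. The sketch shows that every partition has its own offsets starting at 0, and that messages sharing a key always land in the same partition, so their relative order is preserved.

```python
# Partitioning sketch (plain Python, not the Kafka client): a topic is split into
# partitions, and each partition hands out its own offsets 0, 1, 2, ...
# Assumptions: keyless messages go round-robin; keyed messages use zlib.crc32
# as a stand-in hash (Kafka's default partitioner uses murmur2).
import itertools
import zlib


class PartitionedTopic:
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self._round_robin = itertools.cycle(range(num_partitions))

    def append(self, value, key=None):
        if key is None:
            p = next(self._round_robin)
        else:
            p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1  # (partition, offset)


topic = PartitionedTopic(num_partitions=3)
for i in range(5):
    print(topic.append(f"event-{i}"))
# prints (0, 0), (1, 0), (2, 0), (0, 1), (1, 1) on separate lines:
# offsets restart at 0 in every partition

# Messages sharing a key land in the same partition, so their order is kept.
p1, _ = topic.append("cart-created", key="user-7")
p2, _ = topic.append("cart-paid", key="user-7")
print(p1 == p2)  # True
```

Ordering is guaranteed only inside one partition: `event-3` follows `event-0` in partition 0, but nothing orders `event-1` (partition 1) relative to `event-2` (partition 2).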

Kafka Use Cases – eshop

Kafka Use Cases – Healthy AI

Thank You
