Compiled by Navaneetha Babu C
Apache Kafka – 360 View
Course Outline
q Kafka is a distributed, scalable, fault-tolerant, publish-subscribe messaging system that enables you to
build distributed applications
q The problem they originally set out to solve was low-latency ingestion of large amounts of event data
from the LinkedIn website and infrastructure into a lambda architecture that harnessed Hadoop and
real-time event processing systems.
q Real-time systems such as the traditional messaging queues (think ActiveMQ, RabbitMQ, etc.) have
great delivery guarantees and support things such as transactions, protocol mediation, and message
consumption tracking, but they are overkill for the use case LinkedIn had in mind.
q Kafka was developed to be the ingestion backbone for this type of use case.
q Kafka looks and feels like a publish-subscribe system that can deliver in-order, persistent, scalable
messaging. It has publishers, topics, and subscribers.
q Kafka can also partition topics and enable massively parallel consumption
q All messages written to Kafka are persisted and replicated to peer brokers for fault tolerance, and
those messages stay around for a configurable period of time
q Kafka consists of Records, Topics, Consumers, Producers, Brokers, Logs, Partitions, and Clusters.
Records can have a key (optional), a value, and a timestamp.
q The Kafka Consumer API is used to consume a stream of records from Kafka.
q Kafka communication from clients and servers uses a wire protocol over TCP that is versioned and
documented.
q Kafka promises to maintain backwards compatibility with older clients, and many languages are
supported.
q Kafka is fast because it avoids copying buffers in-memory (Zero Copy), and streams data to
immutable logs instead of using random access.
q Topics
- Kafka maintains feeds of messages in categories called topics. Each topic has a user-defined
category (or feed name), to which messages are published.
- For each topic, the Kafka cluster maintains a structured commit log with one or more partitions
q Partitions
- Kafka appends new messages to a partition in an ordered, immutable sequence.
- Each message in a partition is assigned a sequential number, called the offset, that uniquely
identifies the message within that partition.
- Partition support for topics provides parallelism. In addition, because writes to a partition are
sequential, the number of hard disk seeks is minimized. This reduces latency and increases performance.
q Producers
- Producers are processes that publish messages to one or more Kafka topics.
- The producer is responsible for choosing which message to assign to which partition within a
topic.
q Consumer
- Consumers are processes that subscribe to one or more topics and process the feeds of
published messages from those topics.
- Kafka consumers keep track of which messages have already been consumed by storing the
current offset. Because Kafka retains all messages on disk for a configurable amount of time, consumers
can use the offset to rewind or skip to any point in a partition.
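Because an offset is simply a position in a partition's log, a consumer can reposition itself explicitly. A minimal sketch, assuming a local broker and a hypothetical topic, partition, and offset (none of these values come from the slides):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker address
props.put(ConsumerConfig.GROUP_ID_CONFIG, "rewind-demo");               // assumed group id
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    TopicPartition tp = new TopicPartition("customer-events", 0);       // assumed topic and partition
    consumer.assign(Collections.singletonList(tp));                     // take manual control of this partition
    consumer.seek(tp, 42L);                                             // rewind (or skip) to offset 42
    // the next poll() returns records starting at offset 42
}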
q Broker
- A Kafka cluster consists of one or more servers, each of which is called a broker.
- Producers send messages to the Kafka cluster, which in turn serves them to consumers. Each
broker manages the persistence and replication of message data.
- Kafka Brokers scale and perform well in part because Brokers are not responsible for keeping
track of which messages have been consumed. Instead, the message consumer is responsible for this. This
design feature eliminates the potential for back-pressure when consumers process messages at different
rates.
Apache Zookeeper
q Apache ZooKeeper is an open-source project that deals with maintaining configuration information,
naming, providing distributed synchronization, and group services for various distributed applications.
q ZooKeeper implements various protocols on the cluster so that the applications need not implement
them on their own.
q Developed originally at Yahoo, ZooKeeper facilitates synchronization among processes by maintaining
status on ZooKeeper servers, which store information in local log files.
q ZooKeeper servers are capable of supporting large Hadoop, Kafka, Cassandra, and many other
clusters. Each client machine communicates with one of the servers to retrieve information.
Apache Zookeeper
q A client connects to a single ZooKeeper server and maintains a TCP connection.
q Clients can read from any ZooKeeper server; writes go through the leader and need majority consensus.
q Read Requests are processed locally at the zookeeper server to which the client is currently connected.
q Write Requests are forwarded to the leader and go through majority consensus before a response is
generated
q Client
- One of the nodes in our distributed application cluster that accesses information from the server. At regular
intervals, every client sends a message to the server to let the server know that it is alive.
- Similarly, the server sends an acknowledgement when a client connects. If there is no response from the
connected server, the client automatically redirects the message to another server.
q Server
- A server is one of the nodes in our ZooKeeper ensemble and provides all the services to clients. It sends an
acknowledgement to the client to confirm that it is alive.
q Ensemble
- Group of ZooKeeper servers. The minimum number of nodes that is required to form an ensemble is 3.
q Leader
- The server node that performs automatic recovery if any of the connected nodes fails. Leaders are elected on
service startup.
q Follower
- A server node that follows the leader's instructions.
q Configuration Management
- Cluster member nodes bootstrapping configuration from a centralized source.
- Easier, simpler deployment/provisioning
q Single System Image – A Client sees the same view of the service regardless of the ZK server it connects
to.
q Timeliness – The client’s view of the system is guaranteed to be up-to-date within a certain time bound.
q Naming Services
- ZooKeeper attaches a unique identification to every node, which is quite similar to the DNA that helps
identify it.
Kafka Cluster
q One of the brokers in the cluster is designated as a controller, which is responsible for handling the
administrative operations as well as assigning the partitions to other brokers.
q Kafka uses Apache ZooKeeper as its distributed configuration store. It forms the backbone of the Kafka
cluster and continuously monitors the health of the brokers.
q When new brokers are added to the cluster, ZooKeeper starts utilizing them by creating topics and
partitions on them.
q Kafka brokers are stateless, so they use ZooKeeper for maintaining their cluster state.
q One Kafka broker instance can handle hundreds of thousands of reads and writes per second, and
each broker can handle terabytes of messages without performance impact.
q A Kafka broker receives messages from producers and stores them on disk, keyed by a unique offset.
q A Kafka broker allows consumers to fetch messages by topic, partition and offset
q For failover, you want to start with at least three to five brokers.
q A Kafka cluster can have 10, 100, or 1,000 brokers if needed.
Kafka Topic
q Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that
subscribe to the data written to it.
q For each topic, the Kafka cluster maintains a partitioned log.
Kafka Topic
q Let's say a bank builds a customer-360 view, capturing events from various sides of the business using Kafka.
q We can also isolate real-time, batch, and near-real-time data flows using separate topics.
Kafka APIs
Kafka Partitions
q Kafka can replicate partitions across a configurable number of Kafka servers which is used for fault
tolerance.
q Each kafka broker holds a number of partitions and each of these partitions can be either a leader or a
replica for a topic.
q Each partition has a leader server and zero or more follower servers.
q All writes and reads to a topic go through the leader and the leader coordinates updating replicas with
new data.
q If a leader fails, a replica takes over as the new leader with the help of zookeeper
q Kafka also uses partitions for parallel consumer handling within a group.
q Kafka distributes topic log partitions over servers in the Kafka cluster. Each server handles its share of
data and requests by sharing partition leadership.
Kafka Partitions
q A producer record consists of a topic name to which the record is being sent, an optional partition
number, and an optional key and value.
q If a valid partition number is specified, that partition will be used when sending the record.
q If no partition is specified but a key is present, a partition will be chosen using a hash of the key.
q If neither key nor partition is present, a partition will be assigned in a round-robin fashion.
q The record also has an associated timestamp. If the user did not provide a timestamp, the producer will
stamp the record with its current time.
q The timestamp eventually used by Kafka depends on the timestamp type configured for the topic.
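As a hedged illustration of these rules (the topic name, keys, and broker address below are invented for the example), the three ProducerRecord constructors map directly onto the three cases above:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker address
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

try (Producer<String, String> producer = new KafkaProducer<>(props)) {
    // 1. Explicit partition number: partition 0 is used as given
    producer.send(new ProducerRecord<>("customer-events", 0, "cust-42", "login"));
    // 2. Key but no partition: the partition is chosen by hashing the key
    producer.send(new ProducerRecord<>("customer-events", "cust-42", "payment"));
    // 3. Neither key nor partition: partitions are assigned round-robin
    producer.send(new ProducerRecord<>("customer-events", "page-view"));
}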
Kafka Producer
q The producer could implement priority systems based on sending records to certain partitions based on
the priority of the record.
q The default partitioner for Java uses a hash of the record’s key to choose the partition or uses a round-
robin strategy if the record has no key.
q Producers write at their own cadence, so the order of records cannot be guaranteed across partitions.
Kafka Consumer
q The maximum parallelism of a group is bounded by the number of partitions: the number of consumers
in the group should not exceed the number of partitions.
q Kafka assigns the partitions of a topic to the consumer in a group, so that each partition is consumed
by exactly one consumer in the group.
q Kafka guarantees that a message is only ever read by a single consumer in the group.
q Consumers see messages in the order they were stored in the log.
Kafka Logs
q A log for a topic named "my_topic" with two partitions consists of two directories
(namely my_topic_0 and my_topic_1) populated with data files containing the messages for that topic.
q The format of the log files is a sequence of "log entries"; each log entry is a 4-byte integer N storing
the message length, followed by the N message bytes.
q Each message is uniquely identified by a 64-bit integer offset giving the byte position of the start of
this message in the stream of all messages ever sent to that topic on that partition.
q A GUID is generated by the producer for uniqueness, and consumers read the logs based upon the
GUID (offset) and partition pair.
q Each message has its value, offset, timestamp, key, message size, compression codec, checksum,
and version of the message format.
q We recommend using multiple drives to get good throughput. Do not share these drives with any
other application or with Kafka's own application logs.
q Multiple drives can be configured using log.dirs in server.properties. Kafka assigns partitions in round-
robin fashion to log.dirs directories.
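For example, a minimal server.properties excerpt spreading partition directories across three drives could look like the following (the mount points are placeholders, not values from the slides):

# server.properties (illustrative mount points)
log.dirs=/mnt/disk1/kafka-logs,/mnt/disk2/kafka-logs,/mnt/disk3/kafka-logs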
q If the data is not well balanced among partitions, this can lead to load imbalance among the disks.
Also, Kafka currently does not do a good job of distributing data to the less-occupied disks in terms of
space, so users can easily run out of space on one disk while other drives still have free space, which by
itself can bring Kafka down.
q RAID can potentially do better load balancing among the disks, but RAID can cause a performance
bottleneck due to slower writes and it reduces the available disk space.
q RAID can tolerate disk failures, but rebuilding a RAID array is so I/O-intensive that it effectively
disables the server, so RAID doesn't provide much real availability improvement.
q Recommended settings:
- File descriptor limits: Kafka needs open file descriptors for files and network connections.
We recommend allowing at least 128000 file descriptors.
- Max socket buffer size: can be increased to enable high-performance data transfer.
q Kafka uses regular files on disk, and as such it has no hard dependency on a specific file system.
q We recommend EXT4 or XFS. Recent improvements to the XFS file system have shown it to have
better performance characteristics for Kafka's workload without any compromise in stability.
q Do not use mounted shared drives or network file systems.
q Each Kafka consumer belongs to a consumer group, i.e., a consumer group can be thought of as a
logical container/namespace for a bunch of consumers.
q A consumer group can choose to receive messages from one or more topics
q Instances in a consumer group can receive messages from zero, one or more partitions within each
topic (depending on the number of partitions and consumer instances)
q Kafka makes sure that there is no overlap as far as message consumption is concerned i.e. a
consumer (in a group) receives messages from exactly one partition of a specific topic
q The partition to consumer instance assignment is internally taken care of by Kafka and this process
is dynamic in nature
q Scalability here means the ability to consume messages that arrive at high volume and velocity.
q Kafka transparently load balances traffic from all partitions amongst a bunch of consumers in a group,
which means that a consuming application can scale to higher performance and throughput.
q If the consumer-to-partition ratio is equal to 1, each consumer receives messages from exactly one
partition, i.e., a one-to-one correlation.
q If the consumer-to-partition ratio is less than 1, some consumers will receive messages from more than
one partition.
q If the consumer-to-partition ratio is more than 1, some consumers will remain idle.
Kafka Replication
q Like HDFS, Kafka depends entirely on replication to maintain high availability.
q We know that a Kafka cluster stores streams of records in categories called topics, and topics store
their data in partitions.
q Partitions are also the way that Kafka provides redundancy and scalability.
q If we want to achieve high availability in Kafka, we need a redundant copy of each partition across
the cluster.
Kafka Replication
q Kafka always replicates partitions to maintain high availability of a topic. Each copy of a partition is
called a replica.
q The leader always handles all read and write requests for a particular partition.
q Leader handles the write request from producer and writes messages to the Kafka cluster.
q Every 500 milliseconds, followers connect to the leader, fetch the new messages, and keep
themselves ready in case the leader fails.
Leader Replica
q If replication factor is set to 3, each partition will have 3 replicas which will be stored across the nodes
of a cluster and not on the same node.
q Of these 3 replicas, consumers and producers always use one specific replica for both producing
and consuming messages; i.e., all read and write requests for a particular partition are always served
by one specific replica, which is called the leader replica.
q The leader replica is responsible for reading and writing data for a particular partition.
q Without the leader replica, you cannot read or write data to or from the partition.
Follower Replica
q All replicas that are not leader replicas are called follower replicas.
q The only job a follower replica does is keep itself up to date with the leader replica by fetching
messages from it, so that in the event of a leader failure any follower replica can take over as the
leader replica.
q A leader is elected while the Kafka brokers start up, based on the IDs assigned to them in incremental
order.
q Once the leader is not available, the broker with the next ID in sequence becomes the leader.
q Apart from serving read and write requests from producers and consumers, the leader is also responsible
for keeping track of the replicas that are up to date (in sync) with it.
q To keep themselves in sync with the leader, followers connect to the leader every 500 milliseconds
(replica.fetch.wait.max.ms) and fire a fetch request to the leader.
q In every fetch request, the follower sends the offset it wants to fetch from the leader.
q The rule is that a follower cannot request an arbitrary offset from the leader; i.e., to request offset 4,
the follower must already have offsets up to 3. This is how the leader knows which follower is at which offset.
q Followers that have not caught up to the latest offsets within 10 seconds are termed out-of-sync
followers.
q Replicas that are continuously in sync with the leader are called in-sync replicas (ISR).
Controller
q So far we have seen that the leader is the component through which reads and writes to the Kafka
cluster happen.
q To keep the Kafka cluster functioning continuously, we need redundancy for the leader of each
partition.
q To achieve this redundancy, a broker needs to perform one more role, i.e., the role of controller.
q The controller is one of the broker nodes and is responsible for the leader election of each partition.
q The very first broker that we start in a Kafka cluster will always be the controller of the cluster.
q Just as a broker creates an ephemeral znode under /brokers/ids in ZooKeeper when it starts, it also
tries to create an ephemeral znode at /controller, and whichever broker succeeds in creating that znode
acts as the controller.
q Datacenter downtime and data loss can result in businesses losing a vast amount of revenue or
entirely halting operations.
q To minimize the downtime and data loss resulting from a disaster, enterprises create business
continuity plans and disaster recovery strategies.
q If disaster strikes—catastrophic hardware failure, software failure, power outage, denial of service
attack, or any other event that causes one datacenter to completely fail—Kafka continues running in
another datacenter until service is restored.
q Kafka's mirroring feature makes it possible to maintain a replica of an existing Kafka cluster.
q The high watermark indicates the offset of messages that are fully replicated, while the end-of-log offset
might be larger if there are newly appended records on the leader partition that are not replicated yet.
Consumer Rebalancing
Recap
q Kafka consumers are part of consumer groups.
q A group has one or more consumers in it. Each partition gets assigned to one consumer.
q If you have more consumers than partitions, then some of your consumers will be idle.
q If you have more partitions than consumers, more than one partition may get assigned to a single
consumer.
Consumer Rebalancing
q When a new consumer joins, a rebalance occurs, and the new consumer is assigned some partitions
previously assigned to other consumers.
q If there were 10 partitions all being consumed by one consumer, and another consumer joins, there'll be
a rebalance, and afterwards, there'll be (typically) five partitions per consumer.
q It's worth noting that during a rebalance, the consumer group "pauses". A similar thing happens when
consumers gracefully leave, or the leader detects that a consumer has left.
q Kafka was designed to feed analytics systems that do real-time processing of streams.
q The goal behind Kafka: build a high-throughput streaming data platform that supports high-volume event
streams like log aggregation, user activity, etc.
q Kafka was also designed to handle periodic large data loads from offline systems as well as traditional,
low-latency messaging use cases.
q MOM is message-oriented middleware; think IBM MQSeries, JMS, ActiveMQ, and RabbitMQ.
q Like many MOMs, Kafka is fault-tolerant against node failures through replication and leader election.
q However, the design of Kafka is more like a distributed database transaction log than a traditional
messaging system. Unlike many MOMs, Kafka replication was built into the low-level design and is not an
afterthought.
Kafka Persistence
q Kafka relies on the file system for storing and caching records.
q JBOD is just a bunch of disk drives. The linear write throughput of a JBOD configuration with six
7200rpm SATA drives is about 600MB/sec.
q Like Cassandra tables, Kafka logs are write-only (append-only) structures, meaning data gets appended
to the end of the log.
q When using HDD, sequential reads and writes are fast, predictable, and heavily optimized by operating
systems. Using HDD, sequential disk access can be faster than random memory access and SSD.
Kafka Persistence
q While JVM GC overhead can be high, Kafka leans on the OS a lot for caching, which provides a big, fast,
and rock-solid cache.
q OS file caches are almost free and don't have the overhead of a JVM-managed cache. Implementing cache
coherency is challenging to get right, but Kafka relies on the rock-solid OS for cache coherence.
q Using the OS for cache also reduces the number of buffer copies. Since Kafka disk usage tends to do
sequential reads, the OS read-ahead cache is impressive.
q The producer asks a Kafka broker for metadata about which broker holds the leader of each topic
partition, so no routing layer is needed.
q This leadership data allows the producer to send records directly to the partition leader's broker.
q The Producer client controls which partition it publishes messages to, and can pick a partition based on
some application logic.
q Producers can partition records by key, round-robin or use a custom application-specific partitioner logic.
q Kafka producers support record batching. Batching can be configured by the total size in bytes of the
records in a batch. Batches can be auto-flushed based on time.
q Buffering is configurable and lets you make a tradeoff between additional latency for better throughput.
q Batching allows accumulation of more bytes to send, which equates to fewer, larger I/O operations on
the Kafka brokers and increases compression efficiency.
Kafka Compression
q In large streaming platforms, the bottleneck is not always CPU or disk but often network bandwidth.
q Kafka provides end-to-end batch compression: instead of compressing one record at a time, Kafka
efficiently compresses a whole batch of records.
q The same message batch can be compressed and sent to Kafka broker/server in one go and written in
compressed form into the log partition.
q We can configure the compression so that no decompression happens until the Kafka broker delivers
the compressed records to the consumer.
Pull vs Push
q With the pull-based system, if a consumer falls behind, it catches up later when it can.
q A long poll keeps a connection open after a request for a period and waits for a response.
q Push-based systems push data to consumers (Scribe, Flume, reactive streams, RxJava, Akka).
q Push-based or streaming systems have problems dealing with slow or dead consumers.
q It is possible for a push system consumer to get overwhelmed when its rate of consumption falls below
the rate of production.
Pull vs Push
q Push-based or streaming systems can either send a request immediately or accumulate requests and
send them in batches.
q The consumer can accumulate messages while it is processing data already sent, which is an advantage
in reducing the latency of message processing.
q Message tracking is not an easy task. In a traditional MOM, as the consumer consumes messages, the
broker keeps track of that state.
q Remember that Kafka topics get divided into ordered partitions. Each message has an offset in this
ordered partition. Each topic partition is consumed by exactly one consumer per consumer group at a
time.
q This partition layout means the broker does not track offset data per message as a MOM does; it only
needs to store the offset of each (consumer group, partition) pair. This offset tracking equates to a lot
less data to track.
q The consumer sends location data periodically (consumer group, partition offset pair) to the Kafka
broker, and the broker stores this offset data into an offset topic.
q There are three message delivery semantics: at most once, at least once and exactly once.
q “At most once” is messages may be lost but are never redelivered.
q “At least once” is messages are never lost but may be redelivered.
q “Exactly once” is each message is delivered once and only once. Exactly once is preferred but more
expensive, and requires more bookkeeping for the producer and consumer.
q Recall that all replicas have exactly the same log partitions with the same offsets, and each consumer
group maintains its position in the log per topic partition.
q To implement “at-most-once”, the consumer reads a message, then saves its offset in the partition by
sending it to the broker, and finally processes the message.
q The issue with “at-most-once” is a consumer could die after saving its position but before processing the
message.
q Then the consumer that takes over or gets restarted would start from the last saved position, and the
message in question is never processed.
q To implement “at-least-once”, the consumer reads a message, processes it, and finally saves the offset
to the broker.
q The issue with “at-least-once” is that a consumer could crash after processing a message but before
saving the last offset position.
q If the consumer is restarted or another consumer takes over, the consumer could receive the message
that was already processed.
q The “at-least-once” is the most common set up for messaging, and it is your responsibility to make the
messages idempotent, which means getting the same message twice will not cause a problem.
q To implement “exactly once” on the consumer side, the consumer would need a two-phase commit
between storage for the consumer position, and storage of the consumer’s message process output. Or,
the consumer could store the message process output in the same location as the last offset.
q Kafka offers the first two, and it is up to you to implement the third from the consumer perspective.
q When publishing a message, a message gets “committed” to the log which means all ISRs accepted the
message.
q This commit strategy works out well for durability as long as at least one replica lives.
q The producer connection could go down in the middle of a send, and the producer may not be sure
whether a message it sent went through, so the producer resends the message.
q This resend-logic is why it is important to use message keys and use idempotent messages
q The producer can resend a message until it receives confirmation, i.e., acknowledgment received.
q The producer resending the message without knowing if the other message it sent made it or not,
negates “exactly once” and “at-most-once” message delivery semantics.
Producer Durability
q The producer can wait on a message being committed. Waiting for commit ensures all replicas have a
copy of the message.
q The producer can send and wait for just one acknowledgment, from the partition leader (acks=1).
q The producer can send and wait for acknowledgments from all replicas (acks=-1/all), which is the default.
q Atomic writes mean Kafka consumers can be configured to see only committed records.
q Kafka has a coordinator that writes a marker to the topic log to signify what has been successfully
transacted.
q The transaction coordinator and transaction log maintain the state of the atomic writes.
q To be alive, a Kafka broker must maintain a ZooKeeper session using ZooKeeper's heartbeat
mechanism, and a follower must replicate the leader's writes and not fall too far behind.
q Both the ZooKeeper session and keeping up with the leader are needed for broker liveness, which is
referred to as being in-sync.
q An in-sync replica is called an ISR. Each leader keeps track of a set of "in-sync replicas".
q If an ISR/follower dies or falls behind, the leader will remove the follower from the set of ISRs. Falling
behind is when a replica is not in sync after the replica.lag.time.max.ms period.
q A message is considered “committed” when all ISRs have applied the message to their log. Consumers
only see committed messages.
q Kafka's guarantee: a committed message will not be lost, as long as there is at least one ISR.
Low Level Architecture
q A replicated log is useful for implementing other distributed systems using state machines.
q While a leader stays alive, all followers just need to copy values and ordering from their leader.
q If the leader does die, Kafka chooses a new leader from its followers which are in-sync.
q If a producer is told a message is committed, and then the leader fails, then the newly elected leader
must have that committed message.
q The more ISRs you have, the more candidates there are to elect during a leadership failure.
q A quorum is the number of acknowledgments required, and the number of logs that must be compared
to elect a leader, such that there is guaranteed to be an overlap for availability. Most systems use a
majority vote; Kafka does not use a simple majority vote, in order to improve availability.
q In Kafka, leaders are selected based on having a complete log. If we have a replication factor of 3, then
at least two ISRs must be in-sync before the leader declares a sent message committed.
q If a new leader needs to be elected then, with no more than 3 failures, the new leader is guaranteed to
have all committed messages.
q Among the followers there must be at least one replica that contains all committed messages. The
problem with a majority-vote quorum is that it does not take many failures to make the cluster inoperable.
q Only members in this set of ISRs are eligible for leadership election.
q Kafka’s guarantee about data loss is only valid if at least one replica is in-sync.
q If all followers that are replicating a partition leader die at once, then Kafka's guarantee against data
loss is not valid.
q If all replicas are down for a partition, Kafka, by default, chooses the first replica that comes back to life.
Quotas
q Kafka has quotas for consumers and producers to limit the bandwidth they are allowed to consume.
q These quotas prevent consumers or producers from hogging up all the Kafka broker resources.
q The quota data is stored in ZooKeeper, so changes do not necessitate restarting Kafka brokers.
Advanced Producer
❏ Batching by Time and Size
❏ Compression
❏ Async and Sync Producers
❏ Default Partitioning
❏ Custom Partitioning
Advanced Consumer
❏ Consumer Poll
❏ At most once message semantics
❏ At least once message semantics
❏ Exactly once message semantics
❏ Using ConsumerRebalanceListener
q The Producer has buffers of unsent records per topic partition (sized at batch.size)
q The linger.ms setting forces the Producer to wait up to linger.ms before sending the contents of a
buffer, or until the batch fills up, whichever comes first.
q Under heavy load, linger.ms is not reached because the buffer fills up before the linger.ms duration
completes.
q Under lighter load, the producer can use linger.ms to increase broker I/O throughput and improve
compression.
q If records are produced faster than they can be transmitted to Kafka, this buffer is exceeded, additional
send calls block up to max.block.ms, and after that the Producer throws a TimeoutException.
q You can also set the producer config property buffer.memory, which defaults to 32 MB of memory.
q This denotes the total memory (in bytes) that the producer can use to buffer records to be sent to the
broker.
q If the Producer is sending records faster than the broker can receive records, an exception is thrown.
Batching by Time
q You can set linger.ms so that the Producer waits this long before sending if the batch size is not exceeded.
q This setting allows the Producer to group together any records that arrive before they can be sent into a
batch.
q Setting this value to 5 ms or greater is good if records arrive faster than they can be sent out.
q The producer can reduce requests count even under moderate load using linger.ms.
Batching by Size
q The linger.ms setting adds a delay to wait for more records to build up, so larger batches get sent.
q If the producer gets records whose combined size is batch.size or more for a broker's leader partitions,
the batch is sent right away.
q If the Producer has accumulated less than batch.size but the linger.ms interval has passed, the records
for that partition are sent.
q Increase linger.ms to improve the throughput of Brokers and reduce broker load
q Normally the producer will not wait at all, and simply send all the messages that accumulated while the
previous send was in progress
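Pulling these settings together, a producer configuration fragment in Java might look like the following sketch (the numbers are illustrative, not recommendations from the slides):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");        // assumed broker address
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);      // batch.size: 32 KB buffer per partition
props.put(ProducerConfig.LINGER_MS_CONFIG, 5);               // linger.ms: wait up to 5 ms for a batch to fill
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432L);   // buffer.memory: 32 MB total (the default)
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 60000L);       // max.block.ms: block this long before TimeoutException
Producer<String, String> producer = new KafkaProducer<>(props);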
Compression
q End-to-end compression is possible if the Kafka broker config “compression.type” is set to “producer”.
Compression
q The compressed data can be sent from a producer, then written to the topic log and forwarded to a
consumer by broker using the same compressed format.
q End to end compression is efficient as compression only happens once and is reused by the broker and
consumer. End to end compression takes the load off of the broker.
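A small illustrative fragment (the codec choice here is only an example): enable batch compression on the producer, and leave the broker's compression.type at its "producer" setting so the batch is stored and forwarded in the same compressed form.

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();                        // plus the usual bootstrap/serializer settings
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");   // whole record batches are compressed (gzip, snappy, lz4, zstd)
// Broker side, in server.properties: compression.type=producer keeps the producer's codec end to end.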
Synchronous Producers
q RecordMetadata has "partition" where the record was written and the ‘offset’ of the record in that
partition.
q In case of an exception, the producer stops sending messages after the exception occurs.
Synchronous Producers
Code Snippet
static void runProducer(final int sendMessageCount) throws Exception {
    final Producer<Long, String> producer = createProducer();   // helper that builds a KafkaProducer
    long time = System.currentTimeMillis();
    for (long index = time; index < time + sendMessageCount; index++) {
        final ProducerRecord<Long, String> record =
                new ProducerRecord<>(TOPIC, index, "Hello " + index);   // TOPIC assumed defined elsewhere
        // .get() blocks until the broker acknowledges the write, making the send synchronous
        RecordMetadata metadata = producer.send(record).get();
    }
    producer.flush();
    producer.close();
}
Asynchronous Producers
q Since the send call is asynchronous it returns a Future for the RecordMetadata that will be assigned to
this record.
q Some messages may already have been sent before we notice that something is wrong and can take action.
q In the asynchronous approach, the number of messages that are "in flight" is controlled by the
max.in.flight.requests.per.connection parameter.
Asynchronous Producers
Code Snippet
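The original snippet is not reproduced in this extract; the sketch below shows a typical asynchronous send with a callback, mirroring the synchronous example (createProducer() and the topic name are assumed helpers, and the CountDownLatch is only there so the sample can wait for outstanding acknowledgements):

// Assumes org.apache.kafka.clients.producer.*, java.util.concurrent.CountDownLatch and TimeUnit are imported
static void runAsyncProducer(final int sendMessageCount) throws Exception {
    final Producer<Long, String> producer = createProducer();              // assumed helper, as in the sync example
    final CountDownLatch countDownLatch = new CountDownLatch(sendMessageCount);
    long time = System.currentTimeMillis();
    for (long index = time; index < time + sendMessageCount; index++) {
        final ProducerRecord<Long, String> record =
                new ProducerRecord<>("my-topic", index, "Hello " + index);  // topic name is illustrative
        // send() returns immediately; the callback fires when the broker acknowledges or the send fails
        producer.send(record, (metadata, exception) -> {
            if (exception == null) {
                System.out.printf("partition %d, offset %d%n", metadata.partition(), metadata.offset());
            } else {
                exception.printStackTrace();      // other sends may already be in flight when this happens
            }
            countDownLatch.countDown();
        });
    }
    countDownLatch.await(25, TimeUnit.SECONDS);   // wait for the outstanding acknowledgements
    producer.flush();
    producer.close();
}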
Default Partitioning
q Let's consider a topic TopicA in Kafka with a partition count of 5 and a replication factor of 3, where we
want to distribute data uniformly across all the partitions so that every partition contains the same
amount of data.
q Kafka uses the default partitioning mechanism to distribute data between partitions, but with the default
mechanism it is possible that some of our partitions grow larger than others.
q Suppose we have inserted 40 GB of data into Kafka; the data size of each partition may then look like:
Without Partitioner        With Partitioner
Partition0 – 10 GB         Partition0 – 8 GB
Partition1 – 8 GB          Partition1 – 8 GB
Partition2 – 6 GB          Partition2 – 8 GB
Partition3 – 9 GB          Partition3 – 8 GB
Partition4 – 11 GB         Partition4 – 8 GB
Code Snippet
class RRPartitioner():
    def __init__(self, client, topic):
        # Using topic metadata, get the total number of partitions
        self.total_partitions = client[topic].get_number_partitions()
        self.part_offset = 0

    def partition(self, key=None):
        # Hand out partitions in round-robin order, wrapping at the partition count
        selected = self.part_offset % self.total_partitions
        self.part_offset += 1
        return selected
Custom Partitioning
q By default, the Apache Kafka producer distributes messages to the different partitions in a round-robin
fashion.
q Writing a custom partitioner lets you split the data across partitions based upon your own partitioning logic.
Code snippet
@Override
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes,
                     Cluster cluster) {
    int partition = 0;
    String userName = (String) key;
    // Look up the id of the current user based on the username (userService is an application-supplied lookup)
    Integer userId = userService.findUserId(userName);
    // If the userId is not found, the record falls back to partition 0
    if (userId != null) {
        // Wrap around the partition count so the result is always a valid partition for the topic
        partition = userId % cluster.partitionCountForTopic(topic);
    }
    return partition;
}
Advanced Kafka Producer and Consumer
Consumer Poll
q Once records are polled, the consumer processes the messages and commits the updated consumer
position back to the Kafka broker.
q If the application running this loop dies, it will restart consuming from the last committed consumer
position.
q Effectively, this guarantees that you will process each message at least once; it can very well
happen that the same message is processed multiple times.
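A minimal sketch of that loop, assuming a local broker and a hypothetical topic, group id, and process() method; committing manually after processing makes the at-least-once behaviour explicit:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // assumed broker address
props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");                 // assumed group id
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");            // commit only after processing

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("customer-events"));    // assumed topic
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, String> record : records) {
            process(record);        // application logic (assumed method)
        }
        consumer.commitSync();      // the committed position only advances after processing: at least once
    }
}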
Consumer Poll
q It’s important to realize that this timeout only applies to part of what the poll() function does
internally.
q The timeout parameter is the number of milliseconds that the network client inside the kafka
consumer will wait for sufficient data to arrive from the network to fill the buffer.
q If no data is sent to the consumer, the poll() function will take at least this long. If data is available
for the consumer, poll() might be shorter.
q Before it gets to that part of the poll() function, the consumer will also do a check to ensure that the
broker is available.
q That part does not respect the timeout. It will try infinitely long to fetch metadata from the cluster
Consumer Poll
q If the processing of messages is expensive (e.g. complex calculations, or long blocking I/O), you
may run into a CommitFailedException.
q The reason for this is that the consumer is expected to send a heartbeat to the broker every so
often.
q This heartbeat informs the broker that the consumer is still alive. When the heartbeat doesn’t arrive
in time, the broker will mark the consumer as dead and kick it from the consumer group.
q The time is defined by the consumer's session.timeout.ms configuration (the default is 30 seconds).
q Both the poll() and commitSync() functions send this heartbeat. However, if the time between the
two function calls is 30 seconds, then by the time commitSync() is called, the broker will already
have marked the consumer as dead. As a result you get a CommitFailedException.
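One common mitigation (the values are illustrative) is to shrink the amount of work each poll() returns so that the next poll()/commitSync() call happens well within the timeout:

props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 50);        // max.poll.records: fewer records per poll, shorter gap between polls
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);   // session.timeout.ms: how long silence is tolerated before the consumer is marked dead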
At Most Once Message Semantics
q If the producer does not retry when an ack times out or returns an error, then the message might
end up not being written to the Kafka topic, and hence not delivered to the consumer.
q In most cases it will be, but in order to avoid the possibility of duplication, we accept that
sometimes messages will not get through.
At Least Once Message Semantics
q If the producer receives an acknowledgement (ack) from the Kafka broker and acks=all is set, it means
that the message has been written exactly once to the Kafka topic.
q If a producer ack times out or receives an error, it might retry sending the message assuming that
the message was not written to the Kafka topic.
q If the broker had failed right before it sent the ack but after the message was successfully written
to the Kafka topic, this retry leads to the message being written twice and hence delivered more
than once to the end consumer.
Code Snippet
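The slide's code is not included in this extract; a minimal configuration sketch of this at-least-once producer behaviour (the values are illustrative) could be:

props.put(ProducerConfig.ACKS_CONFIG, "all");                      // wait for all in-sync replicas to acknowledge
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);       // retry transient send failures
props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000);      // overall bound on a send plus its retries
// A retry after a lost ack can write the same record twice, which is why this is at-least-once.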
Exactly Once Message Semantics
q Even if a producer retries sending a message, it leads to the message being delivered exactly
once to the end consumer.
q Exactly-once semantics is the most desirable guarantee, but also a poorly understood one. This is
because it requires a cooperation between the messaging system itself and the application
producing and consuming the messages.
- While processing the messages, get hold of the offset of each message. Store the
processed message's offset atomically along with the processed output, using an atomic transaction.
When data is stored in a relational database, atomicity is easier to implement.
Code Snippets
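The slide's code is not included in this extract; the sketch below illustrates the approach described above, committing the processed result and its offset in one database transaction (the table, SQL, and process() helper are hypothetical):

// Assumes a JDBC Connection 'conn' with auto-commit disabled and a table
// processed_events(topic, part, msg_offset, result) -- the schema is made up for the example
for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(100))) {
    String result = process(record);                                  // application logic (assumed)
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO processed_events (topic, part, msg_offset, result) VALUES (?, ?, ?, ?)")) {
        ps.setString(1, record.topic());
        ps.setInt(2, record.partition());
        ps.setLong(3, record.offset());
        ps.setString(4, result);
        ps.executeUpdate();
    }
    conn.commit();   // the offset and the output are persisted together, so a crash never splits them
}
// On restart, read the highest stored offset per partition and consumer.seek() to offset + 1.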
ConsumerRebalanceListener
q When the situation arises to adjust the number of partitions, a rebalance will be triggered.
q When Kafka is managing the group membership, a partition re-assignment will be triggered any
time the members of the group change or the subscription of the members changes.
q This can occur when processes die, new process instances are added or old instances come back
to life after failure.
q There are many uses for this functionality. One common use is saving offsets in a custom store. By
saving offsets in the onPartitionsRevoked(Collection) call we can ensure that any time partition
assignment changes the offset gets saved.
q Another use is flushing out any kind of cache of intermediate results the consumer may be
keeping.
q Callback will execute in the user thread as part of the poll(long) call whenever partition assignment
changes.
ConsumerRebalanceListener
Code Snippet
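The snippet itself is not in this extract; a minimal sketch of a listener that saves offsets to a custom store before partitions are revoked (the save/read helpers are placeholders for that store) could look like:

import java.util.Collection;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SaveOffsetsOnRebalance implements ConsumerRebalanceListener {
    private final KafkaConsumer<?, ?> consumer;

    public SaveOffsetsOnRebalance(KafkaConsumer<?, ?> consumer) {
        this.consumer = consumer;
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Called before the rebalance takes effect: persist the current positions to the custom store
        for (TopicPartition partition : partitions) {
            saveOffsetInCustomStore(partition, consumer.position(partition));   // placeholder helper
        }
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Called after the rebalance: resume from the offsets kept in the custom store
        for (TopicPartition partition : partitions) {
            consumer.seek(partition, readOffsetFromCustomStore(partition));     // placeholder helper
        }
    }

    private void saveOffsetInCustomStore(TopicPartition tp, long offset) { /* write to the custom store */ }
    private long readOffsetFromCustomStore(TopicPartition tp) { return 0L; /* read from the custom store */ }
}

The listener would be registered with consumer.subscribe(topics, new SaveOffsetsOnRebalance(consumer)) so that both callbacks run inside poll().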