Kafka
Key concepts
Cluster
A group of Kafka Brokers.
Broker
A single Kafka instance. Brokers are intentionally simple: they store partitions and serve
production and consumption requests.
Event
An Event is something that happened in the past. In Kafka, an event is a K/V pair. The value is
the state (the content/payload) and the key can be anything, although it's usually a simple
primitive type (such as an ID). Internally, Kafka handles events as byte sequences.
Topic
Kafka's fundamental unit. A topic is a stream of events represented as a Log. Unlike queues in
traditional messaging systems, logs are durable structures. Messages live for as long as the
retention configuration states, regardless of whether they have been read. This enables things
such as replays or multiple consumers of the same message.
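The log semantics above can be sketched as a tiny in-memory model (illustrative only; real Kafka persists the log to disk and names are hypothetical):

```python
# A topic partition as an append-only log: records are kept regardless
# of read state, and each reader tracks its own position (offset).
class Log:
    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # the new record's offset

    def read(self, offset):
        return self._records[offset]

log = Log()
log.append("order-created")
log.append("order-paid")

# Two independent consumers keep separate offsets, so the same message
# can be read by both, or replayed from offset 0 at any time.
consumer_a_offset = 0
consumer_b_offset = 0
print(log.read(consumer_a_offset))  # "order-created"
print(log.read(consumer_b_offset))  # "order-created"
```

Because reading never removes a record, "consuming" is just advancing an offset, which is what makes replay cheap.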
Partition
Kafka is designed for scalable distributed systems, so topics are split across multiple partitions.
Producers may or may not specify which partition a message goes to. If the Event key is
null, Kafka will evenly distribute messages across all partitions. If it is not, the destination is
determined by hashing the key and taking the result modulo the number of partitions.
The latter approach is essential for ordering: Kafka only guarantees message order within a
single partition, so routing every message with the same key to the same partition preserves
their relative order.
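The keyed-partitioning rule can be sketched in a few lines (a simplification: Kafka's default partitioner actually hashes keys with murmur2, so Python's built-in hash() stands in here):

```python
# Hash the key and take it modulo the partition count, as described above.
def choose_partition(key, num_partitions):
    if key is None:
        # Null keys are spread evenly across partitions instead of hashed.
        raise ValueError("null keys are distributed, not hashed")
    return hash(key) % num_partitions

# The same key always maps to the same partition, so all records for
# one key land in one partition and keep their relative order.
p1 = choose_partition("user-42", 6)
p2 = choose_partition("user-42", 6)
assert p1 == p2
```

One consequence worth noting: changing the number of partitions changes the modulo result, so key-to-partition mappings (and with them, per-key ordering) are only stable while the partition count stays fixed.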
Partitions are replicated across the brokers in the cluster (up to the configured replication
factor) for fault tolerance.
This concept is important for Scaling > Horizontal Scaling, since Kafka will distribute
partitions across the available instances. This distribution is done automatically because each
consumer identifies itself with a groupId property, allowing Kafka to track the members of
each group.
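The distribution of partitions across a group's members can be sketched as a simple round-robin assignment (a hypothetical stand-in: Kafka's real assignors, such as range, round-robin, and sticky, are negotiated through the group coordination protocol):

```python
# Spread partitions over the consumers in one group, round-robin.
def assign_partitions(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Six partitions shared by two consumers with the same groupId:
print(assign_partitions([0, 1, 2, 3, 4, 5], ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
```

When a consumer joins or leaves, Kafka recomputes such an assignment (a rebalance), which is why adding instances with the same groupId scales consumption horizontally up to the partition count.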
Producer
A producer application writes events into a topic. The producer API is quite simple,
although it handles a lot of complexity under the hood: acknowledgement, partition selection,
and connection pooling are all handled on the producer side.
Consumer
A consumer application reads from a topic partition. Consumers are split into groups (as
detailed in the Partition section, Kafka automatically handles this distribution).
Consumers read one message at a time. Only after acknowledging a message or batch will
the consumer pull the next one.
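The pull-and-acknowledge loop above can be sketched like this (an in-memory simplification, assuming acknowledgement is modeled as advancing a committed offset):

```python
# The consumer only advances its committed offset after processing
# succeeds, so an unacknowledged message is re-delivered on retry.
def consume(log, committed_offset, process):
    while committed_offset < len(log):
        message = log[committed_offset]
        process(message)         # may raise; offset is then NOT committed
        committed_offset += 1    # "acknowledge" by advancing the offset
    return committed_offset

handled = []
offset = consume(["a", "b", "c"], 0, handled.append)
print(handled, offset)  # ['a', 'b', 'c'] 3
```

This is the source of Kafka's at-least-once default: a crash between processing and committing means the message is processed again on the next poll.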
Aggregate tools
Confluent, Apache and the community develop many tools for the Kafka ecosystem.