Data Stream Processing with Apache Flink
Apache Flink
Sina Nourian
Contents
1. Introduction to Stream Processing and Apache Flink
2. Auto-Parallelizing Stateful Distributed Streaming Applications
3. Flink’s Dataflow Programming Model
4. Handling Time
5. Stateful Computation
6. Flink Performance
Introduction to Stream Processing and Apache Flink
Stream Processing
Stream processing is a computer programming paradigm, equivalent to:
• Dataflow Programming
• Event Stream Programming
• Reactive Programming
A production-grade stream processor should provide:
• High throughput: millions of events per second.
• Exactly-once consistency: correct results even in case of failures.
• Out-of-order events: processing events based on their associated (event) time.
What is Apache Flink?
Apache Flink is an open source platform for scalable stream and
batch processing.
The core of Flink is a distributed streaming dataflow engine.
• Executes dataflows in parallel on clusters
• Provides a reliable backend for various workloads
Comparison
One of the strengths of Apache Flink is the way it combines many desirable
capabilities that have previously required a trade-off in other projects.
Who uses Flink?
Flink in Streaming Architectures
[Figure: Flink in a streaming architecture, covering data ingestion and ETL as well as analytics on static data (e.g., in HDFS).]
Auto-Parallelizing Stateful Distributed Streaming Applications
Stream Graph
Streaming applications are directed graphs
• Vertices are operators
• Edges are data streams.
Stream Graph
When operators are connected in chains, they expose
inherent pipeline parallelism.
When the same streams are fed to multiple operators that perform distinct tasks, they expose inherent task parallelism (see the sketch below).
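A minimal sketch (Flink DataStream API in Java; the socket source and the two transformations are hypothetical examples) of both kinds of inherent parallelism: one source feeds two operators performing distinct tasks (task parallelism), and each source-to-sink chain can run as a pipeline (pipeline parallelism).

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class InherentParallelism {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // One source stream...
        DataStream<String> events = env.socketTextStream("localhost", 9999);

        // ...feeding two operators that perform distinct tasks: task parallelism.
        DataStream<String> errors = events.filter(line -> line.contains("ERROR"));
        DataStream<Integer> lengths = events.map(new MapFunction<String, Integer>() {
            @Override
            public Integer map(String line) {
                return line.length();
            }
        });

        // Each source -> operator -> sink chain exposes pipeline parallelism.
        errors.print();
        lengths.print();

        env.execute("Inherent parallelism sketch");
    }
}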
Data Parallelism
Data parallelism involves splitting data streams and replicating operators.
In a streaming context, replication of operators is data parallelism because each
replica of an operator performs the same task on a different portion of the data.
The parallelism obtained through replication can be better balanced than the inherent parallelism of a particular stream graph, and it is easier to scale to the resources at hand.
Data parallelism also has the advantage that it is not limited by the number of operators in the original stream graph (a sketch follows).
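A minimal sketch of data parallelism through replication (assuming the Flink DataStream API in Java; the source is a hypothetical example): setParallelism() controls how many replicas of an operator process disjoint portions of the stream.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DataParallelismSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4); // default: 4 replicas of every operator

        env.socketTextStream("localhost", 9999)
           .map(s -> s.toUpperCase())
           .setParallelism(8)  // this operator alone is replicated 8 times
           .print();

        env.execute("Data parallelism sketch");
    }
}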
Data Parallelism
[Figure: a stream graph with replicated operators; the split streams are merged again before the sinks.]
Routing
When parallel regions contain only stateless operators, the splitter routes tuples in round-robin fashion, regardless of the ordering strategy.
When parallel regions have partitioned state, the splitter uses all of the attributes that define the partition key to compute a hash value. That hash value is then used to route the tuple, ensuring that tuples with the same key attribute values are always routed to the same operator replica (see the toy sketch after the lists below).
Stateful Operators:
• Join
• Aggregate
• KeyBy (PartitionBy)
• Windows
• …
Stateless Operators:
• Map
• Filter
• …
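A toy model in plain Java (not Flink's or any engine's actual splitter code) contrasting the two routing strategies described above:

public class Splitter {
    private final int numChannels; // number of downstream operator replicas
    private int next = 0;

    public Splitter(int numChannels) {
        this.numChannels = numChannels;
    }

    // Stateless parallel region: spread tuples evenly, ignoring their content.
    public int routeRoundRobin() {
        int channel = next;
        next = (next + 1) % numChannels;
        return channel;
    }

    // Partitioned state: hash all partition-key attributes, so tuples with the
    // same key values always reach the same replica.
    public int routeByKey(Object... keyAttributes) {
        int hash = java.util.Arrays.hashCode(keyAttributes);
        return (hash & 0x7fffffff) % numChannels; // mask keeps the index non-negative
    }
}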
Flink’s Dataflow Programming Model
Programs and Dataflows
Parallel Dataflows
Tasks and Operator Chains
Distributed Execution
Task Slots and Resources
Each worker (TaskManager) is a JVM process, and may execute one or more
subtasks in separate threads.
Each task slot represents a fixed subset of resources of the TaskManager. A
TaskManager with three slots, for example, will dedicate 1/3 of its managed
memory to each slot.
By adjusting the number of task slots, users can define how subtasks are isolated from each other. Having one slot per TaskManager means each task group runs in a separate JVM.
Tasks in the same JVM share TCP connections (via multiplexing) and heartbeat messages (a config sketch follows).
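The number of slots is configured per TaskManager; a minimal flink-conf.yaml excerpt:

# flink-conf.yaml
# A TaskManager with three slots dedicates 1/3 of its managed memory to each slot.
taskmanager.numberOfTaskSlots: 3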
Handling Time
Notions of Time
Processing Time
• The time that the event is observed by the machine that is processing it.
• Best performance and the lowest latency.
Event Time
• The time that an event actually happened in the real world.
Ingestion Time
• The time that the event enters the stream processing framework.
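A minimal sketch of selecting the notion of time for a job (Flink 1.2-era DataStream API):

// import org.apache.flink.streaming.api.TimeCharacteristic;
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// alternatives: TimeCharacteristic.ProcessingTime, TimeCharacteristic.IngestionTime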
Notions of Time
Event time and processing time always have a time-varying lag, called event time skew.
[Figure: processing time plotted against event time, showing the skew.]
Window
Windows are the mechanism used to group a set of events by time or some other characteristic, in order to analyze these events as a whole (e.g., to sum them up).
[Figure: a window collecting a group of events from the stream (here, a count of 7).]
Time Window
Tumbling Windows
• A tumbling window assigner assigns each element to a window of a specified size.
• Tumbling windows have a fixed size and do not overlap.
[Figure: a tumbling time window of 1 minute that sums the last minute’s worth of values.]
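A sketch of this window in the DataStream API (assuming a stream of (key, value) tuples, where field 0 is the key and field 1 the value to sum):

// import org.apache.flink.streaming.api.windowing.time.Time;
stream
    .keyBy(0)                     // partition the stream by the key field
    .timeWindow(Time.minutes(1))  // tumbling window: size only, no overlap
    .sum(1);                      // sum the value field within each window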
Time Window
Sliding Windows
• A sliding window assigner assigns elements to windows of a fixed length.
• Sliding windows overlap when the slide is smaller than the window size.
Input stream: 9 6 8 4 7 3 8 4 2 1 3 2
• Window [9 6 8 4] → sum = 27
• Window [8 4 7 3] → sum = 22
• Window [7 3 8 4] → sum = 22
• Window [8 4 2 1] → sum = 15
• Window [2 1 3 2] → sum = 8
A sliding time window that computes the sum of the last minute’s values every half minute.
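The same aggregation as a sliding window (same assumed (key, value) stream): with a 1-minute size and a 30-second slide, each element falls into two overlapping windows.

stream
    .keyBy(0)
    .timeWindow(Time.minutes(1), Time.seconds(30)) // window size, then slide
    .sum(1);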
Session Windows
• The session window assigner groups elements into sessions of activity.
• A session window closes when it does not receive elements for a certain period of time (the session gap).
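A sketch of a session window on the same assumed (key, value) stream; the 5-minute gap is an arbitrary example value:

// import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
stream
    .keyBy(0)
    .window(EventTimeSessionWindows.withGap(Time.minutes(5))) // closes after 5 minutes of inactivity
    .sum(1);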
Watermarks
The mechanism in Flink to measure progress in event time is watermarks. Watermarks flow as part of the data stream and carry a timestamp t.
A Watermark(t) declares that event time has reached time t in that stream, meaning that there should be no more elements from the stream with a timestamp t’ <= t.
[Figure: watermarks in an ordered stream, where they simply mark progress, and in an out-of-order stream, where they trail behind the event timestamps seen so far.]
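A sketch of generating watermarks with Flink's built-in bounded-out-of-orderness extractor (the MyEvent type and its getCreationTime() accessor are hypothetical): the emitted watermark trails the largest timestamp seen so far by the given bound.

// import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
DataStream<MyEvent> withTimestamps = events.assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(10)) {
        @Override
        public long extractTimestamp(MyEvent event) {
            return event.getCreationTime(); // event time in epoch milliseconds
        }
    });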
Watermarks in Parallel Streams
As the watermarks flow through the streaming program, they advance the event time at the
operators where they arrive.
Whenever an operator advances its event time, it generates a new watermark downstream
for its successor operators.
An operator’s current event time is the minimum of its input streams’ event times. As the input streams update their event times, so does the operator (see the toy model below).
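A toy model in plain Java (not Flink's internal implementation) of this rule: the operator tracks the latest watermark per input channel, its event time is the minimum across channels, and it forwards a new watermark downstream only when that minimum advances.

import java.util.Arrays;

class EventTimeTracker {
    private final long[] channelWatermark; // latest watermark seen per input channel
    private long currentEventTime = Long.MIN_VALUE;

    EventTimeTracker(int numChannels) {
        channelWatermark = new long[numChannels];
        Arrays.fill(channelWatermark, Long.MIN_VALUE);
    }

    // Handle Watermark(t) arriving on one channel; returns the watermark to
    // forward downstream, or null if event time did not advance.
    Long onWatermark(int channel, long t) {
        channelWatermark[channel] = Math.max(channelWatermark[channel], t);
        long min = Long.MAX_VALUE;
        for (long w : channelWatermark) {
            min = Math.min(min, w); // operator event time = minimum over inputs
        }
        if (min > currentEventTime) {
            currentEventTime = min;
            return min;  // event time advanced: emit Watermark(min) downstream
        }
        return null;     // still waiting on a slower input channel
    }
}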
Late Elements
Late elements are elements that arrive after the system’s event time clock (as signaled by the watermarks) has already passed the time of their timestamps.
Delaying the watermarks long enough to cover every late element is often not desirable, because it delays the evaluation of the event time windows by too much. Instead, some streaming programs will explicitly expect a number of late elements (see the sketch below).
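One way to accommodate an expected amount of lateness is the window API's allowed lateness (a sketch on the same assumed (key, value) stream; the 1-minute bound is an arbitrary example): window state is kept after the watermark passes the end of the window, so late elements within the bound still update the result instead of being dropped.

stream
    .keyBy(0)
    .timeWindow(Time.minutes(1))
    .allowedLateness(Time.minutes(1)) // accept elements that are up to one minute late
    .sum(1);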
Stateful Computation
Stateful Computation
A stateful program creates output based on multiple events taken together. Examples:
• All types of windows.
• All kinds of state machines used for complex event processing (CEP).
• All kinds of joins between streams, as well as joins between streams and static or slowly changing tables.
[Figure: stateless stream processing, where an operator maps inputs to outputs, vs. stateful stream processing, where the operator also reads and updates state.]
Notions of Consistency
Consistency is, really, a different word for
“level of correctness”
• How correct are my results after a failure and a successful
recovery compared to what I would have gotten without
any failures?
Assume that we are simply counting user logins
within the last hour.
• What is the count (the state) if the system experiences a
failure?
Notions of Consistency
People distinguish between three different levels of
consistency:
• At most once: At most once is really a euphemism for no correctness guarantees whatsoever; the count may be lost after a failure.
• At least once: At least once, in our setting, means that the counter value may be bigger than, but never smaller than, the correct count. So, our program may over-count in a failure scenario, but it guarantees that it will never under-count.
• Exactly once: Exactly once means that the system
guarantees that the count will be exactly the same as it
would be in the failure-free scenario.
Checkpoints: Guaranteeing Exactly Once
ABS (Asynchronous Barrier Snapshotting) for Acyclic Dataflows
ABS for Cyclic Dataflows
Checkpointing in Flink
Barriers
• Injected into the data stream and flow with the records as part of the
data stream.
• A barrier separates the records in the data stream into the set of
records that goes into the current snapshot, and the records that go
into the next snapshot.
• Barriers do not interrupt the flow of the stream and are hence very
lightweight.
[Figure: barriers n-1 and n flowing with the records through an operator; each barrier separates the records of one snapshot from those of the next.]
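A minimal sketch of turning the mechanism on: enabling checkpointing makes Flink inject barriers into the data stream at the configured interval.

// import org.apache.flink.streaming.api.CheckpointingMode;
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(5000); // start a snapshot (inject barriers) every 5 seconds
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);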
Checkpointing in Flink
Streams that have already delivered barrier n are temporarily set aside: records received from these streams are not processed, but put into an input buffer.
[Figure: records arriving behind barrier n wait in an input buffer while the operator awaits the barrier on its other inputs.]
Checkpointing in Flink
Once the last stream has received barrier n, the operator emits all pending
outgoing records, and then emits snapshot n barriers itself.
[Figure: after receiving barrier n on its last input, the operator emits barrier n on its outputs.]
Checkpointing in Flink
After that, it resumes processing records from all input streams, processing
records from the input buffers before processing the records from the
streams.
[Figure: the operator first drains the buffered records, then resumes processing records from the streams.]
Stateful Operations
When operators contain any form of state, this state must be part of the snapshots as well. Operator state comes in different forms:
• User-defined state: state that is created and modified directly by the transformation functions (like map() or filter()).
• System state: data buffers that are part of the operator’s computation. A typical example of this kind of state is the window buffers.
Stateful Operations
Flink offers the user facilities to define state.
An example:
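The code on the original slide is not preserved; the following comparable sketch (modeled on the ValueState example in the Flink documentation) keeps a running (count, sum) per key and emits the average of every two elements:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class CountWindowAverage extends RichFlatMapFunction<Tuple2<Long, Long>, Tuple2<Long, Long>> {

    private transient ValueState<Tuple2<Long, Long>> sum; // per-key (count, running sum)

    @Override
    public void open(Configuration config) {
        ValueStateDescriptor<Tuple2<Long, Long>> descriptor = new ValueStateDescriptor<>(
                "average", // the state name
                TypeInformation.of(new TypeHint<Tuple2<Long, Long>>() {}));
        sum = getRuntimeContext().getState(descriptor);
    }

    @Override
    public void flatMap(Tuple2<Long, Long> input, Collector<Tuple2<Long, Long>> out) throws Exception {
        Tuple2<Long, Long> current = sum.value();   // state for the current key
        if (current == null) {
            current = Tuple2.of(0L, 0L);
        }
        current.f0 += 1;                            // count
        current.f1 += input.f1;                     // running sum
        sum.update(current);

        if (current.f0 >= 2) {                      // every two elements, emit the average
            out.collect(Tuple2.of(input.f0, current.f1 / current.f0));
            sum.clear();
        }
    }
}

Applied as stream.keyBy(0).flatMap(new CountWindowAverage()), the state is scoped to the key of the current element and is automatically included in Flink’s checkpoints.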
Stateful Operations
[Figure: at a checkpoint, Flink saves the sources’ positions in the stream and asynchronously backs up the operator state (the counters).]
Stateful Operations
[Figure: once the checkpoint completes, it becomes the recovery point; after a process failure, Flink restores the counters and resumes from the saved stream positions.]
Batch API
Apache Flink completes the TeraSort job in about half the time of Apache Spark. For very small inputs, Flink’s execution time is negligible, while Apache Spark still needs a significant amount of time to complete the job.
Batch API
Apache Flink sustains a roughly constant rate of incoming and outgoing network traffic, whereas Apache Spark does not sustain such a constant rate.
Batch API
The behaviour of the disk mirrors the behaviour of the network.
Stream API
The fact that Apache Flink is fundamentally based on data streams is clearly reflected in the latencies: the mean latency of Apache Flink is 54 ms (with a standard deviation of 50 ms), while the latency of Apache Spark is centered around 274 ms (with a standard deviation of 65 ms).
Stream API
Spark Streaming suffers from a throughput-latency tradeoff. As batches
increase in size, latency also increases. If batches are kept small to
improve latency, throughput decreases. Storm and Flink can both sustain
low latency as throughput increases.
Conclusion
Flink is an open-source framework for distributed stream
processing that:
• Provides results that are accurate, even in the case of out-of-order or late-arriving data.
• Is stateful and fault-tolerant and can seamlessly recover from failures
while maintaining exactly-once application state.
• Performs at large scale, running on thousands of nodes with very good
throughput and latency characteristics.
• Guarantees exactly-once semantics for stateful computations.
• Supports stream processing and windowing with event time semantics.
• Supports flexible windowing based on time, count, or sessions in addition
to data-driven windows.
• Has lightweight fault tolerance, allowing the system to maintain high throughput rates while providing exactly-once consistency guarantees.
• Is capable of high throughput and low latency (processing lots of data quickly).
References
Apache Flink Documentation
• https://ci.apache.org/projects/flink/flink-docs-release-1.2/
Introduction to Apache Flink
• https://www.mapr.com/introduction-to-apache-flink
Auto-Parallelizing Stateful Distributed Streaming Applications
• http://dl.acm.org/citation.cfm?id=2370826
The Dataflow Model: A Practical Approach to Balancing
Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-
of-Order Data Processing
• https://research.google.com/pubs/pub43864.html
Lightweight Asynchronous Snapshots for Distributed Dataflows
• http://arxiv.org/abs/1506.08603
Apache Flink: Distributed Stream Data Processing
• https://cds.cern.ch/record/2208322/files/report.pdf