Spring Cloud Data Flow - Animated

Spring Cloud Data Flow is a platform for microservice streaming and batch data processing. It provides an event-driven programming model and supports deployment on local, Kubernetes, and Cloud Foundry platforms. It integrates with Spring Cloud Stream to provide microservices that connect to message brokers. It also includes features for updating and rolling back deployed streaming applications.

Uploaded by

Christian Tzolov
Copyright: © All Rights Reserved

Spring Cloud Data Flow

Microservice Streaming and Batch Data Processing for Cloud
dataflow.spring.io
Event-driven Applications

source.time=maven://o.s.c.s.app:time-source-rabbit:2.1.1
source.time.metadata=maven://o.s.c.s.app:time-source-rabbit:jar:metadata:2.1.1

[Diagram: deploying a stream of event-driven applications]
● Apps Registry: Get artifact; Register & Metadata (Local and CF only); app register --name time
● Stream definition: time | transform | log
● Data Flow API -> Data Flow Server -> Skipper API -> Skipper Server -> <<Deploy>> (manifest)
● Spring Cloud Stream / Spring Boot microservices: time, transform, log, deployed through the Deployer SPI
● Platforms: Local, Kubernetes, Cloud Foundry
● Message Middleware (Kafka, RabbitMQ, …) connects the applications
● Metrics: Micrometer (in each application) -> RSocket API -> Prometheus RSocket Proxy -> Scrape API -> Prometheus (TSDB) -> PromQL -> Grafana

more information: stream processing, stream developer guides, stream monitoring


Stream Programming Models - Imperative

events -> Binding "foo" -> Application -> Binding "bar" -> consumers

@EnableBinding(Processor.class)
public class Application {

    @StreamListener("foo")
    @SendTo("bar")
    public String replaceStringMsgHandler(String payload) {
        return StringUtils.replace(payload, "foo", "bar");
    }
}

more information: stream programming models


Stream Programming Models - KStream

events -> Binding "foo" -> Application -> Binding "bar" -> consumers

@EnableBinding(Processor.class)
public class Application {

    @StreamListener("foo")
    @SendTo("bar")
    public KStream<Object, Foo> handler(KStream<Object, Event> input) {
        return ...;
    }
}

more information: stream programming models


Stream Programming Models - Reactive

events -> Binding "foo" -> Application -> Binding "bar" -> consumers

@EnableBinding(Processor.class)
public class Application {

    @StreamListener("foo")
    @SendTo("bar")
    public Flux<Average> average(Flux<Sensor> data) {
        return ...;
    }
}

more information: stream programming models


Stream Programming Models - Functional

events -> Binding "foo" -> Application -> Binding "bar" -> consumers

@EnableBinding(Processor.class)
public class Application {

    @Bean
    public Function<String, String> toUpperCase() {
        return s -> s.toUpperCase();
    }
}

more information: stream programming models
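
The functional model can be tried without any broker, because the processor is just a java.util.function.Function. The sketch below wires a source, the slide's toUpperCase processor, and a sink by hand. The class name FunctionalPipelineSketch and the runOnce helper are hypothetical stand-ins for what the binder does; they are not part of Spring Cloud Stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

public class FunctionalPipelineSketch {

    // Same shape as the slide's bean: a pure String -> String processor.
    static Function<String, String> toUpperCase() {
        return s -> s.toUpperCase();
    }

    // Hypothetical stand-in for the binder: pulls one event from the source,
    // applies the processor, and hands the result to the sink.
    static <I, O> void runOnce(Supplier<I> source, Function<I, O> processor, Consumer<O> sink) {
        sink.accept(processor.apply(source.get()));
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        runOnce(() -> "foo", toUpperCase(), received::add);
        System.out.println(received); // prints [FOO]
    }
}
```

Because the processor has no framework dependency, it can be unit tested as a plain function before it is bound to a message destination.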


Features: Update and Rollback
processor.transform=maven://o.s.c.s.app:transform-processor-rabbit:V1
processor.transform=maven://o.s.c.s.app:transform-processor-rabbit:V2

[Diagram: updating and rolling back a deployed stream]
● Apps Registry: app register transform:V1, app register transform:V2
● Stream definition: time | transform | log
● Data Flow API -> Data Flow Server -> Skipper API -> Skipper Server: Deploy, Update, Rollback
● Deployer SPI on Local, Kubernetes, Cloud Foundry: time, transform (V1), log
● update to transform:V2: transform V1 is replaced by transform V2
● rollback: transform V2 is replaced by transform V1 again

more information: continuous delivery of streaming applications
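
The update and rollback flow can be pictured as a revision history per application. The sketch below is a toy model under that assumption: the class ReleaseHistorySketch and its methods are hypothetical and are not Skipper's API or algorithm. Each update appends a revision, and a rollback appends a new revision that points back at the previous version.

```java
import java.util.ArrayList;
import java.util.List;

public class ReleaseHistorySketch {

    // Ordered list of deployed versions; the last entry is the live one.
    private final List<String> revisions = new ArrayList<>();

    public ReleaseHistorySketch(String initialVersion) {
        revisions.add(initialVersion);
    }

    public String current() {
        return revisions.get(revisions.size() - 1);
    }

    // "update to transform:V2": the new version becomes the live revision.
    public void update(String newVersion) {
        revisions.add(newVersion);
    }

    // "rollback": redeploy the version that was live before the current one.
    public String rollback() {
        if (revisions.size() < 2) {
            throw new IllegalStateException("nothing to roll back to");
        }
        String previous = revisions.get(revisions.size() - 2);
        revisions.add(previous);
        return previous;
    }

    public int revisionCount() {
        return revisions.size();
    }

    public static void main(String[] args) {
        ReleaseHistorySketch transform = new ReleaseHistorySketch("V1");
        transform.update("V2");
        System.out.println(transform.current());  // prints V2
        System.out.println(transform.rollback()); // prints V1
    }
}
```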


Features: Multiple Inputs/Outputs (1)
orderGenApp || baristaApp || hotDrinkApp || coldDrinkApp

[Diagram: stream application DSL with multiple outputs]
● Data Flow API -> Data Flow Server -> Skipper Server: Create, Deploy
● Platforms: Local, Kubernetes, Cloud Foundry
● orderGenApp -> ordersDest -> baristaApp
● baristaApp -> hotDrinksDest -> hotDrinkApp
● baristaApp -> coldDrinksDest -> coldDrinkApp

--app.orderGenApp.spring.cloud.stream.bindings.output.destination=ordersDest
--app.baristaApp.spring.cloud.stream.bindings.orders.destination=ordersDest

--app.baristaApp.spring.cloud.stream.bindings.hotDrinks.destination=hotDrinksDest
--app.hotDrinkApp.spring.cloud.stream.bindings.input.destination=hotDrinksDest

--app.baristaApp.spring.cloud.stream.bindings.coldDrinks.destination=coldDrinksDest
--app.coldDrinkApp.spring.cloud.stream.bindings.input.destination=coldDrinksDest

more information: stream application DSL


Features: Multiple Inputs/Outputs (2)
@EnableBinding(Source.class)
public class OrderGenApp {

    @Autowired
    private Source source;

    @Scheduled(fixedDelay=1000)
    public void sendOrders() {
        this.source.output().send(...);
    }
}

public interface BaristaApp {

    @Input
    SubscribableChannel orders();

    @Output
    MessageChannel hotDrinks();

    @Output
    MessageChannel coldDrinks();
}

@EnableBinding(Sink.class)
public class HotDrinkApp {

    @StreamListener(Sink.INPUT)
    public void onDrink(String payload) {...}
}

@EnableBinding(Sink.class)
public class ColdDrinkApp {

    @StreamListener(Sink.INPUT)
    public void onDrink(String payload) {...}
}

--app.orderGenApp.spring.cloud.stream.bindings.output.destination=ordersDest
--app.baristaApp.spring.cloud.stream.bindings.orders.destination=ordersDest

--app.baristaApp.spring.cloud.stream.bindings.hotDrinks.destination=hotDrinksDest
--app.hotDrinkApp.spring.cloud.stream.bindings.input.destination=hotDrinksDest

--app.baristaApp.spring.cloud.stream.bindings.coldDrinks.destination=coldDrinksDest
--app.coldDrinkApp.spring.cloud.stream.bindings.input.destination=coldDrinksDest

more information: stream application DSL
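
The deployment properties above follow one pattern: a producer binding and a consumer binding are pointed at the same destination. The sketch below generates such pairs so that the two sides cannot drift apart. The class BindingPropertiesSketch and its methods are hypothetical helpers written for this illustration; only the property format is taken from the slide.

```java
import java.util.List;

public class BindingPropertiesSketch {

    // Builds one property in the format shown on the slide.
    static String destinationProperty(String app, String binding, String destination) {
        return "--app." + app + ".spring.cloud.stream.bindings." + binding
                + ".destination=" + destination;
    }

    // Points a producer's output binding and a consumer's input binding
    // at the same destination.
    static List<String> connect(String producerApp, String outputBinding,
                                String consumerApp, String inputBinding,
                                String destination) {
        return List.of(
                destinationProperty(producerApp, outputBinding, destination),
                destinationProperty(consumerApp, inputBinding, destination));
    }

    public static void main(String[] args) {
        connect("orderGenApp", "output", "baristaApp", "orders", "ordersDest")
                .forEach(System.out::println);
        connect("baristaApp", "hotDrinks", "hotDrinkApp", "input", "hotDrinksDest")
                .forEach(System.out::println);
        connect("baristaApp", "coldDrinks", "coldDrinkApp", "input", "coldDrinksDest")
                .forEach(System.out::println);
    }
}
```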


Features: Polyglot
TODO
Examples: Twitter Real Time Analytics
twittersource | counter
--counter.name=tweets
--counter.tag.expression.lang=#jsonPath(payload,'$..lang')
--counter.tag.expression.htag=#jsonPath(payload,'$.entities.hashtags[*].text')

Sample tweet payload:
{
  "text": "Today we are sharing our",
  "user": {
    "id": 2244994945,
    "name": "Twitter Dev",
    "screen_name": "TwitterDev",
    "lang": "en"
  },
  "entities": {
    "hashtags": [
      {
        "text": "documentation",
        "indices": [211, 225]
      }
    ]
  }
}

[Diagram: Twitter real-time analytics]
● Data Flow API -> Data Flow Server -> Skipper Server -> Deployment on Local, Kubernetes, Cloud Foundry
● twitter-source -> counter; the counter publishes tweets_total metrics with htag and lang tags through Micrometer
● Micrometer -> Prometheus RSocket Proxy -> Prometheus (TSDB) -> Grafana

Grafana queries:
sort_desc(topk(10, sum(tweets_total) by (lang)))
sort_desc(topk(100, sum(tweets_total) by (htag)))

more information: twitter analytics
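
To make the first query concrete, the sketch below reproduces in plain Java what sort_desc(topk(k, sum(tweets_total) by (lang))) computes: one counter per lang tag value, then the k largest in descending order. The class TagCounterSketch is a hypothetical in-memory stand-in; it uses neither Micrometer nor Prometheus, and ties are broken alphabetically only to keep the output deterministic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TagCounterSketch {

    // One running total per value of the "lang" tag.
    private final Map<String, Long> countsByLang = new HashMap<>();

    // Called once per tweet with the language extracted from the payload.
    public void increment(String lang) {
        countsByLang.merge(lang, 1L, Long::sum);
    }

    // The k languages with the highest totals, largest first.
    public List<String> topLangs(int k) {
        return countsByLang.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TagCounterSketch tweets = new TagCounterSketch();
        for (String lang : new String[] {"en", "ja", "en", "es", "ja", "en"}) {
            tweets.increment(lang);
        }
        System.out.println(tweets.topLangs(2)); // prints [en, ja]
    }
}
```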


Examples: Object Detection
TODO
cdc-debezium | fraud-detection | counter

[Diagram: credit card fraud detection]
● Data Flow API -> DataFlow Server -> Skipper Server: Deploy, Scale UP/DOWN
● Platforms: Local, Kubernetes, Cloud Foundry
● Credit Card Company: credit card purchases -> Card Tx processing -> DB (Credit card Transactions, Customers)
● DB -> CDC Debezium -> Fraud Detection (several instances) -> Fraud Counter
● Data Science (Offline): Historical Transaction Data -> Train ML models -> Pre-trained ML Fraud detection models, used by Fraud Detection
● Fraud Counter -> Micrometer -> Prometheus RSocket Proxy -> Prometheus (TSDB) -> Grafana, Alert Handler


Short-lived Applications
Composed Tasks

task.myTask=maven://o.s.c.s.app:my-task:1.1.1

Composed task definition:
<ExtractDB: myTask || ExtractFiles: myTask> && Merge: myTask

[Diagram: launching a composed task]
● Apps Registry: app register --name myTask; Register, Resolve and Metadata
● Data Flow API -> Data Flow Server -> Deployer SPI -> <<Launch>>
● Extract DB and Extract Files run in parallel; Merge runs after both
● Each task records its progress in the Task DB (Update Task Status)
● Metrics: Micrometer (in each task) -> RSocket API -> Prometheus RSocket Proxy -> Scrape API -> Prometheus (TSDB) -> Grafana
● Platforms: Local, Kubernetes, Cloud Foundry

more information: create and launch a composed task
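
The control flow of the composed task definition, two tasks in parallel followed by a third, can be sketched with CompletableFuture. This is only an analogy for the ordering: the class ComposedTaskSketch and its methods are hypothetical, and the return values exist solely to make the ordering observable. Nothing here claims that Data Flow passes data between tasks this way.

```java
import java.util.concurrent.CompletableFuture;

public class ComposedTaskSketch {

    // Hypothetical task bodies; real tasks would be separate applications.
    static String extractDb() {
        return "db-rows";
    }

    static String extractFiles() {
        return "file-rows";
    }

    static String merge(String fromDb, String fromFiles) {
        return fromDb + "+" + fromFiles;
    }

    // <ExtractDB || ExtractFiles> && Merge:
    // both extracts start concurrently, merge runs only after both finish.
    static String run() {
        CompletableFuture<String> db =
                CompletableFuture.supplyAsync(ComposedTaskSketch::extractDb);
        CompletableFuture<String> files =
                CompletableFuture.supplyAsync(ComposedTaskSketch::extractFiles);
        return db.thenCombine(files, ComposedTaskSketch::merge).join();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints db-rows+file-rows
    }
}
```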


Examples: File Ingestion
TODO
Data Flow Monitoring
OLTP vs OLAP vs Time-Series Workloads

OLTP (business transactions, business processing, business operations)
- Large number of short transactions
- Simple queries
- Low latency
- Random access

OLAP (information, business data warehouse; data mining, analytics, decision-making)
- Small number of short transactions
- Complex queries
- High latency

Time-Series, data sources
- Server metrics
- Application performance monitoring
- Resource consumption
- Network data
- Sensor data
- Events, clicks, trades ...

Time-Series, characteristics
- Timestamp ordered
- Emphasizes recent time
- Real-time scan queries
- Time-range aggregations
- Large datasets (TBs)
- Write intensive (mill/sec)
- Append only
Time Series Data
● Sequence of data points, strictly ordered by timestamps

● Uniquely identified by metric name and a set of label dimensions (e.g. tags)

● Emphasis on dimensional aggregation over individual data points

● Emphasis on recent data points

● Write-intensive: millions of data points per second, hundreds of thousands of time series

● Append-only
Time-Series Databases (TSDB)
● Optimized for time-stamped data

● Efficient storage (delta-of-delta)

● Recent data in memory

● Large time-range scans and aggregations

● Lifecycle management, retention, compaction

[Figure: data points plotted over Time, one row per series (Dimensions): {metrics="request", method="GET"}, {metrics="request", method="POST"}, {metrics="errors", method="POST"}, {metrics="errors", method="GET"}, ...]
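
The "delta-of-delta" storage mentioned above relies on samples arriving at nearly regular intervals, so the difference between consecutive deltas is usually zero or very small. The sketch below shows the arithmetic only, under simplifying assumptions: the class DeltaOfDeltaSketch is hypothetical, the first timestamp is treated as a delta from zero, and the variable-length bit packing that real databases apply on top is left out.

```java
import java.util.Arrays;

public class DeltaOfDeltaSketch {

    // Replaces each timestamp with (current delta - previous delta).
    static long[] encode(long[] timestamps) {
        long[] encoded = new long[timestamps.length];
        long previous = 0;
        long previousDelta = 0;
        for (int i = 0; i < timestamps.length; i++) {
            long delta = timestamps[i] - previous;
            encoded[i] = delta - previousDelta;
            previous = timestamps[i];
            previousDelta = delta;
        }
        return encoded;
    }

    // Inverse of encode: rebuilds deltas, then timestamps.
    static long[] decode(long[] encoded) {
        long[] timestamps = new long[encoded.length];
        long previous = 0;
        long previousDelta = 0;
        for (int i = 0; i < encoded.length; i++) {
            long delta = previousDelta + encoded[i];
            timestamps[i] = previous + delta;
            previous = timestamps[i];
            previousDelta = delta;
        }
        return timestamps;
    }

    public static void main(String[] args) {
        long[] scrapes = {1000, 1010, 1020, 1030, 1041};
        // prints [1000, -990, 0, 0, 1]
        System.out.println(Arrays.toString(encode(scrapes)));
    }
}
```

After the first two entries, regularly spaced samples encode to zeros, which is what makes the representation compact.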
Time Series DB Diversification
● Dimensional vs Hierarchical

○ dimensional: Prometheus & Influx, hierarchical: JMX

● Push vs Pull

○ Push: Influx, Pull: Prometheus

○ Short-lived vs Long-lived applications

● Client vs Server Rate Aggregation

○ Client: Influx, Server: Prometheus
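
With server-side rate aggregation, clients report cumulative counters and the database derives rates at query time. The sketch below shows the basic idea as average increase per second over a window of samples. It is a deliberate simplification: the class RateSketch is hypothetical, and it ignores counter resets and the window extrapolation that a real query engine may apply.

```java
public class RateSketch {

    // Average per-second increase of a cumulative counter across the window.
    // timestampsSeconds and values are parallel arrays ordered by time.
    static double rate(long[] timestampsSeconds, double[] values) {
        int n = timestampsSeconds.length;
        if (n < 2 || values.length != n) {
            throw new IllegalArgumentException("need at least two aligned samples");
        }
        long elapsed = timestampsSeconds[n - 1] - timestampsSeconds[0];
        if (elapsed <= 0) {
            throw new IllegalArgumentException("timestamps must increase");
        }
        return (values[n - 1] - values[0]) / elapsed;
    }

    public static void main(String[] args) {
        long[] t = {0, 15, 30, 45, 60};
        double[] counter = {100, 130, 160, 190, 220};
        System.out.println(rate(t, counter)); // prints 2.0
    }
}
```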


Micrometer
● Library for timers, gauges, counters, distribution summaries

● Generic dimensional data model

● Support for Prometheus, Netflix Atlas, CloudWatch, Datadog, Graphite, Ganglia, JMX, Influx/Telegraf, New Relic, StatsD, SignalFx, and Wavefront …

● Support TSDB diversity

● Spring Boot 2.x Integration

more information: micrometer.io, concepts


Data Flow Monitoring - InfluxDB
[Diagram: pushing metrics to InfluxDB]
● Spring Cloud Task applications (task 1, task 2) and Spring Cloud Stream applications (time, log), each with Micrometer and micrometer-registry-influx
● Micrometer -> Push -> InfluxDB (TSDB): Metrics t1, Metrics t2, ...
● InfluxDB -> CQL -> Grafana
● Platforms: Local, Kubernetes, Cloud Foundry

"spring.cloud.dataflow.applicationProperties": {
  "task.management.metrics.export.influx": {
    "enabled": true,
    "db": "myinfluxdb",
    "uri": "http://influxdb:8086"
  },
  "stream.management.metrics.export.influx": {
    "enabled": true,
    "db": "myinfluxdb",
    "uri": "http://influxdb:8086"
  }
}

more information: monitoring, stream-monitoring, task-monitoring


Data Flow Monitoring - Prometheus / RSocket-Proxy
[Diagram: scraping metrics through the Prometheus RSocket Proxy]
● Spring Cloud Task applications (task 1, task 2) and Spring Cloud Stream applications (time, log), each with Micrometer and prometheus-rsocket-spring
● RSocket Connections (bidirectional, durable) from the applications to the Prometheus RSocket Proxy, through its TCP Endpoint or Websockets Endpoint
● Proxy Scrape: monitor the Proxy itself
● Connected Scrape: monitor the services connected to the Proxy via the TCP or Websocket endpoints (Metrics t1, Metrics t2, Metrics t3, Metrics t4, ...)
● Prometheus (TSDB) -> PromQL -> Grafana
● Platforms: Local, Kubernetes, Cloud Foundry

"spring.cloud.dataflow.applicationProperties": {
  "task.management.metrics.export.prometheus": {
    "enabled": true,
    "rsocket.enabled": true,
    "rsocket.host": <proxy-host>,
    "rsocket.port": <proxy-port>
  },
  "stream.management.metrics.export.prometheus": {
    "enabled": true,
    "rsocket.enabled": true,
    "rsocket.host": <proxy-host>,
    "rsocket.port": <proxy-port>
  }
}

more information: monitoring, stream-monitoring, task-monitoring


Data Flow Auto-Scaling
[Diagram: auto-scaling a stream from Prometheus alerts]
● Stream: Source -> Processor (#1, #2, #3) -> Sink, each application instrumented with Micrometer, on Local, Kubernetes, Cloud Foundry
● Metrics: Micrometer -> Prometheus RSocket Proxy -> Scrape -> Prometheus & Alert Rules; Grafana queries Prometheus
● Alert rules: rate(Source) > rate(Processor); rate(Source) == rate(Processor)
● Prometheus -> Alert -> AlertManager -> Webhook -> SCDF Scale API: scale-out, scale-in
● DataFlow Server -> Skipper Server -> Platform Native API
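
The two alert conditions can be read as a small decision rule. The sketch below pairs rate(Source) > rate(Processor) with scale-out and rate(Source) == rate(Processor) with scale-in; that pairing is an assumption drawn from the order in which the slide lists them. The class AutoScaleSketch, the Action enum, and the tolerance parameter are hypothetical and are not part of any Data Flow or Prometheus API.

```java
public class AutoScaleSketch {

    enum Action { SCALE_OUT, SCALE_IN, NONE }

    // Compares the two rates; values within the tolerance count as equal.
    static Action decide(double sourceRate, double processorRate, double tolerance) {
        if (sourceRate > processorRate + tolerance) {
            return Action.SCALE_OUT; // processor cannot keep up with the source
        }
        if (Math.abs(sourceRate - processorRate) <= tolerance) {
            return Action.SCALE_IN;  // processor keeps pace, extra instances can go
        }
        return Action.NONE;          // processor is ahead, e.g. draining a backlog
    }

    public static void main(String[] args) {
        System.out.println(decide(100, 60, 1));  // prints SCALE_OUT
        System.out.println(decide(100, 100, 1)); // prints SCALE_IN
    }
}
```

A production rule would normally also bound the instance count and require the condition to hold for some time before firing, to avoid oscillating between the two actions.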


[Diagram: change data capture (CDC) analytics]

Stream definitions (Data Flow & Skipper Servers):
cdc-log = cdc-debezium | log
cdc-analytics-tap = :cdc-log.cdc-debezium > analytics

cdc-debezium properties:
--cdc.connector=mysql
--cdc.config.database.user=...
--cdc.config.database.password=...
--cdc.config.database.hostname=...
--cdc.config.database.port=3306
--cdc.flattening.enabled=false

analytics properties (SpEL):
--analytics.name=cdc
--analytics.tag.expression.db=#jsonPath(payload,'$..db')
--analytics.tag.expression.table=#jsonPath(payload,'$..table')
--analytics.tag.expression.operation=#jsonPath(payload,'$..op')

Change event (Source Record):
"source": {
  "connector": "mysql",
  ....
  "db": "inventory",
  "table": "customers",
},
"op": "u", ..
"before": {...},
"after": {…}

Grafana queries (PromQL):
sort_desc(topk(10, sum(cdc_total) by (db, table)))
sort_desc(topk(100, sum(cdc_total) by (operation)))
sum(rate(cdc_total[60s])) by (db, table, operation)

● cdc-log: DB write-ahead log -> cdc-debezium -> change events -> log -> Terminal (logs)
● cdc-analytics-tap: cdc-debezium -> analytics -> metrics -> Micrometer -> Prometheus & RSocket Proxy -> Grafana

● Source databases: MySQL, PostgreSQL, Oracle, DB2, SQL Server, Cassandra, MongoDB, ….
● Message Binders: Apache Kafka, RabbitMQ, Amazon Kinesis, Google PubSub, ….
● Time-Series Databases: Prometheus, Wavefront, InfluxDB, ….
● Platforms: Kubernetes, Cloud Foundry, Local ….
● Pipeline: DB -> CDC-debezium -> Analytics -> Micrometer -> Time-Series Database -> Grafana / Wavefront
