Thursday 

Workshop Room 

16:20 - 17:20 

(UTC+01)

Workshop (60 min)

Part 1/2: Stream the word with Apache Flink and Apache Spark

Stream processing is often compared with batch processing in terms of latency. While the latency difference is real, there are many other technical differences, such as watermarks, state stores, and the micro-batch or dataflow processing models. They make streaming exciting, but also more challenging for engineers used to working with batch systems.
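To give a taste of one of these concepts, here is a minimal, framework-free sketch of how a watermark lets a streaming engine decide when an event is too late to process. The event structure and the lateness threshold are illustrative only, not taken from the Spark or Flink APIs:

```python
# Toy watermark: track the highest event time seen so far, allow a fixed
# lateness, and drop events that arrive behind the watermark.
ALLOWED_LATENESS = 5  # seconds; illustrative value

def process(events):
    max_event_time = 0
    accepted, dropped = [], []
    for event_time, word in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - ALLOWED_LATENESS
        if event_time >= watermark:
            accepted.append(word)
        else:
            dropped.append(word)  # behind the watermark: considered too late
    return accepted, dropped

accepted, dropped = process([(10, "a"), (12, "b"), (3, "late"), (13, "c")])
# the event with timestamp 3 arrives when the watermark is already at 7,
# so it is dropped; the others are accepted
```

Real engines apply the same idea per partition and per window, which is one of the gotchas the workshop covers.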

Big Data

In this workshop you're going to see two stream processing models in action. The first is the micro-batch model, which should ease you into the streaming world and show the basic concepts in a familiar way. The second is the dataflow model, which has nothing to do with batch processing and therefore requires a bigger mental shift. Both parts will be covered with Open Source data processing frameworks: Apache Spark for the micro-batch part, and Apache Flink for the dataflow model.
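The difference between the two models can be sketched in plain Python (deliberately without the Spark or Flink APIs), using the word count theme from the workshop title. Both functions produce the same counts; what differs is when the state is updated:

```python
from collections import Counter

words = ["spark", "flink", "spark", "kafka", "flink", "spark"]

# Micro-batch model: buffer the stream into small batches and run a
# batch-style job over each one, carrying the state between runs.
def micro_batch_count(stream, batch_size):
    counts = Counter()
    for i in range(0, len(stream), batch_size):
        batch = stream[i:i + batch_size]  # one micro-batch
        counts.update(batch)              # a "batch job" per micro-batch
    return counts

# Dataflow model: each record flows through the operator as soon as it
# arrives and updates the state immediately, with no batching step.
def dataflow_count(stream):
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts
```

On a bounded list the results are identical; on an unbounded stream, the micro-batch model trades some latency for the comfort of batch-style semantics, which is why it is the gentler entry point.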

By the end of this workshop, as someone coming from batch processing, you should better understand the streaming world and be able to write your first streaming jobs while taking all the gotchas into account.

Bartosz Konieczny

Bartosz Konieczny is a freelance data engineer who has been coding for 15+ years. He has held various senior hands-on positions that let him work on many data engineering problems in batch and stream processing, such as sessionization, data ingestion, data cleansing, ordered data processing, and data migration. He enjoys solving data challenges with public cloud services and Open Source technologies, especially Apache Spark, Apache Kafka, Apache Airflow, and Delta Lake. He also publishes data engineering blog posts at waitingforcode.com every month.