Thursday
Room 4
11:40 - 12:40
(UTC+01)
Talk (60 min)
Embrace the failure, stay idempotent
Hardware and software failures are the bread and butter of data engineers today. At first, they sound scary, as they might bring a pipeline down, delay data insights, or, even worse, deliver wrong insights to business users. Although no protection against those errors is 100% effective, one concept helps embrace failure and mitigate the risk: idempotency.
In this session, you'll learn various techniques for implementing idempotency in batch and streaming pipelines. They address issues such as job retries, late data arrival, and data backfilling after a code regression. The solutions are technology-agnostic, but to make them easier to understand, you'll see them applied to Apache Kafka, Apache Spark, and Apache Airflow.
After the session, you should better understand the challenges of building resilient data pipelines, so that failures will be less scary!
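To give a flavor of what idempotency means for a retried batch job, here is a minimal sketch that is not taken from the talk itself. It assumes a job that writes its output to a directory keyed by the run date and replaces that directory's file on every run instead of appending to it. The names `write_partition` and `read_all`, the `date=...` directory layout, and the `part-0000.json` file name are hypothetical choices made for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_partition(base_dir: Path, run_date: str, rows: list) -> Path:
    """Write the rows for one run date, replacing any previous output."""
    partition = base_dir / f"date={run_date}"
    partition.mkdir(parents=True, exist_ok=True)
    target = partition / "part-0000.json"
    tmp = partition / "part-0000.json.tmp"
    # Write to a temporary file first, then swap it in, so a crash
    # mid-write does not leave a half-written target file behind.
    tmp.write_text(json.dumps(rows))
    tmp.replace(target)  # overwrites the target if it already exists
    return target


def read_all(base_dir: Path) -> list:
    """Read every row from every date partition."""
    rows = []
    for path in sorted(base_dir.glob("date=*/part-*.json")):
        rows.extend(json.loads(path.read_text()))
    return rows


# Hypothetical usage: the same run is executed twice, as after a retry.
base = Path(tempfile.mkdtemp())
orders = [{"id": 1, "amount": 10}]
write_partition(base, "2024-01-01", orders)
write_partition(base, "2024-01-01", orders)  # retry of the same run
print(len(read_all(base)))  # 1, not 2: the retry did not duplicate data
```

Because the output location depends only on the run date, running the job once or several times leaves the same result, which is the property the abstract refers to. An append-based writer would have produced duplicates on the retry.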