Embrace the failure, stay idempotent

Hardware and software failures are the bread and butter of data engineers today. At first they sound scary: they might bring a pipeline down, delay data insights, or, even worse, deliver wrong insights to business users. Although there is no 100% protection against these errors, one concept helps embrace failure and mitigate the risk: idempotency.

Big Data

Database

In this session you'll learn various techniques for implementing idempotency in batch and streaming pipelines. They address issues such as job retries, late data arrival, and data backfilling after a code regression. The solutions are technology-agnostic, but to make them easier to understand, you'll see them applied to Apache Kafka, Apache Spark, and Apache Airflow.
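
To give a flavor of what idempotency means for a batch job, here is a minimal sketch that is not taken from the session itself. It assumes a job that writes one output partition per execution date; the function name `write_partition`, the `date=` directory layout, and the `data.json` file name are all hypothetical. Because the job overwrites its partition instead of appending to it, a retry or a backfill for the same date leaves exactly one copy of the data.

```python
import json
import os
import tempfile
from pathlib import Path


def write_partition(base_dir: Path, execution_date: str, rows: list) -> Path:
    """Overwrite the partition for execution_date (hypothetical layout).

    Running this twice with the same inputs produces the same single file,
    so retries and backfills do not duplicate data.
    """
    partition_dir = base_dir / f"date={execution_date}"
    partition_dir.mkdir(parents=True, exist_ok=True)
    target = partition_dir / "data.json"

    # Write to a temporary file in the same directory first, then swap it
    # into place, so readers never see a half-written partition.
    fd, tmp_name = tempfile.mkstemp(dir=partition_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(rows, tmp)
    os.replace(tmp_name, target)
    return target


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp())
    rows = [{"user": "a", "clicks": 3}]
    write_partition(base, "2024-01-01", rows)
    write_partition(base, "2024-01-01", rows)  # simulated retry
    print(sorted(p.name for p in (base / "date=2024-01-01").iterdir()))
```

An append-based writer would instead leave two copies of the rows after the simulated retry, which is the kind of problem the session's techniques are meant to avoid.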

After the session you should better understand the challenges of building resilient data pipelines, so that failures will be less scary!

Bartosz Konieczny

Bartosz Konieczny is a freelance data engineer and enthusiast who has been coding for 15+ years. He has held various senior hands-on positions in which he worked on many data engineering problems in batch and stream processing, such as sessionization, data ingestion, data cleansing, ordered data processing, and data migration. He enjoys solving data challenges with public cloud services and open source technologies, especially Apache Spark, Apache Kafka, Apache Airflow, and Delta Lake. You can also read his data engineering blog posts at waitingforcode.com each month.

NDC { Porto }
