Friday
Room 4
13:30 - 15:30
(UTC+01)
Half-Day
Part 1/2: Data pipelines, Documentation and Lineage with SQL & dbt
SQL is an integral part of data analysis: it is comparatively easy to learn and runs directly on a database, which makes it popular with data analysts. In practice, a common pattern is Python glue code in notebooks that executes SQL statements.
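As a minimal sketch of that glue-code pattern, the snippet below keeps a SQL statement in a Python string and runs it with the standard-library `sqlite3` module. The `orders` table, its rows, and the `REVENUE_SQL` name are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical in-memory database and table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 10.0), (2, "bob", 25.5), (3, "alice", 4.5)],
)

# The "glue" part: SQL lives in a string and is executed from Python.
REVENUE_SQL = """
    SELECT customer, SUM(amount) AS revenue
    FROM orders
    GROUP BY customer
    ORDER BY customer
"""

revenue = conn.execute(REVENUE_SQL).fetchall()
print(revenue)  # [('alice', 14.5), ('bob', 25.5)]
```

This works for a single query, but nothing here records which query depends on which table, which is the gap a structured pipeline tool is meant to close.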
dbt (data build tool) is a command-line tool for building SQL data pipelines in a structured way. It also lets you validate your data. The result is not only tables in a database, but also documentation and dependency graphs. This helps beyond data preparation: the subsequent regular analyses and evaluations can be conveniently automated, with traceability of which analyses use which data. And if the underlying data contains errors, the analyses based on it are not updated at all.
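The idea of dependency-ordered builds with validation gates can be sketched in plain Python. This is not dbt's implementation or API, only an illustration under stated assumptions: the `MODELS` and `TESTS` dictionaries, the table names, and the `run_pipeline` and `make_conn` helpers are all hypothetical. Each model is a SELECT statement with declared upstream dependencies, and a test is a query that returns offending rows (no rows means the test passes).

```python
import sqlite3
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: name -> (SELECT statement, upstream models).
MODELS = {
    "stg_orders": (
        "SELECT id, customer, amount FROM raw_orders WHERE amount IS NOT NULL",
        [],
    ),
    "customer_revenue": (
        "SELECT customer, SUM(amount) AS revenue FROM stg_orders GROUP BY customer",
        ["stg_orders"],
    ),
}

# Hypothetical tests: a query that returns offending rows; empty means pass.
TESTS = {
    "stg_orders": "SELECT * FROM stg_orders WHERE amount < 0",
}


def run_pipeline(conn):
    """Build models in dependency order; skip anything downstream of a failure."""
    graph = {name: set(deps) for name, (_, deps) in MODELS.items()}
    built, skipped, failed = [], [], set()
    for name in TopologicalSorter(graph).static_order():
        sql, deps = MODELS[name]
        if any(dep in failed for dep in deps):
            # An upstream model failed its test, so this one is not updated.
            failed.add(name)
            skipped.append(name)
            continue
        conn.execute(f"CREATE TABLE {name} AS {sql}")
        built.append(name)
        test_sql = TESTS.get(name)
        if test_sql and conn.execute(test_sql).fetchall():
            failed.add(name)
    return built, skipped


def make_conn(rows):
    """Create an in-memory database with a hypothetical raw_orders source table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    return conn


good = run_pipeline(make_conn([(1, "alice", 10.0), (2, "bob", 25.5)]))
bad = run_pipeline(make_conn([(1, "alice", 10.0), (2, "bob", -5.0)]))
print(good)  # (['stg_orders', 'customer_revenue'], [])
print(bad)   # (['stg_orders'], ['customer_revenue'])
```

In the second run the negative amount fails the test on `stg_orders`, so `customer_revenue` is skipped rather than rebuilt on bad data. The same dependency graph that drives the build order is also what a lineage view would display.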
With dbt, even a developer can have fun with SQL & data transformations!
We will cover:
- sources & models
- documentation & lineage
- testing your data & enforcing contracts
- extending dbt with plugins
- how to: CI/CD, deployment
This workshop is a kickstarter to get you going with structured data pipelines that power your analytics.