Philosophy

diepvries has been designed with the following in mind:

Convention over configuration

Strong naming conventions for table and column names allow diepvries to infer the role of a given entity from its name alone. For example, h_customer is a hub, hs_customer is a hub satellite, and h_customer_hashkey is the hashkey of the customer hub. These conventions have been carefully crafted to remove ambiguities and make everything predictable. On top of this, Data Vault tables are easier to analyze when they are sorted by type, which comes for free with the table prefixes.

Relying on conventions rather than configuration lets you focus on business logic and keeps the code base free of configuration files, which are hard to maintain and allow inconsistencies to creep in.
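
As a rough illustration of how a role can be read from a name, here is a minimal sketch that uses only the three conventions mentioned above (the h_ and hs_ prefixes and the _hashkey suffix). The function infer_role is hypothetical and is not part of the diepvries API; the library's own logic may differ.

```python
def infer_role(name: str) -> str:
    """Guess the Data Vault role of a table or column from its name alone.

    Hypothetical helper for illustration; not part of diepvries.
    """
    # Check the column suffix first: h_customer_hashkey also starts with "h_".
    if name.endswith("_hashkey"):
        return "hashkey"
    if name.startswith("hs_"):
        return "hub satellite"
    if name.startswith("h_"):
        return "hub"
    return "unknown"


for name in ("h_customer", "hs_customer", "h_customer_hashkey"):
    print(name, "->", infer_role(name))

# Sorting by name groups tables by type, because the prefix comes first.
tables = sorted(["hs_customer", "h_order", "hs_order", "h_customer"])
print(tables)
```

Because the role is derived from the name, no separate configuration file has to state that h_customer is a hub.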

Automatic field mapping

Field mapping (between source systems and Data Vault entities) is often done manually, which makes it error-prone. diepvries assumes that the field names in an extraction table are the same as in the Data Vault tables, which allows a natural mapping based on identical names.
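
The idea of mapping by identical names can be sketched in a few lines. This is an illustration under stated assumptions rather than diepvries' actual implementation: map_fields and all column names below are hypothetical.

```python
def map_fields(extraction_columns, target_columns):
    """Split target columns into those found by name in the extraction table
    and those that are missing from it.

    Hypothetical helper for illustration; not part of diepvries.
    """
    available = set(extraction_columns)
    mapped = [c for c in target_columns if c in available]
    missing = [c for c in target_columns if c not in available]
    return mapped, missing


# Hypothetical column names, for illustration only.
extraction_columns = ["customer_id", "customer_name", "customer_email", "order_id"]
hs_customer_columns = ["customer_name", "customer_email", "customer_phone"]

mapped, missing = map_fields(extraction_columns, hs_customer_columns)
print("mapped:", mapped)
print("missing:", missing)
```

No mapping table is maintained by hand: a target column is fed by the extraction column with the same name, and a column without a match is easy to detect.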

No external dependencies

With the exception of the Python Snowflake connector (to be able to automatically deserialize tables into Python objects), diepvries doesn’t depend on anything but a Python runtime. It is a standalone tool that does not bring outdated or conflicting packages into your dependency tree. As a result, it is very lightweight.

Restricted scope

diepvries is not a full-blown solution to manage your Data Vault. Instead, it focuses on one thing and does it well: generating SQL to load data into a Data Vault. This makes it a very flexible tool to integrate into your environment: it does not care how data reaches the extraction tables, or how the generated SQL is executed.