Slowly Changing Dimensions Type 2, without the headache
The essential pattern for tracking history in dimensions — explained with a concrete example and the most common mistakes.
Every time someone asks "what was this customer's address when the sale happened?", you're facing a Slowly Changing Dimension (SCD) problem. And the answer, most of the time, is SCD Type 2.
The problem
Imagine a customer dimension. A customer moves to a new city. If you simply update the row (SCD Type 1), you lose where they were before — and your historical sales now point to the wrong city.
The solution: Type 2
In SCD Type 2, you never update an existing row for a tracked attribute. Instead, you close the current row and insert a new version. Each row gets three control columns:
valid_from— when the version became effectivevalid_to— when it stopped being effective (orNULL/future date if active)is_current— boolean flag for the live version
-- Close the previous version
UPDATE dim_customer
SET valid_to = current_date(), is_current = false
WHERE customer_id = 42 AND is_current = true;
-- Insert the new version
INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current)
VALUES (42, 'Lisbon', current_date(), NULL, true);
The fact table always references the surrogate key of the correct version at the time of the event — not the customer's natural key.
The most common mistakes
- Using the natural key in the fact. Always use the surrogate key of the version valid at the time.
- Forgetting idempotency. If the pipeline runs twice, it must not create duplicate versions. Compare attributes before generating a new version.
- Tracking too many attributes. Not every field needs history. Consciously decide what is Type 1 and what is Type 2.
In the Lakehouse, Delta Lake's MERGE makes Type 2 far more elegant than it was in the SSIS days. But the concept has been the same for decades — and it remains one of the fundamentals that most separates those who can model data from those who can only load tables.
Related articles
Incremental ingestion: stop reloading everything every night
Watermarking, change data capture and the patterns that cut cost and processing windows in ETL pipelines.
Read articleMedallion Architecture: the pattern that organizes your Lakehouse
How the Bronze, Silver and Gold layers turn a chaotic data lake into a reliable, auditable platform.
Read articleEnjoyed this? Check out the e-books for in-depth content.
E-books