Data Warehouse Toolkit Chapter V - notes

23 Nov 2014

Sometimes fact table which seems a monolith at the beginning turns out to contain facts which are not compatible with each other - this requires creating a separate fact table for each identified type. Indicators that this might be the case:
- Granularity levels.
- Facts are completely different from each other - they describe different business processes
- Users want to analyze those groups of facts separately
Accumulating snapshot models something that has well-defined milestones.
Types of dimension changes (… i have a weird feeling that i already seen that somewhere before …)
- Type 0: retain original - nothing ever changes
- Type 1: overwrite value in row
- Type 2: add a new row and reference that new row only in new facts
  - You may want to add additional columns which denote the effective and expiration dates (dates when the value started being used and when it stopped).
- Mix of Type 1 and Type 2
- Type 3: add a new attribute (column) (waaaat?) - do this only when the change impacts a lot of rows in dimensions
- Type 4: Mini-dimension - create a new dimension (with small number of rows) which contains the attributes which frequently change. Unfortunately you need to add a new key to fact table which will refer to the mini-dimension.
- Type 5: add mini-dimension key to the primary dimension. Overwrite the key reference with each profile change.
- Type 6: Type 1 (overwrite) attributes in Type 2 (adding new row) dimension (eg. a column denoting the current value of something - there might be a second column which will denote the historic value)
- Type 7: Dual Type 1 and Type 2 dimensions

It seems that those types are mostly about introducing a combination of volatile (get updated over time) and non-volatile columns which present historical values. If you look at it from hadoop perspective (we can pretty much only add - unless you use hbase) - all types including updates (1,4,5,6,7) don’t have much use.

braindump ... cause thread-dumps are not enough ;)

Data Warehouse Toolkit Chapter V - notes

Related Posts

Model stores in ML platforms 30 Mar 2021

Feature stores from Data Engineering perspective 30 Mar 2021

Sales Force Marketing Cloud REST API upload errors 23 Mar 2021