Nico Budi Darmawan Tan - Simple Outside, Complicated Inside.: Slowly Changing Dimensions

Slowly Changing Dimensions (SCDs) refers to the process of tracking and analyzing attribute relationship that change over time. For example, a retail company has numerous stores that are assigned to various managers. The sales organization hierarchy in their data model is structured as follows :

Although the relationships between districts, regions and managers do not change over time, the managers assigned to store change as they are reassigned or as they enter or leave the company.

SCDs is also referred to as versioning in MicroStrategy.

Although SCDs are well documented in the data warehousing literature; there are different terminologies used to distinguish the different types of SCDs. In this lesson we will discuss four types of SCDs, they are :

#As Is vs. As Is (Type 1 SCDs)

As Is vs. As Is (Type 1) involves analyzing all data in accordance with the attribute relationships as they exist currently. Regardless of how relationships have changed over time, you aggregate and qualify all data (current and historical) based on the current values in the lookup and relationship tables. If aggregate tables exists, you either have to modify how the values roll up to reflect the current attribute relationships, or you have to ignore the tables when you perform this type of analysis.

Notice that for Missy and Jim, the sales for October are for the only the stores they manage as of November. For Liz even though did not start as an employee untill November, she has sales for October.

#As Is vs. As Was (Type 2 SCDs)

As Is vs. As Was (Type 2) involves analyzing all data in accordance with the attribute relationships as they exist currently and as they existed historically. You aggregate and qualify data based on the values in the lookup and relationship tables that correspond to the desired timeframe. If aggregate tables exists, the logic behind how the values roll up may differ based on the time period that you query.

As the manager-store relationships change over time, there will be multiple records not only to map managers to their current stores, but also to map them to every store to which they were previously assigned. You can store these relationships in a single lookup table along with data range values and flags that indicate the time period when a particular relationship existed.

Since Liz did not start as a manager until November, she does not have any sales number for October.

#Like vs. Like

Like vs. Like, also referred to as comparable analysis, involves analyzing only data records that exist and are identical for the querying time period. In other words, only data relationships that have not changed over time are included in the result set. Similar to the As Is vs. As Was or Type 2 SCDs, the schema preserves both the historical and current relationships. As a result, when you run a report to aggregate the total sales for each manager, only data that exists unchanged are part of the final result set.

Missy and Jim were reassigned stores in November and Liz came on as a new manager in November. Only Jena stayed identical in both time periods. Thus, the report contains sales from those stores that had the same managers in both time periods.

#As Was vs. As Was

As Was vs. As Was involves analyzing data only in accordance with the attribute relationships as the existed historically. Similar to the As Is vs. As Was or Type 2 SCDs, the schema preserves both the historical and current relationships. With this type of analysis, however, you reference only historical relationships in queries. As a result, when you run a report to aggregate the sales for each manager, the fact table rolls up the store sales to which the manager was historically assigned, not the store that they are currently assigned.

Missy’s sales for both months roll up into the stores that she was initially assigned to even though in November she no longer managers Metro South and is instead responsible for Metro West. Similarly, Jim’s sales for both months roll up into the three stores he was initially assigned to even though he no longer manages Metro West in November. Since Liz did not start as an employee until November, she is not included in the report. To include her November sales for the Metro South store, you must query for data based on current relationships.

#Summary of Four Types of SCDs

Each of these types of analysis returns a different result set when querying the same data.

You can implement SCDs by :

1. Creating a life stamp.
With a life stamp solution, yo include a start date and end date in the lookup table that indicate the time period for which specific records are valid. When you run reports, you can then use filters with the appropriate date ranges to determine how data is aggregated.

Notes :

The current_Flag column in the LU_Store table indicates whether each record in the table contains a current value (Y) or historical value (N). You can easily perform As Is vs. As Is (Type 1) analysis on this table by simply filtering on the Current Flag being set to “Y”.

If you have all of your current records set to the same arbitrary end date (in this example, 12/31/2099), you could also retrieve current data from the table by filtering on the End Date attribute. Only the current records would be associated with the arbitrary end date.

Implementing SCDs using life stamp requires a time filter, you either have to create all of the appropriate filters and prompt users on which one to select for a particular query, or users have to understand the data well enough to create their own filters.

My Notes :

MSTR is not support theta join, so users must use filters or flag on their own report.
These are joins where the evaluation condition is something other than equals (=). Since a traditional implementation may require a "join between", this eliminates a lot of options. (Bryan Brandow)

2. Creating a hidden atributte that relates to both current historical values.
With a hidden attribute solution, you have multiple look up tables that store current and historical values, and you use the hidden attribute to facilitate joins between the lookup tables and any fact tables.

My Notes :

This method can only contain 1 historical attribute. I think it’s similar with Type 3 or Type 4 of SCD‘s Approach. So I won’t recommend it.

3. By denormalizing the associated fact tables
A final alternative for implementing SCDs is to denormalize the fact tables that contain the “versioned” attribute. You can denormalize the fact table to include the lowest level attribute as well as the higher-level attribute, thus moving the relationship in which the versioning occurs into the fact table itself.

My Notes :

The fact table will be larger especially in data volume.

...

Related With : Slowly Changed Dimensions - Data Warehouse

...

Source 1 : MicroStrategy BI – Course Work 9.31 – Advance Datawarehousing
Source 2: Bryan Brandow – MicroStrategy Blog

Nico Budi Darmawan Tan - Simple Outside, Complicated Inside.

Tuesday, March 4, 2014

Slowly Changing Dimensions - MicroStrategy

No comments:

Post a Comment