Nico Budi Darmawan Tan - Simple Outside, Complicated Inside.: Star Schema

#By Wikipedia

In computing, the Star Schema (also called star-join schema) is the simplest style of data mart schema. The start schema consists of one or more fact tables referencing any number of dimension tables. The start schema is an important special case of the snowflake schema, and is more effective for handling simpler queries. The star schema gets its name from the physical model’s resemblance to a star with a fact table at its center and the dimension tables surrounding it representing the star’s points.

Model : The star schema separates business process data into facts, which hold the measureable, quantitative data about a business, and dimensions which are descriptive attributes relate to fact data.

Fact tables : record measurements or metrics for a specific event. Fact tables generally consist of numeric values, and foreign keys to dimensional data where descriptive information is kept.

Dimension tables : usually have a relatively small number of records compared to fact tables, but each record may have a very large number of attributes to describe the fact data.

Benefits : Star schemas are denormalized, meaning the normal rules of normalization applied to transactional relational databases are relaxed during star schema design and implementation. The benefits of start schema denormalization are :

Simpler queries – start schema join logic is generally simpler than the join logic required to retrieve data from a highly normalized transactional schemas.
Simplified business reporting logic – when compared to highly normalized schemas, the star schema simplifies common business reporting logic, such as period-over-period and as-of reporting.
Query performance gains – star schemas can provide performance enhancements for read-only reporting applications when compared to highly normalized schemas.
Fast aggregations – the simpler queries against a star schema can result in improved performance for aggregation operations.
Feeding cubes – star schemas are used by all OLAP systems to build proprietary OLAP cubes efficiently ; in fact, most major OLAP systems provide a ROLAP mode of operation which can use a star schema directly as a source without building a proprietary cube structure.

Disadvantages : The main disadvantage of the star schema is that data integrity is not enforced as well as it is a highly normalized database. One-off inserts and updates can result in data anomalies which normalized schemas are designed to avoid. Generally speaking, star schemas are loaded in a highly controlled fashion via batch processing or near-real time “trickle feeds”, to compensate for the lack of protection afforded by normalization.

Star Schema Example From Wikipedia

#By Learn Data Modeling
Star Schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema that contains one or more dimensions and fact tables. It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star where one fact table is connected to multiple dimensions. The center of the star schema consists of a large fact table and it points towards the dimension tables. The advantage of star schema are slicing down, performance increase and easy understanding of data.

Steps in designing Star Schema :

Identify a business process for analysis
Identify measures or facts
Identify dimensions for facts
List the column that describe each dimension
Determine the lowest level of summary in a fact table

Important aspects of Star Schema & Snow Flake Schema :

In a star schema every dimension will have a primary key.
In a star schema, a dimension table will not have any parent table.
Whereas in a snow flake schema, a dimension table will have one or more parent tables.
Hierarchies for the dimensions are stored in the dimensional table itself in star schema
Whereas hierarchies are broken into separate tables in snow flake schema. These hierarchies helps to drill down the data from topmost hierarchies to the lowermost hierarchies.

Glossary :

Hierarchy : a logical structure that uses ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation; for example in a time dimension, a hierarchy might be used to aggregate data from the Month level to the Quarter level, from the Quarter level to the Year level. A hierarchy can also be used to define a navigational drill path, regardless of whether the levels in the hierarchy represent aggregated totals or not.
Level : a position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the Month, Quarter, and Year levels.
Fact table : a table in star schema that contains facts and connected to dimensions. A fact table typically has two types of columns : those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.

...

Source 1 : Wikipedia
Source 2 : Learn Data Modeling

Nico Budi Darmawan Tan - Simple Outside, Complicated Inside.

Friday, March 7, 2014

Star Schema - Data Warehouse

No comments:

Post a Comment