Two common designs for a Data Warehouse are identified as a “Star Schema” and a “Snowflake Schema”. When we discuss the differences and individual preferences, usually the gloves come off and DBA’s will duke it out. It becomes a religious argument with plenty of passion and zeal. Let’s take a look at why… The definition of a Star Schema is one in which all dimension tables are directly related to the fact table through foreign key relationships. It is the simplest form of Data Warehouse design where complex dimension relationships are collapsed into a single layer of dimension tables. Looking at the diagram of the design shows a “Star” type figure with the fact table at the center, hence the name. The Snowflake Schema gets its name from an extra layer or layers of dimension tables which are NOT directly related to the fact table. The extra layers make the star look more like a “Snowflake” supposedly although it normally takes a few eggnogs to see the similarity. The dimension tables are effectively normalized to yield additional relationships. Normalized design is essential in high performing OLTP databases where data needs to be stored in one place, and one place only, to ensure efficient updates and space savings. One exception is where we selectively choose to violate the rules of normalization in the name of performance costing redundant storage. This is called “denormalization” and should be performed with extreme care. With a Data Warehouse we can relax the rules of normalization a little more and denormalize as long as we buy off on the extra space needed. As always it comes down to performance versus storage. The Star Schema effectively is a denormalized schema costing extra storage. Proponents would argue this is done for performance reasons and will also emphasize the “keep it simple stupid” or “KISS” approach to database development just to wind-up the Snowflake gurus. The Snowflake Schema is more normalized and saves space and redundancy. Skeptics would argue that it does this at the expense of performance degradation caused by excessive joins. Of course, there’s a bit of Star and Snowflake in all of us. The typical Time Dimension in both schemas is really a collapsed snowflake-turned-star schema design with Year, Quarter, Month dimensions collapsed into a single table. Some older analysis products actually required a Star Schema. Fortunately, Microsoft SQL Server Analysis Services (SSAS) now allows us a choice as it supports both schema designs. To calm people down, I usually introduce the concept of a “Constellation” or “Galaxy” schema which actually contains multiple Star and/or Snowflake Schemas combined so we can look to the heavens knowing there is a common solution for all of us. And SSAS supports that too, just so we can all coexist. Thankfully. Cheers, Brian.
Star vs Snowflake Schemas – what’s your belief?
An age old argument comes down to storage vs performance...so what's new?
Copyright © 2011 IDG Communications, Inc.