Dimensional modelling is a data structure technique optimized for data warehousing and analysis. It involves fact tables containing numeric data linked to dimension tables that provide context. Dimensional models arrange data in a way that makes it easier to retrieve information and generate reports compared to normalized relational models. Key elements include facts, dimensions, attributes and star or snowflake schemas linking fact and dimension tables. Dimensional modelling involves identifying business processes, granularity of data, dimensions, facts and implementing star or snowflake schemas. It provides benefits like standardization, flexibility and optimized performance for analysis.
Dimensional Modelling
What is a Dimensional Model?
• A dimensional model is a data structure technique optimized for data warehousing tools.
• The concept of Dimensional Modelling was developed by Ralph Kimball and consists of "fact" and "dimension" tables.
• A dimensional model is designed to read, summarize and analyze numeric information such as values, balances, counts and weights in a data warehouse.
• In contrast, relational models are optimized for the addition, updating and deletion of data in a real-time Online Transaction System.
• The dimensional and relational models each store data in their own way, and each has specific advantages.
• In the relational model, normalization and ER models reduce redundancy in data. The dimensional model, by contrast, arranges data so that it is easier to retrieve information and generate reports.
• Dimensional models are used in data warehouse systems and are not a good fit for transactional (OLTP) systems.
Elements of Dimensional Data Model
• Fact
Facts are the measurements or metrics from your business process. For a Sales business process, a measurement would be the quarterly sales number.
• Dimension
A dimension provides the context surrounding a business process event. In simple terms, dimensions give the who, what and where of a fact. In the Sales business process, for the fact quarterly sales number, the dimensions would be
– Who: Customer Names
– Where: Location
– What: Product Name
In other words, a dimension is a window through which to view the information in the facts.
• Attributes
Attributes are the various characteristics of a dimension. In the Location dimension, the attributes can be
– State
– Country
– Zipcode, etc.
Attributes are used to search, filter or classify facts. Dimension tables contain attributes.
• Fact Table
A fact table is the primary table in a dimensional model. A fact table contains
– Measurements/facts
– Foreign keys to the dimension tables
• Dimension table
– A dimension table contains the dimensions of a fact.
– Dimension tables are joined to the fact table via a foreign key.
– Dimension tables are denormalized.
– The dimension attributes are the columns of a dimension table.
– Dimensions offer descriptive characteristics of the facts through their attributes.
– There is no set limit on the number of dimensions.
– A dimension can also contain one or more hierarchical relationships.
Steps of Dimensional Modelling
• The accuracy of the dimensional model determines the success of the data warehouse implementation.
• Here are the steps to create a dimensional model:
1. Identify Business Process
2. Identify Grain (level of detail)
3. Identify Dimensions
4. Identify Facts
5. Build Schema
• The model should describe the Why, How much, When/Where/Who and What of your business process.
Step 1) Identify the business process
• Identify the actual business process the data warehouse should cover. This could be Marketing, Sales, HR, etc., depending on the data analysis needs of the organization.
• The selection of the business process also depends on the quality of the data available for that process. It is the most important step of the data modelling process, and a failure here causes cascading and irreparable defects.
• To describe the business process, use plain text, basic Business Process Modelling Notation (BPMN) or Unified Modelling Language (UML).
Step 2) Identify the grain
• The grain describes the level of detail for the business problem/solution. Identifying it means deciding the lowest level of information held in any table in your data warehouse. If a table contains sales data for every day, it has daily granularity. If a table contains total sales data for each month, it has monthly granularity.
• During this stage, we answer questions like
– Do we need to store all the available products or just a few types of products? This decision is based on the business processes selected for the data warehouse.
– Do we store the product sale information on a monthly, weekly, daily or hourly basis?
This decision depends on the nature of the reports requested by executives.
– How do the above two choices affect the database size?
• Example of Grain:
• The CEO at an MNC wants to find the sales for specific products in different locations on a daily basis.
• So, the grain is "product sale information by location by the day."
Step 3) Identify the dimensions
• Dimensions are nouns like date, store, inventory, etc. Dimensions are where the descriptive data is stored. For example, the date dimension may contain data like the year, month and weekday.
Example of Dimensions:
• The CEO at an MNC wants to find the sales for specific products in different locations on a daily basis.
• Dimensions: Product, Location and Time
• Attributes: For Product: Product key (the primary key, referenced as a foreign key from the fact table), Name, Type, Specifications
• Hierarchies: For Location: Country, State, City, Street Address, Name
Step 4) Identify the Fact
• This step is closely tied to the business users of the system, because this is where they get access to the data stored in the data warehouse. Most fact table columns hold numeric values such as price or cost per unit.
Example of Facts:
• The CEO at an MNC wants to find the sales for specific products in different locations on a daily basis.
• The fact here is the sum of sales by product by location by time.
Step 5) Build Schema
• Implement the dimensional model. A schema is the database structure (the arrangement of tables). There are two popular schemas:
Star Schema
• The star schema architecture is easy to design. It is called a star schema because its diagram resembles a star, with points radiating from a center. The center of the star is the fact table, and the points of the star are the dimension tables.
• The fact table in a star schema is in third normal form, whereas the dimension tables are denormalized.
Snowflake Schema
• The snowflake schema is an extension of the star schema.
In a snowflake schema, each dimension is normalized and connected to further dimension tables.
Rules for Dimensional Modelling
• Load atomic data into dimensional structures.
• Build dimensional models around business processes.
• Ensure that every fact table has an associated date dimension table.
• Ensure that all facts in a single fact table are at the same grain or level of detail.
• Store report labels and filter domain values in dimension tables.
• Ensure that dimension tables use a surrogate key.
• Continuously balance requirements and realities to deliver a business solution that supports decision-making.
Benefits of dimensional modeling
• Standardization of dimensions allows easy reporting across areas of the business.
• Dimension tables store the history of the dimensional information.
• Entirely new dimensions can be introduced without major disruption to the fact table.
• Dimensional modelling also stores data in such a way that it is easier to retrieve information once the data is in the database.
• Compared to the normalized model, dimensional tables are easier to understand.
• Information is grouped into clear and simple business categories.
• The dimensional model is easy for the business to understand. It is based on business terms, so the business knows what each fact, dimension or attribute means.
• Dimensional models are denormalized and optimized for fast data querying. Many relational database platforms recognize this model and optimize query execution plans to aid performance.
• Dimensional modeling creates a schema that is optimized for high performance. It means fewer joins at query time, at the cost of some redundancy in the denormalized dimension tables.
• Dimensional models can comfortably accommodate change.
Dimension tables can have more columns added to them without affecting existing business intelligence applications that use these tables.
What are Multidimensional Schemas?
• A multidimensional schema is designed specifically to model data warehouse systems. These schemas address the needs of very large databases built for analytical purposes (OLAP).
• Types of Data Warehouse Schema: The following are the 3 main types of multidimensional schema, each with its own advantages.
– Star Schema
– Snowflake Schema
– Galaxy Schema
What is a Star Schema?
• The star schema is the simplest type of data warehouse schema.
• It is known as a star schema because its structure resembles a star.
• In the star schema, the center of the star has one fact table, surrounded by a number of associated dimension tables.
• It is also known as the Star Join Schema and is optimized for querying large data sets.
• For example, the fact table at the center contains keys to every dimension table, like Deal_ID, Model ID, Date_ID, Product_ID and Branch_ID, and other attributes like units sold and revenue.
Characteristics of Star Schema:
• Every dimension in a star schema is represented by only one dimension table.
• The dimension table should contain the set of attributes.
• The dimension table is joined to the fact table using a foreign key.
• The dimension tables are not joined to each other.
• The fact table contains keys and measures.
• The star schema is easy to understand and provides optimal disk usage.
• The dimension tables are not normalized. For instance, in the above figure, Country_ID does not have a Country lookup table as an OLTP design would have.
• The schema is widely supported by BI tools.
What is a Snowflake Schema?
• A snowflake schema is an extension of a star schema that adds additional dimension tables. It is called a snowflake schema because its diagram resembles a snowflake.
• The dimension tables are normalized, which splits data into additional tables.
In the following example, Country is further normalized into an individual table.
Characteristics of Snowflake Schema:
• The main benefit of the snowflake schema is that it uses less disk space.
• It is easier to implement when a dimension is added to the schema.
• Query performance is reduced because more tables must be joined.
• The primary challenge of the snowflake schema is that it requires more maintenance effort because of the additional lookup tables.
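The difference between the two schemas can be made concrete with a small sketch. The example below uses Python's built-in sqlite3 module and assumes the "product sale by location by day" grain from the CEO example. All table names, column names and sample rows (fact_sales, dim_product, dim_location, dim_country and so on) are hypothetical and chosen only for illustration. It first builds a star schema with a denormalized location dimension, then a snowflake variant in which Country is moved into its own lookup table, and runs the same revenue-by-country report against both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Star schema: one fact table, denormalized dimension tables.
# Grain (assumed): one fact row per product, per location, per day.
conn.executescript("""
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY,
                           city TEXT, state TEXT, country TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY,
                           full_date TEXT, month INTEGER, year INTEGER);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    units_sold   INTEGER,
    revenue      REAL
);
INSERT INTO dim_product  VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_location VALUES (1, 'Austin', 'Texas', 'USA'),
                                (2, 'Pune', 'Maharashtra', 'India');
INSERT INTO dim_date     VALUES (20240101, '2024-01-01', 1, 2024),
                                (20240102, '2024-01-02', 1, 2024);
INSERT INTO fact_sales VALUES
    (1, 1, 20240101, 3, 3000.0),
    (2, 1, 20240101, 1,  150.0),
    (1, 2, 20240101, 2, 2000.0),
    (1, 1, 20240102, 1, 1000.0);
""")

# Star query: the country attribute lives in dim_location, so one join is enough.
star_rows = conn.execute("""
    SELECT l.country, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_location l ON f.location_key = l.location_key
    GROUP BY l.country
    ORDER BY l.country
""").fetchall()

# Snowflake variant: Country is normalized out of the location dimension.
conn.executescript("""
CREATE TABLE dim_country (country_key INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE dim_location_sf (
    location_key INTEGER PRIMARY KEY,
    city TEXT, state TEXT,
    country_key INTEGER REFERENCES dim_country(country_key)
);
INSERT INTO dim_country     VALUES (1, 'USA'), (2, 'India');
INSERT INTO dim_location_sf VALUES (1, 'Austin', 'Texas', 1),
                                   (2, 'Pune', 'Maharashtra', 2);
""")

# Snowflake query: the same report now needs an extra join to reach the country.
snowflake_rows = conn.execute("""
    SELECT c.country, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_location_sf l ON f.location_key = l.location_key
    JOIN dim_country c     ON l.country_key = c.country_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()

print(star_rows)
print(snowflake_rows)
```

Both queries return the same totals. The snowflake version stores each country name once instead of once per location, but it pays for that with an additional join and an additional lookup table to maintain, which is the trade-off the characteristics above describe.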