Business Intelligence Introduced by Howard
The data warehouse drives the corporate information supply chain to support the corporate Business Intelligence process.
Business Intelligence, introduced by Howard Dresner of the Gartner Group in 1989, is a set of concepts and methodologies to improve decision-making in business through the use of facts and fact-based systems.
Fact-Based Systems
Executive Information Systems
Decision Support Systems
Enterprise Information Systems
Management Support Systems
OLAP
Data & Text Mining
Data Visualization
Geographic Information Systems
Data Warehousing
This simple concept has recently become a multi-billion-dollar industry. New breeds of vendors are introducing tools and technologies at an alarming rate to deliver data warehouse solutions.
Business Intelligence
Extract data from transaction systems.
Manipulate the extracted data to generate reports.
Make those reports accessible to decision-makers.
First Chapter
What is Data Warehousing?
Data Warehousing Architecture
Components of Data Warehousing
Evolution of ERP Data Warehousing
A Multi-Dimensional Data Model
Data Warehouse
This is conceptually the same as defined by Inmon.
It could be one large physical instance or a collection of several physical data object instances (Detailed & Aggregated), each serving a special purpose conforming to a bigger corporate vision.
Data Mart
Data marts are small, stand-alone data warehouses limited to one subject area (e.g., sales analysis).
Data marts are either dependent or independent. Dependent data marts are extracted views of a corporate data warehouse. Independent data marts are built directly against transaction systems.
If the house does not adhere to an architectural plan, its integrity will always be in question.
SAP BW
The SAP Business Information Warehouse (SAP BW) follows this paradigm. Under one framework, you can have a huge extraprise data warehouse of a hierarchy of enterprise data warehouses or data marts, all conforming to the same architecture, infrastructure, and information delivery methods.
However, that information is spread over a wide variety of platforms and applications throughout the corporate IT structures.
This can make obtaining vital facts and figures a complex and time-consuming task.
Extraction, transformation, and transport services fetch data from data sources, qualify it, perform value-added data manipulation, and push the data out to data warehouse data objects.
Data Transport. Data Transformation. Data cleansing. Data Extraction. Subject Models.
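The services listed above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not any vendor's implementation: the source rows, the field names (order_id, amount, region), and the in-memory warehouse list are all hypothetical.

```python
# Hypothetical source rows, standing in for an extract from a transaction system.
SOURCE_ROWS = [
    {"order_id": "1", "amount": " 120.50 ", "region": "north"},
    {"order_id": "2", "amount": "n/a", "region": "South"},
    {"order_id": "3", "amount": "80", "region": " south "},
]


def extract(rows):
    """Extraction: fetch raw records from the data source."""
    return list(rows)


def cleanse(rows):
    """Cleansing: qualify records and drop those whose amount is unusable."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # the record fails qualification
        clean.append({**row, "amount": amount})
    return clean


def transform(rows):
    """Transformation: standardize region codes so all sources agree."""
    return [{**row, "region": row["region"].strip().upper()} for row in rows]


def transport(rows, warehouse):
    """Transport: push the prepared records into the warehouse data object."""
    warehouse.extend(rows)
    return len(rows)


warehouse = []  # assumed stand-in for a data warehouse data object
loaded = transport(transform(cleanse(extract(SOURCE_ROWS))), warehouse)
print(loaded, warehouse)
```

Keeping each service as its own function mirrors the slide's separation of extraction, cleansing, transformation, and transport; a real pipeline would read from and write to actual systems.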
At this layer, data is further transformed for specific data analysis tasks such as drill-down analysis, predefined reports, and integration with third-party subscribed data. This layer is very complex within the data warehouse architecture. Key services performed at this layer are the following:
End Of Intro
A decision support database that is maintained separately from the organization's operational database.
Supports information processing by providing a solid platform of consolidated, historical data for analysis.
W. H. Inmon's Definition
A data warehouse is a time-variant, integrated, non-volatile, and subject-oriented collection of data in support of management's decision-making process.
William H. Inmon, Father of Data Warehousing
Time Variant
The time horizon for the data warehouse is significantly longer than that of operational systems.
Operational database: current value data.
Data warehouse data: provides information from a historical perspective (e.g., the past 5-10 years).
Every key structure in the data warehouse contains an element of time, explicitly or implicitly, whereas the key of operational data may or may not contain a time element.
Integrated
Constructed by integrating multiple, heterogeneous data sources
Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
E.g., Hotel price: currency, tax, breakfast covered, etc.
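The hotel price example can be made concrete with a short sketch. It assumes two sources that report prices under different conventions and converts both to one warehouse convention (USD, tax included, breakfast excluded). The exchange rate, the breakfast charge, and the class and function names are invented for illustration.

```python
from dataclasses import dataclass

# Illustrative constants only; real values would come from reference data.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10}
BREAKFAST_USD = 15.0


@dataclass
class SourcePrice:
    amount: float
    currency: str
    tax_included: bool
    tax_rate: float  # e.g. 0.10 for 10 percent
    breakfast_included: bool


def to_warehouse_price(p: SourcePrice) -> float:
    """Convert to the assumed warehouse convention: USD, tax in, breakfast out."""
    usd = p.amount * RATES_TO_USD[p.currency]
    if not p.tax_included:
        usd *= 1 + p.tax_rate
    if p.breakfast_included:
        usd -= BREAKFAST_USD
    return round(usd, 2)


# Source A quotes USD before tax; source B quotes EUR with tax and breakfast.
source_a = SourcePrice(100.0, "USD", tax_included=False, tax_rate=0.10,
                       breakfast_included=False)
source_b = SourcePrice(115.0, "EUR", tax_included=True, tax_rate=0.10,
                       breakfast_included=True)
price_a = to_warehouse_price(source_a)
price_b = to_warehouse_price(source_b)
print(price_a, price_b)
```

Once both prices follow the same convention they can be compared or aggregated directly, which is the purpose of integration.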
Non-Volatile
A physically separate store of data transformed from the operational environment.
Operational update of data does not occur in the data warehouse environment.
Does not require transaction processing, recovery, and concurrency control mechanisms.
Requires only two operations in data accessing: initial loading of data and access of data.
Subject-Oriented
Organized around major subjects, such as customer, product, and sales.
Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
Provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision support process.
Major task of traditional relational DBMS: day-to-day operations such as purchasing, inventory, banking, manufacturing, payroll, registration, and accounting.
                      Operational DBMS                       Data warehouse
Access                Read/write; index/hash on prim. key    Lots of scans
Unit of work          Short, simple transaction              Complex query
Records accessed      Tens                                   Millions
Users                 Thousands                              Hundreds
Database size         100 MB-GB                              100 GB-TB
Metric                Transaction throughput                 Query throughput, response
Missing data: Decision support requires historical data, which operational DBMS do not typically maintain.
Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources.
Data quality: Different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled.
A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
Dimension tables describe the dimensions, such as item (item name, brand, type) or time (day, week, month, quarter, year).
The fact table contains measures (such as dollars sold) and keys to each of the related dimension tables.
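A fact table with keys into dimension tables can be tried out with Python's built-in sqlite3 module. The sketch below is a minimal illustration: the table names (time_dim, item_dim, sales_fact) and all sample rows are made up, and only two of the dimensions are modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
    quarter TEXT, year INTEGER
);
CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT
);
-- The fact table holds the measures plus a key to each dimension table.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 3, 1, 'Q1', 2002), (2, 5, 4, 'Q2', 2002);
INSERT INTO item_dim VALUES (10, 'TV', 'Acme', 'electronics'),
                            (11, 'Radio', 'Acme', 'electronics');
INSERT INTO sales_fact VALUES (1, 10, 2, 800.0), (1, 11, 5, 250.0),
                              (2, 10, 1, 400.0);
""")

# View the cube along two dimensions: quarter (time) and item_name (item).
rows = conn.execute("""
    SELECT t.quarter, i.item_name, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_key = f.time_key
    JOIN item_dim AS i ON i.item_key = f.item_key
    GROUP BY t.quarter, i.item_name
    ORDER BY t.quarter, i.item_name
""").fetchall()
print(rows)
```

Changing the GROUP BY columns (for example to year and brand) views the same facts along other dimension attributes without touching the fact table.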
Figure: star schema for sales
Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
Figure: the sales schema with normalized dimension tables
supplier: supplier_key, supplier_type
location: location_key, street, city_key (references the city table)
Figure: a further fact table sharing the dimension tables above; recoverable keys: item_key, shipper_key, from_location
The values of the dimension attributes are stored in various denormalized dimension tables (from a semantic point of view: the dimensions).
Here, logically related dimension attributes are stored as a hierarchy (parent-child relationships) within the dimension table. The dimension tables are linked relationally with the central fact table by way of foreign and primary key relationships. The dimension attribute with the finest level of detail in each dimension table is a foreign key in the fact table. In this way, all data records in the fact table can be identified uniquely.
Fact Table
DAY_ID        CUSTOMER_ID   MATERIAL_ID   Sales     Quantity
03.01.2002    K100          M1111         50.000    100
03.01.2002    K100          M2222         3.000     60
Figure: SAP BW-star schema, showing the fact table (DIM_ID_MATERIAL, DIM_ID_CUSTOMER, DIM_ID_TIME), the customer and material dimension tables with their DIM_ID and SID keys, and the linked master data (attribute table with material name and material group, external hierarchy).
This graphic shows how the SAP BW-star schema is an enhancement of the classic star schema. The enhancement comes from the fact that the dimension tables do not contain master data information. This master data information is stored in separate tables, called master data tables. This section first explains the SAP BW-star schema in detail. At the end of the section, both star schemas are compared in terms of their advantages and disadvantages. In the SAP BW-star schema, a distinction is made between two self-contained areas:
InfoCube
InfoCubes are the central objects of the multi-dimensional model in SAP BW. Reports and analyses are based on these. From a reporting perspective, an InfoCube describes a self-contained data set within a business area, for which you can define queries. An InfoCube (BasicCube) consists of a number of relational tables that are combined on a multi-dimensional basis. In other words, it consists of a central fact table and several surrounding dimension tables.
Hint: There are various types of InfoCube in BW. The InfoCube of type BasicCube is the one relevant for modeling, since only physical objects (objects that contain data) are considered in modeling within the SAP BW data model. For this reason, InfoCube always refers to BasicCube in this section. (You can find additional information about other cube types in the Virtual Cubes lesson.)
Figure 15: InfoCube. A fact table keyed by DIM_ID_DATAPACKET, DIM_ID_TIME, DIM_ID_UNIT, DIM_ID_CUSTOMER, and DIM_ID_MATERIAL, with the key figures Revenue and Quality, surrounded by dimension tables that hold SID keys (SID_REQUEST, SID_DAY, SID_CURRENCY, SID_MATERIAL, SID_CUSTOMER).
Master Data Tables/SID Tables
Additional information about characteristics is referred to as master data in SAP BW. A distinction is made between the following master data types:
Attributes
Texts
(External) Hierarchies
The assignment of a SID key to a characteristic is made in a SID table for the respective characteristic, whereby the characteristic becomes the primary key in the SID table. In the following graphic, the SID key SID_MATERIAL is assigned to the characteristic MATERIAL in the SID table for characteristic MATERIAL. The SID table is connected to the associated master data tables via the characteristic key.
Hint: By the term hierarchy, we usually mean an arrangement of objects that have a 1:N relationship to each other.
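The chain from dimension table to SID table to master data can be sketched with plain dictionaries. This is a conceptual illustration only and does not reflect how SAP BW stores these tables: the function names and the material names and groups are invented, while M1111 and M2222 are the material IDs from the fact table example.

```python
# SID table for the characteristic MATERIAL: characteristic value -> SID.
sid_material = {}


def get_sid(value):
    """Return the SID for a characteristic value, assigning a new one if needed."""
    if value not in sid_material:
        sid_material[value] = len(sid_material) + 1
    return sid_material[value]


# Master data (attribute) table, keyed by the characteristic value itself.
material_attributes = {
    "M1111": {"material_name": "Bolt", "material_group": "Hardware"},
    "M2222": {"material_name": "Nut", "material_group": "Hardware"},
}

# Dimension table: DIM_ID -> SID. It holds no master data of its own.
material_dimension = {1: get_sid("M1111"), 2: get_sid("M2222")}


def attributes_for_dim_id(dim_id):
    """Follow DIM_ID -> SID -> characteristic value -> master data."""
    sid = material_dimension[dim_id]
    value = next(v for v, s in sid_material.items() if s == sid)
    return material_attributes[value]


result = attributes_for_dim_id(2)
print(result)
```

Because the master data sits outside the dimension table, the same material_attributes table could be reached from the dimension tables of several InfoCubes.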
Classic Star Schema in Comparison with the SAP BW-Star Schema
First, let us compare the terminology of the two schemas.
Classic Star Schema            BW Star Schema
Fact                           Key Figure
Dimension Attribute            Characteristic
Described Attribute            Attribute, Text, External Hierarchies
Dimension Tables               Dimension Tables (contain no master data)
Dimension = Dimension Table    Dimension Table, SID Tables, (optional) Master Data Tables
Disadvantages of the Classic Star Schema
Redundant entries exist in the dimension tables.
In contrast to the historization of fact data (the same dimension table),
Modeling some hierarchy types (parallel and imbalanced hierarchies, for example) in a dimension can lead to anomalies.
Query performance is also made worse, since aggregates and basic fact data are stored in the same table (the fact table).
Advantages of the SAP BW-Star Schema
Historizing dimensions
Multi-lingual capability
Cross-InfoCube use of master data (shared dimensions)
The query performance is improved here, as aggregated key figures can be stored in their own fact tables.
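The effect of keeping aggregated key figures in their own table can be shown with a small sketch. It is a generic illustration of precomputed aggregates, not SAP BW's aggregate mechanism, and the fact rows and function names are invented.

```python
from collections import defaultdict

# Basic fact rows: (day, customer, material, revenue). Values are invented.
fact_table = [
    ("2002-01-03", "K100", "M1111", 50.0),
    ("2002-01-03", "K100", "M2222", 3.0),
    ("2002-01-04", "K200", "M1111", 20.0),
]


def build_aggregate(rows):
    """Summarize revenue per material into a separate, smaller table."""
    totals = defaultdict(float)
    for _day, _customer, material, revenue in rows:
        totals[material] += revenue
    return dict(totals)


# Built once at load time and kept apart from the basic fact data.
aggregate_by_material = build_aggregate(fact_table)


def revenue_for_material(material):
    """Answer from the aggregate instead of scanning the fact table."""
    return aggregate_by_material.get(material, 0.0)


total_m1111 = revenue_for_material("M1111")
print(total_m1111)
```

A query at material level reads one entry of the aggregate rather than every fact row; the trade-off is that the aggregate has to be rebuilt or updated whenever new fact data is loaded.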
Hint: Another enhancement vis-à-vis the classic star schema is that aggregated key figures are moved out into their own fact tables by building aggregates, which have not yet been taken into account in this description of the SAP BW-star schema. You can find additional information about aggregates in the Administering InfoCubes & Aggregates unit.
Metadata Repository
Metadata is the data that defines warehouse objects. It comes in the following kinds:
Operational metadata: data lineage (history of migrated data and transformation path), currency of data (active, archived, or purged), monitoring information (warehouse usage statistics, error reports, audit trails)
The algorithms used for summarization
The mapping from the operational environment to the data warehouse
Data related to system performance: warehouse schema, view, and derived data definitions
Business data: business terms and definitions, ownership of data, charging policies