Lec 11- DW

Uploaded by

Ayesha Asad

DATA WAREHOUSING

Lecture 11
Ms. Ushna Tasleem
RECAP
WHAT IS DATA WAREHOUSING?

 Data warehousing involves collecting, storing, and managing enormous amounts of data from
various sources. It is a centralized repository that offers a single source of truth for
consistent and reliable data analysis to support business decision-making.
 This data is structured, cleansed, transformed, and organized to provide a unified
view of your organization’s operations and performance.
 In other words, data warehousing is the process of transferring and storing data
from multiple sources into a single repository known as a data warehouse. The
data warehouse acts as a central data bank from which analytics can be performed
through business intelligence tools.
 The biggest innovation data warehouses introduced at their inception, according to
DW 2.0: The Architecture for the Next Generation of Data Warehousing, was the
ability to store “integrated granular historical data.”

 Breaking that down into human terms, this means data warehouses excel at storing
data that’s:

 Integrated: They combine data from many databases and data sources.
 Granular: The data they house is highly detailed and can be used in many different
ways.
 Historical: They can host a continuous record of data over years and years.
 You can store this data in three different ways: on-premise data warehouses, cloud
data warehouses, and hybrid data warehouses.
 On-premise data warehouses run on physical servers that your company owns and
manages. Cloud data warehouses are fully online, and you pay for space on
servers that another company manages, like Amazon Redshift. Hybrid data
warehouses are a mix of both on-premise and cloud, and companies making the
transition to the cloud over a period of time use this option.
ONLINE ANALYTICAL PROCESSING (OLAP)

 With all the data stored in one place, data warehouses use a specific approach to
process data called online analytical processing (OLAP), which is specifically
designed for complex queries.
 One way to think about it is that when you go to your data warehouse to ask a
question about the relationship between one set of data and another, OLAP is a
way of organizing and moving among the rows and rows of shelves to quickly find
that information.
 This is great for business intelligence because the questions you ask about your
data in order to make decisions are rarely simple. Because data warehouses use
OLAP, they make finding answers to these complex questions very efficient. As a
result, they’ve become a foundation for many successful business intelligence
systems.
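As a rough illustration of the kind of question OLAP is designed for, the sketch below totals a small sales table along different dimensions (a "roll-up"). The table, its columns, and the `roll_up` helper are hypothetical, and Python's built-in SQLite stands in for a real warehouse engine.

```python
import sqlite3

# Hypothetical sales facts: (region, product, quarter, amount).
rows = [
    ("North", "Laptop", "Q1", 1200.0),
    ("North", "Laptop", "Q2", 1500.0),
    ("North", "Phone", "Q1", 800.0),
    ("South", "Laptop", "Q1", 900.0),
    ("South", "Phone", "Q2", 700.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, quarter TEXT, amount REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)


def roll_up(dimensions):
    """Return total sales grouped by the given dimension columns."""
    allowed = {"region", "product", "quarter"}
    if not dimensions or not set(dimensions) <= allowed:
        raise ValueError("unknown dimension")
    cols = ", ".join(dimensions)
    query = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(query).fetchall()


# The same facts viewed at two levels of detail.
by_region = roll_up(["region"])
by_region_product = roll_up(["region", "product"])
print(by_region)
print(by_region_product)
```

Asking for fewer dimensions gives a coarser summary; adding dimensions drills down into more detail without changing the underlying data.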
HOW DO DATA WAREHOUSES WORK?

 Data warehouses are fairly complex systems but can be thought of as
encompassing three core aspects: storage, software, and labor. When making the
decision to implement a data warehouse, you need to take into account the
investment required for all three.

 Storage is a fairly simple choice. As we mentioned earlier, you can host your data
warehouse on-premises, in the cloud, or use a hybrid approach. On-premises
hosting is, according to some, on its way out. Cloud hosting is often cheaper to start
with and more flexible because you’re renting space on another company’s servers.
You don’t need to maintain the hardware yourself, you can expand and cut back as
needed, and providers add new features each year. Bridging the gap between these two
approaches is hybrid hosting, which, as we mentioned before, is a common
choice for companies migrating from on-premises to cloud hosting.
 To get data into your data warehouse, you need to use a type of software commonly called ETL software. Extract,
transform, load (ETL) is a process where the data is extracted, made ready for use, then loaded into the data
warehouse.

 Nowadays, many companies use an alternative to ETL called extract, load,
transform (ELT). Often they extract data from source systems, load it into a data lake, then use the data
warehouse to transform it. Both ETL and ELT are facilitated by software like Panoply.io and Stitch. A
further variant, ETLT, also exists.
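The difference between the two orderings can be sketched in a few lines. This is a minimal illustration, not how any particular ETL product works: the records, table names, and helper functions are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system.
source_rows = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "BOB", "amount": "80"},
]


def transform(row):
    """Clean one record: trim and lower-case the name, parse the amount."""
    return (row["customer"].strip().lower(), float(row["amount"]))


def run_etl(conn, rows):
    """ETL: transform in application code, then load only the clean rows."""
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [transform(r) for r in rows]
    )


def run_elt(conn, rows):
    """ELT: load the raw rows first, then transform with SQL in the database."""
    conn.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?)",
        [(r["customer"], r["amount"]) for r in rows],
    )
    conn.execute(
        "CREATE TABLE orders AS "
        "SELECT LOWER(TRIM(customer)) AS customer, "
        "CAST(amount AS REAL) AS amount FROM raw_orders"
    )


etl_conn = sqlite3.connect(":memory:")
run_etl(etl_conn, source_rows)
elt_conn = sqlite3.connect(":memory:")
run_elt(elt_conn, source_rows)

final_query = "SELECT customer, amount FROM orders ORDER BY customer"
etl_result = etl_conn.execute(final_query).fetchall()
elt_result = elt_conn.execute(final_query).fetchall()
print(etl_result == elt_result)
```

Both paths end with the same clean `orders` table. In the ELT path the untouched `raw_orders` table also remains available, which is one reason the approach pairs naturally with a data lake.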

 Of course, data warehouses don’t run themselves. Labor is a significant part of keeping a data warehouse running
because it’s not just a system; it’s a “full-fledged…architecture” that requires experts to set up and manage.

 The purpose of all this work is to centralize and organize data, so it can be more easily understood. This is where
business intelligence tools come in. They essentially sit atop the data warehouses as a layer that helps you query,
analyze, and visualize your data.
DW CHARACTERISTICS:

 A data warehouse is easier to manage when its users share a common way of
describing the trends in each subject area. Below are the major characteristics
of a data warehouse:

 Subject Oriented: Focuses on a specific area or subject such as sales, customers, or
inventory.
 Integrated: Integrates data from multiple sources into a single, consistent format.
 Read-Optimized: Designed for fast querying and analysis, with indexing and aggregations
to support reporting.
 Summary Data: Data is summarized and aggregated for faster querying and analysis.
 Historical Data: Stores large amounts of historical data, making it possible to analyze
trends and patterns over time.
 Schema-on-Write: Data is transformed and structured according to a predefined schema
before it is loaded into the data warehouse.
 Query-Driven: Supports ad-hoc querying and reporting by business users, without the
need for technical support.
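The schema-on-write characteristic listed above can be sketched as a gate that every record must pass before it is stored. The schema, column names, and helper functions here are hypothetical; a real warehouse would enforce this through table definitions and load jobs.

```python
# Hypothetical predefined schema: column name -> required Python type.
SALES_SCHEMA = {"order_id": int, "region": str, "amount": float}


def conform(record, schema=SALES_SCHEMA):
    """Coerce a raw record to the schema, or raise ValueError if it cannot fit."""
    missing = set(schema) - set(record)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    try:
        return {col: typ(record[col]) for col, typ in schema.items()}
    except (TypeError, ValueError) as exc:
        raise ValueError(f"record does not match schema: {exc}") from exc


def load(records):
    """Split records into rows accepted by the schema and rejected raw records."""
    accepted, rejected = [], []
    for record in records:
        try:
            accepted.append(conform(record))
        except ValueError:
            rejected.append(record)
    return accepted, rejected


accepted, rejected = load([
    {"order_id": "1", "region": "North", "amount": "19.99"},
    {"order_id": "2", "region": "South"},                  # missing amount
    {"order_id": "x3", "region": "East", "amount": "5"},   # order_id is not a number
])
print(len(accepted), len(rejected))
```

Only records that fit the schema reach the warehouse; the rest are set aside for correction rather than being stored in an inconsistent shape.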
DW FUNCTIONS

 A data warehouse works as a collection of data organized so that different groups
of users can retrieve what they need. It also stores facts about high-transaction
tables, which are observed in order to define the data warehousing techniques to
use. The major functions involved are listed below:
1. Data Consolidation: The process of combining multiple data sources into a single
data repository in a data warehouse. This ensures a consistent and accurate view
of the data.
2. Data Cleaning: The process of identifying and removing errors, inconsistencies,
and irrelevant data from the data sources before they are integrated into the
data warehouse. This helps ensure the data is accurate and trustworthy.
3. Data Integration: Closely related to consolidation, this is the process of transforming data
from multiple sources into a consistent format and resolving any conflicts or discrepancies
between the sources as they are combined into a single, unified repository. Data integration is an
essential step in the data warehousing process to ensure that the data is accurate and usable for
analysis.
4. Data Storage: A data warehouse can store large amounts of historical data and make it easily
accessible for analysis.
5. Data Transformation: Data can be transformed and cleaned to remove inconsistencies,
duplicate data, or irrelevant information.
6. Data Analysis: Data can be analyzed and visualized in various ways to gain insights and make
informed decisions.
7. Data Reporting: A data warehouse can provide various reports and dashboards for different
departments and stakeholders.
8. Data Mining: Data can be mined for patterns and trends to support decision-making and
strategic planning.
9. Performance Optimization: Data warehouse systems are optimized for fast querying and
analysis, providing quick access to data.
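The consolidation, cleaning, and transformation functions listed above can be sketched together. The two source extracts, their field names, and the "first row wins" rule for duplicates are assumptions chosen for illustration only.

```python
# Hypothetical customer extracts from two source systems with different conventions.
crm_rows = [
    {"email": "Ana@Example.com ", "country": "pk"},
    {"email": "bob@example.com", "country": "US"},
]
billing_rows = [
    {"email": "ana@example.com", "country": "PK"},
    {"email": "", "country": "US"},  # unusable: no email to identify the customer
]


def clean(row):
    """Normalize one row, or return None if it lacks a usable key."""
    email = row["email"].strip().lower()
    if not email:
        return None
    return {"email": email, "country": row["country"].strip().upper()}


def consolidate(*sources):
    """Merge sources into one list, keeping the first row seen per email."""
    merged = {}
    for source in sources:
        for row in source:
            cleaned = clean(row)
            if cleaned is not None:
                merged.setdefault(cleaned["email"], cleaned)
    return list(merged.values())


customers = consolidate(crm_rows, billing_rows)
print(customers)
```

After cleaning, the two spellings of Ana's address collapse into one customer and the row without an email is dropped, leaving a single consistent list.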
ARCHITECTURE OF A DATA WAREHOUSE
 A typical data warehouse architecture consists of the following components:
 Source Systems: These are the various operational systems from which data is extracted,
such as databases, ERP systems, CRM, etc.
 ETL (Extract, Transform, Load) Process: The ETL process extracts data from source systems,
transforms it (cleaning, standardizing, aggregating), and loads it into the data warehouse.
 Data Warehouse Database: This is the central repository where transformed data is stored,
often in a structured format like relational databases.
 Data Marts: These are smaller, specialized databases designed to serve specific departments
(e.g., marketing, sales) by providing tailored data sets from the data warehouse.
 OLAP (Online Analytical Processing): A technology that enables multidimensional analysis
of data in the warehouse, often used for complex queries and analysis.
 BI Tools: Business Intelligence tools (e.g., Tableau, Power BI) are used to query the data
warehouse, create reports, and visualize data for decision-making.
TYPES OF DATA WAREHOUSES

 There are several types of data warehouses depending on the organization’s needs:
 Enterprise Data Warehouse (EDW): A large-scale data warehouse that provides a
central repository for the entire organization. It serves various departments with
integrated data.
 Operational Data Store (ODS): A staging area for operational data, used for short-
term queries. It often acts as an intermediary between operational systems and the
EDW.
 Data Marts: Smaller, department-specific data stores that are derived from the
larger data warehouse. They focus on specific business functions, such as sales or
marketing.
BENEFITS OF DATA WAREHOUSING IN BI

•Improved Decision-Making: A data warehouse ensures that decision-makers have access to a
consolidated, reliable source of information, which improves the quality and speed of decisions.

•Enhanced Data Quality: By cleansing and standardizing data during the ETL process, data
warehouses ensure higher data accuracy.

•Historical Analysis: The ability to store historical data allows businesses to identify trends, measure
performance over time, and forecast future trends.

•Scalability: Data warehouses are built to handle vast amounts of data, making them scalable as the
organization’s data needs grow.

•Faster Query Performance: Optimized for analytical queries, data warehouses support complex
queries and reports, improving the efficiency of data retrieval.
CHALLENGES IN DATA WAREHOUSING

•High Implementation Cost: Building and maintaining a data warehouse can be expensive due to hardware,
software, and skilled personnel requirements.

•Data Integration: Integrating data from various sources can be complex, especially when dealing with
different data formats and structures.

•Maintenance: As organizations grow, the volume of data can increase, requiring ongoing maintenance,
optimization, and storage management.

•Latency: Many data warehouses are refreshed on a schedule, so their data can lag behind the source
systems. Even those that provide near-real-time data may not be suitable for applications requiring
real-time data processing.
WHEN SHOULD I USE A DATA WAREHOUSE
FOR BUSINESS INTELLIGENCE?
 There are generally four stages of data sophistication: source data, data lakes, data
warehouses, and data marts. Knowing when to invest in a data warehouse requires
you to know each stage, but at the end of the day, the data warehouse stage is
what unlocks the true power of your data.
1. SOURCE DATA

 Source data is any individual set of data like databases, Excel spreadsheets,
individual application reports, etc. It’s structured (i.e., organized) yet isolated data
that works fine alone but does not provide a larger picture of your organization’s
data as a whole.
2.DATA LAKE

 A data lake is a central place where all raw, unorganized data is stored. Unlike a
data warehouse, which stores data in an orderly, structured way, a data lake holds
everything in its original form, like dumping it all into a lake.
 The term was coined by James Dixon, who likens it to a place where skilled divers
can explore the raw data. However, the challenge with a data lake is that the data
isn't organized for immediate use or analysis, and it can take a lot of effort to find
what you're looking for.
3.DATA WAREHOUSE

 Like a data lake, a data warehouse centralizes your data, but as we’ve established,
it’s well-organized and set up for efficient analysis. It’s a single source of truth for
all data that’s easier to understand and navigate.

 Data warehouses can hook right up to source data, but nowadays, we’re seeing
more and more companies use their data warehouse as a layer on top of their data
lake. Following Dixon’s comparison, if a data lake is the water/data in its natural,
unorganized state, a data warehouse is where you treat it and make it ready for
consumption.
4.DATA MART

 A data warehouse can be too powerful for small tasks, like using a sledgehammer
to swat a fly (that is, using a tool far too large, powerful, or complex for a
small, simple task). For repeated queries, such as those run by a marketing
team, a data mart is a better option. A data mart is a smaller, curated set of data
tailored for specific use cases.

 Think of the warehouse as a treatment center where water is purified, and a data
mart as pre-packaged water bottles ready for use. The data warehouse remains
the core system, providing a structured, centralized view of all data, making it
easier to access and create specific data marts when needed.
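One lightweight way to picture a data mart is as a curated, pre-aggregated slice of the warehouse. In the sketch below a SQL view plays that role; the table, view, and column names are hypothetical, and real data marts are often built as separate tables or databases rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, campaign TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "spring_email", 400.0),
        ("marketing", "spring_email", 100.0),
        ("marketing", "summer_ads", 250.0),
        ("finance", None, 900.0),
    ],
)

# The "mart": only marketing rows, already summarized per campaign.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT campaign, SUM(revenue) AS revenue "
    "FROM warehouse_sales WHERE department = 'marketing' "
    "GROUP BY campaign"
)

mart_rows = conn.execute(
    "SELECT campaign, revenue FROM marketing_mart ORDER BY campaign"
).fetchall()
print(mart_rows)
```

The marketing team queries the small `marketing_mart` repeatedly, while the full warehouse table stays the single source that the mart is derived from.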
DATA WAREHOUSE VS. DATA LAKE

 Data Warehouse: Structured, organized data designed for analysis and reporting. It
is ideal for business users who need clean, standardized data for decision-making.
 Data Lake: A storage system for raw data, whether structured, semi-structured, or
unstructured. Unlike a data warehouse, a data lake stores data in its natural format,
making it more flexible but less structured.
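The contrast can be made concrete with a small sketch. Here a Python list of raw strings stands in for a data lake, and structure is applied only when a question is asked. The payloads and the `read_orders` helper are hypothetical.

```python
import json

# A "lake": raw payloads kept exactly as received (hypothetical examples).
lake = [
    '{"type": "order", "amount": 30}',
    '{"type": "click", "page": "/home"}',
    "free-text note from support",
    '{"type": "order", "amount": 12.5}',
]


def read_orders(raw_items):
    """Interpret raw items at read time and keep only order amounts."""
    orders = []
    for item in raw_items:
        try:
            record = json.loads(item)
        except json.JSONDecodeError:
            continue  # not JSON, so not relevant to this question
        if isinstance(record, dict) and record.get("type") == "order":
            orders.append(float(record["amount"]))
    return orders


# A warehouse table would already hold only these structured order values.
order_amounts = read_orders(lake)
total = sum(order_amounts)
print(order_amounts, total)
```

The lake accepts anything, so every reader has to do this parsing and filtering. A warehouse does that work once, up front, which is why it is easier for business users to query.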
BUSINESS INTELLIGENCE VS. DATA
WAREHOUSES
 Business intelligence and data warehousing are similar concepts that operate in the
same space, yet are very different. Both BI and data warehouses involve the
storage of data.
 However, business intelligence is also the collection, methodology, and analysis of
data. Meanwhile, a data warehouse is fundamentally the storage and organization
of that data to provide for BI processes.
 Maintaining and deploying a data warehouse is so critical to BI that they are often
collectively referred to as BIDW.
HOW IS DATA ANALYZED USING A DATA
WAREHOUSE?
 DWHs use Online Analytical Processing (OLAP) to process large swaths of data. It
consolidates all the data on a centralized platform. It is a data processing approach
employed by DWHs for streamlining complex queries. In simpler terms, it is a
computing method that helps users extract and query the required data for
analysis.

 For example, if someone asks about the relationship between two different
datasets in a DWH, OLAP processing would be used to move through the stored
data to find, identify, and summarize the desired information quickly. Using OLAP, a
data warehouse provides BI with the data it needs to analyze.
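A question about the relationship between two datasets usually comes down to joining them and summarizing the result. The sketch below assumes two hypothetical tables, a store list and a sales list, and uses in-memory SQLite in place of a warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales (store_id INTEGER, amount REAL);
    INSERT INTO dim_store VALUES (1, 'Lahore'), (2, 'Karachi'), (3, 'Lahore');
    INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 200.0), (3, 25.0);
    """
)

# Relate sales to stores, then summarize per city.
sales_by_city = conn.execute(
    "SELECT d.city, COUNT(*) AS orders, SUM(f.amount) AS total "
    "FROM fact_sales AS f JOIN dim_store AS d ON d.store_id = f.store_id "
    "GROUP BY d.city ORDER BY d.city"
).fetchall()
print(sales_by_city)
```

A BI tool sitting on top of the warehouse would typically generate a query of this shape and chart the summarized rows.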
EXAMPLES OF DATA WAREHOUSING IN
BUSINESS INTELLIGENCE
 Retail: A retail company uses a data warehouse to analyze sales data across different stores, customer
demographics, and time periods. This helps optimize product offerings, improve marketing strategies, and
forecast demand.
 Healthcare: A healthcare provider consolidates patient data, treatment histories, and medical outcomes into a
data warehouse. This helps in tracking patient progress, improving treatment plans, and complying with
regulatory reporting requirements.
 Finance: Financial institutions use data warehouses to analyze transactions, manage risks, and detect fraud. By
combining data from various banking systems, they can make more informed investment and lending decisions.
TRENDS IN DATA WAREHOUSING
 Cloud Data Warehousing: Many businesses are shifting to cloud-based data warehouses (e.g., Amazon Redshift,
Google BigQuery) for cost efficiency, scalability, and flexibility.
 Real-Time Data Warehousing: Increasingly, businesses are demanding near-real-time data for faster decision-
making, leading to the development of more real-time data processing and streaming analytics.
 Artificial Intelligence (AI) Integration: AI and machine learning models are being integrated into data
warehouses to automate data analysis, generate insights, and predict future trends.
CONCLUSION

 Data warehousing is a critical component of Business Intelligence. It provides a
structured, reliable, and efficient way to store and analyze data from multiple
sources, allowing organizations to make informed, data-driven decisions. While
setting up and maintaining a data warehouse can be costly and complex, the
benefits in terms of improved decision-making, data quality, and operational
efficiency make it a valuable investment for businesses looking to leverage their
data effectively.
REFERENCES

 https://airbyte.com/data-engineering-resources/business-intelligence-data-warehouse
 https://www.impactmybiz.com/blog/business-intelligence-data-warehousing/
 https://www.tableau.com/learn/articles/value-of-bi-data-warehousing
 https://www.astera.com/type/blog/data-warehouse-and-business-intelligence/
 https://www.atlassian.com/data/business-intelligence/data-warehouses-guide
 https://www.geeksforgeeks.org/characteristics-and-functions-of-data-warehouse/
