Lec 11- DW
Lecture 11
Ms. Ushna Tasleem
RECAP
WHAT IS DATA WAREHOUSING?
Data warehousing involves collecting, storing, and managing enormous data from
various sources. It is a centralized repository that offers a single source of truth for
consistent and reliable data analysis to support business decision-making.
This data is structured, cleansed, transformed, and organized to provide a unified
view of your organization’s operations and performance.
In other words, data warehousing is the process of transferring and storing data
from multiple sources into a single repository known as a data warehouse. The
data warehouse acts as a central data bank from which analytics can be performed
through business intelligence tools.
The biggest innovation data warehouses introduced at their inception, according to
DW 2.0: The Architecture for the Next Generation of Data Warehousing, was the
ability to store “integrated granular historical data.”
Breaking that down into human terms, this means data warehouses excel at storing
data that’s:
Integrated: They combine data from many databases and data sources.
Granular: The data they house is highly detailed and can be used in many different
ways.
Historical: They can host a continuous record of data over years and years.
You can store this data in three different ways: on-premises data warehouses, cloud
data warehouses, and hybrid data warehouses.
On-premises data warehouses run on physical servers that your company owns and
manages. Cloud data warehouses, such as Amazon Redshift, are fully online: you pay
for space on servers that another company manages. Hybrid data warehouses are a
mix of both on-premises and cloud, and companies making the transition to the
cloud over a period of time use this option.
ONLINE ANALYTICAL PROCESSING (OLAP)
With all the data stored in one place, data warehouses use a specific approach to
process data called online analytical processing (OLAP), which is specifically
designed for complex queries.
One way to think about it is that when you go to your data warehouse to ask a
question about the relationship between one set of data and another, OLAP is a
way of organizing and moving among the rows and rows of shelves to quickly find
that information.
This is great for business intelligence because the questions you ask about your
data in order to make decisions are rarely simple. Because data warehouses use
OLAP, they make finding answers to these complex questions very efficient. As a
result, they’ve become a foundation for many successful business intelligence
systems.
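As a rough illustration of the kind of question OLAP is built for, the sketch below totals the same facts along different dimensions (a "roll-up"). It uses Python's built-in sqlite3 module as a stand-in for a real warehouse; the sales table, its columns, and the figures are made up for this example.

```python
import sqlite3

# Hypothetical sales facts: (region, product, year, amount).
SALES = [
    ("North", "Laptop", 2023, 1200.0),
    ("North", "Phone", 2023, 800.0),
    ("South", "Laptop", 2023, 1500.0),
    ("South", "Phone", 2024, 700.0),
    ("North", "Laptop", 2024, 1300.0),
]

ALLOWED_DIMENSIONS = {"region", "product", "year"}


def build_warehouse():
    """Create an in-memory table standing in for a warehouse fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", SALES)
    return conn


def total_by(conn, dimension):
    """Roll the sales amounts up along one dimension."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    # The column name is checked against a fixed set before being interpolated.
    query = (
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return dict(conn.execute(query).fetchall())


conn = build_warehouse()
by_region = total_by(conn, "region")  # same facts, viewed by region
by_year = total_by(conn, "year")      # same facts, viewed by year
print(by_region)
print(by_year)
```

The same table answers both questions without being restructured, which is the essence of moving among dimensions in OLAP.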
HOW DO DATA WAREHOUSES WORK?
Storage is a fairly simple choice. As mentioned earlier, you can host your data
warehouse on-premises, in the cloud, or use a hybrid approach. On-premises
hosting is, according to some, on its way out. Cloud hosting is often cheaper and
more flexible because you rent space on a provider's servers: you don't need to
run maintenance yourself, you can expand and cut back as needed, and providers
add new features each year. Bridging the gap between these two
approaches is hybrid hosting, which, as mentioned before, is the preferred
choice for companies migrating from on-premises to cloud hosting.
To get data into your data warehouse, you need to use a type of software commonly called ETL software. Extract,
transform, load (ETL) is a process where the data is extracted, made ready for use, then loaded into the data
warehouse.
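The three ETL steps can be sketched in a few lines of Python. This is only a minimal illustration, assuming a small CSV export as the source and an in-memory SQLite database as the "warehouse"; the column names and cleaning rules are invented for the example and are not taken from any particular ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system.
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,10.50,2024-01-05
2,BOB,,2024-01-06
3,Carol,7.25,2024-01-07
"""


def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows and standardize types and names."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip rows with a missing amount
        clean.append(
            (
                int(row["order_id"]),
                row["customer"].strip().title(),
                float(row["amount"]),
                row["order_date"],
            )
        )
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Note that the data is cleaned before it reaches the warehouse, so only the two complete rows are stored.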
Nowadays, many more companies use an alternative to ETL called extract, load,
transform (ELT). Often companies will extract data from source systems, load it into a data lake, then use the data
warehouse to transform the data. Both ETL and ELT are facilitated by software like Panoply.io and Stitch.
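To make the difference in ordering concrete, here is a minimal ELT sketch: the raw rows are loaded untouched first, and the transformation runs afterwards as SQL inside the database. SQLite again stands in for the warehouse, and the raw_orders and orders tables are hypothetical names chosen for this example.

```python
import sqlite3

# Hypothetical raw rows, loaded exactly as they arrive (everything is text).
RAW_ROWS = [
    ("1", " alice ", "10.50"),
    ("2", "BOB", ""),
    ("3", "Carol", "7.25"),
]

conn = sqlite3.connect(":memory:")

# Extract + Load: store the raw data first, with no cleaning.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ROWS)

# Transform: clean and type the data afterwards, inside the database.
conn.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           LOWER(TRIM(customer))     AS customer,
           CAST(amount AS REAL)      AS amount
    FROM raw_orders
    WHERE TRIM(amount) <> ''
    """
)

raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
clean = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(raw_count, clean)
```

Because the raw table is kept, the transformation can be changed and rerun later without going back to the source systems.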
Of course, data warehouses don’t run themselves. Labor is a significant part of keeping a data warehouse running
because it’s not just a system; it’s a “full-fledged…architecture” that requires experts to set up and manage.
The purpose of all this work is to centralize and organize data, so it can be more easily understood. This is where
business intelligence tools come in. They essentially sit atop the data warehouses as a layer that helps you query,
analyze, and visualize your data.
DW CHARACTERISTICS:
A data warehouse is organized around specific subjects, so that users share a
common way of describing the trends they analyze. Its major characteristics, as
commonly listed, are that it is subject-oriented, integrated, time-variant, and
non-volatile.
TYPES OF DATA WAREHOUSES:
There are several types of data warehouses depending on the organization’s needs:
Enterprise Data Warehouse (EDW): A large-scale data warehouse that provides a
central repository for the entire organization. It serves various departments with
integrated data.
Operational Data Store (ODS): A staging area for operational data, used for short-
term queries. It often acts as an intermediary between operational systems and the
EDW.
Data Marts: Smaller, department-specific data stores that are derived from the
larger data warehouse. They focus on specific business functions, such as sales or
marketing.
BENEFITS OF DATA WAREHOUSING IN BI
•Enhanced Data Quality: By cleansing and standardizing data during the ETL process, data
warehouses ensure higher data accuracy.
•Historical Analysis: The ability to store historical data allows businesses to identify trends, measure
performance over time, and forecast future trends.
•Scalability: Data warehouses are built to handle vast amounts of data, making them scalable as the
organization’s data needs grow.
•Faster Query Performance: Optimized for analytical queries, data warehouses support complex
queries and reports, improving the efficiency of data retrieval.
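As a small illustration of the historical-analysis benefit, the sketch below computes monthly totals and month-over-month growth from dated records. The figures and the function names are invented for this example; a real warehouse would typically do this in SQL over far more data.

```python
from collections import defaultdict

# Hypothetical history of (ISO date, sales amount) records.
HISTORY = [
    ("2024-01-15", 100.0),
    ("2024-01-28", 50.0),
    ("2024-02-10", 180.0),
    ("2024-03-05", 90.0),
]


def monthly_totals(rows):
    """Sum amounts per month, keyed by 'YYYY-MM', in chronological order."""
    totals = defaultdict(float)
    for date, amount in rows:
        totals[date[:7]] += amount
    return dict(sorted(totals.items()))


def month_over_month(totals):
    """Relative change of each month compared with the previous month."""
    months = list(totals)
    return {
        cur: (totals[cur] - totals[prev]) / totals[prev]
        for prev, cur in zip(months, months[1:])
    }


totals = monthly_totals(HISTORY)
growth = month_over_month(totals)
print(totals)
print(growth)
```

This kind of trend is only possible because the warehouse keeps older records instead of overwriting them.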
CHALLENGES IN DATA WAREHOUSING
•High Implementation Cost: Building and maintaining a data warehouse can be expensive due to hardware,
software, and skilled personnel requirements.
•Data Integration: Integrating data from various sources can be complex, especially when dealing with
different data formats and structures.
•Maintenance: As organizations grow, the volume of data can increase, requiring ongoing maintenance,
optimization, and storage management.
•Latency: Although many data warehouses can provide near-real-time data, they may not be suitable for applications
requiring true real-time data processing.
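The data integration challenge can be seen even with two tiny sources. The sketch below assumes a hypothetical CRM export and a hypothetical billing export that name their fields differently and write dates in different formats, and maps both onto one common structure.

```python
from datetime import datetime

# Two hypothetical sources describing the same kind of record differently.
CRM_ROWS = [{"CustomerName": " Alice ", "SignupDate": "05/01/2024"}]  # day/month/year
BILLING_ROWS = [{"name": "bob", "signup": "2024-01-06"}]              # ISO format


def from_crm(row):
    """Map a CRM row onto the common structure."""
    signup = datetime.strptime(row["SignupDate"], "%d/%m/%Y").date()
    return {"name": row["CustomerName"].strip().lower(), "signup_date": signup.isoformat()}


def from_billing(row):
    """Map a billing row onto the common structure."""
    signup = datetime.strptime(row["signup"], "%Y-%m-%d").date()
    return {"name": row["name"].strip().lower(), "signup_date": signup.isoformat()}


unified = [from_crm(r) for r in CRM_ROWS] + [from_billing(r) for r in BILLING_ROWS]
print(unified)
```

Each new source needs its own mapping like this, which is one reason integration effort grows with the number of systems involved.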
WHEN SHOULD I USE A DATA WAREHOUSE
FOR BUSINESS INTELLIGENCE?
There are generally four stages of data sophistication: source data, data lakes, data
warehouses, and data marts. Knowing when to invest in a data warehouse requires
you to know each stage, but at the end of the day, the data warehouse stage is
what unlocks the true power of your data.
1. SOURCE DATA
Source data is any individual set of data, such as databases, Excel spreadsheets,
or individual application reports. It is structured (i.e., organized) yet isolated
data that works fine alone but does not provide a larger picture of your
organization’s data as a whole.
2. DATA LAKE
A data lake is a central place where all raw, unorganized data is stored. Unlike a
data warehouse, which stores data in an orderly, structured way, a data lake holds
everything in its original form, like dumping it all into a lake.
The term was coined by James Dixon, who likens it to a place where skilled divers
can explore the raw data. However, the challenge with a data lake is that the data
isn't organized for immediate use or analysis, and it can take a lot of effort to find
what you're looking for.
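The effort of finding what you need in a data lake can be illustrated with a small sketch. It assumes the lake holds raw text records of mixed kinds and quality (all invented here), so structure has to be applied at read time rather than at storage time.

```python
import json

# Hypothetical raw records, stored exactly as they arrived.
RAW_LAKE = [
    '{"type": "order", "id": 1, "total": "19.99"}',
    '{"type": "click", "page": "/home", "ts": "2024-01-05T10:00:00"}',
    "not valid json at all",
    '{"type": "order", "id": 2, "total": 5}',
]


def read_orders(raw_records):
    """Pick out the order records and give them a consistent shape."""
    orders = []
    for line in raw_records:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unreadable record: skip it
        if isinstance(record, dict) and record.get("type") == "order":
            orders.append({"id": int(record["id"]), "total": float(record["total"])})
    return orders


orders = read_orders(RAW_LAKE)
print(orders)
```

Every reader of the lake has to repeat this kind of filtering and type fixing, which is the work a data warehouse does once, up front.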
3. DATA WAREHOUSE
Like a data lake, a data warehouse centralizes your data, but as we’ve established,
it’s well-organized and set up for efficient analysis. It’s a single source of truth for
all data that’s easier to understand and navigate.
Data warehouses can hook right up to source data, but nowadays, we’re seeing
more and more companies use their data warehouse as a layer on top of their data
lake. Following Dixon’s comparison, if a data lake is the water/data in its natural,
unorganized state, a data warehouse is where you treat it and make it ready for
consumption.
4. DATA MART
A data warehouse can be too powerful for small tasks, like using a sledgehammer
to swat a fly (that is, using a tool far too large, powerful, or complex for a
small, simple task). For repeated queries, such as those run by a marketing
team, a data mart is a better option. A data mart is a smaller, curated set of data
tailored for specific use cases.
Think of the warehouse as a treatment center where water is purified, and a data
mart as pre-packaged water bottles ready for use. The data warehouse remains
the core system, providing a structured, centralized view of all data, making it
easier to access and create specific data marts when needed.
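One simple way to picture a data mart is as a curated view derived from the warehouse. The sketch below assumes a hypothetical sales table and builds a marketing_mart view on top of it with SQLite; real data marts are often separate tables or databases, so this is only one possible arrangement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding data for every department.
conn.execute("CREATE TABLE sales (department TEXT, campaign TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("marketing", "spring", 100.0),
        ("marketing", "spring", 50.0),
        ("marketing", "summer", 75.0),
        ("finance", None, 900.0),
    ],
)

# The data mart: a smaller, pre-shaped slice for the marketing team only.
conn.execute(
    """
    CREATE VIEW marketing_mart AS
    SELECT campaign, SUM(amount) AS revenue
    FROM sales
    WHERE department = 'marketing'
    GROUP BY campaign
    """
)

mart_rows = conn.execute(
    "SELECT campaign, revenue FROM marketing_mart ORDER BY campaign"
).fetchall()
print(mart_rows)
```

The marketing team queries only the mart, while the warehouse table underneath remains the single source the mart is derived from.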
DATA WAREHOUSE VS. DATA LAKE
Data Warehouse: Structured, organized data designed for analysis and reporting. It
is ideal for business users who need clean, standardized data for decision-making.
Data Lake: A storage system for raw, unstructured data. Unlike a data warehouse,
data lakes store data in its natural format, making it more flexible but less
structured.
BUSINESS INTELLIGENCE VS. DATA
WAREHOUSES
Business intelligence and data warehousing are related concepts that operate in the
same space, yet they are distinct. Both BI and data warehouses involve the
storage of data.
However, business intelligence also covers the collection, methodology, and analysis
of data, whereas a data warehouse is fundamentally the storage and organization
of that data in support of BI processes.
Maintaining and deploying a data warehouse is so critical to BI that they are often
collectively referred to as BIDW.
HOW IS DATA ANALYZED USING A DATA
WAREHOUSE?
DWHs use Online Analytical Processing (OLAP) to process large swaths of data.
Because the warehouse consolidates all the data on a centralized platform, OLAP
can streamline complex queries over it. In simpler terms, OLAP is a
computing method that helps users extract and query the data they need for
analysis.
For example, if someone asks about the relationship between two different
datasets in a DWH, OLAP processing would be used to move through the stored
data to find, identify, and summarize the desired information quickly. Using OLAP, a
data warehouse provides BI with the data it needs to analyze.
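A question about the relationship between two datasets usually becomes a join followed by a summary. The sketch below assumes two hypothetical tables, customers and orders, held in an in-memory SQLite database, and summarizes order values per customer segment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical datasets stored side by side in the warehouse.
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'retail'), (2, 'wholesale'), (3, 'retail');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 2, 200.0), (12, 3, 30.0), (13, 2, 100.0);
    """
)

# Relate the two datasets, then summarize: order count and average per segment.
summary = conn.execute(
    """
    SELECT c.segment, COUNT(*) AS n_orders, AVG(o.amount) AS avg_amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.segment
    ORDER BY c.segment
    """
).fetchall()
print(summary)
```

A BI tool sitting on top of the warehouse would typically generate a query of this shape and chart the result.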
EXAMPLES OF DATA WAREHOUSING IN
BUSINESS INTELLIGENCE
Retail: A retail company uses a data warehouse to analyze sales data across different stores, customer
demographics, and time periods. This helps optimize product offerings, improve marketing strategies, and
forecast demand.
Healthcare: A healthcare provider consolidates patient data, treatment histories, and medical outcomes into a
data warehouse. This helps in tracking patient progress, improving treatment plans, and complying with
regulatory reporting requirements.
Finance: Financial institutions use data warehouses to analyze transactions, manage risks, and detect fraud. By
combining data from various banking systems, they can make more informed investment and lending decisions.
TRENDS IN DATA WAREHOUSING
Cloud Data Warehousing: Many businesses are shifting to cloud-based data warehouses (e.g., Amazon Redshift,
Google BigQuery) for cost efficiency, scalability, and flexibility.
Real-Time Data Warehousing: Increasingly, businesses are demanding near-real-time data for faster decision-
making, leading to the development of more real-time data processing and streaming analytics.
Artificial Intelligence (AI) Integration: AI and machine learning models are being integrated into data
warehouses to automate data analysis, generate insights, and predict future trends.
CONCLUSION
https://airbyte.com/data-engineering-resources/business-intelligence-data-warehouse
https://www.impactmybiz.com/blog/business-intelligence-data-warehousing/
https://www.tableau.com/learn/articles/value-of-bi-data-warehousing
https://www.astera.com/type/blog/data-warehouse-and-business-intelligence/
https://www.atlassian.com/data/business-intelligence/data-warehouses-guide
https://www.geeksforgeeks.org/characteristics-and-functions-of-data-warehouse/