Datawarehousing Interview Questions

The document describes an interview question bank that provides study material and interview questions for various topics. It includes links to hundreds of questions and answers for topics like Informatica, SQL, Unix, SAP, Python, HR, data warehousing, ETL, and more. The interview question bank is hosted at https://www.instamojo.com/interview_questions.

Uploaded by elena
Copyright © All Rights Reserved
Interview Question Bank - https://www.instamojo.com/interview_questions

Interview Question Bank

For Complete Study Material For Interview Preparations Click On Below Link
https://www.instamojo.com/interview_questions

Click On Below Links For More Courses

Informatica 650+ Interview Questions & Answers


https://www.instamojo.com/interview_questions/informatica-interview-questions-asked-in-top

SQL 500+ Interview Questions & Answers


https://www.instamojo.com/interview_questions/sql-interview-questions-asked-in-top-it-comp

Unix 500+ Interview Questions & Answers


https://www.instamojo.com/interview_questions/unix-interview-questions-asked-in-top-it-com

SAP BODS 250+ Interview Questions & Answers


https://www.instamojo.com/interview_questions/sap-bods-interview-questions-answers-asked-i

SAP BO 250+ Interview Questions & Answers


https://www.instamojo.com/interview_questions/sap-business-objects-interview-questions-ask

Python 500+ Interview Questions & Answers


https://www.instamojo.com/interview_questions/python-interview-questions-asked-in-top-it-c

HR 150+ Interview Questions & Answers


https://www.instamojo.com/interview_questions/hr-round-interview-questions-answers-asked-i

Data Warehousing and ETL Interview Questions & Answers


https://www.instamojo.com/interview_questions/data-warehousing-and-etl-interview-questions

SCD Types (0,1,2,3,4,5,6,7) With Explanation


https://www.instamojo.com/interview_questions/scd-5a66d


Campus Placement 2000+ Interview Questions With Answers For Freshers


https://www.instamojo.com/interview_questions/campus-placement-interview-questions-with-an

100+ Resume, CV & Cover Letters


https://www.instamojo.com/interview_questions/download-your-favorite-resume-template-and-c

Register Here For Job Referral Program


https://www.instamojo.com/interview_questions/apply-for-referral-program-and-get-an-opport

ETL – Data Warehousing


Interview Questions & Answers

What is a data warehouse? List the types of Data warehouse architectures?


A data warehouse is the electronic storage of an organization’s historical data for the
purpose of data analytics. In other words, a data warehouse contains a wide variety of data
that supports the decision-making process in an organization.
There are mainly 3 types of data warehouse architectures:

Single-tier architecture
• The objective of a single layer is to minimize the amount of data stored by removing data redundancy.
• It is not frequently used in practice.

Two-tier architecture
• This architecture physically separates the available sources from the data warehouse.
• This architecture is not expandable and does not support a large number of end-users.
• Because of network limitations, this architecture faces connectivity issues.

Three-tier architecture
• It is the most widely used architecture and consists of the Top, Middle and Bottom tiers.
• Bottom Tier: Usually a relational database serves as the bottom tier of the data warehouse, where data is cleansed, transformed, and loaded.
• Middle Tier: This application tier is an OLAP server that presents an abstracted view of the database and acts as a mediator between the end-user and the database.
• Top Tier: The top tier is a front-end client layer that channels data out of the data warehouse.


What are different characteristics of Data Warehouse?


1. A data warehouse is a database that is kept separate from the operational database and also stores historical information.
2. A data warehouse database contains transactional as well as analytical data.
3. A data warehouse helps higher management take strategic as well as tactical decisions using historical or current data.
4. A data warehouse supports consolidated analysis of historical data.
5. A data warehouse helps business users see current trends in order to run the business.
6. A data warehouse is used for reporting and data analysis.

What are different types of Data warehouse Systems?


1. Data Mart
2. Online Analytical Processing (OLAP)
3. Online Transactional Processing (OLTP)
4. Predictive Analysis

What is Business Intelligence?


Business Intelligence (BI), also known as a DSS (Decision Support System), refers to the technologies, applications and practices for the collection, integration and analysis of business-related information or data. It also helps users see the data behind the information itself.

What is Data Mining?


Data mining is the process of analysing data from different dimensions or perspectives and summarizing it into useful information. The data can be queried and retrieved from the database in its own format.

What is Virtual Warehouse?


A set of views over operational databases is known as a virtual warehouse.

What is the main difference between Inmon and Kimball philosophies of Data Warehousing?
Both differ in the concept of building the Data Warehouse.
• Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in an organization, and the data warehouse is the combination of the data marts, integrated through conformed dimensions. Hence, a unified view of the enterprise can be obtained from dimensional modelling at the local departmental level.
• Inmon advocates creating the data warehouse on a subject-by-subject area basis. Hence, the development of the data warehouse can start with data from, for example, the online store. Other subject areas can be added to the data warehouse as the need arises. Point-of-sale (POS) data can be added later if management decides that it is necessary.


• Hence, the process is as follows:

Kimball > Data marts first > Combined > Data warehouse
Inmon > Data warehouse first > Data marts

What is the difference between a data warehouse and a transactional system?

Data warehouse | Transactional system
Stores data in a complex, general (consolidated) form. | Stores the up-to-date daily transactions, workloads, etc.
Stores historical information. | Contains the current data of the organization.
The database generally has read-only access, meaning users can only select the data. | An operational or transactional database has insert and update privileges, because data processing and updates are needed.
Data warehousing requires data cleaning, data validation and data consolidation. | A transactional system requires parallel processing of the data and concurrency control; data consolidation is less required than in an OLAP database.

What is the very basic difference between data warehouse and operational databases?
A data warehouse contains historical information that is made available for analysis of the
business whereas an operational database contains current information that is required to
run the business.

Define data analytics in the context of data warehousing?


• Data analytics is the science of examining raw data with the purpose of drawing business-
driven conclusions about that data.
• The role of a data warehouse is to enable data analysis.

What is a subject-oriented data warehouse?


Subject-oriented data warehouses are those that store data around a particular “subject”
such as customer, sales, product, among others.

List any five applications of data warehouse?


Some applications include financial services, banking services, consumer goods, the retail sector, and controlled manufacturing.


What is ODS?
ODS stands for Operational Data Store. It is essentially a repository of real-time operational data.

What does OLAP stand for?


OLAP stands for Online Analytical Processing. It is a system which collects, manages, and
processes multi-dimensional data for analysis and management.

List the types of OLAP server?


There are four types of OLAP servers, namely Relational OLAP, Multidimensional OLAP,
Hybrid OLAP, and Specialized SQL Servers.

Which one is faster, Multidimensional OLAP or Relational OLAP?


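Multidimensional OLAP (MOLAP) is generally faster for queries than Relational OLAP (ROLAP), because MOLAP stores pre-computed aggregations in multidimensional cubes, whereas ROLAP computes them from relational tables at query time.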

List the functions performed by OLAP?


OLAP performs functions such as roll-up, drill-down, slice, dice, and pivot.
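These five operations can be illustrated with a minimal Python sketch over a small, hypothetical sales data set. The records, field order and function names are made up for illustration, and the slice/dice definitions follow common textbook usage (slice fixes one dimension, dice restricts several); a real OLAP server applies these operations to cubes rather than Python lists.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, month, region, product, amount)
SALES = [
    (2023, 1, "North", "Shirts", 100),
    (2023, 1, "South", "Shirts", 40),
    (2023, 2, "North", "Shorts", 60),
    (2024, 1, "North", "Shirts", 80),
    (2024, 2, "South", "Shorts", 30),
]

def roll_up_to_year(rows):
    """Roll-up: climb the time hierarchy from month to year and re-aggregate."""
    totals = defaultdict(int)
    for year, _month, _region, _product, amount in rows:
        totals[year] += amount
    return dict(totals)

def drill_down_to_month(rows, year):
    """Drill-down: go from a yearly total back to the more detailed month level."""
    totals = defaultdict(int)
    for y, month, _region, _product, amount in rows:
        if y == year:
            totals[month] += amount
    return dict(totals)

def slice_region(rows, region):
    """Slice: fix a single dimension (region) to one value."""
    return [r for r in rows if r[2] == region]

def dice(rows, years, products):
    """Dice: select a sub-cube by restricting two or more dimensions."""
    return [r for r in rows if r[0] in years and r[3] in products]

def pivot(rows):
    """Pivot: rotate the view so regions become rows and products become columns."""
    table = defaultdict(lambda: defaultdict(int))
    for _year, _month, region, product, amount in rows:
        table[region][product] += amount
    return {region: dict(cols) for region, cols in table.items()}

print(roll_up_to_year(SALES))             # {2023: 200, 2024: 110}
print(drill_down_to_month(SALES, 2023))   # {1: 140, 2: 60}
print(len(slice_region(SALES, "North")))  # 3
print(dice(SALES, {2024}, {"Shirts"}))    # [(2024, 1, 'North', 'Shirts', 80)]
print(pivot(SALES))  # {'North': {'Shirts': 180, 'Shorts': 60}, 'South': {'Shirts': 40, 'Shorts': 30}}
```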

List the types of OLAP servers?


• Relational OLAP
• Multidimensional OLAP
• Hybrid OLAP
• Specialized SQL Servers

List some of the functions performed by OLAP?


Some of the major functions performed by OLAP include “roll-up”, “drill-down”, “slice”,
“dice”, and “pivot”.

What does OLTP stand for?


OLTP stands for Online Transaction Processing. It is a system that modifies data as soon as it is received, for many concurrent users.

What is the difference between OLTP and OLAP?


Following are the differences between OLTP and OLAP:

OLTP | OLAP
Data comes from the original data source | Data comes from various data sources
Simple queries by users | Complex queries by the system
Normalized, small database | De-normalized, large database
Fundamental business tasks | Multi-dimensional business tasks


What Is Schema?
A schema is a collection of database objects owned by a user.

What is database schema? What are its types?


A schema is a logical description of the whole database. A database schema is the skeleton or structure of the database, which represents the database logically. Like an operational database, a data warehouse also needs to maintain a schema. A data warehouse schema includes the names of the database objects and their relationships, maintained in diagrammatic format. An operational database uses the relational model, whereas a data warehouse uses the following types of schema:
1. Star Schema
2. Snowflake Schema
3. Fact Constellation Schema, which is also called Galaxy Schema.

List the schemas that a data warehouse system can implement?


A data Warehouse can implement star schema, snowflake schema, and fact constellation
schema.

What is Star Schema?


In a star schema, the fact table is at the centre and all dimension tables surround it. It is called a star schema because the diagram resembles a star with points radiating from the centre. A star schema is used in simple data marts or data warehouses and is designed to optimize querying on large data sets. In a star schema, multiple denormalized dimension tables are joined with only one fact table.
The OBIEE BMM (Business Model and Mapping) layer always follows a star schema.
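A star schema can be sketched with Python's built-in sqlite3 module. The table and column names (fact_sales, dim_item, dim_store, dim_date) and the sample rows are hypothetical; the point of the sketch is that every dimension joins directly to the single fact table in one hop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized dimension tables, each with its own surrogate key.
conn.executescript("""
CREATE TABLE dim_item  (item_key INTEGER PRIMARY KEY, item_name TEXT, category TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT, city TEXT, state TEXT);
CREATE TABLE dim_date  (date_key INTEGER PRIMARY KEY, full_date TEXT, month INTEGER, year INTEGER);

-- The single central fact table: one foreign key per dimension plus the measure.
CREATE TABLE fact_sales (
    item_key   INTEGER REFERENCES dim_item (item_key),
    store_key  INTEGER REFERENCES dim_store (store_key),
    date_key   INTEGER REFERENCES dim_date (date_key),
    units_sold INTEGER
);

INSERT INTO dim_item  VALUES (1, 'Yellow shirt', 'Shirts'), (2, 'Blue Shorts', 'Shorts');
INSERT INTO dim_store VALUES (1, 'Store 1', 'City A', 'State X'), (2, 'Store 2', 'City B', 'State Y');
INSERT INTO dim_date  VALUES (20240101, '2024-01-01', 1, 2024), (20240201, '2024-02-01', 2, 2024);
INSERT INTO fact_sales VALUES (1, 1, 20240101, 10), (2, 1, 20240101, 5), (1, 2, 20240201, 3);
""")

# Typical star query: each dimension is exactly one join away from the fact table.
rows = conn.execute("""
    SELECT s.state, i.category, d.year, SUM(f.units_sold) AS units
    FROM fact_sales AS f
    JOIN dim_item  AS i ON f.item_key  = i.item_key
    JOIN dim_store AS s ON f.store_key = s.store_key
    JOIN dim_date  AS d ON f.date_key  = d.date_key
    GROUP BY s.state, i.category, d.year
    ORDER BY s.state, i.category
""").fetchall()
print(rows)
# [('State X', 'Shirts', 2024, 10), ('State X', 'Shorts', 2024, 5), ('State Y', 'Shirts', 2024, 3)]
```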

What is a snowflake schema?


A snowflake schema is a form of dimensional modelling in which dimensions are stored across multiple dimension tables. It is a variation of the star schema. In the diagram, each fact is surrounded by its dimensions, and some dimensions are further related to other dimensions, branching out in a snowflake pattern. In a snowflake schema, multiple dimension tables are organized and joined with the fact table. The only difference between a star schema and a snowflake schema is that the dimensions are normalized in a snowflake schema. Normalization splits the data into additional tables.
Real life Example:
In the diagram, I have shown a snowflake schema in which the sales table is the fact table and all other tables are dimensions. The store table is further normalized into separate tables named city, state and region.
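The store example can be sketched with sqlite3, with the store dimension normalized into city, state and region tables. Table names and rows are hypothetical. Compared with a star schema, the region attribute is now several joins away from the fact table, which is the usual trade-off of normalizing dimensions: less redundancy, more joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The store dimension is normalized into a chain: store -> city -> state -> region.
conn.executescript("""
CREATE TABLE region (region_key INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE state  (state_key  INTEGER PRIMARY KEY, state_name TEXT,
                     region_key INTEGER REFERENCES region (region_key));
CREATE TABLE city   (city_key   INTEGER PRIMARY KEY, city_name TEXT,
                     state_key  INTEGER REFERENCES state (state_key));
CREATE TABLE store  (store_key  INTEGER PRIMARY KEY, store_name TEXT,
                     city_key   INTEGER REFERENCES city (city_key));
CREATE TABLE sales  (store_key  INTEGER REFERENCES store (store_key), units_sold INTEGER);

INSERT INTO region VALUES (1, 'West'), (2, 'East');
INSERT INTO state  VALUES (1, 'State X', 1), (2, 'State Y', 2);
INSERT INTO city   VALUES (1, 'City A', 1), (2, 'City B', 1), (3, 'City C', 2);
INSERT INTO store  VALUES (1, 'Store 1', 1), (2, 'Store 2', 2), (3, 'Store 3', 3);
INSERT INTO sales  VALUES (1, 10), (2, 5), (3, 7), (1, 2);
""")

# Reaching the region attribute needs one join per normalized level.
rows = conn.execute("""
    SELECT r.region_name, SUM(f.units_sold)
    FROM sales AS f
    JOIN store  AS st ON f.store_key  = st.store_key
    JOIN city   AS c  ON st.city_key  = c.city_key
    JOIN state  AS s  ON c.state_key  = s.state_key
    JOIN region AS r  ON s.region_key = r.region_key
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
print(rows)  # [('East', 7), ('West', 17)]
```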

What is the language that is used for schema definition?


Data Mining Query Language (DMQL) is used for schema definition.

What do you understand by the ER model?


ER model or entity-relationship model is a methodology for data modelling wherein the goal
of modelling is to normalize the data by reducing redundancy.


What do you understand by dimensional modelling?


A dimensional model is a methodology that consists of “dimensions” and “fact tables”. Fact tables store the various transactional measurements, and “dimension tables” qualify that data.

What are the types of Dimensional Modelling?


There are three types of Dimensional Modelling and they are as follows:
• Conceptual Modelling
• Logical Modelling
• Physical Modelling

What is the difference between ER Modelling and Dimensional Modelling?


ER modelling has both a logical and a physical model, but dimensional modelling has only a physical model.
ER Modelling is used for normalizing the OLTP database design whereas Dimensional
Modelling is used for de-normalizing the ROLAP and MOLAP design.

Explain what is a dimension of data warehousing? What are the primary functions of the
dimensions?
A dimension can be defined as a classification that categorizes the measures and facts in an orderly fashion. Together with these facts and measures, dimensions help users define and find the answers needed for business operations.
For example, the common dimensions that are used are:
1. People
2. Products
3. Place
4. Time, etc.
The primary functions of the dimensions are as follows:
1. Filtering
2. Grouping
3. Labelling
These functions are all used when slicing and dicing the data, where slicing refers to filtering the data and dicing refers to grouping the data.

Real Life Example:


Consider the following table, which contains item information. ITEM KEY is the primary key that uniquely identifies the rows in the dimension table, and it will also be present in the fact table.

ITEM KEY | ITEM NAME | BRAND | SOLD BY | Category
00001 | Yellow shirt | Yazaki | Amit | Shirts
00002 | Football | Start sports | Rahul | Sports
00003 | Blue Shorts | Puma | Amit | Shorts

In the image, I have marked which tables are facts and which are dimensions. You will be able to see that there are four dimensions:

1. Time
2. Location
3. Item
4. Branch
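The three functions of a dimension (filtering, grouping, labelling) can be sketched in Python using the ITEM dimension rows from the table above. The quantities in FACTS and the helper names are hypothetical, chosen only to show how a fact row's ITEM KEY is used to reach the dimension attributes.

```python
# The ITEM dimension rows from the table above, keyed by ITEM KEY.
ITEM_DIM = {
    "00001": {"item_name": "Yellow shirt", "brand": "Yazaki", "sold_by": "Amit", "category": "Shirts"},
    "00002": {"item_name": "Football", "brand": "Start sports", "sold_by": "Rahul", "category": "Sports"},
    "00003": {"item_name": "Blue Shorts", "brand": "Puma", "sold_by": "Amit", "category": "Shorts"},
}

# Hypothetical fact rows: (ITEM KEY, quantity sold).
FACTS = [("00001", 100), ("00002", 30), ("00003", 15)]

def filter_by(facts, attribute, value):
    """Filtering: keep only fact rows whose dimension attribute equals value."""
    return [(key, qty) for key, qty in facts if ITEM_DIM[key][attribute] == value]

def group_by(facts, attribute):
    """Grouping: total the measure per value of a dimension attribute."""
    totals = {}
    for key, qty in facts:
        group = ITEM_DIM[key][attribute]
        totals[group] = totals.get(group, 0) + qty
    return totals

def label(facts):
    """Labelling: replace the key with a readable dimension attribute."""
    return [(ITEM_DIM[key]["item_name"], qty) for key, qty in facts]

print(filter_by(FACTS, "category", "Shirts"))  # [('00001', 100)]
print(group_by(FACTS, "sold_by"))              # {'Amit': 115, 'Rahul': 30}
print(label(FACTS))  # [('Yellow shirt', 100), ('Football', 30), ('Blue Shorts', 15)]
```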

What are the different types of “dimension”?


• Conformed dimension
• Junk dimension
• Degenerate dimension
• Role Playing dimension

What is junk dimension?


• In scenarios where certain data may not be appropriate to store in the schema, the data (or attributes) can be stored in a junk dimension. The data in a junk dimension usually consists of Boolean or flag values.
• A single dimension formed by lumping together a number of small dimensions is called a junk dimension. A junk dimension has unrelated attributes. The process of grouping random flags and text attributes out of a dimension by moving them into a separate sub-dimension is what produces a junk dimension.
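A junk dimension is often pre-built as every combination of its flag values, each with its own surrogate key, so that a fact row stores one junk key instead of several flag columns. The flag names, values and the junk_key_for helper below are hypothetical; this is a minimal Python sketch, not a prescribed design.

```python
from itertools import product

# Hypothetical low-cardinality flags that would otherwise clutter the fact table.
FLAG_VALUES = {
    "is_gift_wrapped": ("Y", "N"),
    "payment_type": ("CASH", "CARD"),
    "is_promo": ("Y", "N"),
}
FLAG_NAMES = list(FLAG_VALUES)

# One junk-dimension row per combination of flag values, keyed by a surrogate key.
junk_dim = {
    combo: junk_key
    for junk_key, combo in enumerate(product(*FLAG_VALUES.values()), start=1)
}

def junk_key_for(**flags):
    """Return the single junk key that a fact row stores instead of the flag columns."""
    return junk_dim[tuple(flags[name] for name in FLAG_NAMES)]

print(len(junk_dim))  # 8 combinations (2 * 2 * 2)
print(junk_key_for(is_gift_wrapped="Y", payment_type="CARD", is_promo="N"))  # 4
```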

What is a mini dimension?


Mini dimensions are dimensions that are used when a large number of rapidly changing
attributes are separated into smaller tables.

How can we load the time dimension?


Time dimensions are usually loaded with all possible dates, one row per day, and this is typically done through a program. For example, 100 years can be represented with one row per day, which is roughly 36,500 rows.
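A minimal sketch of such a program in Python, assuming a simple, hypothetical set of columns (date_key as a YYYYMMDD integer, ISO day of week, month, quarter, year); real date dimensions usually carry many more attributes, such as holidays and fiscal periods.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Generate one date-dimension row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # smart key, e.g. 20240131
            "full_date": day.isoformat(),
            "day_of_week": day.isoweekday(),          # 1 = Monday ... 7 = Sunday
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        })
        day += timedelta(days=1)
    return rows

dim = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))
print(len(dim))  # 366, because 2024 is a leap year
```

Calling build_date_dimension(date(1925, 1, 1), date(2024, 12, 31)) produces the 100-year table: 36,525 rows, since 25 of those years are leap years.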

What is meant by fact tables? Explain with an example.


A fact table is the central table found in a star schema or snowflake schema and is surrounded by dimension tables. A fact table contains numeric values that are known as measurements.
A fact table has two types of columns:
1. Facts
2. Foreign keys of the dimension tables
The measures in a fact table are of three types:


1. Additive: measures that can be added across any dimension.
2. Non-additive: measures that cannot be added across any dimension.
3. Semi-additive: measures that can be added across some dimensions.
Real Example:
Following is a fact table that contains the primary keys of the dimension tables and an added measure for ITEM, i.e. Product Sold.

ITEM KEY | Time key | Product key | Date key | Product Sold
00001 | T001 | P001 | D001 | 100
00002 | T002 | P002 | D002 | 30
00003 | T003 | P003 | D003 | 15

The fact table contains the foreign keys of the dimensions (for example the time, product and customer dimensions) and the measurement values. Common examples of facts are number of units sold, margin and sales revenue, while dimension tables such as customer, time and product are used to analyse the data.

Define fact-less fact?


A fact-less fact is a fact table that does not contain any measures. Such a table only contains keys from different dimension tables.
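A common illustration is a hypothetical attendance fact table that only records which student attended which course on which date. The table names and rows below are assumptions; the sqlite3 sketch shows that the only thing left to aggregate is the number of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_course (course_key INTEGER PRIMARY KEY, course_name TEXT);

-- Fact-less fact table: only foreign keys, no numeric measure columns.
CREATE TABLE fact_attendance (
    student_key INTEGER,
    course_key  INTEGER REFERENCES dim_course (course_key),
    date_key    INTEGER
);

INSERT INTO dim_course VALUES (101, 'SQL'), (102, 'ETL');
INSERT INTO fact_attendance VALUES
    (1, 101, 20240101), (2, 101, 20240101), (1, 102, 20240102), (1, 101, 20240102);
""")

# Each row records that an event happened, so COUNT(*) acts as the measure.
rows = conn.execute("""
    SELECT c.course_name, COUNT(*) AS attendances
    FROM fact_attendance AS f
    JOIN dim_course AS c ON f.course_key = c.course_key
    GROUP BY c.course_name
    ORDER BY c.course_name
""").fetchall()
print(rows)  # [('ETL', 1), ('SQL', 3)]
```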

What is meant by additive, semi-additive and non-additive measures?


Non-additive Measures
Non-additive measures are those that cannot be meaningfully used inside any numeric aggregation function (e.g. SUM(), AVG(), etc.). One example of a non-additive fact is any kind of ratio or percentage, e.g. a 5% profit margin or a revenue-to-asset ratio. Non-numerical data can also be a non-additive measure when it is stored in fact tables, e.g. varchar flags in the fact table.
Semi Additive Measures
Semi-additive measures are those to which only a subset of aggregation functions can be applied. Take an account balance: a SUM() of balances across time does not give a useful result, but the MAX() or MIN() balance might be useful. Similarly, for a price rate or currency rate, SUM() is meaningless, but an average might be useful.
Additive Measures
Additive measures can be used with any aggregation function, such as SUM(), AVG(), etc. An example is sales quantity.
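The difference can be sketched in Python with hypothetical numbers for account balances and daily sales. The sketch shows one reasonable way to aggregate each kind of measure (latest balance per account for the semi-additive case, recomputing a ratio from its additive parts for the non-additive case); other choices, such as averaging balances over a period, are also used in practice.

```python
# Hypothetical daily balance snapshots: (day, account, balance).
BALANCES = [
    ("2024-01-01", "A", 100), ("2024-01-01", "B", 50),
    ("2024-01-02", "A", 120), ("2024-01-02", "B", 40),
]
# Hypothetical daily sales: (day, quantity, revenue, cost).
SALES = [
    ("2024-01-01", 10, 200.0, 150.0),
    ("2024-01-02", 30, 100.0, 90.0),
]

# Additive: quantity can be summed across every dimension, including time.
total_quantity = sum(qty for _day, qty, _rev, _cost in SALES)  # 40

# Semi-additive: balances can be summed across accounts for a single day ...
balance_on_jan_2 = sum(bal for day, _acct, bal in BALANCES if day == "2024-01-02")  # 160
# ... but summing across days (100 + 50 + 120 + 40 = 310) is meaningless,
# so take the latest balance per account instead.
latest = {}
for day, acct, bal in sorted(BALANCES):
    latest[acct] = bal  # later days overwrite earlier ones
closing_balance = sum(latest.values())  # 160

# Non-additive: a margin ratio is recomputed from its additive parts,
# not summed or averaged row by row.
daily_margins = [(rev - cost) / rev for _day, _qty, rev, cost in SALES]  # 0.25 and 0.1
naive_average = sum(daily_margins) / len(daily_margins)  # about 0.175, misleading
total_revenue = sum(rev for _day, _qty, rev, _cost in SALES)
total_cost = sum(cost for _day, _qty, _rev, cost in SALES)
overall_margin = (total_revenue - total_cost) / total_revenue  # 0.2
```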

What is meant by granularity?


Granularity of a table represents the level of detail of the information stored in the table. In BI, granularity is a very important concept for checking the table data. Granularity can be high or low. High granularity data contains detailed information; in other words, transaction-level data is high granularity data. Low granularity data contains only summarized, higher-level information. A fact table needs very low-level (detailed) data, so it is usually designed at the most detailed grain available.
The following two points are important in defining granularity:
1. Determining the dimensions that are to be included
2. Determining where to place the hierarchy of each dimension of information
Real life Example:
Date dimension granularity levels:
Year, month, quarter, period, week, day

What is normalization?
Normalization is also referred to as “Database Normalization”.
It is the process of rearranging or organizing the columns and tables of a relational database. This reduces data redundancy and also helps improve data integrity.
It also simplifies the database design so that an optimal structure is achieved. In short, normalization splits the data into additional tables, each holding closely related data, while keeping the data easy to retrieve.

Out of star schema and snowflake schema, whose dimension table is normalized?
Snowflake schema uses the concept of normalization.

What is the benefit of normalization?


Normalization helps in reducing data redundancy.

Define metadata?
Metadata is simply defined as data about data. In other words, we can say that metadata is
the summarized data that leads us to the detailed data.

What does a metadata repository contain?


A metadata repository contains the definition of the data warehouse, business metadata, operational metadata, the data for mapping from the operational environment to the data warehouse, and the algorithms for summarization.

Differentiate between “bteqexport” and “fastexport”?


“bteqexport” is used when the number of rows is less than half a million, while “fastexport” is used when the number of rows is more than half a million.

Define load manager.


A load manager performs the operations required to extract data and load it into the data warehouse. The size and complexity of a load manager vary from one data warehouse solution to another.

Define the functions of a load manager.


A load manager performs the following functions:
1. Extracts data from the source system.
2. Fast-loads the extracted data into a temporary data store.
3. Performs simple transformations into a structure similar to the one in the data warehouse.
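The three functions can be sketched with Python's sqlite3 module. The source rows, table names (stg_orders, wh_orders) and the particular transformations (trimming and upper-casing names, casting text to numbers) are hypothetical; executemany stands in for a real bulk or fast-load utility.

```python
import sqlite3

# Hypothetical rows extracted from a source (OLTP) system; all values arrive as text.
SOURCE_ROWS = [
    {"order_id": "1001", "cust": " alice ", "amount": "250.50", "order_date": "2024-01-05"},
    {"order_id": "1002", "cust": "BOB", "amount": "99.75", "order_date": "2024-01-06"},
]

def extract():
    """Function 1: extract data from the source system."""
    return list(SOURCE_ROWS)

def fast_load(conn, rows):
    """Function 2: bulk-load the raw extract into a temporary staging table."""
    conn.execute("CREATE TEMP TABLE stg_orders (order_id TEXT, cust TEXT, amount TEXT, order_date TEXT)")
    conn.executemany(
        "INSERT INTO stg_orders VALUES (:order_id, :cust, :amount, :order_date)", rows
    )

def simple_transform(conn):
    """Function 3: reshape staged rows into a structure like the warehouse table."""
    conn.execute(
        "CREATE TABLE wh_orders (order_id INTEGER, customer_name TEXT, amount REAL, order_date TEXT)"
    )
    conn.execute("""
        INSERT INTO wh_orders
        SELECT CAST(order_id AS INTEGER), UPPER(TRIM(cust)), CAST(amount AS REAL), order_date
        FROM stg_orders
    """)

conn = sqlite3.connect(":memory:")
fast_load(conn, extract())
simple_transform(conn)
warehouse_rows = conn.execute("SELECT * FROM wh_orders ORDER BY order_id").fetchall()
print(warehouse_rows)
# [(1001, 'ALICE', 250.5, '2024-01-05'), (1002, 'BOB', 99.75, '2024-01-06')]
```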

What is VLDB in the context of data warehousing?


VLDB stands for Very Large Database. A VLDB is generally considered to be a database larger than one terabyte.

What is a data mart?


Data mart is a subset of organizational data. In other words, it is a collection of data specific
to a particular group within an organization.

What is data aggregation?


Data aggregation is the broad term for any process in which information is gathered and expressed in a summary form, for example for statistical analysis.

What is summary information?


Summary information is the area within the data warehouse where predefined aggregations are stored.

What is a data cube?


A data cube helps represent data in multiple facets. Data cubes are defined by dimensions
and facts.
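A data cube can be thought of as the set of aggregates for every combination of its dimensions (2^n group-bys for n dimensions, often called cuboids). The Python sketch below computes them for a small, hypothetical fact set; in SQL, several databases (for example Oracle and PostgreSQL) provide GROUP BY CUBE for the same purpose.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("year", "region", "product")
# Hypothetical fact rows: dimension values plus one measure (amount).
FACTS = [
    {"year": 2023, "region": "North", "product": "Shirts", "amount": 100},
    {"year": 2023, "region": "South", "product": "Shorts", "amount": 40},
    {"year": 2024, "region": "North", "product": "Shorts", "amount": 60},
]

def build_cube(facts, dimensions):
    """Aggregate the measure for every subset of the dimensions (2**n cuboids)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in facts:
                totals[tuple(row[d] for d in dims)] += row["amount"]
            cube[dims] = dict(totals)
    return cube

cube = build_cube(FACTS, DIMENSIONS)
print(len(cube))           # 8 cuboids for 3 dimensions
print(cube[()])            # {(): 200}, the grand total
print(cube[("region",)])   # {('North',): 160, ('South',): 40}
```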

What Is Table?
A table is the basic unit of data storage in an ORACLE database. The tables of a database
hold all of the user accessible data. Table data is stored in rows and columns.

What Is A View?
A view is a virtual table. Every view has a Query attached to it. (The Query is a SELECT
statement that identifies the columns and rows of the table(s) the view uses.)

What is the difference between ‘view’ and ‘materialized view’?


View:
• A view provides a tailored representation of the data in its underlying table(s).
• It has a logical structure that does not occupy storage space.
• Changes in the corresponding tables are reflected in the view.
Materialized view:
• Pre-calculated data persists in the materialized view.


• It physically occupies storage space.
• Changes in the corresponding tables are not reflected until the materialized view is refreshed.
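The difference in freshness can be sketched with sqlite3. SQLite has no materialized views, so this sketch simulates one with a table built by CREATE TABLE ... AS SELECT and a hypothetical refresh_mv helper that rebuilds it; databases such as Oracle and PostgreSQL provide native materialized views with their own refresh commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("North", 100), ("South", 40)])

SUMMARY_SQL = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"

# A view stores only its query; every read re-runs it against the base table.
conn.execute(f"CREATE VIEW v_sales_by_region AS {SUMMARY_SQL}")

def refresh_mv(conn):
    """Hypothetical refresh: rebuild a physical table that stores the query result."""
    conn.execute("DROP TABLE IF EXISTS mv_sales_by_region")
    conn.execute(f"CREATE TABLE mv_sales_by_region AS {SUMMARY_SQL}")

def north_total(source):
    """Read the North total from either the view or the simulated materialized view."""
    return conn.execute(f"SELECT total FROM {source} WHERE region = 'North'").fetchone()[0]

refresh_mv(conn)
conn.execute("INSERT INTO sales VALUES ('North', 50)")  # change the base table

print(north_total("v_sales_by_region"))   # 150: the view sees the change immediately
print(north_total("mv_sales_by_region"))  # 100: the stored copy is stale
refresh_mv(conn)
print(north_total("mv_sales_by_region"))  # 150 after the refresh
```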

What Is An Extent?
An Extent is a specific number of contiguous data blocks, obtained in a single allocation, and
used to store a specific type of information.

What Is An Index?
An Index is an optional structure associated with a table to have direct access to rows, which
can be created to increase the performance of data retrieval. Index can be created on one
or more columns of a table.
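Whether a query actually uses an index can be checked with SQLite's EXPLAIN QUERY PLAN. The sketch below uses a hypothetical customer table; the exact wording of the plan text differs between SQLite versions, so the comments show typical output only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(i, f"Customer {i}", "City A" if i % 2 else "City B") for i in range(1, 1001)],
)

def query_plan(sql):
    """Join the detail text (last column) of SQLite's EXPLAIN QUERY PLAN rows."""
    return " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM customer WHERE city = 'City A'"
before = query_plan(query)  # typically 'SCAN customer', i.e. a full table scan

conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
after = query_plan(query)   # typically 'SEARCH customer USING INDEX idx_customer_city (city=?)'
print(before)
print(after)
```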
What Is An Integrity Constraint?
An integrity constraint is a declarative way to define a business rule for a column of a table.
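A minimal sqlite3 sketch of declarative integrity constraints: a hypothetical product table declares two business rules (unique, non-null product codes and non-negative prices), and the database itself rejects rows that break them. The try_insert helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two business rules declared on the columns themselves:
# every product has a unique, non-null code, and a price can never be negative.
conn.execute("""
    CREATE TABLE product (
        product_code TEXT NOT NULL PRIMARY KEY,
        price        REAL NOT NULL CHECK (price >= 0)
    )
""")

def try_insert(code, price):
    """Insert a row; return False if the database rejects it for breaking a constraint."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO product VALUES (?, ?)", (code, price))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("P001", 9.5))   # True
print(try_insert("P002", -1.0))  # False: violates CHECK (price >= 0)
print(try_insert("P001", 3.0))   # False: duplicate primary key
```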

What Are Clusters?


Clusters are groups of one or more tables that are physically stored together because they share common columns and are often used together.

What is SCD?
SCD stands for Slowly Changing Dimension, and it applies to cases where a dimension record's attributes change over time.

What are the types of SCD?


The three most common types of SCD are as follows:
SCD 1 – The new record replaces the original record.
SCD 2 – A new record is added to the existing customer dimension table.
SCD 3 – The original record is modified to include the new data.

For full information on SCD Types 0, 1, 2, 3, 4, 5, 6 and 7, visit this link: https://www.instamojo.com/interview_questions/scd-5a66d
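Types 1, 2 and 3 can be sketched in Python on a hypothetical customer dimension held as a list of dicts. The column names (sk as the surrogate key, start_date, end_date, is_current, previous_city) are assumptions; real implementations vary, for example in whether end_date is the change date or the day before it.

```python
from datetime import date

def new_customer_dim():
    """Hypothetical customer dimension with one current row for customer C1."""
    return [{"sk": 1, "customer_id": "C1", "city": "City A",
             "start_date": date(2020, 1, 1), "end_date": None, "is_current": True}]

def scd_type1(dim, customer_id, new_city):
    """Type 1: overwrite the attribute in place; no history is kept."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row["city"] = new_city

def scd_type2(dim, customer_id, new_city, change_date):
    """Type 2: expire the current row and append a new row with a new surrogate key."""
    for row in dim:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = change_date
    dim.append({"sk": max(r["sk"] for r in dim) + 1, "customer_id": customer_id,
                "city": new_city, "start_date": change_date,
                "end_date": None, "is_current": True})

def scd_type3(dim, customer_id, new_city):
    """Type 3: keep one level of history in a separate previous_city column."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city

t1 = new_customer_dim()
scd_type1(t1, "C1", "City B")
t2 = new_customer_dim()
scd_type2(t2, "C1", "City B", date(2024, 6, 1))
t3 = new_customer_dim()
scd_type3(t3, "C1", "City B")

print(len(t1), len(t2), len(t3))                    # 1 2 1
print([r["city"] for r in t2 if r["is_current"]])  # ['City B']
```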

What is the benefit of a data warehouse system in business intelligence?


A data warehouse system greatly benefits business intelligence, for example by processing sales reports. Sales data can be collected from different sources and stored in a data warehouse for analytics and reporting, to understand the business and improve it. Data warehouse technology is essential for improving a business's sales.

What is the application of a data warehouse platform in the healthcare industry?


Data warehouse systems greatly benefit the healthcare industry, for example by processing genomic and proteomic analyses. These reports can be collected from different patient sources and stored in a data warehouse for analytics and reporting, to understand diseases and their treatment. Data warehouse technology is essential for developing better drugs and improving existing ones.


How can a doctor benefit from data warehouse technologies?


Data warehouse systems greatly benefit hospitals by processing patient reports. Patient reports can be collected from different sources and stored in a data warehouse to understand a disease, the patients it affects, and their recovery. Data warehouse technology is essential for storing and tracking reports in order to improve patients' conditions and their treatment.
What is the application of a data warehouse platform in political science?
Data warehouse systems benefit political science by processing election data, tracking ID proofs and categorizing voter enrollment. Election reports can be collected from different polling booths and stored in a data warehouse for analytics and reporting, to understand vote counts and the selection of the party for leadership. Data warehouse technology is essential for a country's economic improvement.
How can a political leader benefit from data warehouse technologies?
Data warehouse systems benefit the political field by processing voter reports. Voter reports can be collected from different sources and stored in a data warehouse. For members' performance and economic improvement, data warehouse technology is essential to store and track reports for risk management, fraud management, and the facilities to be provided all over the country.
How can the banking sector benefit from data warehouse technologies?
Data warehouse systems benefit the banking industry by processing share and investment reports. These financial reports can be collected from different sources and stored in a data warehouse to track investors' share performance and financial growth. Data warehouse technology is essential to store and track reports for risk management and fraud management, and to support offering loans and credit cards that earn more interest for the banking sector and industry.
