Program: BE Computer Engineering

Curriculum Scheme: Revised 2016

Examination: Third Year Semester VI
Course Code: CSC603 and Course Name: Data Warehousing & Mining
Time: 1 hour Max. Marks: 50
Note to the students: All the questions are compulsory and carry equal marks.

Q1. What supports data filtering with the help of a query in data warehousing?
Option A: NNTP
Option B: SMTP
Option C: OLAP
Option D: POP

Q2. ________ are some popular OLAP tools.

Option A: Metacube, Informix
Option B: Oracle Express, Essbase
Option C: HOLAP
Option D: MOLAP

Q3. ____ is used to analyze data and make decisions.

Option A: OLTP
Option B: Files
Option C: RDBMS
Option D: OLAP

Q4. What is a pivot operation?

Option A: Operation which points to the required data
Option B: Visualization operation that rotates data axes
Option C: Aggregates the selected data
Option D: Operation that removes the selected data
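
A pivot (rotate) keeps every cell value and only swaps the row and column dimensions. Here is a minimal sketch. It assumes a small hypothetical cross-tab of sales by region and quarter, stored as nested dictionaries; the names `sales` and `pivot` are illustrative.

```python
# Hypothetical cross-tab: rows are regions, columns are quarters.
sales = {
    "North": {"Q1": 100, "Q2": 120},
    "South": {"Q1": 80, "Q2": 95},
}

def pivot(table):
    """Rotate the axes: every column key becomes a row key and vice versa."""
    rotated = {}
    for row_key, columns in table.items():
        for col_key, value in columns.items():
            # Same value, with the two dimension keys swapped.
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

rotated = pivot(sales)
print(rotated)  # {'Q1': {'North': 100, 'South': 80}, 'Q2': {'North': 120, 'South': 95}}
```

Pivoting the result again restores the original table, because nothing is aggregated or removed.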

Q5. Select the option that makes use of the merging approach.
Option A: Naive Bayes
Option B: Decision Tree
Option C: Partitional
Option D: Hierarchical
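
The merging approach is agglomerative hierarchical clustering. Each point starts as its own cluster, and the two closest clusters are merged at every step. Here is a minimal sketch on hypothetical 1-D points. It assumes single linkage (the distance between the closest members) as the merge criterion, and ties go to the first pair found.

```python
from itertools import combinations

def agglomerate(points):
    """Merge the two closest clusters until one remains; return each merged cluster."""
    clusters = [[p] for p in points]  # start: one cluster per point
    merges = []
    while len(clusters) > 1:
        # Single linkage: cluster distance = smallest pairwise point distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(abs(a - b) for a in clusters[ij[0]] for b in clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        merges.append(sorted(merged))
        del clusters[j]  # j > i, so delete j first to keep index i valid
        del clusters[i]
        clusters.append(merged)
    return merges

steps = agglomerate([1, 2, 6, 7, 15])
print(steps)  # [[1, 2], [6, 7], [1, 2, 6, 7], [1, 2, 6, 7, 15]]
```

The list of merges is the bottom-up sequence that a dendrogram would draw.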

Q6. In which method of analysis are variables not classified as dependent or independent?

Option A: regression analysis
Option B: discriminant analysis
Option C: analysis of variance
Option D: cluster analysis

Q7. Considering the K-means algorithm, if points (0, 2), (4, 5), and (-4, 2) are
assigned to the first cluster, calculate the new centroid for this cluster.
Option A: (2,0)
Option B: (2,1)
Option C: (0,3)
Option D: (1,2)
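
In k-means, the updated centroid is the coordinate-wise mean of the points assigned to the cluster: ((0 + 4 + (-4)) / 3, (2 + 5 + 2) / 3) = (0, 3). Here is a minimal sketch of that update step; the function name `centroid` is illustrative.

```python
def centroid(points):
    """Coordinate-wise mean of (x, y) points: the k-means centroid update."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

cluster = [(0, 2), (4, 5), (-4, 2)]
print(centroid(cluster))  # (0.0, 3.0)
```

A K-median update takes the coordinate-wise median instead, which gives (0, 2). That point does not match any of the options.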

Q8. The k-medoids algorithm uses ________ to perform clustering on the given input.

Option A: Medians
Option B: Medoids
Option C: Aggregate
Option D: Sums
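
Unlike a centroid, a medoid is always one of the actual data points: the point with the smallest total distance to all other points in the cluster. Here is a minimal sketch that reuses the three points from Q7 as sample input. It assumes Euclidean distance via `math.dist` (Python 3.8+).

```python
import math

def medoid(points):
    """Return the data point whose summed Euclidean distance to all points is smallest."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

cluster = [(0, 2), (4, 5), (-4, 2)]
print(medoid(cluster))  # (0, 2)
```

The total distances are 9 for (0, 2), about 13.54 for (4, 5) and about 12.54 for (-4, 2), so (0, 2) is the medoid.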

Q9. ________ clustering techniques start with all records in one cluster and then
try to split that cluster into smaller pieces.
Option A: Agglomerative
Option B: Divisive
Option C: Partitioning
Option D: Numeric

Q10. A media company wants to establish a relationship between its sales for a year and
the amount spent on advertising that year. Which of the following methods would you
suggest for this?
Option A: Linear Regression
Option B: Multiple Regression
Option C: Decision Tree
Option D: Bayesian Classification
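
With one predictor (advertising spend) and one response (sales), simple linear regression fits sales = b0 + b1 * spend by ordinary least squares. Here is a minimal sketch. The five data points are invented for illustration and happen to lie exactly on a line.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x  # the fitted line passes through the means
    return b0, b1

ad_spend = [1, 2, 3, 4, 5]   # hypothetical yearly advertising spend
sales = [3, 5, 7, 9, 11]     # hypothetical yearly sales
b0, b1 = fit_simple_linear_regression(ad_spend, sales)
print(b0, b1)  # 1.0 2.0
```

Only two coefficients (intercept and slope) are estimated. Multiple regression would be needed only with more than one predictor.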

Q11. The Netflix recommendation system is the best example of ________.

Option A: Web structure mining
Option B: Web usage mining
Option C: Web content mining
Option D: Data mining

Q12. Which of the following is not used in web content mining?

Option A: Crawlers
Option B: Harvest System
Option C: Virtual Web View
Option D: Page rank

Q13. What is data about data known as?
Option A: Metadata
Option B: Microdata
Option C: Minidata
Option D: Multidata

Q14. _______________ are said to be facts that cannot be summed up for any of the
dimensions present in the fact table.
Option A: Additive Facts
Option B: Conformed Facts
Option C: Non-Additive facts
Option D: Factless facts
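
A ratio such as a return rate is a typical non-additive fact. Summing it across rows gives a meaningless figure, so the total must be recomputed from additive facts. Here is a minimal illustration with hypothetical figures for two stores.

```python
# Hypothetical fact rows: (store, units_sold, units_returned); both counts are additive.
rows = [("A", 100, 10), ("B", 300, 15)]

# The per-store return rate is a non-additive fact.
rates = [returned / sold for _, sold, returned in rows]

wrong_total = sum(rates)  # 0.10 + 0.05: summing a ratio is meaningless
right_total = sum(r for _, _, r in rows) / sum(s for _, s, _ in rows)  # 25 / 400

print(round(wrong_total, 4), round(right_total, 4))  # 0.15 0.0625
```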

Q15. Which of the following is not a correct feature of a data warehouse?

Option A: integrated
Option B: time-variant
Option C: object-oriented
Option D: non-volatile

Q16. ________ is designed to target a specific set of users and their specific questions,
and is a very specific, subject-oriented store of data.
Option A: database
Option B: data mart
Option C: data warehouse
Option D: big data

Q17. A gender variable in a given data set can take two values: Male and Female. This
is an example of a(n) ________.
Option A: Ordinal variable
Option B: Categorical variable
Option C: Continuous variable
Option D: Mixed variable

Q18. What aims to communicate data clearly and effectively through graphical
representation?
Option A: Data preparation
Option B: Data visualization
Option C: Data integration
Option D: Data selection

Q19. For the attribute salary, the following values (in rupees) are shown in
increasing order: 40, 46, 47, 60, 62, 64, 76, 83, 90, 100, 130. Calculate the median.
Option A: 46
Option B: 62
Option C: 64
Option D: 76
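
With 11 values already sorted, the median is the middle (6th) value, 64. Here is a quick check with Python's standard `statistics` module.

```python
import statistics

salaries = [40, 46, 47, 60, 62, 64, 76, 83, 90, 100, 130]  # in rupees, as given
print(statistics.median(salaries))  # 64
```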

Q20. As per the concept of the KDD process, which of the following statements is valid?
Option A: KDD and Data Mining have no connection at all
Option B: KDD is one of the steps in Data Mining
Option C: Data Mining is one of the steps in KDD process
Option D: KDD and Data Mining mean the same

Q21. ________ is the extraction of implicit, previously unknown, and potentially useful
information from data.
Option A: Data warehousing
Option B: Data Mining
Option C: KDD
Option D: Data selection

Q22. An association rule is accepted for analysis if it satisfies both

Option A: Maximum support threshold and minimum confidence threshold
Option B: Minimum support threshold and maximum confidence threshold
Option C: Minimum support threshold and minimum confidence threshold
Option D: Maximum support threshold and maximum confidence threshold
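
Support is the fraction of transactions that contain every item of the rule. Confidence is support(A ∪ B) / support(A). A rule is kept only when both reach their minimum thresholds. Here is a minimal sketch, assuming four hypothetical market-basket transactions, 50% minimum support and 60% minimum confidence. The function names are illustrative.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def is_strong(antecedent, consequent, transactions, min_sup, min_conf):
    """Accept antecedent -> consequent only if both minimum thresholds are met."""
    sup = support(antecedent | consequent, transactions)
    conf = sup / support(antecedent, transactions)
    return sup >= min_sup and conf >= min_conf

print(is_strong({"bread"}, {"milk"}, transactions, min_sup=0.5, min_conf=0.6))  # True
```

Here bread → milk has support 0.5 and confidence about 0.67, so it passes both thresholds. Milk → butter fails because its support is only 0.25.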

Q23. Given two objects represented by the tuples (23, 1) and (21, 1), compute the
Euclidean distance between the two objects.
Option A: 2
Option B: 0
Option C: 1
Option D: 3
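
The Euclidean distance is the square root of the summed squared coordinate differences: sqrt((23 - 21)^2 + (1 - 1)^2) = sqrt(4) = 2. Here is a minimal sketch, with `math.dist` (Python 3.8+) shown as the standard-library equivalent.

```python
import math

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((23, 1), (21, 1)))  # 2.0
print(math.dist((23, 1), (21, 1)))  # 2.0
```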

Q24. What is a disadvantage of hierarchical clustering?

Option A: Once a step (split or merge) is done, it can never be undone
Option B: Dependent only on the cell in each dimension
Option C: Difficult to predict the K values
Option D: It needs a large number of parameters

Q25. What is the difference between supervised learning and unsupervised learning?
Option A: unlike unsupervised learning, supervised learning needs labeled data
Option B: unlike unsupervised learning, supervised learning can be used to detect outliers
Option C: there is no difference
Option D: unlike supervised learning, unsupervised learning can form new classes

Scheme: R2012
Semester: 8
Course Code: CPC801
Course Name: Data Warehousing and Mining

Q1. A data warehouse can be used to analyze a particular ________.
Option A: graph
Option B: chart
Option C: domain
Option D: subject
Answer: subject

Q2. Which information is not provided by information packages?

Q3. Periodic status is

Q4. Comparison of the general features of the target class data object against the general

Q5. After the initial load, the data of the warehouse is kept up-to-date by two actions:
REFRESH and UPDATE. As the number of records increases in a data warehouse,
Option A: decreases
Option B: increases
Option C: remains constant
Answer: increases

Q6. The values of an ________ attribute provide enough information to order objects.
Option A: ratio
Option B: binary
Option C: interval
Option D: ordinal
Answer: ordinal

Q7. As per the concept of the KDD process, which of the following statements is valid?
Answer: Data Mining is one of the steps in the KDD process

Q8. information stored in the data warehouse.

Q9. Converting data from different sources into a common format for processing is called ________.
Option A: Preprocessing
Option B: Transformation
Option C: Interpretation
Answer: Transformation

Q10. A binary attribute ________.
Option A: takes only two values
Option B: takes only three values
Option C: takes only four values
Option D: cannot take any value
Answer: takes only two values

Q11. It is measured on a scale of equal-size units; these attributes allow us to compare such as
Option A: Interval-scaled attribute
Option B: Ratio-scaled attribute
Option C: Binary attribute
Option D: Ternary attribute
Answer: Interval-scaled attribute

Q12. Which of the following is not a valid visualization technique?
Option A: Scatter plot
Option B: Decision Tree
Option C: Box plot
Option D: Histogram
Answer: Decision Tree

Q13. The ________ is a numerical measure of how alike two objects are.
Option A: dissimilarity
Option B: clarity
Option C: non-clarity
Option D: similarity
Answer: similarity

Q14. Removing duplicate records is a data mining process called ________.
Answer: Data Cleaning

Q15. ________ is a process of taking operational data from one or more sources and

Q16. ________ may be defined as the data objects that do not comply with the general
Option A: Outlier Analysis
Option B: Prediction
Option C: Classification
Answer: Outlier Analysis

Q17. How many coefficients do you need to estimate in a simple linear regression model
(one independent and one dependent variable)?
Answer: 2

Q18. The mapping or classification of a class with some predefined group or class is known as ________.
Option A: Data Characterization
Option B: Data Discrimination
Option C: Data Subset
Option D: Data set
Answer: Data Characterization

Q19. To extract rules in supervised learning, ________ are used.
Option A: root node
Option B: sibling
Option C: decision trees
Option D: branches
Answer: decision trees

Q20. From the given options, ________ is a predictive model.
Option A: Clustering
Option B: Regression
Option C: Summarization
Option D: Association rules
Answer: Regression

Q21. Euclidean distance measure is

Q22. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36,
Option A: 6.32
Option B: 6.71
Option C: 6.15
Option D: 6.22
Answer: 6.71

Q23. The following rule is an example of which association rule. { age (X,
Answer: multidimensional

Q24. Repeating the holdout many times is called ________.
Option A: random subsampling
Option B: cross validation
Option C: bootstrap
Option D: bagging
Answer: random subsampling

Q25. Which algorithm requires fewer scans of data?
Option A: Apriori
Option B: FP growth
Option C: Apriori and FP Growth
Answer: FP growth

Program: BE Computer Engineering
Curriculum Scheme: Revised 2016
Examination: Third Year Semester VI
Course Code: CSC603 and Course Name: Data Warehousing and Mining
Time: 1 hour Max. Marks: 50
Note to the students: All the questions are compulsory and carry equal marks.

Q1. The star schema is composed of __________ fact table.

Option A: One
Option B: Two
Option C: Three
Option D: Four

Q2. In which order should the data warehouse tables be loaded?

Option A: First dimension table and then fact table
Option B: First fact table and then dimension table
Option C: Both can be loaded simultaneously
Option D: Depends on the business requirements
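
Fact rows carry foreign keys into the dimension tables. If the dimensions are not loaded first, the fact rows have nothing valid to reference. Here is a minimal sketch with Python's built-in `sqlite3` module. It assumes a hypothetical star schema with one fact table and one dimension table; the names `dim_product`, `fact_sales` and `load_fact` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE fact_sales ("
    " product_key INTEGER REFERENCES dim_product(product_key),"
    " amount REAL)"
)

def load_fact(product_key, amount):
    """Try to insert one fact row; return False if its dimension row is missing."""
    try:
        conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (product_key, amount))
        return True
    except sqlite3.IntegrityError:
        return False

print(load_fact(1, 9.99))  # False: the product dimension row does not exist yet
conn.execute("INSERT INTO dim_product VALUES (1, 'Pen')")
print(load_fact(1, 9.99))  # True: the dimension was loaded first, so the fact row is accepted
```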

Q3. The data is stored, retrieved & updated in ____________.

Option A: OLTP
Option B: OLAP
Option C: SMTP
Option D: FTP

Q4. ________ describes the data contained in the data warehouse.

Option A: Relational data
Option B: Operational data
Option C: Informational data
Option D: Metadata

Q5. The type of relationship in star schema is __________________.

Option A: one-to-one
Option B: one-to-many
Option C: many-to-one
Option D: many-to-many

Q6. Data warehouse architecture is based on ________.

Option A: RDBMS
Option B: DBMS
Option C: SQL Server
Option D: Sybase

Q7. Which of the following is not a data extraction issue?
Option A: Source identification
Option B: Time window
Option C: Report generation
Option D: Job sequencing

Q8. Data capture through database triggers is ________ data extraction.
Option A: Deferred
Option B: Immediate
Option C: Replication technology
Option D: Future

Q9. Which of the following steps is not part of the KDD process?

Option A: Data Mining
Option B: Preprocessing
Option C: Transformation
Option D: Extraction

Q10. Which of the following is not a major step in the ETL process?

Option A: Write procedure for all data load
Option B: Plan for aggregate table
Option C: Generate report
Option D: Establish comprehensive data extraction rule

Q11. The operation of moving from finer-grained data to coarser-grained data is called ________.
Option A: Roll up
Option B: Drill down
Option C: Pivot
Option D: Slicing
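
Roll-up climbs a concept hierarchy, for example from day to month, by aggregating the finer-grained measures. Here is a minimal sketch, assuming hypothetical daily sales rows keyed by ISO dates. The function name `roll_up_to_month` is illustrative.

```python
from collections import defaultdict

# Hypothetical daily facts: (date as YYYY-MM-DD, sales)
daily_sales = [
    ("2023-01-05", 200),
    ("2023-01-20", 300),
    ("2023-02-03", 150),
]

def roll_up_to_month(rows):
    """Aggregate day-level sales into month-level totals (finer -> coarser granularity)."""
    monthly = defaultdict(int)
    for date, amount in rows:
        monthly[date[:7]] += amount  # the 'YYYY-MM' prefix is the coarser level
    return dict(monthly)

print(roll_up_to_month(daily_sales))  # {'2023-01': 500, '2023-02': 150}
```

Drill-down is the reverse direction and needs the finer-grained rows to still be available.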

Q12. Which is an essential process where intelligent methods are applied to extract data
patterns?
Option A: Data Warehousing
Option B: Data Mining
Option C: Data Base
Option D: Data Structure

Q13. Which of the following is not a data mining functionality?

Option A: Characterization and Discrimination
Option B: Classification and regression
Option C: Selection and interpretation
Option D: Clustering and Analysis

Q14. The data set {brown, black, blue, green, red} is an example of a(n)

Option A: Continuous attribute
Option B: Ordinal attribute
Option C: Numeric attribute
Option D: Nominal attribute

Q15. Summarization of the general characteristics or features of a target class of data is
called ________.
Option A: Data Characterization
Option B: Data Classification
Option C: Data discrimination
Option D: Data selection

Q16. What is a symbolic representation of facts or ideas from which information can
potentially be extracted?
Option A: Knowledge
Option B: Data
Option C: Algorithm
Option D: Program

Q17. Dimensionality reduction reduces the data set size by removing ________.

Option A: Composite attributes
Option B: Derived attributes
Option C: Relevant attributes
Option D: Irrelevant attributes

Q18. To detect fraudulent usage of credit cards, which of the following data mining tasks
should be used?
Option A: Outlier analysis
Option B: Prediction
Option C: Association analysis
Option D: Feature selection
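
Fraud detection is framed as outlier analysis because fraudulent transactions deviate strongly from a card's usual pattern. Here is a minimal sketch using Tukey's 1.5 × IQR fences on hypothetical transaction amounts. It is one simple outlier test, not a complete fraud model, and it uses `statistics.quantiles` (Python 3.8+, default "exclusive" method).

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

amounts = [120, 95, 130, 110, 105, 98, 125, 5000]  # hypothetical card transactions
print(iqr_outliers(amounts))  # [5000]
```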

Q19. The most important part of _________ is selecting the variables on which
clustering is based.
Option A: interpreting and profiling clusters
Option B: selecting a clustering procedure
Option C: assessing the validity of clustering
Option D: formulating the clustering problem

Q20. The most commonly used measure of similarity is the _________ or its square.
Option A: Euclidean distance
Option B: city block distance
Option C: Chebychev's distance
Option D: Manhattan distance
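
The options differ only in how the per-coordinate differences are combined; city-block and Manhattan distance are the same measure. Here is a minimal comparison on two hypothetical points.

```python
import math

def euclidean(p, q):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def squared_euclidean(p, q):
    """Summed squared differences (no square root)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def manhattan(p, q):
    """Summed absolute differences, also called city-block distance."""
    return sum(abs(a - b) for a, b in zip(p, q))

def chebychev(p, q):
    """Largest single-coordinate absolute difference."""
    return max(abs(a - b) for a, b in zip(p, q))

p, q = (1, 2), (4, 6)
print(euclidean(p, q), squared_euclidean(p, q), manhattan(p, q), chebychev(p, q))
# 5.0 25 7 4
```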

Q21. _________ is a clustering procedure characterized by the development of a tree-like
structure.
Option A: Non-hierarchical clustering
Option B: Hierarchical clustering
Option C: Divisive Clustering
Option D: Agglomerative clustering

Q22. _________ is a clustering procedure where all objects start out in one giant
cluster. Clusters are formed by dividing this cluster into smaller and smaller clusters.
Option A: Non-hierarchical clustering
Option B: Hierarchical clustering
Option C: Divisive Clustering
Option D: Agglomerative clustering

Q23. The _________ method uses information on all pairs of distances, not merely the
minimum or maximum distances.
Option A: single linkage
Option B: medium linkage
Option C: complete linkage
Option D: average linkage
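
Single linkage and complete linkage each look at only one pair of points across two clusters: the closest pair or the farthest pair. Average linkage averages over every cross-cluster pair. Here is a minimal sketch on two hypothetical 1-D clusters.

```python
from itertools import product

def linkage_distances(c1, c2):
    """Return (single, complete, average) linkage between two clusters of 1-D points."""
    pair_distances = [abs(a - b) for a, b in product(c1, c2)]  # every cross-cluster pair
    single = min(pair_distances)     # closest pair only
    complete = max(pair_distances)   # farthest pair only
    average = sum(pair_distances) / len(pair_distances)  # uses all pairs
    return single, complete, average

print(linkage_distances([1, 2], [5, 9]))  # (3, 8, 5.5)
```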

Q24. _________ is frequently referred to as k-means clustering.

Option A: Non-hierarchical clustering
Option B: Optimizing partitioning
Option C: Divisive Clustering
Option D: Agglomerative clustering
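
K-means is a non-hierarchical (partitioning) procedure that alternates an assignment step and a centroid-update step. Here is a minimal 1-D sketch. It assumes k = 2, hypothetical data, and the first k points as initial centroids, which is a simple but arbitrary initialization.

```python
def k_means_1d(points, k, iterations=10):
    """Lloyd's k-means on 1-D data; the initial centroids are the first k points."""
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to its cluster mean (unchanged if empty).
        new_centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # centroids no longer change, so the clustering has converged
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = k_means_1d([1, 2, 3, 10, 11, 12], k=2)
print(centroids, clusters)  # [2.0, 11.0] [[1, 2, 3], [10, 11, 12]]
```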

Q25. The process of knowledge discovery from data is called _________

Option A: Data mining
Option B: Datawarehouse
Option C: Query
Option D: Knowledge Engineering

