Q1. Which of the following supports data filtering with the help of a query in data warehousing?
Option A: NNTP
Option B: SMTP
Option C: OLAP
Option D: POP
Q5. Select the option that makes use of the merging approach.
Option A: Naive Bayes
Option B: Decision Tree
Option C: Partitional
Option D: Hierarchical
analysis?
Option A: regression analysis
Option B: discriminant analysis
Option C: analysis of variance
Option D: cluster analysis
Q7. Considering the K-median algorithm, if points (0, 2), (4, 5), and (-4, 2) are
assigned to the first cluster, calculate the new centroid for this cluster.
Option A: (2,0)
Option B: (2,1)
Option C: (0,3)
Option D: (1,2)
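The arithmetic for this question can be checked with a minimal Python sketch using only the standard library. It rests on an assumption: a K-median update takes the coordinate-wise median of the assigned points, while a K-means update takes the coordinate-wise mean. Both are computed so they can be compared with the options; the variable names are illustrative.

```python
from statistics import mean, median

# Points assigned to the first cluster, taken from the question.
points = [(0, 2), (4, 5), (-4, 2)]

xs = [x for x, _ in points]
ys = [y for _, y in points]

# Coordinate-wise median (assumed K-median update rule).
median_centroid = (median(xs), median(ys))

# Coordinate-wise mean (K-means update rule), shown for comparison.
mean_centroid = (mean(xs), mean(ys))

print("median centroid:", median_centroid)
print("mean centroid:", mean_centroid)
```

Under these assumptions the coordinate-wise median is (0, 2) and the coordinate-wise mean is (0, 3); only the latter appears among the listed options.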
Q9. __________ clustering techniques start with all records in one cluster and then
try to split that cluster into smaller pieces.
Option A: Agglomerative
Option B: Divisive
Option C: Partitioning
Option D: Numeric
Q10. A media company wants to establish a relationship between sales for a year and
the amount spent on advertising that year. Which of the following methods would
you suggest?
Option A: Linear Regression
Option B: Multiple Regression
Option C: Decision Tree
Option D: Bayesian Classification
Q13. What is data about data known as?
Option A: Metadata
Option B: Microdata
Option C: Minidata
Option D: Multidata
Q14. _______________ are said to be facts that cannot be summed up for any of the
dimensions present in the fact table.
Option A: Additive Facts
Option B: Conformed Facts
Option C: Non-Additive facts
Option D: Factless facts
Q16. A ________ is designed to answer the specific questions of a specific set of users;
it is a narrowly scoped, subject-oriented store of data.
Option A: database
Option B: data mart
Option C: data warehouse
Option D: big data
Q17. The gender variable in a given data set can take two values: Male and Female. This
is an example of a(n) ________
Option A: Ordinal variable
Option B: Categorical variable
Option C: Continuous variable
Option D: Mixed variable
Q18. What aims to communicate data clearly and effectively through graphical
representation?
Option A: Data preparation
Option B: Data visualization
Option C: Data integration
Option D: Data selection
Q19. For the attribute salary, we have the following values (in rupees), shown in
increasing order: 40, 46, 47, 60, 62, 64, 76, 83, 90, 100, 130. Calculate the
median.
Option A: 46
Option B: 62
Option C: 64
Option D: 76
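The median calculation can be sketched in Python as follows. With an odd number of sorted values, the median is the single middle element; the list below is copied from the question, and the variable names are illustrative.

```python
from statistics import median

# Salary values from the question, already in increasing order.
salaries = [40, 46, 47, 60, 62, 64, 76, 83, 90, 100, 130]

n = len(salaries)

# For an odd count, the median is the element at index n // 2 of the sorted list.
middle = sorted(salaries)[n // 2]

# Cross-check against the standard-library implementation.
assert middle == median(salaries)

print(n, middle)
```

There are 11 values, so the median is the 6th value in sorted order.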
Q20. As per the concept of the KDD process, which of the following statements is valid?
Option A: KDD and Data Mining have no connection at all
Option B: KDD is one of the steps in Data Mining
Option C: Data Mining is one of the steps in KDD process
Option D: KDD and Data Mining mean the same
Q21. ____ is the extraction of implicit, previously unknown, and potentially useful
information from data.
Option A: Data warehousing
Option B: Data Mining
Option C: KDD
Option D: Data selection
Q23. Given two objects represented by the tuples (23, 1) and (21, 1), compute the
Euclidean distance between the two objects.
Option A: 2
Option B: 0
Option C: 1
Option D: 3
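The Euclidean distance is the square root of the sum of squared coordinate differences. A minimal Python sketch of that calculation for the two tuples in the question follows; the helper name `euclidean` is illustrative.

```python
import math

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Tuples taken from the question.
obj1 = (23, 1)
obj2 = (21, 1)

distance = euclidean(obj1, obj2)
print(distance)
```

The second coordinates are equal, so only the first coordinate contributes: sqrt((23 - 21)^2 + 0^2).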
Q25. The difference between supervised learning and unsupervised learning is given
by
Option A: unlike unsupervised learning, supervised learning needs labeled data
Option B: unlike unsupervised learning, supervised learning can be used to detect outliers
Option C: there is no difference
Option D: unlike supervised learning, unsupervised learning can form new classes
Scheme R2012
Semester 8
Course Code CPC801
Course Name Data Warehousing and Mining
Q8. Data capture through database triggers is a form of ________ data extraction.
Option A: Deferred
Option B: Immediate
Option C: Replication technology
Option D: future
Q11. The operation of moving from finer granular data to coarser granular data is called
Option A: Roll up
Option B: Drill down
Option C: Pivot
Option D: Slicing
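Moving from finer to coarser granularity amounts to aggregating a measure along a dimension hierarchy. The sketch below illustrates this on a small table; the quarter-level sales figures and the function name `aggregate_to_year` are hypothetical, invented only for illustration.

```python
from collections import defaultdict

# Hypothetical fine-grained facts: (year, quarter, sales). Invented for illustration.
facts = [
    (2020, "Q1", 10),
    (2020, "Q2", 15),
    (2021, "Q1", 20),
    (2021, "Q2", 5),
]

def aggregate_to_year(rows):
    """Aggregate quarter-level sales to the coarser year level."""
    totals = defaultdict(int)
    for year, _quarter, sales in rows:
        totals[year] += sales
    return dict(totals)

yearly = aggregate_to_year(facts)
print(yearly)
```

The reverse direction, going from yearly totals back to quarterly detail, requires the finer-grained rows to still be available.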
Q12. Which is an essential process where intelligent methods are applied to extract data
patterns?
Option A: Data Warehousing
Option B: Data Mining
Option C: Data Base
Option D: Data Structure
Q16. What is a symbolic representation of facts or ideas from which information can
potentially be extracted?
Option A: Knowledge
Option B: Data
Option C: Algorithm
Option D: Program
Q18. To detect fraudulent usage of credit cards, the following data mining task should
be used
Option A: Outlier analysis
Option B: Prediction
Option C: Association analysis
Option D: Feature selection
Q19. The most important part of _________ is selecting the variables on which
clustering is based.
Option A: interpreting and profiling clusters
Option B: selecting a clustering procedure
Option C: assessing the validity of clustering
Option D: formulating the clustering problem
Q20. The most commonly used measure of similarity is the _________ or its square.
Option A: Euclidean distance
Option B: city block distance
Option C: Chebychev's distance
Option D: Manhattan distance
Q22. _________ is a clustering procedure where all objects start out in one giant
cluster. Clusters are formed by dividing this cluster into smaller and smaller
clusters.
Option A: Non-hierarchical clustering
Option B: Hierarchical clustering
Option C: Divisive Clustering
Option D: Agglomerative clustering
Q23. The _________ method uses information on all pairs of distances, not merely the
minimum or maximum distances.
Option A: single linkage
Option B: medium linkage
Option C: complete linkage
Option D: average linkage
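The linkage criteria differ in how they summarise the pairwise distances between two clusters. A minimal Python sketch follows; the two clusters are hypothetical one-dimensional points invented for illustration, and the function names are assumptions rather than any library's API.

```python
import math
from itertools import product

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pairwise_distances(c1, c2):
    """Distances between every point in c1 and every point in c2."""
    return [euclidean(p, q) for p, q in product(c1, c2)]

# Hypothetical clusters, invented for illustration.
cluster_a = [(0.0,), (1.0,)]
cluster_b = [(4.0,), (6.0,)]

dists = pairwise_distances(cluster_a, cluster_b)

single = min(dists)                 # single linkage: minimum pairwise distance
complete = max(dists)               # complete linkage: maximum pairwise distance
average = sum(dists) / len(dists)   # average linkage: mean over all pairs

print(single, complete, average)
```

Single and complete linkage each depend on one extreme pair, whereas the average is computed over every pair of points drawn from the two clusters.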