Data Mining Exam Questions
[Figure: decision tree for choosing what to wear]
Business appointment?
  Yes: Decision = wear slacks
  No: Temp above 70?
    Yes: Decision = wear shorts
    No: Decision = wear jeans
6. A statement to be tested.
a. theory
b. procedure
c. principle
d. hypothesis
How many individuals in the class life insurance = no have credit card insurance and are less
than 30 years old?
a. 63
b. 70
c. 30
d. 27
b. the percentage of instances that contain the consequent conditions listed in the association
rule.
c. the percentage of instances that contain all items listed in the association rule.
d. the percentage of instances in the database that contain at least one of the antecedent
conditional items listed in the association rule.
12. This approach is best when we are interested in finding all possible interactions among a
set of attributes.
a. decision tree
b. association rules
c. K-Means algorithm
d. genetic learning
13. This step of the KDD process model deals with noisy data.
a. Creating a target dataset
b. data preprocessing
c. data transformation
d. data mining
14. This data transformation technique works well when minimum and maximum values for a
real-valued attribute are known.
a. min-max normalization
b. decimal scaling
c. z-score normalization
d. logarithmic normalization
15. This technique uses mean and standard deviation scores to transform real-valued
attributes.
a. decimal scaling
b. min-max normalization
c. z-score normalization
d. logarithmic normalization
16. A data normalization technique for real-valued attributes that divides each numerical
value by the same power of 10.
a. min-max normalization
b. z-score normalization
c. decimal scaling
d. decimal smoothing
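
The three normalization techniques named in questions 14 to 16 can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names and the `sample` values are hypothetical, and the z-score version assumes the population standard deviation.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: rescale using the known minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Decimal scaling: divide every value by the same power of 10,
    the smallest one that brings all absolute values below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


sample = [200, 300, 400, 600, 1000]  # hypothetical attribute values
print(min_max(sample))
print(z_score(sample))
print(decimal_scaling(sample))
```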
17. The correlation coefficient for two real-valued attributes is –0.85. What does this value
tell you?
a. The attributes are not linearly related.
b. As the value of one attribute increases the value of the second attribute also increases.
c. As the value of one attribute decreases the value of the second attribute increases.
d. The attributes show a curvilinear relationship.
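
As an illustration of what a strongly negative correlation coefficient looks like, the sketch below computes Pearson's r by hand. The data in `xs` and `ys` are made up for the example and are not taken from the exam.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [1, 2, 3, 4, 5]   # hypothetical attribute 1
ys = [10, 9, 7, 8, 3]  # hypothetical attribute 2, tends to fall as xs rises
r = pearson_r(xs, ys)
print(r)  # negative: one attribute tends to decrease as the other increases
```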
18. The most frequently occurring number in a set of values is called the ____.
a. Mean
b. Median
c. Mode
d. Range
19. ................... is an essential process where intelligent methods are applied to extract data patterns.
A) Data warehousing
B) Data mining
C) Text mining
D) Data selection
20. Which of the following is not a data mining functionality?
23. Discriminating between spam and ham e-mails is a classification task, true or
false?
A. True B. False
28. A class of learning algorithms that tries to find an optimum classification of a set of
examples using probability theory is named as …
a) Bayesian classifiers
b) Dijkstra classifiers
c) Doppler classifiers
d) all of these
29. Group of similar objects that differ significantly from other objects is named as …
a) classification
b) cluster
c) community
d) none of these
31. Classification is
A. A subdivision of a set of examples into a number of classes
B. A measure of the accuracy, of the classification of a concept that is given by a certain
theory
C. The task of assigning a classification to a set of examples
D. None of these
A. This takes only two values. In general, these values will be 0 and 1, and they can
be coded as one bit
B. The natural environment of a certain species
C. Systems that can be used without knowledge of internal operations
D. None of these
34. Cluster is
A. Group of similar objects that differ significantly from other objects
D. None of these
36. Discovery is
A. It is hidden within a database and can only be recovered if one is given certain clues
(an example is encrypted information).
C. An extremely complex molecule that occurs in human chromosomes and that carries
genetic information in the form of genes.
D. None of these
B. Set of columns in a database table that can be used to identify each record within this table
uniquely.
C. collection of interesting and useful patterns in a database
D. none of these
C. A prediction made using an extremely simple method, such as always predicting the
same output.
D. None of these
40. Consider discretizing a continuous attribute whose values are listed below:
3, 4, 5, 10, 20, 32, 43, 44, 46, 52, 59, 61
Which of the following number of bins is not possible for using equidepth bins?
A. 2
B. 4
C. 5
D. All of the above
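
One way to check question 40 is to try each bin count directly. The sketch below assumes that "equidepth" means every bin must hold exactly the same number of values; `equidepth_bins` is a hypothetical helper name.

```python
values = [3, 4, 5, 10, 20, 32, 43, 44, 46, 52, 59, 61]


def equidepth_bins(sorted_values, n_bins):
    """Split sorted values into n_bins bins of identical size.

    Returns None when the values cannot be divided evenly.
    """
    if len(sorted_values) % n_bins != 0:
        return None
    depth = len(sorted_values) // n_bins
    return [sorted_values[i:i + depth] for i in range(0, len(sorted_values), depth)]


for n_bins in (2, 4, 5):
    print(n_bins, equidepth_bins(values, n_bins))
```

With 12 values, 2 and 4 bins divide evenly while 5 does not.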
41. Consider discretizing a continuous attribute whose values are listed below:
3, 4, 5, 10, 21, 32, 43, 44, 46, 52, 59, 67
Using equal-width partitioning and four bins, how many values are there in the first
bin (the bin with small values)?
A. 1
B. 2
C. 3
D. 4
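
Question 41 can be worked through with a short equal-width binning sketch. It assumes bins of width (max - min) / 4 that start at the minimum value, with the maximum value placed in the last bin; `equal_width_bins` is a hypothetical helper name.

```python
values = [3, 4, 5, 10, 21, 32, 43, 44, 46, 52, 59, 67]


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in values:
        # The maximum value would index one past the end, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        bins[idx].append(v)
    return bins


bins = equal_width_bins(values, 4)
print(bins)
print(len(bins[0]))  # number of values in the first bin
```

Here the range is 67 - 3 = 64, so each bin is 16 wide and the first bin covers 3 up to (but not including) 19.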
42. A machine learning problem involves four attributes plus a class. The attributes
have 3, 2, 2, and 2 possible values each. The class has 3 possible values. How
many possible different examples are there?
A. 3
B. 6
C. 48
D. 72
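
The count in question 42 follows from the product rule: each example is one choice of value per attribute plus one class value. A small sketch, with the value counts copied from the question and the enumeration added only as a cross-check:

```python
import itertools
import math

attribute_sizes = [3, 2, 2, 2]  # possible values per attribute
class_size = 3                  # possible class values

# Product rule: multiply the number of options for every position.
n_examples = math.prod(attribute_sizes) * class_size

# Cross-check by enumerating every combination explicitly.
all_examples = list(
    itertools.product(*(range(k) for k in attribute_sizes + [class_size]))
)
print(n_examples, len(all_examples))
```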
43. Which of the following statements about Naive Bayes is incorrect?
A. Attributes are equally important.
B. Attributes are statistically dependent of one another given the class value.
C. Attributes are statistically independent of one another given the class value.
D. All of the above
A) True
B) False
2. Data selection is a comparison of the general features of the target class data
objects against the general features of objects from one or multiple contrasting
classes.
A. True
B. False
51. The choice of a data mining tool is made at this step of the KDD
process.
a. goal identification
b. creating a target dataset
c. data preprocessing
d. data mining
52. Attributes may be eliminated from the target dataset during this step of
the KDD process.
a. creating a target dataset
b. data preprocessing
c. data transformation
d. data mining
A. Data Mining is defined as the procedure of extracting information from huge sets of
data
B. Data mining also involves other processes such as Data Cleaning, Data Integration,
Data Transformation
C. Data mining is the procedure of mining knowledge from data.
D. All of the above
Ans : D
57. The mapping or classification of a class with some predefined group or class is
known as?
A. Data Characterization
B. Data Discrimination
C. Data Set
D. Data Sub Structure
Ans : B
Ans : C
Explanation: Mining of Correlations: This is an additional analysis performed to
uncover interesting statistical correlations between associated attribute-value pairs or
between two item sets, to determine whether they have a positive, negative, or no effect on each
other.
59. __________ may be defined as the data objects that do not comply with the
general behavior or model of the data available.
A. Outlier Analysis
B. Evolution Analysis
C. Prediction
D. Classification
Ans : A
Explanation: Outlier Analysis : Outliers may be defined as the data objects that do not
comply with the general behavior or model of the data available.
60. "Efficiency and scalability of data mining algorithms" issues come under?
A. Study Class
B. Initial Class
C. Target Class
D. Final Class
Ans : C
Explanation: Data Characterization: This refers to summarizing the data of the class under study.
The class under study is called the Target Class.
66. __________ refers to describing and modeling regularities or trends for objects
whose behavior changes over time.
A. Outlier Analysis
B. Evolution Analysis
C. Prediction
D. Classification
Ans : B
Explanation: Evolution Analysis: Evolution analysis refers to describing and modeling
regularities or trends for objects whose behavior changes over time.
68. In which step of Knowledge Discovery are multiple data sources combined?
A. Data Cleaning
B. Data Integration
C. Data Selection
D. Data Transformation
Ans : B
69. The most commonly used algorithm to discover association rules by recursively
identifying frequent item sets
a. Apriori algorithm
b. Ordinal data
c. Nominal data
d. Categorical data
70. A process that uses statistical, mathematical, artificial intelligence, and machine-
learning techniques to extract and identify useful information and subsequent
knowledge from large databases.
a. RapidMiner
b. Gini index
c. Sequence mining
d. Data mining
71. A machine learning process that performs rule induction or a related procedure to
establish knowledge from large databases
a. Categorical data
b. K fold cross validation
c. Numeric data
d. Knowledge discovery in databases
73. A type of data that represents the numeric values of specific variables, for
example age, number of children, etc.
a. Ratio data
b. Numeric data
c. Nominal data
d. Interval data
74. The measure of how often products or services appear together in the same
transaction: the proportion of transactions in the dataset that contain all of the
products and/or services mentioned in a specific rule.
a. SEMMA
b. Entropy
c. Lift
d. Support
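
The measure described in question 74 can be computed directly from a list of transactions. The sketch below is illustrative only: the transactions are made up and `support` is a hypothetical helper name.

```python
# Hypothetical market-basket data: one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def support(itemset, transactions):
    """Proportion of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(support({"bread", "milk"}, transactions))  # 3 of 5 transactions
```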
75. The act of telling about the future
a. Regression
b. Prediction
c. Classification
d. Associations
76. A metric that is used in economics to measure the diversity of a population
a. RapidMiner
b. Confidence
c. Gini index
d. Data mining
77. The splitting mechanism used in ID3
a. Ratio data
b. Associations
c. Interval data
d. Information gain
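
To make the splitting criterion in question 77 concrete, the following sketch computes entropy and information gain for a tiny, made-up dataset. It is not a full ID3 implementation; the rows, labels, and function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, attr_index, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    the rows on the attribute at attr_index."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


# Hypothetical data: attribute 0 separates the classes, attribute 1 does not.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"), ("rainy", "mild")]
labels = ["no", "no", "yes", "yes"]

gain_outlook = information_gain(rows, 0, labels)
gain_temp = information_gain(rows, 1, labels)
print(gain_outlook, gain_temp)
```

A tree builder of this kind would split on the attribute with the larger gain, here attribute 0.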
78. Supervised induction used to analyze the historical data stored in a database and
to automatically generate a model that can predict future behavior
a. Classification
b. Associations
c. Clustering
d. Prediction
80. To determine association rules from frequent item sets:
a. Only minimum confidence needed
b. Neither support nor confidence needed
c. Both minimum support and confidence are needed
d. Minimum support is needed
Feedback: Both minimum support and confidence are needed
81. If {A,B,C,D} is a frequent itemset, which candidate rule is not possible?
a. C –> A
b. D –> ABCD
c. A –> BC
d. B –> ADC
Feedback: D –> ABCD
86. ______ is a random error or variance in a measured variable.
Ans: Noise
83. ______ routines attempt to fill in missing values, smooth out noise while
identifying outliers, and correct inconsistencies in the data.
89. The ______ technique uses encoding mechanisms to reduce the data set size.
Ans: Data compression
90. In which strategy of data reduction are redundant attributes detected?
A. Data cube aggregation
B. Numerosity reduction
C. Data compression
D. Dimension reduction
Ans: D. Dimension reduction
91. The _____ rule can be used to segment numeric data into relatively uniform,
“natural” intervals.
Ans: 3-4-5
92. Oracle, SQL Server, and DB2 are examples of _____________.
Ans: DBMS
93. A Database Management System (DBMS) supports query languages. (True/False)
Ans: True
94. The _____ item sets find all sets of items (item sets) whose support is greater
than the user-specified minimum support, σ.
Ans: Frequent set
95. A frequent set is a _______ if it is a frequent set and no superset of this is a
frequent set.
Ans: Maximal frequent set
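
The definition in question 95 can be checked mechanically: a frequent set is maximal when no other frequent set strictly contains it. A minimal sketch, with a made-up collection of frequent sets and a hypothetical helper name:

```python
def maximal_frequent(frequent_sets):
    """Return the frequent sets that have no frequent proper superset."""
    frequent = [frozenset(s) for s in frequent_sets]
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical frequent sets found at some minimum support.
frequent_sets = [{"A"}, {"B"}, {"C"}, {"A", "B"}, {"A", "C"}]
maximal = maximal_frequent(frequent_sets)
print(maximal)
```

In this example {A, B} and {A, C} are maximal, while {A}, {B}, and {C} are not because each is contained in a larger frequent set.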
96. ____________ techniques are used to detect relationships or associations between
specific values of categorical variables in large data sets.
Ans: Association rule mining
97. A Decision Tree is a _____________ model.
Ans: Predictive model
98. Using a decision tree, only categorical variables would be modelled.
(True/False).
Ans: False
99. Clustering is an unsupervised learning method (True/false).
Ans: True
100. For a given transaction database T, a ____ is an expression of the form X
=> Y, where X and Y are subsets of A, and X => Y holds with confidence τ if
τ% of the transactions in T that support X also support Y.
Ans: Association rule
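
The confidence of a rule X => Y, as defined in question 100, is the share of transactions containing X that also contain Y. The sketch below uses a made-up transaction database; the `confidence` helper is hypothetical.

```python
# Hypothetical transaction database T over the item set A = {"A", "B", "C"}.
T = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B", "C"}]


def confidence(antecedent, consequent, transactions):
    """Fraction of the transactions supporting X that also support Y."""
    x = set(antecedent)
    xy = x | set(consequent)
    n_x = sum(1 for t in transactions if x <= t)
    n_xy = sum(1 for t in transactions if xy <= t)
    return n_xy / n_x


conf_a_b = confidence({"A"}, {"B"}, T)
print(conf_a_b)  # 2 of the 3 transactions containing A also contain B
```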
101. The _______ rule describes associations between quantitative items or
attributes.
Ans: Quantitative association
102. The ____ step eliminates the extensions of (k-1)-itemsets, which are not
found to be frequent, from being considered for counting support.
Ans: Pruning
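
The pruning step in question 102 can be sketched as a subset check: a candidate k-itemset is kept only if every one of its (k-1)-subsets is already known to be frequent. The item sets below are made up, and `prune_candidates` is a hypothetical helper, not part of any particular library.

```python
from itertools import combinations


def prune_candidates(candidates, frequent_prev):
    """Keep a k-itemset only if all of its (k-1)-subsets are frequent."""
    frequent_prev = {frozenset(s) for s in frequent_prev}
    kept = []
    for candidate in candidates:
        candidate = frozenset(candidate)
        subsets = combinations(candidate, len(candidate) - 1)
        if all(frozenset(sub) in frequent_prev for sub in subsets):
            kept.append(candidate)
    return kept


# Hypothetical frequent 2-itemsets and the 3-itemset candidates built from them.
frequent_2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]
candidates_3 = [{"A", "B", "C"}, {"A", "B", "D"}]

kept = prune_candidates(candidates_3, frequent_2)
print(kept)
```

Here {A, B, D} is dropped before any support counting because its subset {A, D} is not frequent.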
103. In the first phase of Partition algorithm, the algorithm logically divides the
database into a number of ______.
Ans: Non-overlapping partitions
a) Efficiency
b) Quality of data
c) Marketing
d) All the above
Ans: D. All the above.
124. To improve accuracy, data mining programs are used to analyze audit data
and extract features that can distinguish normal activities from intrusions.
(True/False)
Ans: True
125. Patient Rule Induction Method (PRIM) and Weighted Item Sets (WIS) are
types of association rule techniques.
(True/False)