Practice Exam
1. The plot below shows data points for two predictor variables (x1 and x2) in a 2-class
classification problem (red vs. blue):
Select the method that should give the lowest misclassification error on this data set from
the following:
• Classification Tree
• Logistic Regression
2. You have been given a large data set with 200,000 data points and 50 predictor variables. The
response variable takes continuous values. You have implemented a Linear Regression model
and obtained some Mean Squared Error (MSE) value along with the following residual plot
(fitted values vs errors):
Now, you want to try the Decision Tree method. You assume that since it is a large data set, you
can speed up the computation by restricting the maximum depth of the tree. You split the data
into 70% training set and 30% test set. Next, you apply the Decision Tree method and obtain a
new MSE value, which is higher than the one obtained using the Linear Regression model. What
would you do next?
3. You are working on a classification project to identify whether an individual will have a certain
disease or not.
The predictors are the measurements obtained from the individual's blood test report. The
training data set contains 15,000 data samples and 10 predictor variables. You notice that 20
samples are missing values for some predictor variables at random.
Upon further inspection, you find the following information:
1) the data set is balanced (i.e., it has a similar proportion of both the classes),
2) the maximum number of predictor variable values that are missing for any of the 20 samples
is 5,
3) no predictor variable is missing values in more than 2 samples, and
4) 11 out of the 20 samples belong to the same class.
How would you handle the missing values?
4. Which of the following statements are true with respect to the Random Forests method?
• 0.57
• 1
• 0.47
7. You are working on a 2-class classification problem where the objective is to determine whether
a customer is going to default on a loan or not. There are 2 predictor variables: the customer's
credit history and the amount of loan. Below is the table with 7 data points.
You have decided to apply a tree-based model on this data set with Gini index as a measure to
select the variable to be split at the root node. Calculate and report the Gini index score for the
predictor variable: "Loan Amount"
• 0
• 0.29
• 0.24
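Since the 7-point table is not reproduced here, the Gini computation can only be sketched generically. A minimal sketch, assuming binary class labels; the split below is hypothetical, not the table's actual values:

```python
def gini(labels):
    """Gini impurity of one node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_gini(groups):
    """Gini index of a split: node impurities weighted by node size."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups)

# Hypothetical split of 7 samples (1 = default, 0 = no default):
left, right = [0, 0, 1], [1, 1, 1, 0]
score = weighted_gini([left, right])
print(round(score, 4))
```

The predictor whose best split yields the lowest weighted Gini score is chosen for the root node.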
8. You are working on a 2-class classification problem where the objective is to determine whether
a customer is going to default on a loan or not. There are 2 predictor variables: the customer's
credit history and the amount of loan. Below is the table with 7 data points.
You have decided to apply a tree-based model to this data set, with the Gini index as the measure to
select the variable to be split at the root node. Which predictor variable would you select?
• Credit History
• Loan Amount
9. Which of the following statements is not true about tree-based models?
• The relationship between the predictor variables and the response variable has to be
expressed in a parametric form to obtain high accuracy.
• Tree-based models can handle data with missing values.
• Tree-based models can handle predictors which take categorical values.
10. Below is a list of tasks involved in modeling a predictive analytics project represented by
respective alphabets:
a - read data
c - split data into 20% testing data set and 80% training data set
d - apply linear regression method
e - apply ridge regression method after standardizing the predictor variable values
f - apply the lasso method after standardizing the predictor variable values
You are given a predictive analytics project to estimate house prices given 4 predictors: the
number of rooms, school ratings, crime rate, and nitric oxides concentration. The training data
set consists of 50,000 data samples. You suspect there is a linear relationship between the
predictor variables and the response variable. Your objective is to obtain a high prediction
accuracy and also keep the model interpretable. Pick the correct list of tasks involved and the
order in which you will execute them for this project.
• a-b-c-f-i
• a-b-c-e-i
• a-b-c-d-i
11. Coding-based: The data set used in the python file has 10 numeric predictor variables. The
response variable is a quantitative measurement of the disease (diabetes) progression one year
after the baseline values of the predictor variables are recorded. For the given data set in the
python file, report the mean value observed for the response variable (target).
• 0
• 346
• 152.13
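The description (10 numeric predictors, diabetes progression one year after baseline) matches scikit-learn's built-in diabetes data set; a sketch under that assumption, since the python file itself is not reproduced here:

```python
from sklearn.datasets import load_diabetes

# Load the diabetes data set (442 samples, 10 predictors).
X, y = load_diabetes(return_X_y=True)

# Mean of the response variable (disease progression).
mean_target = y.mean()
print(round(mean_target, 2))
```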
12. Coding-based: For the given data set in the python file, report the predictor variable that has the
largest mean value. To identify this variable, compare the closest mean values up to 3 decimal
places.
• s4
• All the predictor variables have the same mean value (up to 3 decimal places).
• s2
• s1
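A sketch, again assuming scikit-learn's diabetes data set: note that load_diabetes returns predictors that are already mean-centered and scaled, so every column mean is essentially zero:

```python
import numpy as np
from sklearn.datasets import load_diabetes

data = load_diabetes()
means = data.data.mean(axis=0)

# Each predictor's mean, rounded to 3 decimal places.
for name, m in zip(data.feature_names, means):
    print(f"{name}: {m:.3f}")
```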
13. Coding-based: Which of the following predictor variables seems to have a stronger linear
relationship with the response variable than the other 3 options?
• bmi
• s4
• age
• sex
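One way to compare the options, under the same scikit-learn diabetes assumption, is the Pearson correlation of each candidate predictor with the target:

```python
import numpy as np
from sklearn.datasets import load_diabetes

data = load_diabetes()
y = data.target

# Pearson correlation of each option with the response.
corrs = {}
for name in ["bmi", "s4", "age", "sex"]:
    col = data.data[:, data.feature_names.index(name)]
    corrs[name] = np.corrcoef(col, y)[0, 1]
    print(f"{name}: {corrs[name]:.3f}")
```

A scatter plot of each predictor against the target would show the same ranking visually.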
14. Coding-based: For the given data set in the python file, which of the following pairs of predictor
variables are correlated the most:
• s3-s5
• s2-s3
• s1-s2
• s2-s4
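The pairwise comparison can be sketched with a correlation between each candidate pair of columns (still assuming scikit-learn's diabetes data set):

```python
import numpy as np
from sklearn.datasets import load_diabetes

data = load_diabetes()
names = data.feature_names

def corr(a, b):
    """Pearson correlation between two named predictor columns."""
    ca = data.data[:, names.index(a)]
    cb = data.data[:, names.index(b)]
    return np.corrcoef(ca, cb)[0, 1]

pair_corrs = {p: corr(*p) for p in
              [("s3", "s5"), ("s2", "s3"), ("s1", "s2"), ("s2", "s4")]}
for p, r in pair_corrs.items():
    print(p, round(r, 3))
```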
15. Coding-based: For the given data set in the python file, do the following. In a cell, type:
random.seed(123). Then split the data into train and test data sets with test_size=0.20 and
random_state=1 as parameters. Report the number of data samples in the training data set.
• 89
• 442
• 353
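A sketch of the split, assuming scikit-learn's diabetes data set and its train_test_split helper (note that with 442 samples, 20% rounds to 89 test samples, leaving 353 for training):

```python
import random
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

random.seed(123)  # as instructed in the question

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=1)

print(len(X_train), len(X_test))
```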
16. Coding-based: For the given data set in the python file, fit a linear regression model to the training data set
obtained after splitting the data set with the same conditions as in the previous question
(random.seed(123), test_size=0.2 and random_state=1). Make predictions on the test data set
with the fit obtained on the training data set. Obtain the lowest mean squared error. Select the
closest value of the mean squared error that you obtained from the following:
• 20
• 5
• 2990
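A sketch of the fit-and-evaluate step, under the same diabetes-data-set assumption:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Fit on the training set, evaluate on the held-out test set.
model = LinearRegression().fit(X_train, y_train)
mse = mean_squared_error(y_test, model.predict(X_test))
print(round(mse))
```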
17. Coding-based: For the given data set in the python file, fit one of the shrinkage methods to the
training data set obtained after splitting the data set with the same conditions as in the previous
question (random.seed(123), test_size=0.2 and random_state=1). Make predictions on the test
data set with the fit obtained on the training data set. Obtain the lowest mean squared error. (In
order to obtain the lowest mean squared error, you may want to tune the 'alpha'
parameter, which can take values from decimal points to integers) Select the closest value of the
mean squared error that you obtained from the following:
• 2000
• 2930
• 3500
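A sketch using Ridge as the shrinkage method (Lasso is analogous), with the alpha grid below being an assumed tuning range, not one given in the question:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Standardize predictors before applying the shrinkage method.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Tune alpha over a coarse grid and keep the lowest test MSE.
best = min(
    mean_squared_error(
        y_test, Ridge(alpha=a).fit(X_train_s, y_train).predict(X_test_s))
    for a in np.logspace(-3, 2, 50))
print(round(best))
```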
18. Coding-based: For the given data set in the python file: Fit a decision tree regressor to the same
training data set obtained after splitting the data set with the same conditions as in an earlier
question (random.seed(123), test_size=0.2 and random_state=1). (Set the random_state=1 in
the regressor. You may tune the respective parameter: max_depth to attain the lowest error.)
Make predictions on the test data set with the fit obtained on the training data set. Obtain the
lowest mean squared error. Select the closest value of the mean squared error that you
obtained from the following:
• 6766
• 2990
• 4090
19. Coding-based: For the given data set in the python file: Fit a random forest regressor to the
same training data set obtained after splitting the data set with the same conditions as in an
earlier question (random.seed(123), test_size=0.2 and random_state=1 ) (Set the
random_state=1 in the regressor. You may tune the respective parameter: n_estimators to
attain the lowest error.) Make predictions on the test data set with the fit obtained on the
training data set. Obtain the lowest mean squared error. Select the closest value of the mean
squared error that you obtained from the following:
• 4284
• 3700
• 8232
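A sketch of the random-forest step; the n_estimators values are an assumed tuning grid:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Tune n_estimators and keep the lowest test MSE.
best = min(
    mean_squared_error(
        y_test,
        RandomForestRegressor(n_estimators=n, random_state=1)
        .fit(X_train, y_train).predict(X_test))
    for n in (10, 50, 100, 200, 500))
print(round(best))
```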
20. Based on the results obtained using the linear and tree-based methods on the diabetes data set
in the python file, is the below statement True or False? The relationship between the predictor
variables and the response variable can be well-explained using if-then statements in the
predictor space.
• False
• True