R Programming in Data Science


SRM VALLIAMMAI ENGINEERING COLLEGE

SRM Nagar, Kattankulathur – 603 203

DEPARTMENT OF
ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

QUESTION BANK

V SEMESTER
1922502 – R PROGRAMMING IN DATA SCIENCE

Regulation – 2019
Academic Year 2022 – 2023 (ODD)

Prepared by

Mrs. R. Deepa, Assistant Professor / AI&DS


SUBJECT : 1922502 - R PROGRAMMING IN DATA SCIENCE


SEM/YEAR : V / III

UNIT – I

Introduction to R: R Software, R packages, Data Types in R: Scalars, Vectors, Matrices,
Data frames, Lists, Variables and Logical Operations. R Matrix Create, Print, Column,
Slice, Factors in R, Categorical and Continuous Variables.

PART- A (2 Marks)
Q. No. Questions BT Level Competence
1 Define R programming. Remembering BTL-1
2 Differentiate between Scalars, vector, list, Matrix and Understanding BTL-2
Data frame.
3 List out any five features of R. Remembering BTL-1
4 Differentiate between R and Python in terms of Understanding BTL-2
functionality.
5 What are the applications of R? Understanding BTL-2
6 Why do we use the command- Analyzing BTL-4
install.packages(file.choose(),repos=NULL)?
7 Summarize some packages in R, which can be used for Evaluating BTL-5
data imputation?
8 How to get the name of the current working directory in Applying BTL-3
R?
9 Write an R program to take input from the user (name Analyzing BTL-4
and age) and display the values. Also print the version
of the R installation.
10 What are the different values that can be assigned to a Analyzing BTL-4
numeric datatype in R?
11 What are the different data types in R? Analyzing BTL-4
12 Explain RStudio. Evaluating BTL-5
13 Compare R with other technologies. Understanding BTL-2
14 Write an R program to create three vectors: numeric data, Evaluating BTL-5
character data and logical data. Display the content of
the vectors and their type.
15 Define Merging and accessing list elements. Remembering BTL-1
16 Demonstrate a simple 3x3 matrix. Applying BTL-3
17 How do you access the elements in the 2nd column and Applying BTL-3
4th row of a matrix?
18 Define slicing of a matrix. Remembering BTL-1
19 How to create a Matrix? Applying BTL-3
20 Write an R program to create a blank matrix. Creating BTL-6
21 Write an R program to add two matrices. Creating BTL-6
22 List out the operations on Matrices. Remembering BTL-1
23 Define order of a Matrix. Remembering BTL-1
24 Differentiate between nominal and ordinal categorical Understanding BTL-2
variables.
PART – B (13 Marks)
1 i. Summarize the advantages and disadvantages (6) Remembering BTL-1
of R.
ii. Explain scalar and vector with an example. (7)
2 i. Write an R program to get the first 10 Fibonacci (6) Remembering BTL-1
numbers.
ii. Generate the following: (7)
a. Access the element at 3rd column and 1st
row in a matrix.
b. Access only the second row
c. Access the element at 2nd column and 4th
row in a matrix
3 Write an R program to find the maximum and the (13) Applying BTL-3
minimum value of a given vector. Explain the
functions with syntax.
4 Write an R program to find elements which are (13) Applying BTL-3
present in two given data frames.
5 Write an R program to create a data frame using (13) Understanding BTL-2
two given vectors and display the duplicated
elements and unique rows of the data frame.
Explain with syntax.
6 i. Illustrate the usage of all logical operators in R. (6) Understanding BTL-2
ii. Explain the use of the length() and mean() (7)
functions.
7 Elaborate the statistical and programming (13) Understanding BTL-2
features of R.
8 i. Write an R program to add a new item g4 = (6) Evaluating BTL-5
"Python" to a given list: g1 = 1:10, g2 = "R
Program", g3 = "HTML".
ii. Explain data frame operations. (7)
9 Write an R program to add 3 to each element of (13) Analyzing BTL-4
the first vector. Print the original and new vector.
10 Write an R program to reverse the order of a given (13) Analyzing BTL-4
vector.
11 Check whether each element of a given vector is (13) Understanding BTL-2
greater than 10 or not. Return TRUE or FALSE.
12 Write an R program to create an ordered factor (13) Analyzing BTL-4
from data consisting of the names of months.
13 Write an R program to create a correlation matrix (13) Applying BTL-3
from a data frame of the same data type. Explain the
functions with syntax.
14 Create the following: Creating BTL-6
a. Create a matrix taking a given vector of (7)
numbers as input. Display the matrix.
b. Access the element at the 3rd column and (6)
2nd row, only the 3rd row, and only the 4th column
of a given matrix.
15 List out the properties of the following: Remembering BTL-1
a. Matrix subtraction (4)
b. Matrix division (4)
c. Matrix addition (3)
d. Matrix multiplication (2)
16 i. What is a factor in R and what is its function? (6) Remembering BTL-1
ii. Distinguish two types of variables with an (7)
example.
17 Explain categorical variables with an example. (13) Evaluating BTL-5
PART – C (15 Marks)
1 i. Explain the main techniques for writing R code that runs (8) Evaluating BTL-5
faster.
ii. Differentiate between a package and a library, with (7)
examples.
2 a. i. Let's create the following vectors: (7) Creating BTL-6
u <- 4
v <- 8
Use the elementary arithmetic operators +, -, *, /,
and ^ to:
• add u and v
• subtract v from u
• multiply u by v
• divide u by v
• raise u to the power of v
ii. Write an R program to create a vector and find
the length and the dimension of the vector.
b. i. Suppose u and v are not scalars, but vectors (8)
with multiple elements:
u <- c(4, 5, 6)
v <- c(1, 2, 3)
Without using R, write down what you expect as
the result of the same operations as in the previous
exercise:
• add u and v
• subtract v from u
• multiply u by v
• divide u by v
• raise u to the power of v
ii. Create a vector using the seq() function.
iii. Write an R program to find the sum, mean and
product of a vector, ignoring elements like NA or
NaN.
3 a. Write an R program to get all prime (5) Evaluating BTL-5
numbers up to a given number.
b. Write an R program to count the number of (5)
NA values in a data frame column.
c. Create the following: (5)
i. Creating a list
ii. Naming list elements
iii. Checking whether an item exists or not
4 Perform the following operation in data frame: Evaluating BTL-5
a. Write an R program to add a new column in (4)
a given data frame.
b. Write an R program to add new row(s) to (4)
an existing data frame.
c. Write an R program to drop column(s) by (4)
name from a given data frame.
d. Write an R program to drop row(s) by (4)
number from a given data frame.
e. Write an R program to create inner, outer, (4)
left and right joins (merge) from two given data
frames.
5 Create the following: Creating BTL-6
a. Create factor variables (5)
b. Create ordered factor variables (5)
c. Adding and dropping levels in factor (5)
variable
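
A minimal sketch related to Part C, question 2 of Unit I, showing how R applies the arithmetic operators element by element and how `na.rm = TRUE` skips NA and NaN. The vector `x` is a made-up example and is not part of the question bank.

```r
# Element-wise arithmetic on the vectors given in Part C, question 2(b)
u <- c(4, 5, 6)
v <- c(1, 2, 3)

add_uv  <- u + v   # 5 7 9
sub_uv  <- u - v   # 3 3 3
mul_uv  <- u * v   # 4 10 18
div_uv  <- u / v   # 4 2.5 2
pow_uv  <- u ^ v   # 4 25 216

# Hypothetical vector containing missing values (assumption for illustration)
x <- c(10, 20, NA, 30, NaN)

# na.rm = TRUE drops both NA and NaN before the calculation
x_sum  <- sum(x, na.rm = TRUE)    # 60
x_mean <- mean(x, na.rm = TRUE)   # 20
x_prod <- prod(x, na.rm = TRUE)   # 6000

print(pow_uv)
print(c(sum = x_sum, mean = x_mean, product = x_prod))
```

The same operators work unchanged on the scalars `u <- 4` and `v <- 8`, because a scalar in R is a vector of length one.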

UNIT – II R DATA STRUCTURES

Scalars - Vectors - Matrices - List - Data Frames - Factors - Packages - Data Reshaping
- Data management with repeats, sorting, ordering and lists - Vector indexing,
factors, Data management with strings, display and formatting.

PART – A (2-Marks)
Q. No. Questions BT Level Competence
1 Define R vector. Understanding BTL-2
2 Define R lists. Understanding BTL-2
3 Reverse the order of a given vector in R. Remembering BTL-1
4 Difference between data frame and a matrix in R? Analyzing BTL-4
5 List out the various forms of reshaping data in a data Remembering BTL-1
frame.
6 Examine why data reshaping in R is important. Applying BTL-3
7 Define Transpose of a matrix. Understanding BTL-2
8 Define the melt() and cast() functions. Understanding BTL-2
9 Define the tidyr package. Understanding BTL-2
10 Identify the use of the sort() function. Remembering BTL-1
11 List out the various sorting mechanisms. Remembering BTL-1
12 Explain factor variable? Evaluating BTL-5
13 Explain the recycling of elements in an R vector? Give Evaluating BTL-5
an example.
14 Differentiate vector index and Negative index. Analyzing BTL-4
15 Analyze what is meant by out-of-range index? Analyzing BTL-4
16 Explain the use of the length() function. Evaluating BTL-5
17 Point out the attributes of a factor. Analyzing BTL-4
18 Show how to count the number of NA values in a data Remembering BTL-1
frame column.
19 Convert a matrix to a 1-dimensional array using Applying BTL-3
R code.
20 How will you read a .csv file in R language? Applying BTL-3
21 Convert given pH levels of soil to an ordered factor Applying BTL-3
using R code.
22 Write an R program to get the structure of a given data Creating BTL-6
frame.
23 Write an R program to get the length of the first two Creating BTL-6
vectors of a given list: g1 = 1:10, g2 = "R
Program", g3 = "HTML".
24 Show the R code to add 10 to each element of the first Remembering BTL-1
vector in a given list: g1 = 1:10, g2 = "R
Program", g3 = "HTML".
PART- B (13 Marks)
1 Explain in detail about data frame with example (13) Remembering BTL-1
R code.
2 Demonstrate an R code to find the factorial of a (13) Applying BTL-3
number (use recursion)
3 Explain list data structure and its operations with (13) Remembering BTL-1
examples.
4 Create a simple data frame from 3 vectors. Order (13) Understanding BTL-2
the entire data frame by the first column.
5 i.Explain about how to create a list in R with an (7) Remembering BTL-1
example? (6)
ii.Explain how to access list element?
iii. Explain how to operate on lists in R?
6 Convert the following multi-line operations to a (13) Evaluating BTL-5
single expression. Check that both approaches
give the same result.
Part a:
w <- u + v
w <- w / 2
w <- w + u
Part b:
w1 <- u^3
w2 <- u - v
w <- w1 / w2
7 i. Define Factor. (4) Understanding BTL-2
ii. How to create a factor and how to access (5)
components of a factor?
iii. How to modify a factor? (4)
8 Explain with an example how to change the order (13) Remembering BTL-1
of levels.
9 Summarize the functions to join columns and (13) Understanding BTL-2
rows in a data frame.
10 Show how to write an R program to create a data (13) Applying BTL-3
frame which contains details of 5 employees and
display the details.
11 Write R code to do the following: Creating BTL-6
a. Check available R packages (4)
b. Get the list of all packages installed (4)
c. Install new package (3)
d. Install package manually (2)
12 Write an R program to create a matrix taking a (13) Evaluating BTL-5
given vector of numbers as input and define the
column and row names. Display the matrix.
13 Illustrate R code using the following functions: (13) Analyzing BTL-4
seq(), paste(), print(), format(), mode(), order()
14 Discuss any three commonly used packages with (13) Understanding BTL-2
an example.
15 Sketch out some popular repositories for R (13) Applying BTL-3
packages.
16 Illustrate an R program to create the system's idea of (13) Analyzing BTL-4
the current date with and without time. Explain
with syntax.
17 Illustrate an R program to create two 2x3 matrices and (13) Analyzing BTL-4
add, subtract, multiply and divide the matrices.
PART – C (15 Marks)
1 Create the vectors: Creating BTL-6
(a) (1, 2, 3, . . . , 19, 20) (2)
(b) (20, 19, . . . , 2, 1) (2)
(c) (1, 2, 3, . . . , 19, 20, 19, 18, . . . , 2, 1) (2)
(d) (4, 6, 3) and assign it to the name tmp. (2)
For parts (e), (f) and (g) look at the help for the
function rep.
(e) (4, 6, 3, 4, 6, 3, . . . , 4, 6, 3) where there are (2)
10 occurrences of 4.
(f) (4, 6, 3, 4, 6, 3, . . . , 4, 6, 3, 4) where there (2)
are 11 occurrences of 4, 10 occurrences of 6 and
10 occurrences of 3.
(g) (4, 4, . . . , 4, 6, 6, . . . , 6, 3, 3, . . . , 3) where (3)
there are 10 occurrences of 4, 20 occurrences of
6 and 30 occurrences of 3.

2 i. Explain operations on vectors. (7) Evaluating BTL-5
ii. Write an R program to check whether a given number is (6)
even or odd.
3 Explain R functions for differentiation and (15) Evaluating BTL-5
integration with an example.

4 i. Consider two vectors u and v: (7) Creating BTL-6
u <- c(8, 9, 10)
v <- c(1, 2, 3)
Create a new vector w in a single line of code:
w <- (2 * u + v) / 10
or carry out each operation on a separate line:
w <- 2 * u
w <- w + v
w <- w / 10
ii. Convert the following expressions to separate (6)
operations, and check that both approaches give
the same result:
w <- (u + 0.5 * v) ^ 2
w <- (u + 2) * (u - 5) + v
w <- (u + 2) / ((u - 5) * v)

5 i. Create a simple data frame from 3 vectors. Order (5) Creating BTL-6
the entire data frame by the first column.
ii. Create a data frame from a matrix of your (10)
choice, change the row names so every row says
id_i (where i is the row number) and change the
column names to variable_i (where i is the column
number). I.e., for column 1 it will say variable_1,
and for row 2 it will say id_2, and so on.
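
A short sketch related to Part C, question 1 of Unit II, showing one possible way to build the repeated patterns with `rep()`. The names `e`, `f` and `g` are chosen here for illustration only; `tmp` is the name the question asks for.

```r
# Parts (a) to (c): sequences built with the colon operator
a <- 1:20
b <- 20:1
c_vec <- c(1:20, 19:1)        # rises to 20, then falls back to 1

# Part (d): the base pattern
tmp <- c(4, 6, 3)

# Part (e): repeat the whole pattern 10 times
e <- rep(tmp, times = 10)

# Part (f): cut the repetition off after 31 elements, so 4 appears once more
f <- rep(tmp, length.out = 31)

# Part (g): repeat each element a different number of times
g <- rep(tmp, times = c(10, 20, 30))

print(table(f))
print(table(g))
```

`times` with a single number repeats the whole vector, `times` with a vector of the same length as `tmp` repeats each element separately, and `length.out` truncates the repetition at a fixed length.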

UNIT – III -DATA PREPARATIONS

R Data Frame: Create, Append, Select, Subset. Sorting a data frame using order(), R
dplyr: Data manipulation and Cleaning, Merge Data Frames in R: Full and Partial
Match, Functions in R programming.

PART-A (2 Marks)
Q. No. Questions BT Level Competence
1 List out the characteristics of a data frame. Remembering BTL-1
2 Define the structure of a data frame using the str() function. Applying BTL-3
3 What is the use of nrow () function? Understanding BTL-2
4 What is the use of subset() function? Understanding BTL-2
5 Write R code to select a column of a data frame. Analyzing BTL-4
6 List down the methods to sort a data frame. Remembering BTL-1
7 What is the use of the order() function? Applying BTL-3
8 What is the dplyr package? Applying BTL-3
9 List out the performance of the R dplyr package. Remembering BTL-1
10 How to install and load dplyr package? Understanding BTL-2
11 What is the use of rename() and filter() function? Analyzing BTL-4
12 What is the use of with () and by () functions in R? Analyzing BTL-4
12 What is the use of the summarise() function? Give its syntax. Analyzing BTL-4
13 List out the common symptoms of messy data. Remembering BTL-1
14 Write R code to remove empty rows and columns. Creating BTL-6
15 How to handle missing values in R? Understanding BTL-2
16 Write down the syntax of the grep() function and its use. Creating BTL-6
17 List out the function components. Remembering BTL-1
18 What is the use of return () function? Understanding BTL-2
19 List out some built-in functions. Remembering BTL-1
20 Compare the types of functions in R programming? Evaluating BTL-5
21 Explain Argument matching. Evaluating BTL-5
22 Explain Lazy evaluation. Evaluating BTL-5
23 Analyze on how library () and require () functions are Analyzing BTL-4
used.
24 List out the characteristics of a data frame. Applying BTL-3
PART-B (13 Marks)
1 Show how to extract data from data frame. (13) Applying BTL-3
Explain with an example.
2 Explain how to append rows to R data frame with (13) Evaluating BTL-5
an example?
3 Analyse an R program to create a data frame with 2 (13) Analyzing BTL-4
columns and order it based on particular columns
in decreasing order. Display the sorted
data frame based on subjects in decreasing order,
and display the sorted data frame based on rollno
in decreasing order.
4 List out the dplyr function and its equivalent SQL (13) Remembering BTL-1
with an example.
5 List out the functions to select variables based on (13) Remembering BTL-1
their names.
6 List out the purpose of data cleaning in R with an (13) Remembering BTL-1
example.
7 Explain about Data Manipulation with dplyr (13) Evaluating BTL-5
package
8 With the dataset swiss, create a data frame of only Creating BTL-6
the rows 1, 2, 3, 10, 11, 12 and 13, and only the
variables Examination, Education and
Infant.Mortality.
a) The infant mortality of Sarine is wrong; it (5)
should be NA. Change it.
b) Create a row that will be the total sum of the (5)
column, and name it Total.
c) Create a new variable that will be the (3)
proportion of Examination (Examination / Total).
9 Show how to clean the column names of a data (13) Applying BTL-3
frame using R Programming with an example.
10 Explain the following: Understanding BTL-2
a. rbind() to merge two R data frames (5)
b. cbind() to merge two R data frames (5)
c. merge() (3)
11 Analyze R code for the following: (7) Analyzing BTL-4
a. Find partial match in a specific column (6)
b. Find several partial matches
12 Write the syntax for writing functions in R with a (13) Analyzing BTL-4
sample program.
13 Write the R code for the following: Applying BTL-3
a. Calling a function with default arguments (5)
b. Calling a function with arguments (4)
c. Calling a function without arguments (4)
14 Summarize the features of R function? (13) Understanding BTL-2
Explain the following:
a. Full Match
b. Partial match
15 Summarize the functions which help in (13) Understanding BTL-2
importing data from other applications in R,
with an example.
16 List out the commonly used functions in dplyr (13) Remembering BTL-1
package.
17 i.Explain the following example for writing a (13) Understanding BTL-2
function
a. Throwing a die
ii.Write a short note on:
a.Data Manipulation
b.Data Cleaning
PART-C (15 Marks)
1 Develop the R code for the following: (15) Creating BTL-6
a. Subset data frame by selecting columns
b. Subset data frame by excluding columns
c. Subset data frame by selecting rows
2 Create the data frame (5) Creating BTL-6
data <- data.frame(x1 = 1:6,
x2 = c(1, 2, 2, 3, 1, 2),
x3 = c("F", "B", "C", "E", "A", "D"))
Use the following functions:
a. arrange() function (1)
b. filter() function (1)
c. mutate() function (1)
d. pull() function (2)
e. rename() function (2)
f. sample_n() function (2)
g. select() function (1)

3 Explain with a sample (dummy) data set in R and (15) Evaluating BTL-5
perform data manipulation with R.
4 Explain in detail about math function in R with (15) Evaluating BTL-5
an example each?
5 i. Create a function that will return the sum of 2 (7) Creating BTL-6
integers.
ii. Create a function that, given a vector, will print (6)
to the screen the mean and the standard deviation; it
will optionally also print the median.
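
A minimal sketch related to Part C, question 5 of Unit III, illustrating a function with a default argument. The names `add_integers`, `describe_vector` and `show_median` are hypothetical and chosen only for this illustration.

```r
# Returns the sum of two numbers (intended for integer input)
add_integers <- function(a, b) {
  a + b
}

# Prints the mean and standard deviation of x.
# The median is printed only when show_median = TRUE (default: FALSE).
describe_vector <- function(x, show_median = FALSE) {
  cat("Mean:", mean(x), "\n")
  cat("Standard deviation:", sd(x), "\n")
  if (show_median) {
    cat("Median:", median(x), "\n")
  }
  invisible(NULL)  # the function is called for its printed output
}

total <- add_integers(2L, 3L)
print(total)

describe_vector(c(2, 4, 4, 4, 5, 5, 7, 9))                      # default arguments
describe_vector(c(2, 4, 4, 4, 5, 5, 7, 9), show_median = TRUE)  # named argument
```

Calling `describe_vector()` without `show_median` uses the default value, which is one way to make a piece of output optional.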

UNIT – IV – DATA FRAMES

Data frames, import of external data in various file formats, statistical functions,
compilation of data - Graphics and plots, statistical functions for central tendency,
variation, skewness and kurtosis, handling of bivariate data through graphics,
correlations, programming and illustration with examples

PART-A (2 Marks)
Q. No. Questions BT Level Competence
1 Discuss data frames. Evaluating BTL-5
2 Show how to access columns from a data frame. Applying BTL-3
3 Classify rbind () and cbind () function. Analyzing BTL-4
4 How to find the number of columns in a data frame Creating BTL-6
with an example?
5 Name the function to check if a variable is a data Analyzing BTL-4
frame or not.
6 List out few basic statistic functions. Remembering BTL-1
7 Write a formula to normalize a variable. Applying BTL-3
8 How to draw an empty R plot? Understanding BTL-2
9 How to set the axis labels and title of the R plots? Understanding BTL-2
10 How to save a plot as an image on disc? Understanding BTL-2
11 Define plot () function. Remembering BTL-1
12 Define Skewness. Remembering BTL-1
13 Define Kurtosis. Remembering BTL-1
14 Define Visualizing. Remembering BTL-1
15 Define Bivariate analysis Remembering BTL-1
16 Explain the Z-test and t-test with an equation. Applying BTL-3
17 Show the purpose of using the ANOVA test. Applying BTL-3
18 Write the syntax of Covariance and Correlation. Creating BTL-6
19 Discuss about variance. Evaluating BTL-5
20 Discuss about standard deviation. Evaluating BTL-5
21 Explain histogram. Evaluating BTL-5
22 Explain Time series analysis. Analyzing BTL-4
23 How R can be used for predictive analysis? Understanding BTL-2
24 How would you measure correlation in R? Understanding BTL-2
PART-B (13 Marks)
1 Summarize the operations that can be performed (13) Evaluating BTL-5
on a Data frame.
2 Demonstrate with syntax how to select the subset (13) Applying BTL-3
of the data frame.
3 How to access components of a Data Frame? (13) Understanding BTL-2
4 Illustrate how to import data in R programming. (13) Applying BTL-3
5 List out the various methods that one can export (13) Remembering BTL-1
data to a text file with a syntax.
6 How to create two different x and y-axes? (13) Understanding BTL-2
Explain with an example.
7 How to add or change the R plot’s legend? Write (13) Understanding BTL-2
a syntax with an example.
8 How to adjust the size of points in an R plot? (13) Understanding BTL-2
Write a syntax with an example.
9 Illustrate the bivariate analysis of two categorical (13) Applying BTL-3
variables.
10 Point out the function which is used for the (13) Analyzing BTL-4
conversion of covariance to correlation in R.
Explain the function with syntax.
11 List out the methods for calculating the (13) Remembering BTL-1
correlation with an example.
12 Elaborate variance for regression model with an (13) Analyzing BTL-4
example program.
13 Analyze the difference between covariance and (13) Analyzing BTL-4
correlation.
14 i. List out few applications of covariance. (7) Remembering BTL-1
ii.Briefly explain about statistical functions for (6)
central tendency.
15 i.List out few applications of correlations. (7) Remembering BTL-1
ii.How to handle the bivariate data through (6)
graphics?
16 i. Discuss about the plot() function. (7) Evaluating BTL-5
ii. Create the scatterplot for the relation between (6)
weight and miles per gallon.
17 Create the following for line chart: Creating BTL-6
a. Simple line graph in R code with plot (5)
function
b. Saving line graph in the PNG file. (4)
c. Create multiple lines in the line chart and (4)
add a legend to line graph
PART – C (15 Marks)
1 Explain with an example Evaluating BTL-5
a. How to create a data frame (5)
b. How to add new variables to a data frame (5)
c. How to modify a data frame in R (5)
2 Write a code to demonstrate various charts using Creating BTL-6
tree datasets for the following
a. Histogram (4)
b. Scatter plot (4)
c. Box plot (4)
d. Line chart (3)
3 Show the inferences about skewness and kurtosis (15) Creating BTL-6
of a population given below:

Frequency distribution of litter size in rats, n = 815

Litter size  1   2   3   4    5    6    7    8    9   10  11  12
Frequency    7   38  58  116  125  126  121  107  56  37  25  4
4 Illustrate with the following example how to convert the (15) Evaluating BTL-5
covariance value to correlation. Pass two vectors
a and b such that they obey all the terms of a
square matrix. Further, using the cov2cor() function,
obtain the corresponding correlation matrix for
every pair of the data values.

5 Create a histogram using the hist() function for the (15) Creating BTL-6
built-in dataset airquality, which has "Daily air
quality measurements in New York".
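
A minimal sketch related to Part C, question 4 of Unit IV, assuming two made-up vectors `a` and `b` (the values are illustrative only). It builds the square covariance matrix with `cov()` and converts it with `cov2cor()` from the base stats package.

```r
# Hypothetical input vectors of equal length (assumption for illustration)
a <- c(2, 4, 6, 8, 10)
b <- c(1, 3, 2, 5, 4)

# Bind the vectors as columns so cov() returns a square 2 x 2 matrix
m <- cbind(a, b)
cov_mat <- cov(m)

# Convert the covariance matrix to the corresponding correlation matrix
cor_mat <- cov2cor(cov_mat)

print(cov_mat)
print(cor_mat)

# The result should agree with computing the correlation directly
same_as_cor <- isTRUE(all.equal(cor_mat, cor(m)))
print(same_as_cor)
```

`cov2cor()` divides each covariance by the product of the two standard deviations, so the diagonal of the result is 1 and the off-diagonal entries lie between -1 and 1.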

UNIT – V – INTERFACING

R – CSV Files – Excel File – Binary Files – XML files – Web Data – Database –
Regression – Decision Tree – Random Forest, R Random Forest, Generalized Linear
Model in R with example, K- means Clustering in R with example

PART – A (2 Marks)
Q. No. Questions BT Level Competence
1 Show how to delete the content from files. Applying BTL-3
2 Distinguish between binary and text files. Understanding BTL-2
3 Discuss some binary file properties Understanding BTL-2
4 Discuss about some of the packages in R which are Understanding BTL-2
used to scrape data from the web.
5 What do you mean by normal distribution? Analyzing BTL-4
6 What is the difference between Correlation and Analyzing BTL-4
Regression?
7 Show how you identify outliers. Applying BTL-3
8 Give an example scenario where a multiple linear Creating BTL-6
regression model is necessary.
9 Demonstrate some of the Evaluation Metrics for Applying BTL-3
regression model.
10 What does intercept mean? Remembering BTL-1
11 Difference between Mean Absolute Error (MAE) vs Understanding BTL-2
Mean Squared Error (MSE)?
12 Why do you need to prune the decision tree? Remembering BTL-1
13 Define Tree Boosting? Remembering BTL-1
14 How is a Random Forest related to Decision trees? Applying BTL-3
15 Define Entropy. Remembering BTL-1
16 What is Out-of-Bag error? Remembering BTL-1
17 What do you mean by Bagging? Remembering BTL-1
18 Can random forest algorithm be used both for Evaluating BTL-5
continuous and categorical target variables?
19 Write Generalized Linear Model (GLM) function. Evaluating BTL-5
20 How to create generalized linear model in R? Understanding BTL-2
21 Write out the generalized linear model in R? Evaluating BTL-5
22 What is the main difference between k-Means and k- Analyzing BTL-4
Nearest Neighbours?
23 What is the difference between the Manhattan Analyzing BTL-4
Distance and Euclidean Distance in Clustering?
24 Write about how to pre-process the data for k-Means? Creating BTL-6
PART-B (13 Marks)
1 Illustrate with an example how to read a (13) Applying BTL-3
particular file from the working directory.
2 Explain merge () function and its use with an (13) Evaluating BTL-5
example.
3 List out text file properties with examples (13) Remembering BTL-1
4 List out the basic assumptions of linear (13) Remembering BTL-1
regression.
5 i. How would you detect overfitting in linear (7) Understanding BTL-2
models and how to avoid it?
ii. Identify the problem of overfitting and (6)
underfitting.
6 How is the Error calculated in a Linear (13) Understanding BTL-2
Regression model?
7 i. How does the CART algorithm produce (7) Understanding BTL-2
Regression Trees?
ii. Brief the following:
a. XML Files (2)
b. Binary Files (2)
c. CSV Files (2)
8 Compare Linear Regression and Decision trees. (13) Analyzing BTL-4
9 i. Explain the structure of a decision tree. (7) Evaluating BTL-5
ii. Explain about the Generalized Linear Model in R (6)
with an example.
10 Compare how the random forest gives output for (13) Analyzing BTL-4
classification and regression problems.
11 Summarize advantages of using Random Forest. (13) Understanding BTL-2
12 Analyze how Random Forest handles (13) Analyzing BTL-4
missing values.
13 Describe how it is possible to perform (13) Remembering BTL-1
Unsupervised learning with random forest?
14 Use the adult data set to illustrate logistic (13) Applying BTL-3
regression. The "adult" data set is a great dataset for the
classification task. The objective is to predict
whether the annual income in dollars of an
individual will exceed 50,000. The dataset
contains 46,033 observations and ten features:
• age: age of the individual. Numeric
• education: Educational level of the
individual. Factor.
• marital.status: Marital status of the
individual. Factor, i.e. Never-married,
Married-civ-spouse, …
• gender: Gender of the individual. Factor,
i.e. Male or Female
• income: Target variable. Income above or
below 50K. Factor, i.e. >50K, <=50K
amongst others

15 Describe some cases where k-means clustering (13) Remembering BTL-1
fails to give good results.
16 Use k-Means Algorithm to create two clusters: (13) Creating BTL-6
17 a. What are some stopping criteria for k- (7) Applying BTL-3
Means Clustering?
b. How does the k-means algorithm work? (6)
PART – C (15 Marks)
1 Problem Statement: Evaluating BTL-5
Consider the R inbuilt data "mtcars".
a. First we create a csv file from it and (10)
convert it to a binary file and store it as an
OS file.
b. Next we read this binary file created into (5)
R.
2 Problem Statement: (15) Evaluating BTL-5
Let's assume we want to play badminton on a
particular day, say Saturday. How will you
decide whether to play or not? Let's say you go
out and check if it's hot or cold, check the speed
of the wind and humidity, and how the weather is,
i.e. is it sunny, cloudy, or rainy. Take all these
factors into account to decide if you want to play
or not. So, calculate all these factors for the last
ten days and form a lookup table like the one
below.

Day  Weather  Temperature  Humidity  Wind
1    Sunny    Hot          High      Weak
2    Cloudy   Hot          High      Weak
3    Sunny    Mild         Normal    Strong
4    Cloudy   Mild         High      Strong
5    Rainy    Mild         High      Strong
6    Rainy    Cool         Normal    Strong
7    Rainy    Mild         High      Weak
8    Sunny    Hot          High      Strong
9    Cloudy   Hot          Normal    Weak
10   Rainy    Mild         High      Strong

3 Problem Statement: (15) Creating BTL-6
To build a Random Forest model that can study
the characteristics of an individual who was on
the Titanic and predict the likelihood that they
would have survived.
4 Write in detail about k-Means Algorithm with an (15) Creating BTL-6
example.
5 Cluster the following eight points (with (x, y) (15) Evaluating BTL-5
representing locations) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5),
A6(6, 4), A7(1, 2), A8(4, 9)
Initial cluster centres are: A1(2, 10), A4(5, 8)
and A7(1, 2).
The distance function between two points a =
(x1, y1) and b = (x2, y2) is defined as
ρ(a, b) = |x2 - x1| + |y2 - y1|
Use the K-Means algorithm to find the three cluster
centres after the second iteration.
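
A minimal sketch of how Part C, question 5 of Unit V could be worked through in base R. It assumes that each centre is updated as the arithmetic mean of the points assigned to it, while the assignment step uses the Manhattan distance given in the question. The helper name `manhattan` is hypothetical. The built-in `kmeans()` function is not used here because it is based on Euclidean distance.

```r
# The eight points from the question, one row per point
pts <- matrix(c(2, 10,  2, 5,  8, 4,  5, 8,
                7, 5,   6, 4,  1, 2,  4, 9),
              ncol = 2, byrow = TRUE,
              dimnames = list(paste0("A", 1:8), c("x", "y")))

# Initial centres: A1, A4 and A7
centres <- pts[c("A1", "A4", "A7"), , drop = FALSE]

# Manhattan distance between a point and a centre
manhattan <- function(p, centre) sum(abs(p - centre))

for (iter in 1:2) {
  # Assignment step: index of the nearest centre for every point
  cluster <- apply(pts, 1, function(p) {
    which.min(apply(centres, 1, function(centre) manhattan(p, centre)))
  })
  # Update step (assumption): new centre = mean of the assigned points
  centres <- t(sapply(1:3, function(k) {
    colMeans(pts[cluster == k, , drop = FALSE])
  }))
}

print(cluster)   # cluster membership used in the second iteration
print(centres)   # centres after the second iteration
```

Under these assumptions the centres after the second iteration are (3, 9.5), (6.5, 5.25) and (1.5, 3.5), with A1 and A8 in the first cluster, A3 to A6 in the second, and A2 and A7 in the third.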
