PRACTICE QUIZ

Uploaded by

Gaurav Sharma

SECTION 1 – R Programming
1. What is R primarily used for?
A) Web development
B) Data analysis and statistics
C) Mobile application development
D) Graphic design
Answer: B
Explanation: R is a programming language primarily used for statistical
computing and data analysis.

2. Which of the following is used to assign a value to a variable in R?


A) <=
B) <-
C) =
D) Both B & C
Answer: D
Explanation: In R, you can assign a value to a variable with either <- or =.
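
A minimal sketch of both assignment forms; the variable names x, y, and total are illustrative and not part of the quiz:

```r
x <- 10          # assignment with the arrow operator
y = 20           # assignment with the equals sign (valid at the top level)
total <- x + y   # both variables are ordinary numeric values
print(total)     # 30
```

Note that <= (option A) is the "less than or equal to" comparison, not an assignment.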

3. What will the following command return: class(TRUE)?


A) "numeric"
B) "logical"
C) "character"
D) "factor"
Answer: B
Explanation: TRUE is a logical value in R, so the class() function will return
"logical".

4. What is the output of sum(1:5)?
A) 10
B) 15
C) 20
D) Error
Answer: B
Explanation: The colon operator 1:5 creates a sequence from 1 to 5, and the
sum() function adds them up (1+2+3+4+5=15).

5. How do you create a data frame in R?


A) data.frame()
B) dataframe()
C) create.dataframe()
D) new.data.frame()
Answer: A
Explanation: The data.frame() function is used to create data frames in R.

6. How do you add a new column to an existing data frame?


A) df$new_column <- values
B) append(df, values)
C) add.column(df, values)
D) new_column(df, values)
Answer: A
Explanation: You can add a new column to a data frame using the $ operator,
e.g., df$new_column <- values.
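
The two data frame answers above can be sketched together. The data and the column names (name, score, passed) are made up for illustration:

```r
# Create a data frame with data.frame() (hypothetical example data)
df <- data.frame(
  name  = c("A", "B", "C"),
  score = c(80, 92, 75)
)

# Add a new column with the $ operator
df$passed <- df$score >= 80

print(df)
print(dim(df))   # 3 rows, 3 columns
```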

7. Which of the following is not a valid data type in R?


A) Numeric
B) Logical
C) Matrix
D) ArrayList
Answer: D
Explanation: R has no ArrayList type. Numeric and logical are basic types, and
matrix is a built-in data structure.

8. How do you access the third element in a vector named v?


A) v[3]
B) v(3)
C) v[2]
D) v{3}
Answer: A
Explanation: You access elements of a vector in R using square brackets [], so
v[3] retrieves the third element.
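
A short sketch with an illustrative vector. It also shows why option C is wrong: R indexing starts at 1, so v[2] is the second element.

```r
v <- c(10, 20, 30, 40)   # example values, not from the quiz
third  <- v[3]           # 30: the third element
second <- v[2]           # 20: indexing is 1-based, so this is the second element
print(third)
```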

9. What is the output of mean(c(2, 4, 6, 8))?


A) 5
B) 6
C) 4
D) 8
Answer: A
Explanation: The mean() function calculates the average, and (2 + 4 + 6 + 8) /
4 = 20 / 4 = 5.
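
The arithmetic in the explanation can be checked directly in R (the name values is illustrative):

```r
values <- c(2, 4, 6, 8)
by_hand <- sum(values) / length(values)   # 20 / 4
builtin <- mean(values)
print(by_hand)   # 5
print(builtin)   # 5
```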

10. What is the purpose of the library() function?


A) Install a new package
B) Unload a package
C) Load an installed package
D) Check for available packages
Answer: C
Explanation: The library() function is used to load an installed package into the
current R session.

11. Which function in R is used to read a CSV file?


A) read.table()
B) read.csv()
C) read.file()
D) read.dataset()
Answer: B
Explanation: The read.csv() function imports CSV files in R. read.table() can
also read them when sep = "," is set, but read.csv() is the dedicated function.
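
A self-contained sketch: instead of a file on disk, the CSV content is passed through the text argument so the example runs anywhere. The column names are invented for illustration.

```r
# Inline CSV content stands in for a real file (hypothetical data)
csv_text <- "id,score\n1,80\n2,92"

df <- read.csv(text = csv_text)   # with a real file: read.csv("path/to/file.csv")
str(df)
```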

12. What does is.na() function check for?


A) Negative numbers
B) Missing values (NA)
C) Zeros
D) Data type
Answer: B
Explanation: The is.na() function checks whether a value is missing (NA) in R.
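
A minimal sketch on an illustrative vector, showing how is.na() is typically combined with other functions:

```r
x <- c(4, NA, 7, NA, 1)          # example values with two missing entries

missing_flags <- is.na(x)        # FALSE TRUE FALSE TRUE FALSE
n_missing <- sum(missing_flags)  # 2: TRUE counts as 1
avg <- mean(x, na.rm = TRUE)     # 4: mean of the observed values 4, 7, 1

print(missing_flags)
print(n_missing)
print(avg)
```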

13. How do you create a matrix in R?


A) matrix()
B) mat()
C) mtx()
D) create.matrix()
Answer: A
Explanation: The matrix() function is used to create matrices in R.

14. What is the output of class(c(1, 2, 3))?


A) "character"
B) "list"
C) "numeric"
D) "factor"
Answer: C
Explanation: The class() function returns the class of an object; a vector of
plain numbers such as c(1, 2, 3) has class "numeric".

15. How do you check the dimensions of a data frame?
A) length()
B) dim()
C) nrow()
D) size()
Answer: B
Explanation: The dim() function returns the dimensions of an object like a
matrix or data frame.
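
As a sketch, dim() can be tried on a small matrix built with matrix(). The name m and the values are illustrative:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column

d <- dim(m)    # 2 3: rows first, then columns
r <- nrow(m)   # 2: only the row count
k <- ncol(m)   # 3: only the column count
print(d)
```

nrow() (option C) returns only one of the two dimensions, which is why dim() is the better answer.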

16. Which of the following is an example of a logical operator in R?


A) &&
B) ||
C) !
D) All of the above
Answer: D
Explanation: &&, ||, and ! are all logical operators in R.
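
A short sketch of the three operators on single logical values. The last line uses the element-wise form &, which is the one to use on vectors:

```r
a <- TRUE
b <- FALSE

and_result <- a && b   # FALSE: both must be TRUE
or_result  <- a || b   # TRUE: at least one is TRUE
not_result <- !a       # FALSE: negation

elementwise <- c(TRUE, FALSE) & c(TRUE, TRUE)   # TRUE FALSE
print(elementwise)
```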

17. What will typeof(42L) return?


A) "numeric"
B) "logical"
C) "integer"
D) "double"
Answer: C
Explanation: The L suffix forces R to interpret 42 as an integer, so typeof() will
return "integer".
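
For comparison, a sketch of how typeof() and class() differ with and without the L suffix:

```r
t_int <- typeof(42L)   # "integer"
t_dbl <- typeof(42)    # "double": plain numbers are stored as doubles
c_int <- class(42L)    # "integer"
c_dbl <- class(42)     # "numeric"
print(c(t_int, t_dbl, c_int, c_dbl))
```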

18. What does summary() do in R?


A) Provides a statistical summary of an object
B) Displays the structure of an object
C) Creates a plot
D) Merges two data frames
Answer: A
Explanation: The summary() function provides descriptive statistics for an
object, such as the minimum, quartiles, median, mean, and maximum.

19. From the given list:

What is the output of student[[1]][2] and student[[3]][2]?


A) pinkey and hyd
B) Rohith and kakumanu
C) bobby and 78
D) 100 and pinkey

20. How do you combine two data frames by rows in R?


A) rbind()
B) cbind()
C) merge()
D) join()
Answer: A
Explanation: The rbind() function combines data frames by rows.
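
A minimal sketch with two made-up data frames. rbind() expects both to have the same column names:

```r
a <- data.frame(id = 1:2, score = c(80, 92))   # hypothetical data
b <- data.frame(id = 3:4, score = c(75, 88))

combined <- rbind(a, b)   # stack b's rows under a's rows
print(combined)
print(nrow(combined))     # 4
```

By contrast, cbind() (option B) places objects side by side as columns.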

SECTION 2 – MISSING VALUE


1. Which method would you use when the data has a significant amount of
missing values in a feature, and the feature is not very important for the
model?
A) Mean imputation
B) K-Nearest Neighbors (KNN) imputation
C) Dropping the feature
D) Interpolation
Answer: C
Explanation: If a feature has a significant proportion of missing values and is not
critical for the analysis, dropping the feature may be an appropriate approach.

2. Which imputation technique is most suitable for categorical variables?


A) Mean imputation
B) Mode imputation
C) Median imputation
D) Regression imputation
Answer: B
Explanation: Mode imputation replaces missing values with the most frequent
category, which is ideal for categorical data.
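
Base R has no built-in function for the statistical mode, so this sketch derives it from table(). The data and names are illustrative, and this is one possible approach rather than a standard routine:

```r
colour <- c("red", "blue", NA, "red", "green", NA)   # hypothetical data

counts <- table(colour)                          # NA is excluded by default
mode_value <- names(counts)[which.max(counts)]   # most frequent category

colour[is.na(colour)] <- mode_value              # fill the gaps with the mode
print(colour)
```

If two categories tie, which.max() picks the first one in table order, which may need a deliberate choice in practice.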

3. Which of the following is a potential drawback of using mean imputation for
handling missing values?
A) It introduces bias into the dataset
B) It is computationally expensive
C) It can only be applied to categorical data
D) It leads to loss of variance in the feature

Answer: D
Explanation: Mean imputation reduces variability in the data since all missing
values are replaced by a single constant (the mean), which can distort the
relationships between variables.
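
The loss of variance can be seen on a small made-up vector. This is a sketch, and the numbers are chosen only for easy arithmetic:

```r
x <- c(2, 4, NA, 8, NA, 10)   # hypothetical feature with two missing values

var_before <- var(x, na.rm = TRUE)   # variance of the 4 observed values

# Replace every NA with the mean of the observed values (6)
x_imputed <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)
var_after <- var(x_imputed)          # variance after imputation

print(var_before)   # about 13.33
print(var_after)    # 8
```

The imputed points sit exactly at the mean, so they add nothing to the sum of squared deviations while increasing the sample size.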

5. In which of the following cases would median imputation be preferred over
mean imputation?
A) When the data contains outliers
B) When the dataset is normally distributed
C) When the missing values are categorical
D) When the missing values are in time series data

Answer: A
Explanation: Median imputation is more robust to outliers than mean imputation
because the median is not affected by extreme values.
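
A sketch with one extreme value shows the difference. The income figures are invented for illustration:

```r
income <- c(30, 32, 35, NA, 31, 500)   # hypothetical data with one outlier

mean_fill   <- mean(income, na.rm = TRUE)     # 125.6: pulled up by 500
median_fill <- median(income, na.rm = TRUE)   # 32: close to the typical values

income_imputed <- income
income_imputed[is.na(income_imputed)] <- median_fill   # median imputation
print(income_imputed)
```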

6. Which of the following is a problem caused by using mean imputation on
skewed data?
A) It increases the variance
B) It introduces a strong bias toward the extremes
C) It does not capture the central tendency accurately
D) It increases computation time

Answer: C
Explanation: In skewed data, mean imputation may not accurately capture the
true central tendency, leading to biased estimates.

7. Why is it essential to consider the mechanism of missing data (MCAR, MAR,
MNAR) before applying imputation techniques?
A) To select the most computationally efficient imputation method
B) To avoid overfitting the model
C) To apply an appropriate imputation technique based on the missing data
pattern
D) To increase the number of missing data points

Answer: C
Explanation: Understanding the missing data mechanism helps choose the right
imputation technique, as different techniques are better suited for different patterns
(MCAR, MAR, MNAR).

SECTION 3 – OUTLIERS DETECTION AND HANDLING


1. What is an outlier in machine learning?
A) A data point that lies within the expected range
B) A data point that deviates significantly from the majority of the data
C) A data point that has missing values
D) A data point that perfectly fits the model
Answer: B
Explanation: Outliers are data points that differ significantly from other
observations. They may represent errors, variability in the data, or rare events.

2. Which of the following methods is commonly used to detect outliers?


A) Cross-validation
B) Z-Score
C) Gradient Descent
D) Regularization
Answer: B
Explanation: Z-Score measures how many standard deviations a data point is from
the mean. Data points whose absolute Z-Score exceeds a chosen threshold (e.g., 3)
are considered outliers.
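
A minimal sketch of Z-Score detection on made-up data. The threshold of 3 follows the explanation above; the values are illustrative:

```r
# 20 ordinary values plus one extreme value (hypothetical data)
x <- c(rep(c(9, 10, 11, 10), times = 5), 100)

z <- (x - mean(x)) / sd(x)          # Z-Score of every point
outlier_idx <- which(abs(z) > 3)    # positions beyond 3 standard deviations

print(outlier_idx)      # 21
print(x[outlier_idx])   # 100
```

In very small samples no point can reach an absolute Z-Score of 3, because the outlier itself inflates the standard deviation, so the threshold should be chosen with the sample size in mind.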

3. What is the main disadvantage of using Z-Score for outlier detection?


A) It is computationally expensive
B) It assumes the data is normally distributed
C) It requires labeled data
D) It can only be used for categorical data
Answer: B
Explanation: Z-Score thresholds assume the data is roughly normally distributed.
With skewed or heavy-tailed data, the mean and standard deviation are themselves
distorted by extreme values, so the method may fail to identify true outliers.

4. Which of the following is a non-parametric method for outlier detection?


A) Z-Score
B) Linear Regression
C) K-Nearest Neighbors (KNN)
D) t-test
Answer: C
Explanation: K-Nearest Neighbors is a non-parametric method that can be used for
detecting outliers by considering the distances between neighboring points.

5. Which metric is typically used in Distance-based outlier detection?
A) Mean
B) Median
C) Euclidean Distance
D) Variance
Answer: C
Explanation: Euclidean distance is a common metric used in distance-based outlier
detection methods to measure the distance between data points.
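
As a sketch, the Euclidean distance between two illustrative points can be computed by hand or with the base function dist():

```r
p <- c(1, 2)   # example points, not from the quiz
q <- c(4, 6)

by_hand <- sqrt(sum((p - q)^2))            # sqrt(3^2 + 4^2) = 5
builtin <- as.numeric(dist(rbind(p, q)))   # dist() defaults to Euclidean

print(by_hand)
print(builtin)
```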

6. Which method uses the interquartile range (IQR) for detecting outliers?
A) Box plot
B) Decision Trees
C) Z-Score
D) Random Forest
Answer: A
Explanation: Box plots visualize the spread of the data and use the IQR to detect
outliers. Values more than 1.5 times the IQR below the first quartile or above the
third quartile are considered outliers.
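
The 1.5 × IQR rule can be sketched with quantile(). The data are invented, and quantile() is used with its default settings, which is an assumption since several quartile definitions exist:

```r
x <- c(12, 14, 15, 15, 16, 17, 18, 19, 20, 60)   # hypothetical data

q <- unname(quantile(x, c(0.25, 0.75)))   # Q1 = 15, Q3 = 18.75
iqr <- q[2] - q[1]                        # 3.75

lower <- q[1] - 1.5 * iqr   # 9.375
upper <- q[2] + 1.5 * iqr   # 24.375

outliers <- x[x < lower | x > upper]
print(outliers)   # 60
```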

7. Which method would be least appropriate for categorical data outlier
detection?
A) One-Hot Encoding
B) Z-Score
C) Frequency Analysis
D) Mode-Based Detection
Answer: B
Explanation: Z-Score is designed for continuous data, not categorical data, making
it less suitable for detecting outliers in categorical datasets.

8. What is a common way to handle outliers in regression models?


A) Drop the outliers
B) Assign them a different weight
C) Use a non-linear model
D) Convert the data to categorical
Answer: A
Explanation: A common approach is to remove or drop the outliers if they are
deemed to be noise or errors, as they can negatively affect regression models.
