FINAL BADM
Question 2: If P(A) = 0.4 and P(B) = 0.5, and A and B are mutually exclusive, what is P(A ∪ B)?
(A) 0.9 (chap 5)
(B) 0.1
(C) 0.5
(D) 0
Question 3: If the sample proportion is 0.4 and the sample size is 100, what is the standard
error of the proportion?
(A) 0.04
(B) 0.08
(C) 0.06
(D) 0.05 (chap 6)
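The arithmetic behind Question 3 can be checked with the standard error formula for a sample proportion, sqrt(p(1 - p) / n). The sketch below is only an illustration; the function name `standard_error_of_proportion` is an assumption, not something from the course material. With p = 0.4 and n = 100 the result is about 0.049, which rounds to 0.05.

```python
import math

def standard_error_of_proportion(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

se = standard_error_of_proportion(0.4, 100)
print(round(se, 4))  # 0.049
```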
Question 6: What do you call a summary of data that shows the number of observations in each
of several nonoverlapping classes?
(A) Percent frequency distribution
(B) Histogram
(C) Relative frequency distribution
(D) Frequency distribution (chap 2)
Question 8: A 95% confidence interval for a population mean is calculated as (10, 20). What
does this mean?
(A) 95% of the data points lie between 10 and 20.
(B) The sample mean is between 10 and 20.
(C) 95% of the sample means will fall between 10 and 20.
(D) There is a 95% chance the population mean is between 10 and 20. (chap 6)
Question 9: Which type of data is collected from several entities at the same point in time?
(A) Time series data
(B) Longitudinal data
(C) Cross-sectional data (chap 2)
(D) Qualitative data
Question 11: According to the Addition Law, if A and B are two events, what is P(A ∪ B)?
(A) P(A) + P(B)
(B) P(A) * P(B)
(C) P(A) + P(B) + P(A ∩ B)
(D) P(A) + P(B) - P(A ∩ B) (chap 5)
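The Addition Law can be tried out with a short sketch. The helper name `prob_union` and the overlapping case with P(A ∩ B) = 0.2 are assumptions made for illustration. The mutually exclusive case uses the numbers from Question 2, where P(A ∩ B) = 0.

```python
def prob_union(p_a: float, p_b: float, p_a_and_b: float = 0.0) -> float:
    """Addition law: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)."""
    return p_a + p_b - p_a_and_b

# Mutually exclusive events have P(A ∩ B) = 0, so the law reduces to a plain sum.
p_exclusive = prob_union(0.4, 0.5)
# Hypothetical overlapping case: assume P(A ∩ B) = 0.2 for illustration.
p_overlap = prob_union(0.4, 0.5, 0.2)
print(round(p_exclusive, 2), round(p_overlap, 2))  # 0.9 0.7
```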
1. The decisions concerning an organization’s goals and future plans are called
c. strategic decisions. (chap 1)
d. operational decisions.
2. A forecast that helps direct police officers to areas where crimes are likely to occur based
on past data is an example of
a. predictive analytics. (chap 1)
b. decision analysis.
c. prescriptive analytics.
d. descriptive analytics.
3. Optimization models can be used to
a. 175
b. 150
c. 105
d. 130
7. The ratio of the amount of ink used in a table or chart that is necessary to convey
information to the total amount of ink used in the table or chart is known as the data-ink
ratio. Using additional ink that is not necessary to convey information has what effect on
the data-ink ratio?
a. is quantitative data
b. cannot be determined
a. 1.0221
b. 1.0148
c. 1.0363
d. 1.1475 (chap 2)
11. A _____ determines how far a particular value is from the mean relative to the data
set’s standard deviation.
c. variance
d. percentile
12. Which graph represents a negative linear relationship between x and y?
a. A
b. B
c. C
d. a subgroup of a population/the likelihood of an outcome
14. Which statement is true about mutually exclusive events?
a. If events A and B cannot occur at the same time, they are called mutually exclusive. (chap 5)
b. 0.4866 (chap 5)
c. 0.6321
d. 0.7769
17. A health-conscious student faithfully wears a device that tracks his steps. Suppose that
the distribution of the number of steps he takes in a day is normally distributed with a
mean of 10,000 and a standard deviation of 1,500 steps. One day he took 15,000 steps. What
was his percentile on that day?
a. 95%
b. 97.7%
d. 100%
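The percentile in question 17 comes from converting 15,000 steps to a z-score and reading the normal cumulative probability. The sketch below uses Python's standard-library `statistics.NormalDist`; the variable names are illustrative assumptions. The z-score is about 3.33 and the cumulative probability is about 99.96%.

```python
from statistics import NormalDist

# Daily steps: normal with mean 10,000 and standard deviation 1,500.
steps = NormalDist(mu=10_000, sigma=1_500)

# z-score: how many standard deviations 15,000 lies above the mean.
z = (15_000 - steps.mean) / steps.stdev

# Percentile: share of days expected to fall at or below 15,000 steps.
percentile = steps.cdf(15_000) * 100

print(round(z, 2))           # 3.33
print(round(percentile, 2))  # 99.96
```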
18. In a normal distribution, which is greater, the mean or the median?
a. Mean
b. Median
20. In order to determine an interval for the mean of a population with unknown standard
deviation, a sample of 24 items is selected. The mean of the sample is determined to be 23.
The number of degrees of freedom for reading the t value is _____.
a. 21
b. 22
c. 23 (chap 6)
d. 24
21. In interval estimation, as the sample size becomes larger, the interval estimate _____.
a. becomes narrower
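To see why a larger sample narrows the interval, the margin of error z * sigma / sqrt(n) can be computed for a few sample sizes. This is a sketch under stated assumptions: the population standard deviation of 10 and the sample sizes 25, 100 and 400 are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value
sigma = 10  # hypothetical population standard deviation

# Margin of error for increasing sample sizes; it shrinks as n grows.
margins = [round(z * sigma / math.sqrt(n), 2) for n in (25, 100, 400)]
print(margins)  # [3.92, 1.96, 0.98]
```

Quadrupling the sample size halves the margin of error, because n enters through its square root.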
a. 5.75 to 24.25
b. 8.56 to 21.40 (chap 6)
c. 11.31 to 18.55
d. 13.02 to 16.98
23. In a random sample of 400 registered voters, 120 indicated they plan to vote for Trump
for President. Determine a 95% confidence interval for the proportion of all the registered
voters who will vote for Trump.
a. (0.25, 0.34)
b. (0.27, 0.32)
c. (0.29, 0.30)
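The interval in question 23 can be reproduced with the usual large-sample formula p ± z * sqrt(p(1 - p) / n). The sketch below is illustrative; the function name `proportion_confidence_interval` is an assumption. With 120 of 400 voters it gives roughly (0.255, 0.345), which is closest to option a among the listed choices.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample (normal approximation) interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(120, 400)
print(round(low, 3), round(high, 3))  # 0.255 0.345
```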
a. Reject the null hypothesis; Fail to reject the null hypothesis (chap 6)
a) Seasonal pattern
b) Regression analysis
c) Forecast error
d) Exponential smoothing
9. Which measure of forecast accuracy considers both positive and negative forecast
errors?
a) It eliminates multicollinearity
a) To eliminate multicollinearity
b) To quantify historical data
c) To uncover patterns in the time series
d) To identify linear or nonlinear relationships between variables
15. What is the primary consideration in using software to select the best forecasting
model?
a) The size of the dataset
b) Managerial knowledge
c) Expert judgment
d) Software output and forecast error measures
16. How does dividing data into training and validation sets contribute to forecasting?
a) The proportion of the total sum of squares explained by the regression equation
2. In a simple linear regression model, how is the relationship between x and y assumed?
a) Non-linear
b) Quadratic
c) Exponential
d) Linear
9. In testing individual regression parameters, what does a p-value less than the
significance level indicate?
a) Reject the null hypothesis (chap 6)
b) Fail to reject the null hypothesis
c) Accept the alternative hypothesis
d) Reject the alternative hypothesis
12. What is the primary challenge associated with using piecewise linear regression
models?
a) Overfitting the model
b) Underfitting the model
c) Multicollinearity
d) Lack of independence among observations
14. How does the forward selection procedure work in variable selection?
a) Removing variables based on p-values
b) Including variables with the largest p-value
c) Allowing variables to enter the model based on a criterion
d) Iteratively modeling procedures without guidance
15. What does the best subsets procedure focus on in variable selection?
a) Including all variables in the model
b) Removing variables based on p-values
c) Selecting the smallest p-value
d) Iterative modeling procedures based on stepwise elimination
This method involves examining all possible combinations of variables to find the subset that
provides the best model fit according to some criterion (like adjusted R-squared, AIC, or BIC),
rather than following a strictly forward or backward stepwise approach. It essentially compares
different models with different combinations of variables to determine which subset of variables
yields the best model.
17. In regression, what is the purpose of calculating the sum of squares due to error (SSE)?
a) To measure the fit of the regression model
b) To assess multicollinearity
c) To compute the margin of error
d) To estimate the population parameter
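As a rough illustration of how SSE measures fit, the sketch below fits a simple least squares line and computes SSE, the total sum of squares (SST), and R² = 1 - SSE/SST. The five data points are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept for simple linear regression.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# SSE: squared residuals around the fitted line. SST: squared deviations from the mean.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

print(round(b0, 2), round(b1, 2), round(sse, 2), round(r_squared, 2))  # 2.2 0.6 2.4 0.6
```

A smaller SSE relative to SST means the line explains more of the variation in y.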
18. What conditions are necessary for valid inference in the least squares
regression model?
a) The sample size is large
b) The independent variables are not correlated
c) The error terms are normally distributed with constant variance
d) The regression parameters are equal to zero
The error terms are normally distributed with constant variance.
This condition provides homoscedasticity (constant variance), one of the Gauss-Markov
assumptions, together with normality of the error terms, which allows valid statistical
inference. Therefore, option c is correct.
19. In the context of regression analysis, what is the null hypothesis for individual
regression parameters?
a) There is a linear relationship between y and x
b) The error term is equal to zero
c) The dependent variable is normally distributed
d) The regression parameter is equal to zero
This hypothesis states that there is no effect of the independent variable on the dependent
variable, meaning the parameter (coefficient) associated with that variable does not contribute to
explaining the variation in the dependent variable.
20. What is the primary objective of the best subsets procedure in variable selection?
a) To include only significant variables
b) To remove variables with large p-values
c) To provide a range of values around the point estimate
d) To guide iterative modeling procedures
To guide iterative modeling procedures
The best subsets procedure systematically generates and evaluates all possible combinations of
variables to find the subset that provides the best model according to certain criteria, guiding the
user through an iterative process of model selection.
21. What is the purpose of the interaction between independent variables in regression
analysis?
a) To eliminate multicollinearity
b) To improve the fit of the regression model
c) To simplify the regression equation
d) To address nonsignificant variables
Interaction terms in regression allow the effect of one independent variable on the dependent
variable to depend on the level of another independent variable. This can capture more complex
relationships in the data, potentially leading to a better fit of the model by accounting for the
combined effect of variables.
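The idea that one variable's effect depends on the level of another can be sketched with a model containing an x1*x2 term. The coefficients below are hypothetical, not estimates from any data set, and the function names are illustrative assumptions.

```python
def predict(x1, x2, b0=1.0, b1=2.0, b2=0.5, b3=1.5):
    """Hypothetical fitted model with interaction: y = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def effect_of_x1(x2):
    """Change in predicted y when x1 goes from 0 to 1, holding x2 fixed."""
    return predict(1, x2) - predict(0, x2)

# The effect of x1 equals b1 + b3*x2, so it changes with the level of x2.
print(effect_of_x1(0), effect_of_x1(2))  # 2.0 5.0
```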
22. How is the quadratic regression model different from a simple linear regression model?
a) It involves multiple independent variables
b) It assumes a non-linear relationship between x and y
c) It uses the least squares method
d) It does not involve an error term
23. What is the potential risk associated with overfitting a regression model?
a) Increased accuracy
b) Increased bias
c) Increased generalization
d) Decreased flexibility
27. When using the t-distribution in hypothesis testing, what does a large t-value suggest?
a) A significant relationship between variables (chap 6)
b) A non-significant relationship between variables
c) An error in the regression model
d) A large sample size
A large t-value indicates that the observed effect (the difference between the sample mean and
the hypothesized population mean, adjusted for sample variability) is large relative to the
standard error, which typically leads to rejecting the null hypothesis of no effect or no
relationship, suggesting that there is a significant relationship between the variables being tested.
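The ratio described above, observed effect divided by standard error, can be computed directly for a one-sample case. The sample values and the hypothesized mean of 50 below are hypothetical and used only to illustrate the calculation.

```python
import math
from statistics import mean, stdev

# Hypothetical sample and hypothesized mean, for illustration only.
sample = [52, 55, 49, 58, 54, 56, 53, 57]
mu_0 = 50

x_bar = mean(sample)
# Standard error of the mean uses the sample standard deviation (n - 1 divisor).
standard_error = stdev(sample) / math.sqrt(len(sample))
t_stat = (x_bar - mu_0) / standard_error

print(round(t_stat, 2))  # 4.12
```

Here the sample mean sits a little over four standard errors above the hypothesized value, which is the kind of large t-value the explanation refers to.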
30. How does the best-subsets procedure guide variable selection in regression analysis?
a) By including only significant variables
b) By removing variables based on p-values
c) By selecting the smallest p-value
d) By providing guidance on iterative modeling procedures
This method involves generating and evaluating all possible combinations of variables to find the
subset that provides the best model fit according to certain criteria, thus guiding the user through
an iterative process of model selection.
CHAPTER 6: STATISTICAL INFERENCE
3. What term is used to describe the population from which a sample is drawn?
a) Sampled population
b) Census population
c) Finite population
d) Infinite population
5. What is the primary advantage of selecting a probability sample when sampling from a
finite population?
a) Lower cost
b) Faster data collection
c) Valid statistical inferences
d) Simplicity in implementation
10. What does the Central Limit Theorem state regarding the sampling distribution of the
sample mean?
a) It is always normal regardless of sample size
b) It approximates a normal distribution as sample size becomes large
c) It follows a uniform distribution
d) It is not applicable to small sample sizes
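A small simulation can illustrate the Central Limit Theorem. The sketch below draws repeated samples from a right-skewed exponential population and compares the skewness of the sample means for n = 2 and n = 50. The population, sample sizes, repetition count and seed are all assumptions chosen for illustration. The skewness for the larger sample size should be much closer to zero, meaning the sampling distribution looks more nearly normal.

```python
import random
from statistics import mean

def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def sample_means(sample_size, repetitions, rng):
    """Means of repeated samples from a right-skewed exponential population."""
    return [
        mean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(repetitions)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable
skew_small = skewness(sample_means(2, 2000, rng))
skew_large = skewness(sample_means(50, 2000, rng))
print(round(skew_small, 2), round(skew_large, 2))
```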
18. How many forms can hypothesis tests about a population parameter take?
a) One
b) Two
c) Three
d) Four
23. What does the Central Limit Theorem state about the sampling distribution of the
sample mean?
a) It is always normally distributed
b) It approaches a normal distribution with increasing sample size
c) It follows a uniform distribution
d) It is not applicable to finite populations
25. When is the t-distribution used instead of the standard normal distribution in interval
estimation?
a) When dealing with large samples
b) When the population standard deviation is known
c) When dealing with small samples or unknown population standard deviation
d) When working with discrete probability distributions
27. What does statistical inference aim to achieve using sample data?
a) Making exact predictions
b) Drawing conclusions about the sample only
c) Estimating population characteristics
d) Eliminating sampling error
30. What is the primary advantage of selecting a probability sample when sampling from a
finite population?
a) Lower cost
b) Faster data collection
c) Valid statistical inferences
d) Simplicity in implementation
32. What information does the sampling distribution provide about the sample mean?
a) Joint probabilities
b) The mean of the population
c) Different random samples
d) Sampling error
The sampling distribution of the sample mean describes how the sample means vary from
sample to sample due to the randomness of sampling, thus giving insight into the sampling
error, which is the difference between the sample mean and the population mean.
34. What does the Central Limit Theorem state about the sampling distribution of the
sample mean?
a) It is always normally distributed
b) It approximates a normal distribution as sample size becomes large
c) It follows a uniform distribution
d) It is not applicable to finite populations
37. In interval estimation of the population mean, what is the range within which 95% of
the values lie if the sampling distribution follows a normal distribution?
a) 1.645 standard deviations
b) 1.960 standard deviations
c) 2.576 standard deviations
d) 3.000 standard deviations
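The multipliers listed in these options correspond to common two-sided confidence levels, and they can be recovered from the inverse normal CDF. The sketch below uses the standard-library `statistics.NormalDist`; the helper name `z_critical` is an assumption for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: central `confidence` share of a standard normal."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```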
42. How many forms can hypothesis tests about a population parameter take?
a) One
b) Two
c) Three
d) Four
50. What is the primary challenge associated with taking a census for data collection?
a) Cost-effectiveness
b) Time efficiency
c) Misleading results
d) Unreliable data