
Hmw 09

Uploaded by

dagilbert
Copyright
© All Rights Reserved

Stat 324 Homework #2

Student’s Name Here


*Submit your homework to Canvas by the due date and time. Email your lecturer if you have extenuating circumstances and need to request an extension.

*If an exercise asks you to use R, include a copy of the code and output. Edit your code and output so that only the relevant portions remain.

*If a problem does not specify how to compute the answer, you may use any appropriate method. I may ask you to use R or manual calculations on your exams, so practice accordingly.

*You must include an explanation and/or intermediate calculations for an exercise to be complete.

*Be sure to submit the HWK2 Autograde Quiz, which will give you ~20 of your 40 accuracy points.

*50 points total: 40 points accuracy and 10 points completion

Basics of Statistics and Summarizing Data Numerically and Graphically (I)
Exercise 1: There are 12 numbers in a sample, and the mean is x̄ = 27. The minimum of the sample is accidentally changed from 13.8 to 1.38.

a. Is it possible to determine the direction (increase/decrease) in which the mean (x̄) changes, or how much
the mean changes? If so, by how much does it change? If not, why not? How do you know?

b. Is it possible to determine the direction in which the median changes, or how much the median changes?
If so, by how much does it change? If not, why not? How do you know?

c. Is it possible to predict the direction in which the standard deviation changes? If so, does it get larger or
smaller? If not, why not? How do you know?
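For part (a), the effect of changing one observation on the mean follows directly from the definition of x̄. A minimal derivation, writing $a$ for the original value, $b$ for the value that replaces it, and $n$ for the sample size (in this exercise $n = 12$, $a = 13.8$, $b = 1.38$): the sample total changes by $b - a$, so the mean shifts by that amount divided by $n$.

```latex
\bar{x}_{\text{new}} = \frac{n\bar{x} - a + b}{n} = \bar{x} + \frac{b - a}{n}
```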

Exercise 2: Recall the computer disk error data given in HWK 1. The table below tabulates the number of errors detected on each of the 100 disks
produced in a day.

Number of Defects    Number of Disks
0                    42
1                    30
2                    16
3                    7
4                    5

The R code below produces a frequency histogram of the number of errors on the 100 disks.

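# Expand the frequency table into the 100 individual observations
error.data <- c(rep(0, 42), rep(1, 30), rep(2, 16), rep(3, 7), rep(4, 5))

# Breaks at half-integers center one bar on each whole number of defects;
# labels = TRUE prints each bar's count above it
hist(error.data,
     breaks = seq(from = -0.5, to = 4.5, by = 1),
     xlab = "Defects", main = "Number of Defects",
     labels = TRUE, ylim = c(0, 60))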

a. What is the shape of the histogram for the number of defects observed in this sample? Why does that
make sense in the context of the question?

b. Calculate the mean and median number of errors detected on the 100 disks ‘by hand’ and using the built-in
R functions. How do the mean and median compare, and is that consistent with what we would guess based on
the shape? [You can use text such as $\bar{x} = \frac{\text{value1}}{\text{value2}}$ to help you show your work neatly.]

c. Calculate the sample standard deviation ‘by hand’ and using the built-in R function. Are the values
consistent between the two methods? How would the calculation differ if we instead considered these 100
values to be the whole population? (Hint: use multiplication instead of repeated addition.)

d. Use R to construct a boxplot of the number-of-errors data with helpful labels. Explain how the shape of the
data identified in (a) can be seen in the boxplot.

e. Explain why the histogram shows the discrete nature of the data better than a boxplot does.
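One way to check parts (b) through (d) in R is sketched below. It assumes the frequency table is stored in two vectors, `defects` and `disks` (illustrative names, not part of the assignment), and follows the hint in part (c) by multiplying each value by its frequency rather than adding it repeatedly. The ‘by hand’ results are printed next to the built-in `mean()`, `median()`, and `sd()` so you can see whether the two methods agree.

```r
# Frequency table from Exercise 2 (illustrative variable names)
defects <- 0:4                   # number of defects on a disk
disks   <- c(42, 30, 16, 7, 5)   # number of disks with that many defects
n <- sum(disks)                  # 100 disks in total

# Expanded data; same values as error.data defined earlier
error.data <- rep(defects, times = disks)

# (b) Mean 'by hand': weight each value by its frequency
xbar_hand <- sum(defects * disks) / n

# (b) Median 'by hand': n is even, so average the (n/2)th and (n/2 + 1)th ordered values.
# cumsum(disks) gives the last sorted position occupied by each defect count.
value_at <- function(k) defects[which(cumsum(disks) >= k)[1]]
median_hand <- (value_at(n / 2) + value_at(n / 2 + 1)) / 2

# (c) Frequency-weighted sum of squared deviations from the mean
ss <- sum(disks * (defects - xbar_hand)^2)
s_hand     <- sqrt(ss / (n - 1))  # sample SD divides by n - 1
sigma_hand <- sqrt(ss / n)        # population SD divides by n

# Built-in versions for comparison (sd() always uses the n - 1 divisor)
print(c(mean_hand = xbar_hand, mean_r = mean(error.data)))
print(c(median_hand = median_hand, median_r = median(error.data)))
print(c(sd_hand = s_hand, sd_r = sd(error.data), sigma_hand = sigma_hand))

# (d) Vertical boxplot with descriptive labels
boxplot(error.data,
        ylab = "Number of defects per disk",
        main = "Defects on 100 Disks Produced in One Day")
```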

Exercise 3: A company that manufactures toilets claims that its new pressure-assisted toilet reduces the
average amount of water used by more than 0.5 gallons per flush when compared to its current model. The
company selects 20 toilets of the current type and 19 of the new type and measures the amount of water used
when each toilet is flushed once. The number of gallons measured for each flush is recorded below. The
measurements are also given in flush.csv.

Current Model: 1.63, 1.25, 1.23, 1.49, 2.11, 1.48, 1.94, 1.72, 1.85, 1.54, 1.67, 1.76, 1.46, 1.32, 1.23, 1.67, 1.74, 1.63, 1.25, 1.56

New Model: 1.28, 1.19, 0.90, 1.24, 1.00, 0.80, 0.71, 1.03, 1.27, 1.14, 1.36, 0.91, 1.09, 1.36, 0.91, 0.91, 0.86, 0.93, 1.36

a. Use R to create histograms displaying the sample data from each model (any kind of histogram is fine,
since the sample sizes are similar). Use identical x- and y-axis scales so the two groups’ values are
easier to compare. Include useful titles.

b. Compare the shapes of the distributions of gallons flushed in the two toilet models’ samples.

c. Compute the mean and median gallons flushed for the Current and New Model toilets using the built-in R
functions. Compare the two measures of center within each group and comment on how that relationship
corresponds to the shape of each distribution. Also compare the measures of center across the two groups and
comment on how that relationship is evident in the histograms.

d. Compute (using the built-in R function) and compare the sample standard deviations of gallons flushed by the
Current and New Model toilets. Comment on how the relative sizes of these values can be identified from
the histograms.

e. Use R to create side-by-side boxplots of the two samples so they are easy to compare.

f. Explain why there are no values shown as a dot (outlier) on the Current Model flush boxplot. To what
values do the Current Model flush boxplot whiskers extend? (Use R for your boxplot calculations, with
type=2 for quantiles.)

g. What would be the mean and median gallons flushed if we combined the two data sets into one large data
set with 39 observations? Show how the mean can be calculated using R and then from the summary
measures in part (c) along with the sample sizes. Explain why the median of the combined set cannot be
computed based on the summaries in part (c).
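The sketch below walks through parts (a) and (c) through (g) in R. It types the two samples in directly rather than reading flush.csv, because the column layout of that file is not shown here; the vector names `current_model` and `new_model` and the shared breaks `seq(0.6, 2.2, by = 0.2)` are illustrative choices, not part of the assignment. Note that `boxplot()` draws its box from `fivenum()` hinges, which can differ slightly from `type = 2` quantiles in general, so part (f) computes the fences from `quantile(..., type = 2)` explicitly.

```r
# Gallons per flush, typed in from the lists above
current_model <- c(1.63, 1.25, 1.23, 1.49, 2.11, 1.48, 1.94, 1.72, 1.85, 1.54,
                   1.67, 1.76, 1.46, 1.32, 1.23, 1.67, 1.74, 1.63, 1.25, 1.56)
new_model     <- c(1.28, 1.19, 0.90, 1.24, 1.00, 0.80, 0.71, 1.03, 1.27, 1.14,
                   1.36, 0.91, 1.09, 1.36, 0.91, 0.91, 0.86, 0.93, 1.36)

# (a) Stacked histograms that share the same breaks and y-axis limit
brks <- seq(0.6, 2.2, by = 0.2)  # spans the smallest (0.71) and largest (2.11) values
par(mfrow = c(2, 1))
hist(current_model, breaks = brks, ylim = c(0, 8),
     xlab = "Gallons per flush", main = "Current Model (n = 20)")
hist(new_model, breaks = brks, ylim = c(0, 8),
     xlab = "Gallons per flush", main = "New Model (n = 19)")
par(mfrow = c(1, 1))

# (c), (d) Center and spread for each model
group_summary <- rbind(
  Current = c(mean = mean(current_model), median = median(current_model), sd = sd(current_model)),
  New     = c(mean = mean(new_model),     median = median(new_model),     sd = sd(new_model))
)
print(group_summary)

# (e) Side-by-side boxplots on a common vertical axis
boxplot(list(Current = current_model, New = new_model),
        ylab = "Gallons per flush", main = "Water Used per Flush by Model")

# (f) Type 2 quartiles and 1.5 * IQR fences for the Current Model
q <- unname(quantile(current_model, probs = c(0.25, 0.75), type = 2))
iqr <- q[2] - q[1]
fences <- c(lower = q[1] - 1.5 * iqr, upper = q[2] + 1.5 * iqr)
print(fences)
# Whiskers end at the most extreme observations that lie inside the fences
inside <- current_model[current_model >= fences["lower"] & current_model <= fences["upper"]]
print(range(inside))

# (g) Combined mean computed directly and from the group summaries; median from the raw data
all_flush <- c(current_model, new_model)
n1 <- length(current_model)
n2 <- length(new_model)
print(mean(all_flush))
print((n1 * mean(current_model) + n2 * mean(new_model)) / (n1 + n2))
print(median(all_flush))
```

The weighted form in part (g) needs only each group's mean and sample size, which is why it can be computed from the part (c) summaries.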

Exercise 4: The data below indicate the contamination in parts per million in each of 50 samples of drinking
water at a specific location.

contamination: 388, 388, 384, 1962, 389, 397, 380, 385, 383, 380, 396, 406, 369, 392, 380, 372, 387, 402, 381, 390, 396, 399, 369, 395, 395, 388, 403, 405, 371, 369, 395, 379, 395, 378, 394, 382, 379, 382, 402, 387, 371, 384, 387, 387, 367, 380, 389, 388, 384, 368

Values that are greater than Q3 + 1.5 × IQR or less than Q1 − 1.5 × IQR are typically considered outliers.
Which value is an outlier in these data?
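A minimal sketch of the 1.5 × IQR rule in R. The exercise does not say which quartile definition to use, so this assumes R's default `quantile()` method (`type = 7`); other definitions move Q1 and Q3 slightly, which matters only for values close to a fence.

```r
# Contamination (ppm) for the 50 drinking-water samples listed above
contamination <- c(388, 388, 384, 1962, 389, 397, 380, 385, 383, 380,
                   396, 406, 369, 392, 380, 372, 387, 402, 381, 390,
                   396, 399, 369, 395, 395, 388, 403, 405, 371, 369,
                   395, 379, 395, 378, 394, 382, 379, 382, 402, 387,
                   371, 384, 387, 387, 367, 380, 389, 388, 384, 368)

# Quartiles with R's default method (type = 7); this choice is an assumption
q <- unname(quantile(contamination, probs = c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences from the 1.5 * IQR rule stated above
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

outliers <- contamination[contamination < lower_fence | contamination > upper_fence]
print(c(Q1 = q[1], Q3 = q[2], lower = lower_fence, upper = upper_fence))
print(outliers)
```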

Exercise 5: The data below indicate the time (in seconds) that it takes 25 separate employees to complete a
certain task.

task: 202, 275, 236, 196, 241, 277, 225, 249, 266, 206, 219, 288, 265, 219, 268, 305, 184, 273, 308, 239, 240, 221, 252, 261, 244

What is the variance of the data when the times are expressed in minutes?
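Converting seconds to minutes multiplies every value by 1/60, and multiplying data by a constant $c$ multiplies the variance by $c^2$. The sketch below is a minimal check rather than a required method: it computes the variance both ways in R, and the result is in squared minutes.

```r
# Task completion times in seconds (from the list above)
task <- c(202, 275, 236, 196, 241, 277, 225, 249, 266, 206, 219, 288, 265,
          219, 268, 305, 184, 273, 308, 239, 240, 221, 252, 261, 244)

var_converted <- var(task / 60)    # convert each time to minutes, then take the variance
var_rescaled  <- var(task) / 60^2  # or rescale the variance in seconds^2 by (1/60)^2

print(c(convert_first = var_converted, rescale_after = var_rescaled))  # units: minutes^2
```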
