MAS202 - Homework For Chapters 1-2-3

Chapter 1

Defining and Collecting Data

1.7 For each of the following variables, determine whether the variable is categorical or
numerical and determine its measurement scale. If the variable is numerical, determine
whether the variable is discrete or continuous.
a. Amount of money spent on clothing in the past month
b. Favorite department store
c. Most likely time period during which shopping for clothing takes place (weekday, weeknight,
or weekend)
d. Number of pairs of shoes owned.
1.8 Suppose the following information is collected from Robert Keeler on his application for a
home mortgage loan at the Metro County Savings and Loan Association.
a. Monthly payments: $2,227
b. Number of jobs in past 10 years: 1
c. Annual family income: $96,000
d. Marital status: Married
Classify each of the responses by type of data and measurement scale.

1.9 School systems in many countries use a scale of numbers from 1 to 5, 6, or sometimes
even 10. For example, in Germany, 1 is the best grade, while in Hungary, 1 is the fail grade.
In Romania, on the other hand, 5 is the minimum passing grade and 10 is the highest grade.
a. Do you think school grades are numerical or categorical? Explain.
b. What is the measurement scale?
c. Do you think an average grade can or should be calculated?

1.10 If two students score a 90 on the same examination, what arguments could be used to
show that the underlying variable—test score—is continuous?

1.12 The American Community Survey (www.census.gov/acs) provides data every year about
communities in the United States. Addresses are randomly selected, and respondents are
required to supply answers to a series of questions.
a. Which of the sources of data best describes the American Community Survey?
b. Is the American Community Survey based on a sample or a population?

1.14 Assume that the recorded heights of 10 students are
120, 122, 128, 176, 124, 127, 121, 125, 127, and 129 centimeters.
Which number do you think will be the outlier while calculating the average height of students
in the class and why? How would you deal with this outlier?
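
One way to explore problem 1.14 numerically is to compare the mean with and without the most extreme height. The sketch below is only an illustration: the rule "flag the value farthest from the median" is an assumption made for this example, not a method prescribed by the problem.

```python
from statistics import mean, median

# Heights in centimeters, copied from problem 1.14.
heights_cm = [120, 122, 128, 176, 124, 127, 121, 125, 127, 129]

median_all = median(heights_cm)

# Assumption: treat the value farthest from the median as the suspect point.
suspect = max(heights_cm, key=lambda h: abs(h - median_all))

mean_all = mean(heights_cm)
mean_without = mean([h for h in heights_cm if h != suspect])

print(f"suspect value: {suspect} cm")
print(f"mean with all values: {mean_all:.1f} cm")
print(f"mean without suspect: {mean_without:.1f} cm")
print(f"median: {median_all} cm")
```

Comparing the two means with the median shows how strongly a single extreme value pulls the average.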

1.15 Transportation engineers and planners want to address the dynamic properties of travel
behavior by describing in detail the driving characteristics of drivers over the course of a
month. What type of data collection source do you think the transportation engineers and
planners should use?
1.23 The registrar of a university with a population of N = 4,000 full-time students is asked by
the president to conduct a survey to measure satisfaction with the quality of life on campus.
The following table contains a breakdown of the 4,000 registered full-time students, by gender
and class designation:

The registrar intends to take a probability sample of n = 200 students and project the results
from the sample to the entire population of full-time students.
a. If the frame available from the registrar’s files is an alphabetical listing of the names of all
N = 4,000 registered full-time students, what type of sample could you take? Discuss.
b. What is the advantage of selecting a simple random sample in (a)?
c. What is the advantage of selecting a systematic sample in (a)?
d. If the frame available from the registrar’s files is a list of the names of all N = 4,000 registered
full-time students compiled from eight separate alphabetical lists, based on the gender and
class designation breakdowns shown in the class designation table, what type of sample
should you take? Discuss.
e. Suppose that each of the N = 4,000 registered full-time students lived in one of the 10
campus dormitories. Each dormitory accommodates 400 students. It is college policy to fully
integrate students by gender and class designation in each dormitory. If the registrar is able
to compile a listing of all students by dormitory, explain how you could take a cluster sample.

1.24 Prenumbered sales invoices are kept in a sales journal. The invoices are numbered from
0001 to 5000.
a. Beginning in row 16, column 01, and proceeding horizontally in a table of random numbers
(Table E.1), select a simple random sample of 50 invoice numbers.
b. Select a systematic sample of 50 invoice numbers. Use the random numbers in row 20,
columns 05–07, as the starting point for your selection.
c. Are the invoices selected in (a) the same as those selected in (b)? Why or why not?
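
The systematic selection in 1.24(b) can be sketched as follows. This is a minimal illustration, not the required procedure: the starting point 37 is a hypothetical value chosen for the example, whereas the problem asks for a start read from Table E.1, and the function name `systematic_sample` is invented here.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return item numbers chosen by taking every k-th item after a start."""
    k = population_size // sample_size  # sampling interval, here 5000 // 50 = 100
    if start is None:
        # Without a random-number table, draw the start from 1..k.
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]

# Hypothetical start of 37; a real answer would take it from Table E.1.
invoices = systematic_sample(5000, 50, start=37)
print([f"{number:04d}" for number in invoices[:5]])
```

Every selected invoice is exactly 100 numbers after the previous one, which is what distinguishes this from a simple random sample.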

1.25 Suppose that 10,000 customers in a retailer’s customer database are categorized by
three customer types: 3,500 prospective buyers, 4,500 first-time buyers, and 2,000 repeat
(loyal) buyers. A sample of 1,000 customers is needed.
a. What type of sampling should you do? Why?
b. Explain how you would carry out the sampling according to the method stated in (a).
c. Why is the sampling in (a) not simple random sampling?
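
If the customer types in 1.25 are treated as strata, the sample of 1,000 can be split in proportion to each group's size. The sketch below assumes proportional allocation, which is one reasonable choice but is not stated in the problem.

```python
# Customer counts by type, from problem 1.25.
strata = {"prospective": 3500, "first-time": 4500, "repeat": 2000}
sample_size = 1000

total = sum(strata.values())

# Assumption: proportional allocation, so each stratum keeps its population share.
allocation = {name: sample_size * count // total for name, count in strata.items()}

for name, size in allocation.items():
    print(f"{name}: sample {size} of {strata[name]}")
```

Within each stratum, the customers would then be drawn by simple random sampling.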

1.30 While collecting data using a survey, only 40% of the respondents gave feedback. What
does this tell you about survey methods? And how can a researcher increase the response
rate?

1.31 A simple random sample of n = 300 full-time employees is selected from a company list
containing the names of all N = 5,000 full-time employees in order to evaluate job satisfaction.
a. Give an example of possible coverage error.
b. Give an example of possible nonresponse error.
c. Give an example of possible sampling error.
d. Give an example of possible measurement error.

1.32 Results of a 2017 Computer Services, Inc. (CSI) survey of a sample of 163 bank
executives reveal insights on banking priorities among financial institutions (goo.gl/mniYMM).
As financial institutions begin planning for a new year, of utmost importance is boosting
profitability and identifying growth areas. The results show that 55% of bank institutions note
customer experience initiatives as an area in which spending is expected to increase.
Implementing a customer relationship management (CRM) solution was ranked as the most
important omnichannel strategy to pursue, with 41% of institutions citing digital banking
enhancements as the greatest anticipated strategy to enhance the customer experience.
Identify potential concerns with coverage, nonresponse, sampling, and measurement errors.

Chapter 2
Organizing and Visualizing Variables
2.3 The following table, stored in Smartphone Sales, represents the annual market share of
smartphones, by type, for the years 2011, 2012, 2013, 2014, and 2015.

a. What conclusions can you reach about the market for smartphones in 2011, 2012, 2013,
2014, and 2015?
b. What differences are there between 2014 and 2015?

2.4 The Consumer Financial Protection Bureau reports on consumer financial product and
service complaint submissions by state, category, and company. The following table, stored
in FinancialComplaints1, represents complaints received from Louisiana consumers by
complaint category for 2016.

a. Compute the percentage of complaints for each category.
b. What conclusions can you reach about the complaints for the different categories?
The following table, stored as FinancialComplaints2, summarizes complaints received from
Louisiana consumers by most-complained-about companies for 2016.

c. Compute the percentage of complaints for each company.
d. What conclusions can you reach about the complaints for the different companies?

2.5 In addition to the impact of Big Data, what disruptive technology capability do executives
anticipate will have the greatest impact on their firm over the next decade? A survey of 50
Fortune 1000 executives revealed the following:

What conclusions can you reach concerning the disruptive technology capabilities that
executives anticipate will have greatest impact on their firm over the next decade?
2.6 This table represents the summer power-generating capacity by energy source in the
United States as of July 2016.

What conclusions can you reach about the source of energy in July 2016?

2.7 Timetric’s 2016 survey of insurance professionals explores the use of technology in the
industry. The file Technologies contains the responses to the question that asked what
technologies these professionals expected to be most used by the insurance industry in the
coming year. Those responses are:

a. Compute the percentage of responses for each technology.
b. What conclusions can you reach concerning expected technology usage in the insurance
industry in the coming year?

2.8 A survey of 1,520 American adults asked “Do you feel overloaded with too much
information?” The results indicate that 23% of females feel information overload compared to
17% of males. The results are:

a. Construct contingency tables based on total percentages, row percentages, and column
percentages.
b. What conclusions can you reach from these analyses?

2.15 The FIFA World Cup was one of the biggest sporting events of 2018. The file
WC2018TeamAge contains average age of the players (years, in 2018) of the 32 teams that
qualified for the event. These average ages were:
26.04 26.78 27.17 27.57 28.17 28.43 28.61 28.96
26.09 27.09 27.26 27.78 28.22 28.43 28.78 29.17
26.09 27.09 27.26 27.83 28.26 28.52 28.83 29.52
26.48 27.09 27.48 28.09 28.35 28.52 28.91 29.74
Source: Data adapted from https://bit.ly/2zGSWRD.
a. Organize these mean ages as an ordered array.
b. Construct a frequency distribution and a percentage distribution for these mean ages.
c. Around which class grouping, if any, are these mean ages concentrated? Explain.
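
The ordered array and frequency distribution in 2.15 can be checked with a short script. This is a sketch under one assumption: a class width of one year (26 but less than 27, and so on), which the problem does not specify.

```python
from collections import Counter

# Average team ages from problem 2.15, entered row by row.
ages = [
    26.04, 26.78, 27.17, 27.57, 28.17, 28.43, 28.61, 28.96,
    26.09, 27.09, 27.26, 27.78, 28.22, 28.43, 28.78, 29.17,
    26.09, 27.09, 27.26, 27.83, 28.26, 28.52, 28.83, 29.52,
    26.48, 27.09, 27.48, 28.09, 28.35, 28.52, 28.91, 29.74,
]

ordered = sorted(ages)  # the ordered array

# Assumption: classes one year wide, labeled by their lower limit.
classes = Counter(int(age) for age in ordered)
n = len(ordered)

for lower in sorted(classes):
    freq = classes[lower]
    print(f"{lower} but less than {lower + 1}: {freq:2d}  {100 * freq / n:5.1f}%")
```

A narrower class width, such as half a year, would give a finer picture of where the ages cluster.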

2.16 The file Utility contains the following data about the cost of electricity (in $) during July
2017 for a random sample of 50 one-bedroom apartments in a large city.
96 171 202 178 147 102 153 197 127 82
157 185 90 116 172 111 148 213 130 165
141 149 206 175 123 128 144 168 109 167
95 163 150 154 130 143 187 166 139 149
108 119 183 151 114 135 191 137 129 158
a. Construct a frequency distribution and a percentage distribution that have class intervals
with the upper class boundaries $99, $119, and so on.
b. Construct a cumulative percentage distribution.
c. Around what amount does the monthly electricity cost seem to be concentrated?
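
For 2.16, the class intervals implied by upper boundaries of $99, $119, and so on are $80 to $99, $100 to $119, and onward in steps of $20. The sketch below tabulates frequencies, percentages, and cumulative percentages under that reading; the starting value of $80 is an assumption inferred from the smallest observation.

```python
from collections import Counter

# Electricity costs in dollars, from problem 2.16.
costs = [
    96, 171, 202, 178, 147, 102, 153, 197, 127, 82,
    157, 185, 90, 116, 172, 111, 148, 213, 130, 165,
    141, 149, 206, 175, 123, 128, 144, 168, 109, 167,
    95, 163, 150, 154, 130, 143, 187, 166, 139, 149,
    108, 119, 183, 151, 114, 135, 191, 137, 129, 158,
]

LOWER, WIDTH = 80, 20  # assumed first lower limit and class width

counts = Counter((cost - LOWER) // WIDTH for cost in costs)
n = len(costs)

table = []  # rows of (low, high, frequency, percentage, cumulative percentage)
running = 0
for i in range(max(counts) + 1):
    freq = counts.get(i, 0)
    running += freq
    low = LOWER + i * WIDTH
    table.append((low, low + WIDTH - 1, freq, 100 * freq / n, 100 * running / n))

for low, high, freq, pct, cum_pct in table:
    print(f"${low}-${high}: {freq:2d}  {pct:5.1f}%  {cum_pct:6.1f}%")
```

The same table can serve as the basis for the histogram and polygons requested in problem 2.38, which uses the same data.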

2.17 How far do commuters in Australia travel for work? The file CommutingAustralia
contains data about commuting time and distances of the 89 statistical regions of Australia.
Source: Data extracted from Australian Bureau of Statistics, available at https://bit.ly/2Qvtvfu.
For the average commuting distance data,
a. Construct a frequency distribution and a percentage distribution.
b. Construct a cumulative percentage distribution.
c. What conclusions can you reach concerning the average commuting distance of Australians?

2.18 How does the average annual precipitation differ around the world? The data in
AnnualPrecipitation contains the average annual precipitation data in millimeters for 4,166
weather stations.
Source: Data extracted from UN Data, available at https://bit.ly/2DWMYPz.
a. Construct a frequency distribution and a percentage distribution.
b. Construct a cumulative percentage distribution.
c. What conclusions can you reach concerning the average annual precipitation around the
world?

2.21 Cycling in cities is getting increasingly popular, which has led to challenges in urban
planning. According to the Copenhagenize index, Copenhagen, Denmark, was the most
bicycle-friendly city in 2017. Assume a new intersection is under construction in your city. The
file BikeTraffic contains bicycle traffic in your city on 50 different days.
a. Construct a frequency distribution and a percentage distribution.
b. Construct a cumulative percentage distribution.

c. What can you conclude about a planned capacity of 250 people for the intersection?

2.22 The file ElectricConsME contains the electric power consumption data (kWh) of 44
randomly selected four-member households from Saudi Arabia and the United Arab Emirates.
a. Construct a frequency distribution and a percentage distribution for each country, using the
following class interval widths for each distribution:
Saudi Arabia: 10,000 but less than 20,000; 20,000 but less than 30,000; and so on.
United Arab Emirates: 0 but less than 10,000; 10,000 but less than 20,000; and so on.
b. Construct cumulative percentage distributions.
c. Which country’s families use more electric power: those from Saudi Arabia or those from the
United Arab Emirates? Explain.
2.24 A survey of online shoppers revealed that in 2015 they bought more of their purchases
online than in stores. The data in OnlineShopping reveals how their purchases were made.
a. Construct a bar chart, a pie or doughnut chart, and a Pareto chart.
b. Which graphical method do you think is best for portraying these data?
c. What conclusions can you reach concerning how online shoppers make purchases?
2.25 How do college students spend their day? The 2016 American Time Use Survey for
college students found the following results:

a. Construct a bar chart, a pie or doughnut chart, and a Pareto chart.
b. Which graphical method do you think is best for portraying these data?
c. What conclusions can you reach concerning how college students spend their day?
2.26 The Energy Information Administration reported the following sources of electricity in the
United States in 2016:

a. Construct a Pareto chart.
b. What percentage of power is derived from coal, nuclear power, or natural gas?
c. Construct a pie chart.
d. For these data, do you prefer using a Pareto chart or a pie chart? Why?

2.27 The Consumer Financial Protection Bureau reports on consumer financial product and
service complaint submissions by state, category, and company. The following table, stored
in FinancialComplaints1, represents complaints received from Louisiana consumers by
complaint category for 2016.

a. Construct a Pareto chart for the categories of complaints.
b. Discuss the “vital few” and “trivial many” reasons for the categories of complaints.

The following table, stored in FinancialComplaints2, represents complaints received from
Louisiana consumers by most-complained-about companies for 2016.

c. Construct a bar chart and a pie chart for the complaints by company.

d. What graphical method (Pareto, bar, or pie chart) do you think is best for portraying these
data?

2.30 A survey of 1,520 American adults asked “Do you feel overloaded with too much
information?” The results indicate that 23% of females feel information overload compared to
17% of males. The results are:

a. Construct a side-by-side bar chart of overloaded with too much information and gender.
b. What conclusions can you reach from this chart?

2.32 A study was conducted to determine whether dogs resemble their owners. The study
found that people tend to select dogs that in some way resemble them and that the
resemblance increases with the duration of ownership. Assume that this finding is specific to
a particular breed of dogs and that the following data have been collected:

a. Draw a side-by-side chart to show whether only dogs of a specific breed resemble their
owners, or dogs of all breeds do so.
b. What conclusions can you draw from the chart?

2.35 The following is a stem-and-leaf display representing the amount of gasoline purchased,
in gallons (with leaves in tenths of gallons), for a sample of 25 cars that use a particular service
station on the New Jersey Turnpike:

a. Construct an ordered array.
b. Which of these two displays seems to provide more information? Discuss.
c. What amount of gasoline (in gallons) is most likely to be purchased?
d. Is there a concentration of the purchase amounts in the center of the distribution?

2.36 The FIFA World Cup was one of the biggest sporting events of 2018. The file
WC2018TeamAge contains average age of the players (years, in 2018) of the 32 teams that
qualified for the event. Source: Data adapted from https://bit.ly/2zGSWRD.
a. Construct a stem-and-leaf display.
b. Around what value, if any, are the mean ages of teams concentrated? Explain.
2.37 The file MobileSpeed contains the overall download and upload speeds in mbps for nine
carriers in the United States.
Source: Data extracted from “Best Mobile Network 2016”, bit.ly/1KGPrMm, accessed
November 10, 2016.
a. Construct an ordered array.
b. Construct a stem-and-leaf display.
c. Does the ordered array or the stem-and-leaf display provide more information? Discuss.
d. Around what value, if any, are the download and upload speeds concentrated? Explain.

2.38 The file Utility contains the following data about the cost of electricity during July of a
recent year for a random sample of 50 one-bedroom apartments in a large city:
96 171 202 178 147 102 153 197 127 82
157 185 90 116 172 111 148 213 130 165
141 149 206 175 123 128 144 168 109 167
95 163 150 154 130 143 187 166 139 149
108 119 183 151 114 135 191 137 129 158
a. Construct a histogram and a percentage polygon.
b. Construct a cumulative percentage polygon.
c. Around what amount does the monthly electricity cost seem to be concentrated?

2.40 Unemployment is one of the major issues most governments of the world are faced
with. The file EuUnempl2017 contains employment data for 319 European regions in 2017,
and the following histogram shows the distribution of unemployment rates.

What conclusions can you reach concerning the unemployment rates in Europe?

2.41 How far do commuters in Australia travel for work? The file CommutingAustralia contains
data about commuting time and distances of the 89 statistical regions of Australia. Source:
Data extracted from Australian Bureau of Statistics, available at https://bit.ly/2Qvtvfu.
For the median commuting distance Australians travel for work:
a. Construct a percentage histogram.
b. Construct a cumulative percentage polygon.
c. What conclusions can you reach concerning the median commuting distance Australians
travel for work?

2.42 How does the average annual precipitation differ around the world? The data in
AnnualPrecipitation contains the average annual precipitation data in millimeters for 4,166
weather stations.
Source: Data extracted from UN Data, available at https://bit.ly/2DWMYPz.
a. Construct a percentage histogram.
b. Construct a cumulative percentage polygon.
c. What conclusions can you reach concerning the average annual precipitation around the
world?
2.44 Call centers today play an important role in managing day-to-day business
communications with customers. Call centers must be monitored with a comprehensive set
of metrics so that businesses can better understand the overall performance of those centers.
One key metric for measuring overall call center performance is service level, the percentage
of calls answered by a human agent within a specified number of seconds. The file
ServiceLevel contains the following data for time, in seconds, to answer 50 incoming calls to
a financial services call center:
16 14 16 19 6 14 15 5 16 18 17 22 6 18 10 15 12
6 19 16 16 15 13 25 9 17 12 10 5 15 23 11 12 14
24 9 10 13 14 26 19 20 13 24 28 15 21 8 16 12
a. Construct a percentage histogram and a percentage polygon.
b. Construct a cumulative percentage polygon.
c. What can you conclude about call center performance if the service level target is set as
“80% of calls answered within 20 seconds”?
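
The service-level check in 2.44(c) comes down to a cumulative percentage at 20 seconds. The sketch below computes it directly, assuming that "within 20 seconds" includes calls answered in exactly 20 seconds.

```python
# Seconds to answer 50 incoming calls, from problem 2.44.
answer_times = [
    16, 14, 16, 19, 6, 14, 15, 5, 16, 18, 17, 22, 6, 18, 10, 15, 12,
    6, 19, 16, 16, 15, 13, 25, 9, 17, 12, 10, 5, 15, 23, 11, 12, 14,
    24, 9, 10, 13, 14, 26, 19, 20, 13, 24, 28, 15, 21, 8, 16, 12,
]

TARGET_SECONDS = 20   # threshold in the stated service-level target
TARGET_SHARE = 80.0   # required percentage of calls

# Assumption: a call answered in exactly 20 seconds counts as "within 20 seconds".
within = sum(1 for t in answer_times if t <= TARGET_SECONDS)
service_level = 100 * within / len(answer_times)
meets_target = service_level >= TARGET_SHARE

print(f"{within} of {len(answer_times)} calls answered within {TARGET_SECONDS} s")
print(f"service level: {service_level:.1f}% (target met: {meets_target})")
```

The same figure can be read off the cumulative percentage polygon from part (b) at the 20-second mark.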

2.45 Cycling in cities is getting increasingly popular, which has led to challenges in urban
planning. According to the Copenhagenize index, Copenhagen, Denmark, was the most
bicycle-friendly city in 2017. Assume a new intersection is under construction in your city. The
file BikeTraffic contains bicycle traffic in your city on 50 different days.
a. Construct a percentage histogram and a percentage polygon.
b. Construct a cumulative percentage polygon.
c. What can you conclude about a planned capacity of 250 people for the intersection?

2.46 The file ElectricConsME contains the electric power consumption data (kWh) of 44
randomly selected four-member households from Saudi Arabia and the United Arab Emirates.
Use the following class interval widths for each distribution:
Saudi Arabia: 10,000 but less than 20,000; 20,000 but less than 30,000; and so on.
United Arab Emirates: 0 but less than 10,000; 10,000 but less than 20,000; and so on.
a. Construct percentage histograms on separate graphs and plot the percentage polygons on
one graph.
b. Plot cumulative percentage polygons on one graph.
c. Which country’s families use more electric power—Saudi Arabia or the United Arab
Emirates? Explain.

2.48 The following is a set of data from a sample of n = 11 items:
X: 7 5 8 3 6 0 2 4 9 5 8
Y: 1 5 4 9 8 0 6 2 7 5 4
a. Construct a scatter plot.
b. Is there a relationship between X and Y? Explain.
2.49 The following is a series of annual sales (in $millions) over an 11-year period (2007 to
2017):
Year: 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Sales: 13.0 17.0 19.0 20.0 20.5 20.5 20.5 20.0 19.0 17.0 13.0
a. Construct a time-series plot.
b. Does there appear to be any change in annual sales over time? Explain.

2.50 Movie companies need to predict the gross receipts of individual movies once a movie
has debuted. The following results, stored in PotterMovies, are the first weekend gross, the
U.S. gross, and the worldwide gross (in $millions) of the eight Harry Potter movies:

a. Construct a scatter plot with first weekend gross on the X axis and U.S. gross on the Y axis.
b. Construct a scatter plot with first weekend gross on the X axis and worldwide gross on the
Y axis.
c. What can you say about the relationship between first weekend gross and U.S. gross and
first weekend gross and worldwide gross?

2.51 Data were collected on the area and population of different states in India. The file
IndiaStates contains the vehicle code, zone, area, and population for all 29 states of India.
Source: Data extracted from Population Census 2011, available at https://bit.ly/1w1BQIG, and
https://bit.ly/1Lc6uDG.
a. Construct a scatter plot with area on the X axis and population on the Y axis.
b. What conclusions can you reach about the relationship between area and population?

2.52 The file MobileSpeed contains the overall download and upload speeds in mbps for nine
carriers in the United States.
Source: Data extracted from “Best Mobile Network 2016”, bit.ly/1KGPrMm, accessed November 10,
2016.
a. Do you think that carriers with a higher overall download speed also have a higher overall
upload speed?
b. Construct a scatter plot with download speed on the X axis and upload speed on the Y axis.
c. Does the scatter plot confirm or contradict your answer in (a)?

2.53 A Pew Research Center survey found a noticeable rise in smartphone ownership and
Internet usage in emerging and developing nations. Once online, adults in these nations are
hungry for social interaction. The file GlobalIntenetUsage contains the level of Internet
usage, measured as the percentage of adults polled who use the Internet at least occasionally
or report owning a smartphone, and the file GlobalSocialMedia contains the level of social
media networking, measured as the percentage of Internet users who use social media sites,
as well as the GDP at purchasing power parity (PPP, current international $) per capita for
each of 28 emerging and developing countries.
Source: Data extracted from Pew Research Center, “Smartphone Ownership and Internet Usage Continues to
Climb in Emerging Economies,” February 22, 2016, bit.ly/2oRv0rp.

a. Construct a scatter plot with GDP (PPP) per capita on the X axis and social media usage
on the Y axis.
b. What conclusions can you reach about the relationship between GDP and social media
usage?
c. Construct a scatter plot with GDP (PPP) per capita on the X axis and Internet usage on the
Y axis.
d. What conclusions can you reach about the relationship between GDP and Internet usage?
2.54 How have stocks performed in the past? The following table presents the data stored in
Stock Performance and shows the performance of a broad measure of stocks (by percentage)
for each decade from the 1830s through the 2000s:

a. Construct a time-series plot of the stock performance from the 1830s to the 2000s.
b. Does there appear to be any pattern in the data?

2.55 The file NewHomeSales contains the number of new homes sold (in thousands) and the
median sales price of new single-family houses sold in the United States recorded at the end
of each month from January 2000 through December 2016.
Source: Data extracted from bit.ly/2eEcIBR, accessed March 19, 2017.
a. Construct a time-series plot of new home sales prices.
b. What pattern, if any, is present in the data?
Chapter 3
Numerical Descriptive Measures
3.7 Wired, a magazine that delivers a glimpse into the future of business, culture, innovation,
and science, reported the following summary for the household incomes of its two types of
subscribers, the print reader and the digital reader.

Interpret the median household income for the Wired readers and the Wired.com users.
3.8 The quality inspection team at a plant for medium-size vehicles intends to compare the
acceptable thickness of two types of brake pads. The expected thickness of the brake pads
is 12 millimeters.
A sample of 6 brake pads of each of the two types was randomly selected, and the
results showing the thickness of the brake pads were sorted in ascending order, as shown in
the table below:

a. Calculate the mean, median, standard deviation and range for both types of brake
pads.
b. Decide whether Type 1 or Type 2 brake pads meet the expectations set at 12 mm.
c. If the last value for the thickness for Type 2 brake pads is set at 24 mm, recalculate
parts (a) and (b) and explain the effect of the change.
3.10 The file MobileSpeed contains the overall download and upload speeds in mbps for
nine carriers in the United States.

For the download speed and the upload speed separately:

a. Compute the mean and median.
b. Compute the variance, standard deviation, range, and coefficient of variation.
c. Are the data skewed? If so, how?
d. Based on the results of (a) through (c), what conclusions can you reach concerning the
download and upload speed of various carriers?
3.11 The file AirportTraffic contains the number of total passengers and the annual rate
of change in passenger traffic for 50 airports.
Source: Data extracted from https://bit.ly/2kCe15W.
For the total number of passengers and the rate of change in passenger traffic:
a. Calculate the mean, median, and mode.
b. Compute the variance, standard deviation, range, coefficient of variation, and Z scores.
c. Are the data skewed? If so, how?
d. Based on the results of (a) through (c), what conclusions can you reach about the
number of passengers and rate of change in passenger traffic?
3.12 The FIFA World Cup was one of the biggest sporting events of 2018. The file
WC2018Players contains data of the players of the 32 teams that qualified for the event.
A dummy variable is included to indicate whether a player is also a captain.
Source: Data adapted from https://bit.ly/2zGSWRD.
For the age of captains and non-captains separately:
a. Compute the mean, median, and mode.
b. Compute the variance, standard deviation, range, and coefficient of variation.
c. Are the data skewed? If so, how?
d. Based on the results of (a) through (c), what conclusions can you reach about the age
of captains and non-captains?
3.13 Wheat production is crucial in agriculture in many countries around the world. The
file Wheat contains yield data for 50 selected hectares in 2018 in tons.
a. Compute the mean, median, and mode.
b. Compute the variance, standard deviation, range, coefficient of variation, and Z scores.
Are there any outliers? Explain.
c. Are the data skewed? If so, how?
d. Based on the results of (a) through (c), what conclusions can you reach concerning
the yield of wheat in 2018?
3.14 The file MobileCommerce contains the following mobile commerce penetration
values, the percentage of the country population that bought something online via a
mobile phone in the past month, for 28 of the world’s economies:
23 27 26 25 40 19 26 36 23 33 23 11 38 21
26 23 21 33 40 15 55 30 41 31 47 37 33 28
Source: Data extracted from bit.ly/2jXeS3F.
a. Compute the mean and median.
b. Compute the variance, standard deviation, range, coefficient of variation, and Z scores.
Are there any outliers? Explain.
c. Are the data skewed? If so, how?
d. Based on the results of (a) through (c), what conclusions can you reach concerning
mobile commerce population penetration?
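
The measures requested in 3.14 can be computed with Python's standard library. The sketch below makes two assumptions: the data are treated as a sample (n - 1 in the variance), and a value counts as an outlier when its Z score is below -3 or above +3.

```python
from statistics import mean, median, stdev, variance

# Mobile commerce penetration values (%), from problem 3.14.
penetration = [
    23, 27, 26, 25, 40, 19, 26, 36, 23, 33, 23, 11, 38, 21,
    26, 23, 21, 33, 40, 15, 55, 30, 41, 31, 47, 37, 33, 28,
]

avg = mean(penetration)
med = median(penetration)
var = variance(penetration)   # sample variance, divides by n - 1
sd = stdev(penetration)       # sample standard deviation
value_range = max(penetration) - min(penetration)
cv = 100 * sd / avg           # coefficient of variation, in percent

z_scores = [(x - avg) / sd for x in penetration]

# Assumption: |Z| > 3 marks an outlier.
outliers = [x for x, z in zip(penetration, z_scores) if abs(z) > 3]

print(f"mean={avg:.2f} median={med} variance={var:.2f} sd={sd:.2f}")
print(f"range={value_range} CV={cv:.1f}% outliers={outliers}")
```

Comparing the mean with the median gives a first indication of the direction of any skewness asked about in part (c).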

3.15 Is there a difference in the variation of the yields of different types of investments?
The file IndexReturn contains data about the performance of 38 indexes across the
world as of July 2018.
Source: Data extracted from https://bit.ly/2yS1QcS.
a. For one-year and five-year returns, separately compute the variance, standard
deviation, range, and coefficient of variation.
b. Based on the result in (a), do one-year or five-year returns have more variation?
Explain.
3.31 Wheat is a staple for many countries around the world and is a crucial part of their
agricultural sectors. The file Wheat contains yield data for 50 selected hectares in 2018 in
tons.
a. Compute the first quartile (Q1), the third quartile (Q3), and the interquartile range.
b. List the five-number summary.
c. Construct a boxplot and describe its shape.
3.32 The file MobileCommerce contains the following mobile commerce penetration
values, the percentage of the country population that bought something online via a
mobile phone in the past month, for twenty-eight of the world’s economies:
23 27 26 25 40 19 26 36 23 33 23 11 38 21
26 23 21 33 40 15 55 30 41 31 47 37 33 28
Source: Data extracted from www.slideshare.net/wearesocialsg/digital-in-2017-global-overview.
a. Compute the first quartile (Q1), the third quartile (Q3), and the interquartile range.
b. List the five-number summary.
c. Construct a boxplot and describe its shape.
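
Quartiles depend on the convention used to locate them. The sketch below assumes one convention often taught in introductory courses: Q1 is the (n + 1)/4 ranked value and Q3 is the 3(n + 1)/4 ranked value, where a whole-number rank is used as is, a rank ending in .5 averages the two neighboring values, and any other rank is rounded to the nearest integer. The helper name `ranked_value` is invented for this example, and statistical software may use other interpolation rules and give slightly different results.

```python
from statistics import median

# Mobile commerce penetration values (%), from problem 3.32.
penetration = [
    23, 27, 26, 25, 40, 19, 26, 36, 23, 33, 23, 11, 38, 21,
    26, 23, 21, 33, 40, 15, 55, 30, 41, 31, 47, 37, 33, 28,
]

def ranked_value(sorted_data, position):
    """Return the value at a 1-based rank using the assumed rounding rules."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return sorted_data[whole - 1]
    if fraction == 0.5:
        # Halfway between two ranks: average the neighboring values.
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    # Otherwise round the rank to the nearest integer.
    return sorted_data[round(position) - 1]

ordered = sorted(penetration)
n = len(ordered)

q1 = ranked_value(ordered, (n + 1) / 4)
q3 = ranked_value(ordered, 3 * (n + 1) / 4)
iqr = q3 - q1
five_number = (ordered[0], q1, median(ordered), q3, ordered[-1])

print(f"Q1={q1} Q3={q3} IQR={iqr}")
print(f"five-number summary: {five_number}")
```

The five-number summary is what the boxplot in part (c) displays.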
3.33 The file HotelAway contains the average room price (in US$) paid by various
nationalities while traveling abroad (away from their home country) in 2016:
124 101 115 126 114 112 138 85 138 96 130 116
Source: Data extracted from hpi.hotels.com/.
a. Compute the first quartile (Q1), the third quartile (Q3), and the interquartile range.
b. List the five-number summary.
c. Construct a boxplot and describe its shape.
3.34 The FIFA World Cup was one of the biggest sporting events of 2018. The file
WC2018Players contains data of the players of the 32 teams that qualified for the event.
A dummy variable is included to indicate whether a player is also a captain.
Source: Data adapted from https://bit.ly/2zGSWRD.
For the ages of captains and non-captains separately:
a. Compute the first quartile (Q1), the third quartile (Q3), and the interquartile range.
b. List the five-number summary.
c. Construct a boxplot and describe its shape.
3.40 Consider a population of 1,024 mutual funds that primarily invest in large
companies. You have determined that μ, the mean one-year total percentage return
achieved by all the funds, is 8.20 and that σ, the standard deviation, is 2.75.
a. According to the empirical rule, what percentage of these funds is expected to be
within ±1 standard deviation of the mean?
b. According to the empirical rule, what percentage of these funds is expected to be
within ±2 standard deviations of the mean?

c. According to Chebyshev’s theorem, what percentage of these funds is expected to be
within ±1, ±2, or ±3 standard deviations of the mean?
d. According to Chebyshev’s theorem, at least 93.75% of these funds are expected to
have one-year total returns between what two amounts?
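
Chebyshev's theorem states that at least (1 - 1/k^2) x 100% of the values lie within k standard deviations of the mean, for any k greater than 1. The sketch below applies it to the mean of 8.20 and standard deviation of 2.75 given in problem 3.40; the function names are illustrative only.

```python
import math

MU, SIGMA = 8.20, 2.75  # population mean and standard deviation from problem 3.40

def chebyshev_min_percent(k):
    """Minimum percentage of values within k standard deviations of the mean."""
    return (1 - 1 / k**2) * 100

def interval(k):
    """Lower and upper bounds of the mean plus or minus k standard deviations."""
    return (MU - k * SIGMA, MU + k * SIGMA)

for k in (1, 2, 3):
    print(f"k={k}: at least {chebyshev_min_percent(k):.2f}%")

# Part (d): solve 1 - 1/k^2 = 0.9375 for k, then build the interval.
k_for_9375 = 1 / math.sqrt(1 - 0.9375)
low, high = interval(k_for_9375)
print(f"k={k_for_9375:.0f}: between {low:.2f} and {high:.2f}")
```

For k = 1 the bound is 0%, which is why the theorem says nothing useful about the first standard deviation.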
3.42 The file Energy contains the average residential price for electricity in cents per
kilowatt hour in each of the 50 states and the District of Columbia during a recent year.
a. Compute the mean, variance, and standard deviation for the population.
b. What proportion of these states has an average residential price for electricity within ±1
standard deviation of the mean, within ±2 standard deviations of the mean, and within ±3
standard deviations of the mean?
c. Compare your findings with what would be expected based on the empirical rule. Are
you surprised at the results in (b)?
3.43 Thirty companies comprise the DJIA. Just how big are these companies? One
common method for measuring the size of a company is to use its market capitalization,
which is computed by multiplying the number of stock shares by the price of a share of
stock. On January 10, 2017, the market capitalization of these companies ranged from
Traveler’s $33.3 billion to Apple’s $625.6 billion. The entire population of market
capitalization values is stored in DowMarketCap.
Source: Data extracted from money.cnn.com, January 10, 2017.
a. Compute the mean and standard deviation of the market capitalization for this
population of 30 companies.
b. Interpret the parameters computed in (a).

3.46 The file Cereals lists the calories and sugar, in grams, in one serving of seven
breakfast cereals:

a. Compute the covariance.
b. Compute the coefficient of correlation.
c. Which do you think is more valuable in expressing the relationship between calories
and sugar—the covariance or the coefficient of correlation? Explain.
d. Based on (a) and (b), what conclusions can you reach about the
relationship between calories and sugar?
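
The sample covariance and the coefficient of correlation can be computed from their definitions. Because the cereal table is not reproduced here, the sketch below borrows the X and Y values from problem 2.48 purely as stand-in data; substitute the calories and sugar columns to answer 3.46.

```python
import math

# Stand-in data: the X and Y values from problem 2.48, not the cereal data.
x = [7, 5, 8, 3, 6, 0, 2, 4, 9, 5, 8]
y = [1, 5, 4, 9, 8, 0, 6, 2, 7, 5, 4]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: sum of cross-deviations divided by n - 1.
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Sample standard deviations, also with n - 1.
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))

# Coefficient of correlation: covariance scaled by both standard deviations.
r = cov_xy / (sd_x * sd_y)

print(f"covariance={cov_xy:.4f} r={r:.4f}")
```

The covariance carries the units of both variables, while r is unit-free and always lies between -1 and +1, which is relevant to part (c).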
3.47 Movie companies need to predict the gross receipts of individual movies once a
movie has debuted. The data, shown below and stored in PotterMovies, are the first
weekend gross, the U.S. gross, and the worldwide gross (in $ millions) of the eight Harry
Potter movies:

a. Compute the covariance between first weekend gross and U.S. gross, first weekend
gross and worldwide gross, and U.S. gross and worldwide gross.
b. Compute the coefficient of correlation between first weekend gross and U.S. gross,
first weekend gross and worldwide gross, and U.S. gross and worldwide gross.
c. Which do you think is more valuable in expressing the relationship between first
weekend gross, U.S. gross, and worldwide gross—the covariance or the coefficient of
correlation? Explain.
d. Based on (a) and (b), what conclusions can you reach about the relationship between
first weekend gross, U.S. gross, and worldwide gross?
3.48 The file MobileSpeed contains the overall download and upload speeds in mbps for
nine carriers in the U.S.
Source: Data extracted from “Best Mobile Network 2016,” bit.ly/1KGPrMm, accessed
November 10, 2016.
a. Compute the covariance between download speed and upload speed.
b. Compute the coefficient of correlation between download speed and upload speed.
c. Based on (a) and (b), what conclusions can you reach about the relationship between
download speed and upload speed?
