PRACTICAL SESSION 3

Exercise 1
Consider the dataset “bar_EXPENSE.Rda” and answer the following questions.
1. Produce in Radiant three histograms for the variable “Expenditure”, one for each
category of the variable “Time”, using 33 bins. Would you conclude that there is a
dependence of the expenditure on the time window?
2. What are the five-number summary, the mean and the coefficient of variation of
expenditure for the three time windows? (Use Radiant).
3. BONUS: Using R, plot expenditure data for the three different time categories, together
with their mean and median.

SOL See the file “commands_PS3.R”
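
Since “commands_PS3.R” is not reproduced here, the following is only a minimal sketch of how question 2 could be approached in base R. The data frame `expense`, its values and the three labels of “Time” are hypothetical stand-ins for the contents of “bar_EXPENSE.Rda”; only the column names “Expenditure” and “Time” come from the exercise.

```r
# Hypothetical stand-in for "bar_EXPENSE.Rda": the values and Time labels are invented.
expense <- data.frame(
  Time = rep(c("morning", "afternoon", "evening"), each = 5),
  Expenditure = c(4, 6, 8, 10, 12,  5, 5, 7, 9, 14,  10, 12, 15, 18, 20)
)

# Five-number summary, mean and coefficient of variation (sd / mean) of one group.
describe <- function(x) {
  c(setNames(fivenum(x), c("min", "q1", "median", "q3", "max")),
    mean = mean(x),
    cv = sd(x) / mean(x))
}

# Apply the summary to each time window and stack the results into a matrix,
# one row per category of Time.
stats_by_time <- do.call(
  rbind,
  lapply(split(expense$Expenditure, expense$Time), describe)
)
print(round(stats_by_time, 3))
```

Note that `fivenum()` returns Tukey's hinges, which may differ slightly from the quartiles given by `quantile()` or reported by Radiant.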

Exercise 2
Consider the histogram

1. Compute the modal class.


SOL The modal class is [4,5). Since the classes have equal width, the modal class is simply
the class with maximum frequency.
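
The rule used in this solution can be sketched in R as follows. The class limits and frequencies below are hypothetical (the histogram is not reproduced here) and were chosen only so that the result matches the class named in the solution. The code compares densities (frequency divided by class width), which reduces to comparing frequencies when all widths are equal.

```r
# Hypothetical class limits and frequencies; NOT read from the exercise's histogram.
breaks <- 0:6                      # classes [0,1), [1,2), ..., [5,6)
counts <- c(2, 5, 9, 14, 30, 20)   # invented frequency of each class

# Density = relative frequency / class width. With equal widths the ranking
# of the densities is the same as the ranking of the raw counts.
widths  <- diff(breaks)
density <- counts / (sum(counts) * widths)

# The modal class is the one with the highest density.
i <- which.max(density)
modal_class <- sprintf("[%g,%g)", breaks[i], breaks[i + 1])
print(modal_class)
```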

2. Give a general definition of mean and median of a sample of observations. Discuss their
different usage as central tendency measures.
SOL The mean is the sum of the observations divided by their number: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

The median is found by ordering the observations and taking the observation in position
(n + 1)/2 when n is odd, or the average of the two middle observations (positions n/2 and
n/2 + 1) when n is even. The mean is very sensitive to outliers, while the median is a more
robust index. Generally, one would prefer the median in the presence of outliers. Also, if the
distribution is left (right) skewed, the mean will tend to be smaller (greater) than the median.
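
As an illustration, the positional definition of the median and the effect of an outlier can be checked with a few lines of R. The helper `median_by_position` and the sample values are hypothetical and serve only as a sketch of the definitions above.

```r
# Median computed directly from the definition: sort, then pick the middle
# observation (n odd) or average the two middle observations (n even).
median_by_position <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]
  } else {
    mean(s[c(n / 2, n / 2 + 1)])
  }
}

x     <- c(2, 3, 4, 5, 6)   # invented sample without outliers
x_out <- c(x, 60)           # same sample plus one large outlier

# The outlier moves the mean a lot and the median only slightly.
cat("mean:  ", mean(x), "->", mean(x_out), "\n")
cat("median:", median_by_position(x), "->", median_by_position(x_out), "\n")
```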

3. What can you say about the shape of the distribution from the plot above? Which of the
following pairs of values for the mean and the median seems more reasonable for such a
distribution?

a) Median=4.3, mean=4.17;
b) Median=4.17, mean=4.17;
c) Median=4.17, mean=4.3.

SOL The distribution is left skewed, so we expect the mean to be smaller than the median:
option a) is the most plausible of the three.
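
The direction of this inequality can be checked on a small example. The sample below is hypothetical (it is not the data behind the histogram); it has a long left tail, so the few small values pull the mean below the median.

```r
# Invented left-skewed sample: most values near 4-5, a few much smaller ones.
x_left <- c(0.5, 2.0, 3.5, 4.0, 4.2, 4.4, 4.5, 4.6, 4.8, 4.9)

sample_mean   <- mean(x_left)
sample_median <- median(x_left)

# A negative difference means the mean lies below the median.
cat("mean =", sample_mean, " median =", sample_median,
    " mean - median =", sample_mean - sample_median, "\n")
```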

Exercise 3

The following table contains information regarding 100 pizzerias located in three different
Italian cities:

1. What is the proportion of pizzerias that are in Pavia and have a smoking area?
2. What proportion of the pizzerias in Pavia have a smoking area?
3. What proportion of the pizzerias with a smoking area are located in Pavia?
4. Use R‐Radiant to create a side‐by‐side plot of the distribution of SmokingArea within
each District and a stacked bar plot of the conditional distribution of District given the
presence or absence of SmokingArea. Do these two variables appear to be independent?

SOL
1. We want to know, out of the total n = 100 pizzerias, what proportion is in Pavia and has
a smoking area. In other words, we want the joint (relative) frequency:
Fr(District = Pavia, SmokingArea = Yes) = 14 / 100 = 0.14.
2. We want to know, out of the total n_Pavia = 32 pizzerias in Pavia, what proportion has a
smoking area. In other words, restricting attention to pizzerias in Pavia, what
proportion of them has a smoking area, so we want the conditional frequency:
Fr(SmokingArea = Yes | District = Pavia) = 14 / 32 = 0.4375.

3. We want to know, out of the total n_Smoking = 49 pizzerias with a smoking area, what
proportion is in Pavia. In other words, restricting attention to pizzerias with a
smoking area, what proportion of them is in Pavia, so we want the conditional
frequency: Fr(District = Pavia | SmokingArea = Yes) = 14 / 49 ≈ 0.2857.

4. See the file “commands_PS3.R”
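
The three frequencies in points 1 to 3 can be reproduced in R with `prop.table()`. This is only a sketch: the full table is not reproduced here, so the two other cities are collapsed into a single hypothetical row “Other”, whose counts follow from the totals quoted above (32 pizzerias in Pavia, 49 with a smoking area, 100 overall, 14 in both groups).

```r
# Counts derived from the totals in the solution; "Other" is an assumed row
# that merges the two cities other than Pavia.
counts <- matrix(
  c(14, 18,    # Pavia: 14 with a smoking area, 32 - 14 = 18 without
    35, 33),   # Other: 49 - 14 = 35 with, 100 - 32 - 35 = 33 without
  nrow = 2, byrow = TRUE,
  dimnames = list(District = c("Pavia", "Other"),
                  SmokingArea = c("Yes", "No"))
)

joint          <- prop.table(counts)      # cells divided by the grand total
given_district <- prop.table(counts, 1)   # each row sums to 1
given_smoking  <- prop.table(counts, 2)   # each column sums to 1

cat("Fr(Pavia, Yes)  =", joint["Pavia", "Yes"], "\n")
cat("Fr(Yes | Pavia) =", given_district["Pavia", "Yes"], "\n")
cat("Fr(Pavia | Yes) =", round(given_smoking["Pavia", "Yes"], 4), "\n")
```

The margin argument decides which total is used as the denominator: none for joint frequencies, 1 for conditioning on the row variable, 2 for conditioning on the column variable.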
