Presentation Assignment

Zuber

4/27/2022

Import the data and load the required packages


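# tidyverse also attaches dplyr (for %>%); corrplot draws the correlation plot
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(corrplot))
# file.choose() opens an interactive dialog for selecting the CSV file
df = read.csv(file.choose(), header = TRUE, stringsAsFactors = TRUE)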

INTRODUCTION
The primary goal of this study is to identify factors that affect the quantity and unit
price of tomatoes. To find these factors, we use exploratory data analysis, Pearson’s
correlation, and analysis of variance (ANOVA). The research questions are:
i. Does tomato sub-unit have any impact on the tomato quantity?

ii. Is there a relationship between tomato sub-unit and unit price of the tomato?

iii. Is there a relationship between unit price and tomato variety?

iv. Is there a relationship between tomato variety and tomato quantity?

Description of the data


str(df)

## 'data.frame': 331 obs. of 9 variables:
## $ Date : Factor w/ 78 levels "10/1/2019","10/1/2021",..: 26 74 10 28 43 43 56 58 55 26 ...
## $ Invoice : int 21048 21114 21160 16016 16032 16032 18057 19078 21055 21049 ...
## $ Client : Factor w/ 1 level "HFS": 1 1 1 1 1 1 1 1 1 1 ...
## $ Sub_Unit : Factor w/ 14 levels "Altoona","BJC",..: 10 3 2 10 4 4 5 11 13 13 ...
## $ Crop : Factor w/ 2 levels "tomatoes","Tomatoes": 2 2 2 2 2 2 2 2 2 2 ...
## $ Variety : Factor w/ 9 levels "Cherry","Grape",..: 1 1 9 4 4 9 1 1 9 1 ...
## $ Quantity : num 5 5 5 5 5 5 5 5 5.2 5.7 ...
## $ Units : Factor w/ 13 levels "0.76","0.81",..: 12 12 12 13 13 13 13 13 12 12 ...
## $ Unit_Price: num 2.1 2.53 1.25 0.69 2.54 0.61 1.48 1.87 1.1 2.1 ...
According to the structure of the data frame, the data set has 331 observations of 9
variables. Most of them are categorical: Date, Client, Sub_Unit, Crop, Variety, and Units are
stored as factors. Only Invoice, Quantity, and Unit_Price are numeric.
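Some labels in the `str()` output differ only in capitalization, for example `"tomatoes"` and `"Tomatoes"` in Crop. The Tukey tables further down also list `"Cafe Laura"` and `"Café Laura"`, and `"slicers"` and `"Slicers"`, as separate groups. This report uses the labels exactly as recorded. The following is a minimal sketch of how such labels could be merged before modelling. It uses a small hypothetical vector rather than `df`, and the helper name `harmonize_labels` is illustrative only.

```r
# Hypothetical helper: collapse labels that differ only by letter case or by
# the accented "é" (written as \u00e9 so the script stays ASCII-only)
harmonize_labels <- function(x) {
  x <- as.character(x)
  x <- gsub("\u00e9", "e", x, fixed = TRUE)  # "Café" -> "Cafe"
  x <- tolower(trimws(x))                    # "Tomatoes" -> "tomatoes"
  factor(x)
}

# Made-up values mirroring the inconsistent levels seen in df
sub_unit <- factor(c("Cafe Laura", "Caf\u00e9 Laura", "Pollock"))
crop     <- factor(c("tomatoes", "Tomatoes", "Tomatoes"))

levels(harmonize_labels(sub_unit))  # two levels instead of three
levels(harmonize_labels(crop))      # one level instead of two
```

Applied to the real data this would be, for example, `df$Sub_Unit <- harmonize_labels(df$Sub_Unit)`. That change would alter the group counts and therefore the ANOVA and Tukey results reported below.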
First 6 observations of the data
head(df)

##        Date Invoice Client   Sub_Unit     Crop  Variety Quantity  Units Unit_Price
## 1 7/30/2021   21048    HFS    Pollock Tomatoes   Cherry        5  pound       2.10
## 2  9/3/2021   21114    HFS Cafe Laura Tomatoes   Cherry        5  pound       2.53
## 3 10/5/2021   21160    HFS        BJC Tomatoes  Slicers        5  pound       1.25
## 4 8/11/2016   16016    HFS    Pollock Tomatoes Heirloom        5 pounds       0.69
## 5 8/25/2016   16032    HFS Café Laura Tomatoes Heirloom        5 pounds       2.54
## 6 8/25/2016   16032    HFS Café Laura Tomatoes  Slicers        5 pounds       0.61

Exploratory Data Analysis


Boxplot of Quantity by Sub_Unit
df %>%
  ggplot(aes(Quantity, fill = Sub_Unit)) +
  geom_boxplot() +
  labs(title = "Boxplot of Quantity by Sub_Unit") +
  theme_bw()
There are only a few outliers, according to the boxplot. Within most sub-units the quantities
look skewed rather than normally distributed. The median quantities also differ between
sub-units, which suggests considerable variation between sub-units.
Boxplot of Unit_Price by Sub_Unit
df %>%
  ggplot(aes(Unit_Price, fill = Sub_Unit)) +
  geom_boxplot() +
  labs(title = "Boxplot of Unit_Price by Sub_Unit") +
  theme_bw()
According to the boxplot, Unit_Price is right-skewed, and on the original scale there is no
obvious evidence of a relationship between Sub_Unit and Unit_Price. To reduce the skew and
make the groups easier to compare, we transform Unit_Price with the natural logarithm.
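The sketch below illustrates why a log transform helps with a long right tail. It uses simulated right-skewed "prices", not the report's data. The moment-based `skewness` helper is written here for illustration and is not taken from any package. `log()` is only defined for positive values, so the transform assumes Unit_Price contains no zeros or negatives.

```r
# Moment-based sample skewness: mean cubed deviation scaled by the SD cubed
skewness <- function(x) {
  x <- x[is.finite(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(1)
# Simulated right-skewed prices (hypothetical, log-normally distributed)
price <- rlnorm(500, meanlog = 0, sdlog = 0.8)

skew_raw <- skewness(price)       # large and positive: long right tail
skew_log <- skewness(log(price))  # near zero: roughly symmetric after log
c(raw = skew_raw, log = skew_log)
```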
Boxplot of log(Unit_Price) by Sub_Unit
df %>%
  ggplot(aes(log(Unit_Price), fill = Sub_Unit)) +
  geom_boxplot() +
  labs(title = "Boxplot of log(Unit_Price) by Sub_Unit") +
  theme_bw()
After the log transformation, the medians of log(Unit_Price) appear to differ across
sub-units, which suggests a possible relationship between Sub_Unit and Unit_Price. The ANOVA
section below tests whether this difference is statistically significant.
Boxplot of log(Unit_Price) by Variety
df %>%
  ggplot(aes(log(Unit_Price), fill = Variety)) +
  geom_boxplot() +
  labs(title = "Boxplot of log(Unit_Price) by Variety") +
  theme_bw()
The boxplot above suggests that the typical (median) log unit price differs noticeably
between tomato varieties.
Boxplot of Quantity based on Sub_Unit.
df %>%
  ggplot(aes(Quantity, fill = Sub_Unit)) +
  geom_boxplot() +
  labs(title = "Boxplot of Quantity by Sub_Unit") +
  theme_bw()
According to this boxplot, the median quantity differs substantially between tomato
sub-units.
Boxplot of Quantity based on Variety.
df %>%
  ggplot(aes(Quantity, fill = Variety)) +
  geom_boxplot() +
  labs(title = "Boxplot of Quantity by Variety") +
  theme_bw()
According to the boxplot, the median quantity also differs substantially between tomato
varieties.
Histogram of Quantity
df %>%
  ggplot(aes(Quantity)) +
  # fixed colours belong outside aes(); inside aes() they are treated as data mappings
  geom_histogram(bins = 10, fill = "red", colour = "black") +
  labs(title = "Histogram of Quantity") +
  theme_bw()
According to the histogram, most quantities fall between 10 and 20. The distribution is
positively skewed, with a long right tail, so Quantity does not appear to follow a normal
distribution.
Histogram of Unit_Price
df %>%
  ggplot(aes(Unit_Price)) +
  # fixed colours belong outside aes(); inside aes() they are treated as data mappings
  geom_histogram(bins = 10, fill = "red", colour = "black") +
  labs(title = "Histogram of Unit_Price") +
  theme_bw()
The histogram of Unit_Price is also positively skewed, so Unit_Price does not appear to
follow a normal distribution either.
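The histograms give a visual impression of non-normality. A formal check is the Shapiro-Wilk test from base R's `stats` package, whose null hypothesis is that the data come from a normal distribution. The sketch below applies it to a simulated right-skewed sample, which is a hypothetical stand-in for Quantity or Unit_Price. On the real data the equivalent call would be `shapiro.test(df$Quantity)`; the test accepts between 3 and 5000 values, so 331 rows are within range.

```r
set.seed(2)
# Hypothetical right-skewed sample standing in for Quantity or Unit_Price
x <- rlnorm(200, meanlog = 0, sdlog = 0.8)

sw_raw <- shapiro.test(x)       # H0: the sample comes from a normal distribution
sw_log <- shapiro.test(log(x))  # same test after a log transform

sw_raw$p.value  # a small p-value is evidence against normality
sw_log$p.value
```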

Correlation
Pearson’s correlation coefficient measures the direction and strength of the linear
relationship between two numeric variables.
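Pearson's r is the covariance of two variables divided by the product of their standard deviations. The sketch below uses small made-up vectors, not the report's data, to confirm that `cor()` matches this formula. `cor.test()` additionally tests the null hypothesis that the true correlation is zero.

```r
# Made-up paired measurements (hypothetical)
quantity   <- c(5, 5, 5.2, 5.7, 10, 12, 15, 20)
unit_price <- c(2.10, 2.53, 1.10, 2.10, 1.80, 1.50, 1.40, 1.20)

# Pearson's r: covariance scaled by the two standard deviations
r_manual  <- cov(quantity, unit_price) / (sd(quantity) * sd(unit_price))
r_builtin <- cor(quantity, unit_price, method = "pearson")

# t-test of H0: true correlation = 0
ct <- cor.test(quantity, unit_price, method = "pearson")
c(r_manual = r_manual, r_builtin = r_builtin, p_value = ct$p.value)
```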
Correlation heatmap
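# Pearson correlation matrix of the numeric columns (columns 2, 7 and 9 of df)
Corr = cor(df[, c("Invoice", "Quantity", "Unit_Price")])
corrplot(Corr, method = "number")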
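According to the correlation findings, there is no substantial association between the
numeric variables; the predictor variable has only a slight negative association with the
target variables. Because Invoice is an identifier rather than a measurement, its
correlations carry little substantive meaning. The weak correlations indicate that a linear
regression on these numeric variables alone would explain little of the variation in
quantity or unit price.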
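For a simple linear regression with an intercept, R² equals the square of Pearson's r. A weak correlation therefore translates directly into a model that explains little variance: for example, r = -0.2 gives R² = 0.04. The sketch below checks this identity on simulated, weakly related data, which is hypothetical and not from the report.

```r
set.seed(3)
# Hypothetical weakly related variables
x <- rnorm(100)
y <- -0.1 * x + rnorm(100)

r  <- cor(x, y)
r2 <- summary(lm(y ~ x))$r.squared  # share of the variance in y explained by x

c(r = r, r_squared = r^2, lm_R2 = r2)
```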

ANOVA
In this part, we use one-way ANOVA to test whether the mean of a target variable differs
across the levels of a categorical variable, and hence which variables have a significant
influence on our target variables.
1) Hypothesis testing on research question iv
H0: There is no significant relationship between tomato quantity and tomato variety.
H1: There is a significant relationship between tomato quantity and tomato variety.
Result:
# attach() lets the columns of df be referred to by name in the formulas below
attach(df)
summary(aov(log(Quantity)~Variety))

## Df Sum Sq Mean Sq F value Pr(>F)
## Variety 8 5.23 0.6538 1.843 0.0686 .
## Residuals 322 114.26 0.3548
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Because the p-value of 0.0686 is larger than the significance level of 0.05, we fail to
reject H0: at the 5% significance level, the data do not provide evidence of a relationship
between tomato quantity and tomato variety. The result would be significant only at the 10%
level.
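The F value in the ANOVA table is the ratio of the between-group mean square to the residual mean square. The p-value is the upper-tail probability of an F distribution with the corresponding degrees of freedom. The sketch below recomputes both from the printed Df and Sum Sq for `log(Quantity) ~ Variety`. Because the inputs are rounded, the results match the table only approximately.

```r
# Values copied from the ANOVA table for log(Quantity) ~ Variety
df_between <- 8;   ss_between <- 5.23
df_within  <- 322; ss_within  <- 114.26

ms_between <- ss_between / df_between  # mean square for Variety
ms_within  <- ss_within / df_within    # residual mean square
f_value    <- ms_between / ms_within

# Upper-tail probability of the F(8, 322) distribution
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)
round(c(F = f_value, p = p_value), 4)
```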
2) Hypothesis testing on research question i
Hypothesis
H0: There is no relationship between tomato quantity and tomato sub-unit.
H1: There is a relationship between tomato quantity and tomato sub-unit.
Result:
summary(aov(log(Quantity)~Sub_Unit))

## Df Sum Sq Mean Sq F value Pr(>F)
## Sub_Unit 13 13.75 1.0578 3.171 0.000171 ***
## Residuals 317 105.74 0.3336
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Because the p-value of 0.000171 is smaller than the significance level of 0.05, we reject H0:
there is a significant relationship between tomato quantity and tomato sub-unit at the 5%
significance level (95% confidence), implying that sub-units have a significant impact on
tomato quantity.
TukeyHSD Post hoc
TukeyHSD(aov(log(Quantity)~Sub_Unit))

## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = log(Quantity) ~ Sub_Unit)
##
## $Sub_Unit
## diff lwr upr p adj
## BJC-Altoona -4.658149e-01 -1.605616614 0.67398680 0.9834676
## Cafe Laura-Altoona -5.846038e-01 -1.462023576 0.29281595 0.5916919
## Café Laura-Altoona -1.779447e-01 -1.168691153 0.81280165 0.9999977
## Findlay-Altoona 4.083486e-01 -0.260025988 1.07672313 0.7245905
## HUB-Altoona 2.862238e-01 -1.214394886 1.78684247 0.9999952
## HUB Food Court-Altoona -5.491965e-01 -2.588135795 1.48974276 0.9997428
## McAlister's-Altoona 2.701769e-01 -0.782726863 1.32308057 0.9998497
## Penn Stater-Altoona 3.230598e-01 -0.476678000 1.12279756 0.9850711
## Pollock-Altoona 2.594279e-01 -0.403434497 0.92229039 0.9887460
## Redifer-Altoona 2.162148e-01 -0.398338495 0.83076816 0.9958610
## Sbarro-Altoona -4.658149e-01 -2.504754186 1.57312437 0.9999598
## Waring-Altoona 1.491683e-01 -0.535937221 0.83427382 0.9999770
## Warnock-Altoona -3.338572e-02 -0.687236633 0.62046520 1.0000000
## Cafe Laura-BJC -1.187889e-01 -1.291876488 1.05429868 1.0000000
## Café Laura-BJC 2.878702e-01 -0.972228041 1.54796836 0.9999598
## Findlay-BJC 8.741635e-01 -0.151991279 1.90031824 0.1922529
## HUB-BJC 7.520387e-01 -0.938560441 2.44263784 0.9662702
## HUB Food Court-BJC -8.338161e-02 -2.265935713 2.09917249 1.0000000
## McAlister's-BJC 7.359918e-01 -0.573540702 2.04552422 0.8254095
## Penn Stater-BJC 7.888747e-01 -0.327301620 1.90505100 0.4906947
## Pollock-BJC 7.252429e-01 -0.297330202 1.74781591 0.4846096
## Redifer-BJC 6.820297e-01 -0.309910192 1.67396967 0.5384386
## Sbarro-BJC 1.021405e-14 -2.182554104 2.18255410 1.0000000
## Waring-BJC 6.149832e-01 -0.422146779 1.65211319 0.7640649
## Warnock-BJC 4.324292e-01 -0.584325477 1.44918386 0.9767197
## Café Laura-Cafe Laura 4.066591e-01 -0.622206808 1.43552494 0.9877018
## Findlay-Cafe Laura 9.929524e-01 0.269273522 1.71663125 0.0004344
## HUB-Cafe Laura 8.708276e-01 -0.655227100 2.39688231 0.8092497
## HUB Food Court-Cafe Laura 3.540730e-02 -2.022324446 2.09313904 1.0000000
## McAlister's-Cafe Laura 8.547807e-01 -0.234068625 1.94362996 0.3113599
## Penn Stater-Cafe Laura 9.076636e-01 0.061160752 1.75416644 0.0228862
## Pollock-Cafe Laura 8.440318e-01 0.125440664 1.56262285 0.0066902
## Redifer-Cafe Laura 8.008186e-01 0.126532106 1.47510518 0.0056608
## Sbarro-Cafe Laura 1.187889e-01 -1.938942837 2.17652065 1.0000000
## Waring-Cafe Laura 7.337721e-01 -0.005386949 1.47293118 0.0538697
## Warnock-Cafe Laura 5.512181e-01 -0.159068863 1.26150506 0.3300599
## Findlay-Café Laura 5.862933e-01 -0.271275390 1.44386203 0.5482007
## HUB-Café Laura 4.641685e-01 -1.129743614 2.05808069 0.9994009
## HUB Food Court-Café Laura -3.712518e-01 -2.479799353 1.73729582 0.9999982
## McAlister's-Café Laura 4.481216e-01 -0.733955288 1.63019849 0.9915927
## Penn Stater-Café Laura 5.010045e-01 -0.462468385 1.46447745 0.8929451
## Pollock-Café Laura 4.373727e-01 -0.415906953 1.29065234 0.9028403
## Redifer-Café Laura 3.941596e-01 -0.422158575 1.21047774 0.9364201
## Sbarro-Café Laura -2.878702e-01 -2.396417744 1.82067743 0.9999999
## Waring-Café Laura 3.271130e-01 -0.543558597 1.19778470 0.9922664
## Warnock-Café Laura 1.445590e-01 -0.701739119 0.99085719 0.9999987
## HUB-Findlay -1.221248e-01 -1.538352894 1.29410333 1.0000000
## HUB Food Court-Findlay -9.575451e-01 -2.935200021 1.02010985 0.9351182
## McAlister's-Findlay -1.381717e-01 -1.066854597 0.79051116 0.9999998
## Penn Stater-Findlay -8.528879e-02 -0.712525726 0.54194815 0.9999999
## Pollock-Findlay -1.489206e-01 -0.588502555 0.29066130 0.9971450
## Redifer-Findlay -1.921337e-01 -0.554792597 0.17052512 0.8789438
## Sbarro-Findlay -8.741635e-01 -2.851818412 1.10349146 0.9679616
## Waring-Findlay -2.591803e-01 -0.731636862 0.21327632 0.8491052
## Warnock-Findlay -4.417343e-01 -0.867605954 -0.01586262 0.0337375
## HUB Food Court-HUB -8.354203e-01 -3.226288539 1.55544792 0.9961283
## McAlister's-HUB -1.604694e-02 -1.649320875 1.61722700 1.0000000
## Penn Stater-HUB 3.683599e-02 -1.445917541 1.51958952 1.0000000
## Pollock-HUB -2.679585e-02 -1.440430924 1.38683923 1.0000000
## Redifer-HUB -7.000896e-02 -1.461645877 1.32162796 1.0000000
## Sbarro-HUB -7.520387e-01 -3.142906930 1.63882953 0.9986540
## Waring-HUB -1.370555e-01 -1.561256001 1.28714502 1.0000000
## Warnock-HUB -3.196095e-01 -1.729041499 1.08982248 0.9999631
## McAlister's-HUB Food Court 8.193734e-01 -1.319084187 2.95783093 0.9907241
## Penn Stater-HUB Food Court 8.722563e-01 -1.153570694 2.89808329 0.9741895
## Pollock-HUB Food Court 8.086245e-01 -1.167174390 2.78442331 0.9832603
## Redifer-HUB Food Court 7.654113e-01 -1.194708617 2.72553131 0.9889737
## Sbarro-HUB Food Court 8.338161e-02 -2.677355225 2.84411844 1.0000000
## Waring-HUB Food Court 6.983648e-01 -1.285007075 2.68173671 0.9958283
## Warnock-HUB Food Court 5.158108e-01 -1.456983032 2.48860463 0.9998148
## Penn Stater-McAlister's 5.288293e-02 -0.974398852 1.08016471 1.0000000
## Pollock-McAlister's -1.074891e-02 -0.935472626 0.91397481 1.0000000
## Redifer-McAlister's -5.396202e-02 -0.944693816 0.83676977 1.0000000
## Sbarro-McAlister's -7.359918e-01 -2.874449317 1.40246580 0.9966625
## Waring-McAlister's -1.210086e-01 -1.061804446 0.81978734 1.0000000
## Warnock-McAlister's -3.035626e-01 -1.221848124 0.61472299 0.9977657
## Pollock-Penn Stater -6.363184e-02 -0.684991830 0.55772816 1.0000000
## Redifer-Penn Stater -1.068450e-01 -0.676386359 0.46269646 0.9999961
## Sbarro-Penn Stater -7.888747e-01 -2.814701682 1.23695230 0.9892480
## Waring-Penn Stater -1.738915e-01 -0.818927294 0.47114433 0.9997404
## Warnock-Penn Stater -3.564455e-01 -0.968182897 0.25529190 0.7854534
## Redifer-Pollock -4.321311e-02 -0.395609925 0.30918370 1.0000000
## Sbarro-Pollock -7.252429e-01 -2.701041704 1.25055600 0.9937837
## Waring-Pollock -1.102596e-01 -0.574885619 0.35436633 0.9999379
## Warnock-Pollock -2.928137e-01 -0.709981184 0.12435386 0.5027121
## Sbarro-Redifer -6.820297e-01 -2.642149704 1.27809023 0.9962844
## Waring-Redifer -6.704653e-02 -0.459686915 0.32559385 0.9999987
## Warnock-Redifer -2.496005e-01 -0.584739201 0.08553811 0.3994621
## Waring-Sbarro 6.149832e-01 -1.368388684 2.59835510 0.9988397
## Warnock-Sbarro 4.324292e-01 -1.540364641 2.40522302 0.9999751
## Warnock-Waring -1.825540e-01 -0.634230560 0.26912253 0.9850017

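According to the Tukey post-hoc test, the pairs of sub-units whose mean log(Quantity) differs
significantly (adjusted p < 0.05) are Findlay and Cafe Laura, Penn Stater and Cafe Laura,
Pollock and Cafe Laura, Redifer and Cafe Laura, and Warnock and Findlay. In the first four
pairs Cafe Laura has the lower mean, and in the last pair Warnock is lower than Findlay.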
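The Tukey table above has 91 rows, so scanning it by eye makes it easy to miss a significant pair. `TukeyHSD()` returns a list with one matrix per factor, with columns `diff`, `lwr`, `upr` and `p adj`, so the significant rows can be filtered directly. The sketch below uses R's built-in `PlantGrowth` data as a stand-in for `df`. For this report the matrix would be `TukeyHSD(aov(log(Quantity) ~ Sub_Unit))$Sub_Unit`.

```r
# PlantGrowth ships with base R: plant weights for a control and two treatments
fit <- aov(weight ~ group, data = PlantGrowth)
tk  <- TukeyHSD(fit)

# One matrix per factor, with columns diff, lwr, upr and p adj
tk_pairs <- tk$group

# Keep only the pairs with adjusted p < 0.05; drop = FALSE keeps a matrix
significant <- tk_pairs[tk_pairs[, "p adj"] < 0.05, , drop = FALSE]
significant
```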
3) Hypothesis testing on research question iii
H0: There is no significant relationship between tomato unit price and tomato variety.
H1: There is a significant relationship between tomato unit price and tomato variety.
Result:
summary(aov(log(Unit_Price)~Variety))

## Df Sum Sq Mean Sq F value Pr(>F)
## Variety 8 54.02 6.752 11.73 1.17e-14 ***
## Residuals 322 185.39 0.576
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Because the p-value of 1.17e-14 is far below the significance level of 0.05, we reject H0:
there is a significant relationship between tomato unit price and tomato variety at the 5%
significance level, showing that tomato variety has a substantial influence on price.
TukeyHSD Post hoc
TukeyHSD(aov(log(Unit_Price)~Variety))

## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = log(Unit_Price) ~ Variety)
##
## $Variety
## diff lwr upr p adj
## Grape-Cherry 0.07631996 -0.45711238 0.60975229 0.9999570
## Green-Cherry -0.31696562 -1.69979722 1.06586597 0.9985522
## Heirloom-Cherry 0.34322758 -0.09992525 0.78638040 0.2772887
## Hydroponic-Cherry -0.12735612 -2.50556374 2.25085151 1.0000000
## Mix-Cherry -1.06624668 -3.44445430 1.31196094 0.8972542
## Roma-Cherry -0.44909063 -1.07468323 0.17650198 0.3811945
## slicers-Cherry -0.24583772 -2.62404534 2.13236990 0.9999965
## Slicers-Cherry -0.76032789 -1.06197159 -0.45868418 0.0000000
## Green-Grape -0.39328558 -1.84792475 1.06135359 0.9953780
## Heirloom-Grape 0.26690762 -0.36565567 0.89947091 0.9255530
## Hydroponic-Grape -0.20367607 -2.62434197 2.21698982 0.9999993
## Mix-Grape -1.14256664 -3.56323253 1.27809926 0.8668435
## Roma-Grape -0.52541059 -1.29684950 0.24602833 0.4567263
## slicers-Grape -0.32215768 -2.74282357 2.09850822 0.9999753
## Slicers-Grape -0.83664784 -1.37954822 -0.29374747 0.0000795
## Heirloom-Green 0.66019320 -0.76381856 2.08420495 0.8781700
## Hydroponic-Green 0.18960951 -2.54668297 2.92590199 0.9999999
## Mix-Green -0.74928106 -3.48557354 1.98701142 0.9949564
## Roma-Green -0.13212501 -1.62302781 1.35877779 0.9999990
## slicers-Green 0.07112790 -2.66516458 2.80742038 1.0000000
## Slicers-Green -0.44336226 -1.82987371 0.94314918 0.9858421
## Hydroponic-Heirloom -0.47058369 -2.87296954 1.93180215 0.9995395
## Mix-Heirloom -1.40947425 -3.81186010 0.99291159 0.6606229
## Roma-Heirloom -0.79231820 -1.50432408 -0.08031233 0.0166991
## slicers-Heirloom -0.58906529 -2.99145114 1.81332055 0.9976603
## Slicers-Heirloom -1.10355546 -1.55806089 -0.64905003 0.0000000
## Mix-Hydroponic -0.93889056 -4.29015075 2.41236962 0.9941062
## Roma-Hydroponic -0.32173451 -2.76436413 2.12089510 0.9999772
## slicers-Hydroponic -0.11848160 -3.46974179 3.23277858 1.0000000
## Slicers-Hydroponic -0.63297177 -3.01332096 1.74737741 0.9958773
## Roma-Mix 0.61715605 -1.82547356 3.05978566 0.9971112
## slicers-Mix 0.82040896 -2.53085122 4.17166914 0.9976867
## Slicers-Mix 0.30591879 -2.07443039 2.68626798 0.9999812
## slicers-Roma 0.20325291 -2.23937670 2.64588252 0.9999994
## Slicers-Roma -0.31123726 -0.94492241 0.32244789 0.8387949
## Slicers-slicers -0.51449017 -2.89483935 1.86585902 0.9990526

According to the Tukey post-hoc test, unit prices differ significantly (adjusted p < 0.05)
between the following pairs of varieties: Slicers and Cherry, Slicers and Grape, Roma and
Heirloom, and Slicers and Heirloom. In each pair the first-named variety has the lower mean
log unit price.
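The ANOVA was fitted to log(Unit_Price), so each Tukey `diff` is a difference of mean logs. Exponentiating a difference turns it into a ratio of geometric-mean unit prices, which is easier to read. The sketch below applies this to the four significant pairs, with the differences copied from the table above. The ratio interpretation assumes the log-scale model is adequate.

```r
# Differences in mean log(Unit_Price), copied from the Tukey table above
log_diff <- c(
  "Slicers-Cherry"   = -0.76032789,
  "Slicers-Grape"    = -0.83664784,
  "Roma-Heirloom"    = -0.79231820,
  "Slicers-Heirloom" = -1.10355546
)

# exp(difference) = ratio of geometric-mean unit prices (first / second variety)
price_ratio <- exp(log_diff)
round(price_ratio, 2)
```

For example, exp(-0.760) is about 0.47, suggesting that Slicers cost roughly half as much per unit as Cherry tomatoes. Slicers versus Heirloom gives about 0.33.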
4) Hypothesis testing on research question ii
H0: There is no significant relationship between tomato unit price and tomato sub-unit.
H1: There is a significant relationship between tomato unit price and tomato sub-unit.
Result:
summary(aov(log(Unit_Price)~Sub_Unit))

## Df Sum Sq Mean Sq F value Pr(>F)
## Sub_Unit 13 10.08 0.7754 1.072 0.383
## Residuals 317 229.33 0.7234

Because the p-value of 0.383 is greater than the significance level of 0.05, we fail to
reject H0: at the 5% significance level, the data do not provide evidence that tomato
sub-unit influences unit price.

Conclusion
To summarise, tomato variety has a significant effect on unit price: prices differ
significantly between Slicers and Cherry, Slicers and Grape, Roma and Heirloom, and Slicers
and Heirloom. Variety should therefore be taken into account when growing tomatoes, because
it has a considerable impact on price. Tomato sub-unit has a significant effect on quantity,
with significant differences between Findlay and Cafe Laura, Penn Stater and Cafe Laura,
Pollock and Cafe Laura, Redifer and Cafe Laura, and Warnock and Findlay. However, at the 5%
significance level we found no significant effect of sub-unit on unit price, or of variety
on quantity.
