R Session - Note2 - Updated

STAL2073 – Introductory Notes #2 on R – updated Nov 1, 2022
Task #3: Data Exploration with Graphics
1) Stem-leaf plot
Consider the following data:
141 148 132 138 154 142 150 146 155 158 150
140 157 148 144 150 149 145 149 159 143 141
144 144 126
Enter these data into R using “scan()” and store them as data1. After typing the
command, type or paste the values, then press Enter on an empty line to finish.
> data1=scan()
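When these notes are run as a script rather than typed at the console, an interactive scan() has no keyboard input to read. A minimal sketch of a non-interactive alternative, assuming the values are pasted into a string and passed through the text argument of scan():

```r
# Read the 25 observations from a string instead of the keyboard.
data1 <- scan(text = "
141 148 132 138 154 142 150 146 155 158 150
140 157 148 144 150 149 145 149 159 143 141
144 144 126")

length(data1)  # number of values read
range(data1)   # smallest and largest value
```

Checking length() and range() right after input is a quick way to catch a missed or mistyped value.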
2) Create a stem-leaf plot for data1
> stem(data1)
3) Define data2=data1/10 and data3=data1/100. Create stem-leaf plots for data2 and
data3. Compare these stem-leaf plots with the one for data1 and note the differences.
4) Input the data below into R and name it data2 (this replaces the data2 defined in step 3)
6.68 3.22 5.20 3.91 5.60 3.79 5.97 6.47 8.42 4.61
0.72 3.32 7.71 2.86 6.92 5.25 7.87 1.08 4.61 2.58
10.82 6.65 7.76 2.88 4.06 4.46 7.20 4.48 6.40 0.90
4.29 3.35 1.85 6.01 5.56 5.07 2.33 7.25 5.70 4.40
5.04 4.47 1.50 4.42 3.33 3.04 2.68 3.93 0.99 6.93
6.04 4.96 4.93 3.40 7.03 4.73 3.57 7.70 4.55 3.82
4.41 3.30 2.76 10.05 8.31 5.62 2.49 3.27 4.65 6.58
2.34 0.34 2.10 5.67 5.78 5.90 4.74 5.37 4.08 6.72
2.28 5.91 3.30 4.33 6.10 7.08 2.77 7.52 6.32 4.86
4.61 4.56 4.39 5.05 5.10 6.65 8.05 5.93 4.58 6.25
5) Create a stem-leaf plot for data2. Comment on the shape of the data distribution.
> stem(data2)
6) R comes with a dataset called PlantGrowth. Type “PlantGrowth” to view this
dataset.
> PlantGrowth
weight group
1 4.17 ctrl
2 5.58 ctrl
3 5.18 ctrl
4 6.11 ctrl
5 4.50 ctrl
6 4.61 ctrl
7 5.17 ctrl
8 4.53 ctrl
9 5.33 ctrl
10 5.14 ctrl
11 4.81 trt1
12 4.17 trt1
13 4.41 trt1
14 3.59 trt1
15 5.87 trt1
16 3.83 trt1
17 6.03 trt1
18 4.89 trt1
19 4.32 trt1
20 4.69 trt1
21 6.31 trt2
22 5.12 trt2
23 5.54 trt2
24 5.50 trt2
25 5.37 trt2
26 5.29 trt2
27 4.92 trt2
28 6.15 trt2
29 5.80 trt2
30 5.26 trt2
7) Show the structure of “PlantGrowth” data with “str()” command
> str(PlantGrowth)
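As a small illustration of what to look for in the structure, the sketch below checks the group labels and the number of observations per group. It uses only base R functions and the built-in dataset.

```r
# PlantGrowth ships with R (datasets package), so no input is needed.
str(PlantGrowth)                  # variable names and types

levels(PlantGrowth$group)         # the group labels
table(PlantGrowth$group)          # number of observations in each group
tapply(PlantGrowth$weight, PlantGrowth$group, mean)  # mean weight per group
```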
8) Create a stem-leaf plot for the PlantGrowth data
> stem(PlantGrowth$weight)
9) Dot Diagram: Create a dot diagram using the dotchart() command.
> dotchart(PlantGrowth$weight) # does not look nice, as there is no group labelling
# with group labelling
> dotchart(PlantGrowth$weight, col="red", pch=1, labels=PlantGrowth$group,
  main="group vs weight", xlab="weight")
# still does not look nice, as all groups have the same red colour
10) Create a dot chart for the PlantGrowth data, giving each group a different
colour
# defining a colour for each group
> pg = PlantGrowth
> pg$color[pg$group=="ctrl"] = "red"
> pg$color[pg$group=="trt1"] = "violet"
> pg$color[pg$group=="trt2"] = "blue"
# plotting the dot chart; cex is the character size
> dotchart(pg$weight, labels=pg$group, cex=0.8, groups=pg$group,
  main="group vs weight", xlab="weight", gcolor="black", color=pg$color)
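The three colour assignments can also be written as a single lookup. This is only a sketch of an alternative, not the method used in the notes; the name group_cols is made up for illustration.

```r
pg <- PlantGrowth

# Hypothetical lookup table: one colour per group label.
group_cols <- c(ctrl = "red", trt1 = "violet", trt2 = "blue")

# Index the lookup by each row's group label to get one colour per row.
pg$color <- unname(group_cols[as.character(pg$group)])

dotchart(pg$weight, labels = pg$group, groups = pg$group, cex = 0.8,
         main = "group vs weight", xlab = "weight",
         gcolor = "black", color = pg$color)
```

With a lookup table, adding a fourth group only needs one more entry in group_cols.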
11) Creating a Histogram
We can plot a histogram using the hist() command and let R use its defaults
> hist(data2)
12) hist() has many options for controlling the histogram. For help on hist(), type
> help(hist)
or
> ?hist
13) Let us use hist() on data2 with options
# First create a frequency table; we need to decide the number of classes and the class width
> max(data2)
10.82
> min(data2)
0.34
Let us say we want 8 classes. Round the maximum up to 11 and the minimum down to 0.3,
and find the class width from (11 – 0.3)/8 = 1.3375, rounded up to 1.35
# create “breaks”, the class boundaries
> breaks = seq(0.2, 11, by=1.35)
# Starting at 0.2 gives 0.2 + 8 × 1.35 = 11, so there are 9 break points defining 8 classes.
# Make sure the min and max values lie within those breaks.
> data.cut = cut(data2, breaks, right=FALSE)
# assigns each value to its class; right=FALSE means each class includes its lower
# bound and excludes its upper bound
> data.freq = table(data.cut)
# computes the frequency table
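The frequency-table steps above can be run end to end as the sketch below. The data are read with scan(text = ...) only so that the block runs on its own; the variable names n_classes and width are added for illustration.

```r
# The 100 observations of data2, read from a string.
data2 <- scan(text = "
6.68 3.22 5.20 3.91 5.60 3.79 5.97 6.47 8.42 4.61
0.72 3.32 7.71 2.86 6.92 5.25 7.87 1.08 4.61 2.58
10.82 6.65 7.76 2.88 4.06 4.46 7.20 4.48 6.40 0.90
4.29 3.35 1.85 6.01 5.56 5.07 2.33 7.25 5.70 4.40
5.04 4.47 1.50 4.42 3.33 3.04 2.68 3.93 0.99 6.93
6.04 4.96 4.93 3.40 7.03 4.73 3.57 7.70 4.55 3.82
4.41 3.30 2.76 10.05 8.31 5.62 2.49 3.27 4.65 6.58
2.34 0.34 2.10 5.67 5.78 5.90 4.74 5.37 4.08 6.72
2.28 5.91 3.30 4.33 6.10 7.08 2.77 7.52 6.32 4.86
4.61 4.56 4.39 5.05 5.10 6.65 8.05 5.93 4.58 6.25")

n_classes <- 8
width <- (11 - 0.3) / n_classes      # 1.3375, rounded up to 1.35 in the notes
breaks <- seq(0.2, 11, by = 1.35)    # class boundaries

data.cut  <- cut(data2, breaks, right = FALSE)  # class of each value
data.freq <- table(data.cut)                    # count per class
data.freq

length(breaks) - 1   # number of classes
sum(data.freq)       # should equal length(data2) if no value falls outside the breaks
anyNA(data.cut)      # TRUE would mean some value was not assigned to a class
```

Comparing sum(data.freq) with length(data2) is a simple check that the breaks cover the whole range of the data.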
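14) Plot a histogram for the data using these breaks
> hist(data2, breaks)
# Note: hist() closes classes on the right by default; add right=FALSE to count
# values exactly as cut(..., right=FALSE) does in step 13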
15) Changing “Frequency” on the y-axis to “Probability” (R plots densities, so the bar areas sum to 1)
> hist(data2, breaks, freq=F) # you can use “FALSE” instead of “F”, or “TRUE” instead of “T”
Instead of the option “freq=F”, we can use “prob=T”; the two are equivalent
> hist(data2, breaks, prob=T)
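To see what changes on the y-axis, the sketch below computes a histogram without drawing it and compares the two sets of bar heights. It uses the built-in PlantGrowth weights as a stand-in sample so that the block runs on its own; substitute data2 and the breaks from step 13 as needed.

```r
x <- PlantGrowth$weight        # stand-in sample (assumption; replace with data2)

h <- hist(x, plot = FALSE)     # compute the classes without drawing
h$counts                       # bar heights when freq = TRUE
h$density                      # bar heights when freq = FALSE (prob = TRUE)

# density = count / (n * class width), so the bar areas add up to 1
class_width <- diff(h$breaks)
all.equal(h$density, h$counts / (length(x) * class_width))
sum(h$density * class_width)
```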
16) We can also change the limits of the y-axis using ylim=c() or of the x-axis using
xlim=c()
> hist(data2, breaks, prob=T, ylim=c(0, 0.3))
> hist(data2, breaks, prob=T, ylim=c(0, 0.3), xlim=c(0,12))
17) The number of breaks can also be specified directly; R treats a single number as a suggestion and may choose a nearby number of classes
> hist(data2, breaks=7, prob=T, ylim=c(0, 0.3), xlim=c(0,12))
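A short sketch of the difference between a single number and a vector of break points, again using the PlantGrowth weights as a stand-in sample. The names h, exact and h7 are illustrative only.

```r
x <- PlantGrowth$weight                  # stand-in sample (assumption)

# A single number is only a suggestion for the number of classes.
h <- hist(x, breaks = 7, plot = FALSE)
h$breaks
length(h$breaks) - 1                     # classes actually used; may differ from 7

# A vector of break points is used exactly as given.
exact <- seq(floor(min(x)), ceiling(max(x)), length.out = 8)  # 8 points = 7 classes
h7 <- hist(x, breaks = exact, plot = FALSE)
length(h7$counts)
```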
18) We can also modify the title of the histogram using main="" and label the y-axis
and x-axis using ylab="" and xlab="".
> hist(data2, breaks=7, prob=T, ylim=c(0, 0.3), xlim=c(0,12),
  main="Histogram Example", ylab="probability", xlab="X values")
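19) Creating a Polygon: overlay a smoothed density curve on the histogram. Draw the
histogram with prob=T first (as in step 18), then add the curve with lines().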
> lines(density(data2))
> lines(density(data2), col="red")
> lines(density(data2), col="red", lwd=5)
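The lines(density(...)) calls draw a smooth kernel density curve. If a frequency polygon in the textbook sense is wanted (straight segments through the class midpoints), one possible sketch is shown below; it assumes the PlantGrowth weights as a stand-in sample and draws both curves for comparison.

```r
x <- PlantGrowth$weight                  # stand-in sample (assumption; replace with data2)

# Draw the histogram on the density scale and keep the computed classes.
h <- hist(x, prob = TRUE, main = "Histogram with polygon and density curve",
          xlab = "weight")

# Frequency polygon: join the bar tops at the class midpoints.
lines(h$mids, h$density, type = "b", col = "blue", lwd = 2)

# Kernel density estimate: a smooth curve on the same scale.
lines(density(x), col = "red", lwd = 2)
```

Both curves need the histogram to be on the density scale; with the default frequency scale they would sit far below the bars.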
> par(mfrow=c(3,1)) # use this command if you need to draw several plots in one figure;
# this one creates a template of 3 plots in one column.
A B C
0.1223 4.4899 4.3756
0.0429 5.3288 12.3948
0.5716 6.4955 0.6959
3.3221 4.4539 8.9233
5.3692 8.1526 6.5431
9.3402 4.0381 3.8307
7.2930 5.6550 -0.2849
3.3467 6.3295 3.5793
8.3335 5.1704 4.5665
13.8492 6.7619 -2.3470
12.0085 5.6464 5.9609
16.3614 3.4317 0.8885
9.2125 1.3893 4.5288
20.4853 8.7172 6.6811
14.2028 3.7909 0.4767
21.7073 5.2067 3.5587
16.3400 6.1263 6.7503
6.4288 5.2272 -4.1793
8.0042 3.1905 10.1799
19.8347 4.0646 17.1223
We can use these data, stored as a data frame named data4 with columns A, B and C, to
demonstrate this.
> hist(data4$A, main="Histogram for A", xlab="Unit of A")
> hist(data4$B, main="Histogram for B", xlab="Unit of B")
> hist(data4$C, main="Histogram for C", xlab="Unit of C")
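The notes do not show how data4 is created. One possible way, sketched below, is to read the table with read.table() from a string; reading it from a file would work equally well. The name old_par is illustrative.

```r
# Build data4 from the table of columns A, B and C.
data4 <- read.table(header = TRUE, text = "
A B C
0.1223 4.4899 4.3756
0.0429 5.3288 12.3948
0.5716 6.4955 0.6959
3.3221 4.4539 8.9233
5.3692 8.1526 6.5431
9.3402 4.0381 3.8307
7.2930 5.6550 -0.2849
3.3467 6.3295 3.5793
8.3335 5.1704 4.5665
13.8492 6.7619 -2.3470
12.0085 5.6464 5.9609
16.3614 3.4317 0.8885
9.2125 1.3893 4.5288
20.4853 8.7172 6.6811
14.2028 3.7909 0.4767
21.7073 5.2067 3.5587
16.3400 6.1263 6.7503
6.4288 5.2272 -4.1793
8.0042 3.1905 10.1799
19.8347 4.0646 17.1223
")

old_par <- par(mfrow = c(3, 1))   # 3 plots stacked in one column
hist(data4$A, main = "Histogram for A", xlab = "Unit of A")
hist(data4$B, main = "Histogram for B", xlab = "Unit of B")
hist(data4$C, main = "Histogram for C", xlab = "Unit of C")
par(old_par)                      # restore the previous single-plot layout
```

Restoring the saved settings with par(old_par) keeps later plots from being squeezed into the three-row layout.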
[Please watch the YouTube video listed in UKMfolio on plotting histograms with different
breaks: https://www.youtube.com/watch?v=Hj1pgap4UOY]
20) Creating a Boxplot
Create a boxplot for the data below (the same values as data2 in step 4)
6.68 3.22 5.20 3.91 5.60 3.79 5.97 6.47 8.42 4.61
0.72 3.32 7.71 2.86 6.92 5.25 7.87 1.08 4.61 2.58
10.82 6.65 7.76 2.88 4.06 4.46 7.20 4.48 6.40 0.90
4.29 3.35 1.85 6.01 5.56 5.07 2.33 7.25 5.70 4.40
5.04 4.47 1.50 4.42 3.33 3.04 2.68 3.93 0.99 6.93
6.04 4.96 4.93 3.40 7.03 4.73 3.57 7.70 4.55 3.82
4.41 3.30 2.76 10.05 8.31 5.62 2.49 3.27 4.65 6.58
2.34 0.34 2.10 5.67 5.78 5.90 4.74 5.37 4.08 6.72
2.28 5.91 3.30 4.33 6.10 7.08 2.77 7.52 6.32 4.86
4.61 4.56 4.39 5.05 5.10 6.65 8.05 5.93 4.58 6.25
> boxplot(data2)
# You can save the plotted figure in different formats (pdf, png, jpeg, wmf)
> pdf('sampleplot.pdf')
> boxplot(data2)
> dev.off()
Another example: we can plot several boxplots for data that have several groups.
Take the PlantGrowth data, for example.
> boxplot(pg$weight ~ pg$group, main="Weight boxplots for different groups",
  ylab="weight", xlab="Group")
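To read off the numbers behind the grouped boxplots, boxplot() can return its statistics without drawing. The sketch below uses the formula interface with the data argument, which is equivalent to the pg$weight ~ pg$group form; the name b is illustrative.

```r
# Compute the boxplot statistics for each group without plotting.
b <- boxplot(weight ~ group, data = PlantGrowth, plot = FALSE)

b$names   # group labels, one per box
b$stats   # one column per group; rows are lower whisker, lower hinge,
          # median, upper hinge, upper whisker
b$out     # values drawn as individual outlier points, if any
b$group   # which box each outlier belongs to
```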
21) Creating a Pie Chart
Example:
Island       Perhentian  Tioman  Tinggi  Redang
Sample Size  20          15      30      17
Create a pie chart of the sample sizes
> data8=c(20, 15, 30, 17)
> labels=c('Pulau Perhentian', 'Pulau Tioman', 'Pulau Tinggi', 'Pulau Redang')
> pie(data8, labels)
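Pie slices are easier to compare when each label carries its share of the total. A minimal sketch, assuming rounded percentages are wanted in the labels; the names islands, pct and pie_labels are made up for illustration.

```r
data8   <- c(20, 15, 30, 17)
islands <- c("Pulau Perhentian", "Pulau Tioman", "Pulau Tinggi", "Pulau Redang")

# Share of each island in the total sample, rounded to whole percent.
pct <- round(100 * data8 / sum(data8))

# Combine name and percentage, e.g. "Pulau Tinggi (37%)".
pie_labels <- paste0(islands, " (", pct, "%)")

pie(data8, labels = pie_labels, main = "Sample sizes by island")
```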
22) Creating a Scatter Plot
Let us say we want to plot data4$A and data4$B.
> plot(data4$A, data4$B, pch=19, main="B vs. A", xlab="Unit of A", ylab="Unit of B") # plot(x, y, ...)
23) Line plots
Let us say we want to plot the PlantGrowth data for the control group
> plot(1:10, pg$weight[pg$group=="ctrl"], pch=19, main="Weight for CTRL",
  xlab="Sample Number", ylab="Weight")
> lines(1:10, pg$weight[pg$group=="ctrl"], col="red", lwd=4) # joins the points
> abline(lm(pg$weight[pg$group=="ctrl"] ~ seq_len(10)), col="red", lwd=4)
# fitted straight line; abline() needs the fitted model from lm(), not a bare formula
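The same figure can be built with fewer repeated expressions by storing the control weights once. This is a sketch under that assumption; the names ctrl, idx and fit are illustrative, and type="b" draws points and connecting lines in a single call.

```r
# Control-group weights and their sample numbers.
ctrl <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]
idx  <- seq_along(ctrl)

# Points joined by lines in one call.
plot(idx, ctrl, type = "b", pch = 19, col = "red",
     main = "Weight for CTRL", xlab = "Sample Number", ylab = "Weight")

# Least-squares straight line through the points.
fit <- lm(ctrl ~ idx)
abline(fit, col = "blue", lty = 2)

coef(fit)   # intercept and slope of the fitted line
```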
Task #4: Data Exploration using Various Measures
a) Compute the mean, mode and median of data1
> mean(data1)
> median(data1)
> as.numeric(names(table(data1))[which.max(table(data1))]) # mode
Or we can define a function
> getmode=function(v){uniqv=unique(v)
  uniqv[which.max(tabulate(match(v,uniqv)))]}
> getmode(data1)
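Both commands above return a single value even when several values tie for the highest count. In data1 the values 144 and 150 each occur three times, so the table-based command reports 144 (first in sorted order) while getmode() reports 150 (first in order of appearance). A sketch that returns every mode, with the data typed in so the block runs on its own:

```r
data1 <- c(141, 148, 132, 138, 154, 142, 150, 146, 155, 158, 150,
           140, 157, 148, 144, 150, 149, 145, 149, 159, 143, 141,
           144, 144, 126)

counts <- table(data1)                                     # frequency of each distinct value
modes  <- as.numeric(names(counts)[counts == max(counts)]) # all values with the top count

modes         # → 144 150
max(counts)   # → 3
```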
b) Compute the variance and standard deviation
> var(data1)
> sd(data1)
c) Compute the 25th, 50th and 75th percentiles
> quantile(data1, 0.25) # 25th percentile
> quantile(data1, c(0.25, .50, .75)) # all three at once
We can also use the “summary()” command
> summary(data1)
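d) Shape measures
# kurtosis() and skewness() are not in base R; load an add-on package that provides
# them, e.g. moments (install it once with install.packages("moments"))
> library(moments)
> kurtosis(data1)
> skewness(data1)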
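If no add-on package is available, the shape measures can be computed directly from the central moments. The sketch below uses one common moment-based definition (no bias correction, kurtosis not reduced by 3); packages may use slightly different formulas, so results can differ. The function names skew_moment and kurt_moment are made up for illustration.

```r
# Skewness: third central moment divided by the 1.5 power of the second.
skew_moment <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Kurtosis: fourth central moment divided by the square of the second.
kurt_moment <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2
}

x <- c(1, 2, 3, 4, 5)   # small symmetric example
skew_moment(x)          # 0 for a symmetric sample
kurt_moment(x)          # 1.7 for this sample
```

Under this definition a normal distribution has kurtosis 3, so values below 3 indicate lighter tails.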
