R Session - Note2 - Updated


STAL2073 – Introductory Notes # 2 on R – updated Nov 1, 2022

Task # 3: Data Exploration with Graphics

1) Stem-leaf plot

Consider the following data:

141 148 132 138 154 142 150 146 155 158 150
140 157 148 144 150 149 145 149 159 143 141
144 144 126

Please enter these data into R using “scan()” and store them as data1. After the command, type or paste the values and press Enter on an empty line to finish.
> data1=scan()
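
In a script, where nothing can be typed at the prompt, the same vector could be built through the text argument of scan(). A minimal sketch, assuming the 25 values listed above:

```r
# Non-interactive alternative to typing the values after scan():
# the numbers are passed as a character string through the 'text' argument.
data1 <- scan(text = "141 148 132 138 154 142 150 146 155 158 150
140 157 148 144 150 149 145 149 159 143 141
144 144 126")

length(data1)   # number of values read
range(data1)    # smallest and largest value
```

Writing the values with c(141, 148, ...) would work just as well; scan(text=) only saves typing the commas.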

2) Create stem-leaf plot for data1

> stem(data1)

3) Define data2=data1/10 and data3=data1/100. Create stem-leaf plots for data2 and
data3, and compare them with the stem-leaf plot for data1.
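
A sketch of this step, with data1 re-entered so that the block runs on its own. The first line printed by stem() says where the decimal point sits relative to the "|" separator, and that is the part to compare across the three plots:

```r
data1 <- c(141, 148, 132, 138, 154, 142, 150, 146, 155, 158, 150,
           140, 157, 148, 144, 150, 149, 145, 149, 159, 143, 141,
           144, 144, 126)

data2 <- data1 / 10    # same values, one decimal place to the left
data3 <- data1 / 100   # same values, two decimal places to the left

stem(data1)
stem(data2)
stem(data3)
```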

4) Enter the data below into R and name it data2 (this replaces the data2 defined in step 3)

6.68 3.22 5.20 3.91 5.60 3.79 5.97 6.47 8.42 4.61
0.72 3.32 7.71 2.86 6.92 5.25 7.87 1.08 4.61 2.58
10.82 6.65 7.76 2.88 4.06 4.46 7.20 4.48 6.40 0.90
4.29 3.35 1.85 6.01 5.56 5.07 2.33 7.25 5.70 4.40
5.04 4.47 1.50 4.42 3.33 3.04 2.68 3.93 0.99 6.93
6.04 4.96 4.93 3.40 7.03 4.73 3.57 7.70 4.55 3.82
4.41 3.30 2.76 10.05 8.31 5.62 2.49 3.27 4.65 6.58
2.34 0.34 2.10 5.67 5.78 5.90 4.74 5.37 4.08 6.72
2.28 5.91 3.30 4.33 6.10 7.08 2.77 7.52 6.32 4.86
4.61 4.56 4.39 5.05 5.10 6.65 8.05 5.93 4.58 6.25

5) Create stem-leaf plot for data2. Comment on the shape of the data distribution.

> stem(data2)

6) R comes with a built-in dataset called PlantGrowth. Type “PlantGrowth” to view this
dataset.

>PlantGrowth

weight group
1 4.17 ctrl
2 5.58 ctrl
3 5.18 ctrl
4 6.11 ctrl
5 4.50 ctrl
6 4.61 ctrl
7 5.17 ctrl
8 4.53 ctrl
9 5.33 ctrl
10 5.14 ctrl
11 4.81 trt1
12 4.17 trt1
13 4.41 trt1
14 3.59 trt1
15 5.87 trt1
16 3.83 trt1
17 6.03 trt1
18 4.89 trt1
19 4.32 trt1
20 4.69 trt1
21 6.31 trt2
22 5.12 trt2
23 5.54 trt2
24 5.50 trt2
25 5.37 trt2
26 5.29 trt2
27 4.92 trt2
28 6.15 trt2
29 5.80 trt2
30 5.26 trt2

7) Show the structure of the “PlantGrowth” data with the “str()” command

> str(PlantGrowth)

8) Create stem-leaf plot for the weight variable of the PlantGrowth data

>stem(PlantGrowth$weight)
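
PlantGrowth mixes three groups, so one stem-leaf plot per group may be more informative. A sketch, not part of the original notes, that uses split() to separate the weights by group; by.group is a name introduced here:

```r
# Split the weights into a list with one numeric vector per group
by.group <- split(PlantGrowth$weight, PlantGrowth$group)

# Draw one stem-leaf plot per group, printing the group name first
for (g in names(by.group)) {
  cat("Group:", g, "\n")
  stem(by.group[[g]])
}
```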

9) Dot diagram: create a dot diagram using the dotchart() command.

> dotchart(PlantGrowth$weight) # does not look nice because there is no group labelling

# with group labelling

> dotchart(PlantGrowth$weight,col="red",pch=1,labels=PlantGrowth$group,
main="group vs weight", xlab="weight") # still does not look nice because all groups have the same red colour

10) Create a dot chart for the PlantGrowth data, giving each group a
different colour

# Defining a colour for each group

> pg = PlantGrowth
> pg$color[pg$group=="ctrl"] = "red"
> pg$color[pg$group=="trt1"] = "violet"
> pg$color[pg$group=="trt2"] = "blue"

# plotting the dot chart (cex is the character size)

> dotchart(pg$weight, labels=pg$group, cex=0.8, groups=pg$group, main="group vs weight", xlab="weight", gcolor="black", color=pg$color)
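
The three colour assignments can also be written as a single lookup. A sketch, assuming the same colour choices; group.colours is a name introduced here for illustration:

```r
pg <- PlantGrowth

# Named vector: names are the group levels, values are the colours
group.colours <- c(ctrl = "red", trt1 = "violet", trt2 = "blue")

# Look up the colour of each row by its group (converted to character)
pg$color <- unname(group.colours[as.character(pg$group)])

dotchart(pg$weight, labels = as.character(pg$group), groups = pg$group,
         cex = 0.8, main = "group vs weight", xlab = "weight",
         gcolor = "black", color = pg$color)
```

Adding a fourth group would then only need one more entry in group.colours.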

11) Creating Histogram

We can plot a histogram with the hist() command and let R use its default settings

> hist(data2)

12) hist() has many options for controlling the histogram. For help on hist(),

> help(hist)

Or

>?hist

13) Let us use hist() on data2 with options

# First create a frequency table; we need to decide the number of classes and the class width

> max(data2)
10.82
> min(data2)
0.34

Let us say we want 8 classes. Round the minimum down to 0.3 and the maximum up to 11, and find the class width:
(11 – 0.3)/8 = 1.3375, rounded up to 1.35

# create “breaks” (the class boundaries); starting at 0.2 makes 8 classes of width 1.35 end exactly at 11
> breaks= seq(0.2,11,by=1.35)

# Make sure the break points define 8 classes (9 break points) and that the min
# and max values lie within them.

> data.cut=cut(data2, breaks, right=FALSE)
# assigns each value to its class; with right=FALSE each class includes its lower bound and excludes its upper bound

> data.freq=table(data.cut)
# computes the frequency table
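
The break points can be checked before they are used. The sketch below uses only the first 20 values of data2 (a shortening made here to keep the example small), so its counts differ from those of the full data set:

```r
# First 20 of the 100 values of data2 (subset for illustration only)
x <- c(6.68, 3.22, 5.20, 3.91, 5.60, 3.79, 5.97, 6.47, 8.42, 4.61,
       0.72, 3.32, 7.71, 2.86, 6.92, 5.25, 7.87, 1.08, 4.61, 2.58)

breaks <- seq(0.2, 11, by = 1.35)
length(breaks) - 1                            # number of classes defined by the break points
min(breaks) <= 0.34 && max(breaks) >= 10.82   # do the breaks cover min and max of data2?

x.cut  <- cut(x, breaks, right = FALSE)       # class of each value, intervals of the form [a, b)
x.freq <- table(x.cut)                        # frequency of each class
x.freq
sum(x.freq) == length(x)                      # every value should fall in exactly one class
```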

14) Plot a histogram of data2 using these breaks

> hist(data2, breaks)

15) Changing “Frequency” on the y-axis to “Probability” (R labels this axis “Density”)

> hist(data2, breaks, freq=F) # you can use FALSE instead of F, or TRUE instead of T

Instead of the option “freq=F” we can use “prob=T”; the two are equivalent.

> hist(data2, breaks, prob=T)
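
On this scale the bar heights are densities, so the bar areas (height times class width) add up to 1. A quick check, using the built-in PlantGrowth weights as stand-in data:

```r
x <- PlantGrowth$weight

h <- hist(x, freq = FALSE)            # draws the histogram and returns its components
areas <- h$density * diff(h$breaks)   # area of each bar = density x class width
sum(areas)                            # total area of the bars
```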

16) We can also change the limits of the y-axis using ylim=c() or of the x-axis using
xlim=c()

> hist(data2, breaks, prob=T, ylim=c(0, 0.3))

> hist(data2, breaks, prob=T, ylim=c(0, 0.3), xlim=c(0,12))

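17) The number of classes can also be given directly through breaks. hist() treats a single number as a suggestion, so the number of classes actually drawn may differ slightly.

> hist(data2, breaks=7, prob=T, ylim=c(0, 0.3), xlim=c(0,12))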

18) We can also set the title of the histogram plot using main=" " and label the y-axis and x-axis using ylab=" " and xlab=" ".

> hist(data2, breaks=7, prob=T, ylim=c(0, 0.3), xlim=c(0,12), main="Histogram Example", ylab="probability", xlab="X values")

19) Creating Polygon: add a density curve to the histogram. The histogram must already be drawn, with prob=T so that both use the same y-axis scale.

> lines(density(data2))
> lines(density(data2), col="red")
> lines(density(data2), col="red", lwd=5) # lwd is the line width
> par(mfrow=c(3,1)) # use this if you need several plots on one page; this one creates a layout of 3 plots in one column
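
A sketch of the overlay and of the page layout, using the built-in PlantGrowth weights as stand-in data. Saving the old par() settings and restoring them afterwards is a habit assumed here, not something the notes require:

```r
x <- PlantGrowth$weight

hist(x, prob = TRUE, main = "Histogram with density curve")  # y-axis on the density scale
lines(density(x), col = "red", lwd = 2)                       # curve added to the existing plot

old.par <- par(mfrow = c(3, 1))   # 3 rows, 1 column; the previous settings are returned
hist(x[1:10],  main = "ctrl")     # rows 1-10 of PlantGrowth are the control group
hist(x[11:20], main = "trt1")
hist(x[21:30], main = "trt2")
par(old.par)                      # back to the previous layout
```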

A B C
0.1223 4.4899 4.3756
0.0429 5.3288 12.3948
0.5716 6.4955 0.6959
3.3221 4.4539 8.9233
5.3692 8.1526 6.5431
9.3402 4.0381 3.8307
7.2930 5.6550 -0.2849
3.3467 6.3295 3.5793
8.3335 5.1704 4.5665
13.8492 6.7619 -2.3470
12.0085 5.6464 5.9609
16.3614 3.4317 0.8885
9.2125 1.3893 4.5288
20.4853 8.7172 6.6811
14.2028 3.7909 0.4767
21.7073 5.2067 3.5587
16.3400 6.1263 6.7503
6.4288 5.2272 -4.1793
8.0042 3.1905 10.1799
19.8347 4.0646 17.1223

We can use the table above, stored as a data frame named data4, to demonstrate this.

> hist(data4$A, main="Histogram for A", xlab="Unit of A")
> hist(data4$B, main="Histogram for B", xlab="Unit of B")
> hist(data4$C, main="Histogram for C", xlab="Unit of C")
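
The notes do not show how data4 is created. One possible way, assuming data4 is simply the A/B/C table above, is to read it from text with read.table():

```r
# Build data4 from the table in the notes (header row gives the column names)
data4 <- read.table(header = TRUE, text = "A B C
0.1223 4.4899 4.3756
0.0429 5.3288 12.3948
0.5716 6.4955 0.6959
3.3221 4.4539 8.9233
5.3692 8.1526 6.5431
9.3402 4.0381 3.8307
7.2930 5.6550 -0.2849
3.3467 6.3295 3.5793
8.3335 5.1704 4.5665
13.8492 6.7619 -2.3470
12.0085 5.6464 5.9609
16.3614 3.4317 0.8885
9.2125 1.3893 4.5288
20.4853 8.7172 6.6811
14.2028 3.7909 0.4767
21.7073 5.2067 3.5587
16.3400 6.1263 6.7503
6.4288 5.2272 -4.1793
8.0042 3.1905 10.1799
19.8347 4.0646 17.1223")

str(data4)   # 20 observations of 3 numeric variables

old.par <- par(mfrow = c(3, 1))   # three plots stacked in one column
hist(data4$A, main = "Histogram for A", xlab = "Unit of A")
hist(data4$B, main = "Histogram for B", xlab = "Unit of B")
hist(data4$C, main = "Histogram for C", xlab = "Unit of C")
par(old.par)                      # restore the previous layout
```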

[Please watch the YouTube video listed in UKMfolio to see how to plot histograms with different
breaks: https://www.youtube.com/watch?v=Hj1pgap4UOY]

20) Creating BOXPLOT

Create a boxplot for the data below (the same values entered earlier as data2)

6.68 3.22 5.20 3.91 5.60 3.79 5.97 6.47 8.42 4.61
0.72 3.32 7.71 2.86 6.92 5.25 7.87 1.08 4.61 2.58
10.82 6.65 7.76 2.88 4.06 4.46 7.20 4.48 6.40 0.90
4.29 3.35 1.85 6.01 5.56 5.07 2.33 7.25 5.70 4.40
5.04 4.47 1.50 4.42 3.33 3.04 2.68 3.93 0.99 6.93
6.04 4.96 4.93 3.40 7.03 4.73 3.57 7.70 4.55 3.82
4.41 3.30 2.76 10.05 8.31 5.62 2.49 3.27 4.65 6.58
2.34 0.34 2.10 5.67 5.78 5.90 4.74 5.37 4.08 6.72
2.28 5.91 3.30 4.33 6.10 7.08 2.77 7.52 6.32 4.86
4.61 4.56 4.39 5.05 5.10 6.65 8.05 5.93 4.58 6.25

> boxplot(data2)

# You can save the plotted figure in different formats (pdf, png, jpeg, wmf)

> pdf("sampleplot.pdf")
> boxplot(data2)
> dev.off()
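
The same open, plot, close pattern applies to the other devices. A sketch that writes to a temporary file (the path from tempfile() and the plotted data are only for illustration) and checks that the file was created:

```r
out.file <- tempfile(fileext = ".pdf")   # illustrative location; any writable path works

pdf(out.file, width = 6, height = 4)     # open the device; size is given in inches
boxplot(PlantGrowth$weight, main = "Weight")
dev.off()                                # close the device so the file is written

file.exists(out.file)                    # TRUE once the device has been closed
# png("sampleplot.png") or jpeg("sampleplot.jpeg") can be used in the same way
```

Nothing appears on screen while a file device is open; the plot goes to the file until dev.off() is called.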

As another example, we can plot several boxplots for data that have several groups.
Take the PlantGrowth data, for example.

> boxplot(pg$weight ~ pg$group, main="Weight boxplots for different groups", ylab="weight", xlab="Group")
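
The formula can also refer to the column names directly when the data frame is passed through the data argument. The sketch below also keeps the object returned by boxplot(), whose stats component holds the five numbers drawn for each box:

```r
b <- boxplot(weight ~ group, data = PlantGrowth,
             main = "Weight boxplots for different groups",
             ylab = "weight", xlab = "Group")

b$names   # group labels, in plotting order
b$stats   # one column per group: lower whisker, lower hinge, median, upper hinge, upper whisker

# Group medians computed separately, for comparison with the third row of b$stats
tapply(PlantGrowth$weight, PlantGrowth$group, median)
```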

21) Creating PIE CHART

Example:

Island        Perhentian   Tioman   Tinggi   Redang
Sample Size   20           15       30       17

Create a pie chart of the sample sizes

> data8=c(20, 15, 30, 17)
> labels=c("Pulau Perhentian", "Pulau Tioman", "Pulau Tinggi", "Pulau Redang")
> pie(data8,labels)
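
Slice labels are often easier to read with percentages attached. A sketch, assuming rounding to whole percentages is acceptable; islands, pct and slice.labels are names introduced here:

```r
data8   <- c(20, 15, 30, 17)
islands <- c("Pulau Perhentian", "Pulau Tioman", "Pulau Tinggi", "Pulau Redang")

pct <- round(100 * data8 / sum(data8))             # share of each island, in percent
slice.labels <- paste0(islands, " (", pct, "%)")   # island name followed by its percentage

pie(data8, labels = slice.labels, main = "Sample sizes by island")
```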

22) Creating SCATTER PLOT.

Let us say we want to plot data4$B against data4$A.

> plot(data4$A, data4$B, pch=19, main="B vs. A", xlab="Unit of A", ylab="Unit of B") # plot(x, y, ...)

23) Line plots

Let us say we want to plot the PlantGrowth weights for the control group

> plot(seq(1:10), pg$weight[pg$group=="ctrl"], pch=19, main="Weight for CTRL", xlab="Sample Number", ylab="Weight")
> lines(seq(1:10), pg$weight[pg$group=="ctrl"], col="red", lwd=4) # joins the points
> abline(lm(pg$weight[pg$group=="ctrl"]~seq(1:10)), col="red", lwd=4) # adds the fitted straight line; abline() needs the lm() fit, not a bare formula
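
Points and connecting lines can also be drawn in one call with type="b". A sketch for the control group; ctrl, idx and fit are names introduced here:

```r
ctrl <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]
idx  <- seq_along(ctrl)            # sample numbers 1, 2, ..., 10

# type = "b" draws both the points and the line segments joining them
plot(idx, ctrl, type = "b", pch = 19, col = "red",
     main = "Weight for CTRL", xlab = "Sample Number", ylab = "Weight")

fit <- lm(ctrl ~ idx)              # least-squares line through the points
abline(fit, lty = 2)               # dashed fitted line on top of the plot
```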

Task # 4: Data Exploration using Various Measures

a) Compute mean, mode and median of data1

> mean(data1)
> median(data1)
> as.numeric(names(table(data1))[which.max(table(data1))]) # mode

Or we can define a function that returns the mode

> getmode=function(v){uniqv=unique(v)
uniqv[which.max(tabulate(match(v,uniqv)))]}

> getmode(data1)
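
In data1 two values are tied for the highest frequency, so a command that returns a single mode reports only one of them. A sketch of a function that returns every tied value; allmodes is a name introduced here:

```r
data1 <- c(141, 148, 132, 138, 154, 142, 150, 146, 155, 158, 150,
           140, 157, 148, 144, 150, 149, 145, 149, 159, 143, 141,
           144, 144, 126)

# Return all values whose frequency equals the maximum frequency
allmodes <- function(v) {
  counts <- table(v)                                 # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])   # keep every value with the top count
}

allmodes(data1)   # 144 150 (each occurs three times)
```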

b) Compute variance and standard deviation

>var(data1)
>sd(data1)
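
var() and sd() use the sample formulas with divisor n - 1. A quick check by hand, with data1 re-entered so that the block runs on its own:

```r
data1 <- c(141, 148, 132, 138, 154, 142, 150, 146, 155, 158, 150,
           140, 157, 148, 144, 150, 149, 145, 149, 159, 143, 141,
           144, 144, 126)

n <- length(data1)
manual.var <- sum((data1 - mean(data1))^2) / (n - 1)   # sample variance, divisor n - 1

all.equal(manual.var, var(data1))        # agrees with var()
all.equal(sqrt(manual.var), sd(data1))   # sd() is the square root of the variance
```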

c) Compute 25th, 50th and 75th percentile

> quantile(data1, 0.25) # 25th percentile
> quantile(data1, 0.50) # 50th percentile
> quantile(data1, 0.75) # 75th percentile

> quantile(data1, c(0.25, .50, .75)) # all three at once

We can also use “summary()” command

> summary(data1)
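
The interquartile range follows directly from the quartiles. A short sketch with data1 re-entered so that it runs on its own:

```r
data1 <- c(141, 148, 132, 138, 154, 142, 150, 146, 155, 158, 150,
           140, 157, 148, 144, 150, 149, 145, 149, 159, 143, 141,
           144, 144, 126)

q <- quantile(data1, c(0.25, 0.50, 0.75))
q                 # 142, 146 and 150 for these data
IQR(data1)        # 75th minus 25th percentile: 8
fivenum(data1)    # minimum, lower hinge, median, upper hinge, maximum
```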

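d) Shape measures

# kurtosis() and skewness() are not part of base R; first load an add-on package that provides them, for example moments or e1071
> kurtosis(data1)
> skewness(data1)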
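
The same measures can be computed directly in base R. The sketch below uses the plain moment definitions, skewness m3 / m2^(3/2) and kurtosis m4 / m2^2, where m_k is the k-th central moment with divisor n; packages may apply small-sample corrections, so their results can differ slightly. The function names are made up for this example:

```r
# k-th central moment, computed with divisor n
central.moment <- function(x, k) mean((x - mean(x))^k)

skew <- function(x) central.moment(x, 3) / central.moment(x, 2)^(3/2)
kurt <- function(x) central.moment(x, 4) / central.moment(x, 2)^2

x <- c(1, 2, 3, 4, 5)   # small symmetric example
skew(x)                 # 0, because the data are symmetric about the mean
kurt(x)                 # 1.7
```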
