R Programming
http://www.r-project.org/
http://cran.r-project.org/
Hung Chen
Outline
• Introduction:
– Historical development
– S, Splus
– Capability
– Statistical Analysis
• References
• Calculator
• Data Type
• Resources
• Simulation and Statistical Tables
– Probability distributions
• Programming
– Grouping, loops and conditional execution
– Function
• Reading and writing data from files
• Modeling
– Regression
– ANOVA
• Data Analysis on Association
– Lottery
– Geyser
• Smoothing
R, S and S-plus
• S: an interactive environment for data analysis developed at Bell
Laboratories since 1976
– 1988 - S2: RA Becker, JM Chambers, A Wilks
– 1992 - S3: JM Chambers, TJ Hastie
– 1998 - S4: JM Chambers
[1] 5
> sqrt(2)
[1] 1.414214
> seq(0, 5, length=6)
[1] 0 1 2 3 4 5
[Figure: plot of sin(seq(0, 2 * pi, length = 100)) against Index (0 to 100); y-axis from -1.0 to 1.0]
Object orientation
Parlance:
• class: the “abstract” definition of an object
• object: a concrete instance of a class
• method: another word for ‘function’
• slot: a component of an object
Advantages:
Encapsulation (can use the objects and methods someone else has
written without having to care about the internals)
Generic functions (e.g. plot, print)
Inheritance (hierarchical organization of complexity)
Caveat:
Overcomplicated, baroque program architecture…
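The parlance above can be illustrated with R's S4 system. This is only a minimal sketch: the class names Tumor and GastricTumor, their slots, and the generic describe are hypothetical names chosen for illustration and do not come from the slides.

```r
library(methods)  # S4 support; attached by default in interactive R

# class: the abstract definition, here with two slots (hypothetical example)
setClass("Tumor", slots = c(localisation = "character", size = "numeric"))

# object: a concrete instance of the class
tu <- new("Tumor", localisation = "proximal", size = 6.3)

# slot: a component of the object, accessed with @
tu@size

# generic function plus a method written for the class
setGeneric("describe", function(object) standardGeneric("describe"))
setMethod("describe", "Tumor", function(object) {
  paste(object@localisation, "tumor of size", object@size)
})

# inheritance: a subclass reuses the slots and methods of its parent
setClass("GastricTumor", contains = "Tumor")
gt <- new("GastricTumor", localisation = "distal", size = 8)
describe(gt)
```

The call describe(gt) works although no method was written for GastricTumor, because the method defined for the parent class is inherited.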
variables
> a = 49
> sqrt(a)           # numeric
[1] 7
> a = (1+1==3)
> a                 # logical
[1] FALSE
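The type labels on this slide can be checked with class(). A short sketch follows; the character variable s is an extra illustration that is not part of the slide.

```r
a <- 49
class(a)    # "numeric"

b <- (1 + 1 == 3)
class(b)    # "logical"

s <- "R Programming"   # illustrative addition: a character string
class(s)    # "character"
```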
vectors, matrices and arrays
• vector: an ordered collection of data of the same type
> a = c(1,2,3)
> a*2
[1] 2 4 6
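The slide title also mentions matrices and arrays. As a rough sketch (the variable names v, m and arr are arbitrary), the three structures differ mainly in their number of dimensions:

```r
v <- c(1, 2, 3)                # vector: one dimension
v * 2                          # arithmetic is element-wise

m <- matrix(1:6, nrow = 2)     # matrix: two dimensions, filled column by column
dim(m)                         # 2 3
m[2, 3]                        # row 2, column 3: 6

arr <- array(1:24, dim = c(2, 3, 4))   # array: any number of dimensions
arr[1, 2, 3]                           # 15
```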
Example:
> a
      localisation tumorsize progress
XX348     proximal       6.3    FALSE
XX234       distal       8.0     TRUE
XX987     proximal      10.0    FALSE
Factors
A character string can contain arbitrary text. Sometimes it is useful to use a limited
vocabulary, with a small number of allowed words. A factor is a variable that can only
take such a limited number of values, which are called levels.
> a
[1] Kolon(Rektum) Magen Magen
[4] Magen Magen Retroperitoneal
[7] Magen Magen(retrogastral) Magen
Levels: Kolon(Rektum) Magen Magen(retrogastral) Retroperitoneal
> class(a)
[1] "factor"
> as.character(a)
[1] "Kolon(Rektum)" "Magen" "Magen"
[4] "Magen" "Magen" "Retroperitoneal"
[7] "Magen" "Magen(retrogastral)" "Magen"
> as.integer(a)
[1] 1 2 2 2 2 4 2 3 2
> as.integer(as.character(a))
[1] NA NA NA NA NA NA NA NA NA
Warning message: NAs introduced by coercion
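The conversions above can be tried on a small factor built by hand. The sketch below uses made-up values; the second part shows the case where converting via as.character is the right thing to do, namely when the level labels are themselves numbers.

```r
# A factor built from a character vector with a limited vocabulary
organ <- factor(c("Magen", "Kolon", "Magen", "Retroperitoneal"))
levels(organ)        # the allowed values, sorted
as.integer(organ)    # internal integer codes: 2 1 2 3
table(organ)         # counts per level

# Factor whose labels are numbers: convert via character first
f <- factor(c("10", "20", "10"))
as.integer(f)                  # level codes: 1 2 1
as.integer(as.character(f))    # the labelled values: 10 20 10
```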
Subsetting
Individual elements of a vector, matrix, array or data frame are
accessed with “[ ]” by specifying their index, or their name
> a
      localisation tumorsize progress
XX348     proximal       6.3        0
XX234       distal       8.0        1
XX987     proximal      10.0        0
> a[3, 2]
[1] 10
> a["XX987", "tumorsize"]
[1] 10
> a["XX987",]
      localisation tumorsize progress
XX987     proximal        10        0
> a
      localisation tumorsize progress
XX348     proximal       6.3        0
XX234       distal       8.0        1
XX987     proximal      10.0        0
> a[c(1,3),]                          # subset rows by a vector of indices
      localisation tumorsize progress
XX348     proximal       6.3        0
XX987     proximal      10.0        0
> a[c(T,F,T),]                        # subset rows by a logical vector
      localisation tumorsize progress
XX348     proximal       6.3        0
XX987     proximal      10.0        0
> a$localisation                      # subset a column
[1] "proximal" "distal" "proximal"
> a$localisation=="proximal"          # comparison resulting in logical vector
[1] TRUE FALSE TRUE
> a[ a$localisation=="proximal", ]    # subset the selected rows
      localisation tumorsize progress
XX348     proximal       6.3        0
XX987     proximal      10.0        0
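To reproduce the subsetting calls, the data frame first has to exist. The sketch below rebuilds it from the values printed on the slides; the last line, which combines a row condition with a column name, is an added illustration.

```r
a <- data.frame(
  localisation = c("proximal", "distal", "proximal"),
  tumorsize    = c(6.3, 8.0, 10.0),
  progress     = c(0, 1, 0),
  row.names    = c("XX348", "XX234", "XX987"),
  stringsAsFactors = FALSE
)

a[3, 2]                               # by index
a["XX987", "tumorsize"]               # by name
a[c(1, 3), ]                          # rows by a vector of indices
a[c(TRUE, FALSE, TRUE), ]             # rows by a logical vector
a[a$localisation == "proximal", ]     # rows by a comparison
a[a$tumorsize > 7, "localisation"]    # row condition combined with a column name
```

Writing TRUE and FALSE in full is slightly safer than T and F, since T and F are ordinary variables that can be reassigned.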
Resources
• A package specification allows the production of loadable modules
for specific purposes, and several contributed packages are made
available through the CRAN sites.
• CRAN and R homepage:
– http://www.r-project.org/
It is R’s central homepage, giving information on the R project and
everything related to it.
– http://cran.r-project.org/
It acts as the download area, carrying the software itself, extension packages,
and PDF manuals.
• Getting help with functions and features
– help(solve)
– ?solve
– For a feature specified by special characters, the argument must be enclosed in
double or single quotes, making it a “character string”: help("[[")
Getting help
Details about a specific command whose name you know (input
arguments, options, algorithm, results):
> ?t.test
or
> help(t.test)
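When the exact name is not known, a few other base R helpers may be useful. This is a small sketch and not part of the slides; the two commented-out calls open or print documentation and are best run interactively.

```r
apropos("t.test")          # names of visible objects matching a pattern
args(t.test)               # argument list of a function

# help.search("t test")    # full-text search of the installed documentation
# example(t.test)          # run the examples from the help page
```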
if (logical expression) {
statements
} else {
alternative statements
}
for(i in 1:10) {
print(i*i)
}
i=1
while(i<=10) {
print(i*i)
i=i+sqrt(i)
}
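The three constructs can be combined into one small runnable sketch. The variable names (x, parity, squares, steps) are arbitrary; the while loop is the one shown above, with a counter added to show how often its body runs.

```r
# if / else: choose a branch from a logical expression
x <- 7
if (x %% 2 == 0) {
  parity <- "even"
} else {
  parity <- "odd"
}

# for: loop over the elements of a vector
squares <- numeric(10)
for (i in 1:10) {
  squares[i] <- i * i
}

# while: repeat as long as the condition holds
i <- 1
steps <- 0
while (i <= 10) {
  steps <- steps + 1
  i <- i + sqrt(i)
}

parity          # "odd"
sum(squares)    # 385
steps           # 5
```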
lapply, sapply, apply
• When the same or similar tasks need to be performed multiple
times for all elements of a list or for all columns of an array.
• May be easier and faster than “for” loops
• lapply(li, fct)
• The function fct is applied to each element of the list li.
• The result is a list whose elements are the individual function
results.
> li = list("klaus","martin","georg")
> lapply(li, toupper)
[[1]]
[1] "KLAUS"

[[2]]
[1] "MARTIN"

[[3]]
[1] "GEORG"
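lapply is not limited to existing functions such as toupper. The sketch below, using a made-up list of numeric vectors, applies an anonymous function and compares the result with an explicit for loop.

```r
li <- list(a = 1:3, b = c(10, 20), c = 5)

# Named function applied to every element; the result is a list
lapply(li, length)

# Anonymous function defined in place
doubled <- lapply(li, function(x) x * 2)
doubled$b    # 20 40

# Equivalent for loop, for comparison
res <- vector("list", length(li))
names(res) <- names(li)
for (k in seq_along(li)) res[[k]] <- li[[k]] * 2
identical(res, doubled)
```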
• sapply(li, fct)
• Like lapply, but tries to simplify the result by converting it into a
vector or array of appropriate size.
> li = list("klaus","martin","georg")
> sapply(li, toupper)
[1] "KLAUS" "MARTIN" "GEORG"
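The slide title also names apply, which works on the rows or columns of a matrix or array. A brief sketch with made-up data:

```r
li <- list(a = 1:3, b = c(10, 20), c = 5)
sapply(li, length)     # named integer vector: a = 3, b = 2, c = 1

m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)       # margin 1 = rows:    9 12
apply(m, 2, sum)       # margin 2 = columns: 3 7 11
```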
functions and operators
• Functions do things with data
• “Input”: function arguments (0,1,2,…)
• “Output”: function result (exactly one)
Example:
add = function(a,b) {
  result = a+b
  return(result)
}
Exceptions to the rule:
• Functions may also use data that sits around in other places, not
just in their argument list: “scoping rules”
• Functions may also do other things than returning a result, e.g.,
plot something on the screen: “side effects”
Operators:
Shorthand notation for frequently used functions of one or two
arguments.
Examples: + - * / ! & | %%
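The points about arguments, scoping rules, side effects and operators can each be shown in a few lines. All names below (offset, shift, noisy_add, %+%) are hypothetical and chosen for illustration only.

```r
# Default argument: b is optional
add <- function(a, b = 1) {
  a + b              # the value of the last expression is returned
}
add(2, 3)            # 5
add(2)               # 3

# Scoping: 'offset' is not an argument; it is looked up where the function was defined
offset <- 100
shift <- function(x) x + offset
shift(1)             # 101

# Side effect: cat() writes to the console; the result is still returned
noisy_add <- function(a, b) {
  cat("adding", a, "and", b, "\n")
  a + b
}
s <- noisy_add(1, 2)

# Operators are functions: call one by its quoted name, or define your own
`+`(1, 2)            # 3
`%+%` <- function(x, y) paste(x, y)
"R" %+% "Programming"
```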
> x = read.delim("filename.txt")
also: read.table, read.csv
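A round trip through a temporary file shows how the three readers relate. This is a sketch with made-up data; the file names come from tempfile(), so nothing outside the temporary directory is touched.

```r
# Write a small table to temporary files, then read it back
d <- data.frame(id = c("XX348", "XX234"), tumorsize = c(6.3, 8.0))
tab_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")

write.table(d, tab_file, sep = "\t", quote = FALSE, row.names = FALSE)
write.csv(d, csv_file, row.names = FALSE)

x <- read.delim(tab_file)                               # tab-separated, header expected
y <- read.csv(csv_file)                                 # comma-separated, header expected
z <- read.table(tab_file, header = TRUE, sep = "\t")    # general form

unlink(c(tab_file, csv_file))                           # remove the temporary files
```

read.delim and read.csv are wrappers around read.table with different defaults for the separator and the header.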