Factors in R

Factor variables in R represent categorical data and are created using the factor() function. Factor levels can be accessed and modified. A data frame combines variables of different types into a table with rows and columns. Data frames allow for the storage of mixed data types and subsetting of rows and columns. New columns can be added to extend a data frame.

Uploaded by

Karthikeyan Ramajayam

Factors

Factor variables represent categories or groups in your data. The function factor() can be used to
create a factor variable.

Create a factor
# Create a factor variable
friend_groups <- factor(c(1, 2, 1, 2))
friend_groups
[1] 1 2 1 2
Levels: 1 2

The variable friend_groups contains two categories of friends: 1 and 2. In R terminology, categories are called factor levels.

The factor levels can be accessed using the function levels():

# Get group names (or levels)

levels(friend_groups)
[1] "1" "2"
# Change levels
levels(friend_groups) <- c("best_friend", "not_best_friend")
friend_groups
[1] best_friend not_best_friend best_friend not_best_friend
Levels: best_friend not_best_friend

Note that, by default, factor() sorts the levels (alphabetically for character values). If you want a different order, specify the levels argument of the factor() function as follows.

# Change the order of levels

friend_groups <- factor(friend_groups,
levels = c("not_best_friend", "best_friend"))
# Print
friend_groups
[1] best_friend not_best_friend best_friend not_best_friend
Levels: not_best_friend best_friend
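The two ways of touching levels shown above behave differently, which is easy to overlook. The sketch below uses a hypothetical vector named groups (not part of the original example) to contrast them: factor() with a levels argument only reorders the levels, while assigning to levels() renames them by position and therefore changes the values.

```r
# Hypothetical data mirroring the friend_groups example
groups <- factor(c("best_friend", "not_best_friend",
                   "best_friend", "not_best_friend"))

# Reorder: factor() with `levels` changes only the order of the levels
reordered <- factor(groups, levels = c("not_best_friend", "best_friend"))
as.character(reordered)   # same values as in groups
levels(reordered)

# Relabel: assigning to levels() renames levels by position
relabeled <- groups
levels(relabeled) <- c("not_best_friend", "best_friend")
as.character(relabeled)   # every value is now swapped
```

If the goal is only a different display order, the factor() form is the safer choice.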

Note that:

• The function is.factor() can be used to check whether a variable is a factor. The result is TRUE (if factor) or FALSE (if not factor).
• The function as.factor() can be used to convert a variable to a factor.
# Check if friend_groups is a factor
is.factor(friend_groups)
[1] TRUE
# Check if "are_married" is a factor
is.factor(are_married)
[1] FALSE
# Convert "are_married" to a factor
as.factor(are_married)
[1] TRUE FALSE TRUE TRUE
Levels: FALSE TRUE
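Note that as.factor() returns a new object and leaves its argument untouched. A minimal sketch, assuming are_married holds the four values printed above (the vector itself is defined elsewhere in the tutorial) and using a hypothetical name married_factor for the stored result:

```r
# Assumed definition, matching the values printed above
are_married <- c(TRUE, FALSE, TRUE, TRUE)

as.factor(are_married)        # the result is printed but not stored
is.factor(are_married)        # still FALSE

# Store the converted copy to keep it
married_factor <- as.factor(are_married)
is.factor(married_factor)     # TRUE
levels(married_factor)
```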

Calculations with factors

• To count the number of individuals in each level, use the function summary():

summary(friend_groups)
not_best_friend best_friend
2 2

• The following example computes the mean salary of my friends by group. The function tapply() applies a function, here mean(), to each group.

# Salaries of my friends
salaries
Nicolas Thierry Bernard Jerome
2000 1800 2500 3000
# Friend groups
friend_groups
[1] best_friend not_best_friend best_friend not_best_friend
Levels: not_best_friend best_friend
# Compute the mean salaries by groups
mean_salaries <- tapply(salaries, friend_groups, mean)
mean_salaries
not_best_friend best_friend
2400 2250
# Compute the size/length of each group
tapply(salaries, friend_groups, length)
not_best_friend best_friend
2 2
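The vectors salaries and friend_groups are created outside this section. To run the tapply() example on its own, they can be rebuilt from the printed values, as in this sketch (the definitions are assumptions reconstructed from the output above):

```r
# Assumed definitions, reconstructed from the printed output
salaries <- c(Nicolas = 2000, Thierry = 1800, Bernard = 2500, Jerome = 3000)
friend_groups <- factor(
  c("best_friend", "not_best_friend", "best_friend", "not_best_friend"),
  levels = c("not_best_friend", "best_friend")
)

# One mean per factor level, in level order
mean_salaries <- tapply(salaries, friend_groups, mean)
mean_salaries

# A single group can be picked out by its level name
mean_salaries["best_friend"]
```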

• It’s also possible to use the function table() to create a frequency table. With two or more factors, it produces a contingency table of the counts at each combination of factor levels.

table(friend_groups)
friend_groups
not_best_friend best_friend
2 2
# Cross-tabulation between
# friend_groups and are_married variables
table(friend_groups, are_married)
                 are_married
friend_groups     FALSE TRUE
  not_best_friend     1    1
  best_friend         0    2

Data frames
A data frame is like a matrix but can have columns with different types (numeric, character,
logical). Rows are observations (individuals) and columns are variables.

Create a data frame

A data frame can be created using the function data.frame(), as follows:

# Create a data frame

friends_data <- data.frame(
name = my_friends,
age = friend_ages,
height = c(180, 170, 185, 169),
married = are_married
)
# Print
friends_data
name age height married
Nicolas Nicolas 27 180 TRUE
Thierry Thierry 25 170 FALSE
Bernard Bernard 29 185 TRUE
Jerome Jerome 26 169 TRUE
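The call above relies on my_friends, friend_ages and are_married, which are defined elsewhere in the tutorial. The sketch below rebuilds them from the printed output so the data frame can be created in a fresh session. Two arguments are added as assumptions: row.names is passed explicitly so the row labels match the printout, and stringsAsFactors = TRUE is set because the tutorial's output shows name as a factor (since R 4.0.0 the default is FALSE).

```r
# Assumed inputs, reconstructed from the printed data frame
my_friends  <- c("Nicolas", "Thierry", "Bernard", "Jerome")
friend_ages <- c(27, 25, 29, 26)
are_married <- c(TRUE, FALSE, TRUE, TRUE)

friends_data <- data.frame(
  name    = my_friends,
  age     = friend_ages,
  height  = c(180, 170, 185, 169),
  married = are_married,
  row.names = my_friends,       # explicit row labels
  stringsAsFactors = TRUE       # store name as a factor
)

# Inspect column types
str(friends_data)
```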

To check whether an object is a data frame, use the is.data.frame() function. It returns TRUE if the object is a data frame:

is.data.frame(friends_data)
[1] TRUE
is.data.frame(my_data)
[1] FALSE

The object “friends_data” is a data frame, but the object “my_data” is not. We can convert it to a data frame using the as.data.frame() function:

# What is the class of my_data? --> matrix
# (in R >= 4.0.0 this prints "matrix" "array")

class(my_data)
[1] "matrix"
# Convert it as a data frame
my_data2 <- as.data.frame(my_data)
# Now, the class is data.frame
class(my_data2)
[1] "data.frame"

As described in the matrix section, you can use the function t() to transpose a data frame. Note that the result is a matrix, not a data frame:

t(friends_data)
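A small sketch of what t() returns for a data frame with mixed column types. The two-row data frame df is a hypothetical cut-down version of friends_data. Because a matrix holds a single type, the numeric column is converted to character here.

```r
# Hypothetical two-row data frame with a character and a numeric column
df <- data.frame(
  name = c("Nicolas", "Thierry"),
  age  = c(27, 25),
  stringsAsFactors = FALSE
)

transposed <- t(df)

is.data.frame(transposed)   # FALSE
is.matrix(transposed)       # TRUE
typeof(transposed)          # "character": all cells share one type
rownames(transposed)        # former column names
```

If a data frame is needed afterwards, the result can be passed to as.data.frame().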

Subset a data frame

To select just certain columns from a data frame, you can either refer to the columns by name or
by their location (i.e., column 1, 2, 3, etc.).

1. Positive indexing by name and by location

# Access the data in 'name' column

# dollar sign is used
friends_data$name
[1] Nicolas Thierry Bernard Jerome
Levels: Bernard Jerome Nicolas Thierry
# or use this
friends_data[, 'name']
[1] Nicolas Thierry Bernard Jerome
Levels: Bernard Jerome Nicolas Thierry
# Subset columns 1 and 3
friends_data[ , c(1, 3)]
name height
Nicolas Nicolas 180
Thierry Thierry 170
Bernard Bernard 185
Jerome Jerome 169

2. Negative indexing

# Exclude column 1
friends_data[, -1]
age height married
Nicolas 27 180 TRUE
Thierry 25 170 FALSE
Bernard 29 185 TRUE
Jerome 26 169 TRUE

3. Index by characteristics

We want to select all friends with age >= 27.

# Identify rows that meet the condition

friends_data$age >= 27
[1] TRUE FALSE TRUE FALSE

TRUE specifies that the row contains a value of age >= 27.
# Select the rows that meet the condition
friends_data[friends_data$age >= 27, ]
name age height married
Nicolas Nicolas 27 180 TRUE
Bernard Bernard 29 185 TRUE

The R code above tells R to get all rows from friends_data where age >= 27, and then to return
all the columns.

If you don’t want to see all the column data for the selected rows but are just interested in
displaying, for example, friend names and age for friends with age >= 27, you could use the
following R code:

# Use column locations

friends_data[friends_data$age >= 27, c(1, 2)]
name age
Nicolas Nicolas 27
Bernard Bernard 29
# Or use column names
friends_data[friends_data$age >= 27, c("name", "age")]
name age
Nicolas Nicolas 27
Bernard Bernard 29

If your selection statement becomes unwieldy, you can store the row and column selections in
variables first:

age27 <- friends_data$age >= 27

cols <- c("name", "age")

Then you can select the rows and columns with those variables:

friends_data[age27, cols]
name age
Nicolas Nicolas 27
Bernard Bernard 29

It’s also possible to use the function subset(), as follows:

# Select friends data with age >= 27

subset(friends_data, age >= 27)
name age height married
Nicolas Nicolas 27 180 TRUE
Bernard Bernard 29 185 TRUE
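subset() can also restrict the columns through its select argument, which gives the same result as the bracket form with a row condition and a column vector. A minimal sketch, with friends_data rebuilt from the printed values (an assumption, since the original inputs are defined elsewhere) and a hypothetical result name older:

```r
# Assumed data, reconstructed from the printed data frame
friends_data <- data.frame(
  name    = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age     = c(27, 25, 29, 26),
  height  = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, TRUE, TRUE)
)

# Rows with age >= 27, keeping only the name and age columns
older <- subset(friends_data, age >= 27, select = c(name, age))
older
```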

Another option is to use the functions attach() and detach(). The function attach() takes a data
frame and makes its columns accessible by simply giving their names.

The functions attach() and detach() can be used as follows:

# Attach a data frame

attach(friends_data)
# === Data manipulation ====
friends_data[age>=27, ]
# === End of data manipulation ====
# Detach the data frame
detach(friends_data)
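The attach()/detach() pattern can be tried end to end with the sketch below. The data frame is rebuilt from the printed values (an assumption), and older is a hypothetical name for the stored result.

```r
# Assumed data, reconstructed from the printed data frame
friends_data <- data.frame(
  name    = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age     = c(27, 25, 29, 26),
  height  = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, TRUE, TRUE)
)

attach(friends_data)                 # columns become visible by name
older <- friends_data[age >= 27, ]   # `age` resolves to friends_data$age
detach(friends_data)                 # remove it from the search path again

older
"friends_data" %in% search()         # FALSE once detached
```

Pairing every attach() with a detach() avoids leaving columns on the search path, where they can mask other variables of the same name.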

Extend a data frame

Add a new column to a data frame

# Add group column to friends_data

friends_data$group <- friend_groups
friends_data
name age height married group
Nicolas Nicolas 27 180 TRUE best_friend
Thierry Thierry 25 170 FALSE not_best_friend
Bernard Bernard 29 185 TRUE best_friend
Jerome Jerome 26 169 TRUE not_best_friend

It’s also possible to use the functions cbind() and rbind() to extend a data frame.

cbind(friends_data, group = friend_groups)
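The text mentions rbind() without showing it. The sketch below uses a hypothetical two-row data frame and a made-up new friend ("Paul", not in the original data) to show both directions: cbind() adds a column and rbind() adds a row. For rbind(), the new row needs the same column names as the existing data frame.

```r
# Hypothetical cut-down data frame
friends_data <- data.frame(
  name = c("Nicolas", "Thierry"),
  age  = c(27, 25),
  stringsAsFactors = FALSE
)

# cbind(): add a column
with_group <- cbind(
  friends_data,
  group = factor(c("best_friend", "not_best_friend"))
)

# rbind(): add a row (illustrative values only)
new_friend <- data.frame(name = "Paul", age = 31, stringsAsFactors = FALSE)
more_friends <- rbind(friends_data, new_friend)

with_group
more_friends
```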
