Data analytics using R

Madhusudanacharyulu Padakandla
1. Introduction to R
R is an open-source programming language that is widely used for
statistical computing and data analysis. It ships with a command-line
interface and is available on widely used platforms such as Windows,
Linux, and macOS.
It was designed by Ross Ihaka and Robert Gentleman at the University
of Auckland, New Zealand.
R is used as a leading tool for machine learning, statistics, and data
analysis.
It is a platform-independent language, which means the same R code can
run on all of these operating systems.

Why Use R?
• Statistical Analysis: R is designed for statistical analysis and
provides an extensive collection of statistical and graphical
techniques, making it a preferred choice for statisticians and data
analysts.
• Open Source: R is open-source software, which means it is freely
available to anyone. It is supported by a vibrant community of users
and developers.
• Data Visualization: R offers libraries such as ggplot2 that enable
the creation of high-quality, customizable data visualizations.
• Data Manipulation: R offers tools for data manipulation and
transformation that simplify filtering, summarizing, and transforming
data.
• Integration: R integrates easily with other programming languages
and data sources. It has connectors to various databases and can be
used in conjunction with Python, SQL, and other tools.
• Community and Packages: R has a vast ecosystem of packages that
extend its functionality and cover most analytics needs.

2. What is a vector, and what operations can be performed on it?

A vector is the most basic data structure in R. Even a single value is
stored as a vector of length one. Vectors are similar to arrays in other
languages: they hold a sequence of values of the same (homogeneous)
type. If values of mixed types are combined, R automatically converts
them to the most general type present, in the order logical, integer,
double, character. Many operations can be performed on vectors in R.
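
The automatic conversion of mixed values can be checked with typeof(). The following is a minimal sketch; the variable names are illustrative only.

```r
# Mixing types inside c() converts every element to the most general type.
a <- c(1L, 2.5)         # integer + double     -> double
b <- c(TRUE, 2L)        # logical + integer    -> integer (TRUE becomes 1)
d <- c(1, "two", TRUE)  # anything + character -> character

print(typeof(a))  # "double"
print(typeof(b))  # "integer"
print(typeof(d))  # "character"
print(d)          # "1" "two" "TRUE"
```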

Creating a vector
Vectors can be created in many ways as shown in the following
example.
# Use of 'c' function
# to combine the values as a vector.
# by default the type will be double
X <- c(1, 4, 5, 2, 6, 7)
print('using c function')
print(X)
# using the seq() function to generate
# a sequence of continuous values
# with different step-size and length.
# length.out defines the length of vector.
Y <- seq(1, 10, length.out = 5)
print('using seq() function')
print(Y)
# using ':' operator to create
# a vector of continuous values.
Z <- 5:10
print('using colon')
print(Z)
Output:

using c function 1 4 5 2 6 7
using seq() function 1.00 3.25 5.50 7.75 10.00
using colon 5 6 7 8 9 10

Accessing vector elements


Vector elements can be accessed in several ways. The most basic is the
subscript operator '[]'. Indexing in R starts at 1, not 0.

# Accessing elements using the position number.

X <- c(2, 5, 8, 1, 2)

print('using Subscript operator')

print(X[2])

# Accessing specific values by passing

# a vector inside another vector.

Y <- c(4, 5, 2, 1, 7)

print('using c function')

print(Y[c(4, 1)])

# Logical indexing

Z <- c(5, 2, 1, 4, 4, 3)

print('Logical indexing')
print(Z[Z>3])

Output:
using Subscript operator 5
using c function 1 4
Logical indexing 5 4 4
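
Two further indexing forms are often useful alongside position, vector, and logical indexing: a range of positions, and negative positions, which exclude elements instead of selecting them. A small sketch with an illustrative vector V:

```r
V <- c(2, 5, 8, 1, 2)

# A range of positions
print(V[2:4])       # 5 8 1

# A negative position drops that element
print(V[-1])        # 5 8 1 2

# Several negative positions drop several elements
print(V[-c(1, 5)])  # 5 8 1
```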
Modifying a vector
Vectors can be modified by assigning new values through any of the
indexing forms.
# Creating a vector
X <- c(2, 5, 1, 7, 8, 2)
# modify a specific element
X[3] <- 11
print('Using subscript operator')
print(X)
# Modify every element that satisfies a logical condition.
X[X>9] <- 0
print('Logical indexing')
print(X)
# Keep only the elements at positions 5, 2 and 1, in that order.
X <- X[c(5, 2, 1)]
print('using c function')
print(X)
Output:

Using subscript operator 2 5 11 7 8 2


Logical indexing 2 5 0 7 8 2
using c function 8 5 2
Deleting a vector
Assigning NULL to a vector discards its contents. The variable itself
still exists but now holds NULL; to remove the variable, use rm().
# Creating a vector
X <- c(5, 2, 1, 6)
# Deleting a vector
X <- NULL
print('Deleted vector')
print(X)
Output:
Deleted vector NULL
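
The difference between assigning NULL and removing the variable with rm() can be checked with is.null() and exists(). This is a small sketch; tmp_vec is an illustrative name.

```r
# 'tmp_vec' is an illustrative name, not taken from the notes above.
tmp_vec <- c(5, 2, 1, 6)

tmp_vec <- NULL
was_null      <- is.null(tmp_vec)    # TRUE: the contents are gone
still_defined <- exists("tmp_vec")   # TRUE: the name is still defined

rm(tmp_vec)
defined_after_rm <- exists("tmp_vec")  # FALSE: the name has been removed

print(was_null)
print(still_defined)
print(defined_after_rm)
```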
Arithmetic operations
We can perform arithmetic operations between two vectors. These
operations are performed element-wise, so the two vectors should
normally have the same length. If they do not, R recycles the shorter
vector (see the section on vector recycling).
# Creating Vectors
X <- c(5, 2, 5, 1, 51, 2)
Y <- c(7, 9, 1, 5, 2, 1)
# Addition
Z <- X + Y
print('Addition')
print(Z)
# Subtraction
S <- X - Y
print('Subtraction')
print(S)
# Multiplication
M <- X * Y
print('Multiplication')
print(M)
# Division
D <- X / Y
print('Division')
print(D)
Output:

Addition 12 11 6 6 53 3
Subtraction -2 -7 4 -4 49 1
Multiplication 35 18 5 5 102 2
Division 0.7142857 0.2222222 5.0000000 0.2000000 25.5000000
2.0000000
Sorting of Vectors
For sorting we use the sort() function which sorts the vector in
ascending order by default.

# Creating a Vector
X <- c(5, 2, 5, 1, 51, 2)
# Sort in ascending order
A <- sort(X)
print('sorting done in ascending order')
print(A)
# sort in descending order.
B <- sort(X, decreasing = TRUE)
print('sorting done in descending order')
print(B)
Output:

sorting done in ascending order 1 2 2 5 5 51


sorting done in descending order 51 5 5 2 2 1

3. Explain arrays in R
Arrays are R data objects that can store data in more than two
dimensions. For example, if we create an array of dimension (2, 3, 4),
it creates 4 rectangular matrices, each with 2 rows and 3 columns.
Arrays can store only one data type.
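
The (2, 3, 4) case can be sketched as follows. The values 1 to 24 and the name arr are illustrative choices.

```r
# 24 values arranged as 2 rows x 3 columns x 4 matrices
arr <- array(1:24, dim = c(2, 3, 4))

print(dim(arr))      # 2 3 4
print(arr[, , 1])    # the first 2 x 3 matrix (values 1 to 6)
print(arr[2, 3, 4])  # row 2, column 3 of the fourth matrix: 24
```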

An array is created using the array() function. It takes vectors as input
and uses the values in the dim parameter to create an array.
Example
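The following example creates an array of two 3x3 matrices. The two input
vectors supply only 9 values, so R recycles them to fill all 18
positions, which is why both matrices are identical.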
# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.


result <- array(c(vector1,vector2),dim = c(3,3,2))
print(result)
Output:
, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

4. Data filtering using the filter() function

The filter() function from the dplyr package produces a subset of a data
frame, retaining all rows that satisfy the specified conditions. It can be
applied to both grouped and ungrouped data. The conditions can use
comparison operators (==, >, >=), logical operators (&, |, !, xor()),
helper functions such as between() for ranges and near() for approximate
numeric equality, and NA checks with is.na() on the column values.
filter() does not modify the original data frame, so the result has to be
assigned to a variable to be kept.

Syntax: filter(df, condition)

Parameters:
df: the data frame object
condition: the logical condition that rows must satisfy to be kept

Example:
# Install dplyr once if it is not already available, then load it
# install.packages("dplyr")
library(dplyr)
# Create a sample data frame
data <- data.frame(
Name = c("Alice", "Bob", "Charlie", "David", "Emily"),
Age = c(25, 30, 22, 35, 28),
Score = c(95, 80, 60, 75, 90)
)
# Print the original data frame
print("Original data frame:")
print(data)
# Use filter to subset rows based on a condition (e.g., age greater than 25)
filtered_data <- filter(data, Age > 25)
# Print the filtered data frame
print("Filtered data frame (Age > 25):")
print(filtered_data)
Output:
[1] "Original data frame:"
     Name Age Score
1   Alice  25    95
2     Bob  30    80
3 Charlie  22    60
4   David  35    75
5   Emily  28    90

[1] "Filtered data frame (Age > 25):"
   Name Age Score
1   Bob  30    80
2 David  35    75
3 Emily  28    90
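
The other kinds of conditions that filter() accepts (logical operators, between(), and NA checks) can be sketched in the same way. This assumes dplyr is installed. The data frame students and the NA placed in Score are illustrative, added only to show the is.na() check.

```r
library(dplyr)

# Illustrative data; the NA in Score is added here to show the NA check.
students <- data.frame(
  Name  = c("Alice", "Bob", "Charlie", "David", "Emily"),
  Age   = c(25, 30, 22, 35, 28),
  Score = c(95, 80, NA, 75, 90),
  stringsAsFactors = FALSE
)

# Logical AND: older than 25 and scoring at least 80
older_high <- filter(students, Age > 25 & Score >= 80)

# between() keeps values inside an inclusive range
mid_age <- filter(students, between(Age, 25, 30))

# Drop rows whose Score is missing
scored <- filter(students, !is.na(Score))

print(older_high$Name)  # "Bob" "Emily"
print(mid_age$Name)     # "Alice" "Bob" "Emily"
print(scored$Name)      # "Alice" "Bob" "David" "Emily"
```

Rows for which a condition evaluates to NA are dropped by filter(), just like rows for which it is FALSE.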

R ifelse() Function
In R, the ifelse() function is a vectorized shorthand for the standard
if...else statement.
Most functions in R take a vector as input and return a vector as output.
Similarly, the vector equivalent of the traditional if...else block is
the ifelse() function.
The syntax of the ifelse() function is:
ifelse(test_expression, x, y)

For each element, the output vector takes the corresponding value from x
where test_expression is TRUE and from y where it is FALSE.

Example 1: ifelse() Function for Odd/Even Numbers

# input vector

x <- c(12, 9, 23, 14, 20, 1, 5)

# ifelse() function to determine odd/even numbers


ifelse(x %% 2 == 0, "EVEN", "ODD")
Output

[1] "EVEN" "ODD" "ODD" "EVEN" "EVEN" "ODD" "ODD"

Example 2: ifelse() Function for Pass/Fail


# input vector of marks
marks <- c(63, 58, 12, 99, 49, 39, 41, 2)

# ifelse() function to determine pass/fail


ifelse(marks < 40, "FAIL", "PASS")
Output

[1] "PASS" "PASS" "FAIL" "PASS" "PASS" "FAIL" "PASS" "FAIL"
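
When more than two outcomes are needed, ifelse() calls can be nested. The sketch below extends the pass/fail example; the cut-offs 40 and 60 and the label "DISTINCTION" are illustrative assumptions.

```r
marks <- c(63, 58, 12, 99, 49, 39, 41, 2)

# Nested ifelse(): the inner result is used only where the outer test is FALSE.
# The cut-offs 40 and 60 are illustrative assumptions.
grade <- ifelse(marks < 40, "FAIL",
                ifelse(marks < 60, "PASS", "DISTINCTION"))

print(grade)
# "DISTINCTION" "PASS" "FAIL" "DISTINCTION" "PASS" "FAIL" "PASS" "FAIL"
```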

Vector recycling in R
Vector recycling occurs when we perform an operation such as addition or
subtraction on two vectors of unequal length. The shorter vector is
repeated until it matches the length of the longer vector, and the
operation is then carried out element by element. When the two vectors
have equal length, no repetition is needed: the first value of vector 1
is combined with the first value of vector 2, the second with the second,
and so on.
This repetition of the shorter vector to complete the operation on the
longer vector is known as vector recycling. Let us see the
implementation of vector recycling.
# creating vector with
# 1 to 6 values
vec1=1:6
# creating vector with 1:2
# values
vec2=1:2
# adding vector1 and vector2
print(vec1+vec2)
Output:
[1] 2 4 4 6 6 8

Example 2:
# creating vector with 10 to 14 values


vec1=10:14
# creating vector with 3 to 5 values
vec2=3:5
# adding vector1 and vector2
print(vec1+vec2)
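Output:
[1] 13 15 17 16 18
Warning message:
In vec1 + vec2 :
  longer object length is not a multiple of shorter object length

Here the length of the longer vector (5) is not a multiple of the length
of the shorter vector (3), so R still recycles the shorter vector but
issues a warning.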
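
Recycling can be made explicit with rep(), which repeats the shorter vector up to a given length. The sketch below shows that the explicit and implicit forms give the same result; the variable names are illustrative.

```r
long_vec  <- 1:6
short_vec <- 1:2

# What R does implicitly: stretch the shorter vector to the longer length
stretched <- rep(short_vec, length.out = length(long_vec))
print(stretched)              # 1 2 1 2 1 2

print(long_vec + short_vec)   # 2 4 4 6 6 8
print(long_vec + stretched)   # 2 4 4 6 6 8
```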

Test for Equality of All Vector Elements in R


Method 1: Using variance
All elements of a numeric vector are equal if its variance is zero. The
variance is computed with the var() function. This check needs at least
two elements, because var() returns NA for a vector of length one.
Syntax:
var(vector)==0
# consider a vector with same elements
vec1 = c(7, 7, 7, 7, 7, 7, 7)
print(var(vec1) == 0)
# consider a vector with different elements
vec2 = c(17, 27, 37, 47, 57, 7, 7)
print(var(vec2) == 0)
Output:
[1] TRUE
[1] FALSE

Method 2: Using length() and unique() function


unique() returns the distinct values of a vector. If all the elements are
the same, the result has length 1, so a length of 1 means that all
elements of the vector are equal.
Syntax:
length(unique(vector))==1
• unique() is used to get the unique values in a vector
• length() is used to find the length of the resulting vector
# consider a vector with same elements

vec1 = c(7, 7, 7, 7, 7, 7, 7)

print(length(unique(vec1)) == 1)

# consider a vector with different elements

vec2 = c(17, 27, 37, 47, 57, 7, 7)


print(length(unique(vec2)) == 1)
Output:
[1] TRUE
[1] FALSE
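
Both methods compare values exactly, so results of floating-point arithmetic that differ only by rounding are reported as unequal. One possible workaround, sketched here, is a tolerance check. The helper all_equal_tol() is hypothetical and not part of base R, and the tolerance 1e-8 is an arbitrary illustrative choice.

```r
# Hypothetical helper, not part of base R: TRUE when the spread of the
# values is below a small tolerance. Assumes a non-empty numeric vector
# without NA values.
all_equal_tol <- function(x, tol = 1e-8) {
  diff(range(x)) < tol
}

x <- c(0.1 + 0.2, 0.3, 0.3)    # 0.1 + 0.2 is not exactly 0.3

print(length(unique(x)) == 1)  # FALSE
print(all_equal_tol(x))        # TRUE
```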

R – Matrices
Creating matrices
A matrix is a rectangular arrangement of numbers in rows and columns.
Rows run horizontally and columns run vertically. In R, matrices are
two-dimensional, homogeneous data structures.

To create a matrix in R, use the matrix() function. Its first argument is
the vector of elements, and the nrow and ncol arguments specify how many
rows and columns the matrix should have.
Note: By default, matrix() fills the elements column by column. Pass
byrow = TRUE to fill them row by row.
# R program to create a matrix
A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
rownames(A) = c("a", "b", "c")
colnames(A) = c("d", "e", "f")
cat("The 3x3 matrix:\n")
print(A)
Output:
The 3x3 matrix:
  d e f
a 1 2 3
b 4 5 6
c 7 8 9
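
The effect of byrow is easiest to see by building the same values both ways. A short sketch with illustrative names:

```r
# Same six values, filled in the two possible orders
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # default: column-wise
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # row-wise

print(by_col)
print(by_row)

print(by_col[1, ])  # 1 3 5
print(by_row[1, ])  # 1 2 3
```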
