Data Analytics Using R: Notes
Madhusudanacharyulu Padakandla
1. Introduction to R
R is an open-source programming language that is widely used
as a statistical software and data analysis tool. R generally comes with
a command-line interface and is available on widely used platforms
such as Windows, Linux, and macOS. It is actively developed and remains
one of the most widely used tools for statistical computing.
It was designed by Ross Ihaka and Robert Gentleman at the University
of Auckland, New Zealand.
R is used as a leading tool for machine learning,
statistics, and data analysis.
It is a platform-independent language, which means the same R code can run
on all of these operating systems.
Why Use R?
Statistical Analysis: R is designed for statistical analysis and provides
an extensive collection of graphical and statistical techniques,
making it a preferred choice for statisticians and data analysts.
Open Source: R is open-source software, which means it is
freely available to anyone. It is supported by a vibrant
community of users and developers.
Data Visualization: R offers libraries such as ggplot2
that enable the creation of high-quality, customizable data
visualizations.
Data Manipulation: R offers tools for data
manipulation and transformation. For example, they simplify the
process of filtering, summarizing, and transforming data.
Integration: R can be easily integrated with other programming
languages and data sources. It has connectors to various
databases and can be used in conjunction with Python, SQL, and
other tools.
Community and Packages: R has a vast ecosystem of packages
that extend its functionality. There are packages to cover
most analytics needs.
Creating a vector
Vectors can be created in many ways as shown in the following
example.
# Use of 'c' function
# to combine the values as a vector.
# by default the type will be double
X <- c(1, 4, 5, 2, 6, 7)
print('using c function')
print(X)
# using the seq() function to generate
# a sequence of continuous values
# with different step-size and length.
# length.out defines the length of vector.
Y <- seq(1, 10, length.out = 5)
print('using seq() function')
print(Y)
# using ':' operator to create
# a vector of continuous values.
Z <- 5:10
print('using colon')
print(Z)
Output:
using c function 1 4 5 2 6 7
using seq function 1.00 3.25 5.50 7.75 10.00
using colon 5 6 7 8 9 10
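The comment in the example above says that c() produces a double vector by default. A minimal sketch to check the storage type of each kind of vector with typeof(); the vector W is a hypothetical addition that is not in the notes, used to show that the L suffix forces integer storage:

```r
X <- c(1, 4, 5, 2, 6, 7)          # plain numeric literals are stored as double
Y <- seq(1, 10, length.out = 5)   # seq() with length.out returns double values
Z <- 5:10                         # the colon operator returns integers
W <- c(1L, 4L, 5L)                # hypothetical vector: the L suffix marks integer literals

print(typeof(X))
print(typeof(Y))
print(typeof(Z))
print(typeof(W))
```

typeof() reports "double" for X and Y and "integer" for Z and W.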
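Accessing vector elements
Elements of a vector can be accessed by position, by a vector of
positions, or by a logical condition.
# Accessing a single element with the subscript operator
X <- c(2, 5, 8, 1, 2)
print('using Subscript operator')
print(X[2])
# Accessing several elements by passing a vector of positions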
Y <- c(4, 5, 2, 1, 7)
print('using c function')
print(Y[c(4, 1)])
# Logical indexing
Z <- c(5, 2, 1, 4, 4, 3)
print('Logical indexing')
print(Z[Z>3])
Output:
using Subscript operator 5
using c function 1 4
Logical indexing 5 4 4
Modifying a vector
Vectors can be modified by assigning to any of the indexing forms
used for accessing elements.
# Creating a vector
X <- c(2, 5, 1, 7, 8, 2)
# modify a specific element
X[3] <- 11
print('Using subscript operator')
print(X)
# Modify every element that satisfies a logical condition.
X[X>9] <- 0
print('Logical indexing')
print(X)
# Keep only the elements at the given positions, in that order.
X <- X[c(5, 2, 1)]
print('using c function')
print(X)
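Output:
Using subscript operator 2 5 11 7 8 2
Logical indexing 2 5 0 7 8 2
using c function 8 5 2
Arithmetic operations
Arithmetic on two vectors of equal length is applied element by element.
Output:
Addition 12 11 6 6 53 3
Subtraction -2 -7 4 -4 49 1
Multiplication 35 18 5 5 102 2
Division 0.7142857 0.2222222 5.0000000 0.2000000 25.5000000 2.0000000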
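The Addition, Subtraction, Multiplication and Division rows above are what element-wise arithmetic on two equal-length vectors produces. A minimal sketch that reproduces them, assuming operand vectors X and Y with the values below (these are inferred from the results; the notes do not show the original operands):

```r
# Assumed operands, inferred from the printed results
X <- c(5, 2, 5, 1, 51, 2)
Y <- c(7, 9, 1, 5, 2, 1)

# Each operator works position by position on the two vectors
print('Addition')
print(X + Y)
print('Subtraction')
print(X - Y)
print('Multiplication')
print(X * Y)
print('Division')
print(X / Y)
```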
Sorting of Vectors
For sorting we use the sort() function which sorts the vector in
ascending order by default.
# Creating a Vector
X <- c(5, 2, 5, 1, 51, 2)
# Sort in ascending order
A <- sort(X)
print('sorting done in ascending order')
print(A)
# sort in descending order.
B <- sort(X, decreasing = TRUE)
print('sorting done in descending order')
print(B)
Output:
sorting done in ascending order 1 2 2 5 5 51
sorting done in descending order 51 5 5 2 2 1
3. Explain arrays in R
Arrays are R data objects that can store data in more than two
dimensions. For example, if we create an array of dimension (2, 3, 4),
it creates 4 rectangular matrices, each with 2 rows and 3 columns.
An array can store values of only one data type.
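The (2, 3, 4) case described above can be sketched with the base array() function. The fill values 1 to 24 and the names arr and mixed are illustrative assumptions, not taken from the notes:

```r
# Fill a 2 x 3 x 4 array with the integers 1 to 24 (illustrative values)
arr <- array(1:24, dim = c(2, 3, 4))

print(dim(arr))       # the three dimensions: rows, columns, matrices
print(arr[, , 2])     # the second of the four 2 x 3 matrices
print(arr[1, 3, 2])   # row 1, column 3 of the second matrix

# Mixing types forces a single common type: here everything becomes character
mixed <- array(c(1, "a", TRUE, 2), dim = c(2, 2, 1))
print(typeof(mixed))
```

The third index selects one of the four matrices, and the mixed example shows why an array holds only one data type: R coerces all values to a common type when the array is built.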
Example:
# Install the dplyr package (needed only once) and load it
install.packages("dplyr")
library(dplyr)
# Create a sample data frame
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Emily"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(95, 80, 60, 75, 90)
)
# Print the original data frame
print("Original data frame:")
print(data)
# Use filter() to subset rows based on a condition (e.g., Age greater than 25)
filtered_data <- filter(data, Age > 25)
# Print the filtered data frame
print("Filtered data frame (Age > 25):")
print(filtered_data)
Output:
[1] "Original data frame:"
     Name Age Score
1   Alice  25    95
2     Bob  30    80
3 Charlie  22    60
4   David  35    75
5   Emily  28    90
[1] "Filtered data frame (Age > 25):"
   Name Age Score
1   Bob  30    80
2 David  35    75
3 Emily  28    90
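The same row filter can also be written in base R, without installing any package. This is a minimal sketch of an alternative to dplyr's filter(), not the approach used in the notes. The names filtered_base and filtered_subset are hypothetical:

```r
# Same sample data frame as in the dplyr example
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Emily"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(95, 80, 60, 75, 90)
)

# Logical indexing on rows: keep rows where Age > 25, keep all columns
filtered_base <- data[data$Age > 25, ]
print(filtered_base)

# subset() expresses the same condition without repeating the data frame name
filtered_subset <- subset(data, Age > 25)
print(filtered_subset)
```

Unlike dplyr's filter(), both base R forms keep the original row names (2, 4, 5) instead of renumbering the rows.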
R ifelse() Function
In R, the ifelse() function is a vectorized shorthand for the
standard if...else statement.
Most functions in R take a vector as input and return a vector as
output. Likewise, ifelse() is the vector equivalent of the traditional
if...else block.
The syntax of the ifelse() function is:
ifelse(test_expression, x, y)
Here test_expression is a logical vector. The result takes its value from
x wherever the test is TRUE and from y wherever it is FALSE.
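A minimal sketch of the syntax above. The vector marks, the threshold 50, and the labels "Pass" and "Fail" are hypothetical values chosen for illustration:

```r
# Hypothetical input vector of marks
marks <- c(72, 35, 90, 48, 50)

# The test is evaluated for every element at once:
# "Pass" where the mark is at least 50, "Fail" otherwise
result <- ifelse(marks >= 50, "Pass", "Fail")
print(result)
```

The result has the same length as the input, with one label per mark, so no explicit loop is needed.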
Vector recycling in R
Vector recycling occurs when we perform an operation such as addition
or subtraction on two vectors of unequal length. The elements of the
shorter vector are repeated until the operation has covered every
element of the longer vector. When the two vectors have equal length, no
recycling is needed: the first value of vector 1 is combined with the
first value of vector 2, the second with the second, and so on.
[Figure: operations on vectors of unequal length and of equal length]
This repetition of the shorter vector until the operation on the longer
vector is complete is known as vector recycling. It is a built-in
property of vectors in R. Let us see the
implementation of vector recycling.
# creating vector with
# 1 to 6 values
vec1=1:6
# creating vector with 1:2
# values
vec2=1:2
# adding vector1 and vector2
print(vec1+vec2)
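Output:
[1] 2 4 4 6 6 8
Here the length of the longer vector (6) is a multiple of the length of the
shorter vector (2), so the shorter vector is recycled exactly three times.
If the longer object length is not a multiple of the shorter object length,
R still recycles the shorter vector but issues a warning message.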
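A minimal sketch of the warning case, assuming a hypothetical second vector vec3 of length 4, which does not divide 6. The handler is only there to capture the warning text so it can be printed next to the result:

```r
vec1 <- 1:6
vec3 <- 1:4   # hypothetical vector: its length does not divide length(vec1)

# Record the warning message and continue, so the recycled result is still returned
msg <- NULL
result <- withCallingHandlers(
  vec1 + vec3,
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(result)   # vec3 is reused as 1 2 3 4 1 2
print(msg)
```

Running vec1 + vec3 directly at the console gives the same values and prints the warning after the result.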
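# Check whether all elements of a vector are equal:
# unique() returns the distinct values, so a length of 1
# means every element is the same.
vec1 = c(7, 7, 7, 7, 7, 7, 7)
print(length(unique(vec1)) == 1)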
R – Matrices
Creating matrices
A matrix is a rectangular arrangement of numbers in rows and columns.
Rows run horizontally and columns run vertically. In R programming,
matrices are two-dimensional, homogeneous data structures.
[Figure: examples of matrices]
To create a matrix in R, use the matrix() function. Its first
argument is a vector containing the elements of the matrix.
You also pass the number of rows (nrow) and the number of
columns (ncol) you want in the matrix.
Note: By default, matrices are filled in column-wise order. Pass
byrow = TRUE to fill them row by row.
# R program to create a matrix
A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
rownames(A) = c("a", "b", "c")
colnames(A) = c("d", "e", "f")
cat("The 3x3 matrix:\n")
print(A)
Output:
The 3x3 matrix:
  d e f
a 1 2 3
b 4 5 6
c 7 8 9
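The effect of the fill order can be seen by building the same values both ways. This is a minimal sketch. The names values, by_col and by_row are hypothetical:

```r
values <- 1:6

# Default: elements fill down each column first
by_col <- matrix(values, nrow = 2, ncol = 3)

# byrow = TRUE: elements fill across each row first
by_row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)

print(by_col)
print(by_row)
```

With the default order the first row is 1 3 5. With byrow = TRUE it is 1 2 3.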