How to Use aggregate and Not Drop Rows with NA in R
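In the R programming language, the aggregate() function computes summary statistics by group. By default, the formula interface of aggregate() uses na.action = na.omit, which drops every row that has a missing value (NA) in any variable named in the formula, whether it is a grouping column or the column being summarized. Specifying na.action = na.pass keeps those rows so that they reach the aggregation function. Rows whose grouping value is itself NA are still left out of the result, because NA is not treated as a group.
Let us look in detail at how to use aggregate() without dropping rows with NA in R.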
Syntax:
aggregate(formula, data, FUN, na.action = na.pass)
Where:
formula: A formula specifying the variable(s) to be aggregated and the grouping variable(s).
data: The data frame containing the variables.
FUN: The function to be applied for aggregation (e.g., mean, sum, max, etc.).
na.action: Specifies how to handle NA values. Setting na.action = na.pass retains rows with NA values during aggregation.
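To make the effect of na.action concrete, here is a minimal sketch that runs the same aggregation twice, once with the default and once with na.pass. The data frame toy and the result names are hypothetical and chosen only for illustration.

```r
# Hypothetical data: group "A" has only NA values, group "B" is complete
toy <- data.frame(Group = c("A", "A", "B", "B"),
                  Value = c(NA, NA, 1, 3))

# Default (na.action = na.omit): rows with NA are removed before grouping,
# so group "A" does not appear in the result at all
default_result <- aggregate(Value ~ Group, data = toy, FUN = mean)

# na.pass: every row reaches FUN, so group "A" is kept with a mean of NA
pass_result <- aggregate(Value ~ Group, data = toy, FUN = mean,
                         na.action = na.pass)

print(default_result)
print(pass_result)
```

With the default, the result has a single row for group "B". With na.pass, it has one row per group, and the value for "A" is NA because mean() returns NA when its input contains NA.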
Aggregating with Sum
In this example, the dataset has two columns, "Group" and "Value". We aggregate the sum of "Value" by "Group" and retain the rows whose "Value" is NA during aggregation.
# Create dataframe
df1 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Value = c(NA, 2, NA, 4, 5))
# Aggregate with sum and retain rows with NA values
result1 <- aggregate(Value ~ Group, data = df1, FUN = sum, na.action = na.pass)
# Display the result
print(result1)
Output:
  Group Value
1     A    NA
2     B     6
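Group A shows NA because sum() returns NA whenever its input contains an NA, and the fifth row does not appear because its "Group" value is missing. If a numeric total per group is preferred, one option is to forward na.rm = TRUE to FUN through the extra arguments of aggregate(). The sketch below assumes the same data as above; the name result_narm is hypothetical.

```r
# Same data as in the example above
df1 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Value = c(NA, 2, NA, 4, 5))

# Extra arguments are forwarded to FUN, so each group is summed
# with sum(x, na.rm = TRUE); na.pass keeps the NA rows so group "A" survives
result_narm <- aggregate(Value ~ Group, data = df1, FUN = sum,
                         na.rm = TRUE, na.action = na.pass)

print(result_narm)
```

Group "A" now gets 0, since the sum of an all-NA vector with na.rm = TRUE is 0, and group "B" stays at 6. Whether 0 or NA is the better summary for a group with no observed values depends on the analysis.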
Aggregating with Custom Function
In this example, we want to find the median of "Rating" within each "Group" in a dataset df4 with two columns: "Group" and "Rating". We apply a custom function that computes the median with na.rm = TRUE, so NA ratings are ignored inside each group, while na.action = na.pass ensures that rows with an NA rating are not dropped before aggregation.
# R program that uses aggregate() while retaining rows with NA values
# Create dataframe
df4 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Rating = c(3.5, 4.2, NA, 3.8, 4.5))
# Custom function to compute the median, ignoring NA values within each group
median_custom <- function(x) {
  median(x, na.rm = TRUE)
}
# Aggregate with custom function and retain rows with NA values
result4 <- aggregate(Rating ~ Group, data = df4, FUN = median_custom,
                     na.action = na.pass)
# Display the result
print(result4)
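Even with na.action = na.pass, the row whose "Group" is NA does not form a group of its own in the result. If that row should be summarized too, one possible approach is to replace the missing group value with an explicit label before aggregating. This is a sketch under that assumption: the column name GroupLabel, the label "Missing", and the name result_labeled are all hypothetical.

```r
# Same data as in the example above
df4 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Rating = c(3.5, 4.2, NA, 3.8, 4.5))

# Replace NA group values with an explicit label (the label text is arbitrary);
# as.character() guards against Group being stored as a factor
group_chr <- as.character(df4$Group)
df4$GroupLabel <- ifelse(is.na(group_chr), "Missing", group_chr)

# Aggregate on the labeled column; na.rm = TRUE is forwarded to median()
result_labeled <- aggregate(Rating ~ GroupLabel, data = df4, FUN = median,
                            na.rm = TRUE, na.action = na.pass)

print(result_labeled)
```

The result now has three rows, one each for "A", "B", and "Missing", so the rating recorded without a group is no longer silently left out.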