Sign Test in R

Last Updated : 04 Jun, 2023

In statistics, the sign test is a non-parametric test used to compare two paired samples, or to test whether the median of a single sample equals a hypothesized value. It should not be confused with the Wilcoxon signed-rank test: that test also uses the ranks of the differences, whereas the sign test uses only their signs. The sign test is a useful alternative when the assumptions of a parametric test, such as the paired t-test, cannot be met. In this article, we will explore how to perform the sign test in R. We will start by introducing the concept of the sign test and its assumptions, and then we will discuss how to perform the test using R.

Understanding the Sign Test

The sign test is a non-parametric test used to compare two paired samples. It is useful when the paired differences are not normally distributed, so that a paired t-test is not appropriate. The null hypothesis is that the median of the paired differences is zero, which means a positive difference is just as likely as a negative one.

The sign test works by comparing the two observations within each pair. For each pair, we calculate the difference between the two observations and assign a plus or minus sign to it, depending on which observation is larger. If the two observations are equal, the difference is zero; such ties carry no sign and are usually dropped from the analysis. We then count the plus and minus signs. Under the null hypothesis, the number of plus signs follows a binomial distribution with n equal to the number of non-zero differences and a success probability of 0.5, and we use this distribution to calculate the probability of a result at least as extreme as the one observed.
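The steps above can be sketched by hand in R before turning to a built-in function. This is a minimal illustration, not a definitive implementation: the before and after vectors are made-up values, and the variable names are chosen only for this example.

```r
# Hypothetical paired measurements (illustrative values only)
before <- c(12, 15, 9, 14, 11, 13, 10, 16)
after  <- c(14, 15, 12, 13, 15, 17, 12, 19)

# Step 1: difference for each pair
differences <- after - before

# Step 2: sign of each difference (+1, -1, or 0 for a tie)
signs <- sign(differences)

# Step 3: count plus and minus signs; ties (zeros) are left out
n_plus  <- sum(signs > 0)
n_minus <- sum(signs < 0)
n       <- n_plus + n_minus

# Step 4: two-sided p-value under Binomial(n, 0.5).
# The distribution is symmetric when p = 0.5, so we double the smaller tail
# and cap the result at 1.
p_value <- min(1, 2 * pbinom(min(n_plus, n_minus), n, 0.5))

print(c(plus = n_plus, minus = n_minus, n = n))
print(p_value)  # 0.125
```

Here one pair is tied and dropped, leaving 6 plus signs and 1 minus sign out of 7 non-zero differences.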

Assumptions of the Sign Test

The sign test is a non-parametric test, so it does not assume that the data follow a normal distribution or any other specific distribution. However, a few conditions must still be met for the sign test to be valid:

The data must be paired. That is, each observation in one population must be paired with an observation in the other population.
The differences between the paired observations must be independent.
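The measurements must be at least ordinal, so that for each pair we can say which observation is larger. Unlike the Wilcoxon signed-rank test, the sign test does not require the differences to be symmetrically distributed.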

Performing the Sign Test in R

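To perform the sign test in R Programming Language, we will use the binom.test() function in the stats package. Its first two arguments are x, the number of successes (here, the number of positive differences), and n, the number of trials (here, the number of non-zero differences). The p argument sets the hypothesized probability of success, and alternative selects a two-sided or one-sided test.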

R

# Create a vector of data 
data <- c(-1, 2, -3, 4, -5, 6, -7, 8) 
  
# Perform sign test using the binomial test function 
binom_test_result <- binom.test(sum(data > 0), length(data), p=0.5, alternative="two.sided") 
  
# View the results 
print(binom_test_result)

Output:

Exact binomial test

data:  sum(data > 0) and length(data)
number of successes = 4, number of trials = 8, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.1570128 0.8429872
sample estimates:
probability of success 
0.5

In this example, we’re using the binom.test() function from the stats package to perform a two-sided sign test with a null hypothesis that the median of data is equal to 0. We calculate the number of positive values (sum(data > 0)) and pass that along with the total number of values (length(data)) to the function. Because data contains no zeros, there are no ties to remove first. The p argument specifies the hypothesized probability of success (i.e., the probability that an observation is greater than 0), which we set to 0.5. The output reports the number of successes and trials, the p-value, a 95% confidence interval, and the estimated probability of success.

The given output suggests that an exact binomial test was performed to test whether the true probability of success is significantly different from 0.5, based on a sample of 8 trials where 4 were successful (data > 0). The null hypothesis is that the true probability of success is equal to 0.5, and the alternative hypothesis is that it is not equal to 0.5.
The p-value of 1 indicates that we fail to reject the null hypothesis at the significance level of 0.05, which means that we do not have enough evidence to suggest that the true probability of success is significantly different from 0.5.
The 95% confidence interval for the true probability of success is calculated to be between 0.157 and 0.843. This suggests that the true probability of success could range from quite low (15.7%) to fairly high (84.3%), but we cannot say with certainty where the true probability lies based on this sample alone.
Finally, the sample estimate of the probability of success is 4/8 = 0.5, which coincides exactly with the null hypothesis value. This is why the p-value is 1: an even split of plus and minus signs is the single most likely outcome under the null hypothesis, so the sample provides no evidence against it.
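
The same approach carries over to two paired samples. The sketch below wraps the steps in a small helper; sign_test() is a hypothetical function written for this illustration and is not part of base R, and the treated and control scores are made-up values. It assumes ties should be dropped before calling binom.test().

```r
# sign_test() is a hypothetical helper for illustration, not a base R function
sign_test <- function(x, y, alternative = "two.sided") {
  stopifnot(length(x) == length(y))
  differences <- x - y
  differences <- differences[differences != 0]  # drop tied pairs
  binom.test(sum(differences > 0), length(differences),
             p = 0.5, alternative = alternative)
}

# Hypothetical paired scores (illustrative values only)
treated <- c(81, 74, 90, 65, 79, 88, 72, 69, 84, 77)
control <- c(75, 74, 82, 69, 71, 80, 68, 61, 79, 70)

result <- sign_test(treated, control)

# Individual components of the returned "htest" object
print(result$statistic)  # number of positive differences
print(result$parameter)  # number of non-zero differences
print(result$p.value)
```

With these values, one pair is tied, and 8 of the remaining 9 differences are positive. Passing alternative = "greater" instead would test the one-sided hypothesis that positive differences are more likely than negative ones.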