In statistics, the sign test is a non-parametric test used to compare two paired samples and to test whether their medians differ. It should not be confused with the Wilcoxon signed-rank test: the sign test uses only the signs of the paired differences, while the Wilcoxon signed-rank test also uses the ranks of their magnitudes. The sign test is used when the assumptions of a parametric test, such as the paired t-test, cannot be met. In this article, we will explore how to perform the sign test in R. We will start by introducing the concept of the sign test and its assumptions, and then we will discuss how to perform the test using R.
Understanding the Sign Test
The sign test is a non-parametric test used to compare two paired samples. It is useful when the differences between the pairs are not normally distributed, so that a paired t-test is not appropriate. The sign test tests the null hypothesis that the median of the paired differences is zero, that is, that a positive difference is as likely as a negative one.
The sign test works by comparing the two observations in each pair. For each pair, we calculate the difference between the two observations and record its sign: plus if the first observation is larger, minus if the second is larger. If the two observations are equal, the pair is a tie and is usually dropped from the analysis. We then count the plus and minus signs. Under the null hypothesis, the number of plus signs follows a binomial distribution with n equal to the number of non-tied pairs and probability 0.5, and we use that distribution to calculate the probability of observing a count at least as extreme as the one we got.
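The steps above can be sketched by hand in R. This is a minimal illustration, not a definitive implementation: the before and after vectors are made-up values, ties are assumed to be dropped (a common convention), and the two-sided p-value is taken as twice the smaller binomial tail, capped at 1.

```r
# Hypothetical paired measurements (illustrative values only)
before <- c(12, 15, 9, 14, 11, 13, 10, 16)
after  <- c(14, 15, 8, 17, 13, 15, 12, 18)

d <- after - before   # paired differences
d <- d[d != 0]        # drop ties (assumption: ties carry no sign information)

n_pos <- sum(d > 0)   # number of plus signs
n_neg <- sum(d < 0)   # number of minus signs
n     <- n_pos + n_neg

# Two-sided p-value under Binomial(n, 0.5): double the smaller tail, cap at 1
p_value <- min(1, 2 * pbinom(min(n_pos, n_neg), size = n, prob = 0.5))

print(c(plus = n_pos, minus = n_neg, p_value = p_value))
```

With these values there are 6 plus signs, 1 minus sign and 1 tie, which gives a p-value of 0.125. Because the null distribution is symmetric when the probability is 0.5, this matches what binom.test(n_pos, n) reports.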
Assumptions of the Sign Test
The sign test is a non-parametric test and it does not make any assumptions about the underlying distribution of the data. However, there are some assumptions that must be met in order for the sign test to be valid:
- The data must be paired. That is, each observation in one population must be paired with an observation in the other population.
- The differences between the paired observations must be independent.
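- The data must be at least ordinal, so that for each pair we can say which observation is larger. Unlike the Wilcoxon signed-rank test, the sign test does not require the differences to be symmetrically distributed around zero.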
Performing the Sign Test in R
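To perform the sign test in R, we will use the binom.test() function in the stats package. Its two main arguments are x, the number of successes (here, the number of positive differences), and n, the number of trials (here, the number of non-zero differences). The p argument sets the hypothesized probability of success, and alternative selects a two-sided or one-sided test.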
R
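# Sample values; the sign test uses only whether each value is above or below 0
data <- c(-1, 2, -3, 4, -5, 6, -7, 8)
# Count positive values as successes out of all observations (there are no zeros here)
binom_test_result <- binom.test(sum(data > 0), length(data), p = 0.5, alternative = "two.sided")
print(binom_test_result)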
Output:
Exact binomial test
data: sum(data > 0) and length(data)
number of successes = 4, number of trials = 8, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.1570128 0.8429872
sample estimates:
probability of success
0.5
In this example, we use the binom.test() function from the stats package to perform a two-sided sign test with the null hypothesis that the median of data is equal to 0. We count the values greater than 0 (sum(data > 0)) and pass that count to the function along with the total number of observations (length(data)). Using length(data) as the number of trials is valid here because none of the values is exactly 0; if there were zeros, they would be excluded first. The p argument specifies the hypothesized probability of success (i.e., the probability that an observation is greater than 0), which we set to 0.5. The output includes the number of successes, the number of trials, the p-value, and a confidence interval for the probability of success.
- The given output suggests that an exact binomial test was performed to test whether the true probability of success is significantly different from 0.5, based on a sample of 8 trials where 4 were successful (data > 0). The null hypothesis is that the true probability of success is equal to 0.5, and the alternative hypothesis is that it is not equal to 0.5.
- The p-value of 1 indicates that we fail to reject the null hypothesis at the significance level of 0.05, which means that we do not have enough evidence to suggest that the true probability of success is significantly different from 0.5.
- The 95% confidence interval for the true probability of success runs from 0.157 to 0.843. The interval is wide because the sample contains only 8 observations, so the data are compatible with a true probability anywhere from quite low (15.7%) to fairly high (84.3%).
- Finally, the sample estimate of the probability of success is 0.5, which is exactly the null hypothesis value. The sample contains as many positive values as negative ones, which is why the p-value is 1.
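The example above tests a single sample against 0. For the paired setting described earlier, one possible approach is to wrap binom.test() in a small helper. The function name sign_test and the treated and control vectors below are hypothetical and used only for illustration; the helper assumes that ties and missing pairs are dropped before counting.

```r
# Hypothetical helper: sign test for paired samples built on binom.test()
sign_test <- function(x, y, alternative = "two.sided") {
  stopifnot(length(x) == length(y))   # samples must be paired
  d <- x - y                          # paired differences
  d <- d[!is.na(d) & d != 0]          # drop missing pairs and ties
  binom.test(sum(d > 0), length(d), p = 0.5, alternative = alternative)
}

# Illustrative data (made up for this sketch)
treated <- c(5.1, 6.3, 4.8, 7.0, 5.9, 6.1, 5.5, 6.8, 5.2, 6.0)
control <- c(4.9, 5.8, 4.8, 6.1, 6.0, 5.4, 5.0, 6.2, 4.7, 5.6)

result <- sign_test(treated, control)
print(result)
```

With these made-up values, 8 of the 9 non-tied differences are positive, and the two-sided p-value is 20/512, about 0.039. Passing alternative = "greater" would instead test whether the treated values tend to be larger than their paired control values.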