Correlation and Regression
The most commonly used techniques for investigating the relationship between two quantitative
variables are correlation and linear regression. Correlation quantifies the strength of the linear
relationship between a pair of variables, whereas regression expresses the relationship in the form
of an equation. For example, in patients attending an accident and emergency unit (A&E), we
could use correlation and regression to determine whether there is a relationship between age and
urea level, and whether the level of urea can be predicted for a given age.
Correlation
Correlation is a statistical measure that indicates the extent to which two or more variables
fluctuate together. A positive correlation indicates the extent to which those variables increase or
decrease in parallel; a negative correlation indicates the extent to which one variable increases as
the other decreases.
According to W. I. King, correlation means that between two series or groups of data there exists some causal connection.
Correlation is a statistical technique that can show whether, and how strongly, pairs of variables are
related. Some correlations are fairly obvious, but your data may contain unsuspected ones. You may
also suspect there are correlations but not know which are the strongest. An intelligent correlation
analysis can lead to a greater understanding of your data.
Correlation Coefficients: Pearson, Kendall and Spearman
Correlation is a bivariate analysis that measures the strength of association between two
variables. In statistics, the value of the correlation coefficient varies between +1 and −1. A value
near ±1 indicates a perfect degree of association between the two variables; as the coefficient
approaches 0, the relationship between the two variables becomes weaker. Three types of
correlation are commonly used in statistics: Pearson correlation, Kendall rank correlation and
Spearman correlation.
Pearson r correlation
Pearson correlation is widely used in statistics to measure the degree of the relationship between
linearly related variables. For example, in the stock market, if we want to measure how two
commodities are related to each other, Pearson correlation is used to measure the degree of
relationship between the two commodities. The following formula is used to calculate the Pearson
correlation coefficient 𝑟:
$$ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}} $$
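To make the formula concrete, here is a minimal sketch in plain Python (the helper name pearson_r is ours, chosen for this example; in practice a library routine such as scipy.stats.pearsonr computes the same quantity):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of the products of deviations from the means
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return num / sqrt(ss_x * ss_y)
```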
Kendall rank correlation
Kendall rank correlation is a non-parametric test that measures the strength of dependence between
two variables. If we consider two samples, x and y, each of size n, the total number of pairings
of x with y is n(n − 1)/2.
The following formula is used to calculate the Kendall rank correlation coefficient:
$$ \tau = \frac{n_c - n_d}{\tfrac{1}{2}\, n(n-1)} $$
Where:
nc = the number of concordant pairs
nd = the number of discordant pairs
n = the number of observations in each sample.
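As an illustration of nc and nd, the following sketch counts concordant and discordant pairs directly (it assumes no tied values; library routines such as scipy.stats.kendalltau also handle ties):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall rank correlation for samples without tied values."""
    nc = nd = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            nc += 1   # concordant: the pair is ordered the same way in x and y
        elif s < 0:
            nd += 1   # discordant: the pair is ordered in opposite ways
    n = len(x)
    return (nc - nd) / (n * (n - 1) / 2)
```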
Spearman rank correlation
Spearman rank correlation is a non-parametric test that is used to measure the degree of
association between two variables. It was developed by Spearman, hence the name. The Spearman
rank correlation test makes no assumptions about the distribution of the data and is the
appropriate correlation analysis when the variables are measured on a scale that is at least
ordinal.
The following formula is used to calculate the Spearman rank correlation coefficient:
$$ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$
Where:
ρ = Spearman rank correlation coefficient
di = the difference between the ranks of corresponding values Xi and Yi
n = number of values in each data set.
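The formula can be traced step by step in the following sketch, which ranks each sample and sums the squared rank differences (again assuming no tied values, where the simple 1-to-n ranking is valid):

```python
def rank(values):
    """Assign ranks 1..n by sorted position (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r

def spearman_rho(x, y):
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # sum of squared rank differences
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```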
Correlation can also be classified by the number of variables involved:
• Simple correlation, which involves exactly two variables.
• Multiple correlation, which involves three or more variables.
Table (1): Individual's Increasing Age with Increasing Sickness

Individual            1   2   3   4   5   6   7   8   9   10
Increasing Age        20  30  32  35  40  46  52  55  58  62
Increasing Sickness   1   2   0   3   4   6   5   7   8   9
Suppose that (x) denotes Increasing Age and (y) denotes Increasing Sickness.
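The three coefficients can be computed for this data set directly; the sketch below uses the SciPy routines pearsonr, spearmanr and kendalltau (assuming SciPy is installed):

```python
from scipy import stats

x = [20, 30, 32, 35, 40, 46, 52, 55, 58, 62]  # Increasing Age
y = [1, 2, 0, 3, 4, 6, 5, 7, 8, 9]            # Increasing Sickness

print("Pearson r:   ", stats.pearsonr(x, y)[0])
print("Spearman rho:", stats.spearmanr(x, y)[0])
print("Kendall tau: ", stats.kendalltau(x, y)[0])
```

All three values come out strongly positive for this data, reflecting that sickness tends to rise with age.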
Regression
Linear regression explores relationships that can be readily described by straight lines or their
generalization to many dimensions. A surprisingly large number of problems can be solved by
linear regression, and even more by transformations of the original variables that result in
linear relationships among the transformed variables.
Regression analysis is one of the most commonly used statistical techniques in the social,
behavioral and physical sciences. It involves identifying and evaluating the relationship between
a dependent variable and one or more independent variables, which are also called predictor or
explanatory variables. It is particularly useful for assessing and adjusting for confounding. A
model of the relationship is hypothesized, and estimates of the parameter values are used to
develop an estimated regression equation. Various tests are then employed to determine whether
the model is satisfactory. If it is, the estimated regression equation can be used to predict the
value of the dependent variable given values for the independent variables.
Independent variables
are characteristics that can be measured directly. These variables are also called predictor or
explanatory variables, and are used to predict or explain the behavior of the dependent variable.
Dependent variable
is a characteristic whose value depends on the values of independent variables.
The primary objective of regression is to develop a linear relationship between a response variable
and explanatory variables for the purposes of prediction. It assumes that a functional linear
relationship exists; where it does not, alternative approaches (such as functional regression) may
be superior.
Simple linear regression is a statistical method that allows us to summarize and study relationships
between two continuous (quantitative) variables. In a cause and effect relationship, the
independent variable is the cause, and the dependent variable is the effect. Least squares linear
regression is a method for predicting the value of a dependent variable y, based on the value of an
independent variable x.
• One variable, denoted (x), is regarded as the predictor, explanatory, or independent
variable.
• The other variable, denoted (y), is regarded as the response, outcome, or dependent
variable.
The estimated regression equation is

$$ \hat{y} = \beta_0 + \beta_1 x $$

and the least-squares estimates of the slope and intercept are

$$ \beta_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}, \qquad \beta_0 = \frac{\sum y - \beta_1 \sum x}{n} $$

Where:
x = independent variable
y = dependent variable
n = number of cases or individuals
β1 = the slope of the regression line
β0 = the intercept of the regression line with the y-axis
Σxy = sum of the products of the dependent and independent variables
Σx = sum of the independent variable
Σy = sum of the dependent variable
Σx² = sum of the squared independent variable
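As a sketch, these estimates can be applied to the Table (1) data in plain Python:

```python
x = [20, 30, 32, 35, 40, 46, 52, 55, 58, 62]  # Increasing Age
y = [1, 2, 0, 3, 4, 6, 5, 7, 8, 9]            # Increasing Sickness

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Slope and intercept from the least-squares formulas above
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = (sum_y - b1 * sum_x) / n

print(f"estimated line: yhat = {b0:.3f} + {b1:.3f} x")
```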
Multiple regression is an extension of simple linear regression. It is used when we want to predict
the value of a dependent variable (target or criterion variable) based on the value of two or more
independent variables (predictor or explanatory variables). Multiple regression allows you to
determine the overall fit (variance explained) of the model and the relative contribution of each of
the predictors to the total variance explained. For example, you might want to know how much of
the variation in exam performance can be explained by revision time and lecture attendance "as a
whole", but also the "relative contribution" of each independent variable in explaining the variance.
Mathematically, the multiple regression model is represented by the following equation:

$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + u $$

Where:
Y = dependent variable
Xi = the i-th independent variable
β0 = the intercept
βi = the coefficient of Xi
u = the error (disturbance) term
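As an illustrative sketch, a multiple regression can be fitted with NumPy's least-squares solver; the revision-time and lecture-attendance figures below are invented purely for demonstration:

```python
import numpy as np

# Hypothetical data: exam score predicted from revision time (hours)
# and lecture attendance (sessions); invented for illustration only.
revision = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
lectures = np.array([3.0, 5.0, 4.0, 8.0, 9.0, 10.0])
score    = np.array([40.0, 52.0, 50.0, 71.0, 78.0, 88.0])

# Design matrix with a leading column of ones for the intercept beta_0
X = np.column_stack([np.ones_like(revision), revision, lectures])
beta, residuals, rank, sv = np.linalg.lstsq(X, score, rcond=None)

print("b0, b1, b2 =", beta)
```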
Conclusion
When comparing two variables, two questions come to mind: "Is there a relationship between the
two variables?" and "How strong is that relationship?" These questions can be answered using
regression and correlation: regression answers whether there is a relationship (these notes explore
linear relationships only), and correlation answers how strong the linear relationship is. To
introduce both concepts, it is easiest to look at a set of data.