Correlation coefficients are statistics which can help to describe data sets which contain
variables measured at the interval and ratio levels. Correlation coefficients are measures of
association between two (or more) variables.

Consider questions such as: Are certain abilities closely related while others are relatively
independent? Do bright children tend to be less neurotic than average children? If we know the
general intelligence of a child as measured by a standard test, can we say anything about the
child's scholastic achievement as represented by grades? These questions involve relations
among abilities that can be studied using correlation.

When the relationship between two sets of measures (variables or attributes) is linear, i.e. can
be described by a straight line, the correlation between scores may be expressed by the
product-moment coefficient of correlation, denoted by the letter r. The Pearson r assumes that
the variables are measured at the interval or ratio level. If the variables are measured at the
ordinal level, however (for example, a Likert-type scale), then the Spearman rank correlation
can be used. Neither Pearson nor Spearman is designed for use with variables measured at
the nominal level; instead, use the point-biserial correlation (when one variable is a
two-category nominal variable and the other is interval or ratio) or phi (when both variables
are two-category nominal variables).
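
As a rough illustration of the two coefficients named above, the sketch below computes the Pearson r from deviations about the means, and the Spearman rank correlation as the Pearson r of the ranks. The function names and the data (test_scores, grades) are hypothetical and are not taken from these notes.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # deviations of X from its mean
    dy = [y - mean_y for y in ys]          # deviations of Y from its mean
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def ranks(values):
    """Ranks from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result

def spearman_rho(xs, ys):
    """Spearman rank correlation: the Pearson r of the two sets of ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: intelligence test scores and school grades for six children.
test_scores = [95, 100, 105, 110, 120, 130]
grades = [60, 62, 70, 68, 80, 95]

print("Pearson r:   ", round(pearson_r(test_scores, grades), 3))
print("Spearman rho:", round(spearman_rho(test_scores, grades), 3))
```

Both functions assume that neither variable is constant; a constant variable would make the denominator zero.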

Perfect correlation: When the relationship is fixed, absolute and unchanging, the relationship
between the two variables is said to be perfect, i.e. r = +1.00. This happens when the relative
position of each subject is exactly the same in one test as in the other (r = -1.00 when the
positions are exactly reversed). The numerical value of the coefficient of correlation is a real
number between -1 and +1; it cannot take values less than -1 or more than +1. The coefficient
measures only linear correlation between X and Y; in the perfect case, all the plotted points
fall on a straight line.
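
A small check of the limiting values can make this concrete. In the sketch below, which uses made-up numbers, Y is an exact straight-line function of X, so r should come out as +1 for a rising line and -1 for a falling one, while the scattered data give a value strictly between the two.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

x = [1, 2, 3, 4, 5]                  # hypothetical scores
y_up = [2 * v + 3 for v in x]        # every point on a rising straight line
y_down = [10 - 3 * v for v in x]     # every point on a falling straight line
y_scattered = [5, 8, 7, 12, 11]      # roughly rising, but not on one line

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_scattered = pearson_r(x, y_scattered)

print(r_up, r_down, round(r_scattered, 3))
```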

Positive & negative correlation (strong): Correlation indicates both the strength of the
association and its direction (direct or inverse). As r approaches -1, the relationship becomes
more strongly negative. As r approaches +1, the relationship becomes more strongly positive.
An r of exactly +1 indicates a perfect positive relationship.

A weak correlation is signalled when the coefficient of correlation approaches zero: when r is
close to zero, we can deduce that the linear relationship is weak.

No Correlation: When there is no correspondence or relationship between the scores of the
subjects on the two attributes or test measures, the coefficient of correlation is zero.
There is then no linear relationship between X and Y.

The coefficient of correlation is a pure number, unaffected by the units of measurement,
because r is scale invariant. It is not affected when we add the same number to all the values
of one variable, nor when we multiply all the values of one variable by the same positive
number.
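
The invariance can be checked numerically. The sketch below uses hypothetical scores, shifts X by a constant and rescales X by a positive constant, and compares the resulting r with the original. It also shows, as an aside not stated above, that multiplying by a negative constant reverses the sign of r without changing its size.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical paired scores.
x = [2, 4, 5, 7, 9]
y = [1, 3, 2, 6, 8]

r_original = pearson_r(x, y)
r_shifted = pearson_r([v + 100 for v in x], y)   # same number added to every X
r_scaled = pearson_r([v * 2.5 for v in x], y)    # every X times a positive number
r_negated = pearson_r([v * -2 for v in x], y)    # every X times a negative number

print(round(r_original, 4), round(r_shifted, 4), round(r_scaled, 4), round(r_negated, 4))
```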

Scatter Plot:
It is useful to obtain a plot of the joint distribution of the values of the two variables, X and
Y. Such a plot is called a scatterplot. The values of X are displayed on the horizontal axis
(called the X-axis) and the values of Y are displayed on the vertical axis (called the Y-axis).

If small values of X are associated with small values for Y, and large values of X are
associated with large values of Y, then the data will stretch from the lower left hand corner of
the plot to the upper right hand corner of the plot. This indicates a positive relationship.

If small values of X are associated with large values for Y, and large values of X are
associated with small values of Y, then the data will stretch from the upper left hand corner of
the plot to the lower right hand corner of the plot. This indicates an inverse relationship.

If there is no discernible pattern to the distribution, then the two variables probably are not
related in a linear fashion. There may be a strong, non-linear relationship between the two
variables (for example, think of the normal curve) but it cannot be detected by r.
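
To illustrate the point about non-linear relationships, the sketch below evaluates a bell-shaped curve (here taken to be exp(-x^2/2), an assumed stand-in for the normal curve) at points placed symmetrically around zero. Y is completely determined by X, yet r comes out at essentially zero. Restricting X to one side of the peak gives a strongly negative r, which shows that a relationship exists.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def bell(x):
    """Bell-shaped curve; an assumed stand-in for the normal curve."""
    return math.exp(-x * x / 2)

xs_full = [-3, -2, -1, 0, 1, 2, 3]     # symmetric around the peak
xs_right = [0, 1, 2, 3]                # right-hand side of the peak only

r_full = pearson_r(xs_full, [bell(x) for x in xs_full])
r_right = pearson_r(xs_right, [bell(x) for x in xs_right])

print("whole curve:    ", round(r_full, 6))
print("right half only:", round(r_right, 3))
```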

When there are only a few data points, it is fairly easy to estimate the strength of the
relationship by eyeballing the data. However, with many data points statistics are needed to
summarize the strength and direction of the relationship.


When N (the number of subjects) is large, much time and labor will be saved by arranging the
data in the form of a diagram or chart and then calculating deviations from an assumed mean
instead of the actual mean. This chart is called a scattergram or scatter diagram.
Along the left-hand margin, from bottom to top, are laid off the class intervals of the variable
Y, and along the top of the diagram, from left to right, are laid off the class intervals of the
variable X.
Locate each subject's values of X and Y along the X-axis and Y-axis, respectively, and enter a
tally in the corresponding cell of the grid.
Each subject is thus represented by one tally in a cell or square of the table, in accordance
with the subject's two scores.
Along the bottom of the diagram, in the f(x) row, is tabulated the number of subjects who fall
in each class interval of X; likewise, the f(y) column gives the number who fall in each class
interval of Y. The f(x) row and the f(y) column should each add up to the total number of
subjects, N.
After all the tallies have been entered, the frequency in each cell is added up and entered on
the diagram; the scattergram is then a correlation table. Even before any calculation, one can
probably guess the nature of the relationship from the diagram.
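
The tallying procedure can be sketched in a few lines. The data, the starting points and the interval widths below are all hypothetical (weight in pounds as X, height in inches as Y). Here f_x holds the totals for the class intervals of X and f_y the totals for the class intervals of Y.

```python
from collections import Counter

# Hypothetical (weight in pounds, height in inches) pairs, one per subject.
data = [(104, 62), (108, 63), (115, 64), (118, 66), (123, 65),
        (127, 67), (134, 68), (136, 66), (142, 69), (149, 70)]

X_START, X_WIDTH = 100, 10   # weight intervals: 100-109, 110-119, ...
Y_START, Y_WIDTH = 60, 2     # height intervals: 60-61, 62-63, ...

def interval_index(value, start, width):
    """Zero-based index of the class interval that contains value."""
    return (value - start) // width

# One tally per subject in the cell given by its Y interval (row) and X interval (column).
cells = Counter()
for weight, height in data:
    col = interval_index(weight, X_START, X_WIDTH)
    row = interval_index(height, Y_START, Y_WIDTH)
    cells[(row, col)] += 1

n_rows = max(r for r, _ in cells) + 1
n_cols = max(c for _, c in cells) + 1

# Marginal totals: f_x per X interval (column), f_y per Y interval (row).
f_x = [sum(cells[(r, c)] for r in range(n_rows)) for c in range(n_cols)]
f_y = [sum(cells[(r, c)] for c in range(n_cols)) for r in range(n_rows)]

# Print the table with the highest Y interval at the top, as on the diagram.
for r in reversed(range(n_rows)):
    low = Y_START + r * Y_WIDTH
    row_cells = " ".join(f"{cells[(r, c)]:3d}" for c in range(n_cols))
    print(f"{low:3d}-{low + Y_WIDTH - 1:<3d} {row_cells}   f(y) = {f_y[r]}")
print("f(x)    " + " ".join(f"{v:3d}" for v in f_x))
```

Both sets of marginal totals add up to the number of subjects, which is a quick check that no tally was missed.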
The next step is to calculate, for each column (i.e. each class interval of X, weight), the mean
of the variable on the Y-axis (height); in this example, the mean height of the people who weigh
between 100 and 109 pounds. Using the assumed-mean method, the mean is calculated and written
at the bottom of its column, and the same is done for every other column. These means give an
idea of the pattern of covariation, i.e. how much an increase or decrease in one variable is
accompanied by an increase or decrease in the other. In this example, an increase in weight of
approximately 70 pounds (from 104.5 to 174.5) corresponds to an increase in mean height of 7.7
inches; that is, the increase from the lightest to the heaviest man is paralleled by an increase
of approximately eight inches in height. From this it can be seen that the correlation between
height and weight is positive.
The same method is applied to find the change in mean weight (the variable on the X-axis) that
corresponds to a given change in height (Y-axis). Refer to the textbook for a worked example.
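
The assumed-mean calculation for a single column can be sketched as follows. The mean is taken as AM + i * (sum of f*d) / N, where AM is the assumed mean, i is the interval width, and d is each midpoint's deviation from AM in interval units. The frequencies below are invented for illustration and are not the table from the textbook.

```python
def assumed_mean(midpoints, freqs, guess, width):
    """Mean of a grouped distribution by the assumed-mean (short) method."""
    n = sum(freqs)
    # d = deviation of each midpoint from the assumed mean, in interval units.
    sum_fd = sum(f * (m - guess) / width for m, f in zip(midpoints, freqs))
    return guess + width * sum_fd / n

def direct_mean(midpoints, freqs):
    """Ordinary grouped mean, used here only as a cross-check."""
    return sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)

# Hypothetical height intervals (midpoints in inches) shared by every column.
height_midpoints = [61, 63, 65, 67, 69]

# Hypothetical cell frequencies for two weight columns, keyed by weight midpoint.
columns = {
    104.5: [1, 3, 4, 2, 0],
    114.5: [0, 0, 1, 3, 4],
}

column_means = {
    weight: assumed_mean(height_midpoints, freqs, guess=65, width=2)
    for weight, freqs in columns.items()
}

for weight, mean_height in column_means.items():
    print(f"weight {weight}: mean height {mean_height:.2f}")
```

Column means that rise from the lighter column to the heavier one, as they do in this made-up table, point to a positive correlation. Swapping the roles of the two variables gives the mean weight for each height interval.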

