I have located a table that breaks down Pearson's correlation values into 3 categories. See below:
My question is this: a value of 1 would count as a large positive correlation, but it is also a perfect correlation. Is this table valid?
First, the table isn't really "valid" or "invalid"; it's a general guideline that applies in some situations and not in others. In some fields an $r$ of $\pm 0.5$ is very strong, in some it is very weak.
Second, a perfect correlation is, indeed, large.
I would expand on @Peter's remark:
"In some fields an r of 0.5 is very strong, in some it is very weak."
Clearly, in social sciences based on surveys/testing, a correlation of 0.5 between questions or items is quite large; that same value would be seen as negligibly small in some branches of physics.
But another facet of the issue is the shape of the distributions and the degree of discretization of the scale.
Pearson $r$ theoretically varies between $-1$ and $+1$. However, with real data the empirical range within which $r$, computed between two given variables, can fall is usually narrower than the theoretical one. The linear correlation between X and Y can attain $+1$ if and only if the distributions of X and Y are totally identical in shape (in other words, the two distributions differ by no more than a linear transformation). Otherwise, the upper bound for $r$ will be lower than $+1$. Analogously, $r$ can attain $-1$ only if the shape of the distribution of X is fully identical to that of $-Y$; otherwise the lower bound for $r$ will be higher than $-1$. For $r$ to have the full theoretical range of variation, $-1$ to $+1$, the two distributions must not only be identical in shape, they must also both be symmetric.
Real-life correlating variables often have different and asymmetric distributional shapes. That means that $r$ between them has a narrowed range in which to vary and can never reach the values $+1$ or $-1$. In the following example, X and Y correlate with $r=.573$, but these data could never give $r>.808$.
X (sorted)   Y (as is, paired with X)   Y (sorted ascendingly, like X)
4 4 3
4 4 4
4 5 4
5 3 4
5 4 4
5 4 5
5 6 6
5 6 6
6 6 6
6 7 7
r(Xsorted,Yasis)= .573 - actual observed correlation
r(Xsorted,Ysorted)= .808 - maximal correlation attainable with the observed data
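These two numbers are easy to check numerically. A minimal sketch using NumPy (the three arrays are the columns of the small table above):

```python
import numpy as np

# The three columns of the data table
x = np.array([4, 4, 4, 5, 5, 5, 5, 5, 6, 6])       # X, sorted ascending
y_asis = np.array([4, 4, 5, 3, 4, 4, 6, 6, 6, 7])  # Y as observed with X
y_sorted = np.sort(y_asis)                         # Y sorted ascendingly

r_actual = np.corrcoef(x, y_asis)[0, 1]   # observed correlation, about .573
r_max = np.corrcoef(x, y_sorted)[0, 1]    # maximal attainable, about .808
```

Pairing the sorted X with the sorted Y is what maximizes Pearson $r$ for these fixed marginal distributions, which is why the second value is the empirical ceiling.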
What does a considerable narrowing of the range for $r$ mean? It means that the strength of the association between the two variables is underestimated by the traditional $r$, and the more so the stronger the association is, because a strong association is then inevitably nonlinear, while $r$ skims off only its linear portion. An analyst facing this problem and wishing to scoop out more of the association would probably think of using the nonparametric Spearman $\rho$.
But an alternative and seemingly reasonable solution would be to rescale the observed $r$ value to its upper (or lower, or both) empirical bound(s). With the example above, $.573/.808=.709$ may be regarded as the "corrected" or "true" magnitude of $r$ in the context of our specific data. Spearman $\rho$ measures the linear relationship after straightening a curved relationship (by ranking). The rescaled $r$ instead references itself to the ceiling of linearity possible with the given data. The ceiling, or bound, value of $.808$ could in a sense be labeled "perfect association", despite being less than $1$.
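The rescaling itself is just a division of the observed $r$ by its attainable ceiling; a quick sketch (NumPy, same data as in the table above):

```python
import numpy as np

x = np.array([4, 4, 4, 5, 5, 5, 5, 5, 6, 6])
y = np.array([4, 4, 5, 3, 4, 4, 6, 6, 6, 7])

r_obs = np.corrcoef(x, y)[0, 1]               # observed r, about .573
r_ceiling = np.corrcoef(x, np.sort(y))[0, 1]  # empirical upper bound, about .808
# "corrected" r, about .71 (the .709 quoted above comes from dividing
# the already-rounded values .573 and .808)
r_rescaled = r_obs / r_ceiling
```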
Such rescaling has, of course, its shortcomings, the main one being that a matrix of such coefficients is often not positive semidefinite.
If the measurement scale of the data is coarse, such as dichotomous, the data react more sharply to inequality of the marginal distributional shapes, and the empirical range for $r$ is narrower than it is with a fine scale. As seen below:
Histograms
Two interval variables with different shapes of distribution:
ooo
ooo    oooooo
oooooo oooooo
oooooo oooooo
123456 123456 <- 6-grade scale
The upper empirical bound for r of these data = .956
The same two variables binned into dichotomous scale
(inequality of distribution shape preserved):
o
o
o
o   o o
o   o o
o   o o
o o o o
o o o o
o o o o
o o o o
o o o o
o o o o
2 5 2 5 <- dichotomous scale
The upper empirical bound for r of these data = .707
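Both bounds can be reproduced from the frequencies the histograms depict; a sketch with NumPy (counts read off the pictures above):

```python
import numpy as np

# 6-grade scale: X has frequencies 4,4,4,2,2,2 over grades 1..6,
# Y is uniform with 3 cases per grade (18 cases each)
x6 = np.repeat([1, 2, 3, 4, 5, 6], [4, 4, 4, 2, 2, 2])
y6 = np.repeat([1, 2, 3, 4, 5, 6], [3, 3, 3, 3, 3, 3])
# Both arrays are already sorted ascending, so this pairing is the
# r-maximizing one: the upper empirical bound, about .956
bound_fine = np.corrcoef(x6, y6)[0, 1]

# The same variables dichotomized: X becomes 12 vs 6 cases, Y 9 vs 9
# (the 0/1 coding is arbitrary; r is unchanged by linear recoding)
x2 = np.repeat([0, 1], [12, 6])
y2 = np.repeat([0, 1], [9, 9])
bound_coarse = np.corrcoef(x2, y2)[0, 1]  # about .707
```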
And this is the reason why factor analysis done on binary data via Pearson $r$ often gives a flat scree plot, suggesting too many factors to extract: strong correlations simply have little chance to occur in binary data with varying item difficulties (i.e., marginal shapes). Torgerson therefore advised computing the rescaled $r$ values described above, which are higher. Another way out is to compute the tetrachoric $r$, which is also higher than the ordinary $r$.