Principal Components Analysis (PCA)
02/05/2022
5.2 Principal Component Analysis (PCA) Procedure
Algebraically, principal components are linear combinations of the p random variables $X_1, X_2, \dots, X_p$. Geometrically, these linear combinations represent the selection of a new coordinate system obtained by rotating the original system with $X_1, X_2, \dots, X_p$ as the coordinate axes. The new axes represent the directions of maximum variability and provide a simpler, more parsimonious description of the covariance structure.
Suppose that we have a random vector $\mathbf{X} = (X_1, X_2, \dots, X_p)'$ with population variance-covariance matrix $\mathrm{Var}(\mathbf{X}) = \Sigma$, and consider the linear combination
$$Y_1 = a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p = \mathbf{a}_1'\mathbf{X}$$
Similarly,
$$Y_p = a_{p1}X_1 + a_{p2}X_2 + \cdots + a_{pp}X_p = (a_{p1}\;\, a_{p2}\;\, \cdots\;\, a_{pp})\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_{p-1} \\ x_p \end{pmatrix} = \mathbf{a}_p'\mathbf{X}$$
5.3 First Principal Component (PCA1): Y1
The first principal component is the linear combination of the X-variables that has maximum variance (among all linear combinations), so it accounts for as much variation in the data as possible.
Specifically, we will define the coefficients $\mathbf{a}_1 = (a_{11}, a_{12}, \dots, a_{1p})'$ for that component in such a way that its variance is maximized, subject to the constraint that the sum of the squared coefficients is equal to one. This constraint is required so that a unique answer may be obtained.
That is, select $\mathbf{a}_1$ such that $\mathrm{Var}(Y_1) = \mathbf{a}_1'\Sigma\mathbf{a}_1$ is maximized, subject to $\mathbf{a}_1'\mathbf{a}_1 = 1$.
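This constrained maximization can be sketched numerically with numpy: the maximizing unit vector is the eigenvector of $\Sigma$ belonging to its largest eigenvalue. The covariance matrix below is a small hypothetical example, not data from the text.

```python
import numpy as np

# Hypothetical 3-variable covariance matrix (illustrative values only).
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(Sigma)
a1 = eigvecs[:, -1]              # eigenvector of the largest eigenvalue

# The constraint a1'a1 = 1 holds automatically for eigh's output.
assert np.isclose(a1 @ a1, 1.0)

# Var(Y1) = a1' Sigma a1 equals the largest eigenvalue; no unit vector
# can achieve a larger variance.
var_y1 = a1 @ Sigma @ a1
print(np.isclose(var_y1, eigvals[-1]))   # True
```

No explicit optimizer is needed: the eigendecomposition solves the constrained maximization exactly.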
5.4 Second Principal Component (PCA2): Y2
The second principal component is the linear combination of the X-variables that accounts for as much of the remaining variation as possible, with the constraint that the correlation between the first and second components is 0.
Select $\mathbf{a}_2$ that maximizes the variance of this new component, subject to $\mathbf{a}_2'\mathbf{a}_2 = 1$. Along with the additional constraint $\mathbf{a}_2'\mathbf{a}_1 = 0$, these two components will be uncorrelated with one another.
5.5 How do we find the coefficients?
How do we find the coefficients for a principal component?
Result 5.1:
Let $\Sigma$ be the covariance matrix associated with the random vector $\mathbf{X} = (X_1, X_2, \dots, X_p)'$. Let $\Sigma$ have the eigenvalue-eigenvector pairs $(\lambda_1, \mathbf{e}_1), (\lambda_2, \mathbf{e}_2), \dots, (\lambda_p, \mathbf{e}_p)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$. Then the ith principal component is given by
$$Y_i = \mathbf{e}_i'\mathbf{X} = e_{i1}X_1 + e_{i2}X_2 + \cdots + e_{ip}X_p, \quad i = 1, 2, \dots, p.$$
The solution involves the eigenvalues and eigenvectors of the variance-covariance matrix Σ.
The variance-covariance matrix may be written as a function of the eigenvalues and their
corresponding eigenvectors. This is determined by using the Spectral Decomposition
Theorem.
5.6 Spectral Decomposition Theorem
The variance-covariance matrix can be written as
$$\Sigma = \sum_{i=1}^{p} \lambda_i \mathbf{e}_i \mathbf{e}_i'$$
If $\lambda_{k+1}, \lambda_{k+2}, \dots, \lambda_p$ are small, we might approximate $\Sigma$ by
$$\Sigma \approx \sum_{i=1}^{k} \lambda_i \mathbf{e}_i \mathbf{e}_i'$$
The total variation of $\mathbf{X}$ is
$$\sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp} = \mathrm{tr}(\Sigma) = \lambda_1 + \lambda_2 + \cdots + \lambda_p$$
This will give us an interpretation of the components in terms of the amount of the full variation explained by each component. The proportion of variation explained by the ith principal component is defined to be the eigenvalue for that component divided by the sum of the eigenvalues. In other words, the ith principal component explains the following proportion of the total variation:
$$\frac{\lambda_i}{\lambda_1 + \lambda_2 + \cdots + \lambda_p}$$
Naturally, if the proportion of variation explained by the first k principal components is large, then not much information is lost by considering only those first k principal components.
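The spectral decomposition and the explained-variation proportions can be checked numerically. This is a minimal sketch with numpy on a small hypothetical covariance matrix (the values are illustrative, not from the text).

```python
import numpy as np

# Illustrative covariance matrix (hypothetical values).
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(Sigma)

# Spectral decomposition: Sigma = sum_i lambda_i * e_i e_i'
reconstructed = sum(l * np.outer(v, v) for l, v in zip(eigvals, eigvecs.T))
assert np.allclose(reconstructed, Sigma)

# Total variation equals the trace (sum of eigenvalues).
assert np.isclose(eigvals.sum(), np.trace(Sigma))

# Proportion of total variation explained by each component (descending).
proportions = np.sort(eigvals)[::-1] / eigvals.sum()
print(np.isclose(proportions.sum(), 1.0))   # True
```

The proportions always sum to one, which is exactly the trace identity from the Spectral Decomposition Theorem.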
5.7 Why It May Be Possible to Reduce Dimensions
When we have correlations (multicollinearity) between the x-variables, the data may more or less fall on a line or plane in a lower number of dimensions. For instance, imagine a plot of two x-variables that have a nearly perfect correlation. The data points will fall close to a straight line. That line could be used as a new (one-dimensional) axis to represent the variation among data points. As another example, suppose that we have verbal, math, and total SAT scores for a sample of students. We have three variables, but really (at most) two dimensions to the data because total = verbal + math, meaning the third variable is completely determined by the first two. The reason for saying "at most" two dimensions is that if there is a strong correlation between verbal and math, then it may be possible that there is only one true dimension to the data.
Note:
All of this is defined in terms of the population variance-covariance matrix $\Sigma$, which is unknown. However, we may estimate $\Sigma$ by the sample variance-covariance matrix $S$, which is given by the standard formula
$$S = \frac{1}{n-1}\sum_{j=1}^{n}(\mathbf{X}_j - \bar{\mathbf{X}})(\mathbf{X}_j - \bar{\mathbf{X}})'$$
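A quick sketch of this estimator in numpy, using randomly generated hypothetical data; it agrees with `np.cov`, which uses the same $n-1$ divisor.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))      # 50 hypothetical observations, p = 3

n = X.shape[0]
xbar = X.mean(axis=0)

# S = (1/(n-1)) * sum_j (x_j - xbar)(x_j - xbar)'
S = (X - xbar).T @ (X - xbar) / (n - 1)

# np.cov (rowvar=False: columns are variables) uses the same formula.
assert np.allclose(S, np.cov(X, rowvar=False))
```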
5.8 Procedure
Compute the eigenvalues $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_p$ of the sample variance-covariance matrix $S$, and the corresponding eigenvectors $\hat{\mathbf{e}}_1, \hat{\mathbf{e}}_2, \dots, \hat{\mathbf{e}}_p$. Then we will define our estimated principal components using the eigenvectors as our coefficients:
$$\hat{Y}_1 = \hat{\mathbf{e}}_1'\mathbf{X}, \quad \hat{Y}_2 = \hat{\mathbf{e}}_2'\mathbf{X}, \quad \dots, \quad \hat{Y}_p = \hat{\mathbf{e}}_p'\mathbf{X}$$
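The whole procedure can be sketched end to end in numpy on hypothetical data: estimate $S$, eigendecompose it, and project the centered observations onto the eigenvectors to get the estimated component scores.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))            # hypothetical data, p = 4

S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]        # sort eigenvalues descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Estimated principal component scores: Y_i = e_i'(x - xbar) per observation.
scores = (X - X.mean(axis=0)) @ eigvecs

# Sample variance of the ith score column equals the ith eigenvalue,
# and distinct score columns are uncorrelated.
assert np.allclose(scores.var(axis=0, ddof=1), eigvals)
C = np.cov(scores, rowvar=False)
assert np.allclose(C - np.diag(np.diag(C)), 0.0, atol=1e-10)
```

The two assertions verify the defining properties of the components: their variances are the eigenvalues, and they are mutually uncorrelated.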
Generally, we only retain the first k principal components. Here we must balance two conflicting desires:
1. To obtain the simplest possible interpretation, we want k to be as small as possible. If we can explain most of the variation with just two principal components, then this would give us a much simpler description of the data. The smaller k is, the smaller the amount of variation explained by the first k components.
2. To avoid loss of information, we want the proportion of variation explained by the first k principal components to be large, ideally as close to one as possible; i.e., we want
$$\frac{\hat{\lambda}_1 + \hat{\lambda}_2 + \cdots + \hat{\lambda}_k}{\hat{\lambda}_1 + \hat{\lambda}_2 + \cdots + \hat{\lambda}_p} \approx 1$$
Example 5.2: Places Rated
We will use the Places Rated Almanac data (Boyer and Savageau) which rates 329
communities according to nine criteria:
communities according to nine criteria:
1. Climate and Terrain
2. Housing
3. Health Care & Environment
4. Crime
5. Transportation
6. Education
7. The Arts
8. Recreation
9. Economics
The correlation coefficients between the component $Y_i$ and the variable $X_k$ are
$$r_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}$$
5.10 Scree Plot
An alternative method to determine the number of principal components is to look at a scree plot: the eigenvalues are plotted in decreasing order against their index, and we look for the "elbow" beyond which the remaining eigenvalues are all comparatively small.
The scree plot for the variables without standardization (covariance matrix):
Step 2: Next, we will compute the principal component scores. For example, the first principal component can be computed using the elements of the first eigenvector:
$$\hat{Y}_1 = \hat{e}_{11}x_1 + \hat{e}_{12}x_2 + \cdots + \hat{e}_{1p}x_p$$
5.11 Interpretation of the Principal Components
Step 3: To interpret each component, we must compute the correlations between the original
data for each variable and each principal component.
These correlations are obtained using the correlation procedure. In the variable statement we
will include the first three principal components, "prin1, prin2, and prin3", in addition to all
nine of the original variables. We will use these correlations between the principal
components and the original variables to interpret these principal components.
Because of standardization, all principal components will have mean 0. The standard
deviation is also given for each of the components and these will be the square root of the
eigenvalue.
More important for our current purposes are the correlations between the principal components and the original variables. These have been copied into the following table. You will also note, if you look at the principal components themselves, that there is zero correlation between the components.
Table: correlations between each original variable and principal components 1, 2, and 3 (the numeric entries are not reproduced here).
Interpretation of the principal components is based on finding which variables are most strongly correlated with each component, i.e., which of these numbers are large in magnitude, the farthest from zero in either the positive or negative direction. Which numbers we consider to be large or small is, of course, a subjective decision. You need to determine at what level the correlation value will be of importance. Here a correlation value above 0.5 is deemed important. These larger correlations are in boldface in the table above.
We will now interpret the principal component results with respect to the value that we
have deemed significant.
First Principal Component Analysis - PCA1
The first principal component is strongly correlated with five of the original variables. The
first principal component increases with increasing Arts, Health, Transportation, Housing
and Recreation scores. This suggests that these five criteria vary together. If one increases,
then the remaining ones tend to as well. This component can be viewed as a measure of the
quality of Arts, Health, Transportation, and Recreation, and the lack of quality in Housing
(recall that high values for Housing are bad). Furthermore, we see that the first principal
component correlates most strongly with the Arts. In fact, we could state that based on the
correlation of 0.985 that this principal component is primarily a measure of the Arts. It
would follow that communities with high values would tend to have a lot of arts available, in terms of theaters, orchestras, etc., whereas communities with small values would have very few of these types of opportunities.
Further analyses may include:
Scatter plots of principal component scores. In the present context, we may wish to
identify the locations of each point in the plot to see if places with high levels of a
given component tend to be clustered in a particular region of the country, while sites
with low levels of that component are clustered in another region of the country.
Principal components are often treated as dependent variables for regression and
analysis of variance.
5.12 Alternative: Standardize the Variables
In the previous example we looked at principal components analysis applied to the raw data. In our earlier discussion we noted that, if the raw data are used, principal component analysis will tend to give more emphasis to those variables that have higher variances than to those variables that have very low variances. In effect, the results of the analysis will depend on the units of measurement used for each variable. That would imply that a principal component analysis should only be used with the raw data if all variables have the same units of measure, and even then only if you wish to give those variables with higher variances more weight in the analysis.
5.13 Summary
The results of principal component analysis depend on the scales at which the variables
are measured.
Variables with the highest sample variances will tend to be emphasized in the first few
principal components.
Principal component analysis using the covariance function should only be considered if
all of the variables have the same units of measurement.
If the variables either have different units of measurement (i.e., pounds, feet, gallons, etc.), or if we wish each variable to receive equal weight in the analysis, then the variables should be standardized before a principal components analysis is carried out. Standardize each variable by subtracting its mean and dividing by its standard deviation:
$$Z_{ij} = \frac{X_{ij} - \bar{x}_j}{s_j}$$
where
$X_{ij}$ - the data for variable j in sample unit i
$\bar{x}_j$ - the sample mean for variable j
$s_j$ - the sample standard deviation for variable j
We will now perform the principal component analysis using the standardized data.
Note: the variance-covariance matrix of the standardized data is equal to the correlation
matrix for the unstandardized data. Therefore, principal component analysis using the
standardized data is equivalent to principal component analysis using the correlation
matrix.
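This equivalence is easy to verify numerically. A minimal sketch with numpy on hypothetical data whose columns have deliberately unequal scales:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: three variables on very different scales.
X = rng.normal(size=(80, 3)) * np.array([1.0, 10.0, 100.0])

# Standardize: subtract each variable's mean, divide by its std (ddof=1).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Covariance matrix of the standardized data equals the correlation
# matrix of the raw data, so PCA on Z == PCA on corr(X).
assert np.allclose(np.cov(Z, rowvar=False), np.corrcoef(X, rowvar=False))
```

Consequently, eigenvalues and eigenvectors extracted from `np.cov(Z, rowvar=False)` and from `np.corrcoef(X, rowvar=False)` are identical.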
5.13.1 Principal Component Analysis Procedure
The principal components are first calculated by obtaining the eigenvalues of the sample correlation matrix R. We denote the eigenvalues of R by $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_p$ and the corresponding eigenvectors by $\hat{\mathbf{e}}_1, \hat{\mathbf{e}}_2, \dots, \hat{\mathbf{e}}_p$.
Then the estimated principal component scores are calculated using formulas similar to before, but instead of the raw data we use the standardized data in the formula below:
$$\hat{Y}_i = \hat{\mathbf{e}}_i'\mathbf{Z} = \hat{e}_{i1}Z_1 + \hat{e}_{i2}Z_2 + \cdots + \hat{e}_{ip}Z_p$$
The scree plot for standardized variables (correlation matrix)
Step 2
Next, we can compute the principal component scores using the eigenvectors. This is the formula for the first principal component:
$$\hat{Y}_1 = \hat{e}_{11}Z_1 + \hat{e}_{12}Z_2 + \cdots + \hat{e}_{1p}Z_p$$
And remember, this is now going to be a function, not of the raw data but the standardized
data.
The magnitudes of the coefficients give the contributions of each variable to that component.
Since the data have been standardized, they do not depend on the variances of the
corresponding variables.
Step 3
Next, we can look at the coefficients for the principal components. In this case, since the data
are standardized, within a column the relative magnitude of those coefficients can be directly
assessed. Each column here corresponds with a column in the output of the program labeled
Eigenvectors.
Table: eigenvector coefficients for each original variable on principal components 1 through 5 (the numeric entries are not reproduced here).
Second Principal Component Analysis - PCA2
The second principal component is a measure of the severity of crime, the quality of the
economy, and the lack of quality in education. Crime and Economy increase with decreasing
Education. Here we can see that cities with high levels of crime and good economies also
tend to have poor educational systems.
Third Principal Component Analysis - PCA3
The third principal component is a measure of the quality of the climate and poorness of the
economy. Climate increases with decreasing Economy. The inclusion of economy within
this component will add a bit of redundancy within our results. This component is primarily
a measure of climate, and to a lesser extent the economy.
Fourth Principal Component Analysis - PCA4
The fourth principal component is a measure of the quality of education and the economy
and the poorness of the transportation network and recreational opportunities. Education
and Economy increase with decreasing Transportation and Recreation.
Fifth Principal Component Analysis - PCA5
The fifth principal component is a measure of the severity of crime and the quality of
housing. Crime increases with decreasing housing.
Example 5.4 (Summarizing sample variability with two sample principal components)
A census provided information, by tract, on five socioeconomic variables for the Madison,
Wisconsin, area. The data from 61 tracts produced the following summary statistics:
If n is large, then $\sqrt{n}(\hat{\boldsymbol{\lambda}} - \boldsymbol{\lambda})$ is approximately $N_p(\mathbf{0}, 2\Lambda^2)$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$.
3. Each $\hat{\lambda}_i$ is distributed independently of the elements of the associated $\hat{\mathbf{e}}_i$.
Result 1 implies that, for large n, the $\hat{\lambda}_i$ are independently distributed. Moreover, $\hat{\lambda}_i$ has an approximate $N(\lambda_i, 2\lambda_i^2/n)$ distribution. Using this normal distribution, we obtain $P\left[\,|\hat{\lambda}_i - \lambda_i| \le z(\alpha/2)\,\lambda_i\sqrt{2/n}\,\right] \approx 1 - \alpha$. A large sample $100(1-\alpha)\%$ confidence interval for $\lambda_i$ is thus provided by
$$\frac{\hat{\lambda}_i}{1 + z(\alpha/2)\sqrt{2/n}} \;\le\; \lambda_i \;\le\; \frac{\hat{\lambda}_i}{1 - z(\alpha/2)\sqrt{2/n}}$$
Result 2 implies that the $\hat{\mathbf{e}}_i$ are normally distributed about the corresponding $\mathbf{e}_i$ for large samples. The elements of each $\hat{\mathbf{e}}_i$ are correlated, and the correlation depends to a large extent on the separation of the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_p$ (which is unknown) and the sample size n. Approximate standard errors for the coefficients $\hat{e}_{ik}$ are given by the square roots of the diagonal elements of $(1/n)\hat{\mathbf{E}}_i$, where $\hat{\mathbf{E}}_i$ is derived from $\mathbf{E}_i$ by substituting $\hat{\lambda}$'s for the $\lambda$'s and $\hat{\mathbf{e}}$'s for the $\mathbf{e}$'s.
Example 5.5 (Constructing a confidence interval for $\lambda_1$)
Obtain a 95% confidence interval for $\lambda_1$, the variance of the first population principal component.
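A minimal sketch of the interval computation, using hypothetical values for the sample eigenvalue and sample size (Example 5.5's actual data are not reproduced here):

```python
import math

# Large-sample CI for an eigenvalue lambda_i:
#   lambda_hat/(1 + z*sqrt(2/n)) <= lambda_i <= lambda_hat/(1 - z*sqrt(2/n))
lam_hat = 5.0          # sample eigenvalue (hypothetical)
n = 100                # sample size (hypothetical)
z = 1.96               # upper 2.5% point of N(0,1) for a 95% interval

half = z * math.sqrt(2.0 / n)
lower = lam_hat / (1 + half)
upper = lam_hat / (1 - half)
print(lower < lam_hat < upper)   # True: the interval contains lambda_hat
```

Note the interval is not symmetric about $\hat{\lambda}_i$, because the half-width enters through the denominator.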
5.14.2 Testing for the Equal Correlation Structure
The special correlation structure $\mathrm{Cov}(X_i, X_k) = \sigma^2\rho$ for all $i \ne k$ is one important structure in which the eigenvalues of $\Sigma$ are not distinct, and the previous results do not apply.
To test for this structure, let
$$H_0: \boldsymbol{\rho} = \boldsymbol{\rho}_0 = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}$$
and
$$H_1: \boldsymbol{\rho} \ne \boldsymbol{\rho}_0$$
A test of $H_0$ versus $H_1$ may be based on a likelihood ratio statistic, but Lawley has demonstrated that an equivalent test procedure can be constructed from the off-diagonal elements of R.
Lawley's procedure requires the quantities
$$\bar{r}_k = \frac{1}{p-1}\sum_{\substack{i=1 \\ i \ne k}}^{p} r_{ik}, \qquad \bar{r} = \frac{2}{p(p-1)}\sum_{i<k} r_{ik}, \qquad \hat{\gamma} = \frac{(p-1)^2\left[1 - (1-\bar{r})^2\right]}{p - (p-2)(1-\bar{r})^2}$$
It is evident that $\bar{r}_k$ is the average of the off-diagonal elements in the kth column (or row) of R, and $\bar{r}$ is the overall average of the off-diagonal elements.
The large sample approximate $\alpha$-level test is to reject $H_0$ in favor of $H_1$ if
$$T = \frac{n-1}{(1-\bar{r})^2}\left[\sum_{i<k}(r_{ik} - \bar{r})^2 \;-\; \hat{\gamma}\sum_{k=1}^{p}(\bar{r}_k - \bar{r})^2\right] \;>\; \chi^2_{(p+1)(p-2)/2}(\alpha)$$
where $\chi^2_{(p+1)(p-2)/2}(\alpha)$ is the upper $\alpha$th percentile of a chi-square distribution with $(p+1)(p-2)/2$ degrees of freedom.
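Lawley's quantities translate directly into numpy. This is a sketch on a hypothetical 4x4 correlation matrix whose off-diagonal entries are nearly equal, so the statistic should be small and $H_0$ should not be rejected.

```python
import numpy as np

def lawley_statistic(R, n):
    """Lawley's statistic for H0: all off-diagonal correlations equal."""
    p = R.shape[0]
    # r_bar_k: average off-diagonal entry of column k (diagonal 1 removed).
    rbar_k = (R.sum(axis=0) - 1.0) / (p - 1)
    iu = np.triu_indices(p, k=1)          # indices with i < k
    rbar = R[iu].mean()                   # overall off-diagonal average
    gamma = (p - 1) ** 2 * (1 - (1 - rbar) ** 2) / (p - (p - 2) * (1 - rbar) ** 2)
    T = ((n - 1) / (1 - rbar) ** 2) * (
        ((R[iu] - rbar) ** 2).sum() - gamma * ((rbar_k - rbar) ** 2).sum()
    )
    df = (p + 1) * (p - 2) // 2
    return T, df

# Hypothetical correlation matrix with nearly equal off-diagonal entries.
R = np.array([[1.00, 0.50, 0.52, 0.48],
              [0.50, 1.00, 0.49, 0.51],
              [0.52, 0.49, 1.00, 0.50],
              [0.48, 0.51, 0.50, 1.00]])
T, df = lawley_statistic(R, n=100)
print(T < 11.07, df)   # T below chi2_{0.05,5} = 11.07: do not reject H0
```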
Test for the Equal Correlation Structure.
Bartlett's test checks whether the observed correlation matrix diverges significantly from the identity matrix (the theoretical matrix under $H_0$: the variables are orthogonal). The PCA can perform a compression of the available information only if we reject the null hypothesis.
In order to measure the overall relation between the variables, we compute the determinant of the correlation matrix, $|R|$. Under $H_0$, $|R| = 1$; if the variables are highly correlated, we have $|R| \approx 0$.
Bartlett's test statistic indicates to what extent we deviate from the reference situation $|R| = 1$. It uses the following formula:
$$\chi^2 = -\left(n - 1 - \frac{2p+5}{6}\right)\ln|R|$$
Under $H_0$, it follows a $\chi^2$ distribution with $p(p-1)/2$ degrees of freedom.
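The statistic is a one-liner in numpy. This sketch uses a hypothetical correlation matrix with strong correlations, so $|R|$ is well below 1 and the test rejects sphericity.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's sphericity statistic and degrees of freedom for H0: R = I."""
    p = R.shape[0]
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return chi2, df

# Hypothetical correlated variables: det(R) << 1, so chi2 is large.
R = np.array([[1.0, 0.8, 0.7],
              [0.8, 1.0, 0.6],
              [0.7, 0.6, 1.0]])
chi2, df = bartlett_sphericity(R, n=50)
print(chi2 > 7.815, df)   # reject H0 at the 5% level (chi2_{0.05,3} = 7.815)
```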
Example 5.7
Apply Bartlett's sphericity test to the following correlation matrix
5.15.2 KMO Measure of Sampling Adequacy (MSA)
The KMO index has the same goal. It checks whether we can efficiently factorize the original variables, but it is based on another idea.
The correlation matrix is always the starting point. We know that the variables are more or less correlated,
but the correlation between two variables can be influenced by the others. So, we use the partial correlation
in order to measure the relation between two variables by removing the effect of the remaining variables.
The KMO index compares the values of the correlations between variables with those of the partial correlations. If the KMO index is high (close to 1), the PCA can act efficiently; if the KMO is low (close to 0), the PCA is not relevant. Some references give a table for the interpretation of the value of the KMO index obtained on a dataset.
Partial correlation matrix
The partial correlation matrix can be obtained from the correlation matrix. We calculate the inverse $V = R^{-1}$ and compute the partial correlations as follows:
$$p_{ij} = -\frac{v_{ij}}{\sqrt{v_{ii}\,v_{jj}}}$$
Overall KMO index
The overall KMO index is computed as follows:
$$\mathrm{KMO} = \frac{\displaystyle\sum_{i \ne j} r_{ij}^2}{\displaystyle\sum_{i \ne j} r_{ij}^2 + \sum_{i \ne j} p_{ij}^2}$$
If the partial correlations are near zero, the PCA can perform the factorization efficiently because the variables are highly related: $\mathrm{KMO} \approx 1$.
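Both steps, partial correlations from the inverse correlation matrix and the overall index, can be sketched in a few lines of numpy. The correlation matrix below is a hypothetical example.

```python
import numpy as np

def kmo_index(R):
    """Overall KMO index computed from a correlation matrix R (sketch)."""
    V = np.linalg.inv(R)
    # Partial correlations: p_ij = -v_ij / sqrt(v_ii * v_jj)
    d = np.sqrt(np.diag(V))
    P = -V / np.outer(d, d)
    off = ~np.eye(R.shape[0], dtype=bool)           # off-diagonal mask
    r2, p2 = (R[off] ** 2).sum(), (P[off] ** 2).sum()
    return r2 / (r2 + p2)

# Hypothetical correlation matrix.
R = np.array([[1.0, 0.8, 0.7],
              [0.8, 1.0, 0.6],
              [0.7, 0.6, 1.0]])
kmo = kmo_index(R)
print(0.0 < kmo < 1.0)   # True; values near 1 favour running a PCA
```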