
CHAPTER 5

PRINCIPAL COMPONENTS ANALYSIS


(PCA)
5.1 Introduction
A principal component analysis is concerned with explaining the variance-covariance structure of a set of variables through a few linear combinations of these variables. Its general objectives are (1) data reduction and (2) interpretation.
Although p components are required to reproduce the total system variability, often much of this variability can be accounted for by a small number of the principal components. If so, there is (almost) as much information in the components as there is in the original variables. The principal components can then replace the initial variables, and the original data set, consisting of measurements on p variables, is reduced to a data set consisting of measurements on k principal components.
In practice, data are sometimes collected on a large number of variables from a single population. As an example, consider the Places Rated dataset below.
Example 5.1: Places Rated
In the Places Rated Almanac, Boyer and Savageau rated 329 communities according to the
following nine criteria:
1. Climate and Terrain
2. Housing
3. Health Care & the Environment
4. Crime
5. Transportation
6. Education
7. The Arts
8. Recreation
9. Economics

5.2 Principal Component Analysis (PCA) Procedure
Algebraically, principal components are particular linear combinations of the $p$ random variables $X_1, X_2, \ldots, X_p$. Geometrically, these linear combinations represent the selection of a new coordinate system obtained by rotating the original system with $X_1, X_2, \ldots, X_p$ as the coordinate axes. The new axes represent the directions with maximum variability and provide a simpler and more parsimonious description of the covariance structure.
Suppose that we have a random vector $X' = (X_1, X_2, \ldots, X_p)$ with population variance-covariance matrix
$$\Sigma = \operatorname{Var}(X)$$
and eigenvalues
$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0.$$

Consider the linear combinations
$$Y_1 = a_1'X = a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p$$
Similarly,
$$Y_p = a_p'X = a_{p1}X_1 + a_{p2}X_2 + \cdots + a_{pp}X_p = (a_{p1}\; a_{p2}\; \cdots\; a_{pp}) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_{p-1} \\ x_p \end{pmatrix}$$

Each of these can be thought of as a linear regression, predicting $Y_i$ from $X_1, X_2, \ldots, X_p$. There is no intercept, but $a_{i1}, a_{i2}, \ldots, a_{ip}$ can be viewed as regression coefficients. Note that $Y_i$ is a function of our random data, and so is also random. Therefore it has a population variance
$$\operatorname{Var}(Y_i) = a_i'\Sigma a_i$$
Moreover, $Y_i$ and $Y_j$ will have a population covariance
$$\operatorname{Cov}(Y_i, Y_j) = a_i'\Sigma a_j$$
5.3 First Principal Component (PCA1): Y1
The first principal component is the linear combination of the X-variables that has maximum variance (among all linear combinations), so it accounts for as much variation in the data as possible.
Specifically, we will define coefficients $a_{11}, a_{12}, \ldots, a_{1p}$ for that component in such a way that its variance is maximized, subject to the constraint that the sum of the squared coefficients is equal to one. This constraint is required so that a unique answer may be obtained.
That is, select $a_1$ such that
$$\operatorname{Var}(Y_1) = a_1'\Sigma a_1 \quad \text{is maximized,}$$
subject to the constraint
$$a_1'a_1 = \sum_{j=1}^{p} a_{1j}^2 = 1.$$
5.4 Second Principal Component (PCA2): Y2
The second principal component is the linear combination of the X-variables that accounts for as much of the remaining variation as possible, with the constraint that the correlation between the first and second component is 0.
Select $a_2$ that maximizes the variance of this new component,
$$\operatorname{Var}(Y_2) = a_2'\Sigma a_2,$$
subject to the constraint
$$a_2'a_2 = 1,$$
along with the additional constraint that the two components be uncorrelated with one another:
$$\operatorname{Cov}(Y_1, Y_2) = a_1'\Sigma a_2 = 0.$$
All subsequent components are defined in the same way: each maximizes the remaining variance, subject to unit length and zero correlation with all earlier components.
5.5 How do we find the coefficients?
How do we find the coefficients for a principal component?
 Result 5.1:
Let $\Sigma$ be the covariance matrix associated with the random vector $X' = (X_1, X_2, \ldots, X_p)$. Let $\Sigma$ have the eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$. Then the ith principal component is given by
$$Y_i = e_i'X = e_{i1}X_1 + e_{i2}X_2 + \cdots + e_{ip}X_p, \qquad i = 1, 2, \ldots, p.$$
With these choices,
$$\operatorname{Var}(Y_i) = e_i'\Sigma e_i = \lambda_i, \qquad \operatorname{Cov}(Y_i, Y_k) = e_i'\Sigma e_k = 0 \;\;(i \neq k).$$
The solution involves the eigenvalues and eigenvectors of the variance-covariance matrix $\Sigma$. The variance-covariance matrix may be written as a function of the eigenvalues and their corresponding eigenvectors. This is determined by using the Spectral Decomposition Theorem.
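To make Result 5.1 concrete, here is a minimal sketch in Python/NumPy; the language choice and the small covariance matrix are illustrative additions, not from the text. It extracts the eigenvalue-eigenvector pairs of a covariance matrix and confirms that the component variances are the eigenvalues and the component covariances are zero:

```python
import numpy as np

# A small illustrative 3x3 covariance matrix (hypothetical values).
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

# eigh returns eigenvalues in ascending order for symmetric matrices;
# reverse so that lambda_1 >= lambda_2 >= ... >= lambda_p.
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
order = np.argsort(eigenvalues)[::-1]
lam = eigenvalues[order]        # lambda_i = Var(Y_i)
E = eigenvectors[:, order]      # column i is e_i, the coefficients of Y_i

# E' Sigma E should be diag(lambda_1, ..., lambda_p):
# Var(Y_i) = lambda_i on the diagonal, Cov(Y_i, Y_k) = 0 elsewhere.
print(lam)
print(np.round(E.T @ Sigma @ E, 10))
```

Here `np.linalg.eigh` is used rather than `np.linalg.eig` because, for a symmetric matrix, it guarantees real eigenvalues and orthonormal eigenvectors.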

5.6 Spectral Decomposition Theorem
The variance-covariance matrix can be written as
$$\Sigma = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_p e_p e_p'.$$
If $\lambda_{k+1}, \lambda_{k+2}, \ldots, \lambda_p$ are small, we might approximate $\Sigma$ by
$$\Sigma \approx \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_k e_k e_k'.$$
The total variation of $X$ is
$$\sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp} = \operatorname{tr}(\Sigma) = \lambda_1 + \lambda_2 + \cdots + \lambda_p.$$
This will give us an interpretation of the components in terms of the amount of the full variation explained by each component. The proportion of variation explained by the ith principal component is defined to be the eigenvalue for that component divided by the sum of the eigenvalues. In other words, the ith principal component explains the following proportion of the total variation:
$$\frac{\lambda_i}{\lambda_1 + \lambda_2 + \cdots + \lambda_p}.$$
A related quantity is the proportion of variation explained by the first $k$ principal components,
$$\frac{\lambda_1 + \lambda_2 + \cdots + \lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_p},$$
that is, the sum of the first $k$ eigenvalues divided by the total variation.
Naturally, if the proportion of variation explained by the first $k$ principal components is large, then not much information is lost by considering only the first $k$ principal components.
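Continuing the NumPy sketch above (it assumes the eigenvalue vector `lam` computed there), the proportion explained by each component and the cumulative proportion explained by the first k components are one line each:

```python
# Proportion of total variation explained by each component,
# and the cumulative proportion explained by the first k components.
proportion = lam / lam.sum()          # lambda_i / (lambda_1 + ... + lambda_p)
cumulative = np.cumsum(proportion)    # (lambda_1 + ... + lambda_k) / total
print(np.round(proportion, 4), np.round(cumulative, 4))
```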
5.7 Why It May Be Possible to Reduce Dimensions
When we have correlations (multicollinearity) between the x-variables, the data may more or less fall on a line or plane in a lower number of dimensions. For instance, imagine a plot of two x-variables that have a nearly perfect correlation. The data points will fall close to a straight line. That line could be used as a new (one-dimensional) axis to represent the variation among data points. As another example, suppose that we have verbal, math, and total SAT scores for a sample of students. We have three variables, but really (at most) two dimensions to the data because verbal + math = total, meaning the third variable is completely determined by the first two. The reason for saying "at most" two dimensions is that if there is a strong correlation between verbal and math, then it may be possible that there is only one true dimension to the data.
 


Note:
All of this is defined in terms of the population variance-covariance matrix $\Sigma$, which is unknown. However, we may estimate $\Sigma$ by the sample variance-covariance matrix $S$, which is given by the standard formula
$$S = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})'.$$
5.8 Procedure
Compute the eigenvalues $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p$ of the sample variance-covariance matrix $S$, and the corresponding eigenvectors $\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_p$. Then we will define our estimated principal components using the eigenvectors as our coefficients:
$$\hat{Y}_1 = \hat{e}_1'X, \quad \hat{Y}_2 = \hat{e}_2'X, \quad \ldots, \quad \hat{Y}_p = \hat{e}_p'X.$$
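This procedure translates into a few lines of Python/NumPy; the function and variable names below are my own illustration, assuming an n-by-p data matrix `X`:

```python
import numpy as np

def sample_principal_components(X):
    """Estimate principal components from an (n x p) data matrix X."""
    S = np.cov(X, rowvar=False)               # sample covariance matrix (divides by n-1)
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    order = np.argsort(eigenvalues)[::-1]     # sort so lambda_hat_1 >= ... >= lambda_hat_p
    lam_hat = eigenvalues[order]
    E_hat = eigenvectors[:, order]
    scores = (X - X.mean(axis=0)) @ E_hat     # estimated component scores
    return lam_hat, E_hat, scores
```

Centering `X` before projecting is a common convention; it shifts the scores without changing their variances.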
Generally, we only retain the first $k$ principal components. Here we must balance two conflicting desires:
1. To obtain the simplest possible interpretation, we want $k$ to be as small as possible. If we can explain most of the variation with just two principal components, this gives us a much simpler description of the data. (The smaller $k$ is, however, the less of the variation is explained by the first $k$ components.)
2. To avoid loss of information, we want the proportion of variation explained by the first $k$ principal components to be large, ideally as close to one as possible; i.e., we want
$$\frac{\hat{\lambda}_1 + \hat{\lambda}_2 + \cdots + \hat{\lambda}_k}{\hat{\lambda}_1 + \hat{\lambda}_2 + \cdots + \hat{\lambda}_p} \approx 1.$$
Example 5.2: Places Rated
We will use the Places Rated Almanac data (Boyer and Savageau), which rates 329 communities according to nine criteria:
1. Climate and Terrain
2. Housing
3. Health Care & Environment
4. Crime
5. Transportation
6. Education
7. The Arts
8. Recreation
9. Economics

5.9 Data Analysis
Step 1: We examine the eigenvalues to determine how many principal components should be considered:
Table 5.1: Eigenvalues and the proportion of variation explained by the principal components.

Component   Eigenvalue   Proportion   Cumulative
    1         0.3775       0.7227       0.7227
    2         0.0511       0.0977       0.8204
    3         0.0279       0.0535       0.8739
    4         0.0230       0.0440       0.9178
    5         0.0168       0.0321       0.9500
    6         0.0120       0.0229       0.9728
    7         0.0085       0.0162       0.9890
    8         0.0039       0.0075       0.9966
    9         0.0018       0.0034       1.0000


Result 5.2:
If $Y_1 = e_1'X, Y_2 = e_2'X, \ldots, Y_p = e_p'X$ are the principal components obtained from the covariance matrix $\Sigma$, then
$$\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \qquad i, k = 1, 2, \ldots, p,$$
are the correlation coefficients between the components $Y_i$ and the variables $X_k$.

Example (Calculating the principal components):

Suppose the random variables have the covariance matrix
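As a sketch, Result 5.2 is a one-liner on top of the earlier NumPy example (reusing its illustrative `Sigma`, `lam`, and `E`):

```python
# Correlations rho_{Y_i, X_k} = e_ik * sqrt(lambda_i) / sqrt(sigma_kk).
sigma_kk = np.diag(Sigma)                            # variances of the original variables
rho = E * np.sqrt(lam) / np.sqrt(sigma_kk)[:, None]
print(np.round(rho, 3))                              # rho[k, i] = corr(X_k, Y_i)
```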
5.10 Scree Plot
An alternative method to determine the number of principal components is to look at a scree plot, which plots the ordered eigenvalues against the component number.

[Figure: Scree plot for the variables without standardization (covariance matrix).]
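A scree plot is straightforward to produce, for instance with matplotlib, using the sorted eigenvalues `lam_hat` returned by the procedure sketch above (the plot details are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def scree_plot(lam_hat):
    """Plot the ordered eigenvalues against the component number."""
    k = np.arange(1, len(lam_hat) + 1)
    plt.plot(k, lam_hat, "o-")
    plt.xticks(k)
    plt.xlabel("Component number")
    plt.ylabel("Eigenvalue")
    plt.title("Scree plot")
    plt.show()
```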
Step 2: Next, we will compute the principal component scores. For example, the first principal component can be computed using the elements of the first eigenvector:
$$\hat{Y}_1 = \hat{e}_{11}X_1 + \hat{e}_{12}X_2 + \cdots + \hat{e}_{19}X_9.$$
5.11 Interpretation of the Principal Components
Step 3: To interpret each component, we must compute the correlations between the original data for each variable and each principal component.
These correlations are obtained using a correlation procedure. In the variable list we include the first three principal components, "prin1, prin2, and prin3", in addition to all nine of the original variables. We will use these correlations between the principal components and the original variables to interpret the principal components.
Because of standardization, all principal components will have mean 0. The standard deviation is also given for each of the components; these are the square roots of the eigenvalues.
More important for our current purposes are the correlations between the principal components and the original variables. These have been copied into the following table. You will also note, if you look at the principal components themselves, that there is zero correlation between the components.
                 Principal Component
Variable            1        2        3
Climate           0.190    0.017    0.207
Housing           0.544    0.020    0.204
Health            0.782   -0.605    0.144
Crime             0.365    0.294    0.585
Transportation    0.585    0.085    0.234
Education         0.394   -0.273    0.027
Arts              0.985    0.126   -0.111
Recreation        0.520    0.402    0.519
Economy           0.142    0.150    0.239
Interpretation of the principal components is based on finding which variables are most strongly correlated with each component, i.e., which of these numbers are large in magnitude, the farthest from zero in either the positive or negative direction. Which numbers we consider to be large or small is, of course, a subjective decision. You need to determine at what level the correlation value will be of importance. Here a correlation value above 0.5 is deemed important; these larger correlations drive the interpretation below.
We will now interpret the principal component results with respect to the value that we have deemed significant.
First Principal Component Analysis - PCA1
The first principal component is strongly correlated with five of the original variables. The
first principal component increases with increasing Arts, Health, Transportation, Housing
and Recreation scores. This suggests that these five criteria vary together. If one increases,
then the remaining ones tend to as well. This component can be viewed as a measure of the
quality of Arts, Health, Transportation, and Recreation, and the lack of quality in Housing
(recall that high values for Housing are bad). Furthermore, we see that the first principal
component correlates most strongly with the Arts. In fact, we could state, based on the correlation of 0.985, that this principal component is primarily a measure of the Arts. It would follow that communities with high values tend to have a lot of arts available, in terms of theaters, orchestras, etc., whereas communities with small values have very few of these types of opportunities.


Second Principal Component Analysis - PCA2

The second principal component is strongly correlated with only one of the variables: it increases as Health decreases. This component can be viewed as a measure of how unhealthy the location is in terms of available health care, including doctors, hospitals, etc.
Third Principal Component Analysis - PCA3
The third principal component increases with increasing Crime and Recreation. This suggests that places with high crime also tend to have better recreation facilities.
To complete the analysis, we often would like to produce a scatter plot of the component scores.
Further analyses may include:
 Scatter plots of principal component scores. In the present context, we may wish to
identify the locations of each point in the plot to see if places with high levels of a
given component tend to be clustered in a particular region of the country, while sites
with low levels of that component are clustered in another region of the country.
 Principal components are often treated as dependent variables for regression and
analysis of variance.

5.12 Alternative: Standardize the Variables
In the previous example we looked at principal components analysis applied to the raw data. In our earlier discussion we noted that if the raw data are used, principal component analysis will tend to give more emphasis to those variables that have higher variances than to those variables that have very low variances. In effect, the results of the analysis will depend on the units of measurement used to measure each variable. That would imply that a principal component analysis should only be used with the raw data if all variables have the same units of measure, and even then only if you wish to give those variables with higher variances more weight in the analysis.
5.13 Summary
 The results of principal component analysis depend on the scales at which the variables are measured.
 Variables with the highest sample variances will tend to be emphasized in the first few principal components.
 Principal component analysis using the covariance matrix should only be considered if all of the variables have the same units of measurement.
If the variables either have different units of measurement (i.e., pounds, feet, gallons, etc.), or if we wish each variable to receive equal weight in the analysis, then the variables should be standardized before a principal components analysis is carried out. Standardize each variable by subtracting its mean and dividing by its standard deviation:
$$Z_{ij} = \frac{X_{ij} - \bar{x}_j}{s_j}$$
where
$X_{ij}$: data for variable $j$ in sample unit $i$
$\bar{x}_j$: sample mean for variable $j$
$s_j$: sample standard deviation for variable $j$
We will now perform the principal component analysis using the standardized data.
Note: the variance-covariance matrix of the standardized data is equal to the correlation matrix for the unstandardized data. Therefore, principal component analysis using the standardized data is equivalent to principal component analysis using the correlation matrix.
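A sketch of the standardized analysis in Python (the names are illustrative); the key point it demonstrates is that PCA on standardized data is the same as PCA on the correlation matrix:

```python
import numpy as np

def standardized_principal_components(X):
    """PCA on standardized data; equivalent to PCA on the correlation matrix R."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # Z_ij = (X_ij - xbar_j) / s_j
    R = np.corrcoef(X, rowvar=False)                   # equals np.cov(Z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    order = np.argsort(eigenvalues)[::-1]
    lam_hat = eigenvalues[order]
    E_hat = eigenvectors[:, order]
    scores = Z @ E_hat                                 # scores from the standardized data
    return lam_hat, E_hat, scores
```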
5.13.1 Principal Component Analysis Procedure
The principal components are first calculated by obtaining the eigenvalues of the correlation matrix. Here we denote the eigenvalues of the sample correlation matrix $R$ by
$$\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p,$$
and the corresponding eigenvectors by
$$\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_p.$$
Then the estimated principal component scores are calculated using formulas similar to before, but instead of using the raw data we use the standardized data in the formula below:
$$\hat{Y}_i = \hat{e}_i'Z = \hat{e}_{i1}Z_1 + \hat{e}_{i2}Z_2 + \cdots + \hat{e}_{ip}Z_p.$$
The rest of the procedure and the interpretations are as discussed before.
Example 5.3: (Principal components obtained from covariance and correlation matrices are different)
Consider the covariance matrix

5.13.2 Places Rated after Standardization

We need to focus on the eigenvalues of the correlation matrix that correspond to each of the principal components. In this case, the total variation of the standardized variables is equal to $p$, the number of variables. After standardization each variable has variance equal to one, and the total variation is the sum of these variances; in this case the total variation will be 9.
The eigenvalues of the correlation matrix are given in the second column in the table below. Note also the proportion of variation explained by each of the principal components, as well as the cumulative proportion of the variation explained.
Step 1
Examine the eigenvalues to determine how many principal components should be considered:

Component   Eigenvalue   Proportion   Cumulative
    1         3.2978       0.3664       0.3664
    2         1.2136       0.1348       0.5013
    3         1.1055       0.1228       0.6241
    4         0.9073       0.1008       0.7249
    5         0.8606       0.0956       0.8205
    6         0.5622       0.0625       0.8830
    7         0.4838       0.0538       0.9368
    8         0.3181       0.0353       0.9721
    9         0.2511       0.0279       1.0000


Another approach would be to plot the differences between the ordered eigenvalues and look for a break or a sharp drop. The only sharp drop that is noticeable in this case is after the first component. One might, based on this, select only one component. However, one component is probably too few, particularly because we have only explained 37% of the variation. Consider the scree plot based on the standardized variables.

[Figure: Scree plot for the standardized variables (correlation matrix).]
Step 2
Next, we can compute the principal component scores using the eigenvectors. Using the coefficients in the first column of the table below, the formula for the first principal component is
$$\hat{Y}_1 = 0.158\,Z_{\text{Climate}} + 0.384\,Z_{\text{Housing}} + 0.410\,Z_{\text{Health}} + 0.259\,Z_{\text{Crime}} + 0.375\,Z_{\text{Transportation}} + 0.274\,Z_{\text{Education}} + 0.474\,Z_{\text{Arts}} + 0.353\,Z_{\text{Recreation}} + 0.164\,Z_{\text{Economy}}.$$
And remember, this is now a function not of the raw data but of the standardized data.
The magnitudes of the coefficients give the contributions of each variable to that component. Since the data have been standardized, they do not depend on the variances of the corresponding variables.
Step 3
Next, we can look at the coefficients for the principal components. In this case, since the data are standardized, the relative magnitudes of the coefficients within a column can be directly compared. Each column here corresponds to a column in the output of the program labeled Eigenvectors.
                           Principal Component
Variable            1        2        3        4        5
Climate           0.158    0.069    0.800    0.377    0.041
Housing           0.384    0.139    0.080    0.197   -0.580
Health            0.410   -0.372   -0.019    0.113    0.030
Crime             0.259    0.474    0.128   -0.042    0.692
Transportation    0.375   -0.141   -0.141   -0.430    0.191
Education         0.274   -0.452   -0.241    0.457    0.224
Arts              0.474   -0.104    0.011   -0.147    0.012
Recreation        0.353    0.292    0.042   -0.404   -0.306
Economy           0.164    0.540   -0.507    0.476   -0.037
Interpretation of the principal components is based on finding which variables are most strongly correlated with each component. In other words, we need to decide which numbers are large within each column. In the first column we will decide that Health and Arts are large. This is, of course, arbitrary; other variables might also have been included as part of this first principal component.

5.13.4 Component Summaries

First Principal Component Analysis - PCA1
The first principal component is a measure of the quality of Health and the Arts, and to some extent Housing, Transportation and Recreation. Health increases with increasing values in the Arts; if any of these variables goes up, so do the remaining ones. They are all positively related, as they all have positive signs.
Second Principal Component Analysis - PCA2
The second principal component is a measure of the severity of crime, the quality of the economy, and the lack of quality in education. Crime and Economy increase with decreasing Education. Here we can see that cities with high levels of crime and good economies also tend to have poor educational systems.
Third Principal Component Analysis - PCA3
The third principal component is a measure of the quality of the climate and the poorness of the economy. Climate increases with decreasing Economy. The inclusion of Economy within this component adds a bit of redundancy to our results; this component is primarily a measure of climate, and to a lesser extent the economy.
Fourth Principal Component Analysis - PCA4
The fourth principal component is a measure of the quality of education and the economy
and the poorness of the transportation network and recreational opportunities. Education
and Economy increase with decreasing Transportation and Recreation.
Fifth Principal Component Analysis - PCA5
The fifth principal component is a measure of the severity of crime and the quality of housing. Crime increases with decreasing Housing.
Example 5.4 (Summarizing sample variability with two sample principal components)
A census provided information, by tract, on five socioeconomic variables for the Madison,
Wisconsin, area. The data from 61 tracts produced the following summary statistics:


5.14 Large Sample Inferences

We have seen that the eigenvalues and eigenvectors of the covariance (correlation) matrix are the essence of a principal component analysis. The eigenvectors determine the directions of maximum variability, and the eigenvalues specify the variances. When the first few eigenvalues are much larger than the rest, most of the total variance can be "explained" in fewer than $p$ dimensions.
In practice, decisions regarding the quality of the principal component approximation must be made on the basis of the eigenvalue-eigenvector pairs $(\hat{\lambda}_i, \hat{e}_i)$ extracted from $S$. Because of sampling variation, these eigenvalues and eigenvectors will differ from their underlying population counterparts.
The sampling distributions of $\hat{\lambda}_i$ and $\hat{e}_i$ are difficult to derive and beyond the scope of this course. We shall simply summarize the pertinent large sample results.
5.14.1 Large Sample Properties of $\hat{\lambda}_i$ and $\hat{e}_i$

Currently available results concerning large sample confidence intervals for $\lambda_i$ and $e_i$ assume that the observations are a random sample from a normal population. It must also be assumed that the (unknown) eigenvalues of $\Sigma$ are distinct and positive, so that $\lambda_1 > \lambda_2 > \cdots > \lambda_p > 0$. The one exception is the case where the number of equal eigenvalues is known. Usually the conclusions for distinct eigenvalues are applied, unless there is a strong reason to believe that $\Sigma$ has a special structure that yields equal eigenvalues. Even when the normal assumption is violated, the confidence intervals obtained in this manner still provide some indication of the uncertainty in $\lambda_i$ and $e_i$.
Anderson and Girshick have established the following large sample distribution theory for the eigenvalues $\hat{\lambda}' = [\hat{\lambda}_1, \ldots, \hat{\lambda}_p]$ and eigenvectors $\hat{e}_1, \ldots, \hat{e}_p$ of $S$:
1. Let $\Lambda$ be the diagonal matrix of eigenvalues $\lambda_1, \ldots, \lambda_p$ of $\Sigma$; then $\sqrt{n}(\hat{\lambda} - \lambda)$ is approximately $N_p(0, 2\Lambda^2)$.
2. Let
$$E_i = \lambda_i \sum_{\substack{k=1 \\ k \neq i}}^{p} \frac{\lambda_k}{(\lambda_k - \lambda_i)^2}\, e_k e_k',$$
then $\sqrt{n}(\hat{e}_i - e_i)$ is approximately $N_p(0, E_i)$.
3. Each $\hat{\lambda}_i$ is distributed independently of the elements of the associated $\hat{e}_i$.
Result 1 implies that, for large $n$, the $\hat{\lambda}_i$ are independently distributed. Moreover, $\hat{\lambda}_i$ has an approximate $N(\lambda_i, 2\lambda_i^2/n)$ distribution. Using this normal distribution, we obtain $P[\,|\hat{\lambda}_i - \lambda_i| \leq z(\alpha/2)\,\lambda_i\sqrt{2/n}\,] = 1 - \alpha$. A large sample $100(1-\alpha)\%$ confidence interval for $\lambda_i$ is thus provided by
$$\frac{\hat{\lambda}_i}{1 + z(\alpha/2)\sqrt{2/n}} \;\leq\; \lambda_i \;\leq\; \frac{\hat{\lambda}_i}{1 - z(\alpha/2)\sqrt{2/n}},$$
where $z(\alpha/2)$ is the upper $100(\alpha/2)$th percentile of a standard normal distribution.
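This interval is easy to compute; a sketch in Python follows (the function name is illustrative, and the interval is meaningful only for large n, where the denominators stay positive):

```python
import numpy as np
from scipy.stats import norm

def eigenvalue_ci(lam_hat_i, n, alpha=0.05):
    """Large-sample confidence interval for a population eigenvalue lambda_i."""
    z = norm.ppf(1 - alpha / 2)       # upper alpha/2 standard normal percentile
    half = z * np.sqrt(2.0 / n)
    return lam_hat_i / (1 + half), lam_hat_i / (1 - half)
```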
Result 2 implies that the $\hat{e}_i$ are normally distributed about the corresponding $e_i$ for large samples. The elements of each $\hat{e}_i$ are correlated, and the correlation depends to a large extent on the separation of the eigenvalues (which is unknown) and the sample size $n$. Approximate standard errors for the coefficients $\hat{e}_{ik}$ are given by the square roots of the diagonal elements of $(1/n)\hat{E}_i$, where $\hat{E}_i$ is derived from $E_i$ by substituting $\hat{\lambda}$'s for the $\lambda$'s and $\hat{e}$'s for the $e$'s.
Example 5.5 (Constructing a confidence interval for $\lambda_1$)

Obtain a 95% confidence interval for $\lambda_1$, the variance of the first population principal component, using the data.
 
5.14.2 Testing for the Equal Correlation Structure
The special correlation structure $\operatorname{Cov}(X_i, X_k) = \sqrt{\sigma_{ii}\sigma_{kk}}\,\rho$, for all $i \neq k$, is one important structure in which the eigenvalues of $\Sigma$ are not distinct and the previous results do not apply.
To test for this structure, let
$$H_0: \boldsymbol{\rho} = \boldsymbol{\rho}_0 = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}$$
and
$$H_1: \boldsymbol{\rho} \neq \boldsymbol{\rho}_0.$$
A test of $H_0$ versus $H_1$ may be based on a likelihood ratio statistic, but Lawley has demonstrated that an equivalent test procedure can be constructed from the off-diagonal elements of $R$.
Lawley's procedure requires the quantities
$$\bar{r}_k = \frac{1}{p-1}\sum_{\substack{i=1 \\ i \neq k}}^{p} r_{ik}, \qquad \bar{r} = \frac{2}{p(p-1)}\sum_{i<k}\sum r_{ik}, \qquad \hat{\gamma} = \frac{(p-1)^2\left[1 - (1-\bar{r})^2\right]}{p - (p-2)(1-\bar{r})^2}.$$
It is evident that $\bar{r}_k$ is the average of the off-diagonal elements in the kth column (or row) of $R$, and $\bar{r}$ is the overall average of the off-diagonal elements.
The large sample approximate $\alpha$-level test is to reject $H_0$ in favor of $H_1$ if
$$T = \frac{n-1}{(1-\bar{r})^2}\left[\sum_{i<k}\sum (r_{ik} - \bar{r})^2 - \hat{\gamma}\sum_{k=1}^{p}(\bar{r}_k - \bar{r})^2\right] > \chi^2_{(p+1)(p-2)/2}(\alpha),$$
where $\chi^2_{(p+1)(p-2)/2}(\alpha)$ is the upper $(100\alpha)$th percentile of a chi-square distribution with $(p+1)(p-2)/2$ degrees of freedom.
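The quantities above translate directly into code; here is a sketch, assuming a p-by-p correlation matrix `R` as a NumPy array (the function name is my own):

```python
import numpy as np
from scipy.stats import chi2

def lawley_equicorrelation_test(R, n, alpha=0.05):
    """Lawley's test of H0: equal correlation structure."""
    p = R.shape[0]
    off = ~np.eye(p, dtype=bool)
    r_bar_k = R[off].reshape(p, p - 1).mean(axis=1)   # row (= column) averages r_bar_k
    iu = np.triu_indices(p, k=1)
    r = R[iu]                                          # off-diagonal correlations r_ik, i < k
    r_bar = r.mean()                                   # overall average r_bar
    gamma_hat = (p - 1)**2 * (1 - (1 - r_bar)**2) / (p - (p - 2) * (1 - r_bar)**2)
    T = ((n - 1) / (1 - r_bar)**2) * (
        np.sum((r - r_bar)**2) - gamma_hat * np.sum((r_bar_k - r_bar)**2))
    df = (p + 1) * (p - 2) / 2
    crit = chi2.ppf(1 - alpha, df)
    return T, crit, T > crit                           # reject H0 if T > crit
```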
Example 5.6 (Testing for equi-correlation structure)

The sample correlation matrix constructed from the n = 150 post-birth weights of female mice is

Test for the equal correlation structure.
5.15 Model Adequacy Tests

5.15.1 Bartlett's sphericity test
The starting point is the calculation of the correlations between the variables. Bartlett's test compares the observed correlation matrix to the identity matrix. In other words, it checks whether there is a certain redundancy between the variables that we can summarize with a few factors. If the variables are perfectly correlated, only one factor is sufficient. If they are orthogonal, we need as many factors as variables; in this last case, the correlation matrix is the same as the identity matrix. A simple strategy is to visualize the correlation matrix: if the values outside the main diagonal are often high (in absolute value), some variables are correlated; if most of these values are near zero, the PCA is not really useful.
Bartlett's test checks whether the observed correlation matrix diverges significantly from the identity matrix (the theoretical matrix under $H_0$: the variables are orthogonal). The PCA can perform a compression of the available information only if we reject the null hypothesis.
In order to measure the overall relation between the variables, we compute the determinant of the correlation matrix, $|R|$. Under $H_0$, $|R| = 1$; if the variables are highly correlated, we have $|R| \approx 0$.

Bartlett's test statistic indicates to what extent we deviate from the reference situation $|R| = 1$. It uses the following formula:
$$\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln|R|.$$
Under $H_0$, it follows a $\chi^2$ distribution with $p(p-1)/2$ degrees of freedom.
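In code, the statistic and its p-value are a couple of lines (a sketch; the names are my own):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's sphericity test from a p x p correlation matrix R and sample size n."""
    p = R.shape[0]
    stat = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    p_value = chi2.sf(stat, df)    # reject H0 (sphericity) for small p-values
    return stat, p_value
```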
Example 5.7
Apply Bartlett's sphericity test for the following correlation matrix:

5.15.2 KMO Measure of Sampling Adequacy (MSA)
The KMO index has the same goal: it checks whether we can factorize the original variables efficiently. But it is based on another idea.
The correlation matrix is always the starting point. We know that the variables are more or less correlated, but the correlation between two variables can be influenced by the others. So we use the partial correlation in order to measure the relation between two variables while removing the effect of the remaining variables.
The KMO index compares the values of the correlations between variables with those of the partial correlations. If the KMO index is high (near 1), the PCA can act efficiently; if the KMO index is low (near 0), the PCA is not relevant. Some references give a table for the interpretation of the value of the KMO index obtained on a dataset.
Partial correlation matrix
The partial correlation matrix can be obtained from the correlation matrix: we calculate its inverse $V = R^{-1}$ and compute the partial correlations as follows:
$$\tilde{r}_{ij} = -\frac{v_{ij}}{\sqrt{v_{ii}\,v_{jj}}}.$$
Overall KMO index
The overall KMO index is computed as follows:
$$\mathrm{KMO} = \frac{\displaystyle\sum_{i \neq j} r_{ij}^2}{\displaystyle\sum_{i \neq j} r_{ij}^2 + \sum_{i \neq j} \tilde{r}_{ij}^2}.$$
If the partial correlations are near zero, the PCA can perform the factorization efficiently because the variables are highly related: $\mathrm{KMO} \approx 1$.
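A sketch of both computations in Python (the helper name `kmo_index` is my own):

```python
import numpy as np

def kmo_index(R):
    """Overall KMO index from a p x p correlation matrix R."""
    V = np.linalg.inv(R)
    d = np.sqrt(np.diag(V))
    partial = -V / np.outer(d, d)             # partial correlations -v_ij / sqrt(v_ii v_jj)
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off] ** 2)                  # sum of squared correlations, i != j
    q2 = np.sum(partial[off] ** 2)            # sum of squared partial correlations, i != j
    return r2 / (r2 + q2)                     # near 1: PCA appropriate; near 0: not relevant
```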
