
CHAPTER 5

PRINCIPAL COMPONENTS ANALYSIS


(PCA)
5.1 Introduction
A principal component analysis is concerned with explaining the variance-covariance structure of a set of variables through a few linear combinations of these variables. Its general objectives are (1) data reduction and (2) interpretation.
Although p components are required to reproduce the total system variability, often much of this variability can be accounted for by a small number of the principal components. If so, there is (almost) as much information in the components as there is in the original variables. The principal components can then replace the initial variables, and the original data set, consisting of measurements on p variables, is reduced to a data set consisting of measurements on k principal components.
In practice, data are sometimes collected on a large number of variables from a single population. As an example, consider the Places Rated dataset below.
Example 5.1: Places Rated
In the Places Rated Almanac, Boyer and Savageau rated 329 communities according to the
following nine criteria:
1. Climate and Terrain
2. Housing
3. Health Care & the Environment
4. Crime
5. Transportation
6. Education
7. The Arts
8. Recreation
9. Economics

5.2 Principal Component Analysis (PCA) Procedure
Algebraically, principal components are particular linear combinations of the $p$ random variables $X_1, X_2, \ldots, X_p$. Geometrically, these linear combinations represent the selection of a new coordinate system obtained by rotating the original system with $X_1, X_2, \ldots, X_p$ as the coordinate axes. The new axes represent the directions with maximum variability and provide a simpler and more parsimonious description of the covariance structure.
Suppose that we have a random vector $X' = (X_1, X_2, \ldots, X_p)$ with population variance-covariance matrix
$$\Sigma = \operatorname{Var}(X)$$
and eigenvalues
$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0.$$

Consider the linear combinations
$$Y_1 = a_1'X = a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p$$
Similarly,
$$Y_p = a_p'X = a_{p1}X_1 + a_{p2}X_2 + \cdots + a_{pp}X_p = (a_{p1}\; a_{p2}\; \cdots\; a_{pp}) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_{p-1} \\ x_p \end{pmatrix}$$

Each of these can be thought of as a linear regression, predicting $Y_i$ from $X_1, X_2, \ldots, X_p$. There is no intercept, but $a_{i1}, a_{i2}, \ldots, a_{ip}$ can be viewed as regression coefficients. Note that $Y_i$ is a function of our random data, and so is also random. Therefore it has a population variance
$$\operatorname{Var}(Y_i) = a_i'\Sigma a_i$$
Moreover, $Y_i$ and $Y_j$ will have a population covariance
$$\operatorname{Cov}(Y_i, Y_j) = a_i'\Sigma a_j$$
5.3 First Principal Component (PCA1): Y1
The first principal component is the linear combination of the X-variables that has maximum variance (among all linear combinations), so it accounts for as much variation in the data as possible.
Specifically, we will define coefficients $a_{11}, a_{12}, \ldots, a_{1p}$ for that component in such a way that its variance is maximized, subject to the constraint that the sum of the squared coefficients is equal to one. This constraint is required so that a unique answer may be obtained.
That is, select $a_1$ such that
$$\operatorname{Var}(Y_1) = a_1'\Sigma a_1 \quad \text{is maximized,}$$
subject to the constraint
$$a_1'a_1 = \sum_{j=1}^{p} a_{1j}^2 = 1.$$
5.4 Second Principal Component (PCA2): Y2
The second principal component is the linear combination of the X-variables that accounts for as much of the remaining variation as possible, with the constraint that the correlation between the first and second component is 0.
Select $a_2$ that maximizes the variance of this new component,
$$\operatorname{Var}(Y_2) = a_2'\Sigma a_2,$$
subject to the constraint
$$a_2'a_2 = 1,$$
along with the additional constraint that the two components be uncorrelated with one another:
$$\operatorname{Cov}(Y_1, Y_2) = a_1'\Sigma a_2 = 0.$$
All subsequent components are defined in the same way: each maximizes the remaining variance, subject to unit length and zero correlation with all earlier components.
5.5 How do we find the coefficients?
How do we find the coefficients for a principal component?
 Result 5.1:
Let $\Sigma$ be the covariance matrix associated with the random vector $X' = (X_1, X_2, \ldots, X_p)$. Let $\Sigma$ have the eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$. Then the ith principal component is given by
$$Y_i = e_i'X = e_{i1}X_1 + e_{i2}X_2 + \cdots + e_{ip}X_p, \qquad i = 1, 2, \ldots, p.$$
With these choices,
$$\operatorname{Var}(Y_i) = e_i'\Sigma e_i = \lambda_i, \qquad \operatorname{Cov}(Y_i, Y_k) = e_i'\Sigma e_k = 0 \;\;(i \neq k).$$
The solution involves the eigenvalues and eigenvectors of the variance-covariance matrix $\Sigma$. The variance-covariance matrix may be written as a function of the eigenvalues and their corresponding eigenvectors. This is determined by using the Spectral Decomposition Theorem.
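To make Result 5.1 concrete, here is a minimal sketch in Python/NumPy; the language choice and the small covariance matrix are illustrative additions, not from the text. It extracts the eigenvalue-eigenvector pairs of a covariance matrix and confirms that the component variances are the eigenvalues and the component covariances are zero:

```python
import numpy as np

# A small illustrative 3x3 covariance matrix (hypothetical values).
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

# eigh returns eigenvalues in ascending order for symmetric matrices;
# reverse so that lambda_1 >= lambda_2 >= ... >= lambda_p.
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)
order = np.argsort(eigenvalues)[::-1]
lam = eigenvalues[order]        # lambda_i = Var(Y_i)
E = eigenvectors[:, order]      # column i is e_i, the coefficients of Y_i

# E' Sigma E should be diag(lambda_1, ..., lambda_p):
# Var(Y_i) = lambda_i on the diagonal, Cov(Y_i, Y_k) = 0 elsewhere.
print(lam)
print(np.round(E.T @ Sigma @ E, 10))
```

Here `np.linalg.eigh` is used rather than `np.linalg.eig` because, for a symmetric matrix, it guarantees real eigenvalues and orthonormal eigenvectors.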

5.6 Spectral Decomposition Theorem
The variance-covariance matrix can be written as
$$\Sigma = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_p e_p e_p'.$$
If $\lambda_{k+1}, \lambda_{k+2}, \ldots, \lambda_p$ are small, we might approximate $\Sigma$ by
$$\Sigma \approx \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_k e_k e_k'.$$
The total variation of $X$ is
$$\sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp} = \operatorname{tr}(\Sigma) = \lambda_1 + \lambda_2 + \cdots + \lambda_p.$$
This will give us an interpretation of the components in terms of the amount of the full variation explained by each component. The proportion of variation explained by the ith principal component is defined to be the eigenvalue for that component divided by the sum of the eigenvalues. In other words, the ith principal component explains the following proportion of the total variation:
$$\frac{\lambda_i}{\lambda_1 + \lambda_2 + \cdots + \lambda_p}.$$
A related quantity is the proportion of variation explained by the first $k$ principal components,
$$\frac{\lambda_1 + \lambda_2 + \cdots + \lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_p},$$
that is, the sum of the first $k$ eigenvalues divided by the total variation.
Naturally, if the proportion of variation explained by the first $k$ principal components is large, then not much information is lost by considering only the first $k$ principal components.
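Continuing the NumPy sketch above (it assumes the eigenvalue vector `lam` computed there), the proportion explained by each component and the cumulative proportion explained by the first k components are one line each:

```python
# Proportion of total variation explained by each component,
# and the cumulative proportion explained by the first k components.
proportion = lam / lam.sum()          # lambda_i / (lambda_1 + ... + lambda_p)
cumulative = np.cumsum(proportion)    # (lambda_1 + ... + lambda_k) / total
print(np.round(proportion, 4), np.round(cumulative, 4))
```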
5.7 Why It May Be Possible to Reduce Dimensions
When we have correlations (multicollinearity) between the x-variables, the data may more or less fall on a line or plane in a lower number of dimensions. For instance, imagine a plot of two x-variables that have a nearly perfect correlation. The data points will fall close to a straight line. That line could be used as a new (one-dimensional) axis to represent the variation among data points. As another example, suppose that we have verbal, math, and total SAT scores for a sample of students. We have three variables, but really (at most) two dimensions to the data because verbal + math = total, meaning the third variable is completely determined by the first two. The reason for saying "at most" two dimensions is that if there is a strong correlation between verbal and math, then it may be possible that there is only one true dimension to the data.
 


Note:
All of this is defined in terms of the population variance-covariance matrix $\Sigma$, which is unknown. However, we may estimate $\Sigma$ by the sample variance-covariance matrix $S$, which is given by the standard formula
$$S = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})'.$$
5.8 Procedure
Compute the eigenvalues $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p$ of the sample variance-covariance matrix $S$, and the corresponding eigenvectors $\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_p$. Then we will define our estimated principal components using the eigenvectors as our coefficients:
$$\hat{Y}_1 = \hat{e}_1'X, \quad \hat{Y}_2 = \hat{e}_2'X, \quad \ldots, \quad \hat{Y}_p = \hat{e}_p'X.$$
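This procedure translates into a few lines of Python/NumPy; the function and variable names below are my own illustration, assuming an n-by-p data matrix `X`:

```python
import numpy as np

def sample_principal_components(X):
    """Estimate principal components from an (n x p) data matrix X."""
    S = np.cov(X, rowvar=False)               # sample covariance matrix (divides by n-1)
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    order = np.argsort(eigenvalues)[::-1]     # sort so lambda_hat_1 >= ... >= lambda_hat_p
    lam_hat = eigenvalues[order]
    E_hat = eigenvectors[:, order]
    scores = (X - X.mean(axis=0)) @ E_hat     # estimated component scores
    return lam_hat, E_hat, scores
```

Centering `X` before projecting is a common convention; it shifts the scores without changing their variances.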
Generally, we only retain the first $k$ principal components. Here we must balance two conflicting desires:
1. To obtain the simplest possible interpretation, we want $k$ to be as small as possible. If we can explain most of the variation with just two principal components, this gives us a much simpler description of the data. (The smaller $k$ is, however, the less of the variation is explained by the first $k$ components.)
2. To avoid loss of information, we want the proportion of variation explained by the first $k$ principal components to be large, ideally as close to one as possible; i.e., we want
$$\frac{\hat{\lambda}_1 + \hat{\lambda}_2 + \cdots + \hat{\lambda}_k}{\hat{\lambda}_1 + \hat{\lambda}_2 + \cdots + \hat{\lambda}_p} \approx 1.$$
Example 5.2: Places Rated
We will use the Places Rated Almanac data (Boyer and Savageau), which rates 329 communities according to nine criteria:
1. Climate and Terrain
2. Housing
3. Health Care & Environment
4. Crime
5. Transportation
6. Education
7. The Arts
8. Recreation
9. Economics

5.9 Data Analysis
Step 1: We examine the eigenvalues to determine how many principal components should be considered:
Table 5.1: Eigenvalues and the proportion of variation explained by the principal components.

Component   Eigenvalue   Proportion   Cumulative
    1         0.3775       0.7227       0.7227
    2         0.0511       0.0977       0.8204
    3         0.0279       0.0535       0.8739
    4         0.0230       0.0440       0.9178
    5         0.0168       0.0321       0.9500
    6         0.0120       0.0229       0.9728
    7         0.0085       0.0162       0.9890
    8         0.0039       0.0075       0.9966
    9         0.0018       0.0034       1.0000


Result 5.2:
If $Y_1 = e_1'X, Y_2 = e_2'X, \ldots, Y_p = e_p'X$ are the principal components obtained from the covariance matrix $\Sigma$, then
$$\rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \qquad i, k = 1, 2, \ldots, p,$$
are the correlation coefficients between the components $Y_i$ and the variables $X_k$.

Example (Calculating the principal components):

Suppose the random variables have the covariance matrix
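As a sketch, Result 5.2 is a one-liner on top of the earlier NumPy example (reusing its illustrative `Sigma`, `lam`, and `E`):

```python
# Correlations rho_{Y_i, X_k} = e_ik * sqrt(lambda_i) / sqrt(sigma_kk).
sigma_kk = np.diag(Sigma)                            # variances of the original variables
rho = E * np.sqrt(lam) / np.sqrt(sigma_kk)[:, None]
print(np.round(rho, 3))                              # rho[k, i] = corr(X_k, Y_i)
```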
5.10 Scree Plot
An alternative method to determine the number of principal components is to look at a scree plot, which plots the ordered eigenvalues against the component number.

[Figure: Scree plot for the variables without standardization (covariance matrix).]
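A scree plot is straightforward to produce, for instance with matplotlib, using the sorted eigenvalues `lam_hat` returned by the procedure sketch above (the plot details are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def scree_plot(lam_hat):
    """Plot the ordered eigenvalues against the component number."""
    k = np.arange(1, len(lam_hat) + 1)
    plt.plot(k, lam_hat, "o-")
    plt.xticks(k)
    plt.xlabel("Component number")
    plt.ylabel("Eigenvalue")
    plt.title("Scree plot")
    plt.show()
```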
Step 2: Next, we will compute the principal component scores. For example, the first principal component can be computed using the elements of the first eigenvector:
$$\hat{Y}_1 = \hat{e}_{11}X_1 + \hat{e}_{12}X_2 + \cdots + \hat{e}_{19}X_9.$$
5.11 Interpretation of the Principal Components
Step 3: To interpret each component, we must compute the correlations between the original data for each variable and each principal component.
These correlations are obtained using a correlation procedure. In the variable list we include the first three principal components, "prin1, prin2, and prin3", in addition to all nine of the original variables. We will use these correlations between the principal components and the original variables to interpret the principal components.
Because of standardization, all principal components will have mean 0. The standard deviation is also given for each of the components; these are the square roots of the eigenvalues.
More important for our current purposes are the correlations between the principal components and the original variables. These have been copied into the following table. You will also note, if you look at the principal components themselves, that there is zero correlation between the components.
                 Principal Component
Variable            1        2        3
Climate           0.190    0.017    0.207
Housing           0.544    0.020    0.204
Health            0.782   -0.605    0.144
Crime             0.365    0.294    0.585
Transportation    0.585    0.085    0.234
Education         0.394   -0.273    0.027
Arts              0.985    0.126   -0.111
Recreation        0.520    0.402    0.519
Economy           0.142    0.150    0.239
Interpretation of the principal components is based on finding which variables are most strongly correlated with each component, i.e., which of these numbers are large in magnitude, the farthest from zero in either the positive or negative direction. Which numbers we consider to be large or small is, of course, a subjective decision. You need to determine at what level the correlation value will be of importance. Here a correlation value above 0.5 is deemed important; these larger correlations drive the interpretation below.
We will now interpret the principal component results with respect to the value that we have deemed significant.
First Principal Component Analysis - PCA1
The first principal component is strongly correlated with five of the original variables. The
first principal component increases with increasing Arts, Health, Transportation, Housing
and Recreation scores. This suggests that these five criteria vary together. If one increases,
then the remaining ones tend to as well. This component can be viewed as a measure of the
quality of Arts, Health, Transportation, and Recreation, and the lack of quality in Housing
(recall that high values for Housing are bad). Furthermore, we see that the first principal
component correlates most strongly with the Arts. In fact, we could state, based on the correlation of 0.985, that this principal component is primarily a measure of the Arts. It would follow that communities with high values tend to have a lot of arts available, in terms of theaters, orchestras, etc., whereas communities with small values have very few of these types of opportunities.


Second Principal Component Analysis - PCA2

The second principal component is strongly correlated with only one of the variables: it increases as Health decreases. This component can be viewed as a measure of how unhealthy the location is in terms of available health care, including doctors, hospitals, etc.
Third Principal Component Analysis - PCA3
The third principal component increases with increasing Crime and Recreation. This suggests that places with high crime also tend to have better recreation facilities.
To complete the analysis, we often would like to produce a scatter plot of the component scores.
Further analyses may include:
 Scatter plots of principal component scores. In the present context, we may wish to
identify the locations of each point in the plot to see if places with high levels of a
given component tend to be clustered in a particular region of the country, while sites
with low levels of that component are clustered in another region of the country.
 Principal components are often treated as dependent variables for regression and
analysis of variance.

5.12 Alternative: Standardize the Variables
In the previous example we looked at principal components analysis applied to the raw data. In our earlier discussion we noted that if the raw data are used, principal component analysis will tend to give more emphasis to those variables that have higher variances than to those variables that have very low variances. In effect, the results of the analysis will depend on the units of measurement used to measure each variable. That would imply that a principal component analysis should only be used with the raw data if all variables have the same units of measure, and even then only if you wish to give those variables with higher variances more weight in the analysis.
5.13 Summary
 The results of principal component analysis depend on the scales at which the variables are measured.
 Variables with the highest sample variances will tend to be emphasized in the first few principal components.
 Principal component analysis using the covariance matrix should only be considered if all of the variables have the same units of measurement.
If the variables either have different units of measurement (i.e., pounds, feet, gallons, etc.), or if we wish each variable to receive equal weight in the analysis, then the variables should be standardized before a principal components analysis is carried out. Standardize each variable by subtracting its mean and dividing by its standard deviation:
$$Z_{ij} = \frac{X_{ij} - \bar{x}_j}{s_j}$$
where
$X_{ij}$: data for variable $j$ in sample unit $i$
$\bar{x}_j$: sample mean for variable $j$
$s_j$: sample standard deviation for variable $j$
We will now perform the principal component analysis using the standardized data.
Note: the variance-covariance matrix of the standardized data is equal to the correlation matrix for the unstandardized data. Therefore, principal component analysis using the standardized data is equivalent to principal component analysis using the correlation matrix.
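A sketch of the standardized analysis in Python (the names are illustrative); the key point it demonstrates is that PCA on standardized data is the same as PCA on the correlation matrix:

```python
import numpy as np

def standardized_principal_components(X):
    """PCA on standardized data; equivalent to PCA on the correlation matrix R."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # Z_ij = (X_ij - xbar_j) / s_j
    R = np.corrcoef(X, rowvar=False)                   # equals np.cov(Z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    order = np.argsort(eigenvalues)[::-1]
    lam_hat = eigenvalues[order]
    E_hat = eigenvectors[:, order]
    scores = Z @ E_hat                                 # scores from the standardized data
    return lam_hat, E_hat, scores
```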
5.13.1 Principal Component Analysis Procedure
The principal components are first calculated by obtaining the eigenvalues of the correlation matrix. Here we denote the eigenvalues of the sample correlation matrix $R$ by
$$\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p,$$
and the corresponding eigenvectors by
$$\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_p.$$
Then the estimated principal component scores are calculated using formulas similar to before, but instead of using the raw data we use the standardized data in the formula below:
$$\hat{Y}_i = \hat{e}_i'Z = \hat{e}_{i1}Z_1 + \hat{e}_{i2}Z_2 + \cdots + \hat{e}_{ip}Z_p.$$
The rest of the procedure and the interpretations are as discussed before.
Example 5.3: (Principal components obtained from covariance and correlation matrices are different)
Consider the covariance matrix

5.13.2 Places Rated after Standardization

We need to focus on the eigenvalues of the correlation matrix that correspond to each of the principal components. In this case, the total variation of the standardized variables is equal to $p$, the number of variables. After standardization each variable has variance equal to one, and the total variation is the sum of these variances; in this case the total variation will be 9.
The eigenvalues of the correlation matrix are given in the second column in the table below. Note also the proportion of variation explained by each of the principal components, as well as the cumulative proportion of the variation explained.
Step 1
Examine the eigenvalues to determine how many principal components should be considered:

Component   Eigenvalue   Proportion   Cumulative
    1         3.2978       0.3664       0.3664
    2         1.2136       0.1348       0.5013
    3         1.1055       0.1228       0.6241
    4         0.9073       0.1008       0.7249
    5         0.8606       0.0956       0.8205
    6         0.5622       0.0625       0.8830
    7         0.4838       0.0538       0.9368
    8         0.3181       0.0353       0.9721
    9         0.2511       0.0279       1.0000


Another approach would be to plot the differences between the ordered eigenvalues and look for a break or a sharp drop. The only sharp drop that is noticeable in this case is after the first component. One might, based on this, select only one component. However, one component is probably too few, particularly because we have only explained 37% of the variation. Consider the scree plot based on the standardized variables.

[Figure: Scree plot for the standardized variables (correlation matrix).]
Step 2
Next, we can compute the principal component scores using the eigenvectors. Using the coefficients in the first column of the table below, the formula for the first principal component is
$$\hat{Y}_1 = 0.158\,Z_{\text{Climate}} + 0.384\,Z_{\text{Housing}} + 0.410\,Z_{\text{Health}} + 0.259\,Z_{\text{Crime}} + 0.375\,Z_{\text{Transportation}} + 0.274\,Z_{\text{Education}} + 0.474\,Z_{\text{Arts}} + 0.353\,Z_{\text{Recreation}} + 0.164\,Z_{\text{Economy}}.$$
And remember, this is now a function not of the raw data but of the standardized data.
The magnitudes of the coefficients give the contributions of each variable to that component. Since the data have been standardized, they do not depend on the variances of the corresponding variables.
Step 3
Next, we can look at the coefficients for the principal components. In this case, since the data are standardized, the relative magnitudes of the coefficients within a column can be directly compared. Each column here corresponds to a column in the output of the program labeled Eigenvectors.
                           Principal Component
Variable            1        2        3        4        5
Climate           0.158    0.069    0.800    0.377    0.041
Housing           0.384    0.139    0.080    0.197   -0.580
Health            0.410   -0.372   -0.019    0.113    0.030
Crime             0.259    0.474    0.128   -0.042    0.692
Transportation    0.375   -0.141   -0.141   -0.430    0.191
Education         0.274   -0.452   -0.241    0.457    0.224
Arts              0.474   -0.104    0.011   -0.147    0.012
Recreation        0.353    0.292    0.042   -0.404   -0.306
Economy           0.164    0.540   -0.507    0.476   -0.037
Interpretation of the principal components is based on finding which variables are most strongly correlated with each component. In other words, we need to decide which numbers are large within each column. In the first column we will decide that Health and Arts are large. This is, of course, arbitrary; other variables might also have been included as part of this first principal component.

5.13.4 Component Summaries

First Principal Component Analysis - PCA1
The first principal component is a measure of the quality of Health and the Arts, and to some extent Housing, Transportation and Recreation. Health increases with increasing values in the Arts; if any of these variables goes up, so do the remaining ones. They are all positively related, as they all have positive signs.
Second Principal Component Analysis - PCA2
The second principal component is a measure of the severity of crime, the quality of the economy, and the lack of quality in education. Crime and Economy increase with decreasing Education. Here we can see that cities with high levels of crime and good economies also tend to have poor educational systems.
Third Principal Component Analysis - PCA3
The third principal component is a measure of the quality of the climate and the poorness of the economy. Climate increases with decreasing Economy. The inclusion of Economy within this component adds a bit of redundancy to our results; this component is primarily a measure of climate, and to a lesser extent the economy.
Fourth Principal Component Analysis - PCA4
The fourth principal component is a measure of the quality of education and the economy
and the poorness of the transportation network and recreational opportunities. Education
and Economy increase with decreasing Transportation and Recreation.
Fifth Principal Component Analysis - PCA5
The fifth principal component is a measure of the severity of crime and the quality of housing. Crime increases with decreasing Housing.
Example 5.4 (Summarizing sample variability with two sample principal components)
A census provided information, by tract, on five socioeconomic variables for the Madison,
Wisconsin, area. The data from 61 tracts produced the following summary statistics:


5.14 Large Sample Inferences

We have seen that the eigenvalues and eigenvectors of the covariance (correlation) matrix are the essence of a principal component analysis. The eigenvectors determine the directions of maximum variability, and the eigenvalues specify the variances. When the first few eigenvalues are much larger than the rest, most of the total variance can be "explained" in fewer than $p$ dimensions.
In practice, decisions regarding the quality of the principal component approximation must be made on the basis of the eigenvalue-eigenvector pairs $(\hat{\lambda}_i, \hat{e}_i)$ extracted from $S$. Because of sampling variation, these eigenvalues and eigenvectors will differ from their underlying population counterparts.
The sampling distributions of $\hat{\lambda}_i$ and $\hat{e}_i$ are difficult to derive and beyond the scope of this course. We shall simply summarize the pertinent large sample results.
5.14.1 Large Sample Properties of $\hat{\lambda}_i$ and $\hat{e}_i$

Currently available results concerning large sample confidence intervals for $\lambda_i$ and $e_i$ assume that the observations are a random sample from a normal population. It must also be assumed that the (unknown) eigenvalues of $\Sigma$ are distinct and positive, so that $\lambda_1 > \lambda_2 > \cdots > \lambda_p > 0$. The one exception is the case where the number of equal eigenvalues is known. Usually the conclusions for distinct eigenvalues are applied, unless there is a strong reason to believe that $\Sigma$ has a special structure that yields equal eigenvalues. Even when the normal assumption is violated, the confidence intervals obtained in this manner still provide some indication of the uncertainty in $\lambda_i$ and $e_i$.
Anderson and Girshick have established the following large sample distribution theory for the eigenvalues $\hat{\lambda}' = [\hat{\lambda}_1, \ldots, \hat{\lambda}_p]$ and eigenvectors $\hat{e}_1, \ldots, \hat{e}_p$ of $S$:
1. Let $\Lambda$ be the diagonal matrix of eigenvalues $\lambda_1, \ldots, \lambda_p$ of $\Sigma$; then $\sqrt{n}(\hat{\lambda} - \lambda)$ is approximately $N_p(0, 2\Lambda^2)$.
2. Let
$$E_i = \lambda_i \sum_{\substack{k=1 \\ k \neq i}}^{p} \frac{\lambda_k}{(\lambda_k - \lambda_i)^2}\, e_k e_k',$$
then $\sqrt{n}(\hat{e}_i - e_i)$ is approximately $N_p(0, E_i)$.
3. Each $\hat{\lambda}_i$ is distributed independently of the elements of the associated $\hat{e}_i$.
Result 1 implies that, for large $n$, the $\hat{\lambda}_i$ are independently distributed. Moreover, $\hat{\lambda}_i$ has an approximate $N(\lambda_i, 2\lambda_i^2/n)$ distribution. Using this normal distribution, we obtain $P[\,|\hat{\lambda}_i - \lambda_i| \leq z(\alpha/2)\,\lambda_i\sqrt{2/n}\,] = 1 - \alpha$. A large sample $100(1-\alpha)\%$ confidence interval for $\lambda_i$ is thus provided by
$$\frac{\hat{\lambda}_i}{1 + z(\alpha/2)\sqrt{2/n}} \;\leq\; \lambda_i \;\leq\; \frac{\hat{\lambda}_i}{1 - z(\alpha/2)\sqrt{2/n}},$$
where $z(\alpha/2)$ is the upper $100(\alpha/2)$th percentile of a standard normal distribution.
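This interval is easy to compute; a sketch in Python follows (the function name is illustrative, and the interval is meaningful only for large n, where the denominators stay positive):

```python
import numpy as np
from scipy.stats import norm

def eigenvalue_ci(lam_hat_i, n, alpha=0.05):
    """Large-sample confidence interval for a population eigenvalue lambda_i."""
    z = norm.ppf(1 - alpha / 2)       # upper alpha/2 standard normal percentile
    half = z * np.sqrt(2.0 / n)
    return lam_hat_i / (1 + half), lam_hat_i / (1 - half)
```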
Result 2 implies that the $\hat{e}_i$ are normally distributed about the corresponding $e_i$ for large samples. The elements of each $\hat{e}_i$ are correlated, and the correlation depends to a large extent on the separation of the eigenvalues (which is unknown) and the sample size $n$. Approximate standard errors for the coefficients $\hat{e}_{ik}$ are given by the square roots of the diagonal elements of $(1/n)\hat{E}_i$, where $\hat{E}_i$ is derived from $E_i$ by substituting $\hat{\lambda}$'s for the $\lambda$'s and $\hat{e}$'s for the $e$'s.
Example 5.5 (Constructing a confidence interval for $\lambda_1$)

Obtain a 95% confidence interval for $\lambda_1$, the variance of the first population principal component, using the data.
 
5.14.2 Testing for the Equal Correlation Structure
The special correlation structure $\operatorname{Cov}(X_i, X_k) = \sqrt{\sigma_{ii}\sigma_{kk}}\,\rho$, for all $i \neq k$, is one important structure in which the eigenvalues of $\Sigma$ are not distinct and the previous results do not apply.
To test for this structure, let
$$H_0: \boldsymbol{\rho} = \boldsymbol{\rho}_0 = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}$$
and
$$H_1: \boldsymbol{\rho} \neq \boldsymbol{\rho}_0.$$
A test of $H_0$ versus $H_1$ may be based on a likelihood ratio statistic, but Lawley has demonstrated that an equivalent test procedure can be constructed from the off-diagonal elements of $R$.
Lawley's procedure requires the quantities
$$\bar{r}_k = \frac{1}{p-1}\sum_{\substack{i=1 \\ i \neq k}}^{p} r_{ik}, \qquad \bar{r} = \frac{2}{p(p-1)}\sum_{i<k}\sum r_{ik}, \qquad \hat{\gamma} = \frac{(p-1)^2\left[1 - (1-\bar{r})^2\right]}{p - (p-2)(1-\bar{r})^2}.$$
It is evident that $\bar{r}_k$ is the average of the off-diagonal elements in the kth column (or row) of $R$, and $\bar{r}$ is the overall average of the off-diagonal elements.
The large sample approximate $\alpha$-level test is to reject $H_0$ in favor of $H_1$ if
$$T = \frac{n-1}{(1-\bar{r})^2}\left[\sum_{i<k}\sum (r_{ik} - \bar{r})^2 - \hat{\gamma}\sum_{k=1}^{p}(\bar{r}_k - \bar{r})^2\right] > \chi^2_{(p+1)(p-2)/2}(\alpha),$$
where $\chi^2_{(p+1)(p-2)/2}(\alpha)$ is the upper $(100\alpha)$th percentile of a chi-square distribution with $(p+1)(p-2)/2$ degrees of freedom.
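The quantities above translate directly into code; here is a sketch, assuming a p-by-p correlation matrix `R` as a NumPy array (the function name is my own):

```python
import numpy as np
from scipy.stats import chi2

def lawley_equicorrelation_test(R, n, alpha=0.05):
    """Lawley's test of H0: equal correlation structure."""
    p = R.shape[0]
    off = ~np.eye(p, dtype=bool)
    r_bar_k = R[off].reshape(p, p - 1).mean(axis=1)   # row (= column) averages r_bar_k
    iu = np.triu_indices(p, k=1)
    r = R[iu]                                          # off-diagonal correlations r_ik, i < k
    r_bar = r.mean()                                   # overall average r_bar
    gamma_hat = (p - 1)**2 * (1 - (1 - r_bar)**2) / (p - (p - 2) * (1 - r_bar)**2)
    T = ((n - 1) / (1 - r_bar)**2) * (
        np.sum((r - r_bar)**2) - gamma_hat * np.sum((r_bar_k - r_bar)**2))
    df = (p + 1) * (p - 2) / 2
    crit = chi2.ppf(1 - alpha, df)
    return T, crit, T > crit                           # reject H0 if T > crit
```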
Example 5.6 (Testing for equi-correlation structure)

The sample correlation matrix constructed from the n = 150 post-birth weights of female mice is

Test for the equal correlation structure.
5.15 Model Adequacy Tests

5.15.1 Bartlett's sphericity test
The starting point is the calculation of the correlations between the variables. Bartlett's test compares the observed correlation matrix to the identity matrix. In other words, it checks whether there is a certain redundancy between the variables that we can summarize with a few factors. If the variables are perfectly correlated, only one factor is sufficient. If they are orthogonal, we need as many factors as variables; in this last case, the correlation matrix is the same as the identity matrix. A simple strategy is to visualize the correlation matrix: if the values outside the main diagonal are often high (in absolute value), some variables are correlated; if most of these values are near zero, the PCA is not really useful.
Bartlett's test checks whether the observed correlation matrix diverges significantly from the identity matrix (the theoretical matrix under $H_0$: the variables are orthogonal). The PCA can perform a compression of the available information only if we reject the null hypothesis.
In order to measure the overall relation between the variables, we compute the determinant of the correlation matrix, $|R|$. Under $H_0$, $|R| = 1$; if the variables are highly correlated, we have $|R| \approx 0$.

Bartlett's test statistic indicates to what extent we deviate from the reference situation $|R| = 1$. It uses the following formula:
$$\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln|R|.$$
Under $H_0$, it follows a $\chi^2$ distribution with $p(p-1)/2$ degrees of freedom.
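In code, the statistic and its p-value are a couple of lines (a sketch; the names are my own):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's sphericity test from a p x p correlation matrix R and sample size n."""
    p = R.shape[0]
    stat = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    p_value = chi2.sf(stat, df)    # reject H0 (sphericity) for small p-values
    return stat, p_value
```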
Example 5.7
Apply Bartlett's sphericity test for the following correlation matrix:

5.15.2 KMO Measure of Sampling Adequacy (MSA)
The KMO index has the same goal: it checks whether we can factorize the original variables efficiently. But it is based on another idea.
The correlation matrix is always the starting point. We know that the variables are more or less correlated, but the correlation between two variables can be influenced by the others. So we use the partial correlation in order to measure the relation between two variables while removing the effect of the remaining variables.
The KMO index compares the values of the correlations between variables with those of the partial correlations. If the KMO index is high (near 1), the PCA can act efficiently; if the KMO index is low (near 0), the PCA is not relevant. Some references give a table for the interpretation of the value of the KMO index obtained on a dataset.
Partial correlation matrix
The partial correlation matrix can be obtained from the correlation matrix: we calculate its inverse $V = R^{-1}$ and compute the partial correlations as follows:
$$\tilde{r}_{ij} = -\frac{v_{ij}}{\sqrt{v_{ii}\,v_{jj}}}.$$
Overall KMO index
The overall KMO index is computed as follows:
$$\mathrm{KMO} = \frac{\displaystyle\sum_{i \neq j} r_{ij}^2}{\displaystyle\sum_{i \neq j} r_{ij}^2 + \sum_{i \neq j} \tilde{r}_{ij}^2}.$$
If the partial correlations are near zero, the PCA can perform the factorization efficiently because the variables are highly related: $\mathrm{KMO} \approx 1$.
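A sketch of both computations in Python (the helper name `kmo_index` is my own):

```python
import numpy as np

def kmo_index(R):
    """Overall KMO index from a p x p correlation matrix R."""
    V = np.linalg.inv(R)
    d = np.sqrt(np.diag(V))
    partial = -V / np.outer(d, d)             # partial correlations -v_ij / sqrt(v_ii v_jj)
    off = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off] ** 2)                  # sum of squared correlations, i != j
    q2 = np.sum(partial[off] ** 2)            # sum of squared partial correlations, i != j
    return r2 / (r2 + q2)                     # near 1: PCA appropriate; near 0: not relevant
```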
