Sessions 21-24 Factor Analysis - Ppt-Rev
Session 1-4: Introduction to Research - Meaning & Definition, Significance, Overview of Methodology. Categories of Research, Research Types, Process - Identification of Problem
Session 29-30: CEC 5 Presentations - Projects
Session 31-32: Report Writing
Session 33: Other Multivariate Techniques - Discriminant, Cluster
1/20/2020
Stages of the Research Process
Management Dilemma
Defining Research Problem
Formulating Research Hypothesis
Developing Research Proposal
Data Collection
Data analysis & interpretation
Research Report
Session Plan
Factor Analysis
Application Areas
Recommended Use
Steps in Factor Analysis
Analysis & Interpretation of Output
More on Communalities/Surrogate Variables
SPSS Lab
Classification of Multivariate Statistical Techniques
Classification of Multivariate Techniques (Contd.)
Dependence Techniques
• One or more variables can be identified as dependent variables and the remaining as independent
variables
• Choice of dependence technique depends on the number of dependent variables involved in
analysis
Interdependence Techniques
• Whole set of interdependent relationships is examined
• Further classified by whether the focus is on variables or on objects
Factor Analysis
Suppose we ask a respondent who is likely to buy a 4-wheeler to rate the importance that he
would give to various aspects of a 4-wheeler like:
Mileage
Price
Smooth ride
Cost of spare parts
Servicing locations
Cooling effect
Leg space and so on
Factor Analysis
Suppose he gives a higher rating to the following attributes:
Leg space
Smooth ride
Cooling effect
Interiors
What does he want?
He wants comfort
Factor Analysis
Similarly, suppose the respondent gives high scores on the following attributes:
Price of car
Mileage
Price of spare parts
Interest on loan
What does he want?
He wants economy
Factor Analysis Model
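The model is usually written as follows. This is a sketch in common textbook notation; the symbols are assumptions and are not taken from the slide. Each standardized variable is expressed as a linear combination of a small number of common factors plus one unique factor:

```latex
X_i = A_{i1}F_1 + A_{i2}F_2 + \dots + A_{im}F_m + V_i U_i
```

Here X_i is the i-th standardized variable, F_1 ... F_m are the common factors, A_ij is the loading of variable i on factor j, U_i is the unique factor for variable i, and V_i is its coefficient.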
Variables and Factors
Thus ‘comfort’ and ‘economy’ are not single measurable entities but complex constructs that
are derived from many variables
Such complex constructs are called Factors. Identifying such factors greatly simplifies the
understanding of complex phenomena
Factor analysis is a tool used to identify factors from many inter-related variables.
Therefore, it is also called a data reduction technique
It is a multivariate technique in which there is no distinction between dependent and
independent variables (recall the classification chart)
All the variables under investigation are analysed together
Factors that explain most of the variation in the original set of data are extracted
It helps in identifying the underlying structure of the data
Factor Analysis Contd..
Factor Analysis is a set of techniques used for understanding variables by grouping them into
“factors” consisting of similar variables
It can also be used to confirm whether a hypothesized set of variables groups into a factor or not
It is most useful when a large number of variables needs to be reduced to a smaller set of “factors”
that contain most of the variance of the original variables
Factors extracted are linear combinations of variables
Generally, Factor Analysis is done in two stages, called
• Extraction of Factors and
• Rotation of the Solution obtained in stage 1
Factor Analysis is best performed with interval or ratio-scaled variables
When extraction and rotation are orthogonal, the extracted factors are uncorrelated with each other.
Thus factor scores can replace a set of correlated predictors in a regression model, which addresses the problem of multicollinearity.
Steps in Factor Analysis
Step 1: Prepare statements:
Respondents are given statements related to the subject & response is obtained
Pre Conditions for Factor Analysis
The objectives of factor analysis should be identified.
The variables to be included in the factor analysis should be specified based on
past research, theory, and judgment of the researcher. It is important that the
variables be appropriately measured on an interval or ratio scale.
An appropriate sample size should be used. As a rough guideline, there should be
at least four or five times as many observations (sample size) as there are
variables.
Establishing The Strength Of Factor Analysis Solution
Step 2: To check goodness of fit
Sample Size taken for factor analysis should normally be more than 5 times the
number of variables for analysis
Following are three outputs that indicate appropriateness of model:
Correlation matrix
Bartlett’s test of sphericity
Kaiser-Meyer-Olkin (KMO) statistic
Interpretation
Correlation matrix:
Off-diagonal values of the correlation matrix that are close to zero show that the variables in that row &
column are not related.
If many of the off-diagonal values are close to zero, factor analysis will not be appropriate for the data.
Interpretation of Bartlett’s test of sphericity
Bartlett’s test of sphericity is used to test the null hypothesis that the population correlation matrix
is an identity matrix.
H0: The variables are uncorrelated in the population; in other words, the population
correlation matrix is an identity matrix.
H1: The variables are correlated; the matrix is not an identity matrix
Rejection of the null hypothesis indicates the appropriateness of factor analysis
If the p value is < .05, reject the null hypothesis and conclude that the
correlation matrix is not an identity matrix. Factor analysis is then appropriate for the data.
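The test statistic can be computed by hand from a correlation matrix. The following is a minimal sketch, assuming the usual chi-square approximation; the function name `bartlett_statistic`, the 3 x 3 correlation matrix and the sample size of 50 are hypothetical and are not from the deck.

```python
import numpy as np

def bartlett_statistic(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test."""
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]
    # slogdet returns the sign and log of the determinant without underflow
    sign, log_det = np.linalg.slogdet(corr)
    if sign <= 0:
        raise ValueError("correlation matrix must be positive definite")
    statistic = -(n_obs - 1 - (2 * p + 5) / 6.0) * log_det
    dof = p * (p - 1) // 2
    return statistic, dof

# Hypothetical correlation matrix for three variables and 50 respondents
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
stat, dof = bartlett_statistic(R, n_obs=50)
print(f"chi-square = {stat:.2f}, df = {dof}")
```

The p value is the upper-tail probability of this statistic under a chi-square distribution with `dof` degrees of freedom; SPSS reports it directly in its output.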
Interpretation of KMO value
The Kaiser-Meyer-Olkin (KMO) value also tells us whether factor analysis is appropriate for the
data.
If KMO >= 0.50, factor analysis is appropriate for the data
KMO is a measure of sampling adequacy: it compares the magnitudes of the observed correlations with the magnitudes of the partial correlations
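The overall KMO value can be sketched as the ratio of squared correlations to squared correlations plus squared partial correlations. The function name `kmo` and the correlation matrix below are hypothetical, used only for illustration.

```python
import numpy as np

def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.asarray(corr, dtype=float)
    inv = np.linalg.inv(corr)
    # Partial correlations come from the scaled, negated inverse matrix
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    r_squared = np.sum(corr[off_diagonal] ** 2)
    q_squared = np.sum(partial[off_diagonal] ** 2)
    return r_squared / (r_squared + q_squared)

# Hypothetical correlation matrix for three variables
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
kmo_value = kmo(R)
print(f"KMO = {kmo_value:.3f}")
```

When the partial correlations are small relative to the observed correlations, the value moves towards 1, which is why a higher KMO supports the use of factor analysis.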
Step 3 – Extraction of Factors
Determines Number of factors to be extracted
Factors are linear combinations of original variables
Maximum number of factors equals no. of variables
Purpose is to reduce variables to fewer no. of factors
Popular method is Principal Component Analysis.
Based on the Concept of Eigen Values (the sum of the squares of factor loadings).
Higher the eigen value of the factor, higher is the amount of variance explained by
the factor
Extract the smallest number of factors that explains the maximum variance
Eigen Values
Information captured by a factor is called its eigenvalue.
It is computed as sum of squares of factor loadings on the
factor.
An Illustration….
Variable     Factor 1    Factor 2
X1            0.8045     -0.2578
X2            0.7245      0.2354
X3           -0.2585      0.9541
Eigenvalue (sum of squares)
Factor 1: (0.8045)² + (0.7245)² + (-0.2585)² = 1.2389
Factor 2: (-0.2578)² + (0.2354)² + (0.9541)² = 1.0322
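The arithmetic of the illustration can be reproduced in a few lines. This is a sketch that uses the loadings from the illustration; the variable names are chosen here for readability.

```python
# Loadings from the illustration: rows X1..X3, columns Factor 1 and Factor 2
loadings = [
    (0.8045, -0.2578),
    (0.7245, 0.2354),
    (-0.2585, 0.9541),
]

# Eigenvalue of a factor = sum of squared loadings down its column
eigenvalues = [sum(row[j] ** 2 for row in loadings) for j in range(2)]

# With standardized variables, total variance equals the number of variables
variance_share = [ev / len(loadings) for ev in eigenvalues]

for j, (ev, share) in enumerate(zip(eigenvalues, variance_share), start=1):
    print(f"Factor {j}: eigenvalue = {ev:.4f}, share of variance = {share:.1%}")
```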
Extraction of Factors
Each standardized variable has a variance of 1, so in the initial output, where every variable
could form its own factor, the eigenvalues add up to the number of variables.
By default, number of factors = number of variables
After the first stage of extraction, only factors with eigenvalue >= 1 are retained
Factors with eigenvalue < 1 are no better than a single variable
The number of factors retained should also be such that the cumulative % of variance extracted
reaches a satisfactory level (at least 60%).
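The two retention rules above can be sketched as a small helper. The function name `retain_by_eigenvalue` and the six eigenvalues are hypothetical; they are chosen so that they sum to the number of variables, as eigenvalues of a correlation matrix do.

```python
def retain_by_eigenvalue(eigenvalues, min_eigenvalue=1.0):
    """Return (number of factors retained, cumulative % of variance they explain)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    kept = [ev for ev in ordered if ev >= min_eigenvalue]
    cumulative_pct = 100.0 * sum(kept) / total
    return len(kept), cumulative_pct

# Hypothetical eigenvalues for six variables (they sum to 6)
hypothetical = [2.5, 1.6, 0.9, 0.5, 0.3, 0.2]
n_factors, cum_pct = retain_by_eigenvalue(hypothetical)
print(f"factors retained = {n_factors}, cumulative variance = {cum_pct:.1f}%")
print("meets the 60% guideline:", cum_pct >= 60.0)
```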
Scree Plot- For Factor Extraction
A scree plot is a plot of the Eigen values against the number of factors in order of
extraction.
The Shape of the plot is used to determine the number of factors
The plot has a distinct break between steep slope of factors with large eigen
values & a gradual trailing off associated with rest of the factors
The gradual trailing off is referred to as Scree
The point at which scree begins denotes the No. of factors
Generally, the number of factors suggested by the scree plot is 1 or 2 more than the number
determined by the eigenvalue criterion
[Figure: scree plot of Eigenvalue (vertical axis, 0.0 to 3.0) against Component Number (horizontal axis, 1 to 6), in order of extraction]
Stage II -Step 3- Varimax Rotation
The rotated factor matrix is the output of stage II, produced when we request the
software to perform rotation
The popular method of rotation is orthogonal (varimax) rotation
The rotation keeps the factors orthogonal (independent) in relation to each other.
Rotation repositions the factors so that each variable is associated only with the factor
to which it is distinctly related.
Step 3: Rotate axes: understanding rotation of axes
[Figure: points plotted on axes running from -0.6 to 0.6 (horizontal) and 0 to 0.6 (vertical)]
Varimax Rotation
Rotation Explained
By rotating the axes, the points come closer to the new X & Y axes. The co-ordinates of each point w.r.t. the rotated axes
will be such that either the X co-ordinate or the Y co-ordinate is high, but not both.
By rotating the axes, we may get factor loadings that are high on only one factor and low on the other factors. Such
factor loadings are easy to interpret.
‘Varimax’ rotation of axes is the preferred method of rotation
Rotation does not affect communalities or the % of total variance explained. However, the % of variance
accounted for by each factor does change. Rotation redefines the factors in order to make sharper
distinctions in their meaning
Varimax rotation maximises the variance of the squared loadings within each factor
As a result, the smallest loadings tend towards 0 & the largest loadings tend towards 1 in absolute value.
The rotation is called orthogonal rotation if the axes are maintained at right angles.
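A minimal sketch of varimax rotation follows, using the widely circulated SVD-based iteration. It is not SPSS's exact routine (SPSS also applies Kaiser normalization by default), and the function name `varimax` is chosen here. The input is the X1 to X3 loading matrix from the eigenvalue illustration.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a loading matrix; return (rotated loadings, rotation)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient of the varimax criterion with respect to the rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix, so axes stay at right angles
        if previous > 0 and s.sum() < previous * (1 + tol):
            break
        previous = s.sum()
    return L @ rotation, rotation

unrotated = np.array([[0.8045, -0.2578],
                      [0.7245, 0.2354],
                      [-0.2585, 0.9541]])
rotated, rotation = varimax(unrotated)

print(np.round(rotated, 4))
print("communalities before:", np.round((unrotated ** 2).sum(axis=1), 4))
print("communalities after: ", np.round((rotated ** 2).sum(axis=1), 4))
```

The two communality lines should agree, which illustrates the point that rotation redistributes variance between factors without changing how much of each variable is explained.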
The Factor/component Matrix
The factor matrix (whether unrotated or rotated) gives us the loadings of each
variable on each of the extracted factors
This is similar to a correlation matrix, with loadings having values between -1 and +1
Absolute values close to 1 represent high loadings & values close to 0 represent low loadings
The objective is to find variables which have a high loading on one factor & low
loadings on the other factors.
If Factor 1 is loaded highly by variables, say, 3, 6 & 10, then it is assumed that
Factor 1 is a linear combination of variables 3, 6 & 10
It is given a suitable name representing the essence of the original variables (3, 6 & 10)
Step 4-Interpret and Give a suitable name
After extraction the next task is to interpret & name the factors
This is done by identifying which factors are associated with which original
variables
The factor/component matrix is used for this purpose
The original factor matrix is unrotated & comes as the output of stage I, but normally
the rotated one is used
Although the initial or unrotated factor matrix indicates the relationship between the
factors and individual variables, it seldom results in factors that can be interpreted,
because the factors are correlated with many variables.
Step 4 Surrogate Variables & Factor Names
By examining the factor matrix, one could select for each factor the variable with the
highest loading on that factor. That variable could then be used as a surrogate variable for
the associated factor.
However, the choice is not as easy if two or more variables have similarly high loadings.
In such a case, the choice between these variables should be based on theoretical and
measurement considerations.
Combining the variables that have high loadings on a factor, irrespective of sign, gives us
that factor.
Based on the variables that get combined, we name the factor. Naming the factor is
a matter of the researcher’s judgment
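Selecting a surrogate variable by highest absolute loading can be sketched as below. The loadings are the rotated factor matrix from the two-wheeler illustration in this deck; the function name `surrogate_variables` is hypothetical.

```python
# Rotated factor matrix from the two-wheeler illustration:
# variable -> loadings on (Factor 1, Factor 2, Factor 3)
rotated_matrix = {
    "VAR00001": (0.13402, 0.34749, 0.76402),
    "VAR00002": (-0.18143, -0.64300, -0.07596),
    "VAR00003": (-0.10944, 0.62985, 0.56742),
    "VAR00004": (0.96986, -0.06383, -0.01338),
    "VAR00005": (0.96455, 0.13362, 0.04660),
    "VAR00006": (0.94544, -0.13868, 0.02600),
    "VAR00007": (0.97214, 0.02862, 0.09411),
    "VAR00008": (-0.26169, 0.85203, 0.06517),
    "VAR00009": (0.00891, 0.87772, -0.08347),
    "VAR00010": (0.07209, -0.10990, 0.87874),
}

def surrogate_variables(matrix):
    """For each factor, pick the variable with the highest absolute loading."""
    n_factors = len(next(iter(matrix.values())))
    return [max(matrix, key=lambda var: abs(matrix[var][j])) for j in range(n_factors)]

surrogates = surrogate_variables(rotated_matrix)
for j, var in enumerate(surrogates, start=1):
    print(f"Factor {j}: surrogate candidate {var}")
```

For Factor 1 the four candidate loadings are all above 0.94, so this mechanical pick is exactly the close call where theoretical and measurement considerations should decide.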
Step 5 -Factor Analysis – Linear combination of variables
Weights Assigned
It is possible to select weights or factor score coefficients so that the first factor
explains the largest portion of the total variance.
Then a second set of weights can be selected, so that the second factor accounts
for most of the residual variance, subject to being uncorrelated with the first factor.
This same principle could be applied to selecting additional weights for the
additional factors.
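The weights described above are the coefficients in the factor score equation. A sketch in common textbook notation (the symbols are assumptions, not taken from the slide), for k standardized variables:

```latex
F_i = W_{i1}X_1 + W_{i2}X_2 + \dots + W_{ik}X_k
```

Here F_i is the estimate of the i-th factor, X_1 ... X_k are the standardized variables, and W_ij is the weight (factor score coefficient) of variable j on factor i.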
Interpretation: Communality (h²)
Communality: once the number of factors is decided, one can find how much of the information in each
variable is captured by the selected factors. This is called the communality of the variable
It is the proportion of a variable's variance explained by the common factors. It
shows how much of each variable is accounted for by the underlying factors taken together.
It equals the sum of squares of the factor loadings for that variable. It ranges from 0 to 1.
With principal components extraction, the initial communality of each variable is 1, so the sum of the
initial communality values equals the total number of variables considered for analysis
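Communality is the row-wise counterpart of the eigenvalue calculation. The sketch below uses the X1 to X3 loadings from the eigenvalue illustration in this deck; the variable names in the code are chosen here.

```python
# Loadings of X1..X3 on (Factor 1, Factor 2), from the eigenvalue illustration
loadings = {
    "X1": (0.8045, -0.2578),
    "X2": (0.7245, 0.2354),
    "X3": (-0.2585, 0.9541),
}

# Communality of a variable = sum of squared loadings across its row
communality = {var: sum(l ** 2 for l in row) for var, row in loadings.items()}

for var, h2 in communality.items():
    print(f"{var}: communality = {h2:.4f}")

# The communalities add up to the same total as the eigenvalues
print(f"sum of communalities = {sum(communality.values()):.4f}")
```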
Communality – An Illustration
Eigen Values & Communalities
Variable Factor 1 Factor 2 Communality
From the above, X1 & X2 can be combined into Factor 1 and X3 into Factor 2. For such
allocation, we would prefer each variable to load highly on only one factor. This
can be achieved by rotation of the axes
Factor Analysis- A Complete Illustration
In business research, a common application area of Factor Analysis is to
understand underlying motives of consumers who buy a product category or
a brand
For example, we assume that a two wheeler manufacturer is interested in
determining which variables his potential customers think about when they
consider his product
Let us assume that twenty two-wheeler owners were surveyed by this
manufacturer (or by a marketing research company on his behalf). They were
asked to indicate on a seven point scale (1=Completely Agree, 7=Completely
Disagree), their agreement or disagreement with a set of ten statements relating to
their perceptions and some attributes of the two-wheelers.
The objective of doing Factor Analysis is to find underlying "factors" which
would be fewer than 10 in number, but would be linear combinations of some of
the original 10 variables
An Illustration….
The research design for data collection can be stated as follows-
Twenty 2-wheeler users were surveyed about their perceptions and image attributes of the vehicles they
owned. Ten questions were asked to each of them, all answered on a scale of 1 to 7 (1= completely
agree, 7= completely disagree).
1. I use a 2-wheeler because it is affordable.
3. Low maintenance cost makes a 2-wheeler very economical in the long run.
6. Some of my friends who don’t have their own vehicle are jealous of me.
7. I feel good whenever I see the ad for 2-wheeler on T.V., in a magazine or on a hoarding.
The input data containing responses of twenty
respondents to the 10 statements are in Appendix 1, in
the form of a 20 Row by 10 column matrix
(reproduced below).
QUESTION NO.
S. No. 1 2 3 4 5 6 7 8 9 10
1 1 4 1 6 5 6 5 2 3 2
2 2 3 2 4 3 3 3 5 5 2
3 2 2 2 1 2 1 1 7 6 2
4 5 1 4 2 2 2 2 3 2 3
5 1 2 2 5 4 4 4 1 1 2
6 3 2 3 3 3 3 3 6 5 3
7 2 2 5 1 2 1 2 4 4 5
8 4 4 3 4 4 5 3 2 3 3
9 2 3 2 6 5 6 5 1 4 1
10 1 4 2 2 1 2 1 4 4 1
11 1 5 1 3 2 3 2 2 2 1
12 1 6 1 1 1 1 1 1 2 2
13 3 1 4 4 4 3 3 6 5 3
14 2 2 2 2 2 2 2 1 3 2
15 2 5 1 3 2 3 2 2 1 6
16 5 6 3 2 1 3 2 5 5 4
17 1 4 2 2 1 2 1 1 1 3
18 2 3 1 1 2 2 2 3 2 2
19 3 3 2 3 4 3 4 3 3 3
20 4 3 2 7 6 6 6 2 3 6
Steps 1 & 2
SPSS Output & Interpretation
Interpret ?
Step 3
As a first stage, we request the software package used (SPSS,
Statistica, etc.) to EXTRACT factors with an Eigen Value of 1
or higher.
The extraction method requested is PRINCIPAL COMPONENTS
Interpretation of the Output
1. We note that three factors have been extracted, based
on our criterion that only factors with eigenvalues of 1
or more should be extracted. We see from the Cum.
Pct. (Cumulative Percentage of Variance Explained)
column that the three factors extracted together
account for 80 percent of the total variance
(information contained in the original ten variables).
This is a good bargain, because we are able to
economise on the number of variables (from 10 we
have reduced them to 3 underlying factors), while we
lose only about 20 percent of the information content
(80 percent is retained by the 3 factors extracted out of
the 10 original variables).
2. This represents a reasonably good solution for our
problem.
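The extraction step can be recomputed outside SPSS from the 20 x 10 data matrix given in this deck. This is a sketch, assuming principal components on the correlation matrix; its output has not been checked here against the SPSS figures quoted above, so small differences are possible.

```python
import numpy as np

# Responses of the twenty respondents to the ten statements (data matrix in this deck)
data = np.array([
    [1, 4, 1, 6, 5, 6, 5, 2, 3, 2],
    [2, 3, 2, 4, 3, 3, 3, 5, 5, 2],
    [2, 2, 2, 1, 2, 1, 1, 7, 6, 2],
    [5, 1, 4, 2, 2, 2, 2, 3, 2, 3],
    [1, 2, 2, 5, 4, 4, 4, 1, 1, 2],
    [3, 2, 3, 3, 3, 3, 3, 6, 5, 3],
    [2, 2, 5, 1, 2, 1, 2, 4, 4, 5],
    [4, 4, 3, 4, 4, 5, 3, 2, 3, 3],
    [2, 3, 2, 6, 5, 6, 5, 1, 4, 1],
    [1, 4, 2, 2, 1, 2, 1, 4, 4, 1],
    [1, 5, 1, 3, 2, 3, 2, 2, 2, 1],
    [1, 6, 1, 1, 1, 1, 1, 1, 2, 2],
    [3, 1, 4, 4, 4, 3, 3, 6, 5, 3],
    [2, 2, 2, 2, 2, 2, 2, 1, 3, 2],
    [2, 5, 1, 3, 2, 3, 2, 2, 1, 6],
    [5, 6, 3, 2, 1, 3, 2, 5, 5, 4],
    [1, 4, 2, 2, 1, 2, 1, 1, 1, 3],
    [2, 3, 1, 1, 2, 2, 2, 3, 2, 2],
    [3, 3, 2, 3, 4, 3, 4, 3, 3, 3],
    [4, 3, 2, 7, 6, 6, 6, 2, 3, 6],
])

# Correlation matrix of the ten variables (columns)
corr = np.corrcoef(data, rowvar=False)

# Eigenvalues of the correlation matrix, largest first
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
cum_pct = 100.0 * np.cumsum(eigenvalues) / eigenvalues.sum()
n_factors = int(np.sum(eigenvalues >= 1.0))

for i, (ev, cp) in enumerate(zip(eigenvalues, cum_pct), start=1):
    print(f"Component {i}: eigenvalue = {ev:.3f}, cumulative % = {cp:.1f}")
print("components with eigenvalue >= 1:", n_factors)
```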
SPSS Output/Factor Matrix
Factor 1 Factor 2 Factor 3
Step 4: Now, we try to interpret what these 3
extracted factors represent. This we can
accomplish by looking at the two tables, the
rotated and unrotated factor matrices.
Rotated Factor Matrix
Factor 1 Factor 2 Factor 3
VAR00001 .13402 .34749 .76402
VAR00002 -.18143 -.64300 -.07596
VAR00003 -.10944 .62985 .56742
VAR00004 .96986 -.06383 -.01338
VAR00005 .96455 .13362 .04660
VAR00006 .94544 -.13868 .02600
VAR00007 .97214 .02862 .09411
VAR00008 -.26169 .85203 .06517
VAR00009 .00891 .87772 -.08347
VAR00010 .07209 -.10990 .87874
The Variables…..
1. I use a 2-wheeler because it is affordable.
2. It gives me a sense of freedom to own a 2-wheeler.
3. Low maintenance cost makes a 2-wheeler very economical in the long run.
4. A 2-wheeler is essentially a man’s vehicle.
5. I feel very powerful when I am on my 2-wheeler.
6. Some of my friends who don’t have their own vehicle are jealous of me.
7. I feel good whenever I see the ad for 2-wheeler on T.V., in a magazine or on a hoarding.
8. My vehicle gives me a comfortable ride.
9. I think 2-wheelers are a safe way to travel.
10. Three people should be legally allowed to travel on a 2-wheeler.
Step 4 & 5 - Naming of factors
Looking at the rotated factor matrix, we notice that
variable nos. 4, 5, 6 and 7 have loadings of 0.96986,
0.96455, 0.94544 and 0.97214 on factor 1 (we look down
the Factor 1 column and look for high loadings close to
1.00). This suggests that Factor 1 is a combination of
these four original variables. The unrotated matrix also
suggests a similar grouping. Therefore, there is no
problem interpreting factor 1 as a combination of “a
man’s vehicle” (variable 4), “feeling of
power” (variable 5), “others are jealous of me” (variable
6) and “feel good when I see my 2-wheeler ads” (variable 7).
At this point, the researcher’s task is to find a suitable
phrase which captures the essence of the original
variables which form the underlying concept or “factor”.
In this case, factor 1 could be named “male ego”, or
“machismo”, or “pride of ownership” or something similar.
With the same mathematical output, the interpretations of
different researchers may differ.
Naming of Factors contd..
Now we will attempt to interpret factor 2. We look down the column
for Factor 2, and find that variables 8 and 9 have high loadings of
0.85203 and 0.87772, respectively. This indicates that factor 2 is a
combination of these two variables.
Analysis & Interpretation
We must guard against the possibility that a variable may load highly on more
than one factor. Strictly speaking, a variable should load close to 1.00 on one
and only one factor, and load close to 0 on the other factors. If this is not the
case, it indicates either that the respondents in the sample hold more than one
opinion about the variable, or that the question/variable may be unclear in its
phrasing.
The other important issue in the practical use of factor analysis is the answer
to the question “what should be considered a high loading and what is not a
high loading?” Here, unfortunately, there is no clear-cut guideline, and many a
time we must look at relative values in the factor matrix. Sometimes 0.7 may
be treated as a high value, while sometimes 0.9 could be the cutoff for high
values.
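The effect of the cutoff can be seen by applying two different values to the rotated factor matrix from this deck. This is a sketch; the function name `assign_variables` and the rule that a variable needs exactly one high loading to be assigned are assumptions made for illustration.

```python
# Rotated factor matrix from this deck: one row per variable (1..10),
# columns are the loadings on Factor 1, Factor 2, Factor 3
ROTATED = [
    (0.13402, 0.34749, 0.76402),
    (-0.18143, -0.64300, -0.07596),
    (-0.10944, 0.62985, 0.56742),
    (0.96986, -0.06383, -0.01338),
    (0.96455, 0.13362, 0.04660),
    (0.94544, -0.13868, 0.02600),
    (0.97214, 0.02862, 0.09411),
    (-0.26169, 0.85203, 0.06517),
    (0.00891, 0.87772, -0.08347),
    (0.07209, -0.10990, 0.87874),
]

def assign_variables(matrix, cutoff):
    """Group variables by the single factor they load highly on, if any."""
    groups = {j: [] for j in range(1, len(matrix[0]) + 1)}
    unclear = []  # no high loading, or high loadings on several factors
    for var, row in enumerate(matrix, start=1):
        high = [j for j, loading in enumerate(row, start=1) if abs(loading) >= cutoff]
        if len(high) == 1:
            groups[high[0]].append(var)
        else:
            unclear.append(var)
    return groups, unclear

for cutoff in (0.7, 0.9):
    groups, unclear = assign_variables(ROTATED, cutoff)
    print(f"cutoff {cutoff}: groups = {groups}, unassigned = {unclear}")
```

With these loadings, a stricter cutoff leaves Factor 1 intact but empties the other two factors, which is why the cutoff should be judged against the relative values in the matrix.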
Analysis & Interpretation –Communality
Analysis & Interpretation – Surrogate Variables
Analysis & Interpretation
This may also partially explain why variable 2 is not appearing in our final
interpretation of the factors (in the earlier section). It is possible that variable 2 is
an independent variable which is not combining well with any other variable, and
therefore should be further investigated separately. “Freedom” could be a
different concept in the minds of our target audience.
It is recommended that we use the rotated factor matrix (rather than unrotated
factor matrix) for interpreting factors, particularly when we use the principal
components method for extraction of factors.
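The remark about variable 2 can be checked against the rotated factor matrix in this deck by computing each variable's communality. This is a sketch; it assumes the three-factor solution shown in the rotated matrix.

```python
# Rotated factor matrix from this deck: rows are variables 1..10,
# columns are the loadings on Factor 1, Factor 2, Factor 3
ROTATED = [
    (0.13402, 0.34749, 0.76402),
    (-0.18143, -0.64300, -0.07596),
    (-0.10944, 0.62985, 0.56742),
    (0.96986, -0.06383, -0.01338),
    (0.96455, 0.13362, 0.04660),
    (0.94544, -0.13868, 0.02600),
    (0.97214, 0.02862, 0.09411),
    (-0.26169, 0.85203, 0.06517),
    (0.00891, 0.87772, -0.08347),
    (0.07209, -0.10990, 0.87874),
]

# Communality = sum of squared loadings across the three factors
communalities = [sum(loading ** 2 for loading in row) for row in ROTATED]

# Variable number (1-based) with the lowest communality
lowest = min(range(len(communalities)), key=communalities.__getitem__) + 1

# Average communality = share of total variance explained by the three factors
explained_pct = 100.0 * sum(communalities) / len(communalities)

for var, h2 in enumerate(communalities, start=1):
    print(f"Variable {var}: communality = {h2:.3f}")
print(f"lowest communality: variable {lowest}")
print(f"total variance explained: {explained_pct:.1f}%")
```

The average communality also recovers the roughly 80 percent of total variance reported for the three extracted factors.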
SPSS Input/Output …
Factor scores. Factor scores are composite scores estimated for
each respondent on the derived factors
SPSS & AMOS
AMOS:
A powerful structural equation modeling and confirmatory factor analysis tool that
specifies and fits your models easily and visually without using matrices.
Unveil more critical information than with conventional methods such as regression
or exploratory factor analysis.
Factor Analysis – A Summary
Problem formulation
Rotation of Factors
Interpretation of Factors
Calculation of Factor Scores / Selection of Surrogate Variables
SPSS menu path: Analyze > Dimension Reduction > Factor …
Recap….
Factor Analysis
Application Areas
Recommended Use
Steps in Factor Analysis
Analysis & Interpretation of Output
More on Communalities
SPSS Lab
Q&A
Any Questions?