Current Topics in Statistics for Applied Researchers

Factor Analysis
George J. Knafl, PhD Professor & Senior Scientist [email protected]

to describe and demonstrate factor analysis of survey instrument data
primarily for assessment of established scales with some discussion of the development of new scales

emphasizing its use in exploratory, data-driven analyses

called exploratory factor analysis (EFA)

but with examples of its use in confirmatory, theory-driven analyses

called confirmatory factor analysis (CFA)

using the Statistical Package for the Social Sciences (SPSS) and the Statistical Analysis System (SAS)
PDF copy of slides are available on the Internet at

1. examples of established scales

2. principal component analysis vs. factor analysis
terminology and some primary factor analysis methods

3. factor extraction
survey of alternative methods

4. factor rotation
interpreting the results in terms of scales

5. factor analysis model evaluation

evaluating alternatives for factor extraction and rotation

6. a case study in ongoing scale development

with assistance from Kathleen Knafl

including example analyses in SPSS and SAS

Part 1 Examples of Scales

Data Used in Factor Analysis

factor analysis is used to identify dimensions underlying response (outcome) variables y
observed values for the variables y are available, so they are called manifest variables
standardized variables z for the y are typically used and the correlation matrix R for the z is modeled

dimensions correspond to variables F called factors

observed values for the variables F are not available and so they are called latent variables

most types of manifest variables can be used

but more appropriate if they have more than a few distinct values and an approximate bell-shaped distribution

factor analysis is used in many different application areas

in the health sciences, it is usually applied to survey instrument data, and so that is the focus of these notes

A Simple Example
subjects undergoing radiotherapy were measured on 6 dimensions [1, p. 33]
number of symptoms, amount of activity, amount of sleep, amount of food consumed, appetite, skin reaction

can these be grouped into sets of related measures to obtain a more parsimonious description of what they represent?
perhaps there are really only 2 distinct dimensions for these 6 variables?

Survey Instruments
survey instruments consist of items
with discrete ranges of values, e.g., 1, 2,

items are grouped into disjoint sets

corresponding to scales
items in these sets might be just summed
possibly after reverse coding values for some items
and then the scales are called summated

or weighted and then summed

items might be further grouped into subsets

corresponding to subscales
the subscales are often just used as the first step in computing the scales rather than as separate measures
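The scoring of a summated scale can be sketched in Python. The responses, the choice of reverse-coded item, and the helper name `summated_score` are hypothetical; the reverse-coding rule (low + high - value) is the usual convention and is assumed here rather than taken from any of the instruments discussed.

```python
def summated_score(responses, reverse_items=(), low=1, high=5):
    """Sum item responses after reverse coding the items in reverse_items.

    responses: dict mapping item number -> observed value in [low, high]
    reverse_items: item numbers whose values are flipped before summing
    """
    total = 0
    for item, value in responses.items():
        if not low <= value <= high:
            raise ValueError(f"item {item} has out-of-range value {value}")
        # Reverse coding maps low -> high, ..., high -> low.
        total += (low + high - value) if item in reverse_items else value
    return total

# Hypothetical 4-item scale scored 1-5, with item 3 reverse coded.
responses = {1: 4, 2: 5, 3: 2, 4: 3}
score = summated_score(responses, reverse_items={3})
print(score)  # 4 + 5 + (6 - 2) + 3 = 16
```

A weighted scale would multiply each (possibly reverse-coded) value by its weight before summing.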

Example 1 - SDS
symptom distress scale [2]
symptom assessment for adults with cancer
13 items scored 1,2,3,4,5
measuring distress experience related to severity of 11 symptoms
nausea, appetite, insomnia, pain, fatigue, bowel pattern, concentration, appearance, outlook, breathing, cough
and the frequency as well for nausea and pain

1 total scale
sum of the 13 items with none reverse coded
higher scores indicate higher levels of symptom distress

Example 2 - CDI
Children's Depression Inventory [3]
27 items scored 0,1,2
assessing aspects of depressive symptoms for children and adolescents

1 total scale
sum of the 27 items after reverse coding 13 of them
higher scores indicate higher depressive symptom levels

5 subscales measuring different aspects of depressive symptoms

negative mood, interpretation problems, ineffectiveness, anhedonia, and negative self-esteem
the total scale equals the sum of the subscales

total scale used in practice rather than subscales

Example 3 - FACES II
Family Adaptability & Cohesion Scales [4]
has several versions, will consider version II
30 items scored 1,2,3,4,5
2 scales
family adaptability
family's ability to alter its role relationships and power structure
sum of 14 of the items after reverse coding 2 of them
higher scores indicate higher family adaptability

family cohesion
the emotional bonding within the family
sum of the other 16 of the items after reverse coding 6 of them
higher scores indicate higher family cohesion

2 scales are typically used separately, but are sometimes summed to obtain a total FACES scale

Example 4 - DQOLY
Diabetes Quality of Life Youth scale [5]
51 items scored 1,2,3,4,5
3 scales

impact of diabetes
sum of 23 of the items after reverse coding 1 of them
higher scores indicate higher negative impact (worse QOL)

diabetes-related worries
sum of 11 other items with none reverse coded
higher scores indicate more worries (worse QOL)

satisfaction with life
sum of the other 17 items with none reverse coded
higher scores indicate higher satisfaction (better QOL)
so it has the reverse orientation to the other scales

the 3 scales are typically used separately and not usually combined into a total scale

the youth version of the scale is appropriate for children 13-17 years old
also has a school age version for children 8-12 years old
and a parent version

Example 5 - FACT
Functional Assessment of Cancer Therapy [6]
27 general (G) items scored 0-4
4 subscales
physical, social/family, emotional, functional subscales
sums of 6-7 of the general items with some reverse coded

1 scale
the functional well-being scale (FACT-G)
the sum of the 4 subscales
higher scores indicate better levels of quality of life

extra items available for certain types of cancers

7 for colon (C) cancer, 9 for lung (L) cancer, scored 0-4
summed with some reverse coded into separate scales (FACT-C/FACT-L)
these can also be added to the FACT-G
an overall functional well-being measure specific to the type of cancer

has been extended to chronic illnesses (FACIT)


Example 6 - MOS SF-36

Medical Outcomes Study Short Form 36 [7]
36 items scored in varying ranges
8 subscales computed from 35 of the items
physical functioning, role-physical, bodily pain, general health, vitality, role-emotional, social functioning, mental health

2 scales computed from different weightings of the 8 subscales

two dimensions of quality of life
physical component scale (PCS): physical health
mental component scale (MCS): mental health

1 other item reporting overall assessment of health

but not used in computing scales

other versions with 12, 20, and 116 items


Example 7 - FMSS
Family Management Style Survey
a survey instrument currently under development
parents of children having a chronic illness are being interviewed on how their families manage their child's chronic illness
as many parents as are willing to participate

there are 65 initial FMSS items

items 1-57 are applicable to both single and partnered parents items 58-65 address issues related to the parent's spouse and so are not completed by single parents

all items are coded from 1-5

1="strongly disagree" and 5="strongly agree"

challenge is to account for inter-parental correlation


Scale Development/Assessment
as part of scale development, an initial set of items is reduced to a final set of items which are then combined into one or more scales and possibly also subscales
established scales, when used in novel settings, need to be assessed for their applicability to those settings
such issues can be addressed in part using factor analysis techniques
will address these using data for the CDI, FACES II, DQOLY, and FMSS instruments
starting with a popular approach related to principal component analysis (PCA)

Part 2 Principal Component Analysis vs. Factor Analysis

factors, factor scores, and loadings
eigenvalues and total variance
conventions for choosing the # of factors
communalities and specificities
example analyses


Principal Component Analysis

standardize each item y
z = (y - its average)/(its standard deviation)
so the variance of each z equals 1
and the sum of the variances for all z's equals the # of items
called the total variance

items are typically standardized, but they do not have to be
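A minimal sketch of the standardization step, using only the Python standard library. The item responses are hypothetical, and the sample standard deviation (divisor n - 1) is assumed.

```python
import statistics

def standardize(values):
    """Return z = (y - mean) / (sample standard deviation) for each y."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (divisor n - 1)
    return [(y - mean) / sd for y in values]

# Hypothetical responses of 5 subjects to 3 items scored 1-5.
items = {
    "item1": [1, 2, 3, 4, 5],
    "item2": [2, 2, 3, 5, 5],
    "item3": [5, 4, 4, 2, 1],
}
z = {name: standardize(y) for name, y in items.items()}

variances = {name: statistics.variance(vals) for name, vals in z.items()}
total_variance = sum(variances.values())
print(variances)       # each variance is 1 up to rounding error
print(total_variance)  # the number of items (3) up to rounding error
```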

associated with the z's are an equal # of principal components (PC's)
each PC can be expressed as a weighted sum of z's
this is how they are defined and used for a standard PCA

each z can be expressed as a weighted sum of PC's

this is how they are used in a factor analysis based on PC's

Variable Reduction
PCA can be used to reduce the # of variables
one such use is to simplify a regression analysis by reducing the # of predictor variables
predict a dependent variable using the first few PC's determined from the predictors, not all predictors

similar simplification for factor analysis

use the first few factors to model the z's

but it is not clear how many you should use

i.e., how many factors to extract?

diminishing returns to using more factors (or PC's), but hopefully there is a natural separation point

Radiotherapy Data
can we model the correlation matrix R as if its 6 dimensions were determined by 2 factors?
skin reaction is related to none of the others while appetite is related to the other 4 variables
Correlations: Pearson correlation with 2-tailed significance in parentheses (N = 10 for every pair)

                         Number of      Amount of      Amount of      Amount of Food                 Skin
                         Symptoms       Activity       Sleep          Consumed        Appetite       Reaction
Number of Symptoms       1              .842** (.002)  .322 (.364)    .412 (.237)     .766** (.010)  .348 (.325)
Amount of Activity       .842** (.002)  1              .451 (.191)    .610 (.061)     .843** (.002)  -.116 (.749)
Amount of Sleep          .322 (.364)    .451 (.191)    1              .466 (.174)     .641* (.046)   .005 (.989)
Amount of Food Consumed  .412 (.237)    .610 (.061)    .466 (.174)    1               .811** (.004)  .067 (.854)
Appetite                 .766** (.010)  .843** (.002)  .641* (.046)   .811** (.004)   1              .102 (.778)
Skin Reaction            .348 (.325)    -.116 (.749)   .005 (.989)    .067 (.854)     .102 (.778)    1

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).


(Common) Factor Analysis

treat each z as equal to a weighted sum of the same k factors F plus an error term u that is unique to each z
the weights L are called loadings
z = L(1)·F(1) + L(2)·F(2) + ... + L(k)·F(k) + u

the factors F are unobservable, so need to estimate their values

called the factor scores FS

same approach used with any factor extraction method
since the same k factors F are used with each z, they are called common factors
but different loadings L are used with each z
different or unique errors u are also used with each z
hence they are called the unique (or specific) factors

Factor Analysis Assumptions

the factor analysis model for the standardized items z satisfies

assuming also that

the common factors F are
standardized (with mean 0 and variance 1) and independent of each other

the unique (specific) factors u

have mean zero (but not necessarily variance 1) and are independent of each other

all common factors are independent of all unique factors

Factor Analysis Using PC's

PCA produces weights for computing the principal components PC from the z's
factor analysis based on PC's uses these weights and PC scores to produce factor loadings L and factor scores FS to estimate factors, but only the first k are used

loadings are combined as entries in a matrix called the factor (pattern) matrix
1 row for each standardized item z
each containing loadings on all k factors for that standardized item

1 column for each factor F

each containing loadings for all z's on that factor
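As a rough illustration of how PC weights turn into loadings, the sketch below recovers the first column of the factor matrix for the radiotherapy data: the loadings on the first factor are the first eigenvector of R scaled by the square root of its eigenvalue. Power iteration is used here only because it needs no external library; it is not claimed to be the algorithm SPSS or SAS uses. The function name `first_component` is hypothetical.

```python
import math

# Radiotherapy correlation matrix R, as reported in the slides
# (order: symptoms, activity, sleep, food consumed, appetite, skin reaction).
R = [
    [1.000, 0.842, 0.322, 0.412, 0.766, 0.348],
    [0.842, 1.000, 0.451, 0.610, 0.843, -0.116],
    [0.322, 0.451, 1.000, 0.466, 0.641, 0.005],
    [0.412, 0.610, 0.466, 1.000, 0.811, 0.067],
    [0.766, 0.843, 0.641, 0.811, 1.000, 0.102],
    [0.348, -0.116, 0.005, 0.067, 0.102, 1.000],
]

def first_component(R, iterations=200):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    n = len(R)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Rv = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * Rv[i] for i in range(n))  # Rayleigh quotient
    return eigenvalue, v

eigenvalue, weights = first_component(R)
# Loadings on factor 1 = eigenvector entries scaled by sqrt(eigenvalue).
loadings = [math.sqrt(eigenvalue) * w for w in weights]
print(round(eigenvalue, 3))
print([round(l, 3) for l in loadings])
```

Because R is entered to only 3 decimals, the results agree with the SPSS output for the radiotherapy data only up to rounding.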

Radiotherapy Data Loadings

extracted 2 factors using the PCs
# of symptoms loads more highly (.827) on factor 1 than on factor 2 (.361)
but the loading on factor 2 is not that small, so maybe # of symptoms is distinctly related to both factors

loadings are usually rotated and ordered to make it easier to allocate items to factors

Component Matrix (a)
                          Component 1   Component 2
Number of Symptoms           .827          .361
Amount of Activity           .903         -.152
Amount of Sleep              .659         -.230
Amount of Food Consumed      .790         -.128
Appetite                     .977         -.037
Skin Reaction                .134          .955

Extraction Method: Principal Component Analysis.
a. 2 components extracted.


Ordered Rotated Loadings

the first 5 variables load more highly on factor 1 than on factor 2
only skin reaction loads more highly on factor 2 than factor 1
but factors with only 1 associated variable are suspect

however, # of symptoms loads highly on both factors

maybe it should be discarded since it is not unidimensional?
Rotated Component Matrix (a)
                          Component 1   Component 2
Appetite                     .968          .140
Amount of Activity           .915          .015
Amount of Food Consumed      .801          .017
Number of Symptoms           .748          .505
Amount of Sleep              .690         -.107
Skin Reaction               -.041          .963

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.
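Varimax is an orthogonal rotation, so with 2 factors it amounts to turning the pair of axes through a single angle. The sketch below applies a plane rotation to the unrotated loadings and compares the result with the rotated SPSS loadings. The angle of 10.4 degrees is an assumption, estimated here from the two tables of loadings; it is not reported by SPSS.

```python
import math

# Unrotated and varimax-rotated loadings as reported in the SPSS output.
unrotated = {
    "Appetite": (0.977, -0.037),
    "Amount of Activity": (0.903, -0.152),
    "Amount of Food Consumed": (0.790, -0.128),
    "Number of Symptoms": (0.827, 0.361),
    "Amount of Sleep": (0.659, -0.230),
    "Skin Reaction": (0.134, 0.955),
}
rotated = {
    "Appetite": (0.968, 0.140),
    "Amount of Activity": (0.915, 0.015),
    "Amount of Food Consumed": (0.801, 0.017),
    "Number of Symptoms": (0.748, 0.505),
    "Amount of Sleep": (0.690, -0.107),
    "Skin Reaction": (-0.041, 0.963),
}

# Assumption: rotation angle estimated from the tables, not an SPSS result.
theta = math.radians(10.4)
c, s = math.cos(theta), math.sin(theta)

max_diff = 0.0
for name, (a, b) in unrotated.items():
    a_rot = a * c - b * s  # loading on rotated factor 1
    b_rot = a * s + b * c  # loading on rotated factor 2
    max_diff = max(max_diff,
                   abs(a_rot - rotated[name][0]),
                   abs(b_rot - rotated[name][1]))
    # An orthogonal rotation leaves each communality a^2 + b^2 unchanged.
    print(f"{name:25s} {a_rot:6.3f} {b_rot:6.3f}")

print(max_diff)  # small: the two tables differ by a rotation plus rounding
```

Rotation redistributes each item's variance between the factors without changing how much of it the factors explain in total.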


part of each z is explained by the common factors

the communality for z is the amount of its variance explained by the common factors (hence its name)
1 = VAR[z] = VAR[L(1)·F(1) + L(2)·F(2) + ... + L(k)·F(k)] + VAR[u]
variances add up due to independence assumptions

the variance of the unique factor u is called the uniqueness
1 = VAR[z] = communality + uniqueness
so the communality is between 0 and 1
u is also called the specific factor for z and then its variance is called the specificity
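The decomposition can be written out one step further. This is a sketch in LaTeX notation, with L_j and F_j standing for L(j) and F(j), and it relies only on the stated assumptions (standardized, mutually independent common factors that are independent of u):

```latex
\begin{aligned}
1 = \mathrm{VAR}[z]
  &= \mathrm{VAR}\Big[\sum_{j=1}^{k} L_j F_j\Big] + \mathrm{VAR}[u] \\
  &= \sum_{j=1}^{k} L_j^2 \,\mathrm{VAR}[F_j] + \mathrm{VAR}[u]
     \qquad \text{(independence: no covariance terms)} \\
  &= \underbrace{\sum_{j=1}^{k} L_j^2}_{\text{communality}}
     + \underbrace{\mathrm{VAR}[u]}_{\text{uniqueness}}
     \qquad \text{(since } \mathrm{VAR}[F_j] = 1\text{)}
\end{aligned}
```

So the communality of z is the sum of its squared loadings, and the uniqueness is 1 minus that sum.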

PC-Based Factor Analysis

can extract any # k of factors F up to the # of items z
when k = the # of items
use all the factors F (and PC's)
so the communality=1 and the uniqueness=0 for all z
not really a factor analysis

when k < the # of z's

communalities are determined from loadings for the k factors
the communality of z = the sum of the squares of the loadings for z over all the factors F

then subtracted from 1 to get the uniqueness for z
but need initial values for the communalities to start the computations

The PC Method
start by setting all communalities equal to 1
they stay that way if all the factor scores are used

if the # of factors < the # of items

recompute the communalities based on the extracted factors


Radiotherapy Data Communalities

communalities started out as all 1's
since the PC method was used to extract factors

but they were re-estimated based on loadings for the 2 extracted factors
the new values are < 1 as they should be when the # of factors < the # of items
Communalities
                          Initial   Extraction
Number of Symptoms         1.000       .814
Amount of Activity         1.000       .838
Amount of Sleep            1.000       .488
Amount of Food Consumed    1.000       .641
Appetite                   1.000       .956
Skin Reaction              1.000       .930

Extraction Method: Principal Component Analysis.
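The re-estimated communalities can be checked by hand: each one is the sum of the squared loadings of that item on the 2 extracted factors. A short sketch using the unrotated loadings reported for the radiotherapy data (small discrepancies come from the 3-decimal rounding of the loadings):

```python
# Unrotated loadings on the 2 extracted factors, as reported by SPSS.
loadings = {
    "Number of Symptoms": (0.827, 0.361),
    "Amount of Activity": (0.903, -0.152),
    "Amount of Sleep": (0.659, -0.230),
    "Amount of Food Consumed": (0.790, -0.128),
    "Appetite": (0.977, -0.037),
    "Skin Reaction": (0.134, 0.955),
}
# "Extraction" communalities reported by SPSS.
reported = {
    "Number of Symptoms": 0.814,
    "Amount of Activity": 0.838,
    "Amount of Sleep": 0.488,
    "Amount of Food Consumed": 0.641,
    "Appetite": 0.956,
    "Skin Reaction": 0.930,
}

# Communality = sum of squared loadings; uniqueness = 1 - communality.
communality = {name: sum(l * l for l in row) for name, row in loadings.items()}
uniqueness = {name: 1.0 - h for name, h in communality.items()}

for name in loadings:
    print(f"{name:25s} {communality[name]:.3f} {reported[name]:.3f} "
          f"{uniqueness[name]:.3f}")
```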


Initial Communalities
the principal component (PC) method
all communalities start out as 1 and are then recomputed from the extracted factors

the principal factor (PF) method

the initial communalities are estimated and are then recomputed from the extracted factors

for both of these, can stop after the first step or iterate the process until the communalities do not change much
a problem occurs when communalities come out larger than 1 though
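The iterate-until-stable idea can be sketched for the simplest case of k = 1 factor. Everything in this example is illustrative: the correlation matrix is hypothetical (built from one-factor loadings .8, .7, .6, .5), the starting communalities are all 1 (the PC starting point; the PF method would differ only in the starting values), and power iteration stands in for a proper eigen-solver.

```python
import math

def first_factor(M, iterations=500):
    """Largest eigenvalue and unit eigenvector of symmetric M (power iteration)."""
    n = len(M)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * Mv[i] for i in range(n)), v

def iterate_communalities(R, start, tol=1e-8, max_iter=1000):
    """Repeat: put communalities on the diagonal, extract 1 factor, recompute."""
    n = len(R)
    h = list(start)
    for step in range(1, max_iter + 1):
        reduced = [[h[i] if i == j else R[i][j] for j in range(n)]
                   for i in range(n)]
        eigenvalue, v = first_factor(reduced)
        new_h = [eigenvalue * x * x for x in v]  # squared loadings
        if any(c > 1.0 for c in new_h):
            raise ArithmeticError("a communality exceeded 1")
        if max(abs(a - b) for a, b in zip(new_h, h)) < tol:
            return new_h, step
        h = new_h
    raise ArithmeticError("communalities did not converge")

# Hypothetical correlations implied exactly by a one-factor model.
true_loadings = [0.8, 0.7, 0.6, 0.5]
R = [[1.0 if i == j else true_loadings[i] * true_loadings[j]
      for j in range(4)] for i in range(4)]

communalities, steps = iterate_communalities(R, start=[1.0] * 4)
print([round(c, 4) for c in communalities], steps)
```

With real data the iteration is not guaranteed to behave this well, which is the convergence problem noted above.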

Initial Communality Estimates

initial communalities are usually estimated using the squared multiple correlations
square the multiple correlation of each z with all the other z's

SAS supports alternative ways to estimate the initial communalities

but calls them prior communalities

adjusted SMCs
divide the SMCs by their maximum value

maximum absolute correlations

use the maximum absolute correlation of each z with all the other z's

random settings
generate random numbers between 0 and 1

not available in SPSS
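A sketch of the squared multiple correlation (SMC) estimate. It uses the standard identity SMC(i) = 1 - 1/(i-th diagonal entry of the inverse of R), which is not stated in the slides. To keep the example small, only 3 of the radiotherapy items are used (activity, food consumed, appetite), so the values are not the SMCs that SPSS or SAS would report for the full set of items. The helper names are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def squared_multiple_correlations(R):
    """SMC of each variable with all the others: 1 - 1 / diag(inverse of R)."""
    inv = invert(R)
    return [1.0 - 1.0 / inv[i][i] for i in range(len(R))]

# Correlations among activity, food consumed, and appetite (radiotherapy data).
R = [
    [1.000, 0.610, 0.843],
    [0.610, 1.000, 0.811],
    [0.843, 0.811, 1.000],
]
smc = squared_multiple_correlations(R)
print([round(x, 3) for x in smc])  # initial communality estimates for 3 items
```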


PC-Based Alternatives
1-step principal component (PC) method
set communalities all to an initial value of 1
compute loadings and factor scores
re-estimate the communalities from these and stop
iterated version available in SAS but not in SPSS

1-step principal factor (PF) method

estimate the initial values for the communalities
compute loadings and factor scores
re-estimate the communalities from these and stop
1-step procedure available in SAS but not in SPSS
iterated version available in both SPSS and SAS
called principal axis factoring (PAF) in SPSS

each factor F (or PC or FS) has an associated eigenvalue EV
also called a characteristic root since by definition it is a solution to the so-called characteristic equation for the correlation matrix R

the sum of the eigenvalues over all factors equals the total variance
sum of the EV's = total variance = # of items
so an eigenvalue measures how much of the total variance of the z's is accounted for by its associated factor (or PC)
in other words, factors with larger eigenvalues contribute more towards explaining the total variance of the z's

eigenvalues are generated in decreasing order

EV(1) ≥ EV(2) ≥ EV(3) ≥ ...
eigenvalues at the start have the more important factors (or PC's)

The Eigenvalue-One Rule

the eigenvalue-one (EV-ONE) rule
also called the Kaiser-Guttman rule

says to use the factors with eigenvalues > 1 and discard the rest
an eigenvalue > 1 means its factor contributes more to the total variance than a single z
since each z has variance 1 and so contributes 1 to the total variance

Radiotherapy Data Eigenvalues

EV-ONE says to extract 2 factors
2 factors explain about 78% of the total variance

Total Variance Explained
             Initial Eigenvalues                     Extraction Sums of Squared Loadings
Component    Total    % of Variance   Cumulative %   Total    % of Variance   Cumulative %
1            3.531       58.844          58.844      3.531       58.844          58.844
2            1.136       18.927          77.770      1.136       18.927          77.770
3             .746       12.432          90.202
4             .519        8.642          98.844
5             .061        1.010          99.855
6             .009         .145         100.000

Extraction Method: Principal Component Analysis.


Other Possible Selection Rules

individual % of the total variance
use the factors whose eigenvalues exceed 5% (or 10%) of the total variance [8]

cumulative % of the total variance

use initial subset of factors the sum of whose eigenvalues first exceeds 70% (or 80%) of the total variance [8]

inspect a scree plot for a big change in slope

the plot of the eigenvalues in decreasing order

same rules apply to reducing the # of PC's


Radiotherapy Data Scree Plot

"scree" means debris at the bottom of a cliff
look for the point on the x-axis separating the "cliff" from the "debris" at its bottom
i.e., a large change in slope

[Figure: scree plot of the eigenvalues against component number]

biggest change is between 1 and 2

perhaps there is only 1 factor? or maybe as many as 4


Factor Analysis Properties

the loading L of z on F is the correlation between z and F
the square of the loading L is the portion of the variance of z explained by F
the sum of the squared loadings over all factors is the portion of the variance of z explained by all the factors
so this sum equals the communality of z

the sum of the squared loadings over all z is the portion of the total variance explained by F
so this sum equals the eigenvalue EV for F

the correlation between any 2 z's is the sum of the products of their loadings on each of the factors
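Two of these properties can be checked against the radiotherapy output. The sketch sums squared loadings down each column to recover the eigenvalues, and forms the sum of products of loadings for a pair of items. With only 2 of the 6 factors extracted, that sum reproduces the observed correlation only approximately; the equality is exact under the model, not for a truncated PC solution.

```python
# Unrotated loadings (rows: symptoms, activity, sleep, food, appetite, skin).
names = ["symptoms", "activity", "sleep", "food", "appetite", "skin"]
loadings = [
    (0.827, 0.361),
    (0.903, -0.152),
    (0.659, -0.230),
    (0.790, -0.128),
    (0.977, -0.037),
    (0.134, 0.955),
]

# Sum of squared loadings over all z for each factor = that factor's eigenvalue.
eigenvalues = [sum(row[k] ** 2 for row in loadings) for k in range(2)]
print([round(ev, 3) for ev in eigenvalues])  # compare with 3.531 and 1.136

def reproduced_correlation(i, j):
    """Sum over factors of the products of the loadings of items i and j."""
    return sum(a * b for a, b in zip(loadings[i], loadings[j]))

# Food consumed vs. appetite: observed correlation is .811.
food_appetite = reproduced_correlation(names.index("food"),
                                       names.index("appetite"))
# Symptoms vs. activity: observed correlation is .842.
symptoms_activity = reproduced_correlation(names.index("symptoms"),
                                           names.index("activity"))
print(round(food_appetite, 3), round(symptoms_activity, 3))
```

The gap between reproduced and observed correlations is one informal way to see how much the 2-factor solution leaves unexplained.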

Factor Analysis Types

exploratory factor analysis (EFA)
use the data to determine how many factors there should be and which items to associate with those factors
can be accomplished using the PC method, the PF method, and a variety of other methods
supported by SPSS and SAS
use Analyze/Data Reduction/Factor... in SPSS
use PROC FACTOR in SAS

confirmatory factor analysis (CFA)

use theory to pre-specify an item-factor allocation and assess whether it is a reasonable choice
supported by SAS but not by SPSS
use PROC CALIS (Covariance AnaLysIS) in SAS
SPSS users need to use another tool like LISREL or AMOS

The ABC Survey Instrument Data

example factor analyses are presented of
the baseline CDI, FACES II, and DQOLY items
without prior reverse coding

for the 103 adolescents with type 1 diabetes who responded at baseline to all the items of all 3 of these instruments
88.0% of the 117 subjects providing some baseline data

from Adolescents Benefit from Control (ABCs) of Diabetes Study (Yale School of Nursing, PI Margaret Grey) [9]

using SPSS (version 14.2) and SAS (version 9.1)

data and code are available on the Internet at

see [10] for details for some of the reported results


Principal Component Example

in SPSS, run the PC method for the FACES items extracting 2 factors and generate a scree plot
the same as the recommended # of scales
click on Analyze/Data Reduction/Factor...
set "Variables:" to FACES1-FACES30
in "Extraction...", set "Number of factors" to 2 and request a scree plot
use the default method of "Principal components"
then execute the analysis


Communalities
Item       Initial   Extraction
FACES1     1.000       .504
FACES2     1.000       .375
FACES3     1.000       .426
FACES4     1.000       .214
FACES5     1.000       .305
FACES6     1.000       .236
FACES7     1.000       .458
FACES8     1.000       .623
FACES9     1.000       .258
FACES10    1.000       .378
FACES11    1.000       .211
FACES12    1.000       .122
FACES13    1.000       .128
FACES14    1.000       .214
FACES15    1.000       .394
FACES16    1.000       .458
FACES17    1.000       .430
FACES18    1.000       .550
FACES19    1.000       .473
FACES20    1.000       .357
FACES21    1.000       .342
FACES22    1.000       .461
FACES23    1.000       .494
FACES24    1.000       .200
FACES25    1.000       .599
FACES26    1.000       .542
FACES27    1.000       .309
FACES28    1.000       .225
FACES29    1.000       .383
FACES30    1.000       .466

Extraction Method: Principal Component Analysis.

the initial communalities are all set to 1 for the PC method
they are then recomputed (in the "Extraction" column) based on the 2 extracted factors
all the recomputed communalities are < 1 as they should be for a factor analysis with k<30
if 30 factors had been extracted, the communalities would have all stayed 1
a standard PCA


Component Matrix (a)

the matrix of loadings
called the component matrix in SPSS for the PC method
30 rows, 1 for each item z
2 columns, 1 for each factor F

Item       Component 1   Component 2
FACES1        .702          .110
FACES2        .611         -.037
FACES3       -.327          .565
FACES4        .454          .093
FACES5        .550          .048
FACES6        .356          .330
FACES7        .677         -.004
FACES8        .789         -.027
FACES9       -.318          .396
FACES10       .338          .513
FACES11       .387          .246
FACES12      -.335          .102
FACES13       .303          .191
FACES14       .185          .424
FACES15      -.231          .584
FACES16       .630          .247
FACES17       .655         -.031
FACES18       .689          .273
FACES19      -.487          .486
FACES20       .565          .195
FACES21       .564          .155
FACES22       .673          .088
FACES23       .686         -.151
FACES24      -.315          .317
FACES25      -.532          .562
FACES26       .731          .089
FACES27       .527          .178
FACES28      -.239          .410
FACES29      -.495          .371
FACES30       .647          .217

Extraction Method: Principal Component Analysis.
a. 2 components extracted.

FACES1 loads much more highly on the first factor than on the second factor
since .702 is much larger than .110
and so FACES1 is said to be a marker item (or salient) for factor 1

Total Variance Explained
             Initial Eigenvalues                     Extraction Sums of Squared Loadings
Component    Total    % of Variance   Cumulative %   Total    % of Variance   Cumulative %
1            8.360       27.867          27.867      8.360       27.867          27.867
2            2.777        9.255          37.122      2.777        9.255          37.122
3            1.804        6.012          43.134
4            1.593        5.309          48.443
5            1.413        4.712          53.155
6            1.305        4.350          57.505
7            1.266        4.221          61.726
8            1.150        3.835          65.560
9             .984        3.279          68.839
10            .898        2.992          71.831
11            .818        2.726          74.557
12            .770        2.567          77.124
13            .708        2.359          79.484
14            .681        2.268          81.752
15            .583        1.945          83.697
16            .563        1.876          85.573
17            .519        1.731          87.304
18            .481        1.604          88.908
19            .453        1.509          90.417
20            .407        1.357          91.774
21            .381        1.270          93.043
22            .361        1.204          94.248
23            .310        1.035          95.282
24            .280         .933          96.215
25            .251         .836          97.051
26            .226         .752          97.803
27            .209         .697          98.500
28            .192         .641          99.141
29            .155         .516          99.657
30            .103         .343         100.000

Extraction Method: Principal Component Analysis.

the "Total" column gives the eigenvalues in decreasing order
the first 2 factors explain about 28% and 9% individually of the total variance
total variance = 30 since items are standardized
but only 37% together
could more be needed?

The # of Factors to Extract

conventional selection rules give different #'s of factors
first 8 have eigenvalues > 1
first 4 each explain more than 5%
only the first explains more than 10%
first 10 combined explain just over 70%
first 14 combined explain just over 80%

none choose the recommended # of 2 factors
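The counts above can be reproduced from the eigenvalues in the "Total" column. A sketch in Python; the eigenvalues are copied from the SPSS output for the FACES items, and the helper names are hypothetical.

```python
# Eigenvalues for the 30 FACES items ("Total" column of the SPSS output).
faces_eigenvalues = [
    8.360, 2.777, 1.804, 1.593, 1.413, 1.305, 1.266, 1.150, 0.984, 0.898,
    0.818, 0.770, 0.708, 0.681, 0.583, 0.563, 0.519, 0.481, 0.453, 0.407,
    0.381, 0.361, 0.310, 0.280, 0.251, 0.226, 0.209, 0.192, 0.155, 0.103,
]
total_variance = len(faces_eigenvalues)  # 30, since the items are standardized

def count_above(eigenvalues, threshold):
    """Number of factors whose eigenvalue exceeds the threshold."""
    return sum(1 for ev in eigenvalues if ev > threshold)

def factors_for_cumulative(eigenvalues, percent, total):
    """Smallest # of leading factors whose eigenvalues sum to more than percent."""
    running = 0.0
    for k, ev in enumerate(eigenvalues, start=1):
        running += ev
        if 100.0 * running / total > percent:
            return k
    return len(eigenvalues)

ev_one = count_above(faces_eigenvalues, 1.0)                      # EV-ONE rule
above_5 = count_above(faces_eigenvalues, 0.05 * total_variance)   # > 5% each
above_10 = count_above(faces_eigenvalues, 0.10 * total_variance)  # > 10% each
cum_70 = factors_for_cumulative(faces_eigenvalues, 70.0, total_variance)
cum_80 = factors_for_cumulative(faces_eigenvalues, 80.0, total_variance)
print(ev_one, above_5, above_10, cum_70, cum_80)  # 8 4 1 10 14
```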


The Scree Plot

[Figure: scree plot of the eigenvalues against component number (1-30)]

seems to be a large change in slope between 2-3 factors

suggests that the recommended # of 2 factors might be a reasonable choice for the ABC FACES items
but maybe the slope isn't close to constant until later


Principal Axis Factoring Example

in SPSS, run the PAF method for the FACES items extracting 2 factors as before
re-enter Analyze/Data Reduction/Factor...
in "Extraction...", set "Method:" to "Principal axis factoring"
note that the default is to analyze the correlation matrix
i.e., factor analyze the standardized FACES items z
then re-execute the analysis


Communalities
Item       Initial   Extraction
FACES1      .650       .478
FACES2      .538       .343
FACES3      .615       .352
FACES4      .575       .182
FACES5      .594       .272
FACES6      .405       .178
FACES7      .582       .427
FACES8      .702       .609
FACES9      .501       .182
FACES10     .501       .308
FACES11     .511       .161
FACES12     .379       .101
FACES13     .410       .102
FACES14     .427       .129
FACES15     .462       .288
FACES16     .619       .428
FACES17     .699       .399
FACES18     .708       .531
FACES19     .574       .429
FACES20     .504       .321
FACES21     .617       .307
FACES22     .663       .433
FACES23     .723       .462
FACES24     .361       .150
FACES25     .665       .589
FACES26     .586       .522
FACES27     .534       .268
FACES28     .515       .154
FACES29     .489       .331
FACES30     .582       .437

Extraction Method: Principal Axis Factoring.

the initial communalities are all estimated using associated squared multiple correlations
they are then recomputed based on the 2 extracted factors
all the initial and recomputed communalities are < 1 as they should be for a factor analysis with k<30



Factor Matrix (a)

the matrix of loadings

30 rows, 1 for each item z
2 columns, 1 for each factor F
SPSS calls it the factor matrix
SAS calls it the factor pattern matrix

Item       Factor 1   Factor 2
FACES1       .683       .107
FACES2       .585      -.035
FACES3      -.314       .503
FACES4       .423       .055
FACES5       .521       .031
FACES6       .332       .259
FACES7       .653       .001
FACES8       .780      -.025
FACES9      -.299       .304
FACES10      .322       .452
FACES11      .361       .176
FACES12     -.310       .070
FACES13      .281       .152
FACES14      .170       .316
FACES15     -.219       .490
FACES16      .610       .238
FACES17      .631      -.034
FACES18      .676       .272
FACES19     -.471       .455
FACES20      .538       .175
FACES21      .537       .137
FACES22      .651       .096
FACES23      .667      -.133
FACES24     -.294       .253
FACES25     -.525       .560
FACES26      .715       .100
FACES27      .498       .142
FACES28     -.223       .324
FACES29     -.473       .328
FACES30      .627       .208

Extraction Method: Principal Axis Factoring.
a. 2 factors extracted. 5 iterations required.

FACES1 again loads much more highly on the first factor

since .683 is much larger than .107
loadings have changed, but only a little
from .702 and .110 for the PC method


PC vs. PF Methods
the use of the PC method vs. the PF method is thought to usually have little impact on the results
"one draws almost identical inferences from either approach in most analyses" [11, p. 535]

so far there seems to be only a minor impact to the choice of factor extraction method on the loadings for the FACES data
we will continue to consider this issue


[Total Variance Explained: same SPSS table as shown for the PC method, with initial eigenvalues 8.360, 2.777, 1.804, ... and the first 2 factors explaining 37.122% of the total variance]

exactly the same as for the PC method
in SPSS, eigenvalues are always computed using the PC method
even if a different factor extraction method is used

so always get the same choice for the # of factors with the EV-ONE rule and other related rules
but the factor loadings will change

Extraction Method: Principal Component Analysis.


in SPSS, run the PAF method for the FACES items extracting the # of factors determined by the EV-ONE rule
re-enter Analyze/Data Reduction/Factor...
in "Extraction...", click on "Eigenvalues over:" and leave the default value at 1
this was the original default way for choosing # of factors to extract
SPSS is set up to encourage the use of the EV-ONE rule
then re-execute the analysis


Communalities (Initial)
FACES1  .650   FACES2  .538   FACES3  .615   FACES4  .575   FACES5  .594
FACES6  .405   FACES7  .582   FACES8  .702   FACES9  .501   FACES10 .501
FACES11 .511   FACES12 .379   FACES13 .410   FACES14 .427   FACES15 .462
FACES16 .619   FACES17 .699   FACES18 .708   FACES19 .574   FACES20 .504
FACES21 .617   FACES22 .663   FACES23 .723   FACES24 .361   FACES25 .665
FACES26 .586   FACES27 .534   FACES28 .515   FACES29 .489   FACES30 .582

Extraction Method: Principal Axis Factoring.

the initial communalities are all estimated using associated squared multiple correlations
and so they are the same as before

but communalities based on the extraction as well as the factor matrix are not produced
the procedure did not converge because communalities over 1 were generated
suggests that the EV-ONE rule is of questionable value for the ABC FACES items

Factor Matrix (a)
a. Attempted to extract 8 factors. In iteration 25, the communality of a variable exceeded 1.0. Extraction was terminated.


Communality Anomalies
communalities are by definition between 0 & 1
but factor extraction methods can generate communalities > 1
Heywood case: when a communality = 1
ultra-Heywood case: when a communality > 1

SAS has an option that changes any communalities > 1 to 1, allowing the iteration process to continue and so avoiding the convergence problems reported for SPSS
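The two anomalies, and the capping behavior described for SAS, can be sketched as follows. This is plain Python for illustration, not SAS code; the function names, the tolerance, and the communality values are hypothetical.

```python
def classify(communality, tol=1e-9):
    """Label a communality estimate as ok, a Heywood case, or an ultra-Heywood case."""
    if communality > 1.0 + tol:
        return "ultra-Heywood"
    if abs(communality - 1.0) <= tol:
        return "Heywood"
    return "ok"

def cap_at_one(communalities):
    """Change any communality > 1 to 1 so that an iteration could continue."""
    return [min(c, 1.0) for c in communalities]

# Hypothetical communality estimates from one iteration of an extraction.
estimates = [0.62, 1.0, 1.07]
labels = [classify(c) for c in estimates]
capped = cap_at_one(estimates)
print(labels)  # ['ok', 'Heywood', 'ultra-Heywood']
print(capped)  # [0.62, 1.0, 1.0]
```

Capping lets the computations proceed, but a capped communality still signals that the chosen # of factors or extraction method may not suit the data.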

EV-ONE Rule for CDI

in SPSS, run the PAF method for CDI items extracting the # of factors determined by the EV-ONE rule
re-enter Analyze/Data Reduction/Factor...
from "Variables:", first remove FACES1-FACES30 and then add in CDI1-CDI27
then re-execute the analysis

the EV-ONE rule selects 10 factors
PAF did not converge in the default # of 25 iterations
but the # of iterations can be increased
in "Extraction..." change "Maximum Iterations for Convergence:" to 200 (it did not converge at 100)

after more iterations, extraction is terminated because some communalities exceed 1
again the EV-ONE rule appears to be of questionable value

The Scree Plot

[Figure: scree plot of the eigenvalues against factor number (1-27)]

but the scree plot suggests that 1 may be a reasonable choice for the # of factors
which is the recommended # of scales for CDI

or maybe 4
since there is a bit of a drop between 4 and 5 factors



in SPSS, run the PAF method for the DQOLY items extracting the # of factors determined by the EV-ONE rule
re-enter Analyze/Data Reduction/Factor...
from "Variables:", replace CDI1-CDI27 by DQOLY1-DQOLY51
then re-execute the analysis

converges in 14 iterations
but the EV-ONE rule selects 15 factors
seems like far too many


The Scree Plot

[Figure: scree plot of the eigenvalues against factor number (1-51)]

the scree plot, though, suggests that 3 may be a reasonable choice for the # of factors
which is the recommended # of scales for DQOLY

perhaps a somewhat larger value might also be reasonable


EV-ONE Results Summary

the EV-ONE rule is the default approach in SPSS for choosing the # of factors
it generated quite large choices for the # of factors for the 3 instruments of the ABC data
10 for CDI, 8 for FACES, 15 for DQOLY
compared to recommended #'s: 1 for CDI, 2 for FACES, 3 for DQOLY
"it is not recommended, despite its wide use, because it tends to suggest too many factors" [11, p. 482]

also rules based on % explained variance can generate much different choices for the # of factors
"basically inapplicable as a device to determine the # of factors" [11, p. 483]

scree plots suggested much lower #'s of factors

at or close to recommended # of factors for all 3 instruments
but the scree plot approach is very subjective

how many factors to extract is not simply decided


The EV-ONE Rule in SAS

Preliminary Eigenvalues: Total = 16.6895 Average = 0.55631667

      Eigenvalue    Difference    Proportion    Cumulative
 1    7.96355571    5.64829078      0.4772        0.4772
 2    2.31526494    0.96513775      0.1387        0.6159
 3    1.35012718    0.26534277      0.0809        0.6968
 4    1.08478441    0.11094797      0.0650        0.7618
 5    0.97383643    0.12771977      0.0584        0.8201
 6    0.84611667    0.04494166      0.0507        0.8708
 7    0.80117501    0.09409689      0.0480        0.9188
 8    0.70707811    0.16561419      0.0424        0.9612
 9    0.54146393    0.08980179      0.0324        0.9936
10    0.45166214    0.09919908      0.0271        1.0207
11    0.35246306    0.06591199      0.0211        1.0418
12    0.28655107    0.02695638      0.0172        1.0590
13    0.25959468    0.04076514      0.0156        1.0745
14    0.21882954    0.10220557      0.0131        1.0877
15    0.11662397    0.01059049      0.0070        1.0946
16    0.10603348    0.03817049      0.0064        1.1010
17    0.06786300    0.03781255      0.0041        1.1051
18    0.03005045    0.02028424      0.0018        1.1069
19    0.00976621    0.02906814      0.0006        1.1075
20    -.01930193    0.03167315     -0.0012        1.1063
21    -.05097508    0.00325667     -0.0031        1.1032
22    -.05423176    0.08408754     -0.0032        1.1000
23    -.13831929    0.00822764     -0.0083        1.0917
24    -.14654693    0.03833370     -0.0088        1.0829
25    -.18488064    0.01482608     -0.0111        1.0718
26    -.19970672    0.02597538     -0.0120        1.0599
27    -.22568210    0.01184965     -0.0135        1.0464
28    -.23753175    0.00463868     -0.0142        1.0321
29    -.24217043    0.05182292     -0.0145        1.0176
30    -.29399335                   -0.0176        1.0000

4 factors will be retained by the MINEIGEN criterion.

using the 1-step PF method in SAS

the EV-ONE rule is applied to eigenvalues determined from the initial communalities
not always to the eigenvalues from the PC's as in SPSS

in SAS, eigenvalue-based rules can generate different choices for the # of factors when applied to different factor extraction methods
4 factors are generated in this case for the FACES items instead of 8 as in SPSS
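The count of 4 can be checked directly from the preliminary eigenvalues listed in the SAS output. The sketch below is only an illustration in Python (not SPSS or SAS code); the function name `count_over` is made up for this example.

```python
# Preliminary eigenvalues copied from the SAS output (1-step PF method, 30 FACES items).
eigenvalues = [
    7.96355571, 2.31526494, 1.35012718, 1.08478441, 0.97383643, 0.84611667,
    0.80117501, 0.70707811, 0.54146393, 0.45166214, 0.35246306, 0.28655107,
    0.25959468, 0.21882954, 0.11662397, 0.10603348, 0.06786300, 0.03005045,
    0.00976621, -0.01930193, -0.05097508, -0.05423176, -0.13831929, -0.14654693,
    -0.18488064, -0.19970672, -0.22568210, -0.23753175, -0.24217043, -0.29399335,
]


def count_over(eigs, cutoff=1.0):
    """Number of eigenvalues above the cutoff (the EV-ONE rule when cutoff is 1)."""
    return sum(1 for ev in eigs if ev > cutoff)


total = sum(eigenvalues)                    # SAS prints this as Total = 16.6895
n_retained = count_over(eigenvalues)        # factors kept by the EV-ONE rule
first_proportion = eigenvalues[0] / total   # Proportion column, first row

print(round(total, 4), n_retained, round(first_proportion, 4))
```

Because the eigenvalues come from a reduced correlation matrix (initial communalities on the diagonal), some are negative and the cumulative proportion passes 1 before returning to 1 at the last row.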

SPSS is primarily a menu-driven system
statistical analyses are readily requested using its point and click user interface

it does also have a programming interface

for more efficient execution of multiple analyses
with code which it calls "syntax"
executed in the syntax editor using the Run/All menu option

equivalent code for a menu-driven analysis can be generated using the "paste" button
here is code for the most recent analysis


The SAS Interface

SAS is a menu-driven system but it starts up in its programming interface
statistical analyses are requested by invoking its statistical procedures or PROCs
PROC PRINCOMP for PCA
PROC FACTOR for factor analysis

it also has a feature called Analyst for conducting menu-driven statistical analyses
click on Solutions/Analysis/Analyst to enter it

but not all statistical analyses are supported

Analyst supports PCA but not factor analysis

need to use the programming interface to conduct a factor analysis in SAS



the following code runs the 1-step PC method with # of factors determined by the EV-ONE rule applied to the FACES items
assuming they are in the default data set

to request the 1-step PC method, use "METHOD=PRINCIPAL" with "PRIORS=ONE" (i.e., set initial/prior communalities to 1)
to request the EV-ONE rule, use "MINEIGEN=1"
to request a specific # f of factors, replace "MINEIGEN=1" with "NFACTORS=f"
to request the 1-step PF method, change to "PRIORS=SMC" (i.e., estimate the initial/prior communalities using the Squared Multiple Correlations)
to iterate either of the above, change to "METHOD=PRINIT"
can use "MAXITER=m" to request more than the default of 30 iterations
adding "HEYWOOD" can avoid convergence problems


Setting the Number of Factors

SPSS provides 2 alternatives
choose "Eigenvalues over:" with the default of 1 or with some other value x
the default is to use the EV-ONE rule

or choose "Number of factors:" and provide a specific integer f (no more than the # of items)

SAS provides 3 alternatives

set "MINEIGEN=x" with x=1 to get the EV-ONE rule
set "NFACTORS=f" for a specific integer f
set "PERCENT=p" meaning the first so many factors whose combined eigenvalues explain over p% of the total variance
if none set, as many factors as there are items are extracted
if more than one set, the smallest such # is extracted

Part 3 Factor Extraction

survey of factor extraction methods
goodness of fit test and penalized likelihood criteria
factoring the correlation vs. the covariance matrix
generating factor scores
correlation/covariance residuals
sample size and sampling adequacy
missing values
example analyses

SPSS Factor Extraction Methods

7 different alternatives are supported in SPSS
principal component (1-step) + principal axis factoring (PAF)
PC-based factor extraction methods

unweighted least squares + generalized least squares

minimizing the sum of squared differences between the usual correlation estimates and the ones for the factor analysis model
with squared differences weighted in the generalized case

alpha factoring
maximizing the reliability (i.e., Cronbach's alpha) for the factors

maximum likelihood
treating the standardized items as multivariate normally distributed with factor analysis correlation structure

image factoring
Kaiser's image analysis of the image covariance matrix
matrix computed from the correlation matrix R and the diagonal elements of its inverse matrix; related to anti-image covariance matrix

SAS Factor Extraction Methods

9 different alternatives are supported in SAS
the PC and PF methods
with 1-step and iterated versions of both (4 PC-based methods)
PAF in SPSS is the same as the SAS iterated PF method

unweighted least squares

but not generalized least squares as in SPSS

alpha factoring
maximum likelihood
image component analysis

applying the PC method to the image covariance matrix
not the same as image factoring in SPSS but both use the image covariance matrix

Harris component analysis

uses a matrix computed from the correlation and covariance matrices

the results for some methods can be affected by how the initial communalities are estimated

Factor Extraction Alternatives

have demonstrated so far
PC method
PF method

will now demonstrate

alpha factoring
maximum likelihood (ML)

this covers the more commonly used methods [1,12]
will not demonstrate other available methods
described as lesser-used in [13, p. 362]

Cronbach's Alpha (α)
a measure of internal consistency reliability
is computed for each scale of an instrument separately
after reverse coding items when appropriate

by convention, an acceptable value is one that is at least .7 [12]

is often the only quantity used to assess established scales, and so it seems desirable for scales to have maximum α
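For reference, the usual formula for a scale with k items is α = k/(k-1) × (1 - sum of item variances / variance of the total score). The sketch below is a minimal Python illustration of that formula only. The response data are hypothetical and `cronbach_alpha` is a name made up for this example, not an SPSS or SAS function.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    items: list of equal-length lists, one list of responses per item
    (already reverse coded where appropriate).
    """
    k = len(items)
    totals = [sum(case) for case in zip(*items)]        # scale score per case
    sum_item_var = sum(variance(col) for col in items)  # sample variances of the items
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# hypothetical responses of 4 cases to a 2-item scale
item_a = [1, 2, 3, 4]
item_b = [2, 1, 4, 3]
alpha = cronbach_alpha([item_a, item_b])
print(round(alpha, 2))  # compare against the conventional cutoff of .7
```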

Alpha Factoring Example

in SPSS, run the alpha factoring method for the FACES items extracting the recommended # of 2 factors
re-enter Analyze/Data Reduction/Factor...
set "Variables:" to FACES1-FACES30
in "Extraction...", set "Method:" to "Alpha factoring", select "Number of factors:" and set it to 2
then re-execute the analysis



Factor Matrix (a)

Item       Factor 1   Factor 2
FACES1       .672       .075
FACES2       .582      -.016
FACES3      -.289       .423
FACES4       .465       .164
FACES5       .526       .079
FACES6       .335       .279
FACES7       .683      -.012
FACES8       .794      -.022
FACES9      -.265       .367
FACES10      .312       .384
FACES11      .364       .276
FACES12     -.292       .123
FACES13      .268       .130
FACES14      .204       .406
FACES15     -.224       .566
FACES16      .592       .162
FACES17      .652      -.015
FACES18      .676       .215
FACES19     -.458       .402
FACES20      .546       .130
FACES21      .518       .112
FACES22      .649       .004
FACES23      .663      -.172
FACES24     -.298       .236
FACES25     -.514       .504
FACES26      .705       .017
FACES27      .521       .190
FACES28     -.245       .352
FACES29     -.474       .302
FACES30      .610       .152

Extraction Method: Alpha Factoring. a. 2 factors extracted. 7 iterations required.

the matrix of loadings
FACES1 once again loads much more highly on the first factor
since .672 is much larger than .075
once again the loadings have changed only a little
from .702 and .110 for the PC method

Problems with Alpha Factoring

the alpha factoring method converged in only 7 iterations for 2 factors using the FACES items
however, it does not converge for 1 or 3 factors using the FACES items
even with the # of iterations set to 1000
it seems to be cycling, never getting close to a solution

for the CDI items, it does not converge for 1, 2, or 3 factors
for DQOLY, it does not converge for 1 or 3 factors, but does converge for 2 factors
the alpha factoring method seems very unreliable
even when it works, its optimal properties are lost following rotation [11, p. 482]

Maximum Likelihood Example

in SPSS, run the ML method for the FACES items extracting the recommended # of 2 factors
re-enter Analyze/Data Reduction/Factor...
in "Extraction...", change "Method:" to "Maximum likelihood"
then re-execute the analysis

estimates the correlation matrix R using its most likely value given the observed data
assuming R has factor analysis structure
and that item values are normally distributed or at least approximately so [1]

Factor Matrix (a)

Item       Factor 1   Factor 2
FACES1       .692       .114
FACES2       .590      -.043
FACES3      -.328       .491
FACES4       .406      -.015
FACES5       .512       .003
FACES6       .321       .226
FACES7       .643       .026
FACES8       .769      -.004
FACES9      -.317       .230
FACES10      .314       .472
FACES11      .354       .091
FACES12     -.314       .033
FACES13      .279       .173
FACES14      .150       .240
FACES15     -.222       .426
FACES16      .614       .282
FACES17      .632      -.064
FACES18      .678       .316
FACES19     -.484       .488
FACES20      .536       .191
FACES21      .537       .157
FACES22      .644       .147
FACES23      .670      -.096
FACES24     -.298       .241
FACES25     -.538       .596
FACES26      .720       .151
FACES27      .486       .125
FACES28     -.218       .300
FACES29     -.482       .330
FACES30      .634       .235

Extraction Method: Maximum Likelihood. a. 2 factors extracted. 5 iterations required.

the matrix of loadings
FACES1 once again loads much more highly on the first factor
since .692 is much larger than .114
the loadings have changed, but only a little
from .702 and .110 for the PC method

all 4 extraction methods generate similar loadings, at least for FACES1


Goodness of Fit Test

for the ML method, it is possible to test how well the factor analysis model fits the data
H0: the correlation matrix R equals the one based on 2 factors vs. Ha: it does not
p-value = .000 < .05 is significant
so reject H0, but would like not to reject

Goodness-of-fit Test (2 factors)
Chi-Square    df     Sig.
572.052       376    .000

can search for the first # of factors for which this test becomes nonsignificant
significant for 7 factors
nonsignificant for 8 factors
but this is not close to the recommended # of 2 factors

Goodness-of-fit Test (7 factors)
Chi-Square    df     Sig.
290.767       246    .026

Goodness-of-fit Test (8 factors)
Chi-Square    df     Sig.
250.667       223    .098
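The df values in these three tests follow the standard formula for the ML factor model with p items and k factors, df = ((p - k)^2 - (p + k)) / 2. A short Python check for the p = 30 FACES items (the function name `ml_fit_df` is just for this illustration):

```python
def ml_fit_df(p, k):
    """Degrees of freedom of the ML goodness-of-fit test for p items and k factors."""
    return ((p - k) ** 2 - (p + k)) // 2


# 30 FACES items with 2, 7, and 8 factors
for k in (2, 7, 8):
    print(k, ml_fit_df(30, k))
```

Each added factor uses up degrees of freedom, which is one reason the test becomes easier to pass as the # of factors grows.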


Maximum Likelihood in SAS

get the same loadings as for SPSS
use "METHOD=ML" with "PRIORS=SMC" (to estimate the initial/prior communalities using the squared multiple correlations)

but the goodness of fit test is replaced by a similar test

seems to be something like a one-sided version of the test in SPSS
with alternative hypothesis that more than the current # of factors are required
but 8 is also the first # of factors for which this test is nonsignificant (but at p=.0894 compared to p=.098 in SPSS)

Test                              DF     Chi-Square    Pr > ChiSq
H0: 8 Factors are sufficient      223    251.8939      0.0894
HA: More factors are needed

in any case, this test tends to generate "more factors than are practical" [11, p. 479]

Penalized Likelihood Criteria

SAS generates 2 penalized likelihood criteria
for selecting between alternative models
models with more parameters have larger likelihoods, so offset this with more of a penalty for more parameters
and transform so that smaller values indicate better models

AIC (Akaike's Information Criterion)

penalty based on the # of parameters

BIC (Schwarz's Bayesian Information Criterion)

penalty based on the # of observations/cases as well as the # of parameters

neither is available in SPSS

the AIC option in SPSS syntax requests display of the anti-image covariance matrix
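In their generic textbook form the two criteria are AIC = -2 log L + 2q and BIC = -2 log L + q ln(n), for q parameters and n observations. The sketch below illustrates only these generic forms with hypothetical log-likelihoods. The values SAS prints may be computed on a different scale, so this is not meant to reproduce PROC FACTOR output.

```python
import math


def aic(log_lik, n_params):
    """Generic AIC: smaller is better; the penalty is 2 per parameter."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Generic BIC: the penalty per parameter is ln(n_obs)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# hypothetical fits: (label, log-likelihood, # of parameters)
models = [("small model", -120.0, 10), ("large model", -112.0, 20)]
n_obs = 100  # hypothetical # of cases

best_aic = min(models, key=lambda m: aic(m[1], m[2]))[0]
best_bic = min(models, key=lambda m: bic(m[1], m[2], n_obs))[0]
print(best_aic, "/", best_bic)
```

Since ln(n) exceeds 2 once n is 8 or more, BIC penalizes each extra parameter more heavily than AIC for any realistic sample size, so it tends to favor smaller models.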


Results for AIC/BIC

the following are the values for k=8 factors
Akaike's Information Criterion    -146.66197
Schwarz's Bayesian Criterion      -734.20653

an AIC (BIC) value does not mean anything by itself
it needs to be compared to AIC (BIC) values for other models

the minimum AIC is achieved at 9 factors

seems too large
"AIC tends to include factors that are statistically significant but inconsequential for practical purposes" [14, p. 1336]

the minimum BIC is achieved at 2 factors

the only approach so far to select the recommended # of factors
"seems to be less inclined to include trivial factors" [14, p. 1336]


The Matrix Being Factored

by default, SPSS/SAS factor the correlation matrix R
factoring the standardized items z
for y's, subtract means, divide by standard deviations, then factor

the most commonly used approach

both have an option to factor the covariance matrix

in SPSS, click on "Covariance matrix" in "Extraction..."
in SAS, add "COVARIANCE" to PROC FACTOR statement

factoring the centered items instead

for y's, subtract means, then factor

so the total variance is now the sum of the variances for all the items
and the EV-ONE rule should not be used
only works with some factor extraction methods

SAS also allows factoring without subtracting means

with or without dividing y's by standard deviations
add "NOINT" to PROC FACTOR statement

Factoring a Covariance Matrix

in SPSS, run the PAF method on the covariance matrix for the FACES items extracting the recommended # of 2 factors
re-enter Analyze/Data Reduction/Factor...
in "Extraction...", change "Method:" to "Principal axis factoring" and turn on "Covariance matrix"
then re-execute the analysis

SPSS generates 2 types of output

"raw" output is for the (raw) covariance matrix
"rescaled" output is for the correlation matrix obtained by rescaling results for the covariance matrix
in SAS, "weighted" is the same as "raw" in SPSS (i.e., the covariance matrix is a weighted correlation matrix) while "unweighted" is the same as "rescaled"
the SPSS/SAS manuals do not provide details on factoring a covariance matrix, so the above is a best guess

Factor Matrix (a)

                  Raw                  Rescaled
Item       Factor 1   Factor 2    Factor 1   Factor 2
FACES1       .585       .094        .679       .109
FACES2       .610      -.035        .585      -.034
FACES3      -.381       .611       -.319       .511
FACES4       .515       .087        .445       .075
FACES5       .654       .063        .535       .051
FACES6       .447       .363        .334       .271
FACES7       .700       .004        .659       .004
FACES8       .878      -.020        .787      -.018
FACES9      -.349       .372       -.295       .315
FACES10      .400       .567        .318       .451
FACES11      .426       .220        .369       .191
FACES12     -.313       .068       -.304       .066
FACES13      .311       .158        .278       .141
FACES14      .185       .353        .173       .330
FACES15     -.250       .533       -.226       .483
FACES16      .640       .254        .604       .240
FACES17      .654      -.033        .625      -.031
FACES18      .704       .271        .659       .253
FACES19     -.580       .551       -.471       .447
FACES20      .579       .201        .532       .184
FACES21      .492       .124        .531       .134
FACES22      .710       .108        .650       .099
FACES23      .660      -.141        .666      -.143
FACES24     -.366       .294       -.304       .243
FACES25     -.556       .583       -.530       .555
FACES26      .739       .103        .707       .098
FACES27      .585       .172        .508       .150
FACES28     -.250       .363       -.217       .314
FACES29     -.500       .332       -.474       .315
FACES30      .673       .231        .618       .212

Extraction Method: Principal Axis Factoring. a. 2 factors extracted. 6 iterations required.

the matrix of loadings

use the rescaled loadings to be consistent with prior analyses
these are the only ones reported by SAS

FACES1 once again loads much more highly on the first factor
since .679 is much larger than .109
the loadings have changed, but only a little
from .683 and .107 for the PAF method applied to the correlation matrix

does not appear to be much of an impact to factoring the covariance matrix vs. the correlation matrix

Generating the Factor Scores

factors identified by factor analysis have construct validity if they predict certain related variables
this can be assessed using the factor scores
which are estimates of the values of the factors for each of the observations/cases in the data set
first generate factor score variables
in SPSS, click on "Scores..." and turn on "Save as variables"
variables are added at the end of the data set called FAC1_1, FAC2_1, etc.
in SAS, add the "SCORE" option to the PROC FACTOR statement and specify a new data set name using the "OUT=" option
a new data set is created with the specified name containing everything in the source data set plus variables called Factor1, Factor2, etc.

then use these variables as predictors in regression models of appropriate outcome variables

Correlation Residuals
how much correlations generated by the factor analysis model differ from standard estimates of the correlations
measures how well the model fits correlations between items
when the covariance matrix is factored, covariance residuals are generated instead
to generate correlation residuals
in SAS
add the "RESIDUALS" option to the PROC FACTOR statement to generate listings of these residuals
further adding the "OUTSTAT=" option gives a name to an output data set containing among other things the correlation residuals for further analysis
in SPSS, use "Reproduced" for the "Correlation matrix" option of "Descriptives..." to generate a listing of residuals

these do not directly address the issue of whether the values for the items are reasonably treated as close to normally distributed or if any are outlying
item residuals address this issue
such results are reported later
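For orthogonal (uncorrelated) factors, the correlation between two different items reproduced by the model is the sum over factors of the products of their loadings. The residual is the observed correlation minus that value. A minimal Python sketch using the unrotated ML loadings reported for FACES1 and FACES2; the observed correlation below is hypothetical, since the item correlations are not listed here.

```python
def reproduced_correlation(loadings_i, loadings_j):
    """Model-implied correlation between two distinct items (orthogonal factors)."""
    return sum(a * b for a, b in zip(loadings_i, loadings_j))


faces1 = (0.692, 0.114)   # ML loadings on factors 1 and 2
faces2 = (0.590, -0.043)

observed_r = 0.45         # hypothetical observed correlation, for illustration only
reproduced_r = reproduced_correlation(faces1, faces2)
residual = observed_r - reproduced_r

print(round(reproduced_r, 3), round(residual, 3))
```

Residuals near 0 for most item pairs indicate that the chosen # of factors accounts for the correlations well.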

Sample Size Considerations

sample sizes for planned factor analyses are based on conventional guidelines
not on formal power analyses
recommendations for the sample size vary from 3 to 10 times the # of items and at least 100 [8,13,14]
higher values seem more important for development of new scales than for assessment of established scales

for the ABC data, there are only 3.8, 3.4, and 2.0 observations per item for the CDI, FACES, and DQOLY items, respectively
relatively low values especially for DQOLY
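The three ratios are simply the # of cases divided by the # of items. The Python sketch below assumes n = 103 cases; that value is not stated here and is only an assumption chosen because it reproduces all three reported ratios.

```python
n_cases = 103  # assumption: consistent with the reported ratios, not taken from the slides

n_items = {"CDI": 27, "FACES": 30, "DQOLY": 51}
ratios = {name: round(n_cases / k, 1) for name, k in n_items.items()}

for name, ratio in ratios.items():
    # guidelines cited on the slide: 3 to 10 cases per item and at least 100 cases
    print(name, ratio, "meets 3 per item" if ratio >= 3 else "below 3 per item")
```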

Measure of Sampling Adequacy

possible to assess the sampling adequacy of existing data using the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (MSA)
a summary of how small partial correlations are relative to ordinary correlations
values at least .8 are considered good
values under .5 are considered unacceptable
in SPSS, click on "Descriptives..." and set "KMO and Bartlett's test of sphericity" on
in SAS, add the "MSA" option to the PROC FACTOR statement
calculates overall MSA value + MSA values for each item

also get Bartlett's test of sphericity in SPSS

in SAS, it is only generated for the ML method

H0: the standardized items are independent (0 factor model)
Ha: they are not (i.e., there is at least 1 factor)

Results for the ABC Data

observed sampling adequacy
.778 for FACES
.725 for CDI
.699 for DQOLY
ABC items are somewhat adequate (>.5) but not good (<.8)

KMO and Bartlett's Test (FACES)
Kaiser-Meyer-Olkin Measure of Sampling Adequacy.         .778
Bartlett's Test of Sphericity    Approx. Chi-Square      1365.068
                                 df                      435
                                 Sig.                    .000

KMO and Bartlett's Test (CDI)
Kaiser-Meyer-Olkin Measure of Sampling Adequacy.         .725
Bartlett's Test of Sphericity    Approx. Chi-Square      920.324
                                 df                      351
                                 Sig.                    .000

KMO and Bartlett's Test (DQOLY)
Kaiser-Meyer-Olkin Measure of Sampling Adequacy.         .699
Bartlett's Test of Sphericity    Approx. Chi-Square      2911.235
                                 df                      1275
                                 Sig.                    .000

Bartlett's test of sphericity

H0: independent standardized items
Ha: they are not
p = .000 for all 3 cases
all three sets of standardized items are distinctly correlated and so require at least 1 factor

however, this test is not considered of value [11, p.469]
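The df reported for Bartlett's test is the # of distinct item pairs, p(p - 1)/2, which can be used to match each output table to its instrument. A small Python check (the function name is made up for this illustration):

```python
def bartlett_df(p):
    """Degrees of freedom of Bartlett's test of sphericity for p items."""
    return p * (p - 1) // 2


for name, p in (("FACES", 30), ("CDI", 27), ("DQOLY", 51)):
    print(name, bartlett_df(p))
```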


Missing Values
by default, SPSS (SAS) deletes any cases (observations) with missing values for any of the items
SPSS supports
"Exclude cases listwise", the default option
"Exclude cases pairwise"
calculating correlations between pairs of items using all cases with non-missing values for both items
can generate very unreliable estimates so best not to use

"Replace with mean"

replace missing values for an item with the average of all the nonmissing values for that item

SAS provides no other options

but can first impute values using PROC MI (for multiple imputation)

Missing Item Value Imputation

many instruments do not provide missing value guidelines
when they do, they usually suggest replacing missing item values with averages of the nonmissing item values for a case
averaging values of the other items for that case rather than values of the other cases for that item
so different from the SPSS "Replace with mean" option

as long as there aren't too many items with missing values for that case
e.g., if at least 50% or 70% of the item values are not missing
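The difference between the two kinds of mean replacement can be sketched in Python as follows. This is an illustration only: the function names, the use of None for a missing value, and the example responses are all assumptions for this sketch.

```python
def impute_case_mean(responses, min_fraction=0.5):
    """Replace a case's missing items (None) with the mean of that case's answered items.

    Returns None when fewer than min_fraction of the items were answered,
    i.e., when the case has too many missing values to impute.
    """
    answered = [r for r in responses if r is not None]
    if not answered or len(answered) < min_fraction * len(responses):
        return None
    mean = sum(answered) / len(answered)
    return [mean if r is None else r for r in responses]


def impute_item_mean(column):
    """Replace an item's missing values with the mean over the other cases
    (the idea behind "Replace with mean")."""
    answered = [r for r in column if r is not None]
    mean = sum(answered) / len(answered)
    return [mean if r is None else r for r in column]


print(impute_case_mean([1, None, 3, 2]))           # one case, 4 items
print(impute_case_mean([None, None, None, 4]))     # too few answered items
print(impute_item_mean([1, None, 3, 2]))           # one item, 4 cases
```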

Part 4 Factor Rotation

marker items, allocating items to factors/scales, discarding items
varimax rotation, normalization, testing for significant loadings
orthogonal vs. oblique rotations, survey of alternative rotation approaches
promax rotation, inter-factor correlations, the structure matrix
impact of rotations
reverse coding
example analyses

Marker Items for Factors

item z is considered a marker item (or a salient) for factor F if its absolute loading on F is high while its absolute loadings on all the other factors F′ are low
the absolute loading is the loading with its sign removed
when discussing this, authors often ignore the issue of negative loadings, but in general signs of loadings need to be accounted for

what is meant by high?

typically, an absolute loading at or above a cutoff value, like 0.3, 0.35, 0.4, or 0.5 [8,15,16], is considered high while anything below that is considered low
at least 0.3 at a minimum; at least 0.5 usually better [11]

if some factors have small #'s of marker items, the # of factors may have been set too high
at least 2 [11] or 3 [8,13] items per factor is desirable

Item-Scale Allocation
when developing scales for a new instrument, the items are usually separated into disjoint sets consisting of the marker items for each factor and used to compute associated scales
marker items represent distinct aspects of associated factors and are the basis for assigning scales meaningful names

items that have high absolute loadings on more than one factor are usually discarded [8]
they do not represent distinct aspects of only one factor

items that have low absolute loadings on all factors should then also be discarded
they do not represent distinct aspects of any factor
most authors ignore this issue, but it does happen quite often in practice

General vs. Group Factors

should all items load on all factors or not?
general factors are those with all items loading on them
this is assumed in the standard factor analysis model

group factors are those with associated subsets of items loading on them
this is the basis for item-scale allocation rules

"not everyone agrees that general factors are undesirable" [11, p. 503]

instruments which partition their items into disjoint sets corresponding to marker items are assuming that all the factors are distinct group factors
instruments that use all items to compute all the scales are assuming the factors are all general factors
e.g., the PCS and MCS scales of the MOS SF-36 are computed from all 35 items used in scale construction
but these items are first partitioned into disjoint groups and used to compute associated subscales

the interpretation of factors through their marker items can be difficult if based on the loadings generated directly by factor extraction
rotated loadings are typically used instead
these are thought to be more readily interpretable

varimax is the most popular approach [8,12]

it attempts to minimize the # of z's that load highly on each of the factors

but there are a variety of other ways to rotate loadings


Varimax Rotation for FACES

in SPSS, run the ML method for the FACES items extracting the recommended # of 2 factors and rotate loadings using varimax rotation
re-enter Analyze/Data Reduction/Factor...
in "Extraction...", change "Method:" to "Maximum likelihood"
note there is no option for which type of matrix to factor
it does not matter for ML
the ML estimate of the correlation matrix induces the ML estimate of the covariance matrix and vice versa
in "Rotation...", click on "Varimax"
note the default rotation setting is "None"
then re-execute the analysis


Rotating the Initial Loadings

Factor Matrix (a): the unrotated ML loadings of the 30 FACES items on 2 factors, the same matrix as in the Maximum Likelihood Example (e.g., FACES1 loads .692 on factor 1 and .114 on factor 2)

the matrix of loadings

with 30 rows and 2 columns

is multiplied on the right by the factor transformation matrix

with 2 rows and 2 columns
the one below is produced by varimax

to produce the rotated factor matrix

will also have 30 rows and 2 columns

same process for any rotation scheme but using a different transformation matrix

Factor Transformation Matrix
Factor       1        2
1          .844    -.536
2          .536     .844
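The multiplication can be checked by hand for individual items. The Python sketch below assumes the transformation matrix has rows (.844, -.536) and (.536, .844), i.e., that the extracted matrix reads column by column; this is the layout that reproduces the rotated loadings reported for FACES1 (.645, -.274) and FACES3 (-.014, .591) up to rounding.

```python
# factor transformation matrix produced by varimax (layout is an assumption, see above)
T = [[0.844, -0.536],
     [0.536,  0.844]]

# unrotated ML loadings for two of the FACES items
unrotated = {"FACES1": (0.692, 0.114), "FACES3": (-0.328, 0.491)}


def rotate(row, t):
    """Multiply one row of loadings on the right by the transformation matrix."""
    return tuple(sum(row[k] * t[k][j] for k in range(len(row)))
                 for j in range(len(t[0])))


rotated = {item: rotate(row, T) for item, row in unrotated.items()}
for item, row in rotated.items():
    print(item, [round(v, 3) for v in row])
```

The rows of T have (approximately) unit length and are orthogonal to each other, which is why varimax, an orthogonal rotation, leaves each item's communality unchanged.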

Extraction Method: Maximum Likelihood. a. 2 factors extracted. 5 iterations required.


Extraction Method: Maximum Likelihood. Rotation Method: Varimax with Kaiser Normalization.

Varimax Rotated Loadings

Rotated Factor Matrix (a)

Item       Factor 1   Factor 2
FACES1       .645      -.274
FACES2       .475      -.353
FACES3      -.014       .591
FACES4       .335      -.230
FACES5       .434      -.272
FACES6       .392       .018
FACES7       .557      -.322
FACES8       .647      -.415
FACES9      -.144       .364
FACES10      .518       .230
FACES11      .347      -.113
FACES12     -.247       .196
FACES13      .329      -.004
FACES14      .255       .123
FACES15      .041       .478
FACES16      .669      -.091
FACES17      .499      -.393
FACES18      .742      -.096
FACES19     -.146       .671
FACES20      .555      -.126
FACES21      .538      -.156
FACES22      .623      -.221
FACES23      .514      -.440
FACES24     -.122       .363
FACES25     -.134       .792
FACES26      .688      -.259
FACES27      .477      -.155
FACES28     -.024       .370
FACES29     -.230       .536
FACES30      .661      -.142

the matrix of rotated loadings

with 30 rows and 2 columns

FACES1 once again loads more highly on the first factor

since .645 is larger than .274 in absolute value
loadings have changed quite a bit
from .702 and .110 for the PC method

especially the loading (-.274) on factor 2 which is now negative

Extraction Method: Maximum Likelihood. Rotation Method: Varimax with Kaiser Normalization. a. Rotation converged in 3 iterations.


Reallocated Explained Variance

the same percentage of total variance (32.814%) is explained using rotated loadings as with unrotated loadings
but it is allocated differently to the factors
factor 2's contribution has increased from 6.937% to 12.107%
while factor 1's contribution has decreased from 25.877% to 20.707%

Total Variance Explained

          Initial Eigenvalues
Factor    Total    % of Variance    Cumulative %
1         8.360       27.867           27.867
2         2.777        9.255           37.122
3         1.804        6.012           43.134
4         1.593        5.309           48.443
5         1.413        4.712           53.155
6         1.305        4.350           57.505
7         1.266        4.221           61.726
8         1.150        3.835           65.560
9          .984        3.279           68.839
10         .898        2.992           71.831
11         .818        2.726           74.557
12         .770        2.567           77.124
13         .708        2.359           79.484
14         .681        2.268           81.752
15         .583        1.945           83.697
16         .563        1.876           85.573
17         .519        1.731           87.304
18         .481        1.604           88.908
19         .453        1.509           90.417
20         .407        1.357           91.774
21         .381        1.270           93.043
22         .361        1.204           94.248
23         .310        1.035           95.282
24         .280         .933           96.215
25         .251         .836           97.051
26         .226         .752           97.803
27         .209         .697           98.500
28         .192         .641           99.141
29         .155         .516           99.657
30         .103         .343          100.000

          Extraction Sums of Squared Loadings        Rotation Sums of Squared Loadings
Factor    Total    % of Variance    Cumulative %     Total    % of Variance    Cumulative %
1         7.763       25.877           25.877        6.212       20.707           20.707
2         2.081        6.937           32.814        3.632       12.107           32.814

Extraction Method: Maximum Likelihood.
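The percentages are the sums of squared loadings divided by the # of items (30, the total variance of the standardized items). A quick Python check that rotation only reallocates the explained variance between the 2 factors (variable names are illustrative):

```python
n_items = 30                   # total variance of the 30 standardized FACES items

extraction_ssl = [7.763, 2.081]   # sums of squared loadings before rotation
rotation_ssl = [6.212, 3.632]     # sums of squared loadings after varimax


def percent_of_variance(ssl):
    """% of total variance explained by each factor."""
    return [100.0 * s / n_items for s in ssl]


before = percent_of_variance(extraction_ssl)
after = percent_of_variance(rotation_ssl)

print([round(p, 3) for p in before], round(sum(before), 3))
print([round(p, 3) for p in after], round(sum(after), 3))
```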

Sorting the Rotated Loadings

to more readily allocate items to factors, have items displayed in sorted order based on their loadings
in SPSS
re-enter Analyze/Data Reduction/Factor...
in "Options...", click on "Sorted by size"
then re-execute the analysis
in SAS
add the "REORDER" option to the PROC FACTOR statement


Sorted Rotated Loadings

Rotated Factor Matrix (a)

Item       Factor 1   Factor 2
FACES18      .742      -.096
FACES26      .688      -.259
FACES16      .669      -.091
FACES30      .661      -.142
FACES8       .647      -.415
FACES1       .645      -.274
FACES22      .623      -.221
FACES7       .557      -.322
FACES20      .555      -.126
FACES21      .538      -.156
FACES10      .518       .230
FACES23      .514      -.440
FACES17      .499      -.393
FACES27      .477      -.155
FACES2       .475      -.353
FACES5       .434      -.272
FACES6       .392       .018
FACES11      .347      -.113
FACES4       .335      -.230
FACES13      .329      -.004
FACES14      .255       .123
FACES12     -.247       .196
FACES25     -.134       .792
FACES19     -.146       .671
FACES3      -.014       .591
FACES29     -.230       .536
FACES15      .041       .478
FACES28     -.024       .370
FACES9      -.144       .364
FACES24     -.122       .363

column 1 values decrease in absolute value from FACES18 to FACES12 while remaining larger in absolute value than column 2 values
so 22 load more on factor 1: 18,26,...,12

after that column 2 values decrease in absolute value while remaining larger in absolute value than column 1 values
other 8 load more on factor 2: 25,19,3,29,15,28,9,24

need to know what the items are in order to interpret these results
item 12 is the only item with maximum absolute loading for a negative loading
suggesting it will need reverse coding
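The allocation described above (each item goes to the factor with its larger absolute loading, and a negative largest loading flags possible reverse coding) can be sketched in Python. The loadings are the varimax rotated loadings for FACES, keyed by item number; the function name `allocate` is made up for this illustration.

```python
# varimax rotated loadings (factor 1, factor 2) for FACES1-FACES30
ROTATED = {
    1: (.645, -.274), 2: (.475, -.353), 3: (-.014, .591), 4: (.335, -.230),
    5: (.434, -.272), 6: (.392, .018), 7: (.557, -.322), 8: (.647, -.415),
    9: (-.144, .364), 10: (.518, .230), 11: (.347, -.113), 12: (-.247, .196),
    13: (.329, -.004), 14: (.255, .123), 15: (.041, .478), 16: (.669, -.091),
    17: (.499, -.393), 18: (.742, -.096), 19: (-.146, .671), 20: (.555, -.126),
    21: (.538, -.156), 22: (.623, -.221), 23: (.514, -.440), 24: (-.122, .363),
    25: (-.134, .792), 26: (.688, -.259), 27: (.477, -.155), 28: (-.024, .370),
    29: (-.230, .536), 30: (.661, -.142),
}


def allocate(loadings):
    """Map each item to (factor #, needs_reverse) using its largest absolute loading."""
    result = {}
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        result[item] = (best + 1, row[best] < 0)
    return result


allocation = allocate(ROTATED)
on_factor1 = [i for i, (f, _) in allocation.items() if f == 1]
on_factor2 = [i for i, (f, _) in allocation.items() if f == 2]
reverse_candidates = [i for i, (_, rev) in allocation.items() if rev]

print(len(on_factor1), len(on_factor2), reverse_candidates)
```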

Extraction Method: Maximum Likelihood. Rotation Method: Varimax with Kaiser Normalization. a. Rotation converged in 3 iterations.

Discarding Items
using 0.3 as the cutoff for low/high loadings in the sorted rotated factor matrix shown above

items 8,7,23,17,2 have both absolute loadings > 0.3
items 14,12 have both absolute loadings < 0.3
suggests discarding 7 items

using 0.4 instead

only items 8,23 now have both loadings high
but items 6,11,4,13,14,12,28,9,24 now have both loadings low
suggests discarding 11 items

FACES is an established instrument so it seems inappropriate to discard its items

but if so many items of an established instrument can be considered of negligible value, perhaps meaningful items can be discarded when developing new scales
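These counts can be reproduced from the varimax rotated loadings with a few lines of Python. This is a sketch of the cutoff rule only; the function name `discard_candidates` is made up for this illustration.

```python
# varimax rotated loadings (factor 1, factor 2) for FACES1-FACES30
ROTATED = {
    1: (.645, -.274), 2: (.475, -.353), 3: (-.014, .591), 4: (.335, -.230),
    5: (.434, -.272), 6: (.392, .018), 7: (.557, -.322), 8: (.647, -.415),
    9: (-.144, .364), 10: (.518, .230), 11: (.347, -.113), 12: (-.247, .196),
    13: (.329, -.004), 14: (.255, .123), 15: (.041, .478), 16: (.669, -.091),
    17: (.499, -.393), 18: (.742, -.096), 19: (-.146, .671), 20: (.555, -.126),
    21: (.538, -.156), 22: (.623, -.221), 23: (.514, -.440), 24: (-.122, .363),
    25: (-.134, .792), 26: (.688, -.259), 27: (.477, -.155), 28: (-.024, .370),
    29: (-.230, .536), 30: (.661, -.142),
}


def discard_candidates(loadings, cutoff):
    """Items loading highly on more than one factor, and items loading highly on none."""
    multiple = [i for i, row in loadings.items()
                if sum(abs(v) >= cutoff for v in row) > 1]
    none = [i for i, row in loadings.items()
            if all(abs(v) < cutoff for v in row)]
    return multiple, none


for cutoff in (0.3, 0.4):
    multiple, none = discard_candidates(ROTATED, cutoff)
    print(cutoff, multiple, none, len(multiple) + len(none))
```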


Normalizing Before Rotating

by default, SPSS/SAS normalize the factor matrix prior to rotating it to reduce computational problems
both use Kaiser normalization as the default
dividing each row of the factor matrix by the square root of the sum of squares of the values in the row (i.e., by the square root of the item's communality)

SPSS also supports the case of no normalization

but this can only be selected using the programming interface, not with the menu-driven interface

SAS supports requests for the following

Kaiser normalization
no normalization
the Cureton-Mulaik weighting technique
rescaling rows to represent covariances rather than correlations
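Kaiser normalization scales each row of the factor matrix to unit length before rotation and restores the original row lengths afterwards. A minimal Python sketch of those two steps follows; the rotation itself is omitted and the function names are made up for this illustration.

```python
import math


def kaiser_normalize(loadings):
    """Scale each row to unit length; return the scaled rows and the original row lengths."""
    lengths = [math.sqrt(sum(v * v for v in row)) for row in loadings]
    scaled = [[v / h if h > 0 else 0.0 for v in row]
              for row, h in zip(loadings, lengths)]
    return scaled, lengths


def denormalize(rows, lengths):
    """Undo the normalization after the rotation has been applied."""
    return [[v * h for v in row] for row, h in zip(rows, lengths)]


# unrotated ML loadings for FACES1 and FACES3
loadings = [[0.692, 0.114], [-0.328, 0.491]]
scaled, lengths = kaiser_normalize(loadings)
restored = denormalize(scaled, lengths)

print([round(h * h, 3) for h in lengths])   # communalities of the two items
```

With this scaling every item carries equal weight in the rotation criterion, rather than items with larger communalities dominating it.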


Testing Loadings in SAS

Rotated Factor Pattern
With 95% confidence limits; Cover 0?
Estimate/StdErr/LowerCL/UpperCL/Coverage Display

              Factor1     Factor2
FACES1        0.64504    -0.27420
              0.06450     0.09932
              0.50072    -0.45570
              0.75446    -0.07080
              0[]         []0

FACES30       0.66117    -0.14169
              0.06204     0.10117
              0.52184    -0.33193
              0.76614     0.05963
              0[]         [0]

SAS has an option to test for significantly nonzero loadings

add "COVER" to test if loadings equal zero or not
"COVER=p" tests for loadings equal to p
by default p=0

"[0]" means 0 is in the 95% confidence interval for a loading
"0[]" or "[]0" means it is not (0 lies below or above the interval)

FACES1 loads on both factors
FACES30 loads on factor 1 but not factor 2
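The coverage display can be mimicked in Python from the confidence limits alone. The symbol convention below ("0[]" when 0 lies below the interval, "[]0" when it lies above, "[0]" when it is covered) is inferred from the output shown for FACES1 and FACES30 and should be treated as an assumption; `coverage_display` is a made-up name, not a SAS function.

```python
def coverage_display(lower, upper, value=0.0):
    """Return the coverage symbol for a confidence interval [lower, upper]."""
    if value < lower:
        return "0[]"   # value lies below the interval: loading significantly above it
    if value > upper:
        return "[]0"   # value lies above the interval: loading significantly below it
    return "[0]"       # value is covered: loading not significantly different from it


# 95% confidence limits from the SAS output
limits = {
    ("FACES1", 1): (0.50072, 0.75446),
    ("FACES1", 2): (-0.45570, -0.07080),
    ("FACES30", 1): (0.52184, 0.76614),
    ("FACES30", 2): (-0.33193, 0.05963),
}
display = {key: coverage_display(lo, hi) for key, (lo, hi) in limits.items()}
for key, symbol in display.items():
    print(key, symbol)
```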


Significant Loadings for FACES

significant rotated loadings (using varimax rotation)
0 items load on neither factor
all FACES items appear to be of distinct value

11 items load only on factor 1 and 7 only on factor 2
12 items load on both factors, so 40% of the items address both factors
adaptability and cohesion are likely to be highly correlated

comparison to the recommended scales

adaptability based on 14 even items other than item 30
12 of these load highly on factor 1
items 24,28 load highly only on factor 2

cohesion based on the 15 odd items + item 30

11 of these load highly on factor 2
items 11,13,21,27,30 load highly only on factor 1

identified factors appear to be distinctly different from standard FACES constructs with 23.3% (7/30) inconsistent items

Varimax Rotation for DQOLY

in SPSS, run the ML method for the DQOLY items extracting the recommended # of 3 factors and rotate loadings using varimax rotation
re-enter Analyze/Data Reduction/Factor...
change "Variables:" to DQOLY1-DQOLY51
in "Extraction..." change "Number of factors:" to 3
then re-execute the analysis

will not consider rotations of CDI1-CDI27 since the recommended # of scales is 1 and so rotations are unnecessary

Using Sorted Rotated Loadings

assigning items to factors
18 load more on factor 1: 35-51, 7
26 load more on factor 2: 1-6, 8-23, 31-34
7 load more on factor 3: 24-30
need to know what the items are in order to interpret these results

items that are possibly discardable

using 0.3 as the cutoff for low/high
40,34 load on more than 1 factor while 7,15,16 load on 0 factors
suggests discarding 5 items

using 0.4 as the cutoff for low/high

0 items load on > 1 factor, but 10 load on no factors: 7,8,9,19,17,14,31,12,15,16
suggests discarding 10 items


Significant Loadings for DQOLY

significant rotated loadings (using varimax rotation)
0 items load on 0 factors
all DQOLY items appear to be of distinct value

30 items load on exactly 1 factor
19 items load on exactly 2 factors, 3 on all 3 factors
so 43% of the items address multiple factors

comparison to recommended scales

of the 17 satisfaction items (35-51), all load on factor 1
of the 23 impact items (1-23), all but 1 load on factor 2
all but item 7

of the 11 worry items (24-34), all but 2 load on factor 3

all but items 31,32

identified factors appear to be similar to standard DQOLY constructs with only 5.9% (3/51) inconsistent items

Significant Loadings for CDI

using ML factor extraction of 1 factor without rotation and with the COVER option in SAS
significant unrotated loadings
4 items do not load on the 1 factor
items 9,23,25,26

23 items load on the 1 factor

a substantial amount of 14.8% (4/27) of the items appear to be of negligible value for the ABC subjects

Orthogonal vs. Oblique Rotations

factors are independent of each other under the factor analysis model satisfying
$z = L(1) \cdot F(1) + L(2) \cdot F(2) + \cdots + L(k) \cdot F(k) + u$
rotations change both the loadings and the factors in such a way that the same relationships hold as before
$z = LN(1) \cdot FN(1) + LN(2) \cdot FN(2) + \cdots + LN(k) \cdot FN(k) + u$
an orthogonal rotation preserves perpendicularity between the axes
which means that factors remain independent

an oblique rotation does not preserve perpendicularity between the axes

which means that factors become correlated
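That a rotation leaves the model's relationships unchanged can be checked numerically. The Python sketch below multiplies a small loading matrix on the right by an orthogonal matrix and confirms that the part of the correlation matrix explained by the factors (loadings times their transpose) is the same before and after. The rotation angle is arbitrary and chosen only for illustration; the two rows are the unrotated ML loadings reported in these slides for FACES1 and FACES25.

```python
import numpy as np

# Unrotated 2-factor ML loadings for FACES1 and FACES25 (from the slides).
loadings = np.array([[0.692, 0.114],
                     [-0.538, 0.596]])

# An orthogonal rotation of the factor axes; the 30-degree angle is an
# arbitrary illustrative choice, not one produced by varimax.
angle = np.deg2rad(30.0)
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])

rotated = loadings @ rotation

# The loadings change, but the correlations implied by the common factors
# do not, because rotation @ rotation.T is the identity matrix.
common_before = loadings @ loadings.T
common_after = rotated @ rotated.T
print(np.round(rotated, 3))
print(np.allclose(common_before, common_after))
```

For an oblique rotation the transformation matrix is no longer orthogonal, so the same check requires the factor correlation matrix between the two loading matrices.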

SPSS Rotation Approaches

the default rotation approach is not to rotate ("None")
3 orthogonal rotation approaches are supported
Varimax, Quartimax, Equamax

2 oblique rotation approaches are supported

Direct Oblimin
changes with parameter called Delta with default value 0
becomes less oblique as Delta becomes more negative

Promax
starts with a Varimax rotation
changes with parameter called Kappa with default value 4

SAS Rotation Approaches

the default rotation approach is not to rotate
use "ROTATE=" option to assign a rotation scheme, e.g., "ROTATE=VARIMAX"

many orthogonal rotation approaches are supported

ORTHOMAX with a weight parameter called GAMMA
GAMMA=1 by default, same as VARIMAX
GAMMA=0, same as QUARTIMAX
GAMMA=(# of factors)/2, same as EQUAMAX
GAMMA=.5, same as BIQUARTIMAX
GAMMA=(# of items), same as FACTORPARSIMAX
GAMMA=(# of items)(# of factors - 1)/(# of items + # of factors - 2), same as PARSIMAX
these include all orthogonal approaches supported in SPSS

orthogonal Crawford-Ferguson rotation approaches

ORTHCF with 2 parameters, ORTHGENCF with 4 parameters
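The ORTHOMAX weights listed above are simple functions of the numbers of items and factors. The Python sketch below evaluates them for FACES (30 items, 2 factors) and DQOLY (51 items, 3 factors); the function name is hypothetical and the code only restates the GAMMA formulas, it does not perform any rotation.

```python
def orthomax_gamma(n_items, n_factors):
    """GAMMA weights of the named ORTHOMAX rotations for a given problem size."""
    return {
        "VARIMAX": 1.0,
        "QUARTIMAX": 0.0,
        "EQUAMAX": n_factors / 2,
        "BIQUARTIMAX": 0.5,
        "FACTORPARSIMAX": float(n_items),
        "PARSIMAX": n_items * (n_factors - 1) / (n_items + n_factors - 2),
    }

faces = orthomax_gamma(n_items=30, n_factors=2)
dqoly = orthomax_gamma(n_items=51, n_factors=3)
for name in faces:
    print(f"{name:15s} FACES {faces[name]:7.3f}   DQOLY {dqoly[name]:7.3f}")
```

Under these formulas, with 2 factors the EQUAMAX and PARSIMAX weights both equal 1, the VARIMAX weight, so for a 2-factor problem such as FACES these three named rotations use the same criterion.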

SAS Rotation Approaches

many oblique rotation approaches are also supported
OBLIMIN with a weight parameter called TAU
TAU=0 is same default as in SPSS (but called Delta), same as QUARTIMIN
TAU=1, same as COVARIMIN
TAU=.5, same as BIQUARTIMIN

PROMAX with a parameter called POWER

POWER=3 is the default rather than 4 as in SPSS (but called Kappa)
by default, it starts with a VARIMAX orthogonal rotation as in SPSS, but can be started from any other orthogonal or oblique rotation

Harris-Kaiser (HK) with a parameter called HKPOWER having default of 0.0

when HKPOWER=1, HK becomes VARIMAX
Harris-Kaiser type oblique versions of other orthogonal approaches are also possible

oblique versions of all orthogonal approaches are available

but not clear if they overlap with the above or not

includes all orthogonal approaches supported in SPSS

Oblique Rotation Example

in SPSS, run the ML method for the FACES items extracting the recommended # of 2 factors and rotate loadings using promax rotation with its default parameter setting
re-enter Analyze/Data Reduction/Factor...
change "Variables:" to FACES1-FACES30
in "Extraction..." change "Number of factors:" to 2
in "Rotation...", click on "Promax"
leave Kappa at its default value of 4
to get the same result as the default promax rotation in SAS, change Kappa to 3
then re-execute the analysis


Rotating the Initial Loadings

Factor Matrix (a)

Item      Factor 1  Factor 2     Item      Factor 1  Factor 2
FACES1      .692      .114       FACES16     .614      .282
FACES2      .590     -.043       FACES17     .632     -.064
FACES3     -.328      .491       FACES18     .678      .316
FACES4      .406     -.015       FACES19    -.484      .488
FACES5      .512      .003       FACES20     .536      .191
FACES6      .321      .226       FACES21     .537      .157
FACES7      .643      .026       FACES22     .644      .147
FACES8      .769     -.004       FACES23     .670     -.096
FACES9     -.317      .230       FACES24    -.298      .241
FACES10     .314      .472       FACES25    -.538      .596
FACES11     .354      .091       FACES26     .720      .151
FACES12    -.314      .033       FACES27     .486      .125
FACES13     .279      .173       FACES28    -.218      .300
FACES14     .150      .240       FACES29    -.482      .330
FACES15    -.222      .426       FACES30     .634      .235

the matrix of loadings

with 30 rows and 2 columns

it is multiplied on the right by the factor transformation matrix for a varimax rotation
with 2 rows and 2 columns to produce the varimax-rotated factor matrix

then this is multiplied on the right by another transformation matrix to generate the promax rotated loadings
will also have 30 rows and 2 columns

Extraction Method: Maximum Likelihood. a. 2 factors extracted. 5 iterations required.

Promax Rotated Loadings

Pattern Matrix (a)

Item      Factor 1  Factor 2     Item      Factor 1  Factor 2
FACES1      .643     -.102       FACES16     .725      .111
FACES2      .429     -.244       FACES17     .444     -.281
FACES3      .161      .657       FACES18     .806      .128
FACES4      .308     -.151       FACES19     .036      .705
FACES5      .406     -.166       FACES20     .586      .035
FACES6      .447      .145       FACES21     .558     -.003
FACES7      .530     -.184       FACES22     .634     -.049
FACES8      .603     -.259       FACES23     .447     -.329
FACES9     -.053      .362       FACES24    -.029      .367
FACES10     .651      .422       FACES25     .085      .843
FACES11     .357     -.016       FACES26     .697     -.071
FACES12    -.220      .141       FACES27     .491     -.022
FACES13     .368      .100       FACES28     .084      .406
FACES14     .323      .218       FACES29    -.098      .527
FACES15     .188      .548       FACES30     .701      .052

the matrix of promax rotated loadings

called the pattern matrix in SPSS
with 30 rows and 2 columns

FACES1 once again loads more highly on the first factor

since .643 is larger than .102
with more of a difference vs. .645 and -.274 for varimax
the first loading is about the same while the second has gotten smaller in absolute value

Extraction Method: Maximum Likelihood. Rotation Method: Promax with Kaiser Normalization. a. Rotation converged in 3 iterations.

Inter-Factor Correlations
since promax is an oblique rotation, the associated factors are correlated
the factor correlation matrix contains those correlations
only 1 in this case because there are only 2 factors

the 2 factors for this case are distinctly inversely related with an estimated correlation of -.511

Factor Correlation Matrix
Factor       1        2
1          1.000    -.511
2          -.511    1.000

Extraction Method: Maximum Likelihood. Rotation Method: Promax with Kaiser Normalization.


The Structure Matrix

Structure Matrix

Item      Factor 1  Factor 2     Item      Factor 1  Factor 2
FACES1      .695     -.431       FACES16     .668     -.260
FACES2      .554     -.463       FACES17     .587     -.508
FACES3     -.175      .574       FACES18     .740     -.283
FACES4      .385     -.308       FACES19    -.325      .687
FACES5      .491     -.374       FACES20     .568     -.264
FACES6      .372     -.083       FACES21     .560     -.288
FACES7      .624     -.454       FACES22     .659     -.373
FACES8      .736     -.567       FACES23     .615     -.557
FACES9     -.238      .389       FACES24    -.217      .382
FACES10     .435      .089       FACES25    -.346      .800
FACES11     .365     -.198       FACES26     .733     -.427
FACES12    -.292      .253       FACES27     .502     -.272
FACES13     .317     -.088       FACES28    -.124      .364
FACES14     .212      .053       FACES29    -.368      .577
FACES15    -.092      .452       FACES30     .675     -.307

SPSS also generates a matrix it calls the structure matrix

SAS calls it the factor structure matrix
it equals the pattern matrix multiplied on the right by the factor correlation matrix
its entries are the correlations between the items and the factors

the correlation between

FACES1 and factor 1 is .695
about the same as the loading of .643

FACES1 and factor 2 is -.431

much different from the loading of -.102
a low absolute loading may be associated with a substantially stronger correlation

Extraction Method: Maximum Likelihood. Rotation Method: Promax with Kaiser Normalization.
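The relationship between the pattern and structure matrices can be verified from the reported values. The Python sketch below multiplies the promax pattern loadings for two items (FACES1 and FACES25) on the right by the reported factor correlation matrix; the results agree with the reported structure matrix entries up to the rounding of the published 3-decimal inputs.

```python
import numpy as np

# Promax pattern loadings reported by SPSS for two of the 30 items.
pattern = np.array([[0.643, -0.102],   # FACES1
                    [0.085,  0.843]])  # FACES25

# Reported factor correlation matrix for the 2 promax factors.
factor_corr = np.array([[1.000, -0.511],
                        [-0.511, 1.000]])

# Structure matrix = pattern matrix times factor correlation matrix.
structure = pattern @ factor_corr
print(np.round(structure, 3))

# Structure matrix entries reported for the same two items.
reported = np.array([[0.695, -0.431],
                     [-0.346, 0.800]])
print(np.allclose(structure, reported, atol=1e-3))
```

For FACES1, the second entry is .643 x (-.511) + (-.102) = -.431, which shows how a small pattern loading can coexist with a sizeable item-factor correlation when the factors are correlated.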

The Reference Structure Matrix

pattern matrices can have absolute loadings larger than 1 and possibly much larger [8]
this did not happen for this analysis

SAS generates a reference structure matrix for use in place of the pattern matrix when it has such anomalous values
it is interpreted in the same way as a pattern matrix
not generated in SPSS

however, if such problems occur, maybe that is an indication that the rotation approach needs to be changed

Using Sorted Rotated Loadings

factor 1 loadings decrease in absolute value from FACES18 to FACES12 while remaining larger in absolute value than factor 2 loadings
22 load more on factor 1: 18,...,12

after that factor 2 loadings decrease in absolute value while remaining larger in absolute value than factor 1 loadings
8 load more on factor 2: 25,19,3,29,15,28,9,24

exactly the same as for varimax rotation

there is no impact to using promax based on varimax over varimax without adjustment

"orthogonal rotations usually lead one to essentially the same major groupings as oblique rotations" [11, p. 536]

Impact of Rotations
considered 10 rotations plus no rotation [10]
4 orthogonal
varimax, quartimax, equamax, parsimax

6 oblique
Harris-Kaiser
promax starting from each of the other 5
with the default parameter POWER=3

ran this in SAS

generated associated item-scale allocations

with each item allocated to the factor/scale for which it achieves its maximum absolute loading, without discarding any items
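The allocation rule (each item goes to the factor with its largest absolute loading) is easy to apply mechanically. The Python sketch below applies it to the promax pattern matrix reported by SPSS for the 30 FACES items (Kappa=4), which is used here only as convenient example data; the variable names are hypothetical.

```python
import numpy as np

# Promax pattern loadings for FACES1-FACES30 as reported by SPSS.
factor1 = [.643, .429, .161, .308, .406, .447, .530, .603, -.053, .651,
           .357, -.220, .368, .323, .188, .725, .444, .806, .036, .586,
           .558, .634, .447, -.029, .085, .697, .491, .084, -.098, .701]
factor2 = [-.102, -.244, .657, -.151, -.166, .145, -.184, -.259, .362, .422,
           -.016, .141, .100, .218, .548, .111, -.281, .128, .705, .035,
           -.003, -.049, -.329, .367, .843, -.071, -.022, .406, .527, .052]
pattern = np.column_stack([factor1, factor2])

# Allocate each item to the factor with its maximum absolute loading
# (factor numbers start at 1), without discarding any items.
allocation = np.abs(pattern).argmax(axis=1) + 1

items_on_factor1 = [i + 1 for i, f in enumerate(allocation) if f == 1]
items_on_factor2 = [i + 1 for i, f in enumerate(allocation) if f == 2]
print(len(items_on_factor1), "items on factor 1")
print("factor 2 items:", items_on_factor2)
```

With these loadings the rule places 22 items on factor 1 and items 3, 9, 15, 19, 24, 25, 28 and 29 on factor 2.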

Impact of Rotations
for the FACES items
all 10 rotations generated the same allocation

for the DQOLY items

the 10 rotations generated 4 different allocations but these were not too different from each other

for both the FACES and DQOLY items

the allocations based on unrotated loadings were much different from the ones based on rotations and from recommended allocations

rotating the loadings appears to have a distinct impact on the results compared to not rotating them, but the choice of the rotation may not have much of an impact on those results

The Standard CDI Scale

Factor Matrix (a)

Item    Factor 1     Item    Factor 1     Item    Factor 1
cdi1      .611       cdi10    -.641       cdi19     .373
cdi2     -.671       cdi11    -.795       cdi20     .532
cdi3      .644       cdi12     .283       cdi21    -.421
cdi4      .353       cdi13    -.313       cdi22     .551
cdi5     -.209       cdi14     .655       cdi23     .155
cdi6      .539       cdi15    -.258       cdi24    -.463
cdi7     -.741       cdi16    -.255       cdi25    -.055
cdi8     -.295       cdi17     .211       cdi26     .128
cdi9      .172       cdi18    -.239       cdi27     .271

CDI has 27 items scored from 0-2
these are summed to produce its one scale measuring the amount of depressive symptoms after reverse coding 13 of the items
items 2,5,7,8,10,11,13,15,16,18,21,24,25 are reverse coded
replace an item y by 2 - y

if 1 factor is extracted using the ML method, 13 items have negative loadings

the same items as are reverse coded in the standard CDI scale
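The agreement between the signs of the 1-factor loadings and the standard reverse coding can be checked directly. The Python sketch below uses the 27 CDI loadings from the Factor Matrix above; the helper function is a hypothetical illustration of reverse coding for an item scored from low to high.

```python
# 1-factor ML loadings for cdi1-cdi27, in item order (from the slides).
cdi_loadings = [.611, -.671, .644, .353, -.209, .539, -.741, -.295, .172,
                -.641, -.795, .283, -.313, .655, -.258, -.255, .211, -.239,
                .373, .532, -.421, .551, .155, -.463, -.055, .128, .271]

# Items reverse coded in the standard CDI scale.
reverse_coded = [2, 5, 7, 8, 10, 11, 13, 15, 16, 18, 21, 24, 25]

# Items whose 1-factor loading is negative.
negative_items = [i + 1 for i, value in enumerate(cdi_loadings) if value < 0]
print(negative_items == reverse_coded)

def reverse_code(y, low=0, high=2):
    """Reverse code an item scored from low to high (2 - y for CDI)."""
    return low + high - y

print([reverse_code(y) for y in (0, 1, 2)])
```

The same helper covers items scored 1-5 by calling `reverse_code(y, low=1, high=5)`, which gives 6 - y.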

Extraction Method: Maximum Likelihood. a. 1 factors extracted. 4 iterations required

The Standard FACES Scales

Factor Matrix (a)

Item      Factor 1     Item      Factor 1     Item      Factor 1
FACES1      .690       FACES11     .351       FACES21     .547
FACES2      .587       FACES12    -.311       FACES22     .655
FACES3     -.291       FACES13     .296       FACES23     .666
FACES4      .407       FACES14     .167       FACES24    -.284
FACES5      .518       FACES15    -.196       FACES25    -.481
FACES6      .338       FACES16     .625       FACES26     .725
FACES7      .651       FACES17     .621       FACES27     .497
FACES8      .773       FACES18     .688       FACES28    -.202
FACES9     -.311       FACES19    -.440       FACES29    -.454
FACES10     .336       FACES20     .543       FACES30     .640

FACES has 30 items scored from 1-5 and 2 scales

family adaptability is computed by summing the even items other than item 30 with 2 items 24,28 reverse coded
i.e., replace an item y by 6 - y

family cohesion is computed by summing the odd items plus item 30 with 6 items 3,9,15,19,25,29 reverse coded

if 1 factor is extracted using the ML method, 9 items have negative loadings

the same items as are reverse coded in the standard FACES scales plus one other: item 12

Extraction Method: Maximum Likelihood. a. 1 factors extracted. 5 iterations required.

The Standard DQOLY Scales

Factor Matrix (a)

Item      Factor 1     Item      Factor 1     Item      Factor 1
dqoly1     -.328       dqoly18    -.328       dqoly35     .589
dqoly2     -.288       dqoly19    -.300       dqoly36     .464
dqoly3     -.378       dqoly20    -.365       dqoly37     .572
dqoly4     -.207       dqoly21    -.244       dqoly38     .577
dqoly5     -.243       dqoly22    -.206       dqoly39     .477
dqoly6     -.333       dqoly23    -.105       dqoly40     .569
dqoly7      .398       dqoly24    -.268       dqoly41     .415
dqoly8     -.229       dqoly25    -.248       dqoly42     .663
dqoly9     -.305       dqoly26    -.296       dqoly43     .671
dqoly10    -.381       dqoly27    -.262       dqoly44     .610
dqoly11    -.264       dqoly28    -.115       dqoly45     .719
dqoly12    -.085       dqoly29    -.392       dqoly46     .581
dqoly13    -.403       dqoly30    -.281       dqoly47     .677
dqoly14    -.195       dqoly31    -.386       dqoly48     .704
dqoly15    -.299       dqoly32    -.281       dqoly49     .461
dqoly16    -.287       dqoly33    -.288       dqoly50     .670
dqoly17    -.391       dqoly34    -.487       dqoly51     .597

Extraction Method: Maximum Likelihood. a. 1 factors extracted. 8 iterations required.

DQOLY has 51 items scored 1-5 and 3 scales

disease impact is the sum of the first 23 items (1-23) with item 7 reverse coded
worry is the sum of the next 11 items (24-34) with none reverse coded
satisfaction is the sum of the last 17 items (35-51) with none reverse coded
but these have the reverse orientation to all the other items except item 7

if 1 factor is extracted using the ML method, 18 items have positive loadings

the 17 satisfaction items along with item 7
the ones with reverse orientation in the standard DQOLY scales

Reverse Coding Summary

signs of the 1-factor loadings appear to provide information about appropriate reverse coding
even when there is more than 1 underlying factor
to supplement related theoretical item construction considerations [16]

for CDI and DQOLY, items were separated into those usually reverse coded vs. those usually not
for FACES, items were separated into those usually reverse coded plus item 12 vs. the others usually not
12 is the only item in the 2-factor solution with maximum absolute loading at a negative value

FACES item 12 is used to compute family adaptability

"it is hard to know what the rules are in our family"
less clearly defined rules are supposed to mean more adaptability

perhaps, for the ABC subjects, more clearly defined family rules allowed them more flexibility to adapt in ways that do not violate those rules

Varimax Allocation - FACES

FACES has 30 items and 2 recommended scales
family adaptability computed from even items other than item 30 with items 24,28 reverse coded
family cohesion computed from odd items plus item 30 with items 3,9,15,19,25,29 reverse coded

allocations based on ML extraction of 2 factors and varimax rotation

items 3,9,15,19,24,25,28,29 separated from the rest
a much different allocation than recommended
all items usually reverse coded are separated from all the other items
explains the negative inter-factor correlation

Varimax Allocation - DQOLY

DQOLY has 51 items and 3 recommended scales
disease impact computed from items 1-23 with item 7 reverse coded
worry computed from items 24-34 with none reverse coded
satisfaction computed from items 35-51 with none reverse coded, but with reverse orientation to others except item 7

allocations based on ML extraction of 3 factors and varimax rotation

satisfaction items 35-51 plus item 7
all impact items except item 7 plus worry items 31-34
worry items 24-30
not too different from the recommended allocation
but once again all items usually considered to have the reverse orientation are separated from all the other items


Item-Scale Allocation Summary

varimax-based item-scale allocations for DQOLY were quite consistent with the recommended allocation
the recommended DQOLY scales seem appropriate to use with these subjects, but they were developed specifically for youth with diabetes

varimax-based item-scale allocations for FACES were quite different from the recommended allocation
the recommended FACES scales might be inappropriate to use for families with adolescents having type 1 diabetes

for both FACES and DQOLY, varimax rotation separated off the items with reverse orientation from the others
does it really identify sets of items associated with different latent constructs or just having different orientations?

Part 5 Factor Analysis Model Evaluation

scoring factor analysis models
choosing the # of factors
evaluating alternative factor extraction methods
CFA models for scales suggested by rotations
comparison of scales
assessing individual items
item residual analyses

Scoring Factor Analysis Models

using likelihood cross-validation (LCV)
measures how well a model estimated on portions of the data predicts the remaining data in subsets called folds
with the data randomly partitioned into k disjoint folds

based on likelihoods for data in folds using parameter estimates computed from data outside of the folds
using the multivariate normal likelihood as in ML factor extraction
multiply these deleted fold likelihoods together and normalize to the # of item responses to get the LCV score

larger scores mean models more compatible with data

scores within 1% of best are nearly optimal alternatives

computable for EFA and CFA models

using specialized SAS macros available on the Internet
results from [10] are reported in what follows
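The LCV computation described above can be sketched in Python. This is not the SAS macro code referred to in the slides. It assumes that "normalize to the # of item responses" means taking the geometric mean of the deleted-fold likelihoods over all item responses, and for simplicity it compares two stand-in covariance models (independent items vs. an unstructured covariance matrix) on simulated one-factor data rather than fitting a factor model. All names and data are hypothetical.

```python
import numpy as np

def mvn_loglik(X, mean, cov):
    """Sum of multivariate normal log-densities over the rows of X."""
    p = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return float(np.sum(-0.5 * (p * np.log(2.0 * np.pi) + logdet + quad)))

def lcv_score(X, fit, k=10, seed=0):
    """Likelihood cross-validation score with k random disjoint folds."""
    n, p = X.shape
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    total = 0.0
    for fold in folds:
        train = np.delete(X, fold, axis=0)       # data outside the fold
        mean, cov = fit(train)                   # estimates from that data
        total += mvn_loglik(X[fold], mean, cov)  # deleted-fold log-likelihood
    # Product of fold likelihoods, normalized by the # of item responses.
    return float(np.exp(total / (n * p)))

def fit_independent(train):
    """Items treated as independent: diagonal covariance matrix."""
    return train.mean(axis=0), np.diag(train.var(axis=0, ddof=1))

def fit_unstructured(train):
    """Unstructured covariance matrix (stand-in for a fitted factor model)."""
    return train.mean(axis=0), np.cov(train, rowvar=False)

# Simulated responses: 200 subjects, 6 items sharing one common factor.
rng = np.random.default_rng(1)
factor = rng.standard_normal((200, 1))
X = 0.7 * factor + np.sqrt(0.51) * rng.standard_normal((200, 6))

lcv_independent = lcv_score(X, fit_independent)
lcv_unstructured = lcv_score(X, fit_unstructured)
print(lcv_independent, lcv_unstructured)
```

To score an EFA or CFA model in this framework, the `fit` function would return the covariance matrix implied by the loadings and unique variances estimated from the training data.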

Choosing the Number of Factors

using ML factor extraction, the recommended # of factors is chosen for all 3 sets of items using LCV
1 for CDI, 2 for FACES, and 3 for DQOLY
this holds for any # k of folds as long as it is not too small
so used k=10 for CDI and FACES and k=15 for DQOLY

LCV seems to be a reasonable way to assess how many factors to extract
also considered a variety of other approaches
including rules based on eigenvalues and penalized likelihood criteria

the only other approach with somewhat acceptable results was the minimum BIC approach
which chose 1 for CDI, 2 for FACES, and 2 for DQOLY

Alternative Numbers of Factors

for CDI, 1 factor is a clear-cut choice
no other choices have LCV scores within 2% of best

for FACES
3 factors has a score within 1% of best
1 factor has a score of just above 1% of best

for DQOLY
2, 4, and 5 factors have scores within 1% of best

different choices for the # of factors can be competitive alternatives to the choice with the best score
a range of #'s of factors can have about the same effect
part of why choosing the # of factors is a difficult problem

Alternative Extraction Methods

considered a variety of factor extraction methods, factoring the correlation matrix as well as the covariance matrix when possible
one-step and iterated PC and PF methods
ML, unweighted least squares
image component analysis, Harris component analysis

for all these methods, the recommended # of factors is chosen for all 3 sets of items using LCV
1 for CDI, 2 for FACES, and 3 for DQOLY
there was also very little difference in maximum LCV scores for all of these methods

there does not seem to be much of an impact to the choice of factor extraction procedure

Evaluation of Rotations
rotations do not change the correlation structure of the EFA model and so cannot be directly evaluated by LCV
but they do suggest summated scales with loadings changed to 1 or 0
which change the correlation structure
so rotations can be evaluated using LCV by evaluating CFA models based on rotation-suggested scales
considered a variety of CFA models for FACES/DQOLY
based on rotation-suggested scales vs. on recommended scales
with unit (1) loadings vs. with estimated loadings
with all scales dependent vs. with all independent vs. with any subset independent and the rest dependent

also compared these to EFA models

with all items allocated to all scales with estimated loadings


Example CFA Model

this CFA model has
2 factors: F1 and F2
4 items: I1, I2, I3, and I4
items I1 and I2 load on factor F1
with loadings L1_1 and L2_1
with unique errors U1 and U2
loadings L1_2 and L2_2 are 0

items I3 and I4 load on factor F2

with loadings L3_2 and L4_2, with errors U3 and U4
loadings L3_1 and L4_1 are 0

covariance for F1, F2 is C1_2, variances of U1-U4 are V1-V4

PROC CALIS ;
  LINEQS                      /* one equation per item */
    I1 = L1_1 F1 + U1,
    I2 = L2_1 F1 + U2,
    I3 = L3_2 F2 + U3,
    I4 = L4_2 F2 + U4;
  STD U1-U4 = V1-V4;          /* variances of the unique errors */
  COV F1 F2 = C1_2;           /* covariance between the 2 factors */
  VAR I1-I4;                  /* items to analyze */
RUN;
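The covariance structure implied by such a CFA model can be written out numerically. The Python sketch below builds the implied item correlation matrix as loadings times factor covariance times loadings transposed, plus the diagonal matrix of unique variances. All parameter values are hypothetical (they are not estimates from any of the data sets), the factors are assumed to have unit variances, and the unique variances are chosen so that each item has variance 1.

```python
import numpy as np

# Hypothetical loadings: I1, I2 load only on F1; I3, I4 load only on F2.
loadings = np.array([[0.8, 0.0],    # L1_1, L1_2 = 0
                     [0.7, 0.0],    # L2_1, L2_2 = 0
                     [0.0, 0.6],    # L3_1 = 0, L3_2
                     [0.0, 0.9]])   # L4_1 = 0, L4_2

# Hypothetical factor covariance matrix (unit variances, C1_2 = -0.5).
factor_cov = np.array([[1.0, -0.5],
                       [-0.5, 1.0]])

# Part of the item covariances explained by the factors.
common = loadings @ factor_cov @ loadings.T

# Unique variances V1-V4, chosen here so that each item has variance 1.
unique_var = np.diag(1.0 - np.diag(common))

implied = common + unique_var
print(np.round(implied, 3))
```

Items on the same factor are correlated through their loadings alone (0.8 x 0.7 = 0.56 for I1 and I2), while items on different factors are correlated only through C1_2 (0.8 x -0.5 x 0.6 = -0.24 for I1 and I3); setting C1_2 to 0 gives the independent-scales version of the model.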


Comparison of Scales
treating scales as dependent was always better
so subsequent reported results use dependent scales
not so surprising since scales from the same instrument measure related latent constructs

varimax-suggested scales with estimated loadings were best overall for both FACES and DQOLY
other rotations were as good or at least almost as good
and a little better than EFA-based scales
so treating factors as grouped rather than as general is reasonable

when items are reallocated 1 at a time to other scales

starting from varimax-suggested scales with loadings reestimated
no reallocation generated an improvement for FACES
only one generated a very small improvement for DQOLY
which changes item 7's allocation to be compatible with its recommended allocation

varimax-suggested allocations may be almost optimal


Comparison of Summated Scales

recommended summated scales (with loadings of 1 or 0) were competitive for DQOLY but not for FACES
for adolescents with type 1 diabetes, the DQOLY scales seem reasonable to use
but there can be a tangible penalty to using the standard FACES scales

on the other hand, summated scales based on unrotated loadings were not competitive for either FACES or DQOLY
the common practice of basing scales on a rotation appears much better than basing them on unrotated loadings

Assessing Individual Items

to assess the value of an individual item
can use the % change in LCV score when an item's loadings are changed to 0 for all factors, effectively discarding it
the larger the % decrease, the more valuable the item
the larger the % increase, the more expendable the item

for FACES and DQOLY, all items are of some value

with % decreases ranging from 0.04% to 3.10%

for CDI, all but 5 items are of some value

5 items 9,18,23,25,26 had very small % increases (≤0.04%)
all but item 18 also have nonsignificant loadings

there is no compelling reason to discard items from these 3 instruments

the removal of none provides a tangible improvement

Results for CDI

the EFA model had the better score
but the recommended summated scale was competitive

is the assumption of normality reasonable?
are there any outlying item values?
need item residuals for this

to assess this, standardized the item residuals to be independent and standard normally distributed
for the 27×103=2781 item values without reverse coding
evaluated the EFA model with the better LCV score
estimated the 27×27 covariance matrix using all the data
to reduce the effort, computed standardized residuals for item responses from subjects in each fold separately rather than for all item responses of all subjects combined
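One way to standardize residuals so that they are uncorrelated with unit variance is to "whiten" them with the Cholesky factor of the estimated covariance matrix. The slides do not say which method was used, so the Python sketch below is an assumption for illustration only, applied to simulated data with hypothetical names.

```python
import numpy as np

def standardize_residuals(Y, mean, cov):
    """Whiten the rows of Y using the Cholesky factor of cov.

    With cov = C @ C.T, each residual y - mean is replaced by
    inverse(C) @ (y - mean), which has identity covariance under the model.
    """
    chol = np.linalg.cholesky(cov)
    return np.linalg.solve(chol, (Y - mean).T).T

# Simulated correlated responses for 500 subjects on 3 items.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.6, 0.3],
                     [0.6, 1.0, 0.5],
                     [0.3, 0.5, 1.0]])
Y = rng.multivariate_normal(np.zeros(3), true_cov, size=500)

mean = Y.mean(axis=0)
cov = np.cov(Y, rowvar=False)
Z = standardize_residuals(Y, mean, cov)

# The standardized residuals have mean 0 and identity sample covariance.
print(np.round(np.cov(Z, rowvar=False), 3))
```

The pooled values of Z can then be examined with a normal probability plot; if the multivariate normal model is reasonable, they should look like a standard normal sample.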

Normal Probability Plot - CDI

normality assumption questionable
the plot is curved

there is an extreme standardized residual of -7.2

[Figure: Normal Plot for CDI Items; y-axis: Standardized Residual, x-axis: Normal Score]

for a value of item 25 with meaning:
0: nobody really loves me
1: I am not sure if anybody loves me
2: I am sure that somebody loves me
a value of 2 occurs 101 times
values of 0 and 1 each occur 1 time
the one value of 0 is the outlier

almost all of these adolescents felt loved, so item 25 contributes little distinguishing information
its loading was also found not to be significantly different from zero

Standardized Residual Plot - CDI

observed CDI item means cluster near the extremes of 0 and 2 for item values
[Figure: Standardized Residual Plot for CDI Items; y-axis: Standardized Residual, x-axis: CDI Item Mean]

with residuals tending to be more outlying the closer the mean is to the extremes

this might be why normality is questionable
perhaps this will often hold when the range of item values is so limited

Residual Analysis - FACES

[Figure: Normal Plot for FACES Items; y-axis: Standardized Residual, x-axis: Normal Score]

using varimax-suggested scales with estimated loadings

without reverse coding items

normality assumption appears reasonable

normal plot is quite straight
residuals quite symmetric, between ±4 and often ±3

[Figure: Standardized Residual Plot for FACES Items; y-axis: Standardized Residual, x-axis: Item Mean]

observed FACES item means are all well away from the extremes of 1 and 5

Residual Analysis - DQOLY

[Figure: Normal Plot for DQOLY Items; y-axis: Standardized Residual, x-axis: Normal Score]

using varimax-suggested scales with estimated loadings

without reverse coding items

normality assumption somewhat reasonable

normal plot is fairly straight
residuals sometimes asymmetric and occasionally outside of ±4

[Figure: Standardized Residual Plot for DQOLY Items; y-axis: Standardized Residual, x-axis: Item Mean]

observed DQOLY item means are all away from the extremes of 1 and 5

Part 6 A Case Study in Ongoing Scale Development


Developing New Scales

an extensive effort is required before it is possible to conduct a factor analysis
an initial pool of items needs to be generated
a primarily qualitative rather than quantitative task

item responses need to be collected from a large sample of subjects

5-10 subjects per initial item is usually considered desirable

some subjects need to be interviewed at 2 points in time

to be able to assess test-retest reliability

the construct validity of the new scales needs to be assessed after the factor analysis
do the new scales predict related quantities as expected?

Family Management Style Survey

currently under development
parents of children having a chronic illness are being interviewed on how their families manage their child's chronic illness
interviewing is almost finished

data currently available for 528 parents of 382 families

236 families with 1 responding parent
3 of these were fathers; the mothers in these families agreed to participate, but it has not yet been possible to interview them

146 families with both mothers and fathers responding

so have data for 379 mothers and 149 fathers
complicated by the need to account for the correlation between responses of parents from the same family
but can analyze data from mothers/fathers separately

only incomplete, preliminary results are currently available


The FMS Framework

FMSS items were based on the Family Management Style (FMS) Framework
conceptualizes how families define and manage a child's chronic illness [17]

5 FMSs
thriving, accommodating, enduring, struggling, floundering
reflecting a continuum of difficulty for managing childhood chronic illness and the extent to which family members' experiences were similar or discrepant

3 major components of the illness experience

definitions of the situation, management behaviors, and perceived consequences
refined into 8 major themes common to all families

FMSS items address the 3 components and 8 themes

so when asked to estimate the # of factors for the FMSS, the PI replied between 3 and 8

The FMSS Items

there are 65 initial FMSS items
items 58-65 address issues related to the parent's spouse and so are not completed by single parents
will restrict the factor analysis to items 1-57 applicable to both single and partnered parents
for all 528 parents have 9.3 subjects per item
for only the 379 mothers have 6.6 subjects per item

all items are coded from 1-5

1="strongly disagree" and 5="strongly agree"
the interview form also included 3 other choices
"Not Applicable", "Don't Know", "Refused"
provides extra qualitative information for item assessment
but also increases the # of missing responses

only 280 of the 379 mothers provided values of 1-5 for all of items 1-57
or 4.9 subjects per item
very important to adjust for missing data as well as for inter-parental correlation to avoid losing so much data

Item Response Consistency

74 parents were retested about 2 weeks apart
46 females and 28 males
including both parents of 24 families

correlations between test and retest responses for each of items 1-65
assess the consistency of responses to items over time
computed for mothers separately from fathers
used Spearman correlations since the range 1-5 for item values was limited
for mothers, correlations were significantly nonzero for all items
for fathers, correlations were nonsignificant for 8 of the items
items 36 (p=.057), 22 (p=.077), and 7,8,18,19,29,60 (p>.10)

responses for mothers were reasonably consistent across time while fathers changed responses fairly often to quite a few items (8/65 or 12.3%)

FMSS Item Means

reported analyses that follow are for the 280 mothers who responded to all of items 1-57
items with ceiling/floor effects are undesirable
means for items 1-57 ranged from 1.4 to 4.7
1 item (42) had a mean < 1.5
most mothers strongly disagreed on 1 item

5 items (23,30,39,40,52) had means > 4.5

most mothers strongly agreed on several items

for 4 items (23,30,39,52), the middle value of 3 was over 2 standard deviations from the item mean
these may be distinctly problematic
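The check used for the last group of items (the middle value of 3 lying more than 2 standard deviations from the item mean) can be sketched in Python. The response vectors below are hypothetical, since the individual FMSS responses are not reproduced in these slides, and the function name is an assumption.

```python
import numpy as np

def far_from_midpoint(responses, midpoint=3.0, n_sd=2.0):
    """Flag an item whose mean is more than n_sd SDs away from the midpoint."""
    responses = np.asarray(responses, dtype=float)
    mean = responses.mean()
    sd = responses.std(ddof=1)
    return bool(abs(mean - midpoint) > n_sd * sd)

# Hypothetical item with a strong ceiling effect: nearly everyone answers 5.
ceiling_item = [5] * 18 + [4] * 2
# Hypothetical item with responses spread evenly over 1-5.
balanced_item = [1, 2, 3, 4, 5] * 4

print(far_from_midpoint(ceiling_item))
print(far_from_midpoint(balanced_item))
```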

FMSS Item Correlations

it is usually recommended to inspect the correlation matrix for the items before factoring them [11]
is there a substantial # of large correlations?
perhaps the items are close to independent of each other and so there will be little benefit to factoring
which will be deceptive, extracting factors that really do not exist

do they form groups?

a daunting task when there are 57 items

with 57(57-1)/2=1596 distinct correlations

can assess if factoring provides a benefit over treating items as independent using LCV scores

Reverse Coding
extracting 1 factor using the ML method
signs of the loadings suggest that 25 of the 57 items need reverse coding from the other 32 items
items 1,3,8,9,15,17-20,23,24,26,28,30,31,36,37,39,40,46, 48,50,52,53,56

all of these items except item 36 were considered to have been worded positively
item 36 was considered to have been worded neutrally

all of the others were originally considered to have been worded negatively except for 5 items
items 5,6,7,45 were considered to have been worded positively
item 43 was considered to have been worded neutrally
need to check on these potential inconsistencies

# of Factors
scree plot indicates 1 to about 7 factors
using ML factor extraction
BIC is minimized at 3 factors
LCV is maximized at 8 factors
but LCV scores for 3-13 factors are all within 1% of best

[Figure: Scree Plot; x-axis: Component Number (1-57)]

so 3 factors would be a parsimonious nearly optimal choice

the # of factors could reasonably be one of 11 different values

why choosing it can be difficult using conventional methods

but results are consistent with the FMS Framework


Comparison of Scales
using the 8-factor solution with the best LCV score
the independent errors model had a 9.8% lower score
so there is a distinct benefit to factoring the items

using estimated loadings

the EFA model had a better LCV score
treating factors as general with all items loading on all factors

than the associated CFA model for varimax-suggested scales with estimated loadings
treating factors as grouped with items loading only on one of the factors

but the scores were not too different

with only a 0.3% decrease in LCV

so treating the factors as group factors seems reasonable


Comparison of Scales
using unit loadings (i.e. summated scales)
the LCV score decreased by a little over 2%
so there can be a tangible penalty to using summated scales

using the model suggested by the FMS Framework

with the 57 items allocated to 7 theory-based factors
items 59-65 correspond to an 8th theory-based factor, but these were not used in the analysis

the model with estimated loadings was competitive

LCV score about 1.5% lower than the best overall score

so basing scales on theory may be reasonable
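
A summated scale (unit loadings) is just the sum of the responses to the items allocated to that scale. The sketch below shows the computation for one respondent. The item allocation and responses are hypothetical, not the varimax-suggested or FMS Framework allocations, and any reverse coding is assumed to have been applied already.

```python
def summated_scale_scores(responses, scale_items):
    """Sum the responses to the items allocated to each scale.

    responses:   {item number: response}, already reverse coded where needed
    scale_items: {scale name: list of item numbers}, each item in one scale
    """
    return {scale: sum(responses[item] for item in items)
            for scale, items in scale_items.items()}


# Hypothetical allocation of 6 items to 2 scales
scale_items_demo = {"scale A": [1, 2, 3], "scale B": [4, 5, 6]}
responses_demo = {1: 4, 2: 5, 3: 3, 4: 1, 5: 2, 6: 2}

scores_demo = summated_scale_scores(responses_demo, scale_items_demo)
print(scores_demo)
```

Every item in a scale contributes equally to its score, which is the restriction that estimated loadings relax.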


Residual Analysis - FMSS

[figure: Normal Plot for FMSS Items; y-axis: Standardized Residual; x-axis: Normal Score]

using varimax-suggested scales with estimated loadings for 8 factors
normality assumption somewhat questionable
normal plot curved at low end
residuals can be skewed, but more to the low end than the high end
lower negative values are due to larger means, i.e., a tendency to respond as strongly disagree more often

[figure: Standardized Residual Plot for FMSS Items; y-axis: Standardized Residual; x-axis: Item Mean]

extreme residuals for 6 items

with absolute value over 4.5
items 23,30,39,52 as identified before as well as items 55,57

number of items responses
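
Flagging items with extreme residuals, as described above, amounts to checking each item's standardized residuals against the cutoff of 4.5. The sketch below uses invented residuals for three hypothetical items, and the function name is an assumption for illustration.

```python
def items_with_extreme_residuals(residuals, cutoff=4.5):
    """Item numbers having at least one standardized residual with
    absolute value over the cutoff.

    residuals maps item number -> list of standardized residuals.
    """
    return sorted(item for item, values in residuals.items()
                  if any(abs(v) > cutoff for v in values))


# Invented residuals for illustration (not the FMSS residuals)
residuals_demo = {
    10: [0.3, -1.2, 2.1],
    11: [-5.1, 0.4, 0.9],   # one extreme negative residual
    12: [1.0, 4.5, -0.2],   # exactly 4.5 is not over the cutoff
}

print(items_with_extreme_residuals(residuals_demo))
```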


Item Removal
removing either item 55 or item 57 imposes a tangible penalty in reduced LCV score
removing either generates a 1.1% decrease in LCV
while they may generate large residuals, they still have value

removing items 23,30,39,52 does not impose a tangible penalty

removing them one at a time generates decreases in LCV of less than 0.5%
removing all of them together generates a decrease in LCV of 0.6%
these items all seem expendable

still need to assess the impact of removal of the other items

Item Boxplots
items 23,30,39,52 are highly skewed at the low end
primarily strongly disagree with responses close to strongly agree outlying

items 55,57 are highly skewed the other way

primarily strongly agree, with responses close to strongly disagree quite a bit less likely, but not outlying
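
Whether a response shows up as outlying in a boxplot depends on how concentrated the other responses are. The sketch below applies the common 1.5 x IQR rule to two invented response patterns on an assumed 1 to 5 scale (1 = strongly disagree). Statistical packages compute quartiles in slightly different ways, so this is only an approximation of what SPSS or SAS would display.

```python
from statistics import quantiles


def boxplot_outliers(values, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)  # Python's default quartile method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return sorted(v for v in values if v < lower or v > upper)


# Invented responses: almost everyone answers strongly disagree (1)
mostly_disagree = [1] * 16 + [2, 2, 4, 5]

# Invented responses: mostly strongly agree (5), with a gradual low tail
mostly_agree = [5] * 10 + [4] * 5 + [3] * 3 + [2, 1]

print(boxplot_outliers(mostly_disagree))  # any response above 1 is outlying
print(boxplot_outliers(mostly_agree))     # skewed, but nothing is outlying
```

In the first pattern the quartiles coincide, so the IQR is zero and every other response is flagged. In the second the tail is gradual enough that low responses stay inside the fences.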

collection and analysis of the ABC data was supported in part by NIH/NINR Grant # R01 NR04009, PI Margaret Grey, and NIH/NIAID Grant # R01 AI057043, PI George Knafl
collection and analysis of the FMSS data was supported in part by NIH/NINR Grant # R01 NR08048, PI Kathleen Knafl
Jean O'Malley assisted in the preparation of these lecture notes and in organizing the background literature

1. Johnson RA, Wichern DW. Applied multivariate statistical analysis. Prentice-Hall, 1992.
2. McCorkle R, Young K. Development of a symptom distress scale. Cancer Nursing 1978; 1: 373-378.
3. Kovacs M. The children's depression inventory (CDI). Psychopharmacology Bulletin 1985; 21: 995-998.
4. Olsen DH, McCubbin HI, Barnes H, Larsen A, Larsen A, Muzen M, Wilson M. Family inventories. Family Social Science, 1982.
5. Ingersoll GM, Marrero DG. A modified quality of life measure for youths: psychometric properties. The Diabetes Educator 1991; 17: 114-118.
6. Cella DF, Tulsky DS, Gray G, Sarafian B, Linn E, Bonomi A, Silberman M, Yellen SB, Winicour P, Brannon J. The Functional Assessment of Cancer Therapy Scale: Development and validation of the general measure. Journal of Clinical Oncology 1993; 11: 570-579.
7. McHorney CA, Ware JE Jr., Raczek AE. The MOS 36 Item Short Form Health Survey (SF-36): II: Psychometric and clinical tests of validity in measuring physical and mental health constructs. Medical Care 1993; 31: 247-263.
8. Hatcher L. A step-by-step approach to using the SAS system for factor analysis and structural equation modeling. SAS Institute, 1994.
9. Grey M, Davidson M, Boland EA, Tamborlane WV. Clinical and psychosocial factors associated with achievement of treatment goals in adolescents with diabetes mellitus. Journal of Adolescent Health 2001; 28: 377-385.
10. Knafl GJ, Grey M. Factor analysis model evaluation using likelihood cross-validation. Statistical Methods for Medical Research in press.
11. Nunnally JC, Bernstein IH. Psychometric theory. McGraw-Hill, 1994.
12. Ferketich S, Muller M. Factor analysis revisited. Nursing Research 1990; 39: 59-62.
13. Polit DF. Data analysis and statistics for nursing research. Appleton & Lange, 1996. (see pp. 373-377 on presenting results for factor analysis)
14. SAS Institute Inc. SAS/STAT 9.1 user's guide. SAS Institute, 2004.
15. Spector PE. Summated rating scale construction: an introduction. Sage, 1992.
16. DeVellis RF. Scale development: theory and applications. Sage, 1991.
17. Knafl K, Breitmayer B, Gallo A, Zoeller L. Family response to childhood chronic illness: description of management styles. Journal of Pediatric Nursing 1996; 11: 315-326.

