A Critical Review of Psychosocial Hazard Measures
Prepared by
The Institute for Employment Studies
for the Health and Safety Executive
Health and safety legislation requires that employers regularly conduct risk assessments to identify
what in their workplace is a potential hazard to (ie could harm) employee health.
The idea of risk assessment for physical hazards is well established. More recently, attention has focused on assessing the risk from psychosocial hazards, and measures have been developed, or adopted from research, to assess the prevalence of workplace stressors.
Whilst much research has been done on stress, there exists no systematic overview of the different
types of stressor measures available in the UK, nor is there any consistently recorded information
about their relative merits.
This report seeks to fill that gap by identifying a wide range of commonly used measures, assessing
the research evidence available on them and providing an overview of their relative strengths.
Conclusions are drawn about the state of knowledge in this area and issues for practice and research.
This report and the work it describes were funded by the Health and Safety Executive (HSE). Its
contents, including any opinions and/or conclusions expressed, are those of the authors alone and do
not necessarily reflect HSE policy.
HSE BOOKS
© Crown copyright 2001
Applications for reproduction should be made in writing to:
Copyright Unit, Her Majesty’s Stationery Office,
St Clements House, 2-16 Colegate, Norwich NR3 1BQ
The Project Team
A large team of contributors were involved at different stages of
this report. Details of the full project team and their affiliations are
as follows:
Acknowledgements
Contents
1. Introduction
1.1 Research objectives
1.2 The risk management framework
1.3 What do organisations do?
1.4 What psychosocial hazard measures are available?
3. Methodology
3.1 Identifying measures of psychosocial hazards
3.2 Targeting appropriate material
3.3 Search and review process
3.4 Search results on tool names
3.5 Search results on citations
3.6 Final search results
3.7 Review procedure
3.8 A brief overview of the psychometric criteria
3.9 Development of a pro forma to capture assessments
3.10 Assessment criteria and information sought
4.11 Rizzo and House Measures of Role Conflict and Role Ambiguity
Bibliography
1. Introduction
There is a rapidly growing body of research on the management
and control of workplace stress. Some of it has attempted to
categorise the types of stress management initiatives organisations
can undertake. These categorisations often include the ideas of
prevention at source, providing individuals with skills which may
help them deal with stress problems, or treating those who have
been harmed (see, for example, Ivancevich, Matteson, Freedman
and Phillips, 1990). One approach which has gained in popularity, and which may help organisations determine the kind of stress intervention to undertake, has been to try to measure workplace stressors through the use of self-report questionnaires.
quality of psychosocial hazard measures. This chapter looks first
at the objectives for the research and then at the broader research
context for the study.
principles of regulations such as the Control of Substances
Hazardous to Health Regulations (originally made in 1988, but
subsequently amended and remade several times) can be
effectively employed to manage psychosocial hazards and the
psychological harms which may be a consequence of such
hazards. The implications are that psychosocial hazards can be
managed in much the same way as physical hazards and similar
risk assessment procedures can be used in their identification and
control in the workplace.
• identification of hazards
• assessment of associated risk
• implementation of appropriate control strategies
• monitoring of effectiveness of control strategies
• reassessment of risk.
some way, this is not necessarily done within a risk management framework. Indeed, it appears likely that what organisations do is engage in a wide range of rather different kinds of activities which, either deliberately or otherwise, assess and manage psychosocial hazards and harms.
and medium sized employers (SMEs) were doing things about
stress, some of which could be viewed as primary interventions.
However, it was only as a result of taking part in the focus groups
for the research that some SME managers realised such activities
counted as doing something about stress (as opposed to simply
being good management practice). Likewise, other practices such
as flexible working or employee attitude surveys may help with
identifying and managing psychosocial hazards and harms but
may not be considered in this light by organisations. Indeed,
many of the principles of good management in Investors in People
(IiP) might otherwise be labelled as primary stress management.
This would suggest that organisations use methods other than the
kinds of psychometric scales reviewed in this research to identify
workplace stressors. This has implications for the current project.
A complete review of each and every method of psychosocial
hazard assessment might include, for example, checklists of job or
environment characteristics, observations of employees, focus
groups, data from attitude surveys, measures of output, etc. The
use of such techniques is likely to vary across different types of
workplaces and different kinds of specific hazards. For example,
research into violence at work has identified specific hazards, such
as cash handling or certain aspects of working with the public.
Checklists have been developed to identify and minimise risks in
relation to these hazards and such checklists are widely used in
certain settings, eg the violence risk assessment checklist
promoted by UNISON or ‘Violence and Aggression to Staff in
Health Services’ (HSE, 1997). Attempts to review the whole area
would be an enormous task. The specific focus of this review is on
those psychometric scales or questionnaires designed to measure
workplace stressors.
diagnostic instruments. This type of review is helpful in
identifying some of the existing measures. However, not all
measures are available in the UK or would cross from US to UK
work cultures. A sense of what is available in the UK can be
gleaned from looking at publishing catalogues. However, an
examination of the catalogues of five major test publishing houses (Thames Valley Test Company; NFER-Nelson; SHL; Psychological Corporation; and ASE) revealed only one product, the Occupational Stress Indicator, as available specifically for the measurement of workplace stressors. Generic measures of harm were also available (eg the General Health Questionnaire [GHQ-12], which measures poor levels of psychological well-being).
2. Effectiveness of Measures of Psychosocial
Hazards (Stressors)
kinds of items presented and responses required. For example,
some items may be very similar to those found in widely-used
employee attitude or opinion surveys, where the respondent is
required to indicate the extent to which they agree or disagree
with a statement such as: ‘in this job there is a great deal to do’.
Other kinds of items may ask about the stressfulness or otherwise
of potential psychosocial hazards more directly, through a question
such as: ‘to what extent do you find each of the following to be
stressful?’. Such questions are followed by a list of potential
psychosocial hazards such as ‘workload’ and ‘relationships with
colleagues’ and respondents are required to indicate the extent to
which they feel each of the potential psychosocial hazards
presented is stressful. Yet others ask about the number of times an
individual has experienced a certain situation in a given time
frame. There are other types of both items and response scales,
and these will be discussed in more detail later. However, it is
worth noting that measures differ in the extent to which they
attempt to assess the magnitude of a hazard, its prevalence, or both.
Organisations may therefore vary widely in their reasons for
measuring psychosocial hazards. In addition, an organisation may
wish to measure psychosocial hazards to meet multiple goals —
from providing individual feedback to targeting interventions.
2.3.3 Different types of psychosocial hazard
measure
psychosocial hazards (eg interviews, observation, etc.). This lack of
information is perhaps surprising and disappointing given how
widely they appear to be used and the importance of the problems
they aim to address. However, as indicated earlier, it is not
uncommon for quite widespread organisational practices to have
received relatively little evaluation. Given the multiple aims and
reasons behind measuring stressors described earlier (Section 1.3),
the answer to whether or not measuring psychosocial hazards
helps depends to a large extent on why stressor measurement is
being undertaken.
3. Methodology
This chapter describes the process developed for identifying
measures of psychosocial hazards and the research papers which
contain information about the reliability and validity of those
measures. In the main, evidence on reliability and validity of
different measures was sought from peer reviewed published
research.
It should be noted that most of the evidence for the reliability and
validity of these measures does not come from research in which a
primary aim was to assess their reliability and validity. Rather,
most of the available evidence is taken from studies which used
the target measure to assess a range of work-related factors.
Figure 3.1: The review process
consideration and checked against other lists of measures such as
that supplied in Quick, Quick, Nelson and Hurrell (1997). The aim
at this stage was to be as inclusive as possible, so references to
groups of measures were also included for further investigation.
The following measures were identified as a starting point:
databases available mainly through academic libraries and used
widely in research for the identification of studies on a particular
topic. Abstracts (or summaries) from most relevant academic
journals are collected on the databases and were identified
through a search on key words. The following databases were identified as most relevant: Psyclit, Medline, and Web of Science (the replacement for BIDS).
Psyclit
Medline
Web of Science
Table 3.1: Journals selected for inclusion in review
• Sample size: minimum of 100 individuals. This is the smallest sample which can support reasonable multivariate analysis during the development and statistical assessment of the tool.
• Sample population: working adults, asked about work. A preference for multi-organisation studies over single-organisation ones was proposed, as was a preference for UK study groups. However, no papers were excluded for either reason alone at this stage.
• Sampling methodology: full population studies, random or systematic sampling were sought in preference to convenience sampling.
It was estimated that using the full databases' contents would have generated several thousand abstracts for sifting, many of which would have been irrelevant. As the aim of the project is to identify material on measures and techniques in current use, a cut-off date of 1990 was imposed. There was little difference between the 1995 and 1990 cut-offs in either the estimated number of papers or the range of material identified. Measures which had been published before 1990 were not necessarily excluded, provided articles had been published about or using them after 1989.
3.3 Search and review process
Additional measures were identified in the first, most general
search and were added to the list of potential measures and
approaches to evaluate.
The second step was a search on the names of the measures etc. identified so far, in each of the three databases. The full abstract
for each of the papers was then downloaded. In the version of
Medline available, it was not possible to limit the search to the
preferred journals, and the journal sift was carried out by hand.
Table 3.2: Search results on name of measure
(Search periods: BIDS/WoS and Medline 1/1/1990-17/7/2000; Psyclit 1/1/1990-18/7/2000. Columns: for each database, the number of hits, papers excluded (out) and papers retained (in); then the total number of hits and the number remaining after duplicates were identified.)
Measure | BIDS/WoS: hits out in | Medline: hits out in | Psyclit: hits out in | Total | After duplicates
Effort-Reward Imbalance 9 5 4 19 — 5 11 5 6 15 8
Interpersonal Conflict at Work Scale 0 — — 0 — — 1 0 1 1 1
Job Content Questionnaire 5 4 1 12 11 1 4 1 3 5 4
Job Environment Scale 0 — — 0 — — 0 — — — —
Job Stress Survey 1 0 1 2 0 2 4 0 4 7 4
Measures of Demand and Control* 23 15 8 10 5 3 10 3 7 18 10
Measures of Role Stressors 14 9 5 1 1 0 26 18 8 13 11
NHS Workforce Initiative/scales 0 — — 10 10 0 10 — — 1 1
NIOSH Generic Job Stress Questionnaire 0 — — 1 0 1 0 — — 1 1
Objective Work Characteristics 0 — — 1 — 0 0 — — — —
Occupational Stress Indicator 39 18 21 11 11 0 36 20 16 37 26
Occupational Stress Inventory 5 2 3 5 5 0 7 4 3 6 4
Organisational Constraints Scale 0 — — 1 — 1 1 0 1 2 1
Organisational Stress Health Audit 0 — — 0 — — 0 — — — —
Pressure Management Indicator 0 — — 1 — 1 1 0 1 2 1
Quantitative Workload Inventory 0 — — 1 — 1 1 0 1 2 1
Role Ambiguity/Conflict Measures 63 37 26 30 30 0 26 17 9 35 31
Stress Audits 4 0 4 27 27 0 2 0 2 6 4
Stress Diagnostic Survey 0 — — 0 — — 3 2 1 1 1
The Job Diagnostic Survey 13 4 9 4 4 0 16 10 6 15 13
The Stress Profile 7 6 1 27 27 0 6 5 1 2 1
Whitehall (II) Studies 52 36 14 90 85 5 9 4 5 24 15
Work Environment Scale 5 5 0 17 17 0 6 4 2 2 2
*(Demand and control) and (measure or tool or scale or checklist or survey)
3.5 Search results on citations
With the abstracts downloaded from the citation searches it was possible to identify and exclude those papers already found through the searches on tool names. In addition, some papers appeared in the citations of more than one tool. The results of the citation searches are presented in Table 3.3.
Full copies of the newly identified papers were then obtained for the next stage of the reviewing process.
Measure, original reference(s) and hits (all journals):

Measures of Demand and Control
- Karasek R A (1979), 'Job demands, job decision latitude, and mental strain: Implications for job redesign', Administrative Science Quarterly, Vol. 24, pp 285-308 (481 hits)
- Wall T D, Jackson P R and Mullarkey S (1995), 'Further evidence on some new measures of job control, cognitive demand and production responsibility', Journal of Organizational Behavior, Vol. 16, 5, pp 431-435 (11 hits)
- Jackson P R, Wall T D, Martin R and Davids K (1993), 'New measures of job control, cognitive demand and production responsibility', Journal of Applied Psychology, Vol. 78, 5, pp 753-762 (24 hits)

Measures of Role Stressors
- House R J and Rizzo J R (1972a), 'Towards the measurement of organizational practices: Scale development and validation', Journal of Applied Psychology, Vol. 56, 5, pp 388-396 (21 hits)
- House R J and Rizzo J R (1972b), 'Role conflict and ambiguity as critical variables in a model of organizational behavior', Organizational Behavior and Human Performance, Vol. 7, 3, pp 467-505 (89 hits)

Michigan Stress Assessment
- French J R P and Kahn R L (1962), 'A programmatic approach to studying the industrial environment and mental health', Journal of Social Issues, Vol. 18, pp 1-47 (23 hits)

NHS Workforce Initiative
- Haynes C E, Wall T D, Bolden R I, Stride C and Rick J (1999), 'Measures of perceived work characteristics for health services research: Test of a measurement model and normative data', British Journal of Health Psychology, Vol. 4, pp 257-275 (*)

NIOSH Generic Job Stress Questionnaire
- Hurrell J J and McLaney M A (1988), 'Exposure to job stress: A new psychometric instrument', Scandinavian Journal of Work, Environment and Health, Vol. 14, pp 27-28 (20 hits)
- National Institute for Occupational Safety and Health (1997), NIOSH Generic Job Stress Questionnaire, Cincinnati, NIOSH (0 hits)

Objective Work Characteristics
- Stansfeld S A, North F M, White I and Marmot M G (1995), 'Work characteristics and psychiatric disorder in civil servants in London', Journal of Epidemiology and Community Health, Vol. 49, 1, pp 48-53 (20 hits)

Occupational Pressure Inventory
- Not traceable (N/A)

Organisational Stress Health Audit
- (no reference listed; 0 hits)

Occupational Stress Indicator
- Cooper C L, Sloan S J and Williams S (1988), Occupational Stress Indicator Management Guide, Oxford, England, NFER-Nelson (103 hits)

Occupational Stress Inventory
- Osipow S H and Davis A S (1988), 'The relationship of coping resources to occupational stress and strain', Journal of Vocational Behavior, Vol. 32, pp 1-15 (17 hits)

Organisational Constraints Scale
- Peters L H and O'Connor E J (1980), 'Situational constraints and work outcomes: The influences of a frequently overlooked construct', Academy of Management Review, Vol. 5, pp 391-397 (53 hits)

Pressure Management Indicator
- Williams A and Cooper C L (1998), 'Measuring occupational stress: Development of the Pressure Management Indicator', Journal of Occupational Health Psychology, Vol. 3, 4, pp 306-321 (2 hits)

Quality of Employment Survey
- Margolis B L, Kroes W H and Quinn R P (1974), 'Job stress: An unlisted occupational hazard', Journal of Occupational Medicine, Vol. 16, pp 659-661 (42 hits)
- Quinn R P and Shepard L J (1974), The 1972-1973 Quality of Employment Survey, Ann Arbor, MI, Survey Research Centre (135 hits)

Stress Diagnostic Survey
- Ivancevich J M and Matteson M T (1980), Stress at Work, Glenview, IL, Scott Foresman (2 hits)

The Job Diagnostic Survey
- Hackman J R and Oldham G R (1975), 'Development of the Job Diagnostic Survey', Journal of Applied Psychology, Vol. 60, 2, pp 159-170 (391 hits)

The Stress Profile (Derogatis)
- Derogatis L R (1984), 'Derogatis Stress Profile' (12 hits)
- Derogatis L R (1978), Psychological Medicine, Vol. 8, p 605 (1 hit)

The Stress Profile (Setterlind and Larson)
- Setterlind S and Larson G (1995), 'The Stress Profile: A psychosocial approach to measuring stress', Stress Medicine, Vol. 11, 2, pp 85-92 (6 hits)

Whitehall II Studies
- Stansfeld S A, North F M, White I and Marmot M G (1995), 'Work characteristics and psychiatric disorder in civil servants in London', Journal of Epidemiology and Community Health, Vol. 49, pp 48-53 (*)

Work Autonomy Scales
- Breaugh J A (1985), 'The measurement of work autonomy', Human Relations, Vol. 38, 6, pp 551-570 (30 hits)

Work Environment Scale
- Moos R H (1981), Work Environment Scale Manual, Palo Alto, CA, Consulting Psychologists Press (43 hits)

* not included in original search
Table 3.4: Number of articles for review

The main measures:
- Job Diagnostic Survey: 12
- Job Stress Survey: 4
- Karasek Measures of Demand and Control/Job Content Questionnaire: 27
- Occupational Stress Indicator: 27
- Rizzo and House's Measures of Role Ambiguity and Role Conflict: 44
- Whitehall II Scales: 15

Total number of papers for full review: 129
All other approaches and measures: 30
Total: 159

Source: IES
The reasons for excluding papers at this stage (in addition to failing
to meet the criteria for inclusion discussed earlier) were as follows:
3. The authors added items to scales or sub-scales which were
not in the original.
4. The authors changed the wording of some or all of the items.
5. The authors combined sub-scales for analysis which were
separate in the original measure.
6. The authors scored the items differently (eg collapsing
categories).
7. Relevant information was not provided (eg response rate,
alpha coefficients).
A careful look at reliability and validity indicates that they are not
the same things, although they are related. It is possible for an
instrument to be reliable, but not valid. For example, a watch that
is consistently five minutes fast is reliable — it will indicate 12.05
pm every day at midday. But the watch is not valid — it is five
minutes out. Whilst it is possible for an instrument to be
consistent (reliable) but not accurate (valid), it is not possible for
an instrument to be valid if it is not also reliable. In terms of
psychosocial hazard assessment, the assessment should, for
example, identify the same hazards for the same person doing the
same job over time, provided that job does not change (that is, it is reliable), and the assessment should identify factors associated with the job that have the potential to cause some harm (that is, it is valid). There are numerous kinds of reliability and validity which
are discussed below and in the technical appendix.
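The watch analogy above can be made concrete with a few lines of illustrative Python (the readings are invented for the example):

```python
# Illustrative only: a consistently fast watch is reliable (its readings
# agree with each other) but not valid (they disagree with the true time).

true_noon = [720, 720, 720, 720]        # true minutes past midnight at midday
watch = [t + 5 for t in true_noon]      # the watch always reads 12.05 pm

spread = max(watch) - min(watch)        # reliability: do readings agree?
bias = [w - t for w, t in zip(watch, true_noon)]  # validity: match the truth?

print(spread)  # 0 -> perfectly consistent (reliable)
print(bias)    # [5, 5, 5, 5] -> always five minutes out (not valid)
```

The zero spread shows perfect consistency, while the constant five-minute bias shows the readings are systematically wrong: reliability without validity.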
3.8.1 Assessing reliability
There are essentially three main forms of validity: face, content and
construct validity.
assessment. In addition, if there were no obvious links between
the assessment and a clear theoretical framework which
attempted to explain how the hazards may cause harm, questions
over the content validity may also be raised.
measure something to do with work-related hazards, not, for
example, aspects of personality. This is discriminant validity.
Sensible response format for the items: preferably a clear and recent time frame; not a Likert scale format (ie strongly agree to strongly disagree), as such scales might be more reflective of attitudes to the work than of the experience of events at work; a frequency-based rating if the format is meant to measure incidence (eg not at all, once or twice, sometimes); or another sensible rating format.
Information about design flaws or other aspects of the study which are
relevant to assessing the reliability and validity of the measures.
4. Review of Main Measures
This chapter focuses on the main measures of psychosocial
hazards for which reasonable evidence of their reliability and
validity was available in the literature. It starts with general
findings about the sources of data used for the review, where the
information comes from and how the field of research into
organisational stress has developed as a whole. It provides an overview of the types of issues that are currently a source of debate and that provide an important context for how and why measures have developed in the way that we find them today. It
then goes on to discuss some of the key issues about validity
which apply to all the measures reviewed.
4.1.1 Peer-reviewed journals
• Submission of manuscript.
• The editor decides whether to send it for review.
• Expert reviewers (usually at least three) read and comment on the article (the review is blind: the authors' identities are not known to the reviewers, and the reviewers' identities are likewise not known to the authors).
• Comments and recommendations are given to the editor.
• The editor makes a decision (reject, accept, or revise and resubmit).
4.2.1 An interdisciplinary field
Another way of describing the field is that it always has been and
remains engaged in quite fundamental debates about both the
definitions of stress (ie what does stress actually mean?) and the
theoretical bases of stress (ie how can we understand and explain
what stress is and what it is supposed to do?). While it is common
for many fields to engage in such debates, the fundamental and on-going nature of these debates makes the organisational stress field somewhat unusual. This feature of the field also has implications
for how hazards are measured.
for their effects. This issue has likewise been debated at some
length in the research literature.
Last, most of the published studies in the field fall into one of a
number of categories, including the following. The simplest form
of study will simply describe the hazards and/or harms
experienced by a particular occupational group. A second type of
study aims to examine associations between measures of hazards
and measures of harms to assess which kinds of hazards appear to
be most strongly associated with harm. A third kind attempts to
look at the numerous factors involved in hazard-harm
relationships, such as personality, coping skills, and perhaps other
aspects of the work environment. For example, a study may
attempt to see whether aspects of employees’ personalities
increase or reduce the strength of the relationships between
hazards and harms. Finally, there are also many methodological
studies that aim to address some of the methodological issues
outlined above.
It should be noted that there are relatively few studies which have
as their primary goal the psychometric evaluation of hazard
measures, and hence most of the evidence cited here comes
indirectly from papers which report studies that have used
relevant measures and which contain data that tell us something
about reliability and validity.
4.3 Typical hazard measures
Most measures of psychosocial hazards have similar features.
First, they are self-report and perceptual (ie not based on
observation or the measurement of some aspect of work). They
tend to contain lists of statements (items) describing aspects of
work that may represent psychosocial hazards (for example: ‘I
have little control over the way my work is scheduled’, ‘I often
experience marked increases in workload’). Respondents are then
required to respond to the statement usually by indicating on a
scale (which is then given a numerical value) the extent to which
they agree or disagree with the statement or how often they
experience the situation described. Usually, for reasons of
reliability and validity, a particular hazard such as workload will
be assessed by a number of similar items (a scale or sub-scale) and
the total of, or average score on, those items used as an indicator
of the level or amount of that hazard perceived by the respondent.
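The scoring described above can be sketched in a few lines (a minimal illustration; the respondent identifiers and item data are hypothetical, not taken from any measure reviewed here):

```python
# Hypothetical workload sub-scale: three Likert-type items
# (1 = strongly disagree ... 5 = strongly agree), one list per respondent.
workload_items = {
    "resp_01": [4, 5, 4],
    "resp_02": [2, 1, 2],
}

def subscale_score(item_responses):
    """Average the item scores; a higher score indicates more of the
    hazard as perceived by the respondent."""
    return sum(item_responses) / len(item_responses)

scores = {r: round(subscale_score(v), 2) for r, v in workload_items.items()}
print(scores)  # {'resp_01': 4.33, 'resp_02': 1.67}
```

Summing rather than averaging the items gives the same ordering of respondents; averaging simply keeps the score on the original response scale.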
empirically when used in studies (see Section 3.9 for an overview
or Appendix I for the detail). There are, however, some aspects of
validity that are unrelated to the psychometric properties of the
measure itself and instead are related more fundamentally to the
theory on which the measure is based. Although a detailed
discussion of the theoretical aspects of validity is outside the scope
of this review, they are, nonetheless, important and form the basis
for considering nearly all aspects of validity.
4.4.1 Theory and content validity
checking that there are no significant relationships between
the measure and variables which are theoretically unrelated to the
measure. Third, predictive validity means examining expected
relationships between hazards and future harms. In each case, this
requires a good level of specification of what the hazard is, how it
works, what it should and should not be related to, and, most
importantly in this context, which particular harms may be caused
or predicted by the presence of the hazard.
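In practice these checks amount to expectations about correlations. A hedged sketch, using invented data and a hand-rolled Pearson correlation:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

workload = [1, 2, 3, 4, 5, 6]      # hazard scores (invented)
strain = [1, 1, 2, 3, 5, 5]        # a theoretically related harm measure
shoe_size = [7, 5, 8, 6, 7, 6]     # a theoretically unrelated variable

# Concurrent validity: the hazard should correlate with the related harm.
print(round(pearson(workload, strain), 2))     # 0.96, high positive
# Discriminant validity: it should barely correlate with unrelated variables.
print(round(pearson(workload, shoe_size), 2))  # -0.05, near zero
```

Predictive validity would apply the same logic with the harm measured at a later point in time than the hazard.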
The next sections look in more detail at the five main measures
identified for inclusion in the review.
The JDS comprises seven main scales which assess the following
work characteristics: skill variety; autonomy; task identity; task
significance; job feedback; feedback from others; and, dealing with
others. Of these, most attention has focussed on the first five
scales; the latter two scales (covering relations with others) often
being dropped from research studies. Two forms of the original
questionnaire exist: the ‘short form’ consisting of 53 items and a
longer version. A revised version has also been proposed.
4.5.1 Reliability
4.5.2 Validity
Face validity
Content validity
the scales are not time-bound and the wording is somewhat old-fashioned in places.
Concurrent validity
Predictive validity
Discriminant validity
4.5.3 Utility
‘To what extent does your job require you to work with mechanical
equipment?’
• rate the accuracy of a statement (eg the job requires a lot of co-operative work with other people); and
• indicate how much they agree with a statement (eg my opinion of myself goes up when I do this job well).
The measure is freely available in the literature and has a scoring
key for both the short and long versions. It is generally applicable
across industrial sectors and occupations. Some limited norm data
are available in the literature.
This research identified six relevant papers for review through the
literature search. However, three papers had to be dropped on
subsequent reading of the full paper, leaving three original
research studies from which evidence is drawn.
4.6.1 Reliability
No data are given for the sub-scales, nor are data given on test-retest reliability, test-retest sensitivity or inter-rater reliability.
4.6.2 Validity
Face validity
Content validity
Construct validity
Concurrent validity
Only one study looked at whether or not the sub-scales had the
expected relationships with other measures and behaved in the
expected way – in other words, are the JSS sub-scales related to
other measures as we would expect (eg are high levels of stress
related to high levels of psychological harm?). The findings from
the available research on the JSS were somewhat mixed.
Predictive validity
Discriminant validity
Discriminant validity data for this measure were not reported and
could not be ascertained from the papers reviewed.
4.6.3 Utility
That said, the JSS also has a number of weaknesses. To date, relatively few data on reliability and validity have been published, which makes it hard to fully assess the strength of the measure. What limited work has been published is good, but ideally these findings should be replicated across a range of different settings and occupational groups before firm conclusions about the performance of the measure can be drawn.
Additionally, while the items in the scale (face and content
validity) seem reasonable, and the structure of the scale (construct
validity) appears good, evidence on the relationship between this
measure and other theoretically related scales (concurrent
validity) was mixed. There is no evidence of the ability of this
measure to predict whether or not subsequent harm will occur (let
alone the extent of such harm) following exposure to the hazards
identified. Finally, the most consistent relationships of the JSS sub-scales and the JSI (Job Stress Index: see Section 4.6) were with locus of control (LOC). As LOC is often characterised as an individual trait (a way of viewing the world around you as within or outside your control), this calls into question the ability of the scale to discriminate general views from specific work-related risks.
indicate the amount of stress (ie the severity) they perceive to be
associated with each of the items on a scale of 1–9 in comparison
to a standard stressor (the assignment of disagreeable duties)
which is rated at 5. Respondents are then asked to indicate (on a
scale of 1–9+) the number of days over the previous six months
when they have been exposed to/encountered this stressor.
Typical items include ‘excessive paperwork’ and ‘working
overtime’ (Spielberger and Reheiser, 1994).
• Psychological Demands.
• Social Support.
• Physical Demands.
• Job Insecurity.
As has been indicated above, the JD-C model has been a very
popular basis on which to approach stressor research. As a result
we were able to identify 34 papers which, on the basis of the
abstract, looked appropriate for inclusion in the review. On closer
perusal, however, only 12 fulfilled the inclusion criteria. Many of
the excluded papers were directly concerned with exploring the
relevance and validity of the JD-C model to current working
environments. Unfortunately, for the purposes of this study, they
used one-off variations or adaptations of some of the JCQ scales,
so could not be used for establishing reliability or validity.
Of the remaining 12 papers, only two used all five JCQ scales. Five
of the papers used just the demands and control scales; two
papers used three scales (demands, control and social support);
and in some of the papers reliabilities are given separately for the
skill discretion and decision authority sub-scales of the decision
latitude measure. Additionally, five papers were found which
related to a revised version of the demands and control scales
used in an advanced manufacturing technology setting (Jackson,
Wall, Martin and Davids, 1993) and several papers were found
which used revised scales amongst London-based civil servants
(the Whitehall II studies; see, for example, Stansfeld, White, North
and Marmot, 1995). These studies are considered separately in
Section 4.8.
4.7.1 Reliability
The two studies which report on all five JCQ scales do not report
scale reliabilities. Three of the five studies using just the demand
and control measures report whole-scale reliabilities. One paper
provides separate reliabilities for the decision authority and skill
discretion sub-scales, job demands and social support. Overall, the
following can be summarised:
l Reliabilities for the job demands scale range from .64 to .81.
l Reliabilities for the job control scale taken as a whole are
generally much better, ranging from .77 to .86.
l The one paper reporting separate reliabilities for the job
control sub-scales found the skill discretion scale to have a
reliability of .74, whereas the decision authority scale had a
reliability of .65.
l One reliability was reported for the social support scale at .81.
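The reliabilities quoted above are internal consistency (Cronbach's alpha) coefficients. As an illustration of how such a coefficient is computed from item-level data (a minimal sketch, not part of any JCQ scoring software):

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of per-item score lists,
    each ordered by respondent."""
    k = len(item_scores)
    item_vars = [statistics.pvariance(item) for item in item_scores]
    totals = [sum(resp) for resp in zip(*item_scores)]  # scale totals
    total_var = statistics.pvariance(totals)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Alpha rises as the items co-vary: three perfectly parallel items return 1.0, while items that track each other only loosely fall towards the lower end of the range reported above.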
The original job control scale has come in for criticism in the
literature for mixing together skill discretion and decision
authority. This may in part explain why so many different and
adapted versions of the scale exist. It also suggests that there are
probably more reliable versions of this scale available, but an
investigation of all of them is beyond the remit of this particular
study.
4.7.2 Validity
Face validity
Face validity appears good for the JCQ scales, with items being
developed and revised over decades and adapted for the UK.
Content validity
Frequency-based rating scales are not used, but questions are
phrased so as to avoid cognitive bias.
Construct validity
One lasting criticism of the JCQ has been the structure of the job
control measure and the extent to which it accurately measures
distinct components of control. This criticism has led to many
different variants of the control measure being developed. On the
whole though, control has been found to be a useful construct.
The demand scale is less contentious.
Concurrent validity
Predictive validity
Discriminant validity
4.7.3 Utility
Extensive data are available via the JCQ Centre and the JCQ users
network. The following is the JCQ Usage Policy:
The first group of studies centres on work by Jackson and
colleagues to develop more specific measures of demands and
control for use in an advanced manufacturing setting.
A further study using just the timing and method control scales
found reliabilities of .75 for timing control and .69 for method
control. One study which uses a combined method and timing
control measure reported a scale reliability of .83.
Table 4.4: Jackson, Wall, Martin and Davids — Demands and Control Scales
4.8.2 Validity
Face validity
Content validity
Construct validity
Concurrent validity
Predictive validity
Discriminant validity
4.8.3 Utility
Only one set of reliability data was identified for the scales
amongst the papers reviewed; details are given in Table 4.5.
Reliabilities are good, with the exception of the job demands scale,
which falls below the acceptable threshold (of 0.7), although this
might be in part due to its brevity.
Scale Reliability
Decision latitude (control – 15 items) 0.84
Job demands (four items) 0.67
Social support (six items) 0.79
Face validity
Content validity
Construct validity
Concurrent validity
Predictive validity
Importantly, the research has shown that these relationships hold
true even when personality characteristics such as negative
affectivity (a person's propensity to perceive and answer questions
negatively regardless of the objective circumstances) are taken
into account.
Discriminant validity
For example, the job demands scale (high scores indicating, eg, a
high pace of work or conflicting demands) was found to be
predictive of future psychiatric disorder, whereas low job control
was found to increase the risk of heart disease. Social support at
work was generally found to be a positive factor in determining
psychological health, as was decision authority.
Utility
Table 4.6: OSI Sources of Pressure Scale
The OSI is perhaps the best known measure of its type in the UK
and is widely used both in research and commercially within
organisations. As a result of its widespread use, many papers
which had used the OSI were initially identified in the literature.
However, on reading the full papers, it became clear that only 14
of the original 47 were suitable for inclusion in the review.
Twenty-one were excluded on the basis that they did not meet the
original criteria for inclusion in the review. A further seven papers
had to be dropped because they did not provide relevant
information (such as response rates or reliability coefficients for
the OSI sources of pressure sub-scales) and five papers were
excluded because sub-scales had been combined or items dropped
from the original measure.
4.10.1 Reliability
4.10.2 Validity
Face validity
The scale was developed with largely white collar and managerial
workforces, and there is little evidence of attempts to pre-test
items on a target population. As a result, the scale might be of less
relevance in non-managerial settings.
Content validity
Items for the OSI sources of pressure scales were based on Cooper
and Marshall's (1976) model of occupational stress. The 61 items
therefore represent the sources of stress described in the model,
which in turn was based on the available stress literature.
Construct validity
l managerial responsibility
l organisation culture
l work demands
l personal demands of work.
The main factor in Davis' analysis (managerial responsibility)
accounted for a substantial proportion of the variance in responses
to the scale, indicating that there might be a single underlying
explanation for responses. These four factors, when treated as
scales, had better reliability than the original scales proposed by
Cooper, which led Davis to conclude that the four scales might
be more useful in practical applications of the measure.
The third study which casts doubt on the construct validity of the
OSI sources of pressure scale was conducted by Lyne, Barrett,
Williams and Coaley (2000) who, like Davis before them, found ‘no
correspondence’ between the patterns of responses found in the
data and the sub-scales suggested by Cooper and Bramwell
(1998). Lyne and colleagues found that statistical analysis
suggested the sources of pressure scale was in fact best interpreted
as three, or possibly four, sub-scales consisting of:
l workload
l pressures in the role of employee
l pressures of the managerial role
l (lack of support from home).
problems apply to current English versions of the OSI sources of
pressure scale.
Concurrent validity
Predictive validity
Discriminant validity
4.10.3 Utility
The OSI is perhaps the best known and most widely used measure
of workplace stress. It was designed specifically to aid the
diagnosis of stress in organisations. The OSI sources of pressure
scale, which is the only part of the OSI to be reviewed here,
consists of 61 items across six sub-scales.
The OSI has been widely used and researched and as a result it is
relatively well understood psychometrically compared to some of
the other measures included in this review. However, this
research also reveals several issues of concern relating to use of
the OSI.
4.11 Rizzo and House Measures of Role Conflict and Role
Ambiguity
Work on role dynamics and their impact on commitment,
satisfaction and performance within the workplace dates back to
1964, when Kahn first proposed the theory of organisational role
dynamics. Rizzo, House and Lirtzman were among the first to
tackle the task of developing measures of these potential
workplace hazards (Rizzo, House and Lirtzman, 1970). Rizzo and
House's work has spanned the last 30 years and has focused on the
relationships between role ambiguity (sometimes referred to as
role clarity), role conflict and other theoretically related measures,
such as leadership, satisfaction and anxiety. Since they were first
presented in a 1970 edition of Administrative Science Quarterly, the
Rizzo and House role ambiguity/conflict scales have been
widely used and appear in many research studies in one form or
another.
Rizzo, House and Lirtzman (1970) state that the original scales were
developed in response to the recognition that…
Role conflict:
Role ambiguity:
4.11.1 Reliability
4.11.2 Validity
Face validity
Table 4.7: Scale reliabilities for role conflict and role ambiguity scales
that was developed in the US over 30 years ago. This therefore
raises some questions about the relevance of the language used in
the measure for a UK sample in today's working environment.
Content validity
Construct validity
Concurrent validity
Predictive validity
Discriminant validity
Discriminant validity data were mixed for the role conflict and
role ambiguity scales. Two studies in particular suggest good
evidence of discriminant validity (Kelloway and Barling, 1990;
Smith, Tisak and Sneider, 1993), whereas other results point to a
lack of discrimination (Schuler, Aldag and Brief, 1977). Hall and
Spector (1991) point out that the pattern of responses on these and
other measures remains consistent across people in similar and
different jobs, raising questions about the extent to which the work
environment is the sole cause of the observed relationships.
4.11.3 Utility
These measures of role ambiguity and role conflict are among the
longest established. Even so, recent research suggests that they are
still useful in raising awareness of the existence of certain
workplace stressors.
5. Information About Other Measures
This section reviews an additional 11 measures for which only
limited evidence of validity, reliability and utility was found.
These are:
l Effort-Reward Imbalance
l NHS Measures
l NIOSH Generic Job Stress Questionnaire
l Occupational Stress Inventory
l Pressure Management Indicator
l Role Hassles Index
l Stress Audits
l Stress Diagnostic Survey
l Stress Incident Record
l The Stress Profile
l Work Environment Scale.
5.1 Effort-Reward Imbalance (Siegrist and Peter, 1994)
5.1.1 Background
‘First, its focus is not on job task content but on the reward structure of
work.’ (Siegrist and Peter, 1994, p.131)
5.1.2 Utility and other information about the
measure itself
Not applicable.
5.2.1 Background
The aim was to develop short scales for use in the NHS, with good
face validity, clear factor structure and high internal reliabilities.
The scales were designed to measure autonomy/control, feedback
on work performance, influence over decisions, leader support,
role clarity, role conflict, peer support, and work demands. A final
construct of professional compromise was identified through pilot
interviews.
Sample items
See Haynes, Wall, Bolden, Stride and Rick (1999, pp. 273-275) for
full item list.
5.3.1 Background
Number of studies/reports used to examine this measure
5.4.1 Background
Two.
Administration and scoring
All items are in self-report format. The response scale is a five-
point Likert type, ranging from ‘rarely or never true’ to ‘true most
of the time’. The 14 scales can be used separately or summed to
produce measures of stress, strain and coping.
5.5.1 Background
‘As the PMI is a new instrument it is not yet possible to provide a
detailed list of research publications.’
One.
Sample items
Generally applicable.
5.6.1 Background
How was it developed?
Two.
Sample items
Availability of norm data
Numbers of items and sub-scales
Sample items
Factor analysis identified eight main factors for oil rig workers
(career prospects, safety, home-work interface, under-stimulation,
physical conditions, unpredictability, living conditions, physical
climate) plus four additional factors (organisation structure and
climate, physical well-being, workload, air transportation). Eight
factors were found by Sutherland and Davidson (ambiguity,
overload, manpower problems, culture and problems, home-work
interface, role insecurity, boundary relationship, new technology).
5.8.1 Background
How was it developed?
5.9.1 Background
Two.
5.9.2 Utility and other information about the
measure itself
Not applicable.
Widely applicable.
Not applicable.
5.10.1 Background
An initial pool of 300 questions was tested on 500 subjects. ‘On the
basis of the statistical analysis all unreliable questions were
deleted. The remaining 250 were subjected to factor analysis’
(Setterlind and Larsson, 1995, p. 87). The reduced Stress Profile
was tested on a new group of 400 subjects and these results cross-
checked for validity against the first sample, reducing the profile
to 224 items.
One.
5.10.2 Utility and other information about the
measure itself
At least two other instruments called the Stress Profile — one
by Derogatis (1984), the other by Wheatley (1990) — were
identified during the search procedure. However, these two were
excluded from the review as the instruments and scales were not
about work and there is little evaluative literature.
5.11.1 Background
Number of studies/reports used to examine this measure
Two.
6. Conclusions and Recommendations
Thus far, this review has considered evidence for the reliability
and validity of a range of psychosocial hazard measures. It has not
yet summarised this evidence nor considered its implications for
research and practice. This chapter concludes the report by
providing a brief overview of the evidence presented in detail
earlier and considers what this evidence means for both future
research and future practice. Before this is done, the objectives of
the review are restated, and a description is provided of the kinds
of evidence and measures that were found.
6.2 What evidence was available?
A surprising finding of this review, given the many thousands of
research papers on occupational stress produced over the last
thirty or more years, is the general lack of serious (replicated)
studies examining the psychometric properties of measures of
psychosocial hazards. While there were many studies which used
these measures, they did not often include information which
could be used in this review. This was for two main reasons.
6.3 What measures are available?
A striking finding of this review was the lack of variety in the type
of psychosocial hazard measures that have been developed and
used. Extensive searches of the literature and discussions with
professional bodies revealed that by far the most common type of
hazard measurement was the self-report questionnaire. In
addition, nearly all of these were designed primarily for research
and not as organisational tools.
6.4 Evidence for reliability
Reliability is concerned with the consistency of measurement.
There are a number of different forms of reliability, and each of
these was considered when examining the hazard measures.
6.5 Evidence for validity
While reliability is concerned with the consistency and
performance of the measurement, validity asks about the extent to
which scales accurately measure what we think they do. In
general, more evidence was available for each of the forms of
validity than was the case for each form of reliability.
Some aspects of validity are not connected with how the hazard
measure performs in practice but rather with the underlying
theory or explanation about what the hazard is, how it works, and
why it is being measured in the way that it is. As discussed earlier,
many forms of validity and, in particular, content and construct
validity, are very seriously compromised by the limited and weak
theory which underlies some of the hazard measures reviewed.
related measures taken at the same time. For example, we
might expect that a measure of workload would relate to a
measure of fatigue. A reasonable quantity of evidence was
available, and this indicated that, on the whole, the measures
showed moderate to good levels of concurrent validity.
However, it should also be noted that it was often the case that
hazard measures were related to many other variables, which
can suggest not only good concurrent validity but also weak
divergent validity (see below).
l Predictive validity: of all the kinds of reliability and validity
thus far discussed, predictive validity would appear to be the
most important feature of hazard measures as it refers to
whether a measure taken at one point in time predicts
theoretically related and important outcomes at some point in
the future. In other words, is there evidence that the hazard
measures reviewed here actually predict future levels of, say,
harms such as illness? One of the most significant findings of
this review is that there is very little evidence about the
predictive validity of psychosocial hazard measures. This
means that in general we simply do not know whether these
measures are valid tools measuring hazards which predict
harms.
l Discriminant validity: this refers to the extent to which a
measure is unrelated to theoretically distinct variables. As
mentioned above, concurrent validity for many of these
measures is good, in that they correlate with theoretically
related measures. However, there is also evidence that these
measures correlate with other measures to which they are not
theoretically related. If measures are related to things they
should not be related to, this gives reasonable grounds to
question their discriminant validity.
6.6 The utility of hazard measures
Given the similar nature and format of most hazard measures
discussed above, relatively similar points can be made about their
utility. First, they can be administered by anyone and no special
training is explicitly required. Second, they are all reasonably easy
to complete, though some, particularly the generic stressor
measures, contain a larger number of items. Third, there are issues
around the interpretation of these hazard measures which
somewhat diminish their utility.
6.7 Recommendations
The main aim of this review was to assess the evidence for the
reliability and validity of a range of psychosocial hazard
measures. While it is recognised that, in practice, these measures
are probably rarely used on their own but supplemented with
other forms of investigation and assessment, it remains vital that
the measures that are used have reasonable reliability and
validity.
Broadly speaking, there was relatively little sound evidence about
the reliability and validity of these measures. However, what
evidence there was strongly suggested that the quality of these
measures is limited. This means that their utility is also likely to be
quite limited. These weaknesses have now for the first time been
systematically identified. Some of the steps which can be taken to
improve such measures of psychosocial hazards are now
considered. These recommendations are not comprehensive but
focus on those that seem most important and urgent. First the
implications for practice and then the implications for research are
considered, though it is recognised that these areas are
interrelated.
Third, it is recommended that organisations should continue to
develop other ways of assessing hazards in addition to self-report
questionnaires such as:
l observations
l task analysis
l job descriptions
l reports of harms and what these may tell us about hazards.
development of such measures. The testing of these new forms of
hazard measure needs to take place in diverse organisational
contexts in order to maximise reliability, validity, and utility.
Appendix I: Psychometric Criteria for Assessing
Psychosocial Hazard Measures
of answer to a number of standard questions from a closed list (eg
rate on a five point scale). Each choice in this list of answers is
usually then given a number (eg on a frequency scale, ‘never’ = 1,
‘sometimes’ = 2 etc.). Such data are usually treated as interval level
data, but are more correctly viewed as scalar (ie falling between
interval and ordinal data). This is the case for the standardised
instruments reviewed in this report. Qualitative or non-standard
methods cannot usually be assessed for reliability or validity in
the ways described below, although this is sometimes possible (cf
Daniels, de Chernatony and Johnson, 1995). Nevertheless, where
possible, researchers should make every effort to report what
psychometric or other evidence they have for the validity and
reliability of the methods used.
A1.1 Reliability
There are two main ways of assessing reliability for self-report
instruments: (i) internal consistency reliability, which is essential
for any instrument; and (ii) test-retest reliability, which may or
may not be appropriate in any given instance. For instruments
completed by external raters, a third form, inter-rater reliability, is
appropriate.
that, on balance, the instrument shows acceptable reliability, with
a caveat for those scales with reliability <.70.
l rwg (James, Demaree and Wolf, 1984, 1993), which returns an
index of agreement across raters for each person on each scale
being rated. An average rwg may be reported across all persons
rated.
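As a sketch of the single-item form of this index (assuming the sample variance of the ratings and a discrete uniform null distribution over the response options; both assumptions should be checked against James, Demaree and Wolf before use):

```python
def rwg(ratings, n_options):
    """Single-item r_wg: agreement among raters on one target,
    relative to the variance expected from uniformly random
    responding across n_options scale points.
    Uses the sample (n-1) variance of the ratings."""
    n = len(ratings)
    mean = sum(ratings) / n
    observed = sum((x - mean) ** 2 for x in ratings) / (n - 1)
    expected = (n_options ** 2 - 1) / 12  # discrete uniform variance
    return 1 - observed / expected
```

Perfect agreement returns 1; values fall towards zero (and can go negative) as raters disagree more than chance responding would imply.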
A1.2 Validity
There are essentially three main forms of validity: face, content and
construct. (NB: Different texts present slightly different
classification and sub-classifications of terms for validity, but all
include in general the forms of validity outlined here, cf
Oppenheim, 1992; Spector, 2000).
A1.2.4 Construct validity and factor analysis
correlate, so oblique rotation is often used (usually OBLIMIN), but
in some circumstances orthogonal rotation might be used instead
(often VARIMAX). Practical experience indicates that quibbling
over the choice of rotation may be ‘splitting hairs’ in many
situations.
a) Sample size: Ideally the sample size should exceed four times
the number of items in a scale or 100, whichever is the greater (ie
100 is about the minimum required sample for a factor analysis). If
there are more than 25 items in the scale, the number of
respondents should be at least four times the number of items.
Generally speaking, the greater the sample size the more stable
the results. Thus, sample sizes exceeding ten times the number of
items in a scale or 200, whichever is the greater, are likely to give
more stable solutions. Sample sizes exceeding 20 times the
number of items in a scale or 400, whichever is the greater, are
likely to give even more stable solutions. Sample sizes exceeding
1,000 are likely to give the most stable solutions.
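These rules of thumb can be expressed directly (the function names are illustrative):

```python
def minimum_n(n_items):
    """Minimum sample for factor analysis: four times the number of
    items, or 100, whichever is the greater."""
    return max(4 * n_items, 100)

def stable_n(n_items):
    """Sample likely to give a more stable solution: ten times the
    number of items, or 200, whichever is the greater."""
    return max(10 * n_items, 200)
```

So a 20-item scale needs at least 100 respondents, and at least 200 for a reasonably stable solution.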
b) The Kaiser-Meyer-Olkin (KMO) measure of sampling
adequacy is also an index of the size of correlations amongst
items. This should be reported and should exceed 0.70. Values in
excess of 0.80 or even 0.90 are preferred.
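The KMO statistic can be computed from a correlation matrix by comparing the ordinary correlations with the partial (anti-image) correlations obtained from the matrix inverse. A minimal numpy sketch:

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin sampling adequacy for correlation matrix R."""
    R = np.asarray(R, dtype=float)
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                 # anti-image correlations
    off = ~np.eye(R.shape[0], dtype=bool)  # off-diagonal mask
    r2 = np.sum(R[off] ** 2)               # squared correlations
    q2 = np.sum(partial[off] ** 2)         # squared partial correlations
    return r2 / (r2 + q2)
```

For two variables the index is 0.5 by construction; with more variables it rises towards 1 when the partial correlations are small relative to the raw correlations, which is what makes the matrix suitable for factoring.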
which for a scale suspected to be multi-dimensional may indicate
substantial response bias).
c) Items load on expected factors, or item loadings make theoretical
sense. Factor loadings >.30 are usually taken as the minimum
threshold for a significant factor loading, although stricter
thresholds of >.40 or >.50 are sometimes applied. It is important
that the pattern of loadings indicates a theoretically interpretable
factor.
d) Items do not have large cross-loadings on several factors. Items with
cross-loadings on several factors are ‘noisy’ or inaccurate.
Retaining them can compromise the validity of the instrument.
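Criteria (c) and (d) can be applied mechanically to a rotated loading matrix. A sketch, using the illustrative thresholds discussed above:

```python
def screen_items(loadings, primary_min=0.40, cross_max=0.30):
    """Split item indices into (keep, drop) for a rotated loading
    matrix (rows = items, columns = factors). An item is kept when
    its largest absolute loading reaches primary_min and every
    other loading stays below cross_max (ie no large cross-loading)."""
    keep, drop = [], []
    for i, row in enumerate(loadings):
        mags = sorted((abs(l) for l in row), reverse=True)
        ok = mags[0] >= primary_min and (len(mags) == 1 or mags[1] < cross_max)
        (keep if ok else drop).append(i)
    return keep, drop
```

Such a screen is a starting point only: the theoretical interpretability of the retained pattern still has to be judged by the analyst.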
because CFA on the other half of these data would be conducted
on data collected by the same researchers, using the same
protocols and procedures as for the data subjected to EFA.
Therefore, there are likely to be common errors across both
samples, making true cross-validation problematic (Hurley,
Scandura, Schriesheim, Brannick, Seers, Vandenberg and
Williams, 1997). Even independent samples gathered by the same
group of researchers may share common problems across studies,
making independent validation using CFA on new samples also
desirable.
Use of fit indices. Most CFA packages give several fit indices. In
many cases, fit indices such as the significance of χ2, AIC (Akaike’s
information criterion) and CAIC (Bozdogan’s variant of Akaike’s
information criterion) give values specific to that sample, and
should be used only to compare alternative models fitted on the
same sample for the instrument. There are several indices whose
range is ordinarily approximately between 0 and 1 for all samples,
and definitive guidelines can be given for judging the fit of a
hypothesised model across samples. These fit indices include (see
for example Medsker, Williams and Holahan, 1994):
Note that all fit indices should exceed 0.90, but the current
consensus is that the CFI should exceed 0.95 (Hu and Bentler,
1998), although previously values of >.90 were considered
adequate for the CFI.
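For example, the CFI compares the fitted model with a null (baseline) model; a sketch of the standard non-centrality formula:

```python
def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index from model and baseline chi-square
    values and their degrees of freedom."""
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_null - df_null, chi2_model - df_model, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den
```

Because the numerator and denominator are both floored at zero, a model whose chi-square does not exceed its degrees of freedom returns a CFI of 1.0.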
variable taken before or concurrently with the measures of
hazards, so that the measures of psychosocial hazards are
predicting future changes in harm. Given the basic purpose of
hazard assessment (to measure features of work that may cause
harm), predictive validity is extremely important.
A1.3.1 Reliability
Internal consistency
Test-retest reliability
B. Approaching acceptability: Most sub-scales reliable (r>.70),
some marginal (>.60), over short and stable period.
C. Unacceptable: Instrument unreliable (many or all rs <.70, over
short and stable period).
n/a (1) — only one wave of data collected.
n/a (2) — period of months or years elapses between
measurements.
n/r — test-retest reliability not reported.
NB — a short and stable period is defined as a few days to a few
weeks, where no organisational or job change takes place.
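Test-retest reliability over such a period is simply the Pearson correlation between time 1 and time 2 scores for the same respondents:

```python
def pearson_r(time1, time2):
    """Pearson correlation between two waves of scores,
    matched by respondent."""
    n = len(time1)
    m1, m2 = sum(time1) / n, sum(time2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(time1, time2))
    v1 = sum((a - m1) ** 2 for a in time1)
    v2 = sum((b - m2) ** 2 for b in time2)
    return cov / (v1 * v2) ** 0.5
```

A value above .70 over a short and stable period would meet criterion A or B above.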
Test-retest sensitivity
Inter-rater reliability
C. Fairly good > 40 per cent.
D. OK > 30 per cent.
E. Poor > 20 per cent.
F. Very poor < 20 per cent.
(NB — these figures were developed from reading the relevant
literature and from experience with psychosocial hazard
measures, rather than from formal statistical criteria.)
A1.3.2 Validity
Face validity
Content validity
A. Yes
B. No.
Sample size
C. OK: size > ten times the number of items in a scale or 200.
D. Minimum: size > four times the number of items in an
instrument or 100, whichever is the greater.
E. Not acceptable: size < four times the number of items in an
instrument or < 100.
n/r: not reported.
A. Marvellous >.90.
B. Meritorious >.80.
C. Middling >.70.
D. Mediocre >.60.
E. Miserable >.50.
F. Unacceptable <.50.
n/r: not reported.
A. Suitable: Sample size and KMO all fall within A,B,C range.
B. Marginal: Sample size — range A-D, KMO range A-E.
C. Not suitable: sample size = E/nr, KMO = F/nr.
NB: For principal components analysis, post-rotation variance
accounted for = pre-rotation variance accounted for.
Use of CFA
Sample size
Covariance matrix
Model fit
Loadings
Use of modification tests
C. Poor: statistically significant relationships/correlations with
theoretically related variables measured after administration of
hazard measures, including some measures of harm. Pattern of
relationships is almost the same for each sub-scale, including
correlation coefficients of similar size, relationships are with only
one or two other variables or pattern of significant relationships
disappears after controlling for initial levels of predicted variables.
3. Promising, but additional independent evidence needed. Two
or more studies conducted, but all by research teams
connected to scale developers.
4. Additional evidence needed. Only one validation study
conducted.
23. Additional evidence needed but promising: test-retest data
available from only one study — rated A.
24. Additional evidence needed but problematic: test-retest data
available from only one study — rated B or C.
25. Unknown: no test-retest data available.
38. Very good: Response rate rated A across all of several studies.
39. Good: Response rate rated A or B across all of several studies.
40. Fairly good: Response rate rated A, B or C across all of several
studies.
41. OK. Response rate rated A-D across all of several studies.
42. Possible problems. Response rate rated A-D across most of
several studies, occasional E or F.
43. Poor. Response rate rated mainly E or F across several studies.
Face validity
Content validity
49. Problematic (II): Scales subject to EFA in different studies,
scale structures replicated across studies, EFA solutions rated
A or B across studies.
50. Problematic (III): Scales subject to EFA or CFA in one study on
one sample, EFA/CFA solution rated B.
51. Not valid: Scales subject to EFA or CFA across several studies,
and either structure not replicated, or EFAs or CFAs rated C or
worse.
52. Unknown: No EFAs or CFAs reported.
Overall quality of discriminant validity
Appendix 2: Single Study Proforma
Paper ID:
Response rate
N = Response rate =
Reliability
Not rep’d 9 Not used 8 N/A 7 most ‘r’s>0.7 0 most ‘r’s<0.7 1 all ‘r’s<0.7 2
Validity
Concurrent Validity:
Predictive validity:
Divergent Validity:
Construct validity:
Either here or in other work reported in the article, was:
EFA used as the tool was being developed? Yes 1 No 0
NB: if ‘Yes’ for EFA or CFA refer to statistics expert as the second reviewer.
© Institute for Employment Studies
Appendix 3: Pro Formas for Exploratory and
Confirmatory Factor Analyses
Paper ID No.
Use of CFA:
Very appropriate: prior EFAs conducted on scales with independent samples or strong a priori structure. 3
Appropriate: prior EFAs conducted or strong a priori structure. 2
Not appropriate: neither prior EFAs conducted with scales nor strong a priori structure. 1

11. Sample size:
Very good: size > 1000 5
Good: size > 20 * the no. of items in a scale or 400 4
OK: size > 10 * the no. of items in a scale or 200 3
Minimum: size > 4 * the no. of items in a scale or 100 2
Not acceptable: size < 4 * the no. of items in a scale or < 100 1
Not reported 9

12. Covariance matrix:
Matrix used in analysis not reported 9
Analysis on covariance matrix 2
Analysis on correlation matrix 1

13. Evaluation of suitability for CFA:
Suitable: (Use of CFA 3/2; sample size 5-3; covariance matrix 2) 3
Marginal: (Use of CFA 3/2; sample size 5-2; covariance matrix 2/1) 2
Not suitable: (Use of CFA 1; sample size 2/1; covariance matrix 1/9) 1

Model fit:
Good: Several fit indices of kind described, all exceed min threshold. 4
Marginal: Several fit indices, all but one exceed min threshold. 3
Uncertain: One fit index which exceeds min threshold. 2
Unacceptable: One or several fit indices all below min threshold. 1

15. Loadings:
Acceptable: all significant and in hypothesised direction 3
Marginal: nearly all significant and all in hypothesised direction 2
Unacceptable: several non-sig loadings, or not in hyp’d direction. 1

16. Use of modification tests:
Acceptable: Modification tests not used, or modified scale structure tested on a separate sample. 2
Unacceptable: Modification tests used, but no attempt at cross-validation in a separate sample. 1

17. Overall evaluation of CFA solution:
Excellent: Model fit 4, loadings 3, modification tests 2. 4
Possibly acceptable: Model fit 4, loadings 3, modification tests 1 or 2. 3
Marginal: Model fit 4/3, loadings 3/2, modification tests 1 or 2. 2
Unacceptable: None of the above. 1
Bedeian A G, Mossholder K W, Kemery E R, Armenakis A A
(1992), ‘Replication Requisites: A Second Look at Klenke-
Hamel and Mathieu (1990)’, Human Relations, Vol. 45, No. 10
* Breaugh J A, Colihan J P (1994), ‘Measuring Facets of Job
Ambiguity: Construct Validity Evidence’, Journal of Applied
Psychology, Vol. 79, No. 2, pp 191-202
* Cheng Y, Kawachi I, Coakley E H, Schwarz J, Colditz G (2000),
‘Association Between Psychosocial Work Characteristics
and Health Functioning in American Women: Prospective
Study’, British Medical Journal, Vol. 320, pp 1432-6
Cooper C L, Marshall J (1976), ‘Occupational Sources of Stress:
A Review of the Literature Relating to Coronary Heart Disease
and Mental Ill Health’, Journal of Occupational Psychology,
Vol. 49, pp 11-28
Among Dutch Truck Drivers: A Re-Evaluation of Karasek's
Interactive Job Demand-Control Model’, Stress Medicine,
Vol. 16, pp 101-107
* Dewe P (1991), ‘Measuring Work Stressors: The Role of
Frequency, Duration, and Demand’, Work and Stress, Vol.
5, No. 2, pp 77-91
* Fotinatos-Ventouratos R, Cooper C L (1998), ‘Social Class
Differences and Occupational Stress’, International Journal
of Stress Management, Vol. 5, No. 4
* Grady G F, Judd B B, Javian S (1990), ‘The Dimensionality of Work
Autonomy Revisited’, Human Relations, Vol. 43, No. 12, pp
1219-1228
Services Research: Test of a Measurement Model and
Normative Data’, British Journal of Health Psychology, Vol. 4,
pp 257-275
Hays W L (1988), Statistics, 4th ed., New York: Holt, Rinehart and
Winston
* Hurrell Jr J J, McLaney M A (1988), ‘Exposure to Job Stress — A
New Psychometric Instrument’, Scandinavian Journal of
Work Environment and Health, Vol. 14, pp 27-28
Demand-Control Model on Academic Research and on
Workplace Practice’, Stress Medicine, Vol. 14, pp 231-236
Kelly C, Sprigg C and Sreenivasan B, (1998), ‘SME Managers’
Perceptions of Work Related Stress’, Health and Safety
Laboratory Report EWP/15/98
* Langan-Fox J, Poole M E (1995), ‘Occupational Stress in Australian
Business and Professional Women’, Stress Medicine, Vol.
11, pp 113-122
Melamed S, Kushnir T, Meir E I (1991), ‘Attenuating the Impact of
Job Demands: Additive and Interactive Effects of
Perceived Control and Social Support’, Journal of Vocational
Behavior, Vol. 39, pp 40-53
Moos R (1994), The Work Environment Scale Manual, 3rd ed., Palo
Alto, CA: Consulting Psychologists Press
Narayanan L, Menon S, Spector P (1999b), ‘Stress in the
Workplace: A Comparison of Gender and Occupations’,
Journal of Organizational Behavior, Vol. 20, pp 63-73
Oppenheim A N (1992), Questionnaire Design, Interviewing and
Attitude Measurement, London: St Martin’s Press
Quinn R P, Shepard L J (1974), The 1972-1973 Quality of
Employment Survey, Ann Arbor, MI, Survey Research
Centre
* Russinova V, Vassileva L, Randev P, Jiliova S, Cooper C L (1997),
‘Psychometric Analysis of the First Bulgarian Version of
the Occupational Stress Indicator (OSI)’, International
Journal of Stress Management, Vol. 4, No. 2
Shrout P E, Fleiss J L (1979), ‘Intraclass Correlations: Uses in
Assessing Rater Reliability’, Psychological Bulletin, Vol. 86,
pp 420-428
Sluiter J K, van der Beek A J, Frings-Dresen M H W (1999), ‘The
Influence of Work Characteristics on the Need for
Recovery and Experienced Health: A Study on Coach
Drivers’, Ergonomics, Vol. 42, No. 4, pp 573-583
Turnover Intentions, and Health’, Journal of Applied
Psychology, Vol. 76, No. 1, pp 46-53
Persons Questionnaire’, Soc. Sci. Med., Vol. 35, No. 8, pp
1027-1035
Role Ambiguity, Role Conflict, and Job Performance’,
Journal of Management, Vol. 26, No. 1, pp 155-169
* Wall T D, Jackson P R, Mullarkey S, Parker S K (1996), ‘The
Demands-Control Model of Job Strain: A More Specific
Test’, Journal of Occupational and Organizational Psychology,
Vol. 69, pp 153-166
* Zellars K L, Perrewe P L, Hochwarter W A (1999), ‘Mitigating
Burnout Among High-NA Employees in Health Care:
What Can Organizations Do?’, Journal of Applied Social
Psychology, Vol. 29, No. 11, pp 2250-2271
CRR 356