How To Analyze Likert and Other Rating Scale Data
Review article
http://www.pharmacyteaching.com
Abstract
Rating scales and rubrics are commonly used measurement tools in educational contexts. Unfortunately, there is a great deal
of controversy surrounding how data derived from these tools can and should be analyzed. One issue that is repeatedly raised is
whether these data are ordinal or continuous. A related question is whether parametric data analysis techniques are appropriate
and/or acceptable for these rating scale data. Some of this controversy may stem from a misunderstanding of fundamental
issues related to these particular tools or a poor use of terminology. This article provides a review of basic issues surrounding
measurement of various phenomena relevant to educational settings, as well as previous empirical studies examining the effects
of using parametric analysis approaches on rating scale data. Based on previous empirical evidence reviewed in this article,
parametric analytical approaches are acceptable provided certain criteria are met. Implications for research and teaching are
also briefly discussed. After reading this article, the reader should be able to identify the characteristics of a true Likert scale
and explain the situations when parametric analytical techniques are potentially appropriate for rating scale data or when
nonparametric techniques are preferred.
© 2015 Elsevier Inc. All rights reserved.
Keywords: Likert scales; Data analysis; Measurement; Summated scales; Rating scales; Ordinal data
Situation
When I took my first academic position and became
involved in educational scholarship, I found myself asked to
justify some of my analytical decisions. This was not a bad
thing in and of itself since we must be able to justify our
choices in any scholarly endeavor. At the time, what struck
me as odd were the comments related to how I had analyzed
data from rating scales. I remember one passerby at a poster
session stopping to look at my poster and saying, "That is interesting, but you know you really should not have used means to report Likert scale results. After all, those are ordinal data." I was slightly taken aback but thanked the
individual for his/her comments and started to think. The
statement about Likert scales being ordinal data was never
made in my statistics classes in graduate school, so I wrote
* Correspondence to: Spencer E. Harpe, PharmD, PhD, MPH,
Chicago College of Pharmacy, Midwestern University, 555 31st
Street, Downers Grove, IL 60515.
E-mail: [email protected]
http://dx.doi.org/10.1016/j.cptl.2015.08.001
1877-1297/© 2015 Elsevier Inc. All rights reserved.
Fig. 1. Examples of rating scales and response formats as they may be presented to respondents. [Recovered figure content: Likert-type items (Strongly disagree, Disagree, Neither agree nor disagree, Agree, Strongly agree) for statements about team-based learning ("Team-based learning helps me learn," "Team-based learning is more difficult than traditional lecture," "Team-based learning should be used in all pharmacy courses"); a helpfulness rating (A little helpful, Moderately helpful, Extremely helpful); and (D) an adjectival scale, "How often do you complete required reading before class sessions?" (Rarely, Sometimes, Usually, Almost always).]
Measurement in the psychological and social context has a long history; the following books provide
insight into the historical and conceptual aspects of measurement in the social sciences:
Blalock HM Jr. Conceptualization and Measurement in the Social Sciences. Beverly Hills, CA:
SAGE Publications;1982.
Michell J. Measurement in Psychology: A Critical History of the Methodological Concept. New
York, NY: Cambridge University Press;2005.
Michell J. An Introduction to the Logic of Psychological Measurement. Hillsdale, NJ: Lawrence
Erlbaum Associates, Inc.;1990.
Miller DC, Salkind NJ. Handbook of Research Design and Social Measurement. 6th ed.
Thousand Oaks, CA:SAGE Publications;2002.
The number of resources available for statistical analysis can be overwhelming. Several options are
listed below that are either standards in a field or are written in a way to minimize math and focus on
interpretation and application:
Aparasu RR, Bentley JP. Principles of Research Design and Drug Literature Evaluation.
Burlington, MA: Jones & Bartlett;2014.
Daniel WW, Cross CL. Biostatistics: A Foundation for Analysis in the Health Sciences. 10th ed.
New York, NY: Wiley;2013. [This is a standard textbook for many graduate-level introductory
biostatistics courses so it contains the most math when compared to the other options listed
here.]
Motulsky H. Intuitive Biostatistics: A Nonmathematical Guide to Statistical Thinking. 3rd ed. New
York, NY: Oxford University Press;2013.
Norman GR, Streiner DL. PDQ Statistics. 3rd ed. Shelton, CT: People's Medical Publishing House-USA;2003.
Salkind NJ. Statistics for People who (Think They) Hate Statistics. 5th ed. Thousand Oaks, CA:
SAGE Publications;2013.
still important since violated assumptions, especially deviations from normality, can result in a loss of statistical power
(i.e., the probability of identifying a difference or effect
when one actually exists).55 Common assumptions include
independence of observations, homogeneity of variance,
and a normal distribution. The independence assumption
must be verified conceptually, and alternate tests should be
used when this is violated (e.g., the paired t test). Few
would disagree that perfect normal distributions are rarely
found in practice. Similarly, perfectly equal variances
between groups (i.e., homogeneity of variance) are exceedingly rare. In practice, we are faced with determining
whether a distribution is "normal enough" or the variances are "equal enough." Normality can be readily assessed
through graphical means, such as histograms, density
estimates, and normal QQ plots. Equality of variance can
be crudely assessed by comparing the group standard
deviations. There are also formal statistical tests to assess
normality (e.g., Kolmogorov–Smirnov or Shapiro–Wilk tests) and equality of variance (e.g., Levene's test, Bartlett's test, or the Brown–Forsythe test), which are implemented in
most common statistical packages. Thankfully, empirical
studies have shown that some common parametric methods,
especially the t test and F test, are relatively robust to
violations of normal distribution and equality of variance
provided that two-tailed hypothesis tests are used and the
sample sizes within groups are reasonably similar.33,36 We
can also draw comfort from the central limit theorem since
the distribution of the sample mean or differences in the
means will be normally distributed with reasonably large
sample sizes.23 Still, it is important to keep in mind
situations where a nonparametric analysis approach may
be preferred: a one-tailed test is genuinely needed, the group
sizes are substantially different, sample sizes are moderate,
or there is a severe departure from normality.
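As a concrete sketch of the crude checks described above, the short example below uses hypothetical 5-point Likert responses for two groups (not data from any study discussed in this article) to compare group standard deviations and compute a pooled two-sample t statistic by hand. In practice, the formal tests named above (e.g., Shapiro–Wilk, Levene's test) are available in standard statistical software; this illustration uses only the Python standard library.

```python
import statistics

# Hypothetical 5-point Likert responses for two groups (illustration only)
group_a = [4, 5, 3, 4, 4, 5, 2, 4, 3, 5, 4, 4]
group_b = [3, 2, 4, 3, 3, 2, 4, 3, 2, 3, 3, 4]

# Crude equality-of-variance check: compare the group standard deviations;
# a ratio near 1 suggests roughly equal variances
sd_a = statistics.stdev(group_a)
sd_b = statistics.stdev(group_b)
ratio = max(sd_a, sd_b) / min(sd_a, sd_b)
print(f"SD ratio: {ratio:.2f}")

# Pooled two-sample t statistic (equal-variance form)
n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
t = (mean_a - mean_b) / (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
print(f"t = {t:.2f}")
```

With reasonably similar group sizes and a two-tailed test, this is the setting in which the empirical robustness results cited above apply.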
Recommendation 3: Individual rating items with numerical response formats at least five categories in length may generally be treated as continuous data
Aside from aggregated rating scales, it is not uncommon
to see individual rating items used. These may appear as
individual Likert items outside of a composite scale or a holistic rubric score made by external raters (e.g., a faculty member judging a student's performance in a simulation; Peeters discusses important points to consider when developing these rubrics1,2). A common example is when a student is asked to rate their satisfaction with a teaching method when provided with a numerical scale (e.g., 1 = "Not at all satisfied" and 10 = "Completely satisfied").
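For instance, summarizing such a 1–10 satisfaction item as continuous data is straightforward. The sketch below uses hypothetical responses (not data from the article) and the Python standard library, reporting the mean and standard deviation alongside the median for an ordinal-style comparison:

```python
import statistics

# Hypothetical responses to a 1-10 satisfaction item
# (1 = "Not at all satisfied", 10 = "Completely satisfied")
responses = [7, 8, 6, 9, 7, 8, 5, 7, 10, 6, 8, 7]

# Treated as continuous: mean and standard deviation
print(f"mean = {statistics.mean(responses):.2f}")
print(f"SD   = {statistics.stdev(responses):.2f}")

# Ordinal-style summary for comparison
print(f"median = {statistics.median(responses)}")
```

When the two summaries diverge sharply, that can itself be a signal of skew worth inspecting graphically before choosing a parametric analysis.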
Johnson and Christensen54 provide a typology for these
individual items. When anchors or labels are only provided
at the extremes, these can be called numerical rating scales
(Fig. 1B). Fully anchored rating scales represent the
situation when an anchor is provided for each numerical
option (Fig. 1C). An individual Likert item could be viewed
Table
Outcome measures for Harpe et al.61
Questionnaires: self-efficacy to learn statistics (SELS); current statistics self-efficacy (CSSE); Survey of Attitudes Towards Statistics (SATS-36).
more advanced approaches when statistical modeling procedures are being used.
Applications
Two articles from pharmacy education have been
chosen to illustrate the previous recommendations in
action. The first is a study of the effect of learning-centered assessment on statistical knowledge and attitudes.61 For this study, the authors used several measures for the study outcomes (Table). Two measures were related to statistical self-efficacy: self-efficacy to learn statistics (SELS) and current statistics self-efficacy (CSSE).62 Both
Fig. 3. Distributions of selected pre-test measures for Harpe et al.61 The bars represent a histogram and the solid line is an overlay of a density estimate for each measure to provide an alternate view of normality. CSSE = Current Statistics Self-Efficacy scale; SATS-36 = Survey of Attitudes Towards Statistics, 36-item scale.
Conclusions
The controversy surrounding the appropriate analysis of
various types of rating scales has existed for over 65 years, dating back to the time when the original framework for
levels of measurement was proposed by Stevens. Over time
a wide variety of studies have been conducted examining
the statistical properties of data from these summated or
aggregate scales and from individual items with various
forms of rating. The general consensus is that the use of
parametric statistical tests to analyze these data is potentially appropriate, taking into consideration the recommendations provided here. While this one article is not expected
to end the controversy, it may serve to provide support and
offer some clarity to those who are using these common
measurement tools.
Conflicts of interest
The author has no conflicts of interest to disclose with
respect to the authorship and/or publication of this article.
References
1. Peeters MJ. Measuring rater judgments within learning assessments, Part 1: why the number of categories matter in a rating scale. Curr Pharm Teach Learn. 2015;7(5):656–661.
2. Peeters MJ. Measuring rater judgments within learning assessments, Part 2: a mixed approach to creating rubrics. Curr Pharm Teach Learn. 2015;7(5):662–668.
3. Creswell JW. Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. 4th ed. Thousand Oaks, CA: SAGE Publications; 2013.
4. Kuzon WM Jr, Urbanchek MG, McCabe S. The seven deadly sins of statistical analysis. Ann Plast Surg. 1996;37(3):265–272.
5. Jamieson S. Likert scales: how to (ab)use them. Med Educ. 2004;38(12):1217–1218.
6. Armstrong GD. Parametric statistics and ordinal data: a pervasive misconception. Nurs Res. 1981;30(1):60–62.
7. Knapp TR. Treating ordinal scales as interval scales: an attempt to resolve the controversy. Nurs Res. 1990;39(2):121–123.
8. Pell G. Use and misuse of Likert scales. Med Educ. 2005;39(9):970.
9. Jamieson S. Use and misuse of Likert scales: author's reply. Med Educ. 2005;39(9):971–972.
10. Carifio J, Perla R. Ten common misunderstandings, misconceptions, persistent myths and urban legends about Likert scales and Likert response formats and their antidotes. J Soc Sci. 2007;3(3):106–116.
11. Carifio J, Perla R. Resolving the 50-year debate around using and misusing Likert scales. Med Educ. 2008;42(12):1150–1152.
12. Michell J. An Introduction to the Logic of Psychological Measurement. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc; 1990.
13. Thomson W. Popular Lectures and Addresses, vol. 1. London, UK: Macmillan Publishers; 1891, p. 80–81.
14. Campbell NR. Physics: The Elements. Cambridge, UK: Cambridge University Press; 1920.
15. Bartlett RJ. Measurement in psychology. The Advancement of Science: The Report of the British Association for the Advancement of Science. 1:422–441; 1939–1940.
16. Stevens SS. On the theory of scales of measurement. Science. 1946;103(2684):677–680.
17. Singleton RA Jr, Straits BC. Approaches to Social Research. 3rd ed. New York, NY: Oxford University Press; 1999.
18. Cohen L, Manion L, Morrison K. Research Methods in Education. 7th ed. London, UK: Routledge; 2011.
19. Smith SM, Albaum GS. Fundamentals of Marketing Research. Thousand Oaks, CA: SAGE Publications; 2005.
20. Andrich D. Rasch Models for Measurement. Newbury Park, CA: SAGE Publications; 1988.
21. Wright BD, Linacre JM. Observations are always ordinal; measurements, however, must be interval. Arch Phys Med Rehabil. 1989;70(12):857–860.
22. Wolfe EW, Smith EV. Instrument development tools and activities for measure validation using Rasch models: part I: instrument development tools. J Appl Meas. 2007;8(1):97–123.
23. Norman G. Likert scales, levels of measurement and the "laws" of statistics. Adv Health Sci Educ Theory Pract. 2010;15(5):625–632.
24. Likert R. A technique for the measurement of attitudes. Arch Psychol. 1932;22(140):5–55.
25. Burns N, Grove SK. The Practice of Nursing Research: Appraisal, Synthesis, and Generation of Evidence. 6th ed. St. Louis, MO: Saunders; 2009.
26. Streiner DL, Norman GR, Cairney J. Health Measurement Scales: A Practical Guide to Their Development and Use. 5th ed. New York, NY: Oxford University Press; 2015.
69. Hoover MJ, Jung R, Jacobs DM, Peeters MJ. Educational testing validity and reliability in pharmacy and medical education literature. Am J Pharm Educ. 2013;77(10):Article 213.
70. Peeters MJ, Beltyukova SA, Martin BA. Educational testing and validity of conclusions in the scholarship of teaching and learning. Am J Pharm Educ. 2013;77(9):Article 186.
71. Radloff LS. The CES-D scale: a self-report depression scale for research in the general population. Appl Psychol Meas. 1977;1(3):385–401.
72. Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30(6):473–483.