Data Analysis - Using Excel
by William W. Dorner
Skeptical? I don't blame you. The following examples show how to apply Excel for
the graphical analysis of quality data. The examples range from somewhat obvious
to downright clever. As it turns out, Excel's capabilities are limited only by your
creativity.
Change the UCL and LCL lines to bold dashed lines with no markers.
Change the x-bar data series to a bold solid line with large visible markers.
Tweak the fonts, axes, titles, headers, footers and margins, as desired.
Another popular quality tool that's a snap to produce using Excel is a Pareto chart.
If you already have your data summarized, as in Figure 4, you can obtain a Pareto
chart by following these steps:
Create Percentage and Cumulative Percentage columns, as shown in Figure 5. For
Percentage, type =B2/$B$7 in cell C2 and copy it down through C6. For Cumulative
Percentage, type =C2 in cell D2 and =D2+C3 in cell D3, then copy D3 down through
D6.
Once again using the Control key to select noncontiguous columns, highlight
the Category, Percentage and Cumulative Percentage columns. In the example, this
corresponds to cells A1:A6 and C1:D6.
Use the ChartWizard to generate a Combination Bar Chart with both left and
right axes.
After adding some titles and customizing the formats, you get a chart like the one
in Figure 6. Note that the height of each bar references the left axis, whereas the
cumulative percentage line references the right axis.
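For readers who want to check the spreadsheet arithmetic outside Excel, the Percentage and Cumulative Percentage columns can be sketched in a few lines of Python. This is only an illustration: the category names and counts are hypothetical (the Figure 4 data are not reproduced here), the function name pareto_table is made up, and the sketch sorts the categories in descending order, which the steps above assume has already been done.

```python
# Hypothetical defect counts standing in for the summarized data in Figure 4.
counts = {
    "Scratches": 40,
    "Dents": 25,
    "Misalignment": 20,
    "Discoloration": 10,
    "Other": 5,
}


def pareto_table(counts):
    """Return rows of (category, count, percentage, cumulative percentage)."""
    total = sum(counts.values())  # plays the role of the total in cell B7
    rows = []
    cumulative = 0.0
    # A Pareto chart lists categories from most to least frequent.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for category, count in ordered:
        percentage = count / total   # same idea as =B2/$B$7
        cumulative += percentage     # same idea as =D2+C3
        rows.append((category, count, percentage, cumulative))
    return rows


rows = pareto_table(counts)
for category, count, percentage, cumulative in rows:
    print(f"{category:<15}{count:>5}{percentage:>8.1%}{cumulative:>8.1%}")
```

The bars of the chart correspond to the percentage column and the line to the cumulative column, which should finish at 100 percent.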
Perhaps your data are not summarized but are instead in a list where each category
has a numerical code (see Figure 7). Fear not! Excel has an even easier way to
create a histogram using its Data Analysis ToolPak.
Not all Excel users are familiar with the Data Analysis ToolPak because it is an
Add-In, meaning that it does not appear in your Tools menu by default -- you must
put it there. It contains a wealth of statistical tools that will come in handy for any
data analyst, such as descriptive statistics, regression analysis, analysis of variance
(ANOVA) and random number generation.
To load the Data Analysis ToolPak, select Tools and then Add-Ins. Check the
Analysis ToolPak box and select OK. After a few seconds, the Add-In will be loaded. To
activate the Data Analysis ToolPak, select Tools and then Data Analysis. A pop-up
window will appear listing all of the data analysis functions.
Scroll down, highlight Histogram, then click OK. A pop-up window will appear. In
the field titled Input Range, enter the cell range containing the numerical category
codes of your data (select M2 through M11). In the Bin Range field, enter the cell
range containing the list of all category codes, or "bins," you want to be tabulated
and charted (select N2 through N6). Next, select where you would like Excel to
deposit the finished output. Finally, check all three boxes at the bottom of the
pop-up window: Pareto, Cumulative Percentage and Chart Output. When you click
OK, Excel will generate a tabular summary and a Pareto chart.
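The tabular summary behind that Pareto chart is easy to approximate by hand. The following Python sketch is not the ToolPak's own algorithm; it simply tallies how often each category code appears, sorts the bins from most to least frequent and adds a cumulative percentage. The codes and bins are hypothetical stand-ins for cells M2:M11 and N2:N6, the function name pareto_histogram is made up, and codes that match no bin are ignored here (Excel would report them in a separate "More" row).

```python
from collections import Counter

# Hypothetical category codes (stand-in for M2:M11) and bins (stand-in for N2:N6).
codes = [1, 3, 2, 1, 5, 1, 2, 4, 1, 2]
bins = [1, 2, 3, 4, 5]


def pareto_histogram(codes, bins):
    """Return rows of (bin, frequency, cumulative fraction), most frequent first."""
    tally = Counter(codes)
    frequencies = [(b, tally.get(b, 0)) for b in bins]
    # Python's sort is stable, so bins with equal counts keep their listed order.
    frequencies.sort(key=lambda pair: pair[1], reverse=True)
    total = sum(freq for _, freq in frequencies)
    if total == 0:
        return [(b, 0, 0.0) for b, _ in frequencies]
    rows = []
    running = 0
    for b, freq in frequencies:
        running += freq
        rows.append((b, freq, running / total))
    return rows


summary = pareto_histogram(codes, bins)
for b, freq, cumulative in summary:
    print(f"code {b}: {freq:>3}  {cumulative:>6.1%}")
```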
One statistical tool notably absent from Excel is the box-and-whisker plot. As
designed by Mary Eleanor Spear, the box-and-whisker plot was originally called a
range bar [1]. Renowned statistician John Tukey later modified the display and
coined the name box-and-whisker [2], which some shorten to boxplot.
The simple box-and-whisker plot provides a vivid snapshot of a data set using just
five statistics: the minimum, 25th percentile, median, 75th percentile and
maximum. Tukey refers to the 25th and 75th percentiles as the hinges of the data
set and the minimum and maximum as the extremes.
The plot also illuminates the shape of the distribution. The location of the median
line and the relative length of the whiskers help indicate how symmetrical the data
are. When the median lies far from the center of the box or if one whisker is much
longer than the other, you know that the distribution is skewed to some extent.
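As a rough illustration of the five statistics and the symmetry check just described, here is a minimal Python sketch. The sample data and the function name five_number_summary are hypothetical, and the quartiles are computed by linear interpolation using the standard library's "inclusive" method, which can differ slightly from Tukey's hinges on small data sets.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, 25th percentile, median, 75th percentile, maximum)."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three cut points between the quarters.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])


# Hypothetical sample data.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]
low, q1, median, q3, high = five_number_summary(sample)

left_whisker = q1 - low     # length of the whisker below the box
right_whisker = high - q3   # length of the whisker above the box
box_center = (q1 + q3) / 2  # compare with the median to judge symmetry

print("summary:", (low, q1, median, q3, high))
print("whiskers:", left_whisker, right_whisker)
print("median minus box center:", median - box_center)
```

A noticeably longer whisker on one side, or a median well away from the box center, is the kind of skew the plot makes visible at a glance.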
It wouldn't take much for Microsoft to add the box-and-whisker plot to Excel.
Until it does, there is a simple way for Excel users to create professional-looking
box-and-whisker plots.
The trick is to use Excel's charting capabilities in a way that they were never
intended to be used. But first, you must obtain the five statistics necessary to
construct the box-and-whisker plot. Suppose the data set resides in cells A1:A50.
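The Excel functions you'll need to use are MIN, QUARTILE, MEDIAN and MAX, for
example:
Minimum: =MIN(A1:A50)
25th percentile: =QUARTILE(A1:A50,1)
Median: =MEDIAN(A1:A50)
75th percentile: =QUARTILE(A1:A50,3)
Maximum: =MAX(A1:A50)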
Next, enter these numbers into the center column (X) of a spreadsheet template like
the one in Figure 9. The rightmost column (Y) contains constants that will also be
used to construct the plot. You should not shuffle the rows for convenience -- the
order of the data entries does matter.
Having entered the data, use Excel's XY (Scatter) plot with connecting lines and no
point markers. Excel plots the XY coordinates, then "connects the dots" in
sequential order as they appear in the spreadsheet. With some customization, you
can create a professional-looking box-and-whisker plot. In fact, Figure 8 was
created using Excel.
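To see why the row order matters, it may help to write out one possible sequence of XY points. The Python sketch below is an assumption, not a copy of the Figure 9 template (which is not reproduced here): the function name boxplot_path, the half-height constant and the particular ordering are all hypothetical. It traces a horizontal box-and-whisker plot as a single connected path, retracing a whisker and the median line so that no stray diagonal is drawn.

```python
def boxplot_path(low, q1, median, q3, high, half_height=1.0):
    """Return (x, y) points that draw a box-and-whisker plot in one connected path."""
    h = half_height
    return [
        (low, 0),      # start at the tip of the left whisker
        (q1, 0),       # left whisker
        (q1, h),       # upper half of the left side of the box
        (q3, h),       # top edge
        (q3, 0),       # upper half of the right side
        (high, 0),     # right whisker
        (q3, 0),       # retrace the right whisker
        (q3, -h),      # lower half of the right side
        (median, -h),  # bottom edge, right portion
        (median, h),   # median line
        (median, -h),  # retrace the median line
        (q1, -h),      # bottom edge, left portion
        (q1, 0),       # lower half of the left side, closing the box
    ]


# Hypothetical five statistics: minimum, 25th percentile, median, 75th percentile, maximum.
points = boxplot_path(2, 4, 7, 10, 15)
for x, y in points:
    print(x, y)
```

Every consecutive pair of points differs in only one coordinate, so the connecting lines are all horizontal or vertical. Shuffle the rows and the "connect the dots" line cuts diagonally across the box.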
Variations on a theme
The value of box-and-whisker plots increases when used to compare multiple data
sets. By adding additional series to the Excel XY plot, you can easily obtain
multiple box-and-whisker plots, as in Figure 10. The multiple plots allow easy
comparisons between distributions of data.
With a little effort, you can create useful control charts, Pareto charts, histograms
and box-and-whisker plots. Granted, you could eventually reach a point of
diminishing marginal returns, beyond which you may wish to purchase more
powerful statistical software. But for the occasional number cruncher, Excel is
likely to fulfill all of your graphical needs, and then some.
References
1. Spear, M.E. 1952. Charting Statistics. New York: McGraw-Hill.
2. Tukey, J.W. 1977. Exploratory Data Analysis (First Edition). Reading, MA:
Addison-Wesley Publishing Co.
3. McGill, R.; J.W. Tukey; and W.A. Larsen. 1978. "Variations of Box Plots." The
American Statistician. 32. 12-16. For the interested reader, McGill et al.
recommend constructing notches as follows: Median ± 1.7 x [(1.25 x IQR)/(1.35 x
sqrt(sample size))]