Chart Dos and Don'ts
Legal notice
The contents of this publication do not necessarily reflect the official opinions of the European Commission or other
institutions of the European Union. Neither the European Environment Agency nor any person or company acting on
behalf of the Agency is responsible for the use that may be made of the information contained in this report.
Copyright notice
© European Environment Agency, 2013
Reproduction is authorised, provided the source is acknowledged, save where otherwise stated.
Information about the European Union is available on the Internet. It can be accessed through the Europa server
(www.europa.eu).
Do use the full axis for bar charts. Our eyes are very sensitive to the area of bars, and we
draw inaccurate conclusions when those bars are truncated.
Wrong Correct
Another bad example was shown on the BBC UK show Breakfast. Did men's height really double
from 1871 to 1971?
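The size of the distortion can be estimated with a little arithmetic. The following is a minimal sketch in Python, not part of the original guidance: the function name `drawn_ratio` and the two values are made up for illustration and are not taken from the charts discussed here.

```python
def drawn_ratio(smaller, larger, axis_start=0.0):
    """Ratio of the two bar lengths as drawn when the value axis starts at axis_start."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below the smaller value")
    return (larger - axis_start) / (smaller - axis_start)


# Hypothetical values, chosen only to illustrate the effect.
value_a, value_b = 100.0, 110.0

full_axis_ratio = drawn_ratio(value_a, value_b)                   # axis starts at 0
truncated_ratio = drawn_ratio(value_a, value_b, axis_start=90.0)  # axis starts at 90

print(f"full axis: bar B is drawn {full_axis_ratio:.2f}x as long as bar A")
print(f"truncated axis: bar B is drawn {truncated_ratio:.2f}x as long as bar A")
```

With the axis starting at 90, a difference of 10% between the two values is drawn as a bar twice as long.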
If you need to show data details that are not visible when using the full axis, then the original chart
with the full axis must be accompanied by a zoomed-in chart, a so-called panel chart. See the
example below.
If you have only one category to show, then you can show a portion of the chart by using a line
chart in a specific range.
Another suggestion is to break the axis, so that part of the axis shows the small values, then
another part of the axis shows the large values, with a section of the axis scale removed. Sounds
good, but you've lost any correlation between the large and small values.
Making these charts interactive will solve many of the issues stated above. For example the user
would be able to mouse over a column and get the exact value, filter out some categories or sort
the columns according to their values for easier comparison.
Be clear when some data is missing. Explain why it is missing. Use the full axis
and do not skip values when you have numerical data.
The x-axis in the "wrong example" below has a time-series with inconsistent intervals (missing
years 2003 and 2004) giving a distorted view of data over time.
Wrong Correct
Note: Data has not been reported for 2003 and 2004.
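One way to keep the intervals consistent is to fill the gaps explicitly before plotting. The sketch below is only an illustration in Python: the function name `fill_missing_years` and the reported values are hypothetical. Missing years are kept on the axis with `None` as their value, so a charting tool can leave an empty slot instead of closing the gap.

```python
def fill_missing_years(values_by_year):
    """Return (year, value) pairs for every year in the range; value is None where unreported."""
    first, last = min(values_by_year), max(values_by_year)
    return [(year, values_by_year.get(year)) for year in range(first, last + 1)]


# Hypothetical reported values; 2003 and 2004 were not reported.
reported = {2001: 12.0, 2002: 11.5, 2005: 9.8, 2006: 9.1}

series = fill_missing_years(reported)
missing_years = [year for year, value in series if value is None]

for year, value in series:
    print(year, "no data" if value is None else value)
```

The list `missing_years` can also be used to generate the explanatory note under the chart.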
"Perfection is achieved not when there is nothing more to add, but when there is nothing
left to take away." (Antoine de Saint-Exupéry)
As shown in the example above, it is important to remove any visual clutter like the dark
background and the dark grid (non-data-ink) and instead enhance the visibility of the data
information part (data-ink), in this case the bars. The grid can be removed or made in a much
more subtle style, since it is a supporting tool rather than the data itself.
Try to use clear language in your chart title and descriptions. Avoid acronyms like "MS"
and use the extended form "Member State" or, even better, simply "Country". It is OK
to use well-known abbreviations like EU or GDP, or those your audience clearly
understands.
Use a descriptive chart title and annotation that describe not only what is being measured
but also why the reader should care and how to read the chart. This will avoid
misinterpretation and save time for the chart viewer.
Example
Improved title with note: Change in cadmium emissions. Note: A reduction of emission is an indication of
improved air quality in major European cities.
Charts are mostly communication tools. We have already reasoned about the "why and
how" when choosing the chart type (bar, line, scatter plot, etc.). Specific chart types are best at
showing specific aspects of the data.
You can skip this rule if you are building a raw "statistical exploratory charting tool" where users
can slice the data and create any chart they want.
For end-products ready to be consumed by the target audience, you should always explain how
to read the chart and the reasoning behind it. Try to be objective and leave out any subjective
interpretations.
Although it is possible to tell a hundred stories using a single line chart, it makes a lot of
sense to keep the focus on just one story.
Therefore you should highlight just one or two important lines in the chart, but keep the others as
context in the background.
The chart above is remade below in a much better version, which highlights the rise and fall of
Microsoft. Do you see what has made the difference?
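The highlighting idea can be expressed as a small styling rule. This is a minimal sketch in Python under stated assumptions: the function name `line_styles`, the colors, and the placeholder series names "Company B" and "Company C" are illustrative and do not come from the chart above.

```python
def line_styles(series_names, highlighted):
    """Return drawing options per series: strong for highlighted lines, muted for context."""
    styles = {}
    for name in series_names:
        if name in highlighted:
            # Saturated color, thicker line, drawn on top of the others.
            styles[name] = {"color": "#d62728", "linewidth": 2.5, "zorder": 3}
        else:
            # Light grey, thin line, kept in the background as context.
            styles[name] = {"color": "#bbbbbb", "linewidth": 1.0, "zorder": 1}
    return styles


# "Company B" and "Company C" are placeholder names for the context lines.
styles = line_styles(["Microsoft", "Company B", "Company C"], highlighted={"Microsoft"})
print(styles["Microsoft"])
print(styles["Company B"])
```

The keys are chosen so that each dictionary could be passed as keyword arguments to a plotting call that accepts `color`, `linewidth` and `zorder`, which is an assumption about the charting tool in use.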
It is more important to give emphasis to the data itself and sort the chart by the data
attributes, rather than non-data attributes (for example labels like country names).
Otherwise it will be very difficult, if not impossible, for users to make a proper comparison across the
many bars. In any case, a quick eye-scan is enough to find your own country in the list.
If the chart is interactive, give the user the possibility to change the default sort order and
a way to filter out data and compare only a few categories.
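The sorting and filtering described above can be sketched in a few lines of Python. The function name `prepare_bars`, its parameters, and the emission values are hypothetical and serve only as an illustration: bars are sorted by value by default, and the user may switch the sort order or keep only a few categories.

```python
def prepare_bars(values, sort_by="value", descending=True, keep=None):
    """Return (label, value) rows, optionally filtered to `keep`, sorted by value or label."""
    rows = [
        (label, value)
        for label, value in values.items()
        if keep is None or label in keep
    ]
    key_index = 1 if sort_by == "value" else 0
    return sorted(rows, key=lambda row: row[key_index], reverse=descending)


# Made-up values, for illustration only.
emissions = {"Austria": 4.2, "Belgium": 7.9, "Denmark": 3.1, "Spain": 6.5}

default_view = prepare_bars(emissions)                                   # sorted by data
filtered_view = prepare_bars(emissions, keep={"Austria", "Spain"})       # compare a few
alphabetical = prepare_bars(emissions, sort_by="label", descending=False)

print(default_view)
print(filtered_view)
print(alphabetical)
```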
The pie chart below (even though pie charts should be avoided) also works better when presented
with sorted data values. It starts at 12 o'clock with the largest slice. It is much easier to
understand the relations between the parts, what is bigger and what is smaller, even when the
values are not readable or the areas are very similar.
If possible, label lines individually and avoid a legend (Gregor Aisch, "Doing the Line
Charts Right").
Rotate bars if the category names are long (Cole Nussbaumer, "my penchant for
horizontal bar charts").
Do not use a legend when you have only one data category.
If there is only one value category plotted in your chart, then there is no need for a
legend. The title can already contain all the needed information. Otherwise you can label
the axis directly.
The legend displays only one category, which is already in the title, so there is no need to add it
to the axis either.
The slope of a line chart should be close to 45 degrees for the best perception.
Robert Kosara has a great summary of the "banking to 45 degrees" practice first proposed by Bill
Cleveland.
The same data is presented three ways. The slope is a reflection of the scales used on the two
axes.
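To make the idea concrete, here is a minimal sketch in Python of one common formulation of banking, the median absolute slope: the aspect ratio is chosen so that the median of the absolute segment slopes is drawn at 45 degrees. The function name `banked_aspect_ratio` and the width/height convention for the aspect ratio are assumptions of this sketch, and other formulations exist.

```python
from statistics import median


def banked_aspect_ratio(xs, ys):
    """Width/height ratio at which the median absolute segment slope is drawn at 45 degrees.

    A data slope s is drawn with slope s * (x_range / y_range) * (height / width).
    Setting the median drawn slope to 1 gives width / height = median|s| * x_range / y_range.
    """
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:])
        if x1 != x0 and y1 != y0  # skip vertical and flat segments
    ]
    if not slopes or y_range == 0:
        raise ValueError("need at least one non-flat segment")
    return median(slopes) * x_range / y_range


# A straight line from corner to corner is at 45 degrees in a square plot.
print(banked_aspect_ratio([0, 1, 2, 3], [0, 2, 4, 6]))
# A zigzag with four equal segments needs a plot four times as wide as it is tall.
print(banked_aspect_ratio([0, 1, 2, 3, 4], [0, 1, 0, 1, 0]))
```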
However, in some cases there can be legitimate reasons not to stick strictly to "banking
to 45 degrees", for example when analyzing the data to reveal patterns that would not be
visible at a 45-degree slope. See the example below.
Two plots of monthly atmospheric carbon dioxide measurements, taken from 1959 to 1990. The
first plot, with an aspect ratio of 1.17, reveals an accelerating increase in CO2 levels. The second
plot, with an aspect ratio of 7.87, facilitates closer inspection of seasonal fluctuations, revealing a
gradual attack followed by a steeper decay. Source: Computer Science Division, University of
California, Berkeley (http://vis.berkeley.edu/papers/banking/)
When using economic values in your charts, take care to adjust the values
for inflation.
This is done by using the CPI (consumer price index). A Euro in 2010 just does not have the
same spending power as a Euro in 1961.
Source: http://www.aboutinflation.com/inflation/european-union---inflation
The purchasing power of 100 EUR in year 1961 is equivalent to 1948 EUR in year 2010.
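The adjustment itself is a simple ratio of index values. The sketch below, in Python, is an illustration only: the function name `to_base_year` is hypothetical, and the two index values are not a published CPI series but are derived from the sentence above (100 EUR in 1961 corresponding to 1948 EUR in 2010).

```python
def to_base_year(amount, year, base_year, cpi):
    """Express an amount from `year` in the prices of `base_year` using a price index."""
    return amount * cpi[base_year] / cpi[year]


# Index values implied by the statement above, with 1961 set to 100.
# A real chart would use a published CPI series with one value per year.
cpi = {1961: 100.0, 2010: 1948.0}

adjusted = to_base_year(100.0, 1961, 2010, cpi)
print(f"100 EUR in 1961 is about {adjusted:.0f} EUR in 2010 prices")
```

Applying the same conversion to every point in a time series puts all values in the prices of one base year, so they can be compared on one axis.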
Have a fresh set of eyes look at what you've done and give you feedback. You may be
surprised by what is confusing, or enlightening, to others.
Studies show that 3D effects reduce comprehension. Blow-apart effects likewise make it
hard to compare elements and judge areas.
Below is another (in)famous piece of chartjunk. Compare the 21,2% with the 19,5% slices in the pie.
Which one looks bigger?
It is difficult to compare many slices in a pie chart. Try alternative charts to convey your
message; most often, bar or column charts will be a much better alternative.
The donut chart is just another pie chart with a hole punched in the middle. The donut chart is a
useless chart made worse. Avoid donut charts for the same reasons.
Avoid stacked charts since the parts in the stacked charts are difficult to compare with
each other.
To solve this issue, some chart tools let the user interactively filter out stacked categories
so that a single category is displayed.
The same issue applies to stacked area charts. It is difficult to compare the areas in the different
regions when stacked (figure above) and much easier to have them as lines (figure below) and a
Another example of how bad stacked bar charts can be in certain cases:
Let's see how the chart above looks as a line chart.
Now we can clearly see the decline of the household category "Married Couples with Children".
Moreover, we can see the trends in the other categories more clearly as well.
Quite often, superimposing time series of two different measurements will show a strong
correlation. Many things change the same way over time. It is an easy mistake to confuse
correlation with causation.
For example, if you plot two different data series (A and B) on a common time axis, you will
often notice that both follow a similar pattern over time. It is very hard, if not impossible, to prove
that A causes B or vice versa. Many third factors that are not plotted on the chart influence both
A and B and can cause both to change the same way over time. Only a large, rigorous statistical
study of all factors can give some indication of causation, if any exists.
Even correlation can be questionable when seen on a chart. See de-noising data, a method for
identifying true correlation by removing the time-series data.
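A small numeric experiment shows why a shared trend can be misleading. The sketch below, in Python, uses first differences (year-to-year changes) as one simple way of removing a trend. This is an assumption of the sketch and not necessarily the de-noising method referred to above. Both series are made up: each rises steadily over time, but their short-term wiggles are unrelated.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def first_differences(series):
    """Change from each point to the next, which removes a steady trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Made-up data: two upward trends with unrelated wiggles on top.
wiggle_a = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
wiggle_b = [1, 1, -1, -1, 1, 1, -1, -1, 1, 1]
series_a = [10 + 2 * t + w for t, w in enumerate(wiggle_a)]
series_b = [50 + 3 * t + w for t, w in enumerate(wiggle_b)]

raw_r = pearson(series_a, series_b)
detrended_r = pearson(first_differences(series_a), first_differences(series_b))

print(f"correlation of the raw series: {raw_r:.2f}")
print(f"correlation of the year-to-year changes: {detrended_r:.2f}")
```

The raw series are strongly correlated only because both rise over time; once the trend is removed, the correlation in this example essentially disappears.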
Even though your data has a geographical dimension, it doesn't automatically mean that it
will best be displayed on a map. Choose your chart type wisely.
In fact, most data has a geographical dimension if we think about it, but it does not always convey
new insight when displayed on a map. A very bad map example is shown below, where a huge amount of
data is displayed just because it has a location attached to it. However, the user does not get any
insight from this map. There is no correlation or pattern in this map that we could
investigate further.
Another bad example, where the map gets in the way and makes it more difficult to
understand the data displayed on it.
source: Nordregio.se
When a chart or map moves it is difficult to remember the values that were shown in
previous scenes and compare with values in a current scene. It is also impossible to
print the visualization.
A series of small charts or maps, so-called small multiples, may convey the message
much better than an animation.
Below is an excellent example of small multiples, which effectively shows the trend over time for
consumption of liquor per person by county. An animated chart or map would not have been
able to achieve such scientific elegance in the representation of data.
See other examples where a small-multiple chart is the best alternative to a map.
Below is an animated map showing water stress in several river basin districts over four seasons
during 2002-2012. Although the animation may be appealing to the eye, it is difficult to use
to compare different years or seasons.
Experimenting with the speed of the animation will help you see any patterns that are otherwise
hidden if the speed is too slow or too fast.
Below, the same data is shown as small multiples. Since the maps are shown by year and by season, it
is easier to compare any year to any other year or any season to any other season. We can clearly see
that the summers have the highest water exploitation index and that southern Europe,
especially Spain, is the most affected. In northern Europe, England and the Copenhagen and Stockholm
areas also stand out. We can also see that there is no upward or downward trend over time for all
seasons. A small multiple of line charts would probably work even better than the map.
Wrong Correct
The left chart says that 33,5% males and 28,6% females passed by on the street and 37,9%
were unknown (the missing data). However, we know that on any given day over a long period
of time there should be around 50% male and 50% female (unless we are in a very gender-specific
area of the city). The issue with the chart is that the unknown must not be treated
as a third category different from the other two. The unknown group most probably contains both
male and female with the same distribution. Therefore the missing data must be removed from the
chart and reported separately. This is standard practice in statistical surveys. On the right is the
corrected chart, without the unknown. In this case an indication of the margin of error would also
help.
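The correction amounts to recomputing the shares over the known observations only. Here is a minimal sketch in Python using the percentages quoted above; the function name `known_shares` is hypothetical, and the approach assumes, as the text does, that the unknown group is distributed like the known one.

```python
def known_shares(percentages, unknown_key="unknown"):
    """Rescale the known categories to sum to 100 and return the unknown share separately."""
    known = {key: value for key, value in percentages.items() if key != unknown_key}
    total_known = sum(known.values())
    shares = {key: 100.0 * value / total_known for key, value in known.items()}
    return shares, percentages.get(unknown_key, 0.0)


# Percentages from the example above (decimal commas written as points).
observed = {"male": 33.5, "female": 28.6, "unknown": 37.9}

shares, unknown_share = known_shares(observed)

for category, share in shares.items():
    print(f"{category}: {share:.1f}%")
print(f"Note: {unknown_share}% of observations could not be classified.")
```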
Tell your audience how confident you are in your assertions.
Include error bars any time you use data to make an argument.
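For a share estimated from a sample, the length of an error bar can be approximated with the standard margin-of-error formula. The sketch below, in Python, assumes a simple random sample and the normal approximation at roughly 95% confidence (z = 1.96); the share and the sample size are hypothetical values, and the function name `margin_of_error` is illustrative.

```python
from math import sqrt


def margin_of_error(share, sample_size, z=1.96):
    """Approximate margin of error for a proportion (normal approximation)."""
    return z * sqrt(share * (1.0 - share) / sample_size)


# Hypothetical survey result: 54% in a sample of 1000 observations.
share, sample_size = 0.54, 1000

moe = margin_of_error(share, sample_size)
low, high = share - moe, share + moe

print(f"estimate: {share:.1%}, error bar from {low:.1%} to {high:.1%}")
```

The values `low` and `high` are the ends of the error bar to draw around the plotted share.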
Using color categories that are relatively universal makes it easier to see differences
between colors.
Different colors should be used for different categories (e.g., male/female, types of fruit),
not different values in a range (e.g., age, temperature).
Do not use rainbows for range values
If you want color to show a numerical value, use a range that goes from white to a highly
saturated color in one of the universal color categories. No rainbows.
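A white-to-saturated range is easy to generate by linear interpolation. The following is a minimal sketch in Python: the function name `sequential_color` and the dark blue end color are assumptions chosen for illustration, and any saturated color could be used instead.

```python
def sequential_color(value, vmin, vmax, end_rgb=(8, 81, 156)):
    """Map a value in [vmin, vmax] to a hex color between white and end_rgb."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Position of the value in the range, clamped to [0, 1].
    t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    # Interpolate each channel from white (255) towards the end color.
    red, green, blue = (round(255 + t * (channel - 255)) for channel in end_rgb)
    return f"#{red:02x}{green:02x}{blue:02x}"


# Example: temperatures from 0 to 40 mapped to a single-hue ramp.
for temperature in (0, 10, 20, 30, 40):
    print(temperature, sequential_color(temperature, 0, 40))
```

Low values come out close to white and high values close to the saturated end color, so the order of the values can be read directly from the lightness.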
Example of a bad chart, where different colors are used for the same measurement:
Remember, 7% to 10% of the male audience have a color vision deficiency (color
blindness). Therefore make your charts safe for color-blind viewers.
Below you have the same chart displayed as a color-blind person would see it.
http://guides.library.duke.edu/topten
http://www.slideshare.net/idigdata/data-visualization-best-practices-2013
http://www.amazon.com/Envisioning-Information-Edward-R-Tufte/dp/0961392118
http://en.wikipedia.org/wiki/Misleading_graph
http://junkcharts.typepad.com/
http://www.datasciencecentral.com/profiles/blogs/data-science-ebook-2nd-edition-table-of-content
http://darkhorseanalytics.com/blog/data-looks-better-naked/