How to visualize your data in an understandable way

by Thomas Mejtoft

We just walk on by. We just keep on dreaming.

Blondie*

It is easy to use data to deceive your readers, or simply to make your results hard to understand. However, don’t we all want our work to be read and to have an impact? It is therefore important to think about how we visualize the data that we have and to make the visualizations understandable.

This short note gives some suggestions on how to visualize different types of data to make them easier to understand.

Hope you find this material useful!


Visualizing your data

These guidelines are loosely based on the following sources:

Publication manual of the American Psychological Association by APA
Tables and Figures by APA Style
ACM Web Accessibility Statement by ACM
Resolution and Size by IEEE Author Center

Day, R., & Gastel, B. (2012). How to write and publish a scientific paper (7th ed.). Cambridge University Press

The most important things when creating figures

Think about the readability and accessibility when creating the figures:

  • Make figures that are readable “out-of-the-box”. Use labels, scales, units, legends, etc. in your figures so that readers understand what the different parts of the figure show. A figure (together with its figure text) should be readable without any further explanation. This means that you should avoid, for example, uncommon abbreviations in the figure.
  • Use fonts that are of a readable size. When the figure is mounted in the text, the font should be readable and comparable to other text in the writing. In general, 8–10pt is usually a minimum to make the text in a figure readable.
  • Avoid distracting elements. Keep the figures simple and leave out elements that are not needed to understand them. For example, keep purely esthetic elements to a minimum.
  • Design for high accessibility. Avoid color choices, grayscale shades, and other elements that might create accessibility issues in your figures. Use colors that are distinguishable for those with color vision deficiency, or combine, e.g., colors with patterns to increase the accessibility of a figure. This is also important for those printing your documents in black and white. For high accessibility, also describe clearly in the text what is depicted in the figures. Furthermore, metadata, so-called “alternative text”, should be provided for each figure.
  • Be honest about the sample size. Add the number of respondents (n) to each figure to be transparent about the sample size. This can be done either in the figure or in the figure text by adding n, e.g., (n=142).
  • Create high quality graphics. Creating figures as vector graphics (e.g., eps) is best practice and gives the best quality both on screen and in print, but it is fairly uncommon as output. Most graphics are bitmaps (e.g., jpg, png), and with these formats the figures should be large enough that the print resolution becomes >300dpi for color and grayscale figures and >600dpi for black and white line art. This also creates graphics with high readability on screen.
  • Give appropriate credit. If you have based your figure on someone else’s material, give a reference or other appropriate credit in the figure or the figure text. Here you can read about how to cite and use figures from other sources.
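As a rough illustration of the resolution guideline in the list above, the sketch below computes how many pixels a bitmap needs for a given printed size. The function name `required_pixels` and the 3.5 × 2.5 inch figure size are assumptions made for this example. Only the 300 and 600 dpi values come from the text, and they are treated here as minimum targets.

```python
import math

def required_pixels(width_in, height_in, dpi):
    """Return the (width, height) in pixels needed to print at the given dpi."""
    # Pixels = printed size in inches x dots per inch, rounded up.
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)

# Hypothetical figure printed 3.5 in wide and 2.5 in high.
color_px = required_pixels(3.5, 2.5, 300)     # color or grayscale figure
line_art_px = required_pixels(3.5, 2.5, 600)  # black and white line art
print(color_px, line_art_px)
```

A bitmap exported with fewer pixels than this would fall below the target resolution when printed at that size.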

Be consistent: don’t use numbers to deceive the readers

Be consistent in how your data is presented. Don’t mix units between data that should be comparable. You can (most often) use any unit, but data that the reader should be able to compare should have the same unit throughout your writing. It is fairly obvious that you should not mix units that need a conversion, such as °C and °F or kWh and joules. Nevertheless, also avoid mixing formats that need only a simple conversion, such as decimals, fractions, and percentages.
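One way to keep the format consistent is to convert every share to a percentage before writing it up. The following is a minimal sketch of that idea. The helper `to_percent` and the input values are made up for illustration, and rounding to whole percent is an assumption about the desired precision.

```python
from fractions import Fraction

def to_percent(value):
    """Convert a share given as a fraction, decimal, or percent string to a whole percent."""
    text = str(value).strip()
    if text.endswith("%"):
        share = Fraction(text[:-1]) / 100
    else:
        share = Fraction(text)  # accepts "1/3" as well as "0.5"
    return round(share * 100)

# Hypothetical mixed inputs: a fraction, another fraction, and a percentage.
shares = {"email": "1/2", "text messages": "1/3", "personal communication": "17%"}
print(", ".join(f"{to_percent(v)}% {k}" for k, v in shares.items()))
```

With all values in the same format, the reader can compare them without doing any conversion.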

Example:

Hard to understand:
Half of the respondents preferred email, 1/3 text messages, and 17% personal communication.

Better way:
Out of the respondents, 50% preferred email, 33% text messages, and 17% personal communication.

Think about how your choice of illustration of the data affects how it is perceived.

A table, a list, or a figure stands out compared to results written in the text. Results of similar importance should be illustrated in a similar way. Don’t use, e.g., a figure to make other results “disappear” in the writing. Using different ways of visualizing data is not only confusing, it also makes the data hard to compare, and different people might perceive the data differently depending on their ability to read the text and analyze the figures. In general, people tend to look at figures, tables, and lists and regard that information as more important.

Bad example (mixing visualizations and unintentionally highlighting some of the data):

The female respondents preferred sms (35%) as their primary communication channel (Figure 1). Among male respondents, email was preferred (50%); other means of communication stated were phone call (1%), sms (10%), WhatsApp (7%), Messenger (2%), and postal mail (30%).

In the example above, the data from the female respondents is visualized in a figure and the most common means of communication is mentioned in the text, while the data from the male respondents is only stated in the text. Combining these visualization methods makes the female data more visual and makes it stand out compared to the male data. If this was not intended, all data should be presented either in the text or in figures and described in a similar way.

Use the same scale on the y-axis for comparable data.

Automatic functions for creating diagrams usually adjust the y-axis to the highest number in each diagram. This causes a situation where different figures containing similar numbers are hard to compare, and one column might appear more important than it is.

Bad example (different scale):

The female respondents preferred sms (35%) as their primary communication channel (Figure 2) while male respondents preferred email (50%) (Figure 3).

In the example above, different scales are used on the y-axis in Figure 2 and Figure 3, even though the data is supposed to be comparable. In this example, 35% in Figure 2 seems higher than 50% in Figure 3 because the y-axis in Figure 2 (max 40%) has a different scale than the y-axis in Figure 3 (max 60%). In this case, not having the same y-axis, especially if the figures are next to each other, is a way to deceive most readers.

Good example (same scale in comparable figures):

The female respondents preferred sms (35%) as their primary communication channel (Figure 4) while male respondents preferred email (50%) (Figure 5).

In the example above, the same scale on the y-axis (max 60%) is used in both Figure 4 and Figure 5. In this case, the readers can visually compare the height of the columns between the figures.

Good example (data in same figure and same scale):

The female respondents preferred sms (35%) as their primary communication channel while male respondents preferred email (50%) (Figure 6).

In the example above, the same figure (Figure 6) is used to visualize the data from both series (female and male respondents). This makes the data easy for the reader to compare, increases the understandability, and is space efficient in the publication. If it is possible to combine data that should be comparable in the same figure, it should be done.
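The difference between per-figure automatic scaling and a shared scale can be sketched in a few lines. The functions `axis_max` and `shared_axis_max` are hypothetical helpers, and rounding up to the next multiple of 10 is only an assumption about how an automatic scale might behave. It happens to reproduce the 40% and 60% maxima mentioned in the examples above.

```python
def axis_max(values, step=10):
    """Round the largest value up to the next tick above it (mimics auto-scaling)."""
    return (int(max(values)) // step + 1) * step

def shared_axis_max(series_list, step=10):
    """Compute one y-axis maximum from all comparable series together."""
    all_values = [v for series in series_list for v in series]
    return axis_max(all_values, step)

# Only the highest shares quoted in the text are used here.
female = [35]  # sms
male = [50]    # email
print(axis_max(female), axis_max(male))  # separate, mismatched scales
print(shared_axis_max([female, male]))   # one scale for both figures
```

The shared value could then be set explicitly in the plotting tool, for example with `Axes.set_ylim` if the figures are made with matplotlib.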

Do not let your scale be bigger than your theoretical maximum

Having a maximum on the y-axis that is higher than the maximum that can be obtained in the study might confuse the reader. The best practice is usually to set the maximum of the y-axis to, e.g., 100% (if values over 100% cannot be obtained in the study) or, if absolute numbers are used on the y-axis, to no more than the highest number that can be obtained.

Bad example (scale out-of-bounds):

Out of all respondents, 3% did not know if they were smart or not.

In the example above, the maximum of the y-axis of Figure 7 is 120% even though no column can exceed 100%. Consequently, the maximum of the y-axis should be set to 100% or lower (in this example, 100% is the most suitable maximum for the y-axis).
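This rule is simple enough to express as a one-line check. The sketch below uses a hypothetical helper, `capped_axis_max`, and assumes that the theoretical maximum is known in advance (100 for percentages).

```python
def capped_axis_max(proposed_max, theoretical_max=100):
    """Never let the y-axis extend past the highest value the data can take."""
    return min(proposed_max, theoretical_max)

print(capped_axis_max(120))  # the 120% axis from the example is cut back to 100
print(capped_axis_max(60))   # a maximum below the limit is left unchanged
```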

Start your scale at zero

If there is a natural starting point or a zero in your data, start your y-axis from this zero point (e.g., 0%). This is especially important when changes over time are illustrated, so that an increase or a drop does not look bigger than it is.

Bad example (scale blown up):

The number of people defining themselves as "pet lovers" has declined since 2000.

In the example above, the decrease is only 1 percentage point over the first 10 years and 2 percentage points over the next 10 years. However, in Figure 8 it seems much larger because the y-axis is zoomed in on 57% to 62%. If this scale is not used intentionally to point out differences, the range of the y-axis should be larger to show that there is “almost no change”.
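To get a feeling for how much a zoomed-in axis exaggerates a change, one can compute how large a share of the visible axis the change occupies. This is a small sketch with a hypothetical helper, `share_of_axis`. It assumes that the total decline in the example is 3 percentage points (1 + 2) and compares the zoomed axis with an axis from 0% to 100%.

```python
def share_of_axis(change, y_min, y_max):
    """Return the fraction of the visible axis height that a change occupies."""
    return change / (y_max - y_min)

# Assumed total decline: 1 + 2 = 3 percentage points over 20 years.
decline = 3
zoomed = share_of_axis(decline, 57, 62)  # axis zoomed in on 57% to 62%
full = share_of_axis(decline, 0, 100)    # axis starting at the natural zero
print(f"zoomed axis: {zoomed:.0%} of the height, full axis: {full:.0%}")
```

On the zoomed axis the decline covers more than half of the figure height, while on the full axis it covers only a few percent of it.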
It is important to make the figure show significant differences and not insignificant differences.

Adjust the axis to make differences show

If the data only needs to be comparable within a single figure, adjust the axis so that differences in the data show in the visualization. Do not use too large a scale to hide differences or too small a scale to show insignificant differences. However, avoid the mistake of not starting at the natural zero point.

If the data should be comparable between figures, it is more important to create comparable scales and, consequently, the adjustment is done according to the data in all comparable figures.

Good example:

Hippos, kangaroos, and ants are significantly more preferred as pets than cats and dogs (Figure 9).

In the example above, the y-axis of Figure 9 is adjusted to the range of the data. This makes it easy to see significant differences and understand the data.

Bad example (however, not wrong in any sense):

Hippos, kangaroos, and ants are significantly more preferred as pets than cats and dogs (Figure 10).

In the example above, the y-axis of Figure 10 runs from 0% to 100%, which makes it hard to see the significant differences between the columns and generally wastes space.

Put units on the axis

Do not forget to put units on the axis. Leaving them out makes the data hard to understand and might lead to misunderstandings. Even if the units are stated in the figure text, units on the axis are preferred.

Bad example (no units):

More energy is used during the weekdays compared to weekends (Figure 11).

In the example above, there is no label on the y-axis of Figure 11. Consequently, the readers have no idea what the energy consumption is or what is shown in the figure.

Good example (units on y-axis):

More energy is used during the weekdays compared to weekends (Figure 12).

In the example above, there is a label (kWh) on the y-axis of Figure 12. Hence, the reader can understand the figure and interpret the energy consumption.

Visualizing different types of data in the most understandable way

Visualizing simple data — is a figure even necessary?

The starting point is to decide if a figure of the data is necessary. In some cases, we have data that is low in complexity, e.g., female vs. male, yes vs. no, or similar data. In this case, it is possible to visualize the data in three different ways: in the text, as a table, or as a figure (see the example below). Looking strictly at the cost of space vs. value, the first (text only) is the most efficient way to visualize the data. Please note that a table or a figure also needs some text in the document that refers to the table or the figure.

A general recommendation is that simple data might not need a complex visualization but can be written in the text. The other ways of visualizing the data are in no way incorrect, but they might be inefficient and might not increase the understandability of the data.

Alternative 1 (text only):

Among the respondents in the survey, 57% were female and 43% were male.

Writing in text when there is a simple data set, with few options, is a good option since it is space efficient and easy to understand.

Alternative 2 (table):

Table 1. Distribution of male and female respondents.

Respondents
Female    57%
Male      43%

A table gives a good overview of the data and is an alternative in-between text only and a figure. However, it might be seen as a waste of space.

Alternative 3 (figure):

A figure gives a good overview of the data, but it might be seen as a waste of space.

Visualizing non-ordered data

Unordered data is a set that doesn’t have any defined order and where, consequently, the neighboring alternatives have no connection to the current alternative. This can be, e.g., when people choose between different options. This type of data can be presented in the text, as a table, or as a figure. The most space-efficient way might be text only, but tables or figures are preferred because they are easier to understand. Please note that a table or a figure also needs some text in the document that makes a cross-reference to the table or figure.

Alternative 1 (text only):

According to the results, the preferred ways of communicating with other students are phone call (2%), sms (23%), email (35%), WhatsApp (8%), Messenger (12%), and postal mail (20%).

Alternative 2 (table):

Table 2. Preferred way of communication among respondents.

Preferred way of communication
Phone call     2%
Sms           23%
Email         35%
WhatsApp       8%
Messenger     12%
Postal mail   20%

The table is fairly easy to understand. However, it is rather space consuming and does not provide the same visual understanding as a figure.

Alternative 3 (figure):

A pie chart illustrates the distribution nicely. Please note that the colors might be problematic for someone with color vision deficiency.

A bar chart gives the reader a good overview of the data and has high accessibility.

Visualizing Likert-like data

Data that is ordered from, e.g., low to high or 1 to 5 (e.g., Likert scales) can be visualized using a table, a list, or a figure (stacked bar chart or histogram). A stacked bar chart or histogram should be used to increase the understanding of the data, since the alternatives selected by the respondents in the survey are ordered. Using, e.g., a pie chart makes it hard to see the distribution in relation to the order of the answers and should not be used. Writing the data in text only usually makes it hard to understand. Please note that a table or a figure also needs some text in the document that refers to the table or figure.

Alternative 1 (figure, preferred):

A stacked bar chart is space efficient and shows the data so that it is easy to read.
There are many other ways of creating understandable stacked bar charts that work very well.

A histogram clearly shows the distribution of the data.

Using, e.g., a pie chart makes it hard to see the distribution in relation to the order of the answers, so it should not be used.

Alternative 2 (table):

Table 3. Opinion on smartness by respondents.

Do you think you are smart?
Strongly disagree    4%
Disagree            12%
Neutral             24%
Agree               32%
Strongly agree      28%

A table is fairly easy to understand. However, it is rather space consuming and does not provide the same visual understanding as a figure.
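For those who build the stacked bar chart themselves, the key step is to place each segment where the previous one ends, so that the order of the answers is preserved. The sketch below computes these starting points from the values in Table 3. The helper `segment_starts` is a hypothetical name, and the code only prints the layout; it does not draw anything.

```python
from itertools import accumulate

# Shares from Table 3, in their natural order from low to high.
responses = {
    "Strongly disagree": 4,
    "Disagree": 12,
    "Neutral": 24,
    "Agree": 32,
    "Strongly agree": 28,
}

def segment_starts(widths):
    """Return where each stacked segment begins along the bar."""
    widths = list(widths)
    # Each segment starts where the running total of the previous ones ends.
    return [0] + list(accumulate(widths))[:-1]

starts = segment_starts(responses.values())
for (label, width), start in zip(responses.items(), starts):
    print(f"{label:<18} starts at {start:>3}%, width {width}%")
```

If the chart is drawn with matplotlib, these starting points could be passed as the `left` argument of `Axes.barh`, one call per answer alternative.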


*Quote from the song Dreaming (Stein & Harry, 1979).
Stein, C., & Harry, D. (1979). Dreaming. Dreaming [Single]. Chrysalis.


Licensed under an Attribution-ShareAlike 4.0 Creative Commons license.

(First published by Thomas Mejtoft: 2023-11-21; Last updated: 2023-04-08)