Paper: Visualizing Uncertainty for Non-Expert End Users
Today's paper is Visualizing Uncertainty for Non-Expert End Users: The Challenge of the Deterministic Construal Error by Susan Joslyn and Sonia Savelli. This is a bit different from the recent stuff, as this paper is about coming up with visual constructions to communicate uncertainty and probability, rather than deterministic data.
This paper is interesting because it appears to have sort of fallen into its thesis by accident. The authors went in to test variations on well-known themes around data visualization, but found a new type of error, the "Deterministic Construal Error" (DCE), when they changed the protocol around testing how users read data. What they found was that while good visualizations can help people understand uncertainty, a large proportion of readers (expert or not) end up not even recognizing that the data is probabilistic, and instead think it refers to deterministic data. So the paper covers overall graph construction, and then the ways in which DCEs can be sticky.
The paper starts with overall theory, first for numerical expressions:
- most people understand probabilities on a practical, if not on a theoretical level
- numeric expressions of uncertainty work better than verbal ones, can counteract the negative effects of forecast errors or false alarms (people don't lose trust), and work regardless of education level
And then for visualizations:
- graphical techniques such as blur, fading, sketchiness, dotted or broken lines, transparency, size, texture, and color saturation make intuitive sense, but the actual scientific evidence that they carry meaning is very limited
- when you ask users to pick their favorite visualizations, they tend to choose those they feel give them the best understanding. However, tests showed no correlation with their actual understanding.
- animations of rapidly alternating frames convey information better than static representations like error bars
- saturation, brightness, and transparency tend to work alright
- hue isn't good, though. The problem is that variations in hue are often used to convey risk. Risk is often the combination of severity and likelihood, so adding such colors to probabilistic charts really messes with people, including experts working with expert data. Avoiding colors that communicate risk (red, yellow, orange, purple) may help.
- box plots are better for precise values, color-coded spectral charts can be better for "big picture" relative uncertainty
- when no uncertainty information is provided in weather data, people expect higher levels of uncertainty than when they're given that information.
- most people tend to assume a normal distribution of events when uncertainty is communicated, but a significant portion assume a uniform distribution even when that isn't the case
- when people see tornado polygons, they tend to assume uniform risk within the polygon, whereas the actual risk varies. Gradient polygons may be more useful: in terms of readiness, they help people in the most likely areas prepare harder, at no cost to readiness in the less probable areas (see the sketch after this list).
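To make the gradient/transparency idea concrete, here's a minimal matplotlib sketch. It's my own construction, not anything from the paper, and the forecast curve and uncertainty growth rate are made-up numbers; the point is just how nested, increasingly transparent bands read as "less likely":

```python
# A minimal sketch (my construction, not the paper's) of a transparency/gradient
# uncertainty encoding: nested predictive bands fade as values get less likely.
import numpy as np
import matplotlib.pyplot as plt

hours = np.arange(48)
forecast = 38 + 6 * np.sin(hours * np.pi / 24)  # made-up central forecast (F)
sigma = 1.0 + 0.08 * hours                      # made-up, growing uncertainty

fig, ax = plt.subplots(figsize=(8, 3))
# Wider (less likely) bands get lower alpha, producing the fading gradient.
for z, alpha in [(0.5, 0.45), (1.0, 0.30), (1.5, 0.18), (2.0, 0.08)]:
    ax.fill_between(hours, forecast - z * sigma, forecast + z * sigma,
                    color="tab:blue", alpha=alpha, linewidth=0)
ax.plot(hours, forecast, color="tab:blue", label="most likely temperature")
ax.set_xlabel("forecast lead time (hours)")
ax.set_ylabel("temperature (F)")
ax.legend()
plt.show()
```

The bands overlap, so their alphas stack toward the center; that's what produces the smooth fading effect without computing an actual gradient.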
Uncertainty is overall difficult to test for because of background and expectations: a single value in a weather forecast already comes with uncertainty expectations, because we're very used to forecasts being probabilistic. What the researchers did was compare against a "no uncertainty" control to figure out whether people learned anything new from the visualization.
Another comparison came from a study looking at hurricane charts:
When they tried both types of charts against each other, researchers found that with the cone variant on the left, people believed hurricane damage would increase with time: they mistook the width of the cone for the dimension and strength of the hurricane, rather than the spread of its most likely paths. The graph on the right, on the other hand, reduced that error.
However, other studies revealed that plots like the one on the right made people feel there was more uncertainty than in the plot on the left, despite both showing equivalent probabilities.
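Since the figures aren't reproduced here, the sketch below shows the two styles as I understand them. The cone of uncertainty is standard; I'm assuming the right-hand chart was an ensemble of possible tracks, so treat that panel as an illustration of the genre rather than the study's actual stimulus:

```python
# Rough sketch of the two hurricane chart styles (assumed: the figures aren't
# reproduced here, and the ensemble-of-tracks panel is my guess).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)      # forecast time, landfall at the bottom
center = 2 * t                 # made-up mean track
spread = 0.2 + 1.5 * t         # cone widens with lead time

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
left.fill_betweenx(t, center - spread, center + spread,
                   alpha=0.3, color="tab:red")
left.plot(center, t, color="tab:red")
left.set_title("cone of uncertainty")

for _ in range(12):            # each line is one possible track
    wiggle = np.cumsum(rng.normal(0, 0.08, t.size))
    right.plot(center + wiggle, t, color="tab:red", alpha=0.5, linewidth=1)
right.set_title("ensemble of tracks (assumed)")
for ax in (left, right):
    ax.set_xlabel("longitude (arbitrary)")
left.set_ylabel("forecast time")
plt.show()
```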
Now we come to the study itself. They started with this sort of graph:
They presented users with the chart on the right, with the intended interpretation described by the key on the left. Users were given the key, as shown above, let's be clear. But they found that 36% of people still misinterpreted 44F as the daytime high, 38F as the daytime low, and 30F as the nighttime low.
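If you don't have the figure handy, here's a rough reconstruction; the layout and the single-value forecasts (41F and 33F) are my guesses, and only the interval bounds come from the post:

```python
# Rough reconstruction (layout and single values are guesses; only the bounds
# 38..44F and 30..36F come from the post) of the kind of chart tested.
import matplotlib.pyplot as plt

labels = ["daytime high", "nighttime low"]
point = [41, 33]   # hypothetical single-value forecasts
upper = [44, 36]   # upper bounds of the 80% predictive intervals
lower = [38, 30]   # lower bounds of the 80% predictive intervals

fig, ax = plt.subplots(figsize=(4, 3))
for i, (p, u, l) in enumerate(zip(point, upper, lower)):
    ax.errorbar(i, p, yerr=[[p - l], [u - p]], fmt="o",
                capsize=12, color="tab:blue")
    ax.annotate(f"{u}F", (i, u), textcoords="offset points", xytext=(10, -4))
    ax.annotate(f"{l}F", (i, l), textcoords="offset points", xytext=(10, -4))
ax.set_xlim(-0.5, 1.5)
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_ylabel("temperature (F)")
ax.set_title("80% chance the value falls inside the bracket")
plt.show()
```

The DCE is then reading the top of the bracket (44F) as the forecast high itself, rather than as the upper bound of where the high might land.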
The researchers went "okay, maybe our visualization is not good" and tried the following variations, still with the key, trying to emphasize uncertainty:
But the results were the same. So they tried more things to increase focus, and even included the key inside the chart so people couldn't skip it:
Still, same DCEs at the same rate. So they tried the following ones to try and show the relationships:
It had no effect. Researchers were quite annoyed by then:
We even tested a version in which the single-value forecast was accompanied by a plus/minus sign and a number indicating the amount of error to expect. Although not really a “visualization,” we were sure it would lead to fewer misinterpretations, because the only temperature value shown was the “correct” answer, the single-value temperature. Obtaining the upper- and lower-bound values required adding or subtracting the error amount. However, participants made DCEs with this format as well. This was particularly discouraging because it was clear that participants were willing to go to considerable effort to interpret the forecast as deterministic.
At that point, it was clear that we were dealing with a much deeper problem than expectations derived from other websites. In one final effort, we removed the visualization altogether and presented the information in a text format (See Figure 1B). With this format, the misinterpretation all but disappeared. Nearly everyone (94%) was able to correctly identify the numbers intended as the daytime high and nighttime low temperatures. This suggested that the previous errors were at least partially due to the fact that a visualization was used, rather than the particular type of visualization.
They point out that people make the same kind of mistake even without visualizations involved, for example, thinking that a 40% precipitation risk means it rains over 40% of the area. You need to word things differently ("40% probability of rain, 60% probability of no rain") to counteract that effect.
The key way to detect that error is to not prime people: don't tell them the visualization is based on uncertainty, just ask the question. If you tell people that uncertainty is the point, they're primed to make fewer DCEs.
Indeed, because of the need to reduce cognitive load and the fact that most predictions available to members of the public have historically been deterministic, people may automatically assume that they are receiving a deterministic message, unless that interpretation is somehow blocked. As a matter of fact, there is some evidence that blocking DCEs is possible when the specific psychological mechanism that leads to them is understood.
The issue is that detecting this effect is somewhat new, and there isn't much hard evidence about effective workarounds yet. Experience and training help, but then you're no longer talking to non-specialists to the same extent. Text, however, required no specific training and worked well!
They conclude:
[T]he text format was equally advantageous in terms of forecast understanding, trust in the forecast and decision quality as were the visualizations. In other words, omitting the visualization reduced DCEs with no costs. Moreover, participants were able to understand and use the text predictive interval forecast with absolutely no special training.
As a side note, this might be even more surprising as an effect because I think there's a mistake in the text version:
I'm pretty sure it should say "there is an 80% chance that the daytime high will be equal to or lower than 44F and more than or equal to 38F" to make the range inclusive (≥38F, ≤44F), but they wrote it to mean there's an 80% chance the weather is both above 44F and below 38F (>44F, <38F). It doesn't make sense to use the 'and' conjunction to specify the out-of-bound ranges given the previous instructions, and that reading would imply only a 20% chance of landing within 38..44F, if it were even possible.
The integer values are also wrong for the nighttime one, because the text bounds it to >36F or <33F (where it should be >33F and <36F), never mind that the range in the visualizations was 30..36F rather than 33..36F.
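For what it's worth, here's the corrected wording rendered by a tiny helper; the function is my own illustration, not something from the paper:

```python
# A tiny helper (mine, not the paper's) rendering the inclusive-interval
# wording the text format arguably should have used.
def interval_text(label: str, low: int, high: int, prob: int = 80) -> str:
    # Inclusive bounds: the value falls inside [low, high] with probability prob%.
    return (f"There is an {prob}% chance that the {label} will be "
            f"equal to or lower than {high}F and equal to or higher than {low}F.")

print(interval_text("daytime high", 38, 44))   # bounds from the visualizations
print(interval_text("nighttime low", 30, 36))  # 30..36F, per the visualizations
```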
I'd love to know if the actual study used the wrong text or not, but getting better results with incorrect text than with correct visualizations is mind-bending.