ABSTRACT
When presenting visualizations of experimental results, scientists often choose to display either inferential uncertainty (e.g., uncertainty in the estimate of a population mean) or outcome uncertainty (e.g., variation of outcomes around that mean). How does this choice impact readers' beliefs about the size of treatment effects? We investigate this question in two experiments comparing 95% confidence intervals (means and standard errors) to 95% prediction intervals (means and standard deviations). The first experiment finds that participants are willing to pay more for, and overestimate the effect of, a treatment when shown confidence intervals rather than prediction intervals. The second experiment evaluates how alternative visualizations compare to standard visualizations across different effect sizes. We find that axis rescaling reduces error, but not as well as prediction intervals or animated hypothetical outcome plots (HOPs), and that depicting inferential uncertainty causes participants to underestimate variability in individual outcomes.
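The contrast at the heart of the abstract can be made concrete with a small numerical sketch. Assuming approximately normal outcomes and the usual z-critical value of 1.96, a 95% confidence interval (inferential uncertainty) is mean ± 1.96·sd/√n, while a 95% prediction interval (outcome uncertainty) is roughly mean ± 1.96·sd; the sample values and parameters below are illustrative, not from the paper:

```python
# Sketch: 95% confidence interval (uncertainty about the mean) vs.
# 95% prediction interval (variation of individual outcomes),
# under a normal approximation with z = 1.96.
import math
import random

random.seed(0)
# Hypothetical outcomes for one treatment group.
sample = [random.gauss(100, 15) for _ in range(500)]

n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
z = 1.96

# Confidence interval: shrinks with sample size (divides by sqrt(n)).
ci = (mean - z * sd / math.sqrt(n), mean + z * sd / math.sqrt(n))
# Prediction interval: stays wide regardless of n.
pi = (mean - z * sd, mean + z * sd)

print(f"95% CI for the mean: [{ci[0]:.1f}, {ci[1]:.1f}]")
print(f"95% PI for outcomes: [{pi[0]:.1f}, {pi[1]:.1f}]")
```

The prediction interval is √n times wider than the confidence interval, which illustrates why readers shown only confidence intervals may underestimate how much individual outcomes vary.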
Supplemental Material
The supplement includes a PDF that shows the stimuli and screenshots used in our experiments.
Index Terms
- How Visualizing Inferential Uncertainty Can Mislead Readers About Treatment Effects in Scientific Results