What makes a scatterplot hard to comprehend: data size and pattern salience matter

Wang, Jiachen; Cai, Xiwen; Su, Jiajie; Liao, Yu; Wu, Yingcai

doi:10.1007/s12650-021-00778-8

What makes a scatterplot hard to comprehend: data size and pattern salience matter

Regular Paper
Published: 12 September 2021

Volume 25, pages 59–75, (2022)
Cite this article

Journal of Visualization Aims and scope Submit manuscript

Jiachen Wang¹,
Xiwen Cai¹,
Jiajie Su¹,
Yu Liao¹ &
…
Yingcai Wu ORCID: orcid.org/0000-0002-1119-3237¹

809 Accesses
8 Citations
Explore all metrics

Abstract

With the growing popularity of visualizations in various fields, visualization comprehension has gained considerable attention. In this work, we focus on the effect of data size and pattern salience on comprehension of scatterplot, a popular visualization type. We began with a preliminary study in which we interviewed 50 people in terms of comprehension difficulties of 90 different visualizations. The results reveal that data size is one of the top three factors affecting visualization comprehension. Besides, the effect of data size probably depends on the pattern salience within the data. Therefore, we carried out our experiment on the effect of data size and data-related pattern salience on three intermediate-level comprehension tasks, namely finding anomalies, judging correlation, and identifying clusters. The tasks were conducted on the scatterplot due to its familiarity to users and ability to support diverse tasks. Through the experiment, we found a significant interaction effect of data size and pattern salience on the comprehension of the trends in scatterplots. In specific conditions of pattern salience, data size impacts the judgment of anomalies and cluster centers. We discussed the findings in our experiment and further summarized the factors in visualization comprehension.

Graphic abstract

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Comparison of Two Visualization Tools in Supporting Comprehension of Data Trends

Evaluation of Data Visualizations with Bloom’s Six Levels of Understanding

Research on the Fuzziness in the Design of Big Data Visualization

Notes

References

Alper B, Riche NH, Chevalier F, Boy J, Sezgin M (2017) Visualization literacy at elementary school. In: Proceedings of the CHI conference on human factors in computing systems, pp 5485–5497
Bertin J, Berg WJ (1985) Semiology of graphics: diagrams, networks, maps. Ann Assoc Am Geogr 75(4):605–609
Google Scholar
Best LA, Hunter AC, Stewart BM (2006) Perceiving relationships: a physiological examination of the perception of scatterplots. In: Barker-Plummer D, Cox R, Swoboda N (eds) Diagrammatic representation and inference. Diagrams 2006, pp 244–257
Borkin MA, Vo AA, Bylinskii Z, Isola P, Sunkavalli S, Oliva A, Pfister H (2013) What makes a visualization memorable? IEEE Trans Vis Comput Graph 19(12):2306–2315
Google Scholar
Börner K, Maltese A, Balliet RN, Heimlich J (2016) Investigating aspects of data visualization literacy using 20 information visualizations and 273 science museum visitors. Inf Vis 15(3):198–213
Google Scholar
Börner K, Bueckle A, Ginda M (2019) Data visualization literacy: definitions, conceptual, frameworks, exercises, and assessments. Proc Natl Acad Sci 116(6):1857–1864
Google Scholar
Boy J, Rensink RA, Bertini E, Fekete JD (2014) A principled way of assessing visualization literacy. IEEE Trans Vis Comput Graph 20(12):1963–1972
Google Scholar
Carpenter PA, Shah P (1998) A model of the perceptual and conceptual processes in graph comprehension. J Exp Psychol Appl 4(2):75–100
Google Scholar
Carswell CM (1992) Choosing specifiers: an evaluation of the basic tasks model of graphical perception. Hum Factors 34(5):535–554
Google Scholar
Chen R, Shu X, Chen J, Weng D, Tang J, Fu S, Wu Y (2021) Nebula: a coordinating grammar of graphics. IEEE Trans Vis Comput Graph. https://doi.org/10.1109/TVCG.2021.3076222
Article Google Scholar
Cleveland WS, McGill R (1984) Graphical perception: theory, experimentation, and application to the development of graphical methods. J Am Stat Assoc 79(387):531–554
Google Scholar
Curcio FR (1987) Comprehension of mathematical relationships expressed in graphs. J Res Math Educ 18(5):382–393
Google Scholar
delMas R, Garfield J, Ooms A (2005) Using assessment items to study students’ difficulty reading and interpreting graphical representations of distributions. In: Proceedings of the fourth international research forum on statistical reasoning, thinking, and literacy
Deng Z, Weng D, Liang Y, Bao J, Zheng Y, Schreck T, Xu M, Wu Y (2021) Visual cascade analytics of large-scale spatiotemporal data. IEEE Trans Vis Comput Graph. https://doi.org/10.1109/TVCG.2021.3071387
Article Google Scholar
Embretson SE, Reise SP (2000) Item response theory for psychologists. Lawrence Erlbaum Associates Publishers, Mahwah
Google Scholar
Filipov V, Schetinger V, Raminger K, Soursos N, Zapke S, Miksch S (2021) Gone full circle: a radial approach to visualize event-based networks in digital humanities. Vis Inform 5(1):45–60
Google Scholar
Freedman EG, Shah P (2002) Toward a model of knowledge-based graph comprehension. In: Hegarty M, Meyer B, Narayanan NH (eds) Diagrammatic representation and inference. Diagrams 2002, pp 18–30
Friendly M, Denis D (2005) The early origins and development of the scatterplot. J Hist Behav Sci 41(2):103–130
Google Scholar
Galesic M, Garcia-Retamero R (2011) Graph literacy: a cross-cultural comparison. Med Decis Mak 31(3):444–457
Google Scholar
Handzic M, Lam B, Aurum A, Oliver G (2002) A comparative analysis of two knowledge discovery tool: Scatterplot versus barchart. In: Proceedings of international conference on data mining, pp 167–176
Heer J, Bostock M, Ogievetsky V (2010) A tour through the visualization zoo. Commun ACM 53(6):59–67
Google Scholar
Hopkins B, Skellam JG (1954) A new method for determining the type of distribution of plant individuals. Ann Bot 18(2):213–227
Google Scholar
Huang W, Eades P, Hong SH (2009) Measuring effectiveness of graph visualizations: a cognitive load perspective. Inf Vis 8(3):139–152
Google Scholar
Hu K, Gaikwad N, Bakker M, Hulsebos M, Zgraggen E, Hidalgo C, Kraska T, Li G, Satyanarayan A (2019) Çağatay Demiralp: Viznet: towards a large-scale visualization learning and benchmarking repository. In: Proceedings of the conference on human factors in computing systems, pp 1–12
Jin Z, Chen N, Shi Y, Qian W, Xu M, Cao N (2021) TrammelGraph: visual graph abstraction for comparison. J Vis 24(2):365–379
Google Scholar
Kim Y, Heer J (2018) Assessing effects of task and data distribution on the effectiveness of visual encodings. Comput Graph Forum 37(3):157–167
Google Scholar
Klein G, Moon B, Hoffman RR (2006) Making sense of sensemaking 2: a macrocognitive model. IEEE Intell Syst 21(5):88–92
Google Scholar
Klein G, Phillips JK, Rall EL, Peluso DA (2007) A data-frame theory of sensemaking. In: Expertise out of context: proceedings of the sixth international conference on naturalistic decision making, pp 113–155
Kwon BC, Lee B (2016) A comparative evaluation on online learning approaches using parallel coordinate visualization. In: Proceedings of the CHI conference on human factors in computing systems, pp 993–997
Lan J, Wang J, Shu X, Zhou Z, Zhang H, Wu Y (2021) RallyComparator: visual comparison of the multivariate and spatial stroke sequence in a Table Tennis Rally. J Vis (to appear)
Lee S, Kim SH, Hung YH (2016) How do people make sense of unfamiliar visualizations? A grounded model of novice’s information visualization sensemaking. IEEE Trans Vis Comput Graph 22(1):499–508
Google Scholar
Lee S, Kim SH, Kwon BC (2017) Vlat: development of a visualization literacy assessment test. IEEE Trans Vis Comput Graph 23(1):551–560
Google Scholar
Lee S, Kwon B, Yang J, Lee B, Kim SH (2019) The correlation between users’ cognitive characteristics and visualization literacy. Appl Sci 9(3):488
Google Scholar
Li J, Martens JB, van Wijk JJ (2010) Judging correlation from scatterplots and parallel coordinate plots. Inf Vis 9(1):13–30
Google Scholar
Li Y, Fujiwara T, Choi YK, Kim KK, Ma KL (2020) A visual analytics system for multi-model comparison on clinical data predictions. Vis Inform 4(2):122–131
Google Scholar
Liu FT, Ting KM, hua Zhou Z (2008) Isolation forest. In: Proceedings of IEEE international conference on data mining, pp 413–422
Liu Z, Stasko J (2010) Mental models, visual reasoning and interaction in information visualization: a top-down perspective. IEEE Trans Vis Comput Graph 16(6):999–1008
Google Scholar
Ma Y, Tung AK, Wang W, Gao X, Pan Z, Chen W (2020) Scatternet: a deep subjective similarity model for visual analysis of scatterplots. IEEE Trans Vis Comput Graph 26(3):1562–1576
Google Scholar
Mei H, Guan H, Xin C, Wen X, Chen W (2020) DataV: data visualization on large high-resolution displays. Vis Inform 4(3):12–23
Google Scholar
Nguyen QV, Miller N, Arness D, Huang W, Huang ML, Simoff S (2020) Evaluation on interactive visualization data with scatterplots. Vis Inform 4(4):1–10
Google Scholar
Niklas E, Fekete JD (2010) Hierarchical aggregation for information visualization: overview, techniques, and design guidelines. IEEE Trans Vis Comput Graph 16(3):439–454
Google Scholar
Pan J, Chen W, Zhao X, Zhou S, Zeng W, Zhu M, Chen J, Fu S, Wu Y (2020) Exemplar-based layout fine-tuning for node-link diagrams. IEEE Trans Vis Comput Graph 27(2):1655–1665
Google Scholar
Patterson RE, Blaha LM, Grinstein GG, Liggett KK, Kaveney DE, Sheldon KC, Havig PR, Moore JA (2014) A human cognition framework for information visualization. Comput Graph 42:42–58
Google Scholar
Pinker S (1990) A theory of graph comprehension. In: Freedle R (ed) Artificial intelligence and the future of testing. Lawrence Erlbaum Associates Publishers, Mahwah, pp 73–126
Google Scholar
Rensink RA, Baldridge G (2010) The perception of correlation in scatterplots. Comput Graph Forum 29(3):1203–1210
Google Scholar
Ruchikachorn P, Mueller K (2015) Learning visualizations by analogy: promoting visual literacy through visualization morphing. IEEE Trans Vis Comput Graph 21(9):1028–1044
Google Scholar
Ryan G, Mosca A, Chang R, Wu E (2019) At a glance: pixel approximate entropy as a measure of line chart complexity. IEEE Trans Vis Comput Graph 25(1):872–881
Google Scholar
Sarikaya A, Gleicher M (2018) Scatterplots: tasks, data, and designs. IEEE Trans Vis Comput Graph 24(1):402–412
Google Scholar
Shah P, Freedman EG (2011) Bar and line graph comprehension: an interaction of top-down and bottom-up processes. Top Cognit Sci 3(3):560–578
Google Scholar
Shah P, Hoeffner J (2002) Review of graph comprehension research: implications for instruction. Educ Psychol Rev 14(1):47–69
Google Scholar
Shi D, Xu X, Sun F, Shi Y, Cao N (2020) Calliope: automatic visual data story generation from a spreadsheet. IEEE Trans Vis Comput Graph 27(2):453–463
Google Scholar
Shu X, Wu J, Wu X, Liang H, Cui W, Wu Y, Qu H (2021) Dancingwords: exploring animated word clouds to tell stories. J Vis 24(1):85–100
Google Scholar
Simkin D, Hastie R (1987) An information-processing analysis of graph perception. J Am Stat Assoc 82(398):454–465
Google Scholar
Spence I (2005) No humble pie: the origins and usage of a statistical chart. J Educ Behav Stat 30(4):353–368
Google Scholar
Spence I, Lewandowsky S (1991) Displaying proportions and percentages. Appl Cognit Psychol 5(1):61–77
Google Scholar
Tang J, Zhou Y, Tang T, Weng D, Xie B, Yu L, Zhang H, Wu Y (2022) A visualization approach for monitoring order processing in e-commerce warehouse. IEEE Trans Vis Comput Graph
Tatu A, Bak P, Bertini E, Keim D, Schneidewind J (2010) Visual quality metrics and human perception: an initial study on 2d projections of large multidimensional data. In: Proceedings of the international conference on advanced visual interfaces, pp 49–56
Tufte ER (2001) The visual display of quantitative information. Graphics Press, Cheshire
Google Scholar
Wainer H (1992) Understanding graphs and tables. Educ Res 21(1):14–23
Google Scholar
Wang Y, Wang Z, Zhu L, Zhang J, Fu CW, Cheng Z, Tu C, Chen B (2018) Is there a robust technique for selecting aspect ratios in line charts? IEEE Trans Vis Comput Graph 24(12):3096–3110
Google Scholar
Wang J, Zhao K, Deng D, Cao A, Xie X, Zhou Z, Zhang H, Wu Y (2020) Tac-Simur: tactic-based simulative visual analytics of table tennis. IEEE Trans Vis Comput Graph 26(1):407–417
Google Scholar
Wang J, Wu J, Cao A, Zhou Z, Zhang H, Wu Y (2021) Tac-Miner: visual tactic mining for multiple table tennis matches. IEEE Trans Vis Comput Graph 27(6):2770–2782
Google Scholar
Wang Y, Peng TQ, Lu H, Wang H, Xie X, Qu H, Wu Y (2022) Seek for success: a visualization approach for understanding the dynamics of academic careers. IEEE Trans Vis Comput Graph
Weng D, Zheng C, Deng Z, Ma M, Bao J, Zheng Y, Xu M, Wu Y (2021) Towards better bus networks: a visual analytics approach. IEEE Trans Vis Comput Graph 27(2):817–827
Google Scholar
Wilkinson L, Anand A, Grossman R (2005) Graph-theoretic scagnostics. In: Proceedings of IEEE symposium on information visualization, pp 157–164
Wu Y, Weng D, Deng Z, Bao J, Xu M, Wang Z, Zheng Y, Ding Z, Chen W (2020) Towards better detection and analysis of massive spatiotemporal co-occurrence patterns. IEEE Trans Intell Transp Syst 22(6):3387–3402
Google Scholar
Wu J, Liu D, Guo Z, Xu Q, Wu Y (2022) TacticFlow: visual analytics of ever-changing tactics in racket sports. IEEE Trans Vis Comput Graph
Xiong C, Ceja CR, Ludwig CJ, Franconeri S (2020) Biased average position estimates in line and bar graphs: underestimation, overestimation, and perceptual pull. IEEE Trans Vis Comput Graph 26(1):301–310
Google Scholar
Yang F, Harrison LT, Rensink RA, Franconeri SL, Chang R (2019) Correlation judgment and visualization features: a comparative study. IEEE Trans Vis Comput Graph 25(3):1474–1488
Google Scholar
Ye S, Chen Z, Chu X, Wang Y, Fu S, Shen L, Zhou K, Wu Y (2020) Shuttlespace: exploring and analyzing movement trajectory in immersive visualization. IEEE Trans Vis Comput Graph 27(2):860–869
Google Scholar
Yoghourdjian V, Archambault D, Diehl S, Dwyer T, Klein K, Purchase HC, Wu HY (2018) Exploring the limits of complexity: a survey of empirical studies on graph visualisation. Vis Inform 2(4):264–282
Google Scholar
Yoghourdjian V, Yang Y, Dwyer T, Lawrence L, Wybrow M, Marriott K (2020) Scalability of network visualisation from a cognitive load perspective. IEEE Trans Vis Comput Graph 27(2):1677–1687
Google Scholar
Zhao Y, Jiang H, Qin Y, Xie H, Wu Y, Liu S, Zhou Z, Xia J, Zhou F et al (2020) Preserving minority structures in graph sampling. IEEE Trans Vis Comput Graph 27(2):1698–1708
Google Scholar
Zhao M, Qu H, Sedlmair M (2019) Neighborhood perception in bar charts. In: Proceedings of the CHI conference on human factors in computing systems, pp 1–12
Zhu H, Zhu M, Feng Y, Cai D, Hu Y, Wu S, Wu X, Chen W (2021) Visualizing large-scale high-dimensional data via hierarchical embedding of KNN graphs. Vis Inform 5:51–59
Google Scholar

Download references

Acknowledgements

We thank all participants and reviewers for their thoughtful feedback and comments. The work was supported by Zhejiang Provincial Natural Science Foundation (LR18F020001).

Author information

Authors and Affiliations

State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, China
Jiachen Wang, Xiwen Cai, Jiajie Su, Yu Liao & Yingcai Wu

Authors

Jiachen Wang
View author publications
You can also search for this author in PubMed Google Scholar
Xiwen Cai
View author publications
You can also search for this author in PubMed Google Scholar
Jiajie Su
View author publications
You can also search for this author in PubMed Google Scholar
Yu Liao
View author publications
You can also search for this author in PubMed Google Scholar
Yingcai Wu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yingcai Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary file 1

.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Wang, J., Cai, X., Su, J. et al. What makes a scatterplot hard to comprehend: data size and pattern salience matter. J Vis 25, 59–75 (2022). https://doi.org/10.1007/s12650-021-00778-8

Download citation

Received: 08 July 2021
Accepted: 23 July 2021
Published: 12 September 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s12650-021-00778-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions