
What makes a scatterplot hard to comprehend: data size and pattern salience matter

  • Regular Paper
  • Journal of Visualization

Abstract

As visualizations grow more popular across fields, visualization comprehension has gained considerable attention. In this work, we focus on how data size and pattern salience affect the comprehension of scatterplots, a popular visualization type. We began with a preliminary study in which we interviewed 50 people about the difficulties they had in comprehending 90 different visualizations. The results reveal that data size is one of the top three factors affecting visualization comprehension, and that the effect of data size probably depends on the salience of the patterns within the data. We therefore carried out an experiment on the effect of data size and data-related pattern salience on three intermediate-level comprehension tasks: finding anomalies, judging correlation, and identifying clusters. The tasks were conducted on scatterplots because users are familiar with them and because they support diverse tasks. In the experiment, we found a significant interaction effect of data size and pattern salience on the comprehension of trends in scatterplots. Under specific conditions of pattern salience, data size affects the judgment of anomalies and cluster centers. We discuss the findings of the experiment and summarize the factors involved in visualization comprehension.
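The two experimental factors can be made concrete with a small sketch for the correlation-judging task. It is an illustration only, not the authors' stimulus-generation procedure: it assumes that correlation strength `r` can stand in for pattern salience, and the function names (`make_scatter`, `pearson`, `stimulus_grid`) and the example sizes and strengths are hypothetical.

```python
import math
import random


def make_scatter(n, r, seed=0):
    """Return n (x, y) points drawn so that their population correlation is r.

    Assumption: r is used here as a proxy for pattern salience.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    rng = random.Random(seed)
    noise_scale = math.sqrt(1.0 - r * r)
    points = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Mix x with independent noise so that corr(x, y) equals r in expectation.
        y = r * x + noise_scale * rng.gauss(0.0, 1.0)
        points.append((x, y))
    return points


def pearson(points):
    """Sample Pearson correlation of a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


def stimulus_grid(sizes, saliences, seed=0):
    """Cross every data size with every salience level (a full factorial grid)."""
    grid = {}
    for i, n in enumerate(sizes):
        for j, r in enumerate(saliences):
            # Give each cell its own seed so the cells are independent samples.
            grid[(n, r)] = make_scatter(n, r, seed=seed + i * len(saliences) + j)
    return grid


if __name__ == "__main__":
    # Hypothetical levels; the levels used in the study are not given here.
    for (n, r), pts in stimulus_grid([50, 500, 5000], [0.3, 0.6, 0.9]).items():
        print(f"n={n:5d}  target r={r:.1f}  sample r={pearson(pts):.3f}")
```

Crossing the two factors in a grid like this is what allows an interaction to be tested: the effect of `n` can be compared at each level of `r` rather than averaged over them.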

Notes

  1. https://www.kaggle.com/.

  2. https://github.com/VisWang/scatterplots-dataset.

Acknowledgements

We thank all participants and reviewers for their thoughtful feedback and comments. The work was supported by the Zhejiang Provincial Natural Science Foundation (LR18F020001).

Author information

Corresponding author

Correspondence to Yingcai Wu.

Cite this article

Wang, J., Cai, X., Su, J. et al. What makes a scatterplot hard to comprehend: data size and pattern salience matter. J Vis 25, 59–75 (2022). https://doi.org/10.1007/s12650-021-00778-8

  • DOI: https://doi.org/10.1007/s12650-021-00778-8
