Quality Matters: Understanding the Impact of Incomplete Data on Visualization Recommendation

  • Conference paper
Database and Expert Systems Applications (DEXA 2020)

Abstract

Incomplete data is a crucial challenge for data exploration, analytics, and visualization recommendation. It distorts the analysis and reduces the benefits of any data-driven approach, leading to poor and misleading recommendations. Several data imputation methods have been introduced to handle incomplete data. However, it is well known that those methods cannot fully solve the problem; rather, they mitigate it, improving the quality of the results produced by analytics that operate on incomplete data. Hence, in the absence of a robust and accurate solution, it is important to study how incomplete data affects different visual analytics. In this paper, we conduct a study to observe the interplay between incomplete data and recommended visual analytics under a combination of conditions, including: (1) the distribution of incomplete data, (2) the adopted data imputation methods, (3) the types of insights revealed by recommended visualizations, and (4) the quality measures used for assessing the goodness of recommendations.

Notes

  1. thal: Thallium heart scan (normal, fixed defect, reversible defect).

Acknowledgments

Rischan Mafrur is sponsored by the Indonesia Endowment Fund for Education (Lembaga Pengelola Dana Pendidikan / LPDP) (201706220111044). Dr Mohamed A. Sharaf is supported by UAE University Grant (G00003352). Dr Guido Zuccon is the recipient of an Australian Research Council DECRA Research Fellowship (DE180101579) and a Google Faculty Award.

Author information

Corresponding author

Correspondence to Rischan Mafrur.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Mafrur, R., Sharaf, M.A., Zuccon, G. (2020). Quality Matters: Understanding the Impact of Incomplete Data on Visualization Recommendation. In: Hartmann, S., Küng, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2020. Lecture Notes in Computer Science, vol 12391. Springer, Cham. https://doi.org/10.1007/978-3-030-59003-1_8

  • DOI: https://doi.org/10.1007/978-3-030-59003-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59002-4

  • Online ISBN: 978-3-030-59003-1

  • eBook Packages: Computer Science, Computer Science (R0)