Abstract
Incomplete data is a crucial challenge for data exploration, analytics, and visualization recommendation. It distorts analysis and reduces the benefits of any data-driven approach, leading to poor and misleading recommendations. Several data imputation methods have been introduced to handle incomplete data. However, it is well known that those methods cannot fully solve the problem; rather, they mitigate it, improving the quality of the results produced by analytics that operate on incomplete data. Hence, in the absence of a robust and accurate solution, it is important to study how incomplete data affects different visual analytics. In this paper, we conduct a study to observe the interplay between incomplete data and recommended visual analytics under a combination of conditions, including: (1) the distribution of incomplete data, (2) the adopted data imputation methods, (3) the types of insights revealed by recommended visualizations, and (4) the quality measures used for assessing the goodness of recommendations.
Notes
1. thal: Thallium heart scan (normal, fixed defect, reversible defect).
Acknowledgments
Rischan Mafrur is sponsored by the Indonesia Endowment Fund for Education (Lembaga Pengelola Dana Pendidikan / LPDP) (201706220111044). Dr Mohamed A. Sharaf is supported by UAE University Grant (G00003352). Dr Guido Zuccon is the recipient of an Australian Research Council DECRA Research Fellowship (DE180101579) and a Google Faculty Award.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Mafrur, R., Sharaf, M.A., Zuccon, G. (2020). Quality Matters: Understanding the Impact of Incomplete Data on Visualization Recommendation. In: Hartmann, S., Küng, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2020. Lecture Notes in Computer Science, vol 12391. Springer, Cham. https://doi.org/10.1007/978-3-030-59003-1_8
DOI: https://doi.org/10.1007/978-3-030-59003-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59002-4
Online ISBN: 978-3-030-59003-1
eBook Packages: Computer Science, Computer Science (R0)