Query-Oriented Answer Imputation for Aggregate Queries

Hannou, Fatma-Zohra; Amann, Bernd; Baazizi, Mohamed-Amine

doi:10.1007/978-3-030-28730-6_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11695)

Included in the following conference series:

European Conference on Advances in Databases and Information Systems

Abstract

Data imputation is a well-known technique for repairing missing data values, but it can incur a prohibitive cost when applied to large data sets. Query-driven imputation offers a better alternative because it fixes only the data that is relevant to a query. We adopt a rule-based query rewriting technique for imputing the answers of analytic queries that are missing or incorrect because of data incompleteness. We present a novel query rewriting mechanism guided by partition patterns, which are compact representations of complete and missing data partitions. Our solution strives to infer the largest possible set of missing answers while improving the precision of incorrect ones.
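
The idea of imputing only the answers a query asks for can be illustrated with a minimal Python sketch. It is not the mechanism of the paper: the table layout (floor, week, value), the pattern encoding with a "*" wildcard, and the imputation rule (copy the answer of the same floor from the previous week when that partition is complete) are all assumptions made for illustration.

```python
from collections import defaultdict

WILDCARD = "*"  # assumed notation for "any value" in a partition pattern


def matches(pattern, key):
    """True if every pattern field equals the key field or is a wildcard."""
    return all(p == WILDCARD or p == k for p, k in zip(pattern, key))


def is_missing(key, missing_patterns):
    """A partition is treated as incomplete if any missing-pattern covers it."""
    return any(matches(p, key) for p in missing_patterns)


def aggregate(rows):
    """SUM(value) GROUP BY (floor, week) over the rows that are available."""
    totals = defaultdict(float)
    for floor, week, value in rows:
        totals[(floor, week)] += value
    return dict(totals)


def answer_with_imputation(rows, query_keys, missing_patterns):
    """Answer the aggregate for the queried partitions only.

    Hypothetical rule: the answer for an incomplete (floor, week) partition
    is copied from the same floor in the previous week, provided that
    partition is complete. Otherwise the answer stays unknown.
    """
    totals = aggregate(rows)
    answers = {}
    for floor, week in query_keys:
        key = (floor, week)
        if not is_missing(key, missing_patterns):
            answers[key] = (totals.get(key, 0.0), "exact")
            continue
        donor = (floor, week - 1)
        if donor in totals and not is_missing(donor, missing_patterns):
            answers[key] = (totals[donor], "imputed")
        else:
            answers[key] = (None, "unknown")
    return answers


# Hypothetical data: (floor, week, value) measurements.
rows = [("F1", 1, 10.0), ("F1", 1, 5.0), ("F1", 2, 3.0), ("F2", 1, 7.0)]
# Week 2 of floor F1 is incomplete; every week of floor F2 is incomplete.
missing_patterns = [("F1", 2), ("F2", WILDCARD)]
query_keys = [("F1", 1), ("F1", 2), ("F2", 2)]

answers = answer_with_imputation(rows, query_keys, missing_patterns)
for key, (value, status) in answers.items():
    print(key, value, status)
```

In this sketch the partial sum for ("F1", 2) is replaced by an imputed answer, while ("F2", 2) stays unknown because its only candidate donor partition is itself incomplete. Partitions that the query does not touch are never repaired, which is the cost argument made in the abstract.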

Notes

1. We omit attribute names when they are not necessary for understanding.

Acknowledgement

This work has partially been supported by the EBITA collaborative research project between the Fraunhofer Institute and Sorbonne Université.

Author information

Authors and Affiliations

Sorbonne Université, CNRS, LIP6, Paris, France
Fatma-Zohra Hannou, Bernd Amann & Mohamed-Amine Baazizi

Corresponding author

Correspondence to Bernd Amann.

Editor information

Editors and Affiliations

University of Maribor, Maribor, Slovenia
Tatjana Welzer
Alpen-Adria Universität Klagenfurt, Klagenfurt, Austria
Johann Eder
University of Maribor, Maribor, Slovenia
Vili Podgorelec
University of Maribor, Maribor, Slovenia
Aida Kamišalić Latifić

About this paper

Cite this paper

Hannou, FZ., Amann, B., Baazizi, MA. (2019). Query-Oriented Answer Imputation for Aggregate Queries. In: Welzer, T., Eder, J., Podgorelec, V., Kamišalić Latifić, A. (eds) Advances in Databases and Information Systems. ADBIS 2019. Lecture Notes in Computer Science, vol 11695. Springer, Cham. https://doi.org/10.1007/978-3-030-28730-6_19

DOI: https://doi.org/10.1007/978-3-030-28730-6_19
Published: 13 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28729-0
Online ISBN: 978-3-030-28730-6
eBook Packages: Computer Science, Computer Science (R0)
