Aggregate Query Processing on Incomplete Data

Zhang, Anzhen; Wang, Jinbao; Li, Jianzhong; Gao, Hong

doi:10.1007/978-3-319-96890-2_24

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10987)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data

Abstract

Incomplete data has been a longstanding issue in the database community, and yet the subject is poorly handled in both theory and practice. In this paper, we propose to directly estimate the aggregate query result on incomplete data, rather than imputing the missing values. An interval estimate, composed of the upper and lower bounds of the aggregate query result over all possible interpretations of the missing values, is presented to end users. The ground-truth aggregate result is guaranteed to lie within the interval. Experimental results are consistent with the theoretical results, and suggest that the estimate is valuable for better assessing the results of aggregate queries on incomplete data.
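The idea of bounding an aggregate over all possible interpretations of the missing values can be illustrated with a small sketch. This is not the algorithm from the paper, which is not available in this preview; it is a naive illustration under the assumption that every missing value is known to lie in a fixed range [lo, hi], that only values (not whole tuples) are missing, and that the query has no selection predicate on the incomplete attribute. The function names `sum_bounds` and `avg_bounds` and the sample ratings are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def sum_bounds(
    values: Sequence[Optional[float]], lo: float, hi: float
) -> Tuple[float, float]:
    """Lower and upper bounds of SUM over all completions of the data.

    Assumption (not from the paper): each missing value, encoded as None,
    may take any value in the closed range [lo, hi].
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    known = sum(v for v in values if v is not None)
    missing = sum(1 for v in values if v is None)
    # SUM is monotone in every value, so the extremes are reached by
    # setting all missing values to lo, or all of them to hi.
    return known + missing * lo, known + missing * hi


def avg_bounds(
    values: Sequence[Optional[float]], lo: float, hi: float
) -> Tuple[float, float]:
    """Bounds of AVG when the number of tuples is fixed and known."""
    if not values:
        raise ValueError("AVG is undefined on an empty input")
    low, high = sum_bounds(values, lo, hi)
    n = len(values)
    return low / n, high / n


# Hypothetical example: five review ratings on a 1-5 scale, two missing.
ratings = [4.0, None, 5.0, None, 3.0]
print(sum_bounds(ratings, 1.0, 5.0))
print(avg_bounds(ratings, 1.0, 5.0))
```

Whatever the two missing ratings actually are, the true SUM lies in the first interval and the true AVG in the second, which mirrors the guarantee stated in the abstract. The intervals are only as tight as the assumed range [lo, hi].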

Notes

1. Reviews: https://snap.stanford.edu/data/web-Amazon.html.
2. TPC-H: http://www.tpc.org/tpch.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Anzhen Zhang, Jinbao Wang, Jianzhong Li & Hong Gao

Corresponding author

Correspondence to Anzhen Zhang.

Editor information

Editors and Affiliations

South China University of Technology, Guangzhou, China
Yi Cai
Nagoya University, Nagoya, Japan
Yoshiharu Ishikawa
Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
Jianliang Xu

Cite this paper

Zhang, A., Wang, J., Li, J., Gao, H. (2018). Aggregate Query Processing on Incomplete Data. In: Cai, Y., Ishikawa, Y., Xu, J. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 10987. Springer, Cham. https://doi.org/10.1007/978-3-319-96890-2_24

DOI: https://doi.org/10.1007/978-3-319-96890-2_24
Published: 19 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96889-6
Online ISBN: 978-3-319-96890-2
eBook Packages: Computer Science, Computer Science (R0)
