Aggregate Query Processing on Incomplete Data

  • Conference paper
Web and Big Data (APWeb-WAIM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10987)

Abstract

Incomplete data has been a longstanding issue in the database community, and yet the subject is poorly handled in both theory and practice. In this paper, we propose to estimate the aggregate query result on incomplete data directly, rather than imputing the missing values. An interval estimate, composed of the upper and lower bounds of the aggregate query result over all possible interpretations of the missing values, is presented to end users. The ground-truth aggregate result is guaranteed to lie within the interval. Experimental results are consistent with the theoretical results, and suggest that the estimate is invaluable for assessing the results of aggregate queries on incomplete data.
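To make the idea of an interval estimate concrete, the following is a minimal sketch and not the paper's algorithm, which the abstract does not describe. It assumes that every missing value (modelled as `None`) is known to lie in a closed range `[lo, hi]`, and that the query has no selection predicate, so the row count is fixed. The function names `sum_bounds` and `avg_bounds` and the sample ratings are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def sum_bounds(values: Sequence[Optional[float]],
               lo: float, hi: float) -> Tuple[float, float]:
    """Bounds on SUM when each missing value (None) may be anything in [lo, hi].

    Assumption: the range [lo, hi] is known in advance (e.g. a rating scale).
    """
    known = sum(v for v in values if v is not None)
    missing = sum(1 for v in values if v is None)
    # SUM is monotone in every value, so the extremes are reached by
    # setting all missing values to lo (lower bound) or hi (upper bound).
    return known + missing * lo, known + missing * hi


def avg_bounds(values: Sequence[Optional[float]],
               lo: float, hi: float) -> Tuple[float, float]:
    """Bounds on AVG over all rows, assuming the row count is fixed."""
    if not values:
        raise ValueError("AVG is undefined on an empty column")
    s_lo, s_hi = sum_bounds(values, lo, hi)
    return s_lo / len(values), s_hi / len(values)


# Hypothetical column of review ratings on a 1-5 scale, two of them missing.
ratings = [4.0, None, 5.0, None, 3.0]
print(sum_bounds(ratings, 1.0, 5.0))  # (14.0, 22.0)
print(avg_bounds(ratings, 1.0, 5.0))  # (2.8, 4.4)
```

Under these assumptions, whatever the two missing ratings really are, the true sum lies in [14, 22] and the true average in [2.8, 4.4]. Queries with selection predicates or other aggregates need more care, because a missing value can then change which rows qualify.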

Notes

  1. Reviews: https://snap.stanford.edu/data/web-Amazon.html

  2. TPC-H: http://www.tpc.org/tpch

Corresponding author

Correspondence to Anzhen Zhang.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Zhang, A., Wang, J., Li, J., Gao, H. (2018). Aggregate Query Processing on Incomplete Data. In: Cai, Y., Ishikawa, Y., Xu, J. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 10987. Springer, Cham. https://doi.org/10.1007/978-3-319-96890-2_24

  • DOI: https://doi.org/10.1007/978-3-319-96890-2_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96889-6

  • Online ISBN: 978-3-319-96890-2

  • eBook Packages: Computer Science, Computer Science (R0)
