
Exploiting Hierarchies for Efficient Detection of Completeness in Stream Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9877)

Abstract

In big data settings, data is often sourced externally with little or no knowledge of its quality. In such settings, users need to be able to understand the quality of data sets and its implications for use, in order to mitigate the risk of investing in data sets that will not deliver. In this paper we present an approach for detecting the completeness of high-volume stream data generated by a large number of data providers. By exploiting the inherent hierarchies within database attributes, we devise an efficient solution for computing query-specific completeness, thereby improving users' understanding of the implications of using query results based on incomplete data.
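The abstract does not spell out the algorithm. As a rough illustration of why attribute hierarchies can make completeness checks cheap, here is a minimal Python sketch under stated assumptions; it is not the authors' method. It assumes that each data provider (for example, a bus) is identified by its path in a hierarchy such as city > region > bus, and that each provider emits a single boolean "complete for the current window" signal. Keeping counters on every hierarchy node means that checking completeness for a query restricted to one node is a single counter comparison, at the cost of updating one counter per hierarchy level on each report. The class `HierarchicalCompleteness`, its methods, and the region and bus names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Set, Tuple

# Hypothetical: a provider is identified by its path, e.g. ("Brisbane", "North", "bus1").
Path = Tuple[str, ...]


@dataclass
class Counts:
    leaves: int = 0    # providers registered below this node
    complete: int = 0  # of those, providers whose data for the window is complete


class HierarchicalCompleteness:
    """Hypothetical sketch: completeness counters on every node of an attribute hierarchy."""

    def __init__(self, providers: Iterable[Path]) -> None:
        self.counts: Dict[Path, Counts] = {}
        self.providers: Set[Path] = set()
        self.complete: Set[Path] = set()
        for path in providers:
            if path in self.providers:
                continue  # ignore duplicate registrations
            self.providers.add(path)
            for prefix in self._prefixes(path):
                self.counts.setdefault(prefix, Counts()).leaves += 1

    @staticmethod
    def _prefixes(path: Path) -> Iterator[Path]:
        # (), ("Brisbane",), ("Brisbane", "North"), ... up to the full path
        return (path[:i] for i in range(len(path) + 1))

    def mark_complete(self, provider: Path) -> None:
        """Record that a provider's data for the current window has fully arrived."""
        if provider not in self.providers or provider in self.complete:
            return  # unknown provider, or already counted
        self.complete.add(provider)
        for prefix in self._prefixes(provider):
            self.counts[prefix].complete += 1

    def mark_incomplete(self, provider: Path) -> None:
        """Retract completeness, e.g. when a new time window starts."""
        if provider not in self.complete:
            return
        self.complete.remove(provider)
        for prefix in self._prefixes(provider):
            self.counts[prefix].complete -= 1

    def is_complete(self, node: Path) -> bool:
        """One counter comparison: is every provider below `node` complete?"""
        c = self.counts[node]  # KeyError for a node that is not in the hierarchy
        return c.complete == c.leaves

    def fraction_complete(self, node: Path) -> float:
        """Share of providers below `node` whose data is complete."""
        c = self.counts[node]
        return c.complete / c.leaves


# Usage with three hypothetical buses in two regions.
idx = HierarchicalCompleteness([
    ("Brisbane", "North", "bus1"),
    ("Brisbane", "North", "bus2"),
    ("Brisbane", "South", "bus3"),
])
idx.mark_complete(("Brisbane", "North", "bus1"))
idx.mark_complete(("Brisbane", "North", "bus2"))
print(idx.is_complete(("Brisbane", "North")))  # True
print(idx.is_complete(("Brisbane",)))          # False
```

A query over the whole city is answered from the city node's counters rather than by scanning every provider; other hierarchies mentioned implicitly by stream data, such as time windows, are left out of this sketch.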


Notes

  1. There are actually 14 corporations in the transit sector with that acronym listed on Wikipedia.

  2. http://goo.gl/x2kZD5, for instance, cites 1200 buses in Brisbane in 2012.

References

  1. https://en.wikipedia.org/wiki/Bus_bunching

  2. Abiteboul, S., Dong, L., Etzioni, O., Srivastava, D., Weikum, G., Stoyanovich, J., Suchanek, F.M.: The elephant in the room: getting value from big data. In: WebDB, pp. 1–5. ACM (2015)

  3. Ashton, K.: That ‘internet of things’ thing. RFiD J. 22(7), 97–114 (2009)

  4. Biswas, J., Naumann, F., Qiu, Q.: Assessing the completeness of sensor data. In: Li Lee, M., Tan, K.-L., Wuwongse, V. (eds.) DASFAA 2006. LNCS, vol. 3882, pp. 717–732. Springer, Heidelberg (2006)

  5. Blakeley, J.A., Coburn, N., Larson, P.: Updating derived relations: Detecting irrelevant and autonomously computable updates. In: VLDB (1986)

  6. Bohannon, P., Fan, W., Geerts, F., Jia, X., Kementsietsidis, A.: Conditional functional dependencies for data cleaning. In: ICDE (2007)

  7. Brown, P., Link, S.: Probabilistic keys for data quality management. In: Zdravkovic, J., Kirikova, M., Johannesson, P. (eds.) CAiSE 2015. LNCS, vol. 9097, pp. 118–132. Springer, Heidelberg (2015)

  8. Golab, L., Johnson, T.: Consistency in a stream warehouse. In: CIDR, pp. 114–122 (2011)

  9. Golab, L., Özsu, M.T.: Issues in data stream management. ACM SIGMOD Rec. 32(2), 5–14 (2003)

  10. Hartig, O., Zhao, J.: Using web data provenance for quality assessment. In: CEUR Workshop Proceedings (2009)

  11. Jayawardene, V., Sadiq, S., Indulska, M.: The curse of dimensionality in data quality. In: ACIS, pp. 1–11 (2013)

  12. Levy, A.Y.: Obtaining complete answers from incomplete databases. In: VLDB, pp. 402–412 (1996)

  13. Levy, A.Y., Sagiv, Y.: Queries independent of updates. In: Proceedings of the VLDB, pp. 171–181 (1993)

  14. McAfee, A.: Mastering the three worlds of information technology. Harvard Bus. Rev. 84(11), 141 (2006)

  15. Motro, A.: Integrity = Validity + Completeness. ACM TODS 14(4), 480–502 (1989)

  16. Nutt, W., Paramonov, S., Savkovic, O.: Implementing query completeness reasoning. In: CIKM, pp. 733–742 (2015)

  17. Razniewski, S., Korn, F., Nutt, W., Srivastava, D.: Identifying the extent of completeness of query answers over partially complete databases. In: SIGMOD, pp. 561–576 (2015)

  18. Razniewski, S., Montali, M., Nutt, W.: Verification of query completeness over processes. In: Daniel, F., Wang, J., Weber, B. (eds.) BPM 2013. LNCS, vol. 8094, pp. 155–170. Springer, Heidelberg (2013)

  19. Tucker, P., Maier, D., Sheard, T., Fegaras, L., et al.: Exploiting punctuation semantics in continuous data streams. TKDE 15(3), 555–568 (2003)

Author information

Corresponding author

Correspondence to Simon Razniewski.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Razniewski, S., Sadiq, S., Zhou, X. (2016). Exploiting Hierarchies for Efficient Detection of Completeness in Stream Data. In: Cheema, M., Zhang, W., Chang, L. (eds) Databases Theory and Applications. ADC 2016. Lecture Notes in Computer Science, vol 9877. Springer, Cham. https://doi.org/10.1007/978-3-319-46922-5_33


  • DOI: https://doi.org/10.1007/978-3-319-46922-5_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46921-8

  • Online ISBN: 978-3-319-46922-5

  • eBook Packages: Computer Science, Computer Science (R0)
