Exploiting Hierarchies for Efficient Detection of Completeness in Stream Data

Razniewski, Simon; Sadiq, Shazia; Zhou, Xiaofang

doi:10.1007/978-3-319-46922-5_33

Conference paper
First Online: 21 September 2016

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9877)

Abstract

In big data settings, data is often externally sourced with little or no knowledge of its quality. In such settings, users need to be able to understand the quality of data sets and the implications for their use, in order to mitigate the risk of investing in data sets that will not deliver. In this paper we present an approach for detecting the completeness of high-volume stream data generated by a large number of data providers. By exploiting the inherent hierarchies within database attributes, we devise an efficient solution for computing query-specific completeness, thereby improving users' understanding of the implications of using query results based on incomplete data.

Notes

1. There are actually 14 corporations in the transit sector with that acronym listed on Wikipedia.
2. http://goo.gl/x2kZD5, for instance, cites 1200 buses in Brisbane in 2012.

Author information

Authors and Affiliations

Free University of Bozen-Bolzano, Bolzano, Italy
Simon Razniewski
School of ITEE, The University of Queensland, Brisbane, Australia
Shazia Sadiq & Xiaofang Zhou

Corresponding author

Correspondence to Simon Razniewski.

Editor information

Editors and Affiliations

Monash University, Clayton, Australia
Muhammad Aamir Cheema
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Wenjie Zhang
University of New South Wales, Sydney, New South Wales, Australia
Lijun Chang

About this paper

Cite this paper

Razniewski, S., Sadiq, S., Zhou, X. (2016). Exploiting Hierarchies for Efficient Detection of Completeness in Stream Data. In: Cheema, M., Zhang, W., Chang, L. (eds) Databases Theory and Applications. ADC 2016. Lecture Notes in Computer Science, vol 9877. Springer, Cham. https://doi.org/10.1007/978-3-319-46922-5_33

DOI: https://doi.org/10.1007/978-3-319-46922-5_33
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46921-8
Online ISBN: 978-3-319-46922-5
eBook Packages: Computer Science, Computer Science (R0)
