Provisional reporting for rank joins

Abid, Adnan; Tagliasacchi, Marco

doi:10.1007/s10844-012-0234-3

Published: 23 February 2013

Volume 40, pages 479–500 (2013)

Journal of Intelligent Information Systems

Abstract

Rank join operators perform a relational join among two or more relations, assign numeric scores to the join results based on a given scoring function, and return the K join results with the highest scores while accessing only a subset of the data in the input relations. Most rank join operators compute a score upper bound for any join result that could still be obtained from the unseen data. A join result is kept in an output buffer and is deterministically reported to the user once its score is greater than or equal to the score upper bound. The score upper bound decreases as further data are extracted from the relations. When the data sources are Web services, which are characterized by a non-negligible response time for every data fetch, the score upper bound might decrease slowly. Consequently, there is a long delay in reporting a join result stored in the output buffer. This paper addresses the problem of efficiently reporting a top join result obtained by joining the data of two such Web services. We present a probabilistic reporting method that computes the confidence with which a join result may appear among the final top-K join results, and reports a join result as soon as its confidence exceeds a given threshold. This allows a join result to be reported soon after it is observed. An extensive experimental study with various settings of the operating parameters validates the proposed approach on both real and synthetic data sets. The results show that the approach significantly reduces the average difference between the time when a join result is observed and the time when it is reported, while incurring negligible errors in the final results.
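
The reporting rule described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a sum scoring function, alternating access to two score-sorted inputs, and a corner-style upper bound of the kind commonly used by rank join operators. The names `corner_bound`, `rank_join`, and `ratio_confidence` are hypothetical, and `ratio_confidence` in particular is only a toy stand-in for the paper's confidence measure, which the abstract does not specify.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[str, int]               # (join key, score); inputs sorted by score, descending
Report = Tuple[int, str, int, int]  # (score, key, step observed, step reported)


def corner_bound(seen_l: List[Row], seen_r: List[Row], l_done: bool, r_done: bool) -> float:
    """Upper bound on the score of any join result that involves an unseen row."""
    if not seen_l or not seen_r:
        return float("inf")
    # An unseen row scores at most as much as the last row seen from its input.
    via_l = float("-inf") if l_done else seen_l[-1][1] + seen_r[0][1]
    via_r = float("-inf") if r_done else seen_r[-1][1] + seen_l[0][1]
    return max(via_l, via_r)


def rank_join(left: List[Row], right: List[Row], k: int,
              confidence: Optional[Callable[[int, float], float]] = None,
              threshold: float = 1.0) -> List[Report]:
    """Report up to k join results; each data fetch counts as one time step."""
    seen_l: List[Row] = []
    seen_r: List[Row] = []
    buffer: List[Tuple[int, str, int]] = []  # observed but not yet reported
    reported: List[Report] = []
    i = j = step = 0
    while len(reported) < k and (i < len(left) or j < len(right)):
        step += 1
        # Alternate between inputs, falling back to the one that still has rows.
        if j >= len(right) or (i < len(left) and i <= j):
            key, s = left[i]
            i += 1
            seen_l.append((key, s))
            buffer += [(s + rs, key, step) for rk, rs in seen_r if rk == key]
        else:
            key, s = right[j]
            j += 1
            seen_r.append((key, s))
            buffer += [(s + ls, key, step) for lk, ls in seen_l if lk == key]
        bound = corner_bound(seen_l, seen_r, i >= len(left), j >= len(right))
        buffer.sort(reverse=True)
        while buffer and len(reported) < k:
            score, bkey, observed = buffer[0]
            certain = score >= bound  # deterministic reporting rule
            provisional = confidence is not None and confidence(score, bound) >= threshold
            if not (certain or provisional):
                break
            buffer.pop(0)
            reported.append((score, bkey, observed, step))
    return reported


def ratio_confidence(score: int, bound: float) -> float:
    """Toy stand-in (an assumption): how close the score is to the bound."""
    return 1.0 if score >= bound else score / bound


left = [("a", 90), ("b", 80), ("c", 30)]
right = [("b", 100), ("a", 70), ("c", 20)]
deterministic = rank_join(left, right, k=2)
provisional = rank_join(left, right, k=2, confidence=ratio_confidence, threshold=0.9)
```

In this toy run, the deterministic rule reports the best join result one step after it is observed, whereas the provisional rule reports it in the same step because its score is already within 10% of the bound. With a real confidence estimate, the threshold trades reporting delay against the risk of reporting a result that later falls out of the top K.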

Acknowledgements

This research is part of the “Search Computing” (SeCo) project, funded by the European Research Council (ERC), under the 2008 Call for “IDEAS Advanced Grants”, dedicated to frontier research. We are thankful to Prof. Stefano Ceri for his guidance and useful discussions during this work.

Author information

Authors and Affiliations

Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci, 32 – 20133, Milano, Italy
Adnan Abid & Marco Tagliasacchi
Faculty of Information Technology, University of Central Punjab, Lahore, Pakistan
Adnan Abid

Corresponding author

Correspondence to Adnan Abid.

About this article

Cite this article

Abid, A., Tagliasacchi, M. Provisional reporting for rank joins. J Intell Inf Syst 40, 479–500 (2013). https://doi.org/10.1007/s10844-012-0234-3

Received: 20 May 2012
Revised: 17 December 2012
Accepted: 22 December 2012
Published: 23 February 2013
Issue Date: June 2013
DOI: https://doi.org/10.1007/s10844-012-0234-3
