Abstract.
Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, and users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middleware, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and describe optimization heuristics for integrating the new join operators into practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation compares our approach with recent algorithms for joining ranked inputs and shows superior performance.
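The progressive ranking idea can be illustrated with a small sketch. It is not the paper's operator: the function name `rank_join`, the in-memory list inputs, the alternating read strategy, and the default sum scoring function are all assumptions made for illustration. The sketch assumes each input is already sorted by descending score and that the scoring function is monotone, so the combined score of the best and most recently read tuples gives an upper bound on any join result not yet seen.

```python
import heapq
from itertools import count

def rank_join(left, right, k, score=lambda a, b: a + b):
    """Return up to k join results as (key, left_score, right_score, combined),
    in descending combined score.

    left, right: lists of (join_key, score), each sorted by score descending.
    score: a monotone combining function (an assumption of this sketch).
    """
    inputs = (left, right)
    pos = [0, 0]                 # read position in each input
    seen = ({}, {})              # join_key -> scores read so far, per input
    top = [None, None]           # first (highest) score read from each input
    last = [None, None]          # most recent (lowest) score read from each input
    heap = []                    # max-heap of join results via negated score
    tie = count()                # tie-breaker so payloads are never compared
    results = []
    turn = 0
    while len(results) < k:
        done = [pos[i] >= len(inputs[i]) for i in (0, 1)]
        if done[0] and done[1]:
            threshold = float("-inf")      # nothing unseen remains
        else:
            if done[turn]:
                turn = 1 - turn            # skip an exhausted input
            key, s = inputs[turn][pos[turn]]
            pos[turn] += 1
            if top[turn] is None:
                top[turn] = s
            last[turn] = s
            seen[turn].setdefault(key, []).append(s)
            # Join the new tuple with everything already read from the other side.
            for other in seen[1 - turn].get(key, []):
                l, r = (s, other) if turn == 0 else (other, s)
                heapq.heappush(heap, (-score(l, r), next(tie), (key, l, r)))
            turn = 1 - turn
            if top[0] is None or top[1] is None:
                threshold = float("inf")   # one side not read yet: no bound
            else:
                # Upper bound on the score of any join result not yet produced.
                threshold = max(score(top[0], last[1]), score(last[0], top[1]))
        # Report buffered results that no future result can beat.
        while heap and len(results) < k and -heap[0][0] >= threshold:
            neg, _, (key, l, r) = heapq.heappop(heap)
            results.append((key, l, r, -neg))
        if done[0] and done[1]:
            break
    return results
```

With `left = [("a", 9), ("b", 8), ("c", 3)]` and `right = [("b", 10), ("c", 7), ("a", 2)]`, a call with `k=1` reports the pair for key `b` after reading only two tuples from each input, which is the kind of early termination that a blocking sort-after-join plan cannot offer.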
Additional information
Received: 23 December 2003, Accepted: 31 March 2004, Published online: 12 August 2004
Edited by: S. Abiteboul
Extended version of the paper published in the Proceedings of the 29th International Conference on Very Large Databases, VLDB 2003, Berlin, Germany, pp 754-765
Cite this article
Ilyas, I.F., Aref, W.G. & Elmagarmid, A.K. Supporting top-k join queries in relational databases. The VLDB Journal 13, 207–221 (2004). https://doi.org/10.1007/s00778-004-0128-2
DOI: https://doi.org/10.1007/s00778-004-0128-2