Progressive and selective merge: computing top-k with ad-hoc ranking functions

Published: 11 June 2007
DOI: 10.1145/1247480.1247494

Abstract

The family of threshold algorithms (i.e., TA) has been widely studied for efficiently computing top-k queries. TA uses a sort-merge framework that assumes data lists are pre-sorted and that the ranking functions are monotone. However, in many database applications, attribute values are indexed by tree-structured indices (e.g., B-tree, R-tree), and the ranking functions are not necessarily monotone. To answer top-k queries with ad-hoc ranking functions, this paper studies an index-merge paradigm that performs progressive search over the space of joint states composed of multiple index nodes.
We address two challenges for efficient query processing. First, to minimize the search complexity, we present a double-heap algorithm which supports not only progressive state search but also progressive state generation. Second, to avoid unnecessary disk access, we characterize a type of "empty-state" that does not contribute to the final results, and propose a new materialization model, join-signature, to prune empty-states. Our performance study shows that the proposed method achieves one order of magnitude speed-up over baseline solutions.
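
The index-merge idea can be illustrated with a minimal Python sketch. It is not the paper's double-heap algorithm or its join-signature structure: it uses a single heap, expands all child states at once, and stands in for the signature with an exact set of tuple ids per node. The toy index (`Node`, `build_index`), the non-monotone ranking function (x - y)^2 (smaller is better), and all names are assumptions chosen for illustration only.

```python
import heapq
from itertools import count


class Node:
    """Toy 1-D index node covering values in [lo, hi] (hypothetical structure)."""

    def __init__(self, entries, fanout=2):
        # entries: list of (value, tuple_id), sorted by value
        self.lo, self.hi = entries[0][0], entries[-1][0]
        # Exact tid set; a stand-in for a compact signature, not the paper's model.
        self.tids = {tid for _, tid in entries}
        if len(entries) == 1:
            self.children = []  # leaf: exactly one tuple
        else:
            step = -(-len(entries) // fanout)  # ceiling division
            self.children = [Node(entries[i:i + step], fanout)
                             for i in range(0, len(entries), step)]


def build_index(rows, attr):
    """Index one attribute of the rows; tuple ids are row positions."""
    return Node(sorted((row[attr], tid) for tid, row in enumerate(rows)))


def lower_bound(a, b):
    """Smallest possible (x - y)**2 for x in [a.lo, a.hi] and y in [b.lo, b.hi]."""
    gap = max(a.lo - b.hi, b.lo - a.hi, 0)
    return gap * gap


def top_k(rows, k):
    """Best-first search over joint states (node of index A, node of index B)."""
    ia, ib = build_index(rows, 0), build_index(rows, 1)
    tie = count()  # tie-breaker so the heap never compares Node objects
    heap = [(lower_bound(ia, ib), next(tie), ia, ib)]
    results = []
    while heap and len(results) < k:
        bound, _, a, b = heapq.heappop(heap)
        if not a.children and not b.children:
            # Two leaves sharing a tuple: the bound is that tuple's exact score.
            results.append((bound, next(iter(a.tids & b.tids))))
            continue
        for ca in (a.children or [a]):
            for cb in (b.children or [b]):
                if ca.tids & cb.tids:  # skip empty states: no tuple lies in both
                    heapq.heappush(heap, (lower_bound(ca, cb), next(tie), ca, cb))
    return results


rows = [(1, 9), (4, 5), (7, 7), (3, 8), (10, 2)]
print(top_k(rows, 3))  # → [(0, 2), (1, 1), (25, 3)]
```

Each result is a (score, tuple id) pair. Because states are popped in order of their lower bound, a leaf-leaf state that reaches the top of the heap cannot be beaten by any tuple still hidden inside an unexpanded state, so results are emitted progressively in score order.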

Published In

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
June 2007
1210 pages
ISBN: 9781595936868
DOI: 10.1145/1247480
General Chairs: Lizhu Zhou, Tok Wang Ling
Program Chair: Beng Chin Ooi
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. progressive merge
  2. selective merge
  3. top-k query

Qualifiers

  • Article

Conference

SIGMOD/PODS07
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
