
Efficient processing of top-k queries: selective NRA algorithms

Journal of Intelligent Information Systems

Abstract

Efficient processing of top-k queries has drawn increasing attention from both industry and academia because of its many applications. Low access cost is a central concern in top-k query processing. Answering a top-k query typically involves two types of access: sorted access and random access. In some scenarios, the data source does not support random access; for this situation, Fagin et al. proposed the No Random Access (NRA) algorithm (Fagin et al., J Comput Syst Sci 66:614–656, 2003). Our work is motivated by a key observation about NRA: the number of accesses can be further reduced by performing sorted accesses on the different lists of the dataset selectively rather than in parallel. Based on this insight, we propose a Selective NRA (SNRA) algorithm that aims to eliminate unnecessary access cost. We then optimize SNRA for runtime cost and present the SNRA-opt algorithm. Furthermore, we address instance optimality theoretically and turn SNRA and SNRA-opt into instance-optimal algorithms, termed Hybrid-SNRA (HSNRA) and HSNRA-opt. Extensive experiments show that our algorithms perform significantly fewer sorted accesses than NRA and its state-of-the-art variants. In terms of runtime, the proposed SNRA-opt and HSNRA-opt algorithms are two orders of magnitude faster than NRA. Finally, we discuss, both theoretically and experimentally, how to select the parameters of the SNRA algorithms.
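The difference between parallel and selective sorted access can be illustrated with a minimal Python sketch of NRA-style bounding. It assumes a sum aggregate, non-negative scores, and that an object missing from a list scores 0 there. The `max_last_seen` rule is a hypothetical heuristic chosen only for illustration; it is not the SNRA selection rule from the paper, and all function and variable names are invented here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Each list holds (object id, non-negative score) pairs sorted by score, descending.
SortedList = Sequence[Tuple[str, float]]
# A policy receives the last score seen in every list and the indices of lists
# that still have entries, and returns the lists to read in this step.
Policy = Callable[[List[float], List[int]], List[int]]


def nra_top_k(lists: List[SortedList], k: int, choose: Policy) -> Tuple[List[str], int]:
    """NRA-style top-k under a sum aggregate, using sorted access only.

    Returns the k object ids with the best lower bounds at halting time and
    the number of sorted accesses performed.
    """
    m = len(lists)
    depth = [0] * m
    # Before a list is read, nothing bounds its unseen scores from above.
    last = [float("inf") if lst else 0.0 for lst in lists]
    seen: Dict[str, Dict[int, float]] = {}
    accesses = 0

    while True:
        # Lower bound: known scores, missing ones taken as 0.
        # Upper bound: missing scores replaced by the last score seen in that list.
        lower = {o: sum(s.values()) for o, s in seen.items()}
        upper = {o: sum(s.get(j, last[j]) for j in range(m)) for o, s in seen.items()}
        ranked = sorted(lower, key=lambda o: (-lower[o], o))
        if len(ranked) >= k:
            kth_lower = lower[ranked[k - 1]]
            # An object not seen yet can score at most the sum of the last seen values.
            best_other = max([upper[o] for o in ranked[k:]] + [sum(last)])
            if kth_lower >= best_other:
                return ranked[:k], accesses

        active = [i for i in range(m) if depth[i] < len(lists[i])]
        if not active:  # every list exhausted: all scores are exact
            return ranked[:k], accesses

        for i in choose(last, active):
            obj, score = lists[i][depth[i]]
            depth[i] += 1
            accesses += 1
            # Once a list is exhausted, no unseen object can gain anything from it.
            last[i] = score if depth[i] < len(lists[i]) else 0.0
            seen.setdefault(obj, {})[i] = score


def round_robin(last: List[float], active: List[int]) -> List[int]:
    """Classic NRA: one sorted access on every remaining list per round."""
    return list(active)


def max_last_seen(last: List[float], active: List[int]) -> List[int]:
    """Hypothetical selective rule: read only the list with the highest last-seen score."""
    return [max(active, key=lambda i: last[i])]


lists = [
    [("a", 9), ("b", 8), ("c", 3), ("d", 2)],
    [("b", 9), ("a", 6), ("d", 4), ("c", 1)],
    [("c", 8), ("a", 7), ("b", 4), ("d", 1)],
]
print(nra_top_k(lists, 2, round_robin))    # (['a', 'b'], 9)
print(nra_top_k(lists, 2, max_last_seen))  # (['a', 'b'], 7)
```

On this hand-picked toy input both policies return the same top-2 objects, while the selective policy halts after 7 sorted accesses instead of 9. A single example like this says nothing about the paper's experimental results; it only shows where a list-selection policy plugs into the NRA stopping test.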




Notes

  1. http://kdd.ics.uci.edu

  2. http://www.dianping.com

  3. c = 1, p < ∞ → HSNRA; c = 1, p = ∞ → SNRA; c > 1, p < ∞ → HSNRA-opt; c > 1, p = ∞ → SNRA-opt.

References

  • Akbarinia, R., Pacitti, E., & Valduriez, P. (2007). Best position algorithms for top-k queries. In Proceedings of the 33rd international conference on very large data bases, VLDB ’07 (pp. 495–506).

  • Balke, W., Güntzer, U., & Kießling, W. (2010). On real-time top k querying for mobile services. In On the Move to Meaningful Internet Systems 2002: CoopIS, DOA, and ODBASE (pp. 125–143).

  • Bast, H., Majumdar, D., Schenkel, R., Theobald, M., & Weikum, G. (2006). IO-Top-k: Index-access optimized top-k query processing. In Proceedings of the 32nd international conference on very large data bases, VLDB Endowment, VLDB ’06 (pp. 475–486).

  • Fagin, R. (1999). Combining fuzzy information from multiple systems. Journal of Computer and System Sciences, 58, 83–99.


  • Fagin, R. (2002). Combining fuzzy information: An overview. ACM SIGMOD Record, 31(2), 109–118.


  • Fagin, R., Lotem, A., & Naor, M. (2003). Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences, 66, 614–656.


  • Getoor, L., & Diehl, C. (2005). Link mining: A survey. ACM SIGKDD Explorations Newsletter, 7(2), 12.


  • Güntzer, U., Balke, W., & Kießling, W. (2001). Towards efficient multi-feature queries in heterogeneous environments. In Proceedings of the IEEE international conference on information technology: Coding and computing (pp. 622–628).

  • Gurský, P., & Vojtáš, P. (2008). Speeding up the NRA algorithm. In Proceedings of the 2nd international conference on scalable uncertainty management, SUM ’08 (pp. 243–255).

  • Hwang, S., & Chang, K. (2007). Optimizing top-k queries for middleware access: A unified cost-based approach. ACM Transactions on Database Systems (TODS), 32(1), 5.


  • Long, X., & Suel, T. (2005). Three-level caching for efficient query processing in large web search engines. In Proceedings of the 14th international conference on world wide web, WWW ’05 (pp. 257–266). New York, NY, USA.

  • Luo, Y., Lin, X., Wang, W., & Zhou, X. (2007). Spark: Top-k keyword query in relational databases. In Proceedings of the 2007 ACM SIGMOD international conference on management of data (pp. 115–126). ACM.

  • Mamoulis, N., Yiu, M., Cheng, K., & Cheung, D. (2007). Efficient top-k aggregation of ranked inputs. ACM Transactions on Database Systems (TODS), 32(3), 19.


  • Nepal, S., & Ramakrishna, M. (1999). Query processing issues in image (multimedia) databases. In Proceedings 15th international conference on data engineering (pp. 22–29).

  • Salton, G. (1989). Automatic text processing: The transformation, analysis, and retrieval of information by computer. Addison-Wesley.

  • Shmueli-Scheuer, M., Li, C., Mass, Y., Roitman, H., Schenkel, R., & Weikum, G. (2009). Best-effort top-k query processing under budgetary constraints. In IEEE international conference on data engineering (pp. 928–939). IEEE.

  • Theobald, M., Weikum, G., & Schenkel, R. (2004). Top-k query evaluation with probabilistic guarantees. In Proceedings of the thirtieth international conference on very large data bases-volume 30, VLDB endowment (p. 659).

  • Wimmers, E., Haas, L., Roth, M., & Braendli, C. (1999). Using Fagin’s algorithm for merging ranked results in multimedia middleware. In Fourth IFCIS international conference on cooperative information systems (pp. 267–278).

  • Xin, D., Han, J., & Chang, K. (2007). Progressive and selective merge: Computing top-k with ad-hoc ranking functions. In Proceedings of the 2007 ACM SIGMOD international conference on management of data (pp. 103–114). ACM.

  • Yuan, J., Sun, G. Z., Tian, Y., Chen, G., & Liu, Z. (2009). Selective-NRA algorithms for top-k queries. In Proceedings of the joint international conferences on advances in data and web management, APWeb/WAIM ’09 (pp. 15–26). Berlin, Heidelberg: Springer-Verlag.


  • Zhu, M., Shi, S., Li, M., & Wen, J. R. (2007). Effective top-k computation in retrieving structured documents with term-proximity support. In Proceedings of the sixteenth ACM conference on conference on information and knowledge management, CIKM ’07 (pp. 771–780).


Acknowledgements

This work is supported by the National Natural Science Foundation of China under grants No. 61033009 and No. 60873210, and by the Anhui Natural Science Foundation under grant No. 1208085QF106.

Author information


Corresponding author

Correspondence to Jing Yuan.


About this article

Cite this article

Yuan, J., Sun, G., Luo, T. et al. Efficient processing of top-k queries: selective NRA algorithms. J Intell Inf Syst 39, 687–710 (2012). https://doi.org/10.1007/s10844-012-0208-5


  • DOI: https://doi.org/10.1007/s10844-012-0208-5
