Abstract
Efficient processing of top-k queries has drawn increasing attention from both industry and academia because of its varied applications. Low access cost is a crucial concern in top-k query processing. Typically, answering a top-k query involves two types of access: sorted access and random access. In some scenarios, the data source does not support random access. Fagin et al. proposed the No Random Access (NRA) algorithm (Fagin et al., J Comput Syst Sci 66:614–656, 2003) for this situation. Our work is motivated by a key observation about the NRA algorithm: the number of accesses can be further reduced by performing sorted accesses to the different lists of the dataset selectively rather than in parallel. Based on this insight, we propose the Selective NRA (SNRA) algorithm, which aims to cut unnecessary access cost. We then optimize SNRA in terms of runtime cost and present the SNRA-opt algorithm. Furthermore, we address instance optimality theoretically and turn SNRA and SNRA-opt into instance-optimal algorithms, termed Hybrid-SNRA (HSNRA) and HSNRA-opt, respectively. Extensive experimental results show that our algorithms perform significantly fewer sorted accesses than NRA and its state-of-the-art variants. In terms of runtime cost, the proposed SNRA-opt and HSNRA-opt algorithms are two orders of magnitude faster than the NRA algorithm. In addition, we discuss the parameter selection problem of the SNRA algorithms, both theoretically and experimentally.
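For orientation, the baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of NRA in the spirit of Fagin et al. (2003), not the authors' code: it assumes sum aggregation, non-negative scores, in-memory lists, and that every object appears in every list. The function name `nra_top_k` and the data layout are hypothetical. It reads all lists in lockstep and counts sorted accesses, the quantity the selective variants aim to reduce; the rule SNRA uses to pick a list is not stated in the abstract and is therefore not shown.

```python
from typing import Dict, List, Tuple


def nra_top_k(lists: List[List[Tuple[str, float]]], k: int) -> Tuple[List[str], int]:
    """Baseline NRA sketch: returns (top-k object ids, number of sorted accesses).

    Assumptions (illustrative only): scores are non-negative, the aggregate
    is a plain sum, and each list is sorted by descending score.
    """
    m = len(lists)
    depth = max((len(lst) for lst in lists), default=0)
    seen: Dict[str, Dict[int, float]] = {}  # object -> {list index: score}
    last = [float("inf")] * m               # last score read from each list
    accesses = 0
    ranked: List[str] = []

    for pos in range(depth):
        # One round of parallel sorted access: read position `pos` of every list.
        for i, lst in enumerate(lists):
            if pos < len(lst):
                obj, score = lst[pos]
                seen.setdefault(obj, {})[i] = score
                last[i] = score
                accesses += 1
            else:
                last[i] = 0.0  # exhausted list contributes nothing further

        # Worst score: unknown fields count as 0.
        worst = {o: sum(s.values()) for o, s in seen.items()}
        # Best score: unknown fields are bounded by the last score seen in that list.
        best = {
            o: worst[o] + sum(last[i] for i in range(m) if i not in s)
            for o, s in seen.items()
        }
        ranked = sorted(seen, key=lambda o: worst[o], reverse=True)
        if len(ranked) < k:
            continue

        top, rest = ranked[:k], ranked[k:]
        kth = worst[top[-1]]
        unseen_best = sum(last)  # upper bound for any object not yet seen
        # Halt when no object outside the current top-k can still beat it.
        if unseen_best <= kth and all(best[o] <= kth for o in rest):
            return top, accesses

    return ranked[:k], accesses
```

With two lists `[("a", 9.0), ("b", 8.0), ("c", 3.0), ("d", 1.0)]` and `[("b", 9.0), ("a", 7.0), ("d", 5.0), ("c", 2.0)]`, calling `nra_top_k(lists, 1)` halts after two rounds, that is, four of the eight possible sorted accesses. A selective variant would replace the inner lockstep loop with a policy that chooses which list to read next.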
Notes
c = 1, p < ∞ → HSNRA; c = 1, p = ∞ → SNRA; c > 1, p < ∞ → HSNRA-opt; c > 1, p = ∞ → SNRA-opt.
References
Akbarinia, R., Pacitti, E., & Valduriez, P. (2007). Best position algorithms for top-k queries. In Proceedings of the 33rd international conference on very large data bases, VLDB ’07 (pp. 495–506).
Balke, W., Güntzer, U., & Kießling, W. (2010). On real-time top k querying for mobile services. In On the Move to Meaningful Internet Systems 2002: CoopIS, DOA, and ODBASE (pp. 125–143).
Bast, H., Majumdar, D., Schenkel, R., Theobald, M., & Weikum, G. (2006). IO-Top-k: Index-access optimized top-k query processing. In Proceedings of the 32nd international conference on very large data bases, VLDB Endowment, VLDB ’06 (pp. 475–486).
Fagin, R. (1999). Combining fuzzy information from multiple systems. Journal of Computer and System Sciences, 58, 83–99.
Fagin, R. (2002). Combining fuzzy information: An overview. ACM SIGMOD Record, 31(2), 109–118.
Fagin, R., Lotem, A., & Naor, M. (2003). Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences, 66, 614–656.
Getoor, L., & Diehl, C. (2005). Link mining: A survey. ACM SIGKDD Explorations Newsletter, 7(2), 12.
Güntzer, U., Balke, W., & Kießling, W. (2001). Towards efficient multi-feature queries in heterogeneous environments. In Proceedings of the IEEE international conference on information technology: Coding and computing (pp. 622–628).
Gurský, P., & Vojtáš, P. (2008). Speeding up the NRA algorithm. In Proceedings of the 2nd international conference on scalable uncertainty management, SUM ’08 (pp. 243–255).
Hwang, S., & Chang, K. (2007). Optimizing top-k queries for middleware access: A unified cost-based approach. ACM Transactions on Database Systems (TODS), 32(1), 5.
Long, X., & Suel, T. (2005). Three-level caching for efficient query processing in large web search engines. In Proceedings of the 14th international conference on world wide web, WWW ’05 (pp. 257–266). New York, NY, USA.
Luo, Y., Lin, X., Wang, W., & Zhou, X. (2007). Spark: Top-k keyword query in relational databases. In Proceedings of the 2007 ACM SIGMOD international conference on management of data (pp. 115–126). ACM.
Mamoulis, N., Yiu, M., Cheng, K., & Cheung, D. (2007). Efficient top-k aggregation of ranked inputs. ACM Transactions on Database Systems (TODS), 32(3), 19.
Nepal, S., & Ramakrishna, M. (1999). Query processing issues in image (multimedia) databases. In Proceedings 15th international conference on data engineering (pp. 22–29).
Salton, G. (1989). Automatic text processing: The transformation, analysis, and retrieval of information by computer. Addison-Wesley.
Shmueli-Scheuer, M., Li, C., Mass, Y., Roitman, H., Schenkel, R., & Weikum, G. (2009). Best-effort top-k query processing under budgetary constraints. In IEEE international conference on data engineering (pp. 928–939). IEEE.
Theobald, M., Weikum, G., & Schenkel, R. (2004). Top-k query evaluation with probabilistic guarantees. In Proceedings of the thirtieth international conference on very large data bases-volume 30, VLDB endowment (p. 659).
Wimmers, E., Haas, L., Roth, M., & Braendli, C. (1999). Using Fagin’s algorithm for merging ranked results in multimedia middleware. In Fourth IFCIS international conference on cooperative information systems (pp. 267–278).
Xin, D., Han, J., & Chang, K. (2007). Progressive and selective merge: Computing top-k with ad-hoc ranking functions. In Proceedings of the 2007 ACM SIGMOD international conference on management of data (pp. 103–114). ACM.
Yuan, J., Sun, G. Z., Tian, Y., Chen, G., & Liu, Z. (2009). Selective-NRA algorithms for top-k queries. In Proceedings of the joint international conferences on advances in data and web management, APWeb/WAIM ’09 (pp. 15–26). Berlin, Heidelberg: Springer-Verlag.
Zhu, M., Shi, S., Li, M., & Wen, J. R. (2007). Effective top-k computation in retrieving structured documents with term-proximity support. In Proceedings of the sixteenth ACM conference on information and knowledge management, CIKM ’07 (pp. 771–780).
Acknowledgements
This work is supported by the National Natural Science Foundation of China under grants No. 61033009 and No. 60873210, and by the Anhui Natural Science Foundation under grant No. 1208085QF106.
Cite this article
Yuan, J., Sun, G., Luo, T. et al. Efficient processing of top-k queries: selective NRA algorithms. J Intell Inf Syst 39, 687–710 (2012). https://doi.org/10.1007/s10844-012-0208-5
DOI: https://doi.org/10.1007/s10844-012-0208-5