TKAP: Efficiently processing top-k query on massive data by adaptive pruning

  • Regular Paper
  • Knowledge and Information Systems

Abstract

In many applications, the top-k query is an important operation that returns a set of interesting points from a potentially huge data space. Existing algorithms either maintain too many candidates, require auxiliary structures built on a specific attribute subset, or return results with only a probabilistic guarantee, so they cannot process top-k queries on massive data efficiently. This paper proposes TKAP, a sorted-list-based algorithm that uses data structures with low space overhead to compute top-k results on massive data efficiently. During round-robin retrieval on the sorted lists, TKAP performs an adaptive pruning operation and maintains only the required candidates until the stop condition is satisfied. The pruning operation is adjusted using the information obtained during round-robin retrieval to achieve a better pruning effect. The paper develops the adaptive pruning rule and presents its theoretical analysis. Extensive experiments on synthetic and real-life data sets show that TKAP has a significant advantage over the existing algorithms.
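
To make the setting concrete, the following is a minimal sketch of round-robin retrieval over sorted lists with a stop condition. It is not TKAP: the abstract does not specify the adaptive pruning rule, so the sketch uses the classic threshold-style stop condition from the sorted-list literature instead. The function name top_k_threshold, the in-memory rows dictionary, the sum scoring function, and the "higher is better" convention are all assumptions made for illustration.

```python
import heapq


def top_k_threshold(rows, k):
    """Return the k highest-scoring (score, id) pairs, best first.

    rows: dict mapping an id to a tuple of m numeric attribute values.
    Assumed scoring function: the sum of the attributes (monotone).
    """
    if not rows or k <= 0:
        return []
    m = len(next(iter(rows.values())))
    # One list per attribute, sorted by descending attribute value.
    sorted_lists = [
        sorted(((vals[j], rid) for rid, vals in rows.items()), reverse=True)
        for j in range(m)
    ]
    heap = []      # min-heap holding the best k (score, id) pairs seen so far
    seen = set()   # ids whose full score has already been computed
    for depth in range(len(rows)):
        last_values = []
        for j in range(m):  # round-robin: read one entry from each list
            value, rid = sorted_lists[j][depth]
            last_values.append(value)
            if rid in seen:
                continue
            seen.add(rid)
            score = sum(rows[rid])  # look up the full tuple for this id
            if len(heap) < k:
                heapq.heappush(heap, (score, rid))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, rid))
        # No unseen tuple can score above the sum of the last values read.
        threshold = sum(last_values)
        if len(heap) == k and heap[0][0] >= threshold:
            break  # stop condition: the current top-k cannot be beaten
    return sorted(heap, reverse=True)
```

In this sketch every newly seen id is scored immediately by looking up its full tuple, which is cheap in memory but costly on disk-resident data. The abstract indicates that TKAP instead keeps a pruned candidate set during retrieval, which this illustration does not attempt to reproduce.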

Notes

  1. For attributes in T, we only consider \(A_1, \ldots , A_m\) in Sect. 4.

Acknowledgments

This work was supported in part by the National Basic Research (973) Program of China under Grant No. 2012CB316200, the National Natural Science Foundation of China under Grant Nos. 61402130, 61272046, 61190115, 61173022, and 61033015, the Shandong Provincial Natural Science Foundation under Grant No. ZR2013FQ028, and the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under Grant Nos. HIT.NSRIF.2014136 and HIT(WH)201308.

Author information

Corresponding author

Correspondence to Xixian Han.

About this article

Cite this article

Han, X., Liu, X., Li, J. et al. TKAP: Efficiently processing top-k query on massive data by adaptive pruning. Knowl Inf Syst 47, 301–328 (2016). https://doi.org/10.1007/s10115-015-0836-5

  • DOI: https://doi.org/10.1007/s10115-015-0836-5
