TKAP: Efficiently processing top-k query on massive data by adaptive pruning

Han, Xixian; Liu, Xianmin; Li, Jianzhong; Gao, Hong

doi:10.1007/s10115-015-0836-5


Regular Paper
Published: 01 May 2015

Volume 47, pages 301–328, (2016)

Knowledge and Information Systems

Xixian Han¹, Xianmin Liu¹, Jianzhong Li¹ & Hong Gao¹


Abstract

In many applications, the top-k query is an important operation that returns a set of interesting points from a potentially huge data space. Existing algorithms either maintain too many candidates, require auxiliary structures built on a specific attribute subset, or return results with only a probabilistic guarantee, so they cannot process top-k queries on massive data efficiently. This paper proposes TKAP, a sorted-list-based algorithm that uses data structures with low space overhead to compute top-k results on massive data efficiently. During round-robin retrieval on the sorted lists, TKAP performs an adaptive pruning operation and maintains the required candidates until the stop condition is satisfied. The pruning operation is adjusted using information obtained during round-robin retrieval to achieve a better pruning effect. The paper develops the adaptive pruning rule and gives its theoretical analysis. Extensive experiments on synthetic and real-life data sets show a significant advantage of TKAP over the existing algorithms.
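To make the terms "round-robin retrieval on sorted lists" and "stop condition" concrete, the following is a minimal Python sketch of the classic Threshold Algorithm of Fagin, Lotem and Naor, a well-known baseline of the sorted-list family. It is not TKAP: the adaptive pruning rule is not described in the abstract and is not reproduced here. The function name `threshold_topk`, the sum scoring function, and the in-memory `rows` dictionary are assumptions chosen for illustration only.

```python
import heapq


def threshold_topk(rows, k):
    """Illustrative Threshold Algorithm (not TKAP).

    rows: dict mapping a tuple id to a tuple of m attribute values.
    The score of a tuple is assumed to be the sum of its attributes,
    and larger scores are better.
    Returns (top-k list of (score, id), sorted-access depth reached).
    """
    if not rows or k <= 0:
        return [], 0
    m = len(next(iter(rows.values())))
    # One sorted list per attribute, in descending order of that attribute.
    lists = [
        sorted(rows, key=lambda rid, j=j: rows[rid][j], reverse=True)
        for j in range(m)
    ]
    seen = set()
    heap = []  # min-heap holding the best k (score, id) pairs seen so far
    for depth in range(len(rows)):
        last = []  # attribute values read at this depth, one per list
        for j in range(m):  # round-robin: one sorted access per list
            rid = lists[j][depth]
            last.append(rows[rid][j])
            if rid not in seen:
                seen.add(rid)
                score = sum(rows[rid])  # "random access" to the full tuple
                if len(heap) < k:
                    heapq.heappush(heap, (score, rid))
                elif score > heap[0][0]:
                    heapq.heapreplace(heap, (score, rid))
        # No unseen tuple can score higher than this threshold.
        threshold = sum(last)
        if len(heap) == k and heap[0][0] >= threshold:
            return sorted(heap, reverse=True), depth + 1
    return sorted(heap, reverse=True), len(rows)


# Hypothetical toy data: five tuples with two attributes each.
data = {"a": (9, 1), "b": (8, 8), "c": (1, 9), "d": (2, 2), "e": (3, 3)}
top, depth = threshold_topk(data, k=1)
print(top, depth)  # [(16, 'b')] 2
```

On the toy data the scan stops after reading two positions of each list, because the best score found (16) already meets the threshold formed by the last values read (8 + 8). A pruning-based method such as the one the abstract describes would aim to reduce the candidates kept during this scan.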

Notes

For attributes in T, we only consider \(A_1, \ldots , A_m\) in Sect. 4.

Acknowledgments

This work was supported in part by the National Basic Research (973) Program of China under Grant No. 2012CB316200, the National Natural Science Foundation of China under Grant Nos. 61402130, 61272046, 61190115, 61173022, 61033015, Shandong Provincial Natural Science Foundation under Grant No. ZR2013FQ028, Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under Grant Nos. HIT.NSRIF.2014136 and HIT(WH)201308.

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Xixian Han, Xianmin Liu, Jianzhong Li & Hong Gao

Corresponding author

Correspondence to Xixian Han.

About this article

Cite this article

Han, X., Liu, X., Li, J. et al. TKAP: Efficiently processing top-k query on massive data by adaptive pruning. Knowl Inf Syst 47, 301–328 (2016). https://doi.org/10.1007/s10115-015-0836-5

Received: 19 August 2014
Revised: 20 February 2015
Accepted: 10 April 2015
Published: 01 May 2015
Issue Date: May 2016
DOI: https://doi.org/10.1007/s10115-015-0836-5
