TDEP: efficiently processing top-k dominating query on massive data

Han, Xixian; Li, Jianzhong; Gao, Hong

doi:10.1007/s10115-013-0728-5

TDEP: efficiently processing top-k dominating query on massive data

Regular Paper
Published: 10 January 2014

Volume 43, pages 689–718, (2015)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Xixian Han¹,
Jianzhong Li¹ &
Hong Gao¹

556 Accesses
18 Citations
Explore all metrics

Abstract

In many applications including multicriteria decision making, top-k dominating query is a practically useful tool to return k tuples with the highest domination scores in a potentially huge data space. The existing algorithms, either requiring indexes built on the specific attribute subset, or incurring high I/O cost or memory cost, cannot process top-k dominating query on massive data efficiently. In this paper, a novel algorithm TDEP is proposed to utilize sorted lists built for each attribute with low cost to return top-k dominating result on massive data efficiently. Through analysis, it is found that TDEP can be divided into two phases: growing phase and shrinking phase. In each phase, TDEP retrieves the sorted lists in round-robin fashion and maintains the candidates until the stop condition is satisfied. The theoretical analysis is provided for the execution behavior in two phases. An efficient method is developed to compute the domination scores of tuples with the obtained candidates only. Besides, TDEP adopts early pruning to reduce the number of candidate tuples maintained significantly. The extensive experimental results, conducted on synthetic and real-life data sets, show the significant performance advantage of TDEP over the existing algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Efficient processing of top-k dominating queries in distributed environments

Article 29 April 2015

Efficient Processing of Top-K Dominating Queries on Incomplete Data Using MapReduce

Top-k dominating queries on incomplete large dataset

Article 17 August 2021

Notes

In this paper, we consider that the tuples are in general position [24].
In the following section, we will see that the practical probability is even much higher.
http://basketballreference.com/.

References

Akbarinia R, Pacitti E, Valduriez P (2007) Best position algorithms for top-k queries. In: Proceedings of the 33rd international conference on very large data bases, pp 495–506
Bloom B (1970) Space/time trade-offs in hash coding with allowable errors. Commun ACM 13(7):422–426
Article MATH Google Scholar
Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data engineering, pp 421–430
Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: Proceedings of the 19th international conference on data engineering, pp 717–719
Fagin R, Lotem A, Naor M (2001) Optimal aggregation algorithms for middleware. In: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 102–113
Fagin R, Lotem A, Naor M (2003) Optimal aggregation algorithms for middleware. J Comput Syst Sci 66(4):614–656
Article MATH MathSciNet Google Scholar
Fan H, Zaïane O, Foss A, Wu J (2009) Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data. Knowl Inf Syst 19(1):31–51
Article Google Scholar
Feller W (1968) An introduction to probability theory, vol 1, 3rd edn. Wiley, New York
MATH Google Scholar
Godfrey P, Shipley R, Gryz J (2007) Algorithms and analyses for maximal vector computation. VLDB J 16(1):5–28
Article Google Scholar
Güntzer U, Balke W, Kießling W (2000) Optimizing multi-feature queries for image databases. In: Proceedings of the 26th international conference on very large data bases, pp 419–428
Güntzer U, Balke W, Kießling W (2001) Towards efficient multi-feature queries in heterogeneous environments. In: Proceedings of the international conference on information technology: coding and computing, pp 622–628
Han X, Li J, Yang D (2011) Supporting early pruning in top-k query processing on massive data. Inf Process Lett 111(11):524–532
Article MATH MathSciNet Google Scholar
Han X, Li J, Wang J, Yang D (2013) Efficient skyline computation on big data. IEEE Trans Knowl Data Eng 25(11):2521–2535
Article Google Scholar
Hose K, Vlachou A (2012) A survey of skyline processing in highly distributed environments. VLDB J 21(3):359–384
Article Google Scholar
Huang Z, Sun S, Wang W (2010) Efficient mining of skyline objects in subspaces over data streams. Knowl Inf Syst 22(2):159–183
Article Google Scholar
Ilyas I, Beskales G, Soliman M (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surv 40(4):11
Article Google Scholar
Kontaki M, Papadopoulos A, Manolopoulos Y (2012) Continuous top-k dominating queries. IEEE Trans Knowl Data Eng 24(5):840–853
Article Google Scholar
Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th international conference on very large data bases, pp 275–286
Lian X, Chen L (2009) Top-k dominating queries in uncertain databases. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, pp 660–671
Mamoulis N, Yiu M, Cheng K, Cheung D (2007) Efficient top-k aggregation of ranked inputs. ACM Trans Database Syst 32(3):19
Article Google Scholar
Pang H, Ding X, Zheng B (2010) Efficient processing of exact top-k queries over disk-resident sorted lists. VLDB J 19(3):437–456
Article Google Scholar
Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst 30(1):41–82
Article Google Scholar
Salam A, Khayal M (2012) Mining top-k frequent patterns without minimum support threshold. Knowl Inf Syst 30(1):57–86
Article Google Scholar
Sheng C, Tao Y (2011) On finding skylines in external memory. In: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 107–116
Skoutas D, Sacharidis D, Simitsis A, Kantere V, Sellis T (2009) Top-k dominant web services under multi-criteria matching. In: Proceedings of the 12th international conference on extending database technology: advances in database technology, pp 898–909
Sun S, Huang Z, Zhong H, Dai D, Liu H, Li J (2010) Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25(3):575–606
Article Google Scholar
Tan K, Eng P, Ooi B (2001) Efficient progressive skyline computation. In: Proceedings of the 27th international conference on very large data bases, pp 301–310
Tiakas E, Papadopoulos A, Manolopoulos Y (2011) Progressive processing of subspace dominating queries. VLDB J 20(6):921–948
Article Google Scholar
Yang B, Huang H (2010) Topsil-miner: an efficient algorithm for mining top-k significant itemsets over data streams. Knowl Inf Syst 23(2):225–242
Article Google Scholar
Yiu M, Mamoulis N (2007) Efficient processing of top-k dominating queries on multi-dimensional data. In: Proceedings of the 33rd international conference on very large data, bases, pp 483–494
Yiu M, Mamoulis N (2009) Multi-dimensional top-k dominating queries. VLDB J 18(3):695–718
Article Google Scholar
Zhang W, Lin X, Zhang Y, Pei J, Wang W (2010) Threshold-based probabilistic top-k dominating queries. VLDB J 19(2):283–305
Article Google Scholar
Zipf G (1949) Human behaviour and the principle of least-effort. Addison-Wesley, Cambridge
Google Scholar

Download references

Acknowledgments

We thank anonymous reviewers for their very useful comments and suggestions. This work was supported in part by the National Basic Research (973) Program of China under Grant No. 2012CB316200, the National Natural Science Foundation of China under Grant Nos. 61190115, 61173022, 61033015, 60831160525, 61272046, and 60903016, Shandong Provincial Natural Science Foundation under Grant No. ZR2013FQ028, Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under Grant Nos. HIT.NSRIF.2014136 and HIT(WH)201308, and National Science & Technology Pillar Program under Grant Nos. 2012BAA13B01, 2012BAH10F03, 2013BAH17F00.

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Xixian Han, Jianzhong Li & Hong Gao

Authors

Xixian Han
View author publications
You can also search for this author in PubMed Google Scholar
Jianzhong Li
View author publications
You can also search for this author in PubMed Google Scholar
Hong Gao
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xixian Han.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Han, X., Li, J. & Gao, H. TDEP: efficiently processing top-k dominating query on massive data. Knowl Inf Syst 43, 689–718 (2015). https://doi.org/10.1007/s10115-013-0728-5

Download citation

Received: 27 May 2013
Revised: 22 September 2013
Accepted: 27 December 2013
Published: 10 January 2014
Issue Date: June 2015
DOI: https://doi.org/10.1007/s10115-013-0728-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

TDEP: efficiently processing top-k dominating query on massive data

Abstract

Access this article

Similar content being viewed by others

Efficient processing of top-k dominating queries in distributed environments

Efficient Processing of Top-K Dominating Queries on Incomplete Data Using MapReduce

Top-k dominating queries on incomplete large dataset

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

TDEP: efficiently processing top-k dominating query on massive data

Abstract

Access this article

Similar content being viewed by others

Efficient processing of top-k dominating queries in distributed environments

Efficient Processing of Top-K Dominating Queries on Incomplete Data Using MapReduce

Top-k dominating queries on incomplete large dataset

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation