Ranking the big sky: efficient top-k skyline computation on massive data

Han, Xixian; Wang, Bailing; Li, Jianzhong; Gao, Hong

doi:10.1007/s10115-018-1256-0

Regular Paper
Published: 01 September 2018

Knowledge and Information Systems
Volume 60, pages 415–446 (2019)

Xixian Han ORCID: orcid.org/0000-0001-5477-9249¹,
Bailing Wang¹,
Jianzhong Li¹ &
Hong Gao¹

Abstract

In many applications, the top-k skyline query is an important operation that returns the k skyline tuples with the highest domination scores from a potentially huge data space. Our analysis shows that the existing algorithms cannot process top-k skyline queries on massive data efficiently. In this paper, we propose RSTS, a novel table-scan-based algorithm that computes top-k skyline results on massive data efficiently. RSTS first builds a presorted table whose tuples are arranged in the order of round-robin retrieval on the sorted column lists. RSTS consists of two phases. In phase 1, the candidate tuples are acquired by a sequential scan of the presorted table. In phase 2, RSTS calculates the domination scores of the candidates and returns the query results by another sequential scan. We prove that RSTS has the early-termination property and give a theoretical analysis of the scan depths. We also devise a pruning rule for candidate tuples; the theoretical pruning effect shows that the majority of the skyline results can be discarded directly. Extensive experiments on synthetic and real-life data sets show that RSTS significantly outperforms the existing algorithms.
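To make the query definition concrete, the following is a minimal brute-force sketch in Python. It is not RSTS and does not reproduce the paper's two-phase scan or pruning rule. It assumes that smaller attribute values are preferred and that the domination score of a tuple is the number of tuples it dominates; the abstract does not fix either convention. The function names (dominates, top_k_skyline, presorted_order) are hypothetical, and presorted_order is only one plausible reading of "round-robin retrieval on sorted column lists".

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere.

    Assumption: smaller values are better on every attribute.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def top_k_skyline(data: Sequence[Point], k: int) -> List[Tuple[int, Point]]:
    """Brute-force baseline: O(n^2) comparisons, no early termination."""
    # Skyline tuples are those not dominated by any other tuple.
    skyline = [p for p in data if not any(dominates(q, p) for q in data)]
    # Assumed domination score: how many tuples p dominates.
    scored = [(sum(dominates(p, q) for q in data), p) for p in skyline]
    # Stable sort by descending score; ties keep skyline order.
    scored.sort(key=lambda sp: -sp[0])
    return scored[:k]


def presorted_order(data: Sequence[Point]) -> List[int]:
    """Tuple indices in round-robin order over per-attribute sorted lists.

    Assumed reading: sort tuple indices by each attribute, then visit the
    lists position by position, emitting a tuple the first time it appears.
    """
    m = len(data[0])
    columns = [sorted(range(len(data)), key=lambda i: data[i][j]) for j in range(m)]
    seen, order = set(), []
    for depth in range(len(data)):
        for col in columns:
            i = col[depth]
            if i not in seen:
                seen.add(i)
                order.append(i)
    return order


if __name__ == "__main__":
    data = [(1, 9), (2, 4), (7, 2), (9, 1), (5, 5), (6, 8), (8, 6)]
    print(top_k_skyline(data, 2))  # [(3, (2, 4)), (1, (7, 2))]
    print(presorted_order(data))   # [0, 3, 1, 2, 4, 5, 6]
```

In this toy data set the skyline is (1, 9), (2, 4), (7, 2) and (9, 1), and the two returned tuples are the skyline tuples that dominate the most other tuples. In the round-robin order, tuples that are good on at least one attribute appear early, which is the kind of layout that lets a sequential scan stop before reading the whole table.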

Notes

In this paper, the tuples are considered in general position [29].
We do not consider the domination relationship between cad and itself.

References

Asudeh A, Thirumuruganathan S, Zhang N, Das G (2016) Discovering the skyline of web databases. Proc VLDB Endow 9(7):600–611
Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data engineering, pp 421–430
Chan C, Jagadish H, Tan K, Tung K, Zhang Z (2006) Finding k-dominant skylines in high dimensional space. In: Proceedings of the 2006 ACM SIGMOD international conference on management of data, SIGMOD ’06, pp 503–514
Chan C, Jagadish H, Tan K, Tung K, Zhang Z (2006) On high dimensional skylines. In: Proceedings of the 10th international conference on advances in database technology, EDBT’06, pp 478–495
Chen Y, Lee C (2015) Neural skyline filter for accelerating skyline search algorithms. Expert Syst 32(1):108–131
Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: Proceedings of the 19th international conference on data engineering, 5–8 March 2003, Bangalore, India, pp 717–719
Fagin R, Lotem A, Naor M (2001) Optimal aggregation algorithms for middleware. In: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS ’01, pp 102–113
Fernandez R, Pietzuch P, Kreps J et al (2015) Liquid: unifying nearline and offline big data integration. In: CIDR 2015, Seventh biennial conference on innovative data systems research, online proceedings
Gao Y, Liu Q, Chen L (2015) Efficient algorithms for finding the most desirable skyline objects. Knowl-Based Syst 89:250–264
Godfrey P (2004) Skyline cardinality for relational processing. In: Seipel D, Turull-Torres JM (eds) Foundations of information and knowledge systems, vol 2942. Springer, Berlin, pp 78–97
Godfrey P, Shipley R, Gryz J (2007) Algorithms and analyses for maximal vector computation. VLDB J 16(1):5–28
Han X, Li J, Gao H (2015) Efficient top-k retrieval on massive data. IEEE Trans Knowl Data Eng 27(10):2687–2699
Han X, Li J, Gao H (2015) Tdep: efficiently processing top-k dominating query on massive data. Knowl Inf Syst 43(3):689–718
Han X, Li J, Yang D, Wang J (2013) Efficient skyline computation on big data. IEEE Trans Knowl Data Eng 25(11):2521–2535
Hose K, Vlachou A (2012) A survey of skyline processing in highly distributed environments. VLDB J 21(3):359–384
Huang J, Jiang B, Pei J (2013) Skyline distance: a measure of multidimensional competence. Knowl Inf Syst 34(2):373–396
Koltun V, Papadimitriou C (2007) Approximately dominating representatives. Theor Comput Sci 371(3):148–154
Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of VLDB’02, pp 275–286
Lee J, Hwang S (2014) Scalable skyline computation using a balanced pivot selection technique. Inf Syst 39:1–21
Lee J, You G, Hwang S (2009) Personalized top-k skyline queries in high-dimensional space. Inf Syst 34(1):45–61
Lian X, Chen L (2013) Probabilistic top-k dominating queries in uncertain databases. Inf Sci 226:23–46
Lin X, Yuan Y, Zhang Q, Zhang Y (2007) Selecting stars: the k most representative skyline operator. In: Proceedings of the 23rd international conference on data engineering, ICDE 2007, pp 86–95
Liu J, Yang J, Xiong L, Pei J (2017) Secure skyline queries on cloud platform. In: 33rd IEEE international conference on data engineering, ICDE 2017, pp 633–644
Magnani M, Assent I, Mortensen M (2014) Taking the big picture: representative skylines based on significance and diversity. VLDB J 23(5):795–815
Mullesgaard K, Pedersen J, Lu H, Zhou Y (2014) Efficient skyline computation in mapreduce. In: Proceedings of the 17th international conference on extending database technology, EDBT 2014, pp 37–48
Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst 30(1):41–82
Park Y, Min J, Shim K (2013) Parallel computation of skyline and reverse skyline queries using mapreduce. Proc VLDB Endow 6(14):2002–2013
Sarma A, Lall A, Nanongkai D et al (2011) Representative skylines using threshold-based preference distributions. In: Proceedings of the 2011 IEEE 27th international conference on data engineering, ICDE ’11, pp 387–398
Sheng C, Tao Y (2011) On finding skylines in external memory. In: Proceedings of PODS’11, pp 107–116
Sun S, Huang Z, Zhong H (2010) Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25(3):575–606
Tan K, Eng P, Ooi B (2001) Efficient progressive skyline computation. In: Proceedings of VLDB’01, pp 301–310
Tao Y, Ding L, Lin X, Pei J (2009) Distance-based representative skyline. In: Proceedings of the 2009 IEEE international conference on data engineering, ICDE ’09, pp 892–903
Tao Y, Xiao X, Pei J (2007) Efficient skyline and top-k retrieval in subspaces. IEEE Trans Knowl Data Eng 19(8):1072–1088
Tiakas E, Papadopoulos A, Manolopoulos Y (2011) Progressive processing of subspace dominating queries. VLDB J 20(6):921–948
Vlachou A, Vazirgiannis M (2010) Ranking the sky: discovering the importance of skyline points through subspace dominance relationships. Data Knowl Eng 69(9):943–964
Xia T, Zhang D, Tao Y (2008) On skylining with flexible dominance relation. In: Proceedings of the 24th international conference on data engineering, ICDE 2008, pp 1397–1399
Yiu M, Mamoulis N (2009) Multi-dimensional top-k dominating queries. VLDB J 18(3):695–718

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61402130, 61632010, and 61502121; the National Key Research and Development Program under Grant No. 2016YFB1000703; and the Weihai-HIT co-construction program under Grant No. ZMZ001702.

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Xixian Han, Bailing Wang, Jianzhong Li & Hong Gao

Corresponding author

Correspondence to Xixian Han.

About this article

Cite this article

Han, X., Wang, B., Li, J. et al. Ranking the big sky: efficient top-k skyline computation on massive data. Knowl Inf Syst 60, 415–446 (2019). https://doi.org/10.1007/s10115-018-1256-0

Received: 26 June 2017
Revised: 16 June 2018
Accepted: 14 July 2018
Published: 01 September 2018
Issue Date: 01 July 2019
DOI: https://doi.org/10.1007/s10115-018-1256-0
