Abstract
In many applications, the top-k skyline query is an important operation that returns the k skyline tuples with the highest domination scores in a potentially huge data space. Our analysis shows that the existing algorithms cannot process top-k skyline queries on massive data efficiently. In this paper, we propose a novel table-scan-based algorithm, RSTS, to compute top-k skyline results on massive data efficiently. RSTS first builds a presorted table, whose tuples are arranged in the order of round-robin retrieval on the sorted column lists. RSTS consists of two phases. In phase 1, the candidate tuples are acquired by a sequential scan of the presorted table. In phase 2, RSTS calculates the domination scores of the candidates and returns the query results by another sequential scan. We prove that RSTS has the early-termination property and give a theoretical analysis of its scan depths. We also devise a pruning rule for candidate tuples; the theoretical pruning effect shows that the majority of the skyline results can be discarded directly. Extensive experiments on synthetic and real-life data sets show that RSTS significantly outperforms the existing algorithms.
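To make the terms in the abstract concrete, the following is a minimal sketch of the query semantics and of the round-robin ordering used for the presorted table. It is not the RSTS algorithm itself: `naive_top_k_skyline` is a quadratic baseline with no early termination or pruning, and all function names are hypothetical. It assumes that smaller attribute values are preferred and that a tuple's domination score is the number of other tuples it dominates.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every attribute and strictly
    better on at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def round_robin_order(rows: List[Row]) -> List[int]:
    """Row ids in the order they are first met when the per-attribute
    sorted column lists are read round-robin, one depth at a time."""
    dims = len(rows[0])
    sorted_lists = [
        sorted(range(len(rows)), key=lambda i, d=d: rows[i][d])
        for d in range(dims)
    ]
    seen, order = set(), []
    for depth in range(len(rows)):
        for column in sorted_lists:
            rid = column[depth]
            if rid not in seen:  # keep only the first occurrence of a row
                seen.add(rid)
                order.append(rid)
    return order


def naive_top_k_skyline(rows: List[Row], k: int) -> List[Tuple[Row, int]]:
    """Quadratic baseline: find the skyline, score each skyline tuple by
    how many tuples it dominates, and return the k best-scored ones."""
    skyline = [r for r in rows if not any(dominates(o, r) for o in rows)]
    # dominates(r, r) is always False, so a tuple never counts itself.
    scored = [(r, sum(dominates(r, o) for o in rows)) for r in skyline]
    scored.sort(key=lambda pair: -pair[1])
    return scored[:k]


rows = [(1, 9), (3, 3), (9, 1), (4, 4), (5, 8), (8, 8)]
print(round_robin_order(rows))
print(naive_top_k_skyline(rows, 1))
```

The baseline compares every pair of tuples, which is the kind of cost that a scan-based method with early termination and candidate pruning aims to avoid on massive data.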
Notes
In this paper, the tuples are assumed to be in general position [29].
We do not consider the domination relationship between cad and itself.
References
Asudeh A, Thirumuruganathan S, Zhang N, Das G (2016) Discovering the skyline of web databases. Proc VLDB Endow 9(7):600–611
Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data engineering, pp 421–430
Chan C, Jagadish H, Tan K, Tung K, Zhang Z (2006) Finding k-dominant skylines in high dimensional space. In: Proceedings of the 2006 ACM SIGMOD international conference on management of data, SIGMOD ’06, pp 503–514
Chan C, Jagadish H, Tan K, Tung K, Zhang Z (2006) On high dimensional skylines. In: Proceedings of the 10th international conference on advances in database technology, EDBT’06, pp 478–495
Chen Y, Lee C (2015) Neural skyline filter for accelerating skyline search algorithms. Expert Syst 32(1):108–131
Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: Proceedings of the 19th international conference on data engineering, 5–8 March 2003, Bangalore, India, pp 717–719
Fagin R, Lotem A, Naor M (2001) Optimal aggregation algorithms for middleware. In: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS ’01, pp 102–113
Fernandez R, Pietzuch P, Kreps J et al (2015) Liquid: unifying nearline and offline big data integration. In: CIDR 2015, Seventh biennial conference on innovative data systems research, online proceedings
Gao Y, Liu Q, Chen L (2015) Efficient algorithms for finding the most desirable skyline objects. Knowl-Based Syst 89:250–264
Godfrey P (2004) Skyline cardinality for relational processing. In: Seipel D, Turull-Torres JM (eds) Foundations of information and knowledge systems, vol 2942. Springer, Berlin, pp 78–97
Godfrey P, Shipley R, Gryz J (2007) Algorithms and analyses for maximal vector computation. VLDB J 16(1):5–28
Han X, Li J, Gao H (2015) Efficient top-k retrieval on massive data. IEEE Trans Knowl Data Eng 27(10):2687–2699
Han X, Li J, Gao H (2015) Tdep: efficiently processing top-k dominating query on massive data. Knowl Inf Syst 43(3):689–718
Han X, Li J, Yang D, Wang J (2013) Efficient skyline computation on big data. IEEE Trans Knowl Data Eng 25(11):2521–2535
Hose K, Vlachou A (2012) A survey of skyline processing in highly distributed environments. VLDB J 21(3):359–384
Huang J, Jiang B, Pei J (2013) Skyline distance: a measure of multidimensional competence. Knowl Inf Syst 34(2):373–396
Koltun V, Papadimitriou C (2007) Approximately dominating representatives. Theor Comput Sci 371(3):148–154
Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of VLDB’02, pp 275–286
Lee J, Hwang S (2014) Scalable skyline computation using a balanced pivot selection technique. Inf Syst 39:1–21
Lee J, You G, Hwang S (2009) Personalized top-k skyline queries in high-dimensional space. Inf Syst 34(1):45–61
Lian X, Chen L (2013) Probabilistic top-k dominating queries in uncertain databases. Inf Sci 226:23–46
Lin X, Yuan Y, Zhang Q, Zhang Y (2007) Selecting stars: the k most representative skyline operator. In: Proceedings of the 23rd international conference on data engineering, ICDE 2007, pp 86–95
Liu J, Yang J, Xiong L, Pei J (2017) Secure skyline queries on cloud platform. In: 33rd IEEE international conference on data engineering, ICDE 2017, pp 633–644
Magnani M, Assent I, Mortensen M (2014) Taking the big picture: representative skylines based on significance and diversity. VLDB J 23(5):795–815
Mullesgaard K, Pedersen J, Lu H, Zhou Y (2014) Efficient skyline computation in mapreduce. In: Proceedings of the 17th international conference on extending database technology, EDBT 2014, pp 37–48
Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst 30(1):41–82
Park Y, Min J, Shim K (2013) Parallel computation of skyline and reverse skyline queries using mapreduce. Proc VLDB Endow 6(14):2002–2013
Sarma A, Lall A, Nanongkai D et al (2011) Representative skylines using threshold-based preference distributions. In: Proceedings of the 2011 IEEE 27th international conference on data engineering, ICDE ’11, pp 387–398
Sheng C, Tao Y (2011) On finding skylines in external memory. In: Proceedings of PODS’11, pp 107–116
Sun S, Huang Z, Zhong H (2010) Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25(3):575–606
Tan K, Eng P, Ooi B (2001) Efficient progressive skyline computation. In: Proceedings of VLDB’01, pp 301–310
Tao Y, Ding L, Lin X, Pei J (2009) Distance-based representative skyline. In: Proceedings of the 2009 IEEE international conference on data engineering, ICDE ’09, pp 892–903
Tao Y, Xiao X, Pei J (2007) Efficient skyline and top-k retrieval in subspaces. IEEE Trans Knowl Data Eng 19(8):1072–1088
Tiakas E, Papadopoulos A, Manolopoulos Y (2011) Progressive processing of subspace dominating queries. VLDB J 20(6):921–948
Vlachou A, Vazirgiannis M (2010) Ranking the sky: discovering the importance of skyline points through subspace dominance relationships. Data Knowl Eng 69(9):943–964
Xia T, Zhang D, Tao Y (2008) On skylining with flexible dominance relation. In: Proceedings of the 24th international conference on data engineering, ICDE 2008, pp 1397–1399
Yiu M, Mamoulis N (2009) Multi-dimensional top-k dominating queries. VLDB J 18(3):695–718
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61402130, 61632010, and 61502121, the National Key Research and Development Program under Grant No. 2016YFB1000703, and the Weihai-HIT Co-construction Program under Grant No. ZMZ001702.
Cite this article
Han, X., Wang, B., Li, J. et al. Ranking the big sky: efficient top-k skyline computation on massive data. Knowl Inf Syst 60, 415–446 (2019). https://doi.org/10.1007/s10115-018-1256-0
DOI: https://doi.org/10.1007/s10115-018-1256-0