Ranking the big sky: efficient top-k skyline computation on massive data

  • Regular Paper
Knowledge and Information Systems

Abstract

In many applications, the top-k skyline query is an important operation that returns the k skyline tuples with the highest domination scores from a potentially huge data space. Our analysis shows that existing algorithms cannot process top-k skyline queries on massive data efficiently. In this paper, we propose RSTS, a novel table-scan-based algorithm that computes top-k skyline results on massive data efficiently. RSTS first builds a presorted table, whose tuples are arranged in the order of round-robin retrieval on the sorted column lists. RSTS then runs in two phases. In phase 1, candidate tuples are acquired by a sequential scan of the presorted table. In phase 2, RSTS calculates the domination scores of the candidates and returns the query results by another sequential scan. We prove that RSTS supports early termination and give a theoretical analysis of its scan depths. We also devise a pruning rule for candidate tuples; the theoretical analysis of its pruning effect shows that the majority of the skyline results can be discarded directly. Extensive experiments on synthetic and real-life data sets show that RSTS significantly outperforms the existing algorithms.
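To make the query definition concrete, here is a minimal Python sketch. It assumes that smaller values are preferred on every attribute and that a tuple's domination score is the number of tuples it dominates; the helper names `dominates`, `domination_score`, `top_k_skyline` and `presorted_order` are hypothetical. `top_k_skyline` is a quadratic brute-force reference for checking answers, not RSTS. `presorted_order` only illustrates how a presorted table could be built by round-robin retrieval over sorted column lists, assuming each tuple is placed at its first retrieval and skipped afterwards.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def dominates(t: Sequence[float], u: Sequence[float]) -> bool:
    """True if t dominates u: no worse on every attribute, strictly better on one.
    Assumption: smaller values are preferred on every attribute."""
    return all(a <= b for a, b in zip(t, u)) and any(a < b for a, b in zip(t, u))


def domination_score(i: int, data: List[Row]) -> int:
    """Number of tuples dominated by data[i]; a tuple is never compared with itself."""
    return sum(1 for j, u in enumerate(data) if j != i and dominates(data[i], u))


def top_k_skyline(data: List[Row], k: int) -> List[int]:
    """Brute-force O(n^2) reference (not RSTS): indices of the k skyline tuples
    with the highest domination scores, ties broken by index."""
    skyline = [i for i, t in enumerate(data)
               if not any(dominates(u, t) for j, u in enumerate(data) if j != i)]
    ranked = sorted(skyline, key=lambda i: (-domination_score(i, data), i))
    return ranked[:k]


def presorted_order(data: List[Row]) -> List[int]:
    """Tuple indices in round-robin retrieval order over the sorted column lists.
    Assumption: a tuple is kept at its first retrieval and skipped afterwards."""
    if not data:
        return []
    dims = len(data[0])
    # One sorted column list per attribute: tuple indices in ascending value order.
    columns = [sorted(range(len(data)), key=lambda i, d=d: (data[i][d], i))
               for d in range(dims)]
    seen, order = set(), []
    for depth in range(len(data)):
        for col in columns:  # take the entry at this depth from each list in turn
            i = col[depth]
            if i not in seen:
                seen.add(i)
                order.append(i)
    return order


if __name__ == "__main__":
    data = [(1, 9), (2, 4), (5, 3), (6, 8), (7, 7), (9, 1), (8, 9)]
    print(top_k_skyline(data, 2))  # [1, 2]
    print(presorted_order(data))   # [0, 5, 1, 2, 3, 4, 6]
```

On this toy input the skyline is {0, 1, 2, 5}; tuples 1 and 2 each dominate three tuples, so they form the top-2 answer. The four skyline tuples also happen to occupy the first four positions of the presorted order, which hints at why a sequential scan of such a table can collect candidates early. The early-termination condition and the pruning rule of RSTS are not reproduced here.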

Notes

  1. In this paper, the tuples are assumed to be in general position [29].

  2. We do not consider the domination relationship between cad and itself.

References

  1. Asudeh A, Thirumuruganathan S, Zhang N, Das G (2016) Discovering the skyline of web databases. Proc VLDB Endow 9(7):600–611

  2. Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data engineering, pp 421–430

  3. Chan C, Jagadish H, Tan K, Tung K, Zhang Z (2006) Finding k-dominant skylines in high dimensional space. In: Proceedings of the 2006 ACM SIGMOD international conference on management of data, SIGMOD ’06, pp 503–514

  4. Chan C, Jagadish H, Tan K, Tung K, Zhang Z (2006) On high dimensional skylines. In: Proceedings of the 10th international conference on advances in database technology, EDBT’06, pp 478–495

  5. Chen Y, Lee C (2015) Neural skyline filter for accelerating skyline search algorithms. Expert Syst 32(1):108–131

  6. Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: Proceedings of the 19th international conference on data engineering, 5–8 March 2003, Bangalore, India, pp 717–719

  7. Fagin R, Lotem A, Naor M (2001) Optimal aggregation algorithms for middleware. In: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS ’01, pp 102–113

  8. Fernandez R, Pietzuch P, Kreps J et al (2015) Liquid: unifying nearline and offline big data integration. In: CIDR 2015, Seventh biennial conference on innovative data systems research, online proceedings

  9. Gao Y, Liu Q, Chen L (2015) Efficient algorithms for finding the most desirable skyline objects. Knowl-Based Syst 89:250–264

  10. Godfrey P (2004) Skyline cardinality for relational processing. In: Seipel D, Turull-Torres JM (eds) Foundations of information and knowledge systems, vol 2942. Springer, Berlin, pp 78–97

  11. Godfrey P, Shipley R, Gryz J (2007) Algorithms and analyses for maximal vector computation. VLDB J 16(1):5–28

  12. Han X, Li J, Gao H (2015) Efficient top-k retrieval on massive data. IEEE Trans Knowl Data Eng 27(10):2687–2699

  13. Han X, Li J, Gao H (2015) TDEP: efficiently processing top-k dominating query on massive data. Knowl Inf Syst 43(3):689–718

  14. Han X, Li J, Yang D, Wang J (2013) Efficient skyline computation on big data. IEEE Trans Knowl Data Eng 25(11):2521–2535

  15. Hose K, Vlachou A (2012) A survey of skyline processing in highly distributed environments. VLDB J 21(3):359–384

  16. Huang J, Jiang B, Pei J (2013) Skyline distance: a measure of multidimensional competence. Knowl Inf Syst 34(2):373–396

  17. Koltun V, Papadimitriou C (2007) Approximately dominating representatives. Theor Comput Sci 371(3):148–154

  18. Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of VLDB’02, pp 275–286

  19. Lee J, Hwang S (2014) Scalable skyline computation using a balanced pivot selection technique. Inf Syst 39:1–21

  20. Lee J, You G, Hwang S (2009) Personalized top-k skyline queries in high-dimensional space. Inf Syst 34(1):45–61

  21. Lian X, Chen L (2013) Probabilistic top-k dominating queries in uncertain databases. Inf Sci 226:23–46

  22. Lin X, Yuan Y, Zhang Q, Zhang Y (2007) Selecting stars: the k most representative skyline operator. In: Proceedings of the 23rd international conference on data engineering, ICDE 2007, pp 86–95

  23. Liu J, Yang J, Xiong L, Pei J (2017) Secure skyline queries on cloud platform. In: 33rd IEEE international conference on data engineering, ICDE 2017, pp 633–644

  24. Magnani M, Assent I, Mortensen M (2014) Taking the big picture: representative skylines based on significance and diversity. VLDB J 23(5):795–815

  25. Mullesgaard K, Pedersen J, Lu H, Zhou Y (2014) Efficient skyline computation in MapReduce. In: Proceedings of the 17th international conference on extending database technology, EDBT 2014, pp 37–48

  26. Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst 30(1):41–82

  27. Park Y, Min J, Shim K (2013) Parallel computation of skyline and reverse skyline queries using MapReduce. Proc VLDB Endow 6(14):2002–2013

  28. Sarma A, Lall A, Nanongkai D et al (2011) Representative skylines using threshold-based preference distributions. In: Proceedings of the 2011 IEEE 27th international conference on data engineering, ICDE ’11, pp 387–398

  29. Sheng C, Tao Y (2011) On finding skylines in external memory. In: Proceedings of PODS’11, pp 107–116

  30. Sun S, Huang Z, Zhong H (2010) Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25(3):575–606

  31. Tan K, Eng P, Ooi B (2001) Efficient progressive skyline computation. In: Proceedings of VLDB’01, pp 301–310

  32. Tao Y, Ding L, Lin X, Pei J (2009) Distance-based representative skyline. In: Proceedings of the 2009 IEEE international conference on data engineering, ICDE ’09, pp 892–903

  33. Tao Y, Xiao X, Pei J (2007) Efficient skyline and top-k retrieval in subspaces. IEEE Trans Knowl Data Eng 19(8):1072–1088

  34. Tiakas E, Papadopoulos A, Manolopoulos Y (2011) Progressive processing of subspace dominating queries. VLDB J 20(6):921–948

  35. Vlachou A, Vazirgiannis M (2010) Ranking the sky: discovering the importance of skyline points through subspace dominance relationships. Data Knowl Eng 69(9):943–964

  36. Xia T, Zhang D, Tao Y (2008) On skylining with flexible dominance relation. In: Proceedings of the 24th international conference on data engineering, ICDE 2008, pp 1397–1399

  37. Yiu M, Mamoulis N (2009) Multi-dimensional top-k dominating queries. VLDB J 18(3):695–718

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61402130, 61632010 and 61502121, the National Key Research and Development Program under Grant No. 2016YFB1000703, and the Weihai-HIT co-construction program under Grant No. ZMZ001702.

Author information

Corresponding author

Correspondence to Xixian Han.

About this article

Cite this article

Han, X., Wang, B., Li, J. et al. Ranking the big sky: efficient top-k skyline computation on massive data. Knowl Inf Syst 60, 415–446 (2019). https://doi.org/10.1007/s10115-018-1256-0

  • DOI: https://doi.org/10.1007/s10115-018-1256-0
