SEPT: an efficient skyline join algorithm on massive data

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Skyline join is an important operation in many applications: it returns all join tuples that are not dominated by any other join tuple. Existing algorithms cannot process skyline joins on massive data efficiently. This paper presents SEPT, a novel skyline join algorithm for massive data. SEPT uses sorted positional index lists augmented with join information; these lists require little space and reduce I/O cost significantly. A sorted positional index list is constructed for each potential skyline attribute in the joined tables and is arranged in ascending order of that attribute. SEPT consists of two phases. In phase one, SEPT obtains the candidate join positional index pairs of the skyline join results. While retrieving the sorted positional index lists, SEPT prunes the candidate pairs, discarding those whose corresponding join tuples are not skyline join results. In phase two, SEPT uses the remaining candidate pairs to obtain the skyline join results through a selective, sequential scan of the tables. Experimental results on synthetic and real data sets show that SEPT has a significant advantage over existing skyline join algorithms.
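
To make the terms in the abstract concrete, the following is a minimal sketch of the problem definition, not of SEPT itself. It shows a dominance test, a naive baseline that materializes the full join before filtering, and one possible in-memory layout for a sorted positional index list. Everything here is an assumption made for illustration: the names `Row`, `dominates`, `naive_skyline_join`, and `sorted_positional_index_list` are hypothetical, the join is assumed to be an equijoin on a single key, and smaller attribute values are assumed to be preferred.

```python
from typing import List, Sequence, Tuple

# Hypothetical row layout: (join key, skyline attribute values).
Row = Tuple[int, Tuple[float, ...]]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every attribute and strictly
    better on at least one (assumption: smaller values are better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def naive_skyline_join(r: List[Row], s: List[Row]) -> List[Tuple[int, int]]:
    """Baseline only: build every join tuple, then keep the positional
    index pairs (i, j) whose join tuple is not dominated by another."""
    joined = [((i, j), r_vals + s_vals)
              for i, (r_key, r_vals) in enumerate(r)
              for j, (s_key, s_vals) in enumerate(s)
              if r_key == s_key]
    return [pos for pos, vals in joined
            if not any(dominates(other, vals) for _, other in joined)]


def sorted_positional_index_list(table: List[Row],
                                 attr: int) -> List[Tuple[int, int]]:
    """(position, join key) entries in ascending order of one attribute.
    The exact layout used by SEPT is not given in the abstract."""
    order = sorted(range(len(table)), key=lambda pos: table[pos][1][attr])
    return [(pos, table[pos][0]) for pos in order]


# Toy data: R has two skyline attributes, S has one.
R = [(1, (3.0, 5.0)), (1, (4.0, 4.0)), (2, (1.0, 9.0)), (2, (2.0, 2.0))]
S = [(1, (7.0,)), (2, (6.0,)), (2, (8.0,))]

skyline_pairs = naive_skyline_join(R, S)
index_list_r0 = sorted_positional_index_list(R, 0)
print(skyline_pairs)   # [(2, 1), (3, 1)]
print(index_list_r0)   # [(2, 2), (3, 2), (0, 1), (1, 1)]
```

The baseline compares every pair of join tuples, so its cost grows quadratically with the join size. That is the kind of cost the abstract says SEPT avoids by pruning candidate positional index pairs while it retrieves the index lists, before any full tuples are read.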

Notes

  1. In this paper, we assume that the tuples are in general position [29].

References

  1. Bartolini I, Ciaccia P, Patella M (2008) Efficient sort-based skyline evaluation. ACM Trans Database Syst 33(4):31:1–31:49

  2. Bentley J, Kung H, Schkolnick M, Thompson C (1978) On the average number of maxima in a set of vectors and applications. J ACM 25(4):536–543

  3. Bloom B (1970) Space/time trade-offs in hash coding with allowable errors. Commun ACM 13(7):422–426

  4. Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data engineering, pp 421–430

  5. Bryant R (2007) Data-intensive supercomputing: the case for DISC. In: Technical report CMU-CS-07-128. School of Computer Science, Carnegie Mellon University

  6. Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: Proceedings of the 19th international conference on data engineering, pp 717–719

  7. Courant R, John F (1989) Introduction to calculus and analysis: volume I, 1st edn. Springer, New York

  8. Gibas M, Canahuate G, Ferhatosmanoglu H (2008) Online index recommendations for high-dimensional databases using query workloads. IEEE Trans Knowl Data Eng 20(2):246–260

  9. Godfrey P (2004) Skyline cardinality for relational processing. In: Seipel D, Turull-Torres JMa (eds) Foundations of information and knowledge systems, vol 2942. Springer, Berlin, pp 78–97

  10. Godfrey P, Shipley R, Gryz J (2007) Algorithms and analyses for maximal vector computation. VLDB J 16(1):5–28

  11. Gray J, Shenoy P (2000) Rules of thumb in data engineering. In: Proceedings of the 16th international conference on data engineering, pp 3–12

  12. Han X, Li J, Yang D (2012) Pi-join: efficiently processing join queries on massive data. Knowl Inf Syst 32(3):527–557

  13. Han X, Li J, Yang D, Wang J (2013) Efficient skyline computation on big data. IEEE Trans Knowl Data Eng 25(11):2521–2535

  14. Huang J, Jiang B, Pei J, Chen J, Tang Y (2013) Skyline distance: a measure of multidimensional competence. Knowl Inf Syst 34(2):373–396

  15. Huang Z, Sun S, Wang W (2010) Efficient mining of skyline objects in subspaces over data streams. Knowl Inf Syst 22(2):159–183

  16. Jin W, Ester M, Hu Z, Han J (2007) The multi-relational skyline operator. In: Proceedings of the 23rd international conference on data engineering, pp 1276–1280

  17. Jin W, Morse M, Patel J, Ester M, Hu Z (2010) Evaluating skylines in the presence of equijoins. In: Proceedings of the 26th international conference on data engineering, pp 249–260

  18. Khalefa M, Mokbel M, Levandoski J (2011) Prefjoin: an efficient preference-aware join operator. In: Proceedings of the 27th international conference on data engineering, pp 995–1006

  19. Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th international conference on very large data bases, pp 275–286

  20. Kung H, Luccio F, Preparata F (1975) On finding the maxima of a set of vectors. J ACM 22(4):469–476

  21. Lee K, Lee W, Zheng B, Li H, Tian Y (2010) Z-sky: an efficient skyline query processing framework based on z-order. VLDB J 19(3):333–362

  22. Luo C, Jiang Z, Hou W, He S, Zhu Q (2012) A sampling approach for skyline query cardinality estimation. Knowl Inf Syst 32(2):281–301

  23. Nagendra M, Candan K (2012) Skyline-sensitive joins with lr-pruning. In: Proceedings of the 15th international conference on extending database technology, pp 252–263

  24. Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst 30(1):41–82

  25. Raghavan V, Rundensteiner E (2010) Progressive result generation for multi-criteria decision support queries. In: Proceedings of the 26th international conference on data engineering, pp 733–744

  26. Raghavan V, Rundensteiner E, Srivastava S (2011) Skyline and mapping aware join query evaluation. Inf Syst 36(6):917–936

  27. Rudin W (1976) Principles of mathematical analysis, 3rd edn. McGraw-Hill Book Co., New York

  28. Seagate (2012) Barracuda xt: no compromise. Speed and capacity for high-performance desktop systems. http://www.seagate.com/docs/pdf/datasheet/disc/ds_barracuda_xt.pdf

  29. Sheng C, Tao Y (2011) On finding skylines in external memory. In: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 107–116

  30. Sun D, Wu S, Li J, Tung A (2008) Skyline-join in distributed databases. In: Proceedings of the 24th international conference on data engineering workshops, pp 176–181

  31. Sun S, Huang Z, Zhong H, Dai D, Liu H, Li J (2010) Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25(3):575–606

  32. Tan K, Eng P, Ooi B (2001) Efficient progressive skyline computation. In: Proceedings of the 27th international conference on very large data bases, pp 301–310

  33. Tao Y, Xiao X, Pei J (2007) Efficient skyline and top-k retrieval in subspaces. IEEE Trans Knowl Data Eng 19(8):1072–1088

  34. Tom’s Hardware (2006) Hard drives: 40mb to 750gb. http://www.tomshardware.com/reviews/15-years-of-hard-drive-history,1368--2.html

  35. Vlachou A, Doulkeridis C, Polyzotis N (2011) Skyline query processing over joins. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, pp 73–84

Acknowledgments

We thank the anonymous reviewers for their very useful comments and suggestions. This work was supported in part by the National Basic Research (973) Program of China under Grant No. 2012CB316200, the National Natural Science Foundation of China under Grant Nos. 61190115, 61173022, 61033015, and 61272046, the Shandong Provincial Natural Science Foundation under Grant No. ZR2013FQ028, the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under Grant Nos. HIT.NSRIF.2014136 and HIT(WH)201308, and the National Science & Technology Pillar Program under Grant Nos. 2012BAA13B01, 2012BAH10F03, and 2013BAH17F00.

Author information

Corresponding author

Correspondence to Xixian Han.

About this article

Cite this article

Han, X., Li, J., Gao, H. et al. SEPT: an efficient skyline join algorithm on massive data. Knowl Inf Syst 43, 355–388 (2015). https://doi.org/10.1007/s10115-014-0734-2

  • DOI: https://doi.org/10.1007/s10115-014-0734-2
