Abstract
Skyline join is an important operation in many applications; it returns all join tuples that are not dominated by any other join tuple. Existing algorithms cannot process skyline joins on massive data efficiently. This paper presents SEPT, a novel skyline join algorithm for massive data. SEPT uses sorted positional index lists with join information, which require low space overhead, to reduce I/O cost significantly. A sorted positional index list is constructed for each potential skyline attribute in the joined tables and is arranged in ascending order of that attribute. SEPT consists of two phases. In phase one, SEPT obtains the candidate join positional index pairs of the skyline join results. While retrieving the sorted positional index lists, SEPT prunes the candidate join positional index pairs, discarding candidates whose corresponding join tuples are not skyline join results. In phase two, SEPT uses the obtained candidate join positional index pairs to compute the skyline join results through a selective, sequential scan of the tables. Experimental results on synthetic and real data sets show that SEPT has a significant advantage over the existing skyline join algorithms.
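To make the terms in the abstract concrete, the following minimal sketch shows what a skyline join returns and what a sorted positional index list might look like. It is not the SEPT algorithm: it is a naive in-memory baseline that materializes the whole join. The assumptions here are an equijoin on a single key, smaller attribute values being preferred, and the join key serving as the "join information" stored with each position. The names `dominates`, `naive_skyline_join`, and `sorted_positional_index_list` are hypothetical.

```python
from itertools import product


def dominates(a, b):
    """True if a is no worse than b on every attribute and strictly
    better on at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def naive_skyline_join(r, s):
    """Naive baseline, not SEPT. Each table is a list of
    (join_key, skyline_attrs) rows. Returns ((i, j), attrs) for every
    join tuple that no other join tuple dominates, where (i, j) is the
    pair of row positions in r and s."""
    joined = [
        ((i, j), ra + sa)
        for (i, (rk, ra)), (j, (sk, sa)) in product(enumerate(r), enumerate(s))
        if rk == sk  # assumption: equijoin on a single key
    ]
    return [
        (pos, t)
        for pos, t in joined
        if not any(dominates(u, t) for _, u in joined)
    ]


def sorted_positional_index_list(table, attr):
    """Row positions in ascending order of one skyline attribute, each
    paired with the row's join key (our stand-in for join information)."""
    order = sorted(range(len(table)), key=lambda i: table[i][1][attr])
    return [(i, table[i][0]) for i in order]


# Toy tables: (join_key, skyline attribute values)
R = [("a", (1, 9)), ("a", (5, 5)), ("b", (2, 2))]
S = [("a", (3,)), ("b", (8,)), ("a", (4,))]

skyline = naive_skyline_join(R, S)
index_list = sorted_positional_index_list(R, 0)
```

The baseline compares every join tuple with every other one, which is the kind of cost the abstract says SEPT avoids by pruning candidate positional index pairs before scanning the tables.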
Notes
In this paper, we consider that the tuples are in general position [29].
References
Bartolini I, Ciaccia P, Patella M (2008) Efficient sort-based skyline evaluation. ACM Trans Database Syst 33(4):31:1–31:49
Bentley J, Kung H, Schkolnick M, Thompson C (1978) On the average number of maxima in a set of vectors and applications. J ACM 25(4):536–543
Bloom B (1970) Space/time trade-offs in hash coding with allowable errors. Commun ACM 13(7):422–426
Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the 17th international conference on data engineering, pp 421–430
Bryant R (2007) Data-intensive supercomputing: the case for DISC. Technical report CMU-CS-07-128. School of Computer Science, Carnegie Mellon University
Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: Proceedings of the 19th international conference on data engineering, pp 717–719
Courant R, John F (1989) Introduction to calculus and analysis: volume I, 1st edn. Springer, New York
Gibas M, Canahuate G, Ferhatosmanoglu H (2008) Online index recommendations for high-dimensional databases using query workloads. IEEE Trans Knowl Data Eng 20(2):246–260
Godfrey P (2004) Skyline cardinality for relational processing. In: Seipel D, Turull-Torres JMa (eds) Foundations of information and knowledge systems, vol 2942. Springer, Berlin, pp 78–97
Godfrey P, Shipley R, Gryz J (2007) Algorithms and analyses for maximal vector computation. VLDB J 16(1):5–28
Gray J, Shenoy P (2000) Rules of thumb in data engineering. In: Proceedings of the 16th international conference on data engineering, pp 3–12
Han X, Li J, Yang D (2012) Pi-join: efficiently processing join queries on massive data. Knowl Inf Syst 32(3):527–557
Han X, Li J, Yang D, Wang J (2013) Efficient skyline computation on big data. IEEE Trans Knowl Data Eng 25(11):2521–2535
Huang J, Jiang B, Pei J, Chen J, Tang Y (2013) Skyline distance: a measure of multidimensional competence. Knowl Inf Syst 34(2):373–396
Huang Z, Sun S, Wang W (2010) Efficient mining of skyline objects in subspaces over data streams. Knowl Inf Syst 22(2):159–183
Jin W, Ester M, Hu Z, Han J (2007) The multi-relational skyline operator. In: Proceedings of the 23rd international conference on data engineering, pp 1276–1280
Jin W, Morse M, Patel J, Ester M, Hu Z (2010) Evaluating skylines in the presence of equijoins. In: Proceedings of the 26th international conference on data engineering, pp 249–260
Khalefa M, Mokbel M, Levandoski J (2011) Prefjoin: an efficient preference-aware join operator. In: Proceedings of the 27th international conference on data engineering, pp 995–1006
Kossmann D, Ramsak F, Rost S (2002) Shooting stars in the sky: an online algorithm for skyline queries. In: Proceedings of the 28th international conference on very large data bases, pp 275–286
Kung H, Luccio F, Preparata F (1975) On finding the maxima of a set of vectors. J ACM 22(4):469–476
Lee K, Lee W, Zheng B, Li H, Tian Y (2010) Z-sky: an efficient skyline query processing framework based on z-order. VLDB J 19(3):333–362
Luo C, Jiang Z, Hou W, He S, Zhu Q (2012) A sampling approach for skyline query cardinality estimation. Knowl Inf Syst 32(2):281–301
Nagendra M, Candan K (2012) Skyline-sensitive joins with lr-pruning. In: Proceedings of the 15th international conference on extending database technology, pp 252–263
Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst 30(1):41–82
Raghavan V, Rundensteiner E (2010) Progressive result generation for multi-criteria decision support queries. In: Proceedings of the 26th international conference on data engineering, pp 733–744
Raghavan V, Rundensteiner E, Srivastava S (2011) Skyline and mapping aware join query evaluation. Inf Syst 36(6):917–936
Rudin W (1976) Principles of mathematical analysis, 3rd edn. McGraw-Hill Book Co., New York
Seagate (2012) Barracuda xt: no compromise. Speed and capacity for high-performance desktop systems. http://www.seagate.com/docs/pdf/datasheet/disc/ds_barracuda_xt.pdf
Sheng C, Tao Y (2011) On finding skylines in external memory. In: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, pp 107–116
Sun D, Wu S, Li J, Tung A (2008) Skyline-join in distributed databases. In: Proceedings of the 24th international conference on data engineering workshops, pp 176–181
Sun S, Huang Z, Zhong H, Dai D, Liu H, Li J (2010) Efficient monitoring of skyline queries over distributed data streams. Knowl Inf Syst 25(3):575–606
Tan K, Eng P, Ooi B (2001) Efficient progressive skyline computation. In: Proceedings of the 27th international conference on very large data bases, pp 301–310
Tao Y, Xiao X, Pei J (2007) Efficient skyline and top-k retrieval in subspaces. IEEE Trans Knowl Data Eng 19(8):1072–1088
Tom’s Hardware (2006) Hard drives: 40mb to 750gb. http://www.tomshardware.com/reviews/15-years-of-hard-drive-history,1368--2.html
Vlachou A, Doulkeridis C, Polyzotis N (2011) Skyline query processing over joins. In: Proceedings of the 2011 ACM SIGMOD international conference on management of data, pp 73–84
Acknowledgments
We thank the anonymous reviewers for their very useful comments and suggestions. This work was supported in part by the National Basic Research (973) Program of China under Grant No. 2012CB316200, the National Natural Science Foundation of China under Grant Nos. 61190115, 61173022, 61033015, and 61272046, the Shandong Provincial Natural Science Foundation under Grant No. ZR2013FQ028, the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under Grant Nos. HIT.NSRIF.2014136 and HIT(WH)201308, and the National Science & Technology Pillar Program under Grant Nos. 2012BAA13B01, 2012BAH10F03, and 2013BAH17F00.
About this article
Cite this article
Han, X., Li, J., Gao, H. et al. SEPT: an efficient skyline join algorithm on massive data. Knowl Inf Syst 43, 355–388 (2015). https://doi.org/10.1007/s10115-014-0734-2
DOI: https://doi.org/10.1007/s10115-014-0734-2