Skyline-join query processing in distributed databases

  • Research Article
Frontiers of Computer Science

Abstract

The skyline-join operator, an important variant of the skyline operator, plays a key role in multi-criteria decision-making problems. However, as data volumes grow, previous methods for skyline-join queries cannot be applied to new applications. This paper therefore makes the first attempt to propose a scalable method for processing skyline-join queries in distributed databases. First, a tailored distributed framework is presented to facilitate the computation of skyline-join queries. Second, the distributed skyline-join query algorithm (DSJQ) is designed to process skyline-join queries. DSJQ contains two phases. In the first phase, two filtering strategies are used to filter out unpromising tuples from the original tables. The remaining tuples are transmitted to the corresponding data nodes according to a partition function, which guarantees that tuples with the same join value are transferred to the same node. In the second phase, we design a rotation-based scheduling plan to calculate the final skyline-join result. The scheduling plan ensures that calculations are assigned equally to all data nodes and that the calculations on each data node can proceed in parallel without creating a bottleneck node. Finally, the effectiveness of DSJQ is evaluated through a series of experiments.
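
The partitioning idea in the abstract can be illustrated with a minimal sketch. It is not DSJQ itself: the two filtering strategies and the rotation-based scheduling plan are not described in the abstract and are omitted here. The sketch assumes that smaller attribute values are better, uses a naive quadratic skyline, and simulates the data nodes with in-memory lists; the names `dominates`, `skyline`, `partition`, and `skyline_join` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from itertools import product


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: smaller values are preferred in every attribute.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def partition(table, num_nodes):
    """Hash-partition (join_value, attrs) tuples by join value.

    Tuples with equal join values always land on the same simulated node.
    """
    nodes = defaultdict(list)
    for join_value, attrs in table:
        nodes[hash(join_value) % num_nodes].append((join_value, attrs))
    return nodes


def skyline_join(r, s, num_nodes=3):
    """Skyline over the equi-join of r and s, computed partition by partition."""
    r_parts = partition(r, num_nodes)
    s_parts = partition(s, num_nodes)
    candidates = []
    for node in range(num_nodes):
        # Each node joins only its own partitions; this is complete because
        # matching join values were routed to the same node.
        joined = [
            r_attrs + s_attrs
            for (r_key, r_attrs), (s_key, s_attrs) in product(r_parts[node], s_parts[node])
            if r_key == s_key
        ]
        candidates.extend(skyline(joined))
    # Dominance can cross join values, so local skylines are merged and
    # filtered once more to obtain the global result.
    return sorted(set(skyline(candidates)))


# Hypothetical data: (city, (price, stops)) joined with (city, (price, distance)).
flights = [("SIN", (300, 5)), ("SIN", (250, 7)), ("NRT", (400, 3)), ("NRT", (450, 4))]
hotels = [("SIN", (120, 2)), ("SIN", (90, 4)), ("NRT", (200, 1))]
print(skyline_join(flights, hotels))
```

Because every joined tuple is produced on exactly one node, merging the local skylines and filtering once more yields the same result as a centralized skyline over the full join. The final merge in this sketch runs in one place, which is the kind of bottleneck the paper's scheduling plan is stated to avoid.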

Author information

Corresponding author

Correspondence to Mei Bai.

Additional information

Mei Bai received her BS and MS degrees in computer science and technology from Northeastern University, China in 2009 and 2011, respectively. She is currently a PhD candidate in the Department of Computer Science, Northeastern University. Her research interests include sensory data management and uncertain data management.

Junchang Xin received his BS, MS, and PhD degrees in computer science and technology from Northeastern University, China in 2002, 2005, and 2008, respectively. He is currently an associate professor in the Department of Computer Science, Northeastern University, China. His research interests include sensory data management, uncertain data management, cloud computing, and machine learning.

Guoren Wang received his BS, MS and PhD degrees from the Department of Computer Science, Northeastern University, China in 1988, 1991, and 1996, respectively. Currently, he is a professor in the Department of Computer Science, Northeastern University, China. His research interests are XML data management, query processing and optimization, bioinformatics, high-dimensional indexing, parallel database systems, and P2P data management.

Roger Zimmermann is an associate professor with the Department of Computer Science at the National University of Singapore (NUS), Singapore, where he is also a deputy director with the Interactive and Digital Media Institute (IDMI) and a co-director with the Center of Social Media Innovations for Communities (COSMIC). He received his PhD from the University of Southern California, USA in 1998. His research interests are distributed and peer-to-peer systems, collaborative environments, streaming media architectures, and mobile location-based services.

Xite Wang is a PhD candidate in the College of Information Science & Engineering, Northeastern University, China, where he received his BS and MS degrees in 2009 and 2011, respectively. His research interests include cloud computing and big-data management.

About this article

Cite this article

Bai, M., Xin, J., Wang, G. et al. Skyline-join query processing in distributed databases. Front. Comput. Sci. 10, 330–352 (2016). https://doi.org/10.1007/s11704-015-4534-y

  • DOI: https://doi.org/10.1007/s11704-015-4534-y