An efficient parallel processing method for skyline queries in MapReduce

The Journal of Supercomputing

Abstract

Skyline queries find the interesting tuples in a multi-dimensional dataset, which makes them useful for multi-criteria decision making. For large-scale data, skyline query processing benefits from parallel and distributed frameworks such as MapReduce, which has been widely used in recent years. Several approaches process skyline queries on MapReduce. Some perform part of the skyline computation serially, while others perform all of it in parallel. However, each suffers from at least one of two drawbacks: (1) the serial computations may prevent full use of the parallelism of the MapReduce framework; (2) when skyline queries are processed in a parallel and distributed manner, the additional overhead of parallel processing may outweigh the benefit gained from parallelization. To process skyline queries over large data efficiently in parallel, we propose a novel two-phase approach in the MapReduce framework. In the first phase, we divide the input dataset into a number of subsets (called cells) and compute local skylines only for the qualified cells. The outer-cell filter used in this phase considerably improves performance by eliminating a large number of tuples in unqualified cells. In the second phase, the global skyline is computed from the local skylines. To determine the global skyline tuples of each local skyline separately and in parallel, we design the inner-cell filter and propose efficient methods that reduce the overhead of computing and using the inner-cell filters. The primary advantage of our approach is that, with the two filtering techniques, it processes skyline queries quickly and in a fully parallelized manner in all stages of the MapReduce framework. Through extensive experiments, we demonstrate that the proposed approach substantially improves the overall performance of skyline queries in comparison with state-of-the-art skyline processing methods. In particular, the proposed method achieves good performance and scalability with respect to dataset size and dimensionality. Our approach offers significant benefits for large-scale skyline query processing in distributed and parallel computing environments.
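The two-phase idea in the abstract can be illustrated with a minimal single-machine sketch in Python. It is not the authors' implementation. It assumes that smaller values are better in every dimension, that all values lie in [0, 1], and that the cells form a uniform grid with k intervals per dimension. The names `dominates`, `cell_of`, `qualified_cells` and `two_phase_skyline` are hypothetical. The pruning rule shown is a generic grid-dominance test standing in for the outer-cell filter, and the second phase is a plain merge rather than the inner-cell filter, whose details the abstract does not give.

```python
from collections import defaultdict


def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline(points):
    """Naive quadratic skyline: keep tuples that no other tuple dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def cell_of(p, k):
    """Grid cell index of p, assuming values in [0, 1] and k intervals
    per dimension (the value 1.0 is clamped into the last interval)."""
    return tuple(min(int(x * k), k - 1) for x in p)


def qualified_cells(occupied):
    """Drop every cell whose index is strictly larger in ALL dimensions
    than some other non-empty cell: each tuple in such a cell is
    dominated, so the whole cell can be skipped."""
    return [c for c in occupied
            if not any(all(o_i < c_i for o_i, c_i in zip(o, c))
                       for o in occupied)]


def two_phase_skyline(points, k=4):
    # Partition the input into grid cells (a map-side step in MapReduce).
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p, k)].append(p)

    # Phase 1: local skylines, computed only for qualified cells.
    # Each cell is independent, so this loop could run in parallel.
    local = {c: skyline(cells[c]) for c in qualified_cells(list(cells))}

    # Phase 2 (simplified): merge the local skylines and remove tuples
    # dominated by a tuple from any other cell.
    candidates = [p for sky in local.values() for p in sky]
    return skyline(candidates)


points = [(0.1, 0.9), (0.2, 0.3), (0.3, 0.2), (0.6, 0.6),
          (0.9, 0.1), (0.8, 0.8), (0.25, 0.35)]
result = two_phase_skyline(points, k=4)
```

In this example the cells holding (0.6, 0.6) and (0.8, 0.8) are pruned without any tuple-level comparison, because the non-empty cell holding (0.2, 0.3) has a smaller index in both dimensions. The tuple (0.25, 0.35) survives the cell-level pruning and is removed only in the second phase.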

Acknowledgements

This work was supported by the Bio-Synergy Research Project (2013M3A9C4078137) of the MSIT (Ministry of Science and ICT), Korea through the NRF, and by the MSIT (Ministry of Science and ICT), Korea under the ITRC support program (IITP-2017-2013-0-00881) supervised by the IITP.

Author information

Corresponding author

Correspondence to Myoung Ho Kim.

About this article

Cite this article

Kim, J., Kim, M.H. An efficient parallel processing method for skyline queries in MapReduce. J Supercomput 74, 886–935 (2018). https://doi.org/10.1007/s11227-017-2171-y

  • DOI: https://doi.org/10.1007/s11227-017-2171-y
