
A parallel computation of skyline using multiple regression analysis-based filtering on MapReduce

Distributed and Parallel Databases

Abstract

Over the last decade, skyline query processing has become increasingly important because of its usefulness in decision-making applications. Because the datasets used for skyline query processing are huge, MapReduce-based skyline query processing algorithms have been widely studied. However, existing algorithms suffer from low filtering efficiency in local skyline computation, and they unrealistically assume both uniform data distributions and dimensional independence. In this paper, we propose a parallel skyline query processing algorithm for MapReduce using multiple regression analysis. The goal of our algorithm is to efficiently find the skyline of a large dataset by reducing the number of candidate points before the skyline computation. To support skyline computation on anti-correlated datasets, we computed a data-filtering threshold line based on a multiple regression analysis of a sample of the dataset. To guarantee the accuracy of the skyline result, we considered both the filtering threshold line and a grid-based cell dominance condition, so that only relevant data are processed in the actual skyline computation step. For local skyline computation, we used an angle-based partitioning of the data space that effectively eliminates unpromising points within partitions. For global skyline computation, we used the dominance relationship among grid-based partitions to prune unnecessary skyline points. Performance analyses showed that our parallel skyline query processing algorithm outperformed existing algorithms under various settings.
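The local-then-global structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it covers only the dominance test, a naive skyline, and a two-dimensional angle-based partitioning, and it omits the regression-based threshold line and the grid-based cell pruning entirely. The function names (`dominates`, `skyline`, `angle_partition`, `parallel_skyline`), the sample points, and the assumption that smaller values are preferred in every dimension are all illustrative choices, not taken from the paper.

```python
import math
from collections import defaultdict


def dominates(p, q):
    """True if p dominates q: p is no worse in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def angle_partition(p, num_partitions):
    """Map a 2-D point with non-negative coordinates to a partition index
    according to its polar angle in [0, pi/2]."""
    angle = math.atan2(p[1], p[0])
    idx = int(angle / (math.pi / 2) * num_partitions)
    return min(idx, num_partitions - 1)  # angle == pi/2 falls in the last partition


def parallel_skyline(points, num_partitions=4):
    # "Map" step: group the points by angular partition.
    partitions = defaultdict(list)
    for p in points:
        partitions[angle_partition(p, num_partitions)].append(p)

    # Local skylines: each partition could be processed by a separate worker.
    local = [s for part in partitions.values() for s in skyline(part)]

    # "Reduce" step: the global skyline is the skyline of all local skylines.
    return sorted(skyline(local))


# Hypothetical, roughly anti-correlated sample data.
points = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 5), (7, 2), (8, 3), (9, 1), (6, 6)]
result = parallel_skyline(points)
print(result)
```

Merging local skylines is safe because any point in the global skyline is undominated in its own partition, so it always survives the local step. The filtering described in the abstract would reduce the input before this computation begins.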





Acknowledgements

This work was partly supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0113-16-0005, Development of a Unified Data Engineering Technology for Large-scale Transaction Processing and Real-time Complex Analytics). This research was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (grant number 2016R1D1A3B03935298).

Author information

Corresponding author

Correspondence to Jae-Woo Chang.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.


About this article


Cite this article

Jang, M., Song, Y. & Chang, JW. A parallel computation of skyline using multiple regression analysis-based filtering on MapReduce. Distrib Parallel Databases 35, 383–409 (2017). https://doi.org/10.1007/s10619-017-7202-4


  • DOI: https://doi.org/10.1007/s10619-017-7202-4
