
A two-phase data space partitioning for efficient skyline computation

Cluster Computing

Abstract

Skyline queries have attracted considerable attention because of their wide application in various fields. However, skyline computation is challenging because today’s applications often deal with large, high-dimensional data. Because computing the skyline over such a huge amount of data is time-consuming, parallel and distributed skyline computation has been considered. State-of-the-art methods for parallel and distributed skyline computation use various data space partitioning techniques. However, these methods are inefficient in certain cases, because they perform unnecessary skyline computations in partitions whose local-skyline tuples do not contribute to the global skyline. This imposes additional processing overhead and increases the overall skyline computation time. In this paper, we propose a novel data space partitioning method for parallel and distributed skyline computation that consists of two phases: diagonal partitioning and entropy score curve based partitioning. The proposed method produces a small set of local-skyline tuples and leads to a more sophisticated merging step. The experimental results demonstrate that the proposed method reduces the number of comparisons and the processing time of skyline computation on large amounts of data compared with existing state-of-the-art methods.
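
To make the terms local skyline and global skyline concrete, the following is a minimal sketch of the generic partition-then-merge scheme that parallel skyline methods build on. It is not the two-phase method proposed in the paper: the round-robin split, the function names (`dominates`, `skyline`, `partitioned_skyline`), and the convention that smaller values are better in every dimension are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Simple nested-loop skyline: keep the points no other point dominates."""
    result: List[Point] = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated by a current candidate, discard it
        # p survives, so drop any candidates that p dominates
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


def partitioned_skyline(points: Sequence[Point], num_partitions: int) -> List[Point]:
    """Compute local skylines per partition, then merge them.

    The round-robin split below is a hypothetical placeholder; the paper's
    diagonal and entropy score curve based partitioning is not reproduced.
    """
    partitions = [points[i::num_partitions] for i in range(num_partitions)]
    local_skylines = [skyline(part) for part in partitions]  # parallelizable step
    candidates = [p for local in local_skylines for p in local]
    return skyline(candidates)  # merge step yields the global skyline


points = [(1, 9), (2, 7), (3, 8), (4, 4), (6, 5), (7, 2), (8, 3), (9, 1), (5, 6)]
global_skyline = partitioned_skyline(points, num_partitions=3)
```

With an arbitrary split like this one, a partition can return local-skyline tuples that are later discarded in the merge step. That wasted work is what the abstract says a better partitioning aims to reduce.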

Acknowledgements

This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. R7120-17-1007, SIAT CCTV Cloud Platform). Aziz Nasridinov and Jong-Hyeok Choi contributed equally to this paper.

Author information

Corresponding author

Correspondence to Young-Ho Park.

About this article

Cite this article

Nasridinov, A., Choi, JH. & Park, YH. A two-phase data space partitioning for efficient skyline computation. Cluster Comput 20, 3617–3628 (2017). https://doi.org/10.1007/s10586-017-1070-6

  • DOI: https://doi.org/10.1007/s10586-017-1070-6