Abstract
Skyline queries have attracted considerable attention because of their wide application in various fields. However, skyline computation is challenging because today’s applications are likely to deal with large, high-dimensional data. Because computing the skyline over such large amounts of data is time-consuming, parallel and distributed skyline computation has been considered. State-of-the-art methods for parallel and distributed skyline computation use various data space partitioning techniques. However, these methods are not always efficient: in certain cases they perform unnecessary skyline computations in partitions whose local-skyline tuples do not contribute to the global skyline. This imposes additional processing overhead and increases the overall skyline computation time. In this paper, we propose a novel data space partitioning method for parallel and distributed skyline computation that consists of two phases: diagonal and entropy score curve based partitioning. The proposed method produces a small set of local-skyline tuples and leads to a more sophisticated merging step. The experimental results demonstrate that, on large data sets, the proposed method reduces the number of comparisons and the processing time of skyline computation compared with existing state-of-the-art methods.
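The general partition, local-skyline, and merge pattern described above can be illustrated with a minimal sketch. It is not the two-phase method proposed in the paper: the round-robin split is a placeholder assumption standing in for any data space partitioning scheme, smaller values are assumed to be preferred in every dimension, and the names `dominates`, `skyline`, and `partitioned_skyline` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive quadratic skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def partitioned_skyline(points: Sequence[Point], num_partitions: int) -> List[Point]:
    """Compute local skylines per partition, then merge them into the global skyline."""
    # Placeholder partitioning (assumption): round-robin split of the input.
    partitions = [points[i::num_partitions] for i in range(num_partitions)]
    # Each partition could be processed independently, e.g. on a separate worker.
    local_skylines = [skyline(part) for part in partitions]
    # Merge step: only local-skyline tuples can belong to the global skyline.
    candidates = [p for local in local_skylines for p in local]
    return skyline(candidates)


data = [(1, 9), (2, 7), (3, 8), (4, 4), (6, 5), (7, 2), (8, 3), (9, 1), (5, 6)]
result = partitioned_skyline(data, 3)
```

With this placeholder split, every tuple in `data` survives as a local-skyline tuple, yet only five of the nine belong to the global skyline. For instance, `(8, 3)` is a local-skyline tuple in its partition but is dominated by `(7, 2)` from another partition. This is the kind of wasted work that a better partitioning scheme aims to reduce.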
Acknowledgements
This work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No.R7120-17-1007, SIAT CCTV Cloud Platform). Aziz Nasridinov and Jong-Hyeok Choi contributed equally to this paper.
Cite this article
Nasridinov, A., Choi, JH. & Park, YH. A two-phase data space partitioning for efficient skyline computation. Cluster Comput 20, 3617–3628 (2017). https://doi.org/10.1007/s10586-017-1070-6
DOI: https://doi.org/10.1007/s10586-017-1070-6