Abstract
Because each MapReduce job independently performs a series of complex operations such as task scheduling and resource allocation, multiple MapReduce jobs coordinated by the same algorithm incur a large amount of redundant disk I/O and repeated resource requests, which leads to inefficient resource utilization during job computation. Big data mining algorithms are usually divided into several MapReduce jobs. Taking the ItemBased algorithm as an example, this paper analyzes the resource efficiency of a mining algorithm in a multi-MapReduce job collaboration scenario. It proposes an ItemBased algorithm based on DistributedCache, which uses DistributedCache to cache the I/O data exchanged between multiple MapReduce jobs, removes the isolation between jobs, and reduces the waiting delay between Map and Reduce tasks. The experimental results show that DistributedCache can improve the data reading speed of MapReduce jobs. The algorithm reconstructed with DistributedCache greatly reduces the waiting delay between Map and Reduce tasks and improves resource efficiency by more than three times.
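The idea of passing intermediate data between chained jobs through a cache rather than through a disk round trip can be illustrated with a minimal sketch. It is a plain-Python analogy, not Hadoop code and not the implementation evaluated in the paper: the two job functions, the toy `RATINGS` input, and the dictionary standing in for DistributedCache are all assumptions made for illustration.

```python
import json
import os
import tempfile
from collections import defaultdict
from itertools import combinations

# Hypothetical toy input: (user, item) interaction pairs.
RATINGS = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u2", "c")]


def job1_group_items_by_user(ratings):
    """Job 1: group records by user, giving user -> sorted item list."""
    grouped = defaultdict(set)
    for user, item in ratings:
        grouped[user].add(item)
    return {user: sorted(items) for user, items in grouped.items()}


def job2_count_cooccurrence(user_items):
    """Job 2: emit every item pair per user, then sum the counts per pair."""
    counts = defaultdict(int)
    for items in user_items.values():
        for pair in combinations(items, 2):
            counts[pair] += 1
    return dict(counts)


def run_via_disk(ratings, workdir):
    """Baseline: job 1 writes its output to disk and job 2 reads it back."""
    path = os.path.join(workdir, "job1_output.json")
    with open(path, "w") as f:
        json.dump(job1_group_items_by_user(ratings), f)
    with open(path) as f:  # the extra disk read between the two jobs
        user_items = json.load(f)
    return job2_count_cooccurrence(user_items)


def run_via_cache(ratings, cache):
    """Cached variant: job 1 publishes to a shared cache that job 2 reads."""
    cache["job1_output"] = job1_group_items_by_user(ratings)
    return job2_count_cooccurrence(cache["job1_output"])


with tempfile.TemporaryDirectory() as tmp:
    disk_result = run_via_disk(RATINGS, tmp)
cache_result = run_via_cache(RATINGS, {})
```

Both variants produce the same co-occurrence counts; the only difference is whether the second job has to re-read the first job's output from disk. In a real Hadoop deployment the cached data would be distributed to worker nodes rather than held in a single process, so the sketch shows the data flow only and says nothing about the measured speedup.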
Acknowledgment
This work was supported in part by the Research Project of the Hubei Provincial Department of Education (No. B2017590).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Fengli, Z., Xiaoli, L. (2019). Resource Efficiency Optimization for Big Data Mining Algorithm with Multi-MapReduce Collaboration Scenario. In: Huang, DS., Huang, ZK., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2019. Lecture Notes in Computer Science, vol 11645. Springer, Cham. https://doi.org/10.1007/978-3-030-26766-7_46
DOI: https://doi.org/10.1007/978-3-030-26766-7_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26765-0
Online ISBN: 978-3-030-26766-7
eBook Packages: Computer Science, Computer Science (R0)