Abstract
Cloud computing has been widely adopted in recent years to handle the enormous amounts of data generated by the Internet of Things, Big Data, and many other cutting-edge research areas. As cloud systems serve more and more jobs, it becomes increasingly difficult for time-critical or urgent high-priority jobs to complete as quickly as users would like in a busy cloud environment. To support the prompt execution of such jobs, cloud systems must provide schemes that expedite them. Apache Hadoop is one of the most popular cloud platforms. Unfortunately, it is not equipped with flexible mechanisms for accelerating prioritized jobs. Various approaches have been proposed to accelerate the execution of prioritized jobs from different aspects. However, those approaches not only target specific existing Hadoop job schedulers but also require modifications to those schedulers. Thus, they cannot be applied directly to other job schedulers without major porting effort, much less to job schedulers developed in the future. We designed and implemented a new scheme that enables dynamic resource allocation to the jobs selected by job schedulers. As a result, without any changes to the job schedulers themselves, our scheme can help some current and future Hadoop job schedulers speed up the execution of high-priority jobs. Experimental results demonstrate that the execution time of jobs run with high priority can be reduced by up to 68.28%.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Yeh, T., Yu, S. (2020). Achieving Dynamic Resource Allocation in the Hadoop Cloud System. In: Hsu, CH., Kallel, S., Lan, KC., Zheng, Z. (eds) Internet of Vehicles. Technologies and Services Toward Smart Cities. IOV 2019. Lecture Notes in Computer Science, vol 11894. Springer, Cham. https://doi.org/10.1007/978-3-030-38651-1_22
DOI: https://doi.org/10.1007/978-3-030-38651-1_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38650-4
Online ISBN: 978-3-030-38651-1
eBook Packages: Computer Science, Computer Science (R0)