Designing a Hadoop system based on computational resources and network delay for wide area networks

Matsuno, Tomohiro; Chatterjee, Bijoy Chand; Kitsuwan, Nattapong; Oki, Eiji; Veeraraghavan, Malathi; Okamoto, Satoru; Yamanaka, Naoaki

doi:10.1007/s11235-018-0464-y

Published: 26 April 2018

Telecommunication Systems, volume 70, pages 13–25 (2019)

Tomohiro Matsuno¹
Bijoy Chand Chatterjee¹,²
Nattapong Kitsuwan¹ (ORCID: orcid.org/0000-0002-6335-424X)
Eiji Oki¹,³
Malathi Veeraraghavan⁴
Satoru Okamoto⁵
Naoaki Yamanaka⁵

Abstract

This paper proposes a Hadoop system for wide area networks that considers both the processing capacity of each slave server and the network delay in order to reduce job processing time. The task allocation scheme in the proposed Hadoop system divides each job into multiple tasks using suitable splitting ratios and then allocates the tasks to different slaves according to the computational capability of each server and the availability of network resources. We incorporate software-defined networking into the proposed Hadoop system to manage path computation elements and network resources. The performance of the proposed Hadoop system is evaluated experimentally with fourteen machines located in different parts of the globe using a scale-out approach. The scale-out experiment runs both a single job and multiple jobs on the proposed and conventional Hadoop systems. The testbed and simulation results indicate that the proposed Hadoop system is more effective than the conventional Hadoop system in terms of processing time.
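The idea of splitting a job according to both server capacity and network delay can be illustrated with a small sketch. The model below is an assumption made for illustration and is not taken from the paper: slave i is assumed to finish at delays[i] + ratios[i] * work / rates[i], and the ratios are chosen so that all slaves in use finish at the same time. The function name split_ratios and its parameters are hypothetical.

```python
def split_ratios(work, rates, delays):
    """Split `work` across slaves so that all slaves in use finish together.

    Illustrative model only (an assumption, not the paper's formulation):
    slave i finishes at delays[i] + ratios[i] * work / rates[i].
    Returns (ratios, finish_time).
    """
    if work <= 0 or not rates or len(rates) != len(delays):
        raise ValueError("need work > 0 and matching, non-empty rates/delays")
    if any(r <= 0 for r in rates) or any(d < 0 for d in delays):
        raise ValueError("rates must be positive and delays non-negative")

    active = list(range(len(rates)))
    while True:
        total_rate = sum(rates[i] for i in active)
        # Common finish time T solving sum_i rates[i] * (T - delays[i]) = work.
        finish = (work + sum(rates[i] * delays[i] for i in active)) / total_rate
        # A slave whose delay is not below T would get no work; drop it and retry.
        kept = [i for i in active if delays[i] < finish]
        if len(kept) == len(active):
            break
        active = kept

    ratios = [0.0] * len(rates)
    for i in active:
        ratios[i] = rates[i] * (finish - delays[i]) / work
    return ratios, finish


if __name__ == "__main__":
    # Two slaves: one twice as fast, the slower one 3 time units further away.
    ratios, finish = split_ratios(work=30.0, rates=[2.0, 1.0], delays=[0.0, 3.0])
    print(ratios, finish)
```

In this toy model a fast, nearby slave receives a larger share, and a slave whose delay alone exceeds the common finish time receives no tasks at all. A real allocation would also have to account for data transfer volume and available bandwidth, which this sketch omits.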

Author information

Authors and Affiliations

The Department of Computer and Network Engineering, The University of Electro-Communications, Chofugaoka 1-5-1, Tokyo, 182-8585, Japan
Tomohiro Matsuno, Bijoy Chand Chatterjee, Nattapong Kitsuwan & Eiji Oki
Indraprastha Institute of Information Technology, Delhi, India
Bijoy Chand Chatterjee
Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501, Japan
Eiji Oki
The Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, USA
Malathi Veeraraghavan
The Department of Information and Computer Science, Keio University, Tokyo, Japan
Satoru Okamoto & Naoaki Yamanaka

Corresponding author

Correspondence to Nattapong Kitsuwan.

Additional information

This work was supported in part by the National Institute of Information and Communications Technology (NICT), Japan, and by an NSF Grant (CNS-1405171), USA. Parts of this paper were presented at the 21st Asia-Pacific Conference on Communications (APCC 2015) and the 2016 IEEE 17th International Conference on High Performance Switching and Routing (HPSR 2016).

About this article

Cite this article

Matsuno, T., Chatterjee, B.C., Kitsuwan, N. et al. Designing a Hadoop system based on computational resources and network delay for wide area networks. Telecommun Syst 70, 13–25 (2019). https://doi.org/10.1007/s11235-018-0464-y

Issue Date: 15 January 2019