Designing a Hadoop system based on computational resources and network delay for wide area networks

Telecommunication Systems

Abstract

This paper proposes a Hadoop system for wide area networks that considers both the processing capacity of each slave server and the network delay in order to reduce job processing time. The task allocation scheme in the proposed Hadoop system divides each job into multiple tasks using suitable splitting ratios and then allocates the tasks to different slaves according to the computational capability of each server and the availability of network resources. We incorporate software-defined networking into the proposed Hadoop system to manage path computation elements and network resources. The performance of the proposed Hadoop system is experimentally evaluated with fourteen machines located in different parts of the globe using a scale-out approach. The scale-out experiment runs both the proposed and the conventional Hadoop systems, executing both a single job and multiple jobs. The practical testbed and simulation results indicate that the proposed Hadoop system achieves shorter processing times than the conventional Hadoop system.
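The idea of splitting a job by computational capability and network delay can be illustrated with a small sketch. It is not the paper's formulation, which is not visible in this preview. It assumes a hypothetical simplified model in which slave i starts after a fixed delay `delays[i]` and then processes its share at a constant rate `rates[i]`, and it picks ratios so that every slave that receives work finishes at the same time. The function name `splitting_ratios` and all parameters are illustrative.

```python
def splitting_ratios(work, rates, delays):
    """Split `work` units across slaves so all used slaves finish together.

    Assumed model (illustrative only): slave i finishes at
        delays[i] + ratio[i] * work / rates[i]
    Requires work > 0 and every rate > 0.
    Returns (ratios, finish_time); ratios sum to 1.
    """
    active = set(range(len(rates)))
    while True:
        total_rate = sum(rates[i] for i in active)
        # Common finish time if every active slave gets a share.
        finish = (work + sum(rates[i] * delays[i] for i in active)) / total_rate
        # A slave whose delay alone reaches the finish time gets no work.
        too_slow = {i for i in active if delays[i] >= finish}
        if not too_slow:
            break
        active -= too_slow
    ratios = [
        rates[i] * (finish - delays[i]) / work if i in active else 0.0
        for i in range(len(rates))
    ]
    return ratios, finish


if __name__ == "__main__":
    # Two slaves: a slow nearby one and a fast one behind a 2-second delay.
    ratios, finish = splitting_ratios(work=100.0, rates=[10.0, 30.0], delays=[0.0, 2.0])
    print(ratios, finish)
```

Under this assumed model, a fast but distant slave still receives the larger share when its delay is small relative to the job, and it is excluded entirely when its delay exceeds the time the remaining slaves would need on their own.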

Author information

Corresponding author

Correspondence to Nattapong Kitsuwan.

Additional information

This work was supported in part by the National Institute of Information and Communications Technology (NICT), Japan, and by an NSF Grant (CNS-1405171), USA. Parts of this paper were presented at the 21st Asia-Pacific Conference on Communications (APCC 2015) and the 2016 IEEE 17th International Conference on High Performance Switching and Routing (HPSR 2016).

About this article

Cite this article

Matsuno, T., Chatterjee, B.C., Kitsuwan, N. et al. Designing a Hadoop system based on computational resources and network delay for wide area networks. Telecommun Syst 70, 13–25 (2019). https://doi.org/10.1007/s11235-018-0464-y

  • DOI: https://doi.org/10.1007/s11235-018-0464-y
