Making a case for the on-demand multiple distributed message queue system in a Hadoop cluster

Nguyen, Cao Ngoc; Hwang, Soonwook; Kim, Jik-Soo

doi:10.1007/s10586-017-1031-0

Published: 10 July 2017

Cluster Computing, volume 20, pages 2095–2106 (2017)

Cao Ngoc Nguyen¹, Soonwook Hwang¹ & Jik-Soo Kim²

Abstract

In this paper, we present a framework that gives users a simple, convenient, and powerful way to deploy multiple message queue systems on demand in a Hadoop cluster. Specifically, we leverage Apache Kafka, a state-of-the-art distributed message queue system that achieves high throughput, low latency, and good load balancing. Our framework automates setting up and starting Kafka brokers on the fly, so users can adopt Kafka quickly without spending much effort on installation and configuration. In addition, the framework lets users run their Kafka-based applications without detailed knowledge of the Hadoop YARN APIs and underlying mechanisms. We present a use case of the framework that evaluates Kafka’s performance under various test cases and working scenarios. The experimental results help potential Kafka users understand how different settings influence queuing performance.
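To make the idea of on-demand broker setup concrete, the following is a minimal illustrative sketch, not the authors' framework. It assumes ZooKeeper-based Kafka and uses only standard broker settings (`broker.id`, `listeners`, `log.dirs`, `zookeeper.connect`); the function names, the log directory layout, and the use of a per-cluster ZooKeeper chroot path to keep several Kafka clusters apart are assumptions made for illustration.

```python
def broker_properties(cluster_name, broker_id, host, port, zk_hosts,
                      log_root="/tmp/kafka-logs"):
    """Render server.properties text for one broker of one on-demand cluster.

    All names here are hypothetical; only the property keys are real
    Kafka broker settings.
    """
    settings = {
        "broker.id": broker_id,
        "listeners": f"PLAINTEXT://{host}:{port}",
        # Separate log directories keep co-located clusters from colliding.
        "log.dirs": f"{log_root}/{cluster_name}/broker-{broker_id}",
        # A chroot suffix gives each cluster its own ZooKeeper namespace.
        "zookeeper.connect": f"{','.join(zk_hosts)}/{cluster_name}",
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


def plan_cluster(cluster_name, hosts, zk_hosts, base_port=9092):
    """Map each host to the configuration of the broker that would run there."""
    return {
        host: broker_properties(cluster_name, broker_id, host, base_port, zk_hosts)
        for broker_id, host in enumerate(hosts)
    }
```

Calling `plan_cluster("queue-a", ["node1", "node2"], ["zk1:2181"])` would yield one configuration per host, each with a distinct `broker.id`. A real deployment on YARN would additionally have to request containers and launch the broker processes, which this sketch does not attempt.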

Acknowledgements

This work was supported by Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0190-16-2012, High Performance Big Data Analytics Platform Performance Acceleration Technologies Development).

Author information

Authors and Affiliations

Korea Institute of Science and Technology Information, University of Science & Technology, Daejeon, Republic of Korea
Cao Ngoc Nguyen & Soonwook Hwang
Department of Computer Engineering, Myongji University, Yongin, Republic of Korea
Jik-Soo Kim

Corresponding author

Correspondence to Jik-Soo Kim.

About this article

Cite this article

Nguyen, C.N., Hwang, S. & Kim, JS. Making a case for the on-demand multiple distributed message queue system in a Hadoop cluster. Cluster Comput 20, 2095–2106 (2017). https://doi.org/10.1007/s10586-017-1031-0

Received: 28 February 2017
Revised: 20 May 2017
Accepted: 03 July 2017
Published: 10 July 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s10586-017-1031-0
