Making a case for the on-demand multiple distributed message queue system in a Hadoop cluster

Published in: Cluster Computing

Abstract

In this paper, we present a framework that gives users a simple, convenient, and powerful way to deploy multiple message queue systems on demand in a Hadoop cluster. Specifically, we leverage Apache Kafka, a state-of-the-art distributed message queue system that achieves high throughput, low latency, and good load balancing. Our framework automates setting up and starting Kafka brokers on the fly, so users can adopt Kafka quickly without spending much effort on installation and configuration. In addition, the framework lets users run their Kafka-based applications without detailed knowledge of the Hadoop YARN APIs and underlying mechanisms. We present a use case in which the framework is used to evaluate Kafka’s performance across various test cases and working scenarios. The experimental results help potential Kafka users understand how different settings influence queuing performance.
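
To make the idea of automated, on-demand broker setup more concrete, the following is a minimal sketch of how per-broker Kafka configuration could be generated for several independent queue systems sharing one ZooKeeper ensemble. It is not the framework described in the paper: the names `BrokerSpec`, `plan_cluster`, and `render_server_properties`, the host names, and the log directory are all hypothetical, and launching the brokers inside YARN containers is left out. Only the property keys (`broker.id`, `listeners`, `log.dirs`, `zookeeper.connect`) are standard Kafka broker settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BrokerSpec:
    """Hypothetical description of one broker in one on-demand Kafka cluster."""
    cluster_name: str
    broker_id: int
    host: str
    port: int


def plan_cluster(cluster_name: str, hosts: List[str], base_port: int = 9092) -> List[BrokerSpec]:
    """Assign one broker per host, with broker ids 0..n-1 (an assumed policy)."""
    return [BrokerSpec(cluster_name, i, host, base_port) for i, host in enumerate(hosts)]


def render_server_properties(spec: BrokerSpec, zookeeper_hosts: List[str],
                             log_root: str = "/tmp/kafka-logs") -> str:
    """Render the text of a server.properties file for one broker.

    Each cluster gets its own ZooKeeper chroot path so that several
    Kafka clusters can coexist on the same ZooKeeper ensemble.
    """
    chroot = f"/{spec.cluster_name}"
    lines = [
        f"broker.id={spec.broker_id}",
        f"listeners=PLAINTEXT://{spec.host}:{spec.port}",
        f"log.dirs={log_root}/{spec.cluster_name}/broker-{spec.broker_id}",
        f"zookeeper.connect={','.join(zookeeper_hosts)}{chroot}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two independent queue systems on hypothetical nodes.
    zk = ["zk1:2181", "zk2:2181"]
    for name, hosts in [("queue-a", ["node1", "node2"]), ("queue-b", ["node3"])]:
        for spec in plan_cluster(name, hosts):
            print(f"# {name} / broker {spec.broker_id}")
            print(render_server_properties(spec, zk))
```

Giving each cluster a separate ZooKeeper chroot is one common way to keep multiple Kafka deployments isolated; whether the paper's framework uses this approach is an assumption of the sketch.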

Acknowledgements

This work was supported by an Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0190-16-2012, High Performance Big Data Analytics Platform Performance Acceleration Technologies Development).

Author information

Corresponding author

Correspondence to Jik-Soo Kim.

Cite this article

Nguyen, C.N., Hwang, S. & Kim, JS. Making a case for the on-demand multiple distributed message queue system in a Hadoop cluster. Cluster Comput 20, 2095–2106 (2017). https://doi.org/10.1007/s10586-017-1031-0

  • DOI: https://doi.org/10.1007/s10586-017-1031-0
