On the role of message broker middleware for many-task computing on a big-data platform

Cluster Computing

Abstract

We have designed and implemented a new data processing framework called “Many-task computing On HAdoop” (MOHA), which aims to effectively support fine-grained many-task applications, a further type of data-intensive workload, on the YARN-based Hadoop 2.0 platform. MOHA is developed as a Hadoop YARN application so that it can transparently co-host existing many-task computing (MTC) applications with other data processing workflows such as MapReduce in a single Hadoop cluster. In this paper, we investigate the main characteristics of two well-known open-source message broker middleware systems (Apache ActiveMQ and Kafka) and their implications for the many-task management scheme in our MOHA framework. Through extensive experiments with a real MTC application, we demonstrate and discuss the trade-offs between parallelism and load balancing of data access patterns in message broker middleware systems for many-task computing on Hadoop.

Acknowledgements

This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. R0190-16-2012, High Performance Big Data Analytics Platform Performance Acceleration Technologies Development), and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (No. 2015R1C1A1A02036524).

Author information

Corresponding author

Correspondence to Jik-Soo Kim.

About this article

Cite this article

Nguyen, C.N., Lee, J., Hwang, S. et al. On the role of message broker middleware for many-task computing on a big-data platform. Cluster Comput 22 (Suppl 1), 2527–2540 (2019). https://doi.org/10.1007/s10586-018-2634-9

  • DOI: https://doi.org/10.1007/s10586-018-2634-9
