On the role of message broker middleware for many-task computing on a big-data platform

Nguyen, Cao Ngoc; Lee, Jaehwan; Hwang, Soonwook; Kim, Jik-Soo

doi:10.1007/s10586-018-2634-9

Published: 31 March 2018

Cluster Computing, volume 22, pages 2527–2540 (2019)

Cao Ngoc Nguyen¹,
Jaehwan Lee²,
Soonwook Hwang¹ &
Jik-Soo Kim³ (ORCID: orcid.org/0000-0002-0104-4617)

Abstract

We have designed and implemented a new data processing framework called “Many-task computing On HAdoop” (MOHA), which aims to effectively support fine-grained many-task applications, another type of data-intensive workload, on the YARN-based Hadoop 2.0 platform. MOHA is developed as a Hadoop YARN application so that it can transparently co-host existing many-task computing (MTC) applications with other data processing workflows, such as MapReduce, in a single Hadoop cluster. In this paper, we investigate the main characteristics of two well-known open-source message broker middleware systems (Apache ActiveMQ and Apache Kafka) and their implications for the many-task management scheme in our MOHA framework. Through extensive experiments with a real MTC application, we demonstrate and discuss the trade-offs between parallelism and load balancing in the data access patterns of message broker middleware systems for many-task computing on Hadoop.

Acknowledgements

This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. R0190-16-2012, High Performance Big Data Analytics Platform Performance Acceleration Technologies Development), and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (No. 2015R1C1A1A02036524).

Author information

Authors and Affiliations

Korea Institute of Science and Technology Information, University of Science & Technology, Daejeon, Republic of Korea
Cao Ngoc Nguyen & Soonwook Hwang
School of Electronics and Information Engineering, Korea Aerospace University, Goyang, Republic of Korea
Jaehwan Lee
Department of Computer Engineering, Myongji University, Yongin, Republic of Korea
Jik-Soo Kim

Corresponding author

Correspondence to Jik-Soo Kim.

About this article

Cite this article

Nguyen, C.N., Lee, J., Hwang, S. et al. On the role of message broker middleware for many-task computing on a big-data platform. Cluster Comput 22 (Suppl 1), 2527–2540 (2019). https://doi.org/10.1007/s10586-018-2634-9

Received: 24 January 2018
Accepted: 20 March 2018
Published: 31 March 2018
Issue Date: 16 January 2019
DOI: https://doi.org/10.1007/s10586-018-2634-9
