A Study on the Performance and Scalability of Apache Flink Over Hadoop MapReduce

Pankaj Lathar, K. G. Srinivasa
Copyright: © 2019 | Volume: 2 | Issue: 1 | Pages: 13
ISSN: 2572-4908 | EISSN: 2572-4894 | EISBN13: 9781522568902 | DOI: 10.4018/IJFC.2019010103
Cite Article

MLA

Lathar, Pankaj, and K. G. Srinivasa. "A Study on the Performance and Scalability of Apache Flink Over Hadoop MapReduce." IJFC vol.2, no.1 2019: pp.61-73. http://doi.org/10.4018/IJFC.2019010103

APA

Lathar, P. & Srinivasa, K. G. (2019). A Study on the Performance and Scalability of Apache Flink Over Hadoop MapReduce. International Journal of Fog Computing (IJFC), 2(1), 61-73. http://doi.org/10.4018/IJFC.2019010103

Chicago

Lathar, Pankaj, and K. G. Srinivasa. "A Study on the Performance and Scalability of Apache Flink Over Hadoop MapReduce," International Journal of Fog Computing (IJFC) 2, no.1: 61-73. http://doi.org/10.4018/IJFC.2019010103

Abstract

With advances in science and technology, data is being generated at a staggering rate. This raw data is generally of high value and may conceal important information with the potential to solve several real-world problems. To extract this information, the available raw data must be processed and analysed efficiently. It has been observed, however, that such raw data is generated faster than traditional methods can process it. This has led to the emergence of the popular parallel processing programming model, MapReduce. In this study, the authors perform a comparative analysis of two popular data processing engines: Apache Flink and Hadoop MapReduce. The analysis is based on the parameters of scalability, reliability and efficiency. The results reveal that Flink unambiguously outperforms Hadoop's MapReduce. Flink's edge over MapReduce can be attributed to the following features: Active Memory Management, Dataflow Pipelining and an Inline Optimizer. It can be concluded that, as the complexity and magnitude of real-time raw data are continuously increasing, it is essential to explore newer platforms that are capable of processing such data adequately and efficiently.
