Skip to main content
Log in

MapReduce and Its Applications, Challenges, and Architecture: a Comprehensive Review and Directions for Future Research

  • Published:
Journal of Grid Computing Aims and scope Submit manuscript

Abstract

Profound attention to MapReduce framework has been caught by many different areas. It is presently a practical model for data-intensive applications due to its simple interface of programming, high scalability, and ability to withstand the subjection to flaws. Also, it is capable of processing a high proportion of data in distributed computing environments (DCE). MapReduce, on numerous occasions, has proved to be applicable to a wide range of domains. However, despite the significance of the techniques, applications, and mechanisms of MapReduce, there is no comprehensive study at hand in this regard. Thus, this paper not only analyzes the MapReduce applications and implementations in general, but it also provides a discussion of the differences between varied implementations of MapReduce as well as some guidelines for planning future research.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Wang, B., Huang, S., Qiu, J., Liu, Y., Wang, G.: Parallel online sequential extreme learning machine based on MapReduce. Neurocomputing 149, 224–232 (2015)

    Article  Google Scholar 

  2. Marozzo, F., Talia, D., Trunfio, P.: P2P-MapReduce: parallel data processing in dynamic Cloud environments. J. Comput. Syst. Sci. 78, 1382–1402 (2012)

    Article  Google Scholar 

  3. Mohamed, H., Marchand-Maillet, S.: MRO-MPI: MapReduce overlapping using MPI and an optimized data exchange policy. Parallel Comput. 39, 851–866 (2013)

    Article  Google Scholar 

  4. Barre, B., Klein, M., Soucy-Boivin, M., Ollivier, P.-A., Hallé, S.: MapReduce for parallel trace validation of LTL properties. In: Runtime Verification, pp. 184–198 (2013)

  5. Lu, L., Shi, X., Jin, H., Wang, Q., Yuan, D., Wu, S.: Morpho: a decoupled MapReduce framework for elastic cloud computing. Futur. Gener. Comput. Syst. 36, 80–90 (2014)

    Article  Google Scholar 

  6. Dean, J., Ghemawat, S.: MapReduce: a flexible data processing tool. Commun. ACM 53, 72–77 (2010)

    Article  Google Scholar 

  7. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51, 107–113 (2008)

    Article  Google Scholar 

  8. Kolb, L., Thor, A., Rahm, E.: Multi-pass sorted neighborhood blocking with MapReduce. Comput. Sci. Res. Dev. 27, 45–63 (2012)

    Article  Google Scholar 

  9. Anjos, J.C., Carrera, I., Kolberg, W., Tibola, A.L., Arantes, L.B., Geyer, C.R.: MRA++: scheduling and data placement on MapReduce for heterogeneous environments. Futur. Gener. Comput. Syst. 42, 22–35 (2015)

    Article  Google Scholar 

  10. Zhang, J., Wong, J.-S., Li, T., Pan, Y.: A comparison of parallel large-scale knowledge acquisition using rough set theory on different MapReduce runtime systems. Int. J. Approx. Reason. 55, 896–907 (2014)

    Article  Google Scholar 

  11. Slagter, K., Hsu, C.-H., Chung, Y.-C., Yi, G.: SmartJoin: a network-aware multiway join for MapReduce. Clust. Comput. 17, 1–13 (2014)

    Article  Google Scholar 

  12. Xiao, Z., Xiao, Y.: Achieving accountable MapReduce in cloud computing. Futur. Gener. Comput. Syst. 30, 1–13 (2014)

    Article  Google Scholar 

  13. Plantenga, T.D., Choe, Y.R., Yoshimura, A.: Using performance measurements to improve mapreduce algorithms. Procedia Comput. Sci. 9, 1920–1929 (2012)

    Article  Google Scholar 

  14. Polato, I., Ré, R., Goldman, A., Kon, F.: A comprehensive view of Hadoop research—a systematic literature review. J. Netw. Comput. Appl. 46, 1–25 (2014)

    Article  Google Scholar 

  15. Shamsi, J., Khojaye, M.A., Qasmi, M.A.: Data-intensive cloud computing: requirements, expectations, challenges, and solutions. J. Grid Comput. 11, 281–310 (2013)

    Article  Google Scholar 

  16. Plimpton, S.J., Devine, K.D.: MapReduce in MPI for large-scale graph algorithms. Parallel Comput. 37, 610–632 (2011)

    Article  Google Scholar 

  17. Wolf, J., Balmin, A., Rajan, D., Hildrum, K., Khandekar, R., Parekh, S., et al.: On the optimization of schedules for MapReduce workloads in the presence of shared scans. VLDB J.—Int. J. Very Large Data Bases 21, 589–609 (2012)

    Article  Google Scholar 

  18. Aznoli, F., Navimipour, N.J.: Cloud services recommendation: Reviewing the recent advances and suggesting the future research directions. J. Netw. Comput. Appl. 77, 73–86 (2017)

    Article  Google Scholar 

  19. Vakili, A., Navimipour, N.J.: Comprehensive and systematic review of the service composition mechanisms in the cloud environments. J. Netw. Comput. Appl. 81, 24–36 (2017)

    Article  Google Scholar 

  20. Yang, H., Luan, Z., Li, W., Qian, D.: MapReduce workload modeling with statistical approach. J. Grid Comput. 10, 279–310 (2012)

    Article  Google Scholar 

  21. Choi, J., Choi, C., Ko, B., Kim, P.: A method of DDoS attack detection using HTTP packet pattern and rule engine in cloud computing environment. Soft Comput. 18, 1697–1703 (2014)

    Article  Google Scholar 

  22. Chiregi, M., Navimipour, N.J.: A new method for trust and reputation evaluation in the cloud environments using the recommendations of opinion leaders’ entities and removing the effect of troll entities. Comput. Hum. Behav. 60, 280–292 (2016)

    Article  Google Scholar 

  23. Chiregi, M., Navimipour, N.J.: A comprehensive study of the trust evaluation mechanisms in the cloud computing. J. Serv. Sci. Res. 9, 1–30 (2017)

    Article  Google Scholar 

  24. Navimipour, N.J., Rahmani, A.M., Navin, A.H., Hosseinzadeh, M.: Expert Cloud: a Cloud-based framework to share the knowledge and skills of human resources. Comput. Hum. Behav. 46, 57–74 (2015)

    Article  Google Scholar 

  25. Keshanchi, B., Souri, A., Navimipour, N.J.: An improved genetic algorithm for task scheduling in the cloud environments using the priority queues: formal verification, simulation, and statistical testing. J. Syst. Softw. 124, 1–21 (2017)

    Article  Google Scholar 

  26. Hazratzadeh, S., Navimipour, N.J.: Colleague recommender system in the Expert Cloud using the features matrix. Kybernetes 45, 1–30 (2017)

    Google Scholar 

  27. Mohammadi, S.Z., Navimipour, J.N.: Invalid cloud providers’ identification using the support vector machine. Int. J. Next-Generation Comput. 8, 82–89 (2017)

    Google Scholar 

  28. Zhang, J., Xiang, D., Li, T., Pan, Y.: M2M: a simple Matlab-to-MapReduce translator for cloud computing. Tsinghua Sci. Technol. 18, 1–9 (2013)

    Article  Google Scholar 

  29. Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endow. 5, 716–727 (2012)

    Article  Google Scholar 

  30. Cormack, G.V., Smucker, M.D., Clarke, C.L.: Efficient and effective spam filtering and re-ranking for large web datasets. Inf. Retr. 14, 441–465 (2011)

    Article  Google Scholar 

  31. Lin, J.: Brute force and indexed approaches to pairwise document similarity comparisons with MapReduce. In: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 155–162 (2009)

  32. Zhao, W., Ma, H., He, Q: Parallel k-means clustering based on mapreduce. In: Cloud Computing, pp. 674–679. Springer, Berlin (2009)

  33. Baraglia, R., De Francisci Morales, G., Lucchese, C.: Document similarity self-join with MapReduce. In: 2010 IEEE 10th International Conference on Data Mining (ICDM), pp. 731–736 (2010)

  34. Caruana, G., Li, M., Liu, Y.: An ontology enhanced parallel SVM for scalable spam filter training. Neurocomputing 108, 45–57 (2013)

    Article  Google Scholar 

  35. Liao, R., Zhang, Y., Guan, J., Zhou, S.: CloudNMF: a MapReduce implementation of nonnegative matrix factorization for large-scale biological datasets. Genomics Proteomics Bioinforma. 12, 48–51 (2014)

    Article  Google Scholar 

  36. Svendsen, M., Tirthapura, S.: Mining maximal cliques from a large graph using MapReduce: tackling highly uneven subproblem sizes. J. Parallel Distrib. Comput. 79, 104–114 (2012)

    Google Scholar 

  37. Lee, K.-H., Lee, Y.-J., Choi, H., Chung, Y.D., Moon, B.: Parallel data processing with MapReduce: a survey. ACM SIGMOD Rec. 40, 11–20 (2012)

    Article  Google Scholar 

  38. Li, R., Hu, H., Li, H., Wu, Y., Yang, J.: Mapreduce parallel programming model: a state-of-the-art survey. Int. J. Parallel Prog. 44, 832–866 (2016)

    Article  Google Scholar 

  39. Khezr, S.N., Navimipour, N.J.: MapReduce and its application in optimization algorithms: a comprehensive study. Majlesi J. Multimed. Process. 4, 31–33 (2015)

  40. Vijayalakshmi, V., Akila, A., Nagadivya, S.: The survey on MapReduce. Int. J. Eng. Sci. Technol. 4, 3335–3342 (2012)

  41. Kalavri, V., Vlassov, V.: Mapreduce: limitations, optimizations and open issues. In: 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 1031–1038 (2013)

  42. Debortoli, S., Müller, O., vom Brocke, J.: Comparing business intelligence and big data skills. Bus. Inf. Syst. Eng. 6, 289–300 (2014)

    Article  Google Scholar 

  43. Lin, J., Dyer, C.: Data-intensive text processing with MapReduce. Synth. Lect. Human Lang. Technol. 3, 1–177 (2010)

    Article  Google Scholar 

  44. Jain, R., Sarkar, P., Subhraveti, D.: Gpfs-snc: an enterprise cluster file system for big data. IBM J. Res. Dev. 57, 5:1–5:10 (2013)

    Article  Google Scholar 

  45. Lee, D., Kim, J.-S., Maeng, S.: Large-scale incremental processing with MapReduce. Futur. Gener. Comput. Syst. 36, 66–79 (2014)

    Article  Google Scholar 

  46. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., et al.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, pp. 2–2 (2012)

  47. Zhao, Y., Wu, J.: Dache: a data aware caching for big-data applications using the MapReduce framework. In: INFOCOM, 2013 Proceedings IEEE, pp. 35–39 (2013)

  48. Costa, P., Donnelly, A., Rowstron, A.I., O’Shea, G.: Camdoop: exploiting in-network aggregation for big data applications. In: NSDI, pp. 3–3 (2012)

  49. Pandey, S, Tokekar, V.: Prominence of MapReduce in Big Data Processing. In: 2014 Fourth International Conference on Communication Systems and Network Technologies (CSNT), pp. 555–560 (2014)

  50. Ji, C., Li, Z., Qu, W., Xu, Y., Li, Y.: Scalable nearest neighbor query processing based on Inverted Grid Index. J. Netw. Comput. Appl. 44, 172–182 (2014)

    Article  Google Scholar 

  51. Liu, J., Pacitti, E., Valduriez, P., Mattoso, M.: A survey of data-intensive scientific workflow management. J. Grid Comput. 13, 1–37 (2015)

    Article  Google Scholar 

  52. Wu, T.-Y., Chen, C.-Y., Kuo, L.-S., Lee, W.-T., Chao, H.-C.: Cloud-based image processing system with priority-based data distribution mechanism. Comput. Commun. 35, 1809–1818 (2012)

    Article  Google Scholar 

  53. Senger, H., Gil-Costa, V., Arantes, L., Marcondes, C.A.C., Marín, M., Sato, L.M., et al.: BSP cost and scalability analysis for MapReduce operations. Concurr. Comput. Pract. Exp. 28, 2503–2527 (2016)

    Article  Google Scholar 

  54. Idris, M., Hussain, S., Ali, M., Abdulali, A., Siddiqi, M.H., Kang, B.H., et al.: Context-aware scheduling in MapReduce: a compact review. Concurr. Comput. Pract. Exp. 27, 5332–5349 (2015)

    Article  Google Scholar 

  55. Lee, C.-W., Hsieh, K.-Y., Hsieh, S.-Y., Hsiao, H.-C.: A dynamic data placement strategy for Hadoop in heterogeneous environments. Big Data Res. 1, 14–22 (2014)

    Article  Google Scholar 

  56. Aridhi, S., d’Orazio, L., Maddouri, M., Mephu Nguifo, E.: Density-based data partitioning strategy to approximate large-scale subgraph mining. Inf. Syst. 48, 213–223 (2015)

    Article  Google Scholar 

  57. Ding, L., Wang, G., Xin, J., Wang, X., Huang, S., Zhang, R.: ComMapReduce: an improvement of mapreduce with lightweight communication mechanisms. Data Knowl. Eng. 88, 224–247 (2013)

    Article  Google Scholar 

  58. Laclavík, M., Šeleng, M., Hluchý, L.: Towards large scale semantic annotation built on mapreduce architecture. In: Computational Science–ICCS 2008. Springer, pp. 331–338 (2008)

  59. Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D.: Dryad: distributed data-parallel programs from sequential building blocks. In: ACM SIGOPS Operating Systems Review, pp. 59–72 (2007)

  60. Yoo, R.M., Romano, A., Kozyrakis, C: Phoenix rebirth: scalable MapReduce on a large-scale shared-memory system. In: IEEE International Symposium on Workload Characterization, 2009. IISWC 2009, pp. 198–207 (2009)

  61. Fang, W., He, B., Luo, Q., Govindaraju, N.K.: Mars: accelerating mapreduce with graphics processors. IEEE Trans. Parallel Distrib. Syst. 22, 608–620 (2011)

    Article  Google Scholar 

  62. Ekanayake, J., Li, H., Zhang, B., Gunarathne, T., Bae, S.-H., Qiu, J., et al.: Twister: a runtime for iterative mapreduce. In: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pp. 810–818 (2010)

  63. Pan, J., Biannic, Y.L., Magoules, F.: Parallelizing multiple group-by query in share-nothing environment: a MapReduce study case. In: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pp. 856–863 (2010)

  64. Aarnio, T: Parallel data processing with MapReduce. In: TKK T-110.5190, Seminar on Internetworking (2009)

  65. Ghemawat, S., Gobioff, H., Leung, S.-T.: The Google file system. In: ACM SIGOPS Operating Systems Review, pp. 29–43 (2003)

  66. Liu, Y., Li, M., Alham, N.K., Hammoud, S.: HSim: a MapReduce simulator in enabling cloud computing. Futur. Gener. Comput. Syst. 29, 300–308 (2013)

    Article  Google Scholar 

  67. Wang, L., Tao, J., Ranjan, R., Marten, H., Streit, A., Chen, J., et al.: G-Hadoop: MapReduce across distributed data centers for data-intensive computing. Futur. Gener. Comput. Syst. 29, 739–750 (2013)

    Article  Google Scholar 

  68. Rasooli, A., Down, D.G.: Guidelines for Selecting Hadoop Schedulers Based on System Heterogeneity. J. Grid Comput. 12, 499–519 (2014)

    Article  Google Scholar 

  69. Kala Karun, A., Chitharanjan, K.: A review on hadoop—HDFS infrastructure extensions. In: 2013 IEEE Conference on Information & Communication Technologies (ICT), pp. 132–137 (2013)

  70. Vaidya, M: Parallel processing of cluster by map reduce. Int. J. Distrib. Parallel Syst. 3, 167 (2012)

  71. Gu, R., Yang, X., Yan, J., Sun, Y., Wang, B., Yuan, C., et al.: SHadoop: improving MapReduce performance by optimizing job execution mechanism in Hadoop clusters. J. Parallel Distrib. Comput. 74, 2166–2179 (2014)

    Article  Google Scholar 

  72. O’Driscoll, A., Daugelaite, J., Sleator, R.D.: ‘Big data’, Hadoop and cloud computing in genomics. J. Biomed. Inform. 46, 774–781 (2013)

    Article  MATH  Google Scholar 

  73. Vijayalakshmi, V., Akila, A, Nagadivya, S.: The survey on mapreduce. Int. J. Eng. Sci. 4, 3335–3342 (2012)

  74. Borthakur, D.: The hadoop distributed file system: architecture and design. Hadoop Project Website 11, 21 (2007)

    Google Scholar 

  75. He, W., Cui, H., Lu, B., Zhao, J., Li, S., Ruan, G., et al.: Hadoop+: modeling and evaluating the heterogeneity for MapReduce applications in heterogeneous clusters. In: Proceedings of the 29th ACM on International Conference on Supercomputing, pp. 143–153 (2015)

  76. He, B., Fang, W., Luo, Q., Govindaraju, N.K., Wang, T.: Mars: a MapReduce framework on graphics processors. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pp. 260–269 (2008)

  77. Ranger, C., Raghuraman, R., Penmetsa, A., Bradski, G., Kozyrakis, C: Evaluating mapreduce for multi-core and multiprocessor systems. In: IEEE 13th International Symposium on High Performance Computer Architecture, 2007. HPCA 2007, pp. 13–24 (2007)

  78. Chen, R., Chen, H., Zang, B.: Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling. In: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, pp. 523–534 (2010)

  79. Chen, Y., Qiao, Z., Jiang, H., Li, K.-C., Ro, W.W.: Mgmr: Multi-gpu based mapreduce. In: Grid and Pervasive Computing, pp. 433–442. Springer (2013)

  80. Gu, Y., Grossman, R.L.: Sector and Sphere: the design and implementation of a high-performance data cloud. Philos. Trans. R. Soc. Lond. A: Math. Phys. Eng. Sci. 367, 2429–2445 (2009)

    Article  Google Scholar 

  81. Zhang, Y., Gao, Q., Gao, L., Wang, C.: imapreduce: a distributed computing framework for iterative computation. J. Grid Comput. 10, 47–68 (2012)

    Article  Google Scholar 

  82. Liu, Q., Todman, T., Luk, W., Constantinides, G.A.: Automated mapping of the MapReduce pattern onto parallel computing platforms. J. Signal Process. Syst. 67, 65–78 (2012)

    Article  Google Scholar 

  83. Qian, J., Miao, D., Zhang, Z., Yue, X.: Parallel attribute reduction algorithms using MapReduce. Inf. Sci. 279, 671–690 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  84. Derbeko, P., Dolev, S., Gudes, E., Sharma, S.: Security and privacy aspects in MapReduce on clouds: a survey. Comput. Sci. Rev. 20, 1–28 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  85. Xia, T: Large-scale sms messages mining based on map-reduce. In: International Symposium on Computational Intelligence and Design, 2008. ISCID’08, pp. 7–12 (2008)

  86. Jin, C., Vecchiola, C., Buyya, R.: MRPGA: an extension of MapReduce for parallelizing genetic algorithms. In: IEEE Fourth International Conference on eScience, 2008. eScience’08, pp. 214–221 (2008)

  87. Xu, B., Gao, J., Li, C.: An efficient algorithm for DNA fragment assembly in MapReduce. Biochem. Biophys. Res. Commun. 426, 395–398 (2012)

    Article  Google Scholar 

  88. Hsu, C.-Y., Yang, C.-S., Yu, L.-C., Lin, C.-F., Yao, H.-H., Chen, D.-Y., et al.: Development of a cloud-based service framework for energy conservation in a sustainable intelligent transportation system. Int. J. Prod. Econ. 164, 454–461 (2015)

    Article  Google Scholar 

  89. Zhang, F., Cao, J.: A task-level adaptive mapreduce framework for real-time streaming data in healthcare applications. Futur. Gener. Comput. Syst. 43, 149–160 (2015)

    Article  Google Scholar 

  90. López, V., del Río, S., Benítez, J.M., Herrera, F.: Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data. Fuzzy Sets Syst. (2014)

  91. Xu, X., Ji, Z., Yuan, F., Liu, X.: A novel parallel approach of cuckoo search using MapReduce. In: 2014 International Conference on Computer, Communications and Information Technology (CCIT 2014) (2014)

  92. Bi, X., Zhao, X., Wang, G., Zhang, P., Wang, C.: Distributed extreme learning machine with kernels based on MapReduce. Neurocomputing 149, 456–463 (2015)

    Article  Google Scholar 

  93. del Río, S., López, V., Benítez, J.M., Herrera, F.: On the use of MapReduce for imbalanced big data using Random Forest. Inf. Sci. 285, 112–137 (2014)

    Article  Google Scholar 

  94. Kim, J., Chou, J., Rotem, D.: iPACS: power-aware covering sets for energy proportionality and performance in data parallel computing clusters. J. Parallel Distrib. Comput. 74, 1762–1774 (2014)

    Article  Google Scholar 

  95. Paniagua, C., Flores, H., Srirama, S.N.: Mobile sensor data classification for human activity recognition using MapReduce on cloud. Procedia Comput. Sci. 10, 585–592 (2012)

    Article  Google Scholar 

  96. Urbani, J., Kotoulas, S., Maassen, J., Van Harmelen, F., Bal, H.: WebPIE: a web-scale parallel inference engine using MapReduce. Web Semant. Sci. Serv. Agents World Wide Web 10, 59–75 (2012)

    Article  Google Scholar 

  97. Li, Z., Shen, Y., Yao, B., Guo, M.: OFScheduler: a dynamic network optimizer for MapReduce in heterogeneous cluster. Int. J. Parallel Prog. 43, 1–17 (2013)

    Google Scholar 

  98. Rizvandi, N.B., Taheri, J., Moraveji, R., Zomaya, A.Y.: Network load analysis and provisioning of MapReduce applications. In: 2012 13th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pp. 161–166 (2012)

  99. Maurya, M., Mahajan, S.: Performance analysis of MapReduce Programs on Hadoop cluster. In: 2012 World Congress on Information and Communication Technologies (WICT), pp. 505–510 (2012)

  100. Ahmad, F., Chakradhar, S.T., Raghunathan, A., Vijaykumar, T.: Tarazu: optimizing mapreduce on heterogeneous clusters. In: ACM SIGARCH Computer Architecture News, pp. 61–74 (2012)

  101. Ahmad, F., Lee, S., Thottethodi, M., Vijaykumar, T: Puma: purdue mapreduce benchmarks suite (2012)

  102. Brandt, A.: Algebraic analysis of MapReduce samples. Bachelor Thesis, University of Koblenz-Landau (2010)

    Google Scholar 

  103. Verikas, A., Gelzinis, A., Bacauskiene, M.: Mining data with random forests: a survey and results of new tests. Pattern Recogn. 44, 330–349 (2011)

    Article  Google Scholar 

  104. Miner, D., Shook, A.: MapReduce design patterns: building effective algorithms and analytics for Hadoop and other systems. O’Reilly Media, Inc. (2012)

  105. Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. HotCloud 10, 95 (2010)

    Google Scholar 

  106. Xin, J., Wang, Z., Qu, L., Wang, G.: Elastic extreme learning machine for big data classification. Neurocomputing 149, 464–471 (2015)

    Article  Google Scholar 

  107. He, Q., Shang, T., Zhuang, F., Shi, Z.: Parallel extreme learning machine for regression based on MapReduce. Neurocomputing 102, 52–58 (2013)

    Article  Google Scholar 

  108. Huang, G.-B., Zhu, Q.-Y., Siew, C.-K.: Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proceedings. 2004 IEEE International Joint Conference on Neural Networks, 2004, pp. 985–990 (2004)

  109. Huang, G.-B., Chen, L., Siew, C.-K.: Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans. Neural Netw. 17, 879–892 (2006)

    Article  Google Scholar 

  110. Huang, G.-B., Chen, L.: Convex incremental extreme learning machine. Neurocomputing 70, 3056–3062 (2007)

    Article  Google Scholar 

  111. Huang, G.-B., Zhu, Q.-Y., Siew, C.-K.: Extreme learning machine: theory and applications. Neurocomputing 70, 489–501 (2006)

    Article  Google Scholar 

  112. Alamir, P, Navimipour, N.J.: Trust evaluation between the users of social networks using the quality of service requirements and call log histories. Kybernetes 45, 1505–1523 (2016)

  113. Mohammad Aghdam, S., Navimipour, N.J.: Opinion leaders selection in the social networks based on trust relationships propagation. Karbala Int. J. Modern Sci. 2, 88–97 (2016)

    Article  Google Scholar 

  114. Nourozi, M., Souri, A., Navimipour, N.J.: User relationship management approach for human behavior interactions in the social networks: behavioral modeling and formal verification. Behav. Inf. Technol. (2018, in press)

  115. Liu, G., Zhang, M., Yan, F.: Large-scale social network analysis based on mapreduce. In: 2010 International Conference on Computational Aspects of Social Networks (CASoN), pp. 487–490 (2010)

  116. Yang, S.-J., Chen, Y.-R.: Design adaptive task allocation scheduler to improve MapReduce performance in heterogeneous clouds. J. Netw. Comput. Appl. 57, 61–70, 11// (2015)

    Article  Google Scholar 

  117. Eberhart, R., Kennedy, J.: A new optimizer using particle swarm theory. In: Proceedings of the Sixth International Symposium on Micro Machine and Human Science, 1995. MHS’95, pp. 39–43 (1995)

  118. Shi, Y., Eberhart, R.C.: Empirical study of particle swarm optimization. In: Proceedings of the 1999 Congress on Evolutionary Computation, 1999. CEC 99 (1999)

  119. Sheikholeslami, F., Navimipour, J.N.: Service allocation in the cloud environments using multi-objective particle swarm optimization algorithm based on crowding distance. Swarm Evol. Comput. 35, 53–64 (2017)

    Article  Google Scholar 

  120. McNabb, A.W., Monson, C.K., Seppi, K.D.: Parallel pso using mapreduce. In: IEEE Congress on Evolutionary Computation, 2007. CEC 2007, pp. 7–14 (2007)

  121. Gandomi, A.H., Yang, X.-S., Alavi, A.H.: Cuckoo search algorithm: a metaheuristic approach to solve structural optimization problems. Eng. Comput. 29, 17–35 (2013)

    Article  Google Scholar 

  122. Navimipour, N.J., Milani, F.S.: Task scheduling in the cloud computing based on the cuckoo search algorithm. Int. J. Model. Optim. 5, 44 (2015)

    Article  Google Scholar 

  123. Li, H., Wei, X., Fu, Q., Luo, Y.: MapReduce delay scheduling with deadline constraint. Concurr. Comput. Pract. Exp. 26, 766–778 (2014)

    Article  Google Scholar 

  124. Asghari, S., Navimipour, J.N.: Cloud services composition using an inverted ant colony optimization algorithm. Int. J. Bio-Inspired Comput. (2017, in press)

  125. Asghari, S., Navimipour, J.N.: Resource discovery in peer to peer networks using an inverted ant colony optimization algorithm. Peer-to-Peer Netw. Appl. (2017, in press)

  126. Azad, P., Navimipour, N.J.: An energy-aware task scheduling in cloud computing using a hybrid cultural and ant colony optimization algorithm. Int. J. Cloud Appl. Comput. 7 (2017, in press)

  127. Dréo, J., Siarry, P.: A new ant colony algorithm using the heterarchical concept aimed at optimization of multiminima continuous functions. In: Ant Algorithms. Springer, pp. 216–221 (2002)

  128. Wu, B., Wu, G., Yang, M.: A mapreduce based ant colony optimization approach to combinatorial optimization problems. In: 2012 Eighth International Conference on Natural Computation (ICNC), pp. 728–732 (2012)

  129. Wang, H., Xu, Z., Pedrycz, W.: An overview on the roles of fuzzy set techniques in big data processing: trends, challenges and opportunities. Knowl.-Based Syst. 118, 15–30 (2016)

    Article  Google Scholar 

  130. Li, X., Song, J., Zhang, F., Ouyang, X., Khan, S.U.: MapReduce-based fast fuzzy c-means algorithm for large-scale underwater image segmentation. Futur. Gener. Comput. Syst. 65, 90–101 (2016)

    Article  Google Scholar 

  131. Cheng, S.-T., Wang, H.-C., Chen, Y.-J., Chen, C.-F.: Performance analysis using petri net based MapReduce model in heterogeneous clusters. In: Advances in Web-Based Learning–ICWL 2013 Workshops, pp. 170–179 (2013)

  132. Jayasree, M.: Data mining: exploring big data using Hadoop and MapReduce (2008)

  133. Mesmoudi, A., Hacid, M.-S., Toumani, F.: Benchmarking SQL on MapReduce systems using large astronomy databases. Distrib. Parallel Databases 34, 1–32 (2015)

    Google Scholar 

  134. Wu, L., Yuan, L., You, J.: Survey of large-scale data management systems for big data applications. J. Comput. Sci. Technol. 30, 163–183 (2015)

    Article  Google Scholar 

  135. Müller, G., Sonehara, N., Echizen, I., Wohlgemuth, S.: Sustainable cloud computing. Bus. Inf. Syst. Eng. 3, 129–131 (2011)

    Article  Google Scholar 

  136. Milani, A.S., Navimipour, N.J.: Load balancing mechanisms and techniques in the cloud environments: systematic literature review and future trends. J. Netw. Comput. Appl. 71, 86–89 (2016)

    Article  Google Scholar 

  137. Milani, B.A., Navimipour, N.J.: A comprehensive review of the data replication techniques in the cloud environments: major trends and future directions. J. Netw. Comput. Appl. 64, 229–238 (2016)

    Article  Google Scholar 

  138. Ashouraie, M., Navimipour, N.J.: Priority-based task scheduling on heterogeneous resources in the Expert Cloud. Kybernetes 44, 1455–1471 (2015)

    Article  Google Scholar 

  139. Chiregi, M., Navimipour, N.J.: Trusted services identification in the cloud environment using the topological metrics. Karbala Int. J. Modern Sci. 2, 203–210 (2016)

    Article  Google Scholar 

  140. Sun, Y., Qi, J., Zhang, R., Chen, Y., Du, X.: MapReduce based location selection algorithm for utility maximization with capacity constraints. Computing 97, 1–21 (2013)

    MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nima Jafari Navimipour.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Khezr, S.N., Navimipour, N.J. MapReduce and Its Applications, Challenges, and Architecture: a Comprehensive Review and Directions for Future Research. J Grid Computing 15, 295–321 (2017). https://doi.org/10.1007/s10723-017-9408-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10723-017-9408-0

Keywords

Navigation