
A Modeling Language for MapReduce Programing in a Storage System Perspective

Journal of Signal Processing Systems

Abstract

MapReduce is a powerful programming model for distributed data analysis. It runs on big data storage systems and processes data in parallel. Formal methods offer a suitable way to ensure the correctness of MapReduce programs, but they first require a formal model of MapReduce. In this paper, we propose a modeling language for establishing a formal model of the MapReduce framework. Unlike other approaches, our language describes how MapReduce programs process data from the perspective of the underlying files and blocks, so that the details of data processing can be shown clearly. The language builds on our previous work, a language for describing the management of massive data storage systems, and extends it in two respects: refinement of block content data and support for concurrency. The language then provides a basis for discussing the features of the MapReduce programming model.
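To make the file-and-block perspective concrete, the following is a minimal Python sketch of a word count in the MapReduce style. It is not the modeling language proposed in the paper. It assumes a hypothetical storage view in which a file is a list of blocks and each block is a list of text records, and the names `map_block`, `shuffle`, `reduce_group`, and `word_count` are illustrative only.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_block(block):
    """Map phase for one block: emit (word, 1) for every word in its records."""
    return [(word, 1) for record in block for word in record.split()]


def shuffle(pairs_per_block):
    """Group the intermediate values from all blocks by key."""
    groups = defaultdict(list)
    for pairs in pairs_per_block:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce phase for one key: sum the counts collected for that word."""
    return key, sum(values)


def word_count(file_blocks):
    """Count words in a file given as a list of blocks (assumed layout).

    Blocks are mapped concurrently, since each map task reads only its own
    block; the shuffle and reduce steps run after all map tasks finish.
    """
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_block, file_blocks))
    groups = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in groups.items())
```

Calling `word_count([["a b", "a"], ["b c"]])` treats the input as a two-block file. Each block is mapped independently, which is the kind of block-level, concurrent behavior the abstract says the language is meant to describe.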

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grants Nos. 61572003, 61370053, and 61772035).

Author information

Correspondence to Hanpin Wang or Yu Huang.

Cite this article

Jing, Y., Wang, H., Huang, Y. et al. A Modeling Language for MapReduce Programing in a Storage System Perspective. J Sign Process Syst 90, 1133–1150 (2018). https://doi.org/10.1007/s11265-017-1298-7

  • DOI: https://doi.org/10.1007/s11265-017-1298-7
