Modeling and Verifying HDFS Using Process Algebra

Mobile Networks and Applications

Abstract

The Hadoop Distributed File System (HDFS) is a highly fault-tolerant distributed file system that provides high-throughput access to application data and is well suited to applications with large data sets. Because HDFS is widely used, analyzing it in a formal framework is of great significance. In this paper, we use Communicating Sequential Processes (CSP) to model and analyze HDFS. We focus on its dominant operations, reading and writing files, and formalize them in detail. We also model the heartbeat mechanism. Finally, we use the model checker Process Analysis Toolkit (PAT) to simulate the constructed model and to verify whether it satisfies the specification and several important properties: Deadlock-freeness, Minimal Distance Scheme, Mutual Exclusion, Write-Once Scheme and Robustness.
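To give a feel for what checking properties such as Deadlock-freeness and Mutual Exclusion involves, the following minimal sketch explores every reachable state of a toy write-lease model. It is not the CSP model from the paper and does not use PAT. The two-client setup, the `grant`/`close` events and all function names are assumptions made purely for illustration.

```python
from collections import deque

# Hypothetical toy model: two clients compete for one write lease.
CLIENTS = (0, 1)

# State: (holder, phases). holder is the client holding the lease (or None);
# phases[i] is "idle", "writing" or "done" for client i.
INITIAL = (None, ("idle", "idle"))


def _set(phases, c, value):
    """Return a copy of phases with client c moved to the given phase."""
    return phases[:c] + (value,) + phases[c + 1:]


def successors(state):
    """Yield (event, next_state) pairs enabled in the given state."""
    holder, phases = state
    for c in CLIENTS:
        if phases[c] == "idle" and holder is None:
            # Assumed rule: the lease is granted only when nobody holds it.
            yield f"grant.{c}", (c, _set(phases, c, "writing"))
        elif phases[c] == "writing":
            # Assumed rule: closing the file releases the lease.
            yield f"close.{c}", (None, _set(phases, c, "done"))


def explore(initial):
    """Breadth-first search over all reachable states."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for _, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_final(state):
    """All clients have finished writing."""
    return all(p == "done" for p in state[1])


def deadlock_free(states):
    """A deadlock is a non-final state with no enabled event."""
    return all(is_final(s) or any(True for _ in successors(s)) for s in states)


def mutually_exclusive(states):
    """At most one client is in the writing phase in every state."""
    return all(sum(p == "writing" for p in s[1]) <= 1 for s in states)


reachable = explore(INITIAL)
print(deadlock_free(reachable), mutually_exclusive(reachable))
```

A model checker performs this kind of exhaustive search on a far richer model. Here the state space is small enough to enumerate by hand, which makes the two checks easy to confirm.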

Acknowledgments

This work was partly supported by the Danish National Research Foundation and the National Natural Science Foundation of China (No. 61361136002) for the Danish-Chinese Center for Cyber Physical Systems. It was also supported by the National Natural Science Foundation of China (No. 61321064 and No. 61472140), Shanghai Natural Science Foundation (No. 14ZR1412500) and Shanghai Collaborative Innovation Center of Trustworthy Software for Internet of Things (No. ZF1213).

Author information

Corresponding author

Correspondence to Phan Cong Vinh.

About this article

Cite this article

Xie, W., Zhu, H., Wu, X. et al. Modeling and Verifying HDFS Using Process Algebra. Mobile Netw Appl 22, 318–331 (2017). https://doi.org/10.1007/s11036-017-0812-2

  • DOI: https://doi.org/10.1007/s11036-017-0812-2
