ZERMIA - A Fault Injector Framework for Testing Byzantine Fault Tolerant Protocols

  • Conference paper
Network and System Security (NSS 2021)

Abstract

Byzantine fault tolerant (BFT) protocols are designed to increase system dependability and security. They guarantee liveness and correctness even in the presence of arbitrary faults. However, testing and validating BFT systems is not an easy task. As is the case for most concurrent and distributed applications, the correctness of these systems does not depend solely on algorithm and protocol correctness. Ensuring the correct behaviour of BFT systems requires exhaustive testing under real-world scenarios. One approach is to use fault injection tools that deliberately introduce faults into a target system to observe its behaviour. However, existing tools tend to be designed for specific applications and systems, and thus cannot be used generically.

We argue that more advanced and powerful tools and frameworks are needed for testing the security and safety of distributed applications in general, and BFT systems in particular. Specifically, what is needed is a fault injection framework that can be integrated into both client-side and server-side applications to test them exhaustively.

We present ZERMIA, a modular and extensible fault injection framework designed for testing and validating concurrent and distributed applications. We validate ZERMIA’s principles by conducting a series of experiments on a distributed application and a state-of-the-art BFT library, to show the benefits of ZERMIA for testing and validating applications.

This work was partially funded by POCI-01-0247-FEDER-041435 (SafeCities), POCI-01-0247-FEDER-047264 (Theia) and POCI-01-0247-FEDER-039598 (COP) financed by Fundo Europeu de Desenvolvimento Regional (FEDER), through COMPETE 2020 and Portugal 2020. Rolando Martins was partially supported by project EU H2020-SU-ICT-03-2018 No. 830929 CyberSec4Europe.

Notes

  1. All messages exchanged during this protocol are signed by the respective sender for authentication purposes. Validating messages includes validating the message signature.

  2. Note that Agents do not exchange information directly with each other; all synchronisation and coordination is managed by the Coordinator.

  3. Note that this can occur when fault triggering conditions are based on different factors, e.g., after a number of rounds and after receiving a specific message.
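
As an illustration of the last note, the following minimal Python sketch shows how two fault triggering conditions based on different factors (a round count and a received message type) can both hold for the same event. It is not ZERMIA's API: the names `Event`, `FaultTrigger`, `active_faults`, the fault labels, the round threshold and the `"PREPARE"` message type are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    """Hypothetical snapshot of what a process observes at one step."""
    round_number: int   # protocol round in which the event occurs
    message_type: str   # type of the message just received


@dataclass(frozen=True)
class FaultTrigger:
    """Hypothetical pairing of a fault label with its triggering condition."""
    fault: str
    condition: Callable[[Event], bool]


def active_faults(triggers: List[FaultTrigger], event: Event) -> List[str]:
    """Return the labels of all faults whose condition holds for this event."""
    return [t.fault for t in triggers if t.condition(event)]


# Two triggers based on different factors (labels and thresholds are made up).
TRIGGERS = [
    FaultTrigger("delay", lambda e: e.round_number >= 3),          # after a number of rounds
    FaultTrigger("drop", lambda e: e.message_type == "PREPARE"),   # on a specific message
]

if __name__ == "__main__":
    print(active_faults(TRIGGERS, Event(round_number=1, message_type="PREPARE")))
    print(active_faults(TRIGGERS, Event(round_number=3, message_type="PREPARE")))
```

In this sketch, a `PREPARE` message received in round 1 satisfies only the message-based condition, while the same message received in round 3 satisfies both conditions at once, so a fault injector built this way would need a policy for handling several faults that become active together.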

Corresponding author

Correspondence to João Soares.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Soares, J., Fernandez, R., Silva, M., Freitas, T., Martins, R. (2021). ZERMIA - A Fault Injector Framework for Testing Byzantine Fault Tolerant Protocols. In: Yang, M., Chen, C., Liu, Y. (eds) Network and System Security. NSS 2021. Lecture Notes in Computer Science, vol 13041. Springer, Cham. https://doi.org/10.1007/978-3-030-92708-0_3

  • DOI: https://doi.org/10.1007/978-3-030-92708-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92707-3

  • Online ISBN: 978-3-030-92708-0

  • eBook Packages: Computer Science, Computer Science (R0)
