
IoT Service Runtime Fault Tolerance Mechanism Based on Flink Dynamic Checkpoint

  • Conference paper
Service Science (ICSS 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1844)


Abstract

By integrating Internet of Things (IoT) capabilities that sense real-time conditions in the physical environment, traditional Business Process Management (BPM) can become more flexible and adaptive. However, integrating BPM and IoT faces challenges such as mismatches in programming mechanisms, resource management mechanisms, and adaptation mechanisms. This research regards IoT service-based technology as an effective approach to integrating BPM and IoT. Such IoT services must be calculable, composable, bindable, and fault-tolerant. When IoT services run on Apache Flink, the native fault-tolerance mechanism may not meet their needs because the data rates of IoT service data sources fluctuate rapidly. In addition, traditional static checkpoint mechanisms, which checkpoint at a fixed interval, may not strike an optimal balance between runtime overhead and recovery delay. This paper proposes an on-demand dynamic checkpoint fault-tolerance method that calculates the recovery delay in real time from the data fluctuation rate and actively triggers a checkpoint when it reaches a user-defined threshold. Experiments show that the proposed method improves system efficiency by up to 11.9% compared with the static checkpoint mechanism.
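The abstract describes the trigger only at a high level. The sketch below is a minimal, hypothetical Python model of on-demand checkpoint triggering. It assumes that recovery delay is dominated by replaying the records received since the last checkpoint, so that estimated delay = restart cost + backlog / replay rate. The class `DynamicCheckpointTrigger` and its parameters (`threshold_s`, `replay_rate`, `restart_cost_s`) are illustrative names, not the authors' implementation or any Flink API. In a real job, the point where the trigger fires is where a checkpoint would be requested from the runtime.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicCheckpointTrigger:
    """Hypothetical on-demand checkpoint trigger (illustrative sketch only).

    Assumption: recovery delay = restart_cost_s + records_since_checkpoint / replay_rate.
    """
    threshold_s: float           # user-defined bound on recovery delay, in seconds
    replay_rate: float           # records per second the job can replay after a failure
    restart_cost_s: float = 0.0  # assumed fixed cost to restart tasks and restore state
    records_since_checkpoint: int = 0
    checkpoints: list = field(default_factory=list)

    def estimated_recovery_delay(self) -> float:
        # Replaying the backlog accumulated since the last checkpoint dominates recovery.
        return self.restart_cost_s + self.records_since_checkpoint / self.replay_rate

    def on_batch(self, t: float, n_records: int) -> bool:
        """Account for n_records arriving at time t; return True if a checkpoint fires."""
        self.records_since_checkpoint += n_records
        if self.estimated_recovery_delay() >= self.threshold_s:
            # A real system would request a checkpoint here; the sketch only
            # records the trigger time and resets the replay backlog.
            self.checkpoints.append(t)
            self.records_since_checkpoint = 0
            return True
        return False


# Usage: a steady load of 100 records/s for 10 s, then a burst of 1000 records/s for 5 s.
trigger = DynamicCheckpointTrigger(threshold_s=2.0, replay_rate=1000.0, restart_cost_s=0.5)
rates = [100] * 10 + [1000] * 5
for second, n in enumerate(rates, start=1):
    trigger.on_batch(second, n)
print(trigger.checkpoints)  # [11, 13, 15]
```

Under the steady load, no checkpoint fires during the first 10 s because the backlog never pushes the estimate past the threshold. The burst triggers one checkpoint every two seconds. A static interval would checkpoint at the same rate regardless of load, which is the overhead-versus-recovery trade-off the abstract points to.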



Acknowledgement

This work is supported by the International Cooperation and Exchange Program of National Natural Science Foundation of China (No. 62061136006).

Author information

Corresponding author

Correspondence to Wentao Bai.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Bai, W., Fang, J., Chang, W. (2023). IoT Service Runtime Fault Tolerance Mechanism Based on Flink Dynamic Checkpoint. In: Wang, Z., Wang, S., Xu, H. (eds) Service Science. ICSS 2023. Communications in Computer and Information Science, vol 1844. Springer, Singapore. https://doi.org/10.1007/978-981-99-4402-6_7


  • DOI: https://doi.org/10.1007/978-981-99-4402-6_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-4401-9

  • Online ISBN: 978-981-99-4402-6

  • eBook Packages: Computer Science, Computer Science (R0)
