Skip to main content

Scalable Online Monitoring of Distributed Systems

  • Conference paper
  • First Online:
Book cover Runtime Verification (RV 2020)

Abstract

Distributed systems are challenging for runtime verification. Centralized specifications provide a global view of the system, but their semantics requires totally-ordered observations, which are often unavailable in a distributed setting. Scalability is also problematic, especially for online first-order monitors, which must be parallelized in practice to handle high volume, high velocity data streams. We argue that scalable online monitors must ingest events from multiple sources in parallel, and we propose a general model for input to such monitors. Our model only assumes a low-resolution global clock and allows for out-of-order events, which makes it suitable for distributed systems. Based on this model, we extend our existing monitoring framework, which slices a single event stream into independently monitorable substreams. Our new framework now slices multiple event streams in parallel. We prove our extension correct and empirically show that the maximum monitoring latency significantly improves when slicing is a bottleneck.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Abadi, D.J., Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Lee, S., et al.: Aurora: a new model and architecture for data stream management. VLDB J. 12(2), 120–139 (2003). https://doi.org/10.1007/s00778-003-0095-z

    Article  Google Scholar 

  2. Akidau, T., et al.: The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. Proc. VLDB Endow. 8(12), 1792–1803 (2015)

    Article  Google Scholar 

  3. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Popa, L., et al. (eds.) PODS 2002, pp. 1–16. ACM (2002)

    Google Scholar 

  4. Bartocci, E., Falcone, Y., Francalanza, A., Reger, G.: Introduction to runtime verification. In: Bartocci, E., Falcone, Y. (eds.) Lectures on Runtime Verification. LNCS, vol. 10457, pp. 1–33. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75632-5_1

    Chapter  Google Scholar 

  5. Basin, D., Bhatt, B.N., Krstić, S., Traytel, D.: Almost event-rate independent monitoring. FMSD 54(3), 449–478 (2019). https://doi.org/10.1007/s10703-018-00328-3

    Article  MATH  Google Scholar 

  6. Basin, D., Caronni, G., Ereth, S., Harvan, M., Klaedtke, F., Mantel, H.: Scalable offline monitoring of temporal specifications. FMSD 49(1–2), 75–108 (2016). https://doi.org/10.1007/s10703-016-0242-y

    Article  MATH  Google Scholar 

  7. Basin, D., Gras, M., Krstić, S., Schneider, J.: Implementation, experimental evaluation, and Isabelle/HOL formalization associated with this paper (2020). https://github.com/krledmno1/krledmno1.github.io/releases/download/v1.0/multi-source.tar.gz

  8. Basin, D., Harvan, M., Klaedtke, F., Zălinescu, E.: Monitoring data usage in distributed systems. IEEE Trans. Softw. Eng. 39(10), 1403–1426 (2013)

    Article  Google Scholar 

  9. Basin, D., Klaedtke, F., Marinovic, S., Zălinescu, E.: On real-time monitoring with imprecise timestamps. In: Bonakdarpour, B., Smolka, S.A. (eds.) RV 2014. LNCS, vol. 8734, pp. 193–198. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11164-3_16

    Chapter  Google Scholar 

  10. Basin, D., Klaedtke, F., Müller, S., Zălinescu, E.: Monitoring metric first-order temporal properties. J. ACM 62(2), 15:1–15:45 (2015)

    Article  MathSciNet  Google Scholar 

  11. Basin, D., Klaedtke, F., Zălinescu, E.: Failure-aware runtime verification of distributed systems. In: Harsha, P., Ramalingam, G. (eds.) FSTTCS 2015. LIPIcs, vol. 45, pp. 590–603. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2015)

    Google Scholar 

  12. Basin, D., Klaedtke, F., Zălinescu, E.: The MonPoly monitoring tool. In: Reger, G., Havelund, K. (eds.) RV-CuBES 2017. Kalpa Publications in Comp., vol. 3, pp. 19–28. EasyChair (2017)

    Google Scholar 

  13. Basin, D., Klaedtke, F., Zălinescu, E.: Runtime verification over out-of-order streams. ACM Trans. Comput. Log. 21(1), 5:1–5:43 (2020)

    Article  MathSciNet  Google Scholar 

  14. Bauer, A., Falcone, Y.: Decentralised LTL monitoring. FMSD 48(1–2), 46–93 (2016). https://doi.org/10.1007/978-3-642-32759-9_10

    Article  MATH  Google Scholar 

  15. Bauer, A., Küster, J.-C., Vegliach, G.: From propositional to first-order monitoring. In: Legay, A., Bensalem, S. (eds.) RV 2013. LNCS, vol. 8174, pp. 59–75. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40787-1_4

    Chapter  MATH  Google Scholar 

  16. Beame, P., Koutris, P., Suciu, D.: Communication steps for parallel query processing. J. ACM 64(6), 40:1–40:58 (2017)

    Article  MathSciNet  Google Scholar 

  17. Becker, D., Rabenseifner, R., Wolf, F., Linford, J.C.: Scalable timestamp synchronization for event traces of message-passing applications. Parallel Comput. 35(12), 595–607 (2009)

    Article  MathSciNet  Google Scholar 

  18. Bersani, M.M., Bianculli, D., Ghezzi, C., Krstić, S., San Pietro, P.: Efficient large-scale trace checking using MapReduce. In: Dillon, L., et al. (eds.) ICSE 2016, pp. 888–898. ACM (2016)

    Google Scholar 

  19. Carbone, P., Katsifodimos, A., Ewen, S., Markl, V., Haridi, S., Tzoumas, K.: Apache Flink™: stream and batch processing in a single engine. IEEE Data Eng. Bull. 38(4), 28–38 (2015)

    Google Scholar 

  20. Coulouris, G., Dollimore, J., Kindberg, T.: Distributed Systems - Concepts and designs, 3rd edn. International Computer Science Series, Addison-Wesley-Longman (2002)

    Google Scholar 

  21. D’Angelo, B., Sankaranarayanan, S., Sánchez, C., Robinson, W., Finkbeiner, B., et al.: LOLA: runtime monitoring of synchronous systems. In: TIME 2005, pp. 166–174. IEEE (2005)

    Google Scholar 

  22. Desnoyers, M., Dagenais, M.R.: The LTTng tracer: a low impact performance and behavior monitor for GNU/Linux. In: OLS 2006, pp. 209–224 (2006)

    Google Scholar 

  23. El-Hokayem, A., Falcone, Y.: THEMIS: a tool for decentralized monitoring algorithms. In: Bultan, T., Sen, K. (eds.) ISSTA 2017, pp. 372–375. ACM (2017)

    Google Scholar 

  24. El-Hokayem, A., Falcone, Y.: On the monitoring of decentralized specifications: semantics, properties, analysis, and simulation. ACM Trans. Softw. Eng. Methodol. 29(1), 1:1–1:57 (2020)

    Article  Google Scholar 

  25. Falcone, Y., Krstić, S., Reger, G., Traytel, D.: A taxonomy for classifying runtime verification tools. Int. J. Softw. Tools Technol. Transf. (2020, to appear)

    Google Scholar 

  26. Faymonville, P., Finkbeiner, B., Schirmer, S., Torfah, H.: A stream-based specification language for network monitoring. In: Falcone, Y., Sánchez, C. (eds.) RV 2016. LNCS, vol. 10012, pp. 152–168. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46982-9_10

    Chapter  Google Scholar 

  27. Francalanza, A., Pérez, J.A., Sánchez, C.: Runtime verification for decentralised and distributed systems. In: Bartocci, E., Falcone, Y. (eds.) Lectures on Runtime Verification. LNCS, vol. 10457, pp. 176–210. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75632-5_6

    Chapter  Google Scholar 

  28. Gras, M.: Scalable multi-source online monitoring. Bachelor’s thesis, ETH Zürich (2020)

    Google Scholar 

  29. Hallé, S., Khoury, R., Gaboury, S.: Event stream processing with multiple threads. In: Lahiri, S., Reger, G. (eds.) RV 2017. LNCS, vol. 10548, pp. 359–369. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67531-2_22

    Chapter  Google Scholar 

  30. Havelund, K., Peled, D., Ulus, D.: First order temporal logic monitoring with BDDs. In: Stewart, D., Weissenbacher, G. (eds.) FMCAD 2017, pp. 116–123. IEEE (2017)

    Google Scholar 

  31. Havelund, K., Reger, G., Thoma, D., Zălinescu, E.: Monitoring events that carry data. In: Bartocci, E., Falcone, Y. (eds.) Lectures on Runtime Verification. LNCS, vol. 10457, pp. 61–102. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75632-5_3

    Chapter  Google Scholar 

  32. Kreps, J., Narkhede, N., Rao, J., et al.: Kafka: a distributed messaging system for log processing. In: NetDB 2011, vol. 11, pp. 1–7 (2011)

    Google Scholar 

  33. Leucker, M., Sánchez, C., Scheffel, T., Schmitz, M., Schramm, A.: Runtime verification of real-time event streams under non-synchronized arrival. Softw. Qual. J. 28(2), 745–787 (2020). https://doi.org/10.1007/s11219-019-09493-y

    Article  Google Scholar 

  34. Mills, D.L.: Internet time synchronization: the network time protocol. RFC 1129, 1 (1989)

    Google Scholar 

  35. Mostafa, M., Bonakdarpour, B.: Decentralized runtime verification of LTL specifications in distributed systems. In: IPDPS 2015, pp. 494–503. IEEE (2015)

    Google Scholar 

  36. Raszyk, M., Basin, D., Krstić, S., Traytel, D.: Multi-head monitoring of metric temporal logic. In: Chen, Y.-F., Cheng, C.-H., Esparza, J. (eds.) ATVA 2019. LNCS, vol. 11781, pp. 151–170. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-31784-3_9

    Chapter  Google Scholar 

  37. Raszyk, M., Basin, D., Traytel, D.: Multi-head monitoring of metric dynamic logic. In: Hung, D.V., Sokolsky, O. (eds.) ATVA 2020. LNCS, vol. 12302. Springer (2020, to appear)

    Google Scholar 

  38. Reger, G., Rydeheard, D.: From First-order Temporal Logic to Parametric Trace Slicing. In: Bartocci, E., Majumdar, R. (eds.) RV 2015. LNCS, vol. 9333, pp. 216–232. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23820-3_14

    Chapter  Google Scholar 

  39. Rosu, G., Chen, F.: Semantics and algorithms for parametric monitoring. Log. Methods Comput. Sci. 8(1), 1–47 (2012)

    Article  MathSciNet  Google Scholar 

  40. Schneider, J., Basin, D., Brix, F., Krstić, S., Traytel, D.: Scalable online first-order monitoring. In: Colombo, C., Leucker, M. (eds.) RV 2018. LNCS, vol. 11237, pp. 353–371. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-03769-7_20

    Chapter  Google Scholar 

  41. Schneider, J., Basin, D., Brix, F., Krstić, S., Traytel, D.: Adaptive online first-order monitoring. In: Chen, Y.-F., Cheng, C.-H., Esparza, J. (eds.) ATVA 2019. LNCS, vol. 11781, pp. 133–150. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-31784-3_8

    Chapter  Google Scholar 

  42. Schneider, J., Basin, D., Krstić, S., Traytel, D.: A formally verified monitor for metric first-order temporal logic. In: Finkbeiner, B., Mariani, L. (eds.) RV 2019. LNCS, vol. 11757, pp. 310–328. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32079-9_18

    Chapter  Google Scholar 

  43. Sen, K., Vardhan, A., Agha, G., Rosu, G.: Efficient decentralized monitoring of safety in distributed systems. In: Finkelstein, A., et al. (eds.) ICSE 2004, pp. 418–427. IEEE (2004)

    Google Scholar 

  44. Srivastava, U., Widom, J.: Flexible time management in data stream systems. In: Beeri, C., Deutsch, A. (eds.) PODS 2004, pp. 263–274. ACM (2004)

    Google Scholar 

  45. Tucker, P.A., Maier, D.: Exploiting punctuation semantics in data streams. In: Agrawal, R., Dittrich, K.R. (eds.) ICDE 2002, p. 279. IEEE (2002)

    Google Scholar 

Download references

Acknowledgment

Dmitriy Traytel and the anonymous reviewers helped us improve this paper. This research is funded by the US Air Force grant “Monitoring at Any Cost” (FA9550-17-1-0306) and by the SNSF grant “Big Data Monitoring” (167162). The authors are listed alphabetically.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Srđan Krstić or Joshua Schneider .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Basin, D., Gras, M., Krstić, S., Schneider, J. (2020). Scalable Online Monitoring of Distributed Systems. In: Deshmukh, J., Ničković, D. (eds) Runtime Verification. RV 2020. Lecture Notes in Computer Science(), vol 12399. Springer, Cham. https://doi.org/10.1007/978-3-030-60508-7_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-60508-7_11

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60507-0

  • Online ISBN: 978-3-030-60508-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics