Consistency Maintenance in Distributed Analytical Stream Processing

  • Conference paper
New Trends in Databases and Information Systems (ADBIS 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 909)

Abstract

State-of-the-art industrial and research projects in distributed stream processing mostly consider only a limited set of delivery-level consistency models, which do not guarantee consistency with respect to business requirements. Such guarantees, however, can make stream analytics more reliable. In this paper, we define the problem of designing mechanisms that can detect and possibly fix semantic inconsistencies. We discuss the results obtained so far and a detailed plan for further research.

Author information

Corresponding author

Correspondence to Artem Trofimov.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Trofimov, A. (2018). Consistency Maintenance in Distributed Analytical Stream Processing. In: Benczúr, A., et al. New Trends in Databases and Information Systems. ADBIS 2018. Communications in Computer and Information Science, vol 909. Springer, Cham. https://doi.org/10.1007/978-3-030-00063-9_38

  • DOI: https://doi.org/10.1007/978-3-030-00063-9_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00062-2

  • Online ISBN: 978-3-030-00063-9

  • eBook Packages: Computer Science, Computer Science (R0)
