Abstract
State-of-the-art industrial and research projects in distributed stream processing mostly consider only a limited set of delivery-level consistency models, which do not guarantee consistency with respect to business requirements. Such guarantees, however, can make stream analytics more reliable. In this paper, we define the problem of designing mechanisms that can detect and possibly fix semantic inconsistencies. We discuss the results obtained so far and a detailed plan for further research.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Trofimov, A. (2018). Consistency Maintenance in Distributed Analytical Stream Processing. In: Benczúr, A., et al. New Trends in Databases and Information Systems. ADBIS 2018. Communications in Computer and Information Science, vol 909. Springer, Cham. https://doi.org/10.1007/978-3-030-00063-9_38
DOI: https://doi.org/10.1007/978-3-030-00063-9_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00062-2
Online ISBN: 978-3-030-00063-9
eBook Packages: Computer Science, Computer Science (R0)