Abstract
Efficient processing of input data streams is central to IoT systems, and the goal of this paper is to develop a logical foundation for specifying the computations that such stream processing performs. In the proposed model, both the input and the output of a stream processing system consist of tagged data items, with a dependency relation over tags that captures the logical ordering constraints over data items. While a system processes the input data one item at a time, incrementally producing output data items, its semantics is a function from input data traces to output data traces, where a data trace is an equivalence class of sequences of data items induced by the dependency relation. This data-trace transduction model generalizes both acyclic Kahn process networks and relational query processors, and can specify computations over data streams with a rich variety of ordering and synchronization characteristics. To form complex systems from simpler ones, we define sequential composition and parallel composition operations over data-trace transductions, and show how to define commonly used idioms in stream processing such as sliding windows, key-based partitioning, and map-reduce.
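To make the notion of a data trace more concrete, the following is a minimal Python sketch, not the chapter's own formalization. It assumes that a data item is a `(tag, value)` pair, that the dependency relation is a symmetric set of tag pairs, and that two sequences represent the same data trace exactly when one can be turned into the other by repeatedly swapping adjacent items whose tags are not dependent. The names `Item`, `dependent`, `normal_form`, and `same_trace` are hypothetical and chosen only for illustration. The sketch picks one canonical representative per equivalence class (the lexicographically least sequence) and compares representatives.

```python
from typing import List, Sequence, Set, Tuple

# Assumed representation (not from the chapter): an item is a (tag, value) pair.
Item = Tuple[str, int]
# Assumed representation: the dependency relation is a set of tag pairs,
# treated as symmetric by the helper below.
Deps = Set[Tuple[str, str]]


def dependent(deps: Deps, a: str, b: str) -> bool:
    """Return True if tags a and b are related by the dependency relation."""
    return (a, b) in deps or (b, a) in deps


def normal_form(seq: Sequence[Item], deps: Deps) -> List[Item]:
    """Return a canonical representative of the data trace containing seq.

    At each step, an item may be emitted only if no earlier remaining item
    has a dependent tag; among those candidates the smallest item is chosen.
    """
    remaining = list(seq)
    result: List[Item] = []
    while remaining:
        best = None
        for i, item in enumerate(remaining):
            blocked = any(
                dependent(deps, earlier[0], item[0]) for earlier in remaining[:i]
            )
            if not blocked and (best is None or item < remaining[best]):
                best = i
        # The first remaining item is never blocked, so best is always set.
        result.append(remaining.pop(best))
    return result


def same_trace(u: Sequence[Item], v: Sequence[Item], deps: Deps) -> bool:
    """Two sequences denote the same data trace iff their normal forms agree."""
    return normal_form(u, deps) == normal_form(v, deps)
```

As a usage illustration under the same assumptions, take a measurement tag `"M"` whose items are mutually unordered and a marker tag `"#"` that is ordered with respect to everything, so `deps = {("#", "#"), ("#", "M")}`. Then `[("M", 5), ("M", 3), ("#", 1)]` and `[("M", 3), ("M", 5), ("#", 1)]` denote the same data trace, while moving a measurement across the marker yields a different one. The quadratic scan keeps the sketch short; it is not meant as an efficient implementation.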
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Alur, R., Mamouras, K., Stanford, C., Tannen, V. (2018). Interfaces for Stream Processing Systems. In: Lohstroh, M., Derler, P., Sirjani, M. (eds) Principles of Modeling. Lecture Notes in Computer Science, vol 10760. Springer, Cham. https://doi.org/10.1007/978-3-319-95246-8_3
DOI: https://doi.org/10.1007/978-3-319-95246-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95245-1
Online ISBN: 978-3-319-95246-8
eBook Packages: Computer Science, Computer Science (R0)