DOI: 10.1145/2335484.2335487
research-article

From a calculus to an execution environment for stream processing

Published: 16 July 2012

ABSTRACT

At one level, this paper is about River, a virtual execution environment for stream processing. Stream processing is a paradigm well suited to many modern data processing systems that ingest high-volume data streams from the real world, for example in audio/video streaming, high-frequency trading, and security monitoring. One attractive property of stream processing is that it lends itself to parallelization on multicores, and even to distribution on clusters when extreme scale is required. Several communities have developed stream processing in parallel, producing diverse languages with similar core concepts. Providing a common execution environment reduces language development effort and increases portability. We designed River as a practical realization of Brooklet, a calculus for stream processing. So at another level, this paper is about a journey from theory (the calculus) to practice (the execution environment). The challenge is that, by definition, a calculus abstracts away all but the most central concepts. Hence, concretizing the missing parts raises several research questions, not to mention a significant engineering effort to implement them. But the effort is well worth it, because using a calculus as a foundation yields clear semantics and proven correctness results.
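The abstract does not show Brooklet's syntax, so the following is only a minimal sketch of the general execution model that queue-based stream calculi in the spirit of Brooklet describe: operators connected by FIFO queues, explicit state variables, and a loop that nondeterministically picks an operator with pending input and fires it on one item. All names here (`Operator`, `Program`, `tokenize`, `count`, the queue names) are hypothetical illustrations, not River's or Brooklet's actual API, and details such as how Brooklet represents state may differ.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple
import random

# Hypothetical operator signature: (item, input queue name, state) ->
# (list of (output queue, item) pairs, updated state).
OpFn = Callable[[Any, str, Dict[str, Any]], Tuple[List[Tuple[str, Any]], Dict[str, Any]]]

@dataclass
class Operator:
    name: str
    inputs: List[str]      # queues this operator consumes from
    state_vars: List[str]  # state variables it may read and write
    fn: OpFn

@dataclass
class Program:
    operators: List[Operator]
    queues: Dict[str, deque] = field(default_factory=dict)
    variables: Dict[str, Any] = field(default_factory=dict)

    def enabled(self) -> List[Tuple[Operator, str]]:
        # An (operator, queue) pair can fire if that queue has pending items.
        return [(op, q) for op in self.operators for q in op.inputs if self.queues.get(q)]

    def step(self, rng: random.Random) -> bool:
        candidates = self.enabled()
        if not candidates:
            return False
        op, q = rng.choice(candidates)        # nondeterministic scheduling choice
        item = self.queues[q].popleft()       # consume exactly one item
        state = {v: self.variables[v] for v in op.state_vars}
        outputs, new_state = op.fn(item, q, state)
        self.variables.update(new_state)      # write back state variables
        for out_q, out_item in outputs:
            self.queues.setdefault(out_q, deque()).append(out_item)
        return True

    def run(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        while self.step(rng):
            pass

def tokenize(line, _q, state):
    # Stateless operator: split each line into words.
    return [("words", w) for w in line.split()], state

def count(word, _q, state):
    # Stateful operator: keep a running count per word.
    counts = dict(state["counts"])
    counts[word] = counts.get(word, 0) + 1
    return [("out", (word, counts[word]))], {"counts": counts}

program = Program(
    operators=[Operator("tokenize", ["lines"], [], tokenize),
               Operator("count", ["words"], ["counts"], count)],
    queues={"lines": deque(["a b a", "b c"])},
    variables={"counts": {}},
)
program.run(seed=0)
print(program.variables["counts"])  # {'a': 2, 'b': 2, 'c': 1}
```

Because every queue here is FIFO and read by a single operator, this particular program produces the same counts for any seed; a calculus lets such arguments be stated and proved formally rather than only checked by testing.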


            Published in
            DEBS '12: Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems
            July 2012
            410 pages
            ISBN:9781450313155
            DOI:10.1145/2335484

            Copyright © 2012 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 130 of 553 submissions, 24%