Open Execution Engines of Stream Analysis Operations

  • Conference paper
Data Management in Cloud, Grid and P2P Systems (Globe 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7450)

Abstract

In this paper we describe our massively parallel, elastic stream analysis platform. It supports graph-structured dataflow processes in which each logical operator is executed by multiple physical instances running in parallel over distributed server nodes. We propose a canonical dataflow operator framework that provides automated and systematic support for executing, parallelizing and granulizing continuous operations.

We focus on the following issues: first, how to categorize the meta-properties of stream operators, such as their I/O, blocking and data-grouping characteristics, so that the system can provide unified and automated support; next, how to elastically and correctly parallelize a stateful operator that is history-sensitive, that is, one that relies on prior state and data processing results; and further, how to analyze an unbounded stream granularly to ensure sound semantics (e.g. for aggregation). These issues are not properly abstracted or systematically addressed in the current generation of stream processing systems. They are instead left to user programs, which can result in fragile code, disappointing performance and incorrect results.
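To make the second and third issues concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that a stateful operator is parallelized by routing every tuple with the same key to the same instance, and that the unbounded stream is cut into fixed-length time windows for aggregation. The names `route` and `windowed_counts`, and the choice of a CRC32 hash, are hypothetical and used only for illustration.

```python
from collections import defaultdict
from zlib import crc32


def route(key: str, num_instances: int) -> int:
    """Pick an operator instance for a key using a stable hash.

    A stable hash keeps all tuples of one key on one instance, so the
    per-key state never has to be shared between instances.
    """
    return crc32(key.encode("utf-8")) % num_instances


def windowed_counts(events, num_instances: int, window_secs: int):
    """Count events per (window, key), with one state table per instance.

    events: iterable of (timestamp_secs, key) pairs.
    Returns a list with one dict per (simulated) operator instance.
    """
    state = [defaultdict(int) for _ in range(num_instances)]
    for ts, key in events:
        window = int(ts // window_secs)  # granule the tuple falls into
        state[route(key, num_instances)][(window, key)] += 1
    return state


events = [(0, "a"), (1, "b"), (59, "a"), (60, "a"), (61, "b")]
per_instance = windowed_counts(events, num_instances=3, window_secs=60)
```

Because routing depends only on the key, every count for a given key sits in a single instance's table, and merging the tables gives the same result as a single-instance run.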

We tackle these issues by introducing open executors. An open executor supports streaming operations with specific characteristics and a specific running pattern, but is open for the application logic to be plugged in. We illustrate the power of this approach by showing how the system supports parallelizing and granulizing dataflow operations safely and correctly. The proposed canonical operation framework can be generalized to standardize various operational patterns of stream operators and to have these patterns supported systematically and automatically. We have built this platform; our experience reveals its value in real-time, continuous, elastic, data-parallel and topological stream analysis.
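The idea of an executor that owns the running pattern while leaving the application logic pluggable can be sketched as follows. This is an illustration under stated assumptions, not the platform's actual API: the class name `EpochExecutor`, its `init`/`update`/`emit` hooks, and the assumption that tuples arrive in non-decreasing timestamp order are all hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


class EpochExecutor:
    """Hypothetical open executor for per-epoch aggregation.

    The executor owns the running pattern (cutting the stream into
    epochs, resetting state, emitting at each epoch boundary); the
    application supplies only the three callables.
    """

    def __init__(
        self,
        epoch_secs: int,
        init: Callable[[], Any],
        update: Callable[[Any, Any], Any],
        emit: Callable[[Any], Any],
    ) -> None:
        self.epoch_secs = epoch_secs
        self.init = init
        self.update = update
        self.emit = emit

    def run(self, stream: Iterable[Tuple[float, Any]]) -> Iterator[Tuple[int, Any]]:
        """Yield (epoch, result) pairs; assumes non-decreasing timestamps."""
        state = self.init()
        current = None
        for ts, value in stream:
            epoch = int(ts // self.epoch_secs)
            if current is not None and epoch != current:
                # Epoch boundary: emit the finished granule, start a new one.
                yield current, self.emit(state)
                state = self.init()
            current = epoch
            state = self.update(state, value)
        if current is not None:
            # Flush the last, possibly partial, epoch when the input ends.
            yield current, self.emit(state)


# Plugged-in application logic: the average value per 60-second epoch.
averager = EpochExecutor(
    epoch_secs=60,
    init=lambda: (0.0, 0),
    update=lambda s, v: (s[0] + v, s[1] + 1),
    emit=lambda s: s[0] / s[1],
)
results = list(averager.run([(0, 10.0), (30, 20.0), (60, 30.0)]))
# results == [(0, 15.0), (1, 30.0)]
```

In this sketch the user code never sees epoch boundaries, so the windowing semantics stay in one place and cannot be implemented inconsistently by different operators.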

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, Q., Hsu, M. (2012). Open Execution Engines of Stream Analysis Operations. In: Hameurlain, A., Hussain, F.K., Morvan, F., Tjoa, A.M. (eds) Data Management in Cloud, Grid and P2P Systems. Globe 2012. Lecture Notes in Computer Science, vol 7450. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32344-7_7

  • DOI: https://doi.org/10.1007/978-3-642-32344-7_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32343-0

  • Online ISBN: 978-3-642-32344-7

  • eBook Packages: Computer Science, Computer Science (R0)
