Experience in Extending Query Engine for Continuous Analytics

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6263)

Abstract

Combining data warehousing and stream processing technologies has great potential for offering low-latency, data-intensive analytics. Unfortunately, such convergence has not been properly addressed so far. The current generation of stream processing systems is in general built separately from the data warehouse and query engine, which can cause significant overhead in data access and data movement, and leaves these systems unable to take advantage of the functionality already offered by existing data warehouse systems.

In this work we tackle some hard problems, not properly addressed previously, in integrating stream analytics capability into an existing query engine. We define an extended SQL query model that unifies queries over both static relations and dynamic streaming data, and develop techniques to extend query engines to support the unified model. We propose the cut-and-rewind query execution model, which allows a query with full SQL expressive power to be applied to stream data by converting the stream into a sequence of “chunks” and executing the query over each chunk sequentially, without shutting the query instance down between chunks, so that the application context is maintained continuously across execution cycles as required by sliding-window operators. We also propose the cycle-based transaction model to support Continuous Querying with Continuous Persisting (CQCP) with cycle-based isolation and visibility.
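The chunk-by-chunk execution described above can be illustrated with a small toy sketch. It is not the authors' PostgreSQL implementation: the function names (`cut_into_chunks`, `run_continuous_query`), the 60-second chunk size, the sum aggregate, and the in-memory `committed` list standing in for per-cycle persistence are all assumptions made for illustration only.

```python
from collections import deque
from itertools import groupby


def cut_into_chunks(stream, chunk_seconds):
    """Cut a timestamp-ordered stream of (ts, value) tuples into chunks.

    Each chunk holds the tuples whose timestamps fall in one
    chunk_seconds-wide interval; this stands in for the "cut" step.
    """
    for cycle, rows in groupby(stream, key=lambda row: row[0] // chunk_seconds):
        yield cycle, list(rows)


def run_continuous_query(stream, chunk_seconds, window_cycles):
    """Apply the same per-chunk query cycle after cycle.

    The deque is created once and survives across cycles, which is how
    this sketch models the context a sliding-window operator needs.
    """
    window = deque(maxlen=window_cycles)  # state kept across cycles
    committed = []                        # hypothetical stand-in for persisted results
    for cycle, rows in cut_into_chunks(stream, chunk_seconds):
        # Per-chunk "query": a plain aggregate over the chunk's rows.
        chunk_sum = sum(value for _, value in rows)
        window.append(chunk_sum)
        result = {
            "cycle": cycle,
            "chunk_sum": chunk_sum,
            # Sliding-window result over the last window_cycles chunks.
            "window_sum": sum(window),
        }
        # Cycle-based "commit": the result becomes visible only once the
        # whole chunk has been processed.
        committed.append(result)
    return committed


stream = [(0, 1), (30, 2), (60, 3), (90, 4), (120, 5), (200, 6)]
results = run_continuous_query(stream, chunk_seconds=60, window_cycles=2)
for r in results:
    print(r)
```

In this sketch the per-chunk query is re-applied to each new chunk while the window buffer is never torn down, and each cycle's result is appended only at the end of its cycle, loosely mirroring cycle-based visibility.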

We have prototyped our approach by extending PostgreSQL. This work has resulted in a new kind of tightly integrated, highly efficient system that offers advanced stream processing capability as well as full DBMS functionality. We demonstrate it with the popular Linear Road benchmark and report the performance. By leveraging the mature code base of a query engine to the maximal extent, we can significantly reduce the engineering investment needed to develop the streaming technology. Providing this capability on a proprietary parallel analytics engine is work in progress.




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, Q., Hsu, M. (2010). Experience in Extending Query Engine for Continuous Analytics. In: Bach Pedersen, T., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2010. Lecture Notes in Computer Science, vol 6263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15105-7_15

  • DOI: https://doi.org/10.1007/978-3-642-15105-7_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15104-0

  • Online ISBN: 978-3-642-15105-7

  • eBook Packages: Computer Science, Computer Science (R0)
