Continuous MapReduce for In-DB Stream Analytics

Chen, Qiming; Hsu, Meichun

doi:10.1007/978-3-642-16961-8_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6428)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"

Abstract

Scaling out data-intensive analytics generally relies on parallel computation to gain CPU bandwidth and on incremental computation to balance the workload. Combining these two mechanisms is the key to supporting large-scale stream analytics.

Map-Reduce (M-R) is a programming model for supporting parallel computation over vast amounts of data on large clusters of commodity machines. Through a simple interface with two functions, map and reduce, this model facilitates the parallel implementation of data-intensive applications. In-DB M-R allows these functions to be embedded within standard queries to exploit the expressive power of SQL, and allows them to be executed by the query engine with fast data access and reduced data movement. However, when the data form infinite streams, the semantics and scale-out capability of M-R are challenged.
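As an illustration of the two-function interface described above, the following is a minimal single-process sketch in Python. It is not the authors' in-DB implementation: the names `map_reduce`, `map_reading`, and `reduce_mean`, as well as the sensor readings, are hypothetical and chosen only for this example. In a parallel engine, the grouping step would be a partitioned shuffle across nodes rather than an in-memory dictionary.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]  # hypothetical input row: (sensor_id, speed)


def map_reduce(
    records: Iterable[Record],
    map_fn: Callable[[Record], Iterator[Tuple[str, float]]],
    reduce_fn: Callable[[str, List[float]], float],
) -> Dict[str, float]:
    """Apply map_fn to every record, group the emitted values by key,
    then apply reduce_fn once per key."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_reading(record: Record) -> Iterator[Tuple[str, float]]:
    # Emit one (key, value) pair per input row.
    sensor_id, speed = record
    yield sensor_id, speed


def reduce_mean(key: str, values: List[float]) -> float:
    # Aggregate all values that share a key into their mean.
    return sum(values) / len(values)


readings = [("s1", 40.0), ("s2", 60.0), ("s1", 50.0)]
result = map_reduce(readings, map_reading, reduce_mean)
```

Here `result` maps each sensor to its mean speed. Note that the sketch assumes a finite input: `map_reduce` returns only after consuming all records, which is exactly the assumption that an unbounded stream breaks.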

To solve this problem, we propose to integrate M-R with the continuous query model characterized by Cut-Rewind (C-R): cut a query execution at the boundary of some granule of the stream data, then rewind the state of the query without shutting it down, so that it can process the next chunk of stream data. This approach allows an M-R query with full SQL expressive power to be applied to dynamic stream data chunk by chunk for continuous, window-based stream analytics.
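The cut-and-rewind idea can be sketched outside a database as a long-running loop that emits a result at each chunk boundary and then resets its per-chunk state without terminating. The Python sketch below is an analogy only, not the query-engine mechanism of the paper. It assumes events arrive in timestamp order and that the granule is a fixed time interval; the function name `cut_rewind`, the event layout, and the per-key sum are all hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

Event = Tuple[int, str, float]  # hypothetical: (timestamp_seconds, key, value)


def cut_rewind(
    stream: Iterable[Event], granule_seconds: int
) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Sum values per key, chunk by chunk, over a timestamp-ordered stream.

    The loop is started once. At each granule boundary it "cuts" (emits the
    result for the finished chunk) and "rewinds" (clears per-chunk state),
    then keeps running on the next chunk.
    """
    state: Dict[str, float] = defaultdict(float)
    current: Optional[int] = None
    for ts, key, value in stream:
        chunk = ts // granule_seconds
        if current is None:
            current = chunk
        elif chunk != current:
            yield current, dict(state)  # cut: emit the finished chunk
            state.clear()               # rewind: reset state, keep running
            current = chunk
        state[key] += value
    if current is not None:
        # Flush the last chunk when a finite test stream ends.
        yield current, dict(state)


events = [(0, "a", 1.0), (30, "b", 2.0), (59, "a", 3.0),
          (60, "a", 4.0), (125, "b", 5.0)]
results = list(cut_rewind(events, granule_seconds=60))
```

With a 60-second granule, `results` holds one entry per minute of input. Because `cut_rewind` is a generator, it also works on an unbounded iterator, producing one result per chunk as the data arrive.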

Our experience shows that integrating M-R and C-R provides a powerful combination for parallelized and granulized stream processing. This combination enables us to scale out stream analytics “horizontally” based on the M-R model, and “vertically” based on the C-R model.

The proposed approach has been prototyped on a commercial, proprietary parallel database engine. Our preliminary experiments reveal the merit of using a query engine for near-real-time parallel and incremental stream analytics.

Author information

Authors and Affiliations

HP Labs, Hewlett Packard Co., Palo Alto, California, USA
Qiming Chen & Meichun Hsu

Editor information

Editors and Affiliations

STAR Lab, Vrije Universiteit Brussel (VUB), Bldg G/10, Pleinlaan 2, 1050, Brussels, Belgium
Robert Meersman
Curtin University of Technology, DEBII - CBS, De Laeter Way, 6102, Bentley, WA, Australia
Tharam Dillon
Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660, Boadilla del Monte, Madrid, Spain
Pilar Herrero

About this paper

Cite this paper

Chen, Q., Hsu, M. (2010). Continuous MapReduce for In-DB Stream Analytics. In: Meersman, R., Dillon, T., Herrero, P. (eds) On the Move to Meaningful Internet Systems: OTM 2010 Workshops. OTM 2010. Lecture Notes in Computer Science, vol 6428. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16961-8_9

DOI: https://doi.org/10.1007/978-3-642-16961-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16960-1
Online ISBN: 978-3-642-16961-8
eBook Packages: Computer Science, Computer Science (R0)
