Abstract
In this chapter, we introduce a promising technique for query processing: online aggregation. Online aggregation rests on the assumption that some applications do not always require precise results; an approximate result can provide a good enough estimate. Computing an approximate result is more cost-effective than computing the precise one, especially for large-scale datasets. To generate the approximate result, online aggregation continuously retrieves samples from the database and streams them to the query engine, which processes the query over them. The accuracy of the approximate result is described by a statistical model. Normally, the result is refined as more samples are obtained. The user can terminate processing at any time, once they are satisfied with the quality of the result.
The performance of online aggregation depends on the sampling approach and the estimation model, so our discussion focuses on these two components. Besides introducing the basic principles of online aggregation, we also review some new applications built on top of it. We complete the chapter by discussing the challenges of online aggregation and some future directions.
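The loop described in the abstract (draw samples, refine an estimate, report its accuracy, stop when the user is satisfied) can be sketched as follows. This is a minimal illustration and not the chapter's own estimator: it assumes an AVG query over an in-memory list, uses a shuffled scan as a stand-in for random sampling from the database, and reports a confidence interval based on the central limit theorem. The names `online_avg` and `run_until`, and the parameters `report_every` and `target_half_width`, are hypothetical.

```python
import math
import random
from statistics import NormalDist


def online_avg(values, confidence=0.95, report_every=1000, seed=0):
    """Yield (samples_seen, running_estimate, half_width) as samples stream in.

    Assumption: a shuffled scan order stands in for sampling without
    replacement; a real system would sample from storage instead.
    """
    rng = random.Random(seed)
    order = list(range(len(values)))
    rng.shuffle(order)
    # Two-sided normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n, mean, m2 = 0, 0.0, 0.0
    for idx in order:
        x = values[idx]
        # Welford's update keeps the running mean and sum of squared deviations.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0 or n == len(values):
            var = m2 / (n - 1) if n > 1 else 0.0
            # CLT-based half width; no finite-population correction is applied.
            yield n, mean, z * math.sqrt(var / n)


def run_until(values, target_half_width, **kwargs):
    """Stop as soon as the interval is narrow enough, mimicking a satisfied user."""
    result = None
    for result in online_avg(values, **kwargs):
        if result[2] <= target_half_width:
            break
    return result


if __name__ == "__main__":
    gen = random.Random(1)
    data = [gen.gauss(100, 15) for _ in range(50_000)]
    n, estimate, half_width = run_until(data, target_half_width=0.5)
    print(f"after {n} samples: AVG ~ {estimate:.2f} +/- {half_width:.2f}")
```

The early exit in `run_until` plays the role of the user terminating the query: the scan stops once the interval is tight enough, typically after reading only a fraction of the data.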
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wu, S., Ooi, B.C., Tan, KL. (2013). Online Aggregation. In: Catania, B., Jain, L. (eds) Advanced Query Processing. Intelligent Systems Reference Library, vol 36. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28323-9_8
DOI: https://doi.org/10.1007/978-3-642-28323-9_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28322-2
Online ISBN: 978-3-642-28323-9
eBook Packages: Engineering, Engineering (R0)