Online Aggregation

Advanced Query Processing

Part of the book series: Intelligent Systems Reference Library ((ISRL,volume 36))

Abstract

In this chapter, we introduce online aggregation, a promising new technique for query processing. Online aggregation rests on the assumption that some applications do not always require precise results; an approximate result can provide a good enough estimate. Computing an approximate result is more cost-effective than computing the precise one, especially for large-scale datasets. To generate the approximate result, online aggregation continuously retrieves samples from the database and streams them to the query engine, which processes the query over them. The accuracy of the approximate result is described by a statistical model. Normally, the result is refined as more samples are obtained. The user can terminate the processing at any time, once satisfied with the quality of the result.
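The loop described above (draw samples, refine the estimate, report its accuracy, stop when the user is satisfied) can be illustrated with a minimal Python sketch. It is not the chapter's algorithm: it assumes a single AVG query, simulates random-order retrieval by shuffling an in-memory list, and uses a normal-approximation confidence interval as the statistical model. The names `online_avg` and `run_until` are hypothetical, and the interval omits the finite-population correction, so it is somewhat conservative once a large share of the data has been scanned.

```python
import math
import random
from statistics import NormalDist


def online_avg(rows, confidence=0.95, report_every=1000, seed=0):
    """Yield (samples_seen, running_mean, half_width) as rows arrive in random order.

    Assumption: `rows` is an in-memory sequence of numbers; shuffling it stands in
    for random-order retrieval from a database.
    """
    order = list(rows)
    random.Random(seed).shuffle(order)
    # z-value for a two-sided interval at the requested confidence level
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)

    n, mean, m2 = 0, 0.0, 0.0  # Welford's running mean and sum of squared deviations
    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0 or n == len(order):
            variance = m2 / (n - 1) if n > 1 else 0.0
            half_width = z * math.sqrt(variance / n)
            yield n, mean, half_width


def run_until(rows, target_half_width, **kwargs):
    """Stop early once the interval is narrow enough (a stand-in for the user's decision)."""
    last = None
    for last in online_avg(rows, **kwargs):
        if last[2] <= target_half_width:
            break
    return last


if __name__ == "__main__":
    data = [float(i % 100) for i in range(20000)]
    for n, mean, hw in online_avg(data, report_every=5000):
        print(f"{n:6d} samples: AVG ~ {mean:.2f} +/- {hw:.2f}")
```

Calling `run_until(data, 1.0)` returns the first reported estimate whose interval half-width is at most 1.0, which mirrors a user terminating the query as soon as the result is good enough.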

The performance of online aggregation depends on the sampling approach and the estimation model, and this chapter focuses on these two components. Besides introducing the basic principles of online aggregation, we also review some new applications built on top of it. We conclude the chapter by discussing the challenges of online aggregation and some future directions.

Author information

Corresponding author

Correspondence to Sai Wu.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wu, S., Ooi, B.C., Tan, KL. (2013). Online Aggregation. In: Catania, B., Jain, L. (eds) Advanced Query Processing. Intelligent Systems Reference Library, vol 36. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28323-9_8

  • DOI: https://doi.org/10.1007/978-3-642-28323-9_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28322-2

  • Online ISBN: 978-3-642-28323-9

  • eBook Packages: Engineering, Engineering (R0)
