ABSTRACT
Co-hash partitioning is a popular partitioning strategy in distributed query processing, where tables are co-located using join predicates. In this paper, we study the benefits of co-hash partitioning for obtaining approximate answers.
Index Terms
- Exploiting Data Partitioning To Provide Approximate Results