short-paper

Exploiting Data Partitioning To Provide Approximate Results

Authors:
Bruhathi Sundarmurthy, U of Wisconsin Madison
Paraschos Koutris, U of Wisconsin Madison
Jeffrey Naughton, Google

BeyondMR'18: Proceedings of the 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond, June 2018, Article No.: 5, Pages 1–5, https://doi.org/10.1145/3206333.3206337

Published: 15 June 2018

ABSTRACT

Co-hash partitioning is a popular partitioning strategy in distributed query processing, where tables are co-located using join predicates. In this paper, we study the benefits of co-hash partitioning for obtaining approximate answers.
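To make the idea of co-location concrete, the following is a minimal sketch of co-hash partitioning for a hypothetical customers/orders schema, followed by a naive estimate computed from a subset of partitions. The table names, the modulo hash, the partition count, and the scale-up estimator are all assumptions made for illustration; the abstract does not specify the estimators the paper actually uses.

```python
NUM_PARTITIONS = 4  # hypothetical number of partitions (assumption)


def partition_of(customer_id: int) -> int:
    """Stand-in hash function mapping a customer key to a partition."""
    return customer_id % NUM_PARTITIONS


def co_hash_partition(customers, orders):
    """Place each customer by its key and each order with its parent customer."""
    parts = [{"customers": [], "orders": []} for _ in range(NUM_PARTITIONS)]
    for cust in customers:
        parts[partition_of(cust["id"])]["customers"].append(cust)
    for order in orders:
        # Orders are hashed on the join attribute, not on their own key,
        # so every order lands next to the customer it joins with.
        parts[partition_of(order["customer_id"])]["orders"].append(order)
    return parts


def local_join_total(part):
    """Join customers and orders inside one partition and sum the amounts."""
    ids = {c["id"] for c in part["customers"]}
    return sum(o["amount"] for o in part["orders"] if o["customer_id"] in ids)


def estimate_total(parts, sampled):
    """Naive estimate: scale the sampled partitions' total by the inverse
    sampling fraction (an illustrative choice, not the paper's estimator)."""
    partial = sum(local_join_total(parts[i]) for i in sampled)
    return partial * len(parts) / len(sampled)


# Toy data: 8 customers, 16 orders of amount 10 each.
customers = [{"id": i} for i in range(8)]
orders = [{"order_id": 100 + i, "customer_id": i % 8, "amount": 10}
          for i in range(16)]

parts = co_hash_partition(customers, orders)
exact = sum(local_join_total(p) for p in parts)
approx = estimate_total(parts, sampled=[0, 1])
```

Because every order sits in the same partition as its customer, each partition can evaluate the join on its own, and reading only some partitions yields a complete join result over a subset of the customers. In this toy data the partitions are perfectly balanced, so the estimate happens to match the exact total; with skewed data it would not.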

Index Terms

Exploiting Data Partitioning To Provide Approximate Results
1. Information systems
  1. Data management systems
    1. Information integration
      1. Data warehouses
  2. Information systems applications
    1. Decision support systems
      1. Data analytics
      2. Online analytical processing

Published in
BeyondMR'18: Proceedings of the 5th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond
June 2018
54 pages
ISBN: 9781450357036
DOI: 10.1145/3206333
Conference Chairs: Foto Afrati, Jacek Sroka
Program Chair: Ke Yi
Publications Chair: Jan Hidders
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- short-paper
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 19 of 36 submissions, 53%