abstract

XLJoins

Author:
Ali Mohammadi Shanghooshabad

University of Warwick, Coventry, United Kingdom

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data, June 2021, Pages 2902–2904. https://doi.org/10.1145/3448016.3450582

Published: 18 June 2021

ABSTRACT

In many analytic settings, join operations are fundamental because data is dispersed across different data sets (SQL or NoSQL tables, .csv files recording logs, click streams, KPIs from system/network monitoring, IoT telemetry, etc.). In the era of big data, however, the join operation can become exorbitantly expensive in terms of execution time and/or memory/space footprint.

Index Terms

XLJoins
1. Information systems
  1. Data management systems
    1. Middleware for databases

Published in
SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
June 2021
2969 pages
ISBN:9781450383431
DOI:10.1145/3448016
General Chairs: Guoliang Li, Tsinghua University (China); Zhanhuai Li, Northwestern Polytechnical University (China)
Program Chairs: Stratos Idreos, Harvard University (USA); Divesh Srivastava, AT&T (USA)
Copyright © 2021 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
join graphs
join sampling
model joins
uniform sampling
Qualifiers
- abstract
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%