DOI: 10.1145/3448016.3450582
XLJoins

Published: 18 June 2021

ABSTRACT

In many analytic settings, join operations are fundamental because data is dispersed across different data sets (SQL or NoSQL tables, .csv files recording logs, click streams, KPIs from system/network monitoring, IoT telemetry, etc.). However, in the era of big data, the join operation can become exorbitantly expensive in terms of execution time and/or memory/space footprint.

    Published in

      SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data
      June 2021
      2969 pages
      ISBN: 9781450383431
      DOI: 10.1145/3448016

      Copyright © 2021 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • abstract

      Acceptance Rates

      Overall Acceptance Rate: 785 of 4,003 submissions, 20%