research-article

Presto: A Decade of SQL Analytics at Meta

Authors:
Yutian Sun, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0000-0002-0848-9029)
Tim Meehan, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0003-5575-8273)
Rebecca Schlussel, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0005-3322-2472)
Wenlei Xie, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0009-3504-6619)
Masha Basmanova, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0003-8018-3790)
Orri Erling, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0006-9143-2184)
Andrii Rosa, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0002-6567-9598)
Shixuan Fan, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0005-7100-6869)
Rongrong Zhong, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0003-2708-3184)
Arun Thirupathi, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0005-4615-0412)
Nikhil Collooru, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0005-6999-5960)
Ke Wang, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0003-5422-9150)
Sameer Agarwal, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0003-5217-2795)
Arjun Gupta, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0002-5798-8657)
Dionysios Logothetis, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0006-2501-1074)
Kostas Xirogiannopoulos, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0000-0002-3443-1242)
Amit Dutta, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0000-8994-8137)
Varun Gajjala, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0003-9469-7629)
Rohit Jain, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0005-0386-2698)
Ajay Palakuzhy, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0009-9830-809X)
Prithvi Pandian, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0007-1076-9824)
Sergey Pershin, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0008-1406-1804)
Abhisek Saikia, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0006-4369-3267)
Pranjal Shankhdhar, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0003-7301-3961)
Neerad Somanchi, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0004-4564-8780)
Swapnil Tailor, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0003-7316-871X)
Jialiang Tan, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0005-3025-5754)
Sreeni Viswanadha, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0007-7964-0623)
Zac Wen, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0002-9282-8819)
Biswapesh Chattopadhyay, Meta Platforms, Inc, Menlo Park, CA, USA (ORCID: 0009-0000-2760-0899)
Bin Fan, Alluxio, Inc, San Mateo, CA, USA (ORCID: 0009-0003-2682-9060)
Deepak Majeti, Ahana Cloud, Inc, Mountain View, CA, USA (ORCID: 0000-0003-3031-5778)
Aditi Pandit, Ahana Cloud, Inc, Mountain View, CA, USA (ORCID: 0009-0003-9361-5521)

Proceedings of the ACM on Management of Data, Volume 1, Issue 2, Article No. 189, pp. 1–25. https://doi.org/10.1145/3589769

Published: 20 June 2023

Abstract

Presto is an open-source distributed SQL query engine that supports analytics workloads involving multiple exabyte-scale data sources. At Meta, Presto is used for low-latency interactive use cases as well as long-running ETL jobs. It was originally launched at Meta in 2013 and donated to the Linux Foundation in 2019. Over the last ten years, maintaining query latency and scalability amid the rapid growth of data volume at Meta, together with new SQL analytics requirements, has posed significant challenges for Presto. A top priority has been ensuring that query reliability does not regress with the shift towards smaller, more elastic container allocations, which requires queries to run with substantially less memory headroom and to tolerate preemption at any time. Additionally, new demands from machine learning, privacy, and graph analytics have driven Presto maintainers to think beyond traditional data analytics. In this paper, we discuss several successful evolutions in recent years that have improved Presto latency and scalability by several orders of magnitude in production at Meta. Notable examples are hierarchical caching, native vectorized execution engines, materialized views, and Presto on Spark. With these new capabilities, we have deprecated or are in the process of deprecating various legacy query engines so that Presto becomes the single system serving interactive, ad-hoc, ETL, and graph processing workloads for the entire data warehouse.

Index Terms

Information systems → Data management systems → Database management system engines

Published in
Proceedings of the ACM on Management of Data Volume 1, Issue 2
PACMMOD
June 2023
2310 pages
EISSN:2836-6573
DOI:10.1145/3605748
Editor:
Divyakant Agrawal
UC Santa Barbara, United States
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data analytics
data warehouse
distributed database
etl
olap
presto
sql
Qualifiers
- research-article