DOI: 10.1145/3555041.3589407

Optimizing Tensor Computations: From Applications to Compilation and Runtime Techniques

Published: 05 June 2023

Abstract

Machine learning (ML) training and scoring fundamentally rely on linear algebra programs and more general tensor computations. Most ML systems utilize distributed parameter servers and similar distribution strategies for mini-batch stochastic gradient descent training. However, many more tasks in the data science and engineering lifecycle can benefit from efficient tensor computations. Examples include primitives for data cleaning, data and model debugging, data augmentation, query processing, numerical simulations, as well as a wide variety of training and scoring algorithms. In this survey tutorial, we first make a case for the importance of optimizing more general tensor computations, and then provide an in-depth survey of existing applications, optimizing compilation techniques, and underlying runtime strategies. Interestingly, there are close connections to data-intensive applications, query rewriting and optimization, as well as query processing and physical design. Our goal for the tutorial is to structure existing work, create common terminology, and identify open research challenges.
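
To make the opening claim concrete, the following is a minimal sketch in plain NumPy (synthetic data; all variable names are illustrative assumptions, not taken from the tutorial) of how mini-batch SGD training reduces to a few linear algebra operations, and of the kind of algebraic rewrite opportunity, here matrix-multiplication associativity, that the surveyed compilation techniques exploit.

```python
import numpy as np

# Synthetic binary classification data (illustrative only).
rng = np.random.default_rng(42)
n, d = 10_000, 50
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (X @ w_true + 0.1 * rng.standard_normal(n) > 0).astype(float)

# Mini-batch SGD for logistic regression: every step is a pair of
# matrix-vector products, i.e., pure linear algebra.
w = np.zeros(d)
lr, batch, epochs = 0.1, 256, 5
for _ in range(epochs):
    for i in range(0, n, batch):
        Xb, yb = X[i:i + batch], y[i:i + batch]
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))  # predictions
        g = Xb.T @ (p - yb) / len(yb)        # cross-entropy gradient
        w -= lr * g

# Rewrite opportunity: the two plans below are algebraically equivalent,
# but for tall-skinny X the left association needs O(n*d^2) flops to
# materialize the d-by-d Gram matrix, while the right association is
# just two O(n*d) matrix-vector products.
v = rng.standard_normal(d)
r_cheap = X.T @ (X @ v)
r_costly = (X.T @ X) @ v
assert np.allclose(r_cheap, r_costly)
```

Linear algebra optimizers of the kind surveyed in the tutorial choose between such equivalent plans automatically, using size and sparsity estimates of intermediate matrices, much as a query optimizer chooses join orders.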


Published In

SIGMOD '23: Companion of the 2023 International Conference on Management of Data
June 2023
330 pages
ISBN:9781450395076
DOI:10.1145/3555041

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data engineering
  2. data science
  3. declarative machine learning
  4. large-scale machine learning
  5. linear algebra
  6. tensor computations

Qualifiers

  • Tutorial

Conference

SIGMOD/PODS '23

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
