Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines

Authors:
Francisco Romero (Stanford University), Mark Zhao (Stanford University), Neeraja J. Yadwadkar (Stanford University), Christos Kozyrakis (Stanford University)

SoCC '21: Proceedings of the ACM Symposium on Cloud Computing, November 2021, Pages 1–17. https://doi.org/10.1145/3472883.3486972

Published: 01 November 2021

ABSTRACT

The proliferation of camera-enabled devices and large video repositories has led to a diverse set of video analytics applications. These applications rely on video pipelines, represented as DAGs of operations, to transform videos, process extracted metadata, and answer questions like, "Is this intersection congested?" The latency and resource efficiency of pipelines can be optimized using configurable knobs for each operation (e.g., sampling rate, batch size, or type of hardware used). However, determining efficient configurations is challenging because (a) the configuration search space is exponentially large, (b) the optimal configuration depends on users' desired latency and cost targets, and (c) input video contents may exercise different paths in the DAG and produce a variable amount of intermediate results. Existing video analytics and processing systems leave it to users to manually configure operations and select hardware resources.
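To make the size of the search space concrete, the following minimal sketch counts joint configurations for a small pipeline. The operation names and knob values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from itertools import product
from math import prod

# Hypothetical operations and knob values, used only for illustration.
KNOBS = {
    "decode": {"sampling_rate": [1, 5, 30], "hardware": ["cpu", "gpu"]},
    "detect": {"batch_size": [1, 4, 16], "hardware": ["cpu", "gpu"]},
    "classify": {"batch_size": [1, 4, 16], "hardware": ["cpu", "gpu"]},
}


def configs_for(op):
    """Return every knob setting for one operation as a list of dicts."""
    names = list(KNOBS[op])
    value_lists = [KNOBS[op][name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def pipeline_config_count(ops):
    """Count joint configurations when each operation is tuned independently."""
    return prod(len(configs_for(op)) for op in ops)


if __name__ == "__main__":
    # Each operation has 3 * 2 = 6 settings, so three operations give 6**3.
    print(pipeline_config_count(list(KNOBS)))
```

With only three operations and two knobs each, the count is already 6^3 = 216, and it grows multiplicatively with every added operation or knob value.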

We present Llama: a heterogeneous and serverless framework for auto-tuning video pipelines. Given an end-to-end latency target, Llama optimizes for cost efficiency by (a) calculating a latency target for each operation invocation, and (b) dynamically running a cost-based optimizer to assign configurations across heterogeneous hardware that best meet the calculated per-invocation latency target. This makes the problem of auto-tuning large video pipelines tractable and allows us to handle input-dependent behavior, conditional branches in the DAG, and execution variability. We describe the algorithms in Llama and evaluate it on a cloud platform using serverless CPU and GPU resources. We show that compared to state-of-the-art cluster and serverless video analytics and processing systems, Llama achieves 7.8x lower latency and 16x cost reduction on average.
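The two steps named above (deriving a per-invocation latency target, then choosing a cost-efficient configuration that meets it) can be illustrated with a rough sketch. This is not Llama's algorithm, which the abstract does not specify: the even split of the remaining budget, the linear chain instead of a general DAG, and the names `Config`, `per_invocation_target`, `pick_config`, and `plan_chain` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    latency_s: float  # assumed profiled latency of one invocation
    cost: float       # assumed cost of one invocation, arbitrary units


def per_invocation_target(remaining_budget_s, remaining_ops):
    """Split the remaining latency budget evenly (an assumed policy)."""
    if remaining_ops <= 0:
        raise ValueError("remaining_ops must be positive")
    return remaining_budget_s / remaining_ops


def pick_config(candidates, target_s):
    """Cheapest configuration meeting the target, else the fastest one."""
    feasible = [c for c in candidates if c.latency_s <= target_s]
    if feasible:
        return min(feasible, key=lambda c: c.cost)
    return min(candidates, key=lambda c: c.latency_s)


def plan_chain(candidates_per_op, end_to_end_target_s):
    """Assign a configuration to each operation in a linear chain."""
    remaining = end_to_end_target_s
    plan = []
    for i, candidates in enumerate(candidates_per_op):
        ops_left = len(candidates_per_op) - i
        target = per_invocation_target(remaining, ops_left)
        choice = pick_config(candidates, target)
        plan.append(choice.name)
        # Recompute the budget after each step, so slack or overruns
        # carry over to the operations that follow.
        remaining = max(remaining - choice.latency_s, 0.0)
    return plan


if __name__ == "__main__":
    options = [Config("cpu", 2.0, 1.0), Config("gpu", 0.5, 3.0)]
    print(plan_chain([options, options], end_to_end_target_s=3.0))
```

In this example the first operation gets a 1.5 s target, which only the hypothetical GPU configuration meets; the time saved leaves 2.5 s for the second operation, so the cheaper CPU configuration suffices there. Recomputing targets per invocation is what lets a scheme of this kind react to execution variability.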

Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Client-server architectures
      2. Cloud computing

Published in

SoCC '21: Proceedings of the ACM Symposium on Cloud Computing
November 2021
685 pages
ISBN:9781450386388
DOI:10.1145/3472883

Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
distributed systems
heterogeneous
scheduling
serverless computing
video analytics
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 169 of 722 submissions, 23%