DOI: 10.1145/3590140.3629119
research-article

Glider: Serverless Ephemeral Stateful Near-Data Computation

Published: 27 November 2023

ABSTRACT

Serverless data analytics generate a large amount of intermediate data during computation stages. However, serverless functions, which are short-lived and lack direct communication, face significant challenges in managing this data effectively. The traditional approach of using object storage to carry the data proves to be slow and costly, as it involves constant movement of data back and forth. Although specialized ephemeral storage solutions have been developed to address this issue, they fail to tackle the fundamental challenge of minimizing data movement. This work focuses on incorporating near-data computation into an ephemeral storage system to reduce the volume of transferred data in serverless analytics. We present Glider, which aims to enhance communication between serverless compute stages, allowing data to smoothly "glide" through the processing pipeline instead of bouncing between different services. Glider achieves this by leveraging stateful near-data execution of complex data-bound operations and an efficient I/O streaming interface. In our evaluation, it reduces data transfers by up to 99.7%, improves storage utilization by up to 99.8%, and enhances performance by up to 2.7×. In sum, Glider improves serverless data analytics by optimizing data movement, streamlining processing, and avoiding redundant transfers.
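To make the data-movement argument concrete, the following is a toy Python sketch of the general idea only. It is not Glider's API or implementation: the class names (`PlainStore`, `NearDataStore`), the methods (`write_stream`, `read_result`), and the word-count workload are all hypothetical and chosen purely for illustration. It contrasts a passive store, where the consumer must read back every intermediate byte, with a store that keeps state and aggregates data as it is streamed in, so that only the result leaves the storage side.

```python
from collections import Counter


class PlainStore:
    """Hypothetical passive store: consumers read back every byte written."""

    def __init__(self):
        self.objects = {}
        self.bytes_out = 0  # bytes sent from the store to consumers

    def put(self, key, data: bytes):
        self.objects[key] = data

    def get(self, key) -> bytes:
        data = self.objects[key]
        self.bytes_out += len(data)
        return data


class NearDataStore:
    """Hypothetical store with a stateful word-count action next to the data."""

    def __init__(self):
        self.counts = Counter()  # state kept inside the store across writes
        self.bytes_out = 0

    def write_stream(self, chunks):
        # Aggregate while data streams in; raw chunks are never stored.
        # Simplification: words split across chunk boundaries are not handled.
        for chunk in chunks:
            self.counts.update(chunk.decode().split())

    def read_result(self) -> bytes:
        lines = (f"{word} {count}" for word, count in sorted(self.counts.items()))
        result = "\n".join(lines).encode()
        self.bytes_out += len(result)
        return result


def run_demo():
    # Two "mapper" outputs standing in for intermediate data of a first stage.
    mapper_outputs = [b"a b a " * 1000, b"b c b " * 1000]

    # Passive path: store everything, then the reducer pulls it all back.
    plain = PlainStore()
    for i, out in enumerate(mapper_outputs):
        plain.put(f"part-{i}", out)
    totals = Counter()
    for i in range(len(mapper_outputs)):
        totals.update(plain.get(f"part-{i}").decode().split())

    # Near-data path: mappers stream in, the reducer reads only the aggregate.
    near = NearDataStore()
    for out in mapper_outputs:
        near.write_stream([out])
    pairs = (line.split() for line in near.read_result().decode().splitlines())
    near_totals = {word: int(count) for word, count in pairs}

    return dict(totals), near_totals, plain.bytes_out, near.bytes_out


if __name__ == "__main__":
    totals, near_totals, plain_bytes, near_bytes = run_demo()
    print(totals == near_totals, plain_bytes, near_bytes)
```

In this toy setup both paths produce the same counts, but the passive store ships all 12,000 bytes of intermediate data to the consumer while the near-data store ships a 20-byte result. The figures reported in the abstract come from the authors' evaluation and are unrelated to this example.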


Published in

Middleware '23: Proceedings of the 24th International Middleware Conference
November 2023
334 pages
ISBN: 9798400701771
DOI: 10.1145/3590140

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 203 of 948 submissions, 21%