ABSTRACT
Serverless data analytics generates large amounts of intermediate data between computation stages. However, serverless functions are short-lived and cannot communicate directly with one another, which makes managing this data difficult. The traditional approach of passing the data through object storage is slow and costly because the data constantly moves back and forth. Specialized ephemeral storage solutions have been developed to address this issue, but they do not tackle the fundamental challenge of minimizing data movement. This work incorporates near-data computation into an ephemeral storage system to reduce the volume of data transferred in serverless analytics. We present Glider, which aims to enhance communication between serverless compute stages, allowing data to "glide" smoothly through the processing pipeline instead of bouncing between different services. Glider achieves this by leveraging stateful near-data execution of complex data-bound operations and an efficient I/O streaming interface. In our evaluation, it reduces data transfers by up to 99.7%, improves storage utilization by up to 99.8%, and improves performance by up to 2.7×. In sum, Glider improves serverless data analytics by optimizing data movement, streamlining processing, and avoiding redundant transfers.
Index Terms
- Glider: Serverless Ephemeral Stateful Near-Data Computation