DOI: 10.1145/3599691.3603407
research-article

SAND: A Storage Abstraction for Video-based Deep Learning

Published: 10 July 2023

ABSTRACT

Deep learning has achieved significant success in video applications such as classification, analytics, and self-supervised learning. However, when scaling out to a large volume of videos, existing approaches suffer from a fundamental limitation: they cannot efficiently utilize GPUs for training deep neural networks (DNNs). This is because video decoding during data preparation incurs prohibitive computing overhead, leaving GPUs idle for the majority of training time. The alternative, caching raw videos in memory or storage to bypass decoding, does not scale because the raw videos occupy tens to hundreds of terabytes.

This paper proposes SAND, a system that enables deep learning frameworks to access training data directly through a storage abstraction. This abstraction effectively hides the data preprocessing delay, enabling GPUs to be fully utilized for DNN training. To accomplish this, SAND operates an in-storage cache and manages it with ahead-of-time scheduling to guarantee that requested training data can always be retrieved immediately from the cache. The scheduling takes the future data accesses of deep learning frameworks into account for cache replacement. In our evaluation using emulated environments, SAND improves GPU utilization by 6.0× and reduces training time by 75.9% on average compared to the existing approach.
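
The abstract does not spell out SAND's scheduling algorithm, so the following is only an illustrative sketch of what cache replacement driven by known future accesses can look like. It uses the classic farthest-next-use rule (Belady's MIN), which is a stand-in chosen for illustration and not necessarily the policy SAND implements. The function name `simulate_future_aware_cache`, the item labels, and the assumption that the full access order is known in advance are all hypothetical.

```python
from collections import defaultdict, deque


def simulate_future_aware_cache(accesses, capacity):
    """Replay `accesses` through a cache holding up to `capacity` items.

    Assumes capacity >= 1 and that the whole access sequence is known
    ahead of time (a hypothetical stand-in for a training schedule).
    On a miss with a full cache, evict the cached item whose next use
    lies farthest in the future, or that is never used again.
    Returns a (hits, misses) tuple.
    """
    # For each item, record the positions at which it will be accessed.
    positions = defaultdict(deque)
    for index, item in enumerate(accesses):
        positions[item].append(index)

    cache = set()
    hits = misses = 0
    for item in accesses:
        positions[item].popleft()  # consume the current access
        if item in cache:
            hits += 1
            continue
        misses += 1
        if len(cache) >= capacity:
            # Next access position of a cached item; infinity if none remain.
            def next_use(cached):
                queue = positions[cached]
                return queue[0] if queue else float("inf")

            cache.remove(max(cache, key=next_use))
        cache.add(item)
    return hits, misses
```

Calling `simulate_future_aware_cache(["a", "b", "c", "a", "b", "d", "a", "c"], 2)` replays eight accesses through a two-item cache. Because eviction decisions look ahead in the access order, items that are about to be reused stay cached, which is the intuition behind scheduling the cache ahead of time.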

Published in

HotStorage '23: Proceedings of the 15th ACM Workshop on Hot Topics in Storage and File Systems
July 2023
131 pages
ISBN: 9798400702242
DOI: 10.1145/3599691

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 34 of 87 submissions, 39%
