SAND: A Storage Abstraction for Video-based Deep Learning

Published: 10 July 2023

Abstract

Deep learning has achieved significant success in video applications such as classification, analytics, and self-supervised learning. However, when scaling out to a large volume of videos, existing approaches suffer from a fundamental limitation: they cannot efficiently utilize GPUs for training deep neural networks (DNNs). This is because video decoding during data preparation incurs a prohibitive amount of computing overhead, leaving GPUs idle for the majority of the training time. The alternative, caching raw videos in memory or storage to bypass decoding, does not scale, because they amount to tens to hundreds of terabytes.
This paper proposes SAND, a system that enables deep learning frameworks to directly access training data through a storage abstraction. This abstraction effectively hides the data preprocessing delay, enabling GPUs to be fully utilized for DNN training. To accomplish this, SAND operates an in-storage cache and manages it with ahead-of-time scheduling to guarantee that requested training data can always be retrieved immediately from the cache. This scheduling takes the future data accesses of deep learning frameworks into account for cache replacement. Compared to the existing approach, our evaluation using emulated environments shows that SAND improves GPU utilization by 6.0× and reduces training time by 75.9% on average.
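The abstract does not say which replacement policy SAND's scheduler uses, so the following is only a minimal sketch of the general idea of cache replacement driven by known future accesses. It substitutes the classic farthest-next-use rule (Belady's MIN) for whatever SAND actually does, and it assumes the full access order is available in advance, as it can be when a training framework fixes its sampling order before an epoch. The function name and the item labels are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def simulate_future_aware_cache(accesses: Sequence[Hashable], capacity: int) -> int:
    """Count cache misses when evicting the item whose next use is farthest away.

    Illustration only: this is Belady-style eviction over a known access
    sequence, not the scheduler described in the SAND paper.
    """
    if capacity <= 0:
        return len(accesses)

    # For each position, record the position at which the same item is
    # accessed next (infinity if it is never accessed again).
    next_use: List[float] = [float("inf")] * len(accesses)
    last_seen: Dict[Hashable, int] = {}
    for i in range(len(accesses) - 1, -1, -1):
        next_use[i] = last_seen.get(accesses[i], float("inf"))
        last_seen[accesses[i]] = i

    cache: Dict[Hashable, float] = {}  # cached item -> position of its next access
    misses = 0
    for i, item in enumerate(accesses):
        if item not in cache:
            misses += 1
            if len(cache) >= capacity:
                # Evict the cached item that will be needed last (or never).
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[item] = next_use[i]
    return misses


if __name__ == "__main__":
    # Hypothetical clip identifiers requested by a training job.
    order = ["a", "b", "c", "a", "b", "d", "a"]
    print(simulate_future_aware_cache(order, capacity=2))  # prints 5
```

For the same sequence and a two-entry cache, least-recently-used eviction would miss on all seven accesses. That gap is the kind of benefit knowledge of future accesses can give a cache manager.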

    Published In

    HotStorage '23: Proceedings of the 15th ACM Workshop on Hot Topics in Storage and File Systems
    July 2023
    131 pages
    ISBN:9798400702242
    DOI:10.1145/3599691
    In-Cooperation

    • USENIX

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. storage abstraction
    2. video preprocessing
    3. computational storage

    Qualifiers

    • Research-article

    Conference

    HotStorage '23
    Acceptance Rates

    Overall Acceptance Rate 34 of 87 submissions, 39%
