SAND: A Storage Abstraction for Video-based Deep Learning

Authors:

Dongsu Han

HotStorage '23: Proceedings of the 15th ACM Workshop on Hot Topics in Storage and File Systems

Pages 16 - 23

https://doi.org/10.1145/3599691.3603407

Published: 10 July 2023

Abstract

Deep learning has achieved significant success in video applications such as classification, analytics, and self-supervised learning. However, when scaling out to a large volume of videos, existing approaches suffer from a fundamental limitation: they cannot efficiently utilize GPUs for training deep neural networks (DNNs). Video decoding during data preparation incurs a prohibitive amount of computing overhead, leaving GPUs idle for the majority of the training time. The alternative, caching raw videos in memory or storage to bypass decoding, does not scale, because the raw videos amount to tens to hundreds of terabytes.

This paper proposes SAND, a system that enables deep learning frameworks to directly access training data through a storage abstraction. This abstraction effectively hides the data preprocessing delay, enabling GPUs to be fully utilized for DNN training. To accomplish this, SAND operates an in-storage cache and manages it with ahead-of-time scheduling, which guarantees that requested training data can always be retrieved immediately from the cache. The scheduling takes the future data accesses of deep learning frameworks into account for cache replacement. Compared to the existing approach, our evaluation using emulated environments shows that SAND improves GPU utilization by 6.0× and reduces training time by 75.9% on average.
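To illustrate what cache replacement driven by future accesses can look like, here is a minimal sketch. The abstract does not specify SAND's actual scheduling policy, so the sketch substitutes the classic farthest-in-future eviction rule (Belady's MIN). It assumes the full access sequence is known ahead of time, as it would be when a training framework fixes its sampling order in advance. The function name, the item labels, and the capacity values are hypothetical.

```python
from collections import defaultdict, deque


def simulate_future_aware_cache(accesses, capacity):
    """Count cache hits under farthest-in-future eviction (Belady's MIN).

    Illustrative only: this is not SAND's scheduler, just a textbook policy
    that uses knowledge of future accesses to pick an eviction victim.
    """
    if capacity <= 0:
        return 0

    # For every item, record the positions at which it will be accessed.
    positions = defaultdict(deque)
    for index, item in enumerate(accesses):
        positions[item].append(index)

    def next_use(item):
        # Items that are never accessed again are the best eviction victims.
        return positions[item][0] if positions[item] else float("inf")

    cache = set()
    hits = 0
    for item in accesses:
        positions[item].popleft()  # consume the current access
        if item in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            # Evict the cached item whose next access lies farthest ahead.
            cache.remove(max(cache, key=next_use))
        cache.add(item)
    return hits


if __name__ == "__main__":
    # Hypothetical access order over four preprocessed training items.
    order = ["a", "b", "c", "a", "b", "d", "a", "b"]
    print(simulate_future_aware_cache(order, capacity=3))
    print(simulate_future_aware_cache(order, capacity=2))
```

A policy such as LRU sees only past accesses. A scheduler that knows the upcoming access order can instead keep exactly the items that will be requested soonest, which is the intuition behind managing a cache ahead of time.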

Index Terms

SAND: A Storage Abstraction for Video-based Deep Learning
1. Information systems
  1. Information storage systems
    1. Storage management
      1. Hierarchical storage management

Published In

HotStorage '23: Proceedings of the 15th ACM Workshop on Hot Topics in Storage and File Systems

July 2023

131 pages

ISBN:9798400702242

DOI:10.1145/3599691

General Chairs: Ali Anwar (University of Minnesota), Ningfang Mi (Northeastern University)

Program Chairs: Vasily Tarasov (IBM Research), Yiying Zhang (University of California, San Diego)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

In-Cooperation

USENIX

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

HotStorage '23

Sponsor:

SIGOPS

HotStorage '23: 15th ACM Workshop on Hot Topics in Storage and File Systems

July 9, 2023

Boston, MA, USA

Acceptance Rates

Overall Acceptance Rate 34 of 87 submissions, 39%
