research-article

You Only Look Once in Panorama: Object Detection for 360° Videos with MLaaS

Authors:

Jiangchuan Liu

NOSSDAV '24: Proceedings of the 34th edition of the Workshop on Network and Operating System Support for Digital Audio and Video

Pages 1 - 7

https://doi.org/10.1145/3651863.3651876

Published: 15 April 2024

Abstract

360° videos are gaining popularity, but immersive analytics, particularly object detection, face challenges from complex scenes and high data volume. These challenges impose significant burdens on individual users and resource-limited edge devices. Machine Learning as a Service (MLaaS) offers an economical path to quick deployment without specialized hardware or expertise. However, current MLaaS offerings are mostly designed for 2D images and are not optimized for the distinctive characteristics of raw 360° video frames. In this paper, we propose a novel MLaaS-based system to address this challenge. Our solution partitions 360° frames into distortion-free 2D regions with dynamic region-of-interest prediction. We then present an image-stitching algorithm based on a Skyline representation that combines all the 2D regions into a single unified frame. This frame is transmitted to the MLaaS platform, and the detected objects are back-projected to yield the final results. Our experiments show that the system outperforms the baselines on 360° video object detection tasks.
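To make the stitching step in the abstract more concrete, here is a minimal sketch of a generic bottom-left skyline packing heuristic in Python. It is not the authors' algorithm: the names `Placement`, `skyline_pack`, and `to_region_coords` are hypothetical, the tallest-first ordering is an assumption, and the spherical projection of each region is left out entirely. The sketch only shows how rectangular regions might be packed into one frame and how a detection box returned for that frame could be mapped back to the region it came from.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    region_id: int  # index of the region in the input list
    x: int          # left edge in the stitched frame
    y: int          # bottom edge in the stitched frame
    w: int
    h: int


def _raise_skyline(skyline, x, w, top):
    """Replace the skyline over [x, x + w) with one segment at height `top`."""
    new = []
    for sx, sy, sw in skyline:
        ex = sx + sw
        if ex <= x or sx >= x + w:
            new.append((sx, sy, sw))               # segment not covered
        else:
            if sx < x:
                new.append((sx, sy, x - sx))       # uncovered left part
            if ex > x + w:
                new.append((x + w, sy, ex - (x + w)))  # uncovered right part
    new.append((x, top, w))
    new.sort()
    return new


def skyline_pack(sizes, canvas_w):
    """Pack (w, h) rectangles into a strip of width canvas_w.

    Returns the placements and the resulting strip height.
    Assumption: rectangles are placed tallest first, each at the lowest
    (then leftmost) skyline position where it fits.
    """
    skyline = [(0, 0, canvas_w)]  # (x, y, width) segments covering the strip
    placements = []
    order = sorted(range(len(sizes)), key=lambda i: -sizes[i][1])
    for rid in order:
        w, h = sizes[rid]
        if w > canvas_w:
            raise ValueError(f"region {rid} is wider than the canvas")
        best = None
        for i, (sx, _, _) in enumerate(skyline):
            if sx + w > canvas_w:
                break
            # The rectangle rests on the highest segment under [sx, sx + w).
            y, covered, j = 0, 0, i
            while covered < w:
                y = max(y, skyline[j][1])
                covered += skyline[j][2]
                j += 1
            if best is None or (y, sx) < best:
                best = (y, sx)
        y, x = best
        placements.append(Placement(rid, x, y, w, h))
        skyline = _raise_skyline(skyline, x, w, y + h)
    height = max((p.y + p.h for p in placements), default=0)
    return placements, height


def to_region_coords(box, placements):
    """Map a box (x1, y1, x2, y2) in the stitched frame to its source region.

    The region is chosen by the box centre; returns (region_id, local_box),
    or None if the centre lies outside every placed region.
    """
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    for p in placements:
        if p.x <= cx < p.x + p.w and p.y <= cy < p.y + p.h:
            local = (box[0] - p.x, box[1] - p.y, box[2] - p.x, box[3] - p.y)
            return p.region_id, local
    return None
```

For example, `skyline_pack([(4, 3), (4, 3), (3, 2), (5, 2)], canvas_w=8)` packs the four regions into an 8-by-5 frame with no wasted area. A full pipeline would still need to project each local box back onto the sphere, which this sketch does not attempt.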

Index Terms

1. Information systems
  1. Information systems applications
    1. Multimedia information systems
2. Networks
  1. Network services
    1. Cloud computing

Published In

NOSSDAV '24: Proceedings of the 34th edition of the Workshop on Network and Operating System Support for Digital Audio and Video

April 2024

77 pages

ISBN: 9798400706134

DOI: 10.1145/3651863

Program Chairs: Amr Rizk (University of Duisburg-Essen, Germany), Maria Torres Vega (Katholieke Universiteit Leuven, Belgium)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

NOSSDAV '24

Sponsor: SIGMM

NOSSDAV '24: 34th edition of the Workshop on Network and Operating System Support for Digital Audio and Video

April 15 - 18, 2024

Bari, Italy

Acceptance Rates

Overall Acceptance Rate 118 of 363 submissions, 33%
