E2-VOR: An End-to-End En/Decoder Architecture for Efficient Video Object Recognition

Published: 10 December 2022

Abstract

High-resolution video object recognition (VOR) is evolving rapidly, but it is very compute-intensive because it leverages deep neural networks (DNNs) for better accuracy. Although many works have been proposed for speedup, they mostly focus on DNN algorithms and hardware acceleration on the edge side. We observe that most video streams must be compressed before going online, so the encoder already holds all of the video information. Moreover, since the cloud has abundant computing power for sophisticated VOR algorithms, we propose to make a one-shot effort for a modified VOR algorithm at the encoding stage in the cloud and to integrate full VOR regeneration into a slightly extended decoder on the device. The scheme enables lightweight VOR with server-class accuracy by simply leveraging the classic, economical video decoder that is universal to any mobile device. Meanwhile, the scheme saves massive computing power by not repetitively processing the same video on different user devices, which makes it extremely sustainable for green computing across the whole network.
We propose E2-VOR, an end-to-end encoder and decoder architecture for efficient VOR. We carefully design the scheme to have minimal impact on the transmitted video bitstream. In the cloud, the VOR-extended video encoder tracks objects on a macro-block basis and packs the recognition information into the video stream for increased VOR accuracy and a fast regeneration process. On the edge device, we extend the traditional video decoder with a small piece of dedicated hardware to enable efficient VOR regeneration. Our experiments show that E2-VOR achieves a 5.0× performance improvement with less than 0.4% VOR accuracy loss compared to the state-of-the-art FAVOS scheme. On average, E2-VOR runs at over 54 frames per second (FPS) on 480P videos on an edge device.
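
To make the encoder/decoder division of labor concrete, below is a minimal sketch of the two halves of such a scheme. It assumes a per-frame side payload carrying one object ID per 16×16 macro-block and a decoder that propagates labels using the motion vectors it already decodes; the payload layout and the function names (pack_vor_payload, regenerate_masks) are illustrative assumptions, not the paper's actual bitstream syntax.

```python
from typing import Optional

import numpy as np

MB = 16  # macro-block size in pixels (16x16, as in H.264/HEVC)


def pack_vor_payload(object_ids: np.ndarray) -> bytes:
    """Encoder side (cloud, one-shot): serialize one object ID per macro-block.

    `object_ids` has shape (H // MB, W // MB); 0 denotes background.
    A raw uint8 grid stands in for the side information packed into the
    bitstream -- the real E2-VOR payload layout is not reproduced here.
    """
    return object_ids.astype(np.uint8).tobytes()


def regenerate_masks(prev_ids: np.ndarray,
                     motion_vectors: np.ndarray,
                     payload: Optional[bytes],
                     grid_hw: tuple) -> np.ndarray:
    """Decoder side (edge): rebuild the per-macro-block label map for one frame.

    For key frames the payload carries fresh labels from the cloud; for
    inter frames the decoder reuses the motion vectors it has already
    decoded, copying each block's label from the block its MV points to.
    """
    h, w = grid_hw
    if payload is not None:  # key frame: labels shipped in the stream
        return np.frombuffer(payload, dtype=np.uint8).reshape(h, w).copy()
    ids = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            dy, dx = motion_vectors[y, x]       # MV in macro-block units
            sy = min(max(y + dy, 0), h - 1)     # clamp source to frame
            sx = min(max(x + dx, 0), w - 1)
            ids[y, x] = prev_ids[sy, sx]        # propagate the label
    return ids
```

Upsampling the recovered grid by a factor of MB yields a pixel-level mask. Because the propagation step touches only data the decoder produces anyway, the edge-side cost stays tiny compared with running a full DNN per frame, which is consistent with the efficiency argument above.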


Cited By

  • (2023) Real-Time Video Recognition via Decoder-Assisted Neural Network Acceleration Framework. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42, 7 (2023), 2238–2251. https://doi.org/10.1109/TCAD.2022.3217667. Online publication date: 1-Jul-2023.

    Published In

    ACM Transactions on Design Automation of Electronic Systems, Volume 28, Issue 1
    January 2023
    321 pages
    ISSN: 1084-4309
    EISSN: 1557-7309
    DOI: 10.1145/3573313

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 December 2022
    Online AM: 17 June 2022
    Accepted: 03 June 2022
    Revised: 28 April 2022
    Received: 28 May 2021
    Published in TODAES Volume 28, Issue 1

    Author Tags

    1. Video object recognition
    2. neural network
    3. accelerator
    4. end-to-end

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Key Research and Development Program of China
    • National Natural Science Foundation of China
