Research Article · DOI: 10.1145/3664647.3681357

Asymmetric Event-Guided Video Super-Resolution

Published: 28 October 2024

Abstract

Event cameras are novel bio-inspired cameras that record asynchronous events with high temporal resolution and dynamic range. Leveraging the auxiliary temporal information recorded by event cameras holds great promise for the task of video super-resolution (VSR). However, existing event-guided VSR methods assume that the event and RGB cameras are strictly calibrated (e.g., pixel-level sensor designs in DAVIS 240/346). This assumption proves limiting in emerging high-resolution devices, such as dual-lens smartphones and unmanned aerial vehicles, where such precise calibration is typically unavailable. To unlock more event-guided application scenarios, we perform the task of asymmetric event-guided VSR for the first time, and we propose an Asymmetric Event-guided VSR Network (AsEVSRN) for this new task. AsEVSRN incorporates two specialized designs for leveraging the asymmetric event stream in VSR. Firstly, the content hallucination module dynamically enhances event and RGB information by exploiting their complementary nature, thereby adaptively boosting representational capacity. Secondly, the event-enhanced bidirectional recurrent cells align and propagate temporal features fused with features from content-hallucinated frames. Within the bidirectional recurrent cells, event-enhanced flow is employed to simultaneously utilize and fuse temporal information at both the feature and pixel levels. Comprehensive experimental results affirm that our method consistently generates superior quantitative and qualitative results.
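The abstract describes the architecture only at a high level. Below is a minimal, hypothetical PyTorch sketch of one ingredient it mentions, bidirectional recurrent propagation guided by event data, assuming the event stream has already been converted into a per-frame voxel grid. All names here (EventGuidedCell, bidirectional_propagation, num_bins) are illustrative assumptions, not the authors' released code, and the sketch omits the paper's content hallucination module and event-enhanced flow alignment.

```python
# Hypothetical sketch: event-guided bidirectional recurrent propagation.
# Assumes events are pre-binned into per-frame voxel grids of shape (B, num_bins, H, W).
import torch
import torch.nn as nn


class EventGuidedCell(nn.Module):
    """Fuses the current frame feature, an event voxel grid, and the
    hidden state propagated from the neighbouring time step."""

    def __init__(self, feat_ch: int = 64, num_bins: int = 5):
        super().__init__()
        self.event_encoder = nn.Conv2d(num_bins, feat_ch, 3, padding=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(feat_ch * 3, feat_ch, 3, padding=1),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )

    def forward(self, frame_feat, event_voxel, hidden):
        event_feat = self.event_encoder(event_voxel)
        return self.fuse(torch.cat([frame_feat, event_feat, hidden], dim=1))


def bidirectional_propagation(frame_feats, event_voxels, cell_fwd, cell_bwd):
    """frame_feats: list of (B, C, H, W) tensors; event_voxels: list of (B, bins, H, W).
    Returns per-frame features that aggregate both forward and backward context."""
    T = len(frame_feats)
    B, C, H, W = frame_feats[0].shape

    # Forward pass over time.
    hidden = frame_feats[0].new_zeros(B, C, H, W)
    fwd = []
    for t in range(T):
        hidden = cell_fwd(frame_feats[t], event_voxels[t], hidden)
        fwd.append(hidden)

    # Backward pass over time.
    hidden = frame_feats[0].new_zeros(B, C, H, W)
    bwd = [None] * T
    for t in reversed(range(T)):
        hidden = cell_bwd(frame_feats[t], event_voxels[t], hidden)
        bwd[t] = hidden

    # Simple additive aggregation of the two directions.
    return [f + b for f, b in zip(fwd, bwd)]


# Example usage with random tensors standing in for encoded frames and event voxels:
# feats  = [torch.randn(1, 64, 64, 64) for _ in range(5)]
# voxels = [torch.randn(1, 5, 64, 64) for _ in range(5)]
# out = bidirectional_propagation(feats, voxels, EventGuidedCell(), EventGuidedCell())
```

The additive fusion of the two directions is only a placeholder for whatever aggregation and upsampling the paper actually uses; the point of the sketch is the recurrent structure in which event features accompany frame features at every time step in both directions.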


Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 October 2024


Author Tags

  1. event camera
  2. stereo images
  3. video super-resolution

Qualifiers

  • Research-article

Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 paper acceptance rate: 1,150 of 4,385 submissions (26%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)

Article Metrics

  • Total Citations: 0
  • Total Downloads: 68
  • Downloads (last 12 months): 68
  • Downloads (last 6 weeks): 26

Reflects downloads up to 10 Feb 2025.
