Abstract
Introducing event cameras into video super-resolution (VSR) shows great promise. In practice, however, integrating event data as a new modality necessitates laborious model architecture design. This not only consumes substantial time and effort but also disregards valuable insights from successful existing VSR models. Furthermore, retraining these newly designed models from scratch is resource-intensive and exacerbates the challenge. In this paper, inspired by the recent success of parameter-efficient tuning in adapting pre-trained models to downstream tasks with few trainable parameters, we introduce the Event AdapTER (EATER) for VSR. EATER efficiently exploits the knowledge of a pre-trained VSR model at the feature level through two lightweight, trainable components: the event-adapted alignment (EAA) unit and the event-adapted fusion (EAF) unit. The EAA unit aligns multiple frames based on the event stream in a coarse-to-fine manner, while the EAF unit fuses frames with the event stream through a multi-scale design. Thanks to both units, EATER outperforms full fine-tuning while remaining parameter-efficient, as comprehensive experiments demonstrate.
Z. Xiao and D. Kai contributed equally to this work.
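The abstract gives only a high-level description of the two units. As a rough illustration of the adapter idea, the following PyTorch sketch shows how a coarse-to-fine, event-guided alignment unit and a multi-scale fusion unit could be attached to features from a frozen backbone. All class names (EventAdaptedAlignment, EventAdaptedFusion), layer choices, scale counts, and hyperparameters are illustrative assumptions, not the authors' implementation.

# A minimal sketch (not the authors' code) of event-adapted alignment and
# fusion units attached to a frozen VSR backbone. Shapes and layer choices
# are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EventAdaptedAlignment(nn.Module):
    """Hypothetical EAA unit: estimates a coarse flow from concatenated frame
    and event features at half resolution, refines it at full resolution, and
    warps the frame features accordingly (coarse-to-fine)."""

    def __init__(self, channels):
        super().__init__()
        self.coarse = nn.Conv2d(2 * channels, 2, 3, padding=1)
        self.fine = nn.Conv2d(2 * channels + 2, 2, 3, padding=1)

    @staticmethod
    def warp(feat, flow):
        # Displace a normalized sampling grid by the predicted flow.
        n, _, h, w = feat.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=feat.device),
            torch.linspace(-1, 1, w, device=feat.device),
            indexing="ij",
        )
        grid = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)
        return F.grid_sample(feat, grid + flow.permute(0, 2, 3, 1),
                             align_corners=True)

    def forward(self, frame_feat, event_feat):
        x = torch.cat([frame_feat, event_feat], dim=1)
        # Coarse flow at half resolution, upsampled back to full resolution.
        coarse = F.interpolate(self.coarse(F.avg_pool2d(x, 2)),
                               scale_factor=2, mode="bilinear",
                               align_corners=False)
        fine = self.fine(torch.cat([x, coarse], dim=1)) + coarse
        return self.warp(frame_feat, fine)


class EventAdaptedFusion(nn.Module):
    """Hypothetical EAF unit: fuses aligned frame features with event features
    at two spatial scales and adds the result as a residual."""

    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.ModuleList(
            [nn.Conv2d(2 * channels, channels, 3, padding=1) for _ in range(2)])

    def forward(self, frame_feat, event_feat):
        out = 0.0
        for scale, conv in zip((1, 2), self.fuse):
            f = F.avg_pool2d(frame_feat, scale) if scale > 1 else frame_feat
            e = F.avg_pool2d(event_feat, scale) if scale > 1 else event_feat
            y = conv(torch.cat([f, e], dim=1))
            if scale > 1:
                y = F.interpolate(y, size=frame_feat.shape[-2:],
                                  mode="bilinear", align_corners=False)
            out = out + y
        # Residual: the frozen backbone's features pass through unchanged.
        return frame_feat + out


if __name__ == "__main__":
    c = 64
    eaa, eaf = EventAdaptedAlignment(c), EventAdaptedFusion(c)
    frames = torch.randn(1, c, 32, 32)  # features from a frozen VSR backbone
    events = torch.randn(1, c, 32, 32)  # features from a voxelized event stream
    fused = eaf(eaa(frames, events), events)
    print(fused.shape)  # torch.Size([1, 64, 32, 32])
    # Parameter-efficient tuning would freeze the backbone, e.g.:
    # for p in backbone.parameters(): p.requires_grad_(False)

The residual connection in the fusion unit and the freezing comment reflect the parameter-efficient setting described in the abstract: the pre-trained backbone's features pass through unchanged, and only the small adapter units would receive gradients.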
Acknowledgments
We acknowledge funding from the National Natural Science Foundation of China under Grants 62131003 and 62021001.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xiao, Z., Kai, D., Zhang, Y., Zha, Z.J., Sun, X., Xiong, Z. (2025). Event-Adapted Video Super-Resolution. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds.) Computer Vision – ECCV 2024. Lecture Notes in Computer Science, vol. 15100. Springer, Cham. https://doi.org/10.1007/978-3-031-72946-1_13
DOI: https://doi.org/10.1007/978-3-031-72946-1_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72945-4
Online ISBN: 978-3-031-72946-1
eBook Packages: Computer Science, Computer Science (R0)