Event-Adapted Video Super-Resolution

  • Conference paper
  • Published in: Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15100)


Abstract

Introducing event cameras into video super-resolution (VSR) shows great promise. In practice, however, integrating event data as a new modality necessitates laborious model architecture design. This not only consumes substantial time and effort but also disregards valuable insights from successful existing VSR models. Furthermore, the resource-intensive process of retraining these newly designed models exacerbates the challenge. In this paper, inspired by the recent success of parameter-efficient tuning, which reduces the number of trainable parameters when adapting a pre-trained model to downstream tasks, we introduce the Event AdapTER (EATER) for VSR. EATER exploits the knowledge of pre-trained VSR models at the feature level through two lightweight, trainable components: the event-adapted alignment (EAA) unit and the event-adapted fusion (EAF) unit. The EAA unit aligns multiple frames based on the event stream in a coarse-to-fine manner, while the EAF unit fuses frames with the event stream through a multi-scale design. Thanks to both units, EATER outperforms full fine-tuning while remaining parameter-efficient, as comprehensive experiments demonstrate.

Z. Xiao and D. Kai—Equal contribution.
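
The adapter recipe described in the abstract (freeze a pre-trained VSR backbone, train only small event-conditioned units) lends itself to a compact illustration. The PyTorch sketch below shows that pattern under explicit assumptions: EAAUnit and EAFUnit are simplified single-scale residual placeholders (the paper's actual units are coarse-to-fine and multi-scale, respectively), and the one-layer backbone, event encoder, and 5-bin event voxel grid are hypothetical stand-ins, not the paper's architecture.

import torch
import torch.nn as nn


class EAAUnit(nn.Module):
    """Placeholder for the event-adapted alignment (EAA) unit.

    The paper's EAA aligns frames via the event stream in a coarse-to-fine
    manner; here a single residual conv block conditioned on event features
    stands in for it.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, frame_feat, event_feat):
        # Residual update: the frozen backbone's features pass through
        # unchanged when the adapter contributes nothing.
        return frame_feat + self.body(torch.cat([frame_feat, event_feat], dim=1))


class EAFUnit(nn.Module):
    """Placeholder for the event-adapted fusion (EAF) unit.

    The paper's EAF fuses frames with the event stream through a multi-scale
    design; this sketch fuses at a single scale for brevity.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, aligned_feat, event_feat):
        return aligned_feat + self.body(torch.cat([aligned_feat, event_feat], dim=1))


if __name__ == "__main__":
    channels = 64
    # Hypothetical stand-ins: a frozen single-conv "backbone" feature extractor
    # (a real setup would load a pre-trained VSR model) and a small encoder for
    # an assumed 5-bin event voxel grid.
    backbone = nn.Conv2d(3, channels, 3, padding=1)
    for p in backbone.parameters():
        p.requires_grad = False  # parameter-efficient tuning: backbone stays frozen
    event_encoder = nn.Conv2d(5, channels, 3, padding=1)
    eaa, eaf = EAAUnit(channels), EAFUnit(channels)

    frame = torch.randn(1, 3, 64, 64)   # one low-resolution frame
    events = torch.randn(1, 5, 64, 64)  # event voxel grid for the same window

    frame_feat = backbone(frame)
    event_feat = event_encoder(events)
    fused = eaf(eaa(frame_feat, event_feat), event_feat)

    # Only adapter (and event-encoder) parameters are handed to the optimizer.
    trainable = [p for m in (eaa, eaf, event_encoder) for p in m.parameters()]
    optimizer = torch.optim.Adam(trainable, lr=1e-4)
    print(fused.shape, sum(p.numel() for p in trainable), "trainable parameters")

Because gradients flow only to the adapter and event-encoder parameters, the optimizer state and the fine-tuned checkpoint scale with the adapters rather than the backbone, which is the parameter-efficiency argument the abstract makes.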




Acknowledgments

We acknowledge funding from the National Natural Science Foundation of China under Grants 62131003 and 62021001.

Author information


Corresponding author

Correspondence to Zhiwei Xiong.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 5290 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Xiao, Z., Kai, D., Zhang, Y., Zha, ZJ., Sun, X., Xiong, Z. (2025). Event-Adapted Video Super-Resolution. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15100. Springer, Cham. https://doi.org/10.1007/978-3-031-72946-1_13

  • DOI: https://doi.org/10.1007/978-3-031-72946-1_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72945-4

  • Online ISBN: 978-3-031-72946-1

  • eBook Packages: Computer Science, Computer Science (R0)
