Abstract
Effectively exploring spatial-temporal features is important for video colorization. Existing methods either stack multiple frames along the temporal dimension or recurrently propagate estimated features; the former cannot exploit information from far-apart frames, while the latter accumulates errors during propagation. We instead develop a memory-based feature propagation module that establishes reliable connections with features from far-apart frames and alleviates the influence of inaccurately estimated features. To extract better per-frame features for this propagation, we leverage features from large pretrained visual models to guide the feature estimation of each frame, so that the estimated features can model complex scenarios. In addition, we note that adjacent frames usually contain similar content. To exploit this property for better spatial and temporal feature utilization, we develop a local attention module that aggregates features from adjacent frames within a spatial-temporal neighborhood. We formulate the memory-based feature propagation module, the large-pretrained-visual-model-guided feature estimation module, and the local attention module into an end-to-end trainable network, named ColorMNet, and show that it performs favorably against state-of-the-art methods on both benchmark datasets and real-world scenarios. Our source code and pre-trained models are available at https://github.com/yyang181/colormnet.
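To make the propagation idea concrete, below is a minimal, hypothetical sketch of an attention-based memory read in the spirit of an XMem-style memory: the current frame issues a query that attends over key/value features memorized from earlier, possibly far-apart frames, so dissimilar (unreliable) memory entries receive low attention weight. The class name `MemoryBank`, the tensor shapes, and the first-in-first-out pruning are our illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a simplified memory read for feature propagation.
# Shapes, names, and the pruning rule are assumptions for exposition.
import torch
import torch.nn.functional as F


class MemoryBank:
    """Stores key/value features from past frames so the current frame can
    attend to far-apart frames instead of relying on frame-by-frame recursion."""

    def __init__(self, max_frames: int = 16):
        self.max_frames = max_frames
        self.keys = []    # each entry: (C_k, H*W) for one memorized frame
        self.values = []  # each entry: (C_v, H*W) for one memorized frame

    def add(self, key: torch.Tensor, value: torch.Tensor) -> None:
        self.keys.append(key)
        self.values.append(value)
        # Keep memory bounded; a real system would score and prune entries.
        if len(self.keys) > self.max_frames:
            self.keys.pop(0)
            self.values.pop(0)

    def read(self, query: torch.Tensor) -> torch.Tensor:
        """Attention read: softmax over similarities to all memorized positions,
        so inaccurately estimated (dissimilar) features get low weight."""
        mem_k = torch.cat(self.keys, dim=1)    # (C_k, N) over all stored frames
        mem_v = torch.cat(self.values, dim=1)  # (C_v, N)
        sim = mem_k.t() @ query                # (N, H*W) similarity scores
        attn = F.softmax(sim / mem_k.shape[0] ** 0.5, dim=0)
        return mem_v @ attn                    # (C_v, H*W) propagated features


# Hypothetical usage with random per-frame features (C_k=64, C_v=512, 32x32 grid).
bank = MemoryBank()
for _ in range(5):
    kf, vf = torch.randn(64, 32 * 32), torch.randn(512, 32 * 32)
    fused = bank.read(kf) if bank.keys else vf
    bank.add(kf, fused)
```

In this sketch the softmax naturally downweights memory entries whose keys are dissimilar to the current query, which is one plausible way a memory read can "alleviate the influence of inaccurately estimated features" without recurrent error accumulation.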
Acknowledgements
This work has been partly supported by the National Natural Science Foundation of China (Nos. U22B2049, 62272233, 62332010), the Fundamental Research Funds for the Central Universities (No. 30922010910), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX24_0680).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Yang, Y., Dong, J., Tang, J., Pan, J. (2025). ColorMNet: A Memory-Based Deep Spatial-Temporal Feature Propagation Network for Video Colorization. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15062. Springer, Cham. https://doi.org/10.1007/978-3-031-73235-5_19