Abstract
Effectively exploring spatial-temporal features is important for video colorization. Existing methods either stack multiple frames along the temporal dimension or recurrently propagate estimated features; the former cannot exploit information from far-apart frames, while the latter accumulates errors during propagation. We instead develop a memory-based feature propagation module that establishes reliable connections with features from far-apart frames and alleviates the influence of inaccurately estimated features. To extract better per-frame features for this propagation, we leverage features from large pretrained visual models to guide the feature estimation of each frame, so that the estimated features can model complex scenarios. In addition, we note that adjacent frames usually contain similar content. To exploit this property for better spatial and temporal feature utilization, we develop a local attention module that aggregates features from adjacent frames within a spatial-temporal neighborhood. We formulate the memory-based feature propagation module, the large-pretrained-visual-model-guided feature estimation module, and the local attention module into an end-to-end trainable network, named ColorMNet, and show that it performs favorably against state-of-the-art methods on both benchmark datasets and real-world scenarios. Our source code and pre-trained models are available at https://github.com/yyang181/colormnet.
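To make the propagation idea concrete, below is a minimal, hypothetical sketch of an attention-based memory read in the spirit of an XMem-style memory: the current frame issues a query that attends over key/value features memorized from earlier, possibly far-apart frames, so dissimilar (unreliable) memory entries receive low attention weight. The class name `MemoryBank`, the tensor shapes, and the first-in-first-out pruning are our illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a simplified memory read for feature propagation.
# Shapes, names, and the pruning rule are assumptions for exposition.
import torch
import torch.nn.functional as F


class MemoryBank:
    """Stores key/value features from past frames so the current frame can
    attend to far-apart frames instead of relying on frame-by-frame recursion."""

    def __init__(self, max_frames: int = 16):
        self.max_frames = max_frames
        self.keys = []    # each entry: (C_k, H*W) for one memorized frame
        self.values = []  # each entry: (C_v, H*W) for one memorized frame

    def add(self, key: torch.Tensor, value: torch.Tensor) -> None:
        self.keys.append(key)
        self.values.append(value)
        # Keep memory bounded; a real system would score and prune entries.
        if len(self.keys) > self.max_frames:
            self.keys.pop(0)
            self.values.pop(0)

    def read(self, query: torch.Tensor) -> torch.Tensor:
        """Attention read: softmax over similarities to all memorized positions,
        so inaccurately estimated (dissimilar) features get low weight."""
        mem_k = torch.cat(self.keys, dim=1)    # (C_k, N) over all stored frames
        mem_v = torch.cat(self.values, dim=1)  # (C_v, N)
        sim = mem_k.t() @ query                # (N, H*W) similarity scores
        attn = F.softmax(sim / mem_k.shape[0] ** 0.5, dim=0)
        return mem_v @ attn                    # (C_v, H*W) propagated features


# Hypothetical usage with random per-frame features (C_k=64, C_v=512, 32x32 grid).
bank = MemoryBank()
for _ in range(5):
    kf, vf = torch.randn(64, 32 * 32), torch.randn(512, 32 * 32)
    fused = bank.read(kf) if bank.keys else vf
    bank.add(kf, fused)
```

In this sketch the softmax naturally downweights memory entries whose keys are dissimilar to the current query, which is one plausible way a memory read can "alleviate the influence of inaccurately estimated features" without recurrent error accumulation.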
Acknowledgements
This work has been partly supported by the National Natural Science Foundation of China (Nos. U22B2049, 62272233, 62332010), the Fundamental Research Funds for the Central Universities (No. 30922010910), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX24_0680).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Yang, Y., Dong, J., Tang, J., Pan, J. (2025). ColorMNet: A Memory-Based Deep Spatial-Temporal Feature Propagation Network for Video Colorization. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15062. Springer, Cham. https://doi.org/10.1007/978-3-031-73235-5_19