Abstract
Video colorization aims to add color to grayscale or monochrome videos. Although existing methods have achieved substantial results in image colorization, video colorization poses greater challenges because it additionally requires temporal consistency. Moreover, systematic reviews of video colorization methods are rare. In this paper, we review existing state-of-the-art video colorization methods. Because maintaining spatial-temporal consistency is pivotal to video colorization, we further examine how existing methods have evolved in this respect, offering a novel perspective on the field. Video colorization methods fall into four main categories: optical-flow based, scribble-based, exemplar-based, and fully automatic methods. Each category has its limitations: optical-flow based methods rely heavily on accurate optical-flow estimation, scribble-based methods require extensive user interaction and modification, exemplar-based methods struggle to obtain suitable reference images, and fully automatic methods often fail to meet specific colorization requirements. We also discuss existing challenges and highlight several future research opportunities worth exploring.
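To make the notion of temporal consistency concrete: a common way to quantify it, used in various forms across the surveyed methods, is the flow-warped color error between consecutive frames, where the previous colorized frame is warped toward the current one with estimated optical flow and the residual color difference is averaged over non-occluded pixels. The minimal Python sketch below illustrates the idea with OpenCV; the function name `warp_error`, the occlusion heuristic, and the Farneback flow suggestion are illustrative assumptions, not the exact metric of any cited work.

```python
import numpy as np
import cv2

def warp_error(prev_rgb, curr_rgb, flow, occ_thresh=20.0):
    """Flow-warped color error between consecutive frames (illustrative sketch).

    prev_rgb, curr_rgb: float32 images of shape (H, W, 3) in [0, 1].
    flow: optical flow from frame t-1 to frame t, shape (H, W, 2), e.g.,
          from cv2.calcOpticalFlowFarneback on the grayscale inputs.
    """
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Backward-warp frame t-1 into frame t's coordinates by negating the
    # forward flow at each target pixel (a common approximation).
    map_x = (grid_x - flow[..., 0]).astype(np.float32)
    map_y = (grid_y - flow[..., 1]).astype(np.float32)
    warped_prev = cv2.remap(prev_rgb, map_x, map_y, cv2.INTER_LINEAR)

    # Crude occlusion handling: ignore pixels with very large motion.
    valid = (np.linalg.norm(flow, axis=-1) < occ_thresh)[..., None]
    diff = np.abs(curr_rgb - warped_prev) * valid
    return diff.sum() / max(3 * valid.sum(), 1)  # mean over valid pixels
```

A lower value indicates temporally smoother colorization; in practice, learned estimators such as PWC-Net or FlowNet 2.0 typically replace the classical flow estimator.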
Ethics declarations
Conflict of Interest: The authors declare that they have no conflict of interest.
Additional information
This work was supported by the National Natural Science Foundation of China under Grant Nos. U22B2049 and 62332010.
Zhong-Zheng Peng is currently pursuing his Ph.D. degree at the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing. His research interests include image/video colorization, super-resolution, and other restoration tasks.
Yi-Xin Yang is currently pursuing his Ph.D. degree at the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing. He received his M.S. degree in electrical engineering from the University of California, Riverside, and his B.S. degree in electronic information engineering from the University of Electronic Science and Technology of China, Chengdu. His research interests include image/video colorization, super-resolution, and other restoration tasks.
Jin-Hui Tang received his B.E. and Ph.D. degrees from the University of Science and Technology of China, Hefei, in 2003 and 2008, respectively. He is currently a professor at Nanjing University of Science and Technology, Nanjing. His research interests include multimedia analysis and computer vision.
Jin-Shan Pan received his Ph.D. degree in computational mathematics from the Dalian University of Technology, Dalian, in 2017. He is a professor at the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing. His research interests include image deblurring, image/video analysis and enhancement, and related vision problems.
Cite this article
Peng, ZZ., Yang, YX., Tang, JH. et al. Video Colorization: A Survey. J. Comput. Sci. Technol. 39, 487–508 (2024). https://doi.org/10.1007/s11390-024-4143-z