DOI: 10.1145/3664647.3680937
Research article

MLP Embedded Inverse Tone Mapping

Published: 28 October 2024

Abstract

The advent of High Dynamic Range/Wide Color Gamut (HDR/WCG) display technology has brought exceptional richness and vibrancy to the human visual experience. However, the widespread adoption of HDR/WCG images is hindered by their substantial storage requirements, which impose significant bandwidth demands during distribution. In addition, HDR/WCG images are often tone-mapped into Standard Dynamic Range (SDR) versions for compatibility, so inverse Tone Mapping (iTM) techniques are needed to reconstruct their original representation. In this work, we propose a meta-transfer learning framework for practical HDR/WCG media transmission that embeds image-wise metadata into the SDR counterpart for later iTM reconstruction. Specifically, we devise a meta-learning strategy that pre-trains, on an external dataset, a lightweight multilayer perceptron (MLP) mapping SDR pixels to HDR/WCG pixels, yielding a domain-wise iTM model. Subsequently, in the transfer learning process for each HDR/WCG image, we present a spatial-aware online mining mechanism that selects challenging training pairs to adapt the meta-trained model into an image-wise iTM model. Finally, the adapted MLP is embedded as metadata and transmitted alongside the SDR image, enabling reconstruction of the original image on HDR/WCG displays. We conduct extensive experiments and evaluate the proposed framework with diverse metrics. Compared with existing solutions, our framework achieves superior fidelity with minimal latency and negligible overhead. The code is available at https://github.com/pjliu3/MLP_iTM.
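The abstract describes four stages: a per-pixel SDR-to-HDR/WCG MLP, meta-learned pre-training on external data, image-wise adaptation on mined hard pixels, and transmission of the adapted weights as metadata. The pure-Python sketch below shows one way these stages could fit together on toy data. It is not the authors' implementation (that is in the linked repository). The hidden width `HIDDEN`, the synthetic power-law curves in `make_image`, the use of Reptile as the meta-learning algorithm, the per-tile top-k rule in `mine_hard_pairs` standing in for the spatial-aware mining, and float16 packing in `to_metadata` are all illustrative assumptions.

```python
# Toy sketch under stated assumptions: meta-train a tiny per-pixel SDR->HDR MLP,
# adapt it to one image on mined hard pixels, then pack it as float16 metadata.
import random
import struct

HIDDEN = 8  # hypothetical hidden width
W1, B1, W2, B2 = 0, 3 * HIDDEN, 4 * HIDDEN, 7 * HIDDEN  # offsets into a flat weight list
P = 7 * HIDDEN + 3  # parameter count of a 3 -> HIDDEN -> 3 ReLU MLP

def forward(theta, x):
    """Map one SDR pixel x (3 values in [0, 1]) to an HDR/WCG pixel; also return hidden activations."""
    h = [max(0.0, theta[B1 + j] + sum(theta[W1 + 3 * j + c] * x[c] for c in range(3)))
         for j in range(HIDDEN)]
    y = [theta[B2 + k] + sum(theta[W2 + HIDDEN * k + j] * h[j] for j in range(HIDDEN))
         for k in range(3)]
    return y, h

def pixel_loss(theta, x, t):
    y, _ = forward(theta, x)
    return sum((y[k] - t[k]) ** 2 for k in range(3))

def mse(theta, pairs):
    return sum(pixel_loss(theta, x, t) for x, t in pairs) / len(pairs)

def grad_mse(theta, pairs):
    """Backpropagated gradient of mse() with respect to the flat weight list."""
    g = [0.0] * P
    for x, t in pairs:
        y, h = forward(theta, x)
        dy = [2.0 * (y[k] - t[k]) / len(pairs) for k in range(3)]
        for k in range(3):
            g[B2 + k] += dy[k]
            for j in range(HIDDEN):
                g[W2 + HIDDEN * k + j] += dy[k] * h[j]
        for j in range(HIDDEN):
            if h[j] > 0.0:  # ReLU passes gradient only through active units
                dz = sum(dy[k] * theta[W2 + HIDDEN * k + j] for k in range(3))
                g[B1 + j] += dz
                for c in range(3):
                    g[W1 + 3 * j + c] += dz * x[c]
    return g

def sgd_step(theta, pairs, lr):
    return [p - lr * d for p, d in zip(theta, grad_mse(theta, pairs))]

def make_image(rng, gamma, gain, size=8):
    """Hypothetical toy image: size x size grid of (sdr, hdr) pairs with hdr = gain * sdr**gamma."""
    sdr = [[[rng.random() for _ in range(3)] for _ in range(size)] for _ in range(size)]
    return [[(s, [gain * v ** gamma for v in s]) for s in row] for row in sdr]

def mine_hard_pairs(theta, image, block=4, k=2):
    """Assumed spatial-aware mining: keep the k highest-loss pixels of every block x block tile."""
    rows, cols = len(image), len(image[0])
    picked = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            tile = [image[r][c] for r in range(r0, min(r0 + block, rows))
                    for c in range(c0, min(c0 + block, cols))]
            tile.sort(key=lambda p: pixel_loss(theta, *p), reverse=True)
            picked.extend(tile[:k])
    return picked

def meta_train(rng, images, meta_steps=200, inner_steps=5, inner_lr=0.03, meta_lr=0.5):
    """Domain-wise pre-training with Reptile, a first-order meta-learner used here as a stand-in."""
    theta = [rng.gauss(0.0, 0.3) for _ in range(P)]
    for _ in range(meta_steps):
        pairs = [p for row in rng.choice(images) for p in row]
        phi = theta
        for _ in range(inner_steps):
            phi = sgd_step(phi, rng.sample(pairs, 16), inner_lr)
        theta = [a + meta_lr * (b - a) for a, b in zip(theta, phi)]  # step toward adapted weights
    return theta

def adapt(theta, image, steps=30, lr=0.03):
    """Image-wise transfer: fine-tune the meta-trained weights on mined hard pixels only."""
    for _ in range(steps):
        theta = sgd_step(theta, mine_hard_pairs(theta, image), lr)
    return theta

def to_metadata(theta):
    """Pack weights as little-endian float16 bytes to ship alongside the SDR file."""
    return struct.pack(f"<{len(theta)}e", *theta)

def from_metadata(blob):
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))

if __name__ == "__main__":
    rng = random.Random(0)
    domain = [make_image(rng, rng.uniform(1.5, 2.5), rng.uniform(1.0, 2.0)) for _ in range(10)]
    target = make_image(rng, 2.2, 1.8)
    meta = meta_train(rng, domain)
    tuned = from_metadata(to_metadata(adapt(meta, target)))  # weights as the receiver decodes them
    pairs = [p for row in target for p in row]
    print(f"MSE meta={mse(meta, pairs):.4f} adapted={mse(tuned, pairs):.4f}, {P * 2} bytes")
```

With `HIDDEN = 8` the sketch has 7 × 8 + 3 = 59 parameters, so the packed metadata is 118 bytes; this illustrates why shipping a small MLP alongside an SDR image can add little overhead. Selecting the hardest pixels per tile is just one simple way to keep mined examples spread across the frame; the paper's actual mining criterion may differ.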

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. high dynamic range
  2. inverse tone mapping
  3. wide color gamut

Conference

MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 paper acceptance rate: 1,150 of 4,385 submissions (26%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)

Article Metrics

  • Total citations: 0
  • Total downloads: 62
  • Downloads (last 12 months): 62
  • Downloads (last 6 weeks): 15
Reflects downloads up to 20 Feb 2025.
