research-article

MLP Embedded Inverse Tone Mapping

Authors:

Zhiwei Xiong

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 1283 - 1291

https://doi.org/10.1145/3664647.3680937

Published: 28 October 2024

Abstract

High Dynamic Range/Wide Color Gamut (HDR/WCG) display technology has made significant progress in delivering rich and vibrant visual experiences. However, the widespread adoption of HDR/WCG images is hindered by their substantial storage requirements, which impose significant bandwidth challenges during distribution. In addition, HDR/WCG images are often tone-mapped into Standard Dynamic Range (SDR) versions for compatibility, so inverse Tone Mapping (iTM) techniques are needed to reconstruct the original representation. In this work, we propose a meta-transfer learning framework for practical HDR/WCG media transmission that embeds image-wise metadata into the SDR counterparts for later iTM reconstruction. Specifically, we devise a meta-learning strategy to pre-train, on an external dataset, a lightweight multilayer perceptron (MLP) model that maps SDR pixels to HDR/WCG ones, resulting in a domain-wise iTM model. Subsequently, for the transfer learning process of each HDR/WCG image, we present a spatial-aware online mining mechanism that selects challenging training pairs to adapt the meta-trained model into an image-wise iTM model. Finally, the adapted MLP, embedded as metadata, is transmitted alongside the SDR image, facilitating the reconstruction of the original image on HDR/WCG displays. We conduct extensive experiments and evaluate the proposed framework with diverse metrics. Compared with existing solutions, our framework shows superior fidelity, minimal latency, and negligible overhead. The code is available at https://github.com/pjliu3/MLP_iTM.
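The image-wise adaptation step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the RGB-only input, the plain top-k selection (a simple stand-in for the paper's spatial-aware mining, which is not reproduced here), the toy target curve, and the float16 storage format are all assumptions made for illustration, as are the function names.

```python
import numpy as np


def init_mlp(rng, sizes=(3, 16, 16, 3)):
    """Random weights for a tiny per-pixel MLP; the layer sizes are illustrative."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Return the prediction and every layer's activations (kept for backprop)."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
        acts.append(x)
    return x, acts


def sgd_step(params, x, y, lr):
    """One plain gradient-descent step on the per-pixel squared error."""
    out, acts = forward(params, x)
    grad = 2.0 * (out - y) / len(x)
    updated = []
    for i in reversed(range(len(params))):
        w, b = params[i]
        if i < len(params) - 1:
            grad = grad * (acts[i + 1] > 0)  # ReLU derivative
        updated.append((w - lr * (acts[i].T @ grad), b - lr * grad.sum(axis=0)))
        grad = grad @ w.T  # propagate to the previous layer
    return updated[::-1]


def adapt(params, sdr, hdr, steps=500, batch=1024, keep=256, lr=0.01, seed=0):
    """Fine-tune on one image, training only on the hardest pixels of each batch."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        idx = rng.choice(len(sdr), size=batch, replace=False)
        pred, _ = forward(params, sdr[idx])
        err = np.abs(pred - hdr[idx]).mean(axis=1)
        hard = idx[np.argsort(err)[-keep:]]  # top-k mining (assumed, not spatial-aware)
        params = sgd_step(params, sdr[hard], hdr[hard], lr)
    return params


def mse(params, sdr, hdr):
    """Mean squared error over all pixels and channels."""
    pred, _ = forward(params, sdr)
    return float(np.mean((pred - hdr) ** 2))


def metadata_bytes(params, dtype=np.float16):
    """Size of the weights if stored in the given dtype (storage format assumed)."""
    return sum(w.size + b.size for w, b in params) * np.dtype(dtype).itemsize


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sdr = rng.random((4096, 3))  # stand-in SDR pixels in [0, 1]
    hdr = sdr ** 2.2             # toy expansion curve, not a real iTM target
    start = init_mlp(rng)        # stands in for the meta-trained model
    tuned = adapt(start, sdr, hdr)
    print(mse(start, sdr, hdr), mse(tuned, sdr, hdr), metadata_bytes(tuned))
```

In this sketch the randomly initialised network plays the role of the meta-trained starting point, and `metadata_bytes` shows why embedding such a model can be cheap: the assumed 3-16-16-3 network has 387 parameters, so its weights fit in well under a kilobyte at half precision.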

Index Terms

MLP Embedded Inverse Tone Mapping
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Reconstruction

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN:9798400706868

DOI:10.1145/3664647

General Chairs: Jianfei Cai (Monash University, Australia), Mohan Kankanhalli (NUS, Singapore), Balakrishnan Prabhakaran (UT Dallas, USA), Susanne Boll (University of Oldenburg, Germany)

Program Chairs: Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia), Liang Zheng (Australian National University, Australia), Vivek K. Singh (Rutgers University, USA), Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands), Lexing Xie (Australian National University, Australia), Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM '24

Sponsor:

SIGMM

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
