Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs

Perevozchikov, Georgy; Mehta, Nancy; Afifi, Mahmoud; Timofte, Radu

doi:10.1007/978-3-031-72764-1_14

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15094)

Included in the following conference series:

European Conference on Computer Vision


Abstract

Modern smartphone camera quality relies heavily on the image signal processor (ISP), which enhances captured raw images through carefully designed modules and produces final output images encoded in a standard color space (e.g., sRGB). Neural end-to-end learnable ISPs offer promising advancements and could potentially replace traditional ISPs, because they can adapt to a new camera model without the extensive per-camera tuning that nearly every module of a traditional ISP typically requires. However, a key challenge for recent learning-based ISPs is the need to collect a large paired dataset for each distinct camera model, since intrinsic camera characteristics influence how the input raw images are formed. This paper tackles this challenge by introducing a novel method for unpaired learning of raw-to-raw translation across diverse cameras. Specifically, we propose Rawformer, an unsupervised Transformer-based encoder-decoder method for raw-to-raw translation. It accurately maps raw images captured by one camera to the target camera, facilitating the generalization of learnable ISPs to new, unseen cameras. Our method demonstrates superior performance on real camera datasets, achieving higher accuracy than previous state-of-the-art techniques and preserving a more robust correlation between the original and translated raw images. The code and pretrained models are available at https://github.com/gosha20777/rawformer.
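Unpaired learning means there is no pixel-aligned ground truth showing how a raw image from camera A would look if captured by camera B. A common way to supervise translation under this constraint, popularized by CycleGAN (Zhu et al., ICCV 2017), is a cycle-consistency loss: mapping A to B and back again should reproduce the input. The NumPy sketch below illustrates only that generic objective. It assumes hypothetical generators `g_ab` and `g_ba` that are per-pixel 4x4 channel mixers applied to packed four-channel raw patches, standing in for learned networks. It is not Rawformer's architecture or its full training objective.

```python
import numpy as np


def cycle_consistency_loss(x_a, x_b, g_ab, g_ba):
    """Generic L1 cycle loss: A -> B -> A plus B -> A -> B reconstruction error."""
    rec_a = g_ba(g_ab(x_a))
    rec_b = g_ab(g_ba(x_b))
    return float(np.mean(np.abs(rec_a - x_a)) + np.mean(np.abs(rec_b - x_b)))


def make_linear_mapper(matrix):
    """Hypothetical stand-in for a learned generator: mixes channels per pixel."""
    def apply(raw):  # raw has shape (H, W, C); matrix has shape (C, C)
        return raw @ matrix.T
    return apply


rng = np.random.default_rng(0)
# Unpaired toy patches: the two cameras did not capture the same scene.
x_a = rng.random((8, 8, 4))  # packed four-channel raw patch from camera A
x_b = rng.random((8, 8, 4))  # packed four-channel raw patch from camera B

m = np.eye(4) + 0.1 * rng.standard_normal((4, 4))  # hypothetical A -> B mixing
g_ab = make_linear_mapper(m)
g_ba_exact = make_linear_mapper(np.linalg.inv(m))  # true inverse of g_ab
g_ba_identity = make_linear_mapper(np.eye(4))      # ignores the camera difference

loss_exact = cycle_consistency_loss(x_a, x_b, g_ab, g_ba_exact)
loss_identity = cycle_consistency_loss(x_a, x_b, g_ab, g_ba_identity)
print(loss_exact)     # close to 0 (floating-point round-off only)
print(loss_identity)  # clearly greater than 0
```

A cycle loss alone is satisfied by any invertible pair of mappings, including ones that do not produce realistic target-camera images, which is why unpaired translation methods typically combine it with other terms.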

M. Afifi is now at Google.




Acknowledgments

This work was partly supported by The Alexander von Humboldt Foundation.

Author information

Authors and Affiliations

Computer Vision Lab, CAIDAS and IFI, University of Würzburg, John Skilton Strasse 4a, 97074, Würzburg, Germany
Georgy Perevozchikov, Nancy Mehta & Radu Timofte
York University, 4700 Keele Street, Toronto, ON, Canada, M3J 1P3
Mahmoud Afifi


Corresponding author

Correspondence to Nancy Mehta.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

Electronic supplementary material

Supplementary material 1 (PDF, 14071 KB)


About this paper

Cite this paper

Perevozchikov, G., Mehta, N., Afifi, M., Timofte, R. (2025). Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15094. Springer, Cham. https://doi.org/10.1007/978-3-031-72764-1_14


DOI: https://doi.org/10.1007/978-3-031-72764-1_14
Published: 25 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72763-4
Online ISBN: 978-3-031-72764-1
eBook Packages: Computer Science, Computer Science (R0)
