Abstract
Ghosting artifacts caused by moving objects and misalignments are a key challenge in constructing high dynamic range (HDR) images. Current methods first register the input low dynamic range (LDR) images using optical flow before merging them. This process is error-prone and often causes ghosting in the resulting merged image. We propose a novel dual-attention-guided end-to-end deep neural network, called DAHDRNet, which produces high-quality ghost-free HDR images. Unlike previous methods that directly stack the LDR images or features for merging, we use dual-attention modules to guide the merging according to the reference image. DAHDRNet thus exploits both spatial attention and feature channel attention to achieve ghost-free merging. The spatial attention modules automatically suppress undesired components caused by misalignments and saturation, and enhance the fine details in the non-reference images. The channel attention modules adaptively rescale channel-wise features by considering the inter-dependencies between channels. The dual-attention approach is applied recurrently to further improve the feature representation, and thus the alignment. A dilated residual dense block is devised to make full use of the hierarchical features and increase the receptive field when hallucinating missing details. To recover photo-realistic images, we employ a hybrid loss function consisting of a perceptual loss, a total variation loss, and a content loss. Although DAHDRNet is not flow-based, it can be applied to flow-based registration to reduce artifacts caused by optical-flow estimation errors. Experiments on different datasets show that the proposed DAHDRNet achieves state-of-the-art quantitative and qualitative results.
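The two attention mechanisms described in the abstract can be illustrated with a minimal sketch. This is not the authors' DAHDRNet implementation; it is a hypothetical NumPy illustration of the general idea: a spatial attention map, computed from the reference and non-reference features, rescales the non-reference features per pixel (suppressing misaligned or saturated regions), while a squeeze-and-excitation-style channel attention rescales whole feature channels. All weights and shapes (`w`, `w1`, `w2`) are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(ref_feat, non_ref_feat, w):
    """Hypothetical spatial attention.

    A per-pixel attention map in (0, 1) is computed from the concatenated
    reference and non-reference features (here a 1x1-convolution-like
    channel-mixing matrix `w`), then used to rescale the non-reference
    features so that misaligned regions are suppressed before merging.
    """
    stacked = np.concatenate([ref_feat, non_ref_feat], axis=0)   # (2C, H, W)
    att = sigmoid(np.einsum('oc,chw->ohw', w, stacked))          # (C, H, W), values in (0, 1)
    return non_ref_feat * att

def channel_attention(feat, w1, w2):
    """Hypothetical squeeze-and-excitation-style channel attention.

    Global average pooling produces one descriptor per channel; a two-layer
    bottleneck (ReLU then sigmoid) turns it into per-channel weights that
    rescale the feature map, modelling inter-channel dependencies.
    """
    z = feat.mean(axis=(1, 2))                 # squeeze: (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # excitation: (C,) in (0, 1)
    return feat * s[:, None, None]
```

Because both attention maps lie in (0, 1), each module can only attenuate features, never amplify them, which matches the abstract's description of suppressing undesired components from the non-reference images.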
Acknowledgements
This work was partially supported by the Centre for Augmented Reasoning at the Australian Institute for Machine Learning, ARC (DP140102270, DP160100703), and NSFC (61871328, 61971273, 61901384).
Communicated by Rei Kawakami.
Cite this article
Yan, Q., Gong, D., Shi, J.Q. et al. Dual-Attention-Guided Network for Ghost-Free High Dynamic Range Imaging. Int J Comput Vis 130, 76–94 (2022). https://doi.org/10.1007/s11263-021-01535-y