Kernel Attention Network for Single Image Super-Resolution

Published: 05 July 2020

Abstract

Recently, attention mechanisms have been increasingly adopted in convolutional neural networks (CNNs), and representative mechanisms such as channel attention (CA) and spatial attention (SA) have been widely applied to single image super-resolution (SISR). However, existing architectures apply these attention mechanisms to SISR directly, without much consideration of the intrinsic characteristics of the task, which results in weaker representational power. In this article, we propose a novel kernel attention module (KAM) for SISR that enables the network to adjust its receptive field size according to the scale of the input by dynamically selecting the appropriate kernel. Building on this module, we stack multiple KAMs with group and residual connections to form a novel architecture for SISR, which allows the network to learn more discriminative representations by filtering information under different receptive fields. As a result, the network is more sensitive to multi-scale features, and a single network can handle the multi-scale SR task by predefining the upscaling modules. Other attention mechanisms in super-resolution are also investigated and illustrated in detail in this article. Extensive benchmark evaluation shows that, thanks to the kernel attention mechanism, our method outperforms other state-of-the-art methods.
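The abstract describes the KAM only at a high level, so the following is a minimal sketch of the general idea of kernel selection, not the authors' implementation. It assumes a gate in the style of selective-kernel attention: parallel branches with different kernel sizes, a globally pooled descriptor, and a softmax across branches. It works on a 1-D signal in plain Python for brevity, and the names `conv1d_same`, `kernel_attention_1d`, and `gate_weights` are hypothetical.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1-D correlation that keeps the input length (odd kernels)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kernel_attention_1d(signal, kernels, gate_weights):
    """Fuse branches with different receptive fields using softmax weights.

    kernels      -- one filter per branch (odd lengths, e.g. 3 and 5 taps)
    gate_weights -- hypothetical learned scalars, one per branch, that map
                    the pooled descriptor to a branch score (an assumption,
                    standing in for the small gating network a real module
                    would learn)
    """
    # 1. Run every branch; each kernel size gives a different receptive field.
    branches = [conv1d_same(signal, k) for k in kernels]
    # 2. Sum the branches and squeeze them to one global descriptor.
    fused = [sum(values) for values in zip(*branches)]
    descriptor = sum(fused) / len(fused)
    # 3. Score each branch from the descriptor and normalise across branches.
    attention = softmax([w * descriptor for w in gate_weights])
    # 4. Weighted sum of branch outputs: the favoured kernel dominates.
    output = [
        sum(a * branch[i] for a, branch in zip(attention, branches))
        for i in range(len(signal))
    ]
    return output, attention


# Example: a 3-tap and a 5-tap averaging branch on a toy signal.
signal = [0.0, 1.0, 0.0, 2.0, 0.0, 1.0, 0.0]
kernels = [[1 / 3] * 3, [1 / 5] * 5]
output, attention = kernel_attention_1d(signal, kernels, gate_weights=[1.0, -1.0])
```

Because the attention weights depend on the input through the pooled descriptor, the effective receptive field changes per input, which is the behaviour the abstract attributes to kernel attention. A real module would operate on 2-D feature maps with per-channel weights.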

    Information

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 3
    August 2020
    364 pages
    ISSN: 1551-6857
    EISSN: 1551-6865
    DOI: 10.1145/3409646

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 July 2020
    Online AM: 13 May 2020
    Accepted: 01 May 2020
    Revised: 01 April 2020
    Received: 01 May 2019
    Published in TOMM Volume 16, Issue 3

    Author Tags

    1. Image super-resolution
    2. kernel attention
    3. multi-scale features
    4. receptive field

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Sichuan Science and Technology Program
    • National Natural Science Foundation of China
