Residual Refinement Network with Attribute Guidance for Precise Saliency Detection

Published: 22 July 2021

Abstract

Salient object detection is a long-studied problem in multimedia and computer vision. Recent state-of-the-art results have been achieved with fully convolutional networks (FCNs) and various pyramid-like encoder-decoder frameworks. Starting from a common encoder-decoder architecture, we enhance a residual refinement network with feature purification for better saliency estimation. To this end, we improve the global knowledge streams with intermediate supervision for global saliency estimation and design a dedicated feature subtraction module for residual learning. Building on this strengthened network, we introduce an attribute encoding sub-network (AENet) with a grid aggregation block (GAB) that guides the final saliency predictor toward more accurate saliency maps. In addition to the traditional cross-entropy loss, the network is trained with a novel constraint loss to yield finer results. Extensive experiments on five public benchmarks show that our method achieves better or comparable performance relative to previous state-of-the-art methods.
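
The residual-learning and intermediate-supervision ideas mentioned in the abstract can be illustrated with a minimal sketch. The code below is not the authors' network: it assumes a flattened four-pixel toy image, a hand-picked `residual` standing in for the output of a refinement branch, and hypothetical helper names (`refine`, `bce`, `supervised_loss`). It shows only the generic pattern of adding a residual correction to a coarse prediction and summing cross-entropy over every stage. The feature subtraction module, AENet, GAB, and the constraint loss are not specified in the abstract and are not modeled here.

```python
import math


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy between sigmoid(logits) and 0/1 targets."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Clamp the probability so log() never receives 0.
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(logits)


def refine(coarse_logits, residual):
    """Generic residual refinement: coarse map plus a correction term."""
    return [c + r for c, r in zip(coarse_logits, residual)]


def supervised_loss(stage_logits, targets):
    """Intermediate supervision: sum the loss over every stage's prediction."""
    return sum(bce(logits, targets) for logits in stage_logits)


# Toy data (assumed for illustration): four pixels, flattened.
targets = [1, 0, 1, 0]                # ground-truth saliency mask
coarse = [0.5, 0.2, -0.1, -0.3]       # coarse global estimate (logits)
residual = [1.5, -1.2, 1.1, -0.7]     # hypothetical refinement-branch output

refined = refine(coarse, residual)
loss_coarse = bce(coarse, targets)
loss_refined = bce(refined, targets)
total = supervised_loss([coarse, refined], targets)

print("coarse loss: ", loss_coarse)
print("refined loss:", loss_refined)
print("total loss:  ", total)
```

In this toy setting the residual pushes each logit toward its target, so the refined prediction has a lower cross-entropy than the coarse one, while the summed loss still penalizes errors at both stages.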

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 17, Issue 3
    August 2021
    443 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3476118
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 July 2021
    Accepted: 01 December 2020
    Revised: 01 October 2020
    Received: 01 April 2020
    Published in TOMM Volume 17, Issue 3

    Author Tags

    1. Salient object detection
    2. residual learning
    3. deep intermediate supervision
    4. attribute encoding

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • NSFC
    • National Natural Science Foundation of China
    • Youth Innovation Promotion Association CAS
