Elsevier

Computers & Graphics

Volume 104, May 2022, Pages 72-85

Technical Section
Visual saliency detection via a recurrent residual convolutional neural network based on densely aggregated features

https://doi.org/10.1016/j.cag.2022.03.011

Highlights

  • An aggregation module is designed to aggregate densely connected features with different resolutions, enabling full communication and fusion between different network layers.

  • An improved recurrent residual refinement mechanism is proposed, in which the residuals are learned recurrently under deep supervision to achieve continuous optimization of the saliency map.

  • Extensive experiments demonstrate the advantages of DAF-RRN over state-of-the-art methods.

Abstract

Current visual saliency detection algorithms based on deep learning suffer from reduced detection effect in complex scenes owing to ineffective feature expression and poor generalization. The present study addresses this issue by proposing a recurrent residual network based on dense aggregated features. Firstly, different levels of dense convolutional features are extracted from the ResNeXt101 network. Then, the features of all layers are aggregated under an Atrous spatial pyramid pooling operation, which makes comprehensive use of all possible saliency cues. Finally, the residuals are learned recurrently under a deep supervision mechanism to achieve continuous optimization of the saliency map. Application of the proposed algorithm to publicly available datasets demonstrates that the dense aggregation of features not only enhances the aggregation of effective information within a single layer, but also enhances external interactions between information at different feature levels. As a result, the proposed algorithm provides better detection ability than that of current state-of-the-art algorithms.

Introduction

Salient object detection is an important field of study in image information processing that seeks to recognize and capture the areas of images that are of particular relevance to the overall scene in a similar manner to which humans recognize salient objects in scenes. Accordingly, salient object detection represents a machine learning process that plays a key role in various computer vision tasks, such as pedestrian reidentification, target tracking, and semantic segmentation. Numerous advancements in recent years, such as the development of deep learning technology and the enhancement of feature expression ability, have resulted in significant and continuous progress in the performance of salient object detection algorithms [1], [2].

Salient object detection algorithms have benefited greatly from the ability of deep neural networks to extract effective features from unprocessed images automatically, and the rich semantic information generated can reflect differences in saliency between image regions. For example, Li et al. [3] constructed a multiscale deep features (MDF) model directly from three levels of features obtained from a convolutional neural network (CNN) [4], and the MDF model was applied to train superpixel-level classifiers to achieve salient object detection. The generated semantic information was demonstrated to enhance the smoothness of the salient region boundaries predicted by the image-level features, while the region-level depth features enhanced the accuracy of the selected salient object regions. Accordingly, Li et al. [5] further proposed a deep contrast learning (DCL) model that fused image-level and region-level convolutional features to improve detection accuracy. Furthermore, the statistical information contained in conventional color, texture, and center priors was demonstrated to complement the detection capability associated with deep features. Liu et al. [6] proposed a saliency detection model that combined a global model with local optimization using conventional features, where the recurrent fully convolutional network (RFCN) algorithm was employed to incorporate the salient prior map into the network structure. Both methods improved salient object detection performance.

In recent years, with the continuous development of deep learning technology, more models for saliency detection have been developed. Ning et al. [7] proposed a feature selection network that combines global and local fine-grained features to realize person re-identification. Ning et al. [8] proposed a model with joint weak saliency and attention awareness, which obtains more complete global features by weakening saliency features and then diversifies the saliency features via attention diversity to improve model performance. Ning et al. [9] proposed a lightweight encoding and decoding network (EDNet) that balances network prediction accuracy and time consumption by using feature fusion and efficient deconvolution in the feature decoding stage. Gao et al. [10] proposed a salient object detection framework for surveillance applications based on an intelligent network with hierarchical cloud computing and scene-specific edge computing, which allows the saliency detection model in each edge server to be adapted to its own specific scene while remaining effective in other environments. Gao et al. [11] proposed a multi-stage context perception scheme that efficiently extracts the contextual information corresponding to different-size receptive fields in a single image, together with a stage-wise refinement that allocates label information to different parts of the network to help it learn enriched, semantically common knowledge. Zhang et al. [12] proposed a region-proposal-based optical flow strategy to suppress the saliency enhancement of non-salient regions caused by a moving background, and developed a bidirectional Bayesian state transition strategy to model motion uncertainty for refining the spatiotemporal saliency feature.
These models have made clear breakthroughs in preserving detail information and improving operational efficiency, but their performance degrades in complex scenes.

The present work addresses this issue by proposing an algorithm that uses densely aggregated features in conjunction with a recurrent residual network (DAF-RRN) to detect salient objects in images. Firstly, different levels of dense convolutional features are extracted from a basic network, and then the features of all layers are aggregated to make comprehensive use of all possible saliency cues. Finally, the residuals are learned recurrently under a deep supervision mechanism to achieve continuous optimization of the saliency map. The proposed algorithm is applied to publicly available datasets, and the results demonstrate that the dense aggregation of features not only enhances the aggregation of effective information within a single layer, but also enhances external interactions between information at different feature levels, which increases the detection ability of the proposed algorithm relative to that of current state-of-the-art visual saliency detection algorithms.
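The external aggregation step mentioned above relies on atrous spatial pyramid pooling (ASPP), which applies convolutions at several dilation rates so that one feature map is summarized under receptive fields of different sizes. The following single-channel NumPy sketch is a simplification of the idea: a real ASPP branch learns its own multi-channel kernel and typically adds an image-pooling branch, so the fixed averaging kernel and the rate set `(1, 2, 4)` here are purely illustrative assumptions.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """Naive 2D convolution with dilation `rate`, zero-padded so the
    output keeps the input resolution (square odd-sized kernel)."""
    k = kernel.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Sample the padded input at `rate`-spaced positions.
            patch = xp[i:i + rate * k:rate, j:j + rate * k:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

def aspp(x, rates=(1, 2, 4)):
    """ASPP sketch: run parallel atrous branches over the same input
    and stack the results; a stand-in 3x3 averaging kernel replaces
    the learned per-branch kernels of a real network."""
    kernel = np.full((3, 3), 1.0 / 9.0)
    return np.stack([dilated_conv2d(x, kernel, r) for r in rates])

feat = np.ones((5, 5))
branches = aspp(feat)   # one output map per dilation rate
```

With a constant input, the rate-1 branch reproduces the input away from the borders, while larger rates reach further into the zero padding near corners, illustrating how each branch sees a different effective receptive field.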

The contributions can be summarized as follows:

(1) The proposed model uses dense connections to aggregate effective information, combining feature dimension reduction, feature reuse, and feature cascading. Compared with existing methods, the resulting feature information is more comprehensive.

(2) All levels of dense features with different resolutions are aggregated via the proposed aggregation module, which enables full communication and fusion between different network layers and enriches the obtained dense features with strong expressive ability and abundant saliency cues.

(3) An improved Recurrent Residual Refinement Aggregating Network (R3ANet) is designed to generate initial saliency maps from the densely aggregated features, with which the residuals are learned recurrently under a deep supervision mechanism to achieve continuous optimization of the saliency map.

(4) Extensive experiments have been carried out on three datasets, demonstrating the advantages of DAF-RRN over state-of-the-art methods through both quantitative and qualitative evaluation.
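Contribution (3) describes a recurrent residual update: an initial saliency map is repeatedly corrected by a learned residual, and every intermediate map is deeply supervised. The NumPy sketch below shows only the recurrence and the collection of intermediate maps; the linear residual predictor is a hypothetical stand-in for the paper's learned R3ANet, and the step size and step count are illustrative assumptions.

```python
import numpy as np

def refine(initial_map, features, steps=3, lr=0.5):
    """Recurrent residual refinement (sketch): at each step a residual
    correction is predicted from the aggregated features and added to
    the current saliency map. The predictor here is a fixed linear
    pull toward the feature evidence, not a trained network."""
    s = initial_map.copy()
    maps = [s]  # deep supervision: every intermediate map can be supervised
    for _ in range(steps):
        residual = lr * (features - s)       # hypothetical residual predictor
        s = np.clip(s + residual, 0.0, 1.0)  # keep the map a valid saliency map
        maps.append(s)
    return maps

# Toy example: refinement pulls an empty initial map toward the evidence.
init = np.zeros((4, 4))
feat = np.full((4, 4), 0.8)
maps = refine(init, feat)
```

Each iteration halves the remaining error, so the final map is noticeably closer to the feature evidence than the initial one; in the actual model, the supervision signal at every step is the ground-truth mask rather than the features.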

Section snippets

Related works

Benefiting from the powerful feature learning and expression ability of convolutional networks, deep learning models based on convolutional feature fusion have continuously surpassed the records of traditional handcrafted-feature algorithms on existing datasets, and have gradually become the mainstream direction of visual saliency detection. These deep methods can be divided into the following categories: multi-layer perceptron, fully convolutional network, hybrid network, and RGB-D saliency detection. This

Recurrent residual network based on densely aggregated features

The overall structure and execution process of the DAF-RRN algorithm are shown in Fig. 1. Firstly, ResNeXt101 [36] is used to learn and provide features at different resolutions, and the information within a single layer is aggregated by dense connections to obtain different levels of features. Then, the ASPP operation is applied to achieve external aggregation across all feature levels. Finally, the densely aggregated features are employed for learning residuals under the deep supervision

Description of experimental parameters

The training dataset employed for the experiments was the MSRA10K salient object dataset, which contains 10,000 natural images. The images were resized to 300 × 300 during training. To enhance the robustness of the network to image transformations and simultaneously alleviate the over-fitting problem, data augmentation was used, applying random rotation, random cropping, and horizontal flipping to the images in the training dataset. The testing datasets employed were the ECSSD, HKU-IS, and
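The augmentation pipeline described above can be sketched as follows. The crop size, the 90-degree rotation (a stand-in for the paper's unspecified random rotation), and the seeded generator are all illustrative assumptions; the essential point for salient object detection is that the ground-truth mask must receive exactly the same spatial transform as the image.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility of the sketch

def augment(image, mask, crop=256):
    """Paired augmentation sketch: random rotation (simplified to
    90-degree multiples), random horizontal flip, and random crop,
    applied identically to the image and its saliency mask."""
    k = int(rng.integers(0, 4))                 # random rotation
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:                      # random horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    h, w = image.shape[:2]                      # random crop
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    return (image[top:top + crop, left:left + crop],
            mask[top:top + crop, left:left + crop])

img = rng.random((300, 300, 3))   # stand-in for a resized training image
msk = rng.random((300, 300))      # stand-in for its ground-truth mask
a_img, a_msk = augment(img, msk)
```

Because image and mask pass through the same rotation, flip, and crop, pixel-level correspondence between input and supervision is preserved, which is what makes augmentation safe for dense prediction tasks.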

Conclusion

In this paper, a recurrent residual network based on densely aggregated features is proposed to realize visual saliency detection in complex scenes. The model uses a compact mechanism to improve the reuse rate and continuity of features. Multi-level features can be obtained, which helps to maintain the integrity of the target region and improve the smoothness of the detection region. In order to realize the information fusion of features at different depths and with different resolutions in

CRediT authorship contribution statement

Chunjian Hua: Conceptualization, Methodology. Xintong Zou: Data curation, Writing – original draft. Yan Ling: Visualization, Investigation. Ying Chen: Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors wish to acknowledge the support for the research work from the National Natural Science Foundation of China under Grant No. [62173160].

References (49)

  • ZhangP.P. et al.

    Hyperfusion-net: hyper-densely reflective feature fusion for salient object detection

    Pattern Recognit

    (2019)
  • LuoA. et al.

    Webly-supervised learning for salient object detection

    Pattern Recognit

    (2020)
  • BorjiA. et al.

    Salient object detection: A survey

    Comput Vis Media

    (2019)
  • WangW.G. et al.

    Salient object detection in the deep learning era: an in-depth survey

    (2020)
  • Li GB, Yu YZ. Visual saliency based on multiscale deep features. In: IEEE conference on computer vision and pattern...
  • Krizhevsky A, Sutskever I, Hinton GE, et al. Imagenet classification with deep convolutional neural networks. In:...
  • Li GB, Yu YZ. Deep Contrast Learning for Salient Object Detection. In: IEEE conference on computer vision and pattern...
  • LiuF. et al.

    Deep network saliency detection based on global model and local optimization

    Acta Opt Sin

    (2017)
  • NingX. et al.

    Feature refinement and filter network for person re-identification

    IEEE Trans Circuits Syst Video Technol

    (2020)
  • XinN. et al.

    JWSAA: joint weak saliency and attention aware for person re-identification

    Neurocomputing

    (2021)
  • NingX. et al.

    Real-time 3d face alignment using an encoder–decoder network with an efficient deconvolution layer

    IEEE Signal Process Lett

    (2020)
  • GaoZ. et al.

    Salient object detection in the distributed cloud–edge intelligent network

    IEEE Netw

    (2020)
  • GaoZ. et al.

    Trustful internet of surveillance things based on deeply represented visual co-saliency detection

    IEEE Internet Things J

    (2020)
  • ZhangJ. et al.

    Industrial pervasive edge computing-based intelligence iot for surveillance saliency detection

    IEEE Trans Ind Inf

    (2021)
  • ZhaoR. et al.

    Saliency detection by multi-context deep learning

  • LeeG. et al.

    Deep saliency with encoded low level distance map and high-level features

  • KimJ. et al.

    A shape-based approach for salient object detection using deep learning

  • FangZ. et al.

    Extraction of refined deep feature and its application in saliency detection

    J Comput-Aided Des Comput Graph

    (2019)
  • WangL.Z. et al.

    Saliency detection with recurrent fully convolutional networks

  • ZhangP.P. et al.

    Learning uncertain convolutional features for accurate saliency detection

  • Zhao WB, Zhang J, et al. Weakly Supervised Video Salient Object Detection. In: IEEE conference on computer vision and...
  • Luo ZM, Mishra A, et al. Non-local deep features for salient object detection. In: IEEE conference on computer vision...
  • ZhangX. et al.

    Boundary-aware high-resolution network with region enhancement for salient object detection

    Neurocomputing

    (2020)

    This article was recommended for publication by J. Zheng.
