Elsevier

Computers & Graphics

Volume 104, May 2022, Pages 72-85

Technical Section
Visual saliency detection via a recurrent residual convolutional neural network based on densely aggregated features

https://doi.org/10.1016/j.cag.2022.03.011

Highlights

  • An aggregation module is designed to aggregate densely connected features with different resolutions, enabling full communication and fusion between different network layers.

  • An improved recurrent residual refinement mechanism is proposed, in which the residuals are learned recurrently under deep supervision to achieve continuous optimization of the saliency map.

  • Extensive experiments demonstrate the advantages of DAF-RRN over state-of-the-art methods.

Abstract

Current visual saliency detection algorithms based on deep learning suffer from reduced detection effect in complex scenes owing to ineffective feature expression and poor generalization. The present study addresses this issue by proposing a recurrent residual network based on dense aggregated features. Firstly, different levels of dense convolutional features are extracted from the ResNeXt101 network. Then, the features of all layers are aggregated under an Atrous spatial pyramid pooling operation, which makes comprehensive use of all possible saliency cues. Finally, the residuals are learned recurrently under a deep supervision mechanism to achieve continuous optimization of the saliency map. Application of the proposed algorithm to publicly available datasets demonstrates that the dense aggregation of features not only enhances the aggregation of effective information within a single layer, but also enhances external interactions between information at different feature levels. As a result, the proposed algorithm provides better detection ability than that of current state-of-the-art algorithms.

Introduction

Salient object detection is an important field of study in image information processing that seeks to recognize and capture the areas of images that are of particular relevance to the overall scene in a similar manner to which humans recognize salient objects in scenes. Accordingly, salient object detection represents a machine learning process that plays a key role in various computer vision tasks, such as pedestrian reidentification, target tracking, and semantic segmentation. Numerous advancements in recent years, such as the development of deep learning technology and the enhancement of feature expression ability, have resulted in significant and continuous progress in the performance of salient object detection algorithms [1], [2].

Salient object detection algorithms have benefited greatly from the ability of deep neural networks to extract effective features from unprocessed images automatically, and the rich semantic information generated can reflect differences in saliency between image regions. For example, Li et al. [3] constructed a multiscale deep features (MDF) model directly from three levels of features obtained from a convolutional neural network (CNN) [4], and the MDF model was applied to train superpixel-level classifiers to achieve salient object detection. The generated semantic information was demonstrated to enhance the smoothness of the salient region boundaries predicted by the image-level features, while the region-level depth features enhanced the accuracy of the selected salient object regions. Accordingly, Li et al. [5] further proposed a deep contrast learning (DCL) model that fused image-level and region-level convolutional features to improve detection accuracy. Furthermore, the statistical information contained in conventional color, texture, and center priors was demonstrated to complement the detection capability associated with deep features. Liu et al. [6] proposed a saliency detection model that combined a global model with local optimization using conventional features, where the recurrent fully convolutional network (RFCN) algorithm was employed to incorporate the salient prior map into the network structure. Both methods improved salient object detection performance.

In recent years, with the continuous development of deep learning technology, more models for saliency detection have been developed. Ning et al. [7] proposed a feature selection network that combines global and local fine-grained features to realize person re-identification. Ning et al. [8] proposed a model with joint weak saliency and attention awareness, which obtains more complete global features by weakening saliency features and then diversifies the saliency features via attention diversity to improve model performance. Ning et al. [9] proposed a lightweight encoding and decoding network (EDNet) that balances network prediction accuracy and time consumption by using feature fusion and efficient deconvolution in the feature decoding stage. Gao et al. [10] proposed a salient object detection framework for surveillance applications based on an intelligent network with hierarchical cloud computing and scene-specific edge computing, which allows the saliency detection model in each edge server to be adapted to its own specific scene while remaining effective in other environments. Gao et al. [11] proposed a multi-stage context perception scheme that efficiently extracts the contextual information corresponding to different-size receptive fields in a single image, together with a stage-wise refinement that allocates label information to different parts of the network to help it learn enriched, semantically common knowledge. Zhang et al. [12] proposed a region-proposal-based optical flow strategy to suppress the saliency enhancement of non-salient regions caused by a moving background, and developed a bidirectional Bayesian state transition strategy to model motion uncertainty for refining the spatiotemporal saliency feature.
These models have made clear breakthroughs in preserving detail information and improving operational efficiency, but their performance degrades in complex scenes.

The present work addresses this issue by proposing an algorithm that uses densely aggregated features in conjunction with a recurrent residual network (DAF-RRN) to detect salient objects in images. Firstly, different levels of dense convolutional features are extracted from a basic network, and then the features of all layers are aggregated to make comprehensive use of all possible saliency cues. Finally, the residuals are learned recurrently under a deep supervision mechanism to achieve continuous optimization of the saliency map. The proposed algorithm is applied to publicly available datasets, and the results demonstrate that the dense aggregation of features not only enhances the aggregation of effective information within a single layer, but also enhances external interactions between information at different feature levels, which increases the detection ability of the proposed algorithm relative to that of current state-of-the-art visual saliency detection algorithms.
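The external aggregation step mentioned above relies on atrous spatial pyramid pooling (ASPP), which applies convolutions at several dilation rates so that one feature map is summarized under receptive fields of different sizes. The following single-channel NumPy sketch is a simplification of the idea: a real ASPP branch learns its own multi-channel kernel and typically adds an image-pooling branch, so the fixed averaging kernel and the rate set `(1, 2, 4)` here are purely illustrative assumptions.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """Naive 2D convolution with dilation `rate`, zero-padded so the
    output keeps the input resolution (square odd-sized kernel)."""
    k = kernel.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Sample the padded input at `rate`-spaced positions.
            patch = xp[i:i + rate * k:rate, j:j + rate * k:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

def aspp(x, rates=(1, 2, 4)):
    """ASPP sketch: run parallel atrous branches over the same input
    and stack the results; a stand-in 3x3 averaging kernel replaces
    the learned per-branch kernels of a real network."""
    kernel = np.full((3, 3), 1.0 / 9.0)
    return np.stack([dilated_conv2d(x, kernel, r) for r in rates])

feat = np.ones((5, 5))
branches = aspp(feat)   # one output map per dilation rate
```

With a constant input, the rate-1 branch reproduces the input away from the borders, while larger rates reach further into the zero padding near corners, illustrating how each branch sees a different effective receptive field.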

The contributions can be summarized as follows:

(1) The proposed model uses dense connections to aggregate effective information, combining feature dimension reduction, feature reuse, and feature cascading. Compared with existing methods, the resulting feature information is more comprehensive.

(2) All levels of dense features with different resolutions are aggregated via the proposed aggregation module, which enables full communication and fusion between different network layers and enriches the obtained dense features with strong expressive ability and abundant saliency cues.

(3) An improved Recurrent Residual Refinement Aggregating Network (R3ANet) is designed to generate initial saliency maps from the densely aggregated features, with which the residuals are learned recurrently under a deep supervision mechanism to achieve continuous optimization of the saliency map.

(4) Extensive experiments have been carried out on three datasets, demonstrating the advantages of DAF-RRN over state-of-the-art methods through both quantitative and qualitative evaluation.
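Contribution (3) describes a recurrent residual update: an initial saliency map is repeatedly corrected by a learned residual, and every intermediate map is deeply supervised. The NumPy sketch below shows only the recurrence and the collection of intermediate maps; the linear residual predictor is a hypothetical stand-in for the paper's learned R3ANet, and the step size and step count are illustrative assumptions.

```python
import numpy as np

def refine(initial_map, features, steps=3, lr=0.5):
    """Recurrent residual refinement (sketch): at each step a residual
    correction is predicted from the aggregated features and added to
    the current saliency map. The predictor here is a fixed linear
    pull toward the feature evidence, not a trained network."""
    s = initial_map.copy()
    maps = [s]  # deep supervision: every intermediate map can be supervised
    for _ in range(steps):
        residual = lr * (features - s)       # hypothetical residual predictor
        s = np.clip(s + residual, 0.0, 1.0)  # keep the map a valid saliency map
        maps.append(s)
    return maps

# Toy example: refinement pulls an empty initial map toward the evidence.
init = np.zeros((4, 4))
feat = np.full((4, 4), 0.8)
maps = refine(init, feat)
```

Each iteration halves the remaining error, so the final map is noticeably closer to the feature evidence than the initial one; in the actual model, the supervision signal at every step is the ground-truth mask rather than the features.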

Section snippets

Related works

Benefiting from the powerful feature learning and expression ability of convolutional networks, deep learning models based on convolutional feature fusion have continuously surpassed the records of traditional handcrafted-feature algorithms on existing datasets, and have gradually become the mainstream direction of visual saliency detection. These deep methods can be divided into the following categories: multi-layer perceptron, fully convolutional network, hybrid network, and RGB-D saliency detection. This

Recurrent residual network based on densely aggregated features

The overall structure and execution process of the DAF-RRN algorithm are shown in Fig. 1. Firstly, ResNeXt101 [36] is used to learn and provide features at different resolutions, and the information within a single layer is aggregated by dense connections to obtain different levels of features. Then, the ASPP operation is applied to achieve external aggregation across all feature levels. Finally, the densely aggregated features are employed for learning residuals under the deep supervision

Description of experimental parameters

The training dataset employed for the experiments was the MSRA10K salient object dataset, which contains 10,000 natural images. The images were resized to 300 × 300 during training. To enhance the robustness of the network to image transformations and simultaneously alleviate the over-fitting problem, data augmentation was used, applying random rotation, random cropping, and horizontal flipping to the images in the training dataset. The testing datasets employed were the ECSSD, HKU-IS, and
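The augmentation pipeline described above can be sketched as follows. The crop size, the 90-degree rotation (a stand-in for the paper's unspecified random rotation), and the seeded generator are all illustrative assumptions; the essential point for salient object detection is that the ground-truth mask must receive exactly the same spatial transform as the image.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility of the sketch

def augment(image, mask, crop=256):
    """Paired augmentation sketch: random rotation (simplified to
    90-degree multiples), random horizontal flip, and random crop,
    applied identically to the image and its saliency mask."""
    k = int(rng.integers(0, 4))                 # random rotation
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    if rng.random() < 0.5:                      # random horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    h, w = image.shape[:2]                      # random crop
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    return (image[top:top + crop, left:left + crop],
            mask[top:top + crop, left:left + crop])

img = rng.random((300, 300, 3))   # stand-in for a resized training image
msk = rng.random((300, 300))      # stand-in for its ground-truth mask
a_img, a_msk = augment(img, msk)
```

Because image and mask pass through the same rotation, flip, and crop, pixel-level correspondence between input and supervision is preserved, which is what makes augmentation safe for dense prediction tasks.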

Conclusion

In this paper, a recurrent residual network based on densely aggregated features is proposed to realize visual saliency detection in complex scenes. The model uses a compact mechanism to improve the reuse rate and continuity of features. Multi-level features can be obtained, which helps to maintain the integrity of the target region and improve the smoothness of the detection region. In order to realize the information fusion of features at different depths and with different resolutions in

CRediT authorship contribution statement

Chunjian Hua: Conceptualization, Methodology. Xintong Zou: Data curation, Writing – original draft. Yan Ling: Visualization, Investigation. Ying Chen: Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors wish to acknowledge the support for the research work from the National Natural Science Foundation of China under Grant No. [62173160].

References (49)

  • ZhangP.P. et al.

    Hyperfusion-net: hyper-densely reflective feature fusion for salient object detection

    Pattern Recognit

    (2019)
  • LuoA. et al.

    Webly-supervised learning for salient object detection

    Pattern Recognit

    (2020)
  • BorjiA. et al.

    Salient object detection: A survey

    Comput Vis Media

    (2019)
  • WangW.G. et al.

    Salient object detection in the deep learning era: an in-depth survey

    (2020)
  • Li GB, Yu YZ. Visual saliency based on multiscale deep features. In: IEEE conference on computer vision and pattern...
  • Krizhevsky A, Sutskever I, Hinton GE, et al. Imagenet classification with deep convolutional neural networks. In:...
  • Li GB, Yu YZ. Deep Contrast Learning for Salient Object Detection. In: IEEE conference on computer vision and pattern...
  • LiuF. et al.

    Deep network saliency detection based on global model and local optimization

    Acta Opt Sin

    (2017)
  • NingX. et al.

    Feature refinement and filter network for person re-identification

    IEEE Trans Circuits Syst Video Technol

    (2020)
  • XinN. et al.

    JWSAA: joint weak saliency and attention aware for person re-identification

    Neurocomputing

    (2021)
  • NingX. et al.

    Real-time 3d face alignment using an encoder–decoder network with an efficient deconvolution layer

    IEEE Signal Process Lett

    (2020)
  • GaoZ. et al.

    Salient object detection in the distributed cloud–edge intelligent network

    IEEE Netw

    (2020)
  • GaoZ. et al.

    Trustful internet of surveillance things based on deeply represented visual co-saliency detection

    IEEE Internet Things J

    (2020)
  • ZhangJ. et al.

    Industrial pervasive edge computing-based intelligence iot for surveillance saliency detection

    IEEE Trans Ind Inf

    (2021)
  • ZhaoR. et al.

    Saliency detection by multi-context deep learning

  • LeeG. et al.

    Deep saliency with encoded low level distance map and high-level features

  • KimJ. et al.

    A shape-based approach for salient object detection using deep learning

  • FangZ. et al.

    Extraction of refined deep feature and its application in saliency detection

    J Comput-Aided Des Comput Graph

    (2019)
  • WangL.Z. et al.

    Saliency detection with recurrent fully convolutional networks

  • ZhangP.P. et al.

    Learning uncertain convolutional features for accurate saliency detection

  • Zhao WB, Zhang J, et al. Weakly Supervised Video Salient Object Detection. In: IEEE conference on computer vision and...
  • Luo ZM, Mishra A, et al. Non-local deep features for salient object detection. In: IEEE conference on computer vision...
  • ZhangX. et al.

    Boundary-aware high-resolution network with region enhancement for salient object detection

    Neurocomputing

    (2020)

    This article was recommended for publication by J. Zheng.
