DrlNet: Blind object proposal quality assessment with discriminative response learning

https://doi.org/10.1016/j.dsp.2020.102810

Abstract

Object proposal quality assessment without ground truth as reference is a challenging task. Some existing methods measure quality with hand-crafted, subjective metrics, such as objectness and foreground confidence. Recently, deep learning has been adopted to directly estimate quantifiable metrics, such as Intersection over Union (IoU). However, we find that IoU, the commonly used quality metric, falls far short of fully describing the quality of an object proposal: proposals with the same IoU score may carry totally different amounts of discriminative information. We introduce a new metric named Discriminative Information Richness (DIR) to characterize the discriminative degree of a given object proposal. DIR is derived from the response intensity of the projected deep feature maps, in which high-response regions indicate the discriminative regions. In addition, we design a convolutional neural network named DrlNet that simultaneously predicts IoU scores and perceives the richness of the identification information. DrlNet is a multi-metric joint deep regression network for both spatial-covering prediction and discriminative-information-richness perception. Compared with IoU-only models, DrlNet provides a more comprehensive quality assessment. We perform comprehensive experiments on both the PASCAL VOC and COCO datasets. The experimental results show that DrlNet performs well on both proposal selection and object detection tasks. In particular, results on the COCO dataset demonstrate the good generalization ability of the proposed model.

Introduction

In the past few decades, academia has witnessed rapid development of object detection [1], especially with the boost of deep learning [2]. In Convolutional Neural Network (CNN) based two-stage object detection approaches, such as R-CNN [3], SPP-Net [4], Fast R-CNN [5] and Faster R-CNN [6], proposal algorithms are widely used for generating object candidates.

Beyond object detection, object proposal algorithms, which aim to provide bounding box candidates with high object-covering confidence, have played important roles in many other high-level computer vision tasks, such as object segmentation [7], [8], visual tracking [9], [10], and action detection [11], [12]. Like the region-wise processing strategies [13], [14], [15], [16], [17] widely used in computer vision, proposal-based processing is another commonly adopted manner. Object proposal algorithms generate massive target hypotheses, which specify the processing targets and greatly narrow the information space to be processed. This also means that the quality of object proposals directly influences the performance of such subsequent tasks: a small candidate pool with high-quality proposals greatly boosts the performance and efficiency of the subsequent steps of the application algorithm, whereas poor proposals may significantly degrade application performance.

Given the ground truth annotations, proposal quality can easily be estimated with the commonly used IoU metric. However, ground truth is usually unavailable for direct quality assessment in real-world applications, which makes this a blind assessment problem. No-reference object proposal quality ranking has been studied within proposal generation methods [18], [19], [20], [21] for suggesting good proposals. However, these built-in ranking modules can only provide relative superiority orders; they can hardly give a metric-based prediction, such as the IoU score. To address this concern, Wu et al. [22] proposed a generic proposal evaluator (GPE), which directly predicts the IoU score of a given object proposal. However, taking IoU as the only metric can hardly distinguish the quality difference between proposals covering different parts of the object.
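For reference, IoU measures the overlap between a proposal box and the ground truth box as the ratio of their intersection area to their union area. A minimal sketch follows; the (x1, y1, x2, y2) corner format is an assumption of this illustration, not a convention stated in the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A perfectly aligned pair scores 1.0, disjoint boxes score 0.0, and partial overlaps fall in between, which is exactly what a metric-based predictor such as GPE regresses.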

Fig. 1 (a) presents a cat image, whose ground truth object box is shown in Fig. 1 (b). Two bounding box candidates, which share the same IoU value and equally cover the right and left halves of the cat's body, are presented in Fig. 1 (c) and Fig. 1 (d), respectively. Apparently, GPE cannot effectively evaluate the relative superiority of the two candidate boxes since they have the same IoU score. Yet we can easily notice that the proposal in Fig. 1 (c) provides more discriminative information, which helps distinguish the category of the target object better than that in Fig. 1 (d); the proposal in Fig. 1 (c) should therefore receive a higher quality evaluation. Richness of discriminative information is a vital factor in proposal quality assessment and has broad application prospects in high-level tasks.

Taking unsupervised and weakly supervised detection tasks as examples, discriminative candidates can provide more non-redundant and valuable information for training. Accordingly, introducing a discriminative judgment helps the detector capture the key attributes efficiently and reduces computation and storage costs.

Hence, screening out region proposals rich in discriminative information from massive candidates is an important and promising task. However, the richness of discriminative information has never been comprehensively explored, especially as a quality indicator. In fact, no existing detection or recognition database annotates samples with quantitative scores for discriminative degree. One main obstacle is how to define a suitable metric to quantitatively compute the discriminativeness of a given proposal; a subsequent problem is how to predict this metric without ground truth as reference. In this paper, our contributions are as follows:

(1) We introduce a new metric named Discriminative Information Richness (DIR) to characterize the discriminative degree of the given proposal.

(2) A blind quality evaluation method within a discriminative response learning framework is proposed, which can simultaneously perceive the richness of the identification information and the target covering of the candidate area.
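The paper's exact DIR formulation is not reproduced in this excerpt; the NumPy sketch below only illustrates the idea stated above, that the response intensity of a deep feature map can score how much of an object's discriminative energy a proposal captures. The function name, the box format, and the normalization against the ground-truth region are all illustrative assumptions, not the paper's definition.

```python
import numpy as np

def dir_score(response_map, proposal, gt_box):
    """Illustrative DIR-style score: fraction of the object's total
    response energy that falls inside the proposal (hypothetical form).
    Boxes are (x1, y1, x2, y2) in response-map coordinates."""
    def crop_sum(box):
        x1, y1, x2, y2 = box
        return float(response_map[y1:y2, x1:x2].sum())
    total = crop_sum(gt_box)
    return crop_sum(proposal) / total if total > 0 else 0.0
```

In the spirit of Fig. 1, two boxes with identical IoU against the ground truth can receive very different scores here when the response map concentrates on one side of the object.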

Section snippets

Blind image and saliency quality assessment

In the computer vision community, blind quality assessment is not a new research topic, and a group of promising Blind Image Quality Assessment (BIQA) methods have been proposed in the last several years [23], [24], [25], [26], [27]. Oszust et al. [23] proposed to extract local features via derivative filters and adopted the support vector regression technique for blind image quality assessment. Liu et al. [24] creatively proposed to extract both low-level and high-level statistical features, and then

Blind discrimination assessment with deep response learning

To fully describe the quality of an object proposal, we propose a proposal quality assessment framework, namely the Discriminative Response Learning Network (DrlNet), which picks out the optimal proposals by inferring two complementary quality metrics. The first metric is IoU, which expresses the spatial consistency between candidate boxes and the ground truth. The second, namely DIR, characterizes the discriminative information richness of the candidate boxes. Intuitively, discriminative information
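The multi-metric joint regression described above amounts to a shared feature trunk with two scalar heads, one per metric. The NumPy sketch below shows only that structural idea; all names, shapes, and the sigmoid output squashing are illustrative assumptions, not DrlNet's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_trunk(x, w):
    # Shared feature extractor (stand-in for a convolutional backbone).
    return np.maximum(0.0, x @ w)  # ReLU(xW)

def two_head_forward(x, params):
    """Joint regression: one shared trunk, two scalar heads
    (an IoU prediction and a DIR prediction), each squashed to [0, 1]."""
    h = shared_trunk(x, params["w_shared"])
    iou_pred = 1.0 / (1.0 + np.exp(-(h @ params["w_iou"])))
    dir_pred = 1.0 / (1.0 + np.exp(-(h @ params["w_dir"])))
    return iou_pred, dir_pred

params = {
    "w_shared": rng.standard_normal((16, 8)),
    "w_iou": rng.standard_normal(8),
    "w_dir": rng.standard_normal(8),
}
x = rng.standard_normal(16)  # stand-in for pooled proposal features
iou_pred, dir_pred = two_head_forward(x, params)
```

Sharing the trunk lets both metrics be trained jointly from the same proposal features, which is the design choice the multi-metric framing implies.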

Experimental results and analysis

In this section, we present experimental results to demonstrate the performance of the proposed DrlNet from three aspects. In Sec. 4.1, we verify the effectiveness of our trained model with both qualitative analysis and quantitative evaluation. In Sec. 4.2, we test the proposal selection performance. Finally, in Sec. 4.3, we give evaluation and analysis about the generalization ability of the trained DrlNet to show whether it is suitable for images outside the training categories.

Conclusion

In this paper, we propose an object proposal quality assessment network, namely DrlNet, within a discriminative response learning framework. The proposed method can simultaneously perceive the richness of the identification information and the target covering of the candidate area without ground truth information. We conduct experiments on publicly available datasets and on images containing categories unseen by the trained models to verify the effectiveness and generalization ability of DrlNet.

CRediT authorship contribution statement

Qi Qi: Investigation, Methodology, Software. Kunqian Li: Conceptualization, Funding acquisition, Methodology, Visualization, Writing - original draft. Xinning Wang: Conceptualization, Writing - review & editing. Xin Luan: Resources, Supervision, Writing - review & editing. Dalei Song: Project administration, Resources, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The research has been supported in part by the National Natural Science Foundation of China under Grant 61906177, in part by the Natural Science Foundation of Shandong Province under Grant ZR2019BF034, and in part by the Fundamental Research Funds for the Central Universities under Grants 201813022 and 201964013.

Qi Qi received the B.S. and M.S. degrees from University of Jinan, Jinan, China, in 2015 and 2017, respectively. He is currently working toward the Ph.D. degree in College of Information Science and Engineering, Ocean University of China, Qingdao, China. His research interests include image processing and computer vision.

References (45)

  • Y. Zhu et al., The prediction of head and eye movement for 360 degree images, Signal Process. Image Commun. (2018)
  • Y. Zhu et al., The prediction of saliency map for head and eye movements in 360 degree images, IEEE Trans. Multimed. (2019)
  • Z. Zou et al., Object detection in 20 years: a survey
  • L. Jiao et al., A survey of deep learning-based object detection, IEEE Access (2019)
  • R. Girshick et al., Rich feature hierarchies for accurate object detection and semantic segmentation
  • K. He et al., Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2015)
  • R. Girshick, Fast R-CNN
  • S. Ren et al., Faster R-CNN: towards real-time object detection with region proposal networks
  • K. Li et al., Unsupervised co-segmentation for indefinite number of common foreground objects, IEEE Trans. Image Process. (2016)
  • J. Zhang et al., Multivideo object cosegmentation for irrelevant frames involved videos, IEEE Signal Process. Lett. (2016)
  • Y. Zhao et al., Temporal action detection with structured segment networks
  • W. Tao et al., Unified mean shift segmentation and graph region merging algorithm for infrared ship target segmentation, Opt. Eng. (2007)
Kunqian Li received his B.S. degree from China University of Petroleum (UPC), Qingdao, China, in 2012, and his Ph.D. degree from Huazhong University of Science and Technology (HUST), Wuhan, China, in 2018. He is currently a lecturer in the College of Engineering, Ocean University of China, Qingdao, China. His research interests include image processing and visual recognition.

Xinning Wang received her B.S. and M.E. degrees from Ocean University of China, Qingdao, China, in 2009 and 2012, respectively, and her Ph.D. degree from the Department of Computer Science and Software Engineering, Auburn University, in 2017. She is currently a post-doctoral research fellow at Ocean University of China, Qingdao, China. Her research interests span data mining and analytics, computer architecture and systems, cloud computing, machine learning, and cybersecurity.

Xin Luan received the B.S. and M.S. degrees from the School of Computer Science and Technology, Harbin Engineering University. She has served as a lecturer, associate professor, professor, and doctoral supervisor in the College of Information Science and Engineering, Ocean University of China, Qingdao, China, where she is currently an extramural doctoral supervisor. She is mainly engaged in research on ocean observation technology and artificial intelligence.

Dalei Song received his Ph.D. degree from Harbin Industrial University, Harbin, China, in 1999. From 1999 to 2001, he was a senior engineer with Lucent Technologies. He is currently a full professor in the College of Engineering, Ocean University of China, Qingdao, China. His research interests include machine intelligent perception, ocean observation technology, robot control technology, and computer vision.
