Exploring ubiquitous relations for boosting classification and localization
Introduction
The performance of learning models in image processing has improved rapidly in a short period, mainly owing to the development of deep neural networks driven by massive amounts of training data [1], [2], [3], [4]. Nevertheless, large-scale detection and segmentation tasks still suffer from a prominent lack of detailed annotations. The process of manually annotating training data is not only expensive and tedious but also error-prone. Exploring robust learning methods that require a minimal amount of supervision therefore remains a major challenge.
To address this problem, weakly supervised learning [5], [6], [7], [8] has gained significant attention for tackling large-scale detection and segmentation using only image-level labels as supervision (i.e., whether the queried object is present in the image or not). Compared with fully supervised methods, weakly supervised and unsupervised ones attempt to accomplish complex tasks using only scant image-level annotations and raw pixel features. Most recent works focus on mining general feature representations or transforming weak annotations into strong ones. What can humans perceive when facing a huge number of images? We can easily identify objects of the same category across images regardless of their locations, and we can tell which pixels belong to a given object. The reason is that we consciously exploit the Relation that permeates the visual world. As shown in Fig. 1(a), objects of the same category evidently present very similar visual appearances, and Fig. 1(b) shows a clear relation in material texture and edge coherence between superpixels belonging to the same object or background. We therefore have strong grounds to believe that relations can help classification and localization tasks, and some experiments [9], [10] have already given evidence that relations are profitable for deep learning models. The challenges are how to explore the ubiquitous relations in the visual world, and how to use them to boost existing learning models, especially without further annotations.
In this paper, we explore the ubiquitous relations to help learning models reach their full potential without further annotations. We first propose a relation exploring scheme to mine the relations, and then provide three specific instantiations of the scheme (i.e., Object-Relation, Superpixel-Relation, and Pixel-Relation) for different learning models. Object-Relation mines object-level relations among objects in different images using only image labels; we adopt it to guide few-shot classification models to focus on the principal object and to weaken the misleading influence of the background. Superpixel-Relation concentrates on relations among superpixels, as shown in Fig. 1(b); these relations can readily be used to amend saliency detection results without any additional annotation. Pixel-Relation explores relations between pixels on feature maps; we introduce it into saliency detection models to further discriminate uncertain saliency areas by comparing their feature pixels with those of certain areas, as shown in Fig. 1(c).
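The three instantiations above share one measurement pattern: score the pairwise interaction between two entity feature vectors. As a minimal sketch, assuming cosine similarity as the relation score (the paper's exact measurement may differ), the same function can serve all three granularities:

```python
import numpy as np

def relation(a, b, eps=1e-8):
    """Pairwise relation score between two entity feature vectors.

    A hypothetical instantiation using cosine similarity; the exact
    kernel used by the model is defined in Section 3.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

# The same scheme applies at three granularities:
# Object-Relation:     a, b are pooled object embeddings from two images.
# Superpixel-Relation: a, b are colour/texture descriptors of superpixels.
# Pixel-Relation:      a, b are channel vectors at two feature-map positions.
```

Identical entities score near 1 and unrelated (orthogonal) ones near 0, so the score can directly weight how strongly one entity's prediction should influence another's.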
Our relation exploring scheme has several advantages: (a) the relations capture intrinsic spatial dependencies of entities by computing their direct interactions, without any additional strong annotation; (b) the scheme generalizes to different scenarios, including images, superpixels, and feature maps; (c) it can be flexibly adopted by existing learning models to boost their performance.
In summary, the main contributions of this paper are:
- We propose a universal relation measurement pattern that generalizes into three forms, unlocking the potential of existing non-fully supervised models while keeping the original model structures.
- We explore the object-level relation and utilize it to boost few-shot classification models.
- We mine the superpixel-level relation to amend existing salient object detection algorithms.
- We explore the pixel-level relation to refine unsatisfactory salient object detection results that contain plenty of uncertain areas.
The rest of this paper is organized as follows. Section 2 summarizes the related works. Section 3 formally introduces our model in detail. Section 4 presents the experimental results. Finally, we conclude our work in Section 5.
Related work
Machine learning algorithms play crucial roles in our lives. For example, some algorithms have achieved significant success in financial distress prediction [11], [12], [13], helping institutions guard against financial crises. Recently emerged intelligent applications, such as intelligent monitoring systems [14], self-driving cars [15], and oceanic data analysis [16], [17], [18], rely deeply on deep-learning methods.
The popular deep learning methods now use large-scale data. But in reality, there
Relation for non-fully supervised learning
In this section, we first define a general scheme of relation discovery. We then specialize it into three instantiations, i.e., Object-Relation, Superpixel-Relation, and Pixel-Relation, for different learning models in image understanding applications.
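To make the superpixel specialization concrete, the following is a minimal sketch, assuming a Gaussian affinity on mean-colour descriptors as the relation and relation-weighted averaging as the amendment step; the kernel, the descriptors, and the `sigma` bandwidth are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def superpixel_relations(descriptors, sigma=0.1):
    """Relation matrix over superpixel descriptors (e.g. mean colour).

    Hypothetical kernel: Gaussian affinity on descriptor distance, so
    visually similar superpixels obtain relation scores near 1.
    """
    d = np.asarray(descriptors, dtype=float)     # shape (n, k)
    diff = d[:, None, :] - d[None, :, :]         # pairwise differences
    dist2 = (diff ** 2).sum(-1)                  # squared distances
    return np.exp(-dist2 / (2 * sigma ** 2))     # (n, n) relation matrix

def amend_saliency(saliency, relations):
    """Amend per-superpixel saliency by relation-weighted averaging."""
    s = np.asarray(saliency, dtype=float)
    w = relations / relations.sum(axis=1, keepdims=True)  # row-normalise
    return w @ s
```

Under this sketch, a superpixel that is strongly related to a confidently salient neighbour inherits part of its saliency, while unrelated superpixels are left essentially unchanged.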
Experiments
We describe all the experiments in detail and show results for the applications of the three relations. For each task we choose a state-of-the-art algorithm as the baseline to demonstrate the benefit of our relations; we therefore do not repeat evaluations on other algorithms.
Conclusions
In this paper, we endeavored to mine the ubiquitous relation for boosting different state-of-the-art learning models. We introduced a relation exploring scheme to estimate the relation between two entities. We applied the basic scheme to different tasks including few-shot classification and saliency object detection for improvement. Firstly, the object relation can guide the classification model to learn the essential object feature with the condition of scarce training data. Secondly, the
CRediT authorship contribution statement
Xin Sun: Conceptualization, Supervision, Writing - original draft. Changrui Chen: Conceptualization, Data curation, Methodology, Writing - original draft. Junyu Dong: Writing - review & editing, Project administration, Investigation. Dan Liu: Resources, Data curation. Guosheng Hu: Writing - review & editing, Validation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (No. 61971388, U1706218, 41576011, L1824025), Key Research and Development Program of Shandong Province, China (No. 2017GGX10105), and Major Program of Natural Science Foundation of Shandong Province, China (No. ZR2018ZB0852). The authors gratefully thank anonymous referees for their useful comments and editors for their work.
References (65)
- et al., Inverse projection group sparse representation for tumor classification: A low rank variation dictionary approach, Knowl.-Based Syst. (2020)
- et al., Low-rank local tangent space embedding for subspace clustering, Inform. Sci. (2020)
- et al., Dynamic financial distress prediction with concept drift based on time weighting combined with Adaboost support vector machine ensemble, Knowl.-Based Syst. (2017)
- et al., Class-imbalanced dynamic financial distress prediction based on Adaboost-SVM ensemble combined with SMOTE and time weighting, Inf. Fusion (2020)
- et al., Imbalanced enterprise credit evaluation with DTE-SBD: Decision tree ensemble based on SMOTE and bagging with differentiated sampling rates, Inform. Sci. (2018)
- et al., A CFCC-LSTM model for sea surface temperature prediction, IEEE Geosci. Remote Sens. Lett. (2017)
- et al., Transferring deep knowledge for object recognition in low-quality underwater videos, Neurocomputing (2018)
- et al., Learning to segment with image-level annotations, Pattern Recognit. (2016)
- et al., Saliency detection via background and foreground seed selection, Neurocomputing (2015)
- et al., Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell. (2015)