Signal Processing

Volume 93, Issue 6, June 2013, Pages 1471-1484

MIL-SKDE: Multiple-instance learning with supervised kernel density estimation

https://doi.org/10.1016/j.sigpro.2012.07.024

Abstract

Multiple-instance learning (MIL) is a variation on supervised learning. Instead of receiving a set of labeled instances, the learner receives a set of labeled bags, each containing many instances. The aim of MIL is to classify new bags or instances. In this work, we propose a novel algorithm, MIL-SKDE (multiple-instance learning with supervised kernel density estimation), which addresses the MIL problem through an extended "KDE (kernel density estimation) + mean shift" framework. Since the KDE + mean shift framework is an unsupervised learning method, we extend KDE to a supervised version, called supervised KDE (SKDE), by considering the class labels of samples. To seek the modes (local maxima) of SKDE, we also extend mean shift to a supervised version that takes sample labels into account. SKDE is an alternative to the well-known diverse density estimation (DDE), whose modes are called concepts. Compared with DDE, SKDE is better suited to learning multi-modal concepts and more robust to labeling noise (mistakenly labeled bags). Finally, each bag is mapped into a concept space where multi-class SVM classifiers are learned. Experimental results demonstrate that our approach outperforms state-of-the-art MIL approaches.

Highlights

► A novel MIL (multiple-instance learning) algorithm, MIL-SKDE, is presented in this paper.
► MIL-SKDE uses a proposed supervised kernel density estimation (SKDE) to model the MIL problem.
► The modes of SKDE are located by a proposed supervised version of mean shift.
► Our method outperforms state-of-the-art alternatives in the experiments.

Introduction

In standard supervised (or single-instance) learning, the training set is given by D = {(x_i, y_i)}_{i=1}^{n}, where x_i ∈ R^d is an instance and each x_i has a known label y_i ∈ Y = {0, 1}. The task is to learn a classifier f : R^d → Y. This learning usually requires much manual labeling effort. In contrast, multiple-instance learning (MIL) avoids labeling individual instances and instead assigns a single label to a collection of instances, called a bag. More formally, the training set in MIL is given by D = {(B_i, y_i)}_{i=1}^{m}, where bag B_i = {x_{i,j}}_{j=1}^{|B_i|}, x_{i,j} ∈ R^d is an instance, and |B_i| is the number of instances in B_i. Let y_{i,j} ∈ {0, 1} be the latent label of instance x_{i,j} ∈ B_i; then the known bag label is y_i = max_{j} y_{i,j}. The aim of MIL is to learn a classifier that can classify new bags or new instances.
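As a concrete illustration of the bag notation above, the following minimal sketch (synthetic data, not code from the paper) encodes two bags and applies the MIL label rule y_i = max_j y_{i,j}:

```python
import numpy as np

# Minimal sketch with synthetic data (illustrative, not from the paper):
# a bag B_i is a set of d-dimensional instances, and the bag label follows
# the MIL rule y_i = max_j y_ij over the latent instance labels.
bags = [
    np.array([[0.1, 0.2], [0.9, 0.8]]),              # B_1, two instances
    np.array([[0.2, 0.1], [0.3, 0.3], [0.2, 0.4]]),  # B_2, three instances
]
latent_labels = [np.array([0, 1]), np.array([0, 0, 0])]  # hidden y_ij

def bag_label(instance_labels):
    """A bag is positive iff it contains at least one positive instance."""
    return int(instance_labels.max())

print([bag_label(y) for y in latent_labels])  # [1, 0]
```

Only the bag labels are observed in MIL; the per-instance labels `latent_labels` are shown here solely to make the max rule explicit.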

Many tasks in computer vision and machine learning can be naturally cast as MIL problems. For instance, object categorization involves determining whether or not an image contains a certain category of objects [1], [2], [3]. To tackle intra-class variability, e.g., the appearance diversity of vehicles across different makes and models, MIL treats each image as a set of local regions, of which only those carrying category-specific information are regions of interest (ROI) for classification. For example, all wheel regions of vehicles share a common circular shape, so the wheel regions are ROI; other regions have random features and possess no discriminative power. From this viewpoint, an image is a bag and the image regions are its instances in the categorization problem. Likewise, many other applications can be treated as MIL problems, e.g., content-based image retrieval [4], [5], [6], segmentation [7], object tracking [8], human detection [9] and computer-aided diagnosis [10], [11].

Loosely speaking, if an instance occurs repeatedly in different positive bags, it is called a concept (concepts for negative bags are not considered here because negative instances are usually randomly distributed), and the weights of concepts indicate how frequently they occur in positive bags. Learning concepts is critical for MIL algorithms. However, MIL algorithms often face the problem of inducing weighted concepts from a large instance space (large n = Σ_{i=1}^{m} |B_i|, as each bag may contain many instances) and a large feature space (feature vectors extracted from instances are high-dimensional, e.g., 128-D SIFT features [12]). How to efficiently induce important concepts from such a large instance + feature space remains challenging for MIL. Another issue is that many existing MIL algorithms follow the multiple-instance setting that a positive bag must contain at least one true positive instance, whereas a negative bag contains only negative instances. This setting often does not hold in computer vision tasks, because the labeling process is quite subjective, or the visual features extracted from instances are distorted by changes of scale, illumination, viewing angle, etc. Bags labeled positive but containing no true positive instances, and bags labeled negative but containing true positive instances, are called labeling noise.

In this paper, our contributions are as follows. We propose a novel MIL algorithm, MIL-SKDE (multiple-instance learning with supervised kernel density estimation), which can conveniently learn concepts in a large feature space and is robust to labeling noise. First, we introduce a modified version of the kernel density estimation (KDE) function to estimate the instance distribution. The modified KDE is named supervised kernel density estimation (SKDE), as it considers the class labels of data points. SKDE is an alternative to the well-known diverse density estimation (DDE) [13] and is more robust to labeling noise. As with DDE, the modes (local maxima) of SKDE are the concepts to be learned. Since mean shift is widely used to locate the modes of KDE, we extend it to a supervised version (named supervised mean shift) adapted to SKDE. Like mean shift, supervised mean shift is a steepest-ascent procedure (computing only first-order gradients) with a varying step size equal to the magnitude of the gradient. Supervised mean shift is well suited to seeking modes in a large feature space because, from an initial point, it quickly converges to a mode without computing the Hessian matrix (second-order derivatives), which is expensive in high-dimensional spaces.

The remainder of the paper is organized as follows. In Section 2, we review the previous work on MIL and briefly describe kernel density estimation and mean shift. In Section 3 we derive our supervised kernel density estimation and compare it with diverse density estimation. Section 4 presents the supervised mean shift for locating the modes in SKDE. Section 5 summarizes MIL-SKDE. Experiments are presented in Section 6. Finally, we conclude this paper in Section 7.

Section snippets

Multiple-instance learning

MIL was first proposed in the context of drug activity prediction [14]. Since then, many efforts have been devoted to this learning problem with ambiguous labeling. Maron et al. [13] proposed Diverse Density to estimate the instance distribution in the feature space. Specifically, let {B_i^+}_{i=1}^{m^+} and {B_i^-}_{i=1}^{m^-} denote the positive and negative bags respectively, and let x be a concept; the Diverse Density estimator (DDE) of x is defined as

f̂_dde(x) = ∏_{i=1}^{m^+} Pr(x | B_i^+) · ∏_{i=1}^{m^-} Pr(x | B_i^-).

The Pr(x | B_i^+) and Pr(…
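The DDE product above is commonly instantiated with the noisy-or model of Maron and Lozano-Perez [13]. The following sketch evaluates a toy DDE on synthetic bags; the unit-width Gaussian-like kernel and the bag data are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

# Hedged sketch of the diverse density estimator (DDE) with the common
# noisy-or model; bag data and kernel width are illustrative assumptions.
def pr_pos(x, bag):
    # noisy-or: Pr(x|B+) = 1 - prod_j (1 - exp(-||x - x_ij||^2))
    return 1.0 - np.prod(1.0 - np.exp(-np.sum((bag - x) ** 2, axis=1)))

def pr_neg(x, bag):
    # Pr(x|B-) = prod_j (1 - exp(-||x - x_ij||^2))
    return np.prod(1.0 - np.exp(-np.sum((bag - x) ** 2, axis=1)))

def dde(x, pos_bags, neg_bags):
    # product over positive bags times product over negative bags
    return np.prod([pr_pos(x, B) for B in pos_bags]) * \
           np.prod([pr_neg(x, B) for B in neg_bags])

# the instance near the origin recurs in both positive bags -> a concept
pos_bags = [np.array([[0.0, 0.0], [3.0, 3.0]]),
            np.array([[0.1, -0.1], [5.0, 5.0]])]
neg_bags = [np.array([[3.0, 3.1]])]

# DDE is higher near the shared positive instance than near the negative one
print(dde(np.zeros(2), pos_bags, neg_bags) >
      dde(np.array([3.0, 3.0]), pos_bags, neg_bags))  # True
```

The instance shared by all positive bags and close to no negative instance maximizes the product, which is exactly the "diverse density" intuition.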

Supervised kernel density estimation

In this section, we derive a new density to estimate the instance distribution given the bag samples, such that the modes of the new density stand for the concepts to be learned. To formulate such a density, kernel density estimation (KDE) seems a good option, as mode seeking in KDE has been well investigated in previous work. However, a gap between MIL and KDE is that MIL is a generalized supervised learning problem, whereas KDE is an unsupervised method. To bridge this gap, we derive an …
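For reference, the standard (unsupervised) Gaussian KDE that SKDE extends can be sketched as follows; the bandwidth h and the 1-D synthetic data are illustrative choices:

```python
import numpy as np

# Standard 1-D Gaussian kernel density estimate (the unsupervised baseline
# that SKDE extends); bandwidth h and data are illustrative choices.
def kde(x, samples, h=0.5):
    # f_hat(x) = (1 / (n h sqrt(2 pi))) * sum_i exp(-(x - x_i)^2 / (2 h^2))
    n = len(samples)
    return np.sum(np.exp(-((x - samples) ** 2) / (2 * h ** 2))) \
        / (n * h * np.sqrt(2 * np.pi))

# two tight clusters -> two modes of the estimated density
samples = np.array([-2.0, -1.9, -2.1, 2.0, 2.05, 1.95])
print(kde(-2.0, samples), kde(0.0, samples), kde(2.0, samples))
```

The density is high at the cluster centers and low between them; SKDE modifies the contribution of each sample according to its bag label so that modes align with concepts rather than with all data.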

Supervised mean shift

In the mean shift algorithm for mode seeking in KDE, a starting point x quickly converges to a mode (local maximum) of the KDE by iteratively shifting x to a mean vector x̄ calculated by (10). Convergence is guaranteed because the mean shift vector x̄ − x always points in the direction of maximum increase in the KDE, i.e., the gradient of (5). Mean shift is a steepest-ascent procedure with a varying step size equal to the magnitude of the gradient. Mean shift does not compute the Hessian matrix, which …
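The unsupervised mean-shift iteration that this section extends can be sketched as follows. This is the standard textbook form with a Gaussian kernel; the bandwidth, data, and stopping tolerance are illustrative assumptions, and the paper's Eq. (10) gives the exact weighted-mean update used there:

```python
import numpy as np

# Standard (unsupervised) mean-shift mode seeking with a Gaussian kernel;
# bandwidth h, tolerance, and the synthetic clusters are illustrative.
def mean_shift(x, samples, h=0.5, tol=1e-6, max_iter=500):
    """Iteratively shift x to the kernel-weighted mean until it stabilizes."""
    for _ in range(max_iter):
        w = np.exp(-np.sum((samples - x) ** 2, axis=1) / (2 * h ** 2))
        x_bar = (w[:, None] * samples).sum(axis=0) / w.sum()  # mean vector x̄
        if np.linalg.norm(x_bar - x) < tol:  # shift vector vanishes at a mode
            return x_bar
        x = x_bar
    return x

rng = np.random.default_rng(1)
# two well-separated clusters -> two modes of the underlying KDE
samples = np.vstack([rng.normal(0.0, 0.2, (30, 2)),
                     rng.normal(4.0, 0.2, (30, 2))])
mode = mean_shift(np.array([0.5, 0.5]), samples)  # converges near the origin cluster
print(mode)
```

Each iteration moves x uphill on the KDE surface with step size proportional to the gradient magnitude, so no Hessian (and no line search) is needed, which is what makes the procedure cheap in high dimensions.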

Algorithm summary

In this section, we summarize the MIL-SKDE algorithm in concept learning and classification.

Experiments

First, we qualitatively analyze the supervised mean shift, a key component of MIL-SKDE, on synthetic data. Second, to evaluate the proposed MIL-SKDE, we apply it to two typical but challenging MIL applications, region-based image categorization and object category recognition, on publicly available benchmark datasets. The results are compared with other state-of-the-art approaches. The source code of some previous MIL approaches is kindly provided online by researchers in …

Conclusions

In this paper, we have presented a novel MIL algorithm, MIL-SKDE, which utilizes a proposed density function, supervised kernel density estimation (SKDE), to model the MIL problem. Compared with the commonly adopted Diverse Density Estimation (DDE) (1), SKDE can learn multiple concepts efficiently through a supervised mean shift and better tolerates labeling noise. SKDE and supervised mean shift can be seen as extensions of conventional kernel density estimation and mean shift …

References (40)

  • T.G. Dietterich et al.

    Solving the multiple-instance problem with axis-parallel rectangles

    Artificial Intelligence

    (1997)
  • X. Li et al.

    A note on the convergence of the mean shift

    Journal of Pattern Recognition

    (2007)
  • M. Wang et al.

    Semi-supervised kernel density estimation for video annotation

    Computer Vision and Image Understanding

    (2009)
  • Z. Fu, A. Robles-Kelly, An instance selection approach to multiple instance learning, in: Computer Vision and Pattern...
  • S. Vijayanarasimhan, K. Grauman, Keywords to visual categories: Multiple-instance learning for weakly supervised object...
  • Y. Chen et al.

    MILES: Multiple-instance learning via embedded instance selection

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2006)
  • Z.-J. Zha, X.-S. Hua, T. Mei, J. Wang, G.-J. Qi, Z. Wang, Joint multi-label multi-instance learning for image...
  • Q. Zhang, S.A. Goldman, W. Yu, J. E. Fritts, Content-based image retrieval using multiple-instance learning, in:...
  • C. Yang, Image database retrieval with multiple-instance learning techniques, in: International Conference on Data...
  • A. Vezhnevets, J.M. Buhmann, Towards weakly supervised semantic segmentation by means of multiple instance and...
  • M. Li, J.T. Kwok, B.-L. Lu, Online multiple instance learning with no regret, in: Computer Vision and Pattern...
  • Z. Lin, G. Hua, L.S. Davis, Multiple instance feature for robust part-based object detection, in: Computer Vision and...
  • V. C. Raykar, B. Krishnapuram, J. Bi, M. Dundar, R. B. Rao, Bayesian multiple instance learning: automatic feature...
  • D. Wu, J. Bi, K. Boyer, A min-max framework of cascaded classifier with multiple instance learning for computer aided...
  • D.G. Lowe

    Distinctive image features from scale-invariant keypoints

    International Journal of Computer Vision

    (2004)
  • O. Maron, T. Lozano-Perez, A framework for multiple-instance learning, in: Advances in Neural Information Processing...
  • Q. Zhang, S.A. Goldman, EM-DD: An improved multiple-instance learning technique, in: Advances in Neural Information...
  • R. Rahmani et al.

    Localized content based image retrieval

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2008)
  • Y. Chen et al.

    Image categorization by learning and reasoning with regions

    Journal of Machine Learning Research

    (2004)
  • S. Andrews, I. Tsochantaridis, T. Hofmann, Support vector machines for multiple-instance learning, in: Advances in...