Signal Processing

Volume 93, Issue 6, June 2013, Pages 1471-1484

MIL-SKDE: Multiple-instance learning with supervised kernel density estimation

https://doi.org/10.1016/j.sigpro.2012.07.024

Abstract

Multiple-instance learning (MIL) is a variation on supervised learning. Instead of receiving a set of labeled instances, the learner receives a set of labeled bags, each containing many instances. The aim of MIL is to classify new bags or instances. In this work, we propose a novel algorithm, MIL-SKDE (multiple-instance learning with supervised kernel density estimation), which addresses the MIL problem through an extended "KDE (kernel density estimation) + mean shift" framework. Since the KDE + mean shift framework is an unsupervised learning method, we extend KDE to a supervised version, called supervised KDE (SKDE), by considering the class labels of samples. To seek the modes (local maxima) of SKDE, we also extend mean shift to a supervised version that takes sample labels into account. SKDE is an alternative to the well-known diverse density estimation (DDE), whose modes are called concepts. Compared with DDE, SKDE is better suited to learning multi-modal concepts and more robust to labeling noise (mistakenly labeled bags). Finally, each bag is mapped into a concept space where multi-class SVM classifiers are learned. Experimental results demonstrate that our approach outperforms state-of-the-art MIL approaches.

Highlights

► A novel MIL (multiple-instance learning) algorithm, MIL-SKDE, is presented in this paper.
► MIL-SKDE uses a proposed supervised kernel density estimation (SKDE) to model the MIL problem.
► The modes of SKDE are located by a proposed supervised version of mean shift.
► Our method outperforms state-of-the-art alternatives in the experiments.

Introduction

In standard supervised (or single-instance) learning, the training set is given by D = {(x_i, y_i)}_{i=1}^{n}, where x_i ∈ R^d is an instance and each x_i has a known label y_i ∈ Y = {0, 1}. The task is to learn a classifier f : R^d → Y. This learning usually requires much manual labeling effort. In contrast, multiple-instance learning (MIL) avoids labeling individual instances and instead assigns a single label to a collection of instances, called a bag. More formally, the training set in MIL is given by D = {(B_i, y_i)}_{i=1}^{m}, where bag B_i = {x_{i,j}}_{j=1}^{|B_i|}, x_{i,j} ∈ R^d is an instance, and |B_i| is the number of instances in B_i. Let y_{i,j} ∈ {0, 1} be the latent label of instance x_{i,j} ∈ B_i; then the known bag label is y_i = max_{j} y_{i,j}. The aim of MIL is to learn a classifier that can classify new bags or new instances.
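As a concrete illustration of the bag notation above, the following minimal sketch (synthetic data, not code from the paper) encodes two bags and applies the MIL label rule y_i = max_j y_{i,j}:

```python
import numpy as np

# Minimal sketch with synthetic data (illustrative, not from the paper):
# a bag B_i is a set of d-dimensional instances, and the bag label follows
# the MIL rule y_i = max_j y_ij over the latent instance labels.
bags = [
    np.array([[0.1, 0.2], [0.9, 0.8]]),              # B_1, two instances
    np.array([[0.2, 0.1], [0.3, 0.3], [0.2, 0.4]]),  # B_2, three instances
]
latent_labels = [np.array([0, 1]), np.array([0, 0, 0])]  # hidden y_ij

def bag_label(instance_labels):
    """A bag is positive iff it contains at least one positive instance."""
    return int(instance_labels.max())

print([bag_label(y) for y in latent_labels])  # [1, 0]
```

Only the bag labels are observed in MIL; the per-instance labels `latent_labels` are shown here solely to make the max rule explicit.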

Many tasks in computer vision and machine learning can be naturally cast as MIL problems. For instance, object categorization involves determining whether or not an image contains a certain category of objects [1], [2], [3]. To tackle intra-class variability, e.g., the appearance diversity of vehicles across different makes and models, MIL treats each image as a set of local regions, of which only those carrying category-specific information are regions of interest (ROI) for classification. For example, all wheel regions of vehicles share a common circular shape, so the wheel regions are ROI; other regions have random features and possess no discriminative power. From this viewpoint, an image is a bag and the image regions are its instances in the categorization problem. Likewise, many other applications can be treated as MIL problems, e.g., content-based image retrieval [4], [5], [6], segmentation [7], object tracking [8], human detection [9] and computer-aided diagnosis [10], [11].

Loosely speaking, if an instance occurs repeatedly in different positive bags, it is called a concept (concepts for negative bags are not considered here because negative instances are usually randomly distributed), and the weights of concepts indicate how frequently they occur in positive bags. Learning concepts is critical for MIL algorithms. However, MIL algorithms often face the problem of inducing weighted concepts from a large instance space (large n = Σ_{i=1}^{m} |B_i|, as each bag may contain many instances) and a large feature space (feature vectors extracted from instances are high-dimensional, e.g., 128-D SIFT features [12]). How to efficiently induce important concepts from such a large instance + feature space remains challenging for MIL. Another issue is that many existing MIL algorithms follow the multiple-instance setting that a positive bag must contain at least one true positive instance, whereas a negative bag contains only negative instances. This setting often does not hold in computer vision tasks, because the labeling process is quite subjective, or the visual features extracted from instances are distorted by changes of scale, illumination, viewing angle, etc. Bags labeled positive but containing no true positive instances, and bags labeled negative but containing true positive instances, are called labeling noise.

In this paper, our contributions are as follows. We propose a novel MIL algorithm, MIL-SKDE (multiple-instance learning with supervised kernel density estimation), which can conveniently learn concepts in a large feature space and is robust to labeling noise. First, we introduce a modified version of the kernel density estimation (KDE) function to estimate the instance distribution. The modified KDE is named supervised kernel density estimation (SKDE), as it considers the class labels of data points. SKDE is an alternative to the well-known diverse density estimation (DDE) [13] and is more robust to labeling noise. As with DDE, the modes (local maxima) of SKDE are the concepts to be learned. Since mean shift is widely used to locate the modes of KDE, we extend it to a supervised version (named supervised mean shift) adapted to SKDE. Like mean shift, supervised mean shift is a steepest-ascent procedure (computing only first-order gradients) with a varying step size equal to the magnitude of the gradient. Supervised mean shift is well suited to seeking modes in a large feature space because, from an initial point, it quickly converges to a mode without computing the Hessian matrix (second-order derivatives), which is expensive in high-dimensional spaces.

The remainder of the paper is organized as follows. In Section 2, we review the previous work on MIL and briefly describe kernel density estimation and mean shift. In Section 3 we derive our supervised kernel density estimation and compare it with diverse density estimation. Section 4 presents the supervised mean shift for locating the modes in SKDE. Section 5 summarizes MIL-SKDE. Experiments are presented in Section 6. Finally, we conclude this paper in Section 7.

Section snippets

Multiple-instance learning

MIL was first proposed in the context of drug activity prediction [14]. Since then, many efforts have been devoted to this learning problem with ambiguous labeling. Maron et al. [13] proposed Diverse Density to estimate the instance distribution in the feature space. Specifically, let {B_i^+}_{i=1}^{m^+} and {B_i^-}_{i=1}^{m^-} denote the positive and negative bags respectively, and let x be a concept; the Diverse Density estimator (DDE) of x is defined as

f̂_dde(x) = ∏_{i=1}^{m^+} Pr(x | B_i^+) · ∏_{i=1}^{m^-} Pr(x | B_i^-).

The Pr(x | B_i^+) and Pr(…
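The DDE product above is commonly instantiated with the noisy-or model of Maron and Lozano-Perez [13]. The following sketch evaluates a toy DDE on synthetic bags; the unit-width Gaussian-like kernel and the bag data are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

# Hedged sketch of the diverse density estimator (DDE) with the common
# noisy-or model; bag data and kernel width are illustrative assumptions.
def pr_pos(x, bag):
    # noisy-or: Pr(x|B+) = 1 - prod_j (1 - exp(-||x - x_ij||^2))
    return 1.0 - np.prod(1.0 - np.exp(-np.sum((bag - x) ** 2, axis=1)))

def pr_neg(x, bag):
    # Pr(x|B-) = prod_j (1 - exp(-||x - x_ij||^2))
    return np.prod(1.0 - np.exp(-np.sum((bag - x) ** 2, axis=1)))

def dde(x, pos_bags, neg_bags):
    # product over positive bags times product over negative bags
    return np.prod([pr_pos(x, B) for B in pos_bags]) * \
           np.prod([pr_neg(x, B) for B in neg_bags])

# the instance near the origin recurs in both positive bags -> a concept
pos_bags = [np.array([[0.0, 0.0], [3.0, 3.0]]),
            np.array([[0.1, -0.1], [5.0, 5.0]])]
neg_bags = [np.array([[3.0, 3.1]])]

# DDE is higher near the shared positive instance than near the negative one
print(dde(np.zeros(2), pos_bags, neg_bags) >
      dde(np.array([3.0, 3.0]), pos_bags, neg_bags))  # True
```

The instance shared by all positive bags and close to no negative instance maximizes the product, which is exactly the "diverse density" intuition.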

Supervised kernel density estimation

In this section, we derive a new density to estimate the instance distribution given the bag samples, such that the modes of the new density stand for the concepts to be learned. To formulate such a density, kernel density estimation (KDE) seems a good option, as mode seeking in KDE has been well investigated in previous work. However, a gap between MIL and KDE is that MIL is a generalized supervised learning problem, whereas KDE is an unsupervised method. To bridge this gap, we derive an …
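For reference, the standard (unsupervised) Gaussian KDE that SKDE extends can be sketched as follows; the bandwidth h and the 1-D synthetic data are illustrative choices:

```python
import numpy as np

# Standard 1-D Gaussian kernel density estimate (the unsupervised baseline
# that SKDE extends); bandwidth h and data are illustrative choices.
def kde(x, samples, h=0.5):
    # f_hat(x) = (1 / (n h sqrt(2 pi))) * sum_i exp(-(x - x_i)^2 / (2 h^2))
    n = len(samples)
    return np.sum(np.exp(-((x - samples) ** 2) / (2 * h ** 2))) \
        / (n * h * np.sqrt(2 * np.pi))

# two tight clusters -> two modes of the estimated density
samples = np.array([-2.0, -1.9, -2.1, 2.0, 2.05, 1.95])
print(kde(-2.0, samples), kde(0.0, samples), kde(2.0, samples))
```

The density is high at the cluster centers and low between them; SKDE modifies the contribution of each sample according to its bag label so that modes align with concepts rather than with all data.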

Supervised mean shift

In the mean shift algorithm for mode seeking in KDE, a starting point x quickly converges to a mode (local maximum) of the KDE by iteratively shifting x to a mean vector x̄ calculated by (10). Convergence is guaranteed because the mean shift vector x̄ − x always points in the direction of maximum increase in the KDE, i.e., the gradient of (5). Mean shift is a steepest-ascent procedure with a varying step size equal to the magnitude of the gradient. Mean shift does not compute the Hessian matrix, which …
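The unsupervised mean-shift iteration that this section extends can be sketched as follows. This is the standard textbook form with a Gaussian kernel; the bandwidth, data, and stopping tolerance are illustrative assumptions, and the paper's Eq. (10) gives the exact weighted-mean update used there:

```python
import numpy as np

# Standard (unsupervised) mean-shift mode seeking with a Gaussian kernel;
# bandwidth h, tolerance, and the synthetic clusters are illustrative.
def mean_shift(x, samples, h=0.5, tol=1e-6, max_iter=500):
    """Iteratively shift x to the kernel-weighted mean until it stabilizes."""
    for _ in range(max_iter):
        w = np.exp(-np.sum((samples - x) ** 2, axis=1) / (2 * h ** 2))
        x_bar = (w[:, None] * samples).sum(axis=0) / w.sum()  # mean vector x̄
        if np.linalg.norm(x_bar - x) < tol:  # shift vector vanishes at a mode
            return x_bar
        x = x_bar
    return x

rng = np.random.default_rng(1)
# two well-separated clusters -> two modes of the underlying KDE
samples = np.vstack([rng.normal(0.0, 0.2, (30, 2)),
                     rng.normal(4.0, 0.2, (30, 2))])
mode = mean_shift(np.array([0.5, 0.5]), samples)  # converges near the origin cluster
print(mode)
```

Each iteration moves x uphill on the KDE surface with step size proportional to the gradient magnitude, so no Hessian (and no line search) is needed, which is what makes the procedure cheap in high dimensions.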

Algorithm summary

In this section, we summarize the MIL-SKDE algorithm in concept learning and classification.

Experiments

First, we qualitatively analyze the supervised mean shift, a key component of MIL-SKDE, on synthetic data. Second, to evaluate the proposed MIL-SKDE, we apply it to two typical but challenging MIL applications, region-based image categorization and object category recognition, on publicly available benchmark datasets. The results are compared with other state-of-the-art approaches. The source code of some previous MIL approaches is kindly provided online by researchers in …

Conclusions

In this paper, we have presented a novel MIL algorithm, MIL-SKDE, which utilizes a proposed density function, supervised kernel density estimation (SKDE), to model the MIL problem. Compared with the commonly adopted Diverse Density Estimation (DDE) (1), SKDE can learn multiple concepts efficiently through a supervised mean shift and better tolerates labeling noise. SKDE and supervised mean shift can be seen as extensions of conventional kernel density estimation and mean shift …

References (40)

  • T.G. Dietterich et al.

    Solving the multiple-instance problem with axis-parallel rectangles

    Artificial Intelligence

    (1997)
  • X. Li et al.

    A note on the convergence of the mean shift

    Journal of Pattern Recognition

    (2007)
  • M. Wang et al.

    Semi-supervised kernel density estimation for video annotation

    Computer Vision and Image Understanding

    (2009)
  • Z. Fu, A. Robles-Kelly, An instance selection approach to multiple instance learning, in: Computer Vision and Pattern...
  • S. Vijayanarasimhan, K. Grauman, Keywords to visual categories: Multiple-instance learning for weakly supervised object...
  • Y. Chen et al.

    MILES: Multiple-instance learning via embedded instance selection

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2006)
  • Z.-J. Zha, X.-S. Hua, T. Mei, J. Wang, G.-J. Qi, Z. Wang, Joint multi-label multi-instance learning for image...
  • Q. Zhang, S.A. Goldman, W. Yu, J. E. Fritts, Content-based image retrieval using multiple-instance learning, in:...
  • C. Yang, Image database retrieval with multiple-instance learning techniques, in: International Conference on Data...
  • A. Vezhnevets, J.M. Buhmann, Towards weakly supervised semantic segmentation by means of multiple instance and...
  • M. Li, J.T. Kwok, B.-L. Lu, Online multiple instance learning with no regret, in: Computer Vision and Pattern...
  • Z. Lin, G. Hua, L.S. Davis, Multiple instance feature for robust part-based object detection, in: Computer Vision and...
  • V. C. Raykar, B. Krishnapuram, J. Bi, M. Dundar, R. B. Rao, Bayesian multiple instance learning: automatic feature...
  • D. Wu, J. Bi, K. Boyer, A min-max framework of cascaded classifier with multiple instance learning for computer aided...
  • D.G. Lowe

    Distinctive image features from scale-invariant keypoints

    International Journal of Computer Vision

    (2004)
  • O. Maron, T. Lozano-Perez, A framework for multiple-instance learning, in: Advances in Neural Information Processing...
  • Q. Zhang, S.A. Goldman, EM-DD: An improved multiple-instance learning technique, in: Advances in Neural Information...
  • R. Rahmani et al.

    Localized content based image retrieval

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2008)
  • Y. Chen et al.

    Image categorization by learning and reasoning with regions

    Journal of Machine Learning Research

    (2004)
  • S. Andrews, I. Tsochantaridis, T. Hofmann, Support vector machines for multiple-instance learning, in: Advances in...