Context-aware MIML instance annotation: exploiting label correlations with classifier chains

  • Regular Paper
  • Published: 2015, Knowledge and Information Systems

Abstract

In multi-instance multi-label (MIML) instance annotation, the goal is to learn an instance classifier while training on a MIML dataset, which consists of bags of instances paired with label sets; instance labels are not provided in the training data. The MIML formulation can be applied in many domains. For example, in an image domain, bags are images, instances are feature vectors representing segments in the images, and the label sets are lists of objects or categories present in each image. Although many MIML algorithms have been developed for predicting the label set of a new bag, only a few have been specifically designed to predict instance labels. We propose MIML-ECC (ensemble of classifier chains), which exploits bag-level context through label correlations to improve instance-level prediction accuracy. The proposed method is scalable in all dimensions of a problem (bags, instances, classes, and feature dimension) and has no parameters that require tuning (which is a problem for prior methods). In experiments on two image datasets, a bioacoustics dataset, and two artificial datasets, MIML-ECC achieves higher or comparable accuracy in comparison with several recent methods and baselines.
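To make the data layout concrete, here is a minimal C++ sketch (our illustration, not the authors' code) of an MIML training set as the abstract describes it: bags of instance feature vectors paired with bag-level label sets, with no per-instance labels.

```cpp
#include <set>
#include <vector>

// One MIML bag: e.g., an image. Each instance is a feature vector for a
// segment; the label set lists the classes present somewhere in the bag.
struct Bag {
    std::vector<std::vector<double>> instances; // per-segment feature vectors
    std::set<int> labels;                       // bag-level label set (class ids)
};

// A training set is a collection of bags; note there is no field for
// instance labels -- predicting those is the instance-annotation task.
using MIMLDataset = std::vector<Bag>;
```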



Notes

  1. We found it effective to use a pool of threads, each handling one of the \(L\) chains. Within each thread, construction of the RF classifiers was parallelized over trees. Support instance updates cannot be parallelized, because they occur sequentially in time (see the sketch after these notes).

  2. The code is C++ compiled with GCC 4.0 (most speed optimizations enabled). Experiments ran on a Mac Pro with two 2.4 GHz quad-core Intel Xeon processors and 16 GB of 1,066 MHz DDR3 memory, running OS X 10.8.1.
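A minimal sketch of the threading scheme described in Note 1, assuming a hypothetical train_chain routine (the real per-chain training is not shown): one thread per chain, joined before the strictly sequential support instance updates.

```cpp
#include <thread>
#include <vector>

// Hypothetical stand-in for training one classifier chain; in the real
// system, RF tree construction inside this call is parallelized over trees.
void train_chain(int chain_id) {
    (void)chain_id; // placeholder body
}

int main() {
    const int L = 10; // number of chains (illustrative value)
    std::vector<std::thread> pool;
    pool.reserve(L);
    for (int c = 0; c < L; ++c)
        pool.emplace_back(train_chain, c); // one thread per chain
    for (auto& t : pool)
        t.join();
    // Support instance updates would run here, sequentially,
    // since each update depends on the previous one.
    return 0;
}
```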


Acknowledgments

This work is partially funded by NSF grant 1055113 to Xiaoli Z. Fern, and the College of Engineering, Oregon State University.

Author information

Correspondence to Forrest Briggs.

Appendix

The correlation coefficient \(\rho (X,Y)\) between two random variables X and Y is given by

$$\begin{aligned} \rho (X,Y) = \frac{\hbox {Cov}(X,Y)}{\sqrt{\hbox {Var}(X)}\sqrt{\hbox {Var}(Y)}} \end{aligned}$$
(14)

Let \(X\) be a Bernoulli RV with \(P(X=1)=\frac{1}{2}\). Similarly, let \(Y\) conditioned on \(X\) be a Bernoulli RV with \(P(Y=1|X)=\frac{1}{2} (1+\rho ) X + \frac{1}{2} (1-\rho ) (1-X)\), as in Sect. 5.5, Eq. (12). We show that the correlation coefficient satisfies \(\rho (X,Y) = \rho \).

Proof We begin by noting that the expected value of an arbitrary Bernoulli RV \(T\) with \(P(T=1)=p\) satisfies \(E[T] = P(T=1)=p\). Moreover, since \(T \in \{0,1\}\), \(T^k=T\) for \(k=1,2,\ldots \), and consequently \(E[T^k]=p\). The variance of \(T\) is \(\hbox {Var}(T)=E[T^2]-E[T]^2 = p - p^2 =p(1-p)\). To compute the correlation coefficient \(\rho (X,Y)\), we first compute \(E[X]\), \(E[Y]\), \(\hbox {Var}(X)\), \(\hbox {Var}(Y)\), and \(\hbox {Cov}(X,Y)\). Since \(P(X=1)=\frac{1}{2}\), we have \(E[X]=P(X=1)=\frac{1}{2}\). The expectation of \(Y\) is computed as follows:

$$\begin{aligned} E[Y]&= E_X [ E_Y [Y |X]] \nonumber \\&= E_X [P(Y=1|X)] \nonumber \\&= E_X \left[ \frac{1}{2} (1+\rho ) X + \frac{1}{2} (1-\rho ) (1-X)\right] \nonumber \\&= \frac{1}{2} (1+\rho ) E_X [X] + \frac{1}{2} (1 -\rho ) (1 -E[X]) \nonumber \\&= \frac{1}{2} (1+\rho ) \frac{1}{2} + \frac{1}{2} (1-\rho ) \frac{1}{2} \nonumber \\&= \frac{1}{2}. \end{aligned}$$
(15)

Since \(X\) and \(Y\) are Bernoulli RVs with \(P(X=1)=P(Y=1)=E[X]=E[Y]=\frac{1}{2}\), we also have \(\hbox {Var}(X)=\hbox {Var}(Y)=\frac{1}{2}(1-\frac{1}{2}) =\frac{1}{4}\). Next, we compute

$$\begin{aligned} E[X Y]&= E_X [ E_Y [X Y |X]] \nonumber \\&= E_X [ X E_Y [ Y |X]] \nonumber \\&= E_X [X P(Y=1|X)]\nonumber \\&= E_X\left[ X (\frac{1}{2} (1+\rho ) X + \frac{1}{2} (1-\rho ) (1-X))\right] \nonumber \\&= \frac{1}{2} (1+\rho ) E_X [X^2] + \frac{1}{2} (1 -\rho ) E[X(1-X)] \nonumber \\&= \frac{1}{2} (1+\rho ) \frac{1}{2} \nonumber \\&= \frac{1}{4} (1+\rho ). \end{aligned}$$
(16)

In the last step of (16), we used \(E[X^2]=E[X]=\frac{1}{2}\) and the fact that \(X(1-X)=0\) for \(X \in \{0,1\}\), so \(E[X(1-X)]=0\). The covariance is therefore \(\hbox {Cov}(X,Y)=E[X Y] -E[X]E[Y] = \frac{1}{4} (1+\rho ) - \left(\frac{1}{2}\right)^2 = \frac{1}{4} \rho \). Finally, substituting \(\hbox {Var}(X)=\hbox {Var}(Y)=\frac{1}{4}\) and \(\hbox {Cov}(X,Y)=\frac{1}{4} \rho \) into (14) yields \(\rho (X,Y) = \rho \). \(\square \)
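As a sanity check (ours, not part of the paper), a short Monte Carlo simulation of the generative model in Eq. (12) recovers the claimed correlation; the target \(\rho \), sample size, and seed below are arbitrary illustrative choices.

```cpp
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    const double rho = 0.6; // target correlation (illustrative)
    const int n = 1000000;  // number of samples
    std::mt19937 gen(42);
    std::bernoulli_distribution x_dist(0.5); // P(X = 1) = 1/2
    double sx = 0, sy = 0, sxy = 0;
    for (int i = 0; i < n; ++i) {
        const int x = x_dist(gen);
        // P(Y = 1 | X) = (1 + rho)/2 * X + (1 - rho)/2 * (1 - X)
        const double p = x ? 0.5 * (1.0 + rho) : 0.5 * (1.0 - rho);
        const int y = std::bernoulli_distribution(p)(gen);
        sx += x; sy += y; sxy += x * y;
    }
    const double ex = sx / n, ey = sy / n;
    const double cov = sxy / n - ex * ey;                            // Cov(X, Y)
    const double corr = cov / std::sqrt(ex * (1 - ex) * ey * (1 - ey)); // Eq. (14)
    std::printf("empirical rho(X,Y) = %.4f (expected %.2f)\n", corr, rho);
    return 0;
}
```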

About this article

Cite this article

Briggs, F., Fern, X.Z. & Raich, R. Context-aware MIML instance annotation: exploiting label correlations with classifier chains. Knowl Inf Syst 43, 53–79 (2015). https://doi.org/10.1007/s10115-014-0781-8

