Abstract
In multi-instance multi-label (MIML) instance annotation, the goal is to learn an instance classifier while training on a MIML dataset, which consists of bags of instances paired with label sets; instance labels are not provided in the training data. The MIML formulation applies in many domains. For example, in an image domain, bags are images, instances are feature vectors representing segments of the images, and label sets are lists of objects or categories present in each image. Although many MIML algorithms have been developed for predicting the label set of a new bag, few are specifically designed to predict instance labels. We propose MIML-ECC (ensemble of classifier chains), which exploits bag-level context through label correlations to improve instance-level prediction accuracy. The proposed method is scalable in all dimensions of a problem (bags, instances, classes, and feature dimension) and, unlike several prior methods, has no parameters that require tuning. In experiments on two image datasets, a bioacoustics dataset, and two artificial datasets, MIML-ECC achieves accuracy that is higher than or comparable to several recent methods and baselines.





Notes
We found it effective to use a pool of threads, with each handling one of the \(L\) chains. Within each of these threads, construction of the RF classifiers was parallelized over trees. Support instance updates cannot be parallelized, because they occur sequentially in time.
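For illustration, here is a minimal sketch of this two-level threading scheme, assuming hypothetical TrainTree and TrainChain routines (the names and structure are illustrative, not the authors' implementation):

```cpp
// Two-level parallelization: one thread per chain; within a chain, random
// forest construction is parallelized over trees. Compile with:
//   g++ -std=c++11 -pthread sketch.cpp
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical: builds one tree of the random forest for one classifier.
void TrainTree(int chain, int classifier, int tree) {
    std::printf("chain %d, classifier %d, tree %d\n", chain, classifier, tree);
}

// Trains one classifier chain. Tree construction runs in parallel, but the
// support instance update between classifiers must remain sequential.
void TrainChain(int chain, int numClassifiers, int numTrees) {
    for (int c = 0; c < numClassifiers; ++c) {
        std::vector<std::thread> treePool;
        for (int t = 0; t < numTrees; ++t)
            treePool.emplace_back(TrainTree, chain, c, t);
        for (auto& th : treePool) th.join();
        // Sequential step: update support instances here before training
        // the next classifier in the chain (cannot be parallelized).
    }
}

int main() {
    const int L = 4;  // number of chains; one thread handles each chain
    std::vector<std::thread> chainPool;
    for (int l = 0; l < L; ++l)
        chainPool.emplace_back(TrainChain, l, 3, 8);
    for (auto& th : chainPool) th.join();
    return 0;
}
```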
The code is C++, compiled with GCC 4.0 with most speed optimizations enabled. Experiments ran on a Mac Pro with two 2.4 GHz quad-core Intel Xeon processors and 16 GB of 1,066 MHz DDR3 memory, running OS X 10.8.1.
Acknowledgments
This work is partially funded by NSF grant 1055113 to Xiaoli Z. Fern and by the College of Engineering, Oregon State University.
Appendix
The correlation coefficient \(\rho (X,Y)\) between two random variables \(X\) and \(Y\) is given by
$$\rho (X,Y)=\frac{\hbox {Cov}(X,Y)}{\sqrt{\hbox {Var}(X)\,\hbox {Var}(Y)}}. \quad (14)$$
Let \(X\) be a Bernoulli RV with \(P(X=1)=\frac{1}{2}\). Similarly, let \(Y\) conditioned on \(X\) be a Bernoulli RV with \(P(Y=1|X)=\frac{1}{2} (1+\rho ) X + \frac{1}{2} (1-\rho ) (1-X)\), as in Sect. 5.5, Eq. (12). Then the correlation coefficient satisfies \(\rho (X,Y) = \rho \).
Proof We begin by noting the property that the expected value of an arbitrary Bernoulli RV \(T\) with \(P(T=1)=p\) satisfies \(E[T] = P(T=1)=p\). Moreover, since \(T \in \{0,1\}\), \(T^k=T\) for \(k=1,2,\ldots \) and consequently \(E[T^k]=p\). The variance of \(T\) is given by \(\hbox {Var}(T)=E[T^2]-E[T]^2 = p - p^2 =p(1-p)\). To compute the correlation coefficient \(\rho (X,Y)\), we first compute \(E[X]\), \(E[Y]\), \(\hbox {Var}(X)\), \(\hbox {Var}(Y)\), and \(\hbox {Cov}(X,Y)\). Since \(P(X=1)=\frac{1}{2}\), we have \(E[X]=P(X=1)=\frac{1}{2}\). The expectation of \(Y\) is computed as follows:
$$E[Y]=E\big [E[Y|X]\big ]=E\left [\tfrac{1}{2}(1+\rho )X+\tfrac{1}{2}(1-\rho )(1-X)\right ]=\tfrac{1}{2}(1+\rho )\tfrac{1}{2}+\tfrac{1}{2}(1-\rho )\tfrac{1}{2}=\tfrac{1}{2}.$$
Since \(X\) and \(Y\) are Bernoulli RVs with \(P(X=1)=P(Y=1)=E[X]=E[Y]=\frac{1}{2}\), we also have \(\hbox {Var}(X)=\hbox {Var}(Y)=\frac{1}{2}(1-\frac{1}{2}) =\frac{1}{4}\). Next, we compute
$$E[XY]=E\big [X\,E[Y|X]\big ]=E\left [\tfrac{1}{2}(1+\rho )X^2+\tfrac{1}{2}(1-\rho )X(1-X)\right ]=\tfrac{1}{2}(1+\rho )E[X^2]=\tfrac{1}{4}(1+\rho ),$$
since \(X(1-X)=0\) for \(X\in \{0,1\}\) and \(E[X^2]=E[X]=\tfrac{1}{2}\).
The covariance is therefore \(\hbox {Cov}(X,Y)=E[XY]-E[X]E[Y] = \frac{1}{4}(1+\rho ) - \left (\frac{1}{2}\right )^2 = \frac{1}{4}\rho \). Finally, substituting \(\hbox {Var}(X)=\hbox {Var}(Y)=\frac{1}{4}\) and \(\hbox {Cov}(X,Y)=\frac{1}{4}\rho \) into (14) yields \(\rho (X,Y)=\rho \). \(\square \)
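As a sanity check, the identity can also be verified numerically. The following minimal Monte Carlo sketch (not from the paper) samples \(X\) and \(Y\) from the model above and estimates the Pearson correlation; with \(10^6\) samples the estimate should agree with \(\rho \) to roughly two decimal places:

```cpp
// Monte Carlo check of rho(X,Y) = rho under the stated model:
// X ~ Bernoulli(1/2), Y|X ~ Bernoulli((1+rho)/2 if X=1, else (1-rho)/2).
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    const double rho = 0.3;   // ground-truth correlation parameter
    const int n = 1000000;    // number of samples
    std::mt19937 gen(42);
    std::bernoulli_distribution bx(0.5);
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (int i = 0; i < n; ++i) {
        double x = bx(gen) ? 1.0 : 0.0;
        // P(Y=1|X) = (1+rho)/2 * X + (1-rho)/2 * (1-X), as in Eq. (12)
        double p = 0.5 * (1 + rho) * x + 0.5 * (1 - rho) * (1 - x);
        double y = std::bernoulli_distribution(p)(gen) ? 1.0 : 0.0;
        sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y;
    }
    double ex = sx / n, ey = sy / n;
    double cov = sxy / n - ex * ey;                 // sample covariance
    double vx = sxx / n - ex * ex, vy = syy / n - ey * ey;
    std::printf("estimated rho = %.4f (true %.2f)\n",
                cov / std::sqrt(vx * vy), rho);
    return 0;
}
```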
Cite this article
Briggs, F., Fern, X.Z. & Raich, R. Context-aware MIML instance annotation: exploiting label correlations with classifier chains. Knowl Inf Syst 43, 53–79 (2015). https://doi.org/10.1007/s10115-014-0781-8