Abstract
In multi-instance multi-label (MIML) instance annotation, the goal is to learn an instance classifier while training on a MIML dataset, which consists of bags of instances paired with label sets; instance labels are not provided in the training data. The MIML formulation applies in many domains. For example, in an image domain, bags are images, instances are feature vectors representing segments of the images, and label sets are lists of objects or categories present in each image. Although many MIML algorithms have been developed for predicting the label set of a new bag, few are specifically designed to predict instance labels. We propose MIML-ECC (ensemble of classifier chains), which exploits bag-level context through label correlations to improve instance-level prediction accuracy. The proposed method is scalable in all dimensions of a problem (bags, instances, classes, and feature dimension) and, unlike several prior methods, has no parameters that require tuning. In experiments on two image datasets, a bioacoustics dataset, and two artificial datasets, MIML-ECC achieves accuracy that is higher than or comparable to several recent methods and baselines.





Notes
We found it effective to use a pool of threads, with each handling one of the \(L\) chains. Within each of these threads, construction of the RF classifiers was parallelized over trees. Support instance updates cannot be parallelized, because they occur sequentially in time.
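For illustration, here is a minimal sketch of this two-level threading scheme, assuming hypothetical TrainTree and TrainChain routines (the names and structure are illustrative, not the authors' implementation):

```cpp
// Two-level parallelization: one thread per chain; within a chain, random
// forest construction is parallelized over trees. Compile with:
//   g++ -std=c++11 -pthread sketch.cpp
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical: builds one tree of the random forest for one classifier.
void TrainTree(int chain, int classifier, int tree) {
    std::printf("chain %d, classifier %d, tree %d\n", chain, classifier, tree);
}

// Trains one classifier chain. Tree construction runs in parallel, but the
// support instance update between classifiers must remain sequential.
void TrainChain(int chain, int numClassifiers, int numTrees) {
    for (int c = 0; c < numClassifiers; ++c) {
        std::vector<std::thread> treePool;
        for (int t = 0; t < numTrees; ++t)
            treePool.emplace_back(TrainTree, chain, c, t);
        for (auto& th : treePool) th.join();
        // Sequential step: update support instances here before training
        // the next classifier in the chain (cannot be parallelized).
    }
}

int main() {
    const int L = 4;  // number of chains; one thread handles each chain
    std::vector<std::thread> chainPool;
    for (int l = 0; l < L; ++l)
        chainPool.emplace_back(TrainChain, l, 3, 8);
    for (auto& th : chainPool) th.join();
    return 0;
}
```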
The code is C++, compiled with GCC 4.0 with most speed optimizations enabled. Experiments ran on a Mac Pro with two 2.4 GHz quad-core Intel Xeon processors and 16 GB of 1,066 MHz DDR3 memory, running OS X 10.8.1.
Acknowledgments
This work is partially funded by NSF grant 1055113 to Xiaoli Z. Fern and by the College of Engineering, Oregon State University.
Appendix
The correlation coefficient \(\rho (X,Y)\) between two random variables \(X\) and \(Y\) is given by
$$\rho (X,Y)=\frac{\hbox {Cov}(X,Y)}{\sqrt{\hbox {Var}(X)\,\hbox {Var}(Y)}}. \quad (14)$$
Let \(X\) be a Bernoulli RV with \(P(X=1)=\frac{1}{2}\). Similarly, let \(Y\) conditioned on \(X\) be a Bernoulli RV with \(P(Y=1|X)=\frac{1}{2} (1+\rho ) X + \frac{1}{2} (1-\rho ) (1-X)\), as in Sect. 5.5, Eq. (12). Then the correlation coefficient satisfies \(\rho (X,Y) = \rho \).
Proof We begin by noting the property that the expected value of an arbitrary Bernoulli RV \(T\) with \(P(T=1)=p\) satisfies \(E[T] = P(T=1)=p\). Moreover, since \(T \in \{0,1\}\), \(T^k=T\) for \(k=1,2,\ldots \) and consequently \(E[T^k]=p\). The variance of \(T\) is given by \(\hbox {Var}(T)=E[T^2]-E[T]^2 = p - p^2 =p(1-p)\). To compute the correlation coefficient \(\rho (X,Y)\), we first compute \(E[X]\), \(E[Y]\), \(\hbox {Var}(X)\), \(\hbox {Var}(Y)\), and \(\hbox {Cov}(X,Y)\). Since \(P(X=1)=\frac{1}{2}\), we have \(E[X]=P(X=1)=\frac{1}{2}\). The expectation of \(Y\) is computed as follows:
$$E[Y]=E\big [E[Y|X]\big ]=E\left [\tfrac{1}{2}(1+\rho )X+\tfrac{1}{2}(1-\rho )(1-X)\right ]=\tfrac{1}{2}(1+\rho )\tfrac{1}{2}+\tfrac{1}{2}(1-\rho )\tfrac{1}{2}=\tfrac{1}{2}.$$
Since \(X\) and \(Y\) are Bernoulli RVs with \(P(X=1)=P(Y=1)=E[X]=E[Y]=\frac{1}{2}\), we also have \(\hbox {Var}(X)=\hbox {Var}(Y)=\frac{1}{2}(1-\frac{1}{2}) =\frac{1}{4}\). Next, we compute
$$E[XY]=E\big [X\,E[Y|X]\big ]=E\left [\tfrac{1}{2}(1+\rho )X^2+\tfrac{1}{2}(1-\rho )X(1-X)\right ]=\tfrac{1}{2}(1+\rho )E[X^2]=\tfrac{1}{4}(1+\rho ),$$
since \(X(1-X)=0\) for \(X\in \{0,1\}\) and \(E[X^2]=E[X]=\tfrac{1}{2}\).
The covariance is therefore \(\hbox {Cov}(X,Y)=E[XY]-E[X]E[Y] = \frac{1}{4}(1+\rho ) - \left (\frac{1}{2}\right )^2 = \frac{1}{4}\rho \). Finally, substituting \(\hbox {Var}(X)=\hbox {Var}(Y)=\frac{1}{4}\) and \(\hbox {Cov}(X,Y)=\frac{1}{4}\rho \) into (14) yields \(\rho (X,Y)=\rho \). \(\square \)
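As a sanity check, the identity can also be verified numerically. The following minimal Monte Carlo sketch (not from the paper) samples \(X\) and \(Y\) from the model above and estimates the Pearson correlation; with \(10^6\) samples the estimate should agree with \(\rho \) to roughly two decimal places:

```cpp
// Monte Carlo check of rho(X,Y) = rho under the stated model:
// X ~ Bernoulli(1/2), Y|X ~ Bernoulli((1+rho)/2 if X=1, else (1-rho)/2).
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    const double rho = 0.3;   // ground-truth correlation parameter
    const int n = 1000000;    // number of samples
    std::mt19937 gen(42);
    std::bernoulli_distribution bx(0.5);
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (int i = 0; i < n; ++i) {
        double x = bx(gen) ? 1.0 : 0.0;
        // P(Y=1|X) = (1+rho)/2 * X + (1-rho)/2 * (1-X), as in Eq. (12)
        double p = 0.5 * (1 + rho) * x + 0.5 * (1 - rho) * (1 - x);
        double y = std::bernoulli_distribution(p)(gen) ? 1.0 : 0.0;
        sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y;
    }
    double ex = sx / n, ey = sy / n;
    double cov = sxy / n - ex * ey;                 // sample covariance
    double vx = sxx / n - ex * ex, vy = syy / n - ey * ey;
    std::printf("estimated rho = %.4f (true %.2f)\n",
                cov / std::sqrt(vx * vy), rho);
    return 0;
}
```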
Cite this article
Briggs, F., Fern, X.Z. & Raich, R. Context-aware MIML instance annotation: exploiting label correlations with classifier chains. Knowl Inf Syst 43, 53–79 (2015). https://doi.org/10.1007/s10115-014-0781-8