Machine Learning for Visual Concept Recognition and Ranking for Images

Binder, Alexander; Samek, Wojciech; Müller, Klaus-Robert; Kawanabe, Motoaki

doi:10.1007/978-3-319-06755-1_17

Chapter
First Online: 01 January 2014

Part of the book series: Cognitive Technologies (COGTECH)

Abstract

Recognizing a large set of generic visual concepts in images and ranking images by their visual semantics remain among the unsolved tasks for future multimedia and scientific applications built on image collections. Improving the quality of semantic annotations for image data is therefore well matched to the goals of the THESEUS research program with respect to multimedia and scientific services. We introduce the data-driven and algorithmic challenges inherent in such tasks from the perspective of statistical data analysis and machine learning, and discuss approaches that rely on kernel-based similarities and discriminative methods and that are capable of processing large-scale datasets.

Acknowledgements

This work was primarily supported by the German Federal Ministry for Economic Affairs and Energy (BMWi) under the THESEUS research program (Grant 01MQ07018). It was also supported in part by the World Class University Program through the National Research Foundation of Korea, funded by the Korean Ministry of Education, Science, and Technology, under Grant R31-10008. We thank Volker Tresp, the work package leader at CTC WP6, Ralf Schäfer from Fraunhofer HHI, and Shinichi Nakajima from Nikon Corporation for the fruitful collaboration.

Author information

Authors and Affiliations

Fraunhofer Institute for Computer Architecture and Software Technology (FIRST), Berlin, Germany
Alexander Binder & Wojciech Samek
Machine Learning Group, TU Berlin, Berlin, Germany
Alexander Binder, Wojciech Samek & Klaus-Robert Müller
Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea
Klaus-Robert Müller
ATR Brain Information Communication Research Laboratory, Kyoto, Japan
Motoaki Kawanabe

Corresponding author

Correspondence to Alexander Binder.

Editor information

Editors and Affiliations

Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI) GmbH, Saarbrücken, Germany
Wolfgang Wahlster
Fraunhofer Heinrich-Hertz-Institut, Berlin, Germany
Hans-Joachim Grallert
Empolis Information Management GmbH, Kaiserslautern, Germany
Stefan Wess
Corporate Technology, Siemens AG, München, Germany
Hermann Friedrich
Strategy Advisory, SAP Deutschland AG & Co. KG, Walldorf, Germany
Thomas Widenka

About this chapter

Cite this chapter

Binder, A., Samek, W., Müller, KR., Kawanabe, M. (2014). Machine Learning for Visual Concept Recognition and Ranking for Images. In: Wahlster, W., Grallert, HJ., Wess, S., Friedrich, H., Widenka, T. (eds) Towards the Internet of Services: The THESEUS Research Program. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-06755-1_17

DOI: https://doi.org/10.1007/978-3-319-06755-1_17
Published: 02 July 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06754-4
Online ISBN: 978-3-319-06755-1
eBook Packages: Computer Science, Computer Science (R0)
