Machine Learning for Visual Concept Recognition and Ranking for Images

Part of the book series: Cognitive Technologies (COGTECH)

Abstract

Recognizing a large set of generic visual concepts in images and ranking images by their visual semantics are among the unsolved tasks for future multimedia and scientific applications built on image collections. From that perspective, improving the quality of semantic annotations for image data is well matched to the goals of the THESEUS research program with respect to multimedia and scientific services. We introduce the data-driven and algorithmic challenges inherent in such tasks from the perspective of statistical data analysis and machine learning, and we discuss approaches relying on kernel-based similarities and discriminative methods that are capable of processing large-scale datasets.

Acknowledgements

This work was primarily supported by the German Federal Ministry for Economic Affairs and Energy (BMWi) under the THESEUS research program (Grant 01MQ07018). It was also supported in part by the World Class University Program through the National Research Foundation of Korea, funded by the Korean Ministry of Education, Science, and Technology, under Grant R31-10008. We thank Volker Tresp, the work package leader at CTC WP6, Ralf Schäfer from the Fraunhofer HHI, and Shinichi Nakajima from the Nikon Corporation for the fruitful collaboration.

Author information

Corresponding author

Correspondence to Alexander Binder.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Binder, A., Samek, W., Müller, KR., Kawanabe, M. (2014). Machine Learning for Visual Concept Recognition and Ranking for Images. In: Wahlster, W., Grallert, HJ., Wess, S., Friedrich, H., Widenka, T. (eds) Towards the Internet of Services: The THESEUS Research Program. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-06755-1_17

  • DOI: https://doi.org/10.1007/978-3-319-06755-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06754-4

  • Online ISBN: 978-3-319-06755-1

  • eBook Packages: Computer Science (R0)
