Semantic Bottleneck for Computer Vision Tasks

Abstract
This paper introduces a novel, inherently semantic image representation that addresses the intelligibility of computations in computer vision tasks. More specifically, we propose inserting what we call a semantic bottleneck into the processing pipeline: a crossing point at which the image representation is expressed entirely in natural language, while the efficiency of numerical representations is retained elsewhere in the pipeline. We show that our approach generates semantic representations that achieve state-of-the-art results on semantic content-based image retrieval and also perform very well on image classification tasks. Intelligibility is evaluated through user-centered failure-detection experiments.
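As a rough illustration of the pipeline described above, the Python sketch below shows a semantic bottleneck in miniature: an image is mapped to a natural-language caption (the human-inspectable crossing point), the caption is re-embedded as a numeric vector, and that vector drives a downstream task such as retrieval. All names and components here (image_to_text, text_to_vector, the toy vocabulary and random embeddings) are hypothetical stand-ins for the paper's captioning and word-embedding modules, not the authors' implementation.

```python
# Minimal sketch of a semantic bottleneck pipeline (hypothetical names,
# not the authors' code). The image is first mapped to natural-language
# text; only that text, re-embedded as a vector, feeds downstream tasks,
# so every prediction can be traced back to readable words.

import numpy as np

VOCAB = {"a": 0, "dog": 1, "runs": 2, "on": 3, "grass": 4, "cat": 5}
EMBED_DIM = 8
rng = np.random.default_rng(0)
WORD_VECS = rng.normal(size=(len(VOCAB), EMBED_DIM))  # stand-in for word2vec

def image_to_text(image: np.ndarray) -> str:
    """Stand-in for a captioning/VQA model mapping image -> sentence.
    Returns a fixed caption purely for illustration."""
    return "a dog runs on grass"

def text_to_vector(sentence: str) -> np.ndarray:
    """Re-embed the natural-language bottleneck as a numeric vector
    (mean of word embeddings) so downstream tasks stay efficient."""
    idx = [VOCAB[w] for w in sentence.split() if w in VOCAB]
    return WORD_VECS[idx].mean(axis=0)

def retrieve(query_vec: np.ndarray, gallery_vecs: np.ndarray) -> np.ndarray:
    """Semantic retrieval: rank gallery items by cosine similarity."""
    sims = gallery_vecs @ query_vec / (
        np.linalg.norm(gallery_vecs, axis=1) * np.linalg.norm(query_vec))
    return np.argsort(-sims)

# Usage: the caption is the inspectable crossing point.
img = np.zeros((224, 224, 3))           # dummy image
caption = image_to_text(img)            # human-readable bottleneck
vec = text_to_vector(caption)           # numeric again for efficiency
gallery = rng.normal(size=(5, EMBED_DIM))
print(caption, retrieve(vec, gallery))
```

Because the only information crossing the bottleneck is the caption itself, a user can judge whether a downstream failure stems from a wrong description, which is the intuition behind the failure-detection experiments mentioned in the abstract.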