DOI: 10.1145/3323873.3325026

The Focus-Aspect-Value Model for Explainable Prediction of Subjective Visual Interpretation

Published: 05 June 2019

Abstract

Subjective visual interpretation is a challenging yet important topic in computer vision. Many approaches reduce this problem to the prediction of adjective or attribute labels from images. However, most of these do not take attribute semantics into account, or process the image only in a holistic manner. Furthermore, there is a lack of relevant datasets with fine-grained subjective labels. In this paper, we propose the Focus-Aspect-Value (FAV) model to structure the process of capturing subjectivity in image processing, and introduce a novel dataset following this way of modeling. We run experiments on this dataset to compare several deep learning methods and find that incorporating context information via tensor multiplication outperforms the default way of fusing information (concatenation).
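The fusion comparison mentioned in the abstract can be illustrated with a short sketch. The snippet below is not the paper's implementation; the PyTorch framework, module names, and feature dimensions are assumptions made for illustration only. It contrasts the concatenation baseline with tensor-product fusion, where an image feature vector and a context embedding (encoding the focus and aspect) are combined via a per-sample outer product before the "value" is predicted.

```python
# Minimal sketch (assumed PyTorch; not the authors' code) of the two fusion
# strategies the abstract compares: concatenation vs. tensor multiplication.
import torch
import torch.nn as nn


class ConcatFusion(nn.Module):
    """Baseline: concatenate image and context features, then classify."""
    def __init__(self, img_dim: int, ctx_dim: int, num_values: int):
        super().__init__()
        self.classifier = nn.Linear(img_dim + ctx_dim, num_values)

    def forward(self, img_feat, ctx_feat):
        return self.classifier(torch.cat([img_feat, ctx_feat], dim=-1))


class TensorFusion(nn.Module):
    """Tensor-product fusion: outer product of image and context features."""
    def __init__(self, img_dim: int, ctx_dim: int, num_values: int):
        super().__init__()
        self.classifier = nn.Linear(img_dim * ctx_dim, num_values)

    def forward(self, img_feat, ctx_feat):
        # Per-sample outer product: (B, I, 1) * (B, 1, C) -> (B, I, C)
        joint = img_feat.unsqueeze(2) * ctx_feat.unsqueeze(1)
        return self.classifier(joint.flatten(start_dim=1))


# Hypothetical usage: a 2048-d image feature (e.g. from a ResNet backbone)
# and a 64-d embedding of the (focus, aspect) context, scored over 10 values.
img_feat = torch.randn(8, 2048)
ctx_feat = torch.randn(8, 64)
logits = TensorFusion(2048, 64, num_values=10)(img_feat, ctx_feat)
```

The design difference is that the outer product exposes every multiplicative interaction between an image-feature dimension and a context dimension to the classifier, whereas concatenation followed by a linear layer can only weight the two inputs additively.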


Cited By

  • (2024) FaceX: Understanding Face Attribute Classifiers through Summary Model Explanations. Proceedings of the 2024 International Conference on Multimedia Retrieval, 758-766. DOI: 10.1145/3652583.3658007. Online publication date: 30 May 2024.
  • (2020) The Focus–Aspect–Value model for predicting subjective visual attributes. International Journal of Multimedia Information Retrieval 9(1), 47-60. DOI: 10.1007/s13735-019-00188-5. Online publication date: 2 Jan 2020.
  • (2019) Conditional GANs for Image Captioning with Sentiments. Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series, 300-312. DOI: 10.1007/978-3-030-30490-4_25. Online publication date: 17 Sep 2019.

Published In

ICMR '19: Proceedings of the 2019 on International Conference on Multimedia Retrieval
June 2019
427 pages
ISBN:9781450367653
DOI:10.1145/3323873
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 June 2019


Author Tags

  1. fav
  2. images
  3. information fusion
  4. logistic regression
  5. neural network
  6. subjectivity
  7. zero-shot

Qualifiers

  • Research-article

Funding Sources

  • Nvidia
  • Bundesministerium für Bildung und Forschung

Conference

ICMR '19

Acceptance Rates

Overall acceptance rate: 254 of 830 submissions (31%)

