DOI: 10.1145/3323873.3325026

The Focus-Aspect-Value Model for Explainable Prediction of Subjective Visual Interpretation

Published: 05 June 2019

Abstract

Subjective visual interpretation is a challenging yet important topic in computer vision. Many approaches reduce this problem to the prediction of adjective or attribute labels from images. However, most of these do not take attribute semantics into account, or process the image only in a holistic manner. Furthermore, there is a lack of relevant datasets with fine-grained subjective labels. In this paper, we propose the Focus-Aspect-Value (FAV) model to structure the process of capturing subjectivity in image processing, and introduce a novel dataset following this way of modeling. We run experiments on this dataset to compare several deep learning methods and find that incorporating context information via tensor multiplication outperforms the default way of fusing information (concatenation).
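The fusion comparison mentioned in the abstract can be illustrated with a short sketch. The snippet below is not the paper's implementation; the PyTorch framework, module names, and feature dimensions are assumptions made for illustration only. It contrasts the concatenation baseline with tensor-product fusion, where an image feature vector and a context embedding (encoding the focus and aspect) are combined via a per-sample outer product before the "value" is predicted.

```python
# Minimal sketch (assumed PyTorch; not the authors' code) of the two fusion
# strategies the abstract compares: concatenation vs. tensor multiplication.
import torch
import torch.nn as nn


class ConcatFusion(nn.Module):
    """Baseline: concatenate image and context features, then classify."""
    def __init__(self, img_dim: int, ctx_dim: int, num_values: int):
        super().__init__()
        self.classifier = nn.Linear(img_dim + ctx_dim, num_values)

    def forward(self, img_feat, ctx_feat):
        return self.classifier(torch.cat([img_feat, ctx_feat], dim=-1))


class TensorFusion(nn.Module):
    """Tensor-product fusion: outer product of image and context features."""
    def __init__(self, img_dim: int, ctx_dim: int, num_values: int):
        super().__init__()
        self.classifier = nn.Linear(img_dim * ctx_dim, num_values)

    def forward(self, img_feat, ctx_feat):
        # Per-sample outer product: (B, I, 1) * (B, 1, C) -> (B, I, C)
        joint = img_feat.unsqueeze(2) * ctx_feat.unsqueeze(1)
        return self.classifier(joint.flatten(start_dim=1))


# Hypothetical usage: a 2048-d image feature (e.g. from a ResNet backbone)
# and a 64-d embedding of the (focus, aspect) context, scored over 10 values.
img_feat = torch.randn(8, 2048)
ctx_feat = torch.randn(8, 64)
logits = TensorFusion(2048, 64, num_values=10)(img_feat, ctx_feat)
```

The design difference is that the outer product exposes every multiplicative interaction between an image-feature dimension and a context dimension to the classifier, whereas concatenation followed by a linear layer can only weight the two inputs additively.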


Cited By

  • (2024) FaceX: Understanding Face Attribute Classifiers through Summary Model Explanations. Proceedings of the 2024 International Conference on Multimedia Retrieval, 758-766. DOI: 10.1145/3652583.3658007. Online publication date: 30 May 2024.
  • (2020) The Focus–Aspect–Value model for predicting subjective visual attributes. International Journal of Multimedia Information Retrieval 9(1), 47-60. DOI: 10.1007/s13735-019-00188-5. Online publication date: 2 Jan 2020.
  • (2019) Conditional GANs for Image Captioning with Sentiments. Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series, 300-312. DOI: 10.1007/978-3-030-30490-4_25. Online publication date: 17 Sep 2019.

Published In

ICMR '19: Proceedings of the 2019 on International Conference on Multimedia Retrieval
June 2019
427 pages
ISBN:9781450367653
DOI:10.1145/3323873
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 June 2019


Author Tags

  1. fav
  2. images
  3. information fusion
  4. logistic regression
  5. neural network
  6. subjectivity
  7. zero-shot

Qualifiers

  • Research-article

Funding Sources

  • Nvidia
  • Bundesministerium für Bildung und Forschung

Conference

ICMR '19

Acceptance Rates

Overall acceptance rate: 254 of 830 submissions (31%)

