Visual Concept Naming: Discovering Well-Recognized Textual Expressions of Visual Concepts

Authors:
Masayasu Muraoka (IBM Research - Tokyo, Japan),
Tetsuya Nasukawa (IBM Research - Tokyo, Japan),
Rudy Raymond (IBM Research - Tokyo, Japan),
Bishwaranjan Bhattacharjee (IBM Thomas J. Watson Research Center, USA)

WWW '20: Proceedings of The Web Conference 2020, April 2020, Pages 2556–2562. https://doi.org/10.1145/3366423.3380006

Published: 20 April 2020

ABSTRACT

We propose a task called Visual Concept Naming to associate visual concepts with the corresponding textual expressions, i.e., names of visual concepts found in real-world multimodal data. To tackle the task, we create a dataset consisting of 3.4 million tweets in total in three languages. We also propose a method for extracting candidate names of visual concepts and validating them by exploiting Web-based knowledge obtained through image search. To demonstrate the capability of our method, we conduct an experiment with the dataset we create and evaluate names obtained by our method through crowdsourcing, where we establish an evaluation method to verify the names. The experimental results indicate that the proposed method can identify a wide variety of names of visual concepts. The names we obtained also show interesting insights regarding languages and countries where the languages are used.



Published in
WWW '20: Proceedings of The Web Conference 2020
April 2020
3143 pages
ISBN:9781450370233
DOI:10.1145/3366423
Editors:
Yennun Huang (Academia Sinica, Taiwan),
Irwin King (The Chinese University of Hong Kong, Hong Kong),
Tie-Yan Liu (Microsoft Research Asia, China),
Maarten van Steen (University of Twente, Netherlands)
Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
crowdsourcing
image search
multimodal grounding
social media analysis
text mining
vision and language
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%