Image Retrieval for Complex Queries Using Knowledge Embedding

Authors:
Chandramani Chaudhary, BITS Pilani, Pilani Campus, Pilani, India
Poonam Goyal, BITS Pilani, Pilani Campus, Pilani, India
Navneet Goyal, BITS Pilani, Pilani Campus, Pilani, India
Yi-Ping Phoebe Chen, La Trobe University, Melbourne, Australia (ORCID: 0000-0002-4122-3767)

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 1, Article No. 13, pp. 1–23. https://doi.org/10.1145/3375786

Published: 29 March 2020

Abstract

With the increase in popularity of image-based applications, users are retrieving images using more sophisticated and complex queries. We present three types of complex queries, namely, long, ambiguous, and abstract. Each type of query has its own characteristics/complexities and thus leads to imprecise and incomplete image retrieval. Existing methods for image retrieval are unable to deal with the high complexity of such queries. Search engines need to integrate their image retrieval process with knowledge to obtain rich semantics for effective retrieval. We propose a framework, Image Retrieval using Knowledge Embedding (ImReKE), for embedding knowledge with images and queries, allowing retrieval approaches to understand the context of queries and images in a better way. ImReKE (IR_Approach, Knowledge_Base) takes two inputs, namely, an image retrieval approach and a knowledge base. It selects quality concepts (concepts that possess properties such as rarity, newness, etc.) from the knowledge base to provide rich semantic representations for queries and images to be leveraged by the image retrieval approach. For the first time, an effective knowledge base that exploits both the visual and textual information of concepts has been developed. Our extensive experiments demonstrate that the proposed framework improves image retrieval significantly for all types of complex queries. The improvement is remarkable in the case of abstract queries, which have not yet been dealt with explicitly in the existing literature. We also compare the quality of our knowledge base with the existing text-based knowledge bases, such as ConceptNet, ImageNet, and the like.
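
The two-input interface named in the abstract, ImReKE (IR_Approach, Knowledge_Base), can be pictured with a small sketch. Everything below is hypothetical and is not the authors' implementation: the toy knowledge base, the inverse-frequency `rarity` score standing in for concept "quality", the tag-overlap retriever, and all function names are assumptions made purely for illustration.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Set

# Hypothetical types (not from the paper).
Images = Dict[str, Set[str]]                       # image id -> tags
KnowledgeBase = Dict[str, Set[str]]                # term -> related concepts
IRApproach = Callable[[Set[str], Images], List[str]]


def overlap_retriever(terms: Set[str], images: Images) -> List[str]:
    """Toy IR approach: rank images by the number of tags shared with the terms."""
    scored = [(len(terms & tags), img) for img, tags in images.items()]
    # Descending overlap, then image id, so the order is deterministic.
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [img for score, img in ranked if score > 0]


def rarity(concept: str, kb: KnowledgeBase) -> float:
    """Assumed quality score: concepts linked to fewer terms count as rarer."""
    freq = Counter(c for related in kb.values() for c in related)
    return math.log(len(kb) / freq[concept]) if freq[concept] else 0.0


def imreke(ir_approach: IRApproach, kb: KnowledgeBase, top_k: int = 2):
    """Wrap an IR approach so each query is enriched with its top-k concepts."""
    def retrieve(query: str, images: Images) -> List[str]:
        terms = set(query.lower().split())
        candidates = {c for t in terms for c in kb.get(t, set())} - terms
        best = sorted(candidates, key=lambda c: (-rarity(c, kb), c))[:top_k]
        return ir_approach(terms | set(best), images)
    return retrieve


# Usage with an abstract query whose word never appears as an image tag.
images = {"img1": {"bird", "sky"}, "img2": {"car", "road"}, "img3": {"dove"}}
kb = {"freedom": {"bird", "sky"}, "peace": {"dove", "sky"}}

baseline = overlap_retriever({"freedom"}, images)
enriched = imreke(overlap_retriever, kb)("freedom", images)
print(baseline)   # []
print(enriched)   # ['img1']
```

In this toy setting the abstract query "freedom" matches no tag directly, so the plain retriever returns nothing, while the knowledge-enriched wrapper reaches an image through the related concepts. The wrapper leaves the underlying retrieval approach untouched, which mirrors the idea of passing the approach in as an input.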

Supplemental Material

chaudhary.zip (363.9 KB): supplemental movie, appendix, image, and software files for "Image Retrieval for Complex Queries Using Knowledge Embedding".

Index Terms

Information systems > Information retrieval > Specialized information retrieval > Multimedia and multimodal retrieval > Image search

Published in
ACM Transactions on Multimedia Computing, Communications, and Applications Volume 16, Issue 1
February 2020
363 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3384216
Editor: Alberto Del Bimbo, University of Firenze, Italy
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 29 March 2020
- Accepted: 1 December 2019
- Revised: 1 October 2019
- Received: 1 January 2019
Author Tags
Image retrieval
ambiguous query
complex query
diversity
knowledge base
knowledge embedding
query expansion
Qualifiers
- research-article
- Research
- Refereed