
Image Retrieval for Complex Queries Using Knowledge Embedding

Published: 29 March 2020

Abstract

With the increase in popularity of image-based applications, users are retrieving images using more sophisticated and complex queries. We present three types of complex queries, namely, long, ambiguous, and abstract. Each type of query has its own characteristics and complexities and thus leads to imprecise and incomplete image retrieval. Existing methods for image retrieval are unable to deal with the high complexity of such queries. Search engines need to integrate their image retrieval process with knowledge to obtain rich semantics for effective retrieval. We propose a framework, Image Retrieval using Knowledge Embedding (ImReKE), for embedding knowledge with images and queries, allowing retrieval approaches to better understand the context of queries and images. ImReKE (IR_Approach, Knowledge_Base) takes two inputs, namely, an image retrieval approach and a knowledge base. It selects quality concepts (concepts that possess properties such as rarity, newness, etc.) from the knowledge base to provide rich semantic representations for queries and images to be leveraged by the image retrieval approach. For the first time, an effective knowledge base that exploits both the visual and textual information of concepts has been developed. Our extensive experiments demonstrate that the proposed framework improves image retrieval significantly for all types of complex queries. The improvement is remarkable in the case of abstract queries, which have not yet been dealt with explicitly in the existing literature. We also compare the quality of our knowledge base with existing text-based knowledge bases, such as ConceptNet, ImageNet, and the like.
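
The paper's implementation is not included on this page, so the following Python sketch is only an illustration of the interface the abstract describes: an ImReKE-style wrapper that takes an image retrieval approach and a knowledge base as its two inputs and enriches queries with quality concepts before retrieval. All names (Concept, select_quality_concepts, imreke), the rarity/newness scoring, and the string-based query enrichment are hypothetical assumptions, not the authors' actual method.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Concept:
    """A knowledge-base concept with illustrative quality properties."""
    name: str
    rarity: float    # assumed score: how uncommon the concept is
    newness: float   # assumed score: how recently the concept appeared


def select_quality_concepts(knowledge_base: List[Concept],
                            text: str,
                            top_k: int = 5) -> List[str]:
    """Pick concepts related to the text that score well on quality
    properties such as rarity and newness (toy relatedness check)."""
    related = [c for c in knowledge_base if c.name in text.lower()]
    related.sort(key=lambda c: c.rarity + c.newness, reverse=True)
    return [c.name for c in related[:top_k]]


def imreke(ir_approach: Callable[[str], List[str]],
           knowledge_base: List[Concept]) -> Callable[[str], List[str]]:
    """Wrap an existing image retrieval approach so that each query is
    enriched with quality concepts before retrieval; image-side enrichment
    would happen analogously at indexing time."""
    def retrieve(query: str) -> List[str]:
        concepts = select_quality_concepts(knowledge_base, query)
        enriched_query = " ".join([query] + concepts)
        return ir_approach(enriched_query)
    return retrieve

For example, wrapping a hypothetical keyword-based retriever as retrieve = imreke(keyword_search, kb) would yield a retriever whose queries are expanded with selected concepts before being handed to the underlying approach; in the paper, images are also enriched with knowledge, which this sketch only notes in a comment.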


References

  1. X. S. Hua, L. Yang, J. Wang, J. Wang, M. Ye, K. Wang, Y. Rui, and J. Li. 2013. Clickage: Towards bridging semantic and intent gaps via mining click logs of search engines. In Proceedings of the ACM International Conference on Multimedia. 393--422.
  2. L. Nie, S. Yan, M. Wang, R. Hong, and T. Chua. 2012. Harvesting visual concepts for image search with complex queries. In Proceedings of the ACM International Conference on Multimedia. 59--68.
  3. D. Guo and P. Gao. 2016. Complex-query web image search with concept-based relevance estimation. World Wide Web 19, 2 (2016), 247--264.
  4. C. Cui, J. Shen, Z. Chen, S. Wang, and J. Ma. 2017. Learning to rank images for complex queries in concept-based search. Neurocomputing 274 (2017), 19--28.
  5. H. Chen, A. Trouve, K. J. Murakami, and A. Fukuda. 2017. Semantic image retrieval for complex queries using a knowledge parser. Multimedia Tools and Applications 77, 9 (2017), 10733--10751.
  6. B. Siddiquie, B. White, A. Sharma, and L. S. Davis. 2014. Multi-modal image retrieval for complex queries using small codes. In Proceedings of the ACM International Conference on Multimedia Retrieval. 321--328.
  7. X. Qian, D. Lu, Y. Wang, L. Zhu, Y. Y. Tang, and M. Wang. 2017. Image re-ranking based on topic diversity. IEEE Transactions on Image Processing 26, 8 (2017), 3734--3747.
  8. M. Wang, K. Yang, X.-S. Hua, and H.-J. Zhang. 2010. Towards a relevant and diverse search of social images. IEEE Transactions on Multimedia 12, 8 (2010), 829--842.
  9. A. Ksibi, A. Ben Ammar, and C. Ben Amar. 2014. Adaptive diversification for tag-based social image retrieval. International Journal of Multimedia Information Retrieval 3, 1 (2014), 29--39.
  10. G. A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM 38, 11 (1995), 39--41.
  11. H. Liu and P. Singh. 2004. ConceptNet—A practical commonsense reasoning tool-kit. BT Technology Journal 22, 4 (2004), 211--226.
  12. N. Tandon, F. Suchanek, and G. Weikum. 2014. WebChild: Harvesting and organizing commonsense knowledge from the Web. In Proceedings of the ACM Conference on Web Search and Data Mining. 523--532.
  13. W. Wu, H. Li, H. Wang, and K. Q. Zhu. 2012. Probase: A probabilistic taxonomy for text understanding. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 481--492.
  14. T. Mitchell. 2015. Never-ending learning. In Proceedings of the AAAI Conference on Artificial Intelligence.
  15. J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 248--255.
  16. X. Chen. 2013. NEIL: Extracting visual knowledge from Web data. In Proceedings of the IEEE International Conference on Computer Vision. 1409--1416.
  17. M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid. 2009. TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation. In Proceedings of the IEEE International Conference on Computer Vision. 309--316.
  18. V. Ordonez, W. Liu, J. Deng, Y. Choi, A. C. Berg, and T. L. Berg. 2015. Predicting entry-level categories. International Journal of Computer Vision 115, 1 (2015), 29--43.
  19. A. Mathews, L. Xie, and X. He. 2015. Choosing basic-level concept names using visual and language context. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision. 595--602.
  20. M. Chen, A. Zheng, and K. Q. Weinberger. 2013. Fast image tagging. In Proceedings of the International Conference on Machine Learning. 1274--1282.
  21. R. Aly, D. Hiemstra, and R. Ordelman. 2007. Building detectors to support searches on combined semantic concepts. In Proceedings of the Multimedia Information Retrieval Workshop. 40--45.
  22. A. P. Natsev and M. R. Naphade. 2005. Learning the semantics of multimedia queries and concepts from a small number of examples. In Proceedings of the ACM International Conference on Multimedia. 598--607.
  23. X. Li, C. G. M. Snoek, M. Worring, and A. W. M. Smeulders. 2012. Harvesting social images for bi-concept search. IEEE Transactions on Multimedia 14, 4 (2012), 1091--1104.
  24. C. Chaudhary, P. Goyal, J. R. A. Moniz, N. Goyal, and Y. P. P. Chen. 2018. Linguistic patterns and cross modality-based image retrieval for complex queries. In Proceedings of the International Conference on Multimedia Retrieval. 257--265.
  25. S. N. Chowdhury, N. Tandon, and G. Weikum. 2018. VISIR: Visual and semantic image label refinement. In Proceedings of the ACM Conference on Web Search and Data Mining. 117--125.
  26. P. Cui, S. Liu, and W. Zhu. 2018. General knowledge embedded image representation learning. IEEE Transactions on Multimedia 20, 1 (2018), 198--207.
  27. J. Hoffart, F. M. Suchanek, K. Berberich, and G. Weikum. 2013. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence 194, 4 (2013), 28--61.
  28. O. Etzioni, A. Popescu, D. S. Weld, D. Downey, and A. Yates. 2004. Web-scale information extraction in KnowItAll (preliminary results). In Proceedings of the ACM International Conference on World Wide Web. 100--110.
  29. C. Bizer et al. 2009. DBpedia: A crystallization point for the Web of Data. Web Semantics: Science, Services and Agents on the World Wide Web 7 (2009), 154--165.
  30. S. Belongie and P. Perona. 2016. Visipedia circa 2015. Pattern Recognition Letters 72 (2016), 15--24.
  31. B. Gao, T.-Y. Liu, T. Qin, X. Zheng, Q.-S. Cheng, and W.-Y. Ma. 2005. Web image clustering by consistent utilization of visual features and surrounding texts. In Proceedings of the ACM International Conference on Multimedia. 112--121.
  32. P.-A. Moellic, J.-E. Haugeard, and G. Pitel. 2008. Image clustering based on a shared nearest neighbors approach for tagged collections. In Proceedings of the International Conference on Content-Based Image and Video Retrieval. 269--278.
  33. D. J. Joshi, R. Datta, Z. Zhuang, W. Weiss, M. Friedenberg, J. Li, and J. Wang. 2006. Paragrab: A comprehensive architecture for web image management and multimodal querying. In Proceedings of the International Conference on Very Large Data Bases. 1163--1166.
  34. E. Hoque, G. Strong, O. Hoeber, and M. Gong. 2011. Conceptual query expansion and visual search results exploration for Web image retrieval. In Proceedings of the Atlantic Web Intelligence Conference. 73--82.
  35. D. Myoupo, A. Popescu, H. L. Borgne, and P. A. Moëllic. 2009. Multimodal image retrieval over a large database. In Proceedings of the Cross-Language Evaluation Forum. 177--184.
  36. I. H. Witten and D. Milne. 2008. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proceedings of the AAAI Workshop on Wikipedia and Artificial Intelligence: An Evolving Synergy. AAAI Press, Chicago. 25--30.
  37. X. Tang, K. Liu, J. Cui, F. Wen, and X. Wang. 2012. IntentSearch: Capturing user intention for one-click internet image search. IEEE Transactions on Pattern Analysis and Machine Intelligence 34, 7 (2012), 1342--1353.
  38. Z. Zha, L. Yang, T. Mei, M. Wang, and Z. Wang. 2009. Visual query suggestion. In Proceedings of the ACM International Conference on Multimedia. 15--24.
  39. D. Qin, S. Gammeter, L. Bossard, T. Quack, and L. Van Gool. 2011. Hello neighbor: Accurate object retrieval with k-reciprocal nearest neighbors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 777--784.
  40. M. De Marneffe and C. D. Manning. 2008. The Stanford typed dependencies representation. In Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation. 1--8.
  41. M. A. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of COLING. 539--545.
  42. R. Snow, D. Jurafsky, and A. Y. Ng. 2005. Learning syntactic patterns for automatic hypernym discovery. In Advances in Neural Information Processing Systems. 1297--1304.
  43. N. Tandon, C. Hariman, J. Urbani, A. Rohrbach, M. Rohrbach, and G. Weikum. 2016. Commonsense in parts: Mining part-whole relations from the Web and image tags. In Proceedings of the AAAI Conference on Artificial Intelligence. 243--250.
  44. Y. Gao, M. Wang, Z. J. Zha, J. Shen, X. Li, and X. Wu. 2013. Visual-textual joint relevance learning for tag-based social image search. IEEE Transactions on Image Processing 22, 1 (2013), 363--376.
  45. K. Simonyan and A. Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations. arXiv:1409.1556v6.
  46. Oxford Dictionary. Retrieved from https://en.oxforddictionaries.com.
  47. Cambridge Dictionary. Retrieved from https://dictionary.cambridge.org/.
  48. T. S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y. T. Zheng. 2009. NUS-WIDE: A real-world web image database from National University of Singapore. In Proceedings of the ACM International Conference on Image and Video Retrieval. 8--10.
  49. S. Wang, Y. Chen, J. Zhuo, Q. Huang, and Q. Tian. 2018. Joint global and co-attentive representation learning for image-sentence retrieval. In Proceedings of the ACM International Conference on Multimedia. 1398--1406.
  50. Y. Wu, S. Wang, and Q. Huang. 2018. Learning semantic structure-preserved embeddings for cross-modal retrieval. In Proceedings of the ACM International Conference on Multimedia. 825--833.
  51. P. Isola, D. Zoran, D. Krishnan, and E. H. Adelson. 2014. Crisp boundary detection using pointwise mutual information. In Proceedings of the European Conference on Computer Vision. 799--814.
  52. S. Wang and S. Jiang. 2015. INSTRE: A new benchmark for instance-level object retrieval and recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 11, 3 (2015), 37.


    • Published in

      ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 16, Issue 1
      February 2020
      363 pages
      ISSN:1551-6857
      EISSN:1551-6865
      DOI:10.1145/3384216

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 29 March 2020
      • Accepted: 1 December 2019
      • Revised: 1 October 2019
      • Received: 1 January 2019
Published in TOMM Volume 16, Issue 1


      Qualifiers

      • research-article
      • Research
      • Refereed
