Abstract
With the growing popularity of image-based applications, users are retrieving images with increasingly sophisticated and complex queries. We present three types of complex queries: long, ambiguous, and abstract. Each type has its own characteristics and complexities, which lead to imprecise and incomplete image retrieval. Existing image retrieval methods are unable to deal with the high complexity of such queries. Search engines need to integrate knowledge into their image retrieval process to obtain rich semantics for effective retrieval. We propose a framework, Image Retrieval using Knowledge Embedding (ImReKE), for embedding knowledge with images and queries, allowing retrieval approaches to better understand the context of queries and images. ImReKE (IR_Approach, Knowledge_Base) takes two inputs: an image retrieval approach and a knowledge base. It selects quality concepts (concepts that possess properties such as rarity, newness, etc.) from the knowledge base to provide rich semantic representations for queries and images, which the image retrieval approach can then leverage. For the first time, an effective knowledge base that exploits both the visual and textual information of concepts has been developed. Our extensive experiments demonstrate that the proposed framework improves image retrieval significantly for all types of complex queries. The improvement is remarkable in the case of abstract queries, which have not yet been dealt with explicitly in the existing literature. We also compare the quality of our knowledge base with that of existing text-based knowledge bases, such as ConceptNet and ImageNet.
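The two-input shape of ImReKE (IR_Approach, Knowledge_Base) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the knowledge base is a toy dictionary, concept quality is reduced to a single hypothetical score with a threshold, queries and images are represented as sets of terms, and `tag_overlap_retrieval` is a stand-in retrieval approach. The names `select_quality_concepts`, `embed_knowledge`, and `imreke` are hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy knowledge base: term -> {related concept: quality score in [0, 1]}.
KnowledgeBase = Dict[str, Dict[str, float]]
# Hypothetical retrieval approach: (query terms, image terms by id) -> ranked image ids.
IRApproach = Callable[[Set[str], Dict[str, Set[str]]], List[str]]


def select_quality_concepts(term: str, kb: KnowledgeBase, min_score: float = 0.5) -> Set[str]:
    """Keep only concepts whose (assumed) quality score reaches the threshold."""
    return {c for c, score in kb.get(term, {}).items() if score >= min_score}


def embed_knowledge(terms: Set[str], kb: KnowledgeBase) -> Set[str]:
    """Enrich a set of terms with the quality concepts linked to each term."""
    enriched = set(terms)
    for term in terms:
        enriched |= select_quality_concepts(term, kb)
    return enriched


def tag_overlap_retrieval(query: Set[str], images: Dict[str, Set[str]]) -> List[str]:
    """Stand-in retrieval approach: rank images by the number of shared terms."""
    scored = [(len(query & tags), img) for img, tags in images.items()]
    return [img for score, img in sorted(scored, key=lambda p: (-p[0], p[1])) if score > 0]


def imreke(ir_approach: IRApproach, kb: KnowledgeBase):
    """Wrap a retrieval approach so that queries and images are both enriched first."""
    def retrieve(query: Set[str], images: Dict[str, Set[str]]) -> List[str]:
        rich_query = embed_knowledge(query, kb)
        rich_images = {img: embed_knowledge(tags, kb) for img, tags in images.items()}
        return ir_approach(rich_query, rich_images)
    return retrieve


# Illustrative data only: an abstract query that shares no literal term with any image.
kb = {
    "freedom": {"bird": 0.8, "sky": 0.7, "thing": 0.1},
    "bird": {"wings": 0.9},
}
images = {"img1": {"bird", "sky"}, "img2": {"car", "road"}}

baseline = tag_overlap_retrieval({"freedom"}, images)
enriched = imreke(tag_overlap_retrieval, kb)({"freedom"}, images)
```

In this toy setting the plain approach finds no match for the abstract query, while the wrapped approach matches `img1` through the added concepts. Passing the retrieval approach in as a parameter mirrors the abstract's description of a framework that existing approaches plug into rather than a new retrieval method.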
Supplemental Material
Supplemental movie, appendix, image, and software files for Image Retrieval for Complex Queries Using Knowledge Embedding are available for download.