Ad-hoc Video Search Improved by the Word Sense Filtering of Query Terms

Hirakawa, Koji; Kikuchi, Kotaro; Ueki, Kazuya; Kobayashi, Tetsunori; Hayashi, Yoshihiko

doi:10.1007/978-3-030-03520-4_15

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11292)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

Performance on an ad-hoc video search (AVS) task can be improved only when the video processing that analyzes video content and the linguistic processing that interprets natural language queries are well combined. Among the several issues associated with this challenging task, this paper focuses on word sense disambiguation/filtering (WSD/WSF) of the terms contained in a search query. We propose WSD/WSF methods that employ distributed sense representations, and discuss their efficacy in improving the performance of an AVS system that makes full use of a large bank of visual concept classifiers. Applying a WSD/WSF method is crucial because each visual concept classifier is linked with the lexical concept denoted by a word sense. The results are generally promising: the proposed methods outperform not only a baseline query processing method that only considers the polysemy of a query term but also a strong WSD baseline method.
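To make the distinction between disambiguation and filtering concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes that each candidate sense of a query term and each context word has an embedding in a shared vector space, and that a sense is kept when its cosine similarity to the averaged context vector is within a margin of the best-scoring sense. The function names, the `margin` parameter, the sense labels, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def score_senses(sense_vectors, context_vectors):
    """Score every candidate sense against the averaged context vector."""
    context = mean_vector(context_vectors)
    return {sense: cosine(vec, context) for sense, vec in sense_vectors.items()}


def disambiguate(sense_vectors, context_vectors):
    """WSD in this sketch: keep only the single best-scoring sense."""
    scores = score_senses(sense_vectors, context_vectors)
    return max(scores, key=scores.get)


def filter_senses(sense_vectors, context_vectors, margin=0.2):
    """WSF in this sketch: drop senses scoring more than `margin` below the best."""
    scores = score_senses(sense_vectors, context_vectors)
    best = max(scores.values())
    return {sense: s for sense, s in scores.items() if s >= best - margin}


# Toy data (hypothetical labels and 3-dimensional vectors, for illustration only).
senses = {
    "bank#river": [0.9, 0.1, 0.0],
    "bank#finance": [0.0, 0.2, 0.9],
}
context = [[0.8, 0.3, 0.1], [1.0, 0.0, 0.0]]  # stand-ins for words such as "river", "boat"

best_sense = disambiguate(senses, context)
kept = filter_senses(senses, context)
print(best_sense, sorted(kept))
```

With a small margin, filtering behaves like disambiguation; a larger margin keeps several plausible senses, so more of the concept classifiers linked to those senses remain available to the search.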

Notes

1. https://trecvid.nist.gov/.
2. A WordNet synset denotes a lexical concept. It is defined by a set of synonymous word senses. A word, more precisely a word form, generally has multiple senses, and each sense denotes a unique synset.
3. Refer to [9] for the list of employed classifiers.
4. We employed the MS COCO dataset available at http://cocodataset.org.
5. https://code.google.com/archive/p/word2vec/.
6. The official evaluation metric adopted by TRECVID AVS is a variant of the usual mAP.
7. Because the TRECVID AVS task has only 30 queries, the WSD accuracies of the presented methods are quite unstable, and we could not observe any statistically significant difference. However, the DistSim method slightly outperformed the two other methods in precision: 0.892 (DistSim) versus 0.890 (MFS) and 0.888 (SimSum).

References

1. Awad, G., et al.: TRECVID 2017: evaluating ad-hoc and instance video search, events detection, video captioning and hyperlinking. In: Proceedings of TRECVID 2017. NIST, USA (2017)
2. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009)
3. Inoue, N., Shinoda, K.: Semantic indexing for large-scale video retrieval. ITE Trans. Media Technol. Appl. 4(3), 209–217 (2016). https://doi.org/10.3169/mta.4.209
4. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS 2013, pp. 3111–3119. Curran Associates Inc., USA (2013)
5. Miller, G.A., Fellbaum, C.: WordNet then and now. Lang. Resour. Eval. 41(2), 209–214 (2007). https://doi.org/10.1007/s10579-007-9044-6
6. Navigli, R.: Word sense disambiguation: a survey. ACM Comput. Surv. 41(2), 1–69 (2009)
7. Rothe, S., Schütze, H.: AutoExtend: combining word embeddings with semantic resources. Comput. Linguist. 43(3), 593–617 (2017)
8. Snow, R., Prakash, S., Jurafsky, D., Ng, A.Y.: Learning to merge word senses. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (2007)
9. Ueki, K., Hirakawa, K., Kikuchi, K., Ogawa, T., Kobayashi, T.: Waseda_Meisei at TRECVID 2017: ad-hoc video search. In: 2017 TREC Video Retrieval Evaluation Notebook Papers (2017)

Acknowledgment

The present work was partially supported by JSPS KAKENHI Grant Numbers 15K00249, 17H01831, and 18K11362, and by the Kayamori Foundation of Informational Science Advancement.

Author information

Authors and Affiliations

Waseda University, Tokyo, Japan
Koji Hirakawa, Kotaro Kikuchi, Kazuya Ueki, Tetsunori Kobayashi & Yoshihiko Hayashi
Meisei University, Hino, Japan
Kazuya Ueki

Corresponding author

Correspondence to Yoshihiko Hayashi.

Editor information

Editors and Affiliations

National Taiwan Normal University, Taipei, Taiwan
Yuen-Hsien Tseng
Waseda University, Tokyo, Japan
Tetsuya Sakai
School of Information Systems, Singapore Management University, Singapore, Singapore
Jing Jiang
Academia Sinica, Taipei, Taiwan
Lun-Wei Ku
Huawei Research America, Champaign, IL, USA
Dae Hoon Park
National Chiayi University, Chiayi City, Taiwan
Jui-Feng Yeh
Yuan Ze University, Taoyuan City, Taiwan
Liang-Chih Yu
National Central University, Taoyuan City, Taiwan
Lung-Hao Lee
National Taiwan Normal University, Taipei City, Taiwan
Zhi-Hong Chen

About this paper

Cite this paper

Hirakawa, K., Kikuchi, K., Ueki, K., Kobayashi, T., Hayashi, Y. (2018). Ad-hoc Video Search Improved by the Word Sense Filtering of Query Terms. In: Tseng, YH., et al. Information Retrieval Technology. AIRS 2018. Lecture Notes in Computer Science, vol 11292. Springer, Cham. https://doi.org/10.1007/978-3-030-03520-4_15

DOI: https://doi.org/10.1007/978-3-030-03520-4_15
Published: 17 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03519-8
Online ISBN: 978-3-030-03520-4
eBook Packages: Computer Science, Computer Science (R0)
