Integrating Visual Word Embeddings into Translation Language Model for Keyword Spotting on Historical Mongolian Document Images

Wei, Hongxi; Zhang, Hui; Gao, Guanglai

doi:10.1007/978-3-319-77383-4_60

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10736)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

The Bag-of-Visual-Words (BoVW) framework does not capture semantic relatedness between visual words. This paper therefore proposes a visual word embeddings approach, analogous to word embedding techniques in natural language processing (NLP). First, a large number of visual words are extracted from a collection of word images under the BoVW framework. Next, a deep learning procedure maps these visual words to embedding vectors in a semantic space. Finally, the visual word embeddings are integrated into a translation language model to perform keyword spotting in the query-by-example scenario. Experimental results show that the proposed translation language model based on visual word embeddings outperforms several state-of-the-art methods, including BoVW, the language model (LM), the translation language model with mutual information (TLM-MI), and latent Dirichlet allocation (LDA).
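The abstract describes placing visual word embeddings inside a translation language model. The following is a minimal sketch of how such a model could score a query word image against candidate word images. It makes several assumptions:

* Every image is already quantized into visual word IDs from a shared BoVW codebook.
* Every visual word has a learned embedding vector.
* The query likelihood is p(q|d) = ∏_{w∈q} [(1−λ) Σ_u p_t(w|u) p(u|d) + λ p(w|C)].
* The translation probability p_t(w|u) is the cosine similarity, clipped at zero and normalized over the vocabulary.

This formulation follows the spirit of embedding-based translation language models from information retrieval. It is an illustrative choice and not necessarily the paper's formulation. All function names, the weight `lam`, and the toy data are hypothetical.

```python
import math
from collections import Counter


# Hypothetical helper: cosine similarity between two embedding vectors.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


# Hypothetical: build p_t(w | u) for every visual word u by clipping cosine
# similarities at zero and normalizing them over the visual vocabulary.
# This is an assumed choice, not necessarily the paper's formulation.
def translation_table(embeddings):
    table = {}
    for u, e_u in embeddings.items():
        sims = {w: max(cosine(e_w, e_u), 0.0) for w, e_w in embeddings.items()}
        total = sum(sims.values())
        if total == 0.0:
            # Zero embedding: fall back to translating u only to itself.
            table[u] = {w: 1.0 if w == u else 0.0 for w in embeddings}
        else:
            table[u] = {w: s / total for w, s in sims.items()}
    return table


# Hypothetical: log p(query | doc) under a translation language model,
# smoothed with the collection model via Jelinek-Mercer weight lam.
def tlm_log_score(query, doc, table, coll_counts, coll_len, lam=0.2, floor=1e-12):
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for w in query:
        # Sum over visual words u in the candidate image: p_t(w|u) * p_ml(u|doc).
        p_trans = sum(table[u].get(w, 0.0) * c / doc_len for u, c in doc_counts.items())
        p_coll = coll_counts[w] / coll_len
        p = (1.0 - lam) * p_trans + lam * p_coll
        score += math.log(max(p, floor))  # floor avoids log(0) for unseen words
    return score


# Hypothetical: rank candidate word images (name -> list of visual word IDs)
# by descending log-likelihood of the query image's visual words.
def rank(query, docs, table, lam=0.2):
    coll_counts = Counter(v for d in docs.values() for v in d)
    coll_len = sum(coll_counts.values())
    scores = {name: tlm_log_score(query, d, table, coll_counts, coll_len, lam)
              for name, d in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy 2-D embeddings for three visual words (made-up values).
    emb = {"v0": (1.0, 0.0), "v1": (0.9, 0.1), "v2": (0.0, 1.0)}
    table = translation_table(emb)
    docs = {"imgA": ["v0", "v0", "v1"], "imgB": ["v2", "v2"]}
    print(rank(["v1", "v0"], docs, table))  # prints ['imgA', 'imgB']
```

In the toy run, `imgA` ranks first because its visual words are close to the query's visual words in the embedding space. `imgB` receives probability mass only through collection smoothing and the small similarity between `v1` and `v2`. With a realistic codebook, the full translation table grows quadratically with vocabulary size. One practical option is to keep only the most similar visual words for each row, but that is a design assumption here, not something the abstract states.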

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 61463038.

Author information

Authors and Affiliations

School of Computer Science, Inner Mongolia University, Hohhot, China
Hongxi Wei, Hui Zhang & Guanglai Gao

Corresponding author

Correspondence to Hongxi Wei.

Editor information

Editors and Affiliations

Bing Zeng, University of Electronic Science and Technology of China, Chengdu, China
Qingming Huang, University of Chinese Academy of Sciences, Beijing, China
Abdulmotaleb El Saddik, University of Ottawa, Ottawa, Ontario, Canada
Hongliang Li, University of Electronic Science and Technology of China, Chengdu, China
Shuqiang Jiang, Chinese Academy of Sciences, Beijing, China
Xiaopeng Fan, Harbin Institute of Technology, Harbin, China

About this paper

Cite this paper

Wei, H., Zhang, H., Gao, G. (2018). Integrating Visual Word Embeddings into Translation Language Model for Keyword Spotting on Historical Mongolian Document Images. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds.) Advances in Multimedia Information Processing – PCM 2017. PCM 2017. Lecture Notes in Computer Science, vol. 10736. Springer, Cham. https://doi.org/10.1007/978-3-319-77383-4_60

DOI: https://doi.org/10.1007/978-3-319-77383-4_60
Published: 10 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77382-7
Online ISBN: 978-3-319-77383-4
eBook Packages: Computer Science (R0)