Abstract
The Bag-of-Visual-Words (BoVW) framework does not capture the semantic relatedness between visual words. This paper therefore proposes a visual word embedding approach, analogous to the word embedding technique in natural language processing (NLP). First, a large number of visual words are extracted from a collection of word images under the BoVW framework. Next, a deep learning procedure maps the visual words to embedding vectors in a semantic space. The visual word embeddings are then integrated into a translation language model to perform keyword spotting in the query-by-example scenario. Experimental results show that the proposed approach, a translation language model based on visual word embeddings, outperforms several state-of-the-art methods, including BoVW, the language model (LM), the translation language model with mutual information (TLM-MI), and latent Dirichlet allocation (LDA).
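The abstract does not give the paper's formulas, so the following is only a minimal sketch of how embeddings could enter a translation language model for query-by-example retrieval. It assumes translation probabilities obtained by normalizing non-negative cosine similarities between embedding vectors, and Jelinek-Mercer smoothing with a collection model. The function names, the smoothing weight `lam`, and the toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def translation_matrix(emb):
    """Row u holds p_t(w | u), built from cosine similarity (an assumed form)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = np.clip(unit @ unit.T, 0.0, None)  # discard negative similarities
    return sim / sim.sum(axis=1, keepdims=True)

def doc_model(counts, trans, coll, lam=0.5):
    """p(w | D) = (1 - lam) * sum_u p_t(w | u) * p_ml(u | D) + lam * p(w | C)."""
    p_ml = counts / counts.sum()  # maximum-likelihood visual word distribution
    return (1.0 - lam) * (p_ml @ trans) + lam * coll

def score(query_counts, doc_counts, trans, coll, lam=0.5):
    """Log-likelihood of the query word image under the document model."""
    p = doc_model(doc_counts, trans, coll, lam)
    return float(query_counts @ np.log(p))

# Toy vocabulary of four visual words; words 0 and 1 have similar embeddings.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
trans = translation_matrix(emb)
coll = np.full(4, 0.25)                   # uniform collection model (assumption)

query = np.array([2.0, 0.0, 0.0, 0.0])    # query image contains only word 0
doc_a = np.array([0.0, 3.0, 0.0, 0.0])    # only word 1, close to word 0
doc_b = np.array([0.0, 0.0, 3.0, 0.0])    # only word 2, far from word 0

score_a = score(query, doc_a, trans, coll)
score_b = score(query, doc_b, trans, coll)
```

Neither toy document shares a visual word with the query, so a plain histogram match could not tell them apart. In this sketch `doc_a` scores higher than `doc_b` only because its visual word lies close to the query's in the embedding space, which is the kind of relatedness the abstract says BoVW lacks.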
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant 61463038.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Wei, H., Zhang, H., Gao, G. (2018). Integrating Visual Word Embeddings into Translation Language Model for Keyword Spotting on Historical Mongolian Document Images. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds) Advances in Multimedia Information Processing – PCM 2017. PCM 2017. Lecture Notes in Computer Science, vol 10736. Springer, Cham. https://doi.org/10.1007/978-3-319-77383-4_60
DOI: https://doi.org/10.1007/978-3-319-77383-4_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77382-7
Online ISBN: 978-3-319-77383-4
eBook Packages: Computer Science (R0)