
A Hybrid Representation of Word Images for Keyword Spotting

  • Conference paper
Neural Information Processing (ICONIP 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1332)


Abstract

In query-by-example keyword spotting, how word images are represented is a central issue. The out-of-vocabulary (OOV) problem also occurs frequently in keyword spotting, which makes OOV keyword spotting a challenging task. This paper presents a hybrid representation of word images aimed at OOV keyword spotting. Specifically, a sequence-to-sequence model is used to generate one type of representation vector for word images, and a CNN model with the VGG16 architecture is used to obtain a second type. A score fusion scheme then combines the two kinds of representation vectors. Experimental results demonstrate that the proposed hybrid representation is especially well suited to OOV keyword spotting.
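The abstract does not state the fusion rule, so the following is only a minimal sketch of what score-level fusion of two representations could look like. It assumes that each word image has two vectors (one from the sequence-to-sequence model, one from the CNN), that similarity is measured by cosine similarity, and that the two scores are combined by a weighted sum. The function names, the weight `alpha`, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fused_ranking(
    query_seq: Sequence[float],
    query_cnn: Sequence[float],
    candidates: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank candidate word images by a weighted sum of two similarity scores.

    Assumption (not from the paper): fused = alpha * s_seq + (1 - alpha) * s_cnn,
    where s_seq and s_cnn are cosine similarities to the query in each space.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    scored = []
    for word_id, (seq_vec, cnn_vec) in candidates.items():
        s_seq = cosine_similarity(query_seq, seq_vec)
        s_cnn = cosine_similarity(query_cnn, cnn_vec)
        scored.append((word_id, alpha * s_seq + (1.0 - alpha) * s_cnn))
    # Highest fused score first: the best match for the query image.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy two-dimensional vectors, chosen only to make the arithmetic easy to follow.
    toy_candidates = {
        "w1": ([1.0, 0.0], [0.0, 1.0]),  # matches the query in both spaces
        "w2": ([0.0, 1.0], [0.0, 1.0]),  # matches only in the CNN space
        "w3": ([0.0, 1.0], [1.0, 0.0]),  # matches in neither space
    }
    for word_id, score in fused_ranking([1.0, 0.0], [0.0, 1.0], toy_candidates):
        print(word_id, round(score, 3))
```

With `alpha = 0.5`, a candidate that agrees with the query in only one representation space still receives half of the maximum score. This illustrates why combining two complementary representations can help when one of them fails on an unseen word.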



Acknowledgments

This study is supported by the Natural Science Foundation of Inner Mongolia Autonomous Region under Grant 2019ZD14, the Project for Science and Technology of Inner Mongolia Autonomous Region under Grant 2019GG281, the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region under Grant NJYT-20-A05, and the Natural Science Foundation of China under Grants 61463038 and 61763034.

Author information


Corresponding author

Correspondence to Hongxi Wei.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Wei, H., Zhang, J., Liu, K. (2020). A Hybrid Representation of Word Images for Keyword Spotting. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_1


  • DOI: https://doi.org/10.1007/978-3-030-63820-7_1


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63819-1

  • Online ISBN: 978-3-030-63820-7

  • eBook Packages: Computer Science, Computer Science (R0)
