On-the-Fly Deformations for Keyword Spotting

  • Conference paper
Document Analysis Systems (DAS 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13237))

Abstract

Modern Keyword Spotting systems rely on deep learning approaches to build effective neural networks which provide state-of-the-art results. Despite their evident success, these deep models have proven to be sensitive to small changes in the input images; a small deformation, almost indistinguishable to the human eye, may considerably alter the resulting retrieval list. To address this issue, we propose a novel “on-the-fly” approach that deforms an input image to better match the query image, aiming to reduce this sensitivity. Results on the IAM dataset verify the effectiveness of the proposed method, which outperforms existing Query-by-Example approaches. Code is available at https://github.com/georgeretsi/defKWS/.

Acknowledgment

This research has been partially co-financed by the EU and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the calls: “RESEARCH - CREATE - INNOVATE”, project Culdile (code T1EΔK-03785) and “OPEN INNOVATION IN CULTURE”, project Bessarion (T6YBΠ-00214).

Author information

Corresponding author

Correspondence to Giorgos Sfikas.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Retsinas, G., Sfikas, G., Gatos, B., Nikou, C. (2022). On-the-Fly Deformations for Keyword Spotting. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_23

  • DOI: https://doi.org/10.1007/978-3-031-06555-2_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06554-5

  • Online ISBN: 978-3-031-06555-2

  • eBook Packages: Computer Science, Computer Science (R0)
