On-the-Fly Deformations for Keyword Spotting

Retsinas, George; Sfikas, Giorgos; Gatos, Basilis; Nikou, Christophoros

doi:10.1007/978-3-031-06555-2_23

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13237)

Included in the following conference series:

International Workshop on Document Analysis Systems

Abstract

Modern keyword spotting systems rely on deep neural networks to achieve state-of-the-art results. Despite their evident success, these deep models have proven sensitive to their input images: a small deformation, almost indistinguishable to the human eye, may considerably alter the resulting retrieval list. To address this issue, we propose a novel “on-the-fly” approach that deforms an input image to better match the query image, aiming to reduce this sensitivity. Results on the IAM dataset verify the effectiveness of the proposed method, which outperforms existing Query-by-Example approaches. Code is available at https://github.com/georgeretsi/defKWS/.

Acknowledgment

This research has been partially co-financed by the EU and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the calls “RESEARCH - CREATE - INNOVATE”, project Culdile (code T1EΔK-03785), and “OPEN INNOVATION IN CULTURE”, project Bessarion (T6YBΠ-00214).

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
George Retsinas
Computational Intelligence Laboratory, Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”, Athens, Greece
Basilis Gatos
Department of Computer Science and Engineering, University of Ioannina, Ioannina, Greece
Christophoros Nikou
Department of Surveying and Geoinformatics Engineering, University of West Attica, Athens, Greece
Giorgos Sfikas

Corresponding author

Correspondence to Giorgos Sfikas.

Editor information

Editors and Affiliations

Kyushu University, Fukuoka, Japan
Seiichi Uchida
Boise State University, Boise, ID, USA
Elisa Barney
LIRIS UMR CNRS, Villeurbanne, France
Véronique Eglin

Cite this paper

Retsinas, G., Sfikas, G., Gatos, B., Nikou, C. (2022). On-the-Fly Deformations for Keyword Spotting. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_23

DOI: https://doi.org/10.1007/978-3-031-06555-2_23
Published: 18 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06554-5
Online ISBN: 978-3-031-06555-2
eBook Packages: Computer Science; Computer Science (R0)
