Keyword Spotting with Quaternionic ResNet: Application to Spotting in Greek Manuscripts

  • Conference paper
Document Analysis Systems (DAS 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13237))

Abstract

Quaternionized versions of standard (real-valued) neural network layers have been shown to yield networks that are sparse and as effective as their real-valued counterparts. In this work, we explore their usefulness in the context of the Keyword Spotting task. Tests on a collection of manuscripts written in modern Greek show that the proposed quaternionic ResNet achieves excellent performance while using only a small fraction of the memory footprint of its real-valued counterpart. Code is available at https://github.com/sfikas/quaternion-resnet-kws.
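
To make the memory argument in the abstract concrete, the following is a minimal sketch of how a quaternionized fully-connected layer can be built on the Hamilton product, and why its weight count is smaller. It is an illustration under stated assumptions, not the implementation from the linked repository: the function names (`hamilton_product`, `quaternion_linear`, `quaternion_linear_params`) are hypothetical, biases are ignored, and the factor-of-four saving applies to a single layer's weights rather than to the figures reported in the paper.

```python
# Illustrative sketch only; names are hypothetical and not taken from the paper's code.

def hamilton_product(p, q):
    """Multiply two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    )


def quaternion_linear(weights, inputs):
    """Apply a quaternion weight matrix to a list of quaternion inputs.

    weights[o][i] is the quaternion linking input i to output o.
    Each output is the sum of Hamilton products along its row.
    """
    outputs = []
    for row in weights:
        acc = (0.0, 0.0, 0.0, 0.0)
        for w, x in zip(row, inputs):
            prod = hamilton_product(w, x)
            acc = tuple(s + t for s, t in zip(acc, prod))
        outputs.append(acc)
    return outputs


def real_linear_params(in_features, out_features):
    """Weight count of a real-valued dense layer (bias ignored)."""
    return in_features * out_features


def quaternion_linear_params(in_features, out_features):
    """Weight count of a quaternion dense layer over the same real channels.

    Channels are grouped in fours, and each input/output pair of groups
    shares one quaternion weight, i.e. four real numbers.
    """
    assert in_features % 4 == 0 and out_features % 4 == 0
    return 4 * (in_features // 4) * (out_features // 4)


if __name__ == "__main__":
    i, j = (0, 1, 0, 0), (0, 0, 1, 0)
    print(hamilton_product(i, j))  # i * j
    print(hamilton_product(j, i))  # j * i (the product is not commutative)
    print(real_linear_params(64, 128), quaternion_linear_params(64, 128))
```

In this sketch, each group of four real channels is treated as one quaternion, so a layer over the same number of real channels stores one quarter of the weights of its real-valued counterpart. The same weight sharing carries over to convolutional layers, which is the kind of layer a ResNet is built from.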

Acknowledgments

This research has been partially co-financed by the EU and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the calls “RESEARCH - CREATE - INNOVATE” (project Culdile - code T1EΔK-03785, project Impala - code T1EΔK-04517) and “OPEN INNOVATION IN CULTURE” (project Bessarion - T6YBΠ-00214).

Author information

Corresponding author

Correspondence to Giorgos Sfikas.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Sfikas, G., Retsinas, G., Giotis, A.P., Gatos, B., Nikou, C. (2022). Keyword Spotting with Quaternionic ResNet: Application to Spotting in Greek Manuscripts. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_26

  • DOI: https://doi.org/10.1007/978-3-031-06555-2_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06554-5

  • Online ISBN: 978-3-031-06555-2

  • eBook Packages: Computer Science, Computer Science (R0)
