Abstract
Quaternionized versions of standard (real-valued) neural network layers have been shown to yield networks that are sparse and as effective as their real-valued counterparts. In this work, we explore their usefulness for the Keyword Spotting task. Tests on a collection of manuscripts written in modern Greek show that the proposed quaternionic ResNet achieves excellent performance while using only a small fraction of the memory footprint of its real-valued counterpart. Code is available at https://github.com/sfikas/quaternion-resnet-kws.
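The abstract does not spell out where the memory saving comes from. One commonly cited source in the quaternion network literature is weight sharing through the Hamilton product: a quaternion linear layer stores four real component matrices and reuses them across a 4x4 block pattern, so it needs roughly a quarter of the parameters of a real layer of the same input and output width. The sketch below illustrates that arithmetic under this assumption. It is not taken from the authors' repository, and the names `hamilton_weight` and `param_counts` are hypothetical.

```python
import numpy as np

def hamilton_weight(r, i, j, k):
    """Build the real (4*n_out, 4*n_in) matrix that applies the Hamilton
    product W * x for W = r + i*i + j*j + k*k to an input whose components
    are stacked as [x_r, x_i, x_j, x_k]. Hypothetical helper for illustration."""
    return np.block([
        [r, -i, -j, -k],
        [i,  r, -k,  j],
        [j,  k,  r, -i],
        [k, -j,  i,  r],
    ])

def param_counts(n_in, n_out):
    """Stored weights of a quaternion layer vs. a real layer of equal width.
    n_in and n_out are counted in quaternions (4 real values each)."""
    quaternion = 4 * n_in * n_out        # four (n_out, n_in) component matrices
    real = (4 * n_in) * (4 * n_out)      # one dense (4*n_out, 4*n_in) matrix
    return quaternion, real

rng = np.random.default_rng(0)
n_in, n_out = 8, 16                      # assumed sizes, for illustration only
r, i, j, k = (rng.standard_normal((n_out, n_in)) for _ in range(4))

W = hamilton_weight(r, i, j, k)          # expanded matrix, built from shared blocks
x = rng.standard_normal(4 * n_in)        # stacked quaternion input
y = W @ x                                # stacked quaternion output

q_params, real_params = param_counts(n_in, n_out)
print(W.shape, q_params, real_params)
```

The expanded matrix `W` has the same shape as a real dense layer, but only the four component matrices need to be stored. Actual savings in a full network depend on which layers are quaternionized and on non-weight parameters such as biases and normalization.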
Acknowledgments
This research has been partially co-financed by the EU and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the calls “RESEARCH - CREATE - INNOVATE” (project Culdile - code T1EΔK-03785, project Impala - code T1EΔK-04517) and “OPEN INNOVATION IN CULTURE” (project Bessarion - code T6YBΠ-00214).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Sfikas, G., Retsinas, G., Giotis, A.P., Gatos, B., Nikou, C. (2022). Keyword Spotting with Quaternionic ResNet: Application to Spotting in Greek Manuscripts. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_26
DOI: https://doi.org/10.1007/978-3-031-06555-2_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06554-5
Online ISBN: 978-3-031-06555-2
eBook Packages: Computer Science (R0)