Fingerspelling Recognition with Two-Steps Cascade Process of Spotting and Classification

Muroi, Masanori; Sogi, Naoya; Kato, Nobuko; Fukui, Kazuhiro

doi:10.1007/978-3-030-68780-9_55

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12666)

Included in the following conference series:

International Conference on Pattern Recognition

Abstract

In this paper, we propose a framework for fingerspelling recognition based on a two-step cascade process of spotting and classification. This two-step process is motivated by the human cognitive function in fingerspelling recognition. In the spotting process, an image sequence corresponding to a certain fingerspelling is extracted from an input video by classifying each partial sequence into two categories, fingerspelling and others. At this stage, how to deal with temporal dynamic information is the key point. The extracted fingerspelling is then classified in the classification process. Here, temporal dynamic information is not necessarily required; rather, how to classify the static hand shape using multi-view images is more important. In our framework, we employ temporal regularized canonical correlation analysis (TRCCA) for spotting, since it can effectively handle the temporal information of an image sequence. For classification, we employ the orthogonal mutual subspace method (OMSM), since it can effectively use the information from multi-view images to classify the hand shape quickly and accurately. We demonstrate the effectiveness of our framework, based on a complementary combination of TRCCA and OMSM, compared to conventional methods on a private Japanese fingerspelling dataset.
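
The control flow of the two-step cascade described in the abstract can be illustrated with a minimal Python sketch. It shows only the cascade structure: a sliding window decides whether each partial sequence is fingerspelling (spotting), and accepted windows are then passed to a classifier. Everything concrete here is an assumption made for illustration: the names `spot_segments`, `cascade`, `toy_spotter` and `toy_classifier`, the fixed window length, the two-dimensional toy features, and the labels "a" and "b". In particular, the toy spotter and classifier are simple stand-ins and do not implement TRCCA or OMSM.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]      # toy per-frame feature vector (assumption)
Segment = Tuple[int, int]    # half-open frame range [start, end)


def spot_segments(frames: Sequence[Frame], window: int,
                  is_fingerspelling: Callable[[Sequence[Frame]], bool]) -> List[Segment]:
    """Step 1 (spotting): slide a fixed-length window over the video and
    keep the windows that the spotter judges to be fingerspelling."""
    segments: List[Segment] = []
    start = 0
    while start + window <= len(frames):
        if is_fingerspelling(frames[start:start + window]):
            segments.append((start, start + window))
            start += window  # jump past the accepted window to avoid overlapping hits
        else:
            start += 1
    return segments


def cascade(frames: Sequence[Frame], window: int,
            is_fingerspelling: Callable[[Sequence[Frame]], bool],
            classify: Callable[[Sequence[Frame]], str]) -> List[Tuple[int, int, str]]:
    """Step 2 (classification): label only the segments found by spotting."""
    return [(s, e, classify(frames[s:e]))
            for s, e in spot_segments(frames, window, is_fingerspelling)]


def toy_spotter(clip: Sequence[Frame]) -> bool:
    # Stand-in for the spotting model: feature 0 acts as a "hand is signing" cue.
    return all(f[0] > 0.5 for f in clip)


def toy_classifier(clip: Sequence[Frame]) -> str:
    # Stand-in for the classification model: nearest prototype on the mean of feature 1.
    prototypes = {"a": 0.2, "b": 0.9}  # hypothetical class prototypes
    mean = sum(f[1] for f in clip) / len(clip)
    return min(prototypes, key=lambda k: abs(prototypes[k] - mean))


# Toy video: 3 idle frames, 4 frames of "a", 2 idle, 4 frames of "b", 1 idle.
frames = ([[0.0, 0.0]] * 3 + [[1.0, 0.2]] * 4 + [[0.0, 0.0]] * 2
          + [[1.0, 0.9]] * 4 + [[0.0, 0.0]] * 1)
results = cascade(frames, 4, toy_spotter, toy_classifier)
print(results)
```

Keeping the two stages as separate callables mirrors the complementary design in the abstract: the spotter may rely on temporal information, while the classifier can treat the extracted frames as an unordered set of views of a static hand shape.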

Acknowledgement

This work was partly supported by JSPS KAKENHI Grant Number 19H04129.

Author information

Authors and Affiliations

Graduate School of Systems and Information Engineering, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
Masanori Muroi, Naoya Sogi & Kazuhiro Fukui
Faculty of Industrial Technology, Tsukuba University of Technology, 4-3-15 Amakubo, Tsukuba, Ibaraki, 305-8520, Japan
Nobuko Kato

Corresponding author

Correspondence to Masanori Muroi.

Editor information

Editors and Affiliations

Dipartimento di Ingegneria dell’Informazione, University of Firenze, Firenze, Italy
Alberto Del Bimbo
Dipartimento di Ingegneria “Enzo Ferrari”, Università di Modena e Reggio Emilia, Modena, Italy
Rita Cucchiara
Department of Computer Science, Boston University, Boston, MA, USA
Stan Sclaroff
Dipartimento di Matematica e Informatica, University of Catania, Catania, Italy
Giovanni Maria Farinella
Cloud & AI, JD.COM, Beijing, China
Tao Mei
Dipartimento di Ingegneria dell’Informazione, University of Firenze, Firenze, Italy
Marco Bertini
Computational Sciences Department, National Institute of Astrophysics, Optics and Electronics (INAOE), Tonantzintla, Puebla, Mexico
Hugo Jair Escalante
Dipartimento di Ingegneria “Enzo Ferrari”, Università di Modena e Reggio Emilia, Modena, Italy
Roberto Vezzani

Cite this paper

Muroi, M., Sogi, N., Kato, N., Fukui, K. (2021). Fingerspelling Recognition with Two-Steps Cascade Process of Spotting and Classification. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12666. Springer, Cham. https://doi.org/10.1007/978-3-030-68780-9_55

DOI: https://doi.org/10.1007/978-3-030-68780-9_55
Published: 25 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68779-3
Online ISBN: 978-3-030-68780-9
eBook Packages: Computer Science, Computer Science (R0)
