Abstract
In this paper, we propose a framework for fingerspelling recognition based on a two-step cascade process of spotting and classification. This two-step process is motivated by the human cognitive function in fingerspelling recognition. In the spotting process, an image sequence corresponding to a certain fingerspelling is extracted from an input video by classifying each partial sequence into two fingerspelling categories and others. At this stage, how to deal with temporal dynamic information is the key point. The extracted fingerspelling is then classified in the classification process. Here, temporal dynamic information is not necessarily required; rather, how to classify the static hand shape using multi-view images is more important. In our framework, we employ temporal regularized canonical correlation analysis (TRCCA) for spotting, since it can effectively handle the temporal information of an image sequence. For classification, we employ the orthogonal mutual subspace method (OMSM), since it can effectively use the information from multi-view images to classify the hand shape quickly and accurately. We demonstrate the effectiveness of our framework, which is based on a complementary combination of TRCCA and OMSM, in comparison with conventional methods on a private Japanese fingerspelling dataset.
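The two-step cascade described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each image set is already represented as a matrix of feature vectors, it uses the plain mutual subspace method (mean squared cosine of the canonical angles) in place of OMSM, so the orthogonalization step is omitted, and it replaces TRCCA with a hypothetical precomputed `spotting_score`. The names `subspace_basis`, `subspace_similarity`, `cascade`, `dim`, and `threshold` are all illustrative.

```python
import numpy as np


def subspace_basis(features, dim):
    """Return an orthonormal basis (d x dim) for an image set.

    features: array of shape (n_images, d), one feature vector per row.
    """
    # The leading left singular vectors of the d x n data matrix
    # span the principal subspace of the set.
    u, _, _ = np.linalg.svd(features.T, full_matrices=False)
    return u[:, :dim]


def subspace_similarity(basis_a, basis_b):
    """Mean squared cosine of the canonical angles between two subspaces."""
    # The singular values of A^T B are the cosines of the canonical angles.
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)
    return float(np.mean(cosines ** 2))


def cascade(segment_features, spotting_score, class_bases, dim=3, threshold=0.5):
    """Two-step cascade: reject non-fingerspelling segments, then classify.

    spotting_score is a hypothetical stand-in for the output of the
    spotting stage (TRCCA in the paper); it is not computed here.
    """
    # Step 1 (spotting): discard segments judged not to be fingerspelling.
    if spotting_score < threshold:
        return None
    # Step 2 (classification): compare the segment's subspace with each
    # class reference subspace and return the most similar class label.
    query = subspace_basis(segment_features, dim)
    scores = {
        label: subspace_similarity(query, basis)
        for label, basis in class_bases.items()
    }
    return max(scores, key=scores.get)
```

Separating the two stages in this way lets the spotting stage focus on temporal structure while the classification stage treats the extracted frames as an unordered multi-view image set.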
Acknowledgement
This work was partly supported by JSPS KAKENHI Grant Number 19H04129.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Muroi, M., Sogi, N., Kato, N., Fukui, K. (2021). Fingerspelling Recognition with Two-Steps Cascade Process of Spotting and Classification. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12666. Springer, Cham. https://doi.org/10.1007/978-3-030-68780-9_55
DOI: https://doi.org/10.1007/978-3-030-68780-9_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68779-3
Online ISBN: 978-3-030-68780-9
eBook Packages: Computer Science, Computer Science (R0)