Fingerspelling Alphabet Recognition Using CNNs with 3D Convolutions for Cross Platform Applications

  • Conference paper
  • First Online:
Lecture Notes in Computational Intelligence and Decision Making (ISDMCI 2020)

Abstract

The proposed communication technology is built with cross-platform frameworks and consists of two parts: modeling and recognition of Ukrainian dactyl (fingerspelling) signs. Modeling uses a realistic 3D hand model with animations of dynamic gestures and of the transitions between gestures, developed in the Unity3D framework. The user can specify different words in the user interface and adjust the polygon count and animation step to obtain satisfactory performance. The computations can be performed either on the device or in the web. Recognition-model training and serving are done with TensorFlow, which allows the model to be deployed on different devices, including mobile ones, or to run predictions on a cloud server. A dataset of Ukrainian dactyl signs was collected from 50 persons, with 1500 images per gesture, which made it possible to train a model with high accuracy that is robust to varying environmental conditions. The model is based on the MobileNetV3 convolutional neural network architecture with an optimized configuration of layers and network parameters; to take temporal information into account, 3D convolutions were used. On the collected test dataset, which comprises 10% of the overall augmented dataset (15,000 images) with varying lighting, noise, and blurring conditions and different persons' hands, an accuracy of over 98% is achieved.
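The abstract notes that 3D convolutions were added so the network can use temporal information across video frames, not just spatial features in a single image. The paper's exact layer configuration is not given here, but the core operation can be sketched with a naive NumPy implementation (illustrative only; the actual model is a MobileNetV3 variant trained in TensorFlow):

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D convolution (cross-correlation) of a
    (frames, height, width) volume with a (t, h, w) kernel.
    Each output value mixes pixels from several consecutive
    frames, which is how 3D convolutions capture motion."""
    T, H, W = clip.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(clip[i:i+t, j:j+h, k:k+w] * kernel)
    return out

# A toy "clip": 8 grayscale frames of 16x16 pixels
# (the paper's real input resolution is not stated here).
clip = np.random.rand(8, 16, 16)

# A 3x3x3 averaging kernel spans 3 consecutive frames, so the
# output encodes short-term temporal context as well as space.
kernel = np.ones((3, 3, 3)) / 27.0

features = conv3d_valid(clip, kernel)
print(features.shape)  # (6, 14, 14): temporal axis shrinks too
```

In a real deployment this loop would be replaced by an optimized framework operation (e.g. a 3D convolution layer in TensorFlow), but the output-shape arithmetic and the frame-mixing behavior are the same.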



Author information

Correspondence to Iurii Krak.


Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kondratiuk, S., Krak, I., Kylias, A., Kasianiuk, V. (2021). Fingerspelling Alphabet Recognition Using CNNs with 3D Convolutions for Cross Platform Applications. In: Babichev, S., Lytvynenko, V., Wójcik, W., Vyshemyrskaya, S. (eds) Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2020. Advances in Intelligent Systems and Computing, vol 1246. Springer, Cham. https://doi.org/10.1007/978-3-030-54215-3_37
