Fingerspelling Alphabet Recognition Using CNNs with 3D Convolutions for Cross Platform Applications

  • Conference paper
  • First Online:
Lecture Notes in Computational Intelligence and Decision Making (ISDMCI 2020)

Abstract

The proposed communication technology is built with cross-platform frameworks and consists of two parts: modeling and recognition of Ukrainian dactyl (fingerspelling) signs. Modeling uses a realistic 3D hand model with animations of dynamic gestures and of the transitions between gestures, developed in the Unity3D framework. The user can specify different words in the user interface and adjust the polygon count and animation step to obtain satisfactory performance. The computations can be performed either on the device or in the web. Recognition-model training and serving are done with TensorFlow, which allows the model to be deployed on different devices, including mobile ones, or to run predictions on a cloud server. A dataset of Ukrainian dactyl signs was collected from 50 persons, with 1500 images per gesture, which made it possible to train a model with high accuracy that is robust to varying environmental conditions. The model is based on the MobileNetV3 convolutional neural network architecture with an optimized configuration of layers and network parameters; to take temporal information into account, 3D convolutions were used. On the collected test dataset, which comprises 10% of the overall augmented dataset (15,000 images) with varying lighting, noise, and blurring conditions and different persons' hands, an accuracy of over 98% is achieved.
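The abstract notes that 3D convolutions were added so the network can use temporal information across video frames, not just spatial features in a single image. The paper's exact layer configuration is not given here, but the core operation can be sketched with a naive NumPy implementation (illustrative only; the actual model is a MobileNetV3 variant trained in TensorFlow):

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D convolution (cross-correlation) of a
    (frames, height, width) volume with a (t, h, w) kernel.
    Each output value mixes pixels from several consecutive
    frames, which is how 3D convolutions capture motion."""
    T, H, W = clip.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(clip[i:i+t, j:j+h, k:k+w] * kernel)
    return out

# A toy "clip": 8 grayscale frames of 16x16 pixels
# (the paper's real input resolution is not stated here).
clip = np.random.rand(8, 16, 16)

# A 3x3x3 averaging kernel spans 3 consecutive frames, so the
# output encodes short-term temporal context as well as space.
kernel = np.ones((3, 3, 3)) / 27.0

features = conv3d_valid(clip, kernel)
print(features.shape)  # (6, 14, 14): temporal axis shrinks too
```

In a real deployment this loop would be replaced by an optimized framework operation (e.g. a 3D convolution layer in TensorFlow), but the output-shape arithmetic and the frame-mixing behavior are the same.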



Author information

Correspondence to Iurii Krak.


Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kondratiuk, S., Krak, I., Kylias, A., Kasianiuk, V. (2021). Fingerspelling Alphabet Recognition Using CNNs with 3D Convolutions for Cross Platform Applications. In: Babichev, S., Lytvynenko, V., Wójcik, W., Vyshemyrskaya, S. (eds) Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2020. Advances in Intelligent Systems and Computing, vol 1246. Springer, Cham. https://doi.org/10.1007/978-3-030-54215-3_37
