Abstract
The proposed communication technology is developed with cross-platform frameworks and consists of two parts: modeling and recognition of Ukrainian dactyl signs. Modeling is performed with a realistic 3D hand model, developed in the Unity3D framework, with animations of dynamic gestures and of transitions between gestures. The user can specify different words in the user interface and adjust the number of polygons and the animation step to achieve satisfactory performance. The computations can be performed either on the device or in the web. Recognition model training and serving are done with TensorFlow, which allows the model to be deployed on different devices, including mobile ones, or prediction to be performed on a server in the cloud. A dataset of Ukrainian dactyl signs was collected from 50 persons, with 1500 images per gesture, which made it possible to train a model with sufficiently high accuracy that is robust to varying environmental conditions. The model is based on the MobileNetV3 convolutional neural network architecture with an optimal configuration of layers and network parameters; to take temporal data into account, 3D convolutions are used. On the collected test set, which comprises 10% of the overall augmented dataset (15000 images) with varying lighting, noise, and blurring conditions and different persons' hands, an accuracy of over 98% is achieved.
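To illustrate why the abstract's 3D convolutions capture temporal information where 2D convolutions cannot, the following is a minimal NumPy sketch of a "valid" 3D convolution sliding over a (time, height, width) stack of frames. It is only a didactic toy, not the paper's MobileNetV3-based network; all names and shapes here are illustrative assumptions.

```python
import numpy as np

def conv3d_valid(frames, kernel):
    """Naive 'valid' 3D convolution over a (time, height, width) volume.

    A 2D convolution sees one frame at a time; a 3D kernel additionally
    spans kt consecutive frames, so its responses depend on motion
    across frames, not only on the shape within a single frame.
    """
    t, h, w = frames.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # Element-wise product of the kernel with a kt x kh x kw patch
                out[i, j, k] = np.sum(frames[i:i+kt, j:j+kh, k:k+kw] * kernel)
    return out

# Eight 32x32 grayscale frames convolved with a 3x3x3 averaging kernel
video = np.random.rand(8, 32, 32)
feature_map = conv3d_valid(video, np.ones((3, 3, 3)) / 27.0)
print(feature_map.shape)  # (6, 30, 30): the time axis shrinks too
```

In a real network such as the one described, this operation would be an optimized `Conv3D` layer with many learned kernels, but the sliding-window arithmetic over the time axis is the same.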
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Kondratiuk, S., Krak, I., Kylias, A., Kasianiuk, V. (2021). Fingerspelling Alphabet Recognition Using CNNs with 3D Convolutions for Cross Platform Applications. In: Babichev, S., Lytvynenko, V., Wójcik, W., Vyshemyrskaya, S. (eds) Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2020. Advances in Intelligent Systems and Computing, vol 1246. Springer, Cham. https://doi.org/10.1007/978-3-030-54215-3_37
DOI: https://doi.org/10.1007/978-3-030-54215-3_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54214-6
Online ISBN: 978-3-030-54215-3
eBook Packages: Intelligent Technologies and Robotics (R0)