ABSTRACT
Sign language is an important means of communication for deaf-mute people. In recent years, hybrid models that combine a 3D convolutional network with a Bi-directional Long-Short Term Memory (BiLSTM) network have achieved more accurate recognition by pairing the feature extraction ability of convolutional neural networks with the time-series classification strengths of recurrent neural networks. However, high precision, scalability and robustness remain important challenges for sign language recognition research. As computer hardware and networks improve, the main research direction and its corresponding methods aim to increase the accuracy and speed of hybrid-model recognition of 3D poses and continuous sign language sentences. This paper proposes an improved residual neural network, uses it to extract features, and combines it with a BiLSTM to form the proposed hybrid model. To validate the proposed algorithm, we evaluate it on the ChaLearn and Sports-1M datasets, captured with depth, color and stereo-IR sensors. On these two challenging datasets, our multi-path hybrid residual neural network achieves accuracies of 78.9% and 82.7%, respectively, outperforming other state-of-the-art algorithms and approaching the human accuracy of 88.4%.
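The data flow of such a hybrid model (residual feature extraction per frame, followed by a bidirectional recurrent pass over the frame sequence) can be illustrated with a minimal sketch. This is not the paper's network: a plain tanh recurrence stands in for each LSTM direction, a single dense residual unit stands in for the 3D convolutional residual layers, and all names (`residual_block`, `recurrent_pass`, `encode_clip`), dimensions and random weights are hypothetical.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    """Small random weight matrix (illustrative and untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def residual_block(x, weight):
    """Residual unit y = x + relu(W x); W is square so the shapes match."""
    fx = [max(0.0, v) for v in matvec(weight, x)]
    return [a + b for a, b in zip(x, fx)]


def recurrent_pass(frames, w_in, w_rec):
    """Tanh recurrence over a sequence (assumed stand-in for one LSTM direction)."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in frames:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        states.append(hidden)
    return states


def bidirectional_encode(frames, fwd_params, bwd_params):
    """Run the sequence in both directions and concatenate the states per frame."""
    forward = recurrent_pass(frames, *fwd_params)
    backward = recurrent_pass(frames[::-1], *bwd_params)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def encode_clip(frames, res_weight, fwd_params, bwd_params):
    """Per-frame residual features, then a bidirectional pass over time."""
    features = [residual_block(x, res_weight) for x in frames]
    return bidirectional_encode(features, fwd_params, bwd_params)


rng = random.Random(0)
feat_dim, hidden_dim, num_frames = 4, 3, 5  # hypothetical sizes
clip = [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(num_frames)]
res_weight = rand_matrix(feat_dim, feat_dim, rng)
fwd_params = (rand_matrix(hidden_dim, feat_dim, rng),
              rand_matrix(hidden_dim, hidden_dim, rng))
bwd_params = (rand_matrix(hidden_dim, feat_dim, rng),
              rand_matrix(hidden_dim, hidden_dim, rng))

encoded = encode_clip(clip, res_weight, fwd_params, bwd_params)
print(len(encoded), len(encoded[0]))  # prints: 5 6
```

Each frame ends up with a state twice the hidden size because the forward and backward states are concatenated, so a classifier placed on top sees context from both earlier and later frames of the sign.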