Abstract
At present, the significance of humanoid robots has increased dramatically, yet such robots rarely enter daily human life because their development remains immature. The lip shape of a humanoid robot is crucial during speech, since it makes the robot appear more like a real human. Many studies show that vowels are the essential elements of pronunciation in all of the world's languages. Building on traditional viseme research, we raise the priority of smooth lip transitions between vowels and propose a lip-matching scheme based on vowel priority. In addition, we design a similarity evaluation model based on the Manhattan distance over computer-vision lip features, which quantifies lip-shape similarity on a 0–1 scale and provides an effective evaluation standard. Notably, this model compensates for the lack of lip-shape similarity evaluation criteria in this field. We applied the lip-matching scheme to the Ren-Xin humanoid robot and performed robot teaching experiments, as well as a similarity comparison experiment on 20 sentences spoken by two males, two females, and the robot. All experiments achieved excellent results.
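The core of the evaluation model described above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes lip shapes are already represented as fixed-length landmark feature vectors (e.g. mouth width, opening height), and maps the Manhattan (L1) distance between two such vectors into the 0–1 range, with 1 meaning identical lip shapes:

```python
def lip_similarity(features_a, features_b):
    """Illustrative Manhattan-distance similarity for lip features.

    `features_a` and `features_b` are equal-length sequences of
    numeric lip-shape features. The L1 distance is mapped into
    (0, 1]: identical shapes score 1.0, and the score decays
    monotonically as the distance grows.
    """
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have equal length")
    # Manhattan (L1) distance between the two feature vectors
    d = sum(abs(a - b) for a, b in zip(features_a, features_b))
    # Simple monotone mapping of distance into (0, 1]
    return 1.0 / (1.0 + d)
```

The `1 / (1 + d)` mapping is only one plausible normalization; the paper's own model may scale the distance differently, but any monotone map from L1 distance into [0, 1] yields a comparable similarity score.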
Acknowledgements
This research has been partially supported by JSPS KAKENHI Grant no. 19K20345.
Cite this article
Liu, Z., Kang, X., Nishide, S. et al. Vowel priority lip matching scheme and similarity evaluation model based on humanoid robot Ren-Xin. J Ambient Intell Human Comput 13, 5055–5066 (2022). https://doi.org/10.1007/s12652-020-02175-9