Application of Granular Computing-Based Pre-processing in the Labelling of Phonemes

Ashrafi, Negin; Ramanna, Sheela

doi:10.1007/978-981-16-2765-1_11

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 238)

Abstract

Machine learning algorithms are increasingly effective in algorithmic viseme recognition, which is a main component of audio-visual speech recognition (AVSR). A viseme is the smallest recognizable unit correlated with a particular realization of a given phoneme. Labelling phonemes and assigning them to viseme classes is a challenging problem in AVSR. In this paper, we present preliminary results of applying rough sets to pre-process video frames (with lip markers) of a spoken corpus in an effort to label the phonemes spoken by the speakers. The problem addressed here is to detect and remove frames in which the shape of the lips does not represent a phoneme completely. Our results demonstrate that the silhouette score improves with rough set-based pre-processing using the unsupervised K-means clustering method. In addition, the output of an unsupervised CNN feature-extraction model was used as input to the K-means clustering method. The results show promise in the application of a granular computing method for pre-processing large audio-video datasets.
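The pipeline outlined in the abstract (rough set-based frame removal, K-means clustering, silhouette evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation: the frame ids, the two lip-shape features, the rounding-based discretization, and the `complete` annotation set are all hypothetical, chosen only to show how a Pawlak lower approximation could discard ambiguous frames and how that affects the silhouette score on toy data.

```python
import math
from collections import defaultdict

# Hypothetical lip-shape features per frame: (mouth width, mouth opening).
frames = {
    "f0": (1.0, 0.2), "f1": (1.1, 0.3), "f2": (0.9, 0.1),  # closed-lip shape
    "f3": (3.0, 2.0), "f4": (3.1, 2.1), "f5": (2.9, 1.9),  # open-lip shape
    "f6": (2.0, 1.0), "f7": (2.1, 1.2),                    # transitional frames
}
# Hypothetical annotation: frames judged to show a complete phoneme.
complete = {"f0", "f1", "f2", "f3", "f4", "f5", "f6"}


def lower_approximation(objects, target):
    """Ids whose indiscernibility class (same rounded features) lies inside target."""
    classes = defaultdict(set)
    for obj_id, feats in objects.items():
        classes[tuple(round(v) for v in feats)].add(obj_id)
    lower = set()
    for members in classes.values():
        if members <= target:
            lower |= members
    return lower


def kmeans(points, init_centroids, iters=20):
    """Plain Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = list(init_centroids)
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points; assumes at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        dists = defaultdict(list)
        for j, q in enumerate(points):
            if j != i:
                dists[labels[j]].append(math.dist(p, q))
        own = dists.pop(labels[i], [])
        if not own:  # singleton cluster: score defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def cluster_and_score(ids):
    points = [frames[i] for i in sorted(ids)]
    labels = kmeans(points, [frames["f0"], frames["f3"]])
    return silhouette_score(points, labels)


kept = lower_approximation(frames, complete)
score_all = cluster_and_score(frames.keys())
score_filtered = cluster_and_score(kept)
print(sorted(kept), score_all, score_filtered)
```

In this toy setting, frames `f6` and `f7` are indiscernible after discretization but only one of them is annotated as complete, so both fall in the boundary region and are dropped. Clustering only the lower approximation then yields a higher silhouette score than clustering all frames, which mirrors the direction of the effect reported in the abstract without reproducing its data or procedure.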

Notes

1. http://www.modality-corpus.org/

Acknowledgements

Negin Ashrafi’s research was supported by MITACS RTA grant #IT20946, and Sheela Ramanna’s research was supported by NSERC Discovery grant #194376.

Author information

Authors and Affiliations

Department of Applied Computer Science, University of Winnipeg, Winnipeg, MB R3B 2E9, Canada
Negin Ashrafi & Sheela Ramanna

Corresponding author

Correspondence to Sheela Ramanna.

Editor information

Editors and Affiliations

Gdynia Maritime University, Gdynia, Poland
Ireneusz Czarnowski
Bournemouth University, Poole, UK
Robert J. Howlett
KES International, Shoreham-by-Sea, UK
Lakhmi C. Jain

About this paper

Cite this paper

Ashrafi, N., Ramanna, S. (2021). Application of Granular Computing-Based Pre-processing in the Labelling of Phonemes. In: Czarnowski, I., Howlett, R.J., Jain, L.C. (eds) Intelligent Decision Technologies. Smart Innovation, Systems and Technologies, vol 238. Springer, Singapore. https://doi.org/10.1007/978-981-16-2765-1_11

DOI: https://doi.org/10.1007/978-981-16-2765-1_11
Published: 08 July 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2764-4
Online ISBN: 978-981-16-2765-1
eBook Packages: Intelligent Technologies and Robotics (R0)
