Automatic boundary detection based on entropy measures for text-independent syllable segmentation

Laleye, Fréjus A. A.; Ezin, Eugène C.; Motamed, Cina

doi:10.1007/s11042-016-3911-3

Published: 10 September 2016

Volume 76, pages 16347–16368, (2017)

Multimedia Tools and Applications

Fréjus A. A. Laleye¹,
Eugène C. Ezin² &
Cina Motamed¹

Abstract

In this paper, we study boundary detection for syllable segmentation. We describe an algorithm for text-independent syllable segmentation and use it to compare the Shannon, Tsallis and Rényi entropies for detecting the start and end points of syllables in a speech signal. The generalizations of Shannon entropy (Tsallis and Rényi) quantify the degree of signal organization and provide relevant information, such as the degree of voicing, on the first syllable segments that we obtain from the temporal dynamics of singularity exponents. The proposed method relies on an entropy-based aggregation measure to improve syllable boundary detection. We also show that Rényi entropy is the best suited of the three for efficient boundary detection. In our evaluation, the algorithm produced good results on two languages: Fongbe (an African tonal language spoken mainly in Benin, Togo and Nigeria) and American English. The overall accuracy of syllable boundaries was obtained on a Fongbe dataset and subsequently validated on the TIMIT dataset with a margin of error < 5 ms.
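
The three entropies compared in the abstract have standard textbook definitions for a discrete probability distribution. The sketch below illustrates those definitions only. It is not the authors' pipeline: the function names, the amplitude histogram used to obtain a distribution from a frame, the bin count, and the orders `alpha` and `q` are all assumptions made for illustration.

```python
import numpy as np


def shannon_entropy(p):
    """Shannon entropy H = -sum(p_i * ln p_i), in nats."""
    p = p[p > 0]  # zero-probability bins contribute nothing
    return float(-np.sum(p * np.log(p)))


def renyi_entropy(p, alpha=2.0):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), in nats."""
    p = p[p > 0]
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))


def tsallis_entropy(p, q=2.0):
    """Tsallis entropy of order q (q != 1)."""
    p = p[p > 0]
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))


def frame_distribution(frame, bins=32):
    """Hypothetical helper: normalised amplitude histogram of one frame."""
    counts, _ = np.histogram(frame, bins=bins)
    return counts / counts.sum()


if __name__ == "__main__":
    # Synthetic frame standing in for a short window of speech samples.
    frame = np.sin(np.linspace(0.0, 20.0, 400))
    p = frame_distribution(frame)
    print(shannon_entropy(p), renyi_entropy(p), tsallis_entropy(p))
```

For a uniform distribution over N outcomes, the Shannon and Rényi entropies both equal ln N, while the Tsallis entropy of order 2 equals 1 - 1/N. This is a convenient sanity check before applying the functions to real frames.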

Notes

www.fongbe.fr

Author information

Authors and Affiliations

Laboratoire d’Informatique Signal et Image de la Côte d’Opale, Université du Littoral Côte d’Opale, 50 rue F. Buisson, BP 719, 62228, Calais Cedex, France
Fréjus A. A. Laleye & Cina Motamed
Institut de Mathématiques et de Sciences Physiques, Université d’Abomey-Calavi, BP 613, Porto-Novo, Bénin
Eugène C. Ezin

Corresponding author

Correspondence to Fréjus A. A. Laleye.

About this article

Cite this article

Laleye, F.A.A., Ezin, E.C. & Motamed, C. Automatic boundary detection based on entropy measures for text-independent syllable segmentation. Multimed Tools Appl 76, 16347–16368 (2017). https://doi.org/10.1007/s11042-016-3911-3

Received: 01 April 2016
Revised: 14 July 2016
Accepted: 25 August 2016
Published: 10 September 2016
Issue Date: August 2017
DOI: https://doi.org/10.1007/s11042-016-3911-3
