Abstract
Text-independent speech segmentation is a challenging problem in computer-based speech recognition. This paper proposes a novel time-domain algorithm, based on fuzzy knowledge, for segmenting continuous speech through nonlinear speech analysis. At each point of the speech signal we compute three time-domain features (short-term energy, zero-crossing rate and singularity exponents) in order to extract the information needed to generate meaningful segments, that is, to identify phonemes or syllables and the transition fronts between them. Fuzzy logic is used to fuzzify the computed features into three complementary sets, namely low, medium and high, and to perform a matching phase using a set of fuzzy rules. The outputs of the proposed algorithm are silences, phonemes or syllables. Evaluated on Fongbe (an African tonal language spoken mainly in Benin, Togo and Nigeria), the algorithm performed well and produced efficient segmentation results.
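The pipeline summarized above (per-sample features, fuzzification into low/medium/high, then fuzzy rule matching) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a 256-sample rectangular analysis window, min-max normalization of each feature over the utterance, triangular membership functions, and a small hypothetical rule base (AND = min, OR = max) that labels samples as silence, voiced or unvoiced rather than producing the authors' phoneme and syllable units. The singularity exponents are omitted for brevity, and all function names are illustrative.

```python
import numpy as np

# Hypothetical label order; must match the row order of `strengths` below.
LABELS = ("silence", "voiced", "unvoiced")


def short_term_energy(x, win=256):
    """Mean of x**2 over a sliding rectangular window around each sample."""
    return np.convolve(x ** 2, np.ones(win), mode="same") / win


def zero_crossing_rate(x, win=256):
    """Fraction of sign changes over a sliding window around each sample."""
    signs = np.sign(x)
    signs[signs == 0] = 1.0                  # treat exact zeros as positive
    crossings = np.abs(np.diff(signs)) / 2   # 1.0 where the sign flips
    crossings = np.append(crossings, 0.0)    # pad back to len(x)
    return np.convolve(crossings, np.ones(win), mode="same") / win


def normalize(f):
    """Min-max normalization to [0, 1] over the whole utterance."""
    return (f - f.min()) / (f.max() - f.min() + 1e-12)


def fuzzify(v):
    """Triangular low/medium/high memberships on [0, 1]; they sum to 1."""
    return {
        "low": np.clip(1.0 - v / 0.5, 0.0, 1.0),
        "medium": np.clip(1.0 - np.abs(v - 0.5) / 0.5, 0.0, 1.0),
        "high": np.clip((v - 0.5) / 0.5, 0.0, 1.0),
    }


def to_segments(labels):
    """Group consecutive identical labels into (start, end, label) runs."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments


def fuzzy_segment(x, win=256):
    """Label each sample with the strongest of three hypothetical fuzzy rules."""
    E = fuzzify(normalize(short_term_energy(x, win)))
    Z = fuzzify(normalize(zero_crossing_rate(x, win)))
    strengths = np.stack([
        # silence: energy low AND zcr (low OR medium)
        np.minimum(E["low"], np.maximum(Z["low"], Z["medium"])),
        # voiced: energy high, OR energy medium AND zcr (low OR medium)
        np.maximum(E["high"], np.minimum(E["medium"], np.maximum(Z["low"], Z["medium"]))),
        # unvoiced: energy (low OR medium) AND zcr high
        np.minimum(np.maximum(E["low"], E["medium"]), Z["high"]),
    ])
    labels = [LABELS[k] for k in np.argmax(strengths, axis=0)]
    return labels, to_segments(labels)


# Synthetic demo signal at an assumed 16 kHz sampling rate.
fs = 16000
n = np.arange(4000)
x = np.concatenate([
    np.zeros(4000),                      # silence
    np.sin(2 * np.pi * 200 * n / fs),    # high energy, low ZCR
    0.1 * (-1.0) ** n,                   # low energy, maximal ZCR
    np.zeros(4000),                      # silence
])
labels, segs = fuzzy_segment(x)
for start, end, label in segs:
    print(start, end, label)
```

In this sketch a segment boundary is simply a sample where the winning rule changes; a practical system would likely also merge very short segments near transitions and add the singularity-exponent feature as a third rule input.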




Acknowledgements
Funding was provided by Agence Universitaire de la Francophonie.
About this article
Cite this article
Laleye, F.A.A., Ezin, E.C. & Motamed, C. Fuzzy-based algorithm for Fongbe continuous speech segmentation. Pattern Anal Applic 20, 855–864 (2017). https://doi.org/10.1007/s10044-016-0591-6