Fuzzy-based algorithm for Fongbe continuous speech segmentation

  • Short Paper
Pattern Analysis and Applications

Abstract

Text-independent speech segmentation is a challenging topic in computer-based speech recognition systems. This paper proposes a novel time-domain algorithm, based on fuzzy knowledge, for the continuous speech segmentation task via nonlinear speech analysis. Short-term energy, zero-crossing rate and singularity exponents are the time-domain features that we calculate at each point of the speech signal in order to extract the information needed to generate the significant segments. This is done to identify the phonemes or syllables and the transition fronts between them. Fuzzy logic is used to fuzzify the calculated features into three complementary sets (low, medium and high) and to perform a matching phase using a set of fuzzy rules. The outputs of the proposed algorithm are silence, phonemes, or syllables. In evaluation, our algorithm achieved the best performance, with efficient results, on the Fongbe language (an African tonal language spoken mainly in Benin, Togo and Nigeria).
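The feature extraction and fuzzification steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the frame size, the triangular membership shapes with breakpoints at 0, 0.5 and 1, the single rule in `silence_degree`, and every function name are assumptions made for illustration, and the singularity exponents are left out entirely.

```python
import math

def frames(signal, size, hop):
    """Split a sequence of samples into frames of `size` samples, `hop` apart."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def fuzzify(x):
    """Map x in [0, 1] to membership degrees in low, medium and high.

    The triangular shapes are an assumption; they are chosen so that the
    three degrees always sum to 1 (complementary sets).
    """
    x = min(1.0, max(0.0, x))
    return {
        "low": max(0.0, 1.0 - 2.0 * x),
        "medium": 1.0 - abs(2.0 * x - 1.0),
        "high": max(0.0, 2.0 * x - 1.0),
    }

def silence_degree(energy_norm, zcr):
    """Hypothetical rule: IF energy is low AND zcr is low THEN silence.

    The fuzzy AND is taken as the minimum of the two membership degrees.
    """
    return min(fuzzify(energy_norm)["low"], fuzzify(zcr)["low"])

if __name__ == "__main__":
    # Synthetic input: 100 silent samples followed by 100 samples of a sine tone.
    signal = [0.0] * 100 + [math.sin(2 * math.pi * n / 10) for n in range(100)]
    energies = [short_term_energy(f) for f in frames(signal, 50, 50)]
    rates = [zero_crossing_rate(f) for f in frames(signal, 50, 50)]
    peak = max(energies) or 1.0  # normalise energy to [0, 1]; avoid dividing by 0
    for e, z in zip(energies, rates):
        print(round(silence_degree(e / peak, z), 3))
```

Normalising the energy by its peak is only one way to bring the feature into the range the membership functions expect; a real rule base would also need rules for the phoneme and syllable outputs.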

Notes

  1. www.fongbe.fr.

Acknowledgements

Funding was provided by Agence Universitaire de la Francophonie.

Author information

Corresponding author

Correspondence to Fréjus A. A. Laleye.

About this article

Cite this article

Laleye, F.A.A., Ezin, E.C. & Motamed, C. Fuzzy-based algorithm for Fongbe continuous speech segmentation. Pattern Anal Applic 20, 855–864 (2017). https://doi.org/10.1007/s10044-016-0591-6

  • DOI: https://doi.org/10.1007/s10044-016-0591-6
