Bounded Generalized Gaussian Mixture Model with ICA

Neural Processing Letters

Abstract

In this paper, we propose a bounded generalized Gaussian mixture model with independent component analysis (ICA). One limitation of ICA is that it assumes the sources are independent of each other; this assumption can be relaxed by employing a mixture model. In the proposed model, the bounded generalized Gaussian distribution (BGGD) is adopted for modeling the data, and its mixture is extended to an ICA mixture model whose parameters are estimated by gradient ascent combined with expectation maximization. Because the shape parameter of the BGGD is inferred from the data, the Gaussian and Laplace distributions can be characterized as special cases. To validate the effectiveness of the algorithm, experiments are performed on blind source separation (BSS) and on BSS as preprocessing for unsupervised keyword spotting. For BSS, the TIMIT, TSP and Noizeus speech corpora are selected and the results are compared with ICA. For keyword spotting, the TIMIT speech corpus is selected, and recognition results are compared before and after BSS is applied as preprocessing to speech utterances corrupted by noise or by other speech utterances, since such mixing can greatly reduce the intelligibility of a target speech signal. The results obtained across these applications demonstrate the effectiveness of the ICA mixture model in statistical learning.
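The statement that the Gaussian and Laplace distributions arise as special cases of the shape parameter can be checked numerically. The sketch below is only illustrative and is not taken from the paper: it assumes the common parametrization f(x) = β / (2αΓ(1/β)) · exp(−(|x − μ|/α)^β), which may differ from the one the authors use, and it obtains the bounded variant by restricting the density to an interval [lo, hi] and renormalizing with a simple midpoint rule. The names `gg_pdf`, `support_mass` and `bounded_gg_pdf` are hypothetical helpers introduced for this example.

```python
import math


def gg_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized Gaussian density under the parametrization assumed above."""
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))


def support_mass(lo, hi, mu=0.0, alpha=1.0, beta=2.0, steps=2000):
    """Midpoint-rule estimate of the density mass that falls inside [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(gg_pdf(lo + (i + 0.5) * h, mu, alpha, beta) for i in range(steps))


def bounded_gg_pdf(x, lo, hi, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized Gaussian restricted to [lo, hi] and renormalized."""
    if x < lo or x > hi:
        return 0.0
    return gg_pdf(x, mu, alpha, beta) / support_mass(lo, hi, mu, alpha, beta)


def gaussian_pdf(x, mu, sigma):
    """Reference Gaussian density used for the comparison."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(x, mu, b):
    """Reference Laplace density used for the comparison."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


if __name__ == "__main__":
    x, sigma, b = 0.7, 1.5, 0.8
    # Shape 2 with alpha = sigma * sqrt(2) should match the Gaussian.
    print(gg_pdf(x, 0.0, sigma * math.sqrt(2.0), 2.0), gaussian_pdf(x, 0.0, sigma))
    # Shape 1 with alpha = b should match the Laplace distribution.
    print(gg_pdf(x, 0.0, b, 1.0), laplace_pdf(x, 0.0, b))
    # Bounding the support raises the density inside the interval.
    print(bounded_gg_pdf(x, -1.0, 1.0), gg_pdf(x))
```

Under this assumed parametrization, each of the first two printed pairs should agree to within floating-point error, and the bounded density exceeds the unbounded one inside the interval because the mass outside [lo, hi] is redistributed.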

Acknowledgements

The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Corresponding author

Correspondence to Muhammad Azam.

About this article

Cite this article

Azam, M., Bouguila, N. Bounded Generalized Gaussian Mixture Model with ICA. Neural Process Lett 49, 1299–1320 (2019). https://doi.org/10.1007/s11063-018-9868-7

  • DOI: https://doi.org/10.1007/s11063-018-9868-7
