Access the number of speakers through visual access tendency for effective speech clustering

Original Article
International Journal of System Assurance Engineering and Management

Abstract

Speech clustering groups unlabeled speech utterances according to the similarity of their features. It requires prior knowledge of the number of speakers before each utterance can be assigned to its speaker cluster, so determining the number of speakers in a speech dataset is a primary problem of speech clustering. Most existing methods estimate the number of speakers only after clustering (post-clustering evaluation). A review of recent cluster (or speaker) detection methods suggests that visual assessment of tendency (VAT) is the most suitable approach for estimating the number of speakers. However, VAT needs speaker model parameters to obtain accurate speaker information. Motivated by this, VAT is extended with a Gaussian mixture model (GMM) so that speaker information is derived from model parameters. In the proposed work, each speech item (i.e. a speaker utterance or segment) is modeled by a GMM, from which a GMM mean supervector is derived. VAT then operates on dissimilarity features computed over the set of GMM mean supervectors for effective speech clustering. Because GMM mean supervectors are high-dimensional, this dimensionality problem is addressed by generating intermediate vectors (i-vectors). The efficiency of the proposed methods is demonstrated experimentally on real-time datasets.
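The pipeline described in the abstract (GMM modelling of each utterance, mean supervectors, a dissimilarity matrix, VAT reordering, counting speakers) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the supervector is assumed to come from Reynolds-style MAP adaptation of the means of an already-trained universal background model (UBM), dissimilarity is assumed to be cosine dissimilarity, and the number of speakers is read off the VAT ordering by counting minimum-spanning-tree edges whose weight exceeds a user-chosen threshold, a simple stand-in for counting dark diagonal blocks in the VAT image. All function names, the relevance factor, the threshold, and the toy data are hypothetical, and the i-vector reduction step is not shown.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def map_mean_supervector(frames, weights, means, variances, relevance=16.0):
    """Hypothetical helper: MAP-adapt UBM means to one utterance and stack them.

    Assumes a trained UBM (weights, means, diagonal variances); relevance > 0.
    """
    K, D = len(means), len(means[0])
    n = [0.0] * K                       # zeroth-order (soft count) statistics
    f = [[0.0] * D for _ in range(K)]   # first-order statistics
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)                 # log-sum-exp trick for stable posteriors
        post = [math.exp(l - top) for l in logs]
        total = sum(post)
        for k in range(K):
            g = post[k] / total
            n[k] += g
            for d in range(D):
                f[k][d] += g * x[d]
    sv = []
    for k in range(K):
        alpha = n[k] / (n[k] + relevance)   # data-dependent adaptation weight
        for d in range(D):
            e = f[k][d] / n[k] if n[k] > 0 else means[k][d]
            sv.append(alpha * e + (1 - alpha) * means[k][d])
    return sv

def cosine_dissimilarity(a, b):
    """1 - cosine similarity between two supervectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def vat_order(R):
    """VAT reordering (Prim-like); returns the order and the MST edge weights."""
    n = len(R)
    # Start from one end of the largest dissimilarity in R.
    start, _ = max(((p, q) for p in range(n) for q in range(n)),
                   key=lambda pq: R[pq[0]][pq[1]])
    order, edges = [start], []
    remaining = set(range(n)) - {start}
    while remaining:
        # Next object: the unvisited one closest to any already-ordered object.
        w, j = min((R[p][q], q) for p in order for q in remaining)
        order.append(j)
        edges.append(w)
        remaining.remove(j)
    return order, edges

def estimate_num_speakers(supervectors, threshold):
    """Hypothetical rule: blocks = 1 + number of MST edges above threshold."""
    R = [[cosine_dissimilarity(a, b) for b in supervectors] for a in supervectors]
    order, edges = vat_order(R)
    return 1 + sum(1 for w in edges if w > threshold), order

# Toy UBM with two 2-D components and four utterances (illustrative only).
ubm = ([0.5, 0.5], [[-3.0, 0.0], [3.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]])
spk_a = [[-3.0, 1.0], [3.0, 1.0]]
spk_b = [[-3.0, -1.0], [3.0, -1.0]]
utts = [spk_a, spk_a, spk_b, spk_b]
svs = [map_mean_supervector(u, *ubm, relevance=1.0) for u in utts]
k, order = estimate_num_speakers(svs, threshold=0.01)
print(k, order)  # expected: 2 [0, 1, 2, 3]
```

The threshold rule is only one way to read a VAT image; the quality of the estimate depends on how well the chosen dissimilarity separates speakers, which is why the supervector representation matters.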

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Author information

Correspondence to T. Suneetha Rani.

About this article

Cite this article

Suneetha Rani, T., Krishna Prasad, M.H.M. Access the number of speakers through visual access tendency for effective speech clustering. Int J Syst Assur Eng Manag 9, 559–566 (2018). https://doi.org/10.1007/s13198-018-0703-3

  • DOI: https://doi.org/10.1007/s13198-018-0703-3

Keywords