Access the number of speakers through visual access tendency for effective speech clustering

Suneetha Rani, T.; Krishna Prasad, M. H. M.

doi:10.1007/s13198-018-0703-3

Original Article
Published: 05 February 2018

Volume 9, pages 559–566, (2018)

International Journal of System Assurance Engineering and Management

Abstract

Speech clustering groups unlabeled speech utterances according to the similarity of their features. It requires prior knowledge of the number of speakers before each utterance can be assigned to its speaker cluster, so determining the number of speakers in a speech dataset is a primary problem of speech clustering. Most methods estimate the number of speakers only after clustering. A recent study of cluster (i.e. speaker) detection methods found visual access tendency (VAT) to be the most suitable approach for assessing the number of speakers. However, VAT needs speaker model parameters to estimate this number accurately. Motivated by this, VAT is extended with a Gaussian mixture model (GMM) so that speaker information is derived from model parameters. In the proposed work, the speech data (i.e. speaker utterances or segments) are modeled by a GMM, from which GMM mean supervectors are derived. VAT then computes dissimilarity features over the set of GMM mean supervectors for effective speech clustering. Because GMM mean supervectors are high-dimensional, the dimensionality problem is addressed by generating intermediate vectors (i-vectors). The efficiency of the proposed methods is demonstrated experimentally on real-time datasets.
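
To make the VAT step concrete, the following is a minimal sketch and not the authors' implementation. It assumes Euclidean distance as the dissimilarity measure and uses synthetic vectors as stand-ins for GMM mean supervectors or i-vectors. The function names and the largest-gap counting heuristic in `estimate_cluster_count` are hypothetical additions for illustration; the abstract describes assessing the number of speakers from the VAT output but does not specify such a rule.

```python
import math
import random


def pairwise_dissimilarity(vectors):
    """Euclidean dissimilarity matrix (an assumed choice of measure)."""
    return [[math.dist(u, v) for v in vectors] for u in vectors]


def vat_order(R):
    """VAT reordering of a square dissimilarity matrix R.

    Returns the ordering and, for each object, the dissimilarity at which
    it was attached to the objects already ordered (0.0 for the first).
    """
    n = len(R)
    # Start from an object involved in the largest dissimilarity in R.
    start = max(range(n), key=lambda i: max(R[i]))
    order, link = [start], [0.0]
    # nearest[j]: smallest dissimilarity from j to any ordered object.
    nearest = {j: R[start][j] for j in range(n) if j != start}
    while nearest:
        j = min(nearest, key=nearest.get)  # closest unordered object
        order.append(j)
        link.append(nearest.pop(j))
        for k in nearest:
            nearest[k] = min(nearest[k], R[j][k])
    return order, link


def reorder(R, order):
    """Reordered matrix whose dark diagonal blocks suggest clusters."""
    return [[R[i][j] for j in order] for i in order]


def estimate_cluster_count(link):
    """Hypothetical heuristic: split the sorted attachment dissimilarities
    at their largest gap and count the large ones as cluster boundaries.
    It always reports at least two clusters when three or more objects
    are given, so it is only meaningful for clearly separated data."""
    w = sorted(link[1:])
    if len(w) < 2:
        return 1
    gaps = [w[i + 1] - w[i] for i in range(len(w) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    return len(w) - cut


if __name__ == "__main__":
    rng = random.Random(0)
    # Three synthetic "speakers", 20 five-dimensional vectors each.
    data = [[rng.gauss(c, 0.3) for _ in range(5)]
            for c in (0.0, 10.0, 20.0) for _ in range(20)]
    R = pairwise_dissimilarity(data)
    order, link = vat_order(R)
    R_star = reorder(R, order)  # render as a grayscale image to inspect
    print("estimated number of clusters:", estimate_cluster_count(link))
```

In practice the reordered matrix would be displayed as an image and the dark blocks along the diagonal inspected; the counting helper only approximates that visual judgment.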

Author information

Authors and Affiliations

Department of CSE, JNTUK, Kakinada, Andhra Pradesh, India
T. Suneetha Rani & M. H. M. Krishna Prasad

Corresponding author

Correspondence to T. Suneetha Rani.

About this article

Cite this article

Suneetha Rani, T., Krishna Prasad, M.H.M. Access the number of speakers through visual access tendency for effective speech clustering. Int J Syst Assur Eng Manag 9, 559–566 (2018). https://doi.org/10.1007/s13198-018-0703-3

Received: 26 September 2016
Revised: 24 January 2018
Published: 05 February 2018
Issue Date: April 2018
DOI: https://doi.org/10.1007/s13198-018-0703-3
