Abstract
Identifying the cues that distinguish the speech segments of speech data is an indispensable task in speaker clustering. Existing techniques perform speech clustering without any prior knowledge of the cluster tendency (CT). Many techniques were investigated for finding a prior cluster tendency, and visual assessment of cluster tendency (VAT) was recognized as a reasonable choice. Speech clustering poses three important problems: modelling the speech data, assessing the cluster tendency, and clustering the speech effectively. Modelling defines the shape of a speech segment based on the characteristics of the speaker’s voice; hence it is useful for speech recognition. The Gaussian mixture model (GMM) is a good choice for obtaining a precise model of speech data. Cluster tendency refers to determining the number of speakers (that is, the number of clusters) in the speech data. The quality of speech clustering depends on both the modelling and the prior cluster tendency. Classical algorithms [such as k-means and minimum spanning tree (MST)-based clustering] are merged with VAT to obtain effective clustering results together with a prior cluster tendency. We use linear subspace learning to represent the speech segments (or speech utterances) in a projected space of the high-dimensional data, and we apply various linear subspace learning techniques to improve the speech clustering results. Because the proposed hybrid approaches (i.e., k-means-CT and MST–CT-based clustering) involve expensive steps, we propose another method, direct visualized clustering, which derives the explicit speaker clustering results directly from VAT instead of using either k-means or MST-based clustering. We evaluated the proposed methods on the TSP speech datasets and carried out a comparative study to demonstrate the effectiveness of our work.
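The VAT step mentioned in the abstract can be illustrated with a minimal sketch of the standard VAT reordering (a Prim-like ordering of a pairwise dissimilarity matrix). This is not the authors' implementation: the function name `vat_order`, the synthetic two-group data, and the use of Euclidean distance on raw 2-D points (rather than GMM-based speech features) are all assumptions made for illustration.

```python
import numpy as np

def vat_order(D):
    """Return the VAT ordering of a symmetric dissimilarity matrix D (hypothetical helper)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from one endpoint of the largest pairwise dissimilarity.
    first, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(first)]
    selected = np.zeros(n, dtype=bool)
    selected[first] = True
    # nearest[j] = smallest dissimilarity between j and any already selected object.
    nearest = D[first].copy()
    for _ in range(n - 1):
        # Pick the unselected object closest to the selected set (Prim-like step).
        candidate = int(np.argmin(np.where(selected, np.inf, nearest)))
        order.append(candidate)
        selected[candidate] = True
        nearest = np.minimum(nearest, D[candidate])
    return np.array(order)

# Synthetic stand-in for two "speakers": two well-separated groups, shuffled.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(5.0, 0.1, (5, 2))])
labels = np.repeat([0, 1], 5)
shuffle = rng.permutation(10)
points, labels = points[shuffle], labels[shuffle]

# Pairwise Euclidean dissimilarities (an assumed choice of dissimilarity).
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
order = vat_order(D)
# Reordered matrix: displayed as an image, it shows dark blocks on the diagonal.
R = D[np.ix_(order, order)]
```

When `R` is displayed as a grey-scale image, each dark diagonal block corresponds to one candidate cluster, so counting the blocks gives an estimate of the number of speakers before any clustering algorithm is run.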
Cite this article
Suneetha Rani, T., Krishna Prasad, M.H.M. Access the cluster tendency by visual methods for robust speech clustering. Int J Syst Assur Eng Manag 8 (Suppl 1), 465–477 (2017). https://doi.org/10.1007/s13198-015-0393-z
DOI: https://doi.org/10.1007/s13198-015-0393-z