Abstract
Discriminative clustering (DC) effectively integrates subspace selection and clustering into a coherent framework: it iteratively alternates between classical Linear Discriminant Analysis (LDA) dimensionality reduction and clustering. DC can effectively cluster high-dimensional data, but it has a complex form and high computational complexity. Recent work shows that DC is equivalent to kernel k-means (KM) with a specific kernel matrix. This new insight provides an opportunity to simplify the optimization problem in the original DC algorithm. Based on this equivalence, the Discriminative K-means (DKM) algorithm was proposed. When the number of data points (denoted n) is small, DKM is feasible and efficient. However, constructing the kernel matrix in DKM requires computing the inverse of a matrix, which is time consuming when n is large. In this paper, we concentrate on the efficiency of DC. We present a new framework for DC, namely Efficient DC (EDC), which consists of DKM and the whitening transformation of the regularized total scatter matrix (WRTS) followed by KM clustering (WRTS+KM). When m (the number of dimensions) is small and n far exceeds m (n ≫ m), EDC can carry out WRTS+KM on the data, which is more efficient than DKM. When n is small and m far exceeds n (m ≫ n), EDC can carry out DKM, which is more efficient. We also extend EDC to the soft case and propose Efficient Discriminative Maximum Entropy Clustering (EDMEC), an efficient version of maximum-entropy-based DC. Extensive experiments on a collection of benchmark data sets show the effectiveness of the proposed algorithms.
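The size-based dispatch described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact DKM kernel construction and the regularization scheme are assumptions, and `edc_cluster`, `lam`, and the branch threshold are hypothetical names chosen for the sketch. When n ≥ m it whitens the data with the inverse square root of a regularized total scatter matrix (an m × m operation) and runs k-means; otherwise it builds an n × n kernel from a regularized Gram-matrix inverse and runs k-means in the kernel's feature-space embedding.

```python
import numpy as np
from sklearn.cluster import KMeans

def edc_cluster(X, k, lam=1.0, random_state=0):
    """Hypothetical sketch of the EDC dispatch between WRTS+KM and DKM."""
    n, m = X.shape
    Xc = X - X.mean(axis=0)  # center the data

    if n >= m:
        # WRTS+KM branch (n >> m): whiten with (S_t + lam*I)^{-1/2}, then k-means.
        # Only an m x m eigendecomposition is needed, cheap when m is small.
        St = Xc.T @ Xc / n                       # total scatter matrix (m x m)
        evals, evecs = np.linalg.eigh(St + lam * np.eye(m))
        W = evecs @ np.diag(evals ** -0.5) @ evecs.T   # (S_t + lam*I)^{-1/2}
        Z = Xc @ W                               # whitened data
        return KMeans(n_clusters=k, n_init=10,
                      random_state=random_state).fit_predict(Z)
    else:
        # DKM branch (m >> n): kernel k-means with a kernel built from an
        # n x n matrix inverse (an assumed DKM-style kernel, for illustration).
        G = Xc @ Xc.T                            # Gram matrix (n x n)
        K = G @ np.linalg.inv(G + n * lam * np.eye(n))
        # Embed the symmetric PSD kernel and cluster in that feature space.
        evals, evecs = np.linalg.eigh(K)
        Phi = evecs * np.sqrt(np.clip(evals, 0.0, None))
        return KMeans(n_clusters=k, n_init=10,
                      random_state=random_state).fit_predict(Phi)
```

The point of the dispatch is that each branch inverts or decomposes only the smaller of the two matrices (m × m versus n × n), so the overall cost scales with min(n, m)³ rather than always with n³ as in plain DKM.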
Acknowledgements
This work is supported by the National Science Foundation of China (No. 61102095), the Science Plan Foundation of the Education Bureau of Shaanxi Province (No. 2010JK835, No. 14JK1661), the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2014JM8307) and The Science and Technology Plan in Shaanxi Province of China (No. 2014KJXX-72).
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Zhi, X.B., Fan, J.L. (2015). A New Algorithm for Discriminative Clustering and Its Maximum Entropy Extension. In: He, X., et al. (eds.) Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques. IScIDE 2015. Lecture Notes in Computer Science, vol. 9243. Springer, Cham. https://doi.org/10.1007/978-3-319-23862-3_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23861-6
Online ISBN: 978-3-319-23862-3