Abstract
Interest in using Nonnegative Matrix Factorization (NMF) for pattern classification and data clustering has grown considerably in recent years. For nonnegative data (observations, data items, feature vectors), many partitional clustering problems can be modeled as a matrix factorization into two groups of vectors: nonnegative centroid vectors and binary cluster-indicator vectors. Hence, the partitional clustering problem reduces to a semi-binary NMF problem. NMF problems are usually solved by alternating minimization of a given cost function, typically with multiplicative update algorithms. Since our NMF problem has particular characteristics, we update the estimated factors with a different algorithm from the commonly used ones: a binary update steered by simulated annealing. As a result, the proposed algorithm outperforms several well-known partitional clustering algorithms.
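The factorization model described above can be made concrete. If the data items are the columns of a nonnegative matrix X, a hard partition into k clusters reads X ≈ A S, where the columns of A are nonnegative centroids and S is a binary matrix with exactly one 1 per column. The following is a minimal sketch under our own assumptions, not the paper's algorithm: it uses the squared Frobenius cost ||X - A S||_F^2, updates A in closed form (cluster means), and updates S with a simulated-annealing sweep using Metropolis acceptance and geometric cooling. The function names (`semi_binary_nmf`, `update_indicators_sa`) and the parameters `n_iter`, `T0`, `cooling` with their default values are hypothetical.

```python
import numpy as np


def indicators(labels, k):
    """Binary k x n indicator matrix S with a single 1 per column."""
    S = np.zeros((k, labels.size))
    S[labels, np.arange(labels.size)] = 1.0
    return S


def centroids(X, S):
    """Least-squares A for fixed binary S: A = X S^T (S S^T)^{-1}.

    S S^T is diagonal (cluster sizes), so each column of A is a cluster mean,
    which stays nonnegative for nonnegative X. Empty clusters get a zero column.
    """
    counts = S.sum(axis=1)
    sums = X @ S.T
    A = np.zeros_like(sums)
    nonempty = counts > 0
    A[:, nonempty] = sums[:, nonempty] / counts[nonempty]
    return A


def update_indicators_sa(X, A, labels, T, rng):
    """One simulated-annealing sweep over the binary factor S with A fixed.

    With A fixed, ||X - A S||_F^2 separates over columns, so moving item j
    from cluster p to q changes the cost by ||x_j - a_q||^2 - ||x_j - a_p||^2.
    """
    k = A.shape[1]
    for j in rng.permutation(X.shape[1]):
        p = labels[j]
        q = rng.integers(k)  # hypothetical proposal: a uniformly random cluster
        if q == p:
            continue
        delta = np.sum((X[:, j] - A[:, q]) ** 2) - np.sum((X[:, j] - A[:, p]) ** 2)
        # Metropolis rule: accept improvements, accept worse moves with prob. exp(-delta/T)
        if delta <= 0 or rng.random() < np.exp(-delta / T):
            labels[j] = q
    return labels


def semi_binary_nmf(X, k, n_iter=50, T0=1.0, cooling=0.9, seed=0):
    """Alternate a closed-form A-update with an annealed binary S-update."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=X.shape[1])  # random initial partition
    T = T0
    for _ in range(n_iter):
        A = centroids(X, indicators(labels, k))
        labels = update_indicators_sa(X, A, labels, T, rng)
        T *= cooling  # geometric cooling schedule (an assumption)
    S = indicators(labels, k)
    return centroids(X, S), S
```

Because the cost separates over data items once A is fixed, each proposed move is scored from just two distances. As the temperature falls, the S-update becomes an improvement-only step close to the nearest-centroid assignment of k-means, while early high-temperature sweeps can still escape poor initial partitions. The temperature scale is data-dependent, so `T0` would need tuning in practice.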
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zdunek, R. (2008). Data Clustering with Semi-binary Nonnegative Matrix Factorization. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2008. ICAISC 2008. Lecture Notes in Computer Science, vol 5097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69731-2_68
DOI: https://doi.org/10.1007/978-3-540-69731-2_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69572-1
Online ISBN: 978-3-540-69731-2
eBook Packages: Computer Science, Computer Science (R0)