Abstract
Self-Organizing Map (SOM) networks have been successfully applied as a clustering method to numeric datasets. However, it is not feasible to apply SOM directly to transactional data. This paper proposes the Transactions Clustering using SOM (TCSOM) algorithm for clustering binary transactional data. TCSOM uses a dissimilarity measure based on the normalized dot product to measure the distance between an input vector and an output neuron, and a modified weight adaptation function to adjust the weights of the winner and its neighbors. More importantly, TCSOM is a one-pass algorithm, which makes it well suited to data mining applications. Experimental results on real datasets show that TCSOM outperforms state-of-the-art transactional data clustering algorithms in clustering accuracy.
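The abstract names three ingredients: a normalized dot product dissimilarity, a weight update for the winner and its neighbors, and a single pass over the data. The sketch below illustrates how these pieces could fit together. It is not the paper's method: the abstract gives no formulas, so the dissimilarity is assumed to be one minus the cosine of the two vectors, the update is the standard Kohonen rule standing in for the paper's modified function, and the names `dissimilarity`, `one_pass_som`, `lr`, and `radius` are hypothetical.

```python
import numpy as np


def dissimilarity(x, w):
    """Assumed form: 1 - normalized dot product (not the paper's exact formula)."""
    denom = np.linalg.norm(x) * np.linalg.norm(w)
    if denom == 0.0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - float(np.dot(x, w)) / denom


def one_pass_som(transactions, n_neurons, lr=0.5, radius=1, seed=0):
    """Cluster binary transactions with a 1-D SOM in a single pass.

    The update below is the standard Kohonen rule, used here as a
    stand-in for the paper's modified weight adaptation function.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(transactions, dtype=float)
    weights = rng.random((n_neurons, data.shape[1]))
    labels = []
    for x in data:  # each transaction is read exactly once
        dists = [dissimilarity(x, w) for w in weights]
        winner = int(np.argmin(dists))
        labels.append(winner)
        lo = max(0, winner - radius)
        hi = min(n_neurons, winner + radius + 1)
        for j in range(lo, hi):
            # the winner takes a full step, its neighbors a half step
            step = lr if j == winner else lr / 2
            weights[j] += step * (x - weights[j])
    return labels, weights


if __name__ == "__main__":
    # rows are transactions, columns are items (1 = item present)
    baskets = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
    labels, _ = one_pass_som(baskets, n_neurons=2)
    print(labels)
```

Each transaction is assigned to its winning neuron the moment it is read, so no second scan of the data is needed. The cluster labels depend on the random initial weights and on the order of the transactions.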
Cite this article
He, Z., Xu, X. & Deng, S. TCSOM: Clustering Transactions Using Self-Organizing Map. Neural Process Lett 22, 249–262 (2005). https://doi.org/10.1007/s11063-005-8016-3
DOI: https://doi.org/10.1007/s11063-005-8016-3