MPM: a hierarchical clustering algorithm using matrix partitioning method for non-numeric data

Journal of Intelligent Information Systems

An Erratum to this article was published on 01 May 2006

Abstract

Clustering has been widely adopted in numerous applications, including pattern recognition, data analysis, image processing, and market research. In data mining, traditional clustering algorithms that use distance-based measurements to calculate the difference between data objects are unsuitable for non-numeric attributes such as nominal, Boolean, and categorical data. Applying an unsuitable similarity measurement in clustering may cause valuable information embedded in the data attributes to be lost, producing low-quality clusters. This paper proposes a novel hierarchical clustering algorithm, referred to as MPM, for the clustering of non-numeric data. The goals of MPM are to retain the data features of interest while effectively grouping data objects into clusters with high intra-similarity and low inter-similarity. MPM achieves these goals through two principal methods: (1) the adoption of a novel similarity measurement that can capture the “characterized properties” of information, and (2) the application of matrix permutation and matrix partitioning to the results of the similarity measurement (constructed in the form of a similarity matrix) in order to assign data to appropriate clusters. This study also proposes a heuristic-based algorithm, Heuristic_MPM, to reduce the processing times required for matrix permutation and matrix partitioning, which together constitute the bulk of the total MPM execution time.
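The general pipeline named in the abstract (similarity matrix, then permutation, then partitioning) can be illustrated with a minimal Python sketch. It is not the MPM algorithm: the abstract does not define MPM's similarity measurement or its permutation and partitioning rules, so the sketch substitutes a simple matching coefficient, a greedy nearest-neighbour ordering, and a single cut at the weakest adjacent link. All function names and the sample records are hypothetical.

```python
from itertools import combinations


def simple_matching(a, b):
    """Fraction of attributes on which two records agree.

    Assumption: a stand-in measure, NOT the similarity measurement used by MPM.
    """
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity_matrix(records):
    """Build a symmetric n-by-n similarity matrix with 1.0 on the diagonal."""
    n = len(records)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = simple_matching(records[i], records[j])
    return sim


def greedy_order(sim):
    """Permutation placing each object next to its most similar unplaced object.

    Assumption: a greedy heuristic chosen for brevity, not MPM's permutation step.
    """
    order = [0]
    remaining = set(range(1, len(sim)))
    while remaining:
        last = order[-1]
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = max(sorted(remaining), key=lambda j: sim[last][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


def split_at_weakest_link(sim, order):
    """Partition the ordering where two adjacent objects are least similar."""
    links = [sim[order[k]][order[k + 1]] for k in range(len(order) - 1)]
    cut = links.index(min(links)) + 1
    return order[:cut], order[cut:]


# Hypothetical categorical records: (colour, size, flag).
records = [
    ("red", "small", "yes"),
    ("blue", "large", "no"),
    ("red", "small", "no"),
    ("blue", "large", "yes"),
]

sim = similarity_matrix(records)
order = greedy_order(sim)
left, right = split_at_weakest_link(sim, order)
print(order, left, right)
```

On these four records the ordering groups the two "red, small" records and the two "blue, large" records side by side, and the cut falls between the two groups. Applying the split recursively to each part would yield a hierarchy; how MPM itself builds its hierarchy is not described in the abstract.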

Author information

Corresponding author

Correspondence to Yi-Jen Su.

Additional information

An erratum to this article is available at http://dx.doi.org/10.1007/s10844-006-9693-8.

About this article

Cite this article

Jiau, H.C., Su, YJ., Lin, YM. et al. MPM: a hierarchical clustering algorithm using matrix partitioning method for non-numeric data. J Intell Inf Syst 26, 185–207 (2006). https://doi.org/10.1007/s10844-006-0250-2

  • DOI: https://doi.org/10.1007/s10844-006-0250-2
