
Pattern-based clustering and attribute analysis

  • Focus
Soft Computing

Abstract

The logical analysis of data (LAD) is a combinatorics-, optimization- and logic-based methodology for the analysis of datasets with binary or numerical input variables and binary outcomes. Previous studies have established that LAD provides a competitive classification tool, comparable in efficiency with the top classification techniques available. The goal of this paper is to show that the methodology of LAD can also be useful in the discovery of new classes of observations and in the analysis of attributes. After a brief description of the main concepts of LAD, two efficient combinatorial algorithms are described: one generates all prime patterns (rules) satisfying certain conditions, the other all spanned patterns satisfying certain conditions. It is shown that applying classic clustering techniques to the set of observations represented in prime pattern space leads to the identification of a subclass of (say, positive) observations that is accurately recognizable and sharply distinct from the observations in the opposite (say, negative) class. It is also shown that the set of all spanned patterns allows the introduction of a measure of significance and of a concept of monotonicity in the set of attributes.
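To make the idea of representing observations in prime pattern space concrete, the following is a minimal sketch on a toy binary dataset. It is not the paper's algorithm: the abstract mentions efficient combinatorial generation procedures, whereas this sketch uses brute-force enumeration up to a small degree. It also assumes the usual LAD definitions, which the abstract does not spell out: a positive pattern is a conjunction of literals covering at least one positive and no negative observation, and it is prime if dropping any single literal makes it cover a negative observation. All function names and the data are hypothetical.

```python
from itertools import combinations, product

def covers(pattern, obs):
    """A pattern is a dict {attribute_index: required_value}."""
    return all(obs[i] == v for i, v in pattern.items())

def is_positive_pattern(pattern, pos, neg):
    """Assumed definition: covers some positive and no negative observation."""
    return (any(covers(pattern, o) for o in pos)
            and not any(covers(pattern, o) for o in neg))

def is_prime(pattern, pos, neg):
    """Assumed definition: removing any one literal admits a negative observation."""
    for i in pattern:
        reduced = {j: v for j, v in pattern.items() if j != i}
        if not any(covers(reduced, o) for o in neg):
            return False
    return True

def prime_positive_patterns(pos, neg, max_degree):
    """Brute-force enumeration (illustrative only, exponential in max_degree)."""
    n = len(pos[0])
    found = []
    for d in range(1, max_degree + 1):
        for attrs in combinations(range(n), d):
            for values in product((0, 1), repeat=d):
                p = dict(zip(attrs, values))
                if is_positive_pattern(p, pos, neg) and is_prime(p, pos, neg):
                    found.append(p)
    return found

def pattern_space(observations, patterns):
    """Map each observation to a 0/1 vector: which patterns cover it."""
    return [tuple(int(covers(p, o)) for p in patterns) for o in observations]

def hamming(u, v):
    """Number of coordinates in which two pattern-space vectors differ."""
    return sum(a != b for a, b in zip(u, v))

# Toy data (hypothetical): three binary attributes.
POS = [(1, 1, 0), (1, 1, 1), (0, 0, 1)]
NEG = [(1, 0, 0), (0, 1, 0), (0, 0, 0)]

patterns = prime_positive_patterns(POS, NEG, max_degree=2)
vectors = pattern_space(POS, patterns)
distances = [[hamming(u, v) for v in vectors] for u in vectors]
```

The resulting `vectors` (or the `distances` matrix) can be handed to any standard clustering routine; which clustering technique and which dissimilarity the paper actually uses is not stated in the abstract, so the Hamming distance here is only one plausible choice.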



Author information


Corresponding author

Correspondence to Peter L. Hammer.


About this article

Cite this article

Alexe, G., Alexe, S. & Hammer, P. Pattern-based clustering and attribute analysis. Soft Comput 10, 442–452 (2006). https://doi.org/10.1007/s00500-005-0505-9


  • DOI: https://doi.org/10.1007/s00500-005-0505-9
