DOI: 10.1145/1150402.1150438
Article

Fast mining of high dimensional expressive contrast patterns using zero-suppressed binary decision diagrams

Published: 20 August 2006

ABSTRACT

Contrast patterns are an important way of comparing multi-dimensional datasets. Such patterns capture regions of high difference between two classes of data, and are useful both to human experts and for constructing classifiers. However, mining such patterns is particularly challenging when the number of dimensions is large. This paper describes a new technique for mining several varieties of contrast pattern, based on Zero-Suppressed Binary Decision Diagrams (ZBDDs), a powerful data structure for manipulating sparse data. We study the mining of both simple contrast patterns, such as emerging patterns, and more novel and complex contrasts, which we call disjunctive emerging patterns. A performance study demonstrates that our ZBDD technique is highly scalable, substantially improves on state-of-the-art mining for emerging patterns, and can be effective for discovering complex contrasts from datasets with thousands of attributes.
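
As a rough illustration of what an emerging pattern is, the sketch below enumerates small itemsets by brute force and keeps those whose support grows by at least a chosen factor from one class to the other. This is not the ZBDD technique described in the abstract. The function names, the `min_growth` threshold, the `max_len` cap and the toy data are all assumptions made for illustration, and the growth-rate definition follows the common formulation of emerging patterns rather than anything stated on this page.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def growth_rate(itemset, negatives, positives):
    """Ratio of support in the positive class to support in the negative class."""
    neg = support(itemset, negatives)
    pos = support(itemset, positives)
    if neg == 0.0:
        # Present only in the positive class: infinite growth (a "jumping" pattern).
        return float("inf") if pos > 0.0 else 0.0
    return pos / neg


def emerging_patterns(negatives, positives, min_growth, max_len=2):
    """Brute-force search over itemsets of up to `max_len` items (hypothetical helper)."""
    items = sorted(set().union(*positives))
    found = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            rate = growth_rate(candidate, negatives, positives)
            if rate >= min_growth:
                found[candidate] = rate
    return found


# Toy data (made up): each transaction is a set of attribute labels.
positives = [frozenset("ab"), frozenset("abc"), frozenset("ac")]
negatives = [frozenset("bc"), frozenset("c"), frozenset("b")]
patterns = emerging_patterns(negatives, positives, min_growth=2.0)
```

The number of candidate itemsets grows exponentially with the number of attributes, which is the scalability problem the abstract refers to when the number of dimensions is large.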

    • Published in

      KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2006
      986 pages
      ISBN:1595933395
      DOI:10.1145/1150402

      Copyright © 2006 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States
