ABSTRACT
Contrast patterns are an important tool for comparing multi-dimensional datasets. They capture regions of high difference between two classes of data and are useful both to human experts and for constructing classifiers. However, mining such patterns is particularly challenging when the number of dimensions is large. This paper describes a new technique for mining several varieties of contrast pattern, based on Zero-Suppressed Binary Decision Diagrams (ZBDDs), a powerful data structure for manipulating sparse data. We study the mining of both simple contrast patterns, such as emerging patterns, and more novel and complex contrasts, which we call disjunctive emerging patterns. A performance study demonstrates that our ZBDD technique is highly scalable, substantially improves on state-of-the-art mining of emerging patterns, and can be effective for discovering complex contrasts in datasets with thousands of attributes.
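To make the notion of a simple contrast pattern concrete, the following is a minimal sketch of emerging-pattern discovery by brute-force enumeration. It is not the ZBDD-based technique the paper describes. It assumes the usual growth-rate definition of an emerging pattern (support in the target class divided by support in the background class), and the function names, the toy data, and the `max_len` cap are all hypothetical choices made for illustration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def growth_rate(itemset, background, target):
    """Support in the target class divided by support in the background class."""
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_tg == 0:
        return 0.0           # never occurs in the target class: no contrast
    if s_bg == 0:
        return float("inf")  # occurs only in the target class ("jumping" pattern)
    return s_tg / s_bg

def emerging_patterns(background, target, min_growth, max_len=2):
    """Enumerate itemsets of up to max_len items with growth rate >= min_growth.

    Exhaustive enumeration is exponential in the number of items, which is
    the scalability problem the abstract refers to.
    """
    items = sorted(set().union(*target))
    found = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            if growth_rate(candidate, background, target) >= min_growth:
                found.append(candidate)
    return found

# Toy data (illustrative only): each transaction is a set of items.
positive = [frozenset("abc"), frozenset("abd"), frozenset("acd")]
negative = [frozenset("bc"), frozenset("bd"), frozenset("cd"), frozenset("a")]

if __name__ == "__main__":
    for pattern in emerging_patterns(negative, positive, min_growth=3.0):
        print(sorted(pattern), growth_rate(pattern, negative, positive))
```

On this toy data, item `a` appears in every positive transaction but in only one of four negative ones, so its growth rate is 4. The pairs `{a, b}`, `{a, c}` and `{a, d}` never occur in the negative class at all, so their growth rate is infinite.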
Index Terms
- Fast mining of high dimensional expressive contrast patterns using zero-suppressed binary decision diagrams