ABSTRACT
Data that can conceptually be viewed as tree structures abounds in domains such as bioinformatics, web logs, XML databases, and multi-relational databases. Besides structural information such as nodes and edges, tree-structured data often also contains attributes that represent properties of nodes. Current algorithms for finding frequent patterns in structured data do not take these attributes into account, and hence potentially useful information is neglected. We present FAT-miner, an algorithm for frequent pattern discovery in tree-structured data with attributes. To illustrate the applicability of FAT-miner, we use it to explore the properties of good and bad loans in a well-known multi-relational financial database.
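To make the notion of tree-structured data with attributes concrete, the following is a minimal sketch and not the FAT-miner algorithm itself. It assumes a hypothetical `Node` class holding a label, a set of attributes, and children, and it counts in how many trees of a small database a single-node pattern (a label plus a subset of attributes) occurs. The loan and account labels and the attribute strings are invented for illustration and are not taken from the financial database.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterator, List


@dataclass
class Node:
    """Hypothetical tree node: a label, a set of attributes, ordered children."""
    label: str
    attributes: FrozenSet[str] = frozenset()
    children: List["Node"] = field(default_factory=list)


def nodes(tree: Node) -> Iterator[Node]:
    """Yield every node of the tree in preorder."""
    yield tree
    for child in tree.children:
        yield from nodes(child)


def occurs(tree: Node, label: str, attrs: FrozenSet[str]) -> bool:
    """True if some node carries the label and contains all of attrs."""
    return any(n.label == label and attrs <= n.attributes for n in nodes(tree))


def support(database: List[Node], label: str, attrs: FrozenSet[str]) -> int:
    """Number of trees in the database containing the single-node pattern."""
    return sum(occurs(tree, label, attrs) for tree in database)


# Illustrative database of three small trees (all values are made up).
db = [
    Node("loan", frozenset({"status=good"}),
         [Node("account", frozenset({"frequency=monthly"}))]),
    Node("loan", frozenset({"status=bad"}),
         [Node("account", frozenset({"frequency=monthly"}))]),
    Node("loan", frozenset({"status=good"}),
         [Node("account", frozenset({"frequency=weekly"}))]),
]

min_support = 2  # assumed threshold: a pattern is frequent if it occurs in >= 2 trees
pattern = ("account", frozenset({"frequency=monthly"}))
print(support(db, *pattern) >= min_support)
```

A pattern miner that ignores attributes would see only the labels `loan` and `account` here, so all three trees would look identical. Counting attribute subsets per node is what distinguishes them.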