Abstract
In this paper, we present a general framework for mining patterns under antimonotone constraints. The framework structures the pattern space in a way that facilitates the integration of constraints into the mining process. We also introduce a strategy that uses background information on the data to speed up the mining process. We illustrate our approach on a popular structured data mining problem, frequent subgraph mining, and show, through experiments on synthetic and real-life data, that this general approach has advantages over state-of-the-art pattern mining algorithms.
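To make the notion of an antimonotone constraint concrete, the following is a minimal sketch and not the framework described in this paper. It assumes the simpler itemset setting rather than subgraphs, and uses a standard levelwise search. A constraint is antimonotone if, whenever a pattern satisfies it, every sub-pattern does too, so a pattern that violates it can be discarded together with all of its extensions. The names `mine_antimonotone`, `min_support`, and `max_size` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def min_support(threshold):
    """Antimonotone: a superset never occurs in more transactions than its subsets."""
    def check(pattern, transactions):
        return sum(1 for t in transactions if pattern <= t) >= threshold
    return check


def max_size(limit):
    """Antimonotone: if a pattern is small enough, so is every sub-pattern."""
    def check(pattern, transactions):
        return len(pattern) <= limit
    return check


def mine_antimonotone(transactions, constraints):
    """Levelwise search over itemsets (illustrative stand-in for subgraphs).

    A candidate of size k+1 is generated only if all of its size-k
    sub-patterns satisfied every constraint, which is sound precisely
    because the constraints are assumed to be antimonotone.
    """
    def satisfies(pattern):
        # all() short-circuits, so listing cheap checks first avoids
        # evaluating expensive ones on patterns that fail early.
        return all(c(pattern, transactions) for c in constraints)

    items = sorted({item for t in transactions for item in t})
    level = [p for p in (frozenset([i]) for i in items) if satisfies(p)]
    results = []
    while level:
        results.extend(level)
        valid = set(level)
        k = len(level[0])
        candidates = set()
        for a in level:
            for b in level:
                union = a | b
                if len(union) != k + 1:
                    continue
                # Prune: every size-k sub-pattern must already be valid.
                if all(frozenset(s) in valid for s in combinations(union, k)):
                    candidates.add(union)
        level = [p for p in sorted(candidates, key=sorted) if satisfies(p)]
    return results


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for pattern in mine_antimonotone(data, [min_support(3)]):
        print(sorted(pattern))
```

In this sketch the constraints are passed as a list and evaluated in order, so a cheap check such as `max_size` can be placed before a costly one such as `min_support`. For subgraph patterns the same pruning principle applies, but candidate generation and support counting additionally require handling graph isomorphism, which this sketch does not attempt.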
Additional information
Responsible editor: M.J. Zaki.
Cite this article
Desrosiers, C., Galinier, P., Hertz, A. et al. Improving constrained pattern mining with first-fail-based heuristics. Data Min Knowl Disc 23, 63–90 (2011). https://doi.org/10.1007/s10618-010-0199-1
DOI: https://doi.org/10.1007/s10618-010-0199-1