Improving constrained pattern mining with first-fail-based heuristics

Desrosiers, Christian; Galinier, Philippe; Hertz, Alain; Hansen, Pierre

doi:10.1007/s10618-010-0199-1

Published: 21 August 2010

Volume 23, pages 63–90, (2011)

Data Mining and Knowledge Discovery

Christian Desrosiers¹, Philippe Galinier², Alain Hertz² & Pierre Hansen³

Abstract

In this paper, we present a general framework to mine patterns with antimonotone constraints. This framework uses a technique that structures the pattern space in a way that facilitates the integration of constraints within the mining process. We also introduce a powerful strategy that uses background information on the data to speed up the mining process. We illustrate our approach on a popular structured data mining problem, the frequent subgraph mining problem, and show, through experiments on synthetic and real-life data, that this general approach has advantages over state-of-the-art pattern mining algorithms.
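
To make the notion of an antimonotone constraint concrete: a constraint is antimonotone when any pattern that violates it also causes all of its super-patterns to violate it, so a failing pattern can be pruned together with everything that extends it. The sketch below illustrates this on itemsets rather than subgraphs, using a plain levelwise search. It is not the framework proposed in the paper, and the names `mine_levelwise`, `min_support`, and `max_total_weight` are hypothetical helpers introduced only for this illustration.

```python
from itertools import combinations


def mine_levelwise(transactions, constraints):
    """Return every itemset that satisfies all the given constraints.

    Assumption: every constraint is antimonotone, so a pattern that fails
    can be discarded along with all of its supersets.
    """
    def ok(pattern):
        return all(c(pattern, transactions) for c in constraints)

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if ok(frozenset([i]))]
    result = []
    while level:
        result.extend(level)
        valid = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            c = a | b
            # Join two valid patterns that differ by exactly one item.
            if len(c) == len(a) + 1:
                # Antimonotone pruning: every sub-pattern must be valid.
                if all(c - {x} in valid for x in c):
                    candidates.add(c)
        level = [c for c in candidates if ok(c)]
    return result


def min_support(k):
    """Frequency constraint: the pattern occurs in at least k transactions."""
    return lambda p, db: sum(p <= t for t in db) >= k


def max_total_weight(weights, limit):
    """Antimonotone when all weights are non-negative (hypothetical example)."""
    return lambda p, db: sum(weights[i] for i in p) <= limit


if __name__ == "__main__":
    db = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
    weights = {"a": 1, "b": 2, "c": 3}
    found = mine_levelwise(db, [min_support(2), max_total_weight(weights, 4)])
    for p in sorted(found, key=lambda s: (len(s), sorted(s))):
        print(sorted(p))
```

Both example constraints are checked on every candidate, but the sub-pattern test removes many candidates before any constraint is evaluated, which is where the antimonotone property pays off.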

Author information

Authors and Affiliations

Ecole de Technologie Supérieure, 1100, Notre-Dame O., Montreal, QC, H3C 1K3, Canada
Christian Desrosiers
Ecole Polytechnique de Montréal, C.P. 6079 succ. Centre-ville, Montreal, QC, H3C 3A7, Canada
Philippe Galinier & Alain Hertz
HEC Montréal, 3000, Cote-Sainte-Catherine, Montreal, QC, H3T 2A7, Canada
Pierre Hansen

Corresponding author

Correspondence to Christian Desrosiers.

Additional information

Responsible editor: M.J. Zaki.

About this article

Cite this article

Desrosiers, C., Galinier, P., Hertz, A. et al. Improving constrained pattern mining with first-fail-based heuristics. Data Min Knowl Disc 23, 63–90 (2011). https://doi.org/10.1007/s10618-010-0199-1

Received: 09 October 2009
Accepted: 09 August 2010
Published: 21 August 2010
Issue Date: July 2011
DOI: https://doi.org/10.1007/s10618-010-0199-1
