An integrated, generic approach to pattern mining: data mining template library

Chaoji, Vineet; Al Hasan, Mohammad; Salem, Saeed; Zaki, Mohammed J.

doi:10.1007/s10618-008-0098-x

An integrated, generic approach to pattern mining: data mining template library

Published: 18 May 2008

Volume 17, pages 457–495, (2008)
Cite this article

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Vineet Chaoji¹,
Mohammad Al Hasan¹,
Saeed Salem¹ &
…
Mohammed J. Zaki¹

401 Accesses
21 Citations
3 Altmetric
Explore all metrics

Abstract

Frequent pattern mining (FPM) is an important data mining paradigm to extract informative patterns like itemsets, sequences, trees, and graphs. However, no practical framework for integrating the FPM tasks has been attempted. In this paper, we describe the design and implementation of the Data Mining Template Library (DMTL) for FPM. DMTL utilizes a generic data mining approach, where all aspects of mining are controlled via a set of properties. It uses a novel pattern property hierarchy to define and mine different pattern types. This property hierarchy can be thought of as a systematic characterization of the pattern space, i.e., a meta-pattern specification that allows the analyst to specify new pattern types, by extending this hierarchy. Furthermore, in DMTL all aspects of mining are controlled by a set of different mining properties. For example, the kind of mining approach to use, the kind of data types and formats to mine over, the kind of back-end storage manager to use, are all specified as a list of properties. This provides tremendous flexibility to customize the toolkit for various applications. Flexibility of the toolkit is exemplified by the ease with which support for a new pattern can be added. Experiments on synthetic and public dataset are conducted to demonstrate the scalability provided by the persistent back-end in the library. DMTL been publicly released as open-source software (http://dmtl.sourceforge.net/), and has been downloaded by numerous researchers from all over the world.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Article 12 April 2024

Big data analytics on Apache Spark

Article 13 October 2016

Big data preprocessing: methods and prospects

Article Open access 01 November 2016

References

Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: ACM SIGMOD conference on management of data
Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo AI (1996) Fast discovery of association rules. In: Advances in knowledge discovery and data mining. AAAI Press, Menlo Park, CA, pp 307–328
Agrawal R, Srikant R (1995) Mining sequential patterns. In: 11th International conference on data engineering
Antunes C, Oliveira AL (2004) Sequential pattern mining algorithms: Trade-offs between speed and memory. In: 2nd International workshop on mining graphs, trees and sequences with ECML/PKDD
Asai T, Abe K, Kawasoe S, Arimura H, Satamoto H, Arikawa S (2002) Efficient substructure discovery from large semi-structured data. In: 2nd SIAM international conference on data mining
Asai T, Arimura H, Uno T, Nakano S (2003) Discovering frequent substructures in large unordered trees. In: 6th International conference on discovery science
Ayres J, Flannick J, Gehrke JE, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: ACM SIGKDD international conference on knowledge discovery and data mining
Balcazar JL, Casas-Garriga G (2005) On horn axiomatizations for sequential data. In: 10th International confererence on database theory
Bayardo RJ Jr (1998) Efficiently mining long patterns from databases. In: SIGMOD ’98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data. ACM, New York, USA, pp 85–93
Brin S, Motwani R, Ullman J, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. In: ACM SIGMOD conference on management of data
Buehrer G, Parthasarathy S, Ghoting A (2006). Out-of-core frequent pattern mining on a commodity PC. In: ACM SIGKDD international conference on knowledge discovery and data mining
Burdick D, Calimlim M, Gehrke J (2001a) MAFIA: a maximal frequent itemset algorithm for transactional databases. In: IEEE international conference on data engineering
Burdick D, Calimlim M, Gehrke J (2001b) MAFIA: a maximal frequent itemset algorithm for transactional databases. In: 17th International conference on data engineering
Chi Y, Yang Y, Muntz RR (2003) Indexing and mining free trees. In: 3rd IEEE international conference on data mining
Chi Y, Yang Y, Muntz RR (2004a) HybridTreeMiner: an efficient algorihtm for mining frequent rooted trees and free trees using canonical forms. In: 16th International conference on scientific and statistical database management
Chi Y, Yang Y, Xia Y, Muntz RR (2004b) CMTreeMiner: mining both closed and maximal frequent subtrees. In: 8th Pacific-Asia conferernce on knowledge discovery and data mining
Cook D, Holder L (1994) Substructure discovery using minimal description length and background knowledge. J Arti Intell Res 1: 231–255
Google Scholar
Dehaspe L, Toivonen H, King R (1998) Finding frequent substructures in chemical compounds. In: 4th ACM SIGKDD international conference knowledge discovery and data mining
Ganter B, Wille R (1999) Formal concept analysis: mathematical foundations. Springer-Verlag
Garofalakis M, Rastogi R, Shim K (1999) SPIRIT: sequential pattern mining with regular expression constraints. In: 25th International conference on very large data bases
Ghoting A, Buehrer G, Parthasarathy S, Kim D, Nguyen A, Chen Y-K, Dubey P (2005) Cache-conscious frequent pattern mining on a modern processor. In: 31st International conference on very large data bases
Goethals B, Zaki MJ (2003) Advances in frequent itemset mining implementations: report on FIMI’03. SIGKDD Explor 6: 109–117
Article Google Scholar
Gschwind T (2001) PSTL—A C++ Persistent Standard Template Library. In: 6th USENIX conference on object-oriented technologies and systems
Han J, Pei J, Yin Y (2000a) Mining frequent patterns without candidate generation. In: ACM SIGMOD conference on management of data
Han J, Pei J, Yin Y (2000b). Mining frequent patterns without candidate generation. In: ACM SIGMOD conference on management of data
Hasan MA, Chaoji V, Salem S, Zaki M (2005) DMTL: A generic Data Mining Template Library. In: 1st Workshop on library-centric software design (with OOPSLA)
Horváth T, Ramon J, Wrobel S (2006) Frequent subgraph mining in outerplanar graphs. In: 12th ACM SIGKDD international conference on knowledge discovery and data mining
Huan J, Wang W, Prins J (2003a) Efficient mining of frequent subgraphs in the presence of isomorphism. In: IEEE international conference on data mining
Huan J, Wang W, Prins J (2003b) Efficient mining of frequent subgraphs in the presence of isomorphism (Technical report TR03-021). University of North Carolina
Inokuchi A, Washio T, Motoda H (2000) An apriori-based algorithm for mining frequent substructures from graph data. In: 4th European conference on principles of knowledge discovery and data mining
Inokuchi A, Washio T, Motoda H (2003) Complete mining of frequent patterns from graphs: Mining graph data. Machine Learn 50: 321–354
Article MATH Google Scholar
Kramer S, Raedt LD, Helma C (2001) Molecular feature mining in HIV data. In: ACM SIGKDD international conference on knowledge discovery and data mining
Kuramochi M, Karypis G (2001). Frequent subgraph discovery. In: 1st IEEE international conference on data mining
Kuramochi M, Karypis G (2004) An efficient algorithm for discovering frequent subgraphs. IEEE Transactions on Knowledge and Data Engineering 16: 1038–1051
Article Google Scholar
Mannila H, Toivonen H. (1996) Discovering generalized episodes using minimal occurences. In: 2nd International conference knowledge discovery and data mining
Mannila H, Toivonen H. (1997) Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery 1: 241–258
Article Google Scholar
Mannila H, Toivonen H, Verkamo I (1995) Discovering frequent episodes in sequences. In: 1st International conference knowledge discovery and data mining
Musser D, Derge G, Saini A. (2001) STL tutorial and reference guide, 2nd edition. Addison-Wesley
Nijssen S, Kok J (2003) Efficient discovery of frequent unordered trees. In: 1st Internationall workshop on mining graphs, trees and sequences
Nijssen S, Kok J (2004) A quickstart in frequent structure mining can make a difference. In: ACM SIGKDD international conference on knowledge discovery and data mining
Oates T, Schmill MD, Jensen D, Cohen PR (1997) A family of algorithms for finding temporal structure in data. In: 6th International workshop on AI and statistics
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: 7th International conference on database theory
Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M-C (2001). PrefixSpan: mining sequential patterns efficiently by prefixprojected pattern growth. In: IEEE international conference on data engineering
Savasere A, Omiecinski E, Navathe S (1995) An efficient algorithm for mining association rules in large databases. In: 21st International conference on very large data bases
Shasha D, Wang J, Zhang S (2004) Unordered tree mining with applications to phylogeny. In: IEEE international conference on data engineering
Siek J, Lee L, Lumsdaine A (2002). The boost graph library. Addison-Wesley
Srikant R, Agrawal R (1996) Mining sequential patterns: Generalizations and performance improvements. In: 5th International conference extending database technology
Termier A, Rousset M-C, Sebag M (2002) TreeFinder: a first step towards xml data mining. In: IEEE international conference on data mining
Termier A, Rousset M-C, Sebag M (2004) Dryade: a new approach for discovering closed frequent trees in heterogeneous tree databases. In: IEEE international conference on data mining
Ullmann JR (1976) An algorithm for subgraph isomorphism. J ACM 23: 31–
Article MathSciNet Google Scholar
Wang C, Hong M, Pei J, Zhou H, Wang W, Shi B (2004) Efficient pattern-growth methods for frequent tree pattern mining. In: Pacific-Asia conference on knowledge discovery and data mining
Wang J, Han J (2004) BIDE: efficient mining of frequent closed sequences. In: IEEE international conference on data engineering
Wang J, Han J, Pei J. (2003). CLOSET+: searching for the best strategies for mining frequent closed itemsets. In: ACM SIGKDD international conference on knowledge discovery and data mining
Wang K, Liu H (1998) Discovering typical structures of documents: a road map approach. In: ACM SIGIR international conference on information retrieval
Witten I, Frank E (1999) Data mining: practical machine learning tools and techniques with java implementations. Morgan Kauffman
Xiao Y, Yao J-F, Li Z, Dunham MH (2003) Efficient data mining for maximal frequent subtrees. In: IEEE international conference on data mining
Yan X, Han J (2002a) gSpan: graph-based substructure pattern mining. In: IEEE international conference on data mining
Yan X, Han J (2002b) gSpan: graph-based substructure pattern mining (Technical report UIUCDCS-R-2002-2296). University of Illinois at Urbana-Champaign
Yan X, Han J (2003) CloseGraph: mining closed frequent graph patterns In: ACM SIGKDD international conference on knowledge discovery and data mining
Yoshida K, Motoda H (1995) CLIP: concept learning from inference patterns. Artif Intel 75: 63–92
Article Google Scholar
Zaki MJ (2000a) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12: 372–390
Article Google Scholar
Zaki MJ (2000b) Sequences mining in categorical domains: Incorporating constraints. In: 9th International conference on information and knowledge management
Zaki MJ (2001) SPADE: An efficient algorithm for mining frequent sequences. Machine Learn J 42: 31–60
Article MATH Google Scholar
Zaki MJ (2002) Efficiently mining frequent trees in a forest. In: 8th ACM SIGKDD international conference on knowledge discovery and data mining
Zaki MJ (2005a) Efficiently mining frequent embedded unordered trees. Fundamenta Informaticae 66: 33–52
MATH MathSciNet Google Scholar
Zaki MJ (2005b) Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Trans Knowledge Data Eng 17: 1021–1035
Article Google Scholar
Zaki MJ, Gouda K (2003) Fast vertical mining using Diffsets. In: 9th ACM SIGKDD international conference on knowledge discovery and data mining, pp 326–335
Zaki MJ, Hsiao C-J (2002) ChARM: an efficient algorithm for closed itemset mining. In: 2nd SIAM international conference on data mining
Zaki MJ, Hsiao C-J (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17: 462–478
Article Google Scholar
Zaki MJ, Parimi N, De N, Gao F, Phoophakdee B, Urban J, Chaoji V, Hasan M, Salem S (2004) Towards generic pattern mining. In: International conference on formal concept analysis (Invited paper)
Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) New algorithms for fast discovery of association rules. In: 3rd International conference on knowledge discovery and data mining
Zou B, Ma X, Kemme B, Newton G, Precu D (2006) Data mining using relational database management systems. In: Pacific-asia conference on knowledge discovery and data mining

Download references

Author information

Authors and Affiliations

Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, 12180, USA
Vineet Chaoji, Mohammad Al Hasan, Saeed Salem & Mohammed J. Zaki

Authors

Vineet Chaoji
View author publications
You can also search for this author in PubMed Google Scholar
Mohammad Al Hasan
View author publications
You can also search for this author in PubMed Google Scholar
Saeed Salem
View author publications
You can also search for this author in PubMed Google Scholar
Mohammed J. Zaki
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mohammed J. Zaki.

Additional information

Responsible editor: Hannu Toivonen.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chaoji, V., Al Hasan, M., Salem, S. et al. An integrated, generic approach to pattern mining: data mining template library. Data Min Knowl Disc 17, 457–495 (2008). https://doi.org/10.1007/s10618-008-0098-x

Download citation

Received: 08 May 2007
Accepted: 29 April 2008
Published: 18 May 2008
Issue Date: December 2008
DOI: https://doi.org/10.1007/s10618-008-0098-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An integrated, generic approach to pattern mining: data mining template library

Abstract

Access this article

Similar content being viewed by others

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Big data analytics on Apache Spark

Big data preprocessing: methods and prospects

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

An integrated, generic approach to pattern mining: data mining template library

Abstract

Access this article

Similar content being viewed by others

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Big data analytics on Apache Spark

Big data preprocessing: methods and prospects

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation