Chopper: Efficient algorithm for tree mining

Wang, Chen; Hong, Ming-Sheng; Wang, Wei; Shi, Bai-Le

doi:10.1007/BF02944901

Knowledge and Data Processing
Published: May 2004

Volume 19, pages 309–319 (2004)

Journal of Computer Science and Technology

Abstract

With the development of the Internet, frequent pattern mining has been extended to more complex patterns, as in tree mining and graph mining. Such applications arise in complex domains such as bioinformatics and web mining. In this paper, we present a novel algorithm, named Chopper, to discover frequent subtrees from ordered labeled trees. An extensive performance study shows that the newly developed algorithm outperforms TreeMiner V, one of the fastest methods proposed previously, in mining large databases. At the end of this paper, potential improvements to Chopper are discussed.

Author information

Authors and Affiliations

Department of Computing and Information Technology, Fudan University, 200433, Shanghai, P.R. China
Chen Wang, Ming-Sheng Hong, Wei Wang & Bai-Le Shi

Corresponding author

Correspondence to Chen Wang.

Additional information

This paper is supported by the Key Program of National Natural Science Foundation of China (Grant No. 69933010) and the National High-Tech Development 863 Program of China (Grant Nos. 2002AA4Z3430 and 2002AA231041).

Chen Wang was born in 1976. He received his B.E. degree and M.S. degree in computer science from Soochow University in 1999 and 2002 respectively. He is currently a Ph.D. candidate in computer science at Fudan University. His research interests include data mining, database and knowledge base.

Qing-Qing Yuan was born in 1978. She received her B.E. degree and M.S. degree in computer science from Fudan University in 2000 and 2003 respectively. Her research interests include data mining, database and knowledge base.

Hao-Feng Zhou was born in 1975. He received his B.E. degree in computer science from Shanghai University in 1997, and his M.S. and Ph.D. degrees in computer science from Fudan University in 2000 and 2003 respectively. His research interests include data mining, database and knowledge base.

Wei Wang was born in 1970. He received his M.S. degree in 1992 and his Ph.D. degree in 1998. He is now an associate professor in the Dept. of Computing and Information Technology, Fudan University. His main research areas include spatial-temporal databases, constraint databases, index technology and semistructured databases.

Bai-Le Shi was born in 1935. He received his M.S. degree in 1956. He is now a chief professor in the Dept. of Computing and Information Technology, Fudan University. His main research areas include object-oriented databases, knowledge databases and digital libraries.

About this article

Cite this article

Wang, C., Hong, MS., Wang, W. et al. Chopper: Efficient algorithm for tree mining. J. Comput. Sci. & Technol. 19, 309–319 (2004). https://doi.org/10.1007/BF02944901

Received: 25 March 2003
Revised: 11 August 2003
Issue Date: May 2004
DOI: https://doi.org/10.1007/BF02944901
