Chopper: Efficient algorithm for tree mining

  • Knowledge and Data Processing
Journal of Computer Science and Technology

Abstract

With the development of the Internet, frequent pattern mining has been extended to more complex patterns, such as those handled by tree mining and graph mining. Such applications arise in complex domains like bioinformatics and web mining. In this paper, we present a novel algorithm, named Chopper, to discover frequent subtrees from ordered labeled trees. An extensive performance study shows that the newly developed algorithm outperforms TreeMiner V, one of the fastest methods proposed previously, in mining large databases. At the end of this paper, potential improvements to Chopper are discussed.
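
To make the problem statement concrete, the following is a minimal sketch of what counting the support of a subtree pattern in a database of ordered labeled trees can look like. It is not the Chopper algorithm, whose details are not given in the abstract. The `Node` class and the functions `matches_at`, `contains`, and `support` are hypothetical names introduced for illustration, and the sketch assumes the simplest notion of containment (parent-child edges and left-to-right sibling order are both preserved), which may differ from the subtree definition used in the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of an ordered labeled tree; children are kept left to right."""
    label: str
    children: List["Node"] = field(default_factory=list)


def matches_at(pattern: Node, node: Node) -> bool:
    """True if `pattern` occurs rooted at `node`, preserving sibling order."""
    if pattern.label != node.label:
        return False
    i = 0
    for p_child in pattern.children:
        # Greedily take the leftmost remaining child that matches p_child.
        while i < len(node.children) and not matches_at(p_child, node.children[i]):
            i += 1
        if i == len(node.children):
            return False
        i += 1
    return True


def contains(pattern: Node, tree: Node) -> bool:
    """True if `pattern` occurs at any node of `tree`."""
    return matches_at(pattern, tree) or any(contains(pattern, c) for c in tree.children)


def support(pattern: Node, database: List[Node]) -> int:
    """Number of database trees that contain `pattern` at least once."""
    return sum(1 for tree in database if contains(pattern, tree))


# Toy database (illustrative data, not from the paper).
database = [
    Node("A", [Node("B"), Node("C")]),
    Node("A", [Node("C"), Node("B")]),               # same labels, different order
    Node("A", [Node("B", [Node("D")]), Node("C")]),
]
pattern = Node("A", [Node("B"), Node("C")])
print(support(pattern, database))
```

Under this definition, a pattern is called frequent when its support reaches a user-chosen minimum threshold. The second tree does not count toward the support because its children appear in the opposite order, which is what distinguishes ordered from unordered tree mining.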

Author information

Corresponding author

Correspondence to Chen Wang.

Additional information

This paper is supported by the Key Program of the National Natural Science Foundation of China (Grant No. 69933010) and the National High-Tech Development 863 Program of China (Grant Nos. 2002AA4Z3430 and 2002AA231041).

Chen Wang was born in 1976. He received his B.E. and M.S. degrees in computer science from Soochow University in 1999 and 2002, respectively. He is currently a Ph.D. candidate in computer science at Fudan University. His research interests include data mining, databases, and knowledge bases.

Qing-Qing Yuan was born in 1978. She received her B.E. degree and M.S. degree in computer science from Fudan University in 2000 and 2003 respectively. Her research interests include data mining, database and knowledge base.

Hao-Feng Zhou was born in 1975. He received his B.E. degree in computer science from Shanghai University in 1997, and his M.S. and Ph.D. degrees in computer science from Fudan University in 2000 and 2003, respectively. His research interests include data mining, databases, and knowledge bases.

Wei Wang was born in 1970. He received his M.S. degree in 1992 and his Ph.D. degree in 1998. He is now an associate professor in the Dept. of Computing and Information Technology, Fudan University. His main research areas include spatial-temporal databases, constraint databases, index technology, and semistructured databases.

Bai-Le Shi was born in 1935. He received his M.S. degree in 1956. He is now a chief professor in the Dept. of Computing and Information Technology, Fudan University. His main research areas include object-oriented databases, knowledge databases, and digital libraries.

About this article

Cite this article

Wang, C., Hong, MS., Wang, W. et al. Chopper: Efficient algorithm for tree mining. J. Comput. Sci. & Technol. 19, 309–319 (2004). https://doi.org/10.1007/BF02944901

  • DOI: https://doi.org/10.1007/BF02944901
