
Extracting frequent connected subgraphs from large graph sets

  • Software Engineering
Journal of Computer Science and Technology

Abstract

Mining frequent patterns from datasets is one of the key successes of data mining research. Most current studies focus on datasets whose elements are independent, such as the items in a market basket. However, real-world objects are often closely related to one another. The objective of this paper is to extract frequent patterns from these relations. The authors model the relations as graphs and select a simple type of graph for analysis. By combining graph theory with algorithms for generating frequent patterns, they propose a new algorithm, called Topology, that mines these graphs efficiently. The performance of the algorithm is evaluated experimentally on synthetic datasets and real data, and the results show that Topology performs well. The paper closes by discussing potential improvements.
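To make the problem setting concrete, here is a minimal sketch of the simplest case of frequent subgraph mining: counting which single labeled edges occur in enough graphs of a set. It is not the Topology algorithm, whose details the abstract does not give. The graph representation (a dictionary of vertex labels plus a list of labeled edges), the function name `frequent_edges`, the parameter `min_support`, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def frequent_edges(graphs, min_support):
    """Return labeled edges that occur in at least min_support graphs.

    Each graph is a pair (vertex_labels, edges), where vertex_labels maps
    a vertex id to its label and edges is a list of (u, v, edge_label).
    This representation is an assumption made for this sketch.
    """
    support = Counter()
    for vertex_labels, edges in graphs:
        # Collect each distinct labeled edge once per graph, so a pattern
        # repeated inside one graph counts only once toward its support.
        seen = set()
        for u, v, edge_label in edges:
            # Sort the endpoint labels so the undirected edge has one form.
            a, b = sorted((vertex_labels[u], vertex_labels[v]))
            seen.add((a, edge_label, b))
        support.update(seen)
    return {p: c for p, c in support.items() if c >= min_support}


# Hypothetical sample data: three small labeled graphs.
graphs = [
    ({0: "C", 1: "O", 2: "N"}, [(0, 1, "single"), (0, 2, "single")]),
    ({0: "C", 1: "O"}, [(0, 1, "single")]),
    ({0: "C", 1: "C", 2: "O"}, [(0, 1, "double"), (1, 2, "single")]),
]
result = frequent_edges(graphs, min_support=2)
print(result)
```

A single edge is the smallest connected subgraph, so this covers only the first level of the search. Finding larger frequent connected subgraphs additionally requires growing candidate patterns and testing them for isomorphism, which is where most of the cost lies.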



Author information

Corresponding author

Correspondence to Wei Wang.

Additional information

This work was supported by the National Natural Science Foundation of China (Grant Nos. 69933030 and 60303008) and the National High-Technology Development 863 Program of China (Grant No. 2002AA4Z3430).

Wei Wang received the B.S. degree in computer science from Shandong University in 1992 and the Ph.D. degree in computer science from Fudan University in 1998. He is now a professor in the Department of Computing and Information Technology, Fudan University. His research interests include databases, data warehousing, and data mining.

Qing-Qing Yuan received the B.S. and M.S. degrees in computer science from Fudan University in 2000 and 2003, respectively. She is now a Ph.D. candidate in the Department of Computer Science, University of California, Santa Barbara. Her research interests include databases and data mining.

Hao-Feng Zhou received the B.S. degree in computer science from Shanghai University in 1997, and the M.S. and Ph.D. degrees in computer science from Fudan University in 2000 and 2003, respectively. His research interests include databases and data mining.

Ming-Sheng Hong received the B.S. degree in computer science from Fudan University in 2002 and is now a Ph.D. candidate in the Department of Computer Science, Cornell University, with research interests in databases and data mining.

Bai-Le Shi received the B.S. degree in mathematics from Peking University in 1957. He is a professor in the Department of Computing and Information Technology, Fudan University, and director of the Shanghai (International) Database Research Center. His research interests include databases, data warehousing, and digital libraries.


About this article

Cite this article

Wang, W., Yuan, QQ., Zhou, HF. et al. Extracting frequent connected subgraphs from large graph sets. J. Comput. Sci. & Technol. 19, 867–875 (2004). https://doi.org/10.1007/BF02973450


  • DOI: https://doi.org/10.1007/BF02973450
