Cl-GBI: A Novel Approach for Extracting Typical Patterns from Graph-Structured Data

Nguyen, Phu Chien; Ohara, Kouzou; Motoda, Hiroshi; Washio, Takashi

doi:10.1007/11430919_74

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3518)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Graph-Based Induction (GBI) is a machine learning technique for extracting typical patterns from graph-structured data by stepwise pair expansion (pair-wise chunking). GBI is very efficient because of its greedy search strategy; however, it suffers from the problem of overlapping subgraphs. As a result, some typical patterns cannot be discovered by GBI, even though a beam search has been incorporated in an improved version called Beam-wise GBI (B-GBI). This paper improves the search capability with a new search strategy in which frequent pairs are never chunked but are used as pseudo nodes in subsequent steps, thus allowing overlapping subgraphs to be extracted. The new algorithm, called Cl-GBI (Chunkingless GBI), was tested on two datasets, the promoter dataset from the UCI repository and the hepatitis dataset provided by Chiba University, and was shown to extract more typical patterns than B-GBI.
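
The chunkingless idea can be illustrated with a toy sketch. The code below is not the authors' implementation; it is a minimal illustration under several assumptions: a single undirected graph with node labels only, a crude pattern key in place of a real isomorphism test, and pair expansion restricted to growing an occurrence by one adjacent original node. The names `cl_gbi_sketch`, `canonical_key`, `beam`, `levels`, and `min_freq` are hypothetical. The point it tries to show is that the graph is never rewritten, so two occurrences that share a node are both counted.

```python
from collections import defaultdict


def canonical_key(nodes, labels, edges):
    """Crude pattern identity: sorted node labels plus sorted labelled edges.

    Assumption for this toy example only: this is NOT a graph isomorphism test.
    """
    node_part = tuple(sorted(labels[n] for n in nodes))
    edge_part = tuple(sorted(
        tuple(sorted((labels[u], labels[v])))
        for u, v in edges
        if u in nodes and v in nodes
    ))
    return node_part, edge_part


def cl_gbi_sketch(labels, edges, beam=2, levels=2, min_freq=2):
    """Return {pattern_key: frequency} for patterns kept as pseudo nodes."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Every original node starts as an occurrence of a trivial pattern.
    occurrences = {frozenset([n]) for n in labels}
    found = {}

    for _ in range(levels):
        # Enumerate candidate pairs: an occurrence plus one adjacent node.
        candidates = defaultdict(set)
        for occ in occurrences:
            for n in occ:
                for m in adjacency[n] - occ:
                    grown = occ | {m}
                    key = canonical_key(grown, labels, edges)
                    candidates[key].add(grown)

        # Keep the `beam` most frequent patterns not registered earlier.
        ranked = sorted(candidates.items(), key=lambda kv: (-len(kv[1]), kv[0]))
        chosen = [
            (key, occs) for key, occs in ranked
            if key not in found and len(occs) >= min_freq
        ][:beam]
        if not chosen:
            break

        for key, occs in chosen:
            found[key] = len(occs)
            # Register occurrences as pseudo nodes; the graph itself is
            # left untouched, so overlapping occurrences all survive.
            occurrences |= occs

    return found


# Path A - B - A: the two A-B pairs overlap at the B node.
labels = {1: "A", 2: "B", 3: "A"}
edges = [(1, 2), (2, 3)]
result = cl_gbi_sketch(labels, edges, beam=1, levels=2, min_freq=2)
print(result)
```

In this example the pair A-B is counted twice even though both occurrences share node 2. Under chunking, collapsing one occurrence into a single node would consume node 2 and hide the other occurrence.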

Author information

Authors and Affiliations

The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka, 567-0047, Japan
Phu Chien Nguyen, Kouzou Ohara, Hiroshi Motoda & Takashi Washio

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology, Asahidai 1-1, 923-12292, Nomi, Japan
Tu Bao Ho
University of Hong Kong, Pokfulam Road, Hong Kong, China
David Cheung
Department of Computer Science and Engineering, Arizona State University, Tempe, Arizona, USA
Huan Liu

About this paper

Cite this paper

Nguyen, P.C., Ohara, K., Motoda, H., Washio, T. (2005). Cl-GBI: A Novel Approach for Extracting Typical Patterns from Graph-Structured Data. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_74

DOI: https://doi.org/10.1007/11430919_74
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26076-9
Online ISBN: 978-3-540-31935-1
eBook Packages: Computer Science, Computer Science (R0)