Abstract
We propose a definition for frequent approximate patterns in order to model important subgraphs in a graph database with incomplete or inaccurate information. By our definition, frequent approximate patterns possess three main properties: possible absence of exact match, maximal representation, and the Apriori Property. Since approximation increases the number of frequent patterns, we present a novel randomized algorithm (called RAM) using feature retrieval. A large number of real and synthetic data sets are used to demonstrate the effectiveness and efficiency of the frequent approximate graph pattern model and the RAM algorithm.
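Two of the properties named above, possible absence of exact match and the Apriori Property, can be illustrated with a toy sketch. It is not the paper's definition or the RAM algorithm. It assumes every graph is a set of labeled edges over a shared vertex vocabulary, so matching reduces to set containment rather than subgraph isomorphism, and it assumes a pattern "approximately occurs" in a graph when at most `max_missing` of its edges are absent. The names `edge`, `approx_support`, `is_antimonotone`, and `max_missing` are hypothetical.

```python
from itertools import combinations


def edge(u, v):
    """Undirected labeled edge, represented as an unordered pair of vertex labels."""
    return frozenset((u, v))


def approx_support(pattern, database, max_missing):
    """Count graphs that contain all but at most `max_missing` edges of `pattern`.

    Assumption: graphs share one vertex vocabulary, so set difference stands in
    for a real (and far more expensive) approximate subgraph-matching test.
    """
    return sum(1 for g in database if len(pattern - g) <= max_missing)


def is_antimonotone(pattern, database, max_missing):
    """Check that no proper sub-pattern has lower support than `pattern` itself."""
    full = approx_support(pattern, database, max_missing)
    for size in range(1, len(pattern)):
        for sub in combinations(pattern, size):
            if approx_support(set(sub), database, max_missing) < full:
                return False
    return True


# Each graph is missing a different edge of the path A-B-C-D.
database = [
    {edge("A", "B"), edge("B", "C")},  # lacks C-D
    {edge("B", "C"), edge("C", "D")},  # lacks A-B
    {edge("A", "B"), edge("C", "D")},  # lacks B-C
]
pattern = {edge("A", "B"), edge("B", "C"), edge("C", "D")}

exact = approx_support(pattern, database, max_missing=0)
approx = approx_support(pattern, database, max_missing=1)
print(exact, approx, is_antimonotone(pattern, database, max_missing=1))
```

In this toy database the full path never occurs exactly, yet it occurs in every graph once one missing edge is tolerated. The anti-monotonicity check holds under this simplified model because any edge missing from a sub-pattern is also missing from the larger pattern, so the sub-pattern is matched by every graph that matches the larger one.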
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, S., Yang, J. (2008). RAM: Randomized Approximate Graph Mining. In: Ludäscher, B., Mamoulis, N. (eds) Scientific and Statistical Database Management. SSDBM 2008. Lecture Notes in Computer Science, vol 5069. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69497-7_14
DOI: https://doi.org/10.1007/978-3-540-69497-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69476-2
Online ISBN: 978-3-540-69497-7
eBook Packages: Computer Science (R0)