ABSTRACT
Mining frequent subgraph patterns in graph databases is a challenging and important problem with applications in several domains. Recently, there is a growing interest in generalizing the problem to uncertain graphs, which can model the inherent uncertainty in the data of many applications. The main difficulty in solving this problem results from the large number of candidate subgraph patterns to be examined and the large number of subgraph isomorphism tests required to find the graphs that contain a given pattern. The latter becomes even more challenging, when dealing with uncertain graphs. In this paper, we propose a method that uses an index of the uncertain graph database to reduce the number of comparisons needed to find frequent subgraph patterns. The proposed algorithm relies on the apriori property for enumerating candidate subgraph patterns efficiently. Then, the index is used to reduce the number of comparisons required for computing the expected support of each candidate pattern. It also enables additional optimizations with respect to scheduling and early termination, that further increase the efficiency of the method. The evaluation of our approach on three real-world datasets as well as on synthetic uncertain graph databases demonstrates the significant cost savings with respect to the state-of-the-art approach.
- R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB, pages 487--499, 1994. Google ScholarDigital Library
- S. Asthana, O. D. King, F. D. Gibbons, and F. P. Roth. Predicting protein complex membership using probabilistic network reliability. Genome Research, 14:1170--1175, 2004.Google ScholarCross Ref
- B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422--426, 1970. Google ScholarDigital Library
- J. Cheng, Y. Ke, and W. Ng. Graphgen: A synthetic graph generator. http://www.cse.ust.hk/graphgen/, 2006.Google Scholar
- J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. In SIGMOD, pages 857--872, 2007. Google ScholarDigital Library
- L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. An improved algorithm for matching large graphs. In 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, pages 149--159, 2001.Google Scholar
- J. Ghosh, H. Q. Ngo, S. Yoon, and C. Qiao. On a routing problem within probabilistic graphs and its application to intermittently connected networks. In INFOCOM, pages 1721--1729, 2007.Google ScholarDigital Library
- E. Gudes, S. E. Shimony, and N. Vanetik. Discovering frequent graph patterns using disjoint paths. IEEE Trans. Knowl. Data Eng., 18(11):1441--1456, 2006. Google ScholarDigital Library
- H. He and A. K. Singh. Closure-tree: An index structure for graph queries. In ICDE, page 38, 2006. Google ScholarDigital Library
- C. Helma, R. D. King, S. Kramer, and A. Srinivasan. The predictive toxicology evaluation challenge 2000--2001. Bioinformatics, 17(1):107--108, 2001.Google ScholarCross Ref
- P. Hintsanen and H. Toivonen. Finding reliable subgraphs from large probabilistic graphs. Data Min. Knowl. Discov., 17(1):3--23, 2008. Google ScholarDigital Library
- M. Hua and J. Pei. Probabilistic path queries in road networks: traffic uncertainty aware path selection. In EDBT, pages 347--358, 2010. Google ScholarDigital Library
- J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraphs in the presence of isomorphism. In ICDM, 2003. Google ScholarDigital Library
- J. Huan, W. Wang, J. Prins, and J. Yang. Spin: mining maximal frequent subgraphs from graph databases. In KDD, pages 581--586, 2004. Google ScholarDigital Library
- J. Huff and J. Haseman. Long-term chemical carcinogenesis experiments for identifying potential human cancer hazards. Environmental Health Perspectives, 96(3):23--31, 1991.Google ScholarCross Ref
- A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In PKDD, pages 13--23, 2000. Google ScholarDigital Library
- D. Kempe, J. M. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137--146, 2003. Google ScholarDigital Library
- M. Kuramochi and G. Karypis. An efficient algorithm for discovering frequent subgraphs. IEEE Trans. Knowl. Data Eng., 16(9):1038--1051, 2004. Google ScholarDigital Library
- D. Liben-Nowell and J. M. Kleinberg. The link prediction problem for social networks. In CIKM, pages 556--559, 2003. Google ScholarDigital Library
- Y. Liu, J. Li, and H. Gao. Summarizing graph patterns. In ICDE, pages 903--912, 2008. Google ScholarDigital Library
- S. Nijssen and J. N. Kok. A quickstart in frequent structure mining can make a difference. In KDD, pages 647--652, 2004. Google ScholarDigital Library
- M. Potamias, F. Bonchi, A. Gionis, and G. Kollios. k-nearest neighbors in uncertain graphs. In PVLDB, 2010. Google ScholarDigital Library
- J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23(1):31--42, 1976. Google ScholarDigital Library
- L. G. Valiant. The complexity of computing the permanent. Theor. Comput. Sci., 8:189--201, 1979.Google ScholarCross Ref
- C. Wang, W. Wang, J. Pei, Y. Zhu, and B. Shi. Scalable mining of large disk-based graph databases. In KDD, pages 316--325, 2004. Google ScholarDigital Library
- T. Washio and H. Motoda. State of the art of graph-based data mining. SIGKDD Explor. Newsl., 5(1):59--68, 2003. Google ScholarDigital Library
- X. Yan, H. Cheng, J. Han, and P. S. Yu. Mining significant graph patterns by leap search. In SIGMOD, pages 433--444, 2008. Google ScholarDigital Library
- X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In ICDM, pages 721--724, 2002. Google ScholarDigital Library
- X. Yan and J. Han. Closegraph: mining closed frequent graph patterns. In KDD, pages 286--295, 2003. Google ScholarDigital Library
- X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD, pages 335--346, 2004. Google ScholarDigital Library
- S. Zhang, J. Yang, and S. Li. Ring: An integrated method for frequent representative subgraph mining. In ICDM, pages 1082--1087, 2009. Google ScholarDigital Library
- Z. Zou, J. Li, H. Gao, and S. Zhang. Frequent subgraph pattern mining on uncertain graph data. In CIKM, pages 583--592, 2009. Google ScholarDigital Library
Index Terms
- Efficient discovery of frequent subgraph patterns in uncertain graph databases
Recommendations
Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics
KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data miningFrequent subgraph mining has been extensively studied on certain graph data. However, uncertainties are inherently accompanied with graph data in practice, and there is very few work on mining uncertain graph data. This paper investigates frequent ...
Frequent subgraph pattern mining on uncertain graph data
CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge managementGraph data are subject to uncertainties in many applications due to incompleteness and imprecision of data. Mining uncertain graph data is semantically different from and computationally more challenging than mining exact graph data. This paper ...
Mining Frequent Subgraph Patterns from Uncertain Graph Data
In many real applications, graph data is subject to uncertainties due to incompleteness and imprecision of data. Mining such uncertain graph data is semantically different from and computationally more challenging than mining conventional exact graph ...
Comments