DOI: 10.1145/1951365.1951408
research-article

Efficient discovery of frequent subgraph patterns in uncertain graph databases

Published: 21 March 2011

ABSTRACT

Mining frequent subgraph patterns in graph databases is a challenging and important problem with applications in several domains. Recently, there has been growing interest in generalizing the problem to uncertain graphs, which can model the inherent uncertainty in the data of many applications. The main difficulty in solving this problem results from the large number of candidate subgraph patterns to be examined and the large number of subgraph isomorphism tests required to find the graphs that contain a given pattern. The latter becomes even more challenging when dealing with uncertain graphs. In this paper, we propose a method that uses an index of the uncertain graph database to reduce the number of comparisons needed to find frequent subgraph patterns. The proposed algorithm relies on the apriori property to enumerate candidate subgraph patterns efficiently. The index is then used to reduce the number of comparisons required to compute the expected support of each candidate pattern. It also enables additional optimizations with respect to scheduling and early termination that further increase the efficiency of the method. An evaluation of our approach on three real-world datasets, as well as on synthetic uncertain graph databases, demonstrates significant cost savings with respect to the state-of-the-art approach.
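
To make the notions of expected support and apriori-based candidate enumeration concrete, the following Python sketch works through a deliberately simplified setting. It is not the algorithm of the paper: it assumes that edges exist independently with given probabilities, and that every vertex label is unique within a graph, so that pattern containment reduces to edge-set containment and no subgraph isomorphism test is needed. It also ignores pattern connectivity and does not model the index, scheduling, or early termination. The toy database `DB` and the names `edge`, `containment_probability`, `expected_support`, and `mine` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def edge(u, v):
    """An undirected edge between two uniquely labeled vertices (assumption)."""
    return frozenset((u, v))


# Hypothetical toy database: each uncertain graph maps an edge to the
# probability that the edge exists. Edges are assumed to be independent.
DB = [
    {edge("a", "b"): 0.9, edge("b", "c"): 0.8, edge("a", "c"): 0.5},
    {edge("a", "b"): 0.6, edge("b", "c"): 0.5},
    {edge("a", "b"): 1.0, edge("a", "c"): 0.4},
]


def containment_probability(pattern, graph):
    """Probability that every edge of the pattern exists in the graph."""
    p = 1.0
    for e in pattern:
        p *= graph.get(e, 0.0)  # an absent edge has probability 0
    return p


def expected_support(pattern, db):
    """Average containment probability over all graphs in the database."""
    return sum(containment_probability(pattern, g) for g in db) / len(db)


def mine(db, min_esup):
    """Level-wise (apriori-style) search for patterns with
    expected support >= min_esup. Patterns are frozensets of edges."""
    all_edges = sorted({e for g in db for e in g}, key=lambda e: tuple(sorted(e)))
    level = [frozenset([e]) for e in all_edges]
    frequent = {}
    while level:
        # Keep only the candidates that reach the threshold.
        current = {}
        for pattern in level:
            s = expected_support(pattern, db)
            if s >= min_esup:
                current[pattern] = s
        frequent.update(current)
        # Join frequent k-patterns into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subpattern (apriori property).
        candidates = set()
        for p, q in combinations(current, 2):
            union = p | q
            if len(union) == len(p) + 1 and all(
                union - {e} in current for e in union
            ):
                candidates.add(union)
        level = list(candidates)
    return frequent


if __name__ == "__main__":
    for pattern, support in mine(DB, 0.4).items():
        print(sorted(tuple(sorted(e)) for e in pattern), round(support, 4))
```

The pruning step is sound in this setting because each edge probability is at most 1, so adding an edge to a pattern can never increase its expected support. With general labeled graphs, the containment probability would have to account for all embeddings of the pattern, which is where the cost of subgraph isomorphism testing enters.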

    • Published in

      EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology
      March 2011
      587 pages
      ISBN: 9781450305280
      DOI: 10.1145/1951365

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 7 of 10 submissions, 70%