research-article

Efficient discovery of frequent subgraph patterns in uncertain graph databases

Authors:
Odysseas Papapetrou, L3S Research Center, Hannover, Germany
Ekaterini Ioannou, L3S Research Center, Hannover, Germany
Dimitrios Skoutas, L3S Research Center, Hannover, Germany
EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology, March 2011, Pages 355–366, https://doi.org/10.1145/1951365.1951408

Published: 21 March 2011


ABSTRACT

Mining frequent subgraph patterns in graph databases is a challenging and important problem with applications in several domains. Recently, there has been growing interest in generalizing the problem to uncertain graphs, which can model the inherent uncertainty in the data of many applications. The main difficulty in solving this problem stems from the large number of candidate subgraph patterns to be examined and the large number of subgraph isomorphism tests required to find the graphs that contain a given pattern. The latter becomes even more challenging when dealing with uncertain graphs. In this paper, we propose a method that uses an index of the uncertain graph database to reduce the number of comparisons needed to find frequent subgraph patterns. The proposed algorithm relies on the apriori property to enumerate candidate subgraph patterns efficiently. The index is then used to reduce the number of comparisons required for computing the expected support of each candidate pattern. It also enables additional optimizations with respect to scheduling and early termination, which further increase the efficiency of the method. The evaluation of our approach on three real-world datasets as well as on synthetic uncertain graph databases demonstrates significant cost savings with respect to the state-of-the-art approach.
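The central quantity in the abstract, the expected support of a candidate pattern, can be illustrated with a small brute-force sketch. It assumes a model the abstract does not spell out: each edge of an uncertain graph exists independently with a given probability, and the expected support of a pattern is the average, over the database, of the probability that the pattern is contained in a randomly drawn possible world of each graph. The graph representation, the function names (`contains`, `occurrence_probability`, `label_pairs`, `expected_support`, `is_frequent`), the label-pair filter (a crude stand-in for the paper's index) and the bound-based early stop are all hypothetical illustrations, not the authors' algorithm.

```python
from itertools import permutations, product

# Hypothetical model (an assumption, not the paper's data structures):
# a graph is (labels, edges); labels maps vertex -> label and edges holds
# undirected (u, v) pairs. A pattern's edges are a plain list of pairs; an
# uncertain graph's edges are a dict {(u, v): independent existence probability}.


def contains(pattern, labels, world_edges):
    """Brute-force, non-induced subgraph isomorphism test for tiny graphs."""
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    edge_set = {frozenset(e) for e in world_edges}
    for image in permutations(labels, len(p_nodes)):
        m = dict(zip(p_nodes, image))
        if any(p_labels[v] != labels[m[v]] for v in p_nodes):
            continue
        if all(frozenset((m[u], m[v])) in edge_set for u, v in p_edges):
            return True
    return False


def occurrence_probability(pattern, graph):
    """Pr(pattern occurs in a random possible world of graph).

    Enumerates all 2^|E| possible worlds, so it only suits tiny inputs.
    """
    labels, edges = graph
    items = list(edges.items())
    total = 0.0
    for present in product((True, False), repeat=len(items)):
        prob = 1.0
        world = []
        for (edge, p), keep in zip(items, present):
            prob *= p if keep else 1.0 - p
            if keep:
                world.append(edge)
        if prob > 0.0 and contains(pattern, labels, world):
            total += prob
    return total


def label_pairs(labels, edges):
    """Label pairs on the edges; containment requires the pattern's to be a subset."""
    return {tuple(sorted((labels[u], labels[v]))) for u, v in edges}


def expected_support(pattern, database):
    """Average occurrence probability of pattern over a non-empty database."""
    needed = label_pairs(*pattern)
    acc = 0.0
    for graph in database:
        # Cheap filter: a graph lacking some label pair contributes 0.
        if needed <= label_pairs(*graph):
            acc += occurrence_probability(pattern, graph)
    return acc / len(database)


def is_frequent(pattern, database, minsup):
    """Frequency test that stops once the outcome is decided.

    Each graph contributes at most 1, so acc / n is a lower bound and
    (acc + remaining) / n an upper bound on the expected support.
    """
    n = len(database)
    needed = label_pairs(*pattern)
    acc = 0.0
    for i, graph in enumerate(database):
        if needed <= label_pairs(*graph):
            acc += occurrence_probability(pattern, graph)
        if acc / n >= minsup:
            return True
        if (acc + (n - i - 1)) / n < minsup:
            return False
    return False  # only reached when the database is empty


# Toy database of two uncertain graphs (illustrative data only).
g1 = ({0: "A", 1: "B", 2: "C"}, {(0, 1): 0.5, (1, 2): 0.8})
g2 = ({0: "A", 1: "B"}, {(0, 1): 0.9})
db = [g1, g2]
edge_ab = ({0: "A", 1: "B"}, [(0, 1)])
path_abc = ({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2)])
edge_ac = ({0: "A", 1: "C"}, [(0, 1)])

print(expected_support(edge_ab, db))   # approximately (0.5 + 0.9) / 2 = 0.7
print(expected_support(path_abc, db))  # approximately 0.4 / 2 = 0.2
```

The example also shows why an apriori-style enumeration is sound under this model: every possible world that contains the path A-B-C also contains the edge A-B, so a super-pattern's expected support never exceeds that of its sub-patterns. With `minsup = 0.8`, `is_frequent(edge_ab, db, 0.8)` returns `False` after inspecting only the first graph, because even a certain occurrence in the second graph could not lift the support above 0.75.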


Index Terms

1. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms

Recommendations

Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics
KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

Frequent subgraph mining has been extensively studied on certain graph data. However, uncertainties are inherently accompanied with graph data in practice, and there is very few work on mining uncertain graph data. This paper investigates frequent ...
Frequent subgraph pattern mining on uncertain graph data
CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management

Graph data are subject to uncertainties in many applications due to incompleteness and imprecision of data. Mining uncertain graph data is semantically different from and computationally more challenging than mining exact graph data. This paper ...
Mining Frequent Subgraph Patterns from Uncertain Graph Data

In many real applications, graph data is subject to uncertainties due to incompleteness and imprecision of data. Mining such uncertain graph data is semantically different from and computationally more challenging than mining conventional exact graph ...


Published in
EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology
March 2011
587 pages
ISBN:9781450305280
DOI:10.1145/1951365
Editors:
Anastasia Ailamaki, EPFL, Switzerland
Sihem Amer-Yahia, Yahoo! Research
Jignesh Patel, University of Wisconsin-Madison
Tore Risch, Uppsala University, Sweden
Pierre Senellart, Télécom ParisTech, France
Julia Stoyanovich, University of Pennsylvania
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 7 of 10 submissions, 70%

Article Metrics
- Total Citations: 35
- Total Downloads: 463
- Downloads (Last 12 months): 4
- Downloads (Last 6 weeks): 0