DOI: 10.1145/2247596.2247666
tutorial

Indexing and mining topological patterns for drug discovery

Published: 27 March 2012

Abstract

The increased availability of large repositories of chemical compounds has created new challenges and opportunities for applying data-mining and indexing techniques to problems in chemical informatics. The primary goal in the analysis of molecular databases is to identify structural patterns that can predict biological activity. Two of the most popular approaches to representing molecular topologies are graphs and 3D geometries. As a result, the problem of indexing and mining structural patterns maps to indexing and mining patterns from graph and 3D geometric databases.
In this tutorial, we will first introduce the problem of drug discovery and the critical role that computer science plays in that process. We will then introduce the problem of performing subgraph and similarity searches on large graph databases. Because these problems are NP-hard, a number of heuristics have been designed in recent years, and the tutorial will present an overview of those techniques. Next, we will introduce the problem of mining frequent subgraph patterns, along with the limitations of frequent patterns that sparked interest in mining statistically significant subgraph patterns. After presenting an in-depth survey of techniques for mining significant subgraph patterns, the tutorial will turn to the problem of analyzing 3D geometric structures of molecules. Finally, we will conclude by presenting two open computer science problems that can have a significant impact on the field of drug discovery.
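To make the subgraph search problem concrete, the following is a minimal sketch of the filter-then-verify pattern that many graph indexing heuristics follow: a cheap necessary condition prunes the database, and an exponential-time backtracking test confirms the survivors. It is an illustration only, not any specific technique covered in the tutorial. The graph encoding, the atom-label-count filter, the function names, and the toy molecules (hydrogens and bond orders omitted) are all assumptions made for this example.

```python
from collections import Counter

def make_graph(labels, edge_list):
    """Build a labeled undirected graph: vertex id -> label, plus adjacency sets."""
    adj = {v: set() for v in labels}
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    return {"labels": labels, "adj": adj}

def passes_filter(query, target):
    """Cheap necessary condition: the target needs at least as many
    vertices of each label as the query has."""
    need = Counter(query["labels"].values())
    have = Counter(target["labels"].values())
    return all(have[label] >= count for label, count in need.items())

def is_subgraph(query, target):
    """Backtracking test for (non-induced) subgraph isomorphism:
    every query edge must map onto a target edge with matching vertex labels."""
    q_vertices = list(query["labels"])

    def extend(mapping, used):
        if len(mapping) == len(q_vertices):
            return True
        qv = q_vertices[len(mapping)]
        for tv in target["labels"]:
            if tv in used or target["labels"][tv] != query["labels"][qv]:
                continue
            # Every already-mapped neighbor of qv must be adjacent to tv.
            if all(mapping[qn] in target["adj"][tv]
                   for qn in query["adj"][qv] if qn in mapping):
                mapping[qv] = tv
                used.add(tv)
                if extend(mapping, used):
                    return True
                del mapping[qv]
                used.discard(tv)
        return False

    return extend({}, set())

def subgraph_search(query, database):
    """Filter first, then run the expensive verification only on survivors."""
    candidates = [name for name, g in database.items() if passes_filter(query, g)]
    return [name for name in candidates if is_subgraph(query, database[name])]

# Toy database (hypothetical encodings; hydrogens and bond orders ignored).
database = {
    "ethanol": make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),
    "ethane": make_graph({0: "C", 1: "C"}, [(0, 1)]),
    "acetic_acid": make_graph({0: "C", 1: "C", 2: "O", 3: "O"},
                              [(0, 1), (1, 2), (1, 3)]),
}
query = make_graph({0: "C", 1: "O"}, [(0, 1)])  # a single C-O bond
matches = subgraph_search(query, database)
```

Here the filter discards ethane without running the isomorphism test, since it contains no oxygen. Stronger filters (paths, trees, or frequent subgraphs as index features) reduce the candidate set further, at the cost of a larger index.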

Published In

EDBT '12: Proceedings of the 15th International Conference on Extending Database Technology
March 2012
643 pages
ISBN: 9781450307901
DOI: 10.1145/2247596

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. frequent subgraphs
  2. graph databases
  3. graph indexing
  4. graph mining
  5. significant geometric patterns
  6. significant subgraphs
  7. top-k queries

Qualifiers

  • Tutorial

Conference

EDBT '12

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Cited By

  • (2020) Mining Top-k pairs of correlated subgraphs in a large network. Proceedings of the VLDB Endowment 13:9 (1511-1524). DOI: 10.14778/3397230.3397245. Online publication date: 26-Jun-2020
  • (2019) RAQ: Relationship-Aware Graph Querying in Large Networks. The World Wide Web Conference (1886-1896). DOI: 10.1145/3308558.3313448. Online publication date: 13-May-2019
  • (2018) Resling. Knowledge and Information Systems 54:1 (123-149). DOI: 10.1007/s10115-017-1129-y. Online publication date: 1-Jan-2018
  • (2017) Similarity Search in Large-Scale Graph Databases. Handbook of Big Data Technologies (507-529). DOI: 10.1007/978-3-319-49340-4_15. Online publication date: 26-Feb-2017
  • (2016) A Scalable and Generic Framework to Mine Top-k Representative Subgraph Patterns. 2016 IEEE 16th International Conference on Data Mining (ICDM) (370-379). DOI: 10.1109/ICDM.2016.0048. Online publication date: Dec-2016
  • (2015) Authenticated Subgraph Similarity Search in Outsourced Graph Databases. IEEE Transactions on Knowledge and Data Engineering 27:7 (1838-1860). DOI: 10.1109/TKDE.2014.2316818. Online publication date: 1-Jul-2015
  • (2015) Motif Discovery in Protein 3D‐Structures using Graph Mining Techniques. Pattern Recognition in Computational Molecular Biology (165-189). DOI: 10.1002/9781119078845.ch10. Online publication date: 18-Dec-2015
  • (2014) Answering top-k representative queries on graph databases. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (1163-1174). DOI: 10.1145/2588555.2610524. Online publication date: 18-Jun-2014