A Method of Improving the Efficiency of Mining Sub-structures in Molecular Structure Databases

Li, Haibo; Wang, Yuanzhen; Lü, Kevin

doi:10.1007/978-3-540-73390-4_20

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4587)

Included in the following conference series:

British National Conference on Databases

Abstract

One problem with current substructure mining algorithms is that, as molecular structure databases grow, the time and space costs rise to a level at which ordinary PCs are not powerful enough to perform substructure mining tasks. After examining a number of well-known molecular structure databases, we found that they contain a large number of common loop substructures, and that repeatedly mining these same substructures consumes significant system resources. In this paper, we introduce a new method that (1) treats these common loop substructures as a kind of “atom” structure, and (2) maintains the links between the new “atom” structures and the rest of the molecular structures, reorganizing the original molecular structures accordingly. We thereby avoid repeating many identical operations during the mining process and produce fewer redundant results. We tested the method on four real molecular structure databases: AID2DA’99/CA, AID2DA’99/CM, AID2DA’99 and NCI’99. The results indicate that (1) substructure mining is faster because of the reorganization, and (2) mining yields fewer patterns, with less redundant information.
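
The reorganization described in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the graph representation (a dictionary of atom labels plus a dictionary of labeled bonds), the hypothetical function name `collapse_ring`, and the premise that the ring's atom set is already known. It shows only the general idea of replacing a common loop with a single "atom" node while keeping its links to the rest of the molecule, not the authors' actual algorithm.

```python
from typing import Dict, FrozenSet, Set, Tuple

Atoms = Dict[int, str]             # atom id -> element label
Bonds = Dict[FrozenSet[int], str]  # unordered atom pair -> bond label


def collapse_ring(atoms: Atoms, bonds: Bonds, ring: Set[int],
                  ring_label: str) -> Tuple[Atoms, Bonds]:
    """Return a copy of the molecule with `ring` replaced by one super-atom.

    Hypothetical helper: assumes the ring atoms were identified beforehand.
    """
    super_id = max(atoms) + 1  # fresh id for the new "atom" node
    new_atoms = {a: lbl for a, lbl in atoms.items() if a not in ring}
    new_atoms[super_id] = ring_label

    new_bonds: Bonds = {}
    for pair, bond_label in bonds.items():
        inside = pair & ring
        if len(inside) == 2:
            continue  # bond lies inside the ring; absorbed by the super-atom
        if len(inside) == 1:
            # Bond crosses the ring boundary: re-attach it to the super-atom.
            # Simplification: if two ring atoms bond to the same outside atom,
            # only one of those links survives in this sketch.
            (outside,) = pair - ring
            new_bonds[frozenset((super_id, outside))] = bond_label
        else:
            new_bonds[pair] = bond_label  # bond untouched by the ring
    return new_atoms, new_bonds


# Toluene: a six-carbon aromatic ring (atoms 0-5) with a methyl carbon (atom 6).
atoms = {i: "C" for i in range(7)}
bonds = {frozenset((i, (i + 1) % 6)): "aromatic" for i in range(6)}
bonds[frozenset((0, 6))] = "single"

new_atoms, new_bonds = collapse_ring(atoms, bonds, set(range(6)), "benzene")
print(new_atoms)  # two nodes remain: the methyl carbon and the ring super-atom
print(new_bonds)  # one bond remains, linking the two
```

In this toy case a seven-node, seven-bond graph shrinks to two nodes and one bond, which suggests why a miner working on the reorganized structures would have fewer candidate subgraphs to enumerate.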

Author information

Authors and Affiliations

Department of Computing Science, Huazhong University of Science and Technology, Wuhan 430074, China
Haibo Li & Yuanzhen Wang
BBS, Room 76, Tin Building, Brunel University, Uxbridge, UB8 3PH, UK
Kevin Lü

Editor information

Richard Cooper, Jessie Kennedy

About this paper

Cite this paper

Li, H., Wang, Y., Lü, K. (2007). A Method of Improving the Efficiency of Mining Sub-structures in Molecular Structure Databases. In: Cooper, R., Kennedy, J. (eds) Data Management. Data, Data Everywhere. BNCOD 2007. Lecture Notes in Computer Science, vol 4587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73390-4_20

DOI: https://doi.org/10.1007/978-3-540-73390-4_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73389-8
Online ISBN: 978-3-540-73390-4
eBook Packages: Computer Science, Computer Science (R0)
