Abstract
Molecular docking is one of the main techniques in virtual screening. Docking times vary widely from molecule to molecule because of differences in chemical structure. This variation can overload certain nodes and thereby reduce the data processing capacity of the whole distributed molecular docking system, so a reasonable and efficient data grouping strategy is essential. In this paper, we study molecular structural similarity in depth and propose a similarity-based data grouping method. Building on the work in Database Management System for Virtual Screening, the method uses the computational chemistry software Chemistry Development Kit and cluster analysis methods to process the chemical molecule data. Finally, we implement and deploy the data grouping method on the Hadoop distributed platform. The experimental results show that this data grouping method can improve the efficiency of molecular docking.
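As a rough illustration of what similarity-based grouping can look like, the sketch below groups molecules by fingerprint similarity and then spreads each group across worker nodes. It is not the method of the paper: the Tanimoto coefficient, the leader-style clustering, the round-robin placement, and all names (`tanimoto`, `group_by_similarity`, `deal_to_nodes`, the toy fingerprints) are assumptions chosen for the example. It also assumes that structurally similar molecules tend to have similar docking times. In practice the fingerprints would come from a cheminformatics toolkit such as the Chemistry Development Kit; here they are hand-written sets of bit indices.

```python
from typing import Dict, FrozenSet, List

# Hypothetical representation: a fingerprint is the set of bit indices that are on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient of two bit sets: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def group_by_similarity(fps: Dict[str, Fingerprint], threshold: float) -> List[List[str]]:
    """Leader clustering: a molecule joins the first group whose leader
    (first member) is at least `threshold` similar, else starts a new group."""
    groups: List[List[str]] = []
    for name, fp in fps.items():
        for group in groups:
            if tanimoto(fp, fps[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def deal_to_nodes(groups: List[List[str]], n_nodes: int) -> List[List[str]]:
    """Deal molecules round-robin, group by group, so that each node
    receives a similar mix of structural classes (assumed placement policy)."""
    nodes: List[List[str]] = [[] for _ in range(n_nodes)]
    i = 0
    for group in groups:
        for name in group:
            nodes[i % n_nodes].append(name)
            i += 1
    return nodes


# Toy data: m1/m2 share most bits, as do m3/m4.
fps = {
    "m1": frozenset({1, 2, 3, 4}),
    "m2": frozenset({1, 2, 3, 5}),
    "m3": frozenset({10, 11, 12}),
    "m4": frozenset({10, 11, 13}),
}
groups = group_by_similarity(fps, threshold=0.5)
nodes = deal_to_nodes(groups, n_nodes=2)
print(groups)
print(nodes)
```

With this toy input, each of the two nodes ends up with one molecule from each structural group, which is the kind of balance a grouping strategy aims for. The threshold and the placement policy are tuning choices, not values taken from the paper.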
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, R., Liu, G., Hu, R., Wei, J., Li, J. (2013). A Similarity-Based Grouping Method for Molecular Docking in Distributed System. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53914-5_47
DOI: https://doi.org/10.1007/978-3-642-53914-5_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53913-8
Online ISBN: 978-3-642-53914-5
eBook Packages: Computer Science (R0)