DOI: 10.1145/3090354.3090363
Research article

Big Data Analytics Techniques in Virtual Screening for Drug Discovery

Published: 29 March 2017

Abstract

Virtual screening (VS) is a computational method used in the drug discovery process to search large libraries of small molecules and identify those most likely to be leads for a given target. Depending on whether they use information about the ligands or about the target structure, VS techniques are classified as ligand-based or structure-based. These methods can also be combined, using information about both, in a hierarchical scheme that benefits from the advantages of each. The rapid development of high-throughput technologies in structural biology, which make it possible to produce massive libraries containing tens of millions of small molecules, has led to VS being framed as a Big Data analytics problem. MapReduce is a parallel programming model introduced by Google for large-scale data processing. Apache Hadoop is the most widely used open-source MapReduce implementation and was for many years the leading Big Data framework. More recently, Apache Spark has emerged as the most popular Big Data processing framework because it addresses several known deficiencies of Hadoop's MapReduce, such as speed, pipelining, and support for iterative jobs. In this paper, we review the Molecular Docking (MD) workflow. Next, we analyze the two Big Data analytics tools most widely applied in the VS field: Hadoop's MapReduce and Spark. We identify some known shortcomings that make Hadoop's MapReduce unsuitable for MD, and point out the need for a novel MD workflow in Spark.
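The hierarchical scheme described above can be illustrated with a minimal sketch, assuming a cheap ligand-based stage (Tanimoto similarity to a known active) followed by an expensive structure-based stage (a docking score) applied only to the survivors. The fingerprints here are modeled as plain sets of bit positions, and `hierarchical_screen`, `cutoff`, `top_n`, and the scores are all hypothetical; a real pipeline would compute fingerprints with a cheminformatics toolkit and call an actual docking program.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is modeled as the set of "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def hierarchical_screen(
    library: Dict[str, Fingerprint],
    reference: Fingerprint,
    dock_score: Callable[[str], float],
    cutoff: float = 0.5,
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    # Stage 1 (ligand-based): cheap similarity filter against a known active.
    survivors = [mol for mol, fp in library.items() if tanimoto(fp, reference) >= cutoff]
    # Stage 2 (structure-based): run the expensive docking score only on survivors.
    # Assumption: lower score means stronger predicted binding (Vina-style kcal/mol).
    ranked = sorted(((mol, dock_score(mol)) for mol in survivors), key=lambda pair: pair[1])
    return ranked[:top_n]


# Toy example with made-up fingerprints and docking scores (hypothetical data).
library = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 9}),
    "mol_c": frozenset({7, 8, 9}),
}
reference = frozenset({1, 2, 3})
fake_scores = {"mol_a": -7.2, "mol_b": -8.1, "mol_c": -9.0}

hits = hierarchical_screen(library, reference, fake_scores.__getitem__)
print(hits)  # [('mol_b', -8.1), ('mol_a', -7.2)]
```

In this toy run `mol_c` is never docked even though it has the best score, which shows the trade-off of the hierarchical approach: the ligand-based filter saves expensive docking runs but can discard true binders that are dissimilar to the reference.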
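The MapReduce programming model mentioned above (map, then group intermediate pairs by key, then reduce) can be sketched in plain Python as an in-process simulation. This is not Hadoop's Java API. The records are hypothetical `(receptor, ligand, score)` docking results, and the job keeps the `k` best-scoring ligands per receptor, assuming lower scores mean stronger predicted binding.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical docking result: (receptor_id, ligand_id, score).
Record = Tuple[str, str, float]
Pair = Tuple[str, Tuple[str, float]]


def map_phase(record: Record) -> Iterator[Pair]:
    """Map: emit an intermediate (key, value) pair keyed by receptor."""
    receptor, ligand, score = record
    yield receptor, (ligand, score)


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[Tuple[str, float]]]:
    """Shuffle: group intermediate values by key (the framework does this in Hadoop)."""
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(receptor: str, values: List[Tuple[str, float]], k: int) -> Tuple[str, List[Tuple[str, float]]]:
    """Reduce: keep the k lowest (best) scoring ligands for one receptor."""
    return receptor, sorted(values, key=lambda v: v[1])[:k]


def run_mapreduce(records: Iterable[Record], k: int = 2) -> Dict[str, List[Tuple[str, float]]]:
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values, k) for key, values in grouped.items())


records = [
    ("R1", "L1", -6.5), ("R1", "L2", -8.0), ("R1", "L3", -7.1),
    ("R2", "L1", -5.0), ("R2", "L4", -9.3),
]
result = run_mapreduce(records, k=2)
print(result)
```

Because map and reduce operate on independent keys, a framework can spread them across a cluster. In Hadoop, each MapReduce job writes its output to HDFS, so a workflow that chains several such jobs (for example, repeated rounds of docking and rescoring) pays disk I/O on every pass; keeping intermediate datasets in memory across steps is the main deficiency Spark addresses.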


Cited By

  • (2021) A Binary Classification Model for Toxicity Prediction in Drug Design. In Hybrid Artificial Intelligent Systems, 149-157. DOI: 10.1007/978-3-030-86271-8_13. Online publication date: 22-Sep-2021.
  • (2021) Key Aspects for Achieving Hits by Virtual Screening Studies. In Functional Properties of Advanced Engineering Materials and Biomolecules, 455-487. DOI: 10.1007/978-3-030-62226-8_16. Online publication date: 18-May-2021.
  • (2018) A Computer Cluster for Big Data and Data Analytics Management. In Proceedings of the Euro American Conference on Telematics and Information Systems, 1-8. DOI: 10.1145/3293614.3293626. Online publication date: 12-Nov-2018.
  • (2018) Ensemble Learning for Large Scale Virtual Screening on Apache Spark. In Computational Intelligence and Its Applications, 244-256. DOI: 10.1007/978-3-319-89743-1_22. Online publication date: 12-Apr-2018.


Published In

ACM Other conferences
BDCA'17: Proceedings of the 2nd International Conference on Big Data, Cloud and Applications
March 2017
685 pages
ISBN: 9781450348522
DOI: 10.1145/3090354
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • Ministère de l'enseignement supérieur (Ministry of Higher Education)

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Big Data
  2. Drug Discovery
  3. Hadoop
  4. Hierarchical Virtual Screening
  5. MapReduce
  6. Molecular Docking
  7. Spark
  8. Virtual Screening

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

BDCA'17


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 0
Reflects downloads up to 27 Feb 2025.
