Big Data Analytics Techniques in Virtual Screening for Drug Discovery

Authors:

Mohamed Chawki Batouche

BDCA'17: Proceedings of the 2nd international Conference on Big Data, Cloud and Applications

Article No.: 9, Pages 1 - 7

https://doi.org/10.1145/3090354.3090363

Published: 29 March 2017

Abstract

Virtual screening (VS) is a computational method used in drug discovery to search large libraries of small molecules and identify those that represent leads for a given target. Depending on whether they use information about the ligand, the target, or both, virtual screening techniques are classified into ligand-based and structure-based methods. These methods can be combined into a hierarchical scheme in order to benefit from the advantages of each. The rapid development of high-throughput technologies in structural biology, which makes it possible to produce massive libraries containing tens of millions of small molecules, has led VS to be defined as a Big Data analytics problem. MapReduce is a parallel programming model introduced by Google and designed for large-scale data processing. Apache Hadoop is the most widely used open-source MapReduce implementation and was, for many years, the leading Big Data framework. Recently, Apache Spark has emerged as a Big Data processing framework and has become the most popular one, because it improves on some known deficiencies of Hadoop's MapReduce, such as speed, pipelining, and iterative jobs. In this paper, we review the Molecular Docking (MD) workflow. Next, we analyze the two most widely applied Big Data analytics tools in the VS field, Hadoop's MapReduce and Spark. We identify some known shortcomings that make Hadoop's MapReduce unsuitable for the MD problem and point out the need for a novel MD workflow in Spark.
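
As a rough illustration of the MapReduce model mentioned in the abstract, the sketch below scores each ligand independently (map) and then merges the results into a top-k list (reduce). It is not the workflow reviewed in the paper: it runs in plain Python on a single machine, and `toy_score`, `screen`, and the small SMILES library are hypothetical placeholders. A real pipeline would call a docking engine in the map step and distribute the work with Hadoop or Spark.

```python
from functools import reduce
import heapq


def toy_score(smiles: str) -> float:
    """Hypothetical stand-in for a docking score (lower is better).

    This is NOT a real scoring function; it only makes the example
    deterministic and runnable without a docking engine.
    """
    return -float(len(set(smiles)))


def map_phase(ligands):
    """Map: score every ligand independently, emitting (score, ligand) pairs."""
    return [(toy_score(s), s) for s in ligands]


def reduce_phase(scored, k):
    """Reduce: fold the scored pairs together, keeping the k lowest scores."""
    def merge(best, item):
        return heapq.nsmallest(k, best + [item])
    return reduce(merge, scored, [])


def screen(ligands, k=2):
    """Run the map step followed by the reduce step."""
    return reduce_phase(map_phase(ligands), k)


# Assumed toy library of SMILES strings, for illustration only.
library = ["CCO", "c1ccccc1O", "CC(=O)O", "C"]
top = screen(library, k=2)
print(top)
```

Because each ligand is scored without reference to any other, the map step is the part that a cluster framework can spread across workers; only the small top-k lists need to be merged afterwards.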

Published In

BDCA'17: Proceedings of the 2nd international Conference on Big Data, Cloud and Applications

March 2017

685 pages

ISBN:9781450348522

DOI:10.1145/3090354

Conference Chairs:
Mohamed Lazaar, ENSA, Tetuan, Morocco
Youness Tabii, ENSA, Tetuan, Morocco
Mohamed Chrayah, ENSA, Tetuan, Morocco
Mohammed Al Achhab, ENSA, Tetuan, Morocco

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

Ministère de l'enseignement supérieur

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

BDCA'17

BDCA'17: 2nd international Conference on Big Data, Cloud and Applications

March 29 - 30, 2017

Tetouan, Morocco
