Abstract:
The application of computational tools at various stages of drug discovery is one of the most active areas of research. Virtual Screening (VS) is a very common application, which aims to screen and analyze large chemical libraries using algorithms and models to extract drug candidates that can bind to therapeutic targets. Machine learning (ML) techniques are widely applied to analyze chemical libraries in ligand-based virtual screening (LBVS). Deep learning (DL) is a newer form of machine learning that provides several architectures based primarily on classical Artificial Neural Network algorithms, but with many hidden layers that learn features at multiple levels of abstraction. Chemical libraries have recently come to be regarded as Big Data, owing to their huge size, the variety of their data, and the speed at which they are created, streamed, and aggregated. In this context, advanced tools are needed to handle and process this type of data. Apache Spark is the most widely used engine for big data processing, with many improvements that make it more suitable for virtual screening analysis. In this work, we propose a novel workflow named DeepD_ DrugC, based on Spark and a Deep Neural Network model implemented with Deeplearning4j (DL4J), to improve prediction results in LBVS. To evaluate the workflow, we propose a process for creating training datasets from the PubChem Bioassay database for cancer. The evaluation results show a precision above 93%, with acceptable scaling behavior.
Date of Conference: 12-13 October 2022
Date Added to IEEE Xplore: 16 November 2022