
Extracting Knowledge from Recombinations of SMILES Representations

  • Conference paper
Artificial Intelligence Applications and Innovations. AIAI 2023 IFIP WG 12.5 International Workshops (AIAI 2023)

Abstract

Exploiting all possible combinations of the non-common substructure of compounds, using their Simplified Molecular-Input Line-Entry System (SMILES) representations, is essential for accurate chemical information processing. SMILES is a widely used encoding that represents chemical compounds as strings of characters. In this paper, we present a novel approach that treats SMILES strings as sequences of letters, numbers and symbols in order to extract meaningful knowledge. Given two SMILES strings, the approach first identifies their common substructure. For the non-common substructure, we exhaustively search all combinations of its characters, of every possible length. Finally, we accept only those character combinations that are chemically correct. Our approach therefore suggests all substructures that may be present in the non-common part of two compounds, using only the atoms that already exist in the initial non-common substructure. It can generate every possible fragment for a given non-common substructure while preserving the common substructure, and could be used in drug discovery and other chemical applications.
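The three steps described in the abstract (find the common substructure, enumerate recombinations of the non-common characters, keep only chemically correct candidates) can be illustrated with the minimal Python sketch below. It is a hypothetical illustration under stated assumptions, not the authors' implementation. It assumes that (a) the common substructure is the longest common substring of the two SMILES strings, found with the standard library's `difflib.SequenceMatcher`; (b) the non-common substructure is the set of characters of the second string outside that match, and each recombination appends one ordering of a subset of those characters to the common part; and (c) chemical correctness is approximated by `is_plausible_smiles`, a crude hypothetical filter that only checks branch balance, ring-closure pairing and maximum valences for a small, uncharged, aliphatic subset of SMILES. A real pipeline would more likely parse each candidate with a cheminformatics toolkit such as RDKit, whose `Chem.MolFromSmiles` returns `None` for strings it cannot parse or sanitize.

```python
from difflib import SequenceMatcher
from itertools import permutations

# Hypothetical maximum valences for a small, uncharged, aliphatic SMILES subset.
# Bracket atoms, charges and aromatic (lowercase) atoms are simply rejected.
MAX_VALENCE = {"B": 3, "C": 4, "N": 3, "O": 2, "P": 5, "S": 6,
               "F": 1, "Cl": 1, "Br": 1, "I": 1}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def tokenize(smiles):
    """Split a SMILES string into tokens; return None on an unsupported character."""
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        elif smiles[i] in MAX_VALENCE or smiles[i] in BOND_ORDER or smiles[i] in "()0123456789":
            tokens.append(smiles[i])
            i += 1
        else:
            return None
    return tokens


def is_plausible_smiles(smiles):
    """Crude stand-in for a chemical-validity check: branch balance,
    ring-closure pairing and maximum valence. Not a full SMILES parser."""
    tokens = tokenize(smiles)
    if not tokens:
        return False
    elements, valence = [], []   # element symbol and bond-order sum per atom
    prev, pending = None, None   # atom the next bond attaches to; explicit bond order
    branches, rings = [], {}     # branch-point stack; open ring digit -> (atom, order)
    last = None                  # previous token, for local syntax checks
    for tok in tokens:
        if tok in MAX_VALENCE:                       # atom
            elements.append(tok)
            valence.append(0)
            idx = len(valence) - 1
            if prev is not None:
                order = pending or 1
                valence[prev] += order
                valence[idx] += order
            prev, pending = idx, None
        elif tok in BOND_ORDER:                      # explicit bond
            if prev is None or pending is not None:
                return False
            pending = BOND_ORDER[tok]
        elif tok == "(":                             # open branch
            if prev is None or pending is not None or last == "(":
                return False
            branches.append(prev)
        elif tok == ")":                             # close branch
            if not branches or pending is not None or last == "(":
                return False
            prev = branches.pop()
        else:                                        # ring-closure digit
            if prev is None or last in ("(", ")"):
                return False
            if tok in rings:
                other, order = rings.pop(tok)
                if other == prev:                    # e.g. "C11"
                    return False
                order = pending or order
                valence[prev] += order
                valence[other] += order
            else:
                rings[tok] = (prev, pending or 1)
            pending = None
        last = tok
    if pending is not None or branches or rings:
        return False
    return all(v <= MAX_VALENCE[e] for e, v in zip(elements, valence))


def split_common(a, b):
    """Assumed string-level step: the longest common substring of a and b,
    plus the characters of b that lie outside it."""
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size], b[:m.b] + b[m.b + m.size:]


def recombine(a, b):
    """Append every ordering of every non-empty subset of the non-common
    characters to the common part; keep candidates that pass the filter."""
    common, rest = split_common(a, b)
    candidates = set()
    for r in range(1, len(rest) + 1):
        for combo in permutations(rest, r):
            candidates.add(common + "".join(combo))
    return sorted(s for s in candidates if is_plausible_smiles(s))


if __name__ == "__main__":
    for smiles in recombine("CCCO", "CCC(=O)N"):
        print(smiles)
```

For `CCCO` and `CCC(=O)N`, the sketch keeps `CCC` as the common part and recombines the characters `(`, `=`, `O`, `)` and `N`, so candidates such as `CCCN`, `CCC=O` and `CCC(N)=O` survive while `CCCO=N` (divalent oxygen given three bonds) is rejected. Exhaustive enumeration grows factorially with the length of the non-common part: a five-character remainder already yields 325 ordered candidates, so this brute-force form is only practical for small fragments.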

Acknowledgement

This research was co-financed by the European Union and Greek national funds through the “Competitiveness, Entrepreneurship and Innovation” Operational Programme 2014–2020, under the Call “Support for regional excellence”; project title: “Intelligent Research Infrastructure for Shipping, Transport and Supply Chain - ENIRISST+”; MIS code: 5047041.

Author information

Corresponding author

Correspondence to Andreas Kanavos.

Copyright information

© 2023 IFIP International Federation for Information Processing

About this paper

Cite this paper

Didachos, C., Kanavos, A. (2023). Extracting Knowledge from Recombinations of SMILES Representations. In: Maglogiannis, I., Iliadis, L., Papaleonidas, A., Chochliouros, I. (eds) Artificial Intelligence Applications and Innovations. AIAI 2023 IFIP WG 12.5 International Workshops. AIAI 2023. IFIP Advances in Information and Communication Technology, vol 677. Springer, Cham. https://doi.org/10.1007/978-3-031-34171-7_26

  • DOI: https://doi.org/10.1007/978-3-031-34171-7_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34170-0

  • Online ISBN: 978-3-031-34171-7

  • eBook Packages: Computer Science, Computer Science (R0)
