Neural Model-Based Similarity Prediction for Compounds with Unknown Structures

  • Conference paper
Applied Informatics (ICAI 2022)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1643))

Abstract

Compound similarity analysis is widely used in many areas related to cheminformatics. Computing similarity is straightforward when compound structures are known. However, there are no methods for obtaining similarity when this information is not available. Here we propose a novel approach to solve this problem. It generates compound representations from metabolic networks and uses a neural network to predict similarity. The results show that the generated embeddings preserve the neighborhood of the original metabolic graph, i.e., compounds participating in the same reactions are close together in the embedding space. Results for compounds with known structures show that the proposal estimates similarity with an error of less than 10%. In addition, a qualitative analysis shows that similarity prediction for compounds with unknown structures gives promising results using the generated embeddings.
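
As a rough illustration of the two ingredients the abstract mentions, the sketch below shows (a) a structure-based similarity between two compounds and (b) random walks over a small metabolic graph, which is one common way to obtain node sequences for learning graph embeddings. Everything here is an assumption made for illustration: the abstract does not state which similarity measure or embedding method is used, so Tanimoto similarity on binary fingerprints and uniform random walks are stand-ins, and the names `tanimoto`, `random_walks`, and `toy_graph` are hypothetical.

```python
import random


def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits.

    Assumption: fingerprints are given as collections of bit indices.
    """
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks over an adjacency-list graph.

    Compounds that share reactions are neighbours in the graph, so they tend
    to co-occur in walks; an embedding model trained on such walks would place
    them close together. The embedding model itself is not shown here.
    """
    rng = random.Random(seed)
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical toy metabolic graph: nodes are compounds, edges link compounds
# that take part in the same reaction.
toy_graph = {
    "C1": ["C2"],
    "C2": ["C1", "C3"],
    "C3": ["C2", "C4"],
    "C4": ["C3"],
}

if __name__ == "__main__":
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
    for walk in random_walks(toy_graph, walk_length=5, walks_per_node=1):
        print(" -> ".join(walk))
```

In a setup like the one the abstract describes, similarities computed for compounds with known structures could serve as training targets, while the graph-derived embeddings would be the only input available for compounds whose structures are unknown.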

Notes

  1. https://www.genome.jp/pathway/map00010

  2. https://www.genome.jp/kegg/

  3. https://pubchem.ncbi.nlm.nih.gov/

  4. https://www.rdkit.org

  5. https://optuna.readthedocs.io/en/stable/reference/generated/optuna.samplers.TPESampler.html

Author information

Corresponding author

Correspondence to Eugenio Borzone.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Borzone, E., Di Persia, L.E., Gerard, M. (2022). Neural Model-Based Similarity Prediction for Compounds with Unknown Structures. In: Florez, H., Gomez, H. (eds) Applied Informatics. ICAI 2022. Communications in Computer and Information Science, vol 1643. Springer, Cham. https://doi.org/10.1007/978-3-031-19647-8_6

  • DOI: https://doi.org/10.1007/978-3-031-19647-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19646-1

  • Online ISBN: 978-3-031-19647-8

  • eBook Packages: Computer Science, Computer Science (R0)
