Abstract
Compound similarity analysis is widely used in many areas of cheminformatics. Computing it is straightforward when compound structures are known. However, no methods exist to obtain similarity when this information is not available. Here we propose a novel approach to this problem: it generates compound representations (embeddings) from metabolic networks and uses a neural network to predict similarity. The results show that the generated embeddings preserve the neighborhood structure of the original metabolic graph, i.e. compounds participating in the same reactions are close together in the embedding space. Results for compounds with known structures show that the proposal estimates similarity with an error of less than 10%. In addition, a qualitative analysis shows that similarity predictions for compounds with unknown structure, obtained from the generated embeddings, are promising.
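To illustrate why similarity is straightforward to compute when structures are known, the following minimal sketch computes the Tanimoto coefficient between two structural fingerprints. The abstract does not name the similarity measure used, so the choice of Tanimoto here is an assumption (it is a common choice for fingerprint-based similarity), and the fingerprints `fp_a` and `fp_b` are hypothetical sets of "on" bit positions, not data from the paper.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits.

    Defined as |A & B| / (|A| + |B| - |A & B|), i.e. shared bits over
    the bits set in either fingerprint.
    """
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints
        # are treated as having zero similarity.
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints (illustrative values only).
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8, 9, 12}

# 3 shared bits out of 6 distinct bits.
similarity = tanimoto(fp_a, fp_b)
print(similarity)  # 0.5
```

A value like this requires a fingerprint for each compound, which in turn requires a known structure. The approach proposed in the abstract targets the case where no such fingerprint can be computed, predicting the similarity from network-derived embeddings instead.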
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Borzone, E., Di Persia, L.E., Gerard, M. (2022). Neural Model-Based Similarity Prediction for Compounds with Unknown Structures. In: Florez, H., Gomez, H. (eds) Applied Informatics. ICAI 2022. Communications in Computer and Information Science, vol 1643. Springer, Cham. https://doi.org/10.1007/978-3-031-19647-8_6
DOI: https://doi.org/10.1007/978-3-031-19647-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19646-1
Online ISBN: 978-3-031-19647-8
eBook Packages: Computer Science, Computer Science (R0)