ABSTRACT
In this study, we propose a novel method for the inverse QSAR/QSPR. Given a set of chemical compounds G and their values a(G) of a chemical property, we define a feature vector f(G) of each chemical compound G. By using a set of feature vectors as training data, the first phase of our method constructs a prediction function ψ with an artificial neural network (ANN) so that ψ(f(G)) takes a value nearly equal to a(G) for many chemical compounds G in the set. Given a target value a* of the chemical property, the second phase infers a chemical structure G* such that a(G*) = a* in the following way. We compute a vector f* such that ψ(f*) = a*, where finding such a vector f* is formulated as a mixed integer linear programming (MILP) problem. Finally, we generate a chemical structure G* such that f(G*) = f*. We conducted computational experiments with our method on acyclic chemical compounds and chemical properties such as heat of formation, boiling point, and retention time.
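The inversion step of the second phase (finding f* with ψ(f*) = a*) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two-input ReLU network, its hand-picked weights, the integer box of candidate feature vectors, and the helper names `psi` and `invert`. Exhaustive search stands in for the MILP solver here, so this shows only what the MILP is asked to find, not how the paper formulates or solves it.

```python
from itertools import product

# Hypothetical trained network: 2 inputs, 2 hidden ReLU units, 1 output.
# The weights are hand-picked for illustration, not taken from the paper.
W1 = [[1.0, -1.0], [0.5, 1.0]]  # hidden-layer weights, one row per hidden unit
B1 = [0.0, -1.0]                # hidden-layer biases
W2 = [2.0, 1.0]                 # output-layer weights
B2 = 0.5                        # output-layer bias


def relu(v):
    return max(0.0, v)


def psi(f):
    """Prediction function: feature vector f -> predicted property value."""
    hidden = [relu(sum(w * x for w, x in zip(row, f)) + b)
              for row, b in zip(W1, B1)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2


def invert(target, lower=0, upper=4, tol=1e-9):
    """Return every integer vector f in [lower, upper]^2 with psi(f) == target.

    Brute-force enumeration replaces the MILP; it is only feasible because
    the search box is tiny.
    """
    box = range(lower, upper + 1)
    return [f for f in product(box, repeat=2) if abs(psi(f) - target) <= tol]


if __name__ == "__main__":
    a_star = 6.0
    print(invert(a_star))  # candidate feature vectors f* for the target a*
```

An empty result from `invert` means that no feature vector in the box reaches the target value. The final step of the method, generating a chemical structure G* with f(G*) = f*, is not covered by this sketch.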
Index Terms
- A Method for the Inverse QSAR/QSPR Based on Artificial Neural Networks and Mixed Integer Linear Programming