Abstract
By leveraging the kernel trick in the output space, kernel-induced losses provide a principled way to define structured output prediction tasks for a wide variety of output modalities. In particular, they have been successfully used in the context of surrogate non-parametric regression, where the kernel trick is typically exploited in the input space as well. However, when inputs are images or texts, more expressive models such as deep neural networks seem better suited than non-parametric methods. In this work, we tackle the question of how to train neural networks to solve structured output prediction tasks while still benefiting from the versatility and relevance of kernel-induced losses. We design a novel family of deep neural architectures whose last layer predicts in a data-dependent, finite-dimensional subspace of the infinite-dimensional output feature space associated with the kernel-induced loss. This subspace is chosen as the span of the eigenfunctions of a randomly approximated version of the empirical kernel covariance operator. Interestingly, this approach unlocks the use of gradient descent algorithms (and consequently of any neural architecture) for structured prediction. Experiments on synthetic tasks as well as real-world supervised graph prediction problems show the relevance of our method.
T. El Ahmad and J. Yang—Equal contribution.
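To make the construction concrete, below is a minimal NumPy sketch of the idea as the abstract describes it: eigendecompose a Nyström (landmark-based) approximation of the empirical kernel covariance operator on the outputs, let the network predict the p coordinates of the projection of the output embedding ψ(y) onto the span of the top-p eigenfunctions, and train with the squared surrogate loss, which matches the kernel-induced loss up to a term independent of the network parameters. The Gaussian output kernel, the landmark count m, the subspace dimension p, and all function names here are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    # Stand-in output kernel k_Y; any positive-definite kernel on outputs
    # (e.g. a graph kernel for supervised graph prediction) fits here.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_eigencoords(Y_landmarks, kernel, p):
    # Eigendecompose the m x m Gram matrix of randomly drawn landmark
    # outputs: this yields the top-p eigenfunctions of the approximated
    # empirical covariance operator (standard Nystrom kernel PCA).
    m = len(Y_landmarks)
    evals, evecs = np.linalg.eigh(kernel(Y_landmarks, Y_landmarks) / m)
    order = np.argsort(evals)[::-1][:p]
    lam, U = np.clip(evals[order], 1e-12, None), evecs[:, order]
    scale = 1.0 / np.sqrt(m * lam)  # normalises eigenfunctions in the RKHS

    def coords(Y):
        # Coordinates of the projection of psi(y) onto span(e_1, ..., e_p):
        # <psi(y), e_j> = k(y, Y_m) u_j / sqrt(m * lam_j).
        return kernel(Y, Y_landmarks) @ U * scale

    return coords

def surrogate_loss(f_out, Y, coords):
    # Squared distance between the network's p outputs and the projected
    # output embeddings; any architecture trained by gradient descent on
    # this quantity minimises the kernel-induced loss over the subspace.
    return ((f_out - coords(Y)) ** 2).sum(axis=1).mean()

def decode(f_out, Y_candidates, kernel, coords):
    # Pre-image step: pick the candidate whose embedding is closest to the
    # predicted point of the subspace. Since the prediction lies in the
    # span, only k(y', y') - 2 <psi(y'), prediction> depends on y', and
    # both terms are plain kernel evaluations.
    diag = np.diag(kernel(Y_candidates, Y_candidates))
    scores = diag[None, :] - 2.0 * f_out @ coords(Y_candidates).T
    return Y_candidates[np.argmin(scores, axis=1)]
```

In this sketch, `coords` is fixed once the landmarks are drawn, so training reduces to ordinary gradient descent on `surrogate_loss`; only the m x m landmark Gram matrix is ever eigendecomposed, giving an O(m^3) cost instead of the O(n^3) required by the full empirical covariance operator.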
Notes
- 1. RDKit: Open-source cheminformatics. https://www.rdkit.org.
Acknowledgments
Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or European Commission. Neither the European Union nor the granting authority can be held responsible for them. This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement 101120237 (ELIAS), the Télécom Paris research chair on Data Science and Artificial Intelligence for Digitalized Industry and Services (DSAIDIS) and the PEPR-IA through the project FOUNDRY.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
El Ahmad, T., Yang, J., Laforgue, P., d’Alché-Buc, F. (2024). Deep Sketched Output Kernel Regression for Structured Prediction. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol. 14943. Springer, Cham. https://doi.org/10.1007/978-3-031-70352-2_6
DOI: https://doi.org/10.1007/978-3-031-70352-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70351-5
Online ISBN: 978-3-031-70352-2
eBook Packages: Computer Science, Computer Science (R0)