Abstract
By leveraging the kernel trick in the output space, kernel-induced losses provide a principled way to define structured output prediction tasks for a wide variety of output modalities. In particular, they have been successfully used in the context of surrogate non-parametric regression, where the kernel trick is typically exploited in the input space as well. However, when inputs are images or texts, more expressive models such as deep neural networks seem better suited than non-parametric methods. In this work, we tackle the question of how to train neural networks to solve structured output prediction tasks while still benefiting from the versatility and relevance of kernel-induced losses. We design a novel family of deep neural architectures whose last layer predicts in a data-dependent, finite-dimensional subspace of the infinite-dimensional output feature space associated with the kernel-induced loss. This subspace is chosen as the span of the eigenfunctions of a randomly approximated version of the empirical kernel covariance operator. Interestingly, this approach unlocks the use of gradient descent algorithms (and consequently of any neural architecture) for structured prediction. Experiments on synthetic tasks as well as real-world supervised graph prediction problems show the relevance of our method.
T. El Ahmad and J. Yang—Equal contribution.
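To make the construction concrete, below is a minimal NumPy sketch of the idea as the abstract describes it: eigendecompose a Nyström (landmark-based) approximation of the empirical kernel covariance operator on the outputs, let the network predict the p coordinates of the projection of the output embedding ψ(y) onto the span of the top-p eigenfunctions, and train with the squared surrogate loss, which matches the kernel-induced loss up to a term independent of the network parameters. The Gaussian output kernel, the landmark count m, the subspace dimension p, and all function names here are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    # Stand-in output kernel k_Y; any positive-definite kernel on outputs
    # (e.g. a graph kernel for supervised graph prediction) fits here.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_eigencoords(Y_landmarks, kernel, p):
    # Eigendecompose the m x m Gram matrix of randomly drawn landmark
    # outputs: this yields the top-p eigenfunctions of the approximated
    # empirical covariance operator (standard Nystrom kernel PCA).
    m = len(Y_landmarks)
    evals, evecs = np.linalg.eigh(kernel(Y_landmarks, Y_landmarks) / m)
    order = np.argsort(evals)[::-1][:p]
    lam, U = np.clip(evals[order], 1e-12, None), evecs[:, order]
    scale = 1.0 / np.sqrt(m * lam)  # normalises eigenfunctions in the RKHS

    def coords(Y):
        # Coordinates of the projection of psi(y) onto span(e_1, ..., e_p):
        # <psi(y), e_j> = k(y, Y_m) u_j / sqrt(m * lam_j).
        return kernel(Y, Y_landmarks) @ U * scale

    return coords

def surrogate_loss(f_out, Y, coords):
    # Squared distance between the network's p outputs and the projected
    # output embeddings; any architecture trained by gradient descent on
    # this quantity minimises the kernel-induced loss over the subspace.
    return ((f_out - coords(Y)) ** 2).sum(axis=1).mean()

def decode(f_out, Y_candidates, kernel, coords):
    # Pre-image step: pick the candidate whose embedding is closest to the
    # predicted point of the subspace. Since the prediction lies in the
    # span, only k(y', y') - 2 <psi(y'), prediction> depends on y', and
    # both terms are plain kernel evaluations.
    diag = np.diag(kernel(Y_candidates, Y_candidates))
    scores = diag[None, :] - 2.0 * f_out @ coords(Y_candidates).T
    return Y_candidates[np.argmin(scores, axis=1)]
```

In this sketch, `coords` is fixed once the landmarks are drawn, so training reduces to ordinary gradient descent on `surrogate_loss`; only the m x m landmark Gram matrix is ever eigendecomposed, giving an O(m^3) cost instead of the O(n^3) required by the full empirical covariance operator.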
Notes
- 1. RDKit: Open-source cheminformatics. https://www.rdkit.org.
Acknowledgments
Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or European Commission. Neither the European Union nor the granting authority can be held responsible for them. This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement 101120237 (ELIAS), the Télécom Paris research chair on Data Science and Artificial Intelligence for Digitalized Industry and Services (DSAIDIS) and the PEPR-IA through the project FOUNDRY.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
El Ahmad, T., Yang, J., Laforgue, P., d’Alché-Buc, F. (2024). Deep Sketched Output Kernel Regression for Structured Prediction. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol. 14943. Springer, Cham. https://doi.org/10.1007/978-3-031-70352-2_6
DOI: https://doi.org/10.1007/978-3-031-70352-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70351-5
Online ISBN: 978-3-031-70352-2
eBook Packages: Computer Science, Computer Science (R0)