Deep Sketched Output Kernel Regression for Structured Prediction

  • Conference paper
Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2024)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 14943))

Abstract

By leveraging the kernel trick in the output space, kernel-induced losses provide a principled way to define structured output prediction tasks for a wide variety of output modalities. In particular, they have been successfully used in the context of surrogate non-parametric regression, where the kernel trick is typically exploited in the input space as well. However, when inputs are images or texts, more expressive models such as deep neural networks seem more suited than non-parametric methods. In this work, we tackle the question of how to train neural networks to solve structured output prediction tasks, while still benefiting from the versatility and relevance of kernel-induced losses. We design a novel family of deep neural architectures, whose last layer predicts in a data-dependent finite-dimensional subspace of the infinite-dimensional output feature space deriving from the kernel-induced loss. This subspace is chosen as the span of the eigenfunctions of a randomly-approximated version of the empirical kernel covariance operator. Interestingly, this approach unlocks the use of gradient descent algorithms (and consequently of any neural architecture) for structured prediction. Experiments on synthetic tasks as well as real-world supervised graph prediction problems show the relevance of our method.
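The pipeline described in the abstract — sketch the output Gram matrix, take the span of the resulting eigenfunctions as a finite-dimensional surrogate output space, regress basis coordinates, then decode by pre-image search — can be illustrated on a toy task. The snippet below is a minimal NumPy sketch under several assumptions not taken from the paper: a Gaussian output kernel, a sub-sampling (Nyström-style) sketch, and a random-feature least-squares model standing in for the deep network. All names, sizes, and the candidate-set decoding step are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(A, B, gamma=1.0):
    # Pairwise Gaussian kernel between rows of A and B (illustrative output kernel).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy "structured" task: outputs are low-dimensional vectors, and decoding is a
# pre-image search over a finite candidate set (hypothetical setup).
n, dx, dy, m, p = 200, 5, 3, 40, 10   # m: sketch size, p: subspace dimension
X = rng.normal(size=(n, dx))
W = rng.normal(size=(dx, dy))
Y = np.tanh(X @ W)                    # ground-truth outputs

# 1) Sub-sampling sketch of the output Gram matrix (Nystrom-style stand-in for
#    the sketched empirical covariance operator).
idx = rng.choice(n, size=m, replace=False)
K_mm = gaussian_kernel(Y[idx], Y[idx])
evals, evecs = np.linalg.eigh(K_mm)
evals, evecs = evals[::-1][:p], evecs[:, ::-1][:, :p]
B = evecs / np.sqrt(np.maximum(evals, 1e-12))   # eigenbasis coefficients

# 2) Finite-dimensional regression targets: coordinates of each psi(y_i) in the
#    sketched eigenbasis.
Z = gaussian_kernel(Y, Y[idx]) @ B              # shape (n, p)

# 3) A random-feature least-squares model stands in for the deep network; any
#    gradient-trained architecture producing p coordinates would fit here.
R = rng.normal(size=(dx, 128))
H = np.tanh(X @ R)
Theta, *_ = np.linalg.lstsq(H, Z, rcond=None)

# 4) Decoding: pre-image search over a candidate set (here, the training
#    outputs themselves, purely for illustration).
def predict(Xq, candidates):
    Zq = np.tanh(Xq @ R) @ Theta
    Phi = gaussian_kernel(candidates, Y[idx]) @ B
    # Minimise the feature-space distance, dropping the term constant in Zq.
    scores = 2 * Zq @ Phi.T - (Phi ** 2).sum(1)
    return candidates[scores.argmax(1)]

pred = predict(X, Y)
print("prediction shape:", pred.shape)
```

In this sketch the last "layer" (`Theta`) predicts coordinates in the data-dependent subspace, so training reduces to ordinary least squares on `Z`; with a deep network one would instead minimise the same squared loss on `Z` by gradient descent, which is the point of the construction.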

T. El Ahmad and J. Yang—Equal contribution.


Notes

  1. RDKit: Open-source cheminformatics. https://www.rdkit.org.

  2. https://github.com/cnedwards/text2mol.

Acknowledgments

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or European Commission. Neither the European Union nor the granting authority can be held responsible for them. This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement 101120237 (ELIAS), the Télécom Paris research chair on Data Science and Artificial Intelligence for Digitalized Industry and Services (DSAIDIS) and the PEPR-IA through the project FOUNDRY.

Author information

Correspondence to Tamim El Ahmad.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 510 KB)

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

El Ahmad, T., Yang, J., Laforgue, P., d’Alché-Buc, F. (2024). Deep Sketched Output Kernel Regression for Structured Prediction. In: Bifet, A., Davis, J., Krilavičius, T., Kull, M., Ntoutsi, E., Žliobaitė, I. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol. 14943. Springer, Cham. https://doi.org/10.1007/978-3-031-70352-2_6

  • DOI: https://doi.org/10.1007/978-3-031-70352-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70351-5

  • Online ISBN: 978-3-031-70352-2

  • eBook Packages: Computer Science (R0)
