Deep Learning, Grammar Transfer, and Transportation Theory

  • Conference paper

Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12458)

Abstract

Despite its widespread adoption and success, deep learning-based artificial intelligence offers little insight into its decision-making process. This makes the "intelligence" part questionable, since we expect real artificial intelligence not only to complete a given task but also to perform it in an understandable way. One way to approach this is to build a connection between artificial intelligence and human intelligence. Here, we use grammar transfer to demonstrate a paradigm that connects these two types of intelligence. Specifically, we define grammar transfer as the act of transferring the knowledge learned by a recurrent neural network from one regular grammar to another. We are motivated by the theory that there is a natural correspondence between second-order recurrent neural networks and deterministic finite automata, which are uniquely associated with regular grammars. To study the process of grammar transfer, we propose a category-based framework that we denote grammar transfer learning. Under this framework, we introduce three isomorphic categories and define ideal transfers using transportation theory from operations research. Regarding the optimal transfer plan as a sensible operation from a human perspective, we use it as a reference for examining whether a learning model behaves intelligently when performing the transfer task. Experiments under our framework demonstrate that the learning model can in general learn a grammar intelligently, but fails to follow the optimal way of learning.
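
Two ingredients of the abstract can be made concrete with small sketches. First, the motivating correspondence between second-order recurrent networks and deterministic finite automata: a second-order cell updates its state through a bilinear form h_t = sigma(W h_{t-1} x_t), and with one-hot states and inputs a suitable weight tensor reproduces a DFA's transition table exactly. The Python snippet below is a minimal illustration in the spirit of the DFA-encoding construction of Omlin and Giles [32]; the parity automaton, the saturation weight H, and all variable names are illustrative choices, not the paper's construction.

    import numpy as np

    # Toy DFA over the alphabet {0, 1} with two states: delta[state][symbol]
    # gives the next state. This one tracks the parity of the number of 1s.
    delta = np.array([[0, 1],
                      [1, 0]])
    n_states, n_symbols = delta.shape

    # Second-order weight tensor W[i, j, k]: strongly positive exactly when
    # delta(q_j, sigma_k) = q_i, so the sigmoid saturates to ~1 for the
    # correct next state and ~0 elsewhere. H = 8 keeps the state near one-hot.
    H = 8.0
    W = -H * np.ones((n_states, n_states, n_symbols))
    for j in range(n_states):
        for k in range(n_symbols):
            W[delta[j, k], j, k] = H

    def step(h, sym):
        x = np.eye(n_symbols)[sym]        # one-hot input symbol
        return 1 / (1 + np.exp(-np.einsum('ijk,j,k->i', W, h, x)))

    h = np.eye(n_states)[0]               # start state q_0
    for sym in [1, 0, 1, 1]:              # read the string "1011"
        h = step(h, sym)
    print(h.round(2))                     # ~[0, 1]: three 1s, odd parity

Second, the optimal transfer plan used as a human-sensible reference comes from the Monge-Kantorovich transportation problem [30, 43, 44]. As a minimal sketch of that computation, and not the authors' implementation, the following solves the Kantorovich linear program for a hypothetical cost matrix; the distributions a and b and the costs C are placeholders standing in for whatever mass and grammar-to-grammar costs the framework assigns.

    import numpy as np
    from scipy.optimize import linprog

    a = np.array([0.5, 0.5])              # mass on source grammars/states
    b = np.array([0.4, 0.3, 0.3])         # mass on target grammars/states
    C = np.array([[0.0, 1.0, 2.0],        # hypothetical transfer costs C[i, j]
                  [1.0, 0.0, 1.0]])
    m, n = C.shape

    # Kantorovich LP: minimize <C, P> subject to P @ 1 = a, P.T @ 1 = b, P >= 0.
    A_eq = []
    for i in range(m):                    # row sums of the plan must equal a
        row = np.zeros((m, n)); row[i, :] = 1.0
        A_eq.append(row.ravel())
    for j in range(n):                    # column sums must equal b
        col = np.zeros((m, n)); col[:, j] = 1.0
        A_eq.append(col.ravel())
    res = linprog(C.ravel(), A_eq=np.array(A_eq),
                  b_eq=np.concatenate([a, b]), bounds=(0, None))

    P = res.x.reshape(m, n)               # the optimal transport plan
    print(P.round(2), res.fun)            # plan and minimal total cost

The plan P concentrates mass on the cheapest pairings; the paper's experiments compare the model's observed transfer behaviour against such a plan.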

Notes

  1. These methods propose to either decompose the output decision of a model into its input, which can have a linear or nonlinear form at the feature or instance level [5, 22, 26], or employ interpretable models to approximate black-box DL models [13, 36].

  2. The algorithm introduced by [51] has exponential time complexity.

  3. We compute the manipulation distance and provide the results in Fig. 2c.

References

  1. Abbe, E., Sandon, C.: Provable limitations of deep learning. arXiv preprint arXiv:1812.06369 (2018)

  2. Arjonilla, F.J., Ogata, T.: General problem solving with category theory. arXiv preprint arXiv:1709.04825 (2017)

  3. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: International Conference on Machine Learning, pp. 214–223 (2017)

  4. Awodey, S.: Category Theory. Oxford University Press, Oxford (2010)

  5. Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., Samek, W.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS One 10(7), e0130140 (2015)

  6. Chen, D., et al.: Deep learning and alternative learning strategies for retrospective real-world clinical data. NPJ Digit. Med. 2(1), 1–5 (2019)

  7. Chollet, F.: The measure of intelligence. arXiv preprint arXiv:1911.01547 (2019)

  8. Courty, N., Flamary, R., Habrard, A., Rakotomamonjy, A.: Joint distribution optimal transportation for domain adaptation. In: Advances in Neural Information Processing Systems, pp. 3730–3739 (2017)

  9. Courty, N., Flamary, R., Tuia, D., Rakotomamonjy, A.: Optimal transport for domain adaptation. IEEE Trans. Pattern Anal. Mach. Intell. 39(9), 1853–1865 (2016)

  10. De la Higuera, C.: Grammatical Inference: Learning Automata and Grammars. Cambridge University Press, Cambridge (2010)

  11. Falqueto, J., Lima, W.C., Borges, P.S.S., Barreto, J.M.: The measurement of artificial intelligence: an IQ for machines. In: Proceedings of the International Conference on Modeling, Identification and Control, Innsbruck, Austria. Citeseer (2001)

  12. Fong, B., Spivak, D., Tuyéras, R.: Backprop as functor: a compositional perspective on supervised learning. In: 2019 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pp. 1–13. IEEE (2019)

  13. Frosst, N., Hinton, G.E.: Distilling a neural network into a soft decision tree. In: CEx@AI*IA, CEUR Workshop Proceedings, vol. 2071. CEUR-WS.org (2017)

  14. Gayraud, N.T.H., Rakotomamonjy, A., Clerc, M.: Optimal transport applied to transfer learning for P300 detection (2017)

  15. Giles, C.L., Miller, C.B., Chen, D., Chen, H.-H., Sun, G.-Z., Lee, Y.-C.: Learning and extracting finite state automata with second-order recurrent neural networks. Neural Comput. 4(3), 393–405 (1992)

  16. Giles, C.L., Sun, G.-Z., Chen, H.-H., Lee, Y.-C., Chen, D.: Higher order recurrent networks and grammatical inference. In: Advances in Neural Information Processing Systems, pp. 380–387 (1990)

  17. Healy, M.J.: Category theory applied to neural modeling and graphical representations. In: Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000), vol. 3, pp. 35–40. IEEE (2000)

  18. Healy, M.J., Caudell, T.P.: Neural networks, knowledge and cognition: a mathematical semantic model based upon category theory (2004)

  19. Healy, M.J., Caudell, T.P.: Ontologies and worlds in category theory: implications for neural systems. Axiomathes 16(1–2), 165–214 (2006)

  20. Hopcroft, J.E.: Introduction to Automata Theory, Languages, and Computation. Pearson Education India (2008)

  21. Hou, B.-J., Zhou, Z.-H.: Learning with interpretable structure from RNN. CoRR abs/1810.10708 (2018)

  22. Koh, P.W., Liang, P.: Understanding black-box predictions via influence functions. In: ICML, Proceedings of Machine Learning Research, vol. 70, pp. 1885–1894. PMLR (2017)

  23. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)

  24. Lewis, M.: Compositionality for recursive neural networks. arXiv preprint arXiv:1901.10723 (2019)

  25. Lu, Y., Chen, L., Saidi, A.: Optimal transport for deep joint transfer learning. arXiv preprint arXiv:1709.02995 (2017)

  26. Lundberg, S.M., Lee, S.-I.: A unified approach to interpreting model predictions. In: NIPS, pp. 4765–4774 (2017)

  27. Mac Lane, S.: Categories for the Working Mathematician, vol. 5. Springer, Heidelberg (2013)

  28. Marcus, G.: Deep learning: a critical appraisal. arXiv preprint arXiv:1801.00631 (2018)

  29. Mitchell, M.: Artificial Intelligence: A Guide for Thinking Humans. Penguin UK (2019)

  30. Monge, G.: Mémoire sur la théorie des déblais et des remblais. Histoire de l'Académie Royale des Sciences de Paris (1781)

  31. Navarrete, J.A., Dartnell, P.: Towards a category theory approach to analogy: analyzing re-representation and acquisition of numerical knowledge. PLoS Comput. Biol. 13(8), e1005683 (2017)

  32. Omlin, C.W., Giles, C.L.: Constructing deterministic finite-state automata in recurrent neural networks. Neural Comput. 8(4), 675–696 (1996)

  33. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2009)

  34. Papamakarios, G.: Distilling model knowledge. arXiv preprint arXiv:1510.02437 (2015)

  35. Rabusseau, G., Li, T., Precup, D.: Connecting weighted automata and recurrent neural networks through spectral learning. In: AISTATS, Proceedings of Machine Learning Research, vol. 89, pp. 1630–1639. PMLR (2019)

  36. Ribeiro, M.T., Singh, S., Guestrin, C.: "Why should I trust you?": explaining the predictions of any classifier. In: KDD, pp. 1135–1144. ACM (2016)

  37. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Pearson Education Limited, Malaysia (2016)

  38. Schlag, I., Schmidhuber, J.: Learning to reason with third order tensor products. In: Advances in Neural Information Processing Systems, pp. 9981–9993 (2018)

  39. Shen, J., Qu, Y., Zhang, W., Yu, Y.: Wasserstein distance guided representation learning for domain adaptation. arXiv preprint arXiv:1707.01217 (2017)

  40. Smith, G.: The AI Delusion. Oxford University Press, Oxford (2018)

  41. Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., Liu, C.: A survey on deep transfer learning. In: Kůrková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds.) ICANN 2018. LNCS, vol. 11141, pp. 270–279. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01424-7_27

  42. Tomita, M.: Learning of construction of finite automata from examples using hill-climbing. RR: Regular set recognizer. Technical report, Carnegie Mellon University, Pittsburgh, PA, Department of Computer Science (1982)

  43. Villani, C.: Topics in Optimal Transportation, vol. 58. American Mathematical Society (2003)

  44. Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-71050-9

  45. Waldrop, M.M.: News feature: what are the limits of deep learning? Proc. Natl. Acad. Sci. 116(4), 1074–1077 (2019)

  46. Wang, J., Bao, W., Sun, L., Zhu, X., Cao, B., Philip, S.Y.: Private model compression via knowledge distillation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 1190–1197 (2019)

  47. Wang, Q., Zhang, K., Liu, X., Giles, C.L.: Connecting first and second order recurrent networks with deterministic finite automata. arXiv preprint arXiv:1911.04644 (2019)

  48. Weinberg, S.: What is quantum field theory, and what did we think it is? arXiv preprint hep-th/9702027 (1997)

  49. Weiss, G., Goldberg, Y., Yahav, E.: Extracting automata from recurrent neural networks using queries and counterexamples. arXiv preprint arXiv:1711.09576 (2017)

  50. Wu, Y., Zhang, S., Zhang, Y., Bengio, Y., Salakhutdinov, R.R.: On multiplicative integration with recurrent neural networks. In: Advances in Neural Information Processing Systems, pp. 2856–2864 (2016)

  51. Zhang, K., Wang, Q., Liu, X., Giles, C.L.: Shapley homology: topological analysis of sample influence for neural networks. Neural Comput. 32(7), 1355–1378 (2020)

Author information

Corresponding author

Correspondence to Kaixuan Zhang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, K., Wang, Q., Giles, C.L. (2021). Deep Learning, Grammar Transfer, and Transportation Theory. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol. 12458. Springer, Cham. https://doi.org/10.1007/978-3-030-67661-2_36

  • DOI: https://doi.org/10.1007/978-3-030-67661-2_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67660-5

  • Online ISBN: 978-3-030-67661-2

  • eBook Packages: Computer Science, Computer Science (R0)
