You Can’t Learn What’s Not There: Self Supervised Learning and the Poverty of the Stimulus

  • Conference paper
  • First Online:
Natural Language Processing and Information Systems (NLDB 2021)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12801)

Abstract

Diathesis alternation describes the property of language that individual verbs can be used in different subcategorization frames. However, seemingly similar verbs such as drizzle and spray can behave differently in the alternations they can participate in (drizzle/spray water on the plant; *drizzle/spray the plant with water). By hypothesis, primary linguistic data is not sufficient to learn which verbs alternate and which do not. We tested two state-of-the-art machine learning models trained by self-supervision and found little evidence that they could learn the correct pattern of acceptability judgements in the locative alternation. This is consistent with a poverty-of-the-stimulus argument that primary linguistic data does not provide sufficient information to learn certain aspects of linguistic knowledge. The finding has important consequences for machine learning models trained by self-supervision, since they depend on the evidence present in the raw training input.
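To make the kind of probe described in the abstract concrete, the following is a minimal sketch, not the authors' code, of how a self-supervised masked language model could be queried for acceptability contrasts in the locative alternation. The model name (bert-base-uncased), the Hugging Face transformers API, and pseudo-log-likelihood scoring are assumptions chosen for illustration; the paper may use different models and a different evaluation procedure.

```python
# Illustrative sketch only (assumed model and scoring method, not the authors' code):
# probe a masked language model for locative-alternation acceptability contrasts
# by comparing pseudo-log-likelihoods of the two subcategorization frames.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log P(token | rest of sentence), masking one position at a time."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    score = 0.0
    for i in range(1, ids.size(0) - 1):              # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        score += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return score

# spray alternates freely; drizzle is claimed to resist the with-variant (*)
sentences = [
    "He sprayed water on the plant.",
    "He sprayed the plant with water.",
    "He drizzled water on the plant.",
    "He drizzled the plant with water.",   # the frame starred in the abstract
]
for s in sentences:
    print(f"{pseudo_log_likelihood(s):8.2f}  {s}")
```

Under the hypothesis tested in the paper, a model trained only on raw text would be expected to assign similar scores to the starred and unstarred with-variants, since the primary linguistic data offers little direct evidence that drizzle resists the alternation.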

Notes and Comments. This research was supported by the Project News Angler, which is funded by the Norwegian Research Council’s IKTPLUSS programme as project 275872.


Notes

  1. https://www.kaggle.com/c/cola-in-domain-open-evaluation/leaderboard.


Author information

Corresponding author

Correspondence to Csaba Veres.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Veres, C., Sampson, J. (2021). You Can’t Learn What’s Not There: Self Supervised Learning and the Poverty of the Stimulus. In: Métais, E., Meziane, F., Horacek, H., Kapetanios, E. (eds.) Natural Language Processing and Information Systems. NLDB 2021. Lecture Notes in Computer Science, vol 12801. Springer, Cham. https://doi.org/10.1007/978-3-030-80599-9_1

  • DOI: https://doi.org/10.1007/978-3-030-80599-9_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-80598-2

  • Online ISBN: 978-3-030-80599-9

  • eBook Packages: Computer Science, Computer Science (R0)
