Abstract
Although natural (i.e., human) languages do not seem to follow a strictly formal grammar, their structural analysis and generation can be approximated by one. Such a grammar is an important tool for programmatic language understanding. Because of the huge number of natural languages and their variations, processing tools that rely on human intervention are available only for the most popular ones. We explore the problem of inducing, without supervision, a formal grammar for any language using the Link Grammar paradigm, starting from unannotated parses that are themselves obtained without supervision from an input corpus. We describe the details of our state-of-the-art grammar induction technology and its evaluation techniques, and report preliminary results of applying it to both synthetic and real-world text corpora.
Acknowledgements
We appreciate contributions by Linas Vepstas, including insightful discussions and critique of our research. We thank Amir Plivatsky for valuable feedback and for maintaining and incrementally improving the LG parser technology used in our work.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Glushchenko, A., Suarez, A., Kolonin, A., Goertzel, B., Baskov, O. (2019). Programmatic Link Grammar Induction for Unsupervised Language Learning. In: Hammer, P., Agrawal, P., Goertzel, B., Iklé, M. (eds) Artificial General Intelligence. AGI 2019. Lecture Notes in Computer Science, vol 11654. Springer, Cham. https://doi.org/10.1007/978-3-030-27005-6_11
DOI: https://doi.org/10.1007/978-3-030-27005-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27004-9
Online ISBN: 978-3-030-27005-6
eBook Packages: Computer Science, Computer Science (R0)