Automatic Extraction of Typological Linguistic Features from Descriptive Grammars

  • Conference paper
Text, Speech, and Dialogue (TSD 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10415)

Abstract

The present paper describes experiments on automatically extracting typological linguistic features of natural languages from traditional written descriptive grammars. The feature-extraction task has high potential value in typological, genealogical, historical, and other related areas of linguistics that make use of databases of structural features of languages. Until now, such features have been extracted from grammars manually, which is highly time- and labor-consuming and becomes prohibitive when extended to the thousands of languages for which linguistic descriptions are available. The system we describe here starts from semantically parsed text, to which a set of rules is applied in order to extract feature values. We evaluate the system’s performance using the manually curated Grambank database as the gold standard and report the first measures of precision and recall for this problem.
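As an illustration of the evaluation described above, the following minimal sketch computes precision and recall of extracted feature values against a gold standard. It is not the authors' code: the function name, the dictionary layout, and the language and feature identifiers are hypothetical, and it assumes that a prediction counts as correct only when the gold standard holds the same value for the same language and feature.

```python
def precision_recall(predicted, gold):
    """Compare extracted feature values with a gold standard.

    Both arguments map (language, feature_id) pairs to a feature value.
    This layout is an assumption made for the example, not the format
    used by the paper or by Grambank.
    """
    # A prediction is correct only if the gold standard has the same
    # value for the same (language, feature_id) pair.
    correct = sum(
        1 for key, value in predicted.items() if gold.get(key) == value
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: two languages, two features.
gold = {
    ("lang_a", "F1"): "1",
    ("lang_a", "F2"): "0",
    ("lang_b", "F1"): "1",
    ("lang_b", "F2"): "1",
}
predicted = {
    ("lang_a", "F1"): "1",  # correct
    ("lang_b", "F1"): "0",  # wrong value
    ("lang_b", "F2"): "1",  # correct
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this convention, a gold datapoint for which the system extracts nothing lowers recall but not precision, while a wrong extracted value lowers both.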

Notes

  1. An adnominal property word corresponds to an adjective or participle in English and many other languages.

  2. http://www.nltk.org/.

  3. A Tibeto-Burman language of Burma with about 10,000 speakers.

  4. The argument string is split into a set of words using NLTK’s word tokenizer.

  5. An Atlantic-Congo language spoken in Africa.

  6. A Chibchan language spoken in Central America.

  7. In a separate study we found that the average cost for a salaried student assistant to extract one datapoint from a descriptive grammar is 1.53 EUR.

Author information

Corresponding author

Correspondence to Shafqat Mumtaz Virk.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Virk, S.M., Borin, L., Saxena, A., Hammarström, H. (2017). Automatic Extraction of Typological Linguistic Features from Descriptive Grammars. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_13

  • DOI: https://doi.org/10.1007/978-3-319-64206-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64205-5

  • Online ISBN: 978-3-319-64206-2

  • eBook Packages: Computer Science (R0)
