Automatic Extraction of Typological Linguistic Features from Descriptive Grammars

Virk, Shafqat Mumtaz; Borin, Lars; Saxena, Anju; Hammarström, Harald

doi:10.1007/978-3-319-64206-2_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10415)

Included in the following conference series: International Conference on Text, Speech, and Dialogue

Abstract

The present paper describes experiments on automatically extracting typological linguistic features of natural languages from traditional written descriptive grammars. The feature-extraction task has high potential value in typological, genealogical, historical, and other related areas of linguistics that make use of databases of structural features of languages. Until now, such features have been extracted from grammars manually, which is highly time- and labor-consuming and becomes prohibitive when extended to the thousands of languages for which linguistic descriptions are available. The system we describe here starts from semantically parsed text, over which a set of rules is applied in order to extract feature values. We evaluate the system’s performance using the manually curated Grambank database as the gold standard and report the first measures of precision and recall for this problem.
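
To make the evaluation setup concrete, the following is a minimal sketch of how extracted feature values could be scored against a gold standard. It is an illustration under stated assumptions, not the authors' implementation: the function name `precision_recall`, the language and feature identifiers, and all values are hypothetical, and the abstract does not specify how precision and recall were computed, so one common definition is assumed here.

```python
from typing import Dict, Tuple

# Keys are (language, feature) pairs; values are feature values.
# All identifiers and data below are hypothetical illustrations.
Datapoints = Dict[Tuple[str, str], str]


def precision_recall(extracted: Datapoints, gold: Datapoints) -> Tuple[float, float]:
    """Score extracted datapoints against a gold standard.

    Precision: share of extracted datapoints whose value matches the gold value.
    Recall: share of gold datapoints that were extracted with the matching value.
    """
    correct = sum(1 for key, value in extracted.items() if gold.get(key) == value)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold-standard datapoints (standing in for a curated database).
gold = {
    ("lang_a", "F1"): "1",
    ("lang_a", "F2"): "0",
    ("lang_b", "F1"): "1",
    ("lang_b", "F2"): "1",
}

# Hypothetical output of a rule-based extractor.
extracted = {
    ("lang_a", "F1"): "1",  # matches gold
    ("lang_a", "F2"): "1",  # wrong value
    ("lang_b", "F1"): "1",  # matches gold
}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy example, two of the three extracted datapoints agree with the gold standard, and two of the four gold datapoints are recovered.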

Notes

1. An adnominal property word corresponds to an adjective or participle in English and many other languages.
2. http://www.nltk.org/.
3. A Tibeto-Burman language of Burma with about 10,000 speakers.
4. The argument string is split into a set of words using NLTK’s word tokenizer.
5. An Atlantic-Congo language spoken in Africa.
6. A Chibchan language spoken in Central America.
7. In a separate study we found that the average cost for a salaried student assistant to extract one datapoint from a descriptive grammar is 1.53 EUR.

Author information

Authors and Affiliations

Språkbanken, Department of Swedish, University of Gothenburg, Gothenburg, Sweden
Shafqat Mumtaz Virk & Lars Borin
Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
Anju Saxena & Harald Hammarström

Corresponding author

Correspondence to Shafqat Mumtaz Virk.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Kamil Ekštein
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Virk, S.M., Borin, L., Saxena, A., Hammarström, H. (2017). Automatic Extraction of Typological Linguistic Features from Descriptive Grammars. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_13

DOI: https://doi.org/10.1007/978-3-319-64206-2_13
Published: 29 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64205-5
Online ISBN: 978-3-319-64206-2
eBook Packages: Computer Science, Computer Science (R0)
