Abstract
We present our work aimed at turning the linguistic material available in Grierson's classical Linguistic Survey of India (LSI) from a printed, discursive textual description into a formally structured digital language resource: a database suitable for a broad array of linguistic investigations of the languages of South Asia. In doing so, we are developing state-of-the-art language technology for automatically extracting the relevant grammatical information from the text of the LSI, as well as interactive linguistic information visualization tools that support the analysis and comparison of languages based on their structural and functional features.
Notes
1. In linguistic works, South Asia is defined as the seven countries Pakistan, India, Nepal, Bhutan, Bangladesh, Sri Lanka, and the Maldives, plus some immediately adjacent areas (e.g., Tibet).
7. For instance, location data come mainly from Glottolog: http://glottolog.org.
8. http://dsal.uchicago.edu/books/lsi/ (page images only; no text search available).
10. A Tibeto-Burman language spoken in southern Tedim township, Chin State, Burma.
References
Borin, L., Forsberg, M., Roxendal, J.: Korp – the corpus infrastructure of Språkbanken. In: Proceedings of LREC 2012, pp. 474–478. ELRA, Istanbul (2012). http://www.lrec-conf.org/proceedings/lrec2012/pdf/248_Paper.pdf
Broadwell, P.M., Tangherlini, T.R.: TrollFinder: geo-semantic exploration of a very large corpus of Danish folklore. In: The Third Workshop on Computational Models of Narrative, pp. 50–57. ELRA, Istanbul (2012)
Chuang, J., Ramage, D., Manning, C.D., Heer, J.: Interpretation and trust: designing model-driven visualizations for text analysis. In: ACM Human Factors in Computing Systems (CHI) (2012). http://vis.stanford.edu/papers/designing-model-driven-vis
Dryer, M.S., Haspelmath, M. (eds.): WALS Online. Max Planck Institute for Evolutionary Anthropology, Leipzig (2013). http://wals.info/
Ebert, K.: South Asia as a linguistic area. In: Brown, K. (ed.) Encyclopedia of Languages and Linguistics, 2nd edn. Elsevier, Oxford (2006)
Evert, S., Hardie, A.: Twenty-first century corpus workbench: updating a query architecture for the new millennium. In: Proceedings of the Corpus Linguistics 2011 Conference. University of Birmingham, Birmingham (2011)
Grierson, G.A.: A Linguistic Survey of India, vols. I–XI. Government of India, Central Publication Branch, Calcutta (1903–1927)
Hammarström, H., Forkel, R., Haspelmath, M., Bank, S.: Glottolog 2.7. Max Planck Institute for the Science of Human History, Jena (2016). http://glottolog.org
Havre, S., Hetzler, B., Nowell, L.: ThemeRiver: visualizing theme changes over time. In: IEEE Symposium on Information Visualization (InfoVis 2000), pp. 115–123. IEEE, Salt Lake City (2000)
Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: Proceedings of ACL 2003, pp. 423–430. ACL, Sapporo (2003). http://dx.doi.org/10.3115/1075096.1075150
Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: ACL System Demonstrations, pp. 55–60. ACL, Portland (2014). http://www.aclweb.org/anthology/P/P14/P14-5010
Masica, C.P.: Defining a Linguistic Area: South Asia. University of Chicago Press, Chicago (1976)
Michaelis, S.M., Maurer, P., Haspelmath, M., Huber, M. (eds.): APiCS Online. Max Planck Institute for Evolutionary Anthropology, Leipzig (2013). http://apics-online.info/
Recasens, M., de Marneffe, M.-C., Potts, C.: The life and death of discourse entities: identifying singleton mentions. In: Proceedings of NAACL-HLT 2013. ACL, Atlanta (2013)
Schilit, B.N., Kolak, O.: Exploring a digital library through key ideas. In: Proceedings of JCDL 2008, pp. 177–186. ACM, Pittsburgh (2008)
Smith, D.A.: Detecting and browsing events in unstructured text. In: SIGIR 2002. ACM, Tampere (2002)
Sun, G.D., Wu, Y.C., Liang, R.H., Liu, S.X.: A survey of visual analytics techniques and applications: state-of-the-art research and future challenges. J. Comput. Sci. Technol. 28(5), 852–867 (2013). http://dx.doi.org/10.1007/s11390-013-1383-8
Versley, Y., Moschitti, A., Poesio, M., Yang, X.: Coreference systems based on kernels methods. In: Proceedings of COLING 2008. ACL, Manchester (2008)
Acknowledgments
The work presented here was funded by the Swedish Research Council as part of the project "South Asia as a linguistic area? Exploring big-data methods in areal and genetic linguistics" (2015–2019, contract no. 421-2014-969), and by the University of Gothenburg as part of its funding of the Språkbanken language technology and digital humanities infrastructure.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Borin, L., Virk, S.M., Saxena, A. (2018). Language Technology for Digital Linguistics: Turning the Linguistic Survey of India into a Rich Source of Linguistic Information. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol. 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_42
DOI: https://doi.org/10.1007/978-3-319-77113-7_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77112-0
Online ISBN: 978-3-319-77113-7
eBook Packages: Computer Science, Computer Science (R0)