Abstract
In this paper, we address the problem of storing and retrieving dialectal data in a unified framework. In particular, we discuss issues in the design and implementation of a multimedia database that will contain written and oral data from three Greek dialects of Asia Minor. First, we describe the overall architecture of a system that lets the user store audio recordings, text transcripts, and other annotations. We then discuss the possibilities and limitations of a retrieval module that combines different linguistic levels for a unified exploitation of oral and written corpora.
This research has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: Thalis. Investing in knowledge society through the European Social Fund.
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Karanikolas, N.N., Galiotou, E., Ralli, A. (2014). Towards a Unified Exploitation of Electronic Dialectal Corpora: Problems and Perspectives. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_32
DOI: https://doi.org/10.1007/978-3-319-10816-2_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10815-5
Online ISBN: 978-3-319-10816-2
eBook Packages: Computer Science, Computer Science (R0)