Abstract
The main goal of this paper is to perform dialog act (DA) tagging for a Telugu corpus. Annotating utterances with dialog acts is necessary to recognize the speaker's intent in dialog systems. Whereas English follows a strict subject–verb–object (SVO) order, Telugu is a free word order language, so the n-gram DA tagging methods proposed for English do not work for free word order languages like Telugu. In this paper, we propose a method for DA tagging of a Telugu corpus that combines statistical machine learning techniques with karaka dependency relation modifiers. Specifically, we use syntactic features obtained from karaka dependencies and apply a combination of language models (LMs) at the utterance level with a Hidden Markov Model (HMM) at the context level. For free word order languages like Telugu, karaka dependencies help extract the modifier-modified relationships between words or word clusters in an utterance. These relationships remain fixed even when the word order of the utterance changes, and once extracted they can be treated much like n-grams. The combination of LMs and the HMM is then applied to predict the DA of each utterance in a dialog. The proposed method is compared with several baseline tagging algorithms.
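The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a single add-one-smoothed unigram model per DA tag stands in for the paper's combination of LMs, the HMM is a plain first-order model decoded with Viterbi, and all names (relation_features, UnigramLM, train, viterbi) as well as the (modifier, relation, head) triple format are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def relation_features(dependencies):
    """Map (modifier, relation, head) triples to order-independent features.

    Sorting makes the result identical for any word order of the utterance.
    The triple format is an assumption, not the paper's exact representation.
    """
    return sorted(f"{mod}|{rel}|{head}" for mod, rel, head in dependencies)


class UnigramLM:
    """Add-one-smoothed unigram model over relation features for one DA tag."""

    def __init__(self, feature_lists, vocab):
        self.counts = Counter(f for feats in feature_lists for f in feats)
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen features

    def log_prob(self, feats):
        denom = self.total + self.vocab_size
        return sum(math.log((self.counts[f] + 1) / denom) for f in feats)


def train(dialogs):
    """dialogs: list of dialogs, each a list of (features, tag) pairs."""
    by_tag = defaultdict(list)
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = count
    vocab = set()
    for dialog in dialogs:
        prev = "<s>"
        for feats, tag in dialog:
            by_tag[tag].append(feats)
            vocab.update(feats)
            trans[prev][tag] += 1
            prev = tag
    tags = sorted(by_tag)
    lms = {t: UnigramLM(by_tag[t], vocab) for t in tags}
    return tags, lms, trans


def log_trans(trans, tags, prev, tag):
    """Add-one-smoothed log transition probability between DA tags."""
    row = trans[prev]
    return math.log((row[tag] + 1) / (sum(row.values()) + len(tags)))


def viterbi(dialog_feats, tags, lms, trans):
    """Return the most likely DA tag sequence for a non-empty dialog."""
    best = {
        t: (log_trans(trans, tags, "<s>", t) + lms[t].log_prob(dialog_feats[0]), [t])
        for t in tags
    }
    for feats in dialog_feats[1:]:
        new = {}
        for t in tags:
            score, path = max(
                ((best[p][0] + log_trans(trans, tags, p, t), best[p][1]) for p in tags),
                key=lambda item: item[0],
            )
            new[t] = (score + lms[t].log_prob(feats), path + [t])
        best = new
    return max(best.values(), key=lambda item: item[0])[1]
```

Because the features are built from modifier-modified pairs rather than adjacent words, two utterances that differ only in word order produce the same feature list and therefore the same utterance-level score. The HMM transitions then supply the dialog context when choosing among tags.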
Notes
1. The dialogs given here were originally written in Telugu but have been translated into English for the reader's understanding.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Dowlagar, S., Mamidi, R. (2018). A Karaka Dependency Based Dialog Act Tagging for Telugu Using Combination of LMs and HMM. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_48
DOI: https://doi.org/10.1007/978-3-319-75477-2_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75476-5
Online ISBN: 978-3-319-75477-2
eBook Packages: Computer Science (R0)